[opencms-dev] proper configuration solr configuration for chinese

Patric Dosch patric.dosch at virtual-identity.com
Wed Jun 25 12:01:13 CEST 2014


Hey Kai,

I added Chinese in a similar way. My configuration uses the SmartChineseSentenceTokenizerFactory. So far, the customer has not reported any problems.

<analyzer>
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PositionFilterFactory" />
</analyzer>
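
A minimal sketch of how this analyzer might be wrapped in a field type, reusing the text_zh name and positionIncrementGap from the original message below. It assumes the Smart Chinese analysis jar (lucene-analyzers-smartcn) is available on Solr's classpath, which is not stated in this thread:

<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
    <!-- Assumption: lucene-analyzers-smartcn is on Solr's classpath -->
    <analyzer>
        <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
        <filter class="solr.SmartChineseWordTokenFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PositionFilterFactory"/>
    </analyzer>
</fieldType>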


I also have a copyField in my schema.xml. Perhaps this helps?
<copyField source="*_zh" dest="text_zh"/>
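
To illustrate what this copyField does: it copies every field whose name ends in _zh into the catch-all field text_zh, so a query against text_zh covers all Chinese fields. A sketch, assuming the field and dynamicField definitions from the original message below; in a Solr 4 schema.xml the copyField element typically sits after the closing </fields> tag:

<fields>
    <!-- Catch-all target; multiValued because several *_zh fields are copied into it -->
    <field name="text_zh" type="text_zh" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="*_zh" type="text_zh" indexed="true" stored="true"/>
</fields>
<copyField source="*_zh" dest="text_zh"/>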

Regards,
Patric



From: opencms-dev-bounces at opencms.org [mailto:opencms-dev-bounces at opencms.org] On Behalf Of Schliemann, Kai
Sent: Friday, 20 June 2014 18:59
To: 'The OpenCms mailing list (opencms-dev at opencms.org)'
Subject: [opencms-dev] proper configuration solr configuration for chinese

Hi list,
can somebody give me a hint on a proper configuration of the Solr search for Chinese, or check whether our configuration is correct, please?

I defined the following in \WEB-INF\solr\conf\schema.xml:
...
<types>
...
    <!-- found this on the net, but not sure if it is the right tokenizer and if I need some filters -->
    <fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.CJKTokenizerFactory"/>
      </analyzer>
    </fieldType>
...
</types>
...
<fields>
...
<!-- just copied "text_de" fields and replaced "de" by "zh" -->
<field name="text_zh"             type="text_zh"      indexed="true"  stored="false" multiValued="true"/><!-- Catchall for Chinese text fields -->
...
<!-- just copied "text_de" fields and replaced "de" by "zh" -->
<dynamicField name="*_zh"         type="text_zh"      indexed="true"  stored="true"/>
</fields>
...

I get search results, but some search phrases return nothing even though the word is in the document (checked with Luke).


Thanks a lot in advance.

Best regards
Kai