[opencms-dev] Proper Solr configuration for Chinese

Schliemann, Kai K.Schliemann at comundus.com
Fri Jun 20 18:59:26 CEST 2014


Hi list,
Can somebody give me a hint on a proper configuration of the Solr search for Chinese, or check whether our configuration is correct, please?

I defined the following in \WEB-INF\solr\conf\schema.xml:
...
<types>
...
    <!-- found this on the net, but not sure if it is the right tokenizer and if I need some filters -->
    <fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.CJKTokenizerFactory"/>
      </analyzer>
    </fieldType>
...
</types>
...
<fields>
...
<!-- just copied "text_de" fields and replaced "de" by "zh" -->
<field name="text_zh"             type="text_zh"      indexed="true"  stored="false" multiValued="true"/><!-- Catchall for Chinese text fields -->
...
<!-- just copied "text_de" fields and replaced "de" by "zh" -->
<dynamicField name="*_zh"         type="text_zh"      indexed="true"  stored="true"/>
</fields>
...

I get search results, but some search phrases return nothing even though the word is in the indexed document (checked with Luke).
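
For comparison, here is a minimal sketch of a bigram-based alternative, modeled on the "text_cjk" type from the Solr 4.x example schema. It assumes the Solr version bundled with OpenCms provides these factories, and the filter chain is an assumption, not a verified fix. As far as I understand, CJKTokenizerFactory indexes overlapping two-character tokens, which might explain why some queries (for example single characters) find nothing.

    <!-- sketch only: the factories are assumed to be available in the bundled Solr version -->
    <fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- normalize full-width/half-width character forms before building bigrams -->
        <filter class="solr.CJKWidthFilterFactory"/>
        <!-- lowercase any non-CJK text mixed into the content -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- emit overlapping bigrams of adjacent CJK characters -->
        <filter class="solr.CJKBigramFilterFactory"/>
      </analyzer>
    </fieldType>

Because the same analyzer is used at index and query time here, the index would have to be rebuilt after such a change; tokens that are already indexed are not re-analyzed.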


Thanks a lot in advance.

Best regards
Kai