<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.E-MailFormatvorlage17
{mso-style-type:personal;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.E-MailFormatvorlage18
{mso-style-type:personal-reply;
font-family:"Arial","sans-serif";
color:#1F497D;
font-weight:normal;
font-style:normal;}
span.hps
{mso-style-name:hps;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 2.0cm 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="DE" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span class="hps"><span lang="EN">Hey Kai,<o:p></o:p></span></span></p>
<p class="MsoNormal"><span class="hps"><span lang="EN"><o:p> </o:p></span></span></p>
<p class="MsoNormal"><span lang="EN">I added Chinese in a similar way. My configuration uses the <span class="hps">SmartChineseSentenceTokenizerFactory</span>. So far, the customer has not reported any problems.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN">&lt;analyzer&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN">&nbsp;&nbsp;&lt;tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN">&nbsp;&nbsp;&lt;filter class="solr.SmartChineseWordTokenFilterFactory"/&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN">&nbsp;&nbsp;&lt;filter class="solr.LowerCaseFilterFactory"/&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN">&nbsp;&nbsp;&lt;filter class="solr.PositionFilterFactory"/&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN">&lt;/analyzer&gt;<o:p></o:p></span></p>
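<p class="MsoNormal"><span lang="EN">For context, here is a minimal sketch of how the analyzer above might sit inside a complete field type. The name text_zh and the positionIncrementGap value are borrowed from the question quoted below. It is an assumption that the SmartChinese (smartcn) analysis classes are available on Solr's classpath.<o:p></o:p></span></p>
<pre>
&lt;fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100"&gt;
  &lt;analyzer&gt;
    &lt;!-- split the text into sentences, then segment each sentence into words --&gt;
    &lt;tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/&gt;
    &lt;filter class="solr.SmartChineseWordTokenFilterFactory"/&gt;
    &lt;!-- lowercase any Latin-script tokens mixed into the Chinese text --&gt;
    &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
    &lt;filter class="solr.PositionFilterFactory"/&gt;
  &lt;/analyzer&gt;
&lt;/fieldType&gt;
</pre>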
<p class="MsoNormal"><span lang="EN"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN"><br>
In addition, I have a <span class="hps">copyField</span> rule in my schema.xml. Perhaps this helps?<br>
&lt;copyField source="*_zh" dest="text_zh"/&gt;<br>
<br>
<span class="hps">Regards,<o:p></o:p></span></span></p>
<p class="MsoNormal"><span class="hps"><span lang="EN">Patric <o:p></o:p></span></span></p>
<p class="MsoNormal"><span class="hps"><span lang="EN"><o:p> </o:p></span></span></p>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:'Tahoma','sans-serif';mso-fareast-language:DE">From:</span></b><span style="font-size:10.0pt;font-family:'Tahoma','sans-serif';mso-fareast-language:DE"> opencms-dev-bounces@opencms.org [mailto:opencms-dev-bounces@opencms.org]
<b>On behalf of </b>Schliemann, Kai<br>
<b>Sent:</b> Friday, 20 June 2014 18:59<br>
<b>To:</b> 'The OpenCms mailing list (opencms-dev@opencms.org)'<br>
<b>Subject:</b> [opencms-dev] proper configuration solr configuration for chinese<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hi list,<o:p></o:p></p>
<p class="MsoNormal"><span lang="EN-US">Can somebody give me a hint on the proper Solr search configuration for Chinese, or check whether our configuration is correct, please?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">I defined the following in \WEB-INF\solr\conf\schema.xml:<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">…<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">&lt;types&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">… <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">&nbsp;&nbsp;&lt;!-- found this on the net, but not sure if it is the right tokenizer and if I need some filters --&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">&nbsp;&nbsp;&lt;fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100"&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;&lt;analyzer&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;tokenizer class="solr.CJKTokenizerFactory"/&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">&nbsp;&nbsp;&nbsp;&nbsp;&lt;/analyzer&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">&nbsp;&nbsp;&lt;/fieldType&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">…<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">&lt;/types&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">…<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">&lt;fields&gt;<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent:35.4pt"><span lang="EN-US">…<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent:35.4pt"><span lang="EN-US">&lt;!-- just copied "text_de" fields and replaced "de" by "zh" --&gt;<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent:35.4pt"><span lang="EN-US">&lt;field name="text_zh" type="text_zh" indexed="true" stored="false" multiValued="true"/&gt;&lt;!-- Catchall for Chinese text fields --&gt;<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent:35.4pt"><span lang="EN-US">…<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent:35.4pt"><span lang="EN-US">&lt;!-- just copied "text_de" fields and replaced "de" by "zh" --&gt;<o:p></o:p></span></p>
<p class="MsoNormal" style="text-indent:35.4pt"><span lang="EN-US">&lt;dynamicField name="*_zh" type="text_zh" indexed="true" stored="true"/&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">&lt;/fields&gt;<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">…<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">I get search results, but some search phrases return no results even though the word is in the document (checked with Luke).<o:p></o:p></span></p>
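<p class="MsoNormal"><span lang="EN-US">One configuration that may be worth comparing against, sketched here as an assumption and not taken from this thread, is the bigram-based analyzer chain that the Solr 4.x example schema ships as text_cjk. It replaces CJKTokenizerFactory with StandardTokenizerFactory followed by CJKBigramFilterFactory. The name text_zh is kept only so that the field and dynamicField definitions above would still apply. Whether this resolves the missing hits is untested.<o:p></o:p></span></p>
<pre>
&lt;fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100"&gt;
  &lt;analyzer&gt;
    &lt;tokenizer class="solr.StandardTokenizerFactory"/&gt;
    &lt;!-- normalize full-width Latin and half-width Katakana forms --&gt;
    &lt;filter class="solr.CJKWidthFilterFactory"/&gt;
    &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
    &lt;!-- index overlapping two-character tokens for CJK text --&gt;
    &lt;filter class="solr.CJKBigramFilterFactory"/&gt;
  &lt;/analyzer&gt;
&lt;/fieldType&gt;
</pre>
<p class="MsoNormal"><span lang="EN-US">After any change to the analyzer, the index has to be rebuilt, because documents that are already indexed keep the tokens produced by the old configuration.<o:p></o:p></span></p>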
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Thanks a lot in advance.<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Best regards <o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-US">Kai<o:p></o:p></span></p>
</div>
</body>
</html>