[opencms-dev] Indexed searching problem: exact sentences not found
Celio Faria Jr
celiofariajr at gmail.com
Thu Feb 21 11:18:22 CET 2008
Hi all,
I am trying to index documents in Portuguese. The indexing (mostly of PDF
files) works fine. The index configuration is as follows:
<indexsource>
  <name>source1</name>
  <indexer class="org.opencms.search.CmsVfsIndexer"/>
  <resources>
    <resource>/sites/default/</resource>
  </resources>
  <documenttypes-indexed>
    <name>xmlpage</name>
    <name>xmlcontent</name>
    <name>text</name>
    <name>pdf</name>
    <name>rtf</name>
    <name>html</name>
    <name>msword</name>
    <name>msexcel</name>
    <name>mspowerpoint</name>
    <name>image</name>
    <name>generic</name>
  </documenttypes-indexed>
</indexsource>
<fieldconfiguration>
  <name>standard</name>
  <description>The standard OpenCms 7.0 search index field configuration.</description>
  <fields>
    <field name="content" display="%(key.field.content)" store="true" index="true" excerpt="true">
      <mapping type="content"/>
    </field>
    <field name="title-key" display="-" store="true" index="untokenized" boost="0.0">
      <mapping type="property">Title</mapping>
    </field>
    <field name="title" display="%(key.field.title)" store="false" index="true">
      <mapping type="property">Title</mapping>
    </field>
    <field name="keywords" display="%(key.field.keywords)" store="true" index="true">
      <mapping type="property">Keywords</mapping>
    </field>
    <field name="description" display="%(key.field.description)" store="true" index="true">
      <mapping type="property">Description</mapping>
    </field>
    <field name="meta" display="%(key.field.meta)" store="false" index="true">
      <mapping type="property">Title</mapping>
      <mapping type="property">Keywords</mapping>
      <mapping type="property">Description</mapping>
    </field>
  </fields>
</fieldconfiguration>
<analyzer>
  <class>org.apache.lucene.analysis.snowball.SnowballAnalyzer</class>
  <stemmer>Portuguese</stemmer>
  <locale>pt</locale>
</analyzer>
However, when I search for an exact phrase in quotation marks that contains
stop words, say "teste de programa", I get no results at all, even though
the phrase is there when I open the PDF and search for it directly.
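One way to reason about this is to look at what a stop-word filter does to token positions. The toy Java sketch here (Java 16+) is not Lucene's SnowballAnalyzer: the class name, the small Portuguese stop-word sample and the absence of stemming are simplifications assumed purely for illustration. It splits "teste de programa", drops "de", and either leaves a gap at the removed word's position or renumbers the surviving tokens. If the indexing side and the query side did not handle that gap the same way, an exact phrase could stop matching; whether that is actually what happens in this setup is an assumption, not something the configuration above shows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopFilterDemo {
    // Hypothetical stop-word sample; real analyzers ship much longer lists.
    static final Set<String> STOP_WORDS = Set.of("o", "a", "de", "do", "da", "e");

    // A surviving token and the position it is given after filtering.
    record Token(String term, int position) {}

    // Lowercases, splits on whitespace and drops stop words (no stemming here).
    // keepGaps=true: survivors keep their original positions, leaving a hole where a stop word was.
    // keepGaps=false: survivors are renumbered consecutively, closing the hole.
    static List<Token> analyze(String text, boolean keepGaps) {
        List<Token> tokens = new ArrayList<>();
        int originalPos = 0;
        int compactPos = 0;
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!STOP_WORDS.contains(word)) {
                tokens.add(new Token(word, keepGaps ? originalPos : compactPos));
                compactPos++;
            }
            originalPos++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (boolean keepGaps : new boolean[] {true, false}) {
            StringBuilder line = new StringBuilder(keepGaps ? "gaps kept:" : "renumbered:");
            for (Token t : analyze("teste de programa", keepGaps)) {
                line.append(' ').append(t.term()).append('@').append(t.position());
            }
            System.out.println(line);
        }
    }
}
```

With gaps kept, "programa" sits at position 2; with renumbering it sits at position 1. A phrase built under one convention and matched against an index built under the other would look for the words at different offsets.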
The only (rather crude) solution I can think of is to create a new analyzer
that does not remove stop words (and possibly does no stemming either). That
way every quoted search would return the "correct" results, at the cost of a
much larger index.
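That workaround rests on a basic property of positional phrase matching: a phrase can only be matched word for word if every word in it, stop words included, was indexed together with its position. The toy sketch here (plain Java 9+, not Lucene; the class name, sample sentence and stop-word set are hypothetical) indexes the same text with and without stop-word removal, checks the phrase against both, and counts postings as a rough proxy for index size. For simplicity the phrase itself is only lowercased and split, whereas a real Lucene query parser would run it through the analyzer as well.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class PhraseMatchDemo {
    // Hypothetical stop-word sample, not the list any real analyzer uses.
    static final Set<String> STOP_WORDS = Set.of("o", "a", "de", "do", "da", "e");

    // Builds a positional index (term -> positions) for a single document.
    // removeStopWords=true skips stop words but keeps the original positions of the other words.
    static Map<String, List<Integer>> index(String text, boolean removeStopWords) {
        Map<String, List<Integer>> postings = new HashMap<>();
        String[] words = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            if (removeStopWords && STOP_WORDS.contains(words[pos])) {
                continue;
            }
            postings.computeIfAbsent(words[pos], k -> new ArrayList<>()).add(pos);
        }
        return postings;
    }

    // Exact-phrase test: term i of the phrase must occur at position start + i.
    // The phrase is only lowercased and split, so its stop words are NOT removed.
    static boolean matchesPhrase(Map<String, List<Integer>> postings, String phrase) {
        String[] terms = phrase.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int start : postings.getOrDefault(terms[0], List.of())) {
            boolean allFound = true;
            for (int i = 1; i < terms.length && allFound; i++) {
                allFound = postings.getOrDefault(terms[i], List.of()).contains(start + i);
            }
            if (allFound) {
                return true;
            }
        }
        return false;
    }

    // Total number of (term, position) entries: a rough proxy for index size.
    static int postingCount(Map<String, List<Integer>> postings) {
        return postings.values().stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) {
        String doc = "o teste de programa foi aprovado";
        String phrase = "teste de programa";
        Map<String, List<Integer>> filtered = index(doc, true);
        Map<String, List<Integer>> full = index(doc, false);
        System.out.println("stop words removed: match=" + matchesPhrase(filtered, phrase)
                + ", postings=" + postingCount(filtered));
        System.out.println("all words indexed: match=" + matchesPhrase(full, phrase)
                + ", postings=" + postingCount(full));
    }
}
```

On this sample sentence the filtered index holds 4 postings and misses the phrase, while the full index holds 6 and matches it: the recall-versus-size trade-off in miniature.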
Any clues?
TIA,