[opencms-dev] Indexed searching problem: exact sentences not found

Celio Faria Jr celiofariajr at gmail.com
Thu Feb 21 11:18:22 CET 2008


Hi all,

I am trying to index documents in Portuguese. Indexing (mostly of PDF
files) works fine. The index configuration is as follows:

<indexsource>
    <name>source1</name>
    <indexer class="org.opencms.search.CmsVfsIndexer"/>
    <resources>
        <resource>/sites/default/</resource>
    </resources>
    <documenttypes-indexed>
        <name>xmlpage</name>
        <name>xmlcontent</name>
        <name>text</name>
        <name>pdf</name>
        <name>rtf</name>
        <name>html</name>
        <name>msword</name>
        <name>msexcel</name>
        <name>mspowerpoint</name>
        <name>image</name>
        <name>generic</name>
    </documenttypes-indexed>
</indexsource>

<fieldconfiguration>
    <name>standard</name>
    <description>The standard OpenCms 7.0 search index field configuration.</description>
    <fields>
        <field name="content" display="%(key.field.content)" store="true" index="true" excerpt="true">
            <mapping type="content"/>
        </field>
        <field name="title-key" display="-" store="true" index="untokenized" boost="0.0">
            <mapping type="property">Title</mapping>
        </field>
        <field name="title" display="%(key.field.title)" store="false" index="true">
            <mapping type="property">Title</mapping>
        </field>
        <field name="keywords" display="%(key.field.keywords)" store="true" index="true">
            <mapping type="property">Keywords</mapping>
        </field>
        <field name="description" display="%(key.field.description)" store="true" index="true">
            <mapping type="property">Description</mapping>
        </field>
        <field name="meta" display="%(key.field.meta)" store="false" index="true">
            <mapping type="property">Title</mapping>
            <mapping type="property">Keywords</mapping>
            <mapping type="property">Description</mapping>
        </field>
    </fields>
</fieldconfiguration>

<analyzer>
    <class>org.apache.lucene.analysis.snowball.SnowballAnalyzer</class>
    <stemmer>Portuguese</stemmer>
    <locale>pt</locale>
</analyzer>

But when I search for a phrase in quotation marks that contains stop
words, let's say "teste de programa", I get no results! Of course, if I
open the PDF and search for the same phrase inside it, it is found...

I can only think of a very basic (and strange) solution: to create a new
Analyzer without the stop words (and possibly without stemming). This way,
every search within quotation marks would bring me the "correct" results, at
the cost of a much larger index.
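
A minimal sketch of that workaround, assuming the analyzer entry for the pt
locale can simply point at a Lucene analyzer that neither removes stop words
nor stems. org.apache.lucene.analysis.SimpleAnalyzer is used here only as an
illustration (it lowercases and splits on non-letter characters); I have not
tested this, and the index would have to be rebuilt after the change:

<analyzer>
    <class>org.apache.lucene.analysis.SimpleAnalyzer</class>
    <locale>pt</locale>
</analyzer>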

Any clues?

TIA,