[opencms-dev] Registry xml for PDF and WORD document search

Trevor Lee Trevor.Lee at 4Loop.com.au
Sun Nov 23 22:00:01 CET 2003


Hi all,

I was wondering what the registry.xml file should contain in order to get
Lucene to index Word and PDF files using Ernesto De Santis's PDFDocument and
WordDocument classes?

I've got the following in my registry.xml file:

                <docFactory enabled="true" type="binary">
                    <fileType name="pdftext">
                        <extension>.pdf</extension>
                        <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
                    </fileType>
                    <fileType name="doctext">
                        <extension>.doc</extension>
                        <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
                    </fileType>
                </docFactory>
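
One way to sanity-check a fragment like this outside OpenCms is a small
standalone parse, sketched below. It only confirms that the XML is well-formed,
shows which extension maps to which class, and reports whether each class can
be loaded from the classpath it is run with. The class name
RegistryFragmentCheck and the method extensionToClass are made up for this
illustration; it does not use any OpenCms API and says nothing about how
OpenCms itself reads registry.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of OpenCms or the search module.
public class RegistryFragmentCheck {

    // The docFactory fragment from the message above.
    static final String FRAGMENT =
        "<docFactory enabled=\"true\" type=\"binary\">"
        + "<fileType name=\"pdftext\">"
        + "<extension>.pdf</extension>"
        + "<class>net.grcomputing.opencms.search.lucene.PDFDocument</class>"
        + "</fileType>"
        + "<fileType name=\"doctext\">"
        + "<extension>.doc</extension>"
        + "<class>net.grcomputing.opencms.search.lucene.WordDocument</class>"
        + "</fileType>"
        + "</docFactory>";

    // Parses the fragment and returns extension -> class name, in document order.
    // Throws if the XML is not well-formed.
    static Map<String, String> extensionToClass(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> result = new LinkedHashMap<>();
        NodeList fileTypes = doc.getElementsByTagName("fileType");
        for (int i = 0; i < fileTypes.getLength(); i++) {
            Element fileType = (Element) fileTypes.item(i);
            String ext = fileType.getElementsByTagName("extension")
                .item(0).getTextContent().trim();
            String cls = fileType.getElementsByTagName("class")
                .item(0).getTextContent().trim();
            result.put(ext, cls);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (Map.Entry<String, String> e : extensionToClass(FRAGMENT).entrySet()) {
            String status;
            try {
                // Loads the class by name; fails if it (or something it
                // links against) is missing from the current classpath.
                Class.forName(e.getValue());
                status = "found on classpath";
            } catch (ClassNotFoundException | LinkageError ex) {
                status = "NOT loadable: " + ex;
            }
            System.out.println(e.getKey() + " -> " + e.getValue() + " (" + status + ")");
        }
    }
}
```

Running it with the same jars that are in the OpenCms webapp's WEB-INF/lib on
the classpath would at least rule out a missing class or dependency.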

Where do I define the "pdftext" and "doctext" types?

What else needs to be changed or included?

Thanks in advance for your help.

Cheers
Trevor



