[opencms-dev] Lucene problem indexing pdf content

Hartmann, Waehrisch & Feykes GmbH hartmann at waehrisch-feykes.de
Mon Mar 8 16:17:02 CET 2004


Either form works, but use only one at a time. What matters is that your
fileType is nested inside the docFactory for binary files and that this
docFactory is activated. The name attribute of the fileType tag is used for
logging output only.
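As a rough sketch of that nesting (the type and enabled attributes on
docFactory are assumed names here, so keep whatever your registry.xml
already uses; the fileType part is taken from your mail):

```xml
<!-- Sketch only: the docFactory attributes (type, enabled) are assumed
     names; check them against your own registry.xml. -->
<docFactory type="binary" enabled="true">
  <!-- name is for logging output only, so "PDF" or "pdftext" both work -->
  <fileType name="PDF">
    <extension>.pdf</extension>
    <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
  </fileType>
</docFactory>
```
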
If your PDFs are located in a download gallery, you will also have to
declare a directory location for it inside the directories section, because
the system folder is skipped even when you have a directory location for /.
Example:
       <directory location="/system/galleries/download/">
         <section>download</section>
         <subsearch>true</subsearch>
       </directory>

Bye,
Stephan

----- Original Message ----- 
From: "Thomas Fabbricante" <tom_fabbricante at wunderman.com>
To: <opencms-dev at opencms.org>
Sent: Monday, March 08, 2004 4:03 PM
Subject: [opencms-dev] Lucene problem indexing pdf content


> I successfully imported net.grcomputing.opencms.search.lucene_1.5.zip,
> configured the registry.xml file, scheduled a task to index content, and
> ran the simple_search page.
>
> All document types (html, xml, word, plain) return hits on the simple
> search except PDFs. The only hits I get are on the PDF titles.
>
> Content inside the pdf seems to be missed by the indexing process.
>
> I've seen the pdf section in the registry.xml file written 2 ways:
> <fileType name="PDF">
>   <extension>.pdf</extension>
>   <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
> </fileType>
>
> or
>
> <fileType name="pdftext">
>   <extension>.pdf</extension>
>   <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
> </fileType>
>
> Tried them both but got the same results. No content was indexed.
>
> Question 1:  Which form of the name attribute is correct, PDF or pdftext?
>
> Question 2:  How do I get my pdf content indexed?
>
> Thanks
> -tom