[opencms-dev] Registry xml for PDF and WORD document search

M Butcher mbutcher at grcomputing.net
Mon Nov 24 21:36:01 CET 2003


DOH! I should have looked more carefully... the module uses the 
extension mapper to figure out what kind of file it is dealing with, so 
as long as Word docs end with .doc and PDF files with .pdf, the 
indexManager will use the correct *Document class.
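
To make the idea concrete, here is a rough sketch of that kind of 
extension lookup. It is NOT the module's actual code: the ExtensionMapper 
class, its method names, and the case-insensitive matching are all 
assumptions made up for illustration. Only the two *Document class names 
come from the registry.xml snippet quoted below.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical extension-to-class lookup; a sketch, not the module's real mapper. */
final class ExtensionMapper {
    private final Map<String, String> classByExtension = new LinkedHashMap<>();

    /** Registers a mapping such as ".pdf" -> fully qualified document class name. */
    void register(String extension, String className) {
        // Assumption: extensions are matched case-insensitively.
        classByExtension.put(extension.toLowerCase(Locale.ROOT), className);
    }

    /** Returns the class name registered for the file's extension, if there is one. */
    Optional<String> documentClassFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty(); // no extension, so nothing to look up
        }
        String extension = fileName.substring(dot).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(classByExtension.get(extension));
    }

    public static void main(String[] args) {
        ExtensionMapper mapper = new ExtensionMapper();
        mapper.register(".pdf", "net.grcomputing.opencms.search.lucene.PDFDocument");
        mapper.register(".doc", "net.grcomputing.opencms.search.lucene.WordDocument");

        System.out.println(mapper.documentClassFor("/docs/manual.pdf").orElse("no mapping"));
        System.out.println(mapper.documentClassFor("/docs/notes.txt").orElse("no mapping"));
    }
}
```

The point is that the fileType name plays no part in the lookup; only the 
extension decides which class gets used.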

As Stephan pointed out, you will need to make sure you get the 
textmining classes.

Thanks for catching my mistake, Stephan.

Matt

Stephan Hartmann wrote:
> Well, I haven't tried it yet, but it looks OK to me. The name attribute of the
> fileType tag is only informational; it is used for the debugging output.
> Other than that, you need to include the libraries from www.textmining.org
> 
> Bye,
> Stephan
> 
> 
> ----- Original Message -----
> From: "M Butcher" <mbutcher at grcomputing.net>
> To: <opencms-dev at opencms.org>; "Hartmann, Waehrisch & Feykes GmbH"
> <hartmann at waehrisch-feykes.de>; "Ernesto De Santis"
> <ernesto.desantis at colaborativa.net>
> Sent: Monday, November 24, 2003 7:56 PM
> Subject: Re: [opencms-dev] Registry xml for PDF and WORD document search
> 
> 
> 
>>Trevor,
>>
>>I'm not sure. I think you need a Content Definition. I'm copying Stephan
>>on this -- he did most of the work on this part of the module. I'll also
>>copy Ernesto, who contributed the two classes.
>>
>>Stephan, Ernesto -- if you can answer, I'll incorporate your answer into
>>the README/INSTALL files for the module.
>>
>>Matt
>>
>>Trevor Lee wrote:
>>
>>>Hi all,
>>>
>>>I was wondering what the registry.xml file should have in order to get
>>>Lucene to index Word and PDF files using Ernesto De Santis's PDFDocument
>>>and WordDocument classes?
>>>
>>>I've got the following in my registry.xml file:
>>>
>>>                <docFactory enabled="true" type="binary">
>>>                    <fileType name="pdftext">
>>>                        <extension>.pdf</extension>
>>>                        <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
>>>                    </fileType>
>>>                    <fileType name="doctext">
>>>                        <extension>.doc</extension>
>>>                        <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
>>>                    </fileType>
>>>                </docFactory>
>>>
>>>Where do I define the "pdftext" and "doctext" types?
>>>
>>>What else needs to be changed or included?
>>>
>>>Thanks in advance for your help.
>>>
>>>Cheers
>>>Trevor
>>>
>>>_______________________________________________
>>>This mail is send to you from the opencms-dev mailing list
>>>To change your list options, or to unsubscribe from the list, please visit
>>>http://mail.opencms.org/mailman/listinfo/opencms-dev




