[opencms-dev] Lucene and Binary Documents

M Butcher mbutcher at grcomputing.net
Fri Oct 17 20:39:01 CEST 2003


Ben,

On Thu, 2003-10-16 at 22:46, Ben Rometsch wrote:
> Hi Matt,
> 
> Thanks for the reply. If I just want to get the document title to be
> included in the Lucene index, looking at the code in the
> net.grcomputing.opencms.search.BodylessDocument class it appears to ignore
> what the CMSObject is, and attempt to index it regardless. Is this correct?
> 

Correct. It will already index the title, but it will not attempt to
index the body.

> If this is the case, is it simply a matter of instructing Lucene to index
> objects other than HTML files in the VFS (i.e. Documents)? Or would I have
> to create another class, something like
> net.grcomputing.opencms.search.FileDocument and add a new hook into that
> class via the registry.xml fragment? Or does the BodylessDocument class
> provide this functionality, and it's just a matter of adding a new XML
> fragment to the registry.xml file?

Again, you are right -- simply adding the appropriate configuration to
the registry.xml file will suffice; no new class should be needed. I
believe that you will just need to extend the plainDocument tag set to
include the extensions and processors. I _think_ that binary files get
handled by the plain handler.
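
To make the idea concrete, here is a rough sketch of title-only indexing
selected by file extension. It is plain Java and deliberately does not use
the Lucene or OpenCms APIs: the class name BodylessSketch, the extension
list, and the field names "path", "title" and "body" are all assumptions
made up for illustration, not the module's actual code or its registry.xml
schema.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class BodylessSketch {
    // Hypothetical list of binary extensions; in the real module this
    // would presumably come from the registry.xml configuration.
    static final Set<String> BINARY_EXTENSIONS =
            new HashSet<>(Arrays.asList("pdf", "doc", "xls"));

    // Returns the lower-cased extension of a VFS path, or "" if there is none.
    static String extensionOf(String path) {
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        if (dot > slash && dot < path.length() - 1) {
            return path.substring(dot + 1).toLowerCase(Locale.ROOT);
        }
        return "";
    }

    // Builds the fields that would be handed to the indexer.
    // Binary files keep their path and title but get no "body" field.
    static Map<String, String> toFields(String path, String title, String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("path", path);
        fields.put("title", title);
        if (!BINARY_EXTENSIONS.contains(extensionOf(path))) {
            fields.put("body", body);
        }
        return fields;
    }

    public static void main(String[] args) {
        // A binary document: only path and title are kept.
        System.out.println(toFields("/docs/report.pdf", "Annual Report", "%PDF-1.4 ..."));
        // An HTML page: the body is kept as well.
        System.out.println(toFields("/index.html", "Home", "<p>Welcome</p>"));
    }
}
```

The point of the sketch is only that the decision is driven by configuration
(the extension list) rather than by a new document class per file type.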

Matt

-- 
M Butcher <mbutcher at grcomputing.net>


