[opencms-dev] Lucene Search Integration
Marian Kasala
marian.kasala at apsoft.sk
Fri Apr 26 13:39:43 CEST 2002
Hi Simon,
I have extended your first version of the Lucene - OpenCms integration.
It now has a reindexing feature and supports additional formats (PDF, RTF).
Reindexing updates only the differences.
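
To illustrate what "updates only the differences" can mean in practice, here is a minimal sketch. It is not taken from the attached zip. It assumes the indexer keeps a record of each resource path and the modification time it had when last indexed, and compares that with the current state of the repository. The class name ReindexPlan and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the attached lucene-opencms.zip.
// It only decides WHAT to reindex; writing to the index is left out.
class ReindexPlan {
    final List<String> toAdd = new ArrayList<>();    // new resources
    final List<String> toUpdate = new ArrayList<>(); // changed resources
    final List<String> toDelete = new ArrayList<>(); // resources that no longer exist

    // indexed: resource path -> modification time recorded at the last indexing run
    // current: resource path -> modification time reported by the repository now
    static ReindexPlan diff(Map<String, Long> indexed, Map<String, Long> current) {
        ReindexPlan plan = new ReindexPlan();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            Long previous = indexed.get(e.getKey());
            if (previous == null) {
                plan.toAdd.add(e.getKey());
            } else if (!previous.equals(e.getValue())) {
                plan.toUpdate.add(e.getKey());
            }
        }
        for (String path : indexed.keySet()) {
            if (!current.containsKey(path)) {
                plan.toDelete.add(path);
            }
        }
        return plan;
    }
}
```

Unchanged resources appear in none of the three lists, so a reindexing run touches only documents that were added, modified or removed.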
PDF indexing is done either with the Etymon PJ library
(http://www.etymon.com/pj/)
or with any suitable external converter (batch conversion).
Because Etymon PJ is limited in text extraction (it doesn't work with
encrypted PDFs, and in some cases the extracted text is collapsed
into a single word)
and I didn't find any other Java library, I added support for an external
batch extractor.
For this purpose I use Advanced Pdf to HTML converter v. 1.4,
but it is a licensed product (http://www.intrapdf.com/index.html).
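
As a rough illustration of the external extractor idea, the sketch below runs a command-line converter on one PDF and reads back the text it produced. It is not the code from the attachment. The class ExternalExtractor, the {in}/{out} placeholder convention and the assumption that the converter writes UTF-8 text to the output file are all made up for the example; the real converter's command line may look different.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper around an external PDF converter (names are assumptions).
class ExternalExtractor {

    // Replaces the {in} and {out} placeholders in a command template.
    static List<String> buildCommand(List<String> template, Path in, Path out) {
        List<String> cmd = new ArrayList<>();
        for (String part : template) {
            cmd.add(part.replace("{in}", in.toString())
                        .replace("{out}", out.toString()));
        }
        return cmd;
    }

    // Runs the converter on one PDF and returns the extracted text.
    // Assumes the converter writes UTF-8 text to the {out} file.
    static String extract(List<String> template, Path pdf, long timeoutSeconds)
            throws IOException, InterruptedException {
        Path out = Files.createTempFile("extract", ".txt");
        try {
            Process p = new ProcessBuilder(buildCommand(template, pdf, out))
                    .inheritIO() // pass converter messages through to our console
                    .start();
            if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                throw new IOException("converter timed out for " + pdf);
            }
            if (p.exitValue() != 0) {
                throw new IOException("converter failed with exit code " + p.exitValue());
            }
            return new String(Files.readAllBytes(out), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(out);
        }
    }
}
```

The returned string can then be indexed like the text of any other document. Keeping the command as a template means a different converter can be swapped in without code changes.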
I enclose no documentation, but the external interface is almost the same
as in Simon's first version,
except that you can specify the index directory in templates:
<indexDirectory>webapps/opencms/index</indexDirectory>
Maybe you or anyone else will find these files useful, so I'm posting them.
Best Regards,
Marian Kasala
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lucene-opencms.zip
Type: application/x-zip-compressed
Size: 15026 bytes
Desc: not available
URL: <https://webmail.opencms.org/pipermail/opencms-dev/attachments/20020426/4568f2e2/attachment.bin>