[opencms-dev] Search Engine integration - advice on available options?

M Butcher mbutcher at grcomputing.net
Wed Nov 5 04:11:01 CET 2003


Joe,

AFAIK, htdig basically spiders the site and builds an index. It searches 
over the HTML pages generated by OpenCms. For that reason, I'm not sure 
that it needs a module or anything. Configuring and running htdig is 
done outside of OpenCms.

Lucene is (more or less) just a library for providing search 
functionality to a Java application. The Lucene module you've seen 
basically runs the indexing process on the VFS -- and it runs as part of 
OpenCms. Recent additions to the Lucene module add support for searching 
Word (.doc) and PDF documents. The advantage of doing things this way (keep in mind, 
I'm biased toward this module) is that search applications can be 
tightly integrated with OpenCms. For instance, configuration is done via 
the registry.xml file, and since the search engine has access to the CMS 
information (e.g. content types, location in VFS, permissions), it can 
index in a more intelligent way. Also, search results can use CMS 
templates, which can take full advantage of the CMS (common templates, 
newsfeeds, etc.).
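To make the point about permissions and content types a bit more concrete, 
here is a minimal sketch in Java. It does NOT use Lucene or the OpenCms 
Lucene module API. The class MiniCmsIndex, the CmsDoc record and its fields 
(vfsPath, resourceType, readGroups) are hypothetical, chosen only to show an 
inverted index whose entries keep CMS metadata, so that results can be 
filtered by read permission and resource type at query time. An external 
spider like htdig only sees the rendered HTML and has no such metadata.

```java
import java.util.*;

/**
 * Hypothetical sketch: a tiny in-memory inverted index whose entries carry
 * CMS metadata (VFS path, resource type, groups allowed to read).
 * This is NOT the Lucene API or the OpenCms Lucene module.
 */
class MiniCmsIndex {

    /** A document as the CMS sees it (all fields are illustrative). */
    record CmsDoc(String vfsPath, String resourceType, Set<String> readGroups, String text) {}

    // term -> VFS paths of the documents containing that term (sorted)
    private final Map<String, Set<String>> postings = new HashMap<>();
    // VFS path -> document metadata, kept so hits can be filtered later
    private final Map<String, CmsDoc> docs = new HashMap<>();

    /** Lower-cases the text, splits on non-letter/digit runs, records postings. */
    public void add(CmsDoc doc) {
        docs.put(doc.vfsPath(), doc);
        for (String token : doc.text().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue; // leading separators yield an empty token
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(doc.vfsPath());
        }
    }

    /**
     * Returns VFS paths containing the term, restricted to documents that at
     * least one of the user's groups may read and, if typeFilter is non-null,
     * to that resource type.
     */
    public List<String> search(String term, Set<String> userGroups, String typeFilter) {
        List<String> hits = new ArrayList<>();
        for (String path : postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of())) {
            CmsDoc d = docs.get(path);
            boolean readable = !Collections.disjoint(d.readGroups(), userGroups);
            boolean typeOk = typeFilter == null || typeFilter.equals(d.resourceType());
            if (readable && typeOk) hits.add(path);
        }
        return hits;
    }

    public static void main(String[] args) {
        MiniCmsIndex idx = new MiniCmsIndex();
        idx.add(new CmsDoc("/intranet/hr/policy.pdf", "pdf", Set.of("Users"), "Holiday policy 2003"));
        idx.add(new CmsDoc("/intranet/board/minutes.doc", "doc", Set.of("Board"), "Minutes: holiday budget"));
        // A member of "Users" only sees the document that group may read.
        System.out.println(idx.search("holiday", Set.of("Users"), null));
    }
}
```

Run as is, it prints [/intranet/hr/policy.pdf]: both documents contain 
"holiday", but the board minutes are filtered out because the user is not 
in the "Board" group. A real integration would of course do the tokenizing 
and storage with Lucene itself.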

Matt

Joe McFadden wrote:

>Hi,
>
>We'd like to integrate OpenCMS with a search engine, and I'm looking
>for some general advice on what options are available and their relative
>merits.
>
>I've looked at the docs and the mailing list archive, and seen various
>posts on the details of setting up the Lucene module, plus a few about
>htdig. However, I can only find the Lucene module in the module sandbox
>- is the htdig module still available?
>
>Any comments on pros and cons of htdig vs lucene vs whatever in the
>context of integrating with OpenCMS?
>
>This is for an intranet site; I will need to be able to index Word and
>PDF documents. Using OpenCMS 5.0 on Linux.
>We currently use htdig to index our existing static intranet site
>(which we plan to migrate to OpenCMS).
>
>Regards,
>Joe
>
>PS - thanks to Thomas Maarz for his response to my post last week on
>updating control code - your suggestion worked.




