[opencms-dev] Ideas for improving search performance and Lucene integration
Jonathan Woods
jonathan.woods at scintillance.com
Fri Jun 2 08:24:55 CEST 2006
I'd welcome any comments on the proposals below, a few suggestions for
improving the performance and integration of Lucene within OpenCms.
1. With the default searching mechanism, OpenCms constructs a new Lucene
IndexSearcher for every search. IndexSearcher (and IndexReader)
construction is a relatively expensive operation - an order of magnitude
more costly than your average search, because it involves reading in data about
the whole index from disk - so it would be better if IndexSearcher instances
were cached by OpenCms. This is a standard Lucene pattern: IndexSearchers
are thread-safe, and Lucene implements appropriate index locking at the
inter-process level. The cache entry for an IndexSearcher should be
invalidated (after closing the searcher) on any event which causes an index
write operation.
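
To make (1) concrete, here is a minimal sketch of such a cache in present-day
Java. It is not OpenCms code: SearcherCache and the opener factory are
hypothetical names, and a plain java.io.Closeable stands in for Lucene's
IndexSearcher so the example has no library dependency. In a real integration
the opener would construct an IndexSearcher over the relevant index, and
invalidate() would be called from whatever event signals an index write.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical sketch: caches one expensive-to-open searcher per key.
 * S stands in for Lucene's IndexSearcher; K could be a project or index name.
 */
final class SearcherCache<K, S extends Closeable> {

    private final ConcurrentMap<K, S> cache = new ConcurrentHashMap<>();

    // Assumed factory that opens a new searcher for a key (the costly step).
    private final Function<K, S> opener;

    SearcherCache(Function<K, S> opener) {
        this.opener = opener;
    }

    /** Returns the cached searcher, opening one lazily on first use. */
    S get(K key) {
        return cache.computeIfAbsent(key, opener);
    }

    /** Call on any event that writes to the index behind this key. */
    void invalidate(K key) {
        // Evict first so the cache never hands out an already-closed entry.
        S stale = cache.remove(key);
        if (stale != null) {
            try {
                stale.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

One design choice to flag: this sketch evicts the entry before closing it,
so new callers always get a fresh instance, but a thread that acquired the
old searcher just before the write may still find it closed under its feet.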
2. It's difficult to do much Lucene-related customisation in OpenCms
because the beautifully simple Lucene API is hidden. The single simplest
way of revealing it again, providing we do (1) above, is for
CmsSearchManager / CmsSearchIndex to expose a method
    public IndexSearcher getIndexSearcher(CmsObject cmsObject, CmsProject project);
whereupon we can construct Queries, use Filters and Sort objects and process
Hits just how we like. Characteristics of this method:
(a) it should retrieve instances from a cache keyed by project, and lazily
instantiate (and return) new entries when necessary
(b) it should be the recommended way for code to get hold of an
IndexSearcher, and it should dispense only non-closed IndexSearcher cache
entries
(c) calling code should be prepared to retry in the event that an index is
locked, or it finds itself in possession of an IndexSearcher which another
thread has closed (via close()) after acquisition
(d) after a few minutes' thought I don't believe this weakens security in
any way, though that would have to be considered. DoS attacks by rogue JSPs
are always possible in any case, and other attack vectors are best not
discussed on-list!
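
As an illustration of (c), calling code might wrap its search in a small
retry helper along these lines. This is only a sketch under assumptions:
SearchRetry and SearcherUnavailableException are hypothetical names, the
exception is a stand-in for whatever the Lucene version in use actually
throws for a closed searcher or a locked index, and the acquire argument
represents a call to the proposed getIndexSearcher method.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical helper: re-acquire the searcher and retry if it went stale. */
final class SearchRetry {

    /** Stand-in for the failure raised by a closed searcher or locked index. */
    static final class SearcherUnavailableException extends RuntimeException {
        SearcherUnavailableException(String message) {
            super(message);
        }
    }

    private SearchRetry() {
    }

    /**
     * @param acquire     fetches a searcher, e.g. from the shared cache
     * @param query       runs the search and extracts a result
     * @param maxAttempts total number of attempts, at least 1
     */
    static <S, R> R search(Supplier<S> acquire, Function<S, R> query, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SearcherUnavailableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Re-acquire on every attempt: the cache may now hold a fresh instance.
            S searcher = acquire.get();
            try {
                return query.apply(searcher);
            } catch (SearcherUnavailableException e) {
                last = e; // searcher was closed or index locked; try again
            }
        }
        throw last; // every attempt failed; surface the final failure
    }
}
```

The important detail is that the searcher is fetched again inside the loop
rather than reused, since the instance that failed is by definition stale.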
Jon