<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.2873" name=GENERATOR></HEAD>
<BODY>
<DIV><SPAN class=462465505-02062006><FONT face=Arial size=2>I'd welcome any
comments on the proposals below: a few suggestions for improving the
performance and integration of Lucene within OpenCms.</FONT></SPAN></DIV>
<DIV><SPAN class=462465505-02062006><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=462465505-02062006><FONT face=Arial size=2>1. With the
default searching mechanism, OpenCms constructs a new Lucene IndexSearcher for
every search. Constructing an IndexSearcher (and its IndexReader) is relatively
expensive, roughly an order of magnitude more costly than an average search,
because it involves reading data about the whole index from disk. It would
therefore be better if OpenCms cached IndexSearcher instances. This is
a standard Lucene pattern: IndexSearchers are thread-safe, and Lucene implements
appropriate index locking at the inter-process level. The cache entry for
an IndexSearcher should be invalidated (after closing the searcher) on any event
which causes an index write operation.</FONT></SPAN></DIV>
<DIV><SPAN class=462465505-02062006><FONT face=Arial
size=2></FONT></SPAN> </DIV>
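<DIV><SPAN class=462465505-02062006><FONT face=Arial size=2>The caching
described in (1) might look roughly like the sketch below. The class name
SearcherCache is made up for illustration, and the type parameter S stands in
for Lucene's IndexSearcher (anything Closeable) so that the example compiles
without Lucene on the classpath. The key could be, for instance, an index
name.</FONT></SPAN></DIV>

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical searcher cache (not part of OpenCms or Lucene).
 * S stands in for Lucene's IndexSearcher; K is whatever identifies an index.
 */
final class SearcherCache<K, S extends Closeable> {

    private final ConcurrentMap<K, S> cache = new ConcurrentHashMap<>();
    private final Function<K, S> opener;

    /** @param opener creates a fresh searcher for a key (the expensive step) */
    SearcherCache(Function<K, S> opener) {
        this.opener = opener;
    }

    /** Returns the cached searcher, opening one lazily on first request. */
    S get(K key) {
        return cache.computeIfAbsent(key, opener);
    }

    /**
     * To be called on any event that writes to the index.
     * The entry is removed first and closed second, so that the cache
     * itself never hands out a searcher it has already closed.
     */
    void invalidate(K key) {
        S old = cache.remove(key);
        if (old == null) {
            return; // nothing cached for this key
        }
        try {
            old.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

<DIV><SPAN class=462465505-02062006><FONT face=Arial size=2>One design choice
to note: this sketch removes the entry before closing it, which is one possible
ordering. A thread that fetched the searcher before the invalidation may still
find it closed in the middle of a search.</FONT></SPAN></DIV>
<DIV><SPAN class=462465505-02062006><FONT face=Arial
size=2></FONT></SPAN> </DIV>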
<DIV><SPAN class=462465505-02062006><FONT face=Arial size=2>2. It's
difficult to do much Lucene-related customisation in OpenCms because the
beautifully simple Lucene API is hidden. The simplest way of
revealing it again, provided we do (1) above, is for CmsSearchManager /
CmsSearchIndex to expose a method:</FONT></SPAN></DIV>
<DIV><SPAN class=462465505-02062006><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=462465505-02062006><FONT face="Lucida Console" size=2>public
IndexSearcher getIndexSearcher(CmsObject cmsObject, CmsProject
project);</FONT></SPAN></DIV>
<DIV><SPAN class=462465505-02062006><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=462465505-02062006><FONT face=Arial size=2>whereupon we can
construct Queries, use Filters and Sort objects and process Hits just how we
like. Characteristics of this method:</FONT></SPAN></DIV>
<DIV><SPAN class=462465505-02062006><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=462465505-02062006><FONT face=Arial size=2>(a) it should
retrieve instances from a cache keyed by <FONT
face="Lucida Console">project</FONT>, and lazily instantiate (and return) new
entries when necessary</FONT></SPAN></DIV>
<DIV><SPAN class=462465505-02062006><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=462465505-02062006></SPAN><FONT face=Arial><FONT size=2><SPAN
class=462465505-02062006>(b) </SPAN>i<SPAN class=462465505-02062006>t should be
</SPAN><SPAN class=462465505-02062006>the recommended way for code to get hold
of an IndexSearcher, and it should dispense only non-closed IndexSearcher cache
entries</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT size=2><SPAN
class=462465505-02062006></SPAN></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT size=2><SPAN class=462465505-02062006>(c) calling
code should be prepared to retry in the event that an index is locked, or that
it finds itself in possession of an IndexSearcher which another thread has
closed after it was acquired</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT size=2><SPAN
class=462465505-02062006></SPAN></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT size=2><SPAN class=462465505-02062006>(d) after a
few minutes' thought I don't believe this weakens security in any way, though
that would need more careful consideration. DoS attacks by rogue JSPs are always
possible in any case, and other attack vectors are best not discussed
on-list!</SPAN></FONT></FONT></DIV>
<DIV><SPAN class=462465505-02062006><FONT face=Arial
size=2></FONT></SPAN> </DIV>
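<DIV><SPAN class=462465505-02062006><FONT face=Arial size=2>To illustrate (c),
calling code could wrap its search in a small retry loop along these lines.
This is a sketch only: RetryingSearch and its Search interface are hypothetical
names, S stands in for IndexSearcher and T for whatever the search returns, and
it assumes that a locked index or a closed searcher surfaces as an IOException,
which should be checked against the Lucene version in use.</FONT></SPAN></DIV>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

/** Hypothetical retry helper (not part of OpenCms or Lucene). */
final class RetryingSearch {

    /** A search that may fail with an IOException. */
    interface Search<S, T> {
        T run(S searcher) throws IOException;
    }

    /**
     * Runs the search, re-acquiring the searcher before every attempt so
     * that a searcher closed by another thread is replaced by a fresh one.
     *
     * @param acquire     obtains a searcher, e.g. from a cache
     * @param search      the search to run against that searcher
     * @param maxAttempts total number of attempts, at least 1
     */
    static <S, T> T search(Supplier<S> acquire, Search<S, T> search, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            S searcher = acquire.get(); // never reuse a possibly closed instance
            try {
                return search.run(searcher);
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw new UncheckedIOException("search failed after " + maxAttempts + " attempts", last);
    }
}
```

<DIV><SPAN class=462465505-02062006><FONT face=Arial size=2>With the proposed
method in place, the acquire step would be a call to getIndexSearcher, and the
search step would be ordinary Lucene code.</FONT></SPAN></DIV>
<DIV><SPAN class=462465505-02062006><FONT face=Arial
size=2></FONT></SPAN> </DIV>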
<DIV><SPAN class=462465505-02062006><FONT face=Arial
size=2>Jon</FONT></SPAN></DIV>
<DIV><SPAN class=462465505-02062006><FONT face=Arial
size=2></FONT></SPAN> </DIV></BODY></HTML>