[opencms-dev] Unpredictable Lucene document factory associations

Jonathan Woods jonathan.woods at scintillance.com
Tue May 23 08:39:44 CEST 2006


I'd be interested in hearing how anyone else has found it best to deal with
the following problem, assuming I've got things right.

I've implemented my own (working) Lucene document factory and configured its
so-called document type in opencms-search.xml.  The trouble is that it's
getting hidden by other document types - and I believe this is because, in
general, when several document factories declare interest in the same
document keys, which of them OpenCms actually associates with those keys is
unpredictable.

Wouldn't it be better if opencms-search.xml were processed in a way that
embodied some kind of shadowing or priority, so that later entries beat
earlier ones?  Then we could have (say) document type 'xmlcontent' dealing
with all XML content types except those which were dealt with by a custom
Lucene document factory... which is what I'm looking for!  When I tried
changing the 'xmlcontent' document type config to handle only xmlcontent
resource types, my document factory still wasn't associated with my custom
XML resource types, because the 'generic' document type took over.
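The shadowing proposed above could work roughly like the following minimal Java sketch. DocTypeConfig, register, the factory names and the key strings are all hypothetical, not OpenCms API. The essential change is iterating the document types as an ordered list in file order rather than over a HashMap's key set, so a later entry that claims the same key as an earlier one always replaces it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShadowingRegistry {

    /** Hypothetical stand-in for one document type entry: its factory and the keys it claims. */
    static final class DocTypeConfig {
        final String name;
        final String factoryClass;
        final List<String> documentKeys;

        DocTypeConfig(String name, String factoryClass, List<String> documentKeys) {
            this.name = name;
            this.factoryClass = factoryClass;
            this.documentKeys = documentKeys;
        }
    }

    /**
     * Registers entries in the order they appear in the configuration file, so a key
     * claimed by a later entry replaces (shadows) the same key claimed by an earlier one.
     */
    static Map<String, String> register(List<DocTypeConfig> configsInFileOrder) {
        // Insertion-ordered, so the result's own iteration order is predictable too.
        Map<String, String> keyToFactory = new LinkedHashMap<>();
        for (DocTypeConfig config : configsInFileOrder) {
            for (String key : config.documentKeys) {
                keyToFactory.put(key, config.factoryClass); // last declaration wins
            }
        }
        return keyToFactory;
    }

    public static void main(String[] args) {
        List<DocTypeConfig> configs = List.of(
                new DocTypeConfig("generic", "GenericFactory", List.of("xmlcontent", "myxml", "plain")),
                new DocTypeConfig("xmlcontent", "XmlContentFactory", List.of("xmlcontent", "myxml")),
                new DocTypeConfig("mydoc", "MyFactory", List.of("myxml")));
        Map<String, String> keyToFactory = register(configs);
        System.out.println(keyToFactory.get("plain"));      // GenericFactory
        System.out.println(keyToFactory.get("xmlcontent")); // XmlContentFactory
        System.out.println(keyToFactory.get("myxml"));      // MyFactory
    }
}
```

With this ordering rule, a broad 'generic' or 'xmlcontent' entry can stay broad, and a custom document type declared after it takes over only the keys it claims.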

Analysis:

1.  The behaviour of indexing and searching is defined in
opencms-search.xml, which allows developers to create their own (Lucene)
document factories for specific purposes.  opencms-search.xml, together with
the document factory implementations, helps associate resource types and MIME
types with the right factory implementation.  The key to this association is
a so-called 'document key', which is either MIME-specific (it contains the
resource type and the MIME type) or non-MIME-specific (it contains only the
resource type).
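To make the two kinds of document key concrete, here is a minimal Java sketch. The key format ("resourcetype:mimetype", or just the resource type) and the method names are assumptions for illustration, not the actual OpenCms implementation. It builds one MIME-specific key per (resource type, MIME type) pair when MIME types are configured, and one non-MIME-specific key per resource type otherwise.

```java
import java.util.ArrayList;
import java.util.List;

public class DocumentKeys {

    /** Hypothetical key format: "resourcetype:mimetype", or just "resourcetype" when no MIME type applies. */
    static String documentKey(String resourceType, String mimeType) {
        return (mimeType == null) ? resourceType : resourceType + ":" + mimeType;
    }

    /**
     * Builds the keys a factory would declare for its configured resource types and
     * MIME types: MIME-specific keys if any MIME types are given, else one key per type.
     */
    static List<String> documentKeys(List<String> resourceTypes, List<String> mimeTypes) {
        List<String> keys = new ArrayList<>();
        for (String type : resourceTypes) {
            if (mimeTypes.isEmpty()) {
                keys.add(documentKey(type, null));
            } else {
                for (String mime : mimeTypes) {
                    keys.add(documentKey(type, mime));
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(documentKeys(List.of("binary"), List.of("application/pdf", "text/rtf")));
        // [binary:application/pdf, binary:text/rtf]
        System.out.println(documentKeys(List.of("xmlcontent", "myxml"), List.of()));
        // [xmlcontent, myxml]
    }
}
```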

2.  At start-up time, search document types ('document types') are read from
opencms-search.xml and put into a HashMap, m_documentTypeConfigs.

3.  Still at start-up, OpenCms then iterates through the keys of that
document type config map, and for each document type config it does the
following: (i) it instantiates the associated document factory (class), (ii)
it calls 'getDocumentKeys' on that instance, telling it which combination of
resource types and MIME types it may consider (as configured in
opencms-search.xml's resourcetypes and mimetypes nodes); (iii) it uses a
second HashMap, m_documentTypes, to map each of the document keys to the
document factory instance.
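Steps 2 and 3 can be modelled in a few lines of Java. This is a simplified sketch, not OpenCms source: the field names m_documentTypeConfigs and m_documentTypes follow the description above, but Factory and Config are hypothetical stand-ins, factories are constructed directly rather than from a configured class name, and MIME types are left out of key declaration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StartupModel {

    /** Hypothetical stand-in for a document factory; only key declaration is modelled. */
    static final class Factory {
        final String name;

        Factory(String name) {
            this.name = name;
        }

        /** Step (ii): declare one key per configured resource type (MIME types left out). */
        List<String> getDocumentKeys(List<String> resourceTypes) {
            return new ArrayList<>(resourceTypes);
        }
    }

    /** Hypothetical stand-in for one document type read from opencms-search.xml. */
    static final class Config {
        final String factoryName;
        final List<String> resourceTypes;

        Config(String factoryName, List<String> resourceTypes) {
            this.factoryName = factoryName;
            this.resourceTypes = resourceTypes;
        }
    }

    /** Step 2: document type name -> config, as read at start-up. */
    final Map<String, Config> m_documentTypeConfigs = new HashMap<>();

    /** Step 3 (iii): document key -> factory instance. */
    final Map<String, Factory> m_documentTypes = new HashMap<>();

    /** Step 3: walk the configs in HashMap order (no guaranteed order) and register keys. */
    void initialize() {
        for (Map.Entry<String, Config> entry : m_documentTypeConfigs.entrySet()) {
            Config config = entry.getValue();
            Factory factory = new Factory(config.factoryName);                 // (i) instantiate
            for (String key : factory.getDocumentKeys(config.resourceTypes)) { // (ii) declare keys
                m_documentTypes.put(key, factory);                             // (iii) overwrites any earlier mapping
            }
        }
    }

    public static void main(String[] args) {
        StartupModel model = new StartupModel();
        model.m_documentTypeConfigs.put("mydoc", new Config("MyFactory", List.of("myxml")));
        model.m_documentTypeConfigs.put("plain", new Config("PlainFactory", List.of("plain")));
        model.initialize();
        System.out.println(model.m_documentTypes.get("myxml").name); // MyFactory
    }
}
```

With disjoint keys, as in main, the order does not matter; it only matters once two configs claim the same key.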

4.  The iteration order through keys of m_documentTypeConfigs is
unpredictable, and therefore so too is the order in which document factories
are asked to declare their interest in various document keys.  A document
factory may therefore end up replacing the value of an existing key - i.e. a
document factory instance already configured - with its own instance.
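The consequence of point 4 can be shown with a minimal Java sketch: the same two factories, registered in the two orders a HashMap might happen to produce, leave different winners for the key they share. Factory names and keys are hypothetical. Since java.util.Map.put returns the value it replaced, the sketch also records each silent replacement, which is one way to at least log such conflicts at start-up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CollisionDemo {

    /**
     * Maps each claimed key to the factory that claimed it last, and records
     * every key whose earlier mapping was replaced by a different factory.
     */
    static Map<String, String> register(List<Map.Entry<String, List<String>>> factoriesInIterationOrder,
                                        List<String> conflicts) {
        Map<String, String> keyToFactory = new HashMap<>();
        for (Map.Entry<String, List<String>> factory : factoriesInIterationOrder) {
            for (String key : factory.getValue()) {
                String previous = keyToFactory.put(key, factory.getKey()); // null if key was unclaimed
                if (previous != null && !previous.equals(factory.getKey())) {
                    conflicts.add(key + ": " + previous + " replaced by " + factory.getKey());
                }
            }
        }
        return keyToFactory;
    }

    public static void main(String[] args) {
        Map.Entry<String, List<String>> generic = Map.entry("GenericFactory", List.of("myxml", "plain"));
        Map.Entry<String, List<String>> custom = Map.entry("MyFactory", List.of("myxml"));

        List<String> conflicts = new ArrayList<>();
        // Same two factories, two possible iteration orders, two different winners for "myxml".
        System.out.println(register(List.of(generic, custom), conflicts).get("myxml")); // MyFactory
        System.out.println(register(List.of(custom, generic), conflicts).get("myxml")); // GenericFactory
        System.out.println(conflicts);
    }
}
```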

Jon
