[opencms-dev] How to make Lucene use Custom Extractors

Mon Oct 5 11:46:07 CEST 2009

Hi List,

does anyone have an idea how to configure a custom extractor that is used
when rebuilding the search indexes on OpenCms 7.0.5? We have an OpenCms
installation where some content is defined as OpenCmsString, but the
HtmlWidget is needed to edit it. The problem is that, with the default
configuration, the Lucene excerpts also contain HTML snippets, which are
shown in the search results. Our client now wants the HTML code removed
from the search-result excerpts.

Changing the OpenCmsString to OpenCmsHtml is NOT an option.

What I tried is the following:

1) Create a custom extension of
org.opencms.search.documents.CmsDocumentXmlContent
(test.CmsDocumentXmlContent) with the method

"public I_CmsExtractionResult extractContent(CmsObject cms, CmsResource
resource, CmsSearchIndex index) throws CmsException"

being a copy of the original from CVS, but changed to always use the
CmsHtmlExtractor to gather the extracted data. I also added some
LOG.error() entries to see whether the method is invoked.
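
To illustrate the goal of step 1: the excerpt text should look as if the
stored OpenCmsString value had been passed through something like the
stripHtml() helper below. This is only a plain-Java sketch. It is not the
OpenCms CmsHtmlExtractor and not code from the custom class; the class name
HtmlStripSketch, the regular expressions and the short entity list are
assumptions made for this example.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical stand-in, NOT the OpenCms CmsHtmlExtractor: shows the kind of
 * plain text the search excerpt should be built from.
 */
public class HtmlStripSketch {

    // Matches any tag-like run such as <p>, </b> or <br/>.
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");
    // Matches runs of whitespace so they can be collapsed to one space.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Removes tags, decodes a few common entities, collapses whitespace. */
    public static String stripHtml(String html) {
        if (html == null) {
            return "";
        }
        // Replace each tag with a space so adjacent words do not run together.
        String text = TAGS.matcher(html).replaceAll(" ");
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&"); // decoded last to avoid double decoding
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String stored = "<p>Hello <b>World</b> &amp; friends</p>";
        System.out.println(stripHtml(stored)); // prints: Hello World & friends
    }
}
```

A regex like this is not a real HTML parser (it ignores comments, scripts
and most entities), which is why the custom class relies on the extractor
shipped with OpenCms instead.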

2) Configure it as a replacement in opencms-search.xml:

<documenttype>
   <name>xmlcontent</name>
   <class>test.CmsDocumentXmlContent</class>
   ...
</documenttype>

3) The test.CmsDocumentXmlContent class is present in WEB-INF/classes/test/

During server start I can see the following message in opencms.log:

"Search document types: adding "xmlcontent" using handler
test.CmsDocumentXmlContent"

When I now try to rebuild a search index, it seems my custom
CmsDocumentXmlContent is not invoked: the log output is not written to
opencms.log, and the search results are unchanged and still show the
HTML snippets.

Any ideas how I can reach my goal, or what I did wrong?

Thanks

Eska
