[opencms-dev] Indexing News for Lucene Search - Please help..

Hartmann, Waehrisch & Feykes GmbH hartmann at waehrisch-feykes.de
Wed Nov 5 10:26:01 CET 2003


Hi Trevor,

These classes were written for version 1.0 of the news module, but should
also work with version 2.1.
It seems that a_info1 is null, which should not happen. Normally all
fields are initialized with an empty String.
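
As a stopgap, the values could be normalized before they are handed to
Lucene, since Field.UnStored rejects null (see the stack trace below). The
following is only a sketch in current Java syntax: NewsFieldGuard,
nullToEmpty and sanitize are made-up names, not part of the module, and the
field name "headline" is an assumption used for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Lucene search module.
class NewsFieldGuard {

    /** Returns the value unchanged, or an empty String when it is null. */
    static String nullToEmpty(String value) {
        return value == null ? "" : value;
    }

    /** Copies the raw field values, replacing every null with "". */
    static Map<String, String> sanitize(Map<String, String> raw) {
        Map<String, String> safe = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            safe.put(entry.getKey(), nullToEmpty(entry.getValue()));
        }
        return safe;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("headline", "Release announced"); // assumed field name
        raw.put("a_info1", null);                 // the value Lucene rejects
        // Each sanitized value could then be passed to Field.UnStored(name, value).
        System.out.println(sanitize(raw));
    }
}
```

This only hides the symptom. It would still be worth finding out why a_info1
is not initialized in the first place.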

The page tag describes the page that is called to display a single news
entry, together with the parameters passed to it.
<param method="getIntId" name="newsid"/>
tells it to append a parameter such as newsid=123, where the id is fetched by
calling the method getIntId on the ContentDefinition.
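
To illustrate the idea, here is a rough sketch of how such a link could be
put together by reflection. It is not the module's actual code:
PageLinkSketch, DemoNews and buildLink are invented for this example, and
DemoNews merely stands in for a ContentDefinition that has a getIntId method.

```java
import java.lang.reflect.Method;

// Hypothetical illustration, not the module's implementation.
class PageLinkSketch {

    /** Stand-in for a content definition; only getIntId matters here. */
    public static class DemoNews {
        private final int id;
        public DemoNews(int id) { this.id = id; }
        public int getIntId() { return id; }
    }

    /** Calls the named no-argument method on item and appends name=value to the URI. */
    static String buildLink(String pageUri, String paramName, String methodName, Object item) {
        try {
            Method getter = item.getClass().getMethod(methodName);
            Object value = getter.invoke(item);
            // Use "&" when the URI already carries a query string, "?" otherwise.
            String separator = pageUri.indexOf('?') >= 0 ? "&" : "?";
            return pageUri + separator + paramName + "=" + value;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        String link = buildLink("/news/news.jsp?__element=entry",
                "newsid", "getIntId", new DemoNews(123));
        System.out.println(link); // /news/news.jsp?__element=entry&newsid=123
    }
}
```

The sketch does no URL encoding, which is good enough for an integer id but
would not be for arbitrary String values.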

Bye,
Stephan

----- Original Message ----- 
From: "Trevor Lee" <Trevor.Lee at 4Loop.com.au>
To: <opencms-dev at opencms.org>
Sent: Wednesday, November 05, 2003 7:07 AM
Subject: [opencms-dev] Indexing News for Lucene Search - Please help..


> Hi
>
> I have news2.1 and Lucene Search 1.4 installed on opencms 5.0
>
> I'm trying to index news items and need this functionality working very
> soon, so if anyone can help ....
>
> The following is what my registry.xml looks like in relation to lucene:
>         <luceneSearch>
>             <mergeFactor>100000</mergeFactor>
>             <permCheck>true</permCheck>
>             <indexDir>C:\Jakarta-Tomcat-4.1.12\webapps\opencms\lucene\index\</indexDir>
>             <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
>             <subsearch>true</subsearch>
>             <project>online</project>
>             <docFactories>
>                 <docFactory enabled="true" type="page">
>                     <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
>                 </docFactory>
>                 <docFactory enabled="true" type="plain">
>                     <fileType name="plaintext">
>                         <extension>.txt</extension>
>                         <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>                     </fileType>
>                     <fileType name="taggedtext">
>                         <extension>.html</extension>
>                         <extension>.htm</extension>
>                         <extension>.xml</extension>
>                         <!-- This will strip tags before processing -->
>                         <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
>                     </fileType>
>                 </docFactory>
>                 <docFactory enabled="true" type="binary">
>                     <class>net.grcomputing.opencms.search.lucene.BodylessDocument</class>
>                 </docFactory>
>                 <docFactory enabled="true" type="jsp">
>                     <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
>                 </docFactory>
>                 <docFactory enabled="true" type="news">
>                     <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
>                 </docFactory>
>                 <docFactory enabled="false" type="XML Template"/>
>             </docFactories>
>             <directories>
>                 <directory location="/swm/">
>                     <section>Test</section>
>                     <subsearch>true</subsearch>
>                 </directory>
>             </directories>
>             <contentDefinitions>
>                 <contentDefinition type="news">
>                     <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
>                     <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
>                     <listMethod name="getNewsList">
>                         <param type="java.lang.Integer">1</param>
>                         <param type="java.lang.String">-1</param>
>                     </listMethod>
>                    <page uri="/news/news.jsp?__element=entry">
>                         <param method="getIntId" name="newsid"/>
>                    </page>
>                 </contentDefinition>
>             </contentDefinitions>
>         </luceneSearch>
>
> The news.jsp file is the same as that provided in the news2.1 zip file.
> I've modified it:
> <jsp:useBean id="newsbean"
> class="com.opencms.modules.homepage.news.NewsContentDefinition"
> scope="page" />
> <%@page session="false" import="java.util.*, java.text.*,
> com.opencms.modules.homepage.news.*" %>
> <%@ taglib prefix="cms" uri="http://www.opencms.org/taglib/cms" %>
> <cms:template element="entry"> <!-- added this line -->
> <%
> String sID = request.getParameter("id");
> :
> :
> :
> %>
> </cms:template>
> I've added the element "entry" as per the instructions in the message
> below.
>
> When the lucene cron job runs I get the following error messages:
> [05.11.2003 05:55:10] <opencms_cronscheduler> Starting job for
> com.opencms.core.CmsCronEntry{55 5 * * * admin Administrators
> net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true}
> [05.11.2003 05:55:10] <opencms_info> =====IndexManager=============================================================
> [05.11.2003 05:55:10] <opencms_info> Analyzer:
> org.apache.lucene.analysis.standard.StandardAnalyzer
> [05.11.2003 05:55:10] <opencms_info> Extension map exists to handle
> plaintext
> [05.11.2003 05:55:10] <opencms_info> Extension map exists to handle
> taggedtext
> [05.11.2003 05:55:10] <opencms_info> JSP DocumentFactory loaded
> [05.11.2003 05:55:10] <opencms_info> Bodyless DocumentFactory loaded
> [05.11.2003 05:55:10] <opencms_info> Page DocumentFactory loaded
> [05.11.2003 05:55:10] <opencms_info> IndexManager: indexing /swm/
> :
> :
> [05.11.2003 05:55:12] <opencms_cronscheduler> Error running job for
> com.opencms.core.CmsCronEntry{55 5 * * * admin Administrators
> net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true}
> Error: java.lang.IllegalArgumentException: value cannot be null
> at org.apache.lucene.document.Field.<init>(Unknown Source)
> at org.apache.lucene.document.Field.UnStored(Unknown Source)
> at net.grcomputing.opencms.search.lucene.NewsDocument.Document(NewsDocument.java:140)
> at net.grcomputing.opencms.search.lucene.IndexManager.processContentDefinitions(IndexManager.java:437)
> at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(IndexManager.java:240)
> at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(CronIndexManager.java:107)
> at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
>
>
> Is the error due to the <page> element in <contentDefinition type="news">?
>
> Thank you in advance.
>
> Cheers
>
> Trevor
> -----Original Message-----
> From: opencms-dev-admin at opencms.org
> [mailto:opencms-dev-admin at opencms.org]On Behalf Of Hartmann, Waehrisch &
> Feykes GmbH
> Sent: Wednesday, October 22, 2003 4:51 PM
> To: opencms-dev at opencms.org
> Subject: Re: [opencms-dev] (no subject)
>
>
> Hi Ben,
>
> I think this won't work, since the plainDocFactory is only used for
> files of type "plain", not for files of type "binary".
> Recently we made some additions to the module, commissioned by Lenord,
> Bauer & Co. GmbH, that could meet your needs. They introduce a more
> flexible way of defining docFactories, so that you can add new factories
> without having to recompile the whole module. Other modules (like the
> news) can bring their own docFactory, and all you have to do is edit the
> registry.xml.
> Here is an example:
>
>             <docFactories>
>                 <docFactory enabled="true" type="plain">
>                     <fileType name="plaintext">
>                         <extension>.txt</extension>
>                         <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>                     </fileType>
>                 </docFactory>
>                 <docFactory enabled="true" type="news">
>                     <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
>                 </docFactory>
>             </docFactories>
>
> To index binary files all you need to add is this:
>
>            <docFactory enabled="true" type="binary">
>                <class>net.grcomputing.opencms.search.lucene.BodylessDocument</class>
>            </docFactory>
>
> There should be no need for an extension mapping.
>
> For those interested:
> for ContentDefinitions (like news) I introduced the following:
>             <contentDefinitions>
>                 <contentDefinition type="news"> <!-- must match docFactory type -->
>                     <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
>                     <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
>                     <listMethod name="getNewsList">
>                         <param type="java.lang.Integer">1</param>
>                         <param type="java.lang.String">-1</param>
>                     </listMethod>
>                     <page uri="/news.html?__element=entry">
>                         <param method="getIntId" name="newsid"/>
>                     </page>
>                 </contentDefinition>
>
> In short:
> initClass is optional: for the news, the news classes have to be loaded to
> initialize the db pool.
> listMethod: a method of the content definition class that returns a List
> of elements.
> page: the page that can display an entry, here a JSP that has a template
> element "entry". It also needs the id of the news item.
> getIntId is a method of the content definition class, and newsid is the URL
> parameter the page needs. A link like
> news.html?__element=entry&newsid=xy
> will be generated.
>
> Best regards,
> Stephan
>
>
>



