[opencms-dev] Indexing News for Lucene Search - Please help..
Hartmann, Waehrisch & Feykes GmbH
hartmann at waehrisch-feykes.de
Wed Nov 5 10:26:01 CET 2003
Hi Trevor,
these classes were written for version 1.0 of the news module, but should
also work with version 2.1.
It seems that a_info1 is null, which it shouldn't be. Normally all
fields get initialized with an empty String.
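As a workaround sketch only: the stack trace shows Field.UnStored rejecting a null value, so a null value could be replaced by an empty String before it reaches Lucene. The class NullSafeFieldValue and the method orEmpty are hypothetical names for illustration and are not part of the search module. Whether a_info1 is also used as the Lucene field name is an assumption.

```java
/** Hypothetical helper for illustration; not part of the Lucene search module. */
final class NullSafeFieldValue {

    /** Returns the value unchanged, or an empty String if it is null. */
    static String orEmpty(String value) {
        return value == null ? "" : value;
    }

    public static void main(String[] args) {
        // Stands in for a news field that was never initialized.
        String aInfo1 = null;

        // With the guard in place, a call such as
        //   Field.UnStored("a_info1", orEmpty(aInfo1))
        // would receive "" instead of null (field name assumed).
        System.out.println("[" + orEmpty(aInfo1) + "]");
        System.out.println("[" + orEmpty("headline") + "]");
    }
}
```

This only hides the symptom. It would still be worth checking why the field is null in the first place.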
The page tag describes a page that should be called to show the single news
entry by passing it some parameters.
<param method="getIntId" name="newsid"/>
tells it to append a parameter newsid=123, where the id is fetched by calling
the method getIntId on the ContentDefinition.
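To illustrate the idea, here is a rough sketch of how such a link could be assembled with reflection. It is not the module's actual code: PageLinkSketch, FakeNews and buildLink are made-up names, and FakeNews merely stands in for a content definition that has a getIntId method. The value is appended without URL encoding to keep the sketch short.

```java
import java.lang.reflect.Method;

/** Hypothetical sketch; the real module may build its links differently. */
final class PageLinkSketch {

    /** Stand-in for a content definition; only getIntId is modelled. */
    public static final class FakeNews {
        private final int id;

        public FakeNews(int id) {
            this.id = id;
        }

        public int getIntId() {
            return id;
        }
    }

    /**
     * Calls the no-argument method named methodName on item and appends
     * paramName=result to pageUri, using "&" if the URI already has a query.
     */
    static String buildLink(String pageUri, String paramName, String methodName, Object item) {
        try {
            Method getter = item.getClass().getMethod(methodName);
            Object value = getter.invoke(item);
            String separator = pageUri.contains("?") ? "&" : "?";
            return pageUri + separator + paramName + "=" + value;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        // Values taken from the <page> and <param> tags in the registry.xml.
        String link = buildLink("/news/news.jsp?__element=entry", "newsid", "getIntId", new FakeNews(123));
        System.out.println(link); // /news/news.jsp?__element=entry&newsid=123
    }
}
```

The name attribute of the param tag becomes the URL parameter name, so the page has to read its id from that parameter.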
Bye,
Stephan
----- Original Message -----
From: "Trevor Lee" <Trevor.Lee at 4Loop.com.au>
To: <opencms-dev at opencms.org>
Sent: Wednesday, November 05, 2003 7:07 AM
Subject: [opencms-dev] Indexing News for Lucene Search - Please help..
> Hi
>
> I have news2.1 and Lucene Search 1.4 installed on opencms 5.0
>
> I'm trying to index news items and need this functionality working very
> soon, so if any one can help ....
>
> The following is what my registry.xml looks like in relation to lucene:
> <luceneSearch>
> <mergeFactor>100000</mergeFactor>
> <permCheck>true</permCheck>
> <indexDir>C:\Jakarta-Tomcat-4.1.12\webapps\opencms\lucene\index\</indexDir>
> <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
> <subsearch>true</subsearch>
> <project>online</project>
> <docFactories>
> <docFactory enabled="true" type="page">
>
> <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
> </docFactory>
> <docFactory enabled="true" type="plain">
> <fileType name="plaintext">
> <extension>.txt</extension>
>
> <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
> </fileType>
> <fileType name="taggedtext">
> <extension>.html</extension>
> <extension>.htm</extension>
> <extension>.xml</extension>
> <!-- This will strip tags before processing -->
>
> <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
> </fileType>
> </docFactory>
> <docFactory enabled="true" type="binary">
>
> <class>net.grcomputing.opencms.search.lucene.BodylessDocument</class>
> </docFactory>
> <docFactory enabled="true" type="jsp">
>
> <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
> </docFactory>
> <docFactory enabled="true" type="news">
>
> <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
> </docFactory>
> <docFactory enabled="false" type="XML Template"/>
> </docFactories>
> <directories>
> <directory location="/swm/">
> <section>Test</section>
> <subsearch>true</subsearch>
> </directory>
> </directories>
> <contentDefinitions>
> <contentDefinition type="news">
>
> <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
> <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
> <listMethod name="getNewsList">
> <param type="java.lang.Integer">1</param>
> <param type="java.lang.String">-1</param>
> </listMethod>
> <page uri="/news/news.jsp?__element=entry">
> <param method="getIntId" name="newsid"/>
> </page>
> </contentDefinition>
> </contentDefinitions>
> </luceneSearch>
>
> The news.jsp file is the same as that provided in the news2.1 zip file. I've
> modified it:
> <jsp:useBean id="newsbean"
> class="com.opencms.modules.homepage.news.NewsContentDefinition" scope="page"
> />
> <%@page session="false" import="java.util.*, java.text.*,
> com.opencms.modules.homepage.news.*" %>
> <%@ taglib prefix="cms" uri="http://www.opencms.org/taglib/cms" %>
> <cms:template element="entry"> <!-- added this line -->
> <%
> String sID = request.getParameter("id");
> :
> :
> :
> %>
> </cms:template>
> I've added the element "entry" as per the instructions in the message below.
>
> When the lucene cron job runs I get the following error messages:
> [05.11.2003 05:55:10] <opencms_cronscheduler> Starting job for
> com.opencms.core.CmsCronEntry{55 5 * * * admin Administrators
> net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true}
> [05.11.2003 05:55:10] <opencms_info> =====IndexManager=============================================================
> [05.11.2003 05:55:10] <opencms_info> Analyzer:
> org.apache.lucene.analysis.standard.StandardAnalyzer
> [05.11.2003 05:55:10] <opencms_info> Extension map exists to handle
> plaintext
> [05.11.2003 05:55:10] <opencms_info> Extension map exists to handle
> taggedtext
> [05.11.2003 05:55:10] <opencms_info> JSP DocumentFactory loaded
> [05.11.2003 05:55:10] <opencms_info> Bodyless DocumentFactory loaded
> [05.11.2003 05:55:10] <opencms_info> Page DocumentFactory loaded
> [05.11.2003 05:55:10] <opencms_info> IndexManager: indexing /swm/
> :
> :
> 05.11.2003 05:55:12] <opencms_cronscheduler> Error running job for
> com.opencms.core.CmsCronEntry{55 5 * * * admin Administrators
> net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true}
> Error: java.lang.IllegalArgumentException: value cannot be null
> at org.apache.lucene.document.Field.<init>(Unknown Source)
> at org.apache.lucene.document.Field.UnStored(Unknown Source)
> at net.grcomputing.opencms.search.lucene.NewsDocument.Document(NewsDocument.java:140)
> at net.grcomputing.opencms.search.lucene.IndexManager.processContentDefinitions(IndexManager.java:437)
> at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(IndexManager.java:240)
> at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(CronIndexManager.java:107)
> at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
>
>
> Is the error due to the <page> element in <contentDefinition type="news">?
>
> Thank you in advance.
>
> Cheers
>
> Trevor
> -----Original Message-----
> From: opencms-dev-admin at opencms.org
> [mailto:opencms-dev-admin at opencms.org]On Behalf Of Hartmann, Waehrisch &
> Feykes GmbH
> Sent: Wednesday, October 22, 2003 4:51 PM
> To: opencms-dev at opencms.org
> Subject: Re: [opencms-dev] (no subject)
>
>
> Hi Ben,
>
> i think this won't work since the plainDocFactory will only be used for
> files of type "plain" but not for files of type "binary".
> Recently we have done some additions to the module - by order of Lenord,
> Bauer & Co. GmbH - that could meet your needs. It introduces a more flexible
> way of defining docFactories, so that you can add new factories without having
> to recompile the whole module. So other modules (like the news) can bring
> their own docFactory and all you have to do is to edit the registry.xml.
> Here is an example:
>
> <docFactories>
> <docFactory enabled="true" type="plain">
> <fileType name="plaintext">
> <extension>.txt</extension>
>
> <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
> </fileType>
> </docFactory>
> <docFactory enabled="true" type="news">
>
> <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
> </docFactory>
> </docFactories>
>
> To index binary files all you need to add is this:
>
> <docFactory enabled="true" type="binary">
>
> <class>net.grcomputing.opencms.search.lucene.BodylessDocument</class>
> </docFactory>
>
> There should be no need for an extension mapping.
>
> For the interested people:
> For ContentDefinitions (like news) i introduced the following:
> <contentDefinitions>
> <contentDefinition type="news"> <!-- must match docFactory type -->
>
> <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
> <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
> <listMethod name="getNewsList">
> <param type="java.lang.Integer">1</param>
> <param type="java.lang.String">-1</param>
> </listMethod>
> <page uri="/news.html?__element=entry">
> <param method="getIntId" name="newsid"/>
> </page>
> </contentDefinition>
>
> In short:
> initClass is optional: For the news the news classes have to be loaded to
> initialize the db pool.
> listMethod: a method of the content definition class that returns a List of
> elements
> page: the page that can display an entry. Here a jsp that has a template
> element "entry". It also needs the id of the news item.
> getIntId is a method of the content definition class and newsid is the url
> parameter the page needs. A link like
> news.html?__element=entry&newsid=xy
> will be generated.
>
> Best regards,
> Stephan
>
>
>