[opencms-dev] Indexing News for Lucene Search - Please help..

Trevor Lee Trevor.Lee at 4Loop.com.au
Thu Nov 6 05:04:02 CET 2003


Hi Stephan,

Thanks for that. I manually updated the a_info1, a_info2 and a_info3
columns in the database, replacing the NULL values with empty strings, and
the news items can now be indexed.

However, it seems that with news2.1 the three info fields are no longer
available on the form that is used to create a news entry.
This means that any news items created from now on will have a_info1 etc.
defaulted to NULL in the database, which will cause the indexing to fail
again.

Is there an easy fix for this?

Thanks in advance for any help.

Cheers
Trevor

-----Original Message-----
From: opencms-dev-admin at opencms.org
[mailto:opencms-dev-admin at opencms.org]On Behalf Of Hartmann, Waehrisch &
Feykes GmbH
Sent: Wednesday, November 05, 2003 7:54 PM
To: opencms-dev at opencms.org
Subject: Re: [opencms-dev] Indexing News for Lucene Search - Please
help..


Hi Trevor,

these classes were written for version 1.0 of the news module, but should
also work with version 2.1.
It seems that a_info1 contains a null, which shouldn't happen. Normally all
fields get initialized with an empty String.
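
As a rough illustration of the kind of guard that avoids the "value cannot
be null" exception in the stack trace quoted below, here is a minimal
sketch. It is not code from the Lucene search module: the class and method
names (NullSafeFields, nullToEmpty, normalize) are hypothetical, and where
such a guard would go inside NewsDocument is an assumption. It only shows
the idea of replacing null column values with empty strings before they are
handed to the indexer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of the Lucene search module or OpenCms.
final class NullSafeFields {

    // Returns "" for null so the value can safely be passed on to an indexer
    // that rejects null field values.
    static String nullToEmpty(String value) {
        return value == null ? "" : value;
    }

    // Applies nullToEmpty to every value of a column-name -> value map,
    // preserving the original column order.
    static Map<String, String> normalize(Map<String, String> row) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            out.put(e.getKey(), nullToEmpty(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Column names taken from the thread; the values are made up.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("a_info1", null);
        row.put("a_info2", "second");
        System.out.println(normalize(row));
    }
}
```

The same effect could be had on the database side by giving the three
columns an empty-string default, but that depends on the schema and is not
covered here.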

The page tag describes the page that should be called to show a single news
entry, along with the parameters to pass to it.
<param method="getIntId" name="newsid"/>
tells it to append a parameter newsid=123, where the id is fetched by calling
the method getIntId on the ContentDefinition.
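
To make the mechanism concrete, here is a small sketch of how such a link
could be assembled via reflection. It is not the module's actual
implementation: PageLinkSketch, FakeNews and buildLink are hypothetical
names, and only getIntId, newsid and the page uri come from the
configuration discussed in this thread.

```java
import java.lang.reflect.Method;

// Hypothetical illustration, NOT the code used by the Lucene search module.
final class PageLinkSketch {

    // Stand-in for a content definition; only the method name getIntId
    // is taken from the thread.
    static final class FakeNews {
        private final int id;

        FakeNews(int id) {
            this.id = id;
        }

        public int getIntId() {
            return id;
        }
    }

    // Calls the public no-argument method `methodName` on `cd` and appends
    // paramName=<result> to the uri, using '&' if the uri already has a query.
    static String buildLink(String uri, Object cd, String methodName, String paramName) {
        try {
            Method m = cd.getClass().getMethod(methodName);
            Object value = m.invoke(cd);
            char sep = uri.indexOf('?') >= 0 ? '&' : '?';
            return uri + sep + paramName + "=" + value;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        // prints /news/news.jsp?__element=entry&newsid=123
        System.out.println(buildLink("/news/news.jsp?__element=entry",
                new FakeNews(123), "getIntId", "newsid"));
    }
}
```

The sketch does no URL encoding, which is fine for an integer id but would
be needed for arbitrary string values.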

Bye,
Stephan

----- Original Message -----
From: "Trevor Lee" <Trevor.Lee at 4Loop.com.au>
To: <opencms-dev at opencms.org>
Sent: Wednesday, November 05, 2003 7:07 AM
Subject: [opencms-dev] Indexing News for Lucene Search - Please help..


> Hi
>
> I have news2.1 and Lucene Search 1.4 installed on opencms 5.0
>
> I'm trying to index news items and need this functionality working very
> soon, so if any one can help ....
>
> The following is what my registry.xml looks like in relation to Lucene:
>         <luceneSearch>
>             <mergeFactor>100000</mergeFactor>
>             <permCheck>true</permCheck>
>             <indexDir>C:\Jakarta-Tomcat-4.1.12\webapps\opencms\lucene\index\</indexDir>
>             <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
>             <subsearch>true</subsearch>
>             <project>online</project>
>             <docFactories>
>                 <docFactory enabled="true" type="page">
>                     <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
>                 </docFactory>
>                 <docFactory enabled="true" type="plain">
>                     <fileType name="plaintext">
>                         <extension>.txt</extension>
>                         <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>                     </fileType>
>                     <fileType name="taggedtext">
>                         <extension>.html</extension>
>                         <extension>.htm</extension>
>                         <extension>.xml</extension>
>                         <!-- This will strip tags before processing -->
>                         <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
>                     </fileType>
>                 </docFactory>
>                 <docFactory enabled="true" type="binary">
>                     <class>net.grcomputing.opencms.search.lucene.BodylessDocument</class>
>                 </docFactory>
>                 <docFactory enabled="true" type="jsp">
>                     <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
>                 </docFactory>
>                 <docFactory enabled="true" type="news">
>                     <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
>                 </docFactory>
>                 <docFactory enabled="false" type="XML Template"/>
>             </docFactories>
>             <directories>
>                 <directory location="/swm/">
>                     <section>Test</section>
>                     <subsearch>true</subsearch>
>                 </directory>
>             </directories>
>             <contentDefinitions>
>                 <contentDefinition type="news">
>                     <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
>                     <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
>                     <listMethod name="getNewsList">
>                         <param type="java.lang.Integer">1</param>
>                         <param type="java.lang.String">-1</param>
>                     </listMethod>
>                     <page uri="/news/news.jsp?__element=entry">
>                         <param method="getIntId" name="newsid"/>
>                     </page>
>                 </contentDefinition>
>             </contentDefinitions>
>         </luceneSearch>
>
> The news.jsp file is the same as that provided in the news2.1 zip file.
> I've modified it:
> <jsp:useBean id="newsbean"
>   class="com.opencms.modules.homepage.news.NewsContentDefinition"
>   scope="page" />
> <%@page session="false" import="java.util.*, java.text.*,
>   com.opencms.modules.homepage.news.*" %>
> <%@ taglib prefix="cms" uri="http://www.opencms.org/taglib/cms" %>
> <cms:template element="entry"> <!-- added this line -->
> <%
> String sID = request.getParameter("id");
> :
> :
> :
> %>
> </cms:template>
> I've added the element "entry" as per the instructions in the message
> below.
>
> When the Lucene cron job runs I get the following error messages:
> [05.11.2003 05:55:10] <opencms_cronscheduler> Starting job for com.opencms.core.CmsCronEntry{55 5 * * * admin Administrators net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true}
> [05.11.2003 05:55:10] <opencms_info> =====IndexManager=============================================================
> [05.11.2003 05:55:10] <opencms_info> Analyzer: org.apache.lucene.analysis.standard.StandardAnalyzer
> [05.11.2003 05:55:10] <opencms_info> Extension map exists to handle plaintext
> [05.11.2003 05:55:10] <opencms_info> Extension map exists to handle taggedtext
> [05.11.2003 05:55:10] <opencms_info> JSP DocumentFactory loaded
> [05.11.2003 05:55:10] <opencms_info> Bodyless DocumentFactory loaded
> [05.11.2003 05:55:10] <opencms_info> Page DocumentFactory loaded
> [05.11.2003 05:55:10] <opencms_info> IndexManager: indexing /swm/
> :
> :
> [05.11.2003 05:55:12] <opencms_cronscheduler> Error running job for com.opencms.core.CmsCronEntry{55 5 * * * admin Administrators net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true}
> Error: java.lang.IllegalArgumentException: value cannot be null
>     at org.apache.lucene.document.Field.<init>(Unknown Source)
>     at org.apache.lucene.document.Field.UnStored(Unknown Source)
>     at net.grcomputing.opencms.search.lucene.NewsDocument.Document(NewsDocument.java:140)
>     at net.grcomputing.opencms.search.lucene.IndexManager.processContentDefinitions(IndexManager.java:437)
>     at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(IndexManager.java:240)
>     at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(CronIndexManager.java:107)
>     at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
>
>
> Is the error due to the <page> element in <contentDefinition type="news">?
>
> Thank you in advance.
>
> Cheers
>
> Trevor
> -----Original Message-----
> From: opencms-dev-admin at opencms.org
> [mailto:opencms-dev-admin at opencms.org]On Behalf Of Hartmann, Waehrisch &
> Feykes GmbH
> Sent: Wednesday, October 22, 2003 4:51 PM
> To: opencms-dev at opencms.org
> Subject: Re: [opencms-dev] (no subject)
>
>
> Hi Ben,
>
> I think this won't work, since the plainDocFactory will only be used for
> files of type "plain" but not for files of type "binary".
> Recently we have made some additions to the module - by order of Lenord,
> Bauer & Co. GmbH - that could meet your needs. They introduce a more
> flexible way of defining docFactories, so that you can add new factories
> without having to recompile the whole module. Other modules (like the news)
> can thus bring their own docFactory, and all you have to do is edit the
> registry.xml.
> Here is an example:
>
>             <docFactories>
>                 <docFactory enabled="true" type="plain">
>                     <fileType name="plaintext">
>                         <extension>.txt</extension>
>                         <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>                     </fileType>
>                 </docFactory>
>                 <docFactory enabled="true" type="news">
>                     <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
>                 </docFactory>
>             </docFactories>
>
> To index binary files all you need to add is this:
>
>            <docFactory enabled="true" type="binary">
>                <class>net.grcomputing.opencms.search.lucene.BodylessDocument</class>
>            </docFactory>
>
> There should be no need for an extension mapping.
>
> For those interested:
> For ContentDefinitions (like news) I introduced the following:
>             <contentDefinitions>
>                 <contentDefinition type="news"> <!-- must match docFactory type -->
>                     <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
>                     <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
>                     <listMethod name="getNewsList">
>                         <param type="java.lang.Integer">1</param>
>                         <param type="java.lang.String">-1</param>
>                     </listMethod>
>                     <page uri="/news.html?__element=entry">
>                         <param method="getIntId" name="newsid"/>
>                     </page>
>                 </contentDefinition>
>
> In short:
> initClass is optional: for the news, the news classes have to be loaded to
> initialize the db pool.
> listMethod: a method of the content definition class that returns a List of
> elements.
> page: the page that can display an entry, here a JSP that has a template
> element "entry". It also needs the id of the news item.
> getIntId is a method of the content definition class and newsid is the URL
> parameter the page needs. A link like
> news.html?__element=entry&newsid=xy
> will be generated.
>
> Best regards,
> Stephan
>
>
>
> _______________________________________________
> This mail is send to you from the opencms-dev mailing list
> To change your list options, or to unsubscribe from the list, please visit
> http://mail.opencms.org/mailman/listinfo/opencms-dev
