[opencms-dev] lucene indexing doesn't start
M Butcher
mbutcher at grcomputing.net
Tue May 11 21:24:00 CEST 2004
Any errors in the catalina.log file?
Matt
Konstantins Dorodovs wrote:
> Hi,
>
> I have a problem with lucene indexing
> (opencms version: 5.0.6b1, lucene module: 1.5, tomcat: 4.1.30)
>
> The cron job doesn't seem to start (I looked at the log).
> entry in Scheduler(
> 11 21 * * * admin Administrators
> net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true
> )
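For anyone reading the Scheduler entry above, here is a minimal sketch of how its fields can be read. It assumes the entry follows the usual cron layout (minute, hour, day of month, month, day of week) followed by user, group, class name and parameters; that layout is an assumption based on the quoted line, and `parse_scheduler_entry` is a hypothetical helper, not part of OpenCms or the lucene module.

```python
# Hypothetical helper (not an OpenCms API): split a scheduler entry into parts.
# Assumed layout:
#   minute hour day-of-month month day-of-week user group class [params...]

def parse_scheduler_entry(entry):
    fields = entry.split()
    if len(fields) < 8:
        raise ValueError("expected at least 8 whitespace-separated fields")
    minute, hour, day_of_month, month, day_of_week = fields[:5]
    return {
        "minute": minute,
        "hour": hour,
        "day_of_month": day_of_month,
        "month": month,
        "day_of_week": day_of_week,
        "user": fields[5],
        "group": fields[6],
        "class": fields[7],
        "params": fields[8:],  # everything after the class name
    }

# The entry quoted in this message, joined onto one line.
entry = ("11 21 * * * admin Administrators "
         "net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true")
job = parse_scheduler_entry(entry)
print(job["hour"] + ":" + job["minute"], job["class"], job["params"])
```

If that reading is right, the job would be due at 21:11 server time each day, so that is the part of the log worth checking.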
>
> As far as I can tell, I followed the docs
> (cron is enabled: [11.05.2004 20:10:04] <opencms_init> . OpenCms
> scheduler : enabled).
> Below is an excerpt from my registry.xml:
>
> Thanks
>
> Konstantin
>
>
> ---------- cut ------
> <tempfileproject>3</tempfileproject>
>
> <luceneSearch>
> <!--
> - mergeFactor and permCheck are currently ignored.
> -->
> <mergeFactor>100000</mergeFactor>
> <permCheck>true</permCheck>
>
> <!--
> - directory in which lucene will store its indexes. Note: this is real
> - fs, not VFS.
> -->
> <indexDir>C:\luceneindex\</indexDir>
> <!-- <indexDir>F:\luceneindex\</indexDir> -->
>
> <!--
> - The analyzer is used for parsing documents. Choose one for your
> - language. If language is English, use the StandardAnalyzer.
> - There are additional analyzers at http://jakarta.apache.org/lucene
> -->
> <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
> <!-- <analyzer>org.apache.lucene.analysis.de.GermanAnalyzer</analyzer> -->
>
> <!--
> - If subsearch is true, subfolders will be searched by default.
> - This can be turned on/off per directory.
> -->
> <subsearch>true</subsearch>
> <!--
> - Name of the project to index. Online is recommended.
> -->
> <project>online</project>
>
> <!--
> - docFactories determine how documents are processed. Generally, one
> - docFactory exists for each type of content (viz. JSP, Page, Plain)
> - that you want to index.
> -->
> <docFactories>
>
> <!--
> - This docFactory indexes documents with type page (e.g. HTML
> - files edited with the WYSIWYG editor).
> -
> - Note that the 'type' attribute specifies which content definition
> - to use. Built-in content types include page, plain, binary, and jsp
> - (there are others, too). Custom content types can be used as well
> - (see the contentDefinitions section below).
> -->
> <docFactory enabled="true" type="page">
> <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
> </docFactory>
>
> <!--
> - This docFactory is a little more complex. It takes documents of
> - type "plain" and determines, by extension, what class should be
> - used to index each particular file. In this example, we want to
> - index plain text files exactly as they are, but any files that
> - contain tags need the tags stripped out before they are indexed.
> -
> - Note that the name="" attribute is simply for pretty output, and
> - can contain any allowable PCDATA text.
> -->
> <docFactory enabled="true" type="plain">
> <fileType name="plaintext">
> <extension>.txt</extension>
>
> <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
> </fileType>
> <fileType name="taggedtext">
> <extension>.html</extension>
> <extension>.htm</extension>
> <extension>.xml</extension>
> <!-- This will strip tags before processing -->
>
> <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
> </fileType>
> </docFactory>
>
> <!-- This is for binary files. PDF and DOC files are binary, as are
> - CLASS and JAR files.
> -->
> <docFactory enabled="true" type="binary">
> <!-- This is for indexing PDF files -->
> <fileType name="PDF">
> <extension>.pdf</extension>
> <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
> </fileType>
> <!-- This is for indexing MS Word documents -->
> <fileType name="Word">
> <extension>.doc</extension>
> <extension>.dot</extension>
>
> <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
> </fileType>
> </docFactory>
>
> <!--
> - This will strip JSP tags and all scriptlets. IT WILL NOT RENDER THE
> - JSP FIRST, as JSPs are, by nature, dynamic.
> -
> - Usually, this is off by default.
> -->
> <docFactory enabled="false" type="jsp">
> <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
> </docFactory>
>
> <!-- For the news module. Enable if you use news -->
>
> <!-- <docFactory enabled="false" type="news">
> <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
> </docFactory>
> -->
>
> <!-- For the forum module. Enable if you use forums. -->
> <!--
> <docFactory enabled="false" type="forum">
> <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class>
> </docFactory>
> -->
>
> <!-- If you need to index XML Template files (bad idea) use this: -->
> <docFactory enabled="false" type="XML Template"/>
> </docFactories>
>
> <!--
> - <directories/> determines which directories are indexed. By default,
> - the /system directory is never indexed, so it is safe to index root.
> -
> - If you want to specify only certain directories for indexing, create
> - one <directory/> entry per directory. Again, you may use subsearch to
> - override the default subsearch setting discussed above.
> -->
> <directories>
> <directory location="/">
> <section>Root</section>
> <subsearch>true</subsearch>
> </directory>
> </directories>
>
> <!--
> - Use this section to define specific contentDefinitions. Provided below
> - are entries for the news and forum modules.
> - (Uncomment these only after you have installed the corresponding
> - modules)
> -->
> <contentDefinitions>
> <!--
> <contentDefinition type="news">
> -->
> <!--
> - <class /> determines the class of the content definition. Should
> - be a subclass of com.opencms.defaults.A_CmsContentDefinition.
> -->
> <!--
>
> <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
> -->
> <!--
> - <initClass /> is optional and has to implement
> - net.grcomputing.opencms.search.lucene.I_ContentDefinitionInitialization.
> - It provides you with the ability to perform some
> - initialization before the content definition class can be used.
> - In case of the news module the NewsChannelContentDefinition class
> - has to be loaded.
> -->
> <!--
>
> <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
>
> -->
> <!--
> - <listMethod /> defines the method of the content definition class
> - which should be used to retrieve all content definition objects
> - (or any subset).
> - Usually you use this method also in the backoffice or any other
> - list view.
> -->
> <!--
> <listMethod name="getNewsList">
> <param type="java.lang.Integer">1</param>
> <param type="java.lang.String">-1</param>
> </listMethod>
> -->
> <!--
> - <page /> determines a page in the virtual file system that can
> - display a single entry of a content definition. You must provide
> - also a method of the content definition class that retrieves an
> - id (or something else that has to be appended to your page uri
> - to determine which entry has to be displayed). The result will
> - look like:
> - /news.html?__element=entry&newsid=<result of getIntId>
> - for each content definition instance object.
> -->
> <!--
> <page uri="/news.html?__element=entry">
> <param method="getIntId" name="newsid"/>
> </page>
> -->
> <!--
> <page uri="/singleNews.jsp">
> <param method="getIntId" name="id"/>
> </page>
> -->
> <!--
> </contentDefinition>
> -->
> <!-- for Forums modules
> <contentDefinition type="forum">
>
> <class>de.wfnetz.opencms.modules.forum.ContributionContentDefinition</class>
>
> <listMethod name="getSortedList">
> <param type="java.lang.String"/>
> </listMethod>
> <page uri="/forum.html?forumtemplate=viewcontributionentry">
> <param method="getId" name="conid"/>
> </page>
> </contentDefinition>
> -->
> </contentDefinitions>
> </luceneSearch>
>
> </system>
> ---------- cut ------
>
>