[opencms-dev] lucene indexing doesn't start

Tue May 11 21:24:00 CEST 2004

Any errors in the catalina.log file?
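
If the log is long, a quick scan for suspicious lines may help. Below is a minimal sketch in Python, assuming the log is a plain-text file whose path you pass on the command line. The keyword list and the function names are my own assumptions for illustration, not anything OpenCms or the Lucene module provides.

```python
import re
import sys
from pathlib import Path

# Assumed keywords; adjust them to whatever your log actually contains.
PATTERN = re.compile(r"exception|error|CronIndexManager|lucene", re.IGNORECASE)


def find_suspect_lines(text):
    """Return (line_number, line) pairs for lines that match PATTERN."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]


def scan_log(path):
    """Read the log at `path` and return its suspect lines."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    return find_suspect_lines(Path(path).read_text(errors="replace"))


if __name__ == "__main__" and len(sys.argv) > 1:
    for number, line in scan_log(sys.argv[1]):
        print(f"{number}: {line}")
```

Run it with the path to your Tomcat log as the only argument. If nothing mentions CronIndexManager around 21:11, the job most likely never fired.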

Matt

Konstantins Dorodovs wrote:
> Hi,
> 
> I have a problem with lucene indexing
> (opencms version: 5.0.6b1, lucene module: 1.5, tomcat: 4.1.30)
> 
> The cron job doesn't seem to start; I looked at the log.
> The entry in the Scheduler is:
> 
> 11 21 * * * admin Administrators net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true
> 
> As far as I can tell, I followed the docs
> (cron is enabled: [11.05.2004 20:10:04] <opencms_init> . OpenCms scheduler    : enabled).
> Below is an excerpt from my registry.xml:
> 
> Thanks
> 
> Konstantin
> 
> 
> ---------- cut ------
>        <tempfileproject>3</tempfileproject>
> 
> <luceneSearch>
>    <!--
>      - mergeFactor and permCheck are currently ignored.
>      -->
>   <mergeFactor>100000</mergeFactor>
>   <permCheck>true</permCheck>
> 
>    <!--
>      - directory in which lucene will store its indexes. Note: this is real
>      - fs, not VFS.
>      -->
>   <indexDir>C:\luceneindex\</indexDir>
>   <!-- <indexDir>F:\luceneindex\</indexDir> -->
> 
>    <!--
>      - The analyzer is used for parsing documents. Choose one for your
>      - language. If language is English, use the StandardAnalyzer.
>      - There are additional analyzers at http://jakarta.apache.org/lucene
>      -->
>   <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
>   <!-- <analyzer>org.apache.lucene.analysis.de.GermanAnalyzer</analyzer> -->
> 
>    <!--
>      - If subsearch is true, subfolders will be searched by default.
>      - This can be turned on/off per directory.
>      -->
>   <subsearch>true</subsearch>  
>    <!--
>      - Name of the project to index. Online is recommended.
>      -->
>   <project>online</project>
>  
>    <!--
>      - docFactories determine how documents are processed. Generally, one
>      - docFactory exists for each type of content (viz. JSP, Page, Plain)
>      - that you want to index.
>      -->
>   <docFactories>
>  
>       <!--
>         - This docFactory indexes documents with type page (e.g. HTML
>         - files edited with the WYSIWYG editor).
>         -
>         - Note that the 'type' attribute specifies which content definition
>         - to use. Built in content types include page, plain, binary, and jsp
>         - (there are others, too). Custom content types can be used as well
>         - (see the contentDefinitions section below).
>         -->
>       <docFactory enabled="true" type="page">
>         <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
>       </docFactory>
> 
>       <!--
>         - This docFactory is a little more complex. It takes documents of
>         - type "plain" and determines, by extension, what class should be
>         - used to index each particular file. In this example, we want to
>         - index plain text files exactly as they are, but any files that
>         - contain tags need the tags stripped out before they are indexed.
>         -
>         - Note that the name="" attribute is simply for pretty output, and
>         - can contain any allowable PCDATA text.
>         -->
>       <docFactory enabled="true" type="plain">
>          <fileType name="plaintext">
>            <extension>.txt</extension>
>            <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>          </fileType>
>          <fileType name="taggedtext">
>            <extension>.html</extension>
>            <extension>.htm</extension>
>            <extension>.xml</extension>
>            <!-- This will strip tags before processing -->
>            <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
>          </fileType>
>       </docFactory>
> 
>        <!-- This is for binary files. PDF and DOC files are binary, as are
>          - CLASS and JAR files.
>          -->
>       <docFactory enabled="true" type="binary">
>          <!-- This is for indexing PDF files -->
>          <fileType name="PDF">
>            <extension>.pdf</extension>
>            <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
>          </fileType>
>          <!-- This is for indexing MS Word documents -->
>          <fileType name="Word">
>            <extension>.doc</extension>
>            <extension>.dot</extension>
>            <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
>          </fileType>
>       </docFactory>
> 
>       <!--
>         - This will strip JSP tags and all scriptlets. IT WILL NOT RENDER THE
>         - JSP FIRST, as JSPs are, by nature, dynamic.
>         -
>         - Usually, this is off by default.
>         -->
>       <docFactory enabled="false" type="jsp">
>         <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
>       </docFactory>
> 
>       <!-- For the news module. Enable if you use news -->
> 
> <!--       <docFactory enabled="false" type="news">
>         <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
>       </docFactory>
> -->
> 
>       <!-- For the forum module. Enable if you use forums. -->
> <!--
>       <docFactory enabled="false" type="forum">
>         <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class>
>       </docFactory>
> -->
> 
>       <!-- If you need to index XML Template files (bad idea) use this: -->
>       <docFactory enabled="false" type="XML Template"/>
>   </docFactories>
>  
>    <!--
>      - <directories/> determines which directories are indexed. By default,
>      - the /system directory is never indexed, so it is safe to index root.
>      -
>      - If you want to specify only certain directories for indexing, create
>      - one <directory/> entry per directory. Again, you may use subsearch to
>      - override the default subsearch setting discussed above.
>      -->
>   <directories>
>       <directory location="/">
>         <section>Root</section>
>         <subsearch>true</subsearch>
>       </directory>
>   </directories>
> 
>   <!--
>     - Use this section to define specific contentDefinitions. Provided below
>     - are entries for the news and forum modules.
>     - (Uncomment these only after you have installed the corresponding
>     - modules)
>     -->
>   <contentDefinitions>
>       <!--
>       <contentDefinition type="news">
>        -->
>          <!--
>            - <class /> determines the class of the content definition. Should
>            - be a subclass of com.opencms.defaults.A_CmsContentDefinition.
>            -->
>         <!--
>         <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
>          -->
>          <!--
>            - <initClass /> is optional and has to implement
>            - net.grcomputing.opencms.search.lucene.I_ContentDefinitionInitialization.
>            - It provides you with the ability to perform some
>            - initialization before the content definition class can be used.
>            - In case of the news module the NewsChannelContentDefinition class
>            - has to be loaded.
>            -->
>         <!--
>         <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
>          -->
>           <!--
>             - <listMethod /> defines the method of the content definition class
>             - which should be used to retrieve all content definition objects
>             - (or any subset).
>             - Usually you use this method also in the backoffice or any other
>             - list view.
>             -->
>         <!--
>         <listMethod name="getNewsList">
>           <param type="java.lang.Integer">1</param>
>           <param type="java.lang.String">-1</param>
>         </listMethod>
>          -->
>           <!--
>             - <page /> determines a page in the virtual file system that can
>             - display a single entry of a content definition. You must provide
>             - also a method of the content definition class that retrieves an
>             - id (or something else that has to be appended to your page uri
>             - to determine which entry has to be displayed). The result will
>             - look like:
>             - /news.html?__element=entry&newsid=<result of getIntId>
>             - for each content definition instance object.
>             -->
>         <!--
>         <page uri="/news.html?__element=entry">
>           <param method="getIntId" name="newsid"/>
>         </page>
>          -->
>         <!--
>           <page uri="/singleNews.jsp">
>             <param method="getIntId" name="id"/>
>           </page>
>           -->
>       <!--
>       </contentDefinition>
>        -->
>        <!-- for Forums modules
>       <contentDefinition type="forum">
>         <class>de.wfnetz.opencms.modules.forum.ContributionContentDefinition</class>
>         <listMethod name="getSortedList">
>           <param type="java.lang.String"/>
>         </listMethod>
>         <page uri="/forum.html?forumtemplate=viewcontributionentry">
>           <param method="getId" name="conid"/>
>         </page>
>       </contentDefinition>
>       -->
>   </contentDefinitions>
> </luceneSearch>
> 
>    </system>
> ---------- cut ------
> 
> 