[opencms-dev] lucene indexing doesn't start

Konstantins Dorodovs K.Dorodovs at mebius.lv
Tue May 11 20:23:01 CEST 2004


Hi,

I have a problem with Lucene indexing
(OpenCms version: 5.0.6b1, Lucene module: 1.5, Tomcat: 4.1.30).

The cron job doesn't seem to start (judging by the log). The entry in the
Scheduler is:

    11 21 * * * admin Administrators net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true
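For reference, here is a minimal Python sketch of how such an entry can be read. It assumes the first five fields follow the usual crontab order (minute, hour, day of month, month, day of week), followed by the user, group, job class and one parameter. SchedulerEntry, parse_entry and next_run are hypothetical helpers for illustration, not OpenCms API, and next_run only handles daily schedules with * in the last three time fields.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SchedulerEntry:
    minute: str
    hour: str
    day_of_month: str
    month: str
    day_of_week: str
    user: str
    group: str
    job_class: str
    parameter: str


def parse_entry(line):
    """Split a scheduler line into cron fields, user, group, class and parameter.

    Assumption: fields are whitespace-separated; everything after the class
    name is kept together as the parameter string.
    """
    parts = line.split(None, 8)
    if len(parts) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(parts)}: {line!r}")
    if len(parts) == 8:
        parts.append("")  # entry without a parameter
    return SchedulerEntry(*parts)


def next_run(entry, now):
    """Next firing time of a daily job such as '11 21 * * *', strictly after now."""
    if (entry.day_of_month, entry.month, entry.day_of_week) != ("*", "*", "*"):
        raise NotImplementedError("only daily schedules are handled in this sketch")
    candidate = now.replace(hour=int(entry.hour), minute=int(entry.minute),
                            second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed: fire tomorrow
    return candidate


entry = parse_entry("11 21 * * * admin Administrators "
                    "net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true")
print(entry.job_class, entry.parameter)
print(next_run(entry, datetime(2004, 5, 11, 20, 23)))  # 2004-05-11 21:11:00
```

Read this way, "11 21 * * *" fires once a day at 21:11 server time, so a log inspected earlier that evening would not yet show an indexing run.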

As far as I can tell, I followed the docs, and the scheduler is enabled:

    [11.05.2004 20:10:04] <opencms_init> . OpenCms scheduler    : enabled

Below is an excerpt from my registry.xml:

Thanks

Konstantin


---------- cut ------
        <tempfileproject>3</tempfileproject>

<luceneSearch>
    <!--
      - mergeFactor and permCheck are currently ignored.
      -->
   <mergeFactor>100000</mergeFactor>
   <permCheck>true</permCheck>

    <!--
      - directory in which lucene will store its indexes. Note: this is real
      - fs, not VFS.
      -->
   <indexDir>C:\luceneindex\</indexDir>
   <!-- <indexDir>F:\luceneindex\</indexDir> -->

    <!--
      - The analyzer is used for parsing documents. Choose one for your
      - language. If the language is English, use the StandardAnalyzer.
      - There are additional analyzers at http://jakarta.apache.org/lucene
      -->
   <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
   <!-- <analyzer>org.apache.lucene.analysis.de.GermanAnalyzer</analyzer> -->

    <!--
      - If subsearch is true, subfolders will be searched by default.
      - This can be turned on/off per directory.
      -->
   <subsearch>true</subsearch>   

    <!--
      - Name of the project to index. Online is recommended.
      -->
   <project>online</project>
  
    <!--
      - docFactories determine how documents are processed. Generally, one
      - docFactory exists for each type of content (viz. JSP, Page, Plain)
      - that you want to index.
      -->
   <docFactories>
  
       <!--
         - This docFactory indexes documents with type page (e.g. HTML
         - files edited with the WYSIWYG editor).
         -
         - Note that the 'type' attribute specifies which content definition
         - to use. Built-in content types include page, plain, binary, and jsp
         - (there are others, too). Custom content types can be used as well
         - (see the contentDefinitions section below).
         -->
       <docFactory enabled="true" type="page">
         <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
       </docFactory>

       <!--
         - This docFactory is a little more complex. It takes documents of
         - type "plain" and determines, by extension, what class should be
         - used to index each particular file. In this example, we want to
         - index plain text files exactly as they are, but any files that
         - contain tags need the tags stripped out before they are indexed.
         -
         - Note that the name="" attribute is simply for pretty output, and
         - can contain any allowable PCDATA text.
         -->
       <docFactory enabled="true" type="plain">
          <fileType name="plaintext">
            <extension>.txt</extension>
            <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
          </fileType>
          <fileType name="taggedtext">
            <extension>.html</extension>
            <extension>.htm</extension>
            <extension>.xml</extension>
            <!-- This will strip tags before processing -->
            <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
          </fileType>
       </docFactory>

        <!-- This is for binary files. PDF and DOC files are binary, as are
          - CLASS and JAR files.
          -->
       <docFactory enabled="true" type="binary">
          <!-- This is for indexing PDF files -->
          <fileType name="PDF">
            <extension>.pdf</extension>
            <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
          </fileType>
          <!-- This is for indexing MS Word documents -->
          <fileType name="Word">
            <extension>.doc</extension>
            <extension>.dot</extension>
            <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
          </fileType>
       </docFactory>

       <!--
         - This will strip JSP tags and all scriptlets. IT WILL NOT RENDER THE
         - JSP FIRST, as JSPs are, by nature, dynamic.
         -
         - Usually, this is off by default.
         -->
       <docFactory enabled="false" type="jsp">
         <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
       </docFactory>

       <!-- For the news module. Enable if you use news -->

<!--
       <docFactory enabled="false" type="news">
         <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
       </docFactory>
-->

       <!-- For the forum module. Enable if you use forums. -->
<!--
       <docFactory enabled="false" type="forum">
         <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class>
       </docFactory>
-->

       <!-- If you need to index XML Template files (bad idea) use this: -->
       <docFactory enabled="false" type="XML Template"/>
   </docFactories>
  
    <!--
      - <directories/> determines which directories are indexed. By default,
      - the /system directory is never indexed, so it is safe to index root.
      -
      - If you want to specify only certain directories for indexing, create
      - one <directory/> entry per directory. Again, you may use subsearch to
      - override the default subsearch setting discussed above.
      -->
   <directories>
       <directory location="/">
         <section>Root</section>
         <subsearch>true</subsearch>
       </directory>
   </directories>

   <!--
     - Use this section to define specific contentDefinitions. Provided below
     - are entries for the news and forum modules.
     - (Uncomment these only after you have installed the corresponding
     - modules)
     -->
   <contentDefinitions>
       <!--
       <contentDefinition type="news">
        -->
          <!--
            - <class /> determines the class of the content definition. It should
            - be a subclass of com.opencms.defaults.A_CmsContentDefinition.
            -->
         <!--
         <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
          -->
          <!--
            - <initClass /> is optional and has to implement
            - net.grcomputing.opencms.search.lucene.I_ContentDefinitionInitialization.
            - It lets you perform some initialization before the content
            - definition class can be used. In the case of the news module, the
            - NewsChannelContentDefinition class has to be loaded.
            -->
         <!--
         <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
          -->
           <!--
             - <listMethod /> defines the method of the content definition class
             - that should be used to retrieve all content definition objects
             - (or any subset). Usually this is the same method you use in the
             - backoffice or any other list view.
             -->
         <!--
         <listMethod name="getNewsList">
           <param type="java.lang.Integer">1</param>
           <param type="java.lang.String">-1</param>
         </listMethod>
          -->
           <!--
             - <page /> determines a page in the virtual file system that can
             - display a single entry of a content definition. You must also
             - provide a method of the content definition class that retrieves an
             - id (or something else that has to be appended to your page URI
             - to determine which entry has to be displayed). The result will
             - look like:
             - /news.html?__element=entry&newsid=<result of getIntId>
             - for each content definition instance object.
             -->
         <!--
         <page uri="/news.html?__element=entry">
           <param method="getIntId" name="newsid"/>
         </page>
          -->
         <!--
           <page uri="/singleNews.jsp">
             <param method="getIntId" name="id"/>
           </page>
           -->
       <!--
       </contentDefinition>
        -->
        <!-- For the forum module
       <contentDefinition type="forum">
         <class>de.wfnetz.opencms.modules.forum.ContributionContentDefinition</class>
         <listMethod name="getSortedList">
           <param type="java.lang.String"/>
         </listMethod>
         <page uri="/forum.html?forumtemplate=viewcontributionentry">
           <param method="getId" name="conid"/>
         </page>
       </contentDefinition>
       -->
   </contentDefinitions>
</luceneSearch>

    </system>
---------- cut ------
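A small Python sketch can help check what a <luceneSearch> block like the one above actually enables. It reads the block with the standard xml.etree.ElementTree module and skips docFactories marked enabled="false". XML comments are dropped by the parser, so commented-out factories are ignored too. It then resolves a file to an indexer class by extension, approximating the dispatch the docFactory comments describe. load_lucene_config, indexer_for and SAMPLE are hypothetical names, and the lookup is an assumption, not the module's own code. The excerpt as posted is not a complete XML document (it starts inside <system>), so feed the sketch the whole registry.xml or just the <luceneSearch> element.

```python
import os
import xml.etree.ElementTree as ET


def load_lucene_config(xml_text):
    """Collect indexDir and the enabled docFactories from a <luceneSearch> block.

    xml_text may be the whole registry.xml or just the <luceneSearch> element.
    """
    root = ET.fromstring(xml_text)
    search = next(root.iter("luceneSearch"), None)  # iter() also checks root itself
    if search is None:
        raise ValueError("no <luceneSearch> element found")
    factories = {}
    for factory in search.iter("docFactory"):
        if factory.get("enabled") != "true":
            continue  # enabled="false" factories do not index anything
        single = factory.findtext("class")  # page/jsp style: one class per type
        by_ext = {}
        for file_type in factory.findall("fileType"):  # plain/binary style
            cls = file_type.findtext("class", "").strip()
            for ext in file_type.findall("extension"):
                by_ext[ext.text.strip().lower()] = cls
        factories[factory.get("type")] = {
            "class": single.strip() if single else None,
            "extensions": by_ext,
        }
    return {"indexDir": search.findtext("indexDir"), "factories": factories}


def indexer_for(config, resource_type, filename):
    """Class expected to index filename of resource_type, or None if none applies."""
    factory = config["factories"].get(resource_type)
    if factory is None:
        return None  # type disabled or not configured
    ext = os.path.splitext(filename)[1].lower()
    return factory["extensions"].get(ext, factory["class"])


SAMPLE = r"""<luceneSearch>
  <indexDir>C:\luceneindex\</indexDir>
  <docFactories>
    <docFactory enabled="true" type="page">
      <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
    </docFactory>
    <docFactory enabled="true" type="plain">
      <fileType name="plaintext">
        <extension>.txt</extension>
        <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
      </fileType>
      <fileType name="taggedtext">
        <extension>.html</extension>
        <extension>.htm</extension>
        <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
      </fileType>
    </docFactory>
    <docFactory enabled="false" type="jsp">
      <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
    </docFactory>
  </docFactories>
</luceneSearch>"""

config = load_lucene_config(SAMPLE)
print(config["indexDir"])
print(indexer_for(config, "plain", "/docs/index.HTM"))
print(indexer_for(config, "jsp", "/search.jsp"))  # None: jsp factory is disabled
```

With these settings, .htm files of type plain resolve to TaggedPlainDocument, while JSP files resolve to nothing because that factory is disabled.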




