[opencms-dev] lucene indexing doesn't start

Wed May 12 22:05:02 CEST 2004

Are any other cron tasks executing? It sounds like the CronIndexManager 
is never being run.

If you suspect otherwise, a simple test is to run the CronIndexManager 
from a JSP. That would print any exceptions directly to the browser 
window, which would be helpful.

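// Build the action element from the implicit JSP objects.
CmsJspActionElement cmsjsp =
     new CmsJspActionElement(pageContext, request, response);
// Run the indexer directly; any exception shows up in the browser.
CronIndexManager c = new CronIndexManager();
c.launch(cmsjsp.getCmsObject(), "createIndex=true");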
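
If the container swallows the exception instead of showing it, the same idea can be sketched in plain Java: run the job and, if it throws, turn the stack trace into text you can print yourself. This is only a rough sketch. LaunchDiagnostics, Job, and runAndReport are hypothetical names, not part of OpenCms or the Lucene module, and the lambdas stand in for the real launch call.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class LaunchDiagnostics {

    /** Hypothetical stand-in for a job such as the indexer's launch call. */
    public interface Job {
        String run() throws Exception;
    }

    /** Runs the job and returns its result, or the full stack trace as text. */
    public static String runAndReport(Job job) {
        try {
            return "Job finished: " + job.run();
        } catch (Throwable t) {
            // Capture the stack trace in a string so it can be written anywhere.
            StringWriter buffer = new StringWriter();
            t.printStackTrace(new PrintWriter(buffer, true));
            return "Job failed:\n" + buffer;
        }
    }

    public static void main(String[] args) {
        // A job that succeeds.
        System.out.println(runAndReport(() -> "ok"));
        // A job that fails; the message is made up for illustration.
        System.out.println(runAndReport(() -> {
            throw new IllegalStateException("index directory missing");
        }));
    }
}
```

In a JSP you would pass the returned text to out.println instead of System.out.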

Matt

Konstantins Dorodovs wrote:
> I looked in %CATALINA_HOME%\logs\localhost_log.MYDATE.txt.
> No relevant errors there :(
>
> M Butcher wrote:
> 
>>
>> Any errors in the catalina.log file?
>>
>> Matt
>>
>> Konstantins Dorodovs wrote:
>>
>>> Hi,
>>>
>>> I have a problem with Lucene indexing
>>> (OpenCms version: 5.0.6b1, Lucene module: 1.5, Tomcat: 4.1.30).
>>>
>>> The cron job doesn't seem to start; I checked the log.
>>> Entry in the Scheduler:
>>> 11 21 * * * admin Administrators net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true
>>>
>>> As far as I can tell, I followed the docs
>>> (cron is enabled: [11.05.2004 20:10:04] <opencms_init> . OpenCms scheduler    : enabled).
>>> Below is an excerpt from my registry.xml:
>>>
>>> Thanks
>>>
>>> Konstantin
>>>
>>>
>>> ---------- cut ------
>>>        <tempfileproject>3</tempfileproject>
>>>
>>> <luceneSearch>
>>>    <!--
>>>      - mergeFactor and permCheck are currently ignored.
>>>      -->
>>>   <mergeFactor>100000</mergeFactor>
>>>   <permCheck>true</permCheck>
>>>
>>>    <!--
>>>      - directory in which lucene will store its indexes. Note: this is real
>>>      - fs, not VFS.
>>>      -->
>>>   <indexDir>C:\luceneindex\</indexDir>
>>>   <!-- <indexDir>F:\luceneindex\</indexDir> -->
>>>
>>>    <!--
>>>      - The analyzer is used for parsing documents. Choose one for your
>>>      - language. If language is English, use the StandardAnalyzer.
>>>      - There are additional analyzers at http://jakarta.apache.org/lucene
>>>      -->
>>>   <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
>>>   <!-- <analyzer>org.apache.lucene.analysis.de.GermanAnalyzer</analyzer> -->
>>>
>>>    <!--
>>>      - If subsearch is true, subfolders will be searched by default.
>>>      - This can be turned on/off per directory.
>>>      -->
>>>   <subsearch>true</subsearch>
>>>
>>>    <!--
>>>      - Name of the project to index. Online is recommended.
>>>      -->
>>>   <project>online</project>
>>>  
>>>    <!--
>>>      - docFactories determine how documents are processed. Generally, one
>>>      - docFactory exists for each type of content (viz. JSP, Page, Plain)
>>>      - that you want to index.
>>>      -->
>>>   <docFactories>
>>>  
>>>       <!--
>>>         - This docFactory indexes documents with type page (e.g. HTML
>>>         - files edited with the WYSIWYG editor).
>>>         -
>>>         - Note that the 'type' attribute specifies which content definition
>>>         - to use. Built in content types include page, plain, binary, and jsp
>>>         - (there are others, too). Custom content types can be used as well
>>>         - (see the contentDefinitions section below).
>>>         -->
>>>       <docFactory enabled="true" type="page">
>>>         <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
>>>       </docFactory>
>>>
>>>       <!--
>>>         - This docFactory is a little more complex. It takes documents of
>>>         - type "plain" and determines, by extension, what class should be
>>>         - used to index each particular file. In this example, we want to
>>>         - index plain text files exactly as they are, but any files that
>>>         - contain tags need the tags stripped out before they are indexed.
>>>         -
>>>         - Note that the name="" attribute is simply for pretty output, and
>>>         - can contain any allowable PCDATA text.
>>>         -->
>>>       <docFactory enabled="true" type="plain">
>>>          <fileType name="plaintext">
>>>            <extension>.txt</extension>
>>>            <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>>>          </fileType>
>>>          <fileType name="taggedtext">
>>>            <extension>.html</extension>
>>>            <extension>.htm</extension>
>>>            <extension>.xml</extension>
>>>            <!-- This will strip tags before processing -->
>>>            <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
>>>          </fileType>
>>>       </docFactory>
>>>
>>>        <!-- This is for binary files. PDF and DOC files are binary, as are
>>>          - CLASS and JAR files.
>>>          -->
>>>       <docFactory enabled="true" type="binary">
>>>          <!-- This is for indexing PDF files -->
>>>          <fileType name="PDF">
>>>            <extension>.pdf</extension>
>>>            <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
>>>          </fileType>
>>>          <!-- This is for indexing MS Word documents -->
>>>          <fileType name="Word">
>>>            <extension>.doc</extension>
>>>            <extension>.dot</extension>
>>>            <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
>>>          </fileType>
>>>       </docFactory>
>>>
>>>       <!--
>>>         - This will strip JSP tags and all scriptlets. IT WILL NOT RENDER THE
>>>         - JSP FIRST, as JSPs are, by nature, dynamic.
>>>         -
>>>         - Usually, this is off by default.
>>>         -->
>>>       <docFactory enabled="false" type="jsp">
>>>         <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
>>>       </docFactory>
>>>
>>>       <!-- For the news module. Enable if you use news -->
>>>       <!--
>>>       <docFactory enabled="false" type="news">
>>>         <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
>>>       </docFactory>
>>>       -->
>>>
>>>       <!-- For the forum module. Enable if you use forums. -->
>>>       <!--
>>>       <docFactory enabled="false" type="forum">
>>>         <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class>
>>>       </docFactory>
>>>       -->
>>>
>>>       <!-- If you need to index XML Template files (bad idea) use this: -->
>>>       <docFactory enabled="false" type="XML Template"/>
>>>   </docFactories>
>>>  
>>>    <!--
>>>      - <directories/> determines which directories are indexed. By default,
>>>      - the /system directory is never indexed, so it is safe to index root.
>>>      -
>>>      - If you want to specify only certain directories for indexing, create
>>>      - one <directory/> entry per directory. Again, you may use subsearch to
>>>      - override the default subsearch setting discussed above.
>>>      -->
>>>   <directories>
>>>       <directory location="/">
>>>         <section>Root</section>
>>>         <subsearch>true</subsearch>
>>>       </directory>
>>>   </directories>
>>>
>>>   <!--
>>>     - Use this section to define specific contentDefinitions. Provided below
>>>     - are entries for the news and forum modules.
>>>     - (Uncomment these only after you have installed the corresponding
>>>     - modules)
>>>     -->
>>>   <contentDefinitions>
>>>       <!--
>>>       <contentDefinition type="news">
>>>        -->
>>>          <!--
>>>            - <class /> determines the class of the content definition. Should
>>>            - be a subclass of com.opencms.defaults.A_CmsContentDefinition.
>>>            -->
>>>         <!--
>>>         <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
>>>          -->
>>>          <!--
>>>            - <initClass /> is optional and has to implement
>>>            - net.grcomputing.opencms.search.lucene.I_ContentDefinitionInitialization.
>>>            - It provides you with the ability to perform some
>>>            - initialization before the content definition class can be used.
>>>            - In case of the news module the NewsChannelContentDefinition class
>>>            - has to be loaded.
>>>            -->
>>>         <!--
>>>         <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
>>>          -->
>>>           <!--
>>>             - <listMethod /> defines the method of the content definition class
>>>             - which should be used to retrieve all content definition objects
>>>             - (or any subset).
>>>             - Usually you use this method also in the backoffice or any other
>>>             - list view.
>>>             -->
>>>         <!--
>>>         <listMethod name="getNewsList">
>>>           <param type="java.lang.Integer">1</param>
>>>           <param type="java.lang.String">-1</param>
>>>         </listMethod>
>>>          -->
>>>           <!--
>>>             - <page /> determines a page in the virtual file system that can
>>>             - display a single entry of a content definition. You must provide
>>>             - also a method of the content definition class that retrieves an
>>>             - id (or something else that has to be appended to your page uri
>>>             - to determine which entry has to be displayed). The result will
>>>             - look like:
>>>             - /news.html?__element=entry&newsid=<result of getIntId>
>>>             - for each content definition instance object.
>>>             -->
>>>         <!--
>>>         <page uri="/news.html?__element=entry">
>>>           <param method="getIntId" name="newsid"/>
>>>         </page>
>>>          -->
>>>         <!--
>>>           <page uri="/singleNews.jsp">
>>>             <param method="getIntId" name="id"/>
>>>           </page>
>>>           -->
>>>       <!--
>>>       </contentDefinition>
>>>        -->
>>>        <!-- for Forums modules
>>>       <contentDefinition type="forum">
>>>         <class>de.wfnetz.opencms.modules.forum.ContributionContentDefinition</class>
>>>         <listMethod name="getSortedList">
>>>           <param type="java.lang.String"/>
>>>         </listMethod>
>>>         <page uri="/forum.html?forumtemplate=viewcontributionentry">
>>>           <param method="getId" name="conid"/>
>>>         </page>
>>>       </contentDefinition>
>>>       -->
>>>   </contentDefinitions>
>>> </luceneSearch>
>>>
>>>    </system>
>>> ---------- cut ------
>>>
>>>
>>> _______________________________________________
>>> This mail is sent to you from the opencms-dev mailing list
>>> To change your list options, or to unsubscribe from the list, please visit
>>> http://mail.opencms.org/mailman/listinfo/opencms-dev