[opencms-dev] lucene indexing doesn't start

Konstantins Dorodovs K.Dorodovs at mebius.lv
Fri May 14 10:58:01 CEST 2004


it's ok, the task was run, only later than expected
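
For anyone checking when an entry like the scheduler line quoted further down should fire, here is a minimal sketch that splits it into fields. It assumes the usual cron order (minute, hour, day of month, month, day of week) followed by user, group, class name and parameter string, and that the time is evaluated against the server's clock. The CronEntry class is hypothetical and not part of OpenCms or the lucene module.

```java
/** Hypothetical helper (not part of OpenCms): splits one scheduler entry into named fields. */
public class CronEntry {
    public final String minute, hour, dayOfMonth, month, dayOfWeek;
    public final String user, group, className, params;

    public CronEntry(String line) {
        // A limit of 9 keeps any spaces inside the parameter string intact.
        String[] f = line.trim().split("\\s+", 9);
        if (f.length < 8) {
            throw new IllegalArgumentException("expected at least 8 fields, got " + f.length);
        }
        minute = f[0]; hour = f[1]; dayOfMonth = f[2]; month = f[3]; dayOfWeek = f[4];
        user = f[5]; group = f[6]; className = f[7];
        params = f.length > 8 ? f[8] : "";
    }

    /** Zero-pads a single digit so 5 prints as 05; wildcards and ranges pass through unchanged. */
    private static String pad(String s) {
        return s.length() == 1 && Character.isDigit(s.charAt(0)) ? "0" + s : s;
    }

    /** Hour and minute fields joined as HH:MM (assumed to mean server time). */
    public String timeOfDay() {
        return pad(hour) + ":" + pad(minute);
    }

    public static void main(String[] args) {
        CronEntry e = new CronEntry("11 21 * * * admin Administrators "
            + "net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true");
        System.out.println(e.className + " as " + e.user + "/" + e.group + " at " + e.timeOfDay());
    }
}
```

Read this way, the entry asks for one run per day at 21:11, so a task that seems late may simply be waiting for that minute on the server.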

a new problem is:

doc.get("title") returns null when lucene indexes on linux,

but when I run it on windows it seems ok

Konstantin
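
Lucene's Document.get(name) returns null when the document has no stored field with that name, so until the cause of the Linux difference is found the display code can guard against it. Below is a minimal sketch of such a guard. The TitleFallback class and the idea of falling back to the last path segment are assumptions for illustration, not something the lucene module provides.

```java
/** Hypothetical helper: picks something printable when the stored title is missing. */
public class TitleFallback {

    /** Returns the trimmed title if present, otherwise the last segment of the path. */
    public static String titleOrFallback(String title, String path) {
        if (title != null && !title.trim().isEmpty()) {
            return title.trim();
        }
        if (path == null || path.isEmpty()) {
            return "(untitled)";
        }
        // Drop one trailing slash so folder paths still yield a name.
        String p = (path.length() > 1 && path.endsWith("/"))
            ? path.substring(0, path.length() - 1)
            : path;
        int slash = p.lastIndexOf('/');
        return (slash >= 0 && slash < p.length() - 1) ? p.substring(slash + 1) : p;
    }

    public static void main(String[] args) {
        // In a search result page this would be called with doc.get("title")
        // and whatever field holds the resource path (field name is an assumption).
        System.out.println(titleOrFallback(null, "/docs/manual.html"));
    }
}
```

This only hides the symptom; the missing field on Linux still needs to be tracked down at indexing time.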



M Butcher wrote:

>
> Are any other cron tasks executing? It sounds like the 
> CronIndexManager is never being run.
>
> If you suspect otherwise, a simple test is to run the CronIndexManager 
> from a JSP. That would print any exceptions directly to the browser 
> window, which would be helpful.
>
> CmsJspActionElement cmsjsp =
>     new CmsJspActionElement(pageContext, request, response);
> CronIndexManager c = new CronIndexManager();
> c.launch(cmsjsp.getCmsObject(), "createIndex=true");
>
> Matt
>
> Konstantins Dorodovs wrote:
>
>> looked in %CATALINA_HOME%\logs\localhost_log.MYDATE.txt
>> no relevant errors there :(
>>
>>
>>
>>
>> M Butcher wrote:
>>
>>>
>>> Any errors in the catalina.log file?
>>>
>>> Matt
>>>
>>> Konstantins Dorodovs wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a problem with lucene indexing
>>>> (opencms version: 5.0.6b1, lucene module: 1.5, tomcat: 4.1.30)
>>>>
>>>> the cron job doesn't seem to start (I looked at the log).
>>>> entry in Scheduler(
>>>> 11 21 * * * admin Administrators net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true
>>>> )
>>>>
>>>> it seems I did everything according to the docs
>>>> (cron is enabled: [11.05.2004 20:10:04] <opencms_init> . OpenCms scheduler    : enabled)
>>>> below, there is an excerpt from my registry.xml:
>>>>
>>>> Thanks
>>>>
>>>> Konstantin
>>>>
>>>>
>>>> ---------- cut ------
>>>>        <tempfileproject>3</tempfileproject>
>>>>
>>>> <luceneSearch>
>>>>    <!--
>>>>      - mergeFactor and permCheck are currently ignored.
>>>>      -->
>>>>   <mergeFactor>100000</mergeFactor>
>>>>   <permCheck>true</permCheck>
>>>>
>>>>    <!--
>>>>      - directory in which lucene will store its indexes. Note: this is real
>>>>      - fs, not VFS.
>>>>      -->
>>>>   <indexDir>C:\luceneindex\</indexDir>
>>>>   <!-- <indexDir>F:\luceneindex\</indexDir> -->
>>>>
>>>>    <!--
>>>>      - The analyzer is used for parsing documents. Choose one for your
>>>>      - language. If language is English, use the StandardAnalyzer.
>>>>      - There are additional analyzers at http://jakarta.apache.org/lucene
>>>>      -->
>>>>   <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
>>>>
>>>>   <!-- <analyzer>org.apache.lucene.analysis.de.GermanAnalyzer</analyzer> -->
>>>>
>>>>    <!--
>>>>      - If subsearch is true, subfolders will be searched by default.
>>>>      - This can be turned on/off per directory.
>>>>      -->
>>>>   <subsearch>true</subsearch>
>>>>
>>>>    <!--
>>>>      - Name of the project to index. Online is recommended.
>>>>      -->
>>>>   <project>online</project>
>>>>  
>>>>    <!--
>>>>      - docFactories determine how documents are processed. Generally, one
>>>>      - docFactory exists for each type of content (viz. JSP, Page, Plain)
>>>>      - that you want to index.
>>>>      -->
>>>>   <docFactories>
>>>>  
>>>>       <!--
>>>>         - This docFactory indexes documents with type page (e.g. HTML
>>>>         - files edited with the WYSIWYG editor).
>>>>         -
>>>>         - Note that the 'type' attribute specifies which content definition
>>>>         - to use. Built in content types include page, plain, binary, and jsp
>>>>         - (there are others, too). Custom content types can be used as well
>>>>         - (see the contentDefinitions section below).
>>>>         -->
>>>>       <docFactory enabled="true" type="page">
>>>>         <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
>>>>       </docFactory>
>>>>
>>>>       <!--
>>>>         - This docFactory is a little more complex. It takes documents of
>>>>         - type "plain" and determines, by extension, what class should be
>>>>         - used to index each particular file. In this example, we want to
>>>>         - index plain text files exactly as they are, but any files that
>>>>         - contain tags need the tags stripped out before they are indexed.
>>>>         -
>>>>         - Note that the name="" attribute is simply for pretty output, and
>>>>         - can contain any allowable PCDATA text.
>>>>         -->
>>>>       <docFactory enabled="true" type="plain">
>>>>          <fileType name="plaintext">
>>>>            <extension>.txt</extension>
>>>>            <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>>>>          </fileType>
>>>>          <fileType name="taggedtext">
>>>>            <extension>.html</extension>
>>>>            <extension>.htm</extension>
>>>>            <extension>.xml</extension>
>>>>            <!-- This will strip tags before processing -->
>>>>            <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
>>>>          </fileType>
>>>>       </docFactory>
>>>>
>>>>        <!-- This is for binary files. PDF and DOC files are binary, as are
>>>>          - CLASS and JAR files.
>>>>          -->
>>>>       <docFactory enabled="true" type="binary">
>>>>          <!-- This is for indexing PDF files -->
>>>>          <fileType name="PDF">
>>>>            <extension>.pdf</extension>
>>>>            <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
>>>>          </fileType>
>>>>          <!-- This is for indexing MS Word documents -->
>>>>          <fileType name="Word">
>>>>            <extension>.doc</extension>
>>>>            <extension>.dot</extension>
>>>>            <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
>>>>          </fileType>
>>>>       </docFactory>
>>>>
>>>>       <!--
>>>>         - This will strip JSP tags and all scriptlets. IT WILL NOT RENDER THE
>>>>         - JSP FIRST, as JSPs are, by nature, dynamic.
>>>>         -
>>>>         - Usually, this is off by default.
>>>>         -->
>>>>       <docFactory enabled="false" type="jsp">
>>>>         <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
>>>>       </docFactory>
>>>>
>>>>       <!-- For the news module. Enable if you use news -->
>>>>
>>>> <!--       <docFactory enabled="false" type="news">
>>>>         <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
>>>>       </docFactory>
>>>> -->
>>>>
>>>>       <!-- For the forum module. Enable if you use forums. -->
>>>> <!--
>>>>       <docFactory enabled="false" type="forum">
>>>>         <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class>
>>>>       </docFactory>
>>>> -->
>>>>
>>>>       <!-- If you need to index XML Template files (bad idea) use this: -->
>>>>       <docFactory enabled="false" type="XML Template"/>
>>>>   </docFactories>
>>>>  
>>>>    <!--
>>>>      - <directories/> determines which directories are indexed. By default,
>>>>      - the /system directory is never indexed, so it is safe to index root.
>>>>      -
>>>>      - If you want to specify only certain directories for indexing, create
>>>>      - one <directory/> entry per directory. Again, you may use subsearch to
>>>>      - override the default subsearch setting discussed above.
>>>>      -->
>>>>   <directories>
>>>>       <directory location="/">
>>>>         <section>Root</section>
>>>>         <subsearch>true</subsearch>
>>>>       </directory>
>>>>   </directories>
>>>>
>>>>   <!--
>>>>     - Use this section to define specific contentDefinitions. Provided below
>>>>     - are entries for the news and forum modules.
>>>>     - (Uncomment these only after you have installed the corresponding
>>>>     - modules)
>>>>     -->
>>>>   <contentDefinitions>
>>>>       <!--
>>>>       <contentDefinition type="news">
>>>>        -->
>>>>          <!--
>>>>            - <class /> determines the class of the content definition. Should
>>>>            - be a subclass of com.opencms.defaults.A_CmsContentDefinition.
>>>>            -->
>>>>         <!--
>>>>         <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
>>>>          -->
>>>>          <!--
>>>>            - <initClass /> is optional and has to implement
>>>>            - net.grcomputing.opencms.search.lucene.I_ContentDefinitionInitialization.
>>>>            - It provides you with the ability to perform some
>>>>            - initialization before the content definition class can be used.
>>>>            - In case of the news module the NewsChannelContentDefinition class
>>>>            - has to be loaded.
>>>>            -->
>>>>         <!--
>>>>         <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
>>>>          -->
>>>>           <!--
>>>>             - <listMethod /> defines the method of the content definition class
>>>>             - which should be used to retrieve all content definition objects
>>>>             - (or any subset).
>>>>             - Usually you use this method also in the backoffice or any other
>>>>             - list view.
>>>>             -->
>>>>         <!--
>>>>         <listMethod name="getNewsList">
>>>>           <param type="java.lang.Integer">1</param>
>>>>           <param type="java.lang.String">-1</param>
>>>>         </listMethod>
>>>>          -->
>>>>           <!--
>>>>             - <page /> determines a page in the virtual file system that can
>>>>             - display a single entry of a content definition. You must provide
>>>>             - also a method of the content definition class that retrieves an
>>>>             - id (or something else that has to be appended to your page uri
>>>>             - to determine which entry has to be displayed). The result will
>>>>             - look like:
>>>>             - /news.html?__element=entry&newsid=<result of getIntId>
>>>>             - for each content definition instance object.
>>>>             -->
>>>>         <!--
>>>>         <page uri="/news.html?__element=entry">
>>>>           <param method="getIntId" name="newsid"/>
>>>>         </page>
>>>>          -->
>>>>         <!--
>>>>           <page uri="/singleNews.jsp">
>>>>             <param method="getIntId" name="id"/>
>>>>           </page>
>>>>           -->
>>>>       <!--
>>>>       </contentDefinition>
>>>>        -->
>>>>        <!-- for Forums modules
>>>>       <contentDefinition type="forum">
>>>>         <class>de.wfnetz.opencms.modules.forum.ContributionContentDefinition</class>
>>>>         <listMethod name="getSortedList">
>>>>           <param type="java.lang.String"/>
>>>>         </listMethod>
>>>>         <page uri="/forum.html?forumtemplate=viewcontributionentry">
>>>>           <param method="getId" name="conid"/>
>>>>         </page>
>>>>       </contentDefinition>
>>>>       -->
>>>>   </contentDefinitions>
>>>> </luceneSearch>
>>>>
>>>>    </system>
>>>> ---------- cut ------
>>>>
>>>>
>>>> _______________________________________________
>>>> This mail is send to you from the opencms-dev mailing list
>>>> To change your list options, or to unsubscribe from the list, 
>>>> please visit
>>>> http://mail.opencms.org/mailman/listinfo/opencms-dev
>>>
>>>
>>>
>>>


