[opencms-dev] lucene indexing doesn't start

M Butcher mbutcher at grcomputing.net
Fri May 14 17:33:02 CEST 2004


I think the field name is "Title", not "title", but I can't remember for
sure. Some of the params are upper case and others are lower case.
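
For what it's worth, Lucene field names are case-sensitive, and doc.get()
returns null when no field matches the name exactly. Here is a minimal
sketch of a case-insensitive fallback that might help while debugging. It
is not the module's code: FieldNameCheck and its two methods are
hypothetical, the "Title" field and its value are assumed, and the stored
fields are modelled as a plain Map so the sketch runs without Lucene on
the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene or the OpenCms Lucene module.
// Stored fields are modelled as a Map of field name to value.
class FieldNameCheck {

    // Exact lookup: returns null when the name differs even only in case,
    // which mirrors what doc.get(name) does for a missing field.
    static String get(Map<String, String> fields, String name) {
        return fields.get(name);
    }

    // Fallback lookup that ignores case; returns the first match or null.
    static String getIgnoreCase(Map<String, String> fields, String name) {
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (e.getKey().equalsIgnoreCase(name)) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("Title", "Welcome"); // assumed field name and value

        System.out.println(get(fields, "title"));           // exact match fails
        System.out.println(getIgnoreCase(fields, "title")); // fallback finds "Title"
    }
}
```

If the fallback finds a value where the exact lookup does not, the field
was indexed under a differently cased name.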

Matt

Konstantins Dorodovs wrote:
> it's OK, the task did run, only later than expected
> 
> a new problem is:
> 
> doc.get("title")   returns null when lucene indexes on linux,
> 
> when I run on windows it seems ok
> 
> Konstantin
> 
> 
> 
> M Butcher wrote:
> 
>>
>> Are any other cron tasks executing? It sounds like the 
>> CronIndexManager is never being run.
>>
>> If you suspect otherwise, a simple test is to run the CronIndexManager 
>> from a JSP. That would print any exceptions directly to the browser 
>> window, which would be helpful.
>>
>> CmsJspActionElement cmsjsp =
>>     new CmsJspActionElement(pageContext, request, response);
>> CronIndexManager c = new CronIndexManager();
>> c.launch(cmsjsp.getCmsObject(), "createIndex=true");
>>
>> Matt
>>
>> Konstantins Dorodovs wrote:
>>
>>> looked in %CATALINA_HOME%\logs\localhost_log.MYDATE.txt
>>> no relevant errors there :(
>>>
>>>
>>>
>>>
>>> M Butcher wrote:
>>>
>>>>
>>>> Any errors in the catalina.log file?
>>>>
>>>> Matt
>>>>
>>>> Konstantins Dorodovs wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a problem with lucene indexing
>>>>> (opencms version: 5.0.6b1, lucene module: 1.5, tomcat: 4.1.30)
>>>>>
>>>>> The cron job doesn't seem to start (I looked at the log).
>>>>> Entry in Scheduler (
>>>>> 11 21 * * * admin Administrators net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true
>>>>> )
>>>>>
>>>>> It seems I did everything according to the docs
>>>>> (cron is enabled: [11.05.2004 20:10:04] <opencms_init> . OpenCms scheduler    : enabled).
>>>>> Below is an excerpt from my registry.xml:
>>>>>
>>>>> Thanks
>>>>>
>>>>> Konstantin
>>>>>
>>>>>
>>>>> ---------- cut ------
>>>>>        <tempfileproject>3</tempfileproject>
>>>>>
>>>>> <luceneSearch>
>>>>>    <!--
>>>>>      - mergeFactor and permCheck are currently ignored.
>>>>>      -->
>>>>>   <mergeFactor>100000</mergeFactor>
>>>>>   <permCheck>true</permCheck>
>>>>>
>>>>>    <!--
>>>>>      - directory in which lucene will store its indexes. Note: this
>>>>>      - is real fs, not VFS.
>>>>>      -->
>>>>>   <indexDir>C:\luceneindex\</indexDir>
>>>>>   <!-- <indexDir>F:\luceneindex\</indexDir> -->
>>>>>
>>>>>    <!--
>>>>>      - The analyzer is used for parsing documents. Choose one for your
>>>>>      - language. If language is English, use the StandardAnalyzer.
>>>>>      - There are additional analyzers at http://jakarta.apache.org/lucene
>>>>>      -->
>>>>>   <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
>>>>>   <!-- <analyzer>org.apache.lucene.analysis.de.GermanAnalyzer</analyzer> -->
>>>>>
>>>>>    <!--
>>>>>      - If subsearch is true, subfolders will be searched by default.
>>>>>      - This can be turned on/off per directory.
>>>>>      -->
>>>>>   <subsearch>true</subsearch>
>>>>>
>>>>>    <!--
>>>>>      - Name of the project to index. Online is recommended.
>>>>>      -->
>>>>>   <project>online</project>
>>>>>  
>>>>>    <!--
>>>>>      - docFactories determine how documents are processed. Generally,
>>>>>      - one docFactory exists for each type of content (viz. JSP, Page,
>>>>>      - Plain) that you want to index.
>>>>>      -->
>>>>>   <docFactories>
>>>>>  
>>>>>       <!--
>>>>>         - This docFactory indexes documents with type page (e.g. HTML
>>>>>         - files edited with the WYSIWYG editor).
>>>>>         -
>>>>>         - Note that the 'type' attribute specifies which content
>>>>>         - definition to use. Built in content types include page, plain,
>>>>>         - binary, and jsp (there are others, too). Custom content types
>>>>>         - can be used as well (see the contentDefinitions section below).
>>>>>         -->
>>>>>       <docFactory enabled="true" type="page">
>>>>>         <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
>>>>>       </docFactory>
>>>>>
>>>>>       <!--
>>>>>         - This docFactory is a little more complex. It takes documents of
>>>>>         - type "plain" and determines, by extension, what class should be
>>>>>         - used to index each particular file. In this example, we want to
>>>>>         - index plain text files exactly as they are, but any files that
>>>>>         - contain tags need the tags stripped out before they are indexed.
>>>>>         -
>>>>>         - Note that the name="" attribute is simply for pretty output, and
>>>>>         - can contain any allowable PCDATA text.
>>>>>         -->
>>>>>       <docFactory enabled="true" type="plain">
>>>>>          <fileType name="plaintext">
>>>>>            <extension>.txt</extension>
>>>>>            <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>>>>>          </fileType>
>>>>>          <fileType name="taggedtext">
>>>>>            <extension>.html</extension>
>>>>>            <extension>.htm</extension>
>>>>>            <extension>.xml</extension>
>>>>>            <!-- This will strip tags before processing -->
>>>>>            <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
>>>>>          </fileType>
>>>>>       </docFactory>
>>>>>
>>>>>        <!-- This is for binary files. PDF and DOC files are binary, as are
>>>>>          - CLASS and JAR files.
>>>>>          -->
>>>>>       <docFactory enabled="true" type="binary">
>>>>>          <!-- This is for indexing PDF files -->
>>>>>          <fileType name="PDF">
>>>>>            <extension>.pdf</extension>
>>>>>            <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
>>>>>          </fileType>
>>>>>          <!-- This is for indexing MS Word documents -->
>>>>>          <fileType name="Word">
>>>>>            <extension>.doc</extension>
>>>>>            <extension>.dot</extension>
>>>>>            <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
>>>>>          </fileType>
>>>>>       </docFactory>
>>>>>
>>>>>       <!--
>>>>>         - This will strip JSP tags and all scriptlets. IT WILL NOT RENDER THE
>>>>>         - JSP FIRST, as JSPs are, by nature, dynamic.
>>>>>         -
>>>>>         - Usually, this is off by default.
>>>>>         -->
>>>>>       <docFactory enabled="false" type="jsp">
>>>>>         <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
>>>>>       </docFactory>
>>>>>
>>>>>       <!-- For the news module. Enable if you use news -->
>>>>>
>>>>> <!--       <docFactory enabled="false" type="news">
>>>>>         <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
>>>>>       </docFactory>
>>>>> -->
>>>>>
>>>>>       <!-- For the forum module. Enable if you use forums. -->
>>>>> <!--
>>>>>       <docFactory enabled="false" type="forum">
>>>>>         <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class>
>>>>>       </docFactory>
>>>>> -->
>>>>>
>>>>>       <!-- If you need to index XML Template files (bad idea) use this: -->
>>>>>       <docFactory enabled="false" type="XML Template"/>
>>>>>   </docFactories>
>>>>>  
>>>>>    <!--
>>>>>      - <directories/> determines which directories are indexed. By default,
>>>>>      - the /system directory is never indexed, so it is safe to index root.
>>>>>      -
>>>>>      - If you want to specify only certain directories for indexing, create
>>>>>      - one <directory/> entry per directory. Again, you may use subsearch to
>>>>>      - override the default subsearch setting discussed above.
>>>>>      -->
>>>>>   <directories>
>>>>>       <directory location="/">
>>>>>         <section>Root</section>
>>>>>         <subsearch>true</subsearch>
>>>>>       </directory>
>>>>>   </directories>
>>>>>
>>>>>   <!--
>>>>>     - Use this section to define specific contentDefinitions. Provided below
>>>>>     - are entries for the news and forum modules.
>>>>>     - (Uncomment these only after you have installed the corresponding
>>>>>     - modules)
>>>>>     -->
>>>>>   <contentDefinitions>
>>>>>       <!--
>>>>>       <contentDefinition type="news">
>>>>>        -->
>>>>>          <!--
>>>>>            - <class /> determines the class of the content definition. Should
>>>>>            - be a subclass of com.opencms.defaults.A_CmsContentDefinition.
>>>>>            -->
>>>>>         <!--
>>>>>         <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
>>>>>          -->
>>>>>          <!--
>>>>>            - <initClass /> is optional and has to implement
>>>>>            - net.grcomputing.opencms.search.lucene.I_ContentDefinitionInitialization.
>>>>>            - It provides you with the ability to perform some
>>>>>            - initialization before the content definition class can be used.
>>>>>            - In case of the news module the NewsChannelContentDefinition class
>>>>>            - has to be loaded.
>>>>>            -->
>>>>>         <!--
>>>>>         <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
>>>>>          -->
>>>>>           <!--
>>>>>             - <listMethod /> defines the method of the content definition class
>>>>>             - which should be used to retrieve all content definition objects
>>>>>             - (or any subset).
>>>>>             - Usually you use this method also in the backoffice or any other
>>>>>             - list view.
>>>>>             -->
>>>>>         <!--
>>>>>         <listMethod name="getNewsList">
>>>>>           <param type="java.lang.Integer">1</param>
>>>>>           <param type="java.lang.String">-1</param>
>>>>>         </listMethod>
>>>>>          -->
>>>>>           <!--
>>>>>             - <page /> determines a page in the virtual file system that can
>>>>>             - display a single entry of a content definition. You must provide
>>>>>             - also a method of the content definition class that retrieves an
>>>>>             - id (or something else that has to be appended to your page uri
>>>>>             - to determine which entry has to be displayed). The result will
>>>>>             - look like:
>>>>>             - /news.html?__element=entry&newsid=<result of getIntId>
>>>>>             - for each content definition instance object.
>>>>>             -->
>>>>>         <!--
>>>>>         <page uri="/news.html?__element=entry">
>>>>>           <param method="getIntId" name="newsid"/>
>>>>>         </page>
>>>>>          -->
>>>>>         <!--
>>>>>           <page uri="/singleNews.jsp">
>>>>>             <param method="getIntId" name="id"/>
>>>>>           </page>
>>>>>           -->
>>>>>       <!--
>>>>>       </contentDefinition>
>>>>>        -->
>>>>>        <!-- for Forums modules
>>>>>       <contentDefinition type="forum">
>>>>>         <class>de.wfnetz.opencms.modules.forum.ContributionContentDefinition</class>
>>>>>         <listMethod name="getSortedList">
>>>>>           <param type="java.lang.String"/>
>>>>>         </listMethod>
>>>>>         <page uri="/forum.html?forumtemplate=viewcontributionentry">
>>>>>           <param method="getId" name="conid"/>
>>>>>         </page>
>>>>>       </contentDefinition>
>>>>>       -->
>>>>>   </contentDefinitions>
>>>>> </luceneSearch>
>>>>>
>>>>>    </system>
>>>>> ---------- cut ------
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> This mail is sent to you from the opencms-dev mailing list
>>>>> To change your list options, or to unsubscribe from the list, please visit
>>>>> http://mail.opencms.org/mailman/listinfo/opencms-dev
