[opencms-dev] lucene indexing doesn't start

Konstantins Dorodovs K.Dorodovs at mebius.lv
Mon May 17 20:15:02 CEST 2004


Thanks for the advice, Matt.



M Butcher wrote:

>
> You do actually have to restart the servlet container. Otherwise, it 
> will use the classes loaded during startup.
>
> Matt
>
> Konstantins Dorodovs wrote:
>
>> Unfortunately, the matched document contains neither a "Title" nor a
>> "title" field, only these fields:
>>
>> abs_path
>> initial_add
>> last_modified
>>
>> I played with the lucene module source a bit and added the following
>> code to BodylessDocument.Document(CmsObject, CmsFile):
>>
>>     // Index the "Title" property, if the resource has one.
>>     if ((title = cmso.readProperty(absPath, "Title")) != null) {
>>         doc.add(Field.Text(FIELD_TITLE, title));
>>         doc.add(Field.UnStored(FIELD_BULK, title));
>>     }
>>
>> It didn't help, although I'm not sure whether my code was loaded; I
>> assumed that changing a module doesn't require me to restart the web
>> server.
>>
>> Any ideas?
>>
>>
>> M Butcher wrote:
>>
>>>
>>> I think it's "Title", not "title"... but I can't remember. Some of 
>>> the params are upper case and others are lower case.
>>>
>>> Matt
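
Since the field name is compared exactly, including case, it can help to
list what a document actually contains before guessing between "Title"
and "title". Below is a minimal sketch of that check. It uses a plain
Map as a stand-in for a Lucene document so that it runs on its own; the
class name FieldLookup and the helper firstPresent are hypothetical and
not part of the lucene module.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FieldLookup {

    // Returns the value of the first candidate name that is present,
    // or null if none of them is. Lookup is case-sensitive.
    static String firstPresent(Map<String, String> fields, String... names) {
        for (String name : names) {
            String value = fields.get(name);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for an indexed document; the field names are the ones
        // reported in this thread, the values are made up.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("abs_path", "/index.html");
        doc.put("initial_add", "2004-05-11");
        doc.put("last_modified", "2004-05-17");

        // Show which fields exist at all.
        System.out.println("fields: " + doc.keySet());

        // Try both spellings instead of assuming one.
        String title = firstPresent(doc, "Title", "title");
        System.out.println("title: " + title);
    }
}
```

If neither spelling is present, as in the document above, the problem is
in how the document was built rather than in how it is queried.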
>>>
>>> Konstantins Dorodovs wrote:
>>>
>>>> It's OK, the task did run, only later than expected.
>>>>
>>>> A new problem is:
>>>>
>>>> doc.get("title") returns null when lucene indexes on Linux;
>>>>
>>>> when I run it on Windows it seems OK.
>>>>
>>>> Konstantin
>>>>
>>>>
>>>>
>>>> M Butcher wrote:
>>>>
>>>>>
>>>>> Are any other cron tasks executing? It sounds like the 
>>>>> CronIndexManager is never being run.
>>>>>
>>>>> If you suspect otherwise, a simple test is to run the 
>>>>> CronIndexManager from a JSP. That would print any exceptions 
>>>>> directly to the browser window, which would be helpful.
>>>>>
>>>>> CmsJspActionElement cmsjsp =
>>>>>     new CmsJspActionElement(pageContext, request, response);
>>>>> CronIndexManager c = new CronIndexManager();
>>>>> c.launch(cmsjsp.getCmsObject(), "createIndex=true");
>>>>>
>>>>> Matt
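
The idea above, calling the task directly so that a failure is shown
instead of disappearing into a log, can be sketched without OpenCms as
follows. The Task interface and the class DirectLaunch are hypothetical
stand-ins for the cron job class and exist only to make the sketch
self-contained; in a JSP the returned text would be written to the page.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class DirectLaunch {

    // Hypothetical stand-in for a scheduled task that takes one
    // parameter string such as "createIndex=true".
    interface Task {
        String launch(String parameter) throws Exception;
    }

    // Runs the task once and returns either its result or the full
    // stack trace as text, so nothing is silently swallowed.
    static String runAndReport(Task task, String parameter) {
        try {
            return "OK: " + task.launch(parameter);
        } catch (Exception e) {
            StringWriter trace = new StringWriter();
            e.printStackTrace(new PrintWriter(trace, true));
            return "FAILED:\n" + trace;
        }
    }

    public static void main(String[] args) {
        // A task that succeeds.
        System.out.println(runAndReport(p -> "ran with " + p, "createIndex=true"));

        // A task that fails; the exception ends up in the report.
        System.out.println(runAndReport(p -> {
            throw new IllegalStateException("index directory not writable");
        }, "createIndex=true"));
    }
}
```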
>>>>>
>>>>> Konstantins Dorodovs wrote:
>>>>>
>>>>>> I looked in %CATALINA_HOME%\logs\localhost_log.MYDATE.txt;
>>>>>> there are no relevant errors there :(
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> M Butcher wrote:
>>>>>>
>>>>>>>
>>>>>>> Any errors in the catalina.log file?
>>>>>>>
>>>>>>> Matt
>>>>>>>
>>>>>>> Konstantins Dorodovs wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a problem with lucene indexing
>>>>>>>> (opencms version: 5.0.6b1, lucene module: 1.5, tomcat: 4.1.30)
>>>>>>>>
>>>>>>>> The cron job doesn't seem to start (I looked at the log).
>>>>>>>> Entry in the Scheduler (
>>>>>>>> 11 21 * * * admin Administrators 
>>>>>>>> net.grcomputing.opencms.search.lucene.CronIndexManager 
>>>>>>>> createIndex=true
>>>>>>>> )
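
To rule out a misread schedule, it can help to check when an entry like
the one above would fire. The sketch below assumes that the five leading
fields follow the usual cron order (minute, hour, day of month, month,
day of week) and supports only "*" and plain numbers, not ranges or
lists. The class CronEntryCheck is hypothetical and is not how the
OpenCms scheduler itself evaluates the table.

```java
import java.time.LocalDateTime;

class CronEntryCheck {

    // A field matches if it is "*" or equals the given value.
    static boolean fieldMatches(String field, int value) {
        return field.equals("*") || Integer.parseInt(field) == value;
    }

    // Checks the five leading time fields of an entry against a time.
    // Anything after the fifth field (user, group, class, parameter)
    // is ignored here.
    static boolean matches(String entry, LocalDateTime time) {
        String[] f = entry.trim().split("\\s+");
        if (f.length < 5) {
            throw new IllegalArgumentException("expected at least 5 fields: " + entry);
        }
        // Cron counts Sunday as 0; java.time counts Monday=1 .. Sunday=7.
        int cronDayOfWeek = time.getDayOfWeek().getValue() % 7;
        return fieldMatches(f[0], time.getMinute())
            && fieldMatches(f[1], time.getHour())
            && fieldMatches(f[2], time.getDayOfMonth())
            && fieldMatches(f[3], time.getMonthValue())
            && fieldMatches(f[4], cronDayOfWeek);
    }

    public static void main(String[] args) {
        String entry = "11 21 * * * admin Administrators "
            + "net.grcomputing.opencms.search.lucene.CronIndexManager "
            + "createIndex=true";

        // Under the assumption above, the entry fires daily at 21:11.
        System.out.println(matches(entry, LocalDateTime.of(2004, 5, 11, 21, 11)));
        System.out.println(matches(entry, LocalDateTime.of(2004, 5, 11, 20, 10)));
    }
}
```

Under that reading the entry fires once a day at 21:11 server time, so a
check of the log shortly after the 20:10 startup would not show a run
yet.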
>>>>>>>>
>>>>>>>> It seems I did everything according to the docs
>>>>>>>> (cron is enabled: [11.05.2004 20:10:04] <opencms_init> . 
>>>>>>>> OpenCms scheduler    : enabled).
>>>>>>>> Below is an excerpt from my registry.xml:
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Konstantin
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------- cut ------
>>>>>>>>        <tempfileproject>3</tempfileproject>
>>>>>>>>
>>>>>>>> <luceneSearch>
>>>>>>>>    <!--
>>>>>>>>      - mergeFactor and permCheck are currently ignored.
>>>>>>>>      -->
>>>>>>>>   <mergeFactor>100000</mergeFactor>
>>>>>>>>   <permCheck>true</permCheck>
>>>>>>>>
>>>>>>>>    <!--
>>>>>>>>      - directory in which lucene will store its indexes. Note: this is
>>>>>>>>      - real fs, not VFS.
>>>>>>>>      -->
>>>>>>>>   <indexDir>C:\luceneindex\</indexDir>
>>>>>>>>   <!-- <indexDir>F:\luceneindex\</indexDir> -->
>>>>>>>>
>>>>>>>>    <!--
>>>>>>>>      - The analyzer is used for parsing documents. Choose one 
>>>>>>>> for your
>>>>>>>>      - language. If language is English, use the StandardAnalyzer.
>>>>>>>>      - There are additional analyzers at 
>>>>>>>> http://jakarta.apache.org/lucene
>>>>>>>>      -->
>>>>>>>>   <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
>>>>>>>>
>>>>>>>>   <!-- <analyzer>org.apache.lucene.analysis.de.GermanAnalyzer</analyzer> -->
>>>>>>>>
>>>>>>>>    <!--
>>>>>>>>      - If subsearch is true, subfolders will be searched by 
>>>>>>>> default.
>>>>>>>>      - This can be turned on/off per directory.
>>>>>>>>      -->
>>>>>>>>   <subsearch>true</subsearch>
>>>>>>>>
>>>>>>>>    <!--
>>>>>>>>      - Name of the project to index. Online is recommended.
>>>>>>>>      -->
>>>>>>>>   <project>online</project>
>>>>>>>>  
>>>>>>>>    <!--
>>>>>>>>      - docFactories determine how documents are processed. 
>>>>>>>> Generally, one
>>>>>>>>      - docFactory exists for each type of content (viz. JSP, 
>>>>>>>> Page, Plain)
>>>>>>>>      - that you want to index.
>>>>>>>>      -->
>>>>>>>>   <docFactories>
>>>>>>>>  
>>>>>>>>       <!--
>>>>>>>>         - This docFactory indexes documents with type page 
>>>>>>>> (e.g. HTML
>>>>>>>>         - files edited with the WYSIWYG editor).
>>>>>>>>         -
>>>>>>>>         - Note that the 'type' attribute specifies which 
>>>>>>>> content definition
>>>>>>>>         - to use. Built in content types include page, plain, 
>>>>>>>> binary, and jsp
>>>>>>>>         - (there are others, too). Custom content types can be 
>>>>>>>> used as well
>>>>>>>>         - (see the contentDefinitions section below).
>>>>>>>>         -->
>>>>>>>>       <docFactory enabled="true" type="page">
>>>>>>>>         
>>>>>>>> <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
>>>>>>>>       </docFactory>
>>>>>>>>
>>>>>>>>       <!--
>>>>>>>>         - This docFactory is a little more complex. It takes 
>>>>>>>> documents of
>>>>>>>>         - type "plain" and determines, by extension, what class 
>>>>>>>> should be
>>>>>>>>         - used to index each particular file. In this example, 
>>>>>>>> we want to
>>>>>>>>         - index plain text files exactly as they are, but any 
>>>>>>>> files that
>>>>>>>>         - contain tags need the tags stripped out before they 
>>>>>>>> are indexed.
>>>>>>>>         -
>>>>>>>>         - Note that the name="" attribute is simply for pretty 
>>>>>>>> output, and
>>>>>>>>         - can contain any allowable PCDATA text.
>>>>>>>>         -->
>>>>>>>>       <docFactory enabled="true" type="plain">
>>>>>>>>          <fileType name="plaintext">
>>>>>>>>            <extension>.txt</extension>
>>>>>>>>            
>>>>>>>> <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>>>>>>>>          </fileType>
>>>>>>>>          <fileType name="taggedtext">
>>>>>>>>            <extension>.html</extension>
>>>>>>>>            <extension>.htm</extension>
>>>>>>>>            <extension>.xml</extension>
>>>>>>>>            <!-- This will strip tags before processing -->
>>>>>>>>            
>>>>>>>> <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class> 
>>>>>>>>
>>>>>>>>          </fileType>
>>>>>>>>       </docFactory>
>>>>>>>>
>>>>>>>>        <!-- This is for binary files. PDF and DOC files are 
>>>>>>>> binary, as are
>>>>>>>>          - CLASS and JAR files.
>>>>>>>>          -->
>>>>>>>>       <docFactory enabled="true" type="binary">
>>>>>>>>          <!-- This is for indexing PDF files -->
>>>>>>>>          <fileType name="PDF">
>>>>>>>>            <extension>.pdf</extension>
>>>>>>>>            
>>>>>>>> <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
>>>>>>>>          </fileType>
>>>>>>>>          <!-- This is for indexing MS Word documents -->
>>>>>>>>          <fileType name="Word">
>>>>>>>>            <extension>.doc</extension>
>>>>>>>>            <extension>.dot</extension>
>>>>>>>>            
>>>>>>>> <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
>>>>>>>>          </fileType>
>>>>>>>>       </docFactory>
>>>>>>>>
>>>>>>>>       <!--
>>>>>>>>         - This will strip JSP tags and all scriptlets. IT WILL 
>>>>>>>> NOT RENDER THE
>>>>>>>>         - JSP FIRST, as JSPs are, by nature, dynamic.
>>>>>>>>         -
>>>>>>>>         - Usually, this is off by default.
>>>>>>>>         -->
>>>>>>>>       <docFactory enabled="false" type="jsp">
>>>>>>>>         
>>>>>>>> <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
>>>>>>>>       </docFactory>
>>>>>>>>
>>>>>>>>       <!-- For the news module. Enable if you use news -->
>>>>>>>>
>>>>>>>> <!--       <docFactory enabled="false" type="news">
>>>>>>>>         
>>>>>>>> <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
>>>>>>>>       </docFactory>
>>>>>>>> -->
>>>>>>>>
>>>>>>>>       <!-- For the forum module. Enable if you use forums. -->
>>>>>>>> <!--
>>>>>>>>       <docFactory enabled="false" type="forum">
>>>>>>>>         
>>>>>>>> <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class> 
>>>>>>>>
>>>>>>>>       </docFactory>
>>>>>>>> -->
>>>>>>>>
>>>>>>>>       <!-- If you need to index XML Template files (bad idea) 
>>>>>>>> use this: -->
>>>>>>>>       <docFactory enabled="false" type="XML Template"/>
>>>>>>>>   </docFactories>
>>>>>>>>  
>>>>>>>>    <!--
>>>>>>>>      - <directories/> determines which directories are indexed. 
>>>>>>>> By default,
>>>>>>>>      - the /system directory is never indexed, so it is safe to 
>>>>>>>> index root.
>>>>>>>>      -
>>>>>>>>      - If you want to specify only certain directories for 
>>>>>>>> indexing, create
>>>>>>>>      - one <directory/> entry per directory. Again, you may use 
>>>>>>>> subsearch to
>>>>>>>>      - override the default subsearch setting discussed above.
>>>>>>>>      -->
>>>>>>>>   <directories>
>>>>>>>>       <directory location="/">
>>>>>>>>         <section>Root</section>
>>>>>>>>         <subsearch>true</subsearch>
>>>>>>>>       </directory>
>>>>>>>>   </directories>
>>>>>>>>
>>>>>>>>   <!--
>>>>>>>>     - Use this section to define specific contentDefinitions. 
>>>>>>>> Provided below
>>>>>>>>     - are entries for the news and forum modules.
>>>>>>>>     - (Uncomment these only after you have installed the 
>>>>>>>> corresponding
>>>>>>>>     - modules)
>>>>>>>>     -->
>>>>>>>>   <contentDefinitions>
>>>>>>>>       <!--
>>>>>>>>       <contentDefinition type="news">
>>>>>>>>        -->
>>>>>>>>          <!--
>>>>>>>>            - <class /> determines the class of the content 
>>>>>>>> definition. Should
>>>>>>>>            - be a subclass of 
>>>>>>>> com.opencms.defaults.A_CmsContentDefinition.
>>>>>>>>            -->
>>>>>>>>         <!--
>>>>>>>>         
>>>>>>>> <class>com.opencms.modules.homepage.news.NewsContentDefinition</class> 
>>>>>>>>
>>>>>>>>          -->
>>>>>>>>          <!--
>>>>>>>>            - <initClass /> is optional and has to implement
>>>>>>>>            - 
>>>>>>>> net.grcomputing.opencms.search.lucene.I_ContentDefinitionInitialization. 
>>>>>>>>
>>>>>>>>            - It provides you with the ability to perform some
>>>>>>>>            - initialization before the content definition class 
>>>>>>>> can be used.
>>>>>>>>            - In case of the news module the 
>>>>>>>> NewsChannelContentDefinition class
>>>>>>>>            - has to be loaded.
>>>>>>>>            -->
>>>>>>>>         <!--
>>>>>>>>         
>>>>>>>> <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass> 
>>>>>>>>
>>>>>>>>          -->
>>>>>>>>           <!--
>>>>>>>>             - <listMethod /> defines the method of the content 
>>>>>>>> definition class
>>>>>>>>             - which should be used to retrieve all content 
>>>>>>>> definition objects
>>>>>>>>             - (or any subset).
>>>>>>>>             - Usually you use this method also in the 
>>>>>>>> backoffice or any other
>>>>>>>>             - list view.
>>>>>>>>             -->
>>>>>>>>         <!--
>>>>>>>>         <listMethod name="getNewsList">
>>>>>>>>           <param type="java.lang.Integer">1</param>
>>>>>>>>           <param type="java.lang.String">-1</param>
>>>>>>>>         </listMethod>
>>>>>>>>          -->
>>>>>>>>           <!--
>>>>>>>>             - <page /> determines a page in the virtual file 
>>>>>>>> system that can
>>>>>>>>             - display a single entry of a content definition. 
>>>>>>>> You must provide
>>>>>>>>             - also a method of the content definition class 
>>>>>>>> that retrieves an
>>>>>>>>             - id (or something else that has to be appended to 
>>>>>>>> your page uri
>>>>>>>>             - to determine which entry has to be displayed). 
>>>>>>>> The result will
>>>>>>>>             - look like:
>>>>>>>>             - /news.html?__element=entry&newsid=<result of 
>>>>>>>> getIntId>
>>>>>>>>             - for each content definition instance object.
>>>>>>>>             -->
>>>>>>>>         <!--
>>>>>>>>         <page uri="/news.html?__element=entry">
>>>>>>>>           <param method="getIntId" name="newsid"/>
>>>>>>>>         </page>
>>>>>>>>          -->
>>>>>>>>         <!--
>>>>>>>>           <page uri="/singleNews.jsp">
>>>>>>>>             <param method="getIntId" name="id"/>
>>>>>>>>           </page>
>>>>>>>>           -->
>>>>>>>>       <!--
>>>>>>>>       </contentDefinition>
>>>>>>>>        -->
>>>>>>>>        <!-- for Forums modules
>>>>>>>>       <contentDefinition type="forum">
>>>>>>>>         
>>>>>>>> <class>de.wfnetz.opencms.modules.forum.ContributionContentDefinition</class> 
>>>>>>>>
>>>>>>>>         <listMethod name="getSortedList">
>>>>>>>>           <param type="java.lang.String"/>
>>>>>>>>         </listMethod>
>>>>>>>>         <page 
>>>>>>>> uri="/forum.html?forumtemplate=viewcontributionentry">
>>>>>>>>           <param method="getId" name="conid"/>
>>>>>>>>         </page>
>>>>>>>>       </contentDefinition>
>>>>>>>>       -->
>>>>>>>>   </contentDefinitions>
>>>>>>>> </luceneSearch>
>>>>>>>>
>>>>>>>>    </system>
>>>>>>>> ---------- cut ------
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> This mail is sent to you from the opencms-dev mailing list
>>>>>>>> To change your list options, or to unsubscribe from the list, 
>>>>>>>> please visit
>>>>>>>> http://mail.opencms.org/mailman/listinfo/opencms-dev


