[opencms-dev] lucene indexing doesn't start

M Butcher mbutcher at grcomputing.net
Mon May 17 20:01:00 CEST 2004


You do actually have to restart the servlet container. Otherwise, it
keeps using the classes it loaded during startup, so your modified
code never runs.
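
If you want to check which copy of a class the container picked up, a
small helper along these lines can print where a class was loaded from.
This is only a sketch: LoadedClassInfo and describe() are names made up
for this example, not part of OpenCms or the lucene module, and it uses
only standard JDK calls. From a JSP you could pass it something like
BodylessDocument.class.

```java
import java.security.CodeSource;

public class LoadedClassInfo {

    /** Reports which class loader supplied a class and where its bytes came from. */
    public static String describe(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        // Classes from the JVM's bootstrap loader have no code source.
        String location = (source == null || source.getLocation() == null)
                ? "(bootstrap or unknown)"
                : source.getLocation().toString();
        // A null class loader also means the bootstrap loader.
        ClassLoader loader = clazz.getClassLoader();
        return clazz.getName() + " loaded from " + location
                + " by " + (loader == null ? "bootstrap loader" : loader.toString());
    }

    public static void main(String[] args) {
        System.out.println(describe(LoadedClassInfo.class));
        System.out.println(describe(String.class));
    }
}
```

Note that this only shows which jar or directory supplied the class. It
does not show whether that file changed after startup, so the restart
is still needed.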

Matt

Konstantins Dorodovs wrote:
> Unfortunately, the matched document contains neither a "Title" nor a
> "title" field, only these fields:
> 
> abs_path
> initial_add
> last_modified
> 
> I played with the lucene module source a bit and added this code to
> BodylessDocument.Document(CmsObject, CmsFile):
> 
>        // Index the "Title" property, if set, as a stored field and in the bulk text.
>        if ((title = cmso.readProperty(absPath, "Title")) != null) {
>            doc.add(Field.Text(FIELD_TITLE, title));
>            doc.add(Field.UnStored(FIELD_BULK, title));
>        }
> 
> It didn't help, although I'm not sure whether my code was loaded. I
> assumed that a module change doesn't require me to restart the web
> server.
> 
> Any ideas?
> 
> 
> M Butcher wrote:
> 
>>
>> I think it's "Title", not "title"... but I can't remember. Some of the 
>> params are upper case and others are lower case.
>>
>> Matt
>>
>> Konstantins Dorodovs wrote:
>>
>>> It's OK, the task did run, only later than expected.
>>>
>>> A new problem: doc.get("title") returns null when lucene indexes on
>>> Linux. When I run it on Windows, it seems OK.
>>>
>>> Konstantin
>>>
>>>
>>>
>>> M Butcher wrote:
>>>
>>>>
>>>> Are any other cron tasks executing? It sounds like the 
>>>> CronIndexManager is never being run.
>>>>
>>>> If you suspect otherwise, a simple test is to run the 
>>>> CronIndexManager from a JSP. That would print any exceptions 
>>>> directly to the browser window, which would be helpful.
>>>>
>>>> CmsJspActionElement cmsjsp =
>>>>     new CmsJspActionElement(pageContext, request, response);
>>>> CronIndexManager c = new CronIndexManager();
>>>> c.launch(cmsjsp.getCmsObject(), "createIndex=true");
>>>>
>>>> Matt
>>>>
>>>> Konstantins Dorodovs wrote:
>>>>
>>>>> I looked in %CATALINA_HOME%\logs\localhost_log.MYDATE.txt, but
>>>>> there are no relevant errors there :(
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> M Butcher wrote:
>>>>>
>>>>>>
>>>>>> Any errors in the catalina.log file?
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>> Konstantins Dorodovs wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have a problem with lucene indexing
>>>>>>> (opencms version: 5.0.6b1, lucene module: 1.5, tomcat: 4.1.30)
>>>>>>>
>>>>>>> The cron job doesn't seem to start (I looked at the log). The
>>>>>>> entry in the Scheduler is:
>>>>>>>
>>>>>>> 11 21 * * * admin Administrators net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true
>>>>>>>
>>>>>>> As far as I can tell, I followed the docs, and cron is enabled:
>>>>>>>
>>>>>>> [11.05.2004 20:10:04] <opencms_init> . OpenCms scheduler    : enabled
>>>>>>>
>>>>>>> Below is an excerpt from my registry.xml:
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Konstantin
>>>>>>>
>>>>>>>
>>>>>>> ---------- cut ------
>>>>>>>        <tempfileproject>3</tempfileproject>
>>>>>>>
>>>>>>> <luceneSearch>
>>>>>>>    <!--
>>>>>>>      - mergeFactor and permCheck are currently ignored.
>>>>>>>      -->
>>>>>>>   <mergeFactor>100000</mergeFactor>
>>>>>>>   <permCheck>true</permCheck>
>>>>>>>
>>>>>>>    <!--
>>>>>>>      - directory in which lucene will store its indexes. Note: this is
>>>>>>>      - real fs, not VFS.
>>>>>>>      -->
>>>>>>>   <indexDir>C:\luceneindex\</indexDir>
>>>>>>>   <!-- <indexDir>F:\luceneindex\</indexDir> -->
>>>>>>>
>>>>>>>    <!--
>>>>>>>      - The analyzer is used for parsing documents. Choose one for your
>>>>>>>      - language. If language is English, use the StandardAnalyzer.
>>>>>>>      - There are additional analyzers at http://jakarta.apache.org/lucene
>>>>>>>      -->
>>>>>>>   <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
>>>>>>>   <!-- <analyzer>org.apache.lucene.analysis.de.GermanAnalyzer</analyzer> -->
>>>>>>>
>>>>>>>    <!--
>>>>>>>      - If subsearch is true, subfolders will be searched by default.
>>>>>>>      - This can be turned on/off per directory.
>>>>>>>      -->
>>>>>>>   <subsearch>true</subsearch>
>>>>>>>
>>>>>>>    <!--
>>>>>>>      - Name of the project to index. Online is recommended.
>>>>>>>      -->
>>>>>>>   <project>online</project>
>>>>>>>  
>>>>>>>    <!--
>>>>>>>      - docFactories determine how documents are processed. Generally, one
>>>>>>>      - docFactory exists for each type of content (viz. JSP, Page, Plain)
>>>>>>>      - that you want to index.
>>>>>>>      -->
>>>>>>>   <docFactories>
>>>>>>>  
>>>>>>>       <!--
>>>>>>>         - This docFactory indexes documents with type page (e.g. HTML
>>>>>>>         - files edited with the WYSIWYG editor).
>>>>>>>         -
>>>>>>>         - Note that the 'type' attribute specifies which content definition
>>>>>>>         - to use. Built in content types include page, plain, binary, and jsp
>>>>>>>         - (there are others, too). Custom content types can be used as well
>>>>>>>         - (see the contentDefinitions section below).
>>>>>>>         -->
>>>>>>>       <docFactory enabled="true" type="page">
>>>>>>>         <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
>>>>>>>       </docFactory>
>>>>>>>
>>>>>>>       <!--
>>>>>>>         - This docFactory is a little more complex. It takes documents of
>>>>>>>         - type "plain" and determines, by extension, what class should be
>>>>>>>         - used to index each particular file. In this example, we want to
>>>>>>>         - index plain text files exactly as they are, but any files that
>>>>>>>         - contain tags need the tags stripped out before they are indexed.
>>>>>>>         -
>>>>>>>         - Note that the name="" attribute is simply for pretty output, and
>>>>>>>         - can contain any allowable PCDATA text.
>>>>>>>         -->
>>>>>>>       <docFactory enabled="true" type="plain">
>>>>>>>          <fileType name="plaintext">
>>>>>>>            <extension>.txt</extension>
>>>>>>>            <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>>>>>>>          </fileType>
>>>>>>>          <fileType name="taggedtext">
>>>>>>>            <extension>.html</extension>
>>>>>>>            <extension>.htm</extension>
>>>>>>>            <extension>.xml</extension>
>>>>>>>            <!-- This will strip tags before processing -->
>>>>>>>            <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
>>>>>>>          </fileType>
>>>>>>>       </docFactory>
>>>>>>>
>>>>>>>        <!-- This is for binary files. PDF and DOC files are binary, as are
>>>>>>>          - CLASS and JAR files.
>>>>>>>          -->
>>>>>>>       <docFactory enabled="true" type="binary">
>>>>>>>          <!-- This is for indexing PDF files -->
>>>>>>>          <fileType name="PDF">
>>>>>>>            <extension>.pdf</extension>
>>>>>>>            <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
>>>>>>>          </fileType>
>>>>>>>          <!-- This is for indexing MS Word documents -->
>>>>>>>          <fileType name="Word">
>>>>>>>            <extension>.doc</extension>
>>>>>>>            <extension>.dot</extension>
>>>>>>>            <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
>>>>>>>          </fileType>
>>>>>>>       </docFactory>
>>>>>>>
>>>>>>>       <!--
>>>>>>>         - This will strip JSP tags and all scriptlets. IT WILL NOT RENDER THE
>>>>>>>         - JSP FIRST, as JSPs are, by nature, dynamic.
>>>>>>>         -
>>>>>>>         - Usually, this is off by default.
>>>>>>>         -->
>>>>>>>       <docFactory enabled="false" type="jsp">
>>>>>>>         <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
>>>>>>>       </docFactory>
>>>>>>>
>>>>>>>       <!-- For the news module. Enable if you use news -->
>>>>>>>       <!--
>>>>>>>       <docFactory enabled="false" type="news">
>>>>>>>         <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
>>>>>>>       </docFactory>
>>>>>>>       -->
>>>>>>>
>>>>>>>       <!-- For the forum module. Enable if you use forums. -->
>>>>>>>       <!--
>>>>>>>       <docFactory enabled="false" type="forum">
>>>>>>>         <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class>
>>>>>>>       </docFactory>
>>>>>>>       -->
>>>>>>>
>>>>>>>       <!-- If you need to index XML Template files (bad idea) use this: -->
>>>>>>>       <docFactory enabled="false" type="XML Template"/>
>>>>>>>   </docFactories>
>>>>>>>  
>>>>>>>    <!--
>>>>>>>      - <directories/> determines which directories are indexed. By default,
>>>>>>>      - the /system directory is never indexed, so it is safe to index root.
>>>>>>>      -
>>>>>>>      - If you want to specify only certain directories for indexing, create
>>>>>>>      - one <directory/> entry per directory. Again, you may use subsearch to
>>>>>>>      - override the default subsearch setting discussed above.
>>>>>>>      -->
>>>>>>>   <directories>
>>>>>>>       <directory location="/">
>>>>>>>         <section>Root</section>
>>>>>>>         <subsearch>true</subsearch>
>>>>>>>       </directory>
>>>>>>>   </directories>
>>>>>>>
>>>>>>>   <!--
>>>>>>>     - Use this section to define specific contentDefinitions. Provided below
>>>>>>>     - are entries for the news and forum modules.
>>>>>>>     - (Uncomment these only after you have installed the corresponding
>>>>>>>     - modules)
>>>>>>>     -->
>>>>>>>   <contentDefinitions>
>>>>>>>       <!--
>>>>>>>       <contentDefinition type="news">
>>>>>>>        -->
>>>>>>>          <!--
>>>>>>>            - <class /> determines the class of the content definition. Should
>>>>>>>            - be a subclass of com.opencms.defaults.A_CmsContentDefinition.
>>>>>>>            -->
>>>>>>>         <!--
>>>>>>>         <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
>>>>>>>          -->
>>>>>>>          <!--
>>>>>>>            - <initClass /> is optional and has to implement
>>>>>>>            - net.grcomputing.opencms.search.lucene.I_ContentDefinitionInitialization.
>>>>>>>            - It provides you with the ability to perform some
>>>>>>>            - initialization before the content definition class can be used.
>>>>>>>            - In case of the news module the NewsChannelContentDefinition class
>>>>>>>            - has to be loaded.
>>>>>>>            -->
>>>>>>>         <!--
>>>>>>>         <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
>>>>>>>          -->
>>>>>>>           <!--
>>>>>>>             - <listMethod /> defines the method of the content definition class
>>>>>>>             - which should be used to retrieve all content definition objects
>>>>>>>             - (or any subset).
>>>>>>>             - Usually you use this method also in the backoffice or any other
>>>>>>>             - list view.
>>>>>>>             -->
>>>>>>>         <!--
>>>>>>>         <listMethod name="getNewsList">
>>>>>>>           <param type="java.lang.Integer">1</param>
>>>>>>>           <param type="java.lang.String">-1</param>
>>>>>>>         </listMethod>
>>>>>>>          -->
>>>>>>>           <!--
>>>>>>>             - <page /> determines a page in the virtual file system that can
>>>>>>>             - display a single entry of a content definition. You must provide
>>>>>>>             - also a method of the content definition class that retrieves an
>>>>>>>             - id (or something else that has to be appended to your page uri
>>>>>>>             - to determine which entry has to be displayed). The result will
>>>>>>>             - look like:
>>>>>>>             - /news.html?__element=entry&newsid=<result of getIntId>
>>>>>>>             - for each content definition instance object.
>>>>>>>             -->
>>>>>>>         <!--
>>>>>>>         <page uri="/news.html?__element=entry">
>>>>>>>           <param method="getIntId" name="newsid"/>
>>>>>>>         </page>
>>>>>>>          -->
>>>>>>>         <!--
>>>>>>>           <page uri="/singleNews.jsp">
>>>>>>>             <param method="getIntId" name="id"/>
>>>>>>>           </page>
>>>>>>>           -->
>>>>>>>       <!--
>>>>>>>       </contentDefinition>
>>>>>>>        -->
>>>>>>>        <!-- for Forums modules
>>>>>>>       <contentDefinition type="forum">
>>>>>>>         <class>de.wfnetz.opencms.modules.forum.ContributionContentDefinition</class>
>>>>>>>         <listMethod name="getSortedList">
>>>>>>>           <param type="java.lang.String"/>
>>>>>>>         </listMethod>
>>>>>>>         <page uri="/forum.html?forumtemplate=viewcontributionentry">
>>>>>>>           <param method="getId" name="conid"/>
>>>>>>>         </page>
>>>>>>>       </contentDefinition>
>>>>>>>       -->
>>>>>>>   </contentDefinitions>
>>>>>>> </luceneSearch>
>>>>>>>
>>>>>>>    </system>
>>>>>>> ---------- cut ------
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> This mail is send to you from the opencms-dev mailing list
>>>>>>> To change your list options, or to unsubscribe from the list, 
>>>>>>> please visit
>>>>>>> http://mail.opencms.org/mailman/listinfo/opencms-dev