[opencms-dev] lucene indexing doesn't start

Konstantins Dorodovs K.Dorodovs at mebius.lv
Mon May 17 19:37:02 CEST 2004


Unfortunately, the matched document contains neither a "Title" nor a
"title" field, only these fields:

abs_path
initial_add
last_modified
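A quick way to see which expected field names are absent, and whether one exists under a different capitalisation, is to compare the names as plain strings. The sketch below is only an illustration: FieldNameCheck and its methods are hypothetical helpers, and it works on a set of names copied from the index by hand, not on the Lucene classes themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in helper: compares plain field-name strings,
// it does not call into Lucene or OpenCms.
final class FieldNameCheck {

    /** Returns the expected names that have no exact (case-sensitive) match. */
    static List<String> missing(Set<String> present, List<String> expected) {
        List<String> result = new ArrayList<>();
        for (String name : expected) {
            if (!present.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    /** Returns the present names that equal 'wanted' when case is ignored. */
    static List<String> caseVariants(Set<String> present, String wanted) {
        List<String> result = new ArrayList<>();
        for (String name : present) {
            if (name.equalsIgnoreCase(wanted)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The three field names reported for the matched document.
        Set<String> present = new TreeSet<>(
                Arrays.asList("abs_path", "initial_add", "last_modified"));

        // Which of the names we hoped for are absent?
        System.out.println(missing(present, Arrays.asList("Title", "title", "abs_path")));

        // Is there a title field under any capitalisation?
        System.out.println(caseVariants(present, "title"));
    }
}
```

With the three names above, the first call reports both "Title" and "title" as missing and the second finds no variant at all, which suggests the field was never written rather than written under a different case.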

I played with the Lucene module source a bit and added the following
code to BodylessDocument.Document (CmsObject, CmsFile):
     
        if((title = cmso.readProperty(absPath, "Title")) != null) {
            doc.add(Field.Text(FIELD_TITLE, title));
            doc.add(Field.UnStored(FIELD_BULK, title));
        }

It didn't help, although I'm not sure whether my code was actually
loaded. I assumed that a module change doesn't require me to restart
the web server.

Any ideas?
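One way to check whether a changed class is really the one in use is to ask the JVM where it loaded the class from. The sketch below uses only standard Java calls (Class.getProtectionDomain and CodeSource.getLocation). ClassOrigin is a hypothetical helper name, and calling it with BodylessDocument.class from a JSP is an assumption about how it would be used here, not something the module provides.

```java
import java.security.CodeSource;

// Hypothetical helper: reports the jar or directory a class was loaded from.
final class ClassOrigin {

    /** Returns the code source location of the class, or a marker if there is none. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // Classes from the JVM's own runtime have no code source.
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // A core JDK class: no code source is available.
        System.out.println(locationOf(String.class));
        // An application class: prints the directory or jar it came from.
        System.out.println(locationOf(ClassOrigin.class));
    }
}
```

If the location printed for the module class points at an old jar or an unexpected directory, the edited code is not the code being run. Even when the location is right, a servlet container may keep an already loaded class in memory until the web application is reloaded or restarted.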


M Butcher wrote:

>
> I think it's "Title", not "title"... but I can't remember. Some of the 
> params are upper case and others are lower case.
>
> Matt
>
> Konstantins Dorodovs wrote:
>
>> It's OK, the task did run, only later than expected.
>>
>> A new problem is:
>>
>> doc.get("title") returns null when Lucene indexes on Linux.
>>
>> When I run it on Windows it seems OK.
>>
>> Konstantin
>>
>>
>>
>> M Butcher wrote:
>>
>>>
>>> Are any other cron tasks executing? It sounds like the 
>>> CronIndexManager is never being run.
>>>
>>> If you suspect otherwise, a simple test is to run the 
>>> CronIndexManager from a JSP. That would print any exceptions 
>>> directly to the browser window, which would be helpful.
>>>
>>> CmsJspActionElement cmsjsp =
>>>     new CmsJspActionElement(pageContext, request, response);
>>> CronIndexManager c = new CronIndexManager();
>>> c.launch(cmsjsp.getCmsObject(), "createIndex=true");
>>>
>>> Matt
>>>
>>> Konstantins Dorodovs wrote:
>>>
>>>> looked in %CATALINA_HOME%\logs\localhost_log.MYDATE.txt
>>>> no relevant errors there :(
>>>>
>>>>
>>>>
>>>>
>>>> M Butcher wrote:
>>>>
>>>>>
>>>>> Any errors in the catalina.log file?
>>>>>
>>>>> Matt
>>>>>
>>>>> Konstantins Dorodovs wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a problem with Lucene indexing
>>>>>> (OpenCms version: 5.0.6b1, Lucene module: 1.5, Tomcat: 4.1.30).
>>>>>>
>>>>>> The cron job doesn't seem to start (I looked at the log).
>>>>>> Entry in Scheduler(
>>>>>> 11 21 * * * admin Administrators 
>>>>>> net.grcomputing.opencms.search.lucene.CronIndexManager 
>>>>>> createIndex=true
>>>>>> )
>>>>>>
>>>>>> It seems I did everything according to the docs
>>>>>> (cron is enabled: [11.05.2004 20:10:04] <opencms_init> . OpenCms 
>>>>>> scheduler    : enabled).
>>>>>> Below is an excerpt from my registry.xml:
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Konstantin
>>>>>>
>>>>>>
>>>>>> ---------- cut ------
>>>>>>        <tempfileproject>3</tempfileproject>
>>>>>>
>>>>>> <luceneSearch>
>>>>>>    <!--
>>>>>>      - mergeFactor and permCheck are currently ignored.
>>>>>>      -->
>>>>>>   <mergeFactor>100000</mergeFactor>
>>>>>>   <permCheck>true</permCheck>
>>>>>>
>>>>>>    <!--
>>>>>>      - directory in which lucene will store its indexes. Note: 
>>>>>> this is real
>>>>>>      - fs, not VFS.
>>>>>>      -->
>>>>>>   <indexDir>C:\luceneindex\</indexDir>
>>>>>>   <!-- <indexDir>F:\luceneindex\</indexDir> -->
>>>>>>
>>>>>>    <!--
>>>>>>      - The analyzer is used for parsing documents. Choose one for 
>>>>>> your
>>>>>>      - language. If language is English, use the StandardAnalyzer.
>>>>>>      - There are additional analyzers at 
>>>>>> http://jakarta.apache.org/lucene
>>>>>>      -->
>>>>>>   <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
>>>>>>
>>>>>>   <!-- <analyzer>org.apache.lucene.analysis.de.GermanAnalyzer</analyzer> -->
>>>>>>
>>>>>>    <!--
>>>>>>      - If subsearch is true, subfolders will be searched by default.
>>>>>>      - This can be turned on/off per directory.
>>>>>>      -->
>>>>>>   <subsearch>true</subsearch>
>>>>>>
>>>>>>    <!--
>>>>>>      - Name of the project to index. Online is recommended.
>>>>>>      -->
>>>>>>   <project>online</project>
>>>>>>  
>>>>>>    <!--
>>>>>>      - docFactories determine how documents are processed. 
>>>>>> Generally, one
>>>>>>      - docFactory exists for each type of content (viz. JSP, 
>>>>>> Page, Plain)
>>>>>>      - that you want to index.
>>>>>>      -->
>>>>>>   <docFactories>
>>>>>>  
>>>>>>       <!--
>>>>>>         - This docFactory indexes documents with type page (e.g. 
>>>>>> HTML
>>>>>>         - files edited with the WYSIWYG editor).
>>>>>>         -
>>>>>>         - Note that the 'type' attribute specifies which content 
>>>>>> definition
>>>>>>         - to use. Built in content types include page, plain, 
>>>>>> binary, and jsp
>>>>>>         - (there are others, too). Custom content types can be 
>>>>>> used as well
>>>>>>         - (see the contentDefinitions section below).
>>>>>>         -->
>>>>>>       <docFactory enabled="true" type="page">
>>>>>>         
>>>>>> <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
>>>>>>       </docFactory>
>>>>>>
>>>>>>       <!--
>>>>>>         - This docFactory is a little more complex. It takes 
>>>>>> documents of
>>>>>>         - type "plain" and determines, by extension, what class 
>>>>>> should be
>>>>>>         - used to index each particular file. In this example, we 
>>>>>> want to
>>>>>>         - index plain text files exactly as they are, but any 
>>>>>> files that
>>>>>>         - contain tags need the tags stripped out before they are 
>>>>>> indexed.
>>>>>>         -
>>>>>>         - Note that the name="" attribute is simply for pretty 
>>>>>> output, and
>>>>>>         - can contain any allowable PCDATA text.
>>>>>>         -->
>>>>>>       <docFactory enabled="true" type="plain">
>>>>>>          <fileType name="plaintext">
>>>>>>            <extension>.txt</extension>
>>>>>>            
>>>>>> <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>>>>>>          </fileType>
>>>>>>          <fileType name="taggedtext">
>>>>>>            <extension>.html</extension>
>>>>>>            <extension>.htm</extension>
>>>>>>            <extension>.xml</extension>
>>>>>>            <!-- This will strip tags before processing -->
>>>>>>            
>>>>>> <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class> 
>>>>>>
>>>>>>          </fileType>
>>>>>>       </docFactory>
>>>>>>
>>>>>>        <!-- This is for binary files. PDF and DOC files are 
>>>>>> binary, as are
>>>>>>          - CLASS and JAR files.
>>>>>>          -->
>>>>>>       <docFactory enabled="true" type="binary">
>>>>>>          <!-- This is for indexing PDF files -->
>>>>>>          <fileType name="PDF">
>>>>>>            <extension>.pdf</extension>
>>>>>>            
>>>>>> <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
>>>>>>          </fileType>
>>>>>>          <!-- This is for indexing MS Word documents -->
>>>>>>          <fileType name="Word">
>>>>>>            <extension>.doc</extension>
>>>>>>            <extension>.dot</extension>
>>>>>>            
>>>>>> <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
>>>>>>          </fileType>
>>>>>>       </docFactory>
>>>>>>
>>>>>>       <!--
>>>>>>         - This will strip JSP tags and all scriptlets. IT WILL 
>>>>>> NOT RENDER THE
>>>>>>         - JSP FIRST, as JSPs are, by nature, dynamic.
>>>>>>         -
>>>>>>         - Usually, this is off by default.
>>>>>>         -->
>>>>>>       <docFactory enabled="false" type="jsp">
>>>>>>         
>>>>>> <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
>>>>>>       </docFactory>
>>>>>>
>>>>>>       <!-- For the news module. Enable if you use news -->
>>>>>>
>>>>>> <!--       <docFactory enabled="false" type="news">
>>>>>>         
>>>>>> <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
>>>>>>       </docFactory>
>>>>>> -->
>>>>>>
>>>>>>       <!-- For the forum module. Enable if you use forums. -->
>>>>>> <!--
>>>>>>       <docFactory enabled="false" type="forum">
>>>>>>         
>>>>>> <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class>
>>>>>>       </docFactory>
>>>>>> -->
>>>>>>
>>>>>>       <!-- If you need to index XML Template files (bad idea) use 
>>>>>> this: -->
>>>>>>       <docFactory enabled="false" type="XML Template"/>
>>>>>>   </docFactories>
>>>>>>  
>>>>>>    <!--
>>>>>>      - <directories/> determines which directories are indexed. 
>>>>>> By default,
>>>>>>      - the /system directory is never indexed, so it is safe to 
>>>>>> index root.
>>>>>>      -
>>>>>>      - If you want to specify only certain directories for 
>>>>>> indexing, create
>>>>>>      - one <directory/> entry per directory. Again, you may use 
>>>>>> subsearch to
>>>>>>      - override the default subsearch setting discussed above.
>>>>>>      -->
>>>>>>   <directories>
>>>>>>       <directory location="/">
>>>>>>         <section>Root</section>
>>>>>>         <subsearch>true</subsearch>
>>>>>>       </directory>
>>>>>>   </directories>
>>>>>>
>>>>>>   <!--
>>>>>>     - Use this section to define specific contentDefinitions. 
>>>>>> Provided below
>>>>>>     - are entries for the news and forum modules.
>>>>>>     - (Uncomment these only after you have installed the 
>>>>>> corresponding
>>>>>>     - modules)
>>>>>>     -->
>>>>>>   <contentDefinitions>
>>>>>>       <!--
>>>>>>       <contentDefinition type="news">
>>>>>>        -->
>>>>>>          <!--
>>>>>>            - <class /> determines the class of the content 
>>>>>> definition. Should
>>>>>>            - be a subclass of 
>>>>>> com.opencms.defaults.A_CmsContentDefinition.
>>>>>>            -->
>>>>>>         <!--
>>>>>>         
>>>>>> <class>com.opencms.modules.homepage.news.NewsContentDefinition</class> 
>>>>>>
>>>>>>          -->
>>>>>>          <!--
>>>>>>            - <initClass /> is optional and has to implement
>>>>>>            - 
>>>>>> net.grcomputing.opencms.search.lucene.I_ContentDefinitionInitialization. 
>>>>>>
>>>>>>            - It provides you with the ability to perform some
>>>>>>            - initialization before the content definition class 
>>>>>> can be used.
>>>>>>            - In case of the news module the 
>>>>>> NewsChannelContentDefinition class
>>>>>>            - has to be loaded.
>>>>>>            -->
>>>>>>         <!--
>>>>>>         
>>>>>> <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass> 
>>>>>>
>>>>>>          -->
>>>>>>           <!--
>>>>>>             - <listMethod /> defines the method of the content 
>>>>>> definition class
>>>>>>             - which should be used to retrieve all content 
>>>>>> definition objects
>>>>>>             - (or any subset).
>>>>>>             - Usually you use this method also in the backoffice 
>>>>>> or any other
>>>>>>             - list view.
>>>>>>             -->
>>>>>>         <!--
>>>>>>         <listMethod name="getNewsList">
>>>>>>           <param type="java.lang.Integer">1</param>
>>>>>>           <param type="java.lang.String">-1</param>
>>>>>>         </listMethod>
>>>>>>          -->
>>>>>>           <!--
>>>>>>             - <page /> determines a page in the virtual file 
>>>>>> system that can
>>>>>>             - display a single entry of a content definition. You 
>>>>>> must provide
>>>>>>             - also a method of the content definition class that 
>>>>>> retrieves an
>>>>>>             - id (or something else that has to be appended to 
>>>>>> your page uri
>>>>>>             - to determine which entry has to be displayed). The 
>>>>>> result will
>>>>>>             - look like:
>>>>>>             - /news.html?__element=entry&newsid=<result of getIntId>
>>>>>>             - for each content definition instance object.
>>>>>>             -->
>>>>>>         <!--
>>>>>>         <page uri="/news.html?__element=entry">
>>>>>>           <param method="getIntId" name="newsid"/>
>>>>>>         </page>
>>>>>>          -->
>>>>>>         <!--
>>>>>>           <page uri="/singleNews.jsp">
>>>>>>             <param method="getIntId" name="id"/>
>>>>>>           </page>
>>>>>>           -->
>>>>>>       <!--
>>>>>>       </contentDefinition>
>>>>>>        -->
>>>>>>        <!-- for Forums modules
>>>>>>       <contentDefinition type="forum">
>>>>>>         
>>>>>> <class>de.wfnetz.opencms.modules.forum.ContributionContentDefinition</class> 
>>>>>>
>>>>>>         <listMethod name="getSortedList">
>>>>>>           <param type="java.lang.String"/>
>>>>>>         </listMethod>
>>>>>>         <page uri="/forum.html?forumtemplate=viewcontributionentry">
>>>>>>           <param method="getId" name="conid"/>
>>>>>>         </page>
>>>>>>       </contentDefinition>
>>>>>>       -->
>>>>>>   </contentDefinitions>
>>>>>> </luceneSearch>
>>>>>>
>>>>>>    </system>
>>>>>> ---------- cut ------
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> This mail is send to you from the opencms-dev mailing list
>>>>>> To change your list options, or to unsubscribe from the list, 
>>>>>> please visit
>>>>>> http://mail.opencms.org/mailman/listinfo/opencms-dev


