[opencms-dev] lucene indexing doesn't start
Konstantins Dorodovs
K.Dorodovs at mebius.lv
Mon May 17 19:37:02 CEST 2004
Unfortunately, the matched document contains neither a "Title" nor a "title" field; its only fields are abs_path, initial_add, and last_modified.
I experimented with the Lucene module source a bit and added the following code to BodylessDocument.Document(CmsObject, CmsFile):
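// read the "Title" property of the resource; null if it is not set
String title = cmso.readProperty(absPath, "Title");
if (title != null) {
    doc.add(Field.Text(FIELD_TITLE, title));    // stored, indexed, tokenized
    doc.add(Field.UnStored(FIELD_BULK, title)); // indexed only, for full-text search
}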
It didn't help, although I'm not sure my code was actually loaded: I assumed that changing a module doesn't require restarting the web server.
Any ideas?
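Matt's reply below suggests the property may be called "Title" rather than "title". If property names are matched case-sensitively, one defensive option is to try both spellings before giving up. The sketch below is not the module's code: the helper `PropertyLookup.readFirst` is hypothetical, and its `Function<String, String>` parameter stands in for a call like `cmso.readProperty(absPath, name)` so the logic can run without OpenCms. Lucene field names are also case-sensitive, so `doc.get("title")` only finds a field added under exactly that name, and a document returned from a search only carries stored fields, so a value added with `Field.UnStored` will not come back through `doc.get`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: returns the first non-null property value among several candidate names. */
public final class PropertyLookup {

    private PropertyLookup() {}

    /**
     * Tries each candidate name in order and returns the first non-null value,
     * or null if none of the names is set.
     *
     * @param reader stand-in for a property read such as cmso.readProperty(absPath, name)
     */
    public static String readFirst(Function<String, String> reader, String... names) {
        for (String name : names) {
            String value = reader.apply(name);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Fake property store for illustration; HashMap keys are case-sensitive.
        Map<String, String> props = new HashMap<>();
        props.put("title", "Welcome page");
        System.out.println(readFirst(props::get, "Title", "title")); // prints Welcome page
    }
}
```

If the real property call throws a checked exception, it would need to be wrapped before it can be passed as a `Function`.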
M Butcher wrote:
>
> I think it's "Title", not "title"... but I can't remember. Some of the
> params are upper case and others are lower case.
>
> Matt
>
> Konstantins Dorodovs wrote:
>
>> It's OK, the task did run, just later than expected.
>>
>> A new problem:
>>
>> doc.get("title") returns null when Lucene indexes on Linux;
>>
>> when I run it on Windows, it seems OK.
>>
>> Konstantin
>>
>>
>>
>> M Butcher wrote:
>>
>>>
>>> Are any other cron tasks executing? It sounds like the
>>> CronIndexManager is never being run.
>>>
>>> If you suspect otherwise, a simple test is to run the
>>> CronIndexManager from a JSP. That would print any exceptions
>>> directly to the browser window, which would be helpful.
>>>
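>>> <%@ page import="com.opencms.flex.jsp.*, net.grcomputing.opencms.search.lucene.CronIndexManager" %>
>>> <%
>>> // run the indexer directly so any exception is shown in the browser
>>> CmsJspActionElement cmsjsp =
>>>     new CmsJspActionElement(pageContext, request, response);
>>> CronIndexManager c = new CronIndexManager();
>>> c.launch(cmsjsp.getCmsObject(), "createIndex=true");
>>> %>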
>>>
>>> Matt
>>>
>>> Konstantins Dorodovs wrote:
>>>
>>>> I looked in %CATALINA_HOME%\logs\localhost_log.MYDATE.txt;
>>>> there are no relevant errors there :(
>>>>
>>>>
>>>>
>>>>
>>>> M Butcher wrote:
>>>>
>>>>>
>>>>> Any errors in the catalina.log file?
>>>>>
>>>>> Matt
>>>>>
>>>>> Konstantins Dorodovs wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a problem with Lucene indexing
>>>>>> (OpenCms version: 5.0.6b1, Lucene module: 1.5, Tomcat: 4.1.30).
>>>>>>
>>>>>> The cron job doesn't seem to start (I checked the log).
>>>>>> This is the entry in the Scheduler:
>>>>>>
>>>>>> 11 21 * * * admin Administrators
>>>>>> net.grcomputing.opencms.search.lucene.CronIndexManager
>>>>>> createIndex=true
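The Scheduler entry above appears to use the familiar cron layout: minute, hour, day of month, month and day of week, followed by the user, group, job class and parameter string. Read that way, `11 21 * * *` fires once a day at 21:11, presumably in the server's local time. The hypothetical `SchedulerEntry` class below is only a sketch that splits such a line on whitespace to make each field explicit; it is not the OpenCms scheduler's own parser, and it assumes the parameter is a single token without spaces.

```java
import java.util.Arrays;

/** Hypothetical value class for one scheduler line such as "11 21 * * * admin Administrators Cls createIndex=true". */
public final class SchedulerEntry {
    final String minute, hour, dayOfMonth, month, dayOfWeek;
    final String user, group, className, parameter;

    private SchedulerEntry(String[] f) {
        minute = f[0]; hour = f[1]; dayOfMonth = f[2]; month = f[3]; dayOfWeek = f[4];
        user = f[5]; group = f[6]; className = f[7];
        // the parameter is optional; assumed to be one whitespace-free token
        parameter = f.length > 8 ? f[8] : "";
    }

    /** Splits on any whitespace (including line breaks); accepts 8 or 9 fields. */
    public static SchedulerEntry parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 8 || f.length > 9) {
            throw new IllegalArgumentException(
                "expected 8 or 9 fields, got " + f.length + ": " + Arrays.toString(f));
        }
        return new SchedulerEntry(f);
    }

    /** True if minute and hour are fixed numbers and all date fields are "*", i.e. once a day. */
    public boolean isDailyAtFixedTime() {
        return minute.matches("\\d+") && hour.matches("\\d+")
            && dayOfMonth.equals("*") && month.equals("*") && dayOfWeek.equals("*");
    }

    public static void main(String[] args) {
        SchedulerEntry e = parse("11 21 * * * admin Administrators "
            + "net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true");
        System.out.printf("%s:%s daily=%b class=%s param=%s%n",
            e.hour, e.minute, e.isDailyAtFixedTime(), e.className, e.parameter);
        // prints: 21:11 daily=true class=net.grcomputing.opencms.search.lucene.CronIndexManager param=createIndex=true
    }
}
```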
>>>>>>
>>>>>> As far as I can tell, I followed the docs (cron is enabled:
>>>>>> "[11.05.2004 20:10:04] <opencms_init> . OpenCms scheduler : enabled").
>>>>>> Below is an excerpt from my registry.xml:
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Konstantin
>>>>>>
>>>>>>
>>>>>> ---------- cut ------
>>>>>> <tempfileproject>3</tempfileproject>
>>>>>>
>>>>>> <luceneSearch>
>>>>>> <!--
>>>>>> - mergeFactor and permCheck are currently ignored.
>>>>>> -->
>>>>>> <mergeFactor>100000</mergeFactor>
>>>>>> <permCheck>true</permCheck>
>>>>>>
>>>>>> <!--
>>>>>> - Directory in which Lucene will store its indexes. Note: this is the
>>>>>> - real file system, not the VFS.
>>>>>> -->
>>>>>> <indexDir>C:\luceneindex\</indexDir>
>>>>>> <!-- <indexDir>F:\luceneindex\</indexDir> -->
>>>>>>
>>>>>> <!--
>>>>>> - The analyzer is used for parsing documents. Choose one for your
>>>>>> - language. If the language is English, use the StandardAnalyzer.
>>>>>> - There are additional analyzers at http://jakarta.apache.org/lucene
>>>>>> -->
>>>>>>
>>>>>> <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
>>>>>>
>>>>>> <!--
>>>>>> <analyzer>org.apache.lucene.analysis.de.GermanAnalyzer</analyzer>
>>>>>> -->
>>>>>>
>>>>>> <!--
>>>>>> - If subsearch is true, subfolders will be searched by default.
>>>>>> - This can be turned on/off per directory.
>>>>>> -->
>>>>>> <subsearch>true</subsearch>
>>>>>>
>>>>>> <!--
>>>>>> - Name of the project to index. Online is recommended.
>>>>>> -->
>>>>>> <project>online</project>
>>>>>>
>>>>>> <!--
>>>>>> - docFactories determine how documents are processed. Generally, one
>>>>>> - docFactory exists for each type of content (viz. JSP, Page, Plain)
>>>>>> - that you want to index.
>>>>>> -->
>>>>>> <docFactories>
>>>>>>
>>>>>> <!--
>>>>>> - This docFactory indexes documents with type page (e.g. HTML
>>>>>> - files edited with the WYSIWYG editor).
>>>>>> -
>>>>>> - Note that the 'type' attribute specifies which content definition
>>>>>> - to use. Built-in content types include page, plain, binary, and jsp
>>>>>> - (there are others, too). Custom content types can be used as well
>>>>>> - (see the contentDefinitions section below).
>>>>>> -->
>>>>>> <docFactory enabled="true" type="page">
>>>>>> <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
>>>>>> </docFactory>
>>>>>>
>>>>>> <!--
>>>>>> - This docFactory is a little more complex. It takes documents of
>>>>>> - type "plain" and determines, by extension, what class should be
>>>>>> - used to index each particular file. In this example, we want to
>>>>>> - index plain text files exactly as they are, but any files that
>>>>>> - contain tags need the tags stripped out before they are indexed.
>>>>>> -
>>>>>> - Note that the name="" attribute is simply for pretty output, and
>>>>>> - can contain any allowable PCDATA text.
>>>>>> -->
>>>>>> <docFactory enabled="true" type="plain">
>>>>>> <fileType name="plaintext">
>>>>>> <extension>.txt</extension>
>>>>>> <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>>>>>> </fileType>
>>>>>> <fileType name="taggedtext">
>>>>>> <extension>.html</extension>
>>>>>> <extension>.htm</extension>
>>>>>> <extension>.xml</extension>
>>>>>> <!-- This will strip tags before processing -->
>>>>>> <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
>>>>>> </fileType>
>>>>>> </docFactory>
>>>>>>
>>>>>> <!-- This is for binary files. PDF and DOC files are binary, as are
>>>>>> - CLASS and JAR files.
>>>>>> -->
>>>>>> <docFactory enabled="true" type="binary">
>>>>>> <!-- This is for indexing PDF files -->
>>>>>> <fileType name="PDF">
>>>>>> <extension>.pdf</extension>
>>>>>> <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
>>>>>> </fileType>
>>>>>> <!-- This is for indexing MS Word documents -->
>>>>>> <fileType name="Word">
>>>>>> <extension>.doc</extension>
>>>>>> <extension>.dot</extension>
>>>>>> <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
>>>>>> </fileType>
>>>>>> </docFactory>
>>>>>>
>>>>>> <!--
>>>>>> - This will strip JSP tags and all scriptlets. IT WILL NOT RENDER THE
>>>>>> - JSP FIRST, as JSPs are, by nature, dynamic.
>>>>>> -
>>>>>> - Usually, this is off by default.
>>>>>> -->
>>>>>> <docFactory enabled="false" type="jsp">
>>>>>> <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
>>>>>> </docFactory>
>>>>>>
>>>>>> <!-- For the news module. Enable if you use news -->
>>>>>>
>>>>>> <!-- <docFactory enabled="false" type="news">
>>>>>> <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
>>>>>> </docFactory>
>>>>>> -->
>>>>>>
>>>>>> <!-- For the forum module. Enable if you use forums. -->
>>>>>> <!--
>>>>>> <docFactory enabled="false" type="forum">
>>>>>> <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class>
>>>>>> </docFactory>
>>>>>> -->
>>>>>>
>>>>>> <!-- If you need to index XML Template files (bad idea), use this: -->
>>>>>> <docFactory enabled="false" type="XML Template"/>
>>>>>> </docFactories>
>>>>>>
>>>>>> <!--
>>>>>> - <directories/> determines which directories are indexed. By default,
>>>>>> - the /system directory is never indexed, so it is safe to index root.
>>>>>> -
>>>>>> - If you want to specify only certain directories for indexing, create
>>>>>> - one <directory/> entry per directory. Again, you may use subsearch to
>>>>>> - override the default subsearch setting discussed above.
>>>>>> -->
>>>>>> <directories>
>>>>>> <directory location="/">
>>>>>> <section>Root</section>
>>>>>> <subsearch>true</subsearch>
>>>>>> </directory>
>>>>>> </directories>
>>>>>>
>>>>>> <!--
>>>>>> - Use this section to define specific contentDefinitions. Provided below
>>>>>> - are entries for the news and forum modules.
>>>>>> - (Uncomment these only after you have installed the corresponding
>>>>>> - modules.)
>>>>>> -->
>>>>>> <contentDefinitions>
>>>>>> <!--
>>>>>> <contentDefinition type="news">
>>>>>> -->
>>>>>> <!--
>>>>>> - <class /> determines the class of the content definition. It should
>>>>>> - be a subclass of com.opencms.defaults.A_CmsContentDefinition.
>>>>>> -->
>>>>>> <!--
>>>>>> <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
>>>>>> -->
>>>>>> <!--
>>>>>> - <initClass /> is optional and has to implement
>>>>>> - net.grcomputing.opencms.search.lucene.I_ContentDefinitionInitialization.
>>>>>> - It lets you perform some initialization before the content
>>>>>> - definition class can be used.
>>>>>> - In the case of the news module, the NewsChannelContentDefinition
>>>>>> - class has to be loaded.
>>>>>> -->
>>>>>> <!--
>>>>>> <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
>>>>>> -->
>>>>>> <!--
>>>>>> - <listMethod /> defines the method of the content definition class
>>>>>> - that should be used to retrieve all content definition objects
>>>>>> - (or any subset).
>>>>>> - Usually you also use this method in the backoffice or any other
>>>>>> - list view.
>>>>>> -->
>>>>>> <!--
>>>>>> <listMethod name="getNewsList">
>>>>>> <param type="java.lang.Integer">1</param>
>>>>>> <param type="java.lang.String">-1</param>
>>>>>> </listMethod>
>>>>>> -->
>>>>>> <!--
>>>>>> - <page /> determines a page in the virtual file system that can
>>>>>> - display a single entry of a content definition. You must also provide
>>>>>> - a method of the content definition class that retrieves an id (or
>>>>>> - something else that has to be appended to your page URI to determine
>>>>>> - which entry has to be displayed). The result will look like:
>>>>>> - /news.html?__element=entry&newsid=<result of getIntId>
>>>>>> - for each content definition instance object.
>>>>>> -->
>>>>>> <!--
>>>>>> <page uri="/news.html?__element=entry">
>>>>>> <param method="getIntId" name="newsid"/>
>>>>>> </page>
>>>>>> -->
>>>>>> <!--
>>>>>> <page uri="/singleNews.jsp">
>>>>>> <param method="getIntId" name="id"/>
>>>>>> </page>
>>>>>> -->
>>>>>> <!--
>>>>>> </contentDefinition>
>>>>>> -->
>>>>>> <!-- For the forum module:
>>>>>> <contentDefinition type="forum">
>>>>>> <class>de.wfnetz.opencms.modules.forum.ContributionContentDefinition</class>
>>>>>> <listMethod name="getSortedList">
>>>>>> <param type="java.lang.String"/>
>>>>>> </listMethod>
>>>>>> <page uri="/forum.html?forumtemplate=viewcontributionentry">
>>>>>> <param method="getId" name="conid"/>
>>>>>> </page>
>>>>>> </contentDefinition>
>>>>>> -->
>>>>>> </contentDefinitions>
>>>>>> </luceneSearch>
>>>>>>
>>>>>> </system>
>>>>>> ---------- cut ------
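The "plain" docFactory in the excerpt above selects an indexing class from the file extensions listed in each `<fileType>`. As a rough illustration only (not how the Lucene module actually reads registry.xml), the hypothetical `ExtensionDispatch` class below uses the JDK's DOM parser to build an extension-to-class table from such a fragment and then looks up a file name. Extensions are matched exactly, including case, which is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: builds an extension-to-class table from one <docFactory> element. */
public final class ExtensionDispatch {

    public static Map<String, String> load(String docFactoryXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(docFactoryXml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> table = new LinkedHashMap<>();
        NodeList fileTypes = doc.getElementsByTagName("fileType");
        for (int i = 0; i < fileTypes.getLength(); i++) {
            Element fileType = (Element) fileTypes.item(i);
            // assumes each <fileType> has one <class> and one or more <extension> entries
            String cls = fileType.getElementsByTagName("class").item(0).getTextContent().trim();
            NodeList exts = fileType.getElementsByTagName("extension");
            for (int j = 0; j < exts.getLength(); j++) {
                table.put(exts.item(j).getTextContent().trim(), cls);
            }
        }
        return table;
    }

    /** Returns the class for the file's extension, or null if no <fileType> matches. */
    public static String classFor(Map<String, String> table, String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? null : table.get(fileName.substring(dot));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<docFactory enabled=\"true\" type=\"plain\">"
            + "<fileType name=\"plaintext\"><extension>.txt</extension>"
            + "<class>net.grcomputing.opencms.search.lucene.PlainDocument</class></fileType>"
            + "<fileType name=\"taggedtext\"><extension>.html</extension><extension>.htm</extension>"
            + "<extension>.xml</extension>"
            + "<class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class></fileType>"
            + "</docFactory>";
        Map<String, String> table = load(xml);
        System.out.println(classFor(table, "/docs/index.htm"));
        // prints net.grcomputing.opencms.search.lucene.TaggedPlainDocument
    }
}
```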
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> This mail is send to you from the opencms-dev mailing list
>>>>>> To change your list options, or to unsubscribe from the list,
>>>>>> please visit
>>>>>> http://mail.opencms.org/mailman/listinfo/opencms-dev