[opencms-dev] can't get lucene 1.5 to work

Olli Aro olli_aro at yahoo.co.uk
Wed Jun 23 10:40:01 CEST 2004


Here is a resend, since I seem to have left the crucial word ‘off’ out of my
previous mail :-) The configuration below should work. Your original
configuration fails because the cron job tries to index the content
definitions listed in the configuration, which your site most likely
doesn’t have.

 

Olli

 

<?xml version="1.0" encoding="ISO-8859-1"?>
<registry>
    <system>
        <luceneSearch>
            <!--
              - mergeFactor and permCheck are currently ignored.
              -->
            <mergeFactor>100000</mergeFactor>
            <permCheck>true</permCheck>
            <!--
              - directory in which lucene will store its indexes. Note: this is real
              - fs, not VFS.
              -->
            <indexDir>C:\luceneindex</indexDir>
            <!-- <indexDir>F:\luceneindex\</indexDir> -->
            <!--
              - The analyzer is used for parsing documents. Choose one for your
              - language. If language is English, use the StandardAnalyzer.
              - There are additional analyzers at http://jakarta.apache.org/lucene
              -->
            <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
            <!-- <analyzer>org.apache.lucene.analysis.de.GermanAnalyzer</analyzer> -->
            <!--
              - If subsearch is true, subfolders will be searched by default.
              - This can be turned on/off per directory.
              -->
            <subsearch>true</subsearch>
            <!--
              - Name of the project to index. Online is recommended.
              -->
            <project>online</project>
            <!--
              - docFactories determine how documents are processed. Generally, one
              - docFactory exists for each type of content (viz. JSP, Page, Plain)
              - that you want to index.
              -->
            <docFactories>
                <!--
                  - This docFactory indexes documents with type page (e.g. HTML
                  - files edited with the WYSIWYG editor).
                  -->
                <docFactory enabled="true" type="page">
                    <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
                </docFactory>
                <!--
                  - This docFactory is a little more complex. It takes documents of
                  - type "plain" and determines, by extension, what class should be
                  - used to index each particular file. In this example, we want to
                  - index plain text files exactly as they are, but any files that
                  - contain tags need the tags stripped out before they are indexed.
                  -
                  - Note that the name="" attribute is simply for pretty output, and
                  - can contain any allowable PCDATA text.
                  -->
                <docFactory enabled="true" type="plain">
                    <fileType name="plaintext">
                        <extension>.txt</extension>
                        <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
                    </fileType>
                    <fileType name="taggedtext">
                        <extension>.html</extension>
                        <extension>.htm</extension>
                        <extension>.xml</extension>
                        <!-- This will strip tags before processing -->
                        <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
                    </fileType>
                </docFactory>
                <!--
                  - This will strip JSP tags and all scriptlets. IT WILL NOT RENDER THE
                  - JSP FIRST, as JSPs are, by nature, dynamic.
                  -
                  - Usually, this is off by default.
                  -->
                <docFactory enabled="false" type="jsp">
                    <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
                </docFactory>
                <!-- For the news module. Enable if you use news -->
                <docFactory enabled="false" type="news">
                    <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
                </docFactory>
                <!-- For the forum module. Enable if you use forums. -->
                <docFactory enabled="false" type="forum">
                    <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class>
                </docFactory>
                <!-- If you need to index XML Template files (bad idea) use this: -->
                <docFactory enabled="false" type="XML Template" />
            </docFactories>
            <!--
              - <directories/> determines which directories are indexed. By default,
              - the /system directory is never indexed, so it is safe to index root.
              -
              - If you want to specify only certain directories for indexing, create
              - one <directory/> entry per directory. Again, you may use subsearch to
              - override the default subsearch setting discussed above.
              -->
            <directories>
                <directory location="/">
                    <section>Root</section>
                    <subsearch>true</subsearch>
                </directory>
            </directories>
            <!--
              - Use this section to define specific contentDefinitions. Provided below
              - are entries for the news and forum modules.
              -->
            <contentDefinitions/>
        </luceneSearch>
        <!--
          - END lucene config
          -->
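
For anyone comparing this against a non-working registry.xml, the parts of the configuration above that relate to the cron problem can be trimmed down to the fragment below. This is only a sketch, not a complete file: every element name and class is copied from the configuration above, and the assumption (not confirmed beyond what this mail says) is that an empty contentDefinitions element, together with the module docFactories left disabled, is what stops the cron job from looking for content the site doesn't have.

```xml
<luceneSearch>
    <docFactories>
        <!-- Module-specific factories stay off unless the module is installed. -->
        <docFactory enabled="false" type="news">
            <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
        </docFactory>
        <docFactory enabled="false" type="forum">
            <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class>
        </docFactory>
    </docFactories>
    <!-- Left empty, so there are no content definitions for the cron job to index. -->
    <contentDefinitions/>
</luceneSearch>
```

If you do use the news or forum module, the comments in the full configuration suggest switching the matching docFactory to enabled="true".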

 


 


---
Incoming mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.708 / Virus Database: 464 - Release Date: 18/06/2004



---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.708 / Virus Database: 464 - Release Date: 18/06/2004



 