[opencms-dev] Re: [opencms-dev] Lucene 1.5 on Linux doesn't work for me, please help

Arash Kaffamanesh kaffamanesh at dmu-world.de
Mon Jul 19 21:56:02 CEST 2004


Darin,

I installed Tomcat 5.0.27, and indexing worked perfectly with the same
standard OpenCms 5.0.1 app. I'm going to reinstall Tomcat 5.0.25 on
Linux, try again, and let you know the result.

Thanks, arash

Here is the Lucene part of my registry.xml:

<luceneSearch>
	<!--
	  - mergeFactor and permCheck are currently ignored.
	  -->
   <mergeFactor>100000</mergeFactor>
   <permCheck>true</permCheck>

   <!--
     - Directory in which Lucene will store its indexes. Note: this is the
     - real file system, not the VFS.
     -->
   <indexDir>/home/ark.old/indexDir/</indexDir>
   <!-- <indexDir>D:\indexDir\</indexDir> -->

   <!--
     - The analyzer is used for parsing documents. Choose one for your
     - language. If the language is English, use the StandardAnalyzer.
     - There are additional analyzers at http://jakarta.apache.org/lucene
     -->
   <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
   <!-- <analyzer>org.apache.lucene.analysis.de.GermanAnalyzer</analyzer> -->

   <!--
     - If subsearch is true, subfolders will be searched by default.
     - This can be turned on/off per directory.
     -->
   <subsearch>true</subsearch>

	<!--
	  - Name of the project to index. Online is recommended.
	  -->
   <project>online</project>
   
   <!--
     - docFactories determine how documents are processed. Generally, one
     - docFactory exists for each type of content (viz. JSP, Page, Plain)
     - that you want to index.
     -->
   <docFactories>
   
      <!--
        - This docFactory indexes documents of type page (e.g. HTML
        - files edited with the WYSIWYG editor).
        -->
      <docFactory enabled="true" type="page">
         <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
      </docFactory>

      <!--
        - This docFactory is a little more complex. It takes documents of
        - type "plain" and determines, by extension, which class should be
        - used to index each particular file. In this example, we want to
        - index plain text files exactly as they are, but any files that
        - contain tags need the tags stripped out before they are indexed.
        -
        - Note that the name="" attribute is only for pretty output, and
        - can contain any allowable PCDATA text.
        -->
      <docFactory enabled="true" type="plain">
         <fileType name="plaintext">
            <extension>.txt</extension>
            <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
         </fileType>
         <fileType name="taggedtext">
            <extension>.html</extension>
            <extension>.htm</extension>
            <extension>.xml</extension>
            <!-- This will strip tags before processing -->
            <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
         </fileType>
      </docFactory>
       
       
      <docFactory enabled="true" type="binary">
         <fileType name="pdftext">
            <extension>.pdf</extension>
            <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
         </fileType>
         <fileType name="wordtext">
            <extension>.doc</extension>
            <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
         </fileType>
      </docFactory>


      <!--
        - This will strip JSP tags and all scriptlets. IT WILL NOT RENDER THE
        - JSP FIRST, as JSPs are, by nature, dynamic.
        -
        - Usually, this is off by default.
        -->
      <docFactory enabled="false" type="jsp">
         <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
      </docFactory>

      <!-- For the news module. Enable if you use news. -->
      <docFactory enabled="false" type="news">
         <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
      </docFactory>

      <!-- For the forum module. Enable if you use forums. -->
      <docFactory enabled="false" type="forum">
         <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class>
      </docFactory>

      <!-- If you need to index XML Template files (bad idea), use this: -->
      <docFactory enabled="false" type="XML Template"/>
   </docFactories>
   
   <!--
     - <directories/> determines which directories are indexed. By default,
     - the /system directory is never indexed, so it is safe to index root.
     -
     - If you want to index only certain directories, create one
     - <directory/> entry per directory. Again, you may use subsearch to
     - override the default subsearch setting discussed above.
     -->
   <directories>
       <directory location="/release/en/DEVELOP/">
         <section>DEVELOP</section>
         <subsearch>true</subsearch>
       </directory>
   </directories>

   <!--
     - Use this section to define specific contentDefinitions. Provided below
     - are entries for the news and forum modules.

   <contentDefinitions>
      <contentDefinition type="news">
         <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
         <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
         <listMethod name="getNewsList">
            <param type="java.lang.Integer">1</param>
            <param type="java.lang.String">-1</param>
         </listMethod>
         <page uri="/news.html?__element=entry">
            <param method="getIntId" name="newsid"/>
         </page>
      </contentDefinition>
      <contentDefinition type="forum">
         <class>de.wfnetz.opencms.modules.forum.ContributionContentDefinition</class>
         <listMethod name="getSortedList">
            <param type="java.lang.String"/>
         </listMethod>
         <page uri="/forum.html?forumtemplate=viewcontributionentry">
            <param method="getId" name="conid"/>
         </page>
      </contentDefinition>
   </contentDefinitions>
   -->
</luceneSearch>
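The "plain" and "binary" docFactory comments above describe choosing the indexing class by file extension. As a rough illustration only, here is a minimal Java sketch of that lookup. `ExtensionMapSketch` and its methods are hypothetical names, not the net.grcomputing module's code; the sketch simply hardcodes the extension-to-class pairs from this registry, and the case-insensitive matching is an assumption the registry does not specify.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch, NOT the net.grcomputing implementation: it only mirrors
 * the extension -> class pairs declared in the docFactory entries above.
 */
public class ExtensionMapSketch {
    private static final String PKG = "net.grcomputing.opencms.search.lucene.";
    private final Map<String, String> byExtension = new HashMap<>();

    public ExtensionMapSketch() {
        // docFactory type="plain"
        register(PKG + "PlainDocument", ".txt");
        register(PKG + "TaggedPlainDocument", ".html", ".htm", ".xml");
        // docFactory type="binary"
        register(PKG + "PDFDocument", ".pdf");
        register(PKG + "WordDocument", ".doc");
    }

    private void register(String className, String... extensions) {
        for (String ext : extensions) {
            byExtension.put(ext, className);
        }
    }

    /** Returns the indexer class name for a VFS path, or null if no fileType matches. */
    public String classFor(String path) {
        // Look only at the last path segment so dots in folder names are ignored.
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            return null; // no extension, so no fileType applies
        }
        // Assumption: matching is case-insensitive (".PDF" is treated like ".pdf").
        String ext = name.substring(dot).toLowerCase(Locale.ROOT);
        return byExtension.get(ext);
    }

    public static void main(String[] args) {
        ExtensionMapSketch map = new ExtensionMapSketch();
        String[] samples = {"/release/en/DEVELOP/notes.txt", "/release/en/DEVELOP/logo.gif"};
        for (String path : samples) {
            System.out.println(path + " -> " + map.classFor(path));
        }
    }
}
```

In this sketch a file with no matching extension (logo.gif above) just yields null and would be skipped; how the real module treats such files is not shown in this thread.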

-----Original Message-----
From: opencms-dev-admin at opencms.org
[mailto:opencms-dev-admin at opencms.org] On Behalf Of Darin Kuntze
Sent: Monday, July 19, 2004 4:19 PM
To: opencms-dev at opencms.org
Subject: RE: [opencms-dev] Re: [opencms-dev] Lucene 1.5 on Linux doesn't work for me, please help


Could you post the pertinent parts of your registry?

-----Original Message-----
From: opencms-dev-admin at opencms.org
[mailto:opencms-dev-admin at opencms.org]
On Behalf Of kaffamanesh at dmu-world.de
Sent: Monday, July 19, 2004 8:44 AM
To: opencms-dev at opencms.org
Subject: [opencms-dev] Re: [opencms-dev] Lucene 1.5 on Linux doesn't work for me, please help



I've deployed the same app under the embedded Tomcat 4.1.29 in JBoss 3.2.3,
and indexing works on Linux exactly as it should. Strange behaviour
;-)



Arash wrote:


> 
> I installed Lucene 1.5 on OpenCms 5.0.1 (on Win2k3 and SuSE 9.1,
> kernel 2.6). On Windows it was set up within 5 minutes with Tomcat
> 4.1.29.
> 
> Now I'm getting the following error on Linux with Tomcat 5.0.25. My
> registry.xml contains:
> 
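> <directories>
>    <directory location="/release/en/DEVELOP/">
>       <section>DEVELOP</section>
>       <subsearch>true</subsearch>
>    </directory>
> </directories>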
> 
> Under DEVELOP I have *.pdf, *.doc, *.txt, *.html and *.htm files.
> 
> My indexDir is:
> 
> /home/ark.old/indexDir/
> 
> Lucene 1.2 could write its indexes to the same directory on the same
> machine.
> 
> Any ideas? Could it be a permissions problem? And what does the
> "Unknown Source" message in the trace below mean?
> 
> Thanks in advance for any help.
> 
> Kind regards,
> Arash
> 
> 
> =====IndexManager=============================================================
> [19.07.2004 14:55:10]  Analyzer: org.apache.lucene.analysis.standard.StandardAnalyzer
> [19.07.2004 14:55:10]  Extension map exists to handle plaintext
> [19.07.2004 14:55:10]  Extension map exists to handle taggedtext
> [19.07.2004 14:55:10]  Extension map exists to handle Word
> [19.07.2004 14:55:10]  Extension map exists to handle PDF
> [19.07.2004 14:55:10]  Page DocumentFactory loaded
> [19.07.2004 14:55:10]  Error running job for com.opencms.core.CmsCronEntry{55 14 * * * admin Administrators net.grcomputing.opencms.search.lucene.CronIndexManager createIndex=true} Error: java.lang.NullPointerException
> 	at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:172)
> 	at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:151)
> 	at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:132)
> 	at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:113)
> 	at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:151)
> 	at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(Unknown Source)
> 	at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(Unknown Source)
> 	at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
> _______________________________________________
> This mail is send to you from the opencms-dev mailing list
> To change your list options, or to unsubscribe from the list, please 
> visit http://mail.opencms.org/mailman/listinfo/opencms-dev
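On the permissions question in the quoted message: the NullPointerException is raised inside org.apache.lucene.store.FSDirectory.create, before any document is indexed. Since java.io.File.list() returns null instead of throwing when a path is not a directory or cannot be read, an unusable directory is one plausible cause; this is a guess, not a confirmed diagnosis. Below is a small standalone Java check. `IndexDirCheck` is a hypothetical helper, and also checking java.io.tmpdir assumes Lucene keeps its lock files there, which may not hold for your Lucene version. Run it as the same user Tomcat runs as, with the same java.io.tmpdir (Tomcat's catalina.sh typically points it at Tomcat's own temp directory). Separately, "Unknown Source" is not an error in itself: it only means the net.grcomputing classes were compiled without debug information, so the JVM cannot print their source file names and line numbers.

```java
import java.io.File;

/**
 * Hypothetical diagnostic helper (not part of OpenCms or Lucene): reports
 * whether a directory can be listed and written by the current JVM user.
 */
public class IndexDirCheck {

    /** Returns "OK" or a short reason why the directory may be unusable. */
    public static String check(File dir) {
        if (!dir.exists()) {
            return "missing";
        }
        if (!dir.isDirectory()) {
            return "not a directory";
        }
        // File.list() returns null (instead of throwing) when the path is not a
        // directory or an I/O error occurs, e.g. missing read permission.
        if (dir.list() == null) {
            return "cannot be listed";
        }
        if (!dir.canWrite()) {
            return "not writable";
        }
        return "OK";
    }

    public static void main(String[] args) {
        // Defaults: the indexDir from the registry above, plus the JVM temp dir
        // (assumed lock-file location); pass other paths as arguments instead.
        String[] paths = args.length > 0
                ? args
                : new String[] {"/home/ark.old/indexDir/", System.getProperty("java.io.tmpdir")};
        for (String p : paths) {
            System.out.println(p + ": " + check(new File(p)));
        }
    }
}
```

If both paths report OK when run as the Tomcat user, permissions are probably not the cause, which would fit the observation that the same app indexes fine under Tomcat 5.0.27 and under Tomcat 4.1.29 in JBoss.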
-- 
arash kaffamanesh gueltlingen
ackerstrasse 145
10115 berlin
tel. 0151 12210107


