[opencms-dev] Lucene search module.

M Butcher mbutcher at grcomputing.net
Wed Aug 20 18:05:02 CEST 2003


The Lucene index is not stored _inside_ the VFS; it is stored on the
underlying file system. So, if you are on UNIX, check the /luceneindex
directory on your real filesystem, not a folder inside OpenCms.

Does that make sense?
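
If you would rather check from Java than from a shell, something like
the following would do. This is only a rough sketch: IndexDirCheck and
listEntries are names I made up for illustration (they are not part of
OpenCms or the search module), and the default path is simply the
<indexDir> value from your registry.xml.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of OpenCms or the Lucene search module.
final class IndexDirCheck {

    // Returns the sorted entry names of dir on the real file system,
    // or an empty list if dir does not exist or is not a directory.
    static List<String> listEntries(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return names;
        }
        try (Stream<Path> entries = Files.list(dir)) {
            entries.forEach(p -> names.add(p.getFileName().toString()));
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Default taken from the <indexDir> value in the quoted registry.xml.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/luceneindex/");
        if (!Files.isDirectory(dir)) {
            System.out.println(dir.toAbsolutePath()
                    + " is not a directory on the real file system");
            return;
        }
        for (String name : listEntries(dir)) {
            System.out.println(name);
        }
    }
}
```

Run it as the same user that runs the servlet container, so that you
see the same file system (and permissions) the indexer sees.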

There are three reasons why the Lucene indices are not stored inside the
CMS:
1) It doesn't make a lot of sense to import that sort of data and
clutter up the VFS. Since the indexer runs only against the live system,
storing the index in the VFS would just mean that you had to publish the
folder every time the indexer ran (once a day by default).
2) Storing the data inside of the CMS would slow down the index scanning
procedure.
3) I would have had to write an extension to Lucene to get it to store
the index in a database instead of in the file system.

Hope that helps,

Matt

On Wed, 2003-08-20 at 03:38, Abhishek Tiwari wrote:
> Hi all,
> 
> For this search module I have created a directory in the root folder
> of opencms and set the path of the indexDir (which should hold all the
> index files) in registry.xml.
> Here is a part of my registry.xml:
> ***************************************************
> <luceneSearch>
>             <mergeFactor>100000</mergeFactor>
>             <permCheck>true</permCheck>
>             <indexDir>/luceneindex/</indexDir>
>             <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
>             <subsearch>true</subsearch>
>             <project>online</project>
>             <docFactories>
>                 <pageDocFactory enabled="true">
>                     <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
>                 </pageDocFactory>
>                 <plainDocFactory enabled="true">
>                     <fileType name="plaintext">
>                         <extension>.txt</extension>
>                         <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>                     </fileType>
>                     <fileType name="taggedtext">
>                         <extension>.html</extension>
>                         <extension>.htm</extension>
>                         <extension>.xml</extension>
>                         <!-- This will strip tags before processing -->
>                         <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
>                     </fileType>
>                 </plainDocFactory>
>                 <jspDocFactory enabled="true">
>                     <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
>                 </jspDocFactory>
>                 <xmlTemplateDocFactory enabled="false"/>
>             </docFactories>
>             <directories>
>                 <directory location="/myfolder/">
>                     <section>Test</section>
>                     <subsearch>true</subsearch>
>                 </directory>
>             </directories>
>         </luceneSearch>
>     </system>
> 
> *************************************************************
> Now the indexer is running fine, indexing 5 files under "myfolder", and
> the logfile viewer shows:
> 
> Successful launch of job com.opencms.core.CmsCronEntry{50 14 * * *
> Admin Administrators
> net.grcomputing.opencms.search.lucene.CronIndexManager
> createIndex=true} Message: CronIndexManager rebuilt the Lucene index
> on Wed Aug 20 14:50:13 GMT+05:30 2003
> 
> But there is still no file in the luceneindex directory after the
> indexing finishes. I tried it 5-6 times, but to no avail.
> Does anybody have a clue about it?
> 
> Thanks,
> Abhishek.
-- 
M Butcher <mbutcher at grcomputing.net>


