[opencms-dev] OpenCMSLucene 1.4 search: Word doc indexing is done but not for html/txt

Hartmann, Waehrisch & Feykes GmbH hartmann at waehrisch-feykes.de
Thu Jan 22 08:35:01 CET 2004


I tried to tell you to make sure that you use the new registry.xml format. The tags "plainDocFactory", "jspDocFactory" and "pageDocFactory" are obsolete and not used anymore. You have to replace them with 
<docFactory type="plain" enabled="true">
<docFactory type="page" enabled="true">
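
For illustration, the converted entries might look roughly like the sketch below. It assumes that the new docFactory elements keep the same child elements (class, fileType, extension) as the obsolete tags they replace, which is how the type="binary" entry in the quoted registry.xml is already written. That nesting is an assumption, not something I have confirmed here, so please check it against the 1.4 module sources.

```xml
<docFactories>
    <!-- Assumed replacement for the obsolete <pageDocFactory> -->
    <docFactory type="page" enabled="true">
        <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
    </docFactory>
    <!-- Assumed replacement for the obsolete <plainDocFactory>;
         child elements copied unchanged from the old entry -->
    <docFactory type="plain" enabled="true">
        <fileType name="plaintext">
            <extension>.txt</extension>
            <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
        </fileType>
        <fileType name="taggedtext">
            <extension>.html</extension>
            <extension>.htm</extension>
            <extension>.xml</extension>
            <!-- This will strip tags before processing -->
            <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
        </fileType>
    </docFactory>
</docFactories>
```

If the change is picked up, the IndexManager log should presumably report extension maps for plaintext and taggedtext as well, not only for doctext.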

Regards,
Stephan

  ----- Original Message ----- 
  From: Ritwik Datta 
  To: opencms-dev at opencms.org 
  Sent: Thursday, January 22, 2004 6:45 AM
  Subject: [opencms-dev] OpenCMSLucene 1.4 search: Word doc indexing is done but not for html/txt


  Dear All,


  I have compiled the OpenCMSLucene 1.4 source from the sourceforge.net CVS repository, and I am now able to index Word documents. However, files with other extensions, such as .html and .txt, are not being indexed, although this worked with version 1.3 of the Lucene module for OpenCms. My registry.xml contains entries for PlainDocument, TaggedPlainDocument and, of course, WordDocument, but the Index Manager does not take any files other than Word documents into consideration.
  Previously I had OpenCMSLucene 1.3. To upgrade, I downloaded all Java files from the latest CVS, compiled them and uploaded the classes to $TOMCAT_HOME/webapps/opencms/WEB-INF/classes/net/grcomputing/opencms/search/lucene, and placed jakarta-poi-1.9.0-dev-20030109.jar and tm-extractors-0.2.jar in the $TOMCAT_HOME/webapps/opencms/WEB-INF/lib folder.
  I am pasting the relevant contents of my registry.xml and the Index Manager log entries below. I need html/txt indexing as well. Please help me; this is urgent.


  <luceneSearch>
              <mergeFactor>100000</mergeFactor>
              <permCheck>true</permCheck>
              <indexDir>/opt/lucene/index/opencms/</indexDir>
              <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
              <subsearch>true</subsearch>
              <project>online</project>
              <docFactories>
                  <pageDocFactory enabled="true">
                      <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
                  </pageDocFactory>
                  <plainDocFactory enabled="true">
                      <fileType name="plaintext">
                          <extension>.txt</extension>
                          <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
                      </fileType>
                      <fileType name="taggedtext">
                          <extension>.html</extension>
                          <extension>.htm</extension>
                          <extension>.xml</extension>
                          <!-- This will strip tags before processing -->
                          <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
                      </fileType>
                  </plainDocFactory>
                  <docFactory type="binary" enabled="true">
                      <fileType name="doctext">
                          <extension>.doc</extension>
                          <extension>.dot</extension>
                          <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
                      </fileType>
                  </docFactory>
                  <jspDocFactory enabled="true">
                      <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
                  </jspDocFactory>
                  <xmlTemplateDocFactory enabled="false"/>
              </docFactories>
              <directories>
                  <directory location="/release/">
                      <section>Test</section>
                      <subsearch>true</subsearch>
                  </directory>
              </directories>
          </luceneSearch>

  =====IndexManager=============================================================
  [22.01.2004 09:46:10] <opencms_info> Analyzer: org.apache.lucene.analysis.standard.StandardAnalyzer
  [22.01.2004 09:46:10] <opencms_info> Extension map exists to handle doctext
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/Assessment_Findings/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/Best_Practices/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/Business_Goals/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/CMC_Product_Information/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/CMM_Action_Plans/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/Coding_Standard/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/Dashboard/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/Defect_Prevention/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/ER_SI_Organisation_Structure/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/Estimation/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/Expert_List/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/FAQ/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/IGC_OSSP_Role_Mapping/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/Metrics_and_Measurements/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/OQPM/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/OSSP/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/Presentation_Library/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/Process_Change_Management/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/Projectwise_Plans/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/PROMPT/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/Readables/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/Sample_CMM_Documents/
  [22.01.2004 09:46:10] <opencms_info> IndexManager: indexing /release/spdb/SCM/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/SEPG/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/SPDB_Notes/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/SPDB_Search/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/SQA/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Notes/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Others/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Data/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/Bilingual_2-tier_Application_to_3-tier_Conversion/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/Citrix/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/Compilation_Problem/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/Driver_Installation/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/FTP_Service_on_Linux/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/Hindi_Email/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/Hindi_Integration_Development_Guidelines/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/HW_Requirement_for_Oracle9i_9iDS_9iASR2/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/Oracle_9i_Application_Server_Release2_Installation/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/Oracle_Forms9i_to_Forms6i_Conversion/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/Oracle_Froms6i_Deployment_on_9iAS/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/ORARRP_Reusable_Components/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/OS_Problem/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Asset_Details/Red_Hat_Advance_Server_Installation/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Project_Info/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Register/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Reusable_Assets/Training_Materials/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/TCM_Plans/
  [22.01.2004 09:46:11] <opencms_info> IndexManager: indexing /release/spdb/TCM/Templates/
  [22.01.2004 09:46:12] <opencms_info> IndexManager: indexing /release/spdb/Timesheet/
  [22.01.2004 09:46:12] <opencms_info> IndexManager: indexing /release/spdb/Training/
  [22.01.2004 09:46:12] <opencms_info> IndexManager: 4 documents are being processed
  [22.01.2004 09:46:13] <opencms_info> IndexManager:  Index has been optimized.
  [22.01.2004 09:46:13] <opencms_info> Done


