[opencms-dev] Lucene 1.5 pdf & doc won't index

Miyuru C. Ratnayake miyuruchanna at yahoo.com
Wed Mar 17 04:21:01 CET 2004


M Butcher,

=====IndexManager=============================================================
[17.03.2004 09:18:10] <opencms_info> Analyzer: org.apache.lucene.analysis.standard.StandardAnalyzer
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Certification/
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Certification/123/
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Certification/Resource/
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Certification/Sample/
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Persistence/
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Persistence/Resource/
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Persistence/Sample/
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Persistence/Tips/
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Security/
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Security/Resource/
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Security/Sample/
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/WebServices/
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/WebServices/Resources/
[17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/WebServices/Samples/
[17.03.2004 09:18:11] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/WebServices/Tips/
[17.03.2004 09:18:11] <opencms_info> IndexManager: 0 documents are being processed
[17.03.2004 09:18:11] <opencms_info> IndexManager: Index has been optimized.
[17.03.2004 09:18:11] <opencms_info> Done
=====IndexManager=============================================================
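A log like this can be summarized with a few lines of script to confirm how many directories were walked and how many documents the run reports. The following is only a sketch in Python, not part of OpenCms or the Lucene module; the function names are made up for illustration, and the sample holds just a few entries copied from the log above. It tolerates entries that mail wrapping has split over two lines.

```python
import re

# A few entries copied from the log above (mail wrapping may split an entry).
SAMPLE_LOG = """\
[17.03.2004 09:18:10] <opencms_info> IndexManager:
indexing /TBOKCMS/Documents/
[17.03.2004 09:18:10] <opencms_info> IndexManager:
indexing /TBOKCMS/Documents/Certification/
[17.03.2004 09:18:11] <opencms_info> IndexManager: 0
documents are being processed
[17.03.2004 09:18:11] <opencms_info> Done
"""


def join_entries(text):
    """Merge continuation lines (not starting with '[') into the previous entry."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("="):
            continue  # skip blanks and the =====IndexManager===== separators
        if line.startswith("[") or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def summarize(text):
    """Return (directories walked, reported document count or None)."""
    dirs, processed = [], None
    for entry in join_entries(text):
        m = re.search(r"IndexManager:\s*indexing\s+(\S+)", entry)
        if m:
            dirs.append(m.group(1))
            continue
        m = re.search(r"IndexManager:\s*(\d+)\s+documents are being processed", entry)
        if m:
            processed = int(m.group(1))
    return dirs, processed


dirs, processed = summarize(SAMPLE_LOG)
print(len(dirs), "directories walked;", processed, "documents processed")
```

Run against the full log, the point of interest is the second value: the directories are visited, but the run reports 0 documents being processed.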


The relevant section of registry.xml used for this:

<luceneSearch>
  <mergeFactor>100000</mergeFactor>
  <permCheck>true</permCheck>
  <indexDir>C:\lucene\TBOKCMS\</indexDir>
  <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
  <subsearch>true</subsearch>
  <project>online</project>
  <docFactories>
    <docFactory enabled="true" type="page">
      <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
    </docFactory>
    <docFactory enabled="true" type="plain">
      <fileType name="plaintext">
        <extension>.txt</extension>
        <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
      </fileType>
      <fileType name="taggedtext">
        <extension>.html</extension>
        <extension>.htm</extension>
        <extension>.xml</extension>
        <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
      </fileType>
    </docFactory>
    <docFactory enabled="true" type="binary">
      <fileType name="Word">
        <extension>.doc</extension>
        <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
      </fileType>
      <fileType name="PDF">
        <extension>.pdf</extension>
        <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
      </fileType>
    </docFactory>
    <docFactory enabled="false" type="jsp">
      <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
    </docFactory>
    <docFactory enabled="false" type="news">
      <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
    </docFactory>
    <docFactory enabled="false" type="forum">
      <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class>
    </docFactory>
    <docFactory enabled="false" type="XML Template"/>
  </docFactories>
  <directories>
    <directory location="/TBOKCMS/Documents/">
      <section>TBOK CMS</section>
      <subsearch>true</subsearch>
    </directory>
  </directories>
  <contentDefinitions>
    <contentDefinition type="news">
      <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
      <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
      <listMethod name="getNewsList">
        <param type="java.lang.Integer">1</param>
        <param type="java.lang.String">-1</param>
      </listMethod>
      <page uri="/news.html?__element=entry">
        <param method="getIntId" name="newsid"/>
      </page>
    </contentDefinition>
    <contentDefinition type="forum">
      <class>de.wfnetz.opencms.modules.forum.ContributionContentDefinition</class>
      <listMethod name="getSortedList">
        <param type="java.lang.String"/>
      </listMethod>
      <page uri="/forum.html?forumtemplate=viewcontributionentry">
        <param method="getId" name="conid"/>
      </page>
    </contentDefinition>
  </contentDefinitions>
</luceneSearch>
Thanks,
Miyuru.
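
To compare the configuration with whatever extension maps the log eventually lists, it can help to print what the registry.xml fragment itself declares. The sketch below does that in Python with the standard library's ElementTree. It is an independent check and does not reproduce how the Lucene module actually loads its document factories; the function name `declared_extension_map` is made up for illustration, and the embedded XML is a shortened copy of the section above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <luceneSearch> section from registry.xml above.
CONFIG = """\
<luceneSearch>
  <docFactories>
    <docFactory enabled="true" type="plain">
      <fileType name="plaintext">
        <extension>.txt</extension>
        <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
      </fileType>
    </docFactory>
    <docFactory enabled="true" type="binary">
      <fileType name="Word">
        <extension>.doc</extension>
        <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
      </fileType>
      <fileType name="PDF">
        <extension>.pdf</extension>
        <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
      </fileType>
    </docFactory>
    <docFactory enabled="false" type="jsp">
      <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
    </docFactory>
  </docFactories>
</luceneSearch>
"""


def declared_extension_map(xml_text):
    """Map (resource type, extension) -> class name for enabled docFactories."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for factory in root.iter("docFactory"):
        if factory.get("enabled") != "true":
            continue  # disabled factories are ignored in this sketch
        for file_type in factory.findall("fileType"):
            cls = (file_type.findtext("class") or "").strip()
            for ext in file_type.findall("extension"):
                key = (factory.get("type"), (ext.text or "").strip())
                mapping[key] = cls
    return mapping


for (res_type, ext), cls in sorted(declared_extension_map(CONFIG).items()):
    print(res_type, ext, "->", cls)
```

If .doc and .pdf show up here under the binary type but the IndexManager log never mentions them, the difference lies in how the module reads or applies the configuration rather than in the XML itself.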


M Butcher <mbutcher at grcomputing.net> wrote:

Miyuru,

Can you send the section of the log that shows the
IndexManager entries? It starts:

=====IndexManager=========================

And it should show what DocumentFactories and
extension maps were loaded.

Matt

Miyuru C. Ratnayake wrote:
> Hi,
> 
> There are no errors. Only 4 documents get indexed, and they are all .txt
> documents of the plain type. There are .pdf and .doc documents too, but
> they won't get indexed.
> 
> Miyuru
