[opencms-dev] Lucene 1.5 pdf & doc won't index

Miyuru C. Ratnayake miyuruchanna at yahoo.com
Wed Mar 17 04:28:01 CET 2004


M Butcher,

When I use
<plainDocFactory enabled="true">
plain documents get indexed.

If I use
<docFactory enabled="true" type="plain">
they don't get indexed. This is the configuration I sent you in
the previous mail.
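
One way to double-check what the docFactory entries actually declare, independent of OpenCms, is to parse the fragment with the plain JDK XML parser and list each factory's type, enabled flag and extensions. This is only a minimal sketch: the class name DocFactoryCheck and the method summarize are made up for illustration, and it says nothing about how the Lucene module itself reads registry.xml. An empty "enabled=" in the output would point to a missing or misspelled attribute.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of OpenCms or the Lucene search module.
public class DocFactoryCheck {

    /** Returns one line per docFactory element: type, enabled flag, extensions. */
    static List<String> summarize(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<String> lines = new ArrayList<>();
        NodeList factories = doc.getElementsByTagName("docFactory");
        for (int i = 0; i < factories.getLength(); i++) {
            Element factory = (Element) factories.item(i);

            // Collect every <extension> nested anywhere below this factory.
            List<String> extensions = new ArrayList<>();
            NodeList extNodes = factory.getElementsByTagName("extension");
            for (int j = 0; j < extNodes.getLength(); j++) {
                extensions.add(extNodes.item(j).getTextContent().trim());
            }

            // getAttribute returns "" when the attribute is absent.
            lines.add(factory.getAttribute("type")
                    + " enabled=" + factory.getAttribute("enabled")
                    + " extensions=" + extensions);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Small fragment in the same shape as the binary factory in registry.xml.
        String xml =
                "<docFactories>"
              + "<docFactory enabled='true' type='binary'>"
              + "<fileType name='Word'><extension>.doc</extension></fileType>"
              + "<fileType name='PDF'><extension>.pdf</extension></fileType>"
              + "</docFactory>"
              + "</docFactories>";
        for (String line : summarize(xml)) {
            System.out.println(line);
        }
    }
}
```

Running it against the full docFactories section from registry.xml (read from a file instead of the inline string) would show at a glance which types are enabled and which extensions each one claims.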

thanks,
Miyuru.

--- "Miyuru C. Ratnayake" <miyuruchanna at yahoo.com>
wrote:
> M Butcher,
> 
> =====IndexManager=============================================================
> [17.03.2004 09:18:10] <opencms_info> Analyzer: org.apache.lucene.analysis.standard.StandardAnalyzer
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Certification/
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Certification/123/
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Certification/Resource/
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Certification/Sample/
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Persistence/
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Persistence/Resource/
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Persistence/Sample/
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Persistence/Tips/
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Security/
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Security/Resource/
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/Security/Sample/
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/WebServices/
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/WebServices/Resources/
> [17.03.2004 09:18:10] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/WebServices/Samples/
> [17.03.2004 09:18:11] <opencms_info> IndexManager: indexing /TBOKCMS/Documents/WebServices/Tips/
> [17.03.2004 09:18:11] <opencms_info> IndexManager: 0 documents are being processed
> [17.03.2004 09:18:11] <opencms_info> IndexManager: Index has been optimized.
> [17.03.2004 09:18:11] <opencms_info> Done
> =====IndexManager=============================================================
> 
> 
> The relevant registry.xml section used for this:
> 
> <luceneSearch>
>   <mergeFactor>100000</mergeFactor>
>   <permCheck>true</permCheck>
>   <indexDir>C:\lucene\TBOKCMS\</indexDir>
>   <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
>   <subsearch>true</subsearch>
>   <project>online</project>
>   <docFactories>
>     <docFactory enabled="true" type="page">
>       <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
>     </docFactory>
>     <docFactory enabled="true" type="plain">
>       <fileType name="plaintext">
>         <extension>.txt</extension>
>         <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>       </fileType>
>       <fileType name="taggedtext">
>         <extension>.html</extension>
>         <extension>.htm</extension>
>         <extension>.xml</extension>
>         <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
>       </fileType>
>     </docFactory>
>     <docFactory enabled="true" type="binary">
>       <fileType name="Word">
>         <extension>.doc</extension>
>         <class>net.grcomputing.opencms.search.lucene.WordDocument</class>
>       </fileType>
>       <fileType name="PDF">
>         <extension>.pdf</extension>
>         <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
>       </fileType>
>     </docFactory>
>     <docFactory enabled="false" type="jsp">
>       <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
>     </docFactory>
>     <docFactory enabled="false" type="news">
>       <class>net.grcomputing.opencms.search.lucene.NewsDocument</class>
>     </docFactory>
>     <docFactory enabled="false" type="forum">
>       <class>de.wfnetz.opencms.modules.forum.ContributionDocument</class>
>     </docFactory>
>     <docFactory enabled="false" type="XML Template"/>
>   </docFactories>
>   <directories>
>     <directory location="/TBOKCMS/Documents/">
>       <section>TBOK CMS</section>
>       <subsearch>true</subsearch>
>     </directory>
>   </directories>
>   <contentDefinitions>
>     <contentDefinition type="news">
>       <class>com.opencms.modules.homepage.news.NewsContentDefinition</class>
>       <initClass>net.grcomputing.opencms.search.lucene.NewsInitialization</initClass>
>       <listMethod name="getNewsList">
>         <param type="java.lang.Integer">1</param>
>         <param type="java.lang.String">-1</param>
>       </listMethod>
>       <page uri="/news.html?__element=entry">
>         <param method="getIntId" name="newsid"/>
>       </page>
>     </contentDefinition>
>     <contentDefinition type="forum">
>       <class>de.wfnetz.opencms.modules.forum.ContributionContentDefinition</class>
>       <listMethod name="getSortedList">
>         <param type="java.lang.String"/>
>       </listMethod>
>       <page uri="/forum.html?forumtemplate=viewcontributionentry">
>         <param method="getId" name="conid"/>
>       </page>
>     </contentDefinition>
>   </contentDefinitions>
> </luceneSearch>
> Thanks,
> Miyuru.
> 
> 
> M Butcher <mbutcher at grcomputing.net> wrote:
> 
> Miyuru,
> 
> Can you send the section of the log that shows the
> IndexManager entries?
> It starts:
> 
> =====IndexManager=========================
> 
> And it should show what DocumentFactories and
> extension maps were loaded.
> 
> Matt
> 
> Miyuru C. Ratnayake wrote:
> > Hi,
> > 
> > There are no errors. Only 4 documents get indexed; they are all .txt
> > documents of the plain type. There are .pdf and .doc documents too, but
> > they don't get indexed.
> > 
> > Miyuru
> > 
> > Do you Yahoo!?
> > *Yahoo! Mail* 
> > - More reliable, more storage, less spam
> > 
> 
> _______________________________________________
> This mail is send to you from the opencms-dev
> mailing
> list
> To change your list options, or to unsubscribe from
> the list, please visit
> http://mail.opencms.org/mailman/listinfo/opencms-dev
> 




