[opencms-dev] FILE_CONTENT not found

Darin Kuntze dkuntze at thinksacco.com
Thu Apr 15 21:15:01 CEST 2004


There are about 35 PDF files. Is there any trick to finding the bad one? A
couple of them are larger than 10 MB.
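
One way to narrow it down is to run the extraction step on each file separately and record which ones throw. The sketch below is only an illustration under assumptions: `BadFileFinder`, `Extractor`, and `findFailures` are hypothetical names that do not come from the indexer in this thread, and the extractor is a stand-in for whatever call actually parses a PDF. Each name is printed before parsing, so if the JVM runs out of memory instead of throwing, the last name printed is the likely suspect.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Runs an extraction step over each file on its own so a failure names the file. */
public class BadFileFinder {

    /** Hypothetical stand-in for whatever call extracts text from one file. */
    public interface Extractor {
        void extract(String name) throws Exception;
    }

    /**
     * Tries every name in order and returns the ones whose extraction threw,
     * mapped to a short description of the exception.
     */
    public static Map<String, String> findFailures(List<String> names, Extractor extractor) {
        Map<String, String> failures = new LinkedHashMap<String, String>();
        for (String name : names) {
            // Log before parsing: if the JVM dies from memory exhaustion,
            // the last name printed points at the file being parsed.
            System.out.println("Parsing " + name);
            try {
                extractor.extract(name);
            } catch (Exception e) {
                failures.put(name, e.getClass().getSimpleName() + ": " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Made-up file names and a fake extractor that fails on one of them.
        List<String> names = Arrays.asList("a.pdf", "b.pdf", "c.pdf");
        Map<String, String> failures = findFailures(names, name -> {
            if (name.equals("b.pdf")) {
                throw new IOException("simulated parse error");
            }
        });
        System.out.println("Failed: " + failures);
    }
}
```

In a real run the lambda would be replaced by the actual PDF text-extraction call, and the list of names by the PDFs under the indexed directories. Catching `Exception` does not catch `OutOfMemoryError`, which is why the name is logged before each attempt.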

-----Original Message-----
From: opencms-dev-admin at opencms.org [mailto:opencms-dev-admin at opencms.org]
On Behalf Of M Butcher
Sent: Thursday, April 15, 2004 2:08 PM
To: opencms-dev at opencms.org
Subject: Re: [opencms-dev] FILE_CONTENT not found



That all looks correct... how many PDF files are you indexing? I'm 
thinking that maybe one of those files is the culprit.

Matt

Darin Kuntze wrote:
> Here's the lucene part:
> 
> 
>         <luceneSearch>
>             <mergeFactor>100000</mergeFactor>
>             <permCheck>true</permCheck>
>             <indexDir>/opt/lucene/index/opencms/</indexDir>
>             <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
>             <subsearch>true</subsearch>
>             <project>Online</project>
>             <docFactories>
>                 <docFactory enabled="true" type="page">
>                     <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
>                 </docFactory>
>                 <docFactory enabled="true" type="plain">
>                     <fileType name="plaintext">
>                         <extension>.txt</extension>
>                         <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>                     </fileType>
>                     <fileType name="taggedtext">
>                         <extension>.html</extension>
>                         <extension>.htm</extension>
>                         <extension>.jsp</extension>
>                         <!-- This will strip tags before processing -->
>                         <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
>                     </fileType>
>                 </docFactory>
>                 <docFactory enabled="false" type="jsp">
>                     <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
>                 </docFactory>
>                 <docFactory enabled="false" type="XML Template"/>
>                 <docFactory enabled="true" type="binary">
>                     <fileType name="pdftext">
>                         <extension>.pdf</extension>
>                         <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
>                     </fileType>
>                 </docFactory>
>             </docFactories>
>             <directories>
>                 <directory location="/dept/">
>                     <section>Department</section>
>                     <subsearch>true</subsearch>
>                 </directory>
>                 <directory location="/pdfs/">
>                     <section>PDFs</section>
>                     <subsearch>true</subsearch>
>                 </directory>
>                 <directory location="/primary/">
>                     <section>MainSite</section>
>                     <subsearch>true</subsearch>
>                 </directory>
>                 <directory location="/statements/">
>                     <section>Statements</section>
>                     <subsearch>true</subsearch>
>                 </directory>
>             </directories>
>         </luceneSearch>
> 
> -----Original Message-----
> From: opencms-dev-admin at opencms.org 
> [mailto:opencms-dev-admin at opencms.org]
> On Behalf Of M Butcher
> Sent: Thursday, April 15, 2004 12:29 PM
> To: opencms-dev at opencms.org
> Subject: Re: [opencms-dev] FILE_CONTENT not found
> 
> 
> 
> Hmmm... that doesn't sound like it's choking on index.jsp, does it? What
> does your registry XML look like? I wonder if the PDF Box classes are
> having trouble parsing one or more of your PDF files.
> 
> Matt
> 
> Darin Kuntze wrote:
> 
>>The log message I'm getting looks telling:
>>
>>java.io.IOException: expected='obj' actual='obj<</H[576' pdfSource 0x21
>>        at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:261)
>>        at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:93)
>>        at org.textmining.text.extraction.PDFExtractor.extractText(PDFExtractor.java:37)
>>        at net.grcomputing.opencms.search.lucene.PDFDocument.Document(Unknown Source)
>>        at net.grcomputing.opencms.search.lucene.IndexManager.processFile(Unknown Source)
>>        at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
>>        at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(Unknown Source)
>>        at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(Unknown Source)
>>        at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
>>
>>It eats up all the memory and then kills Tomcat.
>>
>>-----Original Message-----
>>From: opencms-dev-admin at opencms.org
>>[mailto:opencms-dev-admin at opencms.org]
>>On Behalf Of M Butcher
>>Sent: Thursday, April 15, 2004 10:18 AM
>>To: opencms-dev at opencms.org
>>Subject: Re: [opencms-dev] FILE_CONTENT not found
>>
>>
>>Does the file have content? Sounds like the CmsFile.getContents() is 
>>getting an error.
>>
>>You can use this SQL to check the contents in the database:
>>
>>select a.RESOURCE_NAME, b.FILE_CONTENT
>>   from CMS_RESOURCES as a, CMS_FILES as b
>>   where a.FILE_ID = b.FILE_ID and
>>     a.RESOURCE_NAME = '/default/vfs/index.jsp';
>>
>>Matt
>>
>>
>>
>>Darin Kuntze wrote:
>>
>>
>>>I'm getting this error in my opencms.log:
>>>[15.04.2004 02:03:20] <opencms_critical> IndexManager: CMS Error
>>>processing file index.jsp: com.opencms.core.CmsException: 4 Sql 
>>>exception. Detailed error: [com.opencms.file.mySql.CmsDbAccess] Column 
>>>'FILE_CONTENT' not found..
>>>
>>>Something is causing the site to hang... I'm guessing this has 
>>>something to do with it.
>>>
>>>Darin Kuntze
>>>Senior Technologist
>>>The Sacco Group
>>>http://www.thinksacco.com/
>>>402.392.2222 x120
>>>
>>>
>>
>>
>>_______________________________________________
>>This mail is sent to you from the opencms-dev mailing list
>>To change your list options, or to unsubscribe from the list, please
>>visit http://mail.opencms.org/mailman/listinfo/opencms-dev