[opencms-dev] FILE_CONTENT not found
Darin Kuntze
dkuntze at thinksacco.com
Thu Apr 15 21:15:01 CEST 2004
There are about 35 PDF files. Is there any trick to finding the bad one? A
couple of them are larger than 10 MB.
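
One way to narrow this down, as a rough sketch under some assumptions: run the text extraction on each PDF by itself, outside Tomcat, and record which files throw. The names PdfCulpritFinder, findFailing and checkHeader are made up for illustration, and the header check is only a placeholder. The real extraction call the indexer makes (the PDFExtractor/PDFBox step from the stack trace) would have to be passed in as the parser argument.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: runs a parser over each file and records the ones that fail. */
public class PdfCulpritFinder {

    /**
     * Tries every file in turn and returns a map of failing file to error text.
     * The parser is a stand-in for whatever extraction step the indexer uses.
     */
    public static Map<Path, String> findFailing(List<Path> files, Consumer<Path> parser) {
        Map<Path, String> failures = new LinkedHashMap<>();
        for (Path file : files) {
            try {
                parser.accept(file);
            } catch (Throwable t) { // Throwable so an OutOfMemoryError is recorded too
                failures.put(file, t.toString());
            }
        }
        return failures;
    }

    /** Placeholder check: only verifies that the file starts with "%PDF-". */
    static void checkHeader(Path file) {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            byte[] head = new byte[5];
            in.readFully(head); // throws EOFException if the file is shorter than 5 bytes
            if (!"%PDF-".equals(new String(head, StandardCharsets.US_ASCII))) {
                throw new IllegalStateException("missing %PDF- header");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory holding exported copies of the PDFs (assumed; defaults to ".")
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> pdfs;
        try (Stream<Path> entries = Files.list(dir)) {
            pdfs = entries
                    .filter(p -> p.toString().toLowerCase().endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        // Swap checkHeader for the real extraction call to reproduce the parse error.
        Map<Path, String> failures = findFailing(pdfs, PdfCulpritFinder::checkHeader);
        failures.forEach((file, error) -> System.out.println(file + " -> " + error));
    }
}
```

Running this in its own JVM with a small heap (for example java -Xmx64m PdfCulpritFinder /path/to/pdfs) keeps a runaway parse from taking the servlet container down with it, although a file that exhausts the heap may still end the run early.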
-----Original Message-----
From: opencms-dev-admin at opencms.org [mailto:opencms-dev-admin at opencms.org]
On Behalf Of M Butcher
Sent: Thursday, April 15, 2004 2:08 PM
To: opencms-dev at opencms.org
Subject: Re: [opencms-dev] FILE_CONTENT not found
That all looks correct... how many PDF files are you indexing? I'm
thinking that maybe one of those files is the culprit.
Matt
Darin Kuntze wrote:
> Here's the lucene part:
>
>
> <luceneSearch>
>   <mergeFactor>100000</mergeFactor>
>   <permCheck>true</permCheck>
>   <indexDir>/opt/lucene/index/opencms/</indexDir>
>   <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
>   <subsearch>true</subsearch>
>   <project>Online</project>
>   <docFactories>
>     <docFactory enabled="true" type="page">
>       <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
>     </docFactory>
>     <docFactory enabled="true" type="plain">
>       <fileType name="plaintext">
>         <extension>.txt</extension>
>         <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
>       </fileType>
>       <fileType name="taggedtext">
>         <extension>.html</extension>
>         <extension>.htm</extension>
>         <extension>.jsp</extension>
>         <!-- This will strip tags before processing -->
>         <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
>       </fileType>
>     </docFactory>
>     <docFactory enabled="false" type="jsp">
>       <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
>     </docFactory>
>     <docFactory enabled="false" type="XML Template"/>
>     <docFactory enabled="true" type="binary">
>       <fileType name="pdftext">
>         <extension>.pdf</extension>
>         <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
>       </fileType>
>     </docFactory>
>   </docFactories>
>   <directories>
>     <directory location="/dept/">
>       <section>Department</section>
>       <subsearch>true</subsearch>
>     </directory>
>     <directory location="/pdfs/">
>       <section>PDFs</section>
>       <subsearch>true</subsearch>
>     </directory>
>     <directory location="/primary/">
>       <section>MainSite</section>
>       <subsearch>true</subsearch>
>     </directory>
>     <directory location="/statements/">
>       <section>Statements</section>
>       <subsearch>true</subsearch>
>     </directory>
>   </directories>
> </luceneSearch>
>
> -----Original Message-----
> From: opencms-dev-admin at opencms.org
> [mailto:opencms-dev-admin at opencms.org]
> On Behalf Of M Butcher
> Sent: Thursday, April 15, 2004 12:29 PM
> To: opencms-dev at opencms.org
> Subject: Re: [opencms-dev] FILE_CONTENT not found
>
>
>
> Hmmm... that doesn't sound like it's choking on index.jsp, does it? What
> does your registry XML look like? I wonder if the PDFBox classes are
> having trouble parsing one or more of your PDF files.
>
> Matt
>
> Darin Kuntze wrote:
>
>>The log message I'm getting looks telling:
>>
>>java.io.IOException: expected='obj' actual='obj<</H[576' pdfSource 0x21
>>
>>    at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:261)
>>    at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:93)
>>    at org.textmining.text.extraction.PDFExtractor.extractText(PDFExtractor.java:37)
>>    at net.grcomputing.opencms.search.lucene.PDFDocument.Document(Unknown Source)
>>    at net.grcomputing.opencms.search.lucene.IndexManager.processFile(Unknown Source)
>>    at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
>>    at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(Unknown Source)
>>    at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(Unknown Source)
>>    at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
>>
>>It eats up all the memory and then kills Tomcat.
>>
>>-----Original Message-----
>>From: opencms-dev-admin at opencms.org
>>[mailto:opencms-dev-admin at opencms.org]
>>On Behalf Of M Butcher
>>Sent: Thursday, April 15, 2004 10:18 AM
>>To: opencms-dev at opencms.org
>>Subject: Re: [opencms-dev] FILE_CONTENT not found
>>
>>
>>Does the file have content? Sounds like the CmsFile.getContents() is
>>getting an error.
>>
>>You can use this SQL to check the contents in the database:
>>
>>select a.RESOURCE_NAME, b.FILE_CONTENT
>> from CMS_RESOURCES as a, CMS_FILES as b
>> where a.FILE_ID = b.FILE_ID and
>> a.RESOURCE_NAME = '/default/vfs/index.jsp';
>>
>>Matt
>>
>>
>>
>>Darin Kuntze wrote:
>>
>>
>>>I'm getting this error in my opencms.log:
>>>[15.04.2004 02:03:20] <opencms_critical> IndexManager: CMS Error
>>>processing file index.jsp: com.opencms.core.CmsException: 4 Sql
>>>exception. Detailed error: [com.opencms.file.mySql.CmsDbAccess] Column
>>>'FILE_CONTENT' not found..
>>>
>>>Something is causing the site to hang... I'm guessing this has
>>>something to do with it.
>>>
>>><http://www.thinksacco.com/> Darin Kuntze
>>>/Senior Technologist/
>>>*The Sacco Group*
>>>402.392.2222 x120
>>>
>>>
>>
>>
>>_______________________________________________
>>This mail is send to you from the opencms-dev mailing list
>>To change your list options, or to unsubscribe from the list, please
>>visit http://mail.opencms.org/mailman/listinfo/opencms-dev
>
>