[opencms-dev] FILE_CONTENT not found
Darin Kuntze
dkuntze at thinksacco.com
Thu Apr 15 20:58:02 CEST 2004
Here's the Lucene part of the registry:
<luceneSearch>
  <mergeFactor>100000</mergeFactor>
  <permCheck>true</permCheck>
  <indexDir>/opt/lucene/index/opencms/</indexDir>
  <analyzer>org.apache.lucene.analysis.standard.StandardAnalyzer</analyzer>
  <subsearch>true</subsearch>
  <project>Online</project>
  <docFactories>
    <docFactory enabled="true" type="page">
      <class>net.grcomputing.opencms.search.lucene.PageDocument</class>
    </docFactory>
    <docFactory enabled="true" type="plain">
      <fileType name="plaintext">
        <extension>.txt</extension>
        <class>net.grcomputing.opencms.search.lucene.PlainDocument</class>
      </fileType>
      <fileType name="taggedtext">
        <extension>.html</extension>
        <extension>.htm</extension>
        <extension>.jsp</extension>
        <!-- This will strip tags before processing -->
        <class>net.grcomputing.opencms.search.lucene.TaggedPlainDocument</class>
      </fileType>
    </docFactory>
    <docFactory enabled="false" type="jsp">
      <class>net.grcomputing.opencms.search.lucene.JspDocument</class>
    </docFactory>
    <docFactory enabled="false" type="XML Template"/>
    <docFactory enabled="true" type="binary">
      <fileType name="pdftext">
        <extension>.pdf</extension>
        <class>net.grcomputing.opencms.search.lucene.PDFDocument</class>
      </fileType>
    </docFactory>
  </docFactories>
  <directories>
    <directory location="/dept/">
      <section>Department</section>
      <subsearch>true</subsearch>
    </directory>
    <directory location="/pdfs/">
      <section>PDFs</section>
      <subsearch>true</subsearch>
    </directory>
    <directory location="/primary/">
      <section>MainSite</section>
      <subsearch>true</subsearch>
    </directory>
    <directory location="/statements/">
      <section>Statements</section>
      <subsearch>true</subsearch>
    </directory>
  </directories>
</luceneSearch>
-----Original Message-----
From: opencms-dev-admin at opencms.org [mailto:opencms-dev-admin at opencms.org]
On Behalf Of M Butcher
Sent: Thursday, April 15, 2004 12:29 PM
To: opencms-dev at opencms.org
Subject: Re: [opencms-dev] FILE_CONTENT not found
Hmmm... that doesn't sound like it's choking on index.jsp, does it? What
does your registry XML look like? I wonder if the PDFBox classes are
having trouble parsing one or more of your PDF files.
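
One way to narrow down which PDF files the parser cannot handle is to run extraction file by file and collect the failures, rather than letting the first exception end the run. The sketch below only illustrates that idea and is not the search module's actual code: `PdfTriage`, `TextExtractor`, and `findUnparsable` are hypothetical names, and the extractor is a stand-in for whatever PDFBox-based call the module makes. It is written for a current JDK and catches only `IOException`, so on its own it would not stop a runaway parse from exhausting memory.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PdfTriage {

    /** Hypothetical stand-in for whatever extracts text from a PDF. */
    interface TextExtractor {
        String extractText(byte[] pdfBytes) throws IOException;
    }

    /**
     * Tries every file and returns the names of those whose extraction
     * threw an IOException, instead of stopping at the first failure.
     */
    static List<String> findUnparsable(Map<String, byte[]> files,
                                       TextExtractor extractor) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : files.entrySet()) {
            try {
                extractor.extractText(entry.getValue());
            } catch (IOException e) {
                // Record the offending file and keep going with the rest.
                System.err.println("Could not parse " + entry.getKey()
                        + ": " + e.getMessage());
                failed.add(entry.getKey());
            }
        }
        return failed;
    }
}
```

The returned list could then be used to move the suspect PDFs out of the indexed directories and re-run the index job without them.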
Matt
Darin Kuntze wrote:
> The log message I'm getting looks telling:
>
> java.io.IOException: expected='obj' actual='obj<</H[576' pdfSource 0x21
>     at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:261)
>     at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:93)
>     at org.textmining.text.extraction.PDFExtractor.extractText(PDFExtractor.java:37)
>     at net.grcomputing.opencms.search.lucene.PDFDocument.Document(Unknown Source)
>     at net.grcomputing.opencms.search.lucene.IndexManager.processFile(Unknown Source)
>     at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
>     at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(Unknown Source)
>     at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(Unknown Source)
>     at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
>
> It eats up all the memory then kills tomcat.
>
> -----Original Message-----
> From: opencms-dev-admin at opencms.org
> [mailto:opencms-dev-admin at opencms.org]
> On Behalf Of M Butcher
> Sent: Thursday, April 15, 2004 10:18 AM
> To: opencms-dev at opencms.org
> Subject: Re: [opencms-dev] FILE_CONTENT not found
>
>
> Does the file have content? Sounds like the CmsFile.getContents() is
> getting an error.
>
> You can use this SQL to check the contents in the database:
>
> select a.RESOURCE_NAME, b.FILE_CONTENT
> from CMS_RESOURCES as a, CMS_FILES as b
> where a.FILE_ID = b.FILE_ID and
> a.RESOURCE_NAME = '/default/vfs/index.jsp';
>
> Matt
>
>
>
> Darin Kuntze wrote:
>
>>I'm getting this error in my opencms.log:
>>[15.04.2004 02:03:20] <opencms_critical> IndexManager: CMS Error
>>processing file index.jsp: com.opencms.core.CmsException: 4 Sql
>>exception. Detailed error: [com.opencms.file.mySql.CmsDbAccess] Column
>>'FILE_CONTENT' not found..
>>
>>Something is causing the site to hang... I'm guessing this has
>>something to do with it.
>>
>><http://www.thinksacco.com/> Darin Kuntze
>>/Senior Technologist/
>>*The Sacco Group*
>>402.392.2222 x120
>>
>>
>
>
> _______________________________________________
> This mail is send to you from the opencms-dev mailing list
> To change your list options, or to unsubscribe from the list, please
> visit http://mail.opencms.org/mailman/listinfo/opencms-dev