[opencms-dev] Re: Lucene - PDF exception during index creation
Ralf Emanuel
emanuel at inexweb.de
Fri Mar 19 17:52:02 CET 2004
Matt,
OK, I downloaded the current PDFBox jar file and replaced version 0.5.6-mpb with
version 0.6.5 in the WEB-INF/lib directory. After restarting Tomcat, I received
the following exception during the next index run:
--snip--
java.lang.NoClassDefFoundError
    at org.textmining.text.extraction.PDFExtractor.extractText(PDFExtractor.java:35)
    at net.grcomputing.opencms.search.lucene.PDFDocument.Document(Unknown Source)
--snip--
It seems that the new jar contains a newer class that no longer fits, or something similar?
Any idea?
Ralf
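A `NoClassDefFoundError` typically means that a class which existed when the calling code was compiled can no longer be found or loaded at run time. One possible cause here is that the textmining `PDFExtractor` was built against the older PDFBox jar. The following is a minimal sketch for checking which jar each class is actually loaded from, assuming it is run with the same WEB-INF/lib jars on the classpath. The class name `ClassOriginCheck` is hypothetical; the two default class names are simply taken from the stack traces in this thread.

```java
import java.net.URL;
import java.security.CodeSource;

public class ClassOriginCheck {

    /** Reports where a class is loaded from, or why it could not be loaded. */
    public static String describe(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClassOriginCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            URL location = (src == null) ? null : src.getLocation();
            return className + " -> " + (location == null ? "(bootstrap or unknown)" : location.toString());
        } catch (ClassNotFoundException e) {
            // The class itself is not on the classpath at all
            return className + " -> NOT FOUND";
        } catch (LinkageError e) {
            // The class exists, but something it depends on is missing or incompatible
            return className + " -> LINKAGE ERROR: " + e;
        }
    }

    public static void main(String[] args) {
        // Default names are the two classes that appear in the stack traces of this thread
        String[] names = (args.length > 0) ? args : new String[] {
            "org.textmining.text.extraction.PDFExtractor",
            "org.pdfbox.pdfparser.PDFParser"
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

A `NOT FOUND` or `LINKAGE ERROR` line points at the missing or mismatched jar. If the full error message in the Tomcat log names the missing class, that name can be passed as an argument instead of the defaults.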
>Ralf,
>The PDFBox classes consider the PDF file corrupt. Those classes are
>outside our own development work, but it is possible that a
>newer version of PDFBox will fix the issue.
>Matt
>Ralf Emanuel wrote:
> Dear opencms list,
>
> we use Lucene 1.5 and OpenCms 5 in a current project on Windows Server
> 2003. Each time the index runs, the exception below appears.
>
> Can anybody help me?
>
> --snip--
> java.io.IOException: Error: Expected an integer type, actual='endobj'
> at org.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:943)
> at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:253)
> at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:93)
> at org.textmining.text.extraction.PDFExtractor.extractText(PDFExtractor.java:37)
> at net.grcomputing.opencms.search.lucene.PDFDocument.Document(Unknown Source)
> at net.grcomputing.opencms.search.lucene.IndexManager.processFile(Unknown Source)
> at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
> at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
> at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(Unknown Source)
> at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(Unknown Source)
> at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
> --snip--
>
> Thanks in advance.
>
>
> Ralf Emanuel
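Since PDFBox considers this particular file corrupt, a newer PDFBox may or may not help. Independently of that, one damaged PDF does not have to stop the remaining documents from being indexed. The following is a minimal sketch of per-file error isolation, assuming the indexer handles files one at a time. `TolerantPdfIndexer`, `Extractor` and `Result` are hypothetical names, not part of the grcomputing module or PDFBox, and the trace does not show whether the real `IndexManager` already continues after such an error.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TolerantPdfIndexer {

    /** Hypothetical stand-in for whatever turns one file into text (e.g. a PDFBox-based extractor). */
    public interface Extractor {
        String extract(File file) throws IOException;
    }

    /** Collected results of one indexing pass. */
    public static class Result {
        public final List<String> texts = new ArrayList<>();
        public final List<String> failures = new ArrayList<>();
    }

    /** Extracts each file in turn; an IOException from one file is recorded and the loop continues. */
    public static Result run(List<File> files, Extractor extractor) {
        Result result = new Result();
        for (File f : files) {
            try {
                result.texts.add(extractor.extract(f));
            } catch (IOException e) {
                // A corrupt PDF is recorded and skipped instead of aborting the whole run
                result.failures.add(f.getName() + ": " + e.getMessage());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<File> files = new ArrayList<>();
        files.add(new File("good.pdf"));
        files.add(new File("broken.pdf"));
        // Fake extractor that fails the same way as the parser error in this thread
        Result r = run(files, f -> {
            if (f.getName().startsWith("broken")) {
                throw new IOException("Error: Expected an integer type, actual='endobj'");
            }
            return "text of " + f.getName();
        });
        System.out.println(r.texts);
        System.out.println(r.failures);
    }
}
```

Only `IOException` is caught on purpose: an error like the `NoClassDefFoundError` after the upgrade signals a broken classpath and should still fail loudly rather than be silently skipped for every PDF.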