[opencms-dev] Lucene - PDF exception during index creation
M Butcher
mbutcher at grcomputing.net
Thu Mar 18 18:24:02 CET 2004
Ralf,
The PDFBox parser thinks the PDF file is corrupt: it expected an integer
and found 'endobj' instead. PDFBox is a third-party library outside of the
development work that we do, but it is possible that a newer version of
PDFBox will fix the issue.
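
If the exception stops the rest of the index run, one possible workaround is
to skip any file the parser rejects and carry on with the others. The
following is only a minimal sketch of that idea, not the search module's
actual code: TextExtractor, SafeIndexer, Result and indexAll are hypothetical
names invented for illustration, and the extractor stands in for whatever
calls the PDF parser.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SafeIndexer {

    /** Hypothetical stand-in for the component that extracts text from a file. */
    public interface TextExtractor {
        String extractText(String path) throws IOException;
    }

    /** Outcome of one run: extracted text per file, plus the files that were skipped. */
    public static class Result {
        public final Map<String, String> indexed = new LinkedHashMap<>();
        public final List<String> skipped = new ArrayList<>();
    }

    /** Extracts text from every path; a file that fails to parse is recorded and skipped. */
    public static Result indexAll(List<String> paths, TextExtractor extractor) {
        Result result = new Result();
        for (String path : paths) {
            try {
                result.indexed.put(path, extractor.extractText(path));
            } catch (IOException e) {
                // The parser rejected this file: note it and continue with the next one.
                result.skipped.add(path);
                System.err.println("Skipping " + path + ": " + e.getMessage());
            }
        }
        return result;
    }
}
```

The skipped list gives you the names of the files to re-export or to test
against a newer parser version.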
Matt
Ralf Emanuel wrote:
> Dear opencms list,
>
> We use Lucene 1.5 and OpenCms 5 in a current project on Windows 2003
> Server. Each time the index runs, the exception below appears.
>
> Can anybody help me?
>
> --snip--
> java.io.IOException: Error: Expected an integer type, actual='endobj'
> at org.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:943)
> at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:253)
> at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:93)
> at org.textmining.text.extraction.PDFExtractor.extractText(PDFExtractor.java:37)
> at net.grcomputing.opencms.search.lucene.PDFDocument.Document(Unknown Source)
> at net.grcomputing.opencms.search.lucene.IndexManager.processFile(Unknown Source)
> at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
> at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
> at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(Unknown Source)
> at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(Unknown Source)
> at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
> --snip--
>
> Thanks in advance.
>
>
> Ralf Emanuel
>
>