[opencms-dev] Lucene - PDF exception during index creation
Ralf Emanuel
emanuel at inexweb.de
Thu Mar 18 10:15:01 CET 2004
Dear opencms list,
we use Lucene 1.5 and OpenCms 5 in a current project on Windows 2003
Server. Each time the index runs, the exception below appears.
Can anybody help me?
--snip--
java.io.IOException: Error: Expected an integer type, actual='endobj'
at org.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:943)
at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:253)
at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:93)
at org.textmining.text.extraction.PDFExtractor.extractText(PDFExtractor.java:37)
at net.grcomputing.opencms.search.lucene.PDFDocument.Document(Unknown Source)
at net.grcomputing.opencms.search.lucene.IndexManager.processFile(Unknown Source)
at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(Unknown Source)
at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(Unknown Source)
at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
--snip--
Thanks in advance.
Ralf Emanuel