[opencms-dev] Re: Lucene - PDF exception during index creation

Ralf Emanuel emanuel at inexweb.de
Fri Mar 19 17:52:02 CET 2004


Matt,

OK, I got the latest pdfbox jar file and replaced version 0.5.6-mpb with
version 0.6.5 in the WEB-INF/lib directory. After restarting Tomcat, I received
the following exception during the next index run:

--snip--
java.lang.NoClassDefFoundError
         at org.textmining.text.extraction.PDFExtractor.extractText(PDFExtractor.java:35)
         at net.grcomputing.opencms.search.lucene.PDFDocument.Document(Unknown Source)
--snip--

It seems that the new jar contains newer classes, or something similar?
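According to the Java documentation, a NoClassDefFoundError means that a class which existed when the calling code was compiled can no longer be found at run time. One plausible reading here (an assumption, not confirmed in this thread) is that PDFExtractor from the textmining jar refers to a pdfbox class that version 0.6.5 no longer provides under the same name. The small standalone program below (the class name ClassOrigin is hypothetical) reports, for each fully qualified class name, the jar it is loaded from, or NOT FOUND. The two default names are taken from the stack traces in this thread; run it with the same jars as WEB-INF/lib on the classpath.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports where each named class is loaded from.
public class ClassOrigin {

    /** Returns the location a class was loaded from, or null if it cannot be found. */
    static String locate(String className) {
        try {
            // initialize=false: only locate and load the class, do not run its static initializers
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes from the bootstrap loader (e.g. java.lang.String) have no CodeSource.
            return src == null ? "(bootstrap)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class that is present but whose own dependencies are missing
            return null;
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args
                : new String[] {"org.pdfbox.pdfparser.PDFParser",
                                "org.textmining.text.extraction.PDFExtractor"};
        for (String name : names) {
            String where = locate(name);
            System.out.println(name + " -> " + (where == null ? "NOT FOUND" : where));
        }
    }
}
```

On Windows this could be started with something like `java -cp "WEB-INF/lib/*;." ClassOrigin`, passing further class names as arguments if needed. If both jars are found, the missing class is most likely one that PDFExtractor references internally, which points to the textmining jar needing a build that matches the newer pdfbox.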

Any idea?


Ralf

 >Ralf,
 >The PDF Box classes think the PDF file is corrupt. Those classes are
 >outside of the development work that we do, but it is possible that a
 >newer version of the PDF Box classes will fix the issue.
 >Matt

 >Ralf Emanuel wrote:
 > Dear opencms list,
 >
 > We use Lucene 1.5 and OpenCms 5 in a current project on Windows Server
 > 2003. Each time the index runs, the exception below appears.
 >
 > Can anybody help me?
 >
 > --snip--
 > java.io.IOException: Error: Expected an integer type, actual='endobj'
 > at org.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:943)
 > at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:253)
 > at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:93)
 > at org.textmining.text.extraction.PDFExtractor.extractText(PDFExtractor.java:37)
 > at net.grcomputing.opencms.search.lucene.PDFDocument.Document(Unknown Source)
 > at net.grcomputing.opencms.search.lucene.IndexManager.processFile(Unknown Source)
 > at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
 > at net.grcomputing.opencms.search.lucene.IndexManager.processDir(Unknown Source)
 > at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(Unknown Source)
 > at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(Unknown Source)
 > at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
 > --snip--
 >
 > Thanks in advance.
 >
 >
 > Ralf Emanuel
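Matt's diagnosis points to a damaged PDF rather than a bug in the indexer. If a single such file stops the whole index run, one option is to isolate each file's text extraction so that an IOException is logged and that file skipped. The sketch below assumes two hypothetical interfaces, TextExtractor and IndexSink, standing in for the PDFDocument/IndexManager code, whose source is not shown in this thread; it is not the actual net.grcomputing API.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TolerantIndexer {

    /** Hypothetical stand-in for whatever turns a file into indexable text. */
    interface TextExtractor {
        String extract(File file) throws IOException;
    }

    /** Hypothetical stand-in for adding a document to the index. */
    interface IndexSink {
        void add(File file, String text);
    }

    /**
     * Indexes each file independently: a file that fails to parse is logged
     * and skipped instead of aborting the whole run.
     * Returns the files that could not be indexed.
     */
    static List<File> indexAll(List<File> files, TextExtractor extractor, IndexSink sink) {
        List<File> failed = new ArrayList<>();
        for (File f : files) {
            try {
                // extract() runs first; if it throws, nothing is added for this file
                sink.add(f, extractor.extract(f));
            } catch (IOException e) {
                // e.g. "Expected an integer type, actual='endobj'" from a damaged PDF
                System.err.println("Skipping " + f + ": " + e.getMessage());
                failed.add(f);
            }
            // NoClassDefFoundError is deliberately not caught: it signals a deployment problem.
        }
        return failed;
    }
}
```

The sketch deliberately lets NoClassDefFoundError propagate: that error points to mismatched jars, and silently skipping every PDF would hide it.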
