[opencms-dev] Error on Lucene indexing

Maurizio Barucco maurizio at domino.it
Wed Dec 1 09:27:42 CET 2004


When the cron scheduler starts the Lucene index update (every night),
the next morning I see these error messages in the log files:

java.io.IOException: expected='obj' actual='obj<</H[696' pdfSource 0x21
04/12/01 05:30:33     at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:261)
04/12/01 05:30:33     at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:93)
04/12/01 05:30:33     at org.textmining.text.extraction.PDFExtractor.extractText(PDFExtractor.java:37)
04/12/01 05:30:33     at net.grcomputing.opencms.search.lucene.PDFDocument.Document(PDFDocument.java:50)
04/12/01 05:30:33     at net.grcomputing.opencms.search.lucene.IndexManager.processFile(IndexManager.java:519)
04/12/01 05:30:33     at net.grcomputing.opencms.search.lucene.IndexManager.processDir(IndexManager.java:364)
04/12/01 05:30:33     at net.grcomputing.opencms.search.lucene.IndexManager.processDir(IndexManager.java:397)
04/12/01 05:30:33     at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(IndexManager.java:234)
04/12/01 05:30:33     at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(CronIndexManager.java:107)
04/12/01 05:30:33     at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)

The search engine seems to be up and running, but does anyone know what
this error means?

Is it a problem with the PDF format?
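
One way to narrow this down, as a rough sketch only: the message shows the parser reading the token obj<</H[696 where it expected obj, which suggests that in the offending file the obj keyword is followed directly by << with no whitespace in between. Assuming that is the trigger, a small standalone check like the one below could list candidate PDF files. The class name ObjTokenCheck and the method hasUnseparatedObj are hypothetical and are not part of PDFBox or the Lucene module; a match inside binary stream data is possible, so a hit is only a hint.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: flags files in which "obj" is immediately followed by "<<".
final class ObjTokenCheck {
    private static final byte[] NEEDLE = "obj<<".getBytes(StandardCharsets.ISO_8859_1);

    // True if NEEDLE occurs in data starting at position pos.
    private static boolean matchesAt(byte[] data, int pos) {
        for (int j = 0; j < NEEDLE.length; j++) {
            if (data[pos + j] != NEEDLE[j]) {
                return false;
            }
        }
        return true;
    }

    // True if the data contains "obj<<" that is not the tail of "endobj<<".
    static boolean hasUnseparatedObj(byte[] data) {
        for (int i = 0; i + NEEDLE.length <= data.length; i++) {
            boolean partOfEndobj = i > 0 && data[i - 1] == 'd';
            if (!partOfEndobj && matchesAt(data, i)) {
                return true;
            }
        }
        return false;
    }

    // Usage: java ObjTokenCheck file1.pdf file2.pdf ...
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] data = Files.readAllBytes(Paths.get(arg));
            if (hasUnseparatedObj(data)) {
                System.out.println("candidate: " + arg);
            }
        }
    }
}
```

Running it over the indexed PDF folder would print one line per suspicious file; those files could then be excluded from indexing or re-saved to see whether the exception goes away.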

 

Bye,

Maurizio
