[opencms-dev] Error on Lucene indexing
Maurizio Barucco
maurizio at domino.it
Wed Dec 1 09:27:42 CET 2004
When the cron scheduler starts the nightly Lucene index update, I find the
following error messages in the log files the next morning:
java.io.IOException: expected='obj' actual='obj<</H[696' pdfSource 0x21
04/12/01 05:30:33 at org.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:261)
04/12/01 05:30:33 at org.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:93)
04/12/01 05:30:33 at org.textmining.text.extraction.PDFExtractor.extractText(PDFExtractor.java:37)
04/12/01 05:30:33 at net.grcomputing.opencms.search.lucene.PDFDocument.Document(PDFDocument.java:50)
04/12/01 05:30:33 at net.grcomputing.opencms.search.lucene.IndexManager.processFile(IndexManager.java:519)
04/12/01 05:30:33 at net.grcomputing.opencms.search.lucene.IndexManager.processDir(IndexManager.java:364)
04/12/01 05:30:33 at net.grcomputing.opencms.search.lucene.IndexManager.processDir(IndexManager.java:397)
04/12/01 05:30:33 at net.grcomputing.opencms.search.lucene.IndexManager.doIndex(IndexManager.java:234)
04/12/01 05:30:33 at net.grcomputing.opencms.search.lucene.CronIndexManager.launch(CronIndexManager.java:107)
04/12/01 05:30:33 at com.opencms.core.CmsCronScheduleJob.run(CmsCronScheduleJob.java:68)
The search engine seems to be up and running, but does anyone know what this
error means?
Is it a problem with the PDF format?
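The message itself suggests the parser read the token obj<</H[696 where it
expected the bare keyword obj, so the failure is probably tied to one
particular PDF rather than to the whole index. One way to find that file is to
run the extractor over each PDF separately and record which ones throw. The
following is only a minimal sketch under stated assumptions: PdfIndexProbe,
TextExtractor and findUnparseable are hypothetical names and are not part of
OpenCms, PDFBox or the Lucene module; the TextExtractor interface merely stands
in for whatever call extracts the text (PDFExtractor.extractText in the trace
above).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from OpenCms, PDFBox or the Lucene module.
class PdfIndexProbe {

    /** Stand-in (assumption) for the real PDF text extraction call. */
    interface TextExtractor {
        String extractText(String path) throws IOException;
    }

    /** Tries every path, collects the ones that fail to parse, and keeps going. */
    static List<String> findUnparseable(List<String> paths, TextExtractor extractor) {
        List<String> failed = new ArrayList<>();
        for (String path : paths) {
            try {
                extractor.extractText(path);
            } catch (IOException e) {
                // Report the path together with the parser message so the file can be inspected.
                System.err.println("Could not parse " + path + ": " + e.getMessage());
                failed.add(path);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Fake extractor that rejects one file, standing in for a PDF the parser cannot read.
        TextExtractor fake = path -> {
            if (path.endsWith("broken.pdf")) {
                throw new IOException("simulated parse failure");
            }
            return "text of " + path;
        };
        List<String> failed = findUnparseable(Arrays.asList("/docs/a.pdf", "/docs/broken.pdf"), fake);
        System.out.println("Unparseable files: " + failed);
    }
}
```

With the real extractor plugged in instead of the fake one, the returned list
would name the files to open and check by hand, or to re-save with another PDF
tool before the next index run.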
Hi.
Maurizio