[opencms-dev] Adobe 9 and pdfBox

Tue Mar 22 18:14:27 CET 2011

Are there any fixes available for OpenCms 7.x that allow the content of PDF files created with Adobe 9 to be indexed?

This is an example of the errors we are getting:

22 Mar 2011 09:07:07,050 ERROR [rch.documents.A_CmsVfsDocument: 166] Extracting text from resource "/sites/Insite/hr/job_descriptions/Public_Works_Division/Water_and_Sewer_Services_Department/Water_Operations/Tradesperson_II.pdf" failed.
org.opencms.search.CmsIndexException: Extracting text from resource "/sites/Insite/hr/job_descriptions/Public_Works_Division/Water_and_Sewer_Services_Department/Water_Operations/Tradesperson_II_x_Plumber_Cross_Connection.pdf" failed.
 at org.opencms.search.documents.CmsDocumentPdf.extractContent(CmsDocumentPdf.java:91)
 at org.opencms.search.documents.A_CmsVfsDocument.createDocument(A_CmsVfsDocument.java:159)
 at org.opencms.search.CmsIndexingThread.run(CmsIndexingThread.java:129)
Caused by: java.lang.NullPointerException
 at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
 at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
 at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162)
 at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)
 at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:140)
 at org.opencms.search.extractors.CmsExtractorPdf.extractText(CmsExtractorPdf.java:104)
 at org.opencms.search.extractors.A_CmsTextExtractor.extractText(A_CmsTextExtractor.java:72)
 at org.opencms.search.extractors.A_CmsTextExtractor.extractText(A_CmsTextExtractor.java:62)
 at org.opencms.search.documents.CmsDocumentPdf.extractContent(CmsDocumentPdf.java:78)
 ... 2 more
