[opencms-dev] Adobe 9 and pdfBox

Tue Mar 22 19:19:41 CET 2011

Hi,

This looks like a known bug in PDFBox versions before 0.8 (OpenCms uses 0.7.2):
https://issues.apache.org/jira/browse/PDFBOX-361

You could try downloading the latest version of PDFBox (1.5.0) from here:
http://pdfbox.apache.org/download.html

However, I am not sure how much the PDFBox API has changed, so this version may not be supported by OpenCms 7.
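
One visible difference is the package name: the stack trace below shows org.pdfbox.*, while the Apache releases use org.apache.pdfbox.*. As a rough check before swapping jars, a small probe like the following can report which text stripper class is actually loadable. This is only a sketch: PdfBoxProbe and firstLoadable are made-up names, and the org.apache.pdfbox.util.PDFTextStripper class name is my assumption for the 1.x line, so verify it against the jar you download.

```java
final class PdfBoxProbe {

    // Taken from the stack trace in the original report (PDFBox 0.7.x).
    static final String LEGACY_STRIPPER = "org.pdfbox.util.PDFTextStripper";
    // Assumed name for the Apache 1.x releases; verify against your jar.
    static final String APACHE_STRIPPER = "org.apache.pdfbox.util.PDFTextStripper";

    /** Returns the first class name that can be loaded, or null if none can. */
    static String firstLoadable(String... classNames) {
        for (String name : classNames) {
            try {
                // 'false' avoids running static initializers; we only check presence.
                Class.forName(name, false, PdfBoxProbe.class.getClassLoader());
                return name;
            } catch (ClassNotFoundException | LinkageError e) {
                // Not on the classpath (or not loadable); try the next candidate.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String found = firstLoadable(LEGACY_STRIPPER, APACHE_STRIPPER);
        System.out.println(found == null
                ? "No PDFBox text stripper found on the classpath"
                : "Found " + found);
    }
}
```

If only the org.apache.pdfbox class is found, the OpenCms 7 PDF extractor, which was compiled against org.pdfbox, will most likely need to be adapted and recompiled rather than just given a new jar.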

Graeme

________________________________
> Date: Tue, 22 Mar 2011 11:14:27 -0600
> From: TTHUL at regina.ca
> To: opencms-dev at opencms.org
> Subject: [opencms-dev] Adobe 9 and pdfBox
>
> Are there any fixes available for 7.x that will allow the content of
> PDF files created with Adobe 9 to be indexed?
>
> This is an example of the errors we are getting:
>
> 22 Mar 2011 09:07:07,050 ERROR [rch.documents.A_CmsVfsDocument: 166] Extracting text from resource "/sites/Insite/hr/job_descriptions/Public_Works_Division/Water_and_Sewer_Services_Department/Water_Operations/Tradesperson_II.pdf" failed.
> org.opencms.search.CmsIndexException: Extracting text from resource "/sites/Insite/hr/job_descriptions/Public_Works_Division/Water_and_Sewer_Services_Department/Water_Operations/Tradesperson_II_x_Plumber_Cross_Connection.pdf" failed.
>     at org.opencms.search.documents.CmsDocumentPdf.extractContent(CmsDocumentPdf.java:91)
>     at org.opencms.search.documents.A_CmsVfsDocument.createDocument(A_CmsVfsDocument.java:159)
>     at org.opencms.search.CmsIndexingThread.run(CmsIndexingThread.java:129)
> Caused by: java.lang.NullPointerException
>     at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
>     at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
>     at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162)
>     at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)
>     at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:140)
>     at org.opencms.search.extractors.CmsExtractorPdf.extractText(CmsExtractorPdf.java:104)
>     at org.opencms.search.extractors.A_CmsTextExtractor.extractText(A_CmsTextExtractor.java:72)
>     at org.opencms.search.extractors.A_CmsTextExtractor.extractText(A_CmsTextExtractor.java:62)
>     at org.opencms.search.documents.CmsDocumentPdf.extractContent(CmsDocumentPdf.java:78)
>     ... 2 more