[opencms-dev] Adobe 9 and pdfBox

Michael Emmerich m.emmerich at alkacon.com
Wed Mar 23 12:14:46 CET 2011


Tony,

we just tested the newer version of PDFBox here and will add it to the 
OpenCms 7.5.x release we are currently preparing. Of course, we will also 
add it to OpenCms 8.

Kind Regards,
Michael.


On 22.03.2011 20:07, Tony Thul wrote:
> I replaced the PDFBox 0.7.2 jar with version 1.5.0 and added fontbox-1.5.0.jar.
> It did not work until I changed the import statements in
> CmsExtractorPdf.java from the old org.pdfbox package to org.apache.pdfbox,
> e.g. from
> import org.pdfbox.pdfparser.PDFParser;
> to
> import org.apache.pdfbox.pdfparser.PDFParser;
> and rebuilt OpenCms from source. This seems to work; the content was
> indexed.
> Are there any plans to upgrade to a newer version of PDFBox in the future?
> Thanks!
> Tony
>
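Tony's fix works because PDFBox moved its classes from the org.pdfbox package (0.7.x, the version OpenCms 7 ships) to org.apache.pdfbox (e.g. 1.5.0), so code compiled against 0.7.2 cannot find its classes once the jar is swapped. The following is a minimal sketch, not part of OpenCms or PDFBox: a hypothetical standalone helper, PdfBoxProbe, that checks which of the two PDFParser class names quoted in this thread can be loaded. This can help confirm that only one PDFBox version ended up on the classpath.

```java
// Hypothetical helper, not part of OpenCms or PDFBox.
public class PdfBoxProbe {

    // PDFParser class name used by PDFBox 0.7.x (bundled with OpenCms 7).
    static final String LEGACY_PARSER = "org.pdfbox.pdfparser.PDFParser";

    // PDFParser class name after the move to the org.apache.pdfbox package (e.g. 1.5.0).
    static final String APACHE_PARSER = "org.apache.pdfbox.pdfparser.PDFParser";

    /** Returns true if the named class can be loaded; the class is not initialized. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, PdfBoxProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class that exists but cannot be linked.
            return false;
        }
    }

    /** Returns "apache", "legacy", "both" or "none" depending on which parser loads. */
    static String detectLayout() {
        boolean legacy = isPresent(LEGACY_PARSER);
        boolean apache = isPresent(APACHE_PARSER);
        if (legacy && apache) {
            return "both";   // old and new jars side by side
        } else if (apache) {
            return "apache"; // imports must use org.apache.pdfbox.*
        } else if (legacy) {
            return "legacy"; // imports must use org.pdfbox.*
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println("PDFBox layout on classpath: " + detectLayout());
    }
}
```

For example, running it with the webapp's jars on the classpath (assuming they live in WEB-INF/lib: java -cp ".:WEB-INF/lib/*" PdfBoxProbe) should report "apache" after the jar swap; "both" would suggest the old 0.7.2 jar is still present.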
>  >>> Graeme Kidd <coolkidd3 at hotmail.com> 22/Mar/2011 12:19 pm >>>
>
>
> Hi,
>
> This appears to be a known issue in PDFBox versions before 0.8
> (OpenCms uses 0.7.2):
> https://issues.apache.org/jira/browse/PDFBOX-361
>
> You could try downloading the latest version of PDFBox (1.5.0) from here:
> http://pdfbox.apache.org/download.html
>
> However, I am not sure how much the PDFBox API has changed, so this
> version may not be supported by OpenCms 7.
>
> Graeme
>
> ________________________________
>  > Date: Tue, 22 Mar 2011 11:14:27 -0600
>  > From: TTHUL at regina.ca
>  > To: opencms-dev at opencms.org
>  > Subject: [opencms-dev] Adobe 9 and pdfBox
>  >
>  > Are there any fixes available for 7.x that will allow the content of
>  > PDF files created with Adobe 9 to be indexed?
>  >
>  > This is an example of the errors we are getting:
>  >
>  > 22 Mar 2011 09:07:07,050 ERROR [rch.documents.A_CmsVfsDocument: 166] Extracting text from resource "/sites/Insite/hr/job_descriptions/Public_Works_Division/Water_and_Sewer_Services_Department/Water_Operations/Tradesperson_II.pdf" failed.
>  > org.opencms.search.CmsIndexException: Extracting text from resource "/sites/Insite/hr/job_descriptions/Public_Works_Division/Water_and_Sewer_Services_Department/Water_Operations/Tradesperson_II_x_Plumber_Cross_Connection.pdf" failed.
>  >     at org.opencms.search.documents.CmsDocumentPdf.extractContent(CmsDocumentPdf.java:91)
>  >     at org.opencms.search.documents.A_CmsVfsDocument.createDocument(A_CmsVfsDocument.java:159)
>  >     at org.opencms.search.CmsIndexingThread.run(CmsIndexingThread.java:129)
>  > Caused by: java.lang.NullPointerException
>  >     at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
>  >     at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
>  >     at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162)
>  >     at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)
>  >     at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:140)
>  >     at org.opencms.search.extractors.CmsExtractorPdf.extractText(CmsExtractorPdf.java:104)
>  >     at org.opencms.search.extractors.A_CmsTextExtractor.extractText(A_CmsTextExtractor.java:72)
>  >     at org.opencms.search.extractors.A_CmsTextExtractor.extractText(A_CmsTextExtractor.java:62)
>  >     at org.opencms.search.documents.CmsDocumentPdf.extractContent(CmsDocumentPdf.java:78)
>  >     ... 2 more
>  >
>  >
>  > DISCLAIMER: The information transmitted is intended only for the
>  > addressee and may contain confidential, proprietary and/or privileged
>  > material. Any unauthorized review, distribution or other use of or the
>  > taking of any action in reliance upon this information is prohibited.
>  > If you received this in error, please contact the sender and delete or
>  > destroy this message and any copies.
>  >
>  > _______________________________________________ This mail is sent to
>  > you from the opencms-dev mailing list To change your list options, or
>  > to unsubscribe from the list, please visit
>  > http://lists.opencms.org/mailman/listinfo/opencms-dev

-------------------
Michael Emmerich

Visit OpenCms Days 2011 Conference and Expo
May 9 to May 10 2011 in Cologne, Germany
http://www.opencms-days.org

Alkacon Software GmbH  - The OpenCms Experts
http://www.alkacon.com - http://www.opencms.org