[opencms-dev] Problem with CmsExtractorMsPowerPoint.processPOIFSReaderEvent(POIFSReaderEvent event) ...

Vlastimil Eliáš vlastimil.elias at qbizm.cz
Thu May 15 14:54:15 CEST 2008


Hi,

This function is used when indexing MS PowerPoint files for full-text search.

The MS PowerPoint content extractor for full-text search is rather buggy
in current OpenCms releases. Besides this OutOfMemoryError problem, there
is also a national charset problem (some national characters are
interpreted incorrectly).

My first quick patch for this bug was to check the "size" value and skip
the next few lines of code if the value is too big.
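
A minimal sketch of such a check, not the actual patch. It reads the
length field as a long so that values above Integer.MAX_VALUE cannot wrap
to a negative int, and returns -1 when the record should be skipped. The
class name RecordSizeGuard, the method names, and the 1 MiB limit in
MAX_RECORD_SIZE are hypothetical; readUInt is a stand-in for POI's
LittleEndian.getUInt so that the example runs without POI on the classpath.

```java
class RecordSizeGuard {

    // Hypothetical upper limit for one text record (1 MiB); tune as needed.
    static final long MAX_RECORD_SIZE = 1L << 20;

    // Reads an unsigned 32-bit little-endian value starting at offset.
    static long readUInt(byte[] data, int offset) {
        return (data[offset] & 0xFFL)
            | (data[offset + 1] & 0xFFL) << 8
            | (data[offset + 2] & 0xFFL) << 16
            | (data[offset + 3] & 0xFFL) << 24;
    }

    // Returns the number of bytes to allocate for the record at index i,
    // or -1 if the length field is missing or implausibly large.
    static int safeRecordSize(byte[] buffer, int i) {
        if (i < 0 || i + 8 > buffer.length) {
            return -1; // not enough bytes left to hold the length field
        }
        // Same arithmetic as the original line, but kept in a long.
        long size = readUInt(buffer, i + 4) + 3;
        if (size > MAX_RECORD_SIZE) {
            return -1; // too big: the caller skips this record
        }
        return (int) size;
    }
}
```

The extractor would then allocate the byte array only when the returned
value is not -1, and otherwise continue with the next record.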

I am now trying another implementation of the MS PowerPoint extractor,
based on a newer version of the POI library, but so far only for
OpenCms 6.2.x. I plan to send this patched implementation to Alkacon in
the near future so that it can be added to an OpenCms 7 release.

A quick solution for you is to reconfigure your full-text search index
sources so that MS PowerPoint files are not added to the full-text search
indexes. However, these files then cannot be found by full-text search.

Regards

Vlastik

Le Bach wrote:
> Dear all,
>
> I am new to OpenCms 7 and have a problem with the execution of this
> function (called automatically when publishing):
>
> org.opencms.search.extractors.CmsExtractorMsPowerPoint.processPOIFSReaderEvent(POIFSReaderEvent event) {
>    ...
>    int size = (int)LittleEndian.getUInt(buffer, i + 4) + 3;
>    ...
>    // "out of heap space" error is produced here, although the JVM
>    // heap size is set to 1024 MB (-Xmx1024M)
>    byte[] buf = new byte[size];
>    ...
> }
>
> With some PowerPoint files, the variable size above sometimes gets a
> very large value (>= 1 million), which then causes an OutOfMemoryError.
>
> I would really like to know the purpose of this function. Can I change
> the code to avoid the OutOfMemoryError without breaking it?
>
> I look forward to your ideas.
> Any help would be appreciated.
>
> -- 
> Bach Le

-- 
Ing. Vlastimil Elias                        Qbizm technologies, a.s.
vedouci analytik/teamleader                 ... the art of software.
____________________________________________________________________
www.qbizm-technologies.cz    www.qbizm.cz      www.qbizm-services.cz
