[opencms-dev] Indexing iframes

Christian Steinert christian_steinert at web.de
Sun Feb 18 12:36:17 CET 2007


Jan.Cifra at qimonda.com wrote:
> Hello everyone,
>
> I am working on a new internet solution which uses OpenCms. Some data is
> going to be displayed in iframes, so it will not actually be in the
> VFS of OpenCms. The problem is that one of the main requirements of this
> solution is that it must be able to search through the HTML pages,
> including these iframes. Is there any way to modify or enhance Lucene to do
> this?
>   
You would have to write your own Lucene indexer. The Lucene
documentation should describe how this can be done, but I have no
experience with this. Maybe Jonathan Woods here on the list can give
you some more pointers.

OpenCms allows you to use different indexers for different resource types,
so if you write your own indexer, you should be able to assign it to a
resource type such as "external link". So yes, it is generally possible,
but you have to write your own indexer that downloads the pages and
extracts page elements such as the title and the body.
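
To illustrate the extraction step, here is a minimal sketch of pulling the
title and the readable body text out of a downloaded page. The class and
method names (PageExtractor, extract) are hypothetical and not part of
OpenCms. It assumes Java 16+ (for records) and uses plain regular
expressions for brevity; it does not decode HTML entities, and a real
indexer would be better served by a proper HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: extracts the title and the visible body text of a
 * downloaded HTML page so they can be handed to a search index.
 * Regex-based for brevity; not a full HTML parser.
 */
final class PageExtractor {

    /** Title and plain-text body of one page. */
    record ExtractedPage(String title, String body) {}

    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern BODY = Pattern.compile(
            "<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Script and style contents are not readable text, so drop them entirely.
    private static final Pattern SCRIPT_OR_STYLE = Pattern.compile(
            "<(script|style)[^>]*>.*?</\\1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    private PageExtractor() {}

    static ExtractedPage extract(String html) {
        Matcher titleMatcher = TITLE.matcher(html);
        String title = titleMatcher.find() ? titleMatcher.group(1) : "";
        // Fall back to the whole document if there is no <body> element.
        Matcher bodyMatcher = BODY.matcher(html);
        String body = bodyMatcher.find() ? bodyMatcher.group(1) : html;
        return new ExtractedPage(toPlainText(title), toPlainText(body));
    }

    private static String toPlainText(String fragment) {
        String noScripts = SCRIPT_OR_STYLE.matcher(fragment).replaceAll(" ");
        String noTags = TAG.matcher(noScripts).replaceAll(" ");
        // Collapse the runs of whitespace left behind by removed markup.
        return noTags.replaceAll("\\s+", " ").trim();
    }
}
```

The two extracted strings would then become the title and content fields
of whatever document your indexer adds to the search index.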

Your indexer would, of course, have to download the pages itself via an
HTTP request and then add them to the OpenCms index. Once you have
written the logic for downloading and pre-processing the remote pages,
it is probably worth looking at the existing OpenCms indexers to see how
your downloaded resources can be added to the OpenCms search index.
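
For the download step, a rough sketch using the standard Java 11+
java.net.http.HttpClient might look like the following. It finds the
iframe sources referenced by a page, resolves relative src values
against the page URL, and fetches each one. The class and method names
(IframeFetcher, iframeSources, fetch) are hypothetical; how the result is
registered with an OpenCms indexer is deliberately not shown, since that
depends on the OpenCms search API of your version.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: finds the iframe sources referenced by a page and
 * downloads them over HTTP so their content can be indexed as well.
 */
final class IframeFetcher {

    // Matches src="..." or src='...' inside an <iframe ...> start tag.
    private static final Pattern IFRAME_SRC = Pattern.compile(
            "<iframe\\b[^>]*?\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    /** Returns the absolute URIs of all iframes in html, resolved against pageUri. */
    static List<URI> iframeSources(URI pageUri, String html) {
        List<URI> result = new ArrayList<>();
        Matcher m = IFRAME_SRC.matcher(html);
        while (m.find()) {
            // Relative values such as "/data/table.html" are resolved like a browser would.
            result.add(pageUri.resolve(m.group(1).trim()));
        }
        return result;
    }

    /** Downloads one page and returns its HTML; throws if the server does not answer 200. */
    String fetch(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + uri);
        }
        return response.body();
    }
}
```

Note that URI.resolve(String) throws IllegalArgumentException for src
values that are not valid URIs (for example, unescaped spaces), so a
production indexer would want to catch and log those.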

regards
christian