[opencms-dev] How to use the fastTagStripper for searching on a JSP page

Ernesto De Santis opencms at colaborativa.net
Mon Feb 20 16:24:21 CET 2006


Hi Jonathan

Lucene only indexes text; it doesn't do any other processing. If you 
need anything more, that is your responsibility. For example, parsing 
a file's content is done with third-party tools.
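
To illustrate what "your responsibility" could look like in practice, here 
is a minimal sketch of a pre-processing step that reduces JSP/HTML source 
to plain text before it is handed to an indexer. It is my own illustration 
and not something that OpenCms or Lucene ships: the class name 
MarkupStripper and the method strip are made up, and the regular 
expressions are a rough approximation, not a real JSP parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenCms or Lucene.
final class MarkupStripper {
    // JSP comments, directives, scriptlets, expressions and declarations: <% ... %>
    private static final Pattern JSP_BLOCK =
            Pattern.compile("<%.*?%>", Pattern.DOTALL);
    // <script> and <style> elements including their content
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("<(script|style)\\b.*?</\\1\\s*>",
                    Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    // HTML comments
    private static final Pattern HTML_COMMENT =
            Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // Any remaining tag, including taglib tags such as <cms:include ... />
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private MarkupStripper() {
    }

    /** Returns the text between the tags, with whitespace collapsed. */
    static String strip(String source) {
        String text = JSP_BLOCK.matcher(source).replaceAll(" ");
        text = SCRIPT_OR_STYLE.matcher(text).replaceAll(" ");
        text = HTML_COMMENT.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }
}
```

Calling MarkupStripper.strip(jspSource) gives you only the literal text of 
the page. It will go wrong when a scriptlet contains the characters "%>" 
inside a string, or when an attribute value contains ">", and it does not 
decode entities such as &amp;amp;. It also cannot see any text that the JSP 
produces at run time.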

In OpenCms, if you need to index the JSP results, I think you need to 
index the JSP static export (on the file system). But the OpenCms index 
module doesn't index files on the file system, only files in the VFS. 
You would need to write additional code (a class or a JSP) to add that 
content to the OpenCms module index.
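
As a rough sketch of the first half of that extra code, the class below 
walks an export folder on the file system and reads every exported page 
into memory. Everything in it is an assumption for illustration: the class 
name ExportScanner is made up, I assume the exported JSP results end in 
".html" and are UTF-8 encoded, and the export root is whatever folder your 
static export writes to. Adding the collected text to the OpenCms index is 
not shown, because that part depends on the OpenCms search classes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper, not part of OpenCms.
final class ExportScanner {
    private ExportScanner() {
    }

    /**
     * Reads all *.html files below exportRoot.
     * Keys are paths relative to exportRoot (with '/' separators),
     * values are the raw file contents.
     */
    static Map<String, String> readExportedPages(Path exportRoot) throws IOException {
        Map<String, String> pages = new TreeMap<>();
        try (Stream<Path> files = Files.walk(exportRoot)) {
            files.filter(Files::isRegularFile)
                 // Assumption: exported pages carry an .html suffix.
                 .filter(p -> p.getFileName().toString().endsWith(".html"))
                 .forEach(p -> {
                     try {
                         String key = exportRoot.relativize(p).toString().replace('\\', '/');
                         // Assumption: the export is UTF-8 encoded.
                         String content = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                         pages.put(key, content);
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
        return pages;
    }
}
```

The map values are still HTML, so the tags have to be removed before the 
text is indexed, and you have to decide yourself which of the exported 
pages a given user is allowed to find.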

Maybe you need to know how Lucene does it; see the Lucene site:
http://lucene.apache.org/java/docs/index.html

Bye
Ernesto.


Jonathan Woods wrote:
> How about getting Lucene to index the _result_ of the JSP page, i.e. what an
> end user would see?  This feels like what you would want in a case like
> this: you just want to index the content as the end user sees it, not to
> understand the structure of the JSP as you would want to make use of the
> structure of content marked up with semantic XML.
> 
> I don't know whether or not OpenCms/Lucene provides something which makes
> this easy, but conceptually it shouldn't be a problem -  when a JSP resource
> changes, you could just get Lucene to visit the corresponding URL over HTTP
> and to index the HTML it gets sent back.  You'd also have to take account of
> security/permissions when adding stuff to the index.
> 
> Jon
> 
> -----Original Message-----
> From: opencms-dev-bounces at opencms.org
> [mailto:opencms-dev-bounces at opencms.org] On Behalf Of Ernesto De Santis
> Sent: 17 February 2006 12:47
> To: The OpenCms mailing list
> Subject: Re: [opencms-dev] How to use the fastTagStripper for searching
> on a JSP page
> 
> Hi Thierry
> 
> In opencms-search.xml you can change the jsp document type so that it is
> indexed with the HTML document factory:
> 
> <documenttype>
> 	<name>jsp</name>
> 	<!-- my change: -->
> 	<class>org.opencms.search.documents.CmsDocumentHtml</class>
> 	<mimetypes/>
> 	<resourcetypes>
> 		<resourcetype>jsp</resourcetype>
> 	</resourcetypes>
> </documenttype>
> 
> 
> But maybe this isn't the best solution for parsing a JSP file. I think it
> doesn't remove the scriptlets. You can try it and see whether the result is
> good enough for you.
> 
> The best solution is to find a JSP parser and write a JSP document factory.
> You need to implement the I_CmsDocumentFactory interface or simply extend
> the A_CmsVfsDocument class.
> 
> Then configure your opencms-search.xml to use your document factory.
> 
> Good luck
> Ernesto.
> 
> 
> 
> Thierry Collogne wrote:
>> I am using opencms 6.0.4.
>>
>> I have read about the xml file, but how can I configure that the jsp 
>> tags should not be included in the search, only the text in between?
>>
>>
>>> From: Ernesto De Santis <opencms at colaborativa.net>
>>> Reply-To: The OpenCms mailing list <opencms-dev at opencms.org>
>>> To: The OpenCms mailing list <opencms-dev at opencms.org>
>>> Subject: Re: [opencms-dev] How to use the fastTagStripper for 
>>> searching on a JSP page
>>> Date: Thu, 16 Feb 2006 16:16:14 -0300
>>>
>>> Hi
>>>
>>> Are you using OpenCms 5 or 6?
>>>
>>> The thread you read is about OpenCms 5 and the opencmslucene module 
>>> (for OpenCms 5).
>>>
>>> OpenCms 6 now includes built-in indexing and search functionality. 
>>> You can configure the index in WEB-INF/config/opencms-search.xml.
>>>
>>> Have you read the Alkacon documentation about it?
>>>
>>> Bye,
>>> Ernesto.
>>>
>>> Thierry Collogne wrote:
>>>> Hi all,
>>>>
>>>> I am using the Lucene search module which is included in OpenCms. I 
>>>> am able to search JSP pages, but when I search them, the JSP tags 
>>>> are searched as well.
>>>>
>>>> I have already read here
>>>>
>>>> http://mail.opencms.org/pipermail/opencms-dev/2003q3/006883.html
>>>>
>>>> that it should be possible to search the JSP, omitting the tags and 
>>>> code, and only search the text between the tags using a 
>>>> fastTagStripper.
>>>>
>>>> I have no idea how to use the fastTagStripper. Can somebody help me?
>>>>
>>>> Thank you,
>>>>
>>>> Thierry
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> This mail is sent to you from the opencms-dev mailing list To change 
>>>> your list options, or to unsubscribe from the list, please visit 
>>>> http://lists.opencms.org/mailman/listinfo/opencms-dev