[opencms-dev] How to use the fastTagStripper for searching on a JSP page

Jonathan Woods jonathan.woods at scintillance.com
Sun Feb 19 07:14:33 CET 2006


How about getting Lucene to index the _result_ of the JSP page, i.e. what an
end user would see?  That seems to be what you want in a case like this: to
index the content as the end user sees it, rather than to interpret the
structure of the JSP source the way you would with content marked up in
semantic XML.

I don't know whether OpenCms/Lucene provides anything that makes this easy,
but conceptually it shouldn't be a problem: when a JSP resource changes, you
could have the indexer request the corresponding URL over HTTP and index the
HTML it gets back.  You'd also have to take security and permissions into
account when adding content to the index.
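
As a rough illustration of the idea above, here is a minimal Java sketch
that fetches a rendered page and reduces the HTML to plain text that could
then be handed to an indexer.  It does not use any OpenCms or Lucene API;
the class name, the method names and the example URL are all hypothetical,
and the regex-based stripping is deliberately crude (a real HTML parser
would be more robust).  It assumes Java 11 or later for java.net.http.
Note that an anonymous HTTP request only returns what a guest user may see,
which is one way the permissions question shows up.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper: fetches a rendered page and reduces it to indexable text. */
class RenderedPageText {

    /** Downloads the HTML that an anonymous visitor would receive for the given URL. */
    static String fetchHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    /** Crude regex-based reduction of HTML to plain text. */
    static String htmlToText(String html) {
        return html
            // drop script and style blocks together with their content
            .replaceAll("(?is)<(script|style)\\b.*?</\\1\\s*>", " ")
            // drop HTML comments
            .replaceAll("(?s)<!--.*?-->", " ")
            // drop all remaining tags
            .replaceAll("(?s)<[^>]*>", " ")
            // decode a few common entities; &amp; goes last to avoid double decoding
            .replace("&nbsp;", " ")
            .replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")
            // collapse whitespace
            .replaceAll("\\s+", " ")
            .trim();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage:
        //   java RenderedPageText.java http://localhost:8080/opencms/opencms/index.jsp
        if (args.length == 1) {
            System.out.println(htmlToText(fetchHtml(args[0])));
        }
    }
}
```

The text returned by htmlToText is what you would pass to the index in
place of the raw JSP source.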

Jon

-----Original Message-----
From: opencms-dev-bounces at opencms.org
[mailto:opencms-dev-bounces at opencms.org] On Behalf Of Ernesto De Santis
Sent: 17 February 2006 12:47
To: The OpenCms mailing list
Subject: Re: [opencms-dev] How to use the fastTagStripper for searching
on a JSP page

Hi Thierry

In opencms-search.xml you can change the jsp document type so that it is
indexed with the HTML document factory:

<documenttype>
	<name>jsp</name>
<!-- my change: -->
	<class>org.opencms.search.documents.CmsDocumentHtml</class>
	<mimetypes/>
	<resourcetypes>
		<resourcetype>jsp</resourcetype>
	</resourcetypes>
</documenttype>		


But this may not be the best way to parse a JSP file; I don't think it
removes the scriptlets. Try it and see whether the result is good enough
for you.

The best solution is to find a JSP parser and write a JSP document factory.
You need to implement the I_CmsDocumentFactory interface or simply extend
the A_CmsVfsDocument class.

Then configure opencms-search.xml to use your document factory.
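
To give an idea of the text-extraction step such a factory would need, here
is a minimal, self-contained Java sketch that strips JSP comments,
scriptlets, directives, EL expressions and tags from JSP source.  The class
JspTextExtractor and its method are hypothetical, and the sketch uses
regular expressions where a real JSP parser would be more reliable (for
example, a scriptlet containing the string "%>" would confuse it).  The
wiring into I_CmsDocumentFactory or A_CmsVfsDocument is left out on purpose,
since the exact method signatures depend on the OpenCms version.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that a custom JSP document factory could call. */
class JspTextExtractor {

    // JSP comments: <%-- ... --%> (removed first, since they may contain "%>")
    private static final Pattern JSP_COMMENTS = Pattern.compile("(?s)<%--.*?--%>");
    // Directives, declarations, expressions and scriptlets: <%@ %>, <%! %>, <%= %>, <% %>
    private static final Pattern JSP_BLOCKS = Pattern.compile("(?s)<%.*?%>");
    // EL expressions such as ${user.name}
    private static final Pattern EL = Pattern.compile("\\$\\{[^}]*\\}");
    // Remaining markup: HTML tags and taglib tags such as <cms:include ... />
    private static final Pattern TAGS = Pattern.compile("(?s)<[^>]*>");

    /** Returns only the literal text found between the tags and code of a JSP source. */
    static String extractText(String jspSource) {
        String text = JSP_COMMENTS.matcher(jspSource).replaceAll(" ");
        text = JSP_BLOCKS.matcher(text).replaceAll(" ");
        text = EL.matcher(text).replaceAll(" ");
        text = TAGS.matcher(text).replaceAll(" ");
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String jsp = "<%@ page import=\"java.util.*\" %>\n"
            + "<h1>News</h1>\n"
            + "<% for (int i = 0; i < 3; i++) { %>\n"
            + "<p>Latest items for ${user.name}</p>\n"
            + "<% } %>\n"
            + "<cms:include file=\"footer.jsp\" />";
        System.out.println(extractText(jsp)); // prints "News Latest items for"
    }
}
```

Unlike indexing the rendered page, this only sees the static text in the
JSP source; anything produced at run time by scriptlets or tags is lost.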

good luck
Ernesto.



Thierry Collogne wrote:
> I am using OpenCms 6.0.4.
> 
> I have read about the XML file, but how can I configure it so that the 
> JSP tags are excluded from the search and only the text in between is 
> indexed?
> 
> 
>> From: Ernesto De Santis <opencms at colaborativa.net>
>> Reply-To: The OpenCms mailing list <opencms-dev at opencms.org>
>> To: The OpenCms mailing list <opencms-dev at opencms.org>
>> Subject: Re: [opencms-dev] How to use the fastTagStripper for 
>> searching on a JSP page
>> Date: Thu, 16 Feb 2006 16:16:14 -0300
>>
>> Hi
>>
>> Are you using OpenCms 5 or 6?
>>
>> The thread you read is about OpenCms 5 and the opencmslucene module 
>> (for OpenCms 5).
>>
>> OpenCms 6 now includes built-in indexing and search functionality.
>> You can configure the index in WEB-INF/config/opencms-search.xml.
>>
>> Have you read the Alkacon documentation about it?
>>
>> Bye,
>> Ernesto.
>>
>> Thierry Collogne wrote:
>>> Hi all,
>>>
>>> I am using the Lucene search module that is included in OpenCms. I 
>>> am able to search JSP pages, but when I do, the JSP tags are 
>>> searched as well.
>>>
>>> I have already read here
>>>
>>> http://mail.opencms.org/pipermail/opencms-dev/2003q3/006883.html
>>>
>>> that it should be possible to search the JSP while omitting the tags 
>>> and code, so that only the text between the tags is searched, using 
>>> a fastTagStripper.
>>>
>>> I have no idea how to use the fastTagStripper. Can somebody help me?
>>>
>>> Thank you,
>>>
>>> Thierry
>>>
>>>
>>>
>>> _______________________________________________
>>> This mail is sent to you from the opencms-dev mailing list To change 
>>> your list options, or to unsubscribe from the list, please visit 
>>> http://lists.opencms.org/mailman/listinfo/opencms-dev
>>>
>>>



