[opencms-dev] Lucene Query and untokenized xmlContent items

Tue Aug 14 12:57:58 CEST 2012

Hi Rüdiger,

In the end I implemented a new analyzer that wraps Lucene's
LimitTokenCountAnalyzer, so that only the first word of the field is indexed:

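package org.apache.lucene.analysis;

import java.io.Reader;

import org.apache.lucene.util.Version;

/**
 * Analyzer that keeps only the first token of a field.
 * SimpleAnalyzer splits the text at non-letters and lowercases it;
 * the token limit of 1 drops everything after the first word.
 */
// Declared final because Lucene 3.x expects Analyzer implementations
// (or at least their tokenStream methods) to be final.
public final class FirstWordAnalyzer extends Analyzer {

	private final Analyzer delegate;
	private final Version matchVersion;

	public FirstWordAnalyzer(Version matchVersion) {
		this.matchVersion = matchVersion;
		// Emit at most one token per field.
		delegate = new LimitTokenCountAnalyzer(new SimpleAnalyzer(this.matchVersion), 1);
	}

	@Override
	public TokenStream tokenStream(String fieldName, Reader reader) {
		return delegate.tokenStream(fieldName, reader);
	}
}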

I then configured the field in opencms-search.xml like this:

                    <field name="title" store="true" index="true" excerpt="true" analyzer="FirstWordAnalyzer">
                        <mapping type="item">title[1]</mapping>
                    </field>
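
To make clear what I expect to end up in the index for this field, here is a
plain-Java sketch that only mimics the combination above: the first run of
letters, lowercased. It does not use Lucene at all; the class name
FirstWordSketch and the method firstToken are made up for this illustration,
and it ignores details such as supplementary characters.

```java
import java.util.Locale;

public class FirstWordSketch {

    // Hypothetical helper: returns the first maximal run of letters in the
    // text, lowercased, or an empty string if the text contains no letters.
    public static String firstToken(String text) {
        int start = 0;
        // Skip anything that is not a letter (spaces, quotes, digits, ...).
        while (start < text.length() && !Character.isLetter(text.charAt(start))) {
            start++;
        }
        int end = start;
        // Consume letters until the first non-letter.
        while (end < text.length() && Character.isLetter(text.charAt(end))) {
            end++;
        }
        return text.substring(start, end).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(firstToken("OpenCms search tips"));                  // opencms
        System.out.println(firstToken("  \"Lucene\" and untokenized fields")); // lucene
    }
}
```

Since the indexed term is already lowercase, a query that the parser has
lowercased should match it regardless of how the title is capitalized.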

I don't think this is the cleanest way to do it. For example, the analyzer
has to live in the package org.apache.lucene.analysis.

Any suggestions for a better way to do it?

Thank you, Rüdiger, and thanks to everyone.

Davide

"Rüdiger Kurz (list)" wrote:
> 
> Hi,
> 
> The searches in OpenCms are performed in the class CmsSearchIndex.
> You can extend this class and override:
> 
> CmsSearchIndex#search(CmsObject, CmsSearchParameters)
> 
> maybe you can go this way?
> 
> regards Rüdiger
> 
> 
> On 08/14/2012 10:21 AM, Cavva79 wrote:
>>
>> Hi there,
>>
>> I'm new to this forum.
>>
>> I'm using OpenCms 8.0.4, Tomcat 7.0.12 and JDK 6.
>> In short, I'm having trouble getting the first character of an xmlcontent
>> field. The field holds a short sentence, like the "title" of an article,
>> and it may start with either an uppercase or a lowercase letter.
>> It must be a Lucene query because it is part of the request sent to Lucene.
>>
>> 1) I'm indexing the full xmlcontent item "untokenized" because I want to
>> get the first character of the whole sentence, not of each word in it;
>>
>> 2) I tried sending the query to CmsSearch with both setQuery(query) and
>> setParsedQuery(queryAlreadyParsedWithLuceneQuery). In both cases OpenCms
>> uses Lucene's QueryParser to normalize the query, which results in a
>> modified query with all search terms in lowercase (verified by enabling
>> the logs and debugging). My text starts with an uppercase letter.
>>
>> 3) By the way, the same query works fine in Luke!
>>
>> Has anybody found a workaround, or does anyone have an idea, such as a
>> parameter that allows case-insensitive searches?
>>
>> Best regards,
>> Davide
>>
> 
> -- 
> Kind Regards,
> Rüdiger.
> 
