[opencms-dev] Lucene Query and untokenized xmlContent items
Cavva79
davide.cavarretta at gmail.com
Tue Aug 14 12:57:58 CEST 2012
Hi Rudiger,
I finally solved it by implementing a new analyzer that wraps the
LimitTokenCountAnalyzer:
package org.apache.lucene.analysis;

import java.io.Reader;

import org.apache.lucene.util.Version;

// Emits only the first token of a field: SimpleAnalyzer splits on
// non-letters and lowercases, LimitTokenCountAnalyzer keeps one token.
public class FirstWordAnalyzer extends Analyzer {

    private final Analyzer delegate;
    private final Version matchVersion;

    public FirstWordAnalyzer(Version matchVersion) {
        this.matchVersion = matchVersion;
        delegate = new LimitTokenCountAnalyzer(
            new SimpleAnalyzer(this.matchVersion), 1);
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return delegate.tokenStream(fieldName, reader);
    }
}
and configured the field in opencms-search.xml like this:
<field name="title" store="true" index="true"
       excerpt="true" analyzer="FirstWordAnalyzer">
    <mapping type="item">title[1]</mapping>
</field>
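To make clear what ends up in the index for this field, here is a minimal plain-Java sketch of the token the analyzer above should produce. It does not call Lucene at all: the class name FirstTokenSketch and the method firstToken are made up for illustration, and the logic only approximates SimpleAnalyzer (split on non-letters, lowercase) limited to one token.

```java
public class FirstTokenSketch {

    // Plain-Java approximation (no Lucene calls): return the first run of
    // letters in the text, lowercased, or "" if the text has no letters.
    static String firstToken(String text) {
        StringBuilder token = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                token.append(Character.toLowerCase(c));
            } else if (token.length() > 0) {
                break; // the first token is complete
            }
        }
        return token.toString();
    }

    public static void main(String[] args) {
        System.out.println(firstToken("OpenCms Days 2012"));    // opencms
        System.out.println(firstToken("  \"Lucene\" queries")); // lucene
    }
}
```

Since the indexed token is already lowercase, a lowercased prefix query on the first letter (for example title:o*) should match regardless of how the original title was capitalized.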
I don't think this is the cleanest way to do it. For example, the analyzer
has to live in the package org.apache.lucene.analysis.
Any suggestions for doing it in a better way?
Thank you, Rüdiger, and thank you all.
Davide
"Rüdiger Kurz (list)" wrote:
>
> Hi,
>
> the searches in OpenCms are performed in the class CmsSearchIndex.
> You are able to extend this class and override:
>
> CmsSearchIndex#search(CmsObject, CmsSearchParameters)
>
> maybe you can go this way?
>
> regards Rüdiger
>
>
> On 08/14/2012 10:21 AM, Cavva79 wrote:
>>
>> Hi there,
>>
>> I'm new to this forum.
>>
>> I'm using OpenCms 8.0.4, Tomcat 7.0.12 and JDK 6.
>> In short, I'm having trouble getting the first character of an xmlcontent
>> field, which is a short sentence, like the "title" of an article, that
>> could start with either an uppercase or a lowercase letter.
>> It must be done with a Lucene query because it is part of the search
>> request.
>>
>> 1) I'm indexing the full xmlcontent item "untokenized" because I want to
>> get the first character of the full sentence and not of each word in it;
>>
>> 2) I tried sending the query to CmsSearch with setQuery(query) and
>> setParsedQuery(queryAlreadyParsedWithLuceneQuery). In both cases OpenCms
>> uses Lucene's QueryParser to normalize the query, which results in a
>> modified query with all search terms in lowercase (verified by enabling
>> the logs and debugging). My text starts with an uppercase letter.
>>
>> 3) By the way, the same query works fine in Luke!
>>
>> Has anybody found a workaround, or does anyone have an idea, such as a
>> parameter to allow case-insensitive searches?
>>
>> Best regards,
>> Davide
>>
>
> --
> Kind Regards,
> Rüdiger.
>
> Rüdiger Kurz
>
> Alkacon Software GmbH - The OpenCms Experts
> http://www.alkacon.com - http://www.opencms.org