[opencms-dev] Lucene Query and untokenized xmlContent items

Tue Aug 14 14:37:45 CEST 2012

Hi,

Not the cleanest way, but I think it is clean enough!

@see org.opencms.search.CmsSearchManager.getAnalyzer(String, String)

the analyzer attribute in:

<field
     name="title"
     store="true"
     index="true"
     excerpt="true"
     analyzer="your.analyzer.package.ClassName">
     [... mappings ...]
</field>

can also be a fully qualified class name, so your analyzer does not
have to live in the package org.apache.lucene.analysis.

The class org.apache.lucene.analysis.Analyzer and its method
Analyzer#tokenStream(String, Reader) are both public, so you can extend
the class from your own package.
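
To illustrate why a fully qualified name lifts the package restriction,
here is a minimal sketch of a lookup that accepts both a short name and a
fully qualified one. It is not copied from the OpenCms source; the class
ClassNameResolver and its resolve method are hypothetical, and java.util
only stands in for org.apache.lucene.analysis so that the sketch runs
without Lucene on the classpath.

```java
// Hypothetical sketch, NOT the OpenCms implementation: resolve a configured
// class name that may be fully qualified or a short name in a default package.
final class ClassNameResolver {

    private ClassNameResolver() {
        // static helper only
    }

    /**
     * Tries the configured name exactly as given first and, if no such class
     * exists, falls back to defaultPackage + configuredName.
     * The defaultPackage argument is expected to end with a dot.
     */
    static Class<?> resolve(String configuredName, String defaultPackage)
            throws ClassNotFoundException {
        try {
            return Class.forName(configuredName);
        } catch (ClassNotFoundException e) {
            return Class.forName(defaultPackage + configuredName);
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Short name: only found through the default-package fallback.
        System.out.println(resolve("ArrayList", "java.util."));
        // Fully qualified name: found directly, the package does not matter.
        System.out.println(resolve("java.util.ArrayList", "java.util."));
    }
}
```

With a lookup of this kind, a short name such as "FirstWordAnalyzer" only
works when the class sits in the default package, while a fully qualified
name works from any package.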

regards

On 08/14/2012 12:57 PM, Cavva79 wrote:
>
> Hi Rudiger,
>
> I finally implemented a new analyzer that uses the
> LimitTokenCountAnalyzer.
>
> package org.apache.lucene.analysis;
>
> import java.io.Reader;
>
> import org.apache.lucene.util.Version;
>
> /** Analyzer that emits only the first token of a field. */
> public class FirstWordAnalyzer extends Analyzer {
>
> 	private final Analyzer delegate;
> 	private final Version matchVersion;
>
> 	public FirstWordAnalyzer(Version matchVersion) {
> 		this.matchVersion = matchVersion;
> 		// Limit the SimpleAnalyzer output to a single token.
> 		delegate = new LimitTokenCountAnalyzer(new SimpleAnalyzer(this.matchVersion), 1);
> 	}
>
> 	@Override
> 	public TokenStream tokenStream(String fieldName, Reader reader) {
> 		return delegate.tokenStream(fieldName, reader);
> 	}
> }
>
> and configured the field in opencms-search.xml like this:
>
>                      <field name="title" store="true" index="true" excerpt="true" analyzer="FirstWordAnalyzer">
>                          <mapping type="item">title[1]</mapping>
>                      </field>
>
> I think this is not the cleanest way to do it. For example, the analyzer
> must live in the package org.apache.lucene.analysis.
>
> Any suggestions for doing it in a better way?
>
> thank you Rudiger and thank you all
>
> Davide
>
>
> "Rüdiger Kurz (list)" wrote:
>>
>> Hi,
>>
>> the searches in OpenCms are performed in the class CmsSearchIndex.
>> You are able to extend this class and override:
>>
>> CmsSearchIndex#search(CmsObject, CmsSearchParameters)
>>
>> maybe you can go this way?
>>
>> regards Rüdiger
>>
>>
>> On 08/14/2012 10:21 AM, Cavva79 wrote:
>>>
>>> Hi there,
>>>
>>> I'm new to this forum.
>>>
>>> I'm using OpenCms 8.0.4, Tomcat 7.0.12 and JDK 6.
>>> In short, I have trouble matching the first character of an xmlcontent
>>> field that holds a short sentence, like the "title" of an article, which
>>> can start with either an uppercase or a lowercase letter.
>>> It must be a Lucene query because it is part of the request sent to
>>> Lucene.
>>>
>>> 1) I'm indexing the full xmlcontent item "untokenized" because I want to
>>> get the first character of the full sentence and not of any word in it;
>>>
>>> 2) I tried to send the query to CmsSearch with setQuery(query) and
>>> with setParsedQuery(queryAlreadyParsedWithLuceneQuery). In both cases
>>> OpenCms uses Lucene's QueryParser to normalize the query, which results
>>> in a modified query with all search terms in lowercase. I verified this
>>> by enabling the logs and debugging. My text starts with an uppercase
>>> letter.
>>>
>>> 3) By the way, the query I build works fine in Luke!
>>>
>>> Has anybody found a workaround, or does anyone have an idea, such as a
>>> parameter that allows case-insensitive searches?
>>>
>>> Best regards,
>>> Davide

-- 
Kind Regards,
Rüdiger.

Rüdiger Kurz

Alkacon Software GmbH  - The OpenCms Experts
http://www.alkacon.com - http://www.opencms.org