[opencms-dev] Lucene Query and untokenized xmlContent items
"Rüdiger Kurz (list)"
r.kurz at alkacon.com
Tue Aug 14 14:37:45 CEST 2012
Hi,
not the cleanest way, but I think it is clean enough!
See org.opencms.search.CmsSearchManager.getAnalyzer(String, String).
The analyzer attribute in:
<field
name="title"
store="true"
index="true"
excerpt="true"
analyzer="your.analyzer.package.ClassName">
[... mappings ...]
</field>
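can also be a fully qualified class name, so your analyzer does not have
to stay in the org.apache.lucene.analysis package.
The class org.apache.lucene.analysis.Analyzer and its method
Analyzer#tokenStream(String, Reader) are both public, so you can extend
it from your own package.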
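
To illustrate why a fully qualified name removes the package restriction, here is a minimal sketch of a name lookup that falls back to a default package. It is an assumption about how such a lookup could work, not a copy of the OpenCms code. The class AnalyzerNameResolution and the method resolve are hypothetical names, and JDK classes stand in for analyzers so that the sketch runs without Lucene on the classpath.

```java
// Hypothetical helper for illustration only; not part of OpenCms or Lucene.
final class AnalyzerNameResolution {

    private AnalyzerNameResolution() {
        // static helper, no instances
    }

    /**
     * Tries the configured name as written first, then retries with the
     * default package prepended. The lookup order is an assumption made
     * for this sketch.
     */
    static Class<?> resolve(String configuredName, String defaultPackage) {
        try {
            // A fully qualified name resolves here, whatever its package.
            return Class.forName(configuredName);
        } catch (ClassNotFoundException e) {
            // Not found as written: fall through to the short-name lookup.
        }
        try {
            // A short name only resolves inside the default package.
            return Class.forName(defaultPackage + configuredName);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(
                "No class found for: " + configuredName, e);
        }
    }

    public static void main(String[] args) {
        // Short name: found only because it lives in the default package.
        System.out.println(resolve("StringBuilder", "java.lang.").getName());
        // Fully qualified name: found regardless of the default package.
        System.out.println(resolve("java.util.ArrayList", "java.lang.").getName());
    }
}
```

With a lookup like this, analyzer="FirstWordAnalyzer" is only found when the class sits in the default package, while analyzer="your.analyzer.package.ClassName" is found wherever the class lives.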
regards
On 08/14/2012 12:57 PM, Cavva79 wrote:
>
> Hi Rudiger,
>
> I finally implemented a new analyzer that uses the
> LimitTokenCountAnalyzer.
>
> package org.apache.lucene.analysis;
>
> import java.io.Reader;
>
> import org.apache.lucene.util.Version;
>
> // Delegates to a SimpleAnalyzer limited to one token, so only the
> // first word of the field is indexed (lowercased by SimpleAnalyzer).
> public class FirstWordAnalyzer extends Analyzer {
>
>     private final Analyzer delegate;
>     private final Version matchVersion;
>
>     public FirstWordAnalyzer(Version matchVersion) {
>         this.matchVersion = matchVersion;
>         delegate = new LimitTokenCountAnalyzer(
>             new SimpleAnalyzer(this.matchVersion), 1);
>     }
>
>     @Override
>     public TokenStream tokenStream(String fieldName, Reader reader) {
>         return delegate.tokenStream(fieldName, reader);
>     }
> }
>
> and configured the field in opencms-search.xml like this:
>
> <field name="title" store="true" index="true"
>        excerpt="true" analyzer="FirstWordAnalyzer">
>     <mapping type="item">title[1]</mapping>
> </field>
>
> I think it is not the cleanest way to do it. For example, the analyzer
> must stay in the package org.apache.lucene.analysis.
>
> Any suggestions for a better way to do it?
>
> thank you Rudiger and thank you all
>
> Davide
>
>
> "Rüdiger Kurz (list)" wrote:
>>
>> Hi,
>>
>> the searches in OpenCms are performed in the class CmsSearchIndex.
>> You are able to extend this class and override:
>>
>> CmsSearchIndex#search(CmsObject, CmsSearchParameters)
>>
>> maybe you can go this way?
>>
>> regards Rüdiger
>>
>>
>> On 08/14/2012 10:21 AM, Cavva79 wrote:
>>>
>>> Hi there,
>>>
>>> I'm new to this forum.
>>>
>>> I'm using OpenCms 8.0.4, Tomcat 7.0.12 and JDK 6.
>>> In short, I'm having trouble getting the first character of an
>>> xmlcontent field, which is a short sentence, like the "title" of an
>>> article, and may start with an uppercase or a lowercase letter.
>>> It must be done with a Lucene query because it is part of the search
>>> request.
>>>
>>> 1) I'm indexing the full xmlcontent item "untokenized" because I want to
>>> get the first character of the full sentence and not of any word in it;
>>>
>>> 2) I tried sending the query to CmsSearch with setQuery(query) and
>>> setParsedQuery(queryAlreadyParsedWithLuceneQuery). In both cases OpenCms
>>> uses Lucene's QueryParser to normalize the query, which results in a
>>> modified query with all search terms in lowercase (verified by enabling
>>> the logs and debugging). My text starts with an uppercase letter.
>>>
>>> 3) By the way, the query I build works fine in Luke!
>>>
>>> Has anybody found a workaround, or does anybody have an idea, such as a
>>> parameter that allows case-insensitive searches?
>>>
>>> Best regards,
>>> Davide
--
Kind Regards,
Rüdiger.
-------------------
Visit OpenCms Days 2012 Conference and Expo September 24 to 25, 2012 in
Cologne, Germany http://www.opencms-days.org
Rüdiger Kurz
Alkacon Software GmbH - The OpenCms Experts
http://www.alkacon.com - http://www.opencms.org