[opencms-dev] Lucene-search: stop words aren't displayed in search result list

Christian Steinert christian_steinert at web.de
Tue May 30 12:22:19 CEST 2006


Jonathan Woods wrote:
> Jason -
>
> I can tell this problem is in my near future too.  Is it really necessary to
> create a patch?  I was hoping to specify an analyser class in
> opencms-search.xml and get round the problem that way.
>
> Jon
>   
Dear Jason, dear Jonathan,

I found this overview somewhere on the web. It shows that each Lucene 
analyzer uses a fixed chain of one tokenizer plus token filters, so a 
single analyzer may contain both stop-word filters and stemmers.

Analyzer class: tokenizer and token filters

* GermanAnalyzer: StandardTokenizer, StandardFilter, StopFilter 
  (German by default, alternative word list possible), GermanStemFilter 
  (configurable exclusion list), LowerCaseFilter
* SimpleAnalyzer: LowerCaseTokenizer
* StandardAnalyzer: StandardTokenizer, StandardFilter, 
  LowerCaseFilter, StopFilter (English by default, alternative 
  word list possible)
* StopAnalyzer: LowerCaseTokenizer, StopFilter (English by 
  default, alternative word list possible)
* WhitespaceAnalyzer: WhitespaceTokenizer
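
To make the difference between these chains concrete, here is a rough 
sketch in plain Java. It does not use the Lucene API; it only imitates 
what a whitespace-only, a lowercasing, and a stop-filtering chain do to 
the same sentence. The class name, the method names and the tiny 
stop-word list are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerChainSketch {

    // Hypothetical stop-word list for illustration only;
    // Lucene's analyzers ship their own lists.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "is", "in", "a"));

    // Roughly what a whitespace-only chain does:
    // split on whitespace, keep case, keep every word.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Roughly what a lowercasing tokenizer does:
    // split at non-letters and lowercase each token.
    static List<String> lowerCaseTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Lowercased tokens with stop words dropped. Text rebuilt from
    // these tokens can no longer show the stop words.
    static List<String> stopFilteredTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : lowerCaseTokens(text)) {
            if (!STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The cat is in the house";
        System.out.println(String.join(" ", whitespaceTokens(text)));
        System.out.println(String.join(" ", lowerCaseTokens(text)));
        System.out.println(String.join(" ", stopFilteredTokens(text)));
    }
}
```

Only the last variant loses words, which matches the symptom from the 
subject line: a preview assembled from stop-filtered tokens will be 
missing exactly those words.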

The clean way would be to pull the preview content not from Lucene, but 
from OpenCms.
Is this the way it's done? Is the mistake just that OpenCms filters the 
content badly before displaying it?

Jason - I would be *very* interested in taking a look at your patched 
code. Maybe the whitespace removal inside OpenCms could just be done 
with a more primitive analyzer, perhaps the WhitespaceAnalyzer. I think 
it's quite mistaken to filter the preview through the same analyzer 
that is used for indexing.
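
As a rough sketch of what I mean (plain Java, not actual OpenCms or 
Lucene code; the class and method names are made up), the preview could 
be cut from the original text, with only the whitespace normalized and 
no stop-word removal or stemming:

```java
import java.util.Locale;

public class PreviewSketch {

    // Hypothetical helper, not an OpenCms or Lucene method.
    // Returns a window of the original text around the first
    // occurrence of the search term.
    static String excerpt(String content, String term, int radius) {
        // Collapse runs of whitespace only; every word is kept.
        String clean = content.trim().replaceAll("\\s+", " ");

        // Case-insensitive search. This assumes lowercasing does not
        // change the string length, which holds for ordinary English
        // and German text.
        int hit = clean.toLowerCase(Locale.ROOT)
                       .indexOf(term.toLowerCase(Locale.ROOT));
        if (hit < 0) {
            // Term not found: fall back to the start of the text.
            return clean.substring(0, Math.min(clean.length(), 2 * radius));
        }
        int start = Math.max(0, hit - radius);
        int end = Math.min(clean.length(), hit + term.length() + radius);
        return clean.substring(start, end);
    }

    public static void main(String[] args) {
        String content = "The  cat is\n in the house";
        System.out.println(excerpt(content, "cat", 100));
    }
}
```

The index would still be built with the stemming, stop-filtering 
analyzer; only the displayed excerpt bypasses it.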

Christian



