[opencms-dev] Lucene-search: stop words aren't displayed in search result list
Christian Steinert
christian_steinert at web.de
Tue May 30 12:22:19 CEST 2006
Jonathan Woods wrote:
> Jason -
>
> I can tell this problem is in my near future too. Is it really necessary to
> create a patch? I was hoping to specify an analyser class in
> opencms-search.xml and get round the problem that way.
>
> Jon
>
Dear Jason, dear Jonathan,
I found this overview somewhere on the web. It shows that each
analyzer uses a fixed tokenizer/filter chain, so a single analyzer
may bundle stop-word filtering and stemming together.
Analyzer class: Tokenizer and TokenFilters
* GermanAnalyzer: StandardTokenizer, StandardFilter, StopFilter
  (German by default, alternative word list possible), GermanStemFilter
  (configurable exclusion list), LowerCaseFilter
* SimpleAnalyzer: LowerCaseTokenizer
* StandardAnalyzer: StandardTokenizer, StandardFilter,
  LowerCaseFilter, StopFilter (English by default, alternative
  word list possible)
* StopAnalyzer: LowerCaseTokenizer, StopFilter (English by
  default, alternative word list possible)
* WhitespaceAnalyzer: WhitespaceTokenizer
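To make the difference between these chains concrete, here is a rough
sketch in plain Java. It does not use the Lucene API at all; it only
imitates what a WhitespaceAnalyzer-style chain and a StopAnalyzer-style
chain do to a sentence. The class name, the method names and the very
short stop list are my own assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerChainSketch {

    // Hypothetical, abbreviated stop list; Lucene's built-in lists are longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "is", "on", "a", "and", "of"));

    // Imitates a WhitespaceAnalyzer-style chain: split on whitespace, change nothing else.
    static List<String> whitespaceChain(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Imitates a StopAnalyzer-style chain: split at non-letters, lowercase, drop stop words.
    static List<String> stopChain(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            String lower = t.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The cat is on the mat";
        System.out.println(String.join(" ", whitespaceChain(text))); // The cat is on the mat
        System.out.println(String.join(" ", stopChain(text)));       // cat mat
    }
}
```

If the preview is rebuilt from the second kind of token stream, the stop
words are simply gone, which would match the symptom in the subject line.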
The clean way would be to pull the preview content not from Lucene, but
from OpenCms.
Is this the way it's done? Is the mistake just that OpenCms filters the
content badly before displaying it?
Jason - I would be *very* interested in taking a look at your patched
code. Maybe the whitespace removal inside OpenCms could just be done
with a more primitive analyzer, perhaps the WhitespaceAnalyzer. I think
it's quite mistaken to filter the preview through the same analyzer
that is used for indexing.
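What I have in mind for the preview is roughly the following. This is
only a sketch in plain Java under my own assumptions, not the actual
OpenCms code: the class PreviewSketch and its methods are hypothetical
names. The idea is to take the original text, collapse the whitespace,
and cut a window around the first hit, without dropping any words.

```java
class PreviewSketch {

    // Collapse every whitespace run to one space; no words are dropped or altered.
    static String normalize(String raw) {
        return raw.trim().replaceAll("\\s+", " ");
    }

    // Case-insensitive search for the query term in the normalized text.
    static int indexOfIgnoreCase(String text, String term) {
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                return i;
            }
        }
        return -1;
    }

    // Cut a window of about `radius` characters on each side of the first hit.
    // If the term is not found, fall back to the start of the text.
    static String excerpt(String raw, String term, int radius) {
        String text = normalize(raw);
        int hit = indexOfIgnoreCase(text, term);
        if (hit < 0) {
            return text.substring(0, Math.min(text.length(), 2 * radius)).trim();
        }
        int start = Math.max(0, hit - radius);
        int end = Math.min(text.length(), hit + term.length() + radius);
        return text.substring(start, end).trim();
    }

    public static void main(String[] args) {
        // Stop words such as "The" and "is" stay in the excerpt.
        System.out.println(excerpt("The  cat\n is   on the mat", "cat", 4)); // The cat is
    }
}
```

The window is cut by character count, so it may start or end in the
middle of a word; a real implementation would probably snap to word
boundaries.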
Christian