[opencms-dev] Lucene 1.4, Spanish Analyzer

Ernesto De Santis ernesto.desantis at colaborativa.net
Thu Mar 4 16:47:01 CET 2004


Hi Stephan

> It seems that your analyzer reduces words to their stems, i.e. it removes
> the inflections and declensions. This is exactly what a stemmer should do.

Well, how do the analyzers for other languages work?
With stemmers?

> Can you verify this for verbs ending in -ir and -er or their inflections,
> too?

Yes, all Spanish verbs end in ar, er, OR ir (hehe, "or" as in English;
Spanish verbs don't end in "or").
The SpanishStemmer removes "er" and "ir" correctly.
But when I try "caer" (to fall), it does not remove "er"! Maybe because the
remaining "ca" is too short?
With other words, such as "destruir", it works fine and removes "ir".
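The "caer" behaviour is most likely explained by how the published Snowball Spanish algorithm restricts suffix removal to a region called RV: when a word starts with a consonant followed by a vowel, RV begins after the third letter, so in "caer" RV is only "r" and the "er" ending does not fit inside it, while in "destruir" RV is "truir" and "ir" can be removed. The sketch below reproduces just that rule, written from the algorithm description; it is not the code in Lucene's Snowball package, the names `SpanishRvDemo`, `rvStart` and `stripInfinitive` are hypothetical, and only the -ar/-er/-ir endings are handled.

```java
// Simplified sketch of the RV region from the Snowball Spanish algorithm description.
// Hypothetical class; NOT Lucene's generated SpanishStemmer.
class SpanishRvDemo {
    // Spanish vowels per the Snowball description: a e i o u, accented forms, and u-umlaut.
    private static final String VOWELS = "aeiou\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc";

    static boolean isVowel(char c) {
        return VOWELS.indexOf(c) >= 0;
    }

    // Index where RV starts; a suffix may only be removed if it lies entirely inside RV.
    static int rvStart(String w) {
        int n = w.length();
        if (n < 2) {
            return n; // RV is empty (the end of the word)
        }
        if (!isVowel(w.charAt(1))) {
            // Second letter is a consonant: RV starts after the next following vowel.
            for (int i = 2; i < n; i++) {
                if (isVowel(w.charAt(i))) {
                    return i + 1;
                }
            }
            return n;
        }
        if (isVowel(w.charAt(0))) {
            // First two letters are vowels: RV starts after the next consonant.
            for (int i = 2; i < n; i++) {
                if (!isVowel(w.charAt(i))) {
                    return i + 1;
                }
            }
            return n;
        }
        // Consonant followed by vowel: RV starts after the third letter.
        return Math.min(3, n);
    }

    // Only infinitive endings are checked here; the real stemmer has many more rules.
    static String stripInfinitive(String w) {
        int rv = rvStart(w);
        for (String suffix : new String[] {"ar", "er", "ir"}) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= rv) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"caer", "destruir"}) {
            System.out.println(w + ": RV = \"" + w.substring(rvStart(w)) + "\", stem = " + stripInfinitive(w));
        }
    }
}
```

Running `main` prints `caer: RV = "r", stem = caer` and `destruir: RV = "truir", stem = destru`, which matches what was observed with the SpanishStemmer for these two words.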

> To get the right results when you search for your words, you should
> also use your analyzer to parse the search query

Yes, I search with my analyzer, and it finds the results. But if I search with
the stripped word, it also finds the document, even though that word doesn't exist in Spanish.
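Finding a document with a non-word such as "tecl" follows from running the same analysis at index time and at query time: the query is reduced exactly like the indexed text, and a string that is already a stem stays as it is. The toy inverted index below illustrates this with plain Java collections. `SymmetricAnalysisDemo` is a hypothetical name, and its `STEMS` table is only a stand-in for the Snowball stemmer, holding the two stems reported in this thread (teclado and teclear both become "tecl"); words missing from the table are left unchanged, mimicking what the stemmer does with "tecl".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: documents and queries must go through the same analysis.
class SymmetricAnalysisDemo {
    // Hypothetical stand-in for a stemming analyzer: only the stems reported in this thread.
    private static final Map<String, String> STEMS = new HashMap<>();
    static {
        STEMS.put("teclado", "tecl");
        STEMS.put("teclear", "tecl");
    }

    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Lowercase and split on anything that is not a letter.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // "Analysis" = tokenize + stem; words not in the table are kept unchanged.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : tokenize(text)) {
            terms.add(STEMS.getOrDefault(t, t));
        }
        return terms;
    }

    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // OR semantics: a document matches if it contains any of the query terms.
    Set<Integer> search(String query, boolean analyzeQuery) {
        List<String> terms = analyzeQuery ? analyze(query) : tokenize(query);
        Set<Integer> hits = new TreeSet<>();
        for (String term : terms) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        SymmetricAnalysisDemo index = new SymmetricAnalysisDemo();
        index.add(1, "Un teclado nuevo");
        index.add(2, "Hay que teclear la clave");
        System.out.println(index.search("teclado", true));  // [1, 2]: query stemmed like the index
        System.out.println(index.search("teclado", false)); // []: raw "teclado" was never indexed
        System.out.println(index.search("tecl", true));     // [1, 2]: the stem itself also matches
    }
}
```

The second query shows the opposite failure: without query-time analysis, the real word "teclado" finds nothing, because only its stem is in the index.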

> (SearchHelper.doSimpleSearch only uses a StopAnalyzer).

I don't use SearchHelper; I don't think it works for my application. I
parse the query string with

Ernesto.


----- Original Message ----- 
From: "Ernesto De Santis" <ernesto.desantis at colaborativa.net>
To: "OpenCms List" <opencms-dev at opencms.org>
Sent: Thursday, March 04, 2004 3:16 PM
Subject: [opencms-dev] Lucene 1.4, Spanish Analyzer


> > AFAIK, the Spanish analyzer is in the SnowballAnalyzers package at
> > jakarta.apache.com/lucene.
> > I've never used these, since my content is in English. Please tell me if
> > you have any trouble using them with the Lucene module.
>
> Hi Matt, I remember your mail from last year.
> My SpanishAnalyzer, built with the Snowball analyzers (SpanishStemmer), works
> very well.
>
> I have some comments:
>
> Words ending in "ado" are indexed with that ending stripped. The same happens
> with "ar".
> For example, "teclado" is indexed as "tecl", but "tecl" does not exist in
> Spanish. "teclear" is also indexed as "tecl", and if the user enters
> "tecl", that document is found. That is not so good...
> Ideally, it should index "tecla".
>
> ----- Spanish to English -----
> teclado = keyboard
> tecla = key
> teclear = to type (to key in)
>
> Another, more complicated case:
>
> ----- Spanish to English -----
> comer = to eat
> como = how, and also "I eat"
> come = he/she eats
>
> If the user writes "Mi nena no come" (My baby does not eat),
> all the documents containing "como" (how...) are found.
>
> Ernesto.
>
> _______________________________________________
> This mail is send to you from the opencms-dev mailing list
> To change your list options, or to unsubscribe from the list, please visit
> http://mail.opencms.org/mailman/listinfo/opencms-dev
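The come/como collision in the quoted message can be reproduced with a very small subset of the Snowball Spanish rules: after the RV region is computed, a simplified verb step removes -ar/-er/-ir and a simplified residual step removes a final a, o or e, so "comer", "come" and "como" all reduce to "com". The class `SpanishCollisionDemo` and its methods are hypothetical; the real stemmer has many more steps (its residual step also covers accented vowels and "os"), so this illustrates the mechanism only and is not Lucene's code.

```java
// Hypothetical sketch of two simplified Snowball Spanish steps (not Lucene's SpanishStemmer).
class SpanishCollisionDemo {
    private static final String VOWELS = "aeiou\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc";

    static boolean isVowel(char c) {
        return VOWELS.indexOf(c) >= 0;
    }

    // Start of the RV region, following the Snowball Spanish description.
    static int rvStart(String w) {
        int n = w.length();
        if (n < 2) {
            return n;
        }
        boolean firstVowel = isVowel(w.charAt(0));
        boolean secondVowel = isVowel(w.charAt(1));
        if (!secondVowel || firstVowel) {
            // Consonant second: after the next vowel. Two vowels: after the next consonant.
            boolean wantVowel = !secondVowel;
            for (int i = 2; i < n; i++) {
                if (isVowel(w.charAt(i)) == wantVowel) {
                    return i + 1;
                }
            }
            return n;
        }
        return Math.min(3, n); // consonant + vowel: after the third letter
    }

    static String stem(String w) {
        int rv = rvStart(w); // regions are computed once, on the original word
        // Simplified verb step: drop -ar/-er/-ir if it lies inside RV.
        for (String suffix : new String[] {"ar", "er", "ir"}) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= rv) {
                w = w.substring(0, w.length() - suffix.length());
                break;
            }
        }
        // Simplified residual step: drop a final a, o or e if it lies inside RV.
        int last = w.length() - 1;
        if (last >= rv && "aoe".indexOf(w.charAt(last)) >= 0) {
            w = w.substring(0, last);
        }
        return w;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"comer", "come", "como"}) {
            System.out.println(w + " -> " + stem(w)); // each line ends in "-> com"
        }
    }
}
```

Because all three forms share the index term "com", a query containing "come" matches every document that contains "como".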
