[opencms-dev] Problem with German umlauts in the search (Lucene) with GET

Corsin Camichel cocaman at gmail.com
Tue Sep 6 10:11:21 CEST 2005


Hi Achim

Thank you for your response.

> I expect your <defaultcontentencoding> (in opencms-system.xml) is not set to utf-8 but to a
> different encoding such as ISO-8859-1. I recommend setting the default encoding to utf-8. That alone might already work.
I checked my config file and in there, the defaultcontentencoding is
set to UTF-8.

But I made a little workaround/hack for my problem.
First I send my query to a dummy JSP, which replaces all the umlauts
and other special characters with proper HTML codes and then does a
response.sendRedirect to the results page with a new query string.
This works fine for me and it is not too much work.
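
A minimal sketch of what such a dummy JSP could do is below. It is not my
exact code: the class name QueryRewriter, the helper names and the path
/search/results.jsp are made up for illustration, and it assumes the query
is already available as a correctly decoded Java String. Non-ASCII
characters become numeric HTML character references, so the redirected
query string is pure ASCII.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper illustrating the workaround; names are assumptions.
public class QueryRewriter {

    // Replace every non-ASCII character with a numeric HTML character
    // reference, e.g. U+00D6 becomes "&#214;". ASCII passes through unchanged.
    public static String toHtmlEntities(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Build the redirect target. The entity string is pure ASCII, so the
    // URL-encoded form decodes to the same text under any ASCII-compatible
    // request encoding.
    public static String redirectTarget(String resultsPage, String query)
            throws UnsupportedEncodingException {
        return resultsPage + "?query="
                + URLEncoder.encode(toHtmlEntities(query), "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // "\u00D6ffnungszeiten" is the search term from the original report.
        System.out.println(
                redirectTarget("/search/results.jsp", "\u00D6ffnungszeiten"));
    }
}
```

In a JSP the result of redirectTarget(...) would be passed to
response.sendRedirect(...). The results page then has to turn the
character references back into real characters before searching.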

Regards
Corsin

On 9/5/05, Achim Westermann <a.westermann at alkacon.com> wrote:
> Hi Corsin,
> 
> If you insist on supporting e.g. both charsets, the query string has to be encoded an additional
> time on the client side, using a JavaScript form validation. The trouble comes from the automatic
> encoding done by browsers and the automatic decoding done by Tomcat. While browsers use utf-8 to
> encode the form before submitting it, Tomcat decodes the query (at request parameter access time)
> using the request encoding, which here is e.g. ISO-8859-1 because OpenCms served the search page
> in this default encoding, with the meta charset content attribute and the corresponding HTTP headers.
> The first encoding on the client side makes all special characters (umlauts) disappear: only the
> '%' character remains as a "character to encode". The second encoding then encodes the '%' a
> second time. Tomcat's automatic decoding turns these "%25" back into plain '%', which works
> regardless of charset because '%' is in the ASCII range, which is the same in all exotic code pages
> (so it does no harm if a different encoding was used on the client side). The second decode
> operation, done by OpenCms specifically for the query, then uses utf-8 (just as the browser did)
> and works correctly.
> 
> happy coding,
> 
> Achim
> 
> --
> Achim Westermann
> -------------------
> 
> Alkacon Software
> Alexander Kandzior
> An der Wachsfabrik 13
> 50996 Koeln, DE
> 
> Tel: +49 (0)2236 3826-0
> Fax: +49 (0)2236 3826-20
> Email: a.westermann at alkacon.com
> 
> http://www.alkacon.com
> 
> 
> Corsin Camichel wrote:
> > Hi everybody
> >
> > I am having a problem with German umlauts (ä, ü, ö) in the Lucene
> > search engine if I work with method="GET".
> > The parameters change to something strange like
> > ?query=%C3%96ffnungszeiten
> > and OpenCMS decodes it back to
> > Öffnungszeiten
> > All documents are UTF-8 and have a  property locale=de
> >
> > I hope anybody has an idea or can point me to a reference of this problem.
> >
> > Thank you very much
> >
> > Corsin
> >
> 
> 
> 
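For reference, the charset mismatch and the double-encoding round trip
that Achim describes in the quoted message can be simulated in plain Java.
This is only a sketch under assumptions: the class and method names are
made up, and URLEncoder/URLDecoder stand in for the browser, the
JavaScript step, Tomcat and OpenCms, which is not how those components
are actually invoked.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; it only simulates the encode/decode steps.
public class DoubleEncodingDemo {

    // Single encoding: the "browser" encodes with UTF-8, the "container"
    // decodes with its own request encoding. With ISO-8859-1 the two UTF-8
    // bytes of the umlaut are read as two separate characters.
    public static String singleRoundTrip(String query, String containerCharset)
            throws Exception {
        String sent = URLEncoder.encode(query, "UTF-8");
        return URLDecoder.decode(sent, containerCharset);
    }

    // Double encoding: after the first pass only ASCII remains, so the
    // second pass just turns each '%' into "%25". The container's decode
    // restores the '%' under any ASCII-compatible charset, and the final
    // UTF-8 decode recovers the original text.
    public static String doubleRoundTrip(String query, String containerCharset)
            throws Exception {
        String once = URLEncoder.encode(query, "UTF-8");
        String twice = URLEncoder.encode(once, "UTF-8");
        String afterContainer = URLDecoder.decode(twice, containerCharset);
        return URLDecoder.decode(afterContainer, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String query = "\u00D6ffnungszeiten";
        String broken = singleRoundTrip(query, "ISO-8859-1");
        String fixed = doubleRoundTrip(query, "ISO-8859-1");
        System.out.println("single encoding intact: " + query.equals(broken));
        System.out.println("double encoding intact: " + query.equals(fixed));
    }
}
```

With ISO-8859-1 as the container charset, the single round trip no longer
matches the original query, while the double round trip does.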


-- 
Corsin Camichel
cocaman at gmail.com

