[opencms-dev] Problem with German umlauts in the search (Lucene) with GET

Achim Westermann a.westermann at alkacon.com
Mon Sep 5 18:20:18 CEST 2005


Hi Corsin,

I suspect your <defaultcontentencoding> (in opencms-system.xml) is not set to UTF-8 but to a
different encoding such as ISO-8859-1. I recommend setting the default encoding to UTF-8; that alone
may already solve the problem.

If you need to support more than one charset (e.g. both ISO-8859-1 and UTF-8), the query string
additionally has to be encoded on the client side, e.g. by a JavaScript form validation function.
The trouble lies in the interplay of the browser's automatic encoding and Tomcat's automatic
decoding: the browser here encodes the form data as UTF-8 before submitting it, whereas Tomcat
decodes the query (when the request parameter is accessed) using the request encoding. In your case
that is presumably ISO-8859-1, because OpenCms served the search page in this default encoding,
declared both in the meta charset attribute and in the corresponding HTTP headers.
The first (client-side) encoding turns every special character (umlaut) into %XX escapes, so only
ASCII remains and '%' is the only character left that still needs encoding. The browser's own
encoding on submit then encodes each '%' a second time, as "%25". Tomcat's automatic decoding turns
these "%25" sequences back into plain '%'. This works regardless of the charset, because '%' and the
hex digits lie in the ASCII range, which is the same in practically every code page (so it does no
harm if the client used a different encoding). A second decode, applied by OpenCms specifically to
the query, then uses UTF-8 (just as the client-side encoding did) and yields the correct string.
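
The four steps of this double encoding can be simulated in plain Java (Java 10 or later). This is a
sketch under assumptions, not OpenCms code, and all names in it are hypothetical: scriptEncode()
stands in for the client-side JavaScript (in the browser, encodeURIComponent() produces the same
UTF-8 %XX escapes, except that it writes a space as %20 rather than '+'), browserEncode() and
containerDecode() assume ISO-8859-1 as the page and request encoding, and appDecode() stands in for
the second, UTF-8 decode on the OpenCms side:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical simulation of the double-encoding workaround; needs Java 10+.
class DoubleEncodingDemo {
    // Step 1 (client script): percent-encode as UTF-8, leaving only ASCII characters.
    static String scriptEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Step 2 (browser on submit): encode again with the page charset; '%' becomes "%25".
    static String browserEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.ISO_8859_1);
    }

    // Step 3 (container): automatic decoding with the request encoding.
    static String containerDecode(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.ISO_8859_1);
    }

    // Step 4 (application): explicit second decode of the query parameter, as UTF-8.
    static String appDecode(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    // Runs all four steps in order.
    static String roundTrip(String query) {
        return appDecode(containerDecode(browserEncode(scriptEncode(query))));
    }

    public static void main(String[] args) {
        String query = "\u00D6ffnungszeiten"; // capital O-umlaut followed by "ffnungszeiten"
        String once = scriptEncode(query);    // %C3%96ffnungszeiten
        String twice = browserEncode(once);   // %25C3%2596ffnungszeiten
        System.out.println(twice);
        System.out.println(roundTrip(query).equals(query)); // true
    }
}
```

Because '%', '+' and the hex digits are plain ASCII, the ISO-8859-1 steps in the middle pass them
through unchanged, which is why the sketch should behave the same with any ASCII-compatible page
charset.
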

happy coding,

Achim

-- 
Achim Westermann
-------------------

Alkacon Software
Alexander Kandzior
An der Wachsfabrik 13
50996 Koeln, DE

Tel: +49 (0)2236 3826-0
Fax: +49 (0)2236 3826-20
Email: a.westermann at alkacon.com

http://www.alkacon.com


Corsin Camichel wrote:
> Hi everybody
> 
> I am having a problem with German umlauts (ä => Ã¤, ü => Ã¼, ö => Ã¶)
> in the Lucene search engine when I use action="GET".
> The parameters change to something strange like
> ?query=%C3%96ffnungszeiten
> and OpenCms decodes it back to
> Ã–ffnungszeiten
> All documents are UTF-8 and have the property locale=de.
> 
> I hope somebody has an idea or can point me to a reference on this problem.
> 
> Thank you very much
> 
> Corsin
> 


