[opencms-dev] Problem with German umlauts in the search (Lucene) with GET

Claus Priisholm cpr at codedroids.com
Tue Sep 6 13:32:34 CEST 2005


I can confirm that. Having everything in UTF-8 does not solve all 
problems. One problem is that IE somehow gets the right characters 
through while Fx (and others) don't.

The usual remedy is to set the encoding on the request before the first 
access to the parameters (such as request.getParameter()). And indeed 
that seems to be what happens while logged into the workspace. But when 
accessing the site from the outside the problem surfaces again.
Setting the character encoding explicitly in, say, the template JSP does 
not help. It seems that the encoding is either set wrongly or left at 
Tomcat's default (ISO-8859-1, it seems) because a parameter is accessed 
before the template JSP gets to set it explicitly.
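
To illustrate the mismatch outside of any servlet container, here is a 
minimal sketch that decodes the query value from the original report 
(%C3%96ffnungszeiten) once as UTF-8 and once as ISO-8859-1. The class and 
method names are made up for this example, and it uses plain 
java.net.URLDecoder rather than Tomcat's own parameter parsing, so it only 
approximates what the container does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of OpenCms or Tomcat.
public class DecodeMismatchDemo {

    // Decode a percent-encoded value with the given charset name.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // The query value as sent by a browser that encodes the form as UTF-8.
        String raw = "%C3%96ffnungszeiten";

        String asUtf8 = decode(raw, "UTF-8");        // bytes C3 96 become one character, U+00D6
        String asLatin1 = decode(raw, "ISO-8859-1"); // bytes C3 96 become two characters

        System.out.println(asUtf8.length());   // 14
        System.out.println(asLatin1.length()); // 15
        System.out.println(asUtf8.equals(asLatin1)); // false
    }
}
```

The extra character in the ISO-8859-1 result is the visible symptom: each 
byte of the two-byte UTF-8 sequence is treated as a character of its own.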

A small test JSP run inside OpenCms behaves differently compared to when 
it is run outside.

My workaround ended up being to send along a parameter with a character 
known to require multiple bytes in UTF-8. Then it is fairly simple to 
detect whether the parameters have been mangled or not. If they are 
mangled, they can be converted back into UTF-8 manually before continuing 
the form processing. The only wild card is guessing which encoding the 
parameters were actually decoded with, but it seems that Tomcat defaults 
to ISO-8859-1.
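
A minimal sketch of that workaround, under the assumption that the wrong 
decoding really was ISO-8859-1. The class name, the method names and the 
choice of sentinel character (U+00E6) are all hypothetical. In a JSP the 
two strings would come from request.getParameter() for the real field and 
for a hidden sentinel field.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of OpenCms.
public class ParamRepair {

    // Assumed sentinel: a character that needs two bytes in UTF-8.
    static final String SENTINEL = "\u00E6";

    // The parameters were mangled if the sentinel did not arrive intact.
    static boolean isMangled(String receivedSentinel) {
        return !SENTINEL.equals(receivedSentinel);
    }

    // Undo a wrong ISO-8859-1 decode: recover the raw bytes, then read them as UTF-8.
    // This only works if ISO-8859-1 really was the charset used for the wrong decode.
    static String repair(String mangled) {
        byte[] rawBytes = mangled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    // Return the value unchanged if the sentinel is intact, repaired otherwise.
    static String fix(String value, String receivedSentinel) {
        return isMangled(receivedSentinel) ? repair(value) : value;
    }

    public static void main(String[] args) {
        String original = "\u00D6ffnungszeiten";

        // Simulate the container decoding UTF-8 bytes as ISO-8859-1.
        String mangledValue = new String(
                original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        String mangledSentinel = new String(
                SENTINEL.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        System.out.println(fix(mangledValue, mangledSentinel).equals(original)); // true
        System.out.println(fix(original, SENTINEL).equals(original));            // true
    }
}
```

The round trip is lossless because ISO-8859-1 maps every byte value to 
exactly one character. If the container used some other charset for the 
wrong decode, the repair step would have to use that charset instead.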

/Claus

Corsin Camichel wrote:
> Hi Achim
> 
> Thank you for your response.
> 
> 
>>I expect your <defaultcontentencoding> (in opencms-system.xml) is not set to utf-8 but to a
>>different encoding like ISO-8859-1. I recommend setting the default encoding to utf-8. This could already work.
> 
> I checked my config file and in there, the defaultcontentencoding is
> set to UTF-8.
> 
> But I made a little workaround/hack for my problem.
> First I send my query to a dummy JSP which replaces all the umlauts
> and other special characters with proper HTML code and then redirects to the
> results page with a new query string. This works fine for me and it is
> not too much work.
> 
> Regards
> Corsin
> 
> On 9/5/05, Achim Westermann <a.westermann at alkacon.com> wrote:
> 
>>Hi Corsin,
>>
>>If you insist on supporting e.g. both charsets, the query string has to be encoded additionally using
>>a JavaScript form validation on the client side. The trouble comes from the automatic encoding done by
>>browsers and the automatic decoding done by Tomcat. While browsers use utf-8 to encode before submitting forms,
>>Tomcat decodes (at request parameter access time) the query using the request encoding, which (here)
>>is e.g. ISO-8859-1 because OpenCms served the search page in this default encoding with the meta
>>charset content attribute and corresponding HTTP headers.
>>The first encoding on the client side will make all special characters (umlauts) disappear: only the '%'
>>character will remain as a "character to encode". The 2nd encoding will then encode the '%' a 2nd
>>time. Tomcat's automatic decoding will turn these "%25" back into plain '%', which works regardless of
>>any charset because '%' is in the ASCII range, which is the same in all the exotic codepages (it is not
>>harmful if a different encoding was used on the client side). The 2nd decode operation, done by OpenCms
>>specifically for the query, now uses utf-8 (just as the browser did) and works correctly.
>>
>>happy coding,
>>
>>Achim
>>
>>--
>>Achim Westermann
>>-------------------
>>
>>Alkacon Software
>>Alexander Kandzior
>>An der Wachsfabrik 13
>>50996 Koeln, DE
>>
>>Tel: +49 (0)2236 3826-0
>>Fax: +49 (0)2236 3826-20
>>Email: a.westermann at alkacon.com
>>
>>http://www.alkacon.com
>>
>>
>>Corsin Camichel wrote:
>>
>>>Hi everybody
>>>
>>>I am having a problem with German umlauts (ä => ä ü => ü ö
>>>=> ö) in the Lucene search engine if I work with action="GET".
>>>The parameters change to something strange like
>>>?query=%C3%96ffnungszeiten
>>>and OpenCMS decodes it back to
>>>Öffnungszeiten
>>>All documents are UTF-8 and have a  property locale=de
>>>
>>>I hope anybody has an idea or can point me to a reference of this problem.
>>>
>>>Thank you very much
>>>
>>>Corsin
>>>
>>
>>
>>_______________________________________________
>>This mail is send to you from the opencms-dev mailing list
>>To change your list options, or to unsubscribe from the list, please visit
>>http://mail.opencms.org/mailman/listinfo/opencms-dev
>>

-- 
Claus Priisholm, CodeDroids ApS
cpr (you know what) codedroids.com - http://www.codedroids.com

Javadocs and other OpenCms stuff: 
http://www.codedroids.com/community/opencms



