[opencms-dev] Problem with German umlauts in the search (Lucene) with GET
Claus Priisholm
cpr at codedroids.com
Tue Sep 6 13:32:34 CEST 2005
I can confirm that. Having everything in UTF-8 does not solve all problems. One
problem is that IE somehow gets the right characters through while Firefox
(and others) don't.
The usual remedy is to set the encoding on the request before the first
access to the parameters (such as request.getParameter()). And indeed
that seems to be what happens while logged into the workspace. But when
accessing the site from the outside, the problem surfaces again.
Setting the character encoding explicitly in, say, the template JSP does
not help: it seems that a parameter is accessed before the template JSP
gets to set the encoding, so the encoding is either set wrongly or left at
Tomcat's default (ISO-8859-1, it seems).
A small test JSP run inside OpenCms behaves differently compared to when
it is run outside.
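The ordering problem can be illustrated without a servlet container. The Java sketch below uses a hypothetical `MockRequest` class (not a Tomcat or OpenCms class) that models the Servlet specification rule that `setCharacterEncoding()` has no effect once the parameters have been read; the ISO-8859-1 default is assumed from the observation above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a servlet request (not a real Tomcat/OpenCms class).
// It models the Servlet spec rule: setCharacterEncoding() only has an effect
// if it is called before the parameters are parsed for the first time.
class MockRequest {
    private final String rawQuery;          // percent-encoded query, e.g. "query=%C3%96ffnungszeiten"
    private String encoding = "ISO-8859-1"; // assumed container default
    private Map<String, String> params;     // stays null until the first getParameter() call

    MockRequest(String rawQuery) {
        this.rawQuery = rawQuery;
    }

    void setCharacterEncoding(String enc) {
        if (params == null) {
            encoding = enc; // still in time: nothing has been decoded yet
        }
        // otherwise ignored, per the spec ("Otherwise, it has no effect")
    }

    String getParameter(String name) {
        if (params == null) {
            params = new HashMap<>();
            for (String pair : rawQuery.split("&")) {
                int eq = pair.indexOf('=');
                String key = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                // all parameters are decoded once, with the encoding in effect right now
                params.put(decode(key), decode(value));
            }
        }
        return params.get(name);
    }

    private String decode(String s) {
        try {
            return URLDecoder.decode(s, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }
}
```

Calling `setCharacterEncoding("UTF-8")` before the first `getParameter("query")` yields "Öffnungszeiten"; calling it afterwards has no effect, and the value stays the ISO-8859-1 reading of the UTF-8 bytes ("Ã", then the control character U+0096, then "ffnungszeiten").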
My workaround ended up being to send along a parameter with a character
known to require multiple bytes in UTF-8. Then it is fairly simple to
detect whether the parameters have been mangled or not. If they are
mangled, they can be converted into UTF-8 manually before continuing with
the form processing. The only wild card is guessing which encoding the
mangled string was really decoded with, but it seems that Tomcat defaults
to ISO-8859-1.
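A minimal Java sketch of this canary-parameter workaround, assuming a hidden form field (the name `_enc` is hypothetical) that always carries "ä", and assuming the mangling is the ISO-8859-1 default guessed above. `CanaryCheck` is an illustrative name, not part of OpenCms.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the canary-parameter workaround. Assumption: the form always
// sends a hidden field (hypothetical name "_enc") whose value is "ä",
// which takes two bytes (C3 A4) in UTF-8.
class CanaryCheck {
    static final String CANARY = "\u00E4";

    // Undo a wrong ISO-8859-1 decoding: ISO-8859-1 maps every byte to exactly
    // one char, so getBytes() recovers the original bytes, which are then
    // decoded again as UTF-8.
    static String repair(String mangled) {
        return new String(mangled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // True if the canary arrived as the ISO-8859-1 reading of its UTF-8 bytes
    // (two characters, "Ã" and "¤", instead of "ä"), i.e. repair() applies.
    static boolean isMangled(String canaryValue) {
        return canaryValue != null
                && !CANARY.equals(canaryValue)
                && CANARY.equals(repair(canaryValue));
    }

    // Repair a parameter value only if the canary says it is necessary.
    static String fix(String value, String canaryValue) {
        return isMangled(canaryValue) ? repair(value) : value;
    }
}
```

In a JSP this might be used as `CanaryCheck.fix(request.getParameter("query"), request.getParameter("_enc"))`. Checking that the repaired canary really equals "ä" guards against repairing values that were decoded with some other, unexpected encoding.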
/Claus
Corsin Camichel wrote:
> Hi Achim
>
> Thank you for your response.
>
>
>>I expect your <defaultcontentencoding> (in opencms-system.xml) is not set to UTF-8 but to a
>>different one like ISO-8859-1. I recommend setting the default encoding to UTF-8. This alone might already work.
>
> I checked my config file, and the defaultcontentencoding there is
> set to UTF-8.
>
> But I made a little workaround/hack for my problem.
> First I send my query to a dummy JSP which replaces all the umlauts
> and other special characters with proper HTML code and does a
> response.sendRedirect() to the results page with a new query string.
> This works fine for me and it is not too much work.
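Corsin's redirect workaround might look roughly like the Java sketch below. He does not say which "HTML code" he used; this version swaps in numeric character references (e.g. "Ö" becomes "&#214;") so that the redirect URL contains only ASCII. `EntityRedirect` and its methods are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch of the "dummy JSP" idea: replace every non-ASCII character with an
// HTML numeric character reference so the redirect URL is pure ASCII.
// Class and method names are hypothetical.
class EntityRedirect {
    static String toCharRefs(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);                // plain ASCII is kept as-is
            } else {
                sb.append("&#").append(cp).append(';'); // e.g. Ö (U+00D6) -> "&#214;"
            }
        });
        return sb.toString();
    }

    // Query string for response.sendRedirect(...). The '&', '#' and ';' of the
    // references are percent-encoded, so only ASCII reaches the server and any
    // ASCII-compatible decoding charset gives the same result.
    static String redirectQuery(String query) {
        try {
            return "query=" + URLEncoder.encode(toCharRefs(query), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}
```

The dummy JSP would then call something like `response.sendRedirect("results.jsp?" + EntityRedirect.redirectQuery(query))`, where `results.jsp` is a placeholder. How the results page turns the references back into characters is not described in the thread.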
>
> Regards
> Corsin
>
> On 9/5/05, Achim Westermann <a.westermann at alkacon.com> wrote:
>
>>Hi Corsin,
>>
>>If you insist on supporting e.g. both charsets, the query string has to be encoded additionally by
>>a JavaScript form validation on the client side. The trouble lies in the automatic encoding by
>>browsers and the automatic decoding by Tomcat: while browsers use UTF-8 to encode forms before
>>submitting them, Tomcat decodes the query (at request parameter access time) using the request
>>encoding, which here is e.g. ISO-8859-1 because OpenCms served the search page in this default
>>encoding with the meta charset content attribute and the corresponding HTTP headers.
>>The first encoding on the client side will make all special characters (umlauts) disappear: only the
>>'%' character will remain as a "character to encode". The second encoding then encodes the '%' a
>>second time. Tomcat's automatic decoding turns these "%25" back into a mere '%', which works
>>regardless of the charset because '%' is in the ASCII range, which works for all codepages, even
>>exotic ones (so it does no harm if a different encoding was used on the client side). The second
>>decode operation in OpenCms, done especially for the query, then uses UTF-8 (just as the browser
>>did) and works correctly.
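The double-encoding round trip described above can be simulated with `java.net.URLEncoder` and `java.net.URLDecoder`. In this sketch `URLEncoder` stands in both for the JavaScript step (roughly what `encodeURIComponent` does) and for the browser's own form encoding; `DoubleEncoding` is an illustrative name, not the OpenCms implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Simulation of the double-encoding trick; class and method names are illustrative.
class DoubleEncoding {
    static String enc(String s, String charset) {
        try {
            return URLEncoder.encode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(charset, e);
        }
    }

    static String dec(String s, String charset) {
        try {
            return URLDecoder.decode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(charset, e);
        }
    }

    // Client side: the JavaScript step encodes the query as UTF-8, leaving pure
    // ASCII; the browser's form submission encodes it again, so every '%'
    // becomes "%25". For ASCII input the charset of the second step is irrelevant.
    static String clientSide(String query) {
        String once = enc(query, "UTF-8"); // "Ö" -> "%C3%96"
        return enc(once, "ISO-8859-1");    // "%C3%96" -> "%25C3%2596"
    }

    // Server side: the container decodes with whatever charset it uses
    // ("%25" -> "%"), then a second, explicit UTF-8 decode restores the umlauts.
    static String serverSide(String wire, String containerCharset) {
        String afterContainer = dec(wire, containerCharset);
        return dec(afterContainer, "UTF-8");
    }
}
```

Because the string on the wire is pure ASCII, `serverSide` returns the same result whether the container decodes with ISO-8859-1 or UTF-8.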
>>
>>happy coding,
>>
>>Achim
>>
>>--
>>Achim Westermann
>>-------------------
>>
>>Alkacon Software
>>Alexander Kandzior
>>An der Wachsfabrik 13
>>50996 Koeln, DE
>>
>>Tel: +49 (0)2236 3826-0
>>Fax: +49 (0)2236 3826-20
>>Email: a.westermann at alkacon.com
>>
>>http://www.alkacon.com
>>
>>
>>Corsin Camichel wrote:
>>
>>>Hi everybody
>>>
>>>I am having a problem with German umlauts (ä, ü, ö)
>>>in the Lucene search engine if I work with action="GET".
>>>The parameters change to something strange like
>>>?query=%C3%96ffnungszeiten
>>>and OpenCms decodes it back to a garbled string instead of
>>>Öffnungszeiten
>>>All documents are UTF-8 and have a property locale=de
>>>
>>>I hope somebody has an idea or can point me to a reference on this problem.
>>>
>>>Thank you very much
>>>
>>>Corsin
>>>
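To see why `%C3%96` is not itself the fault: it is the correct percent-encoding of the two UTF-8 bytes of "Ö". The small Java sketch below (class name hypothetical) produces that encoding and shows what the same two bytes become when read back as ISO-8859-1, the default Tomcat seemed to use in this thread.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper (hypothetical name): where "%C3%96" comes from, and
// what a single-byte charset makes of it.
class UmlautBytes {
    // Percent-encode the UTF-8 bytes of a string; ASCII letters/digits are kept.
    static String percentUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // unsigned byte value
            if (v < 128 && Character.isLetterOrDigit(v)) {
                sb.append((char) v);
            } else {
                sb.append(String.format("%%%02X", v)); // e.g. 0xC3 -> "%C3"
            }
        }
        return sb.toString();
    }

    // What a container defaulting to ISO-8859-1 makes of the UTF-8 bytes:
    // one character per byte, so "Ö" turns into two characters.
    static String readAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }
}
```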
>>
>>
>>_______________________________________________
>>This mail is send to you from the opencms-dev mailing list
>>To change your list options, or to unsubscribe from the list, please visit
>>http://mail.opencms.org/mailman/listinfo/opencms-dev
>>
--
Claus Priisholm, CodeDroids ApS
cpr (you know what) codedroids.com - http://www.codedroids.com
Javadocs and other OpenCms stuff:
http://www.codedroids.com/community/opencms