[opencms-dev] Problem with German umlauts in the search (Lucene) with GET
Claus Priisholm
cpr at codedroids.com
Wed Sep 7 08:41:40 CEST 2005
Xavier Ottolini wrote:
> Hi,
>
>
>> And indeed that seems to be what happens while logged into the
>> workspace. But when accessing the site from the outside the problem
>> surfaces again.
>> Setting the character encoding explicitly in, say, the template JSP does
>> not help - it seems that the encoding is either set wrongly or set to
>> the default by Tomcat (iso-8859-1 it seems) by accessing a parameter
>> before the template JSP gets to set it explicitly.
>
> When OpenCms exports the files as html, the end user downloads static
> html files.
> If Apache httpd is set up as a front-end server, a parameter has to be set
> in httpd.conf:
> AddDefaultCharset UTF-8
That could of course be another player in the game, but in my case the
pages were served dynamically. And the issue with multi-byte encoding is
present even when running a simple (non-OpenCms) JSP directly in Tomcat
- IE seemingly sends the POST content as ISO-8859-1 regardless, while the
others send it according to the encoding set in the HTML meta tags. IE's
approach seems to be wrong (once again), and who knows when that is
about to change. So that's why I've decided to add a hidden field with a
multi-byte character to my form processing - if it is returned in a
mangled state then I assume that somewhere along the line there was an
encoding problem that needs to be fixed:
value = new String(value.getBytes("ISO-8859-1"), "UTF-8");
This works with IE, Fx (on Mac and Windows) & Safari, outside OpenCms or
inside whether in offline project or not.
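As a rough sketch of that hidden-field check: the class name, the method
names and the probe character below are made up for illustration, and
StandardCharsets needs a newer JDK than the ones in common use with
Tomcat 5. The idea is just to compare the returned probe with what it
would look like if its UTF-8 bytes had been read as ISO-8859-1, and only
then apply the re-decoding from the line above.

```java
import java.nio.charset.StandardCharsets;

class EncodingProbe {
    // Hypothetical probe value for the hidden form field ("ae" ligature, U+00E6).
    // Any character outside the ASCII range should do.
    static final String PROBE = "\u00e6";

    // True if the probe came back as UTF-8 bytes that were decoded as ISO-8859-1.
    static boolean looksMangled(String returnedProbe) {
        String misread = new String(
                PROBE.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        return misread.equals(returnedProbe);
    }

    // Same re-decoding as in the mail: get the raw bytes back, then read them as UTF-8.
    static String repair(String value) {
        return new String(
                value.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Repair the field value only when the probe shows the request was misdecoded.
    static String fix(String value, String returnedProbe) {
        return looksMangled(returnedProbe) ? repair(value) : value;
    }

    public static void main(String[] args) {
        String sent = "\u00d6ffnungszeiten";
        // Simulate a container that decoded UTF-8 form data as ISO-8859-1.
        String mangledValue = new String(
                sent.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        String mangledProbe = new String(
                PROBE.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        System.out.println(fix(mangledValue, mangledProbe).equals(sent));
        System.out.println(fix(sent, PROBE).equals(sent));
    }
}
```

In a JSP the two arguments would come from request.getParameter() for
the real field and for the hidden probe field. The round trip is
lossless because ISO-8859-1 maps every byte value to exactly one
character.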
Note that according to some postings on the net Tomcat always defaults
to ISO-8859-1, but what Tomcat or IE for that matter does when run in
another default locale I don't know.
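To illustrate what an ISO-8859-1 default does to the query string from
the original post, here is a small stand-alone sketch. It uses
java.net.URLDecoder rather than Tomcat's own parameter parsing, so it
only approximates what the container does; the class and method names
are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class QueryDecodeDemo {
    // Percent-decode a raw query value using the given charset name.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%C3%96ffnungszeiten"; // query value from the original post

        String asUtf8 = decode(raw, "UTF-8");
        String asLatin1 = decode(raw, "ISO-8859-1");

        // UTF-8 turns the two bytes C3 96 into the single character U+00D6.
        System.out.println(asUtf8.equals("\u00d6ffnungszeiten"));
        // ISO-8859-1 turns each byte into its own character (U+00C3, U+0096),
        // so the result is one character longer and no longer matches.
        System.out.println(asLatin1.length() - asUtf8.length());
    }
}
```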
>
>
> For Apache 1.3, the settings are the following :
>
> For instance :
> <VirtualHost *:80>
> ServerAdmin webmaster at myhost.com
> DocumentRoot /home/apache/myhost/www
> ServerName www.myhost.com
>
> ErrorDocument 404 /errordocs/404.html
> <Directory "/home/apache/myhost/www/errordocs">
> AllowOverride None
> Order allow,deny
> Allow from all
> Options MultiViews
> </Directory>
>
> AddDefaultCharset UTF-8
>
> JkMount /formmail wrkr
> JkMount /*.jsp wrkr
> JkMount /EcardServlet wrkr
> <Location "/opencms/WEB-INF/">
> AllowOverride None
> deny from all
> </Location>
> </VirtualHost>
>
> I think that with apache 2 + tomcat 5 + opencms 6, the settings are
> different (according to the howtos). But there is probably a similar
> parameter in the apache 2 settings.
>
> I hope that it helps !
>
> Xavier Ottolini
> Développeur multimédia
>
> Adelis
> 37, rue d'Engwiller
> 67350 La Walck
> France
> Téléphone : +33 (0) 3 88 72 29 10
> Télécopie : +33 (0) 3 88 72 29 19
> http://www.adelis.com
>
>>
>> /Claus
>>
>> Corsin Camichel wrote:
>>
>>> Hi Achim
>>>
>>> Thank you for your response.
>>>
>>>
>>>> I expect your <defaultcontentencoding> (in opencms-system.xml) is
>>>> not set to utf-8 but to a
>>>> different one like ISO-8859-1. I recommend setting the default encoding
>>>> to utf-8. This could already work.
>>>
>>>
>>> I checked my config file and in there, the defaultcontentencoding is
>>> set to UTF-8.
>>>
>>> But I made a little workaround/hack for my problem.
>>> First I send my query to a dummy jsp which replaces all the umlauts
>>> and other special characters with the proper html codes and does a
>>> sendRedirect to the results page with a new query string. This works
>>> fine for me and it is not too much work.
>>>
>>> Regards
>>> Corsin
>>>
>>> On 9/5/05, Achim Westermann <a.westermann at alkacon.com> wrote:
>>>
>>>> Hi Corsin,
>>>>
>>>> If you insist on supporting e.g. both charsets, the query string has
>>>> to be encoded an additional time using
>>>> a JavaScript form validation on the client side. The cause is the
>>>> mismatch between the automatic encoding of
>>>> browsers and the automatic decoding of Tomcat. While browsers use utf-8
>>>> to encode before submitting forms,
>>>> Tomcat decodes (at request parameter access time) the query using
>>>> the request encoding, which (here)
>>>> is e.g. ISO-8859-1 because OpenCms served the search page in this
>>>> default encoding with the meta
>>>> charset content attribute and corresponding http headers.
>>>> The first encoding on the client side will make all special characters
>>>> (umlauts) disappear: only the '%'
>>>> character will remain as a "character to encode". The 2nd encoding
>>>> will then encode the '%' a 2nd
>>>> time. Automatic decoding by Tomcat will turn these "%25" back into
>>>> mere '%', which works regardless of
>>>> any charset because it is in the ASCII range that works for all
>>>> exotic codepages (it is not
>>>> harmful if a different encoding was used on the client side). The 2nd
>>>> decode operation, done by OpenCms
>>>> especially for the query, now uses utf-8 (just as the browser did)
>>>> and works correctly.
>>>>
>>>> happy coding,
>>>>
>>>> Achim
>>>>
>>>> --
>>>> Achim Westermann
>>>> -------------------
>>>>
>>>> Alkacon Software
>>>> Alexander Kandzior
>>>> An der Wachsfabrik 13
>>>> 50996 Koeln, DE
>>>>
>>>> Tel: +49 (0)2236 3826-0
>>>> Fax: +49 (0)2236 3826-20
>>>> Email: a.westermann at alkacon.com
>>>>
>>>> http://www.alkacon.com
>>>>
>>>>
>>>> Corsin Camichel wrote:
>>>>
>>>>> Hi everybody
>>>>>
>>>>> I am having a problem with German umlauts (ä => ä ü => ü ö
>>>>> => ö) in the Lucene search engine if I work with method="GET".
>>>>> The parameters change to something strange like
>>>>> ?query=%C3%96ffnungszeiten
>>>>> and OpenCMS decodes it back to
>>>>> Öffnungszeiten
>>>>> All documents are UTF-8 and have a property locale=de
>>>>>
>>>>> I hope anybody has an idea or can point me to a reference of this
>>>>> problem.
>>>>>
>>>>> Thank you very much
>>>>>
>>>>> Corsin
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> This mail is send to you from the opencms-dev mailing list
>>>> To change your list options, or to unsubscribe from the list, please
>>>> visit
>>>> http://mail.opencms.org/mailman/listinfo/opencms-dev
>>>>
>>>
>>
>>
>
--
Claus Priisholm, CodeDroids ApS
cpr (you know what) codedroids.com - http://www.codedroids.com
Javadocs and other OpenCms stuff:
http://www.codedroids.com/community/opencms