[opencms-dev] Re: OT: JSP & UTF-8
Stephan Hartmann
hartmann at metamesh.de
Fri Oct 20 12:07:14 CEST 2006
Hi Joe,
as the URIEncoding attribute in the Connector element only affects
parameters in the query string (i.e. GET requests), I suggest you place a
simple servlet filter in front of your servlet that does nothing but set
the character encoding of the request body to UTF-8 by calling
ServletRequest.setCharacterEncoding("UTF-8")
Regards,
Stephan
Joe Desbonnet schrieb:
> Christoph,
>
> Thanks very much for the informative posting. I'm going to keep it as
> a checklist in future.
>
> I'm going to post to the log4j list about my problem. The odd thing is
> that I don't use/touch/reference these libraries in any way, yet they
> influence the behaviour of the JSPs. I've been able to reduce it
> down to a simple test case:
>
> http://galway.net/tmp/UTF8Test.war (test form + submit script, with
> commons-logging and log4j libraries in WEB-INF/lib)
>
> http://galway.net/tmp/UTF8Test-nologlib.war (test form + submit
> script, with no libraries).
>
> On my setup (Tomcat 5.5.16 + JDK 1.5.0_07 Linux) they behave differently.
>
> My temporary solution is to remove commons-logging as it's not
> required right now (but some libraries I intend to use may need it :(
>
> Thanks again for your help,
>
> Joe.
>
>
>
> On 10/20/06, Christoph Schönfeld <cschoenfeld at sylphen.com> wrote:
>
>>
>> Hi Joe and fellow list readers,
>>
>> in the Tomcat implementation of the Java Servlet Specification, the first
>> call to a parameter accessor such as HttpServletRequest.getParameter()
>> has the side effect that the GET and POST data is parsed with the
>> encoding in effect for the HttpServletRequest at that time. The Java
>> Servlet Specification defines ISO-8859-1 to be the default encoding. The
>> Tomcat implementation caches the result of that operation after the first
>> call and does not reparse the data when
>> HttpServletRequest.setCharacterEncoding() is called.
>>
>> Joe, I could imagine that your log settings cause a call to
>> HttpServletRequest.getParameter() or one of the other parameter
>> accessors. If that's the case, your call to
>> HttpServletRequest.setCharacterEncoding() has absolutely no
>> effect because it is made too late.
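
The ordering problem described above can be shown with a toy model. This is
only a sketch: ToyRequest and its members are made up for illustration and
are not Tomcat's code. The only behaviour taken from the message above is
"parse on first read with the encoding in effect, then cache".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Toy model only: ToyRequest is hypothetical and is NOT Tomcat's implementation.
class ToyRequest {
    private final String rawValue;          // percent-encoded value as received
    private String encoding = "ISO-8859-1"; // default encoding named in the message above
    private String cached;                  // filled on first read, never refreshed

    ToyRequest(String rawValue) {
        this.rawValue = rawValue;
    }

    // Only has an effect if it is called before the first getParameter().
    void setCharacterEncoding(String encoding) {
        this.encoding = encoding;
    }

    // Decodes the raw value once and returns the cached result afterwards.
    String getParameter() {
        if (cached == null) {
            try {
                cached = URLDecoder.decode(rawValue, encoding);
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException(e);
            }
        }
        return cached;
    }
}
```

With the raw value "Sch%C3%B6nfeld" (o-umlaut sent as UTF-8), a ToyRequest
that is read before setCharacterEncoding("UTF-8") is called keeps the value
decoded as ISO-8859-1, while one that has the encoding set first returns the
intended name. That is why the filter has to run before anything else
touches the parameters.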
>>
>>
>> There are two aspects which make Unicode handling difficult with HTTP.
>> First, there is no way the server can tell clients the expected input
>> charset. This is a logical consequence of the stateless nature of HTTP.
>> But secondly, in practice the client does not tell the server the charset
>> of the request data either, which is quite unfortunate. Servers always
>> have to guess or rely on convention. IMO this is where the specification
>> fails.
>>
>> To fully support UTF-8, you have to take care to get data output and input
>> right. Getting output right is easier because it's fully supported by the
>> HTTP Content-Type header.
>>
>> I use the following measures successfully with Tomcat:
>>
>> Output: Make sure that the content sent to the browser actually is what it
>> declares to be.
>>
>> 1. Use UTF-8 as the charset in the JSP contentType. Version 1.2 of the JSP
>> Specification is not as precise on the effect of this setting as version
>> 2.0 is: contentType defines the charset of the HTTP Response. (See
>> JSP.2.10.2 The taglib Directive on p. 52 in the JSP Specification version
>> 1.2, and JSP.1.10.2 on p. 48 in the JSP Specification version 2.0)
>> 2. Make sure the Content-Type header in your HttpServletResponse is
>> 'text/html; charset="UTF-8"'. (See
>> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17)
>> 3. If you create URLs in your HTML pages, use java.net.URLEncoder to make
>> sure that non-ASCII characters cannot get into the parameter values.
>> 4. Optional: consider storing the JSP file itself as UTF-8 and using the
>> pageEncoding directive to tell the JSP Processor about that.
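
For item 3 of the list above, a minimal sketch of building a link with
java.net.URLEncoder. LinkBuilder, searchLink and the parameter name "q" are
made up for illustration; only URLEncoder.encode(String, String) is the real
API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; only URLEncoder.encode(String, String) is a real API here.
class LinkBuilder {
    // Appends the query as a percent-encoded UTF-8 parameter value, so the
    // resulting URL contains ASCII characters only.
    static String searchLink(String base, String query) {
        try {
            return base + "?q=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

Note that URLEncoder produces form encoding, so a space becomes "+" and
"&" becomes "%26". That is what is wanted for parameter values, but it is
not meant for encoding the path part of a URL.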
>>
>> If you have a servlet sending HTML with <meta http-equiv="Content-Type"
>> content="text/html; charset=UTF-8">, make sure you actually send UTF-8
>> data. If you use the ServletOutputStream directly, make sure you wrap it
>> in an OutputStreamWriter initialized with the correct encoding. If you use
>> HttpServletResponse.getWriter(), make sure you call setCharacterEncoding()
>> before you call getWriter()!
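
A small sketch of the OutputStreamWriter wrapping mentioned above, using
only the standard library. Utf8Output and its methods are made-up names; in
a servlet, the stream passed in would presumably be the one returned by
response.getOutputStream().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing the wrapping only; no servlet API involved.
class Utf8Output {
    // Wraps the raw byte stream in a writer with an explicit encoding,
    // instead of relying on the platform default charset.
    static void write(OutputStream out, String text) {
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            writer.write(text);
            writer.flush(); // push buffered characters through to the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience for checking which bytes actually get sent.
    static byte[] encode(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        write(buffer, text);
        return buffer.toByteArray();
    }
}
```

Calling Utf8Output.encode("\u00F6") gives the two bytes C3 B6, which is what
a browser expects when the page declares charset=UTF-8.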
>>
>> However, this is only the output aspect: If this is right, the browser is
>> able to correctly display the UTF-8 data.
>>
>> Input: The part most people forget is to take care that the input data
>> sent back by the browser is correct.
>>
>> 1. Use the attribute "accept-charset" in your HTML form tag: <form ...
>> accept-charset="UTF-8">. This tells the browser to send UTF-8 data. (See
>> http://www.w3.org/TR/html4/interact/forms.html#adef-accept-charset).
>> This requires a compliant browser, so it is no guarantee, but it does work
>> with current browsers.
>> 2. If you use Tomcat, use the attribute URIEncoding="UTF-8" in the
>> Connector element in your server.xml. This makes UTF-8 input also work for
>> GET parameters (see item 3 under Output above).
>>
>>
>> Please correct me if I am wrong somewhere.
>>
>> Christoph
>>
>>
>>
>>
>>
>> _______________________________________________
>> This mail is sent to you from the opencms-dev mailing list
>> To change your list options, or to unsubscribe from the list, please
>> visit
>> http://lists.opencms.org/mailman/listinfo/opencms-dev
>>
>>
>
--
Stephan Hartmann
metamesh
Lippstädter Str. 22
44143 Dortmund
http://www.metamesh.de/