[opencms-dev] Re: OT: JSP & UTF-8

Joe Desbonnet jdesbonnet at gmail.com
Fri Oct 20 11:50:55 CEST 2006


Christoph,

Thanks very much for the informative posting. I'm going to keep it as
a checklist in future.

I'm going to post to the log4j list about my problem. The odd thing is
that I don't use/touch/reference these libraries in any way, yet they
influence the behaviour of the JSPs. I've been able to reduce it
down to a simple test case:

http://galway.net/tmp/UTF8Test.war   (test form + submit script, with
commons-logging and log4j libraries in WEB-INF/lib)

http://galway.net/tmp/UTF8Test-nologlib.war (test form + submit
script, with no libraries).

On my setup (Tomcat 5.5.16 + JDK 1.5.0_07 Linux) they behave differently.

My temporary solution is to remove commons-logging as it's not
required right now (but some libraries I intend to use may need it :(

Thanks again for your help,

Joe.



On 10/20/06, Christoph Schönfeld <cschoenfeld at sylphen.com> wrote:
>
>  Hi Joe and fellow list readers,
>
>  in the Tomcat implementation of the Java Servlet Specification, the first
> call to one of the HttpServletRequest.getParameter*() methods (getParameter(),
> getParameterMap(), getParameterNames(), getParameterValues()) has the side
> effect that the GET and POST data is parsed with the encoding in effect for
> the HttpServletRequest at that time. The Java Servlet Specification defines
> ISO-8859-1 to be the default encoding. The Tomcat implementation caches the
> result of that operation after the first call and does not reparse the data
> when HttpServletRequest.setCharacterEncoding() is called afterwards.
>
>  Joe, I could imagine that your log settings cause a call to
> HttpServletRequest.getParameter() or one of its sibling methods. If
> that's the case, your call to
> HttpServletRequest.setCharacterEncoding() has absolutely no
> effect because it is made too late.
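
The "too late" effect described above can be modelled with plain JDK classes. This is only a sketch: FakeRequest is a hypothetical stand-in that mimics the parse-once-and-cache behaviour, not the real Servlet API or Tomcat code, and the early getParameter() call merely plays the role of whatever (for example a logging hook) touches the parameters first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CachedParameterDemo {

    /** Hypothetical stand-in for a request; NOT the Servlet API. */
    static class FakeRequest {
        private final byte[] rawValue;
        // Default encoding defined by the Servlet Specification.
        private Charset encoding = StandardCharsets.ISO_8859_1;
        // Filled on first access and never reparsed afterwards.
        private String cached;

        FakeRequest(byte[] rawValue) {
            this.rawValue = rawValue;
        }

        void setCharacterEncoding(Charset charset) {
            this.encoding = charset;
        }

        String getParameter() {
            if (cached == null) {
                cached = new String(rawValue, encoding);
            }
            return cached;
        }
    }

    /** Something reads the parameter before the encoding is set. */
    static String readAfterEarlyAccess() {
        FakeRequest request = new FakeRequest("\u00e9".getBytes(StandardCharsets.UTF_8));
        request.getParameter();                                // early access, e.g. by logging
        request.setCharacterEncoding(StandardCharsets.UTF_8);  // too late, value is cached
        return request.getParameter();
    }

    /** The encoding is set before anything reads the parameter. */
    static String readWithEncodingSetFirst() {
        FakeRequest request = new FakeRequest("\u00e9".getBytes(StandardCharsets.UTF_8));
        request.setCharacterEncoding(StandardCharsets.UTF_8);
        return request.getParameter();
    }

    public static void main(String[] args) {
        // U+00C3 U+00A9 is what the two UTF-8 bytes of U+00E9 look like as ISO-8859-1.
        System.out.println("garbled after early access: "
                + readAfterEarlyAccess().equals("\u00c3\u00a9"));
        System.out.println("correct when set first: "
                + readWithEncodingSetFirst().equals("\u00e9"));
    }
}
```

In a real application the usual consequence is to call setCharacterEncoding() as early as possible, for instance in a filter that runs before anything else reads the request.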
>
>
>  There are two aspects which make Unicode handling difficult with HTTP.
> First, there is no way the server can tell clients the expected input
> charset. This is a logical consequence of the stateless nature of HTTP. But
> secondly, there is no reliable way for the client to tell a server the charset
> of the request data, which is quite unfortunate. Servers always have to guess
> or rely on convention. IMO this is where the specification fails.
>
>  To fully support UTF-8, you have to take care to get data output and input
> right. Getting output right is easier because it's fully supported by the
> HTTP Content-Type header.
>
>  I use the following measures successfully with Tomcat:
>
>  Output: Make sure that the content sent to the browser actually is what it
> declares to be.
>
>  1. Use UTF-8 as the charset in the JSP contentType attribute. Version 1.2 of
> the JSP Specification is not as precise on the effect of this setting as
> version 2.0 is: contentType defines the charset of the HTTP Response. (See
> JSP.2.10.2 The taglib Directive on p. 52 in the JSP Specification version 1.2,
> and JSP.1.10.2 on p. 48 in the JSP Specification version 2.0)
>  2. Make sure the Content-Type header in your HttpServletResponse is
> 'text/html; charset="UTF-8"'. (See
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17)
>  3. If you create URLs in your HTML pages, use java.net.URLEncoder to make
> sure that non-ASCII characters cannot get into the parameter values unencoded.
>  4. Optional: consider storing the JSP file itself as UTF-8 and using the
> pageEncoding directive to tell the JSP Processor about that.
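
For item 3 of the list above, a minimal sketch of encoding a parameter value with java.net.URLEncoder. The page name search.jsp and the parameter name q are made up for illustration; only URLEncoder.encode(String, String) is the real JDK call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryValueEncoder {

    /** Percent-encodes a single parameter value as UTF-8. */
    static String encodeValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        // "search.jsp" and "q" are hypothetical names.
        String url = "search.jsp?q=" + encodeValue("Sch\u00f6nfeld & Co");
        System.out.println(url); // search.jsp?q=Sch%C3%B6nfeld+%26+Co
    }
}
```

Only the value is encoded, not the whole URL, so the "?" and "=" separators stay intact while "&" inside the value becomes %26.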
>
>  If you have a servlet sending HTML with <meta http-equiv="Content-Type"
> content="text/html; charset=UTF-8">, make sure you actually send UTF-8 data.
> If you use the ServletOutputStream directly, make sure you wrap it in an
> OutputStreamWriter initialized with the correct encoding. If you use
> HttpServletResponse.getWriter(), make sure you call setCharacterEncoding()
> before you call getWriter()!
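
The OutputStreamWriter advice above can be sketched with JDK classes only. A ByteArrayOutputStream stands in for the ServletOutputStream here, which is an assumption made so that the example runs without a servlet container; the class and method names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8WriterDemo {

    /** Writes text to a raw byte stream with an explicit UTF-8 encoder. */
    static void writeUtf8(OutputStream out, String text) throws IOException {
        // Without the explicit charset the platform default would be used.
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(text);
        writer.flush();
    }

    /** Convenience wrapper that returns the bytes that would go on the wire. */
    static byte[] render(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeUtf8(buffer, text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = render("\u00e9");
        // U+00E9 is two bytes in UTF-8: 0xC3 0xA9.
        System.out.println(bytes.length);
    }
}
```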
>
>  However, this is only the output aspect: If this is right, the browser is
> able to correctly display the UTF-8 data.
>
>  Input: The part most people forget is to take care that input data sent
> back by the browser is correct.
>
>  1. Use the attribute "accept-charset" in your HTML form tag: <form ...
> accept-charset="UTF-8">. This tells the browser to send UTF-8 data. (See
> http://www.w3.org/TR/html4/interact/forms.html#adef-accept-charset).
> This requires a compliant browser, so there is no guarantee that it will
> work, but it does with current browsers.
>  2. If you use Tomcat, use the attribute URIEncoding="UTF-8" in the
> Connector element in your server.xml. This makes UTF-8 input also work for
> GET parameters (see item 3 under Output above).
>
>
>  Please correct me if I am wrong somewhere.
>
>  Christoph


