<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<font face="Helvetica, Arial, sans-serif">Hi Joe and fellow list
readers,<br>
<br>
In the Tomcat implementation of the Java Servlet Specification, the
first call to HttpServletRequest.getParameter() (or a related method
such as getParameterMap()) has the side effect that the GET and POST
data are parsed with the encoding in effect for the HttpServletRequest
at that time. The Java Servlet Specification defines ISO-8859-1 as the
default encoding. Tomcat caches the parsed parameters after the first
call and does not reparse them when
HttpServletRequest.setCharacterEncoding() is called later. <br>
<br>
Joe, I could imagine that your log settings cause a call to
HttpServletRequest.getParameter() or a related method before your own
code runs. If that's the case, your call to
HttpServletRequest.setCharacterEncoding() has absolutely no effect
because it is made too late.<br>
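<br>
To illustrate why the charset in effect at parse time matters, here is a
minimal sketch in plain Java. It does not use the Servlet API or Tomcat;
the class name CharsetMismatchDemo and the sample value are only
assumptions for illustration. It decodes the same percent-encoded bytes
once as ISO-8859-1 and once as UTF-8:<br>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: it only mimics the decoding step of parameter parsing.
class CharsetMismatchDemo {

    // Decodes a percent-encoded parameter value with the given charset name.
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // The UTF-8 bytes of U+00E4 (a with diaeresis), percent-encoded.
        String raw = "%C3%A4";
        String wrong = decode(raw, "ISO-8859-1"); // two characters: U+00C3, U+00A4
        String right = decode(raw, "UTF-8");      // one character: U+00E4
        System.out.println("ISO-8859-1: " + wrong.length() + " chars"); // prints "ISO-8859-1: 2 chars"
        System.out.println("UTF-8: " + right.length() + " chars");      // prints "UTF-8: 1 chars"
    }
}
```

Once the value has been decoded with the wrong charset and cached,
changing the encoding afterwards cannot repair it.<br>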
</font><font face="Helvetica, Arial, sans-serif"><br>
<br>
There are two aspects which make Unicode handling difficult with HTTP.
First, there is no way for the server to tell clients the expected
input charset. This is a logical consequence of the stateless nature of
HTTP. Second, there is no way for the client to tell the server the
charset of the request data, which is quite unfortunate. Servers always
have to guess or rely on convention. IMO this is where the
specification fails.</font><br>
<font face="Helvetica, Arial, sans-serif"><br>
</font><font face="Helvetica, Arial, sans-serif">To fully support
UTF-8, you
have to take care to get data output and input right. Getting output
right is easier because it's fully supported by the HTTP Content-Type
header.<br>
<br>
</font><font face="Helvetica, Arial, sans-serif">I use</font><font
face="Helvetica, Arial, sans-serif"> the following measures
successfully</font><font face="Helvetica, Arial, sans-serif"> with
Tomcat:</font><br>
<br>
<font face="Helvetica, Arial, sans-serif">Output: Make sure that the
content sent to the browser actually is what it declares to be.<br>
<br>
</font><font face="Helvetica, Arial, sans-serif">1. Use UTF-8 as the
charset in the JSP contentType attribute, e.g.
contentType="text/html; charset=UTF-8". The JSP Specification version
1.2 is not as precise on the effect of this setting as version 2.0 is:
contentType defines the charset of the HTTP response. (See JSP.2.10.2
The taglib Directive on p. 52 in the JSP Specification version 1.2, and
JSP.1.10.2 on p. 48 in the JSP Specification version 2.0)<br>
2. Make sure the Content-Type header in your HttpServletResponse is
'text/html; charset="UTF-8"'. (See
<a class="moz-txt-link-freetext" href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17">http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17</a>)<br>
3. If you create URLs in your HTML pages, use java.net.URLEncoder (with
an explicit "UTF-8" encoding argument) to make sure that non-ASCII
characters cannot get into the parameter values unencoded.<br>
4. Optional: consider storing the JSP file itself as UTF-8 and using
the
pageEncoding directive to tell the JSP Processor about that.
<br>
<br>
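For point 3, a minimal sketch of encoding a parameter value for a link
could look like the following. The class LinkBuilderDemo, the method
buildLink and the page name search.jsp are made up for illustration; the
only real API used is java.net.URLEncoder.encode(String, String):<br>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class for illustration only.
class LinkBuilderDemo {

    // Builds "base?name=value" with name and value percent-encoded as UTF-8.
    static String buildLink(String base, String name, String value) {
        try {
            return base + "?" + URLEncoder.encode(name, "UTF-8")
                    + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 must be supported by every Java platform.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        // \u00E9 is e with acute accent; the space becomes '+'.
        System.out.println(buildLink("search.jsp", "q", "caf\u00E9 au lait"));
        // prints "search.jsp?q=caf%C3%A9+au+lait"
    }
}
```

I would avoid the one-argument URLEncoder.encode(String), because it is
deprecated and uses the platform default encoding.<br>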
</font><font face="Helvetica, Arial, sans-serif">If you have a servlet
sending
HTML with &lt;meta http-equiv="Content-Type" content="text/html;
charset=UTF-8"&gt;, make sure you actually send UTF-8 data. If you use
the ServletOutputStream directly, make sure you wrap it in an
OutputStreamWriter initialized with the correct encoding. If you use
HttpServletResponse.getWriter(), make sure you call
setCharacterEncoding() before you call getWriter()!<br>
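<br>
As a rough sketch of the wrapping step, the following plain Java example
uses a ByteArrayOutputStream as a stand-in for the ServletOutputStream
(that substitution and the class name Utf8WriterDemo are assumptions for
illustration, not Tomcat code). It shows that the bytes on the wire
depend on the charset given to the OutputStreamWriter:<br>

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical demo class; the byte buffer stands in for the response stream.
class Utf8WriterDemo {

    // Writes text to the stream through a Writer with an explicit charset.
    static void writeText(OutputStream out, String text, String charsetName) throws IOException {
        Writer writer = new OutputStreamWriter(out, charsetName);
        writer.write(text);
        writer.flush(); // push buffered characters to the underlying stream
    }

    // Returns the bytes produced for the given text and charset.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writeText(buffer, text, charsetName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // U+00E4 takes two bytes in UTF-8 and one byte in ISO-8859-1.
        System.out.println(encode("\u00E4", "UTF-8").length);      // prints 2
        System.out.println(encode("\u00E4", "ISO-8859-1").length); // prints 1
    }
}
```

If the declared charset and the charset of the writer differ, the
browser decodes the bytes incorrectly.<br>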
</font><br>
<font face="Helvetica, Arial, sans-serif">However, this is only the
output aspect: If this is right, the browser is able to correctly
display the UTF-8 data. <br>
<br>
</font><font face="Helvetica, Arial, sans-serif">Input: </font><font
face="Helvetica, Arial, sans-serif">The part most people forget is to
take care that the input data sent back by the browser is correct.<br>
</font><font face="Helvetica, Arial, sans-serif"><br>
1. Use the attribute "accept-charset" in your HTML form tag: &lt;form
... accept-charset="UTF-8"&gt;. This tells the browser to send UTF-8
data. (See
<a class="moz-txt-link-freetext" href="http://www.w3.org/TR/html4/interact/forms.html#adef-accept-charset">http://www.w3.org/TR/html4/interact/forms.html#adef-accept-charset</a>).
This requires a compliant browser, so it is no guarantee, but it does
work with current browsers.<br>
2. If you use Tomcat, use the attribute URIEncoding="UTF-8" in the
Connector element in your server.xml. This makes UTF-8 input also work
for GET parameters (see item 3 under Output above).<br>
<br>
<br>
Please correct me if I am wrong somewhere.<br>
<br>
</font><font face="Helvetica, Arial, sans-serif">Christoph<br>
</font><br>
</body>
</html>