[opencms-dev] newer version of JTidy for Opencms 6.21 [ fix for: "invalid XML character (Unicode: 0x0)" ]
Christian Steinert
christian_steinert at web.de
Fri Jun 30 15:14:56 CEST 2006
Hi,
Various people (including myself) have faced the following issue after
upgrading to opencms 6.2.1 - When saving some content in FCKeditor, the
following message occurred:
===============================
Error Unmarshalling xml document failed.
Reason: Error on line 31 of document : An invalid XML character
(Unicode: 0x0) was found in the CDATA section. Nested exception: An
invalid XML character (Unicode: 0x0) was found in the CDATA section.
==============================
I use opencms with UTF-8 (on tomcat 4.1/Mysql 4.0/java 1.5 and tomcat
5.0/Mysql 4.0/java 1.5), maybe UTF-8 has something to do with this problem.
The error report is correct: under certain conditions that I have not
precisely pinpointed, the JTidy library will add 0x00 characters to the
HTML code. The code gets inserted by Jtidy - this was very clear in the
debugger - yet *still* I did not have this problem when using the
HTMLarea editor instead.
For me the error had something to do with HTML entities that represent
special characters (for example fancy quotes like “ ” or „, i.e. entities
such as &ldquo;, &rdquo; or &bdquo;). Directly after such entities, a NULL
(0x00) character was inserted by Jtidy, but I was not able to pinpoint the
exact code location where this happened.
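To check whether a given piece of editor output is affected, one can scan it for 0x00 characters and look at what precedes each one. The following is only a minimal sketch: the class and method names (NulScanner, findNulOffsets, contextBefore) are made up for illustration and are not part of Jtidy or opencms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of Jtidy or opencms.
final class NulScanner {

    /** Returns the offsets of every NUL (0x00) character in the given text. */
    static List<Integer> findNulOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\0') {
                offsets.add(i);
            }
        }
        return offsets;
    }

    /** Returns up to `radius` characters directly before `offset`, for context. */
    static String contextBefore(String text, int offset, int radius) {
        return text.substring(Math.max(0, offset - radius), offset);
    }
}
```

Calling contextBefore for each reported offset shows whether the NUL really follows one of the special characters.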
Also, the problem was not there *every* time one of these HTML
entities appeared. It seems to have something to do with the *exact
position* of *some* HTML entities in the file. Maybe there is some problem
when certain HTML entities hit the end of some internal buffer within
Jtidy - but this is just a guess. (Because Jtidy is ported from C, it
handles character encodings by itself and does not just use Java-based
string processing).
I have temporarily uploaded a Jar file which is working for me to
http://www.berzinarchives.com/temp/jtidy
The Jar contains both the original source and the compiled classes.
I have downloaded the Jtidy source from the HEAD of their Subversion
repository. I have not changed *anything substantial* in the code. (I
have compared the code again to the original one that I had downloaded
from SVN, just to make sure). The only change I have made at all is an
additional safety check in class org.w3c.tidy.OutJavaImpl.
My code insertion is trivial and enclosed by the comment
//test berzinarchives
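For readers who just want an idea of what such a safety check can look like: the sketch below is NOT the code from the jar, only an illustration of the general idea of refusing to emit 0x00. The class GuardedOut and its methods are invented for this example; the real Jtidy output class writes to a stream and looks different.

```java
// Hypothetical illustration of an output guard; not the actual change in the jar.
final class GuardedOut {
    private final StringBuilder sink = new StringBuilder();
    private int skipped;

    /** Appends one character, skipping NUL (0x00), which XML 1.0 does not allow. */
    void outc(int c) {
        if (c == 0) {
            skipped++; // count it so the caller can see whether the guard fired
            return;
        }
        sink.appendCodePoint(c);
    }

    /** Number of NUL characters that were dropped so far. */
    int skippedCount() {
        return skipped;
    }

    /** Everything written so far, without the dropped characters. */
    String result() {
        return sink.toString();
    }
}
```

A counter like skippedCount() makes it possible to tell whether the guard ever fired, which is useful when the underlying cause is not understood.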
For quite a while now the library has worked for me in my development
environment. My additional check did not fire again after I had added
the new library and restarted tomcat completely.
So in short:
- If you run into this problem then you might want to try to download
this newer jar and put it into your opencms WEB-INF/lib folder.
- Don't forget to move the original jar OUT of this folder. There should
be only one version of Jtidy in your WEB-INF/lib folder.
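To verify that only one Jtidy jar is left, something like the following can be run against the lib folder. This is a rough sketch: the class name TidyJarCheck is invented, and matching on "tidy" in the file name is an assumption about how the jars are named in your installation.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; assumes Jtidy jars have "tidy" in their file name.
final class TidyJarCheck {

    /** Returns the sorted subset of file names that look like Jtidy jars. */
    static List<String> filterTidyJars(String[] fileNames) {
        List<String> hits = new ArrayList<>();
        if (fileNames == null) {
            return hits; // folder missing or not readable
        }
        for (String name : fileNames) {
            String lower = name.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".jar") && lower.contains("tidy")) {
                hits.add(name);
            }
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        // Pass the path to your WEB-INF/lib folder as the first argument.
        List<String> hits = filterTidyJars(new File(args[0]).list());
        System.out.println("Jtidy jars found: " + hits);
        if (hits.size() > 1) {
            System.out.println("More than one Jtidy jar, move the old one out.");
        }
    }
}
```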
If it *doesn't* work for you, then I'm sorry.
But: I did not invent Jtidy, I have written down here all that I know
about this problem and after having wasted a lot of time on this problem
I still do not understand how Jtidy's nasty character handling works.
Hope that helps
Regards
Christian