[opencms-dev] Problems with labels in workplace_ru.properties

Paul-Inge Flakstad flakstad at npolar.no
Thu Oct 1 12:40:03 CEST 2009


Hi Claus

Thank you very much for clarifying. 

In this discussion, it certainly is an important fact that Properties files in XML format are supported by newer versions of Java. Equally important is the detail that UTF-8 is the default encoding when handling the XML variant. I wasn't aware of either, until late last night when I read the javadoc.
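A minimal sketch of the XML variant, using only java.util.Properties (storeToXML() and loadFromXML() exist since Java 5): with no encoding argument, storeToXML() writes UTF-8, and loadFromXML() reads it back, so Cyrillic values need no escaping. The class name and the label key are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

public class XmlPropertiesDemo {

    /** Writes the properties as XML (UTF-8 by default) and parses them back. */
    static Properties roundTrip(Properties in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        in.storeToXML(out, "labels");            // no encoding argument: UTF-8 is used
        Properties back = new Properties();
        back.loadFromXML(new ByteArrayInputStream(out.toByteArray()));  // also closes the stream
        return back;
    }

    public static void main(String[] args) throws IOException {
        Properties labels = new Properties();
        labels.setProperty("label.photo", "\u0424\u043e\u0442\u043e:");  // Cyrillic "Foto:"
        String value = roundTrip(labels).getProperty("label.photo");
        System.out.println(value.equals("\u0424\u043e\u0442\u043e:"));  // prints true
    }
}
```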

What you suggest about modifying CmsResourceBundleLoader is very interesting. I'm afraid it looks a bit complicated for me ATM, mostly due to lack of time - I already created a fix for our problem with reading Russian labels, and it seems to be working just fine. And I can't say for sure if we actually _want_ to switch over to XML, diverging from the conventional OpenCms - and conventional Java - format. But I will certainly keep your notes, and perhaps look into it at a later time.

All the best,
Paul

> -----Original Message-----
> From: opencms-dev-bounces at opencms.org 
> [mailto:opencms-dev-bounces at opencms.org] On Behalf Of Claus Priisholm
> Sent: 1. oktober 2009 09:50
> To: The OpenCms mailing list
> Subject: Re: [opencms-dev] Problems with labels in 
> workplace_ru.properties
> 
> The Properties specification allows for an XML variant that uses
> UTF-8 by default (loadFromXML() -
> http://java.sun.com/javase/6/docs/api/java/util/Properties.html).
> 
> Unfortunately the Properties.load method does not try to determine if
> the content of the stream it is given is XML or old-style property
> format, so you will have to decide whether to use load() or
> loadFromXML() somewhere before loading the properties. This also has
> implications for the ResourceBundle class, as it does not load
> XML-formatted properties. As of Java 1.6 you can at least give it your
> own controller (a ResourceBundle.Control), but for older versions it
> is a problem if you use the ResourceBundle class.
> 
> Luckily there is the CmsResourceBundleLoader class that seems to
> centralize loading of properties in OpenCms, so it is likely that one
> could patch that class to provide support for loading files in either
> of the two formats.
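Without claiming anything about the actual CmsResourceBundleLoader code, the load()-versus-loadFromXML() decision could be made by peeking at the stream. This plain-JDK sketch (class name hypothetical) treats input whose first non-whitespace byte is '<' as XML and everything else as the classic format. It is a heuristic: a UTF-8 byte order mark in front of the XML declaration, for instance, would send the file down the classic path.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public final class EitherFormatLoader {

    private static final int PEEK_LIMIT = 1024;

    public static Properties load(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(PEEK_LIMIT);
        int b;
        int read = 0;
        do {                                     // skip leading whitespace bytes
            b = in.read();
            read++;
        } while (b != -1 && Character.isWhitespace(b) && read < PEEK_LIMIT);
        in.reset();                              // rewind so the parser sees every byte

        Properties props = new Properties();
        if (b == '<') {
            props.loadFromXML(in);               // XML variant, UTF-8 by default
        } else {
            props.load(in);                      // classic format, ISO-8859-1 plus Unicode escapes
        }
        return props;
    }
}
```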
> 
> Best Regards
> Claus
> 
> 
> Paul-Inge Flakstad wrote:
> > Hi Christian
> >  
> > It is indeed horrible - just imagine how much easier it would have been
> > to use UTF-8 (or at least be able to specify the encoding)! No idea
> > what Sun has been thinking (or not thinking at all) all the time they
> > haven't addressed this issue. Yes, the native2ascii thing is fine, but
> > damn, what a messy "solution" that is...
> >  
> > I agree that the format of the .properties files has a nice simplicity
> > to it, and I don't see anything wrong with them other than the
> > encoding issue. But of course, XML is a strong contender if there's
> > ever going to be a change.
> >  
> > However, that would require a lot more rewriting than just switching
> > over to UTF-8 while keeping the format, I would assume (since you'd
> > have to change both the format and the encoding)? As far as I can tell
> > from what (little) I've read on the subject, UTF-8 is largely
> > "backwards compatible" with ISO-8859-1 (strictly, the two agree only
> > on the 7-bit ASCII range), so in my mind it seems like a pretty
> > straightforward swap for Alkacon - if they ever wanted to. Eventually,
> > I guess it's up to Sun if they want to change the specification. I
> > just thought it would be cool if Alkacon stepped up and set an
> > example, OpenCms being a multilanguage CMS and all.
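A quick way to check the compatibility claim is to compare the bytes each encoding produces. In this sketch (class name made up), plain ASCII encodes identically, while a Latin-1 letter such as "ø" takes one byte in ISO-8859-1 and two in UTF-8. Files containing only ASCII plus Unicode escapes would therefore read the same under either encoding; files with raw Latin-1 characters would not.

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

public class CharsetOverlap {

    /** True if the string encodes to the same bytes in ISO-8859-1 and UTF-8. */
    static boolean sameBytes(String s) throws UnsupportedEncodingException {
        return Arrays.equals(s.getBytes("ISO-8859-1"), s.getBytes("UTF-8"));
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(sameBytes("Published by"));   // prints true (ASCII only)
        System.out.println(sameBytes("\u00f8"));         // prints false (o with stroke)
    }
}
```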
> >  
> > I was considering whether or not to extend/modify the OpenCms core in
> > order to read .properties values as UTF-8 (probably something like
> > you've done already), but my method is sufficient for now, so I'll
> > put that idea on ice.
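For reading .properties values as UTF-8 while keeping the classic format, Java 6 added Properties.load(Reader) and a PropertyResourceBundle(Reader) constructor, so the caller chooses the decoding. A minimal sketch with hypothetical class and method names, not a patch to OpenCms itself:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class Utf8Bundles {

    /** Reads a classic .properties stream as UTF-8 instead of ISO-8859-1. */
    static ResourceBundle readUtf8(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, "UTF-8");
        try {
            return new PropertyResourceBundle(reader);
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] file = "label.photo=\u0424\u043e\u0442\u043e:\n".getBytes("UTF-8");
        ResourceBundle rb = readUtf8(new ByteArrayInputStream(file));
        System.out.println(rb.getString("label.photo").equals("\u0424\u043e\u0442\u043e:"));  // prints true
    }
}
```

Java 9 later made PropertyResourceBundle read property files as UTF-8 by default (JEP 226), falling back to ISO-8859-1 on malformed input.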
> >  
> > Thanks for your reply; it's comforting to know I'm not the only one
> > who's been struggling with the properties encoding.
> >  
> > All the best,
> > Paul
> > 
> >     
> >     ------------------------------------------------------------------------
> >     *From:* opencms-dev-bounces at opencms.org
> >     [mailto:opencms-dev-bounces at opencms.org] *On Behalf Of *Christian
> >     Steinert
> >     *Sent:* 30. september 2009 18:55
> >     *To:* The OpenCms mailing list
> >     *Subject:* Re: [opencms-dev] Problems with labels in
> >     workplace_ru.properties
> > 
> >     Paul-Inge Flakstad wrote:
> >>     I was a bit mistaken in my last post. By specification, all
> >>     .properties files are Latin-1 encoded: when loading from (or saving
> >>     to) a stream, the ISO-8859-1 character encoding is used. All
> >>     characters that cannot be represented in this encoding must be
> >>     Unicode-escaped.
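A small sketch of what this means in practice: a file saved as UTF-8 and read through Properties.load(InputStream) comes back with one Latin-1 character per byte. Class name and label key are made up; the byte arithmetic is the point.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Properties;

public class Latin1Mojibake {

    /** Loads the bytes with Properties.load(InputStream), i.e. decoded as ISO-8859-1. */
    static String loadAsSpecified(byte[] file, String key) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(file));
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        byte[] file = "label.photo=\u0424\u043e\u0442\u043e:\n".getBytes("UTF-8");
        String value = loadAsSpecified(file, "label.photo");
        // Each Cyrillic letter is 2 bytes in UTF-8 and every byte became one char,
        // so the 5-char value (4 letters plus ':') comes back as 9 chars.
        System.out.println(value.length());  // prints 9
    }
}
```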
> >>
> >>     This is probably old news for the more experienced developers, but
> >>     to me, it's news. I'm glad to finally have learned the cause of my
> >>     "mysterious" encoding problem, but bewildered and confounded by the
> >>     facts... Why why WHY???
> >>       
> >     At some point I had also stumbled over this one - horrible, isn't
> >     it? In the end I just wrote my own property file loader method that
> >     pushes the property data string through the native2ascii
> >     implementation from GNU Classpath and then loads the resulting
> >     property bundle.
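Christian used the native2ascii implementation from GNU Classpath; the sketch below swaps in a hand-rolled equivalent (class and method names hypothetical) that turns every non-ASCII char into a backslash-u escape, as native2ascii does, and hands the pure-ASCII result to Properties.load(). It assumes the caller has already decoded the file as UTF-8 into a String.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Properties;

public class Native2AsciiLoader {

    /** Replaces each non-ASCII UTF-16 unit with a backslash-u escape. */
    static String toAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                // Surrogate halves are escaped one by one; load() rejoins them.
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    /** Escapes the already-decoded property data, then loads it per the spec. */
    static Properties loadEscaped(String propertyData) throws IOException {
        Properties props = new Properties();
        byte[] ascii = toAscii(propertyData).getBytes("ISO-8859-1");
        props.load(new ByteArrayInputStream(ascii));
        return props;
    }
}
```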
> > 
> >     I don't know why Sun never fixed the property file spec by adding an
> >     optional encoding signature, but somehow they never did.
> >>     Further reading: http://www.thoughtsabout.net/blog/archives/000044.html
> >>
> >>     Guys at Alkacon: Any possibility that, in a future release, you
> >>     would consider disregarding the specification for .properties
> >>     files and using UTF-8 instead? In my opinion, OpenCms would become
> >>     more user-friendly if you did - at least when dealing with
> >>     multilanguage sites that need to support characters outside the
> >>     Latin-1 set.
> >>       
> >     Maybe it would be best to switch to XML-based resource bundles at
> >     some point. I like the simplicity of the old-fashioned property
> >     format, but since their encoding behavior is defined like that,
> >     it's probably best to leave them aside rather than officially
> >     bending the spec.
> > 
> >     Best Regards
> >     Christian
> > 
> > 
> > 
> >>     Cheers,
> >>     Paul
> >>
> >>       
> >>>     -----Original Message-----
> >>>     From: opencms-dev-bounces at opencms.org 
> >>>     [mailto:opencms-dev-bounces at opencms.org] On Behalf Of 
> >>>     Paul-Inge Flakstad
> >>>     Sent: 30. september 2009 15:42
> >>>     To: The OpenCms mailing list
> >>>     Subject: Re: [opencms-dev] Problems with labels in 
> >>>     workplace_ru.properties
> >>>
> >>>     Self-replying :)
> >>>
> >>>     If my assumptions are correct:
> >>>     The workplace_xx.properties files are read during the workplace
> >>>     initialization, using the default encoding of the JVM, which
> >>>     typically depends on the locale and charset of the underlying
> >>>     operating system.
> >>>
> >>>     In my case, the workplace_ru.properties file is read as
> >>>     ISO-8859-1, and as a result, none of the strings fetched using
> >>>     CmsJspActionElement#label(String) make any sense - it's all
> >>>     gibberish.
> >>>
> >>>     The solution is something along the lines of this:
> >>>
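> >>>         public String labelUnicode(String key) {
> >>>             // .properties files are decoded as ISO-8859-1, so a file saved
> >>>             // as UTF-8 yields one char per byte. Re-encoding the label as
> >>>             // ISO-8859-1 recovers the original bytes, which are then
> >>>             // decoded as UTF-8.
> >>>             String raw = this.label(key);
> >>>             try {
> >>>                 return new String(raw.getBytes("ISO-8859-1"), "UTF-8");
> >>>             } catch (java.io.UnsupportedEncodingException e) {
> >>>                 // Should not happen: every JVM must support both charsets.
> >>>                 return "[Default label: " + raw + "]";
> >>>             }
> >>>         }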
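To try the re-encoding trick outside OpenCms, here is a self-contained sketch (names hypothetical) that loads a UTF-8 file the way the spec prescribes and then repairs the value. The round trip is lossless only because ISO-8859-1 maps all 256 byte values to characters; had the loader used a default charset with unmapped bytes, information could be lost before the repair runs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Properties;

public class LabelRepair {

    /** Loads per spec (ISO-8859-1), then re-encodes as Latin-1 and decodes as UTF-8. */
    static String loadAndRepair(byte[] utf8File, String key) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(utf8File));
        String misdecoded = props.getProperty(key);
        return new String(misdecoded.getBytes("ISO-8859-1"), "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        byte[] file = "label.photo=\u0424\u043e\u0442\u043e:\n".getBytes("UTF-8");
        String fixed = loadAndRepair(file, "label.photo");
        System.out.println(fixed.equals("\u0424\u043e\u0442\u043e:"));  // prints true
    }
}
```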
> >>>
> >>>     This seems to be working just perfectly. I don't have to 
> >>>     think about character encoding since utf-8 is the default 
> >>>     encoding all over OpenCms (I can just leave the 
> >>>     "content-encoding" property blank), and I can even mix 
> >>>     special characters from different languages all in one 
> >>>     .properties file.
> >>>
> >>>     I would even propose adding a method like the one suggested above
> >>>     to CmsJspActionElement. (I'm pretty sure I attempted every single
> >>>     possibility within OpenCms to get the correct strings returned from
> >>>     my workplace_ru.properties using the "standard" label(String), but
> >>>     never got anything but strange symbols. If someone for some reason
> >>>     _needs_ to have their JVM default encoding set to ISO-8859-1 while
> >>>     at the same time supporting a multilanguage OpenCms system, there
> >>>     seems to be no method native to OpenCms that returns correct
> >>>     Russian (for example) strings from the workplace_xx.properties
> >>>     files.)
> >>>
> >>>     Cheers,
> >>>     Paul
> >>>
> >>>     PS: I know I should probably set the JVM default encoding manually
> >>>     to UTF-8 instead, but I'm unsure of any possible side effects. So
> >>>     until then, this is a pretty decent workaround.
> >>>
> >>>
> >>>         
> >>>>     -----Original Message-----
> >>>>     From: opencms-dev-bounces at opencms.org 
> >>>>     [mailto:opencms-dev-bounces at opencms.org] On Behalf Of 
> >>>>     Paul-Inge Flakstad
> >>>>     Sent: 29. september 2009 22:57
> >>>>     To: The OpenCms mailing list
> >>>>     Subject: [opencms-dev] Problems with labels in
> >>>>     workplace_ru.properties
> >>>>
> >>>>     Hi all
> >>>>
> >>>>     In one of our multilanguage sites, Russian and English content is
> >>>>     mixed. Everything's working as expected (since we're using UTF-8
> >>>>     encoding), but all the labels read from workplace_ru.properties
> >>>>     using CmsJspActionElement#label(String) are just gibberish...
> >>>>
> >>>>     As a workaround, I created my own label(String, Locale) method that
> >>>>     does nothing more than read the value straight out of the
> >>>>     workplace_ru.properties file. When using this method to access the
> >>>>     labels, everything is OK, but the .properties file is "parsed" upon
> >>>>     each invocation, so it's not desirable to keep using it.
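Since the drawback of this workaround is re-parsing the file on every call, here is a sketch of the same idea with a per-locale cache that reads each file once as UTF-8. The class name, the directory layout (workplace_<language>.properties in one folder) and the null-for-missing-key behaviour are all assumptions, not OpenCms API.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.Locale;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical label reader: parses each workplace_xx.properties file once, as UTF-8. */
public class CachedLabels {

    private final File dir;  // assumed folder holding the workplace_xx.properties files
    private final ConcurrentMap<Locale, Properties> cache =
            new ConcurrentHashMap<Locale, Properties>();

    public CachedLabels(File dir) {
        this.dir = dir;
    }

    /** Returns the label, or null if the key is missing. */
    public String label(String key, Locale locale) throws IOException {
        Properties props = cache.get(locale);
        if (props == null) {
            props = read(new File(dir, "workplace_" + locale.getLanguage() + ".properties"));
            Properties previous = cache.putIfAbsent(locale, props);
            if (previous != null) {
                props = previous;  // another thread cached it first
            }
        }
        return props.getProperty(key);
    }

    private static Properties read(File file) throws IOException {
        Properties props = new Properties();
        Reader reader = new InputStreamReader(new FileInputStream(file), "UTF-8");
        try {
            props.load(reader);
        } finally {
            reader.close();
        }
        return props;
    }
}
```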
> >>>>
> >>>>     I've tried this:
> >>>>     Set the HTML charset to utf-8.
> >>>>     Set the JSP pageEncoding to utf-8. 
> >>>>     Set the OpenCms <defaultcontentencoding> to utf-8.
> >>>>     I also checked the HTTP response header; it says utf-8 as well.
> >>>>
> >>>>     Also, I've been experimenting with different combinations of
> >>>>     encodings (including Cyrillic ISO-8859-5), but to no avail.
> >>>>
> >>>>     Can anyone please provide some insight?
> >>>>
> >>>>     (Just so there's no mistake, I'm reading the labels from the 
> >>>>     .properties file to use them as text on a web-page, not in 
> >>>>     the OpenCms workplace. Things like "Photo:", "Published by " 
> >>>>     and alike.)
> >>>>
> >>>>     Cheers,
> >>>>     Paul
> >>>>
> >>>>     _______________________________________________
> >>>>     This mail is sent to you from the opencms-dev mailing list
> >>>>     To change your list options, or to unsubscribe from 
> the list, 
> >>>>     please visit
> >>>>     http://lists.opencms.org/mailman/listinfo/opencms-dev
> >>>>
> >>>>           
> >>>
> >>>         
> >>
> >>
> >>       
> > 
> > 
> > 
> > ------------------------------------------------------------------------
> > 
> > 
> 
> -- 
> Claus Priisholm, CodeDroids ApS
> Phone: +45 48 22 46 46
> cpr (you know what) codedroids.com - http://www.codedroids.com
> cpr (you know what) interlet.dk - http://www.interlet.dk
> -- 
> Javadocs and other OpenCms stuff: 
> http://www.codedroids.com/community/opencms
> 
> 

