
Some Practical Bits and Pieces

 

This page collects practical notes, short examples, and instructions on general topics and UI resource handling.

XLIFF resources as replacement for Java .properties and ListResourceBundles

For several years now, Java has had a ResourceBundle.Control class, which provides so much flexibility that I sometimes shake my head at the reluctance to adopt it. The XLIFF resource format, originally intended as an interchange format, has been adopted quite widely and can even be used as a drop-in replacement for Java Properties and ListResourceBundle (LRB).

For the sake of full disclosure: Yes, I am aware that no "real objects", as the Javadoc of ListResourceBundle calls them, can be in the LRB if you want to use XLIFF as a replacement. But then, finding “real objects” in resource bundles has been a pet peeve of mine for 20 years and is one of the low points of internationalization architecture. And yes, the much-touted ability to catch potential runtime errors through the fact that you have to compile LRBs before use is another pet peeve of mine.

But how do you use an XLIFF or any other xml as a replacement?

For your convenience,  an implementation of XMLResourceBundleControl (extends  ResourceBundle.Control) is right here. And a matching XLIFF reader is here.

 

Play with it and thank the Java folks for a very useful feature. There have been two main objections with regard to this class. The first was that the fallback mechanism did not work or was hard to use. That is a misconception: fallback works. The second, more justified one concerns performance and footprint. For very small applications in the mobile or embedded areas, yes, you will want to consider this. But for anything else, especially those resource-hogging enterprise suites, I have only one comment: Let us assume, hypothetically, that we have a large application and that loading a resource file takes an additional few milliseconds. Now, if someone tells me that this is an unacceptable drop in performance, my answer will be “please find the true performance issues in the base code before you dismiss the use of ResourceBundle.Control”.
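To give a feel for how little code is involved, here is a minimal sketch of such a Control. It is not the XMLResourceBundleControl linked above; the class names XliffControlSketch and XliffBundle are made up for illustration, and the reader assumes XLIFF 1.2 style files with trans-unit, source and target elements and a ".xlf" file suffix. It takes the target text if present and falls back to the source text otherwise.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.ResourceBundle;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical Control that looks for <baseName>_<locale>.xlf on the class path. */
public class XliffControlSketch extends ResourceBundle.Control {

    @Override
    public List<String> getFormats(String baseName) {
        return Collections.singletonList("xlf"); // assumed format name and suffix
    }

    @Override
    public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                    ClassLoader loader, boolean reload)
            throws IllegalAccessException, InstantiationException, IOException {
        String resourceName = toResourceName(toBundleName(baseName, locale), "xlf");
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            // Returning null tells ResourceBundle to try the next candidate locale,
            // which is how the normal fallback chain keeps working.
            return in == null ? null : new XliffBundle(in);
        }
    }

    /** Hypothetical bundle holding trans-unit id -> target (or source) text. */
    public static final class XliffBundle extends ResourceBundle {
        private final Map<String, String> entries = new HashMap<>();

        public XliffBundle(InputStream in) throws IOException {
            try {
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder().parse(in);
                NodeList units = doc.getElementsByTagName("trans-unit");
                for (int i = 0; i < units.getLength(); i++) {
                    Element unit = (Element) units.item(i);
                    NodeList targets = unit.getElementsByTagName("target");
                    NodeList texts = targets.getLength() > 0
                            ? targets : unit.getElementsByTagName("source");
                    if (texts.getLength() > 0) {
                        entries.put(unit.getAttribute("id"), texts.item(0).getTextContent());
                    }
                }
            } catch (ParserConfigurationException | SAXException e) {
                throw new IOException("Cannot parse XLIFF resource", e);
            }
        }

        @Override
        protected Object handleGetObject(String key) {
            return entries.get(key); // null lets the parent bundle answer
        }

        @Override
        public Enumeration<String> getKeys() {
            // Sketch only: keys of this bundle, without merging parent keys.
            return Collections.enumeration(entries.keySet());
        }
    }
}
```

With a file such as myapp/msg/IntroMsg_ja.xlf on the class path, the call would look like ResourceBundle.getBundle("myapp.msg.IntroMsg", locale, new XliffControlSketch()). As far as I know, custom Controls are not supported for code in named modules (Java 9 and later), so treat this as a class path sketch.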

 

Using an interface to call (text) resources

Standardization of resource strings and ease of use in source code continue to be big issues. Some improvements, such as using a central database and having gatekeepers who must approve new strings, have been more or less successful. One contribution to a solution for big projects is the use of interfaces to declare constants that can then be called easily.

How to do this:
1. Create an interface and declare the message keys as public static final strings, for example: public static final String NO_USER_LOGIN = "NO_USER_LOGIN";

2. If you use LRBs, add an entry like this:
    { BaseConfigMsgID.NO_USER_LOGIN, "No valid login found for user {0}" }, where BaseConfigMsgID is your interface.

   If you use a .properties bundle, add your message like this (no quotes, because in a .properties file they would become part of the value):

    NO_USER_LOGIN=No valid login found for user {0}

The code that calls the message is identical in both cases and goes like this: bundle.getString(BaseConfigMsgID.NO_USER_LOGIN);

 

The beauty of this, in my view, is that developers can easily reference the correct message through autocomplete in modern development environments and, even more so, that you can switch between LRBs and properties bundles so easily.
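Put together, the steps above might look like the following sketch. BaseConfigMsgID and NO_USER_LOGIN come from the example above; the wrapper class MessageIdSketch, the bundle class BaseConfigMsg and the helper noUserLogin are names invented here for illustration.

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

public class MessageIdSketch {

    /** Message keys declared once; interface fields are implicitly public static final. */
    public interface BaseConfigMsgID {
        String NO_USER_LOGIN = "NO_USER_LOGIN";
    }

    /** LRB variant of the bundle (hypothetical class name). */
    public static class BaseConfigMsg extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                { BaseConfigMsgID.NO_USER_LOGIN, "No valid login found for user {0}" },
            };
        }
    }

    /** The calling code does not care whether the bundle is an LRB or a .properties file. */
    public static String noUserLogin(ResourceBundle bundle, String user) {
        String pattern = bundle.getString(BaseConfigMsgID.NO_USER_LOGIN);
        return MessageFormat.format(pattern, user);
    }

    public static void main(String[] args) {
        ResourceBundle bundle = new BaseConfigMsg();
        System.out.println(noUserLogin(bundle, "jdoe"));
    }
}
```

Switching to a .properties bundle only changes how the bundle object is obtained; noUserLogin stays untouched.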

 

Location, location, location

The standard real estate phrase also applies to Java software real estate.  An astounding number of programs insist on packaging resources in the same .jar/.war as the code.  This is partly because of IDE defaults, partly because of inadequate research into lookup and loading.

Yet, there is absolutely no need to do this. On the contrary, if you separate them out, you have already made a huge step to facilitate adding new languages to an existing program.

Java web applications will easily load resources from jars in the lib directory. But, since jars, wars, and ears are "virtual file systems", you can really put them anywhere. Stick the resources into a jar in the webapp lib folder, and then all you need in the UI code is a short snippet, shown here for JSF:

 

<f:view locale="ja">
   <f:loadBundle basename="myapp.mycomponent.msg.IntroMsg" var="msg"/>
   <h1><h:outputText value="#{msg.ABORTED}" /></h1>
</f:view>

It does not get much better or, big development dream, much more elegant.
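Outside a web container, the same separation can be sketched in plain Java by pointing a class loader at the resource jar or directory. This is only an illustration under my own assumptions: the class ExternalResourcesSketch and the method loadExternal are hypothetical names, and the location is whatever jar or directory you ship your resources in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.Locale;
import java.util.ResourceBundle;

public class ExternalResourcesSketch {

    /**
     * Loads a bundle from a jar or directory that is not part of the application jar.
     * Properties bundles are read completely on creation, so the loader can be
     * closed again right away.
     */
    public static ResourceBundle loadExternal(Path jarOrDirectory, String baseName, Locale locale) {
        try (URLClassLoader loader =
                     new URLClassLoader(new URL[] { jarOrDirectory.toUri().toURL() })) {
            return ResourceBundle.getBundle(baseName, locale, loader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Adding a language then means dropping another properties file into that jar or directory, without rebuilding the code.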

 

Plurals and gender issues

Attempts have been made to provide ways to handle more complex formatting operations, for instance, plural forms, ranges of numbers, etc. You should avoid these because of some inherent issues with translation. Stick to the simplest possible way of doing resources with placeholders. Mozilla has examples for using PluralForms, but plurals are not the only aspect of a language. You will find gender questions, questions of case and other choices, such as agreement between different parts of speech.
What is wrong with the following?

Issue: You cannot go from the English with its two forms to the Polish with three by translation. Translators will not add or remove items. They translate what is there.
Issue: The approach above turns a properties file into a nested file by adding a secondary delimiter. Asian translators can and will frequently replace the ASCII semicolon with an ideographic one (outside of the ASCII range, showing in the translation as \uxxxx).

Solution: Use more than one key=value pair. Never ever chain values together as shown above.

 

This way or that way (bidi)

The good news is that ADF and other packages will take care of bi-directional display for you. The bad news is: challenges remain.
But there is an excellent tutorial by the W3C folks: Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts.
 

A usage based view of xml

Those of you who had the pleasure of working with SGML markup long before html and xml came around, can skip this section.

 

While many discussions of xml over the years have focused on issues of “the spec” versus “the implementation”,  I have found it useful to distinguish between two usage scenarios.

 

Scenario 1: data exchange between applications or no transformation.
If xml is used exclusively to transfer data, for example, in SOAP or as an export/import format, few constraints apply to parsing and formatting. For example, you can very well ignore "pretty" settings that produce nice indentation. While I am wary of certain implementations when it comes to usage described in Scenario 2,  you can also safely use standards-compliant but weird implementations like the Oracle XMLPrintDriver class in the v2 parser in the pure data exchange scenario.

 

Scenario 2: human interaction or transformation.
If there is any chance that humans will interact with the xml source, or if some transformation is performed, you should add the niceties and some hints that are not strictly required for parsing or processing xml.  The general guideline for the treatment of XML that is sometimes (or all the time) destined for direct human consumption or further processing is: no gratuitous changes.

 

Common examples include:

 

Outputting xml without an xml parser

We have all done it at some point, especially in the earlier days when parser implementations were often unwieldy and sucked. We built “xml” as strings in memory and wrote them out using simple file output with an OutputStream. The XML never saw a parser or supporting classes (like a DocumentBuilder).

The problem is that this approach works only until the stream is no longer pure ASCII, at which point everything falls apart.

So, do yourself a favor and tell the scrum masters that they do want controlled input and output. I, for one, will be forever grateful to the folks who wrote the JDOM package because it was more than adequate for many use cases.
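As a rough illustration of "controlled output", here is a sketch that uses only the JDK's own DOM and Transformer classes rather than JDOM. The class XmlOutputSketch, the method writeMessage and the element name "message" are invented for this example. The point is that escaping and encoding are handled by the library, not by string concatenation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlOutputSketch {

    /** Builds a one-element document and serializes it as UTF-8 bytes. */
    public static byte[] writeMessage(String id, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element message = doc.createElement("message"); // hypothetical element name
            message.setAttribute("id", id);
            message.setTextContent(text); // the serializer escapes & and < for us
            doc.appendChild(message);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // State the encoding explicitly instead of relying on a platform default.
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("XML output failed", e);
        }
    }

    public static void main(String[] args) {
        // Non-ASCII text and markup characters, written as Unicode escapes on purpose.
        byte[] xml = writeMessage("GREETING", "Gr\u00fc\u00dfe & \u4f60\u597d <world>");
        System.out.println(new String(xml, StandardCharsets.UTF_8));
    }
}
```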

 

Naming your files

Another seemingly trivial aspect which merits enforcement is paying attention to the naming of files as they are created. Unfortunately, IDEs provide default names that are too easily accepted, for example “resource.properties” or “labels.java”. That is enough for the IDE to keep track of strings and to know where it can find them. But just imagine how one team felt when they saw that their great new product had all the UI strings and messages in some 30 files that all had the name “labels.properties”. If you run into such a problem and your project manager or dev manager has other things to do than make coders change the filenames and the calling code, it will be you, the internationalization or globalization people, who will suffer for years to come. You will soon be sick and tired of having to dig through files because a bug is filed against one “labels.properties”, and you will have to find out which one.

Css, a matter of style

For any browser-based rendering, use external CSS stylesheets. They can be easily adapted to internationalization needs. For example, the display font size for most Asian languages needs to be slightly larger than for Western languages; usually an increase of 1pt is enough. And since you have the styles defined externally, it is easy to change that, too.
Issue: I don't know how many stylesheets I need, one per locale?
Solution: One per language is enough. To put it more colloquially: Austrians and Swiss will be happy with the same font size. Depending on your specific UI (how cluttered, how busy), you may very well be able to use a single stylesheet for most or all of your supported languages.

 

 

Default display language

It is a simple question: what do you show on a screen if you are asked to display Chinese, or Spanish, or English? It has to do with the different versions of language and script. An easy way to try and settle this is to look at English: we have pretty much come to accept seeing the American version of this language in software. Since politics has a way of making life hard, I argue with numbers. There are some 1.2 billion users of “Simplified Chinese” versus some 150 million of “Traditional Chinese”. So, in terms of the base language, I will fight to display “Simplified Chinese” if all that matters of a locale is the language part (in this case “zh”). And my argument for Portuguese is the same: “pt” will have Brazilian text. If you need the version used in Portugal, you use pt-PT.

This argument will generally win unless:

Someone has a politico-cultural war to fight. I have personally experienced a bad fight over “Chinese” because the internationalization chiefs all came from “Traditional” backgrounds. To this day, it is the dirtiest fight over two letters I have seen.

Sales numbers are held against you. If the sales folks say “we have no customers in mainland China, so we want  Chinese to mean Traditional”, your best hope is that you can convince development to keep the design open for the future.
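In Java terms, the "base language" decision comes down to which text you put into the bundle that carries only the language code, because that bundle sits on the lookup chain of every locale of that language. A small sketch to inspect the chain follows; DefaultLanguageSketch and lookupChain are names invented here, "msg" is a placeholder base name, and the sketch ignores the additional fallback to the JVM default locale that getBundle performs.

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class DefaultLanguageSketch {

    /** Returns the locales ResourceBundle will try, most specific first. */
    public static List<Locale> lookupChain(String languageTag) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        // "msg" is a placeholder; the default Control does not use the base name here.
        return control.getCandidateLocales("msg", Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        // Angola has no bundle of its own in this scenario, so it ends up with "pt".
        System.out.println(lookupChain("pt-AO"));
        // Portugal gets its own file, which is found before "pt" is consulted.
        System.out.println(lookupChain("pt-PT"));
    }
}
```

So with Brazilian text in the "pt" bundle and a separate pt-PT bundle, every other Portuguese locale falls through to the Brazilian text, which is the behavior argued for above.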

 

Comment, comment, comment

Making sure that developers put comments into translatable files to explain meaning and usage as well as restrictions is a never-ending challenge.

My minimum requirement: if a string is not a complete sentence, explain it.  As a developer, you know what the label “Size” means but you cannot expect translators to know or to guess correctly.

Or, another example, you have a string “Test server”.  Is “test” a noun or a verb, are you talking about a server used as a “test server”, or are you trying to test  a server?

The term “comment” itself is also used with slightly different meanings, which caused me to introduce the terms “soft comment” and “hard comment”.

A hard comment is something like an XML attribute or processing instruction that software actually acts on, for example a “do not translate” instruction to a parser.

Soft comments are any comments that are for human consumption only, such as an XML comment: <!-- comment here -->.

It’s all one big game

Until now, I have stayed mum on games, preferring to deal with a more enterprise software theme. A theme that covers a lot of ground, including much of the recent fashion “social media”. To me, the Internet is social media. The blips and bumps of Facebook and Twitter, as important as they are, are really only a small subset of the one social medium that is the Internet.  Back to games. Games have faced internationalization challenges that make the enterprise world pale and boring.  Although, and this is where social media come in, the enterprise world is now seeing some of the same challenges, from voice and speech processing to videos and games.  Games by their nature have a tighter cultural integration than business software. Games still have a greater diversity of moving parts. One of the most fascinating aspects of games is the need to be able to test small, discrete components with end-user interaction.

 

 

Defaults will hurt you

The one aspect of any programming language that will bite you sooner or later is the unquestioned use of defaults. Whether date or time, opening files, parsing strings, or using the default language (and/or locale) that comes in from a web browser,  the design of way too many defaults is not geared towards internationalization. So, use PMD or something similar to catch defaults in your code and subject them to inspection.
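A few typical cases, sketched in Java with the default replaced by an explicit choice. The class and method names are invented for illustration, and the list is by no means complete.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

public class ExplicitDefaultsSketch {

    /** Case mapping for internal keys: never depend on the JVM default locale. */
    public static String normalizeKey(String key) {
        return key.toUpperCase(Locale.ROOT);
    }

    /** Decoding bytes: name the charset instead of using the platform default. */
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Dates for display: pass the user's locale and time zone explicitly. */
    public static String formatForUser(Instant when, Locale userLocale, ZoneId userZone) {
        return DateTimeFormatter.ofLocalizedDateTime(FormatStyle.MEDIUM)
                .withLocale(userLocale)
                .withZone(userZone)
                .format(when);
    }

    public static void main(String[] args) {
        // With a Turkish locale, "title" does not upper-case to "TITLE".
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println("title".toUpperCase(turkish).equals("TITLE"));
        System.out.println(normalizeKey("title").equals("TITLE"));
        System.out.println(formatForUser(Instant.now(), Locale.JAPAN, ZoneId.of("Asia/Tokyo")));
    }
}
```

The Turkish case is the classic one: code that upper-cases a key with the default locale works on the developer's machine and then fails on a server or client running with a Turkish locale.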