Locale Extensions
What can be done
Extensions allow you to attach additional information to a Locale, as also described in the Javadoc:
The Locale class implements IETF BCP 47 which is composed of RFC 4647 "Matching of Language Tags" and RFC 5646 "Tags for Identifying Languages" with support for the LDML (UTS#35, "Unicode Locale Data Markup Language") BCP 47-compatible extensions for locale data exchange.
You can add extension tags as follows:
b.setExtension('x', "myExt-myCal-myCur");
But there are some things to be aware of:
- extensions are not case sensitive (The JDK Locale converts them all implicitly to lower case!)
- there is no defined order of extension tags, so don't rely on one!
- tags can be separated by '-' or '_' (the standard requires '-', Java accepts both as input, but then translates all '_' to '-').
- valid characters for tags are restricted to [a-zA-Z0-9], so special characters such as '?' or '=' are not possible (Java checks this).
- tags are a minimum of 2 characters long (this is also checked by the JDK; for the private-use key 'x' a single character is enough)
- tags are a maximum of 8 characters long (this is also checked by the JDK)
- each extension is identified by a single-character key, the singleton (the Locale Javadoc lists [0-9a-zA-Z] as well-formed keys, so digits are accepted as well)
So all the following inputs are accepted by the JDK:
b.setExtension('x', "mi");
b.setExtension('a', "maxmaxma");
b.setExtension('b', "de-US");
b.setExtension('d', "aa1-bb2_cc3_dd4");
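The rules above can be checked with a few lines of Java. This is only a minimal sketch: the class name ExtensionRules and the helpers stored and accepted are made up for this example, and it assumes that the JDK signals a rejected value with an IllformedLocaleException, as the Locale.Builder Javadoc describes.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

public class ExtensionRules {

    // Hypothetical helper: returns the extension value as stored by the JDK.
    static String stored(char key, String value) {
        return new Locale.Builder().setExtension(key, value).build().getExtension(key);
    }

    // Hypothetical helper: true if the JDK accepts the value,
    // false if setExtension throws IllformedLocaleException.
    static boolean accepted(char key, String value) {
        try {
            new Locale.Builder().setExtension(key, value);
            return true;
        } catch (IllformedLocaleException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Mixed case is converted to lower case.
        System.out.println(stored('x', "myExt-myCal-myCur"));
        // '_' is accepted as input but stored as '-'.
        System.out.println(stored('d', "aa1-bb2_cc3_dd4"));
        // A character outside [a-zA-Z0-9] is rejected.
        System.out.println(accepted('a', "ma?x"));
        // A tag longer than 8 characters is rejected.
        System.out.println(accepted('a', "toolongsubtag"));
    }
}
```

Validation happens in setExtension itself, so accepted does not need to call build().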
Strange Behavior
Some days ago I played around with Locale extensions (JDK 7/8):
Locale.Builder b = new Locale.Builder();
// b.setRegion("DE");
// b.setLanguage("de");
b.setExtension('x', "gr2-spPrepen-nldeDE");
System.out.println("Locale: " + b.build());
System.out.println("Locale's extension: " + b.build().getExtension('x'));
The output is a bit surprising (the extension does NOT appear in the toString() output):
> Locale:
> Locale's extension: gr2-spprepen-nldede
At first glance this seems to be a bug, but the spec at http://tools.ietf.org/html/rfc5646#page-16, especially section 2.2.6, says:
An extension MUST follow at least a primary language subtag.
That is, a language tag cannot begin with an extension.
This may hint at why it behaves as shown above. Still, if a Locale is invalid, it should not be possible to create/build it...
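As far as I can tell, the Javadoc of Locale.toString() also documents this case: when both language and country are missing, it returns the empty string even if extensions are present. The following minimal sketch compares toString() with getExtension() and toLanguageTag() for such a Locale. The class name ExtensionOnlyLocale and its build helper are made up for this example. No output is shown for toLanguageTag(), since I have not verified its exact form for this case.

```java
import java.util.Locale;

public class ExtensionOnlyLocale {

    // Hypothetical helper: builds a Locale that carries only the 'x' extension.
    static Locale build() {
        return new Locale.Builder()
                .setExtension('x', "gr2-spPrepen-nldeDE")
                .build();
    }

    public static void main(String[] args) {
        Locale locale = build();
        // toString() is empty because neither language nor region is set.
        System.out.println("toString():      '" + locale + "'");
        // The extension is still stored and can be read back.
        System.out.println("getExtension(x): " + locale.getExtension('x'));
        // The BCP 47 representation is produced independently of toString().
        System.out.println("toLanguageTag(): " + locale.toLanguageTag());
    }
}
```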
Now, when setting a language in our example with:
b.setLanguage("de");
the toString() result now seems to be correct:
> Locale: de__#x-gr2-spprepen-nldede
> Locale's extension: gr2-spprepen-nldede
The same applies when setting only a region...
> Locale: _DE_#x-gr2-spprepen-nldede
> Locale's extension: gr2-spprepen-nldede
...or when setting both a region and a language; the output is again as expected:
> Locale: de_DE_#x-gr2-spprepen-nldede
> Locale's extension: gr2-spprepen-nldede
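The toString() format is specific to Java. If the Locale has to be exchanged or stored, the BCP 47 form from toLanguageTag() is probably the better choice. Here is a minimal sketch for the last case (language and region set), with a round trip through Locale.forLanguageTag(). The class name TagRoundTrip and its build helper are made up for this example.

```java
import java.util.Locale;

public class TagRoundTrip {

    // Hypothetical helper: the de_DE Locale with the 'x' extension from the text.
    static Locale build() {
        return new Locale.Builder()
                .setLanguage("de")
                .setRegion("DE")
                .setExtension('x', "gr2-spPrepen-nldeDE")
                .build();
    }

    public static void main(String[] args) {
        Locale locale = build();
        // Java-specific representation, as shown in the output above.
        System.out.println("toString():      " + locale);
        // BCP 47 representation of the same Locale.
        String tag = locale.toLanguageTag();
        System.out.println("toLanguageTag(): " + tag);
        // Parsing the tag again should give an equal Locale.
        Locale parsed = Locale.forLanguageTag(tag);
        System.out.println("round trip equal: " + parsed.equals(locale));
    }
}
```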
Finally, I also tried some special inputs based on the constraints defined by the specification, and I was able to create other invalid Locale instances relatively easily:
- b.setExtension('c', "de-DE"); will be converted to c-de-de, which is invalid since de is duplicated in the final representation (but required to be unique).
- b.setExtension('c', "c-de"); will be converted to c-c-de, which is invalid since the extension singleton c is duplicated in the final representation (but required to be unique).
- b.setExtension('c', "x-de"); will be converted to c-x-de, which is invalid since an extension singleton must be followed by at least one tag, which is not the case for c in the final representation c-x-de.
So be careful when using the Locale extension mechanism. I will also post this to the i18n colleagues at OpenJDK; I am wondering what they think...
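The inputs listed above can be re-checked on a given JDK with a small probe. This is a minimal sketch: the class name ExtensionProbe and the probe helper are made up for this example. What is accepted or rejected may differ between JDK versions, so no expected output is shown.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

public class ExtensionProbe {

    // Hypothetical helper: reports the stored extension and the language tag,
    // or the error message if the JDK rejects the input.
    static String probe(char key, String value) {
        try {
            Locale locale = new Locale.Builder().setExtension(key, value).build();
            return "accepted: " + key + "-" + locale.getExtension(key)
                    + " (language tag: " + locale.toLanguageTag() + ")";
        } catch (IllformedLocaleException e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The three inputs discussed in the text.
        System.out.println(probe('c', "de-DE"));
        System.out.println(probe('c', "c-de"));
        System.out.println(probe('c', "x-de"));
    }
}
```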