Codes for the languages (and language groups) of the world are covered by the ISO-639 family of standards.
These standards assign letter codes to each language. For example, ISO-639-3 aims to provide a three-letter code for every known human language, living or extinct.
There are too many such codes to fit in a Java enum (e.g. https://github.com/TakahikoKawasaki/nv-i18n/blob/master/src/main/java/com/neovisionaries/i18n/LanguageAlpha3Code.java is just not complete).
This package contains the tab-separated files provided by https://iso639-3.sil.org/, together with Java classes that read them and expose all language codes as Java objects with getters.
import java.util.Locale;
import java.util.Optional;
import org.meeuw.i18n.languages.*;

// get a language by its code
Optional<LanguageCode> optional = ISO_639.getByPart3("nld");
LanguageCode languageCode = LanguageCode.languageCode("nl");

// show its 'inverted' name
System.out.println(languageCode.nameRecord(Locale.US).inverted());

// get a language family ('gem' is the ISO-639-5 code for Germanic languages)
Optional<LanguageFamilyCode> family = ISO_639.getByPart5("gem");

// get by any code
Optional<ISO_639_Code> byCode = ISO_639.get("nl");

// stream by names; a language may have several names (Dutch, Flemish) and so appear multiple times
ISO_639_Code.streamByNames().forEach(e -> {
    System.out.println(e.getKey() + " " + e.getValue());
});
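To illustrate the idea of reading a tab-separated code table into objects with getters, here is a minimal sketch. It is not the library's actual reader: the class `TabTableSketch`, its `Row` type, and the three-column layout (id, two-letter code, reference name) are assumptions made for illustration, and the real tables have more columns.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: parse tab-separated rows into immutable objects keyed by id.
final class TabTableSketch {

    // Hypothetical value type; simplified to three columns for illustration.
    static final class Row {
        private final String id;      // three-letter code
        private final String part1;   // two-letter code, null if the column is empty
        private final String refName; // reference name

        Row(String id, String part1, String refName) {
            this.id = id;
            this.part1 = part1;
            this.refName = refName;
        }

        String id() { return id; }
        Optional<String> part1() { return Optional.ofNullable(part1); }
        String refName() { return refName; }
    }

    static Map<String, Row> parse(String table) {
        Map<String, Row> result = new LinkedHashMap<>();
        new BufferedReader(new StringReader(table)).lines()
            .skip(1)                          // skip the header row
            .filter(line -> !line.isEmpty())  // ignore blank lines
            .forEach(line -> {
                String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
                String part1 = cols[1].isEmpty() ? null : cols[1];
                result.put(cols[0], new Row(cols[0], part1, cols[2]));
            });
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample data, not copied from the real table.
        String sample = "Id\tPart1\tRef_Name\n"
            + "nld\tnl\tDutch\n"
            + "zxx\t\tNo linguistic content\n";
        Map<String, Row> rows = parse(sample);
        System.out.println(rows.get("nld").refName());
    }
}
```

Loading the rows into a map once and handing out the same objects afterwards is what makes lookups by code cheap, which an enum of several thousand constants would not do any better.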
See also the test cases
link:src/test/java/org/meeuw/i18n/languages/test/LanguageCodeTest.java[role=include]
LanguageCode#getByCode also supports retired codes where possible. This means that the code of the returned object may differ from the code that was requested:
// the 'krim' dialect (Sierra Leone) officially merged into 'bmf' (Bom-Kim) in 2017
assertThat(LanguageCode.getByCode("krm").get().getCode()).isEqualTo("bmf");
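The behaviour described above can be pictured as a two-step lookup: first among the current codes, then among retired codes that point to a replacement. The sketch below is only an illustration under that assumption; `RetiredCodeSketch` and its maps are hypothetical and say nothing about how the library stores retirements internally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: resolve a code, following a retirement to its replacement.
final class RetiredCodeSketch {

    // current code -> name (assumed sample data)
    private static final Map<String, String> CURRENT = new HashMap<>();
    // retired code -> replacing code (assumed sample data)
    private static final Map<String, String> RETIRED = new HashMap<>();

    static {
        CURRENT.put("bmf", "Bom-Kim");
        CURRENT.put("nld", "Dutch");
        RETIRED.put("krm", "bmf");
    }

    // Returns the current code for the given code, or empty if it is unknown
    // or retired without a usable replacement.
    static Optional<String> resolve(String code) {
        if (CURRENT.containsKey(code)) {
            return Optional.of(code);
        }
        return Optional.ofNullable(RETIRED.get(code))
            .filter(CURRENT::containsKey);
    }

    public static void main(String[] args) {
        System.out.println(resolve("krm")); // the retired code resolves to its replacement
        System.out.println(resolve("nld")); // a current code resolves to itself
    }
}
```

Returning an `Optional` keeps the "if possible" caveat visible to the caller: a retired code that was split into several languages has no single replacement and would simply come back empty in this sketch.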
Sometimes we have to deal with systems that use their own variants of the standards. In these cases it is possible to register fallbacks.
For example:
// Our partner uses the pseudo ISO-639-1 code 'XX' for 'no language'
// fall back to a proper Part 3 code.
try {
    LanguageCode.registerFallback("XX", LanguageCode.languageCode("zxx"));
    assertThat(ISO_639.iso639("XX").code()).isEqualTo("zxx");
} finally {
    LanguageCode.resetFallBacks();
}
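A fallback registry of this kind can be sketched as a second map that is consulted only when the normal lookup fails. The class `FallbackRegistrySketch`, its method names, and its sample data are assumptions for illustration, not the library's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: non-standard aliases that fall back to a known code.
final class FallbackRegistrySketch {

    // known code -> name (assumed sample data)
    private static final Map<String, String> KNOWN = new HashMap<>();
    // alias -> known code; concurrent because registration may happen at runtime
    private static final Map<String, String> FALLBACKS = new ConcurrentHashMap<>();

    static {
        KNOWN.put("zxx", "No linguistic content");
        KNOWN.put("nld", "Dutch");
    }

    static void registerFallback(String alias, String target) {
        if (!KNOWN.containsKey(target)) {
            throw new IllegalArgumentException("Unknown target code: " + target);
        }
        FALLBACKS.put(alias, target);
    }

    static void resetFallbacks() {
        FALLBACKS.clear();
    }

    // Standard codes win; aliases are consulted only when the code is unknown.
    static Optional<String> lookup(String code) {
        if (KNOWN.containsKey(code)) {
            return Optional.of(code);
        }
        return Optional.ofNullable(FALLBACKS.get(code));
    }

    public static void main(String[] args) {
        registerFallback("XX", "zxx");
        try {
            System.out.println(lookup("XX"));
        } finally {
            resetFallbacks(); // global state: always clean up, as in the example above
        }
    }
}
```

Because the registry is global, mutable state, resetting it in a `finally` block keeps one test or caller from leaking aliases into the next.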
The language code is annotated with a JAXB annotation, so it serializes to, and deserializes from, its code. The dependency on the annotation is optional.
The relevant classes are also annotated with Jackson annotations, so they can be serialized to and from JSON.
LanguageCode is serializable too, and ensures that deserialization returns the same object for every language. (Only the code is non-transient.)
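One common way to get this "same object after deserialization" behaviour in plain Java serialization is a `readResolve` method that swaps the freshly deserialized instance for the canonical one. The sketch below shows that technique under the assumption that instances live in a static registry; `CanonicalCodeSketch` is hypothetical and may differ from what the library actually does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: only the code is written; readResolve restores the canonical instance.
final class CanonicalCodeSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private static final Map<String, CanonicalCodeSketch> REGISTRY = new HashMap<>();

    static {
        register("nld", "Dutch");
        register("zxx", "No linguistic content");
    }

    private final String code;           // the only serialized field
    private final transient String name; // not serialized, taken from the registry

    private CanonicalCodeSketch(String code, String name) {
        this.code = code;
        this.name = name;
    }

    private static void register(String code, String name) {
        REGISTRY.put(code, new CanonicalCodeSketch(code, name));
    }

    static CanonicalCodeSketch of(String code) {
        CanonicalCodeSketch result = REGISTRY.get(code);
        if (result == null) {
            throw new IllegalArgumentException("Unknown code: " + code);
        }
        return result;
    }

    String code() { return code; }
    String name() { return name; }

    // Called by Java serialization after reading; replaces the copy with the registered instance.
    private Object readResolve() throws ObjectStreamException {
        CanonicalCodeSketch canonical = REGISTRY.get(code);
        if (canonical == null) {
            throw new InvalidObjectException("Unknown code: " + code);
        }
        return canonical;
    }

    // Serializes to a byte array and reads it back, for demonstration.
    static CanonicalCodeSketch roundTrip(CanonicalCodeSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CanonicalCodeSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CanonicalCodeSketch dutch = of("nld");
        System.out.println(roundTrip(dutch) == dutch);
    }
}
```

Keeping every field except the code transient keeps the serialized form small and stable: names and other details can change between versions of the tables without invalidating previously serialized data.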
|===
|Version |Description |Date |Remarks

|<1 |developing/testing |2023 |
|1.x |compatible with Java 8, javax.xml, module-info Java 11 | |
|1.0 | |2023-11-30 |
|2.x |Java 11, jakarta.xml |2024-01-28 |jakarta mostly applies to the optional JAXB support (and to some, also optional, validation annotations)
|2.1 |support for retired codes |2024-02-11 |
|2.2 |migrated support for language code validation from i18n-regions |2024-? |
|3.0 |Refactoring |2024-3 |Added enum for ISO-639-1 codes. Made syntax forward compatible with records. So, getters like
|3.1 |Refactoring |2024-3 |Support for ISO-639-5. Dropped the -3 from the artifact id.
|===