Language localization #270
Comments
All of these look like good assumptions. There may be some complication in |
Chinese is actually an interesting case, because there are quite a lot of entries in OSM:
|
Here is a list of all OSM |
Update: The assumption that we have a database where the |
The rules proposed here are roughly implemented in this demo: |
This should also be the case in Hong Kong and a few other places due to mapping conventions. In these situations we could ignore If I look at Hong Kong in English -> the 2nd label should be Chinese |
Hong Kong is an interesting example because there are two languages (English, Chinese) and two scripts (Latin, Han). Let us assume for a moment that we have a database where up to local names of a city can be stored separately
The ordering of the names has a meaning, maybe number of people speaking the language or administrative/cultural use. If we had this dataset of listed names, we could do the following rule:
With this rule we would get the following for Hong Kong: English:
Chinese:
Russian:
|
Special cases for country and region (state) labels:
Do you mean to say the country and state labels would not be "stacked" by default, and the value of the "single line" label would follow the fallback chain in rule 1, like (modified):
Or do you mean to say no name would be displayed at all if the localization data is unavailable?
Street labels: For street labels, are you proposing the labels be stacked or delimited (concatenated)?
Multiple alphabet option languages: It might be worth adding more details about the languages that can be represented in multiple alphabets. Does each writing system belong in a different fallback chain?
Chinese: For Chinese, Tilezen v1.9 added basic detection of simplified versus traditional Chinese for each name and localized name key-value pair, and modified the tile output to better annotate that. The raw input data in OSM is sometimes quite messy.
Overall: On the display side, our recommendations and best practices match and are exceeded by what @wipfli is already suggesting here, nice! The demo matches my expectations :) |
Thanks for your questions @nvkelso.
Country labels should appear only in the target language. For example, if the target language is French, then the country labels should be "Allemagne", "Suisse", or "Autriche". So no stacking is needed here. In my experience, country labels have great language coverage, so we should not need a fallback chain at all. My preference would be to define a set of supported languages and then make sure that for each supported language we have a
State labels are currently used too much in the Protomaps basemap, in my opinion. In the US it might make sense to have them on the map because the country is huge and most states are huge too, but in smaller countries the state labels are not needed. I will open a separate issue at some point to propose having state/province labels only for these countries: US, Canada, Mexico, Brazil, China, India, Australia. For now I propose to ignore the problem of state labels and treat them either like city labels or like country labels; we can see which one works better.
Street labels I honestly have not thought much about yet.
Some languages, for example Kazakh or Uzbek, use multiple scripts. However, there is very limited coverage in OSM for the different scripts (only around 1k names), so I don't see the value at the moment of adding special logic for these languages. Japanese might have larger coverage and also uses multiple scripts, but so far my impression is that the sample code works quite well in Japan. I want to reach out to some Japanese friends and ask them for input. |
Thanks for asking! With the current tiles we have information in the
So you see that in the case of Zürich and Athens, where only one script is used in the name tag, we can build everything we want with MapLibre style expressions. However, if the name tag contains more than one script, like for example in Hong Kong, then we are a bit in a tricky situation. One option when we have
Another option when we have
Yet another option would be when we have
Tiles modification
Question to @nvkelso and @bdon: Do you think any of the above 3 options is good enough for now? If yes, I can start implementing the frontend styles. If no, I suggest we do a bit more thinking around how mixed-script I am leaning towards the second, i.e., breaking up |
I did some Java prototyping for splitting the Here is the result (1.6 MB): https://github.com/wipfli/multi-script-names/blob/main/list.txt Overall I am quite happy with this segmentation. The data is We have to deal with some typos coming from confusion between similar-looking letters in Cyrillic, Latin, and Greek. Also, sometimes Latin letters are used for numbering purposes, so there we should not segment. Some numbers:
|
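To make the segmentation idea concrete, here is a minimal Python sketch of splitting a mixed-script name into per-script runs. It is not the Java prototype mentioned above. The names `SCRIPT_PREFIXES`, `script_of`, and `segment_by_script` are hypothetical, and the script detection is a rough heuristic based on Unicode character names rather than the real Unicode Script property. It does not handle the look-alike-letter typos or the "Latin letters used for numbering" case.

```python
import unicodedata

# Hypothetical lookup: Unicode character-name prefix -> script label.
# This is a rough heuristic, not the Unicode Script property.
SCRIPT_PREFIXES = {
    "LATIN": "Latin",
    "GREEK": "Greek",
    "CYRILLIC": "Cyrillic",
    "ARABIC": "Arabic",
    "CJK": "Han",
    "DEVANAGARI": "Devanagari",
}


def script_of(ch):
    """Return a script label for a letter, or None for digits, spaces, marks, punctuation."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    for prefix, script in SCRIPT_PREFIXES.items():
        if name.startswith(prefix):
            return script
    return "Other"


def segment_by_script(name):
    """Split a name into (script, text) runs; neutral characters stay with the current run."""
    segments = []
    current_script = None
    current_text = ""
    for ch in name:
        script = script_of(ch)
        if script is None or current_script is None or script == current_script:
            current_text += ch
            if script is not None:
                current_script = script
        else:
            # Script changed: close the current run, dropping separator characters.
            segments.append((current_script, current_text.strip(" /")))
            current_script, current_text = script, ch
    if current_text.strip(" /"):
        segments.append((current_script, current_text.strip(" /")))
    return segments
```

For example, `segment_by_script("香港 Hong Kong")` yields one Han run and one Latin run, while a single-script name such as "Zürich" stays in one piece.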
Regarding the entries that use 3 scripts, we have
Now, do we want to support the languages that use these scripts? If we don't, then we can get away with 2 segments for the name tag; otherwise we will need 3. |
I made some tiles with segmented name tags. Here is a demo using MapLibre GL JS v4.5.0 with a style localized to Arabic:
Morocco
https://pub-cf7f11e26ace447db8f7215b61ac0eae.r2.dev/segment.html#map=8.87/33.7469/-7.1911
Note how Arabic is in the top line because it is localized to Arabic. In OSM, I think Arabic is mostly the last entry in Morocco.
Hong Kong
https://pub-cf7f11e26ace447db8f7215b61ac0eae.r2.dev/segment.html#map=9.22/22.3113/114.2289
Athens
https://pub-cf7f11e26ace447db8f7215b61ac0eae.r2.dev/segment.html#map=10.35/37.9577/23.7035
Note how it falls back to
Cairo
https://pub-cf7f11e26ace447db8f7215b61ac0eae.r2.dev/segment.html#map=10.4/30.0417/31.2211
If the name is Arabic, only show the name. |
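The ordering behavior described for Morocco and Cairo can be sketched in Python as below. This is only an illustration, not the MapLibre style used in the demo. The function name `stacked_label` and the `(script, text)` segment format are assumptions, and the Athens fallback is not modeled.

```python
def stacked_label(segments, target_script):
    """Order name segments so the one in the target script comes first.

    segments: list of (script, text) tuples in OSM order (assumed format).
    Returns the label lines from top to bottom.
    """
    matching = [text for script, text in segments if script == target_script]
    others = [text for script, text in segments if script != target_script]
    # If the whole name is already in the target script, this is just the name.
    return matching + others
```

With a style localized to Arabic, `stacked_label([("Latin", "Rabat"), ("Arabic", "الرباط")], "Arabic")` puts the Arabic segment on the top line even though it comes last in the tag, and a name that is entirely Arabic is returned as a single line.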
Currently, the basemap does not have any language localization capabilities. Country, state, and place names are taken from the OSM name tag and contain information in the local language or languages. For example, the country label of Germany is "Deutschland" whereas the country label for Italy is "Italia". In this issue I would like to propose a scheme for displaying names localized to a specific user language, which should make the basemap more accessible to a wider audience.
Assumptions
Let us make the following assumptions:
name contains the local name(s) using a single script.
name:<language-code> contains the name in a specific language.
For each <language-code> we know the script(s).
<language-code>s.
Definitions
Proposed Supported Languages
Below is a proposed list of roughly 80 supported languages. The languages are grouped by script and some languages may use more than one script. Note that for some scripts such as Telugu or Khmer we need to create a positioned glyph font first.
The structure is:
Language: <language-code>, number of nodes/ways/relations in name:<language-code> in OSM's taginfo
Latin
Arabic
Cyrillic
Han
Devanagari
One Language Per Script
Proposed Rules
name tag only if the script of the target language and the script of the name tag are the same.
name tag. In this case only show the label in the target language in a single line label.
name tag. In this case show two lines. First the target language, second the name.
Examples
Localized to English
Country example 1:
City example 1:
Country example 2:
City example 2:
Localized to Greek
Country example 1:
City example 1:
Country example 2:
City example 2:
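As a rough illustration, one possible reading of the proposed rules for city labels is sketched in Python below. The function name `localized_label` and its parameters are hypothetical, the scripts are passed in rather than detected, and returning no label when there is no localized name and the scripts differ is only a placeholder.

```python
def localized_label(tags, target_lang, target_script, name_script):
    """Return label lines (top to bottom) under one reading of the proposed rules.

    tags: OSM-style dict such as {"name": "...", "name:en": "..."}.
    target_script / name_script: script labels supplied by the caller (assumption).
    """
    name = tags.get("name")
    localized = tags.get("name:" + target_lang)
    if localized is None:
        # Fall back to the name tag only if both scripts are the same.
        if name is not None and name_script == target_script:
            return [name]
        return []  # placeholder: what to show here is not settled
    if name is None or name_script == target_script:
        # Same script: single-line label in the target language.
        return [localized]
    # Different script: two lines, target language first, then the name.
    return [localized, name]
```

Localized to English, a city whose name tag is in Latin script gets a single line from name:en, while a city whose name tag is in Greek script gets two lines, for example "Athens" above "Αθήνα".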