.. role:: js(code)
   :language: javascript

=================
Locale management
=================

A locale is a combination of language, region, script, and regional preferences the
user wants to format their data into.

There are multiple models of locale data structures in the industry that have varying degrees
of compatibility with each other. Historically, each major platform has used its own,
and many standards bodies have provided conflicting proposals.

Mozilla, along with most modern platforms, follows Unicode and W3C recommendations
and conforms to a standard known as `BCP 47`_, which describes a low-level textual
representation of a locale known as a `language tag`.

A few examples of language tags: *en-US*, *de*, *ar*, *zh-Hans*, *es-CL*.

Locales and Language Tags
=========================

A locale data structure consists of four primary fields.

- Language (Example: English - *en*, French - *fr*, Serbian - *sr*)
- Script (Example: Latin - *Latn*, Cyrillic - *Cyrl*)
- Region (Example: United States - *US*, Canada - *CA*, Russia - *RU*)
- Variants (Example: Mac OS - *macos*, Windows - *windows*, Linux - *linux*)

`BCP 47`_ specifies the syntax for each of those fields (called subtags) when
represented as a string. The syntax defines the allowed characters,
their capitalization, and the order in which the fields should be defined.

Most of the base subtags are valid ISO codes, such as `ISO 639`_ for the
language subtag, or `ISO 3166-1`_ for the region subtag.

The examples above present language tags with several fields omitted, which is allowed
by the standard.

On top of that, a locale may contain:

- extensions and private fields

  These fields can be used to carry additional information about a locale.
  Mozilla currently has partial support for them in the JS implementation and plans to
  extend support to all APIs.

- extkeys and grandfathered tags

  Mozilla does not support these yet.

An example locale can be visualized as:

.. code-block:: javascript

   {
     "language": "sr",
     "script": "Cyrl",
     "region": "RU",
     "variants": [],
     "extensions": {},
     "privateuse": [],
   }

which can then be serialized into a string: **"sr-Cyrl-RU"**.

.. important::

   Since locales are often stored and passed around the codebase as
   language tag strings, it is important to always use an appropriate
   API to parse, manipulate and serialize them.
   Avoid `Do-It-Yourself` solutions which leave your code fragile and may
   break on unexpected language tag structures.
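
For JavaScript code, the standard ECMAScript :js:`Intl.Locale` constructor is one such API. The following is a minimal sketch (not Gecko's own parser) that parses the example tag into its subtags, builds the same tag from parts, and shows that malformed input is rejected; :js:`isWellFormed` is a hypothetical helper written only for this illustration:

.. code-block:: javascript

   // Parse a language tag with the standard Intl.Locale API instead of
   // splitting the string by hand.
   const locale = new Intl.Locale("sr-Cyrl-RU");
   console.log(locale.language, locale.script, locale.region); // sr Cyrl RU

   // Build (manipulate) a tag from its subtags and serialize it back.
   const built = new Intl.Locale("sr", { script: "Cyrl", region: "RU" });
   console.log(built.toString()); // "sr-Cyrl-RU"

   // Serialization canonicalizes the capitalization of each subtag.
   console.log(new Intl.Locale("SR-cyrl-ru").toString()); // "sr-Cyrl-RU"

   // Hypothetical helper: malformed input is rejected with a RangeError
   // instead of silently producing a wrong locale.
   function isWellFormed(tag) {
     try {
       new Intl.Locale(tag);
       return true;
     } catch (e) {
       if (e instanceof RangeError) {
         return false;
       }
       throw e;
     }
   }
   console.log(isWellFormed("en_US")); // false

Note that the underscore form common in POSIX environments is not a valid language tag, which is exactly the kind of input a hand-written parser tends to mishandle.
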

Locale Fallback Chains
======================

Locale-sensitive operations are always considered "best-effort". That means it
cannot be assumed that a perfect match will exist between what the user requested and what
the API can provide.

As a result, the best practice is to *always* operate on locale fallback chains:
ordered lists of locales according to the user preference.

An example of a locale fallback chain may be: :js:`["es-CL", "es-ES", "es", "fr", "en"]`.

The above means a request to format the data according to Chilean Spanish if possible,
then fall back to Spanish as used in Spain, then any (generic) Spanish, then French, and
eventually English.

.. important::

   It is *always* better to use a locale fallback chain over a single locale.
   In case there's only one locale available, a list with one element will work
   while allowing for future extensions without a costly refactor.
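
The standard ECMAScript :js:`Intl` formatters follow the same model: they take a fallback chain, not a single locale. A minimal sketch using only standard APIs (not Gecko's :js:`LocaleService`), passing the chain from the example above to :js:`Intl.NumberFormat` and reading back which entry was actually used:

.. code-block:: javascript

   // A locale fallback chain, ordered by user preference.
   const chain = ["es-CL", "es-ES", "es", "fr", "en"];

   // The whole chain is passed in; the formatter picks the first entry it
   // has data for and falls back along the chain otherwise.
   const formatter = new Intl.NumberFormat(chain);
   const resolved = formatter.resolvedOptions().locale;

   console.log(resolved); // "es-CL" when Chilean Spanish data is available
   console.log(formatter.format(1234.5));

The resolved locale depends on the ICU data shipped with the JavaScript engine; the calling code stays the same either way, which is the point of passing a chain.
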

Language Negotiation
====================

Due to the imperfections in data matching, all operations on locales should always
use a language negotiation algorithm to resolve the best available set of locales,
based on the list of all available locales and an ordered list of requested locales.

Such algorithms may vary in sophistication and number of strategies. Mozilla's
solution is based on modified logic from `RFC 4647 <https://tools.ietf.org/html/rfc4647>`_.

The three lists of locales used in negotiation:

- **Available** - locales that are locally installed
- **Requested** - locales that the user selected in decreasing order of preference
- **Resolved** - result of the negotiation

The result of a negotiation is an ordered list of locales that are available to
the system, and the consumer is expected to attempt using the locales in the
resolved order.

Negotiation should be used in all such scenarios, like selecting language resources,
calendars, number formatting, etc.

Single Locale Matching
----------------------

Every negotiation strategy goes through a list of steps in an attempt to find the
best possible match between locales.

The exact algorithm is custom, and consists of a 6-level strategy:

::

  1) Attempt to find an exact match for each requested locale in available
     locales.
     Example: ['en-US'] * ['en-US'] = ['en-US']

  2) Attempt to match a requested locale to an available locale treated
     as a locale range.
     Example: ['en-US'] * ['en'] = ['en']
                               ^^
                               |-- becomes 'en-*-*-*'

  3) Attempt to use the maximized version of the requested locale, to
     find the best match in available locales.
     Example: ['en'] * ['en-GB', 'en-US'] = ['en-US']
                ^^
                |-- ICU likelySubtags expands it to 'en-Latn-US'

  4) Attempt to look for a different variant of the same locale.
     Example: ['ja-JP-win'] * ['ja-JP-mac'] = ['ja-JP-mac']
                ^^^^^^^^^
                |----------- replace variant with range: 'ja-JP-*'

  5) Attempt to look for a maximized version of the requested locale,
     stripped of the region code.
     Example: ['en-CA'] * ['en-ZA', 'en-US'] = ['en-US', 'en-ZA']
                ^^^^^
                |----------- look for likelySubtag of 'en': 'en-Latn-US'

  6) Attempt to look for a different region of the same locale.
     Example: ['en-GB'] * ['en-AU'] = ['en-AU']
                ^^^^^
                |----- replace region with range: 'en-*'
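
Steps 3 and 5 depend on ICU's likelySubtags data. The same data is exposed to JavaScript by the standard :js:`Intl.Locale.prototype.maximize()` method, so the expansions used in those examples can be reproduced with a short sketch (standard ECMAScript APIs only, not Gecko's internal matcher):

.. code-block:: javascript

   // Step 3: maximize the requested locale using likely subtags.
   const requested = new Intl.Locale("en");
   console.log(requested.maximize().toString()); // "en-Latn-US"

   // Step 5: drop the region of the requested locale, then maximize the
   // remaining language to find its most likely script and region.
   const requestedWithRegion = new Intl.Locale("en-CA");
   const withoutRegion = new Intl.Locale(requestedWithRegion.language);
   console.log(withoutRegion.maximize().toString()); // "en-Latn-US"

   // minimize() is the inverse: it removes subtags that likelySubtags
   // would add back.
   console.log(new Intl.Locale("en-Latn-US").minimize().toString()); // "en"
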

Filtering / Matching / Lookup
-----------------------------

When negotiating between lists of locales, Mozilla's :js:`LocaleService` API
offers three language negotiation strategies:

Filtering
^^^^^^^^^

This is the most common scenario, where there is an advantage in creating a
maximal possible list of locales that the user may benefit from.

An example of a scenario:

.. code-block:: javascript

   let requested = ["fr-CA", "en-US"];
   let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

   let result = Services.locale.negotiateLanguages(requested, available);

   // result is ["fr-CA", "fr", "fr-CH", "en-GB", "en-ZA"]

In the example above the algorithm was able to match *"fr-CA"* as a perfect match,
but then was able to find other matches as well: generic French is a very
good match, and Swiss French is also very close to the top requested language.

For the second requested locale, American English is unfortunately
not available, but British English and South African English are.

The algorithm is greedy and attempts to match as many locales
as possible. This is usually what the developer wants.

Matching
^^^^^^^^

In less common scenarios the code needs to match a single, best available locale for
each of the requested locales.

An example of this scenario:

.. code-block:: javascript

   let requested = ["fr-CA", "en-US"];
   let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

   let result = Services.locale.negotiateLanguages(
     requested,
     available,
     undefined,
     Services.locale.langNegStrategyMatching);

   // result is ["fr-CA", "en-GB"]

The best available locale for *"fr-CA"* is a perfect match, and for *"en-US"* the
algorithm selected British English.

Lookup
^^^^^^

The third strategy should be used in cases where, no matter what, only one locale
can ever be used. Some third-party APIs don't support fallback, and it doesn't make
sense to continue resolving after finding the first locale.

It is still advised to keep using this API as a fallback chain list, just in
this case with a single element.

.. code-block:: javascript

   let requested = ["fr-CA", "en-US"];
   let available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

   let result = Services.locale.negotiateLanguages(
     requested,
     available,
     Services.locale.defaultLocale,
     Services.locale.langNegStrategyLookup);

   // result is ["fr-CA"]
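
To make the difference between the three strategies concrete, here is a simplified, hypothetical re-implementation. It is *not* Gecko's algorithm: it only implements steps 1, 2, 3 and 6 of the single locale matching described earlier, ignores variants and extensions, and :js:`negotiate`, :js:`rangeMatches` and :js:`fields` are names made up for this sketch. It also assumes that the default locale is appended when it is not already in the result (for lookup, only when nothing matched).

.. code-block:: javascript

   // Simplified, hypothetical sketch of the three negotiation strategies.
   // Only steps 1, 2, 3 and 6 of single locale matching are implemented;
   // variants and extensions are ignored. This is NOT Gecko's code.

   // Extract the subtags used for matching.
   function fields(tag) {
     const { language, script, region } = new Intl.Locale(tag);
     return { language, script, region };
   }

   // Subtags that are undefined in `range` act as wildcards ("*").
   function rangeMatches(range, tag) {
     return ["language", "script", "region"].every(
       (key) => range[key] === undefined || range[key] === tag[key]
     );
   }

   // Matching steps, tried in order for every requested locale.
   const steps = [
     // 1) Exact match.
     (req, avail) => new Intl.Locale(req).toString() === new Intl.Locale(avail).toString(),
     // 2) Available locale treated as a range: 'en' matches 'en-US'.
     (req, avail) => rangeMatches(fields(avail), fields(req)),
     // 3) Maximized requested locale: 'en' -> 'en-Latn-US' matches 'en-US'.
     (req, avail) => rangeMatches(fields(avail), fields(new Intl.Locale(req).maximize().toString())),
     // 6) Requested region replaced with a range: 'en-GB' -> 'en-*'.
     (req, avail) => rangeMatches({ ...fields(req), region: undefined }, fields(avail)),
   ];

   function negotiate(requested, available, defaultLocale, strategy = "filtering") {
     const remaining = [...available];
     const result = [];

     nextRequested: for (const req of requested) {
       for (const step of steps) {
         const found = remaining.filter((avail) => step(req, avail));
         if (found.length === 0) {
           continue;
         }
         // Filtering keeps every match; matching and lookup keep only the first.
         const taken = strategy === "filtering" ? found : [found[0]];
         for (const locale of taken) {
           result.push(locale);
           remaining.splice(remaining.indexOf(locale), 1);
         }
         if (strategy === "lookup") {
           break nextRequested; // a single locale is enough
         }
         if (strategy === "matching") {
           continue nextRequested; // one locale per requested locale
         }
       }
     }

     // Assumption: the default locale closes the list (lookup: only if empty).
     if (defaultLocale !== undefined) {
       const needsDefault =
         strategy === "lookup" ? result.length === 0 : !result.includes(defaultLocale);
       if (needsDefault) {
         result.push(defaultLocale);
       }
     }
     return result;
   }

   const requested = ["fr-CA", "en-US"];
   const available = ["en-GB", "it", "en-ZA", "fr", "de-DE", "fr-CA", "fr-CH"];

   console.log(negotiate(requested, available));
   // [ 'fr-CA', 'fr', 'fr-CH', 'en-GB', 'en-ZA' ]
   console.log(negotiate(requested, available, undefined, "matching"));
   // [ 'fr-CA', 'en-GB' ]
   console.log(negotiate(requested, available, "en-US", "lookup"));
   // [ 'fr-CA' ]

With the inputs of the examples above, this sketch produces the same results as shown for :js:`Services.locale.negotiateLanguages`, but it will diverge on inputs that exercise the skipped steps.
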

Default Locale
--------------

Besides the *Available*, *Requested* and *Resolved* locale lists, there's also a concept
of *DefaultLocale*, which is a single locale out of the list of available ones that
should be used in case no match can be found between available and
requested locales.

Every Firefox build has a single default locale. For example,
**Firefox zh-CN** has *DefaultLocale* set to *zh-CN*, since this locale is guaranteed
to be packaged in and to have all the resources. It is the locale to use if the
negotiation fails to return any matches.

.. code-block:: javascript

   let requested = ["fr-CA", "en-US"];
   let available = ["it", "de", "zh-CN", "pl", "sr-RU"];
   let defaultLocale = "zh-CN";

   let result = Services.locale.negotiateLanguages(requested, available, defaultLocale);

   // result is ["zh-CN"]

Chained Language Negotiation
----------------------------

In some cases it is desirable to link a language selection to another component.

For example, a Firefox extension may come with its own list of available locales, which
may include locales that Firefox doesn't have.

In that case, negotiation between the user's requested locales and the add-on's list may result
in a selection of locales superseding that of Firefox itself.

.. code-block:: none

        Fx Available
      +-------------+
      | it, fr, ar  |
      +-------------+                Fx Locales
             |                       +--------+
             +---------------------> | fr, ar |
             |                       +--------+
             |  Requested
      +----------------+
      | es, fr, pl, ar |
      +----------------+             Add-on Locales
             |                       +------------+
             +---------------------> | es, fr, ar |
             |                       +------------+
             |  Add-on Available
      +-----------------+
      | de, es, fr, ar  |
      +-----------------+

In that case, an add-on may end up being displayed in Spanish, while the Firefox UI will
use French. In most cases this results in a bad UX.

In order to avoid that, one can chain the add-on negotiation: take Firefox's resolved
locales as the `requested` list, and negotiate that against the add-on's `available` list.

.. code-block:: none

        Fx Available
      +-------------+
      | it, ar, fr  |
      +-------------+                Fx Locales (as Add-on Requested)
             |                       +--------+
             +---------------------> | fr, ar |
             |                       +--------+
             |  Requested                 |             Add-on Locales
      +----------------+                  |             +--------+
      | es, fr, pl, ar |                  +-----------> | fr, ar |
      +----------------+                  |             +--------+
                                          |
                         Add-on Available |
                                  +-----------------+
                                  | de, es, ar, fr  |
                                  +-----------------+
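
The difference between the two diagrams can be reproduced in a few lines. In this sketch, a hypothetical :js:`negotiate` helper that only keeps exact matches stands in for real language negotiation, which is enough for the language-only tags used in the diagrams:

.. code-block:: javascript

   // Hypothetical stand-in for language negotiation: keep the requested
   // locales that are available, in order of preference. Real negotiation
   // is fuzzier, but exact tags are enough for this example.
   function negotiate(requested, available) {
     return requested.filter((locale) => available.includes(locale));
   }

   const requested = ["es", "fr", "pl", "ar"];
   const firefoxAvailable = ["it", "fr", "ar"];
   const addonAvailable = ["de", "es", "fr", "ar"];

   const firefoxLocales = negotiate(requested, firefoxAvailable); // ["fr", "ar"]

   // Unchained: the add-on negotiates against the user's request directly
   // and may pick Spanish, which Firefox itself does not use.
   const unchained = negotiate(requested, addonAvailable); // ["es", "fr", "ar"]

   // Chained: Firefox's resolved locales become the add-on's requested list.
   const chained = negotiate(firefoxLocales, addonAvailable); // ["fr", "ar"]
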

Available Locales
=================

In Gecko, available locales come from the `Packaged Locales` and the installed
`language packs`. Language packs are a variant of web extensions providing just
localized resources for one or more languages.

The primary notion of which locales are available is based on which locales Gecko has
UI localization resources for. Other datasets, such as internationalization data, may
carry different lists of available locales.

Requested Locales
=================

The list of requested locales can be read and set using the :js:`LocaleService::requestedLocales` API.

Using the API will perform the necessary sanity checks and canonicalize the values.

After the sanitization, the value will be stored in the pref :js:`intl.locale.requested`.
The pref usually stores a comma-separated list of valid BCP 47 locale
codes, but it can also have two special meanings:

- If the pref is not set at all, Gecko will use the default locale as the requested one.
- If the pref is set to an empty string, Gecko will use the OS app locales as the requested ones.

The former is the current default setting for Firefox Desktop, and the latter is the
default setting for Firefox for Android.

If the developer wants to programmatically request the app to follow OS locales,
they can assign :js:`null` to :js:`requestedLocales`.
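
The pref semantics above can be summarized as a small sketch. The function and its parameters are hypothetical (Gecko implements this logic in :js:`LocaleService`, not in JavaScript), and the standard :js:`Intl.getCanonicalLocales` stands in for Gecko's own sanity checks and canonicalization:

.. code-block:: javascript

   // Hypothetical sketch of how the value of `intl.locale.requested`
   // maps to a list of requested locales.
   //   prefValue     - the pref value, or undefined if the pref is not set
   //   defaultLocale - the locale the app was built with
   //   osLocales     - the OS app locales
   function requestedLocalesFromPref(prefValue, defaultLocale, osLocales) {
     if (prefValue === undefined) {
       // Pref not set at all: use the default locale.
       return [defaultLocale];
     }
     if (prefValue === "") {
       // Empty string: follow the OS app locales.
       return [...osLocales];
     }
     // Otherwise: a comma-separated list of BCP 47 tags. getCanonicalLocales
     // fixes capitalization, removes duplicates, and throws a RangeError
     // on malformed tags.
     const tags = prefValue.split(",").map((tag) => tag.trim()).filter(Boolean);
     return Intl.getCanonicalLocales(tags);
   }

   console.log(requestedLocalesFromPref(undefined, "en-US", ["de-AT"])); // [ 'en-US' ]
   console.log(requestedLocalesFromPref("", "en-US", ["de-AT"]));        // [ 'de-AT' ]
   console.log(requestedLocalesFromPref("fr-ca, EN-us", "en-US", []));   // [ 'fr-CA', 'en-US' ]
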

Regional Preferences
====================

Every locale comes with a set of default preferences that are specific to a culture
and region. These include preferences such as the calendar system, the way to display
time (24h vs 12h clock), which day the week starts on, which days constitute a weekend,
and what numbering system and date time formatting a given locale uses
(for example "MM/DD" in en-US vs "DD/MM" in en-AU).

For all such preferences Gecko has a list of default settings for every region,
but there's also a degree of customization every user may want to make.

All major operating systems have a Settings UI for selecting those preferences,
and since Firefox does not provide its own, Gecko looks into the OS for them.

A special API :js:`mozilla::intl::OSPreferences` handles communication with the
host operating system, retrieving regional preferences and altering
internationalization formatting with user preferences.

One thing to notice is that the boundary between regional preferences and language
selection is not strong. In many cases the internationalization formats
will contain language-specific terms and literals. For example, a date formatting
pattern in Japanese may look like this: *"2018年3月24日"*, or the date format
may contain names of months or weekdays to be translated
("April", "Tuesday" etc.).

For that reason it is tricky to follow regional preferences in a scenario where the
operating system locale selection does not match the Firefox UI locales.

Following them anyway might lead to UI like "Today is 24 października" in an English Firefox
with Polish date formats.

For that reason, by default, Gecko will *only* look into OS Preferences if the *language*
portion of the OS locale and the Firefox locale match.
That means that if Windows is in "**en**-AU" and Firefox is in "**en**-US", Gecko will look
into Windows Regional Preferences, but if Windows is in "**de**-CH" and Firefox
is in "**fr**-FR", it won't.
In order to force Gecko to look into OS preferences regardless of the language match,
set the pref :js:`intl.regional_prefs.use_os_locales` to :js:`true`.
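
The language check described above boils down to comparing language subtags. A minimal sketch, where :js:`shouldUseOSRegionalPrefs` is a hypothetical function (not a Gecko API) and its third parameter stands for the value of :js:`intl.regional_prefs.use_os_locales`:

.. code-block:: javascript

   // Hypothetical sketch of the rule described above: only use OS regional
   // preferences when the OS and app locales share a language, unless the
   // `intl.regional_prefs.use_os_locales` pref (third argument) forces it.
   function shouldUseOSRegionalPrefs(osLocale, appLocale, useOSLocalesPref = false) {
     if (useOSLocalesPref) {
       return true;
     }
     return new Intl.Locale(osLocale).language === new Intl.Locale(appLocale).language;
   }

   console.log(shouldUseOSRegionalPrefs("en-AU", "en-US"));       // true
   console.log(shouldUseOSRegionalPrefs("de-CH", "fr-FR"));       // false
   console.log(shouldUseOSRegionalPrefs("de-CH", "fr-FR", true)); // true
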

UI Direction
------------

Since the UI direction is so tightly coupled with the locale selection, the
main method of checking the directionality of the Gecko app lives in LocaleService.

:js:`LocaleService::IsAppLocaleRTL` returns a boolean indicating if the current
direction of the app UI is right-to-left.
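
Gecko answers this question natively. Purely as an illustration of the idea, the direction of a locale can be derived from its likely script. This sketch maximizes the tag and checks the script against a small, deliberately incomplete, hypothetical list of right-to-left scripts; it should not be taken as a description of how :js:`IsAppLocaleRTL` is implemented:

.. code-block:: javascript

   // Hypothetical, incomplete list of right-to-left scripts (ISO 15924 codes).
   const RTL_SCRIPTS = new Set(["Arab", "Hebr", "Syrc", "Thaa", "Nkoo", "Adlm"]);

   // Derive the direction from the (likely) script of the locale.
   function isLocaleRTL(tag) {
     const { script } = new Intl.Locale(tag).maximize();
     return RTL_SCRIPTS.has(script);
   }

   console.log(isLocaleRTL("ar"));    // true  (ar -> ar-Arab-EG)
   console.log(isLocaleRTL("he-IL")); // true  (he-IL -> he-Hebr-IL)
   console.log(isLocaleRTL("en-US")); // false (en-US -> en-Latn-US)
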

Default and Last Fallback Locales
=================================

Every Gecko application is built with a single locale as the default one. This locale
is guaranteed to have all linguistic resources available. It should be used
as the default locale in case language negotiation cannot find any match, and also
as the last locale to look for in a fallback chain.

If all else fails, Gecko also supports a notion of a last fallback locale, which is
currently hardcoded to *"en-US"*, and is the very final locale to try in case
nothing else (including the default locale) works.
Notice that Unicode and ICU use *"en-GB"* in that role because more English-speaking
people around the world recognize British regional preferences than American ones
(metric vs. imperial, Celsius vs. Fahrenheit, etc.).
Mozilla may switch to *"en-GB"* in the future.
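
How these two locales close a fallback chain can be sketched as follows. :js:`withFallbacks` is a hypothetical helper that only illustrates the ordering described above; it is not a description of how Gecko assembles its chains internally:

.. code-block:: javascript

   // Hypothetical helper: append the default locale and the last fallback
   // locale to a resolved fallback chain, skipping duplicates.
   const LAST_FALLBACK_LOCALE = "en-US";

   function withFallbacks(resolvedLocales, defaultLocale) {
     const chain = [...resolvedLocales];
     for (const locale of [defaultLocale, LAST_FALLBACK_LOCALE]) {
       if (!chain.includes(locale)) {
         chain.push(locale);
       }
     }
     return chain;
   }

   console.log(withFallbacks(["fr-CA", "fr"], "zh-CN")); // [ 'fr-CA', 'fr', 'zh-CN', 'en-US' ]
   console.log(withFallbacks(["de"], "en-US"));          // [ 'de', 'en-US' ]
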

Packaged Locales
================

When a Gecko application is being packaged, it bundles a selection of locale resources
to be available within it. At the moment, for example, most Firefox for Android
builds come with almost 100 locales packaged into them, while Desktop Firefox usually
comes with just one packaged locale.

There is currently work being done on enabling more flexibility in how
the locales are packaged, to allow for bundling applications with different
sets of locales in different areas: dictionaries, hyphenations, product language resources,
installer language resources, etc.

Web Exposed Locales
===================

For anti-tracking or other reasons, Gecko can expose a spoofed locale to web content instead
of the default locales. This can be done by setting the pref :js:`intl.locale.privacy.web_exposed`.
The pref is a comma-separated list of locales, and an empty string implies the default locales.

The pref has no effect while :js:`privacy.spoof_english` is set to 2; in that case *"en-US"* is
always returned.
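
The precedence between the two prefs can be summarized as a sketch. :js:`webExposedLocales` and its parameters are hypothetical stand-ins for the pref values; Gecko's actual implementation may differ in details such as validation of the pref value:

.. code-block:: javascript

   // Hypothetical sketch of the precedence between the two prefs.
   //   webExposedPref - value of intl.locale.privacy.web_exposed
   //   spoofEnglish   - value of privacy.spoof_english
   //   defaultLocales - locales exposed when nothing is spoofed
   function webExposedLocales(webExposedPref, spoofEnglish, defaultLocales) {
     if (spoofEnglish === 2) {
       // The web_exposed pref is ignored in this mode.
       return ["en-US"];
     }
     const spoofed = webExposedPref.split(",").map((tag) => tag.trim()).filter(Boolean);
     // An empty pref means: expose the default locales.
     return spoofed.length > 0 ? spoofed : [...defaultLocales];
   }

   console.log(webExposedLocales("", 0, ["de-DE", "de"]));    // [ 'de-DE', 'de' ]
   console.log(webExposedLocales("en-GB, en", 0, ["de-DE"])); // [ 'en-GB', 'en' ]
   console.log(webExposedLocales("en-GB", 2, ["de-DE"]));     // [ 'en-US' ]
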

Multi-Process
=============

Locale management can operate in a client/server model. This allows a Gecko process
to manage locales (server mode) or just receive the locale selection from a parent
process (client mode).

The client mode is currently used by all child processes of Desktop Firefox, and
may be used by, for example, GeckoView to follow locale selection from a parent
process.

To check the mode the process is operating in, the :js:`LocaleService::IsServer` method is available.

Mozilla Exceptions
==================

There's currently only a single exception to BCP 47 in use, and that's
the legacy "ja-JP-mac" locale. The "mac" subtag is a variant, and BCP 47 requires all variants
to be 5-8 characters long (or 4, if the first character is a digit).

Gecko works around this limitation by accepting the 3-letter variant in its APIs, and also
provides a special :js:`appLocalesAsLangTags` method which returns this locale in that form
(:js:`appLocalesAsBCP47` will canonicalize it and turn it into `"ja-JP-macos"`).

Usage of language negotiation etc. shouldn't rely on this behavior.
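
Standard BCP 47 tooling rejects the legacy form, which is one reason not to rely on it. The sketch below shows the standard :js:`Intl.Locale` constructor throwing on *"ja-JP-mac"*, and a hypothetical :js:`legacyToBCP47` helper that maps it to *"ja-JP-macos"* as described above (Gecko performs this conversion natively):

.. code-block:: javascript

   // The 3-letter "mac" variant is not valid BCP 47, so standard APIs reject it.
   let rejected = false;
   try {
     new Intl.Locale("ja-JP-mac");
   } catch (e) {
     rejected = e instanceof RangeError;
   }
   console.log(rejected); // true

   // Hypothetical helper mirroring the canonicalization described above.
   function legacyToBCP47(tag) {
     return tag === "ja-JP-mac" ? "ja-JP-macos" : tag;
   }

   console.log(new Intl.Locale(legacyToBCP47("ja-JP-mac")).toString()); // "ja-JP-macos"
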

Events
======

:js:`LocaleService` emits two events: :js:`intl:app-locales-changed` and
:js:`intl:requested-locales-changed`, which all code can listen to.

Those events may be broadcast in response to new language packs being installed or
uninstalled, or to the user's selection of languages changing.

In most cases, the code should observe :js:`intl:app-locales-changed`
and react only to that event, since it is the one indicating a change
in the currently used language settings that the components should follow.

Testing
=======

Many components may have logic encoded to react to changes in requested, available
or resolved locales.

In order to test the component's behavior, it is important to replicate
the environment in which such a change may happen.

Since in most cases it is advised for a component to tie its
language negotiation to the main application (see `Chained Language Negotiation`_),
it is not enough to add a new locale to trigger the language change.

First, it is necessary to add a new locale to the available ones, then change
the requested ones; only then will a new negotiation and language
change happen.

There are two primary ways to add a locale to the available ones.

Testing Localization
--------------------

If the goal is to test that the correct localization ends up in the correct place,
the developer needs to register a new :js:`FileSource` in :js:`L10nRegistry` and
provide mock cached data to be returned by the API.

It may look like this:

.. code-block:: javascript

   let fs = new FileSource(["ko-KR", "ar"], "resource://mock-addon/localization/{locale}");

   // Mock the file contents the source would otherwise have to load.
   fs.cache = {
     "resource://mock-addon/localization/ko-KR/test.ftl": "key = Value in Korean",
     "resource://mock-addon/localization/ar/test.ftl": "key = Value in Arabic"
   };

   L10nRegistry.registerSource(fs);

   let availableLocales = Services.locale.availableLocales;

   Assert.ok(availableLocales.includes("ko-KR"));
   Assert.ok(availableLocales.includes("ar"));

   Services.locale.requestedLocales = ["ko-KR"];

   let appLocales = Services.locale.appLocalesAsBCP47;
   Assert.equal(appLocales[0], "ko-KR");

From here, a resource :js:`test.ftl` can be added to a `Localization`, and for the ID :js:`key`
the correct value from the mocked cache will be returned.

Testing Locale Switching
------------------------

The second method is much more limited, as it only mocks the locale availability,
but it is also simpler:

.. code-block:: javascript

   Services.locale.availableLocales = ["ko-KR", "ar"];
   Services.locale.requestedLocales = ["ko-KR"];

   let appLocales = Services.locale.appLocalesAsBCP47;
   Assert.equal(appLocales[0], "ko-KR");

In the future, Mozilla plans to add a third way (`bug 1440969`_), allowing add-ons
to disconnect their locales from the main application's locales for either manual
or automated testing purposes.

Testing the outcome
-------------------

Except when testing the reaction to locale changes, it is advised to avoid writing
tests that expect a certain locale to be selected, or certain internationalization
or localization data to be used.

Doing so locks down the test infrastructure to be usable only when launched in
a single-locale environment, and requires those tests to be updated whenever the underlying
data changes.

In the case of testing locale selection, it is best to use a fake locale like :js:`x-test`, which
will not be present at the beginning of the test.

In the case of testing for internationalization data, it is best to use :js:`resolvedOptions()`
to verify the right data is being used, rather than comparing the output string.

In the case of localization, it is best to test against the correct :js:`data-l10n-id`
being set or, in edge cases, verify that a given variable is present in the string using
:js:`String.prototype.includes`.
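
For the internationalization case, a test can check the options a formatter resolved to instead of its formatted output. A minimal sketch with standard :js:`Intl` APIs; in a real Gecko test the checks would go through the test harness assertions rather than :js:`console.log`:

.. code-block:: javascript

   // Assert on the options the formatter resolved to, not on the formatted
   // string, whose exact characters can change between ICU/CLDR data versions.
   const dtf = new Intl.DateTimeFormat("en-US");
   const resolved = dtf.resolvedOptions();

   console.log(resolved.locale);          // "en-US"
   console.log(resolved.calendar);        // "gregory"
   console.log(resolved.numberingSystem); // "latn"
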

Deep Dive
=========

Below is a list of articles with additional
details on selected subjects:

.. toctree::
   :maxdepth: 1

   locale_env
   locale_startup

Feedback
========

In case of questions, please consult the Intl module peers.

.. _RFC 5656: https://tools.ietf.org/html/rfc5656
.. _BCP 47: https://tools.ietf.org/html/bcp47#section-2.1
.. _ISO 639: http://www.loc.gov/standards/iso639-2/php/code_list.php
.. _ISO 3166-1: https://www.iso.org/iso-3166-country-codes.html
.. _Intl.Locale: https://bugzilla.mozilla.org/show_bug.cgi?id=1433303
.. _fluent-locale: https://docs.rs/fluent-locale/
.. _bug 1440969: https://bugzilla.mozilla.org/show_bug.cgi?id=1440969