When reloading https://en.wikipedia.org/wiki/Barack_Obama, which is used by the
browsertime benchmark, `CountGraphemeClusters` is called around 3000 times.
In about half of those calls, `aText` is empty.
Adding a fast path for empty text therefore avoids many heap allocations
of `ICU4XGraphemeClusterBreakIteratorUtf16`.
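A minimal sketch of such a fast path, not the actual patch. `FakeSegmenter`, `CountClustersSketch` and `gSegmenterAllocations` are hypothetical names; the stand-in segmenter counts UTF-16 code units rather than real grapheme clusters, and the sketch assumes that empty text has zero clusters.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Counts how often the (hypothetical) segmenter is heap-allocated.
int gSegmenterAllocations = 0;

// Hypothetical stand-in for ICU4XGraphemeClusterBreakIteratorUtf16.
// It treats every UTF-16 code unit as one cluster, which is NOT real
// grapheme segmentation; only the allocation behavior matters here.
class FakeSegmenter {
 public:
  explicit FakeSegmenter(const std::u16string& aText) : mText(aText) {
    assert(!aText.empty());  // the fast path filters out empty input
    ++gSegmenterAllocations;
  }
  std::size_t CountClusters() const { return mText.size(); }

 private:
  const std::u16string& mText;
};

std::size_t CountClustersSketch(const std::u16string& aText) {
  // Fast path: nothing to segment, so skip the heap allocation entirely.
  if (aText.empty()) {
    return 0;
  }
  auto segmenter = std::make_unique<FakeSegmenter>(aText);
  return segmenter->CountClusters();
}
```

With this shape, calls with empty text never touch the allocator, and non-empty text takes the same path as before.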
Differential Revision: https://phabricator.services.mozilla.com/D196008
When loading a Wikipedia page in the browsertime benchmark, 0.5%-1% of the calls to
`LineBreaker::ComputeBreakPositions` have an `aLength` of 1. In that case,
ICU4X's line segmenter only sets the SOT (start of text) break.
So we can add a fast path for this situation. `ICU4XLineBreakIterator*`
always allocates on the Rust heap, so this saves a few heap allocations.
Differential Revision: https://phabricator.services.mozilla.com/D195523
This file shows up after running `update-icu4x.sh`. It is part of the downloaded
`icu_capi` crate. We should check it in for completeness even if it is not used.
Differential Revision: https://phabricator.services.mozilla.com/D195591
However, because `icu_capi` uses weak dependency syntax, `cargo vendor` doesn't
recognize it, so this command copies unnecessary crates. To avoid that, I
would like to use a modified version of `icu_capi`.
There is another issue: `icu_capi`'s C++ headers aren't compatible with
clang [*1], so we need a workaround for it.
ICU4X 1.3 also changes how the data provider works with `icu_capi`.
Starting with ICU4X 1.3, there are new `icu_*_data` crates for customizing the
data file, replacing `icu_testdata`. So we have to add each data crate when
using `icu_capi`.
*1 https://github.com/llvm/llvm-project/issues/70162
Differential Revision: https://phabricator.services.mozilla.com/D192902
The matching behavior implemented in bug 1857742 did not quite follow the spec,
particularly with regard to language *ranges* (as used in the :lang() pseudo)
that are not themselves valid language *tags*.
This updates the LangTagCompare function to more correctly follow the RFC 4647 (BCP 47)
"Extended Filtering" algorithm, and adjusts the relevant WPT tests (originally
from bug 1857742) to reflect the corrected behavior.
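For reference, here is a rough sketch of the "Extended Filtering" steps as described in RFC 4647, section 3.3.2. It is not the `LangTagCompare` code; `SplitSubtagsSketch` and `ExtendedFilterMatchesSketch` are hypothetical names, and the sketch assumes ASCII input and does no validation of the tag or range.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits on '-' and lowercases, so later comparisons are case-insensitive.
std::vector<std::string> SplitSubtagsSketch(const std::string& aInput) {
  std::vector<std::string> subtags;
  std::string current;
  for (char c : aInput) {
    if (c == '-') {
      subtags.push_back(current);
      current.clear();
    } else {
      current.push_back(
          static_cast<char>(std::tolower(static_cast<unsigned char>(c))));
    }
  }
  subtags.push_back(current);
  return subtags;
}

bool ExtendedFilterMatchesSketch(const std::string& aRange,
                                 const std::string& aTag) {
  const std::vector<std::string> range = SplitSubtagsSketch(aRange);
  const std::vector<std::string> tag = SplitSubtagsSketch(aTag);
  assert(!range.empty() && !tag.empty());  // the split yields >= 1 subtag

  // The first subtags must match, unless the range starts with a wildcard.
  if (range[0] != "*" && range[0] != tag[0]) {
    return false;
  }
  std::size_t r = 1;
  std::size_t t = 1;
  while (r < range.size()) {
    if (range[r] == "*") {
      ++r;  // a wildcard matches zero or more subtags
    } else if (t >= tag.size()) {
      return false;  // the tag ran out before the range did
    } else if (range[r] == tag[t]) {
      ++r;
      ++t;
    } else if (tag[t].size() == 1) {
      return false;  // never skip past a singleton such as "x"
    } else {
      ++t;  // skip a non-matching subtag in the tag
    }
  }
  return true;
}
```

Note that a range such as `de-*-DE` is not itself a valid language tag, which is the case the commit message calls out.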
Differential Revision: https://phabricator.services.mozilla.com/D194054
The currency display name for multiple locales is now only defined in the "root" locale,
which triggers the `U_USING_DEFAULT_WARNING` case, so we have to change how we detect
that no localisation was found.
Depends on D192734
Differential Revision: https://phabricator.services.mozilla.com/D192735
Use `CLASS_CHARACTER` because that matches the previous character class for most
characters which are now part of the new character classes.
Depends on D192733
Differential Revision: https://phabricator.services.mozilla.com/D192734
Add various workarounds for ICU-20548 to improve the output of the "timeZoneName"
option when formatting date-time ranges.
Changes to "intl/icu/source/i18n/dtitvinf.cpp":
The existing ICU code was already changing `LOW_Z` to `LOW_V` when searching
for the best-fit pattern, but it was missing support for `CAP_O`. (The other
time zone name skeleton characters aren't currently supported in ECMA-402, so
I didn't handle them here.)
Changes to "intl/icu/source/i18n/dtitvfmt.cpp":
In `DateIntervalFormat::getDateTimeSkeleton()`, handle `CAP_O` similarly to the
existing code for `LOW_Z` and `LOW_V`. Also keep the original number of
skeleton characters in `normalizedTimeSkeleton`.
In `DateIntervalFormat::adjustFieldWidth()`, copy the field width information
for `LOW_V` into `LOW_Z` or `CAP_O`, respectively, because `LOW_V` is replaced
in the resolved pattern with `LOW_Z` or `CAP_O`.
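The substitution idea can be illustrated with a small sketch. This is not the ICU code: `NormalizeForBestFitSketch` and `RestoreTimeZoneSymbolSketch` are hypothetical helpers, the constants are assumed to stand for the pattern characters 'z', 'v' and 'O', and quoted literals in patterns are ignored for brevity.

```cpp
#include <cassert>
#include <string>

// Assumed meanings of the constants named in the commit message.
constexpr char16_t LOW_Z = u'z';
constexpr char16_t LOW_V = u'v';
constexpr char16_t CAP_O = u'O';

// For the best-fit search, treat 'z' and 'O' like 'v'. The number of
// skeleton characters is preserved because characters are swapped in place.
std::u16string NormalizeForBestFitSketch(const std::u16string& aSkeleton) {
  std::u16string result = aSkeleton;
  for (char16_t& ch : result) {
    if (ch == LOW_Z || ch == CAP_O) {
      ch = LOW_V;
    }
  }
  return result;
}

// After a pattern was found, put the originally requested symbol back.
// NOTE: a real implementation must skip quoted literal text in the pattern.
std::u16string RestoreTimeZoneSymbolSketch(const std::u16string& aPattern,
                                           char16_t aOriginal) {
  assert(aOriginal == LOW_Z || aOriginal == CAP_O);
  std::u16string result = aPattern;
  for (char16_t& ch : result) {
    if (ch == LOW_V) {
      ch = aOriginal;
    }
  }
  return result;
}
```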
Differential Revision: https://phabricator.services.mozilla.com/D189743
The default value for `GetOption` in `InitializeCollator` is changed from `false`
to `undefined` to avoid having to call `intl_isIgnorePunctuation` in the constructor
function.
The corresponding spec PR is <https://github.com/tc39/ecma402/pull/833>, but
our behaviour is already not strictly spec-compliant as described in
<https://github.com/tc39/ecma402/issues/832>, so it seems reasonable to simply
implement the spec PR ahead of time. (We don't want to "fix" our implementation
to strictly follow the spec without the PR, because that could be web-incompatible
or at least disruptive for Thai users, who currently already get the expected
behaviour where punctuation characters are ignored in Thai.)
Differential Revision: https://phabricator.services.mozilla.com/D189545
Add `Collator::GetIgnorePunctuation()` similar to the existing
`Collator::GetCaseFirst()` function.
Drive-by change:
- Replace an unnecessary macro with an inline function.
Differential Revision: https://phabricator.services.mozilla.com/D189543
Map both "r" (related Gregorian year) and "U" (cyclic year name) to a resolved numeric year.
From <https://github.com/tc39/ecma402/issues/816#issuecomment-1667481214>:
> Per 11.2.3 Internal slots, "{relatedYear}" can only appear in a pattern string
> when the format record has a [[year]] field. And if a [[year]] field is present,
> CreateDateTimeFormat will set the Intl.DateTimeFormat's [[Year]] field to the
> format records [[year]] value. Intl.DateTimeFormat.prototype.resolvedOptions
> should then return the value of the [[Year]] internal slot.
Fixes <https://github.com/tc39/ecma402/issues/816>.
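As an illustration only, the mapping could look roughly like the sketch below. `ResolveYearOptionSketch` is a hypothetical helper, and the handling of "y" (two letters resolve to "2-digit", anything else to "numeric") is an assumption added for contrast; only the "r" and "U" cases come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Maps a pattern symbol and its repeat count to a resolved "year" option.
// Returns nullptr for symbols that are not year fields.
const char* ResolveYearOptionSketch(char16_t aSymbol, std::size_t aCount) {
  assert(aCount > 0);  // a pattern field has at least one character
  switch (aSymbol) {
    case u'y':  // calendar year (assumed handling, see lead-in)
      return aCount == 2 ? "2-digit" : "numeric";
    case u'r':  // related Gregorian year
    case u'U':  // cyclic year name
      return "numeric";
    default:
      return nullptr;
  }
}
```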
Differential Revision: https://phabricator.services.mozilla.com/D189542
Updates the ICU patch "double-conversion.diff" to only use the in-tree copy of
double-conversion when building Firefox/SpiderMonkey. Standalone ICU builds
instead use the ICU copy of double-conversion. Standalone ICU builds happen
when running "intl/icu_sources_data.py" and when converting the in-tree
little-endian ICU data file to the big-endian format.
`JS_HAS_INTL_API` is used to detect how ICU is built: When building ICU as part
of Firefox/SpiderMonkey, `JS_HAS_INTL_API` is guaranteed to be set, whereas in
standalone ICU builds `JS_HAS_INTL_API` isn't defined.
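The detection can be pictured with a preprocessor check like the one below. This is a sketch of the idea, not the contents of "double-conversion.diff"; `SKETCH_USE_IN_TREE_DOUBLE_CONVERSION` and `UsesInTreeDoubleConversionSketch` are hypothetical names.

```cpp
#include <cassert>

// JS_HAS_INTL_API is set when ICU is built as part of Firefox/SpiderMonkey
// and is not defined in standalone ICU builds.
#if defined(JS_HAS_INTL_API)
#  define SKETCH_USE_IN_TREE_DOUBLE_CONVERSION 1
#else
#  define SKETCH_USE_IN_TREE_DOUBLE_CONVERSION 0
#endif

// True when the in-tree copy of double-conversion would be selected,
// false when ICU's own bundled copy would be used instead.
inline bool UsesInTreeDoubleConversionSketch() {
  return SKETCH_USE_IN_TREE_DOUBLE_CONVERSION != 0;
}
```

Compiling this without `-DJS_HAS_INTL_API` mimics a standalone ICU build, so the function returns `false`.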
After applying the updated "double-conversion.diff" patch, ICU was reimported
to restore the previously deleted double-conversion source files from ICU.
Depends on D190791
Differential Revision: https://phabricator.services.mozilla.com/D190792
This makes number localization cheaper / halves the time in the
microbenchmark.
Change the content-language handling to use atoms. This exposes some
interesting inconsistencies but I tried not to change behavior there.
Differential Revision: https://phabricator.services.mozilla.com/D191174