Commit Graph

431 Commits

Author SHA1 Message Date
Ting-Yu Lin
2c88da9b0e Bug 1740831 Part 2 - Replace LineBreaker::Strictness with LineBreakRule. r=m_kato
LineBreaker::Strictness is just an alias of LineBreakRule in Segmenter.h. This
change reduces the dependency on the legacy LineBreaker.

Differential Revision: https://phabricator.services.mozilla.com/D131026
2021-11-15 17:20:36 +00:00
Ting-Yu Lin
638eb14439 Bug 1740831 Part 1 - Replace LineBreaker::WordBreak with WordBreakRule. r=m_kato
LineBreaker::WordBreak is just an alias of WordBreakRule in Segmenter.h. This
change reduces the dependency on the legacy LineBreaker.

Differential Revision: https://phabricator.services.mozilla.com/D131025
2021-11-15 17:20:35 +00:00
Ting-Yu Lin
4a00f61720 Bug 1722484 Part 1 - Introduce mozilla::intl::Segmenter and break iterators. r=m_kato,dminor
intl/component is already linked with JavaScript. However, the segmenter is still
backed by lwbrk, which is tightly coupled to xpcom APIs, so we cannot add it
under intl/component until we integrate the ICU4X segmenter. I added it under
intl/lwbrk for now.

The enums `WordBreakRule` and `LineBreakRule` are named after their counterparts
in ICU4X.
https://unicode-org.github.io/icu4x-docs/doc/icu_segmenter/index.html#enums

Differential Revision: https://phabricator.services.mozilla.com/D129193
2021-11-09 01:14:14 +00:00
Iulian Moraru
36283a9a3b Backed out 2 changesets (bug 1722484) for causing multiple build bustages. CLOSED TREE
Backed out changeset bef547b588ff (bug 1722484)
Backed out changeset e676fa1a0cb7 (bug 1722484)
2021-11-09 01:42:20 +02:00
Ting-Yu Lin
b2dc37c08a Bug 1722484 Part 1 - Introduce mozilla::intl::Segmenter and break iterators. r=m_kato,dminor
intl/component is already linked with JavaScript. However, the segmenter is still
backed by lwbrk, which is tightly coupled to xpcom APIs, so we cannot add it
under intl/component until we integrate the ICU4X segmenter. I added it under
intl/lwbrk for now.

The enums `WordBreakRule` and `LineBreakRule` are named after their counterparts
in ICU4X.
https://unicode-org.github.io/icu4x-docs/doc/icu_segmenter/index.html#enums

Differential Revision: https://phabricator.services.mozilla.com/D129193
2021-11-08 22:24:18 +00:00
Jonathan Kew
37bb886753 Bug 1737147 - Treat characters with LB=BreakAfter as Breakable in LineBreaker.cpp. r=emilio
Differential Revision: https://phabricator.services.mozilla.com/D130477
2021-11-06 19:05:43 +00:00
Narcis Beleuzu
3144ccc409 Backed out changeset 551d3e02d2f7 (bug 1737147) for wr failures on word-break-normal-bo-000.html . CLOSED TREE
2021-11-05 23:44:50 +02:00
Jonathan Kew
97209d1f8a Bug 1737147 - Treat characters with LB=BreakAfter as Breakable in LineBreaker.cpp. r=emilio
Differential Revision: https://phabricator.services.mozilla.com/D130477
2021-11-05 16:22:29 +00:00
Bob Owen
b77c3841c1 Bug 1737914: Add missing MOZ_SANDBOX guards in nsUniscribeBreaker.cpp. r=jfkthame
Differential Revision: https://phabricator.services.mozilla.com/D129637
2021-10-27 14:42:12 +00:00
Bob Owen
93b7d525df Bug 1713973 p4: Test brokered complex breaker against Uniscribe in content. r=jfkthame
Depends on D129143

Differential Revision: https://phabricator.services.mozilla.com/D129144
2021-10-26 09:58:52 +00:00
Bob Owen
75ff7d4c1e Bug 1713973 p3: Use brokered complex line breaking when win32k lockdown is enabled. r=jfkthame,tkikuchi
Depends on D126809

Differential Revision: https://phabricator.services.mozilla.com/D129143
2021-10-26 09:58:52 +00:00
Bob Owen
25f1dc489f Bug 1713973 p1: Add caching for calls to NS_GetComplexLineBreaks. r=jfkthame
Differential Revision: https://phabricator.services.mozilla.com/D129125
2021-10-26 09:58:51 +00:00
Ting-Yu Lin
8dddb69ae7 Bug 1736938 Part 5 - Convert LineBreaker and WordBreaker into utility classes. r=jfkthame
Differential Revision: https://phabricator.services.mozilla.com/D129363
2021-10-25 19:00:23 +00:00
Ting-Yu Lin
c2f6ef7a09 Bug 1736938 Part 3 - Make all WordBreaker's methods static, and adapt the callers. r=jfkthame
The motivation is the same as the previous part.

Differential Revision: https://phabricator.services.mozilla.com/D129109
2021-10-25 19:00:22 +00:00
Ting-Yu Lin
427cc27771 Bug 1736938 Part 2 - Make all LineBreaker's methods static, and adapt the callers. r=jfkthame
LineBreaker has no member variables and acts like "namespaces" with
utility functions. Therefore, its methods can be static and called
directly without needing a LineBreaker instance.

Rename GetJISx4051Breaks() to ComputeBreakPositions() per review
feedback.

Differential Revision: https://phabricator.services.mozilla.com/D129107
2021-10-25 19:00:22 +00:00
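The "class as namespace" shape described in Part 2 can be illustrated with a minimal sketch. `ToyBreaker` and its whitespace rule are hypothetical stand-ins invented for this example; they are not the real LineBreaker or its algorithm, and only the all-static calling convention is the point.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stateless breaker; NOT the real LineBreaker.
class ToyBreaker final {
 public:
  ToyBreaker() = delete;  // no member variables, so no instances are needed

  // Toy rule (an assumption for illustration): position i is breakable
  // when text[i - 1] is a space and text[i] is not.
  static std::vector<bool> ComputeBreakPositions(const std::string& text) {
    std::vector<bool> breaks(text.size(), false);
    for (std::size_t i = 1; i < text.size(); ++i) {
      breaks[i] = text[i - 1] == ' ' && text[i] != ' ';
    }
    return breaks;
  }
};
```

A caller writes `ToyBreaker::ComputeBreakPositions(text)` directly, without constructing or passing around a breaker object.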
Ting-Yu Lin
db09924f17 Bug 1736938 Part 1 - Expand LineBreaker::WordMove() within Next(). r=jfkthame
After removing Prev() in Bug 1733009, WordMove()'s argument aDirection
is always 1, passed from its only remaining caller, Next(). Thus, we
can expand WordMove() within Next() and simplify the logic.

Differential Revision: https://phabricator.services.mozilla.com/D129361
2021-10-25 19:00:21 +00:00
Jonathan Kew
9f67437d11 Bug 1736393 - Don't use ScriptBreak for Tibetan, just treat Tsheg character like a hyphen instead. r=m_kato
Differential Revision: https://phabricator.services.mozilla.com/D128756
2021-10-19 13:11:37 +00:00
Ting-Yu Lin
717c05c00c Bug 1733009 - Remove LineBreaker::Prev(). r=jfkthame,m_kato
Differential Revision: https://phabricator.services.mozilla.com/D128559
2021-10-15 18:10:09 +00:00
Jonathan Kew
74b10b85a0 Bug 1734590 - Remove the LineBreaker::DeprecatedNext method. r=TYLin
Differential Revision: https://phabricator.services.mozilla.com/D127807
2021-10-11 12:20:42 +00:00
Csoregi Natalia
e92f158606 Backed out 4 changesets (bug 1734590) for failures on test_xmlserializer.js. CLOSED TREE
Backed out changeset e492f8fd3d53 (bug 1734590)
Backed out changeset 0af985bb7569 (bug 1734590)
Backed out changeset 3751b93ae994 (bug 1734590)
Backed out changeset 45059121c015 (bug 1734590)
2021-10-08 14:41:06 +03:00
Jonathan Kew
6494d53d3f Bug 1734590 - Remove the LineBreaker::DeprecatedNext method. r=TYLin
Differential Revision: https://phabricator.services.mozilla.com/D127807
2021-10-08 10:05:52 +00:00
Ting-Yu Lin
e558c2cbd9 Bug 1733872 Part 2 - Add a new LineBreaker::Next(), and deprecate the old Next(). r=m_kato
This patch is similar to Bug 1728708 Part 4, but for line breaker. This
should make the future integration of ICU4X line segmenter easier. A
UAX14 compatible line breaker always breaks at the end of text
(rule LB3 [1]), and ICU4X line segmenter has this behavior, too.

The current LineBreaker::Next() doesn't treat the end of text as a line
break opportunity, so this patch deprecates it by renaming it, and adds a
new Next() method.

TestASCIILB() has adopted the new Next(). All the other callers of
DeprecatedNext (nsPlainTextSerializer, nsXMLContentSerializer,
InternetCiter) should be audited later, possibly together with the removal
of Prev(), because all the usages are very close to Prev().

[1] https://www.unicode.org/reports/tr14/#LB3

Differential Revision: https://phabricator.services.mozilla.com/D127379
2021-10-07 07:39:13 +00:00
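The behavioural difference between the deprecated and the new Next() can be sketched as follows. This is a toy model under stated assumptions: `kNeedMoreText`, `IsBreakBefore`, `DeprecatedNextSketch`, and `NextSketch` are hypothetical names, and the whitespace rule stands in for the real line-breaking logic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sentinel meaning "no further break found".
constexpr int kNeedMoreText = -1;

// Toy rule (assumption): a break opportunity exists before a non-space
// character that follows a space.
static bool IsBreakBefore(const std::string& text, std::size_t i) {
  return text[i - 1] == ' ' && text[i] != ' ';
}

// Old-style behaviour: the end of text is never reported as a break.
int DeprecatedNextSketch(const std::string& text, std::size_t pos) {
  for (std::size_t i = pos + 1; i < text.size(); ++i) {
    if (IsBreakBefore(text, i)) return static_cast<int>(i);
  }
  return kNeedMoreText;
}

// New-style behaviour: the end of text counts as a break (UAX #14 rule LB3).
int NextSketch(const std::string& text, std::size_t pos) {
  assert(pos <= text.size());
  if (pos >= text.size()) return kNeedMoreText;  // nothing left to scan
  const int next = DeprecatedNextSketch(text, pos);
  return next == kNeedMoreText ? static_cast<int>(text.size()) : next;
}
```

With the new behaviour, a caller looping over break positions no longer has to append the text length by hand after the loop ends.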
Ting-Yu Lin
da1f64cd71 Bug 1733872 Part 1 - Modernize TestASCIILB and TestASCIIWB. r=m_kato
After Bug 1728708 Part 4 [1], WordBreaker always breaks at the end of
text. Therefore, we don't need to manually record the end of text as a
word break opportunity when we see NS_WORDBREAKER_NEED_MORE_TEXT.

Also, modernize the interface of the functions by using mozilla::Span
to remove the need to pass array lengths as parameters.

[1] https://hg.mozilla.org/mozilla-central/rev/55efff2d5628

Differential Revision: https://phabricator.services.mozilla.com/D127378
2021-10-07 07:39:13 +00:00
Jonathan Kew
2535a4390c Bug 1730084 Part 6 - Add some empty fragments to the word-breaker test data. r=TYLin
Differential Revision: https://phabricator.services.mozilla.com/D125180
2021-09-13 23:55:34 +00:00
Ting-Yu Lin
f8152b2b24 Bug 1730084 Part 5 - Remove WordBreaker::BreakInBetween(). r=jfkthame
The motivation of this patch is to remove rarely used API in
WordBreaker. WordBreaker::BreakInBetween() is used only in
nsFind::BreakInBetween() in production, and it can be replaced by
Next().

If the user wants to know whether there is a word break between two
strings such as the use cases in gtest, joining the two strings and
passing the result to Next() is the preferred way.

Note: I delete the buggy forward word search algorithm in
TestFindWordBreakFromPosition() because from the test expectations, it
doesn't expect to continue the search in previous fragments. Also, the
buggy part comes from the following code, which had undefined behavior
before Part 4, and does nothing after Part 4.

```
wbk->FindWord(prevFragText.get(), prevFragText.Length(), prevFragText.Length());
```

Differential Revision: https://phabricator.services.mozilla.com/D125151
2021-09-13 23:55:33 +00:00
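The "join the two strings and call Next()" approach from Part 5 can be sketched like this. `NextWordBreakSketch` and `HasBreakBetween` are hypothetical helpers with a toy alphanumeric rule; they are not the WordBreaker API, and the handling of empty fragments is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Toy classification (assumption): alphanumerics are word characters.
static bool IsWordCharacter(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Returns the first position after `pos` where the character class changes,
// or text.size() when the run extends to the end of the text.
std::size_t NextWordBreakSketch(const std::string& text, std::size_t pos) {
  assert(pos < text.size());
  const bool cls = IsWordCharacter(text[pos]);
  std::size_t i = pos + 1;
  while (i < text.size() && IsWordCharacter(text[i]) == cls) ++i;
  return i;
}

// Joins the fragments and asks whether the next break after the last
// character of `left` falls exactly on the seam.
bool HasBreakBetween(const std::string& left, const std::string& right) {
  if (left.empty() || right.empty()) return false;  // assumption: no seam
  const std::string joined = left + right;
  return NextWordBreakSketch(joined, left.size() - 1) == left.size();
}
```

The design choice is that a single forward-scanning primitive answers the question, so no dedicated "break in between" entry point is needed.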
Ting-Yu Lin
bd25bca479 Bug 1730084 Part 4 - Clean up and fix an edge case of FindWord(). r=jfkthame
* Rename arguments so that their names are consistent with Next().
* Make the function not assert on an empty string, i.e. aLen == 0, like
Next().
* Fix undefined behavior when the user passes aTextLen == aOffset.
The method used to access `aText[aOffset]`, which is clearly out of range
because the string may not be null-terminated. After this patch, it
returns a sentinel WordRange when aLen == aPos.
* Add documentation and the gtest TestFindWordWithEmptyString().
* Change the sentinel return value to {aLen,aLen} for FindWord(), and
adapt one caller.

Differential Revision: https://phabricator.services.mozilla.com/D125434
2021-09-13 23:55:33 +00:00
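The edge case fixed in Part 4 can be sketched as below: when the position equals the length, return the `{len, len}` sentinel before touching the text. `WordRangeSketch`, `IsWordChar`, and `FindWordSketch` are hypothetical, and the alphanumeric rule is a toy replacement for the real word classes.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a word range: [begin, end).
struct WordRangeSketch {
  std::size_t begin;
  std::size_t end;
};

// Toy classification (assumption): alphanumerics form words; any other
// run of characters is treated as one separator run.
static bool IsWordChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

WordRangeSketch FindWordSketch(const std::string& text, std::size_t pos) {
  const std::size_t len = text.size();
  assert(pos <= len);
  if (pos == len) {
    return {len, len};  // sentinel; text[pos] is never read
  }
  const bool cls = IsWordChar(text[pos]);
  std::size_t begin = pos;
  while (begin > 0 && IsWordChar(text[begin - 1]) == cls) --begin;
  std::size_t end = pos + 1;
  while (end < len && IsWordChar(text[end]) == cls) ++end;
  return {begin, end};
}
```

The early return also covers the empty string, since then both the position and the length are 0.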
Ting-Yu Lin
b94f1b97c5 Bug 1730084 Part 1 - Run TestNextWordBreakWithEmptyString as a test. r=jfkthame
The function was added in Bug 1728708 Part 4, but it never gets called, so
this patch transforms it into a test in the WordBreak test suite to make it
run.

While I'm here, other individual functions are also transformed into
tests so that we can have more granular results if some of them fail.

Differential Revision: https://phabricator.services.mozilla.com/D125148
2021-09-13 23:55:32 +00:00
Ting-Yu Lin
57b867e7ff Bug 1728708 Part 4 - Simplify WordBreaker::Next() and make it recognize the end of text a word break opportunity. r=jfkthame
A UAX29 compatible word breaker (like ICU4C) treats the end of text as a
word break opportunity (rule WB2 [1]), but currently the lwbrk word breaker
doesn't.

The motivation of this patch is to make `WordBreaker::Next()` closer to
a UAX29 compatible one (at least for English text), and see if the
callers need to change. This should make the future integration of ICU4X
segmenter easier.

The only caller of WordBreaker::Next() is ClusterIterator's constructor.
This patch shouldn't change its behavior because we've already manually
assigned a word break point at the end of the line when `aContext` is
empty and `aDirection` is -1. This patch generalizes it to all
conditions.

Also, update TestPrintWordWithBreak() so that the result string makes
more sense.

[1] https://www.unicode.org/reports/tr29/#WB2

Differential Revision: https://phabricator.services.mozilla.com/D124304
2021-09-08 04:19:38 +00:00
Ting-Yu Lin
edb9c8ed39 Bug 1728708 Part 3 - Clean up the gtest for line and word breaker. r=jfkthame
Here are the changes in this patch. They shouldn't change the behavior.

* Rename the gtest to `TestBreak.cpp` because it also contains word break tests.
* Align ruler comments to the test strings.
* Rename `lb` to `wb` in `TestASCIIWB`.
* Remove unused variable `j` in `TestPrintWordWithBreak()`.
* Use `ArrayLength` instead of `sizeof` trick to get the array length.
* #include ArrayUtils.h, and sort the #include statements.

Differential Revision: https://phabricator.services.mozilla.com/D124303
2021-09-08 04:19:37 +00:00
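The difference between the `sizeof` trick and an `ArrayLength`-style helper can be sketched as follows. `ArrayLengthSketch` is a hypothetical stand-in written for this example, not the helper from ArrayUtils.h, and `kTestStrings` is made-up data.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: deduces N from the array type, so passing a
// pointer instead of an array fails to compile.
template <typename T, std::size_t N>
constexpr std::size_t ArrayLengthSketch(T (&)[N]) {
  return N;
}

// Made-up test data for illustration.
static const char* const kTestStrings[] = {"one", "two", "three"};

// The older idiom: also compiles when handed a pointer, in which case it
// silently yields a wrong count.
constexpr std::size_t kCountBySizeof =
    sizeof(kTestStrings) / sizeof(kTestStrings[0]);

constexpr std::size_t kCountByTemplate = ArrayLengthSketch(kTestStrings);

static_assert(kCountBySizeof == kCountByTemplate,
              "both idioms agree for a real array");
```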
Ting-Yu Lin
69a841c529 Bug 1728708 Part 2 - Rename WordBreaker::NextWord() to WordBreaker::Next(). r=jfkthame
Differential Revision: https://phabricator.services.mozilla.com/D124302
2021-09-08 04:19:37 +00:00
Ting-Yu Lin
1083821003 Bug 1728708 Part 1 - Move WordBreakClass and GetClass into WordBreaker's private section. r=jfkthame
Differential Revision: https://phabricator.services.mozilla.com/D124301
2021-09-08 04:19:36 +00:00
Ting-Yu Lin
5449000ecc Bug 1728241 - Fix non-unified build for intl/lwbrk. r=platform-i18n-reviewers,dminor
This patch is to fix WordBreaker.cpp under non-unified build. I test the
non-unified build locally on my Linux machine via changing
`UNIFIED_SOURCES` containing "LineBreaker.cpp" and "WordBreaker.cpp" to
`SOURCES` in intl/lwbrk/moz.build.

Differential Revision: https://phabricator.services.mozilla.com/D123999
2021-08-31 16:01:32 +00:00
Andi-Bogdan Postelnicu
2fc4f70e9b Bug 1725145 - Preparation for the hybrid build env. r=necko-reviewers,firefox-build-system-reviewers,valentin,glandium
Automatically generated patch that adds the flag `REQUIRES_UNIFIED_BUILD = True` to `moz.build`
when the module governed by the build config file is not buildable outside of the unified environment.

This needs to be done in order to have a hybrid build system that adds the possibility of combining
unified build components with ones that are built outside of the unified ecosystem.

Differential Revision: https://phabricator.services.mozilla.com/D122345
2021-08-25 10:46:17 +00:00
Jonathan Kew
fd668cd57a Bug 1703213 - Disallow soft line break between adjacent IDEOGRAPHIC SPACE characters. r=m_kato
Differential Revision: https://phabricator.services.mozilla.com/D111065
2021-04-08 09:41:48 +00:00
Bob Owen
2e99a1b7b3 Bug 1696940: Use nsRuleBreaker code in nsUniscribeBreaker when win32k lockdown is enabled. r=jfkthame
This is only intended for testing, because win32k lockdown is disabled by
default for content.

Differential Revision: https://phabricator.services.mozilla.com/D107495
2021-03-10 15:45:41 +00:00
Ricky Stewart
02a7b4ebdf Bug 1654103: Standardize on Black for Python code in mozilla-central.
Allow-list all Python code in tree for use with the black linter, and re-format all code in-tree accordingly.

To produce this patch I did all of the following:

1. Make changes to tools/lint/black.yml to remove include: stanza and update list of source extensions.

2. Run ./mach lint --linter black --fix

3. Make some ad-hoc manual updates to python/mozbuild/mozbuild/test/configure/test_configure.py -- it has some hard-coded line numbers that the reformat breaks.

4. Make some ad-hoc manual updates to `testing/marionette/client/setup.py`, `testing/marionette/harness/setup.py`, and `testing/firefox-ui/harness/setup.py`, which have hard-coded regexes that break after the reformat.

5. Add a set of exclusions to black.yml. These will be deleted in a follow-up bug (1672023).

# ignore-this-changeset

Differential Revision: https://phabricator.services.mozilla.com/D94045
2020-10-26 18:34:53 +00:00
Bogdan Tara
da1098d4aa Backed out 10 changesets (bug 1654103, bug 1672023, bug 1518999) for PanZoomControllerTest.touchEventForResult gv-junit failures CLOSED TREE
Backed out changeset ff3fb0b4a512 (bug 1672023)
Backed out changeset e7834b600201 (bug 1654103)
Backed out changeset 807893ca8069 (bug 1518999)
Backed out changeset 13e6b92440e9 (bug 1518999)
Backed out changeset 8b2ac5a6c98a (bug 1518999)
Backed out changeset 575748295752 (bug 1518999)
Backed out changeset 65f07ce7b39b (bug 1518999)
Backed out changeset 4bb80556158d (bug 1518999)
Backed out changeset 8ac8461d7bd7 (bug 1518999)
Backed out changeset e8ba13ee17f5 (bug 1518999)
2020-10-24 03:36:18 +03:00
Ricky Stewart
c0cea3b0fa Bug 1654103: Standardize on Black for Python code in mozilla-central. r=remote-protocol-reviewers,marionette-reviewers,webdriver-reviewers,perftest-reviewers,devtools-backward-compat-reviewers,jgilbert,preferences-reviewers,sylvestre,maja_zf,webcompat-reviewers,denschub,ntim,whimboo,sparky
Allow-list all Python code in tree for use with the black linter, and re-format all code in-tree accordingly.

To produce this patch I did all of the following:

1. Make changes to tools/lint/black.yml to remove include: stanza and update list of source extensions.

2. Run ./mach lint --linter black --fix

3. Make some ad-hoc manual updates to python/mozbuild/mozbuild/test/configure/test_configure.py -- it has some hard-coded line numbers that the reformat breaks.

4. Make some ad-hoc manual updates to `testing/marionette/client/setup.py`, `testing/marionette/harness/setup.py`, and `testing/firefox-ui/harness/setup.py`, which have hard-coded regexes that break after the reformat.

5. Add a set of exclusions to black.yml. These will be deleted in a follow-up bug (1672023).

# ignore-this-changeset

Differential Revision: https://phabricator.services.mozilla.com/D94045
2020-10-23 20:40:42 +00:00
Dorel Luca
1ff59cb7a3 Backed out changeset 7558c8821a07 (bug 1654103) for multiple failures. CLOSED TREE
2020-10-22 03:51:06 +03:00
Makoto Kato
fdfea00747 Bug 1672269 - NextWord shouldn't return empty. r=jfkthame
After landing bug 425915, we use NextWord instead of BreakInBetween.
NextWord can return an empty string (the offset equals the current
position), which it shouldn't.

Differential Revision: https://phabricator.services.mozilla.com/D94266
2020-10-21 13:10:29 +00:00
Ricky Stewart
50762dacab Bug 1654103: Standardize on Black for Python code in mozilla-central. r=remote-protocol-reviewers,marionette-reviewers,webdriver-reviewers,perftest-reviewers,devtools-backward-compat-reviewers,jgilbert,preferences-reviewers,sylvestre,maja_zf,webcompat-reviewers,denschub,ntim,whimboo,sparky
Allow-list all Python code in tree for use with the black linter, and re-format all code in-tree accordingly.

To produce this patch I did all of the following:

1. Make changes to tools/lint/black.yml to remove include: stanza and update list of source extensions.

2. Run ./mach lint --linter black --fix

3. Make some ad-hoc manual updates to python/mozbuild/mozbuild/test/configure/test_configure.py -- it has some hard-coded line numbers that the reformat breaks.

4. Add a set of exclusions to black.yml. These will be deleted in a follow-up bug (1672023).

# ignore-this-changeset

Differential Revision: https://phabricator.services.mozilla.com/D94045
2020-10-21 21:27:27 +00:00
Kagami Sascha Rosylight
5052c19940 Bug 1637624 - Part 2: Use StaticPrefs for layout.word_select.* r=emilio
Differential Revision: https://phabricator.services.mozilla.com/D83090
2020-07-10 21:21:30 +00:00
Jonathan Kew
66966abc6b Bug 1647377 - Provide basic line-breaking support for Tibetan on Android by treating TSHEG like a hyphen. r=m_kato
Differential Revision: https://phabricator.services.mozilla.com/D81441
2020-06-30 07:27:00 +00:00
Jonathan Kew
1cd6eafa23 Bug 1640408 - Check Unicode general category to identify punctuation marks in word-breaker. r=m_kato
Differential Revision: https://phabricator.services.mozilla.com/D77655
2020-06-03 15:24:29 +00:00
Simon Giesecke
191a830575 Bug 1628715 - Part 7: Add MOZ_NONNULL_RETURN to infallible nsTArray::AppendElements. r=xpcom-reviewers,necko-reviewers,nika,valentin
Differential Revision: https://phabricator.services.mozilla.com/D70831
2020-04-24 13:31:14 +00:00
Jonathan Kew
3751d26654 Bug 425915 - Use complex line breaker to identify word boundaries in SEAsian languages without interword spaces. r=m_kato
Differential Revision: https://phabricator.services.mozilla.com/D71206
2020-04-23 14:18:08 +00:00
MHD
d93413f35e Bug 1629435 - Using bool literals instead of integer literals. r=sylvestre
Differential Revision: https://phabricator.services.mozilla.com/D70825

--HG--
extra : moz-landing-system : lando
2020-04-14 15:17:44 +00:00
Jonathan Kew
aceaf80ca1 Bug 1595428 - Allow potential line-break after fullwidth comma and period. r=m_kato
(Also fixes the handling of FULLWIDTH OPEN/CLOSE WHITE PARENTHESIS, which do not
map directly to ASCII counterparts.)

Differential Revision: https://phabricator.services.mozilla.com/D69646

--HG--
extra : moz-landing-system : lando
2020-04-07 01:01:58 +00:00
Simon Giesecke
65378eb4e0 Bug 1613985 - Use default for equivalent-to-default constructors/destructors in intl. r=hsivonen
Depends on D65290

Differential Revision: https://phabricator.services.mozilla.com/D65291

--HG--
extra : moz-landing-system : lando
2020-03-04 09:11:10 +00:00
Jonathan Kew
083ae19512 Bug 1293584 - Fix implementation of word-break:keep-all to better follow the spec. r=m_kato
Differential Revision: https://phabricator.services.mozilla.com/D60275

--HG--
extra : moz-landing-system : lando
2020-01-21 08:02:20 +00:00