LineBreaker::Strictness is just an alias of LineBreakRule in Segmenter.h. This
patch reduces dependencies on the legacy LineBreaker.
Differential Revision: https://phabricator.services.mozilla.com/D131026
LineBreaker::WordBreak is just an alias of WordBreakRule in Segmenter.h. This
patch reduces dependencies on the legacy LineBreaker.
Differential Revision: https://phabricator.services.mozilla.com/D131025
LineBreaker has no member variables and acts like a namespace of
utility functions. Therefore, its methods can be static and called
directly without a LineBreaker instance.
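A minimal sketch of the idea, not the real class: `ToyLineBreaker` and its simplified `Next()` (break after the first space, or at the end of text) are hypothetical names used only to show a stateless class whose methods are static.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for a stateless breaker class. This is not
// mozilla::intl::LineBreaker; the breaking rule is deliberately trivial.
class ToyLineBreaker final {
 public:
  // There is no state, so there is no reason to construct an instance.
  ToyLineBreaker() = delete;

  // Returns the offset just past the first space at or after aPos, or the
  // text length if no space is found.
  static uint32_t Next(std::u16string_view aText, uint32_t aPos) {
    for (uint32_t i = aPos; i < aText.size(); ++i) {
      if (aText[i] == u' ') {
        return i + 1;
      }
    }
    return static_cast<uint32_t>(aText.size());
  }
};
```

Callers then write `ToyLineBreaker::Next(text, pos)` directly instead of first obtaining a breaker object.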
Rename GetJISx4051Breaks() to ComputeBreakPositions() per review
feedback.
Differential Revision: https://phabricator.services.mozilla.com/D129107
After removing Prev() in Bug 1733009, WordMove()'s argument aDirection
is now always 1, passed from its remaining caller Next(). Thus, we
can expand WordMove() within Next(), and simplify the logic.
Differential Revision: https://phabricator.services.mozilla.com/D129361
This patch is similar to Bug 1728708 Part 4, but for the line breaker.
This should make the future integration of the ICU4X line segmenter
easier. A UAX14-compatible line breaker always breaks at the end of text
(rule LB3 [1]), and the ICU4X line segmenter has this behavior, too.
The current LineBreaker::Next() doesn't treat the end of text as a line
break opportunity, so this patch deprecates it by renaming it, and adds
a new Next() method.
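The difference between the two methods can be sketched as follows. This is a toy model, not the lwbrk implementation: `ToyDeprecatedNext`, `ToyNext`, `CollectBreaks`, the `kNeedMoreText` sentinel value, and the "break after a run of spaces" rule are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical sentinel meaning "no further break opportunity".
constexpr int32_t kNeedMoreText = -1;

// Old-style behavior: reports a break where a non-space follows a space,
// but never reports the end of text.
int32_t ToyDeprecatedNext(std::string_view aText, uint32_t aPos) {
  for (size_t i = aPos + 1; i < aText.size(); ++i) {
    if (aText[i - 1] == ' ' && aText[i] != ' ') {
      return static_cast<int32_t>(i);
    }
  }
  return kNeedMoreText;
}

// New-style behavior: same rule, but the end of text is always a break
// opportunity (UAX14 rule LB3).
int32_t ToyNext(std::string_view aText, uint32_t aPos) {
  if (aPos >= aText.size()) {
    return kNeedMoreText;  // Already at the end; nothing left to report.
  }
  const int32_t next = ToyDeprecatedNext(aText, aPos);
  return next == kNeedMoreText ? static_cast<int32_t>(aText.size()) : next;
}

// Collects every break opportunity reported by ToyNext().
std::vector<int32_t> CollectBreaks(std::string_view aText) {
  std::vector<int32_t> breaks;
  int32_t pos = 0;
  while (true) {
    pos = ToyNext(aText, static_cast<uint32_t>(pos));
    if (pos == kNeedMoreText) {
      break;
    }
    breaks.push_back(pos);
  }
  return breaks;
}
```

In this sketch, callers of the new-style method no longer need to special-case the final segment, because the last reported position is the text length.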
TestASCIILB() has adopted the new Next(). All the other callers of
DeprecatedNext() (nsPlainTextSerializer, nsXMLContentSerializer,
InternetCiter) should be audited later, possibly together with the
removal of Prev() because all the usages are very close to Prev().
[1] https://www.unicode.org/reports/tr14/#LB3
Differential Revision: https://phabricator.services.mozilla.com/D127379
After Bug 1728708 Part 4 [1], WordBreaker always breaks at the end of
text. Therefore, we don't need to manually record the end of text as a
word break opportunity when we see NS_WORDBREAKER_NEED_MORE_TEXT.
Also, modernize the interface of each function by using mozilla::Span
to remove the need to pass the array length as a parameter.
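To illustrate why a span-style parameter removes the separate length argument, here is a small sketch. `ToySpan` is a hypothetical, heavily reduced stand-in for mozilla::Span (only a pointer and a length), and `CountSpacesLegacy`, `CountSpaces`, and `kSample` are made-up names, not functions from WordBreaker.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical minimal view type: a pointer plus a length, constructible
// from a C array so the length is deduced instead of passed by hand.
template <typename T>
class ToySpan {
 public:
  template <size_t N>
  constexpr ToySpan(T (&aArray)[N]) : mData(aArray), mLength(N) {}
  constexpr ToySpan(T* aData, size_t aLength)
      : mData(aData), mLength(aLength) {}

  constexpr size_t Length() const { return mLength; }
  constexpr T& operator[](size_t aIndex) const { return mData[aIndex]; }

 private:
  T* mData;
  size_t mLength;
};

// Before: the pointer and the length travel separately and can disagree.
uint32_t CountSpacesLegacy(const char16_t* aText, uint32_t aLen) {
  uint32_t count = 0;
  for (uint32_t i = 0; i < aLen; ++i) {
    if (aText[i] == u' ') {
      ++count;
    }
  }
  return count;
}

// After: one parameter carries both, so callers cannot pass a stale length.
uint32_t CountSpaces(ToySpan<const char16_t> aText) {
  uint32_t count = 0;
  for (size_t i = 0; i < aText.Length(); ++i) {
    if (aText[i] == u' ') {
      ++count;
    }
  }
  return count;
}

// Sample input: 'a', space, 'b', space (not null-terminated on purpose).
constexpr char16_t kSample[] = {u'a', u' ', u'b', u' '};
```

With the span-style signature a call site shrinks from `CountSpacesLegacy(kSample, 4)` to `CountSpaces(kSample)`.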
[1] https://hg.mozilla.org/mozilla-central/rev/55efff2d5628
Differential Revision: https://phabricator.services.mozilla.com/D127378
The motivation of this patch is to remove a rarely used API in
WordBreaker. WordBreaker::BreakInBetween() is used only in
nsFind::BreakInBetween() in production, and it can be replaced by
Next().
If the user wants to know whether there is a word break between two
strings such as the use cases in gtest, joining the two strings and
passing the result to Next() is the preferred way.
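The join-then-iterate approach could look roughly like this. It is a sketch under assumptions: `ToyWordNext` (which breaks wherever the text switches between spaces and non-spaces, and at the end of text), `kDone`, and `HasWordBreakBetween` are hypothetical helpers, not the WordBreaker API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical sentinel meaning "no more break positions".
constexpr int32_t kDone = -1;

// Toy word breaker: returns the next offset after aPos where the text
// switches between space and non-space, or the text length.
int32_t ToyWordNext(std::string_view aText, uint32_t aPos) {
  if (aPos >= aText.size()) {
    return kDone;
  }
  const bool startIsSpace = aText[aPos] == ' ';
  size_t i = aPos + 1;
  while (i < aText.size() && (aText[i] == ' ') == startIsSpace) {
    ++i;
  }
  return static_cast<int32_t>(i);
}

// Joins the two strings and walks the break positions until reaching or
// passing the seam. A break exists between them only if some position
// lands exactly on the seam.
bool HasWordBreakBetween(std::string_view aFirst, std::string_view aSecond) {
  if (aFirst.empty() || aSecond.empty()) {
    return false;  // This sketch requires text on both sides of the seam.
  }
  std::string joined(aFirst);
  joined += aSecond;

  int32_t pos = 0;
  while (pos != kDone && static_cast<size_t>(pos) < aFirst.size()) {
    pos = ToyWordNext(joined, static_cast<uint32_t>(pos));
  }
  return pos != kDone && static_cast<size_t>(pos) == aFirst.size();
}
```

Because the breaker sees the joined text, context on both sides of the seam is taken into account, which a dedicated two-argument API would have to replicate separately.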
Note: I delete the buggy forward word search algorithm in
TestFindWordBreakFromPosition() because from the test expectations, it
doesn't expect to continue the search in previous fragments. Also, the
buggy part comes from the following code, which had undefined behavior
before Part 4, and does nothing after Part 4.
```
wbk->FindWord(prevFragText.get(), prevFragText.Length(), prevFragText.Length());
```
Differential Revision: https://phabricator.services.mozilla.com/D125151
* Rename arguments so that their names are consistent with Next().
* Make the function not assert on an empty string, i.e. aLen == 0, like
Next().
* Fix undefined behavior when the user passes aTextLen == aOffset.
The method used to access `aText[aOffset]`, which is clearly out of range
and may not be readable because the string may not be null-terminated.
After this patch, it returns a sentinel WordRange when aLen == aPos.
* Add documentation and a gtest, TestFindWordWithEmptyString().
* Change the sentinel return value to {aLen,aLen} for FindWord(), and
adapt one caller.
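The sentinel behavior described in the list above can be sketched like this. `ToyWordRange` and `ToyFindWord` are hypothetical; the word rule (runs of spaces versus runs of non-spaces) is a simplification, and treating aPos > length the same as aPos == length is an assumption of this sketch.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical range type: the word is [mBegin, mEnd).
struct ToyWordRange {
  uint32_t mBegin;
  uint32_t mEnd;
};

// Finds the run (spaces or non-spaces) that contains aPos.
ToyWordRange ToyFindWord(std::string_view aText, uint32_t aPos) {
  const uint32_t len = static_cast<uint32_t>(aText.size());
  if (aPos >= len) {
    // Sentinel {len, len}: covers the empty string and aPos == len without
    // ever reading aText[aPos], which would be out of range.
    return {len, len};
  }
  const bool isSpace = aText[aPos] == ' ';
  uint32_t begin = aPos;
  while (begin > 0 && (aText[begin - 1] == ' ') == isSpace) {
    --begin;
  }
  uint32_t end = aPos + 1;
  while (end < len && (aText[end] == ' ') == isSpace) {
    ++end;
  }
  return {begin, end};
}
```

A caller can detect the sentinel by checking `range.mBegin == range.mEnd`, since a real word is never empty in this model.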
Differential Revision: https://phabricator.services.mozilla.com/D125434
The function was added in Bug 1728708 Part 4, but it is never called, so
this patch transforms it into a test in the WordBreak test suite to make
it run.
While I'm here, other individual functions are also transformed into
tests so that we can have more granular results if some of them fail.
Differential Revision: https://phabricator.services.mozilla.com/D125148
A UAX29-compatible word breaker (like ICU4C) treats the end of text as a
word break opportunity (rule WB2 [1]), but currently the lwbrk word
breaker doesn't.
The motivation of this patch is to make `WordBreaker::Next()` closer to
a UAX29 compatible one (at least for English text), and see if the
callers need to change. This should make the future integration of ICU4X
segmenter easier.
The only caller of WordBreaker::Next() is ClusterIterator's constructor.
This patch shouldn't change its behavior because we've already manually
assigned a word break point at the end of the line when `aContext` is
empty and `aDirection` is -1. This patch generalizes it to all
conditions.
Also, update TestPrintWordWithBreak() so that the result string makes
more sense.
[1] https://www.unicode.org/reports/tr29/#WB2
Differential Revision: https://phabricator.services.mozilla.com/D124304
Here are the changes in this patch. They shouldn't change the behavior.
* Rename the gtest to `TestBreak.cpp` because it also contains word break tests.
* Align ruler comments to the test strings.
* Rename `lb` to `wb` in `TestASCIIWB`.
* Remove unused variable `j` in `TestPrintWordWithBreak()`.
* Use `ArrayLength` instead of `sizeof` trick to get the array length.
* #include ArrayUtils.h, and sort the #include statements.
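As a rough illustration of the `ArrayLength` item above: the sketch uses `std::size` from the C++17 standard library as a stand-in for the Mozilla helper, and `kTestStrings` is a made-up array, not data from the gtest.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical test data.
static const char* const kTestStrings[] = {"a", "b", "c"};

// The sizeof trick gives the right answer for a true array, but it also
// compiles when handed a pointer, where it silently yields a wrong count.
constexpr size_t kCountViaSizeof =
    sizeof(kTestStrings) / sizeof(kTestStrings[0]);

// A template-based helper only accepts real arrays (or containers), so a
// pointer passed by mistake is rejected at compile time.
constexpr size_t kCount = std::size(kTestStrings);
```

Both constants agree here; the benefit of the helper is the compile-time error in the pointer case, not a different result.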
Differential Revision: https://phabricator.services.mozilla.com/D124303
This patch fixes WordBreaker.cpp under the non-unified build. I tested
the non-unified build locally on my Linux machine by changing
`UNIFIED_SOURCES` containing "LineBreaker.cpp" and "WordBreaker.cpp" to
`SOURCES` in intl/lwbrk/moz.build.
Differential Revision: https://phabricator.services.mozilla.com/D123999
Automatically generated patch that adds the flag `REQUIRES_UNIFIED_BUILD = True` to `moz.build`
when the module governed by the build config file is not buildable outside of the unified environment.
This needs to be done in order to have a hybrid build system that adds the possibility of combining
unified build components with ones that are built outside of the unified ecosystem.
Differential Revision: https://phabricator.services.mozilla.com/D122345
Allow-list all Python code in tree for use with the black linter, and re-format all code in-tree accordingly.
To produce this patch I did all of the following:
1. Make changes to tools/lint/black.yml to remove include: stanza and update list of source extensions.
2. Run ./mach lint --linter black --fix
3. Make some ad-hoc manual updates to python/mozbuild/mozbuild/test/configure/test_configure.py -- it has some hard-coded line numbers that the reformat breaks.
4. Make some ad-hoc manual updates to `testing/marionette/client/setup.py`, `testing/marionette/harness/setup.py`, and `testing/firefox-ui/harness/setup.py`, which have hard-coded regexes that break after the reformat.
5. Add a set of exclusions to black.yml. These will be deleted in a follow-up bug (1672023).
# ignore-this-changeset
Differential Revision: https://phabricator.services.mozilla.com/D94045
After landing bug 425915, we use NextWord instead of BreakInBetween.
NextWord can return an empty string (when the offset equals the current
position), but it shouldn't return an empty string.
Differential Revision: https://phabricator.services.mozilla.com/D94266
Allow-list all Python code in tree for use with the black linter, and re-format all code in-tree accordingly.
To produce this patch I did all of the following:
1. Make changes to tools/lint/black.yml to remove include: stanza and update list of source extensions.
2. Run ./mach lint --linter black --fix
3. Make some ad-hoc manual updates to python/mozbuild/mozbuild/test/configure/test_configure.py -- it has some hard-coded line numbers that the reformat breaks.
4. Add a set of exclusions to black.yml. These will be deleted in a follow-up bug (1672023).
# ignore-this-changeset
Differential Revision: https://phabricator.services.mozilla.com/D94045
(Also fixes the handling of FULLWIDTH OPEN/CLOSE WHITE PARENTHESIS, which do not
map directly to ASCII counterparts.)
Differential Revision: https://phabricator.services.mozilla.com/D69646