Ting-Yu Lin 57b867e7ff Bug 1728708 Part 4 - Simplify WordBreaker::Next() and make it recognize the end of text a word break opportunity. r=jfkthame
A UAX29 compatible word breaker (like ICU4C) treat the end of text as a
word break opportunity (rule WB2 [1]), but currently lwbrk word breaker
doesn't.

The motivation of this patch is to make `WordBreaker::Next()` closer to
a UAX29 compatible one (at least for English text), and see if the
callers need to change. This should make the future integration of ICU4X
segmenter easier.

The only caller of WordBreaker::Next() is ClusterIterator's constructor.
This patch shouldn't change its behavior because we've already manually
assigned a word break point at the end of the line when `aContext` is
empty and `aDirection` is -1. This patch generalizes it to all
conditions.

Also, update TestPrintWordWithBreak() so that the result string makes
more sense.

[1] https://www.unicode.org/reports/tr29/#WB2

Differential Revision: https://phabricator.services.mozilla.com/D124304
2021-09-08 04:19:38 +00:00
..