third_party_rust_regex

mirror of https://gitee.com/openharmony/third_party_rust_regex synced 2025-04-07 20:51:33 +00:00


Andrew Gallant 84a2bf5d73 Match (?-u:\B) correctly in the NFA engines when valid UTF-8 is required.

This commit fixes a bug where matching (?-u:\B) (that is, "not an ASCII
word boundary") in the NFA engines could produce match positions at invalid
UTF-8 sequence boundaries. The specific problem is that determining whether
(?-u:\B) matches depends on whether matches may only be reported at UTF-8
boundaries, and the NFA engines were not taking this into account. (Normally
this invariant is enforced in the compiler, so that the matching engines
mostly don't have to care about it.) Zero-width assertions are a special
case, however, so ASCII word boundaries must be handled differently
depending on whether valid UTF-8 is required.

This bug was noticed because the DFA actually handles this correctly (by
encoding ASCII word boundaries into the state machine itself, which in turn
guarantees the valid UTF-8 invariant) while the NFAs don't, leading to an
inconsistency.
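
The distinction described above can be illustrated with a small standalone
sketch. It is not the crate's implementation: it uses only the standard
library, and every function name in it is hypothetical, chosen for this
example. It assumes that an "ASCII word byte" is one of [0-9A-Za-z_] and that
"requiring valid UTF-8" means an empty match may only be reported at a `char`
boundary of the haystack.

```rust
/// Returns true if `b` is an ASCII word byte, i.e. one of [0-9A-Za-z_].
fn is_ascii_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Byte-oriented check: offset `at` is "not an ASCII word boundary" when the
/// bytes on both sides agree on being (or not being) ASCII word bytes.
/// The positions before the start and after the end count as non-word.
fn is_not_ascii_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_ascii_word_byte(haystack[at - 1]);
    let after = at < haystack.len() && is_ascii_word_byte(haystack[at]);
    before == after
}

/// Same check, but additionally refuses offsets that would split a
/// multi-byte UTF-8 sequence (the "valid UTF-8 is required" case).
fn is_not_ascii_word_boundary_utf8(haystack: &str, at: usize) -> bool {
    haystack.is_char_boundary(at)
        && is_not_ascii_word_boundary(haystack.as_bytes(), at)
}

/// All offsets accepted by the byte-oriented check alone.
fn byte_only_positions(haystack: &str) -> Vec<usize> {
    (0..=haystack.len())
        .filter(|&at| is_not_ascii_word_boundary(haystack.as_bytes(), at))
        .collect()
}

/// All offsets accepted when UTF-8 boundaries must also be respected.
fn utf8_positions(haystack: &str) -> Vec<usize> {
    (0..=haystack.len())
        .filter(|&at| is_not_ascii_word_boundary_utf8(haystack, at))
        .collect()
}
```

For a haystack such as "δ" (two bytes in UTF-8), the byte-oriented check
alone accepts the offset between the two bytes, while the UTF-8-aware variant
accepts only the offsets before and after the character. To try it, call
`byte_only_positions` and `utf8_positions` from a `main` function and compare
the two results.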

Fix #241.

2016-07-09 22:45:11 -04:00

src

Match (?-u:\B) correctly in the NFA engines when valid UTF-8 is required.

2016-07-09 22:45:11 -04:00

Cargo.toml

regex-syntax 0.3.3

2016-06-16 07:21:00 -04:00