TL;DR - The combination of case folding, character classes and nested negation is darn tricky.

The problem presented in #99 was related to how we store case-folded character classes. Namely, we only store the canonical representation of each character (which means that when we match text, we must apply case folding to the input). But when this representation is negated, information is lost.

From #99, consider the negated class with a single range `x`. The class is negated before case folding is applied. The negated class includes `X`, so case folding it yields a class containing both `X` and `x`, even though the regex in #99 is specifically trying to match neither `X` nor `x`.

The solution is to apply case folding *before* negation. But given our representation, this doesn't work. Namely, case folding the range `x` yields `x` with a case-insensitive flag set. Negating this class ends up matching all characters except `x`, which means it will match `X`.

So I've backtracked on the representation so that it includes *all* case folding variants. This means we can negate case-folded classes and get the expected result. For example, case folding the class `[x]` yields `[xX]`, and negating `[xX]` gives the desired result for the regex in #99.
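
To make the ordering issue concrete, here is a minimal sketch of a class stored as explicit ranges that include every case variant. It is not the crate's actual code: the `Class` type and its `new`, `contains`, `case_fold` and `negate` methods are hypothetical names chosen for illustration, and the sketch assumes ASCII-only input (ranges within `0x00..=0x7F`) to stay short.

```rust
/// Hypothetical character class: sorted, non-overlapping, inclusive byte ranges.
/// Assumption: every range lies within ASCII (0x00..=0x7F).
#[derive(Clone, Debug, PartialEq)]
struct Class {
    ranges: Vec<(u8, u8)>,
}

impl Class {
    /// Build a class, sorting the ranges and merging overlapping or adjacent ones.
    fn new(mut ranges: Vec<(u8, u8)>) -> Class {
        ranges.sort();
        let mut merged: Vec<(u8, u8)> = Vec::new();
        for (lo, hi) in ranges {
            if let Some(last) = merged.last_mut() {
                if u16::from(lo) <= u16::from(last.1) + 1 {
                    last.1 = last.1.max(hi);
                    continue;
                }
            }
            merged.push((lo, hi));
        }
        Class { ranges: merged }
    }

    /// Report whether `c` is a member of the class.
    fn contains(&self, c: char) -> bool {
        let v = c as u32;
        self.ranges
            .iter()
            .any(|&(lo, hi)| u32::from(lo) <= v && v <= u32::from(hi))
    }

    /// Add the other-case variant of every ASCII letter in the class,
    /// so that `[x]` becomes `[Xx]`. No separate flag is kept.
    fn case_fold(&self) -> Class {
        let mut out = self.ranges.clone();
        for &(lo, hi) in &self.ranges {
            for b in lo..=hi {
                if b.is_ascii_lowercase() {
                    let u = b.to_ascii_uppercase();
                    out.push((u, u));
                } else if b.is_ascii_uppercase() {
                    let l = b.to_ascii_lowercase();
                    out.push((l, l));
                }
            }
        }
        Class::new(out)
    }

    /// Complement the class within ASCII (0x00..=0x7F).
    fn negate(&self) -> Class {
        let mut out = Vec::new();
        let mut next: u16 = 0; // first byte not yet covered
        for &(lo, hi) in &self.ranges {
            if u16::from(lo) > next {
                out.push((next as u8, lo - 1));
            }
            next = u16::from(hi) + 1;
        }
        if next <= 0x7F {
            out.push((next as u8, 0x7F));
        }
        Class::new(out)
    }
}
```

With this representation, `Class::new(vec![(b'x', b'x')]).case_fold().negate()` contains neither `x` nor `X`, which is the behavior #99 asks for. Swapping the order to `.negate().case_fold()` reproduces the bug: the negated class already contains `X`, and folding it adds `x` back in.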