Overview of changes:

* The instruction set has been redesigned to be smaller, mostly by collapsing empty-width matches into one instruction type. Together with moving instruction matching out of the matching engine, this makes the matching engine code much simpler.
* Rewrote input handling to use an inline representation of `Option<char>` and clearer position handling with the `Input` trait.
* Added a new bounded backtracking matching engine that is invoked for small regexes/inputs. It is about twice as fast as the full NFA matching engine.
* Implemented caching for both the NFA and backtracking engines. This avoids costly allocations on subsequent uses of the regex.
* Overhauled prefix handling, both when discovering prefixes and when matching with them. In particular, sets of prefix literals can now be extracted from regexes. Depending on what the prefixes look like, an Aho-Corasick DFA is built from them. (This adds a dependency on the `aho-corasick` crate.)
* When appropriate, use `memchr` to jump around in the input when there is a single common byte prefix. (This adds a dependency on the `memchr` crate.)
* Brought the `regex!` macro up to date. Unfortunately, it still implements the full NFA matching engine and doesn't yet have access to the new prefix DFA handling. As a result, its performance is now *worse* than that of the dynamic implementation in most cases. The docs have been updated to reflect this change.

Surprisingly, all of this required exactly one new use of `unsafe`, which is isolated in the `memchr` crate. (Aho-Corasick has no `unsafe` either!)

There should be *no* breaking changes in this commit. The only public-facing change is the addition of a method to the `Replacer` trait, but it comes with a default implementation so that existing implementors won't break. (It serves as a hint for whether replacement strings need to be expanded, which is crucial for speeding up simple replacements.)

Closes #21.
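As a rough illustration of collapsing empty-width matches into one instruction type, here is a minimal sketch, not the crate's actual code, in which every zero-width assertion shares a single `Inst` variant. The names `EmptyLook`, `Inst` and `matches`, and the simplified word-character test (alphanumeric or `_`), are assumptions made for the example.

```rust
/// Hypothetical kinds of zero-width assertion, all handled by one instruction.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EmptyLook {
    StartText,
    EndText,
    StartLine,
    EndLine,
    WordBoundary,
    NotWordBoundary,
}

/// Toy instruction set: a single `EmptyLook` variant replaces what would
/// otherwise be one instruction per assertion kind.
#[allow(dead_code)]
enum Inst {
    Match,
    Char(char),
    EmptyLook(EmptyLook),
    Split(usize, usize),
}

/// Simplified word-character test (an assumption, not full Unicode `\w`).
fn is_word(c: Option<char>) -> bool {
    c.map_or(false, |c| c.is_alphanumeric() || c == '_')
}

impl EmptyLook {
    /// Decide the assertion from the chars on either side of the position
    /// (`None` means start or end of input). Keeping this logic here lets
    /// the matching engine use one arm for every zero-width match.
    fn matches(self, prev: Option<char>, next: Option<char>) -> bool {
        match self {
            EmptyLook::StartText => prev.is_none(),
            EmptyLook::EndText => next.is_none(),
            EmptyLook::StartLine => prev.is_none() || prev == Some('\n'),
            EmptyLook::EndLine => next.is_none() || next == Some('\n'),
            EmptyLook::WordBoundary => is_word(prev) != is_word(next),
            EmptyLook::NotWordBoundary => is_word(prev) == is_word(next),
        }
    }
}
```

In an engine loop, the `Inst::EmptyLook(look)` arm would just call `look.matches(prev, next)` and either continue or drop the thread.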
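The inline `Option<char>` representation can be sketched as a single `u32` with a sentinel meaning "no character". This is a hypothetical illustration: the `Char` type, the choice of `u32::MAX` as the sentinel, and the helper methods are assumptions, not the crate's API.

```rust
/// Hypothetical inline stand-in for `Option<char>`: one `u32` where
/// `u32::MAX` (never a valid Unicode scalar value) means "no char".
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Char(u32);

impl Char {
    /// The "absent" value, e.g. at the end of the input.
    pub fn none() -> Char {
        Char(u32::MAX)
    }

    pub fn is_none(self) -> bool {
        self.0 == u32::MAX
    }

    /// Convert back to a standard `Option<char>`; the sentinel maps to `None`
    /// because `char::from_u32` rejects it.
    pub fn as_char(self) -> Option<char> {
        char::from_u32(self.0)
    }

    /// Width of the char in UTF-8 bytes, or 0 if absent.
    pub fn len_utf8(self) -> usize {
        self.as_char().map_or(0, |c| c.len_utf8())
    }
}

impl From<Option<char>> for Char {
    fn from(c: Option<char>) -> Char {
        c.map_or(Char::none(), |c| Char(c as u32))
    }
}
```

A caller walking the input could read `Char::from(text[pos..].chars().next())`, advance `pos` by `len_utf8()`, and stop once `is_none()` is true.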
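The bounded backtracking engine can be pictured roughly as follows. This sketch assumes a toy four-instruction program (`Inst` and `backtrack_match` are hypothetical names) and only answers whether an anchored match exists. The key idea is the visited set of (instruction, position) pairs: it caps the work at instructions × positions, and because its memory grows with that same product, such an engine only makes sense for small regexes and inputs.

```rust
/// A tiny, hypothetical instruction set (not the crate's real one).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum Inst {
    Char(char),          // match one literal char
    Split(usize, usize), // try the first branch, then the second
    Jmp(usize),          // unconditional jump
    Match,               // success
}

/// Anchored bounded backtracking: each (pc, pos) pair is explored at most
/// once, so work is O(insts * positions) rather than exponential.
fn backtrack_match(prog: &[Inst], input: &[char], start: usize) -> bool {
    let npos = input.len() + 1;
    // One flag per (instruction, position) pair.
    let mut visited = vec![false; prog.len() * npos];
    // Explicit stack of pending (pc, pos) jobs instead of recursion.
    let mut stack = vec![(0usize, start)];
    while let Some((mut pc, mut pos)) = stack.pop() {
        loop {
            let key = pc * npos + pos;
            if visited[key] {
                break; // already explored from here; it cannot succeed now
            }
            visited[key] = true;
            match prog[pc] {
                Inst::Match => return true,
                Inst::Char(c) => {
                    if input.get(pos) == Some(&c) {
                        pc += 1;
                        pos += 1;
                    } else {
                        break;
                    }
                }
                Inst::Jmp(target) => pc = target,
                Inst::Split(first, second) => {
                    // Save the lower-priority branch; follow the first now.
                    stack.push((second, pos));
                    pc = first;
                }
            }
        }
    }
    false
}
```

For `a+b`, the program is `[Char('a'), Split(0, 2), Char('b'), Match]`; running it on `"aaab"` from position 0 succeeds, while `"aac"` fails.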
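Caching the engines' scratch space amounts to reusing allocations across searches. The pool below is a hypothetical sketch (`Pool`, `Scratch`, `new_scratch` and `search` are made-up names) that uses a `Mutex<Vec<T>>` so a regex could be shared across threads; the crate's real caching may be organized differently.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Reuse pool: hands out a cached value if one is available, otherwise
/// builds a fresh one with `make`.
struct Pool<T> {
    items: Mutex<Vec<T>>,
    num_created: AtomicUsize, // how many values were ever allocated
    make: fn() -> T,
}

impl<T> Pool<T> {
    fn new(make: fn() -> T) -> Self {
        Pool { items: Mutex::new(Vec::new()), num_created: AtomicUsize::new(0), make }
    }

    fn get(&self) -> T {
        if let Some(v) = self.items.lock().unwrap().pop() {
            return v; // cache hit: no allocation
        }
        self.num_created.fetch_add(1, Ordering::Relaxed);
        (self.make)()
    }

    fn put(&self, v: T) {
        self.items.lock().unwrap().push(v);
    }

    fn created(&self) -> usize {
        self.num_created.load(Ordering::Relaxed)
    }
}

/// Per-search scratch space, e.g. the NFA's thread lists.
struct Scratch {
    threads: Vec<usize>,
}

fn new_scratch() -> Scratch {
    Scratch { threads: Vec::with_capacity(16) }
}

/// Pretend search: borrows scratch space from the pool and returns it.
fn search(pool: &Pool<Scratch>, n: usize) -> usize {
    let mut s = pool.get();
    s.threads.clear(); // keep the allocation, discard old contents
    s.threads.extend(0..n); // stand-in for real matching work
    let found = s.threads.len();
    pool.put(s); // hand it back for the next search
    found
}
```

With `Pool::new(new_scratch)`, five consecutive calls to `search` allocate a `Scratch` only once.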
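The single-byte prefix trick can be sketched with the standard library alone. `find_byte` below is a naive stand-in for `memchr` (the real crate scans much faster than a byte-at-a-time loop), and the literal `starts_with` check stands in for running the matching engine at each candidate position; both, and the name `find_starts`, are assumptions for illustration.

```rust
/// Stand-in for `memchr::memchr`: index of the first `needle` byte, if any.
fn find_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// Jump between occurrences of the common first byte of `full`, and only
/// at those candidates do the more expensive confirmation step.
fn find_starts(haystack: &[u8], full: &[u8]) -> Vec<usize> {
    let mut out = Vec::new();
    let first = match full.first() {
        Some(&b) => b,
        None => return out, // no prefix byte to skip with
    };
    let mut pos = 0;
    while let Some(i) = find_byte(first, &haystack[pos..]) {
        let at = pos + i;
        // Confirmation step (here a literal compare).
        if haystack[at..].starts_with(full) {
            out.push(at);
        }
        pos = at + 1; // resume scanning just past this candidate
    }
    out
}
```

For example, `find_starts(b"xxabyab", b"ab")` inspects only the two `a` positions and reports both as matches.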
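The `Replacer` change relies on a trait method with a default body, so older implementors keep compiling. The sketch below is hypothetical: the method names and signatures (`replace`, `no_expansion`), the `Upper` and `Template` types, the `replace_each` driver, and the toy `$0` expansion are invented for illustration and do not reproduce the crate's actual API.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified replacer trait (not the crate's real signatures).
trait Replacer {
    /// Produce the replacement for one match; `groups` stands in for captures.
    fn replace(&mut self, groups: &[&str]) -> String;

    /// New hint with a default body, so existing implementors still compile.
    /// `Some(s)` means "use `s` verbatim; no expansion is needed".
    fn no_expansion(&mut self) -> Option<Cow<'_, str>> {
        None
    }
}

/// An implementor written before the hint existed: it doesn't override it.
struct Upper;
impl Replacer for Upper {
    fn replace(&mut self, groups: &[&str]) -> String {
        groups[0].to_uppercase()
    }
}

/// A template that can skip expansion when it contains no `$`.
struct Template<'a>(&'a str);
impl<'a> Replacer for Template<'a> {
    fn replace(&mut self, groups: &[&str]) -> String {
        // Toy expansion: only `$0` is supported in this sketch.
        self.0.replace("$0", groups[0])
    }
    fn no_expansion(&mut self) -> Option<Cow<'_, str>> {
        if self.0.contains('$') { None } else { Some(Cow::Borrowed(self.0)) }
    }
}

/// Replace each match, taking the fast path when the hint allows it.
fn replace_each<R: Replacer>(matches: &[&str], rep: &mut R) -> Vec<String> {
    // Ask once; make it owned so `rep` is no longer borrowed.
    let literal = rep.no_expansion().map(|c| c.into_owned());
    matches
        .iter()
        .map(|m| match &literal {
            Some(s) => s.clone(),
            None => rep.replace(&[*m]),
        })
        .collect()
}
```

Here `Upper` compiles unchanged and always takes the expansion path, while `Template("x")` returns its text directly without per-match work.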