mirror of
https://github.com/openharmony/third_party_rust_aho-corasick.git
synced 2026-07-01 23:32:36 -04:00
8ee90ca160
This commit introduces a ground-up rewrite of the entire crate. Most or all use cases served by `aho-corasick 0.6` should be served by this rewrite as well. Pretty much everything has been improved. The API is simpler, and much more flexible with many new configuration knobs for controlling the space-vs-time tradeoffs of Aho-Corasick automatons. In particular, there are several tunable optimizations for controlling space usage such as state ID representation and byte classes. The API is simpler in that there is now just one type that encapsulates everything: `AhoCorasick`. Support for streams has been improved quite a bit, with new APIs for stream search & replace. Test and benchmark coverage has increased quite a bit. This also fixes a subtle but important bug: empty patterns are now handled correctly. Previously, they could never match, but now they can match at any position. Finally, I believe this is now the only Aho-Corasick implementation to support leftmost-first and leftmost-longest semantics by using what I think is a novel alteration to the Aho-Corasick construction algorithm. I surveyed some other implementations, and there are a few Java libraries that support leftmost-longest match semantics, but they implement it by adding a sliding queue at search time. I also looked into Perl's regex implementation which has an Aho-Corasick optimization for `foo|bar|baz|...|quux` style regexes, and therefore must somehow implement leftmost-first semantics. The code is a bit hard to grok, but it looks like this is being handled at search time as opposed to baking it into the automaton. Fixes #18, Fixes #19, Fixes #26, Closes #34