Files
Andrew Gallant 8ee90ca160 rewrite the entire crate
This commit introduces a ground-up rewrite of the entire crate. Most or
all use cases served by `aho-corasick 0.6` should be served by this
rewrite as well. Pretty much everything has been improved. The API is
simpler, and much more flexible with many new configuration knobs for
controlling the space-vs-time tradeoffs of Aho-Corasick automatons. In
particular, there are several tunable optimizations for controlling
space usage such as state ID representation and byte classes.

The API is simpler in that there is now just one type that encapsulates
everything: `AhoCorasick`.

Support for streams has been improved quite a bit, with new APIs for
stream search & replace.

Test and benchmark coverage has increased quite a bit.

This also fixes a subtle but important bug: empty patterns are now
handled correctly. Previously, they could never match, but now they can
match at any position.

Finally, I believe this is now the only Aho-Corasick implementation to
support leftmost-first and leftmost-longest semantics by using what I
think is a novel alteration to the Aho-Corasick construction algorithm.
I surveyed some other implementations, and there are a few Java
libraries that support leftmost-longest match semantics, but they
implement it by adding a sliding queue at search time. I also looked
into Perl's regex implementation which has an Aho-Corasick optimization
for `foo|bar|baz|...|quux` style regexes, and therefore must somehow
implement leftmost-first semantics. The code is a bit hard to grok, but
it looks like this is being handled at search time as opposed to baking
it into the automaton.

Fixes #18, Fixes #19, Fixes #26, Closes #34
2019-03-27 22:06:36 -04:00
..
2019-03-27 22:06:36 -04:00
2019-03-27 22:06:36 -04:00
2019-03-27 22:06:36 -04:00
2019-03-27 22:06:36 -04:00
2019-03-27 22:06:36 -04:00