Andrew Gallant c86c025624 Major refactoring and performance improvements.
Overview of changes:

* Redesigned the instruction set to be smaller, mostly by
  collapsing empty-width matches into one instruction type.
  Together with moving instruction matching out of the matching
  engine, this makes the matching engine code much simpler.
* Rewrote input handling to use an inline representation of
  `Option<char>` and clearer position handling with the `Input` trait.
* Added a new bounded backtracking matching engine that is invoked for
  small regexes/inputs. It's about twice as fast as the full NFA
  matching engine.
* Implemented caching for both the NFA and backtracking engines.
  This avoids costly allocations on subsequent uses of the regex.
* Overhauled prefix handling at both discovery and matching.
  Namely, sets of prefix literals can now be extracted from regexes.
  Depending on what the prefixes look like, an Aho-Corasick DFA
  is built from them.
  (This adds a dependency on the `aho-corasick` crate.)
* Used `memchr` to skip ahead in the input when every match must
  start with the same single byte.
  (This adds a dependency on the `memchr` crate.)
* Brought the `regex!` macro up to date. Unfortunately, it still
  implements only the full NFA matching engine and doesn't yet have
  access to the new prefix DFA handling. Thus, in most cases its
  performance is now *worse* than that of the dynamic
  implementation. The docs have been updated to reflect this change.
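
As a rough illustration of the inline `Option<char>` idea from the
input-handling item above, here is a minimal sketch. The names
`InlineChar` and `char_at` are hypothetical and are not taken from
this commit; the sketch assumes a `u32` with an out-of-range sentinel
standing in for "no character".

```rust
/// Hypothetical inline stand-in for `Option<char>`: a `u32` in which
/// `u32::MAX` (never a valid Unicode scalar value) means "no character".
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct InlineChar(u32);

impl InlineChar {
    /// Sentinel used at the start or end of the input.
    pub const NONE: InlineChar = InlineChar(u32::MAX);

    pub fn is_none(self) -> bool {
        self.0 == u32::MAX
    }

    /// Convert back to the ordinary representation.
    pub fn as_char(self) -> Option<char> {
        // `char::from_u32` rejects `u32::MAX`, so `NONE` maps to `None`.
        char::from_u32(self.0)
    }
}

impl From<char> for InlineChar {
    fn from(c: char) -> Self {
        InlineChar(c as u32)
    }
}

impl From<Option<char>> for InlineChar {
    fn from(c: Option<char>) -> Self {
        c.map_or(InlineChar::NONE, InlineChar::from)
    }
}

/// Character starting at byte offset `pos`, or `NONE` when `pos` is at
/// or past the end of `text` (or not on a character boundary).
pub fn char_at(text: &str, pos: usize) -> InlineChar {
    text.get(pos..).and_then(|s| s.chars().next()).into()
}
```

A matching loop can then compare `InlineChar` values directly and only
convert with `as_char` where an actual `char` is needed.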

Surprisingly, all of this required exactly one new application of
`unsafe`, which is isolated in the `memchr` crate. (Aho-Corasick has no
`unsafe` either!)

There should be *no* breaking changes in this commit. The only public
facing change is the addition of a method to the `Replacer` trait, but
it comes with a default implementation so that existing implementors
won't break. (Its purpose is to serve as a hint as to whether or not
replacement strings need to be expanded. This is crucial to speeding
up simple replacements.)
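
To illustrate how a defaulted trait method can act as such a hint
without breaking existing implementors, here is a simplified sketch.
The trait below is a stand-in, not the crate's real `Replacer`; the
method name `literal_hint`, the `Literal` and `Upper` types, and the
substring-based `replace_all` are all hypothetical.

```rust
use std::borrow::Cow;

/// Simplified, hypothetical stand-in for a replacer trait.
pub trait Replacer {
    /// Produce the replacement for one match; `matched` is the matched text.
    fn replace(&mut self, matched: &str) -> String;

    /// Hint: return `Some(text)` when the replacement is a fixed string
    /// that needs no expansion. The default of `None` means implementors
    /// written before this method existed keep compiling unchanged.
    fn literal_hint(&mut self) -> Option<Cow<'_, str>> {
        None
    }
}

/// A fixed replacement string; opts in to the fast path.
pub struct Literal<'a>(pub &'a str);

impl<'a> Replacer for Literal<'a> {
    fn replace(&mut self, _matched: &str) -> String {
        self.0.to_string()
    }
    fn literal_hint(&mut self) -> Option<Cow<'_, str>> {
        Some(Cow::Borrowed(self.0))
    }
}

/// A computed replacement; relies on the default `literal_hint`.
pub struct Upper;

impl Replacer for Upper {
    fn replace(&mut self, matched: &str) -> String {
        matched.to_uppercase()
    }
}

/// Replace every occurrence of `needle` (a plain substring here, to keep
/// the sketch free of any regex machinery).
pub fn replace_all<R: Replacer>(text: &str, needle: &str, mut rep: R) -> String {
    if needle.is_empty() {
        return text.to_string();
    }
    // Fast path: the replacer promised a fixed string, so skip the
    // per-match call entirely.
    if let Some(lit) = rep.literal_hint() {
        return text.replace(needle, &lit);
    }
    // Slow path: ask the replacer for each match.
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(i) = rest.find(needle) {
        out.push_str(&rest[..i]);
        out.push_str(&rep.replace(&rest[i..i + needle.len()]));
        rest = &rest[i + needle.len()..];
    }
    out.push_str(rest);
    out
}
```

Here `Upper` never mentions `literal_hint`, which is the property that
keeps the addition non-breaking.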

Closes #21.
2015-06-15 21:18:24 -04:00