I added this so that I can compare the results of the old benchmark
suite with the new one I'm working on in regex-automata. (The idea is to
port all or most of the benchmarks from the old suite and make sure the
results are at least roughly consistent.)
This commit does a number of manual fixups to the code after the
previous two commits were done via 'cargo fix' automatically.
Actually, this contains more 'cargo fix' annotations, since I had
forgotten to add 'edition = "2018"' to all sub-crates.
This removes the ad hoc FreqyPacked searcher and the implementation of
Boyer-Moore, and replaces it with a new implementation of memmem in the
memchr crate. (Introduced in memchr 2.4.) Since memchr 2.4 also moves to
Rust 2018, we'll do the same in subsequent commits. (Finally.)
The benchmarks look about as expected. Latency on some of the smaller
benchmarks has worsened slightly by a nanosecond or two. The top
throughput speed has also decreased, but some other benchmarks
(especially ones with frequent literal matches) have improved
dramatically.
This commit removes the need to use the `unstable` feature to enable
SIMD optimizations. We add a "version sniffer" to the `build.rs` script
to detect if Rust version 1.27 or newer is being used, and if so, enable
the SIMD optimizations.
The 'unstable' feature is now a no-op, but we keep it for backwards
compatibility. We also may use it again some day.
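
As a rough illustration of the version sniffing described above, a build
script could parse the output of `rustc --version` and emit a cfg when the
compiler is new enough. This is only a sketch, not the actual `build.rs`:
the function names and the `regex_runtime_simd` cfg name are made up for
this example.

```rust
use std::env;
use std::process::Command;

/// Parses `rustc --version` output such as "rustc 1.27.0 (<hash> <date>)"
/// into (major, minor). Returns None if the text has an unexpected shape.
fn parse_rustc_version(output: &str) -> Option<(u32, u32)> {
    let mut words = output.split_whitespace();
    if words.next()? != "rustc" {
        return None;
    }
    let version = words.next()?;
    // Drop any pre-release suffix such as "-nightly" or "-beta.1".
    let core = version.split('-').next()?;
    let mut parts = core.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Returns true if the reported compiler is Rust 1.27 or newer.
fn supports_stable_simd(output: &str) -> bool {
    match parse_rustc_version(output) {
        Some(version) => version >= (1, 27),
        None => false,
    }
}

fn main() {
    // Cargo sets RUSTC for build scripts; fall back to "rustc" otherwise.
    let rustc = env::var("RUSTC").unwrap_or_else(|_| "rustc".to_string());
    let enabled = Command::new(rustc)
        .arg("--version")
        .output()
        .ok()
        .and_then(|out| String::from_utf8(out.stdout).ok())
        .map_or(false, |text| supports_stable_simd(&text));
    if enabled {
        // `regex_runtime_simd` is a hypothetical cfg name for this sketch.
        println!("cargo:rustc-cfg=regex_runtime_simd");
    }
}
```

If the compiler cannot be run or its output cannot be parsed, the sketch
leaves the optimizations disabled rather than failing the build.
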
This forces the C++ benchmarks that use libc++ to use Clang, which is
apparently the only way it works?
We also disable a benchmark for D's compile time regexes that seems to
either never terminate or take exponential time.
This commit adds a new `libcxx` feature that enables testing libc++'s
implementation of `std::regex` when combined with the `re-stdcpp`
feature.
See also: https://libcxx.llvm.org/docs/UsingLibcxx.html
This commit ports the Teddy searcher to use std::arch and moves off the
portable SIMD vector API. Performance remains the same, and it looks
like the codegen is identical, which is great!
This also makes the `simd-accel` feature a no-op and adds a new
`unstable` feature which will enable the Teddy optimization. The `-C
target-feature` or `-C target-cpu` settings are no longer necessary,
since this will now do runtime target feature detection.
We also add a new `unstable` feature to the regex crate, which will
enable this new use of std::arch. Once enabled, the Teddy optimizations
become available automatically without any additional compile time
flags.
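
To illustrate the runtime target feature detection mentioned above, here
is a minimal sketch using the standard `is_x86_feature_detected!` macro.
The `LiteralSearcher` enum and both function names are hypothetical and
are not the crate's real types; the sketch only shows how a searcher could
be picked at runtime instead of through `-C target-feature`.

```rust
/// Which literal searcher to use. Hypothetical type for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LiteralSearcher {
    /// A vectorized searcher that needs SSSE3 (e.g. Teddy).
    Vectorized,
    /// A portable fallback that works on any CPU.
    Scalar,
}

/// Reports whether the running CPU supports SSSE3. On non-x86 targets the
/// detection block is compiled out and this always returns false.
fn ssse3_available() -> bool {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("ssse3") {
            return true;
        }
    }
    false
}

/// Picks a searcher at runtime, with no compile time flags required.
fn choose_searcher() -> LiteralSearcher {
    if ssse3_available() {
        LiteralSearcher::Vectorized
    } else {
        LiteralSearcher::Scalar
    }
}

fn main() {
    println!("selected searcher: {:?}", choose_searcher());
}
```
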
This commit does the mechanical changes necessary to remove the old
regex-syntax crate and replace it with the rewrite. The rewrite now
subsumes the `regex-syntax` crate name, and gets a semver bump to 0.5.0.
The DFA can't produce captures, but is still faster than the Pike VM
NFA, so the normal approach to finding capture groups is to look for
the entire match with the DFA and then run the NFA on the substring
of the input that matched. In cases where the regex is anchored, the
match always starts at the beginning of the input, so there is never
any point to trying the DFA first.
The DFA can still be useful for rejecting inputs which are not in the
language of the regular expression, but anchored regexes with capture
groups are most commonly used in a parsing context, so it seems like a
fair trade-off.
Fixes #348
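
The decision described above can be summarized in a small sketch. The
`CaptureStrategy` enum and `choose_capture_strategy` function are
hypothetical names for illustration, not the crate's actual internals.

```rust
/// How to resolve capture groups. Hypothetical type for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CaptureStrategy {
    /// Find the overall match with the DFA, then run the NFA on just the
    /// matched substring to fill in the capture groups.
    DfaThenNfa,
    /// Skip the DFA and run the NFA directly.
    NfaOnly,
}

/// For a regex anchored at the start, the match start is already known to
/// be the beginning of the input, so the DFA pass is skipped.
fn choose_capture_strategy(anchored_at_start: bool) -> CaptureStrategy {
    if anchored_at_start {
        CaptureStrategy::NfaOnly
    } else {
        CaptureStrategy::DfaThenNfa
    }
}

fn main() {
    println!("anchored:   {:?}", choose_capture_strategy(true));
    println!("unanchored: {:?}", choose_capture_strategy(false));
}
```
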
4fab6c added the current bench runner script as `benches/run`, and
removed the old `run-bench` script. It was later renamed to `bench/run`
when `benches` was renamed to `bench` in b217bf. This patch fixes a few
references to the old benchmark runner in the hacking guide as well
as a few references to the old directory structure. The cargo plugin
syntax in the example is also updated.
The regex_macros crate hasn't been maintained in quite some time, and has
been broken. Nobody has complained. Given the fact that there are no
immediate plans to improve the situation, and the fact that it is slower
than the runtime engine, we simply remove it.
While the existing literal string searching algorithm
leveraging memchr is quite fast, in some cases more
traditional approaches still make sense. This patch
provides an implementation of Tuned Boyer-Moore as
laid out in Fast String Searching by Hume & Sunday.
Some refinements to their work were gleaned from the
grep source.
See: https://github.com/rust-lang/regex/issues/408
See: https://github.com/BurntSushi/ripgrep/issues/617
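
For readers unfamiliar with this family of algorithms, here is a minimal
sketch of the Boyer-Moore-Horspool variant. It is deliberately simpler
than the Tuned Boyer-Moore implementation in this patch (no unrolled skip
loop and no memchr-based guard), and `horspool_find` is a name chosen for
this example only.

```rust
/// Returns the offset of the first occurrence of `needle` in `haystack`,
/// using the Boyer-Moore-Horspool bad-character shift.
fn horspool_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let n = needle.len();
    if n == 0 {
        return Some(0);
    }
    if n > haystack.len() {
        return None;
    }
    // shift[b] is how far the window may advance when its last byte is b.
    // Bytes that do not occur in needle[..n - 1] allow a full shift of n.
    let mut shift = [n; 256];
    for (i, &b) in needle[..n - 1].iter().enumerate() {
        shift[b as usize] = n - 1 - i;
    }
    let mut pos = 0;
    while pos + n <= haystack.len() {
        if &haystack[pos..pos + n] == needle {
            return Some(pos);
        }
        // Every shift is at least 1, so the loop always makes progress.
        let last = haystack[pos + n - 1];
        pos += shift[last as usize];
    }
    None
}

fn main() {
    println!("{:?}", horspool_find(b"hello world", b"world"));
}
```

The shift table lets the search skip up to `needle.len()` bytes per step
when the window's last byte does not occur in the needle, which is the
main source of speed in this family of algorithms.
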
Writing all of the testing scripts inside the .travis.yml file was
becoming painful, and parts of it were wrong by allowing for some
commands to fail without failing the entire build.
This also fixes the GitHub token (again).
This makes a few touch ups to benchmarks:
1. Add some regex-dna related benchmarks.
2. Change use of RUSTFLAGS="-C target-feature=+ssse3" to
RUSTFLAGS="-C target-cpu=native".
3. Switch order of parameters to regex-run-one benchmarking tool.
This uses the "Teddy" algorithm, as learned from the Hyperscan regular
expression library: https://01.org/hyperscan
This support is optional, subject to the following:
1. A nightly compiler.
2. Enabling the `simd-accel` feature.
3. Adding `RUSTFLAGS="-C target-feature=+ssse3"` when compiling.
It's too good to pass up. This time, we avoid quadratic behavior
with a simple work-around: we limit the amount of reverse searching
we do after having found a literal match. If the reverse search ends
at the beginning of its search text (whether a match or not), then we
stop the reverse suffix optimization and fall back to the standard forward
search.
This reverts commit 50d991eaf53e6c21b8101c82e01ab6cf36fe687c.
# Conflicts:
# src/exec.rs