56 Commits

Author SHA1 Message Date
Andrew Gallant
c01b633804
bench: add new benchmark baseline
I added this so that I can compare the results of the old benchmark
suite with the new one I'm working on in regex-automata. (The idea is to
port all or most of the benchmarks from the old suite and make sure the
results are at least roughly consistent.)
2022-07-01 09:15:49 -04:00
Andrew Gallant
ea525cd1bf
bench: remove D and C++ regex engines
Neither of them was particularly competitive and they make building the
benchmark harness more trouble than it's worth.
2022-07-01 09:15:49 -04:00
cui fliter
b5372864e2
doc: fix some typos
PR #856
2022-04-24 13:24:49 -04:00
Alex Touchet
b92ffd5471
cargo: use SPDX license format
We were previously using '/' to indicate the dual licensing
scheme, but I guess we're now supposed to use 'OR'.

PR #843
2022-03-03 07:31:45 -05:00
Andrew Gallant
e2860fe037 edition: manual fixups to code
This commit does a number of manual fixups to the code after the
previous two commits were done via 'cargo fix' automatically.

Actually, this contains more 'cargo fix' annotations, since I had
forgotten to add 'edition = "2018"' to all sub-crates.
2021-04-30 20:02:56 -04:00
Andrew Gallant
ccdcf27805 imp: use new memmem impl from memchr crate
This removes the ad hoc FreqyPacked searcher and the implementation of
Boyer-Moore, and replaces it with a new implementation of memmem in the
memchr crate. (Introduced in memchr 2.4.) Since memchr 2.4 also moves to
Rust 2018, we'll do the same in subsequent commits. (Finally.)

The benchmarks look about as expected. Latency on some of the smaller
benchmarks has worsened slightly by a nanosecond or two. The top
throughput speed has also decreased, and some other benchmarks
(especially ones with frequent literal matches) have improved
dramatically.
2021-04-30 20:02:56 -04:00
Andrew Gallant
691ec58171 bench: reduce huge regex a bit
It looks like it blows the default regex size limit at the moment.
2021-03-11 21:10:40 -05:00
Jeremy Stucki
8b0d2acacf style: use Once::new 2020-01-09 14:26:57 -05:00
Andrew Gallant
058a2e1fc1
bench: add regex compilation benchmarks
I don't remember why I disabled these (or even if I did it intentionally),
but bring them back.
2019-08-22 18:03:10 -04:00
Andrew Gallant
fc3e6aa19a
license: remove license headers from files
The Rust project determined these were unnecessary a while back[1,2,3]
and we follow suit.

[1] - 0565653eec
[2] - https://github.com/rust-lang/rust/pull/43498
[3] - https://github.com/rust-lang/rust/pull/57108
2019-08-03 14:47:45 -04:00
Andrew Gallant
0e96af4166
style: start using rustfmt 2019-08-03 14:20:22 -04:00
gnzlbg
3ab963e429
bench: improve error handling for benchmark script
Closes #591
2019-07-04 11:31:38 -04:00
Andrew Gallant
0a5beddafc
bench: slim down compile script
Some of the other regex implementations appear to be having trouble
compiling. This disables those for now.
2019-07-04 10:18:08 -04:00
Andrew Gallant
d4b9419ed4
1.1.0 2018-11-30 22:06:13 -05:00
Markus Westerlind
d51d23642f bench: add RegexSet benchmarks 2018-11-30 21:47:53 -05:00
Finkelman
4393476db5 update some deps mostly for minimal-versions 2018-08-16 13:57:07 -04:00
Andrew Gallant
d107c80dae
regex 1.0.1 2018-06-19 19:28:32 -04:00
Andrew Gallant
e455d53108 literal: auto enable SIMD on Rust stable 1.27+
This commit removes the need to use the `unstable` feature to enable
SIMD optimizations. We add a "version sniffer" to the `build.rs` script
to detect if Rust version 1.27 or newer is being used, and if so, enable
the SIMD optimizations.

The 'unstable' feature is now a no-op, but we keep it for backwards
compatibility. We also may use it again some day.
2018-06-19 18:13:24 -04:00
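The "version sniffer" idea in the commit above can be illustrated with a minimal sketch. This is not the code from the commit: the function names and the `regex_simd_sketch` cfg name are hypothetical, and the only assumptions are that `rustc --version` prints a line of the form `rustc MAJOR.MINOR.PATCH ...` and that Cargo exposes the compiler path to build scripts through the `RUSTC` environment variable.

```rust
use std::env;
use std::process::Command;

/// Parse a line such as "rustc 1.27.0 (3eda71b00 2018-06-19)" into
/// (major, minor). Returns None if the line has an unexpected shape.
fn parse_rustc_version(line: &str) -> Option<(u32, u32)> {
    let version = line.split_whitespace().nth(1)?;
    let mut parts = version.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// SIMD is treated as available on Rust 1.27 or newer (tuple comparison
/// is lexicographic, so (2, 0) also qualifies).
fn simd_available(version: (u32, u32)) -> bool {
    version >= (1, 27)
}

/// Ask the compiler for its version. Falls back to plain `rustc` when
/// the RUSTC variable is not set.
fn detect_rustc_version() -> Option<(u32, u32)> {
    let rustc = env::var("RUSTC").unwrap_or_else(|_| "rustc".to_string());
    let output = Command::new(rustc).arg("--version").output().ok()?;
    let stdout = String::from_utf8(output.stdout).ok()?;
    parse_rustc_version(&stdout)
}

/// What the `main` of a build script might call. The cfg name is a
/// hypothetical placeholder, not the one used by the regex crate.
fn emit_simd_cfg() {
    if detect_rustc_version().map_or(false, simd_available) {
        println!("cargo:rustc-cfg=regex_simd_sketch");
    }
}
```

If detection fails for any reason, the sketch simply emits nothing, so the crate would build without the optimization instead of failing the build.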
Andrew Gallant
b5ef0ec281
regex 1.0 2018-05-01 16:52:05 -04:00
Andrew Gallant
92e7baf584
regex-syntax 0.5.6 2018-05-01 13:28:53 -04:00
Andrew Gallant
2c7ae83b7b
bench: add up-to-date benchmarks
This includes D, C++/boost, C++/std, Oniguruma, PCRE1, PCRE2, RE2 and
Tcl.
2018-04-29 10:07:25 -04:00
Andrew Gallant
5d42006a31
bench: fixes for benchmarking harness
This forces the C++ benchmarks that use libc++ to use Clang, which is
apparently the only way it works?

We also disable a benchmark for D's compile time regexes that seems to
either never terminate or take exponential time.
2018-04-29 10:01:07 -04:00
Matthew Krupcale
4e3a107376
bench: add boost
This commit adds a new `re-boost` feature that enables benchmarking
Boost's regex implementation.

Closes #459
2018-04-28 12:22:04 -04:00
Matthew Krupcale
00a66ded28
bench: add libc++'s std::regex
This commit adds a new `libcxx` feature that enables testing libc++'s
implementation of `std::regex` when combined with the `re-stdcpp`
feature.

See also: https://libcxx.llvm.org/docs/UsingLibcxx.html
2018-04-28 12:22:02 -04:00
Matthew Krupcale
f9cd75c463
bench: add C++'s std::regex
This commit adds a new `re-stdcpp` feature to the benchmark runner that
enables benchmarking C++'s standard library regex implementation.
2018-04-28 12:22:02 -04:00
Andrew Gallant
361459c27f bench: remove RUSTFLAGS
We no longer need to enable SIMD optimizations at compile time. They are
automatically enabled when regex is compiled with the `unstable`
feature.
2018-03-12 22:32:53 -04:00
Andrew Gallant
91296ddcc0 teddy: port teddy searcher to std::arch
This commit ports the Teddy searcher to use std::arch and moves off the
portable SIMD vector API. Performance remains the same, and it looks
like the codegen is identical, which is great!

This also makes the `simd-accel` feature a no-op and adds a new
`unstable` feature which will enable the Teddy optimization. The `-C
target-feature` or `-C target-cpu` settings are no longer necessary,
since this will now do runtime target feature detection.

We also add a new `unstable` feature to the regex crate, which will
enable this new use of std::arch. Once enabled, the Teddy optimization
becomes available automatically without any additional compile time
flags.
2018-03-12 22:32:53 -04:00
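Runtime target feature detection, as mentioned in the commit above, can be sketched roughly as follows. The `Searcher` enum and `choose_searcher` function are hypothetical stand-ins and do not come from the regex crate; the only real API used is the standard library's `is_x86_feature_detected!` macro.

```rust
/// Which searcher a caller could pick at runtime. Both variants are
/// hypothetical placeholders for this sketch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Searcher {
    /// Stand-in for a SIMD searcher such as Teddy.
    Vectorized,
    /// Stand-in for a portable scalar searcher.
    Fallback,
}

/// Pick a searcher based on what the running CPU supports, so no
/// `-C target-feature` or `-C target-cpu` flag is needed at build time.
fn choose_searcher() -> Searcher {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // The check happens at runtime on the machine executing the code.
        if std::is_x86_feature_detected!("ssse3") {
            return Searcher::Vectorized;
        }
    }
    // Non-x86 targets, or x86 CPUs without SSSE3, end up here.
    Searcher::Fallback
}
```

The design choice this illustrates is that one binary can be shipped to machines with and without the feature, with the decision made when the program runs.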
Andrew Gallant
b3e5fd2dde regex: remove old regex-syntax crate
This commit does the mechanical changes necessary to remove the old
regex-syntax crate and replace it with the rewrite. The rewrite now
subsumes the `regex-syntax` crate name, and gets a semver bump to 0.5.0.
2018-03-07 19:01:24 -05:00
Andrew Gallant
43bb64b254
bench: small tweaks
This adds object files (produced by D compilers) to gitignore, and adds
RE2 to the benchmark compilation script by default.
2018-03-04 09:23:56 -05:00
Andrew Gallant
f0b92ca277
bench: update to memmap 0.6 2018-02-17 22:14:47 -05:00
Andrew Gallant
2dee2fe3f2
bench: add logs 2018-02-08 18:14:47 -05:00
Robert Clipsham
ed174dfd41 Add benchmarks for D's ctRegex 2018-01-01 10:35:45 -05:00
Robert Clipsham
49f2a3dae5 Add d-phobos bench feature to reduce duplication 2018-01-01 10:35:45 -05:00
Andrew Gallant
fe9d82be0f ci: try to improve build times 2018-01-01 09:21:07 -05:00
Robert Clipsham
9c790659c4 Add support for benchmarking D's std.regex
This commit adds support for benchmarking the runtime version of the D
programming language's std.regex using the dmd and ldc compilers.

Closes #430
2017-12-31 18:11:48 -05:00
Ethan Pailes
918d4a0cdd search: skip dfa for anchored pats with captures
The DFA can't produce captures, but is still faster than the Pike VM
NFA, so the normal approach to finding capture groups is to look for
the entire match with the DFA and then run the NFA on the substring
of the input that matched. In cases where the regex is anchored, the
match always starts at the beginning of the input, so there is never
any point to trying the DFA first.

The DFA can still be useful for rejecting inputs which are not in the
language of the regular expression, but anchored regex with capture
groups are most commonly used in a parsing context, so it seems like a
fair trade-off.

Fixes #348
2017-12-30 15:37:41 -05:00
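The decision described in the commit above can be summarized in a small sketch. The `Plan` enum and `plan` function are hypothetical names made up for illustration and do not mirror the crate's internals; they only encode the rule that an anchored pattern with captures goes straight to the NFA.

```rust
/// Hypothetical search plans for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum Plan {
    /// No captures requested: the DFA's match bounds are enough.
    DfaOnly,
    /// Find the match bounds with the DFA, then run the capturing NFA
    /// on just the matched substring.
    DfaThenNfa,
    /// Run the capturing NFA directly from the start of the input.
    NfaOnly,
}

/// Choose a plan from two properties of the search.
fn plan(wants_captures: bool, anchored_start: bool) -> Plan {
    match (wants_captures, anchored_start) {
        (false, _) => Plan::DfaOnly,
        // The match can only begin at offset 0, so the DFA pass would
        // not narrow the start position for the NFA.
        (true, true) => Plan::NfaOnly,
        (true, false) => Plan::DfaThenNfa,
    }
}
```

As the commit notes, the trade-off is that non-matching inputs are no longer rejected quickly by the DFA in the anchored case.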
Ethan Pailes
5aa347a136 docs: fix dangling references to run-bench
4fab6c added the current bench runner script as `benches/run`, and
removed the old `run-bench` script. It was later renamed to `bench/run`
when `benches` was renamed to `bench` in b217bf. This patch fixes a few
references to the old benchmark runner in the hacking guide as well
as a few references to the old directory structure. The cargo plugin
syntax in the example is also updated.
2017-12-30 15:37:41 -05:00
Andrew Gallant
65c4f8ee1f docs: link to docs.rs 2017-12-30 15:37:41 -05:00
Andrew Gallant
00f30ee02a bench: update the benchmark runner
This updates dependencies and makes sure everything compiles and runs.
This also simplifies the build script.
2017-12-30 15:37:41 -05:00
Andrew Gallant
2f1e5b0e10 deps: setup workspace
There are a few sub-crates in this repository, so sharing a target
directory makes sense.
2017-12-30 15:37:41 -05:00
Andrew Gallant
0375954389 regex_macros: delete it
The regex_macros crate hasn't been maintained in quite some time, and has
been broken. Nobody has complained. Given the fact that there are no
immediate plans to improve the situation, and the fact that it is slower
than the runtime engine, we simply remove it.
2017-12-30 15:37:41 -05:00
Ethan Pailes
d5be8391ca Add an implementation of Tuned Boyer-Moore.
While the existing literal string searching algorithm
leveraging memchr is quite fast, in some cases more
traditional approaches still make sense. This patch
provides an implementation of Tuned Boyer-Moore as
laid out in Fast String Searching by Hume & Sunday.
Some refinements to their work were gleaned from the
grep source.

See: https://github.com/rust-lang/regex/issues/408
See: https://github.com/BurntSushi/ripgrep/issues/617
2017-12-09 08:48:03 -05:00
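For readers unfamiliar with the Boyer-Moore family, the following is a minimal sketch of the simpler Boyer-Moore-Horspool variant. It is not the Tuned Boyer-Moore implementation from the commit above and omits the refinements the commit mentions; it only shows the shared core idea of a precomputed skip table keyed on the byte aligned with the end of the needle. The function name `horspool_find` is made up for this example.

```rust
/// Find the first occurrence of `needle` in `haystack` using the
/// Boyer-Moore-Horspool skip rule. Returns the starting byte offset.
fn horspool_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let n = needle.len();
    if n == 0 {
        return Some(0);
    }
    if haystack.len() < n {
        return None;
    }
    // skip[b] is how far the window may shift when byte `b` is aligned
    // with the last needle position. Bytes absent from the needle (or
    // present only as its final byte) allow a full shift of `n`.
    let mut skip = [n; 256];
    for (i, &b) in needle[..n - 1].iter().enumerate() {
        skip[b as usize] = n - 1 - i;
    }
    let mut pos = 0;
    while pos + n <= haystack.len() {
        if &haystack[pos..pos + n] == needle {
            return Some(pos);
        }
        // Shift based on the haystack byte under the needle's last byte.
        let last = haystack[pos + n - 1];
        pos += skip[last as usize];
    }
    None
}
```

For example, `horspool_find(b"hello world", b"world")` inspects only three windows before finding the match, because mismatching windows are skipped in steps larger than one byte.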
Andrew Gallant
df48ddc79d update benchmarks 2017-02-08 19:07:58 -05:00
Andrew Gallant
c7bc06f8d4 Reorganize CI testing.
Writing all of the testing scripts inside the .travis.yml file was
becoming painful, and parts of it were wrong by allowing for some
commands to fail without failing the entire build.

This also fixes the GitHub token (again).
2017-01-02 16:50:48 -05:00
Andrew Gallant
ac3ab6d21b Bump versions everywhere and update CHANGELOG.
Fixes #296, Fixes #307
2016-12-31 17:01:54 -05:00
Andrew Gallant
d44a9f94ab Switch bytes::Regex to using Unicode mode by default. 2016-12-30 01:05:43 -05:00
Andrew Gallant
623132526c Touch up benchmarks.
This makes a few touch ups to benchmarks:

1. Add some regex-dna related benchmarks.
2. Change use of RUSTFLAGS="-C target-feature=+ssse3" to
   RUSTFLAGS="-C target-cpu=native".
3. Switch order of parameters to regex-run-one benchmarking tool.
2016-06-17 04:54:53 -04:00
Andrew Gallant
203c509df9 Add SIMD accelerated multiple pattern search.
This uses the "Teddy" algorithm, as learned from the Hyperscan regular
expression library: https://01.org/hyperscan

This support is optional, subject to the following:

1. A nightly compiler.
2. Enabling the `simd-accel` feature.
3. Adding `RUSTFLAGS="-C target-feature=+ssse3"` when compiling.
2016-05-18 10:48:13 -04:00
Andrew Gallant
7038f5c430 small cleanups 2016-05-06 20:20:05 -04:00
Andrew Gallant
37b6d318c0 Reintroduce the reverse suffix literal optimization.
It's too good to pass up. This time, we avoid quadratic behavior
with a simple work-around: we limit the amount of reverse searching
we do after having found a literal match. If the reverse search ends
at the beginning of its search text (whether a match or not), then we
stop the reverse suffix optimization and fall back to the standard forward
search.

This reverts commit 50d991eaf53e6c21b8101c82e01ab6cf36fe687c.

# Conflicts:
#	src/exec.rs
2016-05-06 18:00:02 -04:00