This removes the ad hoc FreqyPacked searcher and the implementation of
Boyer-Moore, and replaces it with a new implementation of memmem in the
memchr crate. (Introduced in memchr 2.4.) Since memchr 2.4 also moves to
Rust 2018, we'll do the same in subsequent commits. (Finally.)
The benchmarks look about as expected. Latency on some of the smaller
benchmarks has worsened by a nanosecond or two, and the top throughput
speed has also decreased. On the other hand, some other benchmarks
(especially ones with frequent literal matches) have improved
dramatically.
This commit removes the thread_local dependency (even as an optional
dependency) and replaces it with a more purpose-driven memory pool. The
comments in src/pool.rs explain this in more detail, but the short story
is that thread_local seems to be at the root of some memory leaks
happening in certain usage scenarios.
The great thing about thread_local though is how fast it is. Using a
simple Mutex<Vec<T>> is easily at least twice as slow. We work around
that a bit by coding a simplistic fast path for the "owner" of a pool.
This does require one new use of `unsafe`, which we document
extensively.
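
As a rough illustration of the slow path only, here is a minimal sketch
of a `Mutex<Vec<T>>`-style pool. The names `Pool`, `PoolGuard` and `idle`
are hypothetical and are not the code in src/pool.rs. The sketch
deliberately omits the "owner" fast path, so it needs no `unsafe`.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::Mutex;

/// Hypothetical, simplified pool: a stack of reusable values behind a
/// mutex. Every `get` takes the lock; there is no owner fast path.
pub struct Pool<T> {
    stack: Mutex<Vec<Box<T>>>,
    create: Box<dyn Fn() -> T + Send + Sync>,
}

impl<T> Pool<T> {
    pub fn new(create: impl Fn() -> T + Send + Sync + 'static) -> Pool<T> {
        Pool { stack: Mutex::new(Vec::new()), create: Box::new(create) }
    }

    /// Take a value out of the pool, creating a new one if none is idle.
    pub fn get(&self) -> PoolGuard<'_, T> {
        // The lock is released at the end of this statement, before
        // `create` is called.
        let popped = self.stack.lock().unwrap().pop();
        let value = match popped {
            Some(value) => value,
            None => Box::new((self.create)()),
        };
        PoolGuard { pool: self, value: Some(value) }
    }

    /// Number of idle values currently stored in the pool.
    pub fn idle(&self) -> usize {
        self.stack.lock().unwrap().len()
    }
}

/// Hands the value back to the pool when dropped.
pub struct PoolGuard<'a, T> {
    pool: &'a Pool<T>,
    value: Option<Box<T>>,
}

impl<'a, T> Deref for PoolGuard<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.value.as_deref().unwrap()
    }
}

impl<'a, T> DerefMut for PoolGuard<'a, T> {
    fn deref_mut(&mut self) -> &mut T {
        self.value.as_deref_mut().unwrap()
    }
}

impl<'a, T> Drop for PoolGuard<'a, T> {
    fn drop(&mut self) {
        if let Some(value) = self.value.take() {
            self.pool.stack.lock().unwrap().push(value);
        }
    }
}
```

Because every `get` and every drop takes the lock, this shape pays the
synchronization cost that the owner fast path is meant to avoid. Values
are returned to the pool as-is, so callers must reset any state they
care about.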
This now makes the 'perf-cache' feature a no-op. We of course retain it
for compatibility purposes (and perhaps it will be used again in the
future), but for now, we always use the same pool.
As for benchmarks, it is likely that *some* cases will get a hair
slower. But there shouldn't be any dramatic difference. A careful review
of micro-benchmarks in addition to more holistic (albeit ad hoc)
benchmarks via ripgrep seems to confirm this.
Now that we have more explicit control over the memory pool, we also
clean stuff up with respect to RefUnwindSafe.
Fixes #362, Fixes #576
Ref https://github.com/BurntSushi/rure-go/issues/3
The quickcheck update seems to have sussed out a bug in our DFA logic
regarding the encoding of NFA state IDs. But the bug seems unlikely to
occur in real code, so we massage the test data for now until the lazy
DFA gets moved into regex-automata.
Our use of doc_comment relies on `cfg(doctest)`, which wasn't
stabilized until Rust 1.40.
Interestingly, it compiled on Rust 1.28, but didn't compile on, e.g.,
Rust 1.39. This breaks our MSRV policy, so we unfortunately remove the
use of doc_comment for now. It's likely possible to conditionally
enable it, but the extra build script required to do version sniffing to
do it doesn't seem worth it.
Fixes #684, Fixes #685
This commit tweaks the features enabled for the `regex-syntax` crate
from the `regex` crate itself. This isn't intended to actually have any
functional change, but should help feature unification for Cargo in some
projects.
One project I work on exhibits an issue where executing `cargo build`
followed by `cargo test` will rebuild `regex-syntax` and all of its
transitive dependencies. The cause for this issue is that the tests are
using the `proptest` crate. The `proptest` crate depends on
`regex-syntax` with normal features (i.e. the defaults). All other
crates depend on `regex` with normal default features too.
The problem is that when *only* the `regex` crate depends on
`regex-syntax` then the `default` and `unicode` features of
`regex-syntax` are disabled. This is because the `regex` crate disables
default features and `regex`'s `unicode` feature delegates to all the
individual features of `regex-syntax`. When the `regex-syntax` crate is
depended on directly by `proptest` it then enables the `default` and
`unicode` features of `regex-syntax`.
Functionally these two builds of `regex-syntax` are exactly the same
since `default` is simply a proxy for `unicode` and `unicode` is simply
an umbrella including other features.
This PR updates the features enabled on `regex-syntax` by the `regex`
crate in two ways:
* The `default` feature for `regex` enables `regex-syntax/default`.
* The `unicode` feature for `regex` enables the `regex-syntax/unicode`
feature.
This makes it so that if another crate in your crate graph depends on
`regex-syntax` then it'll have, by default, the same set of features
enabled as if you also depend on `regex`.
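
To make the reasoning above concrete, here is a small model of feature
unification. It is only a sketch. The `resolve` and `unify` helpers are
hypothetical and are not how Cargo is implemented, and the individual
`unicode-*` feature names are assumed from `regex-syntax`'s manifest. It
compares the `regex-syntax` feature set in a graph with only `regex`
(as in `cargo build`) against one that also includes `proptest` (as in
`cargo test`).

```rust
use std::collections::BTreeSet;

/// Individual Unicode features of `regex-syntax` (assumed names).
const UNICODE_PARTS: &[&str] = &[
    "unicode-age", "unicode-bool", "unicode-case", "unicode-gencat",
    "unicode-perl", "unicode-script", "unicode-segment",
];

/// What `regex` requested before this change: only the individual features.
const REGEX_BEFORE: &[&str] = UNICODE_PARTS;
/// What `regex` requests after this change.
const REGEX_AFTER: &[&str] = &["default", "unicode"];
/// What `proptest` requests: the defaults.
const PROPTEST: &[&str] = &["default"];

/// Expand requested features: `default` implies `unicode`, and
/// `unicode` implies every individual Unicode feature.
fn resolve(requested: &[&'static str]) -> BTreeSet<&'static str> {
    let mut enabled = BTreeSet::new();
    let mut todo: Vec<&'static str> = requested.to_vec();
    while let Some(feature) = todo.pop() {
        if !enabled.insert(feature) {
            continue; // already expanded
        }
        match feature {
            "default" => todo.push("unicode"),
            "unicode" => todo.extend(UNICODE_PARTS.iter().copied()),
            _ => {}
        }
    }
    enabled
}

/// Union the requests of every dependent, then expand them.
fn unify(dependents: &[&[&'static str]]) -> BTreeSet<&'static str> {
    let all: Vec<&'static str> =
        dependents.iter().flat_map(|d| d.iter().copied()).collect();
    resolve(&all)
}
```

In this model, `unify(&[REGEX_BEFORE])` and
`unify(&[REGEX_BEFORE, PROPTEST])` differ only by the `default` and
`unicode` names, which is enough to trigger a rebuild. With
`REGEX_AFTER`, both graphs resolve to the same set.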
PR #665
Historically, `/` was used throughout the ecosystem to separate
licenses, but the actual SPDX standard requires this to be an OR. The
standard also has AND, and for this reason `/` is ambiguous.
https://github.com/rust-lang/cargo/pull/4898 contains some more details
about this.
PR #615
This commit enables support for the perf-literal feature. When it's
disabled, no literal optimizations will be performed. Instead, only
the regex engine itself is used.
In practice, it's quite plausible that we don't need to disable *all*
literal optimizations. But that is the simplest path here, and I don't
have the stomach to do anything more with the current code. src/exec.rs
has turned into a giant soup.
This commit sets up the infrastructure for supporting various `unicode`
and `perf` features, which permit decreasing binary size, compile times
and the size of the dependency tree.
Most of the work here is in modifying the regex tests to make them
work in concert with the available Unicode features. In cases where
Unicode is irrelevant, we just turn it off. In other cases, we require
the Unicode features to run the tests.
This also introduces a new error in the compiler whereby if a Unicode
word boundary is used, but the `unicode-perl` feature is disabled, then
the regex will fail to compile. (Because the necessary data to match
Unicode word boundaries isn't available.)
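
The shape of that check can be sketched as follows. All names here
(`CompileError`, `WordBoundary`, `check_word_boundary`) are hypothetical
and are not the compiler's actual types. The sketch only assumes that
the decision depends on whether the `unicode-perl` Cargo feature is
enabled.

```rust
/// Hypothetical error for a construct whose Unicode data is missing.
#[derive(Debug, PartialEq)]
enum CompileError {
    UnicodeWordBoundaryUnavailable,
}

/// The two flavors of word boundary a pattern can ask for.
#[derive(Debug, PartialEq, Clone, Copy)]
enum WordBoundary {
    Ascii,
    Unicode,
}

/// Reject a Unicode word boundary when the data for it is not compiled
/// in. ASCII word boundaries never need that data.
fn check_word_boundary(
    kind: WordBoundary,
    unicode_perl: bool,
) -> Result<WordBoundary, CompileError> {
    match kind {
        WordBoundary::Unicode if !unicode_perl => {
            Err(CompileError::UnicodeWordBoundaryUnavailable)
        }
        other => Ok(other),
    }
}

/// Same check, driven by the Cargo feature at compile time.
fn compile_word_boundary(
    kind: WordBoundary,
) -> Result<WordBoundary, CompileError> {
    check_word_boundary(kind, cfg!(feature = "unicode-perl"))
}
```

Failing at regex compile time, rather than silently falling back to
ASCII semantics, keeps the meaning of a pattern independent of which
features happen to be enabled.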