The regex compiler will happily attempt to compile '(?:){294967295}' by
compiling the empty sub-expression 294,967,295 times. Empty
sub-expressions don't use any memory in the current implementation, so
this doesn't trigger the pre-existing machinery for stopping compilation
early if the regex object gets too big. The end result is that while
compilation will eventually succeed, it takes a very long time to do so.
In this commit, we fix this problem by adding a fake amount of memory
every time we compile an empty sub-expression. It turns out we were
already tracking an additional amount of indirect heap usage via
'extra_inst_bytes' in the compiler, so we just make it look like
compiling an empty sub-expression actually adds an additional 'Inst' to
the compiled regex object.
This has the effect of causing the regex compiler to reject this sort of
regex in a reasonable amount of time by default.
Many thanks to @VTCAKAVSMoACE for reporting this, providing valuable
test cases and continuing to test this patch as it was developed.
Fixes https://github.com/rust-lang/regex/security/advisories/GHSA-m5pq-gvj9-9vr8
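The accounting trick can be sketched as follows. This is a simplified
model, not the actual compiler: the `Compiler` struct, the `size_limit`
value and the `Inst` layout here are illustrative stand-ins for the
real internals, with only the name `extra_inst_bytes` taken from the
commit message above.

```rust
use std::mem::size_of;

// Stand-in for a compiled instruction; only its size matters here.
struct Inst([usize; 4]);

struct Compiler {
    size_limit: usize,
    insts: Vec<Inst>,
    // Indirect heap usage not reflected in `insts` itself.
    extra_inst_bytes: usize,
}

impl Compiler {
    fn check_size(&self) -> Result<(), &'static str> {
        let used = self.insts.len() * size_of::<Inst>() + self.extra_inst_bytes;
        if used > self.size_limit {
            Err("compiled regex exceeds size limit")
        } else {
            Ok(())
        }
    }

    // Compiling an empty sub-expression emits no instructions, so we
    // pretend it added one `Inst` worth of memory anyway. This makes
    // pathological repetitions like '(?:){294967295}' trip the size
    // limit quickly instead of churning for a very long time.
    fn compile_empty(&mut self) -> Result<(), &'static str> {
        self.extra_inst_bytes += size_of::<Inst>();
        self.check_size()
    }
}

fn main() {
    let mut c = Compiler {
        size_limit: 10 * (1 << 20), // a 10 MB limit, for illustration
        insts: Vec::new(),
        extra_inst_bytes: 0,
    };
    // Simulate compiling the empty sub-expression 294,967,295 times;
    // the size check trips long before the loop finishes.
    let mut result = Ok(());
    for _ in 0..294_967_295u64 {
        result = c.compile_empty();
        if result.is_err() {
            break;
        }
    }
    assert!(result.is_err());
    println!("rejected: {:?}", result);
}
```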
This fixes compilation when the 'pattern' feature
is enabled. This wasn't previously tested since it
is a nightly only feature. But in this commit, we
add it to CI explicitly.
PR #772
When only the unicode-perl feature is enabled, regex-syntax would fail
to build. It turns out that 'cargo fix' doesn't actually fix all
imports. It looks like it only fixes things that it can build in the
current configuration.
Fixes #769, Fixes #770
One of the things the lazy DFA can't handle is Unicode word boundaries,
since they require multi-byte look-around. However, it turns out that
on pure ASCII text, Unicode word boundaries are equivalent to ASCII
word boundaries. So the DFA has a heuristic: it treats Unicode word
boundaries as ASCII word boundaries until it sees a non-ASCII byte.
When it does, it quits, and some other (slower) regex engine needs to
take over.
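A toy model of that heuristic (purely illustrative, not the real DFA;
`count_word_boundaries` is a hypothetical function):

```rust
fn is_ascii_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Counts \b matches using only the cheap ASCII word-boundary test.
/// On pure ASCII input this agrees exactly with the Unicode
/// definition. The moment a non-ASCII byte shows up, the heuristic no
/// longer applies and we return None, modeling the DFA quitting and
/// handing the haystack to a slower engine.
fn count_word_boundaries(haystack: &[u8]) -> Option<usize> {
    let mut count = 0;
    let mut prev_is_word = false;
    for &b in haystack {
        if b >= 0x80 {
            // Non-ASCII byte: give up.
            return None;
        }
        let cur_is_word = is_ascii_word_byte(b);
        if cur_is_word != prev_is_word {
            count += 1;
        }
        prev_is_word = cur_is_word;
    }
    if prev_is_word {
        count += 1; // boundary at the end of the haystack
    }
    Some(count)
}

fn main() {
    // "abc def" has word boundaries at offsets 0, 3, 4 and 7.
    assert_eq!(count_word_boundaries(b"abc def"), Some(4));
    // Any non-ASCII byte forces a fallback.
    assert_eq!(count_word_boundaries("café".as_bytes()), None);
    println!("ok");
}
```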
In a bug report against ripgrep[1], it was discovered that the lazy DFA
was quitting and falling back to a slower engine even though the
haystack was pure ASCII.
It turned out that our equivalence byte class optimization was at fault.
Namely, a '{' (which appears very frequently in the input) was being
grouped in with other non-ASCII bytes. So whenever the DFA saw it, it
treated it as a non-ASCII byte and thus stopped.
The fix for this is simple: when we see a Unicode word boundary in the
compiler, we set a boundary on our byte classes such that ASCII bytes
are guaranteed to be in a different class from non-ASCII bytes. And
indeed, this fixes the performance problem reported in [1].
[1] - https://github.com/BurntSushi/ripgrep/issues/1860
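The byte-class fix can be sketched roughly like this. `ByteClassSet`
here is a simplified illustration of equivalence byte classes, not the
actual implementation:

```rust
/// Toy model of equivalence byte classes: bytes are partitioned into
/// classes by marking "boundaries"; a boundary at byte `b` means `b`
/// and `b + 1` land in different classes. The fix described above
/// amounts to always marking a boundary at 0x7F when a Unicode word
/// boundary appears, guaranteeing that every ASCII byte lands in a
/// different class from every non-ASCII byte.
struct ByteClassSet([bool; 256]);

impl ByteClassSet {
    fn new() -> ByteClassSet {
        ByteClassSet([false; 256])
    }

    fn set_boundary(&mut self, b: u8) {
        self.0[b as usize] = true;
    }

    /// Map each of the 256 bytes to its equivalence class ID.
    fn byte_classes(&self) -> [u8; 256] {
        let mut classes = [0u8; 256];
        let mut class = 0u8;
        for b in 0..256usize {
            classes[b] = class;
            if self.0[b] && b < 255 {
                class += 1;
            }
        }
        classes
    }
}

fn main() {
    let mut set = ByteClassSet::new();
    // The fix: force a class boundary between ASCII and non-ASCII.
    set.set_boundary(0x7F);
    let classes = set.byte_classes();
    // '{' now shares a class with other ASCII bytes...
    assert_eq!(classes[b'{' as usize], classes[b'a' as usize]);
    // ...but never with a non-ASCII byte, so the DFA keeps running.
    assert_ne!(classes[b'{' as usize], classes[0x80]);
    println!("ok");
}
```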
This was long overdue, and we were motivated by memchr's move to Rust
2018 in https://github.com/BurntSushi/memchr/pull/82.
Rust 1.41.1 was selected because it's the current version of Rust in
Debian Stable. It also feels old enough to assure wide support.
This commit does a number of manual fixups to the code after the
previous two commits were done via 'cargo fix' automatically.
Actually, this contains more 'cargo fix' annotations, since I had
forgotten to add 'edition = "2018"' to all sub-crates.
This removes the ad hoc FreqyPacked searcher and the implementation of
Boyer-Moore, and replaces it with a new implementation of memmem in the
memchr crate. (Introduced in memchr 2.4.) Since memchr 2.4 also moves to
Rust 2018, we'll do the same in subsequent commits. (Finally.)
The benchmarks look about as expected. Latency on some of the smaller
benchmarks has worsened slightly, by a nanosecond or two, and top
throughput speed has also decreased a bit, but other benchmarks
(especially ones with frequent literal matches) have improved
dramatically.
This improves the precision of the "expression too big" regex
compilation error. Previously, it was not considering the heap usage
from Unicode character classes.
It's possible this will make some regexes fail to compile that
previously compiled. However, this is a bug fix. If you do wind up
seeing this, feel free to file an issue, since it would be good to get
an idea of what kinds of regexes no longer compile but previously did.
This was found by OSS-fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33579
By using a boxed slice instead of a vector, we can shrink the size of
the `Inst` structure by 8 bytes, going from 40 to 32 bytes on 64-bit
platforms.
PR #760
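The saving comes from the capacity field: a `Vec<T>` is three words
(pointer, length, capacity), while a `Box<[T]>` is a two-word fat
pointer (pointer, length). A quick demonstration; the struct names are
illustrative stand-ins, not the real `Inst` definition:

```rust
use std::mem::size_of;

// Holding ranges in a Vec costs three words...
struct VecVariant {
    _ranges: Vec<(char, char)>,
}

// ...while a boxed slice costs two, since a Box<[T]> is just a fat
// pointer and has no separate capacity to track.
struct BoxedVariant {
    _ranges: Box<[(char, char)]>,
}

fn main() {
    println!("Vec variant:   {} bytes", size_of::<VecVariant>());
    println!("Boxed variant: {} bytes", size_of::<BoxedVariant>());
    // The saving is exactly one word: the dropped capacity field.
    assert_eq!(
        size_of::<VecVariant>() - size_of::<BoxedVariant>(),
        size_of::<usize>()
    );
}
```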
This commit fixes a fairly large regression in the stack size of a Regex
introduced in regex 1.4.4. When I dropped thread_local and replaced it
with Pool, it turned out that Pool inlined a T into its struct and a
Regex in turn had Pool inlined into itself. It further turns out that
the T=ProgramCache is itself quite large.
We fix this by introducing an indirection in the inner regex type. That
is, we use a Box<Pool> instead of a Pool. This shrinks the size of a
Regex from 856 bytes to 16 bytes.
Interestingly, prior to regex 1.4.4, a Regex was still quite substantial
in size, coming in at around 552 bytes. So it looks like the 1.4.4
release didn't dramatically increase it, but it increased it enough that
folks started experiencing real problems: stack overflows.
Since indirection can lead to worse locality and performance loss, I did
run the benchmark suite. I couldn't see any measurable difference. This
is generally what I would expect. This is an indirection at a fairly
high level. There's lots of other indirection already, and this
indirection isn't accessed in a hot path. (The regex cache itself is of
course used in hot paths, but by the time we get there, we have already
followed this particular pointer.)
We also include a regression test that asserts a Regex (and company) are
16 bytes in size. While this isn't an API guarantee, it at least means
that increasing the size of Regex will be an intentional thing in the
future and not an accidental leakage of implementation details.
Fixes #750, Fixes #751
Ref https://github.com/servo/servo/pull/28269
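The effect of the indirection can be illustrated with a toy version of
that size check. The type names here are stand-ins, not the real
types: `BigCache` plays the role of the pool of `ProgramCache` values
that was inlined into the regex struct.

```rust
use std::mem::size_of;

// Stand-in for the large per-thread cache that was being inlined.
struct BigCache {
    _bytes: [u8; 840],
}

// Inlining the cache makes the outer struct huge...
struct RegexInline {
    _pool: BigCache,
    _other: usize,
}

// ...while a Box is a single thin pointer no matter how big its
// contents are, so the boxed version stays pointer-sized.
struct RegexBoxed {
    _pool: Box<BigCache>,
    _other: usize,
}

fn main() {
    println!("inline: {} bytes", size_of::<RegexInline>());
    println!("boxed:  {} bytes", size_of::<RegexBoxed>());
    assert!(size_of::<RegexInline>() > 800);
    // Two words total: the Box pointer plus the other field.
    assert_eq!(size_of::<RegexBoxed>(), 2 * size_of::<usize>());
}
```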
This commit removes the thread_local dependency (even as an optional
dependency) and replaces it with a more purpose driven memory pool. The
comments in src/pool.rs explain this in more detail, but the short story
is that thread_local seems to be at the root of some memory leaks
happening in certain usage scenarios.
The great thing about thread_local though is how fast it is. Using a
simple Mutex<Vec<T>> is easily at least twice as slow. We work around
that a bit by coding a simplistic fast path for the "owner" of a pool.
This does require one new use of `unsafe`, which we document
extensively.
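The slow path amounts to a mutex-protected free list, roughly like the
sketch below. This is illustrative only: the real pool layers the
owner-thread fast path (the part needing `unsafe`) on top of something
like this.

```rust
use std::sync::Mutex;

/// A minimal memory pool: a mutex-protected stack of values, with a
/// fresh value created whenever the stack is empty. Values handed
/// back via put() are reused by later get() calls.
struct Pool<T> {
    create: fn() -> T,
    stack: Mutex<Vec<T>>,
}

impl<T> Pool<T> {
    fn new(create: fn() -> T) -> Pool<T> {
        Pool { create, stack: Mutex::new(Vec::new()) }
    }

    fn get(&self) -> T {
        match self.stack.lock().unwrap().pop() {
            Some(v) => v,
            None => (self.create)(),
        }
    }

    fn put(&self, v: T) {
        self.stack.lock().unwrap().push(v);
    }
}

fn main() {
    // A pool of scratch buffers (a stand-in for the real per-thread
    // regex caches).
    let pool = Pool::new(|| Vec::<u8>::with_capacity(1024));
    let mut buf = pool.get();
    buf.push(b'x');
    buf.clear();
    pool.put(buf);
    assert_eq!(pool.stack.lock().unwrap().len(), 1);
    println!("ok");
}
```

The contention on the mutex is what makes this roughly twice as slow
as thread_local in the uncontended case, hence the owner fast path.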
This now makes the 'perf-cache' feature a no-op. We of course retain it
for compatibility purposes (and perhaps it will be used again in the
future), but for now, we always use the same pool.
As for benchmarks, it is likely that *some* cases will get a hair
slower. But there shouldn't be any dramatic difference. A careful review
of micro-benchmarks in addition to more holistic (albeit ad hoc)
benchmarks via ripgrep seems to confirm this.
Now that we have more explicit control over the memory pool, we also
clean stuff up with respect to RefUnwindSafe.
Fixes #362, Fixes #576
Ref https://github.com/BurntSushi/rure-go/issues/3