third_party_rust_regex

mirror of https://gitee.com/openharmony/third_party_rust_regex synced 2025-04-17 18:10:26 +00:00

Author	SHA1	Message	Date
Andrew Gallant	2b1fc2772d	regex-debug: add character count This adds a total character count to the output of the utf8-ranges sub-command.	2018-03-18 08:57:27 -04:00
Andrew Gallant	47d1aeeb89	0.2.10	2018-03-16 11:39:23 -04:00
Andrew Gallant	c84bc41e5a	unstable: update to latest std::arch This replaces `is_target_feature_detected!` with `is_x86_feature_detected!` and adds the `cfg_target_feature` required for using said macro.	2018-03-15 12:32:41 -04:00
Andrew Gallant	dba7f3b041	regex-syntax-0.5.3	2018-03-13 21:44:49 -04:00
Andrew Gallant	97651fb604	syntax/hir: add a printer for HIR This adds a printer for the high-level intermediate representation. The regex it prints is valid, and can be used as a way to turn it into a regex::Regex.	2018-03-13 21:44:08 -04:00
Andrew Gallant	c230e59468	syntax/hir: fix handling of ASCII word boundaries Previously, we had some inconsistencies in how we were handling ASCII word boundaries. In particular, the translator was accepting a negated ASCII word boundary even if the caller didn't disable the UTF-8 invariant. This is wrong, since a negated ASCII word boundary can match between any two arbitrary bytes. However, fixing this is a breaking change, so for now we document the bug. We plan to fix it with regex 1.0. See #457. Additionally, we were incorrectly declaring that an ASCII word boundary matched invalid UTF-8 via the Hir::is_always_utf8 property. An ASCII word boundary must always match an ASCII byte on one side, which implies a valid UTF-8 position.	2018-03-13 21:44:08 -04:00
Andrew Gallant	c7c7a43827	style: reword ast::print docs Also, small formatting fix and removal of debugging test.	2018-03-13 21:44:08 -04:00
Andrew Gallant	37379b09dc	0.2.9	2018-03-12 22:36:49 -04:00
Andrew Gallant	3e87082374	changelog: 0.2.9	2018-03-12 22:36:30 -04:00
Andrew Gallant	27ed3fa9fa	doc: note the new `unstable` feature	2018-03-12 22:32:53 -04:00
Andrew Gallant	04e2930206	ci: remove RUSTFLAGS, enable unstable This removes our compile time SIMD flags and replaces them with the `unstable` feature, which will cause CI to use whatever CPU features are available. Ideally, we would test each important CPU feature combination, but I'd like to avoid doing that in one CI job and instead split them out into separate CI jobs to keep CI times low. That requires more work.	2018-03-12 22:32:53 -04:00
Andrew Gallant	361459c27f	bench: remove RUSTFLAGS We no longer need to enable SIMD optimizations at compile time. They are automatically enabled when regex is compiled with the `unstable` feature.	2018-03-12 22:32:53 -04:00
Andrew Gallant	f962ddbff0	teddy: port teddy searcher to AVX2 This commit adds a copy of the Teddy searcher that works on AVX2. We don't attempt to reuse any code between them just yet, and instead just copy & paste and tweak parts of it to work on 32 bytes instead of 16. (Some parts were trickier than others. For example, @jneem figured out how to nearly compensate for the lack of a real 256-bit bytewise PALIGNR instruction, which we borrow here.) Overall, AVX2 provides a nice bump in performance.	2018-03-12 22:32:53 -04:00
Andrew Gallant	91296ddcc0	teddy: port teddy searcher to std::arch This commit ports the Teddy searcher to use std::arch and moves off the portable SIMD vector API. Performance remains the same, and it looks like the codegen is identical, which is great! This also makes the `simd-accel` feature a no-op and adds a new `unstable` feature which will enable the Teddy optimization. The `-C target-feature` or `-C target-cpu` settings are no longer necessary, since this will now do runtime target feature detection. We also add a new `unstable` feature to the regex crate, which will enable this new use of std::arch. Once enabled, the Teddy optimizations become available automatically without any additional compile time flags.	2018-03-12 22:32:53 -04:00
Andrew Gallant	0baa9bf859	gitignore: add tmp dir	2018-03-12 22:32:53 -04:00
Andrew Gallant	a3c0510711	regex-syntax-0.5.2	2018-03-12 09:49:20 -04:00
Andrew Gallant	102458feff	syntax: fix trailing - bug This fixes a bug in the parser where a regex like `(?x)[ / - ]` would fail to parse. In particular, since whitespace insensitive mode is enabled, this regex should be equivalent to `[/-]`, where the `-` is treated as a literal `-` instead of a range since it is the last character in the class. However, the parser did not account for whitespace insensitive mode, so it didn't see the `-` in `(?x)[ / - ]` as trailing, and therefore reported an unclosed character class (since the `]` was treated as part of the range). We fix that in this commit by accounting for whitespace insensitive mode, which we do by adding a `peek` method that skips over whitespace. Fixes #455	2018-03-12 09:27:02 -04:00
Andrew Gallant	3e370e4c6b	0.2.8	2018-03-12 08:19:53 -04:00
Andrew Gallant	b8b37e9ffb	deps: bump regex-syntax to 0.5.1	2018-03-12 08:19:44 -04:00
Andrew Gallant	8b374ed3e7	regex-syntax-0.5.1	2018-03-12 08:19:06 -04:00
Andrew Gallant	c3fa4a46cb	changelog 0.2.8	2018-03-12 08:18:32 -04:00
Andrew Gallant	0f32c0393a	regex/dfa: minor perf improvement This commit improves the DFA's `follow_epsilons` routine slightly. In particular, it eliminates a sizable chunk of stack operations by using a normal linear loop. The only time we use the stack is for a Split instruction, which is still admittedly quite common. However, as we improve the byte code, many of the Split instructions should go away. Note that this is the same technique used by the backtracking and PikeVM engines.	2018-03-10 08:09:36 -05:00
Andrew Gallant	a89220dd71	regex-syntax: fix nest limit checker This commit fixes an embarrassing bug where the depth in the nest limit checker was never decremented during postorder traversal, which means long but shallow regexes would incorrectly trip the nest limit. We fix that in this commit and add two regression tests. Fixes #454	2018-03-09 22:45:55 -05:00
Andrew Gallant	649762db9b	regex: add nest_limit This commit exposes the `nest_limit` option that regex-syntax provides. The nest limit controls how deeply nested a regex is allowed to be.	2018-03-09 22:43:50 -05:00
ethanpailes	7f23152b23	doc: resync TBM should_use comment The TBM `should_use` comment drifted slightly out of sync with the code when a better usage heuristic was added. I've shaved the yak.	2018-03-09 07:12:03 -05:00
Andrew Gallant	cbfc0a38de	0.2.7	2018-03-07 19:13:22 -05:00
Andrew Gallant	8aa479dac3	changelog: 0.2.7	2018-03-07 19:12:03 -05:00
Andrew Gallant	052176d67f	regex/literals: re-enable Tuned Boyer-Moore We've added tests and carefully scrutinized it. Let's try this again.	2018-03-07 19:07:34 -05:00
Andrew Gallant	d756dba73e	tests: remove unused plugin tests	2018-03-07 19:06:06 -05:00
ethanpailes	c075e18c62	regex/literal: add quickcheck property for Boyer-Moore If you just generate two random strings, the odds are very high that the shorter one won't be a substring of the longer one once they reach any substantial length. This means that the existing quickcheck cases were probably just testing the negative cases. The exception would be the two cases that append the needle to the haystack, but those only test behavior at the ends. This patch adds a better quickcheck case that can test a needle anywhere in the haystack. Fixes #446	2018-03-07 19:03:13 -05:00
Andrew Gallant	4ce111568b	changelog: update for next release	2018-03-07 19:01:24 -05:00
Andrew Gallant	b3e5fd2dde	regex: remove old regex-syntax crate This commit does the mechanical changes necessary to remove the old regex-syntax crate and replace it with the rewrite. The rewrite now subsumes the `regex-syntax` crate name, and gets a semver bump to 0.5.0.	2018-03-07 19:01:24 -05:00
Andrew Gallant	efff9fa20e	doc: update README	2018-03-07 19:01:24 -05:00
Andrew Gallant	f3b0c66347	regex: better formatting for syntax errors This commit adds an explicit Debug impl for regex's main Error type. The purpose of this impl is to format parse errors in normal panic messages more nicely. This is slightly idiosyncratic, but since the default Debug impl prints the full string anyway, we might as well format it nicely. See also: #450	2018-03-07 19:01:24 -05:00
Andrew Gallant	040a71f9d4	regex-debug: add utf8-ranges sub-command This sub-command prints out the UTF-8 alternation machine for an arbitrary character class.	2018-03-07 19:01:24 -05:00
Andrew Gallant	eb03ef11c8	doc: document Unicode support This commit provides exhaustive documentation for the regex crate's support for Level 1 ("Basic Unicode Support") as documented in UTS#18. We also document the small number of additions added to the concrete syntax as a result of the regex-syntax rewrite. See: http://unicode.org/reports/tr18/	2018-03-07 19:01:24 -05:00
Andrew Gallant	b906fd55c5	tests: add Unicode general category tests	2018-03-07 19:01:24 -05:00
Andrew Gallant	ddcbf5b44d	compile: ban empty sub-expressions With the regex syntax rewrite, we now support empty subexpressions more officially. Unfortunately, the compiler has trouble with empty subexpressions in alternation branches. There's no particular reason not to support them, but they are difficult/awkward to express with the current compiler. So just ban them for now. If one does need an empty subexpression in an alternate branch, then amusingly, something like `()?|z` will work. We could rewrite all such empty sub-expressions into `()?`, which would retain the same match semantics, but we choose to take the most conservative change possible.	2018-03-07 19:01:24 -05:00
Andrew Gallant	4ae3ae9d92	regex: move to regex-syntax-2 This commit moves the entire regex crate over to the regex-syntax-2 rewrite. Most of this is just rewriting types. The compiler got the most interesting set of changes. It got simpler in some respects, but not significantly so.	2018-03-07 19:01:24 -05:00
Andrew Gallant	715a807289	syntax: rewrite the regex-syntax crate This commit represents a ground up rewrite of the regex-syntax crate. This commit is also an intermediate state. That is, it adds a new regex-syntax-2 crate without making any serious changes to any other code. Subsequent commits will cover the integration of the rewrite and the removal of the old crate. The rewrite is intended to be the first phase in an effort to overhaul the entire regex crate. To that end, this rewrite takes steps in that direction: * The principal change in the public API is an explicit split between a regular expression's abstract syntax (AST) and a high-level intermediate representation (HIR) that is easier to analyze. The old version of this crate mixes these two concepts, but leaned heavily towards an HIR. The AST in the rewrite has a much closer correspondence with the concrete syntax than the old `Expr` type does. The new HIR embraces its role; all flags are now compiled away (including the `i` flag), which will simplify subsequent passes, including literal detection and the compiler. ASTs are produced by ast::parse and HIR is produced by hir::translate. A top-level parser is provided that combines these so that callers can skip straight from concrete syntax to HIR. * Error messages are vastly improved thanks to the span information that is now embedded in the AST. In addition to better formatting, error messages now also include helpful hints when trying to use features that aren't supported (like backreferences and look-around). In particular, octal support is now an opt-in option. (Octal support will continue to be enabled in regex proper to support backwards compatibility, but will be disabled in 1.0.) * More robust support for Unicode Level 1 as described in UTS#18. In particular, we now fully support Unicode character classes including set notation (difference, intersection, symmetric difference) and correct support for named general categories, scripts, script extensions and age. That is, `\p{scx:Hira}` and `\p{age:3.0}` now work. To make this work, we introduce an internal interval set data structure. * With the exception of literal extraction (which will be overhauled in a later phase), all code in the rewrite uses constant stack space, even while performing analysis that requires structural induction over the AST or HIR. This is done by pushing the call stack onto the heap, and is abstracted by the `ast::Visitor` and `hir::Visitor` traits. The point of this method is to eliminate stack overflows in the general case. * Empty sub-expressions are now properly supported. Expressions like `()`, `|`, `a|` and `b|()+` are now valid syntax. The principal downsides of these changes are parse time and binary size. Both seem to have increased (slower and bigger) by about 1.5x. Parse time is generally peanuts compared to the compiler, so we mostly don't care about that. Binary size is mildly unfortunate, and if it becomes a serious issue, it should be possible to introduce a feature that disables some level of Unicode support and/or work on compressing the Unicode tables. Compile times have increased slightly, but are still a very small fraction of the overall time it takes to compile `regex`. Fixes #174, Fixes #424	2018-03-07 19:01:24 -05:00
Matt Brubeck	7f020b8de0	regex: add Replacer::by_ref adaptor This permits use of a Replacer without consuming it. Note: This can't simply return `&mut Self` because a generic `impl<R: Replacer> Replacer for &mut R` would conflict with libstd's generic `impl<F: FnMut> FnMut for &mut F`. See also: #83 Closes #449	2018-03-07 15:39:50 -05:00
ethanpailes	7645ff2bc0	regex/literal: fix bug in Boyer-Moore This patch fixes an issue where skip resolution would go straight to the default value (the md2_shift) on a match failure after the shift_loop. Now we do the right thing, and first check in the skip table. The problem with going straight to the md2_shift is that you can accidentally shift too far when `window_end` actually is in the pattern (as is the case for the failing match).	2018-03-07 15:33:29 -05:00
Andrew Gallant	43bb64b254	bench: small tweaks This adds object files (produced by D compilers) to gitignore, and adds RE2 to the benchmark compilation script by default.	2018-03-04 09:23:56 -05:00
Andrew Gallant	b0113ec3db	ci: remove doc generation This has apparently been broken for a while, and with docs.rs, we don't need it any more. Tangentially, this method seemingly required a personal access token, which seems like a bad idea in a shared repo.	2018-02-18 13:22:29 -05:00
Andrew Gallant	5eb4552262	ci: reformat	2018-02-18 13:21:52 -05:00
Andrew Gallant	f0b92ca277	bench: update to memmap 0.6	2018-02-17 22:14:47 -05:00
Andrew Gallant	9ee9943ec8	dfa: disable if size limit is 0 As a special case, if the user configures a DFA size limit of 0, then we should never try to use it. This avoids a bit of thrashing where the DFA tries to senselessly run before spilling over to the NFA.	2018-02-09 23:13:01 -05:00
Andrew Gallant	3182b23f34	0.2.6	2018-02-08 18:14:56 -05:00
Andrew Gallant	2dee2fe3f2	bench: add logs	2018-02-08 18:14:47 -05:00
Andrew Gallant	04355544f1	changelog: 0.2.6	2018-02-08 18:12:20 -05:00
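
Commit a89220dd71 describes a nest limit checker whose depth was never decremented during postorder traversal, so long but shallow regexes tripped the limit. The following is a minimal sketch of that idea using only the standard library, not the code from regex-syntax. The `Expr`, `Frame` and `check_nest_limit` names are hypothetical, and the traversal keeps its stack on the heap in the spirit of the visitor approach mentioned in 715a807289.

```rust
/// Toy expression tree; a hypothetical stand-in for a real regex AST.
#[derive(Debug)]
enum Expr {
    Literal(char),
    Group(Box<Expr>),
    Concat(Vec<Expr>),
}

/// One entry on the heap-allocated traversal stack.
enum Frame<'a> {
    /// Visit this node (pre-order step).
    Enter(&'a Expr),
    /// All children of a nested node have been visited (post-order step).
    Exit,
}

/// Returns `Err(depth)` as soon as a node is nested more deeply than `limit`.
fn check_nest_limit(root: &Expr, limit: usize) -> Result<(), usize> {
    let mut depth = 0usize;
    let mut stack = vec![Frame::Enter(root)];
    while let Some(frame) = stack.pop() {
        match frame {
            Frame::Enter(expr) => {
                let children: &[Expr] = match expr {
                    // Leaves add no nesting in this toy model.
                    Expr::Literal(_) => continue,
                    Expr::Group(inner) => std::slice::from_ref(&**inner),
                    Expr::Concat(items) => items.as_slice(),
                };
                depth += 1;
                if depth > limit {
                    return Err(depth);
                }
                // Schedule the post-order step before the children so that
                // it runs only after all of them have been processed.
                stack.push(Frame::Exit);
                for child in children.iter().rev() {
                    stack.push(Frame::Enter(child));
                }
            }
            // Forgetting this decrement is the kind of bug the commit
            // message describes: sibling nodes would keep raising `depth`.
            Frame::Exit => depth -= 1,
        }
    }
    Ok(())
}
```

In this sketch, a concatenation of a thousand single-level groups passes with a limit of 2, while four nested groups fail with a limit of 3.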

795 Commits
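
Commit 102458feff fixes the parsing of `(?x)[ / - ]` by adding a `peek` that skips whitespace, so that a trailing `-` is seen as a literal. The toy character-class parser below is a sketch of that technique under simplifying assumptions (no escapes, no negation, no nested classes). `Item`, `skip_ws`, `peek` and `parse_class` are hypothetical names for illustration and are not the regex-syntax API.

```rust
#[derive(Debug, PartialEq)]
enum Item {
    Literal(char),
    Range(char, char),
}

/// Returns the first position at or after `pos` that is not whitespace,
/// or `pos` itself when whitespace is significant.
fn skip_ws(chars: &[char], mut pos: usize, ignore_ws: bool) -> usize {
    while ignore_ws && pos < chars.len() && chars[pos].is_whitespace() {
        pos += 1;
    }
    pos
}

/// Looks at the next significant character without consuming it.
fn peek(chars: &[char], pos: usize, ignore_ws: bool) -> Option<char> {
    chars.get(skip_ws(chars, pos, ignore_ws)).copied()
}

/// Parses a bracketed class such as `[a-z/-]` into literals and ranges.
fn parse_class(class: &str, ignore_ws: bool) -> Result<Vec<Item>, String> {
    let chars: Vec<char> = class.chars().collect();
    if chars.first() != Some(&'[') {
        return Err("expected '['".to_string());
    }
    let mut pos = 1;
    let mut items = Vec::new();
    loop {
        pos = skip_ws(&chars, pos, ignore_ws);
        let start = match chars.get(pos) {
            None => return Err("unclosed character class".to_string()),
            Some(&']') => return Ok(items),
            Some(&c) => c,
        };
        pos += 1;
        // A '-' starts a range only if something other than ']' follows it.
        // The whitespace-aware peek is what makes `[ / - ]` behave like `[/-]`.
        let dash_at = skip_ws(&chars, pos, ignore_ws);
        let is_range = chars.get(dash_at) == Some(&'-')
            && !matches!(peek(&chars, dash_at + 1, ignore_ws), Some(']') | None);
        if is_range {
            let end_at = skip_ws(&chars, dash_at + 1, ignore_ws);
            let end = chars[end_at];
            if end < start {
                return Err("invalid range".to_string());
            }
            items.push(Item::Range(start, end));
            pos = end_at + 1;
        } else {
            items.push(Item::Literal(start));
        }
    }
}
```

With `ignore_ws` set, `parse_class("[ / - ]", true)` yields the same two literals as `parse_class("[/-]", false)` in this sketch.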