third_party_rust_regex

mirror of https://gitee.com/openharmony/third_party_rust_regex synced 2025-04-17 01:50:23 +00:00

Author	SHA1	Message	Date
Andrew Gallant	169783c1d6	syntax: release 0.6.11	2019-08-03 16:10:47 -04:00
Andrew Gallant	b4c67cb80c	syntax: drop ucd_util dependency This one was a bit hard to swallow because it involved copying a fairly short but not terribly simple function for normalizing property names/values. But the code is so small, changes rarely, and is easily tested, that it's just not worth bringing in a whole dependency for it given how big regex-syntax already is.	2019-08-03 16:09:49 -04:00
Andrew Gallant	caa075f653	syntax: absorb utf8-ranges crate This commit brings the utf8-ranges crate into regex-syntax as a utf8 sub-module. This was done because it was observed that utf8-ranges is effectively unused outside the context of regex-syntax. It is a very small amount of code, and fits alongside the rest of regex-syntax. In particular, anyone building a regex engine using regex-syntax will likely need this code anyway.	2019-08-03 16:09:49 -04:00
Andrew Gallant	fc3e6aa19a	license: remove license headers from files The Rust project determined these were unnecessary a while back[1,2,3] and we follow suit. [1] - `0565653eec` [2] - https://github.com/rust-lang/rust/pull/43498 [3] - https://github.com/rust-lang/rust/pull/57108	2019-08-03 14:47:45 -04:00
Andrew Gallant	0e96af4166	style: start using rustfmt	2019-08-03 14:20:22 -04:00
Andrew Gallant	341f207c10	regex-syntax-0.6.10	2019-07-20 23:01:44 -04:00
Andrew Gallant	dc111a5f19	syntax: update Unicode ages lookup This was a missed fix for the Unicode 12.1 update.	2019-07-20 23:01:23 -04:00
Andrew Gallant	0c57ea14ea	syntax: release 0.6.9	2019-07-20 22:46:46 -04:00
Andrew Gallant	3124a3b2ca	syntax: update to Unicode 12.1	2019-07-20 22:45:39 -04:00
Andrew Gallant	918350a59b	msrv: bump to Rust 1.28 Rust 1.28 is almost a year old by this point, and there were a number of nice stabilizations between 1.24 and 1.28. Notably, vendor intrinsics were stabilized in Rust 1.26, so we no longer need a build script.	2019-07-20 22:35:18 -04:00
Gurwinder Singh	dfe0dc6493	syntax/doc: fix typo	2019-07-14 08:04:21 -04:00
Andrew Gallant	62b7b508fa	regex-syntax-0.6.8	2019-07-06 09:16:20 -04:00
Andrew Gallant	886a7e7185	syntax: move error test to syntax crate The problem with putting it in the regex crate proper is that it requires the regex crate to bump its minimal regex-syntax crate version. While this isn't necessarily an issue, since we can't enable Cargo's minimal version check because of the `rand` dependency, this winds up being a hazard. Plus, having it in the regex crate doesn't buy us too much. It's just as well to have the tests in regex-syntax. Fixes #593	2019-07-06 09:15:11 -04:00
Christian Rondeau	172898a4fd	syntax: better errors missing repetition quantifier This change causes a better error message to surface when a repetition quantifier is used with a missing number. Closes #545	2019-06-11 07:45:27 -04:00
Andrew Gallant	3ffe9a20b8	regex-syntax-0.6.7	2019-06-09 08:57:15 -04:00
Andrew Gallant	53270d8232	syntax: fix warnings The language team is getting deprecation-happy with old syntax. But Rust 1.24.1 doesn't support inclusive range syntax, so we forcefully allow it.	2019-06-09 08:49:06 -04:00
Andrew Gallant	89074f87d0	1.1.3	2019-03-30 10:53:01 -04:00
Andrew Gallant	231643248b	syntax: fix bug when parsing ((?x)) This fixes yet another bug with our handling of (?flags) directives in the regex. This time, we try to be a bit more principled and specifically treat a (?flags) directive as a valid empty sub-expression. While this means we could remove errors reported from previous fixes for things like `(?i)+`, we retain those for now since they are a bit weird. Although `((?i))+` is now allowed, which is equivalent. We should probably allow `(?i)+` in the future for consistency's sake. Fixes #527	2019-03-30 10:47:45 -04:00
Andrew Gallant	7b1599f2f6	syntax: fix counted repetition bug This fixes a bug where the HIR translator would panic on regexes such as `(?i){1}` since it assumes that every repetition operator has a valid sub-expression, and `(?i)` is not actually a sub-expression (but is more like a directive instead). Previously, we fixed this same bug for uncounted repetitions in commit 17764ffe (for bug #465), but we did not fix it for counted repetitions. We apply the same fix here. Fixes #555	2019-03-30 10:47:45 -04:00
Andrew Gallant	bd5f2b4be5	syntax: add is_literal and is_alternation_literal This adds a couple new methods on HIR expressions for determining whether they are literals or not. This is useful for determining whether to apply optimizations such as Aho-Corasick without re-analyzing the syntax.	2019-03-30 08:18:19 -04:00
Andrew Gallant	60d087a230	regex-syntax-0.6.5	2019-01-26 11:14:37 -05:00
Andrew Gallant	0fc24d275a	syntax: add is_line_anchored_{start,end} This commit adds two new predicates to `Hir` values that permit querying whether an expression is line anchored at the start or end. This was motivated by a desire to tweak the offsets of a match when enabling --crlf mode in ripgrep.	2019-01-26 11:14:27 -05:00
Andrew Gallant	b77e3fca8a	regex-syntax-0.6.4	2018-11-30 22:05:18 -05:00
Daniel Holbert	e214d8cd88	doc: Fix typo in comment ("ocassionally") PR #515	2018-11-30 20:02:29 -05:00
Andrew Gallant	ecc1a5a70d	syntax: add emoji and break properties This commit adds several emoji properties such as Emoji and Extended_Pictographic. We also add support for the Grapheme_Cluster_Break, Word_Break and Sentence_Break enumeration properties.	2018-11-30 20:00:49 -05:00
Andrew Gallant	770edd59b2	regex-syntax-0.6.3	2018-11-07 17:20:08 -05:00
Derek Gonyeo	ce4154365f	syntax/license: add the unicode license for unicode-tables Add the Unicode license to the unicode-tables directory, as the data there comes from the Unicode Consortium. Fixes #530	2018-11-07 17:19:51 -05:00
kennytm	5241919f48	syntax: fix [[:blank:]] character class Ensure `[[:blank:]]` only matches `[ \t]`. It appears that there was a transcription error when `regex-syntax` was rewritten such that `[[:blank:]]` ended up matching more than it was supposed to. Fixes #533	2018-10-29 08:24:15 -04:00
Andrew Gallant	8421c9ae85	regex-syntax 0.6.2	2018-07-18 09:24:25 -04:00
Andrew Gallant	24c7770b80	syntax: fix printing bug for HIR This commit fixes a bug in the HIR printer where it would not correctly escape meta characters in character classes.	2018-07-18 09:15:27 -04:00
Andrew Gallant	7ebe4ae02d	syntax: update docs to reflect behavior This updates the documentation on `allow_invalid_utf8` to reflect the current behavior of the translator. The old documentation was describing the behavior of regex-syntax 0.5, but it was changed in regex-syntax 0.6.	2018-07-18 09:14:26 -04:00
Andrew Gallant	bf8f55f187	regex-syntax-0.6.1	2018-06-12 06:55:06 -04:00
Josh Stone	5eaff67a6a	syntax: regenerate tables for Unicode 11 This adds `scripts/generate.py`, and uses it to regenerate all tables with data from Unicode 11.0.0. This also restores the character tests that were first added in #400, with a new one for 11.	2018-06-12 06:54:13 -04:00
Andrew Gallant	b5ef0ec281	regex 1.0	2018-05-01 16:52:05 -04:00
Andrew Gallant	8e180eb71f	syntax: fixes for Rust 1.20.0 Make sure we can run tests for regex-syntax on Rust 1.20.0.	2018-05-01 16:48:46 -04:00
Andrew Gallant	76343f8cd6	regex: ban (?-u:\B) for Unicode regexes The issue with the ASCII version of \B is that it can match between code units of UTF-8, which means it can cause match indices reported to be on invalid UTF-8 boundaries. Therefore, similar to things like `(?-u:\xFF)`, we ban negated ASCII word boundaries from Unicode regular expressions. Normal ASCII word boundaries remain accessible from Unicode regular expressions. See #457	2018-05-01 16:48:46 -04:00
Andrew Gallant	9604cc07ed	unicode: remove implementations of encode_utf8 This commit removes our explicit implementations of encode_utf8 and replaces them with uses of `char::encode_utf8`, which was added to the standard library in Rust 1.15.	2018-05-01 16:48:46 -04:00
Andrew Gallant	05ab8f318d	*: switch from try! to ?	2018-05-01 16:48:46 -04:00
Andrew Gallant	92e7baf584	regex-syntax 0.5.6	2018-05-01 13:28:53 -04:00
Andrew Gallant	17764ffe17	syntax: fix handling of (?flags) in parser This commit fixes a bug with the handling of `(?flags)` sub-expressions in the parser. Previously, the parser read `(?flags)`, added it to the current concatenation, and then treated that as a valid sub-expression for repetition operators, as in `(?i)*`. This in turn caused the translator to panic on a failed assumption: that witnessing a repetition operator necessarily implies a preceding sub-expression. But `(?i)` has no explicit representation in the HIR, so there is no sub-expression. There are two legitimate ways to fix this: 1. Ban such constructions in the parser. 2. Remove the assumption in the translator, and/or always translate a `(?i)` into an empty sub-expression, which should generally be a no-op. This commit chooses (1) because it is more conservative. That is, it turns a panic into an error, which gives us flexibility in the future to choose (2) if necessary. Fixes #465	2018-04-28 12:02:39 -04:00
Andrew Gallant	d5e5da68e2	syntax: fix 'C' alias bug This re-generates the Unicode table for property name aliases after fixing a bug in property name canonicalization. Namely, the 'isc' alias of the 'ISO_Comment' property was being canonicalized to 'c', which is actually an alias of the 'Other' general category. This is a result of the canonicalization procedure ignoring 'is' prefixes, as permitted by UTS#18. Fixes #466	2018-04-28 10:44:41 -04:00
Andrew Gallant	f7ea409880	syntax: better error messages for '[\d-a]' This commit adds a new type of error message that is used whenever a character class escape sequence is used as the start or end of a character class range. Fixes #461	2018-04-28 09:50:25 -04:00
Andrew Gallant	15a68c8856	regex-syntax 0.5.5	2018-04-14 16:44:01 -04:00
Andrew Gallant	9ba9a758c2	syntax: fix bug in error printer This fixes an off-by-one bug in the error formatter. Namely, if a regex ends with a literal `\n` and an error is reported that contains a span at the end of the regex, then this trips a bug in the formatter because its line count ends up being wrong. We fix this by tweaking the line count. The actual error message is still a little wonky, but given the literal `\n`, it's hard not to make it wonky. Fixes #464	2018-04-14 16:35:02 -04:00
Andrew Gallant	dba7f3b041	regex-syntax-0.5.3	2018-03-13 21:44:49 -04:00
Andrew Gallant	97651fb604	syntax/hir: add a printer for HIR This adds a printer for the high-level intermediate representation. The regex it prints is valid, and can be used as a way to turn it into a regex::Regex.	2018-03-13 21:44:08 -04:00
Andrew Gallant	c230e59468	syntax/hir: fix handling of ASCII word boundaries Previously, we had some inconsistencies in how we were handling ASCII word boundaries. In particular, the translator was accepting a negated ASCII word boundary even if the caller didn't disable the UTF-8 invariant. This is wrong, since a negated ASCII word boundary can match between any two arbitrary bytes. However, fixing this is a breaking change, so for now we document the bug. We plan to fix it with regex 1.0. See #457. Additionally, we were incorrectly declaring that an ASCII word boundary matched invalid UTF-8 via the Hir::is_always_utf8 property. An ASCII word boundary must always match an ASCII byte on one side, which implies a valid UTF-8 position.	2018-03-13 21:44:08 -04:00
Andrew Gallant	c7c7a43827	style: reword ast::print docs Also, small formatting fix and removal of debugging test.	2018-03-13 21:44:08 -04:00
Andrew Gallant	a3c0510711	regex-syntax-0.5.2	2018-03-12 09:49:20 -04:00
Andrew Gallant	102458feff	syntax: fix trailing - bug This fixes a bug in the parser where a regex like `(?x)[ / - ]` would fail to parse. In particular, since whitespace insensitive mode is enabled, this regex should be equivalent to `[/-]`, where the `-` is treated as a literal `-` instead of a range since it is the last character in the class. However, the parser did not account for whitespace insensitive mode, so it didn't see the `-` in `(?x)[ / - ]` as trailing, and therefore reported an unclosed character class (since the `]` was treated as part of the range). We fix that in this commit by accounting for whitespace insensitive mode, which we do by adding a `peek` method that skips over whitespace. Fixes #455	2018-03-12 09:27:02 -04:00

136 Commits
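
Note on the property-name commits above (b4c67cb80c and d5e5da68e2): the messages describe a small function that normalizes Unicode property names/values, and a bug where ignoring an `is` prefix turned the `isc` alias into `c`. The sketch below is a simplified illustration of that idea using only the standard library. It is not the code from regex-syntax: the function name `normalize_symbol` is hypothetical, the matching rules are reduced to "ignore case, whitespace, `_` and `-`, and drop a leading `is`", and the special case that keeps `isc` intact is an assumed way of avoiding the collision rather than a confirmed description of the actual fix.

```rust
/// Hypothetical helper (not the regex-syntax API): loosely normalize a
/// Unicode property name or value so that spellings such as "Is_Greek",
/// "is greek" and "GREEK" all compare equal.
fn normalize_symbol(name: &str) -> String {
    // Drop whitespace, underscores and hyphens, and lowercase the rest.
    let cleaned: String = name
        .chars()
        .filter(|c| !c.is_whitespace() && *c != '_' && *c != '-')
        .flat_map(|c| c.to_lowercase())
        .collect();

    // Ignore an optional leading "is" prefix.
    let stripped = cleaned.strip_prefix("is").unwrap_or(cleaned.as_str());

    // Assumed guard: without it, "isc" (an alias of ISO_Comment) would
    // collapse to "c", which is an alias of the Other general category.
    if cleaned.starts_with("is") && stripped == "c" {
        "isc".to_string()
    } else {
        stripped.to_string()
    }
}
```

With this sketch, `normalize_symbol("Is_Greek")` and `normalize_symbol("Greek")` both give `greek`, while `normalize_symbol("isc")` stays `isc` instead of colliding with `c`.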