201 Commits

Author SHA1 Message Date
lubinglun
683028773a Add GN Build Files and Custom Modifications
Issue:https://gitee.com/openharmony/build/issues/I6UFTP
Signed-off-by: lubinglun <lubinglun@huawei.com>
2023-04-12 17:26:48 +08:00
Andrew Gallant
ea3b132080
regex-syntax-0.6.28 2022-11-05 13:32:31 -04:00
Andrew Gallant
9a1892737b syntax: update to Unicode 15
Closes #916
2022-11-05 13:32:03 -04:00
Andrew Gallant
8c0eccd0c6
regex-syntax-0.6.27 2022-07-05 13:59:34 -04:00
Elie ROUDNINSKI
de838287bb syntax: fix clippy lints up to rust 1.41.1
Some lints have been intentionally ignored, especially:

* any lints that would change public APIs (like &self -> self)
* any lints that would introduce new public APIs (like Default over new)
2022-07-05 13:53:46 -04:00
Alexander Gonzalez
b87cd88476 syntax: include only the start of the character class on error
This fixes a bug where the caret in some types of error messages was not
quite correct.

Fixes #792, Closes #794
2022-07-05 13:53:46 -04:00
Alexander Gonzalez
9d1478cfb5 doc: fix typos 2022-07-05 13:53:46 -04:00
Alex Chan
0c2774894a
doc: fix spelling typo of "insignificant"
PR #839
2022-07-05 13:07:24 -04:00
Alexander Beedie
30dba7422d
doc: fix typo in 'is_meta_character'
PR #873
2022-07-05 13:02:32 -04:00
Andrew Gallant
95af74d8d9 syntax: update to Unicode 14
Closes #878
2022-07-05 13:00:10 -04:00
Andrew Gallant
b41bde0b85
regex-syntax-0.6.26 2022-05-20 14:05:16 -04:00
Andrew Gallant
1c19619672 syntax: fix literal extraction for 'ab??'
Previously, 'ab??' returned [Complete(ab), Complete(a)], but the order
matters here because of greediness. The correct result is [Complete(a),
Complete(ab)].

Instead of trying to actually fix literal extraction (which is a mess),
we just rewrite 'ab?' (and 'ab??') as 'ab*'. 'ab*' still produces
literals in the incorrect order, i.e., [Cut(ab), Complete(a)], but since
one is cut we are guaranteed that the regex engine will be called to
confirm the match. In so doing, it will correctly report 'a' as a match
for 'ab??' in 'ab'.

Fixes #862
2022-05-20 14:02:08 -04:00
Andrew Gallant
88a2a62d86 syntax: fix 'is_match_empty' predicate
This was incorrectly defined for \b. Previously, I had erroneously made
it return true only for \B since \B matches '' and \b does not match
''. However, \b does match the empty string. Like \B, it only matches a
subset of empty strings, depending on what the surrounding context is.
The important bit is that it can match *an* empty string, not that it
matches *the* empty string.

We were not yet using this predicate anywhere in the regex crate, so we
just fix the implementation and update the tests.

This does present a compatibility hazard for anyone who was using this
function, but as of this time, I'm considering this a bug fix since \b
clearly matches an empty string.

Fixes #859
2022-05-18 08:18:14 -04:00
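
The point made in commit 88a2a62d86, that \b can match *an* empty string depending on context, can be illustrated with a small standalone sketch. It does not use regex-syntax; `is_word` and `word_boundary_offsets` are hypothetical helpers, and the word-character definition is assumed to be ASCII-only for brevity.

```rust
/// Assumption: an ASCII-only notion of "word byte" ([_0-9A-Za-z]).
fn is_word(b: u8) -> bool {
    b == b'_' || b.is_ascii_alphanumeric()
}

/// Hypothetical helper: byte offsets at which a \b-style assertion holds.
/// Every such offset is a zero-width (empty) match.
fn word_boundary_offsets(text: &str) -> Vec<usize> {
    let bytes = text.as_bytes();
    (0..=bytes.len())
        .filter(|&i| {
            let before = i > 0 && is_word(bytes[i - 1]);
            let after = i < bytes.len() && is_word(bytes[i]);
            // A boundary exists where exactly one side is a word byte.
            before != after
        })
        .collect()
}

fn main() {
    // Each offset printed here is an empty match of the boundary assertion.
    println!("{:?}", word_boundary_offsets("ab cd"));
    // In an empty haystack there is no boundary at all.
    println!("{:?}", word_boundary_offsets(""));
}
```

So the assertion matches some empty strings and not others, which is the sense in which a "can match empty" predicate should be true for it.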
Andrew Gallant
72f09f1aeb syntax: fix ascii class union bug
This fixes a bug in how ASCII class unioning was implemented. Namely, it
previously and erroneously unioned together two classes and then applied
negation/case-folding based on the most recently added class, even if
the class added previously wasn't negated. So for example, given the
regex '[[:alnum:][:^ascii:]]', this would initialize the class with
'[:alnum:]', then add all '[:^ascii:]' codepoints and then negate the
entire thing because of the negation in '[:^ascii:]'. Negating the
entire thing is clearly wrong and not the intended semantics.

We fix this by applying negation/case-folding only to the class we're
dealing with, and then we union it with whatever existing class we're
building.

Fixes #680
2022-05-18 08:18:14 -04:00
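
The ordering problem described in commit 72f09f1aeb can be modelled with plain byte sets. This is only a sketch under stated assumptions: `Class`, `alnum`, `ascii`, `union_then_negate` and `negate_then_union` are hypothetical names, not the crate's types, and classes are modelled over bytes rather than codepoints.

```rust
use std::collections::BTreeSet;

/// Toy model of a class: the set of byte values it matches.
type Class = BTreeSet<u8>;

/// Complement of a class over the full byte range.
fn negate(class: &Class) -> Class {
    (0..=255u8).filter(|b| !class.contains(b)).collect()
}

/// Hypothetical stand-in for '[:alnum:]'.
fn alnum() -> Class {
    (0..=255u8).filter(|b| b.is_ascii_alphanumeric()).collect()
}

/// Hypothetical stand-in for '[:ascii:]'.
fn ascii() -> Class {
    (0..=127u8).collect()
}

/// Buggy order: union the new item in, then negate the whole accumulator.
fn union_then_negate(acc: &Class, item: &Class, negated: bool) -> Class {
    let merged: Class = acc.union(item).copied().collect();
    if negated {
        negate(&merged)
    } else {
        merged
    }
}

/// Fixed order: negate only the new item, then union it into the accumulator.
fn negate_then_union(acc: &Class, item: &Class, negated: bool) -> Class {
    let item = if negated { negate(item) } else { item.clone() };
    let merged: Class = acc.union(&item).copied().collect();
    merged
}

fn main() {
    // Model of '[[:alnum:][:^ascii:]]'.
    let buggy = union_then_negate(&alnum(), &ascii(), true);
    let fixed = negate_then_union(&alnum(), &ascii(), true);
    println!("buggy matches 'a': {}", buggy.contains(&b'a'));
    println!("fixed matches 'a': {}", fixed.contains(&b'a'));
}
```

In the buggy order the alphanumerics are lost because the negation is applied to the whole accumulated class; in the fixed order they survive alongside the non-ASCII bytes.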
Alex Touchet
b92ffd5471
cargo: use SPDX license format
We were previously using '/' to indicate the dual licensing
scheme, but I guess we're now supposed to use 'OR'.

PR #843
2022-03-03 07:31:45 -05:00
Andrew Gallant
f6e52dafde
syntax: fix 'unused' warnings
It looks like the dead code detector got smarter. We never ended up
using the 'printer' field in these visitors, so just get rid of it.
2022-02-25 12:48:26 -05:00
Ian Kerins
63ee6699a2
syntax/doc: fix 'their' typo 2021-11-02 18:25:39 -04:00
Alex Touchet
d6bc7a4c3b
readme: remove broken badge
This was missed in bd0a142.

Fixes #797 (again)
2021-07-23 12:49:36 -04:00
Andrew Gallant
bd0a14231b
readme: fix badges
Fixes #797, Fixes #798
2021-07-23 08:24:45 -04:00
Dirk Stolle
977aabd043
doc: fix some typos
PR #774
2021-05-05 07:56:08 -04:00
Andrew Gallant
3ea9e3eca7
regex-syntax-0.6.25 2021-05-01 20:30:34 -04:00
Andrew Gallant
a8554b3cc4 syntax: fix compilation errors with unicode-perl
When only the unicode-perl feature is enabled, regex-syntax would fail
to build. It turns out that 'cargo fix' doesn't actually fix all
imports. It looks like it only fixes things that it can build in the
current configuration.

Fixes #769, Fixes #770
2021-05-01 18:52:18 -04:00
Andrew Gallant
0abcada3a7 ci: test scripts should fail on errors
While these test scripts are running in CI, if any of their commands
fail, they don't actually fail the build.
2021-05-01 18:52:18 -04:00
Andrew Gallant
00fb09e0b7
regex-syntax-0.6.24 2021-04-30 20:09:30 -04:00
Andrew Gallant
a2a393f1ff fmt: run 'cargo fmt --all'
It looks like 'cargo fix' didn't do this.
2021-04-30 20:02:56 -04:00
Andrew Gallant
e2860fe037 edition: manual fixups to code
This commit does a number of manual fixups to the code after the
previous two commits were done via 'cargo fix' automatically.

Actually, this contains more 'cargo fix' annotations, since I had
forgotten to add 'edition = "2018"' to all sub-crates.
2021-04-30 20:02:56 -04:00
Andrew Gallant
cb108b77e7 edition: initial migration to Rust 2018 2021-04-30 20:02:56 -04:00
Andrew Gallant
5a3570163b
regex-syntax-0.6.23 2021-03-11 21:15:50 -05:00
Markus
bf7f8f19c6
doc: use 'text' instead of 'ignore' for regexes
This makes rendering a bit nicer by disabling syntax
highlighting and removing the "untested" warning.

PR #741
2021-01-21 17:50:49 -05:00
Alex Touchet
259863dfb6
doc: use HTTPS in links
PR #726
2021-01-12 07:31:38 -05:00
Andrew Gallant
d27882cbd8
regex-syntax-0.6.22 2021-01-08 11:10:24 -05:00
Ryan Lopopolo
ee94996c5d
api: add missing Debug impls for public types
In general, all public types should have a `Debug` impl.
Some types didn't because it was just never needed, but
it's good form to do it.

PR #735
2020-12-29 17:28:34 -05:00
Andrew Gallant
d03ae186b5
regex-syntax-0.6.21 2020-11-01 11:27:37 -05:00
Andrew Gallant
6fdb6e123c
syntax: forbid \P{any}
Previously, the translator would forbid constructs like [^\w\W] that
compiled to empty character classes. These things are forbidden not
because the translator can't handle it, but because the compiler in
'regex' proper can't handle it. Once we migrate to the compiler in
regex-automata, which supports empty classes, then we can lift this
restriction. But until then, we should ban all such instances. It turns
out that \P{any} was another way to utter this, so we ban it in this
commit.

This was found by OSS-Fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=26505

Fixes #722
2020-11-01 11:25:11 -05:00
Andrew Gallant
3589accc6d
regex-syntax-0.6.20 2020-10-13 10:31:53 -04:00
Andrew Gallant
b1489c8445
syntax: make \p{cf} work
It turns out that 'cf' is also an abbreviation for the 'Case_Folding'
property. Even though we don't actually support a 'Case_Folding'
property, a quirk of our code caused 'cf' to fail since it was treated
as a normal boolean property instead of a general category. We fix it by
special casing it.

Note that '\p{gc=cf}' worked and continues to work.

If we ever do add the 'Case_Folding' property, we'll not be able to
support its abbreviation since it is now taken by 'Format'.

Fixes #719
2020-10-13 10:29:03 -04:00
Andrew Gallant
e2c0889dc3
regex-syntax-0.6.19 2020-10-11 20:09:56 -04:00
Bruce Guenter
e1e36925ca capture: support [, ] and . in capture group names
This slightly expands the set of characters allowed in capture group
names to be `[][_0-9A-Za-z.]` from `[_0-9A-Za-z]`.

This required some delicacy in order to avoid replacement strings like
`$Z[` from referring to invalid capture group names where the intent was
to refer to the capture group named `Z`. That is, in order to use `[`,
`]` or `.` in a capture group name, one must use the explicit brace
syntax: `${Z[}`. We clarify the docs around this issue.

Regrettably, we are not much closer to handling #595. In order to
support, say, all Unicode word characters, our replacement parser would
need to become UTF-8 aware on `&[u8]`. But std makes this difficult and
I would prefer not to add another dependency on ad hoc UTF-8 decoding or
a dependency on another crate.

Closes #649
2020-10-11 20:08:30 -04:00
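
The distinction drawn in commit e1e36925ca between `$Z[` and `${Z[}` can be sketched as a tiny reference parser. This is not the crate's implementation: `parse_capture_ref` is a hypothetical function, and the sketch assumes that the braced form takes everything up to the closing `}` without validating the name.

```rust
/// Hypothetical helper: if `s` starts with a capture reference, return the
/// referenced group name and the number of bytes the reference occupies.
fn parse_capture_ref(s: &str) -> Option<(&str, usize)> {
    let rest = s.strip_prefix('$')?;
    if let Some(braced) = rest.strip_prefix('{') {
        // Explicit brace syntax: the name runs up to the closing '}'.
        // Assumption: no further validation of the name is done here.
        let end = braced.find('}')?;
        if end == 0 {
            return None;
        }
        // Consumed bytes: '$' + '{' + name + '}'.
        return Some((&braced[..end], end + 3));
    }
    // Unbraced syntax: the longest run of [_0-9A-Za-z] only, so that
    // '[', ']' and '.' terminate the name instead of extending it.
    let end = rest
        .find(|c: char| !(c == '_' || c.is_ascii_alphanumeric()))
        .unwrap_or(rest.len());
    if end == 0 {
        return None;
    }
    // Consumed bytes: '$' + name.
    Some((&rest[..end], end + 1))
}

fn main() {
    // Unbraced: refers to group `Z`; the '[' is left as literal text.
    println!("{:?}", parse_capture_ref("$Z["));
    // Braced: refers to the group named `Z[`.
    println!("{:?}", parse_capture_ref("${Z[}"));
}
```

Keeping the unbraced form restricted to the old character set is what lets existing replacement strings keep their meaning while the new characters remain reachable through braces.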
Alexandre Viau
a3194d0323
syntax/doc: fix enabld -> enabled
PR #703
2020-08-04 19:19:07 -04:00
Andrew Gallant
95047166ac
regex-syntax-0.6.18 2020-05-28 11:21:59 -04:00
Valentin Gatien-Baron
d50d31ba77
hir: make is_alternation_literal say false on Empty
To avoid this assertion in tests when empty alternations are allowed:

   internal error: entered unreachable code: expected literal or
   concat, got Hir { kind: Empty, info: HirInfo { bools: 1795 } }',
   src/exec.rs:1568:18

The code in exec.rs relies on the documented invariant for
is_alternation_literal:

    /// ... This is only true when this HIR expression is either
    /// itself a `Literal` or a concatenation of only `Literal`s or an
    /// alternation of only `Literal`s.
2020-05-28 11:10:33 -04:00
Andrew Gallant
ad89e8c8fe
syntax: update formatting
rustfmt appears to have had a slight tweak. This also fixes CI.
2020-04-27 21:24:08 -04:00
Hubert Hirtz
3ff6ae19ee
syntax: improve allocation of escape_into
This causes escape_into to reserve capacity instead of having escape do
it. This is a bit more general and will benefit users of escape_into.

PR #655
2020-03-24 08:07:41 -04:00
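
The allocation change in commit 3ff6ae19ee can be pictured with a standalone sketch in which the buffer-filling routine reserves capacity itself, so every caller benefits. The functions below are simplified stand-ins written for illustration, not the crate's source; the meta-character list in `is_meta` is an assumption.

```rust
/// Assumption: an illustrative set of characters that need escaping.
fn is_meta(c: char) -> bool {
    matches!(
        c,
        '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{' | '}'
            | '^' | '$' | '#' | '&' | '-' | '~'
    )
}

/// Appends an escaped copy of `text` to `buf`.
/// The reservation happens here, so direct callers get it too.
fn escape_into(text: &str, buf: &mut String) {
    // Lower bound on the bytes to be appended; escapes may add more.
    buf.reserve(text.len());
    for c in text.chars() {
        if is_meta(c) {
            buf.push('\\');
        }
        buf.push(c);
    }
}

/// Convenience wrapper that no longer needs to pre-size the buffer itself.
fn escape(text: &str) -> String {
    let mut quoted = String::new();
    escape_into(text, &mut quoted);
    quoted
}

fn main() {
    println!("{}", escape("a.b*c"));

    // Appending to an existing buffer also benefits from the reservation.
    let mut buf = String::from("prefix:");
    escape_into("1+1", &mut buf);
    println!("{}", buf);
}
```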
Andrew Gallant
c1585975f4
syntax: regenerate tables for version info
This is a cosmetic change only. ucd-generate now includes the Unicode
version in the generated output.
2020-03-12 22:24:46 -04:00
Andrew Gallant
46564406b4
regex-syntax-0.6.17 2020-03-12 22:03:15 -04:00
Andrew Gallant
88b3fa542a syntax: update to Unicode 13 2020-03-12 22:00:48 -04:00
Andrew Gallant
db67087198
regex-syntax-0.6.16 2020-03-02 20:16:20 -05:00
Andrew Gallant
c187cbf04a
syntax: add ClassUnicode::is_all_ascii
This mirrors the same routine on ClassBytes. This is useful when
translating an HIR to an NFA and one wants to write a fast path for the
common all ASCII case.
2020-03-02 20:15:33 -05:00
Andrew Gallant
17304c5a55
regex-syntax-0.6.15 2020-03-01 08:22:29 -05:00
Andrew Gallant
49b9a348ac
syntax/doc: fix docs for try_case_fold_simple
Its whole purpose is to not panic and instead return an error, which
matches the implementation. This fixes the docs to properly reflect
that.
2020-03-01 08:21:46 -05:00