174 Commits

Author SHA1 Message Date
Andrew Gallant
5a3570163b
regex-syntax-0.6.23 2021-03-11 21:15:50 -05:00
Markus
bf7f8f19c6
doc: use 'text' instead of 'ignore' for regexes
This makes rendering a bit nicer by disabling syntax
highlighting and removing the "untested" warning.

PR #741
2021-01-21 17:50:49 -05:00
Alex Touchet
259863dfb6
doc: use HTTPS in links
PR #726
2021-01-12 07:31:38 -05:00
Andrew Gallant
d27882cbd8
regex-syntax-0.6.22 2021-01-08 11:10:24 -05:00
Ryan Lopopolo
ee94996c5d
api: add missing Debug impls for public types
In general, all public types should have a `Debug` impl.
Some types didn't because it was just never needed, but
it's good form to do it.

PR #735
2020-12-29 17:28:34 -05:00
Andrew Gallant
d03ae186b5
regex-syntax-0.6.21 2020-11-01 11:27:37 -05:00
Andrew Gallant
6fdb6e123c
syntax: forbid \P{any}
Previously, the translator would forbid constructs like [^\w\W] that
compiled to empty character classes. These things are forbidden not
because the translator can't handle them, but because the compiler in
'regex' proper can't handle it. Once we migrate to the compiler in
regex-automata, which supports empty classes, then we can lift this
restriction. But until then, we should ban all such instances. It turns
out that \P{any} was another way to utter this, so we ban it in this
commit.
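
As a rough illustration of why \P{any} ends up empty, here is a minimal sketch of negating a set of code point ranges and rejecting an empty result. The names `Range`, `negate` and `translate_negated` are hypothetical and not taken from regex-syntax, and the sketch ignores the surrogate gap for brevity.

```rust
/// Inclusive range of code points, stored as u32 for simplicity.
/// (Hypothetical type; the surrogate gap is ignored in this sketch.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Range {
    start: u32,
    end: u32,
}

const MAX: u32 = 0x10FFFF;

/// Negate a sorted, non-overlapping set of ranges over 0..=MAX.
fn negate(ranges: &[Range]) -> Vec<Range> {
    let mut out = Vec::new();
    let mut next = 0u32; // first code point not yet covered by the input
    for r in ranges {
        if r.start > next {
            out.push(Range { start: next, end: r.start - 1 });
        }
        next = r.end.saturating_add(1);
    }
    if next <= MAX {
        out.push(Range { start: next, end: MAX });
    }
    out
}

/// Hypothetical check mirroring the restriction: a negation that
/// leaves nothing behind is reported as an error.
fn translate_negated(ranges: &[Range]) -> Result<Vec<Range>, &'static str> {
    let negated = negate(ranges);
    if negated.is_empty() {
        Err("empty character classes are not allowed")
    } else {
        Ok(negated)
    }
}
```

Passing the single range covering every code point (the analogue of `any`) yields an error, while negating `a-z` yields the two ranges on either side of it.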

This was found by OSS-Fuzz:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=26505

Fixes #722
2020-11-01 11:25:11 -05:00
Andrew Gallant
3589accc6d
regex-syntax-0.6.20 2020-10-13 10:31:53 -04:00
Andrew Gallant
b1489c8445
syntax: make \p{cf} work
It turns out that 'cf' is also an abbreviation for the 'Case_Folding'
property. Even though we don't actually support a 'Case_Folding'
property, a quirk of our code caused 'cf' to fail since it was treated
as a normal boolean property instead of a general category. We fix it by
special-casing it.

Note that '\p{gc=cf}' worked and continues to work.

If we ever do add the 'Case_Folding' property, we'll not be able to
support its abbreviation since it is now taken by 'Format'.
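
The quirk can be pictured with a small sketch, assuming a lookup that consults a property-name table before the general category table. The tables, the `Resolved` enum and `resolve_one_name` are hypothetical stand-ins with toy data, not the actual regex-syntax code.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Resolved {
    GeneralCategory(&'static str),
    Boolean(&'static str),
}

// Toy tables for illustration only: (abbreviation, canonical name).
const PROPERTY_NAMES: &[(&str, &str)] =
    &[("alpha", "Alphabetic"), ("cf", "Case_Folding")];
const GENERAL_CATEGORIES: &[(&str, &str)] =
    &[("cf", "Format"), ("lu", "Uppercase_Letter")];

fn find(
    table: &[(&'static str, &'static str)],
    key: &str,
) -> Option<&'static str> {
    table
        .iter()
        .find(|(abbrev, _)| *abbrev == key)
        .map(|&(_, canonical)| canonical)
}

/// Resolve a bare name such as the `cf` in `\p{cf}`.
fn resolve_one_name(name: &str) -> Option<Resolved> {
    let key = name.to_ascii_lowercase();
    // Special case: without this, `cf` would hit the property-name
    // table first and be treated as an (unsupported) boolean property.
    if key == "cf" {
        return Some(Resolved::GeneralCategory("Format"));
    }
    if let Some(canonical) = find(PROPERTY_NAMES, &key) {
        return Some(Resolved::Boolean(canonical));
    }
    find(GENERAL_CATEGORIES, &key).map(Resolved::GeneralCategory)
}
```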

Fixes #719
2020-10-13 10:29:03 -04:00
Andrew Gallant
e2c0889dc3
regex-syntax-0.6.19 2020-10-11 20:09:56 -04:00
Bruce Guenter
e1e36925ca capture: support [, ] and . in capture group names
This slightly expands the set of characters allowed in capture group
names to be `[][_0-9A-Za-z.]` from `[_0-9A-Za-z]`.

This required some delicacy in order to avoid replacement strings like
`$Z[` from referring to invalid capture group names where the intent was
to refer to the capture group named `Z`. That is, in order to use `[`,
`]` or `.` in a capture group name, one must use the explicit brace
syntax: `${Z[}`. We clarify the docs around this issue.
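
The distinction between the bare and braced forms can be sketched as follows. This is a simplified illustration under the character sets quoted above, not the replacement parser in the regex crate; `parse_capture_ref` and its helpers are hypothetical names.

```rust
/// Bytes allowed in a bare `$name` reference: `[_0-9A-Za-z]`.
fn is_bare_name_byte(b: u8) -> bool {
    b == b'_' || b.is_ascii_alphanumeric()
}

/// Bytes allowed inside `${name}`: `[][_0-9A-Za-z.]`.
fn is_braced_name_byte(b: u8) -> bool {
    is_bare_name_byte(b) || matches!(b, b'[' | b']' | b'.')
}

/// Parse a capture reference at the start of `s`, which must begin
/// with '$'. Returns the name and the number of bytes consumed.
fn parse_capture_ref(s: &str) -> Option<(&str, usize)> {
    let rest = s.strip_prefix('$')?;
    if let Some(braced) = rest.strip_prefix('{') {
        let close = braced.find('}')?;
        let name = &braced[..close];
        if name.is_empty() || !name.bytes().all(is_braced_name_byte) {
            return None;
        }
        // Consumed: '$', '{', the name, and '}'.
        return Some((name, 2 + close + 1));
    }
    // Bare form: take the longest run of the narrower character set,
    // so `$Z[` refers to the group `Z` followed by a literal `[`.
    let len = rest.bytes().take_while(|&b| is_bare_name_byte(b)).count();
    if len == 0 {
        return None;
    }
    Some((&rest[..len], 1 + len))
}
```

With this sketch, `$Z[` resolves to the name `Z` and leaves `[` as literal text, while `${Z[}` resolves to the name `Z[`.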

Regrettably, we are not much closer to handling #595. In order to
support, say, all Unicode word characters, our replacement parser would
need to become UTF-8 aware on `&[u8]`. But std makes this difficult and
I would prefer not to add another dependency on ad hoc UTF-8 decoding or
a dependency on another crate.

Closes #649
2020-10-11 20:08:30 -04:00
Alexandre Viau
a3194d0323
syntax/doc: fix enabld -> enabled
PR #703
2020-08-04 19:19:07 -04:00
Andrew Gallant
95047166ac
regex-syntax-0.6.18 2020-05-28 11:21:59 -04:00
Valentin Gatien-Baron
d50d31ba77
hir: make is_alternation_literal say false on Empty
To avoid this assertion in tests when empty alternations are allowed:

   internal error: entered unreachable code: expected literal or
   concat, got Hir { kind: Empty, info: HirInfo { bools: 1795 } }',
   src/exec.rs:1568:18

The code in exec.rs relies on the documented invariant for
is_alternation_literal:

    /// ... This is only true when this HIR expression is either
    /// itself a `Literal` or a concatenation of only `Literal`s or an
    /// alternation of only `Literal`s.
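
A minimal sketch of the documented invariant, using a toy `Hir` enum rather than the real regex-syntax type (which tracks this property as a precomputed flag). All names here are illustrative assumptions.

```rust
#[derive(Debug)]
enum Hir {
    Empty,
    Literal(char),
    Concat(Vec<Hir>),
    Alternation(Vec<Hir>),
}

impl Hir {
    /// True for a `Literal` or a concatenation of only `Literal`s.
    fn is_literal_or_concat(&self) -> bool {
        match self {
            Hir::Literal(_) => true,
            Hir::Concat(parts) => {
                parts.iter().all(|p| matches!(p, Hir::Literal(_)))
            }
            _ => false,
        }
    }

    /// Follows the documented invariant: `Empty` is neither a literal
    /// nor a concatenation of literals, so it must report false.
    fn is_alternation_literal(&self) -> bool {
        match self {
            Hir::Empty => false,
            Hir::Alternation(branches) => {
                branches.iter().all(|b| b.is_literal_or_concat())
            }
            other => other.is_literal_or_concat(),
        }
    }
}
```

An alternation with an empty branch, such as the toy equivalent of `a|`, then reports false, so code that expects only literals or concatenations is never handed an `Empty` node.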
2020-05-28 11:10:33 -04:00
Andrew Gallant
ad89e8c8fe
syntax: update formatting
rustfmt appears to have had a slight tweak. This also fixes CI.
2020-04-27 21:24:08 -04:00
Hubert Hirtz
3ff6ae19ee
syntax: improve allocation of escape_into
This causes escape_into to reserve capacity instead of having escape do
it. This is a bit more general and will benefit users of escape_into.
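
The shape of the change can be sketched like this. The function names match those in the commit message, but the bodies and the exact set of meta characters below are an illustrative assumption rather than a copy of the library source.

```rust
/// Illustrative set of characters that need escaping in a regex.
fn is_meta_character(c: char) -> bool {
    matches!(
        c,
        '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']'
            | '{' | '}' | '^' | '$' | '#' | '&' | '-' | '~'
    )
}

/// Appends the escaped form of `text` to `buf`. Reserving here means
/// every caller benefits, not only those going through `escape`.
fn escape_into(text: &str, buf: &mut String) {
    buf.reserve(text.len());
    for c in text.chars() {
        if is_meta_character(c) {
            buf.push('\\');
        }
        buf.push(c);
    }
}

/// Convenience wrapper that no longer needs to pre-size the buffer.
fn escape(text: &str) -> String {
    let mut quoted = String::new();
    escape_into(text, &mut quoted);
    quoted
}
```

Reserving `text.len()` is a lower bound, since each escaped character adds one extra byte; the buffer can still grow if the input contains meta characters.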

PR #655
2020-03-24 08:07:41 -04:00
Andrew Gallant
c1585975f4
syntax: regenerate tables for version info
This is a cosmetic change only. ucd-generate now includes the Unicode
version in the generated output.
2020-03-12 22:24:46 -04:00
Andrew Gallant
46564406b4
regex-syntax-0.6.17 2020-03-12 22:03:15 -04:00
Andrew Gallant
88b3fa542a syntax: update to Unicode 13 2020-03-12 22:00:48 -04:00
Andrew Gallant
db67087198
regex-syntax-0.6.16 2020-03-02 20:16:20 -05:00
Andrew Gallant
c187cbf04a
syntax: add ClassUnicode::is_all_ascii
This mirrors the same routine on ClassBytes. This is useful when
translating an HIR to an NFA and one wants to write a fast path for the
common all ASCII case.
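
A sketch of how such a check can be cheap, assuming the class stores its ranges sorted by code point so that only the last range needs inspecting. `Class` and `ClassRange` are hypothetical types for illustration, not the regex-syntax definitions.

```rust
#[derive(Clone, Copy, Debug)]
struct ClassRange {
    start: char,
    end: char,
}

/// Hypothetical class type; `ranges` is assumed sorted and
/// non-overlapping.
struct Class {
    ranges: Vec<ClassRange>,
}

impl Class {
    /// With sorted ranges, the class is all ASCII exactly when the
    /// last range ends at or below U+007F. An empty class counts as
    /// all ASCII in this sketch.
    fn is_all_ascii(&self) -> bool {
        self.ranges.last().map_or(true, |r| r.end <= '\x7F')
    }
}
```

A caller building an NFA could branch on this result to emit plain byte ranges and skip UTF-8 sequence generation entirely.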
2020-03-02 20:15:33 -05:00
Andrew Gallant
17304c5a55
regex-syntax-0.6.15 2020-03-01 08:22:29 -05:00
Andrew Gallant
49b9a348ac
syntax/doc: fix docs for try_case_fold_simple
Its whole purpose is to not panic and instead return an error, which
matches the implementation. This fixes the docs to properly reflect
that.
2020-03-01 08:21:46 -05:00
Andrew Gallant
e6a0c55afa
syntax: add Utf8Sequence::reverse method
This is very convenient when compiling reverse UTF-8 automata.
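
To show what reversing buys, here is a small sketch that models a UTF-8 sequence as a list of byte ranges. `ByteRange`, `matches` and `reversed` are hypothetical names, not the regex-syntax API.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ByteRange {
    start: u8,
    end: u8,
}

/// True if `bytes` has one byte per range and each byte falls in
/// its corresponding range.
fn matches(seq: &[ByteRange], bytes: &[u8]) -> bool {
    seq.len() == bytes.len()
        && seq.iter().zip(bytes).all(|(r, &b)| r.start <= b && b <= r.end)
}

/// Returns the sequence with its byte ranges in reverse order, which
/// is the order a reverse automaton sees the bytes in.
fn reversed(seq: &[ByteRange]) -> Vec<ByteRange> {
    seq.iter().rev().copied().collect()
}
```

For example, the two-byte sequence `[C2-DF][80-BF]` matches the encoding of `é` (`C3 A9`), and its reversal `[80-BF][C2-DF]` matches those same bytes read back to front.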
2020-03-01 08:18:42 -05:00
Andrew Gallant
25d7c7433c
regex-syntax-0.6.14 2020-01-30 18:31:08 -05:00
Andrew Gallant
ea4009a22d syntax: fix flag scoping issue
This fixes a rather nasty bug where flags set inside a group were being
applied to expressions outside the group. For example, in the simplest case,
`((?i)a)b` would match `aB`, even though the case insensitive flag
_shouldn't_ be applied to `b`.

The issue here was that we were actually going out of our way to reset
the flags when a group is popped only _some_ of the time. Namely, when
flags were set via `(?i:a)b` syntax. Instead, flags should be reset to
their previous state _every_ time a group is popped in the translator.

The fix here is pretty simple. When we open a group, if the group itself
does not have any flags, then we simply record the current state of the
flags instead of trying to replace the current flags. Then, when we pop
the group, we are guaranteed to obtain the old flags, at which point, we
reset them.
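
The save-and-restore scheme described above can be sketched with a toy scanner that tracks only the case insensitive flag. It handles just `(`, `(?i)`, `(?i:`, `)` and literals, and `literal_case_flags` is a hypothetical name; the real translator is considerably more involved.

```rust
/// Returns each literal in `pattern` paired with whether the case
/// insensitive flag is in effect for it.
fn literal_case_flags(pattern: &str) -> Vec<(char, bool)> {
    let mut out = Vec::new();
    // Flag state saved at each group open, restored at each close.
    let mut stack: Vec<bool> = Vec::new();
    let mut case_insensitive = false;
    let mut rest = pattern;
    while let Some(c) = rest.chars().next() {
        if let Some(r) = rest.strip_prefix("(?i)") {
            // Inline flag: lasts until the enclosing group is popped.
            case_insensitive = true;
            rest = r;
        } else if let Some(r) = rest.strip_prefix("(?i:") {
            // Group with its own flags: save old state, then set.
            stack.push(case_insensitive);
            case_insensitive = true;
            rest = r;
        } else if let Some(r) = rest.strip_prefix('(') {
            // Group without flags: still record the current state.
            stack.push(case_insensitive);
            rest = r;
        } else if let Some(r) = rest.strip_prefix(')') {
            // Every pop restores the state from before the group.
            if let Some(old) = stack.pop() {
                case_insensitive = old;
            }
            rest = r;
        } else {
            out.push((c, case_insensitive));
            rest = &rest[c.len_utf8()..];
        }
    }
    out
}
```

For `((?i)a)b`, the sketch reports `a` as case insensitive and `b` as case sensitive, which is the behavior the fix restores.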

Fixes #640
2020-01-30 18:28:45 -05:00
Andrew Gallant
94a58860e3
syntax: release 0.6.13 2020-01-09 14:29:15 -05:00
Jeremy Stucki
98bc9041c2 style: remove needless lifetime 2020-01-09 14:26:57 -05:00
Daniele D'Orazio
eff5348aa5 syntax: add explicit error for \p\
Fixes #594, Closes #622
2020-01-09 14:26:57 -05:00
Andrew Gallant
9ac0f5e82e deprecated: allow use of deprecated description methods
PR #633 removed these methods, but we can't do that without making a
breaking change release. Removing deprecated methods isn't worth doing a
breaking change release, so we instead simply allow them for now by
squashing the warnings.

Closes #633
2020-01-09 14:26:57 -05:00
Andrew Gallant
27c0d6d944 style: run updated rustfmt 2020-01-09 14:26:57 -05:00
Andrew Gallant
25ae00460e
syntax: release 0.6.12 2019-09-03 12:52:18 -04:00
Andrew Gallant
8465302996 syntax: forcefully un-inline some methods
This seems to save about 12KB on the final binary size. Benchmarks
suggest that there is no meaningful runtime performance difference.
2019-09-03 12:35:17 -04:00
Andrew Gallant
7f2d2c65ca syntax: add forbid(unsafe_code)
We have a good thing going, so let's formalize it a bit.
2019-09-03 12:35:17 -04:00
Andrew Gallant
c09d9e0edc syntax: make Unicode completely optional
This commit refactors the way this library handles Unicode data by
making it completely optional. Several features are introduced which
permit callers to select only the Unicode data they need (up to a point
of granularity).

An important property of these changes is that presence or absence of
crate features will never change the match semantics of a regular
expression. Instead, the presence or absence of a crate feature can only
add or subtract from the set of all possible valid regular expressions.

So for example, if the `unicode-case` feature is disabled, then
attempting to produce `Hir` for the regex `(?i)a` will fail. Instead,
callers must use `(?i-u)a` (or enable the `unicode-case` feature).

This partially addresses #583 since it permits callers to decrease
binary size.
2019-09-03 12:35:17 -04:00
Andrew Gallant
98a7337d62 syntax/unicode: lightly refactor Perl Unicode class handling
This nominally moves the logic for acquiring Unicode-aware Perl character
classes into the `unicode` module, and also makes the calling code
robust with respect to failures.

This commit is prep work for making the availability of Unicode-aware
Perl classes optional.
2019-09-03 12:35:17 -04:00
Andrew Gallant
5204ee424f script: tweak generate-unicode-tables
This makes sure the generated tables are rustfmt'd.
2019-09-03 12:35:17 -04:00
Andrew Gallant
29f39b8721
syntax: add PartialOrd/Ord to UTF-8 types 2019-08-22 18:05:19 -04:00
Andrew Gallant
169783c1d6
syntax: release 0.6.11 2019-08-03 16:10:47 -04:00
Andrew Gallant
b4c67cb80c syntax: drop ucd_util dependency
This one was a bit hard to swallow because it involved copying a
fairly short but not terribly simple function for normalizing property
names/values. But the code is so small, changes rarely, and is easily
tested, that it's just not worth bringing in a whole dependency for it
given how big regex-syntax already is.
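
For a sense of what such a normalization function does, here is a simplified sketch loosely modeled on the loose matching rule for symbolic names in UAX #44 (ignore case, whitespace, underscores and hyphens, and drop a leading "is"). It is an assumption about the general technique, deliberately omits the rule's edge cases, and `normalize_symbolic_name` is a hypothetical name.

```rust
/// Simplified loose normalization of a property name or value.
/// Edge cases handled by the full rule are intentionally left out.
fn normalize_symbolic_name(name: &str) -> String {
    let mut out: String = name
        .chars()
        .filter(|c| !c.is_whitespace() && *c != '_' && *c != '-')
        .map(|c| c.to_ascii_lowercase())
        .collect();
    // Drop a leading "is", but only when something follows it.
    if out.len() > 2 && out.starts_with("is") {
        out.replace_range(..2, "");
    }
    out
}
```

Under this sketch, `General_Category`, `general category` and `GENERAL-CATEGORY` all normalize to the same key, which is what lets a single table lookup accept the many spellings users write.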
2019-08-03 16:09:49 -04:00
Andrew Gallant
caa075f653 syntax: absorb utf8-ranges crate
This commit brings the utf8-ranges crate into regex-syntax as a utf8
sub-module.

This was done because it was observed that utf8-ranges is effectively
unused outside the context of regex-syntax. It is a very small amount of
code, and fits alongside the rest of regex-syntax. In particular, anyone
building a regex engine using regex-syntax will likely need this code
anyway.
2019-08-03 16:09:49 -04:00
Andrew Gallant
fc3e6aa19a
license: remove license headers from files
The Rust project determined these were unnecessary a while back[1,2,3]
and we follow suit.

[1] - 0565653eec
[2] - https://github.com/rust-lang/rust/pull/43498
[3] - https://github.com/rust-lang/rust/pull/57108
2019-08-03 14:47:45 -04:00
Andrew Gallant
0e96af4166
style: start using rustfmt 2019-08-03 14:20:22 -04:00
Andrew Gallant
341f207c10
regex-syntax-0.6.10 2019-07-20 23:01:44 -04:00
Andrew Gallant
dc111a5f19
syntax: update Unicode ages lookup
This was a missed fix for the Unicode 12.1 update.
2019-07-20 23:01:23 -04:00
Andrew Gallant
0c57ea14ea
syntax: release 0.6.9 2019-07-20 22:46:46 -04:00
Andrew Gallant
3124a3b2ca
syntax: update to Unicode 12.1 2019-07-20 22:45:39 -04:00
Andrew Gallant
918350a59b
msrv: bump to Rust 1.28
Rust 1.28 is almost a year old by this point, and there were a number of
nice stabilizations between 1.24 and 1.28. Notably, vendor intrinsics were
stabilized in Rust 1.26, so we no longer need a build script.
2019-07-20 22:35:18 -04:00
Gurwinder Singh
dfe0dc6493 syntax/doc: fix typo 2019-07-14 08:04:21 -04:00
Andrew Gallant
62b7b508fa
regex-syntax-0.6.8 2019-07-06 09:16:20 -04:00