161 Commits

Author SHA1 Message Date
Andrew Gallant
d385028ed4 version bumps. 2015-07-05 13:17:05 -04:00
Andrew Gallant
cedfc8db51 Re-work case insensitive matching.
In commit 56ea4a, character classes were changed so that case folding a
class stores all possible case variants in its ranges. This makes it
possible to drastically simplify the compiler, to the point where case
folding flags can be removed entirely. This has two major implications
for performance:

  1. Matching engines no longer need to do case folding on the input.
  2. Since case folding is now part of the automata, literal prefix
     optimizations are now automatically applied even to regexes with
     (?i).

This makes several changes to the public API of regex-syntax. Namely,
the `casei` flag has been removed from the `CharClass` expression, along
with the corresponding `is_case_insensitive` method.
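A minimal sketch of the second point, assuming ASCII-only case folding and using hypothetical helper names (`folded_classes`, `expand`, `find_prefix`) that are not part of regex-syntax: once each position of a `(?i)` literal is a class holding every case variant, the literal can be expanded into a set of plain strings and searched for with ordinary case-sensitive substring search, without ever folding the haystack.

```rust
/// Hypothetical helper. Assumption: ASCII-only case folding.
/// Each position of a `(?i)` literal becomes the set of its case
/// variants, as a case folded class would store them.
fn folded_classes(lit: &str) -> Vec<Vec<char>> {
    lit.chars()
        .map(|c| {
            let (l, u) = (c.to_ascii_lowercase(), c.to_ascii_uppercase());
            if l == u { vec![c] } else { vec![l, u] }
        })
        .collect()
}

/// Hypothetical helper: expand a sequence of classes into every
/// literal string they can match (the cross product of the classes).
fn expand(classes: &[Vec<char>]) -> Vec<String> {
    let mut out = vec![String::new()];
    for class in classes {
        let mut next = Vec::with_capacity(out.len() * class.len());
        for prefix in &out {
            for &c in class {
                let mut s = prefix.clone();
                s.push(c);
                next.push(s);
            }
        }
        out = next;
    }
    out
}

/// Hypothetical helper: leftmost start of any literal, found with plain
/// case-sensitive substring search. The haystack is never case folded.
fn find_prefix(haystack: &str, lits: &[String]) -> Option<usize> {
    lits.iter().filter_map(|lit| haystack.find(lit.as_str())).min()
}
```

The expansion grows exponentially with the number of letters in the literal, so a real prefix extractor would need to cap the size of the literal set; this sketch does not.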
2015-07-05 13:13:41 -04:00
Andrew Gallant
fb5868fcc7 version bump 2015-07-05 11:47:31 -04:00
Andrew Gallant
56ea4a835c Fixes #99.
TL;DR - The combination of case folding, character classes and nested
negation is darn tricky.

The problem presented in #99 was related to how we're storing case folded
character classes. Namely, we only store the canonical representation
of each character (which means that when we match text, we must apply
case folding to the input). But when this representation is negated,
information is lost.

Take the negated class from #99, which contains the single range `x`.
The class is negated before case folding is applied. The negated class
includes `X`, so case folding adds both `X` and `x`, even though the
regex in #99 is specifically trying to match neither `X` nor `x`.

The solution is to apply case folding *before* negation. But given our
representation, this doesn't work either. Case folding the range `x`
yields `x` with the case insensitive flag set. Negating this class then
matches every character except `x`, which means it will match `X`.

So I've changed the representation back to storing *all* case folding
variants. This means we can negate case folded classes and get the
expected result. For example, case folding the class `[x]` yields
`[xX]`, and negating `[xX]` gives the desired result for the regex in #99.
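The ordering problem can be reproduced with a small sketch. `CharClass`, `negate`, `case_fold` and `matches` below are hypothetical simplifications (sorted inclusive `char` ranges, ASCII-only folding), not the regex-syntax API. With classes that store every case variant, negating and then folding reproduces the #99 bug, while folding and then negating gives the expected result.

```rust
/// Hypothetical, simplified class: sorted, non-overlapping inclusive
/// ranges of `char`. Not the actual regex-syntax type.
#[derive(Debug, Clone, PartialEq)]
struct CharClass {
    ranges: Vec<(char, char)>,
}

/// Next scalar value after `c`, skipping the surrogate gap.
fn next_char(c: char) -> Option<char> {
    match c {
        '\u{D7FF}' => Some('\u{E000}'),
        _ => char::from_u32(c as u32 + 1), // None past char::MAX
    }
}

/// Previous scalar value before `c`, skipping the surrogate gap.
fn prev_char(c: char) -> Option<char> {
    match c {
        '\0' => None,
        '\u{E000}' => Some('\u{D7FF}'),
        _ => char::from_u32(c as u32 - 1),
    }
}

impl CharClass {
    /// Build a class, sorting and merging overlapping or adjacent ranges.
    fn new(mut ranges: Vec<(char, char)>) -> CharClass {
        ranges.sort();
        let mut merged: Vec<(char, char)> = Vec::new();
        for (s, e) in ranges {
            if let Some(last) = merged.last_mut() {
                if next_char(last.1).map_or(true, |n| s <= n) {
                    last.1 = last.1.max(e);
                    continue;
                }
            }
            merged.push((s, e));
        }
        CharClass { ranges: merged }
    }

    /// Complement with respect to all Unicode scalar values.
    fn negate(&self) -> CharClass {
        let mut out = Vec::new();
        // Smallest scalar value not yet covered (None once past char::MAX).
        let mut next = Some('\0');
        for &(s, e) in &self.ranges {
            if let Some(n) = next {
                if n < s {
                    out.push((n, prev_char(s).unwrap()));
                }
            }
            next = next_char(e);
        }
        if let Some(n) = next {
            out.push((n, char::MAX));
        }
        CharClass::new(out)
    }

    /// Assumption: ASCII-only folding. Adds the other-case variant of
    /// every ASCII letter already in the class.
    fn case_fold(&self) -> CharClass {
        let mut out = self.ranges.clone();
        for &(s, e) in &self.ranges {
            let (lo, hi) = (s.max('a'), e.min('z'));
            if lo <= hi {
                out.push((lo.to_ascii_uppercase(), hi.to_ascii_uppercase()));
            }
            let (lo, hi) = (s.max('A'), e.min('Z'));
            if lo <= hi {
                out.push((lo.to_ascii_lowercase(), hi.to_ascii_lowercase()));
            }
        }
        CharClass::new(out)
    }

    fn matches(&self, c: char) -> bool {
        self.ranges.iter().any(|&(s, e)| s <= c && c <= e)
    }
}
```

Because the folded class holds both `x` and `X` explicitly, negation removes both; nothing about case has to be remembered in a separate flag.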
2015-07-05 11:46:11 -04:00
Andrew Gallant
1e79c4d9ee regex-syntax: version bump 2015-06-02 18:16:43 -04:00
Andrew Gallant
f9fc8614d2 Optimize case folding.
When `regex-syntax` is compiled in debug mode, case folding can take a
significant amount of time. This path is easily triggered by any case
insensitive regex.

This commit speeds up the case folding process by skipping binary
searches, although it is still not optimal. It could probably benefit
from a fresh approach, but let's leave it alone for now.
2015-06-02 18:16:04 -04:00
Andrew Gallant
7a72b1fc57 version bump.
Actually, I don't think I needed to bump `regex` proper. Whoops.
2015-05-28 19:14:55 -04:00
Pascal Hertleif
c427a3f4ff Adjust Some Formatting, Use checkadd More
Related to #88
2015-05-29 00:52:43 +02:00
Pascal Hertleif
13eb7bef5f Add '\#' Escaping
Fixes #88
2015-05-28 20:22:54 +02:00
Pascal Hertleif
349158ed27 [WIP] Treat '#' as Punctuation
Relates to #88
2015-05-28 18:31:06 +02:00
Andrew Gallant
6d5e909e5e Fixes from code review.
The big change here is the addition of a non-public variant to the
error enums. This hints to users that they shouldn't match exhaustively
on the enums, since new variants may be added.
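A sketch of the hidden-variant idiom, using a hypothetical `ErrorKind` enum whose variant names are illustrative rather than taken from regex-syntax. Because users are not expected to name the `#[doc(hidden)]` variant, a `match` outside the crate effectively needs a wildcard arm, so adding a variant later does not break it.

```rust
/// Hypothetical error enum; variant names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub enum ErrorKind {
    UnclosedParen,
    InvalidEscape(char),
    /// Not part of the public API. Its presence pushes downstream
    /// `match` expressions to include a wildcard arm.
    #[doc(hidden)]
    __Nonexhaustive,
}

/// How a user of the enum is expected to match on it.
pub fn describe(kind: &ErrorKind) -> String {
    match kind {
        ErrorKind::UnclosedParen => "unclosed parenthesis".to_string(),
        ErrorKind::InvalidEscape(c) => format!("invalid escape: \\{}", c),
        // Covers the hidden variant and any variants added later.
        _ => "unknown error".to_string(),
    }
}
```

Since Rust 1.40, the `#[non_exhaustive]` attribute expresses the same intent directly and the compiler enforces the wildcard arm in downstream crates; the hidden variant was the usual convention before that attribute existed.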
2015-05-27 18:43:28 -04:00