Commit Graph

201 Commits

Author SHA1 Message Date
openharmony_ci
cf2a0fb545
!3 Change software name
Merge pull request !3 from archane/master
2024-11-04 08:10:59 +00:00
zhaipeizhe
f603e0758b update Name in README.OpenSource
Signed-off-by: zhaipeizhe <zhaipeizhe@huawei.com>
Change-Id: I3741af390f8f8373cddd3043b788676b1062c8c3
2024-10-31 16:19:15 +08:00
openharmony_ci
06f542944d
!2 Add OAT.xml and README.OpenSource
Merge pull request !2 from fangting/master
2023-04-14 08:11:08 +00:00
fangting
e5ba656f46 Add OAT.xml and README.OpenSource
Signed-off-by: fangting <fangting12@huawei.com>
2023-04-14 14:14:34 +08:00
openharmony_ci
8d53e3e293
!1 [aho-corasick]Add GN Build Files and Custom Modifications to Rust Third-party Libraries
Merge pull request !1 from lubinglun/master
2023-04-13 11:34:10 +00:00
lubinglun
f64d41183d Add GN Build Files and Custom Modifications
Issue:https://gitee.com/openharmony/build/issues/I6UFTP
Signed-off-by: lubinglun <lubinglun@huawei.com>
2023-04-12 17:25:42 +08:00
Andrew Gallant
7e231db4b4
0.7.20 2022-11-21 22:35:53 -05:00
Andrew Gallant
7705c0aab6
nfa: fix 'heap_bytes' calculation
We weren't previously accounting for the memory used by 'State' itself;
we only counted the *heap* memory used by 'State'.

Fixes #85
2022-11-21 22:00:54 -05:00
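The fix above amounts to counting the inline size of every `State` as well as the heap buffers each `State` owns. A minimal sketch, assuming a hypothetical `State` with two `Vec` fields (not the crate's actual layout); `states_heap_bytes` is an illustrative helper, not the crate's API:

```rust
use std::mem::size_of;

/// Hypothetical NFA state: a sparse transition list and a list of pattern IDs.
struct State {
    trans: Vec<(u8, usize)>,
    matches: Vec<usize>,
}

impl State {
    /// Bytes in the heap buffers this state owns, excluding the `State` itself.
    fn heap_bytes(&self) -> usize {
        self.trans.capacity() * size_of::<(u8, usize)>()
            + self.matches.capacity() * size_of::<usize>()
    }
}

/// Hypothetical helper. Takes `&Vec` (not a slice) because it needs the capacity.
/// Total = the `State` values stored inline in the outer Vec's buffer
/// (the part the old calculation missed) + the heap memory of each state.
fn states_heap_bytes(states: &Vec<State>) -> usize {
    let inline = states.capacity() * size_of::<State>();
    let owned: usize = states.iter().map(State::heap_bytes).sum();
    inline + owned
}
```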
Andrew Gallant
9e42ff1c95
doc: note that Unicode case folding is unlikely to happen
Closes #70
2022-11-21 22:00:54 -05:00
Alex Touchet
a5c037435c
cargo: fix license specification + badge
PR #87
2022-11-21 21:47:06 -05:00
James Youngman
4bd157881a
doc: fix wording
PR #90
2022-11-21 21:45:36 -05:00
Bráulio Bezerra
c640920ee5
doc: remove duplicate "the"
PR #94
2022-11-21 21:44:52 -05:00
Andrew Gallant
2a6d8f3d68 nfa: massively simplify leftmost failure transitions
The key insight here is that all we need to do to support leftmost
semantics is to omit ALL failure transitions that appear after a match
state in the trie. (And to omit any entries in the trie that cross a
previously existing match state for leftmost-first semantics, and keep
them for leftmost-longest.)

Previously, I had somehow convinced myself that the subset was more
difficult to identify and required comparing depths. But this is just
not the case. Moreover, once you set the match state to have a failure
transition to the dead state, it automatically propagates to all
subsequent states.

This is such a huge simplification that I combined the 'standard' and
'leftmost' failure transition construction into a single method.

Fixes #92
2022-11-21 21:41:46 -05:00
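The single construction described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the crate's code: the `Nfa` layout, the `MatchKind` enum, the fixed `DEAD`/`START` state IDs and `build` are hypothetical, empty patterns are skipped, and match bookkeeping for standard semantics is left out.

```rust
use std::collections::{BTreeMap, VecDeque};

// Hypothetical fixed state IDs for this sketch.
const DEAD: usize = 0;
const START: usize = 1;

#[derive(Clone, Copy, PartialEq, Eq)]
enum MatchKind {
    Standard,
    LeftmostFirst,
    LeftmostLongest,
}

struct Nfa {
    trans: Vec<BTreeMap<u8, usize>>,
    fail: Vec<usize>,
    is_match: Vec<bool>,
}

fn build(patterns: &[&str], kind: MatchKind) -> Nfa {
    let mut nfa = Nfa {
        trans: vec![BTreeMap::new(), BTreeMap::new()],
        fail: vec![DEAD, START],
        is_match: vec![false, false],
    };
    for pat in patterns.iter().filter(|p| !p.is_empty()) {
        let mut s = START;
        let mut omitted = false;
        for &b in pat.as_bytes() {
            // Leftmost-first: a pattern that crosses an existing match state
            // can never win, so it is left out of the trie.
            if kind == MatchKind::LeftmostFirst && nfa.is_match[s] {
                omitted = true;
                break;
            }
            s = match nfa.trans[s].get(&b).copied() {
                Some(next) => next,
                None => {
                    let next = nfa.trans.len();
                    nfa.trans.push(BTreeMap::new());
                    nfa.fail.push(START);
                    nfa.is_match.push(false);
                    nfa.trans[s].insert(b, next);
                    next
                }
            };
        }
        if !omitted {
            nfa.is_match[s] = true;
        }
    }
    let leftmost = kind != MatchKind::Standard;
    // One breadth-first pass for every match kind; depth-1 states fail to START.
    let mut queue: VecDeque<usize> = nfa.trans[START].values().copied().collect();
    while let Some(s) = queue.pop_front() {
        // The only leftmost-specific step: a match state fails to DEAD.
        if leftmost && nfa.is_match[s] {
            nfa.fail[s] = DEAD;
        }
        let children: Vec<(u8, usize)> = nfa.trans[s].iter().map(|(&b, &t)| (b, t)).collect();
        for (b, t) in children {
            // Walk s's failure chain. Reaching DEAD makes t fail to DEAD too,
            // which is how DEAD propagates to every state after a match.
            let mut f = nfa.fail[s];
            let target = loop {
                if f == DEAD {
                    break DEAD;
                }
                if let Some(&next) = nfa.trans[f].get(&b) {
                    break next;
                }
                if f == START {
                    break START;
                }
                f = nfa.fail[f];
            };
            nfa.fail[t] = target;
            queue.push_back(t);
        }
    }
    nfa
}
```

For `["ab", "abcd"]` under leftmost-longest, the states for "ab", "abc" and "abcd" all end up failing to `DEAD`, while leftmost-first never adds "abc" or "abcd" to the trie at all.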
Andrew Gallant
979cf735e2
0.7.19 2022-09-03 13:30:50 -04:00
Andrew Gallant
5fa8eda68c
ci: switch from 'v1' to 'master' for dtolnay action 2022-09-03 13:30:01 -04:00
Ten0
9d72e93c87
api: add Match::len method
PR #97
2022-09-03 13:28:54 -04:00
Andrew Gallant
9af8bb9339
ci: switch to dtolnay/rust-toolchain
The actions-rs/toolchain project appears dead.
2022-07-14 13:21:10 -04:00
Andrew Gallant
ec58090dca
ci: pin cross to v0.2.1
Ref https://github.com/rust-lang/regex/pull/869
2022-06-14 09:15:27 -04:00
Andrew Gallant
c1526a8a54
lint: remove dead code
The unused 'start' field in NonMatch is likely a remnant of some
experiments I was doing to get streaming search working with
leftmost match semantics.

The fact that 'config' is unused in the packed searcher was at
first surprising, but it's only ever used as part of construction.
2022-06-02 11:58:26 -04:00
Eli Doran
f8197afced
doc: fix a few typos
PR #86
2021-07-03 07:52:44 -04:00
Petar Dambovaliev
4499d7fdb4
impl: remove unused field and elide lifetime
Fixes #80
2021-05-12 12:01:21 -04:00
Alex Touchet
789774cc11
doc: update links
PR #79
2021-05-06 10:49:21 -04:00
Andrew Gallant
416a02715a
readme: add link to Python wrapper
Kudos to @itamarst for putting in the work to build a nice wrapper!

Closes #77
2021-05-03 17:48:34 -04:00
Andrew Gallant
1b116376d6
0.7.18 2021-04-30 19:53:19 -04:00
Andrew Gallant
8ac8f73a2d
build: fix compilation on i686
It looks like 'cargo fix' didn't quite fix all 'use' statements.
2021-04-30 19:52:55 -04:00
Andrew Gallant
33c65227a3
0.7.17 2021-04-30 19:38:02 -04:00
Andrew Gallant
2281bf6971 deps: update to memchr 2.4
We also use the 'std' feature in lieu of the 'use_std' feature, which
was deprecated quite some time ago.
2021-04-30 19:35:31 -04:00
Andrew Gallant
b149915f5d msrv: bump to Rust 1.41
This is in line with similar changes to the regex and memchr crates:
https://github.com/BurntSushi/memchr/pull/82
and
https://github.com/rust-lang/regex/pull/767
2021-04-30 19:35:31 -04:00
Andrew Gallant
04e8c74175 api: deprecate byte classes and premultiply options
These options aren't really carrying their weight. In a future release,
aho-corasick will always enable both options, with no way to disable
them.
these options require a quadrupling in the amount of code in this crate.

While it's possible to see a performance hit when using byte classes, it
should generally be very small. The improvement, if one exists, just
doesn't seem worth it.

Please see https://github.com/BurntSushi/aho-corasick/issues/57 for more
discussion.

This is meant to mirror a similar decision occurring in regex-automata:
https://github.com/BurntSushi/regex-automata/issues/7.
2021-04-30 19:35:31 -04:00
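For context on what the two deprecated options do, here is a toy sketch, not the crate's implementation; `byte_classes`, `Dfa` and `dfa_for_ab` are hypothetical. Byte classes shrink each DFA row from 256 entries to one entry per equivalence class, and premultiplied state IDs store `row * alphabet_len`, so each transition needs one addition instead of a multiplication and an addition.

```rust
/// Hypothetical byte-class assignment: each byte that occurs in some pattern
/// gets its own class and every other byte shares class 0. Bytes absent from
/// all patterns always take identical transitions, so merging them is safe
/// (this may produce more classes than a minimal partition would).
fn byte_classes(patterns: &[&str]) -> ([usize; 256], usize) {
    let mut seen = [false; 256];
    for pat in patterns {
        for &b in pat.as_bytes() {
            seen[b as usize] = true;
        }
    }
    let mut classes = [0usize; 256];
    let mut len = 1; // class 0 is the shared "other bytes" class
    for b in 0..256 {
        if seen[b] {
            classes[b] = len;
            len += 1;
        }
    }
    (classes, len)
}

/// Hypothetical dense DFA over byte classes with premultiplied state IDs:
/// a transition is `table[id + class]` rather than `table[row * alphabet_len + class]`.
struct Dfa {
    table: Vec<usize>,
    classes: [usize; 256],
    alphabet_len: usize,
}

impl Dfa {
    fn next(&self, premultiplied_id: usize, byte: u8) -> usize {
        self.table[premultiplied_id + self.classes[byte as usize]]
    }
}

/// Builds a DFA for the single pattern "ab" (rows: 0 = start, 1 = saw "a", 2 = match).
fn dfa_for_ab() -> Dfa {
    let (classes, alphabet_len) = byte_classes(&["ab"]);
    let (a, b) = (classes[b'a' as usize], classes[b'b' as usize]);
    // Every unspecified transition goes back to the start row (ID 0).
    let mut table = vec![0; 3 * alphabet_len];
    for row in 0..3 {
        // Stored IDs are premultiplied: row 1 is stored as `alphabet_len`.
        table[row * alphabet_len + a] = alphabet_len;
    }
    table[alphabet_len + b] = 2 * alphabet_len;
    Dfa { table, classes, alphabet_len }
}
```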
Andrew Gallant
c3136f12da bench: update to criterion 0.3.4 2021-04-30 19:35:31 -04:00
Andrew Gallant
b253580d08 edition: run 'cargo fix --edition --edition-idioms' 2021-04-30 19:35:31 -04:00
Andrew Gallant
45a4ee770e edition: run 'cargo fix --edition --all' 2021-04-30 19:35:31 -04:00
Andrew Gallant
3852632f10
0.7.15 2020-11-03 12:20:49 -05:00
Andrew Gallant
682f96a7fe
nfa: fix another ASCII case insensitive bug
When building the failure transitions, we iterate over the transitions
of each state. When ASCII case insensitivity is enabled, it's possible
for this transition list to contain duplicate states which in turn
results in creating duplicate matches in the NFA graph. It turns out
that this is strictly redundant work, so if we have already seen that
state, we can skip it.

Fixes #68
2020-11-03 12:20:37 -05:00
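A minimal sketch of the deduplication, assuming a hypothetical sparse transition list of `(byte, state)` pairs; `add_transition` and `unique_targets` are illustrative names, not the crate's API:

```rust
use std::collections::HashSet;

/// Hypothetical: adds a transition on `byte` and, with ASCII case
/// insensitivity, on its other-case variant too. Both entries lead to the
/// same state, which is where the duplicates come from.
fn add_transition(trans: &mut Vec<(u8, usize)>, byte: u8, to: usize, ascii_ci: bool) {
    trans.push((byte, to));
    let other = if byte.is_ascii_lowercase() {
        byte.to_ascii_uppercase()
    } else {
        byte.to_ascii_lowercase()
    };
    if ascii_ci && other != byte {
        trans.push((other, to));
    }
}

/// Returns each distinct target state once, in transition order, so failure
/// transition construction visits every child state only once.
fn unique_targets(trans: &[(u8, usize)]) -> Vec<usize> {
    let mut seen = HashSet::new();
    trans
        .iter()
        .map(|&(_, to)| to)
        .filter(|&to| seen.insert(to))
        .collect()
}
```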
Andrew Gallant
a416b0c6f2
ci: fix setting of environment variables
See: https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-commands-for-github-actions#environment-files

See: https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/
2020-10-12 19:59:29 -04:00
Andrew Gallant
63f0b52523
0.7.14 2020-10-12 19:34:57 -04:00
Andrew Gallant
e5ea12873a prefilter: fix bug when doing a stream search
This fixes yet another bug in the prefilter. Sigh. This only occurs when
doing a stream search. The problem is that the stream handling code
assumes that if no match is found at the end of the buffer, then the
current state of the automaton is correctly updated and the buffer can
be rolled.

With most prefilters that look for a candidate *start* of a match, this
is okay. If a prefilter can't find anything, then there's nothing to
start and the current state remains in the starting state.

But if the prefilter looks for a byte that may not be at the start of
the match---like the rare byte prefilter---then we cannot assume that a
match doesn't begin near the end of the buffer searched. And in this
case, the internal implementation of search doesn't correctly hold up
its contract because the current state won't be updated. That is, there
is an embedded assumption that if a prefilter fails then there is no
match and thus there is no need to update the current state ID. But of
course, this is just not true in a streaming context.

The right way to fix this is unfortunately to rethink how we've
implemented stream searching and make it aware of these kinds of
prefilters. I think, anyway. The other option would be to fix the lower
level search APIs to always make sure the current state ID is correct.
That would fix everything, but that seems tricky and probably requires
some delicate handling.

So for now, we just disable a prefilter entirely if it's a rare byte
prefilter and we're doing a stream search. We could build a backup
prefilter and still use that, but it feels like a gross hack. At least
now, we preserve correctness.

Kudos to @ogoffart who did the initial investigation here and came up
with a regression test, which is included in this commit. Note, though,
that some tests do fail when the buffer's size is set to its minimum. So
there was a regression at some point because we aren't getting the best
test coverage. We should just bite the bullet and make the buffer size
configurable as an internal API so that tests can tweak it and provoke
more edge cases.

Fixes #64
2020-10-12 19:34:39 -04:00
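The interim fix can be sketched as a simple gate. `Prefilter`, its two variants and `prefilter_for_search` are hypothetical stand-ins for the crate's internals, chosen only to show the rule "no rare byte prefilter during stream searches":

```rust
/// Hypothetical prefilter kinds.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Prefilter {
    /// Reports candidate positions where a match may *start*.
    StartBytes,
    /// Reports positions of a rare byte that may sit anywhere inside a match.
    RareBytes,
}

impl Prefilter {
    fn candidates_are_match_starts(self) -> bool {
        matches!(self, Prefilter::StartBytes)
    }
}

/// For stream searches, drop any prefilter whose candidates are not match
/// starts: when such a prefilter finds nothing in a buffer, a match may still
/// begin near the end of that buffer, so the automaton's state must be kept.
fn prefilter_for_search(pre: Option<Prefilter>, stream: bool) -> Option<Prefilter> {
    match pre {
        Some(p) if stream && !p.candidates_are_match_starts() => None,
        other => other,
    }
}
```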
Ten0
39b029bb22
doc: fix confusing typo
PR #63
2020-07-01 13:16:53 -04:00
Andrew Gallant
55a42968a2
0.7.13 2020-06-23 08:27:48 -04:00
Andrew Gallant
e2cb94a384
tests: remove use of doc_comment crate
It relies on `cfg(doctest)`, which wasn't stabilized until Rust 1.43.
Interestingly, it compiled on Rust 1.28, but didn't compile on, e.g.,
Rust 1.39. This breaks our MSRV policy, so we unfortunately remove the
use of doc_comment for now. It's likely possible to conditionally
enable it, but the extra build script required to do version sniffing to
do it doesn't seem worth it.

The same problem occurred with the regex crate:
d7fbd158f7

Fixes #62
2020-06-23 08:26:27 -04:00
Andrew Gallant
6cb8eb0983
0.7.12 2020-06-22 13:33:20 -04:00
Andrew Gallant
8373f243e3
doc: update some documentation
This adds a few things to the feature list and updates the section on
prefilters to be in line with the current implementation. (The section
on prefilters had been written before aho-corasick adopted the Teddy
implementation.)
2020-06-22 13:32:31 -04:00
Andrew Gallant
bd8c11d295
0.7.11 2020-06-22 12:58:37 -04:00
Draphar
6b1acde65b
api: respect the closure return value
This fixes a bug where the replace_all_with routine wouldn't actually
stop when the closure returned false, even though the documentation
promised it would.

This commit includes test cases in the form of documentation examples.

Closes #59
2020-06-22 12:58:28 -04:00
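The corrected control flow, sketched with a hypothetical signature: matches are passed in as non-overlapping byte ranges instead of being found by an automaton, and the closure receives only the matched text rather than a `Match`.

```rust
/// Hypothetical stand-in for the real routine: `matches` are non-overlapping
/// `(start, end)` byte ranges in ascending order.
fn replace_all_with<F>(haystack: &str, matches: &[(usize, usize)], dst: &mut String, mut replace_with: F)
where
    F: FnMut(&str, &mut String) -> bool,
{
    let mut last = 0;
    for &(start, end) in matches {
        dst.push_str(&haystack[last..start]);
        last = end;
        // The fix: stop as soon as the closure returns false.
        if !replace_with(&haystack[start..end], dst) {
            break;
        }
    }
    // Everything after the last handled match is copied through unchanged.
    dst.push_str(&haystack[last..]);
}
```

With three matches in `"a-b-c"` and a closure that returns `false` on its second call, only the first two matches are replaced and the output is `"X-X-c"`.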
Guillaume Gomez
933b6a71ae
tests: replace "cfg(test)" with "cfg(doctest)" for readme testing
rustdoc now passes "doctest" when running in test mode.

PR #60
2020-05-07 08:07:51 -04:00
Andrew Gallant
36de9d383a
0.7.10 2020-03-08 19:56:29 -04:00
Andrew Gallant
8b479a6090 style: fix rust-analyzer warnings
I still haven't figured out how to turn these warnings off. So just fix
them.
2020-03-08 19:56:08 -04:00
Andrew Gallant
e9110e994b prefilter: fix another case insensitive prefilter bug
This fixes another bug in the handling of case insensitivity inside
the rare byte prefilter. In particular, we were not correctly populating
the byte offset table when ASCII case insensitivity was enabled. Instead
of just setting the offsets for bytes we've seen, we also need to set
offsets for the ASCII case insensitive version of each byte we see. We
add that in this commit along with a regression test.

Fixes #55
2020-03-08 19:56:08 -04:00
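A minimal sketch of the fix, assuming a hypothetical 256-entry table of maximum offsets indexed by byte; `record_offset` is an illustrative name, not the crate's API:

```rust
/// Hypothetical: records `offset` for `byte`, keeping the largest offset
/// seen so far. With ASCII case insensitivity, both case variants of the
/// byte are recorded, since the haystack may contain either form.
fn record_offset(offsets: &mut [usize; 256], byte: u8, offset: usize, ascii_case_insensitive: bool) {
    let mut update = |b: u8| {
        let slot = &mut offsets[b as usize];
        *slot = (*slot).max(offset);
    };
    update(byte);
    if ascii_case_insensitive {
        // For non-letters both calls hit the same slot, which is harmless.
        update(byte.to_ascii_lowercase());
        update(byte.to_ascii_uppercase());
    }
}
```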
Andrew Gallant
c6e47f76b2
0.7.9 2020-02-26 20:25:46 -05:00
Andrew Gallant
d2ade94657 prefilter: fix rare byte prefilter
This fixes a rather nasty bug that occurred when the rare byte prefilter
computed its shift offset incorrectly. In particular, when a rare byte
is found using a prefilter, we shift backwards in the haystack by the
maximum amount possible before confirming whether a match exists or not.
If this shift is not actually the maximum amount possible, then it's
quite possible that we will miss a match. (N.B. The prefilter
infrastructure takes care to avoid accidentally quadratic behavior.)

The specific regression in this case was caused by searching for these
two patterns:

    ab/j/
    x/

which would erroneously fail to match this haystack:

    ab/j/

When prefilters are enabled (the default), this particular search would
use the "rare two byte" prefilter. Specifically, it would detect '/' and
'j' as rare bytes, with '1' as the max offset for '/' and '3' as the max
offset for 'j'. The former is clearly incorrect, since '/' occurs at
offset 4 in the first pattern. This was being incorrectly computed
because we weren't actually looking at all possible bytes in all
patterns and recording their offsets. Once we found a rare byte, we
stopped trying to find more occurrences of it.

We fix this by now recording the maximum offsets of _all_ bytes for
_all_ patterns given. That way, we're guaranteed to have the correct
maximal shift amount for any rare byte found.

Fixes #53
2020-02-26 20:25:13 -05:00
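The corrected offset computation and the resulting backward shift can be sketched as follows, using the patterns from the regression above. `max_offsets` and `candidate_start` are hypothetical helpers, not the crate's API:

```rust
/// Hypothetical: records, for every byte, the largest offset at which it
/// occurs in any pattern. Every occurrence in every pattern is visited;
/// the buggy version stopped looking once it had found a rare byte.
fn max_offsets(patterns: &[&str]) -> [usize; 256] {
    let mut offsets = [0usize; 256];
    for pat in patterns {
        for (i, &b) in pat.as_bytes().iter().enumerate() {
            let slot = &mut offsets[b as usize];
            *slot = (*slot).max(i);
        }
    }
    offsets
}

/// Hypothetical: given a rare byte found at `pos` in the haystack, returns
/// where to start confirming a match by shifting back by the byte's
/// maximum offset (clamped at the start of the haystack).
fn candidate_start(offsets: &[usize; 256], byte: u8, pos: usize) -> usize {
    pos.saturating_sub(offsets[byte as usize])
}
```

For `ab/j/` and `x/`, this gives a maximum offset of 4 for `/` and 3 for `j`. The first `/` in the haystack `ab/j/` sits at position 2, so confirmation starts at position 0, where the match begins; the old offset of 1 would have started at position 1 and missed it.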