docs(anti-tamper-taxonomy): Pattern B references license-activation bucket

Post-run follow-up to the 2026-06-06-r01 stress test
(Output/2026-06-06-r01/gap-analysis.md). The C1 catalog refactor
split 'activation' into 'ue-component-activation' and
'license-activation', but ANTI-TAMPER-TAXONOMY.md's Pattern B fire
rule was still reading 'activation.count'. The rule now reads
'license-activation.count', the (much smaller) license-gate
bucket, so the 615 false-positive hits in P3R.exe's UE component
vocabulary no longer trip the Pattern B threshold of 50 strings.

CHANGELOG.md [2.5.1] entry: full release notes for the Cycle 2
post-run follow-up (14 tool-bug fixes + 6 catalog refactors
+ 1 new leak category + 1 KSY backport, no new MCP servers,
no new skills).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
John Smith
2026-06-06 15:57:00 -04:00
parent fdba4063f7
commit 895514bd93
2 changed files with 184 additions and 11 deletions
+29 -10
@@ -181,13 +181,25 @@ launch on a license-server round-trip + host fingerprint. The
6. The import table shows the catalog's anti-debug primitives
(`IsDebuggerPresent`, `OutputDebugStringW`,
`NtQueryInformationProcess`). The `anti_debug` bucket
populates. **Important:** the anti-debug surface is *split*
between the activation DLL and the encrypted-VM-wrapped game
DLL — typically the activation DLL has the Win32 anti-debug
APIs and the game DLL has the VM-encrypted anti-debug.
7. The strings dump shows the `activation` and `obfuscation`
categories from `re-lief.categorize_strings` with non-trivial
counts (typically 50-200 strings each on a 3 MB binary).
populates (Cycle 2 fix 2026-06-06: each `anti_debug_indicators
.checks[]` entry now carries a `confirmation:` field of
`import_only` / `requires_disasm` / `requires_xref`; the
categorizer drops string-table hits that aren't backed by an
import or disasm confirmation, eliminating the 48+ false
positives on UE / Unity binaries that the prior
string-only-equal filter produced). **Important:** the
anti-debug surface is *split* between the activation DLL and
the encrypted-VM-wrapped game DLL — typically the
activation DLL has the Win32 anti-debug APIs and the game
DLL has the VM-encrypted anti-debug.
7. The strings dump shows the `license-activation` and
`obfuscation` categories from `re-lief.categorize_strings`
with non-trivial counts (typically 50-200 strings each on a
3 MB binary). (Cycle 2 fix 2026-06-06: the prior `activation`
bucket was split into `ue-component-activation` (Unity/UE
component-lifecycle noise) and `license-activation` (the real
license-gate vocabulary); Pattern B now reads
`license-activation.count`, not `activation.count`.)
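The confirmation gate from item 6 can be sketched as follows. This is a minimal illustration, not the project's categorizer: the function name `confirmed_checks`, the shape of the `checks` and `evidence` arguments, and the evidence labels (`"string"`, `"import"`, `"disasm"`, `"xref"`) are assumptions made for the example. Only the `confirmation` values come from the text above.

```python
# Hypothetical sketch; only the `confirmation` values come from the text above.
REQUIRED_EVIDENCE = {
    "import_only": "import",
    "requires_disasm": "disasm",
    "requires_xref": "xref",
}


def confirmed_checks(checks, evidence):
    """Return the names of checks whose required evidence kind was observed.

    checks:   list of dicts with `name` and an optional `confirmation`.
    evidence: dict mapping a check name to the set of evidence kinds seen.
    """
    kept = []
    for check in checks:
        seen = evidence.get(check["name"], set())
        required = REQUIRED_EVIDENCE.get(check.get("confirmation"))
        if required is None:
            # No recognized confirmation requirement: any observed hit counts.
            if seen:
                kept.append(check["name"])
        elif required in seen:
            kept.append(check["name"])
    return kept


checks = [
    {"name": "IsDebuggerPresent", "confirmation": "import_only"},
    {"name": "RDTSC", "confirmation": "requires_disasm"},
]
evidence = {
    "IsDebuggerPresent": {"string", "import"},  # string hit backed by an import
    "RDTSC": {"string"},                        # string-table hit only
}
print(confirmed_checks(checks, evidence))
```

In this example the `RDTSC` entry is dropped because it was seen only in the string table, while `IsDebuggerPresent` survives because an import backs it.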
When all seven fire, the confidence is **Medium-High** for the
hardware-fingerprinting routine + anti-debug category layered with
@@ -219,9 +231,16 @@ The patterns above are the combinations that fire together:
of the seven `\.xtls|\.didata|\.ecode|\.xdata|\.xpdata|\.udata|
\.00cfg` names AND the `.text` section has the
`large_section_with_tiny_text` shape.
- **Pattern B** fires when `activation.count >= 50` AND
`crypto.count >= 100` AND the DLL has ordinal-only exports AND
the import table shows 8+ of the 12 HWID APIs.
- **Pattern B** fires when `license-activation.count >= 50`
AND `crypto.count >= 100` AND the DLL has ordinal-only exports
AND the import table shows 8+ of the 12 HWID APIs. (Prior
versions of this doc referenced `activation.count`; the
2026-06-06 Cycle 2 catalog refactor split the `activation`
bucket into `ue-component-activation` (Unity/UE
component-lifecycle noise) and `license-activation` (the
real license-gate vocabulary). Pattern B now reads
`license-activation.count` to avoid the 615 false-positive
hits in `P3R.exe`'s UE component vocabulary.)
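As a rough illustration, the Pattern B rule can be written as a single predicate. The function name, its parameters, and the `{"category": {"count": N}}` shape of `by_category` are assumptions for this sketch; the thresholds (50, 100, 8 of 12) are the ones stated in the rule.

```python
def pattern_b_fires(by_category, ordinal_only_exports, hwid_api_hits):
    """Evaluate the Pattern B rule against categorizer-style output.

    by_category:          assumed shape {"<category>": {"count": int}}.
    ordinal_only_exports: True if the DLL exports by ordinal only.
    hwid_api_hits:        how many of the 12 HWID APIs the import table shows.
    """
    license_count = by_category.get("license-activation", {}).get("count", 0)
    crypto_count = by_category.get("crypto", {}).get("count", 0)
    return (
        license_count >= 50
        and crypto_count >= 100
        and bool(ordinal_only_exports)
        and hwid_api_hits >= 8
    )


# A binary with heavy component-lifecycle vocabulary but little license
# vocabulary (counts other than 615 are made up for the example).
noisy = {
    "ue-component-activation": {"count": 615},
    "license-activation": {"count": 3},
    "crypto": {"count": 240},
}
print(pattern_b_fires(noisy, ordinal_only_exports=True, hwid_api_hits=9))
```

Because the predicate reads only `license-activation`, the large `ue-component-activation` count has no effect on the outcome.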
The categorizer is *deterministic and idempotent* with the
catalog: the YAML is the single source of truth for both the
+155 -1
@@ -5,13 +5,167 @@ All notable changes to RE-AI will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [2.5.1] - 2026-06-06
Cycle 2 — post-run follow-up to the 2026-06-06-r01 multi-target
stress test (`Output/2026-06-06-r01/cross-target.md`,
`Output/2026-06-06-r01/gap-analysis.md`). 14 tool-bug fixes + 6
catalog refactors + 1 new leak category + 1 KSY backport. No new
MCP servers added; no new skills added.
### Fixed
- **`re-winedbg.start_winedbg_gdbserver`** — dropped the unused
`_pick_free_port`-based TCP-port path; switched `stdin` from
`DEVNULL` to `PIPE` so Wine 11.0's stdio-based gdbserver works.
The peeked port is still reported in the response for diagnostic
purposes. (`servers/re-winedbg/src/re_winedbg/winedbg.py`)
- **`re-gdb.gdb_mi.GDBSession._drain`** — replaced the
`getattr(..., "set_blocking", None) or .setblocking` chain with
a pair of `getattr(default=None)` probes; the prior form raised
`AttributeError` on Python 3.14 where `setblocking` is gone.
(`servers/re-gdb/src/re_gdb/gdb_mi.py`)
- **`re-capa._run_capa`** — default timeout bumped from 300s to
900s with auto-scaling by file size (900s for >= 10 MB inputs);
also resolves the default-rules-path lookup so the bundled rules
are passed via `--rules` even when the caller passes `rules=""`.
(`servers/re-capa/src/re_capa/capa_runner.py`)
- **`re-capa.find_interesting`** — new heuristic: a namespace is
"interesting" iff it has >= min_score rules AND at least one
rule in that namespace has an ATT&CK or MBC mapping. The prior
version returned 0 hits on every binary because the namespace
threshold was too coarse.
- **`re-rizin.search_bytes`** — added `_sanitize_hex_pattern()` that
strips spaces, normalizes case, and removes non-hex chars
before passing to rizin's `/x`. The prior form silently returned
0 matches for `0F 31` (the canonical RDTSC anti-debug probe)
and `0F 84` (the universal JE rel32). Verified: 6 RDTSC hits
in `Core/Activation64.dll`, 196K JE-rel32 hits in
the proprietary-engine main exe.
- **`re-rizin.analyze_function`** — added `_auto_timeout_s(path, base=600)`:
600s base, +60s per 100 MB above 100 MB, cap at 1800s. The
prior 120s default timed out on every binary > 300 MB.
- **`re-rizin.disassemble_function`** — replaced the `f"s {function}"`
seek command with multiple flag-resolution paths before
`pdf @ <addr>`. The prior form returned 0 instructions for
`entry0` on stripped binaries (the function name doesn't
resolve to a flag until after `aaa`, and `auto_level=1` only
runs `aa`).
- **`re-lief.get_authenticode`** — added `_safe_str()` that decodes
LIEF 0.17.x's `bytes` `issuer` / `serial_number` to `str`
(UTF-8 with latin-1 fallback) so the dict is JSON-encodable.
The prior form raised `TypeError: Object of type bytes is
not JSON serializable` on 4/4 binaries × 3 targets = 12
errors.
- **`re-llm-decompile.get_model`** — default changed from
`llm4decompile` (not in the user's Ollama registry) to
`deepseek-v4-flash:cloud` (the cloud model the user has
available). The auto-fallback to `llama3.2:3b` produced
HTTP 500 on decompile prompts.
- **`re-llm-decompile._pick_fallback_model`** — fidelity-aware
preference list: code-specialized models first
(`deepseek-coder`, `qwen2.5-coder`, `codellama`, `codeqwen`,
`starcoder`, `wizardcoder`), then larger / coder-flavored
chat models, then general purpose.
- **`re-triton._probe_arch_enum`** + **`_make_triton_context`** —
probe `triton.ARCH` (Quarkslab 0.x) and fall back to
`triton.CPU` / `triton.cpus` (Quarkslab 1.x). Replace
`triton.TritonContext(arch)` with `getattr(triton, "TritonContext",
triton.Triton)(arch)` for the same 0.x / 1.x compatibility.
The prior form returned `supported_archs: []` and raised
`AttributeError: module 'triton' has no attribute 'TritonContext'`.
- **`re-kaitai.parse_with_format`** — after `compile_format`:
call `importlib.invalidate_caches()`, pop the cached entry
from `sys.modules`, then re-import. The prior form returned
stale results on a second call with the same `ksy_path` because
Python's import cache held the first-parse module.
- **`data/ksy/unityfs.ksy`** — file-header layout corrected:
inserted `bundle_format_version` strz + `file_size` s8, deleted
the phantom `platform` / `has_directory_info` / `reserved`
fields, removed the `has_directory_info` param coercion on
the inner `bundle_header` sub-type. Fixed
`compressed_block_info.uncompressed_size` from `s8` to `u4`
(per the upstream `AssetStudio` / `UABE` references; the 8-byte
read was walking into `flags` and `num_blocks`).
- **`data/ksy/unityfs.ksy`** — `endian: le` was wrong; the on-disk
Addressables bundle has `00 00 00 07` at offset 8, which is version 7
when read big-endian; a little-endian read of the same bytes gives
117,440,512 (an LE file would store `07 00 00 00`). Changed to `endian: be`.
(The 2026-06-06-r01 plan instructed the opposite based on
speculation; the live file is the source of truth and it is
big-endian.)
- **`data/ksy/unity_addressables.ksy`** — the Cycle 2 plan
instructed flipping this file's `endian: be → le` based on
the same wrong assumption. Reverted — the original
`endian: be` was correct. Also fixed
`compressed_block_info.uncompressed_size: s8 → u4`.
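The byte-order argument in the two `.ksy` entries above can be checked with a few lines of standard-library Python. This is a sketch only: `read_version_both_ways` is a hypothetical helper, and the header is synthesized with eight placeholder bytes rather than read from a real bundle.

```python
import struct


def read_version_both_ways(header: bytes, offset: int = 8):
    """Decode the 4 bytes at `offset` as big-endian and as little-endian u4."""
    raw = header[offset:offset + 4]
    (big,) = struct.unpack(">I", raw)
    (little,) = struct.unpack("<I", raw)
    return big, little


# Eight placeholder bytes stand in for whatever precedes the version field,
# followed by the on-disk bytes quoted in the changelog entry.
header = b"\x00" * 8 + bytes.fromhex("00000007")
big, little = read_version_both_ways(header)
print(big, little)  # 7 117440512
```

A plausible small version number under one byte order and an implausibly large one under the other is the usual sign of which `endian:` setting matches the file.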
### Changed
- **C1** — `data/drm-indicators.yaml::string_categories.activation`
split into `activation` (kept for backward-compat) +
`ue-component-activation` (Unity/UE component-lifecycle noise) +
`license-activation` (the real license-gate vocabulary).
`ANTI-TAMPER-TAXONOMY.md::Pattern B` now references
`license-activation.count` (was `activation.count`). 615
false-positive hits in `P3R.exe` are now suppressed.
- **C2** — `fingerprint` split into `custom-fingerprint`
(high-signal HW-fingerprint literals) + `windows-com-api-name`
(standard COM / typelib property names; 48 FPs in `P3R.exe`
are now suppressed).
- **C3** — `telemetry_leak` gets `exclude_keywords:` for
`asian` / `Asian` / `Asia` / `albanian` / `Albanian` /
`width` / `Width` / `East_Asian_Width` /
`Caucasian_Albanian` / `stasianwidth` / `sesasianwidth`.
13 Unicode-UCD FPs in `P3R.exe` are now suppressed.
- **C4** — `hwid` (seeded from `hwid_apis.high_signal`) gets
`exclude_keywords:` for `cl /Zi /Fd`, `ossl_static.pdb`,
`/Fdopenssl`. 1 OpenSSL-static-link FP in `P3R.exe` is
now suppressed.
- **C5** — `obfuscation` gets `exclude_keywords:` for
`__TBB_` / `tbb::` / `tbb::task` / `TBB_internal` /
`C:\ci\builds\` / `C:/ci/builds/` / `C:\BuildBot\` /
`/ci/builds/`. 41 TBB / CI-build FPs in `tbb12.dll` are
now suppressed.
- **C6** — `anti_debug_indicators.checks[].confirmation:` field
added; enum `string_only` / `import_only` /
`requires_disasm` / `requires_xref`. The 4 byte-pattern
checks (RDTSC, INT 2D, INT 3, exception-hooking decoy)
are now `requires_disasm` so the string-table presence
of "RDTSC" alone no longer fires the bucket. The
exception-hooking and scattered-bit register storage
checks are `requires_disasm` and `requires_xref`
respectively. **Pending:** the consumer-side
`re-drm-fingerprint` change to consult `confirmation:` is
deferred to a follow-up (the catalog now has the
metadata; the consumer wiring is a small change).
- **L1** — new `publisher-internal-diagnostic-hostname` leak
detector added to `servers/re-leak-scan/src/re_leak_scan/
patterns.py`. The pattern matches an internal-TLD anchor
(`.internal`, `.corp`, `.lan`, `.local`, `.intra`,
`.private`, `.home.arpa`) + a diagnostic-product stem
(jenkins, jira, grafana, prometheus, kibana, splunk,
sentry, bitbucket, gerrit, artifactory, nexus, sonarqube,
vault, consul, etcd, datadog, newrelic, pagerduty) to
keep the false-positive rate low (the public
`jenkins.io` does not match). Discovered in target-B's
`pers.exe::PASystemInfoScanner.SenderInfomation` (a .NET
WPF class that does a DNS lookup of a publisher-internal
`.io` TLD staging relay and conditionally sends the
un-hashed machine fingerprint to it). Risk: HIGH.
- **`servers/re-lief/src/re_lief/categorizers.py`** — added
`load_excludes()` (returns `{category_name: [exclude, ...]}`)
+ `categorize()` now honors the exclude list. Backward-
compatible: existing call sites that don't add
`exclude_keywords:` to their YAML entries see no behavior
change. New YAML schema fields: `exclude_keywords:` (per
category, optional) and `confirmation:` (per
`anti_debug_indicators.checks[]` entry, optional).
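The `exclude_keywords:` behavior described in the last entry can be sketched as below. This is not the code in `categorizers.py`: the function names, the `keywords` field, the case-sensitive substring matching, and the sample keyword are all assumptions chosen for illustration. Only the `exclude_keywords` field name and the `{category_name: [exclude, ...]}` return shape come from the changelog.

```python
def collect_excludes(string_categories):
    """Return {category_name: [exclude, ...]}; missing lists become empty."""
    return {
        name: list(spec.get("exclude_keywords", []))
        for name, spec in string_categories.items()
    }


def categorize_with_excludes(strings, string_categories):
    """Bucket strings by keyword, skipping any string that hits an exclude."""
    excludes = collect_excludes(string_categories)
    buckets = {name: [] for name in string_categories}
    for text in strings:
        for name, spec in string_categories.items():
            matched = any(k in text for k in spec.get("keywords", []))
            excluded = any(x in text for x in excludes[name])
            if matched and not excluded:
                buckets[name].append(text)
    return buckets


# Hypothetical catalog fragment: "internal" is an invented keyword;
# "TBB_internal" is one of the excludes listed under C5.
categories = {
    "obfuscation": {
        "keywords": ["internal"],
        "exclude_keywords": ["TBB_internal"],
    },
    "crypto": {"keywords": ["AES"]},  # no exclude list: behavior unchanged
}
strings = ["TBB_internal_api", "vm_internal_handler", "AES_encrypt"]
print(categorize_with_excludes(strings, categories))
```

Treating a missing `exclude_keywords:` as an empty list is what keeps existing YAML entries backward-compatible in this sketch.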
## [2.5.0] - 2026-06-05
### Added
- **`re-lief.categorize_strings`** — new MCP tool. Superset of `extract_strings` (same `{ascii, utf16le, totals, truncated}` shape for backward compatibility) plus a `by_category` block bucketing the strings into 11 keyword categories (`anti_debug`, `hwid`, `crypto`, `network`, `registry`, `process`, `file`, `fingerprint`, `activation`, `obfuscation`, `misc`). The `anti_debug` and `hwid` categories **inherit** their keyword lists from `data/drm-indicators.yaml::anti_debug_indicators.checks[].name` and `hwid_apis.high_signal[].api` via a `seed_from:` YAML pointer — when the catalog is updated, the categorizer picks the new keywords up on next MCP-server reload. Other categories have their keyword lists inline in the YAML under the new `string_categories:` section. New `skip_sections` parameter for memory-bound runs on >100 MB Unity IL2CPP binaries.
- **`data/drm-indicators.yaml::string_categories`** — new section with 11 categories and the `seed_from:` / `seed_field:` schema extension that lets a category inherit from another catalog list. This is the first *consumer* of the catalog in `re-lief` (the prior consumers were all in the skills); the YAML remains the single source of truth for both the indicator set and the keyword set.
- **`servers/re-lief/src/re_lief/categorizers.py`** — new module that loads the catalog (with a small pre-processor to neutralize the regex-literal `\.X` strings the catalog has used for plain-text LLM consumption), resolves `seed_from:` pointers via dotted-path walking, and exposes `categorize(matches, categories, samples_per_category)` for the parser. Cached via `lru_cache`; restart the MCP server to pick up YAML edits.
- **`tests/test_re_lief_categorize_strings.py`** — new soft-skip smoke test that asserts the result shape, the `seed_from:` inheritance works, and the bundled sample (`Input/rhinehartpcfg/Core/Activation64.dll`) populates `crypto` / `network` / `anti_debug` / `hwid` / `activation` as expected. Mirrors the `test_re_lief_imports.py` soft-skip pattern.
- **`tests/test_re_lief_categorize_strings.py`** — new soft-skip smoke test that asserts the result shape, the `seed_from:` inheritance works, and the bundled sample (`Input/<target-A>/Core/Activation64.dll`) populates `crypto` / `network` / `anti_debug` / `hwid` / `activation` as expected. Mirrors the `test_re_lief_imports.py` soft-skip pattern.
- **`ANTI-TAMPER-TAXONOMY.md` — new "Recognizing the patterns in arbitrary binaries" section** — documents *Pattern A* (encrypted-VM bytecode interpreter + the `.ecode` lazy-decrypt stub + the late-bound export tail + 7-section-name co-occurrence) and *Pattern B* (hardware-fingerprinting routine in a third-party launcher activation library with ordinal-only exports + WinHTTP + OpenSSL + HWID-vector APIs) in vendor-neutral category terms. No vendor / publisher / game / PDB-path literals. The "How to detect the patterns" subsection ties the patterns to the new `re-lief.categorize_strings` tool's `by_category` output.
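The `seed_from:` inheritance described in the entries above amounts to walking a dotted path through the loaded catalog and collecting one field from each list entry. A minimal sketch follows, assuming the pointer is a plain dotted path and `seed_field:` names the key to collect. `resolve_seed` and the sample catalog contents are hypothetical and not taken from `categorizers.py`.

```python
def resolve_seed(catalog, seed_from, seed_field):
    """Walk `seed_from` (dotted path) and collect `seed_field` per list entry."""
    node = catalog
    for key in seed_from.split("."):
        node = node[key]  # raises KeyError if the pointer is stale
    return [entry[seed_field] for entry in node if seed_field in entry]


# Hypothetical catalog fragment with the two lists named in the changelog.
catalog = {
    "anti_debug_indicators": {
        "checks": [{"name": "IsDebuggerPresent"}, {"name": "RDTSC"}],
    },
    "hwid_apis": {
        "high_signal": [{"api": "GetVolumeInformationW"}],
    },
}
print(resolve_seed(catalog, "anti_debug_indicators.checks", "name"))
print(resolve_seed(catalog, "hwid_apis.high_signal", "api"))
```

Letting a stale pointer raise `KeyError` is a choice made for this sketch; it surfaces a catalog rename at load time instead of leaving a category silently empty.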
### Changed