Anti-Tamper Taxonomy
Status: Public-facing reference. Vendor-neutral.
Audience: Reverse engineers using RE-AI on binaries wrapped in anti-tamper or VM-based protection.
What this document is
RE-AI's data/drm-indicators.yaml catalog records the observable
signatures of anti-tamper and VM-based protection schemes — section
names, import sets, byte patterns, PEB reads — and the categories of
protection those signatures suggest. This document explains the
taxonomy: what categories the toolchain recognizes, the inference chain
from a binary's observable features to a category, and the negative
space (what RE-AI explicitly does not do).
The catalog is vendor-neutral. The patterns in
data/drm-indicators.yaml and the skills in skills/ describe
categories of protection (encrypted-VM bytecode, MBA-obfuscated
arithmetic, legacy disc-based protection, etc.) — not specific
commercial products. The user supplies vendor attribution based on
their context.
The categories
Pattern C — encrypted-VM bytecode interpreter (proprietary-engine target)
Added 2026-06-05 per the Sample B finding. The proprietary-engine
target family uses a distinct section set (.arch, .link,
.xcode, .xtext, .sbss) and the encrypted body lives in
.rodata (often 100x the size of .text).
- Section table: .arch / .link / .xcode / .xtext / .sbss is the marker. The encrypted body is in .rodata (high entropy, large size), not in a .vmp0 / .xtls / .ecode style dedicated section.
- Real native code: .text is small and normal-entropy. The decrypted dispatch + handler body is what runs at runtime.
- Anti-debug: standard PEB.BeingDebugged / NtQueryInformationProcess / RDTSC / CPUID patterns, but wrapped by the proprietary engine's own anti-tamper stub (not a vendor-attributed wrapper).
- Distinct from Pattern A: Pattern A is the Unity IL2CPP target variant (GameAssembly.dll + global-metadata.dat pairing, .xtls / .didata / .ecode / .xdata / .xpdata / .udata / .00cfg section set, encrypted body in .xtls, the highest-entropy region at 7.85+ entropy). Pattern C is the proprietary-engine target variant (no GameAssembly.dll, no .xtls-style dedicated section, encrypted body in .rodata).
The pattern_indicators.mappings entry "encrypted-VM bytecode,
proprietary-engine target" carries a confidence: Medium-High
because the .arch / .link / .xcode / .xtext / .sbss section
set is rare outside this family — but the encrypted body in
.rodata is also seen in packed-but-not-protected binaries,
so confirm with a dispatcher + lazy-decrypt-stub detection
before publishing.
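The Pattern C observations above can be collected by a small helper over an already-parsed section table. This is a minimal sketch, not the catalog's implementation: the Section tuple, the pattern_c_hints name, and the min_ratio / min_entropy defaults are assumptions for illustration. Only the five marker names and the "often 100x" size hint come from the text.

```python
from typing import NamedTuple

# Section names that the text gives as the Pattern C marker set.
PATTERN_C_SECTIONS = {".arch", ".link", ".xcode", ".xtext", ".sbss"}


class Section(NamedTuple):
    """Hypothetical, already-parsed view of one PE section."""
    name: str
    size: int       # raw size in bytes
    entropy: float  # Shannon entropy, 0.0 to 8.0


def pattern_c_hints(sections, min_ratio=100.0, min_entropy=7.0):
    """Collect the observable Pattern C hints for a section table.

    min_ratio and min_entropy are illustrative defaults, not catalog values.
    """
    by_name = {s.name: s for s in sections}
    markers = sorted(PATTERN_C_SECTIONS.intersection(by_name))
    text, rodata = by_name.get(".text"), by_name.get(".rodata")
    encrypted_rodata = bool(
        text is not None
        and rodata is not None
        and text.size > 0
        and rodata.size / text.size >= min_ratio
        and rodata.entropy >= min_entropy
    )
    return {"marker_sections": markers, "encrypted_rodata": encrypted_rodata}
```

The helper only reports hints. As the paragraph above says, a large high-entropy .rodata also shows up in packed-but-not-protected binaries, so the dispatcher and lazy-decrypt-stub confirmation still has to happen elsewhere.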
Pattern D — publisher telemetry pipeline leak
Added 2026-06-05 per the Sample A + B findings. This is an attack-surface category, not a protection category. The binary's string table contains publisher operational infrastructure that has no business being shipped:
- Sentry DSN: https://<public-key>@<host>/<project-id> form. The public key alone is enough to submit forged crash reports.
- Logstash / log-ingestion URL: internal observability endpoint. Often POST-only but the URL leaks the host.
- Confluence wiki page: internal engineering docs / secrets, often link-only but still information disclosure.
- Google Drive document URL: publisher-internal design docs (the bulk of Sample A's 16,236 string matches).
- Long-lived credentials: AWS access key IDs (AKIA…, ASIA…), Slack tokens (xox[bpaeors]-…).
Detected by re-leak-scan (regex pass over the string table)
and bucketed into string_categories::telemetry_leak in
re-lief.categorize_strings. The re-telemetry-extract skill
adds an active HTTP probe (verify_sentry_dsn,
verify_confluence_url) to confirm each endpoint is still
live.
This is not a DRM / anti-tamper pattern — the encrypted-VM bytecode wrapper does not prevent these leaks because the URL strings are typically not encrypted. The skill output should report the leak as a separate finding, not bundle it with the encrypted-VM finding.
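The regex pass over the string table might look like the following. This is a sketch only: the three patterns are illustrative approximations of the shapes listed above, not the rules that re-leak-scan actually ships, and scan_strings is a hypothetical name. The active HTTP probe done by re-telemetry-extract is not covered here.

```python
import re

# Illustrative patterns only; the real re-leak-scan rules may differ.
LEAK_PATTERNS = {
    "sentry_dsn": re.compile(r"https://[0-9a-fA-F]+@[A-Za-z0-9.-]+/\d+"),
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "slack_token": re.compile(r"\bxox[bpaeors]-[0-9A-Za-z-]{10,}"),
}


def scan_strings(strings):
    """Return (kind, matched_text) pairs for every leak-shaped substring."""
    hits = []
    for s in strings:
        for kind, pattern in LEAK_PATTERNS.items():
            for match in pattern.finditer(s):
                hits.append((kind, match.group(0)))
    return hits
```

Keeping the hits in their own list, separate from any protection-category result, matches the guidance above that the leak is reported as its own finding.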
| Category | Description | Recognizable by |
|---|---|---|
| encrypted-VM bytecode interpreter | The binary's real x86 code is replaced by a register-based VM; a dispatcher fetches handlers from a table. | High-entropy encrypted-TLS section; tiny .text; massive .idata or similar; ordinal-only ws2_32 imports; HWID-vector API set; PDB filename with a vendor tag |
| MBA-obfuscated arithmetic | Arithmetic operations rewritten using bitwise identities to defeat pattern matching. | re-triton.solve_constraint finds Z3 proofs for mba == original |
| legacy disc-based protection | Old-style CD/DVD check or kernel-driver protection. | Section names with securom / .sdc; co-located *.sys drivers; high-frequency DeviceIoControl |
| hardware-fingerprinting routine | Static imports of HWID-vector APIs, regardless of whether the binary is also VM-pack-wrapped. | Imports ≥ 2 of {CryptAcquireContextW, CryptGenRandom, GetUserNameW, GetAdaptersAddresses, GetNetworkParams, GetComputerNameW, GetVolumeInformationW, NtQuerySystemInformation}; direct PEB reads |
| anti-debug / anti-emulation | Static or dynamic checks for a debugger / VM / sandbox. | IsDebuggerPresent / CheckRemoteDebuggerPresent / OutputDebugString imports; 0F 31 (RDTSC) / CD 2D (INT 2D) / many CC (int3) byte patterns; KUSER_SHARED_DATA reads at TickCountLow / InterruptTime |
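
The "Imports ≥ 2 of {…}" rule in the hardware-fingerprinting row is a set intersection. A minimal sketch, assuming the import names have already been extracted into a flat list; the function names and the threshold parameter are hypothetical, and the direct-PEB-read half of the rule is not covered.

```python
# API names copied from the "hardware-fingerprinting routine" row of the table.
HWID_VECTOR_APIS = frozenset({
    "CryptAcquireContextW", "CryptGenRandom", "GetUserNameW",
    "GetAdaptersAddresses", "GetNetworkParams", "GetComputerNameW",
    "GetVolumeInformationW", "NtQuerySystemInformation",
})


def hwid_import_hits(imported_names):
    """Return the sorted HWID-vector APIs present in an import list."""
    return sorted(HWID_VECTOR_APIS.intersection(imported_names))


def looks_like_hwid_routine(imported_names, threshold=2):
    """Apply the table's 'imports >= 2 of the set' rule."""
    return len(hwid_import_hits(imported_names)) >= threshold
```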
Recognizing the patterns in arbitrary binaries
The categories in the table above are abstract — they describe a shape, not a feature you can grep for. This section shows the observable composition that a future agent should be able to recognize from first principles on a binary it has never seen before, using only the catalog and the MCP tools. Every observation below is generic — it describes a category, not a specific commercial product.
Pattern A — encrypted-VM bytecode interpreter (Unity IL2CPP target)
A register-based bytecode VM that has replaced the binary's real x86 code. The observable composition that fires together (all seven are diagnostic; any four is a strong signal):
- The PE's section table contains at least four of the seven section-name regexes \.xtls, \.didata, \.ecode, \.xdata, \.xpdata, \.udata, \.00cfg (defined in data/drm-indicators.yaml::section_indicators.rules). The .xtls section is typically the highest-entropy region (entropy 7.85+).
- The largest code-bearing section violates W^X: it carries CNT_CODE | MEM_EXECUTE | MEM_READ | MEM_WRITE permissions simultaneously. A 100+ MB .idata carrying all four is the canonical example.
- The canonical .text section has virtual_size >> raw_size (e.g. 2.2 MB virtual, 512 bytes raw on disk). This is the large_section_with_tiny_text rule.
- A small (under 200 bytes) .ecode section sits at the PE entry point and contains a lazy-decrypt stub: a 2-instruction walk over the bytecode range that fires on first call, not at load time, gated by a one-byte "done" flag in the section.
- The PE debug directory references a PDB filename that embeds a vendor tag (a name fragment that's not the binary's own basename). Vendor-neutral translation: presence of any non-matching tag in the PDB reference is the signal.
- The exports table ends with a single late-bound entry, a stub the game calls after the interpreter is initialized. The interpreter is "armed but inert" until this export returns.
- The import table shows 8+ of the 12 APIs in drm-indicators.yaml::hwid_apis.high_signal. The fingerprint-vector set is unusual for a non-DRM Unity IL2CPP game.
When all seven fire, the confidence is Medium-High for the
encrypted-VM bytecode interpreter category. re-lief.categorize_strings
will populate the obfuscation bucket (with the dispatch,
handler, lookup, vm_entry keywords) and the hwid bucket
with the imported APIs.
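Three of the section-level observations above (entropy, simultaneous write + execute permissions, and virtual_size >> raw_size) reduce to a few lines of arithmetic. A minimal sketch, assuming the section bytes and header fields have already been extracted by some PE parser. The function names and the factor=64 cutoff are hypothetical and not taken from the catalog; the flag values are the standard PE/COFF section characteristics.

```python
import math
from collections import Counter

# Section characteristic flags from the PE/COFF specification.
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, 8.0 maximum)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


def is_writable_code(characteristics: int) -> bool:
    """True when a section is code, executable, readable and writable at once."""
    mask = (IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE
            | IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE)
    return (characteristics & mask) == mask


def has_tiny_text_shape(virtual_size: int, raw_size: int, factor: int = 64) -> bool:
    """Rough stand-in for large_section_with_tiny_text; factor is an assumption."""
    return virtual_size >= factor * max(raw_size, 1)
```

A section whose bytes score near 8.0 is consistent with the 7.85+ figure quoted for .xtls, but entropy alone does not separate encryption from compression, which is why the composition lists several independent signals.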
Pattern B — hardware-fingerprinting routine + anti-debug, in a third-party launcher activation library
A small native DLL sitting alongside the main game binary, gating launch on a license-server round-trip + host fingerprint. The observable composition that fires together:
- A small (1-3 MB) native DLL with ordinal-only exports (@100, @101, with no symbol names). Exports are deliberately stripped.
- The launcher .exe imports only 2-3 ordinals from this DLL (entry point + setup/teardown). Nothing else. The DLL is opaque to the launcher.
- The activation DLL statically links a recognizable crypto library. The catalog's signal is the .\crypto\... path fragments (1,000+ of them in .rdata). OpenSSL is the most common (look for EVP_*, RSA_*, X509*, PKCS*, BIO_*, PEM_* substrings). re-lief.categorize_strings populates the crypto bucket with 500+ matches on a 3 MB binary.
- The import table shows WinHTTP (WinHttpOpen, WinHttpConnect, WinHttpOpenRequest, WinHttpSendRequest, WinHttpReceiveResponse, WinHttpQueryHeaders, WinHttpReadData) plus the X.509 / Authenticode APIs (CryptQueryObject, PFXImportCertStore, WinVerifyTrust). The network bucket populates accordingly.
- The import table shows 8+ of the 12 APIs in drm-indicators.yaml::hwid_apis.high_signal (GetComputerNameW, GetUserNameW, GetVolumeInformationW, CryptAcquireContextW, CryptGenRandom, GetAdaptersAddresses, etc.). The hwid bucket populates accordingly.
- The import table shows the catalog's anti-debug primitives (IsDebuggerPresent, OutputDebugStringW, NtQueryInformationProcess). The anti_debug bucket populates. (Cycle 2 fix 2026-06-06: each anti_debug_indicators.checks[] entry now carries a confirmation: field of import_only / requires_disasm / requires_xref; the categorizer drops string-table hits that aren't backed by an import or disasm confirmation, eliminating the 48+ false positives on UE / Unity binaries that the prior string-only-equal filter produced.) Important: the anti-debug surface is split between the activation DLL and the encrypted-VM-wrapped game DLL. Typically the activation DLL has the Win32 anti-debug APIs and the game DLL has the VM-encrypted anti-debug.
- The strings dump shows the license-activation and obfuscation categories from re-lief.categorize_strings with non-trivial counts (typically 50-200 strings each on a 3 MB binary). (Cycle 2 fix 2026-06-06: the prior activation bucket was split into ue-component-activation (Unity/UE component-lifecycle noise) and license-activation (the real license-gate vocabulary); Pattern B now reads license-activation.count, not activation.count.)
When all seven fire, the confidence is Medium-High for the
hardware-fingerprinting routine + anti-debug category layered with
a third-party launcher activation library. The activation library
is a separate layer from the main game DLL; the encrypted-VM
interpreter does the game-DLL work, the activation DLL does the
license-gate work, and the launcher .exe is the glue.
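The confirmation: gate on anti-debug hits, described in the list above, can be sketched as a filter. The data shapes here are assumptions: checks is a list of dicts with name and confirmation keys, string_hits and imports are sets of API names, and disasm_confirmed is a hypothetical caller-supplied set of names that a disassembly or xref pass has already backed. The document does not say how requires_disasm and requires_xref differ in practice, so this sketch treats them identically.

```python
def confirmed_anti_debug_hits(checks, string_hits, imports,
                              disasm_confirmed=frozenset()):
    """Keep only anti-debug string hits that have supporting evidence.

    checks: list of {"name": ..., "confirmation": ...} dicts (hypothetical shape).
    """
    kept = []
    for check in checks:
        name, mode = check["name"], check["confirmation"]
        if name not in string_hits:
            continue  # nothing to confirm for this check
        if mode == "import_only":
            backed = name in imports
        else:  # requires_disasm / requires_xref, treated alike in this sketch
            backed = name in disasm_confirmed
        if backed:
            kept.append(name)
    return kept
```

A string-table mention of IsDebuggerPresent with no matching import is dropped, which is the behaviour the Cycle 2 fix describes for engine binaries that merely carry the name as text.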
How to detect the patterns
The MCP tool re-lief.categorize_strings (in re-lief) drives
the static detection. Call it on every DLL and the launcher
.exe in the target. The categorizer buckets strings into
{anti_debug, hwid, crypto, network, registry, process, file, fingerprint, ue-component-activation, license-activation, obfuscation, misc} using the keyword
vocabularies in data/drm-indicators.yaml::string_categories.
The two seed categories (anti_debug, hwid) inherit their
keyword lists from the existing
anti_debug_indicators.checks[].name and
hwid_apis.high_signal[].api lists via a seed_from: YAML
pointer — when a future agent adds a new HWID API to
hwid_apis.high_signal, the hwid category picks it up on next
MCP-server reload with zero Python change.
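The seed_from: mechanism amounts to resolving a path into the same catalog at load time. A minimal sketch over a plain dict standing in for the parsed YAML: the path syntax (dotted keys, with a [] suffix meaning "iterate this list"), the keywords key, and both function names are assumptions, since the real catalog schema is not reproduced here.

```python
def resolve_path(catalog, path):
    """Resolve a dotted path such as 'hwid_apis.high_signal[].api'.

    A '[]' suffix on a segment means 'apply the rest of the path to each item'.
    """
    nodes = [catalog]
    for segment in path.split("."):
        is_list = segment.endswith("[]")
        key = segment[:-2] if is_list else segment
        next_nodes = []
        for node in nodes:
            value = node[key]
            if is_list:
                next_nodes.extend(value)
            else:
                next_nodes.append(value)
        nodes = next_nodes
    return nodes


def category_keywords(catalog, category):
    """Merge a category's own keywords with those pulled in via seed_from."""
    spec = catalog["string_categories"][category]
    keywords = list(spec.get("keywords", []))
    if "seed_from" in spec:
        keywords.extend(resolve_path(catalog, spec["seed_from"]))
    return keywords


# Hypothetical two-entry catalog fragment, for illustration only.
catalog = {
    "hwid_apis": {"high_signal": [{"api": "GetComputerNameW"},
                                  {"api": "GetUserNameW"}]},
    "string_categories": {"hwid": {"seed_from": "hwid_apis.high_signal[].api"}},
}
```

Appending a new {"api": ...} entry to hwid_apis.high_signal changes what category_keywords(catalog, "hwid") returns without touching either function, which is the "zero Python change" property described above.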
The patterns above are the combinations that fire together:
- Pattern A fires when obfuscation.count >= 5 AND hwid.count >= 5 AND the section table contains at least four of the seven \.xtls|\.didata|\.ecode|\.xdata|\.xpdata|\.udata|\.00cfg names AND the .text section has the large_section_with_tiny_text shape.
- Pattern B fires when license-activation.count >= 50 AND crypto.count >= 100 AND the DLL has ordinal-only exports AND the import table shows 8+ of the 12 HWID APIs. (Prior versions of this doc referenced activation.count; the 2026-06-06 Cycle 2 catalog refactor split the activation bucket into ue-component-activation (Unity/UE component-lifecycle noise) and license-activation (the real license-gate vocabulary). Pattern B now reads license-activation.count to avoid the 615 false-positive hits in P3R.exe's UE component vocabulary.)
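The two fire rules can be read as plain boolean predicates. A sketch under assumptions: counts is a hypothetical dict of bucket name to match count as returned by the categorizer, the section regex is applied with fullmatch, and the remaining inputs are precomputed by the caller. None of these signatures come from re-lief itself.

```python
import re

PATTERN_A_SECTION_RE = re.compile(
    r"\.(?:xtls|didata|ecode|xdata|xpdata|udata|00cfg)"
)


def pattern_a_fires(counts, section_names, tiny_text_shape):
    """obfuscation >= 5 AND hwid >= 5 AND 4+ marker sections AND tiny .text."""
    matched = {n for n in section_names if PATTERN_A_SECTION_RE.fullmatch(n)}
    return bool(
        counts.get("obfuscation", 0) >= 5
        and counts.get("hwid", 0) >= 5
        and len(matched) >= 4
        and tiny_text_shape
    )


def pattern_b_fires(counts, ordinal_only_exports, hwid_imports_matched):
    """license-activation >= 50 AND crypto >= 100 AND ordinals AND 8+ HWID APIs."""
    return bool(
        counts.get("license-activation", 0) >= 50
        and counts.get("crypto", 0) >= 100
        and ordinal_only_exports
        and hwid_imports_matched >= 8
    )
```

Because pattern_b_fires reads only the license-activation key, a binary with hundreds of ue-component-activation hits and a handful of license-activation hits stays below the threshold, which is the point of the bucket split.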
The categorizer is deterministic and idempotent: the same catalog and the
same binary always yield the same buckets. The YAML is the single source
of truth for both the indicator set that re-drm-fingerprint reads and the
keyword set that the categorizer reads. Because both read the same file,
the static analysis and the string analysis give consistent answers.
The inference chain
A reverse engineer using RE-AI typically goes:
- Run re-static-triage on the binary. This produces a section list, import table, and a capa capability report.
- Run re-drm-fingerprint to score the binary against the catalog. The skill returns a confidence (Low / Medium / High) and a pattern indicator (the category from the table above).
- Match the indicator against the user's context:
  - The user knows which protection their target uses → the indicator is just confirmation.
  - The user is triaging an unknown binary → the indicator is the signal; the user supplies the vendor attribution (e.g. "this is a commercial encrypted-VM bytecode product shipping with this Unity target").
- Use the right follow-up skill:
  - re-vm-reverse for VM bytecode interpreters (lift the dispatcher)
  - re-mba-deobfuscate for MBA-obfuscated arithmetic (Z3 proofs)
  - re-il2cpp-decompile for Unity IL2CPP class-graph recovery (post-protection, only the symbol table is readable)
  - re-decompile for function-level disassembly + decompilation
  - re-dynamic-analysis (gdb/GEF) for runtime breakpoint / stepping
  - re-symbolic-exec (Triton) for constraint solving on a single function
The negative space
RE-AI explicitly does not:
- Name a specific commercial vendor in any of its tools, data, or generated output. The pattern indicators are descriptive; vendor attribution is the user's call. (The gitignored docs/ and Output/ directories contain historical reports that do name vendors; those are not shipped.)
- Crack or bypass the protection. The skills identify, lift, and document. The user decides what to do with the result.
- Compare two binaries' protection schemes for vendor attribution. (re-lief.normalize_for_diff does structural comparison; vendor attribution is orthogonal.)
- Produce YARA rules for the protection scheme (v2 candidate).
Why vendor-neutral?
Three reasons:
- The patterns are observable facts. The section names, import sets, and byte patterns are real bytes in real binaries. Anyone familiar with a commercial protection product will recognize the patterns. Naming the product in our public-facing tools adds nothing; the inference chain is in the catalog.
- Avoiding vendor attribution makes the toolchain durable. A new protection product that ships next year is recognizable by the same patterns; we don't have to update every skill to add a new vendor. The catalog's pattern_indicators.mappings is the only place that needs new entries.
- The reverse-engineering community is small enough that attribution is redundant. Anyone using this toolchain against a real target already knows what protection their target uses; the pattern indicator confirms it. Anyone using it against an unknown target can apply the inference chain themselves.
Adding a new pattern to the catalog
When you encounter a new anti-tamper scheme with a public analysis:
- Add the section-name regex to data/drm-indicators.yaml::section_indicators.rules (keep flags: any).
- Add the HWID-vector API to hwid_apis.high_signal (or medium_signal if the signal is weaker).
- Add the anti-debug check to anti_debug_indicators.checks.
- Add a new entry to pattern_indicators.mappings with a generic descriptor: and the observable indicators:.
The vendor: field is gone. If you need a vendor-tagged entry (e.g.
"the catalog author has confirmed this pattern is from product X"),
add a note: to the mapping explaining the observation, not the
attribution.
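A pre-commit style check can enforce the rules above. A minimal sketch: the descriptor, indicators, note and confidence keys are named in this document, but the entry layout, the indicator string format, and validate_mapping itself are hypothetical.

```python
REQUIRED_KEYS = {"descriptor", "indicators"}


def validate_mapping(entry):
    """Return a list of problems with one pattern_indicators.mappings entry."""
    problems = []
    if "vendor" in entry:
        problems.append("vendor: is gone; describe the observation in note:")
    for key in sorted(REQUIRED_KEYS.difference(entry)):
        problems.append(f"missing required key: {key}")
    return problems


# Hypothetical entry; the indicator strings are made up for illustration.
example_entry = {
    "descriptor": "encrypted-VM bytecode, proprietary-engine target",
    "confidence": "Medium-High",
    "indicators": ["section:.arch", "section:.xcode", "section:.sbss"],
    "note": "Observed section set is rare outside this family.",
}
```

An empty list from validate_mapping(example_entry) means the entry is shaped correctly; it says nothing about whether the indicators themselves are accurate.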
Glossary
- anti-tamper: Software that detects and prevents tampering (debugging, patching, hooking, dumping). The "DRM" in the original drm-indicators.yaml is a legacy term from when anti-tamper was primarily about copy protection; today's anti-tamper covers all reverse-engineering defenses.
- encrypted-VM bytecode: A bytecode interpreter where the bytecode is stored encrypted and decrypted on-the-fly by a VM dispatcher. The original x86 code is replaced by the VM.
- MBA (Mixed-Boolean-Arithmetic): A class of obfuscation that rewrites arithmetic using bitwise identities. Semantically equivalent to the original; just harder to read.
- HWID (Hardware ID): A fingerprint of the host machine, used for license binding. The HWID-vector API set is the set of Windows APIs most commonly read to assemble the fingerprint.
- dispatcher: The function in a VM that fetches the next handler from a table and jumps to it. The hottest function in a VM by call count.
- PEB (Process Environment Block): A user-mode data structure in Windows that DRM schemes read to detect a debugger. See data/drm-indicators.yaml::peb.
- KUSER_SHARED_DATA: A kernel-mapped page that user code can read without a syscall. DRM schemes read fields here as part of the host fingerprint. See data/drm-indicators.yaml::kuser_shared_data.
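To make the MBA entry concrete, here is one well-known identity, x + y = (x ^ y) + 2 * (x & y), checked exhaustively over 8-bit inputs. This is only an illustration of "semantically equivalent, harder to read": the function names are hypothetical, and the brute-force loop stands in for the Z3 proof that re-mba-deobfuscate relies on, which is not shown here.

```python
def add_plain(x, y, bits=8):
    """Ordinary addition, truncated to a fixed bit width."""
    return (x + y) & ((1 << bits) - 1)


def add_mba(x, y, bits=8):
    """The same addition rewritten as (x ^ y) + 2 * (x & y)."""
    mask = (1 << bits) - 1
    return ((x ^ y) + 2 * (x & y)) & mask


def equivalent(f, g, bits=8):
    """Exhaustively compare two binary functions over all bits-wide inputs."""
    limit = 1 << bits
    return all(f(x, y, bits) == g(x, y, bits)
               for x in range(limit) for y in range(limit))
```

Exhaustive checking only scales to small widths; at 32 or 64 bits a solver-backed proof is the practical route.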