mirror of
https://github.com/Heretek-AI/RE-AI.git
synced 2026-07-01 22:34:01 -04:00
f5e5e9e72c
Adds a keyword-bucketed strings dump to the re-lief MCP server, turning
the manual-grep step that today lives in the LLM's head into a
catalog-driven, deterministic lookup. Superset of extract_strings
(same {ascii, utf16le, totals, truncated} shape for backward compat)
plus a by_category block with 11 semantic categories (anti_debug,
hwid, crypto, network, registry, process, file, fingerprint,
activation, obfuscation, misc).
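
The bucketing step can be sketched in a few lines of Python. This is an illustrative sketch, not the re-lief implementation: `bucket_strings`, the two-entry `CATEGORIES` table, and the per-hit dict layout are assumptions made for the example. It follows the matching rule stated in the catalog's `string_categories` description (case-insensitive substring match, a string may land in several categories, de-duplication within a category by (string, section)).

```python
from collections import defaultdict

# Hypothetical keyword table; the real vocabulary is loaded from
# data/drm-indicators.yaml::string_categories.
CATEGORIES = {
    "network": ["WinHttp", "https://"],
    "registry": ["HKLM", "MachineGuid"],
}


def bucket_strings(strings, categories=CATEGORIES):
    """Group (string, section) pairs by case-insensitive keyword match."""
    buckets = defaultdict(list)
    seen = defaultdict(set)  # per-category (string, section) pairs already recorded
    for text, section in strings:
        lowered = text.lower()
        for name, keywords in categories.items():
            if any(keyword.lower() in lowered for keyword in keywords):
                key = (text, section)
                if key not in seen[name]:
                    seen[name].add(key)
                    buckets[name].append({"string": text, "section": section})
    return dict(buckets)


sample = [
    ("WinHttpOpen", ".rdata"),
    ("HKLM\\SOFTWARE\\Microsoft\\Cryptography", ".rdata"),
    ("WinHttpOpen", ".rdata"),  # duplicate of the first pair, dropped
    ("hello world", ".data"),   # matches no keyword, stays uncategorized
]
result = bucket_strings(sample)
```

Because the lookup is driven entirely by the keyword table, the same input always produces the same buckets, which is the point of replacing the manual grep.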
The categorization vocabulary lives in a new
data/drm-indicators.yaml::string_categories section. Two seed
categories (anti_debug, hwid) inherit their keyword lists from
existing catalog sections via a seed_from / seed_field YAML pointer
— when a future agent adds a new HWID API to hwid_apis.high_signal,
the categorizer picks it up on next MCP-server reload with zero
Python change. The YAML is the single source of truth for both the
indicator set that re-drm-fingerprint reads and the keyword set
that the categorizer reads.
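
The `seed_from` / `seed_field` pointer can be read as a dotted path into the parsed catalog plus a field to pluck from each list entry. A minimal sketch, assuming the YAML has already been parsed into plain dicts and lists; `resolve_seed` and `load_keywords` are hypothetical names, not the functions in the re-lief server, and the catalog below is a trimmed stand-in for the real file.

```python
def resolve_seed(catalog, seed_from, seed_field):
    """Follow a dotted path such as 'hwid_apis.high_signal' and collect one field."""
    node = catalog
    for part in seed_from.split("."):
        node = node[part]
    return [entry[seed_field] for entry in node if seed_field in entry]


def load_keywords(catalog):
    """Build {category name: keyword list}, honouring seed pointers when present."""
    keywords = {}
    for category in catalog["string_categories"]["categories"]:
        if "seed_from" in category:
            keywords[category["name"]] = resolve_seed(
                catalog, category["seed_from"], category["seed_field"]
            )
        else:
            keywords[category["name"]] = list(category.get("keywords", []))
    return keywords


# Trimmed, hand-written stand-in for the parsed YAML catalog.
catalog = {
    "hwid_apis": {
        "high_signal": [
            {"api": "GetVolumeInformationW", "library": "kernel32"},
            {"api": "GetComputerNameW", "library": "kernel32"},
        ]
    },
    "string_categories": {
        "categories": [
            {"name": "hwid", "seed_from": "hwid_apis.high_signal", "seed_field": "api"},
            {"name": "registry", "keywords": ["HKLM"]},
        ]
    },
}

keywords = load_keywords(catalog)

# Adding an API to the seed list changes the category on the next load,
# with no change to the loader code.
catalog["hwid_apis"]["high_signal"].append({"api": "GetAdaptersInfo", "library": "iphlpapi"})
reloaded = load_keywords(catalog)
```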
Five skills (re-static-triage, re-malware-triage, re-drm-fingerprint,
re-vm-reverse, re-format-decode) had their manual-grep step replaced
with a call to re-lief.categorize_strings. No new workflow steps
were added — the categorizer IS the string scan.
ANTI-TAMPER-TAXONOMY.md gains a "Recognizing the patterns in
arbitrary binaries" section that documents Pattern A (encrypted-VM
bytecode interpreter: 7 co-occurring section names + writable+executable .idata +
.text virt>>raw + .ecode lazy-decrypt stub + vendor-tagged PDB +
late-bound export tail + 8+ HWID APIs) and Pattern B
(hardware-fingerprinting routine in a third-party launcher
activation library: ordinal-only exports + WinHTTP + OpenSSL +
HWID-vector APIs + split anti-debug surface) in vendor-neutral
category terms. No vendor / publisher / game / PDB-path literals
appear in any shipped file.
Tests: 7 new soft-skip tests in test_re_lief_categorize_strings.py
covering the result shape, the seed_from inheritance, the bundled
Activation64.dll high-signal hits, the legacy extract_strings
wrapper, and the GameAssembly full-section vs skip_sections paths.
All always-on tests (leakage, frontmatter, server registration,
smoke) continue to pass. ./verify.sh is green.
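
One way to write a soft-skip test, assuming "soft-skip" means the test is skipped rather than failed when a sample binary is absent. The fixture path and class name below are hypothetical, and the sketch uses the standard-library `unittest` rather than whatever runner the repository uses.

```python
import os
import unittest

# Hypothetical fixture path; the real tests reference bundled sample binaries.
SAMPLE = os.path.join("tests", "fixtures", "does-not-exist.dll")


class CategorizeStringsShape(unittest.TestCase):
    @unittest.skipUnless(os.path.exists(SAMPLE), "sample binary not present")
    def test_by_category_present(self):
        # A real test would call the categorizer on SAMPLE and inspect the result.
        self.fail("unreachable without the fixture")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(CategorizeStringsShape)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

A skipped test still counts as a successful run, so the always-on checks stay green on machines that lack the sample binaries.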
701 lines
28 KiB
YAML
# DRM / anti-tamper indicator catalog for the RE-AI plugin.
#
# This file is the data layer for the `re-vm-reverse`, `re-mba-deobfuscate`,
# and `re-drm-fingerprint` skills. It is intentionally structured so the
# LLM can read it as plain text — no schema magic, just consistent
# indentation and one observation per bullet.
#
# Sources: public reverse-engineering literature on encrypted-VM
# bytecode interpreters, MBA-obfuscated arithmetic, and the kuser /
# PEB offsets documented in ReactOS / Wine / Windows Internals 7th ed.
#
# The catalog is intentionally vendor-neutral: it records *observable
# patterns* (section names, import sets, byte patterns, PEB reads) and
# the *categories* of anti-tamper those patterns suggest. The user
# supplies the vendor attribution based on their context — see
# ANTI-TAMPER-TAXONOMY.md for the inference chain.
#
# Update this file when a new pattern family shows up in the wild.
# Don't update it for every new anti-tamper trick — focus on
# *static-detectable* indicators (imports, sections, byte patterns).

version: 3
|
|
last_updated: 2026-06-04
|
|
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
# KUSER_SHARED_DATA: read by encrypted-VM anti-tamper and other DRM to fingerprint
# the host. The fields have mixed widths (from 1 byte up to a 520-byte buffer)
# and live in the kernel's user-shared page.
# Address: 0x7FFE0000 in user mode (x86 and x64 processes); 0xFFFFF78000000000
# is the x64 kernel-mode mapping of the same page.
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
|
|
kuser_shared_data:
|
|
description: |
|
|
KUSER_SHARED_DATA is a kernel-mapped page that user code can read
|
|
without a syscall. DRM schemes use the values here as part of the
|
|
hardware fingerprint. If a binary reads multiple of these offsets
|
|
in sequence and then sends the result to a server, it is likely
|
|
a DRM fingerprinting routine.
|
|
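address_64: 0xFFFFF78000000000 # x64 kernel-mode mapping; user-mode code cannot read it
address_32: 0x7FFE0000 # user-mode mapping, valid in both x86 and x64 processes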
|
|
fields_of_interest:
|
|
- offset: 0x26C
|
|
name: NtMajorVersion
|
|
size: 4
|
|
note: Windows major version (e.g. 10 = Windows 10/11).
|
|
- offset: 0x270
|
|
name: NtMinorVersion
|
|
size: 4
|
|
note: Windows minor version.
|
|
- offset: 0x260 # populated on Windows 10 and later
|
|
name: NtBuildNumber
|
|
size: 4
|
|
- offset: 0x274
name: ProcessorFeatures
size: 64
note: array of 64 one-byte processor-feature flags (the data behind `IsProcessorFeaturePresent`); commonly fingerprint-checked.
|
|
- offset: 0x030
name: NtSystemRoot
size: 520 # WCHAR[260] buffer, not a UNICODE_STRING
note: read via `GetWindowsDirectoryW` or direct memory access.
|
|
- offset: 0x264
|
|
name: NtProductType
|
|
size: 4
|
|
- offset: 0x320
name: TickCount
size: 4 # low 32 bits of the 12-byte KSYSTEM_TIME
note: encrypted-VM anti-tamper and other DRM read this for anti-debug timing.
|
|
- offset: 0x000
|
|
name: TickCountLowDeprecated
|
|
size: 4
|
|
- offset: 0x008
|
|
name: InterruptTime
|
|
size: 8
|
|
note: encrypted-VM anti-tamper reads this in spin-lock CPUID guards.
|
|
detection_pattern: |
|
|
A sequence of >= 3 distinct reads from this region (in any form:
direct loads from the 0x7FFE0000 page, wrappers such as `GetTickCount`
that read it, or equivalent queries via `NtQuerySystemInformation`)
combined with network or registry persistence is a strong DRM signal.
|
|
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
# PEB (Process Environment Block) — same deal. Fields commonly read
|
|
# by DRM routines.
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
|
|
peb:
|
|
description: |
|
|
The PEB pointer lives in the TEB at fs:[0x30] (x86) / gs:[0x60] (x64).
DRM routines read PEB fields to fingerprint the process and detect hooks.
|
|
fields_of_interest:
|
|
- offset_x64: 0x060
|
|
offset_x86: 0x030
|
|
name: PEB pointer itself
|
|
note: TEB-relative offset of the ProcessEnvironmentBlock field, read as fs:[0x30] / gs:[0x60]; the remaining offsets in this list are PEB-relative.
|
|
- offset_x64: 0x118
|
|
offset_x86: 0x0A4
|
|
name: OSMajorVersion
|
|
- offset_x64: 0x11C
|
|
offset_x86: 0x0A8
|
|
name: OSMinorVersion
|
|
- offset_x64: 0x120
|
|
offset_x86: 0x0AC
|
|
name: OSBuildNumber
|
|
- offset_x64: 0x002
offset_x86: 0x002
|
|
name: BeingDebugged
|
|
note: classic anti-debug check. Always 0 in non-attached process.
|
|
- offset_x64: 0x030
offset_x86: 0x018
|
|
name: ProcessHeap
|
|
note: encrypted-VM anti-tamper and other DRM read this for heap-spray detection.
|
|
detection_pattern: |
|
|
Direct PEB reads without going through NtCurrentTeb() are a strong
|
|
static signal of anti-analysis code.
|
|
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
# HWID-vector API imports. These are the user-mode APIs most commonly
|
|
# used to assemble a hardware fingerprint. Encrypted-VM anti-tamper,
|
|
# legacy disc-based protection, and HWID-fingerprinting routines all
|
|
# read some subset of these.
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
|
|
hwid_apis:
|
|
description: |
|
|
APIs whose values are commonly used to assemble a hardware
|
|
fingerprint. A binary that imports *several* of these is a
|
|
candidate for being a DRM routine or a malware host-fingerprinting
|
|
routine.
|
|
high_signal:
|
|
- api: GetVolumeInformationW
|
|
library: kernel32
|
|
note: serial number of the boot volume — classic DRM check.
|
|
- api: GetComputerNameW
|
|
library: kernel32
|
|
note: hostname; also common in benign software, so weak evidence on its own.
|
|
- api: GetWindowsDirectoryW
|
|
library: kernel32
|
|
- api: GetSystemDirectoryW
|
|
library: kernel32
|
|
- api: GetUserNameW
|
|
library: advapi32
|
|
- api: GetAdaptersInfo
|
|
library: iphlpapi
|
|
note: MAC addresses of all NICs.
|
|
- api: GetAdaptersAddresses
|
|
library: iphlpapi
|
|
- api: NtQuerySystemInformation
|
|
library: ntdll
|
|
note: used to enumerate a *lot* of things; high-signal only when
|
|
called with `SystemKernelDebuggerInformation` or
|
|
`SystemCodeIntegrityInformation`.
|
|
- api: NtQuerySystemInformationEx
|
|
library: ntdll
|
|
- api: NtQueryInformationProcess
|
|
library: ntdll
|
|
note: anti-debug target (ProcessDebugPort, ProcessDebugObjectHandle).
|
|
- api: CPUID
|
|
library: native
|
|
note: read directly; not an import. Look for the `cpuid` opcode.
|
|
medium_signal:
|
|
- api: RegOpenKeyExW
|
|
library: advapi32
|
|
note: when followed by RegQueryValueExW on
|
|
`HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid`.
|
|
- api: RegQueryValueExW
|
|
library: advapi32
|
|
note: with the `MachineGuid` value name (a value under the `Cryptography` key, not a subkey).
|
|
- api: GetSystemInfo
|
|
library: kernel32
|
|
- api: GetNativeSystemInfo
|
|
library: kernel32
|
|
- api: GlobalMemoryStatusEx
|
|
library: kernel32
|
|
detection_pattern: |
|
|
Importing *at least 2* high-signal APIs and *at least 1* medium-signal
|
|
API is a high-confidence indicator of HWID-fingerprinting code.
|
|
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
# Section heuristics. Custom VMs and packers leave recognizable
|
|
# artifacts in the section table.
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
|
|
section_indicators:
|
|
description: |
|
|
Section name + permission patterns that suggest a custom VM, a
|
|
packer, or a DRM-wrapped binary. The skill `re-vm-reverse` uses
|
|
these as a *first-pass triage* signal.
|
|
rules:
|
|
- name_match:
|
|
- '\.vm'
- '\.vmp'
- '\.code'
- '\.themida'
- '\.winlice'
- '\.securom'
|
|
flags: any
|
|
meaning: "encrypted-VM bytecode container (custom VM-pack pattern)"
|
|
# Added 2026-06-04. These seven section names co-occur on Unity IL2CPP
|
|
# targets wrapped by a commercial encrypted-VM anti-tamper. They are
|
|
# *observable* in any binary's section table — not vendor attribution.
|
|
# The .xtls-style section is the highest-entropy region (entropy 7.85+)
|
|
# and is encrypted-TLS for the VM bytecode interpreter.
|
|
# See ANTI-TAMPER-TAXONOMY.md for the inference chain.
|
|
- name_match:
|
|
- '\.xtls'
- '\.didata'
- '\.ecode'
- '\.xdata'
- '\.xpdata'
- '\.udata'
- '\.00cfg'
|
|
flags: any
|
|
meaning: "encrypted-VM bytecode container (Unity IL2CPP target — observable pattern, not vendor attribution)"
|
|
- name_match:
|
|
- 'UPX0' # UPX section names carry no leading dot
- 'UPX1'
- '\.MPRESS1'
- '\.aspack'
- '\.petite'
|
|
flags: any
|
|
meaning: "known packer (less likely a VM, more likely compressed)"
|
|
- flags_match: ["R", "W", "X"] # RWX, i.e. a W^X violation
|
|
meaning: "writable + executable; suspicious, especially on .text"
|
|
- large_section_with_tiny_text:
|
|
meaning: "virtual_size >> raw_size on a code section; classic packed indicator"
|
|
- section_no_name:
|
|
meaning: "single-character or empty section names; manual packing"
|
|
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
# VM dispatcher patterns. When a binary has a custom VM, the entry
|
|
# point or a frequently-called function has a recognizable dispatcher
|
|
# that fetches the next handler from a table and jumps to it.
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
|
|
vm_dispatcher_patterns:
|
|
description: |
|
|
Patterns to search for in disassembly to locate a VM dispatcher.
|
|
The skill `re-vm-reverse` uses these as a fast identification
|
|
signal before deeper analysis.
|
|
x86_64_patterns:
|
|
- description: "mov + jmp via handler table (rbx-based)"
|
|
asm: |
|
|
mov rax, [rbx+rcx*8] ; or rdx*8, or rsi*8
|
|
jmp rax
|
|
- description: "mov + jmp via handler table (rdi-based)"
|
|
asm: |
|
|
mov rax, [rdi+rsi*8]
|
|
jmp rax
|
|
- description: "validated dispatch (checks between the table load and the jmp)"
|
|
asm: |
|
|
mov rax, [handler_table+rcx*8]
|
|
; ... validation ...
|
|
jmp rax
|
|
detection_heuristic: |
|
|
A function that ends with `jmp reg` where `reg` was loaded from an
indexed memory operand (table base + index*8) rather than a single
function-pointer slot is a strong dispatcher candidate. Confirm with
frequency analysis (re-gdb): the dispatcher should be executed far
more often than ordinary functions.
|
|
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
# MBA (Mixed-Boolean-Arithmetic) patterns. These are arithmetic
|
|
# identities rewritten using bitwise operators to defeat pattern
|
|
# matching. The skill `re-mba-deobfuscate` uses this catalog to
|
|
# construct Z3 queries that simplify MBA expressions back to their
|
|
# original form.
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
|
|
mba_patterns:
|
|
description: |
|
|
MBA-rewritten x86 expressions. Each entry is a Z3-compatible
|
|
Python expression. The skill uses these to drive
|
|
`re-triton.solve_constraint` queries.
|
|
identities:
|
|
- name: "add_via_and_or"
|
|
original: "x + y"
|
|
mba: "(x & y) + (x | y)"
|
|
note: "most common MBA add"
|
|
- name: "or_via_add"
|
|
original: "x | y"
|
|
mba: "x + y + 1 + (~x | ~y)"
|
|
note: "the identity holds at any bit width; check it in Z3 with both operands as 8-bit or 32-bit BitVecs"
|
|
- name: "xor_via_or_and"
|
|
original: "x ^ y"
|
|
mba: "(x | y) & ~(x & y)"
|
|
- name: "and_via_or_and"
|
|
original: "x & y"
|
|
mba: "(x + y) - (x | y)"
|
|
- name: "sub_via_add_not"
|
|
original: "x - y"
|
|
mba: "x + ~y + 1"
|
|
note: "this one is *correct* two's-complement; an MBA tool would
|
|
expand it further to obscure intent."
|
|
- name: "neg_via_add"
|
|
original: "-x"
|
|
mba: "~x + 1"
|
|
z3_strategy: |
|
|
For each MBA expression, construct a Z3 `BitVec` query that
asserts `mba_expr != original_expr` and ask the solver for a
model, i.e. a counterexample input pair for which the MBA and the
original differ. If the solver returns `unsat`, no counterexample
exists and the MBA equivalence is proven at that bit width; the
original is the simpler form.
|
|
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
# Anti-debug / anti-emulation catalog. Used by `re-drm-fingerprint`
|
|
# to flag binaries that exhibit *any* of these patterns.
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
|
|
anti_debug_indicators:
|
|
description: |
|
|
Static-detectable patterns that suggest a binary is checking for
|
|
debuggers, emulators, or sandboxes. Each entry is a
|
|
detection signal and a static check.
|
|
checks:
|
|
- name: "PEB.BeingDebugged"
|
|
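signal: "PEB pointer loaded from fs:[0x30] (x86) or gs:[0x60] (x64), then a byte read at PEB+0x02"
detection: "re-rizin.search_bytes for the TEB load, e.g. `64 A1 30 00 00 00` (mov eax, fs:[0x30]) or `65 48 8B 04 25 60 00 00 00` (mov rax, gs:[0x60]), then check for a byte load at +0x02"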
|
|
- name: "IsDebuggerPresent"
|
|
signal: "import of kernel32!IsDebuggerPresent"
|
|
detection: "re-rizin.list_imports_exports"
|
|
- name: "CheckRemoteDebuggerPresent"
|
|
signal: "import of kernel32!CheckRemoteDebuggerPresent"
|
|
detection: "re-rizin.list_imports_exports"
|
|
- name: "NtQueryInformationProcess"
|
|
signal: "call to ntdll!NtQueryInformationProcess with class
|
|
ProcessDebugPort (0x07) or ProcessDebugObjectHandle (0x1E)"
|
|
detection: "re-rizin.disassemble_function around the call site;
|
|
check the immediate argument to the call"
|
|
- name: "RDTSC timing"
|
|
signal: "sequence of two rdtsc instructions separated by a small
|
|
amount of code; if delta > N cycles, debugger is present"
|
|
detection: "re-rizin.search_bytes for `0F 31` opcode (rdtsc)"
|
|
- name: "INT 2D"
|
|
signal: "int 0x2d (two-byte instruction `CD 2D`): when a debugger is
attached, the exception is swallowed and execution continues
with different state than when no debugger is attached"
|
|
detection: "re-rizin.search_bytes for `CD 2D`"
|
|
- name: "INT 3 trap"
|
|
signal: "int 0x3 (0xCC) used as a control-flow check; a
|
|
debugger will step over it without raising the exception,
|
|
while normal execution raises EXCEPTION_BREAKPOINT"
|
|
detection: "re-rizin.search_bytes for `CC` (noisy, since `CC` is also
inter-function padding) and check that the following code expects SEH"
|
|
- name: "OutputDebugString"
|
|
signal: "import of kernel32!OutputDebugString with a non-null
|
|
buffer; if a debugger is present, the function returns
|
|
immediately without side effects, otherwise it raises an
|
|
exception that the calling code can catch"
|
|
detection: "re-rizin.list_imports_exports"
|
|
- name: "exception-hooking decoys (encrypted-VM-style)"
|
|
signal: "writes to the stack frame at offsets that match
|
|
EXCEPTION_RECORD layout, *after* a CPUID or syscall, then
|
|
reads the same offsets before a comparison"
|
|
detection: "manual review; pattern: `mov [rsp+0..0x98], X`
|
|
followed by a `cmp [rsp+0..0x98], X`" # the decoy pattern
|
|
- name: "scattered-bit register storage (encrypted-VM marker)"
|
|
signal: "VM register values stored as bits scattered across the
|
|
stack rather than contiguously; defeats pattern matching"
|
|
detection: "manual; flagged by `re-vm-reverse` after the
|
|
dispatcher is identified"
|
|
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
# String categories. Used by `re-lief.categorize_strings` to bucket
|
|
# the strings extracted from a binary into semantic categories. Two
|
|
# categories (`anti_debug`, `hwid`) inherit their keyword lists from
|
|
# the catalog lists above via the `seed_from:` / `seed_field:`
|
|
# pointer syntax; the rest have inline keyword lists. When a future
|
|
# agent adds a new HWID API to `hwid_apis.high_signal`, the
|
|
# `hwid` category picks it up on next MCP-server reload with zero
|
|
# Python change. All keywords are generic Windows API names,
|
|
# OpenSSL source-path fragments, or standard protocol substrings —
|
|
# no commercial product, publisher, or PDB-path literal appears.
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
|
|
string_categories:
|
|
description: |
|
|
Buckets for `re-lief.categorize_strings`. Each category is a list
|
|
of case-insensitive substrings; a string is added to a category
|
|
if any keyword matches. A string can match multiple categories
|
|
(counted in each); the categorizer de-duplicates within a
|
|
category by (string, section). Categories whose `seed_from`
|
|
pointer is set inherit their keyword list from the named catalog
|
|
list at module load time — see
|
|
`servers/re-lief/src/re_lief/categorizers.py::load_categories`.
|
|
categories:
|
|
- name: anti_debug
|
|
seed_from: anti_debug_indicators.checks
|
|
seed_field: name
|
|
note: |
|
|
Inherits verbatim from `anti_debug_indicators.checks[].name`:
IsDebuggerPresent, OutputDebugString,
NtQueryInformationProcess, etc. Entries that name opcode
signals (RDTSC timing, INT 2D, INT 3 trap) are detected by the
byte-pattern checks, not by strings. Because `re-lief`'s
strings pass is a static-import pass, the names that fire
here are the API names, plus C++ symbols (`_Xlength_error`,
`_Xout_of_range`; typeinfo false positives) that contain
the substrings.
|
|
- name: hwid
|
|
seed_from: hwid_apis.high_signal
|
|
seed_field: api
|
|
note: |
|
|
Inherits verbatim from `hwid_apis.high_signal[].api` —
|
|
GetComputerNameW, GetVolumeInformationW,
|
|
GetAdaptersAddresses, etc. The `medium_signal` set
|
|
(RegOpenKeyExW, RegQueryValueExW, GetSystemInfo, etc.)
|
|
lives in the `registry` and `process` categories below
|
|
for a cleaner bucket split.
|
|
- name: crypto
|
|
keywords:
|
|
- "OpenSSL"
|
|
- "\\crypto\\"
|
|
- "EVP_"
|
|
- "RSA"
|
|
- "AES"
|
|
- "SHA"
|
|
- "HMAC"
|
|
- "DH_"
|
|
- "EC_"
|
|
- "PEM_"
|
|
- "BIO_"
|
|
- "X509"
|
|
- "PKCS"
|
|
- "CRYPTO_"
|
|
- "SSL_"
|
|
- "TLS"
|
|
- "Cipher"
|
|
- "MD5_"
|
|
- "digest"
|
|
- "PRIVATEKEY"
|
|
- "Public-Key"
|
|
- "Private-Key"
|
|
- "key_length"
|
|
- "cms_"
|
|
- "pkey"
|
|
- "ocsp"
|
|
- "crl"
|
|
note: |
|
|
OpenSSL-internal strings, X.509 / CMS / PKCS object names,
|
|
cipher-suite and digest identifiers. Statically-linked
|
|
OpenSSL releases typically contribute 600+ strings to this
|
|
bucket (every `.\crypto\...` source-path fragment counts).
|
|
- name: network
|
|
keywords:
|
|
- "WinHttp"
|
|
- "WinINet"
|
|
- "InternetOpen"
|
|
- "HttpOpenRequest"
|
|
- "WSAStartup"
|
|
- "ws2_32"
|
|
- "connect"
|
|
- "send"
|
|
- "recv"
|
|
- "socket"
|
|
- "gethostbyname"
|
|
- "getaddrinfo"
|
|
- "URL"
|
|
- "http://"
|
|
- "https://"
|
|
- "ftp://"
|
|
- "tcp://"
|
|
- ".com"
|
|
- ".net"
|
|
- ".org"
|
|
- ".io"
|
|
- "DNS"
|
|
- "Host:"
|
|
- "User-Agent:"
|
|
- "Content-Type:"
|
|
- "ocsp."
|
|
- "crl."
|
|
- "ts-ocsp"
|
|
note: |
|
|
HTTP / Winsock / DNS / URL substrings, including CRL/OCSP
endpoints (the WinVerifyTrust / PFXImportCertStore
license-validation pattern). False positives: the
top-level-domain substrings (`.com`, `.net`, etc.) and short
tokens such as `send` or `URL` will match non-network strings;
review the `samples[]` to confirm.
|
|
- name: registry
|
|
keywords:
|
|
- "RegOpenKeyEx"
|
|
- "RegQueryValueEx"
|
|
- "RegSetValueEx"
|
|
- "RegCloseKey"
|
|
- "RegCreateKeyEx"
|
|
- "HKEY_"
|
|
- "HKLM"
|
|
- "HKCU"
|
|
- "Software\\Microsoft"
|
|
- "CurrentVersion\\Run"
|
|
- "MachineGuid"
|
|
- "Cryptography"
|
|
- "advapi32"
|
|
note: |
|
|
Registry API names + common key paths. Note: HKLM/HKCU
|
|
are 4-char tokens; a string like 'HKLM\\foo' fires here
|
|
even if the real registry call is in a different binary.
|
|
- name: process
|
|
keywords:
|
|
- "CreateProcess"
|
|
- "CreateThread"
|
|
- "CreateRemoteThread"
|
|
- "OpenProcess"
|
|
- "WriteProcessMemory"
|
|
- "ReadProcessMemory"
|
|
- "VirtualAlloc"
|
|
- "VirtualAllocEx"
|
|
- "VirtualProtect"
|
|
- "VirtualQuery"
|
|
- "NtCreateThread"
|
|
- "ResumeThread"
|
|
- "SuspendThread"
|
|
- "TerminateProcess"
|
|
- "ShellExecute"
|
|
- "WinExec"
|
|
- "CreateProcessW"
|
|
- "CreateProcessA"
|
|
note: |
|
|
Process / thread / memory APIs. Both same-process variants
(no 'Ex' suffix, e.g. VirtualAlloc) and remote-injection
variants (e.g. VirtualAllocEx, CreateRemoteThread) are
included.
|
|
- name: file
|
|
keywords:
|
|
- "CreateFile"
|
|
- "ReadFile"
|
|
- "WriteFile"
|
|
- "DeleteFile"
|
|
- "MoveFile"
|
|
- "CopyFile"
|
|
- "GetFileSize"
|
|
- "FindFirstFile"
|
|
- "FindNextFile"
|
|
- "GetTempPath"
|
|
- "GetTempFileName"
|
|
- "CreateFileW"
|
|
- "CreateFileA"
|
|
- "DeleteFileW"
|
|
- "kernel32"
|
|
note: |
|
|
File I/O API names. Includes both W and A variants.
|
|
`kernel32` is included because the OpenSSL path-fragment
|
|
noise often mentions the host DLL; a binary that only
|
|
links kernel32 + the file APIs (a pure copy tool) will
|
|
fire only on this bucket.
|
|
- name: fingerprint
|
|
keywords:
|
|
- "Volume{"
|
|
- "\\\\.\\PhysicalDrive"
|
|
- "\\\\.\\CdRom"
|
|
- "SMBIOS"
|
|
- "Manufacturer"
|
|
- "SerialNumber"
|
|
- "ProductId"
|
|
- "UUID"
|
|
- "MachineGuid"
|
|
- "HKLM\\SOFTWARE\\Microsoft\\Cryptography"
|
|
- "displayName"
|
|
- "enhancedSearchGuide"
|
|
- "searchGuide"
|
|
- "fingerprint"
|
|
- "hostid"
|
|
note: |
|
|
Strings that suggest the binary is reading a
|
|
hardware-fingerprint vector *directly* (not via the API).
|
|
Less about the API, more about the *value*: `Volume{...}`
is the prefix of a Windows volume GUID path (distinct from
the volume serial number). Most
|
|
fingerprints reach the binary through the API in the
|
|
`hwid` bucket; this one catches the rare case where the
|
|
fingerprint is inlined as a literal.
|
|
- name: activation
|
|
keywords:
|
|
- "Activation"
|
|
- "Activate"
|
|
- "License"
|
|
- "Licence"
|
|
- "Entitlement"
|
|
- "DeregisterEventSource"
|
|
- "RegisterEventSource"
|
|
- "EventSource"
|
|
- "LocalKeySet"
|
|
- "PKCS7"
|
|
- "PKCS8"
|
|
- "PFX"
|
|
- "CMS_"
|
|
- "Recipient"
|
|
- "SignedData"
|
|
- "EnvelopedData"
|
|
- "AuthorityKey"
|
|
- "SubjectKey"
|
|
- "Token"
|
|
- "Challenge"
|
|
- "Response"
|
|
- "Manifest"
|
|
- "msi.dll"
|
|
- "mscoree.dll"
|
|
note: |
|
|
Activation / license-gate vocabulary. Includes PKCS#7 /
|
|
CMS object names and the RegisterEventSource /
|
|
DeregisterEventSource pair that the activation routine
|
|
typically uses to write to the Windows Event Log. False
|
|
positives: any UI string containing the word
|
|
"Activate" (Unity component lifecycle) fires here; review
|
|
`samples[]` to confirm.
|
|
- name: obfuscation
|
|
keywords:
|
|
- "\\crypto\\"
|
|
- "decrypt"
|
|
- "encrypt"
|
|
- "obfuscat"
|
|
- "packed"
|
|
- "xor"
|
|
- "XOR"
|
|
- "ROL"
|
|
- "ROR"
|
|
- "base64"
|
|
- "Base64"
|
|
- "lzma"
|
|
- "zlib"
|
|
- "deflate"
|
|
- "inflate"
|
|
- "RC4"
|
|
- "S-box"
|
|
- "sbox"
|
|
- "lookup"
|
|
- "dispatch"
|
|
- "handler"
|
|
- "vm_entry"
|
|
- "vm_dispatch"
|
|
- "vm_init"
|
|
- "kUSER"
|
|
- "PEB"
|
|
- "BeingDebugged"
|
|
- "NtGlobalFlag"
|
|
note: |
|
|
String patterns that suggest obfuscation / VM-pack code.
|
|
Note `\\crypto\\` is a *path*, not a runtime call — it
|
|
ends up in this bucket via OpenSSL source paths leaking
|
|
into release binaries (a known false positive on
|
|
statically linked OpenSSL). The VM-dispatch strings
|
|
(lookup / dispatch / handler / vm_entry) are the
|
|
encrypted-VM bytecode category signal.
|
|
- name: misc
|
|
keywords: []
|
|
note: |
|
|
Catch-all bucket. Populated only when `include_misc=true`.
|
|
The `uncategorized_sample` field in the categorizer's
|
|
return shape is what callers use to spot *missing*
|
|
categories — a string the user knows is interesting but
|
|
that the YAML doesn't cover is a signal to add a new
|
|
keyword to the appropriate category.
|
|
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
# Pattern indicators. Soft signals — describe the *category* of
|
|
# anti-tamper a set of observables suggests, not a specific vendor.
|
|
# ─────────────────────────────────────────────────────────────────────
|
|
|
|
pattern_indicators:
|
|
description: |
|
|
Heuristic mappings from observed indicators to anti-tamper
|
|
*categories*. These are descriptive pattern indicators — the
|
|
user supplies the vendor attribution based on their context.
|
|
Confirm with a deeper analysis (capability comparison, signature
|
|
lookups) before publishing.
|
|
mappings:
|
|
- descriptor: "encrypted-VM bytecode, Unity IL2CPP target"
|
|
indicators:
|
|
- "large encrypted-VM section that is writable + executable (W^X violation)"
|
|
- "imports of GetVolumeInformationW + GetComputerNameW +
|
|
GetUserNameW + NtQuerySystemInformation"
|
|
- "spin-lock-guarded CPUID patches"
|
|
- "scattered-bit register storage"
|
|
confidence: "Medium-High"
|
|
- descriptor: "encrypted-VM bytecode (alternative dispatcher variant)"
|
|
indicators:
|
|
- ".vmp0 / .vmp1 section names"
|
|
- "handlers prefixed with `VMP` in disassembly"
|
|
- "imports of unknown APIs (resolved at runtime)"
|
|
confidence: "High"
|
|
- descriptor: "encrypted-VM bytecode (WinLicense-family)"
|
|
indicators:
|
|
- ".themida or .winlice section names"
|
|
- "Mutant / Mutex named with a vendor tag"
|
|
- "Imports that are missing or ordinal-only"
|
|
confidence: "High"
|
|
- descriptor: "encrypted-VM bytecode (CISC-dispatch variant)"
|
|
indicators:
|
|
- ".code section that is writable + executable (W^X violation)"
|
|
- "handler dispatch via `jmp [reg*8+table]`"
|
|
- "anti-debug catalog hits (PEB.BeingDebugged, RDTSC)"
|
|
confidence: "Medium"
|
|
- descriptor: "legacy disc-based protection"
|
|
indicators:
|
|
- "section names with 'securom' or '.sdc'"
|
|
- "imports of `CreateFileA` with a CD/DVD check literal in the
|
|
string table"
|
|
confidence: "High"
|
|
- descriptor: "legacy disc-based protection (kernel-driver variant)"
|
|
indicators:
|
|
- "drivers named `*.sys` in the same directory"
|
|
- "imports of `DeviceIoControl` with high-frequency call
|
|
patterns"
|
|
confidence: "Medium"
|