capstone/suite/test_mc.py
Rot127 ef89b18a88 Architecture updater (auto-sync) - Updating AArch64 (#2026)
* Update sysop inc file

* Fix missing braces warning

* Handle new system operands

* Fix build errors by renaming.

* Fix segfault

* Fix segfault

* Add custom MCOperand validators

* Add AArch64 case for getFeatureBits

* Fix infinite loop

* Fix braces warning.

* Implement lookup by name for sys operands

* Fix incorrect translation which removed else if statements.

* Fix several segfaults

* Rename GetRegFromClass patch

* Fix segfaults and asserts

* Fix segfault

* Move MRI setting to Mapping

* Remove unused code

* Add add_op_X functions for AArch64.

* Add fill detail functions

* Handle RegWithShiftExtend operands

* Handle TypedVectorList operands.

* Handle ComplexRotation operands

* Handle MemExtend operands

* Handle ImmRangeScale operands

* Handle ExactFPImm operands

* Handle GPRSeqPairsClass operands

* Handle Imm8OptLsl operands

* Handle ImmScale operands

* Handle LogicalImm operands

* Handle Matrix operands

* Handle SME Matrix tiles and vectors.

* Handle normal operands.

* Fix segfault.

* Handle PostInc operands.

* Reorder VecLayout enum to have no duplicate enum value.

* Handle PredicateAsCounter operands

* Handle ZPRasFPR operands

* Handle VectorIndex operands

* Handle UImm12Offset operands.

* Move reg suffix to enum val to single function.

* Handle SVERegOp operands

* Handle SVELogicalImm operands

* Handle SImm operand

* Handle PrefetchOp operands

* Handle Imm and ImmHex operands

* Handle GPR64as32 and GPR64x8 operands

* Add missing break

* Handle FPImm operand

* Handle ExtendedRegister operand

* Handle CondCode operands

* Handle BTIHintOp operands

* Handle BarrierOption operands

* Handle BarrierXSOption

* Add not implemented case again

* Handle ArithExtend operands

* Handle AdrpLabel and AlignedLabel operands

* Handle AMNoIndex operands

* Handle AddSubImm operands

* Handle MSRSystemRegisters and MRSSystemRegister operands

* Handle PSBHntOp and RPRFMOperand operands

* Remove unused variables

* Handle InverseCondCode operands

* Handle ImplicitlyTypedVectorList operands

* Handle ShiftedRegister operands

* Handle Shifter operands

* Handle SIMDType10Operand operands

* Handle SVCROp operands

* Handle SVEPattern operands

* Handle SVEVecLenSpecifier operands

* Handle SysCROperands

* Handle SysXzrPair operands

* Handle PState operands

* Handle VRegOperands

* Print SME operands.

* Fix cs_operand.h include

* Rename arm64 -> aarch64 in python bindings.

* Add Python bindings for SH

* Fix ARM Python bindings (#2127)

* Restructure auto-sync update scripts.

* Move Helper functions to Updater dir

* Move requirements.txt

* Add basic ASUpdater.py

* Run black.

* Add inc file generator to updater

* Add option to select certain inc files for generation.

* Enable clean build and implement patcher for inc files.

* Format config

* Patch main header files after inc generation.

* Implement clang-format function (unused yet, because it takes forever.)

* Copy generated inc files to arch dir

* Invert clean option (normally we need to clean the build dir).

* Clarify arg doc

* Rename SystemRegister file for AArch64

* Centralize handling of path variables.

* Check if SystemOperands had to be generated before renaming one of its files.

* Replace class parameters by calling get_path

* Remove updater config which only contained paths.

* Add refactor option.

* Remove more path handling in the Configurator.

* Add translation step to updater.

* Fix includes after CppTranslator was moved into the Updater

* Remove updater config

* Fix several issues in the Configurator

* Fix file operations

* Remove additional argument from translator.

* Add Differ step to updater.

* Add path variable for arch_config

* Add diff step.

* Fix typo

* Introduce .clang-format path variable.

* Remove duplicate functions

* Add option to select update steps to execute.

* Check in write functions for write flag.

* Rename PatchMainHeader -> HeaderPatcher

* Move .gitignore

* Add README to vendor dir.

* Add all system operands to cstool output

* Update cstest with aarch64 changes

* Remove wb flag of aarch64 detail struct

* Set updates_flag after decoding

* Set writeback after decoding.

* Rename ARM64 -> AArch64

* Update printer and op mapping

* Exit normally

* Add AArch64 alias

* Fix some template function calls

* Fix flag check after rebase.

* Fix build by commenting unused code.

* Add memory operand flag

* Handle memory operands printed via generic printOperand function.

* Handle UImm memory offsets

* Introduce MEM_REG and MEM_IMM op types

* Handle scaled memory immediates

* Check for op_count before checking for mem op at -1 index.

* Update memory operand flags.

* Pass imm/reg memory ops in set_imm/reg to set_mem.

* Add missing set_sme_operand call and fix assert.

* Remove CS_OP_MEM flag before entering switch.

* Predicates are registers.

* Add shift info always to the previous operand

* Check for generic system regs

* Handle NumLanes = 0 LaneKind = q case

* Replace printImm call with normal print logic. Otherwise ops get added twice to detail.

* Handle FP operands in printOperand.

* Add access information to float operands.

* Rewrite SME matrix handling.

* Set correct SME layouts and allow for immediate range sme offsets.

* Handle cases of unknown system alias by setting their raw values

* Update cstool and header file with new SME offset handling

* Handle SME Tile lists.

* Fix build error in cstest

* Update MC tests for AArch64

* Handle TLBI operands and fix printing bug.

* Fix: Print signed value as signed.

* Add more system alias to detail.

* Remove duplicate hex prefix

* Set correct values for the register info

* Replace tabs with white spaces

* Move string append logic to own function.

* Set DecodeComplete = true before decoding (as originally in the LLVM code).

* Change type of feature argument, since only LLVM features are passed, not CS groups.

* Imitate lower_bound for the index table binary search.

* Remove trailing comments from test files.

* Print shift amount in decimal

* Save detail of shift alias instructions.

* Add extension details for ext instruction alias

* Print LSB and width in decimal

* Fix LLVM bug. The feature check for V8_2a doesn't check if all features are enabled.

* Fix lower_bounds check.
For m == 0 we wrap around 0 of course.

* Fix feature check. Add check for FeatureAll since it includes XS

* Operate on temporary MCInst when trying decoding.

* Add lower_bound behavior to IndexTypeStr binsearch.

* Fix MC tests which were incorrect because of missing FeatureAll check

* Add Alias handling for AArch64

* Update system operands with SYSIMM types and add additional sysop category.

* Add macros for meta programming (ARM64 <-> AArch64 selection).

* Fix union/struct confusion and add raw_value member to unions.

* Allow to set Syntax and mode options for AArch64

* Fix build warning by using correct type

* Print shift value in decimal

* Add missing call to add_cs_detail.

* Update name map files with normalized names.

* Remove unused function

* Add check if detail should be filled.

* Fill detail for real instructions if only real detail is requested.

* Add always the extension.

* Make dir creation log message debug level

* Implement ADR immediate operand printer.

See: c3484b1fdc

* Check for flag registers being written and update flag.

* Move multiple CondCode helpers to aarch64.h because they are so freaking useful.

+ Print CC if it is EQ

* Fix incorrectly initialized CC and VectorLayout.

* Add LSL shift type for extensions.

* Fix case when shift amount is 0

* Fix post-index memory instructions.

* Pass raw immediate through getShiftValue to extract actual shift amount

* Setup AArch64 detail ops.

* Add flag for operands part of a list.

* Set vector indices for all relevant registers.

* Add missing call to add_cs_detail for postIncOperands

* Add ugly yet reliable way to determine post-index addressing mode

* Add support for old Capstone register alias.

* Remove leading space before some alias mnemonics.

* add AARCH64 to `cmake.sh`

* add HAS_AARCH64 to `cs.c`

* should probably just reference `cs_operand.h` in `aarch64.h`

* hint compiler at `AArch64_SYSREG` enum type for casting purposes

* update `Makefile` for AARCH64

leaves `CAPSTONE_HAS_ARM64` supported

* `testFeatureBits` platform function check

`testFeatureBits` should check if the platform function is visible first

* update tests to use AARCH64 convention

* hack: avoid enum casts for `MCInst` Values

Apple compiler really hates typecasting an enum, even if bounded from an unsigned. Let's set the raw_value directly

is a hack and needs proper review

* Check for present detail before accessing it.

* Add CS only groups

* Use general map ins_op type

* Fix build warning about str size computation.

* Disable warning about uninitialized value for GCC 11.

Imm is initialized and the warning does not appear
in later versions.

* Use correct include guard for PPC

* Add missing requirements

* Update SystemOperand enums.

* Fix overlapping comparison warning

* Fix reachable assert where OpNum is not of type IMM

* Handle 0.0 operand for fcmp

* Fix incorrect variable passed.

* Fix for MacOS which doesn't know the warning and throws another one.

* Make getExtendEncoding static to fix build warning on MSVC.

* Fix build error: 'missing binary operator before token' by checking __GNUC__

* Add string search to add vector layout info.

* Add missing mem displacements of several ldr and str instructions.

* Add 0 immediates to several instructions.

* Rename v regs to q and d variant.

The cs_regname API can not pass the variant name of the register requested.
So we simply emit the default variant name.

* Fix incorrect enum value.

* Fix tests for system operands.

* Fix syntax issues in tests.

* Rename Arm64 -> AArch64 Python bindings.

* Fix Python bindings C structs.

* Fix generation of constants (ARMCC skipped because it starts with ARM)

* Update const files

* Remove -Wmaybe-uninitialized warning since it fails fuzz build

* Add missing comma

* Fix case

* Fix AArch64 Python bindings:

- Do not generate constants automatically (dscript is way too buggy).
- Update printing of details.

* Rename ARM64 -> AArch64 in test_corpus.py

* Rename test_arm64 -> test_aarch64

* Rename ARM-64 -> AArch64

* Fix diff CI test by disassembling AArch64 at former ARM64 place

* Fix several wrong types and remove unnecessary members from Python binding

* Fix: Same printing format of detail for cstool, test_ and test_*.py

* Fix: pass correct op index for mov alias with op[1] == reg wzr.

* Set prfm op manually in case of unknown sysop. set_imm would add it to a memory operand without base.

* Fix: If barrier ops are not set an assert is reached.

We fix it here by simply getting the immediate as the printing code does.

---------

Co-authored-by: Peace-Maker <peace-maker@wcfan.de>
Co-authored-by: Dayton <5340801+watbulb@users.noreply.github.com>
2023-11-15 12:12:14 +08:00

268 lines
10 KiB
Python
Executable File

#!/usr/bin/python
# Test tool to compare Capstone output with llvm-mc. By Nguyen Anh Quynh, 2014
import array, os.path, sys
from subprocess import Popen, PIPE, STDOUT
from capstone import *

# convert all hex numbers to decimal numbers in a text
def normalize_hex(a):
    while True:
        i = a.find('0x')
        if i == -1:  # no more hex number
            break
        hexnum = '0x'
        for c in a[i + 2:]:
            if c in '0123456789abcdefABCDEF':
                hexnum += c
            else:
                break
        num = int(hexnum, 16)
        # Replace only this occurrence: a global str.replace() would also
        # rewrite the prefix of a longer literal (e.g. '0x1' inside '0x10').
        a = a[:i] + str(num) + a[i + len(hexnum):]
    return a

def run_mc(arch, hexcode, option, syntax=None):
    def normalize(text):
        # lower-case the text and collapse tabs and repeated spaces
        text = text.lower()
        items = text.split()
        text = ' '.join(items)
        if arch == CS_ARCH_X86:
            # remove comment after #
            i = text.find('# ')
            if i != -1:
                return text[:i].strip()
        if arch == CS_ARCH_AARCH64:
            # remove comment after //
            i = text.find('// ')
            if i != -1:
                return text[:i].strip()
        # remove some redundant spaces
        text = text.replace('{ ', '{')
        text = text.replace(' }', '}')
        return text.strip()

    #print("Trying to decode: %s" %hexcode)
    # Build the llvm-mc command line once instead of repeating the Popen call.
    cmd = ['llvm-mc', '-disassemble', '-print-imm-hex']
    if arch == CS_ARCH_MIPS:
        cmd.append('-mattr=+msa')
    if syntax:
        cmd.append(syntax)
    cmd += option
    # universal_newlines=True opens the pipes in text mode, so passing and
    # splitting str also works on Python 3.
    p = Popen(cmd, stdout=PIPE, stdin=PIPE, stderr=STDOUT, universal_newlines=True)
    output = p.communicate(input=hexcode)[0]
    lines = output.split('\n')
    #print lines
    if 'invalid' in lines[0]:
        #print 'invalid ----'
        return 'FAILED to disassemble (MC)'
    else:
        #print 'OK:', lines[1]
        return normalize(lines[1].strip())

def test_file(fname):
    print("Test %s" % fname)
    f = open(fname)
    lines = f.readlines()
    f.close()

    if not lines[0].startswith('# '):
        print("ERROR: decoding information is missing")
        return

    # skip '# ' at the front, then split line to get out hexcode
    # Note: option can be '', or 'None'
    #print lines[0]
    #print lines[0][2:].split(', ')
    (arch, mode, option) = lines[0][2:].split(', ')
    mode = mode.replace(' ', '')
    option = option.strip()

    archs = {
        "CS_ARCH_ARM": CS_ARCH_ARM,
        "CS_ARCH_AARCH64": CS_ARCH_AARCH64,
        "CS_ARCH_MIPS": CS_ARCH_MIPS,
        "CS_ARCH_PPC": CS_ARCH_PPC,
        "CS_ARCH_SPARC": CS_ARCH_SPARC,
        "CS_ARCH_SYSZ": CS_ARCH_SYSZ,
        "CS_ARCH_X86": CS_ARCH_X86,
        "CS_ARCH_XCORE": CS_ARCH_XCORE,
        "CS_ARCH_RISCV": CS_ARCH_RISCV
        # "CS_ARCH_M68K": CS_ARCH_M68K,
    }

    modes = {
        "CS_MODE_16": CS_MODE_16,
        "CS_MODE_32": CS_MODE_32,
        "CS_MODE_64": CS_MODE_64,
        "CS_MODE_MIPS32": CS_MODE_MIPS32,
        "CS_MODE_MIPS64": CS_MODE_MIPS64,
        "0": CS_MODE_ARM,
        "CS_MODE_ARM": CS_MODE_ARM,
        "CS_MODE_THUMB": CS_MODE_THUMB,
        "CS_MODE_ARM+CS_MODE_V8": CS_MODE_ARM+CS_MODE_V8,
        "CS_MODE_THUMB+CS_MODE_V8": CS_MODE_THUMB+CS_MODE_V8,
        "CS_MODE_THUMB+CS_MODE_MCLASS": CS_MODE_THUMB+CS_MODE_MCLASS,
        "CS_MODE_LITTLE_ENDIAN": CS_MODE_LITTLE_ENDIAN,
        "CS_MODE_BIG_ENDIAN": CS_MODE_BIG_ENDIAN,
        "CS_MODE_64+CS_MODE_LITTLE_ENDIAN": CS_MODE_64+CS_MODE_LITTLE_ENDIAN,
        "CS_MODE_64+CS_MODE_BIG_ENDIAN": CS_MODE_64+CS_MODE_BIG_ENDIAN,
        "CS_MODE_MIPS32+CS_MODE_MICRO": CS_MODE_MIPS32+CS_MODE_MICRO,
        "CS_MODE_MIPS32+CS_MODE_MICRO+CS_MODE_BIG_ENDIAN": CS_MODE_MIPS32+CS_MODE_MICRO+CS_MODE_BIG_ENDIAN,
        "CS_MODE_MIPS32+CS_MODE_BIG_ENDIAN+CS_MODE_MICRO": CS_MODE_MIPS32+CS_MODE_MICRO+CS_MODE_BIG_ENDIAN,
        "CS_MODE_BIG_ENDIAN+CS_MODE_V9": CS_MODE_BIG_ENDIAN + CS_MODE_V9,
        "CS_MODE_MIPS32+CS_MODE_BIG_ENDIAN": CS_MODE_MIPS32+CS_MODE_BIG_ENDIAN,
        "CS_MODE_MIPS32+CS_MODE_LITTLE_ENDIAN": CS_MODE_MIPS32+CS_MODE_LITTLE_ENDIAN,
        "CS_MODE_MIPS64+CS_MODE_LITTLE_ENDIAN": CS_MODE_MIPS64+CS_MODE_LITTLE_ENDIAN,
        "CS_MODE_MIPS64+CS_MODE_BIG_ENDIAN": CS_MODE_MIPS64+CS_MODE_BIG_ENDIAN,
        "CS_MODE_RISCV32": CS_MODE_RISCV32,
        "CS_MODE_RISCV64": CS_MODE_RISCV64,
    }

    options = {
        "CS_OPT_SYNTAX_ATT": CS_OPT_SYNTAX_ATT,
        "CS_OPT_SYNTAX_NOREGNAME": CS_OPT_SYNTAX_NOREGNAME,
    }

    mc_modes = {
        ("CS_ARCH_X86", "CS_MODE_32"): ['-triple=i386'],
        ("CS_ARCH_X86", "CS_MODE_64"): ['-triple=x86_64'],
        ("CS_ARCH_ARM", "CS_MODE_ARM"): ['-triple=armv7'],
        ("CS_ARCH_ARM", "CS_MODE_THUMB"): ['-triple=thumbv7'],
        ("CS_ARCH_ARM", "CS_MODE_ARM+CS_MODE_V8"): ['-triple=armv8'],
        ("CS_ARCH_ARM", "CS_MODE_THUMB+CS_MODE_V8"): ['-triple=thumbv8'],
        ("CS_ARCH_ARM", "CS_MODE_THUMB+CS_MODE_MCLASS"): ['-triple=thumbv7m'],
        ("CS_ARCH_AARCH64", "0"): ['-triple=aarch64'],
        ("CS_ARCH_MIPS", "CS_MODE_MIPS32+CS_MODE_BIG_ENDIAN"): ['-triple=mips'],
        ("CS_ARCH_MIPS", "CS_MODE_MIPS32+CS_MODE_MICRO"): ['-triple=mipsel', '-mattr=+micromips'],
        ("CS_ARCH_MIPS", "CS_MODE_MIPS64"): ['-triple=mips64el'],
        ("CS_ARCH_MIPS", "CS_MODE_MIPS32"): ['-triple=mipsel'],
        ("CS_ARCH_MIPS", "CS_MODE_MIPS64+CS_MODE_BIG_ENDIAN"): ['-triple=mips64'],
        ("CS_ARCH_MIPS", "CS_MODE_MIPS32+CS_MODE_MICRO+CS_MODE_BIG_ENDIAN"): ['-triple=mips', '-mattr=+micromips'],
        ("CS_ARCH_MIPS", "CS_MODE_MIPS32+CS_MODE_BIG_ENDIAN+CS_MODE_MICRO"): ['-triple=mips', '-mattr=+micromips'],
        ("CS_ARCH_PPC", "CS_MODE_BIG_ENDIAN"): ['-triple=powerpc64'],
        ('CS_ARCH_SPARC', 'CS_MODE_BIG_ENDIAN'): ['-triple=sparc'],
        ('CS_ARCH_SPARC', 'CS_MODE_BIG_ENDIAN+CS_MODE_V9'): ['-triple=sparcv9'],
        ('CS_ARCH_SYSZ', '0'): ['-triple=s390x', '-mcpu=z196'],
        ('CS_ARCH_RISCV', 'CS_MODE_RISCV32'): ['-triple=riscv32'],
        ('CS_ARCH_RISCV', 'CS_MODE_RISCV64'): ['-triple=riscv64'],
    }

    #if not option in ('', 'None'):
    #    print archs[arch], modes[mode], options[option]

    #print(arch, mode, option)
    md = Cs(archs[arch], modes[mode])

    mc_option = None
    if arch == 'CS_ARCH_X86':
        # tell llvm-mc to use Intel syntax
        mc_option = '-output-asm-variant=1'

    if arch == 'CS_ARCH_ARM' or arch == 'CS_ARCH_PPC':
        md.syntax = CS_OPT_SYNTAX_NOREGNAME

    if fname.endswith('3DNow.s.cs'):
        md.syntax = CS_OPT_SYNTAX_ATT

    for line in lines[1:]:
        # ignore all the input lines having # in front.
        if line.startswith('#'):
            continue
        #print("Check %s" %line)
        code = line.split(' = ')[0]
        asm = ''.join(line.split(' = ')[1:])
        hex_code = code.replace('0x', '')
        hex_code = hex_code.replace(',', '')
        # str.decode('hex') only exists on Python 2; bytearray.fromhex is
        # also available on Python 3.
        hex_data = bytes(bytearray.fromhex(hex_code))
        #hex_bytes = array.array('B', hex_data)

        x = list(md.disasm(hex_data, 0))
        if len(x) > 0:
            if x[0].op_str != '':
                cs_output = "%s %s" % (x[0].mnemonic, x[0].op_str)
            else:
                cs_output = x[0].mnemonic
        else:
            cs_output = 'FAILED to disassemble'

        cs_output2 = normalize_hex(cs_output)
        cs_output2 = cs_output2.replace(' ', '')

        if arch == 'CS_ARCH_MIPS':
            # normalize register alias names
            cs_output2 = cs_output2.replace('$at', '$1')
            cs_output2 = cs_output2.replace('$v0', '$2')
            cs_output2 = cs_output2.replace('$v1', '$3')
            cs_output2 = cs_output2.replace('$a0', '$4')
            cs_output2 = cs_output2.replace('$a1', '$5')
            cs_output2 = cs_output2.replace('$a2', '$6')
            cs_output2 = cs_output2.replace('$a3', '$7')
            cs_output2 = cs_output2.replace('$t0', '$8')
            cs_output2 = cs_output2.replace('$t1', '$9')
            cs_output2 = cs_output2.replace('$t2', '$10')
            cs_output2 = cs_output2.replace('$t3', '$11')
            cs_output2 = cs_output2.replace('$t4', '$12')
            cs_output2 = cs_output2.replace('$t5', '$13')
            cs_output2 = cs_output2.replace('$t6', '$14')
            cs_output2 = cs_output2.replace('$t7', '$15')
            cs_output2 = cs_output2.replace('$t8', '$24')
            cs_output2 = cs_output2.replace('$t9', '$25')
            cs_output2 = cs_output2.replace('$s0', '$16')
            cs_output2 = cs_output2.replace('$s1', '$17')
            cs_output2 = cs_output2.replace('$s2', '$18')
            cs_output2 = cs_output2.replace('$s3', '$19')
            cs_output2 = cs_output2.replace('$s4', '$20')
            cs_output2 = cs_output2.replace('$s5', '$21')
            cs_output2 = cs_output2.replace('$s6', '$22')
            cs_output2 = cs_output2.replace('$s7', '$23')
            cs_output2 = cs_output2.replace('$k0', '$26')
            cs_output2 = cs_output2.replace('$k1', '$27')

        #print("Running MC ...")
        if fname.endswith('thumb-fp-armv8.s.cs'):
            mc_output = run_mc(archs[arch], code, ['-triple=thumbv8'], mc_option)
        elif fname.endswith('mips64-alu-instructions.s.cs'):
            mc_output = run_mc(archs[arch], code, ['-triple=mips64el', '-mcpu=mips64r2'], mc_option)
        else:
            mc_output = run_mc(archs[arch], code, mc_modes[(arch, mode)], mc_option)
        mc_output2 = normalize_hex(mc_output)

        if arch == 'CS_ARCH_MIPS':
            mc_output2 = mc_output2.replace(' 0(', '(')

        if arch == 'CS_ARCH_PPC':
            mc_output2 = mc_output2.replace('.+', '')
            mc_output2 = mc_output2.replace('.', '')
            mc_output2 = mc_output2.replace(' 0(', '(')

        mc_output2 = mc_output2.replace(' ', '')
        mc_output2 = mc_output2.replace('opaque', '')

        # report a mismatch only if Capstone disagrees with both llvm-mc
        # and the expected assembly recorded in the test file
        if cs_output2 != mc_output2:
            asm = asm.replace(' ', '').strip().lower()
            if asm != cs_output2:
                print("Mismatch: %s" % line.strip())
                print("\tMC = %s" % mc_output)
                print("\tCS = %s" % cs_output)

if __name__ == '__main__':
    if len(sys.argv) == 1:
        fnames = sys.stdin.readlines()
        for fname in fnames:
            test_file(fname.strip())
    else:
        #print("Usage: ./test_mc.py <input-file.s.cs>")
        test_file(sys.argv[1])
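
The input format that `test_file` expects is easier to see in isolation. The sketch below is a minimal, standalone illustration (it needs neither Capstone nor llvm-mc) of how a `.s.cs` header line and a single test line are split apart. `parse_header` and `parse_test_line` are hypothetical helper names that do not exist in the script, and the two sample lines are made up for illustration rather than taken from a real test file.

```python
# Hypothetical helpers mirroring the parsing steps in test_file();
# they are illustrative only and not part of test_mc.py.


def parse_header(line):
    """Split a header such as '# CS_ARCH_X86, CS_MODE_32, None' into three fields."""
    if not line.startswith('# '):
        raise ValueError("decoding information is missing")
    arch, mode, option = line[2:].split(', ')
    return arch, mode.replace(' ', ''), option.strip()


def parse_test_line(line):
    """Split '<hex bytes> = <asm>' into raw bytes and the expected assembly text."""
    parts = line.split(' = ')
    code = parts[0]
    asm = ''.join(parts[1:]).strip()
    # '0x0f,0x0b' -> '0f0b' -> two raw bytes
    hex_code = code.replace('0x', '').replace(',', '')
    return bytes(bytearray.fromhex(hex_code)), asm


# Made-up sample input, shaped like the files the script reads.
header = '# CS_ARCH_X86, CS_MODE_32, None\n'
test_line = '0x0f,0x0b = ud2\n'

print(parse_header(header))
print(parse_test_line(test_line))
```

The first tuple supplies the keys for the `archs`, `modes` and `mc_modes` lookups, while the bytes from the second are what gets handed to `md.disasm`.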