llvm-capstone/README.md
Rot127 c0317ac800 Rebase refactored TableGen backends onto LLVM 18.
The MCInstDesc table changed. Besides this, only minor changes were made
and some additional code is now emitted for LLVM.

This commit is the combination of all previous Auto-Sync commits.
The list of commit messages follows:

-----------

Combination of all commits of the refactored tablegen backends.

These are the changes made for LLVM 16.

Refactor Capstone relevant TableGen Emitter backends.

This commit extracts the code which emits generated tables into two printer classes.
The Printer is called whenever actual code is written to a file.
There is the PrinterLLVM which emits the code as before and
PrinterCapstone which is tailored to our needs (emitting C and generating
more info).

Additionally missing memory access properties were added to ARMs td
files.

Emit a single header for all files.

Capitalize Target name for enums.

Add lay metric to emit enum value for Banked and system regs.

Malloc substr

Sort instructions in ascending order.

Free substr after use

Add vanished constraints

Fix `regInfoEmitEnums()` and indent

Fix `GenDisassemblerTables.inc#checkDecoderPredicate()`

Fix `TriCoreGenRegisterInfo.inc` | `PrinterCapstone::regInfoEmitRegClasses`

revert changes to NEON instructions

Add instructions with duplicate operands as Matchables.

Add memory load and store info

Correct memory access and out operand info

Set register lists again as read ops due to https://github.com/llvm/llvm-project/issues/62455

Make printAliasInstr and getMnemonic static.

Generate CS instruction enums from actual mnemonic. Not via the flawed AsmMatcher.

Fix typo in InstrInfoEmitter.cpp

Add deprecated QPX feature

Replace + and - with p and m

Add AssemblerPredicates to PPC

Generate RegEncodingTable

Define functions which are called by the Mapper as static.

Necessary because these functions are present in each arch.

Remove set_mem_access().

The cases where this is used to mark access to actual memory operands are
either very rare, or those are NEON lane indices.

Generate correct op type for absolute addresses.

Check for RegisterPointer operands first to prevent mis-categorization.

Add missing Operand types

Generate Instruction formats for PPC.

Add Paired Single instructions.

Partly revert 94e41ce23a7fd863a96288ec05b6c7202c3cfbf1 (reintroduces accidentally removed code.)

Set correct operand types for PS operands

Add memory read/write attributes

Add missing operand types

Add mayLoad and mayStore information.

Add documentation.

Handle special AArch64 operand

Replace C++ with C code.

Check for duplicate enum instr. names

Check for duplicate definitions of system registers.

Add note about missing target names.

Resolve templates in a single static method and add docs about it.

Revert printing target name in upper case.

Revert partially C++ syntax fixes in .td files.

They break the TemplateCollector, since it searches for exactly those references but can't find any.

Add all SubtargetFeatures to feature enum.

Not just the one used by CGIs.

Pass Decoder

Enable to check specific table fields to determine if reg enum must be emitted.

Allow adding a namespace to the type name.

Formatting

Rework emitting of tables.

The system operands are now emitted in reg, imm and alias groups.
Also a bug was fixed which emitted incorrect code.

Check for rename IMPLICIT_IMM operand types

Pass DecodeComplete as pointer not as reference

Print undef when it needs to be printed.

Add namespace ids to all types and functions.

Rework C translation.

Pass MCOp as pointer not as ref

Add missing SysImm type

Fix syntax mistakes

Generate additional sys immediates and op groups.

Handle edge case for printSVERegOp

Handle default arguments of template functions.

Add two missing op groups

Generate a static RegEncodingTable

Set enum values to encodings of the sys ops

Generate a single Enum value file for system operands.

Replace System operand groups with their operand types

Fix missing braces warning

Emit MCOperand validator.

Emit lookupByName functions for sys operands

Add namespaces for ARM.

Check for Target if default arguments of template functions are resolved.

auto-sync opcode & operand encoding info generation (#14)

* Added operand and opcode info generation

* Wrapped deprecated macro under an intellisense check

Basically intellisense fails, causing multiple errors in other files,

so when intellisense parses the code it will use the different version of the macro

* Fixed a small bug

Used double braces to prevent an old bug

Removed extra new line and fixed a bug regarding move semantics
2024-05-29 08:31:35 +00:00


Capstone's LLVM with refactored TableGen backends

This LLVM version exists to generate code for the Capstone disassembler.

It refactors the TableGen emitter backends, so they can emit C code in addition to the C++ code they normally emit.

Please note that LLVM uses the term Target when referring to an architecture.

Code generation

Relevant files

The TableGen emitter backends are located in llvm/utils/TableGen/.

The target definition files (.td), which define the instructions, operands, features etc., can be found in llvm/lib/Target/<ARCH>/.

Code generation overview

Generating code for a target has 6 steps:

                                                                         5                 6
                                                                    ┌──────────┐      ┌──────────┐
                                                                    │Printer   │      │CS .inc   │
    1               2                 3                4        ┌──►│Capstone  ├─────►│files     │
┌───────┐     ┌───────────┐     ┌───────────┐     ┌──────────┐  │   └──────────┘      └──────────┘
│ .td   │     │           │     │           │     │ Code-    │  │
│ files ├────►│ TableGen  ├────►│  CodeGen  ├────►│ Emitter  │◄─┤
└───────┘     └──────┬────┘     └───────────┘     └──────────┘  │
                     │                                 ▲        │   ┌──────────┐      ┌──────────┐
                     └─────────────────────────────────┘        └──►│Printer   ├─────►│LLVM .inc │
                                                                    │LLVM      │      │files     │
                                                                    └──────────┘      └──────────┘
  1. LLVM targets are defined in .td files. They describe instructions, operands, features and other properties.

  2. LLVM TableGen parses these files and converts them to an internal representation of Classes, Records, DAGs and other types.

  3. A TableGen component called CodeGen abstracts this even further. The result is a representation which is not specific to any target (e.g. the CodeGenInstruction class can represent a machine instruction of any target).

  4. Different code emitter backends use the results of the former two components to generate code.

  5. Whenever the emitter emits code it calls a Printer: either the PrinterCapstone to emit C or the PrinterLLVM to emit C++. Which one is used is controlled by the --printerLang=[CCS,C++] option passed to llvm-tblgen.

  6. After the emitter backend is done, the Printer writes the output_stream content into the .inc files.
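
The split between emitter and Printer described in steps 4 to 6 can be illustrated with a small, self-contained C++ sketch. It is not the actual implementation: the method names (`emitEnumBegin`, `emitEnumEntry`, `emitEnumEnd`), the helper `renderWith` and the exact text each printer produces are assumptions made up for this example. Only the class names `PrinterLLVM` and `PrinterCapstone` and the idea that the emitter decides what is written while the Printer decides how it is written come from the description above.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, heavily simplified printer interface.
class Printer {
public:
  explicit Printer(std::ostream &OS) : OS(OS) {}
  virtual ~Printer() = default;
  virtual void emitEnumBegin(const std::string &Target) = 0;
  virtual void emitEnumEntry(const std::string &Target,
                             const std::string &Name, unsigned Value) = 0;
  virtual void emitEnumEnd(const std::string &Target) = 0;

protected:
  std::ostream &OS; // stream whose content ends up in the .inc file
};

// Emits C++: the enum is wrapped in a namespace named after the target.
class PrinterLLVM : public Printer {
public:
  using Printer::Printer;
  void emitEnumBegin(const std::string &Target) override {
    OS << "namespace " << Target << " {\nenum {\n";
  }
  void emitEnumEntry(const std::string & /*Target*/, const std::string &Name,
                     unsigned Value) override {
    OS << "  " << Name << " = " << Value << ",\n";
  }
  void emitEnumEnd(const std::string &Target) override {
    OS << "};\n} // namespace " << Target << "\n";
  }
};

// Emits C: no namespaces, so the target name becomes a prefix (assumed scheme).
class PrinterCapstone : public Printer {
public:
  using Printer::Printer;
  void emitEnumBegin(const std::string & /*Target*/) override {
    OS << "enum {\n";
  }
  void emitEnumEntry(const std::string &Target, const std::string &Name,
                     unsigned Value) override {
    OS << "  " << Target << "_" << Name << " = " << Value << ",\n";
  }
  void emitEnumEnd(const std::string & /*Target*/) override { OS << "};\n"; }
};

// Stand-in for an emitter backend: it knows what to emit, not how.
void emitRegisterEnum(Printer &P, const std::string &Target) {
  P.emitEnumBegin(Target);
  P.emitEnumEntry(Target, "R0", 1);
  P.emitEnumEntry(Target, "R1", 2);
  P.emitEnumEnd(Target);
}

// Runs the same emitter with either printer and returns the generated text.
std::string renderWith(bool UseCapstone, const std::string &Target) {
  std::ostringstream OS;
  if (UseCapstone) {
    PrinterCapstone P(OS);
    emitRegisterEnum(P, Target);
  } else {
    PrinterLLVM P(OS);
    emitRegisterEnum(P, Target);
  }
  return OS.str();
}
```

Calling `renderWith(true, "ARM")` yields a plain C enum with prefixed names, while `renderWith(false, "ARM")` yields the namespaced C++ variant. The emitter function is identical in both cases, which is the point of the refactoring.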

Emitter backends and their use cases

We use the following emitter backends:

| Name | Generated Code | Note |
|------|----------------|------|
| AsmMatcherEmitter | Mapping tables for Capstone | |
| AsmWriterEmitter | State machine to decode the asm-string for a MCInst | |
| DecoderEmitter | State machine which decodes bytes to a MCInst. | |
| InstrInfoEmitter | Tables with instruction information (instruction enum, instr. operand information...) | |
| RegisterInfoEmitter | Tables with register information (register enum, register type info...) | |
| SubtargetEmitter | Table about the target features. | |
| SearchableTablesEmitter | Usually used to generate tables and decoding functions for system registers. | 1. Not all targets use this. 2. Backend can't access the target name. Wherever the target name is needed __ARCH__ or ##ARCH## is printed and later replaced. |
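
The note on SearchableTablesEmitter says the placeholders are "later replaced", but not where or how. As a minimal sketch, assuming the replacement is a plain text substitution over the generated output, it could look like the following. The function name `replaceArchPlaceholders` is hypothetical and not part of this repository.

```cpp
#include <string>
#include <vector>

// Hypothetical post-processing helper: replaces every occurrence of the
// placeholders __ARCH__ and ##ARCH## in Text with the given target name.
std::string replaceArchPlaceholders(std::string Text,
                                    const std::string &TargetName) {
  const std::vector<std::string> Placeholders = {"__ARCH__", "##ARCH##"};
  for (const std::string &P : Placeholders) {
    std::string::size_type Pos = 0;
    while ((Pos = Text.find(P, Pos)) != std::string::npos) {
      Text.replace(Pos, P.size(), TargetName);
      Pos += TargetName.size(); // continue searching after the inserted name
    }
  }
  return Text;
}
```

For example, the input `__ARCH___lookupByName(); ##ARCH##_OP` with the target name `AArch64` becomes `AArch64_lookupByName(); AArch64_OP`.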

Developer notes

  • If you find C++ code within the generated files you need to extend PrinterCapstone::translateToC(). If this still doesn't fix the problem, the code snippet wasn't passed through translateToC() before emitting. In that case you need to figure out where this specific code snippet is printed and add a call to translateToC() there.

  • If the mapping files are missing operand types or access information, then the .td files are incomplete (happens surprisingly often). You need to search for the instructions or operands with missing or incorrect values and fix them.

      Wrong access attributes for:
        - Registers, Immediates: The instruction defines "out" and "in" operands incorrectly.
        - Memory: The "mayLoad" or "mayStore" variable is not set for the instruction.
    
      Operand type is invalid:
        - The "OperandType" variable is unset for this operand type.
    
  • If certain target features (e.g. architecture extensions) were removed from LLVM or you want to add your own, check out DeprecatedFeatures.md.
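
For the first note above (C++ code left over in the generated files), a quick scan of the generated text can help locate the offending lines before digging into the printer. The sketch below is only an illustration: the function `findCppResidue` and its token list are assumptions chosen for this example and do not reflect what `PrinterCapstone::translateToC()` actually handles.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the 1-based numbers of all lines in Generated
// that contain a token which usually indicates untranslated C++ code.
std::vector<unsigned> findCppResidue(const std::string &Generated) {
  // Assumed token list; extend it as needed.
  static const char *const Tokens[] = {"::", "static_cast<", "nullptr",
                                       "template <", "template<"};
  std::vector<unsigned> Hits;
  std::istringstream In(Generated);
  std::string Line;
  unsigned LineNo = 0;
  while (std::getline(In, Line)) {
    ++LineNo;
    for (const char *T : Tokens) {
      if (Line.find(T) != std::string::npos) {
        Hits.push_back(LineNo); // report each line only once
        break;
      }
    }
  }
  return Hits;
}
```

The check is purely textual, so tokens inside comments or string literals are reported as well. Treat the result as a list of places to inspect, not as a definitive verdict.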