mirror of https://github.com/capstone-engine/llvm-capstone.git synced 2024-11-23 13:50:11 +00:00

llvm with tablegen backend for capstone disassembler

Capstone's LLVM with refactored TableGen backends

The purpose of this LLVM version is to generate code for the Capstone disassembler.

It refactors the TableGen emitter backends, so they can emit C code in addition to the C++ code they normally emit.

Note that LLVM uses the term Target to refer to an architecture.

Code generation

Relevant files

The TableGen emitter backends are located in llvm/utils/TableGen/.

The target definition files (.td) define the instructions, operands, features and other properties of a target. They are the source of all our information. If something is defined incorrectly there, it will be incorrect in the generated files. You can find the .td files in llvm/lib/Target/<ARCH>/.

Code generation overview

Generating code for a target has 6 steps:

                                                                         5                 6
                                                                    ┌──────────┐      ┌──────────┐
                                                                    │Printer   │      │CS .inc   │
    1               2                 3                4        ┌──►│Capstone  ├─────►│files     │
┌───────┐     ┌───────────┐     ┌───────────┐     ┌──────────┐  │   └──────────┘      └──────────┘
│ .td   │     │           │     │           │     │ Code-    │  │
│ files ├────►│ TableGen  ├────►│  CodeGen  ├────►│ Emitter  │◄─┤
└───────┘     └──────┬────┘     └───────────┘     └──────────┘  │
                     │                                 ▲        │   ┌──────────┐      ┌──────────┐
                     └─────────────────────────────────┘        └──►│Printer   ├─────►│LLVM .inc │
                                                                    │LLVM      │      │files     │
                                                                    └──────────┘      └──────────┘

1. LLVM targets are defined in .td files. They describe instructions, operands, features and other properties.
2. LLVM TableGen parses these files and converts them to an internal representation of Classes, Records, DAGs and other types.
3. A TableGen component called CodeGen abstracts this representation even further. The result is not specific to any target (e.g. the CodeGenInstruction class can represent a machine instruction of any target).
4. Different code emitter backends use the results of the previous two components to generate code.
5. Whenever an emitter emits code, it calls a Printer: either PrinterCapstone to emit C or PrinterLLVM to emit C++. The --printerLang=[CCS,C++] option passed to llvm-tblgen selects which one is used.
6. After the emitter backend is done, the Printer writes the output_stream content into the .inc files.
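
The interplay of steps 4 to 6 can be modelled in a few lines. The sketch below is a simplified Python model, not the C++ code from llvm/utils/TableGen/. The method names (emit_opcode_enum, write_inc), the helper functions and the layout of the emitted enum are assumptions made for illustration. Only the names Printer, PrinterCapstone, PrinterLLVM and output_stream, and the option values CCS and C++, come from the description above.

```python
import io
from abc import ABC, abstractmethod


class Printer(ABC):
    """Hypothetical stand-in for the Printer base class: it buffers emitted text."""

    def __init__(self):
        self.output_stream = io.StringIO()

    @abstractmethod
    def emit_opcode_enum(self, target, opcodes):
        """Emit an enum of instruction opcodes (illustrative method name)."""

    def write_inc(self):
        # Step 6: the buffered text becomes the content of the .inc file.
        return self.output_stream.getvalue()


class PrinterLLVM(Printer):
    def emit_opcode_enum(self, target, opcodes):
        # C++ flavour: the enum lives inside a namespace.
        self.output_stream.write(f"namespace {target} {{\nenum {{\n")
        for value, name in enumerate(opcodes):
            self.output_stream.write(f"  {name} = {value},\n")
        self.output_stream.write(f"}};\n}} // namespace {target}\n")


class PrinterCapstone(Printer):
    def emit_opcode_enum(self, target, opcodes):
        # C flavour: no namespaces, so the target name becomes a prefix.
        self.output_stream.write("enum {\n")
        for value, name in enumerate(opcodes):
            self.output_stream.write(f"  {target}_{name} = {value},\n")
        self.output_stream.write("};\n")


def make_printer(printer_lang):
    """Pick a printer the way --printerLang is described to: CCS or C++."""
    printers = {"CCS": PrinterCapstone, "C++": PrinterLLVM}
    return printers[printer_lang]()


def run_emitter(printer, target, opcodes):
    """Steps 4 and 5: the emitter decides what to emit, the printer decides how."""
    printer.emit_opcode_enum(target, opcodes)
    return printer.write_inc()
```

Calling run_emitter(make_printer("CCS"), "ARM", ["ADDri", "SUBri"]) yields a plain C enum with prefixed names, while the "C++" printer wraps the same enum in a namespace. The emitter logic stays the same in both cases, which is the reason for routing all output through a Printer.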

Emitter backends and their use cases

We use the following emitter backends:

Name	Generated Code	Note
AsmMatcherEmitter	Mapping tables for Capstone
AsmWriterEmitter	State machine to decode the asm-string for a `MCInst`
DecoderEmitter	State machine which decodes bytes to a `MCInst`.
InstrInfoEmitter	Tables with instruction information (instruction enum, instr. operand information...)
RegisterInfoEmitter	Tables with register information (register enum, register type info...)
SubtargetEmitter	Table about the target features.
SearchableTablesEmitter	Usually used to generate tables and decoding functions for system operands.	1. Not all targets use this.

Developer notes

If you find C++ code within the generated files, you need to extend PrinterCapstone::translateToC(). If this still doesn't fix the problem, the code snippet wasn't passed through translateToC() before it was emitted. In that case, find where this specific snippet is printed and pass it to translateToC().
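
To illustrate the idea behind such a translation step, here is a minimal Python sketch of a pattern-based C++ to C rewrite. It is not the implementation of PrinterCapstone::translateToC(). The rewrite rules, the function names translate_to_c and find_untranslated, and the choice of C replacement calls are all assumptions picked for illustration.

```python
import re

# Hypothetical rewrite rules: (C++ pattern, C replacement).
# Extending the translation means adding another pair to this list.
RULES = [
    (re.compile(r"\b(\w+)\.getOpcode\(\)"), r"MCInst_getOpcode(\1)"),
    (re.compile(r"\b(\w+)\.getNumOperands\(\)"), r"MCInst_getNumOperands(\1)"),
    (re.compile(r"\bnullptr\b"), "NULL"),
]


def translate_to_c(code):
    """Apply every rewrite rule in order and return the translated snippet."""
    for pattern, replacement in RULES:
        code = pattern.sub(replacement, code)
    return code


def find_untranslated(code):
    """Return C++-looking tokens (scope operator, nullptr, getter calls) left in code."""
    return re.findall(r"::|\bnullptr\b|\b\w+\.get\w+\(\)", code)
```

For example, translate_to_c("if (MI.getOpcode() == 5) return nullptr;") gives "if (MCInst_getOpcode(MI) == 5) return NULL;". A check like find_untranslated() can be run over a generated file to locate snippets that either need a new rule or never passed through the translation.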

If the mapping files are missing operand types or access information, then the .td files are incomplete (this happens surprisingly often). Search for the instructions or operands with missing or incorrect values and fix them.

  Wrong access attributes for:
    - Registers, Immediates: The instruction defines its "out" and "in" operands incorrectly.
    - Memory: The "mayLoad" or "mayStore" variable is not set for the instruction.

  Operand type is invalid:
    - The "OperandType" variable is unset for this operand.
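
The rules above can be summarised as a small model. The following Python sketch is a simplification under stated assumptions, not the logic of the actual emitter: the Operand and Instruction classes, their field names, the access labels and the audit() helper are hypothetical, and an unset "OperandType" is modelled as an empty string.

```python
from dataclasses import dataclass, field


@dataclass
class Operand:
    name: str
    operand_type: str = ""   # models the "OperandType" variable; "" means unset
    is_memory: bool = False


@dataclass
class Instruction:
    name: str
    outs: list = field(default_factory=list)   # "out" operands
    ins: list = field(default_factory=list)    # "in" operands
    may_load: bool = False                     # models "mayLoad"
    may_store: bool = False                    # models "mayStore"


ACCESS = {
    (True, True): "READ_WRITE",
    (True, False): "READ",
    (False, True): "WRITE",
    (False, False): "INVALID",
}


def access_of(instr, op):
    """Derive the access label of one operand from the instruction definition."""
    if op.is_memory:
        read, write = instr.may_load, instr.may_store
    else:
        read, write = op in instr.ins, op in instr.outs
    return ACCESS[(read, write)]


def audit(instr):
    """List the definition problems that would surface in the mapping files."""
    problems = []
    for op in instr.outs + instr.ins:
        if not op.operand_type:
            problems.append(f"{instr.name}: OperandType unset for {op.name}")
        if access_of(instr, op) == "INVALID":
            problems.append(f"{instr.name}: no access info for {op.name}")
    return problems
```

In this model a load with "mayLoad" set and a typed destination register produces no findings, while a store that lacks "mayStore" and has an untyped source register is reported twice, once per missing piece of information.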

If certain target features (e.g. architecture extensions) were removed from upstream LLVM, or you want to add your own, check out DeprecatedFeatures.md.