# Capstone's LLVM with refactored TableGen backends

This LLVM version exists to generate code for the
[Capstone disassembler](https://github.com/capstone-engine/capstone).

It refactors the TableGen emitter backends so that they can emit C code
in addition to the C++ code they normally emit.

Please note that within LLVM, an architecture is referred to as a `Target`.

## Code generation

### Relevant files

The TableGen emitter backends are located in `llvm/utils/TableGen/`.

The target definition files (`.td`) define the
instructions, operands, features and other properties of a target. They are the source of all our information.
If something is defined incorrectly there, it will be wrong in the generated files.
You can find the `.td` files in `llvm/lib/Target/<ARCH>/`.

### Code generation overview

Generating code for a target takes six steps:

```
                                                                         5                 6
                                                                    ┌──────────┐      ┌──────────┐
                                                                    │Printer   │      │CS .inc   │
    1               2                 3                4        ┌──►│Capstone  ├─────►│files     │
┌───────┐     ┌───────────┐     ┌───────────┐     ┌──────────┐  │   └──────────┘      └──────────┘
│ .td   │     │           │     │           │     │ Code-    │  │
│ files ├────►│ TableGen  ├────►│  CodeGen  ├────►│ Emitter  │◄─┤
└───────┘     └──────┬────┘     └───────────┘     └──────────┘  │
                     │                                 ▲        │   ┌──────────┐      ┌──────────┐
                     └─────────────────────────────────┘        └──►│Printer   ├─────►│LLVM .inc │
                                                                    │LLVM      │      │files     │
                                                                    └──────────┘      └──────────┘
```

1. LLVM targets are defined in `.td` files. They describe instructions, operands,
features and other properties.

2. [LLVM TableGen](https://llvm.org/docs/TableGen/index.html) parses these files
and converts them to an internal representation of [Classes, Records, DAGs](https://llvm.org/docs/TableGen/ProgRef.html)
and other types.

3. A TableGen component called [CodeGen](https://llvm.org/docs/CodeGenerator.html)
then abstracts this representation even further.
The result is a representation which is _not_ specific to any target
(e.g. the `CodeGenInstruction` class can represent a machine instruction of any target).

4. Different code emitter backends use the results of the two preceding components to
generate code.

5. Whenever an emitter emits code, it calls a `Printer`: either `PrinterCapstone` to emit C or `PrinterLLVM` to emit C++.
Which one is used is controlled by the `--printerLang=[CCS,C++]` option passed to `llvm-tblgen`.

6. After the emitter backend is done, the `Printer` writes the `output_stream` content into the `.inc` files.
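
Steps 4 to 6 can be pictured with a small toy model. This is only a sketch and not the real class layout: the names `Printer`, `PrinterCapstone`, `PrinterLLVM` and `output_stream` come from the text above, but every method name, the `emit_register_enum` function and the emitted strings are hypothetical. They only illustrate how one emitter backend can drive two different printers.

```python
import io


class Printer:
    """Toy stand-in for a printer: collects emitted text in an output stream."""

    def __init__(self):
        self.output_stream = io.StringIO()

    def emit_enum(self, name, members):  # hypothetical method name
        raise NotImplementedError

    def write_inc(self):
        """Return the text that would end up in the .inc file (step 6)."""
        return self.output_stream.getvalue()


class PrinterLLVM(Printer):
    def emit_enum(self, name, members):
        # C++ flavour (illustrative): enum wrapped in a namespace.
        body = ", ".join(members)
        self.output_stream.write(f"namespace {name} {{ enum {{ {body} }}; }}\n")


class PrinterCapstone(Printer):
    def emit_enum(self, name, members):
        # C flavour (illustrative): no namespaces, so prefix each member.
        body = ", ".join(f"{name}_{m}" for m in members)
        self.output_stream.write(f"enum {{ {body} }};\n")


def emit_register_enum(printer, registers):
    """Toy emitter backend (step 4): decides what to emit, not how."""
    printer.emit_enum("ARCH", registers)


def make_printer(printer_lang):
    # Mirrors the two values of --printerLang mentioned above (step 5).
    return {"CCS": PrinterCapstone, "C++": PrinterLLVM}[printer_lang]()


def generate(printer_lang, registers):
    printer = make_printer(printer_lang)
    emit_register_enum(printer, registers)
    return printer.write_inc()
```

The point of the split is that the emitter logic stays the same for both outputs; only the printer chosen at start-up differs.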

### Emitter backends and their use cases

We use the following emitter backends:

| Name | Generated Code | Note |
|------|----------------|------|
| AsmMatcherEmitter | Mapping tables for Capstone | |
| AsmWriterEmitter | State machine to decode the asm-string for a `MCInst` | |
| DecoderEmitter | State machine which decodes bytes to a `MCInst` | |
| InstrInfoEmitter | Tables with instruction information (instruction enum, instruction operand information...) | |
| RegisterInfoEmitter | Tables with register information (register enum, register type info...) | |
| SubtargetEmitter | Table of the target features | |
| SearchableTablesEmitter | Usually used to generate tables and decoding functions for system operands | Not all targets use this. |
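
As a rough illustration, the sketch below builds an `llvm-tblgen` command line for one of these backends. The `-gen-*` flags are the upstream LLVM option names and are assumed to be unchanged in this fork; `--printerLang` is the option described above. The helper `tblgen_command`, the `<ARCH>.td` naming of the top-level definition file and the output file name are assumptions made for this example, and a real build may need further options.

```python
# Emitter backend -> llvm-tblgen action flag (upstream LLVM option names,
# assumed to be the same in this fork).
BACKEND_FLAGS = {
    "AsmMatcherEmitter": "-gen-asm-matcher",
    "AsmWriterEmitter": "-gen-asm-writer",
    "DecoderEmitter": "-gen-disassembler",
    "InstrInfoEmitter": "-gen-instr-info",
    "RegisterInfoEmitter": "-gen-register-info",
    "SubtargetEmitter": "-gen-subtarget",
    "SearchableTablesEmitter": "-gen-searchable-tables",
}


def tblgen_command(backend, arch, out_file, printer_lang="CCS",
                   tblgen="llvm-tblgen", llvm_root="llvm"):
    """Return the argument list for one llvm-tblgen run (hypothetical helper)."""
    target_dir = f"{llvm_root}/lib/Target/{arch}"
    return [
        tblgen,
        BACKEND_FLAGS[backend],
        f"--printerLang={printer_lang}",
        "-I", f"{llvm_root}/include",
        "-I", target_dir,
        f"{target_dir}/{arch}.td",  # assumed name of the top-level .td file
        "-o", out_file,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` or joined with spaces to inspect the command before running it.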

## Developer notes

- If you find C++ code within the generated files you need to extend `PrinterCapstone::translateToC()`.
  If this still doesn't fix the problem, the code snippet wasn't passed through `translateToC()` before emitting.
  So you need to figure out where this specific code snippet is printed and pass it to `translateToC()`.

- If the mapping files miss operand types or access information, then the `.td` files are incomplete (happens surprisingly often).
  You need to search for the instruction or operands with missing or incorrect values and fix them.
  ```
  Wrong access attributes for:
    - Registers, Immediates: The instruction defines "out" and "in" operands incorrectly.
    - Memory: The "mayLoad" or "mayStore" variable is not set for the instruction.

  Operand type is invalid:
    - The "OperandType" variable is unset for this operand.
  ```

- If certain target features (e.g. architecture extensions) were removed from upstream LLVM or you want to add your own,
  check out [DeprecatedFeatures.md](DeprecatedFeatures.md).
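
To spot leftover C++ in generated files, a simple heuristic scan can help. The sketch below is not part of this repository: the function `find_cpp_leftovers` and the pattern list are hypothetical and will produce both false positives and misses, so treat the hits only as hints for where `translateToC()` may need extending.

```python
import re

# Heuristic, hypothetical list of C++-only constructs; extend as needed.
CPP_PATTERNS = [
    r"\bstd::",
    r"\bnamespace\b",
    r"\b(?:static|reinterpret|const)_cast\s*<",
    r"\bnullptr\b",
    r"\btemplate\s*<",
]
CPP_RE = re.compile("|".join(CPP_PATTERNS))


def find_cpp_leftovers(text):
    """Return (line number, stripped line) pairs that look like C++ rather than C."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if CPP_RE.search(line):
            hits.append((number, line.strip()))
    return hits
```

A typical use would be to read each generated `.inc` file, pass its content to `find_cpp_leftovers()` and print the reported lines.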