llvm with tablegen backend for capstone disassembler

Capstone's LLVM with refactored TableGen backends

This version of LLVM is used to generate code for the Capstone disassembler.

It refactors the TableGen emitter backends, so they can emit C code in addition to the C++ code they normally emit.

Build

python3 -m venv .venv
source .venv/bin/activate
pip install Ninja cmake
mkdir build
cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug ../llvm
cmake --build . --target llvm-tblgen --config Debug

Code generation

Note that LLVM uses the term Target to refer to an architecture.

Relevant files

The TableGen emitter backends are located in llvm/utils/TableGen/.

The target definition files (.td), which define the instructions, operands, features etc., can be found in llvm/lib/Target/<ARCH>/.

Code generation overview

Generating code for a target has 6 steps:

                                                                         5                 6
                                                                    ┌──────────┐      ┌──────────┐
                                                                    │Printer   │      │CS .inc   │
    1               2                 3                4        ┌──►│Capstone  ├─────►│files     │
┌───────┐     ┌───────────┐     ┌───────────┐     ┌──────────┐  │   └──────────┘      └──────────┘
│ .td   │     │           │     │           │     │ Code-    │  │
│ files ├────►│ TableGen  ├────►│  CodeGen  ├────►│ Emitter  │◄─┤
└───────┘     └──────┬────┘     └───────────┘     └──────────┘  │
                     │                                 ▲        │   ┌──────────┐      ┌──────────┐
                     └─────────────────────────────────┘        └──►│Printer   ├─────►│LLVM .inc │
                                                                    │LLVM      │      │files     │
                                                                    └──────────┘      └──────────┘
  1. LLVM targets are defined in .td files. They describe instructions, operands, features and other properties.

  2. LLVM TableGen parses these files and converts them to an internal representation of Classes, Records, DAGs and other types.

  3. A TableGen component called CodeGen then abstracts this even further. The result is a representation that is not specific to any target (e.g. the CodeGenInstruction class can represent a machine instruction of any target).

  4. Different code emitter backends use the results of the previous two components to generate code.

  5. Whenever the emitter emits code, it calls a Printer: either PrinterCapstone to emit C or PrinterLLVM to emit C++. The --printerLang=[CCS,C++] option passed to llvm-tblgen selects which one is used.

  6. After the emitter backend is done, the Printer writes the output_stream content into the .inc files.
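Steps 4 to 6 can be pictured with the following minimal sketch. It is not the code from this repository: the class names SketchPrinterC and SketchPrinterCpp, the method emitTableDecl(), the factory makePrinter() and the table name ExampleTable are all hypothetical, and the real PrinterCapstone and PrinterLLVM interfaces are much larger. It only shows the idea that an emitter backend talks to one printer interface while the language option decides which implementation writes the output.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical, heavily simplified printer interface (assumption, not the real API).
class Printer {
public:
  virtual ~Printer() = default;
  // Emits the opening line of a table of 16-bit values (illustrative only).
  virtual void emitTableDecl(const std::string &Name) = 0;
  // Returns everything written so far; stands in for writing the .inc file.
  std::string str() const { return OS.str(); }

protected:
  std::ostringstream OS; // plays the role of the output stream
};

// Stand-in for a printer that emits C++.
class SketchPrinterCpp : public Printer {
public:
  void emitTableDecl(const std::string &Name) override {
    OS << "static constexpr uint16_t " << Name << "[] = {\n";
  }
};

// Stand-in for a printer that emits C.
class SketchPrinterC : public Printer {
public:
  void emitTableDecl(const std::string &Name) override {
    OS << "static const uint16_t " << Name << "[] = {\n";
  }
};

// Hypothetical factory keyed on the value given to --printerLang.
std::unique_ptr<Printer> makePrinter(const std::string &Lang) {
  if (Lang == "CCS")
    return std::make_unique<SketchPrinterC>();
  return std::make_unique<SketchPrinterCpp>();
}

// An emitter backend only sees the Printer interface, never the concrete class.
std::string runEmitter(const std::string &Lang) {
  std::unique_ptr<Printer> P = makePrinter(Lang);
  assert(P && "the factory always returns a printer");
  P->emitTableDecl("ExampleTable");
  return P->str();
}
```

Calling runEmitter("CCS") yields the C flavour of the declaration and runEmitter("C++") the C++ flavour, without any change to the emitter logic itself.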

Emitter backends and their use cases

We use the following emitter backends:

| Name | Generated Code | Note |
|------|----------------|------|
| AsmMatcherEmitter | Mapping tables for Capstone | |
| AsmWriterEmitter | State machine to decode the asm-string for a MCInst | |
| DecoderEmitter | State machine which decodes bytes to a MCInst. | |
| InstrInfoEmitter | Tables with instruction information (instruction enum, instr. operand information...) | |
| RegisterInfoEmitter | Tables with register information (register enum, register type info...) | |
| SubtargetEmitter | Table about the target features. | |
| SearchableTablesEmitter | Usually used to generate tables and decoding functions for system registers. | 1. Not all targets use this. 2. Backend can't access the target name. Wherever the target name is needed __ARCH__ or ##ARCH## is printed and later replaced. |
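The placeholder replacement mentioned for SearchableTablesEmitter could look roughly like the sketch below. How and where this repository actually performs the replacement is not described here, so the helper names replaceAll() and fillInTargetName() are hypothetical; only the placeholders __ARCH__ and ##ARCH## come from the note above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces every occurrence of From in Text with To and returns the result.
std::string replaceAll(std::string Text, const std::string &From,
                       const std::string &To) {
  assert(!From.empty() && "placeholder must not be empty");
  std::size_t Pos = 0;
  while ((Pos = Text.find(From, Pos)) != std::string::npos) {
    Text.replace(Pos, From.size(), To);
    Pos += To.size(); // continue after the inserted text
  }
  return Text;
}

// Hypothetical post-processing step: substitute both placeholder spellings
// with the target name once it is known.
std::string fillInTargetName(const std::string &Generated,
                             const std::string &Arch) {
  return replaceAll(replaceAll(Generated, "__ARCH__", Arch), "##ARCH##", Arch);
}
```

For example, fillInTargetName("__ARCH__SysReg ##ARCH##_lookup", "AArch64") returns "AArch64SysReg AArch64_lookup".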

Developer notes

  • If you find C++ code within the generated files, you need to extend PrinterCapstone::translateToC(). If this still doesn't fix the problem, the code snippet wasn't passed through translateToC() before being emitted. In that case, figure out where this specific code snippet is printed and add a call to translateToC() there.

  • Template functions with default values for their arguments don't get replaced properly. See handleDefaultArg() in PrinterCapstone.cpp to add the default argument value.

  • Some operand printers or decoders are not recognized. This shows up as a compiler diagnostic like:

    .../AArch64GenAsmWriter.inc:18216:5: warning: implicit declaration of function printMatrixIndex_1; did you mean printMatrix_0? [-Wimplicit-function-declaration]
    18216 |     printMatrixIndex_1(MI, 2, O);
        |     ^~~~~~~~~~~~~~~~~~
        |     printMatrix_0
    
    

    The function declaration is probably missing from the header (e.g. <ARCH>InstPrinter.h). To fix this, copy the DEFINE_printMatrix() function to the header and rewrite it as a declaration. The other DECLARE_... macros in the header file show the pattern.

  • An ARCH_OP_GROUP_... is missing or not generated. This shows up as a build error like:

    AArch64InstPrinter.c:2249:42: error: AArch64_OP_GROUP_MatrixIndex_8 undeclared (first use in this function); did you mean AArch64_OP_GROUP_MatrixIndex?
     2249 |                 add_cs_detail(MI, CONCAT(AArch64_OP_GROUP_MatrixIndex, Scale), \
    

    Fix it by adding the postfix MatrixIndex_8 to one of the exception lists in PrinterCapstone::printOpPrintGroupEnum().

  • If the mapping files lack operand types or access information, the .td files are incomplete (this happens surprisingly often). Search for the instructions or operands with missing or incorrect values and fix them.

      Wrong access attributes for:
        - Registers, Immediates: The instruction defines its "out" and "in" operands incorrectly.
        - Memory: The "mayLoad" or "mayStore" variable is not set for the instruction.
    
      Operand type is invalid:
        - The "OperandType" variable is unset for this operand type.
    
  • If certain target features (e.g. architecture extensions) were removed from LLVM or you want to add your own, check out DeprecatedFeatures.md.
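For the note about unrecognized operand printers, the DECLARE_/DEFINE_ macro pairing can be sketched as follows. This is an illustration under assumptions, not the contents of any real <ARCH>InstPrinter.h: the MCInst and SStream structs are tiny stand-ins for the real types, the CONCAT helper is written out locally, and the macro bodies are invented. The point is only that the header needs a declaration macro matching each definition macro, so that the generated call printMatrixIndex_1(MI, 2, O) refers to a declared function.

```cpp
#include <cassert>
#include <string>

// Tiny stand-ins for the real types (assumption, for illustration only).
struct MCInst { unsigned Operands[4]; };
struct SStream { std::string Buffer; };

// Local token-pasting helper: CONCAT(printMatrixIndex, 1) -> printMatrixIndex_1.
#define CONCAT_(a, b) a##_##b
#define CONCAT(a, b) CONCAT_(a, b)

// Header side: the declaration macro, instantiated once per scale.
#define DECLARE_printMatrixIndex(Scale) \
  void CONCAT(printMatrixIndex, Scale)(MCInst *MI, unsigned OpNum, SStream *O);
DECLARE_printMatrixIndex(1)

// Source side: the matching definition macro with a hypothetical body.
#define DEFINE_printMatrixIndex(Scale) \
  void CONCAT(printMatrixIndex, Scale)(MCInst *MI, unsigned OpNum, SStream *O) { \
    assert(OpNum < 4 && "operand index out of range"); \
    O->Buffer += std::to_string(Scale * MI->Operands[OpNum]); \
  }
DEFINE_printMatrixIndex(1)

// Mimics the generated call from the diagnostic: printMatrixIndex_1(MI, 2, O).
std::string demoPrint() {
  MCInst MI = {{0, 0, 3, 0}};
  SStream O;
  printMatrixIndex_1(&MI, 2, &O);
  return O.Buffer;
}
```

If the DECLARE_printMatrixIndex(1) line is missing from the header, code that includes only the header sees no prototype for printMatrixIndex_1, which corresponds to the implicit-declaration warning shown in the notes above.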