Mirror of https://github.com/capstone-engine/capstone.git (synced 2024-11-23 05:29:53 +00:00)
Restructure auto-sync docs to have them more contained (#2355)
* Restructure auto-sync docs to have them more contained in suite/auto-sync
* Enhance Differ documentation
* Fix link and emphasize importance of ARCHITECTURE.md
* Add auto-sync intro.md document, based on @moste00 work
* Be consistent with Auto-Sync naming and use python3
parent 60d5b7ec2f
commit 03c41e1be4
@ -1,12 +1,9 @@
# Auto-Sync
<!--
Copyright © 2022 Rot127 <unisono@quyllur.org>
SPDX-License-Identifier: BSD-3
-->

`auto-sync` is the architecture update tool for Capstone.
Because the architecture modules of Capstone use mostly code from LLVM,
we need to update this part with every LLVM release. `auto-sync` helps
with this synchronization between LLVM and Capstone's modules by
automating most of it.

You can find it in `suite/auto-sync`.
# Architecture of the Auto-Sync framework

This document is split into four parts.

@ -15,8 +12,8 @@ This document is split into four parts.
3. Instructions on how to refactor an architecture to use `auto-sync`.
4. Notes about how to add a new architecture to Capstone with `auto-sync`.

Please read the section about architecture module design in
[ARCHITECTURE.md](ARCHITECTURE.md) before proceeding.
Please read the section about Capstone module design in
[ARCHITECTURE.md](https://github.com/capstone-engine/capstone/blob/next/docs/ARCHITECTURE.md) before proceeding.
The architectural understanding is important for the following.

## Update procedure

@ -98,102 +95,30 @@ _Note_: For details about this checkout `suite/auto-sync/CppTranslator/README.md
Because the result of the `CppTranslator` is not perfect,
we still have many syntax problems left.

Those need to be fixed by hand.
Those need to be fixed partially by hand.

**Differ**

In order to ease this process we run the `Differ` after the `CppTranslator`.

The `Differ` parses each _translated_ file and the corresponding source file _currently_ used in Capstone.
It then compares specific nodes from the just translated file to the equivalent nodes in the old file.
The `Differ` compares the two versions of the C files we have now.
One version is the set of C files currently used by the architecture module.
The other is the set of translated C files. Those are still faulty and need to be fixed.

Most fixes are syntactical problems. Those were almost always resolved before, during the last update.
The `Differ` helps you compare the files and lets you select which version to accept.

Sometimes (not very often though), the newly translated C files contain important changes.
Most often though, the old files are already correct.

The `Differ` parses both files into an abstract syntax tree and compares certain nodes with the same name
(mostly functions).

The user can choose whether to accept the version from the translated file or the old file.
This decision is saved for every node.
If there exists a saved decision for a node, the previous decision is automatically applied again.
If there exists a saved decision for two nodes, and the nodes did not change since the last time,
it applies the previous decision automatically again.

Every other syntax error must be solved manually.

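The decision bookkeeping described above can be pictured with a small Python sketch. It is only an illustration and not the `Differ`'s actual code: the function names, the layout of the `saved` dictionary and the hashing scheme are assumptions made for this example.

```python
import hashlib


def node_digest(old_text: str, new_text: str) -> str:
    """Hash both versions of a node; any edit to either one changes the digest."""
    h = hashlib.sha256()
    h.update(old_text.encode("utf-8"))
    h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    h.update(new_text.encode("utf-8"))
    return h.hexdigest()


def choose(name, old_text, new_text, saved, ask):
    """Return the accepted text for one node, reusing a saved decision if valid."""
    digest = node_digest(old_text, new_text)
    entry = saved.get(name)
    if entry is not None and entry["digest"] == digest:
        choice = entry["choice"]  # both nodes unchanged: re-apply silently
    else:
        choice = ask(name, old_text, new_text)  # must return "old" or "new"
        saved[name] = {"digest": digest, "choice": choice}
    return old_text if choice == "old" else new_text


# Demo: the "user" always keeps the old node; we record how often they are asked.
saved = {}
asked = []


def always_old(name, old_text, new_text):
    asked.append(name)
    return "old"


old_fn = "void printInst(MCInst *MI) {}"
new_fn = "void printInst(const MCInst &MI) {}"
first = choose("printInst", old_fn, new_fn, saved, always_old)
second = choose("printInst", old_fn, new_fn, saved, always_old)  # not asked again
```

Because the digest covers both versions of the node, a change on either side invalidates the stored choice and the user is asked again. The `saved` dictionary holds only strings, so it could be written to disk (for example as JSON) between runs.
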
## Update an architecture

To update an architecture do the following:

Rebase `llvm-capstone` onto the new LLVM release (if not already done).
```
# 1. Clone Capstone's LLVM
git clone https://github.com/capstone-engine/llvm-capstone
cd llvm-capstone
git checkout auto-sync

# 2. Rebase onto the new LLVM release and resolve the conflicts.

# 3. Build tblgen
mkdir build
cd build
cmake -G Ninja -DLLVM_TARGETS_TO_BUILD=<ARCH> -DCMAKE_BUILD_TYPE=Debug ../llvm
cmake --build . --target llvm-tblgen --config Debug

# 4. Run the updater
cd ../../suite/auto-sync/
./Updater/ASUpdater.py -a <ARCH>
```

The update script will execute the steps described above and copy the new files to their directories.

Afterward, try to build Capstone and fix any remaining build errors.

If new instructions or operands were added, add test cases for those
(regression tests for instructions are located in `suite/MC/`).

TODO: Operand and detail tests
<!--
TODO: Wait until `cstest` is rewritten and add description about operand testing.
Issue: https://github.com/capstone-engine/capstone/issues/1984
-->

## Refactor an architecture for `auto-sync`

To refactor an architecture to use `auto-sync`, you need to add it to the configuration.

1. Add the architecture to the supported architectures list in `ASUpdater.py`.
2. Configure the `CppTranslator` for your architecture (`suite/auto-sync/CppTranslator/arch_config.json`)

Now, manually run the update commands within `ASUpdater.py` but *skip* the `Differ` step:

```
./Updater/ASUpdater.py -a <ARCH> -s IncGen Translate
```

The task after this is to:

- Replace leftover C++ syntax with its C equivalent.
- Implement the `add_cs_detail()` handler in `<ARCH>Mapping` for each operand type.
- Add any missing logic to the translated files.
- Make it build and write tests.
- Run the Differ again and always select the old nodes.

**Notes:**

- If you find yourself fixing the same syntax error multiple times,
  please consider adding a `Patch` to the `CppTranslator` for this case.

- Please check out the implementation of ARM's `add_cs_detail()` before implementing your own.

- Running the `Differ` after everything is done preserves your version of the syntax corrections, so the next user can auto-apply them.

- Sometimes the LLVM code uses a single function from a larger source file.
  It is not worth it to translate the whole file just for this function.
  Bundle those lonely functions in `<ARCH>DisassemblerExtension.c`.

- Some generated enums must be included in the `include/capstone/<ARCH>.h` header.
  At the position where the enum should be inserted, add a comment like this (don't remove the `<>` brackets):

  ```
  // generate content <FILENAME.inc> begin
  // generate content <FILENAME.inc> end
  ```

  The update script will insert the content of the `.inc` file at this place.

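How the marker comments can drive the insertion is sketched below in Python. This is a minimal illustration under assumptions, not the update script's real code: the function `insert_generated`, the file name `ExampleRegEnum.inc` and the enum content are all made up for this example.

```python
import re


def insert_generated(header: str, inc_name: str, inc_content: str) -> str:
    """Place inc_content between the begin/end marker comments for inc_name."""
    begin = f"// generate content <{inc_name}> begin"
    end = f"// generate content <{inc_name}> end"
    # Match the marker pair and whatever currently sits between them.
    pattern = re.compile(re.escape(begin) + r"\n.*?" + re.escape(end), re.DOTALL)
    block = f"{begin}\n{inc_content.rstrip()}\n{end}"
    # A function replacement avoids backslash processing in the generated text.
    result, count = pattern.subn(lambda _match: block, header, count=1)
    if count == 0:
        raise ValueError(f"no marker pair found for {inc_name}")
    return result


# Hypothetical header and generated content, for illustration only.
header = (
    "typedef enum {\n"
    "// generate content <ExampleRegEnum.inc> begin\n"
    "// generate content <ExampleRegEnum.inc> end\n"
    "} example_reg;\n"
)
updated = insert_generated(header, "ExampleRegEnum.inc", "\tEXAMPLE_REG_R0 = 1,\n")
again = insert_generated(updated, "ExampleRegEnum.inc", "\tEXAMPLE_REG_R0 = 1,\n")
```

Since the markers stay in the file, running the insertion a second time replaces the old content instead of duplicating it. That is why the marker comments (including the `<>` brackets) must not be removed.
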
## Adding a new architecture

Adding a new architecture follows the same steps as above, except that you need
to implement all the Capstone files from scratch.

Check out an architecture that already supports `auto-sync` for guidance and open an issue if you need help.
The `Differ` is far from perfect. It only helps to automatically apply "known to be good" fixes
and gives the user a better interface to solve the other problems.
But there will still be syntax errors left afterward. These must be fixed by hand.
@ -1,15 +1,19 @@
<!--
Copyright © 2022 Rot127 <unisono@quyllur.org>
Copyright © 2024 2022 Rot127 <unisono@quyllur.org>
SPDX-License-Identifier: BSD-3
-->

# Architecture updater
# Architecture updater - Auto-Sync

This is Capstone's updater for some architectures.
Unfortunately not all architectures are supported yet.
`auto-sync` is the architecture update tool for Capstone.
Because the architecture modules of Capstone use mostly code from LLVM,
we need to update this part with every LLVM release. `auto-sync` helps
with this synchronization between LLVM and Capstone's modules by
automating most of it.

## Install dependencies
Please refer to [intro.md](intro.md) for an introduction to this tool.

## Install

Set up the Python environment and Tree-sitter

@ -20,11 +24,25 @@ sudo apt install python3-venv
# Set up a virtual environment in the Capstone root dir
python3 -m venv ./.venv
source ./.venv/bin/activate
```

Install Auto-Sync framework

```
cd suite/auto-sync/
pip install -e .
```

## Update
## Architecture

Please read [ARCHITECTURE.md](https://github.com/capstone-engine/capstone/blob/next/docs/ARCHITECTURE.md) to understand how Auto-Sync works.

This step is essential! Please don't skip it.

## Update an architecture

Updating an architecture module to the newest LLVM release is only possible if it uses Auto-Sync.
Not all arch-modules support Auto-Sync yet.

Check if your architecture is supported.

@ -52,6 +70,14 @@ Run the updater
./src/autosync/ASUpdater.py -a <ARCH>
```

## Update procedure

1. Run the `ASUpdater.py` script.
2. Compare the functions in `<ARCH>DisassemblerExtension.*` to LLVM (search the function names in the LLVM root)
   and update them if necessary.
3. Try to build Capstone and fix the build errors.


## Post-processing steps

This update translates some LLVM C++ files to C.
@ -60,7 +86,7 @@ you will get build errors if you try to compile Capstone.

The last step to finish the update is to fix those build errors by hand.

## Developer
## Additional details

### Overview updated files

@ -96,14 +122,7 @@ Those files are written by us:
- `<ARCH>Mapping.*`: Binding code between the architecture module and the LLVM files. This is also where the detail is set.
- `<ARCH>Module.*`: Interface to the Capstone core.

### Update procedure

1. Run the `ASUpdater.py` script.
2. Compare the functions in `<ARCH>DisassemblerExtension.*` to LLVM (search the function names in the LLVM root)
   and update them if necessary.
3. Try to build Capstone and fix the build errors.

### Update details
### Relevant documentation and troubleshooting

**LLVM file translation**

@ -129,9 +148,66 @@ Documentation about the `.inc` file generation is in the [llvm-capstone](https:/

**Formatting**

- If you make changes to the `CppTranslator` please format the files with `black`
- If you make changes to the `CppTranslator` please format the files with `black` and `usort`
```
source ./.venv/bin/activate
pip3 install black
python3 -m black --line-length=120 CppTranslator/*/*.py
pip3 install black usort
python3 -m usort format src/autosync
python3 -m black src/autosync
```

## Refactor an architecture for Auto-Sync framework

Not all architecture modules support Auto-Sync yet.
Here is an overview of the steps to add support for it.

<hr>

To refactor one of them to use `auto-sync`, you need to add it to the configuration.

1. Add the architecture to the supported architectures list in `ASUpdater.py`.
2. Configure the `CppTranslator` for your architecture (`suite/auto-sync/CppTranslator/arch_config.json`)

Now, manually run the update commands within `ASUpdater.py` but *skip* the `Differ` step:

```
./Updater/ASUpdater.py -a <ARCH> -s IncGen Translate
```

The task after this is to:

- Replace leftover C++ syntax with its C equivalent.
- Implement the `add_cs_detail()` handler in `<ARCH>Mapping` for each operand type.
- Edit the main header file of the architecture (`include/capstone/<ARCH>.h`) to include the generated enums (see below)
- Add any missing logic to the translated files.
- Make it build and write tests.
- Run the Differ again and always select the old nodes.

**Notes:**

- Some generated enums must be included in the `include/capstone/<ARCH>.h` header.
  At the position where the enum should be inserted, add a comment like this (don't remove the `<>` brackets):

  ```
  // generate content <FILENAME.inc> begin
  // generate content <FILENAME.inc> end
  ```

  The update script will insert the content of the `.inc` file at this place.

- If you find yourself fixing the same syntax error multiple times,
  please consider adding a `Patch` to the `CppTranslator` for this case.

- Please check out the implementation of ARM's `add_cs_detail()` before implementing your own.

- Running the `Differ` after everything is done preserves your version of the syntax corrections, so the next user can auto-apply them.

- Sometimes the LLVM code uses a single function from a larger source file.
  It is not worth it to translate the whole file just for this function.
  Bundle those lonely functions in `<ARCH>DisassemblerExtension.c`.

## Adding a new architecture

Adding a new architecture follows the same steps as above, except that you need
to implement all the Capstone files from scratch.

Check out an architecture that already supports `auto-sync` for guidance and open an issue if you need help.

suite/auto-sync/intro.md (Normal file, 96 lines)
@ -0,0 +1,96 @@
## Why the Auto-Sync framework?

Capstone provides a simple API to leverage the LLVM disassemblers, without
having the big footprint of LLVM itself.

It does this by using a stripped-down copy of the LLVM disassemblers (one for each architecture)
and provides a uniform API to them.

The actual disassembly task (bytes to asm-text and decoded operands) is completely done by
the LLVM code.
Capstone takes the disassembled instructions, adds details to them (operand read/write info etc.)
and organizes them into a uniform structure (`cs_insn`, `cs_detail` etc.).
These objects are then accessible from the API.

Capstone is written in C and LLVM is written in C++. So to use the disassembler modules of LLVM,
Capstone effectively translates LLVM source files from C++ to C, without changing the semantics.
One could also call it a "disassembler port".

Capstone supports multiple architectures. So whenever LLVM
has a new release and adds more instructions, Capstone needs to update its modules as well.

In the past, the update procedure was done by hand and with some Python scripts.
But the task was tedious and error-prone.

Auto-Sync was created to ease this complicated update procedure.

<hr>

## How LLVM disassemblers work

To effectively use the LLVM disassembler logic, one must understand how the disassemblers operate.

Each architecture is defined in a so-called `.td` file, that is, a "Target Description" file.
Those files are a declarative description of an architecture.
They are written in a Domain-Specific Language called [TableGen](https://llvm.org/docs/TableGen/).
They contain instructions, registers, processor features, which operands each instruction reads and writes, and more information.

These files are consumed by "TableGen Backends". They parse and process them to generate C++ code.
The generated code is for example: enums, decoding algorithms (for instructions and operands) or
lookup tables for register names or aliases.

Additionally, LLVM has handwritten files. They use the generated code to build the actual instruction classes
and handle architecture-specific edge cases.

Capstone uses both kinds of files: the generated ones as well as the handwritten ones.

## Overview of updating steps

An Auto-Sync update has multiple steps:

**(1)** Changes in the auto-generated C++ files are handled completely automatically.
We have an LLVM fork with patched TableGen backends, so they emit C code.

**(2)** Changes in LLVM's handwritten sources are handled semi-automatically.
For each source file, we search for C++ syntax and replace it with the equivalent C syntax.
For this task we have the CppTranslator.

The end result is of course not perfectly valid C code.
It is merely an intermediate file, which still has some C++ syntax in it.

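The search-and-replace idea behind step (2) can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the CppTranslator itself: the real tool is configured per architecture and organizes its rewrites as patches, while the two regular expressions and the names `PATCHES` and `translate` below are assumptions made for this example.

```python
import re

# Illustrative rewrite rules as (pattern, replacement) pairs. Not the real patch set.
PATCHES = [
    (re.compile(r"\bnullptr\b"), "NULL"),
    (re.compile(r"\bstatic_cast<([^<>]+)>\("), r"(\1)("),
]


def translate(cpp_source: str) -> str:
    """Apply every rewrite rule in order and return the intermediate text."""
    for pattern, replacement in PATCHES:
        cpp_source = pattern.sub(replacement, cpp_source)
    return cpp_source


cpp_line = "if (Decoder == nullptr) return static_cast<unsigned>(MI.getOpcode());"
c_line = translate(cpp_line)
print(c_line)
```

The output still contains the member call `MI.getOpcode()`, which is not valid C. This is the kind of leftover C++ syntax that makes the result an intermediate file rather than finished C code.
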
Because this leftover syntax was likely already fixed in the equivalent C file currently in Capstone,
we have a last step.
The translated file is diffed with the corresponding old file in Capstone.

The `Differ` tool parses both files into an abstract syntax tree.
From this AST it picks nodes with the same name and diffs them.
The diff is shown to the user, who decides which version to accept.

All choices are also recorded and automatically applied next time.

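The "pick nodes with the same name and diff them" step can be pictured as follows. This is a minimal sketch under assumptions, not the `Differ`'s implementation: it assumes the nodes were already extracted from both syntax trees into dictionaries that map a node name to its source text, and the function name and sample nodes are hypothetical.

```python
import difflib


def diff_matching_nodes(old_nodes, new_nodes):
    """Return a unified diff for every node name found in both files that differs."""
    diffs = {}
    for name in sorted(old_nodes.keys() & new_nodes.keys()):
        old_text, new_text = old_nodes[name], new_nodes[name]
        if old_text == new_text:
            continue  # identical nodes need no decision
        lines = difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"old/{name}",
            tofile=f"new/{name}",
            lineterm="",
        )
        diffs[name] = "\n".join(lines)
    return diffs


# Hypothetical nodes: name -> source text of the function.
old_nodes = {
    "getReg": "unsigned getReg(void)\n{\n\treturn 0;\n}",
    "helper": "void helper(void) {}",
}
new_nodes = {
    "getReg": "unsigned getReg(void)\n{\n\treturn 1;\n}",
    "helper": "void helper(void) {}",
    "newFn": "void newFn(void) {}",
}
diffs = diff_matching_nodes(old_nodes, new_nodes)
```

Only `getReg` ends up in `diffs`: `helper` is identical in both files, and `newFn` has no counterpart with the same name in the old file, so this sketch leaves it for separate handling.
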
**Example**

> Suppose there is a file `ArchDisassembler.cpp` in LLVM.
> Capstone has the C equivalent `ArchDisassembler.c`.
>
> Now LLVM has a new release, and there were several additions in `ArchDisassembler.cpp`.
>
> Auto-Sync will pass `ArchDisassembler.cpp` to the CppTranslator, which replaces most C++ syntax.
> The result is an intermediate file `transl_ArchDisassembler.cpp`.
>
> The result is close to what we want (C code), but still contains invalid syntax.
> Most of these syntax errors were fixed before. They must have been, because the C file `ArchDisassembler.c`
> is working fine.
>
> So the intermediate file `transl_ArchDisassembler.cpp` is compared to the old `ArchDisassembler.c`.
> The Differ parses both files into an AST and automatically patches all nodes it can.
>
> This effectively automates most of the boring, mechanical work involved in fixing up `transl_ArchDisassembler.cpp`.
> If something new came up, it asks the user for a decision.
>
> The result is saved to `ArchDisassembler.c`, which is now up-to-date with the newest LLVM release.
>
> In practice this file will still contain syntax errors. But not many, so they can easily be resolved.

**(3)** After (1) and (2), some changes in Capstone-only files follow.
This step is manual work.