Mirror of https://github.com/capstone-engine/capstone.git (synced 2024-11-30 08:50:42 +00:00)

Restructure auto-sync docs to have them more contained (#2355)

* Restructure auto-sync docs to have them more contained in suite/auto-sync
* Enhance Differ documentation
* Fix link and emphasize importance of ARCHITECTURE.md
* Add auto-syc intro.md document, based on @moste00 work
* Be consistent with Auto-Sync naming and use python3

Parent 60d5b7ec2f, commit 03c41e1be4
````diff
@@ -1,12 +1,9 @@
-# Auto-Sync
+<!--
+Copyright © 2022 Rot127 <unisono@quyllur.org>
+SPDX-License-Identifier: BSD-3
+-->
 
-`auto-sync` is the architecture update tool for Capstone.
-Because the architecture modules of Capstone use mostly code from LLVM,
-we need to update this part with every LLVM release. `auto-sync` helps
-with this synchronization between LLVM and Capstone's modules by
-automating most of it.
-
-You can find it in `suite/auto-sync`.
+# Architecture of the Auto-Sync framework
 
 This document is split into four parts.
 
````
````diff
@@ -15,8 +12,8 @@ This document is split into four parts.
 3. Instructions how to refactor an architecture to use `auto-sync`.
 4. Notes about how to add a new architecture to Capstone with `auto-sync`.
 
-Please read the section about architecture module design in
-[ARCHITECTURE.md](ARCHITECTURE.md) before proceeding.
+Please read the section about capstone module design in
+[ARCHITECTURE.md](https://github.com/capstone-engine/capstone/blob/next/docs/ARCHITECTURE.md) before proceeding.
 The architectural understanding is important for the following.
 
 ## Update procedure
````
````diff
@@ -98,102 +95,30 @@ _Note_: For details about this checkout `suite/auto-sync/CppTranslator/README.md
 Because the result of the `CppTranslator` is not perfect,
 we still have many syntax problems left.
 
-Those need to be fixed by hand.
+Those need to be fixed partially by hand.
+
+**Differ**
 
 In order to ease this process we run the `Differ` after the `CppTranslator`.
 
-The `Differ` parses each _translated_ file and the corresponding source file _currently_ used in Capstone.
-It then compares specific nodes from the just translated file to the equivalent nodes in the old file.
+The `Differ` compares our two versions of C files we have now.
+One of them are the C files currently used by the architecture module.
+On the other hand we have the translated C files. Those are still faulty and need to be fixed.
+
+Most fixes are syntactical problems. Those were almost always resolved before, during the last update.
+The `Differ` helps you to compare the files and let you select which version to accept.
+
+Sometimes (not very often though), the newly translated C files contain important changes.
+Most often though, the old files are already correct.
+
+The `Differ` parses both files into an abstract syntax tree and compares certain nodes with the same name
+(mostly functions).
 
 The user can choose if she accepts the version from the translated file or the old file.
 This decision is saved for every node.
-If there exists a saved decision for a node, the previous decision automatically applied again.
+If there exists a saved decision for two nodes, and the nodes did not change since the last time,
+it applies the previous decision automatically again.
 
-Every other syntax error must be solved manually.
-## Update an architecture
-
-To update an architecture do the following:
-
-Rebase `llvm-capstone` onto the new LLVM release (if not already done).
-```
-# 1. Clone Capstone's LLVM
-git clone https://github.com/capstone-engine/llvm-capstone
-cd llvm-capstone
-git checkout auto-sync
-
-# 2. Rebase onto the new LLVM release and resolve the conflicts.
-
-# 3. Build tblgen
-mkdir build
-cd build
-cmake -G Ninja -DLLVM_TARGETS_TO_BUILD=<ARCH> -DCMAKE_BUILD_TYPE=Debug ../llvm
-cmake --build . --target llvm-tblgen --config Debug
-
-# 4. Run the updater
-cd ../../suite/auto-sync/
-./Updater/ASUpdater.py -a <ARCH>
-```
-
-The update script will execute the steps described above and copy the new files to their directories.
-
-Afterward try to build Capstone and fix any build errors left.
-
-If new instructions or operands were added, add test cases for those
-(recession tests for instructions are located in `suite/MC/`).
-
-TODO: Operand and detail tests
-<!--
-TODO: Wait until `cstest` is rewritten and add description about operand testing.
-Issue: https://github.com/capstone-engine/capstone/issues/1984
--->
-
-## Refactor an architecture for `auto-sync`
-
-To refactor an architecture to use `auto-sync`, you need to add it to the configuration.
-
-1. Add the architecture to the supported architectures list in `ASUpdater.py`.
-2. Configure the `CppTranslator` for your architecture (`suite/auto-sync/CppTranslator/arch_config.json`)
-
-Now, manually run the update commands within `ASUpdater.py` but *skip* the `Differ` step:
-
-```
-./Updater/ASUpdater.py -a <ARCH> -s IncGen Translate
-```
-
-The task after this is to:
-
-- Replace leftover C++ syntax with its C equivalent.
-- Implement the `add_cs_detail()` handler in `<ARCH>Mapping` for each operand type.
-- Add any missing logic to the translated files.
-- Make it build and write tests.
-- Run the Differ again and always select the old nodes.
-
-**Notes:**
-
-- If you find yourself fixing the same syntax error multiple times,
-please consider adding a `Patch` to the `CppTranslator` for this case.
-
-- Please check out the implementation of ARM's `add_cs_detail()` before implementing your own.
-
-- Running the `Differ` after everything is done, preserves your version of syntax corrections, and the next user can auto-apply them.
-
-- Sometimes the LLVM code uses a single function from a larger source file.
-It is not worth it to translate the whole file just for this function.
-Bundle those lonely functions in `<ARCH>DisassemblerExtension.c`.
-
-- Some generated enums must be included in the `include/capstone/<ARCH>.h` header.
-At the position where the enum should be inserted, add a comment like this (don't remove the `<>` brackets):
-
-```
-// generate content <FILENAME.inc> begin
-// generate content <FILENAME.inc> end
-```
-
-The update script will insert the content of the `.inc` file at this place.
-
-## Adding a new architecture
-
-Adding a new architecture follows the same steps as above. With the exception that you need
-to implement all the Capstone files from scratch.
-
-Check out an `auto-sync` supporting architectures for guidance and open an issue if you need help.
+The `Differ` is far from perfect. It only helps to automatically apply "known to be good" fixes
+and gives the user a better interface to solve the other problems.
+But there will still be syntax errors left afterward. These must be fixed by hand.
````
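The new `Differ` text above describes a per-node decision cache. Same-named nodes (mostly functions) from the old and the translated file are compared, and the user's choice is saved. That choice is reapplied only if neither node changed since it was recorded. The Python sketch below illustrates that bookkeeping under stated assumptions. It is not the `Differ`'s actual code. Nodes are plain `name -> source text` dicts instead of AST nodes, and "unchanged" is checked with a SHA-256 hash of each node's text. The decision store is an in-memory dict rather than whatever the tool persists to disk. Nodes that exist only in the old file are simply dropped.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical store: name -> (hash of old node, hash of translated node, "old" or "new")
Decisions = Dict[str, Tuple[str, str, str]]
# Hypothetical prompt: ask(name, old_text, new_text) returns "old" or "new"
AskFn = Callable[[str, str, str], str]


def digest(text: str) -> str:
    """Fingerprint of a node's source text, used to detect changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def merge_nodes(
    old: Dict[str, str],
    translated: Dict[str, str],
    decisions: Decisions,
    ask: AskFn,
) -> Dict[str, str]:
    """Choose between the old and the translated version of each node.

    A saved decision is reapplied only if both nodes are unchanged since it
    was recorded; otherwise `ask` is called and its answer is saved.
    Nodes that exist only in the old file are dropped in this sketch.
    """
    result: Dict[str, str] = {}
    for name, new_text in translated.items():
        old_text = old.get(name)
        if old_text is None or old_text == new_text:
            # Nothing to decide: brand-new node, or both versions are identical.
            result[name] = new_text
            continue
        key = (digest(old_text), digest(new_text))
        saved = decisions.get(name)
        if saved is not None and saved[:2] == key:
            choice = saved[2]  # known decision, applied automatically
        else:
            choice = ask(name, old_text, new_text)
            decisions[name] = (key[0], key[1], choice)
        result[name] = old_text if choice == "old" else new_text
    return result
```

A second call with the same inputs reuses the stored choice without calling `ask`. Editing either side of a node changes its hash and forces a new prompt, which mirrors the "nodes did not change since the last time" condition.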
````diff
@@ -1,15 +1,19 @@
 <!--
 Copyright © 2022 Rot127 <unisono@quyllur.org>
-Copyright © 2024 2022 Rot127 <unisono@quyllur.org>
 SPDX-License-Identifier: BSD-3
 -->
 
-# Architecture updater
+# Architecture updater - Auto-Sync
 
-This is Capstones updater for some architectures.
-Unfortunately not all architectures are supported yet.
+`auto-sync` is the architecture update tool for Capstone.
+Because the architecture modules of Capstone use mostly code from LLVM,
+we need to update this part with every LLVM release. `auto-sync` helps
+with this synchronization between LLVM and Capstone's modules by
+automating most of it.
 
-## Install dependencies
+Please refer to [intro.md](intro.md) for an introduction about this tool.
+
+## Install
 
 Setup Python environment and Tree-sitter
 
````
````diff
@@ -20,11 +24,25 @@ sudo apt install python3-venv
 # Setup virtual environment in Capstone root dir
 python3 -m venv ./.venv
 source ./.venv/bin/activate
+```
+
+Install Auto-Sync framework
+
+```
 cd suite/auto-sync/
 pip install -e .
 ```
 
-## Update
+## Architecture
+
+Please read [ARCHITECTURE.md](https://github.com/capstone-engine/capstone/blob/next/docs/ARCHITECTURE.md) to understand how Auto-Sync works.
+
+This step is essential! Please don't skip it.
+
+## Update an architecture
+
+Updating an architecture module to the newest LLVM release, is only possible if it uses Auto-Sync.
+Not all arch-modules support Auto-Sync yet.
 
 Check if your architecture is supported.
 
````
````diff
@@ -52,6 +70,14 @@ Run the updater
 ./src/autosync/ASUpdater.py -a <ARCH>
 ```
 
+## Update procedure
+
+1. Run the `ASUpdater.py` script.
+2. Compare the functions in `<ARCH>DisassemblerExtension.*` to LLVM (search the function names in the LLVM root)
+and update them if necessary.
+3. Try to build Capstone and fix the build errors.
+
+
 ## Post-processing steps
 
 This update translates some LLVM C++ files to C.
````
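Step 2 of the new update procedure says to search the LLVM root for the names of the functions in `<ARCH>DisassemblerExtension.*`. The Python sketch below is one rough way to do that lookup. It is hypothetical and not part of Auto-Sync. Function definitions are found with a simple regular expression, which assumes that definitions start at column 0 and that function bodies are indented, as in clang-formatted code. Only `.cpp` and `.h` files are scanned. Functions that Capstone renamed or prefixed will not be found. Treat the output as a starting point for the manual comparison, not as a complete list.

```python
import re
from pathlib import Path
from typing import Dict, List

# Heuristic for a top-level C function definition: return type and name
# starting at column 0, a parameter list, then "{" (possibly on the next line).
# Statements such as "if (x) {" are skipped because bodies are assumed indented.
FUNC_DEF = re.compile(
    r"^[A-Za-z_][\w\s*]*?\b([A-Za-z_]\w*)\s*\([^;{}]*\)\s*\{", re.MULTILINE
)


def function_names(c_file: Path) -> List[str]:
    """Return the sorted, de-duplicated names of functions defined in c_file."""
    text = c_file.read_text(encoding="utf-8", errors="replace")
    return sorted(set(FUNC_DEF.findall(text)))


def find_in_llvm(names: List[str], llvm_root: Path) -> Dict[str, List[Path]]:
    """Map each function name to the LLVM .cpp/.h files that mention it."""
    hits: Dict[str, List[Path]] = {name: [] for name in names}
    patterns = {name: re.compile(rf"\b{re.escape(name)}\b") for name in names}
    for path in sorted(llvm_root.rglob("*")):
        if not path.is_file() or path.suffix not in {".cpp", ".h"}:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for name, pattern in patterns.items():
            if pattern.search(text):
                hits[name].append(path)
    return hits
```

Use it as `find_in_llvm(function_names(ext_file), llvm_root)`. Pointing `llvm_root` at the architecture's target directory rather than the whole LLVM checkout keeps the scan fast. An empty list for a name means it needs to be looked up by hand.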
````diff
@@ -60,7 +86,7 @@ you will get build errors if you try to compile Capstone.
 
 The last step to finish the update is to fix those build errors by hand.
 
-## Developer
+## Additional details
 
 ### Overview updated files
 
````
````diff
@@ -96,14 +122,7 @@ Those files are written by us:
 - `<ARCH>Mapping.*`: Binding code between the architecture module and the LLVM files. This is also where the detail is set.
 - `<ARCH>Module.*`: Interface to the Capstone core.
 
-### Update procedure
+### Relevant documentation and troubleshooting
 
-1. Run the `ASUpdater.py` script.
-2. Compare the functions in `<ARCH>DisassemblerExtension.*` to LLVM (search the function names in the LLVM root)
-and update them if necessary.
-3. Try to build Capstone and fix the build errors.
-
-### Update details
-
 **LLVM file translation**
 
````
````diff
@@ -129,9 +148,66 @@ Documentation about the `.inc` file generation is in the [llvm-capstone](https:/
 
 **Formatting**
 
-- If you make changes to the `CppTranslator` please format the files with `black`
+- If you make changes to the `CppTranslator` please format the files with `black` and `usort`
 ```
-source ./.venv/bin/activate
-pip3 install black
-python3 -m black --line-length=120 CppTranslator/*/*.py
+pip3 install black usort
+python3 -m usort format src/autosync
+python3 -m black src/autosync
 ```
+
+## Refactor an architecture for Auto-Sync framework
+
+Not all architecture modules support Auto-Sync yet.
+Here is an overview of the steps to add support for it.
+
+<hr>
+
+To refactor one of them to use `auto-sync`, you need to add it to the configuration.
+
+1. Add the architecture to the supported architectures list in `ASUpdater.py`.
+2. Configure the `CppTranslator` for your architecture (`suite/auto-sync/CppTranslator/arch_config.json`)
+
+Now, manually run the update commands within `ASUpdater.py` but *skip* the `Differ` step:
+
+```
+./Updater/ASUpdater.py -a <ARCH> -s IncGen Translate
+```
+
+The task after this is to:
+
+- Replace leftover C++ syntax with its C equivalent.
+- Implement the `add_cs_detail()` handler in `<ARCH>Mapping` for each operand type.
+- Edit the main header file of the architecture (`include/capstone/<ARCH>.h`) to include the generated enums (see below)
+- Add any missing logic to the translated files.
+- Make it build and write tests.
+- Run the Differ again and always select the old nodes.
+
+**Notes:**
+
+- Some generated enums must be included in the `include/capstone/<ARCH>.h` header.
+At the position where the enum should be inserted, add a comment like this (don't remove the `<>` brackets):
+
+```
+// generate content <FILENAME.inc> begin
+// generate content <FILENAME.inc> end
+```
+
+The update script will insert the content of the `.inc` file at this place.
+
+- If you find yourself fixing the same syntax error multiple times,
+please consider adding a `Patch` to the `CppTranslator` for this case.
+
+- Please check out the implementation of ARM's `add_cs_detail()` before implementing your own.
+
+- Running the `Differ` after everything is done, preserves your version of syntax corrections, and the next user can auto-apply them.
+
+- Sometimes the LLVM code uses a single function from a larger source file.
+It is not worth it to translate the whole file just for this function.
+Bundle those lonely functions in `<ARCH>DisassemblerExtension.c`.
+
+## Adding a new architecture
+
+Adding a new architecture follows the same steps as above. With the exception that you need
+to implement all the Capstone files from scratch.
+
+Check out an `auto-sync` supporting architectures for guidance and open an issue if you need help.
````
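The notes above describe `// generate content <FILENAME.inc> begin` / `end` comment pairs. The update script fills the space between each pair with the content of the named `.inc` file. A minimal Python sketch of that splice follows. It is an assumption about the behaviour, not the updater's code. It expects each marker on its own line and replaces anything already between a begin/end pair, so running it twice gives the same result. It reads the `.inc` files from a caller-supplied directory. `ExampleGenEnum.inc` is a made-up file name.

```python
import re
from pathlib import Path

# A begin/end marker pair, e.g.
#   // generate content <ExampleGenEnum.inc> begin
#   ...previously generated lines (replaced on every run)...
#   // generate content <ExampleGenEnum.inc> end
MARKER = re.compile(
    r"(?P<begin>^[ \t]*// generate content <(?P<inc>[^>]+)> begin[ \t]*\n)"
    r"(?P<body>.*?)"
    r"(?P<end>^[ \t]*// generate content <(?P=inc)> end[ \t]*$)",
    re.MULTILINE | re.DOTALL,
)


def fill_generated_content(header: str, inc_dir: Path) -> str:
    """Replace the body of every marker pair in `header` with the named .inc file."""

    def splice(match: "re.Match[str]") -> str:
        inc_text = (inc_dir / match.group("inc")).read_text(encoding="utf-8")
        if inc_text and not inc_text.endswith("\n"):
            inc_text += "\n"  # keep the end marker on its own line
        return match.group("begin") + inc_text + match.group("end")

    return MARKER.sub(splice, header)
```

The markers themselves are kept in the output. That is what lets the next update find and refresh the same spot.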
suite/auto-sync/intro.md (Normal file, 96 lines added)
````diff
@@ -0,0 +1,96 @@
+## Why the Auto-Sync framework?
+
+Capstone provides a simple API to leverage the LLVM disassemblers, without
+having the big footprint of LLVM itself.
+
+It does this by using a stripped down copy of LLVM disassemblers (one for each architecture)
+and provides a uniform API to them.
+
+The actual disassembly task (bytes to asm-text and decoded operands) is completely done by
+the LLVM code.
+Capstone takes the disassembled instructions, adds details to them (operand read/write info etc.)
+and organizes them to a uniform structure (`cs_insn`, `cs_detail` etc.).
+These objects are then accessible from the API.
+
+Capstone is in C and LLVM is in C++. So to use the disassembler modules of LLVM,
+Capstone effectively translates LLVM source files from C++ to C, without changing the semantics.
+One could also call it a "disassembler port".
+
+Capstone supports multiple architectures. So whenever LLVM
+has a new release and adds more instructions, Capstone needs to update its modules as well.
+
+In the past, the update procedure was done by hand and with some Python scripts.
+But the task was tedious and error-prone.
+
+To ease the complicated update procedure, Auto-Sync comes in.
+
+<hr>
+
+## How LLVM disassemblers work
+
+Because effectively use the LLVM disassembler logic, one must understand how they operate.
+
+Each architecture is defined in a so-called `.td` file, that is, a "Target Description" file.
+Those files are a declarative description of an architecture.
+They are written in a Domain-Specific Language called [TableGen](https://llvm.org/docs/TableGen/).
+They contain instructions, registers, processor features, which instructions operands read and write and more information.
+
+These files are consumed by "TableGen Backends". They parse and process them to generate C++ code.
+The generated code is for example: enums, decoding algorithms (for instructions and operands) or
+lookup tables for register names or alias.
+
+Additionally, LLVM has handwritten files. They use the generated code to build the actual instruction classes
+and handle architecture specific edge cases.
+
+Capstone uses both of those files. The generated ones as well as the handwritten ones.
+
+## Overview of updating steps
+
+An Auto-Sync update has multiple steps:
+
+**(1)** Changes in the auto-generated C++ files are handled completely automatically,
+We have a LLVM fork with patched TableGen-backends, so they emit C code.
+
+**(2)** Changes in LLVM's handwritten sources are handled semi-automatically.
+For each source file, we search C++ syntax and replace it with the equivalent C syntax.
+For this task we have the CppTranslator.
+
+The end result is of course not perfectly valid C code.
+It is merely an intermediate file, which still has some C++ syntax in it.
+
+Because this leftover syntax was likely already fixed in the equivalent C file currently in Capstone,
+we have a last step.
+The translated file is diffed with the corresponding old file in Capstone.
+
+The `Differ` tool parses both files into an abstract syntax tree.
+From this AST it picks nodes with the same name and diffs them.
+The diff is given to the user, and they can decide which one to accept.
+
+All choices are also recorded and automatically applied next time.
+
+**Example**
+
+> Suppose there is a file `ArchDisassembler.cpp` in LLVM.
+> Capstone has the C equivalent `ArchDisassembler.c`.
+>
+> Now LLVM has a new release, and there were several additions in `ArchDisassembler.cpp`.
+>
+> Auto-Sync will pass `ArchDisassembler.cpp` to the CppTranslator, which replaces most C++ syntax.
+> The result is an intermediate file `transl_ArchDisassembler.cpp`.
+>
+> The result is close to what we want (C code), but still contains invalid syntax.
+> Most of this syntax errors were fixed before. They must be, because the C file `ArchDisassemble.c`
+> is working fine.
+>
+> So the intermediate file `transl_ArchDisassebmler.cpp` is compared to the old `ArchDisassemble.c.
+> The Differ patches both files to an AST and automatically patches all nodes it can.
+>
+> Effectively automate most of the boring, mechanical work involved in fixing-up `transl_ArchDisassebmler.cpp`.
+> If something new came up, it asks the user for a decission.
+>
+> The result is saved to `ArchDisassembler.c`, which is now up-to-date with the newest LLVM release.
+>
+> In practice this file will still contain syntax errors. But not many, so they can easily be resolved.
+
+**(3)** After (1) and (2), some changes in Capstone-only files follow.
+This step is manual work.
````
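Step (2) of the new intro describes how the CppTranslator works. It searches for C++ syntax and replaces it with the equivalent C syntax. The result is an intermediate file that still contains some C++. The Python toy below illustrates only that search-and-replace idea, using an ordered list of regular-expression rules. It is not the real CppTranslator, which is organised around `Patch` classes (as the notes in this commit mention) rather than these regexes. The four rules are hand-picked examples. The `getOperand` rewrite assumes Capstone's C helper `MCInst_getOperand(MCInst *, unsigned)` as its target.

```python
import re
from typing import List, Tuple

# Ordered (pattern, replacement) rules. Illustrative examples only,
# not the CppTranslator's real patches.
RULES: List[Tuple["re.Pattern[str]", str]] = [
    # C++ null pointer literal -> C macro.
    (re.compile(r"\bnullptr\b"), "NULL"),
    # Drop the llvm:: namespace qualifier.
    (re.compile(r"\bllvm::"), ""),
    # Reference parameter "MCInst &MI" -> pointer parameter "MCInst *MI".
    (re.compile(r"\bMCInst\s*&\s*(\w+)"), r"MCInst *\1"),
    # Method call "MI.getOperand(N)" -> C helper call "MCInst_getOperand(MI, N)".
    (re.compile(r"\b(\w+)\.getOperand\(([^()]*)\)"), r"MCInst_getOperand(\1, \2)"),
]


def translate(cpp_src: str) -> str:
    """Apply every rule in order; anything no rule covers stays C++."""
    for pattern, replacement in RULES:
        cpp_src = pattern.sub(replacement, cpp_src)
    return cpp_src


if __name__ == "__main__":
    src = (
        "static void printOperand(const llvm::MCInst &MI, unsigned OpNo) {\n"
        "  if (MI.getOperand(OpNo).isReg() && Ptr != nullptr) {}\n"
        "}\n"
    )
    print(translate(src))
```

On the sample, the signature becomes `static void printOperand(const MCInst *MI, unsigned OpNo) {`. The chained `.isReg()` method call survives untouched. That is the kind of leftover C++ syntax the intro expects the `Differ` and manual fixes to clean up afterwards.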