llvm-capstone/llvm/docs/PCSectionsMetadata.rst
Marco Elver bf9814b705 [SanitizerBinaryMetadata] Emit constants as ULEB128
Emit all constant integers produced by SanitizerBinaryMetadata as
ULEB128 to further reduce binary space used. Increasing the version is
not necessary given this change depends on (and will land) along with
the bump to v2.

To support this, the !pcsections metadata format is extended to allow
for per-section options, encoded in the first MD operator which must
always be a string and contain the section: "<section>!<options>".

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D143484
2023-02-08 13:12:34 +01:00

129 lines
4.7 KiB
ReStructuredText

=========================
LLVM PC Sections Metadata
=========================
.. contents::
:local:
Introduction
============
PC Sections Metadata can be attached to instructions and functions, for which
addresses, viz. program counters (PCs), are to be emitted in specially encoded
binary sections. Metadata is assigned as an ``MDNode`` of the ``MD_pcsections``
(``!pcsections``) kind; the following section describes the metadata format.
Metadata Format
===============
An arbitrary number of interleaved ``MDString`` and constant operators can be
added, where a new ``MDString`` always denotes a section name, followed by an
arbitrary number of auxiliary constant data encoded along the PC of the
instruction or function. The first operator must be a ``MDString`` denoting the
first section.
.. code-block:: none
!0 = !{
!"<section#1>"
[ , !1 ... ]
[ !"<section#2">
[ , !2 ... ]
... ]
}
!1 = !{ iXX <aux-consts#1>, ... }
!2 = !{ iXX <aux-consts#2>, ... }
...
The occurrence of ``section#1``, ``section#2``, ..., ``section#N`` in the
metadata causes the backend to emit the PC for the associated instruction or
function to all named sections. For each emitted PC in a section #N, the
constants ``aux-consts#N`` in the tuple ``!N`` will be emitted after the PC.
Multiple tuples with constant data may be provided after a section name string
(e.g. ``!0 = !{"s1", !1, !2}``), and a single constant tuple may be reused for
different sections (e.g. ``!0 = !{"s1", !1, "s2", !1}``).
Binary Encoding
===============
*Instructions* result in emitting a single PC, and *functions* result in
emission of the start of the function and a 32-bit size. This is followed by
the auxiliary constants that followed the respective section name in the
``MD_pcsections`` metadata.
To avoid relocations in the final binary, each PC address stored at ``entry``
is a relative relocation, computed as ``pc - entry``. To decode, a user has to
compute ``entry + *entry``.
The size of each entry depends on the code model. With large and medium sized
code models, the entry size matches pointer size. For any smaller code model
the entry size is just 32 bits.
Encoding Options
----------------
Optional encoding options can be passed in the first ``MDString`` operator:
``<section>!<options>``. The following options are available:
* ``C`` -- Compress constant integers of size 2-8 bytes as ULEB128; this
includes the function size (but excludes the PC entry).
For example, ``foo!C`` will emit into section ``foo`` with all constants
encoded as ULEB128.
Guarantees on Code Generation
=============================
Attaching ``!pcsections`` metadata to LLVM IR instructions *shall not* affect
optimizations or code generation outside the requested PC sections.
While relying on LLVM IR metadata to request PC sections makes the above
guarantee relatively trivial, propagation of metadata through the optimization
and code generation pipeline has the following guarantees.
Metadata Propagation
--------------------
In general, LLVM *does not make any guarantees* about preserving IR metadata
(attached to an ``Instruction``) through IR transformations. When using PC
sections metadata, this guarantee is unchanged, and ``!pcsections`` metadata is
remains *optional* until lowering to machine IR (MIR).
Note for Code Generation
------------------------
As with other LLVM IR metadata, there are no requirements for LLVM IR
transformation passes to preserve ``!pcsections`` metadata, with the following
exceptions:
* The ``AtomicExpandPass`` shall preserve ``!pcsections`` metadata
according to the below rules 1-4.
When translating LLVM IR to MIR, the ``!pcsections`` metadata shall be copied
from the source ``Instruction`` to the target ``MachineInstr`` (set with
``MachineInstr::setPCSections()``). The instruction selectors and MIR
optimization passes shall preserve PC sections metadata as follows:
1. Replacements will preserve PC sections metadata of the replaced
instruction.
2. Duplications will preserve PC sections metadata of the copied
instruction.
3. Merging will preserve PC sections metadata of one of the two
instructions (no guarantee on which instruction's metadata is used).
4. Deletions will loose PC sections metadata.
This is similar to debug info, and the ``BuildMI()`` helper provides a
convenient way to propagate debug info and ``!pcsections`` metadata in the
``MIMetadata`` bundle.
Note for Metadata Users
-----------------------
Use cases for ``!pcsections`` metadata should either be fully tolerant to
missing metadata, or the passes inserting ``!pcsections`` metadata should run
*after* all LLVM IR optimization passes to preserve the metadata until being
translated to MIR.