===================================
Stack maps and patch points in LLVM
===================================

.. contents::
   :local:
   :depth: 2

Definitions
===========

In this document we refer to the "runtime" collectively as all
components that serve as the LLVM client, including the LLVM IR
generator, object code consumer, and code patcher.

A stack map records the location of ``live values`` at a particular
instruction address. These ``live values`` do not refer to all the
LLVM values live across the stack map. Instead, they are only the
values that the runtime requires to be live at this point. For
example, they may be the values the runtime will need to resume
program execution at that point independent of the compiled function
containing the stack map.

LLVM emits stack map data into the object code within a designated
:ref:`stackmap-section`. This stack map data contains a record for
each stack map. The record stores the stack map's instruction address
and contains an entry for each mapped value. Each entry encodes a
value's location as a register, stack offset, or constant.

A patch point is an instruction address at which space is reserved for
patching a new instruction sequence at run time. Patch points look
much like calls to LLVM. They take arguments that follow a calling
convention and may return a value. They also imply stack map
generation, which allows the runtime to locate the patchpoint and
find the location of ``live values`` at that point.

Motivation
==========

This functionality is currently experimental but is potentially useful
in a variety of settings, the most obvious being a runtime (JIT)
compiler. Example applications of the patchpoint intrinsics are
implementing an inline call cache for polymorphic method dispatch or
optimizing the retrieval of properties in dynamically typed languages
such as JavaScript.

The intrinsics documented here are currently used by the JavaScript
compiler within the open source WebKit project, see the `FTL JIT
<https://trac.webkit.org/wiki/FTLJIT>`_, but they are designed to be
used whenever stack maps or code patching are needed. Because the
intrinsics have experimental status, compatibility across LLVM
releases is not guaranteed.

The stack map functionality described in this document is separate
from the functionality described in
:ref:`stack-map`. ``GCFunctionMetadata`` provides the location of
pointers into a collected heap captured by the ``GCRoot`` intrinsic,
which can also be considered a "stack map". Unlike the stack maps
defined above, the ``GCFunctionMetadata`` stack map interface does not
provide a way to associate live register values of arbitrary type with
an instruction address, nor does it specify a format for the resulting
stack map. The stack maps described here could potentially provide
richer information to a garbage collecting runtime, but that usage
will not be discussed in this document.

Intrinsics
==========

The following two kinds of intrinsics can be used to implement stack
maps and patch points: ``llvm.experimental.stackmap`` and
``llvm.experimental.patchpoint``. Both kinds of intrinsics generate a
stack map record, and they both allow some form of code patching. They
can be used independently (i.e. ``llvm.experimental.patchpoint``
implicitly generates a stack map without the need for an additional
call to ``llvm.experimental.stackmap``). The choice of which to use
depends on whether it is necessary to reserve space for code patching
and whether any of the intrinsic arguments should be lowered according
to calling conventions. ``llvm.experimental.stackmap`` does not
reserve any space, nor does it expect any call arguments. If the
runtime patches code at the stack map's address, it will destructively
overwrite the program text. This is unlike
``llvm.experimental.patchpoint``, which reserves space for in-place
patching without overwriting surrounding code. The
``llvm.experimental.patchpoint`` intrinsic also lowers a specified
number of arguments according to its calling convention. This allows
patched code to make in-place function calls without marshaling.

Each instance of one of these intrinsics generates a stack map record
in the :ref:`stackmap-section`. The record includes an ID, allowing
the runtime to uniquely identify the stack map, and the offset within
the code from the beginning of the enclosing function.

'``llvm.experimental.stackmap``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void
        @llvm.experimental.stackmap(i64 <id>, i32 <numShadowBytes>, ...)

Overview:
"""""""""

The '``llvm.experimental.stackmap``' intrinsic records the location of
specified values in the stack map without generating any code.

Operands:
"""""""""

The first operand is an ID to be encoded within the stack map. The
second operand is the number of shadow bytes following the
intrinsic. The variable number of operands that follow are the ``live
values`` for which locations will be recorded in the stack map.

To use this intrinsic as a bare-bones stack map, with no code patching
support, the number of shadow bytes can be set to zero.

Semantics:
""""""""""

The stack map intrinsic generates no code in place, unless nops are
needed to cover its shadow (see below). However, its offset from
function entry is stored in the stack map. This is the relative
instruction address immediately following the instructions that
precede the stack map.

The stack map ID allows a runtime to locate the desired stack map
record. LLVM passes this ID through directly to the stack map
record without checking uniqueness.

LLVM guarantees a shadow of instructions following the stack map's
instruction offset during which neither the end of the basic block nor
another call to ``llvm.experimental.stackmap`` or
``llvm.experimental.patchpoint`` may occur. This allows the runtime to
patch the code at this point in response to an event triggered from
outside the code. The code for instructions following the stack map
may be emitted in the stack map's shadow, and these instructions may
be overwritten by destructive patching. Without shadow bytes, this
destructive patching could overwrite program text or data outside the
current function. We disallow overlapping stack map shadows so that
the runtime does not need to consider this corner case.

For example, a stack map with an 8-byte shadow:

.. code-block:: llvm

  call void @runtime()
  call void (i64, i32, ...) @llvm.experimental.stackmap(i64 77, i32 8,
                                                        i64* %ptr)
  %val = load i64, i64* %ptr
  %add = add i64 %val, 3
  ret i64 %add

May require one byte of nop-padding:

.. code-block:: none

  0x00 callq _runtime
  0x05 nop                <--- stack map address
  0x06 movq (%rdi), %rax
  0x07 addq $3, %rax
  0x0a popq %rdx
  0x0b ret                <---- end of 8-byte shadow

Now, if the runtime needs to invalidate the compiled code, it may
patch 8 bytes of code at the stack map's address as follows:

.. code-block:: none

  0x00 callq _runtime
  0x05 movl $0xffff, %rax <--- patched code at stack map address
  0x0a callq *%rax        <---- end of 8-byte shadow

This way, after the normal call to the runtime returns, the code will
execute a patched call to a special entry point that can rebuild a
stack frame from the values located by the stack map.

'``llvm.experimental.patchpoint.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void
        @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
                                           i8* <target>, i32 <numArgs>, ...)
      declare i64
        @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
                                          i8* <target>, i32 <numArgs>, ...)

Overview:
"""""""""

The '``llvm.experimental.patchpoint.*``' intrinsics create a function
call to the specified ``<target>`` and record the location of specified
values in the stack map.

Operands:
"""""""""

The first operand is an ID, the second operand is the number of bytes
reserved for the patchable region, the third operand is the target
address of a function (optionally null), and the fourth operand
specifies how many of the following variable operands are considered
function call arguments. The remaining variable number of operands are
the ``live values`` for which locations will be recorded in the stack
map.

Semantics:
""""""""""

The patch point intrinsic generates a stack map. It also emits a
function call to the address specified by ``<target>`` if the address
is not a constant null. The function call and its arguments are
lowered according to the calling convention specified at the
intrinsic's callsite. Variants of the intrinsic with non-void return
type also return a value according to calling convention.

On PowerPC, note that ``<target>`` must be the ABI function pointer for the
intended target of the indirect call. Specifically, when compiling for the
ELF V1 ABI, ``<target>`` is the function-descriptor address normally used as
the C/C++ function-pointer representation.

Requesting zero patch point arguments is valid. In this case, all
variable operands are handled just like
``llvm.experimental.stackmap``. The difference is that space will
still be reserved for patching, a call will be emitted, and a return
value is allowed.

The locations of the arguments are not normally recorded in the stack
map because they are already fixed by the calling convention. The
remaining ``live values`` will have their location recorded, which
could be a register, stack location, or constant. A special calling
convention has been introduced for use with stack maps, ``anyregcc``,
which forces the arguments to be loaded into registers but allows
those registers to be dynamically allocated. These argument registers
will have their register locations recorded in the stack map in
addition to the remaining ``live values``.

The patch point also emits nops to cover at least ``<numBytes>`` of
instruction encoding space. Hence, the client must ensure that
``<numBytes>`` is enough to encode a call to the target address on the
supported targets. If the call target is constant null, then there is
no minimum requirement. A zero-byte null target patchpoint is
valid.

The runtime may patch the code emitted for the patch point, including
the call sequence and nops. However, the runtime may not assume
anything about the code LLVM emits within the reserved space. Partial
patching is not allowed. The runtime must patch all reserved bytes,
padding with nops if necessary.

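The all-or-nothing patching rule above can be sketched as a small helper
that pads a replacement sequence to the reserved size. This is an
illustration only: ``pad_patch`` is a hypothetical name, and the
single-byte ``0x90`` nop is an x86 assumption (other targets use a
different, possibly multi-byte, nop encoding).

.. code-block:: python

  X86_NOP = b"\x90"  # assumed single-byte x86 nop; not taken from this document


  def pad_patch(code, num_bytes, nop=X86_NOP):
      """Return ``code`` followed by nops so that it fills ``num_bytes``."""
      if len(code) > num_bytes:
          raise ValueError("patch does not fit in the reserved space")
      remaining = num_bytes - len(code)
      if remaining % len(nop) != 0:
          raise ValueError("remaining space is not a whole number of nops")
      return code + nop * (remaining // len(nop))

A runtime would then write the returned bytes over the whole patchable
region, so that none of the originally emitted call sequence survives.
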
This example shows a patch point reserving 15 bytes, with one argument
in ``%rdi``, and a return value in ``%rax`` per native calling convention:

.. code-block:: llvm

  %target = inttoptr i64 -281474976710654 to i8*
  %val = call i64 (i64, i32, i8*, i32, ...)
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             i8* %target, i32 1, i64* %ptr)
  %add = add i64 %val, 3
  ret i64 %add

May generate:

.. code-block:: none

  0x00 movabsq $0xffff000000000002, %r11 <--- patch point address
  0x0a callq   *%r11
  0x0d nop
  0x0e nop                               <--- end of reserved 15-bytes
  0x0f addq    $0x3, %rax
  0x10 movl    %rax, 8(%rsp)

Note that no stack map locations will be recorded. If the patched code
sequence does not need arguments fixed to specific calling convention
registers, then the ``anyregcc`` convention may be used:

.. code-block:: none

  %val = call anyregcc i64 (i64, i32, i8*, i32, ...)
           @llvm.experimental.patchpoint.i64(i64 78, i32 15,
                                             i8* %target, i32 1,
                                             i64* %ptr)

The stack map now indicates the location of the ``%ptr`` argument and
return value:

.. code-block:: none

  Stack Map: ID=78, Loc0=%r9 Loc1=%r8

The patch code sequence may now use the argument that happened to be
allocated in ``%r8`` and return a value allocated in ``%r9``:

.. code-block:: none

  0x00 movslq 4(%r8), %r9           <--- patched code at patch point address
  0x03 nop
  ...
  0x0e nop                          <--- end of reserved 15-bytes
  0x0f addq $0x3, %r9
  0x10 movl %r9, 8(%rsp)

.. _stackmap-format:

Stack Map Format
================

The existence of a stack map or patch point intrinsic within an LLVM
Module forces code emission to create a :ref:`stackmap-section`. The
format of this section follows:

.. code-block:: none

  Header {
    uint8  : Stack Map Version (current version is 2)
    uint8  : Reserved (expected to be 0)
    uint16 : Reserved (expected to be 0)
  }
  uint32 : NumFunctions
  uint32 : NumConstants
  uint32 : NumRecords
  StkSizeRecord[NumFunctions] {
    uint64 : Function Address
    uint64 : Stack Size
    uint64 : Record Count
  }
  Constants[NumConstants] {
    uint64 : LargeConstant
  }
  StkMapRecord[NumRecords] {
    uint64 : PatchPoint ID
    uint32 : Instruction Offset
    uint16 : Reserved (record flags)
    uint16 : NumLocations
    Location[NumLocations] {
      uint8  : Register | Direct | Indirect | Constant | ConstantIndex
      uint8  : Reserved (location flags)
      uint16 : Dwarf RegNum
      int32  : Offset or SmallConstant
    }
    uint16 : Padding
    uint16 : NumLiveOuts
    LiveOuts[NumLiveOuts] {
      uint16 : Dwarf RegNum
      uint8  : Reserved
      uint8  : Size in Bytes
    }
    uint32 : Padding (only if required to align to 8 bytes)
  }

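A reader for the layout above might look like the following minimal
sketch. It assumes a little-endian target such as x86-64 (this document
does not state the byte order), and the names ``parse_stackmap``,
``Function``, ``Location``, ``LiveOut``, and ``Record`` are hypothetical
rather than part of any LLVM API.

.. code-block:: python

  import struct
  from collections import namedtuple

  Function = namedtuple("Function", "address stack_size record_count")
  Location = namedtuple("Location", "kind reg offset")
  LiveOut = namedtuple("LiveOut", "reg size")
  Record = namedtuple("Record", "id offset locations liveouts")


  def parse_stackmap(data, endian="<"):
      """Parse a version 2 stack map section held in a bytes object."""
      version, _res8, _res16 = struct.unpack_from(endian + "BBH", data, 0)
      if version != 2:
          raise ValueError("unsupported stack map version %d" % version)
      num_funcs, num_consts, num_records = struct.unpack_from(
          endian + "III", data, 4)
      pos = 16  # 4-byte header plus three uint32 counts

      functions = []
      for _ in range(num_funcs):
          functions.append(
              Function(*struct.unpack_from(endian + "QQQ", data, pos)))
          pos += 24

      constants = list(
          struct.unpack_from(endian + "%dQ" % num_consts, data, pos))
      pos += 8 * num_consts

      records = []
      for _ in range(num_records):
          rec_id, offset, _flags, num_locs = struct.unpack_from(
              endian + "QIHH", data, pos)
          pos += 16
          locations = []
          for _ in range(num_locs):
              kind, _lflags, reg, off = struct.unpack_from(
                  endian + "BBHi", data, pos)
              locations.append(Location(kind, reg, off))
              pos += 8
          _pad, num_liveouts = struct.unpack_from(endian + "HH", data, pos)
          pos += 4
          liveouts = []
          for _ in range(num_liveouts):
              reg, _lres, size = struct.unpack_from(endian + "HBB", data, pos)
              liveouts.append(LiveOut(reg, size))
              pos += 4
          pos = (pos + 7) & ~7  # skip optional padding to an 8-byte boundary
          records.append(Record(rec_id, offset, locations, liveouts))
      return version, functions, constants, records

The sketch keeps locations as raw ``(kind, reg, offset)`` triples and
leaves their interpretation to the caller.
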
The first byte of each location encodes a type that indicates how to
interpret the ``RegNum`` and ``Offset`` fields as follows:

======== ============= =================== ===========================
Encoding Type          Value               Description
-------- ------------- ------------------- ---------------------------
0x1      Register      Reg                 Value in a register
0x2      Direct        Reg + Offset        Frame index value
0x3      Indirect      [Reg + Offset]      Spilled value
0x4      Constant      Offset              Small constant
0x5      ConstantIndex Constants[Offset]   Large constant
======== ============= =================== ===========================

In the common case, a value is available in a register, and the
``Offset`` field will be zero. Values spilled to the stack are encoded
as ``Indirect`` locations. The runtime must load those values from a
stack address, typically in the form ``[BP + Offset]``. If an
``alloca`` value is passed directly to a stack map intrinsic, then
LLVM may fold the frame index into the stack map as an optimization to
avoid allocating a register or stack slot. These frame indices will be
encoded as ``Direct`` locations in the form ``BP + Offset``. LLVM may
also optimize constants by emitting them directly in the stack map,
either in the ``Offset`` of a ``Constant`` location or in the constant
pool, referred to by ``ConstantIndex`` locations.

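The table above can be turned into a small resolver. The following is a
sketch under stated assumptions: ``resolve_location`` is a hypothetical
name, ``regs``, ``read_u64``, and ``constants`` stand in for whatever
the runtime uses to expose register contents, memory, and the constant
pool, and every spilled value is assumed to be 8 bytes wide.

.. code-block:: python

  # Location type encodings, copied from the table above.
  REGISTER, DIRECT, INDIRECT, CONSTANT, CONSTANT_INDEX = 1, 2, 3, 4, 5


  def resolve_location(kind, reg, offset, regs, read_u64, constants):
      """Return the value described by one stack map location.

      regs maps DWARF register numbers to register contents, read_u64
      loads eight bytes from an address, and constants is the constant
      pool. All three are hypothetical stand-ins for runtime state.
      """
      if kind == REGISTER:
          return regs[reg]
      if kind == DIRECT:
          return regs[reg] + offset            # the address is the value
      if kind == INDIRECT:
          return read_u64(regs[reg] + offset)  # spilled value, load it
      if kind == CONSTANT:
          return offset
      if kind == CONSTANT_INDEX:
          return constants[offset]
      raise ValueError("unknown location kind %d" % kind)

Note that ``Direct`` and ``Indirect`` compute the same address and
differ only in whether that address is returned or dereferenced.
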
At each callsite, a "liveout" register list is also recorded. These
are the registers that are live across the stackmap and therefore must
be saved by the runtime. This is an important optimization when the
patchpoint intrinsic is used with a calling convention that by default
preserves most registers as callee-save.

Each entry in the liveout register list contains a DWARF register
number and size in bytes. The stackmap format deliberately omits
specific subregister information. Instead the runtime must interpret
this information conservatively. For example, if the stackmap reports
one byte at ``%rax``, then the value may be in either ``%al`` or
``%ah``. It doesn't matter in practice, because the runtime will
simply save ``%rax``. However, if the stackmap reports 16 bytes at
``%ymm0``, then the runtime can safely optimize by saving only
``%xmm0``.

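One way to picture this conservative reading is a helper that picks the
narrowest save width covering the reported size. This is only a sketch:
``save_width`` and the per-register list of widths are hypothetical, and
a real runtime would derive the widths from its own save and restore
code.

.. code-block:: python

  def save_width(reported_size, register_widths):
      """Return the narrowest available width covering reported_size bytes."""
      for width in sorted(register_widths):
          if width >= reported_size:
              return width
      raise ValueError("no register view is wide enough")

With a single 8-byte view for a general purpose register,
``save_width(1, [8])`` selects the full register, while
``save_width(16, [16, 32])`` selects the 16-byte view of a vector
register, matching the ``%rax`` and ``%ymm0`` cases above.
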
The stack map format is a contract between an LLVM SVN revision and
the runtime. It is currently experimental and may change in the short
term, but minimizing the need to update the runtime is
important. Consequently, the stack map design is motivated by
simplicity and extensibility. Compactness of the representation is
secondary because the runtime is expected to parse the data
immediately after compiling a module and encode the information in its
own format. Since the runtime controls the allocation of sections, it
can reuse the same stack map space for multiple modules.

Stackmap support is currently only implemented for 64-bit
platforms. However, a 32-bit implementation should be able to use the
same format with an insignificant amount of wasted space.

.. _stackmap-section:

Stack Map Section
^^^^^^^^^^^^^^^^^

A JIT compiler can easily access this section by providing its own
memory manager via the LLVM C API
``LLVMCreateSimpleMCJITMemoryManager()``. When creating the memory
manager, the JIT provides a callback:
``LLVMMemoryManagerAllocateDataSectionCallback()``. When LLVM creates
this section, it invokes the callback and passes the section name. The
JIT can record the in-memory address of the section at this time and
later parse it to recover the stack map data.

On Darwin, the stack map section name is "__llvm_stackmaps". The
segment name is "__LLVM_STACKMAPS".

Stack Map Usage
===============

The stack map support described in this document can be used to
precisely determine the location of values at a specific position in
the code. LLVM does not maintain any mapping between those values and
any higher-level entity. The runtime must be able to interpret the
stack map record given only the ID, offset, and the order of the
locations, records, and functions, which LLVM preserves.

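Because records and functions are emitted in the same order, the
per-function record count is enough to attribute each record to its
function and turn its offset into an absolute instruction address. The
sketch below illustrates this with plain tuples; ``callsite_addresses``
is a hypothetical helper, not an LLVM interface.

.. code-block:: python

  def callsite_addresses(functions, records):
      """Pair each record ID with its absolute instruction address.

      functions: (address, stack_size, record_count) tuples, section order.
      records: (record_id, instruction_offset) tuples, section order.
      """
      result = []
      index = 0
      for address, _stack_size, record_count in functions:
          # The next record_count records belong to this function.
          for record_id, offset in records[index:index + record_count]:
              result.append((record_id, address + offset))
          index += record_count
      if index != len(records):
          raise ValueError("record counts do not cover all records")
      return result

For two functions at ``0x1000`` and ``0x2000`` with record counts 2 and
1, the first two records resolve relative to ``0x1000`` and the third
relative to ``0x2000``.
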
Note that this is quite different from the goal of debug information,
which is a best-effort attempt to track the location of named
variables at every instruction.

An important motivation for this design is to allow a runtime to
commandeer a stack frame when execution reaches an instruction address
associated with a stack map. The runtime must be able to rebuild a
stack frame and resume program execution using the information
provided by the stack map. For example, execution may resume in an
interpreter or a recompiled version of the same function.

This usage restricts LLVM optimization. Clearly, LLVM must not move
stores across a stack map. However, loads must also be handled
conservatively. If the load may trigger an exception, hoisting it
above a stack map could be invalid. For example, the runtime may
determine that a load is safe to execute without a type check given
the current state of the type system. If the type system changes while
some activation of the load's function exists on the stack, the load
becomes unsafe. The runtime can prevent subsequent execution of that
load by immediately patching any stack map location that lies between
the current call site and the load (typically, the runtime would
simply patch all stack map locations to invalidate the function). If
the compiler had hoisted the load above the stack map, then the
program could crash before the runtime could take back control.

To enforce these semantics, stackmap and patchpoint intrinsics are
considered to potentially read and write all memory. This may limit
optimization more than some clients desire. This limitation may be
avoided by marking the call site as "readonly". In the future we may
also allow meta-data to be added to the intrinsic call to express
aliasing, thereby allowing optimizations to hoist certain loads above
stack maps.

Direct Stack Map Entries
^^^^^^^^^^^^^^^^^^^^^^^^

As shown in :ref:`stackmap-format`, a Direct stack map location
records the address of a frame index. This address is itself the value
that the runtime requested. This differs from Indirect locations,
which refer to a stack location from which the requested values must
be loaded. Direct locations can communicate the address of an alloca,
while Indirect locations handle register spills.

For example:

.. code-block:: none

  entry:
    %a = alloca i64...
    llvm.experimental.stackmap(i64 <ID>, i32 <shadowBytes>, i64* %a)

The runtime can determine this alloca's relative location on the
stack immediately after compilation, or at any time thereafter. This
differs from Register and Indirect locations, because the runtime can
only read the values in those locations when execution reaches the
instruction address of the stack map.

This functionality requires LLVM to treat entry-block allocas
specially when they are directly consumed by an intrinsic. (This is
the same requirement imposed by the ``llvm.gcroot`` intrinsic.) LLVM
transformations must not substitute the alloca with any intervening
value. This can be verified by the runtime simply by checking that the
stack map's location is a Direct location type.


Supported Architectures
=======================

Support for StackMap generation and the related intrinsics requires
some code for each backend. Today, only a subset of LLVM's backends
are supported. The currently supported architectures are X86_64,
PowerPC, and AArch64.