===========================
LLVM Branch Weight Metadata
===========================

.. contents::
   :local:

Introduction
============

Branch Weight Metadata represents the likelihood of each branch being taken
(see :doc:`BlockFrequencyTerminology`). The metadata is attached to a
``TerminatorInst`` as an ``MDNode`` of the ``MD_prof`` kind. The first operand
is always an ``MDString`` node with the string "branch_weights". The number of
operands depends on the terminator type.

Branch weights may be fetched from a profiling file or generated based on the
`__builtin_expect`_ instruction.

All weights are represented as unsigned 32-bit values, where a higher value
indicates a greater chance of being taken.

Supported Instructions
======================

``BranchInst``
^^^^^^^^^^^^^^

Metadata is only assigned to conditional branches. There are two extra
operands for the true and the false branch.

.. code-block:: none

  !0 = !{
    !"branch_weights",
    i32 <TRUE_BRANCH_WEIGHT>,
    i32 <FALSE_BRANCH_WEIGHT>
  }

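As an illustration, a minimal sketch of a conditional branch carrying this
metadata might look as follows. The function name, block labels, and the
weights 99 and 1 are made up for the example; only the relative size of the
weights matters.

.. code-block:: llvm

  define i32 @select_path(i32 %x) {
  entry:
    %cmp = icmp sgt i32 %x, 0
    ; The first weight belongs to the true successor (%likely),
    ; the second to the false successor (%unlikely).
    br i1 %cmp, label %likely, label %unlikely, !prof !0

  likely:
    ret i32 1

  unlikely:
    ret i32 0
  }

  !0 = !{!"branch_weights", i32 99, i32 1}
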
``SwitchInst``
^^^^^^^^^^^^^^

Branch weights are assigned to every case (including the ``default`` case, which
is always case #0).

.. code-block:: none

  !0 = !{
    !"branch_weights",
    i32 <DEFAULT_BRANCH_WEIGHT>
    [ , i32 <CASE_BRANCH_WEIGHT> ... ]
  }

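The operand order can be sketched with a small ``switch``. The function name,
labels, and weight values below are hypothetical and chosen only to show that
the weight of the ``default`` destination comes first, followed by one weight
per case in the order the cases are listed.

.. code-block:: llvm

  define i32 @dispatch(i32 %x) {
  entry:
    switch i32 %x, label %default [
      i32 0, label %case0
      i32 3, label %case3
    ], !prof !0

  default:
    ret i32 -1

  case0:
    ret i32 10

  case3:
    ret i32 30
  }

  ; Weights in order: default, case 0, case 3.
  !0 = !{!"branch_weights", i32 1, i32 20, i32 100}
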
``IndirectBrInst``
^^^^^^^^^^^^^^^^^^

Branch weights are assigned to every destination.

.. code-block:: none

  !0 = !{
    !"branch_weights",
    i32 <LABEL_BRANCH_WEIGHT>
    [ , i32 <LABEL_BRANCH_WEIGHT> ... ]
  }

Other
^^^^^

Other terminator instructions are not allowed to contain Branch Weight Metadata.

.. _\__builtin_expect:

Built-in ``expect`` Instructions
================================

The ``__builtin_expect(long exp, long c)`` instruction provides branch
prediction information. The return value is the value of ``exp``.

It is especially useful in conditional statements. Currently Clang supports two
conditional statements:

``if`` statement
^^^^^^^^^^^^^^^^

The ``exp`` parameter is the condition. The ``c`` parameter is the expected
comparison value. If it is equal to 1 (true), the condition is likely to be
true; otherwise, the condition is likely to be false. For example:

.. code-block:: c++

  if (__builtin_expect(x > 0, 1)) {
    // This block is likely to be taken.
  }

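For orientation, the sketch below shows roughly how such a hint may appear in
IR before it is lowered to branch weights. It assumes the ``llvm.expect``
intrinsic, which this document does not otherwise describe. The exact IR
depends on the Clang version and optimization level, and the function and
value names are illustrative only.

.. code-block:: llvm

  declare i64 @llvm.expect.i64(i64, i64)

  define i32 @check(i32 %x) {
  entry:
    %cmp = icmp sgt i32 %x, 0
    %conv = zext i1 %cmp to i64
    ; The second argument (1) is the expected value of the condition.
    %expval = call i64 @llvm.expect.i64(i64 %conv, i64 1)
    %tobool = icmp ne i64 %expval, 0
    br i1 %tobool, label %if.then, label %if.end

  if.then:
    ret i32 1

  if.end:
    ret i32 0
  }

A later lowering step would be expected to remove the intrinsic call and
attach "branch_weights" metadata to the branch; the concrete weight values
are an implementation detail and are not specified here.
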
``switch`` statement
^^^^^^^^^^^^^^^^^^^^

The ``exp`` parameter is the value. The ``c`` parameter is the expected
value. If the expected value does not appear in the list of cases, the
``default`` case is assumed to be the likely one.

.. code-block:: c++

  switch (__builtin_expect(x, 5)) {
  default: break;
  case 0:  // ...
  case 3:  // ...
  case 5:  // This case is likely to be taken.
  }

CFG Modifications
=================

Branch Weight Metadata is not proof against CFG changes. If a terminator's
operands are changed, the weights should be updated to match. Otherwise,
misoptimizations may occur due to incorrect branch prediction information.

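A simple case is a transformation that inverts a branch condition and swaps
the successors. The sketch below uses two hypothetical functions, ``@before``
and ``@after``, with made-up weights, to show that the weights have to be
swapped along with the successors so that they still describe the same
destinations.

.. code-block:: llvm

  define i32 @before(i1 %cond) {
  entry:
    ; %a is the likely destination (weight 90).
    br i1 %cond, label %a, label %b, !prof !0
  a:
    ret i32 1
  b:
    ret i32 0
  }

  define i32 @after(i1 %cond) {
  entry:
    %notcond = xor i1 %cond, true
    ; Successors are swapped, so the weights are swapped too:
    ; %a is still the likely destination.
    br i1 %notcond, label %b, label %a, !prof !1
  a:
    ret i32 1
  b:
    ret i32 0
  }

  !0 = !{!"branch_weights", i32 90, i32 10}
  !1 = !{!"branch_weights", i32 10, i32 90}

Reusing ``!0`` on the rewritten branch would mark ``%b`` as the likely
destination, which is the kind of incorrect information described above.
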
Function Entry Counts
=====================

To allow comparing different functions during inter-procedural analysis and
optimization, ``MD_prof`` nodes can also be assigned to a function definition.
The first operand is a string indicating the name of the associated counter.

Currently, one counter is supported: "function_entry_count". The second operand
is a 64-bit counter that indicates the number of times that this function was
invoked (in the case of instrumentation-based profiles). In the case of
sampling-based profiles, this operand is an approximation of how many times
the function was invoked.

For example, in the code below, the instrumentation for function foo()
indicates that it was called 2,590 times at runtime.

.. code-block:: llvm

  define i32 @foo() !prof !1 {
    ret i32 0
  }
  !1 = !{!"function_entry_count", i64 2590}

If "function_entry_count" has more than 2 operands, the later operands are
the GUIDs of the functions that need to be imported by ThinLTO. This is only
set by sampling-based profiles. It is needed because the sampling-based profile
was collected on a binary that had already imported and inlined these functions,
and we need to ensure the IR matches in the ThinLTO backends for profile
annotation. The reason why we cannot annotate this on the callsite is that it
can only go down 1 level in the call chain. For cases such as
foo_in_a_cc()->bar_in_b_cc()->baz_in_c_cc(), we will need to go down 2 levels
in the call chain to import both bar_in_b_cc and baz_in_c_cc.

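A sketch of what such an entry could look like for the call chain above is
shown below. The operand layout is assumed from the description in this
section, and the two trailing ``i64`` values are placeholders standing in for
the GUIDs of bar_in_b_cc and baz_in_c_cc, not real GUIDs.

.. code-block:: llvm

  define i32 @foo_in_a_cc() !prof !0 {
    ret i32 0
  }

  ; Operand 1: counter name.
  ; Operand 2: approximate entry count from the sample profile.
  ; Remaining operands: placeholder GUIDs of functions to import.
  !0 = !{!"function_entry_count", i64 2590, i64 1111, i64 2222}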