[globalisel] Restructure the GlobalISel documentation
There are a couple of minor deletions in this change, but 99% of it just moves the documentation around to prepare the way for more meaningful changes.
Parent: 728fa0079b
Commit: c808a49b50
@ -83,7 +83,7 @@ A |LLVM IR fuzzer| aimed at finding bugs in instruction selection.

This fuzzer accepts flags after `ignore_remaining_args=1`. The flags match
those of :doc:`llc <CommandGuide/llc>` and the triple is required. For example,
-the following command would fuzz AArch64 with :doc:`GlobalISel`:
+the following command would fuzz AArch64 with :doc:`GlobalISel/index`:

.. code-block:: shell

@ -1,941 +0,0 @@
============================
Global Instruction Selection
============================

.. contents::
   :local:
   :depth: 1

.. warning::
   This document is a work in progress. It reflects the current state of the
   implementation, as well as open design and implementation issues.

Introduction
============

GlobalISel is a framework that provides a set of reusable passes and utilities
for instruction selection --- translation from LLVM IR to target-specific
Machine IR (MIR).

GlobalISel is intended to be a replacement for SelectionDAG and FastISel, to
solve three major problems:

* **Performance** --- SelectionDAG introduces a dedicated intermediate
  representation, which has a compile-time cost.

  GlobalISel directly operates on the post-isel representation used by the
  rest of the code generator, MIR.
  It does require extensions to that representation to support arbitrary
  incoming IR: :ref:`gmir`.

* **Granularity** --- SelectionDAG and FastISel operate on individual basic
  blocks, losing some global optimization opportunities.

  GlobalISel operates on the whole function.

* **Modularity** --- SelectionDAG and FastISel are radically different and share
  very little code.

  GlobalISel is built in a way that enables code reuse. For instance, both the
  optimized and fast selectors share the :ref:`pipeline`, and targets can
  configure that pipeline to better suit their needs.


.. _gmir:

Generic Machine IR
==================

Machine IR operates on physical registers, register classes, and (mostly)
target-specific instructions.

To bridge the gap with LLVM IR, GlobalISel introduces "generic" extensions to
Machine IR:

.. contents::
   :local:

``NOTE``:
The generic MIR (GMIR) representation still contains references to IR
constructs (such as ``GlobalValue``). Removing those should let us write more
accurate tests, or delete IR after building the initial MIR. However, it is
not part of the GlobalISel effort.

.. _gmir-instructions:

Generic Instructions
--------------------

The main addition is support for pre-isel generic machine instructions (e.g.,
``G_ADD``). Like other target-independent instructions (e.g., ``COPY`` or
``PHI``), these are available on all targets.

``TODO``:
While we're progressively adding instructions, one kind in particular exposes
interesting problems: compares and how to represent condition codes.
Some targets (x86, ARM) have generic comparisons setting multiple flags,
which are then used by predicated variants.
Others (IR) specify the predicate in the comparison and users just get a single
bit. SelectionDAG uses SETCC/CONDBR vs BR_CC (and similar for select) to
represent this.

The ``MachineIRBuilder`` class wraps the ``MachineInstrBuilder`` and provides
a convenient way to create these generic instructions.

.. _gmir-gvregs:

Generic Virtual Registers
-------------------------

Generic instructions operate on a new kind of register: "generic" virtual
registers. As opposed to non-generic vregs, they are not assigned a Register
Class. Instead, generic vregs have a :ref:`gmir-llt`, and can be assigned
a :ref:`gmir-regbank`.

``MachineRegisterInfo`` tracks the same information that it does for
non-generic vregs (e.g., use-def chains). Additionally, it also tracks the
:ref:`gmir-llt` of the register, and, instead of the ``TargetRegisterClass``,
its :ref:`gmir-regbank`, if any.

For simplicity, most generic instructions only accept generic vregs:

* instead of immediates, they use a gvreg defined by an instruction
  materializing the immediate value (see :ref:`irtranslator-constants`).
* instead of physical registers, they use a gvreg defined by a ``COPY``.

``NOTE``:
We started with an alternative representation, where MRI tracks a size for
each gvreg, and instructions have lists of types.
That had two flaws: the type and size are redundant, and there was no generic
way of getting a given operand's type (as there was no 1:1 mapping between
instruction types and operands).
We considered putting the type in some variant of MCInstrDesc instead; see
`PR26576 <http://llvm.org/PR26576>`_ ("[GlobalISel] Generic MachineInstrs
need a type but this increases the memory footprint of the related objects").

.. _gmir-regbank:

Register Bank
-------------

A Register Bank is a set of register classes defined by the target.
A bank has a size, which is the maximum store size of all covered classes.

In general, cross-class copies inside a bank are expected to be cheaper than
copies across banks. They are also coalesceable by the register coalescer,
whereas cross-bank copies are not.

Also, equivalent operations can be performed on different banks using different
instructions.

For example, X86 can be seen as having 3 main banks: general-purpose, x87, and
vector (which could be further split into a bank per domain for single vs
double precision instructions).

Register banks are described by a target-provided API,
:ref:`RegisterBankInfo <api-registerbankinfo>`.

.. _gmir-llt:

Low Level Type
--------------

Additionally, every generic virtual register has a type, represented by an
instance of the ``LLT`` class.

Like ``EVT``/``MVT``/``Type``, it has no distinction between unsigned and signed
integer types. Furthermore, it also has no distinction between integer and
floating-point types: it mainly conveys absolutely necessary information, such
as size and number of vector lanes:

* ``sN`` for scalars
* ``pN`` for pointers
* ``<N x sM>`` for vectors
* ``unsized`` for labels, etc.

``LLT`` is intended to replace the usage of ``EVT`` in SelectionDAG.

Here are some LLT examples and their ``EVT`` and ``Type`` equivalents:

=============  =========  ======================================
LLT            EVT        IR Type
=============  =========  ======================================
``s1``         ``i1``     ``i1``
``s8``         ``i8``     ``i8``
``s32``        ``i32``    ``i32``
``s32``        ``f32``    ``float``
``s17``        ``i17``    ``i17``
``s16``        N/A        ``{i8, i8}``
``s32``        N/A        ``[4 x i8]``
``p0``         ``iPTR``   ``i8*``, ``i32*``, ``%opaque*``
``p2``         ``iPTR``   ``i8 addrspace(2)*``
``<4 x s32>``  ``v4f32``  ``<4 x float>``
``s64``        ``v1f64``  ``<1 x double>``
``<3 x s32>``  ``v3i32``  ``<3 x i32>``
``unsized``    ``Other``  ``label``
=============  =========  ======================================


Rationale: instructions already encode a specific interpretation of types
(e.g., ``add`` vs. ``fadd``, or ``sdiv`` vs. ``udiv``). Also encoding that
information in the type system requires introducing bitcast with no real
advantage for the selector.

Pointer types are distinguished by address space. This matches IR, as opposed
to SelectionDAG where address space is an attribute on operations.
This representation better supports pointers having different sizes depending
on their address space.
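
As a rough illustration of the naming scheme described in this section, the
following self-contained sketch models the three kinds of sized type.
``ToyLLT`` and all of its members are hypothetical names invented for this
example; they are not the real ``LLT`` class or its API.

.. code-block:: c++

  #include <string>

  // Toy model of a low-level type. This is NOT llvm::LLT.
  struct ToyLLT {
    enum Kind { Scalar, Pointer, Vector } kind;
    unsigned bits;      // scalar or element size in bits (unused for pointers)
    unsigned lanes;     // number of vector lanes (0 for non-vectors)
    unsigned addrSpace; // address space (pointers only)

    static ToyLLT scalar(unsigned bits) { return ToyLLT{Scalar, bits, 0, 0}; }
    static ToyLLT pointer(unsigned as) { return ToyLLT{Pointer, 0, 0, as}; }
    // A single-lane vector is modelled as its element type, mirroring the
    // ``<1 x double>`` row of the table.
    static ToyLLT vector(unsigned lanes, unsigned bits) {
      return lanes == 1 ? scalar(bits) : ToyLLT{Vector, bits, lanes, 0};
    }

    // Prints the type using the sN / pN / <N x sM> notation.
    std::string str() const {
      switch (kind) {
      case Scalar:
        return "s" + std::to_string(bits);
      case Pointer:
        return "p" + std::to_string(addrSpace);
      case Vector:
        return "<" + std::to_string(lanes) + " x s" + std::to_string(bits) + ">";
      }
      return "";
    }
  };

In this model, ``ToyLLT::scalar(32)`` prints as ``s32`` regardless of whether
the IR value was an ``i32`` or a ``float``, which is the point of the
rationale: the interpretation lives in the instruction, not in the type.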

``NOTE``:
Currently, LLT requires at least 2 elements in vectors, but some targets have
the concept of a '1-element vector'. Representing them as their underlying
scalar type is a nice simplification.

``TODO``:
Currently, non-generic virtual registers, defined by non-pre-isel-generic
instructions, cannot have a type, and thus cannot be used by a pre-isel generic
instruction. Instead, they are given a type using a COPY. We could relax that
and allow types on all vregs: this would reduce the number of MI required when
emitting target-specific MIR early in the pipeline. This should purely be
a compile-time optimization.

.. _pipeline:

Core Pipeline
=============

There are four required passes, regardless of the optimization mode:

.. contents::
   :local:

Additional passes can then be inserted at higher optimization levels or for
specific targets. For example, to match the current SelectionDAG set of
transformations, MachineCSE and a better MachineCombiner could be run between
every pass.

``NOTE``:
In theory, not all passes are always necessary.
As an additional compile-time optimization, we could skip some of the passes by
setting the relevant MachineFunction properties. For instance, if the
IRTranslator did not encounter any illegal instruction, it would set the
``legalized`` property to avoid running the :ref:`milegalizer`.
Similarly, we considered specializing the IRTranslator per-target to directly
emit target-specific MI.
However, we instead decided to keep the core pipeline simple, and focus on
minimizing the overhead of the passes in the no-op cases.


.. _irtranslator:

IRTranslator
------------

This pass translates the input LLVM IR ``Function`` to a GMIR
``MachineFunction``.

``TODO``:
This currently doesn't support the more complex instructions, in particular
those involving control flow (``switch``, ``invoke``, ...).
For ``switch`` in particular, we can initially use the ``LowerSwitch`` pass.

.. _api-calllowering:

API: CallLowering
^^^^^^^^^^^^^^^^^

The ``IRTranslator`` (using the ``CallLowering`` target-provided utility) also
implements the ABI's calling convention by lowering calls, returns, and
arguments to the appropriate physical register usage and instruction sequences.

.. _irtranslator-aggregates:

Aggregates
^^^^^^^^^^

Aggregates are lowered to a single scalar vreg.
This differs from SelectionDAG's multiple vregs via ``GetValueVTs``.

``TODO``:
As some of the bits are undef (padding), we should consider augmenting the
representation with additional metadata (in effect, caching computeKnownBits
information on vregs).
See `PR26161 <http://llvm.org/PR26161>`_: [GlobalISel] Value to vreg during
IR to MachineInstr translation for aggregate type

.. _irtranslator-constants:

Constant Lowering
^^^^^^^^^^^^^^^^^

The ``IRTranslator`` lowers ``Constant`` operands into uses of gvregs defined
by ``G_CONSTANT`` or ``G_FCONSTANT`` instructions.
Currently, these instructions are always emitted in the entry basic block.
In a ``MachineFunction``, each ``Constant`` is materialized by a single gvreg.

This is beneficial as it allows us to fold constants into immediate operands
during :ref:`instructionselect`, while still avoiding redundant materializations
for expensive non-foldable constants.
However, this can lead to unnecessary spills and reloads in an -O0 pipeline, as
these vregs can have long live ranges.
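
The "one gvreg per ``Constant``" behaviour amounts to a per-function cache
keyed on the constant. The sketch below is a toy model under that assumption
only: ``ToyTranslator``, ``getOrCreateVReg`` and the other names are invented
for illustration and do not reflect the actual ``IRTranslator`` members.

.. code-block:: c++

  #include <cstdint>
  #include <map>
  #include <vector>

  // Toy model of constant materialization. This is NOT the real IRTranslator.
  struct ToyTranslator {
    std::map<int64_t, unsigned> constantToVReg; // cache: constant -> gvreg
    std::vector<int64_t> entryBlockConstants;   // stand-ins for G_CONSTANTs
    unsigned nextVReg = 0;

    // Returns the gvreg holding `value`, materializing it in the (toy) entry
    // block the first time it is requested and reusing it afterwards.
    unsigned getOrCreateVReg(int64_t value) {
      std::map<int64_t, unsigned>::iterator it = constantToVReg.find(value);
      if (it != constantToVReg.end())
        return it->second;
      unsigned vreg = nextVReg++;
      entryBlockConstants.push_back(value);
      constantToVReg[value] = vreg;
      return vreg;
    }
  };

Requesting the same constant twice returns the same register and emits only
one materialization. It also shows why the live range can be long: the
definition sits in the entry block however far away the last use is.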

``TODO``:
We're investigating better placement of these instructions, in fast and
optimized modes.


.. _milegalizer:

Legalizer
---------

This pass transforms the generic machine instructions such that they are legal.

A legal instruction is defined as:

* **selectable** --- the target will later be able to select it to a
  target-specific (non-generic) instruction.

* operating on **vregs that can be loaded and stored** --- if necessary, the
  target can select a ``G_LOAD``/``G_STORE`` of each gvreg operand.

As opposed to SelectionDAG, there are no legalization phases. In particular,
'type' and 'operation' legalization are not separate.

Legalization is iterative, and all state is contained in GMIR. To maintain the
validity of the intermediate code, instructions are introduced:

* ``G_MERGE_VALUES`` --- concatenate multiple registers of the same
  size into a single wider register.

* ``G_UNMERGE_VALUES`` --- extract multiple registers of the same size
  from a single wider register.

* ``G_EXTRACT`` --- extract a simple register (as contiguous sequences of bits)
  from a single wider register.

As they are expected to be temporary byproducts of the legalization process,
they are combined at the end of the :ref:`milegalizer` pass.
If any remain, they are expected to always be selectable, using loads and stores
if necessary.
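
To make the bit-level meaning of these three helper instructions concrete,
here is a small sketch that models them on plain integers, with two ``s32``
pieces and one ``s64`` value. The function names are invented for this
example, and the operand order (first piece supplies the low bits) is an
assumption of the sketch rather than something stated in this document.

.. code-block:: c++

  #include <cstdint>
  #include <utility>

  // Toy G_MERGE_VALUES: concatenate two s32 pieces into one s64.
  // Assumption: `lo` supplies the low 32 bits, `hi` the high 32 bits.
  inline uint64_t mergeValues(uint32_t lo, uint32_t hi) {
    return (static_cast<uint64_t>(hi) << 32) | lo;
  }

  // Toy G_UNMERGE_VALUES: split one s64 into {low, high} s32 pieces.
  inline std::pair<uint32_t, uint32_t> unmergeValues(uint64_t v) {
    return std::make_pair(static_cast<uint32_t>(v),
                          static_cast<uint32_t>(v >> 32));
  }

  // Toy G_EXTRACT: take `bits` contiguous bits starting at bit `offset`.
  // Requires offset < 64 and offset + bits <= 64.
  inline uint64_t extractBits(uint64_t v, unsigned offset, unsigned bits) {
    uint64_t mask = bits >= 64 ? ~0ull : ((1ull << bits) - 1);
    return (v >> offset) & mask;
  }

Merging and then unmerging returns the original pieces, which is why adjacent
pairs of these instructions can be combined away at the end of the pass.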

The legality of an instruction may only depend on the instruction itself and
must not depend on any context in which the instruction is used. However, after
deciding that an instruction is not legal, using the context of the instruction
to decide how to legalize the instruction is permitted. As an example, if we
have a ``G_FOO`` instruction of the form::

  %1:_(s32) = G_CONSTANT i32 1
  %2:_(s32) = G_FOO %0:_(s32), %1:_(s32)

it's impossible to say that ``G_FOO`` is legal iff ``%1`` is a ``G_CONSTANT``
with value ``1``. However, the following::

  %2:_(s32) = G_FOO %0:_(s32), i32 1

can say that it's legal iff operand 2 is an immediate with value ``1`` because
that information is entirely contained within the single instruction.

.. _api-legalizerinfo:

API: LegalizerInfo
^^^^^^^^^^^^^^^^^^

The recommended [#legalizer-legacy-footnote]_ API looks like this::

  getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL, G_AND, G_OR, G_XOR, G_SHL})
      .legalFor({s32, s64, v2s32, v4s32, v2s64})
      .clampScalar(0, s32, s64)
      .widenScalarToNextPow2(0)
      .clampNumElements(0, v2s32, v4s32)
      .clampNumElements(0, v2s64, v2s64)
      .moreElementsToNextPow2(0);

and describes a set of rules by which we can either declare an instruction legal
or decide which action to take to make it more legal.

At the core of this ruleset is the ``LegalityQuery`` which describes the
instruction. We use a description rather than the instruction to both allow other
passes to determine legality without having to create an instruction and also to
limit the information available to the predicates to that which is safe to rely
on. Currently, the information available to the predicates that determine
legality contains:

* The opcode for the instruction

* The type of each type index (see ``type0``, ``type1``, etc.)

* The size in bytes and atomic ordering for each MachineMemOperand

Rule Processing and Declaring Rules
"""""""""""""""""""""""""""""""""""

The ``getActionDefinitionsBuilder`` function generates a ruleset for the given
opcode(s) that rules can be added to. If multiple opcodes are given, they are
all permanently bound to the same ruleset. The rules in a ruleset are executed
from top to bottom and will start again from the top if an instruction is
legalized as a result of the rules. If the ruleset is exhausted without
satisfying any rule, then the instruction is considered unsupported.

When it doesn't declare the instruction legal, each pass over the rules may
request that one type changes to another type. Sometimes this can cause multiple
types to change, but we avoid this as much as possible because making multiple
changes can make it difficult to avoid infinite loops where, for example,
narrowing one type causes another to be too small and widening that type causes
the first one to be too big.

In general, it's advisable to declare instructions legal as close to the top of
the ruleset as possible and to place any expensive rules as low as possible.
This helps with performance as testing for legality happens more often than
legalization and legalization can require multiple passes over the rules.

As a concrete example, consider the rule::

  getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL, G_AND, G_OR, G_XOR, G_SHL})
      .legalFor({s32, s64, v2s32, v4s32, v2s64})
      .clampScalar(0, s32, s64)
      .widenScalarToNextPow2(0);

and the instruction::

  %2:_(s7) = G_ADD %0:_(s7), %1:_(s7)

This doesn't meet the predicate for the :ref:`.legalFor() <legalfor>` as ``s7``
is not one of the listed types so it falls through to the
:ref:`.clampScalar() <clampscalar>`. It does meet the predicate for this rule
as the type is smaller than ``s32`` and this rule instructs the legalizer
to change type 0 to ``s32``. It then restarts from the top. This time it does
satisfy ``.legalFor()`` and the resulting output is::

  %3:_(s32) = G_ANYEXT %0:_(s7)
  %4:_(s32) = G_ANYEXT %1:_(s7)
  %5:_(s32) = G_ADD %3:_(s32), %4:_(s32)
  %2:_(s7) = G_TRUNC %5:_(s32)

where the ``G_ADD`` is legal and the other instructions are scheduled for
processing by the legalizer.
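
The top-to-bottom, restart-on-change processing walked through above can be
sketched with a toy model that only tracks the scalar width of type index 0.
Everything here (``Step``, ``Rule``, ``legalize`` and the free functions that
reuse the names of the real rule factories) is a hypothetical simplification
written for this example, not the ``LegalizerInfo`` implementation.

.. code-block:: c++

  #include <functional>
  #include <set>
  #include <vector>

  // Outcome of asking one rule about a scalar width (toy model).
  struct Step {
    bool matched;   // did the rule's predicate fire?
    bool legal;     // if matched: is the instruction declared legal?
    unsigned width; // if matched and not legal: the width to change to
  };
  using Rule = std::function<Step(unsigned)>;

  Rule legalFor(std::set<unsigned> widths) {
    return [widths](unsigned w) -> Step {
      return Step{widths.count(w) != 0, true, w};
    };
  }

  Rule clampScalar(unsigned minW, unsigned maxW) {
    return [minW, maxW](unsigned w) -> Step {
      if (w < minW) return Step{true, false, minW};
      if (w > maxW) return Step{true, false, maxW};
      return Step{false, false, w};
    };
  }

  Rule widenScalarToNextPow2() {
    return [](unsigned w) -> Step {
      unsigned p = 1;
      while (p < w) p <<= 1;
      return Step{p != w, false, p};
    };
  }

  // Runs the ruleset top to bottom, restarting after every type change.
  // Returns the widths visited, ending in a legal one, or an empty vector if
  // no rule matched (unsupported) or the iteration guard tripped.
  std::vector<unsigned> legalize(const std::vector<Rule> &rules, unsigned w) {
    std::vector<unsigned> trace(1, w);
    for (int guard = 0; guard < 16; ++guard) {
      bool changed = false;
      for (const Rule &rule : rules) {
        Step s = rule(trace.back());
        if (!s.matched) continue;
        if (s.legal) return trace;
        trace.push_back(s.width);
        changed = true;
        break; // restart from the top of the ruleset
      }
      if (!changed) return std::vector<unsigned>();
    }
    return std::vector<unsigned>();
  }

With the three rules from the example in that order, a width of 7 visits 7
and then 32, matching the ``s7`` to ``s32`` walk-through. The sketch does not
model the ``G_ANYEXT``/``G_TRUNC`` instructions that the real legalizer
inserts around the widened operation.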

Rule Actions
""""""""""""

There are various rule factories that append rules to a ruleset but they have a
few actions in common:

.. _legalfor:

* ``legalIf()``, ``legalFor()``, etc. declare an instruction to be legal if the
  predicate is satisfied.

* ``narrowScalarIf()``, ``narrowScalarFor()``, etc. declare an instruction to be
  illegal if the predicate is satisfied and indicate that narrowing the scalars
  in one of the types to a specific type would make it more legal. This action
  supports both scalars and vectors.

* ``widenScalarIf()``, ``widenScalarFor()``, etc. declare an instruction to be
  illegal if the predicate is satisfied and indicate that widening the scalars
  in one of the types to a specific type would make it more legal. This action
  supports both scalars and vectors.

* ``fewerElementsIf()``, ``fewerElementsFor()``, etc. declare an instruction to
  be illegal if the predicate is satisfied and indicate that reducing the number
  of vector elements in one of the types to a specific type would make it more
  legal. This action supports vectors.

* ``moreElementsIf()``, ``moreElementsFor()``, etc. declare an instruction to be
  illegal if the predicate is satisfied and indicate that increasing the number
  of vector elements in one of the types to a specific type would make it more
  legal. This action supports vectors.

* ``lowerIf()``, ``lowerFor()``, etc. declare an instruction to be illegal if the
  predicate is satisfied and indicate that replacing it with equivalent
  instruction(s) would make it more legal. Support for this action differs for
  each opcode.

* ``libcallIf()``, ``libcallFor()``, etc. declare an instruction to be illegal if
  the predicate is satisfied and indicate that replacing it with a libcall would
  make it more legal. Support for this action differs for each opcode.

* ``customIf()``, ``customFor()``, etc. declare an instruction to be illegal if
  the predicate is satisfied and indicate that the backend developer will supply
  a means of making it more legal.

* ``unsupportedIf()``, ``unsupportedFor()``, etc. declare an instruction to be
  illegal if the predicate is satisfied and indicate that there is no way to
  make it legal and the compiler should fail.

* ``fallback()`` falls back on an older API and should only be used while porting
  existing code from that API.

Rule Predicates
"""""""""""""""

The rule factories also have predicates in common:

* ``legal()``, ``lower()``, etc. are always satisfied.

* ``legalIf()``, ``narrowScalarIf()``, etc. are satisfied if the user-supplied
  ``LegalityPredicate`` function returns true. This predicate has access to the
  information in the ``LegalityQuery`` to make its decision.
  User-supplied predicates can also be combined using ``all(P0, P1, ...)``.

* ``legalFor()``, ``narrowScalarFor()``, etc. are satisfied if the type matches
  one in a given set of types. For example ``.legalFor({s16, s32})`` declares the
  instruction legal if type 0 is either s16 or s32. Additional versions for two
  and three type indices are generally available. For these, all the type
  indices considered together must match all the types in one of the tuples. So
  ``.legalFor({{s16, s32}, {s32, s64}})`` will only accept ``{s16, s32}``, or
  ``{s32, s64}`` but will not accept ``{s16, s64}``.

* ``legalForTypesWithMemSize()``, ``narrowScalarForTypesWithMemSize()``, etc. are
  similar to ``legalFor()``, ``narrowScalarFor()``, etc. but additionally require a
  MachineMemOperand to have a given size in each tuple.

* ``legalForCartesianProduct()``, ``narrowScalarForCartesianProduct()``, etc. are
  satisfied if each type index matches one element in each of the independent
  sets. So ``.legalForCartesianProduct({s16, s32}, {s32, s64})`` will accept
  ``{s16, s32}``, ``{s16, s64}``, ``{s32, s32}``, and ``{s32, s64}``.
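
The difference between the tuple form of ``legalFor()`` and
``legalForCartesianProduct()`` is easy to get wrong, so here is a minimal
sketch of the two matching strategies for two type indices. Types are
reduced to their scalar widths, and ``TypePair``, ``matchesTuples`` and
``matchesCartesian`` are names made up for this example.

.. code-block:: c++

  #include <algorithm>
  #include <utility>
  #include <vector>

  // {width of type index 0, width of type index 1}, e.g. {16, 32} for {s16, s32}.
  typedef std::pair<unsigned, unsigned> TypePair;

  // Tuple matching: the pair as a whole must appear in the list.
  inline bool matchesTuples(const std::vector<TypePair> &tuples, TypePair q) {
    return std::find(tuples.begin(), tuples.end(), q) != tuples.end();
  }

  // Cartesian-product matching: each index is checked against its own set,
  // independently of the other index.
  inline bool matchesCartesian(const std::vector<unsigned> &types0,
                               const std::vector<unsigned> &types1,
                               TypePair q) {
    return std::find(types0.begin(), types0.end(), q.first) != types0.end() &&
           std::find(types1.begin(), types1.end(), q.second) != types1.end();
  }

With tuples ``{16, 32}`` and ``{32, 64}``, the query ``{16, 64}`` is rejected
by ``matchesTuples``. It is accepted by ``matchesCartesian`` when the
per-index sets are ``{16, 32}`` and ``{32, 64}``.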

Composite Rules
"""""""""""""""

There are some composite rules for common situations built out of the above
facilities:

* ``widenScalarToNextPow2()`` is like ``widenScalarIf()`` but is satisfied iff the
  type size in bits is not a power of 2 and selects a target type that is the
  next largest power of 2.

.. _clampscalar:

* ``minScalar()`` is like ``widenScalarIf()`` but is satisfied iff the type
  size in bits is smaller than the given minimum and selects the minimum as the
  target type. Similarly, there is also a ``maxScalar()`` for the maximum and a
  ``clampScalar()`` to do both at once.

* ``minScalarSameAs()`` is like ``minScalar()`` but the minimum is taken from
  another type index.

* ``moreElementsToNextMultiple()`` is like ``moreElementsToNextPow2()`` but is
  based on multiples of X rather than powers of 2.

Other Information
"""""""""""""""""

``TODO``:
An alternative worth investigating is to generalize the API to represent
actions using ``std::function`` that implements the action, instead of explicit
enum tokens (``Legal``, ``WidenScalar``, ...).

``TODO``:
Moreover, we could use TableGen to initially infer legality of operation from
existing patterns (as any pattern we can select is by definition legal).
Expanding that to describe legalization actions is a much larger but
potentially useful project.

.. rubric:: Footnotes

.. [#legalizer-legacy-footnote] An API broadly similar to
   SelectionDAG/TargetLowering is available, but it is not recommended because
   a more powerful API is available.

.. _min-legalizerinfo:

Minimum Rule Set
^^^^^^^^^^^^^^^^

GlobalISel's legalizer has a great deal of flexibility in how a given target
shapes the GMIR that the rest of the backend must handle. However, there are
a small number of requirements that all targets must meet.

Before discussing the minimum requirements, we'll need some terminology:

Producer Type Set
  The set of types which is the union of all possible types produced by at
  least one legal instruction.

Consumer Type Set
  The set of types which is the union of all possible types consumed by at
  least one legal instruction.

Both sets are often identical but there's no guarantee of that. For example,
it's not uncommon to be unable to consume s64 but still be able to produce it
for a few specific instructions.

Minimum Rules For Scalars
"""""""""""""""""""""""""

* G_ANYEXT must be legal for all inputs from the producer type set and all larger
  outputs from the consumer type set.
* G_TRUNC must be legal for all inputs from the producer type set and all
  smaller outputs from the consumer type set.

G_ANYEXT and G_TRUNC have mandatory legality since the GMIR requires a means to
connect operations with different type sizes. They are usually trivial to support
since G_ANYEXT doesn't define the value of the additional bits and G_TRUNC is
discarding bits. The other conversions can be lowered into G_ANYEXT/G_TRUNC
with some additional operations that are subject to further legalization. For
example, G_SEXT can lower to::

  %1 = G_ANYEXT %0
  %2 = G_CONSTANT ...
  %3 = G_SHL %1, %2
  %4 = G_ASHR %3, %2

and the G_CONSTANT/G_SHL/G_ASHR can further lower to other operations or target
instructions. Similarly, G_FPEXT has no legality requirement since it can lower
to a G_ANYEXT followed by a target instruction.
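
The ``G_SEXT`` lowering above works because shifting left by the number of
added bits moves the source sign bit into the top bit, and the arithmetic
shift right then replicates it. The sketch below checks that arithmetic on
32-bit integers. It assumes a 32-bit destination, and ``ashr32`` and
``sextViaShifts`` are illustrative names, not LLVM functions.

.. code-block:: c++

  #include <cstdint>

  // Toy G_ASHR on a 32-bit pattern, written without relying on the behaviour
  // of >> on negative signed values. Requires n < 32.
  inline uint32_t ashr32(uint32_t x, unsigned n) {
    uint32_t r = x >> n;
    if (x & 0x80000000u)
      r |= ~(0xFFFFFFFFu >> n); // fill the vacated high bits with the sign
    return r;
  }

  // Sign-extends the low `srcBits` bits of `x` to 32 bits (1 <= srcBits <= 32).
  // `x` plays the role of the G_ANYEXT result: its upper bits may hold
  // anything, because the left shift discards them.
  inline uint32_t sextViaShifts(uint32_t x, unsigned srcBits) {
    unsigned amount = 32 - srcBits;  // the G_CONSTANT
    uint32_t shifted = x << amount;  // the G_SHL
    return ashr32(shifted, amount);  // the G_ASHR
  }

For a 7-bit source, the shift amount is 25. The pattern ``0x7F`` (all seven
bits set, i.e. -1) becomes ``0xFFFFFFFF``, while garbage in the upper 25 bits
of the input has no effect on the result.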

G_MERGE_VALUES and G_UNMERGE_VALUES do not have legality requirements since the
former can lower to G_ANYEXT and some other legalizable instructions, while the
latter can lower to some legalizable instructions followed by G_TRUNC.

Minimum Legality For Vectors
""""""""""""""""""""""""""""

Within the vector types, there aren't any defined conversions in LLVM IR as
vectors are often converted by reinterpreting the bits or by decomposing the
vector and reconstituting it as a different type. As such, G_BITCAST is the
only operation to account for. We generally don't require that it's legal
because it can usually be lowered to COPY (or to nothing using
replaceAllUses()). However, there are situations where G_BITCAST is non-trivial
(e.g. little-endian vectors of big-endian data such as on big-endian MIPS MSA and
big-endian ARM NEON, see :ref:`i_bitcast`). To account for this G_BITCAST must be
legal for all type combinations that change the bit pattern in the value.

There are no legality requirements for G_BUILD_VECTOR or G_BUILD_VECTOR_TRUNC
since these can be handled by:

* Declaring them legal.
* Scalarizing them.
* Lowering them to G_TRUNC+G_ANYEXT and some legalizable instructions.
* Lowering them to target instructions which are legal by definition.

The same reasoning also allows G_UNMERGE_VALUES to lack legality requirements
for vector inputs.

Minimum Legality for Pointers
"""""""""""""""""""""""""""""

There are no minimum rules for pointers since G_INTTOPTR and G_PTRTOINT can
be selected to a COPY from one register class to another by the legalizer.

Minimum Legality For Operations
"""""""""""""""""""""""""""""""

The rules for G_ANYEXT, G_MERGE_VALUES, G_BITCAST, G_BUILD_VECTOR,
G_BUILD_VECTOR_TRUNC, G_CONCAT_VECTORS, G_UNMERGE_VALUES, G_PTRTOINT, and
G_INTTOPTR have already been noted above. In addition to those, the following
operations have requirements:

* At least one G_IMPLICIT_DEF must be legal. This is usually trivial as it
  requires no code to be selected.
* G_PHI must be legal for all types in the producer and consumer type sets. This
  is usually trivial as it requires no code to be selected.
* At least one G_FRAME_INDEX must be legal.
* At least one G_BLOCK_ADDR must be legal.

There are many other operations you'd expect to have legality requirements but
they can be lowered to target instructions which are legal by definition.
|
||||
|
||||
.. _regbankselect:
|
||||
|
||||
RegBankSelect
|
||||
-------------
|
||||
|
||||
This pass constrains the :ref:`gmir-gvregs` operands of generic
|
||||
instructions to some :ref:`gmir-regbank`.
|
||||
|
||||
It iteratively maps instructions to a set of per-operand bank assignment.
|
||||
The possible mappings are determined by the target-provided
|
||||
:ref:`RegisterBankInfo <api-registerbankinfo>`.
|
||||
The mapping is then applied, possibly introducing ``COPY`` instructions if
|
||||
necessary.
|
||||
|
||||
It traverses the ``MachineFunction`` top down so that all operands are already
|
||||
mapped when analyzing an instruction.
|
||||
|
||||
This pass could also remap target-specific instructions when beneficial.
|
||||
In the future, this could replace the ExeDepsFix pass, as we can directly
|
||||
select the best variant for an instruction that's available on multiple banks.
|
||||
|
||||
.. _api-registerbankinfo:
|
||||
|
||||
API: RegisterBankInfo
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The ``RegisterBankInfo`` class describes multiple aspects of register banks.
|
||||
|
||||
* **Banks**: ``addRegBankCoverage`` --- which register bank covers each
|
||||
register class.
|
||||
|
||||
* **Cross-Bank Copies**: ``copyCost`` --- the cost of a ``COPY`` from one bank
|
||||
to another.
|
||||
|
||||
* **Default Mapping**: ``getInstrMapping`` --- the default bank assignments for
|
||||
a given instruction.
|
||||
|
||||
* **Alternative Mapping**: ``getInstrAlternativeMapping`` --- the other
|
||||
possible bank assignments for a given instruction.
|
||||
|
||||
``TODO``:
|
||||
All this information should eventually be static and generated by TableGen,
|
||||
mostly using existing information augmented by bank descriptions.
|
||||
|
||||
``TODO``:
|
||||
``getInstrMapping`` is currently separate from ``getInstrAlternativeMapping``
|
||||
because the latter is more expensive: as we move to static mapping info,
|
||||
both methods should be free, and we should merge them.
|
||||
|
||||
.. _regbankselect-modes:
|
||||
|
||||
RegBankSelect Modes
|
||||
^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
``RegBankSelect`` currently has two modes:
|
||||
|
||||
* **Fast** --- For each instruction, pick a target-provided "default" bank
|
||||
assignment. This is the default at -O0.
|
||||
|
||||
* **Greedy** --- For each instruction, pick the cheapest of several
|
||||
target-provided bank assignment alternatives.
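
As a rough illustration of the difference between the two modes, the sketch
below models each candidate bank assignment as a pair of costs. It is a toy
model under stated assumptions, not the LLVM implementation: ``ToyMapping``,
``pickFast`` and ``pickGreedy`` are hypothetical names, index 0 is assumed to
be the target's default mapping, and the costs are made-up numbers standing in
for what a target's ``RegisterBankInfo`` would report.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <vector>

  // Hypothetical stand-in for one candidate bank assignment of an instruction.
  struct ToyMapping {
    unsigned InstrCost; // cost of the instruction variant on these banks
    unsigned CopyCost;  // cost of the cross-bank COPYs needed to feed it
    unsigned total() const { return InstrCost + CopyCost; }
  };

  // Fast mode: always take the default mapping (assumed to be index 0).
  inline std::size_t pickFast(const std::vector<ToyMapping> &Candidates) {
    assert(!Candidates.empty() && "need at least the default mapping");
    return 0;
  }

  // Greedy mode: take the locally cheapest candidate for this instruction.
  // Ties are resolved in favour of the earliest candidate.
  inline std::size_t pickGreedy(const std::vector<ToyMapping> &Candidates) {
    assert(!Candidates.empty() && "need at least the default mapping");
    std::size_t Best = 0;
    for (std::size_t I = 1; I < Candidates.size(); ++I)
      if (Candidates[I].total() < Candidates[Best].total())
        Best = I;
    return Best;
  }

With the candidates ``{{1, 4}, {2, 0}}``, ``pickFast`` keeps the default
(total cost 5) while ``pickGreedy`` prefers the second candidate (total cost
2), because its instruction is slightly more expensive but needs no copies.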
|
||||
|
||||
We intend to eventually introduce an additional optimizing mode:
|
||||
|
||||
* **Global** --- Across multiple instructions, pick the cheapest combination of
|
||||
bank assignments.
|
||||
|
||||
``NOTE``:
|
||||
On AArch64, we are considering using the Greedy mode even at -O0 (or perhaps at
|
||||
backend -O1): because :ref:`gmir-llt` doesn't distinguish floating point from
|
||||
integer scalars, the default assignment for loads and stores is the integer
|
||||
bank, introducing cross-bank copies on most floating point operations.
|
||||
|
||||
|
||||
.. _instructionselect:
|
||||
|
||||
InstructionSelect
|
||||
-----------------
|
||||
|
||||
This pass transforms generic machine instructions into equivalent
|
||||
target-specific instructions. It traverses the ``MachineFunction`` bottom-up,
|
||||
selecting uses before definitions, enabling trivial dead code elimination.
|
||||
|
||||
.. _api-instructionselector:
|
||||
|
||||
API: InstructionSelector
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The target implements the ``InstructionSelector`` class, containing the
|
||||
target-specific selection logic proper.
|
||||
|
||||
The instance is provided by the subtarget, so that it can specialize the
|
||||
selector by subtarget feature (with, e.g., a vector selector overriding parts
|
||||
of a general-purpose common selector).
|
||||
We might also want to parameterize it by MachineFunction, to enable selector
|
||||
variants based on function attributes like optsize.
|
||||
|
||||
The simple API consists of:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
  virtual bool select(MachineInstr &MI)
|
||||
|
||||
This target-provided method is responsible for mutating (or replacing) a
|
||||
possibly-generic MI into a fully target-specific equivalent.
|
||||
It is also responsible for doing the necessary constraining of gvregs into the
|
||||
appropriate register classes as well as passing through COPY instructions to
|
||||
the register allocator.
|
||||
|
||||
The ``InstructionSelector`` can fold other instructions into the selected MI,
|
||||
by walking the use-def chain of the vreg operands.
|
||||
As GlobalISel is Global, this folding can occur across basic blocks.
|
||||
|
||||
SelectionDAG Rule Imports
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
TableGen will import SelectionDAG rules and provide the following function to
|
||||
execute them:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
  bool selectImpl(MachineInstr &MI)
|
||||
|
||||
The ``--stats`` option can be used to determine what proportion of rules were
|
||||
successfully imported. The easiest way to use this is to copy the
|
||||
``-gen-globalisel`` tablegen command from ``ninja -v`` and modify it.
|
||||
|
||||
Similarly, the ``--warn-on-skipped-patterns`` option can be used to obtain the
|
||||
reasons that rules weren't imported. This can be used to focus on the most
|
||||
important rejection reasons.
|
||||
|
||||
PatLeaf Predicates
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
PatLeafs cannot be imported because their C++ is implemented in terms of
|
||||
``SDNode`` objects. PatLeafs that handle immediate predicates should be
|
||||
replaced by ``ImmLeaf``, ``IntImmLeaf``, or ``FPImmLeaf`` as appropriate.
|
||||
|
||||
There's no standard answer for other PatLeafs. Some standard predicates have
|
||||
been baked into TableGen but this should not generally be done.
|
||||
|
||||
Custom SDNodes
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
Custom SDNodes should be mapped to Target Pseudos using ``GINodeEquiv``. This
|
||||
will cause the instruction selector to import them but you will also need to
|
||||
ensure the target pseudo is introduced to the MIR before the instruction
|
||||
selector. Any preceding pass is suitable but the legalizer will be a
|
||||
particularly common choice.
|
||||
|
||||
ComplexPatterns
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
ComplexPatterns cannot be imported because their C++ is implemented in terms of
|
||||
``SDNode`` objects. GlobalISel versions should be defined with
|
||||
``GIComplexOperandMatcher`` and mapped to ComplexPattern with
|
||||
``GIComplexPatternEquiv``.
|
||||
|
||||
The following predicates are useful for porting ComplexPattern:
|
||||
|
||||
* isBaseWithConstantOffset() - Check for base+offset structures
|
||||
* isOperandImmEqual() - Check for a particular constant
|
||||
* isObviouslySafeToFold() - Check for reasons an instruction can't be sunk and folded into another.
|
||||
|
||||
There are some important points for the C++ implementation:
|
||||
|
||||
* Don't modify MIR in the predicate
|
||||
* Renderer lambdas should capture by value to avoid use-after-free. They will be used after the predicate returns.
|
||||
* Only create instructions in a renderer lambda. GlobalISel won't clean up things you create but don't use.
|
||||
|
||||
|
||||
.. _maintainability:
|
||||
|
||||
Maintainability
|
||||
===============
|
||||
|
||||
.. _maintainability-iterative:
|
||||
|
||||
Iterative Transformations
|
||||
-------------------------
|
||||
|
||||
Passes are split into small, iterative transformations, with all state
|
||||
represented in the MIR.
|
||||
|
||||
This differs from SelectionDAG (in particular, the legalizer) using various
|
||||
in-memory side-tables.
|
||||
|
||||
|
||||
.. _maintainability-mir:
|
||||
|
||||
MIR Serialization
|
||||
-----------------
|
||||
|
||||
.. FIXME: Update the MIRLangRef to include GMI additions.
|
||||
|
||||
:ref:`gmir` is serializable (see :doc:`MIRLangRef`).
|
||||
Combined with :ref:`maintainability-iterative`, this enables much finer-grained
|
||||
testing, rather than requiring large and fragile IR-to-assembly tests.
|
||||
|
||||
The current "stage" in the :ref:`pipeline` is represented by a set of
|
||||
``MachineFunctionProperties``:
|
||||
|
||||
* ``legalized``
|
||||
* ``regBankSelected``
|
||||
* ``selected``
|
||||
|
||||
|
||||
.. _maintainability-verifier:
|
||||
|
||||
MachineVerifier
|
||||
---------------
|
||||
|
||||
The pass approach lets us use the ``MachineVerifier`` to enforce invariants.
|
||||
For instance, a ``regBankSelected`` function may not have gvregs without
|
||||
a bank.
|
||||
|
||||
``TODO``:
|
||||
The ``MachineVerifier`` being monolithic, some of the checks we want to do
|
||||
can't be integrated into it: GlobalISel is a separate library, so we can't
|
||||
directly reference it from CodeGen. For instance, legality checks are
|
||||
currently done in RegBankSelect/InstructionSelect proper. We could #ifdef out
|
||||
the checks, or we could add some sort of verifier API.
|
||||
|
||||
|
||||
.. _progress:
|
||||
|
||||
Progress and Future Work
|
||||
========================
|
||||
|
||||
The initial goal is to replace FastISel on AArch64. The next step will be to
|
||||
replace SelectionDAG as the optimized ISel.
|
||||
|
||||
``NOTE``:
|
||||
While we iterate on GlobalISel, we strive to avoid affecting the performance of
|
||||
SelectionDAG, FastISel, or the other MIR passes. For instance, the types of
|
||||
:ref:`gmir-gvregs` are stored in a separate table in ``MachineRegisterInfo``,
|
||||
that is destroyed after :ref:`instructionselect`.
|
||||
|
||||
.. _progress-fastisel:
|
||||
|
||||
FastISel Replacement
|
||||
--------------------
|
||||
|
||||
For the initial FastISel replacement, we intend to fall back to SelectionDAG on
|
||||
selection failures.
|
||||
|
||||
Currently, compile-time of the fast pipeline is within 1.5x of FastISel.
|
||||
We're optimistic we can get to within 1.1/1.2x, but beating FastISel will be
|
||||
challenging given the multi-pass approach.
|
||||
Still, supporting all IR (via a complete legalizer) and avoiding the fallback
|
||||
to SelectionDAG in the worst case should enable better amortized performance
|
||||
than SelectionDAG+FastISel.
|
||||
|
||||
``NOTE``:
|
||||
We considered never having a fallback to SelectionDAG, instead deciding early
|
||||
whether a given function is supported by GlobalISel or not. The decision would
|
||||
be based on :ref:`milegalizer` queries.
|
||||
We abandoned that for two reasons:
|
||||
a) on IR inputs, we'd need to basically simulate the :ref:`irtranslator`;
|
||||
b) to be robust against unforeseen failures and to enable iterative
|
||||
improvements.
|
||||
|
||||
.. _progress-targets:
|
||||
|
||||
Support For Other Targets
|
||||
-------------------------
|
||||
|
||||
In parallel, we're investigating adding support for other - ideally quite
|
||||
different - targets. For instance, there is some initial AMDGPU support.
|
||||
|
||||
|
||||
.. _porting:
|
||||
|
||||
Porting GlobalISel to A New Target
|
||||
==================================
|
||||
|
||||
There are four major classes to implement by the target:
|
||||
|
||||
* :ref:`CallLowering <api-calllowering>` --- lower calls, returns, and arguments
|
||||
according to the ABI.
|
||||
* :ref:`RegisterBankInfo <api-registerbankinfo>` --- describe
|
||||
:ref:`gmir-regbank` coverage, cross-bank copy cost, and the mapping of
|
||||
operands onto banks for each instruction.
|
||||
* :ref:`LegalizerInfo <api-legalizerinfo>` --- describe what is legal, and how
|
||||
to legalize what isn't.
|
||||
* :ref:`InstructionSelector <api-instructionselector>` --- select generic MIR
|
||||
to target-specific MIR.
|
||||
|
||||
Additionally:
|
||||
|
||||
* ``TargetPassConfig`` --- create the passes constituting the pipeline,
|
||||
including additional passes not included in the :ref:`pipeline`.
|
||||
|
||||
.. _other_resources:
|
||||
|
||||
Resources
|
||||
=========
|
||||
|
||||
* `Global Instruction Selection - A Proposal by Quentin Colombet @LLVMDevMeeting 2015 <https://www.youtube.com/watch?v=F6GGbYtae3g>`_
|
||||
* `Global Instruction Selection - Status by Quentin Colombet, Ahmed Bougacha, and Tim Northover @LLVMDevMeeting 2016 <https://www.youtube.com/watch?v=6tfb344A7w8>`_
|
||||
* `GlobalISel - LLVM's Latest Instruction Selection Framework by Diana Picus @FOSDEM17 <https://www.youtube.com/watch?v=d6dF6E4BPeU>`_
|
||||
* `GlobalISel: Past, Present, and Future by Quentin Colombet and Ahmed Bougacha @LLVMDevMeeting 2017 <https://www.llvm.org/devmtg/2017-10/#talk11>`_
|
||||
* `Head First into GlobalISel by Daniel Sanders, Aditya Nandakumar, and Justin Bogner @LLVMDevMeeting 2017 <https://www.llvm.org/devmtg/2017-10/#tutorial2>`_
|
||||
* `Generating Optimized Code with GlobalISel by Volkan Keles, Daniel Sanders @LLVMDevMeeting 2019 <https://www.llvm.org/devmtg/2019-10/talk-abstracts.html#keynote1>`_
|
158
docs/GlobalISel/GMIR.rst
Normal file
@ -0,0 +1,158 @@
|
||||
.. _gmir:
|
||||
|
||||
Generic Machine IR
|
||||
==================
|
||||
|
||||
Machine IR operates on physical registers, register classes, and (mostly)
|
||||
target-specific instructions.
|
||||
|
||||
To bridge the gap with LLVM IR, GlobalISel introduces "generic" extensions to
|
||||
Machine IR:
|
||||
|
||||
.. contents::
|
||||
:local:
|
||||
|
||||
``NOTE``:
|
||||
The generic MIR (GMIR) representation still contains references to IR
|
||||
constructs (such as ``GlobalValue``). Removing those should let us write more
|
||||
accurate tests, or delete IR after building the initial MIR. However, it is
|
||||
not part of the GlobalISel effort.
|
||||
|
||||
.. _gmir-instructions:
|
||||
|
||||
Generic Instructions
|
||||
--------------------
|
||||
|
||||
The main addition is support for pre-isel generic machine instructions (e.g.,
|
||||
``G_ADD``). Like other target-independent instructions (e.g., ``COPY`` or
|
||||
``PHI``), these are available on all targets.
|
||||
|
||||
``TODO``:
|
||||
While we're progressively adding instructions, one kind in particular exposes
|
||||
interesting problems: compares and how to represent condition codes.
|
||||
Some targets (x86, ARM) have generic comparisons setting multiple flags,
|
||||
which are then used by predicated variants.
|
||||
Others (IR) specify the predicate in the comparison and users just get a single
|
||||
bit. SelectionDAG uses SETCC/CONDBR vs BR_CC (and similar for select) to
|
||||
represent this.
|
||||
|
||||
The ``MachineIRBuilder`` class wraps the ``MachineInstrBuilder`` and provides
|
||||
a convenient way to create these generic instructions.
|
||||
|
||||
.. _gmir-gvregs:
|
||||
|
||||
Generic Virtual Registers
|
||||
-------------------------
|
||||
|
||||
Generic instructions operate on a new kind of register: "generic" virtual
|
||||
registers. As opposed to non-generic vregs, they are not assigned a Register
|
||||
Class. Instead, generic vregs have a :ref:`gmir-llt`, and can be assigned
|
||||
a :ref:`gmir-regbank`.
|
||||
|
||||
``MachineRegisterInfo`` tracks the same information that it does for
|
||||
non-generic vregs (e.g., use-def chains). Additionally, it also tracks the
|
||||
:ref:`gmir-llt` of the register, and, instead of the ``TargetRegisterClass``,
|
||||
its :ref:`gmir-regbank`, if any.
|
||||
|
||||
For simplicity, most generic instructions only accept generic vregs:
|
||||
|
||||
* instead of immediates, they use a gvreg defined by an instruction
|
||||
materializing the immediate value (see :ref:`irtranslator-constants`).
|
||||
* instead of physical registers, they use a gvreg defined by a ``COPY``.
|
||||
|
||||
``NOTE``:
|
||||
We started with an alternative representation, where MRI tracks a size for
|
||||
each gvreg, and instructions have lists of types.
|
||||
That had two flaws: the type and size are redundant, and there was no generic
|
||||
way of getting a given operand's type (as there was no 1:1 mapping between
|
||||
instruction types and operands).
|
||||
We considered putting the type in some variant of MCInstrDesc instead:
|
||||
See `PR26576 <http://llvm.org/PR26576>`_: [GlobalISel] Generic MachineInstrs
|
||||
need a type but this increases the memory footprint of the related objects
|
||||
|
||||
.. _gmir-regbank:
|
||||
|
||||
Register Bank
|
||||
-------------
|
||||
|
||||
A Register Bank is a set of register classes defined by the target.
|
||||
A bank has a size, which is the maximum store size of all covered classes.
|
||||
|
||||
In general, cross-class copies inside a bank are expected to be cheaper than
|
||||
copies across banks. They are also coalesceable by the register coalescer,
|
||||
whereas cross-bank copies are not.
|
||||
|
||||
Also, equivalent operations can be performed on different banks using different
|
||||
instructions.
|
||||
|
||||
For example, X86 can be seen as having 3 main banks: general-purpose, x87, and
|
||||
vector (which could be further split into a bank per domain for single vs
|
||||
double precision instructions).
|
||||
|
||||
Register banks are described by a target-provided API,
|
||||
:ref:`RegisterBankInfo <api-registerbankinfo>`.
|
||||
|
||||
.. _gmir-llt:
|
||||
|
||||
Low Level Type
|
||||
--------------
|
||||
|
||||
Additionally, every generic virtual register has a type, represented by an
|
||||
instance of the ``LLT`` class.
|
||||
|
||||
Like ``EVT``/``MVT``/``Type``, it has no distinction between unsigned and signed
|
||||
integer types. Furthermore, it also has no distinction between integer and
|
||||
floating-point types: it mainly conveys absolutely necessary information, such
|
||||
as size and number of vector lanes:
|
||||
|
||||
* ``sN`` for scalars
|
||||
* ``pN`` for pointers
|
||||
* ``<N x sM>`` for vectors
|
||||
* ``unsized`` for labels, etc.
|
||||
|
||||
``LLT`` is intended to replace the usage of ``EVT`` in SelectionDAG.
|
||||
|
||||
Here are some LLT examples and their ``EVT`` and ``Type`` equivalents:
|
||||
|
||||
=============  =========  ======================================
LLT            EVT        IR Type
=============  =========  ======================================
``s1``         ``i1``     ``i1``
``s8``         ``i8``     ``i8``
``s32``        ``i32``    ``i32``
``s32``        ``f32``    ``float``
``s17``        ``i17``    ``i17``
``s16``        N/A        ``{i8, i8}``
``s32``        N/A        ``[4 x i8]``
``p0``         ``iPTR``   ``i8*``, ``i32*``, ``%opaque*``
``p2``         ``iPTR``   ``i8 addrspace(2)*``
``<4 x s32>``  ``v4f32``  ``<4 x float>``
``s64``        ``v1f64``  ``<1 x double>``
``<3 x s32>``  ``v3i32``  ``<3 x i32>``
``unsized``    ``Other``  ``label``
=============  =========  ======================================
|
||||
|
||||
|
||||
Rationale: instructions already encode a specific interpretation of types
|
||||
(e.g., ``add`` vs. ``fadd``, or ``sdiv`` vs. ``udiv``). Also encoding that
|
||||
information in the type system requires introducing bitcast with no real
|
||||
advantage for the selector.
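
To make the naming scheme concrete, here is a minimal sketch of a type that
records only what the text above says an ``LLT`` conveys. ``ToyLLT`` and its
members are hypothetical and are not the real ``LLT`` class; the sketch
assumes that only size, lane count and address space need to be stored.

.. code-block:: c++

  #include <cassert>
  #include <string>

  // Hypothetical, simplified model of a low-level type. Signedness and the
  // integer/floating-point distinction are deliberately not representable.
  struct ToyLLT {
    enum Kind { Scalar, Pointer, Vector };
    Kind K;
    unsigned Bits;      // scalar or pointer size, or element size for vectors
    unsigned Lanes;     // number of vector lanes (1 for scalars and pointers)
    unsigned AddrSpace; // only meaningful for pointers

    static ToyLLT scalar(unsigned SizeInBits) {
      return {Scalar, SizeInBits, 1, 0};
    }
    static ToyLLT pointer(unsigned AS, unsigned SizeInBits) {
      return {Pointer, SizeInBits, 1, AS};
    }
    static ToyLLT vector(unsigned NumLanes, unsigned EltBits) {
      assert(NumLanes >= 2 && "1-element vectors are modelled as scalars");
      return {Vector, EltBits, NumLanes, 0};
    }

    // Render the type using the sN / pN / <N x sM> spelling from the list above.
    std::string str() const {
      switch (K) {
      case Scalar:
        return "s" + std::to_string(Bits);
      case Pointer:
        return "p" + std::to_string(AddrSpace);
      case Vector:
        return "<" + std::to_string(Lanes) + " x s" + std::to_string(Bits) + ">";
      }
      return "";
    }
  };

In this model both ``i32`` and ``float`` would be created with
``ToyLLT::scalar(32)`` and print as ``s32``; the interpretation of the bits is
left to the instruction that uses the value.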
|
||||
|
||||
Pointer types are distinguished by address space. This matches IR, as opposed
|
||||
to SelectionDAG where address space is an attribute on operations.
|
||||
This representation better supports pointers having different sizes depending
|
||||
on their address space.
|
||||
|
||||
``NOTE``:
|
||||
Currently, LLT requires at least 2 elements in vectors, but some targets have
|
||||
the concept of a '1-element vector'. Representing them as their underlying
|
||||
scalar type is a nice simplification.
|
||||
|
||||
``TODO``:
|
||||
Currently, non-generic virtual registers, defined by non-pre-isel-generic
|
||||
instructions, cannot have a type, and thus cannot be used by a pre-isel generic
|
||||
instruction. Instead, they are given a type using a COPY. We could relax that
|
||||
and allow types on all vregs: this would reduce the number of MI required when
|
||||
emitting target-specific MIR early in the pipeline. This should purely be
|
||||
a compile-time optimization.
|
||||
|
57
docs/GlobalISel/IRTranslator.rst
Normal file
@ -0,0 +1,57 @@
|
||||
.. _irtranslator:
|
||||
|
||||
IRTranslator
|
||||
------------
|
||||
|
||||
This pass translates the input LLVM IR ``Function`` to a GMIR
|
||||
``MachineFunction``.
|
||||
|
||||
``TODO``:
|
||||
This currently doesn't support the more complex instructions, in particular
|
||||
those involving control flow (``switch``, ``invoke``, ...).
|
||||
For ``switch`` in particular, we can initially use the ``LowerSwitch`` pass.
|
||||
|
||||
.. _api-calllowering:
|
||||
|
||||
API: CallLowering
|
||||
^^^^^^^^^^^^^^^^^
|
||||
|
||||
The ``IRTranslator`` (using the ``CallLowering`` target-provided utility) also
|
||||
implements the ABI's calling convention by lowering calls, returns, and
|
||||
arguments to the appropriate physical register usage and instruction sequences.
|
||||
|
||||
.. _irtranslator-aggregates:
|
||||
|
||||
Aggregates
|
||||
^^^^^^^^^^
|
||||
|
||||
Aggregates are lowered to a single scalar vreg.
|
||||
This differs from SelectionDAG's multiple vregs via ``GetValueVTs``.
|
||||
|
||||
``TODO``:
|
||||
As some of the bits are undef (padding), we should consider augmenting the
|
||||
representation with additional metadata (in effect, caching computeKnownBits
|
||||
information on vregs).
|
||||
See `PR26161 <http://llvm.org/PR26161>`_: [GlobalISel] Value to vreg during
|
||||
IR to MachineInstr translation for aggregate type.
|
||||
|
||||
.. _irtranslator-constants:
|
||||
|
||||
Constant Lowering
|
||||
^^^^^^^^^^^^^^^^^
|
||||
|
||||
The ``IRTranslator`` lowers ``Constant`` operands into uses of gvregs defined
|
||||
by ``G_CONSTANT`` or ``G_FCONSTANT`` instructions.
|
||||
Currently, these instructions are always emitted in the entry basic block.
|
||||
In a ``MachineFunction``, each ``Constant`` is materialized by a single gvreg.
|
||||
|
||||
This is beneficial as it allows us to fold constants into immediate operands
|
||||
during :ref:`instructionselect`, while still avoiding redundant materializations
|
||||
for expensive non-foldable constants.
|
||||
However, this can lead to unnecessary spills and reloads in an -O0 pipeline, as
|
||||
these vregs can have long live ranges.
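
The "one gvreg per ``Constant``" behaviour amounts to a lookup table keyed by
the constant. The sketch below shows the idea with integer constants only;
``ToyConstantPool`` and its members are hypothetical names, and the real
``IRTranslator`` data structures are assumed to differ.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <map>

  // Hypothetical per-function table mapping a constant value to the gvreg
  // that materializes it.
  struct ToyConstantPool {
    std::map<std::int64_t, unsigned> VRegForValue;
    unsigned NextVReg = 0;
    unsigned NumMaterialized = 0; // number of G_CONSTANT-like definitions emitted

    unsigned getOrCreateVReg(std::int64_t Value) {
      auto It = VRegForValue.find(Value);
      if (It != VRegForValue.end())
        return It->second; // reuse the existing definition
      unsigned VReg = NextVReg++;
      ++NumMaterialized; // stands in for emitting one definition in the entry block
      VRegForValue.emplace(Value, VReg);
      return VReg;
    }
  };

Requesting the same value twice returns the same register and emits a single
definition, which is also why the live range of that register can span the
whole function.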
|
||||
|
||||
``TODO``:
|
||||
We're investigating better placement of these instructions, in fast and
|
||||
optimized modes.
|
||||
|
98
docs/GlobalISel/InstructionSelect.rst
Normal file
@ -0,0 +1,98 @@
|
||||
|
||||
.. _instructionselect:
|
||||
|
||||
InstructionSelect
|
||||
-----------------
|
||||
|
||||
This pass transforms generic machine instructions into equivalent
|
||||
target-specific instructions. It traverses the ``MachineFunction`` bottom-up,
|
||||
selecting uses before definitions, enabling trivial dead code elimination.
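
The reason a bottom-up walk makes dead code elimination trivial can be shown
on a single basic block. The sketch below is a toy under stated assumptions:
``ToyInstr`` and ``selectBottomUp`` are hypothetical names, every use is
assumed to follow its definition within the block, and "selecting" an
instruction is reduced to keeping it.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <set>
  #include <vector>

  // Hypothetical, minimal instruction: one optional def, some uses.
  struct ToyInstr {
    int Def;               // virtual register defined, or -1 for none
    std::vector<int> Uses; // virtual registers read
    bool HasSideEffects;   // e.g. stores, calls, returns
  };

  // Walk the block from the last instruction to the first. Returns the
  // indices of the instructions that were kept, in visiting order.
  inline std::vector<std::size_t>
  selectBottomUp(const std::vector<ToyInstr> &Block) {
    std::set<int> UsedRegs; // registers read by instructions kept so far
    std::vector<std::size_t> Kept;
    for (std::size_t I = Block.size(); I-- > 0;) {
      const ToyInstr &MI = Block[I];
      bool DefIsUsed = MI.Def >= 0 && UsedRegs.count(MI.Def) != 0;
      if (!MI.HasSideEffects && !DefIsUsed)
        continue; // all users were already visited and none needs this def
      Kept.push_back(I);
      UsedRegs.insert(MI.Uses.begin(), MI.Uses.end());
    }
    return Kept;
  }

Because users are visited before their operands' definitions, a definition
whose result was folded away or never needed is recognized as dead the moment
it is reached, without a separate analysis.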
|
||||
|
||||
.. _api-instructionselector:
|
||||
|
||||
API: InstructionSelector
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The target implements the ``InstructionSelector`` class, containing the
|
||||
target-specific selection logic proper.
|
||||
|
||||
The instance is provided by the subtarget, so that it can specialize the
|
||||
selector by subtarget feature (with, e.g., a vector selector overriding parts
|
||||
of a general-purpose common selector).
|
||||
We might also want to parameterize it by MachineFunction, to enable selector
|
||||
variants based on function attributes like optsize.
|
||||
|
||||
The simple API consists of:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
  virtual bool select(MachineInstr &MI)
|
||||
|
||||
This target-provided method is responsible for mutating (or replacing) a
|
||||
possibly-generic MI into a fully target-specific equivalent.
|
||||
It is also responsible for doing the necessary constraining of gvregs into the
|
||||
appropriate register classes as well as passing through COPY instructions to
|
||||
the register allocator.
|
||||
|
||||
The ``InstructionSelector`` can fold other instructions into the selected MI,
|
||||
by walking the use-def chain of the vreg operands.
|
||||
As GlobalISel is Global, this folding can occur across basic blocks.
|
||||
|
||||
SelectionDAG Rule Imports
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
TableGen will import SelectionDAG rules and provide the following function to
|
||||
execute them:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
  bool selectImpl(MachineInstr &MI)
|
||||
|
||||
The ``--stats`` option can be used to determine what proportion of rules were
|
||||
successfully imported. The easiest way to use this is to copy the
|
||||
``-gen-globalisel`` tablegen command from ``ninja -v`` and modify it.
|
||||
|
||||
Similarly, the ``--warn-on-skipped-patterns`` option can be used to obtain the
|
||||
reasons that rules weren't imported. This can be used to focus on the most
|
||||
important rejection reasons.
|
||||
|
||||
PatLeaf Predicates
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
PatLeafs cannot be imported because their C++ is implemented in terms of
|
||||
``SDNode`` objects. PatLeafs that handle immediate predicates should be
|
||||
replaced by ``ImmLeaf``, ``IntImmLeaf``, or ``FPImmLeaf`` as appropriate.
|
||||
|
||||
There's no standard answer for other PatLeafs. Some standard predicates have
|
||||
been baked into TableGen but this should not generally be done.
|
||||
|
||||
Custom SDNodes
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
Custom SDNodes should be mapped to Target Pseudos using ``GINodeEquiv``. This
|
||||
will cause the instruction selector to import them but you will also need to
|
||||
ensure the target pseudo is introduced to the MIR before the instruction
|
||||
selector. Any preceding pass is suitable but the legalizer will be a
|
||||
particularly common choice.
|
||||
|
||||
ComplexPatterns
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
ComplexPatterns cannot be imported because their C++ is implemented in terms of
|
||||
``SDNode`` objects. GlobalISel versions should be defined with
|
||||
``GIComplexOperandMatcher`` and mapped to ComplexPattern with
|
||||
``GIComplexPatternEquiv``.
|
||||
|
||||
The following predicates are useful for porting ComplexPattern:
|
||||
|
||||
* ``isBaseWithConstantOffset()`` - Check for base+offset structures
* ``isOperandImmEqual()`` - Check for a particular constant
* ``isObviouslySafeToFold()`` - Check for reasons an instruction can't be sunk and folded into another.
|
||||
|
||||
There are some important points for the C++ implementation:
|
||||
|
||||
* Don't modify MIR in the predicate
|
||||
* Renderer lambdas should capture by value to avoid use-after-free. They will be used after the predicate returns.
|
||||
* Only create instructions in a renderer lambda. GlobalISel won't clean up things you create but don't use.
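
The capture-by-value advice can be seen in isolation with plain C++. The
sketch below does not use the GlobalISel types: ``ToyOperands``,
``ToyRenderer`` and ``matchBasePlusOffset`` are hypothetical stand-ins, with
an instruction's operand list modelled as a vector of integers.

.. code-block:: c++

  #include <cassert>
  #include <functional>
  #include <vector>

  // Hypothetical stand-ins: a renderer appends operands to an instruction
  // under construction, modelled here as a plain vector.
  using ToyOperands = std::vector<long>;
  using ToyRenderer = std::function<void(ToyOperands &)>;

  // Matcher for a base+offset address. It only inspects its inputs and
  // returns renderers; it creates and modifies nothing itself.
  inline std::vector<ToyRenderer> matchBasePlusOffset(long BaseReg, long Offset) {
    // Capture by value: BaseReg and Offset are locals of this call and no
    // longer exist when the renderers run, so a by-reference capture would
    // dangle.
    return {
        [=](ToyOperands &Ops) { Ops.push_back(BaseReg); },
        [=](ToyOperands &Ops) { Ops.push_back(Offset); },
    };
  }

The caller runs the returned renderers only once it has committed to the
match, which is why any instruction creation belongs inside them rather than
in the matcher.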
|
||||
|
||||
|
351
docs/GlobalISel/Legalizer.rst
Normal file
@ -0,0 +1,351 @@
|
||||
.. _milegalizer:
|
||||
|
||||
Legalizer
|
||||
---------
|
||||
|
||||
This pass transforms the generic machine instructions such that they are legal.
|
||||
|
||||
A legal instruction is defined as:
|
||||
|
||||
* **selectable** --- the target will later be able to select it to a
|
||||
target-specific (non-generic) instruction.
|
||||
|
||||
* operating on **vregs that can be loaded and stored** -- if necessary, the
|
||||
target can select a ``G_LOAD``/``G_STORE`` of each gvreg operand.
|
||||
|
||||
As opposed to SelectionDAG, there are no legalization phases. In particular,
|
||||
'type' and 'operation' legalization are not separate.
|
||||
|
||||
Legalization is iterative, and all state is contained in GMIR. To maintain the
|
||||
validity of the intermediate code, the following instructions are introduced:
|
||||
|
||||
* ``G_MERGE_VALUES`` --- concatenate multiple registers of the same
|
||||
size into a single wider register.
|
||||
|
||||
* ``G_UNMERGE_VALUES`` --- extract multiple registers of the same size
|
||||
from a single wider register.
|
||||
|
||||
* ``G_EXTRACT`` --- extract a simple register (as contiguous sequences of bits)
|
||||
from a single wider register.
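
In terms of plain bit manipulation, the three instructions behave roughly as
sketched below for a 64-bit value split into two 32-bit halves. The function
names are hypothetical, and the sketch assumes that the first source of a
merge supplies the least significant bits.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <utility>

  // G_MERGE_VALUES-like: concatenate two 32-bit values into one 64-bit value.
  // Assumption: the first operand provides the low bits.
  inline std::uint64_t toyMerge(std::uint32_t Lo, std::uint32_t Hi) {
    return (static_cast<std::uint64_t>(Hi) << 32) | Lo;
  }

  // G_UNMERGE_VALUES-like: split a 64-bit value into {low, high} halves.
  inline std::pair<std::uint32_t, std::uint32_t> toyUnmerge(std::uint64_t Wide) {
    return {static_cast<std::uint32_t>(Wide),
            static_cast<std::uint32_t>(Wide >> 32)};
  }

  // G_EXTRACT-like: take Width contiguous bits starting at bit Offset.
  inline std::uint64_t toyExtract(std::uint64_t Wide, unsigned Offset,
                                  unsigned Width) {
    assert(Width >= 1 && Width <= 64 && Offset <= 64 - Width);
    std::uint64_t Mask =
        Width == 64 ? ~std::uint64_t{0} : ((std::uint64_t{1} << Width) - 1);
    return (Wide >> Offset) & Mask;
  }

Merging and then unmerging returns the original halves, which is the property
that lets the legalizer cancel such pairs out when it combines them at the end
of the pass.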
|
||||
|
||||
As they are expected to be temporary byproducts of the legalization process,
|
||||
they are combined at the end of the :ref:`milegalizer` pass.
|
||||
If any remain, they are expected to always be selectable, using loads and stores
|
||||
if necessary.
|
||||
|
||||
The legality of an instruction may only depend on the instruction itself and
|
||||
must not depend on any context in which the instruction is used. However, after
|
||||
deciding that an instruction is not legal, using the context of the instruction
|
||||
to decide how to legalize the instruction is permitted. As an example, if we
|
||||
have a ``G_FOO`` instruction of the form::
|
||||
|
||||
  %1:_(s32) = G_CONSTANT i32 1
  %2:_(s32) = G_FOO %0:_(s32), %1:_(s32)
|
||||
|
||||
it's impossible to say that ``G_FOO`` is legal iff ``%1`` is a ``G_CONSTANT`` with
|
||||
value ``1``. However, the following::
|
||||
|
||||
  %2:_(s32) = G_FOO %0:_(s32), i32 1
|
||||
|
||||
can say that it's legal iff operand 2 is an immediate with value ``1`` because
|
||||
that information is entirely contained within the single instruction.
|
||||
|
||||
.. _api-legalizerinfo:
|
||||
|
||||
API: LegalizerInfo
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The recommended [#legalizer-legacy-footnote]_ API looks like this::
|
||||
|
||||
  getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL, G_AND, G_OR, G_XOR, G_SHL})
      .legalFor({s32, s64, v2s32, v4s32, v2s64})
      .clampScalar(0, s32, s64)
      .widenScalarToNextPow2(0)
      .clampNumElements(0, v2s32, v4s32)
      .clampNumElements(0, v2s64, v2s64)
      .moreElementsToNextPow2(0);
|
||||
|
||||
and describes a set of rules by which we can either declare an instruction legal
|
||||
or decide which action to take to make it more legal.
|
||||
|
||||
At the core of this ruleset is the ``LegalityQuery`` which describes the
|
||||
instruction. We use a description rather than the instruction to both allow other
|
||||
passes to determine legality without having to create an instruction and also to
|
||||
limit the information available to the predicates to that which is safe to rely
|
||||
on. Currently, the information available to the predicates that determine
|
||||
legality contains:
|
||||
|
||||
* The opcode for the instruction
|
||||
|
||||
* The type of each type index (see ``type0``, ``type1``, etc.)
|
||||
|
||||
* The size in bytes and atomic ordering for each MachineMemOperand
|
||||
|
||||
Rule Processing and Declaring Rules
|
||||
"""""""""""""""""""""""""""""""""""
|
||||
|
||||
The ``getActionDefinitionsBuilder`` function generates a ruleset for the given
|
||||
opcode(s) that rules can be added to. If multiple opcodes are given, they are
|
||||
all permanently bound to the same ruleset. The rules in a ruleset are executed
|
||||
from top to bottom and will start again from the top if an instruction is
|
||||
legalized as a result of the rules. If the ruleset is exhausted without
|
||||
satisfying any rule, then the instruction is considered unsupported.
|
||||
|
||||
When it doesn't declare the instruction legal, each pass over the rules may
|
||||
request that one type changes to another type. Sometimes this can cause multiple
|
||||
types to change but we avoid this as much as possible as making multiple changes
|
||||
can make it difficult to avoid infinite loops where, for example, narrowing one
|
||||
type causes another to be too small and widening that type causes the first one
|
||||
to be too big.
|
||||
|
||||
In general, it's advisable to declare instructions legal as close to the top of
|
||||
the ruleset as possible and to place any expensive rules as low as possible. This
|
||||
helps with performance as testing for legality happens more often than
|
||||
legalization and legalization can require multiple passes over the rules.
|
||||
|
||||
As a concrete example, consider the rule::
|
||||
|
||||
  getActionDefinitionsBuilder({G_ADD, G_SUB, G_MUL, G_AND, G_OR, G_XOR, G_SHL})
      .legalFor({s32, s64, v2s32, v4s32, v2s64})
      .clampScalar(0, s32, s64)
      .widenScalarToNextPow2(0);
|
||||
|
||||
and the instruction::
|
||||
|
||||
  %2:_(s7) = G_ADD %0:_(s7), %1:_(s7)
|
||||
|
||||
this doesn't meet the predicate for the :ref:`.legalFor() <legalfor>` as ``s7``
|
||||
is not one of the listed types so it falls through to the
|
||||
:ref:`.clampScalar() <clampscalar>`. It does meet the predicate for this rule
|
||||
as the type is smaller than ``s32`` and this rule instructs the legalizer
|
||||
to change type 0 to ``s32``. It then restarts from the top. This time it does
|
||||
satisfy ``.legalFor()`` and the resulting output is::
|
||||
|
||||
  %3:_(s32) = G_ANYEXT %0:_(s7)
  %4:_(s32) = G_ANYEXT %1:_(s7)
  %5:_(s32) = G_ADD %3:_(s32), %4:_(s32)
  %2:_(s7) = G_TRUNC %5:_(s32)
|
||||
|
||||
where the ``G_ADD`` is legal and the other instructions are scheduled for
|
||||
processing by the legalizer.
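
The "run the rules, change one type, start again from the top" loop can be
sketched for the scalar part of this ruleset. This is a toy under stated
assumptions, not the ``LegalizerInfo`` implementation: ``ToyStep``,
``runRulesOnce`` and ``legalizeScalar`` are hypothetical names, only type
index 0 is tracked, only ``s32`` and ``s64`` are treated as legal, and the
instructions that a real type change would emit are ignored.

.. code-block:: c++

  #include <cassert>
  #include <initializer_list>

  // Hypothetical outcome of one top-to-bottom pass over the toy ruleset.
  struct ToyStep {
    bool Legal;       // true if the legalFor-style rule matched
    unsigned NewBits; // otherwise, the size type 0 should change to
  };

  inline unsigned nextPow2(unsigned Bits) {
    unsigned P = 1;
    while (P < Bits)
      P <<= 1;
    return P;
  }

  // One pass over: legalFor({s32, s64}), clampScalar(0, s32, s64),
  // widenScalarToNextPow2(0). The first rule whose predicate holds wins.
  inline ToyStep runRulesOnce(unsigned Bits) {
    for (unsigned LegalBits : {32u, 64u})
      if (Bits == LegalBits)
        return {true, Bits};
    if (Bits < 32)
      return {false, 32}; // clampScalar: widen up to the minimum
    if (Bits > 64)
      return {false, 64}; // clampScalar: narrow down to the maximum
    return {false, nextPow2(Bits)}; // e.g. s48 becomes s64
  }

  // Restart from the top after every change until the type is legal.
  inline unsigned legalizeScalar(unsigned Bits) {
    for (;;) {
      ToyStep S = runRulesOnce(Bits);
      if (S.Legal)
        return Bits;
      Bits = S.NewBits;
    }
  }

For the ``s7`` example, the first pass falls through the legality check and
requests ``s32``; the second pass is satisfied by the legality check
immediately, which is why placing it first keeps the common case cheap.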
|
||||
|
||||
Rule Actions
|
||||
""""""""""""
|
||||
|
||||
There are various rule factories that append rules to a ruleset but they have a
|
||||
few actions in common:
|
||||
|
||||
.. _legalfor:
|
||||
|
||||
* ``legalIf()``, ``legalFor()``, etc. declare an instruction to be legal if the
|
||||
predicate is satisfied.
|
||||
|
||||
* ``narrowScalarIf()``, ``narrowScalarFor()``, etc. declare an instruction to be illegal
|
||||
if the predicate is satisfied and indicate that narrowing the scalars in one
|
||||
of the types to a specific type would make it more legal. This action supports
|
||||
both scalars and vectors.
|
||||
|
||||
* ``widenScalarIf()``, ``widenScalarFor()``, etc. declare an instruction to be illegal
|
||||
if the predicate is satisfied and indicate that widening the scalars in one
|
||||
of the types to a specific type would make it more legal. This action supports
|
||||
both scalars and vectors.
|
||||
|
||||
* ``fewerElementsIf()``, ``fewerElementsFor()``, etc. declare an instruction to be
|
||||
illegal if the predicate is satisfied and indicates reducing the number of
|
||||
vector elements in one of the types to a specific type would make it more
|
||||
legal. This action supports vectors.
|
||||
|
||||
* ``moreElementsIf()``, ``moreElementsFor()``, etc. declare an instruction to be illegal
|
||||
if the predicate is satisfied and indicates increasing the number of vector
|
||||
elements in one of the types to a specific type would make it more legal.
|
||||
This action supports vectors.
|
||||
|
||||
* ``lowerIf()``, ``lowerFor()``, etc. declare an instruction to be illegal if the
|
||||
predicate is satisfied and indicates that replacing it with equivalent
|
||||
instruction(s) would make it more legal. Support for this action differs for
|
||||
each opcode.
|
||||
|
||||
* ``libcallIf()``, ``libcallFor()``, etc. declare an instruction to be illegal if the
|
||||
predicate is satisfied and indicates that replacing it with a libcall would
|
||||
make it more legal. Support for this action differs for
|
||||
each opcode.
|
||||
|
||||
* ``customIf()``, ``customFor()``, etc. declare an instruction to be illegal if the
|
||||
predicate is satisfied and indicates that the backend developer will supply
|
||||
a means of making it more legal.
|
||||
|
||||
* ``unsupportedIf()``, ``unsupportedFor()``, etc. declare an instruction to be illegal
|
||||
if the predicate is satisfied and indicates that there is no way to make it
|
||||
legal and the compiler should fail.
|
||||
|
||||
* ``fallback()`` falls back on an older API and should only be used while porting
|
||||
existing code from that API.
|
||||
|
||||
Rule Predicates
|
||||
"""""""""""""""
|
||||
|
||||
The rule factories also have predicates in common:
|
||||
|
||||
* ``legal()``, ``lower()``, etc. are always satisfied.
|
||||
|
||||
* ``legalIf()``, ``narrowScalarIf()``, etc. are satisfied if the user-supplied
|
||||
``LegalityPredicate`` function returns true. This predicate has access to the
|
||||
information in the ``LegalityQuery`` to make its decision.
|
||||
User-supplied predicates can also be combined using ``all(P0, P1, ...)``.
|
||||
|
||||
* ``legalFor()``, ``narrowScalarFor()``, etc. are satisfied if the type matches one in
|
||||
a given set of types. For example ``.legalFor({s16, s32})`` declares the
|
||||
instruction legal if type 0 is either s16 or s32. Additional versions for two
|
||||
and three type indices are generally available. For these, all the type
|
||||
indices considered together must match all the types in one of the tuples. So
|
||||
``.legalFor({{s16, s32}, {s32, s64}})`` will only accept ``{s16, s32}``, or
|
||||
``{s32, s64}`` but will not accept ``{s16, s64}``.
|
||||
|
||||
* ``legalForTypesWithMemSize()``, ``narrowScalarForTypesWithMemSize()``, etc. are
|
||||
similar to ``legalFor()``, ``narrowScalarFor()``, etc. but additionally require a
|
||||
MachineMemOperand to have a given size in each tuple.
|
||||
|
||||
* ``legalForCartesianProduct()``, ``narrowScalarForCartesianProduct()``, etc. are
|
||||
satisfied if each type index matches one element in each of the independent
|
||||
sets. So ``.legalForCartesianProduct({s16, s32}, {s32, s64})`` will accept
|
||||
``{s16, s32}``, ``{s16, s64}``, ``{s32, s32}``, and ``{s32, s64}``.
|
||||
|
||||
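The difference between the tuple form of ``legalFor()`` and ``legalForCartesianProduct()`` is easy to get wrong, so here is a minimal sketch of the two matching strategies. It models types as plain scalar bit widths, and ``TypePair``, ``matchesTuples`` and ``matchesCartesian`` are hypothetical helper names, not LLVM functions.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Scalar widths for {type index 0, type index 1}. Toy representation only.
using TypePair = std::pair<unsigned, unsigned>;

// Models the tuple form of legalFor(): the type indices, considered
// together, must equal one of the listed tuples.
inline bool matchesTuples(const std::vector<TypePair> &tuples, TypePair query) {
  return std::find(tuples.begin(), tuples.end(), query) != tuples.end();
}

// Models legalForCartesianProduct(): each type index is checked
// independently against its own set.
inline bool matchesCartesian(const std::vector<unsigned> &set0,
                             const std::vector<unsigned> &set1,
                             TypePair query) {
  const bool in0 =
      std::find(set0.begin(), set0.end(), query.first) != set0.end();
  const bool in1 =
      std::find(set1.begin(), set1.end(), query.second) != set1.end();
  return in0 && in1;
}
```

With the tuples ``{16, 32}`` and ``{32, 64}``, the query ``{16, 64}`` is rejected by ``matchesTuples`` but accepted by ``matchesCartesian`` over the sets ``{16, 32}`` and ``{32, 64}``, which is the distinction drawn in the list above.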
Composite Rules
"""""""""""""""

There are some composite rules for common situations built out of the above facilities:

* ``widenScalarToNextPow2()`` is like ``widenScalarIf()`` but is satisfied iff the type
  size in bits is not a power of 2 and selects a target type that is the next
  largest power of 2.

.. _clampscalar:

* ``minScalar()`` is like ``widenScalarIf()`` but is satisfied iff the type
  size in bits is smaller than the given minimum and selects the minimum as the
  target type. Similarly, there is also a ``maxScalar()`` for the maximum and a
  ``clampScalar()`` to do both at once.

* ``minScalarSameAs()`` is like ``minScalar()`` but the minimum is taken from another
  type index.

* ``moreElementsToNextMultiple()`` is like ``moreElementsToNextPow2()`` but is based on
  multiples of X rather than powers of 2.

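The size arithmetic behind these composite rules can be written out directly. The following is a sketch on plain bit widths and element counts; the ``toy*`` function names are hypothetical and only echo the rule names, they are not the LLVM builder methods.

```cpp
#include <cassert>

// True iff bits is a power of two (0 is not).
inline bool toyIsPow2(unsigned bits) {
  return bits != 0 && (bits & (bits - 1)) == 0;
}

// widenScalarToNextPow2: only fires for non-power-of-2 sizes and picks the
// next larger power of 2. Legal-sized inputs are returned unchanged.
inline unsigned toyWidenScalarToNextPow2(unsigned bits) {
  if (toyIsPow2(bits))
    return bits;
  unsigned p = 1;
  while (p < bits)
    p <<= 1;
  return p;
}

// minScalar: only fires when the size is below the minimum.
inline unsigned toyMinScalar(unsigned bits, unsigned minBits) {
  return bits < minBits ? minBits : bits;
}

// maxScalar: only fires when the size is above the maximum.
inline unsigned toyMaxScalar(unsigned bits, unsigned maxBits) {
  return bits > maxBits ? maxBits : bits;
}

// clampScalar: minScalar and maxScalar at once.
inline unsigned toyClampScalar(unsigned bits, unsigned minBits,
                               unsigned maxBits) {
  assert(minBits <= maxBits);
  return toyMaxScalar(toyMinScalar(bits, minBits), maxBits);
}

// moreElementsToNextMultiple: round the element count up to a multiple of X.
inline unsigned toyMoreElementsToNextMultiple(unsigned numElts, unsigned x) {
  assert(x != 0);
  return ((numElts + x - 1) / x) * x;
}
```

Note that ``toyClampScalar(48, 32, 64)`` leaves 48 unchanged, which is why the ruleset in the earlier example follows ``clampScalar()`` with ``widenScalarToNextPow2()``.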
Other Information
"""""""""""""""""

``TODO``:
An alternative worth investigating is to generalize the API to represent
actions using ``std::function`` that implements the action, instead of explicit
enum tokens (``Legal``, ``WidenScalar``, ...).

``TODO``:
Moreover, we could use TableGen to initially infer legality of operation from
existing patterns (as any pattern we can select is by definition legal).
Expanding that to describe legalization actions is a much larger but
potentially useful project.

.. rubric:: Footnotes

.. [#legalizer-legacy-footnote] An API that is broadly similar to
   SelectionDAG/TargetLowering is available, but it is not recommended as a more
   powerful API is available.

.. _min-legalizerinfo:

Minimum Rule Set
^^^^^^^^^^^^^^^^

GlobalISel's legalizer has a great deal of flexibility in how a given target
shapes the GMIR that the rest of the backend must handle. However, there are
a small number of requirements that all targets must meet.

Before discussing the minimum requirements, we'll need some terminology:

Producer Type Set
  The set of types which is the union of all possible types produced by at
  least one legal instruction.

Consumer Type Set
  The set of types which is the union of all possible types consumed by at
  least one legal instruction.

Both sets are often identical but there's no guarantee of that. For example,
it's not uncommon to be unable to consume s64 but still be able to produce it
for a few specific instructions.

Minimum Rules For Scalars
"""""""""""""""""""""""""

* G_ANYEXT must be legal for all inputs from the producer type set and all larger
  outputs from the consumer type set.
* G_TRUNC must be legal for all inputs from the producer type set and all
  smaller outputs from the consumer type set.

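One way to read the two scalar rules is as a recipe for enumerating the conversions a target has to support. The sketch below does that for scalar bit widths. ``Conversion``, ``requiredAnyExt`` and ``requiredTrunc`` are hypothetical names for illustration and do not correspond to any LLVM API.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// {input bits, output bits} of a conversion. Toy representation only.
using Conversion = std::pair<unsigned, unsigned>;

// G_ANYEXT: every producer-set input paired with every strictly larger
// consumer-set output.
inline std::vector<Conversion>
requiredAnyExt(const std::set<unsigned> &producers,
               const std::set<unsigned> &consumers) {
  std::vector<Conversion> out;
  for (unsigned in : producers)
    for (unsigned result : consumers)
      if (result > in)
        out.emplace_back(in, result);
  return out;
}

// G_TRUNC: every producer-set input paired with every strictly smaller
// consumer-set output.
inline std::vector<Conversion>
requiredTrunc(const std::set<unsigned> &producers,
              const std::set<unsigned> &consumers) {
  std::vector<Conversion> out;
  for (unsigned in : producers)
    for (unsigned result : consumers)
      if (result < in)
        out.emplace_back(in, result);
  return out;
}
```

For a target that can produce ``s32`` and ``s64`` but only consume ``s32``, this yields no mandatory ``G_ANYEXT`` and a single mandatory ``G_TRUNC`` from ``s64`` to ``s32``.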
G_ANYEXT and G_TRUNC have mandatory legality since the GMIR requires a means to
connect operations with different type sizes. They are usually trivial to support
since G_ANYEXT doesn't define the value of the additional bits and G_TRUNC is
discarding bits. The other conversions can be lowered into G_ANYEXT/G_TRUNC
with some additional operations that are subject to further legalization. For
example, G_SEXT can lower to::

  %1 = G_ANYEXT %0
  %2 = G_CONSTANT ...
  %3 = G_SHL %1, %2
  %4 = G_ASHR %3, %2

and the G_CONSTANT/G_SHL/G_ASHR can further lower to other operations or target
instructions. Similarly, G_FPEXT has no legality requirement since it can lower
to a G_ANYEXT followed by a target instruction.

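The ``G_SEXT`` lowering above works because shifting left moves the source sign bit into the top bit of the wider type, and the arithmetic shift right then replicates it. A minimal sketch on 32-bit values, assuming a 32-bit destination; ``ashr32`` and ``sextViaShifts`` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Arithmetic shift right on a 32-bit pattern, written with unsigned
// operations so it does not rely on the behaviour of signed shifts.
inline uint32_t ashr32(uint32_t value, unsigned amount) {
  assert(amount < 32);
  uint32_t shifted = value >> amount;
  if ((value & 0x80000000u) != 0 && amount != 0)
    shifted |= ~uint32_t{0} << (32 - amount); // replicate the sign bit
  return shifted;
}

// Sign-extends the low srcBits of anyExtValue to 32 bits. The upper bits of
// anyExtValue may hold anything, as after a G_ANYEXT.
inline uint32_t sextViaShifts(uint32_t anyExtValue, unsigned srcBits) {
  assert(srcBits >= 1 && srcBits <= 32);
  const unsigned amount = 32 - srcBits;       // plays the role of G_CONSTANT
  const uint32_t shl = anyExtValue << amount; // G_SHL discards the junk bits
  return ashr32(shl, amount);                 // G_ASHR copies the sign down
}
```

The junk high bits left by the any-extend do not matter: ``sextViaShifts(0xDEADBE7Fu, 7)`` and ``sextViaShifts(0x7Fu, 7)`` both give all ones, i.e. -1.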
G_MERGE_VALUES and G_UNMERGE_VALUES do not have legality requirements since the
former can lower to G_ANYEXT and some other legalizable instructions, while the
latter can lower to some legalizable instructions followed by G_TRUNC.

Minimum Legality For Vectors
""""""""""""""""""""""""""""

Within the vector types, there aren't any defined conversions in LLVM IR as
vectors are often converted by reinterpreting the bits or by decomposing the
vector and reconstituting it as a different type. As such, G_BITCAST is the
only operation to account for. We generally don't require that it's legal
because it can usually be lowered to COPY (or to nothing using
replaceAllUses()). However, there are situations where G_BITCAST is non-trivial
(e.g. little-endian vectors of big-endian data such as on big-endian MIPS MSA and
big-endian ARM NEON, see :ref:`i_bitcast`). To account for this, G_BITCAST must be
legal for all type combinations that change the bit pattern in the value.

There are no legality requirements for G_BUILD_VECTOR or G_BUILD_VECTOR_TRUNC
since these can be handled by:

* Declaring them legal.
* Scalarizing them.
* Lowering them to G_TRUNC+G_ANYEXT and some legalizable instructions.
* Lowering them to target instructions which are legal by definition.

The same reasoning also allows G_UNMERGE_VALUES to lack legality requirements
for vector inputs.

Minimum Legality for Pointers
"""""""""""""""""""""""""""""

There are no minimum rules for pointers since G_INTTOPTR and G_PTRTOINT can
be selected to a COPY from one register class to another by the legalizer.

Minimum Legality For Operations
"""""""""""""""""""""""""""""""

The rules for G_ANYEXT, G_MERGE_VALUES, G_BITCAST, G_BUILD_VECTOR,
G_BUILD_VECTOR_TRUNC, G_CONCAT_VECTORS, G_UNMERGE_VALUES, G_PTRTOINT, and
G_INTTOPTR have already been noted above. In addition to those, the following
operations have requirements:

* At least one G_IMPLICIT_DEF must be legal. This is usually trivial as it
  requires no code to be selected.
* G_PHI must be legal for all types in the producer and consumer type sets. This
  is usually trivial as it requires no code to be selected.
* At least one G_FRAME_INDEX must be legal.
* At least one G_BLOCK_ADDR must be legal.

There are many other operations you'd expect to have legality requirements but
they can be lowered to target instructions which are legal by definition.
82
docs/GlobalISel/Pipeline.rst
Normal file
@ -0,0 +1,82 @@
.. _pipeline:

Core Pipeline
=============

There are four required passes, regardless of the optimization mode:

.. toctree::
  :maxdepth: 1

  IRTranslator
  Legalizer
  RegBankSelect
  InstructionSelect

Additional passes can then be inserted at higher optimization levels or for
specific targets. For example, to match the current SelectionDAG set of
transformations: MachineCSE and a better MachineCombiner between every pass.

``NOTE``:
In theory, not all passes are always necessary.
As an additional compile-time optimization, we could skip some of the passes by
setting the relevant MachineFunction properties. For instance, if the
IRTranslator did not encounter any illegal instruction, it would set the
``legalized`` property to avoid running the :ref:`milegalizer`.
Similarly, we considered specializing the IRTranslator per-target to directly
emit target-specific MI.
However, we instead decided to keep the core pipeline simple, and focus on
minimizing the overhead of the passes in the no-op cases.

.. _maintainability-verifier:

MachineVerifier
---------------

The pass approach lets us use the ``MachineVerifier`` to enforce invariants.
For instance, a ``regBankSelected`` function may not have gvregs without
a bank.

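The invariant mentioned above can be illustrated with a toy check. This is a sketch of the idea only: ``ToyVReg``, ``ToyFunction`` and ``verifyRegBanks`` are hypothetical and do not reflect how the real ``MachineVerifier`` is implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for a virtual register and a machine function.
struct ToyVReg {
  std::string name;
  bool isGeneric;   // true for a generic virtual register (gvreg)
  std::string bank; // empty string means "no bank assigned"
};

struct ToyFunction {
  bool regBankSelected; // models the regBankSelected function property
  std::vector<ToyVReg> vregs;
};

// Returns the names of all registers that violate the invariant
// "a regBankSelected function may not have gvregs without a bank".
inline std::vector<std::string> verifyRegBanks(const ToyFunction &fn) {
  std::vector<std::string> violations;
  if (!fn.regBankSelected)
    return violations; // the invariant only applies once the property is set
  for (const ToyVReg &reg : fn.vregs)
    if (reg.isGeneric && reg.bank.empty())
      violations.push_back(reg.name);
  return violations;
}
```

Because the check depends only on the function's property and the state held in its registers, it can run between any two passes without extra bookkeeping.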
``TODO``:
The ``MachineVerifier`` being monolithic, some of the checks we want to do
can't be integrated into it: GlobalISel is a separate library, so we can't
directly reference it from CodeGen. For instance, legality checks are
currently done in RegBankSelect/InstructionSelect proper. We could #ifdef out
the checks, or we could add some sort of verifier API.


.. _maintainability:

Maintainability
===============

.. _maintainability-iterative:

Iterative Transformations
-------------------------

Passes are split into small, iterative transformations, with all state
represented in the MIR.

This differs from SelectionDAG (in particular, the legalizer) using various
in-memory side-tables.


.. _maintainability-mir:

MIR Serialization
-----------------

.. FIXME: Update the MIRLangRef to include GMI additions.

:ref:`gmir` is serializable (see :doc:`../MIRLangRef`).
Combined with :ref:`maintainability-iterative`, this enables much finer-grained
testing, rather than requiring large and fragile IR-to-assembly tests.

The current "stage" in the :ref:`pipeline` is represented by a set of
``MachineFunctionProperties``:

* ``legalized``
* ``regBankSelected``
* ``selected``
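The idea that the pipeline stage is just a set of properties can be sketched as follows. This is a toy model: the property names follow the list above, but which properties each pass requires and sets is an assumption made for illustration, and ``ToyPass``, ``corePipeline`` and ``runFrom`` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Bit flags modelling the three MachineFunctionProperties listed above.
enum ToyProperty : unsigned {
  Legalized = 1u << 0,
  RegBankSelected = 1u << 1,
  Selected = 1u << 2
};

struct ToyPass {
  std::string name;
  unsigned required; // properties that must already hold (assumed)
  unsigned sets;     // properties established by the pass (assumed)
};

inline std::vector<ToyPass> corePipeline() {
  return {
      {"IRTranslator", 0, 0},
      {"Legalizer", 0, Legalized},
      {"RegBankSelect", Legalized, RegBankSelected},
      {"InstructionSelect", Legalized | RegBankSelected, Selected},
  };
}

// Runs the pipeline starting at pass index `first` on a function that
// already has the properties in `props`. Returns the final property set,
// or -1 if some pass found its required properties missing.
inline int runFrom(std::size_t first, unsigned props) {
  const std::vector<ToyPass> passes = corePipeline();
  for (std::size_t i = first; i < passes.size(); ++i) {
    if ((props & passes[i].required) != passes[i].required)
      return -1;
    props |= passes[i].sets;
  }
  return static_cast<int>(props);
}
```

This is what makes single-pass MIR tests practical: a serialized function that records ``legalized`` and ``regBankSelected`` can be fed straight to the last pass, as in ``runFrom(3, Legalized | RegBankSelected)``.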
21
docs/GlobalISel/Porting.rst
Normal file
@ -0,0 +1,21 @@
.. _porting:

Porting GlobalISel to A New Target
==================================

There are four major classes to implement by the target:

* :ref:`CallLowering <api-calllowering>` --- lower calls, returns, and arguments
  according to the ABI.
* :ref:`RegisterBankInfo <api-registerbankinfo>` --- describe
  :ref:`gmir-regbank` coverage, cross-bank copy cost, and the mapping of
  operands onto banks for each instruction.
* :ref:`LegalizerInfo <api-legalizerinfo>` --- describe what is legal, and how
  to legalize what isn't.
* :ref:`InstructionSelector <api-instructionselector>` --- select generic MIR
  to target-specific MIR.

Additionally:

* ``TargetPassConfig`` --- create the passes constituting the pipeline,
  including additional passes not included in the :ref:`pipeline`.
73
docs/GlobalISel/RegBankSelect.rst
Normal file
@ -0,0 +1,73 @@
.. _regbankselect:

RegBankSelect
-------------

This pass constrains the :ref:`gmir-gvregs` operands of generic
instructions to some :ref:`gmir-regbank`.

It iteratively maps instructions to a set of per-operand bank assignments.
The possible mappings are determined by the target-provided
:ref:`RegisterBankInfo <api-registerbankinfo>`.
The mapping is then applied, possibly introducing ``COPY`` instructions if
necessary.

It traverses the ``MachineFunction`` top down so that all operands are already
mapped when analyzing an instruction.

This pass could also remap target-specific instructions when beneficial.
In the future, this could replace the ExeDepsFix pass, as we can directly
select the best variant for an instruction that's available on multiple banks.

.. _api-registerbankinfo:

API: RegisterBankInfo
^^^^^^^^^^^^^^^^^^^^^

The ``RegisterBankInfo`` class describes multiple aspects of register banks.

* **Banks**: ``addRegBankCoverage`` --- which register bank covers each
  register class.

* **Cross-Bank Copies**: ``copyCost`` --- the cost of a ``COPY`` from one bank
  to another.

* **Default Mapping**: ``getInstrMapping`` --- the default bank assignments for
  a given instruction.

* **Alternative Mapping**: ``getInstrAlternativeMapping`` --- the other
  possible bank assignments for a given instruction.

``TODO``:
All this information should eventually be static and generated by TableGen,
mostly using existing information augmented by bank descriptions.

``TODO``:
``getInstrMapping`` is currently separate from ``getInstrAlternativeMapping``
because the latter is more expensive: as we move to static mapping info,
both methods should be free, and we should merge them.

.. _regbankselect-modes:

RegBankSelect Modes
^^^^^^^^^^^^^^^^^^^

``RegBankSelect`` currently has two modes:

* **Fast** --- For each instruction, pick a target-provided "default" bank
  assignment. This is the default at -O0.

* **Greedy** --- For each instruction, pick the cheapest of several
  target-provided bank assignment alternatives.

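The difference between the two modes comes down to how a mapping is chosen for each instruction. A minimal sketch, assuming each candidate mapping carries a single precomputed cost that already includes any cross-bank copies; ``ToyMapping``, ``pickFast`` and ``pickGreedy`` are hypothetical names, not the ``RegBankSelect`` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One candidate bank assignment for an instruction. Toy model only.
struct ToyMapping {
  const char *bank; // bank the operands would be assigned to
  unsigned cost;    // assumed total cost, cross-bank copies included
};

// Fast mode: always take the target's default mapping, modelled here as the
// first candidate.
inline std::size_t pickFast(const std::vector<ToyMapping> &candidates) {
  assert(!candidates.empty());
  return 0;
}

// Greedy mode: take the cheapest of the default and alternative mappings.
// Ties keep the earlier candidate, so the default wins when costs are equal.
inline std::size_t pickGreedy(const std::vector<ToyMapping> &candidates) {
  assert(!candidates.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < candidates.size(); ++i)
    if (candidates[i].cost < candidates[best].cost)
      best = i;
  return best;
}
```

Both functions look at one instruction at a time, so neither can trade a locally worse mapping for a cheaper overall result.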
We intend to eventually introduce an additional optimizing mode:

* **Global** --- Across multiple instructions, pick the cheapest combination of
  bank assignments.

``NOTE``:
On AArch64, we are considering using the Greedy mode even at -O0 (or perhaps at
backend -O1): because :ref:`gmir-llt` doesn't distinguish floating point from
integer scalars, the default assignment for loads and stores is the integer
bank, introducing cross-bank copies on most floating point operations.
11
docs/GlobalISel/Resources.rst
Normal file
@ -0,0 +1,11 @@
.. _other_resources:

Resources
=========

* `Global Instruction Selection - A Proposal by Quentin Colombet @LLVMDevMeeting 2015 <https://www.youtube.com/watch?v=F6GGbYtae3g>`_
* `Global Instruction Selection - Status by Quentin Colombet, Ahmed Bougacha, and Tim Northover @LLVMDevMeeting 2016 <https://www.youtube.com/watch?v=6tfb344A7w8>`_
* `GlobalISel - LLVM's Latest Instruction Selection Framework by Diana Picus @FOSDEM17 <https://www.youtube.com/watch?v=d6dF6E4BPeU>`_
* `GlobalISel: Past, Present, and Future by Quentin Colombet and Ahmed Bougacha @LLVMDevMeeting 2017 <https://www.llvm.org/devmtg/2017-10/#talk11>`_
* `Head First into GlobalISel by Daniel Sanders, Aditya Nandakumar, and Justin Bogner @LLVMDevMeeting 2017 <https://www.llvm.org/devmtg/2017-10/#tutorial2>`_
* `Generating Optimized Code with GlobalISel by Volkan Keles, Daniel Sanders @LLVMDevMeeting 2019 <https://www.llvm.org/devmtg/2019-10/talk-abstracts.html#keynote1>`_
94
docs/GlobalISel/index.rst
Normal file
@ -0,0 +1,94 @@
============================
Global Instruction Selection
============================

.. warning::
   This document is a work in progress. It reflects the current state of the
   implementation, as well as open design and implementation issues.

.. contents::
   :local:
   :depth: 1

Introduction
============

GlobalISel is a framework that provides a set of reusable passes and utilities
for instruction selection --- translation from LLVM IR to target-specific
Machine IR (MIR).

GlobalISel is intended to be a replacement for SelectionDAG and FastISel, to
solve three major problems:

* **Performance** --- SelectionDAG introduces a dedicated intermediate
  representation, which has a compile-time cost.

  GlobalISel directly operates on the post-isel representation used by the
  rest of the code generator, MIR.
  It does require extensions to that representation to support arbitrary
  incoming IR: :ref:`gmir`.

* **Granularity** --- SelectionDAG and FastISel operate on individual basic
  blocks, losing some global optimization opportunities.

  GlobalISel operates on the whole function.

* **Modularity** --- SelectionDAG and FastISel are radically different and share
  very little code.

  GlobalISel is built in a way that enables code reuse. For instance, both the
  optimized and fast selectors share the :ref:`pipeline`, and targets can
  configure that pipeline to better suit their needs.

Design and Implementation Reference
===================================

More information on the design and implementation of GlobalISel can be found in
the following sections.

.. toctree::
   :maxdepth: 1

   GMIR
   Pipeline
   Porting
   Resources


.. _progress:

Progress and Future Work
========================

The initial goal is to replace FastISel on AArch64. The next step will be to
replace SelectionDAG as the optimized ISel.

``NOTE``:
While we iterate on GlobalISel, we strive to avoid affecting the performance of
SelectionDAG, FastISel, or the other MIR passes. For instance, the types of
:ref:`gmir-gvregs` are stored in a separate table in ``MachineRegisterInfo``,
that is destroyed after :ref:`instructionselect`.

.. _progress-fastisel:

FastISel Replacement
--------------------

For the initial FastISel replacement, we intend to fall back to SelectionDAG on
selection failures.

Currently, compile-time of the fast pipeline is within 1.5x of FastISel.
We're optimistic we can get to within 1.1/1.2x, but beating FastISel will be
challenging given the multi-pass approach.
Still, supporting all IR (via a complete legalizer) and avoiding the fallback
to SelectionDAG in the worst case should enable better amortized performance
than SelectionDAG+FastISel.

``NOTE``:
We considered never having a fallback to SelectionDAG, instead deciding early
whether a given function is supported by GlobalISel or not. The decision would
be based on :ref:`milegalizer` queries.
We abandoned that for two reasons:
a) on IR inputs, we'd need to basically simulate the :ref:`irtranslator`;
b) to be robust against unforeseen failures and to enable iterative
improvements.
@ -23,7 +23,7 @@ LLVM and API reference documentation.
    FuzzingLLVM
    GarbageCollection
    GetElementPtr
-   GlobalISel
+   GlobalISel/index
    GwpAsan
    HowToSetUpLLVMStyleRTTI
    HowToUseAttributes
@ -125,7 +125,7 @@ LLVM IR
    A reference manual for the MIR serialization format, which is used to test
    LLVM's code generation passes.

-:doc:`GlobalISel`
+:doc:`GlobalISel/index`
    This describes the prototype instruction selection replacement, GlobalISel.

 =====================
@ -208,4 +208,4 @@ Additional Topics
    LLVM support for coroutines.

 :doc:`YamlIO`
-   A reference guide for using LLVM's YAML I/O library.
+   A reference guide for using LLVM's YAML I/O library.