=====================================
Performance Tips for Frontend Authors
=====================================

Abstract
========

The intended audience of this document is developers of language frontends
targeting LLVM IR. This document is home to a collection of tips on how to
generate IR that optimizes well.

IR Best Practices
=================

As with any optimizer, LLVM has its strengths and weaknesses. In some cases,
surprisingly small changes in the source IR can have a large effect on the
generated code.

Beyond the specific items on the list below, it's worth noting that the most
mature frontend for LLVM is Clang. As a result, the further your IR gets from
what Clang might emit, the less likely it is to be effectively optimized. It
can often be useful to write a quick C program with the semantics you're trying
to model and see what decisions Clang's IRGen makes about what IR to emit.
Studying Clang's CodeGen directory can also be a good source of ideas. Note
that Clang and LLVM are explicitly version locked, so you'll need to make sure
you're using a Clang built from the same svn revision or release as the LLVM
library you're using. As always, it's *strongly* recommended that you track
tip of tree development, particularly during bring-up of a new project.

The Basics
^^^^^^^^^^

#. Make sure that your Modules contain both a data layout specification and
   a target triple. Without these pieces, none of the target-specific
   optimizations will be enabled. This can have a major effect on the
   generated code quality.

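   A minimal sketch of such a module header (the particular strings below are
   illustrative values for x86-64 Linux; a real frontend should obtain them
   from its target configuration rather than hard-coding them):

   .. code-block:: llvm

      ; Both should appear near the top of the emitted module.
      target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
      target triple = "x86_64-unknown-linux-gnu"
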
#. For each function or global emitted, use the most private linkage type
   possible (private, internal or linkonce_odr preferably). Doing so will
   make LLVM's inter-procedural optimizations much more effective.

#. Avoid high in-degree basic blocks (e.g. basic blocks with dozens or hundreds
   of predecessors). Among other issues, the register allocator is known to
   perform badly when confronted with such structures. The only exception to
   this guidance is that a unified return block with high in-degree is fine.

Use of allocas
^^^^^^^^^^^^^^

An alloca instruction can be used to represent a function scoped stack slot,
but can also represent dynamic frame expansion. When representing function
scoped variables or locations, placing alloca instructions at the beginning of
the entry block should be preferred. In particular, place them before any
call instructions. Call instructions might get inlined and replaced with
multiple basic blocks. The end result is that a following alloca instruction
would no longer be in the entry basic block afterward.

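A sketch of the preferred placement (function and helper names are
illustrative):

.. code-block:: llvm

   declare void @use(i32*)

   define i32 @example(i32 %n) {
   entry:
     ; Allocas first, before any call which might later be inlined.
     %tmp = alloca i32
     store i32 %n, i32* %tmp
     call void @use(i32* %tmp)
     %v = load i32, i32* %tmp
     ret i32 %v
   }
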
The SROA (Scalar Replacement Of Aggregates) and Mem2Reg passes only attempt
to eliminate alloca instructions that are in the entry basic block. Given
that SSA is the canonical form expected by much of the optimizer, if allocas
cannot be eliminated by Mem2Reg or SROA, the optimizer is likely to be less
effective than it could be.

Avoid loads and stores of large aggregate type
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

LLVM currently does not optimize well loads and stores of large :ref:`aggregate
types <t_aggregate>` (i.e. structs and arrays). As an alternative, consider
loading individual fields from memory.

Aggregates that are smaller than the largest (performant) load or store
instruction supported by the targeted hardware are well supported. These can
be an effective way to represent collections of small packed fields.

Prefer zext over sext when legal
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

On some architectures (X86_64 is one), sign extension can involve an extra
instruction whereas zero extension can be folded into a load. LLVM will try to
replace a sext with a zext when it can be proven safe, but if you have
information in your source language about the range of an integer value, it can
be profitable to use a zext rather than a sext.

Alternatively, you can :ref:`specify the range of the value using metadata
<range-metadata>` and LLVM can do the sext to zext conversion for you.

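As a hedged sketch, a load can carry range metadata (the bounds below are
purely illustrative):

.. code-block:: llvm

   ; %val is known to be in [0, 256), so a later sext of it can become a zext.
   %val = load i32, i32* %p, !range !0

   !0 = !{i32 0, i32 256}
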
Zext GEP indices to machine register width
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Internally, LLVM often promotes the width of GEP indices to machine register
width. When it does so, it will default to using sign extension (sext)
operations for safety. If your source language provides information about
the range of the index, you may wish to manually extend indices to machine
register width using a zext instruction.

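For example (value names hypothetical), a frontend that knows an i32 index is
non-negative can widen it explicitly:

.. code-block:: llvm

   ; Safe because the frontend knows %i is never negative.
   %idx = zext i32 %i to i64
   %elt = getelementptr inbounds i32, i32* %base, i64 %idx
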
When to specify alignment
^^^^^^^^^^^^^^^^^^^^^^^^^

LLVM will always generate correct code if you don't specify alignment, but may
generate inefficient code. For example, if you are targeting MIPS (or older
ARM ISAs) then the hardware does not handle unaligned loads and stores, and
so you will enter a trap-and-emulate path if you do a load or store with
lower-than-natural alignment. To avoid this, LLVM will emit a slower
sequence of loads, shifts and masks (or load-right + load-left on MIPS) for
all cases where the load / store does not have a sufficiently high alignment
in the IR.

The alignment is used to guarantee the alignment on allocas and globals,
though in most cases this is unnecessary (most targets have a sufficiently
high default alignment that they'll be fine). It is also used to provide a
contract to the back end saying 'either this load/store has this alignment, or
it is undefined behavior'. This means that the back end is free to emit
instructions that rely on that alignment (and mid-level optimizers are free to
perform transforms that require that alignment). For x86, it doesn't make
much difference, as almost all instructions are alignment-independent. For
MIPS, it can make a big difference.

Note that if your loads and stores are atomic, the backend will be unable to
lower an under-aligned access into a sequence of natively aligned accesses.
As a result, alignment is mandatory for atomic loads and stores.

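A sketch of both cases (the alignments are illustrative):

.. code-block:: llvm

   ; Ordinary load: the alignment is an optimization contract.
   %v = load i32, i32* %p, align 4
   ; Atomic load: the alignment is mandatory.
   %a = load atomic i32, i32* %q seq_cst, align 4
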
Other Things to Consider
^^^^^^^^^^^^^^^^^^^^^^^^

#. Use ptrtoint/inttoptr sparingly (they interfere with pointer aliasing
   analysis); prefer GEPs.

#. Prefer globals over inttoptr of a constant address - this gives you
   dereferenceability information. In MCJIT, use getSymbolAddress to provide
   the actual address.

#. Be wary of ordered and atomic memory operations. They are hard to optimize
   and may not be well optimized by the current optimizer. Depending on your
   source language, you may consider using fences instead.

#. If calling a function which is known to throw an exception (unwind), use
   an invoke with a normal destination which contains an unreachable
   instruction. This form conveys to the optimizer that the call returns
   abnormally. For an invoke which neither returns normally nor requires unwind
   code in the current function, you can use a noreturn call instruction if
   desired. This is generally not required because the optimizer will convert
   an invoke with an unreachable unwind destination to a call instruction.

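   A sketch of this pattern (the function names are hypothetical):

   .. code-block:: llvm

      invoke void @throws_always()
              to label %normal unwind label %lpad

      normal:
        ; Conveys that the call never returns normally.
        unreachable
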
#. Use profile metadata to indicate statically known cold paths, even if
   dynamic profiling information is not available. This can make a large
   difference in code placement and thus the performance of tight loops.

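   For instance, branch weight metadata can mark an error path as cold (the
   weights below are illustrative):

   .. code-block:: llvm

      br i1 %is_error, label %cold, label %hot, !prof !0

      !0 = !{!"branch_weights", i32 1, i32 2000}
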
#. When generating code for loops, try to avoid terminating the header block of
   the loop earlier than necessary. If the terminator of the loop header
   block is a loop exiting conditional branch, the effectiveness of LICM will
   be limited for loads not in the header. (This is due to the fact that LLVM
   may not know such a load is safe to speculatively execute and thus can't
   lift an otherwise loop invariant load unless it can prove the exiting
   condition is not taken.) It can be profitable, in some cases, to emit such
   instructions into the header even if they are not used along a rarely
   executed path that exits the loop. This guidance specifically does not
   apply if the condition which terminates the loop header is itself invariant,
   or can be easily discharged by inspecting the loop index variables.

#. In hot loops, consider duplicating instructions from small basic blocks
   which end in highly predictable terminators into their successor blocks.
   If a hot successor block contains instructions which can be vectorized
   with the duplicated ones, this can provide a noticeable throughput
   improvement. Note that this is not always profitable and does involve a
   potentially large increase in code size.

#. When checking a value against a constant, emit the check using a consistent
   comparison type. The GVN pass *will* optimize redundant equalities even if
   the type of comparison is inverted, but GVN only runs late in the pipeline.
   As a result, you may miss the opportunity to run other important
   optimizations. Improvements to EarlyCSE to remove this issue are tracked in
   Bug 23333.

#. Avoid using arithmetic intrinsics unless you are *required* by your source
   language specification to emit a particular code sequence. The optimizer
   is quite good at reasoning about general control flow and arithmetic; it is
   not anywhere near as strong at reasoning about the various intrinsics. If
   profitable for code generation purposes, the optimizer will likely form the
   intrinsics itself late in the optimization pipeline. It is *very* rarely
   profitable to emit these directly in the language frontend. This item
   explicitly includes the use of the :ref:`overflow intrinsics <int_overflow>`.

#. Avoid using the :ref:`assume intrinsic <int_assume>` until you've
   established that a) there's no other way to express the given fact and b)
   that fact is critical for optimization purposes. Assumes are a great
   prototyping mechanism, but they can have negative effects on both compile
   time and optimization effectiveness. The former is fixable with enough
   effort, but the latter is fairly fundamental to their designed purpose.


Describing Language Specific Properties
=======================================

When translating a source language to LLVM, finding ways to express concepts
and guarantees available in your source language which are not natively
provided by LLVM IR will greatly improve LLVM's ability to optimize your code.
As an example, C/C++'s ability to mark every add as "no signed wrap (nsw)" goes
a long way to assisting the optimizer in reasoning about loop induction
variables and thus generating more optimal code for loops.

The LLVM LangRef includes a number of mechanisms for annotating the IR with
additional semantic information. It is *strongly* recommended that you become
highly familiar with this document. The list below is intended to highlight a
couple of items of particular interest, but is by no means exhaustive.

Restricted Operation Semantics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

#. Add nsw/nuw flags as appropriate. Reasoning about overflow is
   generally hard for an optimizer so providing these facts from the frontend
   can be very impactful.

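   For example, a frontend whose source language makes signed overflow
   undefined can emit:

   .. code-block:: llvm

      ; Guaranteed by the frontend never to wrap in the signed sense.
      %sum = add nsw i32 %a, %b
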
#. Use fast-math flags on floating point operations if legal. If you don't
   need strict IEEE floating point semantics, there are a number of additional
   optimizations that can be performed. This can be highly impactful for
   floating point intensive computations.

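   A minimal sketch:

   .. code-block:: llvm

      ; Permits reassociation, approximation, and other IEEE-relaxing rewrites.
      %r = fmul fast double %x, %y
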
Describing Aliasing Properties
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

#. Add noalias/align/dereferenceable/nonnull to function arguments and return
   values as appropriate

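   A hedged sketch of attribute placement (the signatures are hypothetical):

   .. code-block:: llvm

      ; Arguments asserted not to alias; the return value is known non-null.
      declare void @copy(i8* noalias, i8* noalias readonly)
      declare nonnull i32* @lookup(i64)
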
#. Use pointer aliasing metadata, especially tbaa metadata, to communicate
   otherwise-non-deducible pointer aliasing facts

#. Use inbounds on geps. This can help to disambiguate some aliasing queries.

Modeling Memory Effects
^^^^^^^^^^^^^^^^^^^^^^^

#. Mark functions as readnone/readonly/argmemonly or noreturn/nounwind when
   known. The optimizer will try to infer these flags, but may not always be
   able to. Manual annotations are particularly important for external
   functions that the optimizer cannot analyze.

#. Use the lifetime.start/lifetime.end and invariant.start/invariant.end
   intrinsics where possible. Common profitable uses are for stack-like data
   structures (thus allowing dead store elimination) and for describing
   lifetimes of allocas (thus allowing smaller stack sizes).

#. Mark invariant locations using !invariant.load and TBAA's constant flags

Pass Ordering
^^^^^^^^^^^^^

One of the most common mistakes made by new language frontend projects is to
use the existing -O2 or -O3 pass pipelines as is. These pass pipelines make a
good starting point for an optimizing compiler for any language, but they have
been carefully tuned for C and C++, not your target language. You will almost
certainly need to use a custom pass order to achieve optimal performance. A
couple of specific suggestions:

#. For languages with numerous rarely executed guard conditions (e.g. null
   checks, type checks, range checks) consider adding an extra execution or
   two of LoopUnswitch and LICM to your pass order. The standard pass order,
   which is tuned for C and C++ applications, may not be sufficient to remove
   all dischargeable checks from loops.

#. If your language uses range checks, consider using the IRCE pass. It is not
   currently part of the standard pass order.

#. A useful sanity check to run is to run your optimized IR back through the
   -O2 pipeline again. If you see noticeable improvement in the resulting IR,
   you likely need to adjust your pass order.

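   One hedged way to perform this check with the standard tools (assuming your
   optimized IR is in ``once.ll`` and ``opt`` is on your path):

   .. code-block:: console

      $ opt -O2 -S once.ll -o twice.ll
      $ diff once.ll twice.ll    # large diffs suggest a pass ordering problem
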
I Still Can't Find What I'm Looking For
=======================================

If you didn't find what you were looking for above, consider proposing a piece
of metadata which provides the optimization hint you need. Such extensions are
relatively common and are generally well received by the community. You will
need to ensure that your proposal is sufficiently general so that it benefits
others if you wish to contribute it upstream.

You should also consider describing the problem you're facing on `llvm-dev
<http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ and asking for advice.
It's entirely possible someone has encountered your problem before and can
give good advice. If there are multiple interested parties, that also
increases the chances that a metadata extension would be well received by the
community as a whole.

Adding to this document
=======================

If you run across a case that you feel deserves to be covered here, please send
a patch to `llvm-commits
<http://lists.llvm.org/mailman/listinfo/llvm-commits>`_ for review.

If you have questions on these items, please direct them to `llvm-dev
<http://lists.llvm.org/mailman/listinfo/llvm-dev>`_. The more relevant
context you are able to give to your question, the more likely it is to be
answered.