This commit adds a new target-independent calling convention for C++ TLS
access functions. It aims to minimize overhead in the caller by perserving as
many registers as possible.
The target-specific implementation for X86-64 is defined as following:
Arguments are passed as for the default C calling convention
The same applies for the return value(s)
The callee preserves all GPRs - except RAX and RDI
The access function makes C-style TLS function calls in the entry and exit
block, C-style TLS functions save a lot more registers than normal calls.
The added calling convention ties into the existing implementation of the
C-style TLS functions, so we can't simply use existing calling conventions
such as preserve_mostcc.
rdar://9001553
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254737 91177308-0d34-0410-b5e6-96231b3b80d8
This is a continuation of r253367.
These functions return is owned by the caller, so they return
std::unique_ptr now.
The call can fail, so the return is wrapped in ErrorOr.
They have a context where to report diagnostics, so they don't need to
take a string out parameter.
With this there are no call to getGlobalContext in lib/LTO.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254721 91177308-0d34-0410-b5e6-96231b3b80d8
Since BuildMI() automatically adds the implicit operands for a new instruction,
adding the old instructions CC operand resulted in that there were two CC imp-def
operands, where only one was marked as dead. This caused buildSchedGraph() to
miss dependencies on the CC reg.
Review by Ulrich Weigand
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254714 91177308-0d34-0410-b5e6-96231b3b80d8
Add new x86 pass which replaces address calculations in load or store instructions with def register of existing LEA (must be in the same basic block), if the LEA calculates address that differs only by a displacement. Works only with -Os or -Oz.
Differential Revision: http://reviews.llvm.org/D13294
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254712 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If we remove the MMOs from Load/Store instructions,
they are treated as volatile. This makes other optimization passes unhappy.
eg. Load/Store Optimization
So, it looks better to merge, not remove.
Reviewers: gberry, mcrosier
Subscribers: gberry, llvm-commits
Differential Revision: http://reviews.llvm.org/D14797
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254694 91177308-0d34-0410-b5e6-96231b3b80d8
This class is turning into a useful interface, rather than an implementation
detail, so I'm dropping the 'Base' suffix.
No functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254693 91177308-0d34-0410-b5e6-96231b3b80d8
with its source instead of forcing the values on GPRs.
This improves the lowering of vector code when such bitcasts happen in the
middle of vector computations.
rdar://problem/23691584
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254684 91177308-0d34-0410-b5e6-96231b3b80d8
Re-comitting with a change that avoids undefined uses getting put into
the VRegUses list.
The new algorithm remembers the uses encountered while walking backwards
until a matching def is found. Contrary to the previous version this:
- Works without LiveIntervals being available
- Allows to increase the precision to subregisters/lanemasks
(not used for now)
The changes in the AMDGPU tests are necessary because the R600 scheduler
is not stable with respect to the order of nodes in the ready queues.
Differential Revision: http://reviews.llvm.org/D9068
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254683 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
computeRegisterLiveness and analyzePhysReg are currently getting
confused about liveness in some cases, breaking copyPhysReg's
calculation of whether AX is dead in some cases. Work around this issue
temporarily by assuming that AX is always live.
See detail in: https://llvm.org/bugs/show_bug.cgi?id=25033#c7
And associated bugs PR24535 PR25033 PR24991 PR24992 PR25201.
This workaround makes the code correct but slightly inefficient, but it
seems to confuse the machine instr verifier which now things EAX was
undefined in some cases where it's being conservatively saved /
restored.
Reviewers: majnemer, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15198
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254680 91177308-0d34-0410-b5e6-96231b3b80d8
With the latest refactoring and code sharing patches landed,
it is possible to unify the value profile implementation between
raw and indexed profile. This is the patch in raw profile reader
that uses the common interface.
Differential Revision: http://reviews.llvm.org/D15056
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254677 91177308-0d34-0410-b5e6-96231b3b80d8
CFI emits jump slots for indirect functions as a byte array
constant, and declares function-typed aliases to these constants.
This change fixes AsmPrinter to emit these aliases as function
symbols and not data symbols.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254674 91177308-0d34-0410-b5e6-96231b3b80d8
Currently in LLVM's cost model, a vectorized arithmetic instruction will have
high cost if its type is split into multiple registers. However, this
punishment is too heavy and unnecessary. The overhead of the split should not
be on arithmetic instructions but instructions that implement the split. Note
that during vectorization we have calculated the register pressure, and we
only choose proper interleaving factor (and also vectorization factor) so
that we don't use more registers than the maximum number.
Here is a very simple example: if a vadd has the cost 1, and if we double VF
so that we need two registers to perform it, then its cost will become 4 with
the current implementation, which will prevent us to use larger VF.
Differential revision: http://reviews.llvm.org/D15159
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254671 91177308-0d34-0410-b5e6-96231b3b80d8
This change adds support for an optional weight when merging profile data with the llvm-profdata tool.
Weights are specified by adding an option ':<weight>' suffix to the input file names.
Adding support for arbitrary weighting of input profile data allows for relative importance to be placed on the
input data from multiple training runs.
Both sampled and instrumented profiles are supported.
Reviewers: dnovillo, bogner, davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14547
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254669 91177308-0d34-0410-b5e6-96231b3b80d8
Code generation often exposes redundant physical register copies through
virtual registers such as:
%vreg = COPY %PHYSREG
...
%PHYSREG = COPY %vreg
There are cases where no intervening clobber of %PHYSREG occurs, and the
later copy could therefore be removed. In some cases this further allows
us to remove the initial copy.
This patch contains a motivating example which comes from the x86 build
of Chrome, specifically cc::ResourceProvider::UnlockForRead uses
libstdc++'s implementation of hash_map. That example has two tests live
at the same time, and after machine sinking LLVM has confused itself
enough and things spilling EFLAGS is a great idea even though it's
never restored and the comparison results are both live.
Before this patch we have:
DEC32m %RIP, 1, %noreg, <ga:@L>, %noreg, %EFLAGS<imp-def>
%vreg1<def> = COPY %EFLAGS; GR64:%vreg1
%EFLAGS<def> = COPY %vreg1; GR64:%vreg1
JNE_1 <BB#1>, %EFLAGS<imp-use>
Both copies are useless. This patch tries to eliminate the later copy in
a generic manner.
dec is especially confusing to LLVM when compared with sub.
I wrote this patch to treat all physical registers generically, but only
remove redundant copies of non-allocatable physical registers because
the allocatable ones caused issues (e.g. when calling conventions weren't
properly modeled) and should be handled later by the register allocator
anyways.
The following tests used to failed when the patch also replaced allocatable
registers:
CodeGen/X86/StackColoring.ll
CodeGen/X86/avx512-calling-conv.ll
CodeGen/X86/copy-propagation.ll
CodeGen/X86/inline-asm-fpstack.ll
CodeGen/X86/musttail-varargs.ll
CodeGen/X86/pop-stack-cleanup.ll
CodeGen/X86/preserve_mostcc64.ll
CodeGen/X86/tailcallstack64.ll
CodeGen/X86/this-return-64.ll
This happens because COPY has other special meaning for e.g. dependency
breakage and x87 FP stack.
Note that all other backends' tests pass.
Reviewers: qcolombet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15157
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254665 91177308-0d34-0410-b5e6-96231b3b80d8
When a block has no terminator instructions, getFirstTerminator() returns
end(), which can't be used in dominance checks. Check dominance for phi
operands separately.
Also, remove some bits from WebAssemblyRegStackify.cpp that were causing
trouble on the same testcase; they were left behind from an earlier
experiment.
Differential Revision: http://reviews.llvm.org/D15210
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254662 91177308-0d34-0410-b5e6-96231b3b80d8
The compiler can take advantage of the allocation/deallocation
function's properties. We knew how to do this for Itanium but had no
support for MSVC-style functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254656 91177308-0d34-0410-b5e6-96231b3b80d8
Almost all these changes are conditioned and only apply to the new
x86-64 f128 type configuration, which will be enabled in a follow up
patch. They are required together to make new f128 work. If there is
any error, we should fix or revert them as a whole.
These changes should have no impact to current configurations.
* Relax type legalization checks to accept new f128 type configuration,
whose TypeAction is TypeSoftenFloat, not TypeLegal, but also has
TLI.isTypeLegal true.
* Relax GetSoftenedFloat to return in some cases f128 type SDValue,
which is TLI.isTypeLegal but not "softened" to i128 node.
* Allow customized FABS, FNEG, FCOPYSIGN on new f128 type configuration,
to generate optimized bitwise operators for libm functions.
* Enhance related Lower* functions to handle f128 type.
* Enhance DAGTypeLegalizer::run, SoftenFloatResult, and related functions
to keep new f128 type in register, and convert f128 operators to library calls.
* Fix Combiner, Emitter, Legalizer routines that did not handle f128 type.
* Add ExpandConstant to handle i128 constants, ExpandNode
to handle ISD::Constant node.
* Add one more parameter to getCommonSubClass and firstCommonClass,
to guarantee that returned common sub class will contain the specified
simple value type.
This extra parameter is used by EmitCopyFromReg in InstrEmitter.cpp.
* Fix infinite loop in getTypeLegalizationCost when f128 is the value type.
* Fix printOperand to handle null operand.
* Enhance ISD::BITCAST node to handle f128 constant.
* Expand new f128 type for BR_CC, SELECT_CC, SELECT, SETCC nodes.
* Enhance X86AsmPrinter to emit f128 values in comments.
Differential Revision: http://reviews.llvm.org/D15134
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254653 91177308-0d34-0410-b5e6-96231b3b80d8
DenseMap is most applicable when both keys and values are small.
In this case, the value violates that assumption, causing quite
significant memory overhead. A std::unordered_map is more appropriate
in this case (or at least fixed the memory problems I was seeing).
Differential Revision: http://reviews.llvm.org/D14910
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254651 91177308-0d34-0410-b5e6-96231b3b80d8