The truncation was causing the sorting algorithm to behave oddly when comparing
positive and negative offsets. Fortunately, this doesn't currently happen in
practice and was exposed by a WIP. Thus, I can't test this change now, but the
follow on patch will.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263255 91177308-0d34-0410-b5e6-96231b3b80d8
llvm::getDISubprogram walks the instructions in a function, looking for one in the scope of the current function, so that it can find the !dbg entry for the subprogram itself.
Now that !dbg is attached to functions, this should not be necessary. This patch changes all uses to just query the subprogram directly on the function.
Ideally this should be NFC, but in reality its possible that a function:
has no !dbg (in which case there's likely a bug somewhere in an opt pass), or
that none of the instructions had a scope referencing the function, so we used to not find the !dbg on the function but now we will
Reviewed by Duncan Exon Smith.
Differential Revision: http://reviews.llvm.org/D18074
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263184 91177308-0d34-0410-b5e6-96231b3b80d8
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).
Differential Revision: http://reviews.llvm.org/D17691
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263159 91177308-0d34-0410-b5e6-96231b3b80d8
This is a fairly straightforward port to the new pass manager with one
exception. It removes a very questionable use of releaseMemory() in
the old pass to invalidate its caches between runs on a function.
I don't think this is really guaranteed to be safe. I've just used the
more direct port to the new PM to address this by nuking the results
object each time the pass runs. While this could cause some minor malloc
traffic increase, I don't expect the compile time performance hit to be
noticable, and it makes the correctness and other aspects of the pass
much easier to reason about. In some cases, it may make things faster by
making the sets and maps smaller with better locality. Indeed, the
measurements collected by Bruno (thanks!!!) show mostly compile time
improvements.
There is sadly very limited testing at this point as there are only two
tests of memdep, and both rely on GVN. I'll be porting GVN next and that
will exercise this heavily though.
Differential Revision: http://reviews.llvm.org/D17962
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263082 91177308-0d34-0410-b5e6-96231b3b80d8
This patch teaches CGP to duplicate addressing mode computations into cold paths (detected via explicit cold attribute on calls) if required to let addressing mode be safely sunk into the basic block containing each load and store.
In general, duplicating code into cold blocks may result in code growth, but should not effect performance. In this case, it's better to duplicate some code than to put extra pressure on the register allocator by making it keep the address through the entirely of the fast path.
This patch only handles addressing computations, but in principal, we could implement a more general cold cold scheduling heuristic which tries to reduce register pressure in the fast path by duplicating code into the cold path. Getting the profitability of the general case right seemed likely to be challenging, so I stuck to the existing case (addressing computation) we already had.
Differential Revision: http://reviews.llvm.org/D17652
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263074 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The code in SelectionDAG did not handle the case where the
register type and output types were different, but had the same size.
Reviewers: arsenm, echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17940
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263022 91177308-0d34-0410-b5e6-96231b3b80d8
This re-applies r262886 with a fix for 32 bit platforms that have 8 byte
pointer alignment, effectively reverting r262892.
Original Message:
Currently some SDNode operands are malloc'd, some are stored inline in
subclasses of SDNode, and some are thrown into a BumpPtrAllocator.
This scheme is complex, inconsistent, and makes refactoring SDNodes
fairly difficult.
Instead, we can allocate all of the operands using an ArrayRecycler
that wraps a BumpPtrAllocator. This keeps the cache locality when
iterating operands, improves locality when iterating SDNodes without
looking at operands, and vastly simplifies the ownership semantics.
It also means we stop overallocating SDNodes by 2-3x and will make it
simpler to fix the rampant undefined behaviour we have in how we
mutate SDNodes from one kind to another (See llvm.org/pr26808).
This is NFC other than the changes in memory behaviour, and I ran some
LNT tests to make sure this didn't hurt compile time. Not many tests
changed: there were a couple of 1-2% regressions reported, but there
were more improvements (of up to 4%) than regressions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262902 91177308-0d34-0410-b5e6-96231b3b80d8
Looks like the largest SDNode is different between 32 and 64 bit now,
so this is breaking 32 bit bots. Reverting while I figure out a fix.
This reverts r262886.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262892 91177308-0d34-0410-b5e6-96231b3b80d8
Currently some SDNode operands are malloc'd, some are stored inline in
subclasses of SDNode, and some are thrown into a BumpPtrAllocator.
This scheme is complex, inconsistent, and makes refactoring SDNodes
fairly difficult.
Instead, we can allocate all of the operands using an ArrayRecycler
that wraps a BumpPtrAllocator. This keeps the cache locality when
iterating operands, improves locality when iterating SDNodes without
looking at operands, and vastly simplifies the ownership semantics.
It also means we stop overallocating SDNodes by 2-3x and will make it
simpler to fix the rampant undefined behaviour we have in how we
mutate SDNodes from one kind to another (See llvm.org/pr26808).
This is NFC other than the changes in memory behaviour, and I ran some
LNT tests to make sure this didn't hurt compile time. Not many tests
changed: there were a couple of 1-2% regressions reported, but there
were more improvements (of up to 4%) than regressions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262886 91177308-0d34-0410-b5e6-96231b3b80d8
Before this change, we would get the type definition in the middle
of the instruction.
E.g., %0(48) = G_ADD %struct_alias = type { i32, i16 } %edi, %edi
Now, we have just the expected type name:
%0(48) = G_ADD %struct_alias %edi, %edi
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262885 91177308-0d34-0410-b5e6-96231b3b80d8
Now the type API is always available, but when global-isel is not
built the implementation does nothing.
Note: The implementation free of ifdefs is WIP and tracked here in PR26576.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262873 91177308-0d34-0410-b5e6-96231b3b80d8
copy coalescing with enabled subregister liveness can reveal undef uses,
previously this was only checked for the SrcReg in updateRegDefsUses()
but we need to check DstReg as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262767 91177308-0d34-0410-b5e6-96231b3b80d8
The divrem combine assumed the type of the div/rem is simple, which isn't
necessarily true. This probably worked fine until r250825, since it only
saw legal types, but now breaks when it runs as a pre-type-legalization
combine.
This fixes PR26835.
Differential Revision: http://reviews.llvm.org/D17878
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262746 91177308-0d34-0410-b5e6-96231b3b80d8
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make
sure we only emit valid types or the ones that were explicitly marked as custom.
Now, passing check-all and test-suite on x86, ARM and AArch64.
This patch fixes PR17193 (and a long time FIXME in the tests).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262738 91177308-0d34-0410-b5e6-96231b3b80d8
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Differential Revision: http://reviews.llvm.org/D17691
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262599 91177308-0d34-0410-b5e6-96231b3b80d8
If we have a loop with a rarely taken path, we will prune that from the blocks which get added as part of the loop chain. The problem is that we weren't then recognizing the loop chain as schedulable when considering the preheader when forming the function chain. We'd then fall to various non-predecessors before finally scheduling the loop chain (as if the CFG was unnatural.) The net result was that there could be lots of garbage between a loop preheader and the loop, even though we could have directly fallen into the loop. It also meant we separated hot code with regions of colder code.
The particular reason for the rejection of the loop chain was that we were scanning predecessor of the header, seeing the backedge, believing that was a globally more important predecessor (true), but forgetting to account for the fact the backedge precessor was already part of the existing loop chain (oops!.
Differential Revision: http://reviews.llvm.org/D17830
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262547 91177308-0d34-0410-b5e6-96231b3b80d8
Catch objects with a displacement of zero do not initialize a catch
object. The displacement is relative to %rsp at the end of the
function's prologue for x86_64 targets.
If we place an object at the top-of-stack, we will end up wit a
displacement of zero resulting in our catch object remaining
uninitialized.
Address this by creating our catch objects as fixed objects. We will
ensure that the UnwindHelp object is created after the catch objects so
that no catch object will have a displacement of zero.
Differential Revision: http://reviews.llvm.org/D17823
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262546 91177308-0d34-0410-b5e6-96231b3b80d8
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
This patch fixes PR17193 (and a long time FIXME in the tests).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262507 91177308-0d34-0410-b5e6-96231b3b80d8
The placement new calls here were all calling the allocation function
in RecyclingAllocator/Recycler for SDNode, instead of the function for
the specific subclass we were constructing.
Since this particular allocator always overallocates it more or less
worked, but would hide what we're actually doing from any memory
tools. Also, if you tried to change this allocator so something like a
BumpPtrAllocator or MallocAllocator, the compiler would crash horribly
all the time.
Part of llvm.org/PR26808.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262500 91177308-0d34-0410-b5e6-96231b3b80d8
On AMDGPU where operations i64 operations are often bitcasted to v2i32
and back, this pattern shows up regularly where it breaks some
expected combines on i64, such as load width reducing.
This fixes some test failures in a future commit when i64 loads
are changed to promote.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262397 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r262316.
It seems that my change breaks an out-of-tree chromium buildbot, so
I'm reverting this in order to investigate the situation further.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262387 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Calls sometimes need to be convergent. This is already handled at the
LLVM IR level, but it also needs to be handled at the MI level.
Ideally we'd propagate convergence from instructions, down through the
selection DAG, and into MIs. But this is Hard, and would affect
optimizations in the SDNs -- right now only SDNs with two operands have
any flags at all.
Instead, here's a much simpler hack: Add new opcodes for NVPTX for
convergent calls, and generate these when lowering convergent LLVM
calls.
Reviewers: jholewinski
Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits
Differential Revision: http://reviews.llvm.org/D17423
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262373 91177308-0d34-0410-b5e6-96231b3b80d8
This reduces the number of bitcast nodes and generally cleans up the
DAG when bitcasting between integers and vectors everywhere.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262358 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch modifies the existing comparison, branch, conditional-move
and select patterns, and adds new ones where needed. Also, the updated
SLT{u,i,iu} set of instructions generate a GPR width result.
The majority of the code changes in the Mips back-end fix the wrong
assumption that the result of SETCC nodes always produce an i32 value.
The changes in the common code path account for the fact that in 64-bit
MIPS targets, i1 is promoted to i32 instead of i64.
Reviewers: dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D10970
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262316 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes regressions exposed in existing AMDGPU tests in a
future commit when all loads are custom lowered.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262299 91177308-0d34-0410-b5e6-96231b3b80d8
The CatchObjOffset is relative to the end of the EH registration node
for 32-bit x86 WinEH targets. A special sentinel value, 0, is used to
indicate that no catch object should be initialized.
This means that a catch object allocated immediately before the
registration node would be assigned a CatchObjOffset of 0, leading the
runtime to believe that a catch object should not be initialized.
To handle this, allocate the registration node prior to any other frame
object. This will ensure that catch objects will not be allocated
before the registration node.
This fixes PR26757.
Differential Revision: http://reviews.llvm.org/D17689
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262294 91177308-0d34-0410-b5e6-96231b3b80d8
When a variable is described by a single DBG_VALUE instruction we can
often use a more efficient inline DW_AT_location instead of using a
location list.
This commit makes the heuristic that decides when to apply this
optimization stricter by also verifying that the DBG_VALUE is live at the
entry of the function (instead of just checking that it is valid until
the end of the function).
<rdar://problem/24611008>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262247 91177308-0d34-0410-b5e6-96231b3b80d8
InlineSpiller::rematerializeFor() never uses its parameter as an
iterator, so take it by reference instead. This removes an implicit
conversion from MachineBasicBlock::iterator to MachineInstr*.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262152 91177308-0d34-0410-b5e6-96231b3b80d8
Change MachineInstr API to prefer MachineInstr& over MachineInstr*
whenever the parameter is expected to be non-null. Slowly inching
toward being able to fix PR26753.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262149 91177308-0d34-0410-b5e6-96231b3b80d8
In the case where op = add, y = base_ptr, and x = offset, this
transform:
(op y, (op x, c1)) -> (op (op x, y), c1)
breaks the canonical form of add by putting the base pointer in the
second operand and the offset in the first.
This fix is important for the R600 target, because for some address
spaces the base pointer and the offset are stored in separate register
classes. The old pattern caused the ISel code for matching addressing
modes to put the base pointer and offset in the wrong register classes,
which required no-trivial code transformations to fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262148 91177308-0d34-0410-b5e6-96231b3b80d8
Take parameters as MachineInstr& instead of MachineInstr* in
AntiDepBreaker API, since these are required to be non-null. No
functionality change intended. Looking toward PR26753.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262145 91177308-0d34-0410-b5e6-96231b3b80d8
This already assumes a valid MI, since it dereferences the MI in an
assertion before checking for null. At an explicit assert.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262144 91177308-0d34-0410-b5e6-96231b3b80d8
In all but one case, change the DFAPacketizer API to take MachineInstr&
instead of MachineInstr*. In DFAPacketizer::endPacket(), take
MachineBasicBlock::iterator. Besides cleaning up the API, this is in
search of PR26753.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262142 91177308-0d34-0410-b5e6-96231b3b80d8
Update APIs in MachineInstrBundle.h to take and return MachineInstr&
instead of MachineInstr* when the instruction cannot be null. Besides
being a nice cleanup, this is tacking toward a fix for PR26753.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262141 91177308-0d34-0410-b5e6-96231b3b80d8
Take MachineInstr by reference instead of by pointer in SlotIndexes and
the SlotIndex wrappers in LiveIntervals. The MachineInstrs here are
never null, so this cleans up the API a bit. It also incidentally
removes a few implicit conversions from MachineInstrBundleIterator to
MachineInstr* (see PR26753).
At a couple of call sites it was convenient to convert to a range-based
for loop over MachineBasicBlock::instr_begin/instr_end, so I added
MachineBasicBlock::instrs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262115 91177308-0d34-0410-b5e6-96231b3b80d8
MBB slot index intervals are half open, not closed. getMBBEndIndex()
returns the slot index of the start of the next block in layout order.
Placing a register mask there is incorrect if the successor of the
funclet return is not laid out after the return. Clang generates IR for
catch bodies before generating the following normal code, so we never
noticed this issue until the D frontend authors filed a bug about it.
Instead, we can put the clobber mask on the last instruction of the
funclet return block. We still aren't using a register mask operand on
the CATCHRET instruction because it would cause PEI to spill all CSRs,
including XMM regs, in the prologue.
Fixes PR26679.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262035 91177308-0d34-0410-b5e6-96231b3b80d8
This also simplifies the code by removing the overly conservative
NoInterveningSideEffect() function. This function checked:
- That the two copies belong to the same block: We only process one
block at a time and clear our maps in between it is impossible to find a
copy from a different block.
- There is no terminator between the two copy instructions: This is not
allowed anyway (the MachineVerifier would complain)
- Does not have instructions with hasUnmodeledSideEffects() or isCall()
set: Even for those instructuction we must have all clobbers/defs of
registers explicit as an operand. If the register is explicitely
clobbered we would never come to the point of checking for
NoInterveningSideEffect() anyway.
(I also checked this with a temporary build of the test-suite with all
potentially failing conditions in NoInterveningSideEffect() turned into
asserts)
Differential Revision: http://reviews.llvm.org/D17474
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261965 91177308-0d34-0410-b5e6-96231b3b80d8
Inline-asm calls aren't annotated with funclet bundle operands because
they don't throw and cannot be inlined through. We shouldn't require
them to bear an funclet bundle operand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261942 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Both the hardware and LLVM have changed since 2012.
Now, load-based heuristic don't show big differences any more on OoO cores.
There is no notable regressons and improvements on spec2000/2006. (Cortex-A57, Core i5).
Reviewers: spatel, zansari
Differential Revision: http://reviews.llvm.org/D16836
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261809 91177308-0d34-0410-b5e6-96231b3b80d8
(This is the second attemp to commit this patch, after fixing pr26652 & pr26653).
This patch detects vector reductions before instruction selection. Vector
reductions are vectorized reduction operations, and for such operations we have
freedom to reorganize the elements of the result as long as the reduction of them
stay unchanged. This will enable some reduction pattern recognition during
instruction combine such as SAD/dot-product on X86. A flag is added to
SDNodeFlags to mark those vector reduction nodes to be checked during instruction
combine.
To detect those vector reductions, we search def-use chains starting from the
given instruction, and check if all uses fall into two categories:
1. Reduction with another vector.
2. Reduction on all elements.
in which 2 is detected by recognizing the pattern that the loop vectorizer
generates to reduce all elements in the vector outside of the loop, which
includes several ShuffleVector and one ExtractElement instructions.
Differential revision: http://reviews.llvm.org/D15250
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261804 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes bugs in copy elimination code in llvm. It slightly changes the
semantics of clearRegisterKills(). This is appropriate because:
- Users in lib/CodeGen/MachineCopyPropagation.cpp and
lib/Target/AArch64RedundantCopyElimination.cpp and
lib/Target/SystemZ/SystemZElimCompare.cpp are incorrect without it
(see included testcase).
- All other users in llvm are unaffected (they pass TRI==nullptr)
- (Kill flags are optional anyway so removing too many shouldn't hurt.)
Differential Revision: http://reviews.llvm.org/D17554
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261763 91177308-0d34-0410-b5e6-96231b3b80d8
This is a part of the refactoring to unify isSafeToLoadUnconditionally and isDereferenceablePointer functions. In subsequent change I'm going to eliminate isDerferenceableAndAlignedPointer from Loads API, leaving isSafeToLoadSpecualtively the only function to check is load instruction can be speculated.
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D16180
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261736 91177308-0d34-0410-b5e6-96231b3b80d8
Change TargetInstrInfo API to take `MachineInstr&` instead of
`MachineInstr*` in the functions related to predicated instructions
(I'll try to come back later and get some of the rest). All of these
functions require non-null parameters already, so references are more
clear. As a bonus, this happens to factor away a host of implicit
iterator => pointer conversions.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261605 91177308-0d34-0410-b5e6-96231b3b80d8
We supported creating mergeable constant pool entries for smaller
constants but not for 32-byte AVX constants.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261584 91177308-0d34-0410-b5e6-96231b3b80d8
This was causing assertions later from using the wrong pointer
size with LDS operations. getOptimalMemOpType should also have
address space arguments later.
This avoids assertions in existing tests exposed by
a future commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261580 91177308-0d34-0410-b5e6-96231b3b80d8
DMB instructions can be expensive, so it's best to avoid them if possible. In
atomicrmw operations there will always be an attempted store so a release
barrier is always needed, but in the cmpxchg case we can delay the DMB until we
know we'll definitely try to perform a store (and so need release semantics).
In the strong cmpxchg case this isn't quite free: we must duplicate the LDREX
instructions to skip the barrier on subsequent iterations. The basic outline
becomes:
ldrex rOld, [rAddr]
cmp rOld, rDesired
bne Ldone
dmb
Lloop:
strex rRes, rNew, [rAddr]
cbz rRes Ldone
ldrex rOld, [rAddr]
cmp rOld, rDesired
beq Lloop
Ldone:
So we'll skip this version for strong operations in "minsize" functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261568 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Convergent instrs shouldn't be made control-dependent on other values,
but this is basically the whole point of tail duplication. So just bail
if we see a convergent instruction.
Reviewers: iteratee
Subscribers: jholewinski, jhen, hfinkel, tra, jingyue, llvm-commits
Differential Revision: http://reviews.llvm.org/D17320
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261540 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r261510, effectively reapplying r261509. The
original commit missed a caller in AArch64ConditionalCompares.
Original commit message:
Pass non-null arguments by reference in MachineTraceMetrics::Trace,
simplifying future work to remove implicit iterator => pointer
conversions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261511 91177308-0d34-0410-b5e6-96231b3b80d8
Pass non-null arguments by reference in MachineTraceMetrics::Trace,
simplifying future work to remove implicit iterator => pointer
conversions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261509 91177308-0d34-0410-b5e6-96231b3b80d8
Delete MachineInstr::getIterator(), since the term "iterator" is
overloaded when talking about MachineInstr.
- Downcast to ilist_node in iplist::getNextNode() and getPrevNode() so
that ilist_node::getIterator() is still available.
- Add it back as MachineInstr::getInstrIterator(). This matches the
naming in MachineBasicBlock.
- Add MachineInstr::getBundleIterator(). This is explicitly called
"bundle" (not matching MachineBasicBlock) to disintinguish it clearly
from ilist_node::getIterator().
- Update all calls. Some of these I switched to `auto` to remove
boiler-plate, since the new name is clear about the type.
There was one call I updated that looked fishy, but it wasn't clear what
the right answer was. This was in X86FrameLowering::inlineStackProbe(),
added in r252578 in lib/Target/X86/X86FrameLowering.cpp. I opted to
leave the behaviour unchanged, but I'll reply to the original commit on
the list in a moment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261504 91177308-0d34-0410-b5e6-96231b3b80d8
I missed == and != when I removed implicit conversions between iterators
and pointers in r252380 since they were defined outside ilist_iterator.
Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
This commit removes all uses of these operators. (I'll delete the
operators themselves in a separate commit so that it can be easily
reverted if necessary.)
There should be NFC here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261498 91177308-0d34-0410-b5e6-96231b3b80d8
Stop using `getNodePtrUnchecked()` when building IR. Eventually a
dereference will be required to get at the downcast node, since the
iterator will only store an `ilist_node_base` of some sort.
This should have no functionality change for now, but is a path towards
removing some more UB from ilist.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261495 91177308-0d34-0410-b5e6-96231b3b80d8
`ilist_iterator<NodeTy>::getNodePtrUnchecked()` is documented as being
for internal use only, but CodeGenPrepare was using it anyway. This
code relies on pulling out the `Value*` pointer even after the lifetime
of the iterator is over. But having this pointer available in
ilist_iterator depends on UB in the first place.
Instead, safely pull out the `Value*` when the iterator is alive and
stop using the internal-only API.
There should be no functionality change here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261493 91177308-0d34-0410-b5e6-96231b3b80d8
COFF doesn't have sections with mergeable contents. Instead, each
constant pool entry ends up in a COMDAT section. The linker, when
choosing between COMDAT sections, doesn't choose the max alignment of
the two sections. You just get whatever alignment was on the section.
If one constant needed a higher alignment in one object file from
another one, then we will get into trouble if the linker chooses the
lower alignment one.
Instead, lets promote the alignment of the constant pool entry to make
sure we don't use an under aligned constant with an instruction which
assumed otherwise.
This fixes PR26680.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261462 91177308-0d34-0410-b5e6-96231b3b80d8
TailDuplicate can run on either on SSA code or non-SSA code, as indicated to
it by MRI->isSSA() ("PreRegAlloc" here). TailDuplicate does extra work to
preserve SSA invariants when it duplicates code. This patch makes it skip
some of this extra work in the case where the code is not in SSA form.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261450 91177308-0d34-0410-b5e6-96231b3b80d8
This avoids unnecessarily passing them around when calling helper
functions. It may also be slightly faster to call clear() on the
datastructures instead of freshly initializing them for each block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261407 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we don't always add an element to AllocatedStackSlots if we
don't find a pre-existing unallocated stack slot, bumping
StatepointMaxSlotsRequired to `NumSlots + 1` is not correct. Instead
bump the statistic near the push_back, to
Builder.FuncInfo.StatepointStackSlots.size().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261348 91177308-0d34-0410-b5e6-96231b3b80d8
The check on MFI->getObjectSize() has to be on the FrameIndex, not on
the index of the FrameIndex in AllocatedStackSlots. Weirdly, the tests
I added in rL261336 didn't catch this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261347 91177308-0d34-0410-b5e6-96231b3b80d8
NFCI. They key motivation here is that I'd like to use
SmallBitVector::all() in a later change. Also, using a bit vector here
seemed better in general.
The only interesting change here is that in the failure case of
allocateStackSlot, we no longer (the equivalent of) push_back(true) to
AllocatedStackSlots. As far as I can tell, this is fine, since we'd
never re-use those slots in the same StatepointLoweringState instance.
Technically there was no need to change the operator[] type accesses to
set() and test(), but I thought it'd be nice to make it obvious that
we're using something other than a std::vector like thing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261337 91177308-0d34-0410-b5e6-96231b3b80d8
allocateStackSlot did not consider the size of the value to be spilled
before deciding to re-use a spill slot. This was originally okay (since
originally we'd only ever spill pointers), but it became not okay when
we changed our scheme to directly spill vectors of pointers.
While this change fixes the bug pointed out, it has two performance
caveats:
- It matches spill slot and spillee size exactly, while in theory we
can spill, e.g., an 8 byte pointer into a 16 byte slot. This is
slightly complicated to fix since in the stackmaps section, we report
the size of the spill slot as the size of the "indirect value"; and
if they're no longer equivalent, we'll have to keep track of the
(indirect) value size separately from the stack slot size.
- It will "spuriously run out" of reusable slots, since we now have an
second check in the search loop in addition to the availablity
check (e.g. you had two free scalar slots, and you first ask for a
vector slot followed by a scalar slot). I'll fix this in a later
commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261336 91177308-0d34-0410-b5e6-96231b3b80d8
This removes the unusual loop structure in allocateStackSlot in favor of
something more straightforward. I've also removed the cautionary
comment in the function, which I suspect is historical cruft now, and
confuses more than it enlightens.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261335 91177308-0d34-0410-b5e6-96231b3b80d8
Certain optimization passes (like globaldce) can prune function
declaration that SjLjEHPrepare assumed would exit when it'd
runOnFunction.
This fixes PR26669.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261303 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Without this, this command
$ llvm-run llc -stop-after machine-cp -o - <( echo '' )
outputs an error, because we close stdout twice -- once when closing the
file opened for "-o", and again when closing outs().
Also clarify in the outs() definition that you can't ever call it if you
want to open your own raw_fd_ostream on stdout.
Reviewers: jroelofs, tstellarAMD
Subscribers: jholewinski, qcolombet, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D17422
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261286 91177308-0d34-0410-b5e6-96231b3b80d8
Today, we do not allow cmpxchg operations with pointer arguments. We require the frontend to insert ptrtoint casts and do the cmpxchg in integers. While correct, this is problematic from a couple of perspectives:
1) It makes the IR harder to analyse (for instance, it make capture tracking overly conservative)
2) It pushes work onto the frontend authors for no real gain
This patch implements the simplest form of IR support. As we did with floating point loads and stores, we teach AtomicExpand to convert back to the old representation. This prevents us needing to change all backends in a single lock step change. Over time, we can migrate each backend to natively selecting the pointer type. In the meantime, we get the advantages of a cleaner IR representation without waiting for the backend changes.
Differential Revision: http://reviews.llvm.org/D17413
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261281 91177308-0d34-0410-b5e6-96231b3b80d8
covmap needs to created as non allocatable, but not with
SHT_NOTE. The latter was needed to workaround a problem
of BFD linker with gc, which is no longer needed. (A more
proper longer term fix requires changing FE driver to force
referencing the section using linker script).
Differential Revision: http://reviews.llvm.org/D17309
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261228 91177308-0d34-0410-b5e6-96231b3b80d8
The commit breaks stage2 compilation on PowerPC. Reverting for now while
this is analyzed. I also have to revert the LiveIntervalTest for now as
that depends on this commit.
Revert "LiveIntervalAnalysis: Remove LiveVariables requirement"
This reverts commit r260806.
Revert "Remove an unnecessary std::move to fix -Wpessimizing-move warning."
This reverts commit r260931.
Revert "Fix typo in LiveIntervalTest"
This reverts commit r260907.
Revert "Add unittest for LiveIntervalAnalysis::handleMove()"
This reverts commit r260905.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261189 91177308-0d34-0410-b5e6-96231b3b80d8
This patch detects vector reductions before instruction selection. Vector
reductions are vectorized reduction operations, and for such operations we have
freedom to reorganize the elements of the result as long as the reduction of them
stay unchanged. This will enable some reduction pattern recognition during
instruction combine such as SAD/dot-product on X86. A flag is added to
SDNodeFlags to mark those vector reduction nodes to be checked during instruction
combine.
To detect those vector reductions, we search def-use chains starting from the
given instruction, and check if all uses fall into two categories:
1. Reduction with another vector.
2. Reduction on all elements.
in which 2 is detected by recognizing the pattern that the loop vectorizer
generates to reduce all elements in the vector outside of the loop, which
includes several ShuffleVector and one ExtractElement instructions.
Differential revision: http://reviews.llvm.org/D15250
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261070 91177308-0d34-0410-b5e6-96231b3b80d8
This apparently comes up when the register allocator decides that a
variable will become undef along a certain path.
Also improve the error message we emit when we can't map from LLVM
register number to CV register number.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261016 91177308-0d34-0410-b5e6-96231b3b80d8
Eventually we should find a way to describe constant variables, but it
is not obvious how to do this at the moment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261010 91177308-0d34-0410-b5e6-96231b3b80d8
Original message:
Get rid of the ifdefs in TargetLowering.
Introduce a new API used only by GlobalISel: CallLowering.
This API will contain target hooks dedicated to call lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260998 91177308-0d34-0410-b5e6-96231b3b80d8