Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.
In addition to some basic sanity checks (e.g. that we get constant integers
where we expect constant integers), this checks:
- That by the time a struct access tuple `(base-type, offset)` is
"reduced" to a scalar base type, the offset is `0`. For instance, in
C++ you can't start from, say `("struct-a", 16)`, and end up with
`("int", 4)` -- by the time the base type is `"int"`, the offset
better be zero. In particular, a variant of this invariant is needed
for `llvm::getMostGenericTBAA` to be correct.
- That there are no cycles in a struct path.
- That struct type nodes have their offsets listed in an ascending
order.
- That when generating the struct access path, you eventually reach the
access type listed in the tbaa tag node.
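For reference, here is a minimal sketch of struct-path TBAA metadata that
satisfies all of the checks above (the type names are made up for
illustration; they are not taken from the patch):

  %val = load i32, i32* %p, !tbaa !4

  !0 = !{!"Simple C++ TBAA"}                 ; root
  !1 = !{!"omnipotent char", !0, i64 0}      ; scalar type node
  !2 = !{!"int", !1, i64 0}                  ; scalar type node
  !3 = !{!"struct-a", !2, i64 0, !2, i64 4}  ; struct node, offsets ascending
  !4 = !{!3, !2, i64 4}                      ; access tag: (base, access, offset)

  ; Walking the path: ("struct-a", 4) steps into the int field at offset 4
  ; and reaches ("int", 0) -- the offset is 0 once the base type is scalar,
  ; the path is acyclic, and it ends at the access type !2.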
Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D26438
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289402 91177308-0d34-0410-b5e6-96231b3b80d8
This teaches SimplifyDemandedElts that the FMA can be removed if the lower element isn't used. It also teaches it that if the upper elements of the first operand aren't used, then we can simplify them.
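A sketch of the first case (illustrative IR, not taken from the patch): the
scalar FMA computes only element 0 and passes the upper elements of its
first operand through, so if element 0 of the result is never demanded, the
whole call can be replaced by that operand.

  %f = call <4 x float> @llvm.x86.fma.vfmadd.ss(<4 x float> %a, <4 x float> %b, <4 x float> %c)
  %u = extractelement <4 x float> %f, i32 1  ; only an upper element is used
  ; -> %u = extractelement <4 x float> %a, i32 1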
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289377 91177308-0d34-0410-b5e6-96231b3b80d8
The motivating example is:
extern int patatino;
int goo() {
  int x = 0;
  for (int i = 0; i < 1000000; ++i) {
    x *= patatino;
  }
  return x;
}
Currently SCCP will not realize that this function always returns zero, and
will therefore try to unroll and vectorize the loop at -O3, producing an
awful lot of (useless) code. With this change, it will just produce:
0000000000000000 <g>:
  xor %eax,%eax
  retq
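In IR terms, a sketch of why this is now provable: the recurrence for x only
ever multiplies the constant 0, and 0 * anything is 0 even when the other
operand is overdefined.

  %x = phi i32 [ 0, %entry ], [ %mul, %loop ]
  %val = load i32, i32* @patatino
  %mul = mul i32 %x, %val   ; mul with a known zero operand folds to 0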
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289175 91177308-0d34-0410-b5e6-96231b3b80d8
Currently SCCP folds the value to -1, while ConstantProp folds it to 0.
This changes SCCP to do what ConstantFolding does.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289147 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Attaching !absolute_symbol to a global variable does two things:
1) Marks it as an absolute symbol reference.
2) Specifies the value range of that symbol's address.
Teach the X86 backend to allow absolute symbols to appear in place of
immediates by extending the relocImm and mov64imm32 matchers. Start using
relocImm in more places where it is legal.
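A sketch of the metadata (hypothetical symbol and range): the two operands
give the half-open range of possible addresses, here allowing the symbol to
be used where a 32-bit immediate is expected.

  @foo = external global i8, !absolute_symbol !0
  !0 = !{i64 0, i64 4294967296}   ; address is in [0, 2^32)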
As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html
Differential Revision: https://reviews.llvm.org/D25878
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289087 91177308-0d34-0410-b5e6-96231b3b80d8
When trying to vectorize trees that start at insertelement instructions, the
function tryToVectorizeList() uses a vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes this does not work, as the tree
cost for this fixed vectorization factor is too high.
The patch tries to improve the situation. It tries different vectorization
factors, from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) down to MinVecRegSize/ScalarTypeSize, and
chooses the best one.
Differential Revision: https://reviews.llvm.org/D27215
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289043 91177308-0d34-0410-b5e6-96231b3b80d8
Replace @progbits in the section directive with %progbits, because "@" starts a comment on arm/thumb.
Use b.w branch instruction.
Use .thumb_function and .thumb_set for proper arm/thumb interwork. This way jumptable entry addresses on thumb have bit 0 set (correctly). This does not affect CFI check math, because the address of the jumptable start also has that bit set.
This does not work on thumbv5, because it does not support b.w, and the linker would not insert a veneer (trampoline?) to extend the range of b.n. We may need to do full-range plt-style jumptables on thumbv5, which are 12 bytes per entry. Another option is "push lr; bl; pop pc" (4 bytes), but that needs unwinding instructions, etc.
Differential Revision: https://reviews.llvm.org/D27499
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289008 91177308-0d34-0410-b5e6-96231b3b80d8
The fix committed in r288851 doesn't cover all the cases.
In particular, if we have an instruction with side effects
which has a non-dbg use not depending on the bits, we still
perform RAUW, destroying the dbg.value's first argument.
Prevent metadata from being replaced here to avoid the issue.
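A sketch of the missed case (illustrative IR, not from the patch):

  %call = tail call i32 (...) @g()   ; side effects
  %dead = and i32 %call, 0           ; non-dbg use, but no bits demanded
  tail call void @llvm.dbg.value(metadata i32 %call, i64 0, metadata !8, metadata !9)
  ; RAUW'ing %call with 0 would also rewrite the dbg.value operand above.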
Differential Revision: https://reviews.llvm.org/D27534
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288987 91177308-0d34-0410-b5e6-96231b3b80d8
The tests that already work are folded in InstSimplify, so those
tests should be redundant and we can remove them if they don't
seem worthwhile for completeness.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288957 91177308-0d34-0410-b5e6-96231b3b80d8
This patch attempts to scalarize the operand expressions of predicated
instructions if they were conditionally executed in the original loop. After
scalarization, the expressions will be sunk inside the blocks created for the
predicated instructions. The transformation essentially performs
un-if-conversion on the operands.
The cost model has been updated to determine if scalarization is profitable. It
compares the cost of a vectorized instruction, assuming it will be
if-converted, to the cost of the scalarized instruction, assuming that the
instructions corresponding to each vector lane will be sunk inside a predicated
block, possibly avoiding execution. If it's more profitable to scalarize the
entire expression tree feeding the predicated instruction, the expression will
be scalarized; otherwise, it will be vectorized. We only consider the cost of
the entire expression to accurately estimate the cost of the required
insertelement and extractelement instructions.
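A simplified sketch of the kind of expression this applies to (illustrative
only): %t feeds only the predicated udiv, so instead of vectorizing the add
and extracting a lane inside each predicated block, the add itself can be
scalarized and sunk into those blocks.

  %t = add i32 %a, %b      ; operand expression of the predicated op
  br i1 %cond, label %if, label %cont
if:
  %d = udiv i32 %x, %t     ; predicated instruction
  br label %cont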
Differential Revision: https://reviews.llvm.org/D26083
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288909 91177308-0d34-0410-b5e6-96231b3b80d8
In the case of a fully redundant load LI dominated by an equivalent load V, GVN
should always preserve the original debug location of V. Otherwise, we risk
introducing incorrect stepping.
If V has debug info, then clearly it should not be modified. If V has a null
debugloc, then it is still potentially incorrect to propagate LI's debugloc
because LI may not post-dominate V.
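A sketch of the second case (illustrative):

  %v1 = load i32, i32* %p            ; V: null debug location
  ; ... a path on which %v2 never executes ...
  %v2 = load i32, i32* %p, !dbg !10  ; LI: fully redundant, replaced by %v1
  ; %v1 must keep its null debugloc; copying !dbg !10 onto it would be
  ; wrong, because %v2 does not post-dominate %v1.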
Differential Revision: https://reviews.llvm.org/D27468
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288903 91177308-0d34-0410-b5e6-96231b3b80d8
As Eli noted in the post-commit thread for r288833, the use of
swapOperands() may not be allowed in InstSimplify, so I'm
removing those calls here pending further review.
The swap mutates the icmp, and there doesn't appear to be precedent
for instruction mutation in InstSimplify.
I didn't actually have any tests for those cases, so I'm adding
a few here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288855 91177308-0d34-0410-b5e6-96231b3b80d8
BDCE has two phases:
1. It asks SimplifyDemandedBits if all the bits of an instruction are dead, and if so,
replaces all its uses with the constant zero.
2. Then, it asks SimplifyDemandedBits again if the instruction is really dead
(no side effects, etc.) and if so, eliminates it.
Now, in 1) if all the bits of an instruction are dead, we may end up replacing a dbg use:
  %call = tail call i32 (...) @g() #4, !dbg !15
  tail call void @llvm.dbg.value(metadata i32 %call, i64 0, metadata !8, metadata !16), !dbg !17
->
  %call = tail call i32 (...) @g() #4, !dbg !15
  tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !8, metadata !16), !dbg !17
but we do not eliminate the call, because it may have arbitrary side effects.
In other words, we lose some debug information.
This patch fixes the problem by making sure that BDCE does nothing with the
instruction if it has side effects and no non-dbg uses.
Differential Revision: https://reviews.llvm.org/D27471
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288851 91177308-0d34-0410-b5e6-96231b3b80d8
All of these (and a few more) are already handled by InstCombine,
but we shouldn't have to wait until then to simplify these because
they're cheap to deal with here in InstSimplify.
This is the 'and' sibling of the earlier 'or' patch:
https://reviews.llvm.org/rL288833
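One representative fold of this family (an illustrative instance, not
necessarily one of the patch's exact patterns):

  %a = icmp sgt i32 %x, %y
  %b = icmp sge i32 %x, %y
  %r = and i1 %a, %b       ; sgt implies sge, so this simplifies to %a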
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288841 91177308-0d34-0410-b5e6-96231b3b80d8
All of these (and a few more) are already handled by InstCombine,
but we shouldn't have to wait until then to simplify these because
they're cheap to deal with here in InstSimplify.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288833 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If LAA expands a bound that is loop invariant, but not hoisted out
of the loop body, it used to use that value anyway, causing a
non-domination error, because the memcheck block is of course not
dominated by the scalar loop body. Detect this situation and expand
the SCEV expression instead.
Fixes PR31251
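A sketch of the problematic shape (hypothetical IR):

loop.body:
  %bound = add i64 %n, 1   ; loop-invariant, but never hoisted
  ...
  ; The memcheck block emitted before the loop is not dominated by
  ; loop.body, so it must expand the SCEV (1 + %n) rather than use %bound.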
Reviewers: anemet
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D27397
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288705 91177308-0d34-0410-b5e6-96231b3b80d8
Introduce a dedicated DW_OP_LLVM_fragment operation so we can stop using
DW_OP_bit_piece with the wrong semantics.
The entire back story can be found here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161114/405934.html
The gist is that in LLVM we've been misinterpreting DW_OP_bit_piece's
offset field to mean the offset into the source variable rather than
the offset into the location at the top of the DWARF expression stack. In
order to be able to fix this in a subsequent patch, this patch
introduces a dedicated DW_OP_LLVM_fragment operation with the
semantics that we used to apply to DW_OP_bit_piece, which is what we
actually need while inside of LLVM. This patch is complete with a
bitcode upgrade for expressions using the old format. It does not yet
fix the DWARF backend to use DW_OP_bit_piece correctly.
Implementation note: We discussed several options for implementing
this, including reserving a dedicated field in DIExpression for the
fragment size and offset, but using a custom operator at the end of
the expression works just fine and is more efficient because we then
only pay for it when we need it.
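A sketch of the new form (made-up operands): the fragment operator comes
last in the expression; here the dbg.value describes bits [32, 64) of
variable !12, i.e. fragment offset 32, size 32.

  call void @llvm.dbg.value(metadata i32 %hi, i64 0, metadata !12,
                            metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32))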
Differential Revision: https://reviews.llvm.org/D27361
rdar://problem/29335809
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288683 91177308-0d34-0410-b5e6-96231b3b80d8
VSX has instructions lxsiwax/lxsdx that can cheaply load a 32/64-bit value into a VSX register. This patch makes that known to the memory cost model, so that vectorization of the test case in PR30990 becomes beneficial.
Differential Revision: https://reviews.llvm.org/D26713
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288560 91177308-0d34-0410-b5e6-96231b3b80d8
For -O0 there might be unreachable BBs, which breaks the assumption that all the
BBs have an auxiliary data structure. In this patch, we add another interface
called findBBInfo() so that a nullptr can be returned for the unreachable BBs
(and the callers can ignore those BBs).
This fixes the bug reported at
https://llvm.org/bugs/show_bug.cgi?id=31209
Differential Revision: https://reviews.llvm.org/D27280
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288528 91177308-0d34-0410-b5e6-96231b3b80d8
Demonstrate a missed opportunity for the urem -> and combine with power-of-2-or-zero non-uniform constant divisors.
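The shape the new tests cover (sketch): every divisor element is a power of
2 (or 0, for which urem is undefined anyway), so this could fold to a mask.

  %r = urem <4 x i32> %x, <i32 16, i32 4, i32 0, i32 8>
  ; missed fold: and <4 x i32> %x, <i32 15, i32 3, i32 -1, i32 7>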
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288510 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's
builtins (twice: once in r288412 and once in r288497). We should investigate
this offline.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288508 91177308-0d34-0410-b5e6-96231b3b80d8
When trying to vectorize trees that start at insertelement instructions, the
function tryToVectorizeList() uses a vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes this does not work, as the tree
cost for this fixed vectorization factor is too high.
The patch tries to improve the situation. It tries different vectorization
factors, from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) down to MinVecRegSize/ScalarTypeSize, and
chooses the best one.
Differential Revision: https://reviews.llvm.org/D27215
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288497 91177308-0d34-0410-b5e6-96231b3b80d8
The instcombine code which folds loads and stores into their use types can trip up if the use is a bitcast to a type which we can't directly load or store in the IR. In principle, such types shouldn't exist, but in practice they do today. This is a workaround to avoid a bug while we work towards the long term goal.
Differential Revision: https://reviews.llvm.org/D24365
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288415 91177308-0d34-0410-b5e6-96231b3b80d8
When trying to vectorize trees that start at insertelement instructions, the
function tryToVectorizeList() uses a vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes this does not work, as the tree
cost for this fixed vectorization factor is too high.
The patch tries to improve the situation. It tries different vectorization
factors, from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) down to MinVecRegSize/ScalarTypeSize, and
chooses the best one.
Differential Revision: https://reviews.llvm.org/D27215
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288412 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, when the cost of scalar operations is evaluated, the vector type is
used. This patch fixes that issue and also fixes the evaluation of the cost of
vector operations.
Several tests showed that the vector cost model is too optimistic: it allowed
vectorization of 8 or fewer add/fadd operations, though the scalar code is
faster. Only for 16 or more operations does the vector code provide better
performance.
Differential Revision: https://reviews.llvm.org/D26277
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288398 91177308-0d34-0410-b5e6-96231b3b80d8