Without this test, you can just remove the code fixing the
switch to the first constant in ResolvedUndefs in and everything
pass. This test, instead, fails with an assertion if the code
is removed. Found while refactoring SCCP to integrate undef in
the solver.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287731 91177308-0d34-0410-b5e6-96231b3b80d8
If there is no debug info in the callee, inlining it will not help annotator. This avoids infinite loop as reported in PR/31119.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287710 91177308-0d34-0410-b5e6-96231b3b80d8
We visit and/or, we try to derive a lattice value for the
instruction even if one of the operands is overdefined.
If the non-overdefined value is still 'unknown' just return and wait
for ResolvedUndefsIn to "plug in" the correct value. This simplifies
the logic a bit. While I'm here add tests for missing cases.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287709 91177308-0d34-0410-b5e6-96231b3b80d8
In PR27925:
https://llvm.org/bugs/show_bug.cgi?id=27925
...we proposed adding this fold to eliminate a bitcast. In D20774, there was
some concern about changing the type of a bitwise op as well as creating
bitcasts that might not be free for a target. However, if we're strictly
eliminating an instruction (by limiting this to one-use ops), then we should
be able to do this in InstCombine.
But we're cautiously restricting the transform for now to vector types to
avoid possible backend problems. A transform to make sure the logic op is
legal for the target should be added to reverse this transform and improve
codegen.
Differential Revision: https://reviews.llvm.org/D26641
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287707 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Previously, CGP would unconditionally sink addrspacecast instructions,
even going so far as to sink them into a loop.
Now we check that the cast is "cheap", as defined by TLI.
We introduce a new "is-cheap" function to TLI rather than using
isNopAddrSpaceCast because some GPU platforms want the ability to ask
for non-nop casts to be sunk.
Reviewers: arsenm, tra
Subscribers: jholewinski, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D26923
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287591 91177308-0d34-0410-b5e6-96231b3b80d8
Allow using an instruction other than a mul or phi as the base for
root-finding. For example, the included testcase includes a loop
which requires using a getelementptr as the base for root-finding.
Differential Revision: https://reviews.llvm.org/D26529
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287588 91177308-0d34-0410-b5e6-96231b3b80d8
This is a first step towards canonicalization and improved folding/codegen
for integer min/max as discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html
Here, we're just matching the simplest min/max patterns and adjusting the
icmp predicate while swapping the select operands.
I've included FIXME tests in test/Transforms/InstCombine/select_meta.ll
so it's easier to see how this might be extended (corresponds to the TODO
comment in the code). That's also why I'm using matchSelectPattern()
rather than a simpler check; once the backend is patched, we can just
remove some of the restrictions to allow the obfuscated min/max patterns
in the FIXME tests to be matched.
Differential Revision: https://reviews.llvm.org/D26525
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287585 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
D26704 fixed the non-determinism in codegen by sorting basic blocks before
iteration so as to have a defined iteration order. As a result we need to fix
the names (numbers) of the temporaries in the following unit tests:
test/Transforms/Util/MemorySSA/multi-edges.ll
test/Transforms/Util/MemorySSA/multiple-backedges-hal.ll
Reviewers: dberlin, david2050, mgrang
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26926
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287575 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes the non-determinism caused due to iterating SmallPtrSet's
which was uncovered due to the experimental "reverse iteration order " patch:
https://reviews.llvm.org/D26718
The following unit tests failed because of the undefined order of iteration.
LLVM :: Transforms/Util/MemorySSA/cyclicphi.ll
LLVM :: Transforms/Util/MemorySSA/many-dom-backedge.ll
LLVM :: Transforms/Util/MemorySSA/many-doms.ll
LLVM :: Transforms/Util/MemorySSA/phi-translation.ll
Reviewers: dberlin, mgrang
Subscribers: dberlin, llvm-commits, david2050
Differential Revision: https://reviews.llvm.org/D26704
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287563 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Merging an empty case block into the header block of switch could cause
ISel to add COPY instructions in the header of switch, instead of the case
block, if the case block is used as an incoming block of a PHI. This could
potentially increase dynamic instructions, especially when the switch is in a
loop. I added a test case which was reduced from the benchmark I was targetting.
Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl
Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22696
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287553 91177308-0d34-0410-b5e6-96231b3b80d8
Currently LLVM assumes that a pointer addrspacecasted to a different addr space is equivalent to trunc or zext bitwise, which is not true. For example, in amdgcn target, when a null pointer is addrspacecasted from addr space 4 to 0, its value is changed from i64 0 to i32 -1.
This patch teaches LLVM not to assume known bits of addrspacecast instruction to its operand.
Differential Revision: https://reviews.llvm.org/D26803
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287545 91177308-0d34-0410-b5e6-96231b3b80d8
insertUniqueBackedgeBlock in lib/Transforms/Utils/LoopSimplify.cpp now
propagates existing llvm.loop metadata to newly the added backedge.
llvm::TryToSimplifyUncondBranchFromEmptyBlock in lib/Transforms/Utils/Local.cpp
now propagates existing llvm.loop metadata to the branch instructions in the
predecessor blocks of the empty block that is removed.
Differential Revision: https://reviews.llvm.org/D26495
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287341 91177308-0d34-0410-b5e6-96231b3b80d8
This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional instrinsics to the switches.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287316 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This extends FCOPYSIGN support to 512-bit vectors.
I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.
Reviewers: delena, zvi, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26791
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287298 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
For flat loop, even if it is hot, it is not a good idea to unroll in runtime, thus we set a lower partial unroll threshold.
For hot loop, we set a higher unroll threshold and allows expensive tripcount computation to allow more aggressive unrolling.
Reviewers: davidxl, mzolotukhin
Subscribers: sanjoy, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D26527
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287186 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: These intrinsics have been unused for clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clangs intrinsic file.
Reviewers: zvi, delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26660
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287083 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We don't do BypassSlowDivision when the denominator is a constant, but
we do do it when the numerator is a constant.
This patch makes two related changes to BypassSlowDivision when the
numerator is a constant:
* If the numerator is too large to fit into the bypass width, don't
bypass slow division (because we'll never run the smaller-width
code).
* If we bypass slow division where the numerator is a constant, don't
OR together the numerator and denominator when determining whether
both operands fit within the bypass width. We need to check only the
denominator.
Reviewers: tra
Subscribers: llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D26699
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287062 91177308-0d34-0410-b5e6-96231b3b80d8
In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.
Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.
Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.
But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.
From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.
Differential Revision: https://reviews.llvm.org/D26429
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286999 91177308-0d34-0410-b5e6-96231b3b80d8
When both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence
return non-null but different WideAddRec, if getWideRecurrence is called
before getExtendedOperandRecurrence, we won't bother to call
getExtendedOperandRecurrence again. But As we know it is possible that after
SCEV folding, we cannot prove the legality using the SCEVAddRecExpr returned
by getWideRecurrence. Meanwhile if getExtendedOperandRecurrence returns non-null
WideAddRec, we know for sure that it is legal to do widening for current instruction.
So it is better to put getExtendedOperandRecurrence before getWideRecurrence, which
will increase the chance of successful widening.
Differential Revision: https://reviews.llvm.org/D26059
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286987 91177308-0d34-0410-b5e6-96231b3b80d8
The register usage algorithm incorrectly treats instructions whose value is
not used within the loop (e.g. those that do not produce a value).
The algorithm first calculates the usages within the loop. It iterates over
the instructions in order, and records at which instruction index each use
ends (in fact, they're actually recorded against the next index, as this is
when we want to delete them from the open intervals).
The algorithm then iterates over the instructions again, adding each
instruction in turn to a list of open intervals. Instructions are then
removed from the list of open intervals when they occur in the list of uses
ended at the current index.
The problem is, instructions which are not used in the loop are skipped.
However, although they aren't used, the last use of a value may have been
recorded against that instruction index. In this case, the use is not deleted
from the open intervals, which may then bump up the estimated register usage.
This patch fixes the issue by simply moving the "is used" check after the loop
which erases the uses at the current index.
Differential Revision: https://reviews.llvm.org/D26554
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286969 91177308-0d34-0410-b5e6-96231b3b80d8
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.
This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286832 91177308-0d34-0410-b5e6-96231b3b80d8
When calculating the cost of a call instruction we were applying a heuristic penalty as well as the cost of the instruction itself.
However, when calculating the benefit from inlining we weren't discounting the equivalent penalty for the call instruction that would be removed! This caused skew in the calculation and meant we wouldn't inline in the following, trivial case:
int g() {
h();
}
int f() {
g();
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286814 91177308-0d34-0410-b5e6-96231b3b80d8
This is PR28376.
Unfortunately given the current structure of optimization diagnostics we
lack the capability to tell whether the user has
passed -Rpass-analysis=loop-vectorize since this is local to the
front-end (BackendConsumer::OptimizationRemarkHandler).
So rather than printing this even if the user has already
passed -Rpass-analysis, this patch just punts and stops recommending
this option. I don't think that getting this right is worth the
complexity.
Differential Revision: https://reviews.llvm.org/D26563
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286662 91177308-0d34-0410-b5e6-96231b3b80d8
When a function pointer is replaced with a jumptable pointer, special
case is needed to preserve the semantics of extern_weak functions.
Since a jumptable entry can not be extern_weak, we emulate that
behaviour by replacing all references to F (the extern_weak function)
with the following expression: F != nullptr ? JumpTablePtr : nullptr.
Extra special care is needed for global initializers, since most (or
probably all) backends can not lower an initializer that includes
this kind of constant expression. Initializers like that are replaced
with a global constructor (i.e. a runtime initializer).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286636 91177308-0d34-0410-b5e6-96231b3b80d8
The current implementation is emitting a global constant that happens
to evaluate to the same bytes + relocation as a jump instruction on
X86. This does not work for PIE executables and shared libraries
though, because we end up with a wrong relocation type. And it has no
chance of working on ARM/AArch64 which use different relocation types
for jump instructions (R_ARM_JUMP24) that is never generated for
data.
This change replaces the constant with module-level inline assembly
followed by a hidden declaration of the jump table. Works fine for
ARM/AArch64, but has some drawbacks.
* Extra symbols are added to the static symbol table, which inflate
the size of the unstripped binary a little. Stripped binaries are not
affected. This happens because jump table declarations must be
external (because their body is in the inline asm).
* Original functions that were anonymous are now named
<original name>.cfi, and it affects symbolization sometimes. This is
necessary because the only user of these functions is the (inline
asm) jump table, so they had to be added to @llvm.used, which does
not allow unnamed functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286611 91177308-0d34-0410-b5e6-96231b3b80d8
The r283656 did this in the remark arguments. We also need to do this
in the main function attribute as that is written to YAML as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286482 91177308-0d34-0410-b5e6-96231b3b80d8
Note that the existing metadata checking was re-added by hand because the
script doesn't currently know how to generate checks for lines outside of
functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286460 91177308-0d34-0410-b5e6-96231b3b80d8
Removing the limitation in visitInsertElementInst() causes several regressions
because we're not prepared to fold sequences of shuffles or inserts and extracts
separated by shuffles. Fixing that appears to be a difficult mission because we
are purposely trying to avoid creating shuffles with arbitrary shuffle masks
because some targets may choke on those.
https://llvm.org/bugs/show_bug.cgi?id=30923
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286423 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The change will test the change in r286159.
The idea behind the change: Make the dbg location different between loop header and preheader/exit. Originally, dbg location 21 exists in 3 BBs: preheader, header, critical edge (exit). Update the debug location of inside the loop header from !21 to !22 so that it will reflect the correct location.
Reviewers: probinson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26428
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286403 91177308-0d34-0410-b5e6-96231b3b80d8
Scalar Evolution asserts when not all the operands of an Add Recurrence
Expression are loop invariants. Loop Strength Reduction should only
create affine Add Recurrences, so that both the start and the step of
the expression are loop invariants.
Differential Revision: https://reviews.llvm.org/D26185
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286347 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: For functions with profile data, we are confident that loop sink will be optimal in sinking code.
Reviewers: davidxl, hfinkel
Subscribers: mehdi_amini, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26155
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286325 91177308-0d34-0410-b5e6-96231b3b80d8