This way we end up not looking at PHI args already removed.
MemSSA now goes through the updater so we can prune
it to avoid having redundant MemoryPHI arguments, but that
doesn't quite work for the general case.
Discussed with Daniel Berlin, fixes PR33406.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305409 91177308-0d34-0410-b5e6-96231b3b80d8
There's an early out that's trying to detect when we don't know any bits that make up the legal range of a shift. The code subtracts one from BitWidth, which creates a mask in the lower bits for power of 2 bit widths. This is then ANDed with the known bits to see if any of those bits are known. If the bit width isn't a power of 2, this creates a nonsensical mask.
This patch corrects this by rounding up to a power of 2 before doing the subtract and mask.
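A minimal sketch of the mask arithmetic described above, not the actual patch. The helper names (roundUpToPowerOf2, naiveShiftMask, roundedShiftMask) are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest power of two that is >= v.
inline uint32_t roundUpToPowerOf2(uint32_t v) {
  assert(v != 0 && v <= (1u << 31) && "value out of range for this sketch");
  uint32_t p = 1;
  while (p < v)
    p <<= 1;
  return p;
}

// Old computation: only a contiguous low-bit mask when bitWidth is a power
// of two.
inline uint32_t naiveShiftMask(uint32_t bitWidth) {
  assert(bitWidth != 0);
  return bitWidth - 1;
}

// Computation after rounding up: covers every bit that can be set in a legal
// shift amount (0 .. bitWidth - 1).
inline uint32_t roundedShiftMask(uint32_t bitWidth) {
  return roundUpToPowerOf2(bitWidth) - 1;
}
```

For a 24-bit type, the naive mask is 23 (0b10111), which skips bit 3, so a legal shift amount of 8 would not be seen by the check. The rounded mask is 31 (0b11111).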
Differential Revision: https://reviews.llvm.org/D34165
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305400 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch is one of three that together form a single change, but must be introduced in stages in order not to break things.
The way that LLVM interprets DW_OP_plus in DIExpression nodes is basically that of the DW_OP_plus_uconst operator since LLVM expects an unsigned constant operand. This unnecessarily restricts the DW_OP_plus operator, preventing it from being used to describe the evaluation of runtime values on the expression stack. These patches try to align the semantics of DW_OP_plus and DW_OP_minus with that of the DWARF definition, which pops two elements off the expression stack, performs the operation and pushes the result back on the stack.
This is done in three stages:
• The first patch (LLVM) adds support for DW_OP_plus_uconst.
• The second patch (Clang) changes all its uses of DW_OP_plus to DW_OP_plus_uconst.
• The third patch (LLVM) changes the semantics of DW_OP_plus and DW_OP_minus to be in line with their DWARF meaning. This patch includes the bitcode upgrade from legacy DIExpressions.
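The difference between the two operators can be sketched with a toy stack evaluator. This is an illustration only: the Op enum, the Element struct and evaluate() are hypothetical names and do not reflect LLVM's DIExpression encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opcode set for illustration; not LLVM's actual encoding.
enum class Op { Constu, Plus, Minus, PlusUconst };

struct Element {
  Op op;
  uint64_t operand; // used only by Constu and PlusUconst
};

// Evaluate expr on a stack that initially holds one value.
inline uint64_t evaluate(const std::vector<Element> &expr, uint64_t initial) {
  std::vector<uint64_t> stack{initial};
  for (const Element &e : expr) {
    switch (e.op) {
    case Op::Constu: // push a constant
      stack.push_back(e.operand);
      break;
    case Op::PlusUconst: // add an inline unsigned constant to the top entry
      assert(!stack.empty());
      stack.back() += e.operand;
      break;
    case Op::Plus:
    case Op::Minus: { // pop two entries, operate, push the result
      assert(stack.size() >= 2);
      uint64_t rhs = stack.back();
      stack.pop_back();
      uint64_t lhs = stack.back();
      stack.pop_back();
      stack.push_back(e.op == Op::Plus ? lhs + rhs : lhs - rhs);
      break;
    }
    }
  }
  assert(!stack.empty());
  return stack.back();
}
```

Under these assumptions, PlusUconst with operand 8 and the sequence Constu 8, Plus produce the same result for a given starting value, but only the stack form can add a value that was computed at runtime.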
Patch by Sander de Smalen.
Reviewers: echristo, pcc, aprantl
Reviewed By: aprantl
Subscribers: fhahn, javed.absar, aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D33894
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305386 91177308-0d34-0410-b5e6-96231b3b80d8
InstCombine has an optimization that recognizes an `and` with the sign bit of a legal type size and turns it into a truncate and a compare that checks the sign bit. But the select handling code doesn't recognize this idiom.
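The equivalence behind that idiom can be sketched for an 8-bit sign bit inside a 32-bit value. The function names are hypothetical, and the code only demonstrates that the two forms agree; it is not InstCombine's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Form 1: mask out bit 7 and compare against zero.
inline bool signBitViaAnd(uint32_t x) { return (x & 0x80u) != 0; }

// Form 2: truncate to 8 bits and do a signed compare with zero.
// The uint8_t -> int8_t conversion is modular on mainstream compilers
// (and guaranteed to be so from C++20 on).
inline bool signBitViaTruncCompare(uint32_t x) {
  return static_cast<int8_t>(static_cast<uint8_t>(x)) < 0;
}

// Check the two forms against each other over a range of inputs.
inline bool formsAgree() {
  for (uint32_t x = 0; x < 65536; ++x)
    if (signBitViaAnd(x) != signBitViaTruncCompare(x))
      return false;
  return true;
}
```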
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305338 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Leave an updated VP metadata on the fallback memcpy intrinsic after
specialization. This can be used for later possible expansion based on
the average of the remaining values.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34164
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305321 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
After RS4GC, we should drop metadata that is no longer valid. Such metadata
is used by optimizations scheduled after RS4GC, and can cause a miscompile.
One such metadata is invariant.load which is used by LICM sinking transform.
After rewriting statepoints, the address of a load may be relocated. With
invariant.load metadata on a load instruction, LICM sinking assumes the
loaded value (from a dereferenceable address) to be invariant, and
rematerializes the load operand and the load at the exit block.
This transforms the IR to have an unrelocated use of the
address after a statepoint, which is incorrect.
Other metadata we conservatively remove are related to
dereferenceability and noalias metadata.
This patch drops such metadata on store and load instructions after
rewriting statepoints.
Reviewers: reames, sanjoy, apilipenko
Reviewed by: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33756
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305234 91177308-0d34-0410-b5e6-96231b3b80d8
Currently there is a bug in SROA::presplitLoadsAndStores which causes an assertion failure in
GEPOperator::accumulateConstantOffset.
Basically it does not consider the situation that the pointer operand of load or store
may be in a non-zero address space and its size may be different from the size of
a pointer in address space 0.
This patch fixes the assertion failure when compiling Blender Cycles kernels for the amdgpu backend.
Differential Revision: https://reviews.llvm.org/D33298
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305107 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
isSafeToSpeculativelyExecute is the wrong predicate to use here.
All it checks for is whether it is safe to hoist a value with respect to
unaligned/non-dereferenceable accesses. However, not only are we doing
sinking rather than hoisting, our concern is that the location
we're loading from may have been modified. Instead forbid sinking
any load across a critical edge.
Reviewers: majnemer
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D33179
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305102 91177308-0d34-0410-b5e6-96231b3b80d8
This change adds an option, disable-lftr, that disables the Linear Function Test Replace optimization.
The option is off by default, so current behavior is unchanged.
Reviewers: reames, sanjoy, wmi, andreadb, apilipenko
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33979
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305055 91177308-0d34-0410-b5e6-96231b3b80d8
If we're shrinking a binary operation, it may be the case that the new
operation wraps where the old didn't. If this happens, the behavior
should be well-defined. So, we can't always carry wrapping flags with us
when we shrink operations.
If we do, we get incorrect optimizations in cases like:
void foo(const unsigned char *from, unsigned char *to, int n) {
  for (int i = 0; i < n; i++)
    to[i] = from[i] - 128;
}
which gets optimized to:
void foo(const unsigned char *from, unsigned char *to, int n) {
  for (int i = 0; i < n; i++)
    to[i] = from[i] | 128;
}
Because:
- InstCombine turned `sub i32 %from.i, 128` into
`add nuw nsw i32 %from.i, 128`.
- LoopVectorize vectorized the add to be `add nuw nsw <16 x i8>` with a
vector full of `i8 128`s
- InstCombine took advantage of the fact that the newly-shrunken add
"couldn't wrap", and changed the `add` to an `or`.
InstCombine seems happy to figure out whether we can add nuw/nsw on its
own, so I just decided to drop the flags. There are already a number of
places in LoopVectorize where we rely on InstCombine to clean up.
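The arithmetic behind the miscompile can be sketched in plain C++. The function names are hypothetical, and the code models only the 8-bit results, not the vectorizer itself.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit form: from is at most 255, so adding 128 cannot wrap in 32 bits.
// Only the low 8 bits are stored back.
inline uint8_t wideAddThenTruncate(uint8_t from) {
  uint32_t wide = static_cast<uint32_t>(from) + 128u;
  return static_cast<uint8_t>(wide);
}

// Shrunken 8-bit add: wraps modulo 256 whenever from >= 128.
inline uint8_t narrowAdd(uint8_t from) {
  return static_cast<uint8_t>(from + 128);
}

// What the add turns into if stale no-wrap flags are trusted: with no carry
// possible, add and or would be interchangeable.
inline uint8_t narrowOr(uint8_t from) {
  return static_cast<uint8_t>(from | 128);
}
```

For from = 200 the wide and narrow adds both store 72, while the `or` stores 200. The two narrow forms only agree when from is below 128, which is exactly the case where the 8-bit add does not wrap.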
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305053 91177308-0d34-0410-b5e6-96231b3b80d8
Other comments/implications are that this isn't intended behavior (nor
preserved/reimplemented in the new inliner) & complicates fixing the
'inlining' of trivially dead calls without consulting the cost function
first.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305052 91177308-0d34-0410-b5e6-96231b3b80d8
Since D17854 LinkerSubsectionsViaSymbols is unnecessary.
It is interfering with ThinLTO implementation of CFI-ICall, where
the aliases used on the !LinkerSubsectionsViaSymbols branch are
needed to export jump tables to ThinLTO backends.
This is the second attempt to land this change after fixing PR33316.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305031 91177308-0d34-0410-b5e6-96231b3b80d8
No IR tests were added with rL304313 ( https://reviews.llvm.org/D28637 ),
so I want these for extra coverage if we enable memcmp expansion for x86.
As shown, nothing is expanded for x86 in CGP yet.
Also fundamentally, we're doing an IR transform, so we should have IR tests
for just that part. If something goes wrong, we need to know if the bug is
in CGP or later lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305011 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Early-inlining of recursive calls makes the code size bloat exponentially. We should disable it.
Reviewers: davidxl, dnovillo, iteratee
Reviewed By: iteratee
Subscribers: iteratee, llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D34017
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305009 91177308-0d34-0410-b5e6-96231b3b80d8
This is a temporary fix which needs additional work, as it triggers a test3 failure.
test3 is commented out till then.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304993 91177308-0d34-0410-b5e6-96231b3b80d8
This was discussed in D33338. We have larger pattern-matching ending in a truncate that
we can reduce or remove by handling these smaller patterns first. Further motivation is
that narrower shift ops are easier for value tracking and zext is better than sext.
http://rise4fun.com/Alive/rhh
Name: boolshift
%sext = sext i1 %x to i8
%r = lshr i8 %sext, 7
=>
%r = zext i1 %x to i8
Name: noboolshift
%sext = sext i3 %x to i8
%r = lshr i8 %sext, 7
=>
%sh = lshr i3 %x, 2
%r = zext i3 %sh to i8
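The noboolshift transform can also be checked exhaustively outside Alive, since an i3 has only eight values. This is a small sketch with hypothetical helper names that emulates i3 in the low three bits of a uint8_t.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 3 bits of x (an emulated i3) to 8 bits.
inline uint8_t sext3to8(uint8_t x) {
  x &= 0x7;
  return (x & 0x4) ? static_cast<uint8_t>(x | 0xF8) : x;
}

// Left-hand side: lshr i8 (sext i3 %x to i8), 7
inline uint8_t before(uint8_t x) {
  return static_cast<uint8_t>(sext3to8(x) >> 7);
}

// Right-hand side: zext i3 (lshr i3 %x, 2) to i8
inline uint8_t after(uint8_t x) {
  return static_cast<uint8_t>((x & 0x7) >> 2);
}

// Compare both sides for every i3 value.
inline bool transformHoldsForAllI3() {
  for (unsigned x = 0; x < 8; ++x)
    if (before(static_cast<uint8_t>(x)) != after(static_cast<uint8_t>(x)))
      return false;
  return true;
}
```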
Differential Revision: https://reviews.llvm.org/D33879
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304939 91177308-0d34-0410-b5e6-96231b3b80d8
This makes it so that the code quality for CFI checks when compiling
with -O2 and linking with --lto-O0 is similar to that of the rest of
the code.
Reduces the size of a chrome binary built with -O2/--lto-O0 by
about 750KB.
Differential Revision: https://reviews.llvm.org/D33925
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304921 91177308-0d34-0410-b5e6-96231b3b80d8
A few tests in llvm/test/Transforms/Util/PredicateInfo/ are using -reverse-iterate.
The option -reverse-iterate is usually enabled with +Asserts, but it can be turned on/off regardless of LLVM_ENABLE_ASSERTIONS.
I wonder whether this is incompatible with https://reviews.llvm.org/D33908 (r304757).
Differential Revision: https://reviews.llvm.org/D33854
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304851 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The patch makes instruction count the highest priority for
LSR solution for X86 (previously registers had highest priority).
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D30562
From: Evgeny Stupachenko <evstupac@gmail.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304824 91177308-0d34-0410-b5e6-96231b3b80d8
Patch https://reviews.llvm.org/rL304806 was causing failures on AArch64
and multiple other targets since the test should be run on X86 only.
Specifying the target triple is not enough. Moving the testcase to the
X86 target directory in LoopIdiom.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304809 91177308-0d34-0410-b5e6-96231b3b80d8
1. When there is no perfect iteration order, we can't let phi nodes
put themselves in terms of things that come later in the iteration
order, or we will endlessly cycle (the normal RPO algorithm clears the
hashtable to avoid this issue).
2. We are sometimes erasing the wrong expression (causing pessimism)
because our equality says loads and stores are the same.
We introduce an exact equality function and use it when erasing to
make sure we erase only identical expressions, not equivalent ones.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304807 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Expand the loop idiom test for memcpy to also recognize
unordered atomic memcpy. The only difference between recognizing
an unordered atomic memcpy and a normal memcpy is
that the loads and/or stores involved are unordered atomic operations.
Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html
Patch by Daniel Neilson!
Reviewers: reames, anna, skatkov
Reviewed By: reames, anna
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D33243
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304806 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
We were canonicalizing the pre loop (into loop-simplify form) before
the post loop blocks were added into parent loop. This is incorrect when IRCE is
done on a subloop. The post-loop blocks are created, but not yet added to the
parent loop. So, loop-simplification on the pre-loop incorrectly updates
LoopInfo.
This patch corrects the ordering so that pre and post loop blocks are added to
parent loop (if any), and then the loops are canonicalized to LCSSA and
LoopSimplifyForm.
Reviewers: reames, sanjoy, apilipenko
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33846
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304800 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a bug that can cause extractelements with operands that
haven't been defined yet to be inserted at a wrong point when
optimising insertelements.
Patch by Karl Hylen.
Differential Revision: https://reviews.llvm.org/D33449
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304701 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is to enable the new switch inline cost heuristic (r301649) by removing the
old heuristic as well as the flag itself.
In my experiments with the LLVM test suite and spec2000/2006, a +17.82% performance
improvement and an 8% code size reduction were observed in spec2000/vertex with O3 LTO
on AArch64. No significant code size / performance regression was found at O3/O2/Os. No
significant complaints were reported on the llvm-dev thread.
Reviewers: hans, chandlerc, eraman, haicheng, mcrosier, bmakam, eastig, ddibyend, echristo
Reviewed By: echristo
Subscribers: javed.absar, kristof.beyls, echristo, aemerson, rengolin, mehdi_amini
Differential Revision: https://reviews.llvm.org/D32653
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304594 91177308-0d34-0410-b5e6-96231b3b80d8