Currently only "add nsw" are widened. This patch eliminates tons of "sext" instructions for 64 bit code (and the corresponding target code) in cases like:
int N = 100;
float **A;
void foo(int x0, int x1)
{
float * A_cur = &A[0][0];
float * A_next = &A[1][0];
for(int x = x0; x < x1; ++x).
{
// Currently only [x+N] case is widened. Others 2 cases lead to sext.
// This patch fixes it, so all 3 cases do not need sext.
const float div = A_cur[x + N] + A_cur[x - N] + A_cur[x * N];
A_next[x] = div;
}
}
...
> clang++ test.cpp -march=core-avx2 -Ofast -fno-unroll-loops -fno-tree-vectorize -S -o -
Differential Revision: http://reviews.llvm.org/D4695
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216160 91177308-0d34-0410-b5e6-96231b3b80d8
If we have a scalar reduction, we can increase the critical path length if the loop we're unrolling is inside another loop. Limit, by default to 2, so the critical path only gets increased by one reduction operation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216140 91177308-0d34-0410-b5e6-96231b3b80d8
We can prove that a 'sub' can be a 'sub nuw' if the left-hand side is
negative and the right-hand side is non-negative.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216045 91177308-0d34-0410-b5e6-96231b3b80d8
Because declarations of these functions can appear in places like autoconf
checks, they have to be handled somehow, even though we do not support
vararg custom functions. We do so by printing a warning and calling the
uninstrumented function, as we do for unimplemented functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216042 91177308-0d34-0410-b5e6-96231b3b80d8
We can prove that a 'sub' can be a 'sub nsw' under certain conditions:
- The sign bits of the operands is the same.
- Both operands have more than 1 sign bit.
The subtraction cannot be a signed overflow in either case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216037 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, the hint mechanism relied on clean up passes to remove redundant
metadata, which still showed up if running opt at low levels of optimization.
That also has shown that multiple nodes of the same type, but with different
values could still coexist, even if temporary, and cause confusion if the
next pass got the wrong value.
This patch makes sure that, if metadata already exists in a loop, the hint
mechanism will never append a new node, but always replace the existing one.
It also enhances the algorithm to cope with more metadata types in the future
by just adding a new type, not a lot of code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215994 91177308-0d34-0410-b5e6-96231b3b80d8
While this might seem like an obvious canonicalization, there is one subtle problem with it. The result of the original expression
is undef when x is NaN (remember, fast math flags), but the result of the select is always defined when x is NaN. This means that the
new expression is strictly more defined than the original one. One unfortunate consequence of this is that the transform is not reversible!
It's always legal to make increase the defined-ness of an expression, but it's not legal to reduce it. Thus, targets that prefer the original
form of the expression cannot reverse the transform to recover it. Another way to think of it is that the transform has lost source-level
information (the fast math flags), which is undesirable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215825 91177308-0d34-0410-b5e6-96231b3b80d8
While *most* (X sdiv 1) operations will get caught by InstSimplify, it
is still possible for a sdiv to appear in the worklist which hasn't been
simplified yet.
This means that it is possible for 0 - (X sdiv 1) to get transformed
into (X sdiv -1); dividing by -1 can make the transform produce undef
values instead of the proper result.
Sorry for the lack of testcase, it's a bit problematic because it relies
on the exact order of operations in the worklist.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215818 91177308-0d34-0410-b5e6-96231b3b80d8
We can combne a mul with a div if one of the operands is a multiple of
the other:
%mul = mul nsw nuw %a, C1
%ret = udiv %mul, C2
=>
%ret = mul nsw %a, (C1 / C2)
This can expose further optimization opportunities if we end up
multiplying or dividing by a power of 2.
Consider this small example:
define i32 @f(i32 %a) {
%mul = mul nuw i32 %a, 14
%div = udiv exact i32 %mul, 7
ret i32 %div
}
which gets CodeGen'd to:
imull $14, %edi, %eax
imulq $613566757, %rax, %rcx
shrq $32, %rcx
subl %ecx, %eax
shrl %eax
addl %ecx, %eax
shrl $2, %eax
retq
We can now transform this into:
define i32 @f(i32 %a) {
%shl = shl nuw i32 %a, 1
ret i32 %shl
}
which gets CodeGen'd to:
leal (%rdi,%rdi), %eax
retq
This fixes PR20681.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215815 91177308-0d34-0410-b5e6-96231b3b80d8
Replace the old code in GVN and BBVectorize with it. Update SimplifyCFG to use
it.
Patch by Björn Steinbrink!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215723 91177308-0d34-0410-b5e6-96231b3b80d8
When a call site with noalias metadata is inlined, that metadata can be
propagated directly to the inlined instructions (only those that might access
memory because it is not useful on the others). Prior to inlining, the noalias
metadata could express that a call would not alias with some other memory
access, which implies that no instruction within that called function would
alias. By propagating the metadata to the inlined instructions, we preserve
that knowledge.
This should complete the enhancements requested in PR20500.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215676 91177308-0d34-0410-b5e6-96231b3b80d8
When preserving noalias function parameter attributes by adding noalias
metadata in the inliner, we should do this for general function calls (not just
memory intrinsics). The logic is very similar to what already existed (except
that we want to add this metadata even for functions taking no relevant
parameters). This metadata can be used by ModRef queries in the caller after
inlining.
This addresses the first part of PR20500. Adding noalias metadata during
inlining is still turned off by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215657 91177308-0d34-0410-b5e6-96231b3b80d8
Vector instructions are (still) not supported for either integer or floating
point. Hopefully, that work will be landed shortly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215647 91177308-0d34-0410-b5e6-96231b3b80d8
v2: continue iterating through the rest of the bb
use for loop
v3: initialize FlattenCFG pass in ScalarOps
add test
v4: split off initializing flattencfg to a separate patch
add comment
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215574 91177308-0d34-0410-b5e6-96231b3b80d8
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)
Changes made by clang-tidy with minor tweaks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215558 91177308-0d34-0410-b5e6-96231b3b80d8
attribute and function argument attribute synthesizing and propagating.
As with the other uses of this attribute, the goal remains a best-effort
(no guarantees) attempt to not optimize the function or assume things
about the function when optimizing. This is particularly useful for
compiler testing, bisecting miscompiles, triaging things, etc. I was
hitting specific issues using optnone to isolate test code from a test
driver for my fuzz testing, and this is one step of fixing that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215538 91177308-0d34-0410-b5e6-96231b3b80d8
First, avoid calling setTailCall(false) on musttail calls. The funciton
prototypes should be "congruent", so the shadow layout should be exactly
the same.
Second, avoid inserting instrumentation after a musttail call to
propagate the return value shadow. We don't need to propagate the
result of a tail call, it should already be in the right place.
Reviewed By: eugenis
Differential Revision: http://reviews.llvm.org/D4331
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215415 91177308-0d34-0410-b5e6-96231b3b80d8
What follows bellow is a correctness proof of the transform using CVC3.
$ < t.cvc
A, B : BITVECTOR(32);
QUERY BVPLUS(32, A & B, A | B) = BVPLUS(32, A, B);
$ cvc3 < t.cvc
Valid.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215400 91177308-0d34-0410-b5e6-96231b3b80d8
GlobalOpt didn't know how to simulate InsertValueInst or
ExtractValueInst. Optimizing these is pretty straightforward.
N.B. This came up when looking at clang's IRGen for MS ABI member
pointers; they are represented as aggregates.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215184 91177308-0d34-0410-b5e6-96231b3b80d8
this case, the code path dealing with vector promotion was missing the explicit
checks for lifetime intrinsics that were present on the corresponding integer
promotion path.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215148 91177308-0d34-0410-b5e6-96231b3b80d8
This swaps the order of the loop vectorizer and the SLP/BB vectorizers. It is disabled by default so we can do performance testing - ideally we want to change to having the loop vectorizer running first, and the SLP vectorizer using its leftovers instead of the other way around.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214963 91177308-0d34-0410-b5e6-96231b3b80d8
already have a large number of blocks. Works around a performance issue with
the greedy register allocator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214944 91177308-0d34-0410-b5e6-96231b3b80d8
This is mostly a cleanup, but it changes a fairly old behavior.
Every "real" LTO user was already disabling the silly internalize pass
and creating the internalize pass itself. The difference with this
patch is for "opt -std-link-opts" and the C api.
Now to get a usable behavior out of opt one doesn't need the funny
looking command line:
opt -internalize -disable-internalize -internalize-public-api-list=foo,bar -std-link-opts
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214919 91177308-0d34-0410-b5e6-96231b3b80d8
Optimize the following IR:
%1 = tail call noalias i8* @calloc(i64 1, i64 4)
%2 = bitcast i8* %1 to i32*
; This store is dead and should be removed
store i32 0, i32* %2, align 4
Memory returned by calloc is guaranteed to be zero initialized. If the value being stored is the constant zero (and the store is not otherwise observable across threads), we can delete the store. If the store is to an out of bounds address, it is undefined and thus also removable.
Reviewed By: nicholas
Differential Revision: http://reviews.llvm.org/D3942
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214897 91177308-0d34-0410-b5e6-96231b3b80d8
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@214859 91177308-0d34-0410-b5e6-96231b3b80d8