As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.htmlhttp://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html
We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions,
and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation
by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift"
operation which may also be a single hardware instruction.
Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working
on that (much larger patch), but then I concluded that we don't need it. At least as a first
step, we have all of the backend support necessary to match these ops...because it was required.
And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are
likely all that we'll ever need.
There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes
(along with improving the oversized shift documentation). Again, I don't think that's strictly
necessary (as the test results here prove). That can be an efficiency improvement as a small
follow-up patch.
So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support.
Differential Revision: https://reviews.llvm.org/D49242
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337221 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the high part of the load is not used the offset to the next element
will not be set correctly.
For example, on Sparc V8, the following code will read val2 from offset 4
instead of 8.
```
int val = __builtin_va_arg(va, long long);
int val2 = __builtin_va_arg(va, int);
```
Reviewers: jyknight
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48595
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337161 91177308-0d34-0410-b5e6-96231b3b80d8
This is almost the same as an existing IR canonicalization in instcombine,
so I'm assuming this is a good early generic DAG combine too.
The motivation comes from reduced bit-hacking for select-of-constants in IR
after rL331486. We want to restore that functionality in the DAG as noted in
the commit comments for that change and the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html
The PPC and AArch tests show that those targets are already doing something
similar. x86 will be neutral in the minimal case and generally better when
this pattern is extended with other ops as shown in the signbit-shift.ll tests.
Note the asymmetry: we don't include the (extend (ifneg X)) transform because
it already exists in SimplifySelectCC(), and that is verified in the later
unchanged tests in the signbit-shift.ll files. Without the 'not' op, the
general transform to use a shift is always a win because that's a single
instruction.
Alive proofs:
https://rise4fun.com/Alive/ysli
Name: if pos, get -1
%c = icmp sgt i16 %x, -1
%r = sext i1 %c to i16
=>
%n = xor i16 %x, -1
%r = ashr i16 %n, 15
Name: if pos, get 1
%c = icmp sgt i16 %x, -1
%r = zext i1 %c to i16
=>
%n = xor i16 %x, -1
%r = lshr i16 %n, 15
Differential Revision: https://reviews.llvm.org/D48970
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337130 91177308-0d34-0410-b5e6-96231b3b80d8
This re-applies r336929 with a fix to accomodate for the Mips target
scheduling multiple SelectionDAG instances into the pass pipeline.
PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.
PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.
This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.
Patch by Stephen Crane <sjc@immunant.com>
Differential Revision: https://reviews.llvm.org/D49256
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336964 91177308-0d34-0410-b5e6-96231b3b80d8
PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.
PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.
This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.
Patch by Stephen Crane <sjc@immunant.com>
Differential Revision: https://reviews.llvm.org/D49256
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336929 91177308-0d34-0410-b5e6-96231b3b80d8
This is marginally helpful for removing redundant extensions, and the
code is easier to read, so it seems like an all-around win. In the new
test i8-phi-ext.ll, we used to emit an AssertSext i8; now we emit an
AssertZext i2, which allows the extension of the return value to be
eliminated.
Differential Revision: https://reviews.llvm.org/D49004
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336868 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to use SelectionDAG::isKnownNeverZero in DAGCombiner::visitREM (visitSDIVLike/visitUDIVLike handle the checking for constants).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336779 91177308-0d34-0410-b5e6-96231b3b80d8
First stage in PR38057 - support non-uniform constant vectors in the combine to reuse the division-by-constant logic.
We can definitely do better for srem pow2 remainders (and avoid that extra multiply....) but this at least helps keep everything on the vector unit.
Differential Revision: https://reviews.llvm.org/D48975
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336774 91177308-0d34-0410-b5e6-96231b3b80d8
As suggested by @efriedma on D48975 use the visitSDIVLike/visitUDIVLike functions introduced at rL336656.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336664 91177308-0d34-0410-b5e6-96231b3b80d8
As suggested by @efriedma on D48975, this patch separates the BuildDiv/Pow2 style optimizations from the rest of the visitSDIV/visitUDIV to make it easier to reuse the combines and will allow us to avoid some rather nasty node recursive combining in visitREM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336656 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This adds a reverse transform for the instcombine canonicalizations
that were added in D47980, D47981.
As discussed later, that was worse at least for the code size,
and potentially for the performance, too.
https://rise4fun.com/Alive/Zmpl
Reviewers: craig.topper, RKSimon, spatel
Reviewed By: spatel
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D48768
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336585 91177308-0d34-0410-b5e6-96231b3b80d8
This is similar to what is done for binops. I don't know if this would have helped us catch the bug fixed in r336566 earlier or not, but I figured it couldn't hurt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336576 91177308-0d34-0410-b5e6-96231b3b80d8
Splits off isKnownNeverZeroFloat to handle +/- 0 float cases.
This will make it easier to be more aggressive with the integer isKnownNeverZero tests (similar to ValueTracking), use computeKnownBits etc.
Differential Revision: https://reviews.llvm.org/D48969
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336492 91177308-0d34-0410-b5e6-96231b3b80d8
D48278
Allow to reduce redundant shift masks.
For example:
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB
can be reduced to:
x1 = x & 0xAB00
x2 = x1 >> 8
It only allows folding when the masks and shift values are constants.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336426 91177308-0d34-0410-b5e6-96231b3b80d8
Now that D45806 has landed, we can re-enable support for MIN_SIGNED_VALUE in the sdiv by pow2-constant code
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336198 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch introduce new intrinsic -
strip.invariant.group that was described in the
RFC: Devirtualization v2
Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar
Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47103
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336073 91177308-0d34-0410-b5e6-96231b3b80d8
The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1, -1 is the only power of two value for which the sdiv expansion recipe breaks.
Thanks to @zvi for the original patch.
Differential Revision: https://reviews.llvm.org/D45806
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336048 91177308-0d34-0410-b5e6-96231b3b80d8
We could get away with it for constant folded cases, but not for rL335719.
Thanks to Krzysztof Parzyszek for noticing.
Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335886 91177308-0d34-0410-b5e6-96231b3b80d8
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.
Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.
rdar://41530228
DifferentialRevision: https://reviews.llvm.org/D48674
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335877 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r335821.
This crashes the webassembly test, run "ninja check-llvm-codegen-webassembly" to reproduce.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335871 91177308-0d34-0410-b5e6-96231b3b80d8
We could get away with it for constant folded cases, but not for rL335719.
Thanks to Krzysztof Parzyszek for noticing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335821 91177308-0d34-0410-b5e6-96231b3b80d8
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc
can produce -0.0 where the original code does not:
#include <stdio.h>
int main(int argc) {
float x;
x = -0.8 * argc;
printf("%f\n", (float)((int)x));
return 0;
}
$ clang -O0 -mavx fp.c ; ./a.out
0.000000
$ clang -O1 -mavx fp.c ; ./a.out
-0.000000
Ideally, we'd use IR/node flags to predicate the transform, but the IR parser
doesn't currently allow fast-math-flags on the cast instructions. So for now,
just use the function attribute that corresponds to clang's "-fno-signed-zeros"
option.
Differential Revision: https://reviews.llvm.org/D48085
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335761 91177308-0d34-0410-b5e6-96231b3b80d8
Temporary fix until I've managed to get D45806 updated - both +1 and -1 special cases need to be properly supported.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335637 91177308-0d34-0410-b5e6-96231b3b80d8
This removes debug locations from ConstantSDNode and ConstantSDFPNode.
When this kind of node is materialized we no longer create a line table
entry which jumps back to the constant's first point of use. This makes
single-stepping behavior smoother, and it matches the model used by IR,
where Constants have no locations. See this thread for more context:
http://lists.llvm.org/pipermail/llvm-dev/2018-June/124164.html
I'd like to handle constant BuildVectorSDNodes and to try to eliminate
passing SDLocs to SelectionDAG::getConstant*() in follow-up commits.
Differential Revision: https://reviews.llvm.org/D48468
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335497 91177308-0d34-0410-b5e6-96231b3b80d8
This patch has the same motivating example as D48466:
define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) {
%c.0282 = and i32 %c.0282.in, 268435455
%a16 = lshr i64 32508, %x
%a17 = and i64 %a16, 1
%tobool = icmp eq i64 %a17, 0
%. = select i1 %tobool, i32 1, i32 2
%.286 = select i1 %tobool, i32 27, i32 26
%shr97 = lshr i32 %c.0282, %.
%shl98 = shl i32 %c.0282.in, %.286
%or99 = or i32 %shr97, %shl98
%shr100 = lshr i32 %d.0280, %.
%shl101 = shl i32 %d.0280, %.286
%or102 = or i32 %shr100, %shl101
store i32 %or99, i32* %ptr0
store i32 %or102, i32* %ptr1
ret void
}
...but I'm trying to kill the setcc bool math sooner rather than later.
By matching a larger pattern that includes both the low-bit mask and the trailing add/sub,
we can create a universally good fold because we always eliminate the condition code
intermediate value.
Here are Alive proofs for these (currently instcombine folds the 'add' variants, but
misses the 'sub' patterns):
https://rise4fun.com/Alive/Gsyp
Name: sub of zext cmp mask
%a = and i8 %x, 1
%c = icmp eq i8 %a, 0
%z = zext i1 %c to i32
%r = sub i32 C1, %z
=>
%optional_cast = zext i8 %a to i32
%r = add i32 %optional_cast, C1-1
Name: add of zext cmp mask
%a = and i32 %x, 1
%c = icmp eq i32 %a, 0
%z = zext i1 %c to i8
%r = add i8 %z, C1
=>
%optional_cast = trunc i32 %a to i8
%r = sub i8 C1+1, %optional_cast
All of the tests look like improvements or neutral to me. But it is possible that x86
test+set+bitop is better than what we now show here. I suspect we could do better by
adding another fold for the 'sub' variants.
We start with select-of-constant in IR in the larger motivating test, so that's why I
included tests with selects. Proofs for those variants:
https://rise4fun.com/Alive/Bx1
Name: true const is bigger
Pre: C2 == (C1 + 1)
%a = and i8 %x, 1
%c = icmp eq i8 %a, 0
%r = select i1 %c, i64 C2, i64 C1
=>
%z = zext i8 %a to i64
%r = sub i64 C2, %z
Name: false const is bigger
Pre: C2 == (C1 + 1)
%a = and i8 %x, 1
%c = icmp eq i8 %a, 0
%r = select i1 %c, i64 C1, i64 C2
=>
%z = zext i8 %a to i64
%r = add i64 C1, %z
Differential Revision: https://reviews.llvm.org/D48466
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335433 91177308-0d34-0410-b5e6-96231b3b80d8
Allowed folding for "and/or" binops with non-constant operand if
arguments of select are 0/-1 values.
Normally this code with "and" opcode does not get to a DAG combiner
and simplified yet in the InstCombine. However AMDGPU produces it
during lowering and InstCombine has no chance to optimize it out.
In turn the same pattern with "or" opcode can reach DAG.
Differential Revision: https://reviews.llvm.org/D48301
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335250 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
In some cases, these operands lacked the IsDebug property, which is meant to signal that
they should not affect codegen. This patch adds a check for this property in the
MachineVerifier and adds it where it was missing.
This includes refactorings to use MachineInstrBuilder construction functions instead of
manually setting up the intrinsic everywhere.
Patch by: JesperAntonsson
Reviewers: aprantl, rnk, echristo, javed.absar
Reviewed By: aprantl
Subscribers: qcolombet, sdardis, nemanjai, JDevlieghere, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D48319
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335214 91177308-0d34-0410-b5e6-96231b3b80d8
The alignment parameter to getExtLoad is treated as a base alignment,
not the alignment of the load (base + offset). When we infer a better
alignment for a Ptr we need to ensure that it applies to the base to
prevent the alignment on the load from being wrong.
This fixes a bug where the alignment could then be used to incorrectly
prove noalias between a load and a store, leading to a miscompile.
Differential Revision: https://reviews.llvm.org/D48029
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335210 91177308-0d34-0410-b5e6-96231b3b80d8