When merging through a TokenFactor we need to check that the
load may be ordered such that no other aliasing memory operations may
happen. It is not sufficient to just check that the load is a member
of the chain token factor as it there may be a indirect chain. Require
the load's chain has only one use.
This fixes PR37826.
Reviewers: spatel, davide, efriedma, craig.topper, RKSimon
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D49388
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337560 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If unfolding an SUnit results in both load or the operation using it which
already exist in the DAG, abort the unfold if they are already scheduled.
If not, make sure we don't add duplicate dependencies.
This fixes PR37916.
Reviewers: davide, eli.friedman, fhahn, bogner
Subscribers: MatzeB, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D48666
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337409 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.htmlhttp://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html
We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions,
and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation
by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift"
operation which may also be a single hardware instruction.
Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working
on that (much larger patch), but then I concluded that we don't need it. At least as a first
step, we have all of the backend support necessary to match these ops...because it was required.
And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are
likely all that we'll ever need.
There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes
(along with improving the oversized shift documentation). Again, I don't think that's strictly
necessary (as the test results here prove). That can be an efficiency improvement as a small
follow-up patch.
So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support.
Differential Revision: https://reviews.llvm.org/D49242
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337221 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the high part of the load is not used the offset to the next element
will not be set correctly.
For example, on Sparc V8, the following code will read val2 from offset 4
instead of 8.
```
int val = __builtin_va_arg(va, long long);
int val2 = __builtin_va_arg(va, int);
```
Reviewers: jyknight
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48595
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337161 91177308-0d34-0410-b5e6-96231b3b80d8
This is almost the same as an existing IR canonicalization in instcombine,
so I'm assuming this is a good early generic DAG combine too.
The motivation comes from reduced bit-hacking for select-of-constants in IR
after rL331486. We want to restore that functionality in the DAG as noted in
the commit comments for that change and the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html
The PPC and AArch tests show that those targets are already doing something
similar. x86 will be neutral in the minimal case and generally better when
this pattern is extended with other ops as shown in the signbit-shift.ll tests.
Note the asymmetry: we don't include the (extend (ifneg X)) transform because
it already exists in SimplifySelectCC(), and that is verified in the later
unchanged tests in the signbit-shift.ll files. Without the 'not' op, the
general transform to use a shift is always a win because that's a single
instruction.
Alive proofs:
https://rise4fun.com/Alive/ysli
Name: if pos, get -1
%c = icmp sgt i16 %x, -1
%r = sext i1 %c to i16
=>
%n = xor i16 %x, -1
%r = ashr i16 %n, 15
Name: if pos, get 1
%c = icmp sgt i16 %x, -1
%r = zext i1 %c to i16
=>
%n = xor i16 %x, -1
%r = lshr i16 %n, 15
Differential Revision: https://reviews.llvm.org/D48970
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337130 91177308-0d34-0410-b5e6-96231b3b80d8
This re-applies r336929 with a fix to accomodate for the Mips target
scheduling multiple SelectionDAG instances into the pass pipeline.
PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.
PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.
This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.
Patch by Stephen Crane <sjc@immunant.com>
Differential Revision: https://reviews.llvm.org/D49256
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336964 91177308-0d34-0410-b5e6-96231b3b80d8
PrologEpilogInserter and StackColoring depend on the StackProtector analysis
being alive from the point it is run until PEI, which requires that they are all
scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass
between StackProtector and PEI results in these passes being in separate
FunctionPassManagers and the StackProtector is not available for PEI.
PEI and StackColoring don't use much information from the StackProtector pass,
so transfering the required information to MachineFrameInfo is cleaner than
keeping the StackProtector pass around. This commit moves the SSP layout
information to MFI instead of keeping it in the pass.
This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.
Patch by Stephen Crane <sjc@immunant.com>
Differential Revision: https://reviews.llvm.org/D49256
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336929 91177308-0d34-0410-b5e6-96231b3b80d8
This is marginally helpful for removing redundant extensions, and the
code is easier to read, so it seems like an all-around win. In the new
test i8-phi-ext.ll, we used to emit an AssertSext i8; now we emit an
AssertZext i2, which allows the extension of the return value to be
eliminated.
Differential Revision: https://reviews.llvm.org/D49004
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336868 91177308-0d34-0410-b5e6-96231b3b80d8
This allows us to use SelectionDAG::isKnownNeverZero in DAGCombiner::visitREM (visitSDIVLike/visitUDIVLike handle the checking for constants).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336779 91177308-0d34-0410-b5e6-96231b3b80d8
First stage in PR38057 - support non-uniform constant vectors in the combine to reuse the division-by-constant logic.
We can definitely do better for srem pow2 remainders (and avoid that extra multiply....) but this at least helps keep everything on the vector unit.
Differential Revision: https://reviews.llvm.org/D48975
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336774 91177308-0d34-0410-b5e6-96231b3b80d8
As suggested by @efriedma on D48975 use the visitSDIVLike/visitUDIVLike functions introduced at rL336656.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336664 91177308-0d34-0410-b5e6-96231b3b80d8
As suggested by @efriedma on D48975, this patch separates the BuildDiv/Pow2 style optimizations from the rest of the visitSDIV/visitUDIV to make it easier to reuse the combines and will allow us to avoid some rather nasty node recursive combining in visitREM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336656 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This adds a reverse transform for the instcombine canonicalizations
that were added in D47980, D47981.
As discussed later, that was worse at least for the code size,
and potentially for the performance, too.
https://rise4fun.com/Alive/Zmpl
Reviewers: craig.topper, RKSimon, spatel
Reviewed By: spatel
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D48768
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336585 91177308-0d34-0410-b5e6-96231b3b80d8
This is similar to what is done for binops. I don't know if this would have helped us catch the bug fixed in r336566 earlier or not, but I figured it couldn't hurt.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336576 91177308-0d34-0410-b5e6-96231b3b80d8
Splits off isKnownNeverZeroFloat to handle +/- 0 float cases.
This will make it easier to be more aggressive with the integer isKnownNeverZero tests (similar to ValueTracking), use computeKnownBits etc.
Differential Revision: https://reviews.llvm.org/D48969
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336492 91177308-0d34-0410-b5e6-96231b3b80d8
D48278
Allow to reduce redundant shift masks.
For example:
x1 = x & 0xAB00
x2 = (x >> 8) & 0xAB
can be reduced to:
x1 = x & 0xAB00
x2 = x1 >> 8
It only allows folding when the masks and shift values are constants.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336426 91177308-0d34-0410-b5e6-96231b3b80d8
Now that D45806 has landed, we can re-enable support for MIN_SIGNED_VALUE in the sdiv by pow2-constant code
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336198 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch introduce new intrinsic -
strip.invariant.group that was described in the
RFC: Devirtualization v2
Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar
Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47103
Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336073 91177308-0d34-0410-b5e6-96231b3b80d8
The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1, -1 is the only power of two value for which the sdiv expansion recipe breaks.
Thanks to @zvi for the original patch.
Differential Revision: https://reviews.llvm.org/D45806
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336048 91177308-0d34-0410-b5e6-96231b3b80d8
We could get away with it for constant folded cases, but not for rL335719.
Thanks to Krzysztof Parzyszek for noticing.
Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335886 91177308-0d34-0410-b5e6-96231b3b80d8
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.
Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.
rdar://41530228
DifferentialRevision: https://reviews.llvm.org/D48674
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335877 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r335821.
This crashes the webassembly test, run "ninja check-llvm-codegen-webassembly" to reproduce.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335871 91177308-0d34-0410-b5e6-96231b3b80d8
We could get away with it for constant folded cases, but not for rL335719.
Thanks to Krzysztof Parzyszek for noticing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335821 91177308-0d34-0410-b5e6-96231b3b80d8
As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc
can produce -0.0 where the original code does not:
#include <stdio.h>
int main(int argc) {
float x;
x = -0.8 * argc;
printf("%f\n", (float)((int)x));
return 0;
}
$ clang -O0 -mavx fp.c ; ./a.out
0.000000
$ clang -O1 -mavx fp.c ; ./a.out
-0.000000
Ideally, we'd use IR/node flags to predicate the transform, but the IR parser
doesn't currently allow fast-math-flags on the cast instructions. So for now,
just use the function attribute that corresponds to clang's "-fno-signed-zeros"
option.
Differential Revision: https://reviews.llvm.org/D48085
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335761 91177308-0d34-0410-b5e6-96231b3b80d8
Temporary fix until I've managed to get D45806 updated - both +1 and -1 special cases need to be properly supported.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335637 91177308-0d34-0410-b5e6-96231b3b80d8