Commit Graph

572 Commits

Author SHA1 Message Date
Nikita Popov
360d0ef498 [InstCombine] Remove redundant/bogus mul_with_overflow combines
As pointed out in D60518 folding mulo(%x, undef) to {undef, undef}
isn't correct. As a correct version of this already exists in
InstructionSimplify (bd8056ef32/lib/Analysis/InstructionSimplify.cpp (L4750-L4757)) this is just
dead code though. Drop it together with the mul(%x, 0) -> {0, false}
fold that is also already handled by InstSimplify.

Differential Revision: https://reviews.llvm.org/D60649

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358339 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-13 19:43:35 +00:00
Nikita Popov
bea42108cf [InstCombine] Handle ssubo always overflow
Following D60483 and D60497, this adds support for AlwaysOverflows
handling for ssubo. This is the last case we can handle right now.

Differential Revision: https://reviews.llvm.org/D60518

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358100 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-10 16:32:15 +00:00
Nikita Popov
84db5e0028 [InstCombine] Handle saddo always overflow
Followup to D60483: Handle AlwaysOverflow conditions for saddo as
well.

Differential Revision: https://reviews.llvm.org/D60497

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358095 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-10 16:18:01 +00:00
Nikita Popov
b8b80606ab [InstCombine] Handle usubo always overflow
Check AlwaysOverflow condition for usubo. The implementation is the
same as the existing handling for uaddo and umulo. Handling for saddo
and ssubo will follow (smulo doesn't have the necessary ValueTracking
support).

Differential Revision: https://reviews.llvm.org/D60483

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358052 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-10 07:10:53 +00:00
Nikita Popov
ce68404678 [InstCombine] Directly call computeOverflow methods in OptimizeOverflowCheck; NFC
Instead of using the willOverflow helpers. This makes it easier to
extend handling of AlwaysOverflows.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358051 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-10 07:10:44 +00:00
Nikita Popov
b9767f7ec0 [InstCombine] Restructure OptimizeOverflowCheck; NFC
Change the code to always handle the unsigned+signed cases together
with the same basic structure for add/sub/mul. The simple folds are
always handled first and then the ValueTracking overflow checks are
used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358025 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-09 18:32:28 +00:00
Luqman Aden
0f925da273 [InstCombine] Combine no-wrap sub and icmp w/ constant.
Teach InstCombine the transformation `(icmp P (sub nuw|nsw C2, Y), C) -> (icmp swap(P) Y, C2-C)`

Reviewers: majnemer, apilipenko, sanjoy, spatel, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: dmgreen, lebedev.ri, nikic, hiraditya, JDevlieghere, jfb, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59916

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357674 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-04 07:08:30 +00:00
Sanjay Patel
bcd8ea0e2d [InstCombine] Fix crashing from (icmp (bitcast ([su]itofp X)), Y)
This fixes a class of bugs introduced by D44367, 
which transforms various cases of icmp (bitcast ([su]itofp X)), Y to icmp X, Y. 
If the bitcast is between vector types with a different number of elements, 
the current code will produce bad IR along the lines of: icmp <N x i32> ..., <M x i32> <...>.

This patch suppresses the transform if the bitcast changes the number of vector elements.

Patch by: @AndrewScheidecker (Andrew Scheidecker)

Differential Revision: https://reviews.llvm.org/D57871


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353467 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-07 21:12:01 +00:00
Sanjay Patel
fdf00f35cb [InstCombine] refactor folds for (icmp (bitcast X), Y); NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353462 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-07 20:54:09 +00:00
Sanjay Patel
9cecbbb1ba [InstCombine] X | C == C --> (X & ~C) == 0
We should canonicalize to one of these forms,
and compare-with-zero could be more conducive
to follow-on transforms. This also leads to
generally better codegen as shown in PR40611:
https://bugs.llvm.org/show_bug.cgi?id=40611


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353313 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-06 16:43:54 +00:00
James Y Knight
6029aa8149 [opaque pointer types] Pass function types to CallInst creation.
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57170

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352909 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-01 20:43:25 +00:00
Nikita Popov
c333f36cb3 [InstCombine] Simplify cttz/ctlz + icmp ugt/ult
Followup to D55745, this time handling comparisons with ugt and ult
predicates (which are the canonical forms for non-equality predicates).

For ctlz we can convert into a simple icmp, for cttz we can convert
into a mask check.

Differential Revision: https://reviews.llvm.org/D56355

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351645 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-19 09:56:01 +00:00
Chandler Carruth
6b547686c5 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-19 08:50:56 +00:00
Nikita Popov
a0b99f282a [InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check
Checking whether a number has a certain number of trailing / leading
zeros means checking whether it is of the form XXXX1000 / 0001XXXX,
which can be done with an and+icmp.

Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next
step, this can be extended to non-equality predicates.

Differential Revision: https://reviews.llvm.org/D55745

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349530 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-18 19:59:50 +00:00
Nikita Popov
387b5efaea [InstCombine] Fix negative GEP offset evaluation for 32-bit pointers
This fixes https://bugs.llvm.org/show_bug.cgi?id=39908.

The evaluateGEPOffsetExpression() function simplifies GEP offsets for
use in comparisons against zero, basically by converting X*Scale+Offset==0
to X+Offset/Scale==0 if Scale divides Offset. However, before this is done,
Offset is masked down to the pointer size. This results in incorrect
results for negative Offsets, because we basically end up dividing the
32-bit offset *zero* extended to 64-bit bits (rather than sign extended).

Fix this by explicitly sign extending the truncated value.

Differential Revision: https://reviews.llvm.org/D55449

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348987 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-12 23:19:03 +00:00
Roman Lebedev
261e911988 [InstCombine] foldICmpWithLowBitMaskedVal(): don't miscompile -1 vector elts
I was finally able to quantify what i thought was missing in the fix,
it was vector constants. If we have a scalar (and %x, -1),
it will be instsimplified before we reach this code,
but if it is a vector, we may still have a -1 element.

Thus, we want to avoid the fold if *at least one* element is -1.
Or in other words, ignoring the undef elements, no sign bits
should be set. Thus, m_NonNegative().

A follow-up for rL348181
https://bugs.llvm.org/show_bug.cgi?id=39861

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348462 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-06 08:14:24 +00:00
Sanjay Patel
d222be1e7a [InstCombine] simplify icmps with same operands based on dominating cmp
The tests here are based on the motivating cases from D54827.

More background:
1. We don't get these cases in general with SimplifyCFG because the root
   of the pattern match is an icmp, not a branch. I'm not sure how often
   we encounter this pattern vs. the seemingly more likely case with 
   branches, but I don't see evidence to leave the minimal pattern
   unoptimized.

2. This has a chance of increasing compile-time because we're using a
   ValueTracking call to handle the match. The motivating cases could be
   handled with a simpler pair of calls to isImpliedTrueByMatchingCmp/
   isImpliedFalseByMatchingCmp, but I saw that we have a more 
   comprehensive wrapper around those, so we might as well use it here
   unless there's evidence that it's significantly slower.

3. Ideally, we'd handle the fold to constants in InstSimplify, but as
   with the existing code here, we could extend this to handle cases
   where the result is not a constant, but a new combined predicate.
   That would mean splitting the logic across the 2 passes and possibly
   duplicating the pattern-matching cost.

4. As mentioned in D54827, this seems like the kind of thing that should
   be handled in Correlated Value Propagation, but that pass is currently
   limited to dealing with instructions with constant operands, so extending
   this bit of InstCombine is the smallest/easiest way to get these patterns 
   optimized.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348367 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-05 15:04:00 +00:00
Sanjay Patel
3d1c516e81 [InstCombine] rearrange foldICmpWithDominatingICmp; NFC
Move it out from under the constant check, reorder
predicates, add comments. This makes it easier to
extend to handle the non-constant case.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348284 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-04 17:44:24 +00:00
Sanjay Patel
09dac2e9e3 [InstCombine] add helper for icmp with dominator; NFC
There's a potential small enhancement to this code that could
solve the cases currently under proposal in D54827 via SimplifyCFG.

Whether instcombine should be doing this kind of semi-non-local
analysis in the first place is an open question, but separating
the logic out can only help if/when we decide to move it to a
different pass. 

AFAICT, any proposal to do this in SimplifyCFG could also be seen 
as an overreach + it would be incomplete to start the fold from a 
branch rather than an icmp.

There's another question here about the code for processUGT_ADDCST_ADD().
That part may be completely dead after rL234638 ?


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348273 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-04 15:35:17 +00:00
Roman Lebedev
d7774dcabf [InstCombine] foldICmpWithLowBitMaskedVal(): disable 2 faulty folds.
These two folds are invalid for this non-constant pattern
when the mask ends up being all-ones:
https://rise4fun.com/Alive/9au
https://rise4fun.com/Alive/UcQM

Fixes https://bugs.llvm.org/show_bug.cgi?id=39861

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348181 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-03 20:07:58 +00:00
Sanjay Patel
7a2b35fa20 [InstCombine] propagate FMF for fcmp+fabs folds
By morphing the instruction rather than deleting and creating a new one,
we retain fast-math-flags and potentially other metadata (profile info?).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346331 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 16:15:01 +00:00
Sanjay Patel
3bfaa4d7be [InstCombine] peek through fabs() when checking isnan()
That should be the end of the missing cases for this fold.
See earlier patches in this series:
rL346321
rL346324


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346327 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 15:44:26 +00:00
Sanjay Patel
b3939a8dff [InstCombine] add folds for fcmp Pred fabs(X), 0.0
Similar to rL346321, we had folds for the ordered
versions of these compares already, so add the
unordered siblings for completeness.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346324 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 15:33:03 +00:00
Sanjay Patel
b9f324c261 [InstCombine] add fold for fabs(X) u< 0.0
The sibling fold for 'oge' --> 'ord' was already here,
but this half was missing. 

The result of fabs() must be positive or nan, so asking 
if the result is negative or nan is the same as asking 
if the result is nan.

This is another step towards fixing:
https://bugs.llvm.org/show_bug.cgi?id=39475


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346321 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 15:11:32 +00:00
Sanjay Patel
be9ce13f15 [IR] add optional parameter for copying IR flags to compare instructions
As shown, this is used to eliminate redundant code in InstCombine,
and there are more cases where we should be using this pattern, but
we're currently unintentionally dropping flags. 


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346282 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 00:00:42 +00:00
Sanjay Patel
09d77c0de9 [InstCombine] allow vector types for fcmp+fpext fold
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346245 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-06 17:20:20 +00:00
Sanjay Patel
031587fccf [InstCombine] propagate fast-math-flags when folding fcmp+fpext, part 2
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346242 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-06 16:45:27 +00:00
Sanjay Patel
ac037d267f [InstCombine] rearrange code for fcmp+fpext; NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346241 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-06 16:37:35 +00:00
Sanjay Patel
6052aa3705 [InstCombine] propagate fast-math-flags when folding fcmp+fpext
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346240 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-06 16:23:03 +00:00
Sanjay Patel
d174746db8 [InstCombine] propagate fast-math-flags when folding fcmp+fneg, part 2
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346238 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-06 15:58:57 +00:00
Sanjay Patel
c52594aa97 [InstCombine] reduce code; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346235 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-06 15:53:58 +00:00
Sanjay Patel
bd1a44f2b8 [InstCombine] propagate fast-math-flags when folding fcmp+fneg
This is another part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

This might be enough to fix that particular issue, but as noted
with the FIXME, we're still dropping FMF on other folds around here.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346234 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-06 15:49:45 +00:00
Sanjay Patel
ab049d88fa [InstCombine] canonicalize -0.0 to +0.0 in fcmp
As stated in IEEE-754 and discussed in:
https://bugs.llvm.org/show_bug.cgi?id=38086
...the sign of zero does not affect any FP compare predicate.

Known regressions were fixed with:
rL346097 (D54001)
rL346143

The transform will help reduce pattern-matching complexity to solve:
https://bugs.llvm.org/show_bug.cgi?id=39475
...as well as improve CSE and codegen (a zero constant is almost always
easier to produce than 0x80..00).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346147 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-05 17:26:42 +00:00
Sanjay Patel
22968f72bd [InstCombine] refactor fabs+fcmp fold; NFC
Also, remove/replace/minimize/enhance the tests for this fold.
The code drops FMF, so it needs more tests and at least 1 fix.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345734 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-31 16:34:43 +00:00
Sanjay Patel
123a45feb6 [InstCombine] add assertion that InstSimplify has folded a fabs+fcmp; NFC
The 'OLT' case was updated at rL266175, so I assume it was just an
oversight that 'UGE' was not included because that patch handled
both predicates in InstSimplify.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345727 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-31 15:31:45 +00:00
Sanjay Patel
1ef057dce8 [InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative
This re-raises some of the open questions about how to apply and use fast-math-flags in IR from PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
...but given the current implementation (no FMF on casts), this is likely the only way to predicate the 
transform.

This is part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

Differential Revision: https://reviews.llvm.org/D53874


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345725 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-31 14:57:23 +00:00
Sanjay Patel
ef97784739 [InstCombine] use 'match' to reduce code; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345647 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-30 20:52:25 +00:00
Sanjay Patel
7e79132257 [InstCombine] use getFltSemantics() instead of duplicating it; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345613 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-30 16:21:56 +00:00
Sanjay Patel
66a2c5ecaa [InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try
Re-trying r344082 because it unintentionally included extra diffs.

Original commit message:
icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be
regressions unless we add more trunc folds as we're doing here
for vectors.

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {

  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r

}

Here's a sampling of the vector codegen for that case using
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vandps  LCPI0_0(%rip), %xmm0, %xmm0
vxorps  %xmm1, %xmm1, %xmm1
vpcmpeqd        %xmm1, %xmm0, %xmm0
vblendvps       %xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vblendvps       %xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vpbroadcastd    LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd       %zmm1, %zmm0, %k1
vblendmps       %zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vpslld  $31, %xmm0, %xmm0
vptestmd        %zmm0, %zmm0, %k1
vblendmps       %zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge   v0.4s, v1.4s, v0.4s
zip1    v1.4s, v0.4s, v0.4s
zip2    v0.4s, v0.4s, v0.4s
orr     v0.16b, v1.16b, v0.16b
movi    v1.4s, #1
and     v0.16b, v0.16b, v1.16b
cmeq    v0.4s, v0.4s, #0
bsl     v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge   v0.4s, v1.4s, v0.4s
zip1    v1.4s, v0.4s, v0.4s
zip2    v0.4s, v0.4s, v0.4s
orr     v0.16b, v1.16b, v0.16b
bsl     v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344181 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-10 20:47:46 +00:00
Sanjay Patel
9f5daa2df0 revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization
This commit accidentally included the diffs from D53057.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344178 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-10 20:39:39 +00:00
Sanjay Patel
1dd3c06445 [InstCombine] reverse 'trunc X to <N x i1>' canonicalization
icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be 
regressions unless we add more trunc folds as we're doing here 
for vectors.

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r
}

Here's a sampling of the vector codegen for that case using 
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vandps	LCPI0_0(%rip), %xmm0, %xmm0
vxorps	%xmm1, %xmm1, %xmm1
vpcmpeqd	%xmm1, %xmm0, %xmm0
vblendvps	%xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vblendvps	%xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpbroadcastd	LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd	%zmm1, %zmm0, %k1
vblendmps	%zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpslld	$31, %xmm0, %xmm0
vptestmd	%zmm0, %zmm0, %k1
vblendmps	%zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
movi	v1.4s, #1
and	v0.16b, v0.16b, v1.16b
cmeq	v0.4s, v0.4s, #0
bsl	v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
bsl	v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344082 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-09 21:26:01 +00:00
Jesper Antonsson
6b99c7f259 [InstCombine] Handle vector compares in foldGEPIcmp(), take 2
Summary:
This is a continuation of the fix for PR34627 "InstCombine assertion at vector gep/icmp folding". (I just realized bugpoint had fuzzed the original test for me, so I had fixed another trigger of the same assert in adjacent code in InstCombine.)

This patch avoids optimizing an icmp (to look only at the base pointers) when the resulting icmp would have a different type.

The patch adds a testcase and also cleans up and shrinks the pre-existing test for the adjacent assert trigger.

Reviewers: lebedev.ri, majnemer, spatel

Reviewed By: lebedev.ri

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52494

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343486 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-01 14:59:25 +00:00
Sanjay Patel
94a9f57861 [InstCombine] Without infinites, fold (C / X) < 0.0 --> (X < 0)
When C is not zero and infinites are not allowed (C / X) > 0 is a sign
test. Depending on the sign of C, the predicate must be swapped.

E.g.:
  foo(double X) {
    if ((-2.0 / X) <= 0) ...
  }
 =>
  foo(double X) {
    if (X >= 0) ...
  }

Patch by: @marels (Martin Elshuber)

Differential Revision: https://reviews.llvm.org/D51942


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343228 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-27 15:59:24 +00:00
Jesper Antonsson
a6fe72111f [InstCombine] Handle vector compares in foldGEPIcmp()
Summary:
This is to fix PR38984 "InstCombine assertion at vector gep/icmp folding":
https://bugs.llvm.org/show_bug.cgi?id=38984

Reviewers: majnemer, spatel, lattner, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D52263

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342647 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-20 13:37:28 +00:00
Roman Lebedev
7795456d9e [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((-1 << y) >> y) mask
Summary:
The last low-bit-mask-pattern-producing-pattern i can think of.

https://rise4fun.com/Alive/UGzE <- non-canonical
But we can not canonicalize it because of extra uses.

https://bugs.llvm.org/show_bug.cgi?id=38123

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52148

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342548 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-19 13:35:46 +00:00
Roman Lebedev
47c99e0a96 [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((1 << y)+(-1)) mask
Summary:
Same as to D52146.
`((1 << y)+(-1))` is simply non-canoniacal version of `~(-1 << y)`: https://rise4fun.com/Alive/0vl
We can not canonicalize it due to the extra uses. But we can handle it here.

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52147

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342547 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-19 13:35:40 +00:00
Roman Lebedev
74e3c34a76 [InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask
Summary:
Two folds are happening here:
1. https://rise4fun.com/Alive/oaFX
2. And then `foldICmpWithHighBitMask()` (D52001): https://rise4fun.com/Alive/wsP4

This change doesn't just add the handling for eq/ne predicates,
it actually builds upon the previous `foldICmpWithLowBitMaskedVal()` work,
so **all** the 16 fold variants* are immediately supported.

I'm indeed only testing these two predicates.
I do not feel like re-proving all 16 folds*, because they were already proven
for the general case of constant with all-ones in low bits. So as long as
the mask produces all-ones in low bits, i'm pretty sure the fold is valid.

But required, i can re-prove, let me know.

* eq/ne are commutative - 4 folds; ult/ule/ugt/uge - are not commutative (the commuted variant is InstSimplified), 4 folds; slt/sle/sgt/sge are not commutative - 4 folds. 12 folds in total.

https://bugs.llvm.org/show_bug.cgi?id=38123
https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52146

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342546 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-19 13:35:27 +00:00
Roman Lebedev
4c3db1c1e8 [InstCombine] Inefficient pattern for high-bits checking 3 (PR38708)
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only n bits wide (where n is a variable.)
There are many ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)

The last (as far i know?) pattern, non-canonical due to the extra use.
https://godbolt.org/z/aCMsPk
https://rise4fun.com/Alive/I6f

https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52062

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342321 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-15 12:04:13 +00:00
Roman Lebedev
ad7e5b0687 [InstCombine] Inefficient pattern for high-bits checking 2 (PR38708)
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only n bits wide (where n is a variable.)
There are many ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)

More complicated, canonical pattern:
https://rise4fun.com/Alive/uhA

We do need to have two `switch()`'es like this,
to not mismatch the swappable predicates.

https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52001

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342173 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-13 20:33:12 +00:00
Roman Lebedev
b5eaa9c28c [InstCombine] Inefficient pattern for high-bits checking (PR38708)
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only `n` bits wide (where `n` is a variable.)
There are **many** ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)

Let's handle the second variant first, since it is much simpler.
https://rise4fun.com/Alive/LYjY

https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51985

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342067 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-12 18:19:43 +00:00