561 Commits

Nikita Popov
c333f36cb3 [InstCombine] Simplify cttz/ctlz + icmp ugt/ult
Followup to D55745, this time handling comparisons with ugt and ult
predicates (which are the canonical forms for non-equality predicates).

For ctlz we can convert into a simple icmp, for cttz we can convert
into a mask check.
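
An illustrative sketch of both cases (editor's example, not from the
patch; i8 width chosen for brevity):

  ; at least 3 leading zeros: x must be u< 2^5
  %z = call i8 @llvm.ctlz.i8(i8 %x, i1 false)
  %r = icmp ugt i8 %z, 2
  =>
  %r = icmp ult i8 %x, 32

  ; at least 3 trailing zeros: the low 3 bits must be clear
  %z = call i8 @llvm.cttz.i8(i8 %x, i1 false)
  %r = icmp ugt i8 %z, 2
  =>
  %m = and i8 %x, 7
  %r = icmp eq i8 %m, 0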

Differential Revision: https://reviews.llvm.org/D56355

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351645 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-19 09:56:01 +00:00
Chandler Carruth
6b547686c5 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-19 08:50:56 +00:00
Nikita Popov
a0b99f282a [InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check
Checking whether a number has a certain number of trailing / leading
zeros means checking whether it is of the form XXXX1000 / 0001XXXX,
which can be done with an and+icmp.
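
For example (editor's sketch on i8, not from the commit):

  ; exactly 3 trailing zeros, i.e. the form XXXX1000
  %z = call i8 @llvm.cttz.i8(i8 %x, i1 false)
  %r = icmp eq i8 %z, 3
  =>
  %m = and i8 %x, 15
  %r = icmp eq i8 %m, 8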

Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next
step, this can be extended to non-equality predicates.

Differential Revision: https://reviews.llvm.org/D55745

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349530 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-18 19:59:50 +00:00
Nikita Popov
387b5efaea [InstCombine] Fix negative GEP offset evaluation for 32-bit pointers
This fixes https://bugs.llvm.org/show_bug.cgi?id=39908.

The evaluateGEPOffsetExpression() function simplifies GEP offsets for
use in comparisons against zero, basically by converting X*Scale+Offset==0
to X+Offset/Scale==0 if Scale divides Offset. However, before this is done,
Offset is masked down to the pointer size. This produces incorrect
results for negative Offsets, because we end up dividing the 32-bit
offset *zero*-extended to 64 bits (rather than sign-extended).
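
A worked example of the failure mode (editor's numbers, assuming 32-bit
pointers and a 64-bit evaluation width):

  ; X*4 + (-4) == 0 should simplify to X == 1, i.e. X == -(-4)/4.
  ; But -4 masked to 32 bits is 0xFFFFFFFC; zero-extended to 64 bits
  ; that is 4294967292, and 4294967292/4 = 1073741823, so we would
  ; wrongly produce X == -1073741823 instead of X == 1.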

Fix this by explicitly sign extending the truncated value.

Differential Revision: https://reviews.llvm.org/D55449

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348987 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-12 23:19:03 +00:00
Roman Lebedev
261e911988 [InstCombine] foldICmpWithLowBitMaskedVal(): don't miscompile -1 vector elts
I was finally able to quantify what I thought was missing in the fix:
it was vector constants. If we have a scalar (and %x, -1),
it will be instsimplified before we reach this code,
but if it is a vector, we may still have a -1 element.

Thus, we want to avoid the fold if *at least one* element is -1.
In other words, ignoring the undef elements, no sign bits
should be set. Thus, m_NonNegative().

A follow-up for rL348181
https://bugs.llvm.org/show_bug.cgi?id=39861

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348462 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-06 08:14:24 +00:00
Sanjay Patel
d222be1e7a [InstCombine] simplify icmps with same operands based on dominating cmp
The tests here are based on the motivating cases from D54827.
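
A minimal sketch of the pattern (editor's example; names hypothetical):

  define i1 @example(i32 %x, i32 %y) {
  entry:
    %cmp1 = icmp sgt i32 %x, %y
    br i1 %cmp1, label %bb_true, label %bb_false
  bb_true:
    ; same operands, dominated by %cmp1 == true: sgt implies ne,
    ; so this folds to true
    %cmp2 = icmp ne i32 %x, %y
    ret i1 %cmp2
  bb_false:
    ret i1 false
  }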

More background:
1. We don't get these cases in general with SimplifyCFG because the root
   of the pattern match is an icmp, not a branch. I'm not sure how often
   we encounter this pattern vs. the seemingly more likely case with 
   branches, but I don't see evidence to leave the minimal pattern
   unoptimized.

2. This has a chance of increasing compile-time because we're using a
   ValueTracking call to handle the match. The motivating cases could be
   handled with a simpler pair of calls to isImpliedTrueByMatchingCmp/
   isImpliedFalseByMatchingCmp, but I saw that we have a more 
   comprehensive wrapper around those, so we might as well use it here
   unless there's evidence that it's significantly slower.

3. Ideally, we'd handle the fold to constants in InstSimplify, but as
   with the existing code here, we could extend this to handle cases
   where the result is not a constant, but a new combined predicate.
   That would mean splitting the logic across the 2 passes and possibly
   duplicating the pattern-matching cost.

4. As mentioned in D54827, this seems like the kind of thing that should
   be handled in Correlated Value Propagation, but that pass is currently
   limited to dealing with instructions with constant operands, so extending
   this bit of InstCombine is the smallest/easiest way to get these patterns 
   optimized.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348367 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-05 15:04:00 +00:00
Sanjay Patel
3d1c516e81 [InstCombine] rearrange foldICmpWithDominatingICmp; NFC
Move it out from under the constant check, reorder
predicates, add comments. This makes it easier to
extend to handle the non-constant case.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348284 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-04 17:44:24 +00:00
Sanjay Patel
09dac2e9e3 [InstCombine] add helper for icmp with dominator; NFC
There's a potential small enhancement to this code that could
solve the cases currently under proposal in D54827 via SimplifyCFG.

Whether instcombine should be doing this kind of semi-non-local
analysis in the first place is an open question, but separating
the logic out can only help if/when we decide to move it to a
different pass. 

AFAICT, any proposal to do this in SimplifyCFG could also be seen 
as an overreach + it would be incomplete to start the fold from a 
branch rather than an icmp.

There's another question here about the code for processUGT_ADDCST_ADD().
That part may be completely dead after rL234638?


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348273 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-04 15:35:17 +00:00
Roman Lebedev
d7774dcabf [InstCombine] foldICmpWithLowBitMaskedVal(): disable 2 faulty folds.
These two folds are invalid for this non-constant pattern
when the mask ends up being all-ones:
https://rise4fun.com/Alive/9au
https://rise4fun.com/Alive/UcQM

Fixes https://bugs.llvm.org/show_bug.cgi?id=39861

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348181 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-03 20:07:58 +00:00
Sanjay Patel
7a2b35fa20 [InstCombine] propagate FMF for fcmp+fabs folds
By morphing the instruction rather than deleting and creating a new one,
we retain fast-math-flags and potentially other metadata (profile info?).


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346331 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 16:15:01 +00:00
Sanjay Patel
3bfaa4d7be [InstCombine] peek through fabs() when checking isnan()
That should be the end of the missing cases for this fold.
See earlier patches in this series:
rL346321
rL346324
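
Illustration (editor's sketch): fabs(X) is NaN exactly when X is NaN,
so the fabs call can be looked through:

  %f = call float @llvm.fabs.f32(float %x)
  %r = fcmp uno float %f, 0.0   ; isnan(fabs(x))
  =>
  %r = fcmp uno float %x, 0.0   ; isnan(x)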


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346327 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 15:44:26 +00:00
Sanjay Patel
b3939a8dff [InstCombine] add folds for fcmp Pred fabs(X), 0.0
Similar to rL346321, we had folds for the ordered
versions of these compares already, so add the
unordered siblings for completeness.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346324 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 15:33:03 +00:00
Sanjay Patel
b9f324c261 [InstCombine] add fold for fabs(X) u< 0.0
The sibling fold for 'oge' --> 'ord' was already here,
but this half was missing. 

The result of fabs() must be positive or nan, so asking 
if the result is negative or nan is the same as asking 
if the result is nan.
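
In IR terms (editor's sketch):

  %f = call double @llvm.fabs.f64(double %x)
  %r = fcmp ult double %f, 0.0   ; "negative or nan"
  =>
  %r = fcmp uno double %x, 0.0   ; "nan"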

This is another step towards fixing:
https://bugs.llvm.org/show_bug.cgi?id=39475


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346321 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 15:11:32 +00:00
Sanjay Patel
be9ce13f15 [IR] add optional parameter for copying IR flags to compare instructions
As shown, this is used to eliminate redundant code in InstCombine,
and there are more cases where we should be using this pattern, but
we're currently unintentionally dropping flags. 


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346282 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 00:00:42 +00:00
Sanjay Patel
09d77c0de9 [InstCombine] allow vector types for fcmp+fpext fold
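
A sketch of the both-operands-extended case now handled for vectors
(editor's example; fpext is exact, so comparing the narrow values is
equivalent):

  %ex = fpext <2 x float> %x to <2 x double>
  %ey = fpext <2 x float> %y to <2 x double>
  %r = fcmp olt <2 x double> %ex, %ey
  =>
  %r = fcmp olt <2 x float> %x, %y
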
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346245 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-06 17:20:20 +00:00
Sanjay Patel
031587fccf [InstCombine] propagate fast-math-flags when folding fcmp+fpext, part 2
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346242 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-06 16:45:27 +00:00
Sanjay Patel
ac037d267f [InstCombine] rearrange code for fcmp+fpext; NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346241 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-06 16:37:35 +00:00
Sanjay Patel
6052aa3705 [InstCombine] propagate fast-math-flags when folding fcmp+fpext
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346240 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-06 16:23:03 +00:00
Sanjay Patel
d174746db8 [InstCombine] propagate fast-math-flags when folding fcmp+fneg, part 2
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346238 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-06 15:58:57 +00:00
Sanjay Patel
c52594aa97 [InstCombine] reduce code; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346235 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-06 15:53:58 +00:00
Sanjay Patel
bd1a44f2b8 [InstCombine] propagate fast-math-flags when folding fcmp+fneg
This is another part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

This might be enough to fix that particular issue, but as noted
with the FIXME, we're still dropping FMF on other folds around here.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346234 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-06 15:49:45 +00:00
Sanjay Patel
ab049d88fa [InstCombine] canonicalize -0.0 to +0.0 in fcmp
As stated in IEEE-754 and discussed in:
https://bugs.llvm.org/show_bug.cgi?id=38086
...the sign of zero does not affect any FP compare predicate.

Known regressions were fixed with:
rL346097 (D54001)
rL346143

The transform will help reduce pattern-matching complexity to solve:
https://bugs.llvm.org/show_bug.cgi?id=39475
...as well as improve CSE and codegen (a zero constant is almost always
easier to produce than 0x80..00).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346147 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-05 17:26:42 +00:00
Sanjay Patel
22968f72bd [InstCombine] refactor fabs+fcmp fold; NFC
Also, remove/replace/minimize/enhance the tests for this fold.
The code drops FMF, so it needs more tests and at least 1 fix.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345734 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-31 16:34:43 +00:00
Sanjay Patel
123a45feb6 [InstCombine] add assertion that InstSimplify has folded a fabs+fcmp; NFC
The 'OLT' case was updated at rL266175, so I assume it was just an
oversight that 'UGE' was not included because that patch handled
both predicates in InstSimplify.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345727 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-31 15:31:45 +00:00
Sanjay Patel
1ef057dce8 [InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative
This re-raises some of the open questions about how to apply and use fast-math-flags in IR from PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
...but given the current implementation (no FMF on casts), this is likely the only way to predicate the 
transform.

This is part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475

Differential Revision: https://reviews.llvm.org/D53874


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345725 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-31 14:57:23 +00:00
Sanjay Patel
ef97784739 [InstCombine] use 'match' to reduce code; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345647 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-30 20:52:25 +00:00
Sanjay Patel
7e79132257 [InstCombine] use getFltSemantics() instead of duplicating it; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345613 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-30 16:21:56 +00:00
Sanjay Patel
66a2c5ecaa [InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try
Re-trying r344082 because it unintentionally included extra diffs.

Original commit message:
icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be
regressions unless we add more trunc folds as we're doing here
for vectors.

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r
}

Here's a sampling of the vector codegen for that case using
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vandps  LCPI0_0(%rip), %xmm0, %xmm0
vxorps  %xmm1, %xmm1, %xmm1
vpcmpeqd        %xmm1, %xmm0, %xmm0
vblendvps       %xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vblendvps       %xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vpbroadcastd    LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd       %zmm1, %zmm0, %k1
vblendmps       %zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps        %xmm1, %xmm0, %xmm0
vpermilps       $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps       $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps   %xmm0, %xmm1, %xmm0
vpslld  $31, %xmm0, %xmm0
vptestmd        %zmm0, %zmm0, %k1
vblendmps       %zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge   v0.4s, v1.4s, v0.4s
zip1    v1.4s, v0.4s, v0.4s
zip2    v0.4s, v0.4s, v0.4s
orr     v0.16b, v1.16b, v0.16b
movi    v1.4s, #1
and     v0.16b, v0.16b, v1.16b
cmeq    v0.4s, v0.4s, #0
bsl     v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge   v0.4s, v1.4s, v0.4s
zip1    v1.4s, v0.4s, v0.4s
zip2    v0.4s, v0.4s, v0.4s
orr     v0.16b, v1.16b, v0.16b
bsl     v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344181 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-10 20:47:46 +00:00
Sanjay Patel
9f5daa2df0 revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization
This commit accidentally included the diffs from D53057.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344178 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-10 20:39:39 +00:00
Sanjay Patel
1dd3c06445 [InstCombine] reverse 'trunc X to <N x i1>' canonicalization
icmp ne (and X, 1), 0 --> trunc X to N x i1

Ideally, we'd do the same for scalars, but there will likely be 
regressions unless we add more trunc folds as we're doing here 
for vectors.

The motivating vector case is from PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549

define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
  %c = fcmp ole <4 x float> %x, %y
  %s = sext <4 x i1> %c to <4 x i32>
  %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
  %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
  %cond = or <4 x i32> %s1, %s2
  %condtr = trunc <4 x i32> %cond to <4 x i1>
  %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
  ret <4 x float> %r
}

Here's a sampling of the vector codegen for that case using 
mask+icmp (current behavior) vs. trunc (with this patch):

AVX before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vandps	LCPI0_0(%rip), %xmm0, %xmm0
vxorps	%xmm1, %xmm1, %xmm1
vpcmpeqd	%xmm1, %xmm0, %xmm0
vblendvps	%xmm0, %xmm3, %xmm2, %xmm0

AVX after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vblendvps	%xmm0, %xmm2, %xmm3, %xmm0

AVX512f before:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpbroadcastd	LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
vptestnmd	%zmm1, %zmm0, %k1
vblendmps	%zmm3, %zmm2, %zmm0 {%k1}

AVX512f after:

vcmpleps	%xmm1, %xmm0, %xmm0
vpermilps	$80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
vpermilps	$250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
vorps	%xmm0, %xmm1, %xmm0
vpslld	$31, %xmm0, %xmm0
vptestmd	%zmm0, %zmm0, %k1
vblendmps	%zmm2, %zmm3, %zmm0 {%k1}

AArch64 before:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
movi	v1.4s, #1
and	v0.16b, v0.16b, v1.16b
cmeq	v0.4s, v0.4s, #0
bsl	v0.16b, v3.16b, v2.16b

AArch64 after:

fcmge	v0.4s, v1.4s, v0.4s
zip1	v1.4s, v0.4s, v0.4s
zip2	v0.4s, v0.4s, v0.4s
orr	v0.16b, v1.16b, v0.16b
bsl	v0.16b, v2.16b, v3.16b

PowerPC-le before:

xvcmpgesp 34, 35, 34
vspltisw 0, 1
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxlxor 35, 35, 35
xxland 34, 0, 32
vcmpequw 2, 2, 3
xxsel 34, 36, 37, 34

PowerPC-le after:

xvcmpgesp 34, 35, 34
vmrglw 3, 2, 2
vmrghw 2, 2, 2
xxlor 0, 35, 34
xxsel 34, 37, 36, 0

Differential Revision: https://reviews.llvm.org/D52747



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344082 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-09 21:26:01 +00:00
Jesper Antonsson
6b99c7f259 [InstCombine] Handle vector compares in foldGEPIcmp(), take 2
Summary:
This is a continuation of the fix for PR34627 "InstCombine assertion at
vector gep/icmp folding". (I just realized bugpoint had fuzzed the
original test for me, so I had fixed another trigger of the same assert
in adjacent code in InstCombine.)

This patch avoids optimizing an icmp (to look only at the base pointers)
when the resulting icmp would have a different type.

The patch adds a testcase and also cleans up and shrinks the
pre-existing test for the adjacent assert trigger.

Reviewers: lebedev.ri, majnemer, spatel

Reviewed By: lebedev.ri

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52494

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343486 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-01 14:59:25 +00:00
Sanjay Patel
94a9f57861 [InstCombine] Without infinities, fold (C / X) < 0.0 --> (X < 0)
When C is not zero and infinities are not allowed, (C / X) > 0 is a
sign test. Depending on the sign of C, the predicate must be swapped.

E.g.:
  foo(double X) {
    if ((-2.0 / X) <= 0) ...
  }
 =>
  foo(double X) {
    if (X >= 0) ...
  }
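
In IR, with the no-infinities guarantee carried by fast-math flags
(editor's sketch; the flag placement is assumed):

  %d = fdiv ninf double -2.0, %x
  %r = fcmp ole double %d, 0.0
  =>
  %r = fcmp oge double %x, 0.0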

Patch by: @marels (Martin Elshuber)

Differential Revision: https://reviews.llvm.org/D51942


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343228 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-27 15:59:24 +00:00
Jesper Antonsson
a6fe72111f [InstCombine] Handle vector compares in foldGEPIcmp()
Summary:
This is to fix PR38984 "InstCombine assertion at vector gep/icmp folding":
https://bugs.llvm.org/show_bug.cgi?id=38984

Reviewers: majnemer, spatel, lattner, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D52263

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342647 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-20 13:37:28 +00:00
Roman Lebedev
7795456d9e [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((-1 << y) >> y) mask
Summary:
The last low-bit-mask-pattern-producing pattern I can think of.

https://rise4fun.com/Alive/UGzE <- non-canonical
But we cannot canonicalize it because of the extra uses.

https://bugs.llvm.org/show_bug.cgi?id=38123
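
A sketch of the shape being matched (editor's example; the eq case
folds to a ule, as in the existing foldICmpWithLowBitMaskedVal folds):

  %t0 = shl i8 -1, %y
  %m  = lshr i8 %t0, %y      ; low-bit mask: (-1 << y) >> y
  %t1 = and i8 %m, %x
  %r  = icmp eq i8 %t1, %x   ; are all set bits of %x inside the mask?
  =>
  %r  = icmp ule i8 %x, %m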

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52148

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342548 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-19 13:35:46 +00:00
Roman Lebedev
47c99e0a96 [InstCombine] foldICmpWithLowBitMaskedVal(): handle uncanonical ((1 << y)+(-1)) mask
Summary:
Same as D52146.
`((1 << y)+(-1))` is simply a non-canonical version of `~(-1 << y)`: https://rise4fun.com/Alive/0vl
We cannot canonicalize it due to the extra uses, but we can handle it here.

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52147

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342547 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-19 13:35:40 +00:00
Roman Lebedev
74e3c34a76 [InstCombine] foldICmpWithLowBitMaskedVal(): handle ~(-1 << y) mask
Summary:
Two folds are happening here:
1. https://rise4fun.com/Alive/oaFX
2. And then `foldICmpWithHighBitMask()` (D52001): https://rise4fun.com/Alive/wsP4

This change doesn't just add the handling for eq/ne predicates;
it actually builds upon the previous `foldICmpWithLowBitMaskedVal()` work,
so **all** the 16 fold variants* are immediately supported.

I'm indeed only testing these two predicates.
I do not feel like re-proving all 16 folds*, because they were already
proven for the general case of a constant with all-ones in the low bits.
So as long as the mask produces all-ones in the low bits, I'm pretty
sure the fold is valid.

But if required, I can re-prove them; let me know.

* eq/ne are commutative - 4 folds; ult/ule/ugt/uge are not commutative (the commuted variant is InstSimplified) - 4 folds; slt/sle/sgt/sge are not commutative - 4 folds. 12 folds in total.

https://bugs.llvm.org/show_bug.cgi?id=38123
https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52146

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342546 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-19 13:35:27 +00:00
Roman Lebedev
4c3db1c1e8 [InstCombine] Inefficient pattern for high-bits checking 3 (PR38708)
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only n bits wide (where n is a variable).
There are many ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations I haven't thought of...)

The last (as far as I know) pattern, non-canonical due to the extra use.
https://godbolt.org/z/aCMsPk
https://rise4fun.com/Alive/I6f

https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52062

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342321 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-15 12:04:13 +00:00
Roman Lebedev
ad7e5b0687 [InstCombine] Inefficient pattern for high-bits checking 2 (PR38708)
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only n bits wide (where n is a variable).
There are many ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations I haven't thought of...)

More complicated, canonical pattern:
https://rise4fun.com/Alive/uhA

We need two `switch()`es like this
so as not to mismatch the swappable predicates.

https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52001

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342173 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-13 20:33:12 +00:00
Roman Lebedev
b5eaa9c28c [InstCombine] Inefficient pattern for high-bits checking (PR38708)
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only `n` bits wide (where `n` is a variable).
There are **many** ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations I haven't thought of...)

Let's handle the second variant first, since it is much simpler.
https://rise4fun.com/Alive/LYjY

https://bugs.llvm.org/show_bug.cgi?id=38708

Reviewers: spatel, craig.topper, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51985

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342067 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-12 18:19:43 +00:00
Sanjay Patel
b74213ab8d [InstCombine] add folds for unsigned-overflow compares
Name: op_ugt_sum
  %a = add i8 %x, %y
  %r = icmp ugt i8 %x, %a
  =>
  %notx = xor i8 %x, -1
  %r = icmp ugt i8 %y, %notx

Name: sum_ult_op
  %a = add i8 %x, %y
  %r = icmp ult i8 %a, %x
  =>
  %notx = xor i8 %x, -1
  %r = icmp ugt i8 %y, %notx

https://rise4fun.com/Alive/ZRxI

AFAICT, this doesn't interfere with any add-saturation patterns
because those have >1 use for the 'add'. But this should be
better for IR analysis and codegen in the basic cases.

This is another fold inspired by PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342004 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-11 22:40:20 +00:00
Sanjay Patel
da5e387562 [InstCombine] add folds for icmp with xor mask constant
These are the folds in Alive;
Name: xor_ult
Pre: isPowerOf2(-C1)
%xor = xor i8 %x, C1
%r = icmp ult i8 %xor, C1
=>
%r = icmp ugt i8 %x, ~C1

Name: xor_ugt
Pre: isPowerOf2(C1+1)
%xor = xor i8 %x, C1
%r = icmp ugt i8 %xor, C1
=>
%r = icmp ugt i8 %x, C1

https://rise4fun.com/Alive/Vty

The ugt case in its simplest form was already handled by DemandedBits,
but that's not ideal as shown in the multi-use test.

I'm not sure if these are all of the symmetrical folds, but I adjusted 
the existing code for one of the folds to try to show the similarities.

There's no obvious connection, but this is another preliminary step 
for PR14613...
https://bugs.llvm.org/show_bug.cgi?id=14613


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341997 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-11 22:00:15 +00:00
Tim Northover
0d018d17b3 InstCombine: move hasOneUse check to the top of foldICmpAddConstant
There were two combines not covered by the check before now, neither of which
actually differed from normal in the benefit analysis.

The most recent seems to be because it was just added at the top of the
function (naturally). The older is from way back in 2008 (r46687) when we just
didn't put those checks in so routinely, and has been diligently maintained
since.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341831 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-10 14:26:44 +00:00
Nicola Zaghen
4cf831c0ee [InstCombine] Fold icmp ugt/ult (add nuw X, C2), C --> icmp ugt/ult X, (C - C2)
Support for sgt/slt was added in rL294898; this adds the same cases for unsigned compares.

This is the Alive proof: https://rise4fun.com/Alive/nyY
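
For instance (editor's sketch): nuw rules out wrapping, so the constant
can be moved across the add:

  %a = add nuw i8 %x, 5
  %r = icmp ugt i8 %a, 20
  =>
  %r = icmp ugt i8 %x, 15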

Differential Revision: https://reviews.llvm.org/D50972



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341353 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-04 10:29:48 +00:00
Craig Topper
d43029ed51 [InstCombine] Add splat vector constant support to foldICmpAddOpConst.
Differential Revision: https://reviews.llvm.org/D50946

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340231 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-20 23:04:25 +00:00
Sanjay Patel
7edab834d4 [InstCombine] move vector compare before same-shuffled ops
This is a step towards fixing PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463
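
A sketch of the reordering (editor's example): when both compare
operands are shuffled by the same mask, compare first and shuffle the
result instead:

  %sx = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %sy = shufflevector <4 x i32> %y, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %r  = icmp slt <4 x i32> %sx, %sy
  =>
  %c  = icmp slt <4 x i32> %x, %y
  %r  = shufflevector <4 x i1> %c, <4 x i1> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>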


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339875 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-16 12:52:17 +00:00
Matt Arsenault
0904319a62 ValueTracking: Start enhancing isKnownNeverNaN
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339399 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-09 22:40:08 +00:00
Roman Lebedev
1cd41cb270 [InstCombine] Re-commit: Fold 'check for [no] signed truncation' pattern
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for the 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by the Implicit Integer Truncation sanitizer
(https://reviews.llvm.org/D48958, https://bugs.llvm.org/show_bug.cgi?id=21530)
in the signed case, so it is probably a good idea to improve it.
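
For example, checking whether an i32 value survives truncation to i8
and sign-extension back (editor's sketch; the add/icmp constants are
2^(n-1) and 2^n for n = 8):

  %s = shl i32 %x, 24
  %t = ashr i32 %s, 24
  %r = icmp eq i32 %t, %x    ; does %x fit in a signed i8?
  =>
  %a = add i32 %x, 128
  %r = icmp ult i32 %a, 256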

The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266

This transform is surprisingly frustrating.
This does not deal with non-splat shift amounts, or with undef shift amounts.
I've outlined what I think the solution should be:
```
  // Potential handling of non-splats: for each element:
  //  * if both are undef, replace with constant 0.
  //    Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
  //  * if both are not undef, and are different, bailout.
  //  * else, only one is undef, then pick the non-undef one.
```

This is a re-commit: the original patch, committed in rL337190,
was reverted in rL337344 as it broke the Chromium build:
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832
Proofs that the fixed folds are ok: https://rise4fun.com/Alive/VYM

Differential Revision: https://reviews.llvm.org/D49320

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337376 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-18 10:55:17 +00:00
Bob Haarman
adf4ac8b22 Revert "[InstCombine] Fold 'check for [no] signed truncation' pattern"
This reverts r337190 (and a few follow-up commits), which caused the
Chromium build to fail. See
https://bugs.llvm.org/show_bug.cgi?id=38204 and
https://crbug.com/864832

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337344 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-18 02:18:28 +00:00
Simon Pilgrim
15fa57ae79 Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337257 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-17 09:39:55 +00:00
Roman Lebedev
a5425a350e [InstCombine] Fold 'check for [no] signed truncation' pattern
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for the 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by the Implicit Integer Truncation sanitizer
(https://reviews.llvm.org/D48958, https://bugs.llvm.org/show_bug.cgi?id=21530)
in the signed case, so it is probably a good idea to improve it.

Proofs for this transform: https://rise4fun.com/Alive/mgu
This transform is surprisingly frustrating.
This does not deal with non-splat shift amounts, or with undef shift amounts.
I've outlined what I think the solution should be:
```
  // Potential handling of non-splats: for each element:
  //  * if both are undef, replace with constant 0.
  //    Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
  //  * if both are not undef, and are different, bailout.
  //  * else, only one is undef, then pick the non-undef one.
```

The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266

Reviewers: spatel, craig.topper

Reviewed By: spatel

Subscribers: JDevlieghere, rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D49320

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337190 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-16 16:45:42 +00:00