954 Commits

Author SHA1 Message Date
Craig Topper
301db41084 [SimplifyDemandedBits] Use APInt::intersects to instead of ANDing and comparing to 0 separately. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@372158 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-17 18:19:02 +00:00
Simon Pilgrim
88ae6d603d [TargetLowering] SimplifyDemandedBits - add EXTRACT_SUBVECTOR support.
Call SimplifyDemandedBits on the source vector.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371923 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-14 16:38:26 +00:00
Philip Reames
11df0bc741 [SDAG] Update generic code to conservatively check for isAtomic in addition to isVolatile
This is the first sweep of generic code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. That will come later.  See D66309 for context.

Differential Revision: https://reviews.llvm.org/D66318



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371786 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-12 22:49:17 +00:00
Tim Northover
43f94c59dd GlobalISel: add combiner to form indexed loads.
Loosely based on DAGCombiner version, but this part is slightly simpler in
GlobalIsel because all address calculation is performed by G_GEP. That makes
the inc/dec distinction moot so there's just pre/post to think about.

No targets can handle it yet so testing is via a special flag that overrides
target hooks.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371384 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-09 10:04:23 +00:00
Bjorn Pettersson
fb1c8990ae [CodeGen] Handle SMULFIXSAT with scale zero in TargetLowering::expandFixedPointMul
Summary:
Normally TargetLowering::expandFixedPointMul would handle
SMULFIXSAT with scale zero by using an SMULO to compute the
product and determine if saturation is needed (if overflow
happened). But if SMULO isn't custom/legal it falls through
and uses the same technique, using MULHS/SMUL_LOHI, as used
for non-zero scales.

Problem was that when checking for overflow (handling saturation)
when not using MULO we did not expect to find a zero scale. So
we ended up in an assertion when doing
  APInt::getLowBitsSet(VTSize, Scale - 1)

This patch fixes the problem by adding a new special case for
how saturation is computed when scale is zero.

Reviewers: RKSimon, bevinh, leonardchan, spatel

Reviewed By: RKSimon

Subscribers: wuzish, nemanjai, hiraditya, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67071

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371309 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-07 12:16:23 +00:00
Bjorn Pettersson
2d0b4264f3 [Intrinsic] Add the llvm.umul.fix.sat intrinsic
Summary:
Add an intrinsic that takes 2 unsigned integers with
the scale of them provided as the third argument and
performs fixed point multiplication on them. The
result is saturated and clamped between the largest and
smallest representable values of the first 2 operands.

This is a part of implementing fixed point arithmetic
in clang where some of the more complex operations
will be implemented as intrinsics.

Patch by: leonardchan, bjope

Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel

Reviewed By: leonardchan

Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57836

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371308 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-07 12:16:14 +00:00
Shiva Chen
f0b04e6f72 [TargetLowering] Fix Bugzilla ID 43183 to avoid soften comparison broken with constant inputs
Summary:
  This fixes the bugzilla id 43183 which triggerd by the following commit:
  [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370604 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-01 04:52:54 +00:00
Simon Pilgrim
5bfcd8dbf0 [TargetLowering] SimplifyDemandedBits ADD/SUB/MUL - correctly inherit SDNodeFlags from the original node.
Just disable NSW/NUW flags. This matches what we're already doing for the other situations for these nodes, it was just missed for the demanded constant case.

Noticed by inspection - confirmed in offline discussion with @spatel. I've checked we have test coverage in the x86 extract-bits.ll and extract-lowbits.ll tests

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370497 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-30 17:58:55 +00:00
Shiva Chen
3767f18861 [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall
The patch fixed the issue that RV64 didn't clear the upper bits
when return complex floating value with lp64 ABI.

float _Complex
complex_add(float _Complex a, float _Complex b)
{
   return a + b;
}

RealResult = zero_extend(RealA + RealB)
ImageResult = ImageA + ImageB
Return (RealResult | (ImageResult << 32))

The patch introduces shouldExtendTypeInLibCall target hook to suppress
the AssertZext generation when lowering floating LibCall.

Thanks to Eli's comments from the Bugzilla
https://bugs.llvm.org/show_bug.cgi?id=42820

Differential Revision: https://reviews.llvm.org/D65497

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370275 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-28 23:40:37 +00:00
Kevin P. Neal
dda5f16734 [FPEnv] Add fptosi and fptoui constrained intrinsics.
This implements constrained floating point intrinsics for FP to signed and
unsigned integers.

Quoting from D32319:
The purpose of the constrained intrinsics is to force the optimizer to
respect the restrictions that will be necessary to support things like the
STDC FENV_ACCESS ON pragma without interfering with optimizations when
these restrictions are not needed.

Reviewed by:	Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton
Approved by:	Craig Topper
Differential Revision:	http://reviews.llvm.org/D63782


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370228 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-28 16:33:36 +00:00
Amaury Sechet
77ff67796a [TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles
Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicit check for both option, but other places do not when they would benefit from doing it. This patches refactor the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66804

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370190 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-28 12:00:06 +00:00
Craig Topper
4e272f536e [SelectionDAG][X86] Enable iX SimplifyDemandedBits to vXi1 SimplifyDemandedVectorElts simplification. Add a hack to X86 to avoid a regression
Patch showing the effect of enabling bool vector oversimplification.

Non-VLX builds can simplify a kshift shuffle, but VLX builds simplify:

insert_subvector v8i zeroinitializer, v2i --> insert_subvector v8i undef, v2i

Preventing the removal of the AND to clear the upper bits of result

Differential Revision: https://reviews.llvm.org/D53022

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369780 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-23 17:14:58 +00:00
Shiva Chen
bbb2620939 [TargetLowering] Remove optional arguments passing to makeLibCall
The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497.
The struct contain argument flags which will pass to makeLibCall function.
The patch should not has any functionality changes.

Differential Revision: https://reviews.llvm.org/D65795

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369622 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-22 04:59:43 +00:00
Roman Lebedev
5863423dec [TargetLowering] x s% C == 0 fold: vector divisor with INT_MIN handling
Summary:
The general fold is only valid for positive divisors.
Which effectively means, it is invalid for `INT_MIN` divisors,
and we currently bailout if we see them.

But that is too strict, we can just fix-up the results.
For that, let's do a second computation 'in parallel':
```
Name: srem -> and
Pre: isPowerOf2(C)
%o = srem i8 %X, C
%r = icmp eq %o, 0
  =>
%n = and i8 %X, C-1
%r = icmp eq %n, 0
```
https://rise4fun.com/Alive/Sup

And then just blend results: if the divisor was `INT_MIN`,
pick the value we got via bit-test,
else pick the value from general fold.

There's interesting observation - `ISD::ROTR` is set to
`LegalizeAction::Expand` before AVX512, so we should not
treat `INT_MIN` divisor as even; and as it can be seen
while `@test_srem_odd_even_one` improves on all run-lines,
`@test_srem_odd_even_INT_MIN` only improves for AVX512.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66300

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369268 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-19 15:01:42 +00:00
Daniel Sanders
57a8129407 Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369041 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-15 19:22:08 +00:00
Simon Pilgrim
5d376fa7ac Remove BitVector.h include. NFCI.
BitVector type isn't used at all in the cpp file.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369007 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-15 14:39:28 +00:00
Roman Lebedev
c285223daf [CodeGen][SelectionDAG] More efficient code for X % C == 0 (SREM case)
Summary:
This implements an optimization described in Hacker's Delight 10-17:
when `C` is constant, the result of `X % C == 0` can be computed
more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

One huge caveat: this signed case is only valid for positive divisors.

While we can freely negate negative divisors, we can't negate `INT_MIN`,
so for now if `INT_MIN` is encountered, we bailout.
As a follow-up, it should be possible to handle that more gracefully
via extra `and`+`setcc`+`select`.

This passes llvm's test-suite, and from cursory(!) cross-examination
the folds (the assembly) match those of GCC, and manual checking via alive
did not reveal any issues (other than the `INT_MIN` case)

Reviewers: RKSimon, spatel, hermord, craig.topper, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: xbolva00, thakis, javed.absar, hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65366

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368702 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-13 14:57:37 +00:00
Roman Lebedev
e7279f838c [TargetLowering][NFC] prepareUREMEqFold(): fixup comment
The comment initially matched the code, but the code was incorrect
and was fixed after the initial revert back back when it was introduced,
but the comment was never updated.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368701 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-13 14:57:08 +00:00
Hans Wennborg
3b17e47a11 Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT"
This introduced a false positive MemorySanitizer warning about use of
uninitialized memory in a vectorized crc function in Chromium. That suggests
maybe something is not right with this transformation. See
https://crbug.com/992853#c7 for a reproducer.

This also reverts the follow-up commits r368307 and r368308 which
depended on this.

> This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.
>
> In particular this helps remove some unnecessary scalar->vector->scalar patterns.
>
> The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.
>
> Differential Revision: https://reviews.llvm.org/D65887

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368660 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-13 09:33:25 +00:00
Simon Pilgrim
1d838d593f [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::TRUNCATE
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368553 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-12 10:56:05 +00:00
Simon Pilgrim
621a67330b [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT
This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.

In particular this helps remove some unnecessary scalar->vector->scalar patterns.

The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.

Differential Revision: https://reviews.llvm.org/D65887

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368276 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-08 10:37:03 +00:00
Simon Pilgrim
25608372b0 [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::VECTOR_SHUFFLE
In particular this helps the SSE vector shift cvttps2dq+add+shl pattern by avoiding the need for zeros in shuffle style extensions to vXi32 types as we'll be shifting out those bits anyway

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368155 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-07 11:43:13 +00:00
Aditya Nandakumar
215335e026 [GISel]: Add GISelKnownBits analysis
https://reviews.llvm.org/D65698

This adds a KnownBits analysis pass for GISel. This was done as a
pass (compared to static functions) so that we can add other features
such as caching queries(within a pass and across passes) in the future.
This patch only adds the basic pass boiler plate, and implements a lazy
non caching knownbits implementation (ported from SelectionDAG). I've
also hooked up the AArch64PreLegalizerCombiner pass to use this - there
should be no compile time regression as the analysis is lazy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368065 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-06 17:18:29 +00:00
Simon Pilgrim
474f996554 [TargetLowering] SimplifyMultipleUseDemandedBits - return UNDEF for undemanded ops
If we demand no bits/elts from an Op, just return UNDEF

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368043 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-06 14:30:42 +00:00
Craig Topper
d071b0806d [TargetLowering][X86] Teach SimplifyDemandedVectorElts to replace the base vector of INSERT_SUBVECTOR with undef if none of the elements are demanded even if the node has other users.
Summary:
The SimplifyDemandedVectorElts function can replace with undef
when no elements are demanded, but due to how it interacts with
TargetLoweringOpts, it can only do this when the node has
no other users.

Remove a now unneeded DAG combine from the X86 backend.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65713

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367788 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-04 17:30:41 +00:00
Bill Wendling
30c0b52ad3 Emit diagnostic if an inline asm constraint requires an immediate
Summary:
An inline asm call can result in an immediate after inlining. Therefore emit a
diagnostic here if constraint requires an immediate but one isn't supplied.

Reviewers: joerg, mgorny, efriedma, rsmith

Reviewed By: joerg

Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60942

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367750 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-03 05:52:47 +00:00
Simon Pilgrim
296ab4b2ad [TargetLowering] SimplifyMultipleUseDemandedBits - don't assume INSERT_VECTOR_ELT value type is simple.
Noticed by inspection - this was copied from the X86 target equivalent where we can assume its legal/simple.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367721 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-02 21:07:07 +00:00
Simon Pilgrim
20c7527943 [TargetLowering] SimplifyMultipleUseDemandedBits - Add ISD::INSERT_VECTOR_ELT handling
Allow us to peek through vector insertions to avoid dependencies on entire insertion chains.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367588 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-01 17:46:44 +00:00
Simon Pilgrim
803ac756ab [TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support (Reapplied)
This allows us to peek through BITCASTs, attempt to simplify the source operand, and then bitcast back.

This reapplies rL367091 which was reverted at rL367118 - we were inconsistently peeking through the bitcasts to the source value.

Fixes PR42777

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367174 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-27 14:11:59 +00:00
Simon Pilgrim
e1534e4ff4 [SelectionDAG] Check for any recursion depth greater than or equal to limit instead of just equal the limit.
If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit.

This replaces the limit check (Depth == 6) with a (Depth >= 6) to make sure that we don't circumvent it. 

This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others....) - I'll see what I can do in future patches.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367171 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-27 12:48:46 +00:00
Simon Pilgrim
5972aca6be [TargetLowering] Add depth limit to SimplifyMultipleUseDemandedBits
We're getting reports of massive compile time increases because SimplifyMultipleUseDemandedBits was losing track of the depth and not earlying-out. No repro yet, but consider this a pre-emptive commit.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367169 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-27 12:23:36 +00:00
Nico Weber
37656fef28 Revert r367091, it caused PR42777.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367118 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-26 14:58:42 +00:00
Simon Pilgrim
36e90d3860 [TargetLowering] SimplifyMultipleUseDemandedBits - add SIGN_EXTEND_INREG support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367096 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-26 09:41:08 +00:00
Simon Pilgrim
b8ff25a954 [TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support.
This allows us to peek through BITCASTs and attempt simplify the source operand, and then bitcast back.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367091 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-26 08:38:39 +00:00
Roman Lebedev
1dea4cd1d3 [Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold
Summary:
This was originally reported in D62818.
https://rise4fun.com/Alive/oPH

InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression
will be hoisted out of a loop if `Y` is invariant and `X` is not.
But as it is seen from the diffs here, if it didn't get hoisted,
the produced assembly is almost universally worse.

Much like with my recent "hoist add/sub by/from const" patches,
we should get almost universal win if we hoist constant,
there is almost always an "and/test by imm" instruction,
but "shift of imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by constant, but by something else,
the live-range of that something else may reduce.

Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit`
instruction pattern. And to not get into endless combine loop.

Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm

Reviewed By: spatel

Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62871

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366955 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-24 22:57:22 +00:00
Sanjay Patel
3109dd325b [SDAG] convert (sub x, 1) to (add x, -1) in ctpop expansion; NFC
We canonicalize to the add form, so create that directly for efficiency.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366914 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-24 15:43:50 +00:00
Simon Pilgrim
c6f823d406 [TargetLowering] SimplifyMultipleUseDemandedBits - add VECTOR_SHUFFLE support.
If all the demanded elts are from one operand and are inline, then we can use the operand directly.

The changes are mainly from SSE41 targets which has blendvpd but not cmpgtq, allowing the v2i64 comparison to be simplified as we only need the signbit from alternate v4i32 elements.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366817 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-23 15:35:55 +00:00
Simon Pilgrim
c0664972a0 [TargetLowering] Add SimplifyMultipleUseDemandedBits
This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.

The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.

We do see a couple of regressions that need to be addressed:

    AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
	
    X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.

The code owners have confirmed its ok for these cases to fixed up in future patches.

Differential Revision: https://reviews.llvm.org/D63281

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366799 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-23 12:39:08 +00:00
Roman Lebedev
5264ba23f2 [Codegen][SelectionDAG] X u% C == 0 fold: non-splat vector improvements
Summary:
Four things here:
1. Generalize the fold to handle non-splat divisors. Reasonably trivial.
2. Unban power-of-two divisors. I don't see any reason why they should
   be illegal.
   * There is no ban in Hacker's Delight
   * I think the ban came from the same bug that caused the miscompile
      in the base patch - in `floor((2^W - 1) / D)` we were dividing by
      `D0` instead of `D`, and we **were** ensuring that `D0` is not `1`,
      which made sense.
3. Unban `1` divisors. I no longer believe Hacker's Delight actually says
   that the fold is invalid for `D = 0`. Further considerations:
   * We know that
     * `(X u% 1) == 0`  can be constant-folded to `1`,
     * `(X u% 1) != 0`  can be constant-folded to `0`,
   *  Also, we know that
     * `X u<= -1` can be constant-folded to `1`,
     * `X u>  -1` can be constant-folded to `0`,
   * https://godbolt.org/z/7jnZJX https://rise4fun.com/Alive/oF6p
   * We know will end up with the following:
       `(setule/setugt (rotr (mul N, P), K), Q)`
   * Therefore, for given new DAG nodes and comparison predicates
     (`ule`/`ugt`), we will still produce the correct answer if:
     `Q` is a all-ones constant; and both `P` and `K` are *anything*
     other than `undef`.
   * The fold will indeed produce `Q = all-ones`.
4. Try to re-splat the `P` and `K` vectors - we don't care about
   their values for the lanes where divisor was `1`.

Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00

Reviewed By: RKSimon

Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63963

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366637 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-20 16:33:15 +00:00
Sanjay Patel
4b3ebe8332 [SDAG] commute setcc operands to match a subtract
If we have:

R = sub X, Y
P = cmp Y, X

...then flipping the operands in the compare instruction can allow using a subtract that sets compare flags.

Motivated by diffs in D58875 - not sure if this changes anything there,
but this seems like a good thing independent of that.

There's a more involved version of this transform already in IR (in instcombine
although that seems misplaced to me) - see "swapMayExposeCSEOpportunities()".

Differential Revision: https://reviews.llvm.org/D63958

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365711 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-10 23:23:54 +00:00
Nick Desaulniers
7cf1e3f0f2 [TargetLowering] support BlockAddress as "i" inline asm constraint
Summary:
This allows passing address of labels to inline assembly "i" input
constraints.

Fixes pr/42502.

Reviewers: ostannard

Reviewed By: ostannard

Subscribers: void, echristo, nathanchance, ostannard, javed.absar, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64167

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365664 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-10 17:08:25 +00:00
Simon Pilgrim
a9563933b2 [TargetLowering] SimplifyDemandedBits - just call computeKnownBits for BUILD_VECTOR cases.
Don't do this locally, computeKnownBits does this better (and can handle non-constant cases as well).

A next step would be to actually simplify non-constant elements - building on what we already do in SimplifyDemandedVectorElts.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365309 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-08 11:00:39 +00:00
Roman Lebedev
7df05c88b0 [NFC][TargetLowering] Some preparatory cleanups around 'prepareUREMEqFold()' from D63963
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364921 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-02 13:21:23 +00:00
Benjamin Kramer
0ca3c92d55 [SelectionDAG] Do minnum->minimum at legalization time instead of building time
The SDAGBuilder behavior stems from the days when we didn't have fast
math flags available in SDAG. We do now and doing the transformation in
the legalizer has the advantage that it also works for vector types.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364743 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-01 11:00:23 +00:00
Roman Lebedev
65afd3a0fd [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3)
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...

This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

This is a recommit, the original commit rL364563 was reverted in rL364568
because test-suite detected miscompile - the new comparison constant 'Q'
was being computed incorrectly (we divided by `D0` instead of `D`).

Original patch D50222 by @hermord (Dmytro Shynkevych)

Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
  This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
  the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
  I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.

Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63391

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364600 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 21:52:10 +00:00
Roman Lebedev
1df452bb5a Revert "[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)"
*Appears* to break test-suite on
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/23790

FAIL: burg.execution_time
FAIL: spiff.execution_time
FAIL: employ.execution_time
FAIL: llu.execution_time
FAIL: gramschmidt.execution_time
FAIL: fdtd-apml.execution_time

This reverts commit r364563.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364568 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 17:22:31 +00:00
Roman Lebedev
84139109d4 [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
Original patch D50222 by @hermord (Dmytro Shynkevych)

This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

Original patch author: @hermord (Dmytro Shynkevych)!

Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
  This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
  the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
  I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.

Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: xbolva00, javed.absar, llvm-commits, hermord

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63391

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364563 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 16:45:42 +00:00
Simon Pilgrim
39f0e3cf18 [TargetLowering] SimplifyDemandedVectorElts - add shift/rotate support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364548 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 14:25:54 +00:00
Simon Pilgrim
c2a0046960 [TargetLowering] SimplifyDemandedBits - use DemandedElts to better identify partial splat shift amounts
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364541 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-27 13:48:43 +00:00
Sanjay Patel
4ad4edc530 [SDAG] expand ctpop != 1
Change the generic ctpop expansion to more efficiently handle a
check for not-a-power-of-two value:
(ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)

This is the inverted predicate sibling pattern that was added with:
D63004

This should have been done before I changed IR canonicalization to
favor this form with:
rL364246
...so if this requires revert/changing, the earlier commit may also
need to modified.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364319 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-25 14:46:52 +00:00