RPCS3/llvm

mirror of https://github.com/RPCS3/llvm.git synced 2025-05-16 02:16:23 +00:00

Author	SHA1	Message	Date
Craig Topper	301db41084	[SimplifyDemandedBits] Use APInt::intersects instead of ANDing and comparing to 0 separately. NFC git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@372158 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-17 18:19:02 +00:00
Simon Pilgrim	88ae6d603d	[TargetLowering] SimplifyDemandedBits - add EXTRACT_SUBVECTOR support. Call SimplifyDemandedBits on the source vector. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371923 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-14 16:38:26 +00:00
Philip Reames	11df0bc741	[SDAG] Update generic code to conservatively check for isAtomic in addition to isVolatile This is the first sweep of generic code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. That will come later. See D66309 for context. Differential Revision: https://reviews.llvm.org/D66318 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371786 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-12 22:49:17 +00:00
Tim Northover	43f94c59dd	GlobalISel: add combiner to form indexed loads. Loosely based on DAGCombiner version, but this part is slightly simpler in GlobalIsel because all address calculation is performed by G_GEP. That makes the inc/dec distinction moot so there's just pre/post to think about. No targets can handle it yet so testing is via a special flag that overrides target hooks. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371384 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-09 10:04:23 +00:00
Bjorn Pettersson	fb1c8990ae	[CodeGen] Handle SMULFIXSAT with scale zero in TargetLowering::expandFixedPointMul Summary: Normally TargetLowering::expandFixedPointMul would handle SMULFIXSAT with scale zero by using an SMULO to compute the product and determine if saturation is needed (if overflow happened). But if SMULO isn't custom/legal it falls through and uses the same technique, using MULHS/SMUL_LOHI, as used for non-zero scales. Problem was that when checking for overflow (handling saturation) when not using MULO we did not expect to find a zero scale. So we ended up in an assertion when doing APInt::getLowBitsSet(VTSize, Scale - 1) This patch fixes the problem by adding a new special case for how saturation is computed when scale is zero. Reviewers: RKSimon, bevinh, leonardchan, spatel Reviewed By: RKSimon Subscribers: wuzish, nemanjai, hiraditya, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67071 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371309 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-07 12:16:23 +00:00
Bjorn Pettersson	2d0b4264f3	[Intrinsic] Add the llvm.umul.fix.sat intrinsic Summary: Add an intrinsic that takes 2 unsigned integers with the scale of them provided as the third argument and performs fixed point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first 2 operands. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Patch by: leonardchan, bjope Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel Reviewed By: leonardchan Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57836 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371308 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-07 12:16:14 +00:00
Shiva Chen	f0b04e6f72	[TargetLowering] Fix Bugzilla ID 43183 to avoid soften comparison broken with constant inputs Summary: This fixes Bugzilla ID 43183, which was triggered by the following commit: [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370604 91177308-0d34-0410-b5e6-96231b3b80d8	2019-09-01 04:52:54 +00:00
Simon Pilgrim	5bfcd8dbf0	[TargetLowering] SimplifyDemandedBits ADD/SUB/MUL - correctly inherit SDNodeFlags from the original node. Just disable NSW/NUW flags. This matches what we're already doing for the other situations for these nodes, it was just missed for the demanded constant case. Noticed by inspection - confirmed in offline discussion with @spatel. I've checked we have test coverage in the x86 extract-bits.ll and extract-lowbits.ll tests git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370497 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-30 17:58:55 +00:00
Shiva Chen	3767f18861	[RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall The patch fixed the issue that RV64 didn't clear the upper bits when return complex floating value with lp64 ABI. float _Complex complex_add(float _Complex a, float _Complex b) { return a + b; } RealResult = zero_extend(RealA + RealB) ImageResult = ImageA + ImageB Return (RealResult | (ImageResult << 32)) The patch introduces shouldExtendTypeInLibCall target hook to suppress the AssertZext generation when lowering floating LibCall. Thanks to Eli's comments from the Bugzilla https://bugs.llvm.org/show_bug.cgi?id=42820 Differential Revision: https://reviews.llvm.org/D65497 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370275 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-28 23:40:37 +00:00
Kevin P. Neal	dda5f16734	[FPEnv] Add fptosi and fptoui constrained intrinsics. This implements constrained floating point intrinsics for FP to signed and unsigned integers. Quoting from D32319: The purpose of the constrained intrinsics is to force the optimizer to respect the restrictions that will be necessary to support things like the STDC FENV_ACCESS ON pragma without interfering with optimizations when these restrictions are not needed. Reviewed by: Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton Approved by: Craig Topper Differential Revision: http://reviews.llvm.org/D63782 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370228 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-28 16:33:36 +00:00
Amaury Sechet	77ff67796a	[TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicitly check for both options, but other places do not when they would benefit from doing it. This patch refactors the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66804 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370190 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-28 12:00:06 +00:00
Craig Topper	4e272f536e	[SelectionDAG][X86] Enable iX SimplifyDemandedBits to vXi1 SimplifyDemandedVectorElts simplification. Add a hack to X86 to avoid a regression Patch showing the effect of enabling bool vector oversimplification. Non-VLX builds can simplify a kshift shuffle, but VLX builds simplify: insert_subvector v8i zeroinitializer, v2i --> insert_subvector v8i undef, v2i Preventing the removal of the AND to clear the upper bits of result Differential Revision: https://reviews.llvm.org/D53022 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369780 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-23 17:14:58 +00:00
Shiva Chen	bbb2620939	[TargetLowering] Remove optional arguments passing to makeLibCall The patch introduces the MakeLibCallOptions struct as suggested by @efriedma on D65497. The struct contains argument flags which are passed to the makeLibCall function. The patch should not have any functionality changes. Differential Revision: https://reviews.llvm.org/D65795 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369622 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-22 04:59:43 +00:00
Roman Lebedev	5863423dec	[TargetLowering] x s% C == 0 fold: vector divisor with INT_MIN handling Summary: The general fold is only valid for positive divisors. Which effectively means, it is invalid for `INT_MIN` divisors, and we currently bailout if we see them. But that is too strict, we can just fix-up the results. For that, let's do a second computation 'in parallel': ``` Name: srem -> and Pre: isPowerOf2(C) %o = srem i8 %X, C %r = icmp eq %o, 0 => %n = and i8 %X, C-1 %r = icmp eq %n, 0 ``` https://rise4fun.com/Alive/Sup And then just blend results: if the divisor was `INT_MIN`, pick the value we got via bit-test, else pick the value from general fold. There's interesting observation - `ISD::ROTR` is set to `LegalizeAction::Expand` before AVX512, so we should not treat `INT_MIN` divisor as even; and as it can be seen while `@test_srem_odd_even_one` improves on all run-lines, `@test_srem_odd_even_INT_MIN` only improves for AVX512. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66300 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369268 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-19 15:01:42 +00:00
Daniel Sanders	57a8129407	Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369041 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-15 19:22:08 +00:00
Simon Pilgrim	5d376fa7ac	Remove BitVector.h include. NFCI. BitVector type isn't used at all in the cpp file. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369007 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-15 14:39:28 +00:00
Roman Lebedev	c285223daf	[CodeGen][SelectionDAG] More efficient code for X % C == 0 (SREM case) Summary: This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. One huge caveat: this signed case is only valid for positive divisors. While we can freely negate negative divisors, we can't negate `INT_MIN`, so for now if `INT_MIN` is encountered, we bailout. As a follow-up, it should be possible to handle that more gracefully via extra `and`+`setcc`+`select`. This passes llvm's test-suite, and from cursory(!) cross-examination the folds (the assembly) match those of GCC, and manual checking via alive did not reveal any issues (other than the `INT_MIN` case) Reviewers: RKSimon, spatel, hermord, craig.topper, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: xbolva00, thakis, javed.absar, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65366 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368702 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-13 14:57:37 +00:00
Roman Lebedev	e7279f838c	[TargetLowering][NFC] prepareUREMEqFold(): fixup comment The comment initially matched the code, but the code was incorrect and was fixed after the initial revert back when it was introduced, but the comment was never updated. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368701 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-13 14:57:08 +00:00
Hans Wennborg	3b17e47a11	Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT" This introduced a false positive MemorySanitizer warning about use of uninitialized memory in a vectorized crc function in Chromium. That suggests maybe something is not right with this transformation. See https://crbug.com/992853#c7 for a reproducer. This also reverts the follow-up commits r368307 and r368308 which depended on this. > This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. > > In particular this helps remove some unnecessary scalar->vector->scalar patterns. > > The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue. > > Differential Revision: https://reviews.llvm.org/D65887 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368660 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-13 09:33:25 +00:00
Simon Pilgrim	1d838d593f	[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::TRUNCATE git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368553 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-12 10:56:05 +00:00
Simon Pilgrim	621a67330b	[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. In particular this helps remove some unnecessary scalar->vector->scalar patterns. The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue. Differential Revision: https://reviews.llvm.org/D65887 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368276 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-08 10:37:03 +00:00
Simon Pilgrim	25608372b0	[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::VECTOR_SHUFFLE In particular this helps the SSE vector shift cvttps2dq+add+shl pattern by avoiding the need for zeros in shuffle style extensions to vXi32 types as we'll be shifting out those bits anyway git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368155 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-07 11:43:13 +00:00
Aditya Nandakumar	215335e026	[GISel]: Add GISelKnownBits analysis https://reviews.llvm.org/D65698 This adds a KnownBits analysis pass for GISel. This was done as a pass (compared to static functions) so that we can add other features such as caching queries(within a pass and across passes) in the future. This patch only adds the basic pass boiler plate, and implements a lazy non caching knownbits implementation (ported from SelectionDAG). I've also hooked up the AArch64PreLegalizerCombiner pass to use this - there should be no compile time regression as the analysis is lazy. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368065 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-06 17:18:29 +00:00
Simon Pilgrim	474f996554	[TargetLowering] SimplifyMultipleUseDemandedBits - return UNDEF for undemanded ops If we demand no bits/elts from an Op, just return UNDEF git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@368043 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-06 14:30:42 +00:00
Craig Topper	d071b0806d	[TargetLowering][X86] Teach SimplifyDemandedVectorElts to replace the base vector of INSERT_SUBVECTOR with undef if none of the elements are demanded even if the node has other users. Summary: The SimplifyDemandedVectorElts function can replace with undef when no elements are demanded, but due to how it interacts with TargetLoweringOpts, it can only do this when the node has no other users. Remove a now unneeded DAG combine from the X86 backend. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65713 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367788 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-04 17:30:41 +00:00
Bill Wendling	30c0b52ad3	Emit diagnostic if an inline asm constraint requires an immediate Summary: An inline asm call can result in an immediate after inlining. Therefore emit a diagnostic here if constraint requires an immediate but one isn't supplied. Reviewers: joerg, mgorny, efriedma, rsmith Reviewed By: joerg Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60942 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367750 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-03 05:52:47 +00:00
Simon Pilgrim	296ab4b2ad	[TargetLowering] SimplifyMultipleUseDemandedBits - don't assume INSERT_VECTOR_ELT value type is simple. Noticed by inspection - this was copied from the X86 target equivalent where we can assume its legal/simple. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367721 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-02 21:07:07 +00:00
Simon Pilgrim	20c7527943	[TargetLowering] SimplifyMultipleUseDemandedBits - Add ISD::INSERT_VECTOR_ELT handling Allow us to peek through vector insertions to avoid dependencies on entire insertion chains. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367588 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-01 17:46:44 +00:00
Simon Pilgrim	803ac756ab	[TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support (Reapplied) This allows us to peek through BITCASTs, attempt to simplify the source operand, and then bitcast back. This reapplies rL367091 which was reverted at rL367118 - we were inconsistently peeking through the bitcasts to the source value. Fixes PR42777 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367174 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-27 14:11:59 +00:00
Simon Pilgrim	e1534e4ff4	[SelectionDAG] Check for any recursion depth greater than or equal to limit instead of just equal the limit. If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit. This replaces the limit check (Depth == 6) with a (Depth >= 6) to make sure that we don't circumvent it. This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others....) - I'll see what I can do in future patches. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367171 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-27 12:48:46 +00:00
Simon Pilgrim	5972aca6be	[TargetLowering] Add depth limit to SimplifyMultipleUseDemandedBits We're getting reports of massive compile time increases because SimplifyMultipleUseDemandedBits was losing track of the depth and not earlying-out. No repro yet, but consider this a pre-emptive commit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367169 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-27 12:23:36 +00:00
Nico Weber	37656fef28	Revert r367091, it caused PR42777. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367118 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-26 14:58:42 +00:00
Simon Pilgrim	36e90d3860	[TargetLowering] SimplifyMultipleUseDemandedBits - add SIGN_EXTEND_INREG support. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367096 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-26 09:41:08 +00:00
Simon Pilgrim	b8ff25a954	[TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support. This allows us to peek through BITCASTs and attempt simplify the source operand, and then bitcast back. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367091 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-26 08:38:39 +00:00
Roman Lebedev	1dea4cd1d3	[Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold Summary: This was originally reported in D62818. https://rise4fun.com/Alive/oPH InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression will be hoisted out of a loop if `Y` is invariant and `X` is not. But as it is seen from the diffs here, if it didn't get hoisted, the produced assembly is almost universally worse. Much like with my recent "hoist add/sub by/from const" patches, we should get almost universal win if we hoist constant, there is almost always an "and/test by imm" instruction, but "shift of imm" not so much, so we may avoid having to materialize the immediate, and thus need one less register. And since we now shift not by constant, but by something else, the live-range of that something else may reduce. Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit` instruction pattern. And to not get into endless combine loop. Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm Reviewed By: spatel Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62871 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366955 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-24 22:57:22 +00:00
Sanjay Patel	3109dd325b	[SDAG] convert (sub x, 1) to (add x, -1) in ctpop expansion; NFC We canonicalize to the add form, so create that directly for efficiency. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366914 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-24 15:43:50 +00:00
Simon Pilgrim	c6f823d406	[TargetLowering] SimplifyMultipleUseDemandedBits - add VECTOR_SHUFFLE support. If all the demanded elts are from one operand and are inline, then we can use the operand directly. The changes are mainly from SSE41 targets which has blendvpd but not cmpgtq, allowing the v2i64 comparison to be simplified as we only need the signbit from alternate v4i32 elements. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366817 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-23 15:35:55 +00:00
Simon Pilgrim	c0664972a0	[TargetLowering] Add SimplifyMultipleUseDemandedBits This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demand all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured. The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use. We do see a couple of regressions that need to be addressed: AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better). X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG. The code owners have confirmed it's ok for these cases to be fixed up in future patches. Differential Revision: https://reviews.llvm.org/D63281 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366799 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-23 12:39:08 +00:00
Roman Lebedev	5264ba23f2	[Codegen][SelectionDAG] X u% C == 0 fold: non-splat vector improvements Summary: Four things here: 1. Generalize the fold to handle non-splat divisors. Reasonably trivial. 2. Unban power-of-two divisors. I don't see any reason why they should be illegal. * There is no ban in Hacker's Delight * I think the ban came from the same bug that caused the miscompile in the base patch - in `floor((2^W - 1) / D)` we were dividing by `D0` instead of `D`, and we were ensuring that `D0` is not `1`, which made sense. 3. Unban `1` divisors. I no longer believe Hacker's Delight actually says that the fold is invalid for `D = 0`. Further considerations: * We know that * `(X u% 1) == 0` can be constant-folded to `1`, * `(X u% 1) != 0` can be constant-folded to `0`, * Also, we know that * `X u<= -1` can be constant-folded to `1`, * `X u> -1` can be constant-folded to `0`, * https://godbolt.org/z/7jnZJX https://rise4fun.com/Alive/oF6p * We know we will end up with the following: `(setule/setugt (rotr (mul N, P), K), Q)` * Therefore, for given new DAG nodes and comparison predicates (`ule`/`ugt`), we will still produce the correct answer if: `Q` is an all-ones constant; and both `P` and `K` are anything other than `undef`. * The fold will indeed produce `Q = all-ones`. 4. Try to re-splat the `P` and `K` vectors - we don't care about their values for the lanes where divisor was `1`. Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00 Reviewed By: RKSimon Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63963 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366637 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-20 16:33:15 +00:00
Sanjay Patel	4b3ebe8332	[SDAG] commute setcc operands to match a subtract If we have: R = sub X, Y P = cmp Y, X ...then flipping the operands in the compare instruction can allow using a subtract that sets compare flags. Motivated by diffs in D58875 - not sure if this changes anything there, but this seems like a good thing independent of that. There's a more involved version of this transform already in IR (in instcombine although that seems misplaced to me) - see "swapMayExposeCSEOpportunities()". Differential Revision: https://reviews.llvm.org/D63958 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365711 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-10 23:23:54 +00:00
Nick Desaulniers	7cf1e3f0f2	[TargetLowering] support BlockAddress as "i" inline asm constraint Summary: This allows passing address of labels to inline assembly "i" input constraints. Fixes pr/42502. Reviewers: ostannard Reviewed By: ostannard Subscribers: void, echristo, nathanchance, ostannard, javed.absar, hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D64167 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365664 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-10 17:08:25 +00:00
Simon Pilgrim	a9563933b2	[TargetLowering] SimplifyDemandedBits - just call computeKnownBits for BUILD_VECTOR cases. Don't do this locally, computeKnownBits does this better (and can handle non-constant cases as well). A next step would be to actually simplify non-constant elements - building on what we already do in SimplifyDemandedVectorElts. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365309 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-08 11:00:39 +00:00
Roman Lebedev	7df05c88b0	[NFC][TargetLowering] Some preparatory cleanups around 'prepareUREMEqFold()' from D63963 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364921 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-02 13:21:23 +00:00
Benjamin Kramer	0ca3c92d55	[SelectionDAG] Do minnum->minimum at legalization time instead of building time The SDAGBuilder behavior stems from the days when we didn't have fast math flags available in SDAG. We do now and doing the transformation in the legalizer has the advantage that it also works for vector types. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364743 91177308-0d34-0410-b5e6-96231b3b80d8	2019-07-01 11:00:23 +00:00
Roman Lebedev	65afd3a0fd	[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3) Summary: I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222. There is no such action in "Add Action" menu... This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. This is a recommit, the original commit rL364563 was reverted in rL364568 because test-suite detected miscompile - the new comparison constant 'Q' was being computed incorrectly (we divided by `D0` instead of `D`). Original patch D50222 by @hermord (Dmytro Shynkevych) Notes: - In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now. - An explicit check for when the `REM` can be reduced to just its LHS is included: the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise. I hadn't managed to find a better way to not generate worse output in this case. - The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390. Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord Tags: #llvm Differential Revision: https://reviews.llvm.org/D63391 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364600 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 21:52:10 +00:00
Roman Lebedev	1df452bb5a	Revert "[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)" Appears to break test-suite on http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/23790 FAIL: burg.execution_time FAIL: spiff.execution_time FAIL: employ.execution_time FAIL: llu.execution_time FAIL: gramschmidt.execution_time FAIL: fdtd-apml.execution_time This reverts commit r364563. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364568 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 17:22:31 +00:00
Roman Lebedev	84139109d4	[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2) Summary: I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222. There is no such action in "Add Action" menu... Original patch D50222 by @hermord (Dmytro Shynkevych) This implements an optimization described in Hacker's Delight 10-17: when `C` is constant, the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. Original patch author: @hermord (Dmytro Shynkevych)! Notes: - In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla. This seems to require an extra branch on overflow, so I refrained from implementing this for now. - An explicit check for when the `REM` can be reduced to just its LHS is included: the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise. I hadn't managed to find a better way to not generate worse output in this case. - The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390. Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00 Reviewed By: RKSimon, xbolva00 Subscribers: xbolva00, javed.absar, llvm-commits, hermord Tags: #llvm Differential Revision: https://reviews.llvm.org/D63391 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364563 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 16:45:42 +00:00
Simon Pilgrim	39f0e3cf18	[TargetLowering] SimplifyDemandedVectorElts - add shift/rotate support. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364548 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 14:25:54 +00:00
Simon Pilgrim	c2a0046960	[TargetLowering] SimplifyDemandedBits - use DemandedElts to better identify partial splat shift amounts git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364541 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-27 13:48:43 +00:00
Sanjay Patel	4ad4edc530	[SDAG] expand ctpop != 1 Change the generic ctpop expansion to more efficiently handle a check for not-a-power-of-two value: (ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0) This is the inverted predicate sibling pattern that was added with: D63004 This should have been done before I changed IR canonicalization to favor this form with: rL364246 ...so if this requires revert/changing, the earlier commit may also need to be modified. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364319 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-25 14:46:52 +00:00
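
The last entry above describes the expansion (ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0). The identity can be checked on plain integers with a small sketch like the one below. This is not the SelectionDAG code from the commit: the function names are hypothetical and the 32-bit width is an assumption made for illustration.

```cpp
#include <bitset>
#include <cstdint>

// Reference form: count the set bits and compare against one.
// (Hypothetical helper name, not an LLVM API.)
bool ctpopNotOneReference(uint32_t X) {
  return std::bitset<32>(X).count() != 1;
}

// Expanded form from the commit message. X & (X - 1) clears the lowest
// set bit, so it is nonzero exactly when X has two or more bits set.
bool ctpopNotOneExpanded(uint32_t X) {
  return X == 0 || (X & (X - 1)) != 0;
}

// Compare both forms on values that exercise the low half, the high
// half, and a mix with the top bit set.
bool expansionsAgree() {
  for (uint32_t I = 0; I < (1u << 16); ++I) {
    const uint32_t Samples[] = {I, I << 16, I | 0x80000000u};
    for (uint32_t X : Samples)
      if (ctpopNotOneReference(X) != ctpopNotOneExpanded(X))
        return false;
  }
  return true;
}
```

The expanded form needs only a compare, a decrement and an AND, which is presumably why it is preferred on targets without a native population count instruction.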

954 Commits
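
Several of the commits listed above concern the X u% C == 0 fold, which the messages summarize as `(setule (rotr (mul N, P), K), Q)` with `Q = floor((2^W - 1) / D)`. A minimal scalar sketch of that shape follows, assuming W = 32, a nonzero divisor D = D0 * 2^K with D0 odd, and P taken as the multiplicative inverse of D0 modulo 2^32. The struct and function names are hypothetical; the actual LLVM code builds DAG nodes and also handles vectors, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstdint>

// Constants for the fold (hypothetical struct, for illustration only).
struct UremEqFold {
  uint32_t P; // multiplicative inverse of the odd part D0, modulo 2^32
  unsigned K; // number of trailing zero bits in D
  uint32_t Q; // floor((2^32 - 1) / D)
};

UremEqFold prepareFold(uint32_t D) {
  assert(D != 0 && "divisor must be nonzero");
  unsigned K = 0;
  uint32_t D0 = D;
  while ((D0 & 1) == 0) { // split D into D0 * 2^K with D0 odd
    D0 >>= 1;
    ++K;
  }
  // Newton iteration for the inverse of D0 modulo 2^32. The seed D0 is
  // correct to 3 bits, and every step doubles the number of correct bits.
  uint32_t P = D0;
  for (int I = 0; I < 5; ++I)
    P *= 2 - D0 * P;
  return {P, K, UINT32_MAX / D};
}

// X u% D == 0 evaluated as rotr(X * P, K) u<= Q, without a remainder.
bool isDivisibleFolded(uint32_t X, const UremEqFold &F) {
  uint32_t M = X * F.P;
  uint32_t R = (M >> F.K) | (M << ((32 - F.K) & 31)); // rotate right by K
  return R <= F.Q;
}

// Compare the fold against the plain remainder for small divisors and
// for inputs near both ends of the 32-bit range.
bool foldMatchesUrem() {
  for (uint32_t D = 1; D <= 64; ++D) {
    UremEqFold F = prepareFold(D);
    for (uint32_t I = 0; I < 5000; ++I) {
      uint32_t Lo = I, Hi = UINT32_MAX - I;
      if (isDivisibleFolded(Lo, F) != (Lo % D == 0))
        return false;
      if (isDivisibleFolded(Hi, F) != (Hi % D == 0))
        return false;
    }
  }
  return true;
}
```

For D = 1 the sketch yields P = 1, K = 0 and Q = all-ones, so the comparison is always true, which matches the observation in the commit message that the fold stays correct for divisors of `1`.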