RPCS3/llvm

mirror of https://github.com/RPCS3/llvm.git synced 2025-05-15 18:06:08 +00:00

Author	SHA1	Message	Date
Piotr Sobczak	2aebf9af7f	[InstCombine][AMDGPU] Simplify tbuffer loads Summary: Add missing tbuffer loads intrinsics in SimplifyDemandedVectorElts. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66926 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370475 91177308-0d34-0410-b5e6-96231b3b80d8	2019-08-30 14:20:04 +00:00
Sander de Smalen	b09333fe72	[IntrinsicEmitter] Extend argument overloading with forward references. Extend the mechanism to overload intrinsic arguments by using either backward or forward references to the overloadable arguments. In for example: def int_something : Intrinsic<[LLVMPointerToElt<0>], [llvm_anyvector_ty], []>; LLVMPointerToElt<0> is a forward reference to the overloadable operand of type 'llvm_anyvector_ty' and would allow intrinsics such as: declare i32* @llvm.something.v4i32(<4 x i32>); declare i64* @llvm.something.v2i64(<2 x i64>); where the result pointer type is deduced from the element type of the first argument. If the returned pointer is not a pointer to the element type, LLVM will give an error: Intrinsic has incorrect return type! i64* (<4 x i32>)* @llvm.something.v4i32 Reviewers: RKSimon, arsenm, rnk, greened Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D62995 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363233 91177308-0d34-0410-b5e6-96231b3b80d8	2019-06-13 08:19:33 +00:00
Philip Reames	93849fdc0d	[InstCombine] Limit a vector demanded elts rule which was producing invalid IR. The demanded elts rules introduced for GEPs in https://reviews.llvm.org/rL356293 replaced vector constants with undefs (by design). It turns out that the LangRef disallows such cases when indexing structs. The right fix is probably to relax the langref requirement, and update other passes to expect the result, but for the moment, limit the transform to avoid compiler crashes. This should fix https://bugs.llvm.org/show_bug.cgi?id=41624. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359633 91177308-0d34-0410-b5e6-96231b3b80d8	2019-04-30 23:09:26 +00:00
Philip Reames	11c890ecc7	[InstCombine] Fix a nasty miscompile introduced w/masked.gather demanded elts This fixes a miscompile which was introduced in r356510 (https://reviews.llvm.org/D57372). The problem is that the original patch removed pointer operands where the load results weren't demanded, but without considering the legality of the load itself. If the masked.gather had active, but undemanded, lanes, then we could end up creating a load which loaded from an undef address. The result could be a segfault, or, in theory, an arbitrary read from a random memory location into an unused register. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358299 91177308-0d34-0410-b5e6-96231b3b80d8	2019-04-12 18:26:56 +00:00
Tim Renouf	5514b5c1f4	InstCombineSimplifyDemanded: Allow v3 results for AMDGCN buffer and image intrinsics This helps to avoid the situation where RA spots that only 3 of the v4f32 result of a load are used, and immediately reallocates the 4th register for something else, requiring a stall waiting for the load. Differential Revision: https://reviews.llvm.org/D58906 Change-Id: I947661edfd5715f62361a02b100f14aeeada29aa git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356768 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-22 15:53:50 +00:00
Philip Reames	94c6fbfa0e	Demanded elements support for masked.load and masked.gather Teach instcombine to propagate demanded elements through a masked load or masked gather instruction. This is in the broader context of improving vector pointer instcombine under https://reviews.llvm.org/D57140. Differential Revision: https://reviews.llvm.org/D57372 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356510 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-19 20:10:00 +00:00
Philip Reames	1c838a4225	[SimplifyDemandedVec] Strengthen handling all undef lanes (particularly GEPs) A change of two parts: 1) A generic enhancement for all callers of SDVE to exploit the fact that if all lanes are undef, the result is undef. 2) A GEP specific piece to strengthen/fix the vector index undef element handling, and call into the generic infrastructure when visiting the GEP. The result is that we replace a vector gep with at least one undef in each lane with a undef. We can also do the same for vector intrinsics. Once the masked.load patch (D57372) has landed, I'll update to include call tests as well. Differential Revision: https://reviews.llvm.org/D57468 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356293 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-15 19:54:06 +00:00
Matt Arsenault	fab403b55c	AMDGPU: Remove intrinsic operand assert Before r355981, this was under LLVM_DEBUG. I don't think the assert is quite right, but this really should be a verifier check. Instcombine should not be asserting on this sort of thing. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356219 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-14 23:45:09 +00:00
Matt Arsenault	6e8fb99b69	IR: Add immarg attribute This indicates an intrinsic parameter is required to be a constant, and should not be replaced with a non-constant value. Add the attribute to all AMDGPU and generic intrinsics that comments indicate it should apply to. I scanned other target intrinsics, but I don't see any obvious comments indicating which arguments are intended to be only immediates. This breaks one questionable testcase for the autoupgrade. I'm unclear on whether the autoupgrade is supposed to really handle declarations which were never valid. The verifier fails because the attributes now refer to a parameter past the end of the argument list. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355981 91177308-0d34-0410-b5e6-96231b3b80d8	2019-03-12 21:02:54 +00:00
Bjorn Pettersson	85de1fd399	Add support for computing "zext of value" in KnownBits. NFCI Summary: The description of KnownBits::zext() and KnownBits::zextOrTrunc() has confusingly been telling that the operation is equivalent to zero extending the value we're tracking. That has not been true, instead the user has been forced to explicitly set the extended bits as known zero afterwards. This patch adds a second argument to KnownBits::zext() and KnownBits::zextOrTrunc() to control if the extended bits should be considered as known zero or as unknown. Reviewers: craig.topper, RKSimon Reviewed By: RKSimon Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58650 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355099 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-28 15:45:29 +00:00
Nicolai Haehnle	cf033b0e81	[InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded Summary: The fix added in r352904 is not quite correct, or rather misleading: 1. When the texfailctrl (TFC) argument was non-constant, the fix assumed non-TFE/LWE, which is incorrect. 2. Regardless, this code path cannot even be hit for correct TFE/LWE-enabled calls, because those return a struct. Added a test case for those for completeness. Change-Id: I92d314dbc67a2670f6d7adaab765ef45f56a49cf Reviewers: hliao, dstuttard, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57681 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353097 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-04 21:24:19 +00:00
Michael Liao	37ec5c2e39	[InstCombine] Extra null-checking on TFE/LWE support - If that operand is not ConstantInt, skip enabling TFE/LWE. Differential Revision: https://reviews.llvm.org/D57539 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352904 91177308-0d34-0410-b5e6-96231b3b80d8	2019-02-01 19:53:44 +00:00
Philip Reames	4a40330dd7	Demanded elements support for vector GEPs GEPs can produce either scalar or vector results. If we're extracting only a subset of the vector lanes, simplifying the operands is helpful in eliminating redundant computation, and (eventually) allowing further optimizations Differential Revision: https://reviews.llvm.org/D57177 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352440 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-28 23:24:49 +00:00
Chandler Carruth	6b547686c5	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-19 08:50:56 +00:00
David Stuttard	1da08e2514	[AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 This re-submit of the change also includes a slight modification in SIISelLowering.cpp to work-around a compiler bug for the powerpc_le platform that caused a buildbot failure on a previous submission. Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda Work around for ppcle compiler bug Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351054 91177308-0d34-0410-b5e6-96231b3b80d8	2019-01-14 11:55:24 +00:00
Piotr Sobczak	41901c01d4	[InstCombine][AMDGPU] Handle more buffer intrinsics Summary: Include the following intrinsics in the InstCombine simplification: * amdgcn_raw_buffer_load * amdgcn_raw_buffer_load_format * amdgcn_struct_buffer_load * amdgcn_struct_buffer_load_format Change-Id: I14deceff74bcb21179baf6aa6e94bf39e7d63d5d Reviewers: arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55882 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349735 91177308-0d34-0410-b5e6-96231b3b80d8	2018-12-20 10:08:18 +00:00
David Stuttard	cb049f818d	Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic" Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347911 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-29 20:14:17 +00:00
David Stuttard	7fd87699a5	Add support for TFE/LWE in image intrinsics TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accommodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347871 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-29 15:21:13 +00:00
Nikita Popov	64638808e5	[InstCombine] Determine demanded and known bits for funnel shifts Support funnel shifts in InstCombine demanded bits simplification. If the shift amount is constant, we can determine both the demanded bits of the operands, as well as the known bits of the result. If one of the operands has no demanded bits, it will be replaced by undef and the funnel shift will be simplified into a simple shift due to the simplifications added in D54778. Differential Revision: https://reviews.llvm.org/D54869 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347515 91177308-0d34-0410-b5e6-96231b3b80d8	2018-11-24 19:00:45 +00:00
David Green	b4f227a2a5	[InstCombine] Demand bits of UMin This is the umin alternative to the umax code from rL344237. We use DeMorgans law on the umax case to bring us to the same thing on umin, but using countLeadingOnes, not countLeadingZeros. Differential Revision: https://reviews.llvm.org/D53036 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344239 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-11 11:28:27 +00:00
David Green	c3c05ee92f	[InstCombine] Demand bits of UMax Use the demanded bits of umax(A,C) to prove we can just use A so long as the lowest non-zero bit of DemandMask is higher than the highest non-zero bit of C Differential Revision: https://reviews.llvm.org/D53033 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344237 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-11 11:04:09 +00:00
Sanjay Patel	f2f6f77ac4	[InstCombine] drop poison flags in SimplifyVectorDemandedElts We established the (unfortunately complicated) rules for UB/poison propagation with vector ops in: D48893 D48987 D49047 It's clear from the affected tests that we are potentially creating poison where none existed before the transforms. For add/sub/mul, the answer is simple: just drop the flags because the extra undef vector lanes are generally more valuable for analysis and codegen. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343819 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-04 21:36:50 +00:00
Sanjay Patel	9e29af39fd	[InstCombine] reduce code duplication in SimplifyDemandedVectorElts; NFCI git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343806 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-04 19:12:07 +00:00
Sanjay Patel	44cfbc7d13	[InstCombine] allow SimplifyDemandedVectorElts to work with FP binops We're a long way from D50992 and D51553, but this is where we have to start. We weren't back-propagating undefs into binop constant values for anything but add/sub/mul/and/or/xor. This is likely because we have to be careful about not introducing UB/poison with div/rem/shift. But I suspect we already are getting the poison part wrong for add/sub/mul (although it may not be possible to expose the bug currently because we use SimplifyDemandedVectorElts from a limited set of opcodes). See the discussion/implementation from D48987 and D49047. This patch just enables functionality for FP ops because those do not have UB/poison potential. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343727 91177308-0d34-0410-b5e6-96231b3b80d8	2018-10-03 21:44:59 +00:00
Sanjay Patel	ca29f6a30b	[InstCombine] enhance vector demanded elements to look at a vector select condition operand I noticed that we were not back-propagating undef lanes to shuffle masks when we have a shuffle that reduces the vector width. This is part of investigating/solving PR38691: https://bugs.llvm.org/show_bug.cgi?id=38691 The DAG equivalent was proposed with: D51696 Differential Revision: https://reviews.llvm.org/D51433 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341981 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-11 18:49:00 +00:00
Sanjay Patel	4eef9f6908	[InstCombine] use SelectInst operand names to make code clearer; NFC Cleanup step for D51433. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341850 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-10 18:37:59 +00:00
Sanjay Patel	48afe9e636	[InstCombine] fix formatting in SimplifyDemandedVectorElts->Select; NFCI I'm preparing to add the same functionality both here and to the DAG version of this code in D51696 / D51433, so try to make those cases as similar as possible to avoid bugs. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341545 91177308-0d34-0410-b5e6-96231b3b80d8	2018-09-06 13:19:22 +00:00
Craig Topper	18d8ba4a18	[X86] Remove and autoupgrade the scalar fma intrinsics with masking. This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336871 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-12 00:29:56 +00:00
Craig Topper	5f1cfe90f3	[X86] Remove X86 specific scalar FMA intrinsics and upgrade to target independent FMA and extractelement/insertelement. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336315 91177308-0d34-0410-b5e6-96231b3b80d8	2018-07-05 06:52:55 +00:00
Simon Pilgrim	a2181998a7	Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 bits" MSVC warning (again). NFCI. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335457 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-25 11:46:24 +00:00
Simon Pilgrim	4e7cfd69f5	Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 bits" MSVC warning. NFCI. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335454 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-25 11:38:27 +00:00
Nicolai Haehnle	4c3fa871b5	AMDGPU: Remove old-style image intrinsics Summary: This also removes the need for atomic pseudo instructions, since we select the correct encoding directly in SITargetLowering::lowerImage for dimension-aware image intrinsics. Mesa uses dimension-aware image intrinsics since commit a9a7993441. Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a Reviewers: arsenm, rampitec, mareko, tpr, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48167 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335231 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-21 13:37:45 +00:00
Nicolai Haehnle	7f7cea5306	InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemanded Summary: Use the expanded features of the TableGen generic tables to avoid manually adding the combinatorially exploded set of intrinsics. The getAMDGPUImageDimIntrinsic lookup function is early-out, i.e. non-AMDGPU intrinsics will never look at the underlying table. Use a generic approach for getting the new intrinsic overload to keep the code simple, and make the image dmask handling more generic: - handle non-sampler image loads - handle the case where the set of demanded elements is not a prefix There is some overlap between this code and an optimization that happens in the backend during code generation. They currently complement each other: - only the codegen optimization can generate vec3 loads - only the InstCombine optimization can handle D16 The InstCombine optimization also likely covers more cases since the codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization in codegen once the infrastructure for vec3 is in place (which will probably take a long time). Modify the test cases to use dimension-aware intrinsics. This makes it easier to see that the test coverage for the new intrinsics is equivalent, and the old style intrinsics will be removed in a follow-up commit anyway. Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad Reviewers: arsenm, rampitec, majnemer Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48165 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335230 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-21 13:37:31 +00:00
Tomasz Krupa	a36133dda7	[X86] Lowering sqrt intrinsics to native IR Summary: Complementary patch to lowering sqrt intrinsics in Clang. Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k Reviewed By: craig.topper Subscribers: tkrupa, mike.dvoretsky, llvm-commits Differential Revision: https://reviews.llvm.org/D41599 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334849 91177308-0d34-0410-b5e6-96231b3b80d8	2018-06-15 18:05:24 +00:00
Craig Topper	c36d516fd6	[X86] Remove and autoupgrade a bunch of FMA intrinsics that are no longer used by clang. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332146 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-11 21:59:34 +00:00
Benjamin Kramer	b83e520dee	[InstCombine] Only propagate known leading zeros from udiv input to output. Put in a conservatively correct estimate for now. Avoids miscompiling clang in FDO mode. This is really tricky to trigger in reality as basically all interesting cases will be folded away by computeKnownBits earlier, I was unable to find a reasonably small test case. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331975 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-10 11:45:18 +00:00
Benjamin Kramer	c1c66c0705	[InstCombine] Teach SimplifyDemandedBits that udiv doesn't demand low dividend bits that are zero in the divisor This is safe as long as the udiv is not exact. The pattern is not common in C++ code, but comes up all the time in code generated by XLA's GPU backend. Differential Revision: https://reviews.llvm.org/D46647 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331933 91177308-0d34-0410-b5e6-96231b3b80d8	2018-05-09 22:27:34 +00:00
Craig Topper	754a558235	[X86] Remove the pmuldq/pmuludq intrinsics and replace with native IR. This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics. We lost some InstCombine SimplifyDemandedBit optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329990 91177308-0d34-0410-b5e6-96231b3b80d8	2018-04-13 06:07:18 +00:00
Simon Pilgrim	68041a58ac	Remove useless comment - seems to be a copy+paste typo. NFCI git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325385 91177308-0d34-0410-b5e6-96231b3b80d8	2018-02-16 20:41:06 +00:00
Sanjay Patel	9d923086de	[InstCombine] fix demanded-bits propagation for zext/trunc I was comparing the demanded-bits implementations between InstCombine and TargetLowering as part of investigating questions in D42088 and noticed that this was wrong in IR. We were losing all of the prior known bits when we got back to the 'zext'. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322662 91177308-0d34-0410-b5e6-96231b3b80d8	2018-01-17 14:39:28 +00:00
Simon Pilgrim	e3aae88e02	[InstCombine] Fix SimplifyDemandedUseBits SHL handling (PR35515) Don't assume that the pattern matched SRL can be cast to an Instruction (might be ConstExpr etc.) git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320270 91177308-0d34-0410-b5e6-96231b3b80d8	2017-12-09 23:42:56 +00:00
Sanjay Patel	da536d4e17	[InstCombine] improve demanded vector elements analysis of insertelement Recurse instead of returning on the first found optimization. Also, return early in the caller instead of continuing because that allows another round of simplification before we might potentially lose undef information from a shuffle mask by eliminating the shuffle. As noted in the review, we could probably do better and be more efficient by moving all of demanded elements into a separate pass, but this is yet another quick fix to instcombine. Differential Revision: https://reviews.llvm.org/D37236 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312248 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-31 15:57:17 +00:00
Craig Topper	a3ced95cbe	[InstCombine] Call hasNoSignedWrap instead of hasNoUnsignedWrap to get the NSW flag when handling Add in SimplifyDemandedUseBits. This is a typo from r311789. This should fix PR34349. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311902 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-28 18:44:28 +00:00
Craig Topper	ef9c2d804e	[InstCombine] Don't fall back to only calling computeKnownBits if the upper bit of Add/Sub is demanded. Just create an all 1s demanded mask and continue recursing like normal. The recursive calls should be able to handle an all 1s mask and do the right thing. The only time we should care about knowing whether the upper bit was demanded is when we need to know if we should clear the NSW/NUW flags. Now that we have a consistent path through the code for all cases, use KnownBits::computeForAddSub to compute the known bits at the end since we already have the LHS and RHS. My larger goal here is to move the code that turns add into xor if only 1 bit is demanded and no bits below it are non-zero from InstCombiner::OptAndOp to here. This will allow it to be more general instead of just looking for 'add' and 'and' with constant RHS. Differential Revision: https://reviews.llvm.org/D36486 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311789 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-25 18:39:40 +00:00
Amjad Aboud	513af851dd	[InstCombine] Consider more cases where SimplifyDemandedUseBits does not convert AShr to LShr. There are cases where AShr have better chance to be optimized than LShr, especially when the demanded bits are not known to be Zero, and also known to be similar to the sign bit. Differential Revision: https://reviews.llvm.org/D36936 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311773 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-25 11:07:54 +00:00
Craig Topper	ef45a1fe1b	[InstCombine] Remove unnecessary temporary APInt. NFCI git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309887 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-02 21:05:40 +00:00
Craig Topper	e2cbc76f80	[InstCombine] Remove explicit check for impossible condition. Replace with assert Summary: As far as I can tell the earlier call to getLimitedValue will guarantee ShiftAmt is saturated to BitWidth-1, preventing it from ever being equal to or greater than BitWidth. At one point in the past the getLimitedValue call was only passed BitWidth, not BitWidth - 1. This would have allowed the equality case to get here. And in fact this check was initially added as just BitWidth == ShiftAmt, but was changed shortly after to include > which should have never been possible. Reviewers: spatel, majnemer, davide Reviewed By: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36123 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309690 91177308-0d34-0410-b5e6-96231b3b80d8	2017-08-01 15:10:25 +00:00
Craig Topper	279ac88b99	[InstCombine] Move (0 - x) & 1 --> x & 1 to SimplifyDemandedUseBits. This removes a dedicated matcher and allows us to support more than just an AND masking the lower bit. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308124 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-16 05:37:58 +00:00
Craig Topper	f552e96e02	[InstCombine] Make InstCombine's IRBuilder be passed by reference everywhere Previously the InstCombiner class contained a pointer to an IR builder that had been passed to the constructor. Sometimes this would be passed to helper functions as either a pointer or the pointer would be dereferenced to be passed by reference. This patch makes it a reference everywhere including the InstCombiner class itself so there is no more inconsistency. This is a large, but mechanical patch. I've done very minimal formatting changes on it despite what clang-format wanted to do. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307451 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-07 23:16:26 +00:00
Craig Topper	6dbd34d261	[Constants] If we already have a ConstantInt*, prefer to use isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307292 91177308-0d34-0410-b5e6-96231b3b80d8	2017-07-06 18:39:47 +00:00
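
Two of the demanded-bits rules in the log above are small enough to check by brute force: the rewrite "(0 - x) & 1 --> x & 1" from 279ac88b99, and the umax rule from c3c05ee92f (umax(A, C) can be replaced by A when the lowest set bit of the demanded mask is higher than the highest set bit of C). The following is only an illustrative sketch over 8-bit values. It is not LLVM code, and the function names are hypothetical, chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: checks (0 - x) & 1 == x & 1 for every 8-bit x.
bool negPreservesLowBit() {
  for (unsigned v = 0; v < 256; ++v) {
    uint8_t x = static_cast<uint8_t>(v);
    uint8_t neg = static_cast<uint8_t>(0u - x);  // wrap-around negation
    if ((neg & 1u) != (x & 1u)) return false;
  }
  return true;
}

// Index of the highest set bit, or -1 when v is zero.
int highestSetBit(unsigned v) {
  int i = -1;
  while (v) { v >>= 1; ++i; }
  return i;
}

// Index of the lowest set bit, or 8 when v is zero (no bit demanded).
int lowestSetBit(unsigned v) {
  if (v == 0) return 8;
  int i = 0;
  while ((v & 1u) == 0) { v >>= 1; ++i; }
  return i;
}

// Hypothetical helper: whenever every demanded bit lies strictly above
// the highest set bit of C, umax(A, C) and A agree on the demanded bits.
bool umaxDemandedBitsRuleHolds() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned c = 0; c < 256; ++c)
      for (unsigned mask = 0; mask < 256; ++mask) {
        if (lowestSetBit(mask) <= highestSetBit(c)) continue;  // rule does not apply
        if ((std::max(a, c) & mask) != (a & mask)) return false;
      }
  return true;
}
```

Calling `negPreservesLowBit()` and `umaxDemandedBitsRuleHolds()` from a test driver exercises every 8-bit combination. The umax case holds because A < C forces all bits of A above C's highest set bit to be zero, so both values are zero under the mask.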

211 Commits