211 Commits

Piotr Sobczak
2aebf9af7f [InstCombine][AMDGPU] Simplify tbuffer loads
Summary: Add missing tbuffer load intrinsics in SimplifyDemandedVectorElts.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66926

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370475 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-30 14:20:04 +00:00
Sander de Smalen
b09333fe72 [IntrinsicEmitter] Extend argument overloading with forward references.
Extend the mechanism to overload intrinsic arguments by using either
backward or forward references to the overloadable arguments.

For example, in:

  def int_something : Intrinsic<[LLVMPointerToElt<0>],
                                [llvm_anyvector_ty], []>;

LLVMPointerToElt<0> is a forward reference to the overloadable operand
of type 'llvm_anyvector_ty' and would allow intrinsics such as:

  declare i32* @llvm.something.v4i32(<4 x i32>);
  declare i64* @llvm.something.v2i64(<2 x i64>);

where the result pointer type is deduced from the element type of the
first argument.

If the returned pointer is not a pointer to the element type, LLVM will
give an error:

  Intrinsic has incorrect return type!
  i64* (<4 x i32>)* @llvm.something.v4i32

Reviewers: RKSimon, arsenm, rnk, greened

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D62995


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363233 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-13 08:19:33 +00:00
Philip Reames
93849fdc0d [InstCombine] Limit a vector demanded elts rule which was producing invalid IR.
The demanded elts rules introduced for GEPs in https://reviews.llvm.org/rL356293 replaced vector constants with undefs (by design).  It turns out that the LangRef disallows such cases when indexing structs.  The right fix is probably to relax the LangRef requirement and update other passes to expect the result, but for the moment, limit the transform to avoid compiler crashes.

This should fix https://bugs.llvm.org/show_bug.cgi?id=41624.
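A minimal sketch of the problematic shape (types and names are hypothetical): the struct index of a vector GEP must be the same i32 constant in every lane, so rewriting it with undef lanes produces IR the verifier rejects.

  %struct.S = type { i32, i32 }

  define <2 x i32*> @gep_into_struct(<2 x %struct.S*> %base) {
    ; The struct index <i32 1, i32 1> must stay a uniform constant;
    ; replacing its unused lanes with undef is what produced invalid IR.
    %gep = getelementptr %struct.S, <2 x %struct.S*> %base,
                         <2 x i64> zeroinitializer, <2 x i32> <i32 1, i32 1>
    ret <2 x i32*> %gep
  }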



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359633 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-30 23:09:26 +00:00
Philip Reames
11c890ecc7 [InstCombine] Fix a nasty miscompile introduced w/masked.gather demanded elts
This fixes a miscompile which was introduced in r356510 (https://reviews.llvm.org/D57372).

The problem is that the original patch removed pointer operands where the load results were demanded, but without considering the legality of the load itself.  If the masked.gather had active, but undemanded, lanes, then we could end up creating a load which loaded from an undef address.  The result could be a segfault, or, in theory, an arbitrary read from a random memory location into an unused register.
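A hedged sketch of the hazard (hypothetical function, typed pointers as in this era):

  define i32 @gather_one_lane(<2 x i32*> %ptrs) {
    ; Both lanes are active in the mask, but only lane 0 is demanded.
    %g = call <2 x i32> @llvm.masked.gather.v2i32.v2p0i32(
             <2 x i32*> %ptrs, i32 4, <2 x i1> <i1 true, i1 true>,
             <2 x i32> undef)
    %e = extractelement <2 x i32> %g, i32 0
    ret i32 %e
  }
  declare <2 x i32> @llvm.masked.gather.v2i32.v2p0i32(<2 x i32*>, i32, <2 x i1>, <2 x i32>)

Simplifying %ptrs as if lane 1 did not matter can leave an active lane gathering from an undef address, which is exactly the bad load described above.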



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358299 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-12 18:26:56 +00:00
Tim Renouf
5514b5c1f4 InstCombineSimplifyDemanded: Allow v3 results for AMDGCN buffer and image intrinsics
This helps to avoid the situation where RA spots that only 3 of the
v4f32 result of a load are used, and immediately reallocates the 4th
register for something else, requiring a stall waiting for the load.
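As an illustrative sketch (operand values are hypothetical), a load whose fourth channel is dead can now be shrunk to a v3 variant instead of staying v4:

  %data = call <4 x float> @llvm.amdgcn.raw.buffer.load.v4f32(
              <4 x i32> %rsrc, i32 0, i32 0, i32 0)
  %xyz = shufflevector <4 x float> %data, <4 x float> undef,
                       <3 x i32> <i32 0, i32 1, i32 2>

  ; with v3 results allowed, InstCombine can emit:
  %xyz = call <3 x float> @llvm.amdgcn.raw.buffer.load.v3f32(
              <4 x i32> %rsrc, i32 0, i32 0, i32 0)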

Differential Revision: https://reviews.llvm.org/D58906

Change-Id: I947661edfd5715f62361a02b100f14aeeada29aa

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356768 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-22 15:53:50 +00:00
Philip Reames
94c6fbfa0e Demanded elements support for masked.load and masked.gather
Teach instcombine to propagate demanded elements through a masked load or masked gather instruction. This is in the broader context of improving vector pointer instcombine under https://reviews.llvm.org/D57140.
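A hedged sketch of the masked.load case (hypothetical function):

  define i32 @masked_load_demanded(<4 x i32>* %p, <4 x i32> %passthru) {
    %v = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(
             <4 x i32>* %p, i32 4,
             <4 x i1> <i1 true, i1 false, i1 false, i1 false>,
             <4 x i32> %passthru)
    %e = extractelement <4 x i32> %v, i32 0
    ret i32 %e
  }
  declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)

Only lane 0 of %v is demanded, so the remaining lanes of %passthru are undemanded and can be simplified (e.g. to undef).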

Differential Revision: https://reviews.llvm.org/D57372



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356510 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-19 20:10:00 +00:00
Philip Reames
1c838a4225 [SimplifyDemandedVec] Strengthen handling all undef lanes (particularly GEPs)
A change of two parts:
1) A generic enhancement for all callers of SimplifyDemandedVectorElts (SDVE) to exploit the fact that if all lanes are undef, the result is undef.
2) A GEP-specific piece to strengthen/fix the vector index undef element handling, and to call into the generic infrastructure when visiting the GEP.

The result is that we replace a vector gep that has at least one undef in each lane with an undef.  We can also do the same for vector intrinsics.  Once the masked.load patch (D57372) has landed, I'll update to include call tests as well.
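A minimal sketch under these rules (hypothetical values):

  %gep = getelementptr i32, i32* %base, <2 x i64> <i64 undef, i64 undef>
  ; every lane of the index contains an undef, so every result lane is
  ; undef and %gep can be replaced outright by '<2 x i32*> undef'.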

Differential Revision: https://reviews.llvm.org/D57468



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356293 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-15 19:54:06 +00:00
Matt Arsenault
fab403b55c AMDGPU: Remove intrinsic operand assert
Before r355981, this was under LLVM_DEBUG. I don't think the assert is
quite right, but this really should be a verifier check. Instcombine
should not be asserting on this sort of thing.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356219 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-14 23:45:09 +00:00
Matt Arsenault
6e8fb99b69 IR: Add immarg attribute
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.
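A hedged sketch of what this means at the IR level (llvm.amdgcn.s.waitcnt is used here as a representative immarg-carrying intrinsic):

  declare void @llvm.amdgcn.s.waitcnt(i32 immarg)

  define void @ok() {
    call void @llvm.amdgcn.s.waitcnt(i32 0)   ; literal constant: valid
    ret void
  }
  ; Passing a non-constant i32 to the immarg parameter is a verifier error.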

Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.

This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355981 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-12 21:02:54 +00:00
Bjorn Pettersson
85de1fd399 Add support for computing "zext of value" in KnownBits. NFCI
Summary:
The descriptions of KnownBits::zext() and
KnownBits::zextOrTrunc() have confusingly claimed
that the operation is equivalent to zero-extending the
value we're tracking. That has not been true; instead,
the user has been forced to explicitly set the extended
bits as known zero afterwards.

This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.

Reviewers: craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58650

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355099 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-28 15:45:29 +00:00
Nicolai Haehnle
cf033b0e81 [InstCombine] Cleanup the TFE/LWE check in AMDGPU SimplifyDemanded
Summary:
The fix added in r352904 is not quite correct, or rather misleading:

1. When the texfailctrl (TFC) argument was non-constant, the fix assumed
   non-TFE/LWE, which is incorrect.

2. Regardless, this code path cannot even be hit for correct
   TFE/LWE-enabled calls, because those return a struct. Added
   a test case for those for completeness.

Change-Id: I92d314dbc67a2670f6d7adaab765ef45f56a49cf

Reviewers: hliao, dstuttard, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57681

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353097 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-04 21:24:19 +00:00
Michael Liao
37ec5c2e39 [InstCombine] Extra null-checking on TFE/LWE support
- If the texfailctrl (TFC) operand is not a ConstantInt, skip enabling TFE/LWE.

Differential Revision: https://reviews.llvm.org/D57539



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352904 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-01 19:53:44 +00:00
Philip Reames
4a40330dd7 Demanded elements support for vector GEPs
GEPs can produce either scalar or vector results. If we're extracting only a subset of the vector lanes, simplifying the operands is helpful in eliminating redundant computation and (eventually) allowing further optimizations.
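A small sketch of the redundancy this exposes (hypothetical function):

  define i32* @one_lane(i32* %base, <2 x i64> %idx) {
    %gep = getelementptr i32, i32* %base, <2 x i64> %idx
    %p = extractelement <2 x i32*> %gep, i32 0
    ret i32* %p
  }

Only lane 0 of %gep is demanded, so demanded-elements analysis can simplify whatever feeds lane 1 of %idx.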

Differential Revision: https://reviews.llvm.org/D57177



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@352440 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-28 23:24:49 +00:00
Chandler Carruth
6b547686c5 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-19 08:50:56 +00:00
David Stuttard
1da08e2514 [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accommodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done).
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for the error return value).
5. An extra pass to zero-initialize the error return value - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on-by-default) option to ensure ALL return values
are zero-initialized, which is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO:
re-enable and handle this correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

This re-submit of the change also includes a slight modification in
SIISelLowering.cpp to work-around a compiler bug for the powerpc_le
platform that caused a buildbot failure on a previous submission.

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda


Work around for ppcle compiler bug

Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351054 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-14 11:55:24 +00:00
Piotr Sobczak
41901c01d4 [InstCombine][AMDGPU] Handle more buffer intrinsics
Summary:
Include the following intrinsics in the InstCombine
simplification:

* amdgcn_raw_buffer_load
* amdgcn_raw_buffer_load_format
* amdgcn_struct_buffer_load
* amdgcn_struct_buffer_load_format
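For instance (a hedged sketch; operands are hypothetical), a full-width load whose result is only partially used can now be shrunk:

  %data = call <4 x float> @llvm.amdgcn.raw.buffer.load.v4f32(
              <4 x i32> %rsrc, i32 %off, i32 0, i32 0)
  %x = extractelement <4 x float> %data, i32 0
  ; becomes:
  %x = call float @llvm.amdgcn.raw.buffer.load.f32(
              <4 x i32> %rsrc, i32 %off, i32 0, i32 0)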

Change-Id: I14deceff74bcb21179baf6aa6e94bf39e7d63d5d

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D55882

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349735 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-20 10:08:18 +00:00
David Stuttard
cb049f818d Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"
Also revert fix r347876

One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347911 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 20:14:17 +00:00
David Stuttard
7fd87699a5 Add support for TFE/LWE in image intrinsics
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accommodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done).
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for the error return value).
5. An extra pass to zero-initialize the error return value - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on-by-default) option to ensure ALL return values
are zero-initialized, which is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO:
re-enable and handle this correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347871 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 15:21:13 +00:00
Nikita Popov
64638808e5 [InstCombine] Determine demanded and known bits for funnel shifts
Support funnel shifts in InstCombine demanded bits simplification.
If the shift amount is constant, we can determine both the demanded
bits of the operands, as well as the known bits of the result.

If one of the operands has no demanded bits, it will be replaced
by undef and the funnel shift will be simplified into a simple shift
due to the simplifications added in D54778.
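A hedged sketch of the constant-shift case (hypothetical function):

  define i32 @fshl_demanded(i32 %a, i32 %b) {
    %f = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 8)
    ; only bits 8 and up are demanded, and those all come from %a
    %r = and i32 %f, -256
    ret i32 %r
  }
  declare i32 @llvm.fshl.i32(i32, i32, i32)

Here fshl(%a, %b, 8) is (%a << 8) | (%b >> 24); the mask discards the low 8 bits, so %b has no demanded bits, becomes undef, and the funnel shift simplifies to a plain shl.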

Differential Revision: https://reviews.llvm.org/D54869

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347515 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-24 19:00:45 +00:00
David Green
b4f227a2a5 [InstCombine] Demand bits of UMin
This is the umin alternative to the umax code from rL344237. We use
De Morgan's law on the umax case to bring us to the same thing on umin,
but using countLeadingOnes, not countLeadingZeros.

Differential Revision: https://reviews.llvm.org/D53036


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344239 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-11 11:28:27 +00:00
David Green
c3c05ee92f [InstCombine] Demand bits of UMax
Use the demanded bits of umax(A,C) to prove we can just use A, so long as the
lowest non-zero bit of the DemandedMask is higher than the highest non-zero bit of C.
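A worked sketch (umax written as the icmp/select idiom of this era):

  define i32 @umax_demanded(i32 %a) {
    %c = icmp ugt i32 %a, 3
    %m = select i1 %c, i32 %a, i32 3    ; umax(%a, 3)
    %r = and i32 %m, -4                 ; demands only bits 2 and up
    ret i32 %r
  }

C = 3 occupies only bits 0-1, below every demanded bit, so %m can be replaced by %a: whenever the umax picks the constant, %a was at most 3, and the two values agree on all demanded bits.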

Differential Revision: https://reviews.llvm.org/D53033


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344237 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-11 11:04:09 +00:00
Sanjay Patel
f2f6f77ac4 [InstCombine] drop poison flags in SimplifyVectorDemandedElts
We established the (unfortunately complicated) rules for UB/poison
propagation with vector ops in:
D48893
D48987
D49047

It's clear from the affected tests that we are potentially creating 
poison where none existed before the transforms. For add/sub/mul,
the answer is simple: just drop the flags because the extra undef
vector lanes are generally more valuable for analysis and codegen.
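A hedged sketch of why the flags must go:

  %add = add nsw <2 x i32> %x, <i32 1, i32 2>
  %s = shufflevector <2 x i32> %add, <2 x i32> undef, <2 x i32> zeroinitializer
  ; lane 1 of %add is undemanded, so its constant can become undef:
  %add = add <2 x i32> %x, <i32 1, i32 undef>
  ; 'nsw' is dropped, since add nsw with an undef lane could now be poison.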


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343819 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-04 21:36:50 +00:00
Sanjay Patel
9e29af39fd [InstCombine] reduce code duplication in SimplifyDemandedVectorElts; NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343806 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-04 19:12:07 +00:00
Sanjay Patel
44cfbc7d13 [InstCombine] allow SimplifyDemandedVectorElts to work with FP binops
We're a long way from D50992 and D51553, but this is where we have to start.
We weren't back-propagating undefs into binop constant values for anything but
add/sub/mul/and/or/xor. 

This is likely because we have to be careful about not introducing UB/poison 
with div/rem/shift. But I suspect we already are getting the poison part wrong 
for add/sub/mul (although it may not be possible to expose the bug currently
because we use SimplifyDemandedVectorElts from a limited set of opcodes).
See the discussion/implementation from D48987 and D49047.

This patch just enables functionality for FP ops because those do not have 
UB/poison potential.
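A minimal sketch of the new FP behaviour:

  %f = fadd <2 x float> %x, <float 1.0, float 2.0>
  %s = shufflevector <2 x float> %f, <2 x float> undef, <2 x i32> zeroinitializer
  ; only lane 0 of %f is demanded, so the constant's unused lane can be
  ; back-propagated to undef: <float 1.0, float undef>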


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343727 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-03 21:44:59 +00:00
Sanjay Patel
ca29f6a30b [InstCombine] enhance vector demanded elements to look at a vector select condition operand
I noticed that we were not back-propagating undef lanes to shuffle masks when we have a 
shuffle that reduces the vector width. This is part of investigating/solving PR38691:
https://bugs.llvm.org/show_bug.cgi?id=38691

The DAG equivalent was proposed with:
D51696

Differential Revision: https://reviews.llvm.org/D51433


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341981 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-11 18:49:00 +00:00
Sanjay Patel
4eef9f6908 [InstCombine] use SelectInst operand names to make code clearer; NFC
Cleanup step for D51433.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341850 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-10 18:37:59 +00:00
Sanjay Patel
48afe9e636 [InstCombine] fix formatting in SimplifyDemandedVectorElts->Select; NFCI
I'm preparing to add the same functionality both here and to the DAG 
version of this code in D51696 / D51433, so try to make those cases 
as similar as possible to avoid bugs.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341545 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-06 13:19:22 +00:00
Craig Topper
18d8ba4a18 [X86] Remove and autoupgrade the scalar fma intrinsics with masking.
This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336871 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-12 00:29:56 +00:00
Craig Topper
5f1cfe90f3 [X86] Remove X86 specific scalar FMA intrinsics and upgrade to target independent FMA and extractelement/insertelement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336315 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-05 06:52:55 +00:00
Simon Pilgrim
a2181998a7 Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 bits" MSVC warning (again). NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335457 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-25 11:46:24 +00:00
Simon Pilgrim
4e7cfd69f5 Use APInt[] bit access to avoid "32-bit shift implicitly converted to 64 bits" MSVC warning. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335454 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-25 11:38:27 +00:00
Nicolai Haehnle
4c3fa871b5 AMDGPU: Remove old-style image intrinsics
Summary:
This also removes the need for atomic pseudo instructions, since
we select the correct encoding directly in SITargetLowering::lowerImage
for dimension-aware image intrinsics.

Mesa uses dimension-aware image intrinsics since
commit a9a7993441.

Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a

Reviewers: arsenm, rampitec, mareko, tpr, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48167

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335231 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 13:37:45 +00:00
Nicolai Haehnle
7f7cea5306 InstCombine/AMDGPU: Add dimension-aware image intrinsics to SimplifyDemanded
Summary:
Use the expanded features of the TableGen generic tables to avoid manually
adding the combinatorially exploded set of intrinsics. The
getAMDGPUImageDimIntrinsic lookup function is early-out,
i.e. non-AMDGPU intrinsics will never look at the underlying table.

Use a generic approach for getting the new intrinsic overload to keep the
code simple, and make the image dmask handling more generic:
- handle non-sampler image loads
- handle the case where the set of demanded elements is not a prefix

There is some overlap between this code and an optimization that happens
in the backend during code generation. They currently complement each other:

- only the codegen optimization can generate vec3 loads
- only the InstCombine optimization can handle D16

The InstCombine optimization also likely covers more cases since the
codegen optimization is fairly ad-hoc. Ideally, we'll remove the optimization
in codegen once the infrastructure for vec3 is in place (which will probably
take a long time).

Modify the test cases to use dimension-aware intrinsics. This makes it
easier to see that the test coverage for the new intrinsics is equivalent,
and the old style intrinsics will be removed in a follow-up commit anyway.

Change-Id: I4b91ea661413d13004956fe4ef7d13d41b8ce3ad

Reviewers: arsenm, rampitec, majnemer

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48165

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335230 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 13:37:31 +00:00
Tomasz Krupa
a36133dda7 [X86] Lowering sqrt intrinsics to native IR
Summary: Complementary patch to lowering sqrt intrinsics in Clang.
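As a hedged illustration (exact codegen may differ), the builtins now come in as the generic llvm.sqrt intrinsic rather than a target-specific one:

  ; packed form, e.g. from _mm_sqrt_ps:
  %r = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)
  ; scalar form, e.g. from _mm_sqrt_ss, via extract/insert:
  %s0 = extractelement <4 x float> %a, i32 0
  %sq = call float @llvm.sqrt.f32(float %s0)
  %r2 = insertelement <4 x float> %a, float %sq, i32 0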

Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k

Reviewed By: craig.topper

Subscribers: tkrupa, mike.dvoretsky, llvm-commits

Differential Revision: https://reviews.llvm.org/D41599


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334849 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 18:05:24 +00:00
Craig Topper
c36d516fd6 [X86] Remove and autoupgrade a bunch of FMA instrinsics that are no longer used by clang.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332146 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-11 21:59:34 +00:00
Benjamin Kramer
b83e520dee [InstCombine] Only propagate known leading zeros from udiv input to output.
Put in a conservatively correct estimate for now. This avoids miscompiling
clang in FDO mode. It is really tricky to trigger in reality, as
basically all interesting cases will be folded away by computeKnownBits
earlier; I was unable to find a reasonably small test case.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331975 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-10 11:45:18 +00:00
Benjamin Kramer
c1c66c0705 [InstCombine] Teach SimplifyDemandedBits that udiv doesn't demand low dividend bits that are zero in the divisor
This is safe as long as the udiv is not exact. The pattern is not common in
C++ code, but comes up all the time in code generated by XLA's GPU backend.
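A worked sketch: with a divisor of 8, the low three dividend bits cannot affect the quotient.

  define i32 @udiv_low_bits(i32 %x) {
    %o = or i32 %x, 7        ; touches only bits the udiv discards
    %d = udiv i32 %o, 8
    ret i32 %d
  }
  ; SimplifyDemandedBits can drop the 'or': %d = udiv i32 %x, 8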

Differential Revision: https://reviews.llvm.org/D46647

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331933 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 22:27:34 +00:00
Craig Topper
754a558235 [X86] Remove the pmuldq/pmuludq intrinsics and replace with native IR.
This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics.

We lost some InstCombine SimplifyDemandedBit optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329990 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-13 06:07:18 +00:00
Simon Pilgrim
68041a58ac Remove useless comment - seems to be a copy+paste typo. NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325385 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-16 20:41:06 +00:00
Sanjay Patel
9d923086de [InstCombine] fix demanded-bits propagation for zext/trunc
I was comparing the demanded-bits implementations between InstCombine
and TargetLowering as part of investigating questions in D42088 and
noticed that this was wrong in IR. We were losing all of the prior
known bits when we got back to the 'zext'.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322662 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-17 14:39:28 +00:00
Simon Pilgrim
e3aae88e02 [InstCombine] Fix SimplifyDemandedUseBits SHL handling (PR35515)
Don't assume that the pattern-matched SRL can be cast to an Instruction (it might be a ConstantExpr, etc.).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320270 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-09 23:42:56 +00:00
Sanjay Patel
da536d4e17 [InstCombine] improve demanded vector elements analysis of insertelement
Recurse instead of returning on the first found optimization. Also, return early in the caller
instead of continuing because that allows another round of simplification before we might
potentially lose undef information from a shuffle mask by eliminating the shuffle.

As noted in the review, we could probably do better and be more efficient by moving all of
demanded elements into a separate pass, but this is yet another quick fix to instcombine.

Differential Revision: https://reviews.llvm.org/D37236


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312248 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-31 15:57:17 +00:00
Craig Topper
a3ced95cbe [InstCombine] Call hasNoSignedWrap instead of hasNoUnsignedWrap to get the NSW flag when handling Add in SimplifyDemandedUseBits.
This is a typo from r311789.

This should fix PR34349.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311902 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-28 18:44:28 +00:00
Craig Topper
ef9c2d804e [InstCombine] Don't fall back to only calling computeKnownBits if the upper bit of Add/Sub is demanded.
Just create an all 1s demanded mask and continue recursing like normal. The recursive calls should be able to handle an all 1s mask and do the right thing.

The only time we should care about knowing whether the upper bit was demanded is when we need to know if we should clear the NSW/NUW flags.

Now that we have a consistent path through the code for all cases, use KnownBits::computeForAddSub to compute the known bits at the end since we already have the LHS and RHS.

My larger goal here is to move the code that turns add into xor if only 1 bit is demanded and no bits below it are non-zero from InstCombiner::OptAndOp to here. This will allow it to be more general instead of just looking for 'add' and 'and' with constant RHS.

Differential Revision: https://reviews.llvm.org/D36486

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311789 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-25 18:39:40 +00:00
Amjad Aboud
513af851dd [InstCombine] Consider more cases where SimplifyDemandedUseBits does not convert AShr to LShr.
There are cases where AShr has a better chance of being optimized than LShr, especially when the demanded bits are not known to be zero but are known to be similar to the sign bit.

Differential Revision: https://reviews.llvm.org/D36936




git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311773 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-25 11:07:54 +00:00
Craig Topper
ef45a1fe1b [InstCombine] Remove unnecessary temporary APInt. NFCI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309887 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-02 21:05:40 +00:00
Craig Topper
e2cbc76f80 [InstCombine] Remove explicit check for impossible condition. Replace with assert
Summary:
As far as I can tell, the earlier call to getLimitedValue guarantees that ShiftAmt is saturated to BitWidth-1, preventing it from ever being equal to or greater than BitWidth.

At one point in the past, the getLimitedValue call was only passed BitWidth, not BitWidth - 1. This would have allowed the equality case to get here. In fact, this check was initially added as just BitWidth == ShiftAmt, but was changed shortly after to include >, which should never have been possible.

Reviewers: spatel, majnemer, davide

Reviewed By: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36123

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309690 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-01 15:10:25 +00:00
Craig Topper
279ac88b99 [InstCombine] Move (0 - x) & 1 --> x & 1 to SimplifyDemandedUseBits.
This removes a dedicated matcher and allows us to support more than just an AND masking the lower bit.
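The basic fold, as a sketch:

  %n = sub i32 0, %x
  %r = and i32 %n, 1
  ; -%x and %x agree modulo 2, so this becomes:
  %r = and i32 %x, 1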

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308124 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-16 05:37:58 +00:00
Craig Topper
f552e96e02 [InstCombine] Make InstCombine's IRBuilder be passed by reference everywhere
Previously the InstCombiner class contained a pointer to an IR builder that had been passed to the constructor. Sometimes this would be passed to helper functions as either a pointer or the pointer would be dereferenced to be passed by reference.

This patch makes it a reference everywhere, including the InstCombiner class itself, so there is more consistency. This is a large, but mechanical, patch. I've done very minimal formatting changes on it despite what clang-format wanted to do.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307451 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-07 23:16:26 +00:00
Craig Topper
6dbd34d261 [Constants] If we already have a ConstantInt*, prefer to use isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI
Going through the Constant methods requires redetermining that the Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307292 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-06 18:39:47 +00:00