236 Commits

Author SHA1 Message Date
Matt Arsenault
db6cc311a0 AMDGPU: Remove redundant combine
This combine was already done in two places. The
generic combiner already has done this since
r217610, for adds (with a single use).

This one was added in r303641, and added support for handling
or as well. r313251 later added support to the generic
combine for or. It also turns out the isOrEquivalentToAdd
check is not necessary for this combine.

Additionally, we already reproduce this combine in yet
another place in the backend, although in that version
multiple uses of the add are still folded if it will
allow a fold into the addressing mode. That version needs
to be improved to understand ors though, as well as the
correct legal offsets for private.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317526 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-07 00:06:32 +00:00
Matt Arsenault
bd04b64cd1 AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317492 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-06 17:04:37 +00:00
Wei Ding
607acf30af AMDGPU : Fix an error for the llvm.cttz implementation.
Differential Revision: http://reviews.llvm.org/D39014

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@316037 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-17 21:49:52 +00:00
Matt Arsenault
9a6875264b AMDGPU: Implement isFPExtFoldable
This helps match v_mad_mix* in some cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315744 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-13 20:18:59 +00:00
Wei Ding
29de9d738e Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Differential Revision: http://reviews.llvm.org/D37348

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315610 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-12 19:37:14 +00:00
Stanislav Mekhanoshin
287608ccc3 [AMDGPU] New 64 bit div/rem expansion
Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions.
This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions.

Passes OpenCL conformance test_integer_ops quick_[u]long_math

Differential Revision: https://reviews.llvm.org/D38607

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@315081 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-06 17:24:45 +00:00
Konstantin Zhuravlyov
9cb20ab95d AMDGPU: Expand setcc for v2f32 and v4f32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314853 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-03 21:45:01 +00:00
Konstantin Zhuravlyov
b922e2ff02 AMDGPU: Expand setcc for v2i32 and v4i32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314852 91177308-0d34-0410-b5e6-96231b3b80d8
2017-10-03 21:31:24 +00:00
Tim Renouf
8ba98f908f [AMDGPU] calling conventions for AMDPAL OS type
Summary:
This commit adds comments on how the AMDPAL OS type overloads the
existing AMDGPU_ calling conventions used by Mesa, and adds a couple of
new ones.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D37752

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@314502 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-29 09:51:22 +00:00
Matt Arsenault
e4e1eed1d7 AMDGPU: Allow coldcc calls
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312936 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-11 18:54:20 +00:00
Stanislav Mekhanoshin
f3b5f2ad4a [AMDGPU] Prevent infinite recursion in DAG.computeKnownBits()
Differential Revision: https://reviews.llvm.org/D37392

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312364 91177308-0d34-0410-b5e6-96231b3b80d8
2017-09-01 20:43:20 +00:00
Matt Arsenault
d213820974 AMDGPU: Turn int pack pattern into build_vector
build_vector is a more useful canonical form when
pattern matching packed operations, so turn shift
into high element into a build_vector.

Should show no change for now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@312282 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-31 21:17:22 +00:00
Stanislav Mekhanoshin
f4dd1bdd9a [AMDGPU] computeKnownBitsForTargetNode for 24 bit mul
Differential Revision: https://reviews.llvm.org/D37168

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311896 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-28 16:35:37 +00:00
Matt Arsenault
45424dbebb AMDGPU: Start adding tail call support
Handle the sibling call cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310753 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-11 20:42:08 +00:00
Matt Arsenault
0856e7acd5 AMDGPU: Don't use report_fatal_error for unsupported call types
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310004 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-03 23:32:41 +00:00
Matt Arsenault
c60159767d AMDGPU: Pass special input registers to functions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309998 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-03 23:00:29 +00:00
Matt Arsenault
43950949ad AMDGPU: Initial implementation of calls
Includes a hack to fix the type selected for
the GlobalAddress of the function, which will be
fixed by changing the default datalayout to use
generic pointers for 0.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309732 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-01 19:54:18 +00:00
Hiroshi Inoue
e3b8cd6b61 fix typos in comments; NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308127 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-16 08:11:56 +00:00
Matt Arsenault
078c435803 AMDGPU: Return correct type during argument lowering
The type needs to be casted back to the original argument type.
Fixes an assert that for some reason is only run when
using -debug.

Includes an additional combine to avoid test regressions
from having conversions mixed with multiple Assert[SZ]ext
nodes. On subtargets where i16 is legal, this was producing an i32
register with an i16 AssertZExt, truncated to i16 with another i8
AssertZExt.

t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t3: i16 = truncate t2
t5: i16 = AssertZext t3, ValueType:ch:i8
t6: i8 = truncate t5
t7: i32 = zero_extend t6

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308082 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-15 05:52:59 +00:00
Simon Pilgrim
2541a59ac3 Fix some more -Wimplicit-fallthrough warnings. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307411 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-07 16:40:06 +00:00
David Stuttard
dad6e61ce7 [AMDGPU] Add intrinsics for tbuffer load and store
Intrinsic already existed for llvm.SI.tbuffer.store

Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.*

Added CodeGen tests for the 2 new variants added.
Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr

Differential Revision: https://reviews.llvm.org/D30687

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306031 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22 16:29:22 +00:00
Matt Arsenault
b9cdbc013b AMDGPU: Cleanup CreateLiveInRegister
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305748 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 21:52:45 +00:00
Chandler Carruth
e3e43d9d57 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304787 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-06 11:49:48 +00:00
Stanislav Mekhanoshin
6ff1a723f7 [AMDGPU] Combine and (srl) into shl (bfe)
Perform DAG combine:
and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb
Where nb is a number of trailing zeroes in mask.

It replaces two instructions with two and BFE is generally a more
expensive one. However this is only done if we are selecting a byte
or word at an aligned boundary which results in a proper SDWA
operand pattern. It is only done if SDWA is supported.

TODO: improve SDWA pass to actually convert this pattern. It is not
done now because we have an immediate in the instruction, which has
be moved into a VGPR.

Differential Revision: https://reviews.llvm.org/D33455

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303681 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-23 19:54:48 +00:00
Stanislav Mekhanoshin
ddde657138 [AMDGPU] Convert shl (add) into add (shl)
shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1)
This allows to fold a constant into an address in some cases as
well as to eliminate second shift if the expression is used as
an address and second shift is a result of a GEP.

Differential Revision: https://reviews.llvm.org/D33432

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303641 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-23 15:59:58 +00:00
Stanislav Mekhanoshin
69edad7913 [AMDGPU] Narrow lshl from 64 to 32 bit if possible
Turn expensive 64 bit shift into 32 bit if shift does not overflow int:
shl (ext x) => zext (shl x)

Differential Revision: https://reviews.llvm.org/D33367

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303569 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-22 16:58:10 +00:00
Matt Arsenault
a0540d3468 AMDGPU: Start defining a calling convention
Partially implement callee-side for arguments and return values.
byval doesn't work properly, and most likely sret or other on-stack
return values most as well.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303308 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-17 21:56:25 +00:00
Craig Topper
d49344495d [KnownBits] Add bit counting methods to KnownBits struct and use them where possible
This patch adds min/max population count, leading/trailing zero/one bit counting methods.

The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.

Differential Revision: https://reviews.llvm.org/D32931

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302925 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-12 17:20:30 +00:00
Matt Arsenault
2bbb56fd75 AMDGPU: Pull fneg out of extract_vector_elt
This allows folding source modifiers in more f16 cases.
Makes it easier to select per-component packed neg modifiers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302813 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-11 17:26:25 +00:00
Craig Topper
ace8b39f82 [KnownBits] Add wrapper methods for setting and clear all bits in the underlying APInts in KnownBits.
This adds routines for reseting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown.

Differential Revision: https://reviews.llvm.org/D32637

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302262 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-05 17:36:09 +00:00
Marek Olsak
a2057043bd AMDGPU: Add AMDGPU_HS calling convention
Reviewers: arsenm, nhaehnle

Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D32644

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301930 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-02 15:41:10 +00:00
Marek Olsak
007530e1a2 AMDGPU: Add new amdgcn.init.exec intrinsics
v2: More tests, bug fixes, cosmetic changes.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D31762

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301677 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-28 20:21:58 +00:00
Craig Topper
8b430f87e6 [SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.

This is largely a mechanical transformation from KnownZero to Known.Zero.

Differential Revision: https://reviews.llvm.org/D32569

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301620 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-28 05:31:46 +00:00
Matt Arsenault
666020a37d AMDGPU: Move trap lowering to DAG
Fixes traps in any block besides the entry block,
and fixes depending on a live-in physical register
by using a virtual register copy.

Also happens to stop emitting a nop in the case
debug trap is not supported.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301206 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-24 17:49:13 +00:00
Akira Hatanaka
586c752a82 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

This recommits r300932 and r300930, which was causing dag-combine to
loop forever. The problem was that optimizeLogicalImm was returning
true even when there was no change to the immediate node (which happened
when the immediate was all zeros or ones), which caused dag-combine to
push and pop the same node to the work list over and over again without
making any progress.

This commit fixes the bug by returning false early in optimizeLogicalImm
if the immediate is all zeros or ones. Also, it changes the code to
compare the immediate with 0 or Mask rather than calling
countPopulation.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301019 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-21 18:53:12 +00:00
Akira Hatanaka
1933132d0a Revert r300932 and r300930.
It seems that r300930 was creating an infinite loop in dag-combine when
compling the following file:

MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300940 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-21 01:31:50 +00:00
Akira Hatanaka
63da689bdf [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

This recommits r300913, which broke bots because I didn't fix a call to
ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of
TargetLoweringOpt and TargetLowering.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300930 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-21 00:05:16 +00:00
Akira Hatanaka
01c014ca98 Revert "[AArch64] Improve code generation for logical instructions taking"
This reverts r300913.

This broke bots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300916 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-20 23:03:30 +00:00
Akira Hatanaka
ac0ecde9f0 [AArch64] Improve code generation for logical instructions taking
immediate operands.

This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.

rdar://problem/18231627

Differential Revision: https://reviews.llvm.org/D5591

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300913 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-20 22:47:56 +00:00
Matt Arsenault
938bfaf893 AMDGPU: Refactor argument lowering
Split into smaller functions and prepare for handling
non-entry functions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299998 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-11 22:29:24 +00:00
Matt Arsenault
0719006e7e AMDGPU: Stop using CCAssignToRegWithShadow
This does not do what it is attempting to use it for
and requires working around in LowerFormalArguments.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299667 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 17:37:27 +00:00
Matt Arsenault
513e714dfd AMDGPU: Remove llvm.SI.vs.load.input
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299391 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-03 21:45:13 +00:00
Matt Arsenault
cd7c9c3178 AMDGPU: Remove legacy bfe intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299372 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-03 18:08:08 +00:00
Matt Arsenault
c4de629ce2 AMDGPU: Remove unnecessary ands when f16 is legal
Add a new node to act as a fancy bitcast from f16 operations to
i32 that implicitly zero the high 16-bits of the result.

Alternatively could try making v2f16 legal and canonicalizing
on build_vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299246 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-31 19:53:03 +00:00
Simon Pilgrim
9fc191fd45 [DAGCombiner] Add vector demanded elements support to ComputeNumSignBits
Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course.

Followup to D25691.

Differential Revision: https://reviews.llvm.org/D31311

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299219 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-31 13:54:09 +00:00
Simon Pilgrim
07898901df [DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode
Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes.

Differential Revision: https://reviews.llvm.org/D31249

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299201 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-31 11:24:16 +00:00
Simon Pilgrim
70a5705cf0 [AMDGPU] Tidy up computeKnownBitsForTargetNode/ComputeNumSignBitsForTargetNode arguments. NFCI.
Based on comment in D31249.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298991 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-29 12:09:25 +00:00
Yaxun Liu
ab3be33d40 [AMDGPU] Get address space mapping by target triple environment
As we introduced target triple environment amdgiz and amdgizcl, the address
space values are no longer enums. We have to decide the value by target triple.

The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which are not depend on target triple, use static
const members, so that they don't occupy extra memory space and is equivalent
to a compile time constant.

Since the struct is lightweight and cheap, it can be created on the fly at
the point of usage. Or it can be added as member to a pass and created at
the beginning of the run* function.

Differential Revision: https://reviews.llvm.org/D31284


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298846 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-27 14:04:01 +00:00
Matt Arsenault
d4f6485173 AMDGPU: Implement f16 fround
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298730 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-24 20:04:18 +00:00
Matt Arsenault
dc55587b7f AMDGPU: Rename SI_RETURN
This is used for a specific type of return to a shader part's
epilog code. Rename to try avoiding confusion from a true
call's return.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298452 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-21 22:18:10 +00:00