1518 Commits

Author SHA1 Message Date
Konstantin Zhuravlyov
ae3b2037b4 AMDGPU/Metadata: Always report a fixed number of hidden arguments
Currently it is 6. If the "feature" was not used, report a dummy
hidden argument. Otherwise the count does not match the kernarg size
reported in the kernel header.

Differential Revision: https://reviews.llvm.org/D45129


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329341 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-05 20:46:04 +00:00
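
A minimal sketch of the padding idea described in ae3b2037b4, not the backend's actual code. The function names, the "dummy" placeholder and the argument names are hypothetical. The 8-byte slot size is an assumption, chosen because 6 slots of 8 bytes give the 48 bytes of implicit arguments mentioned in 9272c8addc.

```python
# Hypothetical illustration only; none of these names come from LLVM.
FIXED_HIDDEN_ARGS = 6  # fixed count stated in the commit message
HIDDEN_ARG_BYTES = 8   # assumption: 6 slots * 8 bytes = 48 bytes


def pad_hidden_args(used):
    """Return exactly FIXED_HIDDEN_ARGS entries, filling unused slots with a dummy."""
    if len(used) > FIXED_HIDDEN_ARGS:
        raise ValueError("more hidden arguments than fixed slots")
    return list(used) + ["dummy"] * (FIXED_HIDDEN_ARGS - len(used))


def hidden_kernarg_bytes(args):
    """Size in bytes that the hidden arguments contribute to the kernarg segment."""
    return len(args) * HIDDEN_ARG_BYTES


# A kernel that only uses three hidden arguments still reports six.
reported = pad_hidden_args(["offset_x", "offset_y", "offset_z"])
```

With the padding in place the reported metadata and the kernarg size agree regardless of which features a kernel uses.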
Nicolai Haehnle
83bfebdaca AMDGPU: Dimension-aware image intrinsics
Summary:
These new image intrinsics contain the texture type as part of
their name and have each component of the address/coordinate as
individual parameters.

This is a preparatory step for implementing the A16 feature, where
coordinates are passed as half-floats or -ints, but the Z compare
value and texel offsets are still full dwords, making it difficult
or impossible to distinguish between A16 on or off in the old-style
intrinsics.

Additionally, these intrinsics pass the 'texfailpolicy' and
'cachectrl' as i32 bit fields to reduce operand clutter and allow
for future extensibility.

v2:
- gather4 supports 2darray images
- fix a bug with 1D images on SI

Change-Id: I099f309e0a394082a5901ea196c3967afb867f04

Reviewers: arsenm, rampitec, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D44939

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329166 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-04 10:58:54 +00:00
Nicolai Haehnle
126cd7e831 AMDGPU: Fix copying i1 value out of loop with non-uniform exit
Summary:
When an i1-value is defined inside of a loop and used outside of it, we
cannot simply use the SGPR bitmask from the loop's last iteration.

There are also useful and correct cases of an i1-value being copied between
basic blocks, e.g. when a condition is computed outside of a loop and used
inside it. The concept of dominators is not sufficient to capture what is
going on, so I propose the notion of "lane-dominators".

Fixes a bug encountered in Nier: Automata.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743
Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D40547

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329164 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-04 10:57:58 +00:00
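
The problem described in 126cd7e831 can be illustrated with a toy simulation. This is a simplified model under stated assumptions, not the "lane-dominators" analysis itself: lanes are plain integers, the per-iteration mask only holds bits for lanes still active, and the function name run_loop is hypothetical.

```python
def run_loop(exit_iter, cond):
    """Simulate a wave whose lanes leave a loop at different iterations.

    exit_iter[lane] is the (non-negative) iteration in which the lane exits.
    cond(lane, it) is the i1 value the lane computes in iteration `it`.
    Returns (mask from the last iteration, mask merged at each lane's exit).
    """
    active = set(range(len(exit_iter)))
    merged = 0
    last_mask = 0
    it = 0
    while active:
        # Only lanes still in the loop write their bit this iteration.
        last_mask = 0
        for lane in active:
            if cond(lane, it):
                last_mask |= 1 << lane
        # Capture the value of every lane that exits now.
        exiting = {lane for lane in active if exit_iter[lane] == it}
        for lane in exiting:
            merged |= last_mask & (1 << lane)
        active -= exiting
        it += 1
    return last_mask, merged


# Four lanes exit one after another; every lane computes "true".
last, merged = run_loop([0, 1, 2, 3], lambda lane, it: True)
```

In this model the mask from the final iteration only carries the bit of the last lane to exit, while the merged mask carries the value every lane held when it left the loop. That difference is why the last iteration's SGPR bitmask cannot simply be reused outside the loop.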
Farhana Aleen
a59291c1f6 [AMDGPU] performMinMaxCombine should not optimize patterns of vectors to min3/max3.
Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3.

Author: FarhanaAleen

Reviewed By: arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D45219

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329131 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 23:00:30 +00:00
Farhana Aleen
d82ffe5dae Revert "MSG"
This reverts commit 9a0ce889d1.

This was committed by mistake.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329119 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 21:51:45 +00:00
Farhana Aleen
9a0ce889d1 MSG
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329114 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-03 21:20:39 +00:00
Stanislav Mekhanoshin
936a756969 [AMDGPU] Fixed some instruction latencies
Differential Revision: https://reviews.llvm.org/D45073

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328874 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-30 16:19:13 +00:00
Michael Bedy
5488d68d0b [AMDGPU] Fix the SDWA Peephole phase to handle src for dst:UNUSED_PRESERVE.
Summary:
The phase attempts to transform operations that extract a portion of a value
into an SDWA src operand in cases where that value is used only once. It
was not prepared for this use to be the preserved portion of a value for
dst:UNUSED_PRESERVE, resulting in a crash or assert.

This change either rejects the illegal SDWA attempt, or in the case where
dst:WORD_1 and the src_sel would be WORD_0, removes the unneeded
extract instruction.

Reviewers: arsenm, #amdgpu

Reviewed By: arsenm, #amdgpu

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D44364

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328856 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-30 05:03:36 +00:00
Matt Arsenault
b18554c107 AMDGPU: Support realigning stack
While the stack access instructions don't care about
alignment > 4, some transformations on the pointer calculation
do make assumptions based on knowing the low bits of a pointer
are 0. If a stack object ends up being accessed through its
absolute address (relative to the kernel scratch wave offset),
the addressing expression may depend on the stack frame being
properly aligned. This was breaking in a testcase due to the
add->or combine.

I think some of the SP/FP handling logic is still backwards,
and overly simplistic to support all of the stack features.
Code which tries to modify the SP, with inline asm for example,
or which uses variable sized objects will probably require redoing this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328831 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-29 21:30:06 +00:00
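
The add->or combine mentioned in b18554c107 relies on the low bits of the base address being zero. The arithmetic can be checked with a short sketch; the addresses and the helper name are made up for illustration.

```python
def add_equals_or(base, offset):
    """True when base + offset and base | offset give the same address."""
    return (base + offset) == (base | offset)


ALIGN = 16                # assumed alignment of the stack object
aligned_base = 0x100      # low 4 bits are zero
misaligned_base = 0x104   # only 4-byte aligned
offset = 12               # any offset smaller than ALIGN

ok = add_equals_or(aligned_base, offset)
broken = add_equals_or(misaligned_base, offset)
```

When the base is aligned to 16 bytes, no bit of the offset overlaps a set bit of the base, so the add can be replaced by an or. With the misaligned base the two results differ (0x110 versus 0x10c), which is the kind of mismatch that realigning the stack avoids.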
Matt Arsenault
ad41f941dc AMDGPU: Increase default stack alignment
8 and 16-byte values are common, so increase the default
alignment to avoid realigning the stack in most functions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328821 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-29 20:22:04 +00:00
Matt Arsenault
3ca9749f0a AMDGPU: Fix selection error on constant loads with < 4 byte alignment
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328818 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-29 19:59:28 +00:00
Tim Renouf
9f475f3a91 Revert "[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader"
This reverts commit 0daf86291d.

It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a
release-with-asserts build. I will resubmit the change when I have fixed
that.

Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328695 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-28 11:21:07 +00:00
Tim Renouf
0daf86291d [AMDGPU] For OS type AMDPAL, fixed scratch on compute shader
Summary:
For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of
the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).

This commit fixes that to use offset 0x10 instead of offset 0 for a
compute shader, per the PAL ABI spec.

Reviewers: kzhuravl, nhaehnle, timcorringham

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm

Differential Revision: https://reviews.llvm.org/D44468

Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328673 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-27 21:35:00 +00:00
Tim Renouf
fccceddef3 [CodeGen] Fixed unreachable with -print-machineinstrs and custom pseudo source value
Summary:
Rev 327580 "[CodeGen] Use MIR syntax for MachineMemOperand printing"
broke -print-machineinstrs for us on AMDGPU, because we have custom
pseudo source values, and MIR serialization does not implement that.

This commit at least restores the functionality of -print-machineinstrs,
even if it does not properly implement the missing MIR serialization
functionality.

Differential Revision: https://reviews.llvm.org/D44871

Change-Id: I44961c0b90bf6d48c01484ed7a4e466fd300db66

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328668 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-27 21:14:04 +00:00
Matt Arsenault
a2f8776c07 AMDGPU: Fix not preserving CSR VGPR if used for SGPR spills
Previously, this was not done if the function had no calls in it. This
is still a possible issue with any callable function, regardless
of whether calls are present.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328659 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-27 19:42:55 +00:00
Matt Arsenault
bf82806c2e AMDGPU: Fix crash when MachinePointerInfo invalid
The combine on a select of a load only triggers for
addrspace 0, and discards the MachinePointerInfo. The
conservative default needs to be used for this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328652 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-27 18:39:45 +00:00
Matt Arsenault
a833b4252d AMDGPU: Fix register name format in tests
These were changed to match the asm output name a long time ago,
although I think the old tablegenerated names still work.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328651 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-27 18:39:42 +00:00
Matt Arsenault
aaf7156232 AMDGPU: Fix FP restore from being reordered with stack ops
In a function, s5 is used as the frame base SGPR. If a function
is calling another function, during the call sequence
it is copied to a preserved SGPR and restored.

Before it was possible for the scheduler to move stack operations
before the restore of s5, since there's nothing to associate
a frame index access with the restore.

Add an implicit use of s5 to the adjcallstack pseudo which ends
the call sequence to prevent this from happening. I'm not 100%
satisfied with this solution, but I'm not sure what else would be
better.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328650 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-27 18:38:51 +00:00
Tony Tye
9272c8addc [AMDGPU] Update OpenCL to use 48 bytes of implicit arguments for AMDGPU
Add two additional implicit arguments for OpenCL for the AMDGPU target using the AMDHSA runtime to support device enqueue.

Differential Revision: https://reviews.llvm.org/D44697


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328351 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-23 18:58:47 +00:00
Tony Tye
2b4b7fe362 [AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU
- Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target.
- Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS.

Differential Revision: https://reviews.llvm.org/D43736


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328349 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-23 18:45:18 +00:00
Sanjay Patel
3e65cc15a0 [InstSimplify] fp_binop X, NaN --> NaN
We propagate the existing NaN value when possible.

Differential Revision: https://reviews.llvm.org/D44521



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328140 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-21 19:31:53 +00:00
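
A small sketch of the fold named in 3e65cc15a0, written as a standalone function rather than anything resembling the InstSimplify code. Operands are modelled as either Python floats (constants) or strings (unknown values); that representation and the function name are assumptions.

```python
import math
import operator


def simplify_fp_binop(opcode, x, y):
    """Fold `opcode x, y` to the NaN operand if one is a NaN constant.

    Returns the existing NaN operand (propagating it unchanged), or None
    when no fold applies. The opcode does not affect the result.
    """
    if isinstance(y, float) and math.isnan(y):
        return y
    if isinstance(x, float) and math.isnan(x):
        return x
    return None


folded = simplify_fp_binop("fadd", "%x", math.nan)
not_folded = simplify_fp_binop("fmul", "%x", 2.0)

# Sanity check with real arithmetic: each of these operations yields NaN
# when one input is NaN.
runtime_results = [
    op(1.5, math.nan)
    for op in (operator.add, operator.sub, operator.mul, operator.truediv)
]
```

Because such math now folds away at compile time, tests that fed NaN into FP operations had to be adjusted, which is what the two following test-only commits do.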
Sanjay Patel
0245f1cd62 [AMDGPU] change test to avoid NaN math
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327891 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-19 19:26:22 +00:00
Sanjay Patel
8a350b553d [AMDGPU] adjust tests to be nan-free
As suggested in D44521 - bitcast to integer for the math,
so we preserve the intent of these tests when NaN math
gets folded away.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327890 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-19 19:23:53 +00:00
Matt Arsenault
41fae9f61a AMDGPU/GlobalISel: RegBankSelect for basic int ops
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327843 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-19 14:07:23 +00:00
Matt Arsenault
fe57640983 AMDGPU: Don't leave dead illegal VGPR->SGPR copies
Normally DCE kills these, but at -O0 they get left behind,
leaving suspicious-looking illegal copies.

Replace with IMPLICIT_DEF to avoid iterator issues.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327842 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-19 14:07:15 +00:00
Matt Arsenault
65181f7b75 AMDGPU/GlobalISel: Cleanup constant legality
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327774 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-17 15:17:48 +00:00
Matt Arsenault
417485b734 AMDGPU/GlobalISel: Basic G_GEP legality
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327773 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-17 15:17:45 +00:00
Matt Arsenault
177d1142dd AMDGPU/GlobalISel: Basic legality for load/store
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327772 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-17 15:17:41 +00:00
Farhana Aleen
7c98e88dc9 [AMDGPU] Supported ds_write_b128 generation.
Summary: This is a follow-on patch of https://reviews.llvm.org/D44210

Author: FarhanaAleen

Reviewed By: msearles

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44319

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327726 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-16 18:12:00 +00:00
Dmitry Preobrazhensky
a5e8c708f7 [AMDGPU][MC][GFX8][GFX9][DISASSEMBLER] Added "_e32" suffix to 32-bit VINTRP opcodes
See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751

Differential Revision: https://reviews.llvm.org/D44529

Reviewers: artem.tamazov, arsenm

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327723 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-16 16:38:04 +00:00
Mark Searles
b30a83dec3 [AMDGPU] Waitcnt pass: Modify the waitcnt pass to propagate info in the case of a single basic block loop. mergeInputScoreBrackets() does this for us; update it so that it processes the single bb's score bracket when processing the single bb's preds. It is, after all, a pred of itself, so its score bracket is needed.
Differential Revision: https://reviews.llvm.org/D44434

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327583 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-14 22:04:32 +00:00
Francis Visoiu Mistrih
0d758f3663 [CodeGen] Use MIR syntax for MachineMemOperand printing
Get rid of the "; mem:" suffix and use the one we use in MIR: ":: (load 2)".

rdar://38163529

Differential Revision: https://reviews.llvm.org/D42377

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327580 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-14 21:52:13 +00:00
Yaxun Liu
d4b84fce52 [AMDGPU] Fix lowering enqueue kernel when kernel has no name
Since the enqueued kernels have internal linkage, their names may be dropped.
In this case, give them unique names __amdgpu_enqueued_kernel or
__amdgpu_enqueued_kernel.n where n is a sequential number starting from 1.

Differential Revision: https://reviews.llvm.org/D44322


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327291 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-12 16:34:06 +00:00
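
The naming scheme described in d4b84fce52 can be sketched as a simple counter. This is an illustration only: the helper make_namer is hypothetical, and which kernel receives the unsuffixed name is an assumption.

```python
def make_namer(prefix="__amdgpu_enqueued_kernel"):
    """Return a function that hands out unique names for unnamed kernels."""
    count = 0

    def next_name():
        nonlocal count
        # First unnamed kernel gets the bare prefix, later ones get ".n"
        # with n counting up from 1.
        name = prefix if count == 0 else f"{prefix}.{count}"
        count += 1
        return name

    return next_name


namer = make_namer()
names = [namer() for _ in range(3)]
```

Here names holds "__amdgpu_enqueued_kernel", "__amdgpu_enqueued_kernel.1" and "__amdgpu_enqueued_kernel.2", so every enqueued kernel can be referred to even when its original name was dropped.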
Dmitry Preobrazhensky
a70206a47a [AMDGPU][MC] Corrected GATHER4 opcodes
See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252

Differential Revision: https://reviews.llvm.org/D43874

Reviewers: artem.tamazov, arsenm

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327278 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-12 15:03:34 +00:00
Matt Arsenault
7f9dbc4419 AMDGPU/GlobalISel: Legality and RegBankInfo for G_{INSERT|EXTRACT}_VECTOR_ELT
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327269 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-12 13:35:53 +00:00
Matt Arsenault
e0eff38b22 AMDGPU/GlobalISel: InstrMapping for G_MERGE_VALUES
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327268 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-12 13:35:49 +00:00
Matt Arsenault
b3834e5d6b AMDGPU/GlobalISel: Make some G_MERGE_VALUEs legal
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327267 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-12 13:35:43 +00:00
Sanjay Patel
00cb8ab926 [AMDGPU] fix tests to be independent of FP undef
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327211 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-10 16:39:59 +00:00
Matt Arsenault
5c56853ab7 AMDGPU: Fix crash when constant folding with physreg operand
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327209 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-10 16:05:35 +00:00
Farhana Aleen
2006e6286b [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space.
Summary: Starting from GCN 2nd generation, the ISA supports ds_read_b128 in addition to ds_read_b64.
         This patch adds the ds_read_b128 instruction pattern and generation of this instruction.
         In the vectorizer, this patch also widens the vector length so that the vectorizer generates
         128-bit loads for the local address space, which get translated to ds_read_b128.
         Since the performance benefit is not clear, the compiler generates ds_read_b128 only under -amdgpu-ds128.

Author: FarhanaAleen

Reviewed By: rampitec, arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44210

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327153 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-09 17:41:39 +00:00
Sanjay Patel
21de18a5cc [AMDGPU] fix test to be independent of FP undef
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327147 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-09 16:33:34 +00:00
Stanislav Mekhanoshin
c88f3543c0 [AMDGPU] Fixed V_DIV_FIXUP_F16 selection on GFX9
GFX9 should select opsel version.

Differential Revision: https://reviews.llvm.org/D44279

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327106 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-09 07:21:43 +00:00
Sanjay Patel
74ff3cc8bd [AMDGPU] fix test to survive more FP undef constant folding
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327066 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-08 21:30:56 +00:00
Sanjay Patel
a6a4aed947 [AMDGPU] fix test to survive the most basic undef constant folding
This will likely need to be changed again for anything more than:
fmul undef, undef -> undef


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327034 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-08 17:34:25 +00:00
Farhana Aleen
084dcd89de [AMDGPU] Increased vector length for global/constant loads.
Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache;
         loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.

Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44179

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326910 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-07 17:09:18 +00:00
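
The "16/8 elements of dwords/quadwords" figure in 084dcd89de follows from the 16-dword limit. A short arithmetic sketch, with constant and function names chosen for illustration:

```python
MAX_DWORDS = 16   # consecutive dwords one scalar load can read, per the summary
DWORD_BYTES = 4


def max_elements(elem_bytes):
    """Largest vector length (in elements) that fits in one 16-dword load."""
    return MAX_DWORDS * DWORD_BYTES // elem_bytes


dword_lanes = max_elements(4)   # 32-bit elements
qword_lanes = max_elements(8)   # 64-bit elements
```

Both cases cover the same 64 bytes, which is why the vector length halves when the element size doubles.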
Farhana Aleen
832984ded2 Revert "[AMDGPU] Widened vector length for global/constant address space."
This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326907 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-07 16:55:27 +00:00
Farhana Aleen
a446275ee2 [AMDGPU] Widened vector length for global/constant address space.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326904 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-07 16:29:05 +00:00
Yaxun Liu
2d74623e2e [AMDGPU] Fix lowering OpenCL enqueue_kernel
One addrspacecast disappeared in clang emitted IR for
block invoke function due to adoption of the new
addr space mapping.

Differential Revision: https://reviews.llvm.org/D43785


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326806 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-06 16:04:39 +00:00
Matt Arsenault
f246669c10 AMDGPU/GlobalISel: Add InstrMapping for G_EXTRACT
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326715 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-05 16:25:18 +00:00
Matt Arsenault
4e77263fcb AMDGPU/GlobalISel: Make some G_EXTRACTs legal
As far as I can tell legalization of weird sizes for the
output type isn't implemented.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326714 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-05 16:25:15 +00:00