42 Commits

Author SHA1 Message Date
Matt Arsenault
dc55587b7f AMDGPU: Rename SI_RETURN
This is used for a specific type of return to a shader part's
epilog code. Rename to try avoiding confusion from a true
call's return.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298452 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-21 22:18:10 +00:00
Matt Arsenault
f06f68a796 AMDGPU: Don't wait at end of block with a trivial successor
If there is only one successor, and that successor only
has one predecessor the wait can obviously be delayed until
uses or the end of the next block. This avoids code quality
regressions when there are trivial fallthrough blocks inserted
for structurization.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297251 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-08 01:06:58 +00:00
Konstantin Zhuravlyov
017228cd76 [AMDGPU] Add target information that is required by tools to metadata
Differential Revision: https://reviews.llvm.org/D28760#fb670e28


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294449 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-08 14:05:23 +00:00
Eugene Zelenko
5751b552da [AMDGPU] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292688 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-21 00:53:49 +00:00
Jan Vesely
bf64cb107c AMDGPU/SI: Implement sendmsghalt intrinsic
v2: expose using amdgcn prefix

Differential Revision: https://reviews.llvm.org/D23511

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290977 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-04 18:06:55 +00:00
Matt Arsenault
350d0dab1e AMDGPU: Refactor exp instructions
Structure the definitions a bit more like the other classes.

The main change here is to split EXP with the done bit set
to a separate opcode, so we can set mayLoad = 1 so that it won't
be reordered before the other exp stores, since this has the special
constraint that if the done bit is set then this should be the last
exp in she shader.

Previously all exp instructions were inferred to have unmodeled
side effects.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288695 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-05 20:23:10 +00:00
Matt Arsenault
a89dd1d1c4 AMDGPU: Rename flat operands to match mubuf
Use vaddr/vdst for the same purposes.

This also fixes a beg in SIInsertWaits for the
operand check. The stored value operand is currently called
data0 in the single offset case, not data.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288188 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-29 19:30:44 +00:00
Marek Olsak
2a24827c23 AMDGPU/SI: Add back reverted SGPR spilling code, but disable it
suggested as a better solution by Matt

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287942 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-25 17:37:09 +00:00
Marek Olsak
82bcf466c2 Revert "AMDGPU: Implement SGPR spilling with scalar stores"
This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287936 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-25 16:03:34 +00:00
Matt Arsenault
4404d0d6e3 AMDGPU: Implement SGPR spilling with scalar stores
nThis avoids the nasty problems caused by using
memory instructions that read the exec mask while
spilling / restoring registers used for control flow
masking, but only for VI when these were added.

This always uses the scalar stores when enabled currently,
but it may be better to still try to spill to a VGPR
and use this on the fallback memory path.

The cache also needs to be flushed before wave termination
if a scalar store is used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286766 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-13 18:20:54 +00:00
Matt Arsenault
e5fd9c09ad AMDGPU: Preserve vcc undef flags when inverting branch
If the branch was on a read-undef of vcc, passes that used
analyzeBranch to invert the branch condition wouldn't preserve
the undef flag resulting in a verifier error.

Fixes verifier failures in a future commit.

Also fix verifier error when inserting copy for vccz
corruption bug.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286133 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 19:09:27 +00:00
Tom Stellard
b15bbca10c AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructions
Summary:
Flat instruction can return out of order, so we need always need to wait
for all the outstanding flat operations.

Reviewers: tony-tye, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D25998

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285479 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 23:53:48 +00:00
Konstantin Zhuravlyov
c7a23a58d8 [AMDGPU] Refactor waitcnt encoding
- Refactor bit packing/unpacking
- Calculate bit mask given bit shift and bit width
- Introduce function for decoding bits of waitcnt
- Introduce function for encoding bits of waitcnt
- Introduce function for getting waitcnt mask (instead of using bare numbers)
- Introduce function fot getting max waitcnt(s) (instead of using bare numbers)

Differential Revision: https://reviews.llvm.org/D25298


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283919 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-11 18:58:22 +00:00
Mehdi Amini
67f335d992 Use StringRef in Pass/PassManager APIs (NFC)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283004 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-01 02:56:57 +00:00
Konstantin Zhuravlyov
69560a642c [AMDGPU] Choose VMCNT, EXPCNT, LGKMCNT masks and shifts based on the isa version
Differential Revision: https://reviews.llvm.org/D24973


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282877 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-30 17:01:40 +00:00
Konstantin Zhuravlyov
07f122bad7 [AMDGPU] Ask subtarget if waitcnt instruction is needed before barrier instruction
Differential Revision: https://reviews.llvm.org/D24985


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282875 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-30 16:50:36 +00:00
Duncan P. N. Exon Smith
83b2ab7c4c AMDGPU: Remove implicit iterator conversions, NFC
Remove remaining implicit conversions from MachineInstrBundleIterator to
MachineInstr* from the AMDGPU backend.  In most cases, I made them less
attractive by preferring MachineInstr& or using a ranged-based for loop.

Once all the backends are fixed I'll make the operator explicit so that
this doesn't bitrot back.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274906 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-08 19:16:05 +00:00
Matt Arsenault
759ed7e410 AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273652 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-24 06:30:11 +00:00
Tom Stellard
8478bd5765 AMDGPU/SI: Use the hazard recognizer to break SMEM soft clauses
Summary:
Add support for detecting hazards in SMEM soft clauses, so that we only
break the clauses when necessary, either by adding s_nop or re-ordering
other alu instructions.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18870

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@268260 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-02 17:39:06 +00:00
Tom Stellard
b8a2cc5119 AMDGPU/SI: Use hazard recognizer to detect DPP hazards
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18603

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@268247 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-02 16:23:09 +00:00
Tom Stellard
df1aa5c25d AMDGPU/SI: Remove wait state handling for SMRD in SIInsertWaits
This was supposed to be part of r268143.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@268154 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-30 04:04:48 +00:00
Nicolai Haehnle
0493c734a2 AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsic
Summary:
So it appears that to guarantee some of the ordering requirements of a GLSL
memoryBarrier() executed in the shader, we need to emit an s_waitcnt.

(We can't use an s_barrier, because memoryBarrier() may appear anywhere in
the shader, in particular it may appear in non-uniform control flow.)

Reviewers: arsenm, mareko, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19203

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267729 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-27 15:46:01 +00:00
Tom Stellard
cb6c943dc2 AMDGPU/SI: Insert wait states required after v_readfirstlane on SI
Summary:
We will be able to handle this case much better once the hazard recognizer
is finished, but this conservative implementation  fixes a hang with the piglit
test:

spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra

Reviewers: arsenm, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18988

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266105 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-12 18:40:43 +00:00
Tom Stellard
c2d9280e43 AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStates
Summary: This makes it possible to insert nops at the end of blocks.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18549

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265678 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-07 14:47:07 +00:00
Tom Stellard
f53246799f AMDGPU/SI: Handle wait states required for DPP instructions
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17543

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263447 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-14 17:05:56 +00:00
Marek Olsak
01d3696081 AMDGPU/SI: Incomplete shader binaries need to finish execution at the end
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D18058

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263441 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-14 15:57:14 +00:00
Matt Arsenault
e407a109c1 AMDGPU: Simplify boolean conditional return statements
Patch by Richard Thomson

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262536 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-02 23:00:21 +00:00
Matt Arsenault
a0358fa131 AMDGPU: Fix bug 26659.
Fix checking the same instruction twice instead of the
second branch that uses vccz. I don't think this matters
currently because s_branch_vccnz is always used currently.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262457 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-02 04:12:39 +00:00
Tom Stellard
aced110517 AMDGPU/SI: Fix s_waitcnt insertion for flat instructions
Summary:
This was broken in r260694 which swapped the address and data operands
for flat store instructions.  The code in SIInsertWaits assumes
that the data operand always comes before the address operand, so
we need to add a special case for flat.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17366

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261330 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-19 15:33:13 +00:00
Tom Stellard
58abe5f3d2 AMDGPU/SI: Implement a work-around for smrd corrupting vccz bit
Summary:
We will hit this once we have enabled uniform branches.  The
smrd-vccz-bug.ll test will be added with the uniform branch commit.

Reviewers: mareko, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16725

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260137 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-08 19:49:20 +00:00
Tom Stellard
35cb73cd06 AMDGPU/SI: Correctly initialize SIInsertWaits pass
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16724

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259894 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-05 17:42:38 +00:00
Tom Stellard
641de45f36 AMDGPU: waitcnt operand fixes
Summary:
Allow lgkmcnt up to 0xF (hardware allows that).
Fix mask for ExpCnt in AMDGPUInstPrinter.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16314

Patch by: Nikolay Haustov

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259059 91177308-0d34-0410-b5e6-96231b3b80d8
2016-01-28 17:13:44 +00:00
Marek Olsak
8578c2b6fb AMDGPU/SI: Remove ending s_endpgm from non-void functions
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16035

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257623 91177308-0d34-0410-b5e6-96231b3b80d8
2016-01-13 17:23:12 +00:00
Marek Olsak
fe319de414 AMDGPU/SI: Add s_waitcnt at the end of non-void functions
Summary:
v2: Make ReturnsVoid private, so that I can another 8 lines of code and
    look more productive.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16034

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257622 91177308-0d34-0410-b5e6-96231b3b80d8
2016-01-13 17:23:09 +00:00
Matt Arsenault
d2643e2ff9 AMDGPU: Add MachineInstr overloads for instruction format tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250797 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-20 04:35:43 +00:00
Matt Arsenault
fb65d2c241 AMDGPU: Fix unused variable warning in release build
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249091 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-01 22:40:35 +00:00
Matt Arsenault
ae6db4bdd7 AMDGPU: Make SIInsertWaits about a factor of 4 faster
This was the slowest target custom pass and was spending 80%
of the time in getMinimalPhysRegClass which was called
for every register operand.

Try to use the statically known register class when possible from
the instruction's MCOperandInfo. There are a few pseudo instructions
which are not well behaved with unknown register classes which still
require the expensive physical register class search.

There are a few other possibilities for making this even faster,
such as not inspecting implicit operands. For now those are checked
because it is technically possible to have a scalar load into
exec or vcc which can be implicitly used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249079 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-01 21:43:15 +00:00
Matt Arsenault
7a6a7f2409 AMDGPU: Fix recomputing dominator tree unnecessarily
SIFixSGPRCopies does not modify the CFG, but this was
being recomputed before running SIFoldOperands.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248587 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-25 17:21:28 +00:00
Matt Arsenault
d0edb1f758 AMDGPU: Add s_dcache_* instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248533 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-24 19:52:27 +00:00
Tom Stellard
7d43ecc4d4 AMDGPU/SI: Better handle s_wait insertion
We can wait on either VM, EXP or LGKM.
The waits are independent.

Without this patch, a wait inserted because of one of them
would also wait for all the previous others.
This patch makes s_wait only wait for the ones we need for the next
instruction.

Here's an example of subtle perf reduction this patch solves:

This is without the patch:

buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
s_load_dwordx4 s[44:47], s[8:9], 0xc
s_waitcnt lgkmcnt(0)
buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
s_load_dwordx4 s[48:51], s[8:9], 0x10
s_waitcnt vmcnt(1)
buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen

The s_waitcnt vmcnt(1) is useless.
The reason it is added is because the last
buffer_load_format_xyzw needs s[44:47], which was issued
by the first s_load_dwordx4. It waits for all VM
before that call to have finished.

Internally after every instruction, 3 counters (for VM, EXP and LGTM)
are updated after every instruction. For example buffer_load_format_xyzw
will
increase the VM counter, and s_load_dwordx4 the LGKM one.

Without the patch, for every defined register,
the current 3 counters are stored, and are used to know
how long to wait when an instruction needs the register.

Because of that, the s[44:47] counter includes that to use the register
you need to wait for the previous buffer_load_format_xyzw.

Instead this patch stores only the counters that matter for the
register,
and puts zero for the other ones, since we don't need any wait for them.

Patch by: Axel Davy

Differential Revision: http://reviews.llvm.org/D11883

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245755 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-21 22:47:27 +00:00
Benjamin Kramer
d3c712e50b Fix some comment typos.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244402 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-08 18:27:36 +00:00
Tom Stellard
953c681473 R600 -> AMDGPU rename
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239657 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-13 03:28:10 +00:00