15 Commits

Author SHA1 Message Date
Jay Foad
9c7f3a51fd [AMDGPU] gfx10 atomic optimizer changes.
Summary:
Add support for gfx10, where all DPP operations are confined to work
within a single row of 16 lanes, and wave32.

Reviewers: arsenm, sheredom, critson, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, dstuttard, tpr, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65644

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369745 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-23 10:07:43 +00:00
Jay Foad
25eb624f36 [DivergenceAnalysis] Add methods for querying divergence at use
Summary:
The existing isDivergent(Value) methods query whether a value is
divergent at its definition. However even if a value is uniform at its
definition, a use of it in another basic block can be divergent because
of divergent control flow between the def and the use.

This patch adds new isDivergent(Use) methods to DivergenceAnalysis,
LegacyDivergenceAnalysis and GPUDivergenceAnalysis.

This might allow D63953 or other similar workarounds to be removed.

Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue

Reviewed By: nhaehnle

Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65141

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@367218 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-29 10:22:09 +00:00
Jay Foad
e860308011 [AMDGPU] Save some work when an atomic op has no uses
Summary:
In the atomic optimizer, save doing a bunch of work and generating a
bunch of dead IR in the fairly common case where the result of an
atomic op (i.e. the value that was in memory before the atomic op was
performed) is not used. NFC.

Reviewers: arsenm, dstuttard, tpr

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64981

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366667 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-22 07:19:44 +00:00
Jay Foad
eeb3d74bfc [AMDGPU] Simplify the exclusive scan used for optimized atomics
Summary:
Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8,
16, 32) instead of starting off shifting by 1, 2 and 3 and then doing
a 3-way ADD, because:

1. It simplifies the compiler a little.
2. It minimizes vgpr pressure because each instruction is now of the
   form vn = vn + vn << c.
3. It is more friendly to the DPP combiner, which currently can't
   combine into an ADD3 instruction.

Because of #2 and #3 the end result is improved from this:

  v_add_u32_dpp v4, v3, v3  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
  v_mov_b32_dpp v5, v3  row_shr:2 row_mask:0xf bank_mask:0xf
  v_mov_b32_dpp v1, v3  row_shr:3 row_mask:0xf bank_mask:0xf
  v_add3_u32 v1, v4, v5, v1
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:4 row_mask:0xf bank_mask:0xe
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:8 row_mask:0xf bank_mask:0xc
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:15 row_mask:0xa bank_mask:0xf
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:31 row_mask:0xc bank_mask:0xf

To this:

  v_add_u32_dpp v1, v1, v1  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:4 row_mask:0xf bank_mask:0xe
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:8 row_mask:0xf bank_mask:0xc
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:15 row_mask:0xa bank_mask:0xf
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:31 row_mask:0xc bank_mask:0xf

I.e. two fewer computational instructions, one extra nop where we could
schedule something else.

Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64411

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366543 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-19 08:40:37 +00:00
Jay Foad
a647751288 [AMDGPU] Optimize atomic AND/OR/XOR
Summary: Extend the atomic optimizer to handle AND, OR and XOR.

Reviewers: arsenm, sheredom

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64809

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366323 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-17 13:40:03 +00:00
Jay Foad
7935774fce [AMDGPU] Optimize atomic max/min
Summary:
Extend the atomic optimizer to handle signed and unsigned max and min
operations, as well as add and subtract.

Reviewers: arsenm, sheredom, critson, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64328

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366235 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-16 17:44:54 +00:00
Jay Foad
f25416049c [AMDGPU] Use a named predicate instead of a magic number.
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64201

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365294 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-08 07:04:58 +00:00
Stanislav Mekhanoshin
6e4d9e6a89 [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32
Differential Revision: https://reviews.llvm.org/D63301

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363339 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-13 23:47:36 +00:00
Matt Arsenault
c06a0f6ec2 AMDGPU: Don't add unnecessary convergent attributes
These are redundant with the intrinsic declaration.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356143 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-14 13:46:09 +00:00
Carl Ritson
dfa4156319 [AMDGPU] Fix DPP operand order in atomic optimizer
Summary:
Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions.

Change-Id: I631d050e1c00a3b4bc7c11a90437064403c4cf30

Reviewers: sheredom, tpr

Reviewed By: sheredom

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58900

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355394 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-05 12:21:44 +00:00
Benjamin Kramer
b9408bae20 [AMDGPU] Remove unused variable
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353704 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-11 14:49:54 +00:00
Neil Henning
4fee3cff4d [AMDGPU] Fix DPP sequence in atomic optimizer.
This commit fixes the DPP sequence in the atomic optimizer (which was
previously missing the row_shr:3 step), and works around a read_register
exec bug by using a ballot instead.

Differential Revision: https://reviews.llvm.org/D57737

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353703 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-11 14:44:14 +00:00
Chandler Carruth
6b547686c5 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-19 08:50:56 +00:00
Neil Henning
5efc1996ce [AMDGPU] Fix the new atomic optimizer in pixel shaders.
The new atomic optimizer I previously added in D51969 did not work
correctly when a pixel shader was using derivatives, and had helper
lanes active.

To fix this we add an llvm.amdgcn.ps.live call that guards a branch
around the entire atomic operation - ensuring that all helper lanes are
inactive within the wavefront when we compute our atomic results.

I've added a test case that can cause derivatives, and exposes the
problem.

Differential Revision: https://reviews.llvm.org/D53930

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346128 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-05 12:04:48 +00:00
Neil Henning
b461f4de29 [AMDGPU] Add an AMDGPU specific atomic optimizer.
This commit adds a new IR level pass to the AMDGPU backend to perform
atomic optimizations. It works by:

- Running through a function and finding atomicrmw add/sub or uses of
  the atomic buffer intrinsics for add/sub.
- If all arguments except the value to be added/subtracted are uniform,
  record the value to be optimized.
- Run through the atomic operations we can optimize and, depending on
  whether the value is uniform/divergent use wavefront wide operations
  (DPP in the divergent case) to calculate the total amount to be
  atomically added/subtracted.
- Then let only a single lane of each wavefront perform the atomic
  operation, reducing the total number of atomic operations in flight.
- Lastly we recombine the result from the single lane to each lane of
  the wavefront, and calculate our individual lanes offset into the
  final result.

Differential Revision: https://reviews.llvm.org/D51969

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343973 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-08 15:49:19 +00:00