55 Commits

Author SHA1 Message Date
Carl Ritson
8cea59c941 AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are merged
Summary:
Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged.
This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds.
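
The failure mode can be illustrated with a minimal sketch. `ToyBrackets`, its single `UpperBound` field, and the 8-entry register file are hypothetical simplifications and not the pass's actual data structures; the point is only that the bound must be extended before scores are merged, otherwise registers used solely by the incoming bracket are skipped.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical, simplified bracket: one pending score per register plus the
// highest register index that currently has a pending score.
struct ToyBrackets {
  static constexpr int NumRegs = 8;
  std::array<uint32_t, NumRegs> Score{}; // 0 means "nothing pending"
  int UpperBound = -1;                   // -1 means "no register tracked"

  void setScore(int Reg, uint32_t S) {
    Score[Reg] = S;
    UpperBound = std::max(UpperBound, Reg);
  }

  // Merge Other into this bracket; returns true if anything changed.
  bool merge(const ToyBrackets &Other) {
    bool Changed = false;
    // Extend the bound first so the loop also covers registers that only
    // Other has touched. Omitting this line reproduces the described bug.
    UpperBound = std::max(UpperBound, Other.UpperBound);
    for (int R = 0; R <= UpperBound; ++R) {
      uint32_t Merged = std::max(Score[R], Other.Score[R]);
      Changed |= Merged != Score[R];
      Score[R] = Merged;
    }
    return Changed;
  }

  bool isPending(int Reg) const {
    return Reg <= UpperBound && Score[Reg] != 0;
  }
};
```

With the bound extension in place, a bracket that only tracked register 1 still sees a pending score on register 6 after merging a bracket that tracked it.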

Irreducible loop test has been extended based on a CTS failure detected for GFX9.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D55602

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349611 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-19 10:17:49 +00:00
Nicolai Haehnle
41f457ff4f AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo
Summary:
MachineLoopInfo cannot be relied on for correctness, because it cannot
properly recognize loops in irreducible control flow which can be
introduced by late machine basic block optimization passes. See the new
test case for the reduced form of an example that occurred in practice.

Use a simple fixpoint iteration instead.
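
As a rough illustration of the idea (not the code in this patch), a fixpoint iteration needs no loop structure at all: it re-propagates state along CFG edges until nothing changes. `ToyBlock`, the `Gen`/`Kill` bitmasks, and `solvePending` are names invented for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical block summary: events the block starts (Gen) and events it
// waits for (Kill), each as a bitmask, plus successor block indices.
struct ToyBlock {
  uint32_t Gen = 0;
  uint32_t Kill = 0;
  std::vector<int> Succs;
};

// Computes, for every block, the set of events that may be pending on entry.
// The merge is a bitwise OR, so the state only grows and the loop terminates.
std::vector<uint32_t> solvePending(const std::vector<ToyBlock> &Blocks) {
  std::vector<uint32_t> In(Blocks.size(), 0);
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t B = 0; B < Blocks.size(); ++B) {
      uint32_t Out = (In[B] & ~Blocks[B].Kill) | Blocks[B].Gen;
      for (int S : Blocks[B].Succs) {
        uint32_t Merged = In[S] | Out;
        if (Merged != In[S]) {
          In[S] = Merged;
          Changed = true; // a successor changed, so run another sweep
        }
      }
    }
  }
  return In;
}
```

Because the sweep only follows successor edges, it handles a two-entry cycle (blocks 1 and 2 both reachable from block 0 and from each other) the same way as a natural loop.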

In order to facilitate this change, refactor WaitcntBrackets so that it
only tracks pending events and registers, rather than also maintaining
state that is relevant for the high-level algorithm. Various accessor
methods can be removed or made private as a consequence.

Affects (in radv):
- dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex}

Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU")

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54231

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347853 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 11:06:26 +00:00
Nicolai Haehnle
9da2d56db9 AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points
Summary:
There is one obsolete reference to using -1 as an indication of "unknown",
but this isn't actually used anywhere.

Using unsigned makes robust wrapping checks easier.
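
A small sketch of why this helps, assuming scores are monotonically increasing time points with a lower bound `LB` (completed) and an upper bound `UB` (issued). The helper names are hypothetical. Unsigned subtraction is defined modulo 2^32, so the checks stay correct even after the counter wraps.

```cpp
#include <cstdint>

// Number of events that are still outstanding.
inline uint32_t pendingCount(uint32_t LB, uint32_t UB) { return UB - LB; }

// True if Score lies in the half-open interval (LB, UB], including the case
// where UB has wrapped around past zero and is numerically smaller than LB.
inline bool isPending(uint32_t Score, uint32_t LB, uint32_t UB) {
  return Score - LB - 1 < UB - LB;
}
```

With signed scores the same comparisons would involve overflow, which is undefined behavior in C++.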

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, llvm-commits, tpr, t-tye, hakzsam

Differential Revision: https://reviews.llvm.org/D54230

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347852 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 11:06:21 +00:00
Nicolai Haehnle
608f48308a AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning
Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54229

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347851 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 11:06:18 +00:00
Nicolai Haehnle
219b87abb4 AMDGPU/InsertWaitcnts: Simplify pending events tracking
Summary:
Instead of storing the "score" (last time point) of the various relevant
events, only store whether an event is pending or not.

This is sufficient, because whenever only one event of a count type is
pending, its last time point is naturally the upper bound of all time
points of this count type, and when multiple event types are pending,
the count type has gone out of order and an s_waitcnt to 0 is required
to clear any pending event type (and will then clear all pending event
types for that count type).
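
The reasoning above can be sketched as follows. `ToyCounter` and the `TOY_*` event names are made up for illustration and only model a single count type; they are not the types used by the pass.

```cpp
#include <cstdint>

// Hypothetical event types that share one hardware counter.
enum ToyEvent : uint32_t {
  TOY_SMEM = 1u << 0,
  TOY_LDS = 1u << 1,
  TOY_GDS = 1u << 2,
};

struct ToyCounter {
  uint32_t Pending = 0;    // bitmask of event types currently in flight
  uint32_t UpperBound = 0; // score (time point) of the most recent event

  // Records an event and returns the score assigned to it.
  uint32_t record(ToyEvent E) {
    Pending |= E;
    return ++UpperBound;
  }

  // More than one bit set means the counter may decrement out of order.
  bool hasMixedEvents() const { return (Pending & (Pending - 1)) != 0; }

  // Count to wait for so that the event with the given score has completed.
  uint32_t waitFor(uint32_t Score) const {
    return hasMixedEvents() ? 0 : UpperBound - Score;
  }

  // A wait to 0 clears every pending event type of this count type.
  void waitToZero() { Pending = 0; }
};
```

While only `TOY_LDS` events are pending, waiting for the first of two events needs a count of 1; as soon as a `TOY_SMEM` event joins them, the only safe answer is 0.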

This also removes the special handling of GDS_GPR_LOCK and EXP_GPR_LOCK.
I do not understand what this special handling ever attempted to achieve.
It has existed ever since the original port from an internal code base,
so my best guess is that it solved a problem related to EXEC handling in
that internal code base.

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54228

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347850 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 11:06:14 +00:00
Nicolai Haehnle
cd71b9eae5 AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types
Summary:
It hides the type casting ugliness, and I happened to have to add a new
such loop (in a later patch).

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54227

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347849 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 11:06:11 +00:00
Nicolai Haehnle
f2ec2633c5 AMDGPU/InsertWaitcnts: Untangle some semi-global state
Summary:
Reduce the statefulness of the algorithm in two ways:

1. More clearly split generateWaitcntInstBefore into two phases: the
   first one which determines the required wait, if any, without changing
   the ScoreBrackets, and the second one which actually inserts the wait
   and updates the brackets.

2. Communicate pre-existing s_waitcnt instructions using an argument to
   generateWaitcntInstBefore instead of through the ScoreBrackets.

To simplify these changes, a Waitcnt structure is introduced which carries
the counts of an s_waitcnt instruction in decoded form.
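
A decoded wait-count structure might look roughly like the sketch below. `ToyWaitcnt`, its field names, and the use of `~0u` for "no wait needed" are assumptions made for this example; the real structure and the ISA-specific encoding are not reproduced here.

```cpp
#include <algorithm>
#include <cstdint>

// Toy decoded form of an s_waitcnt. A field of ~0u means "no wait needed".
struct ToyWaitcnt {
  uint32_t VmCnt = ~0u;
  uint32_t ExpCnt = ~0u;
  uint32_t LgkmCnt = ~0u;

  bool hasWait() const {
    return VmCnt != ~0u || ExpCnt != ~0u || LgkmCnt != ~0u;
  }

  // The stricter of two waits: a smaller count waits for more operations.
  ToyWaitcnt combined(const ToyWaitcnt &Other) const {
    ToyWaitcnt R;
    R.VmCnt = std::min(VmCnt, Other.VmCnt);
    R.ExpCnt = std::min(ExpCnt, Other.ExpCnt);
    R.LgkmCnt = std::min(LgkmCnt, Other.LgkmCnt);
    return R;
  }
};
```

Passing such a value as an argument lets the "what wait is required" phase stay free of side effects, and a pre-existing wait can be folded in with `combined` before anything is inserted.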

There are some functional changes:

1. The FIXME for the VCCZ bug workaround was implemented: we only wait for
   SMEM instructions as required instead of waiting on all counters.

2. We now properly track pre-existing waitcnt's in all cases, which leads
   to less conservative waitcnts being emitted in some cases.

     s_load_dword ...
     s_waitcnt lgkmcnt(0)    <-- pre-existing wait count
     ds_read_b32 v0, ...
     ds_read_b32 v1, ...
     s_waitcnt lgkmcnt(0)    <-- this is too conservative
     use(v0)
     more code
     use(v1)

   This increases code size a bit, but the reduced latency should still be a
   win in basically all cases. The worst code size regressions in my shader-db
   are:

 WORST REGRESSIONS - Code Size
 Before After     Delta Percentage
   1724  1736        12    0.70 %   shaders/private/f1-2015/1334.shader_test [0]
   2276  2284         8    0.35 %   shaders/private/f1-2015/1306.shader_test [0]
   4632  4640         8    0.17 %   shaders/private/ue4_elemental/62.shader_test [0]
   2376  2384         8    0.34 %   shaders/private/f1-2015/1308.shader_test [0]
   3284  3292         8    0.24 %   shaders/private/talos_principle/1955.shader_test [0]

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54226

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347848 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 11:06:06 +00:00
Nicolai Haehnle
06e2a9cc77 AMDGPU/InsertWaitcnts: Some more const-correctness
Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54225

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347192 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-19 12:03:11 +00:00
Nicolai Haehnle
d3698200d1 AMDGPU/InsertWaitcnts: Cleanup some old cruft (NFCI)
Summary: Remove redundant logic and simplify control flow.

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54086

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346363 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 21:53:36 +00:00
Nicolai Haehnle
6747ae50ad AMDGPU/InsertWaitcnts: Remove kill-related logic
Summary:
This is not needed, because we don't actually insert relevant branches
for KILLs that late in the compilation flow.

Besides, this was always checking for the wrong kill opcode anyway...

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54085

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346362 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 21:53:29 +00:00
Konstantin Zhuravlyov
4d82ce5c27 AMDGPU: Re-apply r341982 after fixing the layering issue
Move isa version determination into TargetParser.

Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342069 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-12 18:50:47 +00:00
Ilya Biryukov
867f48781f Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser."
This reverts commit r341982.

The change introduced a layering violation. Reverting to unbreak
our integrate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342023 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-12 07:05:30 +00:00
Konstantin Zhuravlyov
b479681381 AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination
into TargetParser.

Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).

Differential Revision: https://reviews.llvm.org/D51890



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341982 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-11 18:56:51 +00:00
Matt Arsenault
7e212e4168 AMDGPU: Remove remnants of old address space mapping
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341165 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-31 05:49:54 +00:00
Mark Searles
a720958169 [AMDGPU][Waitcnt] Re-apply fix for "comparison of integers of different signs" build error
Re-apply "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error"
( fe0a456510131f268e388c4a18a92f575c0db183 ), which was inadvertently reverted via
2b2ee080f0164485562593b1b87291a48cea4a9a .

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337156 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-16 10:21:36 +00:00
Mark Searles
356f9421a1 Revert "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error"
This reverts commit fe0a456510131f268e388c4a18a92f575c0db183.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337153 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-16 10:02:40 +00:00
Tom Stellard
1d6fd076a3 AMDGPU: Refactor Subtarget classes
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
  AMDGPUSubtarget::Generation.

Reviewers: arsenm, jvesely

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D49037

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336851 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-11 20:59:01 +00:00
Mark Searles
28793aedf6 [AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error
Build error on Android; reported and fixed (thanks) by Mauro Rossi <issor.oruam@gmail.com>

Fixes the following build error:

external/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1903:61:
error: comparison of integers of different signs:
'typename iterator_traits<__wrap_iter<MachineBasicBlock **> >::difference_type'
(aka 'int') and 'unsigned int' [-Werror,-Wsign-compare]
                      BlockWaitcntProcessedSet.end(), &MBB) < Count)) {
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~
1 error generated.
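
For context, `std::count` returns the iterator's signed `difference_type`, so comparing its result against an `unsigned` trips `-Wsign-compare`. The sketch below shows one common way to make such a comparison well-typed; it is an illustration with a hypothetical helper name, not necessarily what this patch does.

```cpp
#include <algorithm>
#include <vector>

// Returns true if Block occurs fewer than Limit times in Visited.
bool seenFewerThan(const std::vector<int> &Visited, int Block,
                   unsigned Limit) {
  // std::count never returns a negative value, so the cast is lossless here.
  auto N = std::count(Visited.begin(), Visited.end(), Block);
  return static_cast<unsigned>(N) < Limit;
}
```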

Differential Revision: https://reviews.llvm.org/D49089

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336588 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-09 19:28:14 +00:00
Tom Stellard
cba2181e77 AMDGPU: Separate R600 and GCN TableGen files
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc.  This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself.  This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46365

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335942 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-28 23:47:12 +00:00
Konstantin Zhuravlyov
784d2a8499 AMDGPU: Silence unused warnings in waitcnt insertion pass in release build
Differential Revision: https://reviews.llvm.org/D48607


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335669 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-26 21:33:38 +00:00
Scott Linder
cbeaa8b0e2 [AMDGPU] Fix bug with tracking processed blocks in SIInsertWaitcnts
BlockWaitcntProcessedSet was not being cleared between calls, so it was
producing incorrect counts in cases where MBB addresses happened to coincide
across multiple calls.

Differential Revision: https://reviews.llvm.org/D48391



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335268 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 18:48:48 +00:00
Mark Searles
8177aafa74 [AMDGPU][Waitcnt] Fix handling of flat instrs
On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order.

Differential Revision: https://reviews.llvm.org/D46616

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333926 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-04 16:51:59 +00:00
Mark Searles
b7ba560cf9 [AMDGPU][Waitcnt] Fix build error: unused variable 'SWaitInst'
https://reviews.llvm.org/rL333556 caused a buildbot failure.

See http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/21876/steps/build_Lld/logs/stdio

/Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:2007:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable]
    auto SWaitInst = BuildMI(EntryBB, EntryBB.getFirstNonPHI(),

The unused variable was for debugging purposes; removing that piece of code
to fix the build.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333559 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-30 16:27:57 +00:00
Mark Searles
b81ef9cebe [AMDGPU][Waitcnt] Fix handling of loops with many bottom blocks
The waitcnt pass forces convergence for a loop when deciding whether waitcnts
need to be inserted. Previously, the forced convergence kicked in after more
than 2 passes over a loop, which doesn't account for loops with many bottom
blocks. So, increase the threshold to (n+1), where n is the number of bottom
blocks. This gives the pass an opportunity to consider the contribution of each
bottom block to the overall loop before the forced convergence potentially
kicks in.

Differential Revision: https://reviews.llvm.org/D47488

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333556 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-30 15:47:45 +00:00
Nicola Zaghen
0818e789cb Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually change DOCS as the regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332240 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-14 12:53:11 +00:00
Shiva Chen
24abe71d71 [DebugInfo] Examine all uses of isDebugValue() for debug instructions.
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check whether a MachineInstr
is a debug instruction. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I created a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is a
debug instruction or not.

This patch has no new test case. I have run the regression tests and there
is no difference in the results.

Differential Revision: https://reviews.llvm.org/D45342

Patch by Hsiangkai Wang.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331844 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 02:42:00 +00:00
Mark Searles
ac95aa5361 [AMDGPU][Waitcnt] Remove the old waitcnt pass
Remove the old waitcnt pass ( si-insert-waits ), which is no longer maintained
and is getting crufty.

Differential Revision: https://reviews.llvm.org/D46448

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331641 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-07 14:43:28 +00:00
Adrian Prantl
26b584c691 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers in our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331272 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-01 15:54:18 +00:00
Mark Searles
3d681a99a4 [AMDGPU][Waitcnt] As of gfx7, VMEM operations do not increment the export counter and the input registers are available in the next instruction; update the waitcnt pass to take this into account.
Differential Revision: https://reviews.llvm.org/D46067

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330954 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-26 16:11:19 +00:00
Mark Searles
b922196f7a [AMDGPU] Waitcnt pass: add debug options
- Add "amdgpu-waitcnt-forcezero" to force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

- Add debug counters to control force emit of s_waitcnt instrs; debug counters:
si-insert-waitcnts-forceexp: force emit s_waitcnt expcnt(0) instrs
si-insert-waitcnts-forcelgkm: force emit s_waitcnt lgkmcnt(0) instrs
si-insert-waitcnts-forcevm: force emit s_waitcnt vmcnt(0) instrs

- Add some debug statements

Note that a variant of this patch was previously committed/reverted.

Differential Revision: https://reviews.llvm.org/D45888

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330862 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-25 19:21:26 +00:00
Mark Searles
fc4e13c88f [AMDGPU][Waitcnt] NFC. Cleanup some code/naming consistency:
- s/SWaitcnt/Waitcnt s/WaitCnt/Waitcnt

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330730 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-24 15:59:59 +00:00
Mark Searles
4f520606fc [AMDGPU] Do not only rely on BB number when finding bottom loop
We should also check that the "bottom" basic block of a loop is a successor of the "header" basic block, otherwise we don't propagate the information correctly when the CFG is complex. This fixes an important rendering problem with Wolfenstein 2, where one vector-memory wait was missing.

Differential Revision: https://reviews.llvm.org/D43831

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330337 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-19 15:42:30 +00:00
Mark Searles
b30a83dec3 [AMDGPU] Waitcnt pass: Modify the waitcnt pass to propagate info in the case of a single basic block loop. mergeInputScoreBrackets() does this for us; update it so that it processes the single bb's score bracket when processing the single bb's preds. It is, after all, a pred of itself, so its score bracket is needed.
Differential Revision: https://reviews.llvm.org/D44434

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327583 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-14 22:04:32 +00:00
Mark Searles
04c7451aa8 [AMDGPU] Make note of existing waitcnt instrs; this is add-on work related to suppression of redundant waitcnt instrs. It is necessary to make note of these existing waitcnt instrs so that we do not fall into an infinite loop when handling loops. Also, [NFC] some minor code clean-up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325524 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-19 19:19:59 +00:00
Stanislav Mekhanoshin
ee1ab18539 [AMDGPU] Combine adjacent waitcounts in a single strongest wait
Differential Revision: https://reviews.llvm.org/D43350

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325299 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-15 22:03:55 +00:00
Stanislav Mekhanoshin
84f9e04139 [AMDGPU] Fixed wait count reuse
The code reusing existing wait counts is incorrect since it keeps
adding new operands to an old instruction instead of replacing
the immediate. It was also effectively switched off by the condition
that wait count is not an AMDGPU::S_WAITCNT.

Also switched to BuildMI instead of creating instructions directly.

Differential Revision: https://reviews.llvm.org/D42997

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@324547 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-08 00:18:35 +00:00
Mark Searles
4fe7c58adf [AMDGPU] Suppress redundant waitcnt instrs.
1. Run the memory legalizer prior to the waitcnt pass; keep the policy that the waitcnt pass does not remove any waitcnts within the incoming IR.

2. The waitcnt pass doesn't (yet) track waitcnts that exist prior to the waitcnt pass (it just skips over them); because the waitcnt pass is ignorant of them, it may insert a redundant waitcnt. To avoid this, check the previous instr. If it and the to-be-inserted waitcnt are the same, suppress the insertion. We keep the existing waitcnt under the assumption that whoever inserted it, e.g., the memory legalizer, knew what they were doing.

3. Follow-on work: teach the waitcnt pass to record the pre-existing waitcnts for better waitcnt production.

Differential Revision: https://reviews.llvm.org/D42854

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@324440 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-07 02:21:21 +00:00
Mark Searles
505994887d [AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output."
Patch caused a buildbot failure; arg; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/17373/steps/build_Lld/logs/stdio :
        /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1563:18: error: unused variable 'InstCnt' [-Werror,-Wunused-variable]
          static int32_t InstCnt = 0;
This reverts commit 4f4a7d61e306b67044d9f16bc2016fee806bc2cc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323791 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-30 17:17:06 +00:00
Mark Searles
5b32b73115 [AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output.
-amdgpu-waitcnt-forcezero={1|0}  Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-amdgpu-waitcnt-forceexp=<n>  Force emit a s_waitcnt expcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcevm=<n>   Force emit a s_waitcnt vmcnt(0) before the first <n> instrs

This patch was pushed ( abb190fd51cd2f9a9eef08c024e109f7f7e909fc ), which caused a buildbot failure, reverted ( 6227480d74da507cf8e1b4bcaffbdb9fb875b4b8 ), and then updated to fix buildbot failures (this patch).

Differential Revision: https://reviews.llvm.org/D40091

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323788 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-30 16:49:38 +00:00
Hiroshi Inoue
d1b456b6d1 [NFC] fix trivial typos in comments and documents
"to to" -> "to"



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323628 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-29 05:17:03 +00:00
Mark Searles
82e0652b95 [AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output."
Patch caused a buildbot failure; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/15733/steps/build_Lld/logs/stdio :
lib/Target/AMDGPU/SIInsertWaitcnts.cpp:396:11: error: private field 'InstCnt' is not used [-Werror,-Wunused-private-field]
  int32_t InstCnt = 0;
          ^
1 error generated.
This reverts commit 71627f7901.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320086 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-07 21:14:41 +00:00
Mark Searles
71627f7901 [AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output.
-amdgpu-waitcnt-forcezero={1|0}  Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-amdgpu-waitcnt-forceexp=<n>  Force emit a s_waitcnt expcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcevm=<n>   Force emit a s_waitcnt vmcnt(0) before the first <n> instrs

Differential Revision: https://reviews.llvm.org/D40091

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320084 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-07 20:36:39 +00:00
Tim Corringham
1e6aa1171d AMDGPU: fix missing s_waitcnt
Summary:
The pass that inserts s_waitcnt instructions where needed propagated
info used to track dependencies for each block by iterating over the
predecessor blocks. The iteration was terminated when a predecessor
that had not yet been processed was encountered. Any info in blocks
later in the list was therefore not processed, leading to the
possibility of a required s_waitcnt not being inserted.

The fix is simply to change the "break" to "continue" for the
relevant loops, so that all visited blocks are processed. This
is likely what was intended when the code was written.
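
The difference between the two keywords can be shown with a toy predecessor merge. `ToyPred` and `mergePreds` are hypothetical names and the merge is reduced to a maximum over scores; this is a sketch of the control-flow change, not the pass's code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical predecessor summary.
struct ToyPred {
  bool Processed;        // has this predecessor been visited yet?
  uint32_t PendingScore; // dependency info it would contribute
};

// Merges the info of all processed predecessors.
uint32_t mergePreds(const std::vector<ToyPred> &Preds) {
  uint32_t Max = 0;
  for (const ToyPred &P : Preds) {
    if (!P.Processed)
      continue; // a `break` here would also drop every later predecessor
    Max = std::max(Max, P.PendingScore);
  }
  return Max;
}
```

With predecessors `{processed, 1}`, `{unprocessed, 9}`, `{processed, 4}`, the `continue` version still picks up the third entry, whereas a `break` would stop at the second and report only 1.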

There is no test case provided for this fix because:
1) the only example that reproduces this is large and resistant to
being reduced
2) the change is trivial

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D40544

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319651 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-04 12:30:49 +00:00
Matt Arsenault
9f8c0170e6 AMDGPU: Move hazard avoidance out of waitcnt pass.
This is mostly moving VMEM clause breaking into
the hazard recognizer. Also move another hazard
currently handled in the waitcnt pass.

Also stops breaking clauses unless xnack is enabled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318557 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-17 21:35:32 +00:00
NAKAMURA Takumi
598658d792 Fix warnings discovered by rL317076. [-Wunused-private-field]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317091 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-01 13:47:55 +00:00
Evgeny Mankov
cb139f4145 [AMDGPU] NFC: test commit
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@311019 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-16 16:47:29 +00:00
Eugene Zelenko
5ca94f31ee [AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@310328 91177308-0d34-0410-b5e6-96231b3b80d8
2017-08-08 00:47:13 +00:00
Matt Arsenault
3ff37decad AMDGPU: Partially fix improper reliance on memoperands
There are 2 more places doing this, but I'm not sure
what they are doing and they don't make any sense to me.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308770 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-21 18:54:54 +00:00
Matt Arsenault
ecba33a1f4 AMDGPU: Don't track lgkmcnt for global_/scratch_ instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@308766 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-21 18:34:51 +00:00
Mark Searles
cc713d1bdd [AMDGPU] Fix uninit'ed var (RevisitLoop)
Differential Revision: https://reviews.llvm.org/D33907

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304729 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-05 19:29:01 +00:00