Commit Graph

307 Commits

Author SHA1 Message Date
Matt Arsenault
87d1190761 DAGCombiner: Relax alignment restriction when changing store type
If the target allows the alignment, this should be OK.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267217 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-22 21:01:41 +00:00
Matt Arsenault
625291533e DAGCombiner: Relax alignment restriction when changing load type
If the target allows the alignment, this should still be OK.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267209 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-22 20:21:36 +00:00
Konstantin Zhuravlyov
1a459df239 [AMDGPU] Insert nop pass: take care of outstanding feedback
- Switch few loops to range-based for loops
- Fix nop insertion at the end of BB
- Fix formatting
- Check for endpgm

Differential Revision: http://reviews.llvm.org/D19380


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267167 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-22 17:04:51 +00:00
Nicolai Haehnle
3441786e27 AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic
Summary:
This intrinsic returns true if the current thread belongs to a live pixel
and false if it belongs to a pixel that we are executing only for derivative
computation. It will be used by Mesa to implement gl_HelperInvocation.

Note that for pixels that are killed during the shader, this implementation
also returns true, but it doesn't matter because those pixels are always
disabled in the EXEC mask.

This unearthed a corner case in the instruction verifier, which complained
about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but
correct code, so make the verifier accept it as such.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19191

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267102 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-22 04:04:08 +00:00
Matt Arsenault
88c88b2b19 DAGCombiner: Reduce 64-bit BFE pattern to pattern on 32-bit component
If the extracted bits are restricted to the upper half or lower half,
this can be truncated.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267024 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-21 18:03:06 +00:00
Mandeep Singh Grang
2599294457 [LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC.
Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests.

Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier

Subscribers: mcrosier, dsanders

Differential Revision: http://reviews.llvm.org/D19279

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266834 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-19 23:51:52 +00:00
Nicolai Haehnle
318d6a2351 Add IntrWrite[Arg]Mem intrinsic property
Summary:
This property is used to mark an intrinsic that only writes to memory, but
neither reads from memory nor has other side effects.

An example where this is useful is the llvm.amdgcn.buffer.store.format.*
intrinsic, which corresponds to a store instruction that goes through a special
buffer descriptor rather than through a plain pointer.

With this property, the intrinsic should still be handled as having side
effects at the LLVM IR level, but machine scheduling can make smarter
decisions.

Reviewers: tstellarAMD, arsenm, joker.eph, reames

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18291

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266826 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-19 21:58:33 +00:00
Nicolai Haehnle
fea41fb59c AMDGPU: Guard VOPC instructions against incorrect commute
Summary:
The added testcase, which triggered this, was derived from a shader-db case
via bugpoint. A separate question is why scalar branching wasn't used.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19208

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266825 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-19 21:58:22 +00:00
Konstantin Zhuravlyov
5d42fbaf4c [AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt
Also,
- Skip pass if machine module does not have debug info
- Minor comment changes
- Added test

Differential Revision: http://reviews.llvm.org/D19079


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266626 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-18 16:28:23 +00:00
Matt Arsenault
d24550703e AMDGPU: Enable LocalStackSlotAllocation pass
This resolves more frame indexes early and folds
the immediate offsets into the scratch mubuf instructions.

This cleans up a lot of the mess that's currently emitted,
such as emitting add 0s and repeatedly initializing the same
register to 0 when spilling.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266508 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-16 02:13:37 +00:00
Matt Arsenault
992b34c001 AMDGPU: Use s_addk_i32 / s_mulk_i32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266506 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-16 01:46:49 +00:00
Adrian Prantl
4eeaa0da04 [PR27284] Reverse the ownership between DICompileUnit and DISubprogram.
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.

Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.

Motivation
----------

Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.

We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.

Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.

http://reviews.llvm.org/D19034
<rdar://problem/25256815>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266446 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-15 15:57:41 +00:00
Nicolai Haehnle
4c4aae4a77 AMDGPU/SI: Fix regression with no-return atomics
Summary:
In the added test-case, the atomic instruction feeds into a non-machine
CopyToReg node which hasn't been selected yet, so guard against
non-machine opcodes here.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19043

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266433 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-15 14:42:36 +00:00
Matt Arsenault
09c4262a3c AMDGPU: Include LDS size in printed comment
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266382 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-14 22:11:51 +00:00
Matt Arsenault
6d955a1d6d AMDGPU: Run SIFoldOperands after PeepholeOptimizer
PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.

shader-db stats:

Totals:
SGPRS: 34200 -> 34336 (0.40 %)
VGPRS: 22118 -> 21655 (-2.09 %)
Code Size: 632144 -> 633460 (0.21 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 10240 -> 11264 (10.00 %) bytes per wave
Max Waves: 8822 -> 8918 (1.09 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 7704 -> 7840 (1.77 %)
VGPRS: 5169 -> 4706 (-8.96 %)
Code Size: 234444 -> 235760 (0.56 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 0 -> 1024 (0.00 %) bytes per wave
Max Waves: 1188 -> 1284 (8.08 %)
Wait states: 0 -> 0 (0.00 %)

Increases:
SGPRS: 35 (0.01 %)
VGPRS: 1 (0.00 %)
Code Size: 59 (0.02 %)
LDS: 0 (0.00 %)
Scratch: 1 (0.00 %)
Max Waves: 48 (0.02 %)
Wait states: 0 (0.00 %)

Decreases:
SGPRS: 26 (0.01 %)
VGPRS: 54 (0.02 %)
Code Size: 68 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Max Waves: 4 (0.00 %)
Wait states: 0 (0.00 %)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266378 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-14 21:58:24 +00:00
Matt Arsenault
38b22579e0 AMDGPU: Fold bitcasts of scalar constants to vectors
This cleans up some messes since the individual scalar components
can be CSEed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266376 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-14 21:58:07 +00:00
Tom Stellard
65b55414cc AMDGPU: Add skeleton GlobalIsel implementation
Summary:
This adds the necessary target code to be able to run the ir translator.
Lowering function arguments and returns is a nop and there is no support
for RegBankSelect.

Reviewers: arsenm, qcolombet

Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits

Differential Revision: http://reviews.llvm.org/D19077

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266356 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-14 19:09:28 +00:00
Nicolai Haehnle
a16ecd4300 [DivergenceAnalysis] Treat PHI with incoming undef as constant
Summary:
If a PHI has an incoming undef, we can pretend that it is equal to one
non-undef, non-self incoming value.

This is particularly relevant in combination with the StructurizeCFG
pass, which introduces PHI nodes with undefs. Previously, this lead to
branch conditions that were uniform before StructurizeCFG to become
non-uniform afterwards, which confused the SIAnnotateControlFlow
pass.

This fixes a crash when Mesa radeonsi compiles a shader from
dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex

Reviewers: arsenm, tstellarAMD, jingyue

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19013

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266347 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-14 17:42:47 +00:00
Nicolai Haehnle
d07340545e AMDGPU: Remove SIFixSGPRLiveRanges pass
Summary:
This pass is unnecessary and overly conservative. It was motivated by
situations like

  def %vreg0:SGPR_32
  ...
if-block:
  ..
  def %vreg1:SGPR_32
  ...
else-block:
  ...
  use %vreg0:SGPR_32
  ...

and similar situations with uses after the non-uniform control flow, where
we are not allowed to assign %vreg0 and %vreg1 to the same physical register,
even though in the original, thread/workitem-based CFG, it looks like the
live ranges of these registers do not overlap.

However, by the time register allocation runs, we have moved to a wave-based
CFG that accurately represents the fact that the wave may run through both
the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already
overlap even without the SIFixSGPRLiveRanges pass.

In addition to proving this change correct, I have tested it with Piglit
and a small number of other tests.

Reviewers: arsenm, tstellarAMD

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19041

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266345 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-14 17:42:29 +00:00
Tom Stellard
510a2b9622 AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD.

This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions.

Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.

Reviewers: mareko, arsenm, tstellarAMD, nhaehnle

Subscribers: FireBurn, kerberizer, llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D18340

Patch By: Bas Nieuwenhuizen

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266337 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-14 16:27:07 +00:00
Tom Stellard
d5f7d71de6 AMDGPU/SI: Use the correct scratch wave offset register for shaders.
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders though,

The register should be the next SGPR after all user and system SGPR's.
We use that Mesa adds arguments for all input and system SGPR's and
take the next available SGPR for the scratch wave offset register.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Reviewers: mareko, arsenm, nhaehnle, tstellarAMD

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18941

Patch By: Bas Nieuwenhuizen

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266336 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-14 16:27:03 +00:00
Matt Arsenault
f5cccc3f63 AMDGPU: Implement canonicalize
Also add generic DAG node for it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266272 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-14 01:42:16 +00:00
Artem Tamazov
075abcbd06 [AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA)git status
Tests added along with implemented feature.
Note that there is a small leftover of unecessary MI sheduling issue
(more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix
the false regression.

TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.

Differential Revision: http://reviews.llvm.org/D17825

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266205 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-13 16:18:41 +00:00
Matt Arsenault
f294b0931f AMDGPU: Add test for m0 initialization in basic loop
Initialization of m0 is emitted for each LDS operation, so
every block with LDS usage ends up with one. MachineLICM
used to fail to hoist this out of the loop, so every loop
iteration with LDS usage in it would re-initialize it.

This seems to be fixed now, so add a test to make sure that
it stays this way.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266156 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-13 00:39:52 +00:00
Nicolai Haehnle
756309c45b AMDGPU: add llvm.amdgcn.buffer.load/store intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like
llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and
atomic counters at least when robust buffer access behavior is desired.
(These instructions perform no format conversion and do buffer range checking
per component.)

As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format,
it has become trivial to add support for the f32 and v2f32 variants of that
intrinsic, so the patch does so.

Also DAG-ify (and fix) some tests that I noticed intermittent failures in
while developing this patch.

Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects
changes to the BUFFER_STORE_DWORD* instructions. See also
http://reviews.llvm.org/D18291.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18292

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266126 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-12 21:18:10 +00:00
Tom Stellard
cb6c943dc2 AMDGPU/SI: Insert wait states required after v_readfirstlane on SI
Summary:
We will be able to handle this case much better once the hazard recognizer
is finished, but this conservative implementation  fixes a hang with the piglit
test:

spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra

Reviewers: arsenm, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18988

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266105 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-12 18:40:43 +00:00
Matt Arsenault
87f61332d1 AMDGPU: Eliminate half of i64 or if one operand is zero_extend from i32
This helps clean up some of the mess when expanding unaligned 64-bit
loads when changed to be promote to v2i32, and fixes situations
where or x, 0 was emitted after splitting 64-bit ors during moveToVALU.

I think this could be a generic combine but I'm not sure.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266104 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-12 18:24:38 +00:00
Nicolai Haehnle
4bd7005237 AMDGPU/SI: Fix a mis-compilation of multi-level breaks
Summary:
Under certain circumstances, multi-level breaks (or what is understood by
the control flow passes as such) could be miscompiled in a way that causes
infinite loops, by emitting incorrect control flow intrinsics.

This fixes a hang in
dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18967

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266088 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-12 16:10:38 +00:00
Matt Arsenault
d8f221e6c0 AMDGPU: Implement i64 global atomics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266075 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-12 14:05:11 +00:00
Matt Arsenault
bc0aee542f AMDGPU: Add atomic_inc + atomic_dec intrinsics
These are different than atomicrmw add 1 because they have
an additional input value to clamp the result.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266074 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-12 14:05:04 +00:00
Matt Arsenault
b26a693dfd AMDGPU: Add volatile to test loads and stores
When the memory vectorizer is enabled, these tests break.
These tests don't really care about the memory instructions,
and it's easier to write check lines with the unmerged loads.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266071 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-12 13:38:18 +00:00
Tom Stellard
62063a8486 Revert "AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute"
This reverts commit r263720.

Just confirmed that s_waitcnt is required after ds_permute/ds_bpermute.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265992 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-11 20:38:40 +00:00
Jan Vesely
62aa62a6e9 AMDGPU/SI: Implement atomic load/store for i32 and i64
Standard load/store instructions with GLC bit set.

Reviewers: tstellardAMD, arsenm

Differential Revision: http://reviews.llvm.org/D18760

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265709 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-07 19:23:11 +00:00
Tom Stellard
f313dae6f3 AMDGPU/SI: Add latency for export instructions
Reviewers: arsenm, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18599

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265708 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-07 18:30:05 +00:00
Nicolai Haehnle
ea7a0c0467 AMDGPU: Add a shader calling convention
This makes it possible to distinguish between mesa shaders
and other kernels even in the presence of compute shaders.

Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Differential Revision: http://reviews.llvm.org/D18559

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265589 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-06 19:40:20 +00:00
Konstantin Zhuravlyov
e1d66f4ce3 [AMDGPU] Emit linkonce and linkonce_odr symbols
Differential Revision: http://reviews.llvm.org/D18726


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265408 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-05 16:00:58 +00:00
Tom Stellard
059753cf8e AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2}
Summary:
Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+.

32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý.

Patch by: Vedran Miletić

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: jvesely, scchan, kanarayan, arsenm

Differential Revision: http://reviews.llvm.org/D17280

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265170 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-01 18:27:37 +00:00
Adrian Prantl
7876f64bc3 testcase gardening: update the emissionKind enum to the new syntax. (NFC)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265081 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-01 00:16:49 +00:00
Matt Arsenault
e8cd894c28 AMDGPU: Add frexp_exp intrinsic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264944 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 22:28:52 +00:00
Tom Stellard
7725fd8c02 AMDGPU/SI: Improve MachineSchedModel definition
This patch contains a few improvements to the model, including:

- Using a single resource with a defined buffers size for each memory unit.
- Setting the IssueWidth correctly.
- Fixing latency values for memory instructions.

shader-db stats:

16429 shaders in 3231 tests
Totals:
SGPRS: 318232 -> 312328 (-1.86 %)
VGPRS: 208996 -> 209346 (0.17 %)
Code Size: 7147044 -> 7166440 (0.27 %) bytes
LDS: 83 -> 83 (0.00 %) blocks
Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave
Max Waves: 49182 -> 49243 (0.12 %)
Wait states: 0 -> 0 (0.00 %)A

Differential Revision: http://reviews.llvm.org/D18453

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264877 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 16:35:13 +00:00
Tom Stellard
d3adac51fc AMDGPU/SI: Enable lanemask tracking in misched
Summary:
This results in higher register usage, but should make it easier for
the compiler to hide latency.

This pass is a prerequisite for some more scheduler improvements, and I
think the increase register usage with this patch is acceptable, because
when combined with the scheduler improvements, the total register usage
will decrease.

shader-db stats:

2382 shaders in 478 tests
Totals:
SGPRS: 48672 -> 49088 (0.85 %)
VGPRS: 34148 -> 34847 (2.05 %)
Code Size: 1285816 -> 1289128 (0.26 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 492544 -> 573440 (16.42 %) bytes per wave
Max Waves: 6856 -> 6846 (-0.15 %)
Wait states: 0 -> 0 (0.00 %)

Depends on D18451

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18452

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264876 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 16:35:09 +00:00
Sanjay Patel
206cd3a64e fix checks: *_DAG -> *-DAG
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264676 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-28 22:11:06 +00:00
Matthias Braun
1598569f70 CodeGen: Correct specification of PHI nodes
They do have a def machine operand.

Fixing the definition is necessary for an upcoming patch.

Differential Revision: http://reviews.llvm.org/D18384

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264607 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-28 18:18:41 +00:00
Tom Stellard
4a79dec8e2 AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions
Summary:
This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together.  The limit of 16
bytes was chosen, because it seems like that was the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.

This fixes yields small decreases in register usage with shader-db, but
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.

shader-db stats:

2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18451

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264589 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-28 16:10:13 +00:00
Matthias Braun
afb111acf7 LiveInterval: Fix Distribute() failing on liveranges with unused VNInfos
This fixes http://llvm.org/PR26991

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264345 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-24 21:41:38 +00:00
Matt Arsenault
54f7128a27 AMDGPU: Remove atomic inc/dec patterns
There is no benefit to these since materializing the constant 1
requires the same number of instructions as materializing uint_max

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264215 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-23 23:23:38 +00:00
Matt Arsenault
fc80e900d8 AMDGPU: Promote alloca should skip volatiles
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264214 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-23 23:17:29 +00:00
Matt Arsenault
222f9e97f6 AMDGPU: Insert moves of frame index to value operands
Strengthen tests of storing frame indices.

Right now this just creates irrelevant scheduling changes.

We don't want to have multiple frame index operands
on an instruction. There seem to be various assumptions
that at least the same frame index will not appear twice
in the LocalStackSlotAllocation pass.

There's no reason to have this happen, and it just
makes it easy to introduce bugs where the immediate
offset is appplied to the storing instruction when it should
really be applied to the value being stored as a separate
add.

This might not be sufficient. It might still be problematic
to have an add fi, fi situation, but that's even less unlikely
to happen in real code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264200 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-23 21:49:25 +00:00
Nicolai Haehnle
f0b7f107b9 AMDGPU: Add SIWholeQuadMode pass
Summary:
Whole quad mode is already enabled for pixel shaders that compute
derivatives, but it must be suspended for instructions that cause a
shader to have side effects (i.e. stores and atomics).

This pass addresses the issue by storing the real (initial) live mask
in a register, masking EXEC before instructions that require exact
execution and (re-)enabling WQM where required.

This pass is run before register coalescing so that we can use
machine SSA for analysis.

The changes in this patch expose a problem with the second machine
scheduling pass: target independent instructions like COPY implicitly
use EXEC when they operate on VGPRs, but this fact is not encoded in
the MIR. This can lead to miscompilation because instructions are
moved past changes to EXEC.

This patch fixes the problem by adding use-implicit operands to
target independent instructions. Some general codegen passes are
relaxed to work with such implicit use operands.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18162

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263982 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-21 20:28:33 +00:00
Tom Stellard
1f213b9b37 AMDGPU/SI: Fix threshold calculation for branching when exec is zero
Summary:
When control flow is implemented using the exec mask, the compiler will
insert branch instructions to skip over the masked section when exec is
zero if the section contains more than a certain number of instructions.

The previous code would only count instructions in successor blocks,
and this patch modifies the code to start counting instructions in all
blocks between the start and end of the branch.

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18282

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263969 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-21 18:56:58 +00:00