Commit Graph

442 Commits

Author SHA1 Message Date
Nicolai Haehnle
a2a1a4f194 AMDGPU: Fix return of non-void-returning shaders
Summary:
Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that
ensures that a non-void-returning shader falls off the end of the last
basic block was effectively disabled, since SI_RETURN is now used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96731

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21975

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274612 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-06 08:35:17 +00:00
Matt Arsenault
18a21c4966 DAGCombiner: Fold away vector extract of insert with the same index
This only really matters when the index is non-constant since the
constant case already gets taken care of by other combines.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274569 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-05 18:25:02 +00:00
Matt Arsenault
6de48c50fe AMDGPU: Fix folding SGPRs into madak/madmk src0
Because of the special immediate operand, the constant
bus is already used so SGPRs are never useful.

r263212 changed the name of the immediate operand, which
broke the verifier check for the restriction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274564 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-05 17:09:01 +00:00
Matt Arsenault
4a37139fe5 TII: Fix inlineasm size counting comments as insts
The main problem was counting comments on their own
line as instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274405 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-01 23:26:50 +00:00
Matt Arsenault
6e4fd1497d AMDGPU: Add feature for unaligned access
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274398 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-01 23:03:44 +00:00
Matt Arsenault
d4452f8fcf AMDGPU: Expand unaligned accesses early
Due to visit order problems, in the case of an unaligned copy
the legalized DAG fails to eliminate extra instructions introduced
by the expansion of both unaligned parts.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274397 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-01 22:55:55 +00:00
Matt Arsenault
df587174eb AMDGPU: Improve load/store of illegal types.
There was a combine before to handle the simple copy case.
Split this into handling loads and stores separately.

We might want to change how this handles some of the vector
extloads, since this can result in large code size increases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274394 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-01 22:47:50 +00:00
Nikolay Haustov
2619d465a5 Resubmit r268719 - AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2.
This was reverted in r268740 because of problems with corresponding Clang change.
Clang change was updated and resubmitted in r274220.

Check calling convention in AMDGPUMachineFunction::isKernel

This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF.

Also, in the future unused non-kernels may be optimized.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D19917

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274341 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-01 10:00:58 +00:00
Matt Arsenault
1bf162a64a AMDGPU: Fix out of bounds indirect indexing errors
This was producing acceses to registers beyond the super
register's limits, resulting in verifier failures.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273977 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-28 01:09:00 +00:00
Matt Arsenault
d35aece639 AMDGPU: Implement per-function subtargets
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273940 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-27 20:48:03 +00:00
Matt Arsenault
dca409d5ad AMDGPU: Move subtarget feature checks into passes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273937 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-27 20:32:13 +00:00
Matt Arsenault
5123c149e7 AMDGPU: Fix verifier errors with undef vector indices
Also fix pointlessly adding exec to liveins.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273916 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-27 19:57:44 +00:00
Matt Arsenault
bd288e1778 DAGCombiner: Don't narrow volatile vector loads + extract
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273909 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-27 19:31:04 +00:00
Jan Vesely
d207fc4c12 AMDGPU/R600: Fix GlobalValue regressions.
Don't cast GV expression to MCSymbolRefExpr. r272705 changed GV to binary
expressions by including offset even if the offset it 0
(we haven't hit this sooner since tested workloads don't include static offsets)
We don't really care about the type of expression, so set it directly.
Fixes: r272705

Consider section relative relocations. Since all const as data is in one boffer section relative is equivalent to abs32.
Fixes: r273166

Differential Revision: http://reviews.llvm.org/D21633

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273785 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-25 18:24:16 +00:00
Konstantin Zhuravlyov
20c7a48718 [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.

Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
  - offset 0: work group ID x
  - offset 4: work group ID y
  - offset 8: work group ID z
  - offset 16: work item ID x
  - offset 20: work item ID y
  - offset 24: work item ID z

Set
  - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
  - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
  - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled

Differential Revision: http://reviews.llvm.org/D20335


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273769 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-25 03:11:28 +00:00
Tom Stellard
16fa6f1061 AMDGPU/SI: Make sure not to fold offsets into local address space globals
Summary:
Offset folding only works if you are emitting relocations, and we don't
emit relocations for local address space globals.

Reviewers: arsenm, nhaustov

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21647

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273765 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-25 01:59:16 +00:00
Matthias Braun
f011e37181 MachineScheduler: Fully compare top/bottom candidates
In bidirectional scheduling this gives more stable results than just
comparing the "reason" fields of the top/bottom node because the reason
field may be higher depending on what other nodes are in the queue.

Differential Revision: http://reviews.llvm.org/D19401

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273755 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-25 00:23:00 +00:00
Matthias Braun
0791b66fef AMDGPU: Define a schedule class for COPY.
COPY was lacking a scheduling class, define it to avoid regressions in
the upcoming change to the bidirectional MachineScheduler. Approved by
tstellar on IRC.

Differential Revision: http://reviews.llvm.org/D21540

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273751 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-24 23:52:11 +00:00
Matt Arsenault
11c2d4bf28 AMDGPU: Add stub custom CodeGenPrepare pass
This will do various things including ones
CodeGenPrepare does, but with knowledge of uniform
values.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273657 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-24 07:07:55 +00:00
Matt Arsenault
8b2f86f045 AMDGPU: Un-xfail and add tests
Un XFAIL a few tests plus a few more I had lying around
in my tree, which seem to all work now but I don't see tests
that quite test the same things.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273655 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-24 06:58:01 +00:00
Matt Arsenault
9af2418e41 AMDGPU: Remove disable-irstructurizer subtarget feature
The only real reason to use it is for testing, so replace
it with a command line option instead of a potentially function
dependent feature.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273653 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-24 06:30:22 +00:00
Diana Picus
a4a23eae96 [AMDGPU] Remove exit-on-error in test (PR27761)
The exit-on-error flag was necessary in order to avoid an assertion when
handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize.

We can avoid the assertion by creating some dummy nodes. This enables us to
remove the exit-on-error flag on the first 2 run lines (SI), but on the third
run line (R600) we would run into another assertion when trying to reserve
indirect registers. This patch also replaces that assertion with an early exit
from the function.

Fixes PR27761.

Differential Revision: http://reviews.llvm.org/D20852

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273550 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-23 09:19:16 +00:00
Matt Arsenault
fddf7f599f AMDGPU: Fix liveness when expanding m0 loop
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273514 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-22 23:40:57 +00:00
Changpeng Fang
7cde679f44 AMDGPU/SI: Define an intrinsic to expose ds_swizzle_b32
Reviewers: tstellarAMD, arsenm

Differential Revision: http://reviews.llvm.org/D21533

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273496 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-22 21:33:49 +00:00
Matt Arsenault
e22857013f AMDGPU: Fix verifier errors in SILowerControlFlow
The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.

Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273467 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-22 20:15:28 +00:00
Wei Ding
ef86963806 AMDGPU: Add convergent flag to INLINEASM instruction.
Differential Revision: http://reviews.llvm.org/D21214

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273455 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-22 18:51:08 +00:00
Jan Vesely
7d5ce4d892 AMDGPU: Add implicitarg.ptr intrinsic.
Points to the start of implicit arguments (appended after explicit arguments)

Differential Revision: http://reviews.llvm.org/D20297

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273317 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-21 20:46:20 +00:00
Matt Arsenault
b2902b2eb0 AMDGPU: Preserve undef flag on vcc when shrinking v_cndmask_b32
The implicit operand is added by the initial instruction construction,
so this was adding an additional vcc use. The original one
was missing the undef flag the original condition had,
so the verifier would complain.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273182 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-20 18:34:00 +00:00
Matt Arsenault
17f22f98eb AMDGPU: Fold more custom nodes to undef
This will help sneak undefs past GVN into the DAG for
some tests.

Also add missing intrinsic for rsq_legacy, even though the node
was already selected to the instruction. Also start passing
the debug location to intrinsic errors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273181 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-20 18:33:56 +00:00
Matt Arsenault
96ad9ea23d Generalize DiagnosticInfoStackSize to support other limits
Backends may want to report errors on resources other than
stack size.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273177 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-20 18:13:04 +00:00
Matt Arsenault
61691ce470 AMDGPU: Use correct method for determining instruction size
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273172 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-20 17:51:32 +00:00
Tom Stellard
75473ec73e AMDGPU: Add support for R_AMDGPU_REL32 relocations
Reviewers: arsenm, kzhuravl, rafael

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21401

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273168 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-20 17:33:43 +00:00
Tom Stellard
a0adb8d997 AMDGPU: Emit R_AMDGPU_ABS32_{HI,LO} for scratch buffer relocations
Reviewers: arsenm, rafael, kzhuravl

Subscribers: rafael, arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21400

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273166 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-20 16:59:44 +00:00
Matt Arsenault
115244a728 AMDGPU: Fix kernel argument alignment impacting stack size
Don't use AllocateStack because kernel arguments have nothing
to do with the stack. The ensureMaxAlignment call was still
changing the stack alignment.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273080 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-18 05:15:53 +00:00
Matt Arsenault
863cff46f2 AMDGPU: Temporarily select trap to s_endpgm
This should select to s_trap, but that requires
additonal work to setup and enable the trap handler.
For now emit s_endpgm so bugpoint stops getting stuck
on the unsupported call to abort.

Emit a warning that this will only terminate the wave and
not really trap.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273062 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-17 22:27:03 +00:00
Matt Arsenault
310a3752c0 AMDGPU: Remove llvm.SI.tid intrinsic
Mesa doesn't emit this for llvm >= 3.8 anymore.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273050 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-17 21:18:41 +00:00
Matt Arsenault
11e5e3bbe1 AMDGPU: Disable scheduling in some slow tests
Disabling the pre-RA scheduler on large-work-group-registers
causes it to be ~50% slower.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272860 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-16 00:56:47 +00:00
Nicolai Haehnle
682fc3e780 AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsics
Summary:
This fixes two related bugs. First, the generic optimization passes
unfortunately generate negative constant offsets but the hardware treats
SOffset as an unsigned value.

Second, there is a hardware bug on SI and CI, where address clamping in MUBUF
instructions does not work correctly when SOffset is larger than the buffer
size. This patch works around this bug by never using SOffset.

An alternative workaround would be to do the clamping manually when SOffset
is too large, but generating the required code sequence during instruction
selection would be rather involved, and in any case the resulting code would
probably be worse.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21326

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272761 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-15 07:13:05 +00:00
Matt Arsenault
6af03e5068 AMDGPU: Run pointer optimization passes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272736 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-15 00:11:01 +00:00
Marek Olsak
760c36c5ae AMDGPU/SI: Set INDEX_STRIDE for scratch coalescing
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1

This makes one particular compute shader 8x faster.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21136

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272556 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-13 16:05:57 +00:00
Tom Stellard
4ee3d0cb4d AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocations
Summary:
We need to set the fixup type to FK_Data_4 for the
SCRATCH_RSRC_DWORD[01] symbols, since these require absolute
relocations, and fixup_si_rodata is for relative relocations.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21153

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272417 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-10 19:26:38 +00:00
Matt Arsenault
dbaa4b4486 AMDGPU: v_cndmask_b32 does not def vcc
Fixes verifier errors after SIShrinkInstructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272351 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-10 00:18:41 +00:00
Tom Stellard
60f588f570 AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permute
Summary:
This fixes a bug with ds_*permute instructions where if it was passed a
constant address, then the offset operand would get assigned a register
operand instead of an immediate.

Reviewers: scchan, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19994

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272349 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-10 00:01:04 +00:00
Matt Arsenault
4080a06a24 AMDGPU: Fix flat atomics
The flat atomics could already be selected, but only
when using flat instructions for global memory. Add
patterns for flat addresses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272345 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 23:42:54 +00:00
Matt Arsenault
bada556f73 AMDGPU: Fix i64 global cmpxchg
This was using extract_subreg sub0 to extract the low register
of the result instead of sub0_sub1, producing an invalid copy.

There doesn't seem to be a way to use the compound subreg indices
in tablegen since those are generated, so manually select it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272344 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 23:42:48 +00:00
Matt Arsenault
003d842e7f AMDGPU: Fix missing and broken check lines in atomic tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272343 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 23:42:44 +00:00
Wei Ding
39ce7152a2 AMDGPU/SI: Fix 32-bit fdiv lowering
We were using the fast fdiv lowering for all division, implementation of
IEEE754 fdiv is added.

http://reviews.llvm.org/D20557

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272292 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 19:17:15 +00:00
Jan Vesely
406c47ff89 SelectionDAG: Implement expansion of {S,U}MIN/MAX in integer legalization
Fixes {u,}long_{min,max,clamp} opencl piglit regressions on EG.

Reviewers: arsenm
Differential Revision: http://reviews.llvm.org/D17898

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272272 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 16:04:00 +00:00
Nicolai Haehnle
2ac1fa00c9 AMDGPU: Add amdgpu-ps-wqm-outputs function attributes
Summary:
The presence of this attribute indicates that VGPR outputs should be computed
in whole quad mode. This will be used by Mesa for prolog pixel shaders, so
that derivatives can be taken of shader inputs computed by the prolog, fixing
a bug.

The generated code could certainly be improved: if a prolog pixel shader is
used (which isn't common in modern OpenGL - they're used for gl_Color, polygon
stipples, and forcing per-sample interpolation), Mesa will use this attribute
unconditionally, because it has to be conservative. So WQM may be used in the
prolog when it isn't really needed, and furthermore a silly back-and-forth
switch is likely to happen at the boundary between prolog and main shader
parts.

Fixing this is a bit involved: we'd first have to add a mechanism by which
LLVM writes the WQM-related input requirements to the main shader part binary,
and then Mesa specializes the prolog part accordingly. At that point, we may
as well just compile a monolithic shader...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D20839

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272063 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-07 21:37:17 +00:00
Eric Christopher
c2a7f10882 Revert "Differential Revision: http://reviews.llvm.org/D20557"
Author: Wei Ding <wei.ding2@amd.com>
Date:   Tue Jun 7 19:04:44 2016 +0000

    Differential Revision: http://reviews.llvm.org/D20557

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044
    91177308-0d34-0410-b5e6-96231b3b80d8

as it was breaking the bots.

This reverts commit r272044.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272056 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-07 20:27:12 +00:00