Commit Graph

409 Commits

Author SHA1 Message Date
Matt Arsenault
115244a728 AMDGPU: Fix kernel argument alignment impacting stack size
Don't use AllocateStack because kernel arguments have nothing
to do with the stack. The ensureMaxAlignment call was still
changing the stack alignment.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273080 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-18 05:15:53 +00:00
Matt Arsenault
863cff46f2 AMDGPU: Temporarily select trap to s_endpgm
This should select to s_trap, but that requires
additional work to set up and enable the trap handler.
For now emit s_endpgm so bugpoint stops getting stuck
on the unsupported call to abort.

Emit a warning that this will only terminate the wave and
not really trap.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273062 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-17 22:27:03 +00:00
Matt Arsenault
310a3752c0 AMDGPU: Remove llvm.SI.tid intrinsic
Mesa doesn't emit this for llvm >= 3.8 anymore.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273050 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-17 21:18:41 +00:00
Matt Arsenault
11e5e3bbe1 AMDGPU: Disable scheduling in some slow tests
Disabling the pre-RA scheduler on large-work-group-registers
causes it to be ~50% slower.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272860 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-16 00:56:47 +00:00
Nicolai Haehnle
682fc3e780 AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsics
Summary:
This fixes two related bugs. First, the generic optimization passes
unfortunately generate negative constant offsets but the hardware treats
SOffset as an unsigned value.

Second, there is a hardware bug on SI and CI, where address clamping in MUBUF
instructions does not work correctly when SOffset is larger than the buffer
size. This patch works around this bug by never using SOffset.

An alternative workaround would be to do the clamping manually when SOffset
is too large, but generating the required code sequence during instruction
selection would be rather involved, and in any case the resulting code would
probably be worse.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21326

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272761 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-15 07:13:05 +00:00
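
The first bug described in the MUBUF commit above (682fc3e780) comes down to a signed value being read as an unsigned one. A minimal Python sketch of that reinterpretation follows. It assumes a 32-bit SOffset field and a deliberately simplified bounds check; the helper names and the buffer size are hypothetical and are not taken from LLVM or the hardware documentation.

```python
MASK32 = 0xFFFFFFFF


def as_unsigned32(value):
    """Reinterpret a Python int as a 32-bit unsigned field."""
    return value & MASK32


def effective_offset(soffset):
    """Offset seen if SOffset is read as unsigned (simplified model)."""
    return as_unsigned32(soffset)


buffer_size = 256   # illustrative buffer size in bytes (assumption)
negative = -4       # a negative constant offset, as the optimizer might produce
seen = effective_offset(negative)

# The intended "4 bytes before" becomes a huge positive offset.
in_bounds = seen < buffer_size
```

Under this model a small negative offset turns into an offset near 4 GiB, which is why a negative constant cannot simply be placed in an unsigned offset operand.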
Matt Arsenault
6af03e5068 AMDGPU: Run pointer optimization passes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272736 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-15 00:11:01 +00:00
Marek Olsak
760c36c5ae AMDGPU/SI: Set INDEX_STRIDE for scratch coalescing
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1

This makes one particular compute shader 8x faster.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21136

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272556 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-13 16:05:57 +00:00
Tom Stellard
4ee3d0cb4d AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocations
Summary:
We need to set the fixup type to FK_Data_4 for the
SCRATCH_RSRC_DWORD[01] symbols, since these require absolute
relocations, and fixup_si_rodata is for relative relocations.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21153

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272417 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-10 19:26:38 +00:00
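
The distinction drawn in the commit above (4ee3d0cb4d) between absolute and relative relocations can be sketched with the usual ELF-style formulas: symbol plus addend for an absolute relocation, and symbol plus addend minus the fixup location for a relative one. This is a generic illustration with hypothetical function names and 32-bit wrap-around assumed; it is not the AMDGPU backend's fixup code.

```python
MASK32 = 0xFFFFFFFF


def resolve_absolute(symbol_addr, addend):
    """Absolute 32-bit relocation: the patched value is the address itself."""
    return (symbol_addr + addend) & MASK32


def resolve_relative(symbol_addr, addend, fixup_addr):
    """Relative 32-bit relocation: the patched value is a distance from the fixup."""
    return (symbol_addr + addend - fixup_addr) & MASK32


absolute_value = resolve_absolute(0x1000, 4)
relative_value = resolve_relative(0x1000, 4, 0x800)
```

A symbol whose consumer needs the address itself, rather than a distance from the instruction, has to use the absolute form.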
Matt Arsenault
dbaa4b4486 AMDGPU: v_cndmask_b32 does not def vcc
Fixes verifier errors after SIShrinkInstructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272351 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-10 00:18:41 +00:00
Tom Stellard
60f588f570 AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permute
Summary:
This fixes a bug with ds_*permute instructions where if it was passed a
constant address, then the offset operand would get assigned a register
operand instead of an immediate.

Reviewers: scchan, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19994

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272349 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-10 00:01:04 +00:00
Matt Arsenault
4080a06a24 AMDGPU: Fix flat atomics
The flat atomics could already be selected, but only
when using flat instructions for global memory. Add
patterns for flat addresses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272345 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 23:42:54 +00:00
Matt Arsenault
bada556f73 AMDGPU: Fix i64 global cmpxchg
This was using extract_subreg sub0 to extract the low register
of the result instead of sub0_sub1, producing an invalid copy.

There doesn't seem to be a way to use the compound subreg indices
in tablegen since those are generated, so manually select it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272344 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 23:42:48 +00:00
Matt Arsenault
003d842e7f AMDGPU: Fix missing and broken check lines in atomic tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272343 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 23:42:44 +00:00
Wei Ding
39ce7152a2 AMDGPU/SI: Fix 32-bit fdiv lowering
We were using the fast fdiv lowering for all divisions. An implementation
of IEEE 754 fdiv is added.

http://reviews.llvm.org/D20557

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272292 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 19:17:15 +00:00
Jan Vesely
406c47ff89 SelectionDAG: Implement expansion of {S,U}MIN/MAX in integer legalization
Fixes {u,}long_{min,max,clamp} opencl piglit regressions on EG.

Reviewers: arsenm
Differential Revision: http://reviews.llvm.org/D17898

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272272 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-09 16:04:00 +00:00
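
As an illustration of what expanding a 64-bit min/max on a target with 32-bit registers can look like, here is a minimal Python sketch that compares the high halves first and falls back to an unsigned compare of the low halves. This is one possible expansion under stated assumptions, not necessarily the sequence the legalizer emits; all function names are hypothetical, and values are passed as 64-bit bit patterns.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def split64(x):
    """Split a 64-bit bit pattern into (low, high) 32-bit halves."""
    x &= MASK64
    return x & MASK32, (x >> 32) & MASK32


def to_signed32(x):
    """Interpret a 32-bit pattern as a signed integer."""
    return (x ^ 0x80000000) - 0x80000000


def less_than64(a, b, signed):
    """Compare two 64-bit patterns using only 32-bit comparisons."""
    a_lo, a_hi = split64(a)
    b_lo, b_hi = split64(b)
    if a_hi != b_hi:
        # Only the high half carries the sign.
        if signed:
            return to_signed32(a_hi) < to_signed32(b_hi)
        return a_hi < b_hi
    # High halves equal: the low halves always compare as unsigned.
    return a_lo < b_lo


def umin64(a, b):
    return (a if less_than64(a, b, False) else b) & MASK64


def smin64(a, b):
    return (a if less_than64(a, b, True) else b) & MASK64


def umax64(a, b):
    return (b if less_than64(a, b, False) else a) & MASK64


def smax64(a, b):
    return (b if less_than64(a, b, True) else a) & MASK64
```

The all-ones pattern shows the difference between the signed and unsigned variants: it is the largest unsigned value but represents -1 when signed.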
Nicolai Haehnle
2ac1fa00c9 AMDGPU: Add amdgpu-ps-wqm-outputs function attributes
Summary:
The presence of this attribute indicates that VGPR outputs should be computed
in whole quad mode. This will be used by Mesa for prolog pixel shaders, so
that derivatives can be taken of shader inputs computed by the prolog, fixing
a bug.

The generated code could certainly be improved: if a prolog pixel shader is
used (which isn't common in modern OpenGL - they're used for gl_Color, polygon
stipples, and forcing per-sample interpolation), Mesa will use this attribute
unconditionally, because it has to be conservative. So WQM may be used in the
prolog when it isn't really needed, and furthermore a silly back-and-forth
switch is likely to happen at the boundary between prolog and main shader
parts.

Fixing this is a bit involved: we'd first have to add a mechanism by which
LLVM writes the WQM-related input requirements to the main shader part binary,
and then Mesa specializes the prolog part accordingly. At that point, we may
as well just compile a monolithic shader...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D20839

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272063 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-07 21:37:17 +00:00
Eric Christopher
c2a7f10882 Revert "Differential Revision: http://reviews.llvm.org/D20557"
Author: Wei Ding <wei.ding2@amd.com>
Date:   Tue Jun 7 19:04:44 2016 +0000

    Differential Revision: http://reviews.llvm.org/D20557

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044
    91177308-0d34-0410-b5e6-96231b3b80d8

as it was breaking the bots.

This reverts commit r272044.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272056 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-07 20:27:12 +00:00
Wei Ding
e2d1122183 Differential Revision: http://reviews.llvm.org/D20557
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-07 19:04:44 +00:00
Matt Arsenault
bc1b8d5b49 AMDGPU: Fix constantexpr addrspacecasts
If we had a constant group address space cast the queue pointer
wasn't enabled for the function, resulting in a crash on noreg
later.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271935 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-06 20:03:31 +00:00
Artem Tamazov
7049ac906c [AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when src2 == VCC.
Another step toward unifying the llvm assembler/disassembler with sp3.
CodeGen output is also slightly improved, hence the changes in CodeGen tests.
Assembler/Disassembler tests updated/added.

Differential Revision: http://reviews.llvm.org/D20796

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271900 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-06 15:23:43 +00:00
Matt Arsenault
29d0ea4bc8 AMDGPU: Cleanup load tests
There are a lot of different kinds of loads to test for,
and these were scattered around inconsistently with
some redundancy. Try to comprehensively test all loads
in a consistent way.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271571 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-02 19:54:26 +00:00
Matt Arsenault
747c0a6e8b AMDGPU: Temporary fix for broken store combine
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271567 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-02 19:00:55 +00:00
Matt Arsenault
9d6fa96f46 AMDGPU: Fix crashes on unknown processor name
If the processor name failed to parse for amdgcn,
the resulting output would have R600 ISA in it.

If the processor name was missing or invalid for R600,
the wavefront size would not be set and there would be
crashes from missing itinerary data.

Fixes crashes in future commit caused by dividing by the unset/0
wavefront size.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271561 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-02 18:37:16 +00:00
Matthias Braun
1cd242fe11 CodeGen: Refactor renameDisconnectedComponents() as a pass
Refactor LiveIntervals::renameDisconnectedComponents() to be a pass.
Also change the name to "RenameIndependentSubregs":

- renameDisconnectedComponents() worked on a MachineFunction at a time
  so it is a natural candidate for a machine function pass.

- The algorithm is testable with a .mir test now.

- This also fixes a problem where the lazy renaming as part of the
  MachineScheduler introduced IMPLICIT_DEF instructions after the number
  of nodes in a region was counted, leading to a mismatch.

Differential Revision: http://reviews.llvm.org/D20507

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271345 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-31 22:38:06 +00:00
Matt Arsenault
c3eeba0f4c AMDGPU: Cleanup vector insert/extract tests
This mostly makes sure that 3-vector dynamic inserts
and extracts are covered.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271082 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-28 00:51:06 +00:00
Matt Arsenault
14cb586d5e AMDGPU: Add fract intrinsic
Remove broken patterns matching it. This was matching the
unsafe math pattern and expanding the fix for the buggy instruction
from the pattern. The problems are also on CI. Remove the workarounds
and only use fract with unsafe math or from the intrinsic.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271078 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-28 00:19:52 +00:00
Changpeng Fang
faf7289db3 AMDGPU/SI: Enable load-store-opt by default.
Summary: Enable load-store-opt by default, and update LIT tests.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D20694

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270894 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-26 19:35:29 +00:00
Diana Picus
f46038dc53 [AMDGPU] Remove exit-on-error flag from test (PR27762)
Similar to r269948, but for argument lowering.

Fixes PR27762

Differential Revision: http://reviews.llvm.org/D20430

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270856 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-26 15:24:55 +00:00
Matt Arsenault
2997ae6e3e AMDGPU: Fix v2i64/v2f64 bitcasts
These operations tend to get promoted away to v4i32 so
this doesn't happen often.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270740 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-25 18:07:36 +00:00
Matt Arsenault
211d1cd5a3 AMDGPU: Fix missing br_cc i1 test coverage
Also un-XFAIL a test.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270739 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-25 17:58:27 +00:00
Matt Arsenault
53d233a178 AMDGPU: Make vectorization defeating test changes
Simplifies test updates in the future.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270736 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-25 17:42:39 +00:00
Matt Arsenault
068cdecac2 AMDGPU: Fix inconsistent lowering of select of vectors
f32 vectors would use a sequence of BFI instructions instead
of unrolled cmp + select. This was better in the case of a VALU
select with SGPR inputs, but we don't have a way of dealing with that
in the DAG.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270731 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-25 17:34:58 +00:00
Konstantin Zhuravlyov
d7b9b912dd [AMDGPU][NFC] Rename ReserveTrapVGPRs -> ReserveRegs
Differential Revision: http://reviews.llvm.org/D20081


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270594 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-24 18:37:18 +00:00
Matt Arsenault
03ca6fb151 AMDGPU: Define priorities for register classes
Allocating larger register classes first should give better allocation
results (and more importantly for myself, make the lit tests more stable
with respect to scheduler changes).

Patch by Matthias Braun

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270312 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-21 03:55:07 +00:00
Matt Arsenault
be522c6214 AMDGPU: Cleanup lowering actions
These are kind of a mess and hard to follow, particularly
for loads and stores. Fix various redundant, unnecessary
and dead settings.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270307 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-21 02:27:49 +00:00
Matt Arsenault
4e5b30a0a9 AMDGPU: Fix high bits after division optimization
This is essentially doing a 24-bit signed division with FP.
We need to truncate to the N bit result.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270305 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-21 01:53:33 +00:00
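
The point about truncating to the N-bit result in the commit above (4e5b30a0a9) can be illustrated with a small Python sketch. It assumes the division is carried out in floating point on narrow signed operands and that the quotient is then sign-extended from bit N-1; the function names are hypothetical, and Python floats stand in for the hardware's floating-point path.

```python
def sign_extend_inreg(value, bits):
    """Keep the low `bits` bits of value and sign-extend from bit bits-1."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value & mask) ^ sign) - sign


def sdiv_via_float(a, b, bits):
    """Signed division done in floating point, then narrowed to `bits` bits."""
    quotient = int(a / b)  # int() truncates toward zero, like sdiv
    return sign_extend_inreg(quotient, bits)


# -128 / -1 is 128, which does not fit in a signed 8-bit result.
wrapped = sdiv_via_float(-128, -1, 8)
```

Without the final narrowing step, the quotient of the most negative value and -1 would leave bits set above the N-bit result.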
Matt Arsenault
6416e4c521 AMDGPU: Fix verifier error when spilling SGPRs
The current SGPR spilling test does not stress this
because it is using s_buffer_load instructions to
increase SGPR pressure and spill, but their output
operands have the same SReg_32_XM0 constraint. This fixes
an error when the SReg_32 output from most instructions
is spilled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270301 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-21 00:53:42 +00:00
Matt Arsenault
9d922be248 AMDGPU: Handle cbranch vccz/vccnz
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270297 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-21 00:29:40 +00:00
Matt Arsenault
dcb6543de5 AMDGPU: Implement ReverseBranchCondition
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270296 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-21 00:29:34 +00:00
Matt Arsenault
f91238f391 AMDGPU: Implement AnalyzeBranch
Original patch by Tom Stellard

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270295 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-21 00:29:27 +00:00
Matthias Braun
6054e84d82 LiveIntervalAnalysis: Rework constructMainRangeFromSubranges()
We now use LiveRangeCalc::extendToUses() instead of a specially designed
algorithm in constructMainRangeFromSubranges():
- The original motivation for constructMainRangeFromSubranges() was
  differences between the main liverange and subranges because of hidden
  dead definitions. This case however cannot happen anymore with the
  DetectDeadLaneMasks pass in place.
- It simplifies the code.
- This fixes a longstanding bug where we did not properly create new SSA
  values on merging control flow (the MachineVerifier missed most of
  these cases).
- Move constructMainRangeFromSubranges() to LiveIntervalAnalysis and
  LiveRangeCalc to better match the implementation/available helper
  functions.

This re-applies r269016. The fixes from r270290 and r270259 should avoid
the machine verifier problems this time.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270291 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-20 23:14:56 +00:00
Matthias Braun
d8eb7dec3e MachineVerifier: subregs do not require defs/valnos on every path
It is fine for subregister ranges to be undefined on some CFG paths as
we may have a "vregX:other_subreg<read-undef> =" def on that path. We
do not (and should not) have live segments for the subregister ranges.
The MachineVerifier should not complain about this.

This is a slight variant of http://llvm.org/PR27705

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270290 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-20 23:02:13 +00:00
Matthias Braun
2a73788c72 LiveIntervalAnalysis: Fix missing defs in renameDisconnectedComponents().
Fix renameDisconnectedComponents() creating vreg uses that can be
reached from function begin without having a definition (or explicit
live-in). Fix this by inserting IMPLICIT_DEF instructions before
control-flow joins as necessary.

Removes an assert from MachineScheduler because we may now get
additional IMPLICIT_DEF when preparing the scheduling policy.

This fixes the underlying problem of http://llvm.org/PR27705

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270259 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-20 19:46:13 +00:00
Matt Arsenault
44aaff08ed AMDGPU: Fix promote alloca for pointer loads
If the load has a pointer type, we don't want to change
its type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270000 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-18 23:20:24 +00:00
Matt Arsenault
3cd52aec7c AMDGPU: Other sizes of popcnt are fast
We can chain bcnt instructions together, so
any width popcnt is pretty fast.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269950 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-18 16:10:19 +00:00
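
The chaining mentioned in the commit above (3cd52aec7c) can be sketched as follows, assuming a bcnt-style operation that counts the set bits of its first 32-bit operand and adds its second operand. The Python function names are hypothetical stand-ins for the instruction, not LLVM or ISA identifiers.

```python
MASK32 = 0xFFFFFFFF


def bcnt_u32(src0, src1):
    """Count set bits in the low 32 bits of src0 and add src1 (accumulator)."""
    return bin(src0 & MASK32).count("1") + src1


def popcnt64(x):
    """64-bit popcount built from two chained 32-bit counts."""
    lo = x & MASK32
    hi = (x >> 32) & MASK32
    # The second count accumulates onto the first, so no separate add is needed.
    return bcnt_u32(hi, bcnt_u32(lo, 0))


all_ones = popcnt64(0xFFFFFFFFFFFFFFFF)
```

Because each count feeds the next one's accumulator, wider popcounts only cost one extra operation per additional 32-bit chunk.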
Matt Arsenault
5d9f8fb9d4 AMDGPU: Fix assert when erroring on a call
For some reason an assert is now hit when a valid chain
is not returned, so return the entry chain.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269948 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-18 16:10:11 +00:00
Matt Arsenault
41cf920df5 AMDGPU: Handle alloca promoting with null operands
If the second pointer in a multi-pointer instruction is
a constant, we can replace the type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269945 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-18 15:57:21 +00:00
Matt Arsenault
c33f9cd287 AMDGPU: Fix a few slightly broken tests
Fix minor bugs and uses of undef which break when
pointer related optimization passes are run.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269944 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-18 15:48:44 +00:00
Jan Vesely
350e40ffb2 AMDGPU/R600: Use correct number of vector elements when lowering private loads
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D20032

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269725 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-16 23:56:32 +00:00
Matt Arsenault
abc9f47dfe AMDGPU: Add some private element size tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269712 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-16 22:17:27 +00:00