509 Commits

Author SHA1 Message Date
Matt Arsenault
4fd45ebabd AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior
This should really be true for any immediate, not just
inline ones.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277260 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-30 01:40:36 +00:00
Changpeng Fang
539fec5dc2 AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB.
Differential Revision: http://reviews.llvm.org/D22021

Reviewed by: arsenm

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277073 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 23:01:45 +00:00
Wei Ding
ee8c4ca1e1 AMDGPU: Add intrinsics for compare with the full wavefront result
Differential Revision: http://reviews.llvm.org/D22482

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276998 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 16:42:13 +00:00
Nicolai Haehnle
b18ca96c79 AMDGPU: add execfix flag to SI_ELSE
Summary:
SI_ELSE is lowered into two parts:

s_or_saveexec_b64 dst, src (at the start of the basic block)

s_xor_b64 exec, exec, dst (at the end of the basic block)

The idea is that dst contains the exec mask of the preceding IF block. It can
happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
the basic block that contains SI_ELSE, in which case it introduces an instruction

s_and_b64 exec, exec, s[...]

which masks out bits that can correspond to both the IF and the ELSE paths.
So the resulting sequence must be:

s_or_saveexec_b64 dst, src

s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow

s_xor_b64 exec, exec, dst

Whether to add the additional s_and_b64 dst, dst, exec is currently determined
via the ExecModified tracking. With this change, it is instead determined by
an additional flag on SI_ELSE which is set by SIWholeQuadMode.

Finally: It also occurred to me that an alternative approach for the long run
is for SILowerControlFlow to unconditionally emit

s_or_saveexec_b64 dst, src

...

s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst

and have a pass that detects and cleans up the "redundant AND with exec"
pattern where possible. This could be useful anyway, because we also add
instructions

s_and_b64 vcc, exec, vcc

before s_cbranch_scc (in moveToALU), and those are often redundant. I have
some pending changes to how KILL is lowered that could also benefit from
such a cleanup pass.

In any case, this current patch could help in the short term with the whole
ExecModified business.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22846

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276972 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 11:39:24 +00:00
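The exec-mask arithmetic described in the SI_ELSE commit above can be modelled with plain integer operations. This is only a minimal sketch: the function name `lower_si_else`, its parameters, and the 4-lane example masks are hypothetical, and the semantics assumed for s_or_saveexec_b64 (save exec into dst, then OR src into exec) are taken from the commit message's description rather than from the backend code.

```python
MASK64 = (1 << 64) - 1  # exec is a 64-bit lane mask


def lower_si_else(exec_mask, src, wqm_live_mask=None, execfix=False):
    """Model the lowered SI_ELSE sequence on integer lane masks.

    exec_mask:     exec on entry (lanes of the preceding IF block)
    src:           lanes saved for the ELSE path
    wqm_live_mask: optional mask applied by the s_and_b64 that
                   SIWholeQuadMode inserts (hypothetical parameter)
    execfix:       whether the extra "s_and_b64 dst, dst, exec" is emitted
    """
    # s_or_saveexec_b64 dst, src
    dst = exec_mask
    exec_mask = (exec_mask | src) & MASK64

    if wqm_live_mask is not None:
        # s_and_b64 exec, exec, s[...]   (added by SIWholeQuadMode)
        exec_mask &= wqm_live_mask
        if execfix:
            # s_and_b64 dst, dst, exec   (added by SILowerControlFlow)
            dst &= exec_mask

    # s_xor_b64 exec, exec, dst
    exec_mask ^= dst
    return exec_mask


if_lanes = 0b0011    # lanes that ran the IF block
else_lanes = 0b1100  # lanes that should run the ELSE block
live = 0b0110        # lanes kept when switching from WQM to Exact

without_fix = lower_si_else(if_lanes, else_lanes, live, execfix=False)
with_fix = lower_si_else(if_lanes, else_lanes, live, execfix=True)
print(bin(without_fix), bin(with_fix))
```

Without the extra AND, the final XOR re-enables an IF lane that the Exact-mode mask had already switched off; with it, only the live ELSE lanes remain enabled.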
Matt Arsenault
f799c706db AMDGPU: Use rcp for fdiv 1, x with fpmath metadata
Using rcp should be OK for safe math usually, so this
should not be replacing the original fdiv.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276823 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-26 23:25:44 +00:00
Matt Arsenault
c43677a11d AMDGPU: Add more tests for LDS size with occupancy
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276821 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-26 23:15:59 +00:00
Matthias Braun
ad0f5f6b52 MIRParser: Use dot instead of colon to mark subregisters
Change the syntax to use `%0.sub8` to denote a subregister.

This seems like a more natural fit to denote subregisters; I also plan
to introduce a new ":classname" syntax in upcoming patches to denote the
register class of a vreg.

Note that this commit disallows plain identifiers to start with a '.'
character.  This shouldn't affect anything as external names/IR
references are all prefixed with '$'/'%', plain identifiers are only
used for instruction names, register mask names and subreg indexes.

Differential Revision: https://reviews.llvm.org/D22390

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276815 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-26 21:49:34 +00:00
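For readers updating old test files by hand, the syntax change in the MIRParser commit above amounts to a textual rewrite of subregister references. The sketch below is illustrative only: it assumes the old form was `%<number>:<subreg index>`, the helper name `colon_to_dot` is hypothetical, and it should not be applied to files written after the planned ":classname" syntax exists, since the colon would then mean something else.

```python
import re

# Matches a virtual register followed by a colon and an identifier,
# e.g. "%0:sub8". This pattern is an assumption about the old syntax.
OLD_SUBREG = re.compile(r"(%\d+):([A-Za-z_]\w*)")


def colon_to_dot(line):
    """Rewrite old-style '%0:sub8' references to the new '%0.sub8' form."""
    return OLD_SUBREG.sub(r"\1.\2", line)


print(colon_to_dot("%1 = COPY %0:sub8"))
```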
Tim Northover
d96170e773 GlobalISel: omit braces on MachineInstr types when there's only one.
Tidies up the representation a bit in the common case.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276772 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-26 17:28:01 +00:00
Matt Arsenault
cc67a0a36a AMDGPU: Add missing tests for xnack option for HSA
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276765 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-26 16:45:50 +00:00
Matt Arsenault
ee4cdb7b75 AMDGPU: Add fp legacy instruction intrinsics
This could use some additional optimization work
to use mad/mac legacy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276764 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-26 16:45:45 +00:00
Jan Vesely
4a44da0c82 AMDGPU: Remove read_workdim intrinsic
Differential revision: https://reviews.llvm.org/D22732

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276682 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-25 20:17:02 +00:00
Matt Arsenault
9b4a967989 AMDGPU: Fix missing verify-machineinstrs in control flow test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276679 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-25 19:39:06 +00:00
Tom Stellard
a6b9e20623 Revert "[AMDGPU] Emit read-only data to .rodata for hsa"
This reverts commit r276298.

Data stored in .rodata can have a negative offset from .text, but we
don't support negative values in relocations yet.

This caused a regression in one of the amp conformance tests:
5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276498 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-22 23:46:40 +00:00
Tim Northover
3921674c30 GlobalISel: allow multiple types on MachineInstrs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276481 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-22 22:13:36 +00:00
Anna Thomas
80ee170cb3 Invariant start/end intrinsics overloaded for address space
Summary:
The llvm.invariant.start and llvm.invariant.end intrinsics currently
support specifying invariant memory objects only in the default address
space.

With this change, these intrinsics are overloaded for any address space
for memory objects
and we can use these llvm invariant intrinsics in non-default address
spaces.

Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)

This overloaded intrinsic is needed for representing final or invariant
memory in managed languages.

Reviewers: apilipenko, reames

Subscribers: llvm-commits

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276447 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-22 17:49:40 +00:00
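The overloaded name in the example from the commit above follows from the pointer's address space. A small sketch of that naming, assuming an i8 pointee as in the commit's example; the helper names `invariant_start_name` and `invariant_start_call` are hypothetical and are not part of LLVM:

```python
def invariant_start_name(addrspace):
    """Build the overloaded intrinsic name for an i8 pointer in addrspace."""
    return f"llvm.invariant.start.p{addrspace}i8"


def invariant_start_call(size, addrspace, ptr="%ptr"):
    """Format the call text in the style of the commit's example."""
    # The default address space (0) is written without an addrspace qualifier.
    ptr_ty = "i8*" if addrspace == 0 else f"i8 addrspace({addrspace})*"
    return f"{invariant_start_name(addrspace)}(i64 {size}, {ptr_ty} {ptr})"


print(invariant_start_call(4, 1))
```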
Matt Arsenault
c5a5706d17 AMDGPU: Remove redundant test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276439 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-22 17:01:36 +00:00
Matt Arsenault
9da217ee1e AMDGPU: Fix groupstaticsize for large LDS
The size can exceed s_movk_i32's limit, and we don't
want to use it this early since it inhibits optimizations.

This should probably be merged to the release branch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276438 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-22 17:01:33 +00:00
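The limit mentioned in the groupstaticsize commit above can be illustrated with a range check. This sketch assumes that s_movk_i32 takes a 16-bit sign-extended immediate; the helper name `fits_s_movk_i32` and the sample sizes are hypothetical.

```python
def fits_s_movk_i32(value):
    """Return True if value is representable as a signed 16-bit immediate."""
    return -(1 << 15) <= value <= (1 << 15) - 1


# Sample LDS sizes in bytes (illustrative values only).
for lds_size in (4096, 32767, 32768, 65536):
    print(lds_size, fits_s_movk_i32(lds_size))
```

Under that assumption, any LDS size of 32768 bytes or more falls outside the immediate's range.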
Matt Arsenault
30f0e3e4be AMDGPU: Add HSA dispatch id intrinsic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276437 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-22 17:01:30 +00:00
Matt Arsenault
7488ab3114 AMDGPU: Fix i1 fp_to_int
R600's i1 fp_to_uint selected but was incorrect according to
what instcombine constant folds to.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276435 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-22 17:01:21 +00:00
Anna Thomas
d89a69b5fd Revert "Invariant start/end intrinsics overloaded for address space"
This reverts commit r276316.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276320 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-21 19:06:28 +00:00
Anna Thomas
4227f92f58 Invariant start/end intrinsics overloaded for address space
Summary:
The llvm.invariant.start and llvm.invariant.end intrinsics currently
support specifying invariant memory objects only in the default address space.

With this change, these intrinsics are overloaded for any address space for memory objects
and we can use these llvm invariant intrinsics in non-default address spaces.

Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)

This overloaded intrinsic is needed for representing final or invariant memory in managed languages.

Reviewers: tstellarAMD, reames, apilipenko

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D22519

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276316 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-21 18:41:44 +00:00
Konstantin Zhuravlyov
82910c89dd [AMDGPU] Emit read-only data to .rodata for hsa
Differential Revision: https://reviews.llvm.org/D22538

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276298 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-21 15:59:23 +00:00
Matt Arsenault
a9994065f9 AMDGPU: Fix phis from blocks split due to register indexing
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276257 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-21 09:40:57 +00:00
Tim Northover
4951996d06 GlobalISel: implement low-level type with just size & vector lanes.
This should be all the low-level instruction selection needs to determine how
to implement an operation, with the remaining context taken from the opcode
(e.g. G_ADD vs G_FADD) or other flags not based on type (e.g. fast-math).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276158 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-20 19:09:30 +00:00
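The idea in the GlobalISel commit above, a type that records only size and vector lanes, can be sketched as a small value class. This is not the LLVM class: `LowLevelType` and its field names are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LowLevelType:
    """Hypothetical stand-in: a type described only by size and lane count."""
    size_in_bits: int   # size of a single element
    num_lanes: int = 1  # 1 means scalar

    def is_vector(self):
        return self.num_lanes > 1

    def total_bits(self):
        return self.size_in_bits * self.num_lanes


# A 32-bit integer and a 32-bit float map to the same low-level type;
# the opcode (e.g. G_ADD vs G_FADD) carries the remaining context.
int32_like = LowLevelType(32)
float_like = LowLevelType(32)
vec4x32 = LowLevelType(32, 4)
print(int32_like == float_like, vec4x32.is_vector(), vec4x32.total_bits())
```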
Matt Arsenault
20e6e25350 AMDGPU: Add missing test coverage for control flow breaks
None of the current lit tests hit si_break handling.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276129 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-20 15:20:35 +00:00
Yaxun Liu
59e8cabf31 AMDGPU: Fix bug causing crash due to invalid opencl version metadata.
Differential Revision: https://reviews.llvm.org/D22526

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276119 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-20 14:38:06 +00:00
Matthias Braun
e3d8cd87b2 Revert "RegScavenging: Add scavengeRegisterBackwards()"
Reverting this commit for now as it seems to be causing failures on
test-suite tests on the clang-ppc64le-linux-lnt bot.

This reverts commit r276044.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276068 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-20 00:21:32 +00:00
Matt Arsenault
63be72069d AMDGPU: Change fdiv lowering based on !fpmath metadata
If 2.5 ulp is acceptable, denormals are not required, and the
division isn't a reciprocal (which will already be handled),
replace it with a faster fdiv.

Simplify the lowering tests by using per function
subtarget features.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276051 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 23:16:53 +00:00
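The three conditions listed in the fdiv commit above can be written as a single predicate. This is a sketch of the decision as the message describes it, not of the backend code: the function name, parameter names, and returned labels are all hypothetical.

```python
def pick_fdiv_lowering(fpmath_ulp, denormals_required, is_reciprocal):
    """Choose a lowering label from the conditions in the commit message.

    fpmath_ulp:         accuracy allowed by !fpmath metadata, or None if absent
    denormals_required: whether denormal results must be supported
    is_reciprocal:      whether the operation has the form 1.0 / x
    """
    if is_reciprocal:
        # The message says reciprocals are already handled elsewhere.
        return "handled_elsewhere"
    if fpmath_ulp is not None and fpmath_ulp >= 2.5 and not denormals_required:
        return "fast_fdiv"
    return "default_fdiv"


print(pick_fdiv_lowering(2.5, False, False))
print(pick_fdiv_lowering(None, False, False))
```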
Matthias Braun
c5e14e0478 RegScavenging: Add scavengeRegisterBackwards()
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.

This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.

Differential Revision: http://reviews.llvm.org/D21885

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276044 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 22:37:09 +00:00
Matt Arsenault
4cead0b564 AMDGPU: Expand register indexing pseudos in custom inserter
This is to help move SILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275934 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 00:35:03 +00:00
Matt Arsenault
f36ea238a4 AMDGPU: Fix test name and broken CHECK-LABEL
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275928 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 23:09:51 +00:00
Matt Arsenault
bb09cfd86f AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275871 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 18:35:05 +00:00
Matt Arsenault
40ca91a07a AMDGPU/R600: Replace barrier intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275870 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 18:34:59 +00:00
Matt Arsenault
865e2fa1dc AMDGPU: Remove dead check in AMDGPUPromoteAlloca
This is currently only called with GEP users. A direct
alloca would only happen with current typed pointers
for arrays which are a perverse case.

Also fix crashes on 0 x and 1 x arrays.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275869 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 18:34:53 +00:00
Nicolai Haehnle
0c05ce4746 AMDGPU: Disable AMDGPUPromoteAlloca pass for shader calling conventions.
Summary:
The work item intrinsics are not available for the shader
calling conventions. And even if we did hook them up, most
shader stages have some extra restrictions on the amount
of available LDS.

Reviewers: tstellarAMD, arsenm

Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D20728

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275779 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 09:02:47 +00:00
Yaxun Liu
384c6423e5 Re-commit [AMDGPU] Add metadata for runtime
Attempting to fix lit test failure on ppc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275676 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-16 05:09:21 +00:00
Matt Arsenault
e066e581b1 AMDGPU: Fix verifier error from partially undef copy
In this situation:

%VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11,
%VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use>
%VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3
%VGPR4<def> = COPY %VGPR2

The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1,
but VGPR4 is defined immediately after this copy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275635 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 22:32:02 +00:00
Matt Arsenault
35290cc53d AMDGPU: Remove brev intrinsic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275620 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 21:27:13 +00:00
Matt Arsenault
5fecfa22e5 AMDGPU: Fix TargetPrefix for remaining r600 intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275619 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 21:27:08 +00:00
Matt Arsenault
a47e87a336 AMDGPU: Remove AMDGPU.ldexp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275618 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 21:26:56 +00:00
Matt Arsenault
7150fbf236 AMDGPU: Remove legacy rsq.clamped intrinsic
Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining.

Also fix mismatch with non-IEEE rsq selecting to IEEE rsq.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275617 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 21:26:52 +00:00
Vitaly Buka
a6cb7108c4 Revert "[AMDGPU] Add metadata for runtime"
This reverts commit r275566.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275599 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 19:14:57 +00:00
Yaxun Liu
6b0141c6fb [AMDGPU] Add metadata for runtime
Added emitting metadata to elf for runtime.

Runtime requires certain information (metadata) about kernels to be able to execute and query them. Such information is emitted to an elf section as a key-value pair stream.

Differential Revision: https://reviews.llvm.org/D21849

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275566 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 14:58:21 +00:00
Matt Arsenault
beff7fe056 AMDGPU: Fix not expanding control flow after some kill blocks
Also stop trying to insert skip blocks at end_cf. This
was inserting them at the end of the block which doesn't make
sense. The skip should be inserted at the beginning of the block
right after the end cf. Just remove this for now since no tests
seem to stress this and I think this can be handled more generally
later.

Fixes bug 28550

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275510 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 00:58:15 +00:00
Matt Arsenault
011dcf3d90 AMDGPU: Fix trying to skip from a block with no successors
Found while reducing bug 28550

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275509 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 00:58:13 +00:00
Matt Arsenault
435a4467a3 AMDGPU: Fix splitting kill blocks with defs before kill
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275508 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 00:58:09 +00:00
Matt Arsenault
4d120e9b24 AMDGPU/R600: Delete/rename intrinsics no longer used by mesa
Use the replacement pass to update the tests, and delete old names.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275375 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-14 05:47:17 +00:00
Matt Arsenault
759af1e5a2 AMDGPU: Remove unused intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275371 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-14 05:23:19 +00:00
Matt Arsenault
fb5f7807e0 AMDGPU: Fix test not actually testing anything
It wasn't actually running the pass, and since it is
missing the llvm prefix, the eh intrinsic was not
really an IntrinsicInst.

Also add missing test for lifetime markers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275370 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-14 05:23:15 +00:00
Quentin Colombet
3d35f0d482 [MIR] Print on the given output instead of stderr.
Currently the MIR framework prints all its outputs (errors and actual
representation) on stderr.

This patch fixes that by printing the regular output in the output
specified with -o.

Differential Revision: http://reviews.llvm.org/D22251

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275314 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-13 20:36:03 +00:00