Commit Graph

189 Commits

Author SHA1 Message Date
Tom Stellard
a9c6165732 AMDGPU/SI: Don't allow unaligned scratch access
Summary: The hardware doesn't support this.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25523

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284257 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-14 18:10:39 +00:00
Nicolai Haehnle
5d662038f2 AMDGPU: Fix use-after-frees
Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25312

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284215 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-14 09:03:04 +00:00
Konstantin Zhuravlyov
a91924b28a [AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external and global address space variables
Differential Revision: https://reviews.llvm.org/D25562


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284196 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-14 04:37:34 +00:00
Matt Arsenault
c31d80dbf5 AMDGPU: Assume spilling will occur at -O0
Because everything live is spilled at the end of a
block by fast regalloc, assume this will happen and
avoid the copies of the resource descriptor.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284119 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-13 13:10:00 +00:00
Matt Arsenault
3339279d4a AMDGPU: Fix truncate to bool warnings
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284116 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-13 12:45:16 +00:00
Matt Arsenault
7b6a558f24 AMDGPU: Initial implementation of VGPR indexing mode
This is the most basic handling of the indirect access
pseudos using GPR indexing mode. This currently only enables
the mode for a single v_mov_b32 and then disables it.
This is much more complicated to use than the movrel instructions,
so a new optimization pass is probably needed to fold the access
into the uses and keep the mode enabled for them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284031 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-12 18:49:05 +00:00
Matt Arsenault
548c83d355 AMDGPU: Refactor indirect vector lowering
Allow inserting multiple instructions in the
expanded loop.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283177 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-04 01:41:05 +00:00
Konstantin Zhuravlyov
f9bcd7b189 [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282624 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-28 20:05:39 +00:00
Matt Arsenault
de82da5521 AMDGPU: Fix broken FrameIndex handling
We were trying to avoid using a FrameIndex operand in non-pointer
operands in a convoluted way, and would break because of
using TargetFrameIndex. The TargetFrameIndex should only be used
in the case where it makes sense to fold it as part of the addressing
mode, otherwise it requires materialization like a normal constant.
This wasn't working reliably and failed in the added testcase, hitting
the assert when processing the frame index.

The TargetFrameIndex was coming from trying to produce an AssertZext
limiting the maximum stack size. I'm not sure this was correct to begin
with, because it is apparently possible to have a single workitem
dispatch that requires all 4G of private memory.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281824 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-17 16:09:55 +00:00
Matt Arsenault
8f824feb12 AMDGPU: Allow some control flow intrinsics to be CSEd
These clean up some unnecessary or instructions in
cases with complex loops.

In the original testcase I noticed this, the same
or with exec was repeated 5 or 6 times in a row. With
this only one is emitted or sometimes a copy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281786 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-16 22:11:18 +00:00
Tom Stellard
77361ae206 AMDGPU: Refactor kernel argument lowering
Summary:
The main challenge in lowering kernel arguments for AMDGPU is determing the
memory type of the argument.  The generic calling convention code assumes
that only legal register types can be stored in memory, but this is not the
case for AMDGPU.

This consolidates all the logic AMDGPU uses for deducing memory types into a single
function.  This will make it much easier to support different ABIs in the future.

Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24614

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281781 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-16 21:53:00 +00:00
Tom Stellard
1961591989 AMDGPU/SI: Add support for triples with the mesa3d operating system
Summary:
mesa3d will use the same kernel calling convention as amdhsa, but it will
handle everything else like the default 'unknown' OS type.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22783

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281779 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-16 21:34:26 +00:00
Matt Arsenault
8bc95d0a47 AMDGPU: Improve splitting 64-bit bit ops by constants
This addresses a TODO to handle operations besides and. This
also starts eliminating no-op operations with a constant that
can emerge later.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281488 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14 15:19:03 +00:00
Justin Lebar
c71d5b41ef [CodeGen] Split out the notions of MI invariance and MI dereferenceability.
Summary:
An IR load can be invariant, dereferenceable, neither, or both.  But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.

This patch splits up the notions of invariance and dereferenceability at
the MI level.  It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D23371

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281151 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-11 01:38:58 +00:00
Matt Arsenault
605a81a85c AMDGPU: Relax SGPR asm constraint register class
s should be SReg_32 to be as general as possible. This can avoid a copy
from m0.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280154 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-30 20:50:08 +00:00
Matt Arsenault
e52dfc95ef AMDGPU: Move cndmask pseudo to be isel pseudo
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279901 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-27 01:00:37 +00:00
NAKAMURA Takumi
805f0aacc0 Untabify.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279408 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-22 00:58:04 +00:00
Matt Arsenault
4da2d32371 AMDGPU: Remove dead option
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278965 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-17 20:07:16 +00:00
Justin Bogner
6673ea81f6 Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278902 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-17 05:10:15 +00:00
Matt Arsenault
b24aaff187 AMDGPU: Fix missing test for addressing mode with odd offsets
Add test if the constant offset looks unaligned.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278589 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-13 01:43:51 +00:00
Alina Sbirlea
4ebfadfe76 LoadStoreVectorizer: Remove TargetBaseAlign. Keep alignment for stack adjustments.
Summary:
TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses.
A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted.

Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures:
- x86 failing tests either did not have the right target, or the right alignment.
- NVPTX failing tests did not have the right alignment.
- AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info
  considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks
  for a maximum size allowed and relies on the next condition checking for %4 for correctness.
  This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list).

Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure),
so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context.

Reviewers: arsenm, jlebar, tstellarAMD

Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D23068

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277735 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-04 16:38:44 +00:00
Matt Arsenault
94166e75ac AMDGPU: fdiv -1, x -> rcp -x
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277535 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-02 22:25:04 +00:00
Matt Arsenault
4fd45ebabd AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior
This should really be true for any immediate, not just
inline ones.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277260 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-30 01:40:36 +00:00
Matthias Braun
f79c57a412 MachineFunction: Return reference for getFrameInfo(); NFC
getFrameInfo() never returns nullptr so we should use a reference
instead of a pointer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277017 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 18:40:00 +00:00
Wei Ding
ee8c4ca1e1 AMDGPU : Add intrinsics for compare with the full wavefront result
Differential Revision: http://reviews.llvm.org/D22482

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276998 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 16:42:13 +00:00
Matt Arsenault
b5a809e37c AMDGPU: Remove analyzeImmediate
This no longer uses the more complicated classification
of constants.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276945 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 00:32:02 +00:00
Matt Arsenault
d506595769 AMDGPU: Make AMDGPUMachineFunction fields private
ABIArgOffset is a problem because properly fsetting the
KernArgSize requires that the reserved area before the
real kernel arguments be correctly aligned, which requires
fixing clover.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276766 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-26 16:45:58 +00:00
Matt Arsenault
ee4cdb7b75 AMDGPU: Add fp legacy instruction intrinsics
This could use some additional optimization work
to use mad/mac legacy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276764 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-26 16:45:45 +00:00
Jan Vesely
4a44da0c82 AMDGPU: Remove read_workdim intrinsic
Differential revision: https://reviews.llvm.org/D22732

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276682 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-25 20:17:02 +00:00
Matt Arsenault
9da217ee1e AMDGPU: Fix groupstaticsize for large LDS
The size can exceed s_movk_i32's limit, and we don't
want to use it this early since it inhibits optimizations.

This should probably be merged to the release branch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276438 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-22 17:01:33 +00:00
Matt Arsenault
30f0e3e4be AMDGPU: Add HSA dispatch id intrinsic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276437 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-22 17:01:30 +00:00
Matt Arsenault
7c8be6eeb7 AMDGPU: Don't reinvent transferSuccessorsAndUpdatePHIs
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276434 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-22 17:01:15 +00:00
Matt Arsenault
a9994065f9 AMDGPU: Fix phis from blocks split due to register indexing
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276257 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-21 09:40:57 +00:00
Matt Arsenault
63be72069d AMDGPU: Change fdiv lowering based on !fpmath metadata
If 2.5 ulp is acceptable, denormals are not required, and
isn't a reciprocal which will already be handled, replace
with a faster fdiv.

Simplify the lowering tests by using per function
subtarget features.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276051 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 23:16:53 +00:00
Matt Arsenault
1ce58d721f AMDGPU: Only use legal inline immediates with kill pseudo
Only if the value is negative or positive is what matters,
so use a constant that doesn't require an instruction to
materialize.

These should really just emit the write exec directly,
but for stick with the kill pseudo-terminator.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275988 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 16:27:56 +00:00
Matt Arsenault
4cead0b564 AMDGPU: Expand register indexing pseudos in custom inserter
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275934 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 00:35:03 +00:00
Matt Arsenault
bb09cfd86f AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275871 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 18:35:05 +00:00
Matt Arsenault
7150fbf236 AMDGPU: Remove legacy rsq.clamped intrinsic
Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining.

Also fix mismatch with non-IEEE rsq selecting to IEEE rsq.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275617 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 21:26:52 +00:00
Justin Lebar
b2d6ad7cfd [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends.
Summary:
Instead, we take a single flags arg (a bitset).

Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.

This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted.  It also greatly simplifies the process of adding another flag
to getLoad.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D22249

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275592 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 18:27:10 +00:00
Matt Arsenault
435a4467a3 AMDGPU: Fix splitting kill blocks with defs before kill
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275508 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 00:58:09 +00:00
Matt Arsenault
759af1e5a2 AMDGPU: Remove unused intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275371 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-14 05:23:19 +00:00
Tom Stellard
fab569e180 AMDGPU/SI: Add support for R_AMDGPU_GOTPCREL
Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21484

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275268 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-13 14:23:33 +00:00
Matt Arsenault
e6aa9db488 AMDGPU: Fold out no-op kill intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275253 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-13 06:04:22 +00:00
Matt Arsenault
8a85be7236 AMDGPU: Follow up to r275203
I meant to squash this into it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275220 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-12 21:41:32 +00:00
Nicolai Haehnle
ac1e47a211 AMDGPU: Treat texture gather instructions more like other MIMG instructions
Summary:
Setting MIMG to 0 has a bunch of unexpected side effects, including that
isVMEM returns false which leads to incorrect treatment in the hazard
recognizer. The reason I noticed it is that it also leads to incorrect
treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug.

The only reason why MIMG was set to 0 is to signal the special handling of
dmasks, but that can be checked differently.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22210

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275113 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-11 21:59:43 +00:00
Matt Arsenault
762cdd4ae8 Revert "AMDGPU: Remove unused control flow intrinsic"
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274978 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-09 17:18:39 +00:00
Matt Arsenault
9ddc329c4e AMDGPU: Fix fdiv lowering when f32 denormals supported
Also fix test not actually using function labels.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274969 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-09 07:48:11 +00:00
Matt Arsenault
5e2ec03cf4 AMDGPU: Remove unused control flow intrinsic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274939 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-08 21:39:44 +00:00
Matt Arsenault
6e4fd1497d AMDGPU: Add feature for unaligned access
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274398 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-01 23:03:44 +00:00
Matt Arsenault
df587174eb AMDGPU: Improve load/store of illegal types.
There was a combine before to handle the simple copy case.
Split this into handling loads and stores separately.

We might want to change how this handles some of the vector
extloads, since this can result in large code size increases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274394 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-01 22:47:50 +00:00