30 Commits

Author SHA1 Message Date
Matt Arsenault
a2ba13d731 AMDGPU: Add pass to lower kernel arguments to loads
This replaces most argument uses with loads, but for
now not all.

The code in SelectionDAG for calling convention lowering
is actively harmful for amdgpu_kernel. It attempts to
split the argument types into register legal types, which
results in low quality code for arbitary types. Since
all kernel arguments are passed in memory, we just want the
raw types.

I've tried a couple of methods of mitigating this in SelectionDAG,
but it's easier to just bypass this problem alltogether. It's
possible to hack around the problem in the initial lowering,
but the real problem is the DAG then expects to be able to use
CopyToReg/CopyFromReg for uses of the arguments outside the block.

Exposing the argument loads in the IR also has the advantage
that the LoadStoreVectorizer can merge them.

I'm not sure the best approach to dealing with the IR
argument list is. The patch as-is just leaves the IR arguments
in place, so all the existing code will still compute the same
kernarg size and pointlessly lowers the arguments.

Arguably the frontend should emit kernels with an empty argument
list in the first place. Alternatively a dummy array could be
inserted as a single argument just to reserve space.

This does have some disadvantages. Local pointer kernel arguments can
no longer have AssertZext placed  on them as the equivalent !range
metadata is not valid on pointer  typed loads. This is mostly bad
for SI which needs to know about the known bits in order to use the
DS instruction offset, so in this case this is not done.

More importantly, this skips noalias arguments since this pass
does not yet convert this to the equivalent !alias.scope and !noalias
metadata. Producing this metadata correctly seems to be tricky,
although this logically is the same as inlining into a function which
doesn't exist. Additionally, exposing these loads to the vectorizer
may result in degraded aliasing information if a pointer load is
merged with another argument load.

I'm also not entirely sure this is preserving the current clover
ABI, although I would greatly prefer if it would stop widening
arguments and match the HSA ABI. As-is I think it is extending
< 4-byte arguments to 4-bytes but doesn't align them to 4-bytes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335650 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-26 19:10:00 +00:00
Dmitry Preobrazhensky
6bc93b9a3b [AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev}
See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765

Reviewers: tamazov, SamWot, arsenm, vpykhtin

Differential Revision: https://reviews.llvm.org/D40088

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318675 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-20 18:24:21 +00:00
Matt Arsenault
a038a8340c AMDGPU: Allow SIShrinkInstructions to work in non-SSA
Immediates can be folded as long as the immediate is a vreg.

Also undo commuting instructions if it didn't fold an immediate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307575 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-10 19:53:57 +00:00
Matt Arsenault
8763b3ac42 AMDGPU: Add macro fusion schedule DAG mutation
Try to increase opportunities to shrink vcc uses.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307313 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-06 20:57:05 +00:00
Alexander Timofeev
f9e9586c80 [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307097 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-04 17:32:00 +00:00
NAKAMURA Takumi
0a256123a4 Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"
It broke a testcase.

  Failing Tests (1):
      LLVM :: CodeGen/AMDGPU/alignbit-pat.ll

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307054 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-04 02:14:18 +00:00
Alexander Timofeev
0f9ec97238 [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@307026 91177308-0d34-0410-b5e6-96231b3b80d8
2017-07-03 14:54:11 +00:00
Stanislav Mekhanoshin
69edad7913 [AMDGPU] Narrow lshl from 64 to 32 bit if possible
Turn expensive 64 bit shift into 32 bit if shift does not overflow int:
shl (ext x) => zext (shl x)

Differential Revision: https://reviews.llvm.org/D33367

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303569 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-22 16:58:10 +00:00
Sam Kolton
3481b0868e [AMDGPU] Resubmit SDWA peephole: enable by default
Reviewers: vpykhtin, rampitec, arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31671

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299654 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 15:03:28 +00:00
Ivan Krasin
f836c7dbde Revert r299536. [AMDGPU] SDWA peephole: enable by default.
Reason: breaks multiple bots:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173

Original Review URL: https://reviews.llvm.org/D31671



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299583 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-05 19:58:12 +00:00
Sam Kolton
4523389543 [AMDGPU] SDWA peephole: enable by default
Reviewers: vpykhtin, rampitec, arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31671

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299536 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-05 12:00:45 +00:00
Matt Arsenault
d706d030af AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298444 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-21 21:39:51 +00:00
Matt Arsenault
d019e8638a Enable FeatureFlatForGlobal on Volcanic Islands
This switches to the workaround that HSA defaults to
for the mesa path.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292982 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-24 22:02:15 +00:00
Matt Arsenault
e2b3286a26 AMDGPU: Invert cmp + select with constant
Canonicalize a select with a constant to the false side. This
enables more instruction shrinking opportunities since an
inline immediate can be used for the false side of v_cndmask_b32_e32.

This seems to usually be better but causes some code size regressions
in some tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290372 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-22 21:40:08 +00:00
Tom Stellard
0deee390af AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286464 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-10 16:02:37 +00:00
Tom Stellard
ae152bde49 Revert "AMDGPU: Add VI i16 support"
This reverts commit r285939 and r285948.  These broke some conformance tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285995 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-04 13:06:34 +00:00
Tom Stellard
7c173dd5fa AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285939 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-03 17:13:50 +00:00
Konstantin Zhuravlyov
a2791e7e20 [AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32
This will prevent following regression when enabling i16 support (D18049):

test/CodeGen/AMDGPU/ctlz.ll
test/CodeGen/AMDGPU/ctlz_zero_undef.ll

Differential Revision: https://reviews.llvm.org/D25802


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285716 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 17:49:33 +00:00
Konstantin Zhuravlyov
2d50d3f3c9 [AMDGPU] Promote uniform (i1, i16] operations to i32
Differential Revision: https://reviews.llvm.org/D25302


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283555 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-07 14:22:58 +00:00
Matt Arsenault
7eba65d30c AMDGPU: Use unsigned compare for eq/ne
For some reason there are both of these available, except
for scalar 64-bit compares which only has u64. I'm not sure
why there are both (I'm guessing it's for the one bit inputs we
don't use), but for consistency always using the
unsigned one.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282832 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-30 01:50:20 +00:00
Konstantin Zhuravlyov
f9bcd7b189 [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282624 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-28 20:05:39 +00:00
Matt Arsenault
3843a1382e AMDGPU: Fix immediate folding logic when shrinking instructions
If the literal is being folded into src0, it doesn't matter
if it's an SGPR because it's being replaced with the literal.

Also fixes initially selecting 32-bit versions of some instructions
which also confused commuting.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281117 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-09 23:32:53 +00:00
Matt Arsenault
d764af3c4e AMDGPU: Support commuting with immediate in src0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280970 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-08 17:19:29 +00:00
Matthias Braun
0791b66fef AMDGPU: Define a schedule class for COPY.
COPY was lacking a scheduling class, define it to avoid regressions in
the upcoming change to the bidirectional MachineScheduler. Approved by
tstellar on IRC.

Differential Revision: http://reviews.llvm.org/D21540

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273751 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-24 23:52:11 +00:00
Tom Stellard
d3adac51fc AMDGPU/SI: Enable lanemask tracking in misched
Summary:
This results in higher register usage, but should make it easier for
the compiler to hide latency.

This pass is a prerequisite for some more scheduler improvements, and I
think the increase register usage with this patch is acceptable, because
when combined with the scheduler improvements, the total register usage
will decrease.

shader-db stats:

2382 shaders in 478 tests
Totals:
SGPRS: 48672 -> 49088 (0.85 %)
VGPRS: 34148 -> 34847 (2.05 %)
Code Size: 1285816 -> 1289128 (0.26 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 492544 -> 573440 (16.42 %) bytes per wave
Max Waves: 6856 -> 6846 (-0.15 %)
Wait states: 0 -> 0 (0.00 %)

Depends on D18451

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18452

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264876 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 16:35:09 +00:00
Tom Stellard
abf168408a [AMDGPU] Assembler: Swap operands of flat_store instructions to match AMD assembler
Historically, AMD internal sp3 assembler has flat_store* addr, data
format. To match existing code and to enable reuse, change LLVM
definitions to match.  Also update MC and CodeGen tests.

Differential Revision: http://reviews.llvm.org/D16927

Patch by: Nikolay Haustov

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260694 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-12 17:57:54 +00:00
Matt Arsenault
68f559ea61 AMDGPU: Fix ctlz combine for sub 32-bit types
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257353 91177308-0d34-0410-b5e6-96231b3b80d8
2016-01-11 17:02:06 +00:00
Matt Arsenault
f12a12cd25 AMDGPU: Pattern match ffbh pattern to instruction.
The hardware instruction's output on 0 is -1 rather than 32.
Eliminate a test and select to -1. This removes an extra instruction
from the compatability function with HSAIL's firstbit instruction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257352 91177308-0d34-0410-b5e6-96231b3b80d8
2016-01-11 17:02:00 +00:00
Matt Arsenault
01a6cb6ce3 AMDGPU: Custom lower i64 ctlz
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257348 91177308-0d34-0410-b5e6-96231b3b80d8
2016-01-11 16:50:29 +00:00
Matt Arsenault
3bbc287300 LegalizeDAG: Expand ctlz with ctlz_zero_undef if legal
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257345 91177308-0d34-0410-b5e6-96231b3b80d8
2016-01-11 16:37:46 +00:00