231 Commits

Author SHA1 Message Date
Matt Arsenault
44ee20512e AMDGPU: Add additional MIR tests for exec mask optimizations
Also includes one example of how this transform is unsound. This isn't
verifying the copies are used in the control flow intrinisic patterns.

Also add option to disable exec mask opt pass. Since this pass is
unsound, it may be useful to turn it off until it is fixed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357091 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-27 16:58:30 +00:00
Matt Arsenault
951c9d9b26 CodeGen: Refactor regallocator command line and target selection
This will allow targets more flexibility to replace the
register allocator core passes. In a future commit,
AMDGPU will run the core register assignment passes
twice, and will also want to disallow using the
standard -regalloc option.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356506 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-19 19:33:12 +00:00
Neil Henning
89fc4394cb [AMDGPU] Add an experimental buffer fat pointer address space.
Add an experimental buffer fat pointer address space that is currently
unhandled in the backend. This commit reserves address space 7 as a
non-integral pointer repsenting the 160-bit fat pointer (128-bit buffer
descriptor + 32-bit offset) that is heavily used in graphics workloads
using the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D58957

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356373 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-18 14:44:28 +00:00
Matt Arsenault
2066fb5fc9 AMDGPU: Partially fix default device for HSA
There are a few different issues, mostly stemming from using
generation based checks for anything instead of subtarget
features. Stop adding flat-address-space as a feature for HSA, as it
should only be a device property. This was incorrectly allowing flat
instructions to select for SI.

Increase the default generation for HSA to avoid the encoding error
when emitting objects. This has some other side effects from various
checks which probably should be separate subtarget features (in the
cost model and for dealing with the DS offset folding issue).

Partial fix for bug 41070. It should probably be an error to try using
amdhsa without flat support.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356347 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-17 21:31:35 +00:00
Matt Arsenault
d8706fcd74 MIR: Allow targets to serialize MachineFunctionInfo
This has been a very painful missing feature that has made producing
reduced testcases difficult. In particular the various registers
determined for stack access during function lowering were necessary to
avoid undefined register errors in a large percentage of
cases. Implement a subset of the important fields that need to be
preserved for AMDGPU.

Most of the changes are to support targets parsing register fields and
properly reporting errors. The biggest sort-of bug remaining is for
fields that can be initialized from the IR section will be overwritten
by a default initialized machineFunctionInfo section. Another
remaining bug is the machineFunctionInfo section is still printed even
if empty.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356215 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-14 22:54:43 +00:00
Aakanksha Patil
67bb6a3b71 AMDGPU: Handle "uniform-work-group-size" attribute (fix for RADV)
A previous patch for "uniform-work-group-size" attribute was found to break
some RADV and possibly radeon SI tests and had to be retracted.
This patch fixes that.

Differential Revision: http://reviews.llvm.org/D58993



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355574 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-07 00:54:04 +00:00
Matt Arsenault
42e25a8fd0 AMDGPU: Fix typo
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355056 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-28 00:52:33 +00:00
Matt Arsenault
df3568d8a9 AMDGPU: Enable function calls by default
Fixes some crashes on illegal call situations which are unfortunately
still valid IR.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355051 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-28 00:40:32 +00:00
Matt Arsenault
0410b9ebcc AMDGPU: Remove debugger related subtarget features
As far as I know these aren't needed anymore.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@354634 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-21 23:27:46 +00:00
Valery Pykhtin
a0ecdf4bba [AMDGPU] Enable DPP combiner pass by default.
Related revisions: https://reviews.llvm.org/D55444, https://reviews.llvm.org/D55314

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353691 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-11 11:15:03 +00:00
Chandler Carruth
6b547686c5 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-19 08:50:56 +00:00
David Stuttard
1da08e2514 [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

This re-submit of the change also includes a slight modification in
SIISelLowering.cpp to work-around a compiler bug for the powerpc_le
platform that caused a buildbot failure on a previous submission.

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda


Work around for ppcle compiler bug

Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351054 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-14 11:55:24 +00:00
Aakanksha Patil
cc31a27f1e Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attribute
This patch breaks RADV (and probably RadeonSI as well)



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@349084 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-13 21:23:12 +00:00
Aakanksha Patil
b20ee3547f [AMDGPU] Support for "uniform-work-group-size" attribute
Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate weather or not they have uniform work group size depending on the value of the attribute. 

Differential Revision: https://reviews.llvm.org/D50200



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348971 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-12 20:49:17 +00:00
Tim Corringham
c9d081f818 [AMDGPU] Add new Mode Register pass
A new pass to manage the Mode register.

Currently this just manages the floating point double precision
rounding requirements, but is intended to be easily extended to
encompass all Mode register settings.

The immediate motivation comes from the requirement to use the
round-to-zero rounding mode for the 16 bit interpolation
instructions, where the rounding mode setting is shared between
16 and 64 bit operations.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348754 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-10 12:06:10 +00:00
David Green
7135d8b482 [Targets] Add errors for tiny and kernel codemodel on targets that don't support them
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.

Differential Revision: https://reviews.llvm.org/D50141


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348585 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-07 12:10:23 +00:00
Valery Pykhtin
92a20faea1 [AMDGPU] Partial revert of rL348371: Turn on the DPP combiner by default
Turn the combiner back off as there're failures until the issue is fixed.

Differential revision: https://reviews.llvm.org/D55314

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348487 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-06 14:20:02 +00:00
Valery Pykhtin
62a4268c77 [AMDGPU]: Turn on the DPP combiner by default
Differential revision: https://reviews.llvm.org/D55314

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348371 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-05 15:21:17 +00:00
Valery Pykhtin
d339265d52 [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3)
Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses.

Differential revision: https://reviews.llvm.org/D53762

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347993 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-30 14:21:56 +00:00
David Stuttard
cb049f818d Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"
Also revert fix r347876

One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347911 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 20:14:17 +00:00
David Stuttard
7fd87699a5 Add support for TFE/LWE in image intrinsics
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347871 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 15:21:13 +00:00
Matt Arsenault
44412401b3 AMDGPU: Don't optimize exec masks at -O0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347573 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-26 17:02:02 +00:00
Ron Lieberman
322a8075ef [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST
Add a pass to fixup various vector ISel issues.
Currently we handle converting GLOBAL_{LOAD|STORE}_*
and GLOBAL_Atomic_* instructions into their _SADDR variants.
This involves feeding the sreg into the saddr field of the new instruction.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347008 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-16 01:13:34 +00:00
Matt Arsenault
0cb12ca8f6 Allow subclassing ExternalAA
This allows testing AMDGPU alias analysis like any
other alias analysis pass. This fixes the existing
test pointlessly running opt -O3 when it really
just wants to run the one analysis.

Before there was no way to test this using -aa-eval
with opt, since the default constructed pass
is run. The wrapper subclass allows the
default constructor to pass the necessary callback.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346353 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 20:26:42 +00:00
Nicolai Haehnle
776a459079 AMDGPU: Rewrite SILowerI1Copies to always stay on SALU
Summary:
Instead of writing boolean values temporarily into 32-bit VGPRs
if they are involved in PHIs or are observed from outside a loop,
we use bitwise masking operations to combine lane masks in a way
that is consistent with wave control flow.

Move SIFixSGPRCopies to before this pass, since that pass
incorrectly attempts to move SGPR phis to VGPRs.

This should recover most of the code quality that was lost with
the bug fix in "AMDGPU: Remove PHI loop condition optimization".

There are still some relevant cases where code quality could be
improved, in particular:

- We often introduce redundant masks with EXEC. Ideally, we'd
  have a generic computeKnownBits-like analysis to determine
  whether masks are already masked by EXEC, so we can avoid this
  masking both here and when lowering uniform control flow.

- The criterion we use to determine whether a def is observed
  from outside a loop is conservative: it doesn't check whether
  (loop) branch conditions are uniform.

Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b

Reviewers: arsenm, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D53496

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345719 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-31 13:27:08 +00:00
Scott Linder
92794ee479 [AMDGPU] Add a pass to promote bitcast calls
AMDGPU currently only supports direct calls, but at lower optimisation levels it
fails to lower statically direct calls which appear indirect due to a bitcast.

Add a pass to visit all CallSites and use CallPromotionUtils to "devirtualize"
calls.

Differential Revision: https://reviews.llvm.org/D52741


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345382 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-26 13:18:36 +00:00
Neil Henning
b461f4de29 [AMDGPU] Add an AMDGPU specific atomic optimizer.
This commit adds a new IR level pass to the AMDGPU backend to perform
atomic optimizations. It works by:

- Running through a function and finding atomicrmw add/sub or uses of
  the atomic buffer intrinsics for add/sub.
- If all arguments except the value to be added/subtracted are uniform,
  record the value to be optimized.
- Run through the atomic operations we can optimize and, depending on
  whether the value is uniform/divergent use wavefront wide operations
  (DPP in the divergent case) to calculate the total amount to be
  atomically added/subtracted.
- Then let only a single lane of each wavefront perform the atomic
  operation, reducing the total number of atomic operations in flight.
- Lastly we recombine the result from the single lane to each lane of
  the wavefront, and calculate our individual lanes offset into the
  final result.

Differential Revision: https://reviews.llvm.org/D51969

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343973 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-08 15:49:19 +00:00
Matt Arsenault
47e2c38609 AMDGPU: Always run AMDGPUAlwaysInline
Even if calls are enabled, it still needs to be run
for forcing inline of functions that use LDS.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343657 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-03 02:47:25 +00:00
Matt Arsenault
b3a02e1925 AMDGPU: Expand atomicrmw nand in IR
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343559 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-02 03:50:56 +00:00
Sameer Sahasrabuddhe
116128c1c0 [AMDGPU] restore r342722 which was reverted with r342743
[AMDGPU] lower-switch in preISel as a workaround for legacy DA

Summary:
The default target of the switch instruction may sometimes be an
"unreachable" block, when it is guaranteed that one of the cases is
always taken. The dominator tree concludes that such a switch
instruction does not have an immediate post dominator. This confuses
divergence analysis, which is unable to propagate sync dependence to
the targets of the switch instruction.

As a workaround, the AMDGPU target now invokes lower-switch as a
preISel pass. LowerSwitch is designed to handle the unreachable
default target correctly, allowing the divergence analysis to locate
the correct immediate dominator of the now-lowered switch.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342956 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-25 09:39:21 +00:00
Sameer Sahasrabuddhe
0ccb4cd734 revert changes from r342722
"[AMDGPU] lower-switch in preISel as a workaround for legacy DA"

This broke regression tests. The first breakage was noticed here:
http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/23549


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342743 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-21 16:31:51 +00:00
Sameer Sahasrabuddhe
5b5e532790 [AMDGPU] lower-switch in preISel as a workaround for legacy DA
Summary:
The default target of the switch instruction may sometimes be an
"unreachable" block, when it is guaranteed that one of the cases is
always taken. The dominator tree concludes that such a switch
instruction does not have an immediate post dominator. This confuses
divergence analysis, which is unable to propagate sync dependence to
the targets of the switch instruction.

As a workaround, the AMDGPU target now invokes lower-switch as a
preISel pass. LowerSwitch is designed to handle the unreachable
default target correctly, allowing the divergence analysis to locate
the correct immediate dominator of the now-lowered switch.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll

Differential Revision: https://reviews.llvm.org/D52221

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342722 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-21 11:26:55 +00:00
Matt Arsenault
c8c005cb52 AMDGPU: Stop forcing internalize at -O0
This doesn't really matter if clang is always emitting
the visibility as hidden by default.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341168 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-31 06:02:36 +00:00
Matt Arsenault
7e212e4168 AMDGPU: Remove remnants of old address space mapping
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341165 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-31 05:49:54 +00:00
Mark Searles
249b255c78 run post-RA hazard recognizer pass late
Memory legalizer, waitcnt, and shrink  passes can perturb the instructions,
which means that the post-RA hazard recognizer pass should run after them.
Otherwise, one of those passes may invalidate the work done by the hazard
recognizer. Note that this has adverse side-effect that any consecutive
S_NOP 0's, emitted by the hazard recognizer, will not be shrunk into a
single S_NOP <N>. This should be addressed in a follow-on patch.

Differential Revision: https://reviews.llvm.org/D49288

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@337154 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-16 10:02:41 +00:00
Tom Stellard
1d6fd076a3 AMDGPU: Refactor Subtarget classes
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
  AMDGPUSubtarget::Generation.

Reviewers: arsenm, jvesely

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D49037

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336851 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-11 20:59:01 +00:00
Matt Arsenault
e07c9538b5 Reapply "AMDGPU: Force inlining if LDS global address is used"
This reverts commit r336623

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336675 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-10 14:03:41 +00:00
Vlad Tsyrklevich
3bda4ed0ec Revert "AMDGPU: Force inlining if LDS global address is used"
This reverts commit r336587, it was causing test failures on the
sanitizer bots.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336623 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-10 00:46:07 +00:00
Matt Arsenault
12d30e1e27 AMDGPU: Force inlining if LDS global address is used
These won't work for the forseeable future. These aren't allowed
from OpenCL, but IPO optimizations can make them appear.

Also directly set the attributes on functions, regardless
of the linkage rather than cloning functions like before.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336587 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-09 19:22:22 +00:00
Stanislav Mekhanoshin
0e1a98e255 [AMDGPU] Enable LICM in the BE pipeline
This allows to hoist code portion to compute reciprocal of loop
invariant denominator in integer division after codegen prepare
expansion.

Differential Revision: https://reviews.llvm.org/D48604

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335988 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-29 16:26:53 +00:00
Matt Arsenault
a2ba13d731 AMDGPU: Add pass to lower kernel arguments to loads
This replaces most argument uses with loads, but for
now not all.

The code in SelectionDAG for calling convention lowering
is actively harmful for amdgpu_kernel. It attempts to
split the argument types into register legal types, which
results in low quality code for arbitary types. Since
all kernel arguments are passed in memory, we just want the
raw types.

I've tried a couple of methods of mitigating this in SelectionDAG,
but it's easier to just bypass this problem alltogether. It's
possible to hack around the problem in the initial lowering,
but the real problem is the DAG then expects to be able to use
CopyToReg/CopyFromReg for uses of the arguments outside the block.

Exposing the argument loads in the IR also has the advantage
that the LoadStoreVectorizer can merge them.

I'm not sure the best approach to dealing with the IR
argument list is. The patch as-is just leaves the IR arguments
in place, so all the existing code will still compute the same
kernarg size and pointlessly lowers the arguments.

Arguably the frontend should emit kernels with an empty argument
list in the first place. Alternatively a dummy array could be
inserted as a single argument just to reserve space.

This does have some disadvantages. Local pointer kernel arguments can
no longer have AssertZext placed  on them as the equivalent !range
metadata is not valid on pointer  typed loads. This is mostly bad
for SI which needs to know about the known bits in order to use the
DS instruction offset, so in this case this is not done.

More importantly, this skips noalias arguments since this pass
does not yet convert this to the equivalent !alias.scope and !noalias
metadata. Producing this metadata correctly seems to be tricky,
although this logically is the same as inlining into a function which
doesn't exist. Additionally, exposing these loads to the vectorizer
may result in degraded aliasing information if a pointer load is
merged with another argument load.

I'm also not entirely sure this is preserving the current clover
ABI, although I would greatly prefer if it would stop widening
arguments and match the HSA ABI. As-is I think it is extending
< 4-byte arguments to 4-bytes but doesn't align them to 4-bytes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335650 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-26 19:10:00 +00:00
Stanislav Mekhanoshin
e5240df77a [AMDGPU] Construct memory clauses before RA
Memory clauses are formed into bundles in presence of xnack.
Their source operands are marked as early-clobber.

This allows to allocate distinct source and destination registers
within a clause and prevent breaking the clause with s_nop in the
hazard recognizer.

Clauses are undone before post-RA scheduler to allow some rescheduling,
which will not break the clause since artificial edges are created in
the dag to keep memory operations together. Yet this allows a better
ILP in some cases.

Differential Revision: https://reviews.llvm.org/D47511

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333691 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-31 20:13:51 +00:00
Tom Stellard
e0c801c31a AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTI
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47359

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333605 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-30 22:55:35 +00:00
Matt Arsenault
c40f49e881 AMDGPU: Fix typo in option description
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333457 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-29 19:35:46 +00:00
Mark Searles
d900d78107 [AMDGPU][Waitcnt] Remove obsolete waitcnt option
With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it.

Differential Revision: https://reviews.llvm.org/D47378

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333303 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-25 20:24:08 +00:00
Matt Arsenault
cdbb0ae52b AMDGPU: Add pass to optimize reqd_work_group_size
Eliminate loads from the dispatch packet when they will have
a known value.

Also pattern match the code used by the library to handle partial
workgroup dispatches, which isn't necessary if reqd_work_group_size
is used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332771 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-18 21:35:00 +00:00
Matt Arsenault
0d44e5b362 AMDGPU: Rename OpenCL lowering pass to be R600 specific.
This pass is
  a) broken.
  b) r600 specific.

Fixing (a) is a bit more non-trivial, but fixing (b)
is easy. Move this pass to being R600 only for now.

This pass does pass all the unit tests, however clang
no longer generates code that looks like the unit test
input, so fixing the pass requires fixing the tests and
the pass as one, and checking it works with clang still.

Patch by Dave Airlie

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332196 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-13 10:04:48 +00:00
Mark Searles
ac95aa5361 [AMDGPU][Waitcnt] Remove the old waitcnt pass
Remove the old waitcnt pass ( si-insert-waits ), which is no longer maintained
and getting crufty

Differential Revision: https://reviews.llvm.org/D46448

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331641 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-07 14:43:28 +00:00
Adrian Prantl
26b584c691 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331272 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-01 15:54:18 +00:00
Tom Stellard
70b7270658 AMDGPU: Initialize GlobalISel passes
Summary:
This fixes AMDGPU GlobalISel test failures when enabling the AMDGPU
target without any other targets that use GlobalISel.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D45353

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329588 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-09 16:09:13 +00:00