105 Commits

Author SHA1 Message Date
Rafael Espindola
f39bd4dd4b Revert "AMDGPU: Add 32-bit constant address space"
This reverts commit r324487.

It broke clang tests.

llvm-svn: 324494
2018-02-07 18:09:35 +00:00
Marek Olsak
1556374f7f AMDGPU: Add 32-bit constant address space
Note: This is a candidate for LLVM 6.0, because it was planned to be
      in that release but was delayed due to a long review period.

Merge conflict in release_60 - resolution:
    Add "-p6:32:32" into the second (non-amdgiz) string.

Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.

Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC.
The tests cover all uses cases we need for Mesa.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D41651

llvm-svn: 324487
2018-02-07 16:01:00 +00:00
Tim Renouf
b1963b408e [AMDGPU] stop image_store being moved illegally
Summary:
A recent change
321556: AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores
can allow the machine instruction scheduler to move an image store past
an image load using the same descriptor.

V2: Fixed by marking image ops as mayAlias and isAliased. This may be
overly conservative, and we may need to revisit.
V3: Reverted test change done on 321556.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: llvm-commits, t-tye, yaxunl, wdng, kzhuravl

Differential Revision: https://reviews.llvm.org/D41969

llvm-svn: 322419
2018-01-12 22:57:24 +00:00
Matt Arsenault
4f8f93ccba AMDGPU: Use unique PSVs for buffer resources
Also fixes using the wrong memory type for some
intrinsics when custom lowering them.

llvm-svn: 321557
2017-12-29 17:18:21 +00:00
Matt Arsenault
0ed8acaf38 AMDGPU: Implement getTgtMemIntrinsic for images
Currently all images are lowered to have a single
image PseudoSourceValue. Image stores happen to have
overly strict mayLoad/mayStore/hasSideEffects flags
set on them, so this happens to work. When these
are fixed to be correct, the scheduler breaks
this because the identical PSVs are assumed to
be the same address. These need to be unique
to the image resource value.

llvm-svn: 321555
2017-12-29 17:18:14 +00:00
David Blaikie
45b647d5eb Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.

llvm-svn: 317647
2017-11-08 01:01:31 +00:00
Konstantin Zhuravlyov
04ae3039b5 AMDGPU: Fix warning discovered by r317266 [-Wunused-private-field]
llvm-svn: 317280
2017-11-02 22:35:22 +00:00
Konstantin Zhuravlyov
30c5230906 AMDGPU: Remove outdated fixme (it was already fixed)
llvm-svn: 317266
2017-11-02 20:48:06 +00:00
Tim Renouf
9e2adaa646 [AMDGPU] AMDPAL scratch buffer support
Summary:
Added support for scratch (including spilling) for OS type amdpal:
generates code to set up the scratch descriptor if it is needed.

With amdpal, the scratch resource descriptor is loaded from offset 0 of
the global information table. The low 32 bits of the address of the
global information table is passed in s0.

Added amdgpu-git-ptr-high function attribute to hard-wire the high 32
bits of the address of the global information table. If the function
attribute is not specified, or is 0xffffffff, then the backend generates
code to use the high 32 bits of pc.

The documentation for the AMDPAL ABI will be added in a later commit.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye

Differential Revision: https://reviews.llvm.org/D37483

llvm-svn: 314501
2017-09-29 09:49:35 +00:00
Jan Sjodin
242b2dcd0b Add AddresSpace to PseudoSourceValue.
Differential Revision: https://reviews.llvm.org/D35089

llvm-svn: 313297
2017-09-14 20:53:51 +00:00
Matt Arsenault
09562e957d AMDGPU: Start adding tail call support
Handle the sibling call cases.

llvm-svn: 310753
2017-08-11 20:42:08 +00:00
Eugene Zelenko
6ba0dce149 [AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 310328
2017-08-08 00:47:13 +00:00
Matt Arsenault
50852a087b AMDGPU: Pass special input registers to functions
llvm-svn: 309998
2017-08-03 23:00:29 +00:00
Matt Arsenault
c8961a5c4b AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to it
llvm-svn: 309783
2017-08-02 01:52:45 +00:00
Matt Arsenault
2442068eec AMDGPU: Annotate implicitarg.ptr usage
We need to pass something to functions for this to work.
It isn't derivable just from the kernarg segment pointer
because the implicit arguments are placed after the
kernel arguments.

Also fixes missing test for the intrinsic.

llvm-svn: 309398
2017-07-28 15:52:08 +00:00
Matt Arsenault
b8eb99e870 AMDGPU: Figure out private memory regs after lowering
Introduce pseudo-registers for registers needed for stack
access, which are replaced during finalizeLowering.
Note these pseudo-registers are currently only used for the
used register location, and not for determining their
input argument register.

This is better because it avoids the need to try to predict
whether a call will be emitted from the IR, and also
detects stack objects introduced by legalization.

Test changes are from the HasStackObjects check being more
accurate since stack objects introduced during legalization
are now known.

llvm-svn: 308325
2017-07-18 16:44:56 +00:00
Matt Arsenault
ca407a6bda AMDGPU: Annotate features from x work item/group IDs.
This wasn't necessary before since they are always enabled
for kernels, but this is necessary if they need to be
forwarded to a callable function.

llvm-svn: 308226
2017-07-17 22:35:50 +00:00
Matt Arsenault
9b6d8d16dd AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling
This should not be treated as a different version of
private_segment_buffer. These are distinct things with
different uses and register classes, and requires the
function argument info to have more context about the
function's type and environment.

Also add missing test coverage for the intrinsic, and
emit an error for HSA. This also encovers that the intrinsic
is broken unless there happen to be stack objects.

llvm-svn: 306264
2017-06-26 03:01:31 +00:00
Chandler Carruth
eb66b33867 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787
2017-06-06 11:49:48 +00:00
Matt Arsenault
ab4fb8ba2f AMDGPU: Start defining a calling convention
Partially implement callee-side for arguments and return values.
byval doesn't work properly, and most likely sret or other on-stack
return values most as well.

llvm-svn: 303308
2017-05-17 21:56:25 +00:00
Matt Arsenault
276b515440 AMDGPU: Add StackPtr and FramePtr registers to MFI
These will be necessary for setting up call sequences.

llvm-svn: 301208
2017-04-24 18:05:16 +00:00
Matt Arsenault
8611f8573b AMDGPU: Make MFI fields private
llvm-svn: 300596
2017-04-18 20:59:40 +00:00
Matt Arsenault
80352fe61c AMDGPU: Refactor argument lowering
Split into smaller functions and prepare for handling
non-entry functions.

llvm-svn: 299998
2017-04-11 22:29:24 +00:00
Matt Arsenault
8dbeb825a7 AMDGPU: Fix crash when disassembling VOP3 mac
The unused dummy src2_modifiers is missing, so it crashes
when trying to print it.

I tried to fully remove src2_modifiers, but there are some
irritations in the places where it is converted to mad since
it starts to require modifying use lists while iterating over
them.

llvm-svn: 299861
2017-04-10 17:58:06 +00:00
Matt Arsenault
85a1bec778 AMDGPU: Don't use stack space for SGPR->VGPR spills
Before frame offsets are calculated, try to eliminate the
frame indexes used by SGPR spills. Then we can delete them
after.

I think for now we can be sure that no other instruction
will be re-using the same frame indexes. It should be easy
to notice if this assumption ever breaks since everything
asserts if it tries to use a dead frame index later.

The unused emergency stack slot seems to still be left behind,
so an additional 4 bytes is still wasted.

llvm-svn: 295753
2017-02-21 19:12:08 +00:00
Tom Stellard
bbf29e433b AMDGPU add support for spilling to a user sgpr pointed buffers
Summary:
This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1].

Patch By: Dave Airlie

Reviewers: nhaehnle, arsenm, tstellarAMD

Reviewed By: arsenm

Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D25428

llvm-svn: 293000
2017-01-25 01:25:13 +00:00
Eugene Zelenko
10b1a2eda2 [AMDGPU] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 292688
2017-01-21 00:53:49 +00:00
Tom Stellard
5634d231ec AMDGPU/SI: Make a function const
llvm-svn: 290185
2016-12-20 17:26:34 +00:00
Tom Stellard
2c0dd4ec69 AMDGPU/SI: Add a MachineMemOperand when lowering llvm.amdgcn.buffer.load.*
Reviewers: arsenm, nhaehnle, mareko

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D27834

llvm-svn: 290184
2016-12-20 17:19:44 +00:00
Tom Stellard
ec757acb99 AMDGPU/SI: Add a MachineMemOperand to MIMG instructions
Summary:
Without a MachineMemOperand, the scheduler was assuming MIMG instructions
were ordered memory references, so no loads or stores could be reordered
across them.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D27536

llvm-svn: 290179
2016-12-20 15:52:17 +00:00
Konstantin Zhuravlyov
0da0753352 [AMDGPU] Wave and register controls
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov

Differential Revision: https://reviews.llvm.org/D21562

llvm-svn: 280747
2016-09-06 20:22:28 +00:00
Saleem Abdulrasool
b7be5c7c36 AMDGPU: fix mismatch tags, NFC
llvm-svn: 280006
2016-08-29 20:42:07 +00:00
Matt Arsenault
278cead855 AMDGPU: Remove unused tracking of flat instructions
llvm-svn: 278361
2016-08-11 17:15:28 +00:00
Matt Arsenault
1c48278fe0 AMDGPU: Make AMDGPUMachineFunction fields private
ABIArgOffset is a problem because properly fsetting the
KernArgSize requires that the reserved area before the
real kernel arguments be correctly aligned, which requires
fixing clover.

llvm-svn: 276766
2016-07-26 16:45:58 +00:00
Matt Arsenault
dc6f828a36 AMDGPU: Add HSA dispatch id intrinsic
llvm-svn: 276437
2016-07-22 17:01:30 +00:00
Marek Olsak
71790c948e AMDGPU/SI: Emit the number of SGPR and VGPR spills
Summary:
v2: don't count SGPRs spilled to scratch twice

I think this is sufficient. It doesn't count private memory usage, which
happens often and uses scratch but isn't technically a spill. The private
memory usage can be computed by:
  [scratch_per_thread - vgpr_spills - a random multiple of SGPR spills].

The fact SGPR spills add very high numbers to the scratch size make that
computation a guessing game, but I don't have a solution to that.

Reviewers: tstellarAMD

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D22197

llvm-svn: 275288
2016-07-13 17:35:15 +00:00
NAKAMURA Takumi
3d39df49e0 SIMachineFunctionInfo.cpp: Appease msc18 to use std::array.
llvm-svn: 273860
2016-06-27 10:26:43 +00:00
NAKAMURA Takumi
2aa0a231a2 Reformat blank lines.
llvm-svn: 273858
2016-06-27 10:26:25 +00:00
Konstantin Zhuravlyov
6dcbdd2d51 [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.

Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
  - offset 0: work group ID x
  - offset 4: work group ID y
  - offset 8: work group ID z
  - offset 16: work item ID x
  - offset 20: work item ID y
  - offset 24: work item ID z

Set
  - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
  - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
  - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled

Differential Revision: http://reviews.llvm.org/D20335

llvm-svn: 273769
2016-06-25 03:11:28 +00:00
Konstantin Zhuravlyov
480521dd43 [AMDGPU][NFC] Rename ReserveTrapVGPRs -> ReserveRegs
Differential Revision: http://reviews.llvm.org/D20081

llvm-svn: 270594
2016-05-24 18:37:18 +00:00
Konstantin Zhuravlyov
c01e46c011 [AMDGPU] Move reserved vgpr count for trap handler usage to SIMachineFunctionInfo + minor commenting changes
Differential Revision: http://reviews.llvm.org/D19537

llvm-svn: 267573
2016-04-26 17:24:40 +00:00
Matt Arsenault
b60850cb10 AMDGPU: Implement addrspacecast
llvm-svn: 267452
2016-04-25 19:27:24 +00:00
Tom Stellard
c0b7282ebc AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD.

This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions.

Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.

Reviewers: mareko, arsenm, tstellarAMD, nhaehnle

Subscribers: FireBurn, kerberizer, llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D18340

Patch By: Bas Nieuwenhuizen

llvm-svn: 266337
2016-04-14 16:27:07 +00:00
Tom Stellard
b9654f5566 AMDGPU/SI: Use the correct scratch wave offset register for shaders.
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders though,

The register should be the next SGPR after all user and system SGPR's.
We use that Mesa adds arguments for all input and system SGPR's and
take the next available SGPR for the scratch wave offset register.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Reviewers: mareko, arsenm, nhaehnle, tstellarAMD

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18941

Patch By: Bas Nieuwenhuizen

llvm-svn: 266336
2016-04-14 16:27:03 +00:00
Matt Arsenault
0c86db8f5c AMDGPU: R600 code splitting cleanup
Move a few functions only used by R600 to R600 specific code,
fix header macros to stop using R600, mark classes as final.

llvm-svn: 263204
2016-03-11 08:00:27 +00:00
Tom Stellard
c656bfbad2 AMDGPU/SI: Add support for spiling SGPRs to scratch buffer
Summary:
This is necessary for when we run out of VGPRs and can no
longer use v_{read,write}_lane for spilling SGPRs.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17592

llvm-svn: 262732
2016-03-04 18:31:18 +00:00
Matt Arsenault
37f2de7107 AMDGPU: Set flat_scratch from flat_scratch_init reg
This was hardcoded to the static private size, but this
would be missing the offset and additional size for someday
when we have dynamic sizing.

Also stops always initializing flat_scratch even when unused.

In the future we should stop emitting this unless flat instructions
are used to access private memory. For example this will initialize
it almost always on VI because flat is used for global access.

llvm-svn: 260658
2016-02-12 06:31:30 +00:00
Marek Olsak
aa3a6d7b3d AMDGPU/SI: Add s_waitcnt at the end of non-void functions
Summary:
v2: Make ReturnsVoid private, so that I can another 8 lines of code and
    look more productive.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16034

llvm-svn: 257622
2016-01-13 17:23:09 +00:00
Marek Olsak
9d874b8ed0 AMDGPU/SI: Add new target attribute InitialPSInputAddr
Summary:
This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM.
The register assigns VGPR locations to PS inputs, while the ENA register
determines whether or not they are loaded.

Mesa needs to set some inputs as not-movable, so that a pixel shader prolog
binary appended at the beginning can assume where some inputs are.

v2: Make PSInputAddr private, because there is never enough silly getters
    and setters for people to read.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16030

llvm-svn: 257591
2016-01-13 11:45:36 +00:00
Matt Arsenault
08cdb00306 AMDGPU: Rework how private buffer passed for HSA
If we know we have stack objects, we reserve the registers
that the private buffer resource and wave offset are passed
and use them directly.

If not, reserve the last 5 SGPRs just in case we need to spill.
After register allocation, try to pick the next available registers
instead of the last SGPRs, and then insert copies from the inputs
to the reserved registers in the progloue.

This also only selectively enables all of the input registers
which are really required instead of always enabling them.

llvm-svn: 254331
2015-11-30 21:16:03 +00:00