Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata.
However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html).
This patch lets AMDGPU backend emits runtime metadata as a note element in .note section.
Differential Revision: https://reviews.llvm.org/D25781
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286502 91177308-0d34-0410-b5e6-96231b3b80d8
These ones need to have the size on the pseudo instruction set for
getInstSizeInBytes to work correctly. These also have a statically
known size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283437 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
mesa3d will use the same kernel calling convention as amdhsa, but it will
handle everything else like the default 'unknown' OS type.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22783
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281779 91177308-0d34-0410-b5e6-96231b3b80d8
OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden argument should be included in the runtime metadata. Also updated kernel argument kind metadata.
Differential Revision: https://reviews.llvm.org/D23424
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280829 91177308-0d34-0410-b5e6-96231b3b80d8
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch
Patch by Tom Stellard and Konstantin Zhuravlyov
Differential Revision: https://reviews.llvm.org/D21562
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280747 91177308-0d34-0410-b5e6-96231b3b80d8
ABIArgOffset is a problem because properly fsetting the
KernArgSize requires that the reserved area before the
real kernel arguments be correctly aligned, which requires
fixing clover.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276766 91177308-0d34-0410-b5e6-96231b3b80d8
Remove dead code from r600 intrinsic removal.
Remove unset members, rename StackSize to be less ambiguous.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276436 91177308-0d34-0410-b5e6-96231b3b80d8
Added emitting metadata to elf for runtime.
Runtime requires certain information (metadata) about kernels to be able to execute and query them. Such information is emitted to an elf section as a key-value pair stream.
Differential Revision: https://reviews.llvm.org/D21849
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275566 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
v2: don't count SGPRs spilled to scratch twice
I think this is sufficient. It doesn't count private memory usage, which
happens often and uses scratch but isn't technically a spill. The private
memory usage can be computed by:
[scratch_per_thread - vgpr_spills - a random multiple of SGPR spills].
The fact SGPR spills add very high numbers to the scratch size make that
computation a guessing game, but I don't have a solution to that.
Reviewers: tstellarAMD
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D22197
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275288 91177308-0d34-0410-b5e6-96231b3b80d8
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.
Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
- offset 0: work group ID x
- offset 4: work group ID y
- offset 8: work group ID z
- offset 16: work item ID x
- offset 20: work item ID y
- offset 24: work item ID z
Set
- amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
- amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
- amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled
Differential Revision: http://reviews.llvm.org/D20335
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273769 91177308-0d34-0410-b5e6-96231b3b80d8
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273652 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Version 2 is now the default. If you want to emit version 1, use
the amdgcn--amdhsa-amdcov1 triple.
Reviewers: arsenm, kzhuravl
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19283
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@268647 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
A shader stored the live mask (initial exec mask) in an SGPR which was then
spilled during register allocation. The allocator quite reasonably
optimized turned the spill into
v_writelane_b32 %vgpr, exec_lo, N
v_writelane_b32 %vgpr, exec_hi, N+1
at the beginning of the shader, confusing the SGPR accounting.
No test case, because si-sgpr-spill.ll together with an upcoming patch for
WQM handling exhibits the problem.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19199
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266824 91177308-0d34-0410-b5e6-96231b3b80d8
Tests added along with implemented feature.
Note that there is a small leftover of unecessary MI sheduling issue
(more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix
the false regression.
TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.
Differential Revision: http://reviews.llvm.org/D17825
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266205 91177308-0d34-0410-b5e6-96231b3b80d8
Introduce a subtarget feature for this, and leave the default with
the current behavior which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260651 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM.
The register assigns VGPR locations to PS inputs, while the ENA register
determines whether or not they are loaded.
Mesa needs to set some inputs as not-movable, so that a pixel shader prolog
binary appended at the beginning can assume where some inputs are.
v2: Make PSInputAddr private, because there is never enough silly getters
and setters for people to read.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16030
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257591 91177308-0d34-0410-b5e6-96231b3b80d8