mirror of
https://github.com/RPCS3/llvm.git
synced 2024-11-24 04:09:47 +00:00
AMDHSA: Code object v3 updates

- Do not emit following assembler directives:
  - .hsa_code_object_version
  - .hsa_code_object_isa
  - .amd_amdgpu_isa
  - .amd_amdgpu_hsa_metadata
  - .amd_amdgpu_pal_metadata
- Do not emit .note entries
- Cleanup and bring in sync kernel descriptor header file
- Emit kernel descriptor into .rodata with appropriate relocations and alignments

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334519 91177308-0d34-0410-b5e6-96231b3b80d8
parent 0dcc1159b4
commit 299cf5ff6a
@@ -686,7 +686,7 @@ Symbols include the following:
      *link-name*           ``STT_OBJECT`` - ``.data``   Global variable
                                           - ``.rodata``
                                           - ``.bss``
-     *link-name*\ ``@kd``  ``STT_OBJECT`` - ``.rodata`` Kernel descriptor
+     *link-name*\ ``.kd``  ``STT_OBJECT`` - ``.rodata`` Kernel descriptor
      *link-name*           ``STT_FUNC``   - ``.text``   Kernel entry point
      ===================== ============== ============= ==================
 
@@ -1578,7 +1578,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
      ======= ======= =============================== ============================
      Bits    Size    Field Name                      Description
      ======= ======= =============================== ============================
-     31:0    4 bytes GroupSegmentFixedSize           The amount of fixed local
+     31:0    4 bytes GROUP_SEGMENT_FIXED_SIZE        The amount of fixed local
                                                      address space memory
                                                      required for a work-group
                                                      in bytes. This does not
@@ -1587,7 +1587,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                      space memory that may be
                                                      added when the kernel is
                                                      dispatched.
-     63:32   4 bytes PrivateSegmentFixedSize         The amount of fixed
+     63:32   4 bytes PRIVATE_SEGMENT_FIXED_SIZE      The amount of fixed
                                                      private address space
                                                      memory required for a
                                                      work-item in bytes. If
@@ -1596,7 +1596,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                      be added to this value for
                                                      the call stack.
      127:64  8 bytes                                 Reserved, must be 0.
-     191:128 8 bytes KernelCodeEntryByteOffset       Byte offset (possibly
+     191:128 8 bytes KERNEL_CODE_ENTRY_BYTE_OFFSET   Byte offset (possibly
                                                      negative) from base
                                                      address of kernel
                                                      descriptor to kernel's
@@ -1605,22 +1605,22 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                      aligned.
      383:192 24                                      Reserved, must be 0.
              bytes
-     415:384 4 bytes ComputePgmRsrc1                 Compute Shader (CS)
+     415:384 4 bytes COMPUTE_PGM_RSRC1               Compute Shader (CS)
                                                      program settings used by
                                                      CP to set up
                                                      ``COMPUTE_PGM_RSRC1``
                                                      configuration
                                                      register. See
                                                      :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
-     447:416 4 bytes ComputePgmRsrc2                 Compute Shader (CS)
+     447:416 4 bytes COMPUTE_PGM_RSRC2               Compute Shader (CS)
                                                      program settings used by
                                                      CP to set up
                                                      ``COMPUTE_PGM_RSRC2``
                                                      configuration
                                                      register. See
                                                      :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
-     448     1 bit   EnableSGPRPrivateSegmentBuffer  Enable the setup of the
-                                                     SGPR user data registers
+     448     1 bit   ENABLE_SGPR_PRIVATE_SEGMENT     Enable the setup of the
+                     _BUFFER                         SGPR user data registers
                                                      (see
                                                      :ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
 
@@ -1631,18 +1631,19 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                      ``compute_pgm_rsrc2.user_sgpr.user_sgpr_count``.
                                                      Any requests beyond 16
                                                      will be ignored.
-     449     1 bit   EnableSGPRDispatchPtr           *see above*
-     450     1 bit   EnableSGPRQueuePtr              *see above*
-     451     1 bit   EnableSGPRKernargSegmentPtr     *see above*
-     452     1 bit   EnableSGPRDispatchID            *see above*
-     453     1 bit   EnableSGPRFlatScratchInit       *see above*
-     454     1 bit   EnableSGPRPrivateSegmentSize    *see above*
-     455     1 bit   EnableSGPRGridWorkgroupCountX   Not implemented in CP and
-                                                     should always be 0.
-     456     1 bit   EnableSGPRGridWorkgroupCountY   Not implemented in CP and
-                                                     should always be 0.
-     457     1 bit   EnableSGPRGridWorkgroupCountZ   Not implemented in CP and
-                                                     should always be 0.
+     449     1 bit   ENABLE_SGPR_DISPATCH_PTR        *see above*
+     450     1 bit   ENABLE_SGPR_QUEUE_PTR           *see above*
+     451     1 bit   ENABLE_SGPR_KERNARG_SEGMENT_PTR *see above*
+     452     1 bit   ENABLE_SGPR_DISPATCH_ID         *see above*
+     453     1 bit   ENABLE_SGPR_FLAT_SCRATCH_INIT   *see above*
+     454     1 bit   ENABLE_SGPR_PRIVATE_SEGMENT     *see above*
+                     _SIZE
+     455     1 bit   ENABLE_SGPR_GRID_WORKGROUP      Not implemented in CP and
+                     _COUNT_X                        should always be 0.
+     456     1 bit   ENABLE_SGPR_GRID_WORKGROUP      Not implemented in CP and
+                     _COUNT_Y                        should always be 0.
+     457     1 bit   ENABLE_SGPR_GRID_WORKGROUP      Not implemented in CP and
+                     _COUNT_Z                        should always be 0.
      463:458 6 bits                                  Reserved, must be 0.
      511:464 6                                       Reserved, must be 0.
              bytes
@@ -1996,10 +1997,10 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
      ====================================== ===== ==============================
      Enumeration Name                       Value Description
      ====================================== ===== ==============================
-     AMDGPU_FLOAT_ROUND_MODE_NEAR_EVEN      0     Round Ties To Even
-     AMDGPU_FLOAT_ROUND_MODE_PLUS_INFINITY  1     Round Toward +infinity
-     AMDGPU_FLOAT_ROUND_MODE_MINUS_INFINITY 2     Round Toward -infinity
-     AMDGPU_FLOAT_ROUND_MODE_ZERO           3     Round Toward 0
+     FLOAT_ROUND_MODE_NEAR_EVEN             0     Round Ties To Even
+     FLOAT_ROUND_MODE_PLUS_INFINITY         1     Round Toward +infinity
+     FLOAT_ROUND_MODE_MINUS_INFINITY        2     Round Toward -infinity
+     FLOAT_ROUND_MODE_ZERO                  3     Round Toward 0
      ====================================== ===== ==============================
 
   ..
@@ -2010,11 +2011,11 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
      ====================================== ===== ==============================
      Enumeration Name                       Value Description
      ====================================== ===== ==============================
-     AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC_DST 0     Flush Source and Destination
+     FLOAT_DENORM_MODE_FLUSH_SRC_DST        0     Flush Source and Destination
                                                   Denorms
-     AMDGPU_FLOAT_DENORM_MODE_FLUSH_DST     1     Flush Output Denorms
-     AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC     2     Flush Source Denorms
-     AMDGPU_FLOAT_DENORM_MODE_FLUSH_NONE    3     No Flush
+     FLOAT_DENORM_MODE_FLUSH_DST            1     Flush Output Denorms
+     FLOAT_DENORM_MODE_FLUSH_SRC            2     Flush Source Denorms
+     FLOAT_DENORM_MODE_FLUSH_NONE           3     No Flush
      ====================================== ===== ==============================
 
   ..
@@ -2025,13 +2026,13 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
      ======================================== ===== ============================
      Enumeration Name                         Value Description
      ======================================== ===== ============================
-     AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X         0     Set work-item X dimension
+     SYSTEM_VGPR_WORKITEM_ID_X                0     Set work-item X dimension
                                                     ID.
-     AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y       1     Set work-item X and Y
+     SYSTEM_VGPR_WORKITEM_ID_X_Y              1     Set work-item X and Y
                                                     dimensions ID.
-     AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z     2     Set work-item X, Y and Z
+     SYSTEM_VGPR_WORKITEM_ID_X_Y_Z            2     Set work-item X, Y and Z
                                                     dimensions ID.
-     AMDGPU_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED 3     Undefined.
+     SYSTEM_VGPR_WORKITEM_ID_UNDEFINED        3     Undefined.
      ======================================== ===== ============================
 
 .. _amdgpu-amdhsa-initial-kernel-execution-state:
|
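As a worked check of the kernel descriptor table in the hunks above, the `Bits` column can be converted into byte offsets and field sizes. The sketch below is illustrative only: `BitRange`, `byteOffset`, `sizeInBits`, and `sizeInBytes` are hypothetical helpers that do not exist in LLVM, and the ranges are copied from the table rows.

```cpp
#include <cstdint>

// Hypothetical helper (not part of LLVM): one row of the kernel descriptor
// table, given as the inclusive bit range "Hi:Lo" from the Bits column.
struct BitRange {
  uint32_t Hi;
  uint32_t Lo;
};

// Byte offset of the field from the start of the descriptor.
constexpr uint32_t byteOffset(BitRange R) { return R.Lo / 8; }

// Field size in bits and, for whole-byte fields, in bytes.
constexpr uint32_t sizeInBits(BitRange R) { return R.Hi - R.Lo + 1; }
constexpr uint32_t sizeInBytes(BitRange R) { return sizeInBits(R) / 8; }

// Ranges copied from the table rows.
constexpr BitRange KernelCodeEntryByteOffset{191, 128};
constexpr BitRange ComputePgmRsrc1{415, 384};
constexpr BitRange WholeDescriptor{511, 0};

static_assert(byteOffset(KernelCodeEntryByteOffset) == 16,
              "entry offset field starts at byte 16");
static_assert(sizeInBytes(KernelCodeEntryByteOffset) == 8,
              "entry offset field is 8 bytes wide");
static_assert(byteOffset(ComputePgmRsrc1) == 48,
              "COMPUTE_PGM_RSRC1 starts at byte 48");
static_assert(sizeInBytes(WholeDescriptor) == 64,
              "the descriptor occupies 64 bytes in total");
```

The same arithmetic gives byte 56 for bit 448, which is where the single-bit `ENABLE_SGPR_*` flags begin.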
@@ -1,139 +0,0 @@
-//===--- AMDGPUKernelDescriptor.h -------------------------------*- C++ -*-===//
-//
-//                     The LLVM Compiler Infrastructure
-//
-// This file is distributed under the University of Illinois Open Source
-// License. See LICENSE.TXT for details.
-//
-//===----------------------------------------------------------------------===//
-//
-/// \file
-/// AMDGPU kernel descriptor definitions. For more information, visit
-/// https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor-for-gfx6-gfx9
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H
-#define LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H
-
-#include <cstdint>
-
-// Creates enumeration entries used for packing bits into integers. Enumeration
-// entries include bit shift amount, bit width, and bit mask.
-#define AMDGPU_BITS_ENUM_ENTRY(name, shift, width) \
-  name ## _SHIFT = (shift), \
-  name ## _WIDTH = (width), \
-  name = (((1 << (width)) - 1) << (shift)) \
-
-// Gets bits for specified bit mask from specified source.
-#define AMDGPU_BITS_GET(src, mask) \
-  ((src & mask) >> mask ## _SHIFT) \
-
-// Sets bits for specified bit mask in specified destination.
-#define AMDGPU_BITS_SET(dst, mask, val) \
-  dst &= (~(1 << mask ## _SHIFT) & ~mask); \
-  dst |= (((val) << mask ## _SHIFT) & mask) \
-
-namespace llvm {
-namespace AMDGPU {
-namespace HSAKD {
-
-/// Floating point rounding modes.
-enum : uint8_t {
-  AMDGPU_FLOAT_ROUND_MODE_NEAR_EVEN = 0,
-  AMDGPU_FLOAT_ROUND_MODE_PLUS_INFINITY = 1,
-  AMDGPU_FLOAT_ROUND_MODE_MINUS_INFINITY = 2,
-  AMDGPU_FLOAT_ROUND_MODE_ZERO = 3,
-};
-
-/// Floating point denorm modes.
-enum : uint8_t {
-  AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC_DST = 0,
-  AMDGPU_FLOAT_DENORM_MODE_FLUSH_DST = 1,
-  AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC = 2,
-  AMDGPU_FLOAT_DENORM_MODE_FLUSH_NONE = 3,
-};
-
-/// System VGPR workitem IDs.
-enum : uint8_t {
-  AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X = 0,
-  AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y = 1,
-  AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z = 2,
-  AMDGPU_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED = 3,
-};
-
-/// Compute program resource register one layout.
-enum ComputePgmRsrc1 {
-  AMDGPU_BITS_ENUM_ENTRY(GRANULATED_WORKITEM_VGPR_COUNT, 0, 6),
-  AMDGPU_BITS_ENUM_ENTRY(GRANULATED_WAVEFRONT_SGPR_COUNT, 6, 4),
-  AMDGPU_BITS_ENUM_ENTRY(PRIORITY, 10, 2),
-  AMDGPU_BITS_ENUM_ENTRY(FLOAT_ROUND_MODE_32, 12, 2),
-  AMDGPU_BITS_ENUM_ENTRY(FLOAT_ROUND_MODE_16_64, 14, 2),
-  AMDGPU_BITS_ENUM_ENTRY(FLOAT_DENORM_MODE_32, 16, 2),
-  AMDGPU_BITS_ENUM_ENTRY(FLOAT_DENORM_MODE_16_64, 18, 2),
-  AMDGPU_BITS_ENUM_ENTRY(PRIV, 20, 1),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_DX10_CLAMP, 21, 1),
-  AMDGPU_BITS_ENUM_ENTRY(DEBUG_MODE, 22, 1),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_IEEE_MODE, 23, 1),
-  AMDGPU_BITS_ENUM_ENTRY(BULKY, 24, 1),
-  AMDGPU_BITS_ENUM_ENTRY(CDBG_USER, 25, 1),
-  AMDGPU_BITS_ENUM_ENTRY(FP16_OVFL, 26, 1),
-  AMDGPU_BITS_ENUM_ENTRY(RESERVED0, 27, 5),
-};
-
-/// Compute program resource register two layout.
-enum ComputePgmRsrc2 {
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_PRIVATE_SEGMENT_WAVE_OFFSET, 0, 1),
-  AMDGPU_BITS_ENUM_ENTRY(USER_SGPR_COUNT, 1, 5),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_TRAP_HANDLER, 6, 1),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_X, 7, 1),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_Y, 8, 1),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_Z, 9, 1),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_INFO, 10, 1),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_VGPR_WORKITEM_ID, 11, 2),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_ADDRESS_WATCH, 13, 1),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_MEMORY, 14, 1),
-  AMDGPU_BITS_ENUM_ENTRY(GRANULATED_LDS_SIZE, 15, 9),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION, 24, 1),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_FP_DENORMAL_SOURCE, 25, 1),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO, 26, 1),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW, 27, 1),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW, 28, 1),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_INEXACT, 29, 1),
-  AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO, 30, 1),
-  AMDGPU_BITS_ENUM_ENTRY(RESERVED1, 31, 1),
-};
-
-/// Kernel descriptor layout. This layout should be kept backwards
-/// compatible as it is consumed by the command processor.
-struct KernelDescriptor final {
-  uint32_t GroupSegmentFixedSize;
-  uint32_t PrivateSegmentFixedSize;
-  uint32_t MaxFlatWorkGroupSize;
-  uint64_t IsDynamicCallStack : 1;
-  uint64_t IsXNACKEnabled : 1;
-  uint64_t Reserved0 : 30;
-  int64_t KernelCodeEntryByteOffset;
-  uint64_t Reserved1[3];
-  uint32_t ComputePgmRsrc1;
-  uint32_t ComputePgmRsrc2;
-  uint64_t EnableSGPRPrivateSegmentBuffer : 1;
-  uint64_t EnableSGPRDispatchPtr : 1;
-  uint64_t EnableSGPRQueuePtr : 1;
-  uint64_t EnableSGPRKernargSegmentPtr : 1;
-  uint64_t EnableSGPRDispatchID : 1;
-  uint64_t EnableSGPRFlatScratchInit : 1;
-  uint64_t EnableSGPRPrivateSegmentSize : 1;
-  uint64_t EnableSGPRGridWorkgroupCountX : 1;
-  uint64_t EnableSGPRGridWorkgroupCountY : 1;
-  uint64_t EnableSGPRGridWorkgroupCountZ : 1;
-  uint64_t Reserved2 : 54;
-
-  KernelDescriptor() = default;
-};
-
-} // end namespace HSAKD
-} // end namespace AMDGPU
-} // end namespace llvm
-
-#endif // LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H
|
187
include/llvm/Support/AMDHSAKernelDescriptor.h
Normal file
@@ -0,0 +1,187 @@
+//===--- AMDHSAKernelDescriptor.h -----------------------------*- C++ -*---===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// AMDHSA kernel descriptor definitions. For more information, visit
+/// https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H
+#define LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H
+
+#include <cstdint>
+
+// Gets offset of specified member in specified type.
+#ifndef offsetof
+#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE*)0)->MEMBER)
+#endif // offsetof
+
+// Creates enumeration entries used for packing bits into integers. Enumeration
+// entries include bit shift amount, bit width, and bit mask.
+#ifndef AMDHSA_BITS_ENUM_ENTRY
+#define AMDHSA_BITS_ENUM_ENTRY(NAME, SHIFT, WIDTH) \
+  NAME ## _SHIFT = (SHIFT), \
+  NAME ## _WIDTH = (WIDTH), \
+  NAME = (((1 << (WIDTH)) - 1) << (SHIFT))
+#endif // AMDHSA_BITS_ENUM_ENTRY
+
+// Gets bits for specified bit mask from specified source.
+#ifndef AMDHSA_BITS_GET
+#define AMDHSA_BITS_GET(SRC, MSK) ((SRC & MSK) >> MSK ## _SHIFT)
+#endif // AMDHSA_BITS_GET
+
+// Sets bits for specified bit mask in specified destination.
+#ifndef AMDHSA_BITS_SET
+#define AMDHSA_BITS_SET(DST, MSK, VAL) \
+  DST &= ~MSK; \
+  DST |= ((VAL << MSK ## _SHIFT) & MSK)
+#endif // AMDHSA_BITS_SET
+
+namespace llvm {
+namespace amdhsa {
+
+// Floating point rounding modes. Must be kept backwards compatible.
+enum : uint8_t {
+  FLOAT_ROUND_MODE_NEAR_EVEN = 0,
+  FLOAT_ROUND_MODE_PLUS_INFINITY = 1,
+  FLOAT_ROUND_MODE_MINUS_INFINITY = 2,
+  FLOAT_ROUND_MODE_ZERO = 3,
+};
+
+// Floating point denorm modes. Must be kept backwards compatible.
+enum : uint8_t {
+  FLOAT_DENORM_MODE_FLUSH_SRC_DST = 0,
+  FLOAT_DENORM_MODE_FLUSH_DST = 1,
+  FLOAT_DENORM_MODE_FLUSH_SRC = 2,
+  FLOAT_DENORM_MODE_FLUSH_NONE = 3,
+};
+
+// System VGPR workitem IDs. Must be kept backwards compatible.
+enum : uint8_t {
+  SYSTEM_VGPR_WORKITEM_ID_X = 0,
+  SYSTEM_VGPR_WORKITEM_ID_X_Y = 1,
+  SYSTEM_VGPR_WORKITEM_ID_X_Y_Z = 2,
+  SYSTEM_VGPR_WORKITEM_ID_UNDEFINED = 3,
+};
+
+// Compute program resource register 1. Must be kept backwards compatible.
+#define COMPUTE_PGM_RSRC1(NAME, SHIFT, WIDTH) \
+  AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_ ## NAME, SHIFT, WIDTH)
+enum : int32_t {
+  COMPUTE_PGM_RSRC1(GRANULATED_WORKITEM_VGPR_COUNT, 0, 6),
+  COMPUTE_PGM_RSRC1(GRANULATED_WAVEFRONT_SGPR_COUNT, 6, 4),
+  COMPUTE_PGM_RSRC1(PRIORITY, 10, 2),
+  COMPUTE_PGM_RSRC1(FLOAT_ROUND_MODE_32, 12, 2),
+  COMPUTE_PGM_RSRC1(FLOAT_ROUND_MODE_16_64, 14, 2),
+  COMPUTE_PGM_RSRC1(FLOAT_DENORM_MODE_32, 16, 2),
+  COMPUTE_PGM_RSRC1(FLOAT_DENORM_MODE_16_64, 18, 2),
+  COMPUTE_PGM_RSRC1(PRIV, 20, 1),
+  COMPUTE_PGM_RSRC1(ENABLE_DX10_CLAMP, 21, 1),
+  COMPUTE_PGM_RSRC1(DEBUG_MODE, 22, 1),
+  COMPUTE_PGM_RSRC1(ENABLE_IEEE_MODE, 23, 1),
+  COMPUTE_PGM_RSRC1(BULKY, 24, 1),
+  COMPUTE_PGM_RSRC1(CDBG_USER, 25, 1),
+  COMPUTE_PGM_RSRC1(FP16_OVFL, 26, 1),
+  COMPUTE_PGM_RSRC1(RESERVED, 27, 5),
+};
+#undef COMPUTE_PGM_RSRC1
+
+// Compute program resource register 2. Must be kept backwards compatible.
+#define COMPUTE_PGM_RSRC2(NAME, SHIFT, WIDTH) \
+  AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC2_ ## NAME, SHIFT, WIDTH)
+enum : int32_t {
+  COMPUTE_PGM_RSRC2(ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET, 0, 1),
+  COMPUTE_PGM_RSRC2(USER_SGPR_COUNT, 1, 5),
+  COMPUTE_PGM_RSRC2(ENABLE_TRAP_HANDLER, 6, 1),
+  COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_X, 7, 1),
+  COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Y, 8, 1),
+  COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Z, 9, 1),
+  COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_INFO, 10, 1),
+  COMPUTE_PGM_RSRC2(ENABLE_VGPR_WORKITEM_ID, 11, 2),
+  COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_ADDRESS_WATCH, 13, 1),
+  COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_MEMORY, 14, 1),
+  COMPUTE_PGM_RSRC2(GRANULATED_LDS_SIZE, 15, 9),
+  COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION, 24, 1),
+  COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_FP_DENORMAL_SOURCE, 25, 1),
+  COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO, 26, 1),
+  COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW, 27, 1),
+  COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW, 28, 1),
+  COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_INEXACT, 29, 1),
+  COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO, 30, 1),
+  COMPUTE_PGM_RSRC2(RESERVED, 31, 1),
+};
+#undef COMPUTE_PGM_RSRC2
+
+// Kernel code properties. Must be kept backwards compatible.
+#define KERNEL_CODE_PROPERTY(NAME, SHIFT, WIDTH) \
+  AMDHSA_BITS_ENUM_ENTRY(KERNEL_CODE_PROPERTY_ ## NAME, SHIFT, WIDTH)
+enum : int32_t {
+  KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER, 0, 1),
+  KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_PTR, 1, 1),
+  KERNEL_CODE_PROPERTY(ENABLE_SGPR_QUEUE_PTR, 2, 1),
+  KERNEL_CODE_PROPERTY(ENABLE_SGPR_KERNARG_SEGMENT_PTR, 3, 1),
+  KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_ID, 4, 1),
+  KERNEL_CODE_PROPERTY(ENABLE_SGPR_FLAT_SCRATCH_INIT, 5, 1),
+  KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_SIZE, 6, 1),
+  KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_X, 7, 1),
+  KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y, 8, 1),
+  KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z, 9, 1),
+  KERNEL_CODE_PROPERTY(RESERVED, 10, 6),
+};
+#undef KERNEL_CODE_PROPERTY
+
+// Kernel descriptor. Must be kept backwards compatible.
+struct kernel_descriptor_t {
+  uint32_t group_segment_fixed_size;
+  uint32_t private_segment_fixed_size;
+  uint8_t reserved0[8];
+  int64_t kernel_code_entry_byte_offset;
+  uint8_t reserved1[24];
+  uint32_t compute_pgm_rsrc1;
+  uint32_t compute_pgm_rsrc2;
+  uint16_t kernel_code_properties;
+  uint8_t reserved2[6];
+};
+
+static_assert(
+    sizeof(kernel_descriptor_t) == 64,
+    "invalid size for kernel_descriptor_t");
+static_assert(
+    offsetof(kernel_descriptor_t, group_segment_fixed_size) == 0,
+    "invalid offset for group_segment_fixed_size");
+static_assert(
+    offsetof(kernel_descriptor_t, private_segment_fixed_size) == 4,
+    "invalid offset for private_segment_fixed_size");
+static_assert(
+    offsetof(kernel_descriptor_t, reserved0) == 8,
+    "invalid offset for reserved0");
+static_assert(
+    offsetof(kernel_descriptor_t, kernel_code_entry_byte_offset) == 16,
+    "invalid offset for kernel_code_entry_byte_offset");
+static_assert(
+    offsetof(kernel_descriptor_t, reserved1) == 24,
+    "invalid offset for reserved1");
+static_assert(
+    offsetof(kernel_descriptor_t, compute_pgm_rsrc1) == 48,
+    "invalid offset for compute_pgm_rsrc1");
+static_assert(
+    offsetof(kernel_descriptor_t, compute_pgm_rsrc2) == 52,
+    "invalid offset for compute_pgm_rsrc2");
+static_assert(
+    offsetof(kernel_descriptor_t, kernel_code_properties) == 56,
+    "invalid offset for kernel_code_properties");
+static_assert(
+    offsetof(kernel_descriptor_t, reserved2) == 58,
+    "invalid offset for reserved2");
+
+} // end namespace amdhsa
+} // end namespace llvm
+
+#endif // LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H
|
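The shift/width/mask triple generated by `AMDHSA_BITS_ENUM_ENTRY` in the header above can also be expressed as plain functions, which may make the packing arithmetic easier to follow. This is a minimal sketch, not the header's API: `BitField`, `maskOf`, `getBits`, and `setBits` are hypothetical names, and only the shift and width values are taken from the `COMPUTE_PGM_RSRC1`/`COMPUTE_PGM_RSRC2` entries shown in the diff.

```cpp
#include <cstdint>

// Hypothetical stand-in for one AMDHSA_BITS_ENUM_ENTRY(NAME, SHIFT, WIDTH).
struct BitField {
  uint32_t Shift;
  uint32_t Width; // assumed to be in the range 1..31 in this sketch
};

// Mask with Width one-bits starting at bit Shift (the NAME enumerator).
constexpr uint32_t maskOf(BitField F) {
  return ((uint32_t{1} << F.Width) - 1) << F.Shift;
}

// Function form of AMDHSA_BITS_GET: extract the field value from Src.
constexpr uint32_t getBits(uint32_t Src, BitField F) {
  return (Src & maskOf(F)) >> F.Shift;
}

// Function form of AMDHSA_BITS_SET: clear the field, then store Val in it.
// Bits of Val that do not fit in the field are dropped by the final mask.
constexpr uint32_t setBits(uint32_t Dst, BitField F, uint32_t Val) {
  return (Dst & ~maskOf(F)) | ((Val << F.Shift) & maskOf(F));
}

// Shift/width pairs copied from the enum entries in the diff.
constexpr BitField FloatRoundMode32{12, 2};
constexpr BitField GranulatedLdsSize{15, 9};

static_assert(maskOf(FloatRoundMode32) == 0x3000u, "two bits at 13:12");
static_assert(setBits(0, FloatRoundMode32, 3) == 0x3000u, "store value 3");
static_assert(getBits(0x3000u, FloatRoundMode32) == 3, "read value 3 back");
```

Unlike the two-statement `AMDHSA_BITS_SET` macro, `setBits` returns the updated word, so it behaves the same inside an unbraced `if`.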
@@ -116,6 +116,10 @@ AMDGPUTargetStreamer* AMDGPUAsmPrinter::getTargetStreamer() const {
 }
 
 void AMDGPUAsmPrinter::EmitStartOfAsmFile(Module &M) {
+  if (IsaInfo::hasCodeObjectV3(getSTI()) &&
+      TM.getTargetTriple().getOS() == Triple::AMDHSA)
+    return;
+
   if (TM.getTargetTriple().getOS() != Triple::AMDHSA &&
       TM.getTargetTriple().getOS() != Triple::AMDPAL)
     return;
@@ -126,10 +130,6 @@ void AMDGPUAsmPrinter::EmitStartOfAsmFile(Module &M) {
   if (TM.getTargetTriple().getOS() == Triple::AMDPAL)
     readPALMetadata(M);
 
-  // Deprecated notes are not emitted for code object v3.
-  if (IsaInfo::hasCodeObjectV3(getSTI()->getFeatureBits()))
-    return;
-
   // HSA emits NT_AMDGPU_HSA_CODE_OBJECT_VERSION for code objects v2.
   if (TM.getTargetTriple().getOS() == Triple::AMDHSA)
     getTargetStreamer()->EmitDirectiveHSACodeObjectVersion(2, 1);
@@ -141,6 +141,10 @@ void AMDGPUAsmPrinter::EmitStartOfAsmFile(Module &M) {
 }
 
 void AMDGPUAsmPrinter::EmitEndOfAsmFile(Module &M) {
+  // TODO: Add metadata to code object v3.
+  if (IsaInfo::hasCodeObjectV3(getSTI()) &&
+      TM.getTargetTriple().getOS() == Triple::AMDHSA)
+    return;
 
   // Following code requires TargetStreamer to be present.
   if (!getTargetStreamer())
@@ -186,8 +190,11 @@ bool AMDGPUAsmPrinter::isBlockOnlyReachableByFallthrough(
 }
 
 void AMDGPUAsmPrinter::EmitFunctionBodyStart() {
-  const AMDGPUMachineFunction *MFI = MF->getInfo<AMDGPUMachineFunction>();
-  if (!MFI->isEntryFunction())
+  const SIMachineFunctionInfo &MFI = *MF->getInfo<SIMachineFunctionInfo>();
+  if (!MFI.isEntryFunction())
     return;
+  if (IsaInfo::hasCodeObjectV3(getSTI()) &&
+      TM.getTargetTriple().getOS() == Triple::AMDHSA)
+    return;
 
   const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();
@@ -205,7 +212,27 @@ void AMDGPUAsmPrinter::EmitFunctionBodyStart() {
       getHSADebugProps(*MF, CurrentProgramInfo));
 }
 
+void AMDGPUAsmPrinter::EmitFunctionBodyEnd() {
+  const SIMachineFunctionInfo &MFI = *MF->getInfo<SIMachineFunctionInfo>();
+  if (!MFI.isEntryFunction())
+    return;
+  if (!IsaInfo::hasCodeObjectV3(getSTI()) ||
+      TM.getTargetTriple().getOS() != Triple::AMDHSA)
+    return;
+
+  SmallString<128> KernelName;
+  getNameWithPrefix(KernelName, &MF->getFunction());
+  getTargetStreamer()->EmitAmdhsaKernelDescriptor(
+      KernelName, getAmdhsaKernelDescriptor(*MF, CurrentProgramInfo));
+}
+
 void AMDGPUAsmPrinter::EmitFunctionEntryLabel() {
+  if (IsaInfo::hasCodeObjectV3(getSTI()) &&
+      TM.getTargetTriple().getOS() == Triple::AMDHSA) {
+    AsmPrinter::EmitFunctionEntryLabel();
+    return;
+  }
+
   const SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
   const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();
   if (MFI->isEntryFunction() && STM.isAmdCodeObjectV2(MF->getFunction())) {
@@ -288,6 +315,70 @@ void AMDGPUAsmPrinter::emitCommonFunctionComments(
       false);
 }
 
+uint16_t AMDGPUAsmPrinter::getAmdhsaKernelCodeProperties(
+    const MachineFunction &MF) const {
+  const SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
+  uint16_t KernelCodeProperties = 0;
+
+  if (MFI.hasPrivateSegmentBuffer()) {
+    KernelCodeProperties |=
+        amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER;
+  }
+  if (MFI.hasDispatchPtr()) {
+    KernelCodeProperties |=
+        amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR;
+  }
+  if (MFI.hasQueuePtr()) {
+    KernelCodeProperties |=
+        amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR;
+  }
+  if (MFI.hasKernargSegmentPtr()) {
+    KernelCodeProperties |=
+        amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR;
+  }
+  if (MFI.hasDispatchID()) {
+    KernelCodeProperties |=
+        amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID;
+  }
+  if (MFI.hasFlatScratchInit()) {
+    KernelCodeProperties |=
+        amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT;
+  }
+  if (MFI.hasGridWorkgroupCountX()) {
+    KernelCodeProperties |=
+        amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_X;
+  }
+  if (MFI.hasGridWorkgroupCountY()) {
+    KernelCodeProperties |=
+        amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y;
+  }
+  if (MFI.hasGridWorkgroupCountZ()) {
+    KernelCodeProperties |=
+        amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z;
+  }
+
+  return KernelCodeProperties;
+}
+
+amdhsa::kernel_descriptor_t AMDGPUAsmPrinter::getAmdhsaKernelDescriptor(
+    const MachineFunction &MF,
+    const SIProgramInfo &PI) const {
+  amdhsa::kernel_descriptor_t KernelDescriptor;
+  memset(&KernelDescriptor, 0x0, sizeof(KernelDescriptor));
+
+  assert(isUInt<32>(PI.ScratchSize));
+  assert(isUInt<32>(PI.ComputePGMRSrc1));
+  assert(isUInt<32>(PI.ComputePGMRSrc2));
+
+  KernelDescriptor.group_segment_fixed_size = PI.LDSSize;
+  KernelDescriptor.private_segment_fixed_size = PI.ScratchSize;
+  KernelDescriptor.compute_pgm_rsrc1 = PI.ComputePGMRSrc1;
+  KernelDescriptor.compute_pgm_rsrc2 = PI.ComputePGMRSrc2;
+  KernelDescriptor.kernel_code_properties = getAmdhsaKernelCodeProperties(MF);
+
+  return KernelDescriptor;
+}
+
 bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
   CurrentProgramInfo = SIProgramInfo();
 
|
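The pattern used by `getAmdhsaKernelCodeProperties` above, OR-ing one single-bit mask into a 16-bit word per enabled user SGPR, can be sketched in isolation. The code below is an assumption-laden illustration rather than LLVM code: `UserSgprFlags` and `codeProperties` are hypothetical, only four of the flags are modelled, and the bit positions are taken from the `KERNEL_CODE_PROPERTY` entries in the new header.

```cpp
#include <cstdint>

// Single-bit masks; positions copied from the KERNEL_CODE_PROPERTY entries
// (shift 0..3, width 1). The constant names are local to this sketch.
constexpr uint16_t EnableSgprPrivateSegmentBuffer = 1u << 0;
constexpr uint16_t EnableSgprDispatchPtr = 1u << 1;
constexpr uint16_t EnableSgprQueuePtr = 1u << 2;
constexpr uint16_t EnableSgprKernargSegmentPtr = 1u << 3;

// Hypothetical stand-in for the has*() queries on SIMachineFunctionInfo.
struct UserSgprFlags {
  bool PrivateSegmentBuffer;
  bool DispatchPtr;
  bool QueuePtr;
  bool KernargSegmentPtr;
};

// Builds the properties word by OR-ing in the mask of every enabled flag.
constexpr uint16_t codeProperties(UserSgprFlags F) {
  return static_cast<uint16_t>(
      (F.PrivateSegmentBuffer ? EnableSgprPrivateSegmentBuffer : 0) |
      (F.DispatchPtr ? EnableSgprDispatchPtr : 0) |
      (F.QueuePtr ? EnableSgprQueuePtr : 0) |
      (F.KernargSegmentPtr ? EnableSgprKernargSegmentPtr : 0));
}

static_assert(codeProperties(UserSgprFlags{false, false, false, false}) == 0,
              "no flags enabled");
static_assert(codeProperties(UserSgprFlags{true, false, false, true}) == 0x9,
              "bits 0 and 3 set");
```

Because the descriptor is zero-initialised before this word is stored, any flag that is not OR-ed in stays 0 in the emitted descriptor.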
@@ -20,6 +20,7 @@
 #include "MCTargetDesc/AMDGPUHSAMetadataStreamer.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/CodeGen/AsmPrinter.h"
+#include "llvm/Support/AMDHSAKernelDescriptor.h"
 #include <cstddef>
 #include <cstdint>
 #include <limits>
@@ -148,6 +149,13 @@ private:
                                   uint64_t CodeSize,
                                   const AMDGPUMachineFunction* MFI);
 
+  uint16_t getAmdhsaKernelCodeProperties(
+      const MachineFunction &MF) const;
+
+  amdhsa::kernel_descriptor_t getAmdhsaKernelDescriptor(
+      const MachineFunction &MF,
+      const SIProgramInfo &PI) const;
+
 public:
   explicit AMDGPUAsmPrinter(TargetMachine &TM,
                             std::unique_ptr<MCStreamer> Streamer);
@@ -180,6 +188,8 @@ public:
 
   void EmitFunctionBodyStart() override;
 
+  void EmitFunctionBodyEnd() override;
+
   void EmitFunctionEntryLabel() override;
 
   void EmitBasicBlockStart(const MachineBasicBlock &MBB) const override;
|
@ -196,6 +196,12 @@ bool AMDGPUTargetAsmStreamer::EmitPALMetadata(
|
||||
return true;
|
||||
}
|
||||
|
||||
void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
|
||||
StringRef KernelName,
|
||||
const amdhsa::kernel_descriptor_t &KernelDescriptor) {
|
||||
// FIXME: not supported yet.
|
||||
}
|
||||
|
||||
//===----------------------------------------------------------------------===//
|
||||
// AMDGPUTargetELFStreamer
|
||||
//===----------------------------------------------------------------------===//
|
||||
@ -362,3 +368,57 @@ bool AMDGPUTargetELFStreamer::EmitPALMetadata(
|
||||
);
|
||||
return true;
|
||||
}
|
||||
|
||||
void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
|
||||
    StringRef KernelName,
    const amdhsa::kernel_descriptor_t &KernelDescriptor) {
  auto &Streamer = getStreamer();
  auto &Context = Streamer.getContext();
  auto &ObjectFileInfo = *Context.getObjectFileInfo();
  auto &ReadOnlySection = *ObjectFileInfo.getReadOnlySection();

  Streamer.PushSection();
  Streamer.SwitchSection(&ReadOnlySection);

  // CP microcode requires the kernel descriptor to be allocated on 64 byte
  // alignment.
  Streamer.EmitValueToAlignment(64, 0, 1, 0);
  if (ReadOnlySection.getAlignment() < 64)
    ReadOnlySection.setAlignment(64);

  MCSymbolELF *KernelDescriptorSymbol = cast<MCSymbolELF>(
      Context.getOrCreateSymbol(Twine(KernelName) + Twine(".kd")));
  KernelDescriptorSymbol->setBinding(ELF::STB_GLOBAL);
  KernelDescriptorSymbol->setType(ELF::STT_OBJECT);
  KernelDescriptorSymbol->setSize(
      MCConstantExpr::create(sizeof(KernelDescriptor), Context));

  MCSymbolELF *KernelCodeSymbol = cast<MCSymbolELF>(
      Context.getOrCreateSymbol(Twine(KernelName)));
  KernelCodeSymbol->setBinding(ELF::STB_LOCAL);

  Streamer.EmitLabel(KernelDescriptorSymbol);
  Streamer.EmitBytes(StringRef(
      (const char*)&(KernelDescriptor),
      offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset)));
  // FIXME: Remove the use of VK_AMDGPU_REL64 in the expression below. The
  // expression being created is:
  //   (start of kernel code) - (start of kernel descriptor)
  // It implies R_AMDGPU_REL64, but ends up being R_AMDGPU_ABS64.
  Streamer.EmitValue(MCBinaryExpr::createSub(
      MCSymbolRefExpr::create(
          KernelCodeSymbol, MCSymbolRefExpr::VK_AMDGPU_REL64, Context),
      MCSymbolRefExpr::create(
          KernelDescriptorSymbol, MCSymbolRefExpr::VK_None, Context),
      Context),
      sizeof(KernelDescriptor.kernel_code_entry_byte_offset));
  Streamer.EmitBytes(StringRef(
      (const char*)&(KernelDescriptor) +
          offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset) +
          sizeof(KernelDescriptor.kernel_code_entry_byte_offset),
      sizeof(KernelDescriptor) -
          offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset) -
          sizeof(KernelDescriptor.kernel_code_entry_byte_offset)));

  Streamer.PopSection();
}
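
The function above emits the descriptor in three pieces: the raw bytes before `kernel_code_entry_byte_offset`, the field itself as a symbol-difference expression, and the raw bytes after it. The offset and size arithmetic can be sketched in isolation as follows. `DescriptorSketch`, `ByteRange`, and the helper functions are hypothetical names used only for illustration; the stand-in struct assumes nothing beyond the 64-byte size and the field position at byte 16 that the test expectations in this commit imply, and it is not the real `amdhsa::kernel_descriptor_t` layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for amdhsa::kernel_descriptor_t. Assumed facts: the
// struct is 64 bytes and the entry offset field sits at byte 16. The reserved
// arrays are filler, not the real field list.
struct DescriptorSketch {
  uint32_t group_segment_fixed_size;
  uint32_t private_segment_fixed_size;
  uint8_t reserved0[8];
  int64_t kernel_code_entry_byte_offset;
  uint8_t reserved1[40];
};
static_assert(sizeof(DescriptorSketch) == 64, "descriptor must be 64 bytes");

// A byte range [Offset, Offset + Size) within the descriptor.
struct ByteRange {
  std::size_t Offset;
  std::size_t Size;
};

// Bytes emitted verbatim before the relocated field.
constexpr ByteRange prefixRange() {
  return {0, offsetof(DescriptorSketch, kernel_code_entry_byte_offset)};
}

// The field that is emitted as an expression instead of raw bytes.
constexpr ByteRange fieldRange() {
  return {offsetof(DescriptorSketch, kernel_code_entry_byte_offset),
          sizeof(DescriptorSketch::kernel_code_entry_byte_offset)};
}

// Bytes emitted verbatim after the relocated field.
constexpr ByteRange suffixRange() {
  return {fieldRange().Offset + fieldRange().Size,
          sizeof(DescriptorSketch) - fieldRange().Offset - fieldRange().Size};
}

// Runtime sanity check that the three ranges tile the descriptor exactly.
inline void checkRangesTileDescriptor() {
  assert(prefixRange().Offset + prefixRange().Size == fieldRange().Offset);
  assert(suffixRange().Offset + suffixRange().Size == sizeof(DescriptorSketch));
}
```

Under these assumptions the prefix covers bytes 0 to 15, the field bytes 16 to 23, and the suffix bytes 24 to 63, which is the same split the two `EmitBytes` calls and the `EmitValue` call perform.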
@@ -14,6 +14,7 @@
 #include "llvm/MC/MCStreamer.h"
 #include "llvm/MC/MCSubtargetInfo.h"
 #include "llvm/Support/AMDGPUMetadata.h"
+#include "llvm/Support/AMDHSAKernelDescriptor.h"

 namespace llvm {
 #include "AMDGPUPTNote.h"
@@ -62,6 +63,10 @@ public:

   /// \returns True on success, false on failure.
   virtual bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) = 0;
+
+  virtual void EmitAmdhsaKernelDescriptor(
+      StringRef KernelName,
+      const amdhsa::kernel_descriptor_t &KernelDescriptor) = 0;
 };

 class AMDGPUTargetAsmStreamer final : public AMDGPUTargetStreamer {
@@ -87,6 +92,10 @@ public:

   /// \returns True on success, false on failure.
   bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) override;
+
+  void EmitAmdhsaKernelDescriptor(
+      StringRef KernelName,
+      const amdhsa::kernel_descriptor_t &KernelDescriptor) override;
 };

 class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {
@@ -119,6 +128,10 @@ public:

   /// \returns True on success, false on failure.
   bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) override;
+
+  void EmitAmdhsaKernelDescriptor(
+      StringRef KernelName,
+      const amdhsa::kernel_descriptor_t &KernelDescriptor) override;
 };

 }
@@ -248,8 +248,8 @@ void streamIsaVersion(const MCSubtargetInfo *STI, raw_ostream &Stream) {
   Stream.flush();
 }

-bool hasCodeObjectV3(const FeatureBitset &Features) {
-  return Features.test(FeatureCodeObjectV3);
+bool hasCodeObjectV3(const MCSubtargetInfo *STI) {
+  return STI->getFeatureBits().test(FeatureCodeObjectV3);
 }

 unsigned getWavefrontSize(const FeatureBitset &Features) {
@@ -59,9 +59,9 @@ IsaVersion getIsaVersion(const FeatureBitset &Features);
 /// Streams isa version string for given subtarget \p STI into \p Stream.
 void streamIsaVersion(const MCSubtargetInfo *STI, raw_ostream &Stream);

-/// \returns True if given subtarget \p Features support code object version 3,
+/// \returns True if given subtarget \p STI supports code object version 3,
 /// false otherwise.
-bool hasCodeObjectV3(const FeatureBitset &Features);
+bool hasCodeObjectV3(const MCSubtargetInfo *STI);

 /// \returns Wavefront size for given subtarget \p Features.
 unsigned getWavefrontSize(const FeatureBitset &Features);
48
test/CodeGen/AMDGPU/code-object-v3.ll
Normal file
@@ -0,0 +1,48 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=+code-object-v3 < %s | FileCheck --check-prefixes=ALL-ASM,OSABI-AMDHSA-ASM %s
; RUN: llc -filetype=obj -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=+code-object-v3 < %s | llvm-readobj -elf-output-style=GNU -notes -relocations -sections -symbols | FileCheck --check-prefixes=ALL-ELF,OSABI-AMDHSA-ELF %s

; OSABI-AMDHSA-ASM-NOT: .hsa_code_object_version
; OSABI-AMDHSA-ASM-NOT: .hsa_code_object_isa
; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_isa
; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_hsa_metadata
; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_pal_metadata

; OSABI-AMDHSA-ELF: Section Headers
; OSABI-AMDHSA-ELF: .text PROGBITS {{[0-9]+}} {{[0-9]+}} {{[0-9a-f]+}} {{[0-9]+}} AX {{[0-9]+}} {{[0-9]+}} 256
; OSABI-AMDHSA-ELF: .rodata PROGBITS {{[0-9]+}} {{[0-9]+}} {{[0-9a-f]+}} {{[0-9]+}} A {{[0-9]+}} {{[0-9]+}} 64

; OSABI-AMDHSA-ELF: Relocation section '.rela.rodata' at offset
; OSABI-AMDHSA-ELF: 0000000000000010 0000000300000005 R_AMDGPU_REL64 0000000000000000 .text + 10
; OSABI-AMDHSA-ELF: 0000000000000050 0000000300000005 R_AMDGPU_REL64 0000000000000000 .text + 110

; OSABI-AMDHSA-ELF: Symbol table '.symtab' contains {{[0-9]+}} entries
; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000000 {{[0-9]+}} FUNC LOCAL DEFAULT {{[0-9]+}} fadd
; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000100 {{[0-9]+}} FUNC LOCAL DEFAULT {{[0-9]+}} fsub
; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000000 64 OBJECT GLOBAL DEFAULT {{[0-9]+}} fadd.kd
; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000040 64 OBJECT GLOBAL DEFAULT {{[0-9]+}} fsub.kd

; OSABI-AMDHSA-ELF-NOT: Displaying notes found

define amdgpu_kernel void @fadd(
    float addrspace(1)* %r,
    float addrspace(1)* %a,
    float addrspace(1)* %b) {
entry:
  %a.val = load float, float addrspace(1)* %a
  %b.val = load float, float addrspace(1)* %b
  %r.val = fadd float %a.val, %b.val
  store float %r.val, float addrspace(1)* %r
  ret void
}

define amdgpu_kernel void @fsub(
    float addrspace(1)* %r,
    float addrspace(1)* %a,
    float addrspace(1)* %b) {
entry:
  %a.val = load float, float addrspace(1)* %a
  %b.val = load float, float addrspace(1)* %b
  %r.val = fsub float %a.val, %b.val
  store float %r.val, float addrspace(1)* %r
  ret void
}
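
The relocation offsets and addends checked in this test (`0x10` with `.text + 10`, `0x50` with `.text + 110`) can be reproduced by hand. The sketch below assumes that `R_AMDGPU_REL64` resolves to `S + A - P` (symbol value plus addend minus the address being patched) and that the entry offset field sits at byte `0x10` of each 64-byte descriptor; the function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed byte offset of kernel_code_entry_byte_offset inside a descriptor,
// as implied by the relocation at .rodata offset 0x10 for fadd.kd at offset 0.
constexpr uint64_t EntryFieldOffset = 0x10;

// Section-relative offset in .rodata where the relocation is applied.
constexpr uint64_t relocOffset(uint64_t KdOffsetInRodata) {
  return KdOffsetInRodata + EntryFieldOffset;
}

// Addend relative to the .text section symbol. The stored value should be
// (kernel code address) - (descriptor address). The patched location is
// EntryFieldOffset bytes past the descriptor start, so that amount is added
// back to the kernel's offset within .text.
constexpr uint64_t rel64Addend(uint64_t CodeOffsetInText) {
  return CodeOffsetInText + EntryFieldOffset;
}

// Assumed relocation formula: S + A - P.
constexpr int64_t resolveRel64(uint64_t S, uint64_t A, uint64_t P) {
  return static_cast<int64_t>(S + A - P);
}

// Compare against the values the FileCheck lines expect.
inline void checkAgainstTestExpectations() {
  assert(relocOffset(0x00) == 0x10 && rel64Addend(0x000) == 0x10);   // fadd
  assert(relocOffset(0x40) == 0x50 && rel64Addend(0x100) == 0x110);  // fsub
}
```

With `fadd` at `.text + 0x0` and `fsub` at `.text + 0x100`, and the descriptors at `.rodata + 0x0` and `.rodata + 0x40`, the helpers yield the same offset and addend pairs that the `.rela.rodata` check lines list.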
@@ -1,13 +1,13 @@
-; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=iceland -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK-ELF --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=iceland -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA-ELF --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=iceland -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL-ELF --check-prefix=GFX802 %s
-; RUN: llc -march=r600 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=R600 %s
+; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK-ELF --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA-ELF --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL-ELF --check-prefix=GFX802 %s
+; RUN: llc -march=r600 < %s | FileCheck --check-prefix=R600 %s

 ; OSABI-UNK-NOT: .hsa_code_object_version
 ; OSABI-UNK-NOT: .hsa_code_object_isa
@@ -25,17 +25,17 @@
 ; OSABI-UNK-ELF-NOT: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
 ; OSABI-UNK-ELF-NOT: Unknown note type

-; OSABI-HSA-NOT: .hsa_code_object_version
-; OSABI-HSA-NOT: .hsa_code_object_isa
+; OSABI-HSA: .hsa_code_object_version
+; OSABI-HSA: .hsa_code_object_isa
 ; OSABI-HSA: .amd_amdgpu_isa "amdgcn-amd-amdhsa--gfx802"
 ; OSABI-HSA: .amd_amdgpu_hsa_metadata
 ; OSABI-HSA-NOT: .amd_amdgpu_pal_metadata

-; OSABI-HSA-ELF-NOT: Unknown note type
+; OSABI-HSA-ELF: Unknown note type (0x00000001)
+; OSABI-HSA-ELF: Unknown note type (0x00000003)
 ; OSABI-HSA-ELF: NT_AMD_AMDGPU_ISA (ISA Version)
 ; OSABI-HSA-ELF: ISA Version:
 ; OSABI-HSA-ELF: amdgcn-amd-amdhsa--gfx802
-; OSABI-HSA-ELF-NOT: Unknown note type
 ; OSABI-HSA-ELF: NT_AMD_AMDGPU_HSA_METADATA (HSA Metadata)
 ; OSABI-HSA-ELF: HSA Metadata:
 ; OSABI-HSA-ELF: ---
@@ -51,34 +51,29 @@
 ; OSABI-HSA-ELF: WavefrontSize: 64
 ; OSABI-HSA-ELF: NumSGPRs: 96
 ; OSABI-HSA-ELF: ...
-; OSABI-HSA-ELF-NOT: Unknown note type
 ; OSABI-HSA-ELF-NOT: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
-; OSABI-HSA-ELF-NOT: Unknown note type

 ; OSABI-PAL-NOT: .hsa_code_object_version
-; OSABI-PAL-NOT: .hsa_code_object_isa
+; OSABI-PAL: .hsa_code_object_isa
 ; OSABI-PAL: .amd_amdgpu_isa "amdgcn-amd-amdpal--gfx802"
 ; OSABI-PAL-NOT: .amd_amdgpu_hsa_metadata
 ; OSABI-PAL: .amd_amdgpu_pal_metadata

-; OSABI-PAL-ELF-NOT: Unknown note type
+; OSABI-PAL-ELF: Unknown note type (0x00000003)
 ; OSABI-PAL-ELF: NT_AMD_AMDGPU_ISA (ISA Version)
 ; OSABI-PAL-ELF: ISA Version:
 ; OSABI-PAL-ELF: amdgcn-amd-amdpal--gfx802
-; OSABI-PAL-ELF-NOT: Unknown note type
 ; OSABI-PAL-ELF-NOT: NT_AMD_AMDGPU_HSA_METADATA (HSA Metadata)
-; OSABI-PAL-ELF-NOT: Unknown note type
 ; OSABI-PAL-ELF: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
 ; OSABI-PAL-ELF: PAL Metadata:
 ; TODO: Following check line fails on mips:
 ; OSABI-PAL-ELF-XXX: 0x2e12,0xac02c0,0x2e13,0x80,0x1000001b,0x1,0x10000022,0x60,0x1000003e,0x0
-; OSABI-PAL-ELF-NOT: Unknown note type

 ; R600-NOT: .hsa_code_object_version
 ; R600-NOT: .hsa_code_object_isa
 ; R600-NOT: .amd_amdgpu_isa
 ; R600-NOT: .amd_amdgpu_hsa_metadata
-; R600-NOT: .amd_amdgpu_pal_metadatas
+; R600-NOT: .amd_amdgpu_pal_metadata

 define amdgpu_kernel void @elf_notes() {
   ret void