AMDHSA: Code object v3 updates

- Do not emit the following assembler directives:
  - .hsa_code_object_version
  - .hsa_code_object_isa
  - .amd_amdgpu_isa
  - .amd_amdgpu_hsa_metadata
  - .amd_amdgpu_pal_metadata
- Do not emit .note entries
- Clean up the kernel descriptor header file and bring it in sync
- Emit the kernel descriptor into .rodata with the appropriate relocations
  and alignment



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334519 91177308-0d34-0410-b5e6-96231b3b80d8
Konstantin Zhuravlyov 2018-06-12 18:02:46 +00:00
parent 0dcc1159b4
commit 299cf5ff6a
11 changed files with 469 additions and 203 deletions

@ -686,7 +686,7 @@ Symbols include the following:
*link-name* ``STT_OBJECT`` - ``.data`` Global variable
- ``.rodata``
- ``.bss``
*link-name*\ ``@kd`` ``STT_OBJECT`` - ``.rodata`` Kernel descriptor
*link-name*\ ``.kd`` ``STT_OBJECT`` - ``.rodata`` Kernel descriptor
*link-name* ``STT_FUNC`` - ``.text`` Kernel entry point
===================== ============== ============= ==================
@ -1578,7 +1578,7 @@ CP microcode requires the Kernel descriptor to be allocated on 64 byte alignment.
======= ======= =============================== ============================
Bits Size Field Name Description
======= ======= =============================== ============================
31:0 4 bytes GroupSegmentFixedSize The amount of fixed local
31:0 4 bytes GROUP_SEGMENT_FIXED_SIZE The amount of fixed local
address space memory
required for a work-group
in bytes. This does not
@ -1587,7 +1587,7 @@ CP microcode requires the Kernel descriptor to be allocated on 64 byte alignment.
space memory that may be
added when the kernel is
dispatched.
63:32 4 bytes PrivateSegmentFixedSize The amount of fixed
63:32 4 bytes PRIVATE_SEGMENT_FIXED_SIZE The amount of fixed
private address space
memory required for a
work-item in bytes. If
@ -1596,7 +1596,7 @@ CP microcode requires the Kernel descriptor to be allocated on 64 byte alignment.
be added to this value for
the call stack.
127:64 8 bytes Reserved, must be 0.
191:128 8 bytes KernelCodeEntryByteOffset Byte offset (possibly
191:128 8 bytes KERNEL_CODE_ENTRY_BYTE_OFFSET Byte offset (possibly
negative) from base
address of kernel
descriptor to kernel's
@ -1605,22 +1605,22 @@ CP microcode requires the Kernel descriptor to be allocated on 64 byte alignment.
aligned.
383:192 24 Reserved, must be 0.
bytes
415:384 4 bytes ComputePgmRsrc1 Compute Shader (CS)
415:384 4 bytes COMPUTE_PGM_RSRC1 Compute Shader (CS)
program settings used by
CP to set up
``COMPUTE_PGM_RSRC1``
configuration
register. See
:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
447:416 4 bytes ComputePgmRsrc2 Compute Shader (CS)
447:416 4 bytes COMPUTE_PGM_RSRC2 Compute Shader (CS)
program settings used by
CP to set up
``COMPUTE_PGM_RSRC2``
configuration
register. See
:ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
448 1 bit EnableSGPRPrivateSegmentBuffer Enable the setup of the
SGPR user data registers
448 1 bit ENABLE_SGPR_PRIVATE_SEGMENT Enable the setup of the
_BUFFER SGPR user data registers
(see
:ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
@ -1631,18 +1631,19 @@ CP microcode requires the Kernel descriptor to be allocated on 64 byte alignment.
``compute_pgm_rsrc2.user_sgpr.user_sgpr_count``.
Any requests beyond 16
will be ignored.
449 1 bit EnableSGPRDispatchPtr *see above*
450 1 bit EnableSGPRQueuePtr *see above*
451 1 bit EnableSGPRKernargSegmentPtr *see above*
452 1 bit EnableSGPRDispatchID *see above*
453 1 bit EnableSGPRFlatScratchInit *see above*
454 1 bit EnableSGPRPrivateSegmentSize *see above*
455 1 bit EnableSGPRGridWorkgroupCountX Not implemented in CP and
should always be 0.
456 1 bit EnableSGPRGridWorkgroupCountY Not implemented in CP and
should always be 0.
457 1 bit EnableSGPRGridWorkgroupCountZ Not implemented in CP and
should always be 0.
449 1 bit ENABLE_SGPR_DISPATCH_PTR *see above*
450 1 bit ENABLE_SGPR_QUEUE_PTR *see above*
451 1 bit ENABLE_SGPR_KERNARG_SEGMENT_PTR *see above*
452 1 bit ENABLE_SGPR_DISPATCH_ID *see above*
453 1 bit ENABLE_SGPR_FLAT_SCRATCH_INIT *see above*
454 1 bit ENABLE_SGPR_PRIVATE_SEGMENT *see above*
_SIZE
455 1 bit ENABLE_SGPR_GRID_WORKGROUP Not implemented in CP and
_COUNT_X should always be 0.
456 1 bit ENABLE_SGPR_GRID_WORKGROUP Not implemented in CP and
_COUNT_Y should always be 0.
457 1 bit ENABLE_SGPR_GRID_WORKGROUP Not implemented in CP and
_COUNT_Z should always be 0.
463:458 6 bits Reserved, must be 0.
511:464 6 Reserved, must be 0.
bytes
@ -1996,10 +1997,10 @@ CP microcode requires the Kernel descriptor to be allocated on 64 byte alignment.
====================================== ===== ==============================
Enumeration Name Value Description
====================================== ===== ==============================
AMDGPU_FLOAT_ROUND_MODE_NEAR_EVEN 0 Round Ties To Even
AMDGPU_FLOAT_ROUND_MODE_PLUS_INFINITY 1 Round Toward +infinity
AMDGPU_FLOAT_ROUND_MODE_MINUS_INFINITY 2 Round Toward -infinity
AMDGPU_FLOAT_ROUND_MODE_ZERO 3 Round Toward 0
FLOAT_ROUND_MODE_NEAR_EVEN 0 Round Ties To Even
FLOAT_ROUND_MODE_PLUS_INFINITY 1 Round Toward +infinity
FLOAT_ROUND_MODE_MINUS_INFINITY 2 Round Toward -infinity
FLOAT_ROUND_MODE_ZERO 3 Round Toward 0
====================================== ===== ==============================
..
@ -2010,11 +2011,11 @@ CP microcode requires the Kernel descriptor to be allocated on 64 byte alignment.
====================================== ===== ==============================
Enumeration Name Value Description
====================================== ===== ==============================
AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC_DST 0 Flush Source and Destination
FLOAT_DENORM_MODE_FLUSH_SRC_DST 0 Flush Source and Destination
Denorms
AMDGPU_FLOAT_DENORM_MODE_FLUSH_DST 1 Flush Output Denorms
AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC 2 Flush Source Denorms
AMDGPU_FLOAT_DENORM_MODE_FLUSH_NONE 3 No Flush
FLOAT_DENORM_MODE_FLUSH_DST 1 Flush Output Denorms
FLOAT_DENORM_MODE_FLUSH_SRC 2 Flush Source Denorms
FLOAT_DENORM_MODE_FLUSH_NONE 3 No Flush
====================================== ===== ==============================
..
@ -2025,13 +2026,13 @@ CP microcode requires the Kernel descriptor to be allocated on 64 byte alignment.
======================================== ===== ============================
Enumeration Name Value Description
======================================== ===== ============================
AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X 0 Set work-item X dimension
SYSTEM_VGPR_WORKITEM_ID_X 0 Set work-item X dimension
ID.
AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y 1 Set work-item X and Y
SYSTEM_VGPR_WORKITEM_ID_X_Y 1 Set work-item X and Y
dimensions ID.
AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z 2 Set work-item X, Y and Z
SYSTEM_VGPR_WORKITEM_ID_X_Y_Z 2 Set work-item X, Y and Z
dimensions ID.
AMDGPU_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED 3 Undefined.
SYSTEM_VGPR_WORKITEM_ID_UNDEFINED 3 Undefined.
======================================== ===== ============================
.. _amdgpu-amdhsa-initial-kernel-execution-state:

@ -1,139 +0,0 @@
//===--- AMDGPUKernelDescriptor.h -------------------------------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
/// \file
/// AMDGPU kernel descriptor definitions. For more information, visit
/// https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor-for-gfx6-gfx9
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H
#define LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H
#include <cstdint>
// Creates enumeration entries used for packing bits into integers. Enumeration
// entries include bit shift amount, bit width, and bit mask.
#define AMDGPU_BITS_ENUM_ENTRY(name, shift, width) \
name ## _SHIFT = (shift), \
name ## _WIDTH = (width), \
name = (((1 << (width)) - 1) << (shift)) \
// Gets bits for specified bit mask from specified source.
#define AMDGPU_BITS_GET(src, mask) \
((src & mask) >> mask ## _SHIFT) \
// Sets bits for specified bit mask in specified destination.
#define AMDGPU_BITS_SET(dst, mask, val) \
dst &= (~(1 << mask ## _SHIFT) & ~mask); \
dst |= (((val) << mask ## _SHIFT) & mask) \
namespace llvm {
namespace AMDGPU {
namespace HSAKD {
/// Floating point rounding modes.
enum : uint8_t {
AMDGPU_FLOAT_ROUND_MODE_NEAR_EVEN = 0,
AMDGPU_FLOAT_ROUND_MODE_PLUS_INFINITY = 1,
AMDGPU_FLOAT_ROUND_MODE_MINUS_INFINITY = 2,
AMDGPU_FLOAT_ROUND_MODE_ZERO = 3,
};
/// Floating point denorm modes.
enum : uint8_t {
AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC_DST = 0,
AMDGPU_FLOAT_DENORM_MODE_FLUSH_DST = 1,
AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC = 2,
AMDGPU_FLOAT_DENORM_MODE_FLUSH_NONE = 3,
};
/// System VGPR workitem IDs.
enum : uint8_t {
AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X = 0,
AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y = 1,
AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z = 2,
AMDGPU_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED = 3,
};
/// Compute program resource register one layout.
enum ComputePgmRsrc1 {
AMDGPU_BITS_ENUM_ENTRY(GRANULATED_WORKITEM_VGPR_COUNT, 0, 6),
AMDGPU_BITS_ENUM_ENTRY(GRANULATED_WAVEFRONT_SGPR_COUNT, 6, 4),
AMDGPU_BITS_ENUM_ENTRY(PRIORITY, 10, 2),
AMDGPU_BITS_ENUM_ENTRY(FLOAT_ROUND_MODE_32, 12, 2),
AMDGPU_BITS_ENUM_ENTRY(FLOAT_ROUND_MODE_16_64, 14, 2),
AMDGPU_BITS_ENUM_ENTRY(FLOAT_DENORM_MODE_32, 16, 2),
AMDGPU_BITS_ENUM_ENTRY(FLOAT_DENORM_MODE_16_64, 18, 2),
AMDGPU_BITS_ENUM_ENTRY(PRIV, 20, 1),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_DX10_CLAMP, 21, 1),
AMDGPU_BITS_ENUM_ENTRY(DEBUG_MODE, 22, 1),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_IEEE_MODE, 23, 1),
AMDGPU_BITS_ENUM_ENTRY(BULKY, 24, 1),
AMDGPU_BITS_ENUM_ENTRY(CDBG_USER, 25, 1),
AMDGPU_BITS_ENUM_ENTRY(FP16_OVFL, 26, 1),
AMDGPU_BITS_ENUM_ENTRY(RESERVED0, 27, 5),
};
/// Compute program resource register two layout.
enum ComputePgmRsrc2 {
AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_PRIVATE_SEGMENT_WAVE_OFFSET, 0, 1),
AMDGPU_BITS_ENUM_ENTRY(USER_SGPR_COUNT, 1, 5),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_TRAP_HANDLER, 6, 1),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_X, 7, 1),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_Y, 8, 1),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_Z, 9, 1),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_INFO, 10, 1),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_VGPR_WORKITEM_ID, 11, 2),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_ADDRESS_WATCH, 13, 1),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_MEMORY, 14, 1),
AMDGPU_BITS_ENUM_ENTRY(GRANULATED_LDS_SIZE, 15, 9),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION, 24, 1),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_FP_DENORMAL_SOURCE, 25, 1),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO, 26, 1),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW, 27, 1),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW, 28, 1),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_INEXACT, 29, 1),
AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO, 30, 1),
AMDGPU_BITS_ENUM_ENTRY(RESERVED1, 31, 1),
};
/// Kernel descriptor layout. This layout should be kept backwards
/// compatible as it is consumed by the command processor.
struct KernelDescriptor final {
uint32_t GroupSegmentFixedSize;
uint32_t PrivateSegmentFixedSize;
uint32_t MaxFlatWorkGroupSize;
uint64_t IsDynamicCallStack : 1;
uint64_t IsXNACKEnabled : 1;
uint64_t Reserved0 : 30;
int64_t KernelCodeEntryByteOffset;
uint64_t Reserved1[3];
uint32_t ComputePgmRsrc1;
uint32_t ComputePgmRsrc2;
uint64_t EnableSGPRPrivateSegmentBuffer : 1;
uint64_t EnableSGPRDispatchPtr : 1;
uint64_t EnableSGPRQueuePtr : 1;
uint64_t EnableSGPRKernargSegmentPtr : 1;
uint64_t EnableSGPRDispatchID : 1;
uint64_t EnableSGPRFlatScratchInit : 1;
uint64_t EnableSGPRPrivateSegmentSize : 1;
uint64_t EnableSGPRGridWorkgroupCountX : 1;
uint64_t EnableSGPRGridWorkgroupCountY : 1;
uint64_t EnableSGPRGridWorkgroupCountZ : 1;
uint64_t Reserved2 : 54;
KernelDescriptor() = default;
};
} // end namespace HSAKD
} // end namespace AMDGPU
} // end namespace llvm
#endif // LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H

@ -0,0 +1,187 @@
//===--- AMDHSAKernelDescriptor.h -----------------------------*- C++ -*---===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
/// \file
/// AMDHSA kernel descriptor definitions. For more information, visit
/// https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H
#define LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H
#include <cstdint>
// Gets offset of specified member in specified type.
#ifndef offsetof
#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE*)0)->MEMBER)
#endif // offsetof
// Creates enumeration entries used for packing bits into integers. Enumeration
// entries include bit shift amount, bit width, and bit mask.
#ifndef AMDHSA_BITS_ENUM_ENTRY
#define AMDHSA_BITS_ENUM_ENTRY(NAME, SHIFT, WIDTH) \
NAME ## _SHIFT = (SHIFT), \
NAME ## _WIDTH = (WIDTH), \
NAME = (((1 << (WIDTH)) - 1) << (SHIFT))
#endif // AMDHSA_BITS_ENUM_ENTRY
// Gets bits for specified bit mask from specified source.
#ifndef AMDHSA_BITS_GET
#define AMDHSA_BITS_GET(SRC, MSK) ((SRC & MSK) >> MSK ## _SHIFT)
#endif // AMDHSA_BITS_GET
// Sets bits for specified bit mask in specified destination.
#ifndef AMDHSA_BITS_SET
#define AMDHSA_BITS_SET(DST, MSK, VAL) \
DST &= ~MSK; \
DST |= ((VAL << MSK ## _SHIFT) & MSK)
#endif // AMDHSA_BITS_SET
namespace llvm {
namespace amdhsa {
// Floating point rounding modes. Must be kept backwards compatible.
enum : uint8_t {
FLOAT_ROUND_MODE_NEAR_EVEN = 0,
FLOAT_ROUND_MODE_PLUS_INFINITY = 1,
FLOAT_ROUND_MODE_MINUS_INFINITY = 2,
FLOAT_ROUND_MODE_ZERO = 3,
};
// Floating point denorm modes. Must be kept backwards compatible.
enum : uint8_t {
FLOAT_DENORM_MODE_FLUSH_SRC_DST = 0,
FLOAT_DENORM_MODE_FLUSH_DST = 1,
FLOAT_DENORM_MODE_FLUSH_SRC = 2,
FLOAT_DENORM_MODE_FLUSH_NONE = 3,
};
// System VGPR workitem IDs. Must be kept backwards compatible.
enum : uint8_t {
SYSTEM_VGPR_WORKITEM_ID_X = 0,
SYSTEM_VGPR_WORKITEM_ID_X_Y = 1,
SYSTEM_VGPR_WORKITEM_ID_X_Y_Z = 2,
SYSTEM_VGPR_WORKITEM_ID_UNDEFINED = 3,
};
// Compute program resource register 1. Must be kept backwards compatible.
#define COMPUTE_PGM_RSRC1(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_ ## NAME, SHIFT, WIDTH)
enum : int32_t {
COMPUTE_PGM_RSRC1(GRANULATED_WORKITEM_VGPR_COUNT, 0, 6),
COMPUTE_PGM_RSRC1(GRANULATED_WAVEFRONT_SGPR_COUNT, 6, 4),
COMPUTE_PGM_RSRC1(PRIORITY, 10, 2),
COMPUTE_PGM_RSRC1(FLOAT_ROUND_MODE_32, 12, 2),
COMPUTE_PGM_RSRC1(FLOAT_ROUND_MODE_16_64, 14, 2),
COMPUTE_PGM_RSRC1(FLOAT_DENORM_MODE_32, 16, 2),
COMPUTE_PGM_RSRC1(FLOAT_DENORM_MODE_16_64, 18, 2),
COMPUTE_PGM_RSRC1(PRIV, 20, 1),
COMPUTE_PGM_RSRC1(ENABLE_DX10_CLAMP, 21, 1),
COMPUTE_PGM_RSRC1(DEBUG_MODE, 22, 1),
COMPUTE_PGM_RSRC1(ENABLE_IEEE_MODE, 23, 1),
COMPUTE_PGM_RSRC1(BULKY, 24, 1),
COMPUTE_PGM_RSRC1(CDBG_USER, 25, 1),
COMPUTE_PGM_RSRC1(FP16_OVFL, 26, 1),
COMPUTE_PGM_RSRC1(RESERVED, 27, 5),
};
#undef COMPUTE_PGM_RSRC1
// Compute program resource register 2. Must be kept backwards compatible.
#define COMPUTE_PGM_RSRC2(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC2_ ## NAME, SHIFT, WIDTH)
enum : int32_t {
COMPUTE_PGM_RSRC2(ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET, 0, 1),
COMPUTE_PGM_RSRC2(USER_SGPR_COUNT, 1, 5),
COMPUTE_PGM_RSRC2(ENABLE_TRAP_HANDLER, 6, 1),
COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_X, 7, 1),
COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Y, 8, 1),
COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Z, 9, 1),
COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_INFO, 10, 1),
COMPUTE_PGM_RSRC2(ENABLE_VGPR_WORKITEM_ID, 11, 2),
COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_ADDRESS_WATCH, 13, 1),
COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_MEMORY, 14, 1),
COMPUTE_PGM_RSRC2(GRANULATED_LDS_SIZE, 15, 9),
COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION, 24, 1),
COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_FP_DENORMAL_SOURCE, 25, 1),
COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO, 26, 1),
COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW, 27, 1),
COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW, 28, 1),
COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_INEXACT, 29, 1),
COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO, 30, 1),
COMPUTE_PGM_RSRC2(RESERVED, 31, 1),
};
#undef COMPUTE_PGM_RSRC2
// Kernel code properties. Must be kept backwards compatible.
#define KERNEL_CODE_PROPERTY(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(KERNEL_CODE_PROPERTY_ ## NAME, SHIFT, WIDTH)
enum : int32_t {
KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER, 0, 1),
KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_PTR, 1, 1),
KERNEL_CODE_PROPERTY(ENABLE_SGPR_QUEUE_PTR, 2, 1),
KERNEL_CODE_PROPERTY(ENABLE_SGPR_KERNARG_SEGMENT_PTR, 3, 1),
KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_ID, 4, 1),
KERNEL_CODE_PROPERTY(ENABLE_SGPR_FLAT_SCRATCH_INIT, 5, 1),
KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_SIZE, 6, 1),
KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_X, 7, 1),
KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y, 8, 1),
KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z, 9, 1),
KERNEL_CODE_PROPERTY(RESERVED, 10, 6),
};
#undef KERNEL_CODE_PROPERTY
// Kernel descriptor. Must be kept backwards compatible.
struct kernel_descriptor_t {
uint32_t group_segment_fixed_size;
uint32_t private_segment_fixed_size;
uint8_t reserved0[8];
int64_t kernel_code_entry_byte_offset;
uint8_t reserved1[24];
uint32_t compute_pgm_rsrc1;
uint32_t compute_pgm_rsrc2;
uint16_t kernel_code_properties;
uint8_t reserved2[6];
};
static_assert(
sizeof(kernel_descriptor_t) == 64,
"invalid size for kernel_descriptor_t");
static_assert(
offsetof(kernel_descriptor_t, group_segment_fixed_size) == 0,
"invalid offset for group_segment_fixed_size");
static_assert(
offsetof(kernel_descriptor_t, private_segment_fixed_size) == 4,
"invalid offset for private_segment_fixed_size");
static_assert(
offsetof(kernel_descriptor_t, reserved0) == 8,
"invalid offset for reserved0");
static_assert(
offsetof(kernel_descriptor_t, kernel_code_entry_byte_offset) == 16,
"invalid offset for kernel_code_entry_byte_offset");
static_assert(
offsetof(kernel_descriptor_t, reserved1) == 24,
"invalid offset for reserved1");
static_assert(
offsetof(kernel_descriptor_t, compute_pgm_rsrc1) == 48,
"invalid offset for compute_pgm_rsrc1");
static_assert(
offsetof(kernel_descriptor_t, compute_pgm_rsrc2) == 52,
"invalid offset for compute_pgm_rsrc2");
static_assert(
offsetof(kernel_descriptor_t, kernel_code_properties) == 56,
"invalid offset for kernel_code_properties");
static_assert(
offsetof(kernel_descriptor_t, reserved2) == 58,
"invalid offset for reserved2");
} // end namespace amdhsa
} // end namespace llvm
#endif // LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H
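
The shift, width, and mask triplets that `AMDHSA_BITS_ENUM_ENTRY` generates can be illustrated with a minimal sketch. It hand-expands one entry and uses function forms of `AMDHSA_BITS_GET` and `AMDHSA_BITS_SET`. The `sketch` namespace and the helpers `getBits` and `setBits` are hypothetical names chosen for illustration and are not part of the header.

```cpp
#include <cstdint>

namespace sketch {

// Hand-expanded equivalent of
//   AMDHSA_BITS_ENUM_ENTRY(FLOAT_ROUND_MODE_32, 12, 2)
// yielding a shift amount, a bit width, and the mask derived from both.
enum : int32_t {
  FLOAT_ROUND_MODE_32_SHIFT = 12,
  FLOAT_ROUND_MODE_32_WIDTH = 2,
  FLOAT_ROUND_MODE_32 =
      ((1 << FLOAT_ROUND_MODE_32_WIDTH) - 1) << FLOAT_ROUND_MODE_32_SHIFT,
};

// Function form of AMDHSA_BITS_GET (hypothetical helper): isolate the field
// with the mask, then shift it down to bit 0.
constexpr uint32_t getBits(uint32_t Src, uint32_t Mask, uint32_t Shift) {
  return (Src & Mask) >> Shift;
}

// Function form of AMDHSA_BITS_SET (hypothetical helper): clear the field,
// then OR in the new value, truncated to the field width by the mask.
constexpr uint32_t setBits(uint32_t Dst, uint32_t Mask, uint32_t Shift,
                           uint32_t Val) {
  return (Dst & ~Mask) | ((Val << Shift) & Mask);
}

// Value 3 corresponds to FLOAT_ROUND_MODE_ZERO in the header.
constexpr uint32_t RoundTowardZero = 3;

// Start from an all-zero register image and set only the 2-bit field.
constexpr uint32_t Rsrc1 = setBits(0, FLOAT_ROUND_MODE_32,
                                   FLOAT_ROUND_MODE_32_SHIFT, RoundTowardZero);

static_assert(FLOAT_ROUND_MODE_32 == 0x3000, "mask covers bits 13:12");
static_assert(Rsrc1 == 0x3000u, "field value 3 lands in bits 13:12");
static_assert(getBits(Rsrc1, FLOAT_ROUND_MODE_32,
                      FLOAT_ROUND_MODE_32_SHIFT) == RoundTowardZero,
              "reading the field back returns the stored value");

} // end namespace sketch
```

Because the helpers are `constexpr`, the round trip is checked at compile time; the macros in the header perform the same masking on ordinary lvalues.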

@ -116,6 +116,10 @@ AMDGPUTargetStreamer* AMDGPUAsmPrinter::getTargetStreamer() const {
}
void AMDGPUAsmPrinter::EmitStartOfAsmFile(Module &M) {
if (IsaInfo::hasCodeObjectV3(getSTI()) &&
TM.getTargetTriple().getOS() == Triple::AMDHSA)
return;
if (TM.getTargetTriple().getOS() != Triple::AMDHSA &&
TM.getTargetTriple().getOS() != Triple::AMDPAL)
return;
@ -126,10 +130,6 @@ void AMDGPUAsmPrinter::EmitStartOfAsmFile(Module &M) {
if (TM.getTargetTriple().getOS() == Triple::AMDPAL)
readPALMetadata(M);
// Deprecated notes are not emitted for code object v3.
if (IsaInfo::hasCodeObjectV3(getSTI()->getFeatureBits()))
return;
// HSA emits NT_AMDGPU_HSA_CODE_OBJECT_VERSION for code objects v2.
if (TM.getTargetTriple().getOS() == Triple::AMDHSA)
getTargetStreamer()->EmitDirectiveHSACodeObjectVersion(2, 1);
@ -141,6 +141,10 @@ void AMDGPUAsmPrinter::EmitStartOfAsmFile(Module &M) {
}
void AMDGPUAsmPrinter::EmitEndOfAsmFile(Module &M) {
// TODO: Add metadata to code object v3.
if (IsaInfo::hasCodeObjectV3(getSTI()) &&
TM.getTargetTriple().getOS() == Triple::AMDHSA)
return;
// Following code requires TargetStreamer to be present.
if (!getTargetStreamer())
@ -186,8 +190,11 @@ bool AMDGPUAsmPrinter::isBlockOnlyReachableByFallthrough(
}
void AMDGPUAsmPrinter::EmitFunctionBodyStart() {
const AMDGPUMachineFunction *MFI = MF->getInfo<AMDGPUMachineFunction>();
if (!MFI->isEntryFunction())
const SIMachineFunctionInfo &MFI = *MF->getInfo<SIMachineFunctionInfo>();
if (!MFI.isEntryFunction())
return;
if (IsaInfo::hasCodeObjectV3(getSTI()) &&
TM.getTargetTriple().getOS() == Triple::AMDHSA)
return;
const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();
@ -205,7 +212,27 @@ void AMDGPUAsmPrinter::EmitFunctionBodyStart() {
getHSADebugProps(*MF, CurrentProgramInfo));
}
void AMDGPUAsmPrinter::EmitFunctionBodyEnd() {
const SIMachineFunctionInfo &MFI = *MF->getInfo<SIMachineFunctionInfo>();
if (!MFI.isEntryFunction())
return;
if (!IsaInfo::hasCodeObjectV3(getSTI()) ||
TM.getTargetTriple().getOS() != Triple::AMDHSA)
return;
SmallString<128> KernelName;
getNameWithPrefix(KernelName, &MF->getFunction());
getTargetStreamer()->EmitAmdhsaKernelDescriptor(
KernelName, getAmdhsaKernelDescriptor(*MF, CurrentProgramInfo));
}
void AMDGPUAsmPrinter::EmitFunctionEntryLabel() {
if (IsaInfo::hasCodeObjectV3(getSTI()) &&
TM.getTargetTriple().getOS() == Triple::AMDHSA) {
AsmPrinter::EmitFunctionEntryLabel();
return;
}
const SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();
if (MFI->isEntryFunction() && STM.isAmdCodeObjectV2(MF->getFunction())) {
@ -288,6 +315,70 @@ void AMDGPUAsmPrinter::emitCommonFunctionComments(
false);
}
uint16_t AMDGPUAsmPrinter::getAmdhsaKernelCodeProperties(
const MachineFunction &MF) const {
const SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
uint16_t KernelCodeProperties = 0;
if (MFI.hasPrivateSegmentBuffer()) {
KernelCodeProperties |=
amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER;
}
if (MFI.hasDispatchPtr()) {
KernelCodeProperties |=
amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR;
}
if (MFI.hasQueuePtr()) {
KernelCodeProperties |=
amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR;
}
if (MFI.hasKernargSegmentPtr()) {
KernelCodeProperties |=
amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR;
}
if (MFI.hasDispatchID()) {
KernelCodeProperties |=
amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID;
}
if (MFI.hasFlatScratchInit()) {
KernelCodeProperties |=
amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT;
}
if (MFI.hasGridWorkgroupCountX()) {
KernelCodeProperties |=
amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_X;
}
if (MFI.hasGridWorkgroupCountY()) {
KernelCodeProperties |=
amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y;
}
if (MFI.hasGridWorkgroupCountZ()) {
KernelCodeProperties |=
amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z;
}
return KernelCodeProperties;
}
amdhsa::kernel_descriptor_t AMDGPUAsmPrinter::getAmdhsaKernelDescriptor(
const MachineFunction &MF,
const SIProgramInfo &PI) const {
amdhsa::kernel_descriptor_t KernelDescriptor;
memset(&KernelDescriptor, 0x0, sizeof(KernelDescriptor));
assert(isUInt<32>(PI.ScratchSize));
assert(isUInt<32>(PI.ComputePGMRSrc1));
assert(isUInt<32>(PI.ComputePGMRSrc2));
KernelDescriptor.group_segment_fixed_size = PI.LDSSize;
KernelDescriptor.private_segment_fixed_size = PI.ScratchSize;
KernelDescriptor.compute_pgm_rsrc1 = PI.ComputePGMRSrc1;
KernelDescriptor.compute_pgm_rsrc2 = PI.ComputePGMRSrc2;
KernelDescriptor.kernel_code_properties = getAmdhsaKernelCodeProperties(MF);
return KernelDescriptor;
}
bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
CurrentProgramInfo = SIProgramInfo();

@ -20,6 +20,7 @@
#include "MCTargetDesc/AMDGPUHSAMetadataStreamer.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/CodeGen/AsmPrinter.h"
#include "llvm/Support/AMDHSAKernelDescriptor.h"
#include <cstddef>
#include <cstdint>
#include <limits>
@ -148,6 +149,13 @@ private:
uint64_t CodeSize,
const AMDGPUMachineFunction* MFI);
uint16_t getAmdhsaKernelCodeProperties(
const MachineFunction &MF) const;
amdhsa::kernel_descriptor_t getAmdhsaKernelDescriptor(
const MachineFunction &MF,
const SIProgramInfo &PI) const;
public:
explicit AMDGPUAsmPrinter(TargetMachine &TM,
std::unique_ptr<MCStreamer> Streamer);
@ -180,6 +188,8 @@ public:
void EmitFunctionBodyStart() override;
void EmitFunctionBodyEnd() override;
void EmitFunctionEntryLabel() override;
void EmitBasicBlockStart(const MachineBasicBlock &MBB) const override;

@ -196,6 +196,12 @@ bool AMDGPUTargetAsmStreamer::EmitPALMetadata(
return true;
}
void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
StringRef KernelName,
const amdhsa::kernel_descriptor_t &KernelDescriptor) {
// FIXME: not supported yet.
}
//===----------------------------------------------------------------------===//
// AMDGPUTargetELFStreamer
//===----------------------------------------------------------------------===//
@ -362,3 +368,57 @@ bool AMDGPUTargetELFStreamer::EmitPALMetadata(
);
return true;
}
void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
StringRef KernelName,
const amdhsa::kernel_descriptor_t &KernelDescriptor) {
auto &Streamer = getStreamer();
auto &Context = Streamer.getContext();
auto &ObjectFileInfo = *Context.getObjectFileInfo();
auto &ReadOnlySection = *ObjectFileInfo.getReadOnlySection();
Streamer.PushSection();
Streamer.SwitchSection(&ReadOnlySection);
// CP microcode requires the kernel descriptor to be allocated on 64 byte
// alignment.
Streamer.EmitValueToAlignment(64, 0, 1, 0);
if (ReadOnlySection.getAlignment() < 64)
ReadOnlySection.setAlignment(64);
MCSymbolELF *KernelDescriptorSymbol = cast<MCSymbolELF>(
Context.getOrCreateSymbol(Twine(KernelName) + Twine(".kd")));
KernelDescriptorSymbol->setBinding(ELF::STB_GLOBAL);
KernelDescriptorSymbol->setType(ELF::STT_OBJECT);
KernelDescriptorSymbol->setSize(
MCConstantExpr::create(sizeof(KernelDescriptor), Context));
MCSymbolELF *KernelCodeSymbol = cast<MCSymbolELF>(
Context.getOrCreateSymbol(Twine(KernelName)));
KernelCodeSymbol->setBinding(ELF::STB_LOCAL);
Streamer.EmitLabel(KernelDescriptorSymbol);
Streamer.EmitBytes(StringRef(
(const char*)&(KernelDescriptor),
offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset)));
// FIXME: Remove the use of VK_AMDGPU_REL64 in the expression below. The
// expression being created is:
// (start of kernel code) - (start of kernel descriptor)
// It implies R_AMDGPU_REL64, but ends up being R_AMDGPU_ABS64.
Streamer.EmitValue(MCBinaryExpr::createSub(
MCSymbolRefExpr::create(
KernelCodeSymbol, MCSymbolRefExpr::VK_AMDGPU_REL64, Context),
MCSymbolRefExpr::create(
KernelDescriptorSymbol, MCSymbolRefExpr::VK_None, Context),
Context),
sizeof(KernelDescriptor.kernel_code_entry_byte_offset));
Streamer.EmitBytes(StringRef(
(const char*)&(KernelDescriptor) +
offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset) +
sizeof(KernelDescriptor.kernel_code_entry_byte_offset),
sizeof(KernelDescriptor) -
offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset) -
sizeof(KernelDescriptor.kernel_code_entry_byte_offset)));
Streamer.PopSection();
}
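
`EmitAmdhsaKernelDescriptor` splits the 64-byte descriptor around `kernel_code_entry_byte_offset`: the bytes before the field are emitted raw, the field itself is emitted as a symbol difference, and the remaining bytes are emitted raw again. A minimal sketch of that byte-range arithmetic follows. The struct is copied from `AMDHSAKernelDescriptor.h` in this commit; the `sketch` namespace and the constant names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

// Same member layout as amdhsa::kernel_descriptor_t in this commit.
struct kernel_descriptor_t {
  uint32_t group_segment_fixed_size;
  uint32_t private_segment_fixed_size;
  uint8_t reserved0[8];
  int64_t kernel_code_entry_byte_offset;
  uint8_t reserved1[24];
  uint32_t compute_pgm_rsrc1;
  uint32_t compute_pgm_rsrc2;
  uint16_t kernel_code_properties;
  uint8_t reserved2[6];
};

// Raw bytes emitted before the entry-offset field.
constexpr std::size_t PrefixSize =
    offsetof(kernel_descriptor_t, kernel_code_entry_byte_offset);

// The entry-offset field, emitted as an expression instead of raw bytes.
constexpr std::size_t EntrySize =
    sizeof(kernel_descriptor_t::kernel_code_entry_byte_offset);

// Raw bytes emitted after the entry-offset field.
constexpr std::size_t SuffixOffset = PrefixSize + EntrySize;
constexpr std::size_t SuffixSize = sizeof(kernel_descriptor_t) - SuffixOffset;

// The three pieces must tile the descriptor exactly.
static_assert(PrefixSize + EntrySize + SuffixSize ==
                  sizeof(kernel_descriptor_t),
              "pieces must cover the whole descriptor");

} // end namespace sketch
```

With this layout the pieces come out as 16, 8, and 40 bytes, which matches the `static_assert` offsets in the header.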

@ -14,6 +14,7 @@
#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Support/AMDGPUMetadata.h"
#include "llvm/Support/AMDHSAKernelDescriptor.h"
namespace llvm {
#include "AMDGPUPTNote.h"
@ -62,6 +63,10 @@ public:
/// \returns True on success, false on failure.
virtual bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) = 0;
virtual void EmitAmdhsaKernelDescriptor(
StringRef KernelName,
const amdhsa::kernel_descriptor_t &KernelDescriptor) = 0;
};
class AMDGPUTargetAsmStreamer final : public AMDGPUTargetStreamer {
@ -87,6 +92,10 @@ public:
/// \returns True on success, false on failure.
bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) override;
void EmitAmdhsaKernelDescriptor(
StringRef KernelName,
const amdhsa::kernel_descriptor_t &KernelDescriptor) override;
};
class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {
@ -119,6 +128,10 @@ public:
/// \returns True on success, false on failure.
bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) override;
void EmitAmdhsaKernelDescriptor(
StringRef KernelName,
const amdhsa::kernel_descriptor_t &KernelDescriptor) override;
};
}

@ -248,8 +248,8 @@ void streamIsaVersion(const MCSubtargetInfo *STI, raw_ostream &Stream) {
Stream.flush();
}
bool hasCodeObjectV3(const FeatureBitset &Features) {
return Features.test(FeatureCodeObjectV3);
bool hasCodeObjectV3(const MCSubtargetInfo *STI) {
return STI->getFeatureBits().test(FeatureCodeObjectV3);
}
unsigned getWavefrontSize(const FeatureBitset &Features) {

@ -59,9 +59,9 @@ IsaVersion getIsaVersion(const FeatureBitset &Features);
/// Streams isa version string for given subtarget \p STI into \p Stream.
void streamIsaVersion(const MCSubtargetInfo *STI, raw_ostream &Stream);
/// \returns True if given subtarget \p Features support code object version 3,
/// \returns True if given subtarget \p STI supports code object version 3,
/// false otherwise.
bool hasCodeObjectV3(const FeatureBitset &Features);
bool hasCodeObjectV3(const MCSubtargetInfo *STI);
/// \returns Wavefront size for given subtarget \p Features.
unsigned getWavefrontSize(const FeatureBitset &Features);

@ -0,0 +1,48 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=+code-object-v3 < %s | FileCheck --check-prefixes=ALL-ASM,OSABI-AMDHSA-ASM %s
; RUN: llc -filetype=obj -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=+code-object-v3 < %s | llvm-readobj -elf-output-style=GNU -notes -relocations -sections -symbols | FileCheck --check-prefixes=ALL-ELF,OSABI-AMDHSA-ELF %s
; OSABI-AMDHSA-ASM-NOT: .hsa_code_object_version
; OSABI-AMDHSA-ASM-NOT: .hsa_code_object_isa
; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_isa
; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_hsa_metadata
; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_pal_metadata
; OSABI-AMDHSA-ELF: Section Headers
; OSABI-AMDHSA-ELF: .text PROGBITS {{[0-9]+}} {{[0-9]+}} {{[0-9a-f]+}} {{[0-9]+}} AX {{[0-9]+}} {{[0-9]+}} 256
; OSABI-AMDHSA-ELF: .rodata PROGBITS {{[0-9]+}} {{[0-9]+}} {{[0-9a-f]+}} {{[0-9]+}} A {{[0-9]+}} {{[0-9]+}} 64
; OSABI-AMDHSA-ELF: Relocation section '.rela.rodata' at offset
; OSABI-AMDHSA-ELF: 0000000000000010 0000000300000005 R_AMDGPU_REL64 0000000000000000 .text + 10
; OSABI-AMDHSA-ELF: 0000000000000050 0000000300000005 R_AMDGPU_REL64 0000000000000000 .text + 110
; OSABI-AMDHSA-ELF: Symbol table '.symtab' contains {{[0-9]+}} entries
; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000000 {{[0-9]+}} FUNC LOCAL DEFAULT {{[0-9]+}} fadd
; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000100 {{[0-9]+}} FUNC LOCAL DEFAULT {{[0-9]+}} fsub
; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000000 64 OBJECT GLOBAL DEFAULT {{[0-9]+}} fadd.kd
; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000040 64 OBJECT GLOBAL DEFAULT {{[0-9]+}} fsub.kd
; OSABI-AMDHSA-ELF-NOT: Displaying notes found
define amdgpu_kernel void @fadd(
float addrspace(1)* %r,
float addrspace(1)* %a,
float addrspace(1)* %b) {
entry:
%a.val = load float, float addrspace(1)* %a
%b.val = load float, float addrspace(1)* %b
%r.val = fadd float %a.val, %b.val
store float %r.val, float addrspace(1)* %r
ret void
}
define amdgpu_kernel void @fsub(
float addrspace(1)* %r,
float addrspace(1)* %a,
float addrspace(1)* %b) {
entry:
%a.val = load float, float addrspace(1)* %a
%b.val = load float, float addrspace(1)* %b
%r.val = fsub float %a.val, %b.val
store float %r.val, float addrspace(1)* %r
ret void
}
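
The relocation offsets `0x10` and `0x50` checked in this test follow from the descriptor layout: each descriptor occupies 64 bytes in `.rodata`, and `kernel_code_entry_byte_offset` sits 16 bytes into it. A small sketch of that arithmetic, assuming the descriptors are laid out back to back starting at offset 0, as the `fadd.kd` and `fsub.kd` symbol values suggest. The helper names are hypothetical.

```cpp
#include <cstdint>

namespace sketch {

// Size of one kernel descriptor and the offset of its
// kernel_code_entry_byte_offset field, as asserted in the header.
constexpr uint64_t DescriptorSize = 64;
constexpr uint64_t EntryFieldOffset = 16;

// Offset in .rodata of the Index-th descriptor (hypothetical helper),
// assuming descriptors are packed back to back from offset 0.
constexpr uint64_t descriptorOffset(uint64_t Index) {
  return Index * DescriptorSize;
}

// Offset in .rodata at which the relocation for the Index-th descriptor
// is applied (hypothetical helper).
constexpr uint64_t entryFieldOffset(uint64_t Index) {
  return descriptorOffset(Index) + EntryFieldOffset;
}

static_assert(descriptorOffset(0) == 0x00, "fadd.kd symbol value");
static_assert(descriptorOffset(1) == 0x40, "fsub.kd symbol value");
static_assert(entryFieldOffset(0) == 0x10, "first R_AMDGPU_REL64 offset");
static_assert(entryFieldOffset(1) == 0x50, "second R_AMDGPU_REL64 offset");

} // end namespace sketch
```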

@ -1,13 +1,13 @@
; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=iceland -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK-ELF --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=iceland -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA-ELF --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=iceland -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL-ELF --check-prefix=GFX802 %s
; RUN: llc -march=r600 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=R600 %s
; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK-ELF --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA-ELF --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL-ELF --check-prefix=GFX802 %s
; RUN: llc -march=r600 < %s | FileCheck --check-prefix=R600 %s
; OSABI-UNK-NOT: .hsa_code_object_version
; OSABI-UNK-NOT: .hsa_code_object_isa
@ -25,17 +25,17 @@
; OSABI-UNK-ELF-NOT: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
; OSABI-UNK-ELF-NOT: Unknown note type
; OSABI-HSA-NOT: .hsa_code_object_version
; OSABI-HSA-NOT: .hsa_code_object_isa
; OSABI-HSA: .hsa_code_object_version
; OSABI-HSA: .hsa_code_object_isa
; OSABI-HSA: .amd_amdgpu_isa "amdgcn-amd-amdhsa--gfx802"
; OSABI-HSA: .amd_amdgpu_hsa_metadata
; OSABI-HSA-NOT: .amd_amdgpu_pal_metadata
; OSABI-HSA-ELF-NOT: Unknown note type
; OSABI-HSA-ELF: Unknown note type (0x00000001)
; OSABI-HSA-ELF: Unknown note type (0x00000003)
; OSABI-HSA-ELF: NT_AMD_AMDGPU_ISA (ISA Version)
; OSABI-HSA-ELF: ISA Version:
; OSABI-HSA-ELF: amdgcn-amd-amdhsa--gfx802
; OSABI-HSA-ELF-NOT: Unknown note type
; OSABI-HSA-ELF: NT_AMD_AMDGPU_HSA_METADATA (HSA Metadata)
; OSABI-HSA-ELF: HSA Metadata:
; OSABI-HSA-ELF: ---
@ -51,34 +51,29 @@
; OSABI-HSA-ELF: WavefrontSize: 64
; OSABI-HSA-ELF: NumSGPRs: 96
; OSABI-HSA-ELF: ...
; OSABI-HSA-ELF-NOT: Unknown note type
; OSABI-HSA-ELF-NOT: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
; OSABI-HSA-ELF-NOT: Unknown note type
; OSABI-PAL-NOT: .hsa_code_object_version
; OSABI-PAL-NOT: .hsa_code_object_isa
; OSABI-PAL: .hsa_code_object_isa
; OSABI-PAL: .amd_amdgpu_isa "amdgcn-amd-amdpal--gfx802"
; OSABI-PAL-NOT: .amd_amdgpu_hsa_metadata
; OSABI-PAL: .amd_amdgpu_pal_metadata
; OSABI-PAL-ELF-NOT: Unknown note type
; OSABI-PAL-ELF: Unknown note type (0x00000003)
; OSABI-PAL-ELF: NT_AMD_AMDGPU_ISA (ISA Version)
; OSABI-PAL-ELF: ISA Version:
; OSABI-PAL-ELF: amdgcn-amd-amdpal--gfx802
; OSABI-PAL-ELF-NOT: Unknown note type
; OSABI-PAL-ELF-NOT: NT_AMD_AMDGPU_HSA_METADATA (HSA Metadata)
; OSABI-PAL-ELF-NOT: Unknown note type
; OSABI-PAL-ELF: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
; OSABI-PAL-ELF: PAL Metadata:
; TODO: Following check line fails on mips:
; OSABI-PAL-ELF-XXX: 0x2e12,0xac02c0,0x2e13,0x80,0x1000001b,0x1,0x10000022,0x60,0x1000003e,0x0
; OSABI-PAL-ELF-NOT: Unknown note type
; R600-NOT: .hsa_code_object_version
; R600-NOT: .hsa_code_object_isa
; R600-NOT: .amd_amdgpu_isa
; R600-NOT: .amd_amdgpu_hsa_metadata
; R600-NOT: .amd_amdgpu_pal_metadatas
; R600-NOT: .amd_amdgpu_pal_metadata
define amdgpu_kernel void @elf_notes() {
ret void