[AMDGPU][DOC][NFC] Update assembler syntax description

Summary of changes:
- Enable register tuples with 9, 10, 11 and 12 registers (https://reviews.llvm.org/D138205).
- Small improvements and clarifications.
- Correct typos.
Dmitry Preobrazhensky 2022-12-20 14:01:37 +03:00
parent ff302f8502
commit d9daee5a66
4 changed files with 298 additions and 345 deletions


@ -10,10 +10,10 @@ AMDGPU Instructions Notation
Introduction
============
This is an overview of notation used to describe syntax of AMDGPU assembler instructions.
This is an overview of notation used to describe the syntax of AMDGPU assembler instructions.
This notation mimics the :ref:`syntax of assembler instructions<amdgpu_syn_instructions>`
except that instead of real operands and modifiers it provides references to their description.
This notation looks a lot like the :ref:`syntax of assembler instructions<amdgpu_syn_instructions>`,
except that instead of real operands and modifiers, it uses references to their descriptions.
Instructions
============
@ -23,7 +23,9 @@ Notation
This is the notation used to describe AMDGPU instructions:
``<``\ :ref:`opcode description<amdgpu_syn_opcode_notation>`\ ``> <``\ :ref:`operands description<amdgpu_syn_instruction_operands_notation>`\ ``> <``\ :ref:`modifiers description<amdgpu_syn_instruction_modifiers_notation>`\ ``>``
| ``<``\ :ref:`opcode description<amdgpu_syn_opcode_notation>`\ ``>
<``\ :ref:`operands description<amdgpu_syn_instruction_operands_notation>`\ ``>
<``\ :ref:`modifiers description<amdgpu_syn_instruction_modifiers_notation>`\ ``>``
.. _amdgpu_syn_opcode_notation:
@ -42,7 +44,8 @@ Operands
An instruction may have zero or more *operands*. They are comma-separated in the description:
``<``\ :ref:`description of operand 0<amdgpu_syn_instruction_operand_notation>`\ ``>, <``\ :ref:`description of operand 1<amdgpu_syn_instruction_operand_notation>`\ ``>, ...``
| ``<``\ :ref:`description of operand 0<amdgpu_syn_instruction_operand_notation>`\ ``>,
<``\ :ref:`description of operand 1<amdgpu_syn_instruction_operand_notation>`\ ``>, ...``
The order of *operands* is fixed. *Operands* cannot be omitted
except for special cases described below.
@ -60,7 +63,8 @@ Where:
* *kind* is an optional prefix describing operand :ref:`kind<amdgpu_syn_instruction_operand_kinds>`.
* *name* is a link to a description of the operand.
* *tags* are optional. They are used to indicate :ref:`special operand properties<amdgpu_syn_instruction_operand_tags>`.
* *tags* are optional. They are used to indicate
:ref:`special operand properties<amdgpu_syn_instruction_operand_tags>`.
.. _amdgpu_syn_instruction_operand_kinds:
@ -70,8 +74,8 @@ Operand Kinds
Operand kind indicates which values are accepted by the operand.
* Operands which only accept *vector* registers are labelled with 'v' prefix.
* Operands which only accept *scalar* values are labelled with 's' prefix.
* Operands which accept both *vector* registers and *scalar* values have no prefix.
* Operands which only accept *scalar* registers and values are labelled with 's' prefix.
* Operands which accept any registers and values have no prefix.
Examples:
@ -79,7 +83,7 @@ Examples:
vdata // operand only accepts vector registers
sdst // operand only accepts scalar registers
src1 // operand accepts both scalar and vector registers
src1 // operand accepts vector registers, scalar registers, and scalar values
.. _amdgpu_syn_instruction_operand_tags:
@ -92,16 +96,16 @@ Operand tags indicate special operand properties.
Operand tag Meaning
============== =================================================================================
:opt An optional operand.
:m An operand which may be used with
:ref:`VOP3 operand modifiers<amdgpu_synid_vop3_operand_modifiers>` or
:ref:`SDWA operand modifiers<amdgpu_synid_sdwa_operand_modifiers>`.
:dst An input operand which may also serve as a destination
:m An operand which may be used with operand modifiers
:ref:`abs<amdgpu_synid_abs>`, :ref:`neg<amdgpu_synid_neg>` or
:ref:`sext<amdgpu_synid_sext>`.
:dst An input operand which is also used as a destination
if :ref:`glc<amdgpu_synid_glc>` modifier is specified.
:fx This is an *f32* or *f16* operand depending on
:fx This is a *f32* or *f16* operand, depending on
:ref:`m_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>` modifier.
:<type> Operand *type* differs from *type*
:<type> The operand *type* differs from the *type*
:ref:`implied by the opcode name<amdgpu_syn_instruction_type>`.
This tag specifies actual operand *type*.
This tag specifies the actual operand *type*.
============== =================================================================================
Examples:
@ -119,7 +123,8 @@ Modifiers
An instruction may have zero or more optional *modifiers*. They are space-separated in the description:
``<``\ :ref:`description of modifier 0<amdgpu_syn_instruction_modifier_notation>`\ ``> <``\ :ref:`description of modifier 1<amdgpu_syn_instruction_modifier_notation>`\ ``> ...``
| ``<``\ :ref:`description of modifier 0<amdgpu_syn_instruction_modifier_notation>`\ ``>
<``\ :ref:`description of modifier 1<amdgpu_syn_instruction_modifier_notation>`\ ``> ...``
The order of *modifiers* is fixed.
@ -132,4 +137,4 @@ A *modifier* is described using the following notation:
*<name>*
Where *name* is a link to a description of the *modifier*.
Where the *name* is a link to a description of the *modifier*.


@ -15,9 +15,10 @@ Syntax
An instruction has the following syntax:
``<``\ *opcode mnemonic*\ ``> <``\ *operand0*\ ``>, <``\ *operand1*\ ``>,... <``\ *modifier0*\ ``> <``\ *modifier1*\ ``>...``
| ``<``\ *opcode mnemonic*\ ``> <``\ *operand0*\ ``>,
<``\ *operand1*\ ``>,... <``\ *modifier0*\ ``> <``\ *modifier1*\ ``>...``
:doc:`Operands<AMDGPUOperandSyntax>` are normally comma-separated while
:doc:`Operands<AMDGPUOperandSyntax>` are normally comma-separated, while
:doc:`modifiers<AMDGPUModifierSyntax>` are space-separated.
The order of *operands* and *modifiers* is fixed.
@ -28,7 +29,8 @@ Most *modifiers* are optional and may be omitted.
Opcode Mnemonic
~~~~~~~~~~~~~~~
Opcode mnemonic describes opcode semantics and may include one or more suffices in this order:
Opcode mnemonic describes opcode semantics
and may include one or more suffices in this order:
* :ref:`Packing suffix<amdgpu_syn_instruction_pk>`.
* :ref:`Destination operand type suffix<amdgpu_syn_instruction_type>`.
@ -81,7 +83,7 @@ The following table enumerates the most frequently used type suffices.
============================================ ======================= ============================
Instructions which have no type suffices are assumed to operate with typeless data.
The size of data is specified by size suffices:
The size of typeless data is specified by size suffices:
================= =================== =====================================
Size Suffix Implied data type Required register size in dwords
@ -103,8 +105,8 @@ The size of data is specified by size suffices:
================= =================== =====================================
.. WARNING::
There are exceptions from rules described above.
Operands which have type different from type specified by the opcode are
There are exceptions to the rules described above.
Operands which have a type different from the type specified by the opcode are
:ref:`tagged<amdgpu_syn_instruction_operand_tags>` in the description.
Examples of instructions with different types of source and destination operands:
@ -144,7 +146,9 @@ Encoding Suffices
Most *VOP1*, *VOP2* and *VOPC* instructions have several variants:
they may also be encoded in *VOP3*, *DPP* and *SDWA* formats.
The assembler will automatically use optimal encoding based on instruction operands.
The assembler selects an optimal encoding automatically
based on instruction operands and modifiers,
unless a specific encoding is explicitly requested.
To force specific encoding, one can add a suffix to the opcode of the instruction:
=================================================== =================
@ -156,8 +160,8 @@ To force specific encoding, one can add a suffix to the opcode of the instructio
*SDWA* encoding _sdwa
=================================================== =================
These suffices are used in this reference to indicate the assumed encoding.
When no suffix is specified, native instruction encoding is implied.
This reference uses encoding suffices to specify which encoding is implied.
When no suffix is specified, native instruction encoding is assumed.
Operands
========
@ -165,9 +169,9 @@ Operands
Syntax
~~~~~~
Syntax of generic operands is described :doc:`in this document<AMDGPUOperandSyntax>`.
The syntax of generic operands is described :doc:`in this document<AMDGPUOperandSyntax>`.
For detailed information about operands follow *operand links* in GPU-specific documents.
For detailed information about operands, follow *operand links* in GPU-specific documents.
Modifiers
=========
@ -175,6 +179,7 @@ Modifiers
Syntax
~~~~~~
Syntax of modifiers is described :doc:`in this document<AMDGPUModifierSyntax>`.
The syntax of modifiers is described :doc:`in this document<AMDGPUModifierSyntax>`.
Information about modifiers supported for individual instructions may be found in GPU-specific documents.
Information about modifiers supported for individual instructions
may be found in GPU-specific documents.

File diff suppressed because it is too large.


@ -14,7 +14,7 @@ The following notation is used throughout this document:
Notation Description
=================== =============================================================================
{0..N} Any integer value in the range from 0 to N (inclusive).
<x> Syntax and meaning of *x* is explained elsewhere.
<x> Syntax and meaning of *x* are explained elsewhere.
=================== =============================================================================
.. _amdgpu_syn_operands:
@ -31,7 +31,7 @@ Vector registers. There are 256 32-bit vector registers.
A sequence of *vector* registers may be used to operate with more than 32 bits of data.
Assembler currently supports sequences of 1, 2, 3, 4, 5, 6, 7, 8, 16 and 32 *vector* registers.
Assembler currently supports tuples with 1 to 12, 16 and 32 *vector* registers.
=================================================== ====================================================================
Syntax Description
@ -61,9 +61,10 @@ Note: *N* and *K* must satisfy the following conditions:
* *N* <= *K*.
* 0 <= *N* <= 255.
* 0 <= *K* <= 255.
* *K-N+1* must be equal to 1, 2, 3, 4, 5, 6, 7, 8, 16 or 32.
* *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.
GFX90A has an additional alignment requirement: pairs of *vector* registers must be even-aligned
GFX90A and GFX940 have an additional alignment requirement:
pairs of *vector* registers must be even-aligned
(first register must be even).
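The conditions on *N* and *K* can be sketched as a small checker. This is an illustrative sketch only, not the assembler's implementation: the function name, the regular expression and the ``even_aligned_pairs`` flag are hypothetical, and the alignment rule is applied only to tuples of exactly two registers, following the wording above.

```python
import re

# Tuple sizes accepted for vector registers: 1 to 12, 16 or 32.
VALID_TUPLE_SIZES = set(range(1, 13)) | {16, 32}


def is_valid_vgpr_range(operand: str, even_aligned_pairs: bool = False) -> bool:
    """Check a 'v[N:K]' operand against the documented conditions.

    'even_aligned_pairs' is a hypothetical flag modelling the extra rule
    that pairs of registers must start at an even index.
    """
    match = re.fullmatch(r"v\[(\d+):(\d+)\]", operand)
    if match is None:
        return False
    n, k = int(match.group(1)), int(match.group(2))
    # N <= K, and both indices must lie in 0..255.
    if not (n <= k and 0 <= n <= 255 and 0 <= k <= 255):
        return False
    size = k - n + 1
    if size not in VALID_TUPLE_SIZES:
        return False
    # Optional alignment rule for pairs (first register must be even).
    if even_aligned_pairs and size == 2 and n % 2 != 0:
        return False
    return True
```

With these assumptions, ``v[0:8]`` (9 registers) is accepted, ``v[0:12]`` (13 registers) is rejected, and ``v[1:2]`` is rejected only when ``even_aligned_pairs`` is set.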
Examples:
@ -82,19 +83,20 @@ Examples:
.. _amdgpu_synid_nsa:
GFX10 *Image* instructions may use special *NSA* (Non-Sequential Address) syntax for *image addresses*:
GFX10+ *image* instructions may use special *NSA* (Non-Sequential Address)
syntax for *image addresses*:
===================================== =================================================
Syntax Description
===================================== =================================================
**[Vm**, \ **Vn**, ... **Vk**\ **]** A sequence of 32-bit *vector* registers.
Each register may be specified using syntax
Each register may be specified using the syntax
defined :ref:`above<amdgpu_synid_v>`.
In contrast with standard syntax, registers
In contrast with the standard syntax, registers
in *NSA* sequence are not required to have
consecutive indices. Moreover, the same register
may appear in the list more than once.
may appear in the sequence more than once.
===================================== =================================================
Examples:
@ -114,10 +116,10 @@ Accumulator registers. There are 256 32-bit accumulator registers.
A sequence of *accumulator* registers may be used to operate with more than 32 bits of data.
Assembler currently supports sequences of 1, 2, 3, 4, 5, 6, 7, 8, 16 and 32 *accumulator* registers.
Assembler currently supports tuples with 1 to 12, 16 and 32 *accumulator* registers.
=================================================== ========================================================= ====================================================================
Syntax An Alternative Syntax (SP3) Description
Syntax Alternative Syntax (SP3) Description
=================================================== ========================================================= ====================================================================
**a**\<N> **acc**\<N> A single 32-bit *accumulator* register.
@ -144,9 +146,10 @@ Note: *N* and *K* must satisfy the following conditions:
* *N* <= *K*.
* 0 <= *N* <= 255.
* 0 <= *K* <= 255.
* *K-N+1* must be equal to 1, 2, 3, 4, 5, 6, 7, 8, 16 or 32.
* *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.
GFX90A has an additional alignment requirement: pairs of *accumulator* registers must be even-aligned
GFX90A and GFX940 have an additional alignment requirement:
pairs of *accumulator* registers must be even-aligned
(first register must be even).
Examples:
@ -173,7 +176,7 @@ Examples:
s
-
Scalar 32-bit registers. The number of available *scalar* registers depends on GPU:
Scalar 32-bit registers. The number of available *scalar* registers depends on the GPU:
======= ============================
GPU Number of *scalar* registers
@ -181,11 +184,11 @@ Scalar 32-bit registers. The number of available *scalar* registers depends on G
GFX7 104
GFX8 102
GFX9 102
GFX10 106
GFX10+ 106
======= ============================
A sequence of *scalar* registers may be used to operate with more than 32 bits of data.
Assembler currently supports sequences of 1, 2, 3, 4, 5, 6, 7, 8, 16 and 32 *scalar* registers.
Assembler currently supports tuples with 1 to 12, 16 and 32 *scalar* registers.
Pairs of *scalar* registers must be even-aligned (first register must be even).
Sequences of 4 or more *scalar* registers must be quad-aligned.
@ -217,11 +220,11 @@ Sequences of 4 and more *scalar* registers must be quad-aligned.
Note: *N* and *K* must satisfy the following conditions:
* *N* must be properly aligned based on sequence size.
* *N* must be properly aligned based on the sequence size.
* *N* <= *K*.
* 0 <= *N* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
* 0 <= *K* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
* *K-N+1* must be equal to 1, 2, 3, 4, 5, 6, 7, 8, 16 or 32.
* *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.
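A possible way to express the size and alignment rules for *scalar* register sequences is shown below. This is a hedged sketch: the function name and the ``smax`` parameter are hypothetical, and because the text does not state an alignment rule for sequences of 3 registers, the sketch imposes none for that size.

```python
# Tuple sizes accepted for scalar registers: 1 to 12, 16 or 32.
VALID_SCALAR_TUPLE_SIZES = set(range(1, 13)) | {16, 32}


def is_valid_sgpr_range(n: int, k: int, smax: int) -> bool:
    """Check s[N:K] against the documented conditions.

    'smax' is the number of available scalar registers for the target GPU
    (for example 102 or 106, taken from the table above).
    """
    # N <= K, and both indices must be below SMAX.
    if not (0 <= n <= k < smax):
        return False
    size = k - n + 1
    if size not in VALID_SCALAR_TUPLE_SIZES:
        return False
    if size == 2:
        return n % 2 == 0   # pairs must be even-aligned
    if size >= 4:
        return n % 4 == 0   # sequences of 4 or more must be quad-aligned
    # Sizes 1 and 3: no alignment rule is assumed in this sketch.
    return True
```

For example, ``is_valid_sgpr_range(4, 7, 102)`` holds, while ``is_valid_sgpr_range(2, 5, 102)`` fails the quad-alignment check.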
Examples:
@ -261,7 +264,7 @@ ttmp
----
Trap handler temporary scalar registers, 32-bits wide.
The number of available *ttmp* registers depends on GPU:
The number of available *ttmp* registers depends on the GPU:
======= ===========================
GPU Number of *ttmp* registers
@ -269,11 +272,11 @@ The number of available *ttmp* registers depends on GPU:
GFX7 12
GFX8 12
GFX9 16
GFX10 16
GFX10+ 16
======= ===========================
A sequence of *ttmp* registers may be used to operate with more than 32 bits of data.
Assembler currently supports sequences of 1, 2, 3, 4, 5, 6, 7, 8 and 16 *ttmp* registers.
Assembler currently supports tuples with 1 to 12 and 16 *ttmp* registers.
Pairs of *ttmp* registers must be even-aligned (first register must be even).
Sequences of 4 or more *ttmp* registers must be quad-aligned.
@ -303,11 +306,11 @@ Sequences of 4 and more *ttmp* registers must be quad-aligned.
Note: *N* and *K* must satisfy the following conditions:
* *N* must be properly aligned based on sequence size.
* *N* must be properly aligned based on the sequence size.
* *N* <= *K*.
* 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
* 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
* *K-N+1* must be equal to 1, 2, 3, 4, 5, 6, 7, 8 or 16.
* *K-N+1* must be in the range from 1 to 12 or equal to 16.
Examples:
@ -335,7 +338,8 @@ Examples of *ttmp* registers with an invalid alignment:
tba
---
Trap base address, 64-bits wide. Holds the pointer to the current trap handler program.
Trap base address, 64-bits wide. Holds the pointer to the current
trap handler program.
================== ======================================================================= =============
Syntax Description Availability
@ -356,9 +360,6 @@ High and low 32 bits of *trap base address* may be accessed as separate register
[tba_hi] High 32 bits of *trap base address* register (an SP3 syntax). GFX7, GFX8
================== ======================================================================= =============
Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9 and GFX10,
but *tba* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
.. _amdgpu_synid_tma:
tma
@ -385,9 +386,6 @@ High and low 32 bits of *trap memory address* may be accessed as separate regist
[tma_hi] High 32 bits of *trap memory address* register (an SP3 syntax). GFX7, GFX8
================= ======================================================================= ==================
Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9 and GFX10,
but *tma* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
.. _amdgpu_synid_flat_scratch:
flat_scratch
@ -414,10 +412,6 @@ High and low 32 bits of *flat scratch* address may be accessed as separate regis
[flat_scratch_hi] High 32 bits of *flat scratch* address register (an SP3 syntax).
========================= =========================================================================
Note that *flat_scratch*, *flat_scratch_lo* and *flat_scratch_hi* are not accessible as assembler
registers in GFX10, but *flat_scratch* is readable/writable with the help of
*s_get_reg* and *s_set_reg* instructions.
.. _amdgpu_synid_xnack:
.. _amdgpu_synid_xnack_mask:
@ -427,9 +421,7 @@ xnack_mask
Xnack mask, 64-bits wide. Holds a 64-bit mask of which threads
received an *XNACK* due to a vector memory operation.
.. WARNING:: GFX7 does not support *xnack* feature. For availability of this feature in other GPUs, refer :ref:`this table<amdgpu-processors>`.
\
For availability of *xnack* feature, refer to :ref:`this table<amdgpu-processors>`.
============================== =====================================================
Syntax Description
@ -450,10 +442,6 @@ High and low 32 bits of *xnack mask* may be accessed as separate registers:
[xnack_mask_hi] High 32 bits of *xnack mask* register (an SP3 syntax).
===================== ==============================================================
Note that *xnack_mask*, *xnack_mask_lo* and *xnack_mask_hi* are not accessible as assembler
registers in GFX10, but *xnack_mask* is readable/writable with the help of
*s_get_reg* and *s_set_reg* instructions.
.. _amdgpu_synid_vcc:
.. _amdgpu_synid_vcc_lo:
@ -463,7 +451,7 @@ vcc
Vector condition code, 64-bits wide. A bit mask with one bit per thread;
it holds the result of a vector compare operation.
Note that GFX10 H/W does not use high 32 bits of *vcc* in *wave32* mode.
Note that GFX10+ H/W does not use high 32 bits of *vcc* in *wave32* mode.
================ =========================================================================
Syntax Description
@ -508,7 +496,7 @@ Execute mask, 64-bits wide. A bit mask with one bit per thread,
which is applied to vector instructions and controls which threads execute
and which ignore the instruction.
Note that GFX10 H/W does not use high 32 bits of *exec* in *wave32* mode.
Note that GFX10+ H/W does not use high 32 bits of *exec* in *wave32* mode.
===================== =================================================================
Syntax Description
@ -534,18 +522,22 @@ High and low 32 bits of *execute mask* may be accessed as separate registers:
vccz
----
A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>` is all zeros.
A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>`
is all zeros.
Note: when GFX10 operates in *wave32* mode, this register reflects state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.
Note: when GFX10+ operates in *wave32* mode, this register reflects
the state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.
.. _amdgpu_synid_execz:
execz
-----
A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>` is all zeros.
A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>`
is all zeros.
Note: when GFX10 operates in *wave32* mode, this register reflects state of :ref:`exec_lo<amdgpu_synid_exec>`.
Note: when GFX10+ operates in *wave32* mode, this register reflects
the state of :ref:`exec_lo<amdgpu_synid_exec>`.
.. _amdgpu_synid_scc:
@ -567,34 +559,31 @@ fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address.
null
----
This is a special operand which may be used as a source or a destination.
This is a special operand that may be used as a source or a destination.
When used as a destination, the result of the operation is discarded.
When used as a source, it supplies zero value.
GFX10 only.
.. WARNING:: Due to a H/W bug, this operand cannot be used with VALU instructions in first generation of GFX10.
.. _amdgpu_synid_constant:
inline constant
---------------
An *inline constant* is an integer or a floating-point value encoded as a part of an instruction.
Compare *inline constants* with :ref:`literals<amdgpu_synid_literal>`.
An *inline constant* is an integer or a floating-point value
encoded as a part of an instruction. Compare *inline constants*
with :ref:`literals<amdgpu_synid_literal>`.
Inline constants include:
* :ref:`iconst<amdgpu_synid_iconst>`
* :ref:`fconst<amdgpu_synid_fconst>`
* :ref:`ival<amdgpu_synid_ival>`
* :ref:`Integer inline constants<amdgpu_synid_iconst>`;
* :ref:`Floating-point inline constants<amdgpu_synid_fconst>`;
* :ref:`Inline values<amdgpu_synid_ival>`.
If a number may be encoded as either
a :ref:`literal<amdgpu_synid_literal>` or
a :ref:`constant<amdgpu_synid_constant>`,
assembler selects the latter encoding as more efficient.
the assembler selects the latter encoding as more efficient.
.. _amdgpu_synid_iconst:
@ -607,7 +596,7 @@ encoded as an *inline constant*.
Only a small fraction of integer numbers may be encoded as *inline constants*.
They are enumerated in the table below.
Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
Other integer numbers are encoded as :ref:`literals<amdgpu_synid_literal>`.
================================== ====================================
Value Note
@ -616,8 +605,6 @@ Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>
{-16..-1} Negative integer inline constants.
================================== ====================================
.. WARNING:: GFX7 does not support inline constants for *f16* operands.
.. _amdgpu_synid_fconst:
fconst
@ -626,9 +613,10 @@ fconst
A :ref:`floating-point number<amdgpu_synid_floating-point_number>`
encoded as an *inline constant*.
Only a small fraction of floating-point numbers may be encoded as *inline constants*.
They are enumerated in the table below.
Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
Only a small fraction of floating-point numbers may be encoded
as *inline constants*. They are enumerated in the table below.
Other floating-point numbers are encoded as
:ref:`literals<amdgpu_synid_literal>`.
===================== ===================================================== ==================
Value Note Availability
@ -642,15 +630,13 @@ Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_l
-1.0 Floating-point constant -1.0 All GPUs
-2.0 Floating-point constant -2.0 All GPUs
-4.0 Floating-point constant -4.0 All GPUs
0.1592 1.0/(2.0*pi). Use only for 16-bit operands. GFX8, GFX9, GFX10
0.15915494 1.0/(2.0*pi). Use only for 16- and 32-bit operands. GFX8, GFX9, GFX10
0.15915494309189532 1.0/(2.0*pi). GFX8, GFX9, GFX10
0.1592 1.0/(2.0*pi). Use only for 16-bit operands. GFX8+
0.15915494 1.0/(2.0*pi). Use only for 16- and 32-bit operands. GFX8+
0.15915494309189532 1.0/(2.0*pi). GFX8+
===================== ===================================================== ==================
.. WARNING:: Floating-point inline constants cannot be used with *16-bit integer* operands. \
Assembler will attempt to encode these values as literals.
.. WARNING:: GFX7 does not support inline constants for *f16* operands.
Assembler encodes these values as literals.
.. _amdgpu_synid_ival:
@ -660,42 +646,45 @@ ival
A symbolic operand encoded as an *inline constant*.
These operands provide read-only access to H/W registers.
======================== ================================================ =============
Syntax Note Availability
======================== ================================================ =============
shared_base Base address of shared memory region. GFX9, GFX10
shared_limit Address of the end of shared memory region. GFX9, GFX10
private_base Base address of private memory region. GFX9, GFX10
private_limit Address of the end of private memory region. GFX9, GFX10
pops_exiting_wave_id A dedicated counter for POPS. GFX9, GFX10
======================== ================================================ =============
===================== ========================= ================================================ =============
Syntax Alternative Syntax (SP3) Note Availability
===================== ========================= ================================================ =============
shared_base src_shared_base Base address of shared memory region. GFX9+
shared_limit src_shared_limit Address of the end of shared memory region. GFX9+
private_base src_private_base Base address of private memory region. GFX9+
private_limit src_private_limit Address of the end of private memory region. GFX9+
pops_exiting_wave_id src_pops_exiting_wave_id A dedicated counter for POPS. GFX9, GFX10
===================== ========================= ================================================ =============
.. _amdgpu_synid_literal:
literal
-------
A *literal* is a 64-bit value encoded as a separate 32-bit dword in the instruction stream.
Compare *literals* with :ref:`inline constants<amdgpu_synid_constant>`.
A *literal* is a 64-bit value encoded as a separate
32-bit dword in the instruction stream. Compare *literals*
with :ref:`inline constants<amdgpu_synid_constant>`.
If a number may be encoded as either
a :ref:`literal<amdgpu_synid_literal>` or
an :ref:`inline constant<amdgpu_synid_constant>`,
assembler selects the latter encoding as more efficient.
Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
Literals may be specified as
:ref:`integer numbers<amdgpu_synid_integer_number>`,
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
:ref:`absolute expressions<amdgpu_synid_absolute_expression>` or
:ref:`relocatable expressions<amdgpu_synid_relocatable_expression>`.
An instruction may use only one literal but several operands may refer the same literal.
An instruction may use only one literal,
but several operands may refer to the same literal.
.. _amdgpu_synid_uimm8:
uimm8
-----
A 8-bit :ref:`integer number<amdgpu_synid_integer_number>`
An 8-bit :ref:`integer number<amdgpu_synid_integer_number>`
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
The value must be in the range 0..0xFF.
@ -756,7 +745,8 @@ Integer numbers are 64 bits wide.
They are converted to :ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_int_conv>`.
Integer numbers may be specified in binary, octal, hexadecimal and decimal formats:
Integer numbers may be specified in binary, octal,
hexadecimal and decimal formats:
============ =============================== ========
Format Syntax Example
@ -829,22 +819,23 @@ Relocatable Expressions
The value of a relocatable expression depends on program relocation.
Note that use of relocatable expressions is limited with branch targets
Note that use of relocatable expressions is limited to branch targets
and 32-bit integer operands.
A relocatable expression is evaluated to a 64-bit integer value
which depends on operand kind and :ref:`relocation type<amdgpu-relocation-records>`
of symbol(s) used in the expression. For example, if an instruction refers a label,
this reference is evaluated to an offset from the address after the instruction
to the label address:
A relocatable expression is evaluated to a 64-bit integer value,
which depends on operand kind and
:ref:`relocation type<amdgpu-relocation-records>` of symbol(s)
used in the expression. For example, if an instruction refers to a label,
this reference is evaluated to an offset from the address after
the instruction to the label address:
.. parsed-literal::
label:
v_add_co_u32_e32 v0, vcc, label, v1 // 'label' operand is evaluated to -4
Note that values of relocatable expressions are usually unknown at assembly time;
they are resolved later by a linker and converted to
Note that values of relocatable expressions are usually unknown
at assembly time; they are resolved later by a linker and converted to
:ref:`expected operand type<amdgpu_syn_instruction_type>`
as described :ref:`here<amdgpu_synid_rl_conv>`.
@ -855,9 +846,11 @@ Expressions are composed of 64-bit integer operands and operations.
Operands include :ref:`integer numbers<amdgpu_synid_integer_number>`
and :ref:`symbols<amdgpu_synid_symbol>`.
Expressions may also use "." which is a reference to the current PC (program counter).
Expressions may also use "." which is a reference
to the current PC (program counter).
:ref:`Unary<amdgpu_synid_expression_un_op>` and :ref:`binary<amdgpu_synid_expression_bin_op>`
:ref:`Unary<amdgpu_synid_expression_un_op>` and
:ref:`binary<amdgpu_synid_expression_bin_op>`
operations produce 64-bit integer results.
Syntax of Expressions
@ -988,20 +981,25 @@ is used for an operand which has a different type or size.
Conversion of Integer Values
----------------------------
Instruction operands may be specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`. These values are converted to
the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
Instruction operands may be specified as 64-bit
:ref:`integer numbers<amdgpu_synid_integer_number>` or
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
These values are converted to the
:ref:`expected operand type<amdgpu_syn_instruction_type>`
using the following steps:
1. *Validation*. Assembler checks if the input value may be truncated without loss to the required *truncation width*
(see the table below). There are two cases when this operation is enabled:
1. *Validation*. Assembler checks if the input value may be truncated
without loss to the required *truncation width* (see the table below).
There are two cases when this operation is enabled:
* The truncated bits are all 0.
* The truncated bits are all 1 and the value after truncation has its MSB set.
In all other cases assembler triggers an error.
In all other cases, the assembler triggers an error.
2. *Conversion*. The input value is converted to the expected type as described in the table below.
Depending on operand kind, this conversion is performed by either assembler or AMDGPU H/W (or both).
2. *Conversion*. The input value is converted to the expected type
as described in the table below. Depending on operand kind, this conversion
is performed by either assembler or AMDGPU H/W (or both).
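The *validation* step can be illustrated with a short sketch. It assumes the input is treated as a 64-bit two's-complement pattern and that ``width`` is between 1 and 64; the function name is hypothetical and this is not the assembler's actual code.

```python
MASK64 = (1 << 64) - 1


def can_truncate(value: int, width: int) -> bool:
    """Return True if a 64-bit value may be truncated to 'width' bits
    without loss, following the two enabled cases described above."""
    bits = value & MASK64                    # view the value as 64 raw bits
    dropped = bits >> width                  # the bits that would be truncated
    dropped_count = 64 - width
    kept_msb = (bits >> (width - 1)) & 1     # MSB of the value after truncation

    all_zero = dropped == 0
    all_one = dropped == (1 << dropped_count) - 1
    return all_zero or (all_one and kept_msb == 1)
```

Under these assumptions, ``0xFFFF`` and ``-1`` both fit a 16-bit truncation width, while ``0x10000`` and ``-0x8001`` do not.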
============== ================= =============== ====================================================================
Expected type Truncation Width Conversion Description
@ -1055,21 +1053,26 @@ Examples of disabled conversions:
Conversion of Floating-Point Values
-----------------------------------
Instruction operands may be specified as 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
These values are converted to the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
Instruction operands may be specified as 64-bit
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
These values are converted to the
:ref:`expected operand type<amdgpu_syn_instruction_type>`
using the following steps:
1. *Validation*. Assembler checks if the input f64 number can be converted
to the *required floating-point type* (see the table below) without overflow or underflow.
Precision lost is allowed. If this conversion is not possible, assembler triggers an error.
to the *required floating-point type* (see the table below) without overflow
or underflow. Loss of precision is allowed. If this conversion is not possible,
the assembler triggers an error.
2. *Conversion*. The input value is converted to the expected type as described in the table below.
Depending on operand kind, this is performed by either assembler or AMDGPU H/W (or both).
2. *Conversion*. The input value is converted to the expected type
as described in the table below. Depending on operand kind, this is
performed by either assembler or AMDGPU H/W (or both).
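The *validation* step for floating-point values might be approximated as follows. This is a simplified sketch under stated assumptions: it relies on Python's ``struct`` module to perform the narrowing conversion, and it treats underflow as "a non-zero input becomes zero", which may be more permissive than the check the assembler actually performs. The function name and the type keys are hypothetical.

```python
import struct

# struct format codes for IEEE half ('e') and single ('f') precision.
_FORMATS = {"f16": "<e", "f32": "<f"}


def fits_without_overflow_or_underflow(value: float, fp_type: str) -> bool:
    """Check whether an f64 value converts to 'fp_type' without overflow
    or (total) underflow. Loss of precision is allowed."""
    fmt = _FORMATS[fp_type]
    try:
        # Round-trip through the narrower format.
        (converted,) = struct.unpack(fmt, struct.pack(fmt, value))
    except OverflowError:
        # struct refuses finite values too large for the target format.
        return False
    # A non-zero input that becomes zero has underflowed.
    if value != 0.0 and converted == 0.0:
        return False
    return True
```

For instance, ``0.1`` passes for ``f32`` even though it is not exactly representable, whereas ``1e40`` (overflow for ``f32``) and ``1e-10`` (underflow for ``f16``) are rejected.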
============== ================ ================= =================================================================
Expected type Required FP Type Conversion Description
============== ================ ================= =================================================================
i16, u16, b16 f16 f16(num) Convert to f16 and use bits of the result as an integer value.
The value has to be encoded as a literal or an error occurs.
The value has to be encoded as a literal, or an error occurs.
Note that the value cannot be encoded as an inline constant.
i32, u32, b32 f32 f32(num) Convert to f32 and use bits of the result as an integer value.
i64, u64, b64 \- \- Conversion disabled.
@ -1122,8 +1125,9 @@ When the value of a relocatable expression is resolved by a linker, it is
converted as needed and truncated to the operand size. The conversion depends
on :ref:`relocation type<amdgpu-relocation-records>` and operand kind.
For example, when a 32-bit operand of an instruction refers a relocatable expression *expr*,
this reference is evaluated to a 64-bit offset from the address after the
For example, when a 32-bit operand of an instruction refers
to a relocatable expression *expr*, this reference is evaluated
to a 64-bit offset from the address after the
instruction to the address being referenced, *counted in bytes*.
Then the value is truncated to 32 bits and encoded as a literal:
@ -1133,7 +1137,7 @@ Then the value is truncated to 32 bits and encoded as a literal:
v_add_co_u32_e32 v0, vcc, expr, v1 // 'expr' operand is evaluated to -4
// and then truncated to 0xFFFFFFFC
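The truncation in the example above is ordinary two's-complement masking. The sketch below reproduces the arithmetic only; the helper name is hypothetical and it does not model relocation processing.

```python
def truncate_to_width(value: int, width: int) -> int:
    """Keep the low 'width' bits of a (possibly negative) integer."""
    return value & ((1 << width) - 1)


# An offset of -4 truncated to 32 bits gives the pattern from the example.
offset_bits = truncate_to_width(-4, 32)
print(hex(offset_bits))  # prints 0xfffffffc
```

The same helper applies to other operand sizes by changing ``width``.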
As another example, when a branch instruction refers a label,
As another example, when a branch instruction refers to a label,
this reference is evaluated to an offset from the address after the
instruction to the label address, *counted in dwords*.
Then the value is truncated to 16 bits: