mirror of
https://github.com/capstone-engine/llvm-capstone.git
synced 2024-11-23 13:50:11 +00:00
[AMDGPU][DOC][NFC] Update assembler syntax description
Summary of changes:
- Enable register tuples with 9, 10, 11 and 12 registers (https://reviews.llvm.org/D138205).
- Small improvements and clarifications.
- Correct typos.
parent ff302f8502
commit d9daee5a66
@ -10,10 +10,10 @@ AMDGPU Instructions Notation
|
||||
Introduction
|
||||
============
|
||||
|
||||
This is an overview of notation used to describe syntax of AMDGPU assembler instructions.
|
||||
This is an overview of notation used to describe the syntax of AMDGPU assembler instructions.
|
||||
|
||||
This notation mimics the :ref:`syntax of assembler instructions<amdgpu_syn_instructions>`
|
||||
except that instead of real operands and modifiers it provides references to their description.
|
||||
This notation looks a lot like the :ref:`syntax of assembler instructions<amdgpu_syn_instructions>`,
|
||||
except that instead of real operands and modifiers, it uses references to their descriptions.
|
||||
|
||||
Instructions
|
||||
============
|
||||
@ -23,7 +23,9 @@ Notation
|
||||
|
||||
This is the notation used to describe AMDGPU instructions:
|
||||
|
||||
``<``\ :ref:`opcode description<amdgpu_syn_opcode_notation>`\ ``> <``\ :ref:`operands description<amdgpu_syn_instruction_operands_notation>`\ ``> <``\ :ref:`modifiers description<amdgpu_syn_instruction_modifiers_notation>`\ ``>``
|
||||
| ``<``\ :ref:`opcode description<amdgpu_syn_opcode_notation>`\ ``>
|
||||
<``\ :ref:`operands description<amdgpu_syn_instruction_operands_notation>`\ ``>
|
||||
<``\ :ref:`modifiers description<amdgpu_syn_instruction_modifiers_notation>`\ ``>``
|
||||
|
||||
.. _amdgpu_syn_opcode_notation:
|
||||
|
||||
@ -42,7 +44,8 @@ Operands
|
||||
|
||||
An instruction may have zero or more *operands*. They are comma-separated in the description:
|
||||
|
||||
``<``\ :ref:`description of operand 0<amdgpu_syn_instruction_operand_notation>`\ ``>, <``\ :ref:`description of operand 1<amdgpu_syn_instruction_operand_notation>`\ ``>, ...``
|
||||
| ``<``\ :ref:`description of operand 0<amdgpu_syn_instruction_operand_notation>`\ ``>,
|
||||
<``\ :ref:`description of operand 1<amdgpu_syn_instruction_operand_notation>`\ ``>, ...``
|
||||
|
||||
The order of *operands* is fixed. *Operands* cannot be omitted
|
||||
except for special cases described below.
|
||||
@ -60,7 +63,8 @@ Where:
|
||||
|
||||
* *kind* is an optional prefix describing operand :ref:`kind<amdgpu_syn_instruction_operand_kinds>`.
|
||||
* *name* is a link to a description of the operand.
|
||||
* *tags* are optional. They are used to indicate :ref:`special operand properties<amdgpu_syn_instruction_operand_tags>`.
|
||||
* *tags* are optional. They are used to indicate
|
||||
:ref:`special operand properties<amdgpu_syn_instruction_operand_tags>`.
|
||||
|
||||
.. _amdgpu_syn_instruction_operand_kinds:
|
||||
|
||||
@ -70,8 +74,8 @@ Operand Kinds
|
||||
Operand kind indicates which values are accepted by the operand.
|
||||
|
||||
* Operands which only accept *vector* registers are labelled with 'v' prefix.
|
||||
* Operands which only accept *scalar* values are labelled with 's' prefix.
|
||||
* Operands which accept both *vector* registers and *scalar* values have no prefix.
|
||||
* Operands which only accept *scalar* registers and values are labelled with 's' prefix.
|
||||
* Operands which accept any registers and values have no prefix.
|
||||
|
||||
Examples:
|
||||
|
||||
@ -79,7 +83,7 @@ Examples:
|
||||
|
||||
vdata // operand only accepts vector registers
|
||||
sdst // operand only accepts scalar registers
|
||||
src1 // operand accepts both scalar and vector registers
|
||||
src1 // operand accepts vector registers, scalar registers, and scalar values
|
||||
|
||||
.. _amdgpu_syn_instruction_operand_tags:
|
||||
|
||||
@ -92,16 +96,16 @@ Operand tags indicate special operand properties.
|
||||
Operand tag Meaning
|
||||
============== =================================================================================
|
||||
:opt An optional operand.
|
||||
:m An operand which may be used with
|
||||
:ref:`VOP3 operand modifiers<amdgpu_synid_vop3_operand_modifiers>` or
|
||||
:ref:`SDWA operand modifiers<amdgpu_synid_sdwa_operand_modifiers>`.
|
||||
:dst An input operand which may also serve as a destination
|
||||
:m An operand which may be used with operand modifiers
|
||||
:ref:`abs<amdgpu_synid_abs>`, :ref:`neg<amdgpu_synid_neg>` or
|
||||
:ref:`sext<amdgpu_synid_sext>`.
|
||||
:dst An input operand which is also used as a destination
|
||||
if :ref:`glc<amdgpu_synid_glc>` modifier is specified.
|
||||
:fx This is an *f32* or *f16* operand depending on
|
||||
:fx This is a *f32* or *f16* operand, depending on
|
||||
:ref:`m_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>` modifier.
|
||||
:<type> Operand *type* differs from *type*
|
||||
:<type> The operand *type* differs from the *type*
|
||||
:ref:`implied by the opcode name<amdgpu_syn_instruction_type>`.
|
||||
This tag specifies actual operand *type*.
|
||||
This tag specifies the actual operand *type*.
|
||||
============== =================================================================================
|
||||
|
||||
Examples:
|
||||
@ -119,7 +123,8 @@ Modifiers
|
||||
|
||||
An instruction may have zero or more optional *modifiers*. They are space-separated in the description:
|
||||
|
||||
``<``\ :ref:`description of modifier 0<amdgpu_syn_instruction_modifier_notation>`\ ``> <``\ :ref:`description of modifier 1<amdgpu_syn_instruction_modifier_notation>`\ ``> ...``
|
||||
| ``<``\ :ref:`description of modifier 0<amdgpu_syn_instruction_modifier_notation>`\ ``>
|
||||
<``\ :ref:`description of modifier 1<amdgpu_syn_instruction_modifier_notation>`\ ``> ...``
|
||||
|
||||
The order of *modifiers* is fixed.
|
||||
|
||||
@ -132,4 +137,4 @@ A *modifier* is described using the following notation:
|
||||
|
||||
*<name>*
|
||||
|
||||
Where *name* is a link to a description of the *modifier*.
|
||||
Where the *name* is a link to a description of the *modifier*.
|
||||
|
@ -15,9 +15,10 @@ Syntax
|
||||
|
||||
An instruction has the following syntax:
|
||||
|
||||
``<``\ *opcode mnemonic*\ ``> <``\ *operand0*\ ``>, <``\ *operand1*\ ``>,... <``\ *modifier0*\ ``> <``\ *modifier1*\ ``>...``
|
||||
| ``<``\ *opcode mnemonic*\ ``> <``\ *operand0*\ ``>,
|
||||
<``\ *operand1*\ ``>,... <``\ *modifier0*\ ``> <``\ *modifier1*\ ``>...``
|
||||
|
||||
:doc:`Operands<AMDGPUOperandSyntax>` are normally comma-separated while
|
||||
:doc:`Operands<AMDGPUOperandSyntax>` are normally comma-separated, while
|
||||
:doc:`modifiers<AMDGPUModifierSyntax>` are space-separated.
|
||||
|
||||
The order of *operands* and *modifiers* is fixed.
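
The syntax above can be illustrated with a small splitter. This is only a sketch, not the assembler's parser: the function name ``split_instruction`` is hypothetical, and it assumes that operands contain no spaces and that modifiers contain no commas (which does not hold for every real modifier).

```python
def split_instruction(text):
    """Split an instruction into (mnemonic, operands, modifiers).

    Assumes comma-separated operands without embedded spaces and
    space-separated modifiers without embedded commas.
    """
    parts = text.strip().split(None, 1)
    mnemonic = parts[0]
    if len(parts) == 1:
        # No operands and no modifiers.
        return mnemonic, [], []
    fields = [field.strip() for field in parts[1].split(",")]
    *head, last = fields
    # The last comma-separated field holds the final operand,
    # optionally followed by space-separated modifiers.
    tokens = last.split()
    operands = head + tokens[:1]
    modifiers = tokens[1:]
    return mnemonic, operands, modifiers
```

For instance, ``split_instruction("v_add_f32 v0, v1, v2 clamp")`` yields the mnemonic, three operands and one modifier, in that fixed order.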
|
||||
@ -28,7 +29,8 @@ Most *modifiers* are optional and may be omitted.
|
||||
Opcode Mnemonic
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
Opcode mnemonic describes opcode semantics and may include one or more suffices in this order:
|
||||
Opcode mnemonic describes opcode semantics
|
||||
and may include one or more suffices in this order:
|
||||
|
||||
* :ref:`Packing suffix<amdgpu_syn_instruction_pk>`.
|
||||
* :ref:`Destination operand type suffix<amdgpu_syn_instruction_type>`.
|
||||
@ -81,7 +83,7 @@ The following table enumerates the most frequently used type suffices.
|
||||
============================================ ======================= ============================
|
||||
|
||||
Instructions which have no type suffices are assumed to operate with typeless data.
|
||||
The size of data is specified by size suffices:
|
||||
The size of typeless data is specified by size suffices:
|
||||
|
||||
================= =================== =====================================
|
||||
Size Suffix Implied data type Required register size in dwords
|
||||
@ -103,8 +105,8 @@ The size of data is specified by size suffices:
|
||||
================= =================== =====================================
|
||||
|
||||
.. WARNING::
|
||||
There are exceptions from rules described above.
|
||||
Operands which have type different from type specified by the opcode are
|
||||
There are exceptions to the rules described above.
|
||||
Operands which have a type different from the type specified by the opcode are
|
||||
:ref:`tagged<amdgpu_syn_instruction_operand_tags>` in the description.
|
||||
|
||||
Examples of instructions with different types of source and destination operands:
|
||||
@ -144,7 +146,9 @@ Encoding Suffices
|
||||
Most *VOP1*, *VOP2* and *VOPC* instructions have several variants:
|
||||
they may also be encoded in *VOP3*, *DPP* and *SDWA* formats.
|
||||
|
||||
The assembler will automatically use optimal encoding based on instruction operands.
|
||||
The assembler selects an optimal encoding automatically
|
||||
based on instruction operands and modifiers,
|
||||
unless a specific encoding is explicitly requested.
|
||||
To force specific encoding, one can add a suffix to the opcode of the instruction:
|
||||
|
||||
=================================================== =================
|
||||
@ -156,8 +160,8 @@ To force specific encoding, one can add a suffix to the opcode of the instructio
|
||||
*SDWA* encoding _sdwa
|
||||
=================================================== =================
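
As an illustration of how an encoding suffix can be separated from the opcode, here is a minimal sketch. Only ``_sdwa`` is visible in the table excerpt above; the ``_e32``, ``_e64`` and ``_dpp`` entries and the helper name ``split_encoding_suffix`` are assumptions made for this example, and the table may list further suffixes.

```python
# Suffix -> encoding name. Only "_sdwa" is taken from the excerpt above;
# the other three entries are assumptions for this sketch.
ENCODING_SUFFIXES = {
    "_e32": "VOP1/VOP2/VOPC (32-bit)",
    "_e64": "VOP3",
    "_dpp": "DPP",
    "_sdwa": "SDWA",
}


def split_encoding_suffix(mnemonic):
    """Return (base_opcode, encoding).

    When no suffix is present, the native encoding is assumed.
    """
    for suffix, encoding in ENCODING_SUFFIXES.items():
        if mnemonic.endswith(suffix):
            return mnemonic[: -len(suffix)], encoding
    return mnemonic, "native"
```

A mnemonic such as ``v_mov_b32`` carries no encoding suffix (``_b32`` is a type suffix), so the sketch reports the native encoding for it.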
|
||||
|
||||
These suffices are used in this reference to indicate the assumed encoding.
|
||||
When no suffix is specified, native instruction encoding is implied.
|
||||
This reference uses encoding suffices to specify which encoding is implied.
|
||||
When no suffix is specified, native instruction encoding is assumed.
|
||||
|
||||
Operands
|
||||
========
|
||||
@ -165,9 +169,9 @@ Operands
|
||||
Syntax
|
||||
~~~~~~
|
||||
|
||||
Syntax of generic operands is described :doc:`in this document<AMDGPUOperandSyntax>`.
|
||||
The syntax of generic operands is described :doc:`in this document<AMDGPUOperandSyntax>`.
|
||||
|
||||
For detailed information about operands follow *operand links* in GPU-specific documents.
|
||||
For detailed information about operands, follow *operand links* in GPU-specific documents.
|
||||
|
||||
Modifiers
|
||||
=========
|
||||
@ -175,6 +179,7 @@ Modifiers
|
||||
Syntax
|
||||
~~~~~~
|
||||
|
||||
Syntax of modifiers is described :doc:`in this document<AMDGPUModifierSyntax>`.
|
||||
The syntax of modifiers is described :doc:`in this document<AMDGPUModifierSyntax>`.
|
||||
|
||||
Information about modifiers supported for individual instructions may be found in GPU-specific documents.
|
||||
Information about modifiers supported for individual instructions
|
||||
may be found in GPU-specific documents.
|
||||
|
File diff suppressed because it is too large
@ -14,7 +14,7 @@ The following notation is used throughout this document:
|
||||
Notation Description
|
||||
=================== =============================================================================
|
||||
{0..N} Any integer value in the range from 0 to N (inclusive).
|
||||
<x> Syntax and meaning of *x* is explained elsewhere.
|
||||
<x> Syntax and meaning of *x* are explained elsewhere.
|
||||
=================== =============================================================================
|
||||
|
||||
.. _amdgpu_syn_operands:
|
||||
@ -31,7 +31,7 @@ Vector registers. There are 256 32-bit vector registers.
|
||||
|
||||
A sequence of *vector* registers may be used to operate with more than 32 bits of data.
|
||||
|
||||
Assembler currently supports sequences of 1, 2, 3, 4, 5, 6, 7, 8, 16 and 32 *vector* registers.
|
||||
Assembler currently supports tuples with 1 to 12, 16 and 32 *vector* registers.
|
||||
|
||||
=================================================== ====================================================================
|
||||
Syntax Description
|
||||
@ -61,9 +61,10 @@ Note: *N* and *K* must satisfy the following conditions:
|
||||
* *N* <= *K*.
|
||||
* 0 <= *N* <= 255.
|
||||
* 0 <= *K* <= 255.
|
||||
* *K-N+1* must be equal to 1, 2, 3, 4, 5, 6, 7, 8, 16 or 32.
|
||||
* *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.
|
||||
|
||||
GFX90A has an additional alignment requirement: pairs of *vector* registers must be even-aligned
|
||||
GFX90A and GFX940 have an additional alignment requirement:
|
||||
pairs of *vector* registers must be even-aligned
|
||||
(first register must be even).
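
The conditions on *N* and *K* after this change can be sketched as a small check. The function name and the ``even_aligned_pairs`` flag are hypothetical; the flag models the GFX90A/GFX940 rule exactly as worded above (pairs only).

```python
# Tuple sizes accepted after this change: 1 to 12, 16 and 32 registers.
VALID_TUPLE_SIZES = set(range(1, 13)) | {16, 32}


def is_valid_vgpr_range(n, k, even_aligned_pairs=False):
    """Check a v[N:K] range against the conditions listed above.

    even_aligned_pairs is a hypothetical flag for the GFX90A/GFX940 rule:
    a pair of registers must start at an even index.
    """
    if not (0 <= n <= k <= 255):
        return False
    size = k - n + 1
    if size not in VALID_TUPLE_SIZES:
        return False
    if even_aligned_pairs and size == 2 and n % 2 != 0:
        return False
    return True
```

With these rules ``v[0:8]`` (9 registers) is accepted, while ``v[0:12]`` (13 registers) is still rejected.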
|
||||
|
||||
Examples:
|
||||
@ -82,19 +83,20 @@ Examples:
|
||||
|
||||
.. _amdgpu_synid_nsa:
|
||||
|
||||
GFX10 *Image* instructions may use special *NSA* (Non-Sequential Address) syntax for *image addresses*:
|
||||
GFX10+ *image* instructions may use special *NSA* (Non-Sequential Address)
|
||||
syntax for *image addresses*:
|
||||
|
||||
===================================== =================================================
|
||||
Syntax Description
|
||||
===================================== =================================================
|
||||
**[Vm**, \ **Vn**, ... **Vk**\ **]** A sequence of 32-bit *vector* registers.
|
||||
Each register may be specified using syntax
|
||||
Each register may be specified using the syntax
|
||||
defined :ref:`above<amdgpu_synid_v>`.
|
||||
|
||||
In contrast with standard syntax, registers
|
||||
In contrast with the standard syntax, registers
|
||||
in *NSA* sequence are not required to have
|
||||
consecutive indices. Moreover, the same register
|
||||
may appear in the list more than once.
|
||||
may appear in the sequence more than once.
|
||||
===================================== =================================================
|
||||
|
||||
Examples:
|
||||
@ -114,10 +116,10 @@ Accumulator registers. There are 256 32-bit accumulator registers.
|
||||
|
||||
A sequence of *accumulator* registers may be used to operate with more than 32 bits of data.
|
||||
|
||||
Assembler currently supports sequences of 1, 2, 3, 4, 5, 6, 7, 8, 16 and 32 *accumulator* registers.
|
||||
Assembler currently supports tuples with 1 to 12, 16 and 32 *accumulator* registers.
|
||||
|
||||
=================================================== ========================================================= ====================================================================
|
||||
Syntax An Alternative Syntax (SP3) Description
|
||||
Syntax Alternative Syntax (SP3) Description
|
||||
=================================================== ========================================================= ====================================================================
|
||||
**a**\<N> **acc**\<N> A single 32-bit *accumulator* register.
|
||||
|
||||
@ -144,9 +146,10 @@ Note: *N* and *K* must satisfy the following conditions:
|
||||
* *N* <= *K*.
|
||||
* 0 <= *N* <= 255.
|
||||
* 0 <= *K* <= 255.
|
||||
* *K-N+1* must be equal to 1, 2, 3, 4, 5, 6, 7, 8, 16 or 32.
|
||||
* *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.
|
||||
|
||||
GFX90A has an additional alignment requirement: pairs of *accumulator* registers must be even-aligned
|
||||
GFX90A and GFX940 have an additional alignment requirement:
|
||||
pairs of *accumulator* registers must be even-aligned
|
||||
(first register must be even).
|
||||
|
||||
Examples:
|
||||
@ -173,7 +176,7 @@ Examples:
|
||||
s
|
||||
-
|
||||
|
||||
Scalar 32-bit registers. The number of available *scalar* registers depends on GPU:
|
||||
Scalar 32-bit registers. The number of available *scalar* registers depends on the GPU:
|
||||
|
||||
======= ============================
|
||||
GPU Number of *scalar* registers
|
||||
@ -181,11 +184,11 @@ Scalar 32-bit registers. The number of available *scalar* registers depends on G
|
||||
GFX7 104
|
||||
GFX8 102
|
||||
GFX9 102
|
||||
GFX10 106
|
||||
GFX10+ 106
|
||||
======= ============================
|
||||
|
||||
A sequence of *scalar* registers may be used to operate with more than 32 bits of data.
|
||||
Assembler currently supports sequences of 1, 2, 3, 4, 5, 6, 7, 8, 16 and 32 *scalar* registers.
|
||||
Assembler currently supports tuples with 1 to 12, 16 and 32 *scalar* registers.
|
||||
|
||||
Pairs of *scalar* registers must be even-aligned (first register must be even).
|
||||
Sequences of 4 and more *scalar* registers must be quad-aligned.
|
||||
@ -217,11 +220,11 @@ Sequences of 4 and more *scalar* registers must be quad-aligned.
|
||||
|
||||
Note: *N* and *K* must satisfy the following conditions:
|
||||
|
||||
* *N* must be properly aligned based on sequence size.
|
||||
* *N* must be properly aligned based on the sequence size.
|
||||
* *N* <= *K*.
|
||||
* 0 <= *N* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
|
||||
* 0 <= *K* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
|
||||
* *K-N+1* must be equal to 1, 2, 3, 4, 5, 6, 7, 8, 16 or 32.
|
||||
* *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.
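
The scalar register rules can be sketched in the same way. The register counts are the ones shown in the table excerpt above; the function names are hypothetical, and sizes for which the text states no alignment rule (1 and 3) are treated as unconstrained, which is an assumption of this sketch rather than a statement about the hardware.

```python
# Number of scalar registers per GPU, as listed in the table excerpt above.
SCALAR_REGISTER_COUNT = {"GFX7": 104, "GFX8": 102, "GFX9": 102, "GFX10+": 106}

# Tuple sizes accepted after this change: 1 to 12, 16 and 32 registers.
VALID_TUPLE_SIZES = set(range(1, 13)) | {16, 32}


def required_alignment(size):
    """Alignment of the first register for a tuple of the given size."""
    if size >= 4:
        return 4  # sequences of 4 and more must be quad-aligned
    if size == 2:
        return 2  # pairs must be even-aligned
    return 1      # assumption: no rule is stated for sizes 1 and 3


def is_valid_sgpr_range(n, k, gpu):
    """Check an s[N:K] range against the conditions listed above."""
    smax = SCALAR_REGISTER_COUNT[gpu]
    if not (0 <= n <= k < smax):
        return False
    size = k - n + 1
    if size not in VALID_TUPLE_SIZES:
        return False
    return n % required_alignment(size) == 0
```

For example, ``s[2:5]`` fails the quad-alignment rule, and ``s[100:103]`` is in range for GFX7 but not for GFX8.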
|
||||
|
||||
Examples:
|
||||
|
||||
@ -261,7 +264,7 @@ ttmp
|
||||
----
|
||||
|
||||
Trap handler temporary scalar registers, 32-bits wide.
|
||||
The number of available *ttmp* registers depends on GPU:
|
||||
The number of available *ttmp* registers depends on the GPU:
|
||||
|
||||
======= ===========================
|
||||
GPU Number of *ttmp* registers
|
||||
@ -269,11 +272,11 @@ The number of available *ttmp* registers depends on GPU:
|
||||
GFX7 12
|
||||
GFX8 12
|
||||
GFX9 16
|
||||
GFX10 16
|
||||
GFX10+ 16
|
||||
======= ===========================
|
||||
|
||||
A sequence of *ttmp* registers may be used to operate with more than 32 bits of data.
|
||||
Assembler currently supports sequences of 1, 2, 3, 4, 5, 6, 7, 8 and 16 *ttmp* registers.
|
||||
Assembler currently supports tuples with 1 to 12 and 16 *ttmp* registers.
|
||||
|
||||
Pairs of *ttmp* registers must be even-aligned (first register must be even).
|
||||
Sequences of 4 and more *ttmp* registers must be quad-aligned.
|
||||
@ -303,11 +306,11 @@ Sequences of 4 and more *ttmp* registers must be quad-aligned.
|
||||
|
||||
Note: *N* and *K* must satisfy the following conditions:
|
||||
|
||||
* *N* must be properly aligned based on sequence size.
|
||||
* *N* must be properly aligned based on the sequence size.
|
||||
* *N* <= *K*.
|
||||
* 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
|
||||
* 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
|
||||
* *K-N+1* must be equal to 1, 2, 3, 4, 5, 6, 7, 8 or 16.
|
||||
* *K-N+1* must be in the range from 1 to 12 or equal to 16.
|
||||
|
||||
Examples:
|
||||
|
||||
@ -335,7 +338,8 @@ Examples of *ttmp* registers with an invalid alignment:
|
||||
tba
|
||||
---
|
||||
|
||||
Trap base address, 64-bits wide. Holds the pointer to the current trap handler program.
|
||||
Trap base address, 64-bits wide. Holds the pointer to the current
|
||||
trap handler program.
|
||||
|
||||
================== ======================================================================= =============
|
||||
Syntax Description Availability
|
||||
@ -356,9 +360,6 @@ High and low 32 bits of *trap base address* may be accessed as separate register
|
||||
[tba_hi] High 32 bits of *trap base address* register (an SP3 syntax). GFX7, GFX8
|
||||
================== ======================================================================= =============
|
||||
|
||||
Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9 and GFX10,
|
||||
but *tba* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
|
||||
|
||||
.. _amdgpu_synid_tma:
|
||||
|
||||
tma
|
||||
@ -385,9 +386,6 @@ High and low 32 bits of *trap memory address* may be accessed as separate regist
|
||||
[tma_hi] High 32 bits of *trap memory address* register (an SP3 syntax). GFX7, GFX8
|
||||
================= ======================================================================= ==================
|
||||
|
||||
Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9 and GFX10,
|
||||
but *tma* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
|
||||
|
||||
.. _amdgpu_synid_flat_scratch:
|
||||
|
||||
flat_scratch
|
||||
@ -414,10 +412,6 @@ High and low 32 bits of *flat scratch* address may be accessed as separate regis
|
||||
[flat_scratch_hi] High 32 bits of *flat scratch* address register (an SP3 syntax).
|
||||
========================= =========================================================================
|
||||
|
||||
Note that *flat_scratch*, *flat_scratch_lo* and *flat_scratch_hi* are not accessible as assembler
|
||||
registers in GFX10, but *flat_scratch* is readable/writable with the help of
|
||||
*s_get_reg* and *s_set_reg* instructions.
|
||||
|
||||
.. _amdgpu_synid_xnack:
|
||||
.. _amdgpu_synid_xnack_mask:
|
||||
|
||||
@ -427,9 +421,7 @@ xnack_mask
|
||||
Xnack mask, 64-bits wide. Holds a 64-bit mask of which threads
|
||||
received an *XNACK* due to a vector memory operation.
|
||||
|
||||
.. WARNING:: GFX7 does not support *xnack* feature. For availability of this feature in other GPUs, refer :ref:`this table<amdgpu-processors>`.
|
||||
|
||||
\
|
||||
For availability of *xnack* feature, refer to :ref:`this table<amdgpu-processors>`.
|
||||
|
||||
============================== =====================================================
|
||||
Syntax Description
|
||||
@ -450,10 +442,6 @@ High and low 32 bits of *xnack mask* may be accessed as separate registers:
|
||||
[xnack_mask_hi] High 32 bits of *xnack mask* register (an SP3 syntax).
|
||||
===================== ==============================================================
|
||||
|
||||
Note that *xnack_mask*, *xnack_mask_lo* and *xnack_mask_hi* are not accessible as assembler
|
||||
registers in GFX10, but *xnack_mask* is readable/writable with the help of
|
||||
*s_get_reg* and *s_set_reg* instructions.
|
||||
|
||||
.. _amdgpu_synid_vcc:
|
||||
.. _amdgpu_synid_vcc_lo:
|
||||
|
||||
@ -463,7 +451,7 @@ vcc
|
||||
Vector condition code, 64-bits wide. A bit mask with one bit per thread;
|
||||
it holds the result of a vector compare operation.
|
||||
|
||||
Note that GFX10 H/W does not use high 32 bits of *vcc* in *wave32* mode.
|
||||
Note that GFX10+ H/W does not use high 32 bits of *vcc* in *wave32* mode.
|
||||
|
||||
================ =========================================================================
|
||||
Syntax Description
|
||||
@ -508,7 +496,7 @@ Execute mask, 64-bits wide. A bit mask with one bit per thread,
|
||||
which is applied to vector instructions and controls which threads execute
|
||||
and which ignore the instruction.
|
||||
|
||||
Note that GFX10 H/W does not use high 32 bits of *exec* in *wave32* mode.
|
||||
Note that GFX10+ H/W does not use high 32 bits of *exec* in *wave32* mode.
|
||||
|
||||
===================== =================================================================
|
||||
Syntax Description
|
||||
@ -534,18 +522,22 @@ High and low 32 bits of *execute mask* may be accessed as separate registers:
|
||||
vccz
|
||||
----
|
||||
|
||||
A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>` is all zeros.
|
||||
A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>`
|
||||
is all zeros.
|
||||
|
||||
Note: when GFX10 operates in *wave32* mode, this register reflects state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.
|
||||
Note: when GFX10+ operates in *wave32* mode, this register reflects
|
||||
the state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.
|
||||
|
||||
.. _amdgpu_synid_execz:
|
||||
|
||||
execz
|
||||
-----
|
||||
|
||||
A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>` is all zeros.
|
||||
A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>`
|
||||
is all zeros.
|
||||
|
||||
Note: when GFX10 operates in *wave32* mode, this register reflects state of :ref:`exec_lo<amdgpu_synid_exec>`.
|
||||
Note: when GFX10+ operates in *wave32* mode, this register reflects
|
||||
the state of :ref:`exec_lo<amdgpu_synid_exec>`.
|
||||
|
||||
.. _amdgpu_synid_scc:
|
||||
|
||||
@ -567,34 +559,31 @@ fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address.
|
||||
null
|
||||
----
|
||||
|
||||
This is a special operand which may be used as a source or a destination.
|
||||
This is a special operand that may be used as a source or a destination.
|
||||
|
||||
When used as a destination, the result of the operation is discarded.
|
||||
|
||||
When used as a source, it supplies zero value.
|
||||
|
||||
GFX10 only.
|
||||
|
||||
.. WARNING:: Due to a H/W bug, this operand cannot be used with VALU instructions in first generation of GFX10.
|
||||
|
||||
.. _amdgpu_synid_constant:
|
||||
|
||||
inline constant
|
||||
---------------
|
||||
|
||||
An *inline constant* is an integer or a floating-point value encoded as a part of an instruction.
|
||||
Compare *inline constants* with :ref:`literals<amdgpu_synid_literal>`.
|
||||
An *inline constant* is an integer or a floating-point value
|
||||
encoded as a part of an instruction. Compare *inline constants*
|
||||
with :ref:`literals<amdgpu_synid_literal>`.
|
||||
|
||||
Inline constants include:
|
||||
|
||||
* :ref:`iconst<amdgpu_synid_iconst>`
|
||||
* :ref:`fconst<amdgpu_synid_fconst>`
|
||||
* :ref:`ival<amdgpu_synid_ival>`
|
||||
* :ref:`Integer inline constants<amdgpu_synid_iconst>`;
|
||||
* :ref:`Floating-point inline constants<amdgpu_synid_fconst>`;
|
||||
* :ref:`Inline values<amdgpu_synid_ival>`.
|
||||
|
||||
If a number may be encoded as either
|
||||
a :ref:`literal<amdgpu_synid_literal>` or
|
||||
a :ref:`constant<amdgpu_synid_constant>`,
|
||||
assembler selects the latter encoding as more efficient.
|
||||
the assembler selects the latter encoding as more efficient.
|
||||
|
||||
.. _amdgpu_synid_iconst:
|
||||
|
||||
@ -607,7 +596,7 @@ encoded as an *inline constant*.
|
||||
|
||||
Only a small fraction of integer numbers may be encoded as *inline constants*.
|
||||
They are enumerated in the table below.
|
||||
Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
|
||||
Other integer numbers are encoded as :ref:`literals<amdgpu_synid_literal>`.
|
||||
|
||||
================================== ====================================
|
||||
Value Note
|
||||
@ -616,8 +605,6 @@ Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>
|
||||
{-16..-1} Negative integer inline constants.
|
||||
================================== ====================================
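
The choice between an inline constant and a literal for integers can be sketched as follows. Only the ``{-16..-1}`` row is visible in this hunk; the ``{0..64}`` range for non-negative values is assumed from the part of the table not shown here, and the function name is hypothetical.

```python
# Integer inline constants: {-16..-1} is shown above; {0..64} is assumed
# from the part of the table that this hunk does not display.
INLINE_INT_MIN = -16
INLINE_INT_MAX = 64


def integer_operand_encoding(value):
    """Return how an integer operand would be encoded in this sketch."""
    if INLINE_INT_MIN <= value <= INLINE_INT_MAX:
        # Preferred when possible, as the more efficient encoding.
        return "inline constant"
    return "literal"
```

Values just outside the range, such as ``65`` or ``-17``, fall back to a literal.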
|
||||
|
||||
.. WARNING:: GFX7 does not support inline constants for *f16* operands.
|
||||
|
||||
.. _amdgpu_synid_fconst:
|
||||
|
||||
fconst
|
||||
@ -626,9 +613,10 @@ fconst
|
||||
A :ref:`floating-point number<amdgpu_synid_floating-point_number>`
|
||||
encoded as an *inline constant*.
|
||||
|
||||
Only a small fraction of floating-point numbers may be encoded as *inline constants*.
|
||||
They are enumerated in the table below.
|
||||
Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
|
||||
Only a small fraction of floating-point numbers may be encoded
|
||||
as *inline constants*. They are enumerated in the table below.
|
||||
Other floating-point numbers are encoded as
|
||||
:ref:`literals<amdgpu_synid_literal>`.
|
||||
|
||||
===================== ===================================================== ==================
|
||||
Value Note Availability
|
||||
@ -642,15 +630,13 @@ Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_l
|
||||
-1.0 Floating-point constant -1.0 All GPUs
|
||||
-2.0 Floating-point constant -2.0 All GPUs
|
||||
-4.0 Floating-point constant -4.0 All GPUs
|
||||
0.1592 1.0/(2.0*pi). Use only for 16-bit operands. GFX8, GFX9, GFX10
|
||||
0.15915494 1.0/(2.0*pi). Use only for 16- and 32-bit operands. GFX8, GFX9, GFX10
|
||||
0.15915494309189532 1.0/(2.0*pi). GFX8, GFX9, GFX10
|
||||
0.1592 1.0/(2.0*pi). Use only for 16-bit operands. GFX8+
|
||||
0.15915494 1.0/(2.0*pi). Use only for 16- and 32-bit operands. GFX8+
|
||||
0.15915494309189532 1.0/(2.0*pi). GFX8+
|
||||
===================== ===================================================== ==================
|
||||
|
||||
.. WARNING:: Floating-point inline constants cannot be used with *16-bit integer* operands. \
|
||||
Assembler will attempt to encode these values as literals.
|
||||
|
||||
.. WARNING:: GFX7 does not support inline constants for *f16* operands.
|
||||
Assembler encodes these values as literals.
|
||||
|
||||
.. _amdgpu_synid_ival:
|
||||
|
||||
@ -660,42 +646,45 @@ ival
|
||||
A symbolic operand encoded as an *inline constant*.
|
||||
These operands provide read-only access to H/W registers.
|
||||
|
||||
======================== ================================================ =============
|
||||
Syntax Note Availability
|
||||
======================== ================================================ =============
|
||||
shared_base Base address of shared memory region. GFX9, GFX10
|
||||
shared_limit Address of the end of shared memory region. GFX9, GFX10
|
||||
private_base Base address of private memory region. GFX9, GFX10
|
||||
private_limit Address of the end of private memory region. GFX9, GFX10
|
||||
pops_exiting_wave_id A dedicated counter for POPS. GFX9, GFX10
|
||||
======================== ================================================ =============
|
||||
===================== ========================= ================================================ =============
|
||||
Syntax Alternative Syntax (SP3) Note Availability
|
||||
===================== ========================= ================================================ =============
|
||||
shared_base src_shared_base Base address of shared memory region. GFX9+
|
||||
shared_limit src_shared_limit Address of the end of shared memory region. GFX9+
|
||||
private_base src_private_base Base address of private memory region. GFX9+
|
||||
private_limit src_private_limit Address of the end of private memory region. GFX9+
|
||||
pops_exiting_wave_id src_pops_exiting_wave_id A dedicated counter for POPS. GFX9, GFX10
|
||||
===================== ========================= ================================================ =============
|
||||
|
||||
.. _amdgpu_synid_literal:
|
||||
|
||||
literal
|
||||
-------
|
||||
|
||||
A *literal* is a 64-bit value encoded as a separate 32-bit dword in the instruction stream.
|
||||
Compare *literals* with :ref:`inline constants<amdgpu_synid_constant>`.
|
||||
A *literal* is a 64-bit value encoded as a separate
|
||||
32-bit dword in the instruction stream. Compare *literals*
|
||||
with :ref:`inline constants<amdgpu_synid_constant>`.
|
||||
|
||||
If a number may be encoded as either
|
||||
a :ref:`literal<amdgpu_synid_literal>` or
|
||||
an :ref:`inline constant<amdgpu_synid_constant>`,
|
||||
assembler selects the latter encoding as more efficient.
|
||||
|
||||
Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
|
||||
Literals may be specified as
|
||||
:ref:`integer numbers<amdgpu_synid_integer_number>`,
|
||||
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
|
||||
:ref:`absolute expressions<amdgpu_synid_absolute_expression>` or
|
||||
:ref:`relocatable expressions<amdgpu_synid_relocatable_expression>`.
|
||||
|
||||
An instruction may use only one literal but several operands may refer the same literal.
|
||||
An instruction may use only one literal,
|
||||
but several operands may refer to the same literal.
|
||||
|
||||
.. _amdgpu_synid_uimm8:
|
||||
|
||||
uimm8
|
||||
-----
|
||||
|
||||
A 8-bit :ref:`integer number<amdgpu_synid_integer_number>`
|
||||
An 8-bit :ref:`integer number<amdgpu_synid_integer_number>`
|
||||
or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
|
||||
The value must be in the range 0..0xFF.
|
||||
|
||||
@ -756,7 +745,8 @@ Integer numbers are 64 bits wide.
|
||||
They are converted to :ref:`expected operand type<amdgpu_syn_instruction_type>`
|
||||
as described :ref:`here<amdgpu_synid_int_conv>`.
|
||||
|
||||
Integer numbers may be specified in binary, octal, hexadecimal and decimal formats:
|
||||
Integer numbers may be specified in binary, octal,
|
||||
hexadecimal and decimal formats:
|
||||
|
||||
============ =============================== ========
|
||||
Format Syntax Example
|
||||
@ -829,22 +819,23 @@ Relocatable Expressions
|
||||
|
||||
The value of a relocatable expression depends on program relocation.
|
||||
|
||||
Note that use of relocatable expressions is limited with branch targets
|
||||
Note that use of relocatable expressions is limited to branch targets
|
||||
and 32-bit integer operands.
|
||||
|
||||
A relocatable expression is evaluated to a 64-bit integer value
|
||||
which depends on operand kind and :ref:`relocation type<amdgpu-relocation-records>`
|
||||
of symbol(s) used in the expression. For example, if an instruction refers a label,
|
||||
this reference is evaluated to an offset from the address after the instruction
|
||||
to the label address:
|
||||
A relocatable expression is evaluated to a 64-bit integer value,
|
||||
which depends on operand kind and
|
||||
:ref:`relocation type<amdgpu-relocation-records>` of symbol(s)
|
||||
used in the expression. For example, if an instruction refers to a label,
|
||||
this reference is evaluated to an offset from the address after
|
||||
the instruction to the label address:
|
||||
|
||||
.. parsed-literal::
|
||||
|
||||
label:
|
||||
v_add_co_u32_e32 v0, vcc, label, v1 // 'label' operand is evaluated to -4
|
||||
|
||||
Note that values of relocatable expressions are usually unknown at assembly time;
|
||||
they are resolved later by a linker and converted to
|
||||
Note that values of relocatable expressions are usually unknown
|
||||
at assembly time; they are resolved later by a linker and converted to
|
||||
:ref:`expected operand type<amdgpu_syn_instruction_type>`
|
||||
as described :ref:`here<amdgpu_synid_rl_conv>`.
|
||||
|
||||
@ -855,9 +846,11 @@ Expressions are composed of 64-bit integer operands and operations.
|
||||
Operands include :ref:`integer numbers<amdgpu_synid_integer_number>`
|
||||
and :ref:`symbols<amdgpu_synid_symbol>`.
|
||||
|
||||
Expressions may also use "." which is a reference to the current PC (program counter).
|
||||
Expressions may also use "." which is a reference
|
||||
to the current PC (program counter).
|
||||
|
||||
:ref:`Unary<amdgpu_synid_expression_un_op>` and :ref:`binary<amdgpu_synid_expression_bin_op>`
|
||||
:ref:`Unary<amdgpu_synid_expression_un_op>` and
|
||||
:ref:`binary<amdgpu_synid_expression_bin_op>`
|
||||
operations produce 64-bit integer results.
|
||||
|
||||
Syntax of Expressions
|
||||
@ -988,20 +981,25 @@ is used for an operand which has a different type or size.
|
||||
Conversion of Integer Values
|
||||
----------------------------
|
||||
|
||||
Instruction operands may be specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>` or
|
||||
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`. These values are converted to
|
||||
the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
|
||||
Instruction operands may be specified as 64-bit
|
||||
:ref:`integer numbers<amdgpu_synid_integer_number>` or
|
||||
:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
|
||||
These values are converted to the
|
||||
:ref:`expected operand type<amdgpu_syn_instruction_type>`
|
||||
using the following steps:
|
||||
|
||||
1. *Validation*. Assembler checks if the input value may be truncated without loss to the required *truncation width*
|
||||
(see the table below). There are two cases when this operation is enabled:
|
||||
1. *Validation*. The assembler checks whether the input value may be truncated
|
||||
without loss to the required *truncation width* (see the table below).
|
||||
There are two cases when this operation is enabled:
|
||||
|
||||
* The truncated bits are all 0.
|
||||
* The truncated bits are all 1 and the value after truncation has its MSB set.
|
||||
|
||||
In all other cases assembler triggers an error.
|
||||
In all other cases, the assembler triggers an error.
|
||||
|
||||
2. *Conversion*. The input value is converted to the expected type as described in the table below.
|
||||
Depending on operand kind, this conversion is performed by either assembler or AMDGPU H/W (or both).
|
||||
2. *Conversion*. The input value is converted to the expected type
|
||||
as described in the table below. Depending on operand kind, this conversion
|
||||
is performed by the assembler, the AMDGPU hardware, or both.
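
The validation step can be illustrated with a small Python sketch. It is a model of the rule as stated above, not the assembler's implementation; ``truncate_checked`` and its parameters are hypothetical names, and ``width`` stands for the truncation width taken from the table.

```python
def truncate_checked(value: int, width: int) -> int:
    """Validate a 64-bit input and truncate it to `width` bits (1..64).

    Hypothetical helper: returns the truncated bit pattern,
    or raises ValueError when truncation would lose information.
    """
    bits64 = value & 0xFFFFFFFFFFFFFFFF        # view the input as 64 raw bits
    low = bits64 & ((1 << width) - 1)          # bits kept after truncation
    high = bits64 >> width                     # bits removed by truncation

    all_zero = high == 0
    all_one = high == (1 << (64 - width)) - 1
    msb_set = (low >> (width - 1)) & 1 == 1

    # Case 1: truncated bits are all 0.
    # Case 2: truncated bits are all 1 and the kept value has its MSB set.
    if all_zero or (all_one and msb_set):
        return low
    raise ValueError(f"{value:#x} cannot be truncated to {width} bits without loss")


print(hex(truncate_checked(-1, 16)))       # 0xffff
print(hex(truncate_checked(0xFFFF, 16)))   # 0xffff
```

Under this model, ``0x10000`` and ``-0x8001`` are both rejected for a 16-bit truncation width: the first has a nonzero truncated bit, and the second leaves a value whose MSB is clear.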
|
||||
|
||||
============== ================= =============== ====================================================================
|
||||
Expected type Truncation Width Conversion Description
|
||||
@ -1055,21 +1053,26 @@ Examples of disabled conversions:
|
||||
Conversion of Floating-Point Values
|
||||
-----------------------------------
|
||||
|
||||
Instruction operands may be specified as 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
|
||||
These values are converted to the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
|
||||
Instruction operands may be specified as 64-bit
|
||||
:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
|
||||
These values are converted to the
|
||||
:ref:`expected operand type<amdgpu_syn_instruction_type>`
|
||||
using the following steps:
|
||||
|
||||
1. *Validation*. Assembler checks if the input f64 number can be converted
|
||||
to the *required floating-point type* (see the table below) without overflow or underflow.
|
||||
Precision lost is allowed. If this conversion is not possible, assembler triggers an error.
|
||||
to the *required floating-point type* (see the table below) without overflow
|
||||
or underflow. Precision loss is allowed. If this conversion is not possible,
|
||||
the assembler triggers an error.
|
||||
|
||||
2. *Conversion*. The input value is converted to the expected type as described in the table below.
|
||||
Depending on operand kind, this is performed by either assembler or AMDGPU H/W (or both).
|
||||
2. *Conversion*. The input value is converted to the expected type
|
||||
as described in the table below. Depending on operand kind, this is
|
||||
performed by the assembler, the AMDGPU hardware, or both.
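
A rough Python model of these two steps is shown below. It is a sketch under stated assumptions, not the assembler's logic: ``fp_operand_bits`` is a hypothetical helper, overflow detection relies on the ``OverflowError`` raised by the standard ``struct`` module, and underflow is approximated as "a nonzero input that rounds to zero" (denormal results are accepted here, which may differ from the real check).

```python
import struct

# struct format codes for the required floating-point types
_FORMATS = {"f16": "<e", "f32": "<f"}


def fp_operand_bits(num: float, required: str) -> int:
    """Convert an f64 input to `required` ("f16" or "f32") and return its bits.

    Hypothetical helper; raises ValueError on overflow or (approximate) underflow.
    """
    fmt = _FORMATS[required]
    try:
        packed = struct.pack(fmt, num)         # rounds to the narrower type
    except OverflowError as exc:
        raise ValueError(f"{num} overflows {required}") from exc

    # Assumption: treat a nonzero input that rounds to zero as underflow.
    if num != 0.0 and struct.unpack(fmt, packed)[0] == 0.0:
        raise ValueError(f"{num} underflows {required}")

    # Use the bits of the converted value as an integer, as the table describes.
    return int.from_bytes(packed, "little")


print(hex(fp_operand_bits(1.0, "f16")))   # 0x3c00
print(hex(fp_operand_bits(1.0, "f32")))   # 0x3f800000
```

Precision loss is tolerated in this model: ``0.1`` converts to f32 without an error even though the stored value is only an approximation.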
|
||||
|
||||
============== ================ ================= =================================================================
|
||||
Expected type Required FP Type Conversion Description
|
||||
============== ================ ================= =================================================================
|
||||
i16, u16, b16 f16 f16(num) Convert to f16 and use bits of the result as an integer value.
|
||||
The value has to be encoded as a literal or an error occurs.
|
||||
The value has to be encoded as a literal, or an error occurs.
|
||||
Note that the value cannot be encoded as an inline constant.
|
||||
i32, u32, b32 f32 f32(num) Convert to f32 and use bits of the result as an integer value.
|
||||
i64, u64, b64 \- \- Conversion disabled.
|
||||
@ -1122,8 +1125,9 @@ When the value of a relocatable expression is resolved by a linker, it is
|
||||
converted as needed and truncated to the operand size. The conversion depends
|
||||
on :ref:`relocation type<amdgpu-relocation-records>` and operand kind.
|
||||
|
||||
For example, when a 32-bit operand of an instruction refers a relocatable expression *expr*,
|
||||
this reference is evaluated to a 64-bit offset from the address after the
|
||||
For example, when a 32-bit operand of an instruction refers
|
||||
to a relocatable expression *expr*, this reference is evaluated
|
||||
to a 64-bit offset from the address after the
|
||||
instruction to the address being referenced, *counted in bytes*.
|
||||
Then the value is truncated to 32 bits and encoded as a literal:
|
||||
|
||||
@ -1133,7 +1137,7 @@ Then the value is truncated to 32 bits and encoded as a literal:
|
||||
v_add_co_u32_e32 v0, vcc, expr, v1 // 'expr' operand is evaluated to -4
|
||||
// and then truncated to 0xFFFFFFFC
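
The truncation in this example is plain two's-complement masking. The Python sketch below reproduces the arithmetic only; ``truncate_to_operand`` is a hypothetical name, and the offset is taken as already resolved.

```python
def truncate_to_operand(offset: int, bits: int = 32) -> int:
    """Truncate a resolved 64-bit offset to the operand size (illustrative)."""
    return offset & ((1 << bits) - 1)


# Offset from the address after the instruction back to 'expr', in bytes.
print(hex(truncate_to_operand(-4)))   # 0xfffffffc
```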
|
||||
|
||||
As another example, when a branch instruction refers a label,
|
||||
As another example, when a branch instruction refers to a label,
|
||||
this reference is evaluated to an offset from the address after the
|
||||
instruction to the label address, *counted in dwords*.
|
||||
Then the value is truncated to 16 bits:
|
||||
|