[AMDGPU][DOC][NFC] Update assembler syntax description

Summary of changes:
- Enable register tuples with 9, 10, 11 and 12 registers (https://reviews.llvm.org/D138205).
- Small improvements and clarifications.
- Correct typos.
2024-11-27 07:31:28 +00:00 · 2022-12-20 14:01:37 +03:00 · d9daee5a66
commit d9daee5a66
parent ff302f8502
4 changed files with 298 additions and 345 deletions
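
As an illustration of the tuple-size rule this change documents, the sketch below checks a register range by its first and last indices *N* and *K*. It is a minimal Python sketch, not the assembler's implementation: the function names and the `smax` parameter are hypothetical, and it assumes no alignment rule for 3-register scalar tuples because the text states none. The GFX90A/GFX940 even-alignment requirement for vector pairs is not modeled.

```python
# Tuple sizes taken from the updated text: 1 to 12, plus 16 and 32.
TUPLE_SIZES = set(range(1, 13)) | {16, 32}


def is_valid_vector_tuple(n, k):
    """Check a vector register range with first index n and last index k."""
    if not (0 <= n <= 255 and 0 <= k <= 255 and n <= k):
        return False
    return (k - n + 1) in TUPLE_SIZES


def is_valid_scalar_tuple(n, k, smax):
    """Check a scalar register range; smax is the GPU's scalar register count."""
    if not (0 <= n < smax and 0 <= k < smax and n <= k):
        return False
    size = k - n + 1
    if size not in TUPLE_SIZES:
        return False
    if size == 2:
        return n % 2 == 0  # pairs must be even-aligned
    if size >= 4:
        return n % 4 == 0  # 4 and more registers must be quad-aligned
    return True  # sizes 1 and 3: no alignment rule assumed here
```

For example, `is_valid_vector_tuple(0, 8)` accepts a 9-register tuple, while `is_valid_vector_tuple(0, 12)` rejects a 13-register one.
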
--- a/llvm/docs/AMDGPUInstructionNotation.rst
+++ b/llvm/docs/AMDGPUInstructionNotation.rst
@@ -10,10 +10,10 @@ AMDGPU Instructions Notation
 Introduction
 ============

-This is an overview of notation used to describe syntax of AMDGPU assembler instructions.
+This is an overview of notation used to describe the syntax of AMDGPU assembler instructions.

-This notation mimics the :ref:`syntax of assembler instructions<amdgpu_syn_instructions>`
-except that instead of real operands and modifiers it provides references to their description.
+This notation looks a lot like the :ref:`syntax of assembler instructions<amdgpu_syn_instructions>`,
+except that instead of real operands and modifiers, it uses references to their descriptions.

 Instructions
 ============
@@ -23,7 +23,9 @@ Notation

 This is the notation used to describe AMDGPU instructions:

-    ``<``\ :ref:`opcode description<amdgpu_syn_opcode_notation>`\ ``>  <``\ :ref:`operands description<amdgpu_syn_instruction_operands_notation>`\ ``>  <``\ :ref:`modifiers description<amdgpu_syn_instruction_modifiers_notation>`\ ``>``
+  | ``<``\ :ref:`opcode description<amdgpu_syn_opcode_notation>`\ ``>
+      <``\ :ref:`operands description<amdgpu_syn_instruction_operands_notation>`\ ``>
+      <``\ :ref:`modifiers description<amdgpu_syn_instruction_modifiers_notation>`\ ``>``

 .. _amdgpu_syn_opcode_notation:

@@ -42,7 +44,8 @@ Operands

 An instruction may have zero or more *operands*. They are comma-separated in the description:

-    ``<``\ :ref:`description of operand 0<amdgpu_syn_instruction_operand_notation>`\ ``>, <``\ :ref:`description of operand 1<amdgpu_syn_instruction_operand_notation>`\ ``>, ...``
+  | ``<``\ :ref:`description of operand 0<amdgpu_syn_instruction_operand_notation>`\ ``>,
+      <``\ :ref:`description of operand 1<amdgpu_syn_instruction_operand_notation>`\ ``>, ...``

 The order of *operands* is fixed. *Operands* cannot be omitted
 except for special cases described below.
@@ -60,7 +63,8 @@ Where:

 * *kind* is an optional prefix describing operand :ref:`kind<amdgpu_syn_instruction_operand_kinds>`.
 * *name* is a link to a description of the operand.
-* *tags* are optional. They are used to indicate :ref:`special operand properties<amdgpu_syn_instruction_operand_tags>`.
+* *tags* are optional. They are used to indicate
+  :ref:`special operand properties<amdgpu_syn_instruction_operand_tags>`.

 .. _amdgpu_syn_instruction_operand_kinds:

@@ -70,8 +74,8 @@ Operand Kinds
 Operand kind indicates which values are accepted by the operand.

 * Operands which only accept *vector* registers are labelled with 'v' prefix.
-* Operands which only accept *scalar* values are labelled with 's' prefix.
-* Operands which accept both *vector* registers and *scalar* values have no prefix.
+* Operands which only accept *scalar* registers and values are labelled with 's' prefix.
+* Operands which accept any registers and values have no prefix.

 Examples:

@@ -79,7 +83,7 @@ Examples:

    vdata          // operand only accepts vector registers
    sdst           // operand only accepts scalar registers
-    src1           // operand accepts both scalar and vector registers
+    src1           // operand accepts vector registers, scalar registers, and scalar values

 .. _amdgpu_syn_instruction_operand_tags:

@@ -92,16 +96,16 @@ Operand tags indicate special operand properties.
    Operand tag    Meaning
    ============== =================================================================================
    :opt           An optional operand.
-    :m             An operand which may be used with
-                   :ref:`VOP3 operand modifiers<amdgpu_synid_vop3_operand_modifiers>` or
-                   :ref:`SDWA operand modifiers<amdgpu_synid_sdwa_operand_modifiers>`.
-    :dst           An input operand which may also serve as a destination
+    :m             An operand which may be used with operand modifiers
+                   :ref:`abs<amdgpu_synid_abs>`, :ref:`neg<amdgpu_synid_neg>` or
+                   :ref:`sext<amdgpu_synid_sext>`.
+    :dst           An input operand which is also used as a destination
                   if :ref:`glc<amdgpu_synid_glc>` modifier is specified.
-    :fx            This is an *f32* or *f16* operand depending on
+    :fx            This is a *f32* or *f16* operand, depending on
                   :ref:`m_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>` modifier.
-    :<type>        Operand *type* differs from *type*
+    :<type>        The operand *type* differs from the *type*
                   :ref:`implied by the opcode name<amdgpu_syn_instruction_type>`.
-                   This tag specifies actual operand *type*.
+                   This tag specifies the actual operand *type*.
    ============== =================================================================================

 Examples:
@@ -119,7 +123,8 @@ Modifiers

 An instruction may have zero or more optional *modifiers*. They are space-separated in the description:

-    ``<``\ :ref:`description of modifier 0<amdgpu_syn_instruction_modifier_notation>`\ ``> <``\ :ref:`description of modifier 1<amdgpu_syn_instruction_modifier_notation>`\ ``> ...``
+  | ``<``\ :ref:`description of modifier 0<amdgpu_syn_instruction_modifier_notation>`\ ``>
+      <``\ :ref:`description of modifier 1<amdgpu_syn_instruction_modifier_notation>`\ ``> ...``

 The order of *modifiers* is fixed.

@@ -132,4 +137,4 @@ A *modifier* is described using the following notation:

    *<name>*

-Where *name* is a link to a description of the *modifier*.
+Where the *name* is a link to a description of the *modifier*.
--- a/llvm/docs/AMDGPUInstructionSyntax.rst
+++ b/llvm/docs/AMDGPUInstructionSyntax.rst
@@ -15,9 +15,10 @@ Syntax

 An instruction has the following syntax:

-    ``<``\ *opcode mnemonic*\ ``>    <``\ *operand0*\ ``>, <``\ *operand1*\ ``>,...    <``\ *modifier0*\ ``> <``\ *modifier1*\ ``>...``
+  | ``<``\ *opcode mnemonic*\ ``>    <``\ *operand0*\ ``>,
+      <``\ *operand1*\ ``>,...    <``\ *modifier0*\ ``> <``\ *modifier1*\ ``>...``

-:doc:`Operands<AMDGPUOperandSyntax>` are normally comma-separated while
+:doc:`Operands<AMDGPUOperandSyntax>` are normally comma-separated, while
 :doc:`modifiers<AMDGPUModifierSyntax>` are space-separated.

 The order of *operands* and *modifiers* is fixed.
@@ -28,7 +29,8 @@ Most *modifiers* are optional and may be omitted.
 Opcode Mnemonic
 ~~~~~~~~~~~~~~~

-Opcode mnemonic describes opcode semantics and may include one or more suffices in this order:
+Opcode mnemonic describes opcode semantics
+and may include one or more suffices in this order:

 * :ref:`Packing suffix<amdgpu_syn_instruction_pk>`.
 * :ref:`Destination operand type suffix<amdgpu_syn_instruction_type>`.
@@ -81,7 +83,7 @@ The following table enumerates the most frequently used type suffices.
    ============================================ ======================= ============================

 Instructions which have no type suffices are assumed to operate with typeless data.
-The size of data is specified by size suffices:
+The size of typeless data is specified by size suffices:

    ================= =================== =====================================
    Size Suffix       Implied data type   Required register size in dwords
@@ -103,8 +105,8 @@ The size of data is specified by size suffices:
    ================= =================== =====================================

 .. WARNING::
-    There are exceptions from rules described above.
-    Operands which have type different from type specified by the opcode are
+    There are exceptions to the rules described above.
+    Operands which have a type different from the type specified by the opcode are
    :ref:`tagged<amdgpu_syn_instruction_operand_tags>` in the description.

 Examples of instructions with different types of source and destination operands:
@@ -144,7 +146,9 @@ Encoding Suffices
 Most *VOP1*, *VOP2* and *VOPC* instructions have several variants:
 they may also be encoded in *VOP3*, *DPP* and *SDWA* formats.

-The assembler will automatically use optimal encoding based on instruction operands.
+The assembler selects an optimal encoding automatically
+based on instruction operands and modifiers,
+unless a specific encoding is explicitly requested.
 To force specific encoding, one can add a suffix to the opcode of the instruction:

    =================================================== =================
@@ -156,8 +160,8 @@ To force specific encoding, one can add a suffix to the opcode of the instructio
    *SDWA* encoding                                     _sdwa
    =================================================== =================

-These suffices are used in this reference to indicate the assumed encoding.
-When no suffix is specified, native instruction encoding is implied.
+This reference uses encoding suffices to specify which encoding is implied.
+When no suffix is specified, native instruction encoding is assumed.

 Operands
 ========
@@ -165,9 +169,9 @@ Operands
 Syntax
 ~~~~~~

-Syntax of generic operands is described :doc:`in this document<AMDGPUOperandSyntax>`.
+The syntax of generic operands is described :doc:`in this document<AMDGPUOperandSyntax>`.

-For detailed information about operands follow *operand links* in GPU-specific documents.
+For detailed information about operands, follow *operand links* in GPU-specific documents.

 Modifiers
 =========
@@ -175,6 +179,7 @@ Modifiers
 Syntax
 ~~~~~~

-Syntax of modifiers is described :doc:`in this document<AMDGPUModifierSyntax>`.
+The syntax of modifiers is described :doc:`in this document<AMDGPUModifierSyntax>`.

-Information about modifiers supported for individual instructions may be found in GPU-specific documents.
+Information about modifiers supported for individual instructions
+may be found in GPU-specific documents.
--- a/llvm/docs/AMDGPUModifierSyntax.rst
+++ b/llvm/docs/AMDGPUModifierSyntax.rst
--- a/llvm/docs/AMDGPUOperandSyntax.rst
+++ b/llvm/docs/AMDGPUOperandSyntax.rst
@@ -14,7 +14,7 @@ The following notation is used throughout this document:
    Notation            Description
    =================== =============================================================================
    {0..N}              Any integer value in the range from 0 to N (inclusive).
-    <x>                 Syntax and meaning of *x* is explained elsewhere.
+    <x>                 Syntax and meaning of *x* are explained elsewhere.
    =================== =============================================================================

 .. _amdgpu_syn_operands:
@@ -31,7 +31,7 @@ Vector registers. There are 256 32-bit vector registers.

 A sequence of *vector* registers may be used to operate with more than 32 bits of data.

-Assembler currently supports sequences of 1, 2, 3, 4, 5, 6, 7, 8, 16 and 32 *vector* registers.
+Assembler currently supports tuples with 1 to 12, 16 and 32 *vector* registers.

    =================================================== ====================================================================
    Syntax                                              Description
@@ -61,9 +61,10 @@ Note: *N* and *K* must satisfy the following conditions:
 * *N* <= *K*.
 * 0 <= *N* <= 255.
 * 0 <= *K* <= 255.
-* *K-N+1* must be equal to 1, 2, 3, 4, 5, 6, 7, 8, 16 or 32.
+* *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.

-GFX90A has an additional alignment requirement: pairs of *vector* registers must be even-aligned
+GFX90A and GFX940 have an additional alignment requirement:
+pairs of *vector* registers must be even-aligned
 (first register must be even).

 Examples:
@@ -82,19 +83,20 @@ Examples:

 .. _amdgpu_synid_nsa:

-GFX10 *Image* instructions may use special *NSA* (Non-Sequential Address) syntax for *image addresses*:
+GFX10+ *image* instructions may use special *NSA* (Non-Sequential Address)
+syntax for *image addresses*:

    ===================================== =================================================
    Syntax                                Description
    ===================================== =================================================
    **[Vm**, \ **Vn**, ... **Vk**\ **]**  A sequence of 32-bit *vector* registers.
-                                          Each register may be specified using syntax
+                                          Each register may be specified using the syntax
                                          defined :ref:`above<amdgpu_synid_v>`.

-                                          In contrast with standard syntax, registers
+                                          In contrast with the standard syntax, registers
                                          in *NSA* sequence are not required to have
                                          consecutive indices. Moreover, the same register
-                                          may appear in the list more than once.
+                                          may appear in the sequence more than once.
    ===================================== =================================================

 Examples:
@@ -114,10 +116,10 @@ Accumulator registers. There are 256 32-bit accumulator registers.

 A sequence of *accumulator* registers may be used to operate with more than 32 bits of data.

-Assembler currently supports sequences of 1, 2, 3, 4, 5, 6, 7, 8, 16 and 32 *accumulator* registers.
+Assembler currently supports tuples with 1 to 12, 16 and 32 *accumulator* registers.

    =================================================== ========================================================= ====================================================================
-    Syntax                                              An Alternative Syntax (SP3)                               Description
+    Syntax                                              Alternative Syntax (SP3)                                  Description
    =================================================== ========================================================= ====================================================================
    **a**\<N>                                           **acc**\<N>                                               A single 32-bit *accumulator* register.

@@ -144,9 +146,10 @@ Note: *N* and *K* must satisfy the following conditions:
 * *N* <= *K*.
 * 0 <= *N* <= 255.
 * 0 <= *K* <= 255.
-* *K-N+1* must be equal to 1, 2, 3, 4, 5, 6, 7, 8, 16 or 32.
+* *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.

-GFX90A has an additional alignment requirement: pairs of *accumulator* registers must be even-aligned
+GFX90A and GFX940 have an additional alignment requirement:
+pairs of *accumulator* registers must be even-aligned
 (first register must be even).

 Examples:
@@ -173,7 +176,7 @@ Examples:
 s
 -

-Scalar 32-bit registers. The number of available *scalar* registers depends on GPU:
+Scalar 32-bit registers. The number of available *scalar* registers depends on the GPU:

    ======= ============================
    GPU     Number of *scalar* registers
@@ -181,11 +184,11 @@ Scalar 32-bit registers. The number of available *scalar* registers depends on G
    GFX7    104
    GFX8    102
    GFX9    102
-    GFX10   106
+    GFX10+  106
    ======= ============================

 A sequence of *scalar* registers may be used to operate with more than 32 bits of data.
-Assembler currently supports sequences of 1, 2, 3, 4, 5, 6, 7, 8, 16 and 32 *scalar* registers.
+Assembler currently supports tuples with 1 to 12, 16 and 32 *scalar* registers.

 Pairs of *scalar* registers must be even-aligned (first register must be even).
 Sequences of 4 and more *scalar* registers must be quad-aligned.
@@ -217,11 +220,11 @@ Sequences of 4 and more *scalar* registers must be quad-aligned.

 Note: *N* and *K* must satisfy the following conditions:

-* *N* must be properly aligned based on sequence size.
+* *N* must be properly aligned based on the sequence size.
 * *N* <= *K*.
 * 0 <= *N* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
 * 0 <= *K* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
-* *K-N+1* must be equal to 1, 2, 3, 4, 5, 6, 7, 8, 16 or 32.
+* *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32.

 Examples:

@@ -261,7 +264,7 @@ ttmp
 ----

 Trap handler temporary scalar registers, 32-bits wide.
-The number of available *ttmp* registers depends on GPU:
+The number of available *ttmp* registers depends on the GPU:

    ======= ===========================
    GPU     Number of *ttmp* registers
@@ -269,11 +272,11 @@ The number of available *ttmp* registers depends on GPU:
    GFX7    12
    GFX8    12
    GFX9    16
-    GFX10   16
+    GFX10+  16
    ======= ===========================

 A sequence of *ttmp* registers may be used to operate with more than 32 bits of data.
-Assembler currently supports sequences of 1, 2, 3, 4, 5, 6, 7, 8 and 16 *ttmp* registers.
+Assembler currently supports tuples with 1 to 12 and 16 *ttmp* registers.

 Pairs of *ttmp* registers must be even-aligned (first register must be even).
 Sequences of 4 and more *ttmp* registers must be quad-aligned.
@@ -303,11 +306,11 @@ Sequences of 4 and more *ttmp* registers must be quad-aligned.

 Note: *N* and *K* must satisfy the following conditions:

-* *N* must be properly aligned based on sequence size.
+* *N* must be properly aligned based on the sequence size.
 * *N* <= *K*.
 * 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
 * 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
-* *K-N+1* must be equal to 1, 2, 3, 4, 5, 6, 7, 8 or 16.
+* *K-N+1* must be in the range from 1 to 12 or equal to 16.

 Examples:

@@ -335,7 +338,8 @@ Examples of *ttmp* registers with an invalid alignment:
 tba
 ---

-Trap base address, 64-bits wide. Holds the pointer to the current trap handler program.
+Trap base address, 64-bits wide. Holds the pointer to the current
+trap handler program.

    ================== ======================================================================= =============
    Syntax             Description                                                             Availability
@@ -356,9 +360,6 @@ High and low 32 bits of *trap base address* may be accessed as separate register
    [tba_hi]           High 32 bits of *trap base address* register (an SP3 syntax).           GFX7, GFX8
    ================== ======================================================================= =============

-Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9 and GFX10,
-but *tba* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
-
 .. _amdgpu_synid_tma:

 tma
@@ -385,9 +386,6 @@ High and low 32 bits of *trap memory address* may be accessed as separate regist
    [tma_hi]          High 32 bits of *trap memory address* register (an SP3 syntax).         GFX7, GFX8
    ================= ======================================================================= ==================

-Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9 and GFX10,
-but *tma* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions.
-
 .. _amdgpu_synid_flat_scratch:

 flat_scratch
@@ -414,10 +412,6 @@ High and low 32 bits of *flat scratch* address may be accessed as separate regis
    [flat_scratch_hi]         High 32 bits of *flat scratch* address register (an SP3 syntax).
    ========================= =========================================================================

-Note that *flat_scratch*, *flat_scratch_lo* and *flat_scratch_hi* are not accessible as assembler
-registers in GFX10, but *flat_scratch* is readable/writable with the help of
-*s_get_reg* and *s_set_reg* instructions.
-
 .. _amdgpu_synid_xnack:
 .. _amdgpu_synid_xnack_mask:

@@ -427,9 +421,7 @@ xnack_mask
 Xnack mask, 64-bits wide. Holds a 64-bit mask of which threads
 received an *XNACK* due to a vector memory operation.

-.. WARNING:: GFX7 does not support *xnack* feature. For availability of this feature in other GPUs, refer :ref:`this table<amdgpu-processors>`.
-
-\
+For availability of *xnack* feature, refer to :ref:`this table<amdgpu-processors>`.

    ============================== =====================================================
    Syntax                         Description
@@ -450,10 +442,6 @@ High and low 32 bits of *xnack mask* may be accessed as separate registers:
    [xnack_mask_hi]       High 32 bits of *xnack mask* register (an SP3 syntax).
    ===================== ==============================================================

-Note that *xnack_mask*, *xnack_mask_lo* and *xnack_mask_hi* are not accessible as assembler
-registers in GFX10, but *xnack_mask* is readable/writable with the help of
-*s_get_reg* and *s_set_reg* instructions.
-
 .. _amdgpu_synid_vcc:
 .. _amdgpu_synid_vcc_lo:

@@ -463,7 +451,7 @@ vcc
 Vector condition code, 64-bits wide. A bit mask with one bit per thread;
 it holds the result of a vector compare operation.

-Note that GFX10 H/W does not use high 32 bits of *vcc* in *wave32* mode.
+Note that GFX10+ H/W does not use high 32 bits of *vcc* in *wave32* mode.

    ================ =========================================================================
    Syntax           Description
@@ -508,7 +496,7 @@ Execute mask, 64-bits wide. A bit mask with one bit per thread,
 which is applied to vector instructions and controls which threads execute
 and which ignore the instruction.

-Note that GFX10 H/W does not use high 32 bits of *exec* in *wave32* mode.
+Note that GFX10+ H/W does not use high 32 bits of *exec* in *wave32* mode.

    ===================== =================================================================
    Syntax                Description
@@ -534,18 +522,22 @@ High and low 32 bits of *execute mask* may be accessed as separate registers:
 vccz
 ----

-A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>` is all zeros.
+A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>`
+is all zeros.

-Note: when GFX10 operates in *wave32* mode, this register reflects state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.
+Note: when GFX10+ operates in *wave32* mode, this register reflects
+the state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`.

 .. _amdgpu_synid_execz:

 execz
 -----

-A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>` is all zeros.
+A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>`
+is all zeros.

-Note: when GFX10 operates in *wave32* mode, this register reflects state of :ref:`exec_lo<amdgpu_synid_exec>`.
+Note: when GFX10+ operates in *wave32* mode, this register reflects
+the state of :ref:`exec_lo<amdgpu_synid_exec>`.

 .. _amdgpu_synid_scc:

@@ -567,34 +559,31 @@ fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address.
 null
 ----

-This is a special operand which may be used as a source or a destination.
+This is a special operand that may be used as a source or a destination.

 When used as a destination, the result of the operation is discarded.

 When used as a source, it supplies zero value.

-GFX10 only.
-
-.. WARNING:: Due to a H/W bug, this operand cannot be used with VALU instructions in first generation of GFX10.
-
 .. _amdgpu_synid_constant:

 inline constant
 ---------------

-An *inline constant* is an integer or a floating-point value encoded as a part of an instruction.
-Compare *inline constants* with :ref:`literals<amdgpu_synid_literal>`.
+An *inline constant* is an integer or a floating-point value
+encoded as a part of an instruction. Compare *inline constants*
+with :ref:`literals<amdgpu_synid_literal>`.

 Inline constants include:

-* :ref:`iconst<amdgpu_synid_iconst>`
-* :ref:`fconst<amdgpu_synid_fconst>`
-* :ref:`ival<amdgpu_synid_ival>`
+* :ref:`Integer inline constants<amdgpu_synid_iconst>`;
+* :ref:`Floating-point inline constants<amdgpu_synid_fconst>`;
+* :ref:`Inline values<amdgpu_synid_ival>`.

 If a number may be encoded as either
 a :ref:`literal<amdgpu_synid_literal>` or
 a :ref:`constant<amdgpu_synid_constant>`,
-assembler selects the latter encoding as more efficient.
+the assembler selects the latter encoding as more efficient.

 .. _amdgpu_synid_iconst:

@@ -607,7 +596,7 @@ encoded as an *inline constant*.

 Only a small fraction of integer numbers may be encoded as *inline constants*.
 They are enumerated in the table below.
-Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
+Other integer numbers are encoded as :ref:`literals<amdgpu_synid_literal>`.

    ================================== ====================================
    Value                              Note
@@ -616,8 +605,6 @@ Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>
    {-16..-1}                          Negative integer inline constants.
    ================================== ====================================

-.. WARNING:: GFX7 does not support inline constants for *f16* operands.
-
 .. _amdgpu_synid_fconst:

 fconst
@@ -626,9 +613,10 @@ fconst
 A :ref:`floating-point number<amdgpu_synid_floating-point_number>`
 encoded as an *inline constant*.

-Only a small fraction of floating-point numbers may be encoded as *inline constants*.
-They are enumerated in the table below.
-Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
+Only a small fraction of floating-point numbers may be encoded
+as *inline constants*. They are enumerated in the table below.
+Other floating-point numbers are encoded as
+:ref:`literals<amdgpu_synid_literal>`.

    ===================== ===================================================== ==================
    Value                 Note                                                  Availability
@@ -642,15 +630,13 @@ Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_l
    -1.0                  Floating-point constant -1.0                          All GPUs
    -2.0                  Floating-point constant -2.0                          All GPUs
    -4.0                  Floating-point constant -4.0                          All GPUs
-    0.1592                1.0/(2.0*pi). Use only for 16-bit operands.           GFX8, GFX9, GFX10
-    0.15915494            1.0/(2.0*pi). Use only for 16- and 32-bit operands.   GFX8, GFX9, GFX10
-    0.15915494309189532   1.0/(2.0*pi).                                         GFX8, GFX9, GFX10
+    0.1592                1.0/(2.0*pi). Use only for 16-bit operands.           GFX8+
+    0.15915494            1.0/(2.0*pi). Use only for 16- and 32-bit operands.   GFX8+
+    0.15915494309189532   1.0/(2.0*pi).                                         GFX8+
    ===================== ===================================================== ==================

 .. WARNING:: Floating-point inline constants cannot be used with *16-bit integer* operands. \
-             Assembler will attempt to encode these values as literals.
-
-.. WARNING:: GFX7 does not support inline constants for *f16* operands.
+             Assembler encodes these values as literals.

 .. _amdgpu_synid_ival:

@@ -660,42 +646,45 @@ ival
 A symbolic operand encoded as an *inline constant*.
 These operands provide read-only access to H/W registers.

-    ======================== ================================================ =============
-    Syntax                   Note                                             Availability
-    ======================== ================================================ =============
-    shared_base              Base address of shared memory region.            GFX9, GFX10
-    shared_limit             Address of the end of shared memory region.      GFX9, GFX10
-    private_base             Base address of private memory region.           GFX9, GFX10
-    private_limit            Address of the end of private memory region.     GFX9, GFX10
-    pops_exiting_wave_id     A dedicated counter for POPS.                    GFX9, GFX10
-    ======================== ================================================ =============
+    ===================== ========================= ================================================ =============
+    Syntax                Alternative Syntax (SP3)  Note                                             Availability
+    ===================== ========================= ================================================ =============
+    shared_base           src_shared_base           Base address of shared memory region.            GFX9+
+    shared_limit          src_shared_limit          Address of the end of shared memory region.      GFX9+
+    private_base          src_private_base          Base address of private memory region.           GFX9+
+    private_limit         src_private_limit         Address of the end of private memory region.     GFX9+
+    pops_exiting_wave_id  src_pops_exiting_wave_id  A dedicated counter for POPS.                    GFX9, GFX10
+    ===================== ========================= ================================================ =============

 .. _amdgpu_synid_literal:

 literal
 -------

-A *literal* is a 64-bit value encoded as a separate 32-bit dword in the instruction stream.
-Compare *literals* with :ref:`inline constants<amdgpu_synid_constant>`.
+A *literal* is a 64-bit value encoded as a separate
+32-bit dword in the instruction stream. Compare *literals*
+with :ref:`inline constants<amdgpu_synid_constant>`.

 If a number may be encoded as either
 a :ref:`literal<amdgpu_synid_literal>` or
 an :ref:`inline constant<amdgpu_synid_constant>`,
 assembler selects the latter encoding as more efficient.

-Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
+Literals may be specified as
+:ref:`integer numbers<amdgpu_synid_integer_number>`,
 :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
 :ref:`absolute expressions<amdgpu_synid_absolute_expression>` or
 :ref:`relocatable expressions<amdgpu_synid_relocatable_expression>`.

-An instruction may use only one literal but several operands may refer the same literal.
+An instruction may use only one literal,
+but several operands may refer to the same literal.

 .. _amdgpu_synid_uimm8:

 uimm8
 -----

-A 8-bit :ref:`integer number<amdgpu_synid_integer_number>`
+An 8-bit :ref:`integer number<amdgpu_synid_integer_number>`
 or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
 The value must be in the range 0..0xFF.

@@ -756,7 +745,8 @@ Integer numbers are 64 bits wide.
 They are converted to :ref:`expected operand type<amdgpu_syn_instruction_type>`
 as described :ref:`here<amdgpu_synid_int_conv>`.

-Integer numbers may be specified in binary, octal, hexadecimal and decimal formats:
+Integer numbers may be specified in binary, octal,
+hexadecimal and decimal formats:

    ============ =============================== ========
    Format       Syntax                          Example
@@ -829,22 +819,23 @@ Relocatable Expressions

 The value of a relocatable expression depends on program relocation.

-Note that use of relocatable expressions is limited with branch targets
+Note that use of relocatable expressions is limited to branch targets
 and 32-bit integer operands.

-A relocatable expression is evaluated to a 64-bit integer value
-which depends on operand kind and :ref:`relocation type<amdgpu-relocation-records>`
-of symbol(s) used in the expression. For example, if an instruction refers a label,
-this reference is evaluated to an offset from the address after the instruction
-to the label address:
+A relocatable expression is evaluated to a 64-bit integer value,
+which depends on the operand kind and the
+:ref:`relocation type<amdgpu-relocation-records>` of the symbol(s)
+used in the expression. For example, if an instruction refers to a label,
+this reference is evaluated to an offset from the address after
+the instruction to the label address:

 .. parsed-literal::

    label:
    v_add_co_u32_e32 v0, vcc, label, v1  // 'label' operand is evaluated to -4

-Note that values of relocatable expressions are usually unknown at assembly time;
-they are resolved later by a linker and converted to
+Note that values of relocatable expressions are usually unknown
+at assembly time; they are resolved later by a linker and converted to
 :ref:`expected operand type<amdgpu_syn_instruction_type>`
 as described :ref:`here<amdgpu_synid_rl_conv>`.

@ -855,9 +846,11 @@ Expressions are composed of 64-bit integer operands and operations.
 Operands include :ref:`integer numbers<amdgpu_synid_integer_number>`
 and :ref:`symbols<amdgpu_synid_symbol>`.

-Expressions may also use "." which is a reference to the current PC (program counter).
+Expressions may also use ".", which is a reference
+to the current PC (program counter).

-:ref:`Unary<amdgpu_synid_expression_un_op>` and :ref:`binary<amdgpu_synid_expression_bin_op>`
+:ref:`Unary<amdgpu_synid_expression_un_op>` and
+:ref:`binary<amdgpu_synid_expression_bin_op>`
 operations produce 64-bit integer results.

 Syntax of Expressions
@ -988,20 +981,25 @@ is used for an operand which has a different type or size.
 Conversion of Integer Values
 ----------------------------

-Instruction operands may be specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>` or
-:ref:`absolute expressions<amdgpu_synid_absolute_expression>`. These values are converted to
-the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
+Instruction operands may be specified as 64-bit
+:ref:`integer numbers<amdgpu_synid_integer_number>` or
+:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
+These values are converted to the
+:ref:`expected operand type<amdgpu_syn_instruction_type>`
+using the following steps:

-1. *Validation*. Assembler checks if the input value may be truncated without loss to the required *truncation width*
-(see the table below). There are two cases when this operation is enabled:
+1. *Validation*. The assembler checks whether the input value may be truncated
+without loss to the required *truncation width* (see the table below).
+Truncation is allowed in two cases:

    * The truncated bits are all 0.
    * The truncated bits are all 1 and the value after truncation has its MSB set.

-In all other cases assembler triggers an error.
+In all other cases, the assembler triggers an error.

-2. *Conversion*. The input value is converted to the expected type as described in the table below.
-Depending on operand kind, this conversion is performed by either assembler or AMDGPU H/W (or both).
+2. *Conversion*. The input value is converted to the expected type
+as described in the table below. Depending on the operand kind, this conversion
+is performed by the assembler, by AMDGPU H/W, or by both.

    ============== ================= =============== ====================================================================
    Expected type  Truncation Width  Conversion      Description
@ -1055,21 +1053,26 @@ Examples of disabled conversions:
 Conversion of Floating-Point Values
 -----------------------------------

-Instruction operands may be specified as 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
-These values are converted to the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
+Instruction operands may be specified as 64-bit
+:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
+These values are converted to the
+:ref:`expected operand type<amdgpu_syn_instruction_type>`
+using the following steps:

 1. *Validation*. Assembler checks if the input f64 number can be converted
-to the *required floating-point type* (see the table below) without overflow or underflow.
-Precision lost is allowed. If this conversion is not possible, assembler triggers an error.
+to the *required floating-point type* (see the table below) without overflow
+or underflow. Precision loss is allowed. If this conversion is not possible,
+the assembler triggers an error.

-2. *Conversion*. The input value is converted to the expected type as described in the table below.
-Depending on operand kind, this is performed by either assembler or AMDGPU H/W (or both).
+2. *Conversion*. The input value is converted to the expected type
+as described in the table below. Depending on the operand kind, this is
+performed by the assembler, by AMDGPU H/W, or by both.

    ============== ================ ================= =================================================================
    Expected type  Required FP Type Conversion        Description
    ============== ================ ================= =================================================================
    i16, u16, b16  f16              f16(num)          Convert to f16 and use bits of the result as an integer value.
-                                                      The value has to be encoded as a literal or an error occurs.
+                                                      The value has to be encoded as a literal, or an error occurs.
                                                      Note that the value cannot be encoded as an inline constant.
    i32, u32, b32  f32              f32(num)          Convert to f32 and use bits of the result as an integer value.
    i64, u64, b64  \-               \-                Conversion disabled.
@ -1122,8 +1125,9 @@ When the value of a relocatable expression is resolved by a linker, it is
 converted as needed and truncated to the operand size. The conversion depends
 on :ref:`relocation type<amdgpu-relocation-records>` and operand kind.

-For example, when a 32-bit operand of an instruction refers a relocatable expression *expr*,
-this reference is evaluated to a 64-bit offset from the address after the
+For example, when a 32-bit operand of an instruction refers
+to a relocatable expression *expr*, this reference is evaluated
+to a 64-bit offset from the address after the
 instruction to the address being referenced, *counted in bytes*.
 Then the value is truncated to 32 bits and encoded as a literal:

@ -1133,7 +1137,7 @@ Then the value is truncated to 32 bits and encoded as a literal:
    v_add_co_u32_e32 v0, vcc, expr, v1  // 'expr' operand is evaluated to -4
                                        // and then truncated to 0xFFFFFFFC

-As another example, when a branch instruction refers a label,
+As another example, when a branch instruction refers to a label,
 this reference is evaluated to an offset from the address after the
 instruction to the label address, *counted in dwords*.
 Then the value is truncated to 16 bits: