# SPIR-V Assembly language syntax

## Overview

The assembly attempts to adhere to the binary form from Section 3 of the SPIR-V
spec as closely as possible, with one exception aiming at improving the text's
readability. The `<result-id>` generated by an instruction is moved to the
beginning of that instruction and followed by an `=` sign. This allows us to
distinguish between variable definitions and uses and locate value definitions
more easily.

Here is an example:

```
     OpCapability Shader
     OpMemoryModel Logical Simple
     OpEntryPoint GLCompute %3 "main"
     OpExecutionMode %3 LocalSize 64 64 1
%1 = OpTypeVoid
%2 = OpTypeFunction %1
%3 = OpFunction %1 None %2
%4 = OpLabel
     OpReturn
     OpFunctionEnd
```

A module is a sequence of instructions, separated by whitespace.
An instruction is an opcode name followed by operands, separated by
whitespace. Typically each instruction is presented on its own line,
but the assembler does not enforce this rule.

The opcode names and expected operands are described in Section 3 of
the SPIR-V specification. An operand is one of:

* a literal integer: A decimal integer, or a hexadecimal integer.
  A hexadecimal integer is indicated by a leading `0x` or `0X`. A hex
  integer supplied for a signed integer value will be sign-extended.
  For example, `0xffff` supplied as the literal for an `OpConstant`
  on a signed 16-bit integer type will be interpreted as the value `-1`.
* a literal floating point number, in decimal or hexadecimal form.
  See [below](#floats).
* a literal string.
  * A literal string is everything following a double-quote `"` until the
    following un-escaped double-quote. This includes special characters such
    as newlines.
  * A backslash `\` may be used to escape characters in the string. The `\`
    may be used to escape a double-quote or a `\` but is simply ignored when
    preceding any other character.
* a named enumerated value, specific to that operand position. For example,
  the `OpMemoryModel` takes a named Addressing Model operand (e.g. `Logical` or
  `Physical32`), and a named Memory Model operand (e.g. `Simple` or `OpenCL`).
  Named enumerated values are only meaningful in specific positions, and will
  otherwise generate an error.
* a mask expression, consisting of one or more mask enum names separated
  by `|`. For example, the expression `NotNaN|NotInf|NSZ` denotes the mask
  which is the combination of the `NotNaN`, `NotInf`, and `NSZ` flags.
* an injected immediate integer: `!<integer>`. See [below](#immediate).
* an ID, e.g. `%foo`. See [below](#id).
* the name of an extended instruction. For example, `sqrt` in an extended
  instruction such as `%f = OpExtInst %f32 %OpenCLImport sqrt %arg`.
* the name of an opcode for `OpSpecConstantOp`, but where the `Op` prefix
  is removed. For example, the following indicates the use of an integer
  addition in a specialization constant computation:
  `%sum = OpSpecConstantOp %i32 IAdd %a %b`

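
Two of the operand forms above involve a little arithmetic: sign extension of a hexadecimal literal, and combination of mask names. The following Python sketch illustrates both. It is not the assembler's implementation; the helper names `sign_extend` and `parse_mask` are hypothetical, and the three flag values are taken from the FP Fast Math Mode enumeration in the SPIR-V specification.

```python
# Illustrative sketch only; helper names are hypothetical, not SPIRV-Tools API.

# Bit values of three FP Fast Math Mode flags from the SPIR-V specification.
FP_FAST_MATH_MODE = {"NotNaN": 0x1, "NotInf": 0x2, "NSZ": 0x4}


def sign_extend(value: int, bit_width: int) -> int:
    """Interpret the low `bit_width` bits of `value` as a signed integer."""
    value &= (1 << bit_width) - 1
    sign_bit = 1 << (bit_width - 1)
    return value - (1 << bit_width) if value & sign_bit else value


def parse_mask(expression: str, flags: dict) -> int:
    """Combine `|`-separated flag names into one mask with bitwise OR."""
    mask = 0
    for name in expression.split("|"):
        mask |= flags[name]  # raises KeyError for an unknown name
    return mask


print(sign_extend(int("0xffff", 16), 16))                  # -1
print(parse_mask("NotNaN|NotInf|NSZ", FP_FAST_MATH_MODE))  # 7
```

The same `0xffff` literal supplied for a 32-bit signed type would stay positive, since bit 15 is no longer the sign bit at that width.
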
## ID Definitions & Usage
<a name="id"></a>

An ID _definition_ pertains to the `<result-id>` of an instruction, and ID
_usage_ is a use of an ID as an input to an instruction.

An ID in the assembly language begins with `%` and must be followed by a name
consisting of one or more letters, numbers or underscore characters.

For every ID in the assembly program, the assembler generates a unique number
called the ID's internal number. Then each ID reference translates into its
internal number in the SPIR-V output. Internal numbers are unique within the
compilation unit: no two IDs in the same unit will share internal numbers.

The disassembler generates IDs where the name is always a decimal number
greater than 0.

So the example can be rewritten using more user-friendly names, as follows:
```
          OpCapability Shader
          OpMemoryModel Logical Simple
          OpEntryPoint GLCompute %main "main"
          OpExecutionMode %main LocalSize 64 64 1
  %void = OpTypeVoid
%fnMain = OpTypeFunction %void
  %main = OpFunction %void None %fnMain
%lbMain = OpLabel
          OpReturn
          OpFunctionEnd
```

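
The mapping from ID names to internal numbers described in this section can be pictured as a small lookup table. The Python sketch below is an illustration only: it assumes numbers are handed out sequentially from 1 in order of first appearance, whereas the text above promises only uniqueness, and the class name `IdTable` is hypothetical.

```python
import re


class IdTable:
    """Hypothetical name-to-number table; sequential numbering is an assumption."""

    def __init__(self):
        self._numbers = {}

    def number_for(self, token: str) -> int:
        """Return the internal number for an ID token such as `%main`."""
        # An ID is `%` followed by one or more letters, numbers or underscores.
        if not re.fullmatch(r"%[A-Za-z0-9_]+", token):
            raise ValueError(f"not an ID: {token!r}")
        name = token[1:]
        if name not in self._numbers:
            self._numbers[name] = len(self._numbers) + 1
        return self._numbers[name]


ids = IdTable()
tokens = ["%main", "%main", "%void", "%fnMain", "%void"]
print([ids.number_for(t) for t in tokens])  # [1, 1, 2, 3, 2]
```

Every reference to the same name yields the same number, and distinct names never share one, which is the property the assembler guarantees.
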
## Floating point literals
<a name="floats"></a>

The assembler and disassembler support floating point literals in both
decimal and hexadecimal form.

The syntax for a floating point literal is the same as floating point
constants in the C programming language, except:
* An optional leading minus (`-`) is part of the literal.
* An optional type specifier suffix is not allowed.

Infinity and NaN values are expressed in hexadecimal float literals
by using the maximum representable exponent for the bit width.

For example, in 32-bit floating point, 8 bits are used for the exponent, and the
exponent bias is 127. So the maximum representable unbiased exponent is 128.
Therefore, we represent the infinities and some NaNs as follows:

```
%float32 = OpTypeFloat 32
%inf     = OpConstant %float32 0x1p+128
%neginf  = OpConstant %float32 -0x1p+128
%aNaN    = OpConstant %float32 0x1.8p+128
%moreNaN = OpConstant %float32 -0x1.0002p+128
```

The assembler preserves all the bits of a NaN value. For example, the encoding
of `%aNaN` in the previous example is the same as the word with bits
`0x7fc00000`, and `%moreNaN` is encoded as `0xff800100`.

The disassembler prints infinite, NaN, and subnormal values in hexadecimal form.
Zero and normal values are printed in decimal form with enough digits
to preserve all significand bits.

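
The bit patterns quoted in this section for `%aNaN` and `%moreNaN` can be checked with a short calculation. The following Python sketch is an illustration, not the assembler's code. It encodes a hexadecimal float literal into a 32-bit word using the convention described above, where an unbiased exponent of 128 selects the all-ones exponent field. The function name `f32_word` is hypothetical, and the sketch handles only normalized literals whose fraction fits in 23 bits.

```python
import math


def f32_word(literal: str) -> int:
    """Encode a hex float literal such as `0x1.8p+128` as a 32-bit word."""
    value = float.fromhex(literal)           # parsed as a Python double
    sign = 1 if math.copysign(1.0, value) < 0 else 0
    mantissa, exp = math.frexp(abs(value))   # abs(value) == mantissa * 2**exp
    unbiased = exp - 1                       # exponent of the 1.f * 2**e form
    if mantissa == 0.0 or not -126 <= unbiased <= 128:
        raise ValueError("outside the range this sketch handles")
    scaled = (mantissa * 2 - 1) * (1 << 23)  # fraction f scaled to 23 bits
    if scaled != int(scaled):
        raise ValueError("fraction needs more than 23 bits")
    return (sign << 31) | ((unbiased + 127) << 23) | int(scaled)


for text in ("0x1p+128", "-0x1p+128", "0x1.8p+128", "-0x1.0002p+128"):
    print(text, hex(f32_word(text)))
```

The two NaN literals yield `0x7fc00000` and `0xff800100`, matching the encodings stated above, and the two infinities yield `0x7f800000` and `0xff800000`.
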
## Arbitrary Integers
<a name="immediate"></a>

When writing tests it can be useful to emit an invalid 32-bit word into the
binary stream at arbitrary positions within the assembly. To insert an
arbitrary word into the stream, use the prefix `!`; the token takes the form
`!<integer>`. Here is an example:

```
OpCapability !0x0000FF00
```

Any token in a valid assembly program may be replaced by `!<integer>` -- even
tokens that dictate how the rest of the instruction is parsed. Consider, for
example, the following assembly program:

```
%4 = OpConstant %1 123 456 789 OpExecutionMode %2 LocalSize 11 22 33
OpExecutionMode %3 InputLines
```

The tokens `OpConstant`, `LocalSize`, and `InputLines` may be replaced by random
`!<integer>` values, and the assembler will still assemble an output binary with
three instructions. It will not necessarily be valid SPIR-V, but it will
faithfully reflect the input text.

You may wonder how the assembler recognizes the instruction structure (including
instruction boundaries) in the text with certain crucial tokens replaced by
arbitrary integers. If, say, `OpConstant` becomes a `!<integer>` whose value
differs from the binary representation of `OpConstant` (remember that this
feature is intended for fine-grain control in SPIR-V testing), the assembler
generally has no idea what that value stands for. So how does it know there is
exactly one `<id>` and three number literals following in that instruction,
before the next one begins? And if `LocalSize` is replaced by an arbitrary
`!<integer>`, how does it know to take the next three tokens (instead of zero or
one, both of which are possible in the absence of the certainty that `LocalSize`
provided)? The answer is a simple rule governing the parsing of instructions
with `!<integer>` in them:

When a token in the assembly program is a `!<integer>`, that integer value is
emitted into the binary output, and parsing proceeds differently than before:
each subsequent token not recognized as an OpCode or a `<result-id>` is emitted
into the binary output without any checking; when a recognizable OpCode or a
`<result-id>` is eventually encountered, it begins a new instruction and parsing
returns to normal. (If a subsequent OpCode is never found, then this alternate
parsing mode handles all the remaining tokens in the program.)

The assembler processes the tokens encountered in alternate parsing mode as
follows:

* If the token is a number literal, since context may be lost, the number
  is interpreted as a 32-bit value and output as a single word. In order to
  specify multiple-word literals in alternate-parsing mode, further uses of
  `!<integer>` tokens may be required.
  All formats supported by `strtoul()` are accepted.
* If the token is a string literal, it outputs a sequence of words representing
  the string as defined in the SPIR-V specification for Literal String.
* If the token is an ID, it outputs the ID's internal number.
* If the token is another `!<integer>`, it outputs that integer.
* Any other token causes the assembler to quit with an error.

Note that this has some interesting consequences, including:

* When an OpCode is replaced by `!<integer>`, the integer value should encode
  the instruction's word count, as specified in the physical-layout section of
  the SPIR-V specification.

* Consecutive instructions may have their OpCode replaced by `!<integer>` and
  still produce valid SPIR-V. For example,
  `!262187 %1 %2 "abc" !327739 %1 %3 6 %2` will successfully assemble into
  SPIR-V declaring a constant and a PrivateGlobal variable.

* Enums (such as `DontInline` or `SubgroupMemory`, for instance) are not handled
  by the alternate parsing mode. They must be replaced by `!<integer>` for
  successful assembly.

* The `<result-id>` on the left-hand side of an assignment cannot be a
  `!<integer>`. The `<result-id>` can still be manually controlled if desired
  by expressing the entire instruction as `!<integer>` tokens for its opcode and
  operands.

* The `=` sign cannot be processed by the alternate parsing mode if the OpCode
  following it is a `!<integer>`.

* When replacing a named ID with `!<integer>`, it is possible to generate
  unintentionally valid SPIR-V. If the integer provided happens to equal a
  number generated for an existing named ID, it will result in a reference to
  that named ID being output. This may be valid SPIR-V, contrary to the
  presumed intention of the writer.

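
The `!262187 %1 %2 "abc" !327739 %1 %3 6 %2` example can be checked by hand. In the SPIR-V physical layout, the first word of an instruction holds the word count in its high 16 bits and the opcode in its low 16 bits. A literal string is stored as nul-terminated UTF-8, packed four octets per word with the first octet in the lowest-order bits. The Python sketch below reproduces the nine words such an assembly would yield. It assumes, for illustration, that `%1`, `%2`, and `%3` map to internal numbers 1, 2, and 3; the function names are hypothetical. The opcode numbers 43 (`OpConstant`) and 59 (`OpVariable`) come from the SPIR-V specification.

```python
OP_CONSTANT = 43  # opcode number of OpConstant
OP_VARIABLE = 59  # opcode number of OpVariable


def first_word(word_count: int, opcode: int) -> int:
    """Pack the word count (high 16 bits) and the opcode (low 16 bits)."""
    return (word_count << 16) | opcode


def string_words(text: str) -> list:
    """Encode a literal string as nul-terminated, zero-padded 32-bit words."""
    data = text.encode("utf-8") + b"\x00"
    data += b"\x00" * (-len(data) % 4)  # pad to a whole number of words
    return [int.from_bytes(data[i:i + 4], "little")
            for i in range(0, len(data), 4)]


# Assumed internal numbers: %1 -> 1, %2 -> 2, %3 -> 3.
words = [
    first_word(4, OP_CONSTANT), 1, 2, *string_words("abc"),
    first_word(5, OP_VARIABLE), 1, 3, 6, 2,
]
print(words[0], words[4])  # 262187 327739
```

The first instruction spans four words and the second spans five, so the word counts embedded in the two `!<integer>` tokens match the operands that follow them. The operand `6` in the second instruction is the storage class.
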
## Notes

* Some enumerants cannot be used by name, because the target instruction
  in which they are meaningful takes an ID reference instead of a literal value.
  For example:
  * Named enumerated value `CmdExecTime` from section 3.30 Kernel
    Profiling Info is used in constructing a mask value supplied as
    an ID for `OpCaptureEventProfilingInfo`. But no other instruction
    has enough context to bring the enumerant names from section 3.30
    into scope.
  * Similarly, the names in section 3.29 Kernel Enqueue Flags are used to
    construct a value supplied as an ID to the Flags argument of
    `OpEnqueueKernel`.
  * Similarly for the names in section 3.25 Memory Semantics.
  * Similarly for the names in section 3.27 Scope.
* Some enumerants cannot be used by name, because they only name values
  returned by an instruction:
  * Enumerants from 3.12 Image Channel Order name possible values returned
    by the `OpImageQueryOrder` instruction.
  * Enumerants from 3.13 Image Channel Data Type name possible values
    returned by the `OpImageQueryFormat` instruction.