=====================
Clang Offload Bundler
=====================

.. contents::
   :local:

.. _clang-offload-bundler:

Introduction
============

For heterogeneous single source programming languages, use one or more
``--offload-arch=<target-id>`` Clang options to specify the target IDs of the
code to generate for the offload code regions.

The tool chain may perform multiple compilations of a translation unit to
produce separate code objects for the host and potentially multiple offloaded
devices. The ``clang-offload-bundler`` tool may be used as part of the tool
chain to combine these multiple code objects into a single bundled code object.

The tool chain may use a bundled code object as an intermediate step so that
each tool chain step consumes and produces a single file as in traditional
non-heterogeneous tool chains. The bundled code object contains the code objects
for the host and all the offload devices.

A bundled code object may also be used to bundle just the offloaded code
objects and be embedded as data into the host code object. The host compilation
includes an ``init`` function that will use the runtime corresponding to the
offload kind (see :ref:`clang-offload-kind-table`) to load the offload code
objects appropriate to the devices present when the host program is executed.

:program:`clang-offload-bundler` is located in
``clang/tools/clang-offload-bundler``.

.. code-block:: console

  $ clang-offload-bundler -help
  OVERVIEW: A tool to bundle several input files of the specified type <type>
  referring to the same source file but different targets into a single
  one. The resulting file can also be unbundled into different files by
  this tool if -unbundle is provided.

  USAGE: clang-offload-bundler [options]

  OPTIONS:

  Generic Options:

    --help                  - Display available options (--help-hidden for more)
    --help-list             - Display list of available options (--help-list-hidden for more)
    --version               - Display the version of this program

  clang-offload-bundler options:

    --###                   - Print any external commands that are to be executed instead of actually executing them - for testing purposes.
    --allow-missing-bundles - Create empty files if bundles are missing when unbundling.
    --bundle-align=<uint>   - Alignment of bundle for binary files
    --check-input-archive   - Check if input heterogeneous archive is valid in terms of TargetID rules.
    --inputs=<string>       - [<input file>,...]
    --list                  - List bundle IDs in the bundled file.
    --outputs=<string>      - [<output file>,...]
    --targets=<string>      - [<offload kind>-<target triple>,...]
    --type=<string>         - Type of the files to be bundled/unbundled.
                              Current supported types are:
                                i   - cpp-output
                                ii  - c++-cpp-output
                                cui - cuda/hip-output
                                d   - dependency
                                ll  - llvm
                                bc  - llvm-bc
                                s   - assembler
                                o   - object
                                a   - archive of bundled files
                                gch - precompiled-header
                                ast - clang AST file
    --unbundle              - Unbundle bundled file into several output files.

Usage
=====

This tool can be used as follows for bundling:

::

  clang-offload-bundler -targets=triple1,triple2 -type=ii -inputs=a.triple1.ii,a.triple2.ii -outputs=a.ii

or, it can be used as follows for unbundling:

::

  clang-offload-bundler -targets=triple1,triple2 -type=ii -outputs=a.triple1.ii,a.triple2.ii -inputs=a.ii -unbundle

Here ``triple1`` and ``triple2`` are placeholders for bundle entry IDs of the
form ``<offload kind>-<target triple>`` (see :ref:`clang-bundle-entry-id`).

Supported File Formats
======================

Multiple text and binary file formats are supported for bundling/unbundling. See
:ref:`supported-file-formats-table` for a list of currently supported input
formats. Use the ``File Type`` column to determine the value to pass to the
``--type`` option based on the type of input files while bundling/unbundling.

.. table:: Supported File Formats
   :name: supported-file-formats-table

   +--------------------------+----------------+-------------+
   | File Format              | File Type      | Text/Binary |
   +==========================+================+=============+
   | CPP output               | i              | Text        |
   +--------------------------+----------------+-------------+
   | C++ CPP output           | ii             | Text        |
   +--------------------------+----------------+-------------+
   | CUDA/HIP output          | cui            | Text        |
   +--------------------------+----------------+-------------+
   | Dependency               | d              | Text        |
   +--------------------------+----------------+-------------+
   | LLVM                     | ll             | Text        |
   +--------------------------+----------------+-------------+
   | LLVM Bitcode             | bc             | Binary      |
   +--------------------------+----------------+-------------+
   | Assembler                | s              | Text        |
   +--------------------------+----------------+-------------+
   | Object                   | o              | Binary      |
   +--------------------------+----------------+-------------+
   | Archive of bundled files | a              | Binary      |
   +--------------------------+----------------+-------------+
   | Precompiled header       | gch            | Binary      |
   +--------------------------+----------------+-------------+
   | Clang AST file           | ast            | Binary      |
   +--------------------------+----------------+-------------+

.. _clang-bundled-code-object-layout-text:

Bundled Text File Layout
========================

The text file formats are concatenated with comments that have a magic string
and bundle entry ID in between. The BNF syntax to represent a code object
bundle file is:

::

  <file> ::== <bundle> | <bundle> <file>
  <bundle> ::== <comment> <start> <bundle_id> <eol> <code_object> <eol>
                <comment> <end> <bundle_id> <eol>
  <start> ::== OFFLOAD_BUNDLER_MAGIC_STR__START__
  <end> ::== OFFLOAD_BUNDLER_MAGIC_STR__END__

**comment**
  The symbol that starts a single-line comment in the file type of the
  constituent code objects, e.g. ``;`` for the ``ll`` file type and ``#`` for
  the ``s`` file type.

**bundle_id**
  The :ref:`clang-bundle-entry-id` for the enclosing bundle.

**eol**
  The end of line character.

**code_object**
  The code object stored in one of the supported text file formats.

**OFFLOAD_BUNDLER_MAGIC_STR**
  Magic string that marks the existence of offloading data, i.e.
  ``__CLANG_OFFLOAD_BUNDLE__``.
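
As an illustration only, the Python sketch below writes and reads the text
layout described above. The helper names ``write_text_bundle`` and
``read_text_bundle`` are hypothetical, and the exact marker spacing (one space
after the comment symbol and one before the bundle entry ID) is an assumption,
since the BNF does not pin down whitespace.

.. code-block:: python

   MAGIC = "__CLANG_OFFLOAD_BUNDLE__"  # OFFLOAD_BUNDLER_MAGIC_STR


   def write_text_bundle(comment: str, bundles: dict[str, str]) -> str:
       """Wrap each text code object in start/end marker comments.

       Assumption: markers look like '<comment> <magic>__START__ <bundle_id>'.
       Trailing newlines of each code object are not preserved.
       """
       lines = []
       for bundle_id, code_object in bundles.items():
           lines.append(f"{comment} {MAGIC}__START__ {bundle_id}")
           lines.append(code_object.rstrip("\n"))
           lines.append(f"{comment} {MAGIC}__END__ {bundle_id}")
       return "\n".join(lines) + "\n"


   def read_text_bundle(comment: str, data: str) -> dict[str, str]:
       """Split a text bundle back into {bundle_id: code_object}."""
       start = f"{comment} {MAGIC}__START__ "
       end = f"{comment} {MAGIC}__END__ "
       result = {}
       current, body = None, []
       for line in data.splitlines():
           if line.startswith(start):
               if current is not None:
                   raise ValueError(f"nested start marker: {line!r}")
               current, body = line[len(start):], []
           elif line.startswith(end):
               # Each end marker must close the bundle opened by the last start.
               if current is None or line[len(end):] != current:
                   raise ValueError(f"unmatched end marker: {line!r}")
               result[current] = "\n".join(body)
               current = None
           elif current is not None:
               body.append(line)
       if current is not None:
           raise ValueError(f"missing end marker for {current!r}")
       return result

Round-tripping a dictionary through ``write_text_bundle`` and
``read_text_bundle`` with ``";"`` as the comment symbol yields the same
dictionary, as long as no code object ends in a newline.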

.. _clang-bundled-code-object-layout:

Bundled Binary File Layout
==========================

The layout of a bundled code object is defined by the following table:

.. table:: Bundled Code Object Layout
   :name: bundled-code-object-layout-table

   =================================== ======= ================ ===============================
   Field                               Type    Size in Bytes    Description
   =================================== ======= ================ ===============================
   Magic String                        string  24               ``__CLANG_OFFLOAD_BUNDLE__``
   Number Of Bundle Entries            integer 8                Number of bundle entries.
   1st Bundle Entry Code Object Offset integer 8                Byte offset from beginning of
                                                                bundled code object to 1st code
                                                                object.
   1st Bundle Entry Code Object Size   integer 8                Byte size of 1st code object.
   1st Bundle Entry ID Length          integer 8                Character length of bundle
                                                                entry ID of 1st code object.
   1st Bundle Entry ID                 string  1st Bundle Entry Bundle entry ID of 1st code
                                               ID Length        object. This is not NUL
                                                                terminated. See
                                                                :ref:`clang-bundle-entry-id`.
   \...
   Nth Bundle Entry Code Object Offset integer 8
   Nth Bundle Entry Code Object Size   integer 8
   Nth Bundle Entry ID Length          integer 8
   Nth Bundle Entry ID                 string  Nth Bundle Entry
                                               ID Length
   1st Bundle Entry Code Object        bytes   1st Bundle Entry
                                               Code Object Size
   \...
   Nth Bundle Entry Code Object        bytes   Nth Bundle Entry
                                               Code Object Size
   =================================== ======= ================ ===============================
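
The following Python sketch serializes and parses the binary layout in the
table's field order. It assumes little-endian 64-bit integers and places the
code objects back to back with no alignment padding; since the tool offers
``--bundle-align``, real files may contain padding, which this reader tolerates
because it seeks to each entry's recorded offset. ``pack_bundle`` and
``unpack_bundle`` are hypothetical names.

.. code-block:: python

   import struct

   MAGIC = b"__CLANG_OFFLOAD_BUNDLE__"  # 24 bytes, no NUL terminator


   def pack_bundle(entries: list[tuple[str, bytes]]) -> bytes:
       """Serialize (bundle_entry_id, code_object) pairs in table order."""
       ids = [bundle_id.encode("ascii") for bundle_id, _ in entries]
       # Magic + entry count + per entry (offset, size, ID length, ID).
       header_size = len(MAGIC) + 8 + sum(24 + len(i) for i in ids)
       header = bytearray(MAGIC + struct.pack("<Q", len(entries)))
       offset = header_size
       for raw_id, (_, code) in zip(ids, entries):
           header += struct.pack("<QQQ", offset, len(code), len(raw_id)) + raw_id
           offset += len(code)  # assumption: no alignment padding
       return bytes(header) + b"".join(code for _, code in entries)


   def unpack_bundle(data: bytes) -> dict[str, bytes]:
       """Parse a binary bundle into {bundle_entry_id: code_object}."""
       if data[:len(MAGIC)] != MAGIC:
           raise ValueError("not a clang offload bundle")
       pos = len(MAGIC)
       (count,) = struct.unpack_from("<Q", data, pos)
       pos += 8
       result = {}
       for _ in range(count):
           offset, size, id_len = struct.unpack_from("<QQQ", data, pos)
           pos += 24
           bundle_id = data[pos:pos + id_len].decode("ascii")
           pos += id_len
           result[bundle_id] = data[offset:offset + size]
       return result

Reading the code objects through their stored offsets, rather than assuming
they follow the header contiguously, is what keeps the reader independent of
any alignment the writer chose.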

.. _clang-bundle-entry-id:

Bundle Entry ID
===============

Each entry in a bundled code object (see :ref:`clang-bundled-code-object-layout-text`
and :ref:`clang-bundled-code-object-layout`) has a bundle entry ID that indicates
the kind of the entry's code object and the runtime that manages it.

Bundle entry ID syntax is defined by the following BNF syntax:

.. code::

  <bundle-entry-id> ::== <offload-kind> "-" <target-triple> [ "-" <target-id> ]

Where:

**offload-kind**
  The runtime responsible for managing the bundled entry code object. See
  :ref:`clang-offload-kind-table`.

  .. table:: Bundled Code Object Offload Kind
     :name: clang-offload-kind-table

     ============= ==============================================================
     Offload Kind  Description
     ============= ==============================================================
     host          Host code object. ``clang-offload-bundler`` always includes
                   this entry as the first bundled code object entry. For an
                   embedded bundled code object this entry is not used by the
                   runtime and so is generally an empty code object.

     hip           Offload code object for the HIP language. Used for all
                   HIP language offload code objects when the
                   ``clang-offload-bundler`` is used to bundle code objects as
                   intermediate steps of the tool chain. Also used for AMD GPU
                   code objects before ABI version V4 when the
                   ``clang-offload-bundler`` is used to create a *fat binary*
                   to be loaded by the HIP runtime. The fat binary can be
                   loaded directly from a file, or be embedded in the host code
                   object as a data section with the name ``.hip_fatbin``.

     hipv4         Offload code object for the HIP language. Used for AMD GPU
                   code objects with at least ABI version V4 when the
                   ``clang-offload-bundler`` is used to create a *fat binary*
                   to be loaded by the HIP runtime. The fat binary can be
                   loaded directly from a file, or be embedded in the host code
                   object as a data section with the name ``.hip_fatbin``.

     openmp        Offload code object for the OpenMP language extension.
     ============= ==============================================================

**target-triple**
  The target triple of the code object. See `Target Triple
  <https://clang.llvm.org/docs/CrossCompilation.html#target-triple>`_.

  The bundler accepts target triples with or without the optional environment
  field:

  ``<arch><sub>-<vendor>-<sys>``, or
  ``<arch><sub>-<vendor>-<sys>-<env>``

  However, in order to standardize outputs for tools that consume bitcode
  bundles, bundles written by the bundler internally use only the 4-field
  target triple:

  ``<arch><sub>-<vendor>-<sys>-<env>``

**target-id**
  The canonical target ID of the code object. Present only if the target
  supports a target ID. See :ref:`clang-target-id`.
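
A minimal Python sketch of splitting a bundle entry ID, assuming the triple
has already been normalized to the 4-field form, so that a triple without an
environment contributes an empty field. The offload kinds come from the table
above; ``parse_bundle_entry_id``, ``normalize_triple`` and ``BundleEntryID``
are hypothetical helpers, not clang APIs, and padding with an empty
environment is an assumed normalization rule.

.. code-block:: python

   from typing import NamedTuple, Optional

   # Offload kinds listed in the Bundled Code Object Offload Kind table.
   OFFLOAD_KINDS = {"host", "hip", "hipv4", "openmp"}


   class BundleEntryID(NamedTuple):
       offload_kind: str
       triple: str  # <arch><sub>-<vendor>-<sys>-<env>, env may be empty
       target_id: Optional[str]


   def normalize_triple(triple: str) -> str:
       """Pad a 3-field triple with an empty environment (assumed rule)."""
       fields = triple.split("-")
       if len(fields) == 3:
           fields.append("")
       if len(fields) != 4:
           raise ValueError(f"expected 3 or 4 triple fields: {triple!r}")
       return "-".join(fields)


   def parse_bundle_entry_id(text: str) -> BundleEntryID:
       """Parse '<offload-kind>-<4-field triple>[-<target-id>]'.

       maxsplit=5 keeps any '-' inside the target ID (a feature set to Off).
       """
       parts = text.split("-", 5)
       if len(parts) < 5 or parts[0] not in OFFLOAD_KINDS:
           raise ValueError(f"malformed bundle entry ID: {text!r}")
       kind, *triple_fields = parts[:5]
       target_id = parts[5] if len(parts) == 6 else None
       return BundleEntryID(kind, "-".join(triple_fields), target_id)

Splitting with a fixed ``maxsplit`` works only because the triple has a known
number of fields; with a 3-field triple, the first component of the target ID
would be indistinguishable from an environment name.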

.. _code-object-composition:

Bundled Code Object Composition
-------------------------------

* Each entry of a bundled code object must have a different bundle entry ID.
* There can be multiple entries for the same processor provided they differ
  in target feature settings.
* If there is an entry with a target feature specified as *Any*, then all
  entries must specify that target feature as *Any* for the same processor.

There may be additional target specific restrictions.
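
The composition rules can be checked mechanically. The Python sketch below is
one possible reading of them, not the bundler's implementation: it assumes
normalized 4-field triples (so the target ID follows the fifth ``-``), groups
entries by processor name only, and does not model target specific
restrictions. ``check_composition`` and ``target_id_of`` are hypothetical
helpers.

.. code-block:: python

   from collections import defaultdict
   from typing import Optional


   def target_id_of(bundle_id: str) -> Optional[str]:
       # Assumes a normalized 4-field triple: the target ID follows the fifth '-'.
       parts = bundle_id.split("-", 5)
       return parts[5] if len(parts) == 6 else None


   def check_composition(bundle_ids: list[str]) -> list[str]:
       """Return the composition rule violations found (empty if none)."""
       errors = []
       if len(set(bundle_ids)) != len(bundle_ids):
           errors.append("duplicate bundle entry ID")
       # Per processor, collect the feature names each entry specifies.
       specified = defaultdict(list)
       for bundle_id in bundle_ids:
           target_id = target_id_of(bundle_id)
           if target_id is None:
               continue  # e.g. the host entry carries no target ID
           processor, *features = target_id.split(":")
           specified[processor].append({f[:-1] for f in features})
       for processor, feature_sets in specified.items():
           for feature in sorted(set().union(*feature_sets)):
               # A feature left as Any in one entry must be Any in all of them.
               if any(feature not in names for names in feature_sets):
                   errors.append(f"{processor}: {feature} is Any in some entries only")
       return errors

For example, a bundle holding ``gfx906:xnack+`` and ``gfx906:xnack-`` entries
passes, while one mixing ``gfx906`` with ``gfx906:xnack+`` is reported.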

.. _compatibility-bundle-entry-id:

Compatibility Rules for Bundle Entry ID
---------------------------------------

A code object, specified using its Bundle Entry ID, can be loaded and
executed on a target processor if:

* Their offload kinds are the same.
* Their target triples are compatible.
* Their Target IDs are compatible as defined in :ref:`compatibility-target-id`.

.. _clang-target-id:

Target ID
=========

A target ID is used to indicate the processor and optionally its configuration,
expressed by a set of target features, that affect ISA generation. Whether a
target supports target IDs, or whether the target triple alone is sufficient to
specify the ISA generation, is target specific.

It is used with the ``-mcpu=<target-id>`` and ``--offload-arch=<target-id>``
Clang compilation options to specify the kind of code to generate.

It is also used as part of the bundle entry ID to identify the code object. See
:ref:`clang-bundle-entry-id`.

Target ID syntax is defined by the following BNF syntax:

.. code::

  <target-id> ::== <processor> ( ":" <target-feature> ( "+" | "-" ) )*

Where:

**processor**
  The target specific processor name or any alternative name for that
  processor.

**target-feature**
  A target feature name that is supported by the processor. Each target
  feature must appear at most once in a target ID and can have one of three
  values:

  *Any*
    Specified by omitting the target feature from the target ID.
    A code object compiled with a target ID that leaves a target feature as
    *Any* can be loaded and executed on a processor configured with the
    target feature on or off.

  *On*
    Specified by ``+``, indicating the target feature is enabled. A code
    object compiled with a target ID specifying a target feature on
    can only be loaded on a processor configured with the target feature on.

  *Off*
    Specified by ``-``, indicating the target feature is disabled. A code
    object compiled with a target ID specifying a target feature off
    can only be loaded on a processor configured with the target feature off.
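
The syntax and the three feature values map naturally onto a processor name
plus a dictionary in which an absent key means *Any*. The following Python
sketch, built around a hypothetical ``parse_target_id`` helper, parses a target
ID and enforces the at-most-once rule; it checks only the format, not whether
the processor or the features exist for a real target.

.. code-block:: python

   def parse_target_id(target_id: str) -> tuple[str, dict[str, bool]]:
       """Split '<processor>(:<feature>(+|-))*' into (processor, {feature: on}).

       A feature absent from the returned dict has the value Any.
       """
       processor, *items = target_id.split(":")
       if not processor:
           raise ValueError(f"missing processor in {target_id!r}")
       features = {}
       for item in items:
           name, sign = item[:-1], item[-1:]
           if not name or sign not in ("+", "-"):
               raise ValueError(f"feature must be <name>+ or <name>-: {item!r}")
           if name in features:
               raise ValueError(f"feature {name!r} appears more than once")
           features[name] = sign == "+"  # True for On, False for Off
       return processor, features

For instance, ``gfx906:sramecc-:xnack+`` parses to the processor ``gfx906``
with ``sramecc`` off and ``xnack`` on, and every other feature left as *Any*.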

.. _compatibility-target-id:

Compatibility Rules for Target ID
---------------------------------

A code object compiled for a Target ID is considered compatible with a
target if:

* Their processors are the same.
* Their feature sets are compatible as defined above.
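
Combining the feature values with these two rules gives a small compatibility
check. In this Python sketch, which is illustrative rather than clang's
implementation, the processor's configuration is itself written as a target ID
with its features spelled out; a feature the code object fixes but the
processor description omits is treated as incompatible, which is a
conservative assumption. ``is_compatible`` and ``_split`` are hypothetical
names.

.. code-block:: python

   def _split(target_id: str) -> tuple[str, dict[str, bool]]:
       """'gfx906:sramecc-:xnack+' -> ('gfx906', {'sramecc': False, 'xnack': True})."""
       processor, *items = target_id.split(":")
       return processor, {item[:-1]: item.endswith("+") for item in items}


   def is_compatible(code_object_tid: str, processor_tid: str) -> bool:
       """True if a code object built for code_object_tid may run on processor_tid."""
       code_proc, code_features = _split(code_object_tid)
       proc, proc_features = _split(processor_tid)
       if code_proc != proc:
           return False  # processors must be the same
       # Any (omitted) features match either setting; On/Off must match exactly.
       return all(proc_features.get(name) == on for name, on in code_features.items())

Under this reading, a code object built for plain ``gfx906`` runs on any
``gfx906`` configuration, while one built for ``gfx906:xnack+`` is rejected by
a processor configured with ``xnack-``.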

There are two forms of target ID:

*Non-Canonical Form*
  The non-canonical form is used as the input to user commands to allow the user
  greater convenience. It allows both the primary and alternative processor names
  to be used, and the target features may be specified in any order.

*Canonical Form*
  The canonical form is used for all generated output to allow greater
  convenience for tools that consume the information. It is also used for
  internal passing of information between tools. Only the primary processor
  name (not an alternative name) is used, and the target features are
  specified in alphabetical order. Command line tools convert the
  non-canonical form to the canonical form.
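
Converting the non-canonical form to the canonical form amounts to replacing
an alternative processor name with the primary one and sorting the features by
name. The Python sketch below uses a hypothetical alias table: its single entry,
which maps ``fiji`` to ``gfx803``, is for illustration and should be checked
against the target's processor list. ``canonicalize_target_id`` is likewise a
hypothetical name.

.. code-block:: python

   # Hypothetical alias table: alternative name -> primary processor name.
   ALTERNATIVE_NAMES = {"fiji": "gfx803"}


   def canonicalize_target_id(target_id: str) -> str:
       """Return the canonical form: primary name, features sorted by name."""
       processor, *features = target_id.split(":")
       processor = ALTERNATIVE_NAMES.get(processor, processor)
       # Each feature is '<name>+' or '<name>-'; sort on the name only.
       return ":".join([processor, *sorted(features, key=lambda f: f[:-1])])

With this table, ``gfx906:xnack+:sramecc-`` becomes ``gfx906:sramecc-:xnack+``
and ``fiji`` becomes ``gfx803``.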

Target Specific Information
===========================

Target specific information is available for the following:

*AMD GPU*
  AMD GPU supports target ID and target features. See `User Guide for AMDGPU Backend
  <https://llvm.org/docs/AMDGPUUsage.html>`_ which defines the `processors
  <https://llvm.org/docs/AMDGPUUsage.html#amdgpu-processors>`_ and `target
  features <https://llvm.org/docs/AMDGPUUsage.html#amdgpu-target-features>`_
  supported.

Most other targets do not support target IDs.

Archive Unbundling
==================

Unbundling of a heterogeneous device archive (HDA) is done to create device-specific
archives. An HDA is in a format compatible with the GNU ``ar`` utility and contains a
collection of bundled device binaries, where each bundle file contains
device binaries for a host and one or more targets. The output device-specific
archive is in a format compatible with the GNU ``ar`` utility and contains a
collection of device binaries for a specific target.

::

  Heterogeneous Device Archive, HDA = {F1.X, F2.X, ..., FN.Y}
  where, Fi = Bundle{Host-DeviceBinary, T1-DeviceBinary, T2-DeviceBinary, ...,
                     Tm-DeviceBinary},
         Ti = {Target i, qualified using Bundle Entry ID},
         X/Y = *.bc for AMDGPU and *.cubin for NVPTX

  Device Specific Archive, DSA(Tk) = {F1-Tk-DeviceBinary.X, F2-Tk-DeviceBinary.X, ...
                                      FN-Tk-DeviceBinary.Y}
  where, Fi-Tj-DeviceBinary.X represents device binary of i-th bundled device
  binary file for target Tj.

The ``clang-offload-bundler`` extracts compatible device binaries for a given target
from the bundled device binaries in a heterogeneous device archive and creates
a target-specific device archive without bundling.

The ``clang-offload-bundler`` determines whether a device binary is compatible
with a target by comparing bundle IDs. Two bundle IDs are considered
compatible if:

* Their offload kinds are the same
* Their target triples are the same
* Their Target IDs are the same

Creating a Heterogeneous Device Archive
---------------------------------------

1. Compile source file(s) to generate object file(s)

   ::

     clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa,nvptx64-nvidia-cuda,nvptx64-nvidia-cuda \
        -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906:sramecc-:xnack+ \
        -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906:sramecc+:xnack+ \
        -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_70 \
        -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_80 \
        -c func_1.c -o func_1.o

     clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,amdgcn-amd-amdhsa,nvptx64-nvidia-cuda,nvptx64-nvidia-cuda \
        -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906:sramecc-:xnack+ \
        -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906:sramecc+:xnack+ \
        -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_70 \
        -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_80 \
        -c func_2.c -o func_2.o

2. Create a heterogeneous device archive by combining all the object file(s)

   ::

     llvm-ar cr libFatArchive.a func_1.o func_2.o

Extracting a Device Specific Archive
------------------------------------

``UnbundleArchive`` takes a heterogeneous device archive file (".a") as input
containing bundled device binary files, and a list of offload targets (not
host), and extracts the device binaries into a new archive file for each
offload target. Each resulting archive file contains all device binaries
compatible with that particular offload target. Compatibility between a
device binary in the HDA and a target is based on the compatibility between their
bundle entry IDs as defined in :ref:`compatibility-bundle-entry-id`.

The following cases may arise during compatibility testing:

* A binary is compatible with one or more targets: Insert the binary into the
  device-specific archive of each compatible target.
* A binary is not compatible with any target: Skip the binary.
* One or more binaries are compatible with a target: Insert all binaries into
  the device-specific archive of the target. The insertion need not be ordered.
* No binary is compatible with a target: If the ``--allow-missing-bundles``
  option is present, create an empty archive for the target. Otherwise, produce
  an error without creating an archive.

The created archive file does not contain a symbol index. Device binary files
in it are named ``<Parent Bundle Name>-<DeviceBinary's TargetID>``, with each
``:`` replaced by ``_``.
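
The case analysis above can be sketched in Python. This is an illustration
under stated assumptions, not the bundler's code: archive members are modeled
as ``(parent_bundle_name, {bundle_entry_id: device_binary})`` pairs, the parent
bundle name is used verbatim in the output name, triples are taken to be in
4-field form, compatibility defaults to exact bundle entry ID equality, and
``unbundle_archive`` is a hypothetical name.

.. code-block:: python

   from typing import Callable


   def unbundle_archive(
       members: list[tuple[str, dict[str, bytes]]],
       targets: list[str],
       is_compatible: Callable[[str, str], bool] = lambda a, b: a == b,
       allow_missing_bundles: bool = False,
   ) -> dict[str, list[tuple[str, bytes]]]:
       """Distribute device binaries from an HDA into one archive per target."""
       archives = {t: [] for t in targets}
       for parent, entries in members:
           for bundle_id, binary in entries.items():
               if bundle_id.startswith("host-"):
                   continue  # host entries never go into device-specific archives
               # Assumes a 4-field triple, so the target ID follows the fifth '-'.
               parts = bundle_id.split("-", 5)
               target_id = parts[5] if len(parts) == 6 else bundle_id
               name = f"{parent}-{target_id.replace(':', '_')}"
               # Insert into every compatible target; binaries matching none are skipped.
               for target in targets:
                   if is_compatible(bundle_id, target):
                       archives[target].append((name, binary))
       for target, contents in archives.items():
           if not contents and not allow_missing_bundles:
               raise ValueError(f"no compatible device binary for {target}")
       return archives

Replacing the default equality test with a callback that applies
:ref:`compatibility-bundle-entry-id` relaxes matching from identical IDs to
compatible ones without changing the rest of the case handling.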

Usage
-----

::

  clang-offload-bundler --unbundle --inputs=libFatArchive.a -type=a \
   -targets=openmp-amdgcn-amd-amdhsa-gfx906:sramecc+:xnack+,openmp-amdgcn-amd-amdhsa-gfx908:sramecc-:xnack+ \
   -outputs=deviceLib-gfx906.a,deviceLib-gfx908.a

.. _additional-options-archive-unbundling:

Additional Options for Archive Unbundling
-----------------------------------------

**-allow-missing-bundles**
  Create an empty archive file if no compatible device binary is found.

**-check-input-archive**
  Check whether the input heterogeneous device archive follows the rules for
  composition as defined in :ref:`code-object-composition` before creating
  device-specific archive(s).

**-debug-only=CodeObjectCompatibility**
  Verbose printing of matched/unmatched comparisons between the bundle entry ID
  of a device binary from the HDA and the bundle entry ID of a given target
  processor (see :ref:`compatibility-bundle-entry-id`).

Compression and Decompression
=============================

``clang-offload-bundler`` provides features to compress and decompress the full
bundle, leveraging inherent redundancies within the bundle entries. Use the
``-compress`` command-line option to enable this compression capability.

The compressed offload bundle begins with a header followed by the compressed binary data:

- **Magic Number (4 bytes)**:
  This is a unique identifier to distinguish compressed offload bundles. The value is the string ``CCOB`` (Compressed Clang Offload Bundle).

- **Version Number (16-bit unsigned int)**:
  This denotes the version of the compressed offload bundle format. The current version is ``1``.

- **Compression Method (16-bit unsigned int)**:
  This field indicates the compression method used. The value corresponds to either ``zlib`` or ``zstd``, represented as a 16-bit unsigned integer cast from the LLVM compression enumeration.

- **Uncompressed Binary Size (32-bit unsigned int)**:
  This is the size (in bytes) of the binary data before it was compressed.

- **Hash (64-bit unsigned int)**:
  This is a 64-bit truncated MD5 hash of the uncompressed binary data. It serves for verification and caching purposes.

- **Compressed Data**:
  The actual compressed binary data follows the header. Its size can be inferred from the total size of the file minus the header size.
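
To make the header layout concrete, here is a Python sketch that builds and
checks it with the standard ``struct``, ``hashlib`` and ``zlib`` modules.
Several details are assumptions not stated above: fields are little-endian with
no padding (giving a 20-byte header), the zlib method value is ``0`` (taken to
be the position of ``zlib`` in LLVM's compression enumeration), and the 64-bit
truncated MD5 is the first 8 digest bytes read as a little-endian integer. Only
zlib is handled, and ``compress_bundle``/``decompress_bundle`` are hypothetical
names.

.. code-block:: python

   import hashlib
   import struct
   import zlib

   # magic, version, method, uncompressed size, truncated hash: 4+2+2+4+8 bytes
   HEADER = struct.Struct("<4sHHIQ")
   MAGIC = b"CCOB"
   VERSION = 1
   METHOD_ZLIB = 0  # assumption: value of zlib in LLVM's compression enum


   def truncated_md5(data: bytes) -> int:
       # Assumption: first 8 digest bytes interpreted as little-endian.
       return int.from_bytes(hashlib.md5(data).digest()[:8], "little")


   def compress_bundle(bundle: bytes) -> bytes:
       header = HEADER.pack(MAGIC, VERSION, METHOD_ZLIB, len(bundle), truncated_md5(bundle))
       return header + zlib.compress(bundle)


   def decompress_bundle(blob: bytes) -> bytes:
       magic, version, method, size, digest = HEADER.unpack_from(blob)
       if magic != MAGIC:
           raise ValueError("not a compressed offload bundle")
       if version != VERSION or method != METHOD_ZLIB:
           raise NotImplementedError("this sketch only handles version 1 with zlib")
       # The compressed payload is everything after the fixed-size header.
       data = zlib.decompress(blob[HEADER.size:])
       if len(data) != size or truncated_md5(data) != digest:
           raise ValueError("size or hash mismatch")
       return data

Checking both the stored size and the stored hash after decompression catches
truncated or corrupted payloads that a decompressor alone might not reject.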