[libomptarget] Initial documentation on amdgpu offload

[libomptarget] Initial documentation on amdgpu offload Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D101927
2024-11-23 22:00:10 +00:00 · 2021-05-05 19:58:51 +01:00 · 2021-05-05 19:58:51 +01:00 · 25fe17d3c1
commit 25fe17d3c1
parent 18959a6a09
1 changed files with 72 additions and 8 deletions
--- a/openmp/docs/SupportAndFAQ.rst
+++ b/openmp/docs/SupportAndFAQ.rst
@ -49,20 +49,16 @@ All patches go through the regular `LLVM review process

 .. _build_offload_capable_compiler:

-Q: How to build an OpenMP offload capable compiler?
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
+Q: How to build an OpenMP GPU offload capable compiler?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 To build an *effective* OpenMP offload capable compiler, only one extra CMake
 option, `LLVM_ENABLE_RUNTIMES="openmp"`, is needed when building LLVM (Generic
 information about building LLVM is available `here <https://llvm.org/docs/GettingStarted.html>`__.).
 Make sure all backends that are targeted by OpenMP to be enabled. By default,
 Clang will be built with all backends enabled.

-If your build machine is not the target machine or automatic detection of the
-available GPUs failed, you should also set:
-
- `CLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_XX` where `XX` is the architecture of your GPU, e.g, 80.
- `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=YY` where `YY` is the numeric compute capacity of your GPU, e.g., 75.
+For Nvidia offload, please see :ref:`_build_nvidia_offload_capable_compiler`.
+For AMDGPU offload, please see :ref:`_build_amdgpu_offload_capable_compiler`.

 .. note::
  The compiler that generates the offload code should be the same (version) as
@ -71,7 +67,75 @@ available GPUs failed, you should also set:

 .. _advanced_builds: https://llvm.org//docs/AdvancedBuilds.html

+.. _build_nvidia_offload_capable_compiler:

+Q: How to build an OpenMP NVidia offload capable compiler?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The Cuda SDK is required on the machine that will execute the openmp application.
+
+If your build machine is not the target machine or automatic detection of the
+available GPUs failed, you should also set:
+
+- `CLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_XX` where `XX` is the architecture of your GPU, e.g, 80.
+- `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=YY` where `YY` is the numeric compute capacity of your GPU, e.g., 75.
+
+
+.. _build_amdgpu_offload_capable_compiler:
+
+Q: How to build an OpenMP AMDGPU offload capable compiler?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+A subset of the `ROCm <https://github.com/radeonopencompute>` toolchain is
+required to build the LLVM toolchain and to execute the openmp application.
+Either install ROCm somewhere that cmake's find_package can locate it, or
+build the required subcomponents ROCt and ROCr from source.
+
+The two components used are ROCT-Thunk-Interface, roct, and ROCR-Runtime,
+rocr. Roct is the userspace part of the linux driver. It calls into the
+driver which ships with the linux kernel. It is an implementation detail of
+Rocr from OpenMP's perspective. Rocr is an implementation of `HSA <http://www.hsafoundation.com>`.
+
+    SOURCE_DIR=same-as-llvm-source # e.g. the checkout of llvm-project, next to openmp
+    BUILD_DIR=somewhere
+    INSTALL_PREFIX=same-as-llvm-install
+    
+    cd $SOURCE_DIR
+    git clone git@github.com:RadeonOpenCompute/ROCT-Thunk-Interface.git -b roc-4.1.x --single-branch
+    git clone git@github.com:RadeonOpenCompute/ROCR-Runtime.git -b rocm-4.1.x --single-branch
+    
+    cd $BUILD_DIR && mkdir roct && cd roct
+    cmake $SOURCE_DIR/ROCT-Thunk-Interface/ -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
+    make && make install
+
+    cd $BUILD_DIR && mkdir rocr && cd rocr
+    cmake $SOURCE_DIR/ROCR-Runtime/src -DIMAGE_SUPPORT=OFF -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON
+    make && make install
+
+IMAGE_SUPPORT requires building rocr with clang and is not used by openmp.
+    
+Provided cmake's find_package can find the ROCR-Runtime package, LLVM will
+build a tool `bin/amdgpu-arch` which will print a string like 'gfx906' when
+run if it recognises a GPU on the local system. LLVM will also build a shared
+library, libomptarget.rtl.amdgpu.so, which is linked against rocr.
+
+With those libraries installed, then LLVM build and installed, try:
+
+    clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa example.c -o example && ./example
+
+Q: What are the known limitations of OpenMP AMDGPU offload?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+LD_LIBRARY_PATH is presently required to find the openmp libraries.
+
+There is no libc. That is, malloc and printf do not exist. Also no libm, so
+functions like cos(double) will not work from target regions.
+
+Cards from the gfx10 line, 'navi', that use wave32 are not yet implemented.
+
+Some versions of the driver for the radeon vii (gfx906) will error unless the
+environment variable 'export HSA_IGNORE_SRAMECC_MISREPORT=1' is set.
+
+It is a recent addition to LLVM and the implementation differs from that which
+has been shipping in ROCm and AOMP for some time. Early adopters will encounter
+bugs.

 Q: Does OpenMP offloading support work in pre-packaged LLVM releases?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^