
.. _GPU_mode:

==============
GPU Mode
==============

.. include:: check.rst

.. contents:: Table of Contents
  :depth: 4
  :local:

.. note:: This feature is very experimental and may change in the future.

The *GPU* mode of LLVM's libc is an experimental mode used to support calling
libc routines during GPU execution. The goal of this project is to provide
access to the standard C library on systems running accelerators. To begin using
this library, build and install the ``libcgpu.a`` static archive following the
instructions in :ref:`building_gpu_mode` and link with your offloading
application.

.. _building_gpu_mode:

Building the GPU library
========================

LLVM's libc GPU support *must* be built using the same compiler as the final
application so that the LLVM bitcode it contains is compatible. This can be
done automatically using the ``LLVM_ENABLE_RUNTIMES=libc`` option. Furthermore,
building for the GPU is only supported in :ref:`fullbuild_mode`. To enable the
GPU build, set the target OS to ``gpu`` via ``LLVM_LIBC_TARGET_OS=gpu``. By
default, ``libcgpu.a`` will be built for every supported GPU architecture. To
restrict the set of architectures built, set ``LLVM_LIBC_GPU_ARCHITECTURES``
to the list of desired architectures, or use ``all`` to build every one. A
typical ``cmake`` configuration will look like this:

.. code-block:: sh

  $> cd llvm-project  # The llvm-project checkout
  $> mkdir build
  $> cd build
  # CMAKE_BUILD_TYPE selects the build type, LLVM_LIBC_FULL_BUILD requests the
  # full libc, LIBC_GPU_BUILD builds in GPU mode, LLVM_LIBC_GPU_ARCHITECTURES=all
  # builds all supported architectures, and CMAKE_INSTALL_PREFIX is where
  # 'libcgpu.a' will live.
  $> cmake ../llvm -G Ninja                                \
     -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt"        \
     -DLLVM_ENABLE_RUNTIMES="libc;openmp"                  \
     -DCMAKE_BUILD_TYPE=<Debug|Release>                    \
     -DLLVM_LIBC_FULL_BUILD=ON                             \
     -DLIBC_GPU_BUILD=ON                                   \
     -DLLVM_LIBC_GPU_ARCHITECTURES=all                     \
     -DCMAKE_INSTALL_PREFIX=<PATH>
  $> ninja install

Since we want to include ``clang``, ``lld`` and ``compiler-rt`` in our
toolchain, we list them in ``LLVM_ENABLE_PROJECTS``. To ensure ``libc`` is built
using a compatible compiler and to support ``openmp`` offloading, we list
``libc`` and ``openmp`` in ``LLVM_ENABLE_RUNTIMES`` so that they are built after
the enabled projects using the newly built compiler. ``CMAKE_INSTALL_PREFIX``
specifies the directory in which to install the ``libcgpu.a`` library along
with LLVM.

Usage
=====

Once the ``libcgpu.a`` static archive has been built as described in
:ref:`building_gpu_mode`, it can be linked directly with offloading applications
as a standard library. This process is described in the `clang documentation
<https://clang.llvm.org/docs/OffloadingDesign.html>`_. This linking mode is used
by the OpenMP toolchain, but is currently opt-in for the CUDA and HIP toolchains
using the ``--offload-new-driver`` and ``-fgpu-rdc`` flags. A typical usage
will look like this:

.. code-block:: sh

  $> clang foo.c -fopenmp --offload-arch=gfx90a -lcgpu
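
As a minimal sketch of what such a ``foo.c`` might contain, the helper below
calls ``strcmp`` from inside an OpenMP target region. The name
``device_strcmp`` is hypothetical and not part of the library, and the sketch
assumes an OpenMP offloading toolchain built as described in
:ref:`building_gpu_mode`. A complete program would additionally need a
``main`` that calls it.

.. code-block:: c

  #include <stddef.h>
  #include <string.h>

  /* Hypothetical helper: compares two strings inside an OpenMP target region.
     The host-side strlen calls only size the buffers copied to the device. */
  int device_strcmp(const char *a, const char *b) {
    size_t na = strlen(a) + 1; /* include the terminating NUL */
    size_t nb = strlen(b) + 1;
    int result = 0;

    /* Without OpenMP or an available device, this region runs on the host. */
    #pragma omp target map(to: a[0:na], b[0:nb]) map(from: result)
    {
      result = strcmp(a, b); /* expected to come from libcgpu.a on the device */
    }
    return result;
  }

Called as ``device_strcmp("abc", "abd")``, the helper should return a negative
value, matching the host ``strcmp``.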

The ``libcgpu.a`` static archive is a fat binary containing LLVM-IR for each
supported target device. The supported architectures can be seen using LLVM's
objdump with the ``--offloading`` flag:

.. code-block:: sh

  $> llvm-objdump --offloading libcgpu.a
  libcgpu.a(strcmp.cpp.o):    file format elf64-x86-64

  OFFLOADING IMAGE [0]:
  kind            llvm ir
  arch            gfx90a
  triple          amdgcn-amd-amdhsa
  producer        <none>

Because the device code is stored inside a fat binary, it can be difficult to
inspect the resulting code. This can be done using the following utilities:

.. code-block:: sh

  $> llvm-ar x libcgpu.a strcmp.cpp.o
  $> clang-offload-packager strcmp.cpp.o --image=arch=gfx90a,file=gfx90a.bc
  $> opt -S gfx90a.bc
  ...

Supported Functions
===================

The following functions and headers are supported at least partially on the
device. Currently, only basic functions that do not require an operating
system are supported. Supporting functions like ``malloc`` using an RPC
mechanism is a work-in-progress.
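
To illustrate the kind of operating-system-independent routine listed below,
the following sketch upper-cases a buffer in place on the device using
``toupper`` from ``ctype.h``. The helper name ``device_upper`` is hypothetical,
and the sketch assumes the OpenMP offloading setup from
:ref:`building_gpu_mode`.

.. code-block:: c

  #include <ctype.h>
  #include <stddef.h>
  #include <string.h>

  /* Hypothetical helper: upper-cases buf in place inside a target region. */
  void device_upper(char *buf) {
    size_t n = strlen(buf) + 1; /* host-side length, including the NUL */

    /* Without OpenMP or an available device, this region runs on the host. */
    #pragma omp target map(tofrom: buf[0:n])
    {
      for (size_t i = 0; buf[i] != '\0'; ++i)
        buf[i] = (char)toupper((unsigned char)buf[i]);
    }
  }

The buffer is mapped ``tofrom`` so that the modified characters are copied back
to the host once the region finishes.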

ctype.h
-------

=============  =========
Function Name  Available
=============  =========
isalnum        |check|
isalpha        |check|
isascii        |check|
isblank        |check|
iscntrl        |check|
isdigit        |check|
isgraph        |check|
islower        |check|
isprint        |check|
ispunct        |check|
isspace        |check|
isupper        |check|
isxdigit       |check|
toascii        |check|
tolower        |check|
toupper        |check|
=============  =========

string.h
--------

=============   =========
Function Name   Available
=============   =========
bcmp            |check|
bzero           |check|
memccpy         |check|
memchr          |check|
memcmp          |check|
memcpy          |check|
memmove         |check|
mempcpy         |check|
memrchr         |check|
memset          |check|
stpcpy          |check|
stpncpy         |check|
strcat          |check|
strchr          |check|
strcmp          |check|
strcpy          |check|
strcspn         |check|
strlcat         |check|
strlcpy         |check|
strlen          |check|
strncat         |check|
strncmp         |check|
strncpy         |check|
strnlen         |check|
strpbrk         |check|
strrchr         |check|
strspn          |check|
strstr          |check|
strtok          |check|
strtok_r        |check|
strdup
strndup
=============   =========