[mlir][gpu] Add documentation for the new GPU compilation mechanism

Adds documentation to the GPU dialect docs giving a general overview of the
new compilation mechanism introduced in the patch series ending in D154153.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D157461

commit a7cdea7009 (parent b43068e870)
@@ -36,6 +36,105 @@ instead; we chose not to use `alloca`-style approach that would require more
complex lifetime analysis following the principles of MLIR that promote
structure and representing analysis results in the IR.

## GPU Compilation
### Deprecation notice
The `--gpu-to-(cubin|hsaco)` passes will be deprecated in a future release.

### Compilation overview
The compilation process in the GPU dialect has two main stages: GPU module
serialization and offloading operations translation. Together these stages can
produce GPU binaries and the necessary code to execute them.

An example of how the compilation workflow looks:

```
mlir-opt example.mlir                             \
  --pass-pipeline="builtin.module(                \
    nvvm-attach-target{chip=sm_90 O=3},           \ # Attach an NVVM target to a gpu.module op.
    gpu.module(convert-gpu-to-nvvm),              \ # Convert GPU to NVVM.
    gpu-to-llvm,                                  \ # Convert GPU to LLVM.
    gpu-module-to-binary                          \ # Serialize GPU modules to binaries.
  )" -o example-nvvm.mlir
mlir-translate example-nvvm.mlir                  \
  --mlir-to-llvmir                                \ # Obtain the translated LLVM IR.
  -o example.ll
```

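The emitted `example.ll` calls runtime wrapper functions (the `mgpu*` symbols shown
later in this document) rather than a vendor driver API directly, so it must be
linked against a library providing them. The following is a minimal sketch only,
assuming a local MLIR build with the CUDA runner enabled (which provides
`libmlir_cuda_runtime.so`) and an `example.mlir` that defines a host entry point;
all paths and names here are illustrative:

```
# Sketch only: library names and paths depend on the local LLVM/MLIR build.
LLVM_BUILD_DIR=/path/to/llvm-project/build   # hypothetical build location

clang example.ll -O3 \
  -L"$LLVM_BUILD_DIR/lib" -lmlir_cuda_runtime \
  -Wl,-rpath,"$LLVM_BUILD_DIR/lib" \
  -o example
./example
```
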
### Module serialization
Attributes implementing the GPU Target Attribute Interface handle the
serialization process and are called Target attributes. These attributes can be
attached to GPU modules, indicating the serialization scheme used to compile the
module into a binary string.

The `gpu-module-to-binary` pass searches for all nested GPU modules and
serializes each module using the target attributes attached to it, producing a
binary with an object for every target.

Example:
```
// Input:
gpu.module @kernels [#nvvm.target<chip = "sm_90">, #nvvm.target<chip = "sm_60">] {
  ...
}
// mlir-opt --gpu-module-to-binary:
gpu.binary @kernels [
  #gpu.object<#nvvm.target<chip = "sm_90">, "sm_90 cubin">,
  #gpu.object<#nvvm.target<chip = "sm_60">, "sm_60 cubin">
]
```

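Target attributes usually accept code-generation options beyond the chip name.
The snippet below is a sketch only: the option names `O`, `chip`, and `features`
are taken from the `#nvvm.target` attribute and roughly correspond to what
`nvvm-attach-target{chip=sm_90 O=3}` attaches in the pipeline above, but the
exact set and spelling may differ between targets and versions:

```
// Sketch: a target with an optimization level, chip, and feature string attached
// directly to the module. Option names are assumptions; see the #nvvm.target docs.
gpu.module @kernels [#nvvm.target<O = 3, chip = "sm_90", features = "+ptx80">] {
  ...
}
```
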
### Offloading LLVM translation
Attributes implementing the GPU Offloading LLVM Translation Attribute Interface
handle the translation of GPU binaries and kernel launches into LLVM
instructions and are called Offloading attributes. These attributes are
attached to GPU binary operations.

During the LLVM translation process, GPU binaries are translated into LLVM
instructions using the scheme provided by their Offloading attribute. Kernel
launches are translated by searching for the appropriate binary and invoking
the procedure provided by that binary's Offloading attribute to translate the
launch into LLVM instructions.

Example:
```
// Input:
// Binary with multiple objects but selecting the second one for embedding.
gpu.binary @binary <#gpu.select_object<#rocdl.target<chip = "gfx90a">>> [
  #gpu.object<#nvvm.target, "NVPTX">,
  #gpu.object<#rocdl.target<chip = "gfx90a">, "AMDGPU">
]
llvm.func @foo() {
  ...
  // Launching a kernel inside the binary.
  gpu.launch_func @binary::@func blocks in (%0, %0, %0)
                                 threads in (%0, %0, %0) : i64
                                 dynamic_shared_memory_size %2
                                 args(%1 : i32, %1 : i32)
  ...
}
// mlir-translate --mlir-to-llvmir:
@binary_bin_cst = internal constant [6 x i8] c"AMDGPU", align 8
@binary_func_kernel_name = private unnamed_addr constant [5 x i8] c"func\00", align 1
...
define void @foo() {
  ...
  %module = call ptr @mgpuModuleLoad(ptr @binary_bin_cst)
  %kernel = call ptr @mgpuModuleGetFunction(ptr %module, ptr @binary_func_kernel_name)
  call void @mgpuLaunchKernel(ptr %kernel, ...) ; Launch the kernel
  ...
  call void @mgpuModuleUnload(ptr %module)
  ...
}
...
```

### The binary operation
From a semantic point of view, GPU binaries allow the implementation of many
concepts, from simple object files to fat binaries. By default, the binary
operation uses the `#gpu.select_object` offloading attribute; this attribute
embeds a single object in the binary as a global string. See the attribute docs
for more information.

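As an illustrative sketch of that default: a binary with a single object and no
explicit offloading attribute behaves as if `#gpu.select_object` picked that
object, and the attribute is assumed here to also accept an object index for
choosing among several objects (check the attribute docs for the exact syntax):

```
// Sketch only; the index form of #gpu.select_object is an assumption.

// No offloading attribute: the default #gpu.select_object embeds the only object.
gpu.binary @single [#gpu.object<#nvvm.target, "NVPTX">]

// Selecting an object by its position in the object list instead of by target.
gpu.binary @by_index <#gpu.select_object<1>> [
  #gpu.object<#nvvm.target, "NVPTX">,
  #gpu.object<#rocdl.target, "AMDGPU">
]
```
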
## Operations

[include "Dialects/GPUOps.md"]