500 Commits

Author SHA1 Message Date
Aart Bik
3635c74375 [mlir][gpu][sparse] gracefully accept zero size allocation (#66127)
This cleans up unnecessary code that changes zero-size allocations to
avoid the following error message:

'cuMemAlloc(&ptr, sizeBytes)' failed with 'CUDA_ERROR_INVALID_VALUE'
2023-09-12 13:07:24 -07:00
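
The behavior described in this commit can be illustrated with a minimal sketch. It assumes that "gracefully accept" means a zero-size request is answered with a null pointer and never forwarded to the allocator. The names `backendAlloc` and `allocOrNull` are hypothetical, and `std::malloc` stands in for the driver call so the sketch runs without CUDA.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a driver allocator that rejects zero-byte
// requests, mimicking the CUDA_ERROR_INVALID_VALUE failure quoted above.
// Returns true on success.
inline bool backendAlloc(void **ptr, std::size_t sizeBytes) {
  if (sizeBytes == 0)
    return false;
  *ptr = std::malloc(sizeBytes);
  return *ptr != nullptr;
}

// Hypothetical wrapper: a zero-size request yields a null pointer and never
// reaches the backend, so callers need no special casing of their own.
inline void *allocOrNull(std::size_t sizeBytes) {
  if (sizeBytes == 0)
    return nullptr;
  void *ptr = nullptr;
  return backendAlloc(&ptr, sizeBytes) ? ptr : nullptr;
}
```

With a guard of this kind in the runtime, callers may not need to bump a zero size to a non-zero one before allocating.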
Guray Ozen
1dc0071216 [MLIR] Guard Cuda 12.0+ newer driver APIs with CUDA_VERSION macro checks
Fixes #64529
https://github.com/llvm/llvm-project/issues/64529

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D159440
2023-09-06 08:17:06 +02:00
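
A version guard of the kind named in this commit title might look like the following sketch. The helper `hasCuda12DriverApis` is hypothetical, and the fallback definition of `CUDA_VERSION` is an assumption added only so the snippet compiles without the CUDA headers; in a real build the macro comes from `<cuda.h>`.

```cpp
// Assumption: outside a CUDA build, CUDA_VERSION is undefined, so this sketch
// supplies a placeholder (11.8) purely to stay self-contained.
#ifndef CUDA_VERSION
#define CUDA_VERSION 11080
#endif

// Hypothetical helper: reports whether code paths that need the 12.0+ driver
// APIs were compiled in. CUDA encodes version 12.0 as 12000.
constexpr bool hasCuda12DriverApis() {
#if CUDA_VERSION >= 12000
  return true;
#else
  return false;
#endif
}
```

Wrapping the newer calls in the `#if` branch keeps the file buildable against older toolkits, at the cost of the feature being unavailable there.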
Aart Bik
9ce445b8c7 [mlir][sparse] simplification of sparse runtime support lib
Incorporated two header files directly into other since
other parts were used (and it makes it hard to find the
definitions). Removed TODOs that are less likely to be done.

Reviewed By: yinying-lisa-li

Differential Revision: https://reviews.llvm.org/D159381
2023-09-01 14:00:19 -07:00
Mehdi Amini
471004c5c9 Revert "[mlir][sparse] simplification of sparse runtime support lib"
This reverts commit 14c58cf5c39a39a335893bc98493c5edc75a91b3.

The gcc7 build is broken.
2023-09-01 11:50:14 -07:00
Aart Bik
14c58cf5c3 [mlir][sparse] simplification of sparse runtime support lib
Incorporated two header files directly into other since
other parts were used (and it makes it hard to find the
definitions). Removed TODOs that are less likely to be done.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D159330
2023-09-01 09:28:48 -07:00
Aart Bik
b86d3cbc12 [mlir][sparse] complete various FIXMEs in sparse support lib
Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D159245
2023-08-30 21:30:25 -07:00
Peiming Liu
fa6726e27b [mlir][sparse] supports sparse_tensor.pack on libgen path
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D158012
2023-08-15 20:20:54 +00:00
Aart Bik
289f7231f9 [mlir][sparse][gpu] minor code cleanup for sparse gpu ops
Consistent order of ops and related methods.
Also, renamed SpGEMMGetSizeOp to SpMatGetSizeOp
since this is a general utility for sparse matrices,
not specific to GEMM ops only.

Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D157922
2023-08-14 15:08:57 -07:00
Aart Bik
95a6c509c9 [mlir][sparse][gpu] add set csr pointers, remove estimate op, fix bugs
Rationale:
Since we only support default algorithm for SpGEMM, we can remove the
estimate op (for now at least). This also introduces the set csr pointers
op, and fixes a few bugs in the existing lowering for the SpGEMM breakdown.
This revision paves the way for actual recognition of SpGEMM in the sparsifier.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D157645
2023-08-10 13:52:47 -07:00
Aart Bik
e7e4ed0d7a [mlir][sparse][gpu] only support default algorithm for SpGEMM
Rationale:
This is the approach taken for all the others too (SpMV, SpMM, SDDMM),
so it is more consistent to follow the same path (until we have a need
for more algorithms). Also, in a follow up revision, this will allow
us to remove some unused GEMM ops.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D157542
2023-08-09 12:49:47 -07:00
Kun Wu
0664db5425 [mlir][sparse][gpu] fix spgemm runtime compile error
Differential Revision: https://reviews.llvm.org/D157349
2023-08-08 01:37:31 +00:00
Kun Wu
dfe2942909 [mlir][sparse][gpu] add spgemm operator
Differential Revision: https://reviews.llvm.org/D152981
2023-08-08 00:29:23 +00:00
Krzysztof Drewniak
2f8d8c78a8 Generalize finding the rocm_agent_enumerator program
On Fedora, rocminfo is a Fedora package and rocm_agent_enumerator is
installed to /usr/bin. This causes the following error when building:

CMake Error at external/llvm-project/mlir/lib/ExecutionEngine/CMakeLists.txt:232 (message):
  Could not run rocm_agent_enumerator and ROCM_TEST_CHIPSET is not defined

So use find_program() to look for rocm_agent_enumerator instead of assuming a
single location.

Signed-off-by: Tom Rix <trix@redhat.com>

Reviewed By: krzysz00

Differential Revision: https://reviews.llvm.org/D156826
2023-08-02 20:35:54 +00:00
Guray Ozen
53881490c2 [mlir][cuda runtime] Set Max Dynamic Shared Memory Attribute
This work aims to address the issue related to larger shared memory usage in the MLIR CUDA runtime. Currently, when the shared memory usage exceeds 48KB, we need to set the CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES attribute of the CUDA kernel appropriately. This work takes care of that by setting the attribute as required. Additionally, it includes some debug prints for better visibility and troubleshooting.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D156874
2023-08-02 14:18:59 +02:00
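
The decision described in this commit can be sketched as follows. `FakeKernel` and `prepareLaunch` are hypothetical stand-ins so the logic runs without a GPU; in a real runtime the assignment would presumably be a `cuFuncSetAttribute` call with `CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES`.

```cpp
#include <cstdint>

// Default dynamic shared memory limit mentioned in the commit (48KB).
constexpr std::uint32_t kDefaultSharedMemBytes = 48 * 1024;

// Hypothetical stand-in for a kernel handle; it only records the opt-in limit.
struct FakeKernel {
  std::uint32_t maxDynamicSharedBytes = kDefaultSharedMemBytes;
};

// Hypothetical launch preparation: raise the limit only when the request
// exceeds the default. Returns true if the attribute had to be changed.
inline bool prepareLaunch(FakeKernel &kernel, std::uint32_t smemBytes) {
  if (smemBytes <= kDefaultSharedMemBytes)
    return false; // The default limit already covers this launch.
  // Real runtime (assumed): cuFuncSetAttribute(function,
  //   CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES, smemBytes);
  kernel.maxDynamicSharedBytes = smemBytes;
  return true;
}
```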
Guray Ozen
19b1107963 [mlir][gpu] Add debug print with environment value
This work introduces the `MLIR_CUDA_DEBUG` environment variable and a `debug_print` function to make the runtime easier to debug.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D156232
2023-08-02 11:55:32 +02:00
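
An environment-gated debug print along these lines could be sketched as below. The helper names `isDebugEnabled` and `debugPrint` are hypothetical, and the sketch assumes that debugging counts as enabled whenever `MLIR_CUDA_DEBUG` is set, regardless of its value; the actual runtime may parse the value differently.

```cpp
#include <cstdarg>
#include <cstdio>
#include <cstdlib>

// Assumption: any value of MLIR_CUDA_DEBUG, even an empty one, enables output.
inline bool isDebugEnabled() {
  return std::getenv("MLIR_CUDA_DEBUG") != nullptr;
}

// printf-style helper that writes to stderr only when debugging is enabled.
// Returns the number of characters written, or 0 when disabled.
inline int debugPrint(const char *fmt, ...) {
  if (!isDebugEnabled())
    return 0;
  va_list args;
  va_start(args, fmt);
  int written = std::vfprintf(stderr, fmt, args);
  va_end(args);
  return written;
}
```

The environment is read on every call here to keep the sketch easy to test; a runtime would more likely cache the result once.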
Kun Wu
1e491c425b [mlir][sparse][gpu] add 2:4 spmm prune_and_check flag
Differential Revision: https://reviews.llvm.org/D155909
2023-08-01 18:24:18 +00:00
Guray Ozen
e56d6745f7 [mlir][nvgpu] Add tma.create.descriptor to create tensor map descriptor
The Op creates a tensor map descriptor object representing a tiled memory region. The descriptor is used by Tensor Memory Access (TMA). The `tensor` is the source tensor to be tiled. The `boxDimensions` is the size of the tiled memory region in each dimension.

The pattern here lowers `tma.create.descriptor` to a runtime function call that eventually calls the CUDA Driver's `cuTensorMapEncodeTiled`. For more information see below:
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TENSOR__MEMORY.html

Depends on D155453

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D155680
2023-07-21 11:33:04 +02:00
Aart Bik
4df01dc270 [mlir][sparse][gpu][nvidia] add pruning step and check to 2:4 matrix multiplication
(1) without the check, the results may silently be wrong, so check is needed
(2) add pruning step to guarantee 2:4 property

Note, in the longer run, we may want to split out the pruning step somehow,
or make it optional.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D155320
2023-07-14 12:08:13 -07:00
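
The two steps listed in this commit, pruning and checking, can be illustrated on the host. This is a sketch under stated assumptions: the values are treated as a flat array whose length is a multiple of four, and pruning keeps the two entries of largest magnitude in each group. The function names are hypothetical and the library's own pruning policy may differ.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Checks the 2:4 property: every aligned group of four consecutive values
// holds at most two non-zeros. Assumes values.size() is a multiple of 4.
inline bool isTwoFour(const std::vector<float> &values) {
  for (std::size_t g = 0; g + 4 <= values.size(); g += 4) {
    int nonZeros = 0;
    for (std::size_t i = g; i < g + 4; ++i)
      nonZeros += values[i] != 0.0f;
    if (nonZeros > 2)
      return false;
  }
  return true;
}

// Prunes in place: within each group of four, keeps the two entries of
// largest magnitude and zeroes the other two (selection rule is an assumption).
inline void pruneTwoFour(std::vector<float> &values) {
  for (std::size_t g = 0; g + 4 <= values.size(); g += 4) {
    std::array<std::size_t, 4> idx = {g, g + 1, g + 2, g + 3};
    // Order the four positions by descending magnitude.
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t a, std::size_t b) {
                       return std::fabs(values[a]) > std::fabs(values[b]);
                     });
    values[idx[2]] = 0.0f;
    values[idx[3]] = 0.0f;
  }
}
```

Running the check after pruning is what guards against the silently wrong results mentioned in item (1).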
Aart Bik
97678cec1b [mlir][sparse][gpu] remove zero init memset
avoids quite a big memory fill for each setup

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D155251
2023-07-13 18:22:21 -07:00
Aart Bik
86eff489e7 [mlir][sparse][gpu] force 16-byte alignment on data structs for cuSparseLt
Also makes some minor consistency edits in the cuSparseLt wrapper lib.

Reviewed By: Peiming, K-Wu

Differential Revision: https://reviews.llvm.org/D155139
2023-07-13 10:45:15 -07:00
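
Forcing 16-byte alignment on a data struct can be expressed with `alignas`, as in the sketch below. `AlignedHandle`, its field names, and the 64-byte payload are placeholders, not the wrapper library's actual layout.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical handle-plus-data record with a forced 16-byte alignment.
struct alignas(16) AlignedHandle {
  std::uint8_t opaque[64]; // Placeholder for an opaque library descriptor.
  void *values;            // Placeholder for an associated data pointer.
};

static_assert(alignof(AlignedHandle) == 16,
              "handle storage must be 16-byte aligned");
static_assert(sizeof(AlignedHandle) % 16 == 0,
              "size must stay a multiple of the alignment");

// Returns true when the address satisfies the 16-byte requirement.
inline bool isAligned16(const void *ptr) {
  return reinterpret_cast<std::uintptr_t>(ptr) % 16 == 0;
}
```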
Adrian Kuegel
f250fbcbbb [mlir] Apply ClangTidy fix (NFC)
The return statement is redundant.
2023-07-10 11:46:32 +02:00
Aart Bik
03125e6894 [mlir][sparse][gpu] fix missing dealloc
This dealloc was incorrectly removed in
https://reviews.llvm.org/D153173

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D154564
2023-07-06 09:48:19 -07:00
Cullen Rhodes
fb27d542b7 [mlir-cpu-runner] Check entry function is void
Currently crashes if function isn't void when specifying
'-entry-point-result=void'.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D154352
2023-07-04 07:25:16 +00:00
Kun Wu
be2dd22b8f [mlir][sparse][gpu] reuse CUDA environment handle throughout instance lifetime
Differential Revision: https://reviews.llvm.org/D153173
2023-06-30 21:52:34 +00:00
Kun Wu
7a3ebba9cb [mlir][sparse][gpu] Add explaining string to three static_assert stmts
Differential Revision: https://reviews.llvm.org/D154243
2023-06-30 14:10:45 -05:00
Job Noorman
8de9f2b558 Move SubtargetFeature.h from MC to TargetParser
SubtargetFeature.h is currently part of MC while it doesn't depend on
anything in MC. Since some LLVM components might have the need to work
with target features without necessarily needing MC, it might be
worthwhile to move SubtargetFeature.h to a different location. This will
reduce the dependencies of said components.

Note that I chose TargetParser as the destination because that's where
Triple lives and SubtargetFeatures feels related to that.

This issue came up during a JITLink review (D149522). JITLink would
like to avoid a dependency on MC while still needing to store target
features.

Reviewed By: MaskRay, arsenm

Differential Revision: https://reviews.llvm.org/D150549
2023-06-26 11:20:08 +02:00
Mitch Phillips
3c3ebebca2 Revert "[mlir][RunnerUtils] Make symbols private + implement loading mechanism."
This reverts commit bba2b656110209a3d9863b92c060082479b06ab1.

Reason: Broke the HWASan buildbot. See https://reviews.llvm.org/D153250
for more information.
2023-06-22 01:01:19 +02:00
Mehdi Amini
7b4ea67f31 Revert "[mlir][CRunnerUtils] Use explicit execution engine symbol registration."
This reverts commit 9119325a5666e557a19f38a05525578b556c215b.

A buildbot is broken, probably because of this change breaking the
SHARED_LIBS=ON build more.
2023-06-21 17:50:18 +02:00
Ingo Müller
9119325a56 [mlir][CRunnerUtils] Use explicit execution engine symbol registration.
As a follow up of https://reviews.llvm.org/D153250, this path uses the
explicit symbol registration mechanism of the execution engine in the
CRunnerUtils library.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D153354
2023-06-21 09:44:10 +00:00
Ingo Müller
bba2b65611 [mlir][RunnerUtils] Make symbols private + implement loading mechanism.
There are two ways to make symbols from a shared library visible in the
execution engine: exporting the symbols with public visibility or
implementing a loading/unloading mechanism that registers the exported
symbols explicitly. The latter has only been available in the JIT runner
until recently, but https://reviews.llvm.org/D153029 makes it available
in any usage of the execution engine (including the Python bindings).

This patch makes the runner utils library use the latter mechanism
instead of the former, i.e., it makes all of its symbols private and
implements the init/destroy functions of the loading mechanism to
control explicitly which symbols it registers.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D153250
2023-06-20 19:28:33 +00:00
Ingo Müller
f9bce19e2e [mlir][async] Mark exported symbols of runtime lib as visible.
The async runtime library explicitly registers the symbols it exports
with the loading mechanism of the execution engine. This works even
though these symbols were marked as hidden in the library. However, if
used outside the execution engine, such as with `lli --dlopen` or if AOT
compiled, these hidden symbols would not be found. This patch thus marks
all symbols that are part of the API as visible.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D153348
2023-06-20 19:27:47 +00:00
Kun Wu
632ccc538c [mlir][sparse][gpu] remove tuple as one of the spmm_buffer_size output type
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D153188
2023-06-19 15:57:50 +00:00
Ingo Müller
0eb0fecbc5 [mlir][ExecutionEngine] Only load JITDyLibs without init/destroy funcs.
In https://reviews.llvm.org/D153029, I moved the loading/unloading
mechanisms of shared libraries from the JIT runner to the execution
engine in order to make that mechanism available in the latter
(including its Python bindings). However, I realized that I introduced a
small change in semantics: previously, the JIT runner checked for the
presence of init/destroy functions and only loaded the library as
JITDyLib if they were not present. After I moved the code, all libraries
were loaded as JITDyLib, even if they registered their symbols
explicitly in their init function. I am not sure if this is really a
problem but (1) the previous behavior was different and (2) I guess it
could cause a problem if some symbols are exported through the init
function *and*  have public visibility. This patch reestablishes the
original behaviour in the new place of the code.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D153249
2023-06-19 07:38:51 +00:00
Ingo Müller
530db6a3b4 [mlir][async] Update comments about library registration. (NFC)
This updates the code comments about the library registration mechanism,
which changed in https://reviews.llvm.org/D153029, and which should have
been updated as part of that patch.

Reviewed By: ingomueller-net

Differential Revision: https://reviews.llvm.org/D153147
2023-06-16 18:03:53 +00:00
Ingo Müller
0b3841eb97 [mlir] Move symbol loading from mlir-cpu-runner to ExecutionEngine.
Both the mlir-cpu-runner and the execution engine accept a
list of shared libraries that should be loaded into the process such
that the jitted code can use the symbols from those libraries. The
runner had implemented a protocol that allowed libraries to control
which symbols they want to provide in that context (with a function
called __mlir_runner_init). In absence of that, the runner would rely on
the loading mechanism of the execution engine, which didn't do anything
particular with the symbols, i.e., only symbols with public visibility
were visible to jitted code.

Libraries used a mix of the two mechanisms: while the runner utils and C
runner utils libs (and potentially others) used public visibility, the
async runtime lib (as the only one in the monorepo) used the loading
protocol. As a consequence, the async runtime library could not be used
through the Python bindings of the execution engine.

This patch moves the loading protocol from the runner to the execution
engine. For the runner, this should not change anything: it lets the
execution engine handle the loading which now implements the same
protocol that the runner had implemented before. However, the Python
bindings now get to benefit from the loading protocol as well, so the
async runtime library (and potentially other out-of-tree libraries) can
now be used in that context.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D153029
2023-06-16 14:50:14 +00:00
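
The two mechanisms described in this commit can be modeled with a small sketch. All types and names here (`SymbolMap`, `FakeLibrary`, `loadLibrary`) are hypothetical; the `init` member plays the role that `__mlir_runner_init` plays in the text above, and a plain map stands in for the engine's symbol table.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Maps symbol names to addresses; a stand-in for the engine's symbol table.
using SymbolMap = std::unordered_map<std::string, void *>;

// Hypothetical model of a loaded shared library.
struct FakeLibrary {
  // Symbols exported with public visibility.
  SymbolMap publicSymbols;
  // Optional init hook that registers symbols explicitly; left empty when
  // the library does not implement the loading protocol.
  std::function<void(SymbolMap &)> init;
};

// Sketch of the loader: if the library provides an init hook, only the
// symbols it registers become visible to jitted code; otherwise the loader
// falls back to the library's public symbols.
inline SymbolMap loadLibrary(const FakeLibrary &lib) {
  SymbolMap visible;
  if (lib.init)
    lib.init(visible);
  else
    visible = lib.publicSymbols;
  return visible;
}
```

Keeping this decision in one loader is what lets every client of the engine, not only the runner, see the same set of symbols.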
Kun Wu
9167dd46ba [mlir][sparse][gpu] recognizing sddmm pattern in GPU libgen path
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151582
2023-06-15 23:48:11 +00:00
Kun Wu
ac30f48e37 [mlir][sparse][gpu] fix various cusparseLt bugs
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D152489
2023-06-12 23:48:49 +00:00
Navdeep Katel
18cc07aa07 [MLIR][GPU] Add 16-bit version of cudaMemset in cudaRuntimeWrappers
Add 16-bit version of cudaMemset in cudaRuntimeWrappers and update the GPU to LLVM lowering.

Reviewed By: bondhugula

Differential Revision: https://reviews.llvm.org/D151642
2023-06-08 17:33:26 +05:30
Aart Bik
50db4789a8 [mlir][sparse][gpu] refined build setup for cusparse
Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D152387
2023-06-07 11:09:22 -07:00
Kun Wu
8ed59c53de [mlir][sparse][gpu] add sm8.0+ tensor core 2:4 sparsity support
Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D151775
2023-06-06 23:13:21 +00:00
Aart Bik
9fc02a7a08 [mlir][sparse][gpu] add AoS COO support to cuSPARSE
Even though this feature was deprecated in release 11.2,
any library before this version still supports the feature,
which is why we are making it available under a macro.

Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D152290
2023-06-06 12:32:46 -07:00
Mehdi Amini
53a5c3ab4d Fix MLIR build with Shared lib enabled
db7cc0348c9a0 added OpenMP support to the execution engine but didn't
add the explicit CMake dependency.
2023-06-05 12:10:24 -07:00
Kun Wu
7e44f0736a [mlir][gpu][sparse] fix broken type in cusparseCreateCsr
Differential Revision: https://reviews.llvm.org/D151912
2023-06-01 18:06:09 +00:00
Kun Wu
be6c532005 [mlir][sparse][gpu] fixing broken literal names in cuda runner macros
Differential Revision: https://reviews.llvm.org/D151910
2023-06-01 17:52:58 +00:00
Kun Wu
cc402de0b1 [mlir][sparse][gpu] add result type to spmv and spmm gpu libgen path
Differential Revision: https://reviews.llvm.org/D151592
2023-06-01 17:17:40 +00:00
Mehdi Amini
b936816fb3 MLIR/Cuda: Add the appropriate "HINTS" on CMake find_library and mark these REQUIRED
The cmake logic to find cuda paths exposes some paths to search for the cuda
library; we need to propagate these through the call to find_library.
This was already done for cuSparse but not for cuda.

Differential Revision: https://reviews.llvm.org/D151645
2023-05-29 14:32:24 -07:00
Uday Bondhugula
53be2e0f59 [MLIR] NFC. JitRunner - use range-based for
Address clang-tidy warning in JitRunner.cpp. Use range-based for.
2023-05-29 01:54:50 +05:30
Aart Bik
752c04777f [mlir][sparse][gpu] fix merge conflict
Reviewed By: K-Wu

Differential Revision: https://reviews.llvm.org/D151619
2023-05-27 13:42:20 -07:00
Kun Wu
cf44847b4d [mlir][gpu][sparse] adding cusparse sddmm support
Differential Revision: https://reviews.llvm.org/D151279
2023-05-27 20:01:41 +00:00
Aart Bik
74e29d3715 [mlir][sparse][gpu] fix merge conflict
Reviewed By: Peiming

Differential Revision: https://reviews.llvm.org/D151574
2023-05-26 11:00:20 -07:00