17439 Commits

Author SHA1 Message Date
Matthias Springer
5958043e2d
[mlir][bufferization] Add dump_alias_sets option to transform op (#68289)
Add `dump_alias_sets` to `transform.bufferization.one_shot_bufferize`.
This option is useful for debugging. Also improve the verifier to ensure
that `test_analysis_only` is set when other debugging flags are enabled.
2023-10-05 14:05:45 +02:00
Kohei Yamaguchi
777a6e6f10
[mlir][docs] Cleanup documentations [NFC] (#67945)
- Fix missing links
- Fix missing link format
- Move transform::ApplyFuncToLLVMConversionPatternOp into Transform
dialect
- Remove duplicated MemRef's TOC
- Remove duplicated Memref's dma_start/dma_wait docs
2023-10-05 13:33:41 +02:00
long.chen
5979e1dfb1
[mlir] Fix empty-tensor-elimination around self-copies (#68129)
* Fixes #67977, a crash in `empty-tensor-elimination`.
* Also improves `linalg.copy` canonicalization.
* Also improves indentation indentation in `mlir-linalg-ods-yaml-gen.cpp`.
2023-10-05 12:04:20 +02:00
tdanyluk
a608830807
[mlir] Speed up FuncToLLVM using a SymbolTable (#68082)
We have a project where this saves 23% of the compilation time.

This means using hashmaps instead of searching in linked lists.
2023-10-05 11:24:52 +02:00
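The speedup described in the commit above comes from replacing repeated linear searches with a prebuilt hash-based index. A minimal Python sketch of that idea follows. It is not the FuncToLLVM code; `Func`, `find_linear`, and `build_symbol_table` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Func:
    """Hypothetical stand-in for a function-like op with a symbol name."""
    name: str


def find_linear(funcs, name):
    """Scan the whole list on every lookup: O(n) per query."""
    for func in funcs:
        if func.name == name:
            return func
    return None


def build_symbol_table(funcs):
    """Index the list once so that later lookups are O(1) on average."""
    return {func.name: func for func in funcs}


funcs = [Func(f"f{i}") for i in range(1000)]
table = build_symbol_table(funcs)

# Both strategies return the same object; only the lookup cost differs.
assert find_linear(funcs, "f999") is table.get("f999")
assert find_linear(funcs, "missing") is None and table.get("missing") is None
```

Building the index costs one pass over the list, so it pays off as soon as more than a handful of lookups are made against the same set of symbols.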
Guray Ozen
d20fbc9007
[MLIR][NVGPU] Introduce nvgpu.warpgroup.mma.store Op for Hopper GPUs (#65441)
This PR introduces a new Op called `warpgroup.mma.store` to the NVGPU
dialect of MLIR. The purpose of this operation is to facilitate storing
fragmented result(s) `nvgpu.warpgroup.accumulator` produced by
`warpgroup.mma` to the given memref.

An example of a fragmented matrix is given here:

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#wgmma-64n16-d

The `warpgroup.mma.store` Op does the following:
1) Takes one or more values of `nvgpu.warpgroup.accumulator` type (fragmented
result matrices).
2) Calculates indexes per thread in the warp-group and stores the data into
the given memref.

Here's an example usage:
```
// A warpgroup performs GEMM, results in fragmented matrix
%result1, %result2 = nvgpu.warpgroup.mma ...

// Stores the fragmented result to memref
nvgpu.warpgroup.mma.store [%result1, %result2], %matrixD : 
    !nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>>,
    !nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>> 
    to memref<128x128xf32,3>
```
2023-10-05 10:54:13 +02:00
Guray Ozen
b74cfc139a
[mlir][nvgpu] Improve nvgpu->nvvm transformation of warpgroup.mma Op (NFC) (#67325)
This PR introduces substantial improvements to the readability and
maintainability of the `nvgpu.warpgroup.mma` Op transformation from
nvgpu->nvvm. This transformation plays a crucial role in GEMM and
manages complex operations such as generating multiple wgmma ops and
iterating their descriptors. The prior code lacked clarity, but this PR
addresses that issue effectively.

**This PR does the following:**
**Introduces a helper class:** `WarpgroupGemm` class encapsulates the
necessary functionality, making the code cleaner and more
understandable.

**Detailed Documentation:** Each function within the helper class is
thoroughly documented to provide clear insights into its purpose and
functionality.
2023-10-05 10:16:59 +02:00
Guray Ozen
7eb2b99f16
[mlir] Change the class name of the GenerateWarpgroupDescriptor (#68286)
2023-10-05 10:15:40 +02:00
Nicolas Vasilache
cc2d9515d0
[mlir][Transform] NFC - Fix missing field in copy constructor
2023-10-05 07:40:35 +00:00
Guray Ozen
6dc7717bca
[MLIR][NVGPU] Change name wgmma.descriptor to warpgroup.descriptor (NFC) (#67526)
The NVGPU dialect is gaining broad support for warpgroup-level operations,
and their names always start with `warpgroup....`.

This PR changes the name of the Op and type from `wgmma.descriptor` to
`warpgroup.descriptor` for the sake of consistency.
2023-10-05 09:01:48 +02:00
MaheshRavishankar
f28f09dcf0
[mlir][Vector] Add Broadcast -> CastOp reordering to SinkVectorBroadcasting patterns. (#68257)
Also fix an issue with sinking a broadcast across elementwise ops:
`arith.cmpf` is elementwise, but its result type differs from its operand
type, so sinking the broadcast created illegal IR.
A similar issue exists with `vector.fma`, which only accepts vector operand
types, while broadcasts can have scalar sources. Sinking the broadcast across
it would result in an illegal `vector.fma` (with scalar operands).
2023-10-04 21:27:24 -07:00
Maksim Levental
6f44f87011
[mlir][python] Enable py312. (#68009)
Python 3.12 has been released so why not support it.
2023-10-04 20:35:24 -05:00
Yinying Li
6280e23124
[mlir][sparse] Print new syntax (#68130)
Printing changes from `#sparse_tensor.encoding<{ lvlTypes = [
"compressed" ] }>` to `map = (d0) -> (d0 : compressed)`. Level
properties, ELL and slice are also supported.
2023-10-04 16:36:05 -04:00
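The printing change in the commit above can be illustrated with a small formatter. This is only a sketch in Python, not the MLIR printer: `format_sparse_map` is a hypothetical name, the code assumes that level i stores dimension i (an identity mapping), and it does not cover level properties, ELL, or slices.

```python
def format_sparse_map(lvl_types):
    """Render level types in the `map = (dims) -> (levels)` style.

    Assumption: level i stores dimension i (identity mapping).
    """
    dims = [f"d{i}" for i in range(len(lvl_types))]
    levels = [f"{dim} : {lvl}" for dim, lvl in zip(dims, lvl_types)]
    return f"map = ({', '.join(dims)}) -> ({', '.join(levels)})"


# Prints: map = (d0) -> (d0 : compressed)
print(format_sparse_map(["compressed"]))
```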
Aart Bik
1964118ace
[mlir][sparse] fix codegen header ordering of methods into sections (#68175)
2023-10-04 08:52:26 -07:00
Andrzej Warzynski
8d6d4f8321
[mlir][ArmSME] Split the Op definition (nfc) (#67985)
Move the definitions of LLVM intrinsic Ops for ArmSME into a dedicated
file. To facilitate this, the dialect definition together with various
shared definitions are moved to ArmSME.td.

This change will allow us to refactor the ArmSME dialect documentation.
In particular, we will be able to categorise the Ops into "regular"  and
"intrinsic" ops. Also, it will be easier to add some custom
documentation as opposed to relying on auto-generated docs that simply
list the available Ops.

The documentation will be updated in a forthcoming patch. Only
non-functional changes.
2023-10-04 14:59:00 +00:00
Benjamin Maxwell
496318ad8d
[mlir][ArmSME] Lower vector.extract/insert on SME tiles to MOVA intrinsics (#67786)
This patch adds support for lowering vector.insert/extract of tile
slices or elements to ArmSME MOVA intrinsics.

This enables the following operations for ArmSME:
```
// Extract slice from tile:
%slice = vector.extract %tile[%row] 
                 : vector<[4]xi32> from vector<[4]x[4]xi32>
```
```
// Extract element from tile:
%el = vector.extract %tile[%row, %col]
                 : i32 from vector<[4]x[4]xi32>
```
```
// Insert slice into tile:
%new_tile = vector.insert %slice, %tile[%row]
                    : vector<[4]xi32> into vector<[4]x[4]xi32>
```
```
// Insert element into tile:
%new_tile = vector.insert %el, %tile[%row, %col]
                    : i32 into vector<[4]x[4]xi32>
```
2023-10-04 09:28:39 +01:00
Ingo Müller
9748f98116
[mlir][transform] Make variable names in interpreter consistent. (NFC) (#67800)
This commit renames the arguments of several static implementation
functions of the transform interpreter base class to match the names of
the corresponding member variables in order to clarify their intent.
Similarly, it renames some local variables to reflect their relationship
with corresponding member variables. Finally, this commit also asserts
in `interpreterBaseRunOnOperationImpl` that at most one of shared and
library module is set (which the initialization function guarantees)
and simplifies some related `if` conditions.
2023-10-04 09:53:48 +02:00
Guray Ozen
afe400620f
[MLIR] Use test-lower-to-nvvm for sm_90 Integration Tests on GitHub (#68184)
This PR enables `test-lower-to-nvvm` pass pipeline for the integration
tests for NVIDIA sm_90 architecture.

This PR adjusts `test-lower-to-nvvm` pass in two ways: 

1) Calls `createConvertNVGPUToNVVMPass` before the outlining process.
This particular pass is responsible for generating both device and host
code. On the host, it calls the CUDA driver to build the TMA descriptor
(`cuTensorMap`).

2) Integrates the `createConvertNVVMToLLVMPass` to generate PTXs for
NVVM Ops.
2023-10-04 09:50:48 +02:00
Guray Ozen
9d54ae862a
[mlir] Add opt-level to test-lower-to-nvvm Pipeline (#68183)
This PR adds the `opt-level` parameter to control code optimization for
NVIDIA GPU targets in the `test-lower-to-nvvm` pipeline.
2023-10-04 09:25:53 +02:00
Matthias Springer
8823e961f6
[mlir][ODS] Change get...Mutable to return OpOperand & for single operands (#66519)
The TableGen code generator now generates C++ code that returns a single
`OpOperand &` for `get...Mutable` of operands that are not variadic and
not optional. `OpOperand::set`/`assign` can be used to set a value (same
as `MutableOperandRange::assign`). This is safer than
`MutableOperandRange` because only single values (and no longer
`ValueRange`) can be assigned.

E.g.:
```
// Assignment of multiple values to non-variadic operand.
// Before: Compiles, but produces invalid op.
// After: Compilation error.
extractSliceOp.getSourceMutable().assign({v1, v2});
```
2023-10-04 08:35:40 +02:00
Kazu Hirata
3b34c117db
[llvm] Remove unused using decls (NFC)
Identified with misc-unused-using-decls.
2023-10-03 23:21:50 -07:00
Mehdi Amini
5fc28e7a8d
Improve MLIR Attribute::get() method efficiency by reducing the amount of argument copies (#68067)
This ensures that the proper std::forward/std::move are involved, we go from 6
copy-constructions to 0 (!) on Attribute creation in release builds.
2023-10-03 18:07:46 -07:00
Aart Bik
427f120f60
[mlir][sparse] minor edits in runtime lib Cpp files (#68165)
2023-10-03 16:28:54 -07:00
Peiming Liu
0083f8338c
[mlir][sparse] renaming sparse_tensor.sort_coo to sparse_tensor.sort (#68161)
Rationale: the operation does not always sort COO tensors (also used for
sparse_tensor.compress for example).
2023-10-03 16:28:25 -07:00
Shraiysh
f7d4f863be
[mlir][OpenMP] Added translation for omp.teams to LLVM IR (#68042)
This patch adds translation from `omp.teams` operation to LLVM IR using
OpenMPIRBuilder. The clauses are not handled in this patch.
2023-10-03 17:36:25 -05:00
Aart Bik
ddb86a378d
[mlir][sparse] removed unused using clauses in support lib (#68148)
2023-10-03 12:43:51 -07:00
Mehdi Amini
b82f3747da
Revert "Minor fixes on the MLIR ActionProfiler (NFC)"
This reverts commit 0502d83470.

This introduces a race condition on the printComma variable.
2023-10-03 12:15:35 -07:00
Aart Bik
980e024bf7
[mlir][sparse] minor edits to support lib files (#68137)
2023-10-03 11:40:54 -07:00
Jacques Pienaar
0a53005113
[mlir] Avoid common folder assuming all types are supported (#68054)
Previously this would just assume all conversions are possible and this
would crash. Use an in-tree testing case here.

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2023-10-03 10:41:58 -07:00
Peiming Liu
c3b01b4679
[mlir][sparse] unify lib/codegen rewriting rules for sparse tensor concatenation. (#68057)
2023-10-03 08:46:25 -07:00
Matthias Springer
173fd67a12
[mlir][scf][bufferize] Improve bufferization of allocs yielded from scf.for (#68089)
The `BufferizableOpInterface` implementation of `scf.for` currently
assumes that an OpResult does not alias with any tensor apart from the
corresponding init OpOperand. Newly allocated buffers (inside of the
loop) are also allowed. The current implementation checks whether the
respective init_arg and yielded value are equivalent. This is overly
strict and causes extra buffer allocations/copies when yielding a new
buffer allocation from a loop.
2023-10-03 16:08:50 +02:00
Matthias Springer
464dfeba44
[mlir][tensor][bufferize] tensor.empty bufferizes to an allocation (#68080)
Make `tensor.empty` bufferizable, so that the
`-empty-tensor-to-alloc-tensor` pass becomes optional. This makes the
bufferization easier to use. `tensor.empty` used to be non-bufferizable,
so that there were two separate ops, one that can be optimized away
(`tensor.empty`) and one that is guaranteed to bufferize to an
allocation (`bufferization.alloc_tensor`). With the recent improvements
of "empty tensor elimination" this is no longer needed and
`bufferization.alloc_tensor` can be phased out.
2023-10-03 16:00:37 +02:00
agozillon
1482106c99
[Flang][OpenMP][MLIR] Remove deletion of unused declare target global after use replacement (#67762)
At the moment, for device a reference pointer is generated in place of
the original declare target global value, this reference pointer is the
pointer that actually receives the data. In Clang the original global
value isn't generated for device, just the reference pointer.

Unfortunately for Flang/MLIR this is currently not the case, as the
declare target attribute is processed after the creation of the global
so we end up with a dead global on device effectively after rewriting
its uses to the new device reference pointer.

It appears I was a little overzealous with the deletion of the declare
target globals for device. The current method breaks in cases where the
same declare target global is used across two target regions (added a
runtime reproducer in the patch), as it will effectively delete the global
before the second target gets a chance to be written to LLVM IR and have its
uses rewritten.

I'd like to remove this deletion as the dead global isn't breaking any
code and will likely be removed in later dead code elimination passes.
The original approach was perhaps a little too heavy-handed.
2023-10-03 15:21:27 +02:00
Oleksandr "Alex" Zinenko
bc30b415ca
[mlir] enable python bindings for nvgpu transforms (#68088)
Expose the autogenerated bindings.

Co-authored-by: Martin Lücke <mluecke@google.com>
2023-10-03 14:52:52 +02:00
Oleksandr "Alex" Zinenko
f89b7a9ee2
[mlir] reword error message on unloaded dialect (#67980)
The previous message was confusing as it mentioned "registration" but
isn't in fact related to dialect registration. Use other words instead.
2023-10-03 11:28:32 +02:00
Guray Ozen
ee49cda7d4
[mlir][nvgpu] Use ImplicitLocOpBuilder in nvgpu-to-nvvm pass (NFC) (#67993)
For the sake of better readability, this PR uses `ImplicitLocOpBuilder`
instead of rewriter+loc
2023-10-03 10:52:36 +02:00
Jacques Pienaar
f1dbfcc14d
[mlir][py] Use overloads instead (NFC)
Was using a local, pseudo overload rather than just using an overload proper.
2023-10-02 21:17:49 -07:00
Mehdi Amini
0502d83470
Minor fixes on the MLIR ActionProfiler (NFC)
Ensure the stream is flushed to the string before acquiring the mutex.
No need to flush the output stream, the goal of the mutex is to sync
ahead before content is added to the stream.
2023-10-02 20:11:18 -07:00
Antonio Cortes Perez
f33afea260
[mlir] Add an Observer for profiling actions to a stream. (#67251)
The profile is stored in the Chrome trace event format.

Added the --profile-action-to=<file> option to mlir-opt.
2023-10-02 20:07:10 -07:00
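For readers unfamiliar with the Chrome trace event format mentioned in the commit above, the following Python sketch writes two events in its JSON array form. It is an illustration only: the event names are hypothetical, and the exact event phases and fields emitted by the MLIR observer are not shown in this log, so the use of complete events (`"ph": "X"`) here is an assumption.

```python
import json


def complete_event(name, start_us, duration_us, pid=0, tid=0):
    """Build one "complete" event (phase "X") in Chrome trace event format."""
    return {
        "name": name,
        "ph": "X",           # "X" marks a complete event with a duration
        "ts": start_us,      # start timestamp in microseconds
        "dur": duration_us,  # duration in microseconds
        "pid": pid,
        "tid": tid,
    }


# Hypothetical action names; a real profile would list the actions that ran.
events = [
    complete_event("pass-execution", 0, 120),
    complete_event("apply-pattern", 10, 35),
]
trace_json = json.dumps(events, indent=2)
print(trace_json)
```

A file with this content can be loaded in a trace viewer that understands the format, such as the one built into Chromium-based browsers.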
Peiming Liu
bc878f70fc
[mlir][sparse] unify lib/codegen rewriting rules for sparse tensor reshaping operations. (#68049)
2023-10-02 17:06:22 -07:00
Mehdi Amini
b97aaa72d9
Remove let constructor = from ArithExpandOpsPass definition (NFC)
Note that the `Pass` suffix is added in tablegen, and as a side effect the
options are renamed from `ArithExpandOpsOptions` to `ArithExpandOpsPassOptions`.
2023-10-02 15:54:22 -07:00
Mehdi Amini
b1c10dfd72
Fixup on ArithBufferizePass: add the Pass suffix in TableGen to ensure consistency of the generated code
2023-10-02 15:50:41 -07:00
Mehdi Amini
c1c56ae49e
Remove let constructor = from ArithBufferizePass and rely on TableGen to generate the glue (NFC)
2023-10-02 15:41:16 -07:00
Georgios Pinitas
363c617aac
[mlir][tosa] Align shift attribute of TOSA_MulOp with the spec (#67816)
According to the specification, the `shift` attribute of the Mul operator in
TOSA is of signless i8 type instead of i32.

Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
2023-10-02 15:02:16 -07:00
Maksim Levental
d7e49736e6
[mlir][CAPI, python bindings] Expose Operation::setSuccessor (#67922)
This is useful for emitting (using the python bindings) `cf.br` to
blocks that are declared lexically post block creation.
2023-10-02 15:37:25 -05:00
Andrzej Warzynski
811b05c4ef
[mlir][ArmSME] Remove dependency on a non-existing target (nfc)
Sending this one without a review - as `MLIRArmSMEIncGen` is not defined
anywhere, dependency on that target is clearly bogus.
2023-10-02 20:34:48 +00:00
Yinying Li
2cb99df609
[mlir][sparse] Fix typos (#67859)
2023-10-02 11:07:38 -04:00
Yinying Li
d2e8517912
[mlir][sparse] Update Enum name for CompressedWithHigh (#67845)
Change CompressedWithHigh to LooseCompressed.
2023-10-02 11:06:40 -04:00
Oleksandr "Alex" Zinenko
aab795a8dc
[mlir] run buffer deallocation in transform tutorial (#67978)
The buffer deallocation pipeline was previously incorrect when applied to
functions. It has since been fixed. Make sure it is exercised in the
tutorial to avoid leaking allocations.
2023-10-02 16:08:11 +02:00
Matthias Springer
c95fcd343d
[mlir][bufferization] Remove resolveUsesInRepetitiveRegions (#67927)
The bufferization analysis has been improved over the last months and
this workaround is no longer needed.
2023-10-02 16:04:27 +02:00
Matthias Springer
43198b0aa2
[mlir][bufferization] Better analysis around allocs and block arguments (#67923)
Values that are the result of buffer allocation ops are guaranteed to
*not* be the same allocation as block arguments of containing blocks.
This fact can be used to allow for more aggressive simplification of
`bufferization.dealloc` ops.
2023-10-02 11:01:12 +02:00