9493 Commits

Author SHA1 Message Date
Nicolas Vasilache
02d27eac0f [mlir][Linalg] NFC - Simplify transform script 2023-09-14 15:32:35 +02:00
Matthias Springer
4f63252d5d
[mlir][transform] Fix crash when op is erased during transform.foreach (#66357)
Fixes a crash when an op, that is mapped to handle that a
`transform.foreach` iterates over, was erased (through the
`TrackingRewriter`). Erasing an op removes it from all mappings and
invalidates iterators. This is already taken care of when an op is
iterating over payload ops in its `apply` method, but not when another
transform op is erasing a tracked payload op.
2023-09-14 14:59:36 +02:00
Martin Erhart
942ce31985
[mlir][bufferization] BufferDeallocationOpInterface: support custom ownership update logic (#66350)
Add a method to the BufferDeallocationOpInterface that allows operations to implement the interface and provide custom logic to compute the ownership indicators of values it defines. As a demonstrating example, this new method is implemented by the `arith.select` operation.
2023-09-14 14:34:04 +02:00
Martin Erhart
8160bce969
[mlir][bufferization][NFC] Introduce BufferDeallocationOpInterface (#66349)
This new interface allows operations to implement custom handling of ownership values and insertion of dealloc operations which is useful when an op cannot implement the interfaces supported by default by the buffer deallocation pass (e.g., because they are not exactly compatible or because there are some additional semantics to it that would render the default implementations in buffer deallocation invalid, or because no interfaces exist for this
kind of behavior and it's not worth introducing one plus a default implementation in buffer deallocation). Additionally, it can also be used to provide more efficient handling for a specific op than the interface based default
implementations can.
2023-09-14 13:58:30 +02:00
Felix Schneider
c336a06144 [mlir] [memref] Fix alignment bug in memref.copy lowering
memref.copy gets lowered to a function call sometimes, this function
is passed the element size of the memref in bytes as an argument.
The element size passed to the copyMemRef() function call can be
miscalculated if the LLVM IR uses aligned access to the memory.

This can be fixed by using llvm.getelementptr to calculate the element
size natively. This is also done in the other lowering path that lowers
to an intrinsic.

Fix https://github.com/llvm/llvm-project/issues/64072

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D156126
2023-09-14 13:18:12 +02:00
Felix Schneider
55088efe06 [mlir][memref] Fix MemrefToLLVM lowering pattern for memref.transpose
The lowering pattern to LLVM for memref.transpose has a bug where
instead of transposing from (source) -> (dest) it actually transposes
(dest) -> (source). This patch fixes the bug and updates the test.

Fix https://github.com/llvm/llvm-project/issues/65145

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D159290
2023-09-14 13:12:55 +02:00
Martin Erhart
01334d1abb
[mlir][bufferization] Add an ownership based buffer deallocation pass (#66337)
Add a new Buffer Deallocation pass with the intend to replace the old
one. For now it is added as a separate pass alongside in order to allow
downstream users to migrate over gradually. This new pass has the goal
of inserting fewer clone operations and supporting additional use-cases.
Please refer to the Buffer Deallocation section in the updated
Bufferization.md file for more information on how this new pass works.
2023-09-14 12:13:37 +02:00
Sergio Afonso
9058762789
[OpenMP][Flang][MLIR] Lowering of requires directive from MLIR to LLVM IR
Default atomic ordering information is processed in the OpenMP dialect
to LLVM IR lowering stage at every spot where an operation can be
affected by it. The rest of clauses are stored globally in the
OpenMPIRBuilderConfig object before starting that lowering stage, so
that the OMPIRBuilder can conditionally modify code generation
depending on these. At the end of the process, the omp.requires
attribute is itself lowered into a global constructor that passes these
clauses as flags to the OpenMP runtime.

Depends on D147217, D147218 and D158278.

Differential Revision: https://reviews.llvm.org/D147219
2023-09-14 10:35:44 +01:00
vic
a9d0f5e2f0
[mlir] Allow loop-like operations in AbstractDenseForwardDataFlowAnalysis (#66179)
Remove assertion violated by loop-like operations.

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
2023-09-14 10:30:40 +02:00
Cullen Rhodes
f75d46a7ec
[mlir][ArmSME] Lower vector.outerproduct to FMOPA/BFMOPA (#65621)
This patch adds support for lowering vector.outerproduct to the ArmSME
MOPA intrinsic for the following types:

  vector<[8]xf16>,  vector<[8]xf16>  -> vector<[8]x[8]xf16>
  vector<[8]xbf16>, vector<[8]xbf16> -> vector<[8]x[8]xbf16>
  vector<[4]xf32>,  vector<[4]xf32>  -> vector<[4]x[4]xf32>
  vector<[2]xf64>,  vector<[2]xf64>  -> vector<[2]x[2]xf64>

The FP variants are lowered to FMOPA (non-widening) [1] and BFloat to
BFMOPA
(non-widening) [2].

Note at the ISA level these variants are implemented by different
architecture features, these are listed below:

  FMOPA (non-widening)
    * half-precision   - +sme2p1,+sme-f16f16
    * single-precision - +sme
    * double-precision - +sme-f64f64
  BFMOPA (non-widening)
    * half-precision   - +sme2p1,+b16b16

There's currently no way to target different features when lowering to
ArmSME. Integration tests are added for F32 and F64. We use QEMU to run
the integration tests but SME2 support isn't available yet, it's
targeted for 9.0, so integration tests for these variants excluded.

Masking is currently unsupported.

Depends on #65450.

[1] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/FMOPA--non-widening---Floating-point-outer-product-and-accumulate-
[2] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/BFMOPA--non-widening---BFloat16-floating-point-outer-product-and-accumulate-
2023-09-14 08:31:52 +01:00
Aart Bik
0f65df732c
[mlir][sparse] remove the MLIR PyTACO tests (#66302)
Rationale:

This test was really fun to compare the MLIR sparsifier with TACO using
the PyTACO format. However, the underlying mechanism is rapidly growing
outdated with our recent developments. Rather than maintaining the old
code, we are moving toward the newer, better approaches. So if you are
sad this is gone, stay tuned, something better is coming!
2023-09-13 15:54:49 -07:00
Daniil Dudkin
8f5d519458 [mlir][vector] Implement Workaround Lowerings for Masked fm**imum Reductions
This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.

Within LLVM, there are no masked reduction counterparts for vector reductions such as `fmaximum` and `fminimum`.
More information can be found here: https://github.com/llvm/llvm-project/issues/64940#issuecomment-1690694156.

To address this issue in MLIR, where we need to generate appropriate lowerings for these cases, we employ regular non-masked intrinsics.
However, we modify the input vector using the `arith.select` operation to effectively deactivate undesired elements using a "neutral mask value".
The neutral mask value is the smallest possible value for the `fmaximum` reduction and the largest possible value for the `fminimum` reduction.

Depends on D158618

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D158773
2023-09-13 22:49:08 +00:00
Daniil Dudkin
709b27427b [mlir][vector] Bring back maxf/minf reductions
This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.

In line with the mentioned RFC, this patch  tackles tasks 2.3 and 2.4.
It adds LLVM conversions for the `maxf`/`minf` reductions to the non-NaN-propagating LLVM intrinsics.

Depends on D158618

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D158659
2023-09-13 22:49:07 +00:00
Daniil Dudkin
4a831250b8 [mlir][vector] Rename vector reductions: maxfmaximumf, minfminimumf
This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.

Here, we are addressing task 2.1 from the plan, which involves renaming the vector reductions to align with the semantics of the corresponding LLVM intrinsics.

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D158618
2023-09-13 22:49:07 +00:00
Aart Bik
9918d2556c
[mlir][sparse] remove sparse output python example (#66298)
Rationale:
This was actually just a pure "string based" test
with very little actual python usage. The output
sparse tensor was handled via the deprecated
convertFromMLIRSparseTensor method.
2023-09-13 15:11:35 -07:00
Daniil Dudkin
c46a04339a
[mlir][arith] Rename AtomicRMWKind's maxfmaximumf, minfminimumf (#66135)
This patch is part of a larger initiative aimed at fixing floating-point
`max` and `min` operations in MLIR:
https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.

This commit renames `maxf` and `minf` enumerators of `AtomicRMWKind`
to better reflect the current naming scheme and the goals of the RFC.
2023-09-14 01:09:37 +03:00
Markus Böck
be59265bbd
[mlir] Make extraClassOf compile with attribute and type interfaces (#66292)
Using `extraClassOf` currently does not work with attribute or type
interfaces as the generated code calls `getInterfaceFor`, a private
method of `AttributeInterface` and `TypeInterface` respectively.

This PR fixes this by applying the same change that has been done to
`OpInterface` in the past: Make `getInterfaceFor` a protected member of
the class, allowing the generated code in subclasses to use it.

An attribute and type interface with `extraClassOf` have been added as
interfaces in the test dialect to ensure it compiles without errors.
2023-09-13 23:38:39 +02:00
Jakub Kuderski
d6d4a526f4
[mlir][spirv][gpu] Clean up wmma to coop matrix NV conversion. NFC. (#66278)
This is a cleanup in preparation for adding a second conversion path
using the KHR cooperative matrix extension.

Make the existing lowering explicit about emitting ops from the NV coop
matrix extension. Clean up surrounding code.
2023-09-13 15:51:26 -04:00
Peiming Liu
098f46dce3
[sparse] allow unpack op to return 0-ranked tensor type. (#66269)
Many frontends canonicalize scalar into 0-ranked tensor, it change will
hopefully make the operation easier to use for those cases.
2023-09-13 11:33:01 -07:00
frgossen
1cddbf8cf5
Revert Add host-supports-nvptx requirement to lit tests (#66102 and #66129) (#66225) 2023-09-13 12:20:38 -04:00
Krzysztof Drewniak
df852599f3 [mlir] Split up VectorToLLVM pass
Currently, the VectorToLLVM patterns are built into a library along
with the corresponding pass, which also pulls in all the
platform-specific vector dialects (like AMXDialect) to apply all the
vector to LLVM conversions.

This causes dependency bloat when writing libraries - for example the
GPU to LLVM passes, which use the vector to LLVM patterns, don't need
the X86Vector dialect to be present at all.

This commit partitions the library into VectorToLLVM and
VectorToLLVMPass, where the latter pulls in all the other vector
transformations.

Reviewed By: nicolasvasilache, mehdi_amini

Differential Revision: https://reviews.llvm.org/D158287
2023-09-13 16:09:56 +00:00
Yinying Li
dbe1be9aa4
[mlir][sparse] Migrate tests to use new syntax (#66146)
lvlTypes = [ "compressed" ] to map = (d0) -> (d0 : compressed)
lvlTypes = [ "dense" ] to map = (d0) -> (d0 : dense)
2023-09-13 11:41:25 -04:00
lorenzo chelini
d65885ae63
[MLIR][Linalg] Bail out if the tiles provided are more than the number (#66007)
Currently, the compiler crashes if the number of tiles provided exceeds
the number of loops.
2023-09-13 10:41:03 -04:00
Martin Erhart
c199f7dc62 Revert "[mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute"
This reverts commit 6a91dfedeb956dfa092a6a3f411e8b02f0d5d289.

This caused problems in downstream projects. We are reverting to give
them more time for integration.
2023-09-13 13:53:48 +00:00
Martin Erhart
94c8faeaff Revert "[mlir][bufferization] Update linalg integration tests to lower ops created by bufferization-to-memref pass"
This reverts commit 6f35401f867314d1b8c5e7e2c747d6b42b75d1d5.

This caused problems in downstream projects. We are reverting to give
them more time for integration.
2023-09-13 13:53:48 +00:00
Martin Erhart
520407a7c8 Revert "[mlir][bufferization] Improve buffer deallocation pass"
This reverts commit 1bebb60a7565e5197d23120528f544b886b4d905.

This caused problems in downstream projects. We are reverting to give
them more time for integration.
2023-09-13 13:53:48 +00:00
Martin Erhart
792caac0f8 Revert "[mlir][bufferization][NFC] Introduce BufferDeallocationOpInterface"
This reverts commit 29d86175e6a5fe956147734229dca88822415b21.

This caused problems in downstream projects. We are reverting to give
them more time for integration.
2023-09-13 13:53:47 +00:00
Martin Erhart
9782232ec7 Revert "[mlir][bufferization] BufferDeallocationOpInterface: support custom ownership update logic"
This reverts commit 89117f1807e5ac8db46295e977f02899e8ee8a56.

This caused problems in downstream projects. We are reverting to give
them more time for integration.
2023-09-13 13:53:47 +00:00
Martin Erhart
ccb16acd46 Revert "[mlir][bufferization] Implement BufferDeallocationopInterface for scf.forall.in_parallel"
This reverts commit 1356e853d47723c1be6eee2368d95c514a1816d1.

This caused problems in downstream projects. We are reverting to give
them more time for integration.
2023-09-13 13:53:47 +00:00
Martin Erhart
7995a4701d Revert "[mlir][bufferization] Define a pipeline for buffer deallocation"
This reverts commit f0c46639429768e46e1882036be7c2ddb712a985.

This caused problems in downstream projects. We are reverting to give
them more time for integration.
2023-09-13 13:53:47 +00:00
Guray Ozen
ba81cd10d6
[MLIR] Fix the tma_load test (#66208)
clang was used for local testing. The PR changes it to `mlir-cpu-runner`
2023-09-13 15:33:42 +02:00
Guray Ozen
38faecc692
[MLIR] SM_90 integration test of TMA with 128b Swizzling (#65953)
An integration test for the 128b Swizzling TMA.

TMA with 128B Swizzle loads data as follows (each numbered cell is 16
bytes). The program tests this pattern for `128x64xf16` type.

```
|-------------------------------|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 1 | 0 | 3 | 2 | 5 | 4 | 7 | 6 |
| 2 | 3 | 0 | 1 | 6 | 7 | 4 | 5 |
| 3 | 2 | 1 | 0 | 7 | 6 | 5 | 4 |
| 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 |
| 5 | 4 | 7 | 6 | 1 | 0 | 3 | 2 |
| 6 | 7 | 4 | 5 | 2 | 3 | 0 | 1 |
|-------------------------------|
| ... pattern repeats ...       |
|-------------------------------|
```
2023-09-13 15:22:48 +02:00
Guray Ozen
6360d09531
[MLIR][NVVM] Fix the register number of predicate (#65970)
The register number of predicate is calculated incorrectly. This PR
fixes that.
2023-09-13 15:04:31 +02:00
David Truby
a6857156df
[mlir] Add pass to add comdat to all linkonce functions (#65270)
This adds a new pass to add an Any comdat to each linkonce
and linkonce_odr function in the LLVM dialect. These comdats are
necessary on Windows
to allow the default system linker to link binaries containing these
functions.
2023-09-13 13:05:05 +01:00
Martin Erhart
f0c4663942 [mlir][bufferization] Define a pipeline for buffer deallocation
Since buffer deallocation requires a few passes to be run in a somewhat fixed
sequence, it makes sense to have a pipeline for convenience (and to reduce the
number of transform ops to represent default deallocation).

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D159432
2023-09-13 09:30:24 +00:00
Martin Erhart
1356e853d4 [mlir][bufferization] Implement BufferDeallocationopInterface for scf.forall.in_parallel
The scf.forall.in_parallel terminator operation has a nested graph region with
the NoTerminator trait. Such regions are not supported by the default
implementations. Therefore, this commit adds a specialized implementation for
this operation which only covers the case where the nested region is empty.
This is because after bufferization, ops like tensor.parallel_insert_slice were
already converted to memref operations residing int the scf.forall only and the
nested region of scf.forall.in_parallel ends up empty.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D158979
2023-09-13 09:30:24 +00:00
Martin Erhart
89117f1807 [mlir][bufferization] BufferDeallocationOpInterface: support custom ownership update logic
Add a method to the BufferDeallocationOpInterface that allows operations to
implement the interface and provide custom logic to compute the ownership
indicators of values it defines. As a demonstrating example, this new method is
implemented by the `arith.select` operation.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D158828
2023-09-13 09:30:23 +00:00
Martin Erhart
29d86175e6 [mlir][bufferization][NFC] Introduce BufferDeallocationOpInterface
This new interface allows operations to implement custom handling of ownership
values and insertion of dealloc operations which is useful when an op cannot
implement the interfaces supported by default by the buffer deallocation pass
(e.g., because they are not exactly compatible or because there are some
additional semantics to it that would render the default implementations in
buffer deallocation invalid, or because no interfaces exist for this kind of
behavior and it's not worth introducing one plus a default implementation in
buffer deallocation). Additionally, it can also be used to provide more
efficient handling for a specific op than the interface based default
implementations can.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D158756
2023-09-13 09:30:23 +00:00
Martin Erhart
1bebb60a75 [mlir][bufferization] Improve buffer deallocation pass
Add a new Buffer Deallocation pass replacing the old one with the goal of
inserting fewer clone operations and supporting additional use-cases.
Please refer to the Buffer Deallocation section in the updated
Bufferization.md file for more information on how this new pass works.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D158421
2023-09-13 09:30:23 +00:00
Martin Erhart
6f35401f86 [mlir][bufferization] Update linalg integration tests to lower ops created by bufferization-to-memref pass
This commit prepares the linalg integration tests to be run with the new
BufferDeallocation pass which requires the bufferization-to-memref pass
to be run afterwards.
The bufferization-to-memref pass may create ops of the SCF, Func, Arith,
and MemRef dialects. Currently, not all integration tests execute all the
conversion passes necessary to lower these dialects to LLVM.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D156663
2023-09-13 09:30:22 +00:00
Martin Erhart
6a91dfedeb [mlir][bufferization] Remove allow-return-allocs and create-deallocs pass options, remove bufferization.escape attribute
This is the first commit in a series with the goal to rework the
BufferDeallocation pass. Currently, this pass heavily relies on copies
to perform correct deallocations, which leads to very slow code and
potentially high memory usage. Additionally, there are unsupported cases
such as returning memrefs which this series of commits aims to add
support for as well.

This first commit removes the deallocation capabilities of
one-shot-bufferization.One-shot-bufferization should never deallocate any
memrefs as this should be entirely handled by the buffer-deallocation pass
going forward. This means the allow-return-allocs pass option will
default to true now, create-deallocs defaults to false and they, as well
as the escape attribute indicating whether a memref escapes the current region,
will be removed.

The documentation should w.r.t. these pass option changes should also be
updated in this commit.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D156662
2023-09-13 09:30:22 +00:00
Peiming Liu
64df1c08d0
[sparse] allow unpack op to return any integer type. (#66161) 2023-09-12 17:27:51 -07:00
Jakub Kuderski
4c4bdf0c3a
[mlir][spirv] Fix remaining coop matrix verification corner cases (#66137)
- Check `MakePointer*` load/store attribute values.
- Support coop matrix types in `MatrixTimesScalar` verification.
- Add test cases for all the remaining ops that accept coop matrix types.
- Split NV and KHR tests.
2023-09-12 17:04:03 -04:00
frgossen
1c5161911c
Add host-supports-nvptx requirement to lit tests (#66129) 2023-09-12 15:18:29 -04:00
Yunlong Liu
af562fd27f
Splits cleanup block lowered by AsyncToAsyncRuntime. (#66123)
Splits the cleanup block lowered from AsyncToAsyncRuntime.

The incentive of this change is to clarify the CFG branched by
`async.coro.suspend`.

The `async.coro.suspend` op branches into 3 blocks, depending on the
state of the coroutine:
1) suspend
2) resume
3) cleanup

The behavior before this change is that after the coroutine is resumed
and completed, it will jump to a shared cleanup block for destroying the
states of coroutines. The CFG looks like the following,

Entry block
        |                    \
   resume             |
        |                    |
            Cleanup
                   |
                End

This CFG can potentially be problematic, because the `Cleanup` block is
a shared block and it is not dominated by `resume`. For instance, if
some pass wants to add some specific cleanup mechanism to resume, it can
be confused and add them to the shared `Cleanup`, which leads to the
"operand not dominate its use" error because of the existence of the
other "Entry->cleanup" path.

After this change, the CFG will look like the following,

The overall structure of the lowered CFG can be the following,

  Entry (calling async.coro.suspend)
       |                    \
  Resume           Destroy (duplicate of Cleanup)
       |                     |
  Cleanup             |
       |                    /
     End (ends the corontine)

In this case, the Cleanup block tied to the Resume block will be
isolated from the other path and it is strictly dominated by "Resume".
2023-09-12 11:42:16 -07:00
frgossen
a3b894287f
Add host-supports-nvptx requirement to lit tests (#66102) 2023-09-12 12:21:36 -04:00
Jakub Kuderski
08425def7d
[mlir][spirv] Improve coop matrix attribute handling (#66020)
- Fix values of Matrix Operand bit enums.
- Add verification for the aligned Memory Operand attributes. Mark the
'Aligned' enumerant as not supported.

The target test passes validation with `spirv-val`.
2023-09-12 11:05:08 -04:00
Matthias Springer
91464e1d6a
[mlir][bufferization][NFC] Rename copy_tensor op to materialize_in_destination (#65467)
The previous name was badly chosen. The op is used to ensure that a
computation materializes in the future buffer of a certain tensor.
2023-09-12 15:20:41 +02:00
Andrzej Warzyński
22f96ab6fb
[mlir][vector] Refine vector.transfer_read hoisting/forwarding (#65770)
Make sure that when analysing a `vector.transfer_read` that's a
candidate for either hoisting or store-to-load forwarding,
`memref.collapse_shape` Ops are correctly included in the alias
analysis. This is done by either
* making sure that relevant users are taken into account, or
* source Ops are correctly identified.
2023-09-12 10:33:58 +01:00
Mehdi Amini
6ba42ae0eb
Fix some AffineOps to properly declare their inherent affinemap Attribute (#66050)
This makes it more consistent with properties.
2023-09-12 02:13:50 -07:00