Commit Graph

18719 Commits

Author SHA1 Message Date
Kai Sasaki
076953d250 [mlir] Skip invalid test on big endian platform (s390x) (#80246)
The buildbot test running on s390x platform keeps failing since [this
time](https://lab.llvm.org/buildbot/#/builders/199/builds/31136). This
is because of the dependency on the endianness of the platform. It
expects the format invalid in the big endian platform (s390x). We can
simply skip it.

See: https://discourse.llvm.org/t/mlir-s390x-linux-failure/76695
(cherry picked from commit 65ac8c16e028b23b49fd6b03817faa1ab6c0229d)
2024-02-13 11:39:15 -08:00
Andrei Golubev
0680e84a3f [mlir] Revert to old fold logic in IR::Dialect::add{Types, Attributes}() (#79582)
Fold expressions on Clang are limited to 256 elements. This causes
compilation errors in cases when the amount of elements added exceeds
this limit. Side-step the issue by restoring the original trick that
would use the std::initializer_list. For the record, in our downstream
Clang 16 gives:

mlir/include/mlir/IR/Dialect.h:269:23: fatal error: instantiating fold
expression with 688 arguments exceeded expression nesting limit of 256
    (addType<Args>(), ...);

Partially reverts 26d811b3ec.

Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
(cherry picked from commit e3a38a75ddc6ff00301ec19a0e2488d00f2cc297)
2024-01-29 15:10:47 -08:00
Andrei Golubev
3df71e5a3f [mlir][LLVM] Use int32_t to indirectly construct GEPArg (#79562)
GEPArg can only be constructed from int32_t and mlir::Value. Explicitly
cast other types (e.g. unsigned, size_t) to int32_t to avoid narrowing
conversion warnings on MSVC. Some recent examples of such are:

```
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'size_t' to 'T' requires a narrowing
conversion
    with
    [
        T=mlir::LLVM::GEPArg
    ]

mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'unsigned int' to 'T' requires a narrowing
conversion
    with
    [
        T=mlir::LLVM::GEPArg
    ]
```

Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
(cherry picked from commit 89cd345667a5f8f4c37c621fd8abe8d84e85c050)
2024-01-29 10:29:59 -08:00
Mirko Brkušanin
ed48280f8e [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-26 20:01:08 -08:00
Jeff Niu
df1e01b316
[mlir] Add example of printAlias to test dialect (NFC) (#79232)
Follow-up from previous pull request. Motivate the API change with an
attribute that decides between sugaring a sub-attribute or using an
alias
2024-01-23 16:29:57 -08:00
Krzysztof Drewniak
750e90e440
[mlir][ArithToAMDGPU] Add option for saturating truncation to fp8 (#74153)
Many machine-learning applications (and most software written at AMD)
expect the operation that truncates floats to 8-bit floats to be
saturatinng. That is, they expect `truncf 256.0 : f32 to f8E4M3FNUZ` to
yield `240.0`, not `NaN`, and similarly for negative numbers. However,
the underlying hardware instruction that can be used for this truncation
implements overflow-to-NaN semantics.

To enable handling this usecase, we add the saturate-fp8-truncf option
to ArithToAMDGPU (off by default), which causes the requisite clamping
code to be emitted. Said clamping code ensures that Inf and NaN are
passed through exactly (and thus trancate to NaN).

Per review feedback, this commit efactors
createScalarOrSplatConstant() to the Arith dialect utilities and uses
it in this code. It also fixes naming of existing patterns and
switches from vector.extractelement/insertelement to
vector.extract/insert.
2024-01-23 16:52:21 -06:00
Aart Bik
575568de41
[mlir][sparse] adjust compression scheme for example (#79212) 2024-01-23 14:51:46 -08:00
Kunwar Grover
9261ab708e
[mlir][Target] Teach dense_resource conversion to LLVMIR Target (#78958)
This patch adds support for translating dense_resource attributes to
LLVMIR Target.
The support added is similar to how DenseElementsAttr is handled, except
we
don't need to handle splats.

Another possible way of doing this is adding iteration on
dense_resource, but that is
non-trivial as DenseResourceAttr is not meant to be something you should
directly
access. It has subclasses which you are supposed to use to iterate on
it.
2024-01-23 13:30:34 -08:00
Krzysztof Drewniak
80fcc9247a
[mlir][AMDGPU] Actually update the default ABI version, add comments (#79185)
Much confusion occurred earlier today when updating the fallback `int
abi;` in addControlVariables() didn't do anything. THis was because that
that value is the fallback for if the ABI version fails to parse ...
which it always should, because it has a default value that comes from
multiple different places.

This commit updates all the places said default variable can come from,
namely:
1. The ROCDL target attribute definition
2. The ROCDL target attribute's builders
3. The rocdl-attach-target pass's default option values.

With this, the printf test is passing.
2024-01-23 12:16:18 -06:00
Saiyedul Islam
9edd1c4dae
[MLIR][AMDGPU] Switch to code object version 5 (#79144)
As AMDGPU backend has moved to cov5 as default, mlir should also switch
to it.
2024-01-23 20:30:44 +05:30
Saiyedul Islam
d2398cca6f
Restore: [mlir][ROCDL] Stop setting amdgpu-implicitarg-num-bytes (#79129)
This patch restores PR#78498
2024-01-23 18:48:39 +05:30
Saiyedul Islam
082f87c9d4
[AMDGPU] Change default AMDHSA Code Object version to 5 (#79038)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Corresponding llvm-objdump AMDGPU lit tests are updated
in a follow-up PR.
2024-01-23 17:08:18 +05:30
Mehdi Amini
d4933b3241 Apply clang-tidy fixes for readability-identifier-naming in PolynomialApproximation.cpp (NFC) 2024-01-22 17:34:56 -08:00
Mehdi Amini
acf2f24ac3 Apply clang-tidy fixes for llvm-else-after-return in LLVMDialect.cpp (NFC) 2024-01-22 17:34:56 -08:00
Mehdi Amini
2e0909025e Apply clang-tidy fixes for readability-simplify-boolean-expr in Vectorization.cpp (NFC) 2024-01-22 17:34:55 -08:00
Mehdi Amini
3af5ab21b8 Apply clang-tidy fixes for readability-identifier-naming in Transforms.cpp (NFC) 2024-01-22 17:34:55 -08:00
Mehdi Amini
c0fe2b8963 Apply clang-tidy fixes for modernize-loop-convert in Transforms.cpp (NFC) 2024-01-22 17:34:55 -08:00
Andrzej Warzynski
e3172e8418 [mlir] Update "UNSUPPORTED" directive in a test
Add missing "arm64" target to the list of unsupported targets ("arm64"
is used on Darwin).
2024-01-22 20:04:19 +00:00
Valentin Clement (バレンタイン クレメン)
3eb4178b9c
[mlir][openacc] Update acc.loop to be a proper loop like operation (#67355)
The initial design of the `acc.loop` was to be an operation that
encapsulates a loop like operation. This was an early design and we now
want to change it so the `acc.loop` operation becomes a real loop-like
operation by implementing the LoopLikeInterface.

Differential Revision: https://reviews.llvm.org/D159229

This patch is just moved from Phabricator to github
2024-01-22 10:31:29 -08:00
Valentin Clement (バレンタイン クレメン)
ee6199ca3c
[mlir][openacc][NFC] Cleanup hasOnly functions for device_type support (#78800)
Just a cleanup for all the `has.*Only()` function to avoid code
duplication
2024-01-22 08:40:52 -08:00
Valentin Clement (バレンタイン クレメン)
b5df6a90f5
[mlir][openacc] Fix num_gang parser (#78792)
Nb of operand per segment is not correctly computed.
2024-01-22 08:40:33 -08:00
Andrzej Warzynski
160ddf7114 [mlir] Remove duplicate test
The removed test is identical to the one directly above.
2024-01-22 15:51:56 +00:00
Andrzej Warzynski
75b0c913a5 [mlir][nfc] Update comments
1. Updates and clarifies a few comments related to hooks for
   vector.{insert|extract}_strided_slice.

2. For consistency with vector.insert_strided_slice, removes a TODO from
   vector.extract_strided_slice Op def. It's self-explenatory that
   adding support for non-unit strides is a "TODO".
2024-01-22 14:25:27 +00:00
Jie Fu
3c94154c86 [mlir] Fix -Wunused-variable in Barvinok.cpp (NFC)
llvm-project/mlir/lib/Analysis/Presburger/Barvinok.cpp:262:21:
 error: unused variable 'd' [-Werror,-Wunused-variable]
  for (const Point &d : ds)
                    ^
1 error generated.
2024-01-22 20:12:39 +08:00
Benjamin Maxwell
e280c287e4
[mlir] Add mlir_arm_runner_utils library for use in integration tests (#78583)
This adds a new `mlir_arm_runner_utils` library that contains utils
specific to Arm/AArch64. This is for use in MLIR integration tests.

This initial patch adds `setArmVLBits()` and `setArmSVLBits()`. This
allows changing vector length or streaming vector length at runtime (or
setting it to a known minimum, i.e. 128-bits).
2024-01-22 09:28:13 +00:00
Cullen Rhodes
9f7fff7f13
[mlir][ArmSME] Add arith-to-arm-sme conversion pass (#78197)
Existing 'arith::ConstantOp' conversion and tests are moved from
VectorToArmSME. There's currently only a single op that's converted at
the moment, but this will grow in the future as things like in-tile add
are implemented. Also, 'createLoopOverTileSlices' is moved to ArmSME
utils since it's relevant for both conversions.
2024-01-22 09:23:11 +00:00
Abhinav271828
68a5261d26
[MLIR][Presburger] Implement function to evaluate the number of terms in a generating function. (#78078)
We implement `computeNumTerms()`, which counts the number of terms in a
generating function by substituting the unit vector in it.
This is the main function in Barvinok's algorithm – the number of points
in a polytope is given by the number of terms in the generating function
corresponding to it.
We also modify the GeneratingFunction class to have `const` getters and
improve the simplification of QuasiPolynomials.
2024-01-22 14:22:01 +05:30
Dominik Adamski
21199f9842
[OpenMP][OMPIRBuilder] Fix LLVM IR codegen for collapsed device loop (#78708)
When we generate the loop body function, we need to be sure, that all
original loop counters are replaced by the new counter.

We need to save all items which use the original loop counter and then
perform replacement of the original loop counter. If we don't do it,
there is a risk that some values are not updated.
2024-01-22 09:24:45 +01:00
Durgadoss R
aa4547fcc8
[MLIR][NVVM] Update cp.async.bulk Ops to use intrinsics (#78900)
This patch updates the cp.async.bulk.{commit/wait}_group Ops to use NVVM
intrinsics.
* Doc updated for the commit_group Op.
* Tests are added to verify the lowering to the intrinsics.

While we are there, fix the FileCheck directive on the
'nvvm.setmaxregister' test.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-01-22 08:39:30 +01:00
Guray Ozen
12c241b365
[MLIR][NVVM] Explicit Data Type for Output in wgmma.mma_async (#78713)
The current implementation of `nvvm.wgmma.mma_async` Op deduces the data
type of the output matrix from the data type of struct member, which can be
non-intuitive, especially in cases where types like `2xf16` are packed
into `i32`.

This PR addresses this issue by improving the Op to include an explicit
data type for the output matrix.

The modified Op now includes an explicit data type for Matrix-D (<f16>),
and looks as follows:

```
%result = llvm.mlir.undef : !llvm.struct<(struct<(i32, i32, ...
nvvm.wgmma.mma_async
    %descA, %descB, %result,
    #nvvm.shape<m = 64, n = 32, k = 16>,
    D [<f16>, #nvvm.wgmma_scale_out<zero>],
    A [<f16>, #nvvm.wgmma_scale_in<neg>, <col>],
    B [<f16>, #nvvm.wgmma_scale_in<neg>, <col>]
```
2024-01-22 08:37:20 +01:00
Guray Ozen
21830c9135
[mlir][nvgpu] Fix 'warpgroup.mma.store' index calculation (#78413)
This PR fixes the 'nvgpu.warpgroup.mma.store' index calculation. When
the destionation memref and current accumulator matrix were small, the
previous code was reaching out of range.
2024-01-22 08:32:56 +01:00
Matthias Springer
fbb62d449c
[mlir][bufferization] Buffer deallocation: Make op preconditions stricter (#75127)
The buffer deallocation pass checks the IR ("operation preconditions")
to make sure that there is no IR that is unsupported. In such a case,
the pass signals a failure.

The pass now rejects all ops with unknown memory effects. We do not know
whether such an op allocates memory or not. Therefore, the buffer
deallocation pass does not know whether a deallocation op should be
inserted or not.

Memory effects are queried from the `MemoryEffectOpInterface` interface.
Ops that do not implement this interface but have the
`RecursiveMemoryEffects` trait do not have any side effects (apart from
the ones that their nested ops may have).

Unregistered ops are now rejected by the pass because they do not
implement the `MemoryEffectOpInterface` and neither do we know if they
have `RecursiveMemoryEffects` or not. All test cases that currently have
unregistered ops are updated to use registered ops.
2024-01-21 11:10:09 +01:00
Matthias Springer
62bf7710ff
[mlir][IR] Add notifyBlockRemoved callback to listener (#78306)
There is already a "block inserted" notification (in
`OpBuilder::Listener`), so there should also be a "block removed"
notification.

The purpose of this change is to make the listener API more mature.
There is currently a gap between what kind of IR changes can be made and
what IR changes can be listened to. At the moment, the only way to
inform listeners about "block removal" is to send a manual
`notifyOperationModified` for the parent op (e.g., by wrapping the
`eraseBlock(b)` method call in `updateRootInPlace(b->getParentOp())`).
This tells the listener that *something* has changed, but it is somewhat
of an API abuse.
2024-01-21 10:06:53 +01:00
Jerry Wu
dedc7d4d36
[mlir] Exclude masked ops in VectorDropLeadUnitDim (#76468)
Don't insert cast ops for ops in `vector.mask` region in
`VectorDropLeadUnitDim`.
2024-01-20 19:37:46 -05:00
Bharathi Ramana Joshi
d70bfeb4e1
[MLIR][Presburger] Implement IntegerRelation::setId (#77872) 2024-01-20 15:19:10 +05:30
Jeff Niu
15b089cb02
[mlir] Make printAlias hooks public (NFC) (#78833)
These are very useful when writing custom parsers and printers for
aggregate types or attributes that might want to print aliases.
2024-01-19 23:23:41 -08:00
Mehdi Amini
b1d4265a5f Apply clang-tidy fixes for llvm-qualified-auto in Promotion.cpp (NFC) 2024-01-19 17:58:15 -08:00
Mehdi Amini
197a73f019 Apply clang-tidy fixes for llvm-include-order in Fusion.cpp (NFC) 2024-01-19 17:58:15 -08:00
Mehdi Amini
46ce993dd4 Apply clang-tidy fixes for llvm-else-after-return in ElementwiseOpFusion.cpp (NFC) 2024-01-19 17:58:14 -08:00
Mehdi Amini
f19f213974 Apply clang-tidy fixes for llvm-else-after-return in DropUnitDims.cpp (NFC) 2024-01-19 17:58:14 -08:00
Mehdi Amini
3b61f5a1bc Apply clang-tidy fixes for performance-unnecessary-value-param in DataLayoutPropagation.cpp (NFC) 2024-01-19 17:58:14 -08:00
Mehdi Amini
e611a4cf80
Revert "[mlir][amdgpu] Shared memory access optimization pass" (#78822)
Reverts llvm/llvm-project#75627 ; it broke the bot:
https://lab.llvm.org/buildbot/#/builders/61/builds/53218
2024-01-19 16:41:43 -08:00
Xiangxi Guo (Ryan)
c17aa14f4c
[mlir][index] Fold cmp(x, x) when x isn't a constant (#78812)
Such cases show up in the middle of optimizations passes, e.g., after
some rewrites and then CSE. The current folder can fold such cases when
the inputs are constant; this patch improves it to fold even if the
inputs are non-constant.
2024-01-19 15:54:33 -08:00
erman-gurses
b7360fbe8c
[mlir][amdgpu] Shared memory access optimization pass (#75627)
It implements transformation to optimize accesses to shared memory.

Reference: https://reviews.llvm.org/D127457

_This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by newColIdx = col % vecSize + perm[row](col / vecSize, row)
where perm is a permutation function indexed by row and vecSize
is the vector access size in elements (currently assumes 128bit
vectorized accesses, but this can be made a parameter). This specific
transformation can help optimize typical distributed & vectorized
accesses
common to loading matrix multiplication operands to/from shared memory._
2024-01-19 15:44:45 -08:00
Quinn Dawkins
42b160356f
[mlir][transform] Add an op for replacing values with function calls (#78398)
Adds `transform.func.cast_and_call` that takes a set of inputs and
outputs and replaces the uses of those outputs with a call to a function
at a specified insertion point.

The idea with this operation is to allow users to author independent IR
outside of a to-be-compiled module, and then match and replace a slice
of the program with a call to the external function.

Additionally adds a mechanism for populating a type converter with a set
of conversion materialization functions that allow insertion of
casts on the inputs/outputs to and from the types of the function
signature.
2024-01-19 13:21:52 -05:00
Marius Brehler
205e15c176 [mlir][docs] Fix broken link 2024-01-19 17:38:27 +01:00
Kareem Ergawy
5dbb30d950
[MLIR][OpenMP] Better error reporting for unsupported nowait (#78551)
Provides some context for failing to generate LLVM IR for `target
enter|exit|update` directives when `nowait` is provided. This is
directly helpful for flang users since they would get this error message
if they tried to use `nowait`. Before that we had a very generic
message.

This is a follow-up to https://github.com/llvm/llvm-project/pull/78269,
please only review the latest commit (the one with the same commit
message as the PR title).
2024-01-19 16:47:24 +01:00
Matthias Springer
b4f24be7ef
[mlir][bufferization] Simplify helper potentiallyAliasesMemref (#78690)
This commit simplifies a helper function in the ownership-based buffer
deallocation pass. Fixes a potential double-free (depending on the
scheduling of patterns).
2024-01-19 13:22:02 +01:00
Benjamin Chetioui
35121add2e [mlir][NFC] Remove unused variable. 2024-01-19 11:32:19 +00:00
Han-Chung Wang
12b676de72
[mlir][vector] Drop innermost unit dims on transfer_write. (#78554) 2024-01-19 03:15:13 -08:00