Commit Graph

10443 Commits

Author SHA1 Message Date
Kai Sasaki
076953d250 [mlir] Skip invalid test on big endian platform (s390x) (#80246)
The buildbot test running on s390x platform keeps failing since [this
time](https://lab.llvm.org/buildbot/#/builders/199/builds/31136). This
is because of the dependency on the endianness of the platform. It
expects the format invalid in the big endian platform (s390x). We can
simply skip it.

See: https://discourse.llvm.org/t/mlir-s390x-linux-failure/76695
(cherry picked from commit 65ac8c16e028b23b49fd6b03817faa1ab6c0229d)
2024-02-13 11:39:15 -08:00
Mirko Brkušanin
ed48280f8e [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-26 20:01:08 -08:00
Jeff Niu
df1e01b316
[mlir] Add example of printAlias to test dialect (NFC) (#79232)
Follow-up from previous pull request. Motivate the API change with an
attribute that decides between sugaring a sub-attribute or using an
alias
2024-01-23 16:29:57 -08:00
Krzysztof Drewniak
750e90e440
[mlir][ArithToAMDGPU] Add option for saturating truncation to fp8 (#74153)
Many machine-learning applications (and most software written at AMD)
expect the operation that truncates floats to 8-bit floats to be
saturatinng. That is, they expect `truncf 256.0 : f32 to f8E4M3FNUZ` to
yield `240.0`, not `NaN`, and similarly for negative numbers. However,
the underlying hardware instruction that can be used for this truncation
implements overflow-to-NaN semantics.

To enable handling this usecase, we add the saturate-fp8-truncf option
to ArithToAMDGPU (off by default), which causes the requisite clamping
code to be emitted. Said clamping code ensures that Inf and NaN are
passed through exactly (and thus trancate to NaN).

Per review feedback, this commit efactors
createScalarOrSplatConstant() to the Arith dialect utilities and uses
it in this code. It also fixes naming of existing patterns and
switches from vector.extractelement/insertelement to
vector.extract/insert.
2024-01-23 16:52:21 -06:00
Aart Bik
575568de41
[mlir][sparse] adjust compression scheme for example (#79212) 2024-01-23 14:51:46 -08:00
Kunwar Grover
9261ab708e
[mlir][Target] Teach dense_resource conversion to LLVMIR Target (#78958)
This patch adds support for translating dense_resource attributes to
LLVMIR Target.
The support added is similar to how DenseElementsAttr is handled, except
we
don't need to handle splats.

Another possible way of doing this is adding iteration on
dense_resource, but that is
non-trivial as DenseResourceAttr is not meant to be something you should
directly
access. It has subclasses which you are supposed to use to iterate on
it.
2024-01-23 13:30:34 -08:00
Saiyedul Islam
d2398cca6f
Restore: [mlir][ROCDL] Stop setting amdgpu-implicitarg-num-bytes (#79129)
This patch restores PR#78498
2024-01-23 18:48:39 +05:30
Saiyedul Islam
082f87c9d4
[AMDGPU] Change default AMDHSA Code Object version to 5 (#79038)
Also update LIT tests and docs.
For more details, see
https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata

Corresponding llvm-objdump AMDGPU lit tests are updated
in a follow-up PR.
2024-01-23 17:08:18 +05:30
Andrzej Warzynski
e3172e8418 [mlir] Update "UNSUPPORTED" directive in a test
Add missing "arm64" target to the list of unsupported targets ("arm64"
is used on Darwin).
2024-01-22 20:04:19 +00:00
Valentin Clement (バレンタイン クレメン)
3eb4178b9c
[mlir][openacc] Update acc.loop to be a proper loop like operation (#67355)
The initial design of the `acc.loop` was to be an operation that
encapsulates a loop like operation. This was an early design and we now
want to change it so the `acc.loop` operation becomes a real loop-like
operation by implementing the LoopLikeInterface.

Differential Revision: https://reviews.llvm.org/D159229

This patch is just moved from Phabricator to github
2024-01-22 10:31:29 -08:00
Valentin Clement (バレンタイン クレメン)
b5df6a90f5
[mlir][openacc] Fix num_gang parser (#78792)
Nb of operand per segment is not correctly computed.
2024-01-22 08:40:33 -08:00
Andrzej Warzynski
160ddf7114 [mlir] Remove duplicate test
The removed test is identical to the one directly above.
2024-01-22 15:51:56 +00:00
Benjamin Maxwell
e280c287e4
[mlir] Add mlir_arm_runner_utils library for use in integration tests (#78583)
This adds a new `mlir_arm_runner_utils` library that contains utils
specific to Arm/AArch64. This is for use in MLIR integration tests.

This initial patch adds `setArmVLBits()` and `setArmSVLBits()`. This
allows changing vector length or streaming vector length at runtime (or
setting it to a known minimum, i.e. 128-bits).
2024-01-22 09:28:13 +00:00
Cullen Rhodes
9f7fff7f13
[mlir][ArmSME] Add arith-to-arm-sme conversion pass (#78197)
Existing 'arith::ConstantOp' conversion and tests are moved from
VectorToArmSME. There's currently only a single op that's converted at
the moment, but this will grow in the future as things like in-tile add
are implemented. Also, 'createLoopOverTileSlices' is moved to ArmSME
utils since it's relevant for both conversions.
2024-01-22 09:23:11 +00:00
Dominik Adamski
21199f9842
[OpenMP][OMPIRBuilder] Fix LLVM IR codegen for collapsed device loop (#78708)
When we generate the loop body function, we need to be sure, that all
original loop counters are replaced by the new counter.

We need to save all items which use the original loop counter and then
perform replacement of the original loop counter. If we don't do it,
there is a risk that some values are not updated.
2024-01-22 09:24:45 +01:00
Durgadoss R
aa4547fcc8
[MLIR][NVVM] Update cp.async.bulk Ops to use intrinsics (#78900)
This patch updates the cp.async.bulk.{commit/wait}_group Ops to use NVVM
intrinsics.
* Doc updated for the commit_group Op.
* Tests are added to verify the lowering to the intrinsics.

While we are there, fix the FileCheck directive on the
'nvvm.setmaxregister' test.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-01-22 08:39:30 +01:00
Guray Ozen
12c241b365
[MLIR][NVVM] Explicit Data Type for Output in wgmma.mma_async (#78713)
The current implementation of `nvvm.wgmma.mma_async` Op deduces the data
type of the output matrix from the data type of struct member, which can be
non-intuitive, especially in cases where types like `2xf16` are packed
into `i32`.

This PR addresses this issue by improving the Op to include an explicit
data type for the output matrix.

The modified Op now includes an explicit data type for Matrix-D (<f16>),
and looks as follows:

```
%result = llvm.mlir.undef : !llvm.struct<(struct<(i32, i32, ...
nvvm.wgmma.mma_async
    %descA, %descB, %result,
    #nvvm.shape<m = 64, n = 32, k = 16>,
    D [<f16>, #nvvm.wgmma_scale_out<zero>],
    A [<f16>, #nvvm.wgmma_scale_in<neg>, <col>],
    B [<f16>, #nvvm.wgmma_scale_in<neg>, <col>]
```
2024-01-22 08:37:20 +01:00
Guray Ozen
21830c9135
[mlir][nvgpu] Fix 'warpgroup.mma.store' index calculation (#78413)
This PR fixes the 'nvgpu.warpgroup.mma.store' index calculation. When
the destionation memref and current accumulator matrix were small, the
previous code was reaching out of range.
2024-01-22 08:32:56 +01:00
Matthias Springer
fbb62d449c
[mlir][bufferization] Buffer deallocation: Make op preconditions stricter (#75127)
The buffer deallocation pass checks the IR ("operation preconditions")
to make sure that there is no IR that is unsupported. In such a case,
the pass signals a failure.

The pass now rejects all ops with unknown memory effects. We do not know
whether such an op allocates memory or not. Therefore, the buffer
deallocation pass does not know whether a deallocation op should be
inserted or not.

Memory effects are queried from the `MemoryEffectOpInterface` interface.
Ops that do not implement this interface but have the
`RecursiveMemoryEffects` trait do not have any side effects (apart from
the ones that their nested ops may have).

Unregistered ops are now rejected by the pass because they do not
implement the `MemoryEffectOpInterface` and neither do we know if they
have `RecursiveMemoryEffects` or not. All test cases that currently have
unregistered ops are updated to use registered ops.
2024-01-21 11:10:09 +01:00
Jerry Wu
dedc7d4d36
[mlir] Exclude masked ops in VectorDropLeadUnitDim (#76468)
Don't insert cast ops for ops in `vector.mask` region in
`VectorDropLeadUnitDim`.
2024-01-20 19:37:46 -05:00
Mehdi Amini
e611a4cf80
Revert "[mlir][amdgpu] Shared memory access optimization pass" (#78822)
Reverts llvm/llvm-project#75627 ; it broke the bot:
https://lab.llvm.org/buildbot/#/builders/61/builds/53218
2024-01-19 16:41:43 -08:00
Xiangxi Guo (Ryan)
c17aa14f4c
[mlir][index] Fold cmp(x, x) when x isn't a constant (#78812)
Such cases show up in the middle of optimizations passes, e.g., after
some rewrites and then CSE. The current folder can fold such cases when
the inputs are constant; this patch improves it to fold even if the
inputs are non-constant.
2024-01-19 15:54:33 -08:00
erman-gurses
b7360fbe8c
[mlir][amdgpu] Shared memory access optimization pass (#75627)
It implements transformation to optimize accesses to shared memory.

Reference: https://reviews.llvm.org/D127457

_This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by newColIdx = col % vecSize + perm[row](col / vecSize, row)
where perm is a permutation function indexed by row and vecSize
is the vector access size in elements (currently assumes 128bit
vectorized accesses, but this can be made a parameter). This specific
transformation can help optimize typical distributed & vectorized
accesses
common to loading matrix multiplication operands to/from shared memory._
2024-01-19 15:44:45 -08:00
Quinn Dawkins
42b160356f
[mlir][transform] Add an op for replacing values with function calls (#78398)
Adds `transform.func.cast_and_call` that takes a set of inputs and
outputs and replaces the uses of those outputs with a call to a function
at a specified insertion point.

The idea with this operation is to allow users to author independent IR
outside of a to-be-compiled module, and then match and replace a slice
of the program with a call to the external function.

Additionally adds a mechanism for populating a type converter with a set
of conversion materialization functions that allow insertion of
casts on the inputs/outputs to and from the types of the function
signature.
2024-01-19 13:21:52 -05:00
Kareem Ergawy
5dbb30d950
[MLIR][OpenMP] Better error reporting for unsupported nowait (#78551)
Provides some context for failing to generate LLVM IR for `target
enter|exit|update` directives when `nowait` is provided. This is
directly helpful for flang users since they would get this error message
if they tried to use `nowait`. Before that we had a very generic
message.

This is a follow-up to https://github.com/llvm/llvm-project/pull/78269,
please only review the latest commit (the one with the same commit
message as the PR title).
2024-01-19 16:47:24 +01:00
Matthias Springer
b4f24be7ef
[mlir][bufferization] Simplify helper potentiallyAliasesMemref (#78690)
This commit simplifies a helper function in the ownership-based buffer
deallocation pass. Fixes a potential double-free (depending on the
scheduling of patterns).
2024-01-19 13:22:02 +01:00
Han-Chung Wang
12b676de72
[mlir][vector] Drop innermost unit dims on transfer_write. (#78554) 2024-01-19 03:15:13 -08:00
Tobias Gysi
9dd0eb9c9c
[mlir][llvm] Drop unreachable basic block during import (#78467)
This revision updates the LLVM IR import to support unreachable basic
blocks. An unreachable block may dominate itself and a value defined
inside the block may thus be used before its definition. The import does
not support such dependencies. We thus delete the unreachable basic
blocks before the import. This is possible since MLIR does not have
basic block labels that can be reached using an indirect call and
unreachable blocks can indeed be deleted safely.

Additionally, add a small poison constant import test.
2024-01-19 11:10:57 +01:00
Andrzej Warzyński
ee9c9f3b96
[mlir][vector] Add 2 invalid tests for vector.xfer Ops (#78608) 2024-01-19 08:54:12 +00:00
Billy Zhu
595d780b49
[MLIR][ODS] Check hasProperties when generating populateDefaultAttrs (#78525)
Currently ODS generates `populateDefaultAttrs` or
`populateDefaultProperties` based on whether the dialect opted into
usePropertiesForAttributes. But since individual ops might get opted
into using properties (as long as it has one property), it should
actually just check whether the op itself uses properties. Otherwise
`populateDefaultAttrs` will overwrite existing attrs inside properties
when creating an op. Understandably this becomes moot once everything
switches over to using properties, but this fixes it for now.

This PR makes ODS generate `populateDefaultProperties` as long as the op
itself uses properties.
2024-01-18 21:57:20 -08:00
Valentin Clement (バレンタイン クレメン)
b06bc7c6a0
[mlir][flang][openacc] Device type support on acc routine op (#78375)
This patch add support for device_type on the acc.routine operation.
device_type can be specified on seq, worker, vector, gang and bind
information.

The support is following the same design than the one for compute
operations, data operation and the loop operation.
2024-01-18 09:04:11 -08:00
Fehr Mathieu
914cfa4138
[mlir][irdl] Add irdl.base op (#76400)
The `irdl.base` op represent an attribute constraint that will check
that the
base of a type or attribute is the expected one (e.g. `IntegerType`) .

Example:

```mlir
irdl.dialect @cmath {
  irdl.type @complex {
    %0 = irdl.base "!builtin.integer"
    irdl.parameters(%0)
  }

  irdl.type @complex_wrapper {
    %0 = irdl.base @complex
    irdl.parameters(%0)
  }
}
```

The above program defines a `cmath.complex` type that expects a single
parameter, which is a type with base name `builtin.integer`, which is
the
name of an `IntegerType` type.
It also defines a `cmath.complex_wrapper` type that expects a single
parameter, which is a type of base type `cmath.complex`.
2024-01-18 16:31:40 +00:00
Aart Bik
9cd4128998
[mlir][sparse] add a 3-d block and fiber test (#78529) 2024-01-18 07:52:42 -08:00
Krzysztof Drewniak
aac23b08e3
[mlir][ROCDL] Stop setting amdgpu-implicitarg-num-bytes (#78498)
Clang stopped doing this late 2021 back in 33315ef321, and no other
frontent does this, so stop doing it.
2024-01-18 09:43:46 -06:00
Krzysztof Drewniak
05e85e4fc5
[mlir][Math] Add pass to legalize math functions to f32-or-higher (#78361)
Since most of the operations in the `math` dialect don't have
low-precision implementations, add the -math-legalize-to-f32 pass that
goes through and brackets low-precision math funcitons (like `math.sin
%0 : f16`) with `arith.extf` and `arith.truncf`. This preserves the
original semantics of the math operation but allows lowering to proceed.

Versions of this lowering are already implicitly present in some passes,
like ConvertGPUToROCDL. However, because those are implicit rewrites,
they hide the floating-point extension and truncation, preventing anyone
from writing passes that operate on those implitic extf/truncf pairs.

Exposing this legalization explicitly is needed to allow lowening 8-bit
floats on AMD GPUs, as the implementation of extf and truncf on that
platform requires the complex logic found in ArithToAMDGPU, which runs
before the GPU to ROCDL lowering.
2024-01-18 09:37:43 -06:00
Sergio Afonso
2747193058
[Flang][MLIR][OpenMP] Remove the early outlining interface (#78450)
After the removal of the OpenMP early outlining MLIR pass in #67319, the
`EarlyOutliningInterface` stopped doing any useful work. It used to be
necessary to tie the name of the function from which a target region was
outlined to that new function, so it would be used when translating to
LLVM IR in place of the outlined function's name.

This is not necessary anymore, so this patch removes all references to
this interface and uses of the `omp.outline_parent_name` discardable
attribute in tests.
2024-01-18 15:33:43 +00:00
Quinn Dawkins
5caab8bbc0
[mlir][transform] Add transform.get_operand op (#78397)
Similar to `transform.get_result`, except it returns a handle to the
operand indicated by a positional specification, same as is defined for
the linalg match ops.

Additionally updates `get_result` to take the same positional specification.
This makes the use case of wanting to get all of the results of an
operation easier by no longer requiring the user to reconstruct the list
of results one-by-one.
2024-01-18 09:33:14 -05:00
Kai Sasaki
fdcb76f248
[mlir][complex] Convert complex.tan to libm ctan call (#78250)
We can convert `complex.tan` op to
[ctan/ctanf](https://sourceware.org/newlib/libm.html#ctan) function in
libm in the complex to libm conversion.
2024-01-18 12:21:19 +09:00
Sergio Afonso
8fb685fb7e
[MLIR][LLVM] Add explicit target_cpu attribute to llvm.func (#78287)
This patch adds the target_cpu attribute to llvm.func MLIR operations
and updates the translation to/from LLVM IR to match "target-cpu"
function attributes.
2024-01-17 14:55:02 +00:00
Matthias Springer
5fcf907b34
[mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260)
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`

The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.

Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
2024-01-17 11:08:59 +01:00
Aviad Cohen
d89a0a6594
[mlir][Tosa]: Add folder to ReciprocalOp of splat constant inputs (#78137) 2024-01-17 09:05:07 +02:00
Jacques Pienaar
8934b10642
[mlir][arith] Add overflow flags support to arith ops (#78376)
Add overflow flags support to the following ops:
* `arith.addi`
* `arith.subi`
* `arith.muli`

Example of new syntax:
```
%res = arith.addi %arg1, %arg2 overflow<nsw> : i64
```
Similar to existing LLVM dialect syntax
```
%res = llvm.add %arg1, %arg2 overflow<nsw> : i64
```

Tablegen canonicalization patterns updated to always drop flags, proper
support with tests will be added later.

Updated LLVMIR translation as part of this commit as it currenly written
in a way that it will crash when new attributes added to arith ops
otherwise.

Also lower `arith` overflow flags to corresponding SPIR-V op decorations

Discussion

https://discourse.llvm.org/t/rfc-integer-overflow-flags-support-in-arith-dialect/76025

This effectively rolls forward #77211, #77700 and #77714 while adding a
test to ensure the Python usage is not broken. More follow up needed but
unrelated to the core change here. The changes here are minimal and just
correspond to "textual namespacing" ODS side, no C++ or Python changes
were needed.

---------

---------

Co-authored-by: Ivan Butygin <ivan.butygin@gmail.com>, Yi Wu <yi.wu2@arm.com>
2024-01-17 06:12:23 +03:00
Tobias Gysi
bd26ce47c8
[mlir][llvm] Fix loop annotation parser (#78266)
This revision moves the ArrayRef field of the LoopAnnotation attribute
to the end of the struct to enable printing and parsing of the
attribute. Previously, the parsing could fail in the presence of a start
or end loc.
2024-01-16 16:35:52 +01:00
Matthias Springer
c0a354dfab
[mlir][vector] Fix invalid IR in ContractionOpLowering (#78130)
If a rewrite pattern returns "failure", it must not have modified the
IR. This commit fixes
`Dialect/Vector/vector-contract-to-outerproduct-transforms-unsupported.mlir`
when running with `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`.

```
  * Pattern (anonymous namespace)::ContractionOpToOuterProductOpLowering : 'vector.contract -> ()' {
Trying to match "(anonymous namespace)::ContractionOpToOuterProductOpLowering"
    ** Insert  : 'vector.transpose'(0x5625b3a8cb30)
    ** Insert  : 'vector.transpose'(0x5625b3a8cbc0)
"(anonymous namespace)::ContractionOpToOuterProductOpLowering" result 0
  } -> failure : pattern failed to match
} -> failure : pattern failed to match

LLVM ERROR: pattern returned failure but IR did change
```

Note: `vector-contract-to-outerproduct-transforms-unsupported.mlir` is
merged into `vector-contract-to-outerproduct-matvec-transforms.mlir`.
The `greedy pattern application failed` error is not longer produced.
This error indicates that the greedy pattern rewrite did not
convergence; it does not mean that a pattern could not be applied.
2024-01-16 09:40:24 +01:00
Fabian Mora
5b4f2b906b
[mlir][gpu] Add an offloading handler attribute to gpu.module (#78047)
This patch adds an optional offloading handler attribute to
the`gpu.module` op. This attribute will be used during
`gpu-module-to-binary` pass to override the offloading handler used in
the `gpu.binary` op.
2024-01-15 16:58:10 -05:00
Fabian Mora
01dbc5da33
Reland [mlir][ExecutionEngine] Add support for global constructors and destructors #78070 (#78170)
This patch add support for executing global constructors and destructors
in the ExecutionEngine.
2024-01-15 12:10:14 -05:00
Matthias Springer
c1730f4221
[mlir][SCF] Do not verify step size of scf.for (#78141)
An op verifier should verify only local properties. This commit removes
the verification of `scf.for` step sizes. (Verifiers can check
attributes but should not follow SSA values.) This verification could
reject IR that is actually valid, e.g.:

```mlir
scf.if %always_false {
  // Branch is never entered.
  scf.for ... step %c0 { ... }
}
```

This commit fixes `for-loop-peeling.mlir` when running with
`MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`:
```
within split at llvm-project/mlir/test/Dialect/SCF/for-loop-peeling.mlir:293 offset :9:3: note: see current operation:
"scf.for"(%0, %3, %2) ({
^bb0(%arg1: index):
  %4 = "arith.index_cast"(%arg1) : (index) -> i64
  "memref.store"(%4, %arg0) : (i64, memref<i64>) -> ()
  "scf.yield"() : () -> ()
}) {__peeled_loop__} : (index, index, index) -> ()
LLVM ERROR: IR failed to verify after folding
```

Note: `%2` is `arith.constant 0 : index`.
2024-01-15 17:46:54 +01:00
Boian Petkantchin
5df2c00af3
[mlir][mesh] Remove rank attribute and rename dim_sizes to shape in ClusterOp (#77838)
Remove the somewhat redundant rank attribute.
Before this change
```
mesh.cluster @mesh(rank = 3, dim_sizes = 2x3)
```
After
```
mesh.cluster @mesh(shape = 2x3x?)
```

The rank is instead determined by the provided shape. With this change
no longer `getDimSizes()` can be wrongly assumed to have size equal to
the cluster rank.
Now `getShape().size()` will always equal `getRank()`.
2024-01-15 07:39:09 -08:00
Cullen Rhodes
3295b88a66
Revert "[mlir][ExecutionEngine] Add support for global constructors and destructors" (#78164)
this is causing test failures on AArch64 linux, hitting the
following assert:

# | mlir-cpu-runner: /home/culrho01/llvm-project/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:519: void llvm::RuntimeDyldELF::resolveAArch64Relocation(const SectionEntry &, uint64_t, uint64_t, uint32_t, int64_t): Assertion `isInt<33>(Result) && "overflow check failed for relocation"' failed.

Seeing the same in buildbot as well, e.g.

https://lab.llvm.org/buildbot/#/builders/179/builds/9094/steps/12/logs/FAIL__MLIR__sparse_codegen_dim_mlir

Reverts llvm/llvm-project#78070
2024-01-15 14:21:41 +00:00
Durgadoss R
dc01b597ba
[MLIR][NVVM] Add support for aligned variants of cluster barriers (#78142)
This patch adds:
* Support for the 'aligned' variants of the cluster barrier Ops, by
extending the existing Op with an 'aligned' attribute.
* Docs for these Ops.
* Test cases to verify the lowering to the corresponding intrinsics.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-01-15 14:52:30 +01:00