The test was failing due to a different transform sequence declaration (an unnamed `transform.sequence` was used, while it should now be a named transform sequence). The test is now fixed.
Fixes https://github.com/llvm/llvm-project/issues/62489.
Some notes for each number:
- 1 `bool-literal` should be reasonably clear from context.
- 2 Fixed.
- 3 This is now fixed. `loc(fused[])` is valid, but `loc(fused["foo",])`
is not (see the sketch after this list).
- 4 This operation uses `assemblyFormat` so the syntax is correct
(assuming ODS is correct).
- 5 This operation uses `assemblyFormat` so the syntax is correct
(assuming ODS is correct).
- 6 Added an example.
- 7 The suggested fix is in line with other `assemblyFormat` examples.
- 8 Added syntax and an example.
- 9 I don't know what this is referring to.
- 10 Added example.
- 11 and 12: the suggestion seems wrong, as `ShapedTypeInterface` could be
extended by clients, so it is not limited to tensors or vectors.
- 13 is already reasonably clear with the example, I think.
- 14 is already reasonably clear with the example, I think.
- 15 Added an example from the `opaque_locations.mlir` tests.
- 16 The answer to this seems to change over time and depends on the use
case. Suggestions from reviewers are welcome.
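To illustrate note 3, a minimal sketch of the fused location syntax (op and locations are made up):
```mlir
// Valid: zero or more locations inside fused[...].
"test.op"() : () -> () loc(fused["myPass", "myFile.mlir":10:8])
// Also valid: loc(fused[]). Invalid: a trailing comma, e.g. loc(fused["foo",]).
```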
Rewrite patterns must return `success` if the IR was modified. This
commit fixes sparse tensor tests such as
`SparseTensor/sparse_fusion.mlir`,
`SparseTensor/CPU/sparse_reduce_custom.mlir`,
`SparseTensor/CPU/sparse_semiring_select.mlir` when running with
`MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`.
The `complex.constant` op used to support only float element types. This was inconsistent
with `ConstantOp::isBuildableWith`, which allows integer element types.
The complex type allows any float/integer element type.
Note: The other complex dialect ops do not support non-float element
types yet. The main purpose of this change is to fix
`Tensor/canonicalize.mlir`, which is currently failing when verifying
the IR after each pattern application (#74270).
```
within split at mlir/test/Dialect/Tensor/canonicalize.mlir:231 offset :8:15: error: 'complex.constant' op result #0 must be complex type with floating-point elements, but got 'complex<i32>'
%complex1 = tensor.extract %c1[] : tensor<complex<i32>>
^
within split at mlir/test/Dialect/Tensor/canonicalize.mlir:231 offset :8:15: note: see current operation: %0 = "complex.constant"() <{value = [1 : i32, 2 : i32]}> : () -> complex<i32>
"func.func"() <{function_type = () -> tensor<3xcomplex<i32>>, sym_name = "extract_from_elements_complex_i"}> ({
%0 = "complex.constant"() <{value = [1 : i32, 2 : i32]}> : () -> complex<i32>
%1 = "arith.constant"() <{value = dense<(3,2)> : tensor<complex<i32>>}> : () -> tensor<complex<i32>>
%2 = "arith.constant"() <{value = dense<(1,2)> : tensor<complex<i32>>}> : () -> tensor<complex<i32>>
%3 = "tensor.extract"(%1) : (tensor<complex<i32>>) -> complex<i32>
%4 = "tensor.from_elements"(%0, %3, %0) : (complex<i32>, complex<i32>, complex<i32>) -> tensor<3xcomplex<i32>>
"func.return"(%4) : (tensor<3xcomplex<i32>>) -> ()
}) : () -> ()
```
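For reference, a minimal example of the now-accepted form (SSA name made up):
```mlir
// Integer element types now verify on complex.constant.
%cst = complex.constant [1 : i32, 2 : i32] : complex<i32>
```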
The `ForallRewriter` pattern used to generate invalid IR:
```
mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir:0:0: error: 'scf.for' op expects region #0 to have 0 or 1 blocks
mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir:0:0: note: see current operation:
"scf.for"(%8, %2, %9) ({
^bb0(%arg5: index):
// ...
"scf.yield"() : () -> ()
^bb1(%10: index): // no predecessors
"scf.yield"() : () -> ()
}) : (index, index, index) -> ()
```
This commit fixes tests such as
`mlir/test/Dialect/SparseTensor/GPU/gpu_combi.mlir` when verifying the
IR after each pattern application (#74270).
Op verifiers should verify only local properties of an op. The dynamic
sizes of a `tensor.generate` op should not be verified. Dynamic sizes
that have a negative constant value should not prevent the
`tensor.generate` op from verifying.
Also share some code between the `tensor.empty` and `tensor.generate`
"dynamic dim -> static dim" canonicalization patterns.
Remove the `invalid-canonicalize.mlir` file and move the test case to
`canonicalize.mlir`. Canonicalization no longer produces IR that does
not verify (and leaves the op as is).
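As a sketch of the new behavior (values made up), the following now verifies even though the dynamic size is a known negative constant:
```mlir
// The verifier no longer inspects the dynamic size operands, so this
// op passes verification; the negative size is a runtime concern.
%size = arith.constant -1 : index
%t = tensor.generate %size {
^bb0(%i: index, %j: index):
  %zero = arith.constant 0.0 : f32
  tensor.yield %zero : f32
} : tensor<3x?xf32>
```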
The partial bufferization framework has been replaced with One-Shot
Bufferize. SCF-specific canonicalization patterns for
`to_memref`/`to_tensor` are no longer needed.
Add missing constant propagation folders for LeftShiftLogical and
RightShift[Logical|Arithmetic].
Implement additional folding when the shift amount is 0.
This helps the readability of code lowered into SPIR-V.
Part of work for #70704
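A sketch of the zero-shift fold (SSA names made up):
```mlir
// Shifting by a constant 0 now folds away; uses of %0 are replaced
// with %arg0 directly.
%zero = spirv.Constant 0 : i32
%0 = spirv.ShiftLeftLogical %arg0, %zero : i32, i32
```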
The current vectorization of 1D depthwise convolutions in Linalg is
_sub-optimal_ for tensors with a low channel count, e.g.:
```mlir
linalg.depthwise_conv_1d_nwc_wc
{dilations = dense<1> : vector<1xi64>,
strides = dense<1> : vector<1xi64>}
ins(%input, %filter : tensor<1x8x3xi8>, tensor<1x3xi8>)
outs(%output : tensor<1x8x3xi8>) -> tensor<1x8x3xi8>
```
That's due to the fact that ultimately (i.e. at LLVM level),
vectorization happens along the trailing dimension (i.e. the channel
dimension). In this case it leads to vectors with 3 elements (or worse,
if there's e.g. only 1 channel). For comparison, a 128-bit wide
vector register can hold 16 x i8.
Instead, this patch adds an option to flatten/collapse the channel
dimension into the width dimension of the input/filter/output using
the `vector.shape_cast` operation:
```mlir
%sc_input = vector.shape_cast %input : vector<1x8x3xi8> to vector<1x24xi8>
%sc_output = vector.shape_cast %output : vector<1x8x3xi8> to vector<1x24xi8>
%b_filter = vector.broadcast %filter : vector<3xi8> to vector<1x8x3xi8>
%sc_filter = vector.shape_cast %b_filter : vector<1x8x3xi8> to vector<1x24xi8>
```
This new vectorization mode is implemented in `depthwiseConv` by
inserting `vector.shape_cast` Ops before and after
`depthwiseConv1dSliceAsMulAcc` is invoked. It can be selected through
e.g. a transform dialect attribute:
```mlir
transform.structured.vectorize_children_and_apply_patterns %conv {flatten_1d_depthwise_conv}
```
A forthcoming patch will implement a strategy to automatically switch
between the two implementations, depending on the shape of the input
tensors.
Co-authored-by: Bradley Smith <bradley.smith@arm.com>
This adds consistent usage of `at` for everything that refers to the
current loop nesting. It also cleans up some redundant legacy code from
when the sparsifier code still used topSort.
`TileUsingForOp` has an optional Attribute `interchange` which was given
in curly braces like this: `{interchange = [...]}`. The way this was
parsed meant that no `attr-dict` could be attached to the Op.
This patch adds printing / parsing of an `attr-dict` to the Op and
prints/parses the `interchange` Attribute separately from the
discardable Attributes.
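A hypothetical sketch of the resulting format (handle and attribute names assumed):
```mlir
// interchange is printed as a keyword, and a discardable attr-dict can
// now follow it.
%tiled, %loop0, %loop1 = transform.structured.tile_using_for %op [8, 16]
    interchange = [1, 0] {my_discardable_attr}
    : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)
```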
Rationale:
We no longer deal with topsort during sparsification, so that LoopId ==
LoopOrd for all methods. This first revision removes the types. A
follow-up revision will simplify some other remaining constructs that
deal with loop order (e.g. `at` and `ldx`).
Since #73253, loops over tiles in SSA form (i.e. loops that take
`iter_args` and yield a new tile) are supported, so this patch updates
ArmSME lowerings to this form. This is an NFC, as it still lowers to the
same intrinsics, but it makes the IR less 'surprising' at a higher level,
and it may be recognised by more transforms.
Example:
IR before:
```mlir
scf.for %tile_slice_index = %c0 to %num_tile_slices step %c1
{
arm_sme.move_vector_to_tile_slice
%broadcast_to_1d, %tile, %tile_slice_index :
vector<[4]xi32> into vector<[4]x[4]xi32>
}
// ... later use %tile
```
IR now:
```mlir
%broadcast_to_tile = scf.for %tile_slice_index = %c0 to %num_tile_slices
step %c1 iter_args(%iter_tile = %init_tile) -> (vector<[4]x[4]xi32>)
{
%tile_update = arm_sme.move_vector_to_tile_slice
%broadcast_to_1d, %iter_tile, %tile_slice_index :
vector<[4]xi32> into vector<[4]x[4]xi32>
scf.yield %tile_update : vector<[4]x[4]xi32>
}
// ... later use %broadcast_to_tile
```
`llvm.fcmp` supports fast math attributes, so `arith.cmpf` should too.
The heavy churn in the flang tests is because flang sets
`fastmath<contract>` by default on all operations that support the fast
math interface. Downstream users of MLIR should not be as affected.
This was requested in https://github.com/llvm/llvm-project/issues/74263
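A minimal sketch of the new capability (SSA names made up):
```mlir
// arith.cmpf now accepts fastmath flags, mirroring llvm.fcmp.
%0 = arith.cmpf olt, %a, %b fastmath<fast> : f32
```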
This reverts commit f42b7615b8.
The fold pattern is incorrect, because it does not even look at the
permutation of non-unit dims and is happy to replace a pattern such as
```
%22 = vector.shape_cast %21 : vector<1x256x256xf32> to vector<256x256xf32>
%23 = vector.transpose %22, [1, 0] : vector<256x256xf32> to vector<256x256xf32>
```
with
```
%22 = vector.shape_cast %21 : vector<1x256x256xf32> to vector<256x256xf32>
```
which is obviously incorrect.
`gpu.dynamic_shared_memory` currently does not get lowered when it is
used with the vector dialect. The reason is that the vector-to-llvm
conversion patterns are not included in gpu-to-nvvm. This PR includes
them and adds a test.
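A sketch of the kind of IR that now lowers (shapes and names assumed, inside a gpu.func):
```mlir
// vector.load from dynamic shared memory is now converted by gpu-to-nvvm.
%c0 = arith.constant 0 : index
%shmem = gpu.dynamic_shared_memory : memref<?xi8, #gpu.address_space<workgroup>>
%v = vector.load %shmem[%c0] : memref<?xi8, #gpu.address_space<workgroup>>, vector<16xi8>
```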
There's an issue in the translator today where, for a CallsiteLoc, if
the callee does not have a DI scope (perhaps due to compile options or
optimizations), it may get the DI scope of its callsite's parent
function propagated to it, which will create a non-existent DILocation
combining the line & column number from one file with the filename from
another.
The root problem is that we cannot propagate the parent scope when
translating the callee location, as it no longer applies to inlined
locations (see the code diff; hopefully this will make sense).
To facilitate this, the importer is also changed so that callee scopes
are fused with the callee FileLineCol loc, instead of on the Callsite
loc itself. This comes with the benefit that we now have a symmetric
Callsite loc representation. If we required the callee scope to always
be annotated on the Callsite loc, it would be hard for generic inlining
passes to maintain that, since they would have to somehow understand the
semantics of the fused metadata and pull it out while inlining.
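To illustrate (a made-up location, with `#di_subprogram` standing for the callee's scope attribute), the importer now produces locations of the form:
```mlir
// The callee scope is fused onto the callee FileLineColLoc, not onto
// the CallsiteLoc as a whole.
loc(callsite(fused<#di_subprogram>["callee.cpp":10:4] at "caller.cpp":22:8))
```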
The tracking listener should not report op replacement errors for
payload ops that are not mapped to any live handles. The handle liveness
analysis did not work properly with transform IR that has named
sequences.
A handle is live if it has a user after the transform op that is
currently being applied. With named sequences, we need to maintain a
stack of currently applied transform ops. That stack already exists
(`regionStack`); the only thing that was missing is the current
transform op for each stack frame.
This commit fixes #72931.
`ParallelOpSingleOrZeroIterationDimsFolder` used to produce invalid IR:
```
within split at mlir/test/Dialect/SCF/canonicalize.mlir:1 offset :11:3: error: 'scf.parallel' op expects region #0 to have 0 or 1 blocks
scf.parallel (%i0, %i1, %i2) = (%c0, %c3, %c7) to (%c1, %c6, %c10) step (%c1, %c2, %c3) {
^
within split at mlir/test/Dialect/SCF/canonicalize.mlir:1 offset :11:3: note: see current operation:
"scf.parallel"(%4, %5, %3) <{operandSegmentSizes = array<i32: 1, 1, 1, 0>}> ({
^bb0(%arg1: index):
"memref.store"(%0, %arg0, %1, %arg1, %6) : (i32, memref<?x?x?xi32>, index, index, index) -> ()
"scf.yield"() : () -> ()
^bb1(%8: index): // no predecessors
"scf.yield"() : () -> ()
}) : (index, index, index) -> ()
```
Together with #74551, this commit fixes
`mlir/test/Dialect/SCF/canonicalize.mlir` when verifying the IR after
each pattern application (#74270).
`DecomposePrintOpConversion` used to generate invalid ops such as:
```
error: 'arith.extsi' op operand type 'vector<10xi32>' and result type 'vector<10xi32>' are cast incompatible
vector.print %v9 : vector<10xi32>
```
This commit fixes tests such as
`mlir/test/Integration/Dialect/Vector/CPU/test-reductions-i32.mlir` when
verifying the IR after each pattern application (#74270).
`SimplifyClones` used to generate an invalid op:
```
error: 'memref.cast' op operand type 'memref<*xf32>' and result type 'memref<*xf32>' are cast incompatible
%2 = bufferization.clone %1 : memref<*xf32> to memref<*xf32
```
This commit fixes tests such as
`mlir/test/Dialect/Bufferization/canonicalize.mlir` when verifying the
IR after each pattern application (#74270).
The `ShapeOfOp` folder used to generate invalid IR.
Input:
```
%0 = shape.shape_of %arg1 : tensor<index> -> tensor<?xindex>
```
Output:
```
%0 = "shape.const_shape"() <{shape = dense<> : tensor<0xindex>}> : () -> tensor<?xindex>
error: 'shape.const_shape' op inferred type(s) 'tensor<0xindex>' are incompatible with return type(s) of operation 'tensor<?xindex>'
```
This rewrite cannot be implemented as a folder because the result type
may have to change. In the above example, the original `shape.shape_of`
op had a return type of `tensor<?xindex>`, but the folded attribute
(materialized as a `shape.const_shape` op) must have a type of
`tensor<0xindex>` to be valid.
This commit fixes tests such as
`mlir/test/Dialect/Shape/canonicalize.mlir` when verifying the IR after
each pattern application (#74270).
GPU dialect lowering to the SYCL runtime is driven by the
spirv.target_env attribute attached to gpu.module. As a result,
spirv.target_env remains an input to LLVMIR translation.
A SPIRVToLLVMIRTranslation without any actual translation is added to
avoid an unregistered error in mlir-cpu-runner.
SelectObjectAttr.cpp is updated to:
1) Pass the binary size argument to getModuleLoadFn.
2) Pass the parameter count to getKernelLaunchFn.
This change does not impact CUDA and ROCM usage since both
mlir_cuda_runtime and mlir_rocm_runtime are already updated to accept
and ignore the extra arguments.
Currently the parser & printer of `CallOp` do not match when both
varargs and attr-dict are present (round-tripping is broken). This fixes
the parser so that it conforms to the asm format written in the
comments.
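A hypothetical example that now round-trips (attribute name made up):
```mlir
// Variadic call with a discardable attr-dict; parser and printer now agree.
%res = llvm.call @printf(%fmt, %val) vararg(!llvm.func<i32 (ptr, ...)>)
    {my_attr} : (!llvm.ptr, i32) -> i32
```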
Fixes a crash in `TransposeOp::inferReturnTypeComponents()` when the
supplied permutation tensor is rank-0.
Also removes some dead code from the type inference function.
Fix https://github.com/llvm/llvm-project/issues/74237
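A reproducer sketch (assuming the TOSA transpose op from the linked issue; names made up):
```mlir
// A rank-0 permutation tensor used to crash type inference; it is now
// rejected with a proper error instead.
%perms = "tosa.const"() {value = dense<0> : tensor<i32>} : () -> tensor<i32>
%0 = "tosa.transpose"(%arg0, %perms) : (tensor<2x3xf32>, tensor<i32>) -> tensor<3x2xf32>
```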
Continuation of https://github.com/llvm/llvm-project/pull/74247 to fix
https://github.com/llvm/llvm-project/issues/56962. Fixes the verifier
for (Integer Attr):
```mlir
llvm.mlir.constant(1 : index) : f32
```
and (Dense Attr):
```mlir
llvm.mlir.constant(dense<100.0> : vector<1xf64>) : f32
```
## Integer Attr
The addition that this PR makes to `LLVM::ConstantOp::verify` is meant
to be exactly verifying the code in
`mlir::LLVM::detail::getLLVMConstant`:
9f78edbd20/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp (L350-L353)
One failure mode is when the `type` (`llvm.mlir.constant(<value>) :
<type>`) is not an `Integer`, because then the `cast` in
`getIntegerBitWidth` will crash:
dca432cb7b/llvm/include/llvm/IR/DerivedTypes.h (L97-L99)
So that's now caught in the verifier.
Apart from that, I don't see anything we could check for. `sextOrTrunc`
means "Sign extend or truncate to width" and that one is quite
permissive. For example, the following doesn't have to be caught in the
verifier as it doesn't crash during `mlir-translate -mlir-to-llvmir`:
```mlir
llvm.func @main() -> f32 {
%cst = llvm.mlir.constant(100 : i64) : f32
llvm.return %cst : f32
}
```
## Dense Attr
It crashes if the type is neither an MLIR vector type nor one of these:
9f78edbd20/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp (L375-L391)
This patch adds a target_features (TargetFeaturesAttr) to the LLVM
dialect to allow setting and querying the features in use on a function.
The motivation for this comes from the Arm SME dialect where we would
like a convenient way to check what variants of an operation are
available based on the CPU features.
Intended usage:
The target_features attribute is populated manually or by a pass:
```mlir
func.func @example() attributes {
target_features = #llvm.target_features<["+sme", "+sve", "+sme-f64f64"]>
} {
// ...
}
```
Then within a later rewrite the attribute can be checked, and used to
make lowering decisions.
```c++
// Finds the "target_features" attribute on the parent
// FunctionOpInterface.
auto targetFeatures = LLVM::TargetFeaturesAttr::featuresAt(op);
// Check a feature.
// Returns false if targetFeatures is null or the feature is not in
// the list.
if (!targetFeatures.contains("+sme-f64f64"))
return failure();
```
For now, this is rather simple: it just checks if the exact feature is
in the list, though it could be extended with implied features using
information from LLVM.
Add support for frame pointers in MLIR.
Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
Co-authored-by: Christian Ulmann <christianulmann@gmail.com>