This patch updates `transform.loop.peel` so that this Op returns two
rather than one handle:
* one for the peeled loop, and
* one for the remainder loop.
Also, following this change this Op will fail if peeling fails. This is
consistent with other similar Ops that also fail if no transformation
takes place.
Relands #67482 with an extra fix for transform_loop_ext.py
Rename and restructure tiling-related transform ops from the structured
extension to be more homogeneous. In particular, all ops now follow a
consistent naming scheme:
- `transform.structured.tile_using_for`;
- `transform.structured.tile_using_forall`;
- `transform.structured.tile_reduction_using_for`;
- `transform.structured.tile_reduction_using_forall`.
This drops the "_op" naming artifact from `tile_to_forall_op` that
shouldn't have been included in the first place, consistently specifies
the name of the control flow op to be produced for loops (instead of
`tile_reduction_using_scf` since `scf.forall` also belongs to `scf`),
and opts for the `using` connector to avoid ambiguity.
The loops produced by tiling are now systematically placed as *trailing*
results of the transform op. While this required changing 3 out of 4 ops
(except for `tile_using_for`), this is the only choice that makes sense
when producing multiple `scf.for` ops that can be associated with a
variadic number of handles. This choice is also most consistent with
*other* transform ops from the structured extension, in particular with
fusion ops, that produce the structured op as the leading result and the
loop as the trailing result.
This PR adds a new transform op that replaces `memref.alloca`s with
`memref.get_global`s to newly inserted `memref.global`s. This is useful,
for example, for allocations that should reside in the shared memory of
a GPU, which have to be declared as globals.
This PR renames the vectorization transform ops as follows:
* `structured.masked_vectorize` => `structured.vectorize`. This reflects
the fact that since [recently](https://reviews.llvm.org/D157774) the op
can also handle the unmasked case.
* `structured.vectorize` =>
`structured.vectorize_children_and_applies_patterns`. This reflects the
fact that the op does not just vectorize the given payload op but all
vectorizable children contained in it, and applies patterns before and
after for preparation and clean-up.
This rename was discussed first
[here](https://reviews.llvm.org/D157774).
The PR also adapts and cleans ups the tablegen description of the
`VectorizeChildrenAndApplyPatternsOp` (formerly `VectorizeOp`).
Don't generate enums from the main VectorOps.td file as that
transitively includes enums from Arith.
---------
Co-authored-by: Nicolas Vasilache <ntv@google.com>
- make sure that the type of induction variable should be determined by the type of the lower bound type.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D159534
The Linalg named ops specification went out of sync with the OpDSL
description, presumably due to direct manual modifications of the yaml
file.
Additionally, the unsigned division has been generating the signed
scalar instruction, which is now fixed.
This revision pipes the fastmath attribute support through the
vector.reduction op. This seemingly simple first step already requires
quite some genuflexions, file and builder reorganization. In the
process, retire the boolean reassoc flag deep in the LLVM dialect
builders and just use the fastmath attribute.
During conversions, templated builders for predicated intrinsics are
partially cleaned up. In the future, to finalize the cleanups, one
should consider adding fastmath to the VPIntrinsic ops.
* Moves several orphaned methods from Operation/OpView -> _OperationBase
so that both hierarchies share them (whether unknown or known to ODS).
* Adds typing information for missing `MLIRError` exception.
* Adds `DiagnosticInfo` typing.
* Adds `DenseResourceElementsAttr` typing that was missing.
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option will default to true now,
`create-deallocs` defaults to false and they, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed. A new `allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.
For some reason, the mix-ins of the Python bindings of this dialect used
the PDL type for "any op". However, PDL isn't involved here, so it makes
more sense to use the corresponding type of the transform dialect. This
PR changes that.
`_get_op_result_or_value` was used in mix-ins to unify the handling of
op results and values. However, that function is now called in the
generated constructors, such that doing so in the mix-ins is not
necessary anymore.
This is the first commit in a series with the goal to rework the
BufferDeallocation pass. Currently, this pass heavily relies on copies
to perform correct deallocations, which leads to very slow code and
potentially high memory usage. Additionally, there are unsupported cases
such as returning memrefs which this series of commits aims to add
support for as well.
This first commit removes the deallocation capabilities of
one-shot-bufferization.One-shot-bufferization should never deallocate any
memrefs as this should be entirely handled by the buffer-deallocation pass
going forward. This means the allow-return-allocs pass option will
default to true now, create-deallocs defaults to false and they, as well
as the escape attribute indicating whether a memref escapes the current region,
will be removed.
The documentation should w.r.t. these pass option changes should also be
updated in this commit.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D156662
This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.
This commit addresses Task 1.2 of the mentioned RFC. By renaming these operations, we align their names with LLVM intrinsics that have corresponding semantics.
This patch adds attribute builders for all buildable attributes from the
builtin dialect that did not previously have any. These builders can be
used to construct attributes of a particular type identified by a string
from a Python argument without knowing the details of how to pass that
Python argument to the attribute constructor. This is used, for example,
in the generated code of the Python bindings of ops.
The list of "all" attributes was produced with:
(
grep -h "ods_ir.AttrBuilder.get" $(find ../build/ -name "*_ops_gen.py") \
| cut -f2 -d"'"
git grep -ho "^def [a-zA-Z0-9_]*" -- include/mlir/IR/CommonAttrConstraints.td \
| cut -f2 -d" "
) | sort -u
Then, I only retained those that had an occurence in
`mlir/include/mlir/IR`. In particular, this drops many dialect-specific
attributes; registering those builders is something that those dialects
should do. Finally, I removed those attrbiutes that had a match in
`mlir/python/mlir/ir.py` already and implemented the remaining ones. The
only ones that still miss a builder now are the following:
* Represent more than one possible attribute type:
- `Any.*Attr` (9x)
- `IntNonNegative`
- `IntPositive`
- `IsNullAttr`
- `ElementsAttr`
* I am not sure what "constant attributes" are:
- `ConstBoolAttrFalse`
- `ConstBoolAttrTrue`
- `ConstUnitAttr`
* `Location` not exposed by Python bindings:
- `LocationArrayAttr`
- `LocationAttr`
* `get` function not implemented in Python bindings:
- `StringElementsAttr`
This patch also fixes a compilation problem with
`I64SmallVectorArrayAttr`.
Reviewed By: makslevental, rkayaith
Differential Revision: https://reviews.llvm.org/D159403
Memref descriptors contain an `offset` field that denotes the start of
the content of the memref relative to the `alignedPtr`. This offset is
not considered when converting a memref descriptor to a np.array in the
Python runtime library, essentially treating all memrefs as if they had
an offset of zero. This patch introduces the necessary pointer arithmetic
to find the actual beginning of the memref contents to the memref->numpy
conversion functions.
There is an ongoing discussion about whether the `offset` field is needed
at all in the memref descriptor.
Until that is decided, the Python runtime and CRunnerUtils should
still correctly implement the offset handling.
Related: https://reviews.llvm.org/D157008
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D158494
The mix-in of the `MultiTileSizesOp` set the default value of its
`divisor` argument. This repeats information from the tablegen
defintion, is not necessary (since the generic code deals with `None`
and default values), and has the risk of running out of sync without
people noticing. This patch removes the setting of the value and forward
`None` to the generic constructor instead.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D159416
This patch simplifies and improves the mix-in of the `TileOp`. In
particular:
* Accept all types of sizes (static, dynamic, scalable) in a single
argument `sizes`.
* Use the existing convenience function to dispatch different types of
sizes instead of repeating the implementation in the mix-in.
* Pass on `None` values as is of optional arguments to the init function
of the super class.
* Reformat with default indentation width (4 spaces vs 2 spaces).
* Add a a test for providing scalable sizes.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D159417
This patch removes some manual conversion of mixed Python/attribute
arguments to `I64ArrayAttr`s, which turned out to be unnecessary.
Interestingly, this change does not depend on the additional attribute
builders added in the (currently pending)
https://reviews.llvm.org/D159403 patch.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D159419
The mix-in did not allow to *not* set many of the arguments, even though
they represent optional attributes. Instead, it set default values,
which have different semantics in some cases. In other cases, setting
the default values is already done by the C++ layer, in which case they
are currently redundant and may be wrong in some potential future change
in the TD or C++ files. With this patch, `None` is preserved until the
generated binding, which handles them as desired.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D158844
This patch makes the `transform.structured.pad` op return also a handle
to the copy op that it inserts. This allows to continue transformation
on that op, such as mapping it to a GPU thread.
The patch was mainly authored by @springerm as part of the WIP patch
https://reviews.llvm.org/D156371, which also has an example usage of
this change.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D159088
Extends the existing mix-in for VectorizeOp with support for the missing unit attributes.
Also fixes the unintuitive implementation where
`structured.VectorizeOp(target=target, vectorize_padding=False)` still resulted in the creation of the UnitAttr `vectorize_padding`.
Reviewed By: ingomueller-net
Differential Revision: https://reviews.llvm.org/D158726
This PR implements python enum bindings for *all* the enums - this includes `I*Attrs` (including positional/bit) and `Dialect/EnumAttr`.
There are a few parts to this:
1. CMake: a small addition to `declare_mlir_dialect_python_bindings` and `declare_mlir_dialect_extension_python_bindings` to generate the enum, a boolean arg `GEN_ENUM_BINDINGS` to make it opt-in (even though it works for basically all of the dialects), and an optional `GEN_ENUM_BINDINGS_TD_FILE` for handling corner cases.
2. EnumPythonBindingGen.cpp: there are two weedy aspects here that took investigation:
1. If an enum attribute is not a `Dialect/EnumAttr` then the `EnumAttrInfo` record is canonical, as far as both the cases of the enum **and the `AttrDefName`**. On the otherhand, if an enum is a `Dialect/EnumAttr` then the `EnumAttr` record has the correct `AttrDefName` ("load bearing", i.e., populates `ods.ir.AttributeBuilder('<NAME>')`) but its `enum` field contains the cases, which is an instance of `EnumAttrInfo`. The solution is to generate an one enum class for both `Dialect/EnumAttr` and "independent" `EnumAttrInfo` but to make that class interopable with two builder registrations that both do the right thing (see next sub-bullet).
2. Because we don't have a good connection to cpp `EnumAttr`, i.e., only the `enum class` getters are exposed (like `DimensionAttr::get(Dimension value)`), we have to resort to parsing e.g., `Attribute.parse(f'#gpu<dim {x}>')`. This means that the set of supported `assemblyFormat`s (for the enum) is fixed at compile of MLIR (currently 2, the only 2 I saw). There might be some things that could be done here but they would require quite a bit more C API work to support generically (e.g., casting ints to enum cases and binding all the getters or going generically through the `symbolize*` methods, like `symbolizeDimension(uint32_t)` or `symbolizeDimension(StringRef)`).
A few small changes:
1. In addition, since this patch registers default builders for attributes where people might've had their own builders already written, I added a `replace` param to `AttributeBuilder.insert` (`False` by default).
2. `makePythonEnumCaseName` can't handle all the different ways in which people write their enum cases, e.g., `llvm.CConv.Intel_OCL_BI`, which gets turned into `INTEL_O_C_L_B_I` (because `llvm::convertToSnakeFromCamelCase` doesn't look for runs of caps). So I dropped it. On the otherhand regularization does need to done because some enums have `None` as a case (and others might have other python keywords).
3. I turned on `llvm` dialect generation here in order to test `nvvm.WGMMAScaleIn`, which is an enum with [[ d7e26b5620/mlir/include/mlir/IR/EnumAttr.td (L22-L25) | no explicit discriminator ]] for the `neg` case.
Note, dialects that didn't get a `GEN_ENUM_BINDINGS` don't have any enums to generate.
Let me know if I should add more tests (the three trivial ones I added exercise both the supported `assemblyFormat`s and `replace=True`).
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D157934
In particular:
* Fix and extend the support for constructing possibly nested ArrayAttrs
from lists of Python ints. This can probably be generalized further
and used in many more places.
* Add arguments for `pad_to_multiple_of` and `copy_back_op`.
* Format with black and reorder (keyword-only) arguments to match
tablegen and (`*_gen.py`) order.
* Extend tests for new features.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D157789
Fix forward bug in dac19b457e, which uses
the vertical bar operator for type hints, which is only supported by
Python 3.10 and later, and thus breaks the builds on Python 3.8.
This patch adds a mix-in class for the only transform op of the tensor
dialect that can benefit from one: the MakeLoopIndependentOp. It adds an
overload that makes providing the return type optional.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D156918
This op is the batched version of linalg.mmt4d. It performs matrix-matrix-transpose multiplication of batched 4-d (5d) inputs as the following:
```
C[b, m1, n1, m0, n0] = sum_{b, k1, k0}(A[b, m1, k1, m0, k0] * B[b, n1, k1, n0, k0])
```
The current use is to provide `linalg.batch_matmul` a lowering path similar to `linalg.matmul -> linalg.mmt4d`.
Differential Revision: https://reviews.llvm.org/D156912
This patch uses the new enum binding generation to add the enums of the
dialect to the Python bindings and uses them in the mix-in class where
it was still missing (namely, the `LayoutMapOption` for the
`function_boundary_type_conversion` of the `OneShotBufferizeOp`.
The patch also piggy-backs a few smaller clean-ups:
* Order the keyword-only arguments alphabetically.
* Add the keyword-only arguments to an overload where they were left out
by accident.
* Change some of the attribute values used in the tests to non-default
values such that they show up in the output IR and check for that
output.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D156664
Create a mix-in class with an overloaded constructor that makes the
return type optional.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D156561
Provide Python bindings for transform ops defined in the vector dialect.
All of these ops are sufficiently simple that no mixins are necessary
for them to be nicely usable.
Reviewed By: ingomueller-net
Differential Revision: https://reviews.llvm.org/D156554
Add an ODS (tablegen) backend to generate Python enum classes and
attribute builders for enum attributes defined in ODS. This will allow
us to keep the enum attribute definitions in sync between C++ and
Python, as opposed to handwritten enum classes in Python that may end up
using mismatching values. This also makes autogenerated bindings more
convenient even in absence of mixins.
Use this backend for the transform dialect failure propagation mode enum
attribute as demonstration.
Reviewed By: ingomueller-net
Differential Revision: https://reviews.llvm.org/D156553
This patch creates the .td files for the Python bindings of the
transform ops of the MemRef dialect and integrates them into the build
systems (CMake and Bazel).
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D156536