This also allows tensor.empty in the "conversion" path of the sparse
compiler, further paving the way to
deprecate the bufferization.alloc_tensor() op.
These do not produce extension-specific ops and are handled via common
patterns for both the KHR and the NV coop matrix extensions.
Also improve match failure reporting and error handling in type
conversion.
This commit implements `LoopLikeOpInterface` on `scf.while`. This
enables LICM (and potentially other transforms) on `scf.while`.
`LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and
can now return multiple regions.
Also fix a bug in the default implementation of
`LoopLikeOpInterface::isDefinedOutsideOfLoop()`: it returned "false" for
certain values that are defined outside of the loop, namely values
defined in a nested op in such a way that they do not dominate the loop.
This interface is currently only used for LICM and there is no way to
trigger this bug, so no test is added.
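A minimal sketch (not from this commit's tests; names and types are illustrative) of the kind of hoisting this enables: the `arith.mulf` below only depends on values defined outside the loop, so LICM can now move it out of the `scf.while` body.
```
func.func @while_licm(%n: index, %a: f32, %b: f32) -> f32 {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %zero = arith.constant 0.0 : f32
  %res:2 = scf.while (%i = %c0, %acc = %zero) : (index, f32) -> (index, f32) {
    %cond = arith.cmpi slt, %i, %n : index
    scf.condition(%cond) %i, %acc : index, f32
  } do {
  ^bb0(%i: index, %acc: f32):
    // Loop-invariant and side-effect free: hoistable by LICM now that
    // `scf.while` implements `LoopLikeOpInterface`.
    %inv = arith.mulf %a, %b : f32
    %acc_next = arith.addf %acc, %inv : f32
    %i_next = arith.addi %i, %c1 : index
    scf.yield %i_next, %acc_next : index, f32
  }
  return %res#1 : f32
}
```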
This function was inconsistent with the rest of the API because it
accepted `OpOperand &`s that do not belong to the op, whereas all other
functions assert in that case. The helper is also not really necessary,
as the iter_arg number is identical to the result number (see the
example below).
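For illustration (a hypothetical `scf.for`, not taken from the patch), the correspondence is purely positional:
```
// %a is iter_arg #0 and corresponds to result %r#0,
// %b is iter_arg #1 and corresponds to result %r#1.
%r:2 = scf.for %iv = %lb to %ub step %c1
    iter_args(%a = %init0, %b = %init1) -> (f32, f32) {
  scf.yield %a, %b : f32, f32
}
```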
This patch adjusts the lowering to LLVM-IR inside of
OpenMPToLLVMIRTranslation to facilitate the changes made
to Target-related operations to add the new Map-related
operations. It also includes adjustments to tests to support
these changes, primarily modifying the MLIR rather than
the LLVM-IR; the LLVM-IR should be identical after this patch.
Depends on D158735
Reviewers: kiranchandramohan, TIFitis, razvanlupusoru
Differential Revision: https://reviews.llvm.org/D158737
This patch adds two new operations.
The first is the DataBoundsOp, which is based on OpenACC's DataBoundsOp.
It holds the stride, index, extent, lower bound and upper bound, which
will be used in future follow-up patches to perform initial
array sectioning of mapped arrays as well as Fortran pointer and
allocatable mapping. Similarly to OpenACC, this new OpenMP
DataBoundsOp also comes with a new OpenMP type, which
helps restrict operations to accepting only
DataBoundsOps (or other related operations that return this
type) as an input or output where necessary.
The patch also adds the MapInfoOp, which rolls up some of
the map information previously stored in target
operations into this new operation, and adds new
information that will be utilised in the lowering of mapped
variables, e.g. the aforementioned DataBoundsOp, as well as a
new ByCapture OpenMP MLIR attribute and an isImplicit boolean
attribute. Both the ByCapture and isImplicit arguments
affect the lowering from the OpenMP dialect to LLVM-IR in
minor but important ways, such as shifting the final map type
or generating different load/store combinations, to maintain
the semantics of the OpenMP standard and to align with the
current Clang OpenMP output as closely as possible.
The MapInfoOp is loosely based on OpenACC's DataEntryOp.
The main difference, other than some slightly different fields
(e.g. isImplicit/MapType/ByCapture), is that OpenACC's data
operations "inherit" (in the MLIR ODS sense) from DataEntryOp,
whereas in OpenMP the operations that utilise MapInfoOps are
composed of / contain them.
A series of these MapInfoOps (one per map clause list item) is
now held by target operations representing OpenMP
directives that utilise map clauses, e.g. TargetOp. MapInfoOps
do not have their own specialised lowering to LLVM-IR; instead,
the lowering depends on the particular container of the
MapInfoOps. For example, TargetOp has its own lowering to LLVM-IR
which utilises the information stored inside the MapInfoOps to
affect its lowering and the resulting LLVM-IR,
which in turn can differ for host and device.
This patch contains these operations, minor changes to the
printing and parsing to support them, changes to tests (only
those relevant to this segment of the patch; other test
additions and changes are in the other dependent
patches in this series) and some alterations to the OpenMPToLLVM
rewriter to support the new OpenMP type and operations.
This patch is one in a series that are dependent on each
other:
https://reviews.llvm.org/D158734
https://reviews.llvm.org/D158735
https://reviews.llvm.org/D158737
Reviewers: kiranchandramohan, TIFitis, razvanlupusoru
Differential Revision: https://reviews.llvm.org/D158732
Instead of checking whether the last operation might be a terminator,
always insert operations at the end of the block.
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
In line with #66515, change `MutableArrayRange::begin`/`end` to
enumerate `OpOperand &` instead of `Value`. Also remove
`ForOp::getIterOpOperands`/`setIterArg`, which are now redundant.
Note: `MutableOperandRange` cannot be made a derived class of
`indexed_accessor_range_base` (like `OperandRange`), because
`MutableOperandRange::assign` can change the number of operands in the
range.
* Moves several orphaned methods from Operation/OpView -> _OperationBase
so that both hierarchies share them (whether unknown or known to ODS).
* Adds typing information for missing `MLIRError` exception.
* Adds `DiagnosticInfo` typing.
* Adds `DenseResourceElementsAttr` typing that was missing.
Enable usage where capturing AsmState is beneficial (e.g., avoiding creating an AsmState over and over again when walking IR and printing).
This only changes one C API to verify the plumbing. But using AsmState makes the cost more explicit than the flags interface (which hides the traversals and construction involved) and also enables more efficient usage on the C side.
Bufferization of tensor.reshape generates a memref.reshape operation.
memref.reshape requires the source memref to have an identity layout,
but the bufferization process may produce a source memref with a
non-identity layout, which leads to a verification failure.
This change makes the bufferization interface for tensor.reshape copy
the source memref into a new buffer when the source has a non-identity
layout.
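A sketch of the effect (shapes and value names are illustrative, not taken from the patch's tests):
```
// Tensor-level input:
//   %r = tensor.reshape %src(%shape)
//       : (tensor<?x?xf32>, tensor<1xindex>) -> tensor<?xf32>
// If the buffer of %src ends up with a non-identity (e.g. strided)
// layout, the source is now first copied into an identity-layout buffer:
%alloc = memref.alloc(%d0, %d1) : memref<?x?xf32>
memref.copy %src_buf, %alloc
    : memref<?x?xf32, strided<[?, ?], offset: ?>> to memref<?x?xf32>
%r_buf = memref.reshape %alloc(%shape_buf)
    : (memref<?x?xf32>, memref<1xindex>) -> memref<?xf32>
```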
The added pass optimizes MLProgram global operations by reducing them to
only the minimal load/store operations needed for global tensors. This
avoids unnecessary global operations throughout a program and
potentially improves operation fusion.
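For instance (hypothetical IR, not from the pass's tests), a reload of a global that was just stored can be forwarded, removing one global access:
```
ml_program.global private mutable @counter(dense<0> : tensor<i64>) : tensor<i64>

func.func @step(%v: tensor<i64>) -> tensor<i64> {
  ml_program.global_store @counter = %v : tensor<i64>
  // This reload can be replaced by %v and the load removed.
  %reload = ml_program.global_load @counter : tensor<i64>
  return %reload : tensor<i64>
}
```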
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D159228
The method implementations remain in the .cpp file; explicit instantiations have been added for these two types.
makeMatrix has been duplicated to makeIntMatrix and makeFracMatrix.
…cast) expansion
This revision adds a rewrite for sequences of vector `ext(bitcast)` to
use a more efficient sequence of vector operations comprising `shuffle`
and `bitwise` ops.
Such patterns appear naturally when writing quantization /
dequantization functionality with the vector dialect.
The rewrite performs a simple enumeration of each of the bits in the
result vector and determines its provenance in the source vector. The
enumeration is used to generate the proper sequence of `shuffle`,
`andi`, `ori` and shift ops.
The rewrite currently only applies to 1-D non-scalable vectors and bails
out if the bitwidth of the final vector element type is not a multiple
of 8. This is a
failsafe heuristic determined empirically: if the resulting type is not
an even number of bytes, further complexities arise that are not
improved by this pattern: the heavy lifting still needs to be done by
LLVM.
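As an illustration (types chosen for this example, not taken from the patch), a typical sub-byte dequantization idiom that the pattern targets is:
```
// Reinterpret 4 bytes as 8 4-bit values, then zero-extend each to i32.
%0 = vector.bitcast %bytes : vector<4xi8> to vector<8xi4>
%1 = arith.extui %0 : vector<8xi4> to vector<8xi32>
```
After the rewrite, the same result is computed directly from the `vector<4xi8>` input using `vector.shuffle`, shifts, `arith.andi` and `arith.ori`, without materializing the sub-byte `vector<8xi4>` intermediate.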
This is a minor step towards deprecating bufferization.alloc_tensor().
It replaces the examples with tensor.empty() and adjusts the underlying
rewriting logic to prepare for this upcoming change.
Use the recently introduced llvm.mlir.zero operation for values with
LLVM target extension type. This replaces the previous workaround that
used a constant operation with a single zero-valued integer attribute.
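For example (the concrete target extension type below is illustrative), the conversion can now emit:
```
%0 = llvm.mlir.zero : !llvm.target<"spirv.Event">
```
instead of a placeholder integer constant.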
Signed-off-by: Lukas Sommer <lukas.sommer@codeplay.com>
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option now defaults to true and
`create-deallocs` defaults to false; both options, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed. A new `allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.
This revision adds support for empty tensor elimination to
"bufferization.materialize_in_destination" by implementing the
`SubsetInsertionOpInterface`.
Furthermore, the One-Shot Bufferize conflict detection is improved for
"bufferization.materialize_in_destination".
…(trunci) expansion
This revision adds a rewrite for sequences of vector `bitcast(trunci)`
to use a more efficient sequence of vector operations comprising
`shuffle` and `bitwise` ops.
Such patterns appear naturally when writing quantization /
dequantization functionality with the vector dialect.
The rewrite performs a simple enumeration of each of the bits in the
result vector and determines its provenance in the pre-trunci vector.
The enumeration is used to generate the proper sequence of `shuffle`,
`andi`, `ori` followed by an optional final `trunci`/`extui`.
The rewrite currently only applies to 1-D non-scalable vectors and bails
out if the bitwidth of the final vector element type is not a multiple
of 8. This is a
failsafe heuristic determined empirically: if the resulting type is not
an even number of bytes, further complexities arise that are not
improved by this pattern: the heavy lifting still needs to be done by
LLVM.
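For example (types chosen for illustration), the quantization-side counterpart that this pattern targets is:
```
// Truncate to 4-bit values, then view pairs of them as bytes.
%0 = arith.trunci %vals : vector<8xi32> to vector<8xi4>
%1 = vector.bitcast %0 : vector<8xi4> to vector<4xi8>
```
After the rewrite, the packed `vector<4xi8>` is assembled directly from the `vector<8xi32>` input via `vector.shuffle`, shifts, `arith.andi` and `arith.ori`, plus an optional final `arith.trunci`/`arith.extui`.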
Use the new ownership based deallocation pass pipeline in the regression
and integration tests. Some one-shot bufferization tests tested one-shot
bufferize and deallocation at the same time. I removed the deallocation
pass there because the deallocation pass is already thoroughly tested by
itself.
Fixed version of #66471
`operator[]` returns `OpOperand &` instead of `Value`.
* This allows users to get OpOperands by name instead of "magic" number.
E.g., `extractSliceOp->getOpOperand(0)` can be written as
`extractSliceOp.getSourceMutable()[0]`.
* `OperandRange` provides a read-only API to operands: `operator[]`
returns `Value`. `MutableOperandRange` now provides a mutable API:
`operator[]` returns `OpOperand &`, which can be used to set operands.
Note: The TableGen code generator could be changed to return `OpOperand
&` (instead of `MutableOperandRange`) for non-variadic and non-optional
arguments in a subsequent change. Then the `[0]` part in the above
example would no longer be necessary.
* Always use the auto-generated `getInitArgs` function. Remove the
hand-written `getInitOperands` duplicate.
* Remove `hasIterOperands` and `getNumIterOperands`. The names were
inconsistent because the "arg" is called `initArgs` in TableGen. Use
`getInitArgs().size()` instead.
* Fix verification around ops with no results.
Add capabilities for comparing opaque properties. This is useful when
dealing with arbitrary operations which can be compared based on their
OperationName. Now you can furthermore compare their properties without
the need to determine their actual type.
Before this change, the `ByteCode` test failed on CentOS 7 with
devtoolset-9, because strings happen to be only 8 byte aligned. In
general though, strings have no alignment guarantee.
Increase resource alignment in test to 32 bytes.
Adjust test to sufficiently align buffer.
Add test to check error when buffer is insufficiently aligned.
This patch changes the MaxScale value for level EightK to 256. It also
updates the affected level check tests.
Change-Id: Id9cffd5eb9053bb688196cd5b3b55b3ddd2a359c
Signed-off-by: Tai Ly <tai.ly@arm.com>
The previous implementation crashed if run on a `builtin.module` using
an `op_name` filter (because the initial value of `parent` in the while
loop was a `nullptr`). This PR fixes the crash and adds a test.
Rationale:
A bufferization.alloc_tensor can be directly replaced
with tensor.empty since these are more or less semantically
equivalent. The latter is considered a bit more "pure"
with respect to SSA semantics.
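Concretely (a minimal illustrative example):
```
// Before:
%0 = bufferization.alloc_tensor() : tensor<8x16xf32>
// After:
%0 = tensor.empty() : tensor<8x16xf32>
```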
One of the main users of this kind of coroutine is Swift, where yield-once (`retcon.once`) coroutines are used to temporarily "expose" pointers to internal fields of various objects, creating borrow scopes.
However, in some cases it might also be useful to allow these coroutines to produce a normal result, but there is no convenient way to represent this (as compared to the switched-resume kind of coroutines, where C++ `co_return`
is transformed into a member / callback call on the promise object).
The extension is simple: we allow the continuation function to have a non-void result and accept optional extra arguments via a special `llvm.coro.end.result` intrinsic that essentially forwards them as normal results.
This patch fixes the shared clause for the task construct with multiple
shared variables. The shareds field in the kmp_task_t is not an inline
array in the struct; rather, it is a pointer to an array. With an inline
array, the pointer dereference in the outlined function body of the task
would cause a segmentation fault when accessed by the runtime.
Reviewed By: kiranchandramohan, jdoerfert
Differential Revision: https://reviews.llvm.org/D158462
This patch makes sure that the following case is lowered correctly
("duplication"):
```
func.func @broadcast_scalable_duplication(%arg0: vector<[32]xf32>) -> vector<1x[32]xf32> {
%res = vector.broadcast %arg0 : vector<[32]xf32> to vector<1x[32]xf32>
return %res : vector<1x[32]xf32>
}
```
To avoid confusion with vectorization-masked.mlir, move
test/Dialect/Linalg/masked_vectorization.mlir
to:
test/Dialect/Linalg/transpose-compose-masked-vectorize-and-cleanups.mlir
The updated name better reflects what's being tested.
Differential Revision: https://reviews.llvm.org/D158427
Change `SingleBlock::{insert,push_back}` to avoid inserting the argument
operation after the block's terminator. This allows removing
`SingleBlockImplicitTerminator`'s functions with the same name.
Define `Block::hasTerminator`, which checks whether the block has a
terminator operation.
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
This reverts commit ea42b49f1069ec647cbc67e6360026bc4e5d9a60.
Some GPU integration tests are failing that I didn't observe locally.
Reverting until I have a fix.
Use the new ownership based deallocation pass pipeline in the regression
and integration tests. Some one-shot bufferization tests tested one-shot
bufferize and deallocation at the same time. I removed the deallocation
pass there because the deallocation pass is already thoroughly tested by
itself.