According to the specification, the `shift` attribute of the Mul operator in
TOSA is of signless i8 type instead of i32.
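A minimal sketch of the updated attribute (generic op form; operands, shapes,
and values are hypothetical):
```mlir
// Hypothetical example: the shift is now carried as a signless i8 attribute.
%0 = "tosa.mul"(%a, %b) {shift = 1 : i8}
    : (tensor<4xi32>, tensor<4xi32>) -> tensor<4xi32>
```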
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
The buffer deallocation pipeline was previously incorrect when applied to
functions. It has since been fixed. Make sure it is exercised in the
tutorial to avoid leaking allocations.
Values that are the result of buffer allocation ops are guaranteed *not*
to be the same allocation as the block arguments of containing blocks.
This fact can be used to allow more aggressive simplification of
`bufferization.dealloc` ops.
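A sketch of the kind of IR this enables simplifying (dealloc syntax
approximate, function hypothetical):
```mlir
func.func @f(%arg0: memref<2xf32>, %cond: i1) {
  // %alloc is the result of an allocation op, so it can never be the same
  // allocation as the block argument %arg0; the retained-value handling of
  // the dealloc op below can therefore be simplified.
  %alloc = memref.alloc() : memref<2xf32>
  %ownership = bufferization.dealloc (%alloc : memref<2xf32>) if (%cond)
                 retain (%arg0 : memref<2xf32>)
  return
}
```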
During vectorization of an affine loop, a vector size of 0 causes a crash
when building an invalid AffineForOp. We can catch this case before it
propagates to the assertion.
See: https://github.com/llvm/llvm-project/issues/64262
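A hedged reproducer along these lines (pass and option names assumed from the
in-tree affine super-vectorizer):
```mlir
// Running e.g. `mlir-opt --affine-super-vectorize="virtual-vector-size=0"`
// over a loop like this used to assert instead of rejecting the zero size.
func.func @f(%m: memref<128xf32>) {
  %c = arith.constant 1.0 : f32
  affine.for %i = 0 to 128 {
    affine.store %c, %m[%i] : memref<128xf32>
  }
  return
}
```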
This patch updates AffineParallelOp::verify() to check that each result type
matches its corresponding reduction op (e.g., the result type must be a
`FloatType` if the reduction attribute is `addf`).
affine.parallel crashes on --lower-affine if a result type does not match its
reduction attribute.
```mlir
%128 = affine.parallel (%arg2, %arg3) = (0, 0) to (8, 7) reduce ("maxf") -> (memref<8x7xf32>) {
%alloc_33 = memref.alloc() : memref<8x7xf32>
affine.yield %alloc_33 : memref<8x7xf32>
}
```
This crashes and reports a type conversion issue when we run `mlir-opt --lower-affine`:
```
Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 572.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0. Program arguments: mlir-opt --lower-affine temp.mlir
#0 0x0000000102a18f18 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/workspacebin/mlir-opt+0x1002f8f18)
#1 0x0000000102a171b4 llvm::sys::RunSignalHandlers() (/workspacebin/mlir-opt+0x1002f71b4)
#2 0x0000000102a195c4 SignalHandler(int) (/workspacebin/mlir-opt+0x1002f95c4)
#3 0x00000001be7894c4 (/usr/lib/system/libsystem_platform.dylib+0x1803414c4)
#4 0x00000001be771ee0 (/usr/lib/system/libsystem_pthread.dylib+0x180329ee0)
#5 0x00000001be6ac340 (/usr/lib/system/libsystem_c.dylib+0x180264340)
#6 0x00000001be6ab754 (/usr/lib/system/libsystem_c.dylib+0x180263754)
#7 0x0000000106864790 mlir::arith::getIdentityValueAttr(mlir::arith::AtomicRMWKind, mlir::Type, mlir::OpBuilder&, mlir::Location) (.cold.4) (/workspacebin/mlir-opt+0x104144790)
#8 0x0000000102ba66ac mlir::arith::getIdentityValueAttr(mlir::arith::AtomicRMWKind, mlir::Type, mlir::OpBuilder&, mlir::Location) (/workspacebin/mlir-opt+0x1004866ac)
#9 0x0000000102ba6910 mlir::arith::getIdentityValue(mlir::arith::AtomicRMWKind, mlir::Type, mlir::OpBuilder&, mlir::Location) (/workspacebin/mlir-opt+0x100486910)
...
```
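For contrast, a reduction whose result type matches its reduction attribute
(illustrative sketch) passes the new check:
```mlir
// "maxf" reduces to a FloatType result, so this verifies.
%r = affine.parallel (%i) = (0) to (8) reduce ("maxf") -> (f32) {
  %v = arith.constant 1.0 : f32
  affine.yield %v : f32
}
```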
Fixes #64068
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D157985
Variables that point to a physical storage buffer require aliasing
decorations, as specified by the `SPV_KHR_physical_storage_buffer`
extension.
Also add an example of a variable with a decoration attribute.
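A hypothetical sketch of what such a variable might look like (decoration
attribute spelling assumed, not taken verbatim from the patch):
```mlir
// Hypothetical: a pointer into a physical storage buffer carrying an
// aliasing decoration.
spirv.func @f(%arg0: !spirv.ptr<f32, PhysicalStorageBuffer>
    { spirv.decoration = #spirv.decoration<Aliased> }) "None" {
  spirv.Return
}
```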
This partially reverts #66380, dropping the assertion that the underlying
buffer of an EncodingReader is aligned to any required alignments for resource
sections. Resources know their own alignment and pad their buffers
accordingly, but the bytecode reader doesn't know that ahead of time.
Consequently, it cannot give the resource EncodingReader a base buffer
aligned to the maximum required alignment.
A simple example from the test fails without this:
```mlir
module @TestDialectResources attributes {
  bytecode.test = dense_resource<resource> : tensor<4xi32>
} {}

{-#
  dialect_resources: {
    builtin: {
      resource: "0x2000000001000000020000000300000004000000",
      resource_2: "0x2000000001000000020000000300000004000000"
    }
  }
#-}
```
Replace some uses of `Type::getPointerTo` in two ways:
* Remove it entirely if it's only used to support an unnecessary bitcast
(removing the bitcast as well).
* Replace it with `PointerType::get`/`PointerType::getUnqual`.
Part of the NFC opaque pointer clean-up effort.
[MLIR] Add stage and effectOnFullRegion to side effects
This patch adds stage and effectOnFullRegion to side effects so that
optimization passes can obtain more accurate information.
Stage uses numbering to track the stage at which a side effect occurs.
EffectOnFullRegion indicates whether the effect acts on every single value of
the resource.
RFC discussion: https://discourse.llvm.org/t/rfc-add-effect-index-in-memroy-effect/72235
Reviewed By: mehdi_amini, Mogball
Differential Revision: https://reviews.llvm.org/D156087
SerializeToHsaco, as currently implemented, leaks the file descriptor
of the .hsaco temporary file, which causes issues in long-running
parallel compilation setups.
See also https://github.com/ROCmSoftwarePlatform/rocMLIR/pull/1257
-- SoftmaxOp's `reifyResultShapes` function was wrongly casting it to a
`LinalgOp`.
-- This commit thus adds a fix to SoftmaxOp's reified result shape
calculation.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
Added an extra comment that should clarify the need for an insertion guard
when using `getLoopOverTileSlices`. Also removed some redundant calls to
`setInsertionPointAfter`; the insertion guard would overwrite that on
destruction anyway.
This adds a simple higher-level op for the tile slice to vector
intrinsics (and updates the existing vector.print lowering to use it).
This op will be used a few more times to implement vector.insert/extract
lowerings in later patches.
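A usage sketch of such an op (op name and syntax assumed for illustration, not
verbatim from the patch):
```mlir
// Hypothetical: read tile slice %row of a 32-bit SME tile into a
// scalable vector.
%slice = arm_sme.move_tile_slice_to_vector %tile[%row]
    : vector<[4]xf32> from vector<[4]x[4]xf32>
```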
This patch adds the OutlineableOpenMPOpInterface to omp.target. This prevents other operations inside the target region, such as WSLoop, from hoisting new allocas outside the region.
Define operations that wrap the gfx940's new operations for converting
between f32 and registers containing packed sets of four 8-bit floats.
Define rocdl operations for the intrinsics and an AMDGPU dialect
wrapper around them (to account for the fact that MLIR distinguishes
the two float formats at the type level but that the LLVM IR does
not).
Define an ArithToAMDGPU pass, meant to run before conversion to LLVM,
that replaces relevant calls to arith.extf and arith.truncf with the
packed operations in the AMDGPU dialect. Note that the conversion
currently only handles scalars and vectors of rank <= 1, as we do not
have a use case for multi-dimensional vector support right now.
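For illustration, the kind of arith conversions the pass targets (types
illustrative; gfx940 works on the FNUZ 8-bit float variants):
```mlir
// Scalar f8 <-> f32 conversions as emitted by arith; the pass replaces
// these with the packed AMDGPU equivalents.
%e = arith.extf %x : f8E4M3FNUZ to f32
%t = arith.truncf %y : f32 to f8E4M3FNUZ
```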
Reviewed By: jsjodin
Differential Revision: https://reviews.llvm.org/D152457
This patch updates `transform.loop.peel` so that this Op returns two
handles rather than one:
* one for the peeled loop, and
* one for the remainder loop.
Also, following this change, this Op will fail if peeling fails. This is
consistent with other similar Ops that also fail if no transformation
takes place.
Relands #67482 with an extra fix for transform_loop_ext.py
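A usage sketch with the two results (handle names and types assumed for
illustration):
```mlir
// After this change, both the peeled loop and the remainder loop are
// returned as separate handles.
%peeled, %remainder = transform.loop.peel %loop
    : (!transform.op<"scf.for">)
    -> (!transform.op<"scf.for">, !transform.op<"scf.for">)
```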
The following patterns
- TransferReadToVectorLoadLowering
- TransferWriteToVectorStoreLowering
attempt to generate invalid vector.maskedload and vector.maskedstore ops
for non-rank-1 vector types; these ops only operate on 1-D vectors. This
patch adds a check to prevent this.
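For reference, these ops are inherently 1-D; a valid use looks like this
(sketch, values hypothetical):
```mlir
// vector.maskedload reads a 1-D vector; rank-2 vector types are invalid here.
%v = vector.maskedload %base[%i], %mask, %pass_thru
    : memref<?xf32>, vector<8xi1>, vector<8xf32> into vector<8xf32>
```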
The vector.extract assembly format currently only contains the source
type, for example:
%1 = vector.extract %0[1] : vector<3x7x8xf32>
It's not immediately obvious whether this is the source or the result type. This
patch improves the assembly format to make this clearer, so the above
becomes:
%1 = vector.extract %0[1] : vector<7x8xf32> from vector<3x7x8xf32>
This patch adds support for lowering a vector.transfer_read with a
transpose permutation map to a vertical tile load, for example:
vector.transfer_read ... permutation_map: (d0, d1) -> (d1, d0)
is converted to:
arm_sme.tile_load ... <vertical>
On SME the transpose can be done in-flight, rather than as a separate
operation as in the TransferReadPermutationLowering, which would do the
following:
%0 = vector.transfer_read ...
vector.transpose %0, [1, 0] ...
The lowering doesn't support masking yet and the transfer_read must be
in-bounds. It also intentionally doesn't handle simple loads as
transfer_write currently does, as the generic
TransferReadToVectorLoadLowering can lower these to simple vector.load
ops, which can already be lowered to ArmSME.
A subsequent patch will update the existing transfer_write lowering,
this is a separate patch as there is currently no lowering for
vector.transfer_read.
Inserting clones requires a lot of assumptions to hold on the input IR, e.g., all writes to a buffer need to dominate all reads. This is not guaranteed by one-shot bufferization and isn't easy to verify, thus it could quickly lead to incorrect results that are hard to debug.

This commit changes the mechanism of how an ownership indicator is materialized when there is not already a unique ownership present. Additionally, we don't create copies of returned memrefs anymore when we don't have ownership. Instead, we insert assert operations to make sure we have ownership at runtime, or otherwise report to the user that correctness could not be guaranteed.
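A sketch of the new behavior, assuming the runtime check is emitted as a
`cf.assert` (ops and message illustrative):
```mlir
// Instead of cloning the returned buffer when ownership is unknown,
// a runtime check is emitted.
cf.assert %has_ownership, "expected ownership of returned memref"
return %ret : memref<?xf32>
```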
In order to support indirect vararg calls, we need to have information about the
callee type; this patch adds a `callee_type` attribute that holds it.
The attribute is required for vararg calls; otherwise, it is optional and the
callee type is inferred from the operands and results of the operation if not
present.
The syntax for non-vararg calls remains the same, whereas for vararg calls, it
is changed to this:
```mlir
llvm.call %p(%arg0, %arg0) vararg(!llvm.func<void (i32, ...)>) : !llvm.ptr, (i32, i32) -> ()
llvm.call @s(%arg0, %arg0) vararg(!llvm.func<void (i32, ...)>) : (i32, i32) -> ()
```