Current generated Python binding for the SCF dialect does not allow
users to call IfOp to create if-else branches on their own.
This PR sets up the default binding generation for scf.if operation
to address this problem.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D121076
This clarifies that this is an LLVM specific variable and avoids
potential conflicts with other projects.
Differential Revision: https://reviews.llvm.org/D119918
* It doesn't required by OpenCL/Intel Level Zero and can be set programmatically.
* Add GPU to spirv lowering in case when attribute is not present.
* Set higher benefit to WorkGroupSizeConversion pattern so it will always try to lower first from the attribute.
Differential Revision: https://reviews.llvm.org/D120399
Early adoption of new technologies or adjusting certain code generation/IR optimization thresholds
is often available through some cl::opt options (which have unstable surfaces).
Specifying such an option twice will lead to an error.
```
% clang -c a.c -mllvm -disable-binop-extract-shuffle -mllvm -disable-binop-extract-shuffle
clang (LLVM option parsing): for the --disable-binop-extract-shuffle option: may only occur zero or one times!
% clang -c a.c -mllvm -hwasan-instrument-reads=0 -mllvm -hwasan-instrument-reads=0
clang (LLVM option parsing): for the --hwasan-instrument-reads option: may only occur zero or one times!
% clang -c a.c -mllvm --scalar-evolution-max-arith-depth=32 -mllvm --scalar-evolution-max-arith-depth=16
clang (LLVM option parsing): for the --scalar-evolution-max-arith-depth option: may only occur zero or one times!
```
The option is specified twice, because there is sometimes a global setting and
a specific file or project may need to override (or duplicately specify) the
value.
The error is contrary to the common practice of getopt/getopt_long command line
utilities that let the last option win and the `getLastArg` behavior used by
Clang driver options. I have seen such errors for several times. I think the
error just makes users inconvenient, while providing very little value on
discouraging production usage of unstable surfaces (this goal is itself
controversial, because developers might not want to commit to a stable surface
too early, or there is just some subtle codegen toggle which is infeasible to
have a driver option). Therefore, I suggest we drop the diagnostic, at least
before the diagnostic gets sufficiently better support for the overridding needs.
Removing the error is a degraded error checking experience. I think this error
checking behavior, if desirable, should be enabled explicitly by tools. Users
preferring the behavior can figure out a way to do so.
Reviewed By: jhenderson, rnk
Differential Revision: https://reviews.llvm.org/D120455
Add operations -, abs, ceil and floor to the index notation.
Add test cases.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D121388
In this CL, update the function name of verifier according to the
behavior. If a verifier needs to access the region then it'll be updated
to `verifyRegions`.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D120373
This patch removes an old recursive implementation to lower vector.transpose to extract/insert operations
and replaces it with a iterative approach that leverages newer linearization/delinearization utilities.
The patch should be NFC except by the order in which the extract/insert ops are generated.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D121321
Add operations abs, ceil, floor, and neg to the C++ API and Python API.
Add test cases.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D121339
This patch moves the testcases from
`mlir/test/Target/LLVMIR/openmp-llvm-bad-schedule-modifier.mlir` to
`mlir/test/Dialect/OpenMP/invalid.mlir` as they test the verifier
(not the translation to LLVM IR).
Reviewed By: NimishMishra
Differential Revision: https://reviews.llvm.org/D120877
This patch adds lowering from omp.atomic.update to LLVM IR. Whenever a
special LLVM IR instruction is available for the operation, `atomicrmw`
instruction is emitted, otherwise a compare-exchange loop based update
is emitted.
Depends on D119522
Reviewed By: ftynse, peixin
Differential Revision: https://reviews.llvm.org/D119657
Currently when we fold an empty loop, we assume that any loop
with iterArgs returns its iterArgs in order, which is not always
the case. It may return values defined outside of the loop or
return its iterArgs out of order. This patch adds support to
those cases.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D120776
This revision adds support for the linalg.index to the sparse compiler
pipeline. In essence, this adds the ability to refer to indices in
the tensor index expression, as illustrated below:
Y[i, j, k, l, m] = T[i, j, k, l, m] * i * j
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D121251
It's fine to use any integer (vector) values regardless of the
signedness. The opcode decides how to interpret the bits.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D121238
These passes generally don't rely on any special aspects of FuncOp, and moving allows
for these passes to be used in many more situations. The passes that obviously weren't
relying on invariants guaranteed by a "function" were updated to be generic pass, the
rest were updated to be FunctionOpinterface InterfacePasses.
The test updates are NFC switching from implicit nesting (-pass -pass2) form to
the -pass-pipeline form (generic passes do not implicitly nest as op-specific passes do).
Differential Revision: https://reviews.llvm.org/D121190
A lot of test passes are currently anchored on FuncOp, but this
dependency
is generally just historical. A majority of these test passes can run on
any operation, or can operate on a specific interface
(FunctionOpInterface/SymbolOpInterface).
This allows for greatly reducing the API dependency on FuncOp, which
is slated to be moved out of the Builtin dialect.
Differential Revision: https://reviews.llvm.org/D121191
Commit rG1a2bb03edab9d7aa31beb587d0c863acc6715d27 introduced a pattern
to convert dynamic dimensions in operands of `GenericOp`s to static
values based on indexing maps and shapes of other operands. The logic
is directly usable to any `LinalgOp`. Move that pattern as an
`OpInterfaceRewritePattern`.
Differential Revision: https://reviews.llvm.org/D120968
This is a pass that can be used by downstream consumers directly
to avoid the boilerplate to wrap around the `populate*Patterns`.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D121222
A `tensor.cast` consumer can be folded with its producer. This is
beneficial only if the result of the tensor cast is more static than
the source. This patch adds a utility function to check that this is
the case, and adds a couple of canonicalizations patterns that fold an
operation with `tensor.cast` conusmers.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D120950
It's valid to create a TypedArrayAttr or MixedContainerType with
nullptr, e.g.,
std::vector<mlir::Attribute> attrs = {mlir::StringAttr()};
builder.createArrayAttr(attrs);
The predicate didn't check if it's a nullptr and it ended up a crash in
the attribute static verifier. We always check if an attribute is null
so it's better to align the check for these two container type attr.
Reviewed By: rdzhabarov
Differential Revision: https://reviews.llvm.org/D121178
With the recent improvements to OpDSL it is cheap to reintroduce a linalg.copy operation.
This operation is needed in at least 2 cases:
1. for copies that may want to change the elemental type (e.g. cast, truncate, quantize, etc)
2. to specify new tensors that should bufferize to a copy operation. The linalg.generic form
always folds away which is not always the right call.
Differential Revision: https://reviews.llvm.org/D121230
Allow pointwise operations to take rank zero input tensors similarly to scalar inputs. Use an empty indexing map to broadcast rank zero tensors to the iteration domain of the operation.
Depends On D120734
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D120807
Simplify tests that use `linalg.fill_rng_2d` to focus on testing the `const` and `index` functions. Additionally, cleanup emit_misc.py to use simpler test functions and fix an error message in config.py.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D120734
Extend OpDSL with a `defines` method that can set the `hasCanonicalizer` flag for an OpDSL operation. If the flag is set via `defines(Canonicalizer)` the operation needs to implement the `getCanonicalizationPatterns` method. The revision specifies the flag for linalg.fill_tensor and adds an empty `FillTensorOp::getCanonicalizationPatterns` implementation.
This revision is a preparation step to replace linalg.fill by its OpDSL counterpart linalg.fill_tensor. The two are only functionally equivalent if both specify the same canonicalization patterns. The revision is thus a prerequisite for the linalg.fill replacement.
Depends On D120725
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D120726
Add a FillOpInterface similar to the contraction and convolution op interfaces. The FillOpInterface is a preparation step to replace linalg.fill by its OpDSL version linalg.fill_tensor. The interface implements the `value()`, `output()`, and `result()` methods that by default are not available on linalg.fill_tensor.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D120725
Currently, the transfer mask is materialized by generating the vector
comparison: [offset + 0, .., offset + length - 1] < [dim, .., dim]
A better alternative is to materialize the transfer mask by using the
operation: `vector.create_mask (dim - offset)`, which will generate
simpler code and compose better with scalable vectors.
Differential Revision: https://reviews.llvm.org/D120487
This is to align with the PyTACO API better.
Modify an existing unit test to test the new routines.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D121083
In quantized comutation, there are casting ops around computation ops.
Reorder the ops to make reduce-to-contract actually work.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D120760
The current StandardToLLVM conversion patterns only really handle
the Func dialect. The pass itself adds patterns for Arithmetic/CFToLLVM, but
those should be/will be split out in a followup. This commit focuses solely
on being an NFC rename.
Aside from the directory change, the pattern and pass creation API have been renamed:
* populateStdToLLVMFuncOpConversionPattern -> populateFuncToLLVMFuncOpConversionPattern
* populateStdToLLVMConversionPatterns -> populateFuncToLLVMConversionPatterns
* createLowerToLLVMPass -> createConvertFuncToLLVMPass
Differential Revision: https://reviews.llvm.org/D120778
These unit tests resides in an internal repository. Porting the tests to the
public repository.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D121021
The default lowering of vector transpose operations generates a large sequence of
scalar extract/insert operations, one pair for each scalar element in the input tensor.
In other words, the vector transpose is scalarized. However, there are transpose
patterns where one or more adjacent high-order dimensions are not transposed (for
example, in the transpose pattern [1, 0, 2, 3], dimensions 2 and 3 are not transposed).
This patch improves the lowering of those cases by not scalarizing them and extracting/
inserting a full n-D vector, where 'n' is the number of adjacent high-order dimensions
not being transposed. By doing so, we prevent the scalarization of the code and generate a
more performant vector version.
Paradoxically, this patch shouldn't improve the performance of transpose operations if
we are using LLVM. The LLVM pipeline is able to optimize away some of the extract/insert
operations and the SLP vectorizer is converting the scalar operations back to its vector
form. However, scalarizing a vector version of the code in MLIR and relying on the SLP
vectorizer to reconstruct the vector code again is highly undesirable for several reasons.
Reviewed By: nicolasvasilache, ThomasRaoux
Differential Revision: https://reviews.llvm.org/D120601
This patch fixes the crash when printing some ops (like affine.for and
scf.for) when they are dumped in invalid state, e.g. during pattern
application. Now the AsmState constructor verifies the operation
first and switches to generic operation printing when the verification
fails. Also operations are now printed in generic form when emitting
diagnostics and the severity level is Error.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D117834
The OpenMPIRBuilder has a bug. Specifically, suppose you have two nested openmp parallel regions (writing with MLIR for ease)
```
omp.parallel {
%a = ...
omp.parallel {
use(%a)
}
}
```
As OpenMP only permits pointer-like inputs, the builder will wrap all of the inputs into a stack allocation, and then pass this
allocation to the inner parallel. For example, we would want to get something like the following:
```
omp.parallel {
%a = ...
%tmp = alloc
store %tmp[] = %a
kmpc_fork(outlined, %tmp)
}
```
However, in practice, this is not what currently occurs in the context of nested parallel regions. Specifically to the OpenMPIRBuilder,
the entirety of the function (at the LLVM level) is currently inlined with blocks marking the corresponding start and end of each
region.
```
entry:
...
parallel1:
%a = ...
...
parallel2:
use(%a)
...
endparallel2:
...
endparallel1:
...
```
When the allocation is inserted, it presently inserted into the parent of the entire function (e.g. entry) rather than the parent
allocation scope to the function being outlined. If we were outlining parallel2, the corresponding alloca location would be parallel1.
This causes a variety of bugs, including https://github.com/llvm/llvm-project/issues/54165 as one example.
This PR allows the stack allocation to be created at the correct allocation block, and thus remedies such issues.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D121061
This commit adds a new hook Pass `bool canScheduleOn(RegisteredOperationName)` that
indicates if the given pass can be scheduled on operations of the given type. This makes it
easier to define constraints on generic passes without a) adding conditional checks to
the beginning of the `runOnOperation`, or b) defining a new pass type that forwards
from `runOnOperation` (after checking the invariants) to a new hook. This new hook is
used to implement an `InterfacePass` pass class, that represents a generic pass that
runs on operations of the given interface type.
The PassManager will also verify that passes added to a pass manager can actually be
scheduled on that pass manager, meaning that we will properly error when an Interface
is scheduled on an operation that doesn't actually implement that interface.
Differential Revision: https://reviews.llvm.org/D120791
RegionBranchOpInterface and BranchOpInterface are allowed to make implicit type conversions along control-flow edges. In effect, this adds an interface method, `areTypesCompatible`, to both interfaces, which should return whether the types of corresponding successor operands and block arguments are compatible. Users of the interfaces, here on forth, must be aware that types may mismatch, although current users (in MLIR core), are not affected by this change. By default, type equality is used.
`async.execute` already has unequal types along control-flow edges (`!async.value<f32>` vs. `f32`), but it opted out of calling `RegionBranchOpInterface::verifyTypes` in its verifier. That method has now been removed and `RegionBranchOpInterface` will verify types along control edges by default in its verifier.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D120790
This patch extends the existing if combining canonicalization to also handle the case where a value returned by the first if is used within the body of the second if.
This patch also extends if combining to support if's whose conditions are logical negations of each other.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120924
We can simplify an extractvalue of an insertvalue to extract out of the base of the insertvalue, if the insert and extract are at distinct and non-prefix'd indices
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D120915