llvm-capstone

mirror of https://github.com/capstone-engine/llvm-capstone.git synced 2024-11-27 07:31:28 +00:00

Author	SHA1	Message	Date
Han-Chung Wang	78348b6915	[mlir][tensor] Improve tensor.pack simplification pattern. (#76606) A tensor.pack op can be rewritten to a tensor.expand_shape op if the packing only happens on the innermost dimension. This also formats the lit checks better.	2024-01-02 09:34:24 -08:00
Jungwook Park	2292fd0129	[mlir][spirv] Add support for C-API/python binding to SPIR-V dialect (#76055 ) Enable bindings. --------- Co-authored-by: jungpark-mlir <jungwook@jungwook-22.04>	2024-01-02 08:11:44 -08:00
Simon Camphausen	795c989c38	[mlir][EmitC] Disallow string attributes as initial values (#75310 )	2024-01-02 16:53:36 +01:00
Tobias Gysi	534034737a	[mlir][llvm] Import call site calling conventions (#76391 ) This revision adds support for importing call site calling conventions. Additionally, the revision also adds a roundtrip test for an indirect call with a non-standard calling convention.	2024-01-02 14:27:10 +01:00
Adrian Kuegel	ac8b53fc92	[mlir] Apply ClangTidy performance fix - Use '\n' instead of std::endl; https://clang.llvm.org/extra/clang-tidy/checks/performance/avoid-endl.html	2024-01-02 10:00:29 +00:00
Adrian Kuegel	baf8a39aaf	[mlir] Apply ClangTidy fix. Prefer to use .empty() instead of checking size().	2024-01-02 08:55:37 +00:00
Adrian Kuegel	b238a0d989	[mlir] Apply ClangTidy findings. - Remove redundant return - Use .empty() instead of size() == 0.	2024-01-02 08:53:01 +00:00
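The three ClangTidy cleanups above follow two common rules: prefer `'\n'` over `std::endl` (the `performance-avoid-endl` check linked above) and prefer `.empty()` over comparing `size()` with zero. A minimal sketch of both, using a hypothetical helper `listNames` that is not part of MLIR:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper for illustration only: prints one name per line.
std::string listNames(const std::vector<std::string> &names) {
  // Prefer empty() over `names.size() == 0`; it states the intent directly.
  if (names.empty())
    return "<none>\n";

  std::ostringstream os;
  for (const std::string &name : names) {
    // '\n' only writes a newline. std::endl would also flush the stream
    // on every iteration, which is the cost the check warns about.
    os << name << '\n';
  }
  return os.str();
}
```

Calling `listNames({"a", "b"})` yields the two names on separate lines, and an empty vector yields the `<none>` placeholder.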
Kareem Ergawy	75be7bb3fc	[flang][OpenMP][Offloading][AMDGPU] Add test for `target update` (#76355 ) Adds a new test for offloading `target update` directive to AMD GPUs.	2024-01-02 09:50:27 +01:00
Andrei Golubev	992661922a	[mlir] Make TypedValue::getType() const (#76568 ) The TypedValue::getType() essentially forwards the return value of Value::getType() which is a const method. Somehow, at TypedValue level the method's constness is lost, so restore it. Originally done by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>	2024-01-01 21:43:18 +01:00
Bharathi Ramana Joshi	ff80414620	[MLIR][Presburger] Implement PresburgerSpace::mergeAndAlignSymbols (#76397 )	2024-01-01 23:40:57 +05:30
Spenser Bauman	6b65d79fbb	[mlir][linalg] Fix for invalid IR in eliminate_empty_tensors (#73513 ) The transform.structured.eliminate_empty_tensors can produce mis-typed IR when traversing use-def chains past tensor reshaping operations for sharing candidates. This results in Linalg operations whose output types do not match their 'outs' arguments. This patch filters out candidate tensor.empty operations when their types do not match the candidate input operand.	2024-01-01 17:12:40 +00:00
yonillasky	703e83611c	[MLIR][LLVM] Add llvm.intr.coro.promise (#76640 ) Added to allow generating these intrinsics in out-of-tree MLIR passes. Co-authored-by: Yoni Lavi <yoni.lavi@nextsilicon.com>	2024-01-01 11:39:29 +01:00
Bharathi Ramana Joshi	b8e4053c06	[MLIR][Presburger] Fix bug in Identifier::isEqual assert (#76380) Make Identifier::isEqual return false instead of failing an assertion when identifiers are not equal.	2023-12-31 11:02:13 +05:30
Abhinav271828	e213af78b2	[MLIR][Presburger] Fix a bug with determinant of IntMatrix (#76622) Fixed a bug where IntMatrix determinant() would try to assign to a null pointer. Added a test case that triggers this bug to avoid regressions.	2023-12-30 22:03:01 +02:00
Han-Chung Wang	4b14205bc0	[mlir][tensor] Centralize pack/unpack related patterns. (#76603 ) The revision moves pack/unpack related patterns to PackAndUnpackPatterns.cpp. This follows the convention like other tensor ops. It also renames `populateSimplifyTensorPack` to `populateSimplifyPackAndUnpackPatterns` and adds a TODO item for tensor.unpack op.	2023-12-30 11:40:40 -08:00
long.chen	eaa32d20a2	[mlir] fix affine-loop-fusion crash (#76351) If `user` does not lie in `Region`, `findAncestorOpInRegion` will return `nullptr`. Fixes https://github.com/llvm/llvm-project/issues/76281.	2023-12-29 10:51:51 +08:00
Jakub Kuderski	2af186f9bd	[mlir][gpu] Add patterns to break down subgroup reduce (#76271 ) The new patterns break down subgroup reduce ops with vector values into a sequence of subgroup reductions that fit the native shuffle size. The maximum/native shuffle size is parametrized. The overall goal is to be able to perform multi-element reductions with a sequence of `gpu.shuffle` ops.	2023-12-28 14:39:46 -05:00
youkaichao	e9bc4aaa79	[mlir][gpu][docs] fix incorrect syntax for gpu.launch (#76381 ) Per the code: `5c39b8d1a8/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td (L805)` And the usage: `5c39b8d1a8/mlir/lib/Dialect/GPU/IR/GPUDialect.cpp (L869)` The keyword should be `blocks` rather than `block`. The documentation of the syntax is out of date.	2023-12-28 11:43:55 -05:00
gitoleg	8cf6bcf5a3	[mlir][llvm] Add assert in CallOp builder (#76240) This commit adds an assert in one of the CallOp builders to ensure it is not used to create an indirect call. Otherwise, the callee type would include the callee pointer type which is handed in as first argument.	2023-12-27 17:08:35 +01:00
Xiang Li	1150e8ef77	[mlir::spirv] Support scf.if in mlir-vulkan-runner (#75367 ) 1. Register SCFDialect in mlir-vulkan-runner 2. Add SCFToSPIRV in GPUToSPIRVPass to lower scf. Fixes https://github.com/llvm/llvm-project/issues/74939	2023-12-27 10:32:21 -05:00
Balaji V. Iyer	36fd7291cd	[mlir][Quasipolynomial] Fixed -Wunused-variable in GeneratingFunction.h (#76419) ``` llvm-project/mlir/lib/Analysis/Presburger/GeneratingFunction.h:56:28: error: unused variable 'term' [-Werror,-Wunused-variable] 56 | for (const ParamPoint &term : numerators) | ^~~~ 1 error generated. ```	2023-12-26 19:29:04 -06:00
Jie Fu	c86fe3ee0b	[mlir][Quasipolynomials] Fix -Wunused-variable in QuasiPolynomial.cpp (NFC) llvm-project/mlir/lib/Analysis/Presburger/QuasiPolynomial.cpp:29:39: error: unused variable 'aff' [-Werror,-Wunused-variable] for (const SmallVector<Fraction> &aff : term) { ^ 1 error generated.	2023-12-27 08:00:58 +08:00
Balaji V. Iyer	532d4845ed	[mlir][Quasipolynomials] Fixed type issues in GeneratingFunction.h (#76413) Fixed two issues: A SmallVector size caused a size mismatch (8 vs. 12), so this size restriction was removed. Also, a constant parameter was causing an issue in a function not marked constant.	2023-12-26 16:34:32 -06:00
Abhinav271828	1022febd9d	[MLIR][Presburger] Generating functions and quasi-polynomials for Barvinok's algorithm (#75702 ) Define basic types and classes for Barvinok's algorithm, including polyhedra, generating functions and quasi-polynomials. The class definitions include methods for arithmetic manipulation, printing, logical relations, etc.	2023-12-26 21:29:26 +02:00
Rik Huijzer	061e4f24b2	[mlir][doc] Escape effects, interfaces, and traits (#76297 ) Fixes https://github.com/llvm/llvm-project/issues/76270. Thanks to @scottamain for the clear description. Co-authored-by: Scott Main <scott@modular.com>	2023-12-23 21:48:33 +01:00
Adam Paszke	85b2327192	[mlir][nvvm] Fix the PTX lowering of wgmma.mma_async (#76150 )	2023-12-22 14:46:34 +01:00
Dominik Adamski	ffabf73553	[NFC][OpenMP][MLIR] Add test for lowering parallel workshare GPU loop (#76144 ) This test checks if MLIR code is lowered according to schema presented below: func1() { call __kmpc_parallel_51(..., func2, ...) } func2() { call __kmpc_for_static_loop_4u(..., func3, ...) } func3() { //loop body }	2023-12-22 11:58:04 +01:00
Matthias Springer	73b86d1b2d	[mlir][Transforms] `GreedyPatternRewriteDriver`: verify IR (#74270 ) This commit adds an additional "expensive check" that verifies the IR before starting a greedy pattern rewriter, after every pattern application and after every folding. (Only if `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` is set.) It also adds an assertion that the `scope` region (part of `GreedyRewriteConfig`) is not being erased as part of the greedy pattern rewrite. That would break the scoping mechanism and the expensive checks. This commit does not fix any patterns, this is done in separate commits.	2023-12-22 16:44:07 +09:00
Jakub Kuderski	a03c53c530	[mlir][spirv] Add physical storage buffer extension test. NFC. (#76196 ) This test demonstrates how the PhysicalStorageBuffer extension can be used end-2-end in a spir-v module. This module has been verified to pass serialization, deserialization, and validation with spirv-val.	2023-12-21 23:06:26 -05:00
Ryan Holt	847a6f8f0a	[mlir][MemRef] Add runtime bounds checking (#75817 ) This change adds (runtime) bounds checks for `memref` ops using the existing `RuntimeVerifiableOpInterface`. For `memref.load` and `memref.store`, we check that the indices are in-bounds of the memref's index space. For `memref.reinterpret_cast` and `memref.subview` we check that the resulting address space is in-bounds of the input memref's address space.	2023-12-22 11:49:15 +09:00
Matthias Springer	c99670ba51	[mlir][vector] `LoadOp`/`StoreOp`: Allow 0-D vectors (#76134 ) Similar to `vector.transfer_read`/`vector.transfer_write`, allow 0-D vectors. This commit fixes `mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir` when verifying the IR after each pattern (#74270). That test produces a temporary 0-D load/store op.	2023-12-22 11:12:58 +09:00
Andrzej Warzyński	e6f5762879	[mlir][vector][nfc] Add a test case for scalable vectors (#76138 ) Extends fold-arith-extf-into-vector-contract.mlir by adding a test case for scalable vectors.	2023-12-21 18:45:00 +00:00
Billy Zhu	34a65980d7	[MLIR] Erase location of folded constants (#75415 ) Follow up to the discussion from #75258, and serves as an alternate solution for #74670. Set the location to Unknown for deduplicated / moved / materialized constants by OperationFolder. This makes sure that the folded constants don't end up with an arbitrary location of one of the original ops that became it, and that hoisted ops don't confuse the stepping order.	2023-12-21 09:54:48 -08:00
Benjamin Maxwell	a4e15416b4	[mlir][ArmSME] Move creation of load/store intrinsics to helpers (NFC) (#76168 ) Also, for consistency make the ZeroOp lowering switch on the ArmSMETileType, rather than the element bit width.	2023-12-21 17:46:12 +00:00
Finn Plummer	88151dd428	[mlir][spirv] Add folding for SNegate, [Logical]Not (#74992) Add missing constant propagation folder for SNegate, [Logical]Not. Implement additional folding when !(!x) for all ops. This helps for readability of lowered code into SPIR-V. Part of work for #70704	2023-12-21 18:24:01 +01:00
Maksim Levental	537b2aa264	[mlir][python] meta region_op (#75673 )	2023-12-21 11:20:29 -06:00
Oleksandr "Alex" Zinenko	11140cc238	[mlir] mark ChangeResult as nodiscard (#76147 ) This enum is used by dataflow analyses to indicate whether further propagation is necessary to reach the fix point. Accidentally discarding such a value will likely lead to propagation stopping early, leading to incomplete or incorrect results. The most egregious example is the duality between `join` on the analysis class, which triggers propagation internally, and `join` on the lattice class that does not and expects the caller to trigger it depending on the returned `ChangeResult`.	2023-12-21 17:58:53 +01:00
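To illustrate why marking `ChangeResult` as nodiscard helps, here is a minimal sketch with simplified stand-ins. `MaxLattice` and `joinAll` are hypothetical and are not the MLIR dataflow classes; only the enum name and its two states are taken from the commit's context.

```cpp
#include <cassert>
#include <initializer_list>

// Simplified stand-in for the enum described above. With [[nodiscard]] on
// the type, ignoring a returned ChangeResult triggers a compiler warning.
enum class [[nodiscard]] ChangeResult { NoChange, Change };

// Hypothetical toy lattice that tracks the largest value seen so far.
struct MaxLattice {
  int value = 0;

  // Like a lattice-level join: it updates the state and reports whether
  // anything changed, leaving propagation to the caller.
  ChangeResult join(int other) {
    if (other <= value)
      return ChangeResult::NoChange;
    value = other;
    return ChangeResult::Change;
  }
};

// Joins all values and counts how often the caller had to propagate.
int joinAll(MaxLattice &lattice, std::initializer_list<int> values) {
  int propagations = 0;
  for (int v : values) {
    // A bare `lattice.join(v);` would now warn, because the result that
    // decides whether to propagate would be silently dropped.
    if (lattice.join(v) == ChangeResult::Change)
      ++propagations;
  }
  return propagations;
}
```

In this sketch the caller is forced to look at the result of every `join`, which is the property the commit relies on to avoid propagation stopping early.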
Jakub Kuderski	72003adf6b	[mlir][gpu] Allow subgroup reductions over 1-d vector types (#76015 ) Each vector element is reduced independently, which is a form of multi-reduction. The plan is to allow for gradual lowering of multi-reduction that results in fewer `gpu.shuffle` ops at the end: 1d `vector.multi_reduction` --> 1d `gpu.subgroup_reduce` --> smaller 1d `gpu.subgroup_reduce` --> packed `gpu.shuffle` over i32 For example we can perform 2 independent f16 reductions with a series of `gpu.shuffles` over i32, reducing the final number of `gpu.shuffles` by 2x.	2023-12-21 11:55:43 -05:00
Andrzej Warzyński	17afa5befb	[mlir][nfc] Update tests for Contract -> Op transforms (#76054) Updates two tests for vector.contract -> vector.outerproduct transformations: 1. Rename "vector-contract-to-outerproduct-transforms.mlir" as "vector-contract-to-outerproduct-matmul-transforms.mlir". The new name more accurately captures what's being tested. It is also consistent with "vector-contract-to-outerproduct-matvec-transforms.mlir", which covers vector matvec operations and makes finding relevant tests easier. 2. For matmul tests, move the traits defining the iteration spaces to the top of the file. This is consistent with how matvec tests are defined and also makes it easy to quickly identify what cases are covered. 3. For matmul tests, use more meaningful names for function arguments. This helps keep things consistent across the file (i.e. function definitions with check lines and comments). 4. For matvec test, move a few tests around so that the most basic case (without masking) is first. 5. Update comments.	2023-12-21 13:20:16 +00:00
Alex Zinenko	78bd124649	Revert "[mlir][python] Make the Context/Operation capsule creation methods work as documented. (#76010)" This reverts commit `bbc2976868`. This change seems to be at odds with the non-owning part semantics of MlirOperation in C API. Since downstream clients can only take and return MlirOperation, it does not sound correct to force all returns of MlirOperation transfer ownership. Specifically, this makes it impossible for downstreams to implement IR-traversing functions that, e.g., look at neighbors of an operation. The following patch triggers the exception, and there does not seem to be an alternative way for a downstream binding writer to express this: ``` diff --git a/mlir/lib/Bindings/Python/IRCore.cpp b/mlir/lib/Bindings/Python/IRCore.cpp index 39757dfad5be..2ce640674245 100644 --- a/mlir/lib/Bindings/Python/IRCore.cpp +++ b/mlir/lib/Bindings/Python/IRCore.cpp @@ -3071,6 +3071,11 @@ void mlir::python::populateIRCore(py::module &m) { py::arg("successors") = py::none(), py::arg("regions") = 0, py::arg("loc") = py::none(), py::arg("ip") = py::none(), py::arg("infer_type") = false, kOperationCreateDocstring) + .def("_get_first_in_block", [](PyOperation &self) -> MlirOperation { + MlirBlock block = mlirOperationGetBlock(self.get()); + MlirOperation first = mlirBlockGetFirstOperation(block); + return first; + }) .def_static( "parse", [](const std::string &sourceStr, const std::string &sourceName, diff --git a/mlir/test/python/ir/operation.py b/mlir/test/python/ir/operation.py index f59b1a26ba48..6b12b8da5c24 100644 --- a/mlir/test/python/ir/operation.py +++ b/mlir/test/python/ir/operation.py @@ -24,6 +24,25 @@ def expect_index_error(callback): except IndexError: pass +@run +def testCustomBind(): + ctx = Context() + ctx.allow_unregistered_dialects = True + module = Module.parse( + r""" + func.func @f1(%arg0: i32) -> i32 { + %1 = "custom.addi"(%arg0, %arg0) : (i32, i32) -> i32 + return %1 : i32 + } + """, + ctx, + ) + add = module.body.operations[0].regions[0].blocks[0].operations[0] + op = add.operation + # This will get a reference to itself. + f1 = op._get_first_in_block() + + # Verify iterator based traversal of the op/region/block hierarchy. # CHECK-LABEL: TEST: testTraverseOpRegionBlockIterators ```	2023-12-21 10:06:44 +00:00
Matthias Springer	db8a119e8f	[mlir][ArmSME] Fix invalid rewriter API usage (#76123 ) When operations are modified in-place, the rewriter must be notified. This commit fixes `mlir/test/Conversion/ArmSMEToLLVM/unsupported.mlir`, `mlir/test/Dialect/ArmSME/tile-zero-masks.mlir` and `mlir/test/Dialect/ArmSME/vector-ops-to-llvm.mlir` when running with `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` enabled.	2023-12-21 17:39:36 +09:00
Tobias Gysi	9971b9ab19	[mlir][llvm] Improve alloca handling during inlining (#75961 ) This revision changes the alloca handling in the LLVM inliner. It ensures that alloca operations, even those nested within a region operation, can be relocated to the entry block of the function, or the closest ancestor region that is marked with either the isolated from above or automatic allocation scope trait. While the LLVM dialect does not have any region operations, the inlining interface may be used on IR that mixes different dialects.	2023-12-21 08:11:17 +01:00
Matthias Springer	d8d09296ed	[mlir][EmitC] Fix invalid rewriter API usage (#76124 ) When operations are modified in-place, the rewriter must be notified. This commit fixes `mlir/test/Dialect/EmitC/transforms.mlir` when running with `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS` enabled.	2023-12-21 16:00:18 +09:00
Valentin Clement	a25da1a921	[mlir][openacc] Add device_type support for compute operations (#75864) Re-land PR after being reverted because of buildbot failures. This patch adds representation for `device_type` clause information on compute construct (parallel, kernels, serial). The `device_type` clause on compute construct impacts clauses that appear after it. The values impacted by `device_type` are now tied with an attribute array that represents the device_type associated with them. `DeviceType::None` is used to represent the value produced by a clause before any `device_type`. The operands and the attribute information are parsed/printed together. This is an example with `vector_length` clause. The first value (64) is not impacted by `device_type` so it will be represented with DeviceType::None. None is not printed. The second value (128) is tied with the `device_type(multicore)` clause. ``` !$acc parallel vector_length(64) device_type(multicore) vector_length(128) ``` ``` acc.parallel vector_length(%c64 : i32, %c128 : i32 [#acc.device_type<multicore>]) { } ``` When multiple values can be produced for a single clause like `num_gangs` and `wait`, an extra attribute describes the number of values belonging to each `device_type`. Values and attributes are parsed/printed together. ``` acc.parallel num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>]) ``` While preparing this patch I noticed that the wait devnum is not part of the operations and is not lowered. It will be added in a follow-up patch.	2023-12-20 20:36:09 -08:00
Han-Chung Wang	bffdde8b8e	[mlir][tensor][NFC] Fix a typo in pack simplification pattern. (#76109 )	2023-12-20 17:03:55 -08:00
Valentin Clement	553748356c	Revert "[mlir][openacc] Add device_type support for compute operations (#75864 )" This reverts commit `8b885eb90f`.	2023-12-20 16:08:10 -08:00
Maksim Levental	acaff70841	[mlir][python] move transform extras (#76102 )	2023-12-20 17:29:11 -06:00
Peiming Liu	cf4dd91165	[mlir][sparse] initialize slice-driven loop-related fields in one place (#76099 )	2023-12-20 14:20:57 -08:00
Valentin Clement (バレンタインクレメン)	8b885eb90f	[mlir][openacc] Add device_type support for compute operations (#75864) This patch adds representation for `device_type` clause information on compute construct (parallel, kernels, serial). The `device_type` clause on compute construct impacts clauses that appear after it. The values impacted by `device_type` are now tied with an attribute array that represents the device_type associated with them. `DeviceType::None` is used to represent the value produced by a clause before any `device_type`. The operands and the attribute information are parsed/printed together. This is an example with `vector_length` clause. The first value (64) is not impacted by `device_type` so it will be represented with DeviceType::None. None is not printed. The second value (128) is tied with the `device_type(multicore)` clause. ``` !$acc parallel vector_length(64) device_type(multicore) vector_length(128) ``` ``` acc.parallel vector_length(%c64 : i32, %c128 : i32 [#acc.device_type<multicore>]) { } ``` When multiple values can be produced for a single clause like `num_gangs` and `wait`, an extra attribute describes the number of values belonging to each `device_type`. Values and attributes are parsed/printed together. ``` acc.parallel num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>]) ``` While preparing this patch I noticed that the wait devnum is not part of the operations and is not lowered. It will be added in a follow-up patch.	2023-12-20 13:45:47 -08:00
Stella Laurenzo	bbc2976868	[mlir][python] Make the Context/Operation capsule creation methods work as documented. (#76010 ) This fixes a longstanding bug in the `Context._CAPICreate` method whereby it was not taking ownership of the PyMlirContext wrapper when casting to a Python object. The result was minimally that all such contexts transferred in that way would leak. In addition, counter to the documentation for the `_CAPICreate` helper (see `mlir-c/Bindings/Python/Interop.h`) and the `forContext` / `forOperation` methods, we were silently upgrading any unknown context/operation pointer to steal-ownership semantics. This is dangerous and was causing some subtle bugs downstream where this facility is getting the most use. This patch corrects the semantics and will only do an ownership transfer for `_CAPICreate`, and it will further require that it is an ownership transfer (if already transferred, it was just silently succeeding). Removing the mis-aligned behavior made it clear where the downstream was doing the wrong thing. It also adds some `_testing_` functions to create unowned context and operation capsules so that this can be fully tested upstream, reworking the tests to verify the behavior. In some torture testing downstream, I was not able to trigger any memory corruption with the newly enforced semantics. When getting it wrong, a regular exception is raised.	2023-12-20 12:18:58 -08:00
Alex Beloi	d84c640143	[mlir] Remove "Syntax:" parser where it's already provided by `assemblyFormat` (#76002 ) See #73359 Types using `assemblyFormat` to define parsing don't need an additional handwritten parser. So we should remove the handwritten parsers where one provided by an `assemblyFormat` already exists to avoid confusion and de-syncing.	2023-12-20 14:58:51 -05:00
Krzysztof Parzyszek	8b231d73bd	[mlir] Fix build break with shared libraries When project components are built as separate shared libraries, a lot of errors appear about undefined symbols, e.g. ``` /usr/bin/ld: CMakeFiles/obj.MLIRGPUPipelines.dir/GPUToNVVMPipeline.cpp.o : in function `(anonymous namespace)::buildCommonPassPipeline(mlir::OpPa ssManager&, (anonymous namespace)::GPUToNVVMPipelineOptions const&)': GPUToNVVMPipeline.cpp:(.text._ZN12_GLOBAL__N_123buildCommonPassPipelineE RN4mlir13OpPassManagerERKNS_24GPUToNVVMPipelineOptionsE+0xa5): undefined reference to `mlir::createConvertLinalgToLoopsPass()' ``` Add the necessary dependencies to Dialect/GPU/Pipelines/CMakeLists.txt	2023-12-20 12:49:25 -06:00
Han-Chung Wang	b33a131c82	[mlir][arith] Add support for expanding arith.maxnumf/minnumf ops. (#75989 ) The maxnum/minnum semantics can be found at https://llvm.org/docs/LangRef.html#llvm-minnum-intrinsic. The revision also updates function names in lit tests to match op name. Take arith.maxnumf as example: ``` func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 { %result = arith.maxnumf %lhs, %rhs : f32 return %result : f32 } ``` will be expanded to ``` func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 { %0 = arith.cmpf ugt, %lhs, %rhs : f32 %1 = arith.select %0, %lhs, %rhs : f32 %2 = arith.cmpf uno, %lhs, %lhs : f32 %3 = arith.select %2, %rhs, %1 : f32 return %3 : f32 } ``` Case 1: Both LHS and RHS are not NaN; LHS > RHS In this case, `%1` is LHS. `%3` and `%1` have the same value, so `%3` is LHS. Case 2: LHS is NaN and RHS is not NaN In this case, `%2` is true, so `%3` is always RHS. Case 3: LHS is not NaN and RHS is NaN In this case, `%0` is true and `%1` is LHS. `%2` is false, so `%3` and `%1` have the same value, which is LHS. Case 4: Both LHS and RHS are NaN: `%1` and RHS are all NaN, so the result is still NaN.	2023-12-20 10:35:12 -08:00
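The four cases above can be checked with a small scalar model of the expanded IR. This is only a sketch: `expandedMaxNumF` is a hypothetical name, and it assumes `arith.cmpf ugt` means "greater than or unordered" and `arith.cmpf uno` means "unordered", as the case analysis in the commit implies.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar model of the expansion; each step mirrors one op
// of the expanded IR shown in the commit message.
float expandedMaxNumF(float lhs, float rhs) {
  // %0 = arith.cmpf ugt: true if lhs > rhs or if either operand is NaN.
  bool ugt = !(lhs <= rhs);
  // %1 = arith.select %0, %lhs, %rhs
  float selected = ugt ? lhs : rhs;
  // %2 = arith.cmpf uno, %lhs, %lhs: true only when lhs is NaN.
  bool lhsIsNaN = std::isnan(lhs);
  // %3 = arith.select %2, %rhs, %1
  return lhsIsNaN ? rhs : selected;
}
```

With this model, a NaN on either side alone is replaced by the other operand, and the result is NaN only when both operands are NaN, matching cases 1 through 4.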
Paul C Fuqua	11141bc68a	Fix what seems to be a silly bug in gpu.set_default_device rewriting. Smoke test included. (#75756 )	2023-12-20 09:35:42 -06:00
Razvan Lupusoru	a711b042fd	[acc] Initial implementation of MemoryEffects on `acc` operations (#75970 ) The `acc` dialect operations now implement MemoryEffects interfaces in the following ways: - Data entry operations which may read host memory via `varPtr` are now marked as so. The majority of them do NOT actually read the host memory. For example, `acc.present` works on the basis of presence of pointer and not necessarily what the data points to - so they are not marked as reading the host memory. They still use `varPtr` though but this dependency is reflected through ssa. - Data clause operations which may mutate the data pointed to by `accPtr` are marked as doing so. - Data clause operations which update required structured or dynamic runtime counters are marked as reading and writing the newly defined `RuntimeCounters` resource. Some operations, like `acc.getdeviceptr` do not actually use the runtime counters - but are marked as reading them since the address obtained depends on the mapping operations which do update the runtime counters. Namely, `acc.getdeviceptr` cannot be moved across other mapping operations. - Constructs are marked as writing to the `ConstructResource`. This may be too strict but is needed for the following reasons: 1) Structured constructs may not use `accPtr` and instead use `varPtr` - when this is the case, data actions may be removed even when used. 2) Unstructured constructs are currently used to aggregate multiple data actions. We do not want such constructs removed or moved for now. - Terminators are marked as `Pure` as in other dialects. The current approach has the following limitations which may require further improvements: - Subsequent `acc.copyin` operations on same data do not actually read host memory pointed to by `varPtr` but are still marked as so. - Two `acc.delete` operations on same data may not mutate `accPtr` until the runtime counters are zero (but are still marked as mutating). 
- The `varPtrPtr` argument, when present, points to the address of location of `varPtr`. When mapping to target device, an `accPtrPtr` needs to be computed and this memory is mutated. This effect is not captured since the current operations do not produce `accPtrPtr`. - Runtime counter effects are imprecise since two operations with differing `varPtr` increment/decrement different counters. Additionally, operations with `varPtrPtr` mutate attachment counters. - The `ConstructResource` is too strict and likely can be relaxed with better modeling.	2023-12-20 07:11:19 -08:00
Gil Rapaport	d9803841f2	[mlir][emitc] Add op modelling C expressions (#71631 ) Add an emitc.expression operation that models C expressions, and provide transforms to form and fold expressions. The translator emits the body of emitc.expression ops as a single C expression. This expression is emitted by default as the RHS of an EmitC SSA value, but if possible, expressions with a single use that is not another expression are instead inlined. Specific expression's inlining can be fine tuned by lowering passes and transforms.	2023-12-20 15:04:46 +02:00
Andrzej Warzyński	354adb44c9	[mlir][vector] Extend `CreateMaskFolder` (#75842 ) Extends `CreateMaskFolder` pattern so that the following: ```mlir %c8 = arith.constant 8 : index %c16 = arith.constant 16 : index %0 = vector.vscale %1 = arith.muli %0, %c16 : index %10 = vector.create_mask %c8, %1 : vector<8x[16]xi1> ``` is folded as: ```mlir %0 = vector.constant_mask [8, 16] : vector<8x[16]xi1> ```	2023-12-20 11:08:54 +00:00
Andrzej Warzyński	d5abd8a1a9	[mlir][vector][nfc] Move tests for scalable outer-product (#76035) Tests for vector.outerproduct for scalable vectors from "vector-scalable-outerproduct.mlir" are moved to: * ops.mlir and invalid.mlir. These files are effectively used to document what Ops are supported, and that's basically what the original file was testing (but specifically for scalable vectors).	2023-12-20 10:53:00 +00:00
Finn Plummer	4c83c27c91	[mlir][spirv] Add folding for [I\|Logical][Not]Equal (#74194 )	2023-12-20 11:00:28 +01:00
Cullen Rhodes	4db0bd28e8	[mlir][vector][nfc] remove unused template parameter (#75931 )	2023-12-20 08:06:25 +00:00
Matthias Springer	f7096428b4	[mlir][GPU] Add `RecursiveMemoryEffects` to `gpu.launch` (#75315 ) Infer the side effects of `gpu.launch` from its body.	2023-12-20 15:25:25 +09:00
Matthias Springer	c4457e10fe	[mlir][IR] Change block/region walkers to enumerate `this` block/region (#75020) This change makes block/region walkers consistent with operation walkers. An operation walk enumerates the current operation. Similarly, block/region walks should enumerate the current block/region. Example: ``` // Current behavior: op1->walk([](Operation *op2) { /* op1 is enumerated */ }); block1->walk([](Block *block2) { /* block1 is NOT enumerated */ }); region1->walk([](Block *block) { /* blocks of region1 are NOT enumerated */ }); region1->walk([](Region *region2) { /* region1 is NOT enumerated */ }); // New behavior: op1->walk([](Operation *op2) { /* op1 is enumerated */ }); block1->walk([](Block *block2) { /* block1 IS enumerated */ }); region1->walk([](Block *block) { /* blocks of region1 ARE enumerated */ }); region1->walk([](Region *region2) { /* region1 IS enumerated */ }); ```	2023-12-20 14:51:45 +09:00
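The new walker behavior can be pictured with a toy tree. This sketch does not use the MLIR API: `ToyBlock` and `walkBlocks` are hypothetical, the real op/region/block nesting is collapsed into blocks that directly contain blocks, and the post-order visiting order is an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a block that may contain nested blocks.
struct ToyBlock {
  std::string name;
  std::vector<ToyBlock> nested;
};

// Sketch of the new behavior: the walk visits every nested block and also
// the root block it was started on (nested blocks first, root last).
void walkBlocks(const ToyBlock &root, std::vector<std::string> &visited) {
  for (const ToyBlock &child : root.nested)
    walkBlocks(child, visited);
  // Under the old behavior described above, this visit of `root` itself
  // would have been skipped for the block the walk started from.
  visited.push_back(root.name);
}
```

Walking a `block1` that nests `block2` and `block3` records three names in this model, with `block1` included as the last entry.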
Matthias Springer	f10302e3fa	[mlir] Require folders to produce Values of same type (#75887 ) This commit adds extra assertions to `OperationFolder` and `OpBuilder` to ensure that the types of the folded SSA values match with the result types of the op. There used to be checks that discard the folded results if the types do not match. This commit makes these checks stricter and turns them into assertions. Discarding folded results with the wrong type (without failing explicitly) can hide bugs in op folders. Two such bugs became apparent in MLIR (and some more in downstream projects) and are fixed with this change. Note: The existing type checks were introduced in https://reviews.llvm.org/D95991. Migration guide: If you see failing assertions (`folder produced value of incorrect type`; make sure to run with assertions enabled!), run with `-debug` or dump the operation right before the failing assertion. This will point you to the op that has the broken folder. A common mistake is a mismatch between static/dynamic dimensions (e.g., input has a static dimension but folded result has a dynamic dimension).	2023-12-20 14:39:22 +09:00
Jakub Kuderski	560564f51c	[mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901 ) This is to avoid confusion when dealing with reduction/combining kinds. For example, see a recent PR comment: https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175. Previously, they were picked to mostly mirror the names of the llvm vector reduction intrinsics: https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In isolation, it was not clear if `<maxf>` has `arith.maxnumf` or `arith.maximumf` semantics. The new reduction kind names map 1:1 to arith ops, which makes it easier to tell/look up their semantics. Because both the vector and the gpu dialect depend on the arith dialect, it's more natural to align names with those in arith than with the lowering to llvm intrinsics. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-12-20 00:14:43 -05:00
Matthias Springer	10056c821a	[mlir][SCF] `scf.parallel`: Make reductions part of the terminator (#75314 ) This commit makes reductions part of the terminator. Instead of `scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops. `scf.reduce` may contain an arbitrary number of reductions, with one region per reduction. Example: ```mlir %init = arith.constant 0.0 : f32 %r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init) -> f32, f32 { %elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32> %elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32> scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) { ^bb0(%lhs : f32, %rhs: f32): %res = arith.addf %lhs, %rhs : f32 scf.reduce.return %res : f32 }, { ^bb0(%lhs : f32, %rhs: f32): %res = arith.mulf %lhs, %rhs : f32 scf.reduce.return %res : f32 } } ``` `scf.reduce` operations can no longer be interleaved with other ops in the body of `scf.parallel`. This simplifies the op and makes it possible to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was not possible before because the op was not a terminator, causing the op to be DCE'd.)	2023-12-20 11:06:27 +09:00
long.chen	227bfa1fb1	[mlir] fix a crash when lower parallel loop to gpu (#75811 ) (#75946 )	2023-12-20 09:13:15 +08:00
Jakub Kuderski	9f74e6e615	[mlir][vector][gpu] Use `makeArithReduction` in lowering patterns. NFC. (#75952 ) Use the `vector::makeArithReduction` helper as the source-of-truth of reduction to arith ops lowering.	2023-12-19 19:04:27 -05:00
Sang Ik Lee	8197ea2a08	[MLIR] Update FindSyclRuntime.cmake to handle SYCL library path chang… (#75861 ) …e introduced by oneAPI DPC++ compiler 2024.0	2023-12-19 15:55:33 -06:00
Kunwar Grover	282d501476	[mlir][Transform] Fix crash with invalid ir for transform libraries (#75649 ) This patch fixes a crash caused when the transform library interpreter is given an IR that fails to parse.	2023-12-19 23:16:19 +05:30
Han-Chung Wang	899c2bed9e	[mlir][TilingInterface] Early return cloned ops if tile sizes are zeros. (#75410 ) It is a trivial early-return case. If the cloned ops are not returned, tiling generates `extract_slice` ops that extract the whole slice, and these are not folded away. Return early to avoid this. E.g., ```mlir func.func @matmul_tensors( %arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> { %0 = linalg.matmul ins(%arg0, %arg1: tensor<?x?xf32>, tensor<?x?xf32>) outs(%arg2: tensor<?x?xf32>) -> tensor<?x?xf32> return %0 : tensor<?x?xf32> } module attributes {transform.with_named_sequence} { transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) { %0 = transform.structured.match ops{["linalg.matmul"]} in %arg1 : (!transform.any_op) -> !transform.any_op %1 = transform.structured.tile_using_for %0 [0, 0, 0] : (!transform.any_op) -> (!transform.any_op) transform.yield } } ``` Apply the transforms and canonicalize the IR: ``` mlir-opt --transform-interpreter -canonicalize input.mlir ``` we will get ```mlir module { func.func @matmul_tensors(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> { %c1 = arith.constant 1 : index %c0 = arith.constant 0 : index %dim = tensor.dim %arg0, %c0 : tensor<?x?xf32> %dim_0 = tensor.dim %arg0, %c1 : tensor<?x?xf32> %dim_1 = tensor.dim %arg1, %c1 : tensor<?x?xf32> %extracted_slice = tensor.extract_slice %arg0[0, 0] [%dim, %dim_0] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32> %extracted_slice_2 = tensor.extract_slice %arg1[0, 0] [%dim_0, %dim_1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32> %extracted_slice_3 = tensor.extract_slice %arg2[0, 0] [%dim, %dim_1] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32> %0 = linalg.matmul ins(%extracted_slice, %extracted_slice_2 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%extracted_slice_3 : tensor<?x?xf32>) -> tensor<?x?xf32> return %0 : tensor<?x?xf32> } } ``` This revision returns early in that case, so we get: ```mlir func.func @matmul_tensors(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %arg2: tensor<?x?xf32>) -> tensor<?x?xf32> { %0 = linalg.matmul ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%arg2 : tensor<?x?xf32>) -> tensor<?x?xf32> return %0 : tensor<?x?xf32> } ```	2023-12-19 09:14:43 -08:00
Abhinav271828	cfd51fbadd	[MLIR][Presburger] Add LLL basis reduction (#75565 ) Add a method for LLL basis reduction to the FracMatrix class. This needs an abs() method for Fractions, which is added to Fraction.h.	2023-12-19 17:31:38 +01:00
Ivan Butygin	c0d2ea9d42	[mlir][scf] Improve `scf.parallel` fusion pass (#75852 ) Abort fusion if memref load may alias write, but not the exact alias. Add alias check hook to `naivelyFuseParallelOps`, so user can customize alias checking. Use builtin alias analysis in `ParallelLoopFusion` pass.	2023-12-19 18:07:46 +03:00
Oleksandr "Alex" Zinenko	9519e3ecbf	[mlir] support dialect attribute translation to LLVM IR (#75309 ) Extend the `amendOperation` mechanism for translating dialect attributes attached to operations from another dialect when translating MLIR to LLVM IR. Previously, this mechanism would have no knowledge of the LLVM IR instructions created for the given operation, making it impossible for it to perform local modifications such as attaching operation-level metadata. Collect instructions inserted by the LLVM IR builder and pass them to `amendOperation`.	2023-12-19 14:18:16 +01:00
Guray Ozen	5caae72d1a	[mlir][gpu] Productize `test-lower-to-nvvm` as `gpu-lower-to-nvvm` (#75775 ) The `test-lower-to-nvvm` pipeline serves as the common and proper pipeline for nvvm+host compilation, and it's used across our CUDA integration tests. This PR renames the `test-lower-to-nvvm` pipeline to `gpu-lower-to-nvvm` and moves it within `InitAllPasses.h`. The aim is to make it callable from Python and to have a standardized compilation process for nvvm.	2023-12-19 08:40:46 +01:00
Adam Paszke	12e4332501	[mlir][nvgpu] Fix the TMA stride setup (#75838 ) There were two issues with the previous computation: * it never looked at dimensions past the second one * the definition was recursive, making each dimension have an extra `elementSize` power	2023-12-19 08:40:26 +01:00
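The stride issue described in the row above can be pictured with a small sketch. This is not the nvgpu code: `byte_strides` is a hypothetical helper, and a plain row-major layout is assumed. It multiplies the element size in exactly once and walks every dimension, which avoids the two problems the commit message lists.

```python
def byte_strides(shape, elem_size):
    """Row-major strides in bytes (hypothetical helper, assumed layout).

    The element size is applied once, at the innermost dimension, and every
    dimension is visited, not only the first two.
    """
    strides = [0] * len(shape)
    running = elem_size  # bytes covered by one step along the innermost dim
    for i in reversed(range(len(shape))):
        strides[i] = running
        running *= shape[i]  # multiply by the extent only, never elem_size again
    return strides


print(byte_strides([2, 3, 4], 4))
```

If the element size were multiplied in again at each step, every outer dimension would pick up one more power of it, which is the kind of compounding the message describes.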
Matthias Springer	9b21866fea	[mlir][linalg] Fix invalid IR in `FoldInsertPadIntoFill` (#74418 ) `FoldInsertPadIntoFill` used to generate an invalid `tensor.insert_slice` op: ``` error: expected type to be 'tensor<?x?x?xf32>' or a rank-reduced version. (size mismatch) ``` This commit fixes tests such as `mlir/test/Dialect/Linalg/canonicalize.mlir` when verifying the IR after each pattern application (#74270).	2023-12-19 14:17:54 +09:00
Matthias Springer	3a087c1592	[mlir][linalg] Fix invalid IR in Linalg op fusion (#74425 ) Linalg op fusion (`Linalg/Transforms/Fusion.cpp`) used to generate invalid fused producer ops: ``` error: 'linalg.conv_2d_nhwc_hwcf' op expected type of operand #2 ('tensor<1x8x16x4xf32>') to match type of corresponding result ('tensor<?x?x?x?xf32>') note: see current operation: %24 = "linalg.conv_2d_nhwc_hwcf"(%21, %22, %23) <{dilations = dense<1> : tensor<2xi64>, operandSegmentSizes = array<i32: 2, 1>, strides = dense<2> : tensor<2xi64>}> ({ ^bb0(%arg9: f32, %arg10: f32, %arg11: f32): %28 = "arith.mulf"(%arg9, %arg10) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32 %29 = "arith.addf"(%arg11, %28) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32 "linalg.yield"(%29) : (f32) -> () }) {linalg.memoized_indexing_maps = [affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1 * 2 + d4, d2 * 2 + d5, d6)>, affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d4, d5, d6, d3)>, affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3)>]} : (tensor<1x?x?x3xf32>, tensor<3x3x3x4xf32>, tensor<1x8x16x4xf32>) -> tensor<?x?x?x?xf32> ``` This is a problem because the input IR to greedy pattern rewriter during `-test-linalg-greedy-fusion` is invalid. This commit fixes tests such as `mlir/test/Dialect/Linalg/tile-and-fuse-tensors.mlir` when verifying the IR after each pattern application (#74270).	2023-12-19 14:17:10 +09:00
Jakub Kuderski	07677113ff	[mlir][vector] Add pattern to break down reductions into arith ops (#75727 ) The number of vector elements considered 'small' enough to extract is parameterized. This is to avoid going into specialized reduction lowering when a single/couple of arith ops can do. Targets without dedicated reduction intrinsics can use that as an emulation path too. Depends on https://github.com/llvm/llvm-project/pull/75846.	2023-12-18 17:54:54 -05:00
Jakub Kuderski	a528cee224	[mlir][vector] Improve `makeArithReduction` expansion (#75846 ) Propagate fast math flags. Distinguish `minf`/`maxf` and `minimumf`/`maximumf`. Required for future patterns in https://github.com/llvm/llvm-project/pull/75727.	2023-12-18 17:47:46 -05:00
srcarroll	b26ee97537	[MLIR][Linalg] Support dynamic sizes in `lower_unpack` (#75494 )	2023-12-18 19:02:04 +01:00
Rik Huijzer	672f1a036a	[mlir][memref] Make `LoadOp::verify` error more clear (#75831 ) While debugging https://github.com/llvm/llvm-project/issues/71326, the `LoadOp::verify` code and error were very confusing. This PR improves that. This code was part of the reverted PR https://github.com/llvm/llvm-project/pull/75519. Fixing the `-convert-vector-to-scf` issue is going to take a bit longer and this code was out of scope anyway. Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>	2023-12-18 18:41:05 +01:00
Oleksandr "Alex" Zinenko	4d9d105c70	[mlir] fix filecheck prefixes in a dataflow test (#75794 ) -SAME and -LITERAL do not compose in CHECK commands.	2023-12-18 17:11:21 +01:00
Oleksandr "Alex" Zinenko	32a4e3fcca	[mlir] support non-interprocedural dataflow analyses (#75583 ) The core implementation of the dataflow analysis framework is interprocedural by design. While this offers better analysis precision, it also comes with additional cost as it takes longer for the analysis to reach the fixpoint state. Add a configuration mechanism to the dataflow solver to control whether it operates interprocedurally or not to offer clients a choice. As a positive side effect, this change also adds hooks for explicitly processing external/opaque function calls in the dataflow analyses, e.g., based off of attributes present in the function declaration or call operation such as alias scopes and modref available in the LLVM dialect. This change should not affect existing analyses and the default solver configuration remains interprocedural. Co-authored-by: Jacob Peng <jacobmpeng@gmail.com>	2023-12-18 14:16:52 +01:00
Paul Walker	dea16ebd26	[LLVM][IR] Replace ConstantInt's specialisation of getType() with getIntegerType(). (#75217 ) The specialisation will not be valid when ConstantInt gains native support for vector types. This is largely a mechanical change but with extra attention paid to constant folding, InstCombineVectorOps.cpp, LoopFlatten.cpp and Verifier.cpp to remove the need to call `getIntegerType()`. Co-authored-by: Nikita Popov <github@npopov.com>	2023-12-18 11:58:42 +00:00
Dominik Adamski	6deb5d4e44	[NFC][OpenMP][MLIR] Verify if empty workshare loop is lowered correctly (#75518 ) Check if workshare loop without loop body is lowered correctly i.e.: 1) null pointer is passed to OpenMP device RTL function as a parameter which denotes loop function body aggregated parameters 2) Outlined loop function body has only one parameter - loop counter	2023-12-18 11:59:35 +01:00
Kareem Ergawy	d777504355	[MLIR][OpenMP][Offload] Lower target update op to DeviceRT (#75159 ) Adds support for lowering `UpdateDataOp` to the DeviceRT. This reuses the existing utils used by other device directives.	2023-12-18 11:14:46 +01:00
Jakub Kuderski	2c668fddad	[mlir][gpu] Trim trailing whitespace in GPUOps.td. NFC.	2023-12-17 21:34:29 -05:00
Jakub Kuderski	dd45be028d	[mlir][gpu] Trim trailing whitespace in dialect docs. NFC.	2023-12-17 21:00:06 -05:00
Rik Huijzer	6561efe142	[mlir][python][nfc] Test `-print-ir-after-all` (#75742 ) The functionality to `-print-ir-after-all` was added in `caa159f044`. This PR adds a test and, with that, some documentation. --------- Co-authored-by: Maksim Levental <maksim.levental@gmail.com>	2023-12-17 20:24:47 +01:00
Kazu Hirata	6655581038	[Dialect] Use llvm::is_contained (NFC)	2023-12-17 09:41:22 -08:00
Rik Huijzer	9f5afc3de9	Revert "[mlir][vector] Fix invalid `LoadOp` indices being created (#75519 )" This reverts commit `3a1ae2f46d`.	2023-12-17 12:34:17 +01:00
Rik Huijzer	3a1ae2f46d	[mlir][vector] Fix invalid `LoadOp` indices being created (#75519 ) Fixes https://github.com/llvm/llvm-project/issues/71326. The cause of the issue was that a new `LoadOp` was created which looked something like: ```mlir %arg4 = func.func main(%arg1 : index, %arg2 : index) { %alloca_0 = memref.alloca() : memref<vector<1x32xi1>> %1 = vector.type_cast %alloca_0 : memref<vector<1x32xi1>> to memref<1xvector<32xi1>> %2 = memref.load %1[%arg1, %arg2] : memref<1xvector<32xi1>> return } ``` which crashed inside the `LoadOp::verify`. Note here that `%alloca_0` is 0 dimensional, `%1` has one dimension, but `memref.load` tries to index `%1` with two indices. This is now fixed by using the fact that `unpackOneDim` always unpacks one dim `1bce61e6b0/mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp (L897-L903)` and so the `loadOp` should just index only one dimension. --------- Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>	2023-12-17 11:42:35 +01:00
Matthias Springer	ea979b24b0	[mlir][SparseTensor][NFC] Remove `isNestedIn` helper function (#75729 ) Use `Region::findAncestorBlockInRegion` instead of a custom IR traversal.	2023-12-17 13:19:27 +09:00
Kazu Hirata	b8f89b84bc	Use StringRef::{starts,ends}_with (NFC) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-16 15:02:17 -08:00
Quinn Dawkins	82ab0f7f36	[mlir][linalg] Fix rank-reduced cases for extract/insert slice in DropUnitDims (#74723 ) Inferring the reshape reassociation indices for extract/insert slice ops based on the read sizes of the original slicing op will generate an invalid expand/collapse shape op for already rank-reduced cases. Instead just infer from the shape of the slice. Ported from Differential Revision: https://reviews.llvm.org/D147488	2023-12-16 10:08:51 -05:00
Peiming Liu	6c06bde7c4	[mlir][sparse] support loop range query using SparseTensorLevel. (#75670 )	2023-12-15 16:33:31 -08:00
Peiming Liu	21edad7d07	[mlir][sparse] set up the skeleton for SparseTensorLevel abstraction. (#75645 ) Note that, at the moment, the newly introduced `SparseTensorLevel` classes are far from complete; we plan to migrate code generation related to accessing sparse tensor levels to these classes in the near future to simplify `LoopEmitter`.	2023-12-15 13:34:34 -08:00
Rob Suderman	aa165edca8	[mlir][math] Added `math.sinh` with expansions to `math.exp` (#75517 ) Includes end-to-end tests for the cpu running, folders using `libm` and lowerings to the corresponding `libm` operations.	2023-12-15 11:35:40 -08:00
Andrzej Warzyński	f11bda78c8	[mlir][linalg] Use vector.shuffle to flatten conv filter (#75038 ) Updates the vectorisation of 1D depthwise convolution when flattening the channel dimension (introduced in #71918). In particular - how the convolution filter is "flattened". ATM, the vectoriser will use `vector.shape_cast`: ```mlir %b_filter = vector.broadcast %filter : vector<4xf32> to vector<3x2x4xf32> %sc_filter = vector.shape_cast %b_filter : vector<3x2x4xf32> to vector<3x8xf32> ``` This lowering is not ideal - `vector.shape_cast` can be convenient when it's folded away, but that's not happening in this case. Instead, this patch updates the vectoriser to use `vector.shuffle` (the overall result is identical): ```mlir %sh_filter = vector.shuffle %filter, %filter [0, 1, 2, 3, 0, 1, 2, 3] : vector<4xf32>, vector<4xf32> %b_filter = vector.broadcast %sh_filter : vector<8xf32> to vector<3x8xf32> ```	2023-12-15 17:56:59 +00:00
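The claim in the row above that both lowerings give the same result can be checked with plain Python lists. This is only a model of the value semantics, not of the vectoriser; the helper names are hypothetical. `vector.shuffle` is modelled as indexing into the concatenation of its two operands.

```python
def via_shape_cast(filt, rows, repeat):
    """broadcast vector<C> -> vector<rows x repeat x C>, then flatten the last two dims."""
    broadcast = [[list(filt) for _ in range(repeat)] for _ in range(rows)]
    return [[x for chunk in row for x in chunk] for row in broadcast]


def via_shuffle(filt, rows, mask):
    """shuffle(filt, filt, mask), then broadcast the flat vector to rows x len(mask)."""
    concatenated = list(filt) + list(filt)  # shuffle indexes into both operands
    flat = [concatenated[i] for i in mask]
    return [list(flat) for _ in range(rows)]


filt = [1.0, 2.0, 3.0, 4.0]
print(via_shape_cast(filt, 3, 2) == via_shuffle(filt, 3, [0, 1, 2, 3, 0, 1, 2, 3]))
```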
Peiming Liu	4a72a4ef12	[NFC][mlir][sparse] remove redundant parameter. (#75551 )	2023-12-15 09:29:22 -08:00
Jessica Del	32f9983c06	[AMDGPU] - Add address space for strided buffers (#74471 ) This is an experimental address space for strided buffers. These buffers can have structs as elements and a stride > 1. These pointers allow the indexed access in units of stride, i.e., they point at `buffer[index * stride]`. Thus, we can use the `idxen` modifier for buffer loads. We assign address space 9 to 192-bit buffer pointers which contain a 128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially, they are fat buffer pointers with an additional 32-bit index.	2023-12-15 15:49:25 +01:00
Boian Petkantchin	5e29112719	[mlir][mesh] Add verification and canonicalization for some collectives (#74905 ) Add verification and canonicalization for broadcast, gather, recv, reduce, scatter, send and shift. The canonicalizations only remove trivial collectives with empty mesh_axes attributes.	2023-12-15 06:41:10 -08:00
Rafael Ubal	214d32ccd2	Support for dynamic dimensions in 'tensor.splat' (#74626 ) This feature had been marked as `TODO` in the `tensor.splat` documentation for a while. This MR includes: - Support for dynamically shaped tensors in the return type of `tensor.splat` with the syntax suggested in the `TODO` comment. - Updated op documentation. - Bufferization support. - Updates in op folders affected by the new feature. - Unit tests for valid/invalid syntax, valid/invalid folding, and lowering through bufferization. - Additional op builders resembling those available in `tensor.empty`.	2023-12-15 13:54:45 +00:00
Quinn Dawkins	fcd54b368e	[mlir][tensor] Fix tensor.concat reifyResultShapes for static result dims (#75558 ) When the concatenated dim is statically sized but the inputs are dynamically sized, reifyResultShapes must return the static shape. Fixes the implementation of the interface for tensor.concat in such cases.	2023-12-15 08:43:58 -05:00
martin-luecke	681eacc1b6	[MLIR][transform][python] add sugared python abstractions for transform dialect (#75073 ) This adds Python abstractions for the different handle types of the transform dialect. The abstractions allow for straightforward chaining of transforms by calling their member functions. As an initial PR for this infrastructure, only a single transform is included: `transform.structured.match`. With a future `tile` transform abstraction an example of the usage is: ```Python def script(module: OpHandle): module.match_ops(MatchInterfaceEnum.TilingInterface).tile(tile_sizes=[32,32]) ``` to generate the following IR: ```mlir %0 = transform.structured.match interface{TilingInterface} in %arg0 %tiled_op, %loops = transform.structured.tile_using_for %0 [32, 32] ``` These abstractions are intended to enhance the usability and flexibility of the transform dialect by providing an accessible interface that allows for easy assembly of complex transformation chains.	2023-12-15 13:04:43 +01:00
Hsiangkai Wang	f643eec892	[mlir][vector] Add emulation patterns for vector masked load/store (#74834 ) In this patch, it will convert ``` vector.maskedload %base[%idx_0, %idx_1], %mask, %pass_thru ``` to ``` %ivalue = %pass_thru %m = vector.extract %mask[0] %result0 = scf.if %m { %v = memref.load %base[%idx_0, %idx_1] %combined = vector.insert %v, %ivalue[0] scf.yield %combined } else { scf.yield %ivalue } %m = vector.extract %mask[1] %result1 = scf.if %m { %v = memref.load %base[%idx_0, %idx_1 + 1] %combined = vector.insert %v, %result0[1] scf.yield %combined } else { scf.yield %result0 } ... ``` It will convert ``` vector.maskedstore %base[%idx_0, %idx_1], %mask, %value ``` to ``` %m = vector.extract %mask[0] scf.if %m { %extracted = vector.extract %value[0] memref.store %extracted, %base[%idx_0, %idx_1] } %m = vector.extract %mask[1] scf.if %m { %extracted = vector.extract %value[1] memref.store %extracted, %base[%idx_0, %idx_1 + 1] } ... ```	2023-12-15 11:35:48 +00:00
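The per-lane expansion shown in the row above can be modelled in a few lines. This sketch assumes a flat 1-D buffer and uses hypothetical helper names; it only mirrors the semantics (one guarded scalar access per lane), not the pattern implementation.

```python
def masked_load(base, offset, mask, pass_thru):
    """Emulate a masked load: one conditional scalar load per lane."""
    result = list(pass_thru)  # lanes start from the pass-through value
    for lane, enabled in enumerate(mask):
        if enabled:  # corresponds to the scf.if on the extracted mask bit
            result[lane] = base[offset + lane]
    return result


def masked_store(base, offset, mask, value):
    """Emulate a masked store: one conditional scalar store per lane."""
    for lane, enabled in enumerate(mask):
        if enabled:
            base[offset + lane] = value[lane]


buf = [10, 20, 30, 40]
print(masked_load(buf, 1, [True, False, True], [0, 0, 0]))
```

Because disabled lanes never touch memory in this model, a masked-off lane that would fall outside the buffer is harmless.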
Cullen Rhodes	e7432babaf	[mlir][ArmSME] Fail instead of error in vector.outerproduct lowering (#75447 ) The 'vector.outerproduct' -> 'arm_sme.outerproduct' conversion currently errors on unsupported cases when it should return failure.	2023-12-15 07:30:32 +00:00
Felix Schneider	8190369e83	[mlir][tosa] Add verifier for `tosa.transpose` (#75376 ) This patch adds a verifier to `tosa.transpose` which fixes a crash. Related: https://github.com/llvm/llvm-project/pull/74367 Fix https://github.com/llvm/llvm-project/issues/74479	2023-12-15 07:22:32 +01:00
Vivian	bd6a2452ae	[mlir][SCF] Add support for peeling the first iteration out of the loop (#74015 ) There is a use case that we need to peel the first iteration out of the for loop so that the peeled forOp can be canonicalized away and the fillOp can be fused into the inner forall loop. For example, we have nested loops as below ``` linalg.fill ins(...) outs(...) scf.for %arg = %lb to %ub step %step scf.forall ... ``` After the peeling transform, it is expected to be ``` scf.forall ... linalg.fill ins(...) outs(...) scf.for %arg = %(lb + step) to %ub step %step scf.forall ... ``` This patch makes the most use of the existing peeling functions and adds support for peeling the first iteration out of the loop.	2023-12-14 17:03:52 -08:00
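A sketch of why peeling the first iteration preserves behaviour, written as ordinary Python loops rather than `scf.for`. The helper names are hypothetical, a positive step is assumed, and the `lb < ub` guard is an assumption of this sketch (the commit message does not say how an empty loop is handled).

```python
def run_loop(lb, ub, step, body, state):
    """Reference loop: for iv = lb to ub step step."""
    for iv in range(lb, ub, step):
        state = body(iv, state)
    return state


def run_peeled(lb, ub, step, body, state):
    """Same loop with the first iteration executed outside of it."""
    if lb < ub:  # assumed guard: peel only if there is a first iteration
        state = body(lb, state)
    # The remaining loop starts at lb + step, as in the expected IR above.
    return run_loop(lb + step, ub, step, body, state)


record = lambda iv, seen: seen + [iv]
print(run_loop(0, 10, 3, record, []), run_peeled(0, 10, 3, record, []))
```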
Jacques Pienaar	ee2deb4cf7	[mlir] Handle simple commutative cases in CSE. Tried to keep this simple while handling obvious CSE instances. For more complicated cases the expectation is still that the sorting pass would run before. While simple, this case did turn up in a real deployed instance where it had a large (>10% e2e) impact. This can of course be refined.	2023-12-14 16:09:05 -08:00
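One way to picture the simple commutative case is to normalise operand order before looking an operation up. The sketch below is an illustration under that assumption, not the MLIR CSE implementation; the tuple encoding of operations and the `COMMUTATIVE` set are hypothetical.

```python
COMMUTATIVE = frozenset({"add", "mul"})  # hypothetical set of commutative op names


def cse_key(name, operands):
    """Lookup key that ignores operand order for commutative ops."""
    if name in COMMUTATIVE:
        operands = sorted(operands)
    return (name, tuple(operands))


def cse(ops):
    """ops: list of (result, name, operands). Returns (kept ops, replacements)."""
    seen, replacement, kept = {}, {}, []
    for result, name, operands in ops:
        # Use already-deduplicated values so chains of duplicates collapse too.
        operands = [replacement.get(o, o) for o in operands]
        key = cse_key(name, operands)
        if key in seen:
            replacement[result] = seen[key]
        else:
            seen[key] = result
            kept.append((result, name, tuple(operands)))
    return kept, replacement


kept, repl = cse([("%0", "add", ["%a", "%b"]), ("%1", "add", ["%b", "%a"])])
print(kept, repl)
```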
Fabian Mora	419c45a325	[mlir][gpu] Fix crash in `gpu-module-to-binary` (#75477 ) This patch fixes the error in issue #75434. The crash was being caused by not checking for a lack of target attributes in a GPU module. It's now considered an error to invoke the pass with a GPU module with no target attributes.	2023-12-14 14:03:10 -05:00
Aart Bik	15c06bc4af	[mlir][sparse] comment cleanup in iteration graph sorter (#75508 )	2023-12-14 10:56:28 -08:00
Yinying Li	7bc6c4abe8	[mlir][print]Add functions for printing memref f16/bf16/i16 (#75094 ) 1. Added functions for printMemrefI16/f16/bf16. 2. Added a new integration test for all the printMemref functions.	2023-12-14 13:06:25 -05:00
Jerry Wu	2c9ba9c34a	[mlir] Fix type transformation in DropUnitDimFromElementwiseOps (#75430 ) Use operand and result types to build the corresponding new types in `DropUnitDimFromElementwiseOps`.	2023-12-14 12:20:54 -05:00
Tobias Gysi	25d942403c	[mlir][llvm] Add invariant intrinsics (#75354 ) This commit implements the LLVM IR invariant intrinsics in LLVM dialect. These intrinsics can be used to mark program regions in which the contents of a specific memory object will not change. The LLVM dialect implementation also implements the PromotableOpInterface to ensure Mem2Reg & SROA are able to promote pointers that are marked using the invariant intrinsics.	2023-12-14 14:58:45 +01:00
Kareem Ergawy	2ab926d959	[flang][MLIR][OpenMP] Add support for `target update` directive. (#75047 ) Add an op in the OMP dialect to model the `target update` directive. This change reuses the `MapInfoOp` used by other device directives to model `map` clauses but verifies that the restrictions imposed by the `target update` directive are respected.	2023-12-14 12:48:45 +01:00
Cullen Rhodes	f0ce23509a	[mlir][ArmSME][NFC] Move conversion tests (#75446 ) * Move -vector-to-arm-sme tests to mlir/test/Conversion/VectorToArmSME * Move -arm-sme-to-llvm tests to mlir/test/Conversion/ArmSMEToLLVM * Separate unsupported tests.	2023-12-14 10:52:02 +00:00
Cullen Rhodes	0e06694235	[mlir][ArmSME][NFC] Remove arm_sme::populateVectorTransferLoweringPatterns decl (#75442 ) Unused since D154867.	2023-12-14 10:51:28 +00:00
Pablo Antonio Martinez	7f4f75c144	[MLIR][SCFToOpenMP] Add num-threads option (#74854 ) Add `num-threads` option to the `-convert-scf-to-openmp` pass, allowing to set the number of threads to be used in the `omp.parallel` to a fixed value.	2023-12-14 09:07:17 +00:00
Kazu Hirata	88d319a29f	[mlir] Use StringRef::{starts,ends}_with (NFC) This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.	2023-12-13 22:58:30 -08:00
Prathamesh Tagore	2255795f28	[mlir] [tensor] Fix typo in tensor.pack documentation (#74922 )	2023-12-14 11:21:10 +05:30
Keren Zhou	e66f97e8a8	[mlir] Fix loop pipelining when the operand of `yield` is not defined in the loop body (#75423 )	2023-12-13 19:19:13 -08:00
Matthias Springer	6d3ebd831c	[mlir][affine] Allow `memref.cast` in `isDimOpValidSymbol` (#74401 ) `isDimOpValidSymbol` is used during the verification of `affine.for` ops. It is used to check if LB/UB values are valid symbols. This change adds support for `memref.cast`, which can be skipped over if it is a ranked -> ranked cast. This change fixes `mlir/test/Transforms/canonicalize.mlir`, which used to fail when verifying the IR after each pattern application (#74270). In this test case, a pattern that folds dynamic offsets/sizes/strides to static ones is applied. This pattern inserts a trivial `memref.cast` that can be folded away. This folding happens after the pattern application, so the IR fails to verify after applying the offsets/sizes/strides canonicalization pattern. Note: The verifier of `affine.for` violates MLIR guidelines. Only local properties of an op should be verified. The verifier should not inspect the defining ops of operands. (This would mean that constraints such as "operand is a valid affine symbol" cannot be verified.)	2023-12-14 08:54:39 +09:00
Aart Bik	e52c941921	[mlir][sparse] minor cleanup of transform/utils (#75396 ) Consistent include macro naming. Modified and added comments.	2023-12-13 15:18:35 -08:00
Prathamesh Tagore	f397bdf5ae	[mlir][tensor] Fold consumer linalg transpose with producer tensor pack (#74206 ) Partial fix to https://github.com/openxla/iree/issues/15367	2023-12-13 14:26:19 -08:00
Fangrui Song	2a9d8caf29	Revert "[MLIR] Fuse locations of merged constants (#74670 )" This reverts commit `87e2e89019` and its follow-ups `0d1490f09f` (#75218) and `6fe3cd5467` (#75312). We observed significant OOM/timeout issues due to #74670 in quite a few services including google-research/swirl-lm. The follow-ups #75218 and #75312 do not address the issue. Perhaps this is worth more investigation.	2023-12-13 13:49:03 -08:00
Fangrui Song	71ba8bb4a7	[mlir,vector] Fix -Wunused-variable	2023-12-13 13:28:17 -08:00
Andrzej Warzyński	c02d07fdf0	[mlir][vector] Add pattern to drop unit dim from elementwise(a, b)) (#74817 ) For vectors with either leading or trailing unit dim, replaces: elementwise(a, b) with: sc_a = shape_cast(a) sc_b = shape_cast(b) res = elementwise(sc_a, sc_b) return shape_cast(res) The newly inserted shape_cast Ops fold (before elementwise Op) and then restore (after elementwise Op) the unit dim. Vectors `a` and `b` are required to be rank > 1. Example: ```mlir %mul = arith.mulf %B_row, %A_row : vector<1x[4]xf32> %cast = vector.shape_cast %mul : vector<1x[4]xf32> to vector<[4]xf32> ``` gets converted to: ```mlir %B_row_sc = vector.shape_cast %B_row : vector<1x[4]xf32> to vector<[4]xf32> %A_row_sc = vector.shape_cast %A_row : vector<1x[4]xf32> to vector<[4]xf32> %mul = arith.mulf %B_row_sc, %A_row_sc : vector<[4]xf32> %mul_sc = vector.shape_cast %mul : vector<[4]xf32> to vector<1x[4]xf32> %cast = vector.shape_cast %mul_sc : vector<1x[4]xf32> to vector<[4]xf32> ``` In practice, the bottom 2 shape_cast(s) will be folded away.	2023-12-13 20:29:12 +00:00
Bharathi Ramana Joshi	8d7c979815	[MLIR][Presburger] Fix IntegerRelation::swapVar not swapping identifiers (#74407 ) This commit fixes a bug where identifiers were not swapped when doing a IntegerRelation::swapVar.	2023-12-13 22:47:19 +05:30
Tom Eccles	79524ba527	[mlir][ArmSME] Add sve streaming compatible attribute (#75222 ) Following the same path already used for ArmStreaming and ArmLocallyStreaming. This should correspond to clang's __arm_streaming_compatible attribute.	2023-12-13 13:53:01 +00:00
Benjamin Chetioui	6fe3cd5467	[MLIR][NFC] Add fast path to fused loc flattening. (#75312 ) This is a follow-up on [PR 75218](https://github.com/llvm/llvm-project/pull/75218) that avoids reconstructing a fused loc in the `FlattenFusedLocationRecursively` helper when there has been no change.	2023-12-13 12:40:41 +01:00
Sungsoon Cho	762964e97f	Add cosh op to the math dialect. (#75153 )	2023-12-13 12:25:37 +01:00
Benjamin Maxwell	9505cf457f	[mlir][ArmSME][test] Use `only-if-required-by-ops` rather than `enable_arm_streaming_ignore` (NFC) (#75209 ) This moves the fix out of the IR and into the pass description, which seems nicer. It also works as an integration test for the `only-if-required-by-ops` flag :)	2023-12-13 10:29:28 +00:00
Georgios Pinitas	92433285d7	[mlir][ArmSME] Add missing dependencies in ArmSME transforms (#75269 ) Inject missing dependency between generated files that could cause build issues. Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>	2023-12-13 10:28:16 +00:00
Christian Ulmann	eab62971cd	[MLIR][LLVM] Support nameless and scopeless global constants (#75307 ) This commit ensures that we model DI information for global constants correctly. These constructs can lack scopes, names, and linkage names, so these parameters were made optional for the DIGlobalVariable attribute.	2023-12-13 10:47:59 +01:00
Johannes de Fine Licht	ed5813c4aa	[MLIR][LLVM] Remove disallowlist from LLVM inliner (#75303 ) The disallowlist was used as a migration strategy while support was extended to more side effecting operations. We now (to the best of our knowledge) support all side effecting operations, so never fail `isLegalToInline` on any LLVM operation. There is no test included, because that's exactly the reason for this change: there are no more unsupported operations in inlining; the existing tests for unsupported inlines have already been burninated.	2023-12-13 10:31:27 +01:00
Abhinav271828	84ab06ba2f	[MLIR][Presburger] Add Gram-Schmidt (#70843 ) Implement Gram-Schmidt orthogonalisation for the FracMatrix class. This requires dotProduct, which has been added as a util.	2023-12-13 08:28:47 +00:00
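For readers unfamiliar with the procedure named in the row above, here is a minimal sketch of Gram-Schmidt orthogonalisation over exact rationals. It uses Python's `fractions.Fraction` rather than the Presburger `FracMatrix` class, the function names are hypothetical, and it assumes the input rows are linearly independent (otherwise a division by zero occurs).

```python
from fractions import Fraction


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(rows):
    """Orthogonalise rows (without normalising) using exact arithmetic."""
    basis = []
    for row in rows:
        v = [Fraction(x) for x in row]
        for b in basis:
            coeff = dot(v, b) / dot(b, b)  # projection coefficient onto b
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        basis.append(v)
    return basis


print(gram_schmidt([[3, 1], [2, 2]]))
```

Skipping normalisation keeps every entry rational, which is why the sketch never needs a square root.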
Aart Bik	365777ecbe	[mlir][sparse] refactor utilities into transform/utils dir (#75250 ) Separates actual transformation files from supporting utility files in the transforms directory. Includes a bazel overlay fix for the build (as well as a bit of cleanup of that file to be less verbose and more flexible).	2023-12-12 15:34:31 -08:00
Stella Laurenzo	8eff570482	Add missing dep on MLIRToLLVMIRTranslationRegistration to mlir-opt. (#75111 ) I was not able to fully triage why this just started failing on one of our bots as it seems that the use was added 4 months ago. I would assume that it was accidentally coming in transitively in some way as the dep was definitely missing. For context, this started failing in [our byo_llvm](https://github.com/openxla/iree/blob/main/build_tools/llvm/byo_llvm.sh) build on a stock build of MLIR on top of an existing LLVM. We were getting: ``` ld.lld: error: undefined symbol: mlir::registerSPIRVDialectTranslation(mlir::DialectRegistry&) >>> referenced by mlir-opt.cpp >>> tools/mlir-opt/CMakeFiles/mlir-opt.dir/mlir-opt.cpp.o:(main) ```	2023-12-12 14:10:06 -08:00
Benjamin Chetioui	0d1490f09f	[MLIR] Flatten fused locations when merging constants. (#75218 ) [PR 74670](https://github.com/llvm/llvm-project/pull/74670) added support for merging locations at constant folding time. We have discovered that in some cases, the number of locations grows so big as to cause a compilation process to OOM. In that case, many of the locations end up appearing several times in nested fused locations. We add here a helper that always flattens fused locations in order to eliminate duplicates in the case of nested fused locations.	2023-12-12 22:00:23 +01:00
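The flattening idea in the row above can be sketched independently of MLIR. Here a location is modelled as either a string (a leaf) or a list of nested locations; that encoding and the function name are assumptions of this sketch, not the `Location` API.

```python
def flatten_fused(loc):
    """Flatten nested fused locations, keeping the first occurrence of each leaf."""
    ordered, seen = [], set()

    def visit(node):
        if isinstance(node, (list, tuple)):  # a fused location: recurse into children
            for child in node:
                visit(child)
        elif node not in seen:  # a leaf location seen for the first time
            seen.add(node)
            ordered.append(node)

    visit(loc)
    return ordered


print(flatten_fused(["a.mlir:1", ["b.mlir:2", ["a.mlir:1", "c.mlir:3"]], "b.mlir:2"]))
```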
Aart Bik	047399c213	[mlir][sparse] cleanup of CodegenEnv reduction API (#75243 )	2023-12-12 12:44:46 -08:00
Yinying Li	31b72b0742	[mlir][sparse]Make isBlockSparsity more robust (#75113 ) 1. A single dimension can either be blocked (with floordiv and mod pair) or non-blocked. Mixing them would be invalid. 2. Block size should be non-zero value.	2023-12-12 13:43:03 -05:00
Boian Petkantchin	4b3446771f	[mlir][mesh] Add endomorphism simplification for all-reduce (#73150 ) Performs transformations such as `all_reduce(x) + all_reduce(y) -> all_reduce(x + y)`, and `max(all_reduce(x), all_reduce(y)) -> all_reduce(max(x, y))` when the all_reduce element-wise op is max. Added general rewrite patterns HomomorphismSimplification and EndomorphismSimplification that encapsulate the general algorithm. Added specializations for all-reduce with respect to addf, addi, minsi, maxsi, minimumf and maximumf in the Arithmetic dialect.	2023-12-12 10:21:52 -08:00
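The two rewrites in the row above can be sanity-checked numerically. In this sketch each list entry stands for the value held by one device, and `all_reduce` is a hypothetical stand-in that reduces across devices and replicates the result; it is not the mesh dialect op.

```python
from functools import reduce
from operator import add


def all_reduce(per_device, op):
    """Reduce one value per device with `op` and replicate the result everywhere."""
    total = reduce(op, per_device)
    return [total] * len(per_device)


x = [1, 2, 3]     # value of x on each of three devices
y = [10, 20, 30]  # value of y on each device

# all_reduce(x) + all_reduce(y) versus all_reduce(x + y), with a sum reduction.
two_reductions = [a + b for a, b in zip(all_reduce(x, add), all_reduce(y, add))]
one_reduction = all_reduce([a + b for a, b in zip(x, y)], add)
print(two_reductions == one_reduction)
```

The rewritten form communicates once instead of twice, which is the point of the simplification.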
Jakub Kuderski	8063622721	[mlir][vector] Allow vector distribution with multiple written elements (#75122 ) Add a configuration option to allow vector distribution with multiple elements written by a single lane. This is so that we can perform vector multi-reduction with multiple results per workgroup.	2023-12-12 13:15:17 -05:00
Rafael Ubal	a8f3860bcb	[mlir][tensor] Fix bug in `tensor.extract(tensor.from_elements)` folder (#75109 ) The folder for `tensor.extract` is not operating correctly when it is consuming the result of a `tensor.from_elements` operation. The existing unit test named `@extract_from_tensor.from_elements_3d` in `mlir/test/Dialect/Tensor/canonicalize.mlir` seems to be an attempt to stress this code. However, this unit test creates a `tensor.from_elements` op exclusively from constants, which gets folded away into a single constant tensor. Therefore, the buggy code was never executed in unit tests. I have added a new unit test named `@extract_from_tensor.from_elements_variable_3d` that makes sure the `tensor.from_elements` op is not folded away by having its input operands come directly from function arguments. The original folder code would have made this test fail. This bug was notably affecting the lowering of the `tosa.pad` op in the `tosa-to-tensor` pass, where the generated code is likely to contain a `tensor.from_elements` + `tensor.extract` op sequence.	2023-12-12 15:36:52 +00:00
Dominik Adamski	b730703726	[NFC][MLIR][OpenMP] Add test to check lowering omp.wsloop for GPU (#74857 ) This test checks if proper OpenMP device RTL function is called to handle workshare loop for GPU. The code generation for GPU worksharing loops is implemented by the patch: https://github.com/llvm/llvm-project/pull/73360	2023-12-12 15:05:26 +01:00
lorenzo chelini	06c4f78b07	[MLIR][Linalg] improve silenceable failure msg for `lower_pack` (NFC) (#75053 ) Adjust the silenceable failure message as we lower `tensor.unpack` as a combination of `linalg.transpose` + `tensor.collapse_shape` and `tensor.extract_slice`.	2023-12-12 13:06:17 +01:00
Adrian Kuegel	8a5b448fa0	[mlir][GPU] Apply ClangTidy fixes Use const reference in loops if possible.	2023-12-12 07:34:03 +00:00
Ivan Radanov Ivanov	95dce3e86d	Link NVVM translation in the to LLVMIR registration library	2023-12-12 14:02:39 +09:00
Ivan R. Ivanov	d5fb4c0f11	[MLIR][NVVM] Enable nvvm intrinsics import to LLVMIR (#68843 ) Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com> Co-authored-by: Christian Ulmann <christianulmann@gmail.com>	2023-12-12 13:31:55 +09:00
Billy Zhu	87e2e89019	[MLIR] Fuse locations of merged constants (#74670 ) When merging constants by the operation folder, the location of the op that remains should be updated to track the new meaning of this op. This way we do not lose track of all possible source locations that the constant op came from, and the final location of the op is less reliant on the order of folding. This will also help debuggers understand how to step these instructions. This PR introduces a helper for operation folder to fuse another location into the location of an op. When an op is deduplicated, fuse the location of the op to be removed into the op that is retained. The retained op now represents both original ops. The FusedLoc will have a string metadata to help understand the reason for the location fusion (motivated by the [example](`71be8f3c23/mlir/include/mlir/IR/BuiltinLocationAttributes.td (L130)`) in the docstring of FusedLoc).	2023-12-11 19:31:54 -08:00
Maksim Levental	d36b483f4f	[mlir][python] update type stubs (#75099 )	2023-12-11 18:35:02 -06:00
Matthias Springer	95d6aa21fb	[mlir][SparseTensor][NFC] Use `tensor.empty` for dense tensors (#74804 ) Use `tensor.empty` + initialization for dense tensors instead of `bufferization.alloc_tensor`.	2023-12-12 08:56:47 +09:00
Matthias Springer	a43641c9db	[mlir][bufferization] Fix `regionOperatesOnMemrefValues` (#75016 ) `Region::walk([](Block *b) {...})` does not enumerate blocks that are direct children of the region. These blocks must be checked manually.	2023-12-12 08:56:23 +09:00
Andrzej Warzyński	07919cf895	Revert "[mlir][vector] Make `TransposeOpLowering` configurable (#73915 )" (#75062 ) Reverting a workaround intended specifically for SPIR-V. That workaround emerged from this discussion: * https://github.com/llvm/llvm-project/pull/72105 AFAIK, it hasn't been required in practice. This is based on IREE (https://github.com/openxla/iree), which has just bumped its fork of LLVM without using it (*). (*) `cef31e775e` This reverts commit `bbd2b08b95`.	2023-12-11 21:32:23 +00:00
Aart Bik	d96f46dd20	[mlir][sparse] fix bug in custom reduction scalarization code (#74898 ) Bug found with BSR of "spy" SDDMM method	2023-12-11 10:22:17 -08:00
Finn Plummer	40e2bb5330	[mlir][spirv] Add folding for Bitwise[Or|And|Xor] (#74193 ) Add missing constant propagation folder for Bitwise[Or|And|Xor]. Move previous Bitwise[Or|And] fold implementations to SPIRVCanonicalization for consistency. Implement additional folding when lhs == rhs and rhs = 0 for Xor. Also update an Xor test case to account for this introduced folding. This helps for readability of lowered code into SPIR-V. Part of work for #70704	2023-12-11 13:09:40 -05:00
Lorenzo Chelini	fcdb848596	[MLIR][Linalg] (NFC) Drop `verify-diagnostics` from `transpose-conv2d.mlir` We are not checking diagnostics in this test.	2023-12-11 16:13:19 +01:00
Shenghang Tsai	dc2ce60024	[mlir][CAPI] Add mlirOpOperandGetValue (#75032 )	2023-12-11 12:32:21 +01:00
Rik Huijzer	3764f5e816	[mlir][llvm] Fix negative GEP crash in type consistency (#74859 ) Fixes https://github.com/llvm/llvm-project/issues/74453. The `gepToByteOffset` was implicitly casting a signed integer to an unsigned integer even though negative dimensions are valid for `llvm.getelementptr`. --------- Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>	2023-12-11 12:29:53 +01:00
Benjamin Maxwell	01ac530a2e	[mlir][ArmSME] Remove `vector.print` legality from ArmSMEToSCF (NFC) (#74875 ) This was moved to VectorToArmSME in #74063, so this is no longer needed. VectorToArmSME uses a greedy rewriter, so a similar legality rule is not needed there. See: `bbb8a0df73/mlir/lib/Conversion/VectorToArmSME/VectorToArmSMEPass.cpp (L35)`	2023-12-11 11:25:43 +00:00
Rik Huijzer	51e5f677c8	[mlir][vector] Fix crash on invalid `permutation_map` (#74925 ) Without this patch, MLIR crashes with ``` Assertion failed: (getNumDims() == map.getNumResults() && "Number of results mismatch"), function compose, file AffineMap.cpp, line 537. ``` during parsing.	2023-12-11 12:07:41 +01:00
Adrian Kuegel	ea2e83af55	[mlir][Python] Apply ClangTidy findings. move constructors should be marked noexcept	2023-12-11 09:43:08 +00:00
Victor Perez	13c648f6bd	[MLIR][IntegerRangeAnalysis] Avoid crash reached when loop bound is uninitialized (#74832 ) If the loop bound is not initialized, the analysis crashed, as it only checked for nullity. Also checking for initialization fixes the issue. Signed-off-by: Victor Perez <victor.perez@codeplay.com> Co-authored-by: Tsang, Whitney <whitney.tsang@intel.com>	2023-12-11 10:36:03 +01:00
Thomas Raoux	ef112833e1	[MLIR][SCF] Add support for pipelining dynamic loops (#74350 ) Support loops without static boundaries. Since the number of iterations is not known, we need to predicate the prologue and epilogue in case the number of iterations is smaller than the number of stages. This patch includes work from @chengjunlu	2023-12-10 22:32:11 -08:00
Benjamin Maxwell	a17671084d	[mlir][ArmSME] Update `-allocate-arm-sme-tiles` description (NFC) (#74871 )	2023-12-09 11:39:14 +00:00
Aart Bik	21213f39e2	[mlir][sparse] fix uninitialized dense tensor out in conv2d test (#74884 ) Note, tensor.empty may feed into SPARSE output (meaning it truly has no values yet), but for a DENSE output, it should always have an initial value. We ran a verifier over all our tests and this is the only remaining omission.	2023-12-08 12:44:57 -08:00
Aart Bik	3d3e46cc4d	[mlir][sparse] make test for block sparsity more robust (#74798 ) For BSR and convolutions, we encounter (d0, d1, d2, d3) -> ((d0 + d2) floordiv 2, (d1 + d3) floordiv 2, (d0 + d2) mod 2, (d1 + d3) mod 2) which crashed the current test. Note that an actual test and working code is still to follow (since we need to fix a few other things first)	2023-12-08 11:50:10 -08:00
Boian Petkantchin	944e031e36	[mlir][mesh] Use tensor shape notation for the shape of a cluster (#73826 ) Example: substitute mesh.cluster @mesh0(rank = 2, dim_sizes = [0, 4]) with mesh.cluster @mesh0(rank = 2, dim_sizes = ?x4) Same as tensor/memref shapes. The only difference is for 0-rank shapes. With tensors you would have something like `tensor<f32>`. Here to avoid matching an empty string a 0-rank shape is denoted by `[]`.	2023-12-08 11:34:44 -08:00
Aman LaChapelle	46708a5bcb	[mlir][Pass] Move PassExecutionAction to Pass.h, NFC. (#74850 ) This patch moves PassExecutionAction to Pass.h so that it can be used by the action framework to introspect and intercede in pass managers that might be set up opaquely. This provides for a very particular use case, which essentially involves being able to intercede in a PassManager and skip or apply individual passes. Because of this, this patch also adds a test for this use case to verify that it could in fact work.	2023-12-08 11:22:08 -08:00
Peiming Liu	baa192ea65	[mlir][sparse] optimize memory loads to SSA values when generating sp… (#74787 ) …arse conv.	2023-12-08 09:22:19 -08:00
Frederik Harwath	f7250179e2	Implement acos operator in MLIR Math Dialect (#74584 ) Required for torch-mlir. Cf. llvm/torch-mlir#2604 "Implement torch.aten.acos".	2023-12-08 09:08:43 -08:00
Guray Ozen	c65d8c7187	[mlir][memref] extract_strided_metadata for zero-sized memref (#74835 )	2023-12-08 15:55:14 +01:00
Mehdi Amini	69a0a3be01	[mlir] Add missing MLIR_ENABLE_EXECUTION_ENGINE option to MLIRConfig.cmake.in This is the kind of option that downstream consumers of preconfigured MLIR packages can check to see if the execution engine is available or not.	2023-12-08 04:12:47 -08:00
Amir Bishara	cf2d625a5d	[mlir][linalg] Expose getPreservedProducerResults method from ElementwiseOpFusion file (#73850 ) Declare `getPreservedProducerResults` function which helps to get the preserved results of the producer linalg generic operation as a result of elementwise fusion.	2023-12-08 11:50:33 +02:00
xiaoleis-nv	c340cf0a35	Fix argument name of GEPOp builder (#74810 ) This MR fixes the argument name of GEPOp builder from `basePtrType` to `elementType` to avoid confusion. Co-authored-by: Xiaolei Shi <xiaoleis@nvidia.com>	2023-12-08 00:28:12 -08:00
Mehdi Amini	847d8457d1	Apply clang-tidy fixes for performance-unnecessary-value-param in VectorToGPU.cpp (NFC)	2023-12-07 21:39:25 -08:00
Mehdi Amini	b8a3f0fd3a	Apply clang-tidy fixes for llvm-qualified-auto in VectorToGPU.cpp (NFC)	2023-12-07 21:39:25 -08:00
Mehdi Amini	1cef577b90	Apply clang-tidy fixes for llvm-qualified-auto in PredicateTree.cpp (NFC)	2023-12-07 21:39:25 -08:00
Mehdi Amini	345d574b65	Apply clang-tidy fixes for llvm-prefer-isa-or-dyn-cast-in-conditionals in MapMemRefStorageClassPass.cpp (NFC)	2023-12-07 21:39:25 -08:00
Mehdi Amini	6ac80a7677	Apply clang-tidy fixes for readability-identifier-naming in GPUToLLVMConversion.cpp (NFC)	2023-12-07 21:39:25 -08:00
Matthias Springer	f5724847ec	[mlir][Transforms][NFC] GreedyPatternRewriteDriver: Remove redundant worklist management code (#74796 ) Do not add the previous users of replaced ops to the worklist during `notifyOperationReplaced`. The previous users are modified inplace as part of `PatternRewriter::replaceOp`, which calls `PatternRewriter::replaceAllUsesWith`. The latter function updates all users with `updateRootInPlace`, which already puts all previous users of the replaced op on the worklist. No further worklist management work is needed in the `notifyOperationReplaced` callback.	2023-12-08 14:10:44 +09:00
Aart Bik	ec9e49796d	[mlir][sparse] add sparse convolution with 5x5 kernel (#74793 ) Also unifies some of the test set up parts in other conv tests	2023-12-07 18:11:04 -08:00
Aart Bik	7003e255d3	[mlir][sparse] code formatting (NFC) (#74779 )	2023-12-07 15:46:24 -08:00
harsh-nod	42bba97fc2	[mlir] Extend CombineTransferReadOpTranspose pattern to handle extf ops. (#74754 ) This patch modifies the CombineTransferReadOpTranspose pattern to handle extf ops. Also adds a test which shows the transpose getting folded into the transfer_read.	2023-12-07 15:01:55 -08:00
max	4a6ed4a90d	[mlir][python] fix affine test	2023-12-07 16:21:57 -06:00
Maksim Levental	98d8dce6e9	[mlir][affine] implement inferType for delinearize (#74644 )	2023-12-07 15:59:52 -06:00
Peiming Liu	097d2f1417	[mlir][sparse] optimize memory load to SSA value when generating spar… (#74750 ) …se conv kernel.	2023-12-07 12:00:25 -08:00
Maksim Levental	db3bc49487	[mlir][python] fix up affine for (#74495 )	2023-12-07 10:55:55 -06:00
Pablo Antonio Martinez	b396e5429c	Reland "[MLIR][Transform] Add attribute in MatchOp to filter by operand type (#67994 )" Test was failing due to a different transform sequence declaration (transform sequence were used, while now it should be named transform sequence). Test is now fixed.	2023-12-07 11:57:02 +00:00
Mehdi Amini	6b0ed49c8e	[mlir] Fix missing cmake dependency causing non-deterministic build failure (NFC) Fixes #74611	2023-12-07 03:22:45 -08:00
Tom Eccles	e9e1c411b6	[mlir][LLVM] Add nsw and nuw flags (#74508 ) The implementation of these are modeled after the existing fastmath flags for floating point arithmetic.	2023-12-07 10:35:00 +00:00
Mikhail Goncharov	10879403e5	Revert "[MLIR][Transform] Add attribute in MatchOp to filter by operand type (#67994 )" This reverts commit `c4399130ae`. Test fails https://lab.llvm.org/buildbot/#/builders/272/builds/2757	2023-12-07 10:28:35 +01:00
Rik Huijzer	9e8a737742	[mlir][doc] Fix reported Builtin (syntax) issues (#74635 ) Fixes https://github.com/llvm/llvm-project/issues/62489. Some notes for each number: - 1 `bool-literal` should be reasonably clear from context. - 2 Fixed. - 3 This is now fixed. `loc(fused[])` is valid, but `loc(fused["foo",])` is not. - 4 This operation uses `assemblyFormat` so the syntax is correct (assuming ODS is correct). - 5 This operation uses `assemblyFormat` so the syntax is correct (assuming ODS is correct). - 6 Added an example. - 7 The suggested fix is in line with other `assemblyFormat` examples. - 8 Added syntax and an example. - 9 I don't know what this is referring to. - 10 Added example. - 11 and 12 suggestion seems wrong as the `ShapedTypeInterface` could be extended by clients, so is not limited to tensors or vectors. - 13 is already reasonably clear with the example, I think. - 14 is already reasonably clear with the example, I think. - 15 Added an example from the `opaque_locations.mlir` tests. - 16 The answer to this seems to change over time and depend on the use case? Suggestions by reviewers are welcome.	2023-12-07 10:25:48 +01:00
Pablo Antonio Martinez	c4399130ae	[MLIR][Transform] Add attribute in MatchOp to filter by operand type (#67994 ) This patch adds the `filter_operand_types` attribute to transform::MatchOp, allowing ops to be filtered by their operand types.	2023-12-07 08:28:52 +00:00
Jacob Yu	0c17f43655	[mlir][arith] Overflow semantics in documentation for muli, subi, and addi (#74346 ) Following discussions from this RFC: https://discourse.llvm.org/t/rfc-integer-overflow-semantics Adding the overflow semantics into the muli, subi and addi arith operations.	2023-12-07 01:34:32 -05:00
Matthias Springer	986287e7f3	[mlir][SparseTensor] Fix invalid API usage in patterns (#74690 ) Rewrite patterns must return `success` if the IR was modified. This commit fixes sparse tensor tests such as `SparseTensor/sparse_fusion.mlir`, `SparseTensor/CPU/sparse_reduce_custom.mlir`, `SparseTensor/CPU/sparse_semiring_select.mlir` when running with `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`.	2023-12-07 12:05:20 +09:00
Matthias Springer	1612993788	[mlir][complex] Allow integer element types in `complex.constant` ops (#74564 ) The op used to support only float element types. This was inconsistent with `ConstantOp::isBuildableWith`, which allows integer element types. The complex type allows any float/integer element type. Note: The other complex dialect ops do not support non-float element types yet. The main purpose of this change is to fix `Tensor/canonicalize.mlir`, which is currently failing when verifying the IR after each pattern application (#74270). ``` within split at mlir/test/Dialect/Tensor/canonicalize.mlir:231 offset :8:15: error: 'complex.constant' op result #0 must be complex type with floating-point elements, but got 'complex<i32>' %complex1 = tensor.extract %c1[] : tensor<complex<i32>> ^ within split at mlir/test/Dialect/Tensor/canonicalize.mlir:231 offset :8:15: note: see current operation: %0 = "complex.constant"() <{value = [1 : i32, 2 : i32]}> : () -> complex<i32> "func.func"() <{function_type = () -> tensor<3xcomplex<i32>>, sym_name = "extract_from_elements_complex_i"}> ({ %0 = "complex.constant"() <{value = [1 : i32, 2 : i32]}> : () -> complex<i32> %1 = "arith.constant"() <{value = dense<(3,2)> : tensor<complex<i32>>}> : () -> tensor<complex<i32>> %2 = "arith.constant"() <{value = dense<(1,2)> : tensor<complex<i32>>}> : () -> tensor<complex<i32>> %3 = "tensor.extract"(%1) : (tensor<complex<i32>>) -> complex<i32> %4 = "tensor.from_elements"(%0, %3, %0) : (complex<i32>, complex<i32>, complex<i32>) -> tensor<3xcomplex<i32>> "func.return"(%4) : (tensor<3xcomplex<i32>>) -> () }) : () -> () ```	2023-12-07 03:22:53 +01:00
Matthias Springer	c6dc9cd1fb	[mlir] Fix build after `77f5b33c`	2023-12-07 10:19:02 +09:00
Peiming Liu	78e2b74f96	[mlir][sparse] fix bugs when generating sparse conv_3d kernels. (#74561 )	2023-12-06 15:59:10 -08:00

18613 Commits