`buildUnrealizedCast` used to generate invalid
`builtin.unrealized_conversion_cast` ops with zero results. This commit
fixes
`test/Conversion/OneToNTypeConversion/one-to-n-type-conversion.mlir`
when running with `MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`.
```
* Pattern (anonymous namespace)::ConvertMakeTupleOp : 'test.make_tuple -> ()' {
Trying to match "(anonymous namespace)::ConvertMakeTupleOp"
[...]
"(anonymous namespace)::ConvertMakeTupleOp" result 1
} -> success : pattern applied successfully
// *** IR Dump After Pattern Application ***
mlir-asm-printer: Verifying operation: func.func
'builtin.unrealized_conversion_cast' op expected at least one result for cast operation
mlir-asm-printer: 'func.func' failed to verify and will be printed in generic form
"func.func"() <{function_type = (i1, i2) -> (i1, i2), sym_name = "pack_unpack"}> ({
^bb0(%arg0: i1, %arg1: i2):
%0 = "test.make_tuple"() : () -> tuple<>
"builtin.unrealized_conversion_cast"(%0) {"__one-to-n-type-conversion_cast-kind__" = "target"} : (tuple<>) -> ()
[...]
}) : () -> ()
within split at /usr/local/google/home/springerm/mlir_public/llvm-project/mlir/test/Conversion/OneToNTypeConversion/one-to-n-type-conversion.mlir:1 offset :20:8: error: 'builtin.unrealized_conversion_cast' op expected at least one result for cast operation
%0 = "test.make_tuple"() : () -> tuple<>
^
within split at /usr/local/google/home/springerm/mlir_public/llvm-project/mlir/test/Conversion/OneToNTypeConversion/one-to-n-type-conversion.mlir:1 offset :20:8: note: see current operation: "builtin.unrealized_conversion_cast"(%0) {"__one-to-n-type-conversion_cast-kind__" = "target"} : (tuple<>) -> ()
LLVM ERROR: IR failed to verify after pattern application
```
We implement two functions that are needed to compute the constant term
of a generating function (GF).
One finds a vector that is not orthogonal to any of the non-null vectors
in a given set.
The other computes the coefficient of any term in an arbitrary rational
function (a quotient of two polynomials).
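As a rough illustration of the second helper (my notation, not taken from the
patch): the coefficient of any term of a quotient of polynomials can be read
off by expanding the quotient as a power series and solving the resulting
recurrence.
```latex
% Coefficient extraction from P(x)/Q(x), assuming Q(0) = q_0 \neq 0.
\frac{P(x)}{Q(x)} = \sum_{i \ge 0} c_i x^i
\;\Longrightarrow\;
p_i = \sum_{j=0}^{i} q_j\, c_{i-j}
\;\Longrightarrow\;
c_i = \frac{1}{q_0}\Bigl(p_i - \sum_{j=1}^{i} q_j\, c_{i-j}\Bigr).
```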
Before applying the peeling patterns, it can happen that the `ForOp`
gets a step of zero during folding. This leads to a division-by-zero
down the line.
This patch adds an additional check for a constant-zero step and a
test.
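For illustration, the problematic IR has roughly the following shape after
folding (a hypothetical example, not the added test case):
```mlir
// The step has been folded to a constant zero; the peeling patterns would
// then divide by it (e.g. when computing the peeled upper bound).
%c0 = arith.constant 0 : index
scf.for %iv = %lb to %ub step %c0 {
  ...
}
```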
Fixes https://github.com/llvm/llvm-project/issues/75758
The simplification of `bufferization.clone` generates a `memref.cast` op, but
the checks in the simplification do not verify whether the operand and result
types of the clone op are compatible, leading to errors. This patch addresses
this issue.
This adds very basic (and inelegant) support for something like spilling
and reloading tiles, if you use more SME tiles than physically exist.
This is purely implemented to prevent the compiler from aborting if a
function uses too many tiles (e.g. due to bad unrolling), but it is
expected to perform very poorly.
Currently, this works in two stages:
During tile allocation, if we run out of tiles, rather than giving up we
switch to allocating 'in-memory' tile IDs. These are tile IDs that start
at 16 (which is higher than any real tile ID). A warning will also be
emitted for each (root) tile op assigned an in-memory tile ID:
```
warning: failed to allocate SME virtual tile to operation, all tile operations will go through memory, expect degraded performance
```
Everything after this works as normal until `-convert-arm-sme-to-llvm`.
Here the in-memory tile op:
```mlir
arm_sme.tile_op { tile_id = <IN MEMORY TILE> }
```
is lowered to:
```mlir
// At function entry:
%alloca = memref.alloca ... : memref<?x?xty>
// Around the op:
// Swap the contents of %alloca and tile 0.
scf.for %slice_idx {
%current_slice = "arm_sme.intr.read.horiz" ... <{tile_id = 0 : i32}>
"arm_sme.intr.ld1h.horiz"(%alloca, %slice_idx) <{tile_id = 0 : i32}>
vector.store %current_slice, %alloca[%slice_idx, %c0]
}
// Execute op using tile 0.
arm_sme.tile_op { tile_id = 0 }
// Swap the contents of %alloca and tile 0.
// This restores tile 0 to its original state.
scf.for %slice_idx {
%current_slice = "arm_sme.intr.read.horiz" ... <{tile_id = 0 : i32}>
"arm_sme.intr.ld1h.horiz"(%alloca, %slice_idx) <{tile_id = 0 : i32}>
vector.store %current_slice, %alloca[%slice_idx, %c0]
}
```
This is inserted during the lowering to LLVM, as spilling/reloading
registers is a very low-level concept that can't really be modeled
correctly at a high level in MLIR.
Note: This always does the worst-case full-tile swap. It could be
optimized to spill/reload only the data the tile op will use, which could
be just a slice. It also makes no use of liveness, which could allow
reusing tiles. However, these optimizations are not seen as important, as
correct code should only use the available number of tiles.
Introduce a new extension for simple print-debugging of transform
dialect scripts. The initial version of this extension consists of two
ops that print the payload objects associated with transform dialect
values. Similar ops were already available in the test extension and
several downstream projects, and were extensively used for testing.
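A sketch of how such an op might be used in a transform script (the op name
`transform.debug.emit_remark_at` and the exact syntax below are assumptions,
not quoted from this patch):
```mlir
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  // Match all payload functions and attach a remark to each of them.
  %funcs = transform.structured.match ops{["func.func"]} in %arg0
      : (!transform.any_op) -> !transform.any_op
  // Hypothetical debug-extension op: emits a remark at each payload op.
  transform.debug.emit_remark_at %funcs, "matched func" : !transform.any_op
}
```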
Support distribution of `vector.transfer_read` ops when operands are
defined inside of the region of `warp_execute_on_lane_0` (except for the
buffer from which the op is reading).
Such IR was previously not supported. This commit changes the
implementation such that indices and the padding value are also
distributed.
This commit simplifies the implementation considerably: the original
implementation created a new `transfer_read` op and then checked if this
new op is valid. If not, the rewrite pattern failed. This was a bit
hacky. It was also a violation of the rewrite pattern API (detected by
`MLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS`) because the IR was modified,
but the pattern returned "failure".
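For illustration, IR of roughly the following shape is now supported (a
sketch; the operands and types are made up for the example):
```mlir
%r = vector.warp_execute_on_lane_0(%laneid)[32] -> (vector<1xf32>) {
  // Both the index and the padding value are defined inside the region;
  // they are now distributed along with the transfer_read.
  %c0 = arith.constant 0 : index
  %pad = arith.constant 0.0 : f32
  %v = vector.transfer_read %src[%c0], %pad : memref<?xf32>, vector<32xf32>
  vector.yield %v : vector<32xf32>
}
```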
Rename interface functions as follows:
* `hasTensorSemantics` -> `hasPureTensorSemantics`
* `hasBufferSemantics` -> `hasPureBufferSemantics`
These two functions return "true" if the op has tensor (resp. buffer)
operands but no buffer (resp. tensor) operands.
Also drop the "ranked" part from the interface, i.e., do not distinguish
between ranked/unranked types.
The new function names describe the functions more accurately. They also
align their semantics with the notion of "tensor semantics" in the
bufferization framework. (An op is supposed to be bufferized if it has
tensor operands, and we don't care if it also has memref operands.)
This change is in preparation of #75273, which adds
`BufferizableOpInterface::hasTensorSemantics`. By renaming the functions
in the `DestinationStyleOpInterface`, we can avoid name clashes between
the two interfaces.
In the process, a couple of test transform dialect ops are added. These
operations are not intended to be used as fully fleshed-out
transformation ops, but are rather operations added just for testing.
A separate operation is added to `LinalgTransformOps.td` to convert a
`TilingInterface` operation to loops using the
`generateScalarImplementation` method implemented by the
operation. Eventually this and other operations related to tiling
using the `TilingInterface` need to move to a better place (i.e. out
of the `Linalg` dialect).
The new flag, `--mlir-print-skip-regions`, sets the op printing option
that disables region printing. This results in the usual
`--mlir-print-ir-*` debug options printing only the names of the
executed passes and the signatures of the ops.
Example:
```mlir
// -----// IR Dump Before CSE (cse) //----- //
func.func @bar(%arg0: f32, %arg1: f32) -> f32 {...}
// -----// IR Dump Before Canonicalizer (canonicalize) //----- //
func.func @bar(%arg0: f32, %arg1: f32) -> f32 {...}
```
The main use case is to triage compilation issues (crashes, slowness)
in very deep pass pipelines and with very large IR files, where printing
the IR would otherwise be prohibitively slow.
The IR representation for gang, vector and worker has grown with the
support for device_type. This patch simplifies the IR representation for
the gang, vector and worker information on the acc.loop operation.
When only the keyword is present, without any values, the information is
printed in the same place as when there are values. The device_type is
omitted if there are no values and it is equal to None. Otherwise the
full information is displayed: first the keyword-only device_type
information, then the values with their device_type.
Currently, the `memref.transpose` verifier forces the result type of the
Op to have an explicit `StridedLayoutAttr` via the method
`inferTransposeResultType`. This means that the example Op
given in the documentation is actually invalid because it uses an `AffineMap`
to specify the layout.
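That documentation example is roughly of the following form (sketched here for
reference; treat the exact types as illustrative):
```mlir
// The result layout is expressed with an affine_map rather than a strided
// layout attribute, so the pre-patch verifier rejects it.
%1 = memref.transpose %0 (i, j) -> (j, i)
    : memref<?x?xf32> to memref<?x?xf32, affine_map<(d0, d1)[s0] -> (d1 * s0 + d0)>>
```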
It also means that we can't "un-transpose" a transposed memref back to
the implicit layout form, because the verifier will always enforce the
explicit strided layout.
This patch makes the following changes:
1. The verifier checks whether the canonicalized strided layout of the
result type is identical to the canonicalized inferred result type
layout. This way, it is only important that the two types have the same
strided layout, not necessarily the same representation of it.
2. The folder is extended to support folding away the trivial case of
identity permutation and to fold one transposition into another by
composing the permutation maps.
The folder for `AffineApplyOp` will try creating a `PoisonAttr`
under certain circumstances. However, this will result in a crash if the
`UBDialect` isn't loaded.
This patch adds a dependency of `AffineDialect` on `UBDialect`.
```
llvm-project/mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp:2376:9:
error: object backing the pointer will be destroyed at the end of the full-expression [-Werror,-Wdangling-gsl]
tensor::getMixedSizes(rewriter, loc, input_real);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llvm-project/mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp:2366:10:
error: unused variable 'imag_el_ty' [-Werror,-Wunused-variable]
auto imag_el_ty = cast<FloatType>(
^
2 errors generated.
```
Clarify what kind of IR modifications are allowed. Also improve the
documentation of the greedy rewrite driver entry points.
Addressing comments in #76219.
1. Fix a bug in verifyMemref to pass in `data` instead of `baseptr`,
which didn't verify data correctly.
2. Add `==` for f16 and bf16.
3. Add a comprehensive test of verifyMemref for all supported types.
Add a rewriter for DIExpressions and use it to run legalization patterns
before exporting to LLVM IR (because the LLVM dialect allows DIExpressions
that may not be valid in LLVM IR).
The rewriter driver works similarly to the existing MLIR rewrite
drivers, except it operates on lists of DIExpressionElemAttr (i.e.
DIExpressionAttr). Each rewrite pattern transforms a range of
DIExpressionElemAttr into a new list of DIExpressionElemAttr.
In addition, this PR sets up a place to add legalization patterns that
are broadly applicable internally to the LLVM dialect, and they will
always be applied prior to export. This PR adds one pattern for merging
fragment operators.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
Rationale:
Since this mini-pipeline may be used in alternative pipelines (viz.
different from the default "sparsifier" pipeline) where unknown ops are
handled by alternative bufferization methods that are downstream of this
mini-pipeline, we allow unknown ops by default (failure to bufferize is
eventually apparent by failing to convert to LLVM IR).
This is part of enabling e2e testing for TORCH-MLIR tests using a
sparsifier backend.
We implement a function that computes the generating function
corresponding to a unimodular cone.
The generating function for a polytope is obtained by summing these
generating functions over all tangent cones.
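For context, the formula involved is the standard one for a unimodular cone
(stated here as a reminder, not quoted from the patch): for a simplicial
unimodular cone with integer vertex v and generators u_1, ..., u_d,
```latex
% Generating function of K = v + cone(u_1, ..., u_d), with the multi-exponent
% shorthand x^a = x_1^{a_1} \cdots x_d^{a_d}.
f(K; x) = \frac{x^{v}}{\prod_{i=1}^{d} \bigl(1 - x^{u_i}\bigr)}.
```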
Store the last token parsed in the parser state so that the parsed range
can use the end of that token rather than the start of the token parsed
after it. This results in a tighter range (especially true in the case of
comments; see
```mlir
|%c4 = arith.constant 4 : index
// Foo
|
```
vs
```mlir
|%c4 = arith.constant 4 : index|
```
).
Discovered while working on a little textual post processing tool.
This patch updates the setmaxregister NVVM Op to use the
intrinsics instead of inline PTX.
* The interface remains the same (as expected).
* Tests are added to verify the lowered intrinsics in
Target/LLVMIR/nvvmir.mlir.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Change the Rescale shift attribute to be DenseI8ArrayAttr to match the spec
(instead of DenseI32ArrayAttr).
This replaces https://reviews.llvm.org/D157439
Signed-off-by: Tai Ly <tai.ly@arm.com>
They can be simplified to reshape ops if outer_dims_perm is an identity
permutation. The revision adds an `isIdentityPermutation` method to
IndexingUtils.
Following on from #75842, we can demonstrate that loop peeling combined
with masked vectorisation and existing canonicalization for vector.mask
operations leads to the following loop structure:
```
// M dimension
scf.for 1:M
  // N dimension (contains vector ops _without_ masking)
  scf.for 1:UB
    // K dimension
    scf.for 1:K
      vector.add
  // N dimension (contains vector ops _with_ masking)
  scf.for UB:N
    // K dimension
    scf.for 1:K
      vector.mask { vector.add }
```
This is particularly beneficial for scalable vectors, which normally
require masking; this example demonstrates how masking can be avoided in
the main loop.