Add `dump_alias_sets` to `transform.bufferization.one_shot_bufferize`.
This option is useful for debugging. Also improve the verifier to ensure
that `test_analysis_only` is set when other debugging flags are enabled.
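A minimal sketch of the intended usage (exact attribute spelling and handle types are illustrative; per the improved verifier, `dump_alias_sets` is only meaningful together with `test_analysis_only`):
```
// Dump alias sets while running the analysis only; the verifier rejects
// dump_alias_sets without test_analysis_only.
%1 = transform.bufferization.one_shot_bufferize %0
    {test_analysis_only = true, dump_alias_sets = true}
    : (!transform.any_op) -> !transform.any_op
```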
* Fixes #67977, a crash in `empty-tensor-elimination`.
* Also improves `linalg.copy` canonicalization.
* Also improves indentation in `mlir-linalg-ods-yaml-gen.cpp`.
This PR introduces a new Op called `warpgroup.mma.store` to the NVGPU
dialect of MLIR. The purpose of this operation is to facilitate storing
the fragmented result(s) of type `nvgpu.warpgroup.accumulator` produced
by `warpgroup.mma` to the given memref.
An example of a fragmented matrix is given here:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#wgmma-64n16-d
The `warpgroup.mma.store` Op does the following:
1) Takes one or more `nvgpu.warpgroup.accumulator` values (fragmented
result matrices).
2) Calculates the per-thread indexes within the warp-group and stores
the data into the given memref.
Here's an example usage:
```
// A warpgroup performs GEMM, results in fragmented matrix
%result1, %result2 = nvgpu.warpgroup.mma ...
// Stores the fragmented result to memref
nvgpu.warpgroup.mma.store [%result1, %result2], %matrixD :
!nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>>,
!nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>>
to memref<128x128xf32,3>
```
This PR introduces substantial improvements to the readability and
maintainability of the `nvgpu.warpgroup.mma` Op transformation from
nvgpu->nvvm. This transformation plays a crucial role in GEMM and
manages complex operations such as generating multiple wgmma ops and
iterating their descriptors. The prior code lacked clarity, but this PR
addresses that issue effectively.
**This PR does the following:**
**Introduces a helper class:** `WarpgroupGemm` class encapsulates the
necessary functionality, making the code cleaner and more
understandable.
**Detailed Documentation:** Each function within the helper class is
thoroughly documented to provide clear insights into its purpose and
functionality.
The NVGPU dialect is gaining broad support for warpgroup-level
operations, whose names always start with `warpgroup....`.
This PR renames the Op and type from `wgmma.descriptor` to
`warpgroup.descriptor` for the sake of consistency.
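For illustration, the spelling changes as follows (the descriptor parameters shown are made up for this example):
```
// Before: !nvgpu.wgmma.descriptor<tensor = memref<64x128xf16, 3>>
// After:  !nvgpu.warpgroup.descriptor<tensor = memref<64x128xf16, 3>>
```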
Also fix an issue when sinking a broadcast across an elementwise op:
`arith.cmpf` is elementwise, but its result type differs from its
operand type, so the naive rewrite created illegal IR.
There is a similar issue with `vector.fma`, which only accepts vector
operand types while broadcasts can have scalar sources; sinking the
broadcast across it would produce an illegal `vector.fma` (with scalar
operands).
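A minimal sketch of the `arith.cmpf` case (shapes and predicate are illustrative), showing why the operand type cannot simply be reused for the result:
```
// Broadcasts feeding an elementwise compare.
%lhs = vector.broadcast %a : f32 to vector<4xf32>
%rhs = vector.broadcast %b : f32 to vector<4xf32>
%cmp = arith.cmpf olt, %lhs, %rhs : vector<4xf32>   // yields vector<4xi1>

// Any sunk form must account for the i1 result element type: the scalar
// compare yields i1, and a trailing broadcast has to produce vector<4xi1>,
// not the operand type vector<4xf32>.
%s   = arith.cmpf olt, %a, %b : f32
%res = vector.broadcast %s : i1 to vector<4xi1>
```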
Move the definitions of LLVM intrinsic Ops for ArmSME into a dedicated
file. To facilitate this, the dialect definition together with various
shared definitions are moved to ArmSME.td.
This change will allow us to refactor the ArmSME dialect documentation.
In particular, we will be able to categorise the Ops into "regular" and
"intrinsic" ops. Also, it will be easier to add some custom
documentation as opposed to relying on auto-generated docs that simply
list the available Ops.
The documentation will be updated in a forthcoming patch. Only
non-functional changes.
This patch adds support for lowering vector.insert/extract of tile
slices or elements to ArmSME MOVA intrinsics.
This enables the following operations for ArmSME:
```
// Extract slice from tile:
%slice = vector.extract %tile[%row]
: vector<[4]xi32> from vector<[4]x[4]xi32>
```
```
// Extract element from tile:
%el = vector.extract %tile[%row, %col]
: i32 from vector<[4]x[4]xi32>
```
```
// Insert slice into tile:
%new_tile = vector.insert %slice, %tile[%row]
: vector<[4]xi32> into vector<[4]x[4]xi32>
```
```
// Insert element into tile:
%new_tile = vector.insert %el, %tile[%row, %col]
: i32 into vector<[4]x[4]xi32>
```
This commit renames the arguments of several static implementation
functions of the transform interpreter base class to match the names of
the corresponding member variables in order to clarify their intent.
Similarly, it renames some local variables to reflect their relationship
with corresponding member variables. Finally, this commit also asserts
in `interpreterBaseRunOnOperationImpl` that at most one of the shared
and library modules is set (which the initialization function guarantees)
and simplifies some related `if` conditions.
This PR enables the `test-lower-to-nvvm` pass pipeline for the
integration tests for the NVIDIA sm_90 architecture.
It adjusts the `test-lower-to-nvvm` pipeline in two ways:
1) Calls `createConvertNVGPUToNVVMPass` before the outlining process.
This particular pass is responsible for generating both device and host
code. On the host, it calls the CUDA driver to build the TMA descriptor
(`cuTensorMap`).
2) Integrates `createConvertNVVMToLLVMPass` to generate PTX for
NVVM Ops.
The TableGen code generator now generates C++ code that returns a single
`OpOperand &` for `get...Mutable` of operands that are not variadic and
not optional. `OpOperand::set`/`assign` can be used to set a value (same
as `MutableOperandRange::assign`). This is safer than
`MutableOperandRange` because only single values (and no longer
`ValueRange`) can be assigned.
E.g.:
```
// Assignment of multiple values to non-variadic operand.
// Before: Compiles, but produces invalid op.
// After: Compilation error.
extractSliceOp.getSourceMutable().assign({v1, v2});
```
Previously, this assumed that all conversions are possible and would
crash. Use an in-tree test case here.
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
The `BufferizableOpInterface` implementation of `scf.for` currently
assumes that an OpResult does not alias with any tensor apart from the
corresponding init OpOperand. Newly allocated buffers (inside of the
loop) are also allowed. The current implementation checks whether the
respective init_arg and yielded value are equivalent. This is overly
strict and causes extra buffer allocations/copies when yielding a new
buffer allocation from a loop.
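For illustration, a loop of the following shape (a sketch; `%lb`, `%ub`, `%c1`, `%init`, `%sz`, and `%cst` are assumed to be defined elsewhere) yields a fresh allocation that is not equivalent to the init_arg, which the equivalence-based check previously penalized with an extra allocation/copy:
```
%res = scf.for %iv = %lb to %ub step %c1 iter_args(%arg = %init) -> (tensor<?xf32>) {
  // A new allocation is yielded instead of a value equivalent to %arg.
  %new = bufferization.alloc_tensor(%sz) : tensor<?xf32>
  %filled = linalg.fill ins(%cst : f32) outs(%new : tensor<?xf32>) -> tensor<?xf32>
  scf.yield %filled : tensor<?xf32>
}
```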
Make `tensor.empty` bufferizable, so that the
`-empty-tensor-to-alloc-tensor` pass becomes optional. This makes the
bufferization easier to use. `tensor.empty` used to be non-bufferizable,
so that there were two separate ops: one that can be optimized away
(`tensor.empty`) and one that is guaranteed to bufferize to an
allocation (`bufferization.alloc_tensor`). With the recent improvements
of "empty tensor elimination" this is no longer needed and
`bufferization.alloc_tensor` can be phased out.
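As a small illustrative example, IR like the following now bufferizes directly, without first running `-empty-tensor-to-alloc-tensor`:
```
%cst = arith.constant 0.0 : f32
// tensor.empty can either be eliminated or bufferize to an allocation.
%empty = tensor.empty() : tensor<8xf32>
%fill = linalg.fill ins(%cst : f32) outs(%empty : tensor<8xf32>) -> tensor<8xf32>
```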
At the moment, for device, a reference pointer is generated in place of
the original declare target global value; this reference pointer is the
pointer that actually receives the data. In Clang, the original global
value isn't generated for device, just the reference pointer.
Unfortunately, for Flang/MLIR this is currently not the case: the
declare target attribute is processed after the creation of the global,
so we effectively end up with a dead global on device after rewriting
its uses to the new device reference pointer.
It appears I was a little overzealous with the deletion of the declare
target globals for device. The current method breaks in cases where the
same declare target global is used across two target regions (a runtime
reproducer is added in the patch), as it effectively deletes the global
before the second target region gets a chance to be written to LLVM IR
and have its uses rewritten.
I'd like to remove this deletion, as the dead global isn't breaking any
code and will likely be removed by later dead-code-elimination passes;
the original approach was perhaps a little too heavy-handed.
Ensure the stream is flushed to the string before acquiring the mutex.
There is no need to flush the output stream; the goal of the mutex is to
synchronize before content is added to the stream.
Note that the `Pass` suffix is added in TableGen, and as a side effect the
options are renamed from `ArithExpandOpsOptions` to `ArithExpandOpsPassOptions`.
According to the specification, the `shift` attribute of the Mul operator in
TOSA is of signless i8 type instead of i32.
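For illustration (operand and result types are chosen arbitrarily here):
```
// The shift attribute now carries a signless i8 value.
%out = tosa.mul %a, %b {shift = 0 : i8}
    : (tensor<4xi32>, tensor<4xi32>) -> tensor<4xi32>
```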
Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>
The buffer deallocation pipeline was previously incorrect when applied to
functions. It has since been fixed. Make sure it is exercised in the
tutorial to avoid leaking allocations.
Values that are the result of buffer allocation ops are guaranteed to
*not* be the same allocation as block arguments of containing blocks.
This fact can be used to allow for more aggressive simplification of
`bufferization.dealloc` ops.
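A minimal sketch (shapes and conditions are illustrative): `%alloc` below is a fresh allocation, so it cannot be the same allocation as the block argument `%arg`, and the retain condition for `%arg` therefore does not depend on whether `%alloc` is deallocated.
```
// %arg is a block argument of the containing block; %alloc is freshly
// allocated, so the two cannot be the same allocation.
%alloc = memref.alloc() : memref<2xf32>
%retained = bufferization.dealloc (%alloc : memref<2xf32>) if (%cond)
              retain (%arg : memref<2xf32>)
```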