Fold expressions on Clang are limited to 256 elements. This causes
compilation errors when the number of elements added exceeds this limit.
Side-step the issue by restoring the original trick that used
std::initializer_list. For the record, in our downstream project,
Clang 16 gives:
mlir/include/mlir/IR/Dialect.h:269:23: fatal error: instantiating fold
expression with 688 arguments exceeded expression nesting limit of 256
(addType<Args>(), ...);
Partially reverts 26d811b3ec.
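For reference, a minimal sketch of the two expansion strategies (names such
as `addType` are illustrative, not the exact Dialect.h code). The fold
expression nests one level per pack element, so Clang's 256-level expression
nesting limit is exceeded for very large packs, while the pre-C++17
std::initializer_list trick expands into a flat braced list:
```
#include <initializer_list>

template <typename T>
void addType() { /* register T */ }

// Nests one level per element: hits "exceeded expression nesting limit of
// 256" on Clang when the pack is large enough.
template <typename... Args>
void addTypesWithFold() {
  (addType<Args>(), ...);
}

// Restored workaround: one flat list element per pack element, no nesting.
template <typename... Args>
void addTypesWithInitList() {
  (void)std::initializer_list<int>{0, (addType<Args>(), 0)...};
}
```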
Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
(cherry picked from commit e3a38a75ddc6ff00301ec19a0e2488d00f2cc297)
GEPArg can only be constructed from int32_t and mlir::Value. Explicitly
cast other types (e.g. unsigned, size_t) to int32_t to avoid narrowing
conversion warnings on MSVC. Some recent examples:
```
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'size_t' to 'T' requires a narrowing
conversion
with
[
T=mlir::LLVM::GEPArg
]
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'unsigned int' to 'T' requires a narrowing
conversion
with
[
T=mlir::LLVM::GEPArg
]
```
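A minimal sketch of the fix (the function and variable names are illustrative,
not the exact TypeConsistency.cpp code): size_t and unsigned only reach
GEPArg through its int32_t constructor, and doing that conversion inside a
braced list is narrowing, which MSVC rejects, so the cast is made explicit:
```
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "llvm/ADT/SmallVector.h"
#include <cstdint>

llvm::SmallVector<mlir::LLVM::GEPArg>
makeGEPIndices(size_t byteOffset, unsigned fieldIndex) {
  // Without the casts, the braced list would narrow size_t/unsigned to the
  // int32_t that GEPArg is constructed from.
  return {0, static_cast<int32_t>(byteOffset),
          static_cast<int32_t>(fieldIndex)};
}
```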
Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
(cherry picked from commit 89cd345667a5f8f4c37c621fd8abe8d84e85c050)
Many machine-learning applications (and most software written at AMD)
expect the operation that truncates floats to 8-bit floats to be
saturating. That is, they expect `truncf 256.0 : f32 to f8E4M3FNUZ` to
yield `240.0`, not `NaN`, and similarly for negative numbers. However,
the underlying hardware instruction that can be used for this truncation
implements overflow-to-NaN semantics.
To enable handling this use case, we add the saturate-fp8-truncf option
to ArithToAMDGPU (off by default), which causes the requisite clamping
code to be emitted. Said clamping code ensures that Inf and NaN are
passed through exactly (and thus truncate to NaN).
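A self-contained sketch of the saturating semantics the option enables (the
pass emits the equivalent arith ops; this scalar helper is only illustrative):
```
#include <algorithm>
#include <cmath>

float clampForF8E4M3FNUZTrunc(float x) {
  constexpr float kMaxF8 = 240.0f; // largest finite f8E4M3FNUZ value
  // Inf and NaN are passed through exactly; the hardware truncation then
  // turns them into NaN, since the target format has no Inf encoding.
  if (std::isnan(x) || std::isinf(x))
    return x;
  // Finite values outside the f8 range saturate instead of overflowing to NaN.
  return std::clamp(x, -kMaxF8, kMaxF8);
}
```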
Per review feedback, this commit refactors
createScalarOrSplatConstant() to the Arith dialect utilities and uses
it in this code. It also fixes naming of existing patterns and
switches from vector.extractelement/insertelement to
vector.extract/insert.
This patch adds support for translating dense_resource attributes to
LLVMIR Target.
The support added is similar to how DenseElementsAttr is handled, except
we don't need to handle splats.
Another possible way of doing this is adding iteration on dense_resource,
but that is non-trivial as DenseResourceAttr is not meant to be something
you should directly access. It has subclasses which you are supposed to
use to iterate on it.
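For orientation, a sketch of the blob access such a translation relies on.
The accessor chain reflects my reading of the MLIR resource API and the
surrounding LLVM constant construction is omitted; treat the names as
assumptions rather than the exact Target code:
```
#include "mlir/IR/BuiltinAttributes.h"
#include "llvm/ADT/ArrayRef.h"

// Resources are not meant to be iterated directly; for translation we only
// need the raw bytes, which always hold the full payload (there is no splat
// form to special-case).
llvm::ArrayRef<char> getResourceBytes(mlir::DenseResourceElementsAttr attr) {
  if (mlir::AsmResourceBlob *blob = attr.getRawHandle().getBlob())
    return blob->getData();
  return {};
}
```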
Much confusion occurred earlier today when updating the fallback `int
abi;` in addControlVariables() didn't do anything. This was because that
value is only the fallback for when the ABI version fails to parse ...
which it always should parse, because it has a default value that comes
from multiple different places.
This commit updates all the places said default variable can come from,
namely:
1. The ROCDL target attribute definition
2. The ROCDL target attribute's builders
3. The rocdl-attach-target pass's default option values.
With this, the printf test is passing.
The initial design of the `acc.loop` was to be an operation that
encapsulates a loop-like operation. This was an early design, and we now
want to change it so that the `acc.loop` operation becomes a real
loop-like operation by implementing the LoopLikeInterface.
Differential Revision: https://reviews.llvm.org/D159229
This patch was just moved from Phabricator to GitHub.
1. Updates and clarifies a few comments related to hooks for
vector.{insert|extract}_strided_slice.
2. For consistency with vector.insert_strided_slice, removes a TODO from
vector.extract_strided_slice Op def. It's self-explanatory that
adding support for non-unit strides is a "TODO".
This adds a new `mlir_arm_runner_utils` library that contains utils
specific to Arm/AArch64. This is for use in MLIR integration tests.
This initial patch adds `setArmVLBits()` and `setArmSVLBits()`. This
allows changing vector length or streaming vector length at runtime (or
setting it to a known minimum, i.e. 128-bits).
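For illustration, the shape of these hooks and how a test might pick them up
(the exact parameter type and the MLIR-side declaration are assumptions, not
copied from the library):
```
#include <cstdint>

// Hypothetical C-level signatures for the new hooks; tests link against
// mlir_arm_runner_utils and call them before running vector code.
extern "C" void setArmVLBits(uint32_t bits);   // set SVE vector length
extern "C" void setArmSVLBits(uint32_t bits);  // set SME streaming VL

// From an MLIR integration test, the same entry point would be declared as a
// private func (e.g. `func.func private @setArmVLBits(i32)`) and invoked with
// func.call, e.g. to pin the runtime vector length to the 128-bit minimum.
```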
Existing 'arith::ConstantOp' conversion and tests are moved from
VectorToArmSME. There's currently only a single op that's converted, but
this will grow in the future as things like in-tile add are implemented.
Also, 'createLoopOverTileSlices' is moved to ArmSME
utils since it's relevant for both conversions.
We implement `computeNumTerms()`, which counts the number of terms in a
generating function by substituting the unit vector in it.
This is the main function in Barvinok's algorithm – the number of points
in a polytope is given by the number of terms in the generating function
corresponding to it.
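For context, a sketch of the identity this relies on (standard notation, not
code from the patch):
```
f(P; x) = \sum_{p \in P \cap \mathbb{Z}^d} x^p
    \implies   f(P; \mathbf{1}) = \#(P \cap \mathbb{Z}^d)
```
That is, the number of terms of the generating function equals the number of
lattice points of the polytope P. Since Barvinok's algorithm stores f as a
short sum of rational functions, substituting the unit vector has to be done
carefully (as a limit) to avoid the poles at x = 1.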
We also modify the GeneratingFunction class to have `const` getters and
improve the simplification of QuasiPolynomials.
When we generate the loop body function, we need to be sure that all
original loop counters are replaced by the new counter.
We need to first collect all items which use the original loop counter and
only then perform the replacement of the original loop counter. If we don't
do it in that order, there is a risk that some values are not updated.
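A sketch of the collect-then-replace pattern described above (the helper and
value types are illustrative; the real code operates on the generated loop
body's induction variable):
```
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"

void replaceCounter(llvm::Value *oldCounter, llvm::Value *newCounter) {
  // Snapshot the users first: rewriting uses while walking the use list
  // risks skipping some of them, leaving stale references to the old counter.
  llvm::SmallVector<llvm::User *> users(oldCounter->users().begin(),
                                        oldCounter->users().end());
  for (llvm::User *user : users)
    user->replaceUsesOfWith(oldCounter, newCounter);
}
```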
This patch updates the cp.async.bulk.{commit/wait}_group Ops to use NVVM
intrinsics.
* Doc updated for the commit_group Op.
* Tests are added to verify the lowering to the intrinsics.
While we are at it, fix the FileCheck directive in the
'nvvm.setmaxregister' test.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
The current implementation of the `nvvm.wgmma.mma_async` Op deduces the data
type of the output matrix from the data type of the struct member, which can
be non-intuitive, especially in cases where types like `2xf16` are packed
into an `i32`.
This PR addresses this issue by improving the Op to include an explicit
data type for the output matrix.
The modified Op now includes an explicit data type for Matrix-D (<f16>),
and looks as follows:
```
%result = llvm.mlir.undef : !llvm.struct<(struct<(i32, i32, ...
nvvm.wgmma.mma_async
%descA, %descB, %result,
#nvvm.shape<m = 64, n = 32, k = 16>,
D [<f16>, #nvvm.wgmma_scale_out<zero>],
A [<f16>, #nvvm.wgmma_scale_in<neg>, <col>],
B [<f16>, #nvvm.wgmma_scale_in<neg>, <col>]
```
This PR fixes the 'nvgpu.warpgroup.mma.store' index calculation. When
the destination memref and the current accumulator matrix were small, the
previous code was indexing out of range.
The buffer deallocation pass checks the IR ("operation preconditions")
to make sure that there is no IR that is unsupported. In such a case,
the pass signals a failure.
The pass now rejects all ops with unknown memory effects. We do not know
whether such an op allocates memory or not. Therefore, the buffer
deallocation pass does not know whether a deallocation op should be
inserted or not.
Memory effects are queried from the `MemoryEffectOpInterface` interface.
Ops that do not implement this interface but have the
`RecursiveMemoryEffects` trait do not have any side effects (apart from
the ones that their nested ops may have).
Unregistered ops are now rejected by the pass because they do not
implement the `MemoryEffectOpInterface` and neither do we know if they
have `RecursiveMemoryEffects` or not. All test cases that currently have
unregistered ops are updated to use registered ops.
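The precondition boils down to a check along these lines (a sketch assembled
from the interfaces named above; the trait spelling
`OpTrait::HasRecursiveMemoryEffects` is the C++ counterpart of
`RecursiveMemoryEffects`, and the helper name is illustrative):
```
#include "mlir/IR/Operation.h"
#include "mlir/Interfaces/SideEffectInterfaces.h"

// Returns true if the buffer deallocation pass can reason about the op's
// memory effects; anything else (including unregistered ops) is rejected.
bool hasKnownMemoryEffects(mlir::Operation *op) {
  // Effects can be queried directly from the interface.
  if (mlir::isa<mlir::MemoryEffectOpInterface>(op))
    return true;
  // No interface, but the op only has the effects of its nested ops.
  return op->hasTrait<mlir::OpTrait::HasRecursiveMemoryEffects>();
}
```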
There is already a "block inserted" notification (in
`OpBuilder::Listener`), so there should also be a "block removed"
notification.
The purpose of this change is to make the listener API more mature.
There is currently a gap between what kind of IR changes can be made and
what IR changes can be listened to. At the moment, the only way to
inform listeners about "block removal" is to send a manual
`notifyOperationModified` for the parent op (e.g., by wrapping the
`eraseBlock(b)` method call in `updateRootInPlace(b->getParentOp())`).
This tells the listener that *something* has changed, but it is somewhat
of an API abuse.
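For reference, the workaround described above looks roughly like this (a
sketch; the wrapper name is made up):
```
#include "mlir/IR/PatternMatch.h"

// Signals that *something* changed on the parent op so listeners hear about
// it, even though what actually happened is a block removal. This is the API
// abuse the new "block removed" notification is meant to replace.
void eraseBlockWithNotification(mlir::RewriterBase &rewriter,
                                mlir::Block *block) {
  rewriter.updateRootInPlace(block->getParentOp(),
                             [&] { rewriter.eraseBlock(block); });
}
```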
Such cases show up in the middle of optimization passes, e.g., after
some rewrites and then CSE. The current folder can fold such cases when
the inputs are constant; this patch improves it to fold even if the
inputs are non-constant.
This implements a transformation to optimize accesses to shared memory.
Reference: https://reviews.llvm.org/D127457
_This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by newColIdx = col % vecSize + perm[row](col / vecSize, row)
where perm is a permutation function indexed by row and vecSize
is the vector access size in elements (currently assumes 128bit
vectorized accesses, but this can be made a parameter). This specific
transformation can help optimize typical distributed & vectorized
accesses
common to loading matrix multiplication operands to/from shared memory._
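To make the quoted formula concrete, here is an illustrative remapping with
an XOR-based `perm` (a common bank-conflict-avoiding choice; the actual
permutation the pass uses may differ, and the helper name is made up):
```
#include <cstdint>

// newColIdx = col % vecSize + perm[row](col / vecSize, row).
// vecSize is the vectorized access width in elements (e.g. 8 for 128-bit
// accesses of f16); XOR-ing the row's low bits into the chunk index spreads
// consecutive rows across different shared-memory banks.
int64_t swizzledColIdx(int64_t row, int64_t col, int64_t vecSize,
                       int64_t chunksPerRow) {
  // perm[row]: permute the vector-sized chunk index, then scale back to
  // element units.
  auto perm = [&](int64_t chunk, int64_t r) {
    return (chunk ^ (r % chunksPerRow)) * vecSize;
  };
  return col % vecSize + perm(col / vecSize, row);
}
```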
Adds `transform.func.cast_and_call` that takes a set of inputs and
outputs and replaces the uses of those outputs with a call to a function
at a specified insertion point.
The idea with this operation is to allow users to author independent IR
outside of a to-be-compiled module, and then match and replace a slice
of the program with a call to the external function.
Additionally adds a mechanism for populating a type converter with a set
of conversion materialization functions that allow insertion of
casts on the inputs/outputs to and from the types of the function
signature.
Provides some context for failing to generate LLVM IR for `target
enter|exit|update` directives when `nowait` is provided. This is
directly helpful for flang users since they would get this error message
if they tried to use `nowait`. Before that we had a very generic
message.
This is a follow-up to https://github.com/llvm/llvm-project/pull/78269,
please only review the latest commit (the one with the same commit
message as the PR title).
This commit simplifies a helper function in the ownership-based buffer
deallocation pass. Fixes a potential double-free (depending on the
scheduling of patterns).