Enable by default for optimization levels higher than 0 (same behavior
as clang).
For simplicity, only forward the flag to the frontend driver when it
contradicts what is implied by the optimization level.
Since https://github.com/llvm/llvm-project/pull/72903 there are now no
known performance regressions.
The 'routine' construct applies either to a function directly or, when
given a name, to the named function visible in the current scope. This
patch implements the parsing for this. The identifier (or id-expression)
provided is required to be a valid, declared identifier, though the
semantic analysis portion of the Routine directive will still need to
enforce that it is a function or overload set.
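For reference, a minimal sketch of the two forms (standard OpenACC
syntax; the function names are hypothetical):
```
// Form 1: the directive applies to the function that follows it.
#pragma acc routine seq
void scale_elem(float *x, int i, float a);

// Form 2: the directive names an already-declared function in scope.
void helper(int n);
#pragma acc routine(helper) seq
```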
Summary:
The puts call appends a newline. With multiple threads, this can happen
out of order, such that another thread prints something before we finish
appending the newline. Add flockfile and funlockfile calls to ensure
that the whole string, newline included, is printed before another
thread's output can appear.
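A minimal sketch of the locking pattern (generic C stdio usage, not the
actual libc code):
```
#include <stdio.h>

// Hold the stream lock so the message and its trailing newline are
// emitted as one unit, even when several threads print concurrently.
void print_line(FILE *stream, const char *msg) {
  flockfile(stream);
  fputs(msg, stream);
  fputc('\n', stream);
  funlockfile(stream);
}
```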
When hoisting an invariant load, we should not combine it with an
existing load through common subexpression elimination (CSE). This is
because there might be memory-changing instructions between the existing
load and the end of the block entering the loop.
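A hand-written illustration of the hazard (hypothetical code, not taken
from the patch or its tests):
```
// If the hoisted invariant load of *p were CSE'd with the earlier load
// into 'first', it would miss the update made through the
// possibly-aliasing store to *q that sits between the two loads.
int sum(int *p, int *q, int n) {
  int first = *p;  // existing load in the block entering the loop
  *q = first + 1;  // memory-changing instruction; q may alias p
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += *p;   // invariant load being hoisted; must not reuse 'first'
  return total + first;
}
```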
Fixes https://github.com/llvm/llvm-project/issues/72855
Adding test cases to the `cases` array causes `git clang-format` to
split the string literals of many of the existing test cases, making
them harder to read and work with.
This patch disables `clang-format` for the `cases` array so it doesn't
catch anyone off guard in the future.
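The usual way to do this is the `// clang-format off` /
`// clang-format on` marker pair (the array contents below are
hypothetical):
```
// clang-format off
static const char *cases[] = {
    "a deliberately long test string that should stay on a single line",
    "another case that clang-format would otherwise re-wrap",
};
// clang-format on
```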
Summary:
There are a few default options that LLVM adds that can be problematic
for runtimes builds. These options are generally intended to handle
building LLVM itself, but are also added when building in a runtimes
mode. One such issue I've run into is that in `libc` we deliberately use
`--target` to select a different device toolchain, which doesn't support
some linker arguments passed via `-Wl`. This is observed in
https://github.com/llvm/llvm-project/pull/73030 when attempting to use
these options.
This patch completely removes these default arguments.
The consensus is that any issues created by this patch should ultimately
be solved on a per-runtime basis.
We're performing the same lookup twice here, just to access
different fields. Only perform it once.
Also prefer using find() over operator[], as we do not want to
perform an insert if the node does not exist.
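A generic illustration of the pattern (standard-library map, not the
actual data structure touched here):
```
#include <map>
#include <string>

// find() performs the lookup exactly once and, unlike operator[], never
// inserts a default-constructed entry when the key is absent.
int lookupSize(const std::map<std::string, int> &Sizes,
               const std::string &Key) {
  auto It = Sizes.find(Key);
  return It == Sizes.end() ? 0 : It->second;
}
```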
This patch allows adding any quantity to a zero-initialized TypeSize,
such that e.g.:
TypeSize::Scalable(0) + TypeSize::Fixed(4) == TypeSize::Fixed(4)
TypeSize::Fixed(0) + TypeSize::Scalable(4) == TypeSize::Scalable(4)
This makes it easier to implement add-reductions using TypeSize where
the 'scalable' flag is not yet known before starting the reduction.
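A minimal sketch of such a reduction, using only the constructors and
the operator shown above:
```
#include "llvm/Support/TypeSize.h"
using namespace llvm;

// The accumulator starts as a zero TypeSize whose 'scalable' flag is
// not yet known; the first non-zero addend determines it.
TypeSize sumOfParts() {
  TypeSize Sum = TypeSize::Fixed(0);
  Sum = Sum + TypeSize::Scalable(4); // Sum == TypeSize::Scalable(4)
  Sum = Sum + TypeSize::Scalable(8); // Sum == TypeSize::Scalable(12)
  return Sum;
}
```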
(this PR follows on from #72979)
Before we widen to top, we now check whether both values can be proved
either true or false in their respective environments; if so, widening
returns a true or false literal. The idea is that we avoid losing
information where possible.
This patch includes a test that fails without this change to widening.
This change does mean that we call the SAT solver in more places, but
this seems acceptable given the additional precision we gain.
In tests on an internal codebase, the number of SAT solver timeouts we
observe with Crubit's nullability checker does increase by about 25%.
They can be brought back to the previous level by doubling the SAT
solver work limit.
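A toy model of the new rule (hypothetical types; the real check consults
the flow conditions via the SAT solver rather than explicit sets):
```
#include <set>

// A boolean "value" is modeled as the set of truth values it may take
// in its environment: {true}, {false}, or {true, false} (Top).
using BoolVal = std::set<bool>;

BoolVal widen(const BoolVal &Prev, const BoolVal &Cur) {
  if (Prev == BoolVal{true} && Cur == BoolVal{true})
    return {true};        // provably true in both environments
  if (Prev == BoolVal{false} && Cur == BoolVal{false})
    return {false};       // provably false in both environments
  return {true, false};   // otherwise widen to Top as before
}
```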
#73092 added support for encoding/decoding PUSHP/POPP.
#73233 added support for encoding/decoding PUSH2[P]/POP2[P].
In this patch, we teach frame lowering to spill/reload registers with
these instructions:
1. Use PPX for balanced spill/reload.
2. Use PUSH2/POP2 for consecutive spills/reloads.
3. PUSH2/POP2 must be 16B-aligned on the stack, so pad when necessary.
This caused some runtimes builds to fail with:
error: unknown target 'runtimes-test-depends'
See comments on the PR.
> The test-depends target contained all the dependencies needed to run the
> runtimes tests, but it was never added as a dependency of check-all.
> This caused some of the tsan tests to fail, since the custom libcxx
> build the tests were looking for was never built. Besides the tsan
> failures, this fixes all the other test failures I was seeing with:
> cmake -G Ninja -B release-build -S llvm \
> -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
> -DCMAKE_BUILD_TYPE=Release \
> -DLLVM_ENABLE_ASSERTIONS=OFF \
> -DLLVM_ENABLE_PROJECTS="clang;lld" \
> -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind;compiler-rt"
>
> This is the same configuration the test-release.sh script uses, so I'm
> hoping this will also fix all the test failures we've been seeing when
> building the releases.
>
> Fixes #58680
This reverts commit 7f215b1380da49dccbf57da3040a40d25ed898f4.
This broke
tools/llvm-exegesis/X86/latency/dummy-counters.test
on Mac, see comment on the PR.
> There was an issue with certain configurations failing to build due to a
> deleted Error constructor that should be fixed with this relanding.
This reverts commit 92b821f2dcddbfa934689e10d8f11df150ab1043.
If we are loading the same pointer at different vector widths, then reuse the largest load and just extract the low subvector.
Unlike the equivalent VBROADCAST_LOAD/SUBV_BROADCAST_LOAD folds, which can occur in DAG, we have to wait until DAGISel; otherwise we can hit infinite loops if constant folding recreates the original constant value.
This is mainly useful for better constant sharing.
runSemiNCA() currently checks that the ReverseChildren are below
MinLevel in the DT, which is used when performing incremental updates.
However, ReverseChildren is populated during runDFS with only the
predecessors that are part of that DFS walk, which will itself be level
limited in the relevant cases. As such, I don't believe that this should
be checked during runSemiNCA().
This code probably dates back to a time when predecessors were not
cached during runDFS and as such not limited to the visited subtree
only.
After 9645267, TypeByteSize is 0 if the two accesses do not have the
same size (i.e. HasSameSize is false). This can cause an infinite loop
in couldPreventStoreLoadForward if HasSameSize is not checked first.
So check HasSameSize first instead of after
couldPreventStoreLoadForward. Checking HasSameSize first is also
cheaper.
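A rough sketch of the new ordering (condition and function names are
taken from the description above; the return values are illustrative,
not the exact code):
```
// The cheap size check now runs first, so couldPreventStoreLoadForward
// is never entered with TypeByteSize == 0 (which could loop forever).
if (!HasSameSize)
  return Dependence::Unknown;
if (couldPreventStoreLoadForward(Distance, TypeByteSize))
  return Dependence::ForwardButPreventsForwarding;
```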
Auto-generate checks for -loop-carried.ll to make it easier to update in
a follow-on patch. As this test only checks the dependence, mark the
pointers as noalias to avoid also checking various runtime pointer check
groups.
The NVIDIA Hopper architecture introduced the Cooperative Group Array
(CGA). It is a new level of parallelism that allows clustering
Cooperative Thread Arrays (CTAs) so they can synchronize and communicate
through shared memory while running concurrently.
This PR enables support for CGA within `gpu.launch_func` in the GPU
dialect, extending the op to accommodate this functionality.
The GPU dialect remains architecture-agnostic, so the CGA functionality
is added as optional parameters. This lets us leverage existing GPU
dialect mechanisms such as outlining and kernel launching, making it a
practical and convenient choice.
An example of this implementation can be seen below:
```
gpu.launch_func @kernel_module::@kernel
clusters in (%1, %0, %0) // <-- Optional
blocks in (%0, %0, %0)
threads in (%0, %0, %0)
```
The PR also introduces index and dimension Ops specific to clusters,
binding them to NVVM Ops:
```
%cidX = gpu.cluster_id x
%cidY = gpu.cluster_id y
%cidZ = gpu.cluster_id z
%cdimX = gpu.cluster_dim x
%cdimY = gpu.cluster_dim y
%cdimZ = gpu.cluster_dim z
```
We will introduce cluster support in `gpu.launch` Op in an upcoming PR.
See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.
It was:
```
error: there is no embedded script interpreter in this mode.
```
1. What does "mode" mean?
2. It implies there might be an embedded script interpreter for some
other "mode", whatever that would be.
So I'm simplifying it and noting the most common reason for this, which
is that lldb wasn't built with a scripting language enabled in the first
place.
There are other tips for dealing with this, but I'm not sure this
message is the best place for them.
Move GEP offset collection to a separate helper function and collect
variable and constant offsets in OffsetResult. For now, this only
supports one VariableOffset, but the new code structure can be more
easily extended to handle more offsets in the future.
The refactoring drops the check that the VariableOffset is >= -1 *
constant offset. This is not needed to determine whether the constraint
is monotonically increasing: the constant factors can be ignored, and
the constraint will be monotonically increasing if all variables are
positive.
See https://alive2.llvm.org/ce/z/ah2uSQ,
https://alive2.llvm.org/ce/z/NCADNZ
Don't only test the case where all GEPs are missing inbounds; also test
inbounds being present on only some of them. The fold should not be
performed in either case.
Inside runSemiNCA(), create a direct mapping from node number to node
info, so we can save the node number -> node pointer -> node info lookup
in many cases.
To allow this in more cases, change Label to a node number instead of a
node pointer.
I've limited this to runSemiNCA() for now, because we have the
convenient property there that no new node infos will be added, so we
don't have to worry about pointer invalidation.
This gives a pretty nice compile-time improvement of about 0.4%.
This changes the fadd legalization to handle fp16 types, and treats more
types as legal so that the backend can produce the correct patterns. The
identity fold `fadd x, -0.0 -> x` is currently missing.
CVP currently tries to fold load/store pointer operands to constants
using LVI. If there is a dominating condition of the form `icmp eq ptr
%p, @g`, then `%p` will be replaced with `@g`.
LVI is geared towards range-based optimizations, and is *very*
inefficient at handling simple pointer equality conditions. We have
other passes that can handle this optimization in a more efficient way,
such as IPSCCP and GVN.
Removing this optimization gives a geomean 0.4-1.2% compile-time
improvement depending on configuration. At the same time, there
is no impact on codegen.
When converting affine.for to a GPU launch op, we have to calculate the
block and thread dimensions for the launch op.
The formula for the dimension size is
(upper_bound - lower_bound) / step_size
When the difference is not evenly divisible by step_size, the division
rounds toward zero. However, the block and thread dimensions are
right-open ranges, i.e., [0, block_dim) and [0, thread_dim), so we get
the wrong result if we use DivSIOp. In this patch, we replace it with
CeilDivSIOp to get the correct block and thread dimension values.
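A worked example with hypothetical bounds, lower = 0, upper = 10,
step = 3: the loop body runs for 0, 3, 6, 9, i.e. four iterations.
```
// DivSIOp would compute (10 - 0) / 3 == 3 and drop the iteration at 9;
// CeilDivSIOp computes ceil((10 - 0) / 3) == 4, matching the half-open
// range [0, dim).
int gridDim(int lower, int upper, int step) {
  return (upper - lower + step - 1) / step; // ceil division for step > 0
}
```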
This patch fixes:
mlir/lib/Pass/PassRegistry.cpp:376:37: error: ISO C++ requires the
name after '::~' to be found in the same scope as the name before
'::~' [-Werror,-Wdtor-name]
This covers the following ops:
`spirv.GroupNonUniform` x {`Bitwise`, `Logical`} x {`And`, `Or`, `Xor`}
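Spelled out, that cross product is `spirv.GroupNonUniformBitwiseAnd`,
`spirv.GroupNonUniformBitwiseOr`, `spirv.GroupNonUniformBitwiseXor`,
`spirv.GroupNonUniformLogicalAnd`, `spirv.GroupNonUniformLogicalOr`, and
`spirv.GroupNonUniformLogicalXor`.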
We need these to efficiently lower from the `gpu.subgroup_reduce` op.
These two opcodes used to be followed by an MVT operand, which is always
one of i8/i16/i32/i64.
We add variants of `OPC_EmitInteger` and `OPC_EmitStringInteger`
instantiated for i8/i16/i32/i64 so that we can save one byte.
We keep `OPC_EmitInteger` and `OPC_EmitStringInteger` reserved in case
we need them someday, though I haven't found any usage after this
change.
Overall this reduces the llc binary size with all in-tree targets by
about 200K.
This short option taking an argument is unfortunate.
* If a cc1-only option starts with `-e`, using it with the driver will
not be reported as an error (e.g. commit
6cd9886c88d16d288c74846495d89f2fe84ff827).
* If another `-e` driver option is intended but a typo is made, the
option will be recognized as `-e`.
`gcc -export-dynamic` passes `-export-dynamic` to ld. It's not clear
whether some options behave this way.
It seems `-Wl,-eentry` and `-Wl,--entry=entry` are primarily used. There
may also be a few `gcc -e entry`, but `gcc -eentry` is extremely rare or
not used at all. Therefore, we probably should reject the Joined form of
`-e`.
After #72835, ExplicitVEXPrefix has changed: it is no longer a single
bit but a value in the ExplicitOpPrefix scope, so bitwise operations on
ExplicitVEXPrefix may need to be updated.