This will make it easier for callers to spot and fix up calls to
createTargetMachine after a future change to the parameters of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
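For illustration, a minimal before/after sketch assuming the post-change
headers (`CodeGenFileType::ObjectFile` is shown only as one example of the
`CGFT_` mapping):
```
#include "llvm/Support/CodeGen.h"

// Before: unscoped enumerators.
//   CodeGenOpt::Level OL = CodeGenOpt::Aggressive;
//   CodeGenFileType FT = CGFT_ObjectFile;
// After: scoped enum classes.
llvm::CodeGenOptLevel OL = llvm::CodeGenOptLevel::Aggressive;
llvm::CodeGenFileType FT = llvm::CodeGenFileType::ObjectFile;
```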
Summary:
This patch implements the `fgets`, `getc`, `fgetc`, and `getchar`
functions on the GPU. Their implementations are straightforward enough.
One thing worth noting is that the implementation of `fgets` will be
extremely slow due to the high latency of reading a single char at a
time. A faster solution would be to add a dedicated RPC call for `fgets`
(needed because of the special rule that a newline or null ends the
read), but this is left out because performance isn't the primary
concern here.
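As a rough illustration of why this is slow, here is a hedged sketch (not the
actual llvm-libc code) of `fgets` layered on a per-character read, where each
character costs a full round trip:
```
#include <cstdio>

// Sketch only: build fgets on top of fgetc, stopping at newline or EOF.
char *fgets_via_fgetc(char *buf, int size, FILE *stream) {
  if (size < 1)
    return nullptr;
  int i = 0;
  while (i < size - 1) {
    int c = ::fgetc(stream);           // one (potentially high-latency) read
    if (c == EOF)
      break;
    buf[i++] = static_cast<char>(c);
    if (c == '\n')                     // newline terminates the line
      break;
  }
  if (i == 0)                          // EOF before any character was read
    return nullptr;
  buf[i] = '\0';
  return buf;
}
```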
Currently, clang emits LLVM IR that fails the verifier for the following
code:
```
template<typename T>
__global__ void foo(T x);
void bar() {
foo<<<1, 1>>>(0);
}
```
This is due to clang putting the kernel handle for foo into comdat,
which is not allowed, since the kernel handle is a declaration.
The situation is similar to calling a declaration-only template
function. The callee will be a declaration in LLVM IR and won't be put
into comdat. This is in contrast to calling a template function with
body, which will be put into comdat.
Fixes: SWDEV-419769
This change removes the requirement that the row stride be statically known when
converting `vector.transfer_read` and `vector.transfer_write` to distributed
SIMT operations in the `nvgpu` lowering path. It also adds a check to verify
that the last dimension of the source memref is statically known to have stride
1 since this is assumed in the conversion logic. No other change should be
required since the generated `vector.load` operations are never created across
dimensions other than the last. The routines for checking preconditions on
`vector.transfer_read/write` are moved under the nvgpu utilities.
The change is NFC with respect to the GPU dialect lowering path.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D155753
When we implemented C++20's P0674R1, we didn't enable the part of
P0674R1 that was resolving LWG2070 as a DR. This patch fixes that and
makes sure that we consistently go through the allocator when
constructing and destroying the underlying object in
std::allocate_shared.
Fixes #54365.
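A hedged illustration of the observable difference (a toy tracing allocator;
names are made up): with the LWG2070 resolution applied,
`std::allocate_shared` must construct and destroy the element through the
allocator, so both hooks fire exactly once.
```
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

static int constructs = 0, destroys = 0;

template <class T> struct TracingAlloc {
  using value_type = T;
  TracingAlloc() = default;
  template <class U> TracingAlloc(const TracingAlloc<U> &) {}
  T *allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
  void deallocate(T *p, std::size_t n) { std::allocator<T>{}.deallocate(p, n); }
  template <class U, class... Args> void construct(U *p, Args &&...args) {
    ++constructs;  // element construction goes through the allocator
    ::new (static_cast<void *>(p)) U(std::forward<Args>(args)...);
  }
  template <class U> void destroy(U *p) {
    ++destroys;    // element destruction goes through the allocator
    p->~U();
  }
  template <class U> bool operator==(const TracingAlloc<U> &) const { return true; }
  template <class U> bool operator!=(const TracingAlloc<U> &) const { return false; }
};

int main() {
  { auto sp = std::allocate_shared<int>(TracingAlloc<int>{}, 42); }
  assert(constructs == 1 && destroys == 1);
}
```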
In C++ it seems it is legitimate to use a base class's assignment
operator (e.g. `using Base::operator=`) to perform a copy if the base
class is the common ancestor of the source and destination objects. In
such a case we shouldn't try to access fields beyond those of the base
class. However, such a case seems to be very rare (typical code would
implement a copy constructor instead) and handling it would add
complexity, so in this patch we simply bail if the method operator's
parent class is different from the type of the destination object that
this framework recognizes.
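A minimal sketch of the pattern in question (illustrative names):
```
struct Base {
  int b = 0;
  Base &operator=(const Base &) = default;
};

struct Derived : Base {
  int d = 0;
  using Base::operator=;  // expose Base's copy assignment in Derived
};

void copyBasePart(Derived &dst, const Derived &src) {
  // Calls Base::operator=(const Base &): only the Base subobject is copied,
  // dst.d is left untouched.
  dst = static_cast<const Base &>(src);
}
```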
This reverts commit 070493ddbd9473499d6f00ca62bc6aa92808ed79 (and
relands the original change). This removes a test run that makes an
assumption of RTTI being on by default for a given target.
Multiplying raw block frequency with an integer carries a high risk
of overflow.
- Add `BlockFrequency::mul`, returning an std::optional with the product
or `nullopt` to indicate an overflow (see the sketch below).
- Fix two instances where overflow was likely.
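A free-standing sketch of the overflow-checked multiply idea (not the exact
`BlockFrequency::mul` signature):
```
#include <cstdint>
#include <limits>
#include <optional>

// Return the product, or nullopt if it would not fit in 64 bits.
std::optional<uint64_t> checkedMul(uint64_t Freq, uint64_t N) {
  if (N != 0 && Freq > std::numeric_limits<uint64_t>::max() / N)
    return std::nullopt;  // overflow: caller must handle explicitly
  return Freq * N;
}
```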
This pass will upgrade DXIL-style LLVM constructs (which are mostly
metadata) into the representations we use in LLVM for the same concepts.
For now we just strip the valver metadata, which we don't need. Later
changes will make this pass more useful, and then we should be able to
wire it into clang and possibly the DirectX backend's AsmParser.
The remaining use of ConstString in StructuredData is the Dictionary
class. Internally it's backed by a `std::map<ConstString, ObjectSP>`.
I propose that we replace it with a `llvm::StringMap<ObjectSP>`.
Many StructuredData::Dictionary objects are ephemeral and only exist for
a short amount of time. Many of these Dictionaries are only produced
once and are never used again. That leaves us with a lot of string data
in the ConstString StringPool that is sitting there never to be used
again. Even if the same string is used many times for keys of different
Dictionary objects, that is something we can measure and adjust for
instead of assuming that every key may be reused at some point in the
future.
Quick comparisons of key data are likely not a concern with Dictionary,
but the use of `llvm::StringMap` means that lookups should be fast with
its hashing strategy.
Switching to a llvm::StringMap meant that the iteration order may be
different. To account for this when serializing/dumping the dictionary,
I added some code to sort the output by key before emitting anything.
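A hedged sketch of that serialization approach (illustrative, not the actual
StructuredData code):
```
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"
#include <algorithm>
#include <vector>

// Collect the keys, sort them, then emit entries in that deterministic order;
// StringMap iteration order is otherwise unspecified.
template <typename ValueT, typename EmitFn>
void dumpSorted(const llvm::StringMap<ValueT> &Map, EmitFn Emit) {
  std::vector<llvm::StringRef> Keys;
  Keys.reserve(Map.size());
  for (const auto &Entry : Map)
    Keys.push_back(Entry.getKey());
  std::sort(Keys.begin(), Keys.end());
  for (llvm::StringRef Key : Keys)
    Emit(Key, Map.find(Key)->getValue());
}
```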
Differential Revision: https://reviews.llvm.org/D159313
This change adds a method to modify the ConversionTarget used during
`transform.apply_conversion_patterns` to the
`ConversionPatternDescriptorOpInterface`. This is needed when the TypeConverter
is used to dictate the dynamic legality of operations, as in "structural"
conversion patterns present in, for example, the SCF and func dialects.
As a first use case/test, this change also adds a
`transform.apply_patterns.scf.structural_conversions` operation to the SCF
dialect.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D158672
Close #57618: currently we align the end of PT_GNU_RELRO to a
common-page-size boundary, but do not align the end of the associated
PT_LOAD. This is benign when runtime_page_size >= common-page-size.
However, when runtime_page_size < common-page-size, it is possible that
`alignUp(end(PT_LOAD), page_size) < alignDown(end(PT_GNU_RELRO), page_size)`.
In this case, rtld's mprotect call for PT_GNU_RELRO will apply to
unmapped regions and lead to an error, e.g.
```
error while loading shared libraries: cannot apply additional memory protection after relocation: Cannot allocate memory
```
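A hedged numeric illustration of that inequality, with made-up addresses and
page sizes (common-page-size = 16 KiB, runtime page size = 4 KiB):
```
#include <cstdint>
#include <cstdio>

static uint64_t alignUp(uint64_t V, uint64_t A) { return (V + A - 1) & ~(A - 1); }
static uint64_t alignDown(uint64_t V, uint64_t A) { return V & ~(A - 1); }

int main() {
  const uint64_t CommonPage = 0x4000, RuntimePage = 0x1000; // runtime < common
  const uint64_t EndLoad = 0x20500;                       // unaligned end of PT_LOAD
  const uint64_t EndRelro = alignUp(EndLoad, CommonPage); // 0x24000, aligned RELRO end
  // rtld maps up to alignUp(end(PT_LOAD)) but mprotects up to
  // alignDown(end(PT_GNU_RELRO)), per the inequality quoted above.
  std::printf("mapped up to   0x%llx\n",
              (unsigned long long)alignUp(EndLoad, RuntimePage));    // 0x21000
  std::printf("mprotect up to 0x%llx\n",
              (unsigned long long)alignDown(EndRelro, RuntimePage)); // 0x24000
  // 0x21000 < 0x24000: mprotect touches pages that were never mapped.
}
```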
To fix the issue, add a padding section .relro_padding like mold, which
is contained in the PT_GNU_RELRO segment and the associated PT_LOAD
segment. The section also prevents strip from corrupting PT_LOAD program
headers.
.relro_padding has the largest `sortRank` among RELRO sections.
Therefore, it is naturally placed at the end of `PT_GNU_RELRO` segment
in the absence of `PHDRS`/`SECTIONS` commands.
In the presence of `SECTIONS` commands, we place .relro_padding
immediately before a symbol assignment using DATA_SEGMENT_RELRO_END (see
also https://reviews.llvm.org/D124656), if present.
DATA_SEGMENT_RELRO_END is changed to align to max-page-size instead of
common-page-size.
Some edge cases worth mentioning:
* ppc64-toc-addis-nop.s: when PHDRS is present, do not append .relro_padding.
* avoid-empty-program-headers.s: when the only RELRO section is .tbss,
it is not part of PT_LOAD segment, therefore we do not append
.relro_padding.
---
Close #65002: GNU ld from 2.39 onwards aligns the end of PT_GNU_RELRO
to a max-page-size boundary (https://sourceware.org/PR28824) so that the
last page is protected even if runtime_page_size > common-page-size.
In my opinion, losing protection for the last page when the runtime page
size is larger than common-page-size is not really an issue. Double
mapping a page of up to max-common-page for the protection could cause
undesired VM waste. Internally we had users complaining about 2MiB
max-page-size applying to shared objects.
Therefore, the end of .relro_padding is padded to a common-page-size
boundary. Users who are really anxious can set common-page-size to match
their runtime page size.
---
17 tests need updating as there are lots of change detectors.
This records facts that are not sensitive to the current flow condition,
and should apply to all environments.
The motivating case is recording information about where a Value
originated, such as nullability:
- we may see the same Value for multiple expressions (e.g. reads of the
same field) in multiple environments (multiple blocks or iterations)
- we want to record information only when we first see the Value
(e.g. Nullability annotations on fields only add information if we
don't know where the value came from)
- this information should be expressible as a SAT condition
- we must add this SAT condition to every environment where the
Value may appear
We solve this by recording the information in the global condition.
This doesn't seem particularly elegant, but solves the problem and is
a fairly small and natural extension of the Environment.
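A purely illustrative toy of this idea (hypothetical names, not the
framework's API): the fact is phrased as a SAT constraint over an atomic
boolean and conjoined into one global condition, recorded only the first time
the Value is seen in any environment.
```
#include <set>
#include <string>
#include <vector>

struct GlobalCondition {
  std::vector<std::string> Conjuncts;           // conjunction of SAT formulas
  void assume(std::string F) { Conjuncts.push_back(std::move(F)); }
};

struct OriginRecorder {
  GlobalCondition &GC;
  std::set<const void *> Seen;                  // Values already recorded

  // Called from any environment that observes Val; the fact is global, so it
  // is added exactly once and then holds everywhere Val appears.
  void recordNonnullFieldOrigin(const void *Val, int IsNullAtom) {
    if (Seen.insert(Val).second)
      GC.assume("!atom" + std::to_string(IsNullAtom));
  }
};
```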
Alternatives considered:
- store the constraint directly as a property on the Value.
But it's more composable for such properties to always be variables
(AtomicBoolValue), and constrain them with SAT conditions.
- add a hook whenever values are created, giving the analysis the
chance to populate them.
However, the framework relies on/provides the ability to construct
values in arbitrary places without providing the context such a hook
would need, so this would be a very invasive change.
The majority of UnixSignals strings are static in the sense that they do
not change. The overwhelming majority of these strings are string
literals. Using ConstString to manage their lifetime does not make
sense. The only exception to this is one of the subclasses of
UnixSignals, for which I have created a StringSet local to that file
which will guarantee the lifetimes of these StringRefs.
As for the other benefits of ConstString, string uniqueness is not a
concern (as many of them are already string literals) and comparing
signal names and aliases should not be a hot path.
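A hedged sketch of that pattern (illustrative, not the actual UnixSignals
code):
```
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/StringSet.h"

// A file-local set owns copies of any non-literal signal name, so the
// StringRefs handed out stay valid without going through ConstString.
static llvm::StringRef internSignalName(llvm::StringRef Name) {
  static llvm::StringSet<> Storage;
  return Storage.insert(Name).first->getKey();
}
```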
Differential Revision: https://reviews.llvm.org/D159011
Rationale:
These libraries provided COO input and output at external boundaries
which, since then, has been generalized to the much more powerful pack
and unpack operations of the sparse tensor dialect.
The issue with these test failures is that the dSYM was not being found
by lldb, which is why setting breakpoints was failing and lldb quit
without performing any steps. This change copies the dSYM to the same
temp directory that the executable is copied to.
The ampere1 scheduling model uses IsCheapLSL predicates for ADDXri and
ADDWrr instructions, which only have 3 operands. In attempting to check
that the third is a shift, the predicate can attempt to access an out of
bounds operand, hitting an assert. This splits the rr/ri instructions
(which can never have shifts) from the rs/rx instructions to ensure they
both work correctly. Ampere1Write_1cyc_1AB was chosen for the rr/ri
instructions to match the cheap case.
This also sets CompleteModel = 0 for the ampere1 scheduling model, as at
runtime under a debug build it will otherwise check that, in addition to
all instructions having scheduling info, there is information for each
output operand, failing with:
```
DefIdx 1 exceeds machine model writes for
renamable $w9, renamable $w8 = LDPWi renamable $x8, 0
(Try with MCSchedModel.CompleteModel set to false)incomplete machine model
```
`_get_op_result_or_value` was used in mix-ins to unify the handling of
op results and values. However, that function is now called in the
generated constructors, such that doing so in the mix-ins is not
necessary anymore.
Somewhat overdue... it has been a few years since I stopped watching block frequency / branch weight patches actively, so I effectively stopped acting as code owner a while ago. Reflect the reality.
Still happy to help out; feel free to pull me in if you think I might have useful context!
On some AArch64 cores, including Ampere's ampere1 and ampere1a
architectures, load and store pair instructions are faster compared to
simple loads/stores only when the alignment of the pair is at least
twice that of the individual element being loaded.
Based on that, this patch introduces four new subtarget features, two
for controlling ldp and two for controlling stp, to cover the ampere1
and ampere1a alignment needs and to enable optional fine-grained control
over ldp and stp generation in general. The latter can be utilized by
other CPUs if a policy different from the compiler's default is
beneficial.
More specifically, for each of ldp and stp respectively we have:
- disable-ldp/disable-stp: Do not emit ldp/stp.
- ldp-aligned-only/stp-aligned-only: Emit ldp/stp only if the source
pointer is aligned to at least double the alignment of the type.
Therefore, for -mcpu=ampere1 and -mcpu=ampere1a
ldp-aligned-only/stp-aligned-only become the defaults, because of the
benefit from the alignment, whereas for the rest of the cpus the default
behaviour of the compiler is maintained.
This patch syncs the logic inside `getInputFunc` that selects
the library API and the logic in `createIoRuntimeCallForItem`
that creates the input arguments for the library call.
There were cases where we selected the `InputDerivedType` API but
passed only two arguments, and cases where we selected `InputDescriptor`
but passed three arguments.
It turns out we also were incorrectly selecting `OutputDescriptor`
in `getOutputFunc` (`test4` case in the new LIT test),
which caused runtime issues for output of a derived type
with descriptor components (due to the missing non-type-bound table).
Since we no longer support typed pointers in LLVM IR, the PtrASXTy
in isLoadInvariantInLoop was set to be equal to Addr->getType() (an
opaque ptr in the same address space). That made the loop looking
through bitcasts redundant.
Add extra error checking (in debug mode) to detect cases where an
iterator on "direct" payload IR mappings is invalidated (due to elements
being removed). Such errors are hard to debug: they are often
non-deterministic; sometimes the program crashes, sometimes it produces
wrong results. Even when it crashes, the stack trace often points to
completely unrelated code locations.
Store a timestamp with each "direct" mapping. The timestamp is increased
whenever an operation is performed that invalidates an iterator on that
mapping. A debug iterator is added that checks the timestamp as payload
IR is enumerated.
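A hedged, self-contained sketch of the timestamp-checked iterator technique
(illustrative names, not the transform dialect's actual classes):
```
#include <cassert>
#include <cstdint>
#include <vector>

template <typename T> class TimestampedVector {
  std::vector<T> Data;
  uint64_t Timestamp = 0;

public:
  class DebugIterator {
    const TimestampedVector *Parent;
    size_t Index;
    uint64_t Snapshot; // timestamp at iterator creation

  public:
    DebugIterator(const TimestampedVector *P, size_t I)
        : Parent(P), Index(I), Snapshot(P->Timestamp) {}
    const T &operator*() const {
      assert(Snapshot == Parent->Timestamp &&
             "iterator was invalidated by a mutation");
      return Parent->Data[Index];
    }
    DebugIterator &operator++() { ++Index; return *this; }
    bool operator!=(const DebugIterator &O) const { return Index != O.Index; }
  };

  void push_back(T V) { Data.push_back(std::move(V)); }
  void erase(size_t I) {
    Data.erase(Data.begin() + I);
    ++Timestamp; // invalidates all outstanding iterators
  }
  DebugIterator begin() const { return DebugIterator(this, 0); }
  DebugIterator end() const { return DebugIterator(this, Data.size()); }
};
```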
The scf.forall.in_parallel terminator operation has a nested graph
region with the NoTerminator trait. Such regions are not supported by
the default implementations. Therefore, this commit adds a specialized
implementation for this operation which only covers the case where the
nested region is empty. This is because after bufferization, ops like
tensor.parallel_insert_slice have already been converted to memref
operations residing in the scf.forall only, and the nested region of
scf.forall.in_parallel ends up empty.
D154280 (landed in 64d19542e78a in July, 2023) implements
`--per-test-coverage` (which can also be specified via
`lit_config.per_test_coverage`). However, it has a few issues, which
the current patch addresses:
1. D154280 implements `--per-test-coverage` only for the case that lit
is configured to use an external shell. The current patch extends
the implementation to lit's internal shell.
2. In the case that lit is configured to use an external shell,
regardless of whether `--per-test-coverage` is actually specified,
D154280 causes `%dbg(RUN: at line N)` to be expanded in RUN lines
early and in a manner that is specific to sh-like shells. As a
result, later code in lit that expands it in a shell-specific
manner is useless as there's nothing left to expand. The current
patch cleans up the implementation to avoid useless code.
3. Because of issue 2, D154280 corrupts support for Windows `cmd` as
an external shell (effectively comments out all RUN lines with
`:`). The current patch happens to fix that particular corruption
by addressing issue 2. However, D122569 (landed in 1041a9642ba0 in
April, 2022) had already broken support for Windows `cmd` as an
external shell (discards RUN lines when expanding `%dbg(RUN: at
line N)`). The current patch does not attempt to fix that bug.
For further details, see the PR discussion of the current patch.
The current patch addresses the above issues by implementing
`--per-test-coverage` before selecting the shell (internal or
external) and by leaving `%dbg(RUN: at line N)` unexpanded there.
Thus, it is expanded later in a shell-specific manner, as before
D154280.
This patch introduces `buildPdbgCommand` into lit's implementation to
encapsulate the process of building (or rebuilding in the case of the
`--per-test-coverage` implementation) a full `%dbg(RUN: at line N)
cmd` line and asserting that the result matches `kPdbgRegex`. It also
cleans up that and all other uses of `kPdbgRegex` to operate on the
full line with `re.fullmatch` not `re.match`. This change better
reflects the intention in every case, but it is expected to be NFC
because `kPdbgRegex` ends in `.*` and thus avoids the difference
between `re.fullmatch` and `re.match`. The only caveat is that `.*`
does not match newlines, but RUN lines cannot contain newlines
currently, so this caveat currently shouldn't matter in practice.
The original `--per-test-coverage` implementation avoided accumulating
`export LLVM_PROFILE_FILE={profile}` insertions across retries (due to
`ALLOW_RETRIES`) by skipping the insertion if `%dbg(RUN: at line N)`
was not present and thus had already been expanded. However, the
current patch makes sure the insertions also happen for commands
without `%dbg(RUN: at line N)`, such as preamble commands or some
commands from other lit test formats. Thus, the current patch
implements a different mechanism to avoid accumulating those
insertions (see code comments).