llvm-capstone

mirror of https://github.com/capstone-engine/llvm-capstone.git synced 2024-12-02 18:58:15 +00:00

Author	SHA1	Message	Date
Justin Bogner	7a13e410fd	[DirectX] Move ROV info into HLSL metadata. NFC Pull Request: https://github.com/llvm/llvm-project/pull/74896	2023-12-09 10:42:45 -08:00
Justin Bogner	18f0da26b2	[HLSL][DirectX] Avoid some unnecessary casting. NFC	2023-12-08 15:38:09 -08:00
Joseph Huber	97f3be2c5a	[CUDA][HIP] Improve variable registration with the new driver (#73177 ) Summary: This patch adds support for registering texture / surface variables from CUDA / HIP. Additionally, we now properly track the `extern` and `const` flags that are also used in these runtime functions. This does not implement the `managed` variables yet as those seem to require some extra handling I'm not familiar with. The issue is that the current offload entry isn't large enough to carry size and alignment information along with an extra global.	2023-12-07 15:44:23 -06:00
Dominik Adamski	bb4484d41e	[OpenMPIRBuilder] Add support for target workshare loops (#73360 ) The workshare loop for target region uses the new OpenMP device runtime. The code generation scheme for the new device runtime is presented below: Input code: ``` workshare-loop { loop-body } ``` Output code: helper function which represents loop body: ``` function-loop-body(counter, loop-body-args) { loop-body } ``` workshare-loop is replaced by the proper device runtime call: ``` call __kmpc_new_worksharing_rtl(function-loop-body, loop-body-args, loop-tripcount, ...) ``` This PR uses the new device runtime functions which were added in PR: https://github.com/llvm/llvm-project/pull/73225	2023-12-06 09:47:09 +01:00
Jorge Gorbe Moya	ce9b72c979	[NFC] Fix unused variable (used only in assert) after `d1cdcddcc2`	2023-12-04 14:15:24 -08:00
Youngsuk Kim	d1cdcddcc2	[llvm][OMPIRBuilder] Remove no-op ptr-to-ptr bitcast (NFC) Opaque ptr cleanup effort	2023-12-04 15:06:07 -06:00
Craig Topper	aba040182a	[IR] Replace uses of IRBuilder::getInt8PtrTy with getPtrTy. NFC (#73154 )	2023-11-22 12:24:18 -08:00
Joseph Huber	52204a29ab	[Offload] Initial support for registering offloading entries on COFF targets (#72697 ) Summary: This patch provides the initial support to allow handling the new driver's offloading entries. Normally, the ELF target can emit variables at C-identifier named sections and the linker will provide a pointer to the section. For COFF target, instead the linker merges sections containing a `$` in alphabetical order. We thus can emit these variables at sections and then emit two variables that are guaranteed to be sorted before and after the others to traverse it. Previous patches consolidated the handling of offloading entries so that this patch more easily can handle mapping them to the appropriate section. Ideally, the only remaining step to allow the new driver to run on Windows targets is to accurately map the following `ld.lld` arguments to their `llvm-link` equivalents. These are used inside the linker-wrapper, so we should simply need to remap the arguments to the same functionality if possible. ``` -o, -output -l, --library -L, --library-path -v, --version -rpath -whole-archive, -no-whole-archive ``` I have not tested this at runtime as I do not have access to a windows machine. This patch was adapted from some initial efforts in https://reviews.llvm.org/D137470.	2023-11-21 06:48:34 -06:00
Mike Rice	3ce5c04ad0	Replace getAs with castAs, dyn_cast with cast (NFC) (#72600 ) Make the code clear that nullptrs are not expected.	2023-11-17 09:22:33 -08:00
Joseph Huber	9c0e64999b	[Offloading][NFC] Refactor handling of offloading entries (#72544 ) Summary: This patch is a simple refactoring of code out of the linker wrapper into a common location. The main motivation behind this change is to make it easier to change the handling in the future to accept a triple to be used to emit entries that function on that target.	2023-11-17 08:26:20 -06:00
agozillon	718793ce6a	[OpenMP][OMPIRBuilder] Handle replace uses of ConstantExpr's inside of Target regions (#71891 ) Currently there's an edge case where constant indexing in target regions can lead to incorrect results as we do not correctly replace uses of mapped variables in generated target functions with the target arguments (and accessor instructions) that replace them. This patch seeks to fix that by extending the current logic in the OMPIRBuilder. Things like GEP's can come in the form of Constants/ConstantExprs, Constants and ConstantExpr's do not have access to the knowledge of what they're contained in, so we must dig a little to find an instruction so we can tell if they're used inside of the function we're outlining so we can be sure they are replaceable and we are not accidentally replacing a usage somewhere else in the module that's still necessary. This patch handles these by replacing the original constant expression with a new instruction equivalent; an instruction as it allows easy modification in the following loop, as we can now know the constant (instruction) is owned by our target function (as it holds this knowledge) and replaceUsesOfWith can now be invoked on it (cannot do this with constants it seems), a brand new one also allows us to be cautious as it is perhaps possible the old expression was used inside of the function but exists and is used externally (unlikely by the nature of a Constant, but still a positive side effect).	2023-11-15 15:45:32 +01:00
Akash Banerjee	767b34297d	[OpenMP] Mute OpenMP Target Enter, Exit and Data codegen for device pass (#72287 )	2023-11-15 10:44:16 +00:00
Youngsuk Kim	876236023c	[llvm] Remove no-op ptr-to-ptr bitcasts (NFC) (#72133 ) Opaque ptr cleanup effort (NFC).	2023-11-13 13:05:27 -05:00
Tom Eccles	a207e6307a	[flang] add fveclib flag (#71734 ) -fveclib= allows users to choose a vectorized libm so that loops containing math functions are vectorized. This is implemented as much as possible in the same way as in clang. The driver test in veclib.f90 is copied from the clang test.	2023-11-13 10:04:50 +00:00
Dominik Adamski	f2f5f1bfb6	[OMPIRBuilder] Do not call __kmpc_push_num_threads for device parallel (#71934 ) Function __kmpc_push_num_threads should be called only if we specify number of threads for host parallel region. Number of threads specified by the user should be passed as one of arguments of __kmpc_parallel_51 function.	2023-11-10 20:38:56 +01:00
Paulo Matos	7b9d73c2f9	[NFC] Remove Type::getInt8PtrTy (#71029 ) Replace this with PointerType::getUnqual(). Followup to the opaque pointer transition. Fixes an in-code TODO item.	2023-11-07 17:26:26 +01:00
Johannes Doerfert	3de645efe3	[OpenMP][NFC] Split the reduction buffer size into two components Before we tracked the size of the teams reduction buffer in order to allocate it at runtime per kernel launch. This patch splits the number into two parts, the size of the reduction data (=all reduction variables) and the (maximal) length of the buffer. This will allow us to allocate less if we need less, e.g., if we have less teams than the maximal length. It also allows us to move code from clangs codegen into the runtime as we now know how large the reduction data is.	2023-11-06 11:50:41 -08:00
Dominik Adamski	2cce0f6c57	[OpenMP][OMPIRBuilder] Add support to omp target parallel (#67000 ) Added support for LLVM IR code generation which is used for handling omp target parallel code. The call for __kmpc_parallel_51 is generated and the parallel region is outlined to separate function. The proper setup of kmpc_target_init mode is not included in the commit. It is assumed that the SPMD mode for target initialization is properly set by other codegen functions.	2023-11-06 11:44:00 +01:00
Johannes Doerfert	b8cbc5c02c	[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401 ) The KernelEnvironment is for compile time information about a kernel. It allows the compiler to feed information to the runtime. The KernelLaunchEnvironment is for dynamic information per kernel launch. It allows the runtime to feed information to the kernel that is not shared with other invocations of the kernel. The first use case is to replace the globals that synchronize teams reductions with per-launch versions. This allows concurrent teams reductions. More use cases will follow, e.g., per launch memory pools. Fixes: https://github.com/llvm/llvm-project/issues/70249	2023-10-31 19:38:43 -07:00
Nikita Popov	5eb65ca83d	[OpenMP] Move function out of !NDEBUG section To unbreak the release build.	2023-10-29 20:33:05 +01:00
Johannes Doerfert	d346c82435	[OpenMP] Associate the KernelEnvironment with the GenericKernelTy (#70383 ) By associating the kernel environment with the generic kernel we can access middle-end information easily, including the launch bounds ranges that are acceptable. By constraining the number of threads accordingly, we now obey the user-provided bounds that were passed via attributes.	2023-10-29 11:35:34 -07:00
Johannes Doerfert	31b91213bd	[OpenMP] Unify the min/max thread/teams pathways We used to pass the min/max threads/teams values through different paths from the frontend to the middle end. This simplifies the situation by passing the values once, only when we will create the KernelEnvironment, which contains the values. At that point we also manifest the metadata, as appropriate. Some footguns have also been removed, e.g., our target check is now triple-based, not calling convention-based, as the latter is dependent on the ordering of operations. The types of the values have been unified to int32_t.	2023-10-29 10:53:20 -07:00
Mehdi Amini	f390a76b7e	Revert "Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 )"" This reverts commit `ddbaa11e9f`. Reapply the original commit, the broken test was repaired in `5e51363f38` in the meantime.	2023-10-26 17:30:01 -07:00
Mehdi Amini	ddbaa11e9f	Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 )" This reverts commit `c2a1249a82`. The MLIR bots are broken with an omp test failure.	2023-10-26 17:25:20 -07:00
Johannes Doerfert	c2a1249a82	[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 ) The runtime needs to know about the acceptable launch bounds, especially if the compiler (middle- or backend) assumed those bounds. While this patch does not yet inform the runtime, it stores the bounds in a place that can/will be accessed and is associated with the kernel.	2023-10-26 14:46:55 -07:00
Johannes Doerfert	0ba57c8bba	[OpenMP] Pass min/max thread and team count to the OMPIRBuilder (#70247 ) We now provide the information about the min/max thread and team count to the OMPIRBuilder, no matter what the source was. That means we unify `thread_limit`, `num_teams`, `num_threads` handling with the target specific attributes (`__launch_bounds__` and `amdgpu_flat_work_group_size`). This is in preparation to pass the values to the runtime, and to allow the middle-end (OpenMP-opt) to tighten the values if it seems appropriate. There is no "real" change after this commit.	2023-10-26 14:45:07 -07:00
Joseph Huber	078ae8cd64	[Offloading][NFC] Move creation of offloading entries from OpenMP (#70116 ) Summary: This patch is a first step to remove dependencies on the OpenMPIRBuilder for creating generic offloading entries. This patch changes no functionality and merely moves the code around. In the future the interface will be changed to allow for more code re-use in the registration and creation of offloading entries as well as a more generic interface for CUDA, HIP, OpenMP, and SYCL(?). Doing this as a first step to reduce the noise involved in the functional changes.	2023-10-25 09:25:43 -04:00
Shraiysh	9922aadf9e	[OpenMPIRBuilder] Added `if` clause for `teams` (#69139 ) This patch adds support for the `if` clause on `teams` construct. The value of the argument must be an integer value. If the value evaluates to true (non-zero) integer, then the number of threads is determined by `num_threads` clause (or default and ICV if `num_threads` is absent). When the condition evaluates to false (zero), then the bounds are set to 1. ([OpenMP 5.2 Section 10.2](https://www.openmp.org/spec-html/5.2/openmpse58.html)) This essentially means that ``` upperbound = ifexpr ? upperbound : 1 lowerbound = ifexpr ? lowerbound : 1 ```	2023-10-17 15:00:39 -05:00
Shraiysh	e41eaf4896	[OpenMPIRBuilder] Add ThreadLimit and NumTeams clauses to teams construct (#68364 ) This patch adds support for `thread_limit` and bounds on `num_teams` clause for the teams construct in OpenMP. Added testcases for the same.	2023-10-11 10:36:03 -05:00
Shraiysh	9050b27bd5	[OpenMPIRBuilder] Remove wrapper function in `createTask`, `createTeams` (#67723 ) This patch removes the wrapper function in `OpenMPIRBuilder::createTask` and `OpenMPIRBuilder.createTeams`. The outlined function is directly of the form that is expected by the runtime library calls. This patch also adds a utility function to help add fake values and their uses, which will be deleted in finalization callbacks. Why we needed wrappers earlier? Before the post outline callbacks are executed, the IR has the following structure: ``` define @func() { ;... call void @outlined_fn(ptr %data) ;... } define void @outlined_fn(ptr %data) ``` OpenMP offloading expects a specific signature for the outlined function in a runtime call. For example, `__kmpc_fork_teams` expects the following signature: ``` define @outlined_fn(ptr %global.tid, ptr %data) ``` As there is no way to change a function's arguments after it has been created, a wrapper function with the expected signature is created that calls the outlined function inside it. How we are handling it now? To handle this in the current patch, we create a "fake" global tid and add a "fake" use for it in the to-be-outlined region. We need to create these fake values so the outliner sees it as something it needs to pass to the outlined function. We also tell the outliner to exclude this global tid value from the aggregate `data` argument, so it comes as a separate argument in the beginning. This way, we are able to directly get the outlined function in the expected format. This is inspired by the way `createParallel` handles outlining (using fake values and then deleting them later). Tasks are handled with a similar approach. This simplifies the generated code and the code to do this itself also becomes simpler (because we no longer have to construct a new function).	2023-10-09 09:20:31 -04:00
agozillon	2a1f1b5fde	[OpenMP][OpenMPIRBuilder] Move copyInput to a passed in lambda function and re-order kernel argument load/stores (#68124 ) This patch moves the existing copyInput function into a lambda argument that can be defined by a caller to the function. This allows more flexibility in how the function is defined, allowing Clang and MLIR to utilise their own respective functions and types inside of the lambda without affecting the OMPIRBuilder itself. The idea is to eventually replace/build on the existing copyInput function that's used and moved into OpenMPToLLVMIRTranslation.cpp to a slightly more complex implementation that uses MLIRs map information (primarily ByRef and ByCapture information at the moment). The patch also moves kernel load stores to the top of the kernel, prior to the first openmp runtime invocation. Just makes the IR a little closer to Clang.	2023-10-06 16:47:27 +02:00
agozillon	bc0c1783fd	[Clang][OpenMP][OMPIRBuilder] Move Clang's OpenMP Member/MemberOf flag helpers into the OMPIRBuilder (#67844 ) This patch seeks to move the following functions to the OMPIRBuilder: - getFlagMemberOffset - getMemberOfFlag - setCorrectMemberOfFlag These small helper functions help set the end bits of the OpenMPOffloadMappingFlags flag that correspond to the reserved segment for OMP_MAP_MEMBER_OF. They will be of use in the future for lowering MLIR types/values that can contain members and can be lowered similarly to a structure or class type within the OpenMPToLLVMIRTranslation step of the OpenMP dialects lowering to LLVM-IR. In particular for Flang which currently uses this flow. Types with descriptors like pointers/allocatables, and likely derived types in certain cases can be lowered as if they were structures with explicitly mapped members.	2023-10-03 15:20:44 +02:00
Benjamin Kramer	4731623f81	[LoopUnroll] Fold variable only used in assert into the assert Avoids warnings in Release builds. NFCI.	2023-09-27 13:19:18 +02:00
Nikita Popov	18be23f82a	[OPMIRBuilder] Fix typo in condition Fix a condition I accidentally inverted in `296671f059`.	2023-09-27 12:24:02 +02:00
Nikita Popov	296671f059	[LoopUnroll] Store more information in UnrollCostEstimator (NFCI) Instead of having ApproximateLoopSize() use a bunch of out parameters, from which we later construct an UnrollCostEstimator, directly construct UnrollCostEstimator which holds all the information derived from loop analysis. This makes it easier to add additional metrics in the future.	2023-09-27 12:06:13 +02:00
Shraiysh	8d17875acb	[OMPIRBuilder] Added `createTeams` (#66807 ) This patch adds basic support for `omp teams` to the OpenMPIRBuilder. The outlined function after code extraction is called from a wrapper function with appropriate arguments. This wrapper function is passed to the runtime calls. This approach is different from the Clang approach - clang directly emits the runtime call to the outlined function. The outlining utility (OutlineInfo) simply outlines the code and generates a function call to the outlined function. After the function has been generated by the outlining utility, there is no easy way to alter the function arguments without meddling with the outlining itself. Hence the wrapper function approach is taken.	2023-09-24 16:23:43 -05:00
Prabhdeep Singh Soni	9b57b167bb	[OMPIRBuilder] Fix shared clause for task construct This patch fixes the shared clause for the task construct with multiple shared variables. The shareds field in the kmp_task_t is not an inline array in the struct, rather it is a pointer to an array. With an inline array, the pointer dereference to the outlined function body of the task would segmentation fault when accessed by the runtime. Reviewed By: kiranchandramohan, jdoerfert Differential Revision: https://reviews.llvm.org/D158462	2023-09-15 12:19:47 -04:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295 ) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Sergio Afonso	9058762789	[OpenMP][Flang][MLIR] Lowering of requires directive from MLIR to LLVM IR Default atomic ordering information is processed in the OpenMP dialect to LLVM IR lowering stage at every spot where an operation can be affected by it. The rest of clauses are stored globally in the OpenMPIRBuilderConfig object before starting that lowering stage, so that the OMPIRBuilder can conditionally modify code generation depending on these. At the end of the process, the omp.requires attribute is itself lowered into a global constructor that passes these clauses as flags to the OpenMP runtime. Depends on D147217, D147218 and D158278. Differential Revision: https://reviews.llvm.org/D147219	2023-09-14 10:35:44 +01:00
Sergio Afonso	094a63a20b	[OpenMP][OMPIRBuilder] OpenMPIRBuilder support for requires directive This patch updates the `OpenMPIRBuilderConfig` structure to hold all available 'requires' clauses, and it replicates part of the code generation for the 'requires' registration function from clang in the `OMPIRBuilder`, to be used with flang. Porting the rest of features of the clang implementation to the IRBuilder and sharing it between clang and flang remains for a future patch, due to the complexity of the logic selecting the attributes of the generated registration function. Differential Revision: https://reviews.llvm.org/D147217	2023-09-14 10:33:54 +01:00
Shraiysh	8929f38320	[nfc][OpenMPIRBuilder] Formatting OMPIRBuilder.cpp and OMPIRBuilder.h (#65772 )	2023-09-08 11:54:52 -05:00
Jeremy Morse	4427407a29	[NFC][RemoveDIs] Create a new spelling of the moveBefore method As outlined in my proposal of how to get rid of debug intrinsics, this patch adds a moveBefore method that signals the caller /intends/ the order of moved instructions is to stay the same. This semantic difference has an effect on debug-info, as it signals whether debug-info needs to move with instructions or not. The patch just replaces a few calls to moveBefore with calls to moveBeforePreserving -- and the latter just calls the former, so it's all NFC right now. A future patch will add an implementation of moveBeforePreserving that takes action to correctly preserve debug-info, but that's tightly coupled with our non-instruction debug-info representation that's still being reviewed. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D156369	2023-09-07 18:37:57 +01:00
Joseph Huber	9da61aed75	[OpenMP] Emit offloading entries for indirect target variables OpenMP 5.1 allows emission of the `indirect` clause on declare target functions, see https://www.openmp.org/spec-html/5.1/openmpsu70.html#x98-1080002.14.7. The intended use of this is to permit calling device functions via their associated host pointer. In order to do this the first step will be building a map associating these variables. Doing this will require the same offloading entry handling we use for other kernels and globals. We intentionally emit a new global on the device side. Although it's possible to look up the device function's address directly, this would require changing the visibility and would prevent us from making static functions indirect. Also, the CUDA toolchain will optimize out unused functions and using a global prevents that. The downside is that the runtime will need to read the global and copy its value, but there shouldn't be any other costs. Note that this patch just performs the codegen, currently this new offloading entry type is unused and will be ignored by the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D157738	2023-08-24 18:21:13 -05:00
Johannes Doerfert	f3958ce008	[OpenMP] Add NVIDIA annotations for static grid thread limit We already add AMD GPU annotations, the NVIDIA ones are just a little more convoluted to add/update but otherwise the same. We see again that the interplay of ompx_attribute and deduced value needs to be improved, see the TODO. Differential Revision: https://reviews.llvm.org/D158383	2023-08-23 11:12:04 -07:00
Johannes Doerfert	7481b465ae	[OpenMP] Use default grid value for static grid size If the user did not provide any static clause to override the grid size, we assume the default grid size as upper bound and use it to improve code generation through vendor specific attributes. Fixes: https://github.com/llvm/llvm-project/issues/64816 Differential Revision: https://reviews.llvm.org/D158382	2023-08-23 11:12:03 -07:00
Johannes Doerfert	df8d33fa7a	[OpenMP][AMDGPU] Add "amdgpu-flat-work-group-size" for known thread counts If we know the thread count statically and it is a constant, we can set the "amdgpu-flat-work-group-size" kernel attribute. Fixes https://github.com/llvm/llvm-project/issues/64816 in parts.	2023-08-18 21:47:57 -07:00
Akash Banerjee	5d9ccd7a96	[OpenMP] Migrate dispatch related utility functions from Clang codegen to OMPIRBuilder Migrate createForStaticInitFunction, createDispatchInitFunction, createDispatchNextFunction and createDispatchFiniFunction from Clang CodeGen to OMPIRBuilder. Differential Revision: https://reviews.llvm.org/D157994	2023-08-16 16:35:28 +01:00
Soumi Manna	bd1ddc5850	[NFC][OpenMP] Initialize pointer field Reviewed By: tahonermann Differential Revision: https://reviews.llvm.org/D157989	2023-08-16 07:47:24 -07:00
Joseph Huber	43125b6392	[OpenMP] Use protected visibility for the kernel environment Summary: These new globals should use protected visibility. Visibility like this helps certain checks on AMDGPU architecture and LTO.	2023-08-15 12:47:44 -05:00
Jan Sjodin	b7fcf51515	[OpenMP][OpenMPIRBuilder] Add kernel launch codegen to emitTargetCall This patch adds code emission in emitTargetCall to call the OpenMP runtime to launch a kernel, and to call the fallback host implementation if the launch fails. Reviewed By: TIFitis, kiranchandramohan, jdoerfert Differential Revision: https://reviews.llvm.org/D155633	2023-08-15 10:03:06 -04:00
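
Commit 9922aadf9e in the log above states how the `if` clause on `teams` selects the lower and upper bounds. The rule can be sketched in plain C++ as below. This is only an illustration of the quoted formula: `TeamsBounds` and `selectTeamsBounds` are hypothetical names, not OpenMPIRBuilder APIs, and the builder itself generates LLVM IR for this selection rather than computing the values on the host.

```cpp
#include <cstdint>

// Hypothetical illustration only; not an OpenMPIRBuilder API.
struct TeamsBounds {
  std::int32_t lower;
  std::int32_t upper;
};

// Mirrors the rule quoted in the commit message:
//   upperbound = ifexpr ? upperbound : 1
//   lowerbound = ifexpr ? lowerbound : 1
// `ifExpr` is the integer value of the `if` clause; non-zero means true.
TeamsBounds selectTeamsBounds(std::int32_t ifExpr, std::int32_t lower,
                              std::int32_t upper) {
  const bool cond = ifExpr != 0;
  return TeamsBounds{cond ? lower : 1, cond ? upper : 1};
}
```

With a non-zero `ifExpr` the bounds pass through unchanged; with zero both bounds collapse to 1.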


316 Commits