llvm-capstone

mirror of https://github.com/capstone-engine/llvm-capstone.git synced 2024-11-23 13:50:11 +00:00

Author	SHA1	Message	Date
Joseph Huber	4b7beab418	[OpenMP] Add back implicit flags manually Summary: We used to inherit these flags from the LLVM options in a runtimes build. This patch adds them back in manually as they are helpful for diagnostics and optimizing the created binary.	2023-11-27 14:51:48 -06:00
Johannes Doerfert	7bfcce3e94	[OpenMP] Tear down GenericDeviceTy's with GenericPluginTy (#73557) There is no point in keeping GenericDeviceTy objects alive longer than the associated GenericPluginTy. Instead of the old API we now tear them down with the plugin, avoiding ordering issues.	2023-11-27 11:42:12 -08:00
Johannes Doerfert	f9436464a9	[OpenMP][NFC] Minor name and code simplification	2023-11-27 11:08:29 -08:00
Johannes Doerfert	2b2e711afc	[OpenMP][NFC] Remove no-op __tgt_rtl_deinit_plugin The order in which we deinit things, especially when shared libraries are involved, is complicated. To simplify our lives the nextgen plugin deinitializes the GenericPluginTy and subclasses automatically. The old __tgt_rtl_deinit_plugin is not needed anymore.	2023-11-27 11:07:57 -08:00
Johannes Doerfert	9c33bf62a7	[OpenMP][NFC] Remove unused (un)register_lib plugin API These APIs have not been hooked up for a while. No need to carry them.	2023-11-27 11:07:57 -08:00
Brad Smith	e66876f2e0	[OpenMP][Tools] Have sort(1) not use long name parameters (#73477) I noticed a few tests were failing on NetBSD. NetBSD's sort(1) does not support long option names, unlike the sort(1) of GNU and FreeBSD/OpenBSD/DragonFly. executed command: sort --numeric-sort --stable .---command stderr------------ | sort: unknown option -- - | usage: sort [-bdfHilmnrSsu] [-k kstart[,kend]] [-o output] [-R char] [-T dir] | [-t char] [file ...] | or: sort -C|-c [-bdfilnru] [-k kstart[,kend]] [-o output] [-R char] | [-t char] [file] `-----------------------------	2023-11-27 13:23:25 -05:00
Brad Smith	20406af31b	[runtime] Have the runtime use the compiler builtin for alloca on NetBSD (#73480) Most of the tests were failing with the following in their logs: | /usr/bin/ld: /home/brad/llvm-build/runtimes/runtimes-bins/openmp/runtime/src/libomp.so: warning: Warning: reference to the libc supplied alloca(3); this most likely will not work. Please use the compiler provided version of alloca(3), by supplying the appropriate compiler flags (e.g. -std=gnu99). This is fixed by making use of __builtin_alloca. Before: Total Discovered Tests: 353, Unsupported: 59 (16.71%), Passed: 51 (14.45%), Failed: 243 (68.84%). After: Total Discovered Tests: 353, Unsupported: 59 (16.71%), Passed: 290 (82.15%), Failed: 4 (1.13%).	2023-11-27 13:22:54 -05:00
Joseph Huber	ca007181ea	[OpenMP] Fix missing CMake function in runtimes build Summary: We borrowed this function from LLVM, my previous patch removed that. Now we redefine it if it's not present.	2023-11-27 09:23:15 -06:00
Lixi Zhou	a3c0f705db	[NFC] fix failed ompt tests on M1 device (#65696) Fix the 2 failing ompt tests on M1 devices reported in #63194. ``` libomp :: ompt/synchronization/masked.c libomp :: ompt/synchronization/master.c ``` For the details of this fix, please check the original discussion in https://github.com/llvm/llvm-project/issues/63194#issuecomment-1710494689 Thanks @jprotze for the fix.	2023-11-24 23:40:14 +01:00
Akash Banerjee	f1d773863d	[Flang][OpenMP] Remove use of non reference values from MapInfoOp (#72444) This patch removes the val field from the `MapInfoOp`. Previously, when lowering `TargetOp`, the bounds information for the `BoxValues` was also being mapped. Instead, these ops are now cloned inside the target region to prevent mapping of non-reference-typed values.	2023-11-24 11:33:19 +00:00
Joachim Jenke	f5e50b21da	[OpenMP] Optimized trivial multiple edges from task dependency graph From "3.1 Reducing the number of edges" of this paper (https://hal.science/hal-04136674v1/) - Optimization (b) Task (dependency) nodes have a `successors` list built from the dependences passed to them. Given the following code, B will be added to A's successors list, building the graph `A` -> `B` ``` // A # pragma omp task depend(out: x) {} // B # pragma omp task depend(in: x) {} ``` In the following code, B is currently added twice to A's successor list ``` // A # pragma omp task depend(out: x, y) {} // B # pragma omp task depend(in: x, y) {} ``` This patch removes such duplicates by checking the task most recently inserted into `A`'s successor list. Authored by: Romain Pereira (rpereira-dev) Differential Revision: https://reviews.llvm.org/D158544	2023-11-21 18:36:12 +01:00
Johannes Doerfert	f48c4d8aa1	[OpenMP] Be more forgiving during record and replay When we record and replay kernels we should not error out early if there is a chance the program might still run fine. This patch will: 1) Fallback to the allocation heuristic if the VAMap doesn't work. 2) Adjust the memory start to match the required address if possible. 3) Adjust the (guessed) pointer arguments if the memory start adjustment is impossible. This will allow kernels without indirect accesses to work while indirect accesses will most likely fail.	2023-11-20 17:15:34 -08:00
Johannes Doerfert	41566fb852	[OpenMP][FIX] Ensure recording works properly w/ late allocations	2023-11-20 17:15:33 -08:00
Johannes Doerfert	6663df30c0	[OpenMP][NFC] Remove std::move to silence warnings	2023-11-20 17:15:33 -08:00
Joseph Huber	47a3ad5be1	[Libomptarget] Handle dynamic stack sizes for AMD COV5 (#72606) Summary: One of the changes in AMD code object version 5 is that kernels using an unknown amount of private stack memory no longer default to 16 KB. Instead, a flag is emitted indicating that the runtime must provide a value. This patch checks whether we must provide such a stack and uses the existing handling of the stack environment variable to configure it.	2023-11-20 12:48:42 -06:00
Brad Smith	3425e11a11	[OpenMP] Add missing pieces in __kmp_launch_worker for Solaris support (#72613)	2023-11-17 13:04:13 -05:00
Fabian Mora	be9fa9dee5	[flang][NVPTX] Add initial support to the NVPTX target (#71992) This patch adds initial support to the NVPTX target, enabling `flang` to produce OpenMP offload code for NVPTX targets.	2023-11-16 11:34:28 -05:00
agozillon	718793ce6a	[OpenMP][OMPIRBuilder] Handle replace uses of ConstantExpr's inside of Target regions (#71891) Currently there's an edge case where constant indexing in target regions can lead to incorrect results, as we do not correctly replace uses of mapped variables in generated target functions with the target arguments (and accessor instructions) that replace them. This patch seeks to fix that by extending the current logic in the OMPIRBuilder. Things like GEPs can come in the form of Constants/ConstantExprs. Constants and ConstantExprs do not know what they are contained in, so we must dig a little to find an instruction that uses them; only then can we tell whether they are used inside the function we are outlining, be sure they are replaceable, and avoid accidentally replacing a use elsewhere in the module that is still necessary. This patch handles these by replacing the original constant expression with an equivalent new instruction. An instruction allows easy modification in the following loop: we now know the constant (instruction) is owned by our target function (as it holds this knowledge), and replaceUsesOfWith can now be invoked on it (this does not seem possible with constants). A brand-new instruction also lets us be cautious, since the old expression might be used inside the function while also existing and being used externally (unlikely by the nature of a Constant, but still a positive side effect).	2023-11-15 15:45:32 +01:00
Jan Patrick Lehr	5c22b907dc	Reland [OpenMP][libomptarget] Enable parallel copies via multiple SDMA engines (#71801) (#72307) This enables the AMDGPU plugin to use a new ROCm 5.7 interface to dispatch asynchronous data transfers across SDMA engines. The default functionality stays unchanged, meaning that all data transfers are enqueued into an H2D queue or a D2H queue, depending on transfer direction, via the HSA interface used previously. The new interface can be enabled via the environment variable `LIBOMPTARGET_AMDGPU_USE_MULTIPLE_SDMA_ENGINES=true` when libomptarget is built against a recent ROCm version (5.7 and later). As of now, requests are distributed in a round-robin fashion across available SDMA engines.	2023-11-14 21:30:04 +01:00
Joseph Huber	cc9e19ee59	Revert "[OpenMP][libomptarget] Enable parallel copies via multiple SDMA engines (#71801)" This causes the tests to fail because the bots were not updated in time. Revert until we update the bots to a valid version. This reverts commit `e876250b63`.	2023-11-14 12:34:27 -06:00
Jan Patrick Lehr	e876250b63	[OpenMP][libomptarget] Enable parallel copies via multiple SDMA engines (#71801) This enables the AMDGPU plugin to use a new ROCm 5.7 interface to dispatch asynchronous data transfers across SDMA engines. The default functionality stays unchanged, meaning that all data transfers are enqueued into an H2D queue or a D2H queue, depending on transfer direction, via the HSA interface used previously. The new interface can be enabled via the environment variable `LIBOMPTARGET_AMDGPU_USE_MULTIPLE_SDMA_ENGINES=true` when libomptarget is built against a recent ROCm version (5.7 and later). As of now, requests are distributed in a round-robin fashion across available SDMA engines.	2023-11-14 19:16:39 +01:00
Brad Smith	5feebdcef2	[OpenMP] Link against libm on OpenBSD (#70614) Needed for some math functions in libomp.	2023-11-11 20:37:50 -05:00
Johannes Doerfert	7318fe6334	[OpenMP][FIX] Ensure device reduction geps work for multi-var reductions If we have more than one reduction variable, we need to be consistent wrt. indexing. In `3de645efe3` we broke this, as the buffer type was reduced to a singleton but the index computation was not adjusted to account for that offset. This fixes it by interleaving the reduction variables properly in an array-of-struct style. We can revert to struct-of-array in a follow-up if it turns out to be a problem. I doubt it, since half the accesses should benefit from the locality this layout offers and only the other half were consecutive before.	2023-11-10 14:34:46 -08:00
Joseph Huber	237adfca4e	[OpenMP] Rework handling of global ctor/dtors in OpenMP (#71739) Summary: This patch reworks how we handle global constructors in OpenMP. Previously, we emitted individual kernels that were all registered and called individually. In order to provide more generic support, this patch moves all handling of this to the target backend and the runtime plugin. This has the benefit of supporting the GNU extensions for constructors and destructors, removing a class of failures related to shared library destruction order, and allowing targets other than OpenMP to use the same support without needing to change the frontend. This is primarily done by calling kernels that the backend emits to iterate a list of ctor / dtor functions. For x64, this is automatic and we get it for free with the standard `dlopen` handling. For AMDGPU, we emit `amdgcn.device.init` and `amdgcn.device.fini` functions which handle everything automatically and simply need to be called. For NVPTX, a patch https://github.com/llvm/llvm-project/pull/71549 provides the kernels to call, but the runtime needs to set up the array manually by pulling out all the known constructor / destructor functions. One concession that this patch requires is the change that for GPU targets in OpenMP offloading we will use `llvm.global_dtors` instead of using `atexit`. This is because `atexit` is a separate runtime function that does not mesh well with the handling we're trying to do here. This should be equivalent in all cases except for cases where we would need to destruct manually such as: ``` struct S { ~S() { foo(); } }; void foo() { static S s; } ``` However, this is broken in many other ways on the GPU, so it is not regressing any support, simply increasing the scope of what we can handle. This changes the handling of ctors / dtors. This patch now outputs an informational message regarding the deprecation if the old format is used. This will be completely removed in a later release. 
Depends on: https://github.com/llvm/llvm-project/pull/71549	2023-11-10 14:53:53 -06:00
Ilya Leoshkevich	72552fc5cb	[OpenMP][SystemZ] Compile __kmpc_omp_task_begin_if0() with backchain (#71834) OpenMP runtime fails to build on SystemZ with clang with the following error message: LLVM ERROR: Unsupported stack frame traversal count __kmpc_omp_task_begin_if0() uses OMPT_GET_FRAME_ADDRESS(1), which delegates to __builtin_frame_address(), which in turn works with nonzero values on SystemZ only if backchain is in use. If backchain is not in use, the above error is emitted. Compile __kmpc_omp_task_begin_if0() with backchain. Note that this only resolves the build error. If at runtime its caller is compiled without backchain, __builtin_frame_address() will produce an incorrect value, but will not cause a crash. Since the value is relevant only for OMPT, this is acceptable.	2023-11-09 23:54:16 +01:00
Konstantinos Parasyris	b34d31d2e1	[OpenMP] Fix record-replay allocation order for kernel environment (#71863)	2023-11-09 12:51:22 -08:00
xingxue-ibm	90a9e9f638	[OpenMP] Fix a condition for KMP_OS_SOLARIS. (#71831) Line 75 of `z_Linux_util.cpp` checks `#ifdef KMP_OS_SOLARIS`, which is always true regardless of the build platform because the macro `KMP_OS_SOLARIS` is always defined in line 23 of `kmp_platform.h`: `#define KMP_OS_SOLARIS 0`.	2023-11-09 13:30:36 -05:00
Saiyedul Islam	21861991e7	[OpenMP] Cleanup and fixes for ABI agnostic DeviceRTL (#71234) Fixes the DeviceRTL compilation to ensure it is ABI agnostic. Uses the already available global variable "oclc_ABI_version" instead of "llvm.amdgcn.abi.verion". It also adds some minor fields to the ImplicitArg structure.	2023-11-09 10:34:35 +05:30
Jonathan Peyton	5cc603cb22	[OpenMP] Add skewed iteration distribution on hybrid systems (#69946) This commit adds skewed distribution of iterations in the nonmonotonic:dynamic schedule (static steal) for hybrid systems when thread affinity is assigned. Currently, it distributes the iterations at a 60:40 ratio. Consider this loop with dynamic schedule type: for (int i = 0; i < 100; ++i). In a hybrid system with 20 hardware threads (16 CORE and 4 ATOM cores), 88 iterations will be assigned to performance cores and 12 iterations will be assigned to efficient cores. Each thread on a CORE core will process 5 iterations plus extras, and each thread on an ATOM core will process 3 iterations. Differential Revision: https://reviews.llvm.org/D152955	2023-11-08 10:19:37 -06:00
Anton Rydahl	446e11acef	[OpenMP] Adding more libomptarget reduction tests (#71616) Based on https://github.com/llvm/llvm-project/pull/70766, I think it would be good to have a few more offloading reduction tests so we do not accidentally break minimum and maximum reductions again.	2023-11-07 20:39:08 -08:00
Shilei Tian	6d7457861b	[OpenMP][FIX] Fix the compile error introduced by reverting `eab828d`	2023-11-07 19:46:18 -05:00
Shilei Tian	6e574f125d	Revert "[OpenMP] Provide a specialized team reduction for the common case (#70766)" This reverts commit `eab828d46c`.	2023-11-07 19:16:44 -05:00
Johannes Doerfert	2d739f13d4	[OpenMP][Offload] Automatically map indirect function pointers (#71462) We already have all the information to automatically map function pointers that have been declared as `indirect` declare target by the user. This is just enabling and testing the functionality by looking through the one level of indirection.	2023-11-07 08:33:39 -08:00
Johannes Doerfert	002f422410	[OpenMP] Replace CUDART_VERSION with CUDA_VERSION	2023-11-06 12:30:40 -08:00
Johannes Doerfert	726ee40f52	[OpenMP] Move the recording code to account for KernelLaunchEnvironment We need to record late to account for the kernel launch environment as well as the potential changes in block and thread count.	2023-11-06 12:30:40 -08:00
Johannes Doerfert	3de645efe3	[OpenMP][NFC] Split the reduction buffer size into two components Before, we tracked the size of the teams reduction buffer in order to allocate it at runtime per kernel launch. This patch splits the number into two parts: the size of the reduction data (= all reduction variables) and the (maximal) length of the buffer. This will allow us to allocate less if we need less, e.g., if we have fewer teams than the maximal length. It also allows us to move code from clang's codegen into the runtime, as we now know how large the reduction data is.	2023-11-06 11:50:41 -08:00
Jan Patrick Lehr	07f5cf1992	[OpenMP][libomptarget] Fixes possible no-return warning (#70808) The UNREACHABLE macro resolves to message + trap, which may still warn, so we add a call to __builtin_unreachable.	2023-11-06 16:45:03 +01:00
Akash Banerjee	be59fe5028	[OpenMP][Flang] Fix some of the Fortran OpenMP Offloading tests target_map_common_block2.f90 - Fix the extra space in the print message. - #67164 fixes this, so the test is moved out of the failing set and its XFAIL marker is removed. basic-target-region-3D-array.f90 - Corrected the check to account for the new lines printed. Depends on #67319	2023-11-06 13:24:02 +00:00
Shilei Tian	db37d25c53	Revert "[OpenMP] Simplify parallel reductions (#70983)" This reverts commit `e9a48f9e05` because it breaks 3 sollve 5.0 tests: test_loop_reduction_and_device.c test_loop_reduction_bitand_device.c test_loop_reduction_multiply_device.c	2023-11-05 22:51:59 -05:00
Konstantinos Parasyris	d301a28950	[OpenMP] Guard Virtual Memory Management API and Types (#70986)	2023-11-03 16:24:18 -07:00
Johannes Doerfert	d3e7a48cbd	[OpenMP][NFC] Remove a no-op function	2023-11-03 10:28:36 -07:00
Neale Ferguson	1111ef0257	Add openmp support to System z (#66081) * openmp/README.rst - Add s390x to those platforms supported * openmp/libomptarget/plugins-nextgen/CMakeLists.txt - Add s390x subdirectory * openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt - Add s390x definitions * openmp/runtime/CMakeLists.txt - Add s390x to those platforms supported * openmp/runtime/cmake/LibompGetArchitecture.cmake - Define s390x ARCHITECTURE * openmp/runtime/cmake/LibompMicroTests.cmake - Add dependencies for System z (aka s390x) * openmp/runtime/cmake/LibompUtils.cmake - Add S390X to the mix * openmp/runtime/cmake/config-ix.cmake - Add s390x as a supported LIBOMP_ARCH * openmp/runtime/src/kmp_affinity.h - Define __NR_sched_[get|set]affinity for s390x * openmp/runtime/src/kmp_config.h.cmake - Define CACHE_LINE for s390x * openmp/runtime/src/kmp_os.h - Add KMP_ARCH_S390X to support checks * openmp/runtime/src/kmp_platform.h - Define KMP_ARCH_S390X * openmp/runtime/src/kmp_runtime.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/src/kmp_tasking.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h - Define ITT_ARCH_S390X * openmp/runtime/src/z_Linux_asm.S - Instantiate __kmp_invoke_microtask for s390x * openmp/runtime/src/z_Linux_util.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/test/ompt/callback.h - Define print_possible_return_addresses for s390x * openmp/runtime/tools/lib/Platform.pm - Return s390x as platform and host architecture * openmp/runtime/tools/lib/Uname.pm - Set hardware platform value for s390x	2023-11-03 12:42:55 +01:00
Brad Smith	b5b251aac8	[OpenMP] Add support for Solaris/x86_64 (#70593) Tested on `amd64-pc-solaris2.11`.	2023-11-02 23:29:02 -04:00
Johannes Doerfert	e9a48f9e05	[OpenMP] Simplify parallel reductions (#70983) A lot of the code was from a time when we had multiple parallel levels. The new runtime is much simpler, the code can be simplified a lot which should speed up reductions too.	2023-11-02 15:50:05 -07:00
Johannes Doerfert	eab828d46c	[OpenMP] Provide a specialized team reduction for the common case (#70766) We default to < 1024 teams if the user did not specify otherwise. As such, we can avoid the extra logic in the teams reduction that handles more than num_of_records (default 1024) teams. This is a stopgap but still shaves off 33% of the runtime in some simple reduction examples.	2023-11-02 15:49:22 -07:00
Johannes Doerfert	95e11a97f6	[OpenMP][FIX] Unbreak a fencing issue A recent update caused the fences to be team only while we always need kernel fences. Broke OpenMC on NVIDIA A100.	2023-11-02 15:04:10 -07:00
Jon Chesterfield	f0e100a05a	[amdgpu][openmp] Treat missing TIMESTAMP_FREQUENCY as non-fatal (#70987) If you build with dynamic_hsa, the symbol is known and compilation succeeds. If you then run with a slightly older libhsa, this argument is not recognised and an error is returned. I'd rather the program runs with a misleading omp wtime than refuses to run at all.	2023-11-01 22:43:34 +00:00
Johannes Doerfert	a8152086ff	[Attributor][FIX] Ensure new BBs are registered	2023-11-01 12:12:14 -07:00
Johannes Doerfert	a273d17d4a	[OpenMP][FIX] Do not add implicit argument to device Ctors and Dtors Constructors and destructors on the device do not take any arguments, also not the implicit dyn_ptr argument other kernels automatically take.	2023-11-01 11:18:11 -07:00
Johannes Doerfert	f9a89e6b9c	[OpenMP][FIX] Allocate per launch memory for GPU team reductions (#70752) We used to perform team reduction on global memory allocated in the runtime and by clang. This was racy as multiple instances of a kernel, or different kernels with team reductions, would use the same locations. Since we now have the kernel launch environment, we can allocate dynamic memory per-launch, allowing us to move all the state into a non-racy place. Fixes: https://github.com/llvm/llvm-project/issues/70249	2023-11-01 11:11:48 -07:00

3077 Commits
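20406af31b switches libomp to the compiler builtin for alloca on NetBSD. A minimal sketch of the idea follows, requesting stack memory from `__builtin_alloca` (a GCC/Clang builtin) instead of the libc alloca(3) the NetBSD linker warns about. The macro `KMP_ALLOCA_SKETCH` and the helper `stack_copy_len` are hypothetical names, not libomp code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro: route stack allocations to the GCC/Clang builtin
 * rather than the libc alloca(3) flagged by NetBSD's linker. */
#define KMP_ALLOCA_SKETCH(size) __builtin_alloca(size)

/* Hypothetical helper: copy a string into a stack buffer and return the
 * copied length. The buffer is released automatically when we return. */
size_t stack_copy_len(const char *s) {
  size_t n = strlen(s) + 1; /* include the terminating NUL */
  char *buf = KMP_ALLOCA_SKETCH(n);
  memcpy(buf, s, n);
  return strlen(buf);
}
```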
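f5e50b21da drops duplicate A -> B edges by looking only at the last entry of A's successor list. The sketch below uses a hypothetical, simplified `task_node` structure, not libomp's real dependency-node types, and assumes that the edges created for one new task are usually appended to a predecessor back to back. Under that assumption, `depend(out: x, y)` followed by `depend(in: x, y)` produces a single edge.

```c
#include <stddef.h>

#define MAX_SUCCESSORS 16

/* Hypothetical, simplified dependency node. */
typedef struct task_node {
  const char *name;
  struct task_node *successors[MAX_SUCCESSORS];
  size_t num_successors;
} task_node;

/* Add the edge pred -> succ unless succ is already the most recently
 * inserted successor. Returns 1 if added, 0 if skipped as a duplicate,
 * -1 if the fixed-size list is full (a limitation of this sketch). */
int add_successor(task_node *pred, task_node *succ) {
  if (pred->num_successors > 0 &&
      pred->successors[pred->num_successors - 1] == succ)
    return 0; /* same edge again, e.g. from the second shared dependence */
  if (pred->num_successors == MAX_SUCCESSORS)
    return -1;
  pred->successors[pred->num_successors++] = succ;
  return 1;
}
```

The check is O(1) because it inspects a single slot. It only catches duplicates that arrive consecutively, which matches the "trivial" multiple edges the commit targets.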
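5c22b907dc and e876250b63 state that copy requests are distributed round-robin across the available SDMA engines. A minimal sketch of such a selector uses a C11 atomic counter so that concurrent submitters get distinct tickets. The `engine_selector` type and `pick_engine` function are hypothetical. The sketch does not reproduce the ROCm/HSA engine identifiers the plugin actually passes.

```c
#include <stdatomic.h>

/* Hypothetical round-robin engine selector. */
typedef struct {
  atomic_uint next;     /* request counter, incremented per copy */
  unsigned num_engines; /* number of SDMA engines to rotate over */
} engine_selector;

/* Return the engine index for the next copy request. atomic_fetch_add
 * hands each concurrent caller a distinct ticket, and the modulo wraps
 * the tickets onto engines 0 .. num_engines-1 in order. */
unsigned pick_engine(engine_selector *sel) {
  unsigned ticket = atomic_fetch_add(&sel->next, 1u);
  return ticket % sel->num_engines;
}
```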
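7318fe6334 interleaves multiple reduction variables in an array-of-struct layout, so that one team index selects all of that team's variables. The sketch below is illustrative C rather than the generated device IR. It assumes a hypothetical reduction over a `double` sum and an `int` maximum, and the names `red_record`, `store_partial` and `combine` are made up.

```c
#include <stddef.h>

/* Hypothetical per-team record for reduction(+: sum) reduction(max: best). */
typedef struct {
  double sum;
  int best;
} red_record;

/* Array-of-struct layout: team t's copies of all variables live together
 * in buffer[t], so every variable is indexed by the same team number and
 * the field selects the variable. */
void store_partial(red_record *buffer, size_t team, double sum, int best) {
  buffer[team].sum = sum;
  buffer[team].best = best;
}

/* Final combine over the per-team records. */
red_record combine(const red_record *buffer, size_t num_teams) {
  red_record out = {0.0, 0};
  for (size_t t = 0; t < num_teams; ++t) {
    out.sum += buffer[t].sum;
    if (t == 0 || buffer[t].best > out.best)
      out.best = buffer[t].best;
  }
  return out;
}
```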
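237adfca4e moves global ctor/dtor handling into kernels that iterate a list of constructor and destructor functions. A minimal host-side sketch of such iteration follows. It assumes constructors run front to back and destructors back to front, the ELF `.init_array`/`.fini_array` convention, and it ignores the priorities that `llvm.global_ctors` entries carry. All names are hypothetical.

```c
#include <stddef.h>

typedef void (*xtor_fn)(void);

/* Run constructors in list order. */
void run_ctors(const xtor_fn *ctors, size_t n) {
  for (size_t i = 0; i < n; ++i)
    ctors[i]();
}

/* Run destructors in reverse list order. */
void run_dtors(const xtor_fn *dtors, size_t n) {
  for (size_t i = n; i > 0; --i)
    dtors[i - 1]();
}

/* Tiny trace so the call order can be inspected. */
int call_trace[8];
size_t call_count;

static void ctor_a(void) { call_trace[call_count++] = 1; }
static void ctor_b(void) { call_trace[call_count++] = 2; }
static void dtor_a(void) { call_trace[call_count++] = -1; }
static void dtor_b(void) { call_trace[call_count++] = -2; }

const xtor_fn demo_ctors[] = {ctor_a, ctor_b};
const xtor_fn demo_dtors[] = {dtor_a, dtor_b};
```

Calling `run_ctors(demo_ctors, 2)` and then `run_dtors(demo_dtors, 2)` records the order 1, 2, -2, -1.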
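The bug fixed by 90a9e9f638 is the classic `#ifdef` versus `#if` mix-up. `kmp_platform.h` defines every platform macro, using 0 for "not this platform", so `#ifdef` is always true and only `#if` tests the value. The two functions below are hypothetical and exist only to make the difference observable.

```c
/* Mirrors kmp_platform.h: the macro is always defined, 0 when the build
 * platform is not Solaris. */
#define KMP_OS_SOLARIS 0

/* Buggy form: #ifdef only asks whether the macro is defined. */
int solaris_check_ifdef(void) {
#ifdef KMP_OS_SOLARIS
  return 1; /* taken even though KMP_OS_SOLARIS is 0 */
#else
  return 0;
#endif
}

/* Fixed form: #if evaluates the macro's value. */
int solaris_check_if(void) {
#if KMP_OS_SOLARIS
  return 1;
#else
  return 0;
#endif
}
```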
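The 88/12 split quoted in 5cc603cb22 can be reproduced with simple integer arithmetic. Give each P-core thread weight 60 and each E-core thread weight 40, round each per-thread share down, and hand the leftover iterations to P-core threads. This is only a sketch of the arithmetic, not libomp's static-steal implementation, and `split_iterations` and `skew_split` are hypothetical names.

```c
/* Result of splitting n iterations between P-core and E-core threads. */
typedef struct {
  int per_p_thread; /* base iterations per P-core thread */
  int per_e_thread; /* base iterations per E-core thread */
  int extras;       /* leftovers, assigned to P-core threads */
  int p_total;      /* all iterations on P-core threads */
  int e_total;      /* all iterations on E-core threads */
} skew_split;

skew_split split_iterations(int n, int p_threads, int e_threads) {
  const int p_weight = 60, e_weight = 40; /* the 60:40 ratio */
  skew_split s = {0, 0, 0, 0, 0};
  int total_weight = p_threads * p_weight + e_threads * e_weight;
  if (total_weight <= 0)
    return s; /* no threads: nothing to distribute */
  s.per_p_thread = n * p_weight / total_weight; /* rounds down */
  s.per_e_thread = n * e_weight / total_weight; /* rounds down */
  s.extras = n - (p_threads * s.per_p_thread + e_threads * s.per_e_thread);
  s.p_total = p_threads * s.per_p_thread + s.extras;
  s.e_total = e_threads * s.per_e_thread;
  return s;
}
```

For n = 100 with 16 P-core and 4 E-core threads, the total weight is 1120. Each P-core thread gets 6000/1120 = 5 iterations and each E-core thread gets 4000/1120 = 3, leaving 8 extras. That gives 88 iterations on performance cores and 12 on efficient cores, matching the commit message.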
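3de645efe3 splits the teams reduction buffer size into the size of one team's reduction data and the maximal buffer length, so a launch can allocate only what it needs. A minimal sketch of the resulting sizing rule follows; the function name is hypothetical. The default length of 1024 in the usage note comes from the num_of_records default mentioned in eab828d46c.

```c
#include <stddef.h>

/* Hypothetical sizing helper.
 * reduction_data_size: bytes for one team's copy of all reduction variables.
 * max_buffer_len:      maximal number of team slots the buffer may hold.
 * num_teams:           teams in this launch. */
size_t team_reduction_alloc_size(size_t reduction_data_size,
                                 size_t max_buffer_len, size_t num_teams) {
  size_t slots = num_teams < max_buffer_len ? num_teams : max_buffer_len;
  return reduction_data_size * slots;
}
```

With 16 bytes of reduction data and a length of 1024, a 100-team launch needs 1600 bytes instead of 16384.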
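07f5cf1992 appends `__builtin_unreachable` to an UNREACHABLE macro that otherwise prints a message and traps. The sketch below assumes only that shape from the commit message. `MY_UNREACHABLE`, `copy_dir` and `dir_name` are hypothetical, and both builtins are GCC/Clang extensions. The final `__builtin_unreachable()` tells the compiler control cannot fall off the end of `dir_name`, which suppresses the missing-return warning.

```c
#include <stdio.h>

/* Hypothetical macro: report, trap, then mark the path as unreachable. */
#define MY_UNREACHABLE(msg)                                                \
  do {                                                                     \
    fprintf(stderr, "unreachable: %s (%s:%d)\n", (msg), __FILE__,         \
            __LINE__);                                                     \
    __builtin_trap();                                                      \
    __builtin_unreachable();                                               \
  } while (0)

typedef enum { DIR_H2D, DIR_D2H } copy_dir;

const char *dir_name(copy_dir d) {
  switch (d) {
  case DIR_H2D:
    return "host-to-device";
  case DIR_D2H:
    return "device-to-host";
  }
  MY_UNREACHABLE("invalid copy direction");
}
```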