Currently there's an edge case where constant indexing in target
regions can lead to incorrect results, because we do not correctly
replace uses of mapped variables in generated target functions with the
target arguments (and accessor instructions) that replace them. This
patch seeks to fix that by extending the current logic in the
OMPIRBuilder.
Things like GEPs can come in the form of Constants/ConstantExprs.
Constants and ConstantExprs do not know what they are contained in, so
we must dig a little to find an instruction, so we can tell whether
they are used inside the function we are outlining. Only then can we be
sure they are replaceable and that we are not accidentally replacing a
usage somewhere else in the module that is still necessary.
This patch handles these cases by replacing the original constant
expression with an equivalent new instruction. An instruction allows
easy modification in the following loop: we now know the constant (as
an instruction) is owned by our target function, and replaceUsesOfWith
can be invoked on it (this does not seem possible with constants).
Creating a brand new instruction also lets us be cautious, as it is
possible the old expression was used inside the function but also
exists and is used externally (unlikely by the nature of a Constant,
but still a positive side effect).
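A minimal sketch of the approach, with hypothetical helper and variable names (the real OMPIRBuilder logic differs in detail):

```
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"

// Hypothetical sketch: rewrite uses of a mapped value that reach the
// outlined function through a ConstantExpr.
static void replaceConstantUseInFunc(llvm::ConstantExpr *CE,
                                     llvm::Value *MappedVal,
                                     llvm::Value *TargetArg,
                                     llvm::Function *OutlinedFn) {
  for (llvm::User *U : llvm::make_early_inc_range(CE->users())) {
    auto *UserI = llvm::dyn_cast<llvm::Instruction>(U);
    // Constants are module-owned; only an Instruction user tells us the
    // use actually lives inside the function we are outlining.
    if (!UserI || UserI->getFunction() != OutlinedFn)
      continue;
    // Materialize an Instruction equivalent of the ConstantExpr. Unlike
    // the Constant, it is owned by the function and can be edited with
    // replaceUsesOfWith without disturbing external users of CE.
    llvm::Instruction *NewI = CE->getAsInstruction();
    NewI->insertBefore(UserI);
    NewI->replaceUsesOfWith(MappedVal, TargetArg);
    UserI->replaceUsesOfWith(CE, NewI);
  }
}
```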
…A engines (#71801)
This enables the AMDGPU plugin to use a new ROCm 5.7 interface to
dispatch asynchronous data transfers across SDMA engines.
The default functionality stays unchanged, meaning that all data
transfers are enqueued into an H2D queue or a D2H queue, depending on
transfer direction, via the HSA interface used previously.
The new interface can be enabled via the environment variable
`LIBOMPTARGET_AMDGPU_USE_MULTIPLE_SDMA_ENGINES=true` when libomptarget
is built against a recent ROCm version (5.7 and later). As of now,
requests are distributed in a round-robin fashion across available SDMA
engines.
This causes the tests to fail because the bots were not updated in time.
Revert until we update the bots to a valid version.
This reverts commit e876250b636522d1eb05a908f2e1cd451feab001.
This enables the AMDGPU plugin to use a new ROCm 5.7 interface to
dispatch asynchronous data transfers across SDMA engines.
The default functionality stays unchanged, meaning that all data
transfers are enqueued into an H2D queue or a D2H queue, depending on
transfer direction, via the HSA interface used previously.
The new interface can be enabled via the environment variable
`LIBOMPTARGET_AMDGPU_USE_MULTIPLE_SDMA_ENGINES=true` when libomptarget
is built against a recent ROCm version (5.7 and later).
As of now, requests are distributed in a round-robin fashion across
available SDMA engines.
If we have more than one reduction variable we need to be consistent
with respect to indexing. In 3de645efe30b83ba1b6d7e500486c4f441a17a61
we broke this: the buffer type was reduced to a singleton but the index
computation was not adjusted to account for that offset. This fixes it
by interleaving the reduction variables properly in an array-of-structs
style. We can revert back to struct-of-arrays in a follow-up if this
turns out to be a problem. I doubt it, since half the accesses should
benefit from the locality this layout offers and only the other half
were consecutive before.
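For illustration, the two layouts index the buffer as follows (names are mine, not the runtime's):

```
#include <cstddef>

// Illustrative indexing for a teams-reduction buffer holding NumVars
// reduction variables for NumTeams teams.

// Array-of-structs: all variables of one team are adjacent (the layout
// this patch establishes).
size_t aosIndex(size_t TeamId, size_t VarId, size_t NumVars) {
  return TeamId * NumVars + VarId;
}

// Struct-of-arrays: all teams' copies of one variable are adjacent (the
// possible follow-up layout).
size_t soaIndex(size_t TeamId, size_t VarId, size_t NumTeams) {
  return VarId * NumTeams + TeamId;
}
```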
Summary:
This patch reworks how we handle global constructors in OpenMP.
Previously, we emitted individual kernels that were all registered and
called individually. In order to provide more generic support, this
patch moves all handling of this to the target backend and the runtime
plugin. This has the benefit of supporting the GNU extensions for
constructors and destructors, removing a class of failures related to
shared library destruction order, and allows targets other than OpenMP
to use the same support without needing to change the frontend.
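For reference, these are the GNU-extension functions the new scheme covers (a trivial example, not from the patch):

```
// GNU-extension constructors/destructors, now handled uniformly by the
// backend-emitted init/fini handling described below.
__attribute__((constructor)) static void my_ctor() { /* runs at image load */ }
__attribute__((destructor)) static void my_dtor() { /* runs at image unload */ }
```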
This is primarily done by calling kernels that the backend emits to
iterate over a list of ctor / dtor functions. For x64, this is automatic and
we get it for free with the standard `dlopen` handling. For AMDGPU, we
emit `amdgcn.device.init` and `amdgcn.device.fini` functions which
handle everything automatically and simply need to be called. For NVPTX,
a patch https://github.com/llvm/llvm-project/pull/71549 provides the
kernels to call, but the runtime needs to set up the array manually by
pulling out all the known constructor / destructor functions.
One concession that this patch requires is the change that for GPU
targets in OpenMP offloading we will use `llvm.global_dtors` instead of
using `atexit`. This is because `atexit` is a separate runtime function
that does not mesh well with the handling we're trying to do here. This
should be equivalent in all cases except for cases where we would need
to destruct manually such as:
```
void foo();
struct S { ~S() { foo(); } };
void foo() {
  static S s;
}
```
However this is broken in many other ways on the GPU, so it is not
regressing any support, simply increasing the scope of what we can
handle.
This changes the handling of ctors / dtors. This patch now outputs an
informational message regarding the deprecation if the old format is used.
This will be completely removed in a later release.
Depends on: https://github.com/llvm/llvm-project/pull/71549
OpenMP runtime fails to build on SystemZ with clang with the following
error message:
LLVM ERROR: Unsupported stack frame traversal count
__kmpc_omp_task_begin_if0() uses OMPT_GET_FRAME_ADDRESS(1), which
delegates to __builtin_frame_address(), which in turn works with nonzero
values on SystemZ only if backchain is in use. If backchain is not in
use, the above error is emitted.
Compile __kmpc_omp_task_begin_if0() with backchain. Note that this only
resolves the build error. If at runtime its caller is compiled without
backchain, __builtin_frame_address() will produce an incorrect value,
but will not cause a crash. Since the value is relevant only for OMPT,
this is acceptable.
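A hedged sketch of the idea, using the per-function target attribute (the actual patch may express this differently):

```
#if defined(__s390x__)
// Compile just this function with backchain so __builtin_frame_address(1)
// can walk one frame up even if the rest of the program lacks backchain.
__attribute__((target("backchain")))
#endif
static void *parent_frame(void) {
  return __builtin_frame_address(1);
}
```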
Line 75 of `z_Linux_util.cpp` checks `#ifdef KMP_OS_SOLARIS`, which is
always true regardless of the build platform, because the macro
`KMP_OS_SOLARIS` is always defined in line 23 of `kmp_platform.h`:
`#define KMP_OS_SOLARIS 0`.
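A minimal illustration of the difference:

```
#define KMP_OS_SOLARIS 0  // kmp_platform.h always defines the macro

#ifdef KMP_OS_SOLARIS     // wrong: the macro is defined, so always true
// ...
#endif

#if KMP_OS_SOLARIS        // right: only taken when the value is nonzero
// ...
#endif
```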
Fixes the DeviceRTL compilation to ensure it is ABI agnostic. Uses the
already available global variable "oclc_ABI_version" instead of
"llvm.amdgcn.abi.version".
It also adds some minor fields to the ImplicitArg structure.
This commit adds skewed distribution of iterations in the
nonmonotonic:dynamic schedule (static steal) for hybrid systems when
thread affinity is assigned. Currently, it distributes the iterations at
a 60:40 ratio. Consider a loop with dynamic schedule type,
`for (int i = 0; i < 100; ++i)`. In a hybrid system with 20 hardware
threads (16 CORE and 4 ATOM cores), 88 iterations will be assigned to
the performance cores and 12 iterations will be assigned to the
efficient cores. Each thread on a CORE will process 5 iterations +
extras, and each thread on an ATOM will process 3 iterations.
Differential Revision: https://reviews.llvm.org/D152955
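A rough model of the arithmetic in that example (illustrative only; the runtime's actual computation may differ):

```
#include <cstdio>

int main() {
  // The example above: 100 iterations, 16 CORE and 4 ATOM hardware threads.
  const int Iters = 100, CoreThreads = 16, AtomThreads = 4;
  const int CoreChunk = 5, AtomChunk = 3; // skewed per-thread base chunks
  int Base = CoreThreads * CoreChunk + AtomThreads * AtomChunk; // 92
  int Extras = Iters - Base; // 8 extra iterations go to the CORE threads
  std::printf("CORE total: %d, ATOM total: %d\n",
              CoreThreads * CoreChunk + Extras,  // 88
              AtomThreads * AtomChunk);          // 12
}
```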
Based on https://github.com/llvm/llvm-project/pull/70766 I think it
would be good to have a few more offloading reduction tests, so we do
not accidentally break minimum and maximum reductions again.
We already have all the information to automatically map function
pointers that have been declared as `indirect` declare target by the
user. This is just enabling and testing the functionality by looking
through the one level of indirection.
Before, we tracked the size of the teams reduction buffer in order to
allocate it at runtime per kernel launch. This patch splits that number
into two parts: the size of the reduction data (= all reduction
variables) and the (maximal) length of the buffer. This will allow us to
allocate less if we need less, e.g., if we have fewer teams than the
maximal length. It also allows us to move code from clang's codegen into
the runtime, as we now know how large the reduction data is.
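Conceptually, the allocation becomes something like this hedged sketch (names are illustrative, not the runtime's):

```
#include <algorithm>
#include <cstddef>
#include <cstdlib>

// Per-launch teams-reduction buffer sized from the two parts this patch
// separates: the size of one reduction-data record and the maximal
// number of records.
void *allocReductionBuffer(size_t ReductionDataSize, // all reduction vars
                           size_t MaxBufferLength,   // maximal #records
                           size_t NumTeams) {
  // Allocate only as many records as this launch can actually use.
  size_t NumRecords = std::min(NumTeams, MaxBufferLength);
  return std::malloc(NumRecords * ReductionDataSize);
}
```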
target_map_common_block2.f90
- Fix the extra space in the print message.
- #67164 fixes this, so move it out of the failing folder and remove the XFAIL marker.
basic-target-region-3D-array.f90
- Corrected the check to account for the new lines printed.
Depends on #67319
This reverts commit e9a48f9e05c103a235993c6b15a2c36442a2ddc1 because it breaks
3 sollve 5.0 tests:
test_loop_reduction_and_device.c
test_loop_reduction_bitand_device.c
test_loop_reduction_multiply_device.c
* openmp/README.rst
- Add s390x to those platforms supported
* openmp/libomptarget/plugins-nextgen/CMakeLists.txt
- Add s390x subdirectory
* openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt
- Add s390x definitions
* openmp/runtime/CMakeLists.txt
- Add s390x to those platforms supported
* openmp/runtime/cmake/LibompGetArchitecture.cmake
- Define s390x ARCHITECTURE
* openmp/runtime/cmake/LibompMicroTests.cmake
- Add dependencies for System z (aka s390x)
* openmp/runtime/cmake/LibompUtils.cmake
- Add S390X to the mix
* openmp/runtime/cmake/config-ix.cmake
- Add s390x as a supported LIBOMP_ARCH
* openmp/runtime/src/kmp_affinity.h
- Define __NR_sched_[get|set]affinity for s390x
* openmp/runtime/src/kmp_config.h.cmake
- Define CACHE_LINE for s390x
* openmp/runtime/src/kmp_os.h
- Add KMP_ARCH_S390X to support checks
* openmp/runtime/src/kmp_platform.h
- Define KMP_ARCH_S390X
* openmp/runtime/src/kmp_runtime.cpp
- Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/src/kmp_tasking.cpp
- Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h
- Define ITT_ARCH_S390X
* openmp/runtime/src/z_Linux_asm.S
- Instantiate __kmp_invoke_microtask for s390x
* openmp/runtime/src/z_Linux_util.cpp
- Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/test/ompt/callback.h
- Define print_possible_return_addresses for s390x
* openmp/runtime/tools/lib/Platform.pm
- Return s390x as platform and host architecture
* openmp/runtime/tools/lib/Uname.pm
- Set hardware platform value for s390x
A lot of the code was from a time when we had multiple parallel levels.
The new runtime is much simpler; the code can be simplified a lot, which
should speed up reductions too.
We default to < 1024 teams if the user did not specify otherwise. As
such, we can avoid the extra logic in the teams reduction that handles
more than num_of_records (default 1024) teams. This is a stopgap, but it
still shaves off 33% of the runtime in some simple reduction examples.
If you build with dynamic_hsa, the symbol is known and compilation
succeeds. If you then run with a slightly older libhsa, this argument is
not recognised and an error is returned. I'd rather the program run with
a misleading omp wtime than refuse to run at all.
We used to perform team reduction on global memory allocated in the
runtime and by clang. This was racy as multiple instances of a kernel,
or different kernels with team reductions, would use the same locations.
Since we now have the kernel launch environment, we can allocate dynamic
memory per-launch, allowing us to move all the state into a non-racy
place.
Fixes: https://github.com/llvm/llvm-project/issues/70249
The KernelEnvironment is for compile time information about a kernel. It
allows the compiler to feed information to the runtime. The
KernelLaunchEnvironment is for dynamic information *per* kernel launch.
It allows the runtime to feed information to the kernel that is not
shared with other invocations of the kernel. The first use case is to
replace the globals that synchronize teams reductions with per-launch
versions. This allows concurrent teams reductions. More use cases will
follow, e.g., per launch memory pools.
Fixes: https://github.com/llvm/llvm-project/issues/70249
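An illustrative shape of the two environments (field names are hypothetical, not the runtime's):

```
#include <cstdint>

// Compile-time, per-kernel: the compiler feeds information to the runtime.
struct KernelEnvironmentTy {
  uint32_t MinThreads;
  uint32_t MaxThreads;
};

// Dynamic, per-launch: the runtime feeds information to the kernel that
// is not shared with other invocations, e.g., a private reduction buffer.
struct KernelLaunchEnvironmentTy {
  void *ReductionBuffer; // replaces the racy global buffer
  uint32_t ReductionCnt; // synchronization counter, private per launch
};
```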
This was a regression introduced by myself in 6a62707c04, where I too
hastily removed the basic handling of implicit captures that we
currently have. This will be superseded by all implicit captures being
added to target operations' map_info entries in a soon-to-land series of
patches; however, that is currently not the case, so we must continue to
do some basic handling of these captures for the time being. This patch
re-adds that behaviour to avoid regressions.
Unfortunately this means some test changes as well, as
getUsedValuesDefinedAbove grabs constants used outside of the target
region, which aren't handled particularly well currently.
I think it follows from the HSA spec that a write to the first byte is
deemed significant to the GPU, in which case writing to the second short
and reading back from it later would be safe. However, the examples for
this all involve an atomic write to the first 32 bits, and it seems a
credible risk that the occasional CI errors about invalid packets have
as their root cause the firmware noticing the early write to
packet->setup and treating that as a sign that the packet is ready to go.
That was overly paranoid; however, in passing I noticed the code in libc
is genuinely invalid. The memset writes a zero to the header byte,
changing it from type_invalid (1) to type_vendor (0), at which point the
GPU is free to read the 64-byte packet and interpret it as a vendor
packet, which is probably why libc CI periodically errors about invalid
packets.
Also a drive-by change to do the atomic store on a uint32_t
consistently. I'm not sure offhand what __atomic_store_n on a uint16_t*
and an int resolves to; it seems better to be unambiguous there.
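The unambiguous form is a single 32-bit release store covering both 16-bit fields, roughly:

```
#include <cstdint>

// Publish an AQL packet header unambiguously: combine the 16-bit header
// and 16-bit setup fields into one word and release-store it, so the GPU
// can never observe a half-initialized packet.
void publishPacket(uint32_t *PacketHeaderWord, uint16_t Header,
                   uint16_t Setup) {
  uint32_t Word = static_cast<uint32_t>(Header) |
                  (static_cast<uint32_t>(Setup) << 16);
  __atomic_store_n(PacketHeaderWord, Word, __ATOMIC_RELEASE);
}
```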
This patch seeks to add initial lowering of OpenMP array sections within
target region map clauses from MLIR to LLVM IR.
This patch seeks to support fixed-size contiguous arrays initially (I
don't think OpenMP supports anything other than contiguous sections from
my reading, but I could be wrong), before looking toward assumed-size
and assumed-shape arrays. The patch also currently does not include
stride; that is left for future work.
That said, assumed-size arrays work in some fashion (dummy arguments)
with some minor alterations to the OMPEarlyOutliner, so it is possible
the changes made in the IsolatedFromAbove series will allow this to work
with no further patches required.
It utilises the generated omp.bounds to calculate the size of the mapped
OpenMP array (both for sectioned and un-sectioned arrays) as well as the
offset to be passed to the kernel argument structure.
Alongside these changes some refactoring of how map data is handled is
attempted, using a new MapData structure to keep track of information
utilised in the lowering of mapped values.
Also added is an initial, more complex createDeviceArgumentAccessor that
utilises capture kinds similarly to (and loosely based on) Clang to
generate different kernel argument accesses.
A similar function for altering how the kernel argument is passed to the
kernel argument structure on the host is also utilised
(createAlteredByCaptureMap), which allows modification of the
pointer/basePointer based on their capture (and bounds information).
Of note: ByRef is the default for explicit mappings and ByCopy is the
default for implicit captures, so the former is currently tested in this
patch and the latter is not for the moment.
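For illustration, the size calculation from the generated omp.bounds described above reduces to roughly the following (names are mine; strides omitted, matching the patch):

```
#include <cstdint>
#include <vector>

// For a contiguous array section, the mapped size is the product of the
// per-dimension extents times the element size.
uint64_t sectionSizeBytes(const std::vector<uint64_t> &LowerBounds,
                          const std::vector<uint64_t> &UpperBounds,
                          uint64_t ElemSizeBytes) {
  uint64_t Elements = 1;
  for (size_t I = 0; I < LowerBounds.size(); ++I)
    Elements *= UpperBounds[I] - LowerBounds[I] + 1;
  return Elements * ElemSizeBytes;
}
```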
Looking at 855d09855d8e541176758f38015e8b9b522d6110, it looks like a bit
was missing. The padding variable is used further down by the
KMP_ALLOCA() function.
The commit was discussed in phabricator
(https://reviews.llvm.org/D157186).
Record replay currently fails on AMD as it conflicts with the heap
memory allocator introduced in #69806. The workaround is setting
`LIBOMPTARGET_HEAP_SIZE=0` during both the record and the replay run.
By associating the kernel environment with the generic kernel we can
access middle-end information easily, including the launch bounds ranges
that are acceptable. By constraining the number of threads accordingly,
we now obey the user-provided bounds that were passed via attributes.
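Conceptually, the thread selection becomes (a hedged sketch with hypothetical names):

```
#include <algorithm>
#include <cstdint>

// Constrain the requested thread count to the launch-bound range
// recorded in the kernel environment.
uint32_t chooseNumThreads(uint32_t Requested, uint32_t MinThreads,
                          uint32_t MaxThreads) {
  return std::clamp(Requested, MinThreads, MaxThreads);
}
```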
This patch adds basic tests for map clause on target construct for
commonblocks. There will be more tests to add, which will be added in
future patches. Currently failing tests are added in a separate folder
with XFAIL. They should be moved as they are fixed.
This reverts commit ddbaa11e9f43a38d50d62a9b9b07c3653b6bf8ab.
Reapply the original commit; the broken test was repaired in 5e51363f38d083ab326736c0d4d1b5f9fe0de080 in the meantime.