llvm-capstone

mirror of https://github.com/capstone-engine/llvm-capstone.git synced 2024-11-23 05:40:09 +00:00

Author	SHA1	Message	Date
Neale Ferguson	1111ef0257	Add openmp support to System z (#66081 ) * openmp/README.rst - Add s390x to those platforms supported * openmp/libomptarget/plugins-nextgen/CMakeLists.txt - Add s390x subdirectory * openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt - Add s390x definitions * openmp/runtime/CMakeLists.txt - Add s390x to those platforms supported * openmp/runtime/cmake/LibompGetArchitecture.cmake - Define s390x ARCHITECTURE * openmp/runtime/cmake/LibompMicroTests.cmake - Add dependencies for System z (aka s390x) * openmp/runtime/cmake/LibompUtils.cmake - Add S390X to the mix * openmp/runtime/cmake/config-ix.cmake - Add s390x as a supported LIPOMP_ARCH * openmp/runtime/src/kmp_affinity.h - Define __NR_sched_[get\|set]addinity for s390x * openmp/runtime/src/kmp_config.h.cmake - Define CACHE_LINE for s390x * openmp/runtime/src/kmp_os.h - Add KMP_ARCH_S390X to support checks * openmp/runtime/src/kmp_platform.h - Define KMP_ARCH_S390X * openmp/runtime/src/kmp_runtime.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/src/kmp_tasking.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h - Define ITT_ARCH_S390X * openmp/runtime/src/z_Linux_asm.S - Instantiate __kmp_invoke_microtask for s390x * openmp/runtime/src/z_Linux_util.cpp - Generate code when KMP_ARCH_S390X is defined * openmp/runtime/test/ompt/callback.h - Define print_possible_return_addresses for s390x * openmp/runtime/tools/lib/Platform.pm - Return s390x as platform and host architecture * openmp/runtime/tools/lib/Uname.pm - Set hardware platform value for s390x	2023-11-03 12:42:55 +01:00
Brad Smith	b5b251aac8	[OpenMP] Add support for Solaris/x86_64 (#70593 ) Tested on `amd64-pc-solaris2.11`.	2023-11-02 23:29:02 -04:00
Johannes Doerfert	e9a48f9e05	[OpenMP] Simplify parallel reductions (#70983 ) A lot of the code was from a time when we had multiple parallel levels. The new runtime is much simpler, the code can be simplified a lot which should speed up reductions too.	2023-11-02 15:50:05 -07:00
Johannes Doerfert	eab828d46c	[OpenMP] Provide a specialized team reduction for the common case (#70766 ) We default to < 1024 teams if the user did not specify otherwise. As such we can avoid the extra logic in the teams reduction that handles more than num_of_records (default 1024) teams. This is a stopgap but still shaves of 33% of the runtime in some simple reduction examples.	2023-11-02 15:49:22 -07:00
Johannes Doerfert	95e11a97f6	[OpenMP][FIX] Unbreak a fencing issue A recent update caused the fences to be team only while we always need kernel fences. Broke OpenMC on NVIDIA A100.	2023-11-02 15:04:10 -07:00
Jon Chesterfield	f0e100a05a	[amdgpu][openmp] Treat missing TIMESTAMP_FREQUENCY as non-fatal (#70987 ) If you build with dynamic_hsa, the symbol is known and compilation succeeds. If you then run with a slightly older libhsa, this argument is not recognised and an error returned. I'd rather the program runs with a misleading omp wtime than refuses to run at all.	2023-11-01 22:43:34 +00:00
Johannes Doerfert	a8152086ff	[Attributor][FIX] Ensure new BBs are registered	2023-11-01 12:12:14 -07:00
Johannes Doerfert	a273d17d4a	[OpenMP][FIX] Do not add implicit argument to device Ctors and Dtors Constructors and destructors on the device do not take any arguments, also not the implicit dyn_ptr argument other kernels automatically take.	2023-11-01 11:18:11 -07:00
Johannes Doerfert	f9a89e6b9c	[OpenMP][FIX] Allocate per launch memory for GPU team reductions (#70752 ) We used to perform team reduction on global memory allocated in the runtime and by clang. This was racy as multiple instances of a kernel, or different kernels with team reductions, would use the same locations. Since we now have the kernel launch environment, we can allocate dynamic memory per-launch, allowing us to move all the state into a non-racy place. Fixes: https://github.com/llvm/llvm-project/issues/70249	2023-11-01 11:11:48 -07:00
Johannes Doerfert	b8cbc5c02c	[OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401 ) The KernelEnvironment is for compile time information about a kernel. It allows the compiler to feed information to the runtime. The KernelLaunchEnvironment is for dynamic information per kernel launch. It allows the rutime to feed information to the kernel that is not shared with other invocations of the kernel. The first use case is to replace the globals that synchronize teams reductions with per-launch versions. This allows concurrent teams reductions. More uses cases will follow, e.g., per launch memory pools. Fixes: https://github.com/llvm/llvm-project/issues/70249	2023-10-31 19:38:43 -07:00
Johannes Doerfert	6258da14d6	[OpenMP] Lower synchronization threshold for reductions This should provide an easy performance boost by only avoiding synchronization that was unnessary anyway.	2023-10-30 22:39:46 -07:00
Johannes Doerfert	e137af60cd	[OpenMP][NFC] Fix test to actually check for the result	2023-10-30 17:15:41 -07:00
Andrew Gozillon	68c384676c	[Flang][MLIR][OpenMP] Temporarily re-add basic handling of uses in target regions to avoid gfortran test-suite regressions This was a regression introduced by myself in: `6a62707c04` where I too hastily removed the basic handling of implicit captures we have currently. This will be superseded by all implicit captures being added to target operations map_info entries in a soon landing series of patches, however, that is currently not the case so we must continue to do some basic handling of these captures for the time being. This patch re-adds that behaviour to avoid regressions. Unfortunately this means some test changes as well as getUsedValuesDefinedAbove grabs constants used outside of the target region which aren't handled particularly well currently.	2023-10-30 15:10:12 -05:00
Shilei Tian	0d5b7dd25c	[OpenMP] Add a test for D158802 (#70678 ) In D158802 we honored user's `thread_limit` value even with the optimization introduced in D152014. This patch adds a simple test.	2023-10-30 15:59:05 -04:00
Jon Chesterfield	896749aa0d	[amdgpu][openmp] Avoiding writing to packet header twice (#70695 ) I think it follows from the HSA spec that a write to the first byte is deemed significant to the GPU in which case writing to the second short and reading back from it later would be safe. However, the examples for this all involve an atomic write to the first 32 bits and it seems a credible risk that the occasional CI errors abound invalid packets have as their root cause that the firmware notices the early write to packet->setup and treats that as a sign that the packet is ready to go. That was overly-paranoid, however in passing noticed the code in libc is genuinely invalid. The memset writes a zero to the header byte, changing it from type_invalid (1) to type_vendor (0), at which point the GPU is free to read the 64 byte packet and interpret it as a vendor packet, which is probably why libc CI periodically errors about invalid packets. Also a drive by change to do the atomic store on a uint32_t consistently. I'm not sure offhand what __atomic_store_n on a uint16_t* and an int resolves to, seems better to be unambiguous there.	2023-10-30 18:35:52 +00:00
agozillon	6a62707c04	[Flang][OpenMP][MLIR] Initial array section mapping MLIR -> LLVM-IR lowering utilising omp.bounds (#68689 ) This patch seeks to add initial lowering of OpenMP array sections within target region map clauses from MLIR to LLVM IR. This patch seeks to support fixed sized contiguous (don't think OpenMP supports anything other than contiguous sections from my reading but i could be wrong) arrays initially, before looking toward assumed size and shaped arrays. The patch also currently does not include stride, it's left for future work. Although, assumed size works in some fashion (dummy arguments) with some minor alterations to the OMPEarlyOutliner, so it is possible changes made in the IsolatedFromAbove series may allow this to work with no further required patches. It utilises the generated omp.bounds to calculate the size of the mapped OpenMP array (both for sectioned and un-sectioned arrays) as well as the offset to be passed to the kernel argument structure. Alongside these changes some refactoring of how map data is handled is attempted, using a new MapData structure to keep track of information utilised in the lowering of mapped values. The initial addition of a more complex createDeviceArgumentAccessor that utilises capture kinds similarly to (and loosely based on) Clang to generate different kernel argument accesses is also added. A similar function for altering how the kernel argument is passed to the kernel argument structure on the host is also utilised (createAlteredByCaptureMap), which allows modification of the pointer/basePointer based on their capture (and bounds information). It's of note ByRef, is the default for explicit mappings and ByCopy will be the default for implicit captures, so the former is currently tested in this patch and the latter is not for the moment.	2023-10-30 16:00:23 +01:00
Brad Smith	0a29879e41	[OpenMP] Add missing bit with the Hurd support (#70609 ) Looking at `855d09855d` it looks like a bit was missing. The padding variable is used further down by the KMP_ALLOCA() function.	2023-10-29 22:35:03 -04:00
Brad Smith	0d1da7c37f	[OpenMP] Make use of getloadavg() on *BSD OS's (#70586 ) OpenBSD does not have /proc filesystem, neither does FreeBSD (by default).	2023-10-29 18:30:11 -04:00
Konstantinos Parasyris	d6a3d6b96d	[openmp] Fixed Support for VA for record-replay. (#70396 ) The commit was discussed in phabricator (https://reviews.llvm.org/D157186). Record replay currently fails on AMD as it conflicts with the heap memory allocator introduced in #69806. The workaround is setting `LIBOMPTARGET_HEAP_SIZE=0` during both record and replay run.	2023-10-29 12:27:19 -07:00
Johannes Doerfert	d346c82435	[OpenMP] Associate the KernelEnvironment with the GenericKernelTy (#70383 ) By associating the kernel environment with the generic kernel we can access middle-end information easily, including the launch bounds ranges that are acceptable. By constraining the number of threads accordingly, we now obey the user-provided bounds that were passed via attributes.	2023-10-29 11:35:34 -07:00
Brad Smith	223852aecf	[OpenMP] Fix building for 32-bit DragonFly, NetBSD, OpenBSD (#70527 ) Fixing ```#error "Unknown or unsupported OS"```	2023-10-27 22:53:24 -04:00
Konstantinos Parasyris	01828c4323	[OpenMP] record-replay use static-cast (#70516 ) [OpenMP] Fixes #69905	2023-10-27 16:46:34 -07:00
Shraiysh	03485a0406	[openmp][flang] Add tests for map clause (#70394 ) This patch adds basic tests for map clause on target construct for commonblocks. There will be more tests to add, which will be added in future patches. Currently failing tests are added in a separate folder with XFAIL. They should be moved as they are fixed.	2023-10-27 09:35:06 -07:00
Mehdi Amini	f390a76b7e	Revert "Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 )"" This reverts commit `ddbaa11e9f`. Reapply the original commit, the broken test was repaired in `5e51363f38` in the meantime.	2023-10-26 17:30:01 -07:00
Mehdi Amini	ddbaa11e9f	Revert "[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 )" This reverts commit `c2a1249a82`. The MLIR bots are broken with an omp test failure.	2023-10-26 17:25:20 -07:00
Johannes Doerfert	c2a1249a82	[OpenMP][NFC] Add min/max threads/teams count into the KernelEnvironment (#70257 ) The runtime needs to know about the acceptable launch bounds, especially if the compiler (middle- or backend) assumed those bounds. While this patch does not yet inform the runtime, it stores the bounds in a place that can/will be accessed and is associated with the kernel.	2023-10-26 14:46:55 -07:00
Johannes Doerfert	0012b956f9	[OpenMP][FIX] Move workaround code to avoid races The workaround code ensure we always call __kmpc_kernel_parallel, but it did so in a racy manner as the initialization might not have been completed yet. To avoid introducing a sync, we move the workaround into the deinit function for now.	2023-10-26 14:38:23 -07:00
Joseph Huber	cee08ff342	[Libomptarget] Do not pass 'nogpulib' to the non-LTO Nvidia tests (#70327 ) Summary: For the other tests we pass `-nogpulib` to ensure that we set up the needed libraries correctly. However, this caused problems for the non-LTO build and test of Nvidia systems. In general this is because we would do a separate compile of the libomptarget device runtime and then link in that cubin. This exercised the runtime in a lot of ways it's not used to, since doing things this way was hardly expected or tested. This patch disables it only for the Nvidia non-LTO build so that we still get the effect of `--liboimptarget-nvptx-bc-path` rather than ignoring it.	2023-10-26 10:36:34 -05:00
Yuanfang Chen	f09f58d0f2	[OpenMP] [OMPD] Fix CMake install command https://cmake.org/cmake/help/latest/command/install.html "If a relative path is given it is interpreted relative to the value of the CMAKE_INSTALL_PREFIX variable."	2023-10-26 03:02:53 +00:00
Joseph Huber	17b5445996	[Libomptarget] Add a wavefront sync builtin for the AMDGPU implementation (#70228 ) Summary: While this is technically a no-op for AMDGPU hardware, in cases where the user would see fit to add an explicit wavefront sync on Nvidia hardware, we should also inform the LLVM optimizer that this control flow is convergent so we do not reorder blocks.	2023-10-25 14:27:14 -05:00
Joseph Huber	006cd37960	[OpenMP][Obvious] Fix incorrect variant selector in test Summary: This should be `kind` and not `arch`.	2023-10-25 13:48:30 -05:00
Joseph Huber	ca3545f0ef	[Libomptarget] Bump up PTX version from +ptx61 to +ptx63 (#70227 ) Summary: This version is required to support the 'activemask' feature which is used for certain features, such as reductions. This ties the implementation of the DeviceRTL roughly to the features provided by the CUDA 9.0 release, which should be sufficienly old as to not cause problems since this is a minor version jump that corresponds to the release of `sm_53`.	2023-10-25 13:28:02 -04:00
Joseph Huber	8a181f43da	[OpenMP][Obvious] Fix incompatbile function prototype causing failures Summary: This function needs `void` as the arguments to be ABI compatbile with what is actually defined. This is enforced when doing CUDA separable linking of the runtime.	2023-10-25 10:44:07 -05:00
Joseph Huber	84d8ace51a	[OpenMP][Obvious] Fix function prototype when used in C mode Summary: The `llvm_omp_target_dynamic_shared_alloc` prototype in `omp.h` accidentally left the void argument unspecified. This created unintended code when called from the C language, causing some `nvlink` failures in certain scenarios.	2023-10-25 09:35:23 -05:00
Ilya Leoshkevich	f7fc98a1cf	[OpenMP][Archer] Do not check for column numbers in backtraces (#70075 ) TSan can show only line numbers on some platforms, e.g., SystemZ. Skip checking the column numbers; line numbers should be enough to verify that race detection is working.	2023-10-25 13:22:24 +02:00
Ilya Leoshkevich	77c2b623ca	[OpenMP][Tests] Sync struct DEP with the runtime (#69982 ) struct DEP defined in multiple testcases must correspond to runtime's struct kmp_depend_info. The former defines flags as int, and the latter as kmp_uint8_t. This discrepancy goes unnoticed on little-endian systems, but breaks big-endian ones. Make flags in struct DEP unsigned char.	2023-10-24 19:40:08 +02:00
Ilya Leoshkevich	34459b72da	[OpenMP] Provide big-endian bitfield definitions (#69995 ) structs kmp_depend_info.flags and kmp_tasking_flags contain bitfields, which overlay integer flag values. The current bitfield definitions target little-endian machines. On big-endian machines bitfields are laid out in the opposite order, so the current definitions do not work there. There are two ways to fix this: either provide big-endian bitfield definitions, or bit-swap integer flag values. Go with the former, since it's localized to one place and therefore is more maintainable.	2023-10-24 19:39:50 +02:00
Jon Chesterfield	840d0b7e03	[amdgpu] D2D memcpy via streams and HSA (#69977 ) hsa_amd_memory_async_copy can handle device to device copies if passed the corresponding parameters. No functional change - currently D2D copy goes through a fallback in libomptarget that stages through a host malloc, after this it goes directly through HSA. Works under exactly the situations that HSA works. Verified locally on a performance benchmark. Hoping to attract further testing from internal developers after it lands.	2023-10-24 00:05:04 +01:00
Johannes Doerfert	86bb713142	[OpenMP][FIX] Enlarge thread state array, improve test and add second	2023-10-22 17:47:00 -07:00
Johannes Doerfert	9f3b06d8be	[OpenMP][FIX] Fix memset oversight to partially unblock test The tests "unoptimized" version is still broken, disabled for now.	2023-10-22 14:29:11 -07:00
Johannes Doerfert	f3ff0a67be	[OpenMP][FIX] Ensure test runs correct with (at least) 2 threads	2023-10-22 13:22:36 -07:00
Johannes Doerfert	87dac9f168	[OpenMP] Rewrite test to check the correct (CPU) result The test initially showed we do no crash but compute the wrong GPU result, now we show the CPU result is correct and disable GPU testing.	2023-10-21 14:55:15 -07:00
Johannes Doerfert	d3921e4670	[OpenMP] Basic BumpAllocator for (AMD)GPUs (#69806 ) The patch contains a basic BumpAllocator for (AMD)GPUs to allow us to run more tests. The allocator implements `malloc`, both internally and externally, while we continue to default to the NVIDIA `malloc` when we target NVIDIA GPUs. Once we have smarter or customizable allocators we should consider this choice, for now, this allocator is better than none. It traps if it is out of memory, making it easy to debug. Heap size is configured via `LIBOMPTARGET_HEAP_SIZE` and defaults to 512MB. It allows to track allocation statistics via `LIBOMPTARGET_DEVICE_RTL_DEBUG=8` (together with `-fopenmp-target-debug=8`). Two tests were added, and one was enabled. This is the next step towards fixing https://github.com/llvm/llvm-project/issues/66708	2023-10-21 14:49:30 -07:00
Johannes Doerfert	d571af7f62	[OpenMP][FIX] Ensure thread states do not crash on the GPU The nested parallelism causes thread states which still do not properly work but at least don't crash anymore.	2023-10-21 14:43:09 -07:00
Johannes Doerfert	1cea309b7e	[OpenMP][NFC] Move DebugKind to make it reusable from the host	2023-10-20 19:28:09 -07:00
Joseph Huber	34a3fb9f62	[Libomptarget][NFC] Remove use of VLA in the AMDGPU plugin (#69761 ) Summary: We should not rely on a VLA in C++ for the handling of this string. The size is a true runtime value so we cannot rely on constexpr handling. We simply use a small vector, whose default size is most likely large enough to handle whatever size gets output within the stack, but is safe in cases where it is not.	2023-10-20 16:02:51 -04:00
Michael Klemm	f93a697e47	[libomptarget][OpenMP] Initial implementation of omp_target_memset() and omp_target_memset_async() (#68706 ) Implement a slow-path version of omp_target_memset*() There is a TODO to implement a fast path that uses an on-device kernel instead of the host-based memory fill operation. This may require some additional plumbing to have kernels in libomptarget.so	2023-10-19 15:29:36 +02:00
Joseph Huber	970e7456e0	[Libomptarget] Add a test for the `libc` implementation of assert (#69518 ) Summary: The `libcgpu.a` file provides its own implementation of `__assert_fail`. This adds a test to make sure it's usable in OpenMP offloading as expected. Currently this requires linking `libcgpu.a` before the OpenMP device RTL however. We also disable the test on the CPU as the format of the string will be different.	2023-10-19 08:55:45 -04:00
Joseph Huber	b69081e324	Attributes (#69358 ) - [Libomptarget] Make the references to 'malloc' and 'free' weak. - [Libomptarget][NFC] Use C++ style attributes instead	2023-10-18 12:52:43 -04:00
Joseph Huber	1e5fe67e70	[Libomptarget] Make the references to 'malloc' and 'free' weak. (#69356 ) Summary: We use `malloc` internally in the DeviceRTL to handle data globalization. If this is undefined it will map to the Nvidia implementation of `malloc` for NVPTX and return `nullptr` for AMDGPU. This is somewhat problematic, because when using this as a shared library it causes us to always extract the GPU libc implementation, which uses RPC and thus requires an RPC server. Making this `weak` allows us to implement this internally without worrying about binding to the GPU `libc` implementation.	2023-10-18 12:50:23 -04:00
Jon Chesterfield	7ac516a119	[amdgpu] Disable openmp test that is blocking CI after changing hardware, need to diagnose memory fault	2023-10-16 13:59:49 +01:00
Kazu Hirata	18d199116f	Stop including llvm/ADT/STLFunctionalExtras.h (NFC) These source files do not use function_ref.	2023-10-13 20:50:58 -07:00
JP Lehr	b2a67255be	[OpenMP] Disable flaky libomptarget AMDGPU test We observe intermittent failures of that test and need some time to investigate. Hence, for now, we disable it.	2023-10-10 13:09:29 -05:00
Joseph Huber	4e9054d391	[Libomptarget] Fix lookup of the `libcgpu.a` library Summary: The `libcgpu.a` library was added to support certain libc functions. A recent patch made us pass its location directly on the commandline, however it used `find_library`. This doesn't work because the ordering of CMake might run `fine_library` before it builds the library we're trying to find. This patch changes this to just use the destimation we know it will end up in and checks it manually.	2023-10-05 10:48:56 -05:00
Joseph Huber	75e648031c	[Libomptarget] Disable AMDGPU complex math test after recent patch Summary: The recent patch added `-nogpulib` to make these tests only pick up what was intentionally put into them. This had the effect of removing the dependency on the ROCm device libs which are needed for math. This test disables the complex math test, which is the only one that needed it, for the time being. In the future we will implement these and provide it via the GPU `libm` and pass it in the same way as the GPU `libc`.	2023-10-04 15:24:43 -05:00
Joseph Huber	7282975057	[Libomptarget] Explicitly pass the OpenMP device libraries to tests (#68225 ) Summary: We have tests that depend on two static libraries `libomptarget.devicertl.a` and `libcgpu.a`. These are currently implicitly picked up and searched through the standard path. This patch changes that to pass `-nogpulib` to disable implicit runtime path searches. We then explicitly passed the built libraries to the compilations so that we know exactly which libraries are being used. Depends on: https://github.com/llvm/llvm-project/pull/68220 Fixes https://github.com/llvm/llvm-project/issues/68141	2023-10-04 14:14:30 -05:00
Joseph Huber	2d4d8c8f97	[Libomptarget] Make the DeviceRTL configuration globals weak (#68220 ) This patch applies weak linkage to the config globals by the name `__omp_rtl...`. This is because when passing `-nogpulib` we will not link in or create these globals. This allows the OpenMP device RTL to be self contained without requiring the additional definitions from the `clang` compiler. In the standard case, this should not affect the current behavior, this is because the strong definition coming from the compiler should always override the weak definition we default to here. In the case that these are not defined by the compiler, these will remain weak. This will impact optimizations somewhat, but the previous behavior was that it would not link so that is an improvement. Depends on: https://github.com/llvm/llvm-project/pull/68215	2023-10-04 14:14:13 -05:00
Joseph Huber	49d8a559d3	[LinkerWrapper] Fix resolution of weak symbols during LTO (#68215 ) Summary: Weak symbols are supposed to have the semantics that they can be overriden by a strong (i.e. global) definition. This wasn't being respected by the LTO pass because we simply used the first definition that was available. This patch fixes that logic by doing a first pass over the symbols to check for strong resolutions that could override a weak one. A lot of fake linker logic is ending up in the linker wrapper. If there were an option to handle this in `lld` it would be a lot cleaner, but unfortunately supporting NVPTX is a big restriction as their binaries require the `nvlink` tool.	2023-10-04 14:13:52 -05:00
agozillon	1482106c99	[Flang][OpenMP][MLIR] Remove deletion of unused declare target global after use replacement (#67762 ) At the moment, for device a reference pointer is generated in place of the original declare target global value, this reference pointer is the pointer that actually receives the data. In Clang the original global value isn't generated for device, just the reference pointer. Unfortunately for Flang/MLIR this is currently not the case, as the declare target attribute is processed after the creation of the global so we end up with a dead global on device effectively after rewriting its uses to the new device reference pointer. It appears I was a little overzealous with the deletion of the declare target globals for device. The current method breaks in-cases where the same declare target global is used across two target regions (added a runtime reproduced in the patch). As it'll effectively delete it before the second target gets a chance to be written to LLVM IR and have it's uses rewritten . I'd like to remove this deletion as the dead global isn't breaking any code and will likely be removed in later dead code elimination passes, perhaps a little too heavy handed with the original approach.	2023-10-03 15:21:27 +02:00
Leandro Lupori	5833a9e99a	[OpenMP] Fix -Wc++98-compat-extra-semi warning (NFC) (#68022 ) Compiling OpenMP with LLVM 16 triggers the following warning: warning: extra ';' outside of a function is incompatible with C++98	2023-10-02 16:43:02 -03:00
Shilei Tian	103bb69c04	[OpenMP] Fix a potential memory buffer overflow (#67252 ) #67167 reports a potential memory overflow caused by the wrong size passed to the function `memcpy_s`. This patch fixes it. Fix #67167.	2023-09-29 12:41:32 -04:00
Joseph Huber	183a1b1e38	[OpenMP] Enable the 'libc/malloc.c' test on NVPTX Summary: Previously this test hanged indefinitely on NVPTX. This was due to an issue fixed previously where we would wait indefinitely inside the CUDA runtime waiting for the kernel to complete if it was blocked on the RPC server. This patch enables this test again now that it can run without deadlocking, at least on CUDA 12.2.	2023-09-28 14:41:35 -05:00
Joseph Huber	0f88be77ea	[Libomptarget] Fix Nvidia offloading hanging on dataRetrieve using RPC (#66817 ) Summary: The RPC server is responsible for providing host services from the GPU. Generally, the client running on the GPU will spin in place until the host checks the server. Inside the runtime, we elected to have the user thread do this checking while it would be otherwise waiting for the kernel to finish. However, for Nvidia this caused problems when offloading to a target region that requires a copy back. This is caused by the implementation of `dataRetrieve` on Nvidia. We initialize an asynchronous copy-back on the same stream that the kernel is running on. This creates an implicit sync on the kernel to finish before we issue the D2H copy, which we then wait on. This implicit sync happens inside of the CUDA runtime. This is problematic when running the RPC server because we need someone to check the RPC server. If no one checks the RPC server then the kernel will never finish, meaning that the memcpy will never be issued and the program hangs. This patch adds an explicit check for unfinished work on the stream and waits for it to complete.	2023-09-26 16:03:34 -05:00
Joseph Huber	791b279924	[libc] Change the `puts` implementation on the GPU (#67189 ) Summary: Normally, the implementation of `puts` simply writes a second newline charcter after printing the first string. However, because the GPU does everything in batches of the SIMT group size, this will end up with very poor output where you get the strings printed and then 1-64 newline characters all in a row. Optimizations like to turn `printf` calls into `puts` so it's a good idea to make this produce the expected output. The least invasive way I could do this was to add a new opcode. It's a little bloated, but it avoids an unneccessary and slow send operation to configure this.	2023-09-25 11:17:22 -05:00
Andrew Gozillon	76916669b9	[MLIR][OpenMP] Initial Lowering of Declare Target for Data This patch adds initial lowering for DeclareTargetAttr on GlobalOp's utilising registerTargetGlobalVariable and getAddrOfDeclareTargetVar from the OMPIRBuilder. It also adds initial processing of declare target map operands, populating the combinedInfo that the OMPIRBuilder requires to generate kernels and it's kernel argument structure. The combination of these additions allows simple mapping of declare target globals to Target regions, as such a simple runtime test showcasing this and testing it has been added. The patch currently does not factor in filtering based on device_type clauses (e.g. no emission of globals for device if host specified), this will come in a future iteration. And for the moment it's only been tested with 1-D arrays and basic fortran data types, more complex types (such as user defined derived types from Fortran, allocatables or Fortran pointers) may need further work. reviewers: kiranchandramohan, skatrak Differential Revision: https://reviews.llvm.org/D149368	2023-09-20 13:31:15 -05:00
Kazushi Marukawa	7b8130c2c3	[OpenMP][VE] Limit the number of threads to create (#66729 ) VE supports up to 64 threads per a VE process. So, we limit the number of threads defined by KMP_MAX_NTH. We also modify the __kmp_sys_max_nth initialization to use KMP_MAX_NTH as a limit.	2023-09-20 17:44:24 +09:00
Arthur Eubanks	0a1aa6cda2	[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295 ) This will make it easy for callers to see issues with and fix up calls to createTargetMachine after a future change to the params of TargetMachine. This matches other nearby enums. For downstream users, this should be a fairly straightforward replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive or s/CGFT_/CodeGenFileType::	2023-09-14 14:10:14 -07:00
Ye Luo	8c2da6bb7f	[libomptarget] document ActionFunctions in the amdgpu plugin. (#66397 )	2023-09-14 12:18:49 -05:00
Terry Wilmarth	102d864719	Fix /tmp approach, and add environment variable method as third fallback during library registration The /tmp fallback for /dev/shm did not write to a fixed filename, so multiple instances of the runtime would not be able to detect each other. Now, we create the /tmp file in much the same way as the /dev/shm file was created, since mkstemp approach would not work to create a file that other instances of the runtime would detect. Also, add the environment variable method as a third fallback to /dev/shm and /tmp for library registration, as some systems do not have either. Also, add ability to fallback to a subsequent method should a failure occur during any part of the registration process. When unregistering, it is assumed that the method chosen during registration should work, so errors at that point are ignored. This also avoids a problem with multiple threads trying to unregister the library.	2023-09-13 13:50:49 -05:00
Rodrigo Ceccato de Freitas	f94b6f3396	[OpenMP] Remove optimization skipping reduction struct initialization (#65697 ) This commit removes an optimization that skips the initialization of the reduction struct if the number of threads in a team is 1. This optimization caused a bug with Hidden Helper Threads. When the task group is initially initialized by the master thread but a Hidden Helper Thread executes a target nowait region, it requires the reduction struct initialization to properly accumulate the data. This commit also adds a LIT test for issue #57522 to ensure that the issue is properly addressed and that the optimization removal does not introduce any regressions. Fixes: #57522	2023-09-12 16:09:16 -05:00
Kazushi Marukawa	e8679b93da	[OpenMP][test][VE] Limit the number of AFFINITY_MAX_CPUS for VE (#65872 ) Limit the number of AFFINITY_MAX_CPUS for VE because VE's sched_getaffinity doesn't work correctly with large sized mask buffer.	2023-09-12 23:45:56 +09:00
Saiyedul Islam	466a8149b3	Revert "[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410 )" (#66060 ) This reverts commit `0a8d17e79b`.	2023-09-12 15:13:59 +05:30
Saiyedul Islam	0a8d17e79b	[AMDGPU] Make default AMDHSA Code Object Version to be 5 (#65410 ) Also update LIT tests and docs. For more details, see https://llvm.org/docs/AMDGPUUsage.html#code-object-v5-metadata Reviewed By: arsenm, jhuber6 Github PR: #65410 Differential Revision: https://reviews.llvm.org/D129818	2023-09-12 13:53:31 +05:30
Ye Luo	6c8248e38b	[libomptarget] Rename AMDGPUSignalTy member Signal to HSASignal.	2023-09-11 22:42:34 -05:00
Ye Luo	08352b99a4	[libomptarget][NFC] update comments.	2023-09-11 22:30:25 -05:00
Michael Halkenhäuser	12ac0f6ede	[OpenMP][DeviceRTL][AMDGPU] Add missing libomptarget build targets (#65964 ) Extend CMake variable `all_amdgpu_architectures` by `gfx941` and `gfx942`.	2023-09-11 15:43:51 +02:00
Michael Halkenhäuser	53602e6193	[OpenMP][OMPT] Fix device identifier collision during callbacks (#65595 ) Fixes: https://github.com/llvm/llvm-project/issues/65104 When a user assigns devices to target regions it may happen that different identifiers will map onto the same id within different plugins. This will lead to situations where callbacks will become much harder to read, as ambiguous identifiers are reported. We fix this by collecting the index-offset upon general RTL initialization. Which in turn, allows to calculate the unique, user-observable device id.	2023-09-11 12:11:44 +02:00
Ye Luo	c9733b8a9e	[libomptarget][NFC]Rename targetDataMapper to targetDat in interface.cpp (#65915 ) In omptarget.cpp, there is a targetDataMapper which does mapper related operations. In interface.cpp, targetDataMapper function template is simply needed for handling TargetAsyncInfoTy. Thus it is better to name it as simple as targetData similar to targetKernel.	2023-09-10 21:08:11 -05:00
Tianlan Zhou	057564fec5	Fix some typos in comments: evalute -> evaluate (NFC) (#65906 )	2023-09-11 04:11:06 +08:00
Kazushi Marukawa	f8efa65ca5	[OpenMP][test][VE] Change to use VE_LD_LIBRARY_PATH for VE (#65869 ) Change to use VE_LD_LIBRARY_PATH for VE instead of LD_LIBRARY_PATH. The VE is connected to the host, and compiled test programs for VE is invoked on the host and transferred to the VE. If programs are compiled for the host, we use LD_LIBRARY_PATH. Otherwise, we use VE_LD_LIBRARY_PATH.	2023-09-10 12:07:16 +09:00
Kazushi (Jam) Marukawa	18b6724355	[OpenMP][VE] Support OpenMP runtime on VE Support OpenMP runtime library on VE. This patch makes OpenMP compilable for VE architecture. Almost all tests run correctly on VE. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D159401	2023-09-10 08:29:53 +09:00
Shilei Tian	600e0cde3e	[NFC][OpenMP] clang-format `openmp/libomptarget/src/interface.cpp`	2023-09-08 23:01:39 -04:00
Brad Smith	7e31b45d6a	[OpenMP] Use the more appropriate function to retrieve the thread id on OpenBSD (#65553 ) Use the getthrid() function instead of a syscall.	2023-09-07 21:05:25 -04:00
Shilei Tian	010a5a737b	[OpenMP] Fix build issue with `libomp` when OMPT is disabled	2023-09-06 23:40:24 -04:00
Brad Smith	fd4c80dec9	[OpenMP] Fix gettid warnings on DragonFly (#65549 ) Define __kmp_gettid() as appropriate for DragonFly.	2023-09-06 20:21:11 -04:00
Shilei Tian	99d67fb9aa	[OpenMP] Align up the size when calling aligned_alloc (#65525 ) Based on https://en.cppreference.com/w/c/memory/aligned_alloc, the `size` is supposed to be a multiple of `alignment`, and it is implementation defined behavior if not. We have a non-conformant use in `kmp_barrier.h` when allocating distribute barrier. The size of the barrier is 576 and the alignment is `4*CACHE_LINE`, which is 256 on most systems. Apparently it works perfectly fine for Linux and Intel-based Mac, but not for Apple Silicon based Mac. Fix #63194.	2023-09-06 16:28:07 -04:00
Joseph Huber	460840c09d	[OpenMP] Support 'omp_get_num_procs' on the device (#65501 ) Summary: The `omp_get_num_procs()` function should return the amount of parallelism availible. On the GPU, this was not defined. We have elected to define this function as the maximum amount of wavefronts / warps that can be simultaneously resident on the device. For AMDGPU this is the number of CUs multiplied byth CU's per wave. For NVPTX this is the maximum threads per SM divided by the warp size and multiplied by the number of SMs.	2023-09-06 13:45:05 -05:00
Shilei Tian	ff5c7261ef	[OpenMP] Fix a wrong assertion in `__kmp_get_global_thread_id` The function assumes that `__kmp_gtid_get_specific` always returns a valid gtid. That is not always true, because when creating the key for thread-specific data, a destructor is assigned. The dtor will be called at thread exit. However, before the dtor is called, the thread-specific data will be reset to NULL first (https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html): > At thread exit, if a key value has a non-NULL destructor pointer, and the thread > has a non-NULL value associated with that key, the value of the key is set to NULL. This will lead to that `__kmp_gtid_get_specific` returns `KMP_GTID_DNE`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D159369	2023-09-06 12:21:43 -04:00
Shilei Tian	518b08c193	[OpenMP] Fix issue of indirect function call in `__kmpc_fork_call_if` (#65436 ) The outlined function is typically invoked by using `__kmp_invoke_microtask`, which is written in asm. D138495 introduces a new interface function for parallel region for OpenMPIRBuilder, where the outlined function is called via the function pointer. For some reason, it works perfectly well on x86 and x86-64 system, but doesn't work on Apple Silicon. The 3rd argument in the callee is always `nullptr`, even if it is not in caller. It appears `x2` always contains `0x0`. This patch adopts the typical method to invoke the function pointer. It works on my M2 Ultra Mac. Fix #63194.	2023-09-06 12:17:45 -04:00
Fangrui Song	678e3ee123	[lldb] Fix duplicate word typos; NFC Those fixes were taken from https://reviews.llvm.org/D137338	2023-09-01 21:32:24 -07:00
Ethan Luis McDonough	2b6ba8c735	[openmp] Tighten flang detection in offloading test This patch ensures that the locally built version of flang when building in-tree. `find_program` sometimes used the wrong executable if a different copy of flang was installed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D159161	2023-09-01 13:59:18 -05:00
Martin Storsjö	c2019c416c	[OpenMP] [test] Fix target_thread_limit.cpp to not assume 4 or more cores Previously, the test ran a section with #pragma omp target thread_limit(4) and expected it to execute exactly 4 times, even though it would in practice execute min(cores, 4) times. Increment a counter and check that it executed 1-4 times. Differential Revision: https://reviews.llvm.org/D159311	2023-09-01 21:16:58 +03:00
Jan Leyonberg	a0e3418bc8	[flang][OpenMP] Add fortran test with basic target region This patch adds a test that uses a target region to set a scalar value. It also adds rules in lit.cfg to handle fortran testing. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D159216	2023-09-01 09:26:36 -04:00
Shilei Tian	35fdf8d703	[OpenMP] Fix a segment fault in __kmp_get_global_thread_id In `__kmp_get_global_thread_id`, if the gtid mode is 1, after getting the gtid from TLS, it will store the gtid value to the thread stack maintained in the thread descriptor. However, `__kmp_get_global_thread_id` can be called when the library is destructed, after the corresponding thread info has been release. This will cause a segment fault. This can happen on an Intel-based Mac. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D159324	2023-08-31 21:15:28 -04:00
Joseph Huber	ccb1d183c3	[OpenMP][Docs] Remove old entry saying static libraries are unsupported Summary: Static libraries have been supported since LLVM 15.0, this entry is misleading and should be removed.	2023-08-30 06:48:57 -05:00
Ethan Luis McDonough	9e3d59e4c2	[openmp] Fix flang detection for offloading test This patch fixes the flang detection in the openmp fortran offloading test. Reviewed By: jsjodin Differential Revision: https://reviews.llvm.org/D158546	2023-08-29 16:31:03 -05:00
Martin Storsjö	81ecc887aa	[OpenMP] Export __kmpc_set_thread_limit on Windows This fixes the new test target/target_thread_limit.cpp on Windows, which was added recently in `08bbff4aad` / https://reviews.llvm.org/D152054. Differential Revision: https://reviews.llvm.org/D159070	2023-08-29 23:22:21 +03:00
Saiyedul Islam	f616c3eeb4	[OpenMP][DeviceRTL][AMDGPU] Support code object version 5 Update DeviceRTL and the AMDGPU plugin to support code object version 5. Default is code object version 4. CodeGen for __builtin_amdgpu_workgroup_size generates code for cov4 as well as cov5 if -mcode-object-version=none is specified. DeviceRTL compilation passes this argument via Xclang option to generate abi-agnostic code. Generated code for the above builtin uses a clang control constant "llvm.amdgcn.abi.version" to branch on the abi version, which is available during linking of user's OpenMP code. Load of this constant gets eliminated during linking. AMDGPU plugin queries the ELF for code object version and then prepares various implicitargs accordingly. Differential Revision: https://reviews.llvm.org/D139730 Reviewed By: jhuber6, yaxunl	2023-08-29 06:35:44 -05:00
Anton Rydahl	c1b5674fbb	[OpenMP] Change OpenMP default version in documentation and help text for -fopenmp-version As discussed on the weekly OpenMP meeting on the second of August 2023, the default version in the OpenMP documentation shoud be changed from OpenMP 5.0 to 5.1. Differential Revision: https://reviews.llvm.org/D156901	2023-08-28 19:05:55 -07:00
Doru Bercea	2102ed0b91	Fix for openmp tests honoring thread_limit. Diff: https://reviews.llvm.org/D159001	2023-08-28 13:04:17 -04:00
Doru Bercea	5fe6f56563	Disable intermittently failing OpenMP test. Diff: https://reviews.llvm.org/D159003	2023-08-28 12:56:22 -04:00
Doru Bercea	41bb5ef11f	Add passing test for issue 64797.	2023-08-28 09:55:56 -04:00
Joachim Jenke	1880d8f5c1	[OpenMP][Archer] Add support for taskwait depend At the moment Archer segfaults due to a null-pointer access, if an application uses taskwait with depend clause as used in the two new tests. This patch cleans up the task_schedule function, moves semantic blocks into functions and replaces the if blocks by a single switch statement. The switch statement will warn, when new enum values are added in OMPT and makes clear what code is executed for the different cases. With free-agent tasks coming up in OpenMP 6.0, we should expect more null-pointer task_data, so additional null-pointer checks were added. We also cannot rely on having an implicit task on the stack, so the BarrierIndex is stored during task creation. Differential Revision: https://reviews.llvm.org/D158072	2023-08-28 09:43:24 +02:00
Joachim Jenke	cec855af3e	[OpenMP][OMPT] Fix ompt_get_task_memory implementation Since td_allow_completion_event is a member of the taskdata struct, not all firstprivate/shared variables are stored at the end of the task memory allocation. Simply report the whole allocation instead. Furthermore, the function should always return 0 since in no case there is another block to report. Differential Review: https://reviews.llvm.org/D158080	2023-08-28 09:19:52 +02:00
Sandeep Kosuri	08bbff4aad	[OpenMP] Codegen support for thread_limit on target directive for host offloading - This patch adds support for thread_limit clause on target directive according to OpenMP 51 [2.14.5] - The idea is to create an outer task for target region, when there is a thread_limit clause, and manipulate the thread_limit of task instead. This way, thread_limit will be applied to all the relevant constructs enclosed by the target region. Differential Revision: https://reviews.llvm.org/D152054	2023-08-26 22:18:49 -05:00
Shilei Tian	fbcce33706	[OpenMP] Honor `thread_limit` value when choosing grid size D152014 introduced an optimization that favors more smaller blocks over fewer larger blocks, even if user sets `thread_limit` explicitly. This patch changes the behavior to honor user value. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D158802	2023-08-26 22:17:49 -04:00
Joseph Huber	aa78e94b0b	[Libomptarget] Support mapping indirect host calls to device functions The changes in D157738 allowed for us to emit stub globals on the device in the offloading entry section. These globals contain the addresses of device functions and allow us to map host functions to their corresponding device equivalent. This patch provides the initial support required to build a table on the device to lookup the associated value. This is done by finding these entries and creating a global table on the device that can be searched with a simple binary search. This requires an allocation, which supposedly should be automatically freed at plugin shutdown. This includes a basic test which looks up device pointers via a host pointer using the added function. This will need to be built upon to provide full support for these calls in the runtime. To support reverse offloading it would also be useful to provide a reverse table that allows us to get host functions from device stubs. Depends on D157738 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D157918	2023-08-25 18:51:56 -05:00
Michael Halkenhaeuser	9300b6de3c	[OpenMP][OMPT] Add OMPT support for `generic-elf-64bit` plugin Fixes: https://github.com/llvm/llvm-project/issues/64487 Connect OMPT during plugin initialization and enable corresponding tests. Avoid linking OMPT when corresponding support is disabled. Depends on D158542 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D158543	2023-08-25 13:53:11 -04:00
Johannes Doerfert	a01398156a	[OpenMPOpt][FIX] Ensure to propagate information about parallel regions Before, we checked the parallel region only once, and ignored updates in the KernelInfo for the parallel region that happened later. This caused us to think nested parallel sections are not present even if they are, among other things.	2023-08-25 10:46:56 -07:00
Michael Halkenhaeuser	275259eb9a	[OpenMP] Add `getComputeUnitKind` to generic-elf-64bit plugin Make the generic-plugin report a corresponding CU kind -- instead of 'unknown'. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D158542	2023-08-25 07:14:43 -04:00
Johannes Doerfert	d2c37fc4f7	[Attributor][FIX] Avoid dangling stack references in map The old code did not account for new queries during an update, which caused us to leave stack RQIs in the map. We are now explicit about temporary vs non-temporary RQIs. Fixes: https://github.com/llvm/llvm-project/issues/64959	2023-08-24 16:28:10 -07:00
Johannes Doerfert	3611300a32	[OpenMP][FIX] Update tests after D157725	2023-08-24 16:28:10 -07:00
Aaron Jarmusch	1ff0bdb86d	[OpenMP] Fix Slice Duplicate in Profiler Fixed the broken commit - `6579021f02` Fix for the AMDGPU buildbot reported by @jplehr.	2023-08-24 20:52:15 +00:00
Aaron Jarmusch	6579021f02	[OpenMP] Fix Slice Duplicate in Profiler Using LIBOMPTARGET_PROFILER, duplicates are created from timing both Kernel functions and Data update functions. I commented out the duplicate timescope and left them in the targetkernel and the targetdataupdate functions. This way the timescope calls will be closer to the launching of the kernel and the data moving. Reviewed By: jdoerfert, tianshilei1992 Differential Revision: https://reviews.llvm.org/D157725	2023-08-24 19:41:37 +00:00
Johannes Doerfert	908ae84351	[OpenMP] Avoid assumptions at the end of a kernel When we used to treat the kernel end as as aligned barrier, assertions at the end made sense. Now, they actually cause problems as the "writes" are not ordered with regards to reads within the kernel. We can simply get rid of them.	2023-08-23 16:11:43 -07:00
Johannes Doerfert	80906ce48d	[OpenMP] Disable early vectorization of loads/stores in the runtime We are having a hard time optimizing some vectorized loads/stores later on which causes this optimization to degrade performance. Differential Revision: https://reviews.llvm.org/D158656	2023-08-23 15:14:14 -07:00
Johannes Doerfert	382b97554d	[OpenMP] Force the parallel abstraction to be inlined This is good for performance and compile time and the indirection (+ switch statements) is nothing that needs to be preserved.	2023-08-23 11:48:18 -07:00
Johannes Doerfert	81a02b0767	[Attributor][NFC] Precommit test	2023-08-23 11:48:18 -07:00
Johannes Doerfert	7481b465ae	[OpenMP] Use default grid value for static grid size If the user did not provide any static clause to override the grid size, we assume the default grid size as upper bound and use it to improve code generation through vendor specific attributes. Fixes: https://github.com/llvm/llvm-project/issues/64816 Differential Revision: https://reviews.llvm.org/D158382	2023-08-23 11:12:03 -07:00
Johannes Doerfert	c5488c8dcc	[OpenMP] Properly set static thread limit (w/o analysis) We used to have two separate implementations to derive the number of threads used in a target region. This lead us to sometimes miss out on user provided thread bounds (num_threads, or thread_limit) when we looked for "constant default values". If we might miss out on the presence of those bounds, we cannot set the thread_limit statically since the runtime will try to honor user input rather than cap it at the "preferred default". This patch replaces the secondary implementation with the primary in a mode that will not emit code but just look for the presence, and potentially upper bounds, of thread limiting clauses. The runtime test would not pass without this rewrite as we missed some clauses, set the static limit on the device to the preferred value, but then violated that value at runtime. Fixes: https://github.com/llvm/llvm-project/issues/64845 Differential Revision: https://reviews.llvm.org/D158381	2023-08-23 11:12:03 -07:00
Vadim Paretsky	6789dda762	[OpenMP] make small memory allocations in loop collapse code on the stack A few places in the loop collapse support code make small dynamic allocations that introduce a noticeable performance overhead when made on the heap. This change moves allocations up to 32 bytes to the stack instead of the heap. Differential Revision: https://reviews.llvm.org/D158220	2023-08-23 10:37:45 -07:00
Jonathan Peyton	99f5969565	[OpenMP] Let primary thread gather topology info for each worker thread This change has the primary thread create each thread's initial mask and topology information so it is available immediately after forking. The setting of mask/topology information is decoupled from the actual binding. Also add this setting of topology information inside the __kmp_partition_places mechanism for OMP_PLACES+OMP_PROC_BIND. Without this, there could be a timing window after the primary thread signals the workers to fork where worker threads have not yet established their affinity mask or topology information. Each worker thread will then bind to the location the primary thread sets. Differential Revision: https://reviews.llvm.org/D156727	2023-08-22 15:56:51 -05:00
Michael Halkenhaeuser	57f0bdc8fb	[OpenMP][OMPT] Fix `target enter data` callback ordering & reported device num This patch fixes: https://github.com/llvm/llvm-project/issues/64738 We observed multiple issues, primarily that the `DeviceId` was reported as -1 in certain scenarios. The reason for this is simply that the device is not initialized at that point. Hence, we need to move the RAII object creation just after the `checkDeviceAndCtors`, closer to the actual call we want to observe. This also solves an odering issue where one `target enter data` callback would be executed before the `Init` callback. Additionally, this change will also fix that the callbacks corresponding to `enter / exit data` and `update` in conjunction with `nowait` would not result in the emission of an OMPT callback. Added a testcase to cover initialized device number and `omp target` constructs. Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D157605	2023-08-22 13:12:09 -04:00
Kazu Hirata	11e2975810	Fx typos in documentation	2023-08-18 23:36:04 -07:00
Johannes Doerfert	9c08e76f3e	[Attributor] Introduce AAIndirectCallInfo AAIndirectCallInfo will collect information and specialize indirect call sites. It is similar to our IndirectCallPromotion but runs as part of the Attributor (so with assumed callee information). It also expands more calls and let's the rest of the pipeline figure out what is UB, for now. We use existing call promotion logic to improve the result, otherwise we rely on the (implicit) function pointer cast. This effectively "fixes" #60327 as it will undo the type punning early enough for the inliner to work with the (now specialized, thus direct) call. Fixes: https://github.com/llvm/llvm-project/issues/60327	2023-08-18 16:44:05 -07:00
Terry Wilmarth	f0221fb1d7	[OpenMP] Add option to use different units for blocktime This change adds the option of using different units for blocktimes specified via the KMP_BLOCKTIME environment variable. The parsing of the environment now recognizes units suffixes: ms and us. If a units suffix is not specified, the default unit is ms. Thus default behavior is still the same, and any previous usage still works the same. Internally, blocktime is now converted to microseconds everywhere, so settings that exceed INT_MAX in microseconds are considered "infinite". kmp_set/get_blocktime are updated to use the units the user specified with KMP_BLOCKTIME, and if not specified, ms are used. Added better range checking and inform messages for the two time units. Large values of blocktime for default (ms) case (beyond INT_MAX/1000) are no longer allowed, but will autocorrect with an INFORM message. The delay for determining ticks per usec was lowered. It is now 1 million ticks which was calculated as ~450us based on 2.2GHz clock which is pretty typical base clock frequency on X86: (1e6 Ticks) / (2.2e9 Ticks/sec) * (1e6 usec/sec) = 454 usec Really short benchmarks can be affected by longer delay. Update KMP_BLOCKTIME docs. Portions of this commit were authored by Johnny Peyton. Differential Revision: https://reviews.llvm.org/D157646	2023-08-18 14:01:13 -05:00
Johannes Doerfert	5eb7a427b0	[Attributor][NFC] Precommit tests	2023-08-17 22:42:38 -07:00
Johannes Doerfert	4fcd5f93d6	[OpenMPOpt] Mark more runtime functions as SPMD compatible Fixes: https://github.com/llvm/llvm-project/issues/64421	2023-08-17 18:33:24 -07:00
Joseph Huber	5717329f1a	[Libomptarget] Disable deadlocking bug49334.cpp test on AMDGPU This test hangs on AMDGPU sporadically, disable it for the time being. Fixes: https://github.com/llvm/llvm-project/issues/64733 Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D158082	2023-08-16 10:24:00 -05:00
Michael Halkenhaeuser	41f3626f8b	[OpenMP][OMPT] Fix reported target pointer for data alloc callback This patch fixes: https://github.com/llvm/llvm-project/issues/64671 DataOp EMI callbacks would not report the correct target pointer. This is now alleviated by passing a `void**` into the function which emits the actual callback, then evaluating that pointer. Note: Since this is only done after the pointer has been properly updated, only `endpoint=2` callbacks will show a non-null value. Reviewed By: dhruvachak, jdoerfert Differential Revision: https://reviews.llvm.org/D157996	2023-08-16 06:39:10 -04:00
Ye Luo	1c822e1e82	[libomptarget] Avoid unintialized GenericPluginTy::NumDevices	2023-08-13 00:01:50 -05:00
Joseph Huber	06adac8c4e	[Libomptarget] Configure the RPC port count from the plugin This patch allows us to configure the port count to what the specific card would desire for parallelism. For AMDGPU we need to use the maximum number of hardware parallelism to avoid deadlocks. For NVPTX we don't have this problem due to the friendlier scheduler, so we use the number of warps active on an SM times the number of SMs as a good guess. Note that the max ports currently is going to be smaller than these numbers. That will be improved in the future. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D155903	2023-08-11 12:54:47 -05:00
Carlos Eduardo Seo	4dce6d3061	[OpenMP] Disable some offloading/api tests for AArch64 Like for x86_64-linux-gnu, these need to be disabled for aarch64-linux-gnu. Differential Revision: https://reviews.llvm.org/D156815	2023-08-07 20:26:39 +00:00
Michael Halkenhaeuser	7eba3e58d5	[OpenMP][AMDGPU] Add Envar for controlling HSA busy queue tracking If the Envar is set to true (default), busy HSA queues will be actively avoided when assigning a queue to a Stream. Otherwise, we will initialize a new HSA queue for each requested Stream, then default to round robin once the set maximum has been reached. Reviewed By: jdoerfert, kevinsala Differential Revision: https://reviews.llvm.org/D156996	2023-08-07 10:48:02 -04:00
Kevin Sala	b8e297d1af	[OpenMP][libomptarget] Improve kernel initialization in plugins This patch modifies the plugins so that the initialization of KernelTy objects is done in the init method. Part of the initialization was done in the constructKernelEntry method. Now this method is called constructKernel and only allocates and constructs a KernelTy object. This patch prepares the kernel class for the new implementation of device reductions. Differential Revision: https://reviews.llvm.org/D156917	2023-08-06 11:53:58 +02:00
Shilei Tian	14d57545b2	[NFC][OpenMP] Fix compile warnings introduced in recent patches	2023-08-05 19:38:45 -04:00
koparasy	73cb01dc8a	[OpenMP] Support for OpenMP-Offload Record Replay Enable record-replay for OpenMP offload kernels. On recording the initialization is performed on device initialization by reading env variables. (This is similar to the way rr used to operate). The primary change takes place in the replay phase with the replay tool explicitly initializing the record-replay functionality. Differential Revision: https://reviews.llvm.org/D156174 Fix	2023-08-05 00:46:06 -07:00
Johannes Doerfert	ed16143593	[OpenMP][FIX] Ensure __assert_fail is compatible with the host Fixes: https://github.com/llvm/llvm-project/issues/64360	2023-08-04 11:36:58 -07:00
Shilei Tian	fcf1a1022a	[OMPX] Change `thread_dim` to `block_dim` and the original `block_dim` to `grid_dim` There is no `threadDim` in CUDA. Instead, it is `blockDim`. Then the current `blockDim` is `gridDim` in CUDA. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D157051	2023-08-03 21:51:06 -04:00
Joseph Huber	c96cba3aea	[Libomptarget] Fix compilation of libomptarget with old GCC Summary: Older gcc can't figure out the copy elision and needs an explicit move.	2023-08-03 10:49:35 -05:00
Joseph Huber	46642cc83d	[Libomptarget] Remove debug RAII from libomptarget This feature was supposed to allow you to trace execution inside of Libomptarget. However, this never really worked properly. The printing was always reoganized, only worked for single threads, and pretty much only told you a handful of things about a runtime library that's an implementation detail to all users. Despite this, it contributed about 40% of the total filesize of the deviceRTL. This patch simply removes this functionalit which I think was past due. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D157001	2023-08-03 09:37:47 -05:00
Kevin Sala	4f46a48aaf	[OpenMP][libomptarget] Remove unused virtual functions in GenericKernelTy The virtual functions getDefaultNumBlocks and getDefaultNumThreads from the kernels are only forwarding the call to the generic device's ones. This patch removes those two functions from the kernels (and their derived ones). Now calls are made to the device's functions directly. Differential Revision: https://reviews.llvm.org/D156905	2023-08-02 17:18:50 +02:00
Kevin Sala	f7e0208a09	[OpenMP][libomptarget] Add missing field in GenericELF64bit grid values	2023-08-02 17:18:50 +02:00
Shilei Tian	ab15b11c41	[NFC][OMPT] Move `pop_macro` to the right place to fix compile warnings	2023-08-02 10:27:57 -04:00
Michael Halkenhaeuser	5b19f42b63	[OpenMP][AMDGPU] Single eager resource init + HSA queue utilization tracking This patch lazily initializes queues/streams/events since their initialization might come at a cost even if we do not use them. To further benefit from this, AMDGPU/HSA queue management is moved into the AMDGPUStreamManager of an AMDGPUDevice. Streams may now use different HSA queues during their lifetime and identify busy queues. When a Stream is requested from the resource manager, it will search for and try to assign an idle queue. During the search for an idle queue the manager may initialize more queues, up to the set maximum (default: 4). When no idle queue could be found: resort to round robin selection. With contributions from Johannes Doerfert <johannes@jdoerfert.de> Depends on D156245 Reviewed By: kevinsala Differential Revision: https://reviews.llvm.org/D154523	2023-08-02 08:22:26 -04:00
Johannes Doerfert	e5a3d5ba88	[OpenMP][NFC] Enable more runtime tests and also run them with O3 The test run fine on my AMD GPU machine, we should verify them on others too and put them into our regular testing. Not testing O1/2/3 is really bad and not testing all architecturs is similarly problematic. Differential Revision: https://reviews.llvm.org/D148576	2023-07-31 15:45:53 -07:00
Johannes Doerfert	63684550c4	[OpenMP][NFC] Add offloading tests for the new ompx APIs	2023-07-31 15:45:53 -07:00
Johannes Doerfert	deb0ea3e47	[OpenMP] Add ompx wrappers for __syncthreads Differential Revision: https://reviews.llvm.org/D156729	2023-07-31 13:44:51 -07:00
Johannes Doerfert	daef6d327a	[OpenMP] Introduce ompx.h and 3D wrappers (threadId, threadDim, ...) The new ompx.h header will give us a place to put extensions. The first are 3D getters for the common cuda values: `{threadId,threadDim,blockId,blockDim}.{x,y,z}` Differential Revision: https://reviews.llvm.org/D156501	2023-07-31 13:44:51 -07:00
Johannes Doerfert	1f3a28d4e5	[OpenMP][NFC] Reorganize the ompx::mapping layer in the GPU runtime This change makes the naming more consistent, I hope.	2023-07-31 13:44:51 -07:00
Jonathan Peyton	b34c7d8c8e	[OpenMP] Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY * Add KMP_CPU_EQUAL and KMP_CPU_ISEMPTY to affinity mask API * Add printout of leader to hardware thread dump * Allow OMP_PLACES to restrict fullMask This change fixes an issue with the OMP_PLACES=resource(#) syntax. Before this change, specifying the number of resources did NOT change the default number of threads created by the runtime. e.g., OMP_PLACES=cores(2) would still create __kmp_avail_proc number of threads. After this change, the fullMask and __kmp_avail_proc are modified if necessary so that the final place list dictates which resources are available and how thus, how many threads are created by default. * Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY For OMP_PLACES, two new features are added: 1) OMP_PLACES=cores:<attribute> where <attribute> is either intel_atom, intel_core, or eff# where # is 0 - number of core efficiencies-1. This syntax also supports the optional (#) number selection of resources. 2) OMP_PLACES=core_types\|core_effs where this setting will create the number of core_types (or core_effs\|core_efficiencies). For KMP_AFFINITY, the granularity setting is expanded to include two new keywords: core_type, and core_eff (or core_efficiency). This will set the granularity to include all cores with a particular core type (or efficiency). e.g., KMP_AFFINITY=granularity=core_type,compact will create threads which can float across a single core type. Differential Revision: https://reviews.llvm.org/D154547	2023-07-31 13:55:32 -05:00
Anton Rydahl	5c0f98cd2a	[OpenMP][Docs] Added offloading command line reference to OpenMP FAQ This command adds an OpenMP offloading specific command line reference. The OpenMP FAQ links to the .rst new file. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D156387	2023-07-29 17:40:28 -07:00
antonrydahl	daf36b54b4	Revert "[OpenMP][Docs] Added offloading command line reference to OpenMP FAQ" This reverts commit `4166ff6107`. I accidentally pushed an old version of this patch.	2023-07-28 18:28:29 -07:00
Anton Rydahl	b880552dc1	[OpenMP][Docs] Updated the OpenMP documentation about building the OpenMP documentation with Sphinx When I was trying to improve the OpenMP documentation, I found that the information in `OpenMP/docs/README.md` did not contain up-to-date information about how to build the OpenMP documentation with Sphinx. When I ran `make docs-openmp-html`, the command failed because there were a few syntax errors in `openmp/docs/design/Runtimes.rst`. This commit fixes the syntax errors and updates the documentation on building the OpenMP documentation. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D156470	2023-07-28 18:04:21 -07:00
antonrydahl	4166ff6107	[OpenMP][Docs] Added offloading command line reference to OpenMP FAQ I have added a few things to the OpenMP FAQ which I think were missing. Feel free to suggest some changes. Are there missing options in the offloading command line reference? And what do you think about the section "Q: Why is my build taking a long time"? Differential Revision: https://reviews.llvm.org/D156387	2023-07-28 18:04:21 -07:00
Joseph Huber	141c4e7a94	[OpenMP] Do not always emit unused extern variables Currently, the precense of the OpenMP target declare metadata requires that we always codegen a global declaration. This is undesirable in the case that we could defer or omit this declaration as is common with unused extern variables. This is important as it allows us, in the runtime, to rely on static linking semantics to omit unused symbols so they are not included when the user links it in. This patch changes the check for always emitting these variables. Because of this we also need to extend this logic to the generation of the offloading entries. This has the result of derring the offload entry generation to the canonical definitoin. So we are effectively assuming whoever owns the storage for this variable will perform that operation. This makes an exception for `link` attributes as those require their own special handling. Let me know if this is sound in the implementation, I do not have the largest view of the standards here. Fixes: https://github.com/llvm/llvm-project/issues/64133 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D156368	2023-07-28 11:52:05 -05:00
Kevin Sala	7b2745b424	[OpenMP][libomptarget] Process resources when getting/returning from managers This patch adds the functionality to process with a lambda the resources obtained and returned by the resource managers in the plugins. These processing lambdas are empty for the moment. The idea is to process them when the resource manager mutex is acquired. Differential Revision: https://reviews.llvm.org/D156245	2023-07-28 00:37:08 +02:00
Kevin Sala	523ac0fcdf	[OpenMP][libomptarget] Retrieve multiple resources from resource managers This patch extends the plugin resource managers to return more than one resource per call. The return function is not extended since we do not return more than one resource anywhere. Differential Revision: https://reviews.llvm.org/D155629	2023-07-28 00:37:08 +02:00
Kevin Sala	53e4c7c309	[OpenMP][libomptarget] Improving plugin resource managers This patch improves the resource managers in the plugins by properly handling the errors. Until now, errors when creating and destroying resources were not propagated and were directly handled inside the resource managers. Now, all errors are propagated as in the rest of the plugin infrastructure. The code is now ready to implement the request/return of multiple resources in a single getResource/returnResource call. Differential Revision: https://reviews.llvm.org/D155621	2023-07-28 00:37:08 +02:00
Shilei Tian	10068cd654	[OpenMP] Introduce kernel environment This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depend on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-07-26 13:35:14 -04:00
Michael Halkenhaeuser	1dec417ac4	[OpenMP] [OMPT] [7/8] Invoke tool-supplied callbacks before and after target launch and data transfer operations Implemented RAII objects, initialized at target entry points, that invoke tool-supplied callbacks. Updated status of target callbacks as implemented. Depends on D127365 Patch from John Mellor-Crummey <johnmc@rice.edu> With contributions from: Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com> Jan-Patrick Lehr <janpatrick.lehr@amd.com> Reviewed By: jdoerfert, dhruvachak, jplehr Differential Revision: https://reviews.llvm.org/D127367	2023-07-25 08:18:26 -04:00
Tobias Hieta	4706251a31	Clear release notes for 18.x	2023-07-25 13:58:49 +02:00
Michael Halkenhaeuser	cf119df548	Revert "[OpenMP] [OMPT] [7/8] Invoke tool-supplied callbacks before and after target launch and data transfer operations" This reverts commit `00ccfcf9a6`.	2023-07-25 06:22:25 -04:00
Michael Halkenhaeuser	5fa5c39871	[OpenMP] Add OMPT release note OMPT release note addition for LLVM 17 Differential Revision: https://reviews.llvm.org/D156191	2023-07-24 20:38:04 -04:00
Michael Halkenhaeuser	00ccfcf9a6	[OpenMP] [OMPT] [7/8] Invoke tool-supplied callbacks before and after target launch and data transfer operations Implemented RAII objects, initialized at target entry points, that invoke tool-supplied callbacks. Updated status of target callbacks as implemented. Depends on D127365 Patch from John Mellor-Crummey <johnmc@rice.edu> With contributions from: Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com> Jan-Patrick Lehr <janpatrick.lehr@amd.com> Reviewed By: jdoerfert, dhruvachak Differential Revision: https://reviews.llvm.org/D127367	2023-07-24 20:22:06 -04:00
Jonathan Peyton	4e680ae5f2	[OpenMP] Move KMP_VERSION printout logic to post-serial-init Get the KMP_VERSION printout logic out of environment variable file (kmp_settings.cpp) and move to end of serial initialization where KMP_SETTINGS and OMP_DISPLAY_ENV are. Differential Revision: https://reviews.llvm.org/D154652	2023-07-24 16:02:03 -05:00
Jonathan Peyton	fda297729d	[OpenMP] Restore comment accidently deleted in D154650	2023-07-24 16:01:03 -05:00
Jonathan Peyton	1e3bbf76a1	[OpenMP] Re-use affinity raii class in worker spawning Get rid of explicit mask alloc, getthreadaffinity, set temp affinity, reset to old affinity, dealloc steps in favor of existing kmp_affinity_raii_t to push/pop a temporary affinity. Differential Revision: https://reviews.llvm.org/D154650	2023-07-24 15:58:25 -05:00
Joseph Huber	8db184ae8c	[OpenMP] Add a few release notes Summary: Release notes	2023-07-24 13:26:44 -05:00
Shilei Tian	6bd74fd65f	Revert commits for kernel environment This reverts commits for kernel environments as they causes issues in AMD BB.	2023-07-23 23:32:31 -04:00
Shilei Tian	c5c8040390	[OpenMP] Introduce kernel environment This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Depend on D155886. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-07-23 18:36:01 -04:00
Shilei Tian	763fdb1ffa	[OpenMP][Plugin] Update the global address calculation Current global address caculation doesn't work for AMDGPU in some cases (https://reviews.llvm.org/D142569#4506212). The root cause is the `sh_addr` is not substracted when caculating the address. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D155886	2023-07-23 17:41:06 -04:00
Michael Halkenhaeuser	d82eace1c9	[OpenMP][OMPT] Add 'Initialized' flag We observed some overhead and unnecessary debug output. This can be alleviated by (re-)introduction of a boolean that indicates, if the OMPT initialization has been performed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D155186	2023-07-21 08:19:03 -04:00
Michael Halkenhaeuser	453a75dc52	[OpenMP] [OMPT] [6/8] Added callback support for target data operations, target submit, and target regions. This patch adds support for invoking target callbacks but does not yet invoke them. A new structure OmptInterface has been added that tracks thread local states including correlation ids. This structure defines methods that will be called from the device independent target library with information related to a target entry point for which a callback is invoked. These methods in turn use the callback functions maintained by OmptDeviceCallbacksTy to invoke the tool supplied callbacks. Depends on D124652 Patch from John Mellor-Crummey <johnmc@rice.edu> With contributions from: Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com> Differential Revision: https://reviews.llvm.org/D127365	2023-07-21 06:24:12 -04:00
Joseph Huber	e537c83975	[libc] Add basic support for calling host functions from the GPU This patch adds the `rpc_host_call` function as a GPU extension. This is exported from the `libc` project to use the RPC interface to call a function pointer via RPC any copying the arguments by-value. The interface can only support a single void pointer argument much like pthreads. The function call here is the bare-bones version of what's required for OpenMP reverse offloading. Full support will require interfacing with the mapping table, nowait support, etc. I decided to test this interface in `libomptarget` as that will be the primary consumer and it would be more difficult to make a test in `libc` due to the testing infrastructure not really having a concept of the "host" as it runs directly on the GPU as if it were a CPU target. Reviewed By: jplehr Differential Revision: https://reviews.llvm.org/D155003	2023-07-19 10:11:46 -05:00
Johannes Doerfert	f914208c43	[OpenMP][NFCI] Avoid storing non-constant values in ICV If we store a constant in an ICV it is easier for the optimizer to propagate it. Since we often use the full block for the thread limit and the parallel team size, we can instead replace that dynamic value with a constant that otherwise cannot occur, here 0.	2023-07-18 16:50:50 -07:00
Johannes Doerfert	88a68de14c	[OpenMP][NFCI] Split assertion message from assertion expression We ended up with `llvm.assume(icmp ne ptr as(4) null, as(4) @str)` because the string in address space 4 was not known to be non-null. There is no need to create these assumes.	2023-07-18 16:50:50 -07:00
Matt Arsenault	e9725628ba	libomptarget: Try to fix dependency tracking for llvm tools	2023-07-18 06:21:33 -04:00
Jay Foad	92542f2a40	[AMDGPU] Add targets gfx1150 and gfx1151 This is the target definition only. Currently they are treated the same as GFX 11.0.x. Differential Revision: https://reviews.llvm.org/D155429	2023-07-17 13:06:12 +01:00
Joseph Huber	2dbc532672	[OMPT] Fix use of 'DEBUG_PREFIX' in the OMPT headers This is the only place that defines this prefix in a header file and was thus overriding and redefining other users of it. If we must use it in a header file, at least repsect its old values. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D155316	2023-07-14 15:58:24 -05:00
Adrian Munera	739164c024	[OpenMP] Build device runtimes for sm_87 Summary: These were missing from the list of all architectures. Differential Revision: https://reviews.llvm.org/D155287	2023-07-14 13:49:27 -05:00
Joseph Huber	48da62617e	[OpenMP] Add documentation on using the `libc` in OpenMP This points users to the `libc` documentation and explains the basics of how it's used inside the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D155318	2023-07-14 13:28:29 -05:00
Carlos Eduardo Seo	7d0df44d39	[OpenMP] Disable veccopy tests for AArch64 Like for x86_64-linux-gnu, these need to be disabled for aarch64-linux-gnu. Differential Revision: https://reviews.llvm.org/D155109	2023-07-12 23:53:14 +00:00
Joseph Huber	1776dc8124	[Libomptarget][Obvious] Fix uninitialized pointer Summary: This pointer was not initliazed to null which meant that it would be erronenously deleted by plugins that were not in use.	2023-07-11 15:41:46 -05:00
Joseph Huber	8a0763f19c	[Libomptarget] Remove RPCHandleTy indirection The 'RPCHandleTy' was intended to capture the intention that a specific device owns its slot in the RPC server. However, this required creating a temporary store to hold these pointers. This was causing really weird spurious failure due to undefined behaviour in the order of library teardown. For example, the x64 plugin would be torn down, set this to some invalid memory, and then the CUDA plugin would crash. Rather than spend the time to fully diagnose this problem I found it pertinent to simply remove the failure mode. This patch removes this indirection so now the usage of the RPC server must always be done with the intended device. This just requires some extra handling for the AMDGPU indirection where we need to store a reference to the device. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154971	2023-07-11 10:54:40 -05:00
Joachim Jenke	81bc7cf609	[OpenMP][NFC] lit: Allow setting default environment variables for test Add CHECK_OPENMP_ENV environment variable which will be passed to environment variables for test (make check-* target). This provides a handy way to exercise various openmp code with different settings during development. For example, to change default barrier pattern: ``` $ env CHECK_OPENMP_ENV="KMP_FORKJOIN_BARRIER_PATTERN=hier,hier \ KMP_PLAIN_BARRIER_PATTERN=hier,hier \ KMP_REDUCTION_BARRIER_PATTERN=hier,hier" \ ninja check-openmp ``` Even with this, each test can set appropriate environment variables if needed as before. Also, this commit adds missing documention about how to run tests in README. Patch provided by t-msn Differential Revision: https://reviews.llvm.org/D122645	2023-07-11 15:00:40 +02:00
Michael Halkenhaeuser	142faf56f5	[OpenMP] [OMPT] [amdgpu] [5/8] Implemented device init/fini/load callbacks Added support in the generic plugin to invoke registered callbacks. Depends on D124070 Patch from John Mellor-Crummey <johnmc@rice.edu> (With contributions from Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>) Differential Revision: https://reviews.llvm.org/D124652	2023-07-11 07:13:22 -04:00
Carlos Eduardo Seo	d9a2b83dcd	[OpenMP] Fix note section type notation for AArch64 Like Arm, AArch64 also uses "%" instead of "@" for note section types. Differential Revision: https://reviews.llvm.org/D154859	2023-07-10 17:21:54 +00:00
Shao-Ce SUN	048423702d	[OpenMP] Fix build warnings ``` llvm-project/openmp/libomptarget/src/private.h:260:9: warning: 'DEBUG_PREFIX' macro redefined [-Wmacro-redefined] #define DEBUG_PREFIX GETNAME(TARGET_NAME) ^ llvm-project/openmp/libomptarget/include/ompt_device_callbacks.h:22:9: note: previous definition is here #define DEBUG_PREFIX "OMPT" ^ 1 warning generated. ``` ``` llvm-project/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:458:14: warning: moving a local object in a return statement prevents copy elision [-Wpessimizing-move] return std::move(Err); ^ llvm-project/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:458:14: note: remove std::move call here return std::move(Err); ^~~~~~~~~~ ~ llvm-project/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:552:12: warning: moving a local object in a return statement prevents copy elision [-Wpessimizing-move] return std::move(Err); ^ llvm-project/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:552:12: note: remove std::move call here return std::move(Err); ^~~~~~~~~~ ~ 2 warnings generated. ``` Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D154787	2023-07-09 22:12:23 +08:00
Elliot Goodrich	a11efd4926	Add missing StringExtras.h includes In preparation for removing the `#include "llvm/ADT/StringExtras.h"` from the header to source file of `llvm/Support/Error.h`, first add in all the missing includes that were previously included transitively through this header. This is fixing all files missed in `b0abd4893f` and `39d8e6e22c`. Differential Revision: https://reviews.llvm.org/D154763	2023-07-08 20:06:21 +01:00
Joachim Jenke	820be30ad9	[OpenMP][OMPT] Introduce VERBOSE_INIT in ompt-multiplex.h OpenMP 5.1 added OMP_TOOL_VERBOSE_INIT. This env variable is extremely helpful to understand the issue when loading a tool fails unexpectedly (e.g., errors from dlopen, when the libc available at runtime is older than libc used at compile time of the tool -> missed to load the right gcc module). This patch replicates the verbose init code from libomp watching out for a different env variable. Similar to CLIENT_TOOL_LIBRARIES_VAR, a tool can define the name of the env var by defining CLIENT_TOOL_VERBOSE_INIT_VAR before including ompt-multiplex.h. Alternatively, a tool can define OMPT_MULTIPLEX_TOOL_NAME to specify the tool name which will be the prefix for both _TOOL_LIBRARIES and _VERBOSE_INIT var. Finally, if none of the two macros is defined, the header will print a compiler warning and look at OMP_TOOL_VERBOSE_INIT. Patch prepared by Semih Burak Differential Revision: https://reviews.llvm.org/D112809	2023-07-08 17:09:57 +02:00
Joseph Huber	e526a7fc15	[Libomptarget][NFC] Clean up warnings and format	2023-07-07 18:59:26 -05:00
Joseph Huber	b83e29027c	[Libomptarget] Fix tests only including the LTO variant Summary: These were overriding rather than appending. Fix that.	2023-07-07 16:24:27 -05:00
Martin Storsjö	f105c1dc58	[OpenMP] Remove the workaround of passing "-x assembler-with-cpp" manually By building the assembly with language ASM now (since `4072c8aee4` and `cbaa3597aa`), this shouldn't be needed any longer. Differential Revision: https://reviews.llvm.org/D150701	2023-07-07 23:32:27 +03:00
Joseph Huber	338c80516b	[Libomptarget] Refine logic for determining if we support RPC Summary: Add a requirement for the GPU libc to only be on if its enabled explicitly. Fix the logic around the pythonification of the variable.	2023-07-07 14:06:58 -05:00
Joseph Huber	d3748d942a	[Libomptarget] Fix test logic for optionally adding the libcgpu.a Summary: This was not operating as expected and was causing the build to fail on non-configured systems.	2023-07-07 12:49:50 -05:00
Joseph Huber	691dc2d10d	[Libomptarget] Begin implementing support for RPC services This patch adds the intial support for running an RPC server in libomptarget to handle host services. We interface with the library provided by the `libc` project to stand up a basic server. We introduce a new type that is controlled by the plugin and has each device intialize its interface. We then run a basic server to check the RPC buffer. This patch does not fully implement the interface. In the future each plugin will want to define special handlers via the interface to support things like malloc or H2D copies coming from RPC. We will also want to allow the plugin to specify t he number of ports. This is currently capped in the implementation but will be adjusted soon. Right now running the server is handled by whatever thread ends up doing the waiting. This is probably not a completely sound solution but I am not overly familiar with the behaviour of OpenMP tasks and what would be required here. This works okay with synchrnous regions, and somewhat fine with `nowait` regions, but I've observed some weird behavior when one of those regions calls `exit`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D154312	2023-07-07 12:36:46 -05:00
Joachim Jenke	124d36e093	[OpenMP][OMPT] Change OMPT kind for OpenMP test lock functions The OpenMP specification mentions that omp_test_lock and omp_test_nest_lock dispatch OMPT callbacks with ompt_mutex_test_lock and ompt_mutex_test_nest_lock for their kind respectively. Previously, the values ompt_mutex_lock and ompt_mutex_nest_lock were used. This could cause issues in application relying on the kind to correctly determine lock states. This commit changes the kind to the expected ones. Also update callback.h and OMPT tests to reflect this change. Patch prepared by Thyre Differential Review: https://reviews.llvm.org/D153028 Differential Review: https://reviews.llvm.org/D153031 Differential Review: https://reviews.llvm.org/D153032	2023-07-07 14:49:47 +02:00
Joachim Jenke	d679c904c2	[OpenMP][OMPT] Rename callback master to masked in ompt-multiplex.h OpenMP 5.1 replaced callback ompt_callback_master_t by ompt_callback_masked_t. In order to stick to the standard, the implementation is updated accordingly. Patch prepared by Semih Burak Differential Revision: https://reviews.llvm.org/D112798	2023-07-07 14:01:40 +02:00
Joachim Jenke	94ec997521	[OpenMP][OMPT] Add two missing nullpointer checks in ompt-multiplex.h In the functions ompt_multiplex_get_own_ompt_data and ompt_multiplex_get_client_ompt_data in addition to data being NULL, also the void pointer field "ptr" of "data" could be NULL, leading to a subsequent segfault. This patch add the corresponding checks. Patch prepared by Semih Burak Differential Revision: https://reviews.llvm.org/D112806	2023-07-07 14:01:39 +02:00
Joachim Jenke	73d411d1b2	[OpenMP][Tools] Add omp_all_memory support for Archer The semantic of depend(out:omp_all_memory) is quite similar to taskwait in that it separates all tasks (with dependency) created before an all_memory-task from all tasks (with dependency) created after an all_memory-task. Only a single of such tasks can execute at a time. Similar to taskwait, we have a CV (AllMemory[1]) in the generating task to express the dependency sink semantic of an all_memory-task. In addition, AllMemory[0] describes the dependency source semantic of an all_memory-task. All tasks with dependency create an HB-arc towards the sink and terminate an HB-arc from the source. Since we expect that not many applications will use such dependency, the support for handling the synchronization semantic is off by default and can be turned on using ARCHER_OPTION="all_memory=1". The most costly part is the precautionary posting of an HB-arc towards the sink, which represents a potentially contentious write from all concurrently executing sibling tasks. A warning is printed at runtime, when the option is off while such dependency is observed. In most cases the lazy activation will still lead to false alerts. Differential Revision: https://reviews.llvm.org/D111895	2023-07-07 13:55:46 +02:00
Joachim Jenke	6ef16f2618	[OpenMP] Add OMPT support for omp_all_memory task dependence omp_all_memory currently has no representation in OMPT. Adding new dependency flags as suggested by omp-lang issue #3007. Differential Revision: https://reviews.llvm.org/D111788	2023-07-07 13:44:53 +02:00
Jonathan Peyton	05e2bc25e8	[OpenMP] Ensure socket layer is not first in CPUID topology detection * Return 0 length topology if socket layer is detected first * Fix DEBUG ASSERT	2023-07-06 12:35:34 -05:00
Jonathan Peyton	2d02988f74	[OpenMP] Remove gcc-12 warnings from libomp	2023-07-06 11:47:45 -05:00
Joseph Huber	b420e0ed27	[Libomptarget] Disable the 'mapping/prelock.cpp' test on AMDGPU Summary: This test was not functional on the new plugins, now that the old ones have been deleted it doesn't work. Disable until we get a fix.	2023-07-06 11:45:18 -05:00
Joseph Huber	071c8a41cc	[Libomptarget] Fix tests after deleting the next-gen plugins The next-gen plugins didn't correctly configure tests and were never actually being run. Since deleting the old plugin we stopped getting `libomptarget` tests. This patch fixes the issue and allows the targets to be built Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154619	2023-07-06 10:44:50 -05:00
Joseph Huber	e90ab9148b	[OpenMP] Delete old plugins It's time to remove the old plugins as the next-gen has already been set to default in LLVM 16. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D142820	2023-07-05 17:39:47 -05:00
Joseph Huber	70c08dbcfb	[Libomptarget] Remove the remote and ve plugins from libomptarget These plugins are unmaintained and are not in a workable state. The VE plugin has not been touched for years and has never had any running tests. The remote plugin is in an unfinished state and is not production ready upstream. These will need to be ported to the new nextgen interface in the future if they are needed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D154548	2023-07-05 17:39:46 -05:00
Nawrin Sultana	50a95e3e6b	[OpenMP] Minor improvement in error msg and fixes few coverity reported issues Differential Revision: https://reviews.llvm.org/D152289	2023-07-05 12:07:51 -05:00
Joseph Huber	33859fb962	[Libomptarget][Obvious] Missing comma on enum	2023-07-04 22:01:03 -05:00
Joseph Huber	ec39b35178	[Libomptarget] Add missing HSA agent info enumeration Summary: This was not added to dynamic_hsa.h	2023-07-04 21:55:49 -05:00
Joseph Huber	6764301a6b	[Libomptarget] Correctly implement `getWTime` on AMDGPU AMDGPU provides a fixed frequency clock since some generations back. However, the frequency is variable by card and must be looked up at runtime. This patch adds a new device environment line for the clock frequency so that we can use it in the same way as NVPTX. This is the correct implementation and the version in ASO should be replaced. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D154456	2023-07-04 21:50:43 -05:00
Joseph Huber	18a6ccea3a	[Libomptarget] Fix misused macro name preventing printing of library name Summary: This code used `LIBOMPTARGET_DEBUG` which is not the macro name, but the environment variable. This caused this portion to always be disabled. In the long run we should aim for this to always be availible as it's useful for other diagnostic message.	2023-07-04 08:00:27 -05:00
Joel E. Denny	6e127c6f29	[OpenMP] libomptarget: Don't map alignment padding to host In the case of partially mapped structs, libomptarget sometimes adds padding to device allocations to ensure they are aligned properly. However, without this patch, it considers that padding to be mapped to the host, which can cause presence checks (e.g., `omp_target_is_present` or a `present` modifier) to misbehave for unmapped parts of the struct. This patch keeps the padding but treats it as unmapped. See the new test case for examples. Reviewed By: grokos, jdoerfert Differential Revision: https://reviews.llvm.org/D149685	2023-07-03 10:23:38 -04:00
Dhruva Chakrabarti	6a1d1f7eef	[OpenMP] Added memory scope to atomic::inc API and used the device scope in reduction. With https://reviews.llvm.org/D137524, memory scope and ordering attributes are being used to generate the required instructions for atomic inc/dec on AMDGPU. This patch adds the memory scope attribute to the atomic::inc API and uses the device scope in reduction. Without the device scope in atomic_inc, the default system scope leads to unnecessary L2 write-backs/invalidates. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D154172	2023-06-30 15:05:01 -04:00
Joseph Huber	968f65ae03	[OpenMP] Adjust using the NVPTX architecture detection tool A previous patch by @arsenm adjusted these to find the `amdgpu-arch` tool correctly if we do a `LLVM_ENABLE_PROJECTS` build. This patch applies the same to `nvptx-arch` tool to keep it consistent. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D154107	2023-06-29 12:14:44 -05:00
Ethan Luis McDonough	341c3cf78c	[flang][openmp] Fortran offloading test Flang currently supports offloading for AMD GPUs. This patch establishes a test structure for Fortran offloading tests in libomptarget. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D148778	2023-06-28 15:15:32 -05:00
Matt Arsenault	17f564f305	OpenMP: Revert accidental cmake change to make amdgpu-arch errors fatal I still think this should be done but should be done separately.	2023-06-28 07:33:27 -04:00
Matt Arsenault	7c3fa755f1	OpenMP/cmake: Use TARGET instead of looking for amdgpu-arch Not sure if the standalone build case is supposed to be a supported path. Should probably rely on find_package and imported targets anyway.	2023-06-28 06:55:15 -04:00
Job Noorman	8de9f2b558	Move SubtargetFeature.h from MC to TargetParser SubtargetFeature.h is currently part of MC while it doesn't depend on anything in MC. Since some LLVM components might have the need to work with target features without necessarily needing MC, it might be worthwhile to move SubtargetFeature.h to a different location. This will reduce the dependencies of said components. Note that I choose TargetParser as the destination because that's where Triple lives and SubtargetFeatures feels related to that. This issues came up during a JITLink review (D149522). JITLink would like to avoid a dependency on MC while still needing to store target features. Reviewed By: MaskRay, arsenm Differential Revision: https://reviews.llvm.org/D150549	2023-06-26 11:20:08 +02:00
Shao-Ce SUN	f042890521	[openmp] remove initializeRewriteSymbolsLegacyPassPass Fix build error caused by D153679 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D153704	2023-06-25 00:35:01 +08:00
Matt Arsenault	6e94a9bf54	Revert "OpenMP/cmake: Use list append instead of repeating variable name" This reverts commit `e429fdd036`.	2023-06-23 15:44:05 -04:00
Matt Arsenault	a2f5bcc766	OpenMP/cmake: Use DEPFILE instead of IMPLICIT_DEPENDS IMPLICIT_DEPENDS doesn't actually work with ninja and this does.	2023-06-23 15:25:10 -04:00
Matt Arsenault	e429fdd036	OpenMP/cmake: Use list append instead of repeating variable name	2023-06-23 15:25:10 -04:00
Carlos Eduardo Seo	556b563ae0	[OpenMP] Disable some tests for AArch64 Like for X86, some of the tests also need to be disabled for AArch64. Differential Revision: https://reviews.llvm.org/D153312	2023-06-20 19:00:07 +00:00
Adrian Munera	028cf8c016	[OpenMP] Implement printing TDGs to dot files This patch implements the "__kmp_print_tdg_dot" function, that prints a task dependency graph into a dot file containing the tasks and their dependencies. It is activated through a new environment variable "KMP_TDG_DOT" Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D150962	2023-06-19 08:27:38 -05:00
Animesh Kumar	0c6f2f629c	[OpenMP] Update the default version of OpenMP to 5.1 The default version of OpenMP is updated from 5.0 to 5.1 which means if -fopenmp is specified but -fopenmp-version is not specified with clang, the default version of OpenMP is taken to be 5.1. After modifying the Frontend for that, various LIT tests were updated. This patch contains all such changes. At a high level, these are the patterns of changes observed in LIT tests - # RUN lines which mentioned `-fopenmp-version=50` need to kept only if the IR for version 5.0 and 5.1 are different. Otherwise only one RUN line with no version info(i.e. default version) needs to be there. # Test cases of this sort already had the RUN lines with respect to the older default version 5.0 and the version 5.1. Only swapping the version specification flag `-fopenmp-version` from newer version RUN line to older version RUN line is required. # Diagnostics: Remove the 5.0 version specific RUN lines if there was no difference in the Diagnostics messages with respect to the default 5.1. # Diagnostics: In case there was any difference in diagnostics messages between 5.0 and 5.1, mention version specific messages in tests. # If the test contained version specific ifdef's e.g. "#ifdef OMP5" but there were no RUN lines for any other version than 5.X, then bring the code guarded by ifdef's outside and remove the ifdef's. # Some tests had RUN lines for both 5.0 and 5.1 versions, but it is found that the IR for 5.0 is not different from the 5.1, therefore such RUN lines are redundant. So, such duplicated lines are removed. # To generate CHECK lines automatically, use the script llvm/utils/update_cc_test_checks.py Reviewed By: saiislam, ABataev Differential Revision: https://reviews.llvm.org/D129635 (cherry picked from commit 9dd2999907dc791136a75238a6000f69bf67cf4e)	2023-06-15 12:41:09 +05:30
Shilei Tian	375862b481	[OpenMP] Fix the issue in openmp/runtime/test/parallel/bug63197.c If the system has 32 threads, then the test will fail because of partial match.	2023-06-14 12:23:37 -04:00
Shilei Tian	b14dc71c5e	[OpenMP] Use 0 instead of false in the test bug63197.c	2023-06-14 11:51:51 -04:00
Shilei Tian	85592d3d4d	[OpenMP] Fix the issue where `num_threads` still takes effect incorrectly This patch fixes the issue that, if we have a compile-time serialized parallel region (such as `if (0)`) with `num_threads`, followed by a regular parallel region, the regular parallel region will pick up the value set in the serialized parallel region incorrectly. The reason is, in the front end, if we can prove a parallel region has to serialized, instead of emitting `__kmpc_fork_call`, the front end directly emits `__kmpc_serialized_parallel`, body, and `__kmpc_end_serialized_parallel`. However, this "optimization" doesn't consider the case where `num_threads` is used such that `__kmpc_push_num_threads` is still emitted. Since we don't reset the value in `__kmpc_serialized_parallel`, it will affect the next parallel region followed by it. Fix #63197. Reviewed By: tlwilmar Differential Revision: https://reviews.llvm.org/D152883	2023-06-14 11:46:12 -04:00
Joel E. Denny	5df492302e	[OpenMP] Fix --libomptarget-nvptx-bc-path in tests After D151324, which landed as `349c0aacb3`, many libomptarget non-LTO nvptx64 tests fail with errors like: ``` clang: error: bitcode library '/tmp/llvm-project/build/runtimes/runtimes-bins/openmp/libomptarget/libomptarget-nvptx-sm_70.bc' does not exist ``` This patch updates the bc path. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D152462	2023-06-08 15:09:35 -04:00
Johannes Doerfert	cb17c48fdd	[Attributor] Identify and remove no-op fences The logic and implementation follows the removal of no-op barriers. If the fence is not making updates visible, either to the world or the current thread, it is not needed. Said differently, the fences we remove do not establish synchronization (happens-before) edges. This allows us to eliminate some of the regression caused by: https://reviews.llvm.org/D145290	2023-06-05 17:14:00 -07:00
Johannes Doerfert	6629a96a8c	[OpenMP] Improve default block count selection fow low block counts If a combined loop has insufficient parallelism (= low trip count), we might end up with too few teams/blocks. To counter that we can reduce the number of threads per team we use. This patch implements a heuristic and exposes a new environment variable to control the minimum of threads to be employed in this case. Issue reported by: Felipe Cabarcas Jaramillo <cabarcas@udel.edu> (@fel-cab). Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D152014	2023-06-05 16:35:44 -07:00
Hansang Bae	bd46706b1f	[OpenMP][libomp] Allow white spaces in OMP_TARGET_OFFLOAD value Remove heading/trailing white spaces when matching OMP_TARGET_OFFLOAD value. Differential Revision: https://reviews.llvm.org/D149890	2023-06-05 17:41:54 -05:00
paperchalice	0beffb8542	[CMake] Ensure `CLANG_RESOURCE_DIR` is respected. re-commit of `39aa0f5c43` with missing file: cmake/Modules/GetClangResourceDir.cmake.	2023-06-03 04:21:35 -07:00
Martin Storsjö	d072d11022	Revert "[CMake] Ensure `CLANG_RESOURCE_DIR` is respected." This reverts commit `39aa0f5c43`. This is missing the new GetClangResourceDir.cmake that is being included, so all clang builds are broken.	2023-06-03 11:47:57 +03:00
paperchalice	39aa0f5c43	[CMake] Ensure `CLANG_RESOURCE_DIR` is respected.	2023-06-02 23:29:44 -07:00
Joel E. Denny	19841e4dca	[OpenMP] Fix transformed loop's var privacy Without this patch, the following example crashes Clang: ``` #pragma omp target map(i) #pragma omp tile sizes(2) for (i = 0; i < N; ++i) ; ``` This patch fixes the crash by changing `Sema::isOpenMPPrivateDecl` not to identify `i` as private just because it's the loop variable of a `tile` construct. While OpenMP TR11 and earlier do specify privacy for loop variables of loops generated from a `tile` construct, I haven't found text stating that the original loop variable must be private in the above example, so this patch leaves it shared. Even so, it is a bit unexpected that value of `i` after the loop is `N - 1` instead of `N`. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D151356	2023-06-02 12:18:13 -04:00
Joseph Huber	349c0aacb3	[OpenMP] Remove 'keep_alive' functionality from the device RTL The OpenMP DeviceRTL uses a hacky workaround to keep certain runtime calls alive. This used a function that prevented them from being optimized out. We needed this hack because the 'OpenMPOpt' pass likes to introduce new runtime calls into the TU. This then interacted badly with the method of linking the bitcode file per-TU like we do with Nvidia. The OpenMPOpt pass would then generate a runtime call to a function that was never linked in. This should not be a problem anymore because we unconditionally link in the `libomptarget.devicertl.a` runtime library. This should thus only extract symbols that are undefined. So, if we do end up with an unresolved reference it will be resolved by the static library. The downside to this is that if we are doing non-LTO NVPTX compilation that introduces one of these calls it will be linked outside the module and therefore provide the overhead of an external function call. However, removing this flag should make optimizing things easier. We will need to see if that performance is a problem. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D151324	2023-05-31 17:12:43 -05:00
Shilei Tian	319d5d99ca	[NFC][OpenMP] Remove unused variable `new_iv_saved` in `openmp/runtime/src/kmp_collapse.cpp`	2023-05-30 22:17:02 -04:00
Kazu Hirata	a82f2b2db3	Fix typos in documentation	2023-05-28 13:13:12 -07:00
Mark de Wever	cbaa3597aa	Reland "[CMake] Bumps minimum version to 3.20.0. This reverts commit `d763c6e5e2`. Adds the patch by @hans from https://github.com/llvm/llvm-project/issues/62719 This patch fixes the Windows build. `d763c6e5e2` reverted the reviews D144509 [CMake] Bumps minimum version to 3.20.0. This partly undoes D137724. This change has been discussed on discourse https://discourse.llvm.org/t/rfc-upgrading-llvms-minimum-required-cmake-version/66193 Note this does not remove work-arounds for older CMake versions, that will be done in followup patches. D150532 [OpenMP] Compile assembly files as ASM, not C Since CMake 3.20, CMake explicitly passes "-x c" (or equivalent) when compiling a file which has been set as having the language C. This behaviour change only takes place if "cmake_minimum_required" is set to 3.20 or newer, or if the policy CMP0119 is set to new. Attempting to compile assembly files with "-x c" fails, however this is workarounded in many cases, as OpenMP overrides this with "-x assembler-with-cpp", however this is only added for non-Windows targets. Thus, after increasing cmake_minimum_required to 3.20, this breaks compiling the GNU assembly for Windows targets; the GNU assembly is used for ARM and AArch64 Windows targets when building with Clang. This patch unbreaks that. D150688 [cmake] Set CMP0091 to fix Windows builds after the cmake_minimum_required bump The build uses other mechanism to select the runtime. Fixes #62719 Reviewed By: #libc, Mordante Differential Revision: https://reviews.llvm.org/D151344	2023-05-27 12:51:21 +02:00
Jennifer Yu	3108c1b811	skip test run on amdgcn-amd-amdhsa	2023-05-26 16:24:44 -07:00
Jennifer Yu	a419ec4f25	Fix runtime crash inside __kmpc_init_allocator It seems load of traits.addr should be passed in runtime call. Currently the load of load traits.addr gets passed cause runtime to fail. To fix this, skip the call to EmitLoadOfScalar for extra load. Differential Revision: https://reviews.llvm.org/D151576	2023-05-26 16:03:01 -07:00
Tobias Hieta	f98ee40f4b	[NFC][Py Reformat] Reformat python files in the rest of the dirs This is an ongoing series of commits that are reformatting our Python code. This catches the last of the python files to reformat. Since they where so few I bunched them together. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: jhenderson, #libc, Mordante, sivachandra Differential Revision: https://reviews.llvm.org/D150784	2023-05-25 11:17:05 +02:00
Doru Bercea	04609b09e9	Enable up to 64 arguments for outlined regions in OpenMP device code. Co-Author: Fabio Luporini <fabio@devitocodes.com> Review: https://reviews.llvm.org/D150134	2023-05-24 10:31:39 -04:00
Hansang Bae	2f6eba7516	[OpenMP][libomp] Implement KMP_DLSYM_NEXT on Windows The interop API routines try to invoke external entries, but we did not have support for KMP_DLSYM_NEXT on Windows. Also added proper guards for STUB build. Differential Revision: https://reviews.llvm.org/D149892	2023-05-24 08:33:34 -05:00
Joseph Huber	47800a12dc	[OpenMP][NFC] clang-format the OpenMP device runtime These files aren't fully formatted. I'm guessing this was a holdover from when `clang-format` was totally broken for OpenMP offloading. Format the files to be more consistent. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D151226	2023-05-23 11:19:09 -05:00
Jonathan Peyton	d67c91b5e7	[OpenMP] Insert missing variable update inside loop While loop within task priority code did not have necessary update of variable which could lead to hangs if two threads collided when both attempted to execute the compare_and_exchange. Fixes: https://github.com/llvm/llvm-project/issues/62867 Differential Revision: https://reviews.llvm.org/D151138	2023-05-23 09:19:04 -05:00
Joachim Jenke	fe7f620ed6	[OpenMP][Tests][NFC] Mark unsupported libomp tests for GCC This patch properly marks the support level for libomp test when testing with GCC. Some new OpenMP features were only introduced with GCC 11. Tests using the target construct are incompatibe with GCC. Tests pass now with GCC 10, 11, 12	2023-05-23 10:33:09 +02:00

... 3 4 5 6 7 ...

3236 Commits