This patch adds basic tests for the map clause on the target construct for
common blocks. More tests will be added in future patches. Currently
failing tests are added in a separate folder and marked XFAIL; they should
be moved as they are fixed.
The runtime needs to know about the acceptable launch bounds, especially
if the compiler (middle- or backend) assumed those bounds. While this
patch does not yet inform the runtime, it stores the bounds in a place
that can/will be accessed and is associated with the kernel.
The workaround code ensured we always call __kmpc_kernel_parallel, but it
did so in a racy manner, as the initialization might not have been
completed yet. To avoid introducing a sync, we move the workaround into
the deinit function for now.
Summary:
For the other tests we pass `-nogpulib` to ensure that we set up the
needed libraries correctly. However, this caused problems for the
non-LTO build and test on Nvidia systems, because there we do a separate
compile of the libomptarget device runtime and then link in that cubin.
This exercised the runtime in a lot of ways it is not used to, since doing
things this way was hardly expected or tested. This patch disables
`-nogpulib` only for the Nvidia non-LTO build so that we still get the
effect of `--libomptarget-nvptx-bc-path` rather than ignoring it.
Summary:
While this is technically a no-op for AMDGPU hardware, in cases where
the user would see fit to add an explicit wavefront sync on Nvidia
hardware, we should also inform the LLVM optimizer that this control
flow is convergent so we do not reorder blocks.
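As a sketch of the mechanism, Clang exposes this through the `convergent` function attribute; the declaration below is illustrative, not the DeviceRTL's actual one:

```c++
// Marking the sync convergent forbids LLVM from adding or reordering
// control dependencies around the call, e.g. by rearranging blocks.
extern "C" __attribute__((convergent)) void
synchronizeWarp(unsigned long long Mask); // illustrative name
```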
Summary:
This version is required to support the 'activemask' feature, which is
used to implement certain operations, such as reductions. This ties the
implementation of the DeviceRTL roughly to the features provided by the
CUDA 9.0 release, which should be sufficiently old as to not cause
problems, since this is a minor version jump that corresponds to the
release of `sm_53`.
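For context, a sketch of the kind of use this enables, using the standard CUDA 9.0-era device intrinsics (the function itself is illustrative, not DeviceRTL code):

```c++
// Opportunistic warp-level programming with activemask (CUDA 9.0+):
// discover which lanes are currently executing before cooperating.
__device__ unsigned countActiveLanes() {
  unsigned Mask = __activemask(); // PTX activemask.b32
  return __popc(Mask);            // number of currently-active lanes
}
```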
Summary:
This function needs `void` as the arguments to be ABI compatible with
what is actually defined. This is enforced when doing CUDA separable
linking of the runtime.
Summary:
The `llvm_omp_target_dynamic_shared_alloc` prototype in `omp.h`
accidentally left the void argument unspecified. This created unintended
code when called from the C language, causing some `nvlink` failures in
certain scenarios.
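The distinction only matters in C, where an empty parameter list leaves the arguments unspecified; a sketch:

```c++
// In C, the first declaration does not form a prototype, so calls are not
// checked against the definition's ABI; `(void)` makes it explicit.
void *llvm_omp_target_dynamic_shared_alloc();     // arguments unspecified in C
void *llvm_omp_target_dynamic_shared_alloc(void); // proper prototype
```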
TSan can show only line numbers on some platforms, e.g., SystemZ. Skip
checking the column numbers; line numbers should be enough to verify
that race detection is working.
The struct DEP defined in multiple testcases must correspond to the
runtime's struct kmp_depend_info. The former defines flags as int, and the
latter as kmp_uint8_t. This discrepancy goes unnoticed on little-endian
systems, but breaks big-endian ones.
Make flags in struct DEP unsigned char.
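A sketch of the corrected test-side struct (field names are illustrative of the testcases, not copied from the runtime header):

```c++
#include <cstddef>

// Must mirror the runtime's kmp_depend_info layout; a one-byte flags field
// lines up with the runtime's 8-bit flags on either endianness.
typedef struct DEP {
  size_t addr;         // base address of the dependence
  size_t len;          // length of the dependent range
  unsigned char flags; // was `int`, which only happened to work little-endian
} dep;
```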
structs kmp_depend_info.flags and kmp_tasking_flags contain bitfields,
which overlay integer flag values. The current bitfield definitions
target little-endian machines. On big-endian machines bitfields are laid
out in the opposite order, so the current definitions do not work there.
There are two ways to fix this: either provide big-endian bitfield
definitions, or bit-swap integer flag values. Go with the former, since
it's localized to one place and therefore is more maintainable.
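A sketch of what the big-endian definitions look like, assuming the compiler-provided `__BYTE_ORDER__` macros (fields abbreviated):

```c++
// Bitfields are allocated from the low end of the storage unit on
// little-endian targets and from the high end on big-endian ones, so the
// field order must be reversed for the integer overlay to keep its meaning.
typedef struct kmp_tasking_flags {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  /* ... remaining fields, declared in reverse order ... */
  unsigned final : 1;
  unsigned tiedness : 1;
#else
  unsigned tiedness : 1;
  unsigned final : 1;
  /* ... remaining fields in the original order ... */
#endif
} kmp_tasking_flags_t;
```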
hsa_amd_memory_async_copy can handle device-to-device copies if passed
the corresponding parameters.
No functional change: currently the D2D copy goes through a fallback in
libomptarget that stages through a host malloc; after this patch it goes
directly through HSA.
This works in exactly the situations where HSA works. Verified locally on
a performance benchmark. Hoping to attract further testing from internal
developers after it lands.
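The call itself, sketched as a small wrapper (the signature is the one from HSA's `hsa_ext_amd.h`; the wrapper and header paths are assumptions):

```c++
#include <hsa/hsa.h>
#include <hsa/hsa_ext_amd.h>

// Device-to-device copy: both source and destination agents are GPUs. The
// copy runs asynchronously and fires `Done` once it completes.
hsa_status_t copyD2D(void *Dst, hsa_agent_t DstAgent, const void *Src,
                     hsa_agent_t SrcAgent, size_t Size, hsa_signal_t Done) {
  return hsa_amd_memory_async_copy(Dst, DstAgent, Src, SrcAgent, Size,
                                   /*num_dep_signals=*/0,
                                   /*dep_signals=*/nullptr, Done);
}
```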
The patch contains a basic BumpAllocator for (AMD)GPUs to allow us to
run more tests. The allocator implements `malloc`, both internally and
externally, while we continue to default to the NVIDIA `malloc` when we
target NVIDIA GPUs. Once we have smarter or customizable allocators we
should revisit this choice; for now, this allocator is better than
none. It traps when it runs out of memory, making it easy to debug. The
heap size is configured via `LIBOMPTARGET_HEAP_SIZE` and defaults to 512MB.
Allocation statistics can be tracked via
`LIBOMPTARGET_DEVICE_RTL_DEBUG=8` (together with
`-fopenmp-target-debug=8`). Two tests were added, and one was enabled.
This is the next step towards fixing
https://github.com/llvm/llvm-project/issues/66708
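A bump allocator of this kind is conceptually small; the sketch below is illustrative, not the DeviceRTL's code:

```c++
#include <cstdint>

// One atomically-advanced offset over a fixed heap: no free list, no reuse,
// and a trap on exhaustion so out-of-memory is easy to spot (as noted above).
struct BumpAllocator {
  uintptr_t Current; // next free byte
  uintptr_t End;     // one past the end of the heap

  void *allocate(uint64_t Size) {
    Size = (Size + 15) & ~uint64_t(15); // keep 16-byte alignment
    uintptr_t Old = __atomic_fetch_add(&Current, Size, __ATOMIC_RELAXED);
    if (Old + Size > End)
      __builtin_trap(); // out of memory
    return reinterpret_cast<void *>(Old);
  }
};
```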
Summary:
We should not rely on a VLA in C++ for the handling of this string. The
size is a true runtime value, so we cannot rely on constexpr handling. We
simply use a small vector, whose default inline size is most likely large
enough to keep the output on the stack, but which is safe in cases where
it is not.
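The pattern, as a sketch (function and variable names are illustrative):

```c++
#include "llvm/ADT/SmallVector.h"

void formatMessage(size_t Size) {
  // Before: char Buffer[Size]; // a VLA; Size is runtime, so not valid C++
  llvm::SmallVector<char, 256> Buffer(Size); // inline storage when small,
                                             // heap allocation when Size > 256
  // ... write the string into Buffer.data() ...
}
```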
Implement a slow-path version of omp_target_memset*()
There is a TODO to implement a fast path that uses an on-device
kernel instead of the host-based memory fill operation. This may
require some additional plumbing to have kernels in libomptarget.so.
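A sketch of such a slow path (the helper is illustrative; `omp_target_memcpy` is the standard OpenMP API):

```c++
#include <omp.h>
#include <cstdlib>
#include <cstring>

// Fill N bytes of device memory by staging a filled host buffer and copying
// it over; a device-side kernel would avoid the extra allocation and copy.
void *slowPathMemset(void *Ptr, int C, size_t N, int DeviceNum) {
  void *Staging = std::malloc(N);
  if (!Staging)
    return nullptr;
  std::memset(Staging, C, N);
  omp_target_memcpy(Ptr, Staging, N, /*dst_offset=*/0, /*src_offset=*/0,
                    DeviceNum, omp_get_initial_device());
  std::free(Staging);
  return Ptr;
}
```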
Summary:
The `libcgpu.a` file provides its own implementation of `__assert_fail`.
This adds a test to make sure it's usable in OpenMP offloading as
expected. Currently this requires linking `libcgpu.a` before the OpenMP
device RTL, however. We also disable the test on the CPU, as the format of
the string will be different.
Summary:
We use `malloc` internally in the DeviceRTL to handle data
globalization. If this is undefined it will map to the Nvidia
implementation of `malloc` for NVPTX and return `nullptr` for AMDGPU.
This is somewhat problematic, because when using this as a shared
library it causes us to always extract the GPU libc implementation,
which uses RPC and thus requires an RPC server. Making this `weak`
allows us to implement this internally without worrying about binding to
the GPU `libc` implementation.
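The mechanism, sketched (the internal fallback is illustrative):

```c++
#include <cstdint>

// Illustrative internal allocator; stands in for whatever the DeviceRTL
// uses for data globalization.
void *internalGlobalizationAlloc(uint64_t Size);

// A weak definition loses to any strong `malloc` linked in (for example the
// GPU libc's); when nothing else is linked, this fallback is used instead.
extern "C" __attribute__((weak)) void *malloc(uint64_t Size) {
  return internalGlobalizationAlloc(Size);
}
```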
Summary:
The `libcgpu.a` library was added to support certain libc functions. A
recent patch made us pass its location directly on the commandline;
however, it used `find_library`. This doesn't work because CMake's
ordering might run `find_library` before it builds the library we're
trying to find. This patch changes this to just use the destination we
know it will end up in and checks it manually.
Summary:
The recent patch added `-nogpulib` to make these tests only pick up what
was intentionally put into them. This had the effect of removing the
dependency on the ROCm device libs, which are needed for math. This patch
disables the complex math test, which is the only one that needed them,
for the time being. In the future we will implement these and provide them
via the GPU `libm`, passed in the same way as the GPU `libc`.
Summary:
We have tests that depend on two static libraries
`libomptarget.devicertl.a` and `libcgpu.a`. These are currently
implicitly picked up and searched through the standard path. This patch
changes that to pass `-nogpulib` to disable implicit runtime path
searches. We then explicitly pass the built libraries to the
compilations so that we know exactly which libraries are being used.
Depends on: https://github.com/llvm/llvm-project/pull/68220
Fixes https://github.com/llvm/llvm-project/issues/68141
This patch applies weak linkage to the config globals named
`__omp_rtl...`. This is because when passing `-nogpulib` we will not
link in or create these globals. This allows the OpenMP device RTL to be
self-contained without requiring the additional definitions from the
`clang` compiler. In the standard case this should not affect the
current behavior, because the strong definition coming from the
compiler should always override the weak definition we default to here.
In the case that these are not defined by the compiler, they will
remain weak. This will impact optimizations somewhat, but the previous
behavior was that it would not link at all, so that is an improvement.
Depends on: https://github.com/llvm/llvm-project/pull/68215
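In effect the DeviceRTL now carries defaults along these lines (a sketch; the exact set of globals, their types, and their values are not reproduced here):

```c++
#include <cstdint>

// Weak defaults: the strong definitions clang normally emits override these;
// under -nogpulib nothing overrides them and the weak values are used.
extern "C" {
[[gnu::weak]] uint32_t __omp_rtl_debug_kind = 0;
[[gnu::weak]] uint32_t __omp_rtl_assume_no_thread_state = 0;
}
```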
Summary:
Weak symbols are supposed to have the semantics that they can be
overridden by a strong (i.e. global) definition. This wasn't being
respected by the LTO pass, because we simply used the first definition
that was available. This patch fixes that logic by doing a first pass
over the symbols to check for strong resolutions that could override a
weak one.
A lot of fake linker logic is ending up in the linker wrapper. If there
were an option to handle this in `lld` it would be a lot cleaner, but
unfortunately supporting NVPTX is a big restriction as their binaries
require the `nvlink` tool.
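The fixed resolution amounts to a two-pass scan; the sketch below uses illustrative types and helpers, not the linker wrapper's actual code:

```c++
#include <vector>

// Illustrative stand-in for the wrapper's symbol representation.
struct Symbol {
  bool Weak;
  bool isWeak() const { return Weak; }
};

const Symbol *resolve(const std::vector<Symbol> &Defs) {
  // Pass 1: is there any strong (non-weak) definition?
  bool HasStrongDef = false;
  for (const Symbol &Sym : Defs)
    if (!Sym.isWeak())
      HasStrongDef = true;

  // Pass 2: take the first definition a strong symbol cannot override.
  for (const Symbol &Sym : Defs)
    if (!(Sym.isWeak() && HasStrongDef))
      return &Sym; // first strong def, or first weak one if none are strong
  return nullptr;
}
```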
At the moment, for the device, a reference pointer is generated in place of
the original declare target global value; this reference pointer is the
pointer that actually receives the data. In Clang the original global
value isn't generated for the device, just the reference pointer.
Unfortunately for Flang/MLIR this is currently not the case, as the
declare target attribute is processed after the creation of the global,
so we effectively end up with a dead global on the device after rewriting
its uses to the new device reference pointer.
It appears I was a little overzealous with the deletion of the declare
target globals for the device. The current method breaks in cases where the
same declare target global is used across two target regions (a runtime
reproducer is added in the patch), as it will effectively delete the
global before the second target region gets a chance to be written to
LLVM IR and have its uses rewritten.
I'd like to remove this deletion, as the dead global isn't breaking any
code and will likely be removed by later dead code elimination passes;
the original approach was perhaps a little too heavy-handed.
Summary:
Previously this test hung indefinitely on NVPTX. This was due to an
issue, fixed previously, where we would wait indefinitely inside the CUDA
runtime for the kernel to complete if it was blocked on the RPC
server. This patch enables this test again now that it can run without
deadlocking, at least on CUDA 12.2.
Summary:
The RPC server is responsible for providing host services from the GPU.
Generally, the client running on the GPU will spin in place until the
host checks the server. Inside the runtime, we elected to have the user
thread do this checking while it would be otherwise waiting for the
kernel to finish. However, for Nvidia this caused problems when
offloading to a target region that requires a copy back.
This is caused by the implementation of `dataRetrieve` on Nvidia. We
initialize an asynchronous copy-back on the same stream that the kernel
is running on. This creates an implicit sync on the kernel to finish
before we issue the D2H copy, which we then wait on. This implicit sync
happens inside of the CUDA runtime. This is problematic when running the
RPC server because we need someone to check the RPC server. If no one
checks the RPC server then the kernel will never finish, meaning that
the memcpy will never be issued and the program hangs. This patch adds
an explicit check for unfinished work on the stream and waits for it to
complete.
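The shape of the fix, sketched against the CUDA driver API (the RPC handler call is illustrative):

```c++
#include <cuda.h>

// Illustrative: check and service pending RPC requests from the GPU client.
void handleRPCServer();

// Instead of letting the D2H copy's implicit kernel sync block inside the
// CUDA runtime, poll the stream and run the RPC server until it drains.
void waitAndServe(CUstream Stream) {
  while (cuStreamQuery(Stream) == CUDA_ERROR_NOT_READY)
    handleRPCServer();
}
```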
Summary:
Normally, the implementation of `puts` simply writes a newline
character after printing the string. However, because the GPU does
everything in batches of the SIMT group size, this ends up with very
poor output where you get the strings printed and then 1-64 newline
characters all in a row. Optimizations like to turn `printf` calls into
`puts`, so it's a good idea to make this produce the expected output.
The least invasive way I could do this was to add a new opcode. It's a
little bloated, but it avoids an unnecessary and slow send operation to
configure this.
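Conceptually, the client side becomes a single send per string; the opcode and helper names below are hypothetical, not the RPC interface's real ones:

```c++
#include <cstdint>

// Hypothetical opcode and transport helper; the real names differ.
enum Opcode : uint16_t { RPC_PUTS_WITH_NEWLINE = 0x100 };
int rpcSend(Opcode Op, const void *Data, uint64_t Size);

// One opcode carries "print this string, then exactly one newline", so the
// server appends one newline per string rather than each lane sending one.
int puts(const char *Str) {
  uint64_t Len = 0;
  while (Str[Len]) // strlen without pulling in the full C library
    ++Len;
  return rpcSend(RPC_PUTS_WITH_NEWLINE, Str, Len);
}
```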
This patch adds initial lowering for DeclareTargetAttr on GlobalOps,
utilising registerTargetGlobalVariable and getAddrOfDeclareTargetVar
from the OMPIRBuilder.

It also adds initial processing of declare target map operands,
populating the combinedInfo that the OMPIRBuilder requires to generate
kernels and its kernel argument structure.

The combination of these additions allows simple mapping of declare
target globals to target regions, so a simple runtime test showcasing
and exercising this has been added.

The patch currently does not factor in filtering based on device_type
clauses (e.g. no emission of globals for device if host is specified);
this will come in a future iteration. For the moment it has only been
tested with 1-D arrays and basic Fortran data types; more complex types
(such as user-defined derived types from Fortran, allocatables, or
Fortran pointers) may need further work.
reviewers: kiranchandramohan, skatrak
Differential Revision: https://reviews.llvm.org/D149368
VE supports up to 64 threads per VE process, so we limit the number
of threads with KMP_MAX_NTH. We also modify the __kmp_sys_max_nth
initialization to use KMP_MAX_NTH as a limit.
This will make it easy for callers to see issues with, and fix up calls
to, createTargetMachine after a future change to the parameters of
TargetMachine.
This matches other nearby enums.
For downstream users, this should be a fairly straightforward
replacement, e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
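Applied to code, the substitutions above look like:

```c++
#include "llvm/Support/CodeGen.h"
using namespace llvm;

// The mechanical rename, per the substitutions above:
CodeGenOptLevel OL = CodeGenOptLevel::Aggressive; // was CodeGenOpt::Aggressive
CodeGenFileType FT = CodeGenFileType::ObjectFile; // was CGFT_ObjectFile
```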
The /tmp fallback for /dev/shm did not write to a fixed filename, so
multiple instances of the runtime would not be able to detect each other.
Now, we create the /tmp file in much the same way as the /dev/shm file was
created, since the mkstemp approach would not work to create a file that
other instances of the runtime would detect.

Also, add the environment variable method as a third fallback after
/dev/shm and /tmp for library registration, as some systems have neither.
Also, add the ability to fall back to a subsequent method should a failure
occur during any part of the registration process. When unregistering, it
is assumed that the method chosen during registration should work, so
errors at that point are ignored; this also avoids a problem with multiple
threads trying to unregister the library.
This commit removes an optimization that skips the initialization of the
reduction struct if the number of threads in a team is 1. This
optimization caused a bug with Hidden Helper Threads: when the task group
is initially initialized by the master thread but a Hidden Helper Thread
executes a target nowait region, it requires the reduction struct
initialization to properly accumulate the data.

This commit also adds a LIT test for issue #57522 to ensure that the
issue is properly addressed and that the optimization removal does not
introduce any regressions.
Fixes: #57522