llvm-capstone

mirror of https://github.com/capstone-engine/llvm-capstone.git synced 2024-10-07 02:43:57 +00:00

Author	SHA1	Message	Date
Dominik Adamski	d87a53a960	[NFC][OpenMP][Flang] Add test for OpenMP target parallel do (#77776 ) Added test which proves that end-to-end compilation of `omp target parallel do` construct is successful for Flang compiler.	2024-01-18 15:26:39 +01:00
Paul Osmialowski	d5b2e41e20	[OpenMP][omp_lib] Restore compatibility with more restrictive Fortran compilers (#77780 ) The most recent changes to `omp_lib.h.var` have re-introduced some compatibility issues that had to be fixed due to the similar changes in the past. Namely: 1. D120707 has removed the "use omp_lib_kinds" statement and replaced it with import 2. D114537 added line continuation to the long lines This patch introduces the same kind of changes in order to restore compatibility with some more restrictive Fortran compilers so their users could still benefit from the LLVM's OpenMP Fortran library.	2024-01-18 11:06:24 +00:00
Alexandre Ganea	64874e5ab5	[openmp] Silence warnings when building the LLVM release with MSVC	2024-01-17 07:23:58 -05:00
Alexandre Ganea	c5bbf40d98	[openmp] Remove extra ';' outside of function Fixes: ``` [4038/11058] Building CXX object projects/openmp/libomptarget/src/CMakeFiles/omptarget.dir/OpenMP/InteropAPI.cpp.o /home/aganea/llvm-project/openmp/libomptarget/src/OpenMP/InteropAPI.cpp:202:2: warning: extra ';' outside of a function is incompatible with C++98 [-Wc++98-compat-extra-semi] }; ^ 1 warning generated. ```	2024-01-17 07:23:56 -05:00
Joseph Huber	89cdd48a22	[Libomptarget] Remove temporary files in AMDGPU JIT impl (#77980 ) Summary: This patch cleans up some of the JIT handling for AMDGPU as well as removing its temporary files. Previously these would be left in the temporary directory after the program was run. This costs some extra time, but the correct solution to avoid that is to create a sufficient entrypoint into `ld.lld` that we can simply pass a memory buffer into.	2024-01-15 19:03:19 -06:00
carlobertolli	93efa2b8b9	Revert "[OpenMP] Fix two usm tests for amdgpus." (#77983 ) Reverts llvm/llvm-project#77851	2024-01-12 15:01:49 -06:00
carlobertolli	3add9491cd	[OpenMP] Fix two usm tests for amdgpus. (#77851 ) Some are missing setting of HSA_XNACK=1 environment variable, used to enable unified memory support on amdgpu's when it's not been set at kernel boot time. Some others needed to be marked as supporting unified_shared_memory in the lit test harness.	2024-01-12 14:42:49 -06:00
Joseph Huber	ab02372c23	[OpenMP] Fix or disable NVPTX tests failing currently (#77844 ) Summary: This patch is an attempt to get a clean run of `check-openmp` running on an NVPTX machine. I simply took the lists of tests that failed on my `sm_89` machine and disabled them or fixed them. A lot of these tests are disabled on AMDGPU already, so it makes sense that NVPTX fails. The others are simply problems with NVPTX optimized debugging which will need to be fixed. I opened an issue on one of them.	2024-01-11 19:17:08 -06:00
Joseph Huber	37c1a5e3f5	[Libomptarget] Fix GPU Dtors referencing possibly deallocated image (#77828 ) Summary: The constructors and destructors look up a symbol in the ELF quickly to determine if they need to be run on the GPU. This allows us to avoid the very slow actions required to do the slower lookup using the vendor API. One problem occurs with how we handle the lifetime of these images. Right now there is no invariant to specify the lifetime of the underlying binary image that is loaded. In the typical case, this comes from the binary itself in the `.llvm.offloading` section, meaning that the lifetime of the binary should match the executable itself. This would work fine, if it weren't for the fact that the plugin is loaded via `dlopen` and can have a teardown order out of sync with the main executable. This was likely what was occurring when this failed on some systems but not others. A potential solution would be to simply copy images into memory so the runtime does not rely on external references. Another would be to manually zero these out after initialization so as to prevent this mistake from happening accidentally. The former has the benefit of making some checks easier, and allowing for constant initialization to be done on the ELF itself (normally we can't do this because writing to a constant section, e.g. .llvm.offloading is a segfault.). The downside would be the extra time required to copy the image in bulk (Although we are likely doing this in the vendor runtimes as well). This patch went with a quick solution to simply set a boolean value at initialization time if we need to call destructors. Fixes: https://github.com/llvm/llvm-project/issues/77798	2024-01-11 15:00:53 -06:00
Joseph Huber	3ede817f5b	[Libomptarget] Fix JIT on the NVPTX target by calling ptx manually (#77801 ) Summary: Recently a patch added an assertion in the GlobalHandler to indicate when an ELF was not used. This began to fire whenever NVPTX JIT was used, because the JIT pass output a PTX file instead of an ELF. The CUModuleLoad method consumes `.s` internally and compiles it to a cubin, however, this is too late as we perform several checks on the ELF directly for the presence of certain symbols and to read some necessary constants. This results in inconsistent behaviour. To address this, this patch simply calls `ptxas` manually, similar to how `lld` is called for the AMDGPU JIT pass. This is inevitably going to be slower than simply passing it to the CUDA routine due to the overhead involved in file IO and a fork call, but it's necessary for correctness. CUDA provides an API for compiling PTX manually. However, this only started showing up in CUDA 11.1 and is only provided "officially" in a static library. The `libnvidia-ptxjitcompiler.so` next to the CUDA driver has the same symbols and can likely be used as a replacement. This would be the faster solution. However, given that it's not documented it may have some issues.	2024-01-11 11:32:43 -06:00
Dominik Adamski	18798cf972	[OpenMP] Add missing weak definitions of missing variables (#77767 ) Variables `__omp_rtl_assume_teams_oversubscription` and `__omp_rtl_assume_threads_oversubscription` are used by functions: `__kmpc_distribute_static_loop`, `__kmpc_distribute_for_static_loop` and `__kmpc_for_static_loop`.	2024-01-11 15:28:45 +01:00
Dominik Adamski	ee431288a6	[NFC][OpenMP][Flang] Add smoke test for omp target parallel (#77579 ) Added test which proves that end-to-end compilation of omp target parallel construct is successful for Flang compiler.	2024-01-11 10:18:11 +01:00
Andrew Gozillon	8ca07e57c3	[Flang][OpenMP][Offloading][Test] Adjust slightly incorrect tests now cmake configuration works These tests were slightly broken, in one case a failing test that now works. In the other case some accidentally left over code during a name change that broke compilation due to missing symbols.	2024-01-10 16:20:33 -06:00
Joseph Huber	e203968e41	[Libomptarget] Do not abort on failed plugin init (#77623 ) Summary: The current code logic is supposed to skip plugins that aren't found or could not be loaded. However, the plugin contained a call to `abort` if it failed, which prevented us from continuing if initialization of the plugin failed (such as if `dlopen` failed for the dynamic plugins).	2024-01-10 11:42:04 -06:00
Joseph Huber	d03b8c3a04	[Libomptarget][NFC] Format in-line comments consistently (#77530 ) Summary: The LLVM style uses `/*Foo=*/` when indicating the name of a constant. See https://llvm.org/docs/CodingStandards.html#comment-formatting. This is useful for consistency, as well as because `clang-format` understands this syntax and formats it more cleanly. Do a bulk update of this syntax.	2024-01-10 10:10:08 -06:00
Joseph Huber	0d6412eae3	[Libomptarget] Add error message back in after changes (#77528 ) Summary: My previous reworking of the image handling removed the image info which was originally used for this extra error message requested by Ye Luo. I have since added in the necessary ELF facilities to extract it from the object file and can add it back in. It's a little verbose mostly from needing to shuffle around types and potential errors.	2024-01-10 10:07:53 -06:00
Joseph Huber	d65a7d1f1a	[Libomptarget] Do not run CPU tests if FFI was not found Summary: The previous behaviour before I made it dynamically open libFFI was that these tests would be ignored if FFI was not found. This now allows tests to be run without the dependency and thus the tests fail on some buildbots. This simply makes it not build the tests if it's not present.	2024-01-10 07:22:23 -06:00
Martin Storsjö	14435a28cd	[OpenMP] Allow setting OPENMP_INSTALL_LIBDIR (#77533 ) The comment indicates that it should be possible, but as long as it wasn't a cache variable, the cmake script overwrote whatever variable the user had set.	2024-01-10 11:24:19 +02:00
Joseph Huber	c7c68f1764	[Libomptarget] Allow the CPU targets to be built without libffi (#77495 ) Summary: The CPU targets currently rely on `libffi` to invoke the "kernel" functions. Previously we would not build these if this dependency was not found. This patch copies the approach used for things like CUDA and HSA to dynamically load this if it is not found. The one sketchy thing this does is hard-code the default ABI for the target. These are normally defined on a per-file basis in the FFI source, so I had to fish out the expected values. We only use two types, so ideally we will always be able to use the default ABI. It's possible we could remove this dependency entirely in the future as well.	2024-01-09 14:01:52 -06:00
Brad Smith	dc03382d3e	[openmp][AIX] Add AIX to __kmp_set_stack_info() (#77421 )	2024-01-09 12:02:40 -05:00
Joseph Huber	0fe86f9c51	[Libomptarget] Remove extra cache for offloading entries (#77012 ) Summary: The offloading entries right now are assumed to be baked into the binary itself, and thus always valid whenever the library is executing. This means that we don't need to copy them to additional storage and can instead simply pass around references to it. This is not likely to change in the expected operation of the OpenMP library. Additionally, the indirection for the offload entry struct is simply two pointers, so moving it by value is trivial.	2024-01-08 16:49:33 -06:00
carlobertolli	ce4144406c	Revert "[OpenMP][libomptarget] Enable automatic unified shared memory executi…" (#77371 ) Reverts llvm/llvm-project#75999 lit test is failing.	2024-01-08 14:38:29 -06:00
carlobertolli	22a73e7c46	[OpenMP][libomptarget] Enable automatic unified shared memory executi… (#75999 ) …on (zero-copy) on MI300A. This patch enables applications that did not request OpenMP unified_shared_memory to run with the same zero-copy behavior, where mapped memory does not result in extra memory allocations and memory copies, but CPU-allocated memory is accessed from the device. The name for this behavior is "automatic zero-copy" and it relies on detecting: that the runtime is running on a MI300A, that the user did not select unified_shared_memory in their program, and that XNACK (unified memory support) is enabled in the current GPU configuration. If all these conditions are met, then automatic zero-copy is triggered. This patch is still missing support for global variables, which will be provided in a subsequent patch. Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>	2024-01-08 14:17:28 -06:00
Joseph Huber	e7655ad605	[Libomptarget] Remove unnecessary CMake definition of endianness (#77205 ) Summary: This is needed for some definition in `hsa.h` that requires this to be set for some architectures when it fails at autodetection. We only really build `libomptarget` with `gcc` and `clang` which already provide their own way of detecting this. Remove the unnecessary define and move it into the source.	2024-01-08 13:23:38 -06:00
Joseph Huber	bda562519b	[Libomptarget][NFC] Fix unhandled allocator enum value	2024-01-08 10:17:05 -06:00
Xing Xue	2edce427a8	[openmp][AIX]Initial changes for porting to AIX (#76841 ) This PR contains initial changes for building and testing libomp on AIX. More changes will follow. - `KMP_OS_AIX` is defined for the AIX platform - `KMP_ARCH_PPC` is defined for 32-bit PPC - `KMP_ARCH_PPC_XCOFF` and `KMP_ARCH_PPC64_XCOFF` are for 32- and 64-bit XCOFF object formats respectively - Assembly file `z_AIX_asm.S` is used for AIX specific assembly code and will be added in a separate PR - The target library is disabled because AIX does not have the device support - OMPT is temporarily disabled	2024-01-08 08:33:00 -05:00
Chaitanya	1637c07925	[openmp][amdgpu] Add DynamicLdsSize to AMDGPUImplicitArgsTy (#65325 ) #65273 "hidden_dynamic_lds_size" argument will be added in the reserved section at offset 120 of the implicit argument layout Add DynamicLdsSize to AMDGPUImplicitArgsTy struct at offset 120 and fill the dynamic LDS size before kernel launch.	2024-01-06 09:34:48 +05:30
Dominik Adamski	0cdaadf15a	[libomptarget][flang] Explicitly pass the OpenMP device libraries to tests (#76796 ) This pull request is a follow-up of patch: https://github.com/llvm/llvm-project/pull/68225 and it explicitly specifies OpenMP device libraries for Fortran OpenMP tests.	2024-01-04 08:45:34 +01:00
Joseph Huber	fb32977ac7	[Libomptarget] Fix RPC-based malloc on NVPTX (#72440 ) Summary: The device allocator on NVPTX architectures is enqueued to a stream that the kernel is potentially executing on. This can lead to deadlocks as the kernel will not proceed until the allocation is complete and the allocation will not proceed until the kernel is complete. CUDA 11.2 introduced async allocations that we can manually place on separate streams to combat this. This patch makes a new allocation type that's guaranteed to be non-blocking so it will actually make progress, only Nvidia needs to care about this as the others are not blocking in this way by default. I had originally tried to make the `alloc` and `free` methods take a `__tgt_async_info`. However, I observed that with the large volume of streams being created by a parallel test it quickly locked up the system as presumably too many streams were being created. This implementation now just creates a new stream and immediately destroys it. This obviously isn't very fast, but it at least gets the cases to stop deadlocking for now.	2024-01-02 16:53:53 -06:00
Kareem Ergawy	75be7bb3fc	[flang][OpenMP][Offloading][AMDGPU] Add test for `target update` (#76355 ) Adds a new test for offloading `target update` directive to AMD GPUs.	2024-01-02 09:50:27 +01:00
Joseph Huber	64f0681e97	[Libomptarget] Rework image checking further (#76120 ) Summary: In the future, we may have more checks for different kinds of inputs, e.g. SPIR-V. This patch simply reworks the handling to be more generic and do the magic detection up-front. The checks inside the routines are now asserts so we don't spend time checking this stuff over and over again. This patch also tweaked the bitcode check. I used a different function to get the Lazy-IR module now, as it returns the raw expected value rather than the SM diagnostic. No functionality change intended.	2023-12-29 15:14:39 -06:00
Gheorghe-Teodor Bercea	a01b58aef0	[OpenMP][libomptarget][Fix] Add missing array initialization (#76457 ) Add missing array initialization as the array was not initialized and the value zero was assumed.	2023-12-27 12:58:41 -05:00
Ethan Luis McDonough	813a671232	[OpenMP] Remove unnecessary dependencies from plugin unit tests (#76266 ) This was an oversight that seems to be causing problems on certain builds. This patch should fix #76225.	2023-12-22 14:44:23 -06:00
Felipe Cabarcas	9b6ea5e8f8	[OpenMP] Improve omp offload profiler (#68016 ) Summary: Adding information to the LIBOMPTARGET profiler runtime kernel and API calls. Key changes: * Adding information to runtime calls for better understanding of how the application is executing. For example teams requested by the user, size of memory transfers. * Profile timer was changed from 'us' to 'ns', since 'us' was too coarse-grain to register some important details like key kernel duration * Removed non API or Runtime calls, to reduce complexity of profile for application developers. --------- Co-authored-by: Felipe Cabarcas <cabarcas@leia.crpl.cis.udel.edu> Co-authored-by: fel-cab <fel-cab@github.com>	2023-12-22 14:58:11 -05:00
Fabian Mora	12250c4092	Reland [OpenMP][Fix] libomptarget Fortran tests (#76189 ) This patch fixes the erroneous multiple-target requirement in Fortran offloading tests. Additionally, it adds two new variables (test_flags_clang, test_flags_flang) to lit.cfg so that compiler-specific flags for Clang and Flang can be specified. This patch re-lands: #74543. The error was caused by having: ``` config.substitutions.append(("%flags", config.test_flags)) config.substitutions.append(("%flags_clang", config.test_flags_clang)) config.substitutions.append(("%flags_flang", config.test_flags_flang)) ``` when instead it has to be: ``` config.substitutions.append(("%flags_clang", config.test_flags_clang)) config.substitutions.append(("%flags_flang", config.test_flags_flang)) config.substitutions.append(("%flags", config.test_flags)) ``` because LIT replaces with the first longest sub-string match.	2023-12-21 14:18:36 -08:00
Ethan Luis McDonough	cb3a893436	[OpenMP] Check for gtest when building libomptarget unit tests (#76141 ) This patch addresses an issue introduced in pull request #74398. CMake will attempt to re-build gtest if openmp is enabled as a project (as opposed to being enabled as a runtime). This patch adds a check that prevents this from happening.	2023-12-21 04:00:35 -06:00
Joseph Huber	ba192debb4	[Libomptarget][Obvious] Fix typo in attribute lookup Summary: These are keys into the AMDGPU target metadata. One of them had a typo which prevented it from being extracted.	2023-12-20 19:03:35 -06:00
Joseph Huber	f324584ae3	[Libomptarget][NFCI] Remove caching of created ELF files (#76080 ) Summary: We currently keep a cache of created ELF files from the relevant images. This shouldn't be necessary as the entire ELF interface is generally trivially constructable and extremely cheap. The cost of constructing one of these objects is simply a size check and writing a pointer to the underlying data. Given that, keeping a cache of these images should not be necessary overall.	2023-12-20 17:13:41 -06:00
Shilei Tian	7e4c6f6cb2	[OpenMP] Reduce the size of heap memory required by the test `malloc_parallel.c` (#75885 ) This patch reduces the size of heap memory required by the test `malloc_parallel.c` and `malloc.c`. The original size is too large such that `malloc` returns `nullptr` on many threads, causing illegal memory access.	2023-12-20 15:03:01 -08:00
Ethan Luis McDonough	3c10e5b2f6	[OpenMP] Add unit tests for nextgen plugins (#74398 ) This patch adds three GTest unit tests that test plugin read and write operations. Tests can be compiled with `ninja -C runtimes/runtimes-bins LibomptUnitTests`.	2023-12-20 14:58:56 -08:00
Joseph Huber	e4f4022b70	[Libomptarget][NFC] Fix linting warnings in the plugins Summary: Fix some linting warnings present in the plugins.	2023-12-20 10:07:34 -06:00
Joseph Huber	ac029e02a9	[Libomptarget] Remove __tgt_image_info and use the ELF directly (#75720 ) Summary: This patch reorganizes a lot of the code used to check for compatibility with the current environment. The main bulk of this patch involves moving from using a separate `__tgt_image_info` struct (which just contains a string for the architecture) to instead simply checking this information from the ELF directly. Checking information in the ELF is very inexpensive as creating an ELF file is simply writing a base pointer. The main desire to do this was to reorganize everything into the ELF image. We can then do the majority of these checks without first initializing the plugin. A future patch will move the first ELF checks to happen without initializing the plugin so we no longer need to initialize any plugins that don't have needed images. This patch also adds a lot more sanity checks for whether or not the ELF is actually compatible, such as whether the images have a valid ABI, 64-bit width, executable, etc.	2023-12-19 20:01:31 -06:00
Joseph Huber	219355d4c0	[Libomptarget] Use scoped atomics in the device runtime (#75834 ) Summary: A recent patch allowed us to easily replace GNU atomics with scoped variants that make use of the backend's handling for more permissive scopes. The default is full "system" scope, that means the atomic operation must be consistent with operations that may happen on the host's memory. This is generally only required for processes that are communicating with something via global fine-grained memory. This patch uses these atomics to make everything device scoped, as nothing in the OpenMP runtime should depend on the host. This is only provided as a very new clang extension but the DeviceRTL is only compiled with clang so it is always available.	2023-12-19 14:30:34 -06:00
Carlos Eduardo Seo	dcd7c8b7c9	[OpenMP][AArch64] Workaround for ompt/synchronization tests (#75848 ) ompt/synchronization/[masked.c \| master.c] tests fail due to a wrong offset being calculated for the possible return addresses. PR #65936 fixes this for Darwin and the same has to be done for Linux. Updates #69627	2023-12-19 19:26:23 +01:00
Fabian Mora	ac82c8b925	Revert "[OpenMP][Fix] libomptarget Fortran tests" (#75953 ) Reverts llvm/llvm-project#74543	2023-12-19 12:11:08 -05:00
Gheorghe-Teodor Bercea	65909177e3	[OpenMP][libomptarget][Fix] Disable test on NVIDIA platforms (#75949 ) The test doesn't seem to work for NVIDIA, so disabling it for now.	2023-12-19 11:58:10 -05:00
Fabian Mora	49efb082cc	[OpenMP][Fix] libomptarget Fortran tests (#74543 ) This patch fixes the erroneous multiple-target requirement in Fortran offloading tests. Additionally, it adds two new variables (`test_flags_clang`, `test_flags_flang`) to `lit.cfg` so that compiler-specific flags for Clang and Flang can be specified.	2023-12-19 11:35:14 -05:00
Shilei Tian	3768039913	[OpenMP] Directly use user's grid and block size in kernel language mode (#70612 ) In kernel language mode, use user's grid and blocks size directly. No validity check, which means if user's values are too large, the launch will fail, similar to what CUDA and HIP are doing right now.	2023-12-18 12:26:18 -05:00
Joseph Huber	913622d012	[Libomptarget] Remove remaining global constructors in plugins (#75814 ) Summary: This patch fixes the remaining global constructor in the plugins after addressing the ones in the JIT interface. This struct was mistakenly using global constructors as not all the members were being initialized properly. This was almost certainly being optimized out because it's trivial, but would still be present in debug builds and prevented us from compiling with `-Werror=global-constructors`. We will want to do that once offloading is moved to a runtimes only build.	2023-12-18 11:01:02 -06:00
Joseph Huber	1580877555	[Libomptarget] Remove bitcode image map used for JIT processing (#75672 ) Summary: Libomptarget supports JIT by treating an LLVM-IR file as a regular input image. The handling here used a global map to keep track of triples once it was parsed. This was done to save time, however this created a global constructor as well as an extra mutex to handle it. This patch removes the use of this map. Instead, we simply use the file magic to perform a quick check if the input image is valid bitcode. If not, we then create a lazy module. This should be roughly equivalent to the old handling that created an IR symbol table. Here we can prevent the module from materializing everything but the single triple metadata we read in later.	2023-12-18 09:28:06 -06:00
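
Note on commit 12250c4092 (the re-land of the Fortran test fix): the ordering problem it describes can be reproduced with a small sketch. This is not LIT's code. It assumes only that substitutions are applied one after another in list order, and it uses plain `str.replace` as a stand-in for LIT's own matching. The helper name `apply_substitutions`, the flag values, and the sample command are hypothetical and chosen for illustration.

```python
def apply_substitutions(line, substitutions):
    """Apply (pattern, replacement) pairs one at a time, in list order."""
    for pattern, replacement in substitutions:
        line = line.replace(pattern, replacement)
    return line


# Hypothetical flag values, for illustration only.
broken_order = [
    ("%flags", "-O1"),                # shorter pattern listed first
    ("%flags_clang", "-clang-opt"),
    ("%flags_flang", "-flang-opt"),
]
fixed_order = [
    ("%flags_clang", "-clang-opt"),   # longer patterns listed first
    ("%flags_flang", "-flang-opt"),
    ("%flags", "-O1"),
]

command = "%clang %flags_clang test.c"

# "%flags" is a prefix of "%flags_clang", so it consumes part of the longer
# pattern before that pattern gets a chance to match.
print(apply_substitutions(command, broken_order))  # %clang -O1_clang test.c
print(apply_substitutions(command, fixed_order))   # %clang -clang-opt test.c
```

Under that assumption, listing the longer `%flags_clang` and `%flags_flang` patterns ahead of `%flags` is what keeps the shorter pattern from clobbering them, which matches the reordering quoted in the commit message.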


3214 Commits