llvm-capstone

mirror of https://github.com/capstone-engine/llvm-capstone.git synced 2024-11-23 05:40:09 +00:00

Author	SHA1	Message	Date
David CARLIER	9b3edb592d	release/18.x: [openmp] __kmp_x86_cpuid fix for i386/PIC builds. (#84626 ) (#85053 )	2024-03-14 21:43:47 -07:00
Jonas Paulsson	33c6b20276	SystemZ release notes for 18.x. (#84560 )	2024-03-13 16:27:21 -07:00
Vadim Paretsky	a91b9bd9c7	[OpenMP] fix endianness dependent definitions in OMP headers for MSVC (#84540 ) MSVC does not define __BYTE_ORDER__ making the check for BigEndian erroneously evaluate to true and breaking the struct definitions in MSVC compiled builds correspondingly. The fix adds an additional check for whether __BYTE_ORDER__ is defined by the compiler to fix these. --------- Co-authored-by: Vadim Paretsky <b-vadipa@microsoft.com> (cherry picked from commit 110141b37813dc48af33de5e1407231e56acdfc5)	2024-03-11 13:42:47 -07:00
Daniel Martinez	c8b11e9300	Fix build on musl by including stdint.h (#81434 ) openmp fails to build on musl since it lacks the defines for int32_t Co-authored-by: Daniel Martinez <danielmartinez@cock.li> (cherry picked from commit 45fe67dd61a6ac7df84d3a586e41c36a4767757f)	2024-02-26 13:55:13 -08:00
Xing Xue	801a10d305	[OpenMP][AIX]Add assembly file containing microtasking routines and unnamed common block definitions (#81770 ) This patch adds assembly file `z_AIX_asm.S` that contains the 32- and 64-bit XCOFF version of microtasking routines and unnamed common block definitions. This code has been run through the libomp LIT tests and a user package successfully. (cherry picked from commit 94100bc2fb1a39dbeb43d18a95176097c53f1324)	2024-02-20 11:54:09 -08:00
Xing Xue	ae27600016	[OpenMP][AIX] Set worker stack size to 2 x KMP_DEFAULT_STKSIZE if system stack size is too big (#81996 ) This patch sets the stack size of worker threads to `2 x KMP_DEFAULT_STKSIZE` (2 x 4MB) for AIX if the system stack size is too big. Also defines maximum stack size for 32-bit AIX. (cherry picked from commit 2de269a641e4ffbb7a44e559c4c0a91bb66df823)	2024-02-19 16:14:44 -08:00
Xing Xue	34fdf52cce	[OpenMP][AIX]Define struct kmp_base_tas_lock with the order of two members swapped for big-endian (#79188 ) The direct lock data structure has bit `0` (the least significant bit) of the first 32-bit word set to `1` to indicate it is a direct lock. On the other hand, the first word (in 32-bit mode) or first two words (in 64-bit mode) of an indirect lock are the address of the entry allocated from the indirect lock table. The runtime checks bit `0` of the first 32-bit word to tell if this is a direct or an indirect lock. This works fine for 32-bit and 64-bit little-endian because its memory layout of a 64-bit address is (`low word`, `high word`). However, this causes problems for big-endian where the memory layout of a 64-bit address is (`high word`, `low word`). If an address of the indirect lock table entry is something like `0x110035300`, i.e., (`0x1`, `0x10035300`), it is treated as a direct lock. This patch defines `struct kmp_base_tas_lock` with the ordering of the two 32-bit members flipped for big-endian PPC64 so that when checking/setting tags in member `poll`, the second word (the low word) is used. This patch also changes places where `poll` is not already explicitly specified for checking/setting tags. (cherry picked from commit ac97562c99c3ae97f063048ccaf08ebdae60ac30)	2024-02-16 05:15:11 -08:00
Xing Xue	cf130269fa	[OpenMP][test]Flip bit-fields in 'struct flags' for big-endian in test cases (#79895 ) This patch flips bit-fields in `struct flags` for big-endian in test cases to be consistent with the definition of the structure in libomp `kmp.h`. (cherry picked from commit 7a9b0e4acb3b5ee15f8eb138aad937cfa4763fb8)	2024-02-16 05:15:11 -08:00
Martin Storsjö	d7c6794aff	[OpenMP] [cmake] Don't use -fno-semantic-interposition on Windows (#81113 ) This was added in `4b7beab418`. When the flag was added implicitly elsewhere, it was added via llvm/cmake/modules/HandleLLVMOptions.cmake, where it wasn't added on Windows/Cygwin targets. This avoids one warning per object file in OpenMP. (cherry picked from commit 72f04fa0734f8559ad515f507a4a3ce3f461f196)	2024-02-16 04:40:41 -08:00
Alexandre Ganea	1cfd46f134	[openmp] On Windows, fix standalone cmake build (#80174 ) This fixes: https://github.com/llvm/llvm-project/issues/80117 (cherry picked from commit d2565bb11308f6cf98d838e828d9bcbe2d51e0e4)	2024-02-01 17:54:51 -08:00
Alexandre Ganea	15fdc7646c	Re-land [openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853 ) This reverts `94f960925b` and fixes it.	2024-01-23 12:48:38 -05:00
Alexandre Ganea	94f960925b	Revert `10f3296dd7` - [openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853 ) It broke the AMDGPU buildbot: https://lab.llvm.org/buildbot/#/builders/193/builds/45378	2024-01-23 08:51:12 -05:00
Alexandre Ganea	10f3296dd7	[openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853 ) There were quite a few compilation warnings when building openmp on Windows with the latest Visual Studio 2022 version 17.8.4. Some other warnings were visible with the latest Clang at tip. This commit fixes all of them.	2024-01-23 08:38:18 -05:00
Jan Patrick Lehr	181c4c331a	[OpenMP][Fix] Require USM capability in force-usm test (#79059 ) This should fix the AMDGPU buildbot breakage from #76571	2024-01-22 15:21:31 -06:00
Jan Patrick Lehr	fa4780fa6c	[OpenMP][USM] Introduces -fopenmp-force-usm flag (#76571 ) This flag forces the compiler to generate code for OpenMP target regions as if the user specified the #pragma omp requires unified_shared_memory in each source file. The option does not have a -fno-* friend since OpenMP requires the unified_shared_memory clause to be present in all source files. Since this flag does no harm if the clause is present, it can be used in conjunction. My understanding is that USM should not be turned off selectively, hence, no -fno- version. This adds a basic test to check the correct generation of double indirect access to declare target globals in USM mode vs non-USM mode, which I think is the only difference observable in code generation. This runtime test checks for the (non-)occurrence of data movement between host and device. It does one run without the flag and one with the flag to also see that both versions behave as expected. In the case w/o the new flag data movement between host and device is expected. In the case with the flag such data movement should not be present / reported.	2024-01-22 21:59:26 +01:00
Joseph Huber	621bafd5c1	[Libomptarget] Move target table handling out of the plugins (#77150 ) Summary: This patch removes the bulk of the handling of the `__tgt_offload_entries` out of the plugins itself. The reason for this is because the plugins themselves should not be handling this implementation detail of the OpenMP runtime. Instead, we expose two new plugin API functions to get the points to a device pointer for a global as well as a kernel type. This required introducing a new type to represent a binary image that has been loaded on a device. We can then use this to load the addresses as needed. The creation of the mapping table is then handled just in `libomptarget` where we simply look up each address individually. This should allow us to expose these operations more generically when we provide a separate API.	2024-01-22 11:06:47 -06:00
carlobertolli	ae99966a27	[OpenMP] Enable automatic unified shared memory on MI300A. (#77512 ) This patch enables applications that did not request OpenMP unified_shared_memory to run with the same zero-copy behavior, where mapped memory does not result in extra memory allocations and memory copies, but CPU-allocated memory is accessed from the device. The name for this behavior is "automatic zero-copy" and it relies on detecting: that the runtime is running on a MI300A, that the user did not select unified_shared_memory in their program, and that XNACK (unified memory support) is enabled in the current GPU configuration. If all these conditions are met, then automatic zero-copy is triggered. This patch also introduces an environment variable OMPX_APU_MAPS that, if set, triggers automatic zero-copy also on non APU GPUs (e.g., on discrete GPUs). This patch is still missing support for global variables, which will be provided in a subsequent patch. Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>	2024-01-22 10:30:22 -06:00
carlobertolli	3440466536	[OpenMP] Fix two usm tests for amdgpus. (#78824 ) Some are missing setting of HSA_XNACK=1 environment variable, used to enable unified memory support on amdgpu's when it's not been set at kernel boot time. Some others needed to be marked as supporting unified_shared_memory in the lit test harness. Extend lit test harness to enable unified_shared_memory requirement for AMD GPUs. Reland: #77851	2024-01-22 08:56:51 -06:00
Joseph Huber	b689b4fe55	[LLVM][CMake] Add ffi_static target for the FFI static library (#78779 ) Summary: This patch is an attempt to make the `find_package(FFI)` support in LLVM prefer to provide the static library version if present. This is currently an optional library for building `libffi`, and its presence implies that it should likely be used. This patch is an attempt to fix some problems observed with testing programs linked against `libffi` on many different systems that could have conflicting paths. Linking it statically prevents this. This patch adds the `ffi_static` target for this library.	2024-01-22 07:27:06 -06:00
Dominik Adamski	21199f9842	[OpenMP][OMPIRBuilder] Fix LLVM IR codegen for collapsed device loop (#78708 ) When we generate the loop body function, we need to be sure, that all original loop counters are replaced by the new counter. We need to save all items which use the original loop counter and then perform replacement of the original loop counter. If we don't do it, there is a risk that some values are not updated.	2024-01-22 09:24:45 +01:00
Alexandre Ganea	0ac992e0ad	[openmp] Revert `64874e5ab5` since it was committed by mistake and the PR (https://github.com/llvm/llvm-project/pull/77853 ) wasn't approved yet.	2024-01-18 13:55:03 -05:00
Dominik Adamski	8930c5a4be	[NFC][OpenMP] Fix typo in CHECK line (#78586 ) Typo in test: openmp/libomptarget/test/offloading/fortran/basic-target-parallel-do.f90	2024-01-18 15:40:15 +01:00
Dominik Adamski	d87a53a960	[NFC][OpenMP][Flang] Add test for OpenMP target parallel do (#77776 ) Added test which proves that end-to-end compilation of `omp target parallel do` construct is successful for Flang compiler.	2024-01-18 15:26:39 +01:00
Paul Osmialowski	d5b2e41e20	[OpenMP][omp_lib] Restore compatibility with more restrictive Fortran compilers (#77780 ) The most recent changes to `omp_lib.h.var` have re-introduced some compatibility issues that had to be fixed due to the similar changes in the past. Namely: 1. D120707 has removed the "use omp_lib_kinds" statement and replaced it with import 2. D114537 added line continuation to the long lines This patch introduces the same kind of changes in order to restore compatibility with some more restrictive Fortran compilers so their users could still benefit from the LLVM's OpenMP Fortran library.	2024-01-18 11:06:24 +00:00
Alexandre Ganea	64874e5ab5	[openmp] Silence warnings when building the LLVM release with MSVC	2024-01-17 07:23:58 -05:00
Alexandre Ganea	c5bbf40d98	[openmp] Remove extra ';' outside of function Fixes: ``` [4038/11058] Building CXX object projects/openmp/libomptarget/src/CMakeFiles/omptarget.dir/OpenMP/InteropAPI.cpp.o /home/aganea/llvm-project/openmp/libomptarget/src/OpenMP/InteropAPI.cpp:202:2: warning: extra ';' outside of a function is incompatible with C++98 [-Wc++98-compat-extra-semi] }; ^ 1 warning generated. ```	2024-01-17 07:23:56 -05:00
Joseph Huber	89cdd48a22	[Libomptarget] Remove temporary files in AMDGPU JIT impl (#77980 ) Summary: This patch cleans up some of the JIT handling for AMDGPU as well as removing its temporary files. Previously these would be left in the temporary directory after the program was run. This costs some extra time, but the correct solution to avoid that is to create a sufficient entrypoint into `ld.lld` that we can simply pass a memory buffer into.	2024-01-15 19:03:19 -06:00
carlobertolli	93efa2b8b9	Revert "[OpenMP] Fix two usm tests for amdgpus." (#77983 ) Reverts llvm/llvm-project#77851	2024-01-12 15:01:49 -06:00
carlobertolli	3add9491cd	[OpenMP] Fix two usm tests for amdgpus. (#77851 ) Some are missing setting of HSA_XNACK=1 environment variable, used to enable unified memory support on amdgpu's when it's not been set at kernel boot time. Some others needed to be marked as supporting unified_shared_memory in the lit test harness.	2024-01-12 14:42:49 -06:00
Joseph Huber	ab02372c23	[OpenMP] Fix or disable NVPTX tests failing currently (#77844 ) Summary: This patch is an attempt to get a clean run of `check-openmp` running on an NVPTX machine. I simply took the lists of tests that failed on my `sm_89` machine and disabled them or fixed them. A lot of these tests are disabled on AMDGPU already, so it makes sense that NVPTX fails. The others are simply problems with NVPTX optimized debugging which will need to be fixed. I opened an issue on one of them.	2024-01-11 19:17:08 -06:00
Joseph Huber	37c1a5e3f5	[Libomptarget] Fix GPU Dtors referencing possibly deallocated image (#77828 ) Summary: The constructors and destructors look up a symbol in the ELF quickly to determine if they need to be run on the GPU. This allows us to avoid the very slow actions required to do the slower lookup using the vendor API. One problem occurs with how we handle the lifetime of these images. Right now there is no invariant to specify the lifetime of the underlying binary image that is loaded. In the typical case, this comes from the binary itself in the `.llvm.offloading` section, meaning that the lifetime of the binary should match the executable itself. This would work fine, if it weren't for the fact that the plugin is loaded via `dlopen` and can have a teardown order out of sync with the main executable. This was likely what was occurring when this failed on some systems but not others. A potential solution would be to simply copy images into memory so the runtime does not rely on external references. Another would be to manually zero these out after initialization so as to prevent this mistake from happening accidentally. The former has the benefit of making some checks easier, and allowing for constant initialization to be done on the ELF itself (normally we can't do this because writing to a constant section, e.g. .llvm.offloading, is a segfault). The downside would be the extra time required to copy the image in bulk (although we are likely doing this in the vendor runtimes as well). This patch went with a quick solution to simply set a boolean value at initialization time if we need to call destructors. Fixes: https://github.com/llvm/llvm-project/issues/77798	2024-01-11 15:00:53 -06:00
Joseph Huber	3ede817f5b	[Libomptarget] Fix JIT on the NVPTX target by calling ptx manually (#77801 ) Summary: Recently a patch added an assertion in the GlobalHandler to indicate when an ELF was not used. This began to fire whenever NVPTX JIT was used, because the JIT pass output a PTX file instead of an ELF. The CUModuleLoad method consumes `.s` internally and compiles it to a cubin, however, this is too late as we perform several checks on the ELF directly for the presence of certain symbols and to read some necessary constants. This results in inconsistent behaviour. To address this, this patch simply calls `ptxas` manually, similar to how `lld` is called for the AMDGPU JIT pass. This is inevitably going to be slower than simply passing it to the CUDA routine due to the overhead involved in file IO and a fork call, but it's necessary for correctness. CUDA provides an API for compiling PTX manually. However, this only started showing up in CUDA 11.1 and is only provided "officially" in a static library. The `libnvidia-ptxjitcompiler.so` next to the CUDA driver has the same symbols and can likely be used as a replacement. This would be the faster solution. However, given that it's not documented it may have some issues.	2024-01-11 11:32:43 -06:00
Dominik Adamski	18798cf972	[OpenMP] Add missing weak definitions of missing variables (#77767 ) Variables `__omp_rtl_assume_teams_oversubscription` and `__omp_rtl_assume_threads_oversubscription` are used by functions: `__kmpc_distribute_static_loop`, `__kmpc_distribute_for_static_loop` and `__kmpc_for_static_loop`.	2024-01-11 15:28:45 +01:00
Dominik Adamski	ee431288a6	[NFC][OpenMP][Flang] Add smoke test for omp target parallel (#77579 ) Added test which proves that end-to-end compilation of omp target parallel construct is successful for Flang compiler.	2024-01-11 10:18:11 +01:00
Andrew Gozillon	8ca07e57c3	[Flang][OpenMP][Offloading][Test] Adjust slightly incorrect tests now cmake configuration works These tests were slightly broken, in one case a failing test that now works. In the other case some accidentally left over code during a name change that broke compilation due to missing symbols.	2024-01-10 16:20:33 -06:00
Joseph Huber	e203968e41	[Libomptarget] Do not abort on failed plugin init (#77623 ) Summary: The current code logic is supposed to skip plugins that aren't found or could not be loaded. However, the plugin contained a call to `abort` if it failed, which prevented us from continuing if initializing the plugin failed (such as if `dlopen` failed for the dynamic plugins).	2024-01-10 11:42:04 -06:00
Joseph Huber	d03b8c3a04	[Libomptarget][NFC] Format in-line comments consistently (#77530 ) Summary: The LLVM style uses /*Foo=*/ when indicating the name of a constant. See https://llvm.org/docs/CodingStandards.html#comment-formatting. This is useful for consistency, as well as because `clang-format` understands this syntax and formats it more cleanly. Do a bulk update of this syntax.	2024-01-10 10:10:08 -06:00
Joseph Huber	0d6412eae3	[Libomptarget] Add error message back in after changes (#77528 ) Summary: My previous reworking of the image handling removed the image info which was originally used for this extra error message requested by Ye Luo. I have since added in the necessary ELF facilities to extract it from the object file and can add it back in. It's a little verbose mostly from needing to shuffle around types and potential errors.	2024-01-10 10:07:53 -06:00
Joseph Huber	d65a7d1f1a	[Libomptarget] Do not run CPU tests if FFI was not found Summary: The previous behaviour before I made it dynamically open libFFI was that these tests would be ignored if FFI was not found. This now allows tests to be run without the dependency and thus the tests fail on some buildbots. This simply makes it not build the tests if it's not present.	2024-01-10 07:22:23 -06:00
Martin Storsjö	14435a28cd	[OpenMP] Allow setting OPENMP_INSTALL_LIBDIR (#77533 ) The comment indicates that it should be possible, but as long as it wasn't a cache variable, the cmake script overwrote whatever variable the user had set.	2024-01-10 11:24:19 +02:00
Joseph Huber	c7c68f1764	[Libomptarget] Allow the CPU targets to be built without libffi (#77495 ) Summary: The CPU targets currently rely on `libffi` to invoke the "kernel" functions. Previously we would not build these if this dependency was not found. This patch copies the approach used for things like CUDA and HSA to dynamically load this if it is not found. The one sketchy thing this does is hard-code the default ABI for the target. These are normally defined on a per-file basis in the FFI source, so I had to fish out the expected values. We only use two types, so ideally we will always be able to use the default ABI. It's possible we could remove this dependency entirely in the future as well.	2024-01-09 14:01:52 -06:00
Brad Smith	dc03382d3e	[openmp][AIX] Add AIX to __kmp_set_stack_info() (#77421 )	2024-01-09 12:02:40 -05:00
Joseph Huber	0fe86f9c51	[Libomptarget] Remove extra cache for offloading entries (#77012 ) Summary: The offloading entries right now are assumed to be baked into the binary itself, and thus always valid whenever the library is executing. This means that we don't need to copy them to additional storage and can instead simply pass around references to it. This is not likely to change in the expected operation of the OpenMP library. Additionally, the indirection for the offload entry struct is simply two pointers, so moving it by value is trivial.	2024-01-08 16:49:33 -06:00
carlobertolli	ce4144406c	Revert "[OpenMP][libomptarget] Enable automatic unified shared memory executi…" (#77371 ) Reverts llvm/llvm-project#75999 lit test is failing.	2024-01-08 14:38:29 -06:00
carlobertolli	22a73e7c46	[OpenMP][libomptarget] Enable automatic unified shared memory executi… (#75999 ) …on (zero-copy) on MI300A. This patch enables applications that did not request OpenMP unified_shared_memory to run with the same zero-copy behavior, where mapped memory does not result in extra memory allocations and memory copies, but CPU-allocated memory is accessed from the device. The name for this behavior is "automatic zero-copy" and it relies on detecting: that the runtime is running on a MI300A, that the user did not select unified_shared_memory in their program, and that XNACK (unified memory support) is enabled in the current GPU configuration. If all these conditions are met, then automatic zero-copy is triggered. This patch is still missing support for global variables, which will be provided in a subsequent patch. Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>	2024-01-08 14:17:28 -06:00
Joseph Huber	e7655ad605	[Libomptarget] Remove unnecessary CMake definition of endiannness (#77205 ) Summary: This is needed for some definition in `hsa.h` that requires this to be set for some architectures when it fails at autodetection. We only really build `libomptarget` with `gcc` and `clang` which already provide their own way of detecting this. Remove the unnecessary define and move it into the source.	2024-01-08 13:23:38 -06:00
Joseph Huber	bda562519b	[Libomptarget][NFC] Fix unhandled allocator enum value	2024-01-08 10:17:05 -06:00
Xing Xue	2edce427a8	[openmp][AIX]Initial changes for porting to AIX (#76841 ) This PR contains initial changes for building and testing libomp on AIX. More changes will follow. - `KMP_OS_AIX` is defined for the AIX platform - `KMP_ARCH_PPC` is defined for 32-bit PPC - `KMP_ARCH_PPC_XCOFF` and `KMP_ARCH_PPC64_XCOFF` are for 32- and 64-bit XCOFF object formats respectively - Assembly file `z_AIX_asm.S` is used for AIX specific assembly code and will be added in a separate PR - The target library is disabled because AIX does not have the device support - OMPT is temporarily disabled	2024-01-08 08:33:00 -05:00
Chaitanya	1637c07925	[openmp][amdgpu] Add DynamicLdsSize to AMDGPUImplicitArgsTy (#65325 ) #65273 "hidden_dynamic_lds_size" argument will be added in the reserved section at offset 120 of the implicit argument layout Add DynamicLdsSize to AMDGPUImplicitArgsTy struct at offset 120 and fill the dynamic LDS size before kernel launch.	2024-01-06 09:34:48 +05:30
Dominik Adamski	0cdaadf15a	[libomptarget][flang] Explicitly pass the OpenMP device libraries to tests (#76796 ) This pull request is a follow-up of patch: https://github.com/llvm/llvm-project/pull/68225 and it explicitly specifies OpenMP device libraries for Fortran OpenMP tests.	2024-01-04 08:45:34 +01:00
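
The big-endian lock-tagging problem described in commit `34fdf52cce` above can be illustrated with a small Python sketch. This is not libomp code: the helper names `first_word_looks_direct` and `low_word_looks_direct` are hypothetical, and `struct` is used only to simulate how a 64-bit address is laid out in memory under each byte order. The example address `0x110035300` is the one given in the commit message.

```python
import struct

def first_word_looks_direct(addr: int, byteorder: str) -> bool:
    """Simulate testing bit 0 of the FIRST 32-bit word in memory.

    Hypothetical helper: mimics a runtime that stores a 64-bit address
    and then inspects only the first 32-bit word to find the tag bit.
    """
    prefix = "<" if byteorder == "little" else ">"
    raw = struct.pack(prefix + "Q", addr)           # 8 bytes as laid out in memory
    (first_word,) = struct.unpack(prefix + "I", raw[:4])
    return bool(first_word & 1)                     # bit 0 set => "direct lock"

def low_word_looks_direct(addr: int) -> bool:
    """Test bit 0 of the LOW 32-bit word, independent of memory layout."""
    return bool((addr & 0xFFFFFFFF) & 1)

# Address of an indirect lock table entry, taken from the commit message.
indirect_entry = 0x110035300  # (high word 0x1, low word 0x10035300)

print(first_word_looks_direct(indirect_entry, "little"))  # first word is the low word
print(first_word_looks_direct(indirect_entry, "big"))     # first word is the high word
print(low_word_looks_direct(indirect_entry))
```

On a little-endian layout the first word is the low word, so the address is correctly seen as an indirect lock. On a big-endian layout the first word is the high word `0x1`, whose bit 0 is set, so the same address is misread as a direct lock. Checking the low word, which is what the commit arranges by swapping the two 32-bit members of `struct kmp_base_tas_lock`, avoids the misclassification.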

3236 Commits