
3231 Commits

Author SHA1 Message Date
Xing Xue
ae27600016 [OpenMP][AIX] Set worker stack size to 2 x KMP_DEFAULT_STKSIZE if system stack size is too big (#81996)
This patch sets the stack size of worker threads to `2 x
KMP_DEFAULT_STKSIZE` (2 x 4MB) for AIX if the system stack size is too
big. It also defines the maximum stack size for 32-bit AIX.
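A minimal sketch of the selection rule described above, assuming a simple threshold comparison. The helper name `chooseWorkerStackSize` and its `TooBigThreshold` parameter are hypothetical stand-ins for however libomp decides that the system stack size is "too big". Only the 4 MB value of `KMP_DEFAULT_STKSIZE` on AIX comes from the message.

```cpp
#include <cstddef>

// 4 MB, per the commit message; the real value lives in the libomp sources.
constexpr std::size_t KMP_DEFAULT_STKSIZE = 4 * 1024 * 1024;

// Hypothetical helper: 'TooBigThreshold' stands in for whatever limit the
// runtime uses to decide that the system stack size is "too big".
std::size_t chooseWorkerStackSize(std::size_t SystemStackSize,
                                  std::size_t TooBigThreshold) {
  if (SystemStackSize > TooBigThreshold)
    return 2 * KMP_DEFAULT_STKSIZE; // cap at 2 x 4 MB = 8 MB
  return SystemStackSize;           // otherwise keep the system value
}
```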

(cherry picked from commit 2de269a641e4ffbb7a44e559c4c0a91bb66df823)
2024-02-19 16:14:44 -08:00
Xing Xue
34fdf52cce [OpenMP][AIX]Define struct kmp_base_tas_lock with the order of two members swapped for big-endian (#79188)
The direct lock data structure has bit `0` (the least significant bit)
of the first 32-bit word set to `1` to indicate it is a direct lock. On
the other hand, the first word (in 32-bit mode) or first two words (in
64-bit mode) of an indirect lock are the address of the entry allocated
from the indirect lock table. The runtime checks bit `0` of the first
32-bit word to tell if this is a direct or an indirect lock. This works
fine for 32-bit and 64-bit little-endian because the memory layout of a
64-bit address there is (`low word`, `high word`). However, this causes
problems for big-endian where the memory layout of a 64-bit address is
(`high word`, `low word`). If an address of the indirect lock table
entry is something like `0x110035300`, i.e., (`0x1`, `0x10035300`), it
is treated as a direct lock. This patch defines `struct
kmp_base_tas_lock` with the ordering of the two 32-bit members flipped
for big-endian PPC64 so that when checking/setting tags in member
`poll`, the second word (the low word) is used. This patch also changes
places where `poll` is not already explicitly specified for
checking/setting tags.
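The sketch below simulates both byte orders on any host so the problem is visible. It writes the example address `0x110035300` into 8 bytes and tests bit 0 of either the first or the second 32-bit word. The helpers `store64`, `load32`, `looksDirect`, and the struct `TasLockSketch` are hypothetical. Their member order mirrors the swap described above. The second member's name (`depth_locked`) and the plain `int32_t` types are simplifying assumptions.

```cpp
#include <cstddef>
#include <cstdint>

// Store a 64-bit value (e.g. an indirect-lock table address) into 8 bytes
// using an explicit byte order, so both layouts can be shown on any host.
void store64(unsigned char Out[8], std::uint64_t V, bool BigEndian) {
  for (int I = 0; I < 8; ++I) {
    int Shift = BigEndian ? 8 * (7 - I) : 8 * I;
    Out[I] = static_cast<unsigned char>(V >> Shift);
  }
}

// Read one 32-bit word back with the same byte order.
std::uint32_t load32(const unsigned char In[4], bool BigEndian) {
  std::uint32_t W = 0;
  for (int I = 0; I < 4; ++I) {
    int Shift = BigEndian ? 8 * (3 - I) : 8 * I;
    W |= static_cast<std::uint32_t>(In[I]) << Shift;
  }
  return W;
}

// Simplified tag test: bit 0 of the chosen 32-bit word set => "direct lock".
// WordIndex 0 is the old behaviour; 1 selects the low word on big-endian.
bool looksDirect(const unsigned char Mem[8], bool BigEndian,
                 std::size_t WordIndex) {
  return (load32(Mem + 4 * WordIndex, BigEndian) & 1u) != 0;
}

// Illustrative layout only: on big-endian the tag-carrying member 'poll'
// comes second so that it overlaps the low word of a 64-bit address.
struct TasLockSketch {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  std::int32_t depth_locked;
  std::int32_t poll;
#else
  std::int32_t poll;
  std::int32_t depth_locked;
#endif
};
```

With the big-endian layout, the first word of `0x110035300` is `0x1`, so `looksDirect(..., /*WordIndex=*/0)` misreads the indirect lock as direct. Checking the second word (`0x10035300`) gives the correct answer.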

(cherry picked from commit ac97562c99c3ae97f063048ccaf08ebdae60ac30)
2024-02-16 05:15:11 -08:00
Xing Xue
cf130269fa [OpenMP][test]Flip bit-fields in 'struct flags' for big-endian in test cases (#79895)
This patch flips bit-fields in `struct flags` for big-endian in test
cases to be consistent with the definition of the structure in libomp
`kmp.h`.

(cherry picked from commit 7a9b0e4acb3b5ee15f8eb138aad937cfa4763fb8)
2024-02-16 05:15:11 -08:00
Martin Storsjö
d7c6794aff [OpenMP] [cmake] Don't use -fno-semantic-interposition on Windows (#81113)
This was added in 4b7beab418. When the
flag was added implicitly elsewhere, it was added via
llvm/cmake/modules/HandleLLVMOptions.cmake, where it wasn't added on
Windows/Cygwin targets.

This avoids one warning per object file in OpenMP.

(cherry picked from commit 72f04fa0734f8559ad515f507a4a3ce3f461f196)
2024-02-16 04:40:41 -08:00
Alexandre Ganea
1cfd46f134 [openmp] On Windows, fix standalone cmake build (#80174)
This fixes: https://github.com/llvm/llvm-project/issues/80117

(cherry picked from commit d2565bb11308f6cf98d838e828d9bcbe2d51e0e4)
2024-02-01 17:54:51 -08:00
Alexandre Ganea
15fdc7646c Re-land [openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853)
This reverts 94f960925b and fixes the original change.
2024-01-23 12:48:38 -05:00
Alexandre Ganea
94f960925b Revert 10f3296dd7 - [openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853)
It broke the AMDGPU buildbot: https://lab.llvm.org/buildbot/#/builders/193/builds/45378
2024-01-23 08:51:12 -05:00
Alexandre Ganea
10f3296dd7
[openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853)
There were quite a few compilation warnings when building openmp on Windows with
the latest Visual Studio 2022 version 17.8.4. Some other warnings were visible
with the latest Clang at tip. This commit fixes all of them.
2024-01-23 08:38:18 -05:00
Jan Patrick Lehr
181c4c331a
[OpenMP][Fix] Require USM capability in force-usm test (#79059)
This should fix the AMDGPU buildbot breakage from #76571
2024-01-22 15:21:31 -06:00
Jan Patrick Lehr
fa4780fa6c
[OpenMP][USM] Introduces -fopenmp-force-usm flag (#76571)
This flag forces the compiler to generate code for OpenMP target regions
as if the user specified the #pragma omp requires unified_shared_memory
in each source file.

The option does not have a -fno-* counterpart since OpenMP requires the
unified_shared_memory clause to be present in all source files. Since
this flag does no harm if the clause is present, the two can be used in
conjunction. My understanding is that USM should not be turned off
selectively, hence, no -fno- version.
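As a rough illustration, the flag behaves as if each translation unit began with the directive below. The `scale` function is a hypothetical example. Under unified shared memory it relies on the device dereferencing a host pointer directly, with no `map` clause. When built without OpenMP support (or without an offload device), the pragmas are ignored or fall back to the host, and the loop simply runs there.

```cpp
// What -fopenmp-force-usm is described as implying in every source file:
#pragma omp requires unified_shared_memory

// Under unified shared memory the device can dereference host pointers
// directly, so no map clause is written for 'Data'. Without an OpenMP
// compiler (or without an offload device) this simply runs on the host.
void scale(double *Data, int N, double Factor) {
#pragma omp target teams distribute parallel for
  for (int I = 0; I < N; ++I)
    Data[I] *= Factor;
}
```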

This adds a basic test to check the correct generation of double
indirect access to declare target globals in USM mode vs non-USM mode,
which I think is the only difference observable in code generation.

The runtime test checks for the (non-)occurrence of data movement between host
and device. It does one run without the flag and one with the flag to
check that both versions behave as expected. Without the new flag, data
movement between host and device is expected; with the flag, no such data
movement should be present or reported.
2024-01-22 21:59:26 +01:00
Joseph Huber
621bafd5c1
[Libomptarget] Move target table handling out of the plugins (#77150)
Summary:
This patch moves the bulk of the handling of
`__tgt_offload_entries` out of the plugins themselves. The reason is that
the plugins should not be handling this implementation detail of the
OpenMP runtime. Instead, we expose two new plugin API functions to get a
device pointer for a global as well as for a kernel.

This required introducing a new type to represent a binary image that
has been loaded on a device. We can then use this to load the addresses
as needed. The creation of the mapping table is then handled just in
`libomptarget` where we simply look up each address individually. This
should allow us to expose these operations more generically when we
provide a separate API.
2024-01-22 11:06:47 -06:00
carlobertolli
ae99966a27
[OpenMP] Enable automatic unified shared memory on MI300A. (#77512)
This patch enables applications that did not request OpenMP
unified_shared_memory to run with the same zero-copy behavior, where
mapped memory does not result in extra memory allocations and memory
copies, but CPU-allocated memory is accessed from the device. The name
for this behavior is "automatic zero-copy" and it relies on detecting:
that the runtime is running on a MI300A, that the user did not select
unified_shared_memory in their program, and that XNACK (unified memory
support) is enabled in the current GPU configuration. If all these
conditions are met, then automatic zero-copy is triggered.

This patch also introduces an environment variable OMPX_APU_MAPS that,
if set, triggers automatic zero-copy also on non-APU GPUs (e.g., on
discrete GPUs).
This patch is still missing support for global variables, which will be
provided in a subsequent patch.
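The decision described above can be sketched as a small predicate. `DeviceInfoSketch`, `apuMapsRequested`, and `useAutomaticZeroCopy` are hypothetical names, not libomptarget's. Two points are assumptions of this sketch: treating any `OMPX_APU_MAPS` value other than `"0"` as "set", and still requiring XNACK when that variable is used on a discrete GPU.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical summary of what the runtime would need to know about a device.
struct DeviceInfoSketch {
  bool IsAPU;        // e.g. detected as an MI300A
  bool XnackEnabled; // unified memory support active in this configuration
};

// Assumption: any value of OMPX_APU_MAPS other than "0" counts as "set".
bool apuMapsRequested() {
  const char *V = std::getenv("OMPX_APU_MAPS");
  return V != nullptr && std::strcmp(V, "0") != 0;
}

// Decide whether automatic zero-copy applies, following the conditions in
// the commit message. Requiring XNACK in the OMPX_APU_MAPS case as well is
// an assumption of this sketch.
bool useAutomaticZeroCopy(const DeviceInfoSketch &Dev, bool ProgramRequiresUSM,
                          bool ApuMaps) {
  if (ProgramRequiresUSM)
    return false; // the program already asked for unified_shared_memory
  if (!Dev.XnackEnabled)
    return false; // no unified memory support in this GPU configuration
  return Dev.IsAPU || ApuMaps;
}
```

A caller would pass `apuMapsRequested()` as the last argument. Keeping the environment lookup out of the predicate makes the rule itself easy to test.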

Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>
2024-01-22 10:30:22 -06:00
carlobertolli
3440466536
[OpenMP] Fix two usm tests for amdgpus. (#78824)
Some were missing the HSA_XNACK=1 environment variable setting, which is
used to enable unified memory support on AMD GPUs when it has not been set
at kernel boot time. Others needed to be marked as supporting
unified_shared_memory in the lit test harness.

Extend lit test harness to enable unified_shared_memory requirement for
AMD GPUs.

Reland: #77851
2024-01-22 08:56:51 -06:00
Joseph Huber
b689b4fe55
[LLVM][CMake] Add ffi_static target for the FFI static library (#78779)
Summary:
This patch is an attempt to make the `find_package(FFI)` support in LLVM
prefer the static library version if it is present. The static library is
currently optional when building `libffi`, and its presence implies that
it should likely be used. This patch is an attempt to fix some problems
observed when testing programs linked against `libffi` on many different
systems that could have conflicting paths. Linking it statically
prevents this.

This patch adds the `ffi_static` target for this library.
2024-01-22 07:27:06 -06:00
Dominik Adamski
21199f9842
[OpenMP][OMPIRBuilder] Fix LLVM IR codegen for collapsed device loop (#78708)
When we generate the loop body function, we need to be sure that all
original loop counters are replaced by the new counter.

We need to save all items which use the original loop counter and then
perform replacement of the original loop counter. If we don't do it,
there is a risk that some values are not updated.
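The "save first, then replace" pattern can be shown on a toy use-list. This is not OMPIRBuilder code. `Value`, `User`, `setOperand`, and `replaceCounter` are hypothetical, heavily simplified stand-ins for IR values whose use-lists change when an operand is repointed. That mutation is exactly why iterating the live list while replacing can miss users.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, heavily simplified IR: a value knows which users reference it.
struct Value;
struct User {
  Value *Operand = nullptr;
};
struct Value {
  std::vector<User *> Users;
};

// Repointing a user also removes it from the old value's user list, the way
// IR use-lists behave. This mutates the list that is being replaced.
void setOperand(User &U, Value &NewV) {
  std::vector<User *> &Old = U.Operand->Users;
  Old.erase(std::find(Old.begin(), Old.end(), &U));
  U.Operand = &NewV;
  NewV.Users.push_back(&U);
}

// Save all users of the original counter first, then replace. Iterating
// OldCounter.Users directly while calling setOperand would invalidate the
// iteration and could skip users.
void replaceCounter(Value &OldCounter, Value &NewCounter) {
  std::vector<User *> Saved(OldCounter.Users);
  for (User *U : Saved)
    setOperand(*U, NewCounter);
}
```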
2024-01-22 09:24:45 +01:00
Alexandre Ganea
0ac992e0ad [openmp] Revert 64874e5ab5 since it was committed by mistake and the PR (https://github.com/llvm/llvm-project/pull/77853) wasn't approved yet. 2024-01-18 13:55:03 -05:00
Dominik Adamski
8930c5a4be
[NFC][OpenMP] Fix typo in CHECK line (#78586)
Typo in test: openmp/libomptarget/test/offloading/fortran/basic-target-parallel-do.f90
2024-01-18 15:40:15 +01:00
Dominik Adamski
d87a53a960
[NFC][OpenMP][Flang] Add test for OpenMP target parallel do (#77776)
Added a test which shows that end-to-end compilation of the `omp target
parallel do` construct is successful with the Flang compiler.
2024-01-18 15:26:39 +01:00
Paul Osmialowski
d5b2e41e20
[OpenMP][omp_lib] Restore compatibility with more restrictive Fortran compilers (#77780)
The most recent changes to `omp_lib.h.var` have re-introduced some
compatibility issues that had to be fixed due to the similar changes in
the past. Namely:

1. D120707 removed the "use omp_lib_kinds" statement and replaced it
with an `import` statement

2. D114537 added line continuation to the long lines

This patch introduces the same kind of changes in order to restore
compatibility with some more restrictive Fortran compilers so their
users could still benefit from the LLVM's OpenMP Fortran library.
2024-01-18 11:06:24 +00:00
Alexandre Ganea
64874e5ab5 [openmp] Silence warnings when building the LLVM release with MSVC 2024-01-17 07:23:58 -05:00
Alexandre Ganea
c5bbf40d98 [openmp] Remove extra ';' outside of function
Fixes:
```
[4038/11058] Building CXX object projects/openmp/libomptarget/src/CMakeFiles/omptarget.dir/OpenMP/InteropAPI.cpp.o
/home/aganea/llvm-project/openmp/libomptarget/src/OpenMP/InteropAPI.cpp:202:2: warning: extra ';' outside of a function is incompatible with C++98 [-Wc++98-compat-extra-semi]
};
 ^
1 warning generated.
```
2024-01-17 07:23:56 -05:00
Joseph Huber
89cdd48a22
[Libomptarget] Remove temporary files in AMDGPU JIT impl (#77980)
Summary:
This patch cleans up some of the JIT handling for AMDGPU as well as
removing its temporary files. Previously these would be left in the
temporary directory after the program was run. This costs some extra
time, but the correct solution to avoid that is to create a sufficient
entrypoint into `ld.lld` that we can simply pass a memory buffer into.
2024-01-15 19:03:19 -06:00
carlobertolli
93efa2b8b9
Revert "[OpenMP] Fix two usm tests for amdgpus." (#77983)
Reverts llvm/llvm-project#77851
2024-01-12 15:01:49 -06:00
carlobertolli
3add9491cd
[OpenMP] Fix two usm tests for amdgpus. (#77851)
Some were missing the HSA_XNACK=1 environment variable setting, which is
used to enable unified memory support on AMD GPUs when it has not been set
at kernel boot time. Others needed to be marked as supporting
unified_shared_memory in the lit test harness.
2024-01-12 14:42:49 -06:00
Joseph Huber
ab02372c23
[OpenMP] Fix or disable NVPTX tests failing currently (#77844)
Summary:
This patch is an attempt to get a clean run of `check-openmp` running on
an NVPTX machine. I simply took the lists of tests that failed on my
`sm_89` machine and disabled them or fixed them. A lot of these tests
are disabled on AMDGPU already, so it makes sense that NVPTX fails. The
others are simply problems with NVPTX optimized debugging which will
need to be fixed. I opened an issue on one of them.
2024-01-11 19:17:08 -06:00
Joseph Huber
37c1a5e3f5
[Libomptarget] Fix GPU Dtors referencing possibly deallocated image (#77828)
Summary:
The constructors and destructors look up a symbol in the ELF quickly to
determine if they need to be run on the GPU. This allows us to avoid the
very slow actions required to do the slower lookup using the vendor API.

One problem occurs with how we handle the lifetime of these images.
Right now there is no invariant to specify the lifetime of the
underlying binary image that is loaded. In the typical case, this comes
from the binary itself in the `.llvm.offloading` section, meaning that
the lifetime of the binary should match the executable itself. This
would work fine, if it weren't for the fact that the plugin is loaded
via `dlopen` and can have a teardown order out of sync with the main
executable.

This was likely what was occurring when this failed on some systems but
not others. A potential solution would be to simply copy images into
memory so the runtime does not rely on external references. Another
would be to manually zero these out after initialization so as to prevent
this mistake from happening accidentally. The former has the benefit of
making some checks easier and allowing constant initialization to be
done on the ELF itself (normally we can't do this because writing to a
constant section, e.g. `.llvm.offloading`, causes a segfault). The downside
would be the extra time required to copy the image in bulk (although we
are likely doing this in the vendor runtimes as well).

This patch went with a quick solution to simply set a boolean value at
initialization time if we need to call destructors.
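A minimal sketch of the chosen fix, assuming a toy image type. The decision is computed once, while the image is known to be alive, and teardown reads only the cached flag. `ImageSketch`, `DeviceImageState`, and the marker symbol name are hypothetical; the message does not say which symbol the runtime looks up.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded image: just its symbol names.
struct ImageSketch {
  std::vector<std::string> Symbols;
};

// Hypothetical marker symbol whose presence means destructors must run.
const char *const kDtorMarker = "hypothetical_dtor_marker";

class DeviceImageState {
  bool NeedsDtors = false; // computed once, while the image is known to be alive

public:
  void initialize(const ImageSketch &Image) {
    for (const std::string &S : Image.Symbols)
      if (S == kDtorMarker)
        NeedsDtors = true;
  }

  // Safe at teardown: reads the cached flag, never the (possibly freed) image.
  bool needsDestructors() const { return NeedsDtors; }
};
```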

Fixes: https://github.com/llvm/llvm-project/issues/77798
2024-01-11 15:00:53 -06:00
Joseph Huber
3ede817f5b
[Libomptarget] Fix JIT on the NVPTX target by calling ptx manually (#77801)
Summary:
Recently a patch added an assertion in the GlobalHandler to indicate
when an ELF was not used. This began to fire whenever NVPTX JIT was
used, because the JIT pass output a PTX file instead of an ELF. The
CUModuleLoad method consumes `.s` internally and compiles it to a cubin,
however, this is too late as we perform several checks on the ELF
directly for the presence of certain symbols and to read some necessary
constants. This results in inconsistent behaviour.

To address this, this patch simply calls `ptxas` manually, similar to
how `lld` is called for the AMDGPU JIT pass. This is inevitably going to
be slower than simply passing it to the CUDA routine due to the overhead
involved in file IO and a fork call, but it's necessary for correctness.

CUDA provides an API for compiling PTX manually. However, this only
started showing up in CUDA 11.1 and is only provided "officially" in a
static library. The `libnvidia-ptxjitcompiler.so` next to the CUDA
driver has the same symbols and can likely be used as a replacement.
This would be the faster solution. However, given that it's not
documented it may have some issues.
2024-01-11 11:32:43 -06:00
Dominik Adamski
18798cf972
[OpenMP] Add missing weak definitions of missing variables (#77767)
Variables `__omp_rtl_assume_teams_oversubscription` and
`__omp_rtl_assume_threads_oversubscription` are used by the functions
`__kmpc_distribute_static_loop`, `__kmpc_distribute_for_static_loop`, and
`__kmpc_for_static_loop`.
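Weak definitions give these symbols a default that any strong definition elsewhere in the link overrides. The sketch below assumes the GCC/Clang `__attribute__((weak))` extension on an ELF-style target. The `uint32_t` type, the `0` default, and the consumer `assumesOversubscription` are illustrative assumptions, not the DeviceRTL definitions.

```cpp
#include <cstdint>

// Weak definitions: used only if no other object file provides a strong
// definition. The uint32_t type and the 0 ("don't assume") default are
// assumptions of this sketch.
extern "C" {
__attribute__((weak)) std::uint32_t __omp_rtl_assume_teams_oversubscription = 0;
__attribute__((weak)) std::uint32_t __omp_rtl_assume_threads_oversubscription = 0;
}

// Hypothetical consumer standing in for the __kmpc_*_static_loop functions.
bool assumesOversubscription() {
  return __omp_rtl_assume_teams_oversubscription != 0 &&
         __omp_rtl_assume_threads_oversubscription != 0;
}
```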
2024-01-11 15:28:45 +01:00
Dominik Adamski
ee431288a6
[NFC][OpenMP][Flang] Add smoke test for omp target parallel (#77579)
Added a test which shows that end-to-end compilation of the omp target
parallel construct is successful with the Flang compiler.
2024-01-11 10:18:11 +01:00
Andrew Gozillon
8ca07e57c3 [Flang][OpenMP][Offloading][Test] Adjust slightly incorrect tests now cmake configuration works
These tests were slightly broken. In one case, a test marked as failing now works. In the other case,
some code accidentally left over from a name change broke compilation due to missing
symbols.
2024-01-10 16:20:33 -06:00
Joseph Huber
e203968e41
[Libomptarget] Do not abort on failed plugin init (#77623)
Summary:
The current code logic is supposed to skip plugins that aren't found or
could not be loaded. However, the plugin contained a call to `abort` if
it failed, which prevented us from continuing if initializing the
plugin failed (such as if `dlopen` failed for the dynamic plugins).
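The intended control flow amounts to "report and continue" rather than `abort`. `PluginCandidate` and `loadPlugins` are hypothetical. The real runtime opens plugins with `dlopen`, whereas here each candidate just carries an init callback that reports success or failure.

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Hypothetical plugin candidate: a name plus an init routine that reports
// failure (e.g. because dlopen of a dynamic plugin failed).
struct PluginCandidate {
  std::string Name;
  std::function<bool()> Init;
};

// Skip plugins whose initialization fails instead of aborting the process.
std::vector<std::string> loadPlugins(const std::vector<PluginCandidate> &Candidates) {
  std::vector<std::string> Loaded;
  for (const PluginCandidate &C : Candidates) {
    if (!C.Init()) {
      std::fprintf(stderr, "skipping plugin '%s': initialization failed\n",
                   C.Name.c_str());
      continue; // an abort() here would also stop every remaining plugin
    }
    Loaded.push_back(C.Name);
  }
  return Loaded;
}
```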
2024-01-10 11:42:04 -06:00
Joseph Huber
d03b8c3a04
[Libomptarget][NFC] Format in-line comments consistently (#77530)
Summary:
The LLVM style uses /*Foo=*/ when indicating the name of a constant. See
https://llvm.org/docs/CodingStandards.html#comment-formatting. This is
useful for consistency, as well as because `clang-format` understands
this syntax and formats it more cleanly. Do a bulk update of this
syntax.
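For reference, the convention looks like this at a call site. `copyBytes` is a hypothetical helper used only to show the `/*Name=*/` comment placed directly before a literal argument.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper used only to demonstrate argument comments.
static std::size_t copyBytes(char *Dst, const char *Src, std::size_t Size,
                             bool ZeroFill) {
  if (ZeroFill)
    std::memset(Dst, 0, Size);
  else
    std::memcpy(Dst, Src, Size);
  return Size;
}

int demo() {
  const char Src[4] = {1, 2, 3, 4};
  char Dst[4];
  // LLVM style: name the parameter in /*Name=*/ form directly before a
  // literal whose meaning is not obvious at the call site.
  copyBytes(Dst, Src, /*Size=*/4, /*ZeroFill=*/false);
  return Dst[3];
}
```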
2024-01-10 10:10:08 -06:00
Joseph Huber
0d6412eae3
[Libomptarget] Add error message back in after changes (#77528)
Summary:
My previous reworking of the image handling removed the image info which
was originally used for this extra error message requested by Ye Luo. I
have since added in the necessary ELF facilities to extract it from the
object file and can add it back in. It's a little verbose mostly from
needing to shuffle around types and potential errors.
2024-01-10 10:07:53 -06:00
Joseph Huber
d65a7d1f1a [Libomptarget] Do not run CPU tests if FFI was not found
Summary:
Before I made it dynamically open libFFI, these tests would be ignored if
FFI was not found. The tests can now run without the dependency, and
thus they fail on some buildbots. This simply makes it not build the
tests if FFI is not present.
2024-01-10 07:22:23 -06:00
Martin Storsjö
14435a28cd
[OpenMP] Allow setting OPENMP_INSTALL_LIBDIR (#77533)
The comment indicates that it should be possible, but because it
wasn't a cache variable, the CMake script overwrote whatever value
the user had set.
2024-01-10 11:24:19 +02:00
Joseph Huber
c7c68f1764
[Libomptarget] Allow the CPU targets to be built without libffi (#77495)
Summary:
The CPU targets currently rely on `libffi` to invoke the "kernel"
functions. Previously we would not build these if this dependency was
not found. This patch copies the approach used for things like CUDA and
HSA to dynamically load this if it is not found.

The one sketchy thing this does is hard-code the default ABI for the
target. These are normally defined on a per-file basis in the FFI
source, so I had to fish out the expected values. We only use two types,
so ideally we will always be able to use the default ABI.

It's possible we could remove this dependency entirely in the future as
well.
2024-01-09 14:01:52 -06:00
Brad Smith
dc03382d3e
[openmp][AIX] Add AIX to __kmp_set_stack_info() (#77421) 2024-01-09 12:02:40 -05:00
Joseph Huber
0fe86f9c51
[Libomptarget] Remove extra cache for offloading entries (#77012)
Summary:
The offloading entries right now are assumed to be baked into the binary
itself, and thus always valid whenever the library is executing. This
means that we don't need to copy them to additional storage and can
instead simply pass around references to it.

This is not likely to change in the expected operation of the OpenMP
library. Additionally, the indirection for the offload entry struct is
simply two pointers, so moving it by value is trivial.
2024-01-08 16:49:33 -06:00
carlobertolli
ce4144406c
Revert "[OpenMP][libomptarget] Enable automatic unified shared memory executi…" (#77371)
Reverts llvm/llvm-project#75999

lit test is failing.
2024-01-08 14:38:29 -06:00
carlobertolli
22a73e7c46
[OpenMP][libomptarget] Enable automatic unified shared memory execution (zero-copy) on MI300A. (#75999)

This patch enables applications that did not request OpenMP
unified_shared_memory to run with the same zero-copy behavior, where
mapped memory does not result in extra memory allocations and memory
copies, but CPU-allocated memory is accessed from the device. The name
for this behavior is "automatic zero-copy" and it relies on detecting:
that the runtime is running on a MI300A, that the user did not select
unified_shared_memory in their program, and that XNACK (unified memory
support) is enabled in the current GPU configuration. If all these
conditions are met, then automatic zero-copy is triggered.

This patch is still missing support for global variables, which will be
provided in a subsequent patch.

Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>
2024-01-08 14:17:28 -06:00
Joseph Huber
e7655ad605
[Libomptarget] Remove unnecessary CMake definition of endianness (#77205)
Summary:
This is needed for some definition in `hsa.h` that requires this to be
set for some architectures when it fails at autodetection. We only
really build `libomptarget` with `gcc` and `clang` which already provide
their own way of detecting this. Remove the unnecessary define and move
it into the source.
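GCC and Clang predefine `__BYTE_ORDER__`, `__ORDER_LITTLE_ENDIAN__`, and `__ORDER_BIG_ENDIAN__`, which is what makes a CMake-level definition redundant. The sketch shows only the detection. The specific macro that `hsa.h` expects is not named in the message, so it is not reproduced here. `kLittleEndian` and `hostIsLittleEndian` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

// GCC and Clang predefine __BYTE_ORDER__ and the __ORDER_*_ENDIAN__ macros,
// so the byte order can be derived in the source instead of via CMake.
#if !defined(__BYTE_ORDER__)
#error "this sketch assumes a GCC- or Clang-compatible compiler"
#endif
constexpr bool kLittleEndian = __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__;

// Runtime cross-check: the first byte of the integer 1 is 1 only on
// little-endian hosts.
bool hostIsLittleEndian() {
  const std::uint32_t One = 1;
  unsigned char FirstByte;
  std::memcpy(&FirstByte, &One, 1);
  return FirstByte == 1;
}
```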
2024-01-08 13:23:38 -06:00
Joseph Huber
bda562519b [Libomptarget][NFC] Fix unhandled allocator enum value 2024-01-08 10:17:05 -06:00
Xing Xue
2edce427a8
[openmp][AIX]Initial changes for porting to AIX (#76841)
This PR contains initial changes for building and testing libomp on AIX.
More changes will follow.
- `KMP_OS_AIX` is defined for the AIX platform
- `KMP_ARCH_PPC` is defined for 32-bit PPC
- `KMP_ARCH_PPC_XCOFF` and `KMP_ARCH_PPC64_XCOFF` are for 32- and 64-bit
XCOFF object formats respectively
- Assembly file `z_AIX_asm.S` is used for AIX specific assembly code and
will be added in a separate PR
- The target library is disabled because AIX does not have the device
support
- OMPT is temporarily disabled
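One plausible way to derive the macros listed above is shown below as an illustrative sketch, not the actual libomp platform header. It assumes Clang-style predefined macros: `_AIX` on AIX, `__powerpc__`/`__powerpc64__` for PowerPC, and AIX's `__64BIT__` for 64-bit mode. `KMP_ARCH_PPC64` and `kBuildsForXcoff` are part of the sketch.

```cpp
// Illustrative derivation only; the real definitions live in libomp's
// platform headers and may differ.
#if defined(_AIX)
#define KMP_OS_AIX 1
#else
#define KMP_OS_AIX 0
#endif

#if defined(__powerpc64__) || (defined(_AIX) && defined(__64BIT__))
#define KMP_ARCH_PPC64 1
#define KMP_ARCH_PPC 0
#elif defined(__powerpc__)
#define KMP_ARCH_PPC64 0
#define KMP_ARCH_PPC 1 // 32-bit PPC
#else
#define KMP_ARCH_PPC64 0
#define KMP_ARCH_PPC 0
#endif

// AIX uses the XCOFF object format, so the XCOFF flags follow from the OS.
#define KMP_ARCH_PPC_XCOFF (KMP_OS_AIX && KMP_ARCH_PPC)
#define KMP_ARCH_PPC64_XCOFF (KMP_OS_AIX && KMP_ARCH_PPC64)

constexpr bool kBuildsForXcoff = KMP_ARCH_PPC_XCOFF || KMP_ARCH_PPC64_XCOFF;
```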
2024-01-08 08:33:00 -05:00
Chaitanya
1637c07925
[openmp][amdgpu] Add DynamicLdsSize to AMDGPUImplicitArgsTy (#65325)
Per #65273, the "hidden_dynamic_lds_size" argument will be added in the reserved
section at offset 120 of the implicit argument layout. This patch adds
DynamicLdsSize to the AMDGPUImplicitArgsTy struct at offset 120 and fills in
the dynamic LDS size before kernel launch.
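A sketch of how the offset can be pinned at compile time. `ImplicitArgsSketch` and `fillDynamicLdsSize` are hypothetical, and the bytes before offset 120 are collapsed into an opaque placeholder rather than modeled field by field. The 32-bit width of the field is an assumption; the offset comes from the message.

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of the relevant part of the implicit-argument block. The bytes before
// offset 120 are not modeled, and the 32-bit field width is an assumption.
struct ImplicitArgsSketch {
  std::uint8_t Preceding[120];  // other implicit arguments (not modeled)
  std::uint32_t DynamicLdsSize; // "hidden_dynamic_lds_size", offset 120
};
static_assert(offsetof(ImplicitArgsSketch, DynamicLdsSize) == 120,
              "DynamicLdsSize must sit at offset 120");

// Hypothetical launch step: record the dynamic LDS size requested for this
// launch before the kernel is dispatched.
void fillDynamicLdsSize(ImplicitArgsSketch &Args, std::uint32_t DynamicLdsBytes) {
  Args.DynamicLdsSize = DynamicLdsBytes;
}
```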
2024-01-06 09:34:48 +05:30
Dominik Adamski
0cdaadf15a
[libomptarget][flang] Explicitly pass the OpenMP device libraries to tests (#76796)
This pull request is a follow-up of patch:
https://github.com/llvm/llvm-project/pull/68225 and it explicitly
specifies OpenMP device libraries for Fortran OpenMP tests.
2024-01-04 08:45:34 +01:00
Joseph Huber
fb32977ac7
[Libomptarget] Fix RPC-based malloc on NVPTX (#72440)
Summary:
The device allocator on NVPTX architectures is enqueued to a stream that
the kernel is potentially executing on. This can lead to deadlocks, as
the kernel will not proceed until the allocation is complete and the
allocation will not proceed until the kernel is complete. CUDA 11.2
introduced async allocations that we can manually place on separate
streams to combat this. This patch makes a new allocation type that is
guaranteed to be non-blocking so that it will actually make progress. Only
Nvidia needs to care about this, as the others are not blocking in this
way by default.

I had originally tried to make the `alloc` and `free` methods take a
`__tgt_async_info`. However, I observed that, with the large volume of
streams being created by a parallel test, it quickly locked up the system,
presumably because too many streams were being created. This implementation
just creates a new stream and immediately destroys it. This
obviously isn't very fast, but it at least stops the cases from
deadlocking for now.
2024-01-02 16:53:53 -06:00
Kareem Ergawy
75be7bb3fc
[flang][OpenMP][Offloading][AMDGPU] Add test for target update (#76355)
Adds a new test for offloading `target update` directive to AMD GPUs.
2024-01-02 09:50:27 +01:00
Joseph Huber
64f0681e97
[Libomptarget] Rework image checking further (#76120)
Summary:
In the future, we may have more checks for different kinds of inputs,
e.g. SPIR-V. This patch simply reworks the handling to be more generic
and do the magic detection up-front. The checks inside the routines are
now asserts so we don't spend time checking this stuff over and over
again.

This patch also tweaked the bitcode check. I used a different function
to get the Lazy-IR module, as it returns the raw expected value
rather than the SM diagnostic.

No functionality change intended.
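Up-front detection can be as simple as comparing leading magic bytes: `\x7fELF` for ELF and `BC\xC0\xDE` for raw LLVM bitcode. LLVM's `llvm::identify_magic` covers many more formats. The self-contained `identifyImage` below is a hypothetical two-format stand-in, not the patch's code.

```cpp
#include <cstddef>
#include <cstring>

enum class ImageKind { ELF, Bitcode, Unknown };

// Classify an image once, up front, by its leading magic bytes.
ImageKind identifyImage(const unsigned char *Buf, std::size_t Size) {
  static const unsigned char ElfMagic[4] = {0x7f, 'E', 'L', 'F'};
  static const unsigned char BitcodeMagic[4] = {'B', 'C', 0xC0, 0xDE};
  if (Size >= 4 && std::memcmp(Buf, ElfMagic, 4) == 0)
    return ImageKind::ELF;
  if (Size >= 4 && std::memcmp(Buf, BitcodeMagic, 4) == 0)
    return ImageKind::Bitcode;
  return ImageKind::Unknown;
}
```

Once the kind is known, the per-format routines can `assert` on it instead of re-checking the magic on every call, which matches the intent described above.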
2023-12-29 15:14:39 -06:00
Gheorghe-Teodor Bercea
a01b58aef0
[OpenMP][libomptarget][Fix] Add missing array initialization (#76457)
Add a missing array initialization: the array was never initialized, yet
its values were assumed to be zero.
2023-12-27 12:58:41 -05:00
Ethan Luis McDonough
813a671232
[OpenMP] Remove unnecessary dependencies from plugin unit tests (#76266)
This was an oversight that seems to be causing problems on certain
builds. This patch should fix #76225.
2023-12-22 14:44:23 -06:00