Commit Graph

3214 Commits

Author SHA1 Message Date
Dominik Adamski
d87a53a960
[NFC][OpenMP][Flang] Add test for OpenMP target parallel do (#77776)
Added a test which proves that end-to-end compilation of the `omp target
parallel do` construct succeeds with the Flang compiler.
2024-01-18 15:26:39 +01:00
Paul Osmialowski
d5b2e41e20
[OpenMP][omp_lib] Restore compatibility with more restrictive Fortran compilers (#77780)
The most recent changes to `omp_lib.h.var` have reintroduced some
compatibility issues that had already been fixed after similar changes in
the past. Namely:

1. D120707 removed the "use omp_lib_kinds" statement and replaced it
with `import`

2. D114537 added line continuations to the long lines

This patch introduces the same kind of changes in order to restore
compatibility with some more restrictive Fortran compilers so that their
users can still benefit from LLVM's OpenMP Fortran library.
2024-01-18 11:06:24 +00:00
Alexandre Ganea
64874e5ab5 [openmp] Silence warnings when building the LLVM release with MSVC 2024-01-17 07:23:58 -05:00
Alexandre Ganea
c5bbf40d98 [openmp] Remove extra ';' outside of function
Fixes:
```
[4038/11058] Building CXX object projects/openmp/libomptarget/src/CMakeFiles/omptarget.dir/OpenMP/InteropAPI.cpp.o
/home/aganea/llvm-project/openmp/libomptarget/src/OpenMP/InteropAPI.cpp:202:2: warning: extra ';' outside of a function is incompatible with C++98 [-Wc++98-compat-extra-semi]
};
 ^
1 warning generated.
```
2024-01-17 07:23:56 -05:00
Joseph Huber
89cdd48a22
[Libomptarget] Remove temporary files in AMDGPU JIT impl (#77980)
Summary:
This patch cleans up some of the JIT handling for AMDGPU as well as
removing its temporary files. Previously these would be left in the
temporary directory after the program was run. This costs some extra
time, but the correct solution to avoid that is to create a sufficient
entrypoint into `ld.lld` that we can simply pass a memory buffer into.
2024-01-15 19:03:19 -06:00
carlobertolli
93efa2b8b9
Revert "[OpenMP] Fix two usm tests for amdgpus." (#77983)
Reverts llvm/llvm-project#77851
2024-01-12 15:01:49 -06:00
carlobertolli
3add9491cd
[OpenMP] Fix two usm tests for amdgpus. (#77851)
Some were missing the HSA_XNACK=1 environment variable, which is used to
enable unified memory support on AMDGPUs when it has not been set at
kernel boot time. Others needed to be marked as supporting
unified_shared_memory in the lit test harness.
2024-01-12 14:42:49 -06:00
Joseph Huber
ab02372c23
[OpenMP] Fix or disable NVPTX tests failing currently (#77844)
Summary:
This patch is an attempt to get a clean run of `check-openmp` running on
an NVPTX machine. I simply took the list of tests that failed on my
`sm_89` machine and disabled or fixed them. Many of these tests
are already disabled on AMDGPU, so it is unsurprising that they fail on NVPTX. The
others are simply problems with NVPTX optimized debugging, which will
need to be fixed. I opened an issue for one of them.
2024-01-11 19:17:08 -06:00
Joseph Huber
37c1a5e3f5
[Libomptarget] Fix GPU Dtors referencing possibly deallocated image (#77828)
Summary:
The constructors and destructors look up a symbol in the ELF quickly to
determine if they need to be run on the GPU. This allows us to avoid the
very slow actions required to do the slower lookup using the vendor API.

One problem occurs with how we handle the lifetime of these images.
Right now there is no invariant to specify the lifetime of the
underlying binary image that is loaded. In the typical case, this comes
from the binary itself in the `.llvm.offloading` section, meaning that
the lifetime of the binary should match the executable itself. This
would work fine, if it weren't for the fact that the plugin is loaded
via `dlopen` and can have a teardown order out of sync with the main
executable.

This was likely what was occurring when this failed on some systems but
not others. A potential solution would be to simply copy images into
memory so the runtime does not rely on external references. Another
would be to manually zero these out after initialization so as to prevent
this mistake from happening accidentally. The former has the benefit of
making some checks easier and allowing constant initialization to be
done on the ELF itself (normally we can't do this, because writing to a
constant section such as `.llvm.offloading` causes a segfault). The downside
would be the extra time required to copy the image in bulk (Although we
are likely doing this in the vendor runtimes as well).

This patch went with a quick solution to simply set a boolean value at
initialization time if we need to call destructors.
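The idea of deciding once at initialization can be sketched in Python as a language-neutral illustration. This is not the libomptarget code: `DeviceImage`, `DTOR_MARKER`, and the substring search that stands in for an ELF symbol lookup are all hypothetical. The runtime makes the decision while the image is known to be alive, and at teardown it reads only the cached flag.

```python
# Hypothetical marker standing in for the destructor symbol the runtime
# looks up; a real implementation would search the ELF symbol table.
DTOR_MARKER = b"hypothetical_device_fini"


class DeviceImage:
    """Sketch: record at initialization whether destructors must run."""

    def __init__(self, image: bytearray) -> None:
        # Inspect the image while it is guaranteed to be valid ...
        self.needs_dtors = DTOR_MARKER in image
        # ... and deliberately keep no reference to the buffer afterwards.

    def deinit(self) -> bool:
        # Teardown only consults the cached flag, never the image bytes,
        # so it stays safe even if the image memory was already released.
        return self.needs_dtors


buffer = bytearray(b"\x7fELF..." + DTOR_MARKER)
image = DeviceImage(buffer)
buffer.clear()  # simulate the image being torn down first
print(image.deinit())  # True
```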

Fixes: https://github.com/llvm/llvm-project/issues/77798
2024-01-11 15:00:53 -06:00
Joseph Huber
3ede817f5b
[Libomptarget] Fix JIT on the NVPTX target by calling ptx manually (#77801)
Summary:
Recently a patch added an assertion in the GlobalHandler to indicate
when an ELF was not used. This began to fire whenever NVPTX JIT was
used, because the JIT pass output a PTX file instead of an ELF. The
CUModuleLoad method consumes `.s` internally and compiles it to a cubin,
however, this is too late as we perform several checks on the ELF
directly for the presence of certain symbols and to read some necessary
constants. This results in inconsistent behaviour.

To address this, this patch simply calls `ptxas` manually, similar to
how `lld` is called for the AMDGPU JIT pass. This is inevitably going to
be slower than simply passing it to the CUDA routine due to the overhead
involved in file IO and a fork call, but it's necessary for correctness.

CUDA provides an API for compiling PTX manually. However, this only
started showing up in CUDA 11.1 and is only provided "officially" in a
static library. The `libnvidia-ptxjitcompiler.so` next to the CUDA
driver has the same symbols and can likely be used as a replacement.
This would be the faster solution. However, given that it's not
documented it may have some issues.
2024-01-11 11:32:43 -06:00
Dominik Adamski
18798cf972
[OpenMP] Add missing weak definitions of missing variables (#77767)
The variables `__omp_rtl_assume_teams_oversubscription` and
`__omp_rtl_assume_threads_oversubscription` are used by the functions
`__kmpc_distribute_static_loop`, `__kmpc_distribute_for_static_loop` and
`__kmpc_for_static_loop`.
2024-01-11 15:28:45 +01:00
Dominik Adamski
ee431288a6
[NFC][OpenMP][Flang] Add smoke test for omp target parallel (#77579)
Added a test which proves that end-to-end compilation of the `omp target
parallel` construct succeeds with the Flang compiler.
2024-01-11 10:18:11 +01:00
Andrew Gozillon
8ca07e57c3 [Flang][OpenMP][Offloading][Test] Adjust slightly incorrect tests now cmake configuration works
These tests were slightly broken: one was a failing test that now works, and the other
had some code accidentally left over from a name change, which broke compilation due to missing
symbols.
2024-01-10 16:20:33 -06:00
Joseph Huber
e203968e41
[Libomptarget] Do not abort on failed plugin init (#77623)
Summary:
The current code logic is supposed to skip plugins that aren't found or
could not be loaded. However, the plugin contained a call to `abort` on
failure, which prevented us from continuing if initializing the
plugin failed (such as when `dlopen` failed for the dynamic plugins).
2024-01-10 11:42:04 -06:00
Joseph Huber
d03b8c3a04
[Libomptarget][NFC] Format in-line comments consistently (#77530)
Summary:
The LLVM style uses `/*Foo=*/` when indicating the name of a constant. See
https://llvm.org/docs/CodingStandards.html#comment-formatting. This is
useful for consistency, as well as because `clang-format` understands
this syntax and formats it more cleanly. Do a bulk update of this
syntax.
2024-01-10 10:10:08 -06:00
Joseph Huber
0d6412eae3
[Libomptarget] Add error message back in after changes (#77528)
Summary:
My previous reworking of the image handling removed the image info which
was originally used for this extra error message requested by Ye Luo. I
have since added in the necessary ELF facilities to extract it from the
object file and can add it back in. It's a little verbose mostly from
needing to shuffle around types and potential errors.
2024-01-10 10:07:53 -06:00
Joseph Huber
d65a7d1f1a [Libomptarget] Do not run CPU tests if FFI was not found
Summary:
Before I made it dynamically open libFFI, these tests would be ignored
if FFI was not found. That change allowed the tests to be run without the
dependency, so they failed on some buildbots. This simply makes it skip
building the tests if FFI is not present.
2024-01-10 07:22:23 -06:00
Martin Storsjö
14435a28cd
[OpenMP] Allow setting OPENMP_INSTALL_LIBDIR (#77533)
The comment indicates that it should be possible, but since it
wasn't a cache variable, the CMake script overwrote whatever value
the user had set.
2024-01-10 11:24:19 +02:00
Joseph Huber
c7c68f1764
[Libomptarget] Allow the CPU targets to be built without libffi (#77495)
Summary:
The CPU targets currently rely on `libffi` to invoke the "kernel"
functions. Previously we would not build these if this dependency was
not found. This patch copies the approach used for things like CUDA and
HSA to load it dynamically at runtime if it is not found at build time.

The one sketchy thing this does is hard-code the default ABI for the
target. These are normally defined on a per-file basis in the FFI
source, so I had to fish out the expected values. We only use two types,
so ideally we will always be able to use the default ABI.
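A rough Python analogue of this "open it at run time if available" approach uses `ctypes` in place of `dlopen`. It only illustrates the fallback logic. The real plugin resolves libffi symbols from C++, and the hard-coded ABI values are not modeled here. `load_ffi` is a hypothetical helper name.

```python
import ctypes
import ctypes.util
from typing import Optional


def load_ffi() -> Optional[ctypes.CDLL]:
    """Open libffi at run time, returning None if it is unavailable.

    Returning None lets the caller disable the CPU "kernel" path instead of
    making libffi a hard build-time dependency.
    """
    name = ctypes.util.find_library("ffi")  # a soname such as "libffi.so.8", or None
    if name is None:
        return None
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


lib = load_ffi()
# ffi_call is part of libffi's public C API; probe for it only if loaded.
has_ffi_call = lib is not None and hasattr(lib, "ffi_call")
print("libffi usable:", has_ffi_call)
```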

It's possible we could remove this dependency entirely in the future as
well.
2024-01-09 14:01:52 -06:00
Brad Smith
dc03382d3e
[openmp][AIX] Add AIX to __kmp_set_stack_info() (#77421) 2024-01-09 12:02:40 -05:00
Joseph Huber
0fe86f9c51
[Libomptarget] Remove extra cache for offloading entries (#77012)
Summary:
The offloading entries right now are assumed to be baked into the binary
itself, and thus always valid whenever the library is executing. This
means that we don't need to copy them to additional storage and can
instead simply pass around references to them.

This is not likely to change in the expected operation of the OpenMP
library. Additionally, the indirection for the offload entry struct is
simply two pointers, so moving it by value is trivial.
2024-01-08 16:49:33 -06:00
carlobertolli
ce4144406c
Revert "[OpenMP][libomptarget] Enable automatic unified shared memory executi…" (#77371)
Reverts llvm/llvm-project#75999

lit test is failing.
2024-01-08 14:38:29 -06:00
carlobertolli
22a73e7c46
[OpenMP][libomptarget] Enable automatic unified shared memory executi… (#75999)
…on (zero-copy) on MI300A.

This patch enables applications that did not request OpenMP
unified_shared_memory to run with the same zero-copy behavior, where
mapped memory does not result in extra memory allocations and memory
copies, but CPU-allocated memory is accessed from the device. The name
for this behavior is "automatic zero-copy" and it relies on detecting:
that the runtime is running on a MI300A, that the user did not select
unified_shared_memory in their program, and that XNACK (unified memory
support) is enabled in the current GPU configuration. If all these
conditions are met, then automatic zero-copy is triggered.
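The three conditions above combine with a logical AND. The sketch below is only an illustration. `DeviceConfig` and its fields are hypothetical stand-ins for whatever the runtime actually queries from the driver and from the program's `requires` clauses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceConfig:
    # Hypothetical fields; the real runtime queries these from the driver.
    is_mi300a: bool           # running on an MI300A
    user_requested_usm: bool  # program declared unified_shared_memory
    xnack_enabled: bool       # XNACK (unified memory support) is on


def use_automatic_zero_copy(cfg: DeviceConfig) -> bool:
    """All three conditions must hold for automatic zero-copy."""
    return cfg.is_mi300a and not cfg.user_requested_usm and cfg.xnack_enabled


print(use_automatic_zero_copy(DeviceConfig(True, False, True)))   # True
print(use_automatic_zero_copy(DeviceConfig(True, False, False)))  # False
```

Programs that do request unified_shared_memory already get zero-copy through the regular path, so the automatic mode only applies when they do not.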

This patch is still missing support for global variables, which will be
provided in a subsequent patch.

Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>
2024-01-08 14:17:28 -06:00
Joseph Huber
e7655ad605
[Libomptarget] Remove unnecessary CMake definition of endiannness (#77205)
Summary:
This was needed for a definition in `hsa.h` that requires it to be
set on some architectures when autodetection fails. We only
really build `libomptarget` with `gcc` and `clang`, which already provide
their own way of detecting this. Remove the unnecessary define and move
it into the source.
2024-01-08 13:23:38 -06:00
Joseph Huber
bda562519b [Libomptarget][NFC] Fix unhandled allocator enum value 2024-01-08 10:17:05 -06:00
Xing Xue
2edce427a8
[openmp][AIX]Initial changes for porting to AIX (#76841)
This PR contains initial changes for building and testing libomp on AIX.
More changes will follow.
- `KMP_OS_AIX` is defined for the AIX platform
- `KMP_ARCH_PPC` is defined for 32-bit PPC
- `KMP_ARCH_PPC_XCOFF` and `KMP_ARCH_PPC64_XCOFF` are for 32- and 64-bit
XCOFF object formats respectively
- Assembly file `z_AIX_asm.S` is used for AIX specific assembly code and
will be added in a separate PR
- The target library is disabled because AIX does not have device
support
- OMPT is temporarily disabled
2024-01-08 08:33:00 -05:00
Chaitanya
1637c07925
[openmp][amdgpu] Add DynamicLdsSize to AMDGPUImplicitArgsTy (#65325)
#65273: the "hidden_dynamic_lds_size" argument will be added in the reserved
section at offset 120 of the implicit argument layout.
Add `DynamicLdsSize` to the `AMDGPUImplicitArgsTy` struct at offset 120 and fill in
the dynamic LDS size before kernel launch.
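Filling in the field means writing an integer at a fixed byte offset in the implicit-argument buffer. The Python sketch assumes a 256-byte, little-endian implicit-argument block and a 32-bit `hidden_dynamic_lds_size` field. Those sizes are assumptions for illustration, and only the offset 120 comes from the description above. `fill_dynamic_lds_size` is a hypothetical helper.

```python
import struct

# Assumptions for illustration: a 256-byte little-endian implicit-argument
# block with a 32-bit hidden_dynamic_lds_size field at byte offset 120.
IMPLICIT_ARGS_SIZE = 256
DYNAMIC_LDS_SIZE_OFFSET = 120


def fill_dynamic_lds_size(implicit_args: bytearray, lds_bytes: int) -> None:
    """Store the dynamic LDS size just before a (hypothetical) kernel launch."""
    struct.pack_into("<I", implicit_args, DYNAMIC_LDS_SIZE_OFFSET, lds_bytes)


args = bytearray(IMPLICIT_ARGS_SIZE)
fill_dynamic_lds_size(args, 4096)
print(struct.unpack_from("<I", args, DYNAMIC_LDS_SIZE_OFFSET)[0])  # 4096
```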
2024-01-06 09:34:48 +05:30
Dominik Adamski
0cdaadf15a
[libomptarget][flang] Explicitly pass the OpenMP device libraries to tests (#76796)
This pull request is a follow-up of patch:
https://github.com/llvm/llvm-project/pull/68225 and it explicitly
specifies OpenMP device libraries for Fortran OpenMP tests.
2024-01-04 08:45:34 +01:00
Joseph Huber
fb32977ac7
[Libomptarget] Fix RPC-based malloc on NVPTX (#72440)
Summary:
The device allocator on NVPTX architectures is enqueued to a stream that
the kernel is potentially executing on. This can lead to deadlocks as
the kernel will not proceed until the allocation is complete and the
allocation will not proceed until the kernel is complete. CUDA 11.2
introduced async allocations that we can manually place on separate
streams to combat this. This patch makes a new allocation type that's
guaranteed to be non-blocking so it will actually make progress. Only
Nvidia needs to care about this, as the others are not blocking in this
way by default.

I had originally tried to make the `alloc` and `free` methods take a
`__tgt_async_info`. However, I observed that with the large volume of
streams being created by a parallel test it quickly locked up the system
as presumably too many streams were being created. This implementation
just creates a new stream and immediately destroys it. This
obviously isn't very fast, but it at least gets the cases to stop
deadlocking for now.
2024-01-02 16:53:53 -06:00
Kareem Ergawy
75be7bb3fc
[flang][OpenMP][Offloading][AMDGPU] Add test for target update (#76355)
Adds a new test for offloading `target update` directive to AMD GPUs.
2024-01-02 09:50:27 +01:00
Joseph Huber
64f0681e97
[Libomptarget] Rework image checking further (#76120)
Summary:
In the future, we may have more checks for different kinds of inputs,
e.g. SPIR-V. This patch simply reworks the handling to be more generic
and do the magic detection up-front. The checks inside the routines are
now asserts so we don't spend time checking this stuff over and over
again.
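The "detect once, then assert" structure can be sketched like this. The magic numbers are the standard ELF and LLVM bitcode signatures. `ImageKind`, `identify_image`, and `load_elf_image` are hypothetical names. The real runtime is C++ and can rely on LLVM's file-magic helpers instead of hand-written checks.

```python
from enum import Enum


class ImageKind(Enum):
    ELF = "elf"
    BITCODE = "bitcode"
    UNKNOWN = "unknown"


# Well-known magic numbers: ELF starts with 0x7F 'E' 'L' 'F'; raw LLVM bitcode
# starts with 'B' 'C' 0xC0 0xDE; the bitcode wrapper starts with 0x0B17C0DE
# stored little-endian.
ELF_MAGIC = b"\x7fELF"
BITCODE_MAGIC = b"BC\xc0\xde"
BITCODE_WRAPPER_MAGIC = b"\xde\xc0\x17\x0b"


def identify_image(data: bytes) -> ImageKind:
    """Classify an image once, before any handler sees it."""
    if data.startswith(ELF_MAGIC):
        return ImageKind.ELF
    if data.startswith((BITCODE_MAGIC, BITCODE_WRAPPER_MAGIC)):
        return ImageKind.BITCODE
    return ImageKind.UNKNOWN


def load_elf_image(data: bytes) -> int:
    # The caller already classified the input, so this is only an assertion.
    assert identify_image(data) is ImageKind.ELF
    return len(data)  # placeholder for the real loading work
```

Later checks such as SPIR-V would then add one more branch in `identify_image` rather than new checks inside every handler.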

This patch also tweaks the bitcode check. I used a different function
to get the lazy IR module, as it returns the raw `Expected` value
rather than the `SMDiagnostic`.

No functionality change intended.
2023-12-29 15:14:39 -06:00
Gheorghe-Teodor Bercea
a01b58aef0
[OpenMP][libomptarget][Fix] Add missing array initialization (#76457)
Add the missing array initialization: the array was never initialized,
but its values were assumed to be zero.
2023-12-27 12:58:41 -05:00
Ethan Luis McDonough
813a671232
[OpenMP] Remove unnecessary dependencies from plugin unit tests (#76266)
This was an oversight that seems to be causing problems on certain
builds. This patch should fix #76225.
2023-12-22 14:44:23 -06:00
Felipe Cabarcas
9b6ea5e8f8
[OpenMP] Improve omp offload profiler (#68016)
Summary:
Adding information to the LIBOMPTARGET profiler for runtime kernel and API
calls.

Key changes:
* Adding information to runtime calls for a better understanding of how
  the application is executing, for example the teams requested by the
  user and the size of memory transfers.
* Profile timer was changed from 'us' to 'ns', since 'us' was too
  coarse-grained to register some important details like key kernel duration.
* Removed calls that are neither API nor runtime calls, to reduce the
  complexity of the profile for application developers.

---------

Co-authored-by: Felipe Cabarcas <cabarcas@leia.crpl.cis.udel.edu>
Co-authored-by: fel-cab <fel-cab@github.com>
2023-12-22 14:58:11 -05:00
Fabian Mora
12250c4092
Reland [OpenMP][Fix] libomptarget Fortran tests (#76189)
This patch fixes the erroneous multiple-target requirement in Fortran
offloading tests. Additionally, it adds two new variables
(test_flags_clang, test_flags_flang) to lit.cfg so that
compiler-specific flags for Clang and Flang can be specified.

This patch re-lands: #74543. The error was caused by having:
```
config.substitutions.append(("%flags", config.test_flags))
config.substitutions.append(("%flags_clang", config.test_flags_clang))
config.substitutions.append(("%flags_flang", config.test_flags_flang))
```
when instead it has to be:
```
config.substitutions.append(("%flags_clang", config.test_flags_clang))
config.substitutions.append(("%flags_flang", config.test_flags_flang))
config.substitutions.append(("%flags", config.test_flags))
```
because lit applies substitutions in list order, so `%flags` would otherwise
match the prefix of `%flags_clang` and `%flags_flang`.
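A toy substitution loop reproduces the ordering effect. The sketch assumes that substitutions are applied one after another in list order. lit itself treats the patterns as regular expressions, but plain `str.replace` behaves the same for these literal keys. The flag values are placeholders.

```python
def apply_substitutions(line: str, substitutions: list[tuple[str, str]]) -> str:
    """Apply (pattern, replacement) pairs one after another, in list order."""
    for pattern, replacement in substitutions:
        line = line.replace(pattern, replacement)
    return line


# Placeholder values; only the ordering matters.
WRONG_ORDER = [
    ("%flags", "COMMON"),
    ("%flags_clang", "CLANG_ONLY"),
    ("%flags_flang", "FLANG_ONLY"),
]
RIGHT_ORDER = [
    ("%flags_clang", "CLANG_ONLY"),
    ("%flags_flang", "FLANG_ONLY"),
    ("%flags", "COMMON"),
]

cmd = "clang %flags %flags_clang test.c"
print(apply_substitutions(cmd, WRONG_ORDER))  # clang COMMON COMMON_clang test.c
print(apply_substitutions(cmd, RIGHT_ORDER))  # clang COMMON CLANG_ONLY test.c
```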
2023-12-21 14:18:36 -08:00
Ethan Luis McDonough
cb3a893436
[OpenMP] Check for gtest when building libomptarget unit tests (#76141)
This patch addresses an issue introduced in pull request #74398. CMake
will attempt to re-build gtest if openmp is enabled as a project (as
opposed to being enabled as a runtime). This patch adds a check that
prevents this from happening.
2023-12-21 04:00:35 -06:00
Joseph Huber
ba192debb4 [Libomptarget][Obvious] Fix typo in attribute lookup
Summary:
These are keys into the AMDGPU target metadata. One of them had a typo
which prevented it from being extracted.
2023-12-20 19:03:35 -06:00
Joseph Huber
f324584ae3
[Libomptarget][NFCI] Remove caching of created ELF files (#76080)
Summary:
We currently keep a cache of created ELF files from the relevant images.
This shouldn't be necessary as the entire ELF interface is generally
trivially constructible and extremely cheap. The cost of constructing
one of these objects is simply a size check and writing a pointer to the
underlying data. Given that, keeping a cache of these images should not
be necessary overall.
2023-12-20 17:13:41 -06:00
Shilei Tian
7e4c6f6cb2
[OpenMP] Reduce the size of heap memory required by the test malloc_parallel.c (#75885)
This patch reduces the size of heap memory required by the test
`malloc_parallel.c` and `malloc.c`. The original size is too large such
that `malloc` returns `nullptr` on many threads, causing illegal
memory access.
2023-12-20 15:03:01 -08:00
Ethan Luis McDonough
3c10e5b2f6
[OpenMP] Add unit tests for nextgen plugins (#74398)
This patch adds three GTest unit tests that test plugin read and write
operations. Tests can be compiled with `ninja -C runtimes/runtimes-bins
LibomptUnitTests`.
2023-12-20 14:58:56 -08:00
Joseph Huber
e4f4022b70 [Libomptarget][NFC] Fix linting warnings in the plugins
Summary:
Fix some linting warnings present in the plugins.
2023-12-20 10:07:34 -06:00
Joseph Huber
ac029e02a9
[Libomptarget] Remove __tgt_image_info and use the ELF directly (#75720)
Summary:
This patch reorganizes a lot of the code used to check for compatibility
with the current environment. The main bulk of this patch involves
moving from using a separate `__tgt_image_info` struct (which just
contains a string for the architecture) to instead simply checking this
information from the ELF directly. Checking information in the ELF is
very inexpensive as creating an ELF file is simply writing a base
pointer.

The main motivation was to reorganize everything around the ELF
image. We can then do the majority of these checks without first
initializing the plugin. A future patch will move the first ELF checks
to happen without initializing the plugin, so we no longer need to
initialize plugins that don't have the needed images.

This patch also adds a lot more sanity checks for whether or not the ELF
is actually compatible, such as whether the image has a valid ABI, 64-bit
width, is executable, etc.
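A few of these header checks can be sketched directly against the ELF64 header layout. This is a simplified Python illustration with a hypothetical `check_elf_header` helper. It checks the magic, 64-bit class, little-endian encoding, and object type, and deliberately omits the target-specific ABI checks. Accepting both `ET_EXEC` and `ET_DYN` is an assumption, because GPU images are not always plain executables.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELFCLASS64 = 2   # e_ident[EI_CLASS] for 64-bit objects
ELFDATA2LSB = 1  # e_ident[EI_DATA] for little-endian objects
ET_EXEC = 2      # e_type: executable file
ET_DYN = 3       # e_type: shared object file
ELF64_EHDR_SIZE = 64


def check_elf_header(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means the basic checks pass."""
    if len(data) < ELF64_EHDR_SIZE or not data.startswith(ELF_MAGIC):
        return ["not a valid ELF64 header"]
    problems = []
    if data[4] != ELFCLASS64:
        problems.append("not 64-bit")
    if data[5] != ELFDATA2LSB:
        problems.append("not little-endian")
    # e_type sits right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from("<H", data, 16)
    if e_type not in (ET_EXEC, ET_DYN):  # assumption: both are loadable
        problems.append(f"unexpected e_type {e_type}")
    return problems


header = bytearray(ELF64_EHDR_SIZE)
header[0:4] = ELF_MAGIC
header[4], header[5] = ELFCLASS64, ELFDATA2LSB
struct.pack_into("<H", header, 16, ET_EXEC)
print(check_elf_header(bytes(header)))  # []
```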
2023-12-19 20:01:31 -06:00
Joseph Huber
219355d4c0
[Libomptarget] Use scoped atomics in the device runtime (#75834)
Summary:
A recent patch allowed us to easily replace GNU atomics with scoped
variants that make use of the backend's handling for more permissive
scopes. The default is full "system" scope, which means the atomic
operation must be consistent with operations that may happen on the
host's memory. This is generally only required for processes that are
communicating with something via global fine-grained memory. This patch
uses these atomics to make everything device scoped, as nothing in the
OpenMP runtime should depend on the host.

This is only provided as a very new clang extension but the DeviceRTL is
only compiled with clang so it is always available.
2023-12-19 14:30:34 -06:00
Carlos Eduardo Seo
dcd7c8b7c9
[OpenMP][AArch64] Workaround for ompt/synchronization tests (#75848)
ompt/synchronization/[masked.c | master.c] tests fail due to a wrong
offset being calculated for the possible return addresses. PR #65936
fixes this for Darwin and the same has to be done for Linux.

Updates #69627
2023-12-19 19:26:23 +01:00
Fabian Mora
ac82c8b925
Revert "[OpenMP][Fix] libomptarget Fortran tests" (#75953)
Reverts llvm/llvm-project#74543
2023-12-19 12:11:08 -05:00
Gheorghe-Teodor Bercea
65909177e3
[OpenMP][libomptarget][Fix] Disable test on NVIDIA platforms (#75949)
The test doesn't seem to work on NVIDIA, so disable it for now.
2023-12-19 11:58:10 -05:00
Fabian Mora
49efb082cc
[OpenMP][Fix] libomptarget Fortran tests (#74543)
This patch fixes the erroneous multiple-target requirement in Fortran
offloading tests. Additionally, it adds two new variables
(`test_flags_clang`, `test_flags_flang`) to `lit.cfg` so that
compiler-specific flags for Clang and Flang can be specified.
2023-12-19 11:35:14 -05:00
Shilei Tian
3768039913
[OpenMP] Directly use user's grid and block size in kernel language mode (#70612)
In kernel language mode, use the user's grid and block sizes directly.
There is no validity check, which means that if the user's values are too
large, the launch will fail, similar to what CUDA and HIP are doing right now.
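The difference between the two modes can be sketched as follows. `launch_dims` is a hypothetical helper. The clamping in the non-kernel-language branch is only an illustrative policy, not libomptarget's actual OpenMP launch heuristics.

```python
def launch_dims(user_blocks: int, user_threads: int, kernel_language_mode: bool,
                max_blocks: int, max_threads: int) -> tuple[int, int]:
    """Pick (grid size, block size) for a kernel launch."""
    if kernel_language_mode:
        # Use the user's values unchanged; an oversized request is left to
        # fail at launch time, as it would with CUDA or HIP.
        return user_blocks, user_threads
    # Illustrative only: otherwise adjust the request to the device limits.
    return min(user_blocks, max_blocks), min(user_threads, max_threads)


print(launch_dims(100_000, 2048, True, 65_535, 1024))   # (100000, 2048)
print(launch_dims(100_000, 2048, False, 65_535, 1024))  # (65535, 1024)
```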
2023-12-18 12:26:18 -05:00
Joseph Huber
913622d012
[Libomptarget] Remove remaining global constructors in plugins (#75814)
Summary:
This patch fixes the remaining global constructor in the plugins after
addressing the ones in the JIT interface. This struct was mistakenly
using global constructors as not all the members were being initialized
properly. This was almost certainly being optimized out because it's
trivial, but would still be present in debug builds and prevented us
from compiling with `-Werror=global-constructors`. We will want to do
that once offloading is moved to a runtimes only build.
2023-12-18 11:01:02 -06:00
Joseph Huber
1580877555
[Libomptarget] Remove bitcode image map used for JIT processing (#75672)
Summary:
Libomptarget supports JIT by treating an LLVM-IR file as a regular input
image. The handling here used a global map to keep track of triples once
it was parsed. This was done to save time; however, this created a global
constructor as well as an extra mutex to handle it. This patch removes
the use of this map.

Instead, we simply use the file magic to perform a quick check if the
input image is valid bitcode. If not, we then create a lazy module. This
should be roughly equivalent to the old handling, which created an IR symbol
table. Here we can prevent the module from materializing everything but
the single triple metadata we read in later.
2023-12-18 09:28:06 -06:00