In `libomptarget` we use a couple of functions from `libomp`, but we didn't link
`libomptarget` against `libomp`. That will not work on some platforms such
as macOS. A linker error will be encountered because those symbols are not resolved
at link time when building `libomptarget`. This patch simply makes `libomptarget`
link agains `libomp`, makes it a "user" of `libomp`. I think this will not break
the policies between `libomp` and `libomptarget`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D149617
D132005 introduced function calls from `libomp` to `libomptarget` if offloading
is enabled. However, the external function declaration may not always work. For
example, it causes a link error on macOS. Currently it is guarded properly by
a macro, but in order to get OpenMP target offloading working on macOS, it has
to be handled correctly. This patch applies the same idea of how we support
target memory extension by using function pointer indirect call for that function.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D149557
D132005 introduced function calls from `libomp` to `libomptarget` if offloading
is enabled. However, the external function declaration may not always work. For
example, it causes a link error on macOS. Currently it is guarded properly by
a macro, but in order to get OpenMP target offloading working on macOS, it has
to be handled correctly. This patch applies the same idea of how we support
target memory extension by using function pointer indirect call for that function.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D149557
The linker flag `--version-script` may not be supported by all linkers, such as
macOS's linker. `libomp` is already capable of detecting whether the linker supports
it and append the linker flag accordingly. Since currently we assume `libomptarget`
only works on Linux, we don't do the check accordingly. This patch simply adds
the check before adding it to linker flag. This will be the first patch to make
OpenMP target offloading work on macOS. Note that CMake files in `plugins` are
not touched before they are going to be removed pretty soon anyway.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D149555
Without this patch, if an incompatible libomptarget.so is present in a
system directory, such as /usr/lib64, check-openmp fails many
libomptarget tests with linking errors. The problem appears to have
started at D129875, which landed as dc52712a06. This patch extends
the libomptarget test suite config with a -L for the current build
directory of libomptarget.so.
Reviewed By: jhuber6, JonChesterfield
Differential Revision: https://reviews.llvm.org/D149391
The working of depend clause with iterator modifier
can be correctly tested by means of execution tests
and not at the LLVM IR level. These tests
are imported/inspired from the SOLLVE tests.
SOLLVE repo: https://github.com/SOLLVE/sollve_vv
Differential Revision: https://reviews.llvm.org/D146706
This patch caused failures on the OpenMP buildbots as discussed in
https://reviews.llvm.org/D149010. We will need to investigate why we are
seeing unresolved references to the standard C++ library.
This reverts commit 5a15ca7f10.
This reverts commit 35cfadfbe2.
It makes a couple of buildbots unhappy because of the following test failures:
- `Transforms/OpenMP/add_attributes.ll'`
- `mapping/declare_mapper_target_data.cpp` on AMDGPU
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.
This is a combination and refinement of patch series D116908, D116909, and D116910.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142569
(rectangular and non-rectangular loops)
Submitting on behalf of Natalia Glagoleva <natgla@microsoft.com>
Differential Revision: https://reviews.llvm.org/D148393
Currently the device runtime is built as a custom target, which will not be included
in the compile commands. Those language servers using compile commands cannot
handle device runtime correctly.
In this patch, when `CMAKE_EXPORT_COMPILE_COMMANDS` is turned on, dummy
targets that will be excluded from all will be added. Those targets will not be
built or installed if we just simply do `make` or `make install`, but their
compilation will be included in the compile commands.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D148870
When proxy or helper tasks were used in inactive parallel regions, no
memo of the th_task_state was stored in the stack, so th_task_state
became invalid. This change inserts an item in the memo stack to track
these th_task_states.
Patch by Alex Duran.
Differential Revision: https://reviews.llvm.org/D145736
Replace the custom libomp_check_linker_flag() implementation with
llvm_check_compiler_linker_flag() from the common cmake utils. Due
to the way the custom implementation is implemented (capturing
output from an entire nested cmake invocation) it can easily end
up incorrectly detecting flags as unavailable, e.g. because "error",
"unknown" or similar occurs inside compiler flags, the directory
name, etc.
Fixes https://github.com/llvm/llvm-project/issues/62240.
Differential Revision: https://reviews.llvm.org/D148798
This patch removes the Err data member from the AsyncInfoWrapperTy class. Now the error
is stored externally, in the caller side, and it is explicitly passed to the
AsyncInfoWrapperTy::finalize() function as a reference.
Differential Revision: https://reviews.llvm.org/D148027
It turns out that the __builtin_amdgcn_s_barrier() alone does not emit
a fence. We somehow got away with this and assumed it would work as it
(hopefully) is correct on the NVIDIA path where we just emit a
__syncthreads. After talking to @arsenm we now (mostly) align with the
OpenCL barrier implementation [1] and emit explicit fences for AMDGPUs.
It seems this was the underlying cause for #59759, but I am not 100%
certain. There is a chance this simply hides the problem.
Fixes: https://github.com/llvm/llvm-project/issues/59759
[1] 07b347366e/opencl/src/workgroup/wgbarrier.cl (L21)
Summary:
We can detect the user's GPUs via the `auto` option. But if the user has
multiple GPUs installed or set the list incorrectly, we need to remove
the duplicates.
Summary:
At some point we stopped copying this file to the server, but
realistically this is just a static `.pdf` hosted in the LLVM repository
so we can link it directly.
We recently reverted a patch that automatically set the rpath on OpenMP
executables. This was used because the `libomptarget.so` library is only
expected to work with the same version of compiler that will be using
it. This patch adds some documentation for how to get similar behaviour
as before using a clang configuration file.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D147943
Code generation support for 'parallel masked' directive.
The `EmitOMPParallelMaskedDirective` was implemented.
In addition, the appropiate device functions were added.
Fix#59939.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D143527
The support for `libarcher` can sometimes cause problems when running
tests or building. We want an option to turn this off when we are not
directly testing `libarcher`.
Reviewed By: jplehr
Differential Revision: https://reviews.llvm.org/D147343
We introduced the implementation of supporting asynchronous routines with depend objects specified in Version 5.1 of the OpenMP Application Programming Interface. In brief, these routines omp_target_memcpy_async and omp_target_memcpy_rect_async perform asynchronous (nonblocking) memory copies between any
combination of host and device pointers. The basic idea is to create the implicit tasks to carry the memory copy calls and handle dependencies specified by depend objects. The implicit tasks are executed via hidden helper thread in OpenMP runtime.
Reviewed By: jdoerfert, tianshilei1992
Committed By: jplehr
Differential Revision: https://reviews.llvm.org/D136103
We missed certain updates, mostly to call site information, and
dependent AAs did not get recomputed. We also did not properly
distinguish and propagate incoming and outgoing information of call
sites.
The runtime tests passes now, I'll add a proper test for
AAExecutionDomain soon that covers all the cases and ensures we haven't
forgotten more updates. To help unblock some apps, I'll put the fix
first.
We can use dominance and avoid the special handling of kernels and
prevent inserting code before allocas accidentally (as happend in the
runtime test).
This code may have served a purpose at some point but it has been dead
for a long while. `FromMapperBase` was always `nullptr` which is `false`
which makes the rest of the code dead. Since this has not
affected tests, I delete it for now.
When building OpenMP as part of LLVM, CMAKE was generating incorrect
location references for OpenMP build's first step's artifacts being used
in regenerating its Windows import library in the second step. The fix is
to feed a dummy non-buildable, rather than buildable, source to CMAKE to
satisfy its source requirements removing the need to reference the first
step's artifacts in the second step altogether.
Differential Revision:https://reviews.llvm.org/D146894
It turns out that the `__builtin_amdgcn_s_barrier()` alone does not emit
a fence. We somehow got away with this and assumed it would work as it
(hopefully) is correct on the NVIDIA path where we just emit a
`__syncthreads`. After talking to @arsenm we now (mostly) align with the
OpenCL barrier implementation [1] and emit explicit fences for AMDGPUs.
It seems this was the underlying cause for #59759, but I am not 100%
certain. There is a chance this simply hides the problem.
Fixes: https://github.com/llvm/llvm-project/issues/59759
[1] 07b347366e/opencl/src/workgroup/wgbarrier.cl (L21)
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D145290
This patch fixes an issue whereby a constexpr class member which is
mapped to the device is being optimized out thus leading to a runtime
error.
Patch: https://reviews.llvm.org/D146552
The shadow pointer map was problematic as we scanned an entire list if
an entry had shadow pointers. The new scheme stores the shadow pointers
inside the entries. This allows easy access without any search. It also
helps us, but also makes it necessary, to define a consistent locking
scheme. The implicit locking of entries via TargetPointerResultTy makes
this pretty effortless, however one has to:
- Lock HDTTMap before locking an entry.
- Do not lock HDTTMap while holding an entry lock.
- Hold the entry lock to read or modify an entry.
The changes to submitData and retrieveData have been made to ensure 2
when the LIBOMPTARGET_INFO flag is used. Most everything else happens by
itself as TargetPointerResultTy acts as a lock_guard for the entry. It
is a little complicated when we deal with multiple entries, especially
as they can be equal. However, one can still follow the rules with
reasonable effort.
LookupResult are now finally also locking the entry before it is
inspected. This is good even if we haven't run into a problem yet.
Differential Revision: https://reviews.llvm.org/D123446
This unblocks one of the XFAIL tests for AMD, though we need to work
around the missing printf still.
Differential Revision: https://reviews.llvm.org/D146592
Summary:
Some older compilers, which we still support, have problems handling the
copy elision that allows us to directly move an `Error` to an
`Expected`. This patch adds explicit moves to remove the error. Same as
last patch but I forgot this one.