MacOS build of LLVM with OpenMP enabled fails with an error
that it doesn't know what std::abs is. Fix by including <cmath>
so that the relevant function declaration is included.
No functional change intended.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D150687
This reverts commit 65429b9af6.
Broke several projects, see https://reviews.llvm.org/D144509#4347562 onwards.
Also reverts follow-up commit "[OpenMP] Compile assembly files as ASM, not C"
This reverts commit 4072c8aee4.
Also reverts fix attempt "[cmake] Set CMP0091 to fix Windows builds after the cmake_minimum_required bump"
This reverts commit 7d47dac5f8.
Since CMake 3.20, CMake explicitly passes "-x c" (or equivalent)
when compiling a file which has been set as having the language
C. This behaviour change only takes place if "cmake_minimum_required"
is set to 3.20 or newer, or if the policy CMP0119 is set to new.
Attempting to compile assembly files with "-x c" fails, however
this is workarounded in many cases, as OpenMP overrides this with
"-x assembler-with-cpp", however this is only added for non-Windows
targets.
Thus, after increasing cmake_minimum_required to 3.20, this breaks
compiling the GNU assembly for Windows targets; the GNU assembly is
used for ARM and AArch64 Windows targets when building with Clang.
This patch unbreaks that.
Differential Revision: https://reviews.llvm.org/D150532
Previously, we tried to check whether the -std=c++17 option was
supported and manually add the flag. That doesn't work for compilers
that do support C++17 but use a different option syntax, like
clang-cl.
OpenMP itself probably doesn't specifically require C++17, therefore
CXX_STANDARD_REQUIRED is left off, but in some cases, we may
have code that only works in C++17 mode.
In particular, 46262cab24 made a
refactoring that works when built with Clang in C++17 mode, but not
in C++14 mode. MSVC accepts the construct in both language modes.
For libomptarget, we've had specific checks that require C++17
(or the -std=c++17 option) to be supported. It's doubtful that
libomptarget has got any code which more specifically requires C++17;
this seems to be a remnant from when libomptarget was added
originally in 2467df6e4f / D14031.
At that point, the rest of OpenMP didn't require C++11, while
libomptarget did require it. Now, it's unlikely that anyone attempts
building it with a toolchain that doesn't support C++11.
At this point, we could also probably just set CXX_STANDARD_REQUIRED
to true, requiring C++17 as baseline for all the OpenMP libraries.
This fixes building OpenMP with clang-cl after
46262cab24.
Differential Revision: https://reviews.llvm.org/D149726
This patch implements the "task record and replay" mechanism. The idea is to be able to store tasks and their dependencies in the runtime so that we do not pay the cost of task creation and dependency resolution for future executions. The objective is to improve fine-grained task performance, both for those from "omp task" and "taskloop".
The entry point of the recording phase is __kmpc_start_record_task, and the end of record is triggered by __kmpc_end_record_task.
Tasks encapsulated between a record start and a record end are saved, meaning that the runtime stores their dependencies and structures, referred to as TDG, in order to replay them in subsequent executions. In these TDG replays, we start the execution by scheduling all root tasks (tasks that do not have input dependencies), and there will be no involvement of a hash table to track the dependencies, yet tasks do not need to be created again.
At the beginning of __kmpc_start_record_task, we must check if a TDG has already been recorded. If yes, the function returns 0 and starts to replay the TDG by calling __kmp_exec_tdg; if not, we start to record, and the function returns 1.
An integer uniquely identifies TDGs. Currently, this identifier needs to be incremented manually in the source code. Still, depending on how this feature would eventually be used in the library, the caller function must do it; also, the caller function needs to implement a mechanism to skip the associated region, according to the return value of __kmpc_start_record_task.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D146642
Fixes a GCC build issue (an instance of unallowed typename keyword) and reworks memory allocation
to avoid the use of C++ library based primitives ) in and restores the earlier commit https://reviews.llvm.org/D148393
Differential Revision: https://reviews.llvm.org/D149010
Summary:
The changes in https://reviews.llvm.org/D150022 changed the API for this
function that we query. Simply pass in the alignment from the associated
header to fix.
This patch fixes the printing of device information. Devices are initialized
before printing its information. Fixes#61392
Differential Revision: https://reviews.llvm.org/D146081
This patch improves the device info printing in the NextGen plugins. The device
info properties are composed of keys, values and units (if necessary). These
properties are pushed into a queue by each vendor-specifc plugin, and later,
these properties are printed processed and printed by the common Plugin
Interface. The printing format is common across the different plugins.
Differential Revision: https://reviews.llvm.org/D148178
The interop types use the number of dependencies in the function
interface. Every other function uses an `i32` to count the number of
dependencies except for the initialization function. This leads to
codegen issues when the rest of the compiler passes in an `i32` that
then creates an invalid call. Fix this to be consistent with the other
uses.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D150156
In `libomptarget` we use a couple of functions from `libomp`, but we didn't link
`libomptarget` against `libomp`. That will not work on some platforms such
as macOS. A linker error will be encountered because those symbols are not resolved
at link time when building `libomptarget`. This patch simply makes `libomptarget`
link agains `libomp`, makes it a "user" of `libomp`. I think this will not break
the policies between `libomp` and `libomptarget`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D149617
The purpose of this patch is to Implement registration of callback functions in the generic plugin by looking up corresponding callbacks in libomptarget. The overall design document is https://rice.app.box.com/s/pf3gix2hs4d4o1aatwir1set05xmjljc
Defined an object of type OmptDeviceCallbacksTy in the amdgpu plugin for holding the tool-provided callback functions. Implemented a global constructor in the plugin that creates a connector object to connect with libomptarget. The callbacks that are already registered with libomptarget are looked up and registered with the plugin.
Combined with an internal patch from Dhruva Chakrabarti, which fixes the OMPT initialization ordering.
Achieved through removal of the constructor attribute from ompt_init.
Patch from John Mellor-Crummey <johnmc@rice.edu>
With contributions from:
Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>
Michael Halkenhaeuser <MichaelGerald.Halkenhauser@amd.com>
Reviewed By: dhruvachak, tianshilei1992
Differential Revision: https://reviews.llvm.org/D124070
They don't convey any useful information and make the documentation
unnecessarily hard to read.
Differential Revision: https://reviews.llvm.org/D149641
Adds HSA timeout hint of 2 seconds to the AMDGPU nextgen-plugin to improve
performance of small kernels.
The HSA runtime may stay in HSA_WAIT_STATE_ACTIVE for up to the timeout
value before switching to HSA_WAIT_STATE_BLOCKED. This can improve
latency from which small kernels can benefit.
The value was determined via experimentation w/ different benchmarks.
The timeout value can be overriden using the environment variable
LIBOMPTARGET_AMDGPU_STREAM_BUSYWAIT with a value in microseconds.
Original author: Greg Rodgers <Gregory.Rodgers@amd.com>
Contributions from: JP Lehr <JanPatrick.Lehr@amd.com>
Differential Revision: https://reviews.llvm.org/D148808
This reverts commit 63f0fdc262.
Since f1431bbfb1, this environment
variable is always set up by lit itself, so individual test suites
don't need to set it.
Differential Revision: https://reviews.llvm.org/D149356
For me, the test fails for nvptx64 offload. The problem was
introduced by D146838, which landed as 747af24155. It tries to copy
a string constant's address from device to host and then print the
string. This patch copies the contents of the string instead.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D149623
The purpose of this patch is to Implement registration of callback functions in the generic plugin by looking up corresponding callbacks in libomptarget. The overall design document is https://rice.app.box.com/s/pf3gix2hs4d4o1aatwir1set05xmjljc
Defined an object of type OmptDeviceCallbacksTy in the amdgpu plugin for holding the tool-provided callback functions. Implemented a global constructor in the plugin that creates a connector object to connect with libomptarget. The callbacks that are already registered with libomptarget are looked up and registered with the plugin.
Combined with an internal patch from Dhruva Chakrabarti, which fixes the OMPT initialization ordering.
Achieved through removal of the constructor attribute from ompt_init.
Patch from John Mellor-Crummey <johnmc@rice.edu>
With contributions from:
Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>
Michael Halkenhaeuser <MichaelGerald.Halkenhauser@amd.com>
Differential Revision: https://reviews.llvm.org/D124070
This patch fixes a bug introduced by D142586, which landed as
434992c96e. The fix was to only look for alignments that are powers
of 2. See the new test case for details.
Reviewed By: jdoerfert, jhuber6
Differential Revision: https://reviews.llvm.org/D149490
In `libomptarget` we use a couple of functions from `libomp`, but we didn't link
`libomptarget` against `libomp`. That will not work on some platforms such
as macOS. A linker error will be encountered because those symbols are not resolved
at link time when building `libomptarget`. This patch simply makes `libomptarget`
link agains `libomp`, makes it a "user" of `libomp`. I think this will not break
the policies between `libomp` and `libomptarget`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D149617
D132005 introduced function calls from `libomp` to `libomptarget` if offloading
is enabled. However, the external function declaration may not always work. For
example, it causes a link error on macOS. Currently it is guarded properly by
a macro, but in order to get OpenMP target offloading working on macOS, it has
to be handled correctly. This patch applies the same idea of how we support
target memory extension by using function pointer indirect call for that function.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D149557
D132005 introduced function calls from `libomp` to `libomptarget` if offloading
is enabled. However, the external function declaration may not always work. For
example, it causes a link error on macOS. Currently it is guarded properly by
a macro, but in order to get OpenMP target offloading working on macOS, it has
to be handled correctly. This patch applies the same idea of how we support
target memory extension by using function pointer indirect call for that function.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D149557
The linker flag `--version-script` may not be supported by all linkers, such as
macOS's linker. `libomp` is already capable of detecting whether the linker supports
it and append the linker flag accordingly. Since currently we assume `libomptarget`
only works on Linux, we don't do the check accordingly. This patch simply adds
the check before adding it to linker flag. This will be the first patch to make
OpenMP target offloading work on macOS. Note that CMake files in `plugins` are
not touched before they are going to be removed pretty soon anyway.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D149555
Without this patch, if an incompatible libomptarget.so is present in a
system directory, such as /usr/lib64, check-openmp fails many
libomptarget tests with linking errors. The problem appears to have
started at D129875, which landed as dc52712a06. This patch extends
the libomptarget test suite config with a -L for the current build
directory of libomptarget.so.
Reviewed By: jhuber6, JonChesterfield
Differential Revision: https://reviews.llvm.org/D149391
The working of depend clause with iterator modifier
can be correctly tested by means of execution tests
and not at the LLVM IR level. These tests
are imported/inspired from the SOLLVE tests.
SOLLVE repo: https://github.com/SOLLVE/sollve_vv
Differential Revision: https://reviews.llvm.org/D146706
This patch caused failures on the OpenMP buildbots as discussed in
https://reviews.llvm.org/D149010. We will need to investigate why we are
seeing unresolved references to the standard C++ library.
This reverts commit 5a15ca7f10.
This reverts commit 35cfadfbe2.
It makes a couple of buildbots unhappy because of the following test failures:
- `Transforms/OpenMP/add_attributes.ll'`
- `mapping/declare_mapper_target_data.cpp` on AMDGPU
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.
This is a combination and refinement of patch series D116908, D116909, and D116910.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142569
(rectangular and non-rectangular loops)
Submitting on behalf of Natalia Glagoleva <natgla@microsoft.com>
Differential Revision: https://reviews.llvm.org/D148393
Currently the device runtime is built as a custom target, which will not be included
in the compile commands. Those language servers using compile commands cannot
handle device runtime correctly.
In this patch, when `CMAKE_EXPORT_COMPILE_COMMANDS` is turned on, dummy
targets that will be excluded from all will be added. Those targets will not be
built or installed if we just simply do `make` or `make install`, but their
compilation will be included in the compile commands.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D148870
When proxy or helper tasks were used in inactive parallel regions, no
memo of the th_task_state was stored in the stack, so th_task_state
became invalid. This change inserts an item in the memo stack to track
these th_task_states.
Patch by Alex Duran.
Differential Revision: https://reviews.llvm.org/D145736
Replace the custom libomp_check_linker_flag() implementation with
llvm_check_compiler_linker_flag() from the common cmake utils. Due
to the way the custom implementation is implemented (capturing
output from an entire nested cmake invocation) it can easily end
up incorrectly detecting flags as unavailable, e.g. because "error",
"unknown" or similar occurs inside compiler flags, the directory
name, etc.
Fixes https://github.com/llvm/llvm-project/issues/62240.
Differential Revision: https://reviews.llvm.org/D148798
This patch removes the Err data member from the AsyncInfoWrapperTy class. Now the error
is stored externally, in the caller side, and it is explicitly passed to the
AsyncInfoWrapperTy::finalize() function as a reference.
Differential Revision: https://reviews.llvm.org/D148027
It turns out that the __builtin_amdgcn_s_barrier() alone does not emit
a fence. We somehow got away with this and assumed it would work as it
(hopefully) is correct on the NVIDIA path where we just emit a
__syncthreads. After talking to @arsenm we now (mostly) align with the
OpenCL barrier implementation [1] and emit explicit fences for AMDGPUs.
It seems this was the underlying cause for #59759, but I am not 100%
certain. There is a chance this simply hides the problem.
Fixes: https://github.com/llvm/llvm-project/issues/59759
[1] 07b347366e/opencl/src/workgroup/wgbarrier.cl (L21)
Summary:
We can detect the user's GPUs via the `auto` option. But if the user has
multiple GPUs installed or set the list incorrectly, we need to remove
the duplicates.
Summary:
At some point we stopped copying this file to the server, but
realistically this is just a static `.pdf` hosted in the LLVM repository
so we can link it directly.
We recently reverted a patch that automatically set the rpath on OpenMP
executables. This was used because the `libomptarget.so` library is only
expected to work with the same version of compiler that will be using
it. This patch adds some documentation for how to get similar behaviour
as before using a clang configuration file.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D147943
Code generation support for 'parallel masked' directive.
The `EmitOMPParallelMaskedDirective` was implemented.
In addition, the appropiate device functions were added.
Fix#59939.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D143527
The support for `libarcher` can sometimes cause problems when running
tests or building. We want an option to turn this off when we are not
directly testing `libarcher`.
Reviewed By: jplehr
Differential Revision: https://reviews.llvm.org/D147343
We introduced the implementation of supporting asynchronous routines with depend objects specified in Version 5.1 of the OpenMP Application Programming Interface. In brief, these routines omp_target_memcpy_async and omp_target_memcpy_rect_async perform asynchronous (nonblocking) memory copies between any
combination of host and device pointers. The basic idea is to create the implicit tasks to carry the memory copy calls and handle dependencies specified by depend objects. The implicit tasks are executed via hidden helper thread in OpenMP runtime.
Reviewed By: jdoerfert, tianshilei1992
Committed By: jplehr
Differential Revision: https://reviews.llvm.org/D136103
We missed certain updates, mostly to call site information, and
dependent AAs did not get recomputed. We also did not properly
distinguish and propagate incoming and outgoing information of call
sites.
The runtime tests passes now, I'll add a proper test for
AAExecutionDomain soon that covers all the cases and ensures we haven't
forgotten more updates. To help unblock some apps, I'll put the fix
first.
We can use dominance and avoid the special handling of kernels and
prevent inserting code before allocas accidentally (as happend in the
runtime test).
This code may have served a purpose at some point but it has been dead
for a long while. `FromMapperBase` was always `nullptr` which is `false`
which makes the rest of the code dead. Since this has not
affected tests, I delete it for now.
When building OpenMP as part of LLVM, CMAKE was generating incorrect
location references for OpenMP build's first step's artifacts being used
in regenerating its Windows import library in the second step. The fix is
to feed a dummy non-buildable, rather than buildable, source to CMAKE to
satisfy its source requirements removing the need to reference the first
step's artifacts in the second step altogether.
Differential Revision:https://reviews.llvm.org/D146894
It turns out that the `__builtin_amdgcn_s_barrier()` alone does not emit
a fence. We somehow got away with this and assumed it would work as it
(hopefully) is correct on the NVIDIA path where we just emit a
`__syncthreads`. After talking to @arsenm we now (mostly) align with the
OpenCL barrier implementation [1] and emit explicit fences for AMDGPUs.
It seems this was the underlying cause for #59759, but I am not 100%
certain. There is a chance this simply hides the problem.
Fixes: https://github.com/llvm/llvm-project/issues/59759
[1] 07b347366e/opencl/src/workgroup/wgbarrier.cl (L21)
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D145290
This patch fixes an issue whereby a constexpr class member which is
mapped to the device is being optimized out thus leading to a runtime
error.
Patch: https://reviews.llvm.org/D146552
The shadow pointer map was problematic as we scanned an entire list if
an entry had shadow pointers. The new scheme stores the shadow pointers
inside the entries. This allows easy access without any search. It also
helps us, but also makes it necessary, to define a consistent locking
scheme. The implicit locking of entries via TargetPointerResultTy makes
this pretty effortless, however one has to:
- Lock HDTTMap before locking an entry.
- Do not lock HDTTMap while holding an entry lock.
- Hold the entry lock to read or modify an entry.
The changes to submitData and retrieveData have been made to ensure 2
when the LIBOMPTARGET_INFO flag is used. Most everything else happens by
itself as TargetPointerResultTy acts as a lock_guard for the entry. It
is a little complicated when we deal with multiple entries, especially
as they can be equal. However, one can still follow the rules with
reasonable effort.
LookupResult are now finally also locking the entry before it is
inspected. This is good even if we haven't run into a problem yet.
Differential Revision: https://reviews.llvm.org/D123446
This unblocks one of the XFAIL tests for AMD, though we need to work
around the missing printf still.
Differential Revision: https://reviews.llvm.org/D146592
Summary:
Some older compilers, which we still support, have problems handling the
copy elision that allows us to directly move an `Error` to an
`Expected`. This patch adds explicit moves to remove the error. Same as
last patch but I forgot this one.
Summary:
Some older compilers, which we still support, have problems handling the
copy elision that allows us to directly move an `Error` to an
`Expected`. This patch adds explicit moves to remove the error.
When offloading is mandatory we can emit a more helpful message if we
did not find any compatible images with the user's system.
Fixes#60221
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D142369
Added a new target ompt mode that depends on libomptarget OMPT support.
Added tests that verify callbacks for target regions, kernel launch,
and data transfer operations. All of them should pass on amdgpu using
make check-libomptarget.
Reviewed By: jplehr
Differential Revision: https://reviews.llvm.org/D127372
The default and pre-link pipeline builders currently require you to
call a separate method for optimization level O0, even though they
have perfectly well-defined O0 optimization pipelines.
Accept O0 optimization level and call buildO0DefaultPipeline()
internally, so all consumers don't need to repeat this.
Differential Revision: https://reviews.llvm.org/D146200
Clean up for the AMD-specific kernel launch info in the NextGen Plugins.
- Fixes a mistake introduced with the initial commit that added printing
of an AMD-only property.
- Removes another AMD-only property (not clear on upstream status)
- Adds some more comment to what info is printed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D145924
My change of D14093 is only fixed problem for "pragma target data".
The problem still here for "pragma target"
what I am missing is:
When processing "pragma target data", the VD is passed when call to
emitCombinedEntry, so check VD is null as map for this pointer.
But when processing "pragma target" the VD is passed as nullptr, so
check VD is null is not working.
To fix this I add a new parameter IsMapThis. During the call to
emitCombinedEntry passes true if it is capturing this pointer and use
that instead check of "!VD".
Differential Revision: https://reviews.llvm.org/D146000
This patch breaks the handling of `printf` in the OpenMP library. Usiing
`-ffreestanding` prevents clang from emitting LLVM builtins, which we
use for OpenMP printing support. Shelve this until we have functioning
`printf` in the GPU `libc` and we can remove that code.
This reverts commit a92eaa3ebe.
Some globals were used for enforcing certain linking rules in the Intel
OpenMP implementation's MSVC compatibility layer and are not applicable
to the LLVM implementation (kmp_import.cpp has already been removed from
the build).
Differential Revision:https://reviews.llvm.org/D145837
The `stdint.h` header provides the standard types. Previously we used
`-nostdinc` and defined these ourselves. This patch switches to a
freestanding version which should work properly. Without
`-ffreestanding` the `stdint.h` header will include other libraries. But
in a freestanding environment it should work given the primitives.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D145963
This reverts commit 8cf85a0cad.
This is add back change of "Add map info for dereference pointer."
In addition turn off test run on amdgpu, since I don't know the way to
reprodue the problem.
This is to fix run time problem when use:
int **a;
map((*a)[:3]), (*a)[1] or map(**a).
current we skip generate map info for dereference pointer:
&(*a), &(*a)[0], 3*sizeof(int), TARGET_PARAM | TO | FROM
One way to fix runtime problem is to generate map info for dereference
pointer.
map((*a)[:3]):
&(*a), &(*a), sizeof(pointer), TARGET_PARAM | TO | FROM
&(*a), &(*a)[0], 3*sizeof(int), PTR_AND_OBJ | TO | FROM
map(**a):
&(*a), &(*a), sizeof(pointer), TARGET_PARAM | TO | FROM
&(*a), &(**a), sizeof(int), PTR_AND_OBJ | TO | FROM
The change in CGOpenMPRuntime.cpp add that.
The change in SemaOpenMP is to fix variable of dereference pointer to array
captured by reference. That is wrong. That cause run time to fail.
The rule is:
If variable is identified in a map clause it is always captured by
reference except if it is a pointer that is dereferenced somehow.
Differential Revision: https://reviews.llvm.org/D145093
The indeces of the dependent loops are properly ordered, just start from
1, so need just subtract 1 to get correct loop index.
Differential Revision: https://reviews.llvm.org/D145514
This reverts commit 9b9d08111b.
(Accepted by Jon https://reviews.llvm.org/D118493#4178250)
libc++, libc++abi, libunwind, and compiler-rt don't add the extra DT_RUNPATH,
it's strange for OpenMP to diverge.
Some build systems want to handle DT_RUNPATH themselves (e.g.
CMAKE_INSTALL_RPATH). Some distributions (e.g. Fedora) have policies against
DT_RUNPATH and the default DT_RUNPATH for OpenMP is causing trouble.
For users who don't want to specify rpath by themselves,
https://clang.llvm.org/docs/UsersManual.html#configuration-files
can be used to specify the default rpath, e.g.
specify -frtlib-add-rpath or -Wl,-rpath in bin/clang.cfg
The support for enabling and disabling certain architectures for the
OpenMP device RTL is different between AMD and Nvidia. This patch
updates the logic to make it common. This supports the `auto` format
more generally via the `nvptx-arch` and `amdgpu-arch` options. (These
are not availible at CMake time without a runtimes build, or another
install somewhere. But that only prevents users from using auto).
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D145513
Some build bots have not been updated to the new minimal CMake version.
Reverting for now and ping the buildbot owners.
This reverts commit 44c6b905f8.
This partly undoes D137724.
This change has been discussed on discourse
https://discourse.llvm.org/t/rfc-upgrading-llvms-minimum-required-cmake-version/66193
Note this does not remove work-arounds for older CMake versions, that
will be done in followup patches.
Reviewed By: mehdi_amini, MaskRay, ChuanqiXu, to268, thieta, tschuett, phosek, #libunwind, #libc_vendors, #libc, #libc_abi, sivachandra, philnik, zibi
Differential Revision: https://reviews.llvm.org/D144509
Only generate the second def file when necessary (native Windows import
library builds).
Properly clean up .def file artifacts.
Reduce the re-generated import library build artifacts to the minimum.
Refactor the import library related portions of the script for clarity.
Tested with MSVC and MinWG/gcc12.0
Differential Revision:https://reviews.llvm.org/D144419
Summary:
Tihs patch is mostly NFC to fix some warning currently present in OpenMP
offloading plugins. Specifically this mostly removes the use of Twine
variables in favor of LLVM's small string. Twine variables are prone to
use-after-free and this is a cleaner way to concatenate a string.
The next-gen plugins did not properly set the values from
`OMP_NUM_TEAMS` and `OMP_TEAMS_THREAD_LIMIT`. This is because these
maximum values are set by each plugin to its hardware maximum. This
happens *after* the previous initialization. Move it to the correct
place and then add a test.
Fixes https://github.com/llvm/llvm-project/issues/61082
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D145105
Makes the info that is printed for kernel launches configurable for
different plugins. Adds all machinery to print the detailed launch
info that the current AMD plugin provides and includes e.g. register
spill counts.
The files msgpack.cpp, msgpack.def, and msgpack.h are copied from the old plugin
and are untouched. The contents of UtilitiesHSA.cpp and .h are copied together from
various files from the old plugin. The code was originally written by
Jon Chesterfield. I updated the function and type names visible to the outside, i.e.
in headers, to respect the LLVM conventions.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D144521
The old code didn't actually align the values, and it added padding even
when none was necessary. This approach will pad entries if necessary
and, similar to the struct case, use the host pointer as guidance.
NOTE: This does still not align them as the host has, but it's unclear
if the user really should use the alignment bits anyway. For now
this is a reasonable compromise, only if we have host alignment
information (explicitly not implicitly via the host pointer), we
could do it completely right without wasting lots of resources for
>99% of the cases.
Fixes: https://github.com/llvm/llvm-project/issues/61034
ATOMIC_VAR_INIT has a trivial definition
`#define ATOMIC_VAR_INIT(value) (value)`,
is deprecated in C17/C++20, and will be removed in newer standards in
newer GCC/Clang (e.g. https://reviews.llvm.org/D144196).
Summary:
Ever since the change to the new plugins the information messages are
common between the major plugins. This allows us to test the info.c file
generically.
Summary:
Internally we need to know the feature that was used to build the CUDA.
This used to be added when the deviceRTL was build via the OpenMP
interface, but ever since it was moved to call the packager explicitly
it was not being added. This causes failured if the user attempts to use
the library without LTO enabled.
This fix runtime problem due to generate this[:1] map info for non member
variable.
To fix this check VD, if VD is not null, it is not member from current
or base classes.
Differential Revision: https://reviews.llvm.org/D144616
This interface function does not actually need the device image type.
It's unused in the function, so it should be able to be safely removed.
The motivation for this is to facilitate downsteam porting of the
amd-stg-open RPC module into the nextgen plugin so we can delete the old
plugin entirely. For that to work we need to be able to call this
function at kernel-launch time, which doesn't have the image. Also it's
cleaner.
Reviewed By: jplehr
Differential Revision: https://reviews.llvm.org/D144436
This patch should enable the "Host" allocation using fine-grained
memory. As far as I understand, this is HSA managed memory that is
availible to the host, but can be accessed by the device as well.
The original patch that introduced these extensions just stipulated that
it's "non-migratable" memory, which is most likely true because it's
managed by the host but accessible by the device. This should work
sufficiently well for what we expect the "host" allocation to do.
Depends on D143771
Reviewed By: kevinsala
Differential Revision: https://reviews.llvm.org/D143775
Currently, the AMDGPU plugin did not support the `TARGET_ALLOC_SHARED`
allocation kind. We used the fine-grained memory allocator for the
"host" alloc when this is most likely not what is intended. Fine-grained
memory can be accessed by all agents, so it should be considered shared.
This patch removes the use of fine-grained memory for the host
allocator. A later patch will add support for this via the
`hsa_amd_memory_lock` method.
Reviewed By: kevinsala
Differential Revision: https://reviews.llvm.org/D143771
~TaskAsyncInfoWrapperTy() calls isDone. With synchronize inside isDone, we need to handle the error return from synchronize in the destructor.
The consumers of TaskAsyncInfoWrapperTy, targetDataMapper and targetKernel, both call AsyncInfo.synchronize() before exiting.
For this reason in ~TaskAsyncInfoWrapperTy(), calling synchronize() via isDone() is redundant.
This patch removes synchronize() call inside isDone() and makes it a lightweight check.
__tgt_target_nowait_query needs to call synchronize() before checking isDone().
Differential Revision: https://reviews.llvm.org/D144315
Summary:
Currently when we synchronize the asynchronous queue for the plugins, we
ignore the return value. This is problematic because we will continue on
like nothing happened if the kernel fails.
Fixes https://github.com/llvm/llvm-project/issues/60814
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D144191
06d9bf5e64 (https://reviews.llvm.org/D143431)
did a large restructuring of how the import library is created;
previously, a second step to tweak the import library was only
done for MSVC style targets, but after this commit, that logic
was applied for mingw targets too.
Since LIBOMP_GENERATED_IMP_LIB_FILENAME and LIBOMP_IMP_LIB_FILE
are equal on mingw targets (both are "libomp.dll.a", while they
are "libomp.dll.lib" and "libomp.lib" for MSVC targets), this caused
a conflict, with errors like this:
ninja: error: build.ninja:875: multiple rules generate runtime/src/libomp.dll.a [-w dupbuild=err]
Skip the logic with a second step to recreate the import library
for mingw targets. The MSVC specific logic for this relies on
running the static archiver with CMAKE_LINK_DEF_FILE_FLAG, which
with MS lib.exe (and llvm-lib) ignore the input object files and
just generates an import library - but mingw style tools don't
support this mode of operation. (By attemptinig the same, mingw tools
would generate a static library with the def file as one member.)
With mingw tools, the same can be achieved by invoking the dlltool
executable instead.
Instead of adding alternative logic for invoking dlltool, just skip
the second import library step, since neither GNU nor LLVM mingw
tools actually generate import libraries that link by ordinal - so
there's no need for a second import library.
Differential Revision: https://reviews.llvm.org/D143992
Need to assign the calculated lower bound back to temp variable,
otherwise incorrect value (upper bound instead of lower bound) might be
used.
Differential Revision: https://reviews.llvm.org/D144015
Current runtime implementation only checks for target allocator when libmemkind is
not available. This patch adds checks for target allocator regardless of the
presence of libmemkind library.
Differential Revision: https://reviews.llvm.org/D142582
than ordinal
This check-in changes the OpenMP build script to generate the Windows
import library that imports by name rather than ordinal to reduce
ordinals order dependency and promote runtime flavors compatibility
going forward. The existing ordinals ordering is preserved to maintain
backward compatibility.
Differential Revision: https://reviews.llvm.org/D143431
The GPU plugins have a dependency on the device libraries. Sometimes we
cannot build the device libraries because the user does not have a valid
`clang` to use or it was explicitly disabled. Currently this leads to a
transitive failure because we cannot meet this dependency. This patch
simply removes that dependency.
Fixes https://github.com/llvm/llvm-project/issues/60457
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D143196
With the NPM, we're now defaulting to preserving LCSSA, so a couple
of tests have changed slightly.
Differential Revision: https://reviews.llvm.org/D140982
The th_task_state was initialized from the master thread's value, or
from its memo stack, but this causes problems because neither of those
may have the right value at the right time. However, other threads in
the team are guaranteed to have the right values, so we change the
initialize the new threads' th_task_state from the th_task_state of
the last of the older threads in the hot team.
Differential Revision: https://reviews.llvm.org/D142247Fix#56307.
The memory sanitizer intercepts the memcpy() call but not the direct
assignment of last byte to 0. This leads the sanitizer to believe the
last byte of a string based on the kmp_str_buf_t type is uninitialized.
Hence, the eventual strlen() inside __kmp_env_dump() leads to an
use-of-uninitialized-value warning.
Using strncat() instead gives the sanitizer the information it needs.
Differential Revision: https://reviews.llvm.org/D143401Fixes#60501
The memory sanitizer intercepts the memcpy() call but not the direct
assignment of last byte to 0. This leads the sanitizer to believe the
last byte of a string based on the kmp_str_buf_t type is uninitialized.
Hence, the eventual strlen() inside __kmp_env_dump() leads to an
use-of-uninitialized-value warning.
Using strncat() instead gives the sanitizer the information it needs.
Differential Revision: https://reviews.llvm.org/D143401Fixes#60501
The NextGen plugins use the information regarding new mapping/unmappings to
lock/unlock the corresponding host buffer and speed up the host-device memory
transfers involving those buffers. The locking/unlocking is disabled by default
and can be enabled by the LIBOMPTARGET_LOCK_MAPPED_HOST_BUFFERS envar. The
envar accepts boolean values (on/off) and a special option:
- off: Do not lock mapped host buffers (default).
- on: Lock mapped host buffers automatically, but do not report lock
failures if the plugin fails to lock them.
- mandatory: Lock mapped host buffers automatically and treat locking failures
in the plugins as fatal errors. This option may be useful for
debugging purposes.
Differential Revision: https://reviews.llvm.org/D142514
Summary:
The previous patch also needed to apply this to the other AMDGPU plugin,
this will be removed soon but it should be correct while it's here at
least.
Previously, on non-Linux, amdgpu would get enabled whatever the CPU architecture.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D143017
While we potentially need to align partially mapped structs more than
the first member, we do not need to align past the struct itself. This
prevents us from moving the base pointer past the struct beginning too.
See https://reviews.llvm.org/D142508 for a discussion.
Reviewed By: pavelkopyl, grokos, jhuber6
Differential Revision: https://reviews.llvm.org/D142586
`check_loc` is not used if ITT is disabled or debug is off, causing a
compiler warning.
Reviewed By: jlpeyton
Differential Revision: https://reviews.llvm.org/D143004
Summary:
We added a new agent information enum in a previous commit. This was not
added to the dynamic HSA implementation so it failed to compile without
a local HSA install to use.
The next-gen plugin properly prints errors. This patch improves the
error messages by including the Node-ID of the GPU that failed as well
as a textual representation of the enumeration values.
Reviewed By: kevinsala
Differential Revision: https://reviews.llvm.org/D143192
When `N` is 1024, `int result[N][N]` is obviously large stack that Windows cannot support...
Fix#60326.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142684