It's time to remove the old plugins as the next-gen has already been set
to default in LLVM 16.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D142820
These plugins are unmaintained and are not in a workable state. The VE
plugin has not been touched for years and has never had any running
tests. The remote plugin is in an unfinished state and is not production
ready upstream. These will need to be ported to the new nextgen
interface in the future if they are needed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D154548
AMDGPU provides a fixed frequency clock since some generations back.
However, the frequency is variable by card and must be looked up at
runtime. This patch adds a new device environment line for the clock
frequency so that we can use it in the same way as NVPTX. This is the
correct implementation and the version in ASO should be replaced.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D154456
Summary:
This code used `LIBOMPTARGET_DEBUG` which is not the macro name, but the
environment variable. This caused this portion to always be disabled. In
the long run we should aim for this to always be availible as it's
useful for other diagnostic message.
In the case of partially mapped structs, libomptarget sometimes adds
padding to device allocations to ensure they are aligned properly.
However, without this patch, it considers that padding to be mapped to
the host, which can cause presence checks (e.g.,
`omp_target_is_present` or a `present` modifier) to misbehave for
unmapped parts of the struct. This patch keeps the padding but treats
it as unmapped. See the new test case for examples.
Reviewed By: grokos, jdoerfert
Differential Revision: https://reviews.llvm.org/D149685
With https://reviews.llvm.org/D137524, memory scope and ordering
attributes are being used to generate the required instructions for
atomic inc/dec on AMDGPU. This patch adds the memory scope attribute to
the atomic::inc API and uses the device scope in reduction. Without
the device scope in atomic_inc, the default system scope leads to
unnecessary L2 write-backs/invalidates.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D154172
A previous patch by @arsenm adjusted these to find the `amdgpu-arch`
tool correctly if we do a `LLVM_ENABLE_PROJECTS` build. This patch
applies the same to `nvptx-arch` tool to keep it consistent.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D154107
Flang currently supports offloading for AMD GPUs. This patch establishes a test structure for Fortran offloading tests in libomptarget.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D148778
SubtargetFeature.h is currently part of MC while it doesn't depend on
anything in MC. Since some LLVM components might have the need to work
with target features without necessarily needing MC, it might be
worthwhile to move SubtargetFeature.h to a different location. This will
reduce the dependencies of said components.
Note that I choose TargetParser as the destination because that's where
Triple lives and SubtargetFeatures feels related to that.
This issues came up during a JITLink review (D149522). JITLink would
like to avoid a dependency on MC while still needing to store target
features.
Reviewed By: MaskRay, arsenm
Differential Revision: https://reviews.llvm.org/D150549
This patch implements the "__kmp_print_tdg_dot" function, that prints a task dependency graph into a dot file containing the tasks and their dependencies.
It is activated through a new environment variable "KMP_TDG_DOT"
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D150962
The default version of OpenMP is updated from 5.0 to 5.1 which means if -fopenmp is specified but -fopenmp-version is not specified with clang, the default version of OpenMP is taken to be 5.1. After modifying the Frontend for that, various LIT tests were updated. This patch contains all such changes. At a high level, these are the patterns of changes observed in LIT tests -
# RUN lines which mentioned `-fopenmp-version=50` need to kept only if the IR for version 5.0 and 5.1 are different. Otherwise only one RUN line with no version info(i.e. default version) needs to be there.
# Test cases of this sort already had the RUN lines with respect to the older default version 5.0 and the version 5.1. Only swapping the version specification flag `-fopenmp-version` from newer version RUN line to older version RUN line is required.
# Diagnostics: Remove the 5.0 version specific RUN lines if there was no difference in the Diagnostics messages with respect to the default 5.1.
# Diagnostics: In case there was any difference in diagnostics messages between 5.0 and 5.1, mention version specific messages in tests.
# If the test contained version specific ifdef's e.g. "#ifdef OMP5" but there were no RUN lines for any other version than 5.X, then bring the code guarded by ifdef's outside and remove the ifdef's.
# Some tests had RUN lines for both 5.0 and 5.1 versions, but it is found that the IR for 5.0 is not different from the 5.1, therefore such RUN lines are redundant. So, such duplicated lines are removed.
# To generate CHECK lines automatically, use the script llvm/utils/update_cc_test_checks.py
Reviewed By: saiislam, ABataev
Differential Revision: https://reviews.llvm.org/D129635
(cherry picked from commit 9dd2999907dc791136a75238a6000f69bf67cf4e)
This patch fixes the issue that, if we have a compile-time serialized parallel
region (such as `if (0)`) with `num_threads`, followed by a regular parallel
region, the regular parallel region will pick up the value set in the serialized
parallel region incorrectly. The reason is, in the front end, if we can prove a
parallel region has to serialized, instead of emitting `__kmpc_fork_call`, the
front end directly emits `__kmpc_serialized_parallel`, body, and `__kmpc_end_serialized_parallel`.
However, this "optimization" doesn't consider the case where `num_threads` is
used such that `__kmpc_push_num_threads` is still emitted. Since we don't reset
the value in `__kmpc_serialized_parallel`, it will affect the next parallel region
followed by it.
Fix#63197.
Reviewed By: tlwilmar
Differential Revision: https://reviews.llvm.org/D152883
After D151324, which landed as 349c0aacb3, many libomptarget non-LTO
nvptx64 tests fail with errors like:
```
clang: error: bitcode library '/tmp/llvm-project/build/runtimes/runtimes-bins/openmp/libomptarget/libomptarget-nvptx-sm_70.bc' does not exist
```
This patch updates the bc path.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D152462
The logic and implementation follows the removal of no-op barriers. If
the fence is not making updates visible, either to the world or the
current thread, it is not needed. Said differently, the fences we remove
do not establish synchronization (happens-before) edges.
This allows us to eliminate some of the regression caused by:
https://reviews.llvm.org/D145290
If a combined loop has insufficient parallelism (= low trip count), we
might end up with too few teams/blocks. To counter that we can reduce
the number of threads per team we use. This patch implements a heuristic
and exposes a new environment variable to control the minimum of threads
to be employed in this case.
Issue reported by:
Felipe Cabarcas Jaramillo <cabarcas@udel.edu> (@fel-cab).
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D152014
Without this patch, the following example crashes Clang:
```
#pragma omp target map(i)
#pragma omp tile sizes(2)
for (i = 0; i < N; ++i)
;
```
This patch fixes the crash by changing `Sema::isOpenMPPrivateDecl` not
to identify `i` as private just because it's the loop variable of a
`tile` construct.
While OpenMP TR11 and earlier do specify privacy for loop variables of
loops *generated* from a `tile` construct, I haven't found text
stating that the original loop variable must be private in the above
example, so this patch leaves it shared. Even so, it is a bit
unexpected that value of `i` after the loop is `N - 1` instead of `N`.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D151356
The OpenMP DeviceRTL uses a hacky workaround to keep certain runtime
calls alive. This used a function that prevented them from being
optimized out. We needed this hack because the 'OpenMPOpt' pass likes to
introduce new runtime calls into the TU. This then interacted badly with
the method of linking the bitcode file per-TU like we do with Nvidia.
The OpenMPOpt pass would then generate a runtime call to a function that
was never linked in.
This should not be a problem anymore because we unconditionally link in
the `libomptarget.devicertl.a` runtime library. This should thus only
extract symbols that are undefined. So, if we do end up with an
unresolved reference it will be resolved by the static library.
The downside to this is that if we are doing non-LTO NVPTX compilation
that introduces one of these calls it will be linked outside the module
and therefore provide the overhead of an external function call.
However, removing this flag should make optimizing things easier. We
will need to see if that performance is a problem.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D151324
This reverts commit d763c6e5e2.
Adds the patch by @hans from
https://github.com/llvm/llvm-project/issues/62719
This patch fixes the Windows build.
d763c6e5e2 reverted the reviews
D144509 [CMake] Bumps minimum version to 3.20.0.
This partly undoes D137724.
This change has been discussed on discourse
https://discourse.llvm.org/t/rfc-upgrading-llvms-minimum-required-cmake-version/66193
Note this does not remove work-arounds for older CMake versions, that
will be done in followup patches.
D150532 [OpenMP] Compile assembly files as ASM, not C
Since CMake 3.20, CMake explicitly passes "-x c" (or equivalent)
when compiling a file which has been set as having the language
C. This behaviour change only takes place if "cmake_minimum_required"
is set to 3.20 or newer, or if the policy CMP0119 is set to new.
Attempting to compile assembly files with "-x c" fails, however
this is workarounded in many cases, as OpenMP overrides this with
"-x assembler-with-cpp", however this is only added for non-Windows
targets.
Thus, after increasing cmake_minimum_required to 3.20, this breaks
compiling the GNU assembly for Windows targets; the GNU assembly is
used for ARM and AArch64 Windows targets when building with Clang.
This patch unbreaks that.
D150688 [cmake] Set CMP0091 to fix Windows builds after the cmake_minimum_required bump
The build uses other mechanism to select the runtime.
Fixes#62719
Reviewed By: #libc, Mordante
Differential Revision: https://reviews.llvm.org/D151344
It seems load of traits.addr should be passed in runtime call. Currently
the load of load traits.addr gets passed cause runtime to fail.
To fix this, skip the call to EmitLoadOfScalar for extra load.
Differential Revision: https://reviews.llvm.org/D151576
This is an ongoing series of commits that are reformatting our
Python code. This catches the last of the python files to
reformat. Since they where so few I bunched them together.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: jhenderson, #libc, Mordante, sivachandra
Differential Revision: https://reviews.llvm.org/D150784
The interop API routines try to invoke external entries, but we did
not have support for KMP_DLSYM_NEXT on Windows. Also added proper
guards for STUB build.
Differential Revision: https://reviews.llvm.org/D149892
These files aren't fully formatted. I'm guessing this was a holdover
from when `clang-format` was totally broken for OpenMP offloading.
Format the files to be more consistent.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D151226
This patch properly marks the support level for libomp test when testing with
GCC.
Some new OpenMP features were only introduced with GCC 11.
Tests using the target construct are incompatibe with GCC.
Tests pass now with GCC 10, 11, 12
Codegen for some OpenMP directives is different from clang, so some
OMPT tests fail. As we don't expect GCC codegen to change significantly,
we mark the tests as unsupported for GCC.
OMPT Tests pass now with GCC 10, 11, 12
TSan in GCC filters duplicates less aggressively. With 8 threads we can
expect reports for up to 7 pairs of data race in some tests.
Tests pass now with GCC 10, 11, 12
On some platforms, std::abs may inadvertently pull in a math library.
This patch replaces its use in the new loop collapse code with
a no thrills in-situ implementation.
Differential Revision: https://reviews.llvm.org/D150882
The OpenMP target JIT needs testing, so does the IR we actually generate
for the device. This is the initial commit with variations of an "empty
OpenMP kernel" that should all result in a empty IR kernel.
Differential revision: https://reviews.llvm.org/D150623
MacOS build of LLVM with OpenMP enabled fails with an error
that it doesn't know what std::abs is. Fix by including <cmath>
so that the relevant function declaration is included.
No functional change intended.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D150687