Summary:
[nfc][libomptarget] Inline option into target_impl
Subset of D69423. The macros that were in option.h are all target dependent.
Inlining the header simplifies the dependency graph when looking to move code
into a common subdir.
Reviewers: ABataev, jdoerfert, grokos
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D69472
Summary:
[NFC][libomptarget] move remaining device specific code out of omptarget-nvptx.h
Strictly there is one remaining difference wrt amdgcn - parallelLevel is
volatile qualified on amdgcn and not on nvptx. Determining whether this is
correct - and how to represent the different semantics of 'volatile' under
various conditions - is beyond the scope of this code motion patch.
Reviewers: ABataev, jdoerfert, grokos
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D69424
Details:
- nconflicts field initialized;
- formatting fix (moved declaration out of the long line);
- count conflicts in new hash as opposed to old one.
Differential Revision: https://reviews.llvm.org/D68036
Summary:
[libomptarget][nfc] Make interface.h target independent
Move interface.h under a top level include directory.
Remove #includes to avoid the interface depending on the implementation.
Reviewers: ABataev, jdoerfert, grokos, ronlieb, RaviNarayanaswamy
Reviewed By: jdoerfert
Subscribers: mgorny, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D68615
llvm-svn: 374919
Summary:
[libomptarget][nfc] Update remaining uint32 to use lanemask_t
Update a few functions in the API to use lanemask_t instead of i32. NFC for
nvptx. Also update the ActiveThreads type in DataSharingStateTy.
This removes a lot of #ifdef from the downsteam amdgcn implementation.
Reviewers: ABataev, jdoerfert, grokos, ronlieb, RaviNarayanaswamy
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D68513
llvm-svn: 373806
/proc unless Linux layer compatibility is activated for CentOS is activated is not present
thus relying on a more native for checking the address.
Reviewers: Hahnfeld, kongyl, jdoerfert, jlpeyton, AndreyChurbanov, emaster, dim
Reviewed By: Hahnfeld
Differential Revision: https://reviews.llvm.org/D67326
llvm-svn: 373152
Linker automatically provides __start_<section name> and __stop_<section name> symbols to satisfy unresolved references if <section name> is representable as a C identifier (see https://sourceware.org/binutils/docs/ld/Input-Section-Example.html for details). These symbols indicate the start address and end address of the output section respectively. Therefore, renaming OpenMP offload entries section name from ".omp.offloading_entries" to "omp_offloading_entries" to use this feature.
This is the first part of the patch for eliminating OpenMP linker script (please see https://reviews.llvm.org/D64943).
Differential Revision: https://reviews.llvm.org/D68070
llvm-svn: 373118
There's no need to initialize variables with static storage duration
because they're implicitly initialized to zero. See
https://en.cppreference.com/w/c/language/initialization#Implicit_initialization
I think that's already relied upon because the supplied 0 only sets
'kmp_time_global_t g_time;' in 'struct kmp_base_global'. The other fields
are not set in the code, but implicitly initialized by the compiler.
Differential Revision: https://reviews.llvm.org/D66292
llvm-svn: 370943
Summary:
In non-SPMD mode we may end up with the divergent threads when trying to
increment/decrement parallel level counter. It may lead to incorrect
calculations of the parallel level and wrong results when threads are
divergent. We need to reconverge the threads before trying to modify the
parallel level counter.
Reviewers: grokos, jdoerfert
Subscribers: guansong, openmp-commits, caomhin, kkwli0
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66802
llvm-svn: 370803
Summary:
Use target_impl functions to replace more inline asm
Follow on from D65836. Removes remaining asm shuffles and lanemask accessors
Also changes the types of target_impl bitwise functions to unsigned.
Reviewers: jdoerfert, ABataev, grokos, Hahnfeld, gregrodgers, ronlieb, hfinkel
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66809
llvm-svn: 370216
Summary:
[libomptarget] Refactor syncthreads macro to inline function
See also abandoned D66846, split into this diff and others.
Rev 2 of D66855
Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66861
llvm-svn: 370210
Summary:
[libomptarget] Refactor syncwarp macro to inline function
See also abandoned D66846, split into this diff and others.
Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66857
llvm-svn: 370149
Summary:
[libomptarget] Refactor shfl_down_sync macro to inline function
See also abandoned D66846, split into this diff and others.
Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66853
llvm-svn: 370146
Summary:
[libomptarget] Refactor shfl_sync macro to inline function
See also abandoned D66846, split into this diff and others.
Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66852
llvm-svn: 370144
Summary:
Added function void __kmpc_syncwarp(int32_t) to expose it to the
compiler. It is required to fix the problem with the critical regions in
Cuda9.0+. We cannot use barrier in the critical region, but still need
to reconverge the threads in the warp after. This function allows to do
this.
Reviewers: grokos, jdoerfert
Subscribers: guansong, openmp-commits, kkwli0, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66672
llvm-svn: 369933
Summary:
In Cuda 9.0 it is not guaranteed that threads in the warps are
convergent. We need to use __syncwarp() function to reconverge
the threads and to guarantee the memory ordering among threads in the
warps.
This is the first patch to fix the problem with the test
libomptarget/deviceRTLs/nvptx/src/sync.cu on Cuda9+.
This patch just replaces calls to __shfl_sync() function with the call
of __syncwarp() function where we need to reconverge the threads when we
try to modify the value of the parallel level counter.
Reviewers: grokos
Subscribers: guansong, jfb, jdoerfert, caomhin, kkwli0, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D65013
llvm-svn: 369796
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=42906, via adding
adjustment of number of threads on enter to the teams construct on host
according to user settings. This allows to pass checks and avoid assertions
at time of team of threads creation.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D66351
llvm-svn: 369430
Fix last warned location in ittnotify_static.cpp using the defined
macro KMP_FALLTHROUGH().
Differential Revision: https://reviews.llvm.org/D65871
llvm-svn: 369003
The variables in kmp_lock.cpp are really arrays of function pointers
that return void or int, not pointers to functions that return void*
or int*. The other changes are only cosmetic.
Differential Revision: https://reviews.llvm.org/D65870
llvm-svn: 369002
The implementation status can only be one of
ompt_event_UNIMPLEMENTED = ompt_set_never = 1
ompt_event_MAY_ALWAYS = ompt_set_always = 5
In both cases, the condition was already true, so just remove
the check.
Differential Revision: https://reviews.llvm.org/D65869
llvm-svn: 369001
Instead, maintain a list of disabled options to still build libomp and
libomptarget without warnings. This includes -Wno-error and -Wno-pedantic
to silence warnings that LLVM enables when building in-tree.
I tested the following compilers:
* Clang 6.0, 7.0, 8.0
* GCC 4.8.5 (CentOS 7), GCC 6, 7, 8, 9
* Intel Compiler 16, 17, 18, 19
RFC thread on openmp-dev mailing list:
http://lists.llvm.org/pipermail/openmp-dev/2019-August/002668.html
Differential Revision: https://reviews.llvm.org/D65867
llvm-svn: 368999
Summary:
[libomptarget] Factor architecture dependent code out of loop.cu
Related to the patch series starting D64217. Added subscribers to said series as reviewers. This effort is smaller in scope.
This patch factors out just enough architecture dependent code from loop.cu to allow the same source to be used with amdgcn, given a different target_impl.h. Testing is that the same bitcode (modulo variable names) is generated for libomptarget before and after the refactor, for nvptx and the out of tree amdgcn.
Reviewers: jdoerfert, ABataev, bollu, jfb, tra, grokos, Hahnfeld, guansong, xtian, gregrodgers, ronlieb, hfinkel, gtbercea, guraypp, arpith-jacob
Reviewed By: jdoerfert, ABataev
Subscribers: dexonsmith, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D65836
llvm-svn: 368751
This patch fixes problem raised in post-review comments of the
https://reviews.llvm.org/D65285. Developers of ittnotify confirmed
that dll_path_ptr field of the __itt_global structure is never used
by ittnotify library, so it is safe to remove the dll_path array.
Differential Revision: https://reviews.llvm.org/D65885
llvm-svn: 368559
Summary:
This patch adds support for the close map modifier.
The close map modifier will overwrite the unified shared memory requirement and create a device copy of the data.
Reviewers: ABataev, Hahnfeld, caomhin, grokos, jdoerfert, AlexEichenberger
Reviewed By: Hahnfeld, AlexEichenberger
Subscribers: guansong, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D65340
llvm-svn: 368488
We have one global RTLs.RequiresFlags, I don't see a need to make a
copy per device that the runtime manages. This was problematic anyway
because the copy happened during the first __tgt_register_lib(). This
made it impossible to call __tgt_register_requires() from normal user
funtions for testing.
Hence, this change also fixes unified_shared_memory/shared_update.c for
older versions of Clang that don't call __tgt_register_requires() before
__tgt_register_lib().
Differential Revision: https://reviews.llvm.org/D66019
llvm-svn: 368465
Summary:
This patch adds support for using unified memory in the case of regular maps that happen when a target region is offloaded to the device.
For cases where only a single version of the data is required then the host address can be used. When variables need to be privatized in any way or globalized, then the copy to the device is still required for correctness.
Reviewers: ABataev, jdoerfert, Hahnfeld, AlexEichenberger, caomhin, grokos
Reviewed By: Hahnfeld
Subscribers: mgorny, guansong, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D65001
llvm-svn: 368192
Summary:
[libomptarget] Use forceinline. Necessary for nvcc to inline small functions within the bitcode library
Suggested in D65836
Reviewers: ABataev, jdoerfert, grokos, gregrodgers
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D65876
llvm-svn: 368177
New OMPT tests with teams construct should be disabled for GCC as it
emits code with a GOMP entry not supported in the LLVM runtime.
Differential Revision: https://reviews.llvm.org/D65757
llvm-svn: 367939
Ensures that CUDA fail reasons (such as "No CUDA-capable device detected")
are printed together with libomptarget's debug message
(e.g. "Error when setting CUDA context"). Previously, the former was
printed only in CMAKE_BUILD_TYPE=Debug builds while the latter was
enabled by LIBOMPTARGET_ENABLE_DEBUG.
With this change, also only call cuGetErrorString when the error will be
printed.
Suggested-by: Ye Luo <xw111luoye@gmail.com>
Differential Revision: https://reviews.llvm.org/D65687
llvm-svn: 367910
This patch implements the libomptarget runtime interface for OpenMP 5.0
declare mapper functions. The declare mapper functions generated by
Clang will call them to complete the mapping of members.
kmpc_mapper_num_components gets the current number of components for a
user-defined mapper; kmpc_push_mapper_component pushes back one
component for a user-defined mapper.
The design slides can be found at
https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx
Patch by Lingda Li <lildmh@gmail.com>
Differential Revision: https://reviews.llvm.org/D60972
llvm-svn: 367772
All other files are already C++ and the build system has always
passed '-x c++' for C files, effectively compiling them as C++.
To stay warning free we need one fix in ittnotify_static.{c,cpp}:
The variable dll_path can be written to, so it must not be const.
GCC complained with -Wcast-qual and I think it's right.
Differential Revision: https://reviews.llvm.org/D65285
llvm-svn: 367343
Round the stack size to a multiple of the page size. Older versions of
Android (until KitKat) would fail pthread_attr_setstacksize with
EINVAL if the stack size was not a multiple of the page size.
Patch by Dan Albert <danalbert@google.com>.
Test: Build, copied into the NDK, passed openmp test on ICS.
Bug: https://github.com/android-ndk/ndk/issues/9
llvm-svn: 367070
Both Clang and GCC complained that they cannot initialize a return
object of type 'kmp_proc_bind_t' with an 'int'. While at it, also
fix a warning about missing parentheses thrown by Clang.
Differential Revision: https://reviews.llvm.org/D65284
llvm-svn: 367041
Summary:
According to the OpenMP standard, barrier operation must perform
implicit flush operation. Currently, if there is only one thread in the
team, barrier does not flush the memory. Patch fixes this problem.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D62398
llvm-svn: 367024
This is a port of libomp for the RISC-V 64-bit Linux target.
We have tested this port on a HiFive Unleashed development board
using a downstream LLVM that has support for the missing bits in
upstream. As of now, all tests are passing, including OMPT.
Patch by Ferran Pallarès!
Differential Revision: https://reviews.llvm.org/D59880
llvm-svn: 367021
If the first target region in a program calls the push_tripcount
function, libomptarget didn't handle the offload policy correctly.
This could lead to unexpected error messages as seen in
http://lists.llvm.org/pipermail/openmp-dev/2019-June/002561.html
To solve this, add a check calling IsOffloadDisabled() as all other
entry points already do. If this method returns false, libomptarget
is effectively disabled.
Differential Revision: https://reviews.llvm.org/D64626
llvm-svn: 366810
This is done at call-site and does not need to be handled in
__kmp_invoke_microtask. It was already absent from the x86
and x86_64 assembly, this patch removes it from the generic
implementation in z_Linux_util.cpp and adds documentation for
AArch64 and PPC64 that it's actually not needed. I can't test
on these architectures, so I don't want to change the code just
because it looks right :)
While at it, rename some variables for consistency and add a
check in test/ompt/parallel/normal.c that the pointer was reset
before entering the barrier.
Differential Revision: https://reviews.llvm.org/D64442
llvm-svn: 366721
Remove loopTripCnt from threaded device stack after consuming it.
Added a libomptarget DP message to aid in future debugging and to
validate the added testcase, which only runs in Debug build.
Differential Revision: https://reviews.llvm.org/D64808
llvm-svn: 366349
This is a follow up patch to D64534 (r365963) which removed all OMP
spec versioning within the OpenMP runtime codebase. This patch removes
REQUIRES: openmp-x.y lines from lit tests.
llvm-svn: 366341
This leads to problems when compiling C++ code with libc++ for Nvidia GPUs
because Clang now uses wrappers for math functions that might include
C++ templates not allowed in 'extern "C"'.
Differentiel Revision: https://reviews.llvm.org/D64625
llvm-svn: 366229
Summary:
We used CUDART_VERSION macro to check for the installed cuda version
but this macro is defined in cuda_runtime_api.h, which is not used by
project. Better to use CUDA_VERSION macro, which is defined in cuda.h.
Also, added the check if this macro is defined. If macro is undefined,
there is something wrong with the cuda configuration and we should not
continue the compilation.
This also fixes problems with runtime building in cuda 10+.
Reviewers: grokos
Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D64648
llvm-svn: 366224
Summary:
We used to call __kmpc_omp_taskwait function with global threadid set to
0. It may crash the application at the runtime if the thread executing
target region is not a master thread.
Reviewers: grokos, kkwli0
Subscribers: guansong, jdoerfert, caomhin, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D64571
llvm-svn: 366220
Remove all older OMP spec versioning from the runtime and build system.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D64534
llvm-svn: 365963
These entry points are never called by Clang trunk nor clang-ykt. If
XL doesn't use them either, they can finally go away.
Differential Revision: https://reviews.llvm.org/D52700
llvm-svn: 365817
Summary:
__kmpc_push_tripcount function is not thread safe and may lead to data
race when the target regions are executed in parallel threads. The patch
makes loopTripCnt counter thread aware and stores the tripcount value
per thread in the map. Access to map is guarded by mutex to prevent
data race in the map itself.
Test is for NVPTX target because it does not work correctly on the
host. Seems to me, there is a problem in libomp with target regions in
the parallel threads.
Reviewers: grokos
Subscribers: guansong, jfb, jdoerfert, openmp-commits, kkwli0, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D64080
llvm-svn: 365332
Summary:
According to the OpenMP standard, flush makes a thread’s temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied.
According to the Cuda toolkit documentation (https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html#memory-fence-functions), __threadfence() functions provides required functionality.
__threadfence_system() also provides required functionality, but it also
includes some extra functionality, like synchronization of page-locked
host memory, synchronization for the host, etc. It is not required per
the standard and we can use more relaxed version of memory fence
operation.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D62397
llvm-svn: 364572
Bug reported in https://bugs.llvm.org/show_bug.cgi?id=42269.
Freeing of the contention group (CG) stucture by master thread looks wrong,
because workers can leave the CG later on. Intead the freeing
is now done by the last thread leaving the CG.
Differential Revision: https://reviews.llvm.org/D63599
llvm-svn: 364456
Summary:
This patch adds support for handling variables under the:
```
#pragma omp declare target to()
```
clause when the
```
#pragma omp requires unified_shared_memory
```
is used.
The address of the host variable is copied into the device pointer just like for the declare target link case.
Reviewers: ABataev, caomhin, grokos, AlexEichenberger
Reviewed By: grokos
Subscribers: jcownie, guansong, jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D63106
llvm-svn: 363825
Summary:
The problems with __syncthreads() were fixed in clang >= 9.0 and the
original __syncthreads() can be used instead of the ptx instruction.
Reviewers: grokos
Subscribers: guansong, jdoerfert, openmp-commits, kkwli0, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D63515
llvm-svn: 363807
Summary: This patch enables the usage of a host variable on the device for declare target link variables when unified memory is available.
Reviewers: ABataev, caomhin, grokos
Reviewed By: grokos
Subscribers: Hahnfeld, guansong, jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60884
llvm-svn: 362505
Made type of depth of hwloc object to correapond with
change from unsigned in hwloc 1,x to int in hwloc 2.x.
This eliminates the warning on signed-unsigned comparison.
Differential Revision: https://reviews.llvm.org/D62332
llvm-svn: 362401
Current parsing allows trailing string after the permitted value,
MANDATORY|DISABLED|DEFAULT -- e.g., "mandatorynot" is also recognized
as "MANDATORY". Such cases should be recognized as incorrect/unknown
value.
Differential Revision: https://reviews.llvm.org/D62431
llvm-svn: 362125
This change adds checks before dereferencing a pointer returned from a
function.
Differential Revision: https://reviews.llvm.org/D62224
llvm-svn: 362111
The omp_taskloop_num_tasks and omp_taskwait have deadlooped
on the NetBSD buildbot previously, practically hanging the host running
it. Disable them until we can find a good solution, or make the kernel
less fragile.
llvm-svn: 361825
Summary:
Parallel level counter should be volatile to prevent some dangerous
optimiations by the ptxas. Otherwise, ptxas optimizations lead to
undefined behaviour in some cases.
Also, use __threadfence() for #pragma omp flush and if the barrier
should not be used (we have only one thread in the team), still perform
flush operation since the standard requires implicit flush when
executing barriers.
Reviewers: gtbercea, kkwli0, grokos
Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D62199
llvm-svn: 361421
This change adds implementation to ompt_finalize_tool() and
ompt_get_task_memory().
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D61657
llvm-svn: 361309
Summary:
Target link variables are currently implemented by creating a copy of the variables on the device side and unified memory never gets exploited.
When the prgram uses the:
```
#pragma omp requires unified_shared_memory
```
directive in conjunction with a declare target link, the linked variable is no longer allocated on the device and the host version is used instead.
This behavior is overridden by performing an explicit mapping.
A Clang side patch is required.
Reviewers: ABataev, AlexEichenberger, grokos, Hahnfeld
Reviewed By: AlexEichenberger, grokos, Hahnfeld
Subscribers: Hahnfeld, jfb, guansong, jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60223
llvm-svn: 361294
https://reviews.llvm.org/D58454 did not fix the problem for a typical use
case of building LLVM with gcc or icc and then testing with the newly built
clang compiler.
The compilers do not agree on how to extend a 32-bit pointer to uint64, so
make the pointer unsigned first, before adjusting the size.
Patch by Joachim Protze
Differential Revision: https://reviews.llvm.org/D58506
llvm-svn: 361158
OpenMP 5.0 says that the callback for the events initial-task-begin and
initial-task-end has to be ompt_callback_implicit_task.
Patch by Tim Cramer
Differential Revision: https://reviews.llvm.org/D58776
llvm-svn: 361157
Added synchronization for possible concurrent initialization of mutexes
by multiple threads. The need of synchronization caused by commit r357927
which added the use of mutexes at threads movement to/from common pool
(earlier the mutexes were used only at suspend/resume).
Patch by Johnny Peyton.
Differential Revision: https://reviews.llvm.org/D61995
llvm-svn: 360919
Currently cores within package that share the same L2 cache are grouped together.
The current logic behind this assumes that the L2 cache is always at deeper
(or the same) level than the package itself. In case when L2 cache is common
for all packages (and the packages are at deeper level than L2 cache) the whole of
the further topology discovery fails to find any computational units resulting in
following assertion:
Assertion failure at kmp_affinity.cpp(715): nActiveThreads == __kmp_avail_proc.
OMP: Error #13: Assertion failure at kmp_affinity.cpp(715).
This patch adds a bit of a logic that prevents such situation from occurring.
Differential Revision: https://reviews.llvm.org/D61796
llvm-svn: 360890
Removed unconditional and unsafe decrement of counter
of active threads in pool at shutdown time.
Differential Revision: https://reviews.llvm.org/D61944
llvm-svn: 360784
The implementation should be done by compiler, user can only declare
objects of this type and use them in OpenMP directives.
Differential Revision: https://reviews.llvm.org/D61860
llvm-svn: 360774
The code is currently using the ambiguous instruction
"sub sp, sp, w9, lsl #4". The ARM reference manual says this isn't
valid, and it's not clear whether it's supposed to mean uxtw or uxtx.
It doesn't matter which instruction we use here, since the high
bits of the operand are zero anyway, so I arbitrarily choose uxtw, to
preserve the register name.
See https://reviews.llvm.org/D60840 for the LLVM patch.
Differential Revision: https://reviews.llvm.org/D61770
llvm-svn: 360711
Changed file extension of the destination of the copy of libomp.lib
(it was mistakely .dll, now it is .lib) in installation on Windows.
Differential Revision: https://reviews.llvm.org/D61673
llvm-svn: 360595
Summary:
Patch improves performance of the full runtime mode by moving
threads limit counter to the shared memory. It also allows to save
global memory.
Reviewers: grokos, kkwli0, gtbercea
Subscribers: guansong, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61801
llvm-svn: 360584
Summary:
Patch improves performance of the full runtime mode by moving
number-of-threads counter to the shared memory. It also allows to save
global memory.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61785
llvm-svn: 360457
This patch provides workaround to allow gfortran to compile the
OpenMP Fortran modules.
From the gfortran manual:
https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gfortran/BOZ-literal-constants.html
"Note that initializing an INTEGER variable with a statement such as
DATA i/Z'FFFFFFFF'/ will give an integer overflow error rather than the desired
result of -1 when i is a 32-bit integer on a system that supports 64-bit
integers. The -fno-range-check option can be used as a workaround for legacy
code that initializes integers in this manner."
Bug filed: https://bugs.llvm.org/show_bug.cgi?id=41755
Differential Revision: https://reviews.llvm.org/D61603
llvm-svn: 360299
Summary:
To be able to successfully build OpenMP on FreeBSD/i386, which still
uses i486 as its default processor, I had to provide wrappers for the
`__kmp_load_mxcsr` and `__kmp_store_mxcsr` functions.
If the compiler signals that SSE is not available, loading and storing
mxcsr does not make sense anway, so in that case the inline functions
are empty. This gives the minimum amount of code churn.
See also https://svnweb.freebsd.org/changeset/base/345283
Reviewers: emaste, jlpeyton, Hahnfeld
Reviewed By: jlpeyton
Subscribers: hfinkel, krytarowski, jdoerfert, openmp-commits, llvm-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60916
llvm-svn: 360062
Summary:
Patch improves performance of the full runtime mode by moving
thread-limit counter to the shared memory. It also allows to save
global memory.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jdoerfert, caomhin, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61526
llvm-svn: 359922
Summary:
Used parallelLevel[] counter to simplify and improve implementation of
the existing standard OpenMP functions. Functions are tested already in
several tests, the patch is NFC.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jdoerfert, caomhin, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61459
llvm-svn: 359892
Summary:
Previously for the different purposes we need to get the active/common
parallel level and with full runtime we iterated over all the records to
calculate this level. Instead, we can used the warp-based parallel level
counters used in no-runtime mode.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, jdoerfert, caomhin, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61395
llvm-svn: 359822
Summary:
Function omp_get_thread_limit() in SPMD mode can return the maximum
available number of threads as a result.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D61378
llvm-svn: 359790
Summary:
To be able to successfully build OpenMP on 32-bit FreeBSD, such as
FreeBSD/i386, I first had to provide a few wrappers (see D60916), and
then add `KMP_OS_FREEBSD` to the list of defines checked for 32-bit
architectures in `kmp_runtime.cpp`.
I have successfully built libomp.so and ran a bunch of test programs on
FreeBSD/i386 with this.
See also https://svnweb.freebsd.org/changeset/base/345283
Reviewers: emaste, jlpeyton, Hahnfeld
Reviewed By: jlpeyton
Subscribers: krytarowski, guansong, jdoerfert, openmp-commits, llvm-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60917
llvm-svn: 359716
Implemented task modifier in two versions - one without taking into account
omp_orig variable (the omp_orig still can be processed by compiler without help
of the library, but each reduction object will need separate initializer with
global access to omp_orig), another with omp_orig variable included into
interface (single initializer can be used for multiple reduction objects of
the same type). Second version can be used when the omp_orig is not globally
accessible, or to optimize code in case of multiple reduction objects
of the same type.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D60976
llvm-svn: 359710
This patch adds:
* New omp_sched_monotonic flag to omp_sched_t which is handled within the runtime
* Parsing of monotonic/nonmonotonic in OMP_SCHEDULE
* Tests for the monotonic flag and envirable parsing
* Logic to force monotonic when hierarchical scheduling is used
Differential Revision: https://reviews.llvm.org/D60979
llvm-svn: 359601
Summary:
The parallelLevel counter must be on per-thread basis to fully support
L2+ parallelism, otherwise we may end up with undefined behavior.
Introduce the parallelLevel on per-warp basis using shared memory. It
allows to avoid the problems with the synchronization and allows fully
support L2+ parallelism in SPMD mode with no runtime.
Reviewers: gtbercea, grokos
Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60918
llvm-svn: 359341
Summary:
I ran into some issues after rOMP355687, where __atomic_fetch_add was
being used incorrectly on x86, and this turns out to be caused by the
following added conditionals:
```
#if defined(KMP_ARCH_MIPS)
```
The problem is, these macros are always defined, and are either 0 or 1
depending on the architecture. E.g. the correct way to test for MIPS
is:
```
#if KMP_ARCH_MIPS
```
Reviewers: petarj, jlpeyton, Hahnfeld, AndreyChurbanov
Reviewed By: petarj, AndreyChurbanov
Subscribers: AndreyChurbanov, sdardis, arichardson, atanasyan, jfb, jdoerfert, openmp-commits, llvm-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60938
llvm-svn: 358911
Fix the test to run it really in SPMD mode without runtime. Previously
it was run in SPMD + full runtime mode and does not allow to cehck the
functionality correctly.
llvm-svn: 358902
Summary:
If the kernel is executed in SPMD mode and the L2+ parallel for region
with the dynamic scheduling is executed, dynamic scheduling functions
are called. They expect full runtime support, but SPMD kernels may be
executed without the full runtime. It leads to the runtime crash of the
compiled program. Patch fixes this problem + fixes handling of the
parallelism level in SPMD mode, which is required as part of this patch.
Reviewers: gtbercea, kkwli0, grokos
Subscribers: guansong, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60578
llvm-svn: 358442
This change replaces some of the assembly functions in z_Linux_asm.S
for inline asm in kmp.h. This allows better interaction with compiler
tools and sanitizers.
Differential Revision: https://reviews.llvm.org/D60423
llvm-svn: 358438
* Replace HBWMALLOC API with more general MEMKIND API, new functions
and variables added.
* Have libmemkind.so loaded when accessible.
* Redirect memspaces to default one except for high bandwidth which
is processed separately.
* Ignore some allocator traits e.g., sync_hint, access, pinned, while
others are processed normally e.g., alignment, pool_size, fallback,
fb_data, partition.
* Add tests for memory management
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D59783
llvm-svn: 357929
This patch cleans up the bookkeeping code for the load balancing dynamic mode.
When a thread is moved to or from the thread pool, the th_active_in_pool flag
and the __kmp_thread_pool_active_nth global counter are both updated. This
removes the need for the corrective code in the main wait loop. Another global
counter, __kmp_thread_pool_nth, was removed completely, as it was only used for
debugging, but was not under KMP_DEBUG.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D59508
llvm-svn: 357927
Debug dump on large machine shows when many OpenMP threads (401 in total)
sleep on a barrier, one of the innermost nesting levels sleeps
on a child's b_arrived flag whose value is equal to 4 and is equal to
checker value. i.e., (1) sleep bit is 0, and (2) done_check() would
return true if called.
It is unclear how this might happen. It could be Windows Server 2016's
error of EnterCriticalSection / LeaveCriticalSection, or
error of WaitForSingleObject / SetEvent / ResetEvent, or
error in the library which is very difficult to find.
As a workaround, change INFINITE wait to timed wait, so that each
thread awakens each 5 seconds (the timeout was chosen arbitrary to not
disturb other threads much), check flag condition under the lock, and
either go to sleep again or stop sleeping as a result of the check.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D59793
llvm-svn: 357722
The distribute clause needs an explicit push of a timer. The teams
clause needs a timer added and also, similarly to parallel, exchanged
with the serial timer when encountered so that serial regions are
counted properly.
Differential Revision: https://reviews.llvm.org/D59801
llvm-svn: 357621
On most platforms, certain compiler and linker flags have to be passed
when using pthreads, otherwise linking against libomp.so might fail with
undefined references to several pthread functions.
Use CMake's `find_package(Threads)` to determine these for standalone
builds, or take them (and optionally modify them) from the top-level
LLVM cmake files.
Also, On FreeBSD, ensure that libomp.so is linked against libm.so,
similar to NetBSD.
Adjust test cases with hardcoded `-lpthread` flag to use the common
build flags, which should now have the required pthread flags.
Reviewers: emaste, jlpeyton, krytarowski, mgorny, protze.joachim, Hahnfeld
Reviewed By: Hahnfeld
Subscribers: AndreyChurbanov, tra, EricWF, Hahnfeld, jfb, jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D59451
llvm-svn: 357618
At the moment, support for runtime debug output using the
OMPTARGET_DEBUG=1 environment variable is only available with
CMAKE_BUILD_TYPE=Debug builds. The patch allows setting it independently
using the LIBOMPTARGET_ENABLE_DEBUG option, which is enabled by default
depending on CMAKE_BUILD_TYPE. That is, unless this option is set
explicitly, nothing changes. This is the same mechanism used by LLVM for
LLVM_ENABLE_ASSERTIONS.
This patch also removes adding -g -O0 in debug builds, it should be
handled by cmake's CMAKE_{C|CXX}_FLAGS_DEBUG configuration option.
Idea by Hal Finkel
Differential Revision: https://reviews.llvm.org/D55952
llvm-svn: 356998
Summary:
While building the 8.0 releases on FreeBSD, I encountered the following
error in the regression tests, where ompt/misc/interoperability.cpp
failed to compile, with:
```
projects/openmp/runtime/test/ompt/misc/interoperability.cpp:7:10: fatal error: 'alloca.h' file not found
#include <alloca.h>
^~~~~~~~~~
```
Like on NetBSD, alloca(3) is defined in <stdlib.h> instead.
Reviewers: emaste, jlpeyton, krytarowski, mgorny, protze.joachim
Reviewed By: jlpeyton
Subscribers: jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D59736
llvm-svn: 356936
Summary:
[Split off from D59451 to get this fix in separately]
While building the 8.0 releases on FreeBSD, I encountered the following
warnings in openmp quite a few times:
```
In file included from projects/openmp/runtime/src/kmp_settings.cpp:27:
projects/openmp/runtime/src/kmp_wrapper_getpid.h:35:2: warning: #warning is a language extension [-Wpedantic]
#warning No gettid found, use getpid instead
^
projects/openmp/runtime/src/kmp_wrapper_getpid.h:35:2: warning: No gettid found, use getpid instead [-W#warnings]
2 warnings generated.
```
I added a gettid wrapper that uses FreeBSD's pthread_getthreadid_np(3)
function for this.
Reviewers: emaste, jlpeyton, krytarowski, mgorny, protze.joachim
Reviewed By: jlpeyton
Subscribers: jfb, jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D59735
llvm-svn: 356934
The GOMP sections interface uses schedule(dynamic) dispatch so it cannot
be assumed which thread executes the cancel and which thread executes
the cancellation point. This patch allows either thread to execute either
section.
llvm-svn: 356302
The following GCC intrinsics are not available on MIPS32:
__sync_fetch_and_add_8
__sync_fetch_and_and_8
__sync_fetch_and_or_8
__sync_val_compare_and_swap_8
Replace these with appropriate libatomic implementation.
Patch by Miodrag Dinic.
Differential Revision: https://reviews.llvm.org/D45691
llvm-svn: 355687
Summary:
The current install-clang-headers target installs clang's resource
directory headers. This is different from the install-llvm-headers
target, which installs LLVM's API headers. We want to introduce the
corresponding target to clang, and the natural name for that new target
would be install-clang-headers. Rename the existing target to
install-clang-resource-headers to free up the install-clang-headers name
for the new target, following the discussion on cfe-dev [1].
I didn't find any bots on zorg referencing install-clang-headers. I'll
send out another PSA to cfe-dev to accompany this rename.
[1] http://lists.llvm.org/pipermail/cfe-dev/2019-February/061365.html
Reviewers: beanz, phosek, tstellar, rnk, dim, serge-sans-paille
Subscribers: mgorny, javed.absar, jdoerfert, #sanitizers, openmp-commits, lldb-commits, cfe-commits, llvm-commits
Tags: #clang, #sanitizers, #lldb, #openmp, #llvm
Differential Revision: https://reviews.llvm.org/D58791
llvm-svn: 355340
Changing the default from -fPIC to -fno-PIC on PowerPC exposed an issue in
OpenMP for PowerPC.
The issue is reported here:
https://bugs.llvm.org/show_bug.cgi?id=40082
This is a fix for that issue.
Also removed the XFAIL from the two tests that were failing under -fno-PIC.
Differential Revision: https://reviews.llvm.org/D56286
llvm-svn: 355229
This change makes the runtime decide the intended use of each barrier
invocation, for the OMPT synchronization tool callbacks. The OpenMP 5.0
specification defines four possible barrier kinds -- implicit, explicit,
implementation, and just normal barrier.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D58247
llvm-svn: 355140
Nest-var, OMP_NESTED, omp_set_nested()., and omp_get_nested() have been
deprecated in the 5.0 spec. Initial nesting info is now derived from
OMP_MAX_ACTIVE_LEVELS, OMP_NUM_THREADS, and OMP_PROC_BIND.
This patch deprecates the internal ICV that corresponds to nest-var, and
replaces it with the max-active-levels-var ICV to determine nesting. The
change still allows for use of OMP_NESTED (according to 5.0 changes),
omp_get_nested, and omp_set_nested, which have had deprecation messages
added to them. The change allows certain settings of OMP_NUM_THREADS,
OMP_PROC_BIND, and OMP_MAX_ACTIVE_LEVELS to turn on nesting, but
OMP_NESTED=0 will still force nesting to be off.
The runtime now prints informative messages about deprecation of
OMP_NESTED, omp_set_nested(), and omp_get_nested(), when those
environment variables or routines are used. It also prints deprecated
message in output for KMP_SETTINGS and OMP_DISPLAY_ENV for OMP_NESTED.
This patch also fixes OMP_DISPLAY_ENV output for OMP_TARGET_OFFLOAD.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D58408
llvm-svn: 355138
This patch cleans up the yielding code and makes it optional. An
environment variable, KMP_USE_YIELD, was added. Yielding is still
on by default (KMP_USE_YIELD=1), but can be turned off completely
(KMP_USE_YIELD=0), or turned on only when oversubscription is detected
(KMP_USE_YIELD=2). Note that oversubscription cannot always be detected
by the runtime (for example, when the runtime is initialized and the
process forks, oversubscription cannot be detected currently over
multiple instances of the runtime).
Because yielding can be controlled by user now, the library mode
settings (from KMP_LIBRARY) for throughput and turnaround have been
adjusted by altering blocktime, unless that was also explicitly set.
In the original code, there were a number of places where a double yield
might have been done under oversubscription. This version checks
oversubscription and if that's not going to yield, then it does
the spin check.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D58148
llvm-svn: 355120
Summary:
This patch adds a more sophisticated team reduction scheme to the OpenMP libomptarget-nvptx runtime.
The scheme uses a fixed size global memory buffer whose length can be adjusted via compiler flag:
```
-fopenmp-cuda-teams-reduction-recs-num=1024
```
The global buffer is a structure of arrays (with default size of 1024 each and controlled by the above flag), one array for each reduction variable.
Values in the buffer are processed by the last team to finish executing the body of the target region.
In addition to adding support for the new flag, the compiler also emits special functions used for the reduction of the intermediate reduction values. These changes will be added in a separate compiler patch following this one.
Reviewers: ABataev, caomhin
Reviewed By: ABataev
Subscribers: guansong, jfb, jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D58409
llvm-svn: 354471
This patch adds the new 5.0 API function omp_get_supported_active_levels().
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D58211
llvm-svn: 354368
Remove fatal error messages from the cancellation API for GOMP
Add __kmp_barrier_gomp_cancel() to implement cancellation of parallel regions.
This new function uses the linear barrier algorithm with a cancellable
nonsleepable wait loop.
Differential Revision: https://reviews.llvm.org/D57969
llvm-svn: 354367
The thread-limit-var and omp_get_thread_limit API was not perfectly handled for
teams construct. Now, when modified by thread_limit clause, omp_get_thread_limit
reports the correct value. In addition, the value is restored when leaving the
teams construct to what it was in the encountering context.
This is done partly by creating the notion of a Contention Group root (CG root)
that keeps track of the thread at the root of each separate CG, the
thread-limit-var associated with the CG, and associated counter of active
threads within the contention group.
thread-limits are passed from master to worker threads via an entry in the ICV
data structure. When a "contention group switch" occurs, a new CG root record is
made and passed from master to worker. A thread could potentially have several
CG root records if it encounters multiple nested teams constructs (but at the
moment the spec doesn't allow for nested teams, so the most one could have
currently is 2). The master of the teams masters gets the thread-limit clause
value stored to its local ICV structure, and the other teams masters copy it
from the master. The thread-limit is set from that ICV copy and restored to the
ICV copy when entering and leaving the teams construct.
This change also fixes a bug when the top-level teams construct team gets
reused, and OMP_DYNAMIC was true, which can cause the expected size of this team
to be smaller than what was actually allocated. The fix updates the size of the
team after its threads were reserved.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D56804
llvm-svn: 353747
%s refers to the test file in the source tree. This was accidentally added in
r351197 / 2b46d30 ("[OMPT] Second chunk of final OMPT 5.0 interface updates").
Differential Revision: https://reviews.llvm.org/D58002
llvm-svn: 353715
Summary:
As @david2050 commented, changes introduced by https://reviews.llvm.org/D56397 break builds for older compilers
which don't support `__has(_cpp)_attribute`. This is a fix for the break.
Reviewers: protze.joachim, jlpeyton, AndreyChurbanov, Hahnfeld, david2050
Subscribers: openmp-commits, david2050
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D57851
llvm-svn: 353538
The three switch fallthrough generate a warning with -Wimplicit-fallthrough.
Two are documented as fallthrough, one is not, but I think the intention is to also fallthrough in kmp_tasking.cpp.
Not sure whether kmp.h is the best place to define the macro.
Reviewers: jlpeyton, AndreyChurbanov, Hahnfeld
Reviewed By: jlpeyton
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D56397
llvm-svn: 353052
Redo after revert by hans. The wrong include in one test is fixed.
Make sure that OMPT is enabled in runtime entry points that access internals
of the runtime. Else, return an appropiate value indicating an error or that
the data is not available.
Patch provided by @sconvent
Reviewers: jlpeyton, omalyshe, hbae, Hahnfeld, joachim.protze
Reviewed By: joachim.protze
Tags: #openmp, #ompt
Differential Revision: https://reviews.llvm.org/D47717
llvm-svn: 352611
This fixes most references to the paths:
llvm.org/svn/
llvm.org/git/
llvm.org/viewvc/
github.com/llvm-mirror/
github.com/llvm-project/
reviews.llvm.org/diffusion/
to instead point to https://github.com/llvm/llvm-project.
This is *not* a trivial substitution, because additionally, all the
checkout instructions had to be migrated to instruct users on how to
use the monorepo layout, setting LLVM_ENABLE_PROJECTS instead of
checking out various projects into various subdirectories.
I've attempted to not change any scripts here, only documentation. The
scripts will have to be addressed separately.
Additionally, I've deleted one document which appeared to be outdated
and unneeded:
lldb/docs/building-with-debug-llvm.txt
Differential Revision: https://reviews.llvm.org/D57330
llvm-svn: 352514
As the codebase is now under the Apache 2.0 license with LLVM
Exceptions, and all Arm's contributions, past or future, are under that
new license, this Arm specific words in LICENSE.txt are no longer
needed.
llvm-svn: 352377