Commit Graph

273 Commits

Author SHA1 Message Date
Shilei Tian
a014fbbc21 [OpenMP] Improve D2D memcpy to use more efficient driver API
Summary:
In current implementation, D2D memcpy is first to copy data back to host and then
copy from host to device. This is very efficient if the device supports D2D
memcpy, like CUDA.

In this patch, D2D memcpy will first try to use native supported driver API. If
it fails, fall back to original way. It is worth noting that D2D memcpy in this
scenerio contains two ideas:
- Same devices: this is the D2D memcpy in the CUDA context.
- Different devices: this is the PeerToPeer memcpy in the CUDA context.
My implementation merges this two parts. It chooses the best API according to
the source device and destination device.

Reviewers: jdoerfert, AndreyChurbanov, grokos

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D80649
2020-06-04 16:59:06 -04:00
Manoel Roemmer
6b9e43c67e [Openmp][VE] Libomptarget plugin for NEC SX-Aurora
This patch adds a libomptarget plugin for the NEC SX-Aurora TSUBASA Vector
Engine (VE target).  The code is largely based on the existing generic-elf
plugin and uses the NEC VEO and VEOSINFO libraries for offloading.

Differential Revision: https://reviews.llvm.org/D76843
2020-05-12 10:47:30 +02:00
Joel E. Denny
dd5ba4b585 [OpenMP][NFC] Fix not sustitution in tests
D78566 introduced a `\bnot\b` lit substitution in OpenMP test suites.
However, that would corrupt a command like
`FileCheck -implicit-check-not` or any file name like `%t.not`.  We
could use lookbehind/lookahead assertions to avoid such cases, but
this patch switches to `%not` (suggested during the D78566 review) as
a safer option.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D79529
2020-05-11 14:53:48 -04:00
Shilei Tian
cb038927ef [OpenMP] Fix an issue of wrong return type of DeviceRTLTy::getNumOfDevices
Summary: There is a typo in DeviceRTLTy::getNumOfDevices that the type of its return value is bool. It will lead to a problem of wrong device number returned from omp_get_num_devices.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D79255
2020-05-03 15:59:06 -04:00
Ron Lieberman
ee9c53d271 [libomptarget] Initialize reference parameter IsNew within Device::getOrAllocTgtPtr
The two locals IsNew and Pointer_IsNew were uninitialized at declaration, and then passed by
reference to Device.getOrAllocTgtPtr which in turn did not assign on all
paths within the function. This resulted in occasional runtime failures in one application.
Device::getOrAllocTgtPtr will now initialize IsNew to false on entry to function.

Differential Revision: https://reviews.llvm.org/D78744
2020-04-24 15:33:37 -05:00
Joel E. Denny
5f6aa9680c [OpenMP] target_data_begin: fail on device alloc fail
Without this patch, target_data_begin continues after an illegal
mapping or an out-of-memory error on the device.  With this patch, it
terminates the runtime with an error instead.

The new test exercises only illegal mappings.  I didn't think of a
good way to exercise out-of-memory errors from the test suite.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D78170
2020-04-21 17:10:50 -04:00
Joel E. Denny
ba942610f6 [OpenMP] Add scaffolding for negative runtime tests
Without this patch, the openmp project's test suites do not appear to
have support for negative tests.  However, D78170 needs to add a test
that an expected runtime failure occurs.

This patch makes `not` visible in all of the openmp project's test
suites.  In all but `libomptarget/test`, it should be possible for a
test author to insert `not` before a use of the lit substitution for
running a test program.  In `libomptarget/test`, that substitution is
target-specific, and its value is `echo` when the target is not
available.  In that case, inserting `not` before a lit substitution
would expect an `echo` fail, so this patch instead defines a separate
lit substitution for expected runtime fails.

Reviewed By: jdoerfert, Hahnfeld

Differential Revision: https://reviews.llvm.org/D78566
2020-04-21 17:10:50 -04:00
Shilei Tian
4031bb982b [OpenMP] Refined CUDA plugin to put all CUDA operations into class
Summary: Current implementation mixed everything up so that there is almost no encapsulation. In this patch, all CUDA related operations are put into a new class DeviceRTLTy and only necessary functions are exposed. In addition, all C++ code now conforms with LLVM code standard, keeping those API functions following C style.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: jfb, yaxunl, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77951
2020-04-13 13:32:46 -04:00
Shilei Tian
feed674dec [OpenMP] Introduce stream pool to make sure the correctness of device synchr...
...onization

Summary: In previous patch, in order to optimize performance, we only synchronize once
for each target region. The syncrhonization is via stream synchronization.
However, in the extreme situation, the performce might be bad. Consider the
following case: There is a task that requires transferring huge amount of data
(call many times of data transferring function). It is scheduled to the first
stream. And then we have 255 very light tasks scheduled to the remaining 255
streams (by default we have 256 streams). They can be finished before we do
synchronization at the end of the first task. Next, we get another very huge
task. It will be scheduled again to the first stream. Now the first task
finishes its kernel launch and call stream synchronization. Right now, the
stream already contains two kernels, and the synchronization will wait until the
two kernels finish instead of just the first one for the first task.

In this patch, we introduce stream pool. After each synchronization, the stream
will be returned back to the pool to make sure that for each synchronization,
only expected operations are waited.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: gregrodgers, yaxunl, lildmh, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77412
2020-04-11 07:08:56 -04:00
Shilei Tian
03ff643d2e [OpenMP] Put old APIs back and added new _async series for backward compatibility
Summary: According to comments on bi-weekly meeting, this patch put back old APIs and added new `_async` series

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77822
2020-04-09 22:40:58 -04:00
Shilei Tian
32ed29271f [OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream
Summary:
This patch introduces two things for offloading:
1. Asynchronous data transferring: those functions are suffix with `_async`. They have one more argument compared with their synchronous counterparts: `__tgt_async_info*`, which is a new struct that only has one field, `void *Identifier`. This struct is for information exchange between different asynchronous operations. It can be used for stream selection, like in this case, or operation synchronization, which is also used. We may expect more usages in the future.
2. Optimization of stream selection for data mapping. Previous implementation was using asynchronous device memory transfer but synchronizing after each memory transfer. Actually, if we say kernel A needs four memory copy to device and two memory copy back to host, then we can schedule these seven operations (four H2D, two D2H, and one kernel launch) into a same stream and just need synchronization after memory copy from device to host. In this way, we can save a huge overhead compared with synchronization after each operation.

Reviewers: jdoerfert, ye-luo

Reviewed By: jdoerfert

Subscribers: yaxunl, lildmh, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77005
2020-04-07 14:55:47 -04:00
Kazuaki Ishizaki
4201679110 [OpenMP] NFC: Fix trivial typo
Differential Revision: https://reviews.llvm.org/D77430
2020-04-04 12:06:54 +09:00
JonChesterfield
09834f9761 [libomptarget][nfc] Move non-freestanding headers out of common
Summary:
[libomptarget][nfc] Move non-freestanding headers out of common

Lowers the bar for building deviceRTL.
Drops math.h entirely as it wasn't used and libm is a big dependency.

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: jdoerfert

Subscribers: jvesely, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77071
2020-03-31 23:43:18 +01:00
Jon Chesterfield
856c995436 [libomptarget] Add missing elf_end call in elf_common.c
Summary:
[libomptarget] Add missing elf_end call in elf_common.c
Noticed when reviewing D76843.

Reviewers: simoll, jdoerfert, efocht, AndreyChurbanov, grokos, manorom

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D76874
2020-03-26 19:07:33 +00:00
JonChesterfield
0813f41005 [libomptarget][nfc] Explicitly static function scope shared variables
Summary:
[libomptarget][nfc] Explicitly static function scope shared variables

`__shared__` in CUDA implies static in function scope. See e.g. D.2.1.1
in CUDA_C_Programming_Guide.pdf,
http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/

This is surprising for non-cuda developers, see e.g. D73239 where I thought
local variables would be thread local.

Tested by IR diff of libomptarget.bc (no change), running in tree tests,
and binary diff of the nvcc static archives (no significant change).

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: jdoerfert

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D76713
2020-03-24 18:51:50 +00:00
JonChesterfield
298527587c [libomptarget][nfc] Disable amdgcn rtl build. The cmake logic for finding llvm is misbehaving. 2020-03-21 00:01:03 +00:00
George Rokos
0a42c9bfe4 Enable CUDA offloading on aarch64 host
Differential Revision: https://reviews.llvm.org/D76469
2020-03-20 15:38:47 -07:00
Tom Scogland
a23d7282ca openmp: fix memcpy memory leak
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D72637
2020-03-12 23:24:16 -05:00
Alexey Bataev
c422d69b1a [LIBOMPTARGET]Fix PR45139: Bug in mixing Python and OpenMP target offload.
Summary: Explicitly initialize data members of RTLsTy class upon construction.

Reviewers: grokos

Subscribers: guansong, openmp-commits, caomhin, kkwli0

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D75946
2020-03-11 09:12:02 -04:00
Jon Chesterfield
221ada654b [libomptarget] Implement locks for amdgcn
Summary:
[libomptarget] Implement locks for amdgcn

The nvptx implementation deadlocks on amdgcn. atomic_cas with multiple
active lanes can deadlock - if one lane succeeds, all the others are locked
out. The set_lock implementation therefore runs on a single lane.

Also uses a sleep intrinsic instead of the system clock for a probably
minor performance improvement. The unset/test implementations may be revised
later, based on code size / performance or similar concerns.

This implements the lock at a per-wavefront scope. That's not strictly as
specified, since openmp describes locks in terms of threads. I think the
nvptx implementation provides true per-thread locking on volta and the same
per-warp locking on other architectures.

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: jdoerfert

Subscribers: jvesely, mgorny, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D75546
2020-03-05 20:25:31 +00:00
Jon Chesterfield
918a1065be [libomptarget][nfc] Move GetWarp/LaneId functions into per arch code
Summary:
[libomptarget][nfc] Move GetWarp/LaneId functions into per arch code

No code change for nvptx. Amdgcn currently has two implementations of GetLaneId,
this patch keeps the one a colleague considered to be superior for our ISA.

GetWarpId is currently the same function for amdgcn and nvptx, but I think it's
cleaner to keep it grouped with all the others than to keep it in support.cu.

Reviewers: jdoerfert, grokos, ABataev

Reviewed By: jdoerfert

Subscribers: jvesely, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D75587
2020-03-05 17:05:58 +00:00
Jon Chesterfield
84ac0dffd4 [libomptarget][nfc][amdgcn] Replace magic number with named intrinsic 2020-03-05 11:50:30 +00:00
Jon Chesterfield
133db44996 [libomptarget] Implement most hip atomic functions in terms of intrinsics
Summary:
[libomptarget] Implement hip atomic functions in terms of intrinsics

All but atomicInc can be implemented using type generic clang intrinsics.
There is not yet a corresponding intrinsic for atomicInc in clang, only one in
LLVM. This patch leaves atomicInc as an unresolved symbol.

Reviewers: jdoerfert, ABataev, hfinkel, grokos, arsenm

Reviewed By: arsenm

Subscribers: sri, saiislam, wdng, jvesely, mgorny, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D73076
2020-03-04 17:56:40 +00:00
Jon Chesterfield
ad3d021b9e [libomptarget][nfc][amdgcn] Simplify assert_fail implementation 2020-03-03 18:24:51 +00:00
Alexey Bataev
c4a9d976c1 [LIBOMPTARGET]Lower priority of global constructor/destructor to silence the warning from gcc.
Summary: fixed the warning from gcc since prios 0-100 are reserved for the internal use.

Reviewers: grokos

Subscribers: kkwli0, caomhin, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D75458
2020-03-02 15:15:11 -05:00
Alexey Bataev
63cef621f9 [LIBOMPTARGET]Fix PR44933: fix crash because of the too early deinitialization of libomptarget.
Summary:
Instead of using global variables with unpredicted time of
deinitialization, use dynamically allocated variables with functions
explicitly marked as global constructor/destructor and priority. This
allows to prevent the crash because of the incorrect order of dynamic
libraries deinitialization.

Reviewers: grokos, hfinkel

Subscribers: caomhin, kkwli0, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D74837
2020-02-25 15:54:37 -05:00
Alexey Bataev
578c13d13c [OPENMP]Fix the test, NFC. 2020-02-13 10:40:06 -05:00
Ethan Stewart
190a11148b Changed omp_get_max_threads() implementation to more closely match spec description.
Summary: The 5.0 spec states, "The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a num_threads clause were encountered after execution returns from this routine." The attached test shows Max Threads: 96, Num Threads: 128 without the proposed change. The number of threads should not exceed the (max) nthreads ICV, hence we should return the higher SPMD thread number even when omp_get_max_threads() is called in a generic kernel. This change does fail the api test, max_threads.c, because now it would return 64 instead of 32.

Reviewers: jdoerfert, ABataev, grokos, JonChesterfield

Reviewed By: jdoerfert

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D74092
2020-02-12 23:29:34 +00:00
JonChesterfield
c2ce9ea4e3 [libomptarget][nfc] Change enum values to match those in cuda/rtl
Summary:
[libomptarget][nfc] Change enum values to match those in cuda/rtl

support.h and cuda/rtl.cpp (and downsteam hsa/rtl.cpp) have enums for execution
mode. These are actually independent - the numbers that used within support, or
within the plugin, are never passed across the boundary.

Nevertheless, trying to work out why the values are different between the two
has generated a reasonable amount of confusion. This patch changes support to
match the values in plugin, on the basis that the plugin also has some comments
which I'd have to update if I changed that one instead. Credit to Ron for
working through this in our own fork. See rocm-developer-tools/aomp/issues/7
for that earlier diagnostic write up.

Also happy with generic = 0, spmd = 1 - provided it's the same in both places.

Reviewers: jdoerfert, grokos, ABataev, ronlieb

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D74503
2020-02-12 23:27:08 +00:00
Johannes Doerfert
a5153dbc36 [OpenMP][Offloading] Added support for multiple streams so that multiple kernels can be executed concurrently
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D74145
2020-02-11 22:07:14 -06:00
Johannes Doerfert
3ff4e2eee8 [OpenMP] Switch default C++ standard to C++ 14
Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D74258
2020-02-11 17:11:54 -06:00
Jonas Devlieghere
4fe839ef3a [CMake] Rename EXCLUDE_FROM_ALL and make it an argument to add_lit_testsuite
EXCLUDE_FROM_ALL means something else for add_lit_testsuite as it does
for something like add_executable. Distinguish between the two by
renaming the variable and making it an argument to add_lit_testsuite.

Differential revision: https://reviews.llvm.org/D74168
2020-02-06 15:33:18 -08:00
Jon Chesterfield
6a82f0f0b9 [libomptarget] Implement wavefront functions for amdgcn
Summary: [libomptarget] Implement wavefront functions for amdgcn

Reviewers: jdoerfert, ABataev, grokos, arsenm

Reviewed By: arsenm

Subscribers: saiislam, wdng, arsenm, jvesely, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D73077
2020-02-04 21:55:29 +00:00
Jon Chesterfield
ab9762a9f5 Revert "[nfc][libomptarget] Remove SHARED annotation from local variables"
This reverts commit 0e9374e374.
Revert D73239. It fails some local testing, cause presently unknown
2020-01-27 20:05:17 +00:00
Jon Chesterfield
0e9374e374 [nfc][libomptarget] Remove SHARED annotation from local variables
Summary:
[nfc][libomptarget] Remove SHARED annotation from local variables

A few local variables in reduction.cu were marked SHARED. This patch leaves
all per-kernel global state localised in omp_data.cu.

Reviewers: ABataev, jdoerfert, grokos

Reviewed By: jdoerfert

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D73239
2020-01-23 00:00:23 +00:00
Alexey Bataev
9148b8b734 [OpenMP][Offloading] Fix the issue that omp_get_num_devices returns wrong number of devices, by Shiley Tian.
Summary:
This patch is to fix issue in the following simple case:

  #include <omp.h>
  #include <stdio.h>

  int main(int argc, char *argv[]) {
    int num = omp_get_num_devices();
    printf("%d\n", num);

    return 0;
  }

Currently it returns 0 even devices exist. Since this file doesn't contain any
target region, the host entry is empty so further actions like initialization
will not be proceeded, leading to wrong device number returned by runtime
function call.

Reviewers: jdoerfert, ABataev, protze.joachim

Reviewed By: ABataev

Subscribers: protze.joachim

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D72576
2020-01-21 13:25:18 -05:00
Jon Chesterfield
03c2a59cd6 [libomptarget] Implement smid for amdgcn
Summary:
[libomptarget] Implement smid for amdgcn

Implementation is in a new file as it uses an intrinsic with
complicated encoding that warranted substantial comments.

Reviewers: jdoerfert, grokos, ABataev, ronlieb

Reviewed By: jdoerfert

Subscribers: jvesely, mgorny, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D72956
2020-01-20 14:52:17 +00:00
George Rokos
e244145ab0 [LIBOMPTARGET] Do not increment/decrement the refcount for "declare target" objects
The reference counter for global objects marked with declare target is INF. This patch prevents the runtime from incrementing /decrementing INF refcounts. Without it, the map(delete: global_object) directive actually deallocates the global on the device. With this patch, such a directive becomes a no-op.

Differential Revision: https://reviews.llvm.org/D72525
2020-01-14 16:30:38 -08:00
Jon Chesterfield
2a43688a0a [nfc][libomptarget] Refactor nvptx/target_impl.cu
Summary:
[nfc][libomptarget] Refactor nxptx/target_impl.cu

Use __kmpc_impl_atomic_add instead of atomicAdd to match the rest of the file.
Alternatively, target_impl.cu could use the cuda functions directly. Using a mixture in this
file was an oversight, happy to resolve in either direction.

Removed some comments that look outdated.

Call __kmpc_impl_unset_lock directly to avoid a redundant diagnostic and remove an implict
dependency on interface.h.

Reviewers: ABataev, grokos, jdoerfert

Reviewed By: jdoerfert

Subscribers: jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D72719
2020-01-14 19:27:45 +00:00
Jon Chesterfield
2d287bec3c [nfc][libomptarget] Refactor amdgcn target_impl
Summary:
[nfc][libomptarget] Refactor amdgcn target_impl

Removes references to internal libraries from the header
Standardises on C++ mangling for all the target_impl functions
Update comment block
clang-format
Move some functions into a new target_impl.hip source file

This lays the groundwork for implementing the remaining unresolved
symbols in the target_impl.hip source.

Reviewers: jdoerfert, grokos, ABataev, ronlieb

Reviewed By: jdoerfert

Subscribers: jvesely, mgorny, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D72712
2020-01-14 19:27:07 +00:00
Alexey Bataev
b19c0810e5 [LIBOMPTARGET]Ignore empty target descriptors.
Summary:
If the dynamically loaded module has been compiled with -fopenmp-targets
and has no target regions, it has empty target descriptor. It leads to a
crash at the runtime if another module has at least one target region
and at least one entry in its descriptor. The runtime library is unable
to load the empty binary descriptor and terminates the execution.
Caused by a clang-offload-wrapper.

Reviewers: grokos, jdoerfert

Subscribers: caomhin, kkwli0, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D72472
2020-01-10 09:45:27 -05:00
Kazuaki Ishizaki
4c6a098ad5 [OpenMP] NFC: Fix trivial typos in comments
Reviewers: jdoerfert, Jim

Reviewed By: Jim

Subscribers: Jim, mgorny, guansong, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D72285
2020-01-07 14:05:03 +08:00
Jon Chesterfield
bc48af8c57 [libomptarget][nfc] Change unintentional target_impl prefix to kmpc_impl 2019-12-30 20:50:23 +00:00
Jon Chesterfield
63e2aa5658 [libomptarget][nfc] Provide target_impl malloc/free
Summary:
[libomptarget][nfc] Provide target_impl malloc/free

Sufficient to build support.cu for amdgcn

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: jdoerfert

Subscribers: jvesely, mgorny, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D71685
2019-12-19 16:54:28 +00:00
JonChesterfield
b40822fc14 [libomptarget][nvptx] Fix build, second symbol reordering 2019-12-19 02:02:44 +00:00
Jon Chesterfield
89a2bef27a [libomptarget][nvptx] Fix build, symbol ordering in target_impl.h 2019-12-19 01:50:06 +00:00
JonChesterfield
9aefe5f65e [libomptarget][amdgcn] Correct return type of extern __clock64 to unsigned 2019-12-19 00:11:21 +00:00
Jon Chesterfield
2caeaf2f45 [libomptarget][nfc] Introduce atomic wrapper function
Summary:
[libomptarget][nfc] Introduce atomic wrapper function

Wraps atomic functions in a template prefixed __kmpc_atomic that
dispatches to cuda or hip atomic functions. Intended to be easily extended
to dispatch to OpenCL or C++ atomics for a third target.

Reviewers: ABataev, jdoerfert, grokos

Reviewed By: jdoerfert

Subscribers: Anastasia, jvesely, mgrang, dexonsmith, llvm-commits, mgorny, jfb, openmp-commits

Tags: #openmp, #llvm

Differential Revision: https://reviews.llvm.org/D71404
2019-12-18 20:06:17 +00:00
JonChesterfield
8adae6027c [libomptarget][nfc] Extract function from data_sharing, move to common
Summary:
[libomptarget][nfc] Extract function from data_sharing, move to common

Finding the first active thread in the warp is different on nvptx and amdgcn,
mostly due to warp size and the desire for efficiency.

Reviewers: ABataev, jdoerfert, grokos

Reviewed By: jdoerfert

Subscribers: jvesely, mgorny, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D71643
2019-12-18 19:39:35 +00:00
Alexey Bataev
15d47deedd [LIBOPENMP][NVPTX]Fix the build error in the runtime. 2019-12-17 14:46:04 -05:00