I have added a few things to the OpenMP FAQ which I think were missing. Feel free to suggest some changes. Are there missing options in the offloading command line reference? And what do you think about the section "Q: Why is my
build taking a long time"?
Differential Revision: https://reviews.llvm.org/D156387
Currently, the precense of the OpenMP target declare metadata requires
that we always codegen a global declaration. This is undesirable in the
case that we could defer or omit this declaration as is common with
unused extern variables. This is important as it allows us, in the
runtime, to rely on static linking semantics to omit unused symbols so
they are not included when the user links it in.
This patch changes the check for always emitting these variables.
Because of this we also need to extend this logic to the generation of
the offloading entries. This has the result of derring the offload entry
generation to the canonical definitoin. So we are effectively assuming
whoever owns the storage for this variable will perform that operation.
This makes an exception for `link` attributes as those require their own
special handling.
Let me know if this is sound in the implementation, I do not have the
largest view of the standards here.
Fixes: https://github.com/llvm/llvm-project/issues/64133
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D156368
This patch adds the functionality to process with a lambda the resources
obtained and returned by the resource managers in the plugins. These
processing lambdas are empty for the moment. The idea is to process them
when the resource manager mutex is acquired.
Differential Revision: https://reviews.llvm.org/D156245
This patch extends the plugin resource managers to return more than one resource
per call. The return function is not extended since we do not return more than
one resource anywhere.
Differential Revision: https://reviews.llvm.org/D155629
This patch improves the resource managers in the plugins by properly handling
the errors. Until now, errors when creating and destroying resources were not
propagated and were directly handled inside the resource managers. Now, all
errors are propagated as in the rest of the plugin infrastructure.
The code is now ready to implement the request/return of multiple resources in
a single getResource/returnResource call.
Differential Revision: https://reviews.llvm.org/D155621
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.
This is a combination and refinement of patch series D116908, D116909, and D116910.
Depend on D155886.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142569
Implemented RAII objects, initialized at target entry points, that
invoke tool-supplied callbacks. Updated status of target callbacks as
implemented.
Depends on D127365
Patch from John Mellor-Crummey <johnmc@rice.edu>
With contributions from:
Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>
Jan-Patrick Lehr <janpatrick.lehr@amd.com>
Reviewed By: jdoerfert, dhruvachak, jplehr
Differential Revision: https://reviews.llvm.org/D127367
Implemented RAII objects, initialized at target entry points, that
invoke tool-supplied callbacks. Updated status of target callbacks as
implemented.
Depends on D127365
Patch from John Mellor-Crummey <johnmc@rice.edu>
With contributions from:
Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>
Jan-Patrick Lehr <janpatrick.lehr@amd.com>
Reviewed By: jdoerfert, dhruvachak
Differential Revision: https://reviews.llvm.org/D127367
Get the KMP_VERSION printout logic out of environment variable file
(kmp_settings.cpp) and move to end of serial initialization where
KMP_SETTINGS and OMP_DISPLAY_ENV are.
Differential Revision: https://reviews.llvm.org/D154652
Get rid of explicit mask alloc, getthreadaffinity, set temp affinity,
reset to old affinity, dealloc steps in favor of existing
kmp_affinity_raii_t to push/pop a temporary affinity.
Differential Revision: https://reviews.llvm.org/D154650
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.
This is a combination and refinement of patch series D116908, D116909, and D116910.
Depend on D155886.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142569
We observed some overhead and unnecessary debug output.
This can be alleviated by (re-)introduction of a boolean that indicates, if the
OMPT initialization has been performed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D155186
This patch adds support for invoking target callbacks but does not yet
invoke them. A new structure OmptInterface has been added that tracks
thread local states including correlation ids. This structure defines
methods that will be called from the device independent target library
with information related to a target entry point for which a callback
is invoked. These methods in turn use the callback functions maintained
by OmptDeviceCallbacksTy to invoke the tool supplied callbacks.
Depends on D124652
Patch from John Mellor-Crummey <johnmc@rice.edu>
With contributions from:
Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>
Differential Revision: https://reviews.llvm.org/D127365
This patch adds the `rpc_host_call` function as a GPU extension. This is
exported from the `libc` project to use the RPC interface to call a
function pointer via RPC any copying the arguments by-value. The
interface can only support a single void pointer argument much like
pthreads. The function call here is the bare-bones version of what's
required for OpenMP reverse offloading. Full support will require
interfacing with the mapping table, nowait support, etc.
I decided to test this interface in `libomptarget` as that will be the
primary consumer and it would be more difficult to make a test in `libc`
due to the testing infrastructure not really having a concept of the
"host" as it runs directly on the GPU as if it were a CPU target.
Reviewed By: jplehr
Differential Revision: https://reviews.llvm.org/D155003
If we store a constant in an ICV it is easier for the optimizer to
propagate it. Since we often use the full block for the thread limit and
the parallel team size, we can instead replace that dynamic value with a
constant that otherwise cannot occur, here 0.
We ended up with `llvm.assume(icmp ne ptr as(4) null, as(4) @str)`
because the string in address space 4 was not known to be non-null.
There is no need to create these assumes.
This is the only place that defines this prefix in a header file and
was thus overriding and redefining other users of it. If we must use it
in a header file, at least repsect its old values.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D155316
This points users to the `libc` documentation and explains the basics of
how it's used inside the runtime.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D155318
The 'RPCHandleTy' was intended to capture the intention that a specific
device owns its slot in the RPC server. However, this required creating
a temporary store to hold these pointers. This was causing really weird
spurious failure due to undefined behaviour in the order of library
teardown. For example, the x64 plugin would be torn down, set this to
some invalid memory, and then the CUDA plugin would crash. Rather than
spend the time to fully diagnose this problem I found it pertinent to
simply remove the failure mode.
This patch removes this indirection so now the usage of the RPC server
must always be done with the intended device. This just requires some
extra handling for the AMDGPU indirection where we need to store a
reference to the device.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154971
Add CHECK_OPENMP_ENV environment variable which will be passed to environment
variables for test (make check-* target). This provides a handy way to
exercise various openmp code with different settings during development.
For example, to change default barrier pattern:
```
$ env CHECK_OPENMP_ENV="KMP_FORKJOIN_BARRIER_PATTERN=hier,hier \
KMP_PLAIN_BARRIER_PATTERN=hier,hier \
KMP_REDUCTION_BARRIER_PATTERN=hier,hier" \
ninja check-openmp
```
Even with this, each test can set appropriate environment variables if needed
as before.
Also, this commit adds missing documention about how to run tests in README.
Patch provided by t-msn
Differential Revision: https://reviews.llvm.org/D122645
Added support in the generic plugin to invoke registered callbacks.
Depends on D124070
Patch from John Mellor-Crummey <johnmc@rice.edu>
(With contributions from Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>)
Differential Revision: https://reviews.llvm.org/D124652
In preparation for removing the `#include "llvm/ADT/StringExtras.h"`
from the header to source file of `llvm/Support/Error.h`, first add in
all the missing includes that were previously included transitively
through this header.
This is fixing all files missed in b0abd4893f and
39d8e6e22c.
Differential Revision: https://reviews.llvm.org/D154763
OpenMP 5.1 added OMP_TOOL_VERBOSE_INIT. This env variable is
extremely helpful to understand the issue when loading a tool fails
unexpectedly (e.g., errors from dlopen, when the libc available at
runtime is older than libc used at compile time of the tool -> missed
to load the right gcc module).
This patch replicates the verbose init code from libomp watching
out for a different env variable. Similar to
CLIENT_TOOL_LIBRARIES_VAR, a tool can define the name of
the env var by defining CLIENT_TOOL_VERBOSE_INIT_VAR
before including ompt-multiplex.h.
Alternatively, a tool can define OMPT_MULTIPLEX_TOOL_NAME
to specify the tool name which will be the prefix for both
_TOOL_LIBRARIES and _VERBOSE_INIT var.
Finally, if none of the two macros is defined, the header will
print a compiler warning and look at OMP_TOOL_VERBOSE_INIT.
Patch prepared by Semih Burak
Differential Revision: https://reviews.llvm.org/D112809
This patch adds the intial support for running an RPC server in
libomptarget to handle host services. We interface with the library
provided by the `libc` project to stand up a basic server. We introduce
a new type that is controlled by the plugin and has each device
intialize its interface. We then run a basic server to check the RPC
buffer.
This patch does not fully implement the interface. In the future each
plugin will want to define special handlers via the interface to support
things like malloc or H2D copies coming from RPC. We will also want to
allow the plugin to specify t he number of ports. This is currently
capped in the implementation but will be adjusted soon.
Right now running the server is handled by whatever thread ends up doing
the waiting. This is probably not a completely sound solution but I am
not overly familiar with the behaviour of OpenMP tasks and what would be
required here. This works okay with synchrnous regions, and somewhat
fine with `nowait` regions, but I've observed some weird behavior when
one of those regions calls `exit`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D154312
The OpenMP specification mentions that omp_test_lock and
omp_test_nest_lock dispatch OMPT callbacks with ompt_mutex_test_lock
and ompt_mutex_test_nest_lock for their kind respectively. Previously,
the values ompt_mutex_lock and ompt_mutex_nest_lock were used. This
could cause issues in application relying on the kind to correctly
determine lock states. This commit changes the kind to the expected
ones.
Also update callback.h and OMPT tests to reflect this change.
Patch prepared by Thyre
Differential Review: https://reviews.llvm.org/D153028
Differential Review: https://reviews.llvm.org/D153031
Differential Review: https://reviews.llvm.org/D153032
OpenMP 5.1 replaced callback ompt_callback_master_t by
ompt_callback_masked_t. In order to stick to the standard,
the implementation is updated accordingly.
Patch prepared by Semih Burak
Differential Revision: https://reviews.llvm.org/D112798
In the functions ompt_multiplex_get_own_ompt_data
and ompt_multiplex_get_client_ompt_data in addition to
data being NULL, also the void pointer field "ptr" of
"data" could be NULL, leading to a subsequent
segfault.
This patch add the corresponding checks.
Patch prepared by Semih Burak
Differential Revision: https://reviews.llvm.org/D112806
The semantic of depend(out:omp_all_memory) is quite similar to taskwait in
that it separates all tasks (with dependency) created before an
all_memory-task from all tasks (with dependency) created after an
all_memory-task.
Only a single of such tasks can execute at a time. Similar to taskwait, we
have a CV (AllMemory[1]) in the generating task to express the dependency
sink semantic of an all_memory-task. In addition, AllMemory[0] describes the
dependency source semantic of an all_memory-task. All tasks with dependency
create an HB-arc towards the sink and terminate an HB-arc from the source.
Since we expect that not many applications will use such dependency, the
support for handling the synchronization semantic is off by default and
can be turned on using ARCHER_OPTION="all_memory=1". The most costly part
is the precautionary posting of an HB-arc towards the sink, which represents
a potentially contentious write from all concurrently executing sibling tasks.
A warning is printed at runtime, when the option is off while such dependency
is observed. In most cases the lazy activation will still lead to false alerts.
Differential Revision: https://reviews.llvm.org/D111895
omp_all_memory currently has no representation in OMPT.
Adding new dependency flags as suggested by omp-lang issue #3007.
Differential Revision: https://reviews.llvm.org/D111788