llvm-capstone

mirror of https://github.com/capstone-engine/llvm-capstone.git synced 2024-11-24 14:20:17 +00:00

Author	SHA1	Message	Date
Atmn Patel	ec8f4a38c8	[OpenMP][Libomptarget] Introduce Remote Offloading Plugin This introduces a remote offloading plugin for libomptarget. This implementation relies on gRPC and protobuf, so this library will only build if both libraries are available on the system. The corresponding server is compiled to `openmp-offloading-server`. This is a large change, but the only way to split this up is into RTL/server but I fear that could introduce an inconsistency amongst them. Ideally, tests for this should be added to the current ones that but that is problematic for at least one reason. Given that libomptarget registers plugin on a first-come-first-serve basis, if we wanted to offload onto a local x86 through a different process, then we'd have to either re-order the plugin list in `rtl.cpp` (which is what I did locally for testing) or find a better solution for runtime plugin registration in libomptarget. Differential Revision: https://reviews.llvm.org/D95314	2021-01-26 15:33:38 -05:00
Atmn	683719bc0c	[OpenMP][Libomptarget] Introduce changes to support remote plugin In order to support remote execution, we need to be able to send the target binary description to the remote host for registration (and consequent deregistration). To support this, I added these two optional new functions to the plugin API: - `__tgt_rtl_register_lib` - `__tgt_rtl_unregister_lib` These functions will be called to properly manage the instance of libomptarget running on the remote host. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D93293	2021-01-26 14:19:27 -05:00
Jon Chesterfield	32cc5564e2	[libomptarget][devicertl][amdgpu] Fix build, variable renaming error	2021-01-26 19:05:21 +00:00
Shilei Tian	7c03f7d7d0	[OpenMP][deviceRTLs] Build the deviceRTLs with OpenMP instead of target dependent language From this patch (plus some landed patches), `deviceRTLs` is taken as a regular OpenMP program with just `declare target` regions. In this way, ideally, `deviceRTLs` can be written in OpenMP directly. No CUDA, no HIP anymore. (Well, AMD is still working on getting it work. For now AMDGCN still uses original way to compile) However, some target specific functions are still required, but they're no longer written in target specific language. For example, CUDA parts have all refined by replacing CUDA intrinsic and builtins with LLVM/Clang/NVVM intrinsics. Here're a list of changes in this patch. 1. For NVPTX, `DEVICE` is defined empty in order to make the common parts still work with AMDGCN. Later once AMDGCN is also available, we will completely remove `DEVICE` or probably some other macros. 2. Shared variable is implemented with OpenMP allocator, which is defined in `allocator.h`. Again, this feature is not available on AMDGCN, so two macros are redefined properly. 3. CUDA header `cuda.h` is dropped in the source code. In order to deal with code difference in various CUDA versions, we build one bitcode library for each supported CUDA version. For each CUDA version, the highest PTX version it supports will be used, just as what we currently use for CUDA compilation. 4. Correspondingly, compiler driver is also updated to support CUDA version encoded in the name of bitcode library. Now the bitcode library for NVPTX is named as `libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc`, such as `libomptarget-nvptx-cuda_80-sm_20.bc`. With this change, there are also multiple features to be expected in the near future: 1. CUDA will be completely dropped when compiling OpenMP. By the time, we also build bitcode libraries for all supported SM, multiplied by all supported CUDA version. 2. Atomic operations used in `deviceRTLs` can be replaced by `omp atomic` if OpenMP 5.1 feature is fully supported. For now, the IR generated is totally wrong. 3. Target specific parts will be wrapped into `declare variant` with `isa` selector if it can work properly. No target specific macro is needed anymore. 4. (Maybe more...) Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94745	2021-01-26 12:28:47 -05:00
George Rokos	94cf89d1c2	[libomptarget][NFC] Fixed obsolete function names in comments	2021-01-26 07:39:42 -08:00
Alexey Bataev	4a63e53373	[LIBOMPTARGET]FIX define declaration, NFC Fixed declaration of define by adding a comma symbol. Required to fix build without profiling.	2021-01-26 07:43:31 -05:00
Johannes Doerfert	8c7fdc4c61	[OpenMP] Add source location information to the libomptarget profile In much of the libomptarget interface we have an ident_t object now, if it is not null we can use it to improve the profile output. For now, we simply use the ident_t "source information string" as generated by the FE. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95282	2021-01-25 22:43:43 -06:00
Shilei Tian	9d64275ae0	[OpenMP] Added the support for hidden helper task in RTL The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks. We first use `pthread_create` to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls `__kmpc_fork_call`. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via `pthread_create` on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want. Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8. Here are some open issues to be discussed: 1. The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here? Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D77609	2021-01-25 22:16:17 -05:00
Jon Chesterfield	357eea6e8b	Revert "[libomptarget][cuda] Gracefully handle missing cuda library" This reverts commit `fafd45c01f`.	2021-01-26 03:14:53 +00:00
Jon Chesterfield	fafd45c01f	[libomptarget][cuda] Gracefully handle missing cuda library If using dynamic cuda, and it failed to load, it is not safe to call cuGetErrorString. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95412	2021-01-26 02:54:00 +00:00
Shilei Tian	3333244d77	[OpenMP][deviceRTLs] Remove omp_is_initial_device `omp_is_initial_device` in device code was implemented as a builtin function in D38968 for a better performance. Therefore there is no chance that this function will be called to `deviceRTLs`. As we're moving to build `deviceRTLs` with OpenMP compiler, this function can lead to a compilation error. This patch just simply removes it. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95397	2021-01-25 18:34:23 -05:00
Shilei Tian	27cc4a8138	[OpenMP][NVPTX] Rewrite CUDA intrinsics with NVVM intrinsics This patch makes prep for dropping CUDA when compiling `deviceRTLs`. CUDA intrinsics are replaced by NVVM intrinsics which refers to code in `__clang_cuda_intrinsics.h`. We don't want to directly include it because in the near future we're going to switch to OpenMP and by then the header cannot be used anymore. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95327	2021-01-25 14:14:30 -05:00
Joseph Huber	93eef7d8e9	[OpenMP][NFC] Fix SourceInfo.h variable names Summary: Fix the names to use Pascal case to comply with the LLVM coding guidelines. `ident_t` is required for compatibility with the rest of libomp.	2021-01-25 12:43:34 -05:00
Jon Chesterfield	95f0d1edaf	[libomptarget] Compile with older cuda, revert D95274 Fixes regression reported in comments of D95274. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95367	2021-01-25 16:12:56 +00:00
Jon Chesterfield	e5e448aafa	[libomptarget][cuda] Fix build, change missed from D95274	2021-01-24 18:30:04 +00:00
Shilei Tian	cfd978d5d3	[OpenMP] Fixed test environment of `check-libomptarget-nvptx` D95161 removed the option `--libomptarget-nvptx-path`, which is used in the tests for `libomptarget-nvptx`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95293	2021-01-24 13:18:33 -05:00
Jon Chesterfield	c3074d48d3	[libomptarget][nvptx] Replace cuda atomic primitives with clang intrinsics [libomptarget][nvptx] Replace cuda atomic primitives with clang intrinsics Tested by diff of IR generated for target_impl.cu before and after. NFC. Part of removing deviceRTL build time dependency on cuda SDK. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95294	2021-01-24 10:59:15 +00:00
Jon Chesterfield	dc70c56be5	[libomptarget][amdgpu][nfc] Update comments Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95295	2021-01-23 22:53:58 +00:00
Jon Chesterfield	78b0630b72	[libomptarget][cuda] Call v2 functions explicitly rtl.cpp calls functions like cuMemFree that are replaced by a macro in cuda.h with cuMemFree_v2. This patch changes the source to use the v2 names consistently. See also D95104, D95155 for the idea. Alternatives are to use a mixture, e.g. call the macro names and explicitly dlopen the _v2 names, or to keep the current status where the symbols are replaced by macros in both files. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95274	2021-01-23 20:33:13 +00:00
Hansang Bae	480cbed31e	[OpenMP] Remove unnecessary pointer checks in a few locations Also, return NULL from unsuccessful OMPT function lookup. Differential Revision: https://reviews.llvm.org/D95277	2021-01-22 19:18:50 -06:00
Jon Chesterfield	47e95e87a3	[libomptarget] Build cuda plugin without cuda installed locally [libomptarget] Build cuda plugin without cuda installed locally Compiles a new file, `plugins/cuda/dynamic_cuda/cuda.cpp`, to an object file that exposes the same symbols that the plugin presently uses from libcuda. The object file contains dlopen of libcuda and cached dlsym calls. Also provides a cuda.h containing the subset that is used. This lets the cmake file choose between the system cuda and a dlopen shim, with no changes to rtl.cpp. The corresponding change to amdgpu is postponed until after a refactor of the plugin to reduce the size of the hsa.h stub required Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95155	2021-01-23 00:15:04 +00:00
Joseph Schuchart	edbcc17b7a	[OpenMP] libomp: properly initialize buckets in __kmp_dephash_extend The buckets are initialized in __kmp_dephash_create but when they are extended the memory is allocated but not NULL'd, potentially leaving some buckets uninitialized after all entries have been copied into the new allocation. This commit makes sure the buckets are properly initialized with NULL before copying the entries. Differential Revision: https://reviews.llvm.org/D95167	2021-01-22 20:29:46 +03:00
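The bucket-initialization fix described in `edbcc17b7a` can be illustrated with a small, self-contained sketch. This is not the libomp code: `Entry`, `HashTable`, `createTable`, `insertEntry`, and `extendTable` are hypothetical names, and the doubling growth policy is an assumption made for the example. It only shows the general pattern of clearing a freshly allocated bucket array before relinking the old entries into it.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical chained hash table, used only to illustrate the pattern.
struct Entry {
  std::size_t Key;
  Entry *Next;
};

struct HashTable {
  Entry **Buckets;
  std::size_t Size;
};

// Allocate a table and clear every bucket.
inline HashTable createTable(std::size_t Size) {
  HashTable T;
  T.Size = Size;
  T.Buckets = static_cast<Entry **>(std::malloc(Size * sizeof(Entry *)));
  for (std::size_t I = 0; I < Size; ++I)
    T.Buckets[I] = nullptr;
  return T;
}

// Push an entry onto the front of its bucket chain.
inline void insertEntry(HashTable &T, Entry *E) {
  std::size_t Idx = E->Key % T.Size;
  E->Next = T.Buckets[Idx];
  T.Buckets[Idx] = E;
}

// Grow the table (doubling is an assumption of this sketch).
inline void extendTable(HashTable &T) {
  std::size_t NewSize = T.Size * 2;
  Entry **NewBuckets =
      static_cast<Entry **>(std::malloc(NewSize * sizeof(Entry *)));
  // malloc does not zero memory: clear every new bucket first, otherwise
  // buckets that receive no entry would hold indeterminate pointers.
  for (std::size_t I = 0; I < NewSize; ++I)
    NewBuckets[I] = nullptr;
  // Relink each old entry into its new bucket.
  for (std::size_t I = 0; I < T.Size; ++I) {
    Entry *E = T.Buckets[I];
    while (E) {
      Entry *Next = E->Next;
      std::size_t Idx = E->Key % NewSize;
      E->Next = NewBuckets[Idx];
      NewBuckets[Idx] = E;
      E = Next;
    }
  }
  std::free(T.Buckets);
  T.Buckets = NewBuckets;
  T.Size = NewSize;
}
```

Without the clearing loop in `extendTable`, a later lookup that walks an empty bucket would follow a garbage pointer, which is the class of bug the commit message describes.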
Jon Chesterfield	9b19ecb8f1	[libomptarget][devicertl] Drop templated atomic functions The five __kmpc_atomic templates are instantiated a total of seven times. This change replaces the template with explicitly typed functions, which have the same prototype for amdgcn and nvptx, and implements them with the same code presently in use. Rolls in the accepted but not yet landed D95085. The unsigned long long type can be replaced with uint64_t when replacing the cuda function. Until then, clang warns on casting a pointer to one to a pointer to the other. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95093	2021-01-22 14:48:22 +00:00
Joseph Huber	119a9ea13f	[OpenMP] Fix failing test due to change in offloading flags Summary: Prior to D91261 the information checked the OMP_MAP_TARGET_PARAM flag, change this as it has been removed. The INFO macro was changed to accept a flag as input to make conditionally printing information easier. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95133	2021-01-21 14:09:36 -05:00
Giorgis Georgakoudis	6b7645dd31	[OpenMP] Add time profiling support in libomp Profiling has been recently implemented in libomptarget (D93055). This patch enables time profiling support for libomptarget in libomp, to support profiling of multi-threaded execution of offloaded regions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94855	2021-01-21 09:15:14 -08:00
Shilei Tian	48c54f0f62	[OpenMP][NVPTX] Added forward declaration for atomic operations Pretty similar to D95058, this patch added forward declaration for CUDA atomic functions. We already have definitions with right mangled names in internal CUDA headers so the forward declaration here can work properly. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D95085	2021-01-21 10:37:16 -05:00
Joseph Huber	e4eaf9d820	[OpenMP] Add support for mapping names in mapper API Summary: The custom mapper API did not previously support the mapping names added previously. This means they were not present if a user requested debugging information while using the mapper functions. This adds basic support for passing the mapped names to the runtime library. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D94806	2021-01-21 09:26:44 -05:00
Shilei Tian	33a5d212c6	[OpenMP][NVPTX] Added forward declaration to pave the way for building deviceRTLs with OpenMP Once we switch to build deviceRTLs with OpenMP, primitives and CUDA intrinsics cannot be used directly anymore because `__device__` is not recognized by OpenMP compiler. To avoid involving all CUDA internal headers we had in `clang`, we forward declared these functions. Eventually they will be transformed into the right LLVM intrinsics. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95058	2021-01-20 15:56:02 -05:00
Jon Chesterfield	fbc1dcb946	[libomptarget][devicertl][nfc] Simplify target_atomic abstraction [libomptarget][devicertl][nfc] Simplify target_atomic abstraction Atomic functions were implemented as a shim around cuda's atomics, with amdgcn implementing those symbols as a shim around gcc style intrinsics. This patch folds target_atomic.h into target_impl.h and folds amdgcn. Further work is likely to be useful here, either changing to openmp's atomic interface or instantiating the templates on the few used types in order to move them into a cuda/c++ implementation file. This change is mostly to group the remaining uses of the cuda api under nvptx' target_impl abstraction. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95062	2021-01-20 19:50:50 +00:00
Jon Chesterfield	ea616f9026	[libomptarget][devicertl][nfc] Remove some cuda intrinsics, simplify Replace __popc, __ffs with clang intrinsics. Move kmpc_impl_min to only file that uses it and replace template with explicitly typed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D95060	2021-01-20 19:45:05 +00:00
Shilei Tian	fd70f70d1e	[OpenMP][NVPTX] Replaced CUDA builtin vars with LLVM intrinsics Replaced CUDA builtin vars with LLVM intrinsics such that we don't need definitions of those intrinsics. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95013	2021-01-20 12:02:06 -05:00
Jon Chesterfield	e069662deb	[libomptarget][devicertl] Wrap source in declare target pragmas [libomptarget][devicertl] Wrap source in declare target pragmas Factored out of D93135 / D94745. C++ and cuda ignore unknown pragmas so this is a NFC for the current implementation language. Removes noise from patches for building deviceRTL as openmp. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95048	2021-01-20 15:50:41 +00:00
Hansang Bae	2d911f7c72	[OpenMP] Fix atomic entries for captured logical operation Added missing code for the captured atomic operation. Differential Revision: https://reviews.llvm.org/D94848	2021-01-19 09:59:28 -06:00
AndreyChurbanov	a60bc55c69	[OpenMP] libomp: cleanup parsing of OMP_ALLOCATOR env variable. Differential Revision: https://reviews.llvm.org/D94932	2021-01-19 16:21:22 +03:00
Kelvin Li	9d81073acb	[OpenMP][Docs] Fix typos in FAQ (NFC)	2021-01-18 18:55:58 -05:00
AndreyChurbanov	aa3a59e0c6	[OpenMP][NFC] Fix test The test fails if memkind library is accessible.	2021-01-19 00:05:34 +03:00
Shilei Tian	9bf843bdc8	Revert "[OpenMP] Added the support for hidden helper task in RTL" This reverts commit `ed939f853d`.	2021-01-18 06:57:52 -05:00
Chandler Carruth	f855751c12	Fix openmp CMake build on non-Linux AArch64 systems. This just checks for `/proc/cpuinfo` existing before reading it. Tested on an ARM macOS machine.	2021-01-17 16:18:31 -08:00
Shilei Tian	ed939f853d	[OpenMP] Added the support for hidden helper task in RTL The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks. We first use `pthread_create` to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls `__kmpc_fork_call`. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via `pthread_create` on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want. Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8. Here are some open issues to be discussed: 1. The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here? Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D77609	2021-01-16 14:13:35 -05:00
Jon Chesterfield	214387c2c6	[libomptarget][nvptx] Reduce calls to cuda header [libomptarget][nvptx] Reduce calls to cuda header Remove use of clock_t in favour of a builtin. Drop a preprocessor branch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94731	2021-01-15 02:16:33 +00:00
Jon Chesterfield	6e7094c14b	[libomptarget][nvptx][nfc] Move target_impl functions out of header [libomptarget][nvptx][nfc] Move target_impl functions out of header This removes most of the differences between the two target_impl.h. Also change name mangling from C to C++ for __kmpc_impl_*_lock. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D94728	2021-01-15 00:19:48 +00:00
Shilei Tian	547b032ccc	[OpenMP] Remove omptarget-nvptx from deps as it is no longer a valid target `omptarget-nvptx` is still a dependence for `check-libomptarget-nvptx` although it has been removed by D94573. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94725	2021-01-14 19:16:11 -05:00
Shilei Tian	64e9e9aeee	[OpenMP] Dropped unnecessary define when compiling deviceRTLs for NVPTX The comment said CUDA 9 header files use the `nv_weak` attribute which `clang` is not yet prepared to handle. It's three years ago and now things have changed. Based on my test, removing the definition doesn't have any problem on my machine with CUDA 11.1 installed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94700	2021-01-14 13:55:12 -05:00
Shilei Tian	763c1f9933	[OpenMP] Drop the static library libomptarget-nvptx For NVPTX target, OpenMP provides a static library `libomptarget-nvptx` built by NVCC, and another bitcode `libomptarget-nvptx-sm_{$sm}.bc` generated by Clang. When compiling an OpenMP program, the `.bc` file will be fed to `clang` in the second run on the program that compiles the target part. Then the generated PTX file will be fed to `ptxas` to generate the object file, and finally the driver invokes `nvlink` to generate the binary, where the static library will be appended to `nvlink`. One question is, why do we need two libraries? The only difference is, the static library contains `omp_data.cu` and the bitcode library doesn't. It's unclear why they were implemented in this way, but per D94565, there is no issue if we also include the file into the bitcode library. Therefore, we can safely drop the static library. This patch is about the change in OpenMP. The driver will be updated as well if this patch is accepted. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D94573	2021-01-14 13:34:25 -05:00
Jon Chesterfield	5d165f0b89	[libomptarget][amdgpu] Fix kernel launch tracing to match previous behavior Restore control of kernel launch tracing to be >= 1 as it was before export LIBOMPTARGET_KERNEL_TRACE=1 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94695	2021-01-14 18:13:22 +00:00
Terry Wilmarth	4fe17ada55	[OpenMP] Fix hierarchical barrier Hierarchical barrier is an experimental barrier algorithm that uses aspects of machine hierarchy to define the barrier tree structure. This patch fixes offset calculation in hierarchical barrier. The offset is used to store info on a flag about sleeping threads waiting on a location stored in the flag. This commit also fixes a potential deadlock in hierarchical barrier when using infinite blocktime by adjusting the offset value of leaf kids so that it matches the value of leaf state. It also adds testing of default barriers with infinite blocktime, and also tests hierarchical barrier algorithm with both default and infinite blocktime. Patch by Terry Wilmarth and Nawrin Sultana. Differential Revision: https://reviews.llvm.org/D94241	2021-01-13 10:22:57 -06:00
Joseph Huber	a957634942	[OpenMP] Add documentation for error messages and release notes Add extra information to the runtime page describing the error messages and add information to the release notes for clang 12.0 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94562	2021-01-13 11:00:41 -05:00
Jon Chesterfield	84e0b14a0a	[libomptarget][nvptx] Include omp_data.cu in bitcode deviceRTL Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D94565	2021-01-13 03:51:11 +00:00
Hansang Bae	bba3a82b56	[OpenMP] Use persistent memory for omp_large_cap_mem This change enables volatile use of persistent memory for omp_large_cap_mem* on supported systems. It depends on libmemkind's support for persistent memory, and requirements/details can be found at the following url. https://pmem.io/2020/01/20/memkind-dax-kmem.html Differential Revision: https://reviews.llvm.org/D94353	2021-01-12 20:35:27 -06:00
Hansang Bae	6f0f022038	[OpenMP] Update allocator trait key/value definitions Use new definitions introduced in 5.1 specification. Differential Revision: https://reviews.llvm.org/D94277	2021-01-12 20:09:45 -06:00
Shilei Tian	01f1273fe2	[OpenMP] Fixed a typo in openmp/CMakeLists.txt	2021-01-12 17:00:49 -05:00
Shilei Tian	68ff52ffea	[OpenMP] Fixed the link error that cannot find static data member Constant static data member can be defined in the class without another define after the class in C++17. Although it is C++17, Clang can still handle it even w/o the flag for C++17. Unluckily, GCC cannot handle that. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D94541	2021-01-12 16:48:28 -05:00
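The language rule behind `68ff52ffea` can be shown in a few lines. This is a minimal sketch, not the patched code: `HelperConfig`, `Threshold`, and `thresholdRef` are hypothetical names. It assumes the member in question is a `constexpr` static data member, which is implicitly inline only from C++17 onward; before that, odr-using it without an out-of-class definition can produce an undefined-reference link error.

```cpp
// Hypothetical class, not taken from the patch.
struct HelperConfig {
  static constexpr int Threshold = 8; // in-class declaration with initializer
};

// Out-of-class definition: required before C++17 whenever the member is
// odr-used; redundant (but still accepted) from C++17 onward.
constexpr int HelperConfig::Threshold;

// Binding a reference odr-uses the member, which is what needs the definition.
inline const int &thresholdRef() { return HelperConfig::Threshold; }
```

Keeping the extra definition makes the code link the same way regardless of which language standard the compiler defaults to.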
Jon Chesterfield	33e2494bea	[libomptarget][amdgpu][nfc] Fix build on centos [libomptarget][amdgpu][nfc] Fix build on centos rtl.cpp replaced 224 with a #define from elf.h, but that doesn't work on a centos 7 build machine with an old elf.h Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D94528	2021-01-12 19:40:03 +00:00
Shilei Tian	bdd1ad5e5c	[OpenMP] Fixed include directories for OpenMP when building OpenMP with LLVM_ENABLE_RUNTIMES Some LLVM headers are generated by CMake. Before the installation, LLVM's headers are distributed everywhere, some of which are in `${LLVM_SRC_ROOT}/llvm/include/llvm`, and some are in `${LLVM_BINARY_ROOT}/include/llvm`. After installation, they're all in `${LLVM_INSTALLATION_ROOT}/include/llvm`. OpenMP now depends on LLVM headers. Some headers depend on headers generated by CMake. When building OpenMP along with LLVM, a.k.a via `LLVM_ENABLE_RUNTIMES`, we need to tell OpenMP where it can find those headers, especially those that have not yet been copied/installed. Reviewed By: jdoerfert, jhuber6 Differential Revision: https://reviews.llvm.org/D94534	2021-01-12 14:32:38 -05:00
Shilei Tian	0871d6d516	[OpenMP] Move memory manager to plugin and make it a common interface The lifetime of `libomptarget` and its opened plugins are not aligned and it's hard for `libomptarget` to determine when the plugins are destroyed. As a result, some issues (see D94256 for details) occur on some platforms. Actually, if we take target memory as target resources, same as other resources, such as CUDA streams, in each plugin, then the memory manager should also be in the plugin. Also considering some platforms may want to opt out the feature, it makes sense to move the memory manager to plugin, make it a common interface, and let plugin developers determine whether they need it. This is what this patch does. CUDA plugin is taken as example to show how to integrate it. In this way, we can also get a bonus that different thresholds can be set for different platforms. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D94379	2021-01-11 21:33:42 -05:00
Shilei Tian	a81c68ae6b	[OpenMP] Take elf_common.c as a interface library For now `elf_common.c` is taken as a common part included into different plugin implementations directly via `#include "../../common/elf_common.c"`, which is not a best practice. Since it is simple enough such that we don't need to create a real library for it, we just take it as a interface library so that other targets can link it directly. Another advantage of this method is, we don't need to add the folder into header search path which can potentially pollute the search path. VE and AMD platforms have not been tested because I don't have target machines. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94443	2021-01-11 17:34:26 -05:00
Shilei Tian	7be3285248	[OpenMP] Not set OPENMP_STANDALONE_BUILD=ON when building OpenMP along with LLVM For now, `*_STANDALONE_BUILD` is set to ON even if they're built along with LLVM because of issues mentioned in the comments. This can cause some issues. For example, if we build OpenMP along with LLVM, we'd like to copy those OpenMP headers to `<prefix>/lib/clang/<version>/include` such that `clang` can find those headers without using `-I <prefix>/include` because those headers will be copied to `<prefix>/include` if it is built standalone. In this patch, we fixed the dependence issue in OpenMP such that it can be built correctly even with `OPENMP_STANDALONE_BUILD=OFF`. The issue is in the call to `add_lit_testsuite`, where `clang` and `clang-resource-headers` are passed as `DEPENDS`. Since we're building OpenMP along with LLVM, `clang` is set by CMake to be the C/C++ compiler, therefore these two dependences are no longer needed, where caused the dependence issue. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93738	2021-01-10 16:46:19 -05:00
Shilei Tian	175c336a1c	[OpenMP] Remove copy constructor of `RTLInfoTy` Multiple `RTLInfoTy` objects are stored in a list `AllRTLs`. Since `RTLInfoTy` contains a `std::mutex`, it is by default not a copyable object. In order to support `AllRTLs.push_back(...)` which is currently used, a customized copy constructor is provided. Every time we need to add a new data member into `RTLInfoTy`, we should keep in mind not forgetting to add corresponding assignment in the copy constructor. In fact, the only use of the copy constructor is to push the object into the list, we can of course write it in a way that first emplace a new object back, and then use the reference to the last element. In this way we don't need the copy constructor anymore. If the element is invalid, we just need to pop it, and that's what this patch does. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D94361	2021-01-09 13:01:01 -05:00
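The emplace-then-pop pattern described in `175c336a1c` can be sketched as follows. This is an illustration only: `PluginInfo`, `loadPlugin`, and `registerPlugin` are hypothetical stand-ins rather than the actual `RTLInfoTy` code, and the validity check is a placeholder for whatever the real loader verifies.

```cpp
#include <list>
#include <mutex>
#include <string>

// Hypothetical stand-in for RTLInfoTy: the std::mutex member makes the
// type non-copyable and non-movable.
struct PluginInfo {
  std::string Name;
  bool Valid = false;
  std::mutex Mtx;
};

// Hypothetical loader: treats an empty name as a failed load.
inline bool loadPlugin(PluginInfo &Info, const std::string &Name) {
  Info.Name = Name;
  Info.Valid = !Name.empty();
  return Info.Valid;
}

// Construct the element in place, fill it in through a reference, and pop
// it again if it turns out to be invalid. No copy constructor is needed.
inline bool registerPlugin(std::list<PluginInfo> &All,
                           const std::string &Name) {
  All.emplace_back();
  PluginInfo &Info = All.back();
  if (!loadPlugin(Info, Name)) {
    All.pop_back();
    return false;
  }
  return true;
}
```

Because the element is never copied, adding a new data member to the struct no longer requires touching a hand-written copy constructor.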
Shilei Tian	676c7cb0c0	[OpenMP] Added the support for cache line size 256 for A64FX Fugaku supercomputer is built with the Fujitsu A64FX microprocessor, whose cache line is 256. In current libomp, we only have cache line size 128 for PPC64 and otherwise 64. This patch added the support of cache line 256 for A64FX. It's worth noting that although A64FX is a variant of AArch64, this property is not shared. As a result, in light of UCX source code (`392443ab92/src/ucs/arch/aarch64/cpu.c (L17)`), we can only determine by checking whether the CPU is FUJITSU A64FX. Reviewed By: jdoerfert, Hahnfeld Differential Revision: https://reviews.llvm.org/D93169	2021-01-09 11:58:47 -05:00
Joseph Huber	2ce16810f2	[OpenMP] Always print error messages in libomptarget CUDA plugin Summary: Currently error messages from the CUDA plugins are only printed to the user if they have debugging enabled. Change this behaviour to always print the messages that result in offloading failure. This improves the error messages by indicating what happened when the error occurs in the plugin library, such as a segmentation fault on the device. Reviewed by: jdoerfert Differential Revision: https://reviews.llvm.org/D94263	2021-01-07 17:47:32 -05:00
Johannes Doerfert	9ae171bcd3	[OpenMP][Docs] Add remarks intro section Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D93735	2021-01-07 14:31:17 -06:00
Joseph Huber	abb174bbc1	[OpenMP] Add example in Libomptarget Information docs Add an example to the OpenMP Documentation on the LIBOMPTARGET_INFO environment variable Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94246	2021-01-07 15:00:51 -05:00
Hansang Bae	fb1c528526	[OpenMP] Use c_int/c_size_t in Fortran target memory routine interface The Fortran interface is now in line with 5.1 specification. Differential Revision: https://reviews.llvm.org/D94042	2021-01-06 16:28:30 -06:00
Shilei Tian	5acdae1f9a	[OpenMP] Fixed an issue that wrong LLVM headers might be included when building libomptarget Wrong LLVM headers might be included if we don't set `include_directories` to a right place. This will cause a compilation error if LLVM is installed in system directories. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93737	2021-01-06 17:07:36 -05:00
Shilei Tian	e2a623094f	[OpenMP] Fixed the test environment when building along with LLVM Currently all built libraries in OpenMP are anywhere if building along with LLVM. It is not an issue if we don't execute any test. However, almost all tests for `libomptarget` fail because in the lit configuration, we only set `<build_dir>/libomptarget` to `LD_LIBRARY_PATH` and `LIBRARY_PATH`. Since those libraries are everywhere, `clang` can no longer find `libomptarget.so` or those deviceRTLs. In this patch, we set a unified path for all built libraries, no matter whether it is built along with LLVM or not. In this way, our lit configuration can work properly. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93736	2021-01-06 17:06:16 -05:00
George Rokos	dec02904d2	[libomptarget] Allow calls to omp_target_memcpy with 0 size. Differential Revision: https://reviews.llvm.org/D94095	2021-01-05 16:03:53 -08:00
Joseph Huber	fe5d51a489	[OpenMP] Add using bit flags to select Libomptarget Information Summary: This patch adds more fine-grained support over which information is output from the libomptarget runtime when run with the environment variable LIBOMPTARGET_INFO set. An extensible set of flags can be used to pick and choose which information the user is interested in. Reviewers: jdoerfert JonChesterfield grokos Differential Revision: https://reviews.llvm.org/D93727	2021-01-04 12:03:15 -05:00
Jon Chesterfield	76bfbb74d3	[libomptarget][amdgpu] Call into deviceRTL instead of ockl Amdgpu codegen presently emits a call into ockl. The same functionality is already present in the deviceRTL. Adds an amdgpu specific entry point to avoid the dependency. This lets simple openmp code (specifically, that which doesn't use libm) run without rocm device libraries installed. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D93356	2021-01-04 16:48:47 +00:00
Hansang Bae	82a29a62ab	[OpenMP] Add definition/interface for target memory routines The change includes new routines introduced in 5.1 and Fortran interface. Differential Revision: https://reviews.llvm.org/D93505	2021-01-04 08:12:57 -06:00
Terry Wilmarth	6b316febb4	[OpenMP] libomp: Handle implicit conversion warnings This patch partially prepares the runtime source code to be built with -Wconversion, which should trigger warnings if any implicit conversions can possibly change a value. For builds done with icc or gcc, all such warnings are handled in this patch. clang gives a much longer list of warnings, particularly for sign conversions, which the other compilers don't report. The -Wconversion flag is commented into cmake files, but I'm not going to turn it on. If someone thinks it is important, and wants to fix all the clang warnings, they are welcome to. Types of changes made here involve either improving the consistency of types used so that no conversion is needed, or else performing careful explicit conversions, when we're sure a problem won't arise. Patch is a combination of changes by Terry Wilmarth and Johnny Peyton. Differential Revision: https://reviews.llvm.org/D92942	2020-12-31 00:39:57 +03:00
Joseph Huber	631501b1f9	[OpenMP] Fixing typo on memory size in Documentation	2020-12-23 11:46:26 -05:00
Joseph Huber	6e60346495	[OpenMP] Fixing Typo in Documentation	2020-12-23 09:17:51 -05:00
Joseph Huber	1c19804ebf	[OpenMP] Add OpenMP Documentation for Libomptarget environment variables Add support to the OpenMP web pages for environment variables supported by Libomptarget and their usage. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93723	2020-12-22 17:41:27 -05:00
Johannes Doerfert	7b0f9dd79a	[OpenMP][Docs] Fix Typo	2020-12-22 13:06:23 -06:00
Shilei Tian	1eb082c2ea	[OpenMP][Docs] Fixed a typo in the doc that can mislead users to a CMake error When setting `LLVM_ENABLE_RUNTIMES`, lower case word should be used; otherwise, it can cause a CMake error that specific path is not found. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D93719	2020-12-22 14:05:58 -05:00
Johannes Doerfert	9cb748724e	[OpenMP][Docs] Add FAQ entry about math and complex on GPUs Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D93718	2020-12-22 13:05:04 -06:00
Shilei Tian	612ddc3117	[OpenMP][Docs] Updated the faq about building an OpenMP offloading capable compiler After some issues about building runtimes along with LLVM were fixed, building an OpenMP offloading capable compiler is pretty simple. This patch updates the FAQ part in the doc. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93671	2020-12-22 13:14:53 -05:00
Johannes Doerfert	994bb6eb7d	[OpenMP][NFC] Provide a new remark and documentation If a GPU function is externally reachable we give up trying to find the (unique) kernel it is called from. This can hinder optimizations. Emit a remark and explain mitigation strategies. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D93439	2020-12-17 14:38:26 -06:00
Hansang Bae	e1fd202489	[OpenMP] Add definitions for 5.1 interop to omp.h	2020-12-17 13:03:59 -06:00
Atmn	907886cc5b	[OpenMP][Libomptarget][NFC] Use CMake Variables This patch adds CMake variables to add subdirectories and include directories for libomptarget and explicitly gives the location of source files. Differential Revision: https://reviews.llvm.org/D93290	2020-12-16 19:05:15 -05:00
Jon Chesterfield	b607837c75	[libomptarget][nfc] Replace static const with enum Semantically identical. Replaces 0xff... with ~0 to spare counting the f. Has the advantage that the compiler doesn't need to prove the 4/8 byte value dead before discarding it, and sidesteps the compilation question associated with what static means for a single source language. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93328	2020-12-16 16:40:37 +00:00
Peyton, Jonathan L	5aafdd7b88	[OpenMP] Introduce new file wrapper class for runtime Introduce new kmp_safe_raii_file_t class with RAII semantics for file open/close. It is essentially a wrapper around the C-style FILE* object. This also unifies the way we error report if a file can't be opened. Differential Revision: https://reviews.llvm.org/D92604	2020-12-15 14:46:30 -06:00
Hansang Bae	171ca93c54	[OpenMP] Initialize runtime in the forked child process This patch enables serial initialization in the forked child process to fix unstable runtime behavior when used with Python-based AI tools. Differential Revision: https://reviews.llvm.org/D93230	2020-12-15 07:29:28 -06:00
Giorgis Georgakoudis	e007b32864	[OpenMP] Add time profiling for libomptarget Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93055	2020-12-11 18:53:37 -08:00
Jon Chesterfield	ce93de3bb2	[libomptarget][nfc] Remove data_sharing type aliasing Libomptarget previously used __kmpc_data_sharing_slot to access values of type __kmpc_data_sharing_{worker,master}_slot_static. This aliasing violation was benign in practice. The master type has since been removed, so a single type can be used instead. This is particularly helpful for the transition to an openmp deviceRTL, as the c++/openmp compiler for amdgcn currently rejects the flexible array member for being an incomplete type. Serves the same purpose as abandoned D86324. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D93075	2020-12-11 02:13:34 +00:00
Hansang Bae	c3b5009aa7	[OpenMP] Use RTM lock for OMP lock with synchronization hint This patch introduces a new RTM lock type based on spin lock which is used for OMP lock with speculative hint on supported architecture. Differential Revision: https://reviews.llvm.org/D92615	2020-12-09 19:14:53 -06:00
Nawrin Sultana	540007b427	[OpenMP] Add strict mode in num_tasks and grainsize This patch adds a new API __kmpc_taskloop_5 to accommodate the strict modifier (introduced in OpenMP 5.1) in the num_tasks and grainsize clauses. Differential Revision: https://reviews.llvm.org/D92352	2020-12-09 16:46:30 -06:00
Peyton, Jonathan L	fe3b244ef7	[OpenMP] Fix norespect affinity bug for Windows KMP_AFFINITY=norespect was triggering an error because the underlying process affinity mask was not updated to include the entire machine. The Windows documentation states that the thread affinities must be subsets of the process affinity. This patch also moves the printing (for KMP_AFFINITY=verbose) of whether the initial mask was respected out of each topology detection function and to one location where the initial affinity mask is read. Differential Revision: https://reviews.llvm.org/D92587	2020-12-09 14:32:48 -06:00
Peyton, Jonathan L	9b7d6a6bff	[OpenMP] Fix too long name for shm segment on macOS Remove the user id component from the shm segment name and just use the pid like before. Differential Revision: https://reviews.llvm.org/D92660	2020-12-09 14:31:15 -06:00
Jon Chesterfield	7c59614394	[libomptarget][amdgpu] clang-format src/rtl.cpp	2020-12-09 19:45:51 +00:00
Jon Chesterfield	c9bc414840	[libomptarget][amdgpu] Let default number of teams equal number of CUs	2020-12-09 19:35:34 +00:00
Jon Chesterfield	e191d31159	[libomptarget][amdgpu] Robust handling of device_environment symbol	2020-12-09 19:21:51 +00:00
Jon Chesterfield	cab9f69235	[libomptarget][amdgpu] Improve diagnostics on arch mismatch	2020-12-09 18:55:53 +00:00
Giorgis Georgakoudis	18dff28958	[OpenMP] Add doxygen generation for the runtime Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D92779	2020-12-08 16:20:45 -08:00
AndreyChurbanov	fff1abc406	[OpenMP] NFC: comment adjusted	2020-12-07 19:50:14 +03:00
AndreyChurbanov	22558c8501	[OpenMP] libomp: Fix possible NULL dereferences Check pointer returned by strchr, as it can be NULL in case of broken format of input string. Introduced new function __kmp_str_loc_numbers for fast parsing of numbers only in the location string. Also made some cleanup of __kmp_str_loc_init declaration and usage: - changed type of init_fname parameter to bool; - changed input from true to false in places where fname is not used. Differential Revision: https://reviews.llvm.org/D90962	2020-12-07 19:09:07 +03:00
Jon Chesterfield	71f4693020	[libomptarget][amdgpu] Add plumbing to call into hostrpc lib, if linked	2020-12-07 15:24:01 +00:00
Jon Chesterfield	e1b8e8a1f4	[libomptarget][amdgpu] Skip device_State allocation when using bss global	2020-12-06 12:13:56 +00:00
Joachim Protze	a148216b31	[OpenMP][OMPT] Fix OMPT return address guard for gomp interface D91692 missed various locations in kmp_gsupport, where the scope for OMPT_STORE_RETURN_ADDRESS is too narrow, i.e. the scope ends before the OMPT callback is called in some nested function. This patch fixes the scoping issue, so that all OMPT tests pass, when the tests are built with gcc. Differential Revision: https://reviews.llvm.org/D92121	2020-12-05 19:06:28 +01:00
Joachim Protze	d3ec512b1d	[OpenMP][OMPT] Make sure that 0 is never used as ID in tests (NFC)	2020-12-04 18:41:56 +01:00


1548 Commits