We use `find_program` to identify a few programs we use for offloading.
Namely, `clang-offload-packager`, `amdgpu-arch`, and `nvptx-arch`.
Currently the logic allows these to bind to any tool matching the name,
so it will pick up whatever copy exists on the system. This meant that if
that installation was deleted or the binary it found was broken, the
compilation would fail. We should only pull these from the current LLVM
binary directory.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D158203
This patch moves the storage from inside the libc's optional class to
its own set of classes, so we can support non-trivially destructible
objects.
These new classes check whether the type is trivially destructible and
instantiate the matching base class, i.e., we explicitly call the
destructor if an object is not trivially destructible.
The motivation is to support cpp::optional<UInt<128>> (used by
UInt<T>::div), which is used when a platform does not support native
int128_t types (e.g., riscv32).
The code here is a trimmed-down version of llvm::optional.
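A minimal sketch of the idea described above, not the code from the patch:
the storage is a separate class template that is specialized on
`std::is_trivially_destructible`, and only the non-trivial specialization
gets a destructor that destroys the stored value. The names
`OptionalStorage`, `Optional`, and `Counted` are illustrative assumptions.

```cpp
#include <cassert>
#include <new>
#include <type_traits>
#include <utility>

// Storage used when T is trivially destructible: no destructor is declared,
// so the storage (and the optional built on it) stays trivially destructible.
template <typename T, bool = std::is_trivially_destructible<T>::value>
struct OptionalStorage {
  union {
    char empty;
    T stored_value;
  };
  bool in_use = false;

  constexpr OptionalStorage() : empty() {}
  void reset() { in_use = false; }
};

// Storage used when T is not trivially destructible: the destructor of the
// stored value must be called explicitly.
template <typename T> struct OptionalStorage<T, false> {
  union {
    char empty;
    T stored_value;
  };
  bool in_use = false;

  OptionalStorage() : empty() {}
  ~OptionalStorage() { reset(); }
  void reset() {
    if (in_use)
      stored_value.~T();
    in_use = false;
  }
};

// Trimmed-down optional that delegates lifetime handling to the storage.
template <typename T> class Optional {
  OptionalStorage<T> storage;

public:
  Optional() = default;

  template <typename... Args> void emplace(Args &&...args) {
    storage.reset();
    ::new (static_cast<void *>(&storage.stored_value))
        T(std::forward<Args>(args)...);
    storage.in_use = true;
  }
  bool has_value() const { return storage.in_use; }
  T &value() { return storage.stored_value; }
  void reset() { storage.reset(); }
};

// Hypothetical helper type that counts how often its destructor runs.
struct Counted {
  int *counter;
  explicit Counted(int *c) : counter(c) {}
  ~Counted() { ++*counter; }
};
```

With this split, `Optional<int>` remains trivially destructible while
`Optional<Counted>` runs the destructor of the contained object exactly once.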
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D150211
limits.h currently interferes with Clang's limits.h. include_next
emits a warning because it is a GNU extension. We will re-add this once
we figure out a good solution.
This reverts commits 13bbca8d69,
002cba0329, and
0fb3066873.
This patch implements the `fopen`, `fclose`, and `fread` functions on
the GPU. These are pretty much re-implemented from what existed but
using the new interface. Having this subset allows us to test the
interface a bit more strenuously since we can write to and read from a file.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D157622
This fixes the following compilation error: no known conversion from 'off_t *'
(aka 'long long *') to 'long' for 5th argument.
Since pointers are 32 bits wide here anyway, casting it to long shouldn't be a
problem. Tested on rv32.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D157792
The LLVM-libc build itself will override include paths and prefer its
own limits.h over the compiler's limits.h. Because we rely on the
compiler limits.h for numerical limits in LLVM-libc it needs to be
pulled in with include_next if not already included. The other method to work
around this is to define all numeric macros in place.
Signed-off-by: Alfred Persson Forsberg <cat@catcream.org>
Reviewed By: thesamesam
Differential Revision: https://reviews.llvm.org/D158040
To guarantee accuracy for all potential float values, this patch adds a
fuzzer to compare the results for float conversions from our printf
against MPFR's.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D156495
Fuzzing revealed bugs in the %e and %g conversions. Since these are very
similar, they are grouped together. Again, most of the bugs were related
to rounding. As an example, previously the code to check if the number
was truncated only worked for digits below the decimal point, due to it
being originally designed for %f. This patch adds a mechanism to check
the digits above the decimal point for both %e and %g.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D157536
Fuzzing revealed several bugs in the %f float conversion. This patch
fixes them. Most of these bugs are related to rounding, such as
1.999...999 being rounded to 2.999...999 instead of 2.000...000 due to
rounding up not properly changing the nines to zeros. Additionally, much
of the rounding infrastructure has been refactored out so it can be
shared with the other conversions.
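To illustrate the rounding bug described above, here is a small sketch of
carry propagation on a decimal digit string. It is not the printf
implementation; `round_up_last_digit` is a hypothetical helper that only
shows how rounding up has to turn trailing nines into zeros.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Rounds a decimal digit string up by one unit in its last place. The decimal
// point is skipped. Every '9' met on the way becomes '0' and the carry moves
// one digit to the left; if the carry leaves the first digit, a '1' is
// prepended.
std::string round_up_last_digit(std::string digits) {
  for (std::size_t i = digits.size(); i-- > 0;) {
    if (digits[i] == '.')
      continue;
    if (digits[i] != '9') {
      ++digits[i]; // The carry is absorbed here, so rounding is finished.
      return digits;
    }
    digits[i] = '0'; // A nine rolls over and the carry keeps moving left.
  }
  digits.insert(digits.begin(), '1');
  return digits;
}
```

For example, `round_up_last_digit("1.999")` yields `"2.000"` rather than
`"2.999"`, and `round_up_last_digit("9.99")` yields `"10.00"`.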
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D157535
The trailing zeroes were previously not counted when calculating the
padding, which caused a high-precision number to get too much padding.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D157534
In the same way that get_explicit_mantissa is used to get the mantissa
with all the implicit bits spelled out, get_explicit_exponent gives you
the exponent with the special cases handled. Mainly it handles the cases
where the exponent is zero, which causes the exponent to either be 1
higher than expected, or just 0.
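A rough sketch of the behaviour described above for IEEE 754 single
precision. This is an illustration under assumptions, not the FPBits code:
`explicit_exponent` and the constants are made-up names, and infinities and
NaNs are not given special treatment.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

constexpr int EXPONENT_BIAS = 127;  // Bias of the 8-bit float exponent.
constexpr int MANTISSA_WIDTH = 23;  // Stored mantissa bits of a float.

// Returns the unbiased exponent with the zero-exponent cases handled.
int explicit_exponent(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  int biased = static_cast<int>((bits >> MANTISSA_WIDTH) & 0xFF);
  std::uint32_t mantissa = bits & ((1u << MANTISSA_WIDTH) - 1);

  if (biased == 0) {
    if (mantissa == 0)
      return 0; // Zero: report an exponent of 0.
    // Subnormal: there is no implicit leading bit, so the exponent is one
    // higher than naively subtracting the bias from 0 would suggest.
    return 1 - EXPONENT_BIAS;
  }
  return biased - EXPONENT_BIAS;
}
```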
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D157156
Summary:
We implement round by implicitly converting these floating point values.
Sometimes this emits warnings that we should silence by making these
explicit casts.
The port count and index into the ports was originally written as a
64-bit number. This was out of an abundance of caution, however it's
highly unlikely that any configuration will exceed a 32-bit number as
most machines will require something in the low-thousands. Because GPUs
are functionally 32-bit in many of their operations this costs us some
extra time and registers to do the 64-bit operations. Doing this saves
us about four registers in most tests.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D157980
This header contains implementation specific constants.
The compiler already provides its own limits.h with numerical limits
conforming to freestanding ISO C. But it is missing extensions like
POSIX, and does for example not include <linux/limits.h> which is
expected on a Linux system, therefore, an LLVM libc implementation of
limits.h is needed for hosted (__STDC_HOSTED__) environments.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D156961
This code is creating a warning on Fuchsia; this patch should fix that
warning.
Reviewed By: mcgrathr
Differential Revision: https://reviews.llvm.org/D157546
Nvidia uses a 32-bit mask, but we store it in a common 64-bit integer to
provide it with a compatible ABI with the AMD implementation which may
use a 64-bit mask. Silence these warnings by explicitly casting to the
smaller value. We know this is always legal as the result will always
fit into the smaller value if it was generated on NVPTX.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D157548
For whatever reason, the CMake did not like having the `generic_`
version live in the same directory. This patch pushes them to a new
directory, which is probably clearer anyway.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D157544
This more closely matches the stricter warnings used for
this same code in the Fuchsia build.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D156630
The GPU has much tighter requirements for handling IO functions.
Previously we attempted to define the GPU as one of the platform files.
Using a common interface allowed us to easily define these functions
without much extra work. However, it became clear that this was a
poor fit for the GPU. The file interface uses function pointers, which
prevented inlining and caused bad performance and resource usage on the
GPU. Further, using an actual `FILE` type rather than referring to it as
a host stub prevented us from using files coming from the host on the GPU
device.
After talking with @sivachandra, the approach now is to simply define
GPU specific versions of the functions we intend to support. Also, we
are ignoring `errno` for the time being as it is unlikely we will ever
care about supporting it fully.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D157427
Sometimes the vfprintf test was failing; I suspect that's due to it
using the same filename as the fprintf test. This patch fixes that
problem by changing the filename of the vfprintf output file.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D157523
This patch fixes the floating point conversion warnings found with
`-Wconversion` and `-Wno-sign-conversion`. These were the last warnings
I found, meaning that once this lands https://reviews.llvm.org/D156630
should be unblocked.
Reviewed By: mcgrathr, lntue
Differential Revision: https://reviews.llvm.org/D157449
This patch adds support for yocto images, which are custom Linux-based
systems created by yocto.
$CMAKE_HOST_SYSTEM_NAME returns "poky" as the system name, but it is a
linux image, so we just replace the name with "linux", so libc can use
the correct path.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D157404
This patch is an alternative to D155902. It provides the following benefits:
- No manual buffer allocation and error handling for the general case
- More flexible API: width specifier, sign and prefix handling
- Simpler code
The more flexible API removes the need for manually tweaking the buffer afterwards, and so prevents relying on implementation details of IntegerToString.
Reviewed By: michaelrj, jhuber6
Differential Revision: https://reviews.llvm.org/D156981
Tests not compiled with `-ffreestanding` can pull in unwanted dependencies like `limits.h`, which defines `PTHREAD_STACK_MIN`.
This is what caused the build bot failure in https://reviews.llvm.org/D156981#4570776.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D157444
Summary:
Following D156014 we can now use aliases for NVPTX, removing this source
of divergence. We require at least +ptx63 and at least sm_30 for
`.alias` but this is already within what we build for with `libc`
support.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D157323
Update documentation now that macros are laid out in a more structured way.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D143911
This patch is large, but is almost entirely just adding casts to calls
to syscall_impl. Much of the work was done programmatically, with human
checking when the syntax or types got confusing.
Reviewed By: mcgrathr
Differential Revision: https://reviews.llvm.org/D156950
Some printf implementations perform a null check on pointers passed to
%s. While that's not in the standard, this patch adds it as an option
for compatibility. It also puts a similar check in %n behind the same
flag.
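As a sketch of what such an opt-in check can look like (not the code from
the patch): the macro name `SKETCH_PRINTF_NULLPTR_CHECKS`, both helper
functions, and the `"(null)"` placeholder (the text glibc prints) are
assumptions made for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical build-time option; 1 enables the non-standard null checks.
#ifndef SKETCH_PRINTF_NULLPTR_CHECKS
#define SKETCH_PRINTF_NULLPTR_CHECKS 1
#endif

// Returns the string a %s conversion should print for the given argument.
const char *string_arg_or_placeholder(const char *arg) {
#if SKETCH_PRINTF_NULLPTR_CHECKS
  if (arg == nullptr)
    return "(null)";
#endif
  return arg;
}

// Stores the number of characters written so far for a %n conversion.
// Returns false if the store was skipped because the pointer was null.
bool store_written_count(int *destination, int written) {
#if SKETCH_PRINTF_NULLPTR_CHECKS
  if (destination == nullptr)
    return false;
#endif
  *destination = written;
  return true;
}
```

Keeping both checks behind one flag means builds that want strict standard
behaviour pay nothing for them.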
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D156923
This was generated using clang-tidy and clang-apply-replacements,
on src/string/*.cpp for just the llvmlibc-inline-function-decl
check, after applying https://reviews.llvm.org/D157164, and then
some manual fixup.
Reviewed By: abrachet
Differential Revision: https://reviews.llvm.org/D157169
The v variants of the printf functions take their variadic arguments as
a va_list instead of as individual arguments. They are otherwise
identical to the corresponding printf variants. This patch adds them
(vprintf, vfprintf, vsprintf, and vsnprintf) as well as tests.
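The usual reason to want the v variants is to forward variadic arguments
from a wrapper. A short sketch using the standard `std::vsnprintf`;
`format_message` is a hypothetical wrapper and not part of the patch.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Formats into `buffer` like snprintf, but does so by collecting the variadic
// arguments into a va_list and handing them to the v variant.
int format_message(char *buffer, std::size_t size, const char *format, ...) {
  va_list args;
  va_start(args, format);
  int written = std::vsnprintf(buffer, size, format, args);
  va_end(args);
  return written;
}
```

A call such as `format_message(buf, sizeof(buf), "%d-%s", 7, "ok")` behaves
the same as calling `snprintf` with those arguments directly.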
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D157138
This broke some bots that don't have linux/time_types.h available
(libc-x86_64-debian-*).
The header is needed because of __kernel_timespec, and since this is
only needed when SYS_sched_rr_get_interval_time64 is available, guarding
the include should fix the broken bot.
This patch adds a bunch of ifdefs to handle the 32 bit versions of
some syscalls, which often only append a 64 to the name of the syscall
(with exception of SYS_lseek -> SYS_llseek and SYS_futex ->
SYS_futex_time64)
This patch also tries to handle cases where wait4 is not available
(as in riscv32): to implement wait, wait4 and waitpid when wait4 is
not available, we check for alternative wait calls and ultimately rely
on waitid to implement them all.
In riscv32, only waitid is available, so we need it to support this
platform.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D148371
The Fuchsia zxtest library has ASSERT_DEATH but not EXPECT_DEATH.
The latter may be added in the future, but for now just use the
former as substitute.
Reviewed By: abrachet
Differential Revision: https://reviews.llvm.org/D156940
Other libc implementations support underscores in NaN(n-char-sequence)
strings. Our lack of support for that is causing fuzz failures, so this
patch adds it.
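For illustration, a sketch of a validity check for the characters inside
`NaN(...)`. In the C grammar an n-char-sequence is made of digits and
nondigits, and nondigits include the underscore. The function name is an
assumption and this is not the parser from the patch.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns true if every character of the sequence between the parentheses of
// "NaN(...)" is a digit, a letter, or an underscore. An empty sequence is
// accepted because the n-char-sequence is optional.
bool is_valid_nan_char_sequence(const std::string &sequence) {
  for (unsigned char c : sequence)
    if (!(std::isalnum(c) || c == '_'))
      return false;
  return true;
}
```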
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D156927
On 32-bit systems, sizeof(size_t) is 4, so we fail to build a 128-bit
integer in mul_shift_mod_1e9, which ends up ignoring the top bits in the
mantissa.
This patch fixes the issue by calling the UInt constructor directly. If
the system supports 128-bit integers, the constructor that takes
a value will be called; if it doesn't (like rv32), mantissa is already
a UInt.
Reviewed By: lntue, michaelrj
Differential Revision: https://reviews.llvm.org/D156813
This patch fixes the return value of sched_getscheduler, which was set to
always zero. The syscall documentation, however, defines:
On success, sched_getscheduler() returns the policy for the thread (a
nonnegative integer).
I also changed the return type for sched_setscheduler, but this change
didn't impact any test case.
This patch also removes the duplicated code from param_and_scheduler_test.cpp
and adds SCHED_BATCH and SCHED_IDLE to the tests.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D156700
The GPU makes use of different address spaces. We generally work with
global memory, thread private memory, and thread shared memory. This
patch simply adds a few preliminary wrappers to map these concepts to
the numerical values the backend uses. Obviously casts between these
will need to be checked by the user.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D156731
The other architectures use a brief sleep to defer work during this spin
loop that checks the RPC mailboxes. This patch adds one for x64 to
improve usage when running the server.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D156566
The number of trailing zeroes was being calculated incorrectly. It was
assuming that it could add all of the implicit leading zeroes in the
final block, not accounting for the number of digits actually requested
by the precision.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D156489
Previously including SCUDO in a libc build with runtimes/ as root was
not possible since this code only checked for compiler-rt enabled via
LLVM_ENABLE_PROJECTS.
Reviewed By: thesamesam
Differential Revision: https://reviews.llvm.org/D156388
This patch adds support for `fread` on the GPU via the RPC mechanism.
Here we simply pass the size of the read to the server and then copy it
back to the client via the RPC channel. This should allow us to do the
basic operations on files now. This will obviously be slow for large
sizes due to the number of RPC calls involved. This could be optimized
further by having a special RPC call that can initiate a memcpy between
the two pointers.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D155121
The test expected the max ptrdiff value to be exactly
4294967296 (2**32) for 32 bit systems, when it should be
4294967295 (2**32 - 1). This also adds a second test to check for this
case on non-32 bit systems.
Reviewed By: lntue, mikhail.ramalho
Differential Revision: https://reviews.llvm.org/D156257
Summary:
We landed some extra math support, which is apparently broken on the
max / min functions. The `mod` functions cannot be tested as they use
`std::limits` which don't exist in a freestanding environment. Also the
`blockstore` test seems to be broken. We will need to fix these in the
future but for now we need something in a workable state.
Reviewed By: jplehr
Differential Revision: https://reviews.llvm.org/D156329
It's necessary for the assert_fail function, so it needs to stay in for
the moment.
Reviewed By: alfredfo
Differential Revision: https://reviews.llvm.org/D156275
Previously displaying a failed assert would involve a runtime integer to
string conversion. This patch changes that to be a compile time string
conversion.
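One common way to get the line number as a string at compile time is
two-level preprocessor stringification. The sketch below shows that
technique under assumed macro names; the actual macros in the patch may
differ.

```cpp
#include <cassert>
#include <cstring>

// Two levels are needed so that __LINE__ is expanded to its number before the
// # operator turns it into a string literal.
#define SKETCH_STRINGIFY_IMPL(x) #x
#define SKETCH_STRINGIFY(x) SKETCH_STRINGIFY_IMPL(x)

// Adjacent string literals are concatenated by the compiler, so the whole
// message is a single constant and needs no formatting at run time.
#define SKETCH_ASSERT_MESSAGE(condition)                                       \
  __FILE__ ":" SKETCH_STRINGIFY(__LINE__) ": Assertion failed: '" #condition   \
           "'\n"

// Example of a fully formed message that exists as a compile-time constant.
constexpr const char *EXAMPLE_MESSAGE = SKETCH_ASSERT_MESSAGE(x > 0);
```

On failure, the assert path then only has to write an existing string
instead of converting an integer.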
This was inspired by a comment by JonChesterfield on https://reviews.llvm.org/D155899
Reviewed By: lntue, sivachandra, JonChesterfield
Differential Revision: https://reviews.llvm.org/D156168
This patch does the noisy work of removing the test opcodes from the
exported interface to an interface that is only visible in `libc`. The
benefit of this is that we both test the exported RPC registration more
directly, and we do not need to give this interface to users.
I have decided to export any opcode that is not a "core" libc feature as
having its MSB set in the opcode. We can think of these as non-libc
"extensions".
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154848
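A sketch of the opcode convention described in the message above, where
non-core "extension" opcodes have their most significant bit set. The
16-bit opcode width and all names here are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

using opcode_t = std::uint16_t; // Assumed opcode width.

// Most significant bit of the opcode marks a non-libc extension.
constexpr opcode_t EXTENSION_BIT = static_cast<opcode_t>(1u << 15);

// Builds an extension opcode from a small identifier.
constexpr opcode_t make_extension_opcode(opcode_t id) {
  return static_cast<opcode_t>(id | EXTENSION_BIT);
}

// Returns true if the opcode belongs to the extension range.
constexpr bool is_extension_opcode(opcode_t opcode) {
  return (opcode & EXTENSION_BIT) != 0;
}
```

Core opcodes and extensions can then never collide, regardless of how many
of either are added later.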
Currently we keep an internal buffer of device memory that is used to
indicate ownership of a port. Since we only use this as a single bit we
can simply turn this into a bitfield. I did this manually rather than
having a separate type as we need very special handling of the masks
used to interact with the locks.
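A host-side sketch of the bitfield idea using `std::atomic`: one bit per
port, packed into 32-bit words. The names and the port count are
assumptions, and the lane-mask handling the real GPU code needs is left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t NUM_PORTS = 64;     // Assumed number of ports.
constexpr std::uint32_t BITS_PER_WORD = 32; // One ownership bit per port.

struct PortLocks {
  std::atomic<std::uint32_t> words[NUM_PORTS / BITS_PER_WORD];

  PortLocks() {
    for (auto &word : words)
      word.store(0, std::memory_order_relaxed);
  }

  // Tries to take ownership of a port by setting its bit. Succeeds only if
  // the bit was clear before the atomic OR.
  bool try_lock(std::uint32_t index) {
    std::uint32_t mask = 1u << (index % BITS_PER_WORD);
    std::uint32_t previous = words[index / BITS_PER_WORD].fetch_or(
        mask, std::memory_order_acquire);
    return (previous & mask) == 0;
  }

  // Releases ownership of a port by clearing its bit.
  void unlock(std::uint32_t index) {
    std::uint32_t mask = 1u << (index % BITS_PER_WORD);
    words[index / BITS_PER_WORD].fetch_and(~mask, std::memory_order_release);
  }
};
```

Compared to one word per port, this uses 32 times less memory for the lock
state.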
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D155511
The new printf writer design focuses on optimizing the fast path. It
inlines any write to a buffer or string, and by handling buffering
itself can more effectively work with both internal and external file
implementations. The overflow hook should allow for expansion to
asprintf with minimal extra code.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D153999
Clang supports the `-Wglobal-constructors` flag which will indicate if a
global constructor is being used. The current goal in `libc` is to make
the constructors `constexpr` to prevent this from happening with
straight construction. However, there are many other cases where we can
emit a constructor that this won't catch. This should give warning if
someone accidentally introduces a global constructor.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D155721
There are some cases when testing we want to override the logic for not
building tests if the loader is not present. This allows users to
specify an external binary that fulfils the same duties which will force
the tests to be built even without meeting the dependencies.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D155837
If the clock_freq symbol isn't used, and is removed,
we don't need to abort the loader. We can instead just not set it.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D155832
This file required a global constructor due to copying the file stream
and having a non-constexpr constructor for the wrapper type. Also, I
changed `opterr` to be a pointer, because it seemed like it wasn't
being set correctly as an externally visible variable if we just
captured it by value.
Reviewed By: abrachet
Differential Revision: https://reviews.llvm.org/D155766
The `File` interface currently has a destructor to delete the buffer if
it is owned by the file. This is problematic for the globally allocated
`stdout`, `stdin`, and `stderr` files. This causes the file interface to
have global constructors to initialize the destructors to use these.
However, these never use the destructors because they don't own the
buffer. This patch removes the destructor and calls it manually in the
close implementation. The platform close should never need to access the
buffer and it needs to be done before clearing the whole thing, so this
should work.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D155762
HSA headers might be under a hsa/ directory or might not.
This scheme matches the one used by the openmp amdgpu plugin.
Reviewed By: jhuber6, jplehr
Differential Revision: https://reviews.llvm.org/D155812
The indirection here is for some reason causing an unnecessary
constructor. If we leave this uninitialized we will get the default
constructor which simply zero initializes the global. I've checked the
output and confirmed that it uses the `zeroinitializer` so this should
be safe.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D155720
This patch adds the `rpc_host_call` function as a GPU extension. This is
exported from the `libc` project to use the RPC interface to call a
function pointer via RPC and copying the arguments by-value. The
interface can only support a single void pointer argument much like
pthreads. The function call here is the bare-bones version of what's
required for OpenMP reverse offloading. Full support will require
interfacing with the mapping table, nowait support, etc.
I decided to test this interface in `libomptarget` as that will be the
primary consumer and it would be more difficult to make a test in `libc`
due to the testing infrastructure not really having a concept of the
"host" as it runs directly on the GPU as if it were a CPU target.
Reviewed By: jplehr
Differential Revision: https://reviews.llvm.org/D155003
Summary:
This caused test failures on the gfx90a buildbot. This works on my
gfx1030 and the Nvidia buildbots, so we'll need to investigate what is
going wrong here. For now revert it to get the bots green.
This reverts commit 05abcc5792.
This patch mostly renames files so it better reflects the function they declare.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D155607
The amount of spaces to pad with is stored in the variable
padding_spaces, previously the actual write calls used the same formula
to calculate the value. This simplifies and clarifies the values by just
reusing the variable.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D155113
MPFR has a minimum precision of 2, but the strtofloat fuzz sometimes
would request a precision of 1 for the case of the minimum subnormal.
This patch tells the fuzzer to ignore any case where the precision would
go below 2.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D155130
A previous patch made this cause an error on the GPU. We have not yet
dedicated time towards an optimal implementation there but we do not
want it to cause an error. We simply use the fallback routines.
Differential Revision: https://reviews.llvm.org/D155615
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155515
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155181
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155174
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155099
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155076
This ensures that if someone calls the `rpc_shutdown` method multiple
times it will not segfault and gracefully continue. This was causing
problems in the OpenMP usage. This could point to other issues, but for
now this is a safe fix.
Differential Revision: https://reviews.llvm.org/D155005
Subnormal floating point numbers have a lower effective precision than
normal floating point numbers. This can cause issues for the fuzz test
since the MPFR floats have a constant precision regardless of the
exponent, and the precision must match exactly or else create rounding
errors. To solve this problem, the precision of the MPFR floats is
dynamically calculated.
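As an illustration of why the precision depends on the value, the sketch
below computes the number of significant bits a `float` actually has. It is
an assumption-laden example, not the fuzzer code: `effective_precision` is a
made-up name and infinities and NaNs are treated like normal numbers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

constexpr int MANTISSA_WIDTH = 23; // Stored mantissa bits of a float.

// Returns the number of significant bits of the value: 24 for normal numbers
// (23 stored bits plus the implicit one), fewer for subnormals, 0 for zero.
int effective_precision(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  std::uint32_t biased_exponent = (bits >> MANTISSA_WIDTH) & 0xFF;
  std::uint32_t mantissa = bits & ((1u << MANTISSA_WIDTH) - 1);

  if (biased_exponent != 0)
    return MANTISSA_WIDTH + 1;

  // Subnormal: only the bits from the highest set bit downwards carry
  // information, so count the bit length of the mantissa.
  int precision = 0;
  while (mantissa != 0) {
    ++precision;
    mantissa >>= 1;
  }
  return precision;
}
```

The smallest subnormal has a precision of only 1 bit here, while the largest
subnormals have 23.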
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D154909
CUDA requires a PTX feature to be compiled generally, because the
`libcgpu.a` archive contains LLVM-IR we need to have one present to
compile it. Currently, the wrapper fatbinary format we use to
incorporate these into single-source offloading languages has a special
option to provide this. Since this was not present in the builds, if the
user did not specify it via `-foffload-lto` it would not compile from
CUDA or OpenMP due to the missing PTX features. Fix this by passing it
to the packager invocation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D154864
These functions have definitions differing between C and C++. GNU
respects the C++ definitions while the LLVM libc does not. This causes
many bugs and the current hack creates other issues. Rather than hack
around this I'd rather temporarily disable these than regress with the
integration into other offloading languages. We lose test support for
them but we should be able to re-enable these once the `libc` headers
provide these correctly.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154850
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory, use it in relevant CMake rules.
Differential Revision: https://reviews.llvm.org/D154278
There will be subsequent patches to move things around and make the file layout more principled.
Differential Revision: https://reviews.llvm.org/D154770
This is an alternate approach to the patches proposed in D153897 and
D153794. Rather than exporting a single header that can be included on
the GPU in all circumstances, this patch chooses to instead generate a
separate set of headers that only provides the declarations. This can
then be used by external tooling to set up what's on the GPU. This
leaves room for header hacks for offloading languages without needing to
worry about the `libc` implementation.
Currently this generates a set of headers that only contain the
declarations. These will then be installed to a new clang resource
directory called `llvm_libc_wrappers/` which will house the shim code.
We can then automatically include this from `clang` when offloading to
wrap around the headers while specifying what's on the GPU.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D154036
This patch adds the initial support for running an RPC server in
libomptarget to handle host services. We interface with the library
provided by the `libc` project to stand up a basic server. We introduce
a new type that is controlled by the plugin and has each device
initialize its interface. We then run a basic server to check the RPC
buffer.
This patch does not fully implement the interface. In the future each
plugin will want to define special handlers via the interface to support
things like malloc or H2D copies coming from RPC. We will also want to
allow the plugin to specify the number of ports. This is currently
capped in the implementation but will be adjusted soon.
Right now running the server is handled by whatever thread ends up doing
the waiting. This is probably not a completely sound solution but I am
not overly familiar with the behaviour of OpenMP tasks and what would be
required here. This works okay with synchronous regions, and somewhat
fine with `nowait` regions, but I've observed some weird behavior when
one of those regions calls `exit`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D154312
AMDGPU supports aliases now, so we can drop this case and leave it only
for the NVPTX target. Unfortunately it's unlikely that NVPTX will be
able to support this in the future due to their PTX language being very
limited.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D154704
For machines with a lot of cores, hardware prefetchers can saturate the memory bus when utilization is high.
In this case it is desirable to turn off the hardware prefetcher completely.
This has a big impact on the performance of memory functions such as `memcpy` that rely on the fact that the next cache line will be readily available.
This patch adds the 'LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING' compile time option that generates a version of memcpy with software prefetching. While not fully restoring the original performance, it mitigates the impact to an acceptable level.
Reviewed By: rtenneti
Differential Revision: https://reviews.llvm.org/D154494
This reverts commit a4a26374aa.
This was causing some problems with the CPU build and CUDA buildbot.
Revert until I can figure out what those issues are and fix them. I
believe it is just some CMake.
This patch adds the necessary support for the fopen and fclose functions
to work on the GPU via RPC. I added a new test that enables testing this
with the minimal features we have on the GPU. I will update it once we
have `fread` and `fwrite` to actually check the outputted strings. For
now I just relied on checking manually via the output temp file.
Reviewed By: JonChesterfield, sivachandra
Differential Revision: https://reviews.llvm.org/D154519
Another low hanging fruit we can put on the GPU, this ports the tests
over to the hermetic framework so we can run them on the GPU.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D154540
Summary:
Reviewer requested that this routine not be a macro, however that means
that it was not being initialized as the static initializer was done
before the memcpy from the device. Fix this so we can get timing
information.
This patch adds the necessary support to provide timing information in
`libc` tests. This is useful for determining how much time each test
takes. We also can use this as a test basis for providing more
fine-grained timing when implementing things on the GPU.
The main difficulty with this is the fact that the AMDGPU fixed
frequency clock operates at an unknown frequency. We need to read this
on a per-card basis from the driver and then copy it in. NVPTX on the
other hand has a fixed clock at a resolution of 1ns. I have also
increased the resolution of the print-outs as the majority of these are
below a millisecond for me.
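As a rough illustration of the conversion this requires, the sketch below
turns a raw tick count into nanoseconds given a clock frequency in Hz. The
name `ticks_to_ns` and the frequencies used are hypothetical and are not
taken from the patch; a clock with 1ns resolution corresponds to a frequency
of 1 GHz in this model.

```cpp
#include <cstdint>

// Hypothetical helper, not the patch's actual code: convert a tick count
// from a fixed-frequency clock into nanoseconds. On AMDGPU `freq_hz` would
// be the value read from the driver; a clock with 1ns resolution would use
// 1000000000.
constexpr uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz) {
  constexpr uint64_t NS_PER_S = 1000000000ull;
  // Handle whole seconds and the sub-second remainder separately so that
  // large tick counts are less likely to overflow the multiplication.
  return (ticks / freq_hz) * NS_PER_S +
         (ticks % freq_hz) * NS_PER_S / freq_hz;
}
```

With a 1 GHz clock the tick count is already in nanoseconds, so the
function returns its input unchanged.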
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154446
The accuracy for the MPFR numbers in the strtofloat fuzz test was set
too high, causing rounding issues when rounding to a smaller final
result.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D154150
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory, use it in relevant CMake rules.
Differential Revision: https://reviews.llvm.org/D154278
When crt1 isn't available, which is typical on baremetal, hermetic tests
aren't created and the hermetic test target won't be available.
Differential Revision: https://reviews.llvm.org/D154279
Summary:
This function is intended to only be used on the GPU as a shorthand. The
static assert should only fire if it's called, but it seems that its
presence can sometimes cause issues and other times not. Simply remove
it as it's causing build problems.
Fix a bunch more instances of incorrect use of the `static`
keyword and missing use of LIBC_INLINE and LIBC_INLINE_VAR
macros. Note that even forward declarations and generic template
declarations must follow the prescribed patterns for libc code so
that they match every definition and every template specialization.
Reviewed By: Caslyn
Differential Revision: https://reviews.llvm.org/D154260
Implicit narrowing conversions from int to uint16_t
get a compiler warning with the warning settings used
in the Fuchsia build.
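A minimal sketch of the kind of fix involved, assuming the warning is one of
those in the `-Wconversion` family (the exact flags used by the Fuchsia build
are not stated here). The function `low_bits` is hypothetical and not taken
from the patch.

```cpp
#include <cstdint>

// Hypothetical example: the bitwise AND produces an int, so returning it
// as uint16_t would be an implicit narrowing conversion. The explicit cast
// documents that the narrowing is intended.
constexpr uint16_t low_bits(int value) {
  return static_cast<uint16_t>(value & 0xFFFF);
}
```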
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D154256
This patch makes sure that we always build the RPC server. The intended
use for this is to begin integrating this server implementation into
`libomptarget`. That requires that we build this server ahead of time
when using a `LLVM_ENABLE_PROJECTS` build. Make a few tweaks to ensure
that the GCC compiler which may be used for this build doesn't complain.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154105
This patch adds the other two methods to the server so the external
users can use the interface through the obfuscated interface.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154224
This is based on ideas from @nafi to:
- use a branchless version of 'cmp' for 'uint32_t',
- completely resolve the lexicographic comparison through vector
operations when wide types are available. We also get rid of byte
reloads and serializing '__builtin_ctzll'.
I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in subsequent patches.
The code has been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.
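To illustrate the first idea, here is a minimal sketch of a three-way
comparison of two `uint32_t` values written without an explicit branch. The
helper names (`load_be32`, `cmp_u32`, `memcmp4`) are hypothetical and this is
not the patch's code; whether the generated code is actually branch-free
depends on the compiler and target.

```cpp
#include <cstdint>

// Hypothetical helper: load four bytes as a big-endian integer so that
// integer order matches lexicographic byte order.
inline uint32_t load_be32(const unsigned char *p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) |
         static_cast<uint32_t>(p[3]);
}

// Three-way compare with no explicit branch: returns -1, 0, or 1.
inline int cmp_u32(uint32_t a, uint32_t b) {
  return static_cast<int>(a > b) - static_cast<int>(a < b);
}

// Compare exactly four bytes with memcmp-like sign semantics.
inline int memcmp4(const void *lhs, const void *rhs) {
  return cmp_u32(load_be32(static_cast<const unsigned char *>(lhs)),
                 load_be32(static_cast<const unsigned char *>(rhs)));
}
```

Subtracting the two comparison results avoids the overflow that a plain
`a - b` would hit when the operands differ by more than `INT_MAX`.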
Reviewed By: nafi3000
Differential Revision: https://reviews.llvm.org/D148717
This patch simply enables the `div`, `ldiv`, and `lldiv` functions on
the GPU. This should be straightforward enough.
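For reference, a short host-side sketch of the standard semantics these
functions provide: the quotient truncates toward zero and the remainder
satisfies `quot * den + rem == num`. The helper `div_roundtrips` is
hypothetical and only exists for this illustration.

```cpp
#include <cstdlib>

// Hypothetical helper: check that quotient and remainder from lldiv
// recombine into the original numerator.
inline bool div_roundtrips(long long num, long long den) {
  std::lldiv_t r = std::lldiv(num, den);
  return r.quot * den + r.rem == num;
}
```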
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D154143
The RPC calls all have delays associated with them. Currently the `exit`
function does an async send and immediately exits the GPU. This can have
the effect that the RPC server never sees the exit call and we continue.
This patch changes that to first sync with the server before continuing
to perform its exit. There is still a hazard here, where the kernel can
complete before the RPC call reads back its response, but this is simply
a multi-threaded hazard. This change ensures that the server *will*
always exit some time after the GPU exits.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154112
Clean up exhaustive tests. Let check functions return number of failures instead of passed/failed.
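A minimal sketch of the pattern, assuming a check function that walks an
inclusive input range. The functions `reference`, `under_test`, and `check`
are hypothetical stand-ins and are not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical stand-ins for a reference result and the function under
// test; `under_test` is deliberately wrong for the input 3.
inline uint32_t reference(uint32_t x) { return x * 2u; }
inline uint32_t under_test(uint32_t x) { return x == 3u ? 7u : x * 2u; }

// Returns the number of failing inputs in [start, stop] rather than a
// single pass/fail flag, so callers can sum failures across ranges.
inline uint64_t check(uint32_t start, uint32_t stop) {
  uint64_t failed = 0;
  // A 64-bit loop variable avoids wrap-around when stop is UINT32_MAX.
  for (uint64_t x = start; x <= stop; ++x) {
    uint32_t v = static_cast<uint32_t>(x);
    if (under_test(v) != reference(v))
      ++failed;
  }
  return failed;
}
```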
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D153682
The RPC client must be initialized to set a pointer to the underlying
buffer. This is currently done with the `reset` method which may not be
ideal for the use-case. We want runtimes to be able to initialize this
without needing to call a kernel. Recent changes allowed the `Client`
type to be trivially copyable. That means we can create a client on the
server side and then copy it over. To that end we take the existing
externally visible symbol and initialize it to the client's pointer.
Therefore we can look up the symbol and copy it over once loaded.
No test currently; I tested with a demo OpenMP application but couldn't think of
how to put that in-tree.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D153633