Prior to this change, we wouldn't build headers that aren't referenced
by other parts of the libc which would result in a build error during
installation. To address this, we make the header target a dependency of
the libc archive. Additionally, we also redo the install targets, moving
the install targets closer to build targets and simplifying the
hierarchy and generally matching what we do for other runtimes.
This header first appeared in 4.4BSD and is provided by a number of C
libraries including Newlib. Several of our embedded projects use this
header and so to make LLVM libc a drop-in replacement, we need to
provide it as well.
For the initial commit, we only implement singly linked variants (SLIST
and STAILQ). The doubly linked variants (LIST, TAILQ and CIRCLEQ) can be
implemented in the future as needed.
These were fixed properly by f1f1875c18.
- Revert "[libc] temporarily set -Wno-shorten-64-to-32 (#77396)"
- Revert "[libc] make off_t 32b for 32b arm (#77350)"
Fixes the following diagnostic:
llvm-project/libc/src/sys/mman/linux/mmap.cpp:44:59: error: implicit
conversion loses integer precision: 'off_t' (aka 'long long') to 'long'
[-Werror,-Wshorten-64-to-32]
size, prot, flags, fd, offset);
^~~~~~
It looks like off_t is a curious types on different platforms. FWICT,
it's 32b
on arm (at least for arm-linux-gnueabi) but 64b elsewhere (including 32b
riscv32-linux-gnu).
Summary:
A previous patch added a dependency on the stack protectors, this was
not built on the GPU targets so every test was disabled. It turns out
that disabled tests still get targets so we need to specifically check
if the it is in the target's set of entrypoints before we can use it.
Another patch, because the build-bot was down, snuck in that prevented
the new math tests from being run. The problem is that the `signal.h`
header requires target specific definitions but was being used
unconditionally. I have made changes that disable building this header
if the file is not defined in the config. This required disbaling the
signal_to_string utility, so that will simply be missing from targets
that don't define it.
When building with compiler-rt enabled, warnings such as the following
are
observed:
llvm-project/llvm/build/projects/compiler-rt/../libc/include/llvm-libc-macros/linux/sys-stat-macros.h:46:9:
warning: 'S_IXOTH' macro redefined [-Wmacro-redefined]
#define S_IXOTH 00001
^
llvm-project/llvm/build/projects/compiler-rt/../libc/include/llvm-libc-macros/linux/fcntl-macros.h:61:9:
note: previous definition is here
#define S_IXOTH 01
^
It looks like we have these multiply defined. Deduplicate these flags;
users
should expect to find them in sys/stat.h. S_FIFO was wrong anyways
(should
have been S_IFIFO).
Implement `prctl` as specified in
https://man7.org/linux/man-pages/man2/prctl.2.html.
This patch also includes test cases covering two simple use cases:
1. `PR_GET_NAME/PR_SET_NAME`: where userspace data is passed via arg2.
2. `PR_GET_THP_DISABLE`: where return value is passed via syscal retval.
This patch implements `hcreate(_r)/hsearch(_r)/hdestroy(_r)` as
specified in https://man7.org/linux/man-pages/man3/hsearch.3.html.
Notice that `neon/asimd` extension is not yet added in this patch.
- The implementation is largely simplified from rust's
[`hashbrown`](https://github.com/rust-lang/hashbrown/blob/master/src/raw/mod.rs)
as we only consider fix-sized insertion-only hashtables. Technical
details are provided in code comments.
- This patch also contains a portable string hash function, which is
derived from [`aHash`](https://github.com/tkaitchuck/aHash)'s fallback
routine. Not using any SIMD acceleration, it has a good enough quality
(passing all SMHasher tests) and is not too bad in speed.
- Some general functionalities are added, such as `memory_size`,
`offset_to`(alignment), `next_power_of_two`, `is_power_of_two`.
`ctz/clz` are extended to support shorter integers.
Summary:
The `fgets` function as implemented is not functional currently when
called with multiple threads. This is because we rely on reapeatedly
polling the character to detect EOF. This doesn't work when there are
multiple threads that may with to poll the characters. this patch pulls
out the logic into a standalone RPC call to handle this in a single
operation such that calling it from multiple threads functions as
expected. It also makes it less slow because we no longer make N RPC
calls for N characters.
Summary:
This function follows closely with the pattern of all the other
functions. That is, making a new opcode and forwarding the call to the
host. However, this also required modifying the test somewhat. It seems
that not all `libc` implementations follow the same error rules as are
tested here, and it is not explicit in the standard, so we simply
disable these EOF checks when targeting the GPU.
Summary:
This enum previously manually specified the value. This just made it
unnecessarily difficult to add new ones without changing everything.
This patch also makes it compatible with C by removing the `:`
annotation and instead using the `LAST` method.
Summary:
This patch adds the necessary entrypoints to handle the `fseek`,
`fflush`, and `ftell` functions. These are all very straightfoward, we
simply make RPC calls to the associated function on the other end.
Implementing it this way allows us to more or less borrow the state of
the stream from the server as we intentionally maintain no internal
state on the GPU device. However, this does not implement the `errno`
functinality so that must be ignored.
Summary:
Normally, the implementation of `puts` simply writes a second newline
charcter after printing the first string. However, because the GPU does
everything in batches of the SIMT group size, this will end up with very
poor output where you get the strings printed and then 1-64 newline
characters all in a row. Optimizations like to turn `printf` calls into
`puts` so it's a good idea to make this produce the expected output.
The least invasive way I could do this was to add a new opcode. It's a
little bloated, but it avoids an unneccessary and slow send operation to
configure this.
For file handling, we need more definitions from
linux/stat.h, so this pulls them in. It also adjusts other definitions
to match the kernel's exactly [NFC] so that it's easy to verify that
there's been no divergence one day when it's time to use linux/stat.h
directly.
Tested:
check-libc
Summary:
Previously, the `fread` operation was wrong in cases when we read less
data than was requested. That is, if we tried to read N bytes while the
file was in EOF, it would still copy N bytes of garbage. This is fixed
by only copying over the sizes we got from locally opening it rather
than just using the provided size.
Additionally, this patch simplifies the interface. The output functions
have special variants for writing to stdout / stderr. This is primarily
an optimization for these common cases so we can avoid sending the
stream as an argument which has a high delay. Because for input, we
already need to start with a `send` to tell the server how much data to
read, it costs us nothing to send the file along with it so this is
redundant. Re-use the file encoding scheme from the other
implementations, the one that stores the stream type in the LSBs of the
FILE pointer.
This patch updates the siginfo_t struct definition to match the
definition from the kernel here:
https://github.com/torvalds/linux/blob/master/include/uapi/asm-generic/siginfo.h
In particular, there are two main changes:
1. swap position of si_code and si_errno: si_code show come after
si_errno in all systems except MIPS. Since we don't MIPS, the order is
fixed for now, but can be easily \#ifdef'd if MIPS support is
implemented in the future.
2. We add a union of structs that are filled depending on the signal
raised.
This change was required for the fork and spawn integration tests in
rv32, since they fork/clone the running process, call
wait/waitid/waitpid, and read the status, which was wrong in rv32
because wait/waitid/waitpid are implemented in rv32 using SYS_waitid.
SYS_waitid takes a pointer to a siginfo_t and fills the proper fields in
the struct. The previous siginfo_t definition was being incorrectly
filled due to not taking into account the signal raised.
When doing a clean build from vscode, it makes the subdirectories in the
source tree rather than in the build folder. Elsehwere in LLVM, they
prefix the MAKE_DIRECTORY calls, so this appears to be the correct
approach.
Summary:
This patch implements the `fgets`, `getc`, `fgetc`, and `getchar`
functions on the GPU. Their implementations are straightforward enough.
One thing worth noting is that the implementation of `fgets` will be
extremely slow due to the high latency to read a single char. A faster
solution would be to make a new RPC call to call `fgets` (due to the
special rule that newline or null breaks the stream). But this is left
out because performance isn't the primary concern here.
This patch changes the size of time_t to be an int64_t. This still
follows the POSIX standard which only requires time_t to be an integer.
Making time_t a 64-bit integer also fixes two cases in 32 bits platforms
that use SYS_clock_nanosleep_time64 and SYS_clock_gettime64, as the name
of these calls implies, they require a 64-bit time_t. For instance, in rv32,
the 32-bit version of these syscalls is not available.
We also follow glibc here, where time_t is still a 32-bit integer in
arm32.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D159125
Summary:
This patch improves the implementation of the standard `rand()` function
by implementing it in terms of the xorshift64star pRNG as described in
https://en.wikipedia.org/wiki/Xorshift#xorshift*. This is a good,
general purpose random number generator that is sufficient for most
applications that do not require an extremely long period. This patch
also correctly initializes the seed to be `1` as described by the
standard. We also increase the `RAND_MAX` value to be `INT_MAX` as the
standard only specifies that it can be larger than 32768.
libc uses SYS_prlimit64 (which takes a struct rlimit64) to implement
setrlimt and getrlimit (which take a struct rlimit). In 64-bit bits
systems this is not an issue since the members of struct rlimit64 and
struct rlimit are 64 bits long, however, in 32-bit systems the members
of struct rlimit are only 32 bits long, causing wrong values being
passed to SYS_prlimit64.
This patch changes rlim_t to be __UINT64_TYPE__ (which also changes
rlimit as a side-effect), fixing the problem of mismatching types in
the syscall.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D159104
This function implements the `abort` function on the GPU. The
implementation here closely mirros the `exit` call where we first
synchornize with the RPC server to make sure it's listening and then we
exit on the GPU.
I was unsure if this should be a simple `__builtin_assert` on the GPU. I
elected to go with an RPC approach to make this a more "true" `abort`
call. That is, it should invoke some signal handlers and exit with the
proper code according to the implemented C library on the server.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D159210
Summary:
We should check for the GPU architectures first, since `__linux__` can
be set potentially during these compilations. Also the test needs to be
a hermetic test.
This patch implements the `clock()` function on the GPU. This function
is supposed to return a timestamp that can be converted into seconds
using the `CLOCKS_PER_SEC` macro. The GPU has a fixed frequency timer
that can be used for this purpose. However, there are some
considerations.
First is that AMDGPU does not have a statically known fixed frequency. I
know internally that the gfx10xx and gfx11xx series use a 100 MHz clock
which will probably remain for the future. Gfx9xx typically uses a 25
MHz clock except for the Vega 10 GPU. The only way to know for sure is
to look it up from the runtime. For this purpose, I elected to default
it to some known values and assign these to an exteranlly visible symbol
that can be initialized if needed. If we do not have a good guess we
just return zero.
Second is that the `CLOCKS_PER_SEC` macro only gives about a microsecond
of resolution. POSIX demands that it's 1,000,000 so it's best that we
keep with this tradition as almost all targets seem to respect this. The
reason this is important is because on the GPU we will almost assuredly
be copying the host's macro value (see the wrapper header) so we should
go with the POSIX version that's most likely to be set. (We could
probably make a warning if the included header doesn't match the
expected value).
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D159118
Currently all of this logic expects to include and link with the headers
in the `linux/` directory. This patch adds a wrapper macro to optionally
add the appropriate dependency if it exists. This is preliminary to
adding some GPU specific versions of these files.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D159021
limits.h currently interferes with Clang's limits.h. include_next
emits a warning because it is a GNU extension. Will re add this once
we figure out a good solution.
This reverts commits 13bbca8d69,
002cba0329, and
0fb3066873.
The LLVM-libc build itself will override include paths and prefer it's
own limits.h over the compiler's limits.h. Because we rely on the
compiler limits.h for numerical limits in LLVM-libc it needs to be
include_next:ed if not already included. The other method to work
around this is to define all numeric macros in place.
Signed-off-by: Alfred Persson Forsberg <cat@catcream.org>
Reviewed By: thesamesam
Differential Revision: https://reviews.llvm.org/D158040
This header contains implementation specific constants.
The compiler already provides its own limits.h with numerical limits
conforming to freestanding ISO C. But it is missing extensions like
POSIX, and does for example not include <linux/limits.h> which is
expected on a Linux system, therefore, an LLVM libc implementation of
limits.h is needed for hosted (__STDC_HOSTED__) environments.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D156961
The v variants of the printf functions take their variadic arguments as
a va_list instead of as individual arguments. They are otherwise
identical to the corresponding printf variants. This patch adds them
(vprintf, vfprintf, vsprintf, and vsnprintf) as well as tests.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D157138
This patch adds a bunch of ifdefs to handle the 32 bit versions of
some syscalls, which often only append a 64 to the name of the syscall
(with exception of SYS_lseek -> SYS_llseek and SYS_futex ->
SYS_futex_time64)
This patch also tries to handle cases where wait4 is not available
(as in riscv32): to implement wait, wait4 and waitpid when wait4 is
not available, we check for alternative wait calls and ultimately rely
on waitid to implement them all.
In riscv32, only waitid is available, so we need it to support this
platform.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D148371
This patch adds support for `fread` on the GPU via the RPC mechanism.
Here we simply pass the size of the read to the server and then copy it
back to the client via the RPC channel. This should allow us to do the
basic operations on files now. This will obviously be slow for large
sizes due ot the number of RPC calls involved, this could be optimized
further by having a special RPC call that can initiate a memcpy between
the two pointers.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D155121
This patch does the noisy work of removing the test opcodes from the
exported interface to an interface that is only visible in `libc`. The
benefit of this is that we both test the exported RPC registration more
directly, and we do not need to give this interface to users.
I have decided to export any opcode that is not a "core" libc feature as
having its MSB set in the opcode. We can think of these as non-libc
"extensions".
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154848
This patch adds the `rpc_host_call` function as a GPU extension. This is
exported from the `libc` project to use the RPC interface to call a
function pointer via RPC any copying the arguments by-value. The
interface can only support a single void pointer argument much like
pthreads. The function call here is the bare-bones version of what's
required for OpenMP reverse offloading. Full support will require
interfacing with the mapping table, nowait support, etc.
I decided to test this interface in `libomptarget` as that will be the
primary consumer and it would be more difficult to make a test in `libc`
due to the testing infrastructure not really having a concept of the
"host" as it runs directly on the GPU as if it were a CPU target.
Reviewed By: jplehr
Differential Revision: https://reviews.llvm.org/D155003
This is an alternate approach to the patches proposed in D153897 and
D153794. Rather than exporting a single header that can be included on
the GPU in all circumstances, this patch chooses to instead generate a
separate set of headers that only provides the declarations. This can
then be used by external tooling to set up what's on the GPU. This
leaves room for header hacks for offloading languages without needing to
worry about the `libc` implementation.
Currently this generates a set of headers that only contain the
declarations. These will then be installed to a new clang resource
directory called `llvm_libc_wrappers/` which will house the shim code.
We can then automaticlaly include this from `clang` when offloading to
wrap around the headers while specifying what's on the GPU.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D154036
This reverts commit a4a26374aa.
This was causing some problems with the CPU build and CUDA buildbot.
Revert until I can figure out what those issues are and fix them. I
believe it is just some CMake.
This is an alternate approach to the patches proposed in D153897 and
D153794. Rather than exporting a single header that can be included on
the GPU in all circumstances, this patch chooses to instead generate a
separate set of headers that only provides the declarations. This can
then be used by external tooling to set up what's on the GPU. This
leaves room for header hacks for offloading languages without needing to
worry about the `libc` implementation.
Currently this generates a set of headers that only contain the
declarations. These will then be installed to a new clang resource
directory called `llvm_libc_wrappers/` which will house the shim code.
We can then automaticlaly include this from `clang` when offloading to
wrap around the headers while specifying what's on the GPU.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D154036
This patch adds the necessary support for the fopen and fclose functions
to work on the GPU via RPC. I added a new test that enables testing this
with the minimal features we have on the GPU. I will update it once we
have `fread` and `fwrite` to actually check the outputted strings. For
now I just relied on checking manually via the outpuot temp file.
Reviewed By: JonChesterfield, sivachandra
Differential Revision: https://reviews.llvm.org/D154519
When LLVM_ENABLE_PER_TARGET_RUNTIME_DIR is enabled, place headers
in `include/<target>` directory, otherwise use `include/`.
Differential Revision: https://reviews.llvm.org/D152592