The %p format wasn't correctly passing along flags and modifiers to the
integer conversion behind the scenes. This patch fixes that behavior, as
well as changing the nullptr behavior to be a string conversion behind
the scenes.
Reviewed By: lntue, jhuber6
Differential Revision: https://reviews.llvm.org/D159458
libc uses SYS_prlimit64 (which takes a struct rlimit64) to implement
setrlimt and getrlimit (which take a struct rlimit). In 64-bit bits
systems this is not an issue since the members of struct rlimit64 and
struct rlimit are 64 bits long, however, in 32-bit systems the members
of struct rlimit are only 32 bits long, causing wrong values being
passed to SYS_prlimit64.
This patch changes rlim_t to be __UINT64_TYPE__ (which also changes
rlimit as a side-effect), fixing the problem of mismatching types in
the syscall.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D159104
A recent patch required the implementation to define `LIBC_NAMESPACE`.
For GPU offloading we provide a static library whose internal
implementation relies on the `libc` headers. This is a separate library
that is constructed during the "bootstrap" phase. This patch moves the
definition of the `LIBC_NAMESPACE` CMake variable up so its available
during bootstrapping and adds it to the definition of the RPC server.
These functions were implemented by simply calling their `__builtin_*`
equivalents.
The builtins were resolving to the libc functions back again. This patch
adds explicit
vendor versions for these functions to avoid the recursion.
This test creates a time_t variable and assigns 0xfffffffffe1d7b01 which
overflows the maximum time_t value for 64-bit time_t, then checks if the
syscall fails and errno was set.
In systems with sizeof(time_t) == 4, the value is narrowed down to
0xfe1d7b01 and doesn't overflow, causing the test to fail.
This patch then disables the test on systems with 32 bits long time_t.
This patch fixes the overflow check in update_from_seconds, used by
gmtime, gmtime_r and mktime.
In update_from_seconds, total_seconds is a int64_t and the previous
overflow check for when sizeof(time_t) == 4 would check if it was <
0x80000000 and > 0x7FFFFFFF, however, this check would cause the
following issues:
1. Valid negative numbers would be discarded, e.g., -1 is
0xffffffffffffffff as a int64_t, outside the range of the overflow
check.
2. Some valid positive numbers would be discarded because the hex
constants were being implicitly converted to int64_t, e.g., 0x80000000
would be implicitly converted to 2147483648, instead of -2147483648.
The fix for both cases was to static_cast total_seconds and the
constants to time_t if sizeof(time_t) == 4. The behaviour is not changed
in systems with sizeof(time_t) == 8.
---------
Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
* replaces `add_rvalue_reference_t` with `is_rvalue_reference_t`
* includes `"stddef.h"` for `size_t` include
---------
Co-authored-by: Guillaume Chatelet <gchatelet@google.com>
Summary:
A previous introduced a new object type for the GPU functions
implemented by an external vendor library. This was done so they we did
not attempt to run tests on functions which we did not implement,
however this accidentally stopped them from being included in the actual
output. Fix this by checking the new type as well.
The long term goal is to remove this vendor handling altogether, but is
being used as a short-term solution to provide a math library on the
GPU which currently lacks one.
Previously, these tests expected that calling mktime with a struct tm
that caused overlow to succeed with return -1
(TimeConstants::OUT_OF_RANGE_RETURN_VALUE), however, the Succeeds call
expects the errno to be zero (no failure).
This patch fixes the expected calls to fail with EOVERFLOW. These tests
are only enabled to 32-bit systems, and are probably not being tested on
the arm32 buildbot, that's why this was not a problem before.
This test was setting tv_nsec to a negative value, which as per the
standard this is an EINVAL:
The value in the tv_nsec field was not in the range [0, 999999999] or
tv_sec was negative.
https://man7.org/linux/man-pages/man2/nanosleep.2.html
In patch D157792, the calls to SYS_llseek/SYS_llseek for 32-bit systems
were fixed in lseek.cpp but there was another implementation in file.cpp
that was missed.
To reduce the code duplication, this patch unifies both call sites to
use a new lseekimpl function.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D159208
`type_traits` and `utility` refer to each other in their
implementations. Also `type_traits` starts to become too big to be
manageable. This PR splits each function into individual files. FTR this
is [how libcxx handles large headers as
well](https://github.com/llvm/llvm-project/tree/main/libcxx/include/__type_traits).
The reland adds two missing functions : is_destructible_v and is_reference_v
`type_traits` and `utility` refer to each other in their
implementations. Also `type_traits` starts to become too big to be
manageable. This PR splits each function into individual files. FTR this
is [how libcxx handles large headers as
well](https://github.com/llvm/llvm-project/tree/main/libcxx/include/__type_traits).
Few printf config options have been setup using this new config system
along with their baremetal overrides. A follow up patch will add generation
of doc/config.rst, which will contain the full list of libc config options
and short description explaining how they affect the libc.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D159158
It's very important that the GPU build does not include any system
directories. We currently use `-ffreestanding` to disable a lot of these
features, but we can still accidentally include them if they are not
provided by `libc` yet. This patch changes this to use `-nostdinc` to
disable all standard search paths. Then we use the compiler's resource
directory to pick up the provided headers like `stdint.h`.
Differential Revision: https://reviews.llvm.org/D159445
sys/signal.h currently does not exist in LLVM-libc. Both glibc and
musl provides this header with simply:
> #include <signal.h>
(and a warning in musl libc)
If you try to build LLVM-libc in a sysroot with only LLVM-libc headers
and linux-headers installed then this will fail the build due to
missing headers. However, if --sysroot is not passed, then the
LLVM-libc build will pick up the header from the system which in turn
includes LLVM-libc's signal.h.
LLVM-libc includes <signal.h> in all other cases, so this simply
matches that. Providing <sys/signal.h> could also be done (instead)
for glibc compatibility.
Summary:
The HSA API explicitly states that the size is a count of uint32_t's not
a byte count. This was erroneously being used as a simple memcpy,
causing some weird behaviour. Fix this by correctly passing `1` to
initialize a single integer to zero.
This patch adds the necessary support to provide `assert` functionality
through the GPU `libc` implementation. This implementation creates a
special-case GPU implementation rather than relying on the common
version. This is because the GPU has special considerings for printing.
The assertion is printed out in chunks with `write_to_stderr`, however
when combined with the GPU execution model this causes 32+ threads to
all execute in-lock step. Meaning that we'll get a horribly fragmented
message. Furthermore, potentially thousands of threads could hit the
assertion at once and try to print even if we had it all in one
`printf`.
This is solved by having a one-time lock that each thread group / wave /
warp will attempt to claim. We only let one thread group pass through
while the others simply stop executing. Finally only the first thread in
that group will do the printing until we finally abort execution.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D159296
This function implements the `abort` function on the GPU. The
implementation here closely mirros the `exit` call where we first
synchornize with the RPC server to make sure it's listening and then we
exit on the GPU.
I was unsure if this should be a simple `__builtin_assert` on the GPU. I
elected to go with an RPC approach to make this a more "true" `abort`
call. That is, it should invoke some signal handlers and exit with the
proper code according to the implemented C library on the server.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D159210
The inbox/outbox loads are performed by the current warp, not a single thread.
The outbox load indicates whether a port has been successfully opened. If some
lanes in the warp think it has and others think the port open failed, as the
warp happened to be diverged when the load occurred, all the subsequent control
flow will be incorrect.
The inbox load indicates whether the machine on the other side of the RPC channel
has progressed. If lanes in the warp have different ideas about that, some will
try to progress their state transition while others won't. As far as the RPC layer
is concerned this is a performance problem and not a correctness one - none of the lanes
can start the transition early, only miss it and start late - but in practice the calls
layered on top of RPC do not have the interface required to detect this event and retry
the load on the stalled lanes, so the calls layered on top will be broken.
None of this is broken on amdgpu, but it's likely that the readfirstlane will have
beneficial performance properties there. Possible significant enough that it's
worth landing this ahead of fixing gpu::broadcast_value on volta.
Essentially volta wasn't adequately considered when writing this part of the protocol.
It's a bug present in the initial prototype and propagated thus far, because none of
the test cases push volta into a warp diverged state in the middle of the RPC sequence.
We should have some test cases for volta where port_open and equivalent are called
from diverged warps.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D159276
The GPU has the ability to sleep for very short periods of time. We can
map this to the existing `nanosleep` utility. This patch maps the
nanosleep utility to the existing hardware instructions as best as
possible.
Depends on D159118
Reviewed By: JonChesterfield, sivachandra
Differential Revision: https://reviews.llvm.org/D159225
Some older gcc toolchains don't define these on 32 bit platforms. This
is a problem for pigweed which uses an older gcc toolchain and targets
32 bit.
Differential Revision: https://reviews.llvm.org/D157112
Summary:
We should check for the GPU architectures first, since `__linux__` can
be set potentially during these compilations. Also the test needs to be
a hermetic test.
This patch implements the `clock()` function on the GPU. This function
is supposed to return a timestamp that can be converted into seconds
using the `CLOCKS_PER_SEC` macro. The GPU has a fixed frequency timer
that can be used for this purpose. However, there are some
considerations.
First is that AMDGPU does not have a statically known fixed frequency. I
know internally that the gfx10xx and gfx11xx series use a 100 MHz clock
which will probably remain for the future. Gfx9xx typically uses a 25
MHz clock except for the Vega 10 GPU. The only way to know for sure is
to look it up from the runtime. For this purpose, I elected to default
it to some known values and assign these to an exteranlly visible symbol
that can be initialized if needed. If we do not have a good guess we
just return zero.
Second is that the `CLOCKS_PER_SEC` macro only gives about a microsecond
of resolution. POSIX demands that it's 1,000,000 so it's best that we
keep with this tradition as almost all targets seem to respect this. The
reason this is important is because on the GPU we will almost assuredly
be copying the host's macro value (see the wrapper header) so we should
go with the POSIX version that's most likely to be set. (We could
probably make a warning if the included header doesn't match the
expected value).
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D159118
This patch adds two new macros to setjmp (STORE, STORE_FP) and two new
macros to longjmp (LOAD, LOAD_FP) that takes a register and a buff, then
select the correct asm instruction for rv32 and rv64.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D158640
This patch changes the instruction in set_thread_ptr from ld to mv,
as rv32 doesn't have the ld instruction, and mv is supported by both
rv32 and rv64.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D159110
This patch changes a test case that tests for overflow when time_t is
32-bit long, however, it was checking size_t instead of time_t.
This in on par with other testcases that correctly check the size of
time_t (asctime_test.cpp, gmtime_r_test.cpp and gmtime_test.cpp).
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D159113
Currently all of this logic expects to include and link with the headers
in the `linux/` directory. This patch adds a wrapper macro to optionally
add the appropriate dependency if it exists. This is preliminary to
adding some GPU specific versions of these files.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D159021
This patch adds some extra cases to the existing argument list test in
`libc`, mainly dealing with arguments of varying sizes and primitive
types.
The purpose of this patch is to provide a wider test area when we begin
to provide varargs support on the GPU as there is no other runtime code
that really tests it. So, running these tests more exhaustively in the
GPU libc project will serve as the runtime tests for GPU vararg support in
D158246.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D158867