Following up the discussion at
https://github.com/llvm/llvm-project/pull/73469#discussion_r1409593911
by @nickdesaulniers.
According to FreeBSD implementation
(https://android.googlesource.com/platform/bionic/+/refs/heads/main/libc/upstream-freebsd/lib/libc/stdlib/hcreate.c),
`hsearch` is able to handle the cases where the global table is not
properly initialized. To do this, FreeBSD actually allows hash table to
be dynamically resized. If the global table is uninitialized at the
first call, the table will be initialized with a minimal size; hence
subsequent insertion will be reasonable as the table grows
automatically.
This patch mimic such behaviors. More precisely, this patch introduces:
1. a full table iterator that scans each element in the table,
2. a resize routine that is automatically triggered whenever the load
factor is reached where it iterates the old table and insert the entries
into a new one,
3. more tests that cover the growth of the table.
This reverts commit 606653091d.
Post submit buildbots are now red. We can use these explicit errors to better
clean up existing warnings, then reland this.
Link: #73966
A recent commit introduced warnings observable when building unit tests.
If the
unit tests don't fail when warnings are introduced into the build, then
we
might fail to notice them in the stream of output from check-libc.
Link: https://github.com/llvm/llvm-project/pull/72763/files#r1410932348
Some of the files in the docs/ directory are from 2019 and haven't been
updated since. This patch updates implementation_standard.rst,
source_tree_layout.rst, and has some minor fixes for strings.rst. It
also marks the most severely out of date files with a warning. These
files will be updated in a later patch.
Floating point properties are a combination of target OS, target
architecture and compiler support.
- Adding target OS detection,
- Moving floating point type detection to its own file.
This is in preparation of adding support for `_Float16` which requires
testing compiler **version** and target architecture.
Implements the `nexttoward`, `nexttowardf` and `nexttowardl` functions.
Also, raise excepts required by the standard in `nextafter` functions.
cc: @lntue
We compute `pow(x, y)` using the formula
```
pow(x, y) = x^y = 2^(y * log2(x))
```
We follow similar steps as in `log2f(x)` and `exp2f(x)`, by breaking
down into `hi + mid + lo` parts, in which `hi` parts are computed using
the exponent field directly, `mid` parts will use look-up tables, and
`lo` parts are approximated by polynomials.
We add some speedup for common use-cases:
```
pow(2, y) = exp2(y)
pow(10, y) = exp10(y)
pow(x, 2) = x * x
pow(x, 1/2) = sqrt(x)
pow(x, -1/2) = rsqrt(x) - to be added
```
Summary:
We previously had to disable these string functions because they were
not compatible with the definitions coming from the GNU / host
environment. The GPU, when exporting its declarations, has a very
difficult requirement that it be compatible with the host environment as
both sides of the compilation need to agree on definitions and what's
present.
This patch more or less gives up an just copies the definitions as
expected by `glibc` if they are provided that way, otherwise we fall
back to the accepted way. This is the alternative solution to an
existing PR which instead disable's GCC's handling.
Summary:
This function follows closely with the pattern of all the other
functions. That is, making a new opcode and forwarding the call to the
host. However, this also required modifying the test somewhat. It seems
that not all `libc` implementations follow the same error rules as are
tested here, and it is not explicit in the standard, so we simply
disable these EOF checks when targeting the GPU.
Implementing expm1 function for double precision based on exp function
algorithm:
- Reduced x = log2(e) * (hi + mid1 + mid2) + lo, where:
* hi is an integer
* mid1 * 2^-6 is an integer
* mid2 * 2^-12 is an integer
* |lo| < 2^-13 + 2^-30
- Then exp(x) - 1 = 2^hi * 2^mid1 * 2^mid2 * exp(lo) - 1 ~ 2^hi *
(2^mid1 * 2^mid2 * (1 + lo * P(lo)) - 2^(-hi) )
- We evaluate fast pass with P(lo) is a degree-3 Taylor polynomial of
(e^lo - 1) / lo in double precision
- If the Ziv accuracy test fails, we use degree-6 Taylor polynomial of
(e^lo - 1) / lo in double double precision
- If the Ziv accuracy test still fails, we re-evaluate everything in
128-bit precision.
Summary:
These wrapper headers need to work around things in the standard
headers. The existing workarounds didn't correctly handle the macros for
`iscascii` and `toascii`. Additionally, `memrchr` can't be used because
it has a different declaration for C++ mode. Fix this so it can be
compiled.
Summary:
This patch adds the necessary entrypoints to handle the `fseek`,
`fflush`, and `ftell` functions. These are all very straightfoward, we
simply make RPC calls to the associated function on the other end.
Implementing it this way allows us to more or less borrow the state of
the stream from the server as we intentionally maintain no internal
state on the GPU device. However, this does not implement the `errno`
functinality so that must be ignored.
In a previous patch, the printf writer was rewritten to use a single
writer class with a buffer and a callback hook. This patch refactors
scanf's reader to match conceptually.
Summary:
This patch implements the `fgets`, `getc`, `fgetc`, and `getchar`
functions on the GPU. Their implementations are straightforward enough.
One thing worth noting is that the implementation of `fgets` will be
extremely slow due to the high latency to read a single char. A faster
solution would be to make a new RPC call to call `fgets` (due to the
special rule that newline or null breaks the stream). But this is left
out because performance isn't the primary concern here.
In the document on undefined behavior, I noted that writing down your
decisions is very important. This document contains all the information
for compile flags and undefined behavior for our printf.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D158311
This patch adds the long double table option for printf into the new
configuration scheme. This allows it to be set for most targets but
unset for baremetal.
Summary:
This patch simply adds the necessary config to enable qsort and bsearch
on the GPU. It is *highly* unlikely that anyone will use these, as they
are single threaded, but we may as well support all entrypoints that we
can.
Summary:
This patch implements fwrite, putc, putchar, and fputc on the GPU. These
are very straightforward, the main difference for the GPU implementation
is that we are currently ignoring `errno`. This patch also introduces a
minimal smoke test for `putc` that is an exact copy of the `puts` test
except we print the string char by char. This also modifies the `fopen`
test to use `fwrite` to mirror its use of `fread` so that it is tested
as well.
This patch adds the necessary support to provide `assert` functionality
through the GPU `libc` implementation. This implementation creates a
special-case GPU implementation rather than relying on the common
version. This is because the GPU has special considerings for printing.
The assertion is printed out in chunks with `write_to_stderr`, however
when combined with the GPU execution model this causes 32+ threads to
all execute in-lock step. Meaning that we'll get a horribly fragmented
message. Furthermore, potentially thousands of threads could hit the
assertion at once and try to print even if we had it all in one
`printf`.
This is solved by having a one-time lock that each thread group / wave /
warp will attempt to claim. We only let one thread group pass through
while the others simply stop executing. Finally only the first thread in
that group will do the printing until we finally abort execution.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D159296
This function implements the `abort` function on the GPU. The
implementation here closely mirros the `exit` call where we first
synchornize with the RPC server to make sure it's listening and then we
exit on the GPU.
I was unsure if this should be a simple `__builtin_assert` on the GPU. I
elected to go with an RPC approach to make this a more "true" `abort`
call. That is, it should invoke some signal handlers and exit with the
proper code according to the implemented C library on the server.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D159210
The GPU has the ability to sleep for very short periods of time. We can
map this to the existing `nanosleep` utility. This patch maps the
nanosleep utility to the existing hardware instructions as best as
possible.
Depends on D159118
Reviewed By: JonChesterfield, sivachandra
Differential Revision: https://reviews.llvm.org/D159225
This patch implements the `clock()` function on the GPU. This function
is supposed to return a timestamp that can be converted into seconds
using the `CLOCKS_PER_SEC` macro. The GPU has a fixed frequency timer
that can be used for this purpose. However, there are some
considerations.
First is that AMDGPU does not have a statically known fixed frequency. I
know internally that the gfx10xx and gfx11xx series use a 100 MHz clock
which will probably remain for the future. Gfx9xx typically uses a 25
MHz clock except for the Vega 10 GPU. The only way to know for sure is
to look it up from the runtime. For this purpose, I elected to default
it to some known values and assign these to an exteranlly visible symbol
that can be initialized if needed. If we do not have a good guess we
just return zero.
Second is that the `CLOCKS_PER_SEC` macro only gives about a microsecond
of resolution. POSIX demands that it's 1,000,000 so it's best that we
keep with this tradition as almost all targets seem to respect this. The
reason this is important is because on the GPU we will almost assuredly
be copying the host's macro value (see the wrapper header) so we should
go with the POSIX version that's most likely to be set. (We could
probably make a warning if the included header doesn't match the
expected value).
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D159118
Implement double precision exp2 function correctly rounded for all
rounding modes. Using the same algorithm as double precision exp function in
https://reviews.llvm.org/D158551.
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D158812
Implement double precision exp function correctly rounded for all
rounding modes. Using 4 stages:
- Range reduction: reduce to `exp(x) = 2^hi * 2^mid1 * 2^mid2 * exp(lo)`.
- Use 64 + 64 LUT for 2^mid1 and 2^mid2, and use cubic Taylor polynomial to
approximate `(exp(lo) - 1) / lo` in double precision. Relative error in this
step is bounded by 1.5 * 2^-63.
- If the rounding test fails, use degree-6 Taylor polynomial to approximate
`exp(lo)` in double-double precision. Relative error in this step is bounded by
2^-99.
- If the rounding test still fails, use degree-7 Taylor polynomial to compute
`exp(lo)` in ~128-bit precision.
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D158551
This patch implements the `fopen`, `fclose`, and `fread` functions on
the GPU. These are pretty much re-implemented from what existed but
using the new interface. Having this subset allows us to test the
interface a bit more strenuously since we can write and read to a file.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D157622
The GPU has much tighter requirements for handling IO functions.
Previously we attempted to define the GPU as one of the platform files.
Using a common interface allowed us to easily define these functions
without much extra work. However, it became more clear that this was a
poor fit for the GPU. The file interface uses function pointers, which
prevented inlining and caused bad perfromance and resource usage on the
GPU. Further, using an actual `FILE` type rather than referring to it as
a host stub prevented us from usin files coming from the host on the GPU
device.
After talking with @sivachandra, the approach now is to simply define
GPU specific versions of the functions we intend to support. Also, we
are ignoring `errno` for the time being as it is unlikely we will ever
care about supporting it fully.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D157427
Update documentaiton now that macros are laid out in a more structured way.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D143911
This patch does the noisy work of removing the test opcodes from the
exported interface to an interface that is only visible in `libc`. The
benefit of this is that we both test the exported RPC registration more
directly, and we do not need to give this interface to users.
I have decided to export any opcode that is not a "core" libc feature as
having its MSB set in the opcode. We can think of these as non-libc
"extensions".
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154848
These functions have definitions differing between C and C++. GNU
respects the C++ definitions while the LLVM libc does not. This causes
many bugs and the current hack creates other issues. Rather than hack
around this I'd rather temporarily disable these than regress with the
integration into other offloading languages. We lose test support for
them but we should be able to re-enable these once the `libc` headers
provide these correctly.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154850
This patch adds the necessary support for the fopen and fclose functions
to work on the GPU via RPC. I added a new test that enables testing this
with the minimal features we have on the GPU. I will update it once we
have `fread` and `fwrite` to actually check the outputted strings. For
now I just relied on checking manually via the outpuot temp file.
Reviewed By: JonChesterfield, sivachandra
Differential Revision: https://reviews.llvm.org/D154519
Another low hanging fruit we can put on the GPU, this ports the tests
over to the hermetic framework so we can run them on the GPU.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D154540
This patch simply enables the `div`, `ldiv,` and, `lldiv` functions on
the GPU. This should be straightforward enough.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D154143
The RPC calls all have delays associated with them. Currently the `exit`
function does an async send and immediately exits the GPU. This can have
the effect that the RPC server never sees the exit call and we continue.
This patch changes that to first sync with the server before continuing
to perform its exit. There is still a hazard here, where the kernel can
complete before the RPC call reads back its response, but this is simply
multi-threaded hazards. This change ensures that the server *will*
always exit some time after the GPU exits.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154112
This provides some basic motivation behind the GPU libc. Suggests are welcome.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D152028