This makes the interface less error prone. The acquire was previously
forgotten. Release is currently missing if recv() is the last operation made
before close.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D153571
When LLVM_ENABLE_PER_TARGET_RUNTIME_DIR is enabled, place headers
in `include/<target>` directory, otherwise use `include/`.
Differential Revision: https://reviews.llvm.org/D152592
This patch prepares the RPC interface to be installed. We place this in
the existing `llvm-gpu-none` directory as it will also give us access to
the generated `libc` headers for the opcodes.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D153040
Once integrated in our codebase the patch triggered a bunch of failing
tests. We do not yet understand where the bug is but we revert it to
move forward with integration.
This reverts commit 5e32765c15ab8df3d2635a2bb5078c5b1d5714d5.
Before this change, a separate static method named cleanup was used to
cleanup the file. Instead, now the close method cleans up the full file
object using the platform's close function.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D153377
This pass used to cause huge compile time regressions, That has been
address and can now be re-added.
Differential Revision: https://reviews.llvm.org/D153374
Currently the implementation of the RPC interface requires a flexible
struct. This caused problems when compilling the RPC server with GCC as
would be required if trying to export the RPC server interface. This
required that we either move to the `x[1]` workaround or make it a
template parameter. While just using `x[1]` would be much less noisy,
this is technically undefined behavior. For this reason I elected to use
templates.
The downside to using templates is that the server code must now be able
to handle multiple different types at runtime. I was unable to find a
good solution that didn't rely on type erasure so I simply branch off of
the given value.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D153304
This patch:
(1) adds the add_with_carry_const and sub_with_borrow_const constexpr calls
to add and sub, respectively. Both add and sub are constexpr calls and
were call the non-constexpr version of add/sub_with_borrow.
(2) adds explicit UIntType construct calls in some fp tests.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D150223
Recently the AMDGPU backend automatically enables a pass to optimize
atomics. This results in the LTO build taking about 10x longer in all
cases. For now we disable this by default as was the case before the
patch in D152649.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D153232
This provides some basic motivation behind the GPU libc. Suggests are welcome.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D152028
Currently the GPU has restrictions on how many tests can be run in
parallel due to resource constraints. However, building these tests can
take a long time so we want to be able to build them in parallel. This
patch introduces the option `LIBC_GPU_TEST_JOBS` which is set to the
number of threads to run in parallel.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D153157
These currently give warnings for unused variables or a default case
where everything is covered.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D153137
This patch adds a test directly for the `fputs` function similar to the
existing `puts` test. This lets us know that the default file pointers
are function and the `fputs` interface works.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D152288
These are the only variables I could find that use LIBC_INLINE. Note, these are namespace scoped constexpr so local linkage is implied. inline is useful here to silence clang's unused-const-variable variable. For Fuchsia, the distinction between LIBC_INLINE and LIBC_INLINE_VAR is helpful because we define LIBC_INLINE as `[[gnu::always_inline]] inline` when building with gcc. This isn't meaningful on variables.
Alternatively, we could make these variables simply constexpr and also add `[[maybe_unused]]`
Reviewed By: sivachandra, mcgrathr
Differential Revision: https://reviews.llvm.org/D152951
Switching to this interface we neglected to actually write the output
from the malloc call to the RPC buffer. Fix this so the tests pass
again.
Differential Revision: https://reviews.llvm.org/D153069
The GPU port of the LLVM C library needs to export a few extensions to
the interface such that users can interface with it. This patch adds the
necessary logic to define a GPU extension. Currently, this only exports
a `rpc_reset_client` function. This allows us to use the server in
D147054 to set up the RPC interface outside of `libc`.
Depends on https://reviews.llvm.org/D147054
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D152283
This patch begins providing a generic static library that wraps around
the raw `rpc.h` interface. As discussed in the corresponding RFC,
https://discourse.llvm.org/t/rfc-libc-exporting-the-rpc-interface-for-the-gpu-libc/71030,
we want to begin exporting RPC services to external users. In order to
do this we decided to not expose the `rpc.h` header by wrapping around
its functionality. This is done with a C-interface as we make heavy use
of callbacks and allows us to provide a predictable interface.
Reviewed By: JonChesterfield, sivachandra
Differential Revision: https://reviews.llvm.org/D147054
These functions were previously removed due to problems running the
tests with `errno` in them. This was resolved previously by making the
internal implementation of these functions use a global `errno` so that
tests can still use `errno` functionality as long as they are run with a
single thread. This allows us to re-enable these tests as a previous
patch has also resolved the issue where the `stdlib` tests could not be
hermetic due to the dependence on system rounding functions.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D153016
This patch moves the definitions of the standard IO streams to the
platform file definition. This is necessary because previously we had a
level of indirection where the stream's `FILE *` was initialized based
on the pointer to the internal `__llvm_libc` version. This cannot be
resolved ahead of time by the linker because the address will not be
known until runtime. This caused the previous implementation to emit a
global constructor to initialize the pointer to the actual `FILE *`. By
moving these definitions so that we can bind their address to the
original file type we can avoid this global constructor.
This file keeps the entrypoints, but makes them empty files only
containing an external reference. This is so they still appear as
entrypoints and get emitted as declarations in the generated headers.
Reviewed By: lntue, sivachandra
Differential Revision: https://reviews.llvm.org/D152983
This adds the generic FMA utilities for the GPU. We implement these
through the builtins which map to the FMA instructions in the ISA. These
may not have strict compliance with other assumptions in the the `libc`
such as rounding modes. I've included the relevant information on how
the GPU vendors map the behaviour. This should help make it easier to
implement some future generic versions.
Depends on D152486
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D152923
This patch adds an outline to begin adding a `libmgpu.a` file for
provindg math on the GPU. Currently, this is most likely going to be
wrapping around existing vendor libraries and placing them in a more
usable format. Long term, we would like to provide our own
implementations of math functions that can be used instead.
This patch works by simply forwarding the calls to the standard C math
library calls like `sin` to the appropriate vendor call like `__nv_sin`.
Currently, we will use the vendor libraries directly and link them in
via `-mlink-builtin-bitcode`. This is necessary because of bizarre
interactions with the generic bitcode, `-mlink-builtin-bitcode`
internalizes and only links in the used symbols, furthermore is
propagates the target's default attributes and its the only "truly"
correct way to pull in these vendor bitcode libraries without error.
If the vendor libraries are not availible at build time, we will still
create the `libmgpu.a`, but we will expect that the vendor library
definitions will be provided by the user's compilation as is made
possible by https://reviews.llvm.org/D152442.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D152486
Fixing an issue with LLVM libc's fenv.h defined rounding mode macros
differently from system libc, making get_round() return different values from
fegetround(). Also letting math tests to skip rounding modes that cannot be
set. This should allow math tests to be run on platforms in which fenv.h is not
implemented yet.
This allows us to re-enable hermatic floating point tests in
https://reviews.llvm.org/D151123 and reverting https://reviews.llvm.org/D152742.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D152873
We define LIBC_INLINE to include [[clang::internal_linkage]], and these
must appear before other specifiers. Additionally, there was also a
missing cast that was causing warnings.
Differential Revision: https://reviews.llvm.org/D152865
This patch mimics the behavior of Google Test and allow users to log custom messages after all flavors of ASSERT_ / EXPECT_.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D152630
This patch mimics the behavior of Google Test and allow users to log custom messages after all flavors of ASSERT_ / EXPECT_.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D152630
Most of the time `memmove` is called on buffers that are disjoint, in that case we can use `memcpy` which is faster.
The additional test is branchless on x86, aarch64 and RISCV with the zbb extension (bitmanip).
On x86 this patch adds a latency of 2 to 3 cycles.
Before
```
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------
BM_Memmove/0/0_median 5.00 ns 5.00 ns 10 bytes_per_cycle=1.25477/s bytes_per_second=2.62933G/s items_per_second=199.87M/s __llvm_libc::memmove,memmove Google A
BM_Memmove/1/0_median 6.21 ns 6.21 ns 10 bytes_per_cycle=3.22173/s bytes_per_second=6.75106G/s items_per_second=160.955M/s __llvm_libc::memmove,memmove Google B
BM_Memmove/2/0_median 8.09 ns 8.09 ns 10 bytes_per_cycle=5.31462/s bytes_per_second=11.1366G/s items_per_second=123.603M/s __llvm_libc::memmove,memmove Google D
BM_Memmove/3/0_median 5.95 ns 5.95 ns 10 bytes_per_cycle=2.71865/s bytes_per_second=5.69687G/s items_per_second=167.967M/s __llvm_libc::memmove,memmove Google L
BM_Memmove/4/0_median 5.63 ns 5.63 ns 10 bytes_per_cycle=2.28294/s bytes_per_second=4.78383G/s items_per_second=177.615M/s __llvm_libc::memmove,memmove Google M
BM_Memmove/5/0_median 5.68 ns 5.68 ns 10 bytes_per_cycle=2.16798/s bytes_per_second=4.54295G/s items_per_second=176.015M/s __llvm_libc::memmove,memmove Google Q
BM_Memmove/6/0_median 7.46 ns 7.46 ns 10 bytes_per_cycle=3.97619/s bytes_per_second=8.332G/s items_per_second=134.044M/s __llvm_libc::memmove,memmove Google S
BM_Memmove/7/0_median 5.40 ns 5.40 ns 10 bytes_per_cycle=1.79695/s bytes_per_second=3.76546G/s items_per_second=185.211M/s __llvm_libc::memmove,memmove Google U
BM_Memmove/8/0_median 5.62 ns 5.62 ns 10 bytes_per_cycle=3.18747/s bytes_per_second=6.67927G/s items_per_second=177.983M/s __llvm_libc::memmove,memmove Google W
BM_Memmove/9/0_median 101 ns 101 ns 10 bytes_per_cycle=9.77359/s bytes_per_second=20.4803G/s items_per_second=9.9333M/s __llvm_libc::memmove,uniform 384 to 4096
```
After
```
BM_Memmove/0/0_median 3.57 ns 3.57 ns 10 bytes_per_cycle=1.71375/s bytes_per_second=3.59112G/s items_per_second=280.411M/s __llvm_libc::memmove,memmove Google A
BM_Memmove/1/0_median 4.52 ns 4.52 ns 10 bytes_per_cycle=4.47557/s bytes_per_second=9.37843G/s items_per_second=221.427M/s __llvm_libc::memmove,memmove Google B
BM_Memmove/2/0_median 5.70 ns 5.70 ns 10 bytes_per_cycle=7.37396/s bytes_per_second=15.4519G/s items_per_second=175.399M/s __llvm_libc::memmove,memmove Google D
BM_Memmove/3/0_median 4.47 ns 4.47 ns 10 bytes_per_cycle=3.4148/s bytes_per_second=7.15563G/s items_per_second=223.743M/s __llvm_libc::memmove,memmove Google L
BM_Memmove/4/0_median 4.53 ns 4.53 ns 10 bytes_per_cycle=2.86071/s bytes_per_second=5.99454G/s items_per_second=220.69M/s __llvm_libc::memmove,memmove Google M
BM_Memmove/5/0_median 4.19 ns 4.19 ns 10 bytes_per_cycle=2.5484/s bytes_per_second=5.3401G/s items_per_second=238.924M/s __llvm_libc::memmove,memmove Google Q
BM_Memmove/6/0_median 5.02 ns 5.02 ns 10 bytes_per_cycle=5.94164/s bytes_per_second=12.4505G/s items_per_second=199.14M/s __llvm_libc::memmove,memmove Google S
BM_Memmove/7/0_median 4.03 ns 4.03 ns 10 bytes_per_cycle=2.47028/s bytes_per_second=5.17641G/s items_per_second=247.906M/s __llvm_libc::memmove,memmove Google U
BM_Memmove/8/0_median 4.70 ns 4.70 ns 10 bytes_per_cycle=3.84975/s bytes_per_second=8.06706G/s items_per_second=212.72M/s __llvm_libc::memmove,memmove Google W
BM_Memmove/9/0_median 90.7 ns 90.7 ns 10 bytes_per_cycle=10.8681/s bytes_per_second=22.7739G/s items_per_second=11.02M/s __llvm_libc::memmove,uniform 384 to 4096
```
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D152811
Add Int<> and Int128 types to replace the usage of __int128_t in math
functions. Clean up to make sure that (U)Int128 and __(u)int128_t are
interchangeable in the code base.
Reviewed By: sivachandra, mikhail.ramalho
Differential Revision: https://reviews.llvm.org/D152459
A patch enabled this test which uses that `add_fp_unittest`.
Unfortunately we do not support these on the GPU because it attempts to
link in the floating point utils which are not built supporting
hermetic tests. This was attempted to be fixed in D151123 but that had
to be reverted. For now disable these so the tests pass.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D152742
This patch adds the reentrent qsort entrypoint, qsort_r. This is done by
extending the qsort functionality and moving it to a shared utility
header. For this reason the qsort_r tests focus mostly on the places
where it differs from qsort, since they share the same sorting code.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D152467
The current argument types are currently switched around for ftruncate
and truncate. Currently passes tests because the internal definitions
inside the __llvm_libc namespace are fine.
Reviewed By: michaelrj, thesamesam, sivachandra
Differential Revision: https://reviews.llvm.org/D152664
This is based on ideas from @nafi to:
- use a branchless version of 'cmp' for 'uint32_t',
- completely resolve the lexicographic comparison through vector
operations when wide types are available. We also get rid of byte
reloads and serializing '__builtin_ctzll'.
I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in sub-sequent patches.
The code been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.
Reviewed By: nafi3000
Differential Revision: https://reviews.llvm.org/D148717
Many math functions need to check for floating point rounding modes to
return correct values. Currently most of them use the internal implementation
of `fegetround`, which is platform-dependent and blocking math functions to be
enabled on platforms with unimplemented `fegetround`. In this change, we add
platform independent rounding mode checks and switching math functions to use
them instead. https://github.com/llvm/llvm-project/issues/63016
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D152280