Commit Graph

342 Commits

Author SHA1 Message Date
Joseph Huber
c850ea1498 [libc] Support fopen / fclose on the GPU
This patch adds the necessary support for the fopen and fclose functions
to work on the GPU via RPC. I added a new test that enables testing this
with the minimal features we have on the GPU. I will update it once we
have `fread` and `fwrite` to actually check the outputted strings. For
now I just relied on checking manually via the outpuot temp file.

Reviewed By: JonChesterfield, sivachandra

Differential Revision: https://reviews.llvm.org/D154519
2023-07-05 18:31:58 -05:00
Joseph Huber
5db39796bf [libc] Support timing information in libc tests
This patch adds the necessary support to provide timing information in
`libc` tests. This is useful for determining which tests look what
amount of time. We also can use this as a test basis for providing more
fine-grained timing when implementing things on the GPU.

The main difficulty with this is the fact that the AMDGPU fixed
frequency clock operates at an unknown frequency. We need to read this
on a per-card basis from the driver and then copy it in. NVPTX on the
other hand has a fixed clock at a resolution of 1ns. I have also
increased the resolution of the print-outs as the majority of these are
below a millisecond for me.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D154446
2023-07-05 14:27:08 -05:00
Michael Jones
cfbcbc8f88 [libc] fix MPFR rounding problems in fuzz test
The accuracy for the MPFR numbers in the strtofloat fuzz test was set
too high, causing rounding issues when rounding to a smaller final
result.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D154150
2023-07-05 10:53:40 -07:00
Joseph Huber
df52a22b1b [libc] Make the RPC server target always available
This patch makes sure that we always build the RPC server. The proposed
used for this is to begin integrating this server implementation into
`libomptarget`. That requires that we build this server ahead of time
when using a `LLVM_ENABLE_PROJECTS` build. Make a few tweaks to ensure
that the GCC compiler which may be used for this build doesn't complain.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D154105
2023-06-30 11:30:57 -05:00
Joseph Huber
62f57bc9b0 [libc] Add other RPC callback methods to the RPC server
This patch adds the other two methods to the server so the external
users can use the interface through the obfuscated interface.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D154224
2023-06-30 11:29:37 -05:00
Joseph Huber
667c10353e [libc] Fix the implementation of exit on the GPU
The RPC calls all have delays associated with them. Currently the `exit`
function does an async send and immediately exits the GPU. This can have
the effect that the RPC server never sees the exit call and we continue.
This patch changes that to first sync with the server before continuing
to perform its exit. There is still a hazard here, where the kernel can
complete before the RPC call reads back its response, but this is simply
multi-threaded hazards. This change ensures that the server *will*
always exit some time after the GPU exits.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D154112
2023-06-29 13:22:23 -05:00
Tue Ly
f320fefc4a [libc][math] Implement erff function correctly rounded to all rounding modes.
Implement correctly rounded `erff` functions.

For `x >= 4`, `erff(x) = 1` for `FE_TONEAREST` or `FE_UPWARD`, `0x1.ffffep-1` for `FE_DOWNWARD` or `FE_TOWARDZERO`.

For `0 <= x < 4`, we divide into 32 sub-intervals of length `1/8`, and use a degree-15 odd polynomial to approximate `erff(x)` in each sub-interval:
```
  erff(x) ~ x * (c0 + c1 * x^2 + c2 * x^4 + ... + c7 * x^14).
```

For `x < 0`, we can use the same formula as above, since the odd part is factored out.

Performance tested with `perf.sh` tool from the CORE-MATH project on AMD Ryzen 9 5900X:

Reciprocal throughput (clock cycles / op)
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput --  with -march=native      (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 11.790 + 0.182 clc/call; Median-Min = 0.154 clc/call; Max = 12.255 clc/call;
-- CORE-MATH reciprocal throughput --  with -march=x86-64-v2      (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 14.205 + 0.151 clc/call; Median-Min = 0.159 clc/call; Max = 15.893 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 45.519 + 0.445 clc/call; Median-Min = 0.552 clc/call; Max = 46.345 clc/call;

-- LIBC reciprocal throughput --  with -mavx2 -mfma     (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 9.595 + 0.214 clc/call; Median-Min = 0.220 clc/call; Max = 9.887 clc/call;
-- LIBC reciprocal throughput --  with -msse4.2     (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 10.223 + 0.190 clc/call; Median-Min = 0.222 clc/call; Max = 10.474 clc/call;
```

and latency (clock cycles / op):
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency --  with -march=native      (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 38.566 + 0.391 clc/call; Median-Min = 0.503 clc/call; Max = 39.170 clc/call;
-- CORE-MATH latency --  with -march=x86-64-v2      (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 43.223 + 0.667 clc/call; Median-Min = 0.680 clc/call; Max = 43.913 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 111.613 + 1.267 clc/call; Median-Min = 1.696 clc/call; Max = 113.444 clc/call;

-- LIBC latency --  with -mavx2 -mfma     (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 40.138 + 0.410 clc/call; Median-Min = 0.536 clc/call; Max = 40.729 clc/call;
-- LIBC latency --  with -msse4.2     (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 44.858 + 0.872 clc/call; Median-Min = 0.814 clc/call; Max = 46.019 clc/call;
```

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D153683
2023-06-28 13:58:37 -04:00
Joseph Huber
31c154881c [libc] Allow the RPC client to be initialized via a H2D memcpy
The RPC client must be initialized to set a pointer to the underlying
buffer. This is currently done with the `reset` method which may not be
ideal for the use-case. We want runtimes to be able to initialize this
without needing to call a kernel. Recent changes allowed the `Client`
type to be trivially copyable. That means we can create a client on the
server side and then copy it over. To that end we take the existing
externally visible symbol and initialize it to the client's pointer.
Therefore we can look up the symbol and copy it over once loaded.

No test currently, I tested with a demo OpenMP application but couldn't think of
how to put that in-tree.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D153633
2023-06-26 10:41:32 -05:00
Joseph Huber
e0b487bfc0 [libc] Rename and install the RPC server interface
This patch prepares the RPC interface to be installed. We place this in
the existing `llvm-gpu-none` directory as it will also give us access to
the generated `libc` headers for the opcodes.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D153040
2023-06-21 11:26:24 -05:00
Joseph Huber
4272d09196 [libc][NFC] Cleanup the RPC server implementation prior to installing
This does some simple cleanup prior to landing the patch to install
these.

Differential Revision: https://reviews.llvm.org/D153439
2023-06-21 11:14:20 -05:00
Joseph Huber
964a535bfa [libc] Remove flexible array and replace with a template
Currently the implementation of the RPC interface requires a flexible
struct. This caused problems when compilling the RPC server with GCC as
would be required if trying to export the RPC server interface. This
required that we either move to the `x[1]` workaround or make it a
template parameter. While just using `x[1]` would be much less noisy,
this is technically undefined behavior. For this reason I elected to use
templates.

The downside to using templates is that the server code must now be able
to handle multiple different types at runtime. I was unable to find a
good solution that didn't rely on type erasure so I simply branch off of
the given value.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D153304
2023-06-20 15:22:37 -05:00
Joseph Huber
490958b9ea [libc][obvious] Actually return the value from malloc for NVPTX
Switching to this interface we neglected to actually write the output
from the malloc call to the RPC buffer. Fix this so the tests pass
again.

Differential Revision: https://reviews.llvm.org/D153069
2023-06-15 15:13:11 -05:00
Joseph Huber
dcdfc963d7 [libc] Export GPU extensions to libc for external use
The GPU port of the LLVM C library needs to export a few extensions to
the interface such that users can interface with it. This patch adds the
necessary logic to define a GPU extension. Currently, this only exports
a `rpc_reset_client` function. This allows us to use the server in
D147054 to set up the RPC interface outside of `libc`.

Depends on https://reviews.llvm.org/D147054

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D152283
2023-06-15 11:02:24 -05:00
Joseph Huber
719d77ed28 [libc] Begin implementing a library for the RPC server
This patch begins providing a generic static library that wraps around
the raw `rpc.h` interface. As discussed in the corresponding RFC,
https://discourse.llvm.org/t/rfc-libc-exporting-the-rpc-interface-for-the-gpu-libc/71030,
we want to begin exporting RPC services to external users. In order to
do this we decided to not expose the `rpc.h` header by wrapping around
its functionality. This is done with a C-interface as we make heavy use
of callbacks and allows us to provide a predictable interface.

Reviewed By: JonChesterfield, sivachandra

Differential Revision: https://reviews.llvm.org/D147054
2023-06-15 11:02:23 -05:00
Tue Ly
055be3c30c [libc] Enable hermetic floating point tests again.
Fixing an issue with LLVM libc's fenv.h defined rounding mode macros
differently from system libc, making get_round() return different values from
fegetround().  Also letting math tests to skip rounding modes that cannot be
set.  This should allow math tests to be run on platforms in which fenv.h is not
implemented yet.

This allows us to re-enable hermatic floating point tests in
https://reviews.llvm.org/D151123 and reverting https://reviews.llvm.org/D152742.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D152873
2023-06-14 10:53:35 -04:00
Guillaume Chatelet
9902fc8dad [libc] Enable custom logging in LibcTest
This patch mimics the behavior of Google Test and allow users to log custom messages after all flavors of ASSERT_ / EXPECT_.

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D152630
2023-06-14 13:37:50 +00:00
Guillaume Chatelet
bdb07c98c4 Revert D152630 "[libc] Enable custom logging in LibcTest"
Failing buildbot https://lab.llvm.org/buildbot/#/builders/73/builds/49707
This reverts commit 9a7b4c9348.
2023-06-14 10:31:49 +00:00
Guillaume Chatelet
9a7b4c9348 [libc] Enable custom logging in LibcTest
This patch mimics the behavior of Google Test and allow users to log custom messages after all flavors of ASSERT_ / EXPECT_.

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D152630
2023-06-14 10:26:18 +00:00
Tue Ly
37458f6693 [libc][math] Move str method from FPBits class to testing utils.
str method of FPBits class is only used for pretty printing its objects
in tests.  It brings cpp::string dependency to FPBits class, which is not ideal
for embedded use case.  We move str method to a free function in test utils and
remove this dependency of FPBits class.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D152607
2023-06-10 02:50:58 -04:00
Joseph Huber
168fa31816 [libc] Fix some tests on NVPTX due to insufficient stack size
A few of these tests were disabled due to failing on NVPTX. After
looking into it the vast majority of these cases were due to
insufficient stack memory. This can be worked around by increasing the
stack size in the loader or by reducing the memory usage in the case of
large string constants.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D152583
2023-06-09 16:42:14 -05:00
Michael Jones
688b9730d1 [libc] add options to printf decimal floats
This patch adds three options for printf decimal long doubles, and these
can also apply to doubles.

1. Use a giant table which is fast and accurate, but takes up ~5MB).
2. Use dyadic floats for approximations, which only gives ~50 digits of
   accuracy but is very fast.
3. Use large integers for approximations, which is accurate but very
   slow.

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D150399
2023-06-08 14:23:15 -07:00
Joseph Huber
e6a350df10 [libc] Replace the PRINT_TO_STDERR opcode for RPC printing.
A previous patch added general support for printing via the RPC
interface. we should consolidate this functionality and get rid of the
old opcode that was used for simple testing.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D152211
2023-06-05 19:28:30 -05:00
Joseph Huber
a59e1712fa [libc][obvious] Fix conditional when CUDA is not found
If CUDA is not found this string will expand into nothing. We need to
surround it with a string otherwise it will cause build failures.

Differential Revision: https://reviews.llvm.org/D152209
2023-06-05 18:51:23 -05:00
Joseph Huber
e6c401b5e8 [libc] Add initial support for 'puts' and 'fputs' to the GPU
This patch adds the initial support required to support basic priting in
`stdio.h` via `puts` and `fputs`. This is done using the existing LLVM C
library `File` API. In this sense we can think of the RPC interface as
our system call to dump the character string to the file. We carry a
`uintptr_t` reference as our native "file descriptor" as it will be used
as an opaque reference to the host's version once functions like
`fopen` are supported.

For some unknown reason the declaration of the `StdIn` variable causes
both the AMDGPU and NVPTX backends to crash if I use the `READ` flag.
This is not used currently as we only support output now, but it needs
to be fixed

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D151282
2023-06-05 17:56:55 -05:00
Joseph Huber
a621308881 [libc] Implement basic malloc and free support on the GPU
This patch adds support for the `malloc` and `free` functions. These
currently aren't implemented in-tree so we first add the interface
filies.

This patch provides the most basic support for a true `malloc` and
`free` by using the RPC interface. This is functional, but in the future
we will want to implement a more intelligent system and primarily use
the RPC interface more as a `brk()` or `sbrk()` interface only called
when absolutely necessary. We will need to design an intelligent
allocator in the future.

The semantics of these memory allocations will need to be checked. I am
somewhat iffy on the details. I've heard that HSA can allocate
asynchronously which seems to work with my tests at least. CUDA uses an
implicit synchronization scheme so we need to use an explicitly separate
stream from the one launching the kernel or the default stream. I will
need to test the NVPTX case.

I would appreciate if anyone more experienced with the implementation details
here could chime in for the HSA and CUDA cases.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D151735
2023-06-05 17:56:53 -05:00
Guillaume Chatelet
c76a3e795e [libc][NFC] Fixing various typos 2023-05-31 12:11:09 +00:00
Tobias Hieta
f98ee40f4b
[NFC][Py Reformat] Reformat python files in the rest of the dirs
This is an ongoing series of commits that are reformatting our
Python code. This catches the last of the python files to
reformat. Since they where so few I bunched them together.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: jhenderson, #libc, Mordante, sivachandra

Differential Revision: https://reviews.llvm.org/D150784
2023-05-25 11:17:05 +02:00
Joseph Huber
e826762a08 [libc] More efficiently send bytes via send_n and recv_n
Currently we have the `send_n` and `recv_n` routines to stream data,
such as a string to print, to the other side. The first operation is to
send the size so the other side knows the number of bytes to recieve.
However, this wasted 56 bytes that could've been sent. This meant that
small values, like the arguments to a function to call on the host for
example, needed to perform an extra send. This patch sends the first 56
bytes in the first packet and continues if necessary.

Depends on D150992

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D151041
2023-05-23 10:59:47 -05:00
Michael Jones
ae3b59e623 [libc] Use MPFR for strtofloat fuzzing
The previous string to float tests didn't check correctness, but due to
the atof differential test proving unreliable the strtofloat fuzz test
has been changed to use MPFR for correctness checking. Some minor bugs
have been found and fixed as well.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D150905
2023-05-22 11:04:53 -07:00
Siva Chandra Reddy
00bd8e9011 [libc] Add a str() method to FPBits which returns a string representation.
Unit tests for the str() method have also been added.

Previously, a separate test only helper function was being used by the
test matchers which has regressed over many cleanups. Moreover, being a
test only utility, it was not tested separately (and hence the
regression).

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D150906
2023-05-19 06:20:41 +00:00
Siva Chandra Reddy
4dc205f016 [libc] Add a convenience CMake function add_unittest_framework_library.
This function is used to add unit test and hermetic test framework libraries.
It avoids the duplicated code to add compile options to each every test
framework libraries.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D150727
2023-05-17 21:13:50 +00:00
Joseph Huber
182e5acb11 [libc] Check the RPC server once again after the kernel exits
We support asynchronous sends, that means that the kernel can issue a
send, then exit the kernel as we do with the `EXIT` syscall. Because of
the condition it's therefore possible for the kernel to exit and break
from the loop before we check the server again. This can potentially
cause us to ignore an `EXIT` call from the GPU.

Reviewed By: JonChesterfield, lntue

Differential Revision: https://reviews.llvm.org/D150456
2023-05-12 12:49:19 -05:00
Joseph Huber
d21e507cfc [libc] Implement a generic streaming interface in the RPC
Currently we provide the `send_n` and `recv_n` functions. These were
somewhat divergent and not tested on the GPU. This patch changes the
support to be more common. We do this my making the CPU provide an array
equal the to at least the lane size while the GPU can rely on the
private memory address of its stack variables. This allows us to send
data back and forth generically.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D150379
2023-05-11 11:55:41 -05:00
Joseph Huber
30093d6be2 [libc][obvious] Fix undefined variable after name change
I forgot that we still used these variables in the loaders.

Differential Revision: https://reviews.llvm.org/D150362
2023-05-11 09:00:08 -05:00
Joseph Huber
4a2e50e4f4 [libc][NFC] Clean up some code in the RPC implementation.
Small cleanup of the server code and fixes a constant name not following
the naming convention.

Differential Revision: https://reviews.llvm.org/D150361
2023-05-11 08:22:55 -05:00
Jon Chesterfield
bbeae142bf [libc][rpc] Allocate a single block of shared memory instead of three
Allows moving the pointer swap between server and client into reset.
Single allocation simplifies whatever allocates the client/server, currently
the libc loaders.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D150337
2023-05-11 03:04:56 +01:00
Joseph Huber
c8c19e1c31 [libc] Fix RPC interface when sending and recieving aribtrary packets
The interface exported by the RPC library allows users to simply send
and recieve fixed sized packets without worrying about the data motion
underneath. However, this was broken in the current implementation. We
can think of the send and recieve implementations in terms of waiting
for ownership of the buffer, using the buffer, and posting ownership to
the other side. Our implementation of `recv` was incorrect in the
following scenarios.

recv -> send // we still own the buffer and should give away ownership
recv -> close // The other side is not waiting for data, this will
                 result in multiple openings of the same port

This patch attempts to fix this with an admittedly hacky fix where we
track if the previous implementation was a recv and post conditionally.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D150327
2023-05-10 18:51:38 -05:00
Jon Chesterfield
f497611f43 [libc][rpc] Allocate locks array within process
Replaces the globals currently used. Worth changing to a bitmap
before allowing runtime number of ports >> 64. One bit per port is likely
to be cheap enough that sizing for the worst case is always fine, otherwise
in the future we can change to dynamically allocating it.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D150309
2023-05-11 00:41:51 +01:00
Joseph Huber
aea866c12c [libc] Support concurrent RPC port access on the GPU
Previously we used a single port to implement the RPC. This was
sufficient for single threaded tests but can potentially cause deadlocks
when using multiple threads. The reason for this is that GPUs make no
forward progress guarantees. Therefore one group of threads waiting on
another group of threads can spin forever because there is no guarantee
that the other threads will continue executing. The typical workaround
for this is to allocate enough memory that a sufficiently large number
of work groups can make progress. As long as this number is somewhat
close to the amount of total concurrency we can obtain reliable
execution around a shared resource.

This patch enables using multiple ports by widening the arrays to a
predetermined size and indexes into them. Empty ports are currently
obtained via a trivial linker scan. This should be imporoved in the
future for performance reasons. Portions of D148191 were applied to
achieve parallel support.

Depends on D149581

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D149598
2023-05-05 10:12:19 -05:00
Joseph Huber
901266dad3 [libc] Change GPU startup and loader to use multiple kernels
The GPU has a different execution model to standard `_start`
implementations. On the GPU, all threads are active at the start of a
kernel. In order to correctly intitialize and call the constructors we
want single threaded semantics. Previously, this was done using a
makeshift global barrier with atomics. However, it should be easier to
simply put the portions of the code that must be single threaded in
separate kernels and then call those with only one thread. Generally,
mixing global state between kernel launches makes optimizations more
difficult, similarly to calling a function outside of the TU, but for
testing it is better to be correct.

Depends on D149527 D148943

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D149581
2023-05-04 19:31:41 -05:00
Joseph Huber
507edb52f9 [libc] Enable multiple threads to use RPC on the GPU
The execution model of the GPU expects that groups of threads will
execute in lock-step in SIMD fashion. It's both important for
performance and correctness that we treat this as the smallest possible
granularity for an RPC operation. Thus, we map multiple threads to a
single larger buffer and ship that across the wire.

This patch makes the necessary changes to support executing the RPC on
the GPU with multiple threads. This requires some workarounds to mimic
the model when handling the protocol from the CPU. I'm not completely
happy with some of the workarounds required, but I think it should work.

Uses some of the implementation details from D148191.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D148943
2023-05-04 19:31:41 -05:00
Joseph Huber
2e1c0ec629 [libc] Support global constructors and destructors on NVPTX
This patch adds the necessary hacks to support global constructors and
destructors. This is an incredibly hacky process caused by the primary
fact that Nvidia does not provide any binary tools and very little
linker support. We first had to emit references to these functions and
their priority in D149451. Then we dig them out of the module once it's
loaded to manually create the list that the linker should have made for
us. This patch also contains a few Nvidia specific hacks, but it passes
the test, albeit with a stack size warning from `ptxas` for the
callback. But this should be fine given the resource usage of a common
test.

This also adds a dependency on LLVM to the NVPTX loader, which hopefully doesn't
cause problems with our CUDA buildbot.

Depends on D149451

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D149527
2023-05-04 07:13:00 -05:00
Joseph Huber
91528d2058 [libc] Fix printing on the GPU when given a cpp::string_ref
The implementation of the test printing currently expects a null
terminated C-string. However, the `write_to_stderr` interface uses a
string view, which doesn't need to be null terminated. This patch
changes the printing interface to directly use `fwrite` instead rather
than relying on a null terminator.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D149493
2023-04-28 21:32:01 -05:00
Joseph Huber
0bd564a259 [libc] Add a test to directly stimulate the RPC interface
Currently, the RPC interface with the loader is only tested if the other
tests fail. This test adds a direct test that runs a simple integer
increment over the RPC handshake 10000 times.

Depends on https://reviews.llvm.org/D148288

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D148342
2023-04-19 20:02:32 -05:00
Joseph Huber
d0ff5e4030 [libc] Update RPC interface for system utilities on the GPU
This patch reworks the RPC interface to allow more generic memory
operations using the shared better. This patch decomposes the entire RPC
interface into opening a port and calling `send` or `recv` on it.

The `send` function sends a single packet of the length of the buffer.
The `recv` function is paired with the `send` call to then use the data.
So, any aribtrary combination of sending packets is possible. The only
restriction is that the client initiates the exchange with a `send`
while the server consumes it with a `recv`.

The operation of this is driven by two independent state machines that
tracks the buffer ownership during loads / stores. We keep track of two
so that we can transition between a send state and a recv state without
an extra wait. State transitions are observed via bit toggling, e.g.

This interface supports an efficient `send -> ack -> send -> ack -> send`
interface and allows for the last send to be ignored without checking
the ack.

A following patch will add some more comprehensive testing to this interface. I
I informally made an RPC call that simply incremented an integer and it took
roughly 10 microsends to complete an RPC call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D148288
2023-04-19 20:02:31 -05:00
Joseph Huber
bc11bb3e26 [libc] Add the '--threads' and '--blocks' option to the GPU loaders
We will want to test the GPU `libc` with multiple threads in the future.
This patch adds the `--threads` and `--blocks` option to set the `x`
dimension of the kernel. Using CUDA terminology instead of OpenCL for
familiarity.

Depends on D148288 D148342

Reviewed By: jdoerfert, sivachandra, tra

Differential Revision: https://reviews.llvm.org/D148485
2023-04-19 08:01:58 -05:00
Siva Chandra Reddy
447d59e071 [libc][NFC] Move RoundingModeUtils to LibcFPTestHelpers.
Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D148602
2023-04-18 18:37:30 +00:00
Siva Chandra Reddy
e3645eadb8 [libc][NFC] Move ExecuteFunction test util to test/UnitTest.
Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D148611
2023-04-18 16:17:42 +00:00
Siva Chandra Reddy
fc121d0c73 [libc][NFC] Remove the unused FDReader testutil.
Differential Revision: https://reviews.llvm.org/D148454
2023-04-17 17:44:47 +00:00
Siva Chandra Reddy
dcf296b541 [libc][NFC] Remove the StreamWrapper class and use the new test logger.
Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D148452
2023-04-17 15:48:18 +00:00