A recent patch required the implementation to define `LIBC_NAMESPACE`.
For GPU offloading we provide a static library whose internal
implementation relies on the `libc` headers. This is a separate library
that is constructed during the "bootstrap" phase. This patch moves the
definition of the `LIBC_NAMESPACE` CMake variable up so its available
during bootstrapping and adds it to the definition of the RPC server.
Summary:
The HSA API explicitly states that the size is a count of uint32_t's not
a byte count. This was erroneously being used as a simple memcpy,
causing some weird behaviour. Fix this by correctly passing `1` to
initialize a single integer to zero.
This function implements the `abort` function on the GPU. The
implementation here closely mirros the `exit` call where we first
synchornize with the RPC server to make sure it's listening and then we
exit on the GPU.
I was unsure if this should be a simple `__builtin_assert` on the GPU. I
elected to go with an RPC approach to make this a more "true" `abort`
call. That is, it should invoke some signal handlers and exit with the
proper code according to the implemented C library on the server.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D159210
This `MAX_LANE_SIZE` was a hack from the days when we used a single
instance of the server and had some GPU state handle it. Now that we
have everything templated this really shouldn't be used. This patch
removes its use and replaces it with template arguments.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D158633
This patch adds support for `fread` on the GPU via the RPC mechanism.
Here we simply pass the size of the read to the server and then copy it
back to the client via the RPC channel. This should allow us to do the
basic operations on files now. This will obviously be slow for large
sizes due ot the number of RPC calls involved, this could be optimized
further by having a special RPC call that can initiate a memcpy between
the two pointers.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D155121
This patch does the noisy work of removing the test opcodes from the
exported interface to an interface that is only visible in `libc`. The
benefit of this is that we both test the exported RPC registration more
directly, and we do not need to give this interface to users.
I have decided to export any opcode that is not a "core" libc feature as
having its MSB set in the opcode. We can think of these as non-libc
"extensions".
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154848
Currently we keep an internal buffer of device memory that is used to
indicate ownership of a port. Since we only use this as a single bit we
can simply turn this into a bitfield. I did this manually rather than
having a separate type as we need very special handling of the masks
used to interact with the locks.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D155511
There are some cases when testing we want to override the logic for not
building tests if the loader is not present. This allows users to
specify an external binary that fulfils the same duties which will force
the tests to be built even without meeting the dependencies.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D155837
If the clock_freq symbol isn't used, and is removed,
we don't need to abort the loader. Can instead just not set it.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D155832
HSA headers might be under a hsa/ directory or might not.
This scheme matches the one used by the openmp amdgpu plugin.
Reviewed By: jhuber6, jplehr
Differential Revision: https://reviews.llvm.org/D155812
This patch adds the `rpc_host_call` function as a GPU extension. This is
exported from the `libc` project to use the RPC interface to call a
function pointer via RPC any copying the arguments by-value. The
interface can only support a single void pointer argument much like
pthreads. The function call here is the bare-bones version of what's
required for OpenMP reverse offloading. Full support will require
interfacing with the mapping table, nowait support, etc.
I decided to test this interface in `libomptarget` as that will be the
primary consumer and it would be more difficult to make a test in `libc`
due to the testing infrastructure not really having a concept of the
"host" as it runs directly on the GPU as if it were a CPU target.
Reviewed By: jplehr
Differential Revision: https://reviews.llvm.org/D155003
Summary:
This caused test failures on the gfx90a buildbot. This works on my
gfx1030 and the Nvidia buildbots, so we'll need to investigate what is
going wrong here. For now revert it to get the bots green.
This reverts commit 05abcc5792.
Currently we keep an internal buffer of device memory that is used to
indicate ownership of a port. Since we only use this as a single bit we
can simply turn this into a bitfield. I did this manually rather than
having a separate type as we need very special handling of the masks
used to interact with the locks.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D155511
This ensures that if someone calls the `rpc_shutdown` method multiple
times it will not segfault and gracefully continue. This was causing
problems in the OpenMP usage. This could point to other issues, but for
now this is a safe fix.
Differential Revision: https://reviews.llvm.org/D155005
This is an alternate approach to the patches proposed in D153897 and
D153794. Rather than exporting a single header that can be included on
the GPU in all circumstances, this patch chooses to instead generate a
separate set of headers that only provides the declarations. This can
then be used by external tooling to set up what's on the GPU. This
leaves room for header hacks for offloading languages without needing to
worry about the `libc` implementation.
Currently this generates a set of headers that only contain the
declarations. These will then be installed to a new clang resource
directory called `llvm_libc_wrappers/` which will house the shim code.
We can then automaticlaly include this from `clang` when offloading to
wrap around the headers while specifying what's on the GPU.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D154036
This reverts commit a4a26374aa.
This was causing some problems with the CPU build and CUDA buildbot.
Revert until I can figure out what those issues are and fix them. I
believe it is just some CMake.
This is an alternate approach to the patches proposed in D153897 and
D153794. Rather than exporting a single header that can be included on
the GPU in all circumstances, this patch chooses to instead generate a
separate set of headers that only provides the declarations. This can
then be used by external tooling to set up what's on the GPU. This
leaves room for header hacks for offloading languages without needing to
worry about the `libc` implementation.
Currently this generates a set of headers that only contain the
declarations. These will then be installed to a new clang resource
directory called `llvm_libc_wrappers/` which will house the shim code.
We can then automaticlaly include this from `clang` when offloading to
wrap around the headers while specifying what's on the GPU.
Reviewed By: jdoerfert, JonChesterfield
Differential Revision: https://reviews.llvm.org/D154036
This patch adds the necessary support for the fopen and fclose functions
to work on the GPU via RPC. I added a new test that enables testing this
with the minimal features we have on the GPU. I will update it once we
have `fread` and `fwrite` to actually check the outputted strings. For
now I just relied on checking manually via the outpuot temp file.
Reviewed By: JonChesterfield, sivachandra
Differential Revision: https://reviews.llvm.org/D154519
This patch adds the necessary support to provide timing information in
`libc` tests. This is useful for determining which tests look what
amount of time. We also can use this as a test basis for providing more
fine-grained timing when implementing things on the GPU.
The main difficulty with this is the fact that the AMDGPU fixed
frequency clock operates at an unknown frequency. We need to read this
on a per-card basis from the driver and then copy it in. NVPTX on the
other hand has a fixed clock at a resolution of 1ns. I have also
increased the resolution of the print-outs as the majority of these are
below a millisecond for me.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154446
The accuracy for the MPFR numbers in the strtofloat fuzz test was set
too high, causing rounding issues when rounding to a smaller final
result.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D154150
This patch makes sure that we always build the RPC server. The proposed
used for this is to begin integrating this server implementation into
`libomptarget`. That requires that we build this server ahead of time
when using a `LLVM_ENABLE_PROJECTS` build. Make a few tweaks to ensure
that the GCC compiler which may be used for this build doesn't complain.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154105
This patch adds the other two methods to the server so the external
users can use the interface through the obfuscated interface.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154224
The RPC calls all have delays associated with them. Currently the `exit`
function does an async send and immediately exits the GPU. This can have
the effect that the RPC server never sees the exit call and we continue.
This patch changes that to first sync with the server before continuing
to perform its exit. There is still a hazard here, where the kernel can
complete before the RPC call reads back its response, but this is simply
multi-threaded hazards. This change ensures that the server *will*
always exit some time after the GPU exits.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D154112
The RPC client must be initialized to set a pointer to the underlying
buffer. This is currently done with the `reset` method which may not be
ideal for the use-case. We want runtimes to be able to initialize this
without needing to call a kernel. Recent changes allowed the `Client`
type to be trivially copyable. That means we can create a client on the
server side and then copy it over. To that end we take the existing
externally visible symbol and initialize it to the client's pointer.
Therefore we can look up the symbol and copy it over once loaded.
No test currently, I tested with a demo OpenMP application but couldn't think of
how to put that in-tree.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D153633
This patch prepares the RPC interface to be installed. We place this in
the existing `llvm-gpu-none` directory as it will also give us access to
the generated `libc` headers for the opcodes.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D153040
Currently the implementation of the RPC interface requires a flexible
struct. This caused problems when compilling the RPC server with GCC as
would be required if trying to export the RPC server interface. This
required that we either move to the `x[1]` workaround or make it a
template parameter. While just using `x[1]` would be much less noisy,
this is technically undefined behavior. For this reason I elected to use
templates.
The downside to using templates is that the server code must now be able
to handle multiple different types at runtime. I was unable to find a
good solution that didn't rely on type erasure so I simply branch off of
the given value.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D153304
Switching to this interface we neglected to actually write the output
from the malloc call to the RPC buffer. Fix this so the tests pass
again.
Differential Revision: https://reviews.llvm.org/D153069
The GPU port of the LLVM C library needs to export a few extensions to
the interface such that users can interface with it. This patch adds the
necessary logic to define a GPU extension. Currently, this only exports
a `rpc_reset_client` function. This allows us to use the server in
D147054 to set up the RPC interface outside of `libc`.
Depends on https://reviews.llvm.org/D147054
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D152283
This patch begins providing a generic static library that wraps around
the raw `rpc.h` interface. As discussed in the corresponding RFC,
https://discourse.llvm.org/t/rfc-libc-exporting-the-rpc-interface-for-the-gpu-libc/71030,
we want to begin exporting RPC services to external users. In order to
do this we decided to not expose the `rpc.h` header by wrapping around
its functionality. This is done with a C-interface as we make heavy use
of callbacks and allows us to provide a predictable interface.
Reviewed By: JonChesterfield, sivachandra
Differential Revision: https://reviews.llvm.org/D147054
Fixing an issue with LLVM libc's fenv.h defined rounding mode macros
differently from system libc, making get_round() return different values from
fegetround(). Also letting math tests to skip rounding modes that cannot be
set. This should allow math tests to be run on platforms in which fenv.h is not
implemented yet.
This allows us to re-enable hermatic floating point tests in
https://reviews.llvm.org/D151123 and reverting https://reviews.llvm.org/D152742.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D152873
This patch mimics the behavior of Google Test and allow users to log custom messages after all flavors of ASSERT_ / EXPECT_.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D152630
This patch mimics the behavior of Google Test and allow users to log custom messages after all flavors of ASSERT_ / EXPECT_.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D152630
str method of FPBits class is only used for pretty printing its objects
in tests. It brings cpp::string dependency to FPBits class, which is not ideal
for embedded use case. We move str method to a free function in test utils and
remove this dependency of FPBits class.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D152607
A few of these tests were disabled due to failing on NVPTX. After
looking into it the vast majority of these cases were due to
insufficient stack memory. This can be worked around by increasing the
stack size in the loader or by reducing the memory usage in the case of
large string constants.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D152583
This patch adds three options for printf decimal long doubles, and these
can also apply to doubles.
1. Use a giant table which is fast and accurate, but takes up ~5MB).
2. Use dyadic floats for approximations, which only gives ~50 digits of
accuracy but is very fast.
3. Use large integers for approximations, which is accurate but very
slow.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D150399
A previous patch added general support for printing via the RPC
interface. we should consolidate this functionality and get rid of the
old opcode that was used for simple testing.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D152211
If CUDA is not found this string will expand into nothing. We need to
surround it with a string otherwise it will cause build failures.
Differential Revision: https://reviews.llvm.org/D152209
This patch adds the initial support required to support basic priting in
`stdio.h` via `puts` and `fputs`. This is done using the existing LLVM C
library `File` API. In this sense we can think of the RPC interface as
our system call to dump the character string to the file. We carry a
`uintptr_t` reference as our native "file descriptor" as it will be used
as an opaque reference to the host's version once functions like
`fopen` are supported.
For some unknown reason the declaration of the `StdIn` variable causes
both the AMDGPU and NVPTX backends to crash if I use the `READ` flag.
This is not used currently as we only support output now, but it needs
to be fixed
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D151282
This patch adds support for the `malloc` and `free` functions. These
currently aren't implemented in-tree so we first add the interface
filies.
This patch provides the most basic support for a true `malloc` and
`free` by using the RPC interface. This is functional, but in the future
we will want to implement a more intelligent system and primarily use
the RPC interface more as a `brk()` or `sbrk()` interface only called
when absolutely necessary. We will need to design an intelligent
allocator in the future.
The semantics of these memory allocations will need to be checked. I am
somewhat iffy on the details. I've heard that HSA can allocate
asynchronously which seems to work with my tests at least. CUDA uses an
implicit synchronization scheme so we need to use an explicitly separate
stream from the one launching the kernel or the default stream. I will
need to test the NVPTX case.
I would appreciate if anyone more experienced with the implementation details
here could chime in for the HSA and CUDA cases.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D151735
This is an ongoing series of commits that are reformatting our
Python code. This catches the last of the python files to
reformat. Since they where so few I bunched them together.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: jhenderson, #libc, Mordante, sivachandra
Differential Revision: https://reviews.llvm.org/D150784
Currently we have the `send_n` and `recv_n` routines to stream data,
such as a string to print, to the other side. The first operation is to
send the size so the other side knows the number of bytes to recieve.
However, this wasted 56 bytes that could've been sent. This meant that
small values, like the arguments to a function to call on the host for
example, needed to perform an extra send. This patch sends the first 56
bytes in the first packet and continues if necessary.
Depends on D150992
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D151041
The previous string to float tests didn't check correctness, but due to
the atof differential test proving unreliable the strtofloat fuzz test
has been changed to use MPFR for correctness checking. Some minor bugs
have been found and fixed as well.
Reviewed By: lntue
Differential Revision: https://reviews.llvm.org/D150905
Unit tests for the str() method have also been added.
Previously, a separate test only helper function was being used by the
test matchers which has regressed over many cleanups. Moreover, being a
test only utility, it was not tested separately (and hence the
regression).
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D150906
This function is used to add unit test and hermetic test framework libraries.
It avoids the duplicated code to add compile options to each every test
framework libraries.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D150727
We support asynchronous sends, that means that the kernel can issue a
send, then exit the kernel as we do with the `EXIT` syscall. Because of
the condition it's therefore possible for the kernel to exit and break
from the loop before we check the server again. This can potentially
cause us to ignore an `EXIT` call from the GPU.
Reviewed By: JonChesterfield, lntue
Differential Revision: https://reviews.llvm.org/D150456