Commit Graph

2518 Commits

Author SHA1 Message Date
Joseph Huber
a28e4eac26
[libc] Default to a single threaded thread pool for GPU tests (#74486)
Summary:
The GPU tests tend to fail when run massively in parallel. This is why
we use a CMake job pool to limit it to 1 in most cases. We should
default to the configuration that is most likely to work, that being a
single thread. There aren't enough GPU tests for this to cause a large
increase in test time on the bots, so we should default to the
configuration that is guaranteed to work.
2023-12-05 15:25:50 +00:00
Guillaume Chatelet
21b986637b
[libc] Fix arm32 tests (#74457)
`ASSERT_EQ` requires that both operands have the same type, but on arm32
`size_t` is `unsigned int` instead of `unsigned long`. Use `size_t`
explicitly to avoid "conflicting types for parameter 'ValType'".
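A minimal sketch of the type conflict. `expect_eq` is a hypothetical stand-in for the comparison `ASSERT_EQ` performs: like the macro, it deduces a single `ValType` from both operands.

```cpp
#include <cstddef>

// Hypothetical stand-in for the comparison behind ASSERT_EQ: both operands
// must deduce to the same ValType.
template <typename ValType>
constexpr bool expect_eq(ValType lhs, ValType rhs) {
  return lhs == rhs;
}

// On LP64 Linux, size_t is unsigned long, so expect_eq(sizeof(x), 42UL)
// happens to compile. On arm32, size_t is unsigned int and the same call
// fails with "conflicting types for parameter 'ValType'". Spelling the
// expected value as size_t works on both.
constexpr std::size_t kExpectedSize = sizeof(int);
static_assert(expect_eq(sizeof(int), kExpectedSize), "same type everywhere");
```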
2023-12-05 13:53:19 +01:00
Guillaume Chatelet
1d89478830
[reland][libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead (#73939) (#74446)
Same as #73939 but also fix `libc/src/string/memory_utils/op_aarch64.h`
that was still using `deferred_static_assert`.
2023-12-05 11:35:13 +01:00
Guillaume Chatelet
de7fdc5b54
Revert "[libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead" (#74444)
Reverts llvm/llvm-project#73939

This broke the libc-aarch64-ubuntu build bot:
https://lab.llvm.org/buildbot/#/builders/138/builds/56186
2023-12-05 11:25:39 +01:00
Guillaume Chatelet
b140948850
[libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead (#73939) 2023-12-05 11:21:07 +01:00
michaelrj-google
ab65c9c3bb
[libc][NFC] fix int warnings in float conversion (#74379)
The printf float to string conversion functions had some implicit
integer conversion warnings on gcc. This patch adds explicit casts to
these places.
2023-12-04 16:34:24 -08:00
Schrodinger ZHU Yifan
6fd1c1b8ef
[libc] fix HashTable warnings and build problems (#74371)
According to https://lab.llvm.org/buildbot/#/builders/163/builds/48002,
the generic build of HashTable fails under `-Werror` because of two
major issues:
1. warnings of the form `error: suggest braces around initialization of
subobject`.
2. `__support/HashTable` tests are built regardless of its entrypoints.

This PR attempts to fix such issues.
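For the first warning, a minimal illustration of what triggers it and the usual fix. The `Bucket` and `Slot` types are hypothetical; the actual HashTable initializers may differ.

```cpp
// Hypothetical types for illustration.
struct Bucket {
  unsigned hash;
  unsigned index;
};

struct Slot {
  Bucket bucket;
  int value;
};

// `Slot s = {1, 2, 3};` relies on brace elision: it is valid C++, but
// Clang's -Wmissing-braces reports "suggest braces around initialization
// of subobject", and -Werror turns that into a hard error. Bracing each
// subobject explicitly avoids the warning.
constexpr Slot kSlot = {{1, 2}, 3};
```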
2023-12-04 14:06:57 -08:00
Nick Desaulniers
0d59cfc7a3
[libc] fix -Wconversion in float_to_string.h (#74369)
Fixes:
libc/src/__support/float_to_string.h:551:48: error: conversion from
‘long unsigned int’ to ‘int32_t’ {aka ‘int’} may change value
[-Werror=conversion]
551 | const int32_t shift_amount = SHIFT_CONST + (-exponent - IDX_SIZE * idx);
| ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Observed in gcc fullbuilds.

IDX_SIZE is a size_t (aka 'long unsigned int') even though its value is
only 128, so the usual arithmetic conversions make the whole expression
`long unsigned int` before it is narrowed to int32_t.

Link: https://lab.llvm.org/buildbot/#/builders/250/builds/14891
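A sketch of one way to silence this, assuming the fix is to keep the arithmetic in signed 32-bit integers by casting the `size_t` operands. `SHIFT_CONST = 1000` is a made-up value for illustration; the real constants live in `float_to_string.h`.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative values only (SHIFT_CONST is hypothetical).
constexpr std::size_t IDX_SIZE = 128;
constexpr std::int32_t SHIFT_CONST = 1000;

constexpr std::int32_t shift_amount(std::int32_t exponent, std::size_t idx) {
  // Casting the size_t operands keeps the whole expression in int32_t, so
  // there is no unsigned long -> int32_t narrowing for -Wconversion to flag.
  return SHIFT_CONST + (-exponent - static_cast<std::int32_t>(IDX_SIZE) *
                                        static_cast<std::int32_t>(idx));
}
```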
2023-12-04 13:12:51 -08:00
Schrodinger ZHU Yifan
ff51b60b18
[libc] Revert #73704 and subsequent fixes #73984, #74026 (#74355)
The mincore test cases need to get the correct page size from the OS. As
`sysconf` is not functioning correctly, these patches were implemented in
a somewhat confusing way. We revert them and will reintroduce mincore
after we fix sysconf.

This reverts 54878b8, 985c0d1 and 418a3a4.
2023-12-04 12:49:12 -08:00
Nick Desaulniers
6886a52d6d Revert "[libc] build with -Werror (#73966)"
This reverts commit 606653091d.

Post submit buildbots are now red. We can use these explicit errors to better
clean up existing warnings, then reland this.

Link: #73966
2023-12-04 11:31:59 -08:00
Nick Desaulniers
606653091d
[libc] build with -Werror (#73966)
A recent commit introduced warnings observable when building unit tests.
If the unit tests don't fail when warnings are introduced into the build,
then we might fail to notice them in the stream of output from check-libc.

Link: https://github.com/llvm/llvm-project/pull/72763/files#r1410932348
2023-12-04 11:08:59 -08:00
Petr Hosek
c33e5d59e5
[libc] Add the missing math_extras.h include (#74259)
math_extras.h is used in integer_utils.h when building for 32-bit
platforms but the include is missing.
2023-12-04 11:02:03 -08:00
Schrodinger ZHU Yifan
a0eda10947
[libc][NFC] unify startup library's code style with the rest (#74041)
This PR unifies the startup library's code style with the rest of libc.
2023-12-04 10:31:18 -08:00
Guillaume Chatelet
8628ca29aa
[libc] Fix UB in memory utils (#74295)
The [standard](https://eel.is/c++draft/expr.add#4.3) forbids forming
pointers to invalid objects even if the pointer is never read from or
written to. This patch makes sure that we don't do pointer arithmetic on
invalid pointers.
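A minimal sketch of the pattern, assuming a bounds check is the kind of code involved: comparing sizes instead of forming `ptr + count` keeps every pointer inside the buffer. `fits` is a hypothetical helper, not a libc function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns true if `count` more bytes fit between `ptr`
// and `end`. The tempting `ptr + count <= end` forms an out-of-range pointer
// (undefined behavior) whenever count exceeds the remaining space, even if
// that pointer is never dereferenced.
inline bool fits(const char *ptr, const char *end, std::size_t count) {
  assert(ptr <= end && "ptr and end must point into the same buffer");
  // end - ptr is well defined because both point into (or one past the end
  // of) the same array, so the comparison happens on sizes only.
  return count <= static_cast<std::size_t>(end - ptr);
}
```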


Co-authored-by: Vitaly Buka <vitalybuka@google.com>
2023-12-04 10:57:35 +01:00
Joseph Huber
9553e156cb [libc] Allocate fine-grained memory for the RPC host symbol
Summary:
This pointer has been causing issues. Allocating and reading from coarse
memory on the CPU is not guaranteed to work and varies depending on the
kernel version and support. Previously we attempted to pin the memory, but
this caused unexpected failures. Allocating fine-grained memory should be a
legal operation that works around the problem, as both sides should always
be allowed to write to fine-grained memory.
2023-12-01 13:47:33 -06:00
Joseph Huber
8c1d476db0 Revert "[libc] Explicitly pin memory for the client symbol lookup (#73988)"
Summary:
This caused the bots to begin failing. Revert for now to get the bot
green.

This reverts commit 8bea804923.
This reverts commit e1395c7bdb.
2023-12-01 13:04:49 -06:00
Joseph Huber
8bea804923
[libc] Move the pointer to pin off the stack to the heap (#74118)
Summary:
Pinning a pointer to the stack may be problematic. Allocate it via the OS
allocator instead, as the documentation suggests.

For some reason, if you attempt to free this pointer after the memory
region has been unlocked, it will return an invalid pointer.
2023-12-01 12:31:34 -06:00
Caslyn Tonelli
3693f44fff
[libc] Exclude Fuchsia from float128 detection (#73985)
Following from https://github.com/llvm/llvm-project/pull/73372:

Fuchsia targets currently don't support `float128`. Add detection for
`LIBC_TARGET_OS_IS_FUCHSIA`, and exclude this OS from setting
`LIBC_COMPILER_HAS_FLOAT128_EXTENSION`.
2023-12-01 10:30:18 -08:00
Guillaume Chatelet
977af4252d
[libc][NFC] Rename SPECIAL_X86_LONG_DOUBLE in LIBC_LONG_DOUBLE_IS_X86_FLOAT80 (#73950) 2023-12-01 14:23:08 +01:00
Guillaume Chatelet
f1d0276e4c
[libc][NFC] Rename LIBC_LONG_DOUBLE_IS_IEEE754_BIN128 to LIBC_LONG_DOUBLE_IS_FLOAT128 (#74052)
To make it consistent with
https://github.com/llvm/llvm-project/pull/73948 and
https://github.com/llvm/llvm-project/pull/73950
2023-12-01 13:57:36 +01:00
Guillaume Chatelet
808b7d2203
[libc][NFC] rename LONG_DOUBLE_IS_DOUBLE into LIBC_LONG_DOUBLE_IS_FLOAT64 (#73948) 2023-12-01 13:55:31 +01:00
Guillaume Chatelet
bb98227db1
[libc][NFC] Remove named_pair (#73952)
`named_pair` does not provide enough value to deserve its own header.
2023-12-01 10:30:15 +01:00
Guillaume Chatelet
2c976a1fac
[libc] Fix _Float16 detection for x86 (#73947) 2023-12-01 09:47:26 +01:00
Guillaume Chatelet
9557fcca56
[libc] Fix lint message (#73956) 2023-12-01 09:32:22 +01:00
Schrodinger ZHU Yifan
54878b80f3
[libc] remove fragile test from mincore (#74026) 2023-11-30 23:35:35 -05:00
Schrodinger ZHU Yifan
985c0d1903
[libc][mincore] use correct page_size for test (#73984) 2023-11-30 20:36:28 -05:00
Joseph Huber
e1395c7bdb
[libc] Explicitly pin memory for the client symbol lookup (#73988)
Summary:
Previously, we determined that coarse-grained memory cannot be used in
the general case. That change removed the buffer used to transfer the
memory; however, we still had this lookup. Although we do not access the
symbol directly, it apparently still conflicts with the agents. Pin this
as well.

This resolves the problems @lntue was having with the `libc` GPU build.
2023-11-30 15:35:33 -06:00
Nick Desaulniers
cc84a14197
[libc] fix getchar_unlocked (#73874)
A typo was leading to getc_unlocked.cpp.o being included into libc.a
twice.

I only noticed because I was trying to convert libc.a to a shared object via

$ ld.lld -o libc.so --whole-archive libc.a

which errored since getc_unlocked was being defined twice.
2023-11-30 12:38:00 -08:00
Joseph Huber
0584e6c166
[libc] Explicitly pin memory for the HSA memory transfer (#73973)
Summary:
This portion of code handles mapping the RPC client memory over to the
device. HSA copies need to be between two slices of memory that HSA has
allocated. Previously we used coarse-grained memory to act as the host
source. However, the support for this varies depending on the kernel and
version and should not be relied upon. This patch changes that handling
to use the `hsa_amd_memory_lock` API to explicitly pin memory to a
location sufficient for a DMA transfer to the GPU.
2023-11-30 13:46:52 -06:00
Schrodinger ZHU Yifan
418a3a4577
[libc][SysMMan] implement mincore (#73704)
Implement `mincore` as specified in
https://man7.org/linux/man-pages/man2/mincore.2.html
2023-11-30 14:22:36 -05:00
Samira Bazuzi
3f505cd587
[libc] Mark operator== const to avoid ambiguity in C++20. (#73954)
C++20 will automatically generate an operator== with reversed operand
order, which is ambiguous with the written operator== when one argument
is marked const and the other isn't.

This operator currently triggers -Wambiguous-reversed-operator at
several usage sites in libc/test/src/__support/CPP/bitset_test.cpp,
starting with line 153.
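A minimal illustration of the fix on a hypothetical `Bits` type (not the real `bitset`): marking the member `operator==` as `const` removes the competition between the written operator and its C++20 reversed form.

```cpp
// Hypothetical stand-in for a bitset-like class.
struct Bits {
  unsigned value;

  // Without `const` on the member function, the implicit object parameter
  // is non-const while the explicit parameter is const, so for two non-const
  // operands the written candidate and the reversed candidate C++20
  // synthesizes are each a better match for one operand (ambiguous).
  // Making the member const gives both candidates identical parameter types.
  constexpr bool operator==(const Bits &other) const {
    return value == other.value;
  }
};
```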
2023-11-30 17:22:10 +01:00
Guillaume Chatelet
b703bd821d
[libc] Add more functions in CPP/bit.h (#73814)
Once this is submitted we can remove `include/__support/bit.h` that
duplicates some of this functionality.
2023-11-30 13:51:02 +01:00
Guillaume Chatelet
8b25381bb6
[libc] Add the digits property to numeric_limits (#73926) 2023-11-30 13:34:55 +01:00
Guillaume Chatelet
7eb3103123
[libc] Fix cast semantic in integer_to_string (#73804) 2023-11-29 16:40:25 +01:00
Joseph Huber
0468867c98
[libc] Fix the GPU build for the hashing support (#73799)
Summary:
For reasons unknown to me, this function is undefined only in the GPU
build if you use `uintptr_t` instead of `uint64_t` directly. This patch
adds an ifdef to use `uint64_t` directly for the GPU build to fix the bots.
2023-11-29 09:04:36 -06:00
Guillaume Chatelet
e2a37e5130
[libc][NFC] Fix missing LIBC_INLINE + style (#73659) 2023-11-29 10:37:54 +01:00
Schrodinger ZHU Yifan
1886b1a580
[libc] add PREFER_GENERIC flag (#73744)
Standard architecture specifications include some basic vectorization
features, such as SSE/SSE2 for x86-64 or NEON for aarch64. Even though
such features are almost always available, we still need some way to
test the fallback routines without any vectorization.

A previous attempt in hsearch added a DISABLE_SSE2_OPT flag that tried to
compile the code with -mno-sse2 in order to test specific table scanning
routines. However, it turns out that such a flag may have unwanted side
effects that hinder portability.

This PR introduces PREFER_GENERIC as an alternative. When a target is
built with PREFER_GENERIC, cmake will define the macro
__LIBC_PREFER_GENERIC so that developers can selectively choose the
fallback routine based on the macro.
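A sketch of how code might branch on `__LIBC_PREFER_GENERIC`, using a 16-byte group scan similar in spirit to the one hsearch performs. The function `match_byte` and the helper macro `SKETCH_USE_SSE2` are hypothetical names; the SSE2 path uses the standard `<emmintrin.h>` intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) && !defined(__LIBC_PREFER_GENERIC)
#include <emmintrin.h>
#define SKETCH_USE_SSE2 1 // hypothetical helper macro
#endif

// Returns a bitmask with bit i set when group[i] == byte, for a 16-byte
// group. The scalar loop is the generic fallback that a PREFER_GENERIC
// build selects even when SSE2 is available.
inline std::uint32_t match_byte(const std::uint8_t *group, std::uint8_t byte) {
  assert(group != nullptr);
#ifdef SKETCH_USE_SSE2
  const __m128i data =
      _mm_loadu_si128(reinterpret_cast<const __m128i *>(group));
  const __m128i needle = _mm_set1_epi8(static_cast<char>(byte));
  return static_cast<std::uint32_t>(
      _mm_movemask_epi8(_mm_cmpeq_epi8(data, needle)));
#else
  std::uint32_t mask = 0;
  for (std::size_t i = 0; i < 16; ++i)
    if (group[i] == byte)
      mask |= std::uint32_t{1} << i;
  return mask;
#endif
}
```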
2023-11-28 23:47:48 -05:00
Schrodinger ZHU Yifan
e399a317ef
[libc] fix build on aarch64 (#73739)
* avoid implicit narrowing conversion
* move hsearch entrypoints to FULL_BUILD
2023-11-28 22:39:00 -05:00
Schrodinger ZHU Yifan
81e3e7e5d4
[libc] [search] implement hcreate(_r)/hsearch(_r)/hdestroy(_r) (#73469)
This patch implements `hcreate(_r)/hsearch(_r)/hdestroy(_r)` as
specified in https://man7.org/linux/man-pages/man3/hsearch.3.html.

Note that the `neon/asimd` extension is not yet added in this patch.

- The implementation is largely simplified from Rust's
[`hashbrown`](https://github.com/rust-lang/hashbrown/blob/master/src/raw/mod.rs),
as we only consider fixed-size, insertion-only hashtables. Technical
details are provided in code comments.

- This patch also contains a portable string hash function, which is
derived from [`aHash`](https://github.com/tkaitchuck/aHash)'s fallback
routine. Although it does not use any SIMD acceleration, its quality is
good enough (it passes all SMHasher tests) and it is reasonably fast.

- Some general utilities are added, such as `memory_size`,
`offset_to` (alignment), `next_power_of_two`, and `is_power_of_two`.
`ctz/clz` are extended to support shorter integers.
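Minimal sketches of three of the helpers named above. The names come from the commit message, but the signatures and exact semantics (for example, what `offset_to` returns) are assumptions.

```cpp
#include <cstdint>

constexpr bool is_power_of_two(std::uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Smallest power of two >= x (1 for x <= 1). Assumes the result fits in
// 64 bits, i.e. x <= 2^63.
constexpr std::uint64_t next_power_of_two(std::uint64_t x) {
  if (x <= 1)
    return 1;
  std::uint64_t v = x - 1;
  // Smear the highest set bit into every lower position, then add one.
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  v |= v >> 32;
  return v + 1;
}

// Number of bytes to add to `addr` to reach the next multiple of `align`
// (align must be a power of two).
constexpr std::uintptr_t offset_to(std::uintptr_t addr, std::uintptr_t align) {
  return (align - (addr & (align - 1))) & (align - 1);
}
```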
2023-11-28 21:02:25 -05:00
michaelrj-google
86b0ccaee1
[libc][NFC] unify nextafter and nexttoward code (#73698)
Previously the nextafter and nexttoward implementations were almost
identical, with the exception of whether or not the second argument was
a template or just long double. This patch unifies them by making the
types of the two arguments independent template parameters.
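A sketch of the unification, under the assumption that the step itself can be delegated to `std::nextafter`; libc does not do this, and the sketch does not raise the floating-point exceptions the standard requires. `next_toward_sketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// With independent template parameters, one body covers nextafter
// (U == T) and nexttoward (U == long double).
template <typename T, typename U> T next_toward_sketch(T from, U to) {
  if (std::isnan(from) || std::isnan(to))
    return std::numeric_limits<T>::quiet_NaN();
  // Compare in long double so `to` is not rounded to T first; otherwise
  // nexttoward(1.0f, 1.0L + tiny) would wrongly return 1.0f.
  const long double f = from;
  const long double t = to;
  if (f == t)
    return static_cast<T>(to);
  const T direction = f < t ? std::numeric_limits<T>::infinity()
                            : -std::numeric_limits<T>::infinity();
  return std::nextafter(from, direction);
}
```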
2023-11-28 15:14:15 -08:00
michaelrj-google
43f783ff66
[libc][docs] Update implementation docs (#73590)
Some of the files in the docs/ directory are from 2019 and haven't been
updated since. This patch updates implementation_standard.rst,
source_tree_layout.rst, and has some minor fixes for strings.rst. It
also marks the most severely out of date files with a warning. These
files will be updated in a later patch.
2023-11-28 10:14:12 -08:00
Nishant Mittal
18fd6df885
[libc][math] Add unit tests for raising excepts in nextafter (#73556)
Follow up to
https://github.com/llvm/llvm-project/pull/72763#discussion_r1398277962.

### Summary
- Add unit tests for raising excepts in `nextafter`.
- Fix a bug in the testing code for `nexttoward`.

cc: @lntue
2023-11-28 00:50:17 -05:00
michaelrj-google
f90f036efb
[libc] Move in_use into OptionalStorage (#73569)
The previous optional class would call the destructor on a non-trivially
destructible object regardless of whether it had already been reset. This
patch fixes this by moving the tracking of whether the object exists into
the internal storage class for optional.
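A minimal sketch of the fix's idea: keep the `in_use` flag next to the storage so that `reset()` and the destructor agree on whether an object is alive. This is an illustration, not libc's `OptionalStorage`; member names other than `in_use` are assumptions.

```cpp
#include <cassert>
#include <new>
#include <utility>

// The storage itself records whether a T is alive, so reset() runs the
// destructor at most once, no matter how often it is called.
template <typename T> class OptionalStorage {
  union {
    char placeholder;
    T stored;
  };
  bool in_use = false;

public:
  OptionalStorage() : placeholder(0) {}
  ~OptionalStorage() { reset(); }
  OptionalStorage(const OptionalStorage &) = delete;
  OptionalStorage &operator=(const OptionalStorage &) = delete;

  template <typename... Args> void emplace(Args &&...args) {
    reset();
    new (&stored) T(std::forward<Args>(args)...);
    in_use = true;
  }

  void reset() {
    if (in_use) {
      stored.~T(); // only destroy an object that actually exists
      in_use = false;
    }
  }

  bool has_value() const { return in_use; }

  T &value() {
    assert(in_use && "no value stored");
    return stored;
  }
};
```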
2023-11-27 13:31:10 -08:00
Guillaume Chatelet
c599b8eec0
[libc][NFC] Decouple FP properties from C++ types (#73517)
We simplify the floating point properties by splitting concerns:
- We define all distinct floating point formats in the `FPType` enum.
- We map them to properties via the `FPProperties` trait.
- We map from C++ type to `FPType` in the `getFPType` function so logic
is easier to understand and extend.
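A compact sketch of that split, limited to two formats for brevity (the commit's enum covers all distinct formats). The enumerator names and the members inside `FPProperties` are assumptions, as is the `FloatProps` alias.

```cpp
#include <cstdint>

// 1. Every distinct format gets an enumerator.
enum class FPType { IEEE754_Binary32, IEEE754_Binary64 };

// 2. Properties are keyed by format, not by C++ type.
template <FPType> struct FPProperties;

template <> struct FPProperties<FPType::IEEE754_Binary32> {
  using StorageType = std::uint32_t;
  static constexpr int MANTISSA_WIDTH = 23;
  static constexpr int EXPONENT_WIDTH = 8;
};

template <> struct FPProperties<FPType::IEEE754_Binary64> {
  using StorageType = std::uint64_t;
  static constexpr int MANTISSA_WIDTH = 52;
  static constexpr int EXPONENT_WIDTH = 11;
};

// 3. Only this mapping knows about C++ types.
template <typename T> constexpr FPType getFPType();
template <> constexpr FPType getFPType<float>() { return FPType::IEEE754_Binary32; }
template <> constexpr FPType getFPType<double>() { return FPType::IEEE754_Binary64; }

// Hypothetical convenience alias combining the two steps.
template <typename T> using FloatProps = FPProperties<getFPType<T>()>;
```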
2023-11-27 17:05:49 +01:00
Guillaume Chatelet
fb23fabc19
[libc] Fix forward octal prefix (#73526)
To fix build bots that started failing after:
https://github.com/llvm/llvm-project/pull/73372

```
/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/llvm-project/libc/src/__support/macros/properties/float.h:40:71: error: invalid digit '9' in octal constant
#if (defined(LIBC_COMPILER_CLANG_VER) && (LIBC_COMPILER_CLANG_VER >= 0900)) || \
```
2023-11-27 16:21:20 +01:00
Guillaume Chatelet
9539cbf033
[libc] Add detection support for float16 (#73372) 2023-11-27 16:08:17 +01:00
Joseph Huber
bf02c84cb8
[libc] Use file lock to join newline on RPC puts call (#73373)
Summary:
The puts call appends a newline. With multiple threads, this can be done
out of order such that another thread puts something before we finish
appending the newline. Add a flockfile and funlockfile to ensure that
the whole string is printed before another string can appear.
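A sketch of the locking pattern on an ordinary host stream, assuming a POSIX system where `flockfile` and `funlockfile` are available; `locked_puts` is a hypothetical name, and the RPC server code itself is not shown.

```cpp
#include <cassert>
#include <stdio.h> // flockfile/funlockfile are POSIX, not ISO C++

// Holding the stream lock across both writes prevents another thread from
// printing between the string and its trailing newline. The stdio lock is
// recursive, so fputs/fputc may take it again internally. Returns a
// nonnegative value on success and EOF on error, like puts.
int locked_puts(const char *str, FILE *stream) {
  assert(str != nullptr && stream != nullptr);
  flockfile(stream);
  int result = fputs(str, stream);
  if (result != EOF)
    result = fputc('\n', stream) == EOF ? EOF : 0;
  funlockfile(stream);
  return result;
}
```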
2023-11-27 08:41:15 -06:00
Guillaume Chatelet
ee5749bf78
[libc] Provide compiler version properties (#73344)
This will be used to support conditional compilation based on compiler
version.
We adopt the same convention as
[libc++](https://github.com/llvm/llvm-project/blob/main/libcxx/include/__config)
- thx @legrosbuffle for the suggestion!
Usage:
```
#if defined(LIBC_COMPILER_CLANG_VER)
#  if LIBC_COMPILER_CLANG_VER < 1500
#    warning "Libc only supports Clang 15 and later"
#  endif
#elif defined(LIBC_COMPILER_GCC_VER)
#  if LIBC_COMPILER_GCC_VER < 1500
#    warning "Libc only supports GCC 15 and later"
#  endif
#elif defined(LIBC_COMPILER_MSC_VER)
#  if LIBC_COMPILER_MSC_VER < 1930
#    warning "Libc only supports Visual Studio 2022 RTW (17.0) and later"
#  endif
#endif
```
2023-11-24 17:02:11 +01:00
Guillaume Chatelet
89a832435d
[libc][cmake] Add missing dependencies for type_traits (#73339) 2023-11-24 16:29:29 +01:00
Guillaume Chatelet
5e5a22caf8
[libc][NFC] Move float macro into its own header / add target os detection (#73311)
Floating point properties are a combination of target OS, target
architecture and compiler support.
 - Adding target OS detection,
 - Moving floating point type detection to its own file.

This is in preparation of adding support for `_Float16` which requires
testing compiler **version** and target architecture.
2023-11-24 16:11:05 +01:00
Guillaume Chatelet
dc9787c872
[libc][NFC] Remove dead code (#73315) 2023-11-24 14:33:19 +01:00
Guillaume Chatelet
d924c5d721
[libc][NFC] Sink "PlatformDefs.h" into "FloatProperties.h" (#73226)
`PlatformDefs.h` does not bring a lot of value as a separate file.
It is transitively included in `FloatProperties.h` and `FPBits.h`. This
patch sinks it into `FloatProperties.h` and removes the associated build
targets.
2023-11-23 11:23:18 +01:00
Guillaume Chatelet
436f5f652b
[libc][NFC] Remove unused define (#73222) 2023-11-23 10:24:49 +01:00
Guillaume Chatelet
c444879313
[libc][NFC] Split builtin_wrapper into bit and math_extras (#73113)
Split `builtin_wrapper.h` into `bit.h` and `math_extras.h` to mimic LLVM
`llvm/ADT/Bit.h` and `llvm/Support/MathExtras.h`.
Also adds unit test placeholders.
2023-11-23 09:58:59 +01:00
Guillaume Chatelet
f12be145ec
[libc][bazel] Enable __support tests (#73125) 2023-11-22 16:36:37 +01:00
Joseph Huber
fa1e49cf37 [libc] Disable nexttoward tests on the GPU
Summary:

These tests are currently failing for some reason. A lot of math tests
on the GPU are disabled temporarily and need to be fixed.
2023-11-22 07:46:25 -06:00
Nishant Mittal
0c49fc4c68
[libc][math] Implement nexttoward functions (#72763)
Implements the `nexttoward`, `nexttowardf` and `nexttowardl` functions.
Also, raise excepts required by the standard in `nextafter` functions.

cc: @lntue
2023-11-21 09:02:51 -05:00
Joseph Huber
8341a40ec1
[libc] Update the AMDGPU implementation to use code object 5 (#72580)
Summary:
This patch includes the necessary changes to make the `libc` tests
running on AMD GPUs run using the newer code object version. The 'code
object version' is AMD's internal ABI for making kernel calls. The move
from 4 to 5 changed how we handle arguments for builtins such as
obtaining the grid size or setting up the size of the private stack.

Fixes: https://github.com/llvm/llvm-project/issues/72517
2023-11-21 07:14:10 -06:00
Joseph Huber
abd85cd473
[libc] Remove the optional arguments for NVPTX constructors (#69536)
Summary:
We call the global constructors by function pointer. For whatever reason,
the NVPTX architecture requires the arguments of the function pointer
invocation to match the function's definition exactly. This is
problematic because most of these constructors are generated with no
arguments. This patch removes the optional extended arguments that GNU
and LLVM pass to constructors so that we can support the common case.
2023-11-20 17:10:15 -06:00
michaelrj-google
4db99c8b54
[libc] Add base for target config within cmake (#72318)
Currently the only way to add or remove entrypoints is to modify the
entrypoints.txt file for the current target. This isn't ideal since
a user would have to carry a diff for this file when updating their
checkout. This patch adds a basic mechanism to allow the user to remove
entrypoints without modifying the repository.
2023-11-17 11:32:27 -08:00
lntue
545f4d9855
[libc] Remove recursion in get_object_files_for_test to improve build time. (#72351) 2023-11-16 08:55:51 -05:00
lntue
0177c1c443
[libc] Only perform MSAN unpoison in non-constexpr context. (#72299) 2023-11-14 14:01:35 -05:00
lntue
86c57b9795
[libc][arm] Use __ARM_FP to detect floating point support for FEnvImpl. (#72177)
https://github.com/llvm/llvm-project/issues/72157
2023-11-13 19:36:57 -05:00
lntue
6899f035ae
[libc] Check if arm targets support FPSCR in FEnvImpl.h. (#72158)
https://github.com/llvm/llvm-project/issues/72157
2023-11-13 18:32:42 -05:00
Joseph Huber
9a6517e63a [libc][NFC] Do not emit init / fini kernels in NVPTX libc
Summary:
A recent patch upgrades the NVPTX ctor / dtor lowering pass to emit
kernels so other languages can call them. We do this manually in `libc`
so we do not need this. Use the provided flag to disable this step to
keep the created kernels cleaner.
2023-11-13 09:24:02 -06:00
lntue
de79314197
[libc] Fix missing ; in spec.td. (#71977) 2023-11-10 15:22:10 -05:00
lntue
3f906f513e
[libc][math] Add initial support for C23 float128 math functions, starting with copysignf128. (#71731) 2023-11-10 14:32:59 -05:00
Guillaume Chatelet
c07f73e754
[libc] Update configure.rst after config.json modification (#71942)
The update is automatically generated from `config/config.json`.
2023-11-10 16:45:36 +01:00
doshimili
3153aa4c95
[libc] Adding a version of memset with software prefetching (#70857)
Software prefetching helps recover performance when hardware prefetching
is disabled. The 'LIBC_COPT_MEMSET_X86_USE_SOFTWARE_PREFETCHING'
compile-time option enables this version of memset.
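A sketch of software prefetching in a memset loop, using the GCC/Clang `__builtin_prefetch` builtin. The block size, prefetch distance, and the name `memset_with_prefetch` are assumptions for illustration, not the values or code behind `LIBC_COPT_MEMSET_X86_USE_SOFTWARE_PREFETCHING`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Fill `count` bytes one 64-byte block at a time while hinting the CPU to
// fetch a block further ahead for writing.
void memset_with_prefetch(char *dst, int value, std::size_t count) {
  assert(dst != nullptr);
  constexpr std::size_t kBlock = 64;                    // assumed line size
  constexpr std::size_t kPrefetchDistance = 4 * kBlock; // hypothetical
  std::size_t offset = 0;
  for (; offset + kBlock <= count; offset += kBlock) {
    // Only form in-bounds addresses for the prefetch hint.
    if (offset + kPrefetchDistance < count)
      __builtin_prefetch(dst + offset + kPrefetchDistance, /*rw=*/1,
                         /*locality=*/3);
    std::memset(dst + offset, value, kBlock);
  }
  std::memset(dst + offset, value, count - offset); // remaining tail
}
```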
2023-11-10 10:56:16 +01:00
Joseph Huber
dc30fa6aca [libc][fix] Call GPU destructors in the correct order
Summary:
I was mistakenly iterating the list backwards. Regular semantics puts
both arrays in priority order but the destructors are called backwards.
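A sketch of reverse-order destructor invocation, assuming the destructors arrive as an array of `void (*)()` pointers sorted by priority. `call_fini_array` and `fini_fn` are hypothetical names.

```cpp
#include <cassert>

using fini_fn = void (*)();

// Entries are stored in priority order, and destructors must run in the
// reverse of that order, so walk the array from the end.
inline void call_fini_array(fini_fn const *begin, fini_fn const *end) {
  assert(begin <= end);
  for (fini_fn const *it = end; it != begin;)
    (*--it)();
}
```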
2023-11-09 09:22:41 -06:00
Joseph Huber
b1af3c0857
[libc][FIXME] Disable math tests to make the GPU bots green (#71603)
Summary:
This is a quick hack to disable affected GPU math tests so the bots will
be green again.

The offending commit is d2361b2048. If
that is reverted along with this patch the tests also pass.
2023-11-07 17:43:21 -06:00
michaelrj-google
009ba779c4
[libc][NFC] Remove libcpp include from atanf_test (#71449)
The test for atanf used <initializer_list> to simplify iterating through
an array. This caused issues with the new features.h change by creating a
libcpp dependency in the test. This change moves the list to an array
variable, removing the need for that dependency.
2023-11-07 10:35:09 -08:00
lntue
a0303d8923
[libc][bazel] Add powf target and fix bazel overlay. (#71464) 2023-11-07 08:27:02 -05:00
Dmitry Vyukov
d275277544
[libc] Optimize mempcy size thresholds (#70049)
Adjust boundary conditions for sizes = 16/32/64.
See the added comment for explanations.

Results on a machine with AVX2, so sizes 64/128 affected:
```
                │   baseline   │               adjusted               │
                │    sec/op    │   sec/op     vs base                 │
memcpy/Google_A   5.701n ±  0%   5.551n ± 1%   -2.63% (n=100)
memcpy/Google_B   3.817n ±  0%   3.776n ± 0%   -1.07% (p=0.000 n=100)
memcpy/Google_D   11.35n ±  1%   11.32n ± 0%        ~ (p=0.066 n=100)
memcpy/Google_U   3.874n ± 1%    3.821n ± 1%   -1.37% (p=0.001 n=100)
memcpy/64         3.843n ±  0%   3.105n ± 3%  -19.22% (n=50)
memcpy/128        4.842n ±  0%   3.818n ± 0%  -21.15% (p=0.000 n=50)
```
2023-11-07 08:37:19 +01:00
lntue
d2361b2048
[libc][math] Add min/max/min_denorm/max_denorm constants to FPBits and clean up its constants return types. (#71298) 2023-11-06 18:22:34 -05:00
lntue
bc7a3bd864
[libc][math] Implement powf function correctly rounded to all rounding modes. (#71188)
We compute `pow(x, y)` using the formula
```
  pow(x, y) = x^y = 2^(y * log2(x))
```
We follow similar steps as in `log2f(x)` and `exp2f(x)`, by breaking
down into `hi + mid + lo` parts, in which `hi` parts are computed using
the exponent field directly, `mid` parts will use look-up tables, and
`lo` parts are approximated by polynomials.

We add some speedup for common use-cases:
```
  pow(2, y) = exp2(y)
  pow(10, y) = exp10(y)
  pow(x, 2) = x * x
  pow(x, 1/2) = sqrt(x)
  pow(x, -1/2) = rsqrt(x) - to be added
```
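A naive sketch of the identity and some of the shortcuts, useful for seeing the structure but not the accuracy: it evaluates `exp2(y * log2(x))` in double precision instead of the hi/mid/lo decomposition, so it is not correctly rounded in all rounding modes, and it only accepts positive `x`. `powf_sketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// pow(x, y) = 2^(y * log2(x)) for finite x > 0, with a few fast paths.
float powf_sketch(float x, float y) {
  assert(x > 0.0f && "sketch only handles positive x");
  if (x == 2.0f)
    return std::exp2(y); // pow(2, y) = exp2(y)
  if (y == 2.0f)
    return x * x; // pow(x, 2) = x * x
  if (y == 0.5f)
    return std::sqrt(x); // pow(x, 1/2) = sqrt(x) for x > 0
  // General case: evaluate the identity in double, then round to float.
  const double r =
      std::exp2(static_cast<double>(y) * std::log2(static_cast<double>(x)));
  return static_cast<float>(r);
}
```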
2023-11-06 16:54:25 -05:00
Guillaume Chatelet
bdac972071
Fix load64_aligned (#71391)
Fixes #64758: `load64_aligned` was missing a case for `alignment == 6`.
2023-11-06 14:59:26 +01:00
michaelrj-google
73e974c00a
[libc] Cleanup ErrnoSetterMatcher target (#71240)
The ErrnoSetterMatcher target was renamed in a previous patch, but not
all uses were caught. This patch fixes those that remain.
2023-11-03 17:00:08 -07:00
Joseph Huber
158d7b8c23
[libc] Allow hermetic timing if the clock function is built (#71092)
Summary:
This patch fixes some code duplication on the GPU. The GPU build wanted
to enable timing for hermetic tests so it built some special case
handling into the test suite. Now that `clock` is supported on the
target we can simply link against the external interface. Because we
include `clock.h` for the CLOCKS_PER_SEC macro we remap the C entrypoint
to the internal one if it ends up called. This should allow hermetic
tests to run with timing if it is supported.
2023-11-02 15:03:17 -05:00
Jon Chesterfield
f0e100a05a
[amdgpu][openmp] Treat missing TIMESTAMP_FREQUENCY as non-fatal (#70987)
If you build with dynamic_hsa, the symbol is known and compilation
succeeds. If you then run with a slightly older libhsa, this argument is
not recognised and an error returned. I'd rather the program runs with a
misleading omp wtime than refuses to run at all.
2023-11-01 22:43:34 +00:00
Roland McGrath
ba177c7286
[libc] Add a few missing casts (#70850)
Stricter GCC warnings about implicit widening and narrowing cases
necessitate additional explicit casts around some integer operations.
2023-10-31 12:37:09 -07:00
michaelrj-google
8ca565cd3b
[libc] Fix printf long double truncation bound (#70705)
The calculation that decides whether a number being printed is truncated
and should be rounded up assumed a double for one of its constants,
causing occasional misrounding. This fixes that by basing the constant on
the mantissa width.
2023-10-30 14:04:00 -07:00
Joseph Huber
9e390a1408 [libc][Obvious] Fix missing semicolon in AMDGPU loader implementation
Summary:
Title
2023-10-30 14:58:46 -05:00
Jon Chesterfield
896749aa0d
[amdgpu][openmp] Avoiding writing to packet header twice (#70695)
I think it follows from the HSA spec that a write to the first byte is
deemed significant to the GPU in which case writing to the second short
and reading back from it later would be safe. However, the examples for
this all involve an atomic write to the first 32 bits and it seems a
credible risk that the occasional CI errors abound invalid packets have
as their root cause that the firmware notices the early write to
packet->setup and treats that as a sign that the packet is ready to go.

That was overly paranoid; however, in passing I noticed that the code in libc is
genuinely invalid. The memset writes a zero to the header byte, changing
it from type_invalid (1) to type_vendor (0), at which point the GPU is
free to read the 64 byte packet and interpret it as a vendor packet,
which is probably why libc CI periodically errors about invalid packets.

Also a drive by change to do the atomic store on a uint32_t
consistently. I'm not sure offhand what __atomic_store_n on a uint16_t*
and an int resolves to, seems better to be unambiguous there.
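A sketch of the single 32-bit store, assuming a little-endian target and a packet that begins with a 16-bit `header` followed by a 16-bit `setup` field. `publish_header_and_setup` is a hypothetical helper, and `__atomic_store_n` is the GCC/Clang builtin.

```cpp
#include <cassert>
#include <cstdint>

// All other packet fields must already be written before this call. The
// release ordering makes those writes visible before the header that marks
// the packet valid, and header and setup become visible in one store.
inline void publish_header_and_setup(std::uint32_t *first_word,
                                     std::uint16_t header,
                                     std::uint16_t setup) {
  assert(first_word != nullptr);
  // Little-endian assumption: header occupies the low 16 bits.
  const std::uint32_t word = static_cast<std::uint32_t>(header) |
                             (static_cast<std::uint32_t>(setup) << 16);
  __atomic_store_n(first_word, word, __ATOMIC_RELEASE);
}
```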
2023-10-30 18:35:52 +00:00
Joseph Huber
8e447a123b
[libc] Optimize the RPC memory copy for the AMDGPU target (#70467)
Summary:
We previously made the change to make the GPU target use builtin
implementations of memory copy functions. However, this had the negative
effect of massively increasing register usage when using the printing
interface. For example, a `printf` call went from using 25 VGPRs to 54
simply because of using the builtin. We probably still want to export
the builtin, but for the RPC interface we heavily prefer small resource
usage over the performance gains of fully unrolling this loop. For NVPTX,
however, the builtin implementation causes the resource usage to go down
(36 registers total for a regular `fputs` call), so we will maintain that
implementation.

I think specializing this is the right call as we will always prefer the
implementation with the smallest resource footprint for this interface,
as performance is already going to be heavily bottlenecked by the use of
fine-grained memory.
2023-10-27 14:55:37 -05:00
michaelrj-google
6e863c4073
[libc] Fix incorrect printing for alt mode ints (#70252)
Previously, our printf would incorrectly handle conversions like
("%#x",0) and ("%#o",0). This patch corrects the behavior to match what
is described in the standard.
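The standard behavior can be checked with the host `snprintf`; `formats_as` is a hypothetical test helper.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// With the '#' flag, C prints a zero value as plain "0" for both %#x and
// %#o: the 0x prefix is only added to nonzero values, and %#o only has to
// guarantee a leading zero digit, which "0" already has.
inline bool formats_as(const char *expected, const char *fmt, unsigned value) {
  char buf[32];
  const int len = std::snprintf(buf, sizeof(buf), fmt, value);
  assert(len >= 0 && static_cast<std::size_t>(len) < sizeof(buf));
  return std::strcmp(buf, expected) == 0;
}
```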
2023-10-27 11:04:11 -07:00
Dmitry Vyukov
0e110fb429
[libc] memmove optimizations (#70043)
1. Remove is_disjoint check for smaller sizes and reduce code bloat.

inline_memmove may handle some small sizes as efficiently as
inline_memcpy. For these sizes we can skip the is_disjoint check.
This both avoids additional code for the most frequent smaller sizes
and removes code bloat (we don't need the memcpy logic for small sizes).
Here we rely heavily on inlining and dead code elimination: from the
first inline_memmove we should get only handling of small sizes, and from
the second inline_memmove and inline_memcpy we should get only handling
of larger sizes.

2. Use the memcpy thresholds for memmove.
Memcpy thresholds were more carefully tuned.
This becomes more important since we use memmove
for all small sizes always now.

3. Fix boundary conditions for sizes = 16/32/64.
See the added comment for explanations.

Memmove function size drops from 885 to 715 bytes
due to removed duplication.
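A sketch of why small sizes need no `is_disjoint` check, using the 8 to 16 byte range as an example; `memmove_8_to_16` is a hypothetical name, and the real size classes and load widths in libc may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Load the first and last 8 bytes into registers before storing anything.
// Both loads finish before either store, so the copy is correct even when
// dst and src overlap, in either direction.
inline void memmove_8_to_16(char *dst, const char *src, std::size_t count) {
  assert(count >= 8 && count <= 16);
  std::uint64_t head;
  std::uint64_t tail;
  std::memcpy(&head, src, 8);             // first 8 bytes
  std::memcpy(&tail, src + count - 8, 8); // last 8 bytes (may overlap head)
  std::memcpy(dst, &head, 8);
  std::memcpy(dst + count - 8, &tail, 8);
}
```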

```
                 │  baseline   │             small-size              │
                 │   sec/op    │   sec/op     vs base                │
memmove/Google_A   3.208n ± 0%   2.911n ± 0%   -9.25% (n=100)
memmove/Google_B   4.113n ± 1%   3.428n ± 0%  -16.65% (n=100)
memmove/Google_D   5.838n ± 0%   4.158n ± 0%  -28.78% (n=100)
memmove/Google_S   4.712n ± 1%   3.899n ± 0%  -17.25% (n=100)
memmove/Google_U   3.609n ± 0%   3.247n ± 1%  -10.02% (n=100)
memmove/0          2.982n ± 0%   2.169n ± 0%  -27.26% (n=50)
memmove/1          3.253n ± 0%   2.168n ± 0%  -33.34% (n=50)
memmove/2          3.255n ± 0%   2.169n ± 0%  -33.38% (n=50)
memmove/3          3.259n ± 2%   2.175n ± 0%  -33.27% (p=0.000 n=50)
memmove/4          3.259n ± 0%   2.168n ± 5%  -33.46% (p=0.000 n=50)
memmove/5          2.488n ± 0%   1.926n ± 0%  -22.57% (p=0.000 n=50)
memmove/6          2.490n ± 0%   1.928n ± 0%  -22.59% (p=0.000 n=50)
memmove/7          2.492n ± 0%   1.927n ± 0%  -22.65% (p=0.000 n=50)
memmove/8          2.737n ± 0%   2.711n ± 0%   -0.97% (p=0.000 n=50)
memmove/9          2.736n ± 0%   2.711n ± 0%   -0.94% (p=0.000 n=50)
memmove/10         2.739n ± 0%   2.711n ± 0%   -1.04% (p=0.000 n=50)
memmove/11         2.740n ± 0%   2.711n ± 0%   -1.07% (p=0.000 n=50)
memmove/12         2.740n ± 0%   2.711n ± 0%   -1.09% (p=0.000 n=50)
memmove/13         2.744n ± 0%   2.711n ± 0%   -1.22% (p=0.000 n=50)
memmove/14         2.742n ± 0%   2.711n ± 0%   -1.14% (p=0.000 n=50)
memmove/15         2.742n ± 0%   2.711n ± 0%   -1.15% (p=0.000 n=50)
memmove/16         2.997n ± 0%   2.981n ± 0%   -0.52% (p=0.000 n=50)
memmove/17         2.998n ± 0%   2.981n ± 0%   -0.55% (p=0.000 n=50)
memmove/18         2.998n ± 0%   2.981n ± 0%   -0.55% (p=0.000 n=50)
memmove/19         2.999n ± 0%   2.982n ± 0%   -0.59% (p=0.000 n=50)
memmove/20         2.998n ± 0%   2.981n ± 0%   -0.55% (p=0.000 n=50)
memmove/21         3.000n ± 0%   2.981n ± 0%   -0.61% (p=0.000 n=50)
memmove/22         3.002n ± 0%   2.981n ± 0%   -0.68% (p=0.000 n=50)
memmove/23         3.002n ± 0%   2.981n ± 0%   -0.67% (p=0.000 n=50)
memmove/24         3.002n ± 0%   2.981n ± 0%   -0.70% (n=50)
memmove/25         3.002n ± 0%   2.981n ± 0%   -0.68% (p=0.000 n=50)
memmove/26         3.004n ± 0%   2.982n ± 0%   -0.74% (p=0.000 n=50)
memmove/27         3.005n ± 0%   2.981n ± 0%   -0.79% (n=50)
memmove/28         3.005n ± 0%   2.982n ± 0%   -0.77% (n=50)
memmove/29         3.009n ± 0%   2.981n ± 0%   -0.92% (n=50)
memmove/30         3.008n ± 0%   2.981n ± 0%   -0.89% (n=50)
memmove/31         3.007n ± 0%   2.982n ± 0%   -0.86% (n=50)
memmove/32         3.540n ± 0%   2.998n ± 0%  -15.31% (p=0.000 n=50)
memmove/33         3.544n ± 0%   2.997n ± 0%  -15.44% (p=0.000 n=50)
memmove/34         3.546n ± 0%   2.999n ± 0%  -15.42% (n=50)
memmove/35         3.545n ± 0%   2.999n ± 0%  -15.40% (n=50)
memmove/36         3.548n ± 0%   2.998n ± 0%  -15.52% (p=0.000 n=50)
memmove/37         3.546n ± 0%   3.000n ± 0%  -15.41% (n=50)
memmove/38         3.549n ± 0%   2.999n ± 0%  -15.49% (p=0.000 n=50)
memmove/39         3.549n ± 0%   2.999n ± 0%  -15.48% (p=0.000 n=50)
memmove/40         3.549n ± 0%   3.000n ± 0%  -15.46% (p=0.000 n=50)
memmove/41         3.550n ± 0%   3.001n ± 0%  -15.47% (n=50)
memmove/42         3.549n ± 0%   3.001n ± 0%  -15.43% (n=50)
memmove/43         3.552n ± 0%   3.001n ± 0%  -15.52% (p=0.000 n=50)
memmove/44         3.552n ± 0%   3.001n ± 0%  -15.51% (n=50)
memmove/45         3.552n ± 0%   3.002n ± 0%  -15.48% (n=50)
memmove/46         3.554n ± 0%   3.001n ± 0%  -15.55% (p=0.000 n=50)
memmove/47         3.556n ± 0%   3.002n ± 0%  -15.58% (p=0.000 n=50)
memmove/48         3.555n ± 0%   3.003n ± 0%  -15.54% (n=50)
memmove/49         3.557n ± 0%   3.002n ± 0%  -15.59% (p=0.000 n=50)
memmove/50         3.557n ± 0%   3.004n ± 0%  -15.55% (p=0.000 n=50)
memmove/51         3.556n ± 0%   3.004n ± 0%  -15.53% (p=0.000 n=50)
memmove/52         3.561n ± 0%   3.004n ± 0%  -15.65% (p=0.000 n=50)
memmove/53         3.558n ± 0%   3.004n ± 0%  -15.57% (p=0.000 n=50)
memmove/54         3.561n ± 0%   3.005n ± 0%  -15.62% (n=50)
memmove/55         3.560n ± 0%   3.006n ± 0%  -15.57% (n=50)
memmove/56         3.562n ± 0%   3.006n ± 0%  -15.60% (p=0.000 n=50)
memmove/57         3.563n ± 0%   3.006n ± 0%  -15.64% (n=50)
memmove/58         3.565n ± 0%   3.007n ± 0%  -15.64% (p=0.000 n=50)
memmove/59         3.564n ± 0%   3.006n ± 0%  -15.66% (p=0.000 n=50)
memmove/60         3.570n ± 0%   3.008n ± 0%  -15.74% (p=0.000 n=50)
memmove/61         3.566n ± 0%   3.009n ± 0%  -15.63% (p=0.000 n=50)
memmove/62         3.567n ± 0%   3.007n ± 0%  -15.70% (p=0.000 n=50)
memmove/63         3.568n ± 0%   3.008n ± 0%  -15.71% (p=0.000 n=50)
memmove/64         4.104n ± 0%   3.008n ± 0%  -26.70% (p=0.000 n=50)
memmove/65         4.126n ± 0%   3.662n ± 0%  -11.26% (p=0.000 n=50)
memmove/66         4.128n ± 0%   3.662n ± 0%  -11.29% (n=50)
memmove/67         4.129n ± 0%   3.662n ± 0%  -11.31% (n=50)
memmove/68         4.129n ± 0%   3.661n ± 0%  -11.33% (p=0.000 n=50)
memmove/69         4.130n ± 0%   3.662n ± 0%  -11.34% (p=0.000 n=50)
memmove/70         4.130n ± 0%   3.662n ± 0%  -11.33% (n=50)
memmove/71         4.132n ± 0%   3.662n ± 0%  -11.38% (p=0.000 n=50)
memmove/72         4.131n ± 0%   3.661n ± 0%  -11.39% (n=50)
memmove/73         4.135n ± 0%   3.661n ± 0%  -11.45% (p=0.000 n=50)
memmove/74         4.137n ± 0%   3.662n ± 0%  -11.49% (n=50)
memmove/75         4.138n ± 0%   3.662n ± 0%  -11.51% (p=0.000 n=50)
memmove/76         4.139n ± 0%   3.661n ± 0%  -11.56% (p=0.000 n=50)
memmove/77         4.136n ± 0%   3.662n ± 0%  -11.47% (p=0.000 n=50)
memmove/78         4.143n ± 0%   3.661n ± 0%  -11.62% (p=0.000 n=50)
memmove/79         4.142n ± 0%   3.661n ± 0%  -11.60% (n=50)
memmove/80         4.142n ± 0%   3.661n ± 0%  -11.62% (p=0.000 n=50)
memmove/81         4.140n ± 0%   3.661n ± 0%  -11.57% (n=50)
memmove/82         4.146n ± 0%   3.661n ± 0%  -11.69% (n=50)
memmove/83         4.143n ± 0%   3.661n ± 0%  -11.63% (p=0.000 n=50)
memmove/84         4.143n ± 0%   3.661n ± 0%  -11.63% (n=50)
memmove/85         4.147n ± 0%   3.661n ± 0%  -11.73% (p=0.000 n=50)
memmove/86         4.142n ± 0%   3.661n ± 0%  -11.62% (p=0.000 n=50)
memmove/87         4.147n ± 0%   3.661n ± 0%  -11.72% (p=0.000 n=50)
memmove/88         4.148n ± 0%   3.661n ± 0%  -11.74% (n=50)
memmove/89         4.152n ± 0%   3.661n ± 0%  -11.84% (n=50)
memmove/90         4.151n ± 0%   3.661n ± 0%  -11.81% (n=50)
memmove/91         4.150n ± 0%   3.661n ± 0%  -11.78% (n=50)
memmove/92         4.153n ± 0%   3.661n ± 0%  -11.86% (n=50)
memmove/93         4.158n ± 0%   3.661n ± 0%  -11.95% (n=50)
memmove/94         4.157n ± 0%   3.661n ± 0%  -11.95% (p=0.000 n=50)
memmove/95         4.155n ± 0%   3.661n ± 0%  -11.90% (p=0.000 n=50)
memmove/96         4.149n ± 0%   3.660n ± 0%  -11.79% (n=50)
memmove/97         4.157n ± 0%   3.661n ± 0%  -11.94% (n=50)
memmove/98         4.157n ± 0%   3.661n ± 0%  -11.94% (n=50)
memmove/99         4.168n ± 0%   3.661n ± 0%  -12.17% (p=0.000 n=50)
memmove/100        4.159n ± 0%   3.660n ± 0%  -12.00% (p=0.000 n=50)
memmove/101        4.161n ± 0%   3.660n ± 0%  -12.03% (p=0.000 n=50)
memmove/102        4.165n ± 0%   3.660n ± 0%  -12.12% (p=0.000 n=50)
memmove/103        4.164n ± 0%   3.661n ± 0%  -12.08% (n=50)
memmove/104        4.164n ± 0%   3.660n ± 0%  -12.11% (n=50)
memmove/105        4.165n ± 0%   3.660n ± 0%  -12.12% (p=0.000 n=50)
memmove/106        4.166n ± 0%   3.660n ± 0%  -12.15% (n=50)
memmove/107        4.171n ± 0%   3.660n ± 1%  -12.26% (p=0.000 n=50)
memmove/108        4.173n ± 0%   3.660n ± 0%  -12.30% (p=0.000 n=50)
memmove/109        4.170n ± 0%   3.660n ± 0%  -12.24% (n=50)
memmove/110        4.174n ± 0%   3.660n ± 0%  -12.31% (n=50)
memmove/111        4.176n ± 0%   3.660n ± 0%  -12.35% (p=0.000 n=50)
memmove/112        4.174n ± 0%   3.659n ± 0%  -12.34% (p=0.000 n=50)
memmove/113        4.176n ± 0%   3.660n ± 0%  -12.35% (n=50)
memmove/114        4.182n ± 0%   3.660n ± 0%  -12.49% (n=50)
memmove/115        4.185n ± 0%   3.660n ± 0%  -12.55% (n=50)
memmove/116        4.184n ± 0%   3.659n ± 0%  -12.54% (n=50)
memmove/117        4.182n ± 0%   3.660n ± 0%  -12.50% (n=50)
memmove/118        4.188n ± 0%   3.660n ± 0%  -12.61% (n=50)
memmove/119        4.186n ± 0%   3.660n ± 0%  -12.57% (p=0.000 n=50)
memmove/120        4.189n ± 0%   3.659n ± 0%  -12.63% (n=50)
memmove/121        4.187n ± 0%   3.660n ± 0%  -12.60% (n=50)
memmove/122        4.186n ± 0%   3.660n ± 0%  -12.58% (n=50)
memmove/123        4.187n ± 0%   3.660n ± 0%  -12.60% (n=50)
memmove/124        4.189n ± 0%   3.659n ± 0%  -12.65% (n=50)
memmove/125        4.195n ± 0%   3.659n ± 0%  -12.78% (n=50)
memmove/126        4.197n ± 0%   3.659n ± 0%  -12.81% (n=50)
memmove/127        4.194n ± 0%   3.659n ± 0%  -12.75% (n=50)
memmove/128        5.035n ± 0%   3.659n ± 0%  -27.32% (n=50)
memmove/129        5.127n ± 0%   5.164n ± 0%   +0.73% (p=0.000 n=50)
memmove/130        5.130n ± 0%   5.176n ± 0%   +0.88% (p=0.000 n=50)
memmove/131        5.127n ± 0%   5.180n ± 0%   +1.05% (p=0.000 n=50)
memmove/132        5.131n ± 0%   5.169n ± 0%   +0.75% (p=0.000 n=50)
memmove/133        5.137n ± 0%   5.179n ± 0%   +0.81% (p=0.000 n=50)
memmove/134        5.140n ± 0%   5.178n ± 0%   +0.74% (p=0.000 n=50)
memmove/135        5.141n ± 0%   5.187n ± 0%   +0.88% (p=0.000 n=50)
memmove/136        5.133n ± 0%   5.184n ± 0%   +0.99% (p=0.000 n=50)
memmove/137        5.148n ± 0%   5.186n ± 0%   +0.73% (p=0.000 n=50)
memmove/138        5.143n ± 0%   5.189n ± 0%   +0.88% (p=0.000 n=50)
memmove/139        5.142n ± 0%   5.192n ± 0%   +0.97% (p=0.000 n=50)
memmove/140        5.141n ± 0%   5.192n ± 0%   +1.01% (p=0.000 n=50)
memmove/141        5.155n ± 0%   5.188n ± 0%   +0.64% (p=0.000 n=50)
memmove/142        5.146n ± 0%   5.192n ± 0%   +0.90% (p=0.000 n=50)
memmove/143        5.142n ± 0%   5.203n ± 0%   +1.19% (p=0.000 n=50)
memmove/144        5.146n ± 0%   5.197n ± 0%   +0.99% (p=0.000 n=50)
memmove/145        5.146n ± 0%   5.196n ± 0%   +0.97% (p=0.000 n=50)
memmove/146        5.151n ± 0%   5.207n ± 0%   +1.10% (p=0.000 n=50)
memmove/147        5.151n ± 0%   5.205n ± 0%   +1.06% (p=0.000 n=50)
memmove/148        5.156n ± 0%   5.190n ± 0%   +0.66% (p=0.000 n=50)
memmove/149        5.158n ± 0%   5.212n ± 0%   +1.04% (p=0.000 n=50)
memmove/150        5.160n ± 0%   5.203n ± 0%   +0.84% (p=0.000 n=50)
memmove/151        5.167n ± 0%   5.210n ± 0%   +0.83% (p=0.000 n=50)
memmove/152        5.157n ± 0%   5.206n ± 0%   +0.94% (p=0.000 n=50)
memmove/153        5.170n ± 0%   5.211n ± 0%   +0.80% (p=0.000 n=50)
memmove/154        5.169n ± 0%   5.222n ± 0%   +1.02% (p=0.000 n=50)
memmove/155        5.171n ± 0%   5.215n ± 0%   +0.87% (p=0.000 n=50)
memmove/156        5.174n ± 0%   5.214n ± 0%   +0.78% (p=0.000 n=50)
memmove/157        5.171n ± 0%   5.218n ± 0%   +0.92% (p=0.000 n=50)
memmove/158        5.168n ± 0%   5.224n ± 0%   +1.09% (p=0.000 n=50)
memmove/159        5.179n ± 0%   5.218n ± 0%   +0.76% (p=0.000 n=50)
memmove/160        5.170n ± 0%   5.219n ± 0%   +0.95% (p=0.000 n=50)
memmove/161        5.187n ± 0%   5.220n ± 0%   +0.64% (p=0.000 n=50)
memmove/162        5.189n ± 0%   5.234n ± 0%   +0.86% (p=0.000 n=50)
memmove/163        5.199n ± 0%   5.250n ± 0%   +0.99% (p=0.000 n=50)
memmove/164        5.205n ± 0%   5.260n ± 0%   +1.04% (p=0.000 n=50)
memmove/165        5.208n ± 0%   5.261n ± 0%   +1.01% (p=0.000 n=50)
memmove/166        5.227n ± 0%   5.275n ± 0%   +0.91% (p=0.000 n=50)
memmove/167        5.233n ± 0%   5.281n ± 0%   +0.92% (p=0.000 n=50)
memmove/168        5.236n ± 0%   5.295n ± 0%   +1.12% (p=0.000 n=50)
memmove/169        5.256n ± 0%   5.297n ± 0%   +0.79% (p=0.000 n=50)
memmove/170        5.259n ± 0%   5.302n ± 0%   +0.80% (p=0.000 n=50)
memmove/171        5.269n ± 0%   5.321n ± 0%   +0.97% (p=0.000 n=50)
memmove/172        5.266n ± 0%   5.318n ± 0%   +0.98% (p=0.000 n=50)
memmove/173        5.272n ± 0%   5.330n ± 0%   +1.09% (p=0.000 n=50)
memmove/174        5.284n ± 0%   5.331n ± 0%   +0.89% (p=0.000 n=50)
memmove/175        5.284n ± 0%   5.322n ± 0%   +0.72% (p=0.000 n=50)
memmove/176        5.298n ± 0%   5.337n ± 0%   +0.74% (p=0.000 n=50)
memmove/177        5.282n ± 0%   5.338n ± 0%   +1.04% (p=0.000 n=50)
memmove/178        5.299n ± 0%   5.337n ± 0%   +0.71% (p=0.000 n=50)
memmove/179        5.296n ± 0%   5.343n ± 0%   +0.88% (p=0.000 n=50)
memmove/180        5.292n ± 0%   5.343n ± 0%   +0.97% (p=0.000 n=50)
memmove/181        5.303n ± 0%   5.335n ± 0%   +0.60% (p=0.000 n=50)
memmove/182        5.305n ± 0%   5.338n ± 0%   +0.62% (p=0.000 n=50)
memmove/183        5.298n ± 0%   5.329n ± 0%   +0.59% (p=0.000 n=50)
memmove/184        5.299n ± 0%   5.333n ± 0%   +0.64% (p=0.000 n=50)
memmove/185        5.291n ± 0%   5.330n ± 0%   +0.73% (p=0.000 n=50)
memmove/186        5.296n ± 0%   5.332n ± 0%   +0.68% (p=0.000 n=50)
memmove/187        5.297n ± 0%   5.320n ± 0%   +0.44% (p=0.000 n=50)
memmove/188        5.286n ± 0%   5.314n ± 0%   +0.53% (p=0.000 n=50)
memmove/189        5.293n ± 0%   5.318n ± 0%   +0.46% (p=0.000 n=50)
memmove/190        5.294n ± 0%   5.318n ± 0%   +0.45% (p=0.000 n=50)
memmove/191        5.292n ± 0%   5.314n ± 0%   +0.40% (p=0.032 n=50)
memmove/192        5.272n ± 0%   5.304n ± 0%   +0.60% (p=0.000 n=50)
memmove/193        5.279n ± 0%   5.310n ± 0%   +0.57% (p=0.000 n=50)
memmove/194        5.294n ± 0%   5.308n ± 0%   +0.26% (p=0.018 n=50)
memmove/195        5.302n ± 0%   5.311n ± 0%   +0.18% (p=0.010 n=50)
memmove/196        5.301n ± 0%   5.316n ± 0%   +0.28% (p=0.023 n=50)
memmove/197        5.302n ± 0%   5.327n ± 0%   +0.47% (p=0.000 n=50)
memmove/198        5.310n ± 0%   5.326n ± 0%   +0.30% (p=0.003 n=50)
memmove/199        5.303n ± 0%   5.319n ± 0%   +0.30% (p=0.009 n=50)
memmove/200        5.312n ± 0%   5.330n ± 0%   +0.35% (p=0.001 n=50)
memmove/201        5.307n ± 0%   5.333n ± 0%   +0.50% (p=0.000 n=50)
memmove/202        5.311n ± 0%   5.334n ± 0%   +0.44% (p=0.000 n=50)
memmove/203        5.313n ± 0%   5.335n ± 0%   +0.41% (p=0.006 n=50)
memmove/204        5.312n ± 0%   5.332n ± 0%   +0.36% (p=0.002 n=50)
memmove/205        5.318n ± 0%   5.345n ± 0%   +0.50% (p=0.000 n=50)
memmove/206        5.311n ± 0%   5.333n ± 0%   +0.42% (p=0.002 n=50)
memmove/207        5.310n ± 0%   5.338n ± 0%   +0.52% (p=0.000 n=50)
memmove/208        5.319n ± 0%   5.341n ± 0%   +0.40% (p=0.004 n=50)
memmove/209        5.330n ± 0%   5.346n ± 0%   +0.30% (p=0.004 n=50)
memmove/210        5.329n ± 0%   5.349n ± 0%   +0.38% (p=0.002 n=50)
memmove/211        5.318n ± 0%   5.340n ± 0%   +0.41% (p=0.000 n=50)
memmove/212        5.339n ± 0%   5.343n ± 0%        ~ (p=0.396 n=50)
memmove/213        5.329n ± 0%   5.343n ± 0%   +0.25% (p=0.017 n=50)
memmove/214        5.339n ± 0%   5.358n ± 0%   +0.35% (p=0.035 n=50)
memmove/215        5.342n ± 0%   5.346n ± 0%        ~ (p=0.063 n=50)
memmove/216        5.338n ± 0%   5.359n ± 0%   +0.39% (p=0.002 n=50)
memmove/217        5.341n ± 0%   5.362n ± 0%   +0.39% (p=0.015 n=50)
memmove/218        5.354n ± 0%   5.373n ± 0%   +0.36% (p=0.041 n=50)
memmove/219        5.352n ± 0%   5.362n ± 0%        ~ (p=0.143 n=50)
memmove/220        5.344n ± 0%   5.370n ± 0%   +0.50% (p=0.001 n=50)
memmove/221        5.345n ± 0%   5.373n ± 0%   +0.53% (p=0.000 n=50)
memmove/222        5.348n ± 0%   5.360n ± 0%   +0.23% (p=0.014 n=50)
memmove/223        5.354n ± 0%   5.377n ± 0%   +0.43% (p=0.024 n=50)
memmove/224        5.352n ± 0%   5.363n ± 0%        ~ (p=0.052 n=50)
memmove/225        5.372n ± 0%   5.380n ± 0%        ~ (p=0.481 n=50)
memmove/226        5.368n ± 0%   5.386n ± 0%   +0.34% (p=0.004 n=50)
memmove/227        5.386n ± 0%   5.402n ± 0%   +0.29% (p=0.028 n=50)
memmove/228        5.400n ± 0%   5.408n ± 0%        ~ (p=0.174 n=50)
memmove/229        5.423n ± 0%   5.427n ± 0%        ~ (p=0.444 n=50)
memmove/230        5.411n ± 0%   5.429n ± 0%   +0.33% (p=0.020 n=50)
memmove/231        5.420n ± 0%   5.433n ± 0%   +0.24% (p=0.034 n=50)
memmove/232        5.435n ± 0%   5.441n ± 0%        ~ (p=0.235 n=50)
memmove/233        5.446n ± 0%   5.462n ± 0%        ~ (p=0.590 n=50)
memmove/234        5.467n ± 0%   5.461n ± 0%        ~ (p=0.921 n=50)
memmove/235        5.472n ± 0%   5.478n ± 0%        ~ (p=0.883 n=50)
memmove/236        5.466n ± 0%   5.478n ± 0%        ~ (p=0.324 n=50)
memmove/237        5.471n ± 0%   5.489n ± 0%        ~ (p=0.132 n=50)
memmove/238        5.485n ± 0%   5.489n ± 0%        ~ (p=0.460 n=50)
memmove/239        5.484n ± 0%   5.488n ± 0%        ~ (p=0.833 n=50)
memmove/240        5.483n ± 0%   5.495n ± 0%        ~ (p=0.095 n=50)
memmove/241        5.498n ± 0%   5.514n ± 0%        ~ (p=0.077 n=50)
memmove/242        5.518n ± 0%   5.517n ± 0%        ~ (p=0.481 n=50)
memmove/243        5.514n ± 0%   5.511n ± 0%        ~ (p=0.503 n=50)
memmove/244        5.510n ± 0%   5.497n ± 0%   -0.24% (p=0.038 n=50)
memmove/245        5.516n ± 0%   5.505n ± 0%        ~ (p=0.317 n=50)
memmove/246        5.513n ± 1%   5.494n ± 0%        ~ (p=0.147 n=50)
memmove/247        5.518n ± 0%   5.499n ± 0%   -0.36% (p=0.011 n=50)
memmove/248        5.503n ± 0%   5.492n ± 0%        ~ (p=0.267 n=50)
memmove/249        5.498n ± 0%   5.497n ± 0%        ~ (p=0.765 n=50)
memmove/250        5.485n ± 0%   5.493n ± 0%        ~ (p=0.348 n=50)
memmove/251        5.503n ± 0%   5.482n ± 0%   -0.37% (p=0.013 n=50)
memmove/252        5.497n ± 0%   5.485n ± 0%        ~ (p=0.077 n=50)
memmove/253        5.489n ± 0%   5.496n ± 0%        ~ (p=0.850 n=50)
memmove/254        5.497n ± 0%   5.491n ± 0%        ~ (p=0.548 n=50)
memmove/255        5.484n ± 1%   5.494n ± 0%        ~ (p=0.888 n=50)
memmove/256        6.952n ± 0%   7.676n ± 0%  +10.41% (p=0.000 n=50)
geomean            4.406n        4.127n        -6.33%
```
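As a reading aid for the benchstat table above: the "vs base" column is the relative change of the new median against the baseline median. The geomean row summarizes every size with a geometric mean. This assumes benchstat's standard definitions:

```latex
\Delta_{\%} = \frac{m_{\text{new}} - m_{\text{base}}}{m_{\text{base}}} \times 100,
\qquad
\operatorname{geomean}(m_1, \dots, m_n) = \Bigl(\prod_{i=1}^{n} m_i\Bigr)^{1/n}
```

For example, the geomean row gives 4.127 / 4.406 - 1 ≈ -6.33%. The printed medians are rounded, so a row recomputed from them can differ in the last digit. memmove/64 gives (3.008 - 4.104) / 4.104 ≈ -26.71%, while the table shows -26.70%. Rows marked `~` have p-values above 0.05, which matches benchstat's default significance level.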
2023-10-26 13:40:25 +02:00
Dmitry Vyukov
605fadf0ca
[libc] Add --sweep-min-size flag for benchmarks (#70302)
We already have --sweep-max-size, so it makes sense to have --sweep-min-size
as well. It can be used when working on the logic for larger sizes, or to
collect a profile for larger sizes only.
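This is a minimal sketch of what a min/max size sweep amounts to. It assumes both bounds are inclusive. The `sweep_config` struct and the `sweep` and `print_size` functions are hypothetical names for illustration, not the benchmark tool's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the values of --sweep-min-size and
   --sweep-max-size; the real benchmark driver stores them differently. */
struct sweep_config {
  size_t min_size;
  size_t max_size;
};

/* Runs one measurement per size in the inclusive range
   [min_size, max_size] and returns how many sizes were visited. */
size_t sweep(struct sweep_config cfg, void (*run_one)(size_t size)) {
  assert(run_one != NULL);
  if (cfg.min_size > cfg.max_size)
    return 0; /* empty range: nothing to measure */
  size_t count = 0;
  for (size_t size = cfg.min_size;; ++size) {
    run_one(size);
    ++count;
    if (size == cfg.max_size)
      break; /* exit before ++size can wrap around at SIZE_MAX */
  }
  return count;
}

/* Example per-size callback: prints a benchmark-style label. */
void print_size(size_t size) { printf("memmove/%zu\n", size); }
```

With `{128, 256}`, the sweep visits the 129 sizes from 128 through 256. That matches the "larger sizes only" use case described above.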
2023-10-26 11:06:15 +02:00
Joseph Huber
e3d2a7d0a5
[libc] Compile the GPU functions with '-fconvergent-functions' (#70229)
Summary:
This patch simply adds the `-fconvergent-functions` flag to the GPU
compilation. This is needed because of how SIMT architectures behave
under divergence: with the flag, every function is assumed to be
convergent by default, and we rely on the compiler's divergence analysis
to transform it where possible.

Fixes: https://github.com/llvm/llvm-project/issues/63853
2023-10-25 14:13:21 -05:00
Benjamin Kramer
c4e9a43773 [libc] Fix a constexpr violation from b4e552999d
In msan mode this calls __msan_unpoison, which isn't constexpr.
2023-10-25 13:36:17 +02:00
michaelrj-google
2282af26ea
[libc] Disable -NaN test on float128 systems (#70146)
Some float128 systems (specifically the ones used for aarch64 buildbots)
don't respect signs for long double NaNs. This patch disables the printf
test that was failing due to this.
2023-10-24 16:45:54 -07:00
michaelrj-google
b4e552999d
[libc] Fix printf long double inf, bitcast in msan (#70067)
These bugs were found with the new printf long double fuzzing. The long
double inf vs. NaN bug was introduced when we switched to
get_explicit_exponent. The bitcast msan issue hadn't come up before; it
isn't a real bug, just msan being confused about poisoned bits.
2023-10-24 15:41:54 -07:00
Dmitry Vyukov
f364a7a8b4
[libc] Speed up memmove overlapping check (#70017)
Use an overlap check that requires fewer and cheaper instructions.
Current code:
```
   1b704:       48 39 f7                cmp    %rsi,%rdi
   1b707:       48 89 f0                mov    %rsi,%rax
   1b70a:       48 0f 47 c7             cmova  %rdi,%rax
   1b70e:       48 89 f9                mov    %rdi,%rcx
   1b711:       48 0f 47 ce             cmova  %rsi,%rcx
   1b715:       48 01 d1                add    %rdx,%rcx
   1b718:       48 39 c1                cmp    %rax,%rcx
```
New code:
```
   1b704:       48 89 f8                mov    %rdi,%rax
   1b707:       48 29 f0                sub    %rsi,%rax
   1b70a:       48 89 c1                mov    %rax,%rcx
   1b70d:       48 f7 d9                neg    %rcx
   1b710:       48 0f 48 c8             cmovs  %rax,%rcx
   1b714:       48 39 d1                cmp    %rdx,%rcx
```
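The two listings above correspond roughly to the following C predicates. This is a sketch of the idea, not the libc source, and the function names are hypothetical. It treats `[dst, dst + count)` and `[src, src + count)` as disjoint when they share no byte. It converts pointers to `uintptr_t` to avoid C's restrictions on comparing pointers into different objects. The old sequence computes `min + count <= max`. The new one computes `|dst - src| >= count`. The compiler is not guaranteed to emit exactly these instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Old formulation (cmp/cmova/cmova/add/cmp): the ranges are disjoint
   iff min(dst, src) + count <= max(dst, src). */
bool disjoint_minmax(const void *dst, const void *src, size_t count) {
  uintptr_t a = (uintptr_t)dst, b = (uintptr_t)src;
  uintptr_t lo = a < b ? a : b;
  uintptr_t hi = a < b ? b : a;
  return lo + count <= hi;
}

/* New formulation (sub/neg/cmovs/cmp): the ranges are disjoint iff the
   distance between the two start addresses is at least count. */
bool disjoint_absdiff(const void *dst, const void *src, size_t count) {
  uintptr_t a = (uintptr_t)dst, b = (uintptr_t)src;
  uintptr_t dist = a > b ? a - b : b - a; /* |dst - src| */
  return count <= dist;
}

/* Cross-checks both predicates on every (dst, src, count) that stays
   inside one small buffer. */
void check_equivalence(void) {
  static char buf[64];
  for (size_t d = 0; d < 32; ++d)
    for (size_t s = 0; s < 32; ++s)
      for (size_t n = 0; n <= 32; ++n)
        assert(disjoint_minmax(buf + d, buf + s, n) ==
               disjoint_absdiff(buf + d, buf + s, n));
}
```

Both forms need one comparison against `count`. The new one replaces the min/max pair (two conditional moves plus an add) with a subtraction and a single conditional move.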
```
                 │  baseline   │              disjoint              │
                 │   sec/op    │   sec/op     vs base               │
memmove/Google_A   3.910n ± 0%   3.861n ± 1%  -1.26% (p=0.000 n=50)
```
```
            │  baseline   │              disjoint               │
            │   sec/op    │   sec/op     vs base                │
memmove/1     2.724n ± 3%   2.441n ± 0%  -10.37% (n=50)
memmove/2     2.878n ± 0%   2.713n ± 0%   -5.73% (n=50)
memmove/3     2.835n ± 0%   2.593n ± 0%   -8.54% (n=50)
memmove/4     3.032n ± 0%   2.776n ± 0%   -8.45% (p=0.000 n=50)
memmove/5     2.833n ± 0%   2.600n ± 0%   -8.20% (p=0.000 n=50)
memmove/6     2.758n ± 0%   2.744n ± 0%   -0.52% (p=0.000 n=50)
memmove/7     2.762n ± 0%   2.744n ± 0%   -0.63% (p=0.000 n=50)
memmove/8     2.763n ± 0%   2.750n ± 0%   -0.46% (p=0.000 n=50)
memmove/9     3.182n ± 0%   3.269n ± 0%   +2.75% (p=0.000 n=50)
memmove/10    3.185n ± 0%   3.270n ± 0%   +2.64% (p=0.000 n=50)
memmove/11    3.188n ± 0%   3.277n ± 0%   +2.79% (p=0.000 n=50)
memmove/12    3.190n ± 0%   3.279n ± 0%   +2.82% (p=0.000 n=50)
memmove/13    3.194n ± 0%   3.281n ± 0%   +2.73% (p=0.000 n=50)
memmove/14    3.197n ± 0%   3.285n ± 0%   +2.77% (p=0.000 n=50)
memmove/15    3.198n ± 0%   3.282n ± 0%   +2.62% (p=0.000 n=50)
memmove/16    3.201n ± 0%   3.284n ± 0%   +2.61% (p=0.000 n=50)
memmove/17    3.564n ± 0%   3.320n ± 0%   -6.86% (p=0.000 n=50)
memmove/18    3.572n ± 0%   3.313n ± 0%   -7.25% (p=0.000 n=50)
memmove/19    3.572n ± 0%   3.325n ± 0%   -6.94% (p=0.000 n=50)
memmove/20    3.575n ± 0%   3.319n ± 0%   -7.15% (p=0.000 n=50)
memmove/21    3.578n ± 0%   3.327n ± 0%   -7.03% (p=0.000 n=50)
memmove/22    3.581n ± 0%   3.330n ± 0%   -7.01% (p=0.000 n=50)
memmove/23    3.582n ± 0%   3.354n ± 1%   -6.37% (p=0.000 n=50)
memmove/24    3.587n ± 0%   3.347n ± 1%   -6.71% (p=0.000 n=50)
memmove/25    3.591n ± 0%   3.320n ± 0%   -7.55% (p=0.000 n=50)
memmove/26    3.593n ± 0%   3.348n ± 0%   -6.82% (p=0.000 n=50)
memmove/27    3.596n ± 0%   3.346n ± 0%   -6.94% (p=0.000 n=50)
memmove/28    3.597n ± 0%   3.357n ± 0%   -6.67% (p=0.000 n=50)
memmove/29    3.601n ± 0%   3.340n ± 0%   -7.23% (p=0.000 n=50)
memmove/30    3.602n ± 0%   3.345n ± 0%   -7.12% (p=0.000 n=50)
memmove/31    3.608n ± 0%   3.357n ± 0%   -6.94% (p=0.000 n=50)
memmove/32    3.605n ± 0%   3.352n ± 0%   -7.01% (p=0.000 n=50)
memmove/33    4.128n ± 1%   3.829n ± 0%   -7.23% (p=0.000 n=50)
memmove/34    4.149n ± 0%   3.836n ± 0%   -7.54% (p=0.000 n=50)
memmove/35    4.134n ± 0%   3.839n ± 0%   -7.15% (n=50)
memmove/36    4.151n ± 0%   3.842n ± 0%   -7.45% (n=50)
memmove/37    4.152n ± 0%   3.841n ± 0%   -7.49% (p=0.000 n=50)
memmove/38    4.159n ± 0%   3.844n ± 0%   -7.58% (p=0.000 n=50)
memmove/39    4.165n ± 0%   3.841n ± 0%   -7.78% (p=0.000 n=50)
memmove/40    4.162n ± 0%   3.837n ± 0%   -7.81% (p=0.000 n=50)
memmove/41    4.161n ± 0%   3.845n ± 0%   -7.58% (p=0.000 n=50)
memmove/42    4.164n ± 0%   3.851n ± 0%   -7.53% (p=0.000 n=50)
memmove/43    4.165n ± 0%   3.843n ± 0%   -7.74% (p=0.000 n=50)
memmove/44    4.175n ± 0%   3.847n ± 0%   -7.83% (p=0.000 n=50)
memmove/45    4.170n ± 0%   3.849n ± 0%   -7.70% (p=0.000 n=50)
memmove/46    4.175n ± 0%   3.850n ± 0%   -7.79% (p=0.000 n=50)
memmove/47    4.180n ± 0%   3.851n ± 0%   -7.87% (p=0.000 n=50)
memmove/48    4.178n ± 0%   3.852n ± 0%   -7.81% (p=0.000 n=50)
memmove/49    4.175n ± 0%   3.851n ± 0%   -7.76% (n=50)
memmove/50    4.178n ± 0%   3.855n ± 0%   -7.73% (p=0.000 n=50)
memmove/51    4.190n ± 0%   3.859n ± 0%   -7.91% (p=0.000 n=50)
memmove/52    4.188n ± 0%   3.859n ± 0%   -7.84% (p=0.000 n=50)
memmove/53    4.191n ± 0%   3.863n ± 0%   -7.82% (p=0.000 n=50)
memmove/54    4.192n ± 0%   3.860n ± 0%   -7.91% (p=0.000 n=50)
memmove/55    4.192n ± 0%   3.869n ± 0%   -7.70% (p=0.000 n=50)
memmove/56    4.204n ± 0%   3.866n ± 0%   -8.05% (p=0.000 n=50)
memmove/57    4.198n ± 0%   3.864n ± 0%   -7.95% (p=0.000 n=50)
memmove/58    4.202n ± 0%   3.865n ± 0%   -8.02% (p=0.000 n=50)
memmove/59    4.208n ± 0%   3.868n ± 0%   -8.09% (p=0.000 n=50)
memmove/60    4.205n ± 0%   3.873n ± 0%   -7.89% (p=0.000 n=50)
memmove/61    4.212n ± 0%   3.872n ± 0%   -8.08% (p=0.000 n=50)
memmove/62    4.214n ± 0%   3.870n ± 0%   -8.16% (p=0.000 n=50)
memmove/63    4.215n ± 0%   3.877n ± 0%   -8.02% (p=0.000 n=50)
memmove/64    4.217n ± 0%   3.881n ± 0%   -7.99% (p=0.000 n=50)
memmove/65    4.990n ± 0%   4.683n ± 0%   -6.15% (p=0.000 n=50)
memmove/66    5.022n ± 0%   4.719n ± 0%   -6.03% (p=0.000 n=50)
memmove/67    5.030n ± 0%   4.725n ± 0%   -6.07% (p=0.000 n=50)
memmove/68    5.035n ± 0%   4.724n ± 0%   -6.18% (p=0.000 n=50)
memmove/69    5.030n ± 0%   4.725n ± 0%   -6.07% (p=0.000 n=50)
memmove/70    5.040n ± 0%   4.728n ± 0%   -6.19% (p=0.000 n=50)
memmove/71    5.053n ± 0%   4.728n ± 0%   -6.43% (p=0.000 n=50)
memmove/72    5.050n ± 0%   4.732n ± 0%   -6.29% (p=0.000 n=50)
memmove/73    5.049n ± 0%   4.733n ± 0%   -6.24% (p=0.000 n=50)
memmove/74    5.054n ± 0%   4.734n ± 0%   -6.34% (p=0.000 n=50)
memmove/75    5.063n ± 0%   4.736n ± 0%   -6.46% (p=0.000 n=50)
memmove/76    5.046n ± 0%   4.741n ± 0%   -6.04% (p=0.000 n=50)
memmove/77    5.057n ± 0%   4.741n ± 0%   -6.25% (p=0.000 n=50)
memmove/78    5.077n ± 0%   4.739n ± 0%   -6.65% (p=0.000 n=50)
memmove/79    5.074n ± 0%   4.746n ± 0%   -6.46% (p=0.000 n=50)
memmove/80    5.085n ± 0%   4.747n ± 0%   -6.65% (p=0.000 n=50)
memmove/81    5.077n ± 0%   4.735n ± 0%   -6.74% (p=0.000 n=50)
memmove/82    5.087n ± 0%   4.747n ± 0%   -6.68% (p=0.000 n=50)
memmove/83    5.087n ± 0%   4.754n ± 0%   -6.56% (p=0.000 n=50)
memmove/84    5.096n ± 0%   4.753n ± 0%   -6.73% (p=0.000 n=50)
memmove/85    5.082n ± 0%   4.749n ± 0%   -6.55% (p=0.000 n=50)
memmove/86    5.103n ± 0%   4.752n ± 0%   -6.87% (p=0.000 n=50)
memmove/87    5.096n ± 0%   4.760n ± 0%   -6.61% (p=0.000 n=50)
memmove/88    5.099n ± 0%   4.765n ± 0%   -6.55% (p=0.000 n=50)
memmove/89    5.104n ± 0%   4.757n ± 0%   -6.79% (p=0.000 n=50)
memmove/90    5.117n ± 0%   4.767n ± 0%   -6.84% (p=0.000 n=50)
memmove/91    5.100n ± 0%   4.766n ± 0%   -6.54% (p=0.000 n=50)
memmove/92    5.103n ± 0%   4.763n ± 0%   -6.67% (p=0.000 n=50)
memmove/93    5.115n ± 0%   4.772n ± 0%   -6.71% (p=0.000 n=50)
memmove/94    5.117n ± 0%   4.769n ± 0%   -6.80% (p=0.000 n=50)
memmove/95    5.131n ± 0%   4.775n ± 0%   -6.94% (p=0.000 n=50)
memmove/96    5.129n ± 0%   4.772n ± 0%   -6.97% (p=0.000 n=50)
memmove/97    5.130n ± 0%   4.764n ± 0%   -7.13% (p=0.000 n=50)
memmove/98    5.134n ± 0%   4.780n ± 0%   -6.89% (p=0.000 n=50)
memmove/99    5.141n ± 0%   4.780n ± 0%   -7.03% (p=0.000 n=50)
memmove/100   5.141n ± 0%   4.780n ± 0%   -7.02% (p=0.000 n=50)
memmove/101   5.150n ± 0%   4.782n ± 0%   -7.14% (p=0.000 n=50)
memmove/102   5.150n ± 0%   4.790n ± 0%   -6.99% (p=0.000 n=50)
memmove/103   5.156n ± 0%   4.788n ± 0%   -7.14% (n=50)
memmove/104   5.157n ± 0%   4.793n ± 0%   -7.05% (p=0.000 n=50)
memmove/105   5.147n ± 0%   4.791n ± 0%   -6.90% (p=0.000 n=50)
memmove/106   5.167n ± 0%   4.793n ± 0%   -7.23% (p=0.000 n=50)
memmove/107   5.165n ± 0%   4.801n ± 0%   -7.06% (p=0.000 n=50)
memmove/108   5.173n ± 0%   4.800n ± 0%   -7.21% (p=0.000 n=50)
memmove/109   5.173n ± 0%   4.797n ± 0%   -7.27% (p=0.000 n=50)
memmove/110   5.171n ± 0%   4.808n ± 0%   -7.01% (p=0.000 n=50)
memmove/111   5.180n ± 0%   4.799n ± 0%   -7.36% (p=0.000 n=50)
memmove/112   5.185n ± 0%   4.812n ± 0%   -7.19% (p=0.000 n=50)
memmove/113   5.187n ± 0%   4.797n ± 0%   -7.53% (p=0.000 n=50)
memmove/114   5.183n ± 0%   4.809n ± 0%   -7.21% (n=50)
memmove/115   5.193n ± 0%   4.811n ± 0%   -7.36% (p=0.000 n=50)
memmove/116   5.196n ± 0%   4.815n ± 0%   -7.32% (p=0.000 n=50)
memmove/117   5.199n ± 0%   4.816n ± 0%   -7.37% (p=0.000 n=50)
memmove/118   5.198n ± 0%   4.811n ± 0%   -7.45% (p=0.000 n=50)
memmove/119   5.203n ± 0%   4.818n ± 0%   -7.40% (p=0.000 n=50)
memmove/120   5.195n ± 0%   4.823n ± 0%   -7.16% (p=0.000 n=50)
memmove/121   5.203n ± 0%   4.812n ± 0%   -7.51% (p=0.000 n=50)
memmove/122   5.204n ± 0%   4.818n ± 0%   -7.42% (n=50)
memmove/123   5.202n ± 0%   4.822n ± 0%   -7.31% (p=0.000 n=50)
memmove/124   5.216n ± 0%   4.823n ± 0%   -7.54% (p=0.000 n=50)
memmove/125   5.227n ± 0%   4.823n ± 0%   -7.72% (p=0.000 n=50)
memmove/126   5.235n ± 0%   4.830n ± 0%   -7.74% (p=0.000 n=50)
memmove/127   5.237n ± 0%   4.833n ± 0%   -7.72% (p=0.000 n=50)
memmove/128   5.241n ± 0%   4.832n ± 0%   -7.81% (p=0.000 n=50)
memmove/129   6.460n ± 0%   5.858n ± 0%   -9.31% (p=0.000 n=50)
memmove/130   7.539n ± 0%   6.634n ± 0%  -12.00% (p=0.000 n=50)
memmove/131   7.542n ± 0%   6.623n ± 0%  -12.18% (p=0.000 n=50)
memmove/132   7.527n ± 0%   6.667n ± 1%  -11.43% (p=0.000 n=50)
memmove/133   7.521n ± 0%   6.631n ± 0%  -11.83% (p=0.000 n=50)
memmove/134   7.531n ± 0%   6.642n ± 0%  -11.81% (p=0.000 n=50)
memmove/135   7.541n ± 0%   6.692n ± 1%  -11.25% (p=0.000 n=50)
memmove/136   7.549n ± 0%   6.657n ± 0%  -11.81% (p=0.000 n=50)
memmove/137   7.544n ± 0%   6.646n ± 0%  -11.90% (p=0.000 n=50)
memmove/138   7.557n ± 0%   6.673n ± 1%  -11.70% (p=0.000 n=50)
memmove/139   7.545n ± 0%   6.654n ± 0%  -11.81% (n=50)
memmove/140   7.559n ± 0%   6.680n ± 1%  -11.63% (p=0.000 n=50)
memmove/141   7.560n ± 0%   6.664n ± 0%  -11.85% (p=0.000 n=50)
memmove/142   7.556n ± 0%   6.679n ± 0%  -11.62% (p=0.000 n=50)
memmove/143   7.570n ± 0%   6.683n ± 1%  -11.71% (p=0.000 n=50)
memmove/144   7.586n ± 0%   6.683n ± 0%  -11.91% (p=0.000 n=50)
memmove/145   7.593n ± 0%   6.665n ± 0%  -12.22% (p=0.000 n=50)
memmove/146   7.591n ± 0%   6.665n ± 0%  -12.20% (p=0.000 n=50)
memmove/147   7.598n ± 0%   6.665n ± 0%  -12.27% (p=0.000 n=50)
memmove/148   7.598n ± 0%   6.670n ± 0%  -12.21% (p=0.000 n=50)
memmove/149   7.593n ± 0%   6.691n ± 0%  -11.88% (p=0.000 n=50)
memmove/150   7.625n ± 0%   6.713n ± 1%  -11.97% (p=0.000 n=50)
memmove/151   7.603n ± 0%   6.710n ± 1%  -11.74% (p=0.000 n=50)
memmove/152   7.613n ± 0%   6.701n ± 1%  -11.97% (p=0.000 n=50)
memmove/153   7.595n ± 0%   6.710n ± 0%  -11.65% (p=0.000 n=50)
memmove/154   7.614n ± 0%   6.721n ± 0%  -11.74% (p=0.000 n=50)
memmove/155   7.615n ± 0%   6.709n ± 0%  -11.89% (p=0.000 n=50)
memmove/156   7.613n ± 0%   6.693n ± 0%  -12.08% (p=0.000 n=50)
memmove/157   7.628n ± 0%   6.708n ± 0%  -12.05% (p=0.000 n=50)
memmove/158   7.629n ± 0%   6.706n ± 0%  -12.10% (p=0.000 n=50)
memmove/159   7.639n ± 0%   6.724n ± 0%  -11.98% (p=0.000 n=50)
memmove/160   7.619n ± 0%   6.702n ± 0%  -12.04% (p=0.000 n=50)
memmove/161   7.653n ± 0%   6.698n ± 0%  -12.49% (p=0.000 n=50)
memmove/162   8.104n ± 0%   7.140n ± 1%  -11.89% (p=0.000 n=50)
memmove/163   8.141n ± 0%   7.187n ± 1%  -11.72% (p=0.000 n=50)
memmove/164   8.154n ± 0%   7.107n ± 0%  -12.84% (p=0.000 n=50)
memmove/165   8.143n ± 0%   7.117n ± 0%  -12.59% (p=0.000 n=50)
memmove/166   8.176n ± 0%   7.110n ± 0%  -13.04% (p=0.000 n=50)
memmove/167   8.194n ± 0%   7.168n ± 1%  -12.52% (p=0.000 n=50)
memmove/168   8.214n ± 0%   7.188n ± 1%  -12.50% (p=0.000 n=50)
memmove/169   8.220n ± 0%   7.242n ± 1%  -11.90% (p=0.000 n=50)
memmove/170   8.228n ± 0%   7.244n ± 1%  -11.96% (p=0.000 n=50)
memmove/171   8.263n ± 0%   7.184n ± 0%  -13.06% (p=0.000 n=50)
memmove/172   8.259n ± 0%   7.325n ± 1%  -11.31% (p=0.000 n=50)
memmove/173   8.271n ± 0%   7.225n ± 0%  -12.65% (p=0.000 n=50)
memmove/174   8.284n ± 0%   7.287n ± 1%  -12.04% (p=0.000 n=50)
memmove/175   8.289n ± 0%   7.282n ± 1%  -12.15% (p=0.000 n=50)
memmove/176   8.309n ± 0%   7.328n ± 1%  -11.81% (p=0.000 n=50)
memmove/177   8.317n ± 0%   7.264n ± 1%  -12.67% (p=0.000 n=50)
memmove/178   8.302n ± 0%   7.342n ± 1%  -11.57% (p=0.000 n=50)
memmove/179   8.309n ± 0%   7.357n ± 1%  -11.45% (p=0.000 n=50)
memmove/180   8.304n ± 0%   7.318n ± 1%  -11.87% (p=0.000 n=50)
memmove/181   8.312n ± 0%   7.363n ± 1%  -11.42% (p=0.000 n=50)
memmove/182   8.315n ± 0%   7.320n ± 1%  -11.96% (p=0.000 n=50)
memmove/183   8.330n ± 0%   7.286n ± 1%  -12.53% (p=0.000 n=50)
memmove/184   8.310n ± 0%   7.324n ± 1%  -11.86% (p=0.000 n=50)
memmove/185   8.303n ± 0%   7.267n ± 1%  -12.47% (p=0.000 n=50)
memmove/186   8.287n ± 0%   7.312n ± 1%  -11.76% (p=0.000 n=50)
memmove/187   8.298n ± 0%   7.395n ± 2%  -10.88% (p=0.000 n=50)
memmove/188   8.296n ± 0%   7.339n ± 1%  -11.54% (p=0.000 n=50)
memmove/189   8.306n ± 0%   7.299n ± 1%  -12.12% (p=0.000 n=50)
memmove/190   8.281n ± 0%   7.309n ± 1%  -11.74% (p=0.000 n=50)
memmove/191   8.299n ± 0%   7.282n ± 1%  -12.26% (p=0.000 n=50)
memmove/192   8.281n ± 0%   7.335n ± 1%  -11.41% (p=0.000 n=50)
memmove/193   8.299n ± 0%   7.325n ± 1%  -11.74% (p=0.000 n=50)
memmove/194   8.641n ± 0%   8.034n ± 0%   -7.02% (p=0.000 n=50)
memmove/195   8.667n ± 0%   8.073n ± 0%   -6.85% (p=0.000 n=50)
memmove/196   8.666n ± 0%   8.030n ± 0%   -7.34% (p=0.000 n=50)
memmove/197   8.660n ± 0%   8.096n ± 1%   -6.51% (p=0.000 n=50)
memmove/198   8.688n ± 0%   8.047n ± 0%   -7.39% (p=0.000 n=50)
memmove/199   8.678n ± 0%   8.061n ± 0%   -7.11% (p=0.000 n=50)
memmove/200   8.669n ± 0%   8.034n ± 0%   -7.32% (p=0.000 n=50)
memmove/201   8.692n ± 0%   8.061n ± 0%   -7.26% (p=0.000 n=50)
memmove/202   8.668n ± 0%   8.060n ± 0%   -7.02% (p=0.000 n=50)
memmove/203   8.687n ± 0%   8.066n ± 0%   -7.15% (p=0.000 n=50)
memmove/204   8.699n ± 0%   8.076n ± 0%   -7.16% (p=0.000 n=50)
memmove/205   8.676n ± 0%   8.085n ± 0%   -6.82% (p=0.000 n=50)
memmove/206   8.684n ± 0%   8.101n ± 1%   -6.71% (p=0.000 n=50)
memmove/207   8.725n ± 0%   8.099n ± 0%   -7.18% (p=0.000 n=50)
memmove/208   8.674n ± 0%   8.073n ± 0%   -6.92% (p=0.000 n=50)
memmove/209   8.697n ± 0%   8.088n ± 0%   -7.01% (p=0.000 n=50)
memmove/210   8.733n ± 0%   8.076n ± 0%   -7.53% (p=0.000 n=50)
memmove/211   8.732n ± 0%   8.104n ± 0%   -7.19% (p=0.000 n=50)
memmove/212   8.730n ± 0%   8.091n ± 0%   -7.32% (p=0.000 n=50)
memmove/213   8.728n ± 0%   8.100n ± 0%   -7.19% (p=0.000 n=50)
memmove/214   8.744n ± 1%   8.081n ± 1%   -7.57% (p=0.000 n=50)
memmove/215   8.734n ± 0%   8.150n ± 0%   -6.68% (p=0.000 n=50)
memmove/216   8.748n ± 0%   8.116n ± 0%   -7.23% (p=0.000 n=50)
memmove/217   8.751n ± 0%   8.129n ± 1%   -7.11% (p=0.000 n=50)
memmove/218   8.747n ± 0%   8.114n ± 0%   -7.23% (p=0.000 n=50)
memmove/219   8.733n ± 0%   8.159n ± 0%   -6.57% (p=0.000 n=50)
memmove/220   8.764n ± 0%   8.145n ± 0%   -7.06% (p=0.000 n=50)
memmove/221   8.764n ± 0%   8.142n ± 0%   -7.10% (p=0.000 n=50)
memmove/222   8.775n ± 0%   8.152n ± 0%   -7.10% (p=0.000 n=50)
memmove/223   8.771n ± 0%   8.143n ± 0%   -7.16% (p=0.000 n=50)
memmove/224   8.778n ± 0%   8.175n ± 1%   -6.87% (p=0.000 n=50)
memmove/225   8.794n ± 0%   8.138n ± 0%   -7.45% (p=0.000 n=50)
memmove/226   10.13n ± 0%   10.06n ± 0%   -0.71% (p=0.000 n=50)
memmove/227   10.14n ± 0%   10.08n ± 0%   -0.53% (p=0.000 n=50)
memmove/228   10.13n ± 0%   10.08n ± 0%   -0.56% (p=0.000 n=50)
memmove/229   10.17n ± 0%   10.11n ± 0%   -0.56% (p=0.000 n=50)
memmove/230   10.17n ± 0%   10.13n ± 0%   -0.38% (p=0.003 n=50)
memmove/231   10.16n ± 0%   10.12n ± 0%   -0.41% (p=0.001 n=50)
memmove/232   10.19n ± 0%   10.12n ± 0%   -0.67% (p=0.000 n=50)
memmove/233   10.21n ± 0%   10.14n ± 0%   -0.71% (p=0.000 n=50)
memmove/234   10.24n ± 0%   10.16n ± 0%   -0.79% (p=0.000 n=50)
memmove/235   10.24n ± 0%   10.16n ± 0%   -0.76% (p=0.000 n=50)
memmove/236   10.25n ± 0%   10.16n ± 0%   -0.81% (p=0.000 n=50)
memmove/237   10.24n ± 0%   10.17n ± 0%   -0.69% (p=0.000 n=50)
memmove/238   10.27n ± 0%   10.19n ± 0%   -0.79% (p=0.000 n=50)
memmove/239   10.29n ± 0%   10.19n ± 0%   -0.90% (p=0.000 n=50)
memmove/240   10.30n ± 0%   10.20n ± 0%   -0.95% (p=0.000 n=50)
memmove/241   10.29n ± 0%   10.20n ± 0%   -0.91% (p=0.000 n=50)
memmove/242   10.30n ± 0%   10.22n ± 0%   -0.80% (p=0.000 n=50)
memmove/243   10.32n ± 0%   10.23n ± 0%   -0.87% (p=0.000 n=50)
memmove/244   10.32n ± 0%   10.24n ± 0%   -0.74% (p=0.000 n=50)
memmove/245   10.33n ± 0%   10.23n ± 0%   -0.97% (p=0.000 n=50)
memmove/246   10.33n ± 0%   10.24n ± 0%   -0.92% (p=0.000 n=50)
memmove/247   10.31n ± 0%   10.24n ± 0%   -0.69% (p=0.000 n=50)
memmove/248   10.32n ± 0%   10.26n ± 0%   -0.55% (p=0.000 n=50)
memmove/249   10.33n ± 0%   10.28n ± 0%   -0.52% (p=0.000 n=50)
memmove/250   10.34n ± 0%   10.27n ± 0%   -0.66% (p=0.000 n=50)
memmove/251   10.32n ± 0%   10.27n ± 0%   -0.45% (p=0.000 n=50)
memmove/252   10.34n ± 0%   10.30n ± 0%   -0.39% (p=0.005 n=50)
memmove/253   10.33n ± 0%   10.27n ± 0%   -0.57% (p=0.000 n=50)
memmove/254   10.33n ± 0%   10.27n ± 0%   -0.54% (p=0.000 n=50)
memmove/255   10.34n ± 0%   10.29n ± 0%   -0.50% (p=0.002 n=50)
memmove/256   10.36n ± 0%   10.31n ± 0%   -0.44% (p=0.006 n=50)
memmove/257   10.33n ± 0%   10.29n ± 0%   -0.36% (p=0.004 n=50)
geomean       6.142n        5.696n        -7.26%
```
2023-10-24 16:05:27 +02:00
Joseph Huber
25bf1ae99b
[libc] Enable remaining string functions on the GPU (#68346)
Summary:
We previously had to disable these string functions because they were
not compatible with the definitions coming from the GNU / host
environment. The GPU, when exporting its declarations, has a very
difficult requirement that it be compatible with the host environment as
both sides of the compilation need to agree on definitions and what's
present.

This patch more or less gives up and just copies the definitions as
expected by `glibc` if they are provided that way; otherwise we fall
back to the accepted way. This is the alternative solution to an
existing PR which instead disables GCC's handling.
2023-10-23 13:16:20 -04:00
Hans Wennborg
e2fc68c3db Typos: 'maxium', 'minium' 2023-10-23 10:42:28 +02:00
Anton Rydahl
e774482c4c
Fixed typo in GPU libm device library warning (#69752)
Correcting a small typo in the error message when the CUDA device libraries are not detected.
2023-10-20 12:17:26 -07:00
lntue
6d53fdeab4
[libc][NFC] Attempt to deflake gettimeofday_test. (#69719)
Only check if gettimeofday call succeeds.
2023-10-20 11:08:01 -04:00
lntue
ec10c36b07
[libc][NFC] Forcing data type in gettimeofday_test when comparing the diff. (#69652) 2023-10-19 19:49:59 -04:00
Joseph Huber
630037ede4
[libc] Partially implement 'rand' for the GPU (#66167)
Summary:
This patch partially implements the `rand` function on the GPU. This is
partial because the GPU currently doesn't support thread local storage
or static initializers. To implement this on the GPU, I use 1/8th of the
local / shared memory quota to treat the shared memory as thread local
storage. This is done by simply allocating enough storage for each
thread in the block and indexing into this based off of the thread id.
The downside to this is that it does not default the `srand` seed to
`1` as the standard says; it is also wasteful. In the future we
should figure out a way to support TLS on the GPU so that this can be
completely common and less resource intensive.
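A minimal sketch of the per-thread state idea, assuming a fixed maximum block size and passing the thread id explicitly (on the GPU it would come from the thread index). The names `kMaxThreadsPerBlock`, `srand_per_thread`, and `rand_per_thread` are hypothetical, the static array stands in for the reserved slice of shared memory, and the generator is the example LCG from the C standard rather than necessarily the one the patch uses.

```cpp
#include <cstdint>

// Hypothetical upper bound on threads per block; the patch instead reserves
// a fraction of the shared memory quota.
constexpr int kMaxThreadsPerBlock = 1024;

// Stand-in for the shared memory region: one generator state per thread.
// Zero-initialized, so a thread that never calls srand does not start from
// the standard's implicit seed of 1 (the caveat noted in the commit message).
static std::uint32_t rand_state[kMaxThreadsPerBlock];

// Seed only the calling thread's slot.
void srand_per_thread(int thread_id, unsigned seed) {
  rand_state[thread_id] = seed;
}

// Advance only the calling thread's slot using the C standard's example LCG.
// Arithmetic wraps modulo 2^32 because the state is unsigned 32-bit.
int rand_per_thread(int thread_id) {
  std::uint32_t &next = rand_state[thread_id];
  next = next * 1103515245u + 12345u;
  return static_cast<int>((next / 65536u) % 32768u);
}
```

Because each thread indexes its own slot, one thread's calls never perturb another thread's sequence; the cost is one state word per possible thread whether or not it ever calls `rand`.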
2023-10-19 17:01:43 -04:00
Joseph Huber
a39215768b
[libc] Rework the 'fgets' implementation on the GPU (#69635)
Summary:
The `fgets` function as implemented is not functional currently when
called with multiple threads. This is because we rely on repeatedly
polling the character to detect EOF. This doesn't work when there are
multiple threads that may wish to poll the characters. This patch pulls
out the logic into a standalone RPC call to handle this in a single
operation such that calling it from multiple threads functions as
expected. It also makes it faster because we no longer make N RPC
calls for N characters.
2023-10-19 17:00:01 -04:00
Anton Rydahl
c73ad025b1
[libc][libm][GPU] Add missing vendor entrypoints to the GPU version of libm (#66034)
This patch populates the GPU version of `libm` with missing vendor entrypoints. The vendor math entrypoints are disabled by default but can be enabled with the CMake option `LIBC_GPU_VENDOR_MATH=ON`.
2023-10-19 12:24:50 -07:00
Alfred Persson Forsberg
67770cbb98
[libc][NFC] Fix features.h.def file header 2023-10-19 20:00:26 +02:00
alfredfo
f350532099
[libc] Fix accidental LIBC_NAMESPACE_clock_freq (#69620)
See-also: https://github.com/llvm/llvm-project/pull/69548
2023-10-19 19:39:02 +02:00
lntue
3fd5113cba
[libc][math][NFC] Remove global scope constants declaration in math tests (#69558)
Clean up usage of `DECLARE_SPECIAL_CONSTANTS` in global scope.
2023-10-19 10:30:11 -04:00
alfredfo
d404130134
[libc] Fix accidental LIBC_NAMESPACE_syscall definition (#69548)
Building helloworld.c currently errors with "undefined symbol:
__llvm_libc_syscall"

See: https://github.com/llvm/llvm-project/pull/67032
2023-10-19 11:22:16 +02:00
alfredfo
74b0465fe9
[libc] Add simple features.h with implementation macro (#69402)
In the future this should probably be autogenerated so it defines
library version.

See: Discussion in #libc
https://discord.com/channels/636084430946959380/636732994891284500/1163979080979460176
2023-10-19 04:08:13 +02:00
Joseph Huber
ddc30ff802
[libc] Implement the 'ungetc' function on the GPU (#69248)
Summary:
This function closely follows the pattern of all the other
functions. That is, making a new opcode and forwarding the call to the
host. However, this also required modifying the test somewhat. It seems
that not all `libc` implementations follow the same error rules as are
tested here, and it is not explicit in the standard, so we simply
disable these EOF checks when targeting the GPU.
2023-10-17 13:02:31 -05:00
michaelrj-google
8a47ad4b67
[libc] Add simple long double to printf float fuzz (#68449)
Recent testing has uncovered some hard-to-find bugs in printf's long
double support. This patch adds an extra long double path to the fuzzer
with minimal extra effort. While a more thorough long double fuzzer
would be useful, it would need to handle the non-standard cases of 80
bit long doubles such as unnormal and pseudo-denormal numbers. For that
reason, a standalone long double fuzzer is left for future development.
2023-10-16 13:32:34 -07:00
Samira Bazuzi
b5c2fa14ea
[libc] Mark operator== const to avoid ambiguity in C++20. (#68805)
C++20 will automatically generate an operator== with reversed operand
order, which is ambiguous with the written operator== when one argument
is marked const and the other isn't.

This operator currently triggers -Wambiguous-reversed-operator at usage
site libc/test/UnitTest/PrintfMatcher.cpp:28.
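A small illustration of the fix, using a hypothetical type (not the actual type compared in PrintfMatcher.cpp). With a non-const member `operator==`, C++20 also considers the synthesized reversed candidate, and for mixed const and non-const operands neither candidate is strictly better. Marking the member `const` treats both operands alike and removes the ambiguity.

```cpp
// Hypothetical matcher-like type used only to show the const fix.
struct FormatSection {
  int conv_name = 0;
  int min_width = 0;

  // Without the trailing const, i.e.
  //   bool operator==(const FormatSection &other);
  // the reversed candidate synthesized by C++20 competes with this one and
  // clang reports -Wambiguous-reversed-operator at some call sites.
  bool operator==(const FormatSection &other) const {
    return conv_name == other.conv_name && min_width == other.min_width;
  }
};
```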
2023-10-11 23:59:13 -04:00
Joseph Huber
9bcf9dc98a [libc] Fix missing warp sync for the NVPTX assert
Summary:
The implementation of `assert` has an if statement so that only the
first thread in the warp prints the assertion. On modern NVPTX
architecture, this can be printed out of order with the abort call. This
would lead to only a portion of the message being printed and then
exiting the program. By adding a mandatory warp sync we force the full
string to be printed before we continue to the abort.
2023-10-10 12:50:37 -05:00
Joseph Huber
fa23a2396b
[libc] Fix linking of AMDGPU device runtime control constants for math (#65676)
Summary:
Currently, `libc` temporarily provides math by linking against existing
vendor implementations. To use the AMDGPU DeviceRTL we need to define a
handful of control constants that alter behaviour for architecture
specific things. Previously these were marked `extern const` because
they must be present when we link-in the vendor bitcode library.
However, this causes linker errors if more than one math function was
used.

This patch fixes the issue by marking these constants as used and inline
on top of being external. This means that they are linkable, but it
gives us `linkonce_odr` semantics. The downside is that these globals
won't be optimized out, but it allows us to perform constant propagation
on them unlike using `weak`.
2023-10-06 21:50:35 -05:00
Joseph Huber
4cb6c1c7cb
[libc] Enable missing memory tests on the GPU (#68111)
Summary:
There were a few tests that weren't enabled on the GPU. This is because
the logic caused them to be skipped as we don't use CPU features on the
host. This also disables the logic making multiple versions of the
memory functions.
2023-10-06 08:27:36 -05:00
tnv01
28245b4ecb
[libc] Add x86-64 stack protector support. 2023-10-04 14:18:23 -07:00
michaelrj-google
bfcfc2a6d4
[libc] Fix typo in long double negative block (#68243)
The long double version of float to string's get_negative_block had a
bug in table mode. In table mode, one of the tables is named
"MIN_BLOCK_2" and it stores the number of blocks that are all zeroes
before the digits start for a given index. The check for long doubles
was incorrectly "block_index <= MIN_BLOCK_2[idx]" when it should be
"block_index < MIN_BLOCK_2[idx]" (without the equal sign). This bug
caused an off-by-one error for some long double values. This patch fixes
the bug and adds tests to ensure it doesn't regress.
2023-10-04 13:00:48 -07:00
Mikhail R. Gadelha
714b4c82bb
[libc][NFC] Fix -Wdangling-else when compiling libc with gcc >= 7 (#67833)
Explicit braces were added to fix the "suggest explicit braces to avoid
ambiguous ‘else’" warning since the current solution (switch (0) case 0:
default:) doesn't work since gcc 7 (see
https://github.com/google/googletest/issues/1119)

gcc 13 generates about 5000 of these warnings when building libc without
this patch.
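A hedged sketch of the pattern involved, with a hypothetical macro and failure counter (not libc's actual test macros). The `switch (0) case 0: default:` prefix is the googletest-style "ambiguous else blocker" named above; it was meant to silence gcc's "suggest explicit braces" warning when such a macro forms the body of an unbraced `if`, but from gcc 7 onward the warning fires anyway, so the fix is explicit braces at the use sites.

```cpp
// Hypothetical failure counter standing in for the test framework's reporting.
static int failures = 0;
inline void record_failure() { ++failures; }

// Hypothetical assertion macro using the "ambiguous else blocker" prefix.
// Expands to an if/else, so an unbraced outer `if` around it is what gcc
// warns about with -Wdangling-else.
#define EXPECT_TRUE_SKETCH(cond)                                               \
  switch (0)                                                                   \
  case 0:                                                                      \
  default:                                                                     \
    if (cond)                                                                  \
      ;                                                                        \
    else                                                                       \
      record_failure()

// Braced use sites: unambiguous to both the compiler and the reader.
void check(bool enabled, int value) {
  if (enabled) {
    EXPECT_TRUE_SKETCH(value > 0);
  } else {
    EXPECT_TRUE_SKETCH(value == 0);
  }
}
```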
2023-10-04 11:44:42 -04:00
Joseph Huber
452fa6b86d
[libc] Change the GPU to use builtin memory functions (#68003)
Summary:
The GPU build is special in the sense that we always know that an
up-to-date `clang` will be the compiler. This allows us to
rely directly on builtins, which allow us to push a lot of this
complexity into the backend. Backend implementations are favored on
the GPU because they allow us to do a lot more target specific
optimizations. This patch changes over the common memory functions to
use builtin versions when building for AMDGPU or NVPTX.
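A minimal sketch of forwarding to the builtins, assuming a GCC- or Clang-compatible compiler; the helper names are hypothetical. On AMDGPU and NVPTX the backend is expected to expand these calls inline. On a CPU target the compiler may lower them to a call to the library `memcpy` / `memset`, which is one reason this shortcut suits the GPU build rather than a CPU libc's own `memcpy`.

```cpp
#include <cstddef>

// Hypothetical helpers: hand the whole operation to the compiler and let
// the target backend choose the lowering.
inline void inline_memcpy_sketch(void *dst, const void *src, std::size_t n) {
  __builtin_memcpy(dst, src, n);
}

inline void inline_memset_sketch(void *dst, int value, std::size_t n) {
  __builtin_memset(dst, value, n);
}
```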
2023-10-04 07:02:55 -05:00
Mikhail R. Gadelha
824b1677a4
[libc][NFC] Fix missing field 'tm_isdst' initializer warning (#67837)
This patch fixes several warnings thrown by clang about an uninitialized
member of struct tm, tm_isdst.

Weirdly, gcc doesn't complain about it, probably because this member is
never read in the tests.
2023-10-02 19:32:55 -04:00
Mikhail R. Gadelha
8fc87f54a8
[libc][NFC] Couple of small warning fixes (#67847)
This patch fixes a couple of warnings when compiling with gcc 13:

* CPP/type_traits_test.cpp: 'apply' overrides a member function but is
not marked 'override'
* UnitTest/LibcTest.cpp:98: control reaches end of non-void function
* MPFRWrapper/MPFRUtils.cpp:75: control reaches end of non-void function
* smoke/FrexpTest.h:92: backslash-newline at end of file
* __support/float_to_string.h:118: comparison of unsigned expression in ‘>= 0’ is always true
* test/src/__support/CPP/bitset_test.cpp:197: comparison of unsigned expression in ‘>= 0’ is always true

---------

Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
2023-10-02 19:29:26 -04:00
Joseph Huber
f88f090a2e
[libc] Correct 'memrchr' definition and re-enable on GPU (#67850)
Summary:
This was disabled on the GPU because it conflicted with the definition
in `glibc`. According to information online and in the `glibc`
implementation, the first argument should be a `const void *`. Fixing
this resolves the problem when exporting this to offloading languages.
2023-09-29 18:22:00 -05:00
Joseph Huber
e0b702ffc2
[libc] Fix nanosleep definition in the posix spec (#67855)
Summary:
The POSIX standard expects the first argument to this function to be
constant, e.g. https://man7.org/linux/man-pages/man2/nanosleep.2.html.
This fixes that problem and also corrects an obvious problem with
enabling this for offloading.
2023-09-29 17:35:10 -05:00
Joseph Huber
ce38cbb13b [libc][NFC] Adjust the libc init / fini array test
Summary:
The NVPTX backend is picky about the definitions of functions. Because
we call these functions with these arguments it can cause some problems
when it goes through the backend. This was observed in a different test
for `printf` that hasn't been landed yet. Also adjust the priority.
2023-09-29 13:22:02 -05:00
Joseph Huber
22ebf1e9b7 [libc][Obvious] Do not pass 'nolibc' and other flags to the GPU build
Summary:
Previously this code was applied to the integration tests but did not
copy the logic that stopped this from being passed to the GPU build.
Copy the full line to avoid the warnings and prevent any libraries from
being included.
2023-09-29 12:57:02 -05:00
Mikhail R. Gadelha
dbceb1d936
[libc] Fix unused variable in fputc test (#67830)
This is probably a copy-and-paste error and the variable 'more' was left
unused.
2023-09-29 12:31:40 -04:00
lntue
da28593d71
[libc][math] Implement double precision expm1 function correctly rounded for all rounding modes. (#67048)
Implementing expm1 function for double precision based on exp function
algorithm:

- Reduced x = ln(2) * (hi + mid1 + mid2) + lo, where:
  * hi is an integer
  * mid1 * 2^6 is an integer
  * mid2 * 2^12 is an integer
  * |lo| < 2^-13 + 2^-30
- Then exp(x) - 1 = 2^hi * 2^mid1 * 2^mid2 * exp(lo) - 1 ~ 2^hi *
(2^mid1 * 2^mid2 * (1 + lo * P(lo)) - 2^(-hi) )
- We evaluate the fast pass, where P(lo) is a degree-3 Taylor polynomial of
(e^lo - 1) / lo in double precision
- If the Ziv accuracy test fails, we use a degree-6 Taylor polynomial of
(e^lo - 1) / lo in double-double precision
- If the Ziv accuracy test still fails, we re-evaluate everything in
128-bit precision.
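A rough sketch of the fast pass only, assuming round-to-nearest mode and |x| below about 700 (very negative inputs overflow the 2^(-hi) term). It is not correctly rounded: there is no Ziv test and no double-double or 128-bit fallback, the 2^mid1 and 2^mid2 tables are replaced by `std::exp2` calls, and ln(2) is a single double instead of a multi-part constant. The function name is hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Fast-pass sketch of expm1 following the reduction x = ln(2) * (hi + mid1 + mid2) + lo.
double expm1_fast_path_sketch(double x) {
  constexpr double kLn2 = 0.6931471805599453;  // ln(2) rounded to double
  // k = round(x * 2^12 / ln(2)), so x ~ k * ln(2) / 2^12.
  const double kd = std::nearbyint(x * 4096.0 / kLn2);
  const std::int64_t k = static_cast<std::int64_t>(kd);
  // hi = floor(k / 2^12); rem holds the 12 fractional bits, 0 <= rem < 4096.
  const std::int64_t hi = (k >= 0) ? k / 4096 : -((-k + 4095) / 4096);
  const std::int64_t rem = k - hi * 4096;
  const int mid1_idx = static_cast<int>(rem >> 6);  // mid1 = mid1_idx * 2^-6
  const int mid2_idx = static_cast<int>(rem & 63);  // mid2 = mid2_idx * 2^-12
  // lo = x - k * ln(2) / 2^12, so |lo| <= ln(2) / 2^13 < 2^-13.
  const double lo = x - kd * (kLn2 / 4096.0);
  // P(lo): degree-3 Taylor polynomial of (e^lo - 1) / lo.
  const double p = 1.0 + lo * (0.5 + lo * (1.0 / 6.0 + lo * (1.0 / 24.0)));
  // M = 2^mid1 * 2^mid2 (tables in a real implementation).
  const double m = std::exp2(mid1_idx / 64.0) * std::exp2(mid2_idx / 4096.0);
  // expm1(x) ~ 2^hi * ((M - 2^-hi) + M * lo * P(lo)); grouping M - 2^-hi first
  // keeps the result exact-ish near 0, where M - 1 is exactly 0.
  const double inner = (m - std::ldexp(1.0, static_cast<int>(-hi))) + m * lo * p;
  return std::ldexp(inner, static_cast<int>(hi));
}
```

Grouping `M - 2^-hi` before adding `M * lo * P(lo)` avoids computing `1 + lo * P(lo)` and then subtracting 1, which would throw away most of the precision for tiny x.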
2023-09-28 16:43:15 -04:00
Joseph Huber
cc2445589d [libc] Fix wrapper headers for some ctype macros and C++ decls
Summary:
These wrapper headers need to work around things in the standard
headers. The existing workarounds didn't correctly handle the macros for
`isascii` and `toascii`. Additionally, `memrchr` can't be used because
it has a different declaration for C++ mode. Fix this so it can be
compiled.
2023-09-28 10:00:34 -05:00
Joseph Huber
1a5d3b6cda
[libc] Scan the ports more fairly in the RPC server (#66680)
Summary:
Currently, we use the RPC server to respond to different ports which
each contain a request from some client thread wishing to do work on the
server. This scan starts at zero and continues until it has checked all
ports at which point it resets. If we find an active port, we service it
and then restart the search.

This is bad for two reasons. First, it means that we will always bias
the lower ports. If a thread grabs a high port it will be stuck for a
very long time until all the other work is done. Second, it means that
the `handle_server` function can technically run indefinitely as long as
the client is always pushing new work. Because the OpenMP implementation
uses the user thread to service the kernel, this means that it could be
stalled by another asynchronous device's kernels.

This patch addresses this by making the server restart at the next port
over. This means we will always do a full scan of the ports before
quitting.
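A simplified model of the resume-at-next-port scan, assuming a plain vector of "pending" flags in place of real ports, locks, and opcodes. `PortScanner` and `handle_one` are hypothetical names; this sketches the fairness idea, not the actual `handle_server` code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: pending[i] is true while port i holds a client request.
struct PortScanner {
  std::vector<bool> pending;
  std::size_t start = 0;           // port where the next scan begins
  std::size_t serviced_count = 0;  // total requests handled so far

  // Checks each port at most once, beginning at `start`. Services the first
  // pending port and remembers to begin the next scan at the port after it,
  // so a client that keeps re-filling a low port cannot starve higher ports.
  // Returns false once a full pass finds no work.
  bool handle_one() {
    const std::size_t n = pending.size();
    for (std::size_t step = 0; step < n; ++step) {
      const std::size_t port = (start + step) % n;
      if (pending[port]) {
        pending[port] = false;  // "service" the request
        ++serviced_count;
        start = (port + 1) % n;
        return true;
      }
    }
    return false;
  }
};
```

With a scan that always restarts at port 0, re-filling port 1 after each service would keep port 3 waiting; resuming at the next port guarantees every port is visited within one pass.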
2023-09-26 16:09:48 -05:00
Joseph Huber
6273b6d9dc
[libc] Change RPC opcode enum definition (#67439)
Summary:
This enum previously manually specified the value. This just made it
unnecessarily difficult to add new ones without changing everything.
This patch also makes it compatible with C by removing the `:`
annotation and instead using the `LAST` method.
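A sketch of what a C-compatible opcode enum can look like, assuming the "`LAST` method" means a trailing sentinel with a large value. The opcode names and the 0xFFFF sentinel are illustrative assumptions, not a copy of the libc header. Only the first value is explicit, so adding an opcode is a one-line change, and the sentinel forces the enum's range to cover 16 bits without the C++-only `enum : uint16_t` syntax (which C before C23 does not accept).

```cpp
// Hypothetical opcode list; the body is also valid C.
typedef enum {
  RPC_NOOP = 0,
  RPC_EXIT,              // = 1
  RPC_WRITE_TO_STDOUT,   // = 2
  RPC_WRITE_TO_STDERR,   // = 3
  RPC_LAST = 0xFFFF,     // sentinel: keeps the range at least 16 bits wide
} rpc_opcode_t;
```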
2023-09-26 15:24:28 -05:00
Joseph Huber
2b7227db1e [libc] Fix RPC server global after mass replace of __llvm_libc
Summary:
This variable needs a reserved name starting with `__`. It was
mistakenly changed with a mass replace. It happened to work because the
tests still picked up the associated symbol, but it just became a bad
name because it's not reserved anymore.
2023-09-26 14:28:48 -05:00
Siva Chandra
f2c9fe452f
[libc][NFC] Fix delete operator linkage names after switch to LIBC_NAMESPACE. (#67475)
The name __llvm_libc was mass-replaced with LIBC_NAMESPACE which ended
up changing the "__llvm_libc" prefix of the delete operator linkage names to
"LIBC_NAMESPACE". This change corrects it by changing the namespace prefix
to "__llvm_libc_<version info>".
2023-09-26 11:53:14 -07:00
Siva Chandra
425defd810
[libc][Obvious] Remove the previous ErrnoSetterMatcher target. (#67469)
A target still depending on the old target has been updated.
2023-09-26 11:01:21 -07:00
Siva Chandra
3bfd6a7521
[libc][NFC] Add compile options only to the header libraries which use them. (#67447)
Other libraries dependent on these libraries will automatically inherit
those compile options. This change in particular affects the compile
option "-DLIBC_COPT_STDIO_USE_SYSTEM_FILE".
2023-09-26 09:20:00 -07:00
Mikhail R. Gadelha
e3087c4b8c [libc] Start to refactor riscv platform abstraction to support both 32 and 64 bits versions
This patch enables the compilation of libc for rv32 by unifying the
current rv64 and rv32 implementation into a single rv implementation.

We updated the cmake file to match the new riscv32 arch and force
LIBC_TARGET_ARCHITECTURE to be "riscv" whenever we find "riscv32" or
"riscv64". This is required as LIBC_TARGET_ARCHITECTURE is used in the
path for several platform specific implementations.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D148797
2023-09-26 12:32:25 -03:00
Siva Chandra
599eadec28
[libc] Propagate printf config options from a single config header library. (#66979)
printf_core.parser is not yet updated to use the printf config options. It
does not use them currently anyway and the corresponding parser_test
should be updated to respect the config options.
2023-09-26 08:16:31 -07:00
Siva Chandra
aecb58005c
[libc][NFC] Remove an inappropriate -ffreestanding arg to memory_utils test. (#67435) 2023-09-26 08:04:08 -07:00
Joseph Huber
1b8c8155cc [libc][Obvious] Fix incorrect filepath for ftell.h header
Summary:
The previous patch moved the location of this CMake line but didn't
update the header. Fix it.
2023-09-26 10:02:20 -05:00
Joseph Huber
7ac8e26fc7
[libc] Implement fseek, fflush, and ftell on the GPU (#67160)
Summary:
This patch adds the necessary entrypoints to handle the `fseek`,
`fflush`, and `ftell` functions. These are all very straightforward; we
simply make RPC calls to the associated function on the other end.
Implementing it this way allows us to more or less borrow the state of
the stream from the server as we intentionally maintain no internal
state on the GPU device. However, this does not implement the `errno`
functionality so that must be ignored.
2023-09-26 09:46:46 -05:00
Guillaume Chatelet
b6bc9d72f6
[libc] Mass replace enclosing namespace (#67032)
This is step 4 of
https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079
2023-09-26 11:45:04 +02:00
michaelrj-google
23552fe220
[libc] Acquire the lock for scanf files (#67357)
When creating the new scanf reader design, I forgot to add back the
calls to flockfile and funlockfile in vfscanf_internal. This patch fixes
that, and also changes the system file version to use the normal
variants since ungetc_unlocked isn't always available.
2023-09-25 15:00:03 -07:00
Joseph Huber
791b279924
[libc] Change the puts implementation on the GPU (#67189)
Summary:
Normally, the implementation of `puts` simply writes a newline
character in a second write after printing the string. However, because the GPU does
everything in batches of the SIMT group size, this will end up with very
poor output where you get the strings printed and then 1-64 newline
characters all in a row. Optimizations like to turn `printf` calls into
`puts` so it's a good idea to make this produce the expected output.

The least invasive way I could do this was to add a new opcode. It's a
little bloated, but it avoids an unnecessary and slow send operation to
configure this.
2023-09-25 11:17:22 -05:00
Joseph Huber
6f4ed39b4a
[libc] Enable hermetic tests for the stdio test suite (#67339)
Summary:
There are several tests here that are not yet using the `add_libc_test`.
Rather than do this individually we should just update these all at
once. These all pass on my x64 build so I'm assuming it should be fine.
2023-09-25 11:14:17 -05:00
Joseph Huber
b5440e443a [libc] Fix cyclical dependency on errno matcher for NVPTX architectures
Summary:
The NVPTX backend cannot handle cyclical dependencies on global variable
initializers. That is, a global variable cannot be used to initialize or
reference another global variable inside of it. This situation was
encountered with the new errno tests. This patch simply replaces the
offending function with a constant version to break the dependency and
allow the tests to run again.
2023-09-23 08:59:34 -05:00
michaelrj-google
a5a008ff4f
[libc] Refactor scanf reader to match printf (#66023)
In a previous patch, the printf writer was rewritten to use a single
writer class with a buffer and a callback hook. This patch refactors
scanf's reader to match conceptually.
2023-09-22 12:50:02 -07:00
Mikhail R. Gadelha
2f98ff716c
[libc] Update integration test's linking options (#67158)
This patch sets the integration test's linking options to be the same ones
used in the hermetic tests.

In particular, by removing -nostdlib the tests are linked with
libgcc/compiler-rt, which fixes undefined references to
__udivdi3 and __umoddi3 on rv32.
2023-09-22 12:06:27 -04:00
Siva Chandra
62a3d84f5c
[libc][NFC] Extend ErrnoSetterMatcher to test expected inequalities. (#67153)
Before this change, ErrnoSetterMatcher only allowed testing for equality
of the expected return and errno values. This change extends it to allow
testing for expected inequalities of the return and errno values. The
test libc.test.src.stdio.fileop_test has been updated to use the
ErrnoSetterMatcher with tests for inequalities.
2023-09-22 08:59:10 -07:00
Mikhail R. Gadelha
7db91b4abe
[libc] Fix pthread_create_test for 32 bit systems (#66564)
The test tries to set the guard_size and stack_size of a thread to
SIZE_MAX / 4, which is a huge value on 64-bit systems but only about 1 GiB
on 32-bit ones.

We increase the size to 3 * (SIZE_MAX / 4) so it can also fail on 32-bit
systems.
2023-09-22 10:10:07 -04:00
Mikhail R. Gadelha
50d1500447
[libc] Add ${CMAKE_CROSSCOMPILING_EMULATOR} to custom test cmdlines (#66565)
${CMAKE_CROSSCOMPILING_EMULATOR} will be used in the new rv32 buildbot
and is prepended automatically when we call add_custom_target in CMake,
except when we use a custom command.

There are two places where custom commands are used in libc, so we
explicitly add the ${CMAKE_CROSSCOMPILING_EMULATOR} variable there.
Other systems that don't use ${CMAKE_CROSSCOMPILING_EMULATOR} are
unaffected.
2023-09-22 09:45:27 -04:00
Jeff Bailey
c618e13161
[libc] Pull more definitions from linux/stat.h (#67071)
For file handling, we need more definitions from
linux/stat.h, so this pulls them in. It also adjusts other definitions
to match the kernel's exactly [NFC] so that it's easy to verify that
there's been no divergence one day when it's time to use linux/stat.h
directly.

Tested:
check-libc
2023-09-21 22:34:54 -07:00
Joseph Huber
e0be78be42
[libc] Template the printf / scanf parser class (#66277)
Summary:
The parser class for stdio currently accepts different argument
providers. In-tree this is only used for a fuzzer test, however, the
proposed implementation of the GPU handling of printf / scanf will
require custom argument handlers. This makes the current approach of
using a preprocessor macro messier. This patch proposes folding this
logic into a template instantiation. The downside to this is that
because the implementation of the parser class is placed into an
implementation file we need to manually instantiate the needed templates
which will slightly bloat binary size. Alternatively we could remove the
implementation file, or key off of the `libc` external packaging macro
so it is not present in the installed version.
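A compact sketch of the trade-off described above, with hypothetical names (`Parser`, `ArrayArgs`, `CountingArgs`; not libc's actual parser API). When a class template's member functions are defined in an implementation file instead of the header, every argument provider used elsewhere must be explicitly instantiated in that file, and each instantiation adds code.

```cpp
#include <cstddef>

// --- parser.h (sketch): declaration only; members are defined elsewhere.
template <typename ArgProvider> class Parser {
public:
  explicit Parser(ArgProvider &args) : args_(args) {}
  int next_int();  // defined out of line below

private:
  ArgProvider &args_;
};

// Two hypothetical argument providers.
struct ArrayArgs {
  const int *values;
  std::size_t index = 0;
  int next() { return values[index++]; }
};

struct CountingArgs {
  int counter = 0;
  int next() { return ++counter; }
};

// --- parser.cpp (sketch): out-of-line member definition.
template <typename ArgProvider> int Parser<ArgProvider>::next_int() {
  return args_.next();
}

// Since the definition lives only here, each provider used by other files
// must be instantiated explicitly; every line below emits more code.
template class Parser<ArrayArgs>;
template class Parser<CountingArgs>;
```

Moving the definitions into the header would remove the need for the explicit instantiations, which is the alternative the commit message mentions.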
2023-09-21 17:02:26 -05:00
Joseph Huber
f548d19fc8
[libc] Fix and simplify the implementation of 'fread' on the GPU (#66948)
Summary:
Previously, the `fread` operation was wrong in cases when we read less
data than was requested. That is, if we tried to read N bytes while the
file was in EOF, it would still copy N bytes of garbage. This is fixed
by only copying over the sizes we got from locally opening it rather
than just using the provided size.

Additionally, this patch simplifies the interface. The output functions
have special variants for writing to stdout / stderr. This is primarily
an optimization for these common cases so we can avoid sending the
stream as an argument which has a high delay. Because for input, we
already need to start with a `send` to tell the server how much data to
read, it costs us nothing to send the file along with it so this is
redundant. Re-use the file encoding scheme from the other
implementations, the one that stores the stream type in the LSBs of the
FILE pointer.
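A generic sketch of storing a stream kind in the low bits of a pointer, assuming the pointed-to object is at least 4-byte aligned so its two lowest address bits are zero. `StreamKind`, `FakeFile`, and the helper functions are hypothetical illustrations of the encoding idea, not the libc GPU file code.

```cpp
#include <cstdint>

// Hypothetical stream kinds packed into the two low bits of the pointer.
enum class StreamKind : std::uintptr_t { File = 0, Stdin = 1, Stdout = 2, Stderr = 3 };

constexpr std::uintptr_t kKindMask = 0x3;

// Stand-in for the real FILE object; the tag needs 4-byte alignment.
struct FakeFile { std::uint32_t handle; };
static_assert(alignof(FakeFile) >= 4, "low two bits must be free for the tag");

// Pack the kind into the pointer's low bits.
inline void *tag_stream(FakeFile *file, StreamKind kind) {
  return reinterpret_cast<void *>(reinterpret_cast<std::uintptr_t>(file) |
                                  static_cast<std::uintptr_t>(kind));
}

// Read the kind back out.
inline StreamKind stream_kind(void *tagged) {
  return static_cast<StreamKind>(reinterpret_cast<std::uintptr_t>(tagged) & kKindMask);
}

// Clear the tag bits to recover the original pointer before dereferencing.
inline FakeFile *untag_stream(void *tagged) {
  return reinterpret_cast<FakeFile *>(reinterpret_cast<std::uintptr_t>(tagged) & ~kKindMask);
}
```

The tagged value must never be dereferenced directly; only the untagged pointer refers to the object.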
2023-09-21 14:28:06 -05:00
michaelrj-google
5bd34e0a55
[libc] Fix Off By One Errors In Printf Long Double (#66957)
Two major off-by-one errors are fixed in this patch. The first is in
float_to_string.h with length_for_num, which wasn't accounting for the
implicit leading bit when calculating the length of a number, causing
a missing digit on 80 bit float max. The other off-by-one is the
ryu_long_double_constants.h (a.k.a the Mega Table) not having any
entries for the last POW10_OFFSET in POW10_SPLIT. This was also found on
80 bit float max. Finally, the integer calculation mode was using a
slightly too short integer, again on 80 bit float max, not accounting
for the mantissa width. All of these are fixed in this patch.
2023-09-21 11:43:29 -07:00
Joseph Huber
e2bc0f9266 [libc][NFC] Remove unused function from the RPC server
Summary:
I missed removing this now-unused function in the previous patch. Remove
it to clean up the interface.
2023-09-21 11:56:48 -05:00
Joseph Huber
59896c168a
[libc] Remove the 'rpc_reset' routine from the RPC implementation (#66700)
Summary:
This patch removes the `rpc_reset` function. This was previously used to
initialize the RPC client on the device by setting up the pointers to
communicate with the server. The purpose of this was to make it easier
to initialize the device for testing. However, this prevented us from
enforcing an invariant that the buffers are all read-only from the
client side.

The expected way to initialize the client is now to copy it from the
host runtime. This will allow us to maintain that the RPC client is in
the constant address space on the GPU, potentially through inference,
and improving caching behaviour.
2023-09-21 11:07:09 -05:00
Mikhail R. Gadelha
8d7ca08b9f
[libc] Update siginfo_t to match kernel definition (#66560)
This patch updates the siginfo_t struct definition to match the
definition from the kernel here:

https://github.com/torvalds/linux/blob/master/include/uapi/asm-generic/siginfo.h

In particular, there are two main changes:

1. swap position of si_code and si_errno: si_code should come after
si_errno in all systems except MIPS. Since we don't support MIPS, the order is
fixed for now, but can be easily #ifdef'd if MIPS support is
implemented in the future.

2. We add a union of structs that are filled depending on the signal
raised.

This change was required for the fork and spawn integration tests in
rv32, since they fork/clone the running process, call
wait/waitid/waitpid, and read the status, which was wrong in rv32
because wait/waitid/waitpid are implemented in rv32 using SYS_waitid.

SYS_waitid takes a pointer to a siginfo_t and fills the proper fields in
the struct. The previous siginfo_t definition was being incorrectly
filled due to not taking into account the signal raised.
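A heavily simplified, hypothetical mirror of the generic layout, showing only the header field order (signo, errno, code) and a union whose active member depends on the signal. The real kernel union has more members, more fields per member (for example utime/stime for SIGCHLD), and padding to a fixed total size.

```cpp
#include <cstddef>

// Simplified sketch of the generic (non-MIPS) siginfo layout.
struct siginfo_sketch {
  int si_signo;
  int si_errno;  // generic order: errno before code (MIPS swaps these two)
  int si_code;
  union {
    struct {       // e.g. filled for kill()
      int pid;
      unsigned uid;
    } kill;
    struct {       // e.g. filled for SIGCHLD, as waitid() reports
      int pid;
      unsigned uid;
      int status;
    } sigchld;
  } fields;
};
```

Reading `status` through the wrong union member (or at the wrong offset) is exactly the kind of mismatch that makes wait/waitid/waitpid results come out wrong.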
2023-09-21 10:59:03 -04:00
Guillaume Chatelet
270547f3bf
[libc][clang-tidy] Add llvm-header-guard to get consistent naming and prevent file copy/paste issues. (#66477) 2023-09-21 11:14:47 +02:00
Joseph Huber
3641d18557 [libc][Obvious] Fix incorrect RPC opcode for clearerr
Summary:
This was mistakenly using the opcode for `ferror` which wasn't noticed
because tests using this weren't yet activated. This patch fixes this
mistake.
2023-09-20 11:54:35 -05:00
Guillaume Chatelet
467077796a
[reland][libc][cmake] Tidy compiler includes (#66783) (#66878)
This is a reland of #66783 a35a3b75b2
fixing the benchmark breakage.
2023-09-20 11:21:46 +02:00
michaelrj-google
d37496e75a
[libc] Fix printf config not working (#66834)
The list of printf copts available in config.json wasn't working because
the printf_core subdirectory was included before the printf_copts
variable was defined, making it effectively nothing for the printf
internals. Additionally, the tests weren't respecting the flags so they
would cause the tests to fail. This patch reorders the cmake in src and
adds flag handling in test.
2023-09-19 15:36:14 -07:00
Guillaume Chatelet
9feb0c9b6e
Revert "[libc][cmake] Tidy compiler includes (#66783)" (#66822)
This reverts commit a35a3b75b2. This broke
libc benchmarks.
2023-09-19 23:18:08 +02:00
Guillaume Chatelet
a35a3b75b2
[libc][cmake] Tidy compiler includes (#66783)
We want to activate `llvm-header-guard` (#66477) but the current CMake
configuration includes paths that should be `isystem`. This PR restricts
the number of `-I` passed to the clang command line and correctly marks
the llvm libc include path as `isystem`.
2023-09-19 23:08:29 +02:00
Tue Ly
84c899b235 [libc][math] Extract non-MPFR math tests into libc-math-smoke-tests.
Extract non-MPFR math tests into libc-math-smoke-tests.

Reviewed By: sivachandra, jhuber6

Differential Revision: https://reviews.llvm.org/D159477
2023-09-19 12:10:21 -04:00
Jeff Bailey
acfb99d9fd
[libc] Specify path for making include/ subdirs (#66589)
When doing a clean build from vscode, it makes the subdirectories in the
source tree rather than in the build folder. Elsewhere in LLVM, they
prefix the MAKE_DIRECTORY calls, so this appears to be the correct
approach.
2023-09-18 21:00:51 -07:00
Joseph Huber
c354ee8d18
[libc][GPU] Fix dependencies for externally installed stub files (#66653)
Summary:
The GPU build has a lot of magic around how we package the output.
Generally, the GPU code needs to exist as a secondary fatbinary image for
offloading languages, because offloading languages present offloading to
an accelerator as if it were a single file. The GPU code therefore needs
to be packaged into a single file so that it meshes with the existing
build infrastructure. To work with this, `libc` makes an installed
version of the library that simply embeds the GPU code into an empty stub
file.

This stub wasn't being regenerated correctly, which led to the installed
`libc` static library not being updated when the underlying file
changed. Previously it was only rebuilt when the entrypoint itself was
modified, not when any of its headers were. By adding a dependency on the
actual *object* file, we should now get the regular CMake semantics.
2023-09-18 10:15:02 -05:00
Joseph Huber
b8f64431ea
[libc] Add GPU config file using the new format (#66635)
Summary:
This patch adds a config file for the GPU, modeled on the
baremetal/embedded one. It configures the implementations of functions
like `sprintf` and `snprintf` to be compiled into simpler versions that
can run on the GPU. These functions cannot be enabled yet because varargs
support hasn't landed, but the config will be used once it does.
2023-09-18 08:06:59 -05:00
Siva Chandra
7d7df7f237
[libc] Add a developer doc about adding new config options. (#66432) 2023-09-15 14:34:25 -07:00
Guillaume Chatelet
c21be63228
[libc][cmake] Report invalid clang-tidy path (#66475)
Adds better error reporting for missing clang-tidy.
2023-09-15 17:43:48 +02:00
Guillaume Chatelet
2dbdc9fc85
[libc] Add invoke / invoke_result type traits (#65750) 2023-09-15 11:15:41 +02:00
Joseph Huber
bbe7eb92b4 [libc][Obvious] Fix missing entrypoints after moving to generic
Summary:
The previous patch moved the implementations of these to generic/ and
accidentally omitted the unlocked variants. This patch fixes that.
2023-09-14 15:59:08 -05:00
Joseph Huber
a1be5d69df
[libc] Implement more input functions on the GPU (#66288)
Summary:
This patch implements the `fgets`, `getc`, `fgetc`, and `getchar`
functions on the GPU. Their implementations are straightforward enough.
One thing worth noting is that the implementation of `fgets` will be
extremely slow due to the high latency of reading a single character. A
faster solution would be a new RPC call that performs the `fgets` on the
server (a dedicated call is needed because of the special rule that a
newline or null stops the read), but this is left out because
performance isn't the primary concern here.
2023-09-14 15:39:29 -05:00
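The slowness described above comes from building `fgets` out of single-character reads. As a rough illustration, here is a minimal C++ sketch of an `fgets`-style loop layered on a `getc`-like primitive; `StringSource`, `read_char`, and `fgets_via_getc` are hypothetical names, not libc's implementation. On the GPU, each `read_char` call would presumably correspond to one RPC round trip to the host, so a line of N characters costs N round trips.

```cpp
#include <cstddef>
#include <cstdio>  // EOF

namespace sketch {
// Hypothetical character source standing in for a stream; on the GPU each
// read would be a separate (high-latency) request to the host.
struct StringSource {
  const char *data;
  std::size_t pos;
  constexpr int read_char() {
    if (data[pos] == '\0')
      return EOF;  // end of input; pos is not advanced
    return static_cast<unsigned char>(data[pos++]);
  }
};

// fgets-style loop: reads at most count - 1 characters, stops after a
// newline (which is kept) or at EOF, and null-terminates the result.
// Returns nullptr only if EOF is hit before any character is read.
template <typename Source>
constexpr char *fgets_via_getc(char *str, int count, Source &src) {
  if (count <= 0)
    return nullptr;
  int i = 0;
  while (i < count - 1) {
    int c = src.read_char();
    if (c == EOF) {
      if (i == 0)
        return nullptr;
      break;
    }
    str[i++] = static_cast<char>(c);
    if (c == '\n')
      break;
  }
  str[i] = '\0';
  return str;
}
}  // namespace sketch
```

Reading `"ab\ncd"` this way yields `"ab\n"` on the first call, `"cd"` on the second, and a null pointer once the source is exhausted.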
Mikhail R. Gadelha
72e6f06119
[libc] Fix start up crash on 32 bit systems (#66210)
This patch changes the default types of argc/argv so that they are no
longer uint64_t on all systems; they are now uintptr_t, which fixes
crashes on 32-bit systems that expect 32-bit types. This patch also adds
two uintptr_t types (EnvironType and AuxEntryType) for the same reason.

The patch also adds a PgrHdrTableType type, selected behind an ifdef,
that is Elf64_Phdr on 64-bit systems and Elf32_Phdr on 32-bit systems.
2023-09-14 09:02:35 -04:00
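On Linux, the process entry point finds `argc` followed by the `argv` pointers on the initial stack, each in one machine word. The minimal C++ sketch below, assuming that layout, shows why a pointer-sized `uintptr_t` fits these slots, and how a preprocessor check can select a 32- or 64-bit type in the same spirit as the `Elf64_Phdr`/`Elf32_Phdr` ifdef. `WordType`, `ArgcType`, `ArgVEntryType`, `Args`, and `read_args` are hypothetical names for illustration only.

```cpp
#include <cstdint>

namespace sketch {
// Select a word-sized type at preprocessing time, in the same spirit as an
// ifdef that picks Elf64_Phdr or Elf32_Phdr. UINTPTR_MAX may be used in #if.
#if UINTPTR_MAX == UINT64_MAX
using WordType = std::uint64_t;
#else
using WordType = std::uint32_t;
#endif

// argc and every argv entry occupy one machine word on the initial stack,
// so they are read as uintptr_t rather than as a fixed uint64_t.
using ArgcType = std::uintptr_t;
using ArgVEntryType = std::uintptr_t;

static_assert(sizeof(WordType) == sizeof(ArgcType), "one word per slot");

struct Args {
  ArgcType argc;
  const ArgVEntryType *argv;
};

// Reads argc from the first word and points argv at the next word. Reading
// argc as uint64_t on a 32-bit target would instead span two 4-byte slots
// (argc and argv[0]) and shift argv by one entry.
constexpr Args read_args(const std::uintptr_t *stack) {
  return Args{stack[0], stack + 1};
}
}  // namespace sketch
```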
Alex Brachet
2ad7a06cb1
[libc] Fix some warnings (#66366)
Some compilers warn about dangling else and misleading lack of
parentheses.
2023-09-14 08:47:21 -04:00
Guillaume Chatelet
aee8f8784a
[libc][utils] cpp::always_false to enable static_assert(false) (#66209) 2023-09-14 10:28:43 +02:00
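In C++ before the C++23 relaxation, a plain `static_assert(false, ...)` inside a template is ill-formed even in an `if constexpr` branch that is never taken, and GCC and Clang reject it eagerly. The usual workaround is a condition that depends on the template parameter. The sketch below shows that pattern; the variadic `always_false` variable template and the `word_bits` function are illustrative assumptions, not necessarily how `cpp::always_false` is declared in libc.

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {
// Always false, but dependent on the template arguments, so a
// static_assert on it is only evaluated when its branch is instantiated.
template <typename... Ts> inline constexpr bool always_false = false;

template <typename T> constexpr int word_bits() {
  if constexpr (std::is_same_v<T, std::uint32_t>) {
    return 32;
  } else if constexpr (std::is_same_v<T, std::uint64_t>) {
    return 64;
  } else {
    // Fires only for unsupported T, e.g. word_bits<float>().
    static_assert(always_false<T>, "word_bits: unsupported type");
    return 0;
  }
}
}  // namespace sketch
```

The two supported instantiations compile normally; instantiating `word_bits<float>()` fails with the `static_assert` message.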
Siva Chandra
f8f934e22c
[libc][NFC] Make the dummy header target under overlay build a library. (#66329)
This fixes the broken overlay builders.
2023-09-13 22:52:20 -07:00
Siva Chandra
17114f8b19
[libc] Remove common_libc_tuners.cmake and move options into config.json. (#66226)
The name has been changed to adhere to the config option naming format.
The necessary build changes to use the new option have also been made.
2023-09-13 22:17:00 -07:00
Michael Jones
3fb63c2921 [libc] simplify printf float writing
The two decimal float printing styles are similar, but different in how
they end. For simplicity of writing I initially gave them different
"write_last_block" functions. This patch unifies them into one function.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D158036
2023-09-13 13:53:29 -07:00
Michael Jones
aa1eacd10c [libc][docs] Printf behavior doc
In the document on undefined behavior, I noted that writing down your
decisions is very important. This document records all the compile flags
and undefined behavior decisions for our printf.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D158311
2023-09-13 13:42:30 -07:00
Joseph Huber
089b81105a [libc][NFC][Docs] Update some GPU testing information
Summary:
This comment is outdated and can be removed. Also mention an option for
limiting parallelism during tests in the documentation.
2023-09-13 14:30:30 -05:00
michaelrj-google
380eb46b13
[libc] Move long double table option to new config (#66151)
This patch adds the long double table option for printf into the new
configuration scheme. This allows it to be set for most targets but
unset for baremetal.
2023-09-13 10:43:05 -07:00
Joseph Huber
bf85f27370
[libc] Implement 'qsort' and 'bsearch' on the GPU (#66230)
Summary:
This patch simply adds the necessary config to enable qsort and bsearch
on the GPU. It is *highly* unlikely that anyone will use these, as they
are single threaded, but we may as well support all entrypoints that we
can.
2023-09-13 12:06:34 -05:00
Siva Chandra
d25b4fae93
[libc][NFC] Make entrypoint alias targets real library targets. (#66044)
This is part of a libc wide CMake cleanup which aims to eliminate
certain explicitly duplicated logic which is available in CMake-3.20.
This change in particular makes the entrypoint aliases real library
targets so that they can be treated as normal library targets by other
libc build rules.
2023-09-13 08:35:23 -07:00
Mikhail R. Gadelha
75398f28eb [libc] Make time_t 64 bits long on all platforms but arm32
This patch changes time_t to be an int64_t. This still follows the
POSIX standard, which only requires time_t to be an integer type.

Making time_t a 64-bit integer also fixes two cases on 32-bit platforms
that use SYS_clock_nanosleep_time64 and SYS_clock_gettime64: as the names
of these calls imply, they require a 64-bit time_t. For instance, on rv32
the 32-bit versions of these syscalls are not available.

We also follow glibc here, where time_t is still a 32-bit integer on
arm32.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D159125
2023-09-13 10:49:39 -03:00
Joseph Huber
ef169f5707
[libc] Improve the implementation of the rand() function (#66131)
Summary:
This patch improves the implementation of the standard `rand()` function
by implementing it in terms of the xorshift64star PRNG as described in
https://en.wikipedia.org/wiki/Xorshift#xorshift*. This is a good,
general-purpose random number generator that is sufficient for most
applications that do not require an extremely long period. This patch
also correctly initializes the seed to `1` as required by the
standard. We also increase the `RAND_MAX` value to `INT_MAX`, as the
standard only requires it to be at least 32767.
2023-09-12 16:52:20 -05:00
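A minimal sketch of the xorshift64* step referred to above, using the shift triple (12, 25, 27) and the multiplier `0x2545F4914F6CDD1D` given in that Wikipedia article. The `rand_from` wrapper and its mapping onto `[0, INT_MAX]` are assumptions for illustration, not libc's exact code; the state starts at `1`, matching the default seed described in the commit.

```cpp
#include <climits>
#include <cstdint>

namespace sketch {
// xorshift64*: three xor-shifts on a 64-bit state, then a multiply that
// scrambles the output. The state must never be zero, or it stays zero.
constexpr std::uint64_t xorshift64star(std::uint64_t &state) {
  std::uint64_t x = state;
  x ^= x >> 12;
  x ^= x << 25;
  x ^= x >> 27;
  state = x;
  return x * 0x2545F4914F6CDD1DULL;  // unsigned wraparound is intended
}

// Hypothetical rand() wrapper with RAND_MAX == INT_MAX: keep the high 32
// bits (which depend on every state bit after the multiply) and mask off
// the sign bit so the result lies in [0, INT_MAX].
constexpr int rand_from(std::uint64_t &state) {
  return static_cast<int>((xorshift64star(state) >> 32) & INT_MAX);
}
}  // namespace sketch
```

Two generators seeded with `1` produce identical sequences, which is the reproducibility the standard expects from `rand()` before any call to `srand()`.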
Joseph Huber
688019851e
[libc][NFC] Factor GPU exiting into a common function (#66093)
Summary:
We currently call the GPU routine to terminate the current thread in
three separate locations. This should be wrapped into a helper function
to simplify the implementation.
2023-09-12 14:59:02 -05:00
Siva Chandra
c5ad6c7781
[libc] Fix a typo in a CMakeLists.txt - replace DEPS with DEPENDS. (#66130) 2023-09-12 12:24:27 -07:00
Siva Chandra
0f31e5697b
[libc] Add missing deps for header libraries. (#66125)
Also, we removed the CMP0076 exception some time back but did not adjust
the build rules. That adjustment is also made in this patch.
2023-09-12 11:53:03 -07:00
Siva Chandra
9048aa71af
[libc] Make add_header and add_gen_header targets normal library targets. (#66045)
This way, they can be added as deps to other library targets without any
special handling.
2023-09-12 08:50:05 -07:00
Guillaume Chatelet
7329816285
[libc] Add is_object (#65749)
Add the is_object type trait.
Implementation comes from
https://en.cppreference.com/w/cpp/types/is_object
2023-09-12 10:35:22 +02:00
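An object type is any type other than a function type, a reference type, or (possibly cv-qualified) `void`. The sketch below follows the "possible implementation" on the cppreference page linked above; it uses `std` traits as building blocks so that it is self-contained, whereas libc's version is written against its own internal `cpp` traits. The `sketch` namespace is illustrative.

```cpp
#include <type_traits>

namespace sketch {
// Scalars, arrays, unions, and classes are object types; functions,
// references, and void are not.
template <class T>
struct is_object
    : std::integral_constant<bool, std::is_scalar<T>::value ||
                                       std::is_array<T>::value ||
                                       std::is_union<T>::value ||
                                       std::is_class<T>::value> {};
}  // namespace sketch
```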
Siva Chandra
eb06125604
[libc][NFC] Eliminate the internal header library target. (#65837)
The internal header library target with name suffix `.__header_library`
has been removed as it serves no purpose now. It was added to make older
versions of CMake happy.
2023-09-11 11:22:33 -07:00
Joseph Huber
76af6e77c0
[libc] Manually set the AMDGPU code object version (#65986)
Summary:
There is currently an effort to change the default AMDGPU code object
version https://github.com/llvm/llvm-project/pull/65410. However, this
unfortunately causes problems in the LLVM LibC test suite that lead to
a hang during execution. This is most likely a bug related to indirect
call optimization, as it can be avoided by disabling optimizations or by
manually preventing inlining in the AMDGPU startup code.

This patch explicitly sets the AMDGPU code object version to 4 for the
LibC test suite. This should unblock the efforts to move the default
to 5 without breaking the test suite. This isn't a great solution, but
there is currently some time pressure to get COV5 landed and this seems
to be the easiest solution.
2023-09-11 13:07:56 -05:00
Guillaume Chatelet
a1f5a495e0
[libc] Add type_traits tests (#65956)
This is not exhaustive for now but it provides a placeholder for
`invoke_result` test mentioned in #65750.
2023-09-11 14:15:12 +00:00
Guillaume Chatelet
d557e2b076
[libc][NFC] Fix missing header in CMakelists.txt (#65960) 2023-09-11 14:12:58 +00:00
Guillaume Chatelet
88348252a6
[libc] Add missing add_lvalue_reference_t (#65940) 2023-09-11 11:31:37 +02:00
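`add_lvalue_reference_t<T>` yields `T&` when that type can be formed and `T` unchanged otherwise (for example for `void`). Below is a minimal sketch following the SFINAE-based possible implementation shown on cppreference; the `sketch` namespace and the helper names are illustrative and not taken from libc.

```cpp
#include <type_traits>

namespace sketch {
template <class T> struct type_identity { using type = T; };

// Overload resolution prefers the int overload, but only when T& is a
// valid type. For void or cv/ref-qualified function types, substitution
// fails and the variadic fallback leaves T unchanged.
template <class T> auto try_add_lvalue_reference(int) -> type_identity<T &>;
template <class T> auto try_add_lvalue_reference(...) -> type_identity<T>;

template <class T>
struct add_lvalue_reference : decltype(try_add_lvalue_reference<T>(0)) {};

template <class T>
using add_lvalue_reference_t = typename add_lvalue_reference<T>::type;
}  // namespace sketch
```

Reference collapsing means `int&&` also maps to `int&`.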
Joseph Huber
60c0d303d6
[libc] Implement stdio writing functions for the GPU port (#65809)
Summary:
This patch implements fwrite, putc, putchar, and fputc on the GPU. These
are very straightforward; the main difference in the GPU implementation
is that we currently ignore `errno`. This patch also introduces a
minimal smoke test for `putc` that is an exact copy of the `puts` test
except we print the string char by char. This also modifies the `fopen`
test to use `fwrite` to mirror its use of `fread` so that it is tested
as well.
2023-09-09 13:27:07 -05:00
Siva Chandra
b0068b5b06
[libc][NFC] Make add_header_library rule support COMPILE_OPTIONS. (#65821)
The options added via COMPILE_OPTIONS will be treated as INTERFACE
options. This will help in setting compile options based on libc config
options in future patches.
2023-09-08 20:34:00 -07:00
Siva Chandra
ca2a4e76ea
[libc] Generate configure.rst from the JSON config information. (#65791) 2023-09-08 13:11:09 -07:00
Joseph Huber
31d4f0692f
[libc][NFC] Cleanup the GPU file I/O utility header (#65680)
Summary:
The GPU uses separate implementations to perform file IO. This is all
done through the RPC interface, and we kept it minimal such that we could
treat a `stdin`, `stdout`, or `stderr` handle from the CPU correctly on
the GPU. The RPC implementation uses different opcodes depending on
whether or not we are using one of the standard streams. This way we do
not need to initialize anything to access the CPU's standard streams,
because the server knows that it should print to `stdout` if it gets the
`STDOUT` variant of the opcode. It also saves us an RPC call, which is
relatively expensive. This patch simply cleans up this interface so that
all of these functions use a common helper. This is done in preparation
for implementing more file IO functions like getc or putc.
2023-09-08 14:15:53 -05:00
Siva Chandra
1d0d57e89a
[libc][docs] Fix docs/gpu/support.rst. (#65790) 2023-09-08 11:45:20 -07:00
Joseph Huber
71168f6889
[libc] Build the libc objects using a generic AMDGPU ABI (#65782)
Summary:
AMDGPU binaries use a "code object" as the ABI indicator. We are
currently trying to move over to a newer code object. We want these
library functions to use the "generic" or default ABI such that it is
specified when linked into the user application. Currently this will
default to v4 as the startup code will use whatever the current default
is.
2023-09-08 13:17:00 -05:00
Mikhail R. Gadelha
123bf08402
[libc] Unify gettime implementations (#65383)
Similar to D159208, this patch unifies the call sites of a syscall; in
this case it is SYS_clock_gettime/SYS_clock_gettime64.

This patch also fixes calls to SYS_clock_gettime64 by creating a
timespec64 object, passing it to the syscall, and then overwriting the
caller's timespec with the timespec64 object's contents. This fixes cases
where timespec has a 4-byte time_t member but SYS_clock_gettime is not
available (e.g., rv32).
2023-09-08 12:41:29 -04:00
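The fix described above can be sketched as "call into a temporary 64-bit struct, then copy field by field". In this minimal C++ sketch, `timespec32`, `timespec64`, and `clock_gettime_via64` are hypothetical names, and the `gettime64` callable stands in for the raw `SYS_clock_gettime64` syscall so the example runs anywhere; it assumes the syscall returns 0 or a negative error code.

```cpp
#include <cstdint>

namespace sketch {
// Hypothetical layouts: a caller-visible timespec with a 4-byte seconds
// field (as on rv32 with a 32-bit time_t), and the 64-bit layout that the
// *_time64 syscalls write.
struct timespec32 {
  std::int32_t tv_sec;
  std::int32_t tv_nsec;
};
struct timespec64 {
  std::int64_t tv_sec;
  std::int64_t tv_nsec;
};

// The syscall writes into a temporary timespec64; passing the caller's
// 8-byte struct directly would let it write 16 bytes.
template <typename Gettime64>
constexpr int clock_gettime_via64(Gettime64 gettime64, int clockid,
                                  timespec32 *ts) {
  timespec64 ts64{};
  int ret = gettime64(clockid, &ts64);
  if (ret < 0)
    return ret;  // leave the caller's struct untouched on error
  // Narrowing copy: seconds beyond INT32_MAX (January 2038) do not fit.
  ts->tv_sec = static_cast<std::int32_t>(ts64.tv_sec);
  ts->tv_nsec = static_cast<std::int32_t>(ts64.tv_nsec);
  return 0;
}
}  // namespace sketch
```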
Guillaume Chatelet
74971db140
[libc] Add is_scalar (#65740)
Adds the is_scalar trait, based on the implementation in
https://en.cppreference.com/w/cpp/types/is_scalar
2023-09-08 12:45:17 +00:00
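A scalar type is an arithmetic, enumeration, pointer, pointer-to-member, or `std::nullptr_t` type, possibly cv-qualified. The sketch below follows the possible implementation on the cppreference page linked above, built from `std` traits for self-containment (libc's version uses its own internal `cpp` traits); the `sketch` namespace is illustrative.

```cpp
#include <type_traits>

namespace sketch {
// True for arithmetic, enum, pointer, member-pointer, and nullptr_t types;
// each component trait already accepts cv-qualified forms.
template <class T>
struct is_scalar
    : std::integral_constant<bool, std::is_arithmetic<T>::value ||
                                       std::is_enum<T>::value ||
                                       std::is_pointer<T>::value ||
                                       std::is_member_pointer<T>::value ||
                                       std::is_null_pointer<T>::value> {};
}  // namespace sketch
```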
Guillaume Chatelet
eebf8faf3e
[libc] Add is_member_pointer_v (#65631)
Implementation from
https://en.cppreference.com/w/cpp/types/is_member_pointer
2023-09-08 11:36:19 +02:00
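`is_member_pointer` matches any pointer to a non-static data member or member function, including cv-qualified ones. The sketch below follows the partial-specialization approach from the cppreference page linked above and adds the `_v` variable template; the `sketch` namespace and the helper name are illustrative, not libc's code.

```cpp
#include <type_traits>

namespace sketch {
// Primary template: not a member pointer.
template <class T> struct is_member_pointer_helper : std::false_type {};

// Matches `T U::*`, which covers data members (T = int) and member
// functions (T = a function type such as int()).
template <class T, class U>
struct is_member_pointer_helper<T U::*> : std::true_type {};

// Strip top-level const/volatile before matching.
template <class T>
struct is_member_pointer
    : is_member_pointer_helper<typename std::remove_cv<T>::type> {};

template <class T>
inline constexpr bool is_member_pointer_v = is_member_pointer<T>::value;
}  // namespace sketch
```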