This patch implements __cxa_init_primary_exception, an extension to the
Itanium C++ ABI. This extension is already present in both libsupc++ and
libcxxrt. This patch also starts making use of this function in
std::make_exception_ptr: instead of going through a full throw/catch
cycle, we are now able to initialize an exception directly, thus making
std::make_exception_ptr around 30x faster.
Locale objects use atomic reference counting, which may be very
expensive in parallel applications. The classic locale is used by
default by all streams and can be very contended. But it's never
destroyed, so the reference counting is also completely pointless on the
classic locale. Currently ~70% of time in the parallel stringstream
benchmarks is spent in locale ctor/dtor. And the execution radically
slows down with more threads.
Avoid reference counting on the classic locale. With this change
parallel benchmarks start to scale with threads.
This is a re-application of f8afc53d64 (aka PR #72112) which was
reverted in 4e0c48b907 because it broke the sanitizer builds due
to an initialization order fiasco. This issue has now been fixed by
ensuring that the locale is constinit'ed.
Co-authored-by: Dmitry Vyukov <dvyukov@google.com>
<filesystem> is a C++17 addition. In C++11 and C++14 modes, we actually
have all the code for <filesystem> but it is hidden behind a non-inline
namespace __fs so it is not accessible. Instead of doing this unusual
dance, just guard the code for filesystem behind a classic C++17 check
like we normally do.
We support GCC 13, which supports the flag. This allows simplifying the
CMake logic around the use of -nostdlib++. Note that there are other
places where we don't assume -nostdlib++ yet in the build, but this
patch is intentionally trying to be small because this part of our CMake
is pretty tricky.
Credits: this change is based on analysis and a proof of concept by
gerbens@google.com.
Before, the compiler loses track of end as 'this' and other references
possibly escape beyond the compiler's scope. This can be see in the
generated assembly:
16.28 │200c80: mov %r15d,(%rax)
60.87 │200c83: add $0x4,%rax
│200c87: mov %rax,-0x38(%rbp)
0.03 │200c8b: → jmpq 200d4e
...
...
1.69 │200d4e: cmp %r15d,%r12d
│200d51: → je 200c40
16.34 │200d57: inc %r15d
0.05 │200d5a: mov -0x38(%rbp),%rax
3.27 │200d5e: mov -0x30(%rbp),%r13
1.47 │200d62: cmp %r13,%rax
│200d65: → jne 200c80
We fix this by always explicitly storing the loaded local and pointer
back at the end of push back. This generates some slight source 'noise',
but creates nice and compact fast path code, i.e.:
32.64 │200760: mov %r14d,(%r12)
9.97 │200764: add $0x4,%r12
6.97 │200768: mov %r12,-0x38(%rbp)
32.17 │20076c: add $0x1,%r14d
2.36 │200770: cmp %r14d,%ebx
│200773: → je 200730
8.98 │200775: mov -0x30(%rbp),%r13
6.75 │200779: cmp %r13,%r12
│20077c: → jne 200760
Now there is a single store for the push_back value (as before), and a
single store for the end without a reload (dependency).
For fully local vectors, (i.e., not referenced elsewhere), the capacity
load and store inside the loop could also be removed, but this requires
more substantial refactoring inside vector.
Differential Revision: https://reviews.llvm.org/D80588
Temporary workaround for the following CMake error:
```
CMake Error in /llvm/libcxx/benchmarks/CMakeLists.txt:
The compiler feature "cxx_std_23" is not known to CXX compiler
"IBMClang"
```
This patch is a melting pot of changes picked up from
https://llvm.org/D61878. It adds a few tests checking corner cases of
unordered containers comparison and adds benchmarks for a few
unordered_set operations.
This fixes some unused variable and "comparison of integers of different
signs" warnings in the benchmarks.
Differential Revision: https://reviews.llvm.org/D156613
This commit contains refactorings around __dynamic_cast without changing
its behavior. Some important changes include:
- Refactor __dynamic_cast into various small helper functions;
- Move dynamic_cast_stress.pass.cpp to libcxx/benchmarks and refactor
it into a benchmark. The benchmark performance numbers are updated
as well.
Differential Revision: https://reviews.llvm.org/D138006
In the event the internal function __init is called with an empty string the code will take unnecessary extra steps, in addition, the code generated might be overall greater because, to my understanding, when initializing a string with an empty `const char*` "" (like in this case), the compiler might be unable to deduce the string is indeed empty at compile time and more code is generated.
The goal of this patch is to make a new internal function that will accept just an error code skipping the empty string argument. It should skip the unnecessary steps and in the event `if (ec)` is `false`, it will return an empty string using the correct ctor, avoiding any extra code generation issues.
After the conversation about this patch matured in the libcxx channel on the LLVM Discord server, the patch was analyzed quickly with "Compiler Explorer" and other tools and it was discovered that it does indeed reduce the amount of code generated when using the latest stable clang version (16) which in turn produces faster code.
This patch targets LLVM 18 as it will break the ABI by addressing https://github.com/llvm/llvm-project/issues/63985
Benchmark tests run on other machines as well show in the best case, that the new version without the extra string as an argument performs 10 times faster.
On the buildkite CI run it shows the new code takes less CPU time as well.
In conclusion, the new code should also just appear cleaner because there are fewer checks to do when there is no message.
Reviewed By: #libc, philnik
Spies: emaste, nemanjai, philnik, libcxx-commits
Differential Revision: https://reviews.llvm.org/D155820
This commit does a pass of clang-format over files in libc++ that
don't require major changes to conform to our style guide, or for
which we're not overly concerned about conflicting with in-flight
patches or hindering the git blame.
This roughly covers:
- benchmarks
- range algorithms
- concepts
- type traits
I did a manual verification of all the changes, and in particular I
applied clang-format on/off annotations in a few places where the
result was less readable after than before. This was not necessary
in a lot of places, however I did find that clang-format had pretty
bad taste when it comes to formatting concepts.
Differential Revision: https://reviews.llvm.org/D153140
CMake older than 3.20.0 is no longer supported.
This removes work-arounds for no longer supported versions.
Reviewed By: #libc, jloser, philnik
Differential Revision: https://reviews.llvm.org/D152099
This is an ongoing series of commits that are reformatting our
Python code.
Reformatting is done with `black`.
If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.
If you run into any problems, post to discourse about it and
we will try to help.
RFC Thread below:
https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style
Reviewed By: #libc, kwk, Mordante
Differential Revision: https://reviews.llvm.org/D150763
The original `__dynamic_cast` implementation does not use the ABI-provided `src2dst_offset` parameter which helps improve performance on the hot paths. This patch improves the performance on the hot paths in `__dynamic_cast` by leveraging hints provided by the `src2dst_offset` parameter. This patch also includes a performance benchmark suite for the `__dynamic_cast` implementation.
Reviewed By: philnik, ldionne, #libc, #libc_abi, avogelsgesang
Spies: mikhail.ramalho, avogelsgesang, xingxue, libcxx-commits
Differential Revision: https://reviews.llvm.org/D138005
Don't forward to `min_element` for small types that are trivially copyable, and instead use a naive loop that keeps track of the smallest element (as opposed to an iterator to the smallest element). This allows the compiler to vectorize the loop in some cases.
Reviewed By: #libc, ldionne
Spies: ldionne, libcxx-commits
Differential Revision: https://reviews.llvm.org/D143596
The implementation makes use of the freedom added by LWG 3410. We have
two variants of this algorithm:
* a fast path for random access iterators: This fast path computes the
maximum number of loop iterations up-front and does not compare the
iterators against their limits on every loop iteration.
* A basic implementation for all other iterators: This implementation
compares the iterators against their limits in every loop iteration.
However, it still takes advantage of the freedom added by LWG 3410 to
avoid unnecessary additional iterator comparisons, as originally
specified by P1614R2.
https://godbolt.org/z/7xbMEen5e shows the benefit of the fast path:
The hot loop generated of `lexicographical_compare_three_way3` is
more tight than for `lexicographical_compare_three_way1`. The added
benchmark illustrates how this leads to a 30% - 50% performance
improvement on integer vectors.
Implements part of P1614R2 "The Mothership has Landed"
Fixes LWG 3410 and LWG 3350
Differential Revision: https://reviews.llvm.org/D131395
This has multiple benefits:
- The optimizations are also performed for the `ranges::` versions of the algorithms
- Code duplication is reduced
- it is simpler to add this optimization for other segmented iterators,
like `ranges::join_view::iterator`
- Algorithm code is removed from `<deque>`
Reviewed By: ldionne, huixie90, #libc
Spies: mstorsjo, sstefan1, EricWF, libcxx-commits, mgorny
Differential Revision: https://reviews.llvm.org/D132505
Bumping down is significantly faster than bumping up. This is ABI breaking, but the ABI of `pmr::monotonic_buffer_resource` was only stabilized in this release cycle, so we can still change it.
For a more detailed explanation why bumping down is better, see https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html.
Reviewed By: ldionne, #libc
Spies: libcxx-commits
Differential Revision: https://reviews.llvm.org/D141435
This implements the Grapheme clustering as required by
P1868R2 width: clarifying units of width and precision in std::format
This was omitted in the initial patch, but the paper was marked as completed. This really completes the paper.
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D126971
In particular remove the ability to expel incomplete features from the
library at configure-time, since this can now be done through the
_LIBCPP_ENABLE_EXPERIMENTAL macro.
Also, never provide symbols related to incomplete features inside the
dylib, instead provide them in c++experimental.a (this changes the
symbols list, but not for any configuration that should have shipped).
Differential Revision: https://reviews.llvm.org/D128928
This is a preparation to look at possible performance improvements.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D129421
As of now containers key_eq might get called when rehashing happens, which is redundant for unique keys containers.
Reviewed By: #libc, philnik, Mordante
Differential Revision: https://reviews.llvm.org/D128021
The optimization level used when building the benchmarks should
match the optimization level of the current build. Otherwise, we
can end up mixing an -O3 or -O0 optimized dylib with benchmarks
built with -O2, which is really misleading.
Differential Revision: https://reviews.llvm.org/D127987