This commit turns on ASan annotations in `std::basic_string` for all
allocators by default.
Originally suggested here: https://reviews.llvm.org/D146214
String annotations added here:
https://github.com/llvm/llvm-project/pull/72677
This commit is part of our efforts to support container annotations with
(almost) every allocator. Annotating `std::basic_string` with default
allocator is implemented in
https://github.com/llvm/llvm-project/pull/72677.
Additionally it removes `__begin != nullptr` because `data()` should
never return a nullptr.
Support in ASan API exists since
1c5ad6d2c0.
This patch removes the check in std::basic_string annotation member
function (__annotate_contiguous_container) to support different
allocators.
You can turn off annotations for a specific allocator based on changes
from
2fa1bec7a2.
The motivation for a research and those changes was a bug, found by
Trail of Bits, in a real code where an out-of-bounds read could happen
as two strings were compared via a call to `std::equal` that took
`iter1_begin`, `iter1_end`, `iter2_begin` iterators (with a custom
comparison function). When object `iter1` was longer than `iter2`, read
out-of-bounds on `iter2` could happen. Container sanitization would
detect it.
If you have any questions, please email:
- advenam.tacet@trailofbits.com
- disconnect3d@trailofbits.com
As described in #69994, using the escape hatch makes us non-conforming
in C++20 due to incorrect constexpr-ness. It also leads to bad
diagnostics as reported by #63900. We discussed the issue in the libc++
monthly meeting and we agreed that we should deprecate the macro in LLVM
18, and then remove it in LLVM 19 since it causes too many problems.
This patch does the first part of this -- it deprecates the macro.
Fixes#69994Fixes#63900
Partially addresses #75975
This commit is a refactor (increases readability) and optimization fix.
This is a fixed commit of
https://github.com/llvm/llvm-project/pull/76200 First reverthed here:
1ea7a56057
Please, check original PR for details.
The difference is a return type of the lambda.
Original description:
This commit addresses optimization and instrumentation challenges
encountered within comma constructors.
1) _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS does not work in comma
constructors.
2) Code inside comma constructors is not always correctly optimized.
Problematic code examples:
- `: __r_(((__str.__is_long() ? 0 : (__str.__annotate_delete(), 0)),
std::move(__str.__r_))) {`
- `: __r_(__r_([&](){ if(!__s.__is_long()) __s.__annotate_delete();
return std::move(__s.__r_);}())) {`
However, lambda with argument seems to be correctly optimized. This
patch uses that fact.
Use of lambda based on idea from @ldionne.
Since we use _LIBCPP_USING_IF_EXISTS to handle missing C library functions
now, _LIBCPP_C_HAS_NO_GETS shouldn't be necessary anymore.
See the discussion thread in #77242 for more details.
This was causing compilation errors when attempting to compare a
`shared_ptr<T[]>` with `nullptr`, as `get()` returns `T*` rather than `T
(*)[]`. `unique_ptr` did not have this issue, but I've added tests to
make sure.
Using `regex_search` with the regex_constant `match_default` and a
simple regex pattern `$` is expected to match general strings such as
_"a", "ab", "abc"..._ at `[last, last)` positions. But, the current
implementation fails to do so.
Fixes#75042
As pointed out by @Zingam the paper was implemented in libc++ as an
extension. This patch does the bookkeeping. The inital release version
is based on historical release dates.
Completes:
- Add a conditional noexcept specification to std::apply
This function replaces a call to `__move_assign` (internal function)
with two calls to public member functions (`resize` and `erase`). The
order of calls is chosen for the best performance.
This change is required to [turn on ASan string annotations for short
strings](https://github.com/llvm/llvm-project/pull/75882) (Short String
Optimization - SSO).
The `std::basic_string` class's `void __move_assign(basic_string&&
__str, size_type __pos, size_type __len)` function operates on
uninitialized strings, where it is reasonable to assume that the memory
is not poisoned. However, in `sstream` this function is applied to
existing strings that already have poisoned memory.
String ASan annotations turned on here:
https://github.com/llvm/llvm-project/pull/72677
This commit addresses optimization and instrumentation challenges
encountered within comma constructors.
1) _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS does not work in comma
constructors.
2) Code inside comma constructors is not always correctly optimized.
Problematic code examples:
- `: __r_(((__str.__is_long() ? 0 : (__str.__annotate_delete(), 0)),
std::move(__str.__r_))) {`
- `: __r_(__r_([&](){ if(!__s.__is_long()) __s.__annotate_delete();
return std::move(__s.__r_);}())) {`
However, lambda with argument seems to be correctly optimized. The patch employs this.
Use of lambda based on an idea from @ldionne.
Also introduce `_LIBCPP_ASSERT_PEDANTIC` for assertions violating which
results in a no-op or other benign behavior, but which may nevertheless
indicate a bug in the invoking code.
fixes#70506
The detailed problem description is in #70506
The original proposed fix was to remove `[[no_unique_address]]` except
when `_Tp` is empty.
Edit:
After the discussion in the comments below, the new fix here is to
remove the `[[no_unique_address]]` from `movable_box` in the cases where
we need to add our own assignment operator, which has contains the
problematic `construct_at`
The status table incorrectly marks P0521R0 as nothing to do. This is not
correct the function should be deprecated.
During our latest monthly meeting we argreed to remove the
_LIBCPP_ENABLE_CXXyy_REMOVED_FEATURES macros, therefore the new macro is
not
added to that global list.
Implements
- P0521R0 Proposed Resolution for CA 14 (shared_ptr use_count/unique)
Implements parts of
- P0619R4 Reviewing Deprecated Facilities of C++17 for C++20
---------
Co-authored-by: Nikolas Klauser <nikolasklauser@berlin.de>
As pointed out in #72883, the implementation only needs to return the
value of ranges::next and does not need to obtain the value through
ranges::advance, which causes it to have O(n) complexity in the case
of random-access-sized but non-common range.
Fixes#72883
This commit implements conditional compilation for ASan helper code.
As convey to me by @EricWF, string benchmarks with UBSan have been
experiencing significant performance hit after the commit with ASan
string annotations. This is likely due to the fact that no-op ASan code
is not optimized out with UBSan. To address this issue, this commit
conditionalizes the inclusion of ASan helper function bodies using
`#ifdef` directives. This approach allows us to selectively include only
the ASan code when it's actually required, thereby enhancing
optimizations and improving performance.
While issue was noticed in string benchmarks, I expect same overhead
(just less noticeable) in other containers, therefore `std::vector` and
`std::deque` have same changes.
To see impact of that change run `string.libcxx.out` with UBSan and
`--benchmark_filter=BM_StringAssign` or
`--benchmark_filter=BM_StringConstruct`.
std::midpoint is specified by having a pointer overload in
[numeric.ops.midpoint].
With the way the pointer overload is specified, users can expect that
calling
std::midpoint as `std::midpoint<T>(a, b)` should work, but it didn't in
libc++
due to the way the pointer overload was specified.
Fixes#67046
Notable things in this commit:
* refactors `__indirect_binary_left_foldable`, making it slightly
different (but equivalent) to _`indirect-binary-left-foldable`_, which
improves readability (a [patch to the Working Paper][patch] was made)
* omits `__cpo` namespace, since it is not required for implementing
niebloids (a cleanup should happen in 2024)
* puts tests ensuring invocable robustness and dangling correctness
inside the correctness testing to ensure that the algorithms' results
are still correct
[patch]: https://github.com/cplusplus/draft/pull/6734
Finishes implementation of
- P2093R14 Formatted output
- P2539R4 Should the output of std::print to a terminal be synchronized
with the underlying stream?
Differential Revision: https://reviews.llvm.org/D156609
Currently, when libc++'s views::take specially handles an iota_view, the
addition is done after dereferencing the beginning iterator. However, in
[range.take.overview]/2.3, the addition is done before the dereferencing,
which means that the standard requires the returned iota_view to have
the same W and Bound type in such cases.
This patch fixes that, and also fixes a test that was testing the
incorrect behavior.
Fixes#75611
This patch runs clang-format on all of libcxx/include and libcxx/src, in
accordance with the RFC discussed at [1]. Follow-up patches will format
the benchmarks, the test suite and remaining parts of the code. I'm
splitting this one into its own patch so the diff is a bit easier to
review.
This patch was generated with:
find libcxx/include libcxx/src -type f \
| grep -v 'module.modulemap.in' \
| grep -v 'CMakeLists.txt' \
| grep -v 'README.txt' \
| grep -v 'libcxx.imp' \
| grep -v '__config_site.in' \
| xargs clang-format -i
A Git merge driver is available in libcxx/utils/clang-format-merge-driver.sh
to help resolve merge and rebase issues across these formatting changes.
[1]: https://discourse.llvm.org/t/rfc-clang-formatting-all-of-libc-once-and-for-all
This patch removes assumptions that std::array's iterators are raw
pointers in the source code and in our test suite. While this is true
right now, this doesn't have to be true and ion the future we might want
to enable bounded iterators in std::array, which would require this
change.
This is a pre-requisite for landing #74482
It's not that I have much love for C++03, but we should ensure that it
works. Some recent changes broke this configuration because slightly
older Clang versions don't support attribute syntax in C++03 mode.
Even though constexpr implicitly makes functions inline, we try not to
rely on this implicit effect in the code base. We are mostly consistent
about using `inline` on non-template free-functions to make it clear
that we don't have an ODR violation.
This patch simply fixes a few places where we didn't explicitly use
inline on non-template free functions, presumably because they were
constexpr.
Fixes#75227
Fixes#75002. Found while running libc++'s tests with MSVC's STL.
This is a superset of #74961 that also fixes the product code
and adds a regression test. Thanks again, @cpplearner!
To summarize: `views::split` and `views::lazy_split` aren't unary,
aren't range adaptor **closure** objects, and can't be piped. However,
\[range.adaptor.object\]/8 says that `views::split(pattern)` and
`views::lazy_split(pattern)` produce unary, pipeable, range adaptor
closure objects.
This PR adjusts the test coverage accordingly, allowing it to portably
pass for libc++ and MSVC's STL.
This commit introduces basic annotations for `std::basic_string`,
mirroring the approach used in `std::vector` and `std::deque`.
Initially, only long strings with the default allocator will be
annotated. Short strings (_SSO - short string optimization_) and strings
with non-default allocators will be annotated in the near future, with
separate commits dedicated to enabling them. The process will be similar
to the workflow employed for enabling annotations in `std::deque`.
**Please note**: these annotations function effectively only when libc++
and libc++abi dylibs are instrumented (with ASan). This aligns with the
prevailing behavior of Memory Sanitizer.
To avoid breaking everything, this commit also appends
`_LIBCPP_INSTRUMENTED_WITH_ASAN` to `__config_site` whenever libc++ is
compiled with ASan. If this macro is not defined, string annotations are
not enabled. However, linking a binary that does **not** annotate
strings with a dynamic library that annotates strings, is not permitted.
Originally proposed here: https://reviews.llvm.org/D132769
Related patches on Phabricator:
- Turning on annotations for short strings:
https://reviews.llvm.org/D147680
- Turning on annotations for all allocators:
https://reviews.llvm.org/D146214
This PR is a part of a series of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations, similar
to those existing in `std::vector` and `std::deque` collections. These
enhancements empower ASan to effectively detect instances where the
instrumented program attempts to access memory within a collection's
internal allocation that remains unused. This includes cases where
access occurs before or after the stored elements in `std::deque`, or
between the `std::basic_string`'s size (including the null terminator)
and capacity bounds.
The introduction of these annotations was spurred by a real-world
software bug discovered by Trail of Bits, involving an out-of-bounds
memory access during the comparison of two strings using the
`std::equals` function. This function was taking iterators
(`iter1_begin`, `iter1_end`, `iter2_begin`) to perform the comparison,
using a custom comparison function. When the `iter1` object exceeded the
length of `iter2`, an out-of-bounds read could occur on the `iter2`
object. Container sanitization, upon enabling these annotations, would
effectively identify and flag this potential vulnerability.
This Pull Request introduces basic annotations for `std::basic_string`.
Long strings exhibit structural similarities to `std::vector` and will
be annotated accordingly. Short strings are already implemented, but
will be turned on separately in a forthcoming commit. Look at [a
comment](https://github.com/llvm/llvm-project/pull/72677#issuecomment-1850554465)
below to read about SSO issues at current moment.
Due to the functionality introduced in
[D132522](dd1b7b797a),
the `__sanitizer_annotate_contiguous_container` function now offers
compatibility with all allocators. However, enabling this support will
be done in a subsequent commit. For the time being, only strings with
the default allocator will be annotated.
If you have any questions, please email:
- advenam.tacet@trailofbits.com
- disconnect3d@trailofbits.com
Adding months to a year_month should wrap the year when the month
becomes greater than twelve or less than one.
This fixes the issue for year_month. Other classes with a year and month
do not have this issue. This has been verified and tests are added to
avoid possible regressions.
Also fixes some variable copy-paste errors in the tests.
Fixes https://github.com/llvm/llvm-project/issues/73162
Found while running libc++'s test suite with MSVC's STL.
* I've filed [LWG-4021](https://cplusplus.github.io/LWG/issue4021)
"`mdspan::is_always_meow()` should be `noexcept`" and implemented this
in libc++'s product and test code.
* Use `LIBCPP_STATIC_ASSERT` to avoid issues with `noexcept`
strengthening in MSVC's STL.
+ As permitted by the Standard, MSVC's STL conditionally strengthens
`mdspan` construction/`is_meow`/`stride` and `elements_view` iterator
`base() &&`, and always strengthens `basic_stringbuf` `swap`.
+ In `mdspan/properties.pass.cpp`, this also upgrades runtime `assert`s
to `static_assert`s.
* Improvement: Upgrade `assert` to `static_assert` when inspecting the
`noexcept`ness of `std::ranges::iter_move`. (These `!noexcept` tests
weren't causing issues for MSVC's STL, so I didn't change them to be
libc++-specific.)
This change requires quite a number of changes in the tests; this is not
code I expect people to use in the wild. So I don't expect breakage for
users.
Implements:
- P2905R2 Runtime format strings, as a Defect Report
'isctype' fails in arm64-big-endian because the __regex_word involved
in mask operation is not changed based on the platform endianness, while
the character mask does change.
This makes libc++'s <filesystem> tests compatible with MSVC's STL.
In msvc_stdlib_force_include.h, we need to define 3 more macros:
- _CRT_DECLARE_NONSTDC_NAMES activates the POSIX names of
`getcwd` etc. As the comment explains, we need this because
we test with Clang `-fno-ms-compatibility`, which defines
`__STDC__` to `1`, which causes the UCRT headers to disable
the POSIX names by default.
- Then we need _CRT_NONSTDC_NO_WARNINGS to avoid emitting
deprecation warnings about the POSIX names.
- Finally, we need `NOMINMAX` to seal away the ancient evil.
These macros are documented in https://learn.microsoft.com/en-us/cpp/c-runtime-library/compatibility?view=msvc-170.
As a drive-by change, the patch adds a "simulated" macro for
__has_feature(hwaddress_sanitizer). It also clang-formats all
of msvc_stdlib_force_include.h and removes guards for
__has_builtin(__builtin_source_location) in <source_location>,
since those are not needed anymore.
Found while running libc++'s tests with MSVC's STL.
`ranges::rotate_copy` takes `forward_iterator`s as this test's comment
banner correctly depicts. However, this test had bogus assertions
expecting that `ranges::rotate_copy` would be constrained away for
not-quite-**bidi** iterators. @philnik777 confirmed that these were
copy-paste relics from the `ranges::reverse_copy` test.
I fixed this by replacing the assertions with the test types that aren't
quite **forward** iterators/ranges. Additionally, I noticed that the
top-level `test()` function was missing coverage with the weakest
possible `forward_iterator<int*>`.
This revealed that the product code in `ranges_rotate_copy.h` was
similarly damaged. In addition to fixing it by taking `forward_iterator`
and `forward_range` as depicted in the Standard, this drops the
inclusion of `<__iterator/reverse_iterator.h>` as this algorithm doesn't
need `std::__reverse_range`.
This commit refactors the ASan annotation functions in libc++ to reduce
unnecessary code duplication. Additionally it adds a small optimization.
- Eliminates two redundant function versions by utilizing the
`[[maybe_unused]]` attribute and guarding function bodies with `#ifndef
_LIBCPP_HAS_NO_ASAN`.
- Introduces an additional guard to an auxiliary function, allowing the
removal of a no-ops function body. This approach avoids relying on the
optimizer for code elimination.
Fixes#73043
This is in preparation for clang-formatting the whole code base. These
annotations are required either to avoid clang-format bugs or because
the manually formatted code is significantly more readable than the
clang-formatted alternative. All in all, it seems like very few
annotations are required, which means that clang-format is doing a very
good job in most cases.
In preparation for running clang-format on the whole code base, we are
also removing mentions of the legacy _LIBCPP_INLINE_VISIBILITY macro in
favor of the newer _LIBCPP_HIDE_FROM_ABI.
We're still leaving the definition of _LIBCPP_INLINE_VISIBILITY to avoid
creating needless breakage in case some older patches are checked-in
with mentions of the old macro. After we branch for LLVM 18, we can do
another pass to clean up remaining uses of the macro that might have
gotten introduced by mistake (if any) and remove the macro itself at the
same time. This is just a minor convenience to smooth out the transition
as much as possible.
See
https://discourse.llvm.org/t/rfc-clang-formatting-all-of-libc-once-and-for-all
for the clang-format proposal.
The header is used in the dylib, this is not an issue at the moment
since the dylib is built using C++23 but it would be when building
with C++26.
Post release comments in #72496 seem to indicate this removal is an
issue for Fuchsia.
This paper was voted in as a DR, so it's retroactively enabled back to
C++20; the C++ version that introduced std::format.
Implements:
- P2909R4 Fix formatting of code units as integers (Dude, where’s my
``char``?)
Locale objects use atomic reference counting, which may be very
expensive in parallel applications. The classic locale is used by
default by all streams and can be very contended. But it's never
destroyed, so the reference counting is also completely pointless on the
classic locale. Currently ~70% of time in the parallel stringstream
benchmarks is spent in locale ctor/dtor. And the execution radically
slows down with more threads.
Avoid reference counting on the classic locale. With this change
parallel benchmarks start to scale with threads.
This is a re-application of f8afc53d64 (aka PR #72112) which was
reverted in 4e0c48b907 because it broke the sanitizer builds due
to an initialization order fiasco. This issue has now been fixed by
ensuring that the locale is constinit'ed.
Co-authored-by: Dmitry Vyukov <dvyukov@google.com>
<filesystem> is a C++17 addition. In C++11 and C++14 modes, we actually
have all the code for <filesystem> but it is hidden behind a non-inline
namespace __fs so it is not accessible. Instead of doing this unusual
dance, just guard the code for filesystem behind a classic C++17 check
like we normally do.
The intent of these particular functions, since their introduction, was
to NOT be inlinable.
However, the mechanism by which this was accomplished was non-obvious,
and stopped working when string is compiled for C++20.
A longstanding behavior specified by the C++ standard is that
instantiation of the body of a template function is suppressed by an
extern template declaration -- unless the function is explicitly marked
either constexpr or inline. Of course, if the body is not instantiated,
then it cannot possibly be inlined, and thus all the functions listed in
libcxx/include/__string/extern_template_lists.h were uninlineable.
But, in C++20 mode, string functions were annotated constexpr, which
means they _are_ instantiated, and do become inlineable. And, in fact,
they do get inlined, which has caused noticeable binary-size growth for
users.
For example, in C++17,
`std::string f(std::string *in) { return *in; }`
does not inline the copy-constructor call, and instead generates a call
to the exported function defined in the libc++ shared library.
I think we probably don't want to mark all functions that are currently
in the extern template list as noinline, as many of them really are
reasonable inlining candidates. Thus, I've restricted this change to
only the few functions that were clearly intended to be outlined.
See commits like b019c5c037 (and some
others like it) for background, in which functions were removed from the
extern template list in the unstable ABI in order to allow the
short-string case to be inlined, while moving the long-string case to a
separate function, added to the extern template list.
Several experimental headers around std::pmr have been slated for
removal for a while now. This patch actually performs the removal and
cleanups from the code base.
I don't know when, but at some point we lost test coverage to ensue that
all the headers are in the modulemap. This adds a test to make sure all
the headers (excluding a few which shouldn't be part of the modulemap)
are at least mentioned. This also fixes a few headers which bit-rotted
while we were missing the coverage.
This makes the conditionals quite a bit simpler to understand, since it
avoids double negatives and makes sure we have <__availability>
included. For vendors which use availability macros, it also enforces
that they check when specific features are introduced and define the
macro for their platform appropriately.
We have quite a few macros scattered around in `<__config>` which are
there for QoI purposes. To make things a bit simpler this patch moves
all these attributes into a single place.
When working on an OpenMP offloading backend for standard parallel
algorithms (https://github.com/llvm/llvm-project/pull/66968) we noticed
the need of a generalization of `__is_trivial_plus_operation`. This patch
merges `__is_trivial_equality_predicate` and `__is_trivial_plus_operation`
into `__desugars_to`, and in the future we might extend the latter to support
other binary operations as well.
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
The creation of the global and the classic locales was pretty twisty.
This patch refactors how this is done to reduce the amount of
indirections and prepare the terrain for a future where GCC implements
the no_destroy attribute.
Their definition was a bit roundabout and it was actually wrong since
atomic_unsigned_lock_free would be a signed type whenever
__cxx_contention_t is lock free, which is most of the time.
Fixes#72968
This patch re-introduces special support for narrowing conversions to
bool
in std::variant, which was removed in 170810fca6 in order to make
libc++
Standards-conforming.
The special support is gated by the
`_LIBCPP_ENABLE_NARROWING_CONVERSIONS_IN_VARIANT`
macro and will be supported for LLVM 18 only as a courtesy to help large
code bases migrate over to the Standard behavior.
---------
Co-authored-by: Bogdan Graur <bgraur@google.com>
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
This patch brings std::ios_base::noreplace from P2467R1 to libc++.
This requires compiling the shared library in C++23 mode since otherwise
fstream::open(...) doesn't know about the new flag.
Differential Revision: https://reviews.llvm.org/D137640
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
This enables all optimizations that rely on
`is_trivially_equality_comparable` to work with these integral types,
for example `std::equal` and `std::find`.
Instead of using individual macros to turn off missing C library
features, we use the using_if_exists attribute now. This patch removes
the _LIBCPP_HAS_NO_FGETPOS_FSETPOS macro used to workaround missing
fgetpos and fsetpos on older versions of Android -- using_if_exists
should take care of those in the headers and we should add appropriate
XFAILs to the tests instead of using TEST_HAS_NO_FGETPOS_FSETPOS.