llvm-capstone

mirror of https://github.com/capstone-engine/llvm-capstone.git synced 2024-11-27 07:31:28 +00:00

Author	SHA1	Message	Date
itrofimow	51e91b64d0	[libc++abi] Implement __cxa_init_primary_exception and use it to optimize std::make_exception_ptr (#65534 ) This patch implements __cxa_init_primary_exception, an extension to the Itanium C++ ABI. This extension is already present in both libsupc++ and libcxxrt. This patch also starts making use of this function in std::make_exception_ptr: instead of going through a full throw/catch cycle, we are now able to initialize an exception directly, thus making std::make_exception_ptr around 30x faster.	2024-01-22 10:12:41 -05:00
ZijunZhaoCCK	fdd089b500	[libc++] Implement ranges::contains (#65148 ) Differential Revision: https://reviews.llvm.org/D159232 ``` Running ./ranges_contains.libcxx.out Run on (10 X 24.121 MHz CPU s) CPU Caches: L1 Data 64 KiB (x10) L1 Instruction 128 KiB (x10) L2 Unified 4096 KiB (x5) Load Average: 3.37, 6.77, 5.27 -------------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------------- bm_contains_char/16 1.88 ns 1.87 ns 371607095 bm_contains_char/256 7.48 ns 7.47 ns 93292285 bm_contains_char/4096 99.7 ns 99.6 ns 7013185 bm_contains_char/65536 1296 ns 1294 ns 540436 bm_contains_char/1048576 23887 ns 23860 ns 29302 bm_contains_char/16777216 389420 ns 389095 ns 1796 bm_contains_int/16 7.14 ns 7.14 ns 97776288 bm_contains_int/256 90.4 ns 90.3 ns 7558089 bm_contains_int/4096 1294 ns 1290 ns 543052 bm_contains_int/65536 20482 ns 20443 ns 34334 bm_contains_int/1048576 328817 ns 327965 ns 2147 bm_contains_int/16777216 5246279 ns 5239361 ns 133 bm_contains_bool/16 2.19 ns 2.19 ns 322565780 bm_contains_bool/256 3.42 ns 3.41 ns 205025467 bm_contains_bool/4096 22.1 ns 22.1 ns 31780479 bm_contains_bool/65536 333 ns 332 ns 2106606 bm_contains_bool/1048576 5126 ns 5119 ns 135901 bm_contains_bool/16777216 81656 ns 81574 ns 8569 ``` --------- Co-authored-by: Nathan Gauër <brioche@google.com>	2023-12-19 16:34:19 -08:00
Nikolas Klauser	f7407411a1	[libc++] Optimize std::find for segmented iterators (#67224 ) ``` -------------------------------------------------------------------------- Benchmark old new -------------------------------------------------------------------------- bm_find<std::deque<char>>/1 6.06 ns 10.6 ns bm_find<std::deque<char>>/2 15.5 ns 10.6 ns bm_find<std::deque<char>>/3 19.0 ns 10.6 ns bm_find<std::deque<char>>/4 20.8 ns 10.6 ns bm_find<std::deque<char>>/5 22.0 ns 10.6 ns bm_find<std::deque<char>>/6 23.0 ns 10.5 ns bm_find<std::deque<char>>/7 24.8 ns 10.7 ns bm_find<std::deque<char>>/8 25.7 ns 10.6 ns bm_find<std::deque<char>>/16 28.3 ns 10.6 ns bm_find<std::deque<char>>/64 44.2 ns 27.0 ns bm_find<std::deque<char>>/512 133 ns 37.6 ns bm_find<std::deque<char>>/4096 867 ns 53.1 ns bm_find<std::deque<char>>/32768 6838 ns 160 ns bm_find<std::deque<char>>/262144 52897 ns 1495 ns bm_find<std::deque<char>>/1048576 215621 ns 6077 ns bm_find<std::deque<short>>/1 6.03 ns 6.28 ns bm_find<std::deque<short>>/2 15.8 ns 15.8 ns bm_find<std::deque<short>>/3 20.5 ns 20.3 ns bm_find<std::deque<short>>/4 21.0 ns 21.0 ns bm_find<std::deque<short>>/5 23.0 ns 22.1 ns bm_find<std::deque<short>>/6 22.6 ns 23.0 ns bm_find<std::deque<short>>/7 23.4 ns 23.7 ns bm_find<std::deque<short>>/8 24.4 ns 24.9 ns bm_find<std::deque<short>>/16 26.6 ns 27.2 ns bm_find<std::deque<short>>/64 43.2 ns 40.9 ns bm_find<std::deque<short>>/512 124 ns 90.7 ns bm_find<std::deque<short>>/4096 845 ns 525 ns bm_find<std::deque<short>>/32768 7273 ns 3194 ns bm_find<std::deque<short>>/262144 53710 ns 24385 ns bm_find<std::deque<short>>/1048576 216086 ns 96195 ns bm_find<std::deque<int>>/1 6.03 ns 10.3 ns bm_find<std::deque<int>>/2 15.6 ns 10.3 ns bm_find<std::deque<int>>/3 19.1 ns 10.3 ns bm_find<std::deque<int>>/4 22.3 ns 10.3 ns bm_find<std::deque<int>>/5 23.5 ns 10.4 ns bm_find<std::deque<int>>/6 23.1 ns 10.3 ns bm_find<std::deque<int>>/7 23.7 ns 10.2 ns bm_find<std::deque<int>>/8 24.5 ns 10.2 ns bm_find<std::deque<int>>/16 27.9 ns 26.6 ns bm_find<std::deque<int>>/64 42.6 ns 32.2 ns bm_find<std::deque<int>>/512 123 ns 43.0 ns bm_find<std::deque<int>>/4096 874 ns 93.5 ns bm_find<std::deque<int>>/32768 7031 ns 751 ns bm_find<std::deque<int>>/262144 57723 ns 6169 ns bm_find<std::deque<int>>/1048576 230867 ns 35851 ns bm_ranges_find<std::deque<char>>/1 5.97 ns 10.6 ns bm_ranges_find<std::deque<char>>/2 16.0 ns 10.5 ns bm_ranges_find<std::deque<char>>/3 19.5 ns 10.5 ns bm_ranges_find<std::deque<char>>/4 21.1 ns 10.6 ns bm_ranges_find<std::deque<char>>/5 22.8 ns 10.5 ns bm_ranges_find<std::deque<char>>/6 22.8 ns 10.6 ns bm_ranges_find<std::deque<char>>/7 23.4 ns 10.8 ns bm_ranges_find<std::deque<char>>/8 24.1 ns 10.5 ns bm_ranges_find<std::deque<char>>/16 26.9 ns 10.6 ns bm_ranges_find<std::deque<char>>/64 50.2 ns 27.2 ns bm_ranges_find<std::deque<char>>/512 126 ns 38.3 ns bm_ranges_find<std::deque<char>>/4096 868 ns 53.8 ns bm_ranges_find<std::deque<char>>/32768 6695 ns 161 ns bm_ranges_find<std::deque<char>>/262144 54411 ns 1497 ns bm_ranges_find<std::deque<char>>/1048576 241699 ns 6042 ns bm_ranges_find<std::deque<short>>/1 6.39 ns 6.31 ns bm_ranges_find<std::deque<short>>/2 15.8 ns 15.9 ns bm_ranges_find<std::deque<short>>/3 19.0 ns 19.8 ns bm_ranges_find<std::deque<short>>/4 20.8 ns 20.9 ns bm_ranges_find<std::deque<short>>/5 21.8 ns 22.1 ns bm_ranges_find<std::deque<short>>/6 23.0 ns 23.0 ns bm_ranges_find<std::deque<short>>/7 23.2 ns 23.9 ns bm_ranges_find<std::deque<short>>/8 23.7 ns 24.4 ns bm_ranges_find<std::deque<short>>/16 26.6 ns 26.8 ns bm_ranges_find<std::deque<short>>/64 43.4 ns 39.7 ns bm_ranges_find<std::deque<short>>/512 131 ns 90.5 ns bm_ranges_find<std::deque<short>>/4096 851 ns 523 ns bm_ranges_find<std::deque<short>>/32768 7370 ns 3166 ns bm_ranges_find<std::deque<short>>/262144 60778 ns 24814 ns bm_ranges_find<std::deque<short>>/1048576 229288 ns 99273 ns bm_ranges_find<std::deque<int>>/1 6.43 ns 10.2 ns bm_ranges_find<std::deque<int>>/2 16.6 ns 10.2 ns bm_ranges_find<std::deque<int>>/3 19.6 ns 10.2 ns bm_ranges_find<std::deque<int>>/4 21.0 ns 10.2 ns bm_ranges_find<std::deque<int>>/5 21.9 ns 10.4 ns bm_ranges_find<std::deque<int>>/6 22.7 ns 10.2 ns bm_ranges_find<std::deque<int>>/7 23.9 ns 10.2 ns bm_ranges_find<std::deque<int>>/8 23.8 ns 10.2 ns bm_ranges_find<std::deque<int>>/16 27.2 ns 27.1 ns bm_ranges_find<std::deque<int>>/64 42.4 ns 32.4 ns bm_ranges_find<std::deque<int>>/512 122 ns 43.0 ns bm_ranges_find<std::deque<int>>/4096 895 ns 93.7 ns bm_ranges_find<std::deque<int>>/32768 6890 ns 756 ns bm_ranges_find<std::deque<int>>/262144 54025 ns 6102 ns bm_ranges_find<std::deque<int>>/1048576 221558 ns 32783 ns ```	2023-12-15 17:10:16 +01:00
Hui	36dd743212	[libc++][test] add more benchmarks for `stop_token` (#69590 )	2023-12-02 16:14:38 +00:00
Louis Dionne	1a5af34e6f	[libc++] Speed up classic locale (take 2) (#73533 ) Locale objects use atomic reference counting, which may be very expensive in parallel applications. The classic locale is used by default by all streams and can be very contended. But it's never destroyed, so the reference counting is also completely pointless on the classic locale. Currently ~70% of time in the parallel stringstream benchmarks is spent in locale ctor/dtor. And the execution radically slows down with more threads. Avoid reference counting on the classic locale. With this change parallel benchmarks start to scale with threads. This is a re-application of `f8afc53d64` (aka PR #72112) which was reverted in `4e0c48b907` because it broke the sanitizer builds due to an initialization order fiasco. This issue has now been fixed by ensuring that the locale is constinit'ed. Co-authored-by: Dmitry Vyukov <dvyukov@google.com>	2023-11-29 09:31:05 -05:00
Louis Dionne	ac8c9f1e39	[libc++] Properly guard std::filesystem with >= C++17 (#72701 ) <filesystem> is a C++17 addition. In C++11 and C++14 modes, we actually have all the code for <filesystem> but it is hidden behind a non-inline namespace __fs so it is not accessible. Instead of doing this unusual dance, just guard the code for filesystem behind a classic C++17 check like we normally do.	2023-11-28 18:41:59 -05:00
Kirill Stoimenov	4e0c48b907	Revert "[libc++] Speed up classic locale (#72112 )" Looks like it broke the ASAN build: https://lab.llvm.org/buildbot/#/builders/168/builds/17053/steps/9/logs/stdio This reverts commit `f8afc53d64`.	2023-11-27 20:21:58 +00:00
Dmitry Vyukov	f8afc53d64	[libc++] Speed up classic locale (#72112 ) Locale objects use atomic reference counting, which may be very expensive in parallel applications. The classic locale is used by default by all streams and can be very contended. But it's never destroyed, so the reference counting is also completely pointless on the classic locale. Currently ~70% of time in the parallel stringstream benchmarks is spent in locale ctor/dtor. And the execution radically slows down with more threads. Avoid reference counting on the classic locale. With this change parallel benchmarks start to scale with threads. Co-authored-by: Louis Dionne <ldionne.2@gmail.com> ``` │ baseline │ optimized │ │ sec/op │ sec/op vs base │ Istream_numbers/0/threads:1 4.672µ ± 0% 4.419µ ± 0% -5.42% (p=0.000 n=30+39) Istream_numbers/0/threads:72 539.817µ ± 0% 9.842µ ± 1% -98.18% (p=0.000 n=30+40) Istream_numbers/1/threads:1 4.890µ ± 0% 4.750µ ± 0% -2.85% (p=0.000 n=30+40) Istream_numbers/1/threads:72 66.44µ ± 1% 10.14µ ± 1% -84.74% (p=0.000 n=30+40) Istream_numbers/2/threads:1 4.888µ ± 0% 4.746µ ± 0% -2.92% (p=0.000 n=30+40) Istream_numbers/2/threads:72 494.8µ ± 0% 410.2µ ± 1% -17.11% (p=0.000 n=30+40) Istream_numbers/3/threads:1 4.697µ ± 0% 4.695µ ± 5% ~ (p=0.391 n=30+37) Istream_numbers/3/threads:72 421.5µ ± 7% 421.9µ ± 9% ~ (p=0.665 n=30) Ostream_number/0/threads:1 183.0n ± 0% 141.0n ± 2% -22.95% (p=0.000 n=30) Ostream_number/0/threads:72 24196.5n ± 1% 343.5n ± 3% -98.58% (p=0.000 n=30) Ostream_number/1/threads:1 250.0n ± 0% 196.0n ± 2% -21.60% (p=0.000 n=30) Ostream_number/1/threads:72 16260.5n ± 0% 407.0n ± 2% -97.50% (p=0.000 n=30) Ostream_number/2/threads:1 254.0n ± 0% 196.0n ± 1% -22.83% (p=0.000 n=30) Ostream_number/2/threads:72 28.49µ ± 1% 18.89µ ± 5% -33.72% (p=0.000 n=30) Ostream_number/3/threads:1 185.0n ± 0% 185.0n ± 0% 0.00% (p=0.017 n=30) Ostream_number/3/threads:72 19.38µ ± 4% 19.33µ ± 5% ~ (p=0.425 n=30) ```	2023-11-27 07:00:21 +01:00
Nikolas Klauser	c81bfc61da	[libc++] Optimize for_each for segmented iterators ``` --------------------------------------------------- Benchmark old new --------------------------------------------------- bm_for_each/1 3.00 ns 2.98 ns bm_for_each/2 4.53 ns 4.57 ns bm_for_each/3 5.82 ns 5.82 ns bm_for_each/4 6.94 ns 6.91 ns bm_for_each/5 7.55 ns 7.75 ns bm_for_each/6 7.06 ns 7.45 ns bm_for_each/7 6.69 ns 7.14 ns bm_for_each/8 6.86 ns 4.06 ns bm_for_each/16 11.5 ns 5.73 ns bm_for_each/64 43.7 ns 4.06 ns bm_for_each/512 356 ns 7.98 ns bm_for_each/4096 2787 ns 53.6 ns bm_for_each/32768 20836 ns 438 ns bm_for_each/262144 195362 ns 4945 ns bm_for_each/1048576 685482 ns 19822 ns ``` Reviewed By: ldionne, Mordante, #libc Spies: bgraur, sberg, arichardson, libcxx-commits Differential Revision: https://reviews.llvm.org/D151274	2023-11-14 23:55:24 +01:00
Hui	511236e074	[libc++][test] Add `stop_token` benchmark (#69117 ) This is transforming the `stop_token` benchmark that Lewis Baker had created into Google Bench https://reviews.llvm.org/D154702	2023-10-16 21:49:37 +01:00
Louis Dionne	36bb134ac7	[libc++] Use -nostdlib++ on GCC unconditionally (#68832 ) We support GCC 13, which supports the flag. This allows simplifying the CMake logic around the use of -nostdlib++. Note that there are other places where we don't assume -nostdlib++ yet in the build, but this patch is intentionally trying to be small because this part of our CMake is pretty tricky.	2023-10-13 11:53:57 -07:00
Nikolas Klauser	a9138cdb36	[libc++] Optimize ranges::count for __bit_iterators ``` --------------------------------------------------------------- Benchmark old new --------------------------------------------------------------- bm_vector_bool_count/1 1.92 ns 1.92 ns bm_vector_bool_count/2 1.92 ns 1.92 ns bm_vector_bool_count/3 1.92 ns 1.92 ns bm_vector_bool_count/4 1.92 ns 1.92 ns bm_vector_bool_count/5 1.92 ns 1.92 ns bm_vector_bool_count/6 1.92 ns 1.92 ns bm_vector_bool_count/7 1.92 ns 1.92 ns bm_vector_bool_count/8 1.92 ns 1.92 ns bm_vector_bool_count/16 1.92 ns 1.92 ns bm_vector_bool_count/64 2.24 ns 2.25 ns bm_vector_bool_count/512 3.19 ns 3.20 ns bm_vector_bool_count/4096 14.1 ns 12.3 ns bm_vector_bool_count/32768 84.0 ns 83.6 ns bm_vector_bool_count/262144 664 ns 661 ns bm_vector_bool_count/1048576 2623 ns 2628 ns bm_vector_bool_ranges_count/1 1.07 ns 1.92 ns bm_vector_bool_ranges_count/2 1.65 ns 1.92 ns bm_vector_bool_ranges_count/3 2.27 ns 1.92 ns bm_vector_bool_ranges_count/4 2.68 ns 1.92 ns bm_vector_bool_ranges_count/5 3.33 ns 1.92 ns bm_vector_bool_ranges_count/6 3.99 ns 1.92 ns bm_vector_bool_ranges_count/7 4.67 ns 1.92 ns bm_vector_bool_ranges_count/8 5.19 ns 1.92 ns bm_vector_bool_ranges_count/16 11.1 ns 1.92 ns bm_vector_bool_ranges_count/64 52.2 ns 2.24 ns bm_vector_bool_ranges_count/512 452 ns 3.20 ns bm_vector_bool_ranges_count/4096 3577 ns 12.1 ns bm_vector_bool_ranges_count/32768 28725 ns 83.7 ns bm_vector_bool_ranges_count/262144 229676 ns 662 ns bm_vector_bool_ranges_count/1048576 905574 ns 2625 ns ``` Reviewed By: #libc, ldionne Spies: arichardson, ldionne, libcxx-commits Differential Revision: https://reviews.llvm.org/D156956	2023-10-06 22:58:41 +02:00
Martijn Vels	6fe4e033f0	[libc++] Optimize vector push_back to avoid continuous load and store of end pointer Credits: this change is based on analysis and a proof of concept by gerbens@google.com. Before, the compiler loses track of end as 'this' and other references possibly escape beyond the compiler's scope. This can be see in the generated assembly: 16.28 │200c80: mov %r15d,(%rax) 60.87 │200c83: add $0x4,%rax │200c87: mov %rax,-0x38(%rbp) 0.03 │200c8b: → jmpq 200d4e ... ... 1.69 │200d4e: cmp %r15d,%r12d │200d51: → je 200c40 16.34 │200d57: inc %r15d 0.05 │200d5a: mov -0x38(%rbp),%rax 3.27 │200d5e: mov -0x30(%rbp),%r13 1.47 │200d62: cmp %r13,%rax │200d65: → jne 200c80 We fix this by always explicitly storing the loaded local and pointer back at the end of push back. This generates some slight source 'noise', but creates nice and compact fast path code, i.e.: 32.64 │200760: mov %r14d,(%r12) 9.97 │200764: add $0x4,%r12 6.97 │200768: mov %r12,-0x38(%rbp) 32.17 │20076c: add $0x1,%r14d 2.36 │200770: cmp %r14d,%ebx │200773: → je 200730 8.98 │200775: mov -0x30(%rbp),%r13 6.75 │200779: cmp %r13,%r12 │20077c: → jne 200760 Now there is a single store for the push_back value (as before), and a single store for the end without a reload (dependency). For fully local vectors, (i.e., not referenced elsewhere), the capacity load and store inside the loop could also be removed, but this requires more substantial refactoring inside vector. Differential Revision: https://reviews.llvm.org/D80588	2023-10-02 09:12:37 -04:00
Konstantin Varlamov	dd788af74a	Reapply "[libc++][ranges] Add benchmarks for the `from_range` constructors of `vector` and `deque`." (#67753 ) This reverts commit `10edd5d943` and guards against older versions of GCC to work around the problem.	2023-09-29 10:27:20 -04:00
Aaron Ballman	10edd5d943	Revert "[libc++][ranges] Add benchmarks for the `from_range` constructors of `vector` and `deque`." This reverts commit `390ac82317`. It broke the sphinx publish bots for our documentation: https://lab.llvm.org/buildbot/#/builders/242/builds/1130 because that machine has GCC 9.4.0 which does not know about C++23	2023-09-25 10:40:51 -04:00
Jake Egan	1abff08e9e	[AIX] cxx_std_23 is currently not a known feature to ibm-clang (#66952 ) Temporary workaround for the following CMake error: ``` CMake Error in /llvm/libcxx/benchmarks/CMakeLists.txt: The compiler feature "cxx_std_23" is not known to CXX compiler "IBMClang" ```	2023-09-21 10:34:07 -04:00
Louis Dionne	8bbcc6d548	[libc++] Add test coverage for unordered containers comparison (#66692 ) This patch is a melting pot of changes picked up from https://llvm.org/D61878. It adds a few tests checking corner cases of unordered containers comparison and adds benchmarks for a few unordered_set operations.	2023-09-21 05:11:49 -04:00
Zijun Zhao	0218ea4aaa	[libc++] Implement ranges::ends_with Reviewed By: #libc, var-const Differential Revision: https://reviews.llvm.org/D150831	2023-09-18 11:56:10 -07:00
Dmitry Ilvokhin	3df9c2984b	[libc++] Fix minor warnings in libcxx benchmarks This fixes some unused variable and "comparison of integers of different signs" warnings in the benchmarks. Differential Revision: https://reviews.llvm.org/D156613	2023-09-13 17:05:56 -04:00
Louis Dionne	d671126ad0	[libc++][NFC] Remove stray #if 1 that was probably a debugging leftover	2023-09-12 17:19:28 -04:00
Sirui Mu	679c0b48d7	[libc++abi] Refactor around __dynamic_cast This commit contains refactorings around __dynamic_cast without changing its behavior. Some important changes include: - Refactor __dynamic_cast into various small helper functions; - Move dynamic_cast_stress.pass.cpp to libcxx/benchmarks and refactor it into a benchmark. The benchmark performance numbers are updated as well. Differential Revision: https://reviews.llvm.org/D138006	2023-09-08 11:47:24 -04:00
varconst	390ac82317	[libc++][ranges] Add benchmarks for the `from_range` constructors of `vector` and `deque`. Differential Revision: https://reviews.llvm.org/D150747	2023-09-05 15:57:45 -07:00
Nikolas Klauser	68b1035965	[libc++][PSTL] Add a __parallel_sort implementation to libdispatch Reviewed By: #libc, ldionne Spies: ldionne, libcxx-commits Differential Revision: https://reviews.llvm.org/D155136	2023-08-15 12:20:40 -07:00
Edoardo Sanguineti	66bb6b4fda	[libc++] Optimize internal function in <system_error> In the event the internal function __init is called with an empty string the code will take unnecessary extra steps, in addition, the code generated might be overall greater because, to my understanding, when initializing a string with an empty `const char*` "" (like in this case), the compiler might be unable to deduce the string is indeed empty at compile time and more code is generated. The goal of this patch is to make a new internal function that will accept just an error code skipping the empty string argument. It should skip the unnecessary steps and in the event `if (ec)` is `false`, it will return an empty string using the correct ctor, avoiding any extra code generation issues. After the conversation about this patch matured in the libcxx channel on the LLVM Discord server, the patch was analyzed quickly with "Compiler Explorer" and other tools and it was discovered that it does indeed reduce the amount of code generated when using the latest stable clang version (16) which in turn produces faster code. This patch targets LLVM 18 as it will break the ABI by addressing https://github.com/llvm/llvm-project/issues/63985 Benchmark tests run on other machines as well show in the best case, that the new version without the extra string as an argument performs 10 times faster. On the buildkite CI run it shows the new code takes less CPU time as well. In conclusion, the new code should also just appear cleaner because there are fewer checks to do when there is no message. Reviewed By: #libc, philnik Spies: emaste, nemanjai, philnik, libcxx-commits Differential Revision: https://reviews.llvm.org/D155820	2023-08-11 13:08:45 -07:00
Nikolas Klauser	8670b53e11	[libc++] Optimize ranges::find for vector<bool> Benchmark results: ``` ---------------------------------------------------------------- Benchmark old new ---------------------------------------------------------------- bm_vector_bool_ranges_find/1 5.64 ns 6.08 ns bm_vector_bool_ranges_find/2 16.5 ns 6.03 ns bm_vector_bool_ranges_find/3 20.3 ns 6.07 ns bm_vector_bool_ranges_find/4 22.2 ns 6.08 ns bm_vector_bool_ranges_find/5 23.5 ns 6.05 ns bm_vector_bool_ranges_find/6 24.4 ns 6.10 ns bm_vector_bool_ranges_find/7 26.7 ns 6.10 ns bm_vector_bool_ranges_find/8 25.0 ns 6.08 ns bm_vector_bool_ranges_find/16 27.9 ns 6.07 ns bm_vector_bool_ranges_find/64 44.5 ns 5.35 ns bm_vector_bool_ranges_find/512 243 ns 25.7 ns bm_vector_bool_ranges_find/4096 1858 ns 35.6 ns bm_vector_bool_ranges_find/32768 15461 ns 93.5 ns bm_vector_bool_ranges_find/262144 126462 ns 571 ns bm_vector_bool_ranges_find/1048576 497736 ns 2272 ns ``` Reviewed By: #libc, Mordante Spies: var-const, Mordante, libcxx-commits Differential Revision: https://reviews.llvm.org/D156039	2023-08-01 10:28:25 -07:00
Dmitry Ilvokhin	78addb2c32	Use hash value checks optimizations consistently There are couple of optimizations of `__hash_table::find` which are applicable to other places like `__hash_table::__node_insert_unique_prepare` and `__hash_table::__emplace_unique_key_args`. ``` for (__nd = __nd->__next_; __nd != nullptr && (__nd->__hash() == __hash // ^^^^^^^^^^^^^^^^^^^^^^ // (1) \|\| std::__constrain_hash(__nd->__hash(), __bc) == __chash); __nd = __nd->__next_) { if ((__nd->__hash() == __hash) // ^^^^^^^^^^^^^^^^^^^^^^^^^^ // (2) && key_eq()(__nd->__upcast()->__value_, __k)) return iterator(__nd, this); } ``` (1): We can avoid expensive modulo operations from `std::__constrain_hash` if hashes matched. This one is from commit `6a411472e3`. (2): We can avoid `key_eq` calls if hashes didn't match. Commit: `318d35a7bc`. Both of them are applicable for insert and emplace methods. Results of unordered_set_operations benchmark: ``` Comparing /tmp/main to /tmp/hashtable-hash-value-optimization Benchmark Time CPU Time Old Time New CPU Old CPU New ------------------------------------------------------------------------------------------------------------------------------------------------------ BM_Hash/uint32_random_std_hash/1024 -0.0127 -0.0127 1511 1492 1508 1489 BM_Hash/uint32_random_custom_hash/1024 +0.0012 +0.0013 1370 1371 1367 1369 BM_Hash/uint32_top_std_hash/1024 -0.0027 -0.0028 1502 1497 1498 1494 BM_Hash/uint32_top_custom_hash/1024 +0.0033 +0.0032 1368 1373 1365 1370 BM_InsertValue/unordered_set_uint32/1024 +0.0267 +0.0266 36421 37392 36350 37318 BM_InsertValue/unordered_set_uint32_sorted/1024 +0.0230 +0.0229 28247 28897 28193 28837 BM_InsertValue/unordered_set_top_bits_uint32/1024 +0.0492 +0.0491 31012 32539 30952 32472 BM_InsertValueRehash/unordered_set_top_bits_uint32/1024 +0.0523 +0.0520 62905 66197 62780 66043 BM_InsertValue/unordered_set_string/1024 -0.0252 -0.0253 300762 293189 299805 292221 BM_InsertValueRehash/unordered_set_string/1024 -0.0932 -0.0920 332924 301882 331276 300810 BM_InsertValue/unordered_set_prefixed_string/1024 -0.0578 -0.0577 314239 296072 313222 295137 BM_InsertValueRehash/unordered_set_prefixed_string/1024 -0.0986 -0.0985 336091 302950 334982 301995 BM_Find/unordered_set_random_uint64/1024 -0.1416 -0.1417 16075 13798 16041 13769 BM_FindRehash/unordered_set_random_uint64/1024 -0.0105 -0.0105 5900 5838 5889 5827 BM_Find/unordered_set_sorted_uint64/1024 +0.0014 +0.0014 2813 2817 2807 2811 BM_FindRehash/unordered_set_sorted_uint64/1024 -0.0247 -0.0249 5863 5718 5851 5706 BM_Find/unordered_set_sorted_uint128/1024 +0.0113 +0.0112 15570 15746 15539 15713 BM_FindRehash/unordered_set_sorted_uint128/1024 +0.0438 +0.0441 6917 7220 6902 7206 BM_Find/unordered_set_sorted_uint32/1024 -0.0020 -0.0020 3098 3091 3092 3085 BM_FindRehash/unordered_set_sorted_uint32/1024 +0.0570 +0.0569 5377 5684 5368 5673 BM_Find/unordered_set_sorted_large_uint64/1024 +0.0081 +0.0081 3594 3623 3587 3616 BM_FindRehash/unordered_set_sorted_large_uint64/1024 -0.0542 -0.0540 6154 5820 6140 5808 BM_Find/unordered_set_top_bits_uint64/1024 -0.0061 -0.0061 10440 10377 10417 10353 BM_FindRehash/unordered_set_top_bits_uint64/1024 +0.0131 +0.0128 5852 5928 5840 5914 BM_Find/unordered_set_string/1024 -0.0352 -0.0349 189037 182384 188389 181809 BM_FindRehash/unordered_set_string/1024 -0.0309 -0.0311 180718 175142 180141 174532 BM_Find/unordered_set_prefixed_string/1024 -0.0559 -0.0557 190853 180177 190251 179659 BM_FindRehash/unordered_set_prefixed_string/1024 -0.0563 -0.0561 182396 172136 181797 171602 BM_Rehash/unordered_set_string_arg/1024 -0.0244 -0.0241 27052 26393 26989 26339 BM_Rehash/unordered_set_int_arg/1024 -0.0410 -0.0410 19582 18779 19539 18738 BM_InsertDuplicate/unordered_set_int/1024 +0.0023 +0.0025 12168 12196 12142 12173 BM_InsertDuplicate/unordered_set_string/1024 -0.0505 -0.0504 189238 179683 188648 179133 BM_InsertDuplicate/unordered_set_prefixed_string/1024 -0.0989 -0.0987 198893 179222 198263 178702 BM_EmplaceDuplicate/unordered_set_int/1024 -0.0175 -0.0173 12674 12452 12646 12427 BM_EmplaceDuplicate/unordered_set_string/1024 -0.0559 -0.0557 190104 179481 189492 178934 BM_EmplaceDuplicate/unordered_set_prefixed_string/1024 -0.1111 -0.1110 201233 178870 200608 178341 BM_InsertDuplicate/unordered_set_int_insert_arg/1024 -0.0747 -0.0745 12993 12022 12964 11997 BM_InsertDuplicate/unordered_set_string_insert_arg/1024 -0.0584 -0.0583 191489 180311 190864 179731 BM_EmplaceDuplicate/unordered_set_int_insert_arg/1024 -0.0807 -0.0804 35946 33047 35866 32982 BM_EmplaceDuplicate/unordered_set_string_arg/1024 -0.0312 -0.0310 321623 311601 320559 310637 OVERALL_GEOMEAN -0.0276 -0.0275 0 0 0 0 ``` Time differences looks more like noise to me. But if we want to have this optimizations in `find`, we probably want them in `insert` and `emplace` as well. Reviewed By: #libc, Mordante Differential Revision: https://reviews.llvm.org/D140779	2023-07-04 21:01:08 +02:00
Louis Dionne	5aa03b648b	[libc++][NFC] Apply clang-format on large parts of the code base This commit does a pass of clang-format over files in libc++ that don't require major changes to conform to our style guide, or for which we're not overly concerned about conflicting with in-flight patches or hindering the git blame. This roughly covers: - benchmarks - range algorithms - concepts - type traits I did a manual verification of all the changes, and in particular I applied clang-format on/off annotations in a few places where the result was less readable after than before. This was not necessary in a lot of places, however I did find that clang-format had pretty bad taste when it comes to formatting concepts. Differential Revision: https://reviews.llvm.org/D153140	2023-06-19 11:19:51 -04:00
Mark de Wever	f26b43fa5c	[libc++] Removes CMake work-arounds. CMake older than 3.20.0 is no longer supported. This removes work-arounds for no longer supported versions. Reviewed By: #libc, jloser, philnik Differential Revision: https://reviews.llvm.org/D152099	2023-06-06 08:10:31 +02:00
Nikolas Klauser	d965960fcf	Revert "[libc++] Optimize for_each for segmented iterators" This reverts commit `b1dc43aa3a`.	2023-06-05 10:00:02 -07:00
Nikolas Klauser	b1dc43aa3a	[libc++] Optimize for_each for segmented iterators ``` --------------------------------------------------- Benchmark old new --------------------------------------------------- bm_for_each/1 3.00 ns 2.98 ns bm_for_each/2 4.53 ns 4.57 ns bm_for_each/3 5.82 ns 5.82 ns bm_for_each/4 6.94 ns 6.91 ns bm_for_each/5 7.55 ns 7.75 ns bm_for_each/6 7.06 ns 7.45 ns bm_for_each/7 6.69 ns 7.14 ns bm_for_each/8 6.86 ns 4.06 ns bm_for_each/16 11.5 ns 5.73 ns bm_for_each/64 43.7 ns 4.06 ns bm_for_each/512 356 ns 7.98 ns bm_for_each/4096 2787 ns 53.6 ns bm_for_each/32768 20836 ns 438 ns bm_for_each/262144 195362 ns 4945 ns bm_for_each/1048576 685482 ns 19822 ns ``` Reviewed By: ldionne, Mordante, #libc Spies: arichardson, libcxx-commits Differential Revision: https://reviews.llvm.org/D151274	2023-05-31 18:15:25 -07:00
Nikolas Klauser	1fd08edd58	[libc++] Forward to std::{,w}memchr in std::find Reviewed By: #libc, ldionne Spies: Mordante, libcxx-commits, ldionne, mikhail.ramalho Differential Revision: https://reviews.llvm.org/D144394	2023-05-25 07:59:50 -07:00
Tobias Hieta	7bfaa0f09d	[NFC][Py Reformat] Reformat python files in libcxx/libcxxabi This is an ongoing series of commits that are reformatting our Python code. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: #libc, kwk, Mordante Differential Revision: https://reviews.llvm.org/D150763	2023-05-25 11:15:34 +02:00
AdityaK	63a2b206fa	[libc++, std::vector] call the optimized version of __uninitialized_allocator_copy for trivial types See: https://github.com/llvm/llvm-project/issues/61987 Fix suggested by: @philnik and @var-const Reviewers: philnik, ldionne, EricWF, var-const Differential Revision: https://reviews.llvm.org/D147741 Testing: ninja check-cxx check-clang check-llvm Benchmark Testcases (BM_CopyConstruct, and BM_Assignment) added. performance improvement: Run on (8 X 4800 MHz CPU s) CPU Caches: L1 Data 48 KiB (x4) L1 Instruction 32 KiB (x4) L2 Unified 1280 KiB (x4) L3 Unified 12288 KiB (x1) Load Average: 1.66, 3.02, 2.43 Comparing build-runtimes-base/libcxx/benchmarks/vector_operations.libcxx.out to build-runtimes/libcxx/benchmarks/vector_operations.libcxx.out Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------- BM_ConstructSize/vector_byte/5140480 +0.0362 +0.0362 116906 121132 116902 121131 BM_CopyConstruct/vector_int/5140480 -0.4563 -0.4577 1755224 954241 1755330 951987 BM_Assignment/vector_int/5140480 -0.0222 -0.0220 990045 968095 989917 968125 BM_ConstructSizeValue/vector_byte/5140480 +0.0308 +0.0307 116970 120567 116977 120573 BM_ConstructIterIter/vector_char/1024 -0.0831 -0.0831 19 17 19 17 BM_ConstructIterIter/vector_size_t/1024 +0.0129 +0.0131 88 89 88 89 BM_ConstructIterIter/vector_string/1024 -0.0064 -0.0018 54455 54109 54208 54112 OVERALL_GEOMEAN -0.0845 -0.0842 0 0 0 0 FYI, the perf improvements for BM_CopyConstruct due to this patch is mostly subsumed by the https://reviews.llvm.org/D149826. However this patch still adds value by converting copy to memmove (the second testcase). Before the patch: ``` define linkonce_odr dso_local void @_ZNSt3__16vectorIiNS_9allocatorIiEEE18__construct_at_endIPiS5_EEvT_T0_m(ptr noundef nonnull align 8 dereferenceable(24) %0, ptr noundef %1, ptr noundef %2, i64 noundef %3) local_unnamed_addr #4 comdat align 2 { %5 = getelementptr inbounds %"class.std::__1::vector", ptr %0, i64 0, i32 1 %6 = load ptr, ptr %5, align 8, !tbaa !12 %7 = icmp eq ptr %1, %2 br i1 %7, label %16, label %8 8: ; preds = %4, %8 %9 = phi ptr [ %13, %8 ], [ %1, %4 ] %10 = phi ptr [ %14, %8 ], [ %6, %4 ] %11 = icmp ne ptr %10, null tail call void @llvm.assume(i1 %11) %12 = load i32, ptr %9, align 4, !tbaa !14 store i32 %12, ptr %10, align 4, !tbaa !14 %13 = getelementptr inbounds i32, ptr %9, i64 1 %14 = getelementptr inbounds i32, ptr %10, i64 1 %15 = icmp eq ptr %13, %2 br i1 %15, label %16, label %8, !llvm.loop !16 16: ; preds = %8, %4 %17 = phi ptr [ %6, %4 ], [ %14, %8 ] store ptr %17, ptr %5, align 8, !tbaa !12 ret void } ``` After the patch: ``` define linkonce_odr dso_local void @_ZNSt3__16vectorIiNS_9allocatorIiEEE18__construct_at_endIPiS5_EEvT_T0_m(ptr noundef nonnull align 8 dereferenceable(24) %0, ptr noundef %1, ptr noundef %2, i64 noundef %3) local_unnamed_addr #4 comdat align 2 { %5 = getelementptr inbounds %"class.std::__1::vector", ptr %0, i64 0, i32 1 %6 = load ptr, ptr %5, align 8, !tbaa !12 %7 = ptrtoint ptr %2 to i64 %8 = ptrtoint ptr %1 to i64 %9 = sub i64 %7, %8 %10 = ashr exact i64 %9, 2 tail call void @llvm.memmove.p0.p0.i64(ptr align 4 %6, ptr align 4 %1, i64 %9, i1 false) %11 = getelementptr inbounds i32, ptr %6, i64 %10 store ptr %11, ptr %5, align 8, !tbaa !12 ret void } ``` This is due to the optimized version of uninitialized_allocator_copy function.	2023-05-24 13:40:53 -07:00
Sirui Mu	c9d475c937	[libc++abi] Improve performance of __dynamic_cast The original `__dynamic_cast` implementation does not use the ABI-provided `src2dst_offset` parameter which helps improve performance on the hot paths. This patch improves the performance on the hot paths in `__dynamic_cast` by leveraging hints provided by the `src2dst_offset` parameter. This patch also includes a performance benchmark suite for the `__dynamic_cast` implementation. Reviewed By: philnik, ldionne, #libc, #libc_abi, avogelsgesang Spies: mikhail.ramalho, avogelsgesang, xingxue, libcxx-commits Differential Revision: https://reviews.llvm.org/D138005	2023-03-19 10:04:34 +01:00
Nikolas Klauser	aff3cdc604	[libc++] Optimize std::ranges::{min, max} for types that are cheap to copy Don't forward to `min_element` for small types that are trivially copyable, and instead use a naive loop that keeps track of the smallest element (as opposed to an iterator to the smallest element). This allows the compiler to vectorize the loop in some cases. Reviewed By: #libc, ldionne Spies: ldionne, libcxx-commits Differential Revision: https://reviews.llvm.org/D143596	2023-03-11 16:28:24 +01:00
Nikolas Klauser	b4ecfd3c46	[libc++] Forward to std::memcmp for trivially comparable types in equal Reviewed By: #libc, ldionne Spies: ldionne, Mordante, libcxx-commits Differential Revision: https://reviews.llvm.org/D139554	2023-02-21 17:11:21 +01:00
Adrian Vogelsgesang	2a06757a20	[libc++][spaceship] Implement `lexicographical_compare_three_way` The implementation makes use of the freedom added by LWG 3410. We have two variants of this algorithm: * a fast path for random access iterators: This fast path computes the maximum number of loop iterations up-front and does not compare the iterators against their limits on every loop iteration. * A basic implementation for all other iterators: This implementation compares the iterators against their limits in every loop iteration. However, it still takes advantage of the freedom added by LWG 3410 to avoid unnecessary additional iterator comparisons, as originally specified by P1614R2. https://godbolt.org/z/7xbMEen5e shows the benefit of the fast path: The hot loop generated of `lexicographical_compare_three_way3` is more tight than for `lexicographical_compare_three_way1`. The added benchmark illustrates how this leads to a 30% - 50% performance improvement on integer vectors. Implements part of P1614R2 "The Mothership has Landed" Fixes LWG 3410 and LWG 3350 Differential Revision: https://reviews.llvm.org/D131395	2023-02-12 14:51:08 -08:00
Nikolas Klauser	21f4232dd9	[libc++] Enable segmented iterator optimizations for join_view::iterator Reviewed By: ldionne, #libc Spies: libcxx-commits Differential Revision: https://reviews.llvm.org/D138413	2023-01-20 07:55:58 +01:00
Nikolas Klauser	c90801457f	[libc++] Refactor deque::iterator algorithm optimizations This has multiple benefits: - The optimizations are also performed for the `ranges::` versions of the algorithms - Code duplication is reduced - it is simpler to add this optimization for other segmented iterators, like `ranges::join_view::iterator` - Algorithm code is removed from `<deque>` Reviewed By: ldionne, huixie90, #libc Spies: mstorsjo, sstefan1, EricWF, libcxx-commits, mgorny Differential Revision: https://reviews.llvm.org/D132505	2023-01-19 20:11:43 +01:00
Nikolas Klauser	549a5fd0b7	[libc++] Make pmr::monotonic_buffer_resource bump down Bumping down is significantly faster than bumping up. This is ABI breaking, but the ABI of `pmr::monotonic_buffer_resource` was only stabilized in this release cycle, so we can still change it. For a more detailed explanation why bumping down is better, see https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html. Reviewed By: ldionne, #libc Spies: libcxx-commits Differential Revision: https://reviews.llvm.org/D141435	2023-01-12 18:36:45 +01:00
Louis Dionne	355e0ce3c5	[libc++] Extend check for non-ASCII characters to src/, test/ and benchmarks/ Differential Revision: https://reviews.llvm.org/D132180	2022-08-23 18:36:38 -04:00
Mark de Wever	857a78c04d	[libc++] Implements Unicode grapheme clustering This implements the Grapheme clustering as required by P1868R2 width: clarifying units of width and precision in std::format This was omitted in the initial patch, but the paper was marked as completed. This really completes the paper. Reviewed By: ldionne, #libc Differential Revision: https://reviews.llvm.org/D126971	2022-07-20 18:38:32 +02:00
Louis Dionne	8711fcae27	[libc++] Treat incomplete features just like other experimental features In particular remove the ability to expel incomplete features from the library at configure-time, since this can now be done through the _LIBCPP_ENABLE_EXPERIMENTAL macro. Also, never provide symbols related to incomplete features inside the dylib, instead provide them in c++experimental.a (this changes the symbols list, but not for any configuration that should have shipped). Differential Revision: https://reviews.llvm.org/D128928	2022-07-19 10:50:20 -04:00
Mark de Wever	f338f416ba	[libc++][format] Adds integral formatter benchmarks. This is a preparation to look at possible performance improvements. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D129421	2022-07-12 19:17:57 +02:00
Ivan Trofimov	3085e42f80	[libc++] Don't call key_eq in unordered_map/set rehashing routine As of now containers key_eq might get called when rehashing happens, which is redundant for unique keys containers. Reviewed By: #libc, philnik, Mordante Differential Revision: https://reviews.llvm.org/D128021	2022-07-10 11:44:12 +02:00
Konstantin Varlamov	c945bd0da6	[libc++][ranges] Implement modifying heap algorithms: - `ranges::make_heap`; - `ranges::push_heap`; - `ranges::pop_heap`; - `ranges::sort_heap`. Differential Revision: https://reviews.llvm.org/D128115	2022-07-08 13:48:41 -07:00
Konstantin Varlamov	94c7b89fe5	[libc++][ranges] Implement `ranges::stable_sort`. Differential Revision: https://reviews.llvm.org/D127834	2022-07-01 16:34:26 -07:00
Louis Dionne	303c4c37ea	[libc++] Don't force -O2 when building the benchmarks The optimization level used when building the benchmarks should match the optimization level of the current build. Otherwise, we can end up mixing an -O3 or -O0 optimized dylib with benchmarks built with -O2, which is really misleading. Differential Revision: https://reviews.llvm.org/D127987	2022-06-17 17:34:02 -04:00
Konstantin Varlamov	ff3989e6ae	[libc++][ranges] Implement `ranges::sort`. Differential Revision: https://reviews.llvm.org/D127557	2022-06-16 15:21:06 -07:00
Mark de Wever	aed5ddf8d0	[libc++][format] Implement format-string. Implements the compile-time checking of the formatting arguments. Completes: - P2216 std::format improvements Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D121530	2022-06-11 15:25:56 +02:00

1 2 3 4

160 Commits