llvm-capstone/libcxx/benchmarks
Martijn Vels 6fe4e033f0 [libc++] Optimize vector push_back to avoid continuous load and store of end pointer
Credits: this change is based on analysis and a proof of concept by
gerbens@google.com.

Before this change, the compiler lost track of the end pointer because
'this' and other references may escape beyond what the compiler can see,
so end was stored and reloaded on every iteration. This can be seen in the
generated assembly:

     16.28 │200c80:   mov     %r15d,(%rax)
     60.87 │200c83:   add     $0x4,%rax
           │200c87:   mov     %rax,-0x38(%rbp)
      0.03 │200c8b: → jmpq    200d4e
      ...
      ...
      1.69 │200d4e:   cmp     %r15d,%r12d
           │200d51: → je      200c40
     16.34 │200d57:   inc     %r15d
      0.05 │200d5a:   mov     -0x38(%rbp),%rax
      3.27 │200d5e:   mov     -0x30(%rbp),%r13
      1.47 │200d62:   cmp     %r13,%rax
           │200d65: → jne     200c80

We fix this by always explicitly storing the locally loaded end pointer
back at the end of push_back. This adds some slight source 'noise', but
produces compact fast-path code, i.e.:

     32.64 │200760:   mov    %r14d,(%r12)
      9.97 │200764:   add    $0x4,%r12
      6.97 │200768:   mov    %r12,-0x38(%rbp)
     32.17 │20076c:   add    $0x1,%r14d
      2.36 │200770:   cmp    %r14d,%ebx
           │200773: → je     200730
      8.98 │200775:   mov    -0x30(%rbp),%r13
      6.75 │200779:   cmp    %r13,%r12
           │20077c: → jne    200760

Now there is a single store for the push_back value (as before), and a
single store for the end pointer, with no dependent reload of it.
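
As an illustration only, the pattern described above can be sketched on a
toy container. This is not libc++'s std::vector implementation: TinyVec,
begin_, end_, cap_ and grow() are hypothetical names chosen for this
example, and the growth policy is an assumption. The point is the shape of
push_back: load the end pointer into a local once, work on the local, and
store it back explicitly at the end.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical toy container for illustration; NOT libc++'s std::vector.
struct TinyVec {
  int* begin_ = nullptr;  // start of the allocation
  int* end_ = nullptr;    // one past the last element
  int* cap_ = nullptr;    // one past the end of the allocation

  TinyVec() = default;
  TinyVec(const TinyVec&) = delete;             // keep ownership simple
  TinyVec& operator=(const TinyVec&) = delete;
  ~TinyVec() { std::free(begin_); }

  // Slow path: double the capacity (assumed policy, starting at 4 elements).
  void grow() {
    std::size_t size = static_cast<std::size_t>(end_ - begin_);
    std::size_t new_cap = size ? 2 * size : 4;
    int* p = static_cast<int*>(std::realloc(begin_, new_cap * sizeof(int)));
    if (!p) std::abort();
    begin_ = p;
    end_ = p + size;
    cap_ = p + new_cap;
  }

  void push_back(int v) {
    int* end = end_;   // load the end pointer once into a local
    if (end == cap_) {
      grow();
      end = end_;      // the buffer may have moved; refresh the local
    }
    *end = v;          // single store of the pushed value
    ++end;
    end_ = end;        // explicit store back, no reload needed afterwards
  }

  std::size_t size() const {
    return static_cast<std::size_t>(end_ - begin_);
  }
};
```

Calling push_back in a loop on a TinyVec exercises both the fast path and
the grow() slow path. Whether a given compiler then keeps end in a register
across iterations depends on inlining and optimization level, so the
generated assembly should be inspected rather than assumed.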

For fully local vectors (i.e., not referenced elsewhere), the capacity
load and store inside the loop could also be removed, but this requires
more substantial refactoring inside vector.

Differential Revision: https://reviews.llvm.org/D80588
2023-10-02 09:12:37 -04:00
algorithms [libc++] Implement ranges::ends_with 2023-09-18 11:56:10 -07:00
libcxxabi [libc++abi] Refactor around __dynamic_cast 2023-09-08 11:47:24 -04:00
algorithms.partition_point.bench.cpp
allocation.bench.cpp [libc++] Fix minor warnings in libcxx benchmarks 2023-09-13 17:05:56 -04:00
CartesianBenchmarks.h
CMakeLists.txt Reapply "[libc++][ranges] Add benchmarks for the from_range constructors of vector and deque." (#67753) 2023-09-29 10:27:20 -04:00
ContainerBenchmarks.h [libc++] Optimize vector push_back to avoid continuous load and store of end pointer 2023-10-02 09:12:37 -04:00
deque_iterator.bench.cpp
deque.bench.cpp Reapply "[libc++][ranges] Add benchmarks for the from_range constructors of vector and deque." (#67753) 2023-09-29 10:27:20 -04:00
filesystem.bench.cpp
format_to_n.bench.cpp
format_to.bench.cpp
format.bench.cpp
formatted_size.bench.cpp
formatter_float.bench.cpp
formatter_int.bench.cpp
function.bench.cpp
GenerateInput.h
join_view.bench.cpp
lexicographical_compare_three_way.bench.cpp
lit.cfg.py
lit.site.cfg.py.in
map.bench.cpp
monotonic_buffer.bench.cpp [libc++] Fix minor warnings in libcxx benchmarks 2023-09-13 17:05:56 -04:00
ordered_set.bench.cpp
random.bench.cpp
std_format_spec_string_unicode.bench.cpp
string.bench.cpp
stringstream.bench.cpp
system_error.bench.cpp
to_chars.bench.cpp
unordered_set_operations.bench.cpp [libc++] Add test coverage for unordered containers comparison (#66692) 2023-09-21 05:11:49 -04:00
util_smartptr.bench.cpp
Utilities.h
variant_visit_1.bench.cpp
variant_visit_2.bench.cpp
variant_visit_3.bench.cpp
VariantBenchmarks.h
vector_operations.bench.cpp [libc++] Optimize vector push_back to avoid continuous load and store of end pointer 2023-10-02 09:12:37 -04:00