llvm-capstone

mirror of https://github.com/capstone-engine/llvm-capstone.git synced 2024-10-07 19:03:57 +00:00

History

Tue Ly b91e78da37 [libc][math] Implement double precision log1p correctly rounded to all rounding modes. Implement double precision log1p function correctly rounded to all rounding modes. Performance - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%. - Benchmarks with `./perf.sh` tool from the CORE-MATH project, unit is (CPU clocks / call). - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log1p GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 39.792 + 1.011 clc/call; Median-Min = 0.940 clc/call; Max = 41.373 clc/call; -- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 87.285 + 1.135 clc/call; Median-Min = 1.299 clc/call; Max = 89.715 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 20.666 + 0.123 clc/call; Median-Min = 0.125 clc/call; Max = 20.828 clc/call; -- LIBC reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 20.928 + 0.771 clc/call; Median-Min = 0.725 clc/call; Max = 22.767 clc/call; -- LIBC reciprocal throughput -- without FMA [####################] 100 % Ntrial = 20 ; Min = 31.461 + 0.528 clc/call; Median-Min = 0.602 clc/call; Max = 36.809 clc/call; ``` - Latency from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log1p --latency GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 77.875 + 0.062 clc/call; Median-Min = 0.051 clc/call; Max = 78.003 clc/call; -- CORE-MATH latency -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 101.958 + 1.202 clc/call; Median-Min = 1.325 clc/call; Max = 104.452 clc/call; -- System LIBC latency -- [####################] 100 % Ntrial = 20 ; Min = 60.581 + 1.443 clc/call; Median-Min = 1.611 clc/call; Max = 62.285 clc/call; -- LIBC latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 48.817 + 1.108 clc/call; Median-Min = 1.300 clc/call; Max = 50.282 clc/call; -- LIBC latency -- without FMA [####################] 100 % Ntrial = 20 ; Min = 61.121 + 0.599 clc/call; Median-Min = 0.761 clc/call; Max = 62.020 clc/call; ``` - Accurate pass latency: ``` $ ./perf.sh log1p --latency --simple_stat GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA 760.444 -- CORE-MATH latency -- without FMA (-march=x86-64-v2) 827.880 -- LIBC latency -- with FMA 711.837 -- LIBC latency -- without FMA 764.317 ``` Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D151049	2023-05-23 11:04:04 -04:00
..
arm	[libc][math] Implement double precision log1p correctly rounded to all rounding modes.	2023-05-23 11:04:04 -04:00
x86_64	[libc] Add memmem implementation	2023-04-11 20:49:25 +00:00

Tue Ly b91e78da37 [libc][math] Implement double precision log1p correctly rounded to all rounding modes.

Implement double precision log1p function correctly rounded to all
rounding modes.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%.
  - Benchmarks with `./perf.sh` tool from the CORE-MATH project, unit is (CPU clocks / call).
  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log1p
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 39.792 + 1.011 clc/call; Median-Min = 0.940 clc/call; Max = 41.373 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 87.285 + 1.135 clc/call; Median-Min = 1.299 clc/call; Max = 89.715 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 20.666 + 0.123 clc/call; Median-Min = 0.125 clc/call; Max = 20.828 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.928 + 0.771 clc/call; Median-Min = 0.725 clc/call; Max = 22.767 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 31.461 + 0.528 clc/call; Median-Min = 0.602 clc/call; Max = 36.809 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log1p --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 77.875 + 0.062 clc/call; Median-Min = 0.051 clc/call; Max = 78.003 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 101.958 + 1.202 clc/call; Median-Min = 1.325 clc/call; Max = 104.452 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 60.581 + 1.443 clc/call; Median-Min = 1.611 clc/call; Max = 62.285 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.817 + 1.108 clc/call; Median-Min = 1.300 clc/call; Max = 50.282 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 61.121 + 0.599 clc/call; Median-Min = 0.761 clc/call; Max = 62.020 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log1p --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
760.444

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
827.880

-- LIBC latency -- with FMA
711.837

-- LIBC latency -- without FMA
764.317
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D151049

2023-05-23 11:04:04 -04:00

arm

[libc][math] Implement double precision log1p correctly rounded to all rounding modes.

2023-05-23 11:04:04 -04:00

x86_64

[libc] Add memmem implementation

2023-04-11 20:49:25 +00:00