!23 Revert 'Pull Request !22: Upgrade optimized-routine from 21.02 to 23.01'

Merge pull request !23 from openharmony_ci/revert-merge-22-master
openharmony_ci
2023-12-09 09:54:31 +00:00
committed by Gitee
229 changed files with 4904 additions and 5156 deletions
+1 -229
@@ -1,11 +1,6 @@
MIT OR Apache-2.0 WITH LLVM-exception
=====================================
MIT License
-----------
Copyright (c) 1999-2022, Arm Limited.
Copyright (c) 1999-2019, Arm Limited.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
@@ -24,226 +19,3 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Apache-2.0 WITH LLVM-exception
------------------------------
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--- LLVM Exceptions to the Apache 2.0 License ----
As an exception, if, as a result of your compiling your source code, portions
of this Software are embedded into an Object form of such source code, you
may redistribute such embedded portions in such Object form without complying
with the conditions of Sections 4(a), 4(b) and 4(d) of the License.
In addition, if you combine or link compiled forms of this Software with
software that is licensed under the GPLv2 ("Combined Software") and if a
court of competent jurisdiction determines that the patent provision (Section
3), the indemnity provision (Section 9) or other Section of the License
conflicts with the conditions of the GPLv2, you may retroactively and
prospectively choose to deem waived or otherwise exclude such Section(s) of
the License, but only in their entirety and only with respect to the Combined
Software.
+2 -5
@@ -1,7 +1,7 @@
# Makefile - requires GNU make
#
# Copyright (c) 2018-2022, Arm Limited.
# SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
# Copyright (c) 2018-2020, Arm Limited.
# SPDX-License-Identifier: MIT
srcdir = .
prefix = /usr
@@ -11,7 +11,6 @@ includedir = $(prefix)/include
# Configure these in config.mk, do not make changes in this file.
SUBS = math string networking
PLSUBS = math
HOST_CC = cc
HOST_CFLAGS = -std=c99 -O2
HOST_LDFLAGS =
@@ -21,7 +20,6 @@ CPPFLAGS =
CFLAGS = -std=c99 -O2
CFLAGS_SHARED = -fPIC
CFLAGS_ALL = -Ibuild/include $(CPPFLAGS) $(CFLAGS)
CFLAGS_PL = -Ibuild/pl/include $(CPPFLAGS) $(CFLAGS) -DPL
LDFLAGS =
LDLIBS =
AR = $(CROSS_COMPILE)ar
@@ -53,7 +51,6 @@ $(DIRS):
mkdir -p $@
$(filter %.os,$(ALL_FILES)): CFLAGS_ALL += $(CFLAGS_SHARED)
$(filter %.os,$(ALL_FILES)): CFLAGS_PL += $(CFLAGS_SHARED)
build/%.o: $(srcdir)/%.S
$(CC) $(CFLAGS_ALL) -c -o $@ $<
+3 -36
@@ -19,7 +19,7 @@
policylist:
1. policy: If the OAT-Default.xml policies do not meet your requirements, please add policies here.
2. policyitem: The fields type, name, path, desc are required, and the fields rule, group, filefilter are optional; the default values are:
<policyitem type="" name="" path="" desc="" rule="may" filefilter="defaultPolicyFilter"/>
<policyitem type="" name="" path="" desc="" rule="may" group="defaultGroup" filefilter="defaultPolicyFilter"/>
3. policyitem type:
"compatibility" is used to check license compatibility in the specified path;
"license" is used to check source license header in the specified path;
@@ -49,43 +49,10 @@ All configurations in this file will be merged to OAT-Default.xml, if you have a
<configuration>
<oatconfig>
<licensefile></licensefile>
<policylist>
<policy>
<policyitem type="license" name="MIT" path=".*" desc="Compatible license"/>
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="math/.*" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="networking/.*" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="string/aarch64/.*" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="string/include/.*" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="string/bench/.*" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="string/test/.*" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="string/x86_64/.*" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="string/Dir.mk" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="Makefile" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
</policy>
</policylist>
<filefilterlist>
<filefilter name="defaultPolicyFilter" desc="Filters for compatibility, license header policies">
<filteritem type="filepath" name="math/README.contributors" desc="File bundled by upstream"/>
<filteritem type="filepath" name="LICENSE" desc="File bundled by upstream"/>
<filteritem type="filepath" name="string/README.contributors" desc="File bundled by upstream"/>
<filteritem type="filepath" name="README.OpenSource" desc="File bundled by upstream"/>
<filteritem type="filepath" name="README" desc="File bundled by upstream"/>
<filteritem type="filepath" name="optimized-routines.gni" desc="No license involved"/>
<filteritem type="filepath" name="bundle.json" desc="No license involved"/>
<filteritem type="filepath" name="config.mk.dist" desc="No license involved"/>
</filefilter>
<filefilter name="binaryFileTypePolicyFilter" desc="Filters for binary file policies">
<filteritem type="filename" name="*.pdf" desc="File bundled by upstream"/>
<filteritem type="filename" name="*.pdf" desc="File bundled by upstream"/>
</filefilter>
</filefilterlist>
</filefilterlist>
</oatconfig>
</configuration>
+5 -9
@@ -2,17 +2,14 @@ Arm Optimized Routines
----------------------
This repository contains implementations of library functions
provided by Arm. The outbound license is available under a dual
license, at the user's election, as reflected in the LICENSE file.
Contributions to this project are accepted, but Contributors have
to sign an Assignment Agreement, please follow the instructions in
provided by Arm under MIT License (See LICENSE). Contributions
to this project are accepted, but Contributors have to sign an
Assignment Agreement, please follow the instructions in
contributor-agreement.pdf. This is needed so upstreaming code
to projects that require copyright assignment is possible. Further
contribution requirements are documented in README.contributors of
the appropriate subdirectory.
to projects that require copyright assignment is possible.
Regular quarterly releases are tagged as vYY.MM, the latest
release is v23.01.
release is v21.02.
Source code layout:
@@ -27,7 +24,6 @@ networking/test/ - networking test and benchmark related sources.
string/ - string routines subproject sources.
string/include/ - string library public headers.
string/test/ - string test and benchmark related sources.
pl/... - separately maintained performance library code.
The steps to build the target libraries and run the tests:
+4 -21
@@ -1,14 +1,11 @@
# Example config.mk
#
# Copyright (c) 2018-2022, Arm Limited.
# SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
# Copyright (c) 2018-2020, Arm Limited.
# SPDX-License-Identifier: MIT
# Subprojects to build
SUBS = math string networking
# Subsubprojects to build if subproject pl is built
PLSUBS = math
# Target architecture: aarch64, arm or x86_64
ARCH = aarch64
@@ -59,22 +56,8 @@ math-cflags += -ffp-contract=fast -fno-math-errno
# Use with clang.
#math-cflags += -ffp-contract=fast
# Disable/enable SVE vector math code and tests
WANT_SVE_MATH = 0
ifeq ($(WANT_SVE_MATH), 1)
math-cflags += -march=armv8.2-a+sve
endif
math-cflags += -DWANT_SVE_MATH=$(WANT_SVE_MATH)
# If defined to 1, set errno in math functions according to ISO C. Many math
# libraries do not set errno, so this is 0 by default. It may need to be
# set to 1 if math.h has (math_errhandling & MATH_ERRNO) != 0.
WANT_ERRNO = 0
math-cflags += -DWANT_ERRNO=$(WANT_ERRNO)
# If set to 1, set fenv in vector math routines.
WANT_SIMD_EXCEPT = 0
math-cflags += -DWANT_SIMD_EXCEPT=$(WANT_SIMD_EXCEPT)
# Disable vector math code
#math-cflags += -DWANT_VMATH=0
# Disable fenv checks
#math-ulpflags = -q -f
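The config.mk hunk above passes `WANT_ERRNO` to the compiler as `-DWANT_ERRNO=$(WANT_ERRNO)`. A minimal sketch of how C code might consume such a flag follows. The helper name `sketch_with_errno` and the fallback default are assumptions made for illustration; they are not taken from this commit.

```c
#include <errno.h>

/* Assumption: mirror the config.mk default when the build does not pass
   -DWANT_ERRNO=... on the command line.  */
#ifndef WANT_ERRNO
# define WANT_ERRNO 0
#endif

/* Hypothetical helper: return y, and store the error code e in errno only
   when the build asked for ISO C errno behaviour.  */
double
sketch_with_errno (double y, int e)
{
#if WANT_ERRNO
  errno = e;
#else
  (void) e; /* errno is left untouched in the default configuration.  */
#endif
  return y;
}
```

Building with `-DWANT_ERRNO=1` would switch the same source to the errno-setting variant, which is the reason the flag is routed through `math-cflags` rather than hard-coded.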
+5 -12
@@ -1,14 +1,12 @@
# Makefile fragment - requires GNU make
#
# Copyright (c) 2019-2022, Arm Limited.
# SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
# Copyright (c) 2019, Arm Limited.
# SPDX-License-Identifier: MIT
S := $(srcdir)/math
B := build/math
math-lib-srcs := $(wildcard $(S)/*.[cS])
math-lib-srcs += $(wildcard $(S)/$(ARCH)/*.[cS])
math-test-srcs := \
$(S)/test/mathtest.c \
$(S)/test/mathbench.c \
@@ -17,7 +15,6 @@ math-test-srcs := \
math-test-host-srcs := $(wildcard $(S)/test/rtest/*.[cS])
math-includes := $(patsubst $(S)/%,build/%,$(wildcard $(S)/include/*.h))
math-test-includes := $(patsubst $(S)/%,build/include/%,$(wildcard $(S)/test/*.h))
math-libs := \
build/lib/libmathlib.so \
@@ -45,11 +42,10 @@ math-files := \
$(math-tools) \
$(math-host-tools) \
$(math-includes) \
$(math-test-includes) \
all-math: $(math-libs) $(math-tools) $(math-includes) $(math-test-includes)
all-math: $(math-libs) $(math-tools) $(math-includes)
$(math-objs): $(math-includes) $(math-test-includes)
$(math-objs): $(math-includes)
$(math-objs): CFLAGS_ALL += $(math-cflags)
$(B)/test/mathtest.o: CFLAGS_ALL += -fmath-errno
$(math-host-objs): CC = $(HOST_CC)
@@ -87,9 +83,6 @@ build/bin/ulp: $(B)/test/ulp.o build/lib/libmathlib.a
build/include/%.h: $(S)/include/%.h
cp $< $@
build/include/test/%.h: $(S)/test/%.h
cp $< $@
build/bin/%.sh: $(S)/test/%.sh
cp $< $@
@@ -103,7 +96,7 @@ check-math-rtest: $(math-host-tools) $(math-tools)
cat $(math-rtests) | build/bin/rtest | $(EMULATOR) build/bin/mathtest $(math-testflags)
check-math-ulp: $(math-tools)
ULPFLAGS="$(math-ulpflags)" WANT_SIMD_EXCEPT="$(WANT_SIMD_EXCEPT)" build/bin/runulp.sh $(EMULATOR)
ULPFLAGS="$(math-ulpflags)" build/bin/runulp.sh $(EMULATOR)
check-math: check-math-test check-math-rtest check-math-ulp
-78
@@ -1,78 +0,0 @@
STYLE REQUIREMENTS
==================
1. Most code in this sub-directory is expected to be upstreamed into glibc so
the GNU Coding Standard and glibc specific conventions should be followed
to ease upstreaming.
2. ABI and symbols: the code should be written so it is suitable for inclusion
into a libc with minimal changes. This e.g. means that internal symbols
should be hidden and in the implementation reserved namespace according to
ISO C and POSIX rules. If possible the built shared libraries and static
library archives should be usable to override libc symbols at link time (or
at runtime via LD_PRELOAD). This requires the symbols to follow the glibc ABI
(other than symbol versioning), this cannot be done reliably for static
linking so this is a best effort requirement.
3. API: include headers should be suitable for benchmarking and testing code
and should not conflict with libc headers.
CONTRIBUTION GUIDELINES FOR math SUB-DIRECTORY
==============================================
1. Math functions have quality and performance requirements.
2. Quality:
- Worst-case ULP error should be small in the entire input domain (for most
common double precision scalar functions the target is < 0.66 ULP error,
and < 1 ULP for single precision, even performance optimized function
variant should not have > 5 ULP error if the goal is to be a drop in
replacement for a standard math function), this should be tested
statistically (or on all inputs if possible in reasonable amount of time).
The ulp tool is for this and runulp.sh should be updated for new functions.
- All standard rounding modes need to be supported but in non-default rounding
modes the quality requirement can be relaxed. (Non-nearest rounded
computation can be slow and inaccurate but has to be correct for conformance
reasons.)
- Special cases and error handling need to follow ISO C Annex F requirements,
POSIX requirements, IEEE 754-2008 requirements and Glibc requirements:
https://www.gnu.org/software/libc/manual/html_mono/libc.html#Errors-in-Math-Functions
this should be tested by direct tests (glibc test system may be used for it).
- Error handling code should be decoupled from the approximation code as much
as possible. (There are helper functions, these take care of errno as well
as exception raising.)
- Vector math code does not need to work in non-nearest rounding mode and error
handling side effects need not happen (fenv exceptions and errno), but the
result should be correct (within quality requirements, which are lower for
vector code than for scalar code).
- Error bounds of the approximation should be clearly documented.
- The code should build and pass tests on arm, aarch64 and x86_64 GNU linux
systems. (Routines and features can be disabled on specific targets, but
the build must complete). On aarch64, both little- and big-endian targets
are supported as well as valid combinations of architecture extensions.
The configurations that should be tested depend on the contribution.
3. Performance:
- Common math code should be benchmarked on modern aarch64 microarchitectures
over typical inputs.
- Performance improvements should be documented (relative numbers can be
published; it is enough to use the mathbench microbenchmark tool which should
be updated for new functions).
- Attention should be paid to the compilation flags: for aarch64 fma
contraction should be on and math errno turned off so some builtins can be
inlined.
- The code should be reasonably performant on x86_64 too, e.g. some rounding
instructions and fma may not be available on x86_64, such builtins turn into
libc calls with slow code. Such slowdown is not acceptable, a faster fallback
should be present: glibc and bionic use the same code on all targets. (This
does not apply to vector math code).
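The quality targets in the removed README.contributors are stated in ULP. As a rough illustration of what such a measurement means, the sketch below expresses the error of a single-precision result against a double-precision reference in units of the float spacing at the reference. It is not the repository's `ulp` tool; the function names are hypothetical, and it assumes a finite, normal, non-zero reference.

```c
#include <stdint.h>
#include <string.h>

/* Spacing between adjacent floats in the binade that contains ref.
   Assumption: ref is finite, normal and non-zero once rounded to float.  */
static double
float_ulp (double ref)
{
  float f = (float) ref;
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  bits &= 0x7f800000u; /* Keep only the exponent field, i.e. 2^e.  */
  float pow2;
  memcpy (&pow2, &bits, sizeof pow2);
  return (double) pow2 * 0x1p-23; /* A float has 23 fraction bits.  */
}

/* Error of got relative to ref, measured in float ULPs at ref.  */
double
ulp_error (float got, double ref)
{
  double diff = (double) got - ref;
  if (diff < 0)
    diff = -diff;
  return diff / float_ulp (ref);
}
```

Under this definition a correctly rounded result never exceeds 0.5 ULP, which gives a sense of scale for the "< 1 ULP" single-precision target quoted in the guidelines.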
-87
@@ -1,87 +0,0 @@
/*
* Double-precision vector cos function.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "mathlib.h"
#include "v_math.h"
static const struct data
{
float64x2_t poly[7];
float64x2_t range_val, shift, inv_pi, half_pi, pi_1, pi_2, pi_3;
} data = {
/* Worst-case error is 3.3 ulp in [-pi/2, pi/2]. */
.poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7),
V2 (-0x1.a01a019936f27p-13), V2 (0x1.71de37a97d93ep-19),
V2 (-0x1.ae633919987c6p-26), V2 (0x1.60e277ae07cecp-33),
V2 (-0x1.9e9540300a1p-41) },
.inv_pi = V2 (0x1.45f306dc9c883p-2),
.half_pi = V2 (0x1.921fb54442d18p+0),
.pi_1 = V2 (0x1.921fb54442d18p+1),
.pi_2 = V2 (0x1.1a62633145c06p-53),
.pi_3 = V2 (0x1.c1cd129024e09p-106),
.shift = V2 (0x1.8p52),
.range_val = V2 (0x1p23)
};
#define C(i) d->poly[i]
static float64x2_t VPCS_ATTR NOINLINE
special_case (float64x2_t x, float64x2_t y, uint64x2_t odd, uint64x2_t cmp)
{
y = vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd));
return v_call_f64 (cos, x, y, cmp);
}
float64x2_t VPCS_ATTR V_NAME_D1 (cos) (float64x2_t x)
{
const struct data *d = ptr_barrier (&data);
float64x2_t n, r, r2, r3, r4, t1, t2, t3, y;
uint64x2_t odd, cmp;
#if WANT_SIMD_EXCEPT
r = vabsq_f64 (x);
cmp = vcgeq_u64 (vreinterpretq_u64_f64 (r),
vreinterpretq_u64_f64 (d->range_val));
if (unlikely (v_any_u64 (cmp)))
/* If fenv exceptions are to be triggered correctly, set any special lanes
to 1 (which is neutral w.r.t. fenv). These lanes will be fixed by
special-case handler later. */
r = vbslq_f64 (cmp, v_f64 (1.0), r);
#else
cmp = vcageq_f64 (x, d->range_val);
r = x;
#endif
/* n = rint((|x|+pi/2)/pi) - 0.5. */
n = vfmaq_f64 (d->shift, d->inv_pi, vaddq_f64 (r, d->half_pi));
odd = vshlq_n_u64 (vreinterpretq_u64_f64 (n), 63);
n = vsubq_f64 (n, d->shift);
n = vsubq_f64 (n, v_f64 (0.5));
/* r = |x| - n*pi (range reduction into -pi/2 .. pi/2). */
r = vfmsq_f64 (r, d->pi_1, n);
r = vfmsq_f64 (r, d->pi_2, n);
r = vfmsq_f64 (r, d->pi_3, n);
/* sin(r) poly approx. */
r2 = vmulq_f64 (r, r);
r3 = vmulq_f64 (r2, r);
r4 = vmulq_f64 (r2, r2);
t1 = vfmaq_f64 (C (4), C (5), r2);
t2 = vfmaq_f64 (C (2), C (3), r2);
t3 = vfmaq_f64 (C (0), C (1), r2);
y = vfmaq_f64 (t1, C (6), r4);
y = vfmaq_f64 (t2, y, r4);
y = vfmaq_f64 (t3, y, r4);
y = vfmaq_f64 (r, y, r3);
if (unlikely (v_any_u64 (cmp)))
return special_case (x, y, odd, cmp);
return vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd));
}
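The removed vector routine reduces the argument with k = rint((|x| + pi/2)/pi), sets r = |x| - (k - 0.5)*pi, and uses cos(x) = (-1)^k * sin(r). A scalar sketch of the same steps is given below, reusing the constants from the listing. It is an illustration only: it uses plain multiply and subtract where the original uses fused operations, evaluates the polynomial with Horner's rule instead of the original's split scheme, and omits the special-case path for |x| >= 0x1p23. The name `cos_sketch` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Coefficients copied from the listing: sin(r) ~= r + r^3 * P(r^2).  */
static const double poly[7] = {
  -0x1.555555555547bp-3, 0x1.1111111108a4dp-7, -0x1.a01a019936f27p-13,
  0x1.71de37a97d93ep-19, -0x1.ae633919987c6p-26, 0x1.60e277ae07cecp-33,
  -0x1.9e9540300a1p-41
};

/* Scalar sketch; assumes |x| < 0x1p23 and round-to-nearest mode.  */
double
cos_sketch (double x)
{
  const double inv_pi = 0x1.45f306dc9c883p-2;
  const double half_pi = 0x1.921fb54442d18p+0;
  const double pi_1 = 0x1.921fb54442d18p+1;
  const double pi_2 = 0x1.1a62633145c06p-53;
  const double pi_3 = 0x1.c1cd129024e09p-106;
  const double shift = 0x1.8p52;

  double ax = x < 0 ? -x : x;

  /* Adding shift rounds the quotient to an integer k held in the low
     mantissa bits of z; bit 0 of z is therefore the parity of k.  */
  double z = shift + inv_pi * (ax + half_pi);
  uint64_t zbits;
  memcpy (&zbits, &z, sizeof zbits);
  int odd = (int) (zbits & 1);
  double n = (z - shift) - 0.5;

  /* r = |x| - n*pi, with pi split into three parts to limit rounding.  */
  double r = ax - n * pi_1;
  r -= n * pi_2;
  r -= n * pi_3;

  /* Horner evaluation of P(r^2), then y = r + r^3 * P(r^2).  */
  double r2 = r * r;
  double p = poly[6];
  for (int i = 5; i >= 0; i--)
    p = poly[i] + p * r2;
  double y = r + p * (r2 * r);

  return odd ? -y : y;
}
```

Because the products here are not fused, the sketch can be slightly less accurate than the 3.3 ULP bound quoted in the listing, particularly for larger arguments.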
-82
@@ -1,82 +0,0 @@
/*
* Single-precision vector cos function.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "mathlib.h"
#include "v_math.h"
static const struct data
{
float32x4_t poly[4];
float32x4_t range_val, inv_pi, half_pi, shift, pi_1, pi_2, pi_3;
} data = {
/* 1.886 ulp error. */
.poly = { V4 (-0x1.555548p-3f), V4 (0x1.110df4p-7f), V4 (-0x1.9f42eap-13f),
V4 (0x1.5b2e76p-19f) },
.pi_1 = V4 (0x1.921fb6p+1f),
.pi_2 = V4 (-0x1.777a5cp-24f),
.pi_3 = V4 (-0x1.ee59dap-49f),
.inv_pi = V4 (0x1.45f306p-2f),
.shift = V4 (0x1.8p+23f),
.half_pi = V4 (0x1.921fb6p0f),
.range_val = V4 (0x1p20f)
};
#define C(i) d->poly[i]
static float32x4_t VPCS_ATTR NOINLINE
special_case (float32x4_t x, float32x4_t y, uint32x4_t odd, uint32x4_t cmp)
{
/* Fall back to scalar code. */
y = vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd));
return v_call_f32 (cosf, x, y, cmp);
}
float32x4_t VPCS_ATTR V_NAME_F1 (cos) (float32x4_t x)
{
const struct data *d = ptr_barrier (&data);
float32x4_t n, r, r2, r3, y;
uint32x4_t odd, cmp;
#if WANT_SIMD_EXCEPT
r = vabsq_f32 (x);
cmp = vcgeq_u32 (vreinterpretq_u32_f32 (r),
vreinterpretq_u32_f32 (d->range_val));
if (unlikely (v_any_u32 (cmp)))
/* If fenv exceptions are to be triggered correctly, set any special lanes
to 1 (which is neutral w.r.t. fenv). These lanes will be fixed by
special-case handler later. */
r = vbslq_f32 (cmp, v_f32 (1.0f), r);
#else
cmp = vcageq_f32 (x, d->range_val);
r = x;
#endif
/* n = rint((|x|+pi/2)/pi) - 0.5. */
n = vfmaq_f32 (d->shift, d->inv_pi, vaddq_f32 (r, d->half_pi));
odd = vshlq_n_u32 (vreinterpretq_u32_f32 (n), 31);
n = vsubq_f32 (n, d->shift);
n = vsubq_f32 (n, v_f32 (0.5f));
/* r = |x| - n*pi (range reduction into -pi/2 .. pi/2). */
r = vfmsq_f32 (r, d->pi_1, n);
r = vfmsq_f32 (r, d->pi_2, n);
r = vfmsq_f32 (r, d->pi_3, n);
/* y = sin(r). */
r2 = vmulq_f32 (r, r);
r3 = vmulq_f32 (r2, r);
y = vfmaq_f32 (C (2), C (3), r2);
y = vfmaq_f32 (C (1), y, r2);
y = vfmaq_f32 (C (0), y, r2);
y = vfmaq_f32 (r, y, r3);
if (unlikely (v_any_u32 (cmp)))
return special_case (x, y, odd, cmp);
return vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd));
}
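Both cos routines apply the final sign with an integer XOR: the parity bit of the reduced quotient is shifted into the sign position (`vshlq_n_u32 (..., 31)`) and XOR-ed into the result. A scalar sketch of that one step is shown below; the function name `flip_sign_if_odd` is hypothetical and stands in for the `veorq_u32` / `vreinterpretq` sequence in the listing.

```c
#include <stdint.h>
#include <string.h>

/* Negate y when k is odd by XOR-ing bit 0 of k into the float sign bit.  */
float
flip_sign_if_odd (float y, uint32_t k)
{
  uint32_t odd = k << 31; /* Bit 0 of k moved to the sign position.  */
  uint32_t bits;
  memcpy (&bits, &y, sizeof bits);
  bits ^= odd;
  memcpy (&y, &bits, sizeof y);
  return y;
}
```

The XOR form avoids a branch or a multiply by -1, which is presumably why it suits per-lane vector code.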
-125
@@ -1,125 +0,0 @@
/*
* Double-precision vector e^x function.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "mathlib.h"
#include "v_math.h"
#define N (1 << V_EXP_TABLE_BITS)
#define IndexMask (N - 1)
const static volatile struct
{
float64x2_t poly[3];
float64x2_t inv_ln2, ln2_hi, ln2_lo, shift;
#if !WANT_SIMD_EXCEPT
float64x2_t special_bound, scale_thresh;
#endif
} data = {
/* maxerr: 1.88 +0.5 ulp
rel error: 1.4337*2^-53
abs error: 1.4299*2^-53 in [ -ln2/256, ln2/256 ]. */
.poly = { V2 (0x1.ffffffffffd43p-2), V2 (0x1.55555c75adbb2p-3),
V2 (0x1.55555da646206p-5) },
#if !WANT_SIMD_EXCEPT
.scale_thresh = V2 (163840.0), /* 1280.0 * N. */
.special_bound = V2 (704.0),
#endif
.inv_ln2 = V2 (0x1.71547652b82fep7), /* N/ln2. */
.ln2_hi = V2 (0x1.62e42fefa39efp-8), /* ln2/N. */
.ln2_lo = V2 (0x1.abc9e3b39803f3p-63),
.shift = V2 (0x1.8p+52)
};
#define C(i) data.poly[i]
#define Tab __v_exp_data
#if WANT_SIMD_EXCEPT
# define TinyBound v_u64 (0x2000000000000000) /* asuint64 (0x1p-511). */
# define BigBound v_u64 (0x4080000000000000) /* asuint64 (0x1p9). */
# define SpecialBound v_u64 (0x2080000000000000) /* BigBound - TinyBound. */
static float64x2_t VPCS_ATTR NOINLINE
special_case (float64x2_t x, float64x2_t y, uint64x2_t cmp)
{
/* If fenv exceptions are to be triggered correctly, fall back to the scalar
routine for special lanes. */
return v_call_f64 (exp, x, y, cmp);
}
#else
# define SpecialOffset v_u64 (0x6000000000000000) /* 0x1p513. */
/* SpecialBias1 - SpecialBias2 = asuint(1.0). */
# define SpecialBias1 v_u64 (0x7000000000000000) /* 0x1p769. */
# define SpecialBias2 v_u64 (0x3010000000000000) /* 0x1p-254. */
static inline float64x2_t VPCS_ATTR
special_case (float64x2_t s, float64x2_t y, float64x2_t n)
{
/* 2^(n/N) may overflow, break it up into s1*s2. */
uint64x2_t b = vandq_u64 (vcltzq_f64 (n), SpecialOffset);
float64x2_t s1 = vreinterpretq_f64_u64 (vsubq_u64 (SpecialBias1, b));
float64x2_t s2 = vreinterpretq_f64_u64 (
vaddq_u64 (vsubq_u64 (vreinterpretq_u64_f64 (s), SpecialBias2), b));
uint64x2_t cmp = vcagtq_f64 (n, data.scale_thresh);
float64x2_t r1 = vmulq_f64 (s1, s1);
float64x2_t r0 = vmulq_f64 (vfmaq_f64 (s2, y, s2), s1);
return vbslq_f64 (cmp, r1, r0);
}
#endif
float64x2_t VPCS_ATTR V_NAME_D1 (exp) (float64x2_t x)
{
float64x2_t n, r, r2, s, y, z;
uint64x2_t cmp, u, e;
#if WANT_SIMD_EXCEPT
/* If any lanes are special, mask them with 1 and retain a copy of x to allow
special_case to fix special lanes later. This is only necessary if fenv
exceptions are to be triggered correctly. */
float64x2_t xm = x;
uint64x2_t iax = vreinterpretq_u64_f64 (vabsq_f64 (x));
cmp = vcgeq_u64 (vsubq_u64 (iax, TinyBound), SpecialBound);
if (unlikely (v_any_u64 (cmp)))
x = vbslq_f64 (cmp, v_f64 (1), x);
#else
cmp = vcagtq_f64 (x, data.special_bound);
#endif
/* n = round(x/(ln2/N)). */
z = vfmaq_f64 (data.shift, x, data.inv_ln2);
u = vreinterpretq_u64_f64 (z);
n = vsubq_f64 (z, data.shift);
/* r = x - n*ln2/N. */
r = x;
r = vfmsq_f64 (r, data.ln2_hi, n);
r = vfmsq_f64 (r, data.ln2_lo, n);
e = vshlq_n_u64 (u, 52 - V_EXP_TABLE_BITS);
/* y = exp(r) - 1 ~= r + C0 r^2 + C1 r^3 + C2 r^4. */
r2 = vmulq_f64 (r, r);
y = vfmaq_f64 (C (0), C (1), r);
y = vfmaq_f64 (y, C (2), r2);
y = vfmaq_f64 (r, y, r2);
/* s = 2^(n/N). */
u = (uint64x2_t){ Tab[u[0] & IndexMask], Tab[u[1] & IndexMask] };
s = vreinterpretq_f64_u64 (vaddq_u64 (u, e));
if (unlikely (v_any_u64 (cmp)))
#if WANT_SIMD_EXCEPT
return special_case (xm, vfmaq_f64 (s, y, s), cmp);
#else
return special_case (s, y, n);
#endif
return vfmaq_f64 (s, y, s);
}
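The rounding step in the routine above (z = shift + x * N/ln2, then n = z - shift) can be illustrated in portable scalar C. This is only a sketch, not the library's code: round_via_shift and asuint64 are hypothetical helper names, and the sketch assumes round-to-nearest mode and |x| <= 2^50. It shows that adding 0x1.8p52 leaves the rounded integer in the low mantissa bits, which is why the vector code can take the table index straight from the bits of z.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double as an unsigned 64-bit integer. */
static uint64_t
asuint64 (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

/* Hypothetical helper: round x to the nearest integer (ties to even) by
   adding and subtracting 0x1.8p52. Assumes round-to-nearest mode and
   |x| <= 2^50. The same integer is also returned through as_int, recovered
   from the mantissa field of the intermediate sum. */
static double
round_via_shift (double x, int64_t *as_int)
{
  const double shift = 0x1.8p52;
  assert (fabs (x) <= 0x1p50);
  /* volatile keeps the compiler from simplifying (x + shift) - shift. */
  volatile double z = x + shift;
  /* In [2^52, 2^53) one ulp is 1, so the mantissa field holds 2^51 + n. */
  *as_int = (int64_t) (asuint64 (z) & 0x000fffffffffffffULL)
	    - ((int64_t) 1 << 51);
  return z - shift;
}
```

Because 2^51 is a multiple of the table size, masking the raw bits of z with N - 1 gives n mod N even for negative n, which matches the `u[0] & IndexMask` lookup in the source.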
-113
@@ -1,113 +0,0 @@
/*
* Single-precision vector 2^x function.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "mathlib.h"
#include "v_math.h"
static const struct data
{
float32x4_t poly[5];
uint32x4_t exponent_bias;
#if !WANT_SIMD_EXCEPT
float32x4_t special_bound, scale_thresh;
#endif
} data = {
/* maxerr: 1.962 ulp. */
.poly = { V4 (0x1.59977ap-10f), V4 (0x1.3ce9e4p-7f), V4 (0x1.c6bd32p-5f),
V4 (0x1.ebf9bcp-3f), V4 (0x1.62e422p-1f) },
.exponent_bias = V4 (0x3f800000),
#if !WANT_SIMD_EXCEPT
.special_bound = V4 (126.0f),
.scale_thresh = V4 (192.0f),
#endif
};
#define C(i) d->poly[i]
#if WANT_SIMD_EXCEPT
# define TinyBound v_u32 (0x20000000) /* asuint (0x1p-63). */
# define BigBound v_u32 (0x42800000) /* asuint (0x1p6). */
# define SpecialBound v_u32 (0x22800000) /* BigBound - TinyBound. */
static float32x4_t VPCS_ATTR NOINLINE
special_case (float32x4_t x, float32x4_t y, uint32x4_t cmp)
{
/* If fenv exceptions are to be triggered correctly, fall back to the scalar
routine for special lanes. */
return v_call_f32 (exp2f, x, y, cmp);
}
#else
# define SpecialOffset v_u32 (0x82000000)
# define SpecialBias v_u32 (0x7f000000)
static float32x4_t VPCS_ATTR NOINLINE
special_case (float32x4_t poly, float32x4_t n, uint32x4_t e, uint32x4_t cmp1,
float32x4_t scale, const struct data *d)
{
/* 2^n may overflow, break it up into s1*s2. */
uint32x4_t b = vandq_u32 (vclezq_f32 (n), SpecialOffset);
float32x4_t s1 = vreinterpretq_f32_u32 (vaddq_u32 (b, SpecialBias));
float32x4_t s2 = vreinterpretq_f32_u32 (vsubq_u32 (e, b));
uint32x4_t cmp2 = vcagtq_f32 (n, d->scale_thresh);
float32x4_t r2 = vmulq_f32 (s1, s1);
float32x4_t r1 = vmulq_f32 (vfmaq_f32 (s2, poly, s2), s1);
/* Similar to r1 but avoids double rounding in the subnormal range. */
float32x4_t r0 = vfmaq_f32 (scale, poly, scale);
float32x4_t r = vbslq_f32 (cmp1, r1, r0);
return vbslq_f32 (cmp2, r2, r);
}
#endif
float32x4_t VPCS_ATTR V_NAME_F1 (exp2) (float32x4_t x)
{
const struct data *d = ptr_barrier (&data);
float32x4_t n, r, r2, scale, p, q, poly;
uint32x4_t cmp, e;
#if WANT_SIMD_EXCEPT
/* asuint(|x|) - TinyBound >= BigBound - TinyBound. */
uint32x4_t ia = vreinterpretq_u32_f32 (vabsq_f32 (x));
cmp = vcgeq_u32 (vsubq_u32 (ia, TinyBound), SpecialBound);
float32x4_t xm = x;
/* If any lanes are special, mask them with 1 and retain a copy of x to allow
special_case to fix special lanes later. This is only necessary if fenv
exceptions are to be triggered correctly. */
if (unlikely (v_any_u32 (cmp)))
x = vbslq_f32 (cmp, v_f32 (1), x);
#endif
/* exp2(x) = 2^n (1 + poly(r)), with 1 + poly(r) in [1/sqrt(2),sqrt(2)]
x = n + r, with r in [-1/2, 1/2]. */
n = vrndaq_f32 (x);
r = vsubq_f32 (x, n);
e = vshlq_n_u32 (vreinterpretq_u32_s32 (vcvtaq_s32_f32 (x)), 23);
scale = vreinterpretq_f32_u32 (vaddq_u32 (e, d->exponent_bias));
#if !WANT_SIMD_EXCEPT
cmp = vcagtq_f32 (n, d->special_bound);
#endif
r2 = vmulq_f32 (r, r);
p = vfmaq_f32 (C (1), C (0), r);
q = vfmaq_f32 (C (3), C (2), r);
q = vfmaq_f32 (q, p, r2);
p = vmulq_f32 (C (4), r);
poly = vfmaq_f32 (p, q, r2);
if (unlikely (v_any_u32 (cmp)))
#if WANT_SIMD_EXCEPT
return special_case (xm, vfmaq_f32 (scale, poly, scale), cmp);
#else
return special_case (poly, n, e, cmp, scale, d);
#endif
return vfmaq_f32 (scale, poly, scale);
}
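The WANT_SIMD_EXCEPT branch above detects special lanes with a single unsigned comparison, asuint(|x|) - TinyBound >= BigBound - TinyBound. A scalar sketch of that check follows; is_special and asuint are hypothetical helper names, and the two bounds are copied from the source. Unsigned wrap-around makes values below TinyBound look very large, so one compare covers both ends of the range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TINY_BOUND 0x20000000u /* asuint (0x1p-63f). */
#define BIG_BOUND 0x42800000u  /* asuint (0x1p6f). */

/* Reinterpret the bits of a float as an unsigned 32-bit integer. */
static uint32_t
asuint (float x)
{
  uint32_t u;
  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&u, &x, sizeof u);
  return u;
}

/* Hypothetical helper: nonzero if |x| < 0x1p-63, |x| >= 0x1p6, or x is
   inf/NaN, i.e. the lanes the vector routine hands to the scalar fallback. */
static int
is_special (float x)
{
  uint32_t ia = asuint (x) & 0x7fffffffu; /* Clear the sign bit. */
  /* If ia < TINY_BOUND the subtraction wraps to a huge value, so a single
     compare handles both the tiny and the big case. */
  return ia - TINY_BOUND >= BIG_BOUND - TINY_BOUND;
}
```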
-72
@@ -1,72 +0,0 @@
/*
* Single-precision vector 2^x function.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "mathlib.h"
#include "v_math.h"
static const float Poly[] = {
/* maxerr: 0.878 ulp. */
0x1.416b5ep-13f, 0x1.5f082ep-10f, 0x1.3b2dep-7f, 0x1.c6af7cp-5f, 0x1.ebfbdcp-3f, 0x1.62e43p-1f
};
#define C0 v_f32 (Poly[0])
#define C1 v_f32 (Poly[1])
#define C2 v_f32 (Poly[2])
#define C3 v_f32 (Poly[3])
#define C4 v_f32 (Poly[4])
#define C5 v_f32 (Poly[5])
#define Shift v_f32 (0x1.8p23f)
#define InvLn2 v_f32 (0x1.715476p+0f)
#define Ln2hi v_f32 (0x1.62e4p-1f)
#define Ln2lo v_f32 (0x1.7f7d1cp-20f)
static float32x4_t VPCS_ATTR NOINLINE
specialcase (float32x4_t poly, float32x4_t n, uint32x4_t e, float32x4_t absn)
{
/* 2^n may overflow, break it up into s1*s2. */
uint32x4_t b = (n <= v_f32 (0.0f)) & v_u32 (0x83000000);
float32x4_t s1 = vreinterpretq_f32_u32 (v_u32 (0x7f000000) + b);
float32x4_t s2 = vreinterpretq_f32_u32 (e - b);
uint32x4_t cmp = absn > v_f32 (192.0f);
float32x4_t r1 = s1 * s1;
float32x4_t r0 = poly * s1 * s2;
return vreinterpretq_f32_u32 ((cmp & vreinterpretq_u32_f32 (r1))
| (~cmp & vreinterpretq_u32_f32 (r0)));
}
float32x4_t VPCS_ATTR
_ZGVnN4v_exp2f_1u (float32x4_t x)
{
float32x4_t n, r, scale, poly, absn;
uint32x4_t cmp, e;
/* exp2(x) = 2^n * poly(r), with poly(r) in [1/sqrt(2),sqrt(2)]
x = n + r, with r in [-1/2, 1/2]. */
#if 0
float32x4_t z;
z = x + Shift;
n = z - Shift;
r = x - n;
e = vreinterpretq_u32_f32 (z) << 23;
#else
n = vrndaq_f32 (x);
r = x - n;
e = vreinterpretq_u32_s32 (vcvtaq_s32_f32 (x)) << 23;
#endif
scale = vreinterpretq_f32_u32 (e + v_u32 (0x3f800000));
absn = vabsq_f32 (n);
cmp = absn > v_f32 (126.0f);
poly = vfmaq_f32 (C1, C0, r);
poly = vfmaq_f32 (C2, poly, r);
poly = vfmaq_f32 (C3, poly, r);
poly = vfmaq_f32 (C4, poly, r);
poly = vfmaq_f32 (C5, poly, r);
poly = vfmaq_f32 (v_f32 (1.0f), poly, r);
if (unlikely (v_any_u32 (cmp)))
return specialcase (poly, n, e, absn);
return scale * poly;
}
-146
@@ -1,146 +0,0 @@
/*
* Lookup table for double-precision e^x vector function.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "v_math.h"
# define N (1 << V_EXP_TABLE_BITS)
/* 2^(j/N), j=0..N-1, stored as
   asuint64 (2^(j/N)) - (j << (52 - V_EXP_TABLE_BITS)). */
const uint64_t __v_exp_data[] = {
# if N == 128
0x3ff0000000000000, 0x3feff63da9fb3335, 0x3fefec9a3e778061,
0x3fefe315e86e7f85, 0x3fefd9b0d3158574, 0x3fefd06b29ddf6de,
0x3fefc74518759bc8, 0x3fefbe3ecac6f383, 0x3fefb5586cf9890f,
0x3fefac922b7247f7, 0x3fefa3ec32d3d1a2, 0x3fef9b66affed31b,
0x3fef9301d0125b51, 0x3fef8abdc06c31cc, 0x3fef829aaea92de0,
0x3fef7a98c8a58e51, 0x3fef72b83c7d517b, 0x3fef6af9388c8dea,
0x3fef635beb6fcb75, 0x3fef5be084045cd4, 0x3fef54873168b9aa,
0x3fef4d5022fcd91d, 0x3fef463b88628cd6, 0x3fef3f49917ddc96,
0x3fef387a6e756238, 0x3fef31ce4fb2a63f, 0x3fef2b4565e27cdd,
0x3fef24dfe1f56381, 0x3fef1e9df51fdee1, 0x3fef187fd0dad990,
0x3fef1285a6e4030b, 0x3fef0cafa93e2f56, 0x3fef06fe0a31b715,
0x3fef0170fc4cd831, 0x3feefc08b26416ff, 0x3feef6c55f929ff1,
0x3feef1a7373aa9cb, 0x3feeecae6d05d866, 0x3feee7db34e59ff7,
0x3feee32dc313a8e5, 0x3feedea64c123422, 0x3feeda4504ac801c,
0x3feed60a21f72e2a, 0x3feed1f5d950a897, 0x3feece086061892d,
0x3feeca41ed1d0057, 0x3feec6a2b5c13cd0, 0x3feec32af0d7d3de,
0x3feebfdad5362a27, 0x3feebcb299fddd0d, 0x3feeb9b2769d2ca7,
0x3feeb6daa2cf6642, 0x3feeb42b569d4f82, 0x3feeb1a4ca5d920f,
0x3feeaf4736b527da, 0x3feead12d497c7fd, 0x3feeab07dd485429,
0x3feea9268a5946b7, 0x3feea76f15ad2148, 0x3feea5e1b976dc09,
0x3feea47eb03a5585, 0x3feea34634ccc320, 0x3feea23882552225,
0x3feea155d44ca973, 0x3feea09e667f3bcd, 0x3feea012750bdabf,
0x3fee9fb23c651a2f, 0x3fee9f7df9519484, 0x3fee9f75e8ec5f74,
0x3fee9f9a48a58174, 0x3fee9feb564267c9, 0x3feea0694fde5d3f,
0x3feea11473eb0187, 0x3feea1ed0130c132, 0x3feea2f336cf4e62,
0x3feea427543e1a12, 0x3feea589994cce13, 0x3feea71a4623c7ad,
0x3feea8d99b4492ed, 0x3feeaac7d98a6699, 0x3feeace5422aa0db,
0x3feeaf3216b5448c, 0x3feeb1ae99157736, 0x3feeb45b0b91ffc6,
0x3feeb737b0cdc5e5, 0x3feeba44cbc8520f, 0x3feebd829fde4e50,
0x3feec0f170ca07ba, 0x3feec49182a3f090, 0x3feec86319e32323,
0x3feecc667b5de565, 0x3feed09bec4a2d33, 0x3feed503b23e255d,
0x3feed99e1330b358, 0x3feede6b5579fdbf, 0x3feee36bbfd3f37a,
0x3feee89f995ad3ad, 0x3feeee07298db666, 0x3feef3a2b84f15fb,
0x3feef9728de5593a, 0x3feeff76f2fb5e47, 0x3fef05b030a1064a,
0x3fef0c1e904bc1d2, 0x3fef12c25bd71e09, 0x3fef199bdd85529c,
0x3fef20ab5fffd07a, 0x3fef27f12e57d14b, 0x3fef2f6d9406e7b5,
0x3fef3720dcef9069, 0x3fef3f0b555dc3fa, 0x3fef472d4a07897c,
0x3fef4f87080d89f2, 0x3fef5818dcfba487, 0x3fef60e316c98398,
0x3fef69e603db3285, 0x3fef7321f301b460, 0x3fef7c97337b9b5f,
0x3fef864614f5a129, 0x3fef902ee78b3ff6, 0x3fef9a51fbc74c83,
0x3fefa4afa2a490da, 0x3fefaf482d8e67f1, 0x3fefba1bee615a27,
0x3fefc52b376bba97, 0x3fefd0765b6e4540, 0x3fefdbfdad9cbe14,
0x3fefe7c1819e90d8, 0x3feff3c22b8f71f1,
# elif N == 256
0x3ff0000000000000, 0x3feffb1afa5abcbf, 0x3feff63da9fb3335,
0x3feff168143b0281, 0x3fefec9a3e778061, 0x3fefe7d42e11bbcc,
0x3fefe315e86e7f85, 0x3fefde5f72f654b1, 0x3fefd9b0d3158574,
0x3fefd50a0e3c1f89, 0x3fefd06b29ddf6de, 0x3fefcbd42b72a836,
0x3fefc74518759bc8, 0x3fefc2bdf66607e0, 0x3fefbe3ecac6f383,
0x3fefb9c79b1f3919, 0x3fefb5586cf9890f, 0x3fefb0f145e46c85,
0x3fefac922b7247f7, 0x3fefa83b23395dec, 0x3fefa3ec32d3d1a2,
0x3fef9fa55fdfa9c5, 0x3fef9b66affed31b, 0x3fef973028d7233e,
0x3fef9301d0125b51, 0x3fef8edbab5e2ab6, 0x3fef8abdc06c31cc,
0x3fef86a814f204ab, 0x3fef829aaea92de0, 0x3fef7e95934f312e,
0x3fef7a98c8a58e51, 0x3fef76a45471c3c2, 0x3fef72b83c7d517b,
0x3fef6ed48695bbc0, 0x3fef6af9388c8dea, 0x3fef672658375d2f,
0x3fef635beb6fcb75, 0x3fef5f99f8138a1c, 0x3fef5be084045cd4,
0x3fef582f95281c6b, 0x3fef54873168b9aa, 0x3fef50e75eb44027,
0x3fef4d5022fcd91d, 0x3fef49c18438ce4d, 0x3fef463b88628cd6,
0x3fef42be3578a819, 0x3fef3f49917ddc96, 0x3fef3bdda27912d1,
0x3fef387a6e756238, 0x3fef351ffb82140a, 0x3fef31ce4fb2a63f,
0x3fef2e85711ece75, 0x3fef2b4565e27cdd, 0x3fef280e341ddf29,
0x3fef24dfe1f56381, 0x3fef21ba7591bb70, 0x3fef1e9df51fdee1,
0x3fef1b8a66d10f13, 0x3fef187fd0dad990, 0x3fef157e39771b2f,
0x3fef1285a6e4030b, 0x3fef0f961f641589, 0x3fef0cafa93e2f56,
0x3fef09d24abd886b, 0x3fef06fe0a31b715, 0x3fef0432edeeb2fd,
0x3fef0170fc4cd831, 0x3feefeb83ba8ea32, 0x3feefc08b26416ff,
0x3feef96266e3fa2d, 0x3feef6c55f929ff1, 0x3feef431a2de883b,
0x3feef1a7373aa9cb, 0x3feeef26231e754a, 0x3feeecae6d05d866,
0x3feeea401b7140ef, 0x3feee7db34e59ff7, 0x3feee57fbfec6cf4,
0x3feee32dc313a8e5, 0x3feee0e544ede173, 0x3feedea64c123422,
0x3feedc70df1c5175, 0x3feeda4504ac801c, 0x3feed822c367a024,
0x3feed60a21f72e2a, 0x3feed3fb2709468a, 0x3feed1f5d950a897,
0x3feecffa3f84b9d4, 0x3feece086061892d, 0x3feecc2042a7d232,
0x3feeca41ed1d0057, 0x3feec86d668b3237, 0x3feec6a2b5c13cd0,
0x3feec4e1e192aed2, 0x3feec32af0d7d3de, 0x3feec17dea6db7d7,
0x3feebfdad5362a27, 0x3feebe41b817c114, 0x3feebcb299fddd0d,
0x3feebb2d81d8abff, 0x3feeb9b2769d2ca7, 0x3feeb8417f4531ee,
0x3feeb6daa2cf6642, 0x3feeb57de83f4eef, 0x3feeb42b569d4f82,
0x3feeb2e2f4f6ad27, 0x3feeb1a4ca5d920f, 0x3feeb070dde910d2,
0x3feeaf4736b527da, 0x3feeae27dbe2c4cf, 0x3feead12d497c7fd,
0x3feeac0827ff07cc, 0x3feeab07dd485429, 0x3feeaa11fba87a03,
0x3feea9268a5946b7, 0x3feea84590998b93, 0x3feea76f15ad2148,
0x3feea6a320dceb71, 0x3feea5e1b976dc09, 0x3feea52ae6cdf6f4,
0x3feea47eb03a5585, 0x3feea3dd1d1929fd, 0x3feea34634ccc320,
0x3feea2b9febc8fb7, 0x3feea23882552225, 0x3feea1c1c70833f6,
0x3feea155d44ca973, 0x3feea0f4b19e9538, 0x3feea09e667f3bcd,
0x3feea052fa75173e, 0x3feea012750bdabf, 0x3fee9fdcddd47645,
0x3fee9fb23c651a2f, 0x3fee9f9298593ae5, 0x3fee9f7df9519484,
0x3fee9f7466f42e87, 0x3fee9f75e8ec5f74, 0x3fee9f8286ead08a,
0x3fee9f9a48a58174, 0x3fee9fbd35d7cbfd, 0x3fee9feb564267c9,
0x3feea024b1ab6e09, 0x3feea0694fde5d3f, 0x3feea0b938ac1cf6,
0x3feea11473eb0187, 0x3feea17b0976cfdb, 0x3feea1ed0130c132,
0x3feea26a62ff86f0, 0x3feea2f336cf4e62, 0x3feea3878491c491,
0x3feea427543e1a12, 0x3feea4d2add106d9, 0x3feea589994cce13,
0x3feea64c1eb941f7, 0x3feea71a4623c7ad, 0x3feea7f4179f5b21,
0x3feea8d99b4492ed, 0x3feea9cad931a436, 0x3feeaac7d98a6699,
0x3feeabd0a478580f, 0x3feeace5422aa0db, 0x3feeae05bad61778,
0x3feeaf3216b5448c, 0x3feeb06a5e0866d9, 0x3feeb1ae99157736,
0x3feeb2fed0282c8a, 0x3feeb45b0b91ffc6, 0x3feeb5c353aa2fe2,
0x3feeb737b0cdc5e5, 0x3feeb8b82b5f98e5, 0x3feeba44cbc8520f,
0x3feebbdd9a7670b3, 0x3feebd829fde4e50, 0x3feebf33e47a22a2,
0x3feec0f170ca07ba, 0x3feec2bb4d53fe0d, 0x3feec49182a3f090,
0x3feec674194bb8d5, 0x3feec86319e32323, 0x3feeca5e8d07f29e,
0x3feecc667b5de565, 0x3feece7aed8eb8bb, 0x3feed09bec4a2d33,
0x3feed2c980460ad8, 0x3feed503b23e255d, 0x3feed74a8af46052,
0x3feed99e1330b358, 0x3feedbfe53c12e59, 0x3feede6b5579fdbf,
0x3feee0e521356eba, 0x3feee36bbfd3f37a, 0x3feee5ff3a3c2774,
0x3feee89f995ad3ad, 0x3feeeb4ce622f2ff, 0x3feeee07298db666,
0x3feef0ce6c9a8952, 0x3feef3a2b84f15fb, 0x3feef68415b749b1,
0x3feef9728de5593a, 0x3feefc6e29f1c52a, 0x3feeff76f2fb5e47,
0x3fef028cf22749e4, 0x3fef05b030a1064a, 0x3fef08e0b79a6f1f,
0x3fef0c1e904bc1d2, 0x3fef0f69c3f3a207, 0x3fef12c25bd71e09,
0x3fef16286141b33d, 0x3fef199bdd85529c, 0x3fef1d1cd9fa652c,
0x3fef20ab5fffd07a, 0x3fef244778fafb22, 0x3fef27f12e57d14b,
0x3fef2ba88988c933, 0x3fef2f6d9406e7b5, 0x3fef33405751c4db,
0x3fef3720dcef9069, 0x3fef3b0f2e6d1675, 0x3fef3f0b555dc3fa,
0x3fef43155b5bab74, 0x3fef472d4a07897c, 0x3fef4b532b08c968,
0x3fef4f87080d89f2, 0x3fef53c8eacaa1d6, 0x3fef5818dcfba487,
0x3fef5c76e862e6d3, 0x3fef60e316c98398, 0x3fef655d71ff6075,
0x3fef69e603db3285, 0x3fef6e7cd63a8315, 0x3fef7321f301b460,
0x3fef77d5641c0658, 0x3fef7c97337b9b5f, 0x3fef81676b197d17,
0x3fef864614f5a129, 0x3fef8b333b16ee12, 0x3fef902ee78b3ff6,
0x3fef953924676d76, 0x3fef9a51fbc74c83, 0x3fef9f7977cdb740,
0x3fefa4afa2a490da, 0x3fefa9f4867cca6e, 0x3fefaf482d8e67f1,
0x3fefb4aaa2188510, 0x3fefba1bee615a27, 0x3fefbf9c1cb6412a,
0x3fefc52b376bba97, 0x3fefcac948dd7274, 0x3fefd0765b6e4540,
0x3fefd632798844f8, 0x3fefdbfdad9cbe14, 0x3fefe1d802243c89,
0x3fefe7c1819e90d8, 0x3fefedba3692d514, 0x3feff3c22b8f71f1,
0x3feff9d96b2a23d9,
# endif
};
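The way the double-precision exp routine turns a table entry into 2^(n/N) (add the integer n, shifted left by 52 - V_EXP_TABLE_BITS, to the entry) can be tried out with a much smaller table. The sketch below is illustrative only: DEMO_TABLE_BITS, demo_tab and exp2_scaled are hypothetical names, and the four entries are entries 0, 32, 64 and 96 of the N == 128 table above, which happen to form a valid 4-entry table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-entry table; the source uses N == 128 or N == 256. */
#define DEMO_TABLE_BITS 2
#define DEMO_N (1 << DEMO_TABLE_BITS)

/* Entries 0, 32, 64 and 96 of the N == 128 table, which equal
   asuint64 (2^(j/4)) - (j << 50) for j = 0..3. */
static const uint64_t demo_tab[DEMO_N] = {
  0x3ff0000000000000, 0x3fef06fe0a31b715,
  0x3feea09e667f3bcd, 0x3feee89f995ad3ad,
};

/* Reinterpret an unsigned 64-bit integer as a double. */
static double
asdouble (uint64_t u)
{
  double x;
  memcpy (&x, &u, sizeof x);
  return x;
}

/* Hypothetical helper: return 2^(n / DEMO_N) for n whose result is a
   normal double. */
static double
exp2_scaled (int64_t n)
{
  assert (n > -1022 * DEMO_N && n < 1024 * DEMO_N);
  uint64_t u = (uint64_t) n;
  /* The low DEMO_TABLE_BITS bits of n land just below the exponent field
     and cancel the (j << 50) that was subtracted from the table entry;
     the remaining bits of n add floor (n / DEMO_N) to the exponent. */
  uint64_t e = u << (52 - DEMO_TABLE_BITS);
  return asdouble (demo_tab[u & (DEMO_N - 1)] + e);
}
```

Storing the entries with the index bits already subtracted lets the routine use one shift and one add instead of separating n into an index and an exponent.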
-122
@@ -1,122 +0,0 @@
/*
* Single-precision vector e^x function.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "mathlib.h"
#include "v_math.h"
static const struct data
{
float32x4_t poly[5];
float32x4_t shift, inv_ln2, ln2_hi, ln2_lo;
uint32x4_t exponent_bias;
#if !WANT_SIMD_EXCEPT
float32x4_t special_bound, scale_thresh;
#endif
} data = {
/* maxerr: 1.45358 +0.5 ulp. */
.poly = { V4 (0x1.0e4020p-7f), V4 (0x1.573e2ep-5f), V4 (0x1.555e66p-3f),
V4 (0x1.fffdb6p-2f), V4 (0x1.ffffecp-1f) },
.shift = V4 (0x1.8p23f),
.inv_ln2 = V4 (0x1.715476p+0f),
.ln2_hi = V4 (0x1.62e4p-1f),
.ln2_lo = V4 (0x1.7f7d1cp-20f),
.exponent_bias = V4 (0x3f800000),
#if !WANT_SIMD_EXCEPT
.special_bound = V4 (126.0f),
.scale_thresh = V4 (192.0f),
#endif
};
#define C(i) d->poly[i]
#if WANT_SIMD_EXCEPT
# define TinyBound v_u32 (0x20000000) /* asuint (0x1p-63). */
# define BigBound v_u32 (0x42800000) /* asuint (0x1p6). */
# define SpecialBound v_u32 (0x22800000) /* BigBound - TinyBound. */
static float32x4_t VPCS_ATTR NOINLINE
special_case (float32x4_t x, float32x4_t y, uint32x4_t cmp)
{
/* If fenv exceptions are to be triggered correctly, fall back to the scalar
routine for special lanes. */
return v_call_f32 (expf, x, y, cmp);
}
#else
# define SpecialOffset v_u32 (0x82000000)
# define SpecialBias v_u32 (0x7f000000)
static float32x4_t VPCS_ATTR NOINLINE
special_case (float32x4_t poly, float32x4_t n, uint32x4_t e, uint32x4_t cmp1,
float32x4_t scale, const struct data *d)
{
/* 2^n may overflow, break it up into s1*s2. */
uint32x4_t b = vandq_u32 (vclezq_f32 (n), SpecialOffset);
float32x4_t s1 = vreinterpretq_f32_u32 (vaddq_u32 (b, SpecialBias));
float32x4_t s2 = vreinterpretq_f32_u32 (vsubq_u32 (e, b));
uint32x4_t cmp2 = vcagtq_f32 (n, d->scale_thresh);
float32x4_t r2 = vmulq_f32 (s1, s1);
float32x4_t r1 = vmulq_f32 (vfmaq_f32 (s2, poly, s2), s1);
/* Similar to r1 but avoids double rounding in the subnormal range. */
float32x4_t r0 = vfmaq_f32 (scale, poly, scale);
float32x4_t r = vbslq_f32 (cmp1, r1, r0);
return vbslq_f32 (cmp2, r2, r);
}
#endif
float32x4_t VPCS_ATTR V_NAME_F1 (exp) (float32x4_t x)
{
const struct data *d = ptr_barrier (&data);
float32x4_t n, r, r2, scale, p, q, poly, z;
uint32x4_t cmp, e;
#if WANT_SIMD_EXCEPT
/* asuint(|x|) - TinyBound >= BigBound - TinyBound. */
cmp = vcgeq_u32 (
vsubq_u32 (vandq_u32 (vreinterpretq_u32_f32 (x), v_u32 (0x7fffffff)),
TinyBound),
SpecialBound);
float32x4_t xm = x;
/* If any lanes are special, mask them with 1 and retain a copy of x to allow
special case handler to fix special lanes later. This is only necessary if
fenv exceptions are to be triggered correctly. */
if (unlikely (v_any_u32 (cmp)))
x = vbslq_f32 (cmp, v_f32 (1), x);
#endif
/* exp(x) = 2^n (1 + poly(r)), with 1 + poly(r) in [1/sqrt(2),sqrt(2)]
x = ln2*n + r, with r in [-ln2/2, ln2/2]. */
z = vfmaq_f32 (d->shift, x, d->inv_ln2);
n = vsubq_f32 (z, d->shift);
r = vfmsq_f32 (x, n, d->ln2_hi);
r = vfmsq_f32 (r, n, d->ln2_lo);
e = vshlq_n_u32 (vreinterpretq_u32_f32 (z), 23);
scale = vreinterpretq_f32_u32 (vaddq_u32 (e, d->exponent_bias));
#if !WANT_SIMD_EXCEPT
cmp = vcagtq_f32 (n, d->special_bound);
#endif
r2 = vmulq_f32 (r, r);
p = vfmaq_f32 (C (1), C (0), r);
q = vfmaq_f32 (C (3), C (2), r);
q = vfmaq_f32 (q, p, r2);
p = vmulq_f32 (C (4), r);
poly = vfmaq_f32 (p, q, r2);
if (unlikely (v_any_u32 (cmp)))
#if WANT_SIMD_EXCEPT
return special_case (xm, vfmaq_f32 (scale, poly, scale), cmp);
#else
return special_case (poly, n, e, cmp, scale, d);
#endif
return vfmaq_f32 (scale, poly, scale);
}
-77
@@ -1,77 +0,0 @@
/*
* Single-precision vector e^x function.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "mathlib.h"
#include "v_math.h"
static const float Poly[] = {
/* maxerr: 0.36565 +0.5 ulp. */
0x1.6a6000p-10f,
0x1.12718ep-7f,
0x1.555af0p-5f,
0x1.555430p-3f,
0x1.fffff4p-2f,
};
#define C0 v_f32 (Poly[0])
#define C1 v_f32 (Poly[1])
#define C2 v_f32 (Poly[2])
#define C3 v_f32 (Poly[3])
#define C4 v_f32 (Poly[4])
#define Shift v_f32 (0x1.8p23f)
#define InvLn2 v_f32 (0x1.715476p+0f)
#define Ln2hi v_f32 (0x1.62e4p-1f)
#define Ln2lo v_f32 (0x1.7f7d1cp-20f)
static float32x4_t VPCS_ATTR NOINLINE
specialcase (float32x4_t poly, float32x4_t n, uint32x4_t e, float32x4_t absn)
{
/* 2^n may overflow, break it up into s1*s2. */
uint32x4_t b = (n <= v_f32 (0.0f)) & v_u32 (0x83000000);
float32x4_t s1 = vreinterpretq_f32_u32 (v_u32 (0x7f000000) + b);
float32x4_t s2 = vreinterpretq_f32_u32 (e - b);
uint32x4_t cmp = absn > v_f32 (192.0f);
float32x4_t r1 = s1 * s1;
float32x4_t r0 = poly * s1 * s2;
return vreinterpretq_f32_u32 ((cmp & vreinterpretq_u32_f32 (r1))
| (~cmp & vreinterpretq_u32_f32 (r0)));
}
float32x4_t VPCS_ATTR
_ZGVnN4v_expf_1u (float32x4_t x)
{
float32x4_t n, r, scale, poly, absn, z;
uint32x4_t cmp, e;
/* exp(x) = 2^n * poly(r), with poly(r) in [1/sqrt(2),sqrt(2)]
x = ln2*n + r, with r in [-ln2/2, ln2/2]. */
#if 1
z = vfmaq_f32 (Shift, x, InvLn2);
n = z - Shift;
r = vfmaq_f32 (x, n, -Ln2hi);
r = vfmaq_f32 (r, n, -Ln2lo);
e = vreinterpretq_u32_f32 (z) << 23;
#else
z = x * InvLn2;
n = vrndaq_f32 (z);
r = vfmaq_f32 (x, n, -Ln2hi);
r = vfmaq_f32 (r, n, -Ln2lo);
e = vreinterpretq_u32_s32 (vcvtaq_s32_f32 (z)) << 23;
#endif
scale = vreinterpretq_f32_u32 (e + v_u32 (0x3f800000));
absn = vabsq_f32 (n);
cmp = absn > v_f32 (126.0f);
poly = vfmaq_f32 (C1, C0, r);
poly = vfmaq_f32 (C2, poly, r);
poly = vfmaq_f32 (C3, poly, r);
poly = vfmaq_f32 (C4, poly, r);
poly = vfmaq_f32 (v_f32 (1.0f), poly, r);
poly = vfmaq_f32 (v_f32 (1.0f), poly, r);
if (unlikely (v_any_u32 (cmp)))
return specialcase (poly, n, e, absn);
return scale * poly;
}
-100
@@ -1,100 +0,0 @@
/*
* Double-precision vector log(x) function.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "mathlib.h"
#include "v_math.h"
static const struct data
{
uint64x2_t min_norm;
uint32x4_t special_bound;
float64x2_t poly[5];
float64x2_t ln2;
uint64x2_t sign_exp_mask;
} data = {
/* Worst-case error: 1.17 + 0.5 ulp.
Rel error: 0x1.6272e588p-56 in [ -0x1.fc1p-9 0x1.009p-8 ]. */
.poly = { V2 (-0x1.ffffffffffff7p-2), V2 (0x1.55555555170d4p-2),
V2 (-0x1.0000000399c27p-2), V2 (0x1.999b2e90e94cap-3),
V2 (-0x1.554e550bd501ep-3) },
.ln2 = V2 (0x1.62e42fefa39efp-1),
.min_norm = V2 (0x0010000000000000),
.special_bound = V4 (0x7fe00000), /* asuint64(inf) - min_norm. */
.sign_exp_mask = V2 (0xfff0000000000000)
};
#define A(i) d->poly[i]
#define N (1 << V_LOG_TABLE_BITS)
#define IndexMask (N - 1)
#define Off v_u64 (0x3fe6900900000000)
struct entry
{
float64x2_t invc;
float64x2_t logc;
};
static inline struct entry
lookup (uint64x2_t i)
{
/* Since N is a power of 2, n % N = n & (N - 1). */
struct entry e;
uint64_t i0 = (i[0] >> (52 - V_LOG_TABLE_BITS)) & IndexMask;
uint64_t i1 = (i[1] >> (52 - V_LOG_TABLE_BITS)) & IndexMask;
float64x2_t e0 = vld1q_f64 (&__v_log_data.table[i0].invc);
float64x2_t e1 = vld1q_f64 (&__v_log_data.table[i1].invc);
e.invc = vuzp1q_f64 (e0, e1);
e.logc = vuzp2q_f64 (e0, e1);
return e;
}
static float64x2_t VPCS_ATTR NOINLINE
special_case (float64x2_t x, float64x2_t y, float64x2_t hi, float64x2_t r2,
uint32x2_t cmp)
{
return v_call_f64 (log, x, vfmaq_f64 (hi, y, r2), vmovl_u32 (cmp));
}
float64x2_t VPCS_ATTR V_NAME_D1 (log) (float64x2_t x)
{
const struct data *d = ptr_barrier (&data);
float64x2_t z, r, r2, p, y, kd, hi;
uint64x2_t ix, iz, tmp;
uint32x2_t cmp;
int64x2_t k;
struct entry e;
ix = vreinterpretq_u64_f64 (x);
cmp = vcge_u32 (vsubhn_u64 (ix, d->min_norm),
vget_low_u32 (d->special_bound));
/* x = 2^k z; where z is in range [Off,2*Off) and exact.
The range is split into N subintervals.
The ith subinterval contains z and c is near its center. */
tmp = vsubq_u64 (ix, Off);
k = vshrq_n_s64 (vreinterpretq_s64_u64 (tmp), 52); /* arithmetic shift. */
iz = vsubq_u64 (ix, vandq_u64 (tmp, d->sign_exp_mask));
z = vreinterpretq_f64_u64 (iz);
e = lookup (tmp);
/* log(x) = log1p(z/c-1) + log(c) + k*Ln2. */
r = vfmaq_f64 (v_f64 (-1.0), z, e.invc);
kd = vcvtq_f64_s64 (k);
/* hi = r + log(c) + k*Ln2. */
hi = vfmaq_f64 (vaddq_f64 (e.logc, r), kd, d->ln2);
/* y = r2*(A0 + r*A1 + r2*(A2 + r*A3 + r2*A4)) + hi. */
r2 = vmulq_f64 (r, r);
y = vfmaq_f64 (A (2), A (3), r);
p = vfmaq_f64 (A (0), A (1), r);
y = vfmaq_f64 (y, A (4), r2);
y = vfmaq_f64 (p, y, r2);
if (unlikely (v_any_u32h (cmp)))
return special_case (x, y, hi, r2, cmp);
return vfmaq_f64 (hi, y, r2);
}
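The argument reduction used by the double-precision log routine above (x = 2^k z with z in [Off, 2*Off), plus a table index taken from the top mantissa bits) can be sketched in scalar C. The names reduce_log_arg, struct reduced and DEMO_LOG_TABLE_BITS are hypothetical; the value 7 is an assumption based on the table comment that states N=128, and Off and the sign/exponent mask are copied from the source. The sketch only handles positive normal inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_LOG_TABLE_BITS 7 /* Assumed: the table comment says N=128. */
#define OFF 0x3fe6900900000000ULL
#define SIGN_EXP_MASK 0xfff0000000000000ULL

struct reduced
{
  double z;	  /* Reduced argument in [Off, 2*Off). */
  int64_t k;	  /* Exponent such that x = 2^k * z. */
  unsigned index; /* Subinterval index into the lookup table. */
};

static uint64_t
asuint64 (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

static double
asdouble (uint64_t u)
{
  double x;
  memcpy (&x, &u, sizeof x);
  return x;
}

/* Hypothetical helper mirroring the reduction for one positive normal x. */
static struct reduced
reduce_log_arg (double x)
{
  struct reduced out;
  assert (x >= 0x1p-1022 && x <= 0x1.fffffffffffffp1023);
  uint64_t ix = asuint64 (x);
  uint64_t tmp = ix - OFF;
  /* Relies on >> of a negative int64_t being an arithmetic shift, as the
     vector code does with vshrq_n_s64. */
  out.k = (int64_t) tmp >> 52;
  out.z = asdouble (ix - (tmp & SIGN_EXP_MASK));
  out.index = (unsigned) ((tmp >> (52 - DEMO_LOG_TABLE_BITS))
			  & ((1u << DEMO_LOG_TABLE_BITS) - 1));
  return out;
}
```

With these constants, x = 1.0 maps to index 75, which is the { 1.0, 0.0 } row of the lookup table in the source.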
-156
@@ -1,156 +0,0 @@
/*
* Lookup table for double-precision log(x) vector function.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "v_math.h"
#define N (1 << V_LOG_TABLE_BITS)
const struct v_log_data __v_log_data = {
/* Algorithm:
x = 2^k z
log(x) = k ln2 + log(c) + poly(z/c - 1)
where z is in [a;2a) which is split into N subintervals (a=0x1.69009p-1,
N=128) and log(c) and 1/c for the ith subinterval come from lookup tables:
table[i].invc = 1/c
table[i].logc = (double)log(c)
where c is near the center of the subinterval and is chosen by trying several
floating point invc candidates around 1/center and selecting one for which
the error in (double)log(c) is minimized (< 0x1p-74), except the subinterval
that contains 1 and the previous one got tweaked to avoid cancellation. */
.table = { { 0x1.6a133d0dec120p+0, -0x1.62fe995eb963ap-2 },
{ 0x1.6815f2f3e42edp+0, -0x1.5d5a48dad6b67p-2 },
{ 0x1.661e39be1ac9ep+0, -0x1.57bde257d2769p-2 },
{ 0x1.642bfa30ac371p+0, -0x1.52294fbf2af55p-2 },
{ 0x1.623f1d916f323p+0, -0x1.4c9c7b598aa38p-2 },
{ 0x1.60578da220f65p+0, -0x1.47174fc5ff560p-2 },
{ 0x1.5e75349dea571p+0, -0x1.4199b7fa7b5cap-2 },
{ 0x1.5c97fd387a75ap+0, -0x1.3c239f48cfb99p-2 },
{ 0x1.5abfd2981f200p+0, -0x1.36b4f154d2aebp-2 },
{ 0x1.58eca051dc99cp+0, -0x1.314d9a0ff32fbp-2 },
{ 0x1.571e526d9df12p+0, -0x1.2bed85cca3cffp-2 },
{ 0x1.5554d555b3fcbp+0, -0x1.2694a11421af9p-2 },
{ 0x1.539015e2a20cdp+0, -0x1.2142d8d014fb2p-2 },
{ 0x1.51d0014ee0164p+0, -0x1.1bf81a2c77776p-2 },
{ 0x1.50148538cd9eep+0, -0x1.16b452a39c6a4p-2 },
{ 0x1.4e5d8f9f698a1p+0, -0x1.11776ffa6c67ep-2 },
{ 0x1.4cab0edca66bep+0, -0x1.0c416035020e0p-2 },
{ 0x1.4afcf1a9db874p+0, -0x1.071211aa10fdap-2 },
{ 0x1.495327136e16fp+0, -0x1.01e972e293b1bp-2 },
{ 0x1.47ad9e84af28fp+0, -0x1.f98ee587fd434p-3 },
{ 0x1.460c47b39ae15p+0, -0x1.ef5800ad716fbp-3 },
{ 0x1.446f12b278001p+0, -0x1.e52e160484698p-3 },
{ 0x1.42d5efdd720ecp+0, -0x1.db1104b19352ep-3 },
{ 0x1.4140cfe001a0fp+0, -0x1.d100ac59e0bd6p-3 },
{ 0x1.3fafa3b421f69p+0, -0x1.c6fced287c3bdp-3 },
{ 0x1.3e225c9c8ece5p+0, -0x1.bd05a7b317c29p-3 },
{ 0x1.3c98ec29a211ap+0, -0x1.b31abd229164fp-3 },
{ 0x1.3b13442a413fep+0, -0x1.a93c0edadb0a3p-3 },
{ 0x1.399156baa3c54p+0, -0x1.9f697ee30d7ddp-3 },
{ 0x1.38131639b4cdbp+0, -0x1.95a2efa9aa40ap-3 },
{ 0x1.36987540fbf53p+0, -0x1.8be843d796044p-3 },
{ 0x1.352166b648f61p+0, -0x1.82395ecc477edp-3 },
{ 0x1.33adddb3eb575p+0, -0x1.7896240966422p-3 },
{ 0x1.323dcd99fc1d3p+0, -0x1.6efe77aca8c55p-3 },
{ 0x1.30d129fefc7d2p+0, -0x1.65723e117ec5cp-3 },
{ 0x1.2f67e6b72fe7dp+0, -0x1.5bf15c0955706p-3 },
{ 0x1.2e01f7cf8b187p+0, -0x1.527bb6c111da1p-3 },
{ 0x1.2c9f518ddc86ep+0, -0x1.491133c939f8fp-3 },
{ 0x1.2b3fe86e5f413p+0, -0x1.3fb1b90c7fc58p-3 },
{ 0x1.29e3b1211b25cp+0, -0x1.365d2cc485f8dp-3 },
{ 0x1.288aa08b373cfp+0, -0x1.2d13758970de7p-3 },
{ 0x1.2734abcaa8467p+0, -0x1.23d47a721fd47p-3 },
{ 0x1.25e1c82459b81p+0, -0x1.1aa0229f25ec2p-3 },
{ 0x1.2491eb1ad59c5p+0, -0x1.117655ddebc3bp-3 },
{ 0x1.23450a54048b5p+0, -0x1.0856fbf83ab6bp-3 },
{ 0x1.21fb1bb09e578p+0, -0x1.fe83fabbaa106p-4 },
{ 0x1.20b415346d8f7p+0, -0x1.ec6e8507a56cdp-4 },
{ 0x1.1f6fed179a1acp+0, -0x1.da6d68c7cc2eap-4 },
{ 0x1.1e2e99b93c7b3p+0, -0x1.c88078462be0cp-4 },
{ 0x1.1cf011a7a882ap+0, -0x1.b6a786a423565p-4 },
{ 0x1.1bb44b97dba5ap+0, -0x1.a4e2676ac7f85p-4 },
{ 0x1.1a7b3e66cdd4fp+0, -0x1.9330eea777e76p-4 },
{ 0x1.1944e11dc56cdp+0, -0x1.8192f134d5ad9p-4 },
{ 0x1.18112aebb1a6ep+0, -0x1.70084464f0538p-4 },
{ 0x1.16e013231b7e9p+0, -0x1.5e90bdec5cb1fp-4 },
{ 0x1.15b1913f156cfp+0, -0x1.4d2c3433c5536p-4 },
{ 0x1.14859cdedde13p+0, -0x1.3bda7e219879ap-4 },
{ 0x1.135c2dc68cfa4p+0, -0x1.2a9b732d27194p-4 },
{ 0x1.12353bdb01684p+0, -0x1.196eeb2b10807p-4 },
{ 0x1.1110bf25b85b4p+0, -0x1.0854be8ef8a7ep-4 },
{ 0x1.0feeafd2f8577p+0, -0x1.ee998cb277432p-5 },
{ 0x1.0ecf062c51c3bp+0, -0x1.ccadb79919fb9p-5 },
{ 0x1.0db1baa076c8bp+0, -0x1.aae5b1d8618b0p-5 },
{ 0x1.0c96c5bb3048ep+0, -0x1.89413015d7442p-5 },
{ 0x1.0b7e20263e070p+0, -0x1.67bfe7bf158dep-5 },
{ 0x1.0a67c2acd0ce3p+0, -0x1.46618f83941bep-5 },
{ 0x1.0953a6391e982p+0, -0x1.2525df1b0618ap-5 },
{ 0x1.0841c3caea380p+0, -0x1.040c8e2f77c6ap-5 },
{ 0x1.07321489b13eap+0, -0x1.c62aad39f738ap-6 },
{ 0x1.062491aee9904p+0, -0x1.847fe3bdead9cp-6 },
{ 0x1.05193497a7cc5p+0, -0x1.43183683400acp-6 },
{ 0x1.040ff6b5f5e9fp+0, -0x1.01f31c4e1d544p-6 },
{ 0x1.0308d19aa6127p+0, -0x1.82201d1e6b69ap-7 },
{ 0x1.0203beedb0c67p+0, -0x1.00dd0f3e1bfd6p-7 },
{ 0x1.010037d38bcc2p+0, -0x1.ff6fe1feb4e53p-9 },
{ 1.0, 0.0 },
{ 0x1.fc06d493cca10p-1, 0x1.fe91885ec8e20p-8 },
{ 0x1.f81e6ac3b918fp-1, 0x1.fc516f716296dp-7 },
{ 0x1.f44546ef18996p-1, 0x1.7bb4dd70a015bp-6 },
{ 0x1.f07b10382c84bp-1, 0x1.f84c99b34b674p-6 },
{ 0x1.ecbf7070e59d4p-1, 0x1.39f9ce4fb2d71p-5 },
{ 0x1.e91213f715939p-1, 0x1.7756c0fd22e78p-5 },
{ 0x1.e572a9a75f7b7p-1, 0x1.b43ee82db8f3ap-5 },
{ 0x1.e1e0e2c530207p-1, 0x1.f0b3fced60034p-5 },
{ 0x1.de5c72d8a8be3p-1, 0x1.165bd78d4878ep-4 },
{ 0x1.dae50fa5658ccp-1, 0x1.3425d2715ebe6p-4 },
{ 0x1.d77a71145a2dap-1, 0x1.51b8bd91b7915p-4 },
{ 0x1.d41c51166623ep-1, 0x1.6f15632c76a47p-4 },
{ 0x1.d0ca6ba0bb29fp-1, 0x1.8c3c88ecbe503p-4 },
{ 0x1.cd847e8e59681p-1, 0x1.a92ef077625dap-4 },
{ 0x1.ca4a499693e00p-1, 0x1.c5ed5745fa006p-4 },
{ 0x1.c71b8e399e821p-1, 0x1.e27876de1c993p-4 },
{ 0x1.c3f80faf19077p-1, 0x1.fed104fce4cdcp-4 },
{ 0x1.c0df92dc2b0ecp-1, 0x1.0d7bd9c17d78bp-3 },
{ 0x1.bdd1de3cbb542p-1, 0x1.1b76986cef97bp-3 },
{ 0x1.baceb9e1007a3p-1, 0x1.295913d24f750p-3 },
{ 0x1.b7d5ef543e55ep-1, 0x1.37239fa295d17p-3 },
{ 0x1.b4e749977d953p-1, 0x1.44d68dd78714bp-3 },
{ 0x1.b20295155478ep-1, 0x1.52722ebe5d780p-3 },
{ 0x1.af279f8e82be2p-1, 0x1.5ff6d12671f98p-3 },
{ 0x1.ac5638197fdf3p-1, 0x1.6d64c2389484bp-3 },
{ 0x1.a98e2f102e087p-1, 0x1.7abc4da40fddap-3 },
{ 0x1.a6cf5606d05c1p-1, 0x1.87fdbda1e8452p-3 },
{ 0x1.a4197fc04d746p-1, 0x1.95295b06a5f37p-3 },
{ 0x1.a16c80293dc01p-1, 0x1.a23f6d34abbc5p-3 },
{ 0x1.9ec82c4dc5bc9p-1, 0x1.af403a28e04f2p-3 },
{ 0x1.9c2c5a491f534p-1, 0x1.bc2c06a85721ap-3 },
{ 0x1.9998e1480b618p-1, 0x1.c903161240163p-3 },
{ 0x1.970d9977c6c2dp-1, 0x1.d5c5aa93287ebp-3 },
{ 0x1.948a5c023d212p-1, 0x1.e274051823fa9p-3 },
{ 0x1.920f0303d6809p-1, 0x1.ef0e656300c16p-3 },
{ 0x1.8f9b698a98b45p-1, 0x1.fb9509f05aa2ap-3 },
{ 0x1.8d2f6b81726f6p-1, 0x1.04041821f37afp-2 },
{ 0x1.8acae5bb55badp-1, 0x1.0a340a49b3029p-2 },
{ 0x1.886db5d9275b8p-1, 0x1.105a7918a126dp-2 },
{ 0x1.8617ba567c13cp-1, 0x1.1677819812b84p-2 },
{ 0x1.83c8d27487800p-1, 0x1.1c8b405b40c0ep-2 },
{ 0x1.8180de3c5dbe7p-1, 0x1.2295d16cfa6b1p-2 },
{ 0x1.7f3fbe71cdb71p-1, 0x1.28975066318a2p-2 },
{ 0x1.7d055498071c1p-1, 0x1.2e8fd855d86fcp-2 },
{ 0x1.7ad182e54f65ap-1, 0x1.347f83d605e59p-2 },
{ 0x1.78a42c3c90125p-1, 0x1.3a666d1244588p-2 },
{ 0x1.767d342f76944p-1, 0x1.4044adb6f8ec4p-2 },
{ 0x1.745c7ef26b00ap-1, 0x1.461a5f077558cp-2 },
{ 0x1.7241f15769d0fp-1, 0x1.4be799e20b9c8p-2 },
{ 0x1.702d70d396e41p-1, 0x1.51ac76a6b79dfp-2 },
{ 0x1.6e1ee3700cd11p-1, 0x1.57690d5744a45p-2 },
{ 0x1.6c162fc9cbe02p-1, 0x1.5d1d758e45217p-2 } }
};
-74
@@ -1,74 +0,0 @@
/*
* Single-precision vector log function.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "mathlib.h"
#include "v_math.h"
static const struct data
{
uint32x4_t min_norm;
uint16x8_t special_bound;
float32x4_t poly[7];
float32x4_t ln2, tiny_bound;
uint32x4_t off, mantissa_mask;
} data = {
/* 3.34 ulp error. */
.poly = { V4 (-0x1.3e737cp-3f), V4 (0x1.5a9aa2p-3f), V4 (-0x1.4f9934p-3f),
V4 (0x1.961348p-3f), V4 (-0x1.00187cp-2f), V4 (0x1.555d7cp-2f),
V4 (-0x1.ffffc8p-2f) },
.ln2 = V4 (0x1.62e43p-1f),
.tiny_bound = V4 (0x1p-126),
.min_norm = V4 (0x00800000),
.special_bound = V8 (0x7f00), /* asuint32(inf) - min_norm. */
.off = V4 (0x3f2aaaab), /* 0.666667. */
.mantissa_mask = V4 (0x007fffff)
};
#define P(i) d->poly[7 - i]
static float32x4_t VPCS_ATTR NOINLINE
special_case (float32x4_t x, float32x4_t y, float32x4_t r2, float32x4_t p,
uint16x4_t cmp)
{
/* Fall back to scalar code. */
return v_call_f32 (logf, x, vfmaq_f32 (p, y, r2), vmovl_u16 (cmp));
}
float32x4_t VPCS_ATTR V_NAME_F1 (log) (float32x4_t x)
{
const struct data *d = ptr_barrier (&data);
float32x4_t n, p, q, r, r2, y;
uint32x4_t u;
uint16x4_t cmp;
u = vreinterpretq_u32_f32 (x);
cmp = vcge_u16 (vsubhn_u32 (u, d->min_norm),
vget_low_u16 (d->special_bound));
/* x = 2^n * (1+r), where 2/3 < 1+r < 4/3. */
u = vsubq_u32 (u, d->off);
n = vcvtq_f32_s32 (
vshrq_n_s32 (vreinterpretq_s32_u32 (u), 23)); /* signextend. */
u = vandq_u32 (u, d->mantissa_mask);
u = vaddq_u32 (u, d->off);
r = vsubq_f32 (vreinterpretq_f32_u32 (u), v_f32 (1.0f));
/* y = log(1+r) + n*ln2. */
r2 = vmulq_f32 (r, r);
/* n*ln2 + r + r2*(P1 + r*P2 + r2*(P3 + r*P4 + r2*(P5 + r*P6 + r2*P7))). */
p = vfmaq_f32 (P (5), P (6), r);
q = vfmaq_f32 (P (3), P (4), r);
y = vfmaq_f32 (P (1), P (2), r);
p = vfmaq_f32 (p, P (7), r2);
q = vfmaq_f32 (q, p, r2);
y = vfmaq_f32 (y, q, r2);
p = vfmaq_f32 (r, d->ln2, n);
if (unlikely (v_any_u16h (cmp)))
return special_case (x, y, r2, p, cmp);
return vfmaq_f32 (p, y, r2);
}
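The single-precision log routine above reduces x to 2^n * (1+r) with 2/3 < 1+r < 4/3 using only integer operations on the float's bit pattern. A scalar sketch of that step is given below; reduce_logf_arg, asuint and asfloat are hypothetical helper names, the two constants are copied from the source, and only positive normal inputs are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OFF 0x3f2aaaabu		  /* Bit pattern of roughly 2/3. */
#define MANTISSA_MASK 0x007fffffu

static uint32_t
asuint (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

static float
asfloat (uint32_t u)
{
  float x;
  memcpy (&x, &u, sizeof x);
  return x;
}

/* Hypothetical helper: split a positive normal x into m * 2^n with m in
   [2/3, 4/3) and return m (that is, 1 + r in the source's notation). */
static float
reduce_logf_arg (float x, int32_t *n)
{
  assert (x >= 0x1p-126f && x <= 0x1.fffffep127f);
  uint32_t u = asuint (x) - OFF;
  /* Relies on the conversion to int32_t wrapping and on >> of a negative
     value being an arithmetic shift, as the vector code does with
     vshrq_n_s32. */
  *n = (int32_t) u >> 23;
  /* Keep the mantissa offset and re-attach the exponent of OFF. */
  return asfloat ((u & MANTISSA_MASK) + OFF);
}
```

Centering the reduced range around 1 keeps r small in magnitude on both sides, which is what the polynomial in r needs.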
-135
@@ -1,135 +0,0 @@
/*
* Vector math abstractions.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#ifndef _V_MATH_H
#define _V_MATH_H
#if !__aarch64__
# error "Cannot build without AArch64"
#endif
#define VPCS_ATTR __attribute__ ((aarch64_vector_pcs))
#define V_NAME_F1(fun) _ZGVnN4v_##fun##f
#define V_NAME_D1(fun) _ZGVnN2v_##fun
#define V_NAME_F2(fun) _ZGVnN4vv_##fun##f
#define V_NAME_D2(fun) _ZGVnN2vv_##fun
#include <stdint.h>
#include "../math_config.h"
#include <arm_neon.h>
/* Shorthand helpers for declaring constants. */
# define V2(X) { X, X }
# define V4(X) { X, X, X, X }
# define V8(X) { X, X, X, X, X, X, X, X }
static inline int
v_any_u16h (uint16x4_t x)
{
return vget_lane_u64 (vreinterpret_u64_u16 (x), 0) != 0;
}
static inline int
v_lanes32 (void)
{
return 4;
}
static inline float32x4_t
v_f32 (float x)
{
return (float32x4_t) V4 (x);
}
static inline uint32x4_t
v_u32 (uint32_t x)
{
return (uint32x4_t) V4 (x);
}
/* true if any elements of a v_cond result is non-zero. */
static inline int
v_any_u32 (uint32x4_t x)
{
/* assume elements in x are either 0 or -1u. */
return vpaddd_u64 (vreinterpretq_u64_u32 (x)) != 0;
}
static inline int
v_any_u32h (uint32x2_t x)
{
return vget_lane_u64 (vreinterpret_u64_u32 (x), 0) != 0;
}
static inline float32x4_t
v_lookup_f32 (const float *tab, uint32x4_t idx)
{
return (float32x4_t){tab[idx[0]], tab[idx[1]], tab[idx[2]], tab[idx[3]]};
}
static inline uint32x4_t
v_lookup_u32 (const uint32_t *tab, uint32x4_t idx)
{
return (uint32x4_t){tab[idx[0]], tab[idx[1]], tab[idx[2]], tab[idx[3]]};
}
static inline float32x4_t
v_call_f32 (float (*f) (float), float32x4_t x, float32x4_t y, uint32x4_t p)
{
return (float32x4_t){p[0] ? f (x[0]) : y[0], p[1] ? f (x[1]) : y[1],
p[2] ? f (x[2]) : y[2], p[3] ? f (x[3]) : y[3]};
}
static inline float32x4_t
v_call2_f32 (float (*f) (float, float), float32x4_t x1, float32x4_t x2,
float32x4_t y, uint32x4_t p)
{
return (float32x4_t){p[0] ? f (x1[0], x2[0]) : y[0],
p[1] ? f (x1[1], x2[1]) : y[1],
p[2] ? f (x1[2], x2[2]) : y[2],
p[3] ? f (x1[3], x2[3]) : y[3]};
}
static inline int
v_lanes64 (void)
{
return 2;
}
static inline float64x2_t
v_f64 (double x)
{
return (float64x2_t) V2 (x);
}
static inline uint64x2_t
v_u64 (uint64_t x)
{
return (uint64x2_t) V2 (x);
}
/* true if any elements of a v_cond result is non-zero. */
static inline int
v_any_u64 (uint64x2_t x)
{
/* assume elements in x are either 0 or -1u. */
return vpaddd_u64 (x) != 0;
}
static inline float64x2_t
v_lookup_f64 (const double *tab, uint64x2_t idx)
{
return (float64x2_t){tab[idx[0]], tab[idx[1]]};
}
static inline uint64x2_t
v_lookup_u64 (const uint64_t *tab, uint64x2_t idx)
{
return (uint64x2_t){tab[idx[0]], tab[idx[1]]};
}
static inline float64x2_t
v_call_f64 (double (*f) (double), float64x2_t x, float64x2_t y, uint64x2_t p)
{
double p1 = p[1];
double x1 = x[1];
if (likely (p[0]))
y[0] = f (x[0]);
if (likely (p1))
y[1] = f (x1);
return y;
}
#endif
-22
View File
@@ -1,22 +0,0 @@
/*
* Double-precision vector pow function.
*
* Copyright (c) 2020-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "mathlib.h"
#include "v_math.h"
float64x2_t VPCS_ATTR V_NAME_D2 (pow) (float64x2_t x, float64x2_t y)
{
float64x2_t z;
for (int lane = 0; lane < v_lanes64 (); lane++)
{
double sx = x[lane];
double sy = y[lane];
double sz = pow (sx, sy);
z[lane] = sz;
}
return z;
}
-148
View File
@@ -1,148 +0,0 @@
/*
* Single-precision vector powf function.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "v_math.h"
#define Min v_u32 (0x00800000)
#define Max v_u32 (0x7f800000)
#define Thresh v_u32 (0x7f000000) /* Max - Min. */
#define MantissaMask v_u32 (0x007fffff)
#define A data.log2_poly
#define C data.exp2f_poly
/* 2.6 ulp ~ 0.5 + 2^24 (128*Ln2*relerr_log2 + relerr_exp2). */
#define Off v_u32 (0x3f35d000)
#define V_POWF_LOG2_TABLE_BITS 5
#define V_EXP2F_TABLE_BITS 5
#define Log2IdxMask v_u32 ((1 << V_POWF_LOG2_TABLE_BITS) - 1)
#define Scale ((double) (1 << V_EXP2F_TABLE_BITS))
static const struct
{
struct
{
double invc, logc;
} log2_tab[1 << V_POWF_LOG2_TABLE_BITS];
double log2_poly[4];
uint64_t exp2f_tab[1 << V_EXP2F_TABLE_BITS];
double exp2f_poly[3];
} data = {
.log2_tab = {{0x1.6489890582816p+0, -0x1.e960f97b22702p-2 * Scale},
{0x1.5cf19b35e3472p+0, -0x1.c993406cd4db6p-2 * Scale},
{0x1.55aac0e956d65p+0, -0x1.aa711d9a7d0f3p-2 * Scale},
{0x1.4eb0022977e01p+0, -0x1.8bf37bacdce9bp-2 * Scale},
{0x1.47fcccda1dd1fp+0, -0x1.6e13b3519946ep-2 * Scale},
{0x1.418ceabab68c1p+0, -0x1.50cb8281e4089p-2 * Scale},
{0x1.3b5c788f1edb3p+0, -0x1.341504a237e2bp-2 * Scale},
{0x1.3567de48e9c9ap+0, -0x1.17eaab624ffbbp-2 * Scale},
{0x1.2fabc80fd19bap+0, -0x1.f88e708f8c853p-3 * Scale},
{0x1.2a25200ce536bp+0, -0x1.c24b6da113914p-3 * Scale},
{0x1.24d108e0152e3p+0, -0x1.8d02ee397cb1dp-3 * Scale},
{0x1.1facd8ab2fbe1p+0, -0x1.58ac1223408b3p-3 * Scale},
{0x1.1ab614a03efdfp+0, -0x1.253e6fd190e89p-3 * Scale},
{0x1.15ea6d03af9ffp+0, -0x1.e5641882c12ffp-4 * Scale},
{0x1.1147b994bb776p+0, -0x1.81fea712926f7p-4 * Scale},
{0x1.0ccbf650593aap+0, -0x1.203e240de64a3p-4 * Scale},
{0x1.0875408477302p+0, -0x1.8029b86a78281p-5 * Scale},
{0x1.0441d42a93328p+0, -0x1.85d713190fb9p-6 * Scale},
{0x1p+0, 0x0p+0 * Scale},
{0x1.f1d006c855e86p-1, 0x1.4c1cc07312997p-5 * Scale},
{0x1.e28c3341aa301p-1, 0x1.5e1848ccec948p-4 * Scale},
{0x1.d4bdf9aa64747p-1, 0x1.04cfcb7f1196fp-3 * Scale},
{0x1.c7b45a24e5803p-1, 0x1.582813d463c21p-3 * Scale},
{0x1.bb5f5eb2ed60ap-1, 0x1.a936fa68760ccp-3 * Scale},
{0x1.afb0bff8fe6b4p-1, 0x1.f81bc31d6cc4ep-3 * Scale},
{0x1.a49badf7ab1f5p-1, 0x1.2279a09fae6b1p-2 * Scale},
{0x1.9a14a111fc4c9p-1, 0x1.47ec0b6df5526p-2 * Scale},
{0x1.901131f5b2fdcp-1, 0x1.6c71762280f1p-2 * Scale},
{0x1.8687f73f6d865p-1, 0x1.90155070798dap-2 * Scale},
{0x1.7d7067eb77986p-1, 0x1.b2e23b1d3068cp-2 * Scale},
{0x1.74c2c1cf97b65p-1, 0x1.d4e21b0daa86ap-2 * Scale},
{0x1.6c77f37cff2a1p-1, 0x1.f61e2a2f67f3fp-2 * Scale},},
.log2_poly = { /* rel err: 1.5 * 2^-30. */
-0x1.6ff5daa3b3d7cp-2 * Scale, 0x1.ec81d03c01aebp-2 * Scale,
-0x1.71547bb43f101p-1 * Scale, 0x1.7154764a815cbp0 * Scale,},
.exp2f_tab = {0x3ff0000000000000, 0x3fefd9b0d3158574, 0x3fefb5586cf9890f,
0x3fef9301d0125b51, 0x3fef72b83c7d517b, 0x3fef54873168b9aa,
0x3fef387a6e756238, 0x3fef1e9df51fdee1, 0x3fef06fe0a31b715,
0x3feef1a7373aa9cb, 0x3feedea64c123422, 0x3feece086061892d,
0x3feebfdad5362a27, 0x3feeb42b569d4f82, 0x3feeab07dd485429,
0x3feea47eb03a5585, 0x3feea09e667f3bcd, 0x3fee9f75e8ec5f74,
0x3feea11473eb0187, 0x3feea589994cce13, 0x3feeace5422aa0db,
0x3feeb737b0cdc5e5, 0x3feec49182a3f090, 0x3feed503b23e255d,
0x3feee89f995ad3ad, 0x3feeff76f2fb5e47, 0x3fef199bdd85529c,
0x3fef3720dcef9069, 0x3fef5818dcfba487, 0x3fef7c97337b9b5f,
0x3fefa4afa2a490da, 0x3fefd0765b6e4540,},
.exp2f_poly = { /* rel err: 1.69 * 2^-34. */
0x1.c6af84b912394p-5 / Scale / Scale / Scale,
0x1.ebfce50fac4f3p-3 / Scale / Scale,
0x1.62e42ff0c52d6p-1 / Scale}};
static float32x4_t VPCS_ATTR NOINLINE
special_case (float32x4_t x, float32x4_t y, float32x4_t ret, uint32x4_t cmp)
{
return v_call2_f32 (powf, x, y, ret, cmp);
}
float32x4_t VPCS_ATTR V_NAME_F2 (pow) (float32x4_t x, float32x4_t y)
{
uint32x4_t u = vreinterpretq_u32_f32 (x);
uint32x4_t cmp = vcgeq_u32 (vsubq_u32 (u, Min), Thresh);
uint32x4_t tmp = vsubq_u32 (u, Off);
uint32x4_t i = vandq_u32 (vshrq_n_u32 (tmp, (23 - V_POWF_LOG2_TABLE_BITS)),
Log2IdxMask);
uint32x4_t top = vbicq_u32 (tmp, MantissaMask);
uint32x4_t iz = vsubq_u32 (u, top);
int32x4_t k = vshrq_n_s32 (vreinterpretq_s32_u32 (top),
23 - V_EXP2F_TABLE_BITS); /* arithmetic shift. */
float32x4_t ret;
for (int lane = 0; lane < 4; lane++)
{
/* Use double precision for each lane. */
double invc = data.log2_tab[i[lane]].invc;
double logc = data.log2_tab[i[lane]].logc;
double z = (double) asfloat (iz[lane]);
/* log2(x) = log1p(z/c-1)/ln2 + log2(c) + k. */
double r = __builtin_fma (z, invc, -1.0);
double y0 = logc + (double) k[lane];
/* Polynomial to approximate log1p(r)/ln2. */
double logx = A[0];
logx = r * logx + A[1];
logx = r * logx + A[2];
logx = r * logx + A[3];
logx = r * logx + y0;
double ylogx = y[lane] * logx;
cmp[lane] = (asuint64 (ylogx) >> 47 & 0xffff)
>= asuint64 (126.0 * (1 << V_EXP2F_TABLE_BITS)) >> 47
? 1
: cmp[lane];
/* N*x = k + r with r in [-1/2, 1/2]. */
double kd = round (ylogx);
uint64_t ki = lround (ylogx);
r = ylogx - kd;
/* exp2(x) = 2^(k/N) * 2^r ~= s * (C0*r^3 + C1*r^2 + C2*r + 1). */
uint64_t t = data.exp2f_tab[ki % (1 << V_EXP2F_TABLE_BITS)];
t += ki << (52 - V_EXP2F_TABLE_BITS);
double s = asdouble (t);
double p = C[0];
p = __builtin_fma (p, r, C[1]);
p = __builtin_fma (p, r, C[2]);
p = __builtin_fma (p, s * r, s);
ret[lane] = p;
}
if (unlikely (v_any_u32 (cmp)))
return special_case (x, y, ret, cmp);
return ret;
}
-97
View File
@@ -1,97 +0,0 @@
/*
* Double-precision vector sin function.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "mathlib.h"
#include "v_math.h"
static const struct data
{
float64x2_t poly[7];
float64x2_t range_val, inv_pi, shift, pi_1, pi_2, pi_3;
} data = {
.poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7),
V2 (-0x1.a01a019936f27p-13), V2 (0x1.71de37a97d93ep-19),
V2 (-0x1.ae633919987c6p-26), V2 (0x1.60e277ae07cecp-33),
V2 (-0x1.9e9540300a1p-41) },
.range_val = V2 (0x1p23),
.inv_pi = V2 (0x1.45f306dc9c883p-2),
.pi_1 = V2 (0x1.921fb54442d18p+1),
.pi_2 = V2 (0x1.1a62633145c06p-53),
.pi_3 = V2 (0x1.c1cd129024e09p-106),
.shift = V2 (0x1.8p52),
};
#if WANT_SIMD_EXCEPT
# define TinyBound v_u64 (0x3000000000000000) /* asuint64 (0x1p-255). */
# define Thresh v_u64 (0x1160000000000000) /* RangeVal - TinyBound. */
#endif
#define C(i) d->poly[i]
static float64x2_t VPCS_ATTR NOINLINE
special_case (float64x2_t x, float64x2_t y, uint64x2_t odd, uint64x2_t cmp)
{
y = vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd));
return v_call_f64 (sin, x, y, cmp);
}
/* Vector (AdvSIMD) sin approximation.
Maximum observed error in [-pi/2, pi/2], where argument is not reduced,
is 2.87 ULP:
_ZGVnN2v_sin (0x1.921d5c6a07142p+0) got 0x1.fffffffa7dc02p-1
want 0x1.fffffffa7dc05p-1
Maximum observed error in the entire non-special domain ([-2^23, 2^23])
is 3.22 ULP:
_ZGVnN2v_sin (0x1.5702447b6f17bp+22) got 0x1.ffdcd125c84fbp-3
want 0x1.ffdcd125c84f8p-3. */
float64x2_t VPCS_ATTR V_NAME_D1 (sin) (float64x2_t x)
{
const struct data *d = ptr_barrier (&data);
float64x2_t n, r, r2, r3, r4, y, t1, t2, t3;
uint64x2_t odd, cmp;
#if WANT_SIMD_EXCEPT
/* Detect |x| <= TinyBound or |x| >= RangeVal. If fenv exceptions are to be
triggered correctly, set any special lanes to 1 (which is neutral w.r.t.
fenv). These lanes will be fixed by special-case handler later. */
uint64x2_t ir = vreinterpretq_u64_f64 (vabsq_f64 (x));
cmp = vcgeq_u64 (vsubq_u64 (ir, TinyBound), Thresh);
r = vbslq_f64 (cmp, vreinterpretq_f64_u64 (cmp), x);
#else
r = x;
cmp = vcageq_f64 (x, d->range_val);
#endif
/* n = rint(|x|/pi). */
n = vfmaq_f64 (d->shift, d->inv_pi, r);
odd = vshlq_n_u64 (vreinterpretq_u64_f64 (n), 63);
n = vsubq_f64 (n, d->shift);
/* r = |x| - n*pi (range reduction into -pi/2 .. pi/2). */
r = vfmsq_f64 (r, d->pi_1, n);
r = vfmsq_f64 (r, d->pi_2, n);
r = vfmsq_f64 (r, d->pi_3, n);
/* sin(r) poly approx. */
r2 = vmulq_f64 (r, r);
r3 = vmulq_f64 (r2, r);
r4 = vmulq_f64 (r2, r2);
t1 = vfmaq_f64 (C (4), C (5), r2);
t2 = vfmaq_f64 (C (2), C (3), r2);
t3 = vfmaq_f64 (C (0), C (1), r2);
y = vfmaq_f64 (t1, C (6), r4);
y = vfmaq_f64 (t2, y, r4);
y = vfmaq_f64 (t3, y, r4);
y = vfmaq_f64 (r, y, r3);
if (unlikely (v_any_u64 (cmp)))
return special_case (x, y, odd, cmp);
return vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd));
}
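Note on the removed vector sin: the "n = rint(|x|/pi)" step relies on adding the constant 0x1.8p52, which forces rounding to an integer and leaves the parity of n in the lowest bit of the sum. The scalar sketch below illustrates that idea only. The function name rint_x_over_pi is hypothetical, a plain multiply and add stands in for the fused vfmaq_f64, and it assumes round-to-nearest mode, double evaluation without excess precision, and |x/pi| well below 2^51.

```c
#include <stdint.h>
#include <string.h>

/* Return rint(x / pi) and store its parity in *odd.  */
static double
rint_x_over_pi (double x, int *odd)
{
  const double inv_pi = 0x1.45f306dc9c883p-2;
  const double shift = 0x1.8p52;   /* ulp is 1.0 in this binade */

  double n = x * inv_pi + shift;   /* fraction bits are rounded away */
  uint64_t bits;
  memcpy (&bits, &n, sizeof bits);
  *odd = (int) (bits & 1);         /* lowest mantissa bit = parity of n */
  return n - shift;                /* exact: recovers the integer */
}
```

The vector code shifts that parity bit up to the sign position (vshlq_n_u64 by 63) and XORs it into the result, which is how sin(r) gets negated for odd n.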
-82
View File
@@ -1,82 +0,0 @@
/*
* Single-precision vector sin function.
*
* Copyright (c) 2019-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "mathlib.h"
#include "v_math.h"
static const struct data
{
float32x4_t poly[4];
float32x4_t range_val, inv_pi, shift, pi_1, pi_2, pi_3;
} data = {
/* 1.886 ulp error. */
.poly = { V4 (-0x1.555548p-3f), V4 (0x1.110df4p-7f), V4 (-0x1.9f42eap-13f),
V4 (0x1.5b2e76p-19f) },
.pi_1 = V4 (0x1.921fb6p+1f),
.pi_2 = V4 (-0x1.777a5cp-24f),
.pi_3 = V4 (-0x1.ee59dap-49f),
.inv_pi = V4 (0x1.45f306p-2f),
.shift = V4 (0x1.8p+23f),
.range_val = V4 (0x1p20f)
};
#if WANT_SIMD_EXCEPT
# define TinyBound v_u32 (0x21000000) /* asuint32(0x1p-61f). */
# define Thresh v_u32 (0x28800000) /* RangeVal - TinyBound. */
#endif
#define C(i) d->poly[i]
static float32x4_t VPCS_ATTR NOINLINE
special_case (float32x4_t x, float32x4_t y, uint32x4_t odd, uint32x4_t cmp)
{
/* Fall back to scalar code. */
y = vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd));
return v_call_f32 (sinf, x, y, cmp);
}
float32x4_t VPCS_ATTR V_NAME_F1 (sin) (float32x4_t x)
{
const struct data *d = ptr_barrier (&data);
float32x4_t n, r, r2, y;
uint32x4_t odd, cmp;
#if WANT_SIMD_EXCEPT
uint32x4_t ir = vreinterpretq_u32_f32 (vabsq_f32 (x));
cmp = vcgeq_u32 (vsubq_u32 (ir, TinyBound), Thresh);
/* If fenv exceptions are to be triggered correctly, set any special lanes
to 1 (which is neutral w.r.t. fenv). These lanes will be fixed by
special-case handler later. */
r = vbslq_f32 (cmp, vreinterpretq_f32_u32 (cmp), x);
#else
r = x;
cmp = vcageq_f32 (x, d->range_val);
#endif
/* n = rint(|x|/pi) */
n = vfmaq_f32 (d->shift, d->inv_pi, r);
odd = vshlq_n_u32 (vreinterpretq_u32_f32 (n), 31);
n = vsubq_f32 (n, d->shift);
/* r = |x| - n*pi (range reduction into -pi/2 .. pi/2) */
r = vfmsq_f32 (r, d->pi_1, n);
r = vfmsq_f32 (r, d->pi_2, n);
r = vfmsq_f32 (r, d->pi_3, n);
/* y = sin(r) */
r2 = vmulq_f32 (r, r);
y = vfmaq_f32 (C (2), C (3), r2);
y = vfmaq_f32 (C (1), y, r2);
y = vfmaq_f32 (C (0), y, r2);
y = vfmaq_f32 (r, vmulq_f32 (y, r2), r);
if (unlikely (v_any_u32 (cmp)))
return special_case (x, y, odd, cmp);
return vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd));
}
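Note on the removed vector sinf: the "y = sin(r)" block is a Horner evaluation in r^2 of an odd polynomial. A scalar sketch using the same four coefficients is shown below. The name sinf_poly is hypothetical, plain arithmetic replaces the fused multiply-adds, and the input is assumed to be already reduced to roughly [-pi/2, pi/2].

```c
/* Approximate sin(r) for an already reduced r, mirroring the Horner
   scheme in the diff: r + r^3 * (C0 + C1 r^2 + C2 r^4 + C3 r^6).  */
static float
sinf_poly (float r)
{
  static const float c[4] = { -0x1.555548p-3f, 0x1.110df4p-7f,
                              -0x1.9f42eap-13f, 0x1.5b2e76p-19f };
  float r2 = r * r;
  float y = c[2] + c[3] * r2;
  y = c[1] + y * r2;
  y = c[0] + y * r2;
  return r + (y * r2) * r;
}
```

Because only even powers of r feed the inner polynomial, the result is exactly odd in r, so sinf_poly(-r) equals -sinf_poly(r).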
+3 -3
View File
@@ -1,8 +1,8 @@
/*
* Single-precision cos function.
*
* Copyright (c) 2018-2021, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* Copyright (c) 2018-2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#include <stdint.h>
@@ -22,7 +22,7 @@ cosf (float y)
int n;
const sincos_t *p = &__sincosf_table[0];
if (abstop12 (y) < abstop12 (pio4f))
if (abstop12 (y) < abstop12 (pio4))
{
double x2 = x * x;
+1 -1
View File
@@ -2,7 +2,7 @@
* Double-precision erf(x) function.
*
* Copyright (c) 2020, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "math_config.h"
+1 -1
View File
@@ -2,7 +2,7 @@
* Shared data between erf and erfc.
*
* Copyright (c) 2019-2020, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "math_config.h"
+1 -1
View File
@@ -2,7 +2,7 @@
* Single-precision erf(x) function.
*
* Copyright (c) 2020, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <stdint.h>
+1 -1
View File
@@ -2,7 +2,7 @@
* Data for approximation of erff.
*
* Copyright (c) 2019-2020, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "math_config.h"
+1 -1
View File
@@ -2,7 +2,7 @@
* Double-precision e^x function.
*
* Copyright (c) 2018-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <float.h>
-129
View File
@@ -1,129 +0,0 @@
/*
* Double-precision 10^x function.
*
* Copyright (c) 2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#include "math_config.h"
#define N (1 << EXP_TABLE_BITS)
#define IndexMask (N - 1)
#define OFlowBound 0x1.34413509f79ffp8 /* log10(DBL_MAX). */
#define UFlowBound -0x1.5ep+8 /* -350. */
#define SmallTop 0x3c6 /* top12(0x1p-57). */
#define BigTop 0x407 /* top12(0x1p8). */
#define Thresh 0x41 /* BigTop - SmallTop. */
#define Shift __exp_data.shift
#define C(i) __exp_data.exp10_poly[i]
static double
special_case (uint64_t sbits, double_t tmp, uint64_t ki)
{
double_t scale, y;
if (ki - (1ull << 16) < 0x80000000)
{
/* The exponent of scale might have overflowed by 1. */
sbits -= 1ull << 52;
scale = asdouble (sbits);
y = 2 * (scale + scale * tmp);
return check_oflow (eval_as_double (y));
}
/* n < 0, need special care in the subnormal range. */
sbits += 1022ull << 52;
scale = asdouble (sbits);
y = scale + scale * tmp;
if (y < 1.0)
{
/* Round y to the right precision before scaling it into the subnormal
range to avoid double rounding that can cause 0.5+E/2 ulp error where
E is the worst-case ulp error outside the subnormal range. So this
is only useful if the goal is better than 1 ulp worst-case error. */
double_t lo = scale - y + scale * tmp;
double_t hi = 1.0 + y;
lo = 1.0 - hi + y + lo;
y = eval_as_double (hi + lo) - 1.0;
/* Avoid -0.0 with downward rounding. */
if (WANT_ROUNDING && y == 0.0)
y = 0.0;
/* The underflow exception needs to be signaled explicitly. */
force_eval_double (opt_barrier_double (0x1p-1022) * 0x1p-1022);
}
y = 0x1p-1022 * y;
return check_uflow (y);
}
/* Double-precision 10^x approximation. Largest observed error is ~0.513 ULP. */
double
exp10 (double x)
{
uint64_t ix = asuint64 (x);
uint32_t abstop = (ix >> 52) & 0x7ff;
if (unlikely (abstop - SmallTop >= Thresh))
{
if (abstop - SmallTop >= 0x80000000)
/* Avoid spurious underflow for tiny x.
Note: 0 is common input. */
return x + 1;
if (abstop == 0x7ff)
return ix == asuint64 (-INFINITY) ? 0.0 : x + 1.0;
if (x >= OFlowBound)
return __math_oflow (0);
if (x < UFlowBound)
return __math_uflow (0);
/* Large x is special-cased below. */
abstop = 0;
}
/* Reduce x: z = x * N / log10(2), k = round(z). */
double_t z = __exp_data.invlog10_2N * x;
double_t kd;
int64_t ki;
#if TOINT_INTRINSICS
kd = roundtoint (z);
ki = converttoint (z);
#else
kd = eval_as_double (z + Shift);
kd -= Shift;
ki = kd;
#endif
/* r = x - k * log10(2), r in [-0.5, 0.5]. */
double_t r = x;
r = __exp_data.neglog10_2hiN * kd + r;
r = __exp_data.neglog10_2loN * kd + r;
/* exp10(x) = 2^(k/N) * 2^(r/N).
Approximate the two components separately. */
/* s = 2^(k/N), using lookup table. */
uint64_t e = ki << (52 - EXP_TABLE_BITS);
uint64_t i = (ki & IndexMask) * 2;
uint64_t u = __exp_data.tab[i + 1];
uint64_t sbits = u + e;
double_t tail = asdouble (__exp_data.tab[i]);
/* 2^(r/N) ~= 1 + r * Poly(r). */
double_t r2 = r * r;
double_t p = C (0) + r * C (1);
double_t y = C (2) + r * C (3);
y = y + r2 * C (4);
y = p + r2 * y;
y = tail + y * r;
if (unlikely (abstop == 0))
return special_case (sbits, y, ki);
/* Assemble components:
y = 2^(r/N) * 2^(k/N)
~= (y + 1) * s. */
double_t s = asdouble (sbits);
return eval_as_double (s * y + s);
}
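Note on the removed exp10: the test "abstop - SmallTop >= Thresh" folds two range checks into one unsigned comparison, because exponents below SmallTop wrap around to very large values. The sketch below isolates that check. The function name exp10_needs_slow_path is hypothetical; the constants are the ones defined in the diff.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero when |x| is below 0x1p-57 or at least 0x1p8 (or x is inf/nan),
   i.e. when exp10 in the diff leaves its fast path.  */
static int
exp10_needs_slow_path (double x)
{
  const uint32_t small_top = 0x3c6; /* top12(0x1p-57) */
  const uint32_t thresh = 0x41;     /* top12(0x1p8) - small_top */

  uint64_t ix;
  memcpy (&ix, &x, sizeof ix);
  uint32_t abstop = (ix >> 52) & 0x7ff; /* biased exponent, sign dropped */

  /* Unsigned subtraction: abstop < small_top wraps to a huge value,
     so one comparison rejects both the tiny and the large range.  */
  return abstop - small_top >= thresh;
}
```

The same pattern appears in the vector code as vcgeq_u32 (vsubq_u32 (u, Min), Thresh).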
+1 -1
View File
@@ -2,7 +2,7 @@
* Double-precision 2^x function.
*
* Copyright (c) 2018-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <float.h>
+1 -1
View File
@@ -2,7 +2,7 @@
* Single-precision 2^x function.
*
* Copyright (c) 2017-2018, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <math.h>
+1 -1
View File
@@ -2,7 +2,7 @@
* Shared data between expf, exp2f and powf.
*
* Copyright (c) 2017-2018, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "math_config.h"
+1 -22
View File
@@ -2,7 +2,7 @@
* Shared data between exp, exp2 and pow.
*
* Copyright (c) 2018, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "math_config.h"
@@ -12,7 +12,6 @@
const struct exp_data __exp_data = {
// N/ln2
.invln2N = 0x1.71547652b82fep0 * N,
.invlog10_2N = 0x1.a934f0979a371p1 * N,
// -ln2/N
#if N == 64
.negln2hiN = -0x1.62e42fefa0000p-7,
@@ -27,8 +26,6 @@ const struct exp_data __exp_data = {
.negln2hiN = -0x1.62e42fef80000p-10,
.negln2loN = -0x1.1cf79abc9e3b4p-45,
#endif
.neglog10_2hiN = -0x1.3441350ap-2 / N,
.neglog10_2loN = 0x1.0c0219dc1da99p-39 / N,
// Used for rounding when !TOINT_INTRINSICS
#if EXP_USE_TOINT_NARROW
.shift = 0x1800000000.8p0,
@@ -150,24 +147,6 @@ const struct exp_data __exp_data = {
0x1.3b2ab786ee1dap-7,
#endif
},
.exp10_poly = {
#if EXP10_POLY_WIDE
/* Range is wider if using shift-based reduction: coeffs generated
using Remez in [-log10(2)/128, log10(2)/128 ]. */
0x1.26bb1bbb55515p1,
0x1.53524c73cd32bp1,
0x1.0470591e1a108p1,
0x1.2bd77b12fe9a8p0,
0x1.14289fef24b78p-1
#else
/* Coeffs generated using Remez in [-log10(2)/256, log10(2)/256 ]. */
0x1.26bb1bbb55516p1,
0x1.53524c73ce9fep1,
0x1.0470591ce4b26p1,
0x1.2bd76577fe684p0,
0x1.1446eeccd0efbp-1
#endif
},
// 2^(k/N) ~= H[k]*(1 + T[k]) for int k in [0,N)
// tab[2*k] = asuint64(T[k])
// tab[2*k+1] = asuint64(H[k]) - (k << 52)/N
+1 -1
View File
@@ -2,7 +2,7 @@
* Single-precision e^x function.
*
* Copyright (c) 2017-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <math.h>
+55 -14
View File
@@ -1,8 +1,8 @@
/*
* Public API.
*
* Copyright (c) 2015-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* Copyright (c) 2015-2020, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#ifndef _MATHLIB_H
@@ -18,33 +18,74 @@ float cosf (float);
void sincosf (float, float*, float*);
double exp (double);
double exp10 (double);
double exp2 (double);
double log (double);
double log2 (double);
double pow (double, double);
/* Scalar functions using the vector algorithm with identical result. */
float __s_sinf (float);
float __s_cosf (float);
float __s_expf (float);
float __s_expf_1u (float);
float __s_exp2f (float);
float __s_exp2f_1u (float);
float __s_logf (float);
float __s_powf (float, float);
double __s_sin (double);
double __s_cos (double);
double __s_exp (double);
double __s_log (double);
double __s_pow (double, double);
#if __aarch64__
# if __GNUC__ >= 5
#if __GNUC__ >= 5
typedef __Float32x4_t __f32x4_t;
typedef __Float64x2_t __f64x2_t;
# elif __clang_major__*100+__clang_minor__ >= 305
#elif __clang_major__*100+__clang_minor__ >= 305
typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
typedef __attribute__((__neon_vector_type__(2))) double __f64x2_t;
# else
# error Unsupported compiler
# endif
#else
#error Unsupported compiler
#endif
# if __GNUC__ >= 9 || __clang_major__ >= 8
# undef __vpcs
# define __vpcs __attribute__((__aarch64_vector_pcs__))
/* Vector functions following the base PCS. */
__f32x4_t __v_sinf (__f32x4_t);
__f32x4_t __v_cosf (__f32x4_t);
__f32x4_t __v_expf (__f32x4_t);
__f32x4_t __v_expf_1u (__f32x4_t);
__f32x4_t __v_exp2f (__f32x4_t);
__f32x4_t __v_exp2f_1u (__f32x4_t);
__f32x4_t __v_logf (__f32x4_t);
__f32x4_t __v_powf (__f32x4_t, __f32x4_t);
__f64x2_t __v_sin (__f64x2_t);
__f64x2_t __v_cos (__f64x2_t);
__f64x2_t __v_exp (__f64x2_t);
__f64x2_t __v_log (__f64x2_t);
__f64x2_t __v_pow (__f64x2_t, __f64x2_t);
#if __GNUC__ >= 9 || __clang_major__ >= 8
#define __vpcs __attribute__((__aarch64_vector_pcs__))
/* Vector functions following the vector PCS. */
__vpcs __f32x4_t __vn_sinf (__f32x4_t);
__vpcs __f32x4_t __vn_cosf (__f32x4_t);
__vpcs __f32x4_t __vn_expf (__f32x4_t);
__vpcs __f32x4_t __vn_expf_1u (__f32x4_t);
__vpcs __f32x4_t __vn_exp2f (__f32x4_t);
__vpcs __f32x4_t __vn_exp2f_1u (__f32x4_t);
__vpcs __f32x4_t __vn_logf (__f32x4_t);
__vpcs __f32x4_t __vn_powf (__f32x4_t, __f32x4_t);
__vpcs __f64x2_t __vn_sin (__f64x2_t);
__vpcs __f64x2_t __vn_cos (__f64x2_t);
__vpcs __f64x2_t __vn_exp (__f64x2_t);
__vpcs __f64x2_t __vn_log (__f64x2_t);
__vpcs __f64x2_t __vn_pow (__f64x2_t, __f64x2_t);
/* Vector functions following the vector PCS using ABI names. */
__vpcs __f32x4_t _ZGVnN4v_sinf (__f32x4_t);
__vpcs __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
__vpcs __f32x4_t _ZGVnN4v_expf_1u (__f32x4_t);
__vpcs __f32x4_t _ZGVnN4v_expf (__f32x4_t);
__vpcs __f32x4_t _ZGVnN4v_exp2f_1u (__f32x4_t);
__vpcs __f32x4_t _ZGVnN4v_exp2f (__f32x4_t);
__vpcs __f32x4_t _ZGVnN4v_logf (__f32x4_t);
__vpcs __f32x4_t _ZGVnN4vv_powf (__f32x4_t, __f32x4_t);
@@ -53,7 +94,7 @@ __vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t);
__vpcs __f64x2_t _ZGVnN2v_exp (__f64x2_t);
__vpcs __f64x2_t _ZGVnN2v_log (__f64x2_t);
__vpcs __f64x2_t _ZGVnN2vv_pow (__f64x2_t, __f64x2_t);
# endif
#endif
#endif
#endif
+1 -1
View File
@@ -2,7 +2,7 @@
* Double-precision log(x) function.
*
* Copyright (c) 2018-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <float.h>
+1 -1
View File
@@ -2,7 +2,7 @@
* Double-precision log2(x) function.
*
* Copyright (c) 2018-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <float.h>
+1 -1
View File
@@ -2,7 +2,7 @@
* Data for log2.
*
* Copyright (c) 2018, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "math_config.h"
+1 -1
View File
@@ -2,7 +2,7 @@
* Single-precision log2 function.
*
* Copyright (c) 2017-2018, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <math.h>
+1 -1
View File
@@ -2,7 +2,7 @@
* Data definition for log2f.
*
* Copyright (c) 2017-2018, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "math_config.h"
+1 -1
View File
@@ -2,7 +2,7 @@
* Data for log.
*
* Copyright (c) 2018, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "math_config.h"
+3 -3
View File
@@ -1,8 +1,8 @@
/*
* Single-precision log function.
*
* Copyright (c) 2017-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* Copyright (c) 2017-2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#include <math.h>
@@ -57,7 +57,7 @@ logf (float x)
tmp = ix - OFF;
i = (tmp >> (23 - LOGF_TABLE_BITS)) % N;
k = (int32_t) tmp >> 23; /* arithmetic shift */
iz = ix - (tmp & 0xff800000);
iz = ix - (tmp & 0x1ff << 23);
invc = T[i].invc;
logc = T[i].logc;
z = (double_t) asfloat (iz);
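Note on the logf hunk above: the two spellings of the mask select the same bits (sign plus exponent) when evaluated in unsigned 32-bit arithmetic. The difference is that 0x1ff is an int constant, so "0x1ff << 23" shifts into the sign bit of a 32-bit int, which the C standard leaves undefined, whereas 0xff800000 is already an unsigned constant. The sketch below is illustrative only. The names MASK_LITERAL, MASK_SHIFTED and strip_exponent are hypothetical, and the offset is passed as a parameter because OFF is not defined in this hunk.

```c
#include <stdint.h>

/* Both masks written with explicitly unsigned operands.  */
#define MASK_LITERAL UINT32_C (0xff800000)
#define MASK_SHIFTED (UINT32_C (0x1ff) << 23)

/* Remove the exponent of (ix - off) from ix, as the "iz = ..." line does,
   leaving a value whose bit pattern lies close to off.  */
static uint32_t
strip_exponent (uint32_t ix, uint32_t off)
{
  uint32_t tmp = ix - off;
  return ix - (tmp & MASK_LITERAL);
}
```

With the offset 0x3f2aaaab used by the vector logf elsewhere in this diff, the bit pattern of 3.0f (0x40400000) maps to 0x3f400000, which is 0.75f.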
+1 -1
View File
@@ -2,7 +2,7 @@
* Data definition for logf.
*
* Copyright (c) 2017-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "math_config.h"
+2 -32
View File
@@ -1,8 +1,8 @@
/*
* Configuration for math routines.
*
* Copyright (c) 2017-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* Copyright (c) 2017-2020, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#ifndef _MATH_CONFIG_H
@@ -92,17 +92,6 @@
# define unlikely(x) (x)
#endif
/* Return ptr but hide its value from the compiler so accesses through it
cannot be optimized based on the contents. */
#define ptr_barrier(ptr) \
({ \
__typeof (ptr) __ptr = (ptr); \
__asm("" : "+r"(__ptr)); \
__ptr; \
})
/* Symbol renames to avoid libc conflicts. */
#if HAVE_FAST_ROUND
/* When set, the roundtoint and converttoint functions are provided with
the semantics documented below. */
@@ -392,22 +381,15 @@ extern const struct powf_log2_data
#define EXP_USE_TOINT_NARROW 0
#define EXP2_POLY_ORDER 5
#define EXP2_POLY_WIDE 0
/* Wider exp10 polynomial necessary for good precision in non-nearest rounding
and !TOINT_INTRINSICS. */
#define EXP10_POLY_WIDE 0
extern const struct exp_data
{
double invln2N;
double invlog10_2N;
double shift;
double negln2hiN;
double negln2loN;
double neglog10_2hiN;
double neglog10_2loN;
double poly[4]; /* Last four coefficients. */
double exp2_shift;
double exp2_poly[EXP2_POLY_ORDER];
double exp10_poly[5];
uint64_t tab[2*(1 << EXP_TABLE_BITS)];
} __exp_data HIDDEN;
@@ -477,16 +459,4 @@ extern const struct erf_data
double erfc_poly_F[ERFC_POLY_F_NCOEFFS];
} __erf_data HIDDEN;
#define V_EXP_TABLE_BITS 7
extern const uint64_t __v_exp_data[1 << V_EXP_TABLE_BITS] HIDDEN;
#define V_LOG_TABLE_BITS 7
extern const struct v_log_data
{
struct
{
double invc, logc;
} table[1 << V_LOG_TABLE_BITS];
} __v_log_data HIDDEN;
#endif
+1 -1
View File
@@ -2,7 +2,7 @@
* Double-precision math error handling.
*
* Copyright (c) 2018, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "math_config.h"
+1 -1
View File
@@ -2,7 +2,7 @@
* Single-precision math error handling.
*
* Copyright (c) 2017-2020, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "math_config.h"
+1 -1
View File
@@ -2,7 +2,7 @@
* Double-precision x^y function.
*
* Copyright (c) 2018-2020, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <float.h>
+1 -1
View File
@@ -2,7 +2,7 @@
* Data for the log part of pow.
*
* Copyright (c) 2018, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "math_config.h"
+1 -1
View File
@@ -2,7 +2,7 @@
* Single-precision pow function.
*
* Copyright (c) 2017-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <math.h>
+1 -1
View File
@@ -2,7 +2,7 @@
* Data definition for powf.
*
* Copyright (c) 2017-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "math_config.h"
+6
View File
@@ -0,0 +1,6 @@
/*
* Copyright (c) 2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#define SCALAR 1
#include "v_cos.c"
+6
View File
@@ -0,0 +1,6 @@
/*
* Copyright (c) 2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#define SCALAR 1
#include "v_cosf.c"
+6
View File
@@ -0,0 +1,6 @@
/*
* Copyright (c) 2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#define SCALAR 1
#include "v_exp.c"
+6
View File
@@ -0,0 +1,6 @@
/*
* Copyright (c) 2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#define SCALAR 1
#include "v_exp2f.c"
+6
View File
@@ -0,0 +1,6 @@
/*
* Copyright (c) 2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#define SCALAR 1
#include "v_exp2f_1u.c"
+6
View File
@@ -0,0 +1,6 @@
/*
* Copyright (c) 2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#define SCALAR 1
#include "v_expf.c"
+6
View File
@@ -0,0 +1,6 @@
/*
* Copyright (c) 2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#define SCALAR 1
#include "v_expf_1u.c"
+6
View File
@@ -0,0 +1,6 @@
/*
* Copyright (c) 2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#define SCALAR 1
#include "v_log.c"
+6
View File
@@ -0,0 +1,6 @@
/*
* Copyright (c) 2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#define SCALAR 1
#include "v_logf.c"
+6
View File
@@ -0,0 +1,6 @@
/*
* Copyright (c) 2020, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#define SCALAR 1
#include "v_pow.c"
+6
View File
@@ -0,0 +1,6 @@
/*
* Copyright (c) 2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#define SCALAR 1
#include "v_powf.c"
+6
View File
@@ -0,0 +1,6 @@
/*
* Copyright (c) 2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#define SCALAR 1
#include "v_sin.c"
+6
View File
@@ -0,0 +1,6 @@
/*
* Copyright (c) 2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#define SCALAR 1
#include "v_sinf.c"
+3 -3
View File
@@ -1,8 +1,8 @@
/*
* Single-precision sin/cos function.
*
* Copyright (c) 2018-2021, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* Copyright (c) 2018-2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#include <stdint.h>
@@ -22,7 +22,7 @@ sincosf (float y, float *sinp, float *cosp)
int n;
const sincos_t *p = &__sincosf_table[0];
if (abstop12 (y) < abstop12 (pio4f))
if (abstop12 (y) < abstop12 (pio4))
{
double x2 = x * x;
+3 -3
View File
@@ -1,8 +1,8 @@
/*
* Header for sinf, cosf and sincosf.
*
* Copyright (c) 2018-2021, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* Copyright (c) 2018, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#include <stdint.h>
@@ -12,7 +12,7 @@
/* 2PI * 2^-64. */
static const double pi63 = 0x1.921FB54442D18p-62;
/* PI / 4. */
static const float pio4f = 0x1.921FB6p-1f;
static const double pio4 = 0x1.921FB54442D18p-1;
/* The constants and polynomials for sine and cosine. */
typedef struct
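Note on the pio4f / pio4 lines in the sinf and sincosf hunks: the range check compares only the top bits of the argument against the top bits of pi/4. The sketch below shows why the float and double spellings of the constant give the same check. The bodies of asuint and abstop12 are not part of this diff; the versions here are assumed definitions written for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed helper: reinterpret the bits of a float as an integer.  */
static inline uint32_t
asuint (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Assumed helper: exponent plus top mantissa bits, with the sign
   masked off.  */
static inline uint32_t
abstop12 (float x)
{
  return (asuint (x) >> 20) & 0x7ff;
}

/* The two constants that appear in the hunk above.  */
static const float pio4f = 0x1.921FB6p-1f;
static const double pio4 = 0x1.921FB54442D18p-1;

static void
abstop_demo_check (void)
{
  /* 0.5 is clearly below pi/4, 1.0 is clearly above.  */
  assert (abstop12 (0.5f) < abstop12 (pio4f));
  assert (!(abstop12 (1.0f) < abstop12 (pio4f)));
  /* The sign bit is ignored.  */
  assert (abstop12 (-0.5f) == abstop12 (0.5f));
  /* Passing the double constant converts it to float first, which
     rounds to the same value as pio4f.  */
  assert ((float) pio4 == pio4f);
  assert (abstop12 (pio4) == abstop12 (pio4f));
}
```

The comparison is conservative: a value such as 0.78f shares its top bits with pi/4, so it fails the check and takes the general path even though it is slightly below pi/4.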
+1 -1
@@ -2,7 +2,7 @@
* Data definition for sinf, cosf and sincosf.
*
* Copyright (c) 2018-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <stdint.h>
+3 -3
@@ -1,8 +1,8 @@
/*
* Single-precision sin function.
*
* Copyright (c) 2018-2021, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* Copyright (c) 2018-2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#include <math.h>
@@ -21,7 +21,7 @@ sinf (float y)
int n;
const sincos_t *p = &__sincosf_table[0];
if (abstop12 (y) < abstop12 (pio4f))
if (abstop12 (y) < abstop12 (pio4))
{
s = x * x;
+265 -134
@@ -1,8 +1,8 @@
/*
* Microbenchmark for math functions.
*
* Copyright (c) 2018-2022, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* Copyright (c) 2018-2020, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#undef _GNU_SOURCE
@@ -15,6 +15,11 @@
#include <math.h>
#include "mathlib.h"
#ifndef WANT_VMATH
/* Enable the build of vector math code. */
# define WANT_VMATH 1
#endif
/* Number of measurements, best result is reported. */
#define MEASURE 60
/* Array size. */
@@ -29,9 +34,8 @@ static float Af[N];
static long measurecount = MEASURE;
static long itercount = ITER;
#ifdef __vpcs
#include <arm_neon.h>
typedef float64x2_t v_double;
#if __aarch64__ && WANT_VMATH
typedef __f64x2_t v_double;
#define v_double_len() 2
@@ -47,7 +51,7 @@ v_double_dup (double x)
return (v_double){x, x};
}
typedef float32x4_t v_float;
typedef __f32x4_t v_float;
#define v_float_len() 4
@@ -72,49 +76,6 @@ typedef float v_float;
#define v_float_len(x) 1
#define v_float_load(x) (x)[0]
#define v_float_dup(x) (x)
#endif
#if WANT_SVE_MATH
#include <arm_sve.h>
typedef svbool_t sv_bool;
typedef svfloat64_t sv_double;
#define sv_double_len() svcntd()
static inline sv_double
sv_double_load (const double *p)
{
svbool_t pg = svptrue_b64();
return svld1(pg, p);
}
static inline sv_double
sv_double_dup (double x)
{
return svdup_n_f64(x);
}
typedef svfloat32_t sv_float;
#define sv_float_len() svcntw()
static inline sv_float
sv_float_load (const float *p)
{
svbool_t pg = svptrue_b32();
return svld1(pg, p);
}
static inline sv_float
sv_float_dup (float x)
{
return svdup_n_f32(x);
}
#else
/* dummy definitions to make things compile. */
#define sv_double_len(x) 1
#define sv_float_len(x) 1
#endif
static double
@@ -128,6 +89,21 @@ dummyf (float x)
{
return x;
}
#if WANT_VMATH
#if __aarch64__
static v_double
__v_dummy (v_double x)
{
return x;
}
static v_float
__v_dummyf (v_float x)
{
return x;
}
#ifdef __vpcs
__vpcs static v_double
__vn_dummy (v_double x)
@@ -140,23 +116,101 @@ __vn_dummyf (v_float x)
{
return x;
}
#endif
#if WANT_SVE_MATH
static sv_double
__sv_dummy (sv_double x, sv_bool pg)
__vpcs static v_float
xy__vn_powf (v_float x)
{
return x;
return __vn_powf (x, x);
}
static sv_float
__sv_dummyf (sv_float x, sv_bool pg)
__vpcs static v_float
xy_Z_powf (v_float x)
{
return x;
return _ZGVnN4vv_powf (x, x);
}
__vpcs static v_double
xy__vn_pow (v_double x)
{
return __vn_pow (x, x);
}
__vpcs static v_double
xy_Z_pow (v_double x)
{
return _ZGVnN2vv_pow (x, x);
}
#endif
#include "test/mathbench_wrappers.h"
static v_float
xy__v_powf (v_float x)
{
return __v_powf (x, x);
}
static v_double
xy__v_pow (v_double x)
{
return __v_pow (x, x);
}
#endif
static float
xy__s_powf (float x)
{
return __s_powf (x, x);
}
static double
xy__s_pow (double x)
{
return __s_pow (x, x);
}
#endif
static double
xypow (double x)
{
return pow (x, x);
}
static float
xypowf (float x)
{
return powf (x, x);
}
static double
xpow (double x)
{
return pow (x, 23.4);
}
static float
xpowf (float x)
{
return powf (x, 23.4f);
}
static double
ypow (double x)
{
return pow (2.34, x);
}
static float
ypowf (float x)
{
return powf (2.34f, x);
}
static float
sincosf_wrap (float x)
{
float s, c;
sincosf (x, &s, &c);
return s + c;
}
static const struct fun
{
@@ -169,40 +223,127 @@ static const struct fun
{
double (*d) (double);
float (*f) (float);
v_double (*vd) (v_double);
v_float (*vf) (v_float);
#ifdef __vpcs
__vpcs v_double (*vnd) (v_double);
__vpcs v_float (*vnf) (v_float);
#endif
#if WANT_SVE_MATH
sv_double (*svd) (sv_double, sv_bool);
sv_float (*svf) (sv_float, sv_bool);
#endif
} fun;
} funtab[] = {
#define D(func, lo, hi) {#func, 'd', 0, lo, hi, {.d = func}},
#define F(func, lo, hi) {#func, 'f', 0, lo, hi, {.f = func}},
#define VD(func, lo, hi) {#func, 'd', 'v', lo, hi, {.vd = func}},
#define VF(func, lo, hi) {#func, 'f', 'v', lo, hi, {.vf = func}},
#define VND(func, lo, hi) {#func, 'd', 'n', lo, hi, {.vnd = func}},
#define VNF(func, lo, hi) {#func, 'f', 'n', lo, hi, {.vnf = func}},
#define SVD(func, lo, hi) {#func, 'd', 's', lo, hi, {.svd = func}},
#define SVF(func, lo, hi) {#func, 'f', 's', lo, hi, {.svf = func}},
D (dummy, 1.0, 2.0)
D (exp, -9.9, 9.9)
D (exp, 0.5, 1.0)
D (exp2, -9.9, 9.9)
D (log, 0.01, 11.1)
D (log, 0.999, 1.001)
D (log2, 0.01, 11.1)
D (log2, 0.999, 1.001)
{"pow", 'd', 0, 0.01, 11.1, {.d = xypow}},
D (xpow, 0.01, 11.1)
D (ypow, -9.9, 9.9)
D (erf, -6.0, 6.0)
F (dummyf, 1.0, 2.0)
F (expf, -9.9, 9.9)
F (exp2f, -9.9, 9.9)
F (logf, 0.01, 11.1)
F (log2f, 0.01, 11.1)
{"powf", 'f', 0, 0.01, 11.1, {.f = xypowf}},
F (xpowf, 0.01, 11.1)
F (ypowf, -9.9, 9.9)
{"sincosf", 'f', 0, 0.1, 0.7, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 0.8, 3.1, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, -3.1, 3.1, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 3.3, 33.3, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 100, 1000, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 1e6, 1e32, {.f = sincosf_wrap}},
F (sinf, 0.1, 0.7)
F (sinf, 0.8, 3.1)
F (sinf, -3.1, 3.1)
F (sinf, 3.3, 33.3)
F (sinf, 100, 1000)
F (sinf, 1e6, 1e32)
F (cosf, 0.1, 0.7)
F (cosf, 0.8, 3.1)
F (cosf, -3.1, 3.1)
F (cosf, 3.3, 33.3)
F (cosf, 100, 1000)
F (cosf, 1e6, 1e32)
F (erff, -4.0, 4.0)
#if WANT_VMATH
D (__s_sin, -3.1, 3.1)
D (__s_cos, -3.1, 3.1)
D (__s_exp, -9.9, 9.9)
D (__s_log, 0.01, 11.1)
{"__s_pow", 'd', 0, 0.01, 11.1, {.d = xy__s_pow}},
F (__s_expf, -9.9, 9.9)
F (__s_expf_1u, -9.9, 9.9)
F (__s_exp2f, -9.9, 9.9)
F (__s_exp2f_1u, -9.9, 9.9)
F (__s_logf, 0.01, 11.1)
{"__s_powf", 'f', 0, 0.01, 11.1, {.f = xy__s_powf}},
F (__s_sinf, -3.1, 3.1)
F (__s_cosf, -3.1, 3.1)
#if __aarch64__
VD (__v_dummy, 1.0, 2.0)
VD (__v_sin, -3.1, 3.1)
VD (__v_cos, -3.1, 3.1)
VD (__v_exp, -9.9, 9.9)
VD (__v_log, 0.01, 11.1)
{"__v_pow", 'd', 'v', 0.01, 11.1, {.vd = xy__v_pow}},
VF (__v_dummyf, 1.0, 2.0)
VF (__v_expf, -9.9, 9.9)
VF (__v_expf_1u, -9.9, 9.9)
VF (__v_exp2f, -9.9, 9.9)
VF (__v_exp2f_1u, -9.9, 9.9)
VF (__v_logf, 0.01, 11.1)
{"__v_powf", 'f', 'v', 0.01, 11.1, {.vf = xy__v_powf}},
VF (__v_sinf, -3.1, 3.1)
VF (__v_cosf, -3.1, 3.1)
#ifdef __vpcs
VND (__vn_dummy, 1.0, 2.0)
VND (__vn_exp, -9.9, 9.9)
VND (_ZGVnN2v_exp, -9.9, 9.9)
VND (__vn_log, 0.01, 11.1)
VND (_ZGVnN2v_log, 0.01, 11.1)
{"__vn_pow", 'd', 'n', 0.01, 11.1, {.vnd = xy__vn_pow}},
{"_ZGVnN2vv_pow", 'd', 'n', 0.01, 11.1, {.vnd = xy_Z_pow}},
VND (__vn_sin, -3.1, 3.1)
VND (_ZGVnN2v_sin, -3.1, 3.1)
VND (__vn_cos, -3.1, 3.1)
VND (_ZGVnN2v_cos, -3.1, 3.1)
VNF (__vn_dummyf, 1.0, 2.0)
VNF (__vn_expf, -9.9, 9.9)
VNF (_ZGVnN4v_expf, -9.9, 9.9)
VNF (__vn_expf_1u, -9.9, 9.9)
VNF (__vn_exp2f, -9.9, 9.9)
VNF (_ZGVnN4v_exp2f, -9.9, 9.9)
VNF (__vn_exp2f_1u, -9.9, 9.9)
VNF (__vn_logf, 0.01, 11.1)
VNF (_ZGVnN4v_logf, 0.01, 11.1)
{"__vn_powf", 'f', 'n', 0.01, 11.1, {.vnf = xy__vn_powf}},
{"_ZGVnN4vv_powf", 'f', 'n', 0.01, 11.1, {.vnf = xy_Z_powf}},
VNF (__vn_sinf, -3.1, 3.1)
VNF (_ZGVnN4v_sinf, -3.1, 3.1)
VNF (__vn_cosf, -3.1, 3.1)
VNF (_ZGVnN4v_cosf, -3.1, 3.1)
#endif
#endif
#if WANT_SVE_MATH
SVD (__sv_dummy, 1.0, 2.0)
SVF (__sv_dummyf, 1.0, 2.0)
#endif
#include "test/mathbench_funcs.h"
{0},
#undef F
#undef D
#undef VF
#undef VD
#undef VNF
#undef VND
#undef SVF
#undef SVD
};
static void
@@ -301,6 +442,38 @@ runf_latency (float f (float))
prev = f (Af[i] + prev * z);
}
static void
run_v_thruput (v_double f (v_double))
{
for (int i = 0; i < N; i += v_double_len ())
f (v_double_load (A+i));
}
static void
runf_v_thruput (v_float f (v_float))
{
for (int i = 0; i < N; i += v_float_len ())
f (v_float_load (Af+i));
}
static void
run_v_latency (v_double f (v_double))
{
v_double z = v_double_dup (zero);
v_double prev = z;
for (int i = 0; i < N; i += v_double_len ())
prev = f (v_double_load (A+i) + prev * z);
}
static void
runf_v_latency (v_float f (v_float))
{
v_float z = v_float_dup (zero);
v_float prev = z;
for (int i = 0; i < N; i += v_float_len ())
prev = f (v_float_load (Af+i) + prev * z);
}
#ifdef __vpcs
static void
run_vn_thruput (__vpcs v_double f (v_double))
@@ -319,57 +492,19 @@ runf_vn_thruput (__vpcs v_float f (v_float))
static void
run_vn_latency (__vpcs v_double f (v_double))
{
volatile uint64x2_t vsel = (uint64x2_t) { 0, 0 };
uint64x2_t sel = vsel;
v_double prev = v_double_dup (0);
v_double z = v_double_dup (zero);
v_double prev = z;
for (int i = 0; i < N; i += v_double_len ())
prev = f (vbslq_f64 (sel, prev, v_double_load (A+i)));
prev = f (v_double_load (A+i) + prev * z);
}
static void
runf_vn_latency (__vpcs v_float f (v_float))
{
volatile uint32x4_t vsel = (uint32x4_t) { 0, 0, 0, 0 };
uint32x4_t sel = vsel;
v_float prev = v_float_dup (0);
v_float z = v_float_dup (zero);
v_float prev = z;
for (int i = 0; i < N; i += v_float_len ())
prev = f (vbslq_f32 (sel, prev, v_float_load (Af+i)));
}
#endif
#if WANT_SVE_MATH
static void
run_sv_thruput (sv_double f (sv_double, sv_bool))
{
for (int i = 0; i < N; i += sv_double_len ())
f (sv_double_load (A+i), svptrue_b64 ());
}
static void
runf_sv_thruput (sv_float f (sv_float, sv_bool))
{
for (int i = 0; i < N; i += sv_float_len ())
f (sv_float_load (Af+i), svptrue_b32 ());
}
static void
run_sv_latency (sv_double f (sv_double, sv_bool))
{
volatile sv_bool vsel = svptrue_b64 ();
sv_bool sel = vsel;
sv_double prev = sv_double_dup (0);
for (int i = 0; i < N; i += sv_double_len ())
prev = f (svsel_f64 (sel, sv_double_load (A+i), prev), svptrue_b64 ());
}
static void
runf_sv_latency (sv_float f (sv_float, sv_bool))
{
volatile sv_bool vsel = svptrue_b32 ();
sv_bool sel = vsel;
sv_float prev = sv_float_dup (0);
for (int i = 0; i < N; i += sv_float_len ())
prev = f (svsel_f32 (sel, sv_float_load (Af+i), prev), svptrue_b32 ());
prev = f (v_float_load (Af+i) + prev * z);
}
#endif
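Note on the thruput and latency runners above: the throughput loops issue independent calls, while the latency loops feed each result into the next argument as prev * z with z equal to zero, which keeps the input values unchanged but forces the calls to execute one after another. A scalar sketch of the two loop shapes follows. The array contents, the volatile zero and add_one are assumptions made for this example; only the loop structure is taken from the diff.

```c
#include <assert.h>
#include <stddef.h>

#define N 8

/* Assumed input data for the example.  */
static double A[N] = {1, 2, 3, 4, 5, 6, 7, 8};

/* Read through a volatile so the compiler cannot fold prev * z away.  */
static volatile double zero = 0.0;

/* Throughput shape: no call depends on an earlier result.  */
static double
run_thruput (double f (double))
{
  double last = 0.0;
  for (size_t i = 0; i < N; i++)
    last = f (A[i]);
  return last;
}

/* Latency shape: each argument depends on the previous result, but
   prev * z is zero for finite prev, so f still sees A[i].  */
static double
run_latency (double f (double))
{
  double z = zero;
  double prev = z;
  for (size_t i = 0; i < N; i++)
    prev = f (A[i] + prev * z);
  return prev;
}

/* Hypothetical function under test.  */
static double
add_one (double x)
{
  return x + 1.0;
}

static void
latency_demo_check (void)
{
  /* Both loops end with f (A[N - 1]) = 9.  */
  assert (run_thruput (add_one) == 9.0);
  assert (run_latency (add_one) == 9.0);
}
```

Both loops compute the same values; only the dependency between successive calls differs, which is what separates a latency measurement from a reciprocal-throughput one.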
@@ -404,10 +539,10 @@ bench1 (const struct fun *f, int type, double lo, double hi)
const char *s = type == 't' ? "rthruput" : "latency";
int vlen = 1;
if (f->vec == 'n')
vlen = f->prec == 'd' ? v_double_len() : v_float_len();
else if (f->vec == 's')
vlen = f->prec == 'd' ? sv_double_len() : sv_float_len();
if (f->vec && f->prec == 'd')
vlen = v_double_len();
else if (f->vec && f->prec == 'f')
vlen = v_float_len();
if (f->prec == 'd' && type == 't' && f->vec == 0)
TIMEIT (run_thruput, f->fun.d);
@@ -417,6 +552,14 @@ bench1 (const struct fun *f, int type, double lo, double hi)
TIMEIT (runf_thruput, f->fun.f);
else if (f->prec == 'f' && type == 'l' && f->vec == 0)
TIMEIT (runf_latency, f->fun.f);
else if (f->prec == 'd' && type == 't' && f->vec == 'v')
TIMEIT (run_v_thruput, f->fun.vd);
else if (f->prec == 'd' && type == 'l' && f->vec == 'v')
TIMEIT (run_v_latency, f->fun.vd);
else if (f->prec == 'f' && type == 't' && f->vec == 'v')
TIMEIT (runf_v_thruput, f->fun.vf);
else if (f->prec == 'f' && type == 'l' && f->vec == 'v')
TIMEIT (runf_v_latency, f->fun.vf);
#ifdef __vpcs
else if (f->prec == 'd' && type == 't' && f->vec == 'n')
TIMEIT (run_vn_thruput, f->fun.vnd);
@@ -427,32 +570,20 @@ bench1 (const struct fun *f, int type, double lo, double hi)
else if (f->prec == 'f' && type == 'l' && f->vec == 'n')
TIMEIT (runf_vn_latency, f->fun.vnf);
#endif
#if WANT_SVE_MATH
else if (f->prec == 'd' && type == 't' && f->vec == 's')
TIMEIT (run_sv_thruput, f->fun.svd);
else if (f->prec == 'd' && type == 'l' && f->vec == 's')
TIMEIT (run_sv_latency, f->fun.svd);
else if (f->prec == 'f' && type == 't' && f->vec == 's')
TIMEIT (runf_sv_thruput, f->fun.svf);
else if (f->prec == 'f' && type == 'l' && f->vec == 's')
TIMEIT (runf_sv_latency, f->fun.svf);
#endif
if (type == 't')
{
ns100 = (100 * dt + itercount * N / 2) / (itercount * N);
printf ("%9s %8s: %4u.%02u ns/elem %10llu ns in [%g %g] vlen %d\n",
f->name, s,
printf ("%9s %8s: %4u.%02u ns/elem %10llu ns in [%g %g]\n", f->name, s,
(unsigned) (ns100 / 100), (unsigned) (ns100 % 100),
(unsigned long long) dt, lo, hi, vlen);
(unsigned long long) dt, lo, hi);
}
else if (type == 'l')
{
ns100 = (100 * dt + itercount * N / vlen / 2) / (itercount * N / vlen);
printf ("%9s %8s: %4u.%02u ns/call %10llu ns in [%g %g] vlen %d\n",
f->name, s,
printf ("%9s %8s: %4u.%02u ns/call %10llu ns in [%g %g]\n", f->name, s,
(unsigned) (ns100 / 100), (unsigned) (ns100 % 100),
(unsigned long long) dt, lo, hi, vlen);
(unsigned long long) dt, lo, hi);
}
fflush (stdout);
}
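Note on the ns100 computation in bench1: the time per element is kept as an integer number of hundredths of a nanosecond, and half of the divisor is added before dividing so that the result is rounded to nearest rather than truncated. A small sketch of that arithmetic follows; ns_x100 and its parameter names are hypothetical, with elems standing in for itercount * N (or itercount * N / vlen in the latency case).

```c
#include <assert.h>
#include <stdint.h>

/* Time per element in units of 0.01 ns, rounded to nearest.
   dt_ns is the total elapsed time, elems the number of elements.  */
static uint64_t
ns_x100 (uint64_t dt_ns, uint64_t elems)
{
  return (100 * dt_ns + elems / 2) / elems;
}

static void
rounding_demo_check (void)
{
  /* 1000 ns over 300 elements is 3.333... ns, shown as 3.33.  */
  assert (ns_x100 (1000, 300) == 333);
  /* 2000 ns over 300 elements is 6.666... ns, shown as 6.67;
     plain truncating division would give 666.  */
  assert (ns_x100 (2000, 300) == 667);
  assert ((100 * 2000) / 300 == 666);
}
```

The printf calls in the hunk then split the value into ns100 / 100 and ns100 % 100 to print the integer and fractional parts without floating-point formatting.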
-62
@@ -1,62 +0,0 @@
/*
* Function entries for mathbench.
*
* Copyright (c) 2022-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
/* clang-format off */
D (exp, -9.9, 9.9)
D (exp, 0.5, 1.0)
D (exp10, -9.9, 9.9)
D (exp2, -9.9, 9.9)
D (log, 0.01, 11.1)
D (log, 0.999, 1.001)
D (log2, 0.01, 11.1)
D (log2, 0.999, 1.001)
{"pow", 'd', 0, 0.01, 11.1, {.d = xypow}},
D (xpow, 0.01, 11.1)
D (ypow, -9.9, 9.9)
D (erf, -6.0, 6.0)
F (expf, -9.9, 9.9)
F (exp2f, -9.9, 9.9)
F (logf, 0.01, 11.1)
F (log2f, 0.01, 11.1)
{"powf", 'f', 0, 0.01, 11.1, {.f = xypowf}},
F (xpowf, 0.01, 11.1)
F (ypowf, -9.9, 9.9)
{"sincosf", 'f', 0, 0.1, 0.7, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 0.8, 3.1, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, -3.1, 3.1, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 3.3, 33.3, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 100, 1000, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 1e6, 1e32, {.f = sincosf_wrap}},
F (sinf, 0.1, 0.7)
F (sinf, 0.8, 3.1)
F (sinf, -3.1, 3.1)
F (sinf, 3.3, 33.3)
F (sinf, 100, 1000)
F (sinf, 1e6, 1e32)
F (cosf, 0.1, 0.7)
F (cosf, 0.8, 3.1)
F (cosf, -3.1, 3.1)
F (cosf, 3.3, 33.3)
F (cosf, 100, 1000)
F (cosf, 1e6, 1e32)
F (erff, -4.0, 4.0)
#ifdef __vpcs
VND (_ZGVnN2v_exp, -9.9, 9.9)
VND (_ZGVnN2v_log, 0.01, 11.1)
{"_ZGVnN2vv_pow", 'd', 'n', 0.01, 11.1, {.vnd = xy_Z_pow}},
VND (_ZGVnN2v_sin, -3.1, 3.1)
VND (_ZGVnN2v_cos, -3.1, 3.1)
VNF (_ZGVnN4v_expf, -9.9, 9.9)
VNF (_ZGVnN4v_expf_1u, -9.9, 9.9)
VNF (_ZGVnN4v_exp2f, -9.9, 9.9)
VNF (_ZGVnN4v_exp2f_1u, -9.9, 9.9)
VNF (_ZGVnN4v_logf, 0.01, 11.1)
{"_ZGVnN4vv_powf", 'f', 'n', 0.01, 11.1, {.vnf = xy_Z_powf}},
VNF (_ZGVnN4v_sinf, -3.1, 3.1)
VNF (_ZGVnN4v_cosf, -3.1, 3.1)
#endif
/* clang-format on */
-66
@@ -1,66 +0,0 @@
/*
* Function wrappers for mathbench.
*
* Copyright (c) 2022-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
#ifdef __vpcs
__vpcs static v_float
xy_Z_powf (v_float x)
{
return _ZGVnN4vv_powf (x, x);
}
__vpcs static v_double
xy_Z_pow (v_double x)
{
return _ZGVnN2vv_pow (x, x);
}
#endif
static double
xypow (double x)
{
return pow (x, x);
}
static float
xypowf (float x)
{
return powf (x, x);
}
static double
xpow (double x)
{
return pow (x, 23.4);
}
static float
xpowf (float x)
{
return powf (x, 23.4f);
}
static double
ypow (double x)
{
return pow (2.34, x);
}
static float
ypowf (float x)
{
return powf (2.34f, x);
}
static float
sincosf_wrap (float x)
{
float s, c;
sincosf (x, &s, &c);
return s + c;
}
+4 -12
@@ -1,8 +1,8 @@
/*
* mathtest.c - test rig for mathlib
*
* Copyright (c) 1998-2022, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* Copyright (c) 1998-2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/
#include <assert.h>
@@ -196,11 +196,9 @@ int is_complex_rettype(int rettype) {
#define TFUNCARM(arg,ret,name,tolerance) { t_func, arg, ret, (void*)& ARM_PREFIX(name), m_none, tolerance, #name }
#define MFUNC(arg,ret,name,tolerance) { t_macro, arg, ret, NULL, m_##name, tolerance, #name }
#ifndef PL
/* sincosf wrappers for easier testing. */
static float sincosf_sinf(float x) { float s,c; sincosf(x, &s, &c); return s; }
static float sincosf_cosf(float x) { float s,c; sincosf(x, &s, &c); return c; }
#endif
test_func tfuncs[] = {
/* trigonometric */
@@ -220,10 +218,9 @@ test_func tfuncs[] = {
TFUNCARM(at_s,rt_s, tanf, 4*ULPUNIT),
TFUNCARM(at_s,rt_s, sinf, 3*ULPUNIT/4),
TFUNCARM(at_s,rt_s, cosf, 3*ULPUNIT/4),
#ifndef PL
TFUNCARM(at_s,rt_s, sincosf_sinf, 3*ULPUNIT/4),
TFUNCARM(at_s,rt_s, sincosf_cosf, 3*ULPUNIT/4),
#endif
/* hyperbolic */
TFUNC(at_d, rt_d, atanh, 4*ULPUNIT),
TFUNC(at_d, rt_d, asinh, 4*ULPUNIT),
@@ -254,7 +251,6 @@ test_func tfuncs[] = {
TFUNCARM(at_s,rt_s, expf, 3*ULPUNIT/4),
TFUNCARM(at_s,rt_s, exp2f, 3*ULPUNIT/4),
TFUNC(at_s,rt_s, expm1f, ULPUNIT),
TFUNC(at_d,rt_d, exp10, ULPUNIT),
/* power */
TFUNC(at_d2,rt_d, pow, 3*ULPUNIT/4),
@@ -1022,7 +1018,6 @@ int runtest(testdetail t) {
DO_DOP(d_arg1,op1r);
DO_DOP(d_arg2,op2r);
s_arg1.i = t.op1r[0]; s_arg2.i = t.op2r[0];
s_res.i = 0;
/*
* Detect NaNs, infinities and denormals on input, and set a
@@ -1157,25 +1152,22 @@ int runtest(testdetail t) {
tresultr[0] = t.resultr[0];
tresultr[1] = t.resultr[1];
resultr[0] = d_res.i[dmsd]; resultr[1] = d_res.i[dlsd];
resulti[0] = resulti[1] = 0;
wres = 2;
break;
case rt_i:
tresultr[0] = t.resultr[0];
resultr[0] = intres;
resulti[0] = 0;
wres = 1;
break;
case rt_s:
case rt_s2:
tresultr[0] = t.resultr[0];
resultr[0] = s_res.i;
resulti[0] = 0;
wres = 1;
break;
default:
puts("unhandled rettype in runtest");
abort ();
wres = 0;
}
if(t.resultc != rc_none) {
int err = 0;
+1 -1
@@ -2,7 +2,7 @@
* dotest.c - actually generate mathlib test cases
*
* Copyright (c) 1999-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <stdio.h>
+1 -1
@@ -2,7 +2,7 @@
* intern.h
*
* Copyright (c) 1999-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#ifndef mathtest_intern_h
+1 -1
@@ -2,7 +2,7 @@
* main.c
*
* Copyright (c) 1999-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <assert.h>
+1 -1
@@ -2,7 +2,7 @@
* random.c - random number generator for producing mathlib test cases
*
* Copyright (c) 1998-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "types.h"
+1 -1
@@ -2,7 +2,7 @@
* random.h - header for random.c
*
* Copyright (c) 2009-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include "types.h"
+1 -1
@@ -2,7 +2,7 @@
* semi.c: test implementations of mathlib seminumerical functions
*
* Copyright (c) 1999-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <stdio.h>
+1 -1
@@ -2,7 +2,7 @@
* semi.h: header for semi.c
*
* Copyright (c) 1999-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#ifndef test_semi_h
+1 -1
@@ -2,7 +2,7 @@
* types.h
*
* Copyright (c) 2005-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#ifndef mathtest_types_h
+1 -1
@@ -2,7 +2,7 @@
* wrappers.c - wrappers to modify output of MPFR/MPC test functions
*
* Copyright (c) 2014-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
#include <assert.h>
+1 -1
@@ -2,7 +2,7 @@
* wrappers.h - wrappers to modify output of MPFR/MPC test functions
*
* Copyright (c) 2014-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/
typedef struct {
+80 -47
@@ -2,8 +2,8 @@
# ULP error check script.
#
# Copyright (c) 2019-2023, Arm Limited.
# SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
# Copyright (c) 2019-2020, Arm Limited.
# SPDX-License-Identifier: MIT
#set -x
set -eu
@@ -72,16 +72,6 @@ t pow 0x1.ffffffffffff0p-1 0x1.0000000000008p0 x 0x1p60 0x1p68 50000
t pow 0x1.ffffffffff000p-1 0x1p0 x 0x1p50 0x1p52 50000
t pow -0x1.ffffffffff000p-1 -0x1p0 x 0x1p50 0x1p52 50000
L=0.02
t exp10 0 0x1p-47 5000
t exp10 -0 -0x1p-47 5000
t exp10 0x1p-47 1 50000
t exp10 -0x1p-47 -1 50000
t exp10 1 0x1.34413509f79ffp8 50000
t exp10 -1 -0x1.434e6420f4374p8 50000
t exp10 0x1.34413509f79ffp8 inf 5000
t exp10 -0x1.434e6420f4374p8 -inf 5000
L=1.0
Ldir=0.9
t erf 0 0xffff000000000000 10000
@@ -153,10 +143,15 @@ Ldir=0.5
done
# vector functions
Ldir=0.5
r='n'
flags="${ULPFLAGS:--q}"
flags="${ULPFLAGS:--q} -f"
runs=
check __s_exp 1 && runs=1
runv=
check __v_exp 1 && runv=1
runvn=
check __vn_exp 1 && runvn=1
range_exp='
0 0xffff000000000000 10000
@@ -182,10 +177,9 @@ range_pow='
'
range_sin='
0 0x1p23 500000
-0 -0x1p23 500000
0x1p23 inf 10000
-0x1p23 -inf 10000
0 0xffff000000000000 10000
0x1p-4 0x1p4 400000
-0x1p-23 0x1p23 400000
'
range_cos="$range_sin"
@@ -205,10 +199,9 @@ range_logf='
'
range_sinf='
0 0x1p20 500000
-0 -0x1p20 500000
0x1p20 inf 10000
-0x1p20 -inf 10000
0 0xffff0000 10000
0x1p-4 0x1p4 300000
-0x1p-9 -0x1p9 300000
'
range_cosf="$range_sinf"
@@ -236,8 +229,9 @@ L_sinf=1.4
L_cosf=1.4
L_powf=2.1
while read G F D
while read G F R
do
[ "$R" = 1 ] || continue
case "$G" in \#*) continue ;; esac
eval range="\${range_$G}"
eval L="\${L_$G}"
@@ -245,35 +239,74 @@ do
do
[ -n "$X" ] || continue
case "$X" in \#*) continue ;; esac
disable_fenv=""
if [ -z "$WANT_SIMD_EXCEPT" ] || [ $WANT_SIMD_EXCEPT -eq 0 ]; then
# If library was built with SIMD exceptions
# disabled, disable fenv checking in ulp
# tool. Otherwise, fenv checking may still be
# disabled by adding -f to the end of the run
# line.
disable_fenv="-f"
fi
t $D $disable_fenv $F $X
t $F $X
done << EOF
$range
EOF
done << EOF
# group symbol run
exp _ZGVnN2v_exp
log _ZGVnN2v_log
pow _ZGVnN2vv_pow -f
sin _ZGVnN2v_sin -z
cos _ZGVnN2v_cos
expf _ZGVnN4v_expf
expf_1u _ZGVnN4v_expf_1u -f
exp2f _ZGVnN4v_exp2f
exp2f_1u _ZGVnN4v_exp2f_1u -f
logf _ZGVnN4v_logf
sinf _ZGVnN4v_sinf -z
cosf _ZGVnN4v_cosf
powf _ZGVnN4vv_powf -f
exp __s_exp $runs
exp __v_exp $runv
exp __vn_exp $runvn
exp _ZGVnN2v_exp $runvn
log __s_log $runs
log __v_log $runv
log __vn_log $runvn
log _ZGVnN2v_log $runvn
pow __s_pow $runs
pow __v_pow $runv
pow __vn_pow $runvn
pow _ZGVnN2vv_pow $runvn
sin __s_sin $runs
sin __v_sin $runv
sin __vn_sin $runvn
sin _ZGVnN2v_sin $runvn
cos __s_cos $runs
cos __v_cos $runv
cos __vn_cos $runvn
cos _ZGVnN2v_cos $runvn
expf __s_expf $runs
expf __v_expf $runv
expf __vn_expf $runvn
expf _ZGVnN4v_expf $runvn
expf_1u __s_expf_1u $runs
expf_1u __v_expf_1u $runv
expf_1u __vn_expf_1u $runvn
exp2f __s_exp2f $runs
exp2f __v_exp2f $runv
exp2f __vn_exp2f $runvn
exp2f _ZGVnN4v_exp2f $runvn
exp2f_1u __s_exp2f_1u $runs
exp2f_1u __v_exp2f_1u $runv
exp2f_1u __vn_exp2f_1u $runvn
logf __s_logf $runs
logf __v_logf $runv
logf __vn_logf $runvn
logf _ZGVnN4v_logf $runvn
sinf __s_sinf $runs
sinf __v_sinf $runv
sinf __vn_sinf $runvn
sinf _ZGVnN4v_sinf $runvn
cosf __s_cosf $runs
cosf __v_cosf $runv
cosf __vn_cosf $runvn
cosf _ZGVnN4v_cosf $runvn
powf __s_powf $runs
powf __v_powf $runv
powf __vn_powf $runvn
powf _ZGVnN4vv_powf $runvn
EOF
[ 0 -eq $FAIL ] || {
+1 -1
@@ -1,7 +1,7 @@
; cosf.tst - Directed test cases for SP cosine
;
; Copyright (c) 2007-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=cosf op1=7fc00001 result=7fc00001 errno=0
func=cosf op1=ffc00001 result=7fc00001 errno=0
+1 -1
@@ -1,7 +1,7 @@
; erf.tst - Directed test cases for erf
;
; Copyright (c) 2007-2020, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=erf op1=7ff80000.00000001 result=7ff80000.00000001 errno=0
func=erf op1=fff80000.00000001 result=7ff80000.00000001 errno=0
+1 -1
@@ -1,7 +1,7 @@
; erff.tst
;
; Copyright (c) 2007-2020, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=erff op1=7fc00001 result=7fc00001 errno=0
func=erff op1=ffc00001 result=7fc00001 errno=0
+1 -1
@@ -1,7 +1,7 @@
; Directed test cases for exp
;
; Copyright (c) 2018-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=exp op1=7ff80000.00000001 result=7ff80000.00000001 errno=0
func=exp op1=fff80000.00000001 result=7ff80000.00000001 errno=0
-15
@@ -1,15 +0,0 @@
; Directed test cases for exp10
;
; Copyright (c) 2023, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
func=exp10 op1=7ff80000.00000001 result=7ff80000.00000001 errno=0
func=exp10 op1=fff80000.00000001 result=7ff80000.00000001 errno=0
func=exp10 op1=7ff00000.00000001 result=7ff80000.00000001 errno=0 status=i
func=exp10 op1=fff00000.00000001 result=7ff80000.00000001 errno=0 status=i
func=exp10 op1=7ff00000.00000000 result=7ff00000.00000000 errno=0
func=exp10 op1=7fefffff.ffffffff result=7ff00000.00000000 errno=ERANGE status=ox
func=exp10 op1=fff00000.00000000 result=00000000.00000000 errno=0
func=exp10 op1=ffefffff.ffffffff result=00000000.00000000 errno=ERANGE status=ux
func=exp10 op1=00000000.00000000 result=3ff00000.00000000 errno=0
func=exp10 op1=80000000.00000000 result=3ff00000.00000000 errno=0
+1 -1
@@ -1,7 +1,7 @@
; Directed test cases for exp2
;
; Copyright (c) 2018-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=exp2 op1=7ff80000.00000001 result=7ff80000.00000001 errno=0
func=exp2 op1=fff80000.00000001 result=7ff80000.00000001 errno=0
+1 -1
@@ -1,7 +1,7 @@
; exp2f.tst - Directed test cases for exp2f
;
; Copyright (c) 2017-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=exp2f op1=7fc00001 result=7fc00001 errno=0
func=exp2f op1=ffc00001 result=7fc00001 errno=0
+1 -1
@@ -1,7 +1,7 @@
; expf.tst - Directed test cases for expf
;
; Copyright (c) 2007-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=expf op1=7fc00001 result=7fc00001 errno=0
func=expf op1=ffc00001 result=7fc00001 errno=0
+1 -1
@@ -1,7 +1,7 @@
; Directed test cases for log
;
; Copyright (c) 2018-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=log op1=7ff80000.00000001 result=7ff80000.00000001 errno=0
func=log op1=fff80000.00000001 result=7ff80000.00000001 errno=0
+1 -1
@@ -1,7 +1,7 @@
; Directed test cases for log2
;
; Copyright (c) 2018-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=log2 op1=7ff80000.00000001 result=7ff80000.00000001 errno=0
func=log2 op1=fff80000.00000001 result=7ff80000.00000001 errno=0
+1 -1
@@ -1,7 +1,7 @@
; log2f.tst - Directed test cases for log2f
;
; Copyright (c) 2017-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=log2f op1=7fc00001 result=7fc00001 errno=0
func=log2f op1=ffc00001 result=7fc00001 errno=0
+1 -1
@@ -1,7 +1,7 @@
; logf.tst - Directed test cases for logf
;
; Copyright (c) 2007-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=logf op1=7fc00001 result=7fc00001 errno=0
func=logf op1=ffc00001 result=7fc00001 errno=0
+1 -1
@@ -1,7 +1,7 @@
; Directed test cases for pow
;
; Copyright (c) 2018-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=pow op1=00000000.00000000 op2=00000000.00000000 result=3ff00000.00000000 errno=0
func=pow op1=00000000.00000000 op2=00000000.00000001 result=00000000.00000000 errno=0
+1 -1
@@ -1,7 +1,7 @@
; powf.tst - Directed test cases for powf
;
; Copyright (c) 2007-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=powf op1=7f800001 op2=7f800001 result=7fc00001 errno=0 status=i
func=powf op1=7f800001 op2=ff800001 result=7fc00001 errno=0 status=i
+1 -1
@@ -1,7 +1,7 @@
; Directed test cases for SP sincos
;
; Copyright (c) 2007-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=sincosf_sinf op1=7fc00001 result=7fc00001 errno=0
+1 -1
@@ -1,7 +1,7 @@
; sinf.tst - Directed test cases for SP sine
;
; Copyright (c) 2007-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT
func=sinf op1=7fc00001 result=7fc00001 errno=0
+1 -1
View File
@@ -1,7 +1,7 @@
!! double.tst - Random test case specification for DP functions
!!
!! Copyright (c) 1999-2019, Arm Limited.
!! SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
!! SPDX-License-Identifier: MIT
test exp 10000
test exp2 10000

Some files were not shown because too many files have changed in this diff.