mirror of
https://github.com/openharmony/third_party_optimized_routines.git
synced 2026-07-01 06:42:05 -04:00
Merge pull request !23 from openharmony_ci/revert-merge-22-master
@@ -1,11 +1,6 @@
|
||||
MIT OR Apache-2.0 WITH LLVM-exception
|
||||
=====================================
|
||||
|
||||
|
||||
MIT License
|
||||
-----------
|
||||
|
||||
Copyright (c) 1999-2022, Arm Limited.
|
||||
Copyright (c) 1999-2019, Arm Limited.
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
@@ -24,226 +19,3 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
SOFTWARE.
|
||||
|
||||
|
||||
Apache-2.0 WITH LLVM-exception
|
||||
------------------------------
|
||||
|
||||
Apache License
|
||||
Version 2.0, January 2004
|
||||
http://www.apache.org/licenses/
|
||||
|
||||
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
|
||||
|
||||
1. Definitions.
|
||||
|
||||
"License" shall mean the terms and conditions for use, reproduction,
|
||||
and distribution as defined by Sections 1 through 9 of this document.
|
||||
|
||||
"Licensor" shall mean the copyright owner or entity authorized by
|
||||
the copyright owner that is granting the License.
|
||||
|
||||
"Legal Entity" shall mean the union of the acting entity and all
|
||||
other entities that control, are controlled by, or are under common
|
||||
control with that entity. For the purposes of this definition,
|
||||
"control" means (i) the power, direct or indirect, to cause the
|
||||
direction or management of such entity, whether by contract or
|
||||
otherwise, or (ii) ownership of fifty percent (50%) or more of the
|
||||
outstanding shares, or (iii) beneficial ownership of such entity.
|
||||
|
||||
"You" (or "Your") shall mean an individual or Legal Entity
|
||||
exercising permissions granted by this License.
|
||||
|
||||
"Source" form shall mean the preferred form for making modifications,
|
||||
including but not limited to software source code, documentation
|
||||
source, and configuration files.
|
||||
|
||||
"Object" form shall mean any form resulting from mechanical
|
||||
transformation or translation of a Source form, including but
|
||||
not limited to compiled object code, generated documentation,
|
||||
and conversions to other media types.
|
||||
|
||||
"Work" shall mean the work of authorship, whether in Source or
|
||||
Object form, made available under the License, as indicated by a
|
||||
copyright notice that is included in or attached to the work
|
||||
(an example is provided in the Appendix below).
|
||||
|
||||
"Derivative Works" shall mean any work, whether in Source or Object
|
||||
form, that is based on (or derived from) the Work and for which the
|
||||
editorial revisions, annotations, elaborations, or other modifications
|
||||
represent, as a whole, an original work of authorship. For the purposes
|
||||
of this License, Derivative Works shall not include works that remain
|
||||
separable from, or merely link (or bind by name) to the interfaces of,
|
||||
the Work and Derivative Works thereof.
|
||||
|
||||
"Contribution" shall mean any work of authorship, including
|
||||
the original version of the Work and any modifications or additions
|
||||
to that Work or Derivative Works thereof, that is intentionally
|
||||
submitted to Licensor for inclusion in the Work by the copyright owner
|
||||
or by an individual or Legal Entity authorized to submit on behalf of
|
||||
the copyright owner. For the purposes of this definition, "submitted"
|
||||
means any form of electronic, verbal, or written communication sent
|
||||
to the Licensor or its representatives, including but not limited to
|
||||
communication on electronic mailing lists, source code control systems,
|
||||
and issue tracking systems that are managed by, or on behalf of, the
|
||||
Licensor for the purpose of discussing and improving the Work, but
|
||||
excluding communication that is conspicuously marked or otherwise
|
||||
designated in writing by the copyright owner as "Not a Contribution."
|
||||
|
||||
"Contributor" shall mean Licensor and any individual or Legal Entity
|
||||
on behalf of whom a Contribution has been received by Licensor and
|
||||
subsequently incorporated within the Work.
|
||||
|
||||
2. Grant of Copyright License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
copyright license to reproduce, prepare Derivative Works of,
|
||||
publicly display, publicly perform, sublicense, and distribute the
|
||||
Work and such Derivative Works in Source or Object form.
|
||||
|
||||
3. Grant of Patent License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
(except as stated in this section) patent license to make, have made,
|
||||
use, offer to sell, sell, import, and otherwise transfer the Work,
|
||||
where such license applies only to those patent claims licensable
|
||||
by such Contributor that are necessarily infringed by their
|
||||
Contribution(s) alone or by combination of their Contribution(s)
|
||||
with the Work to which such Contribution(s) was submitted. If You
|
||||
institute patent litigation against any entity (including a
|
||||
cross-claim or counterclaim in a lawsuit) alleging that the Work
|
||||
or a Contribution incorporated within the Work constitutes direct
|
||||
or contributory patent infringement, then any patent licenses
|
||||
granted to You under this License for that Work shall terminate
|
||||
as of the date such litigation is filed.
|
||||
|
||||
4. Redistribution. You may reproduce and distribute copies of the
|
||||
Work or Derivative Works thereof in any medium, with or without
|
||||
modifications, and in Source or Object form, provided that You
|
||||
meet the following conditions:
|
||||
|
||||
(a) You must give any other recipients of the Work or
|
||||
Derivative Works a copy of this License; and
|
||||
|
||||
(b) You must cause any modified files to carry prominent notices
|
||||
stating that You changed the files; and
|
||||
|
||||
(c) You must retain, in the Source form of any Derivative Works
|
||||
that You distribute, all copyright, patent, trademark, and
|
||||
attribution notices from the Source form of the Work,
|
||||
excluding those notices that do not pertain to any part of
|
||||
the Derivative Works; and
|
||||
|
||||
(d) If the Work includes a "NOTICE" text file as part of its
|
||||
distribution, then any Derivative Works that You distribute must
|
||||
include a readable copy of the attribution notices contained
|
||||
within such NOTICE file, excluding those notices that do not
|
||||
pertain to any part of the Derivative Works, in at least one
|
||||
of the following places: within a NOTICE text file distributed
|
||||
as part of the Derivative Works; within the Source form or
|
||||
documentation, if provided along with the Derivative Works; or,
|
||||
within a display generated by the Derivative Works, if and
|
||||
wherever such third-party notices normally appear. The contents
|
||||
of the NOTICE file are for informational purposes only and
|
||||
do not modify the License. You may add Your own attribution
|
||||
notices within Derivative Works that You distribute, alongside
|
||||
or as an addendum to the NOTICE text from the Work, provided
|
||||
that such additional attribution notices cannot be construed
|
||||
as modifying the License.
|
||||
|
||||
You may add Your own copyright statement to Your modifications and
|
||||
may provide additional or different license terms and conditions
|
||||
for use, reproduction, or distribution of Your modifications, or
|
||||
for any such Derivative Works as a whole, provided Your use,
|
||||
reproduction, and distribution of the Work otherwise complies with
|
||||
the conditions stated in this License.
|
||||
|
||||
5. Submission of Contributions. Unless You explicitly state otherwise,
|
||||
any Contribution intentionally submitted for inclusion in the Work
|
||||
by You to the Licensor shall be under the terms and conditions of
|
||||
this License, without any additional terms or conditions.
|
||||
Notwithstanding the above, nothing herein shall supersede or modify
|
||||
the terms of any separate license agreement you may have executed
|
||||
with Licensor regarding such Contributions.
|
||||
|
||||
6. Trademarks. This License does not grant permission to use the trade
|
||||
names, trademarks, service marks, or product names of the Licensor,
|
||||
except as required for reasonable and customary use in describing the
|
||||
origin of the Work and reproducing the content of the NOTICE file.
|
||||
|
||||
7. Disclaimer of Warranty. Unless required by applicable law or
|
||||
agreed to in writing, Licensor provides the Work (and each
|
||||
Contributor provides its Contributions) on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
implied, including, without limitation, any warranties or conditions
|
||||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
||||
PARTICULAR PURPOSE. You are solely responsible for determining the
|
||||
appropriateness of using or redistributing the Work and assume any
|
||||
risks associated with Your exercise of permissions under this License.
|
||||
|
||||
8. Limitation of Liability. In no event and under no legal theory,
|
||||
whether in tort (including negligence), contract, or otherwise,
|
||||
unless required by applicable law (such as deliberate and grossly
|
||||
negligent acts) or agreed to in writing, shall any Contributor be
|
||||
liable to You for damages, including any direct, indirect, special,
|
||||
incidental, or consequential damages of any character arising as a
|
||||
result of this License or out of the use or inability to use the
|
||||
Work (including but not limited to damages for loss of goodwill,
|
||||
work stoppage, computer failure or malfunction, or any and all
|
||||
other commercial damages or losses), even if such Contributor
|
||||
has been advised of the possibility of such damages.
|
||||
|
||||
9. Accepting Warranty or Additional Liability. While redistributing
|
||||
the Work or Derivative Works thereof, You may choose to offer,
|
||||
and charge a fee for, acceptance of support, warranty, indemnity,
|
||||
or other liability obligations and/or rights consistent with this
|
||||
License. However, in accepting such obligations, You may act only
|
||||
on Your own behalf and on Your sole responsibility, not on behalf
|
||||
of any other Contributor, and only if You agree to indemnify,
|
||||
defend, and hold each Contributor harmless for any liability
|
||||
incurred by, or claims asserted against, such Contributor by reason
|
||||
of your accepting any such warranty or additional liability.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
|
||||
APPENDIX: How to apply the Apache License to your work.
|
||||
|
||||
To apply the Apache License to your work, attach the following
|
||||
boilerplate notice, with the fields enclosed by brackets "[]"
|
||||
replaced with your own identifying information. (Don't include
|
||||
the brackets!) The text should be enclosed in the appropriate
|
||||
comment syntax for the file format. We also recommend that a
|
||||
file or class name and description of purpose be included on the
|
||||
same "printed page" as the copyright notice for easier
|
||||
identification within third-party archives.
|
||||
|
||||
Copyright [yyyy] [name of copyright owner]
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
|
||||
|
||||
--- LLVM Exceptions to the Apache 2.0 License ----
|
||||
|
||||
As an exception, if, as a result of your compiling your source code, portions
|
||||
of this Software are embedded into an Object form of such source code, you
|
||||
may redistribute such embedded portions in such Object form without complying
|
||||
with the conditions of Sections 4(a), 4(b) and 4(d) of the License.
|
||||
|
||||
In addition, if you combine or link compiled forms of this Software with
|
||||
software that is licensed under the GPLv2 ("Combined Software") and if a
|
||||
court of competent jurisdiction determines that the patent provision (Section
|
||||
3), the indemnity provision (Section 9) or other Section of the License
|
||||
conflicts with the conditions of the GPLv2, you may retroactively and
|
||||
prospectively choose to deem waived or otherwise exclude such Section(s) of
|
||||
the License, but only in their entirety and only with respect to the Combined
|
||||
Software.
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
# Makefile - requires GNU make
|
||||
#
|
||||
# Copyright (c) 2018-2022, Arm Limited.
|
||||
# SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
|
||||
# Copyright (c) 2018-2020, Arm Limited.
|
||||
# SPDX-License-Identifier: MIT
|
||||
|
||||
srcdir = .
|
||||
prefix = /usr
|
||||
@@ -11,7 +11,6 @@ includedir = $(prefix)/include
|
||||
|
||||
# Configure these in config.mk, do not make changes in this file.
|
||||
SUBS = math string networking
|
||||
PLSUBS = math
|
||||
HOST_CC = cc
|
||||
HOST_CFLAGS = -std=c99 -O2
|
||||
HOST_LDFLAGS =
|
||||
@@ -21,7 +20,6 @@ CPPFLAGS =
|
||||
CFLAGS = -std=c99 -O2
|
||||
CFLAGS_SHARED = -fPIC
|
||||
CFLAGS_ALL = -Ibuild/include $(CPPFLAGS) $(CFLAGS)
|
||||
CFLAGS_PL = -Ibuild/pl/include $(CPPFLAGS) $(CFLAGS) -DPL
|
||||
LDFLAGS =
|
||||
LDLIBS =
|
||||
AR = $(CROSS_COMPILE)ar
|
||||
@@ -53,7 +51,6 @@ $(DIRS):
|
||||
mkdir -p $@
|
||||
|
||||
$(filter %.os,$(ALL_FILES)): CFLAGS_ALL += $(CFLAGS_SHARED)
|
||||
$(filter %.os,$(ALL_FILES)): CFLAGS_PL += $(CFLAGS_SHARED)
|
||||
|
||||
build/%.o: $(srcdir)/%.S
|
||||
$(CC) $(CFLAGS_ALL) -c -o $@ $<
|
||||
|
||||
@@ -19,7 +19,7 @@
|
||||
policylist:
|
||||
1. policy: If the OAT-Default.xml policies do not meet your requirements, please add policies here.
|
||||
2. policyitem: The fields type, name, path, and desc are required; the fields rule, group, and filefilter are optional. The default values are:
|
||||
<policyitem type="" name="" path="" desc="" rule="may" filefilter="defaultPolicyFilter"/>
|
||||
<policyitem type="" name="" path="" desc="" rule="may" group="defaultGroup" filefilter="defaultPolicyFilter"/>
|
||||
3. policyitem type:
|
||||
"compatibility" is used to check license compatibility in the specified path;
|
||||
"license" is used to check source license header in the specified path;
|
||||
@@ -49,43 +49,10 @@ All configurations in this file will be merged to OAT-Default.xml, if you have a
|
||||
|
||||
<configuration>
|
||||
<oatconfig>
|
||||
<licensefile></licensefile>
|
||||
<policylist>
|
||||
<policy>
|
||||
<policyitem type="license" name="MIT" path=".*" desc="Compatible license"/>
|
||||
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="math/.*" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
|
||||
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="networking/.*" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
|
||||
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="string/aarch64/.*" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
|
||||
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="string/include/.*" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
|
||||
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="string/bench/.*" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
|
||||
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="string/test/.*" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
|
||||
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="string/x86_64/.*" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
|
||||
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="string/Dir.mk" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
|
||||
<policyitem type="compatibility" name="GPL-2.0+-with-LLVM-exception" path="Makefile" rule="may" filefilter="defaultPolicyFilter" desc="Other process calls"/>
|
||||
|
||||
|
||||
</policy>
|
||||
</policylist>
|
||||
<filefilterlist>
|
||||
<filefilter name="defaultPolicyFilter" desc="Filters for compatibility, license header policies">
|
||||
<filteritem type="filepath" name="math/README.contributors" desc="File bundled by upstream"/>
<filteritem type="filepath" name="LICENSE" desc="File bundled by upstream"/>
<filteritem type="filepath" name="string/README.contributors" desc="File bundled by upstream"/>
<filteritem type="filepath" name="README.OpenSource" desc="File bundled by upstream"/>
<filteritem type="filepath" name="README" desc="File bundled by upstream"/>
<filteritem type="filepath" name="optimized-routines.gni" desc="No license involved"/>
<filteritem type="filepath" name="bundle.json" desc="No license involved"/>
<filteritem type="filepath" name="config.mk.dist" desc="No license involved"/>
|
||||
|
||||
|
||||
|
||||
</filefilter>
|
||||
|
||||
<filefilter name="binaryFileTypePolicyFilter" desc="Filters for binary file policies">
|
||||
<filteritem type="filename" name="*.pdf" desc="File bundled by upstream"/>
<filteritem type="filename" name="*.pdf" desc="File bundled by upstream"/>
|
||||
</filefilter>
|
||||
|
||||
|
||||
</filefilterlist>
|
||||
</filefilterlist>
|
||||
</oatconfig>
|
||||
</configuration>
|
||||
|
||||
@@ -2,17 +2,14 @@ Arm Optimized Routines
 ----------------------
 
 This repository contains implementations of library functions
-provided by Arm. The outbound license is available under a dual
-license, at the user’s election, as reflected in the LICENSE file.
-Contributions to this project are accepted, but Contributors have
-to sign an Assignment Agreement, please follow the instructions in
+provided by Arm under MIT License (See LICENSE). Contributions
+to this project are accepted, but Contributors have to sign an
+Assignment Agreement, please follow the instructions in
 contributor-agreement.pdf. This is needed so upstreaming code
-to projects that require copyright assignment is possible. Further
-contribution requirements are documented in README.contributors of
-the appropriate subdirectory.
+to projects that require copyright assignment is possible.
 
 Regular quarterly releases are tagged as vYY.MM, the latest
-release is v23.01.
+release is v21.02.
 
 Source code layout:
 
|
||||
@@ -27,7 +24,6 @@ networking/test/ - networking test and benchmark related sources.
|
||||
string/ - string routines subproject sources.
|
||||
string/include/ - string library public headers.
|
||||
string/test/ - string test and benchmark related sources.
|
||||
pl/... - separately maintained performance library code.
|
||||
|
||||
The steps to build the target libraries and run the tests:
|
||||
|
||||
|
||||
+4
-21
@@ -1,14 +1,11 @@
|
||||
# Example config.mk
|
||||
#
|
||||
# Copyright (c) 2018-2022, Arm Limited.
|
||||
# SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
|
||||
# Copyright (c) 2018-2020, Arm Limited.
|
||||
# SPDX-License-Identifier: MIT
|
||||
|
||||
# Subprojects to build
|
||||
SUBS = math string networking
|
||||
|
||||
# Subsubprojects to build if subproject pl is built
|
||||
PLSUBS = math
|
||||
|
||||
# Target architecture: aarch64, arm or x86_64
|
||||
ARCH = aarch64
|
||||
|
||||
@@ -59,22 +56,8 @@ math-cflags += -ffp-contract=fast -fno-math-errno
|
||||
# Use with clang.
|
||||
#math-cflags += -ffp-contract=fast
|
||||
|
||||
# Disable/enable SVE vector math code and tests
|
||||
WANT_SVE_MATH = 0
|
||||
ifeq ($(WANT_SVE_MATH), 1)
|
||||
math-cflags += -march=armv8.2-a+sve
|
||||
endif
|
||||
math-cflags += -DWANT_SVE_MATH=$(WANT_SVE_MATH)
|
||||
|
||||
# If defined to 1, set errno in math functions according to ISO C. Many math
|
||||
# libraries do not set errno, so this is 0 by default. It may need to be
|
||||
# set to 1 if math.h has (math_errhandling & MATH_ERRNO) != 0.
|
||||
WANT_ERRNO = 0
|
||||
math-cflags += -DWANT_ERRNO=$(WANT_ERRNO)
|
||||
|
||||
# If set to 1, set fenv in vector math routines.
|
||||
WANT_SIMD_EXCEPT = 0
|
||||
math-cflags += -DWANT_SIMD_EXCEPT=$(WANT_SIMD_EXCEPT)
|
||||
# Disable vector math code
|
||||
#math-cflags += -DWANT_VMATH=0
|
||||
|
||||
# Disable fenv checks
|
||||
#math-ulpflags = -q -f
|
||||
|
||||
+5
-12
@@ -1,14 +1,12 @@
|
||||
# Makefile fragment - requires GNU make
|
||||
#
|
||||
# Copyright (c) 2019-2022, Arm Limited.
|
||||
# SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
|
||||
# Copyright (c) 2019, Arm Limited.
|
||||
# SPDX-License-Identifier: MIT
|
||||
|
||||
S := $(srcdir)/math
|
||||
B := build/math
|
||||
|
||||
math-lib-srcs := $(wildcard $(S)/*.[cS])
|
||||
math-lib-srcs += $(wildcard $(S)/$(ARCH)/*.[cS])
|
||||
|
||||
math-test-srcs := \
|
||||
$(S)/test/mathtest.c \
|
||||
$(S)/test/mathbench.c \
|
||||
@@ -17,7 +15,6 @@ math-test-srcs := \
|
||||
math-test-host-srcs := $(wildcard $(S)/test/rtest/*.[cS])
|
||||
|
||||
math-includes := $(patsubst $(S)/%,build/%,$(wildcard $(S)/include/*.h))
|
||||
math-test-includes := $(patsubst $(S)/%,build/include/%,$(wildcard $(S)/test/*.h))
|
||||
|
||||
math-libs := \
|
||||
build/lib/libmathlib.so \
|
||||
@@ -45,11 +42,10 @@ math-files := \
|
||||
$(math-tools) \
|
||||
$(math-host-tools) \
|
||||
$(math-includes) \
|
||||
$(math-test-includes) \
|
||||
|
||||
all-math: $(math-libs) $(math-tools) $(math-includes) $(math-test-includes)
|
||||
all-math: $(math-libs) $(math-tools) $(math-includes)
|
||||
|
||||
$(math-objs): $(math-includes) $(math-test-includes)
|
||||
$(math-objs): $(math-includes)
|
||||
$(math-objs): CFLAGS_ALL += $(math-cflags)
|
||||
$(B)/test/mathtest.o: CFLAGS_ALL += -fmath-errno
|
||||
$(math-host-objs): CC = $(HOST_CC)
|
||||
@@ -87,9 +83,6 @@ build/bin/ulp: $(B)/test/ulp.o build/lib/libmathlib.a
|
||||
build/include/%.h: $(S)/include/%.h
|
||||
cp $< $@
|
||||
|
||||
build/include/test/%.h: $(S)/test/%.h
|
||||
cp $< $@
|
||||
|
||||
build/bin/%.sh: $(S)/test/%.sh
|
||||
cp $< $@
|
||||
|
||||
@@ -103,7 +96,7 @@ check-math-rtest: $(math-host-tools) $(math-tools)
|
||||
cat $(math-rtests) | build/bin/rtest | $(EMULATOR) build/bin/mathtest $(math-testflags)
|
||||
|
||||
check-math-ulp: $(math-tools)
|
||||
ULPFLAGS="$(math-ulpflags)" WANT_SIMD_EXCEPT="$(WANT_SIMD_EXCEPT)" build/bin/runulp.sh $(EMULATOR)
|
||||
ULPFLAGS="$(math-ulpflags)" build/bin/runulp.sh $(EMULATOR)
|
||||
|
||||
check-math: check-math-test check-math-rtest check-math-ulp
|
||||
|
||||
|
||||
@@ -1,78 +0,0 @@
STYLE REQUIREMENTS
==================

1. Most code in this sub-directory is expected to be upstreamed into glibc so
   the GNU Coding Standard and glibc specific conventions should be followed
   to ease upstreaming.

2. ABI and symbols: the code should be written so it is suitable for inclusion
   into a libc with minimal changes. This e.g. means that internal symbols
   should be hidden and in the implementation reserved namespace according to
   ISO C and POSIX rules. If possible the built shared libraries and static
   library archives should be usable to override libc symbols at link time (or
   at runtime via LD_PRELOAD). This requires the symbols to follow the glibc ABI
   (other than symbol versioning); this cannot be done reliably for static
   linking so this is a best effort requirement.

3. API: include headers should be suitable for benchmarking and testing code
   and should not conflict with libc headers.


CONTRIBUTION GUIDELINES FOR math SUB-DIRECTORY
==============================================

1. Math functions have quality and performance requirements.

2. Quality:
   - Worst-case ULP error should be small in the entire input domain (for most
     common double precision scalar functions the target is < 0.66 ULP error,
     and < 1 ULP for single precision; even a performance optimized function
     variant should not have > 5 ULP error if the goal is to be a drop in
     replacement for a standard math function). This should be tested
     statistically (or on all inputs if possible in a reasonable amount of time).
     The ulp tool is for this and runulp.sh should be updated for new functions.

   - All standard rounding modes need to be supported but in non-default rounding
     modes the quality requirement can be relaxed. (Non-nearest rounded
     computation can be slow and inaccurate but has to be correct for conformance
     reasons.)

   - Special cases and error handling need to follow ISO C Annex F requirements,
     POSIX requirements, IEEE 754-2008 requirements and glibc requirements:
     https://www.gnu.org/software/libc/manual/html_mono/libc.html#Errors-in-Math-Functions
     This should be tested by direct tests (glibc test system may be used for it).

   - Error handling code should be decoupled from the approximation code as much
     as possible. (There are helper functions, these take care of errno as well
     as exception raising.)

   - Vector math code does not need to work in non-nearest rounding mode and error
     handling side effects need not happen (fenv exceptions and errno), but the
     result should be correct (within quality requirements, which are lower for
     vector code than for scalar code).

   - Error bounds of the approximation should be clearly documented.

   - The code should build and pass tests on arm, aarch64 and x86_64 GNU/Linux
     systems. (Routines and features can be disabled on specific targets, but
     the build must complete). On aarch64, both little- and big-endian targets
     are supported as well as valid combinations of architecture extensions.
     The configurations that should be tested depend on the contribution.

3. Performance:
   - Common math code should be benchmarked on modern aarch64 microarchitectures
     over typical inputs.

   - Performance improvements should be documented (relative numbers can be
     published; it is enough to use the mathbench microbenchmark tool which should
     be updated for new functions).

   - Attention should be paid to the compilation flags: for aarch64 fma
     contraction should be on and math errno turned off so some builtins can be
     inlined.

   - The code should be reasonably performant on x86_64 too, e.g. some rounding
     instructions and fma may not be available on x86_64, such builtins turn into
     libc calls with slow code. Such slowdown is not acceptable, a faster fallback
     should be present: glibc and bionic use the same code on all targets. (This
     does not apply to vector math code).
|
||||
@@ -1,87 +0,0 @@
/*
 * Double-precision vector cos function.
 *
 * Copyright (c) 2019-2023, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 */

#include "mathlib.h"
#include "v_math.h"

static const struct data
{
  float64x2_t poly[7];
  float64x2_t range_val, shift, inv_pi, half_pi, pi_1, pi_2, pi_3;
} data = {
  /* Worst-case error is 3.3 ulp in [-pi/2, pi/2].  */
  .poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7),
            V2 (-0x1.a01a019936f27p-13), V2 (0x1.71de37a97d93ep-19),
            V2 (-0x1.ae633919987c6p-26), V2 (0x1.60e277ae07cecp-33),
            V2 (-0x1.9e9540300a1p-41) },
  .inv_pi = V2 (0x1.45f306dc9c883p-2),
  .half_pi = V2 (0x1.921fb54442d18p+0),
  .pi_1 = V2 (0x1.921fb54442d18p+1),
  .pi_2 = V2 (0x1.1a62633145c06p-53),
  .pi_3 = V2 (0x1.c1cd129024e09p-106),
  .shift = V2 (0x1.8p52),
  .range_val = V2 (0x1p23)
};

#define C(i) d->poly[i]

static float64x2_t VPCS_ATTR NOINLINE
special_case (float64x2_t x, float64x2_t y, uint64x2_t odd, uint64x2_t cmp)
{
  y = vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd));
  return v_call_f64 (cos, x, y, cmp);
}

float64x2_t VPCS_ATTR V_NAME_D1 (cos) (float64x2_t x)
{
  const struct data *d = ptr_barrier (&data);
  float64x2_t n, r, r2, r3, r4, t1, t2, t3, y;
  uint64x2_t odd, cmp;

#if WANT_SIMD_EXCEPT
  r = vabsq_f64 (x);
  cmp = vcgeq_u64 (vreinterpretq_u64_f64 (r),
                   vreinterpretq_u64_f64 (d->range_val));
  if (unlikely (v_any_u64 (cmp)))
    /* If fenv exceptions are to be triggered correctly, set any special lanes
       to 1 (which is neutral w.r.t. fenv). These lanes will be fixed by
       special-case handler later.  */
    r = vbslq_f64 (cmp, v_f64 (1.0), r);
#else
  cmp = vcageq_f64 (x, d->range_val);
  r = x;
#endif

  /* n = rint((|x|+pi/2)/pi) - 0.5.  */
  n = vfmaq_f64 (d->shift, d->inv_pi, vaddq_f64 (r, d->half_pi));
  odd = vshlq_n_u64 (vreinterpretq_u64_f64 (n), 63);
  n = vsubq_f64 (n, d->shift);
  n = vsubq_f64 (n, v_f64 (0.5));

  /* r = |x| - n*pi  (range reduction into -pi/2 .. pi/2).  */
  r = vfmsq_f64 (r, d->pi_1, n);
  r = vfmsq_f64 (r, d->pi_2, n);
  r = vfmsq_f64 (r, d->pi_3, n);

  /* sin(r) poly approx.  */
  r2 = vmulq_f64 (r, r);
  r3 = vmulq_f64 (r2, r);
  r4 = vmulq_f64 (r2, r2);

  t1 = vfmaq_f64 (C (4), C (5), r2);
  t2 = vfmaq_f64 (C (2), C (3), r2);
  t3 = vfmaq_f64 (C (0), C (1), r2);

  y = vfmaq_f64 (t1, C (6), r4);
  y = vfmaq_f64 (t2, y, r4);
  y = vfmaq_f64 (t3, y, r4);
  y = vfmaq_f64 (r, y, r3);

  if (unlikely (v_any_u64 (cmp)))
    return special_case (x, y, odd, cmp);
  return vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd));
}
|
||||
@@ -1,82 +0,0 @@
|
||||
/*
|
||||
* Single-precision vector cos function.
|
||||
*
|
||||
* Copyright (c) 2019-2023, Arm Limited.
|
||||
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
|
||||
*/
|
||||
|
||||
#include "mathlib.h"
|
||||
#include "v_math.h"
|
||||
|
||||
static const struct data
|
||||
{
|
||||
float32x4_t poly[4];
|
||||
float32x4_t range_val, inv_pi, half_pi, shift, pi_1, pi_2, pi_3;
|
||||
} data = {
|
||||
/* 1.886 ulp error. */
|
||||
.poly = { V4 (-0x1.555548p-3f), V4 (0x1.110df4p-7f), V4 (-0x1.9f42eap-13f),
|
||||
V4 (0x1.5b2e76p-19f) },
|
||||
|
||||
.pi_1 = V4 (0x1.921fb6p+1f),
|
||||
.pi_2 = V4 (-0x1.777a5cp-24f),
|
||||
.pi_3 = V4 (-0x1.ee59dap-49f),
|
||||
|
||||
.inv_pi = V4 (0x1.45f306p-2f),
|
||||
.shift = V4 (0x1.8p+23f),
|
||||
.half_pi = V4 (0x1.921fb6p0f),
|
||||
.range_val = V4 (0x1p20f)
|
||||
};
|
||||
|
||||
#define C(i) d->poly[i]
|
||||
|
||||
static float32x4_t VPCS_ATTR NOINLINE
special_case (float32x4_t x, float32x4_t y, uint32x4_t odd, uint32x4_t cmp)
{
  /* Fall back to scalar code. */
  y = vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd));
  return v_call_f32 (cosf, x, y, cmp);
}
|
||||
float32x4_t VPCS_ATTR V_NAME_F1 (cos) (float32x4_t x)
|
||||
{
|
||||
const struct data *d = ptr_barrier (&data);
|
||||
float32x4_t n, r, r2, r3, y;
|
||||
uint32x4_t odd, cmp;
|
||||
|
||||
#if WANT_SIMD_EXCEPT
|
||||
r = vabsq_f32 (x);
|
||||
cmp = vcgeq_u32 (vreinterpretq_u32_f32 (r),
|
||||
vreinterpretq_u32_f32 (d->range_val));
|
||||
if (unlikely (v_any_u32 (cmp)))
|
||||
/* If fenv exceptions are to be triggered correctly, set any special lanes
|
||||
to 1 (which is neutral w.r.t. fenv). These lanes will be fixed by
|
||||
special-case handler later. */
|
||||
r = vbslq_f32 (cmp, v_f32 (1.0f), r);
|
||||
#else
|
||||
cmp = vcageq_f32 (x, d->range_val);
|
||||
r = x;
|
||||
#endif
|
||||
|
||||
/* n = rint((|x|+pi/2)/pi) - 0.5. */
|
||||
n = vfmaq_f32 (d->shift, d->inv_pi, vaddq_f32 (r, d->half_pi));
|
||||
odd = vshlq_n_u32 (vreinterpretq_u32_f32 (n), 31);
|
||||
n = vsubq_f32 (n, d->shift);
|
||||
n = vsubq_f32 (n, v_f32 (0.5f));
|
||||
|
||||
/* r = |x| - n*pi (range reduction into -pi/2 .. pi/2). */
|
||||
r = vfmsq_f32 (r, d->pi_1, n);
|
||||
r = vfmsq_f32 (r, d->pi_2, n);
|
||||
r = vfmsq_f32 (r, d->pi_3, n);
|
||||
|
||||
/* y = sin(r). */
|
||||
r2 = vmulq_f32 (r, r);
|
||||
r3 = vmulq_f32 (r2, r);
|
||||
y = vfmaq_f32 (C (2), C (3), r2);
|
||||
y = vfmaq_f32 (C (1), y, r2);
|
||||
y = vfmaq_f32 (C (0), y, r2);
|
||||
y = vfmaq_f32 (r, y, r3);
|
||||
|
||||
if (unlikely (v_any_u32 (cmp)))
|
||||
return special_case (x, y, odd, cmp);
|
||||
return vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd));
|
||||
}
|
||||
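The reduction used by the single-precision cos routine above can be sketched in scalar C. This is only an illustration under stated assumptions, not the library's code: the names cosf_sketch, asuint and asfloat are hypothetical, the constants are copied from the data struct above, and a plain multiply-subtract stands in for the fused vfmsq_f32, so accuracy degrades for larger inputs. It assumes round-to-nearest and float arithmetic without excess precision, and it leaves out the special-case path for |x| >= 2^20.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret the bits of a float and back. */
static inline uint32_t asuint (float f) { uint32_t u; memcpy (&u, &f, sizeof u); return u; }
static inline float asfloat (uint32_t u) { float f; memcpy (&f, &u, sizeof f); return f; }

/* Scalar sketch of cos(x) = (-1)^k * sin(|x| - (k - 0.5) * pi). */
float
cosf_sketch (float x)
{
  /* Constants copied from the vector routine's data struct. */
  const float inv_pi = 0x1.45f306p-2f, half_pi = 0x1.921fb6p0f;
  const float shift = 0x1.8p+23f;
  const float pi_1 = 0x1.921fb6p+1f, pi_2 = -0x1.777a5cp-24f,
	      pi_3 = -0x1.ee59dap-49f;
  const float c0 = -0x1.555548p-3f, c1 = 0x1.110df4p-7f,
	      c2 = -0x1.9f42eap-13f, c3 = 0x1.5b2e76p-19f;

  float r = x < 0.0f ? -x : x;
  assert (r < 0x1p20f); /* Larger inputs need the scalar fallback. */

  /* Adding shift rounds (|x| + pi/2) / pi to an integer k that sits in the
     low mantissa bits, so bit 0 of the encoding is the parity of k. */
  float n = shift + inv_pi * (r + half_pi);
  uint32_t odd = asuint (n) << 31;
  n -= shift;
  n -= 0.5f;

  /* r = |x| - n*pi, with pi split into three parts. */
  r -= pi_1 * n;
  r -= pi_2 * n;
  r -= pi_3 * n;

  /* sin(r) ~= r + r^3 * (c0 + c1 r^2 + c2 r^4 + c3 r^6). */
  float r2 = r * r, r3 = r2 * r;
  float y = c2 + c3 * r2;
  y = c1 + y * r2;
  y = c0 + y * r2;
  y = r + y * r3;

  /* Flip the sign when k is odd. */
  return asfloat (asuint (y) ^ odd);
}
```

Calling cosf_sketch (1.0f) should land close to 0.5403. The sign flip via XOR mirrors the final veorq_u32 in the vector code.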
@@ -1,125 +0,0 @@
|
||||
/*
|
||||
* Double-precision vector e^x function.
|
||||
*
|
||||
* Copyright (c) 2019-2023, Arm Limited.
|
||||
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
|
||||
*/
|
||||
|
||||
#include "mathlib.h"
|
||||
#include "v_math.h"
|
||||
|
||||
#define N (1 << V_EXP_TABLE_BITS)
|
||||
#define IndexMask (N - 1)
|
||||
|
||||
static const volatile struct
|
||||
{
|
||||
float64x2_t poly[3];
|
||||
float64x2_t inv_ln2, ln2_hi, ln2_lo, shift;
|
||||
#if !WANT_SIMD_EXCEPT
|
||||
float64x2_t special_bound, scale_thresh;
|
||||
#endif
|
||||
} data = {
|
||||
/* maxerr: 1.88 +0.5 ulp
|
||||
rel error: 1.4337*2^-53
|
||||
abs error: 1.4299*2^-53 in [ -ln2/256, ln2/256 ]. */
|
||||
.poly = { V2 (0x1.ffffffffffd43p-2), V2 (0x1.55555c75adbb2p-3),
|
||||
V2 (0x1.55555da646206p-5) },
|
||||
#if !WANT_SIMD_EXCEPT
|
||||
.scale_thresh = V2 (163840.0), /* 1280.0 * N. */
|
||||
.special_bound = V2 (704.0),
|
||||
#endif
|
||||
.inv_ln2 = V2 (0x1.71547652b82fep7), /* N/ln2. */
|
||||
.ln2_hi = V2 (0x1.62e42fefa39efp-8), /* ln2/N. */
|
||||
.ln2_lo = V2 (0x1.abc9e3b39803f3p-63),
|
||||
.shift = V2 (0x1.8p+52)
|
||||
};
|
||||
|
||||
#define C(i) data.poly[i]
|
||||
#define Tab __v_exp_data
|
||||
|
||||
#if WANT_SIMD_EXCEPT
|
||||
|
||||
# define TinyBound v_u64 (0x2000000000000000) /* asuint64 (0x1p-511). */
|
||||
# define BigBound v_u64 (0x4080000000000000) /* asuint64 (0x1p9). */
|
||||
# define SpecialBound v_u64 (0x2080000000000000) /* BigBound - TinyBound. */
|
||||
|
||||
static float64x2_t VPCS_ATTR NOINLINE
|
||||
special_case (float64x2_t x, float64x2_t y, uint64x2_t cmp)
|
||||
{
|
||||
/* If fenv exceptions are to be triggered correctly, fall back to the scalar
|
||||
routine for special lanes. */
|
||||
return v_call_f64 (exp, x, y, cmp);
|
||||
}
|
||||
|
||||
#else
|
||||
|
||||
# define SpecialOffset v_u64 (0x6000000000000000) /* 0x1p513. */
|
||||
/* SpecialBias1 - SpecialBias2 = asuint(1.0). */
|
||||
# define SpecialBias1 v_u64 (0x7000000000000000) /* 0x1p769. */
|
||||
# define SpecialBias2 v_u64 (0x3010000000000000) /* 0x1p-254. */
|
||||
|
||||
static inline float64x2_t VPCS_ATTR
special_case (float64x2_t s, float64x2_t y, float64x2_t n)
{
  /* 2^(n/N) may overflow, break it up into s1*s2. */
  uint64x2_t b = vandq_u64 (vcltzq_f64 (n), SpecialOffset);
  float64x2_t s1 = vreinterpretq_f64_u64 (vsubq_u64 (SpecialBias1, b));
  float64x2_t s2 = vreinterpretq_f64_u64 (
      vaddq_u64 (vsubq_u64 (vreinterpretq_u64_f64 (s), SpecialBias2), b));
  uint64x2_t cmp = vcagtq_f64 (n, data.scale_thresh);
  float64x2_t r1 = vmulq_f64 (s1, s1);
  float64x2_t r0 = vmulq_f64 (vfmaq_f64 (s2, y, s2), s1);
  return vbslq_f64 (cmp, r1, r0);
}
|
||||
#endif
|
||||
|
||||
float64x2_t VPCS_ATTR V_NAME_D1 (exp) (float64x2_t x)
|
||||
{
|
||||
float64x2_t n, r, r2, s, y, z;
|
||||
uint64x2_t cmp, u, e;
|
||||
|
||||
#if WANT_SIMD_EXCEPT
|
||||
/* If any lanes are special, mask them with 1 and retain a copy of x to allow
|
||||
special_case to fix special lanes later. This is only necessary if fenv
|
||||
exceptions are to be triggered correctly. */
|
||||
float64x2_t xm = x;
|
||||
uint64x2_t iax = vreinterpretq_u64_f64 (vabsq_f64 (x));
|
||||
cmp = vcgeq_u64 (vsubq_u64 (iax, TinyBound), SpecialBound);
|
||||
if (unlikely (v_any_u64 (cmp)))
|
||||
x = vbslq_f64 (cmp, v_f64 (1), x);
|
||||
#else
|
||||
cmp = vcagtq_f64 (x, data.special_bound);
|
||||
#endif
|
||||
|
||||
/* n = round(x/(ln2/N)). */
|
||||
z = vfmaq_f64 (data.shift, x, data.inv_ln2);
|
||||
u = vreinterpretq_u64_f64 (z);
|
||||
n = vsubq_f64 (z, data.shift);
|
||||
|
||||
/* r = x - n*ln2/N. */
|
||||
r = x;
|
||||
r = vfmsq_f64 (r, data.ln2_hi, n);
|
||||
r = vfmsq_f64 (r, data.ln2_lo, n);
|
||||
|
||||
e = vshlq_n_u64 (u, 52 - V_EXP_TABLE_BITS);
|
||||
|
||||
/* y = exp(r) - 1 ~= r + C0 r^2 + C1 r^3 + C2 r^4. */
|
||||
r2 = vmulq_f64 (r, r);
|
||||
y = vfmaq_f64 (C (0), C (1), r);
|
||||
y = vfmaq_f64 (y, C (2), r2);
|
||||
y = vfmaq_f64 (r, y, r2);
|
||||
|
||||
/* s = 2^(n/N). */
|
||||
u = (uint64x2_t){ Tab[u[0] & IndexMask], Tab[u[1] & IndexMask] };
|
||||
s = vreinterpretq_f64_u64 (vaddq_u64 (u, e));
|
||||
|
||||
if (unlikely (v_any_u64 (cmp)))
|
||||
#if WANT_SIMD_EXCEPT
|
||||
return special_case (xm, vfmaq_f64 (s, y, s), cmp);
|
||||
#else
|
||||
return special_case (s, y, n);
|
||||
#endif
|
||||
|
||||
return vfmaq_f64 (s, y, s);
|
||||
}
|
||||
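The way the double-precision exp routine above turns the shifted value z into both a table index and an exponent can be sketched in scalar C. This is a simplified illustration with assumptions: it uses a hypothetical 4-entry table (2 table bits) instead of V_EXP_TABLE_BITS, the names exp2_quarters, sketch_tab and SKETCH_* are made up, and it computes only s = 2^(n/4) for n = round(4t), without the polynomial. The four entries are the values found at indices 0, 32, 64 and 96 of the N == 128 table in __v_exp_data, which already have the index bits subtracted from the exponent field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_TABLE_BITS 2
#define SKETCH_N (1 << SKETCH_TABLE_BITS)

/* Entry j holds asuint64 (2^(j/4)) - (j << 50). */
static const uint64_t sketch_tab[SKETCH_N] = {
  0x3ff0000000000000, 0x3fef06fe0a31b715,
  0x3feea09e667f3bcd, 0x3feee89f995ad3ad,
};

/* Returns 2^(n/4) where n = round (4 * t), for results in the normal range. */
double
exp2_quarters (double t)
{
  const double shift = 0x1.8p52;
  assert (t > -1000.0 && t < 1000.0);

  /* Adding shift leaves the integer n in the low mantissa bits of z. */
  double z = shift + t * SKETCH_N;
  uint64_t u;
  memcpy (&u, &z, sizeof u);

  /* Move n up so that n / SKETCH_N lands in the exponent field; the low
     index bits spill into the top of the mantissa and are cancelled by
     the bias already subtracted from the table entries. */
  uint64_t e = u << (52 - SKETCH_TABLE_BITS);
  uint64_t sbits = sketch_tab[u & (SKETCH_N - 1)] + e;

  double s;
  memcpy (&s, &sbits, sizeof s);
  return s;
}
```

For example, exp2_quarters (2.0) uses index 0 and adds 2 to the exponent field, giving exactly 4.0. The vector routine does the same per lane and then multiplies by 1 + y through the final fused multiply-add.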
@@ -1,113 +0,0 @@
|
||||
/*
|
||||
* Single-precision vector 2^x function.
|
||||
*
|
||||
* Copyright (c) 2019-2023, Arm Limited.
|
||||
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
|
||||
*/
|
||||
|
||||
#include "mathlib.h"
|
||||
#include "v_math.h"
|
||||
|
||||
static const struct data
|
||||
{
|
||||
float32x4_t poly[5];
|
||||
uint32x4_t exponent_bias;
|
||||
#if !WANT_SIMD_EXCEPT
|
||||
float32x4_t special_bound, scale_thresh;
|
||||
#endif
|
||||
} data = {
|
||||
/* maxerr: 1.962 ulp. */
|
||||
.poly = { V4 (0x1.59977ap-10f), V4 (0x1.3ce9e4p-7f), V4 (0x1.c6bd32p-5f),
|
||||
V4 (0x1.ebf9bcp-3f), V4 (0x1.62e422p-1f) },
|
||||
.exponent_bias = V4 (0x3f800000),
|
||||
#if !WANT_SIMD_EXCEPT
|
||||
.special_bound = V4 (126.0f),
|
||||
.scale_thresh = V4 (192.0f),
|
||||
#endif
|
||||
};
|
||||
|
||||
#define C(i) d->poly[i]
|
||||
|
||||
#if WANT_SIMD_EXCEPT
|
||||
|
||||
# define TinyBound v_u32 (0x20000000) /* asuint (0x1p-63). */
|
||||
# define BigBound v_u32 (0x42800000) /* asuint (0x1p6). */
|
||||
# define SpecialBound v_u32 (0x22800000) /* BigBound - TinyBound. */
|
||||
|
||||
static float32x4_t VPCS_ATTR NOINLINE
|
||||
special_case (float32x4_t x, float32x4_t y, uint32x4_t cmp)
|
||||
{
|
||||
/* If fenv exceptions are to be triggered correctly, fall back to the scalar
|
||||
routine for special lanes. */
|
||||
return v_call_f32 (exp2f, x, y, cmp);
|
||||
}
|
||||
|
||||
#else
|
||||
|
||||
# define SpecialOffset v_u32 (0x82000000)
|
||||
# define SpecialBias v_u32 (0x7f000000)
|
||||
|
||||
static float32x4_t VPCS_ATTR NOINLINE
special_case (float32x4_t poly, float32x4_t n, uint32x4_t e, uint32x4_t cmp1,
              float32x4_t scale, const struct data *d)
{
  /* 2^n may overflow, break it up into s1*s2. */
  uint32x4_t b = vandq_u32 (vclezq_f32 (n), SpecialOffset);
  float32x4_t s1 = vreinterpretq_f32_u32 (vaddq_u32 (b, SpecialBias));
  float32x4_t s2 = vreinterpretq_f32_u32 (vsubq_u32 (e, b));
  uint32x4_t cmp2 = vcagtq_f32 (n, d->scale_thresh);
  float32x4_t r2 = vmulq_f32 (s1, s1);
  float32x4_t r1 = vmulq_f32 (vfmaq_f32 (s2, poly, s2), s1);
  /* Similar to r1 but avoids double rounding in the subnormal range. */
  float32x4_t r0 = vfmaq_f32 (scale, poly, scale);
  float32x4_t r = vbslq_f32 (cmp1, r1, r0);
  return vbslq_f32 (cmp2, r2, r);
}
|
||||
#endif
|
||||
|
||||
float32x4_t VPCS_ATTR V_NAME_F1 (exp2) (float32x4_t x)
|
||||
{
|
||||
const struct data *d = ptr_barrier (&data);
|
||||
float32x4_t n, r, r2, scale, p, q, poly;
|
||||
uint32x4_t cmp, e;
|
||||
|
||||
#if WANT_SIMD_EXCEPT
|
||||
/* asuint(|x|) - TinyBound >= BigBound - TinyBound. */
|
||||
uint32x4_t ia = vreinterpretq_u32_f32 (vabsq_f32 (x));
|
||||
cmp = vcgeq_u32 (vsubq_u32 (ia, TinyBound), SpecialBound);
|
||||
float32x4_t xm = x;
|
||||
/* If any lanes are special, mask them with 1 and retain a copy of x to allow
|
||||
special_case to fix special lanes later. This is only necessary if fenv
|
||||
exceptions are to be triggered correctly. */
|
||||
if (unlikely (v_any_u32 (cmp)))
|
||||
x = vbslq_f32 (cmp, v_f32 (1), x);
|
||||
#endif
|
||||
|
||||
/* exp2(x) = 2^n (1 + poly(r)), with 1 + poly(r) in [1/sqrt(2),sqrt(2)]
|
||||
x = n + r, with r in [-1/2, 1/2]. */
|
||||
n = vrndaq_f32 (x);
|
||||
r = vsubq_f32 (x, n);
|
||||
e = vshlq_n_u32 (vreinterpretq_u32_s32 (vcvtaq_s32_f32 (x)), 23);
|
||||
scale = vreinterpretq_f32_u32 (vaddq_u32 (e, d->exponent_bias));
|
||||
|
||||
#if !WANT_SIMD_EXCEPT
|
||||
cmp = vcagtq_f32 (n, d->special_bound);
|
||||
#endif
|
||||
|
||||
r2 = vmulq_f32 (r, r);
|
||||
p = vfmaq_f32 (C (1), C (0), r);
|
||||
q = vfmaq_f32 (C (3), C (2), r);
|
||||
q = vfmaq_f32 (q, p, r2);
|
||||
p = vmulq_f32 (C (4), r);
|
||||
poly = vfmaq_f32 (p, q, r2);
|
||||
|
||||
if (unlikely (v_any_u32 (cmp)))
|
||||
#if WANT_SIMD_EXCEPT
|
||||
return special_case (xm, vfmaq_f32 (scale, poly, scale), cmp);
|
||||
#else
|
||||
return special_case (poly, n, e, cmp, scale, d);
|
||||
#endif
|
||||
|
||||
return vfmaq_f32 (scale, poly, scale);
|
||||
}
|
||||
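The "break it up into s1*s2" step in the non-fenv special_case above can be sketched for a single lane in scalar C. This is an illustration under assumptions, not the library's code: scale_split and asfloat are hypothetical names, n is passed as an integer rather than a float, and only the r1 path is shown. The bit patterns 0x82000000 and 0x7f000000 are SpecialOffset and SpecialBias from the source. The sketch assumes |n| <= 192; beyond that the source returns s1*s1 instead, which overflows or underflows as required.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a float from its bit pattern. */
static inline float asfloat (uint32_t u) { float f; memcpy (&f, &u, sizeof f); return f; }

/* Returns 2^n * (1 + poly) without ever forming 2^n as a single float,
   so n = 128 or subnormal results do not go through an invalid encoding. */
float
scale_split (int32_t n, float poly)
{
  assert (n >= -192 && n <= 192);

  uint32_t e = (uint32_t) n << 23;		/* n in the exponent field. */
  uint32_t b = n <= 0 ? 0x82000000u : 0u;	/* SpecialOffset for n <= 0. */

  /* n > 0:  s1 = 2^127,  s2 = 2^(n - 127).
     n <= 0: s1 = 2^-125, s2 = 2^(n + 125).
     Either way s1 * s2 = 2^n and both factors are normal floats. */
  float s1 = asfloat (0x7f000000u + b);
  float s2 = asfloat (e - b);

  return (s2 + poly * s2) * s1;
}
```

With n = 128 the naive scale asfloat ((128 << 23) + 0x3f800000) is the encoding of infinity, while scale_split (128, -0.25f) still yields the finite value 0.75 * 2^128.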
@@ -1,72 +0,0 @@
|
||||
/*
|
||||
* Single-precision vector 2^x function.
|
||||
*
|
||||
* Copyright (c) 2019-2023, Arm Limited.
|
||||
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
|
||||
*/
|
||||
|
||||
#include "mathlib.h"
|
||||
#include "v_math.h"
|
||||
|
||||
static const float Poly[] = {
|
||||
/* maxerr: 0.878 ulp. */
|
||||
0x1.416b5ep-13f, 0x1.5f082ep-10f, 0x1.3b2dep-7f, 0x1.c6af7cp-5f, 0x1.ebfbdcp-3f, 0x1.62e43p-1f
|
||||
};
|
||||
#define C0 v_f32 (Poly[0])
|
||||
#define C1 v_f32 (Poly[1])
|
||||
#define C2 v_f32 (Poly[2])
|
||||
#define C3 v_f32 (Poly[3])
|
||||
#define C4 v_f32 (Poly[4])
|
||||
#define C5 v_f32 (Poly[5])
|
||||
|
||||
#define Shift v_f32 (0x1.8p23f)
|
||||
#define InvLn2 v_f32 (0x1.715476p+0f)
|
||||
#define Ln2hi v_f32 (0x1.62e4p-1f)
|
||||
#define Ln2lo v_f32 (0x1.7f7d1cp-20f)
|
||||
|
||||
static float32x4_t VPCS_ATTR NOINLINE
|
||||
specialcase (float32x4_t poly, float32x4_t n, uint32x4_t e, float32x4_t absn)
|
||||
{
|
||||
/* 2^n may overflow, break it up into s1*s2. */
|
||||
uint32x4_t b = (n <= v_f32 (0.0f)) & v_u32 (0x83000000);
|
||||
float32x4_t s1 = vreinterpretq_f32_u32 (v_u32 (0x7f000000) + b);
|
||||
float32x4_t s2 = vreinterpretq_f32_u32 (e - b);
|
||||
uint32x4_t cmp = absn > v_f32 (192.0f);
|
||||
float32x4_t r1 = s1 * s1;
|
||||
float32x4_t r0 = poly * s1 * s2;
|
||||
return vreinterpretq_f32_u32 ((cmp & vreinterpretq_u32_f32 (r1))
|
||||
| (~cmp & vreinterpretq_u32_f32 (r0)));
|
||||
}
|
||||
|
||||
float32x4_t VPCS_ATTR
|
||||
_ZGVnN4v_exp2f_1u (float32x4_t x)
|
||||
{
|
||||
float32x4_t n, r, scale, poly, absn;
|
||||
uint32x4_t cmp, e;
|
||||
|
||||
/* exp2(x) = 2^n * poly(r), with poly(r) in [1/sqrt(2),sqrt(2)]
|
||||
x = n + r, with r in [-1/2, 1/2]. */
|
||||
#if 0
|
||||
float32x4_t z;
|
||||
z = x + Shift;
|
||||
n = z - Shift;
|
||||
r = x - n;
|
||||
e = vreinterpretq_u32_f32 (z) << 23;
|
||||
#else
|
||||
n = vrndaq_f32 (x);
|
||||
r = x - n;
|
||||
e = vreinterpretq_u32_s32 (vcvtaq_s32_f32 (x)) << 23;
|
||||
#endif
|
||||
scale = vreinterpretq_f32_u32 (e + v_u32 (0x3f800000));
|
||||
absn = vabsq_f32 (n);
|
||||
cmp = absn > v_f32 (126.0f);
|
||||
poly = vfmaq_f32 (C1, C0, r);
|
||||
poly = vfmaq_f32 (C2, poly, r);
|
||||
poly = vfmaq_f32 (C3, poly, r);
|
||||
poly = vfmaq_f32 (C4, poly, r);
|
||||
poly = vfmaq_f32 (C5, poly, r);
|
||||
poly = vfmaq_f32 (v_f32 (1.0f), poly, r);
|
||||
if (unlikely (v_any_u32 (cmp)))
|
||||
return specialcase (poly, n, e, absn);
|
||||
return scale * poly;
|
||||
}
|
||||
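The two exp2f variants above evaluate their polynomials differently: this 1-ULP file uses a straight Horner chain, while the faster variant pairs terms so that independent multiply-adds can overlap. A scalar sketch of both schemes, assuming the five coefficients of the faster variant's poly array (the function names are hypothetical), shows that they compute the same polynomial up to rounding:

```c
#include <assert.h>

/* Coefficients copied from the poly array of the faster exp2f variant,
   highest degree first: poly(r) ~= 2^r - 1 on [-1/2, 1/2]. */
static const float C0 = 0x1.59977ap-10f, C1 = 0x1.3ce9e4p-7f,
		   C2 = 0x1.c6bd32p-5f, C3 = 0x1.ebf9bcp-3f,
		   C4 = 0x1.62e422p-1f;

/* Horner: one long dependency chain of multiply-adds. */
float
poly_horner (float r)
{
  assert (r >= -0.5f && r <= 0.5f);
  float h = C0;
  h = C1 + h * r;
  h = C2 + h * r;
  h = C3 + h * r;
  h = C4 + h * r;
  return h * r;
}

/* Paired scheme: p and q do not depend on each other, so they can be
   computed in parallel before being combined with r^2. */
float
poly_paired (float r)
{
  assert (r >= -0.5f && r <= 0.5f);
  float r2 = r * r;
  float p = C1 + C0 * r;
  float q = C3 + C2 * r;
  q = q + p * r2;
  p = C4 * r;
  return p + q * r2;
}
```

Both functions should return roughly 0.1892 for r = 0.25, that is 2^0.25 - 1. The trade-off suggested by the source is a shorter critical path for the paired form against slightly different rounding behaviour.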
@@ -1,146 +0,0 @@
|
||||
/*
|
||||
* Lookup table for double-precision e^x vector function.
|
||||
*
|
||||
* Copyright (c) 2019-2023, Arm Limited.
|
||||
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
|
||||
*/
|
||||
|
||||
#include "v_math.h"
|
||||
|
||||
# define N (1 << V_EXP_TABLE_BITS)
|
||||
|
||||
/* 2^(j/N), j=0..N. */
|
||||
const uint64_t __v_exp_data[] = {
|
||||
# if N == 128
|
||||
0x3ff0000000000000, 0x3feff63da9fb3335, 0x3fefec9a3e778061,
|
||||
0x3fefe315e86e7f85, 0x3fefd9b0d3158574, 0x3fefd06b29ddf6de,
|
||||
0x3fefc74518759bc8, 0x3fefbe3ecac6f383, 0x3fefb5586cf9890f,
|
||||
0x3fefac922b7247f7, 0x3fefa3ec32d3d1a2, 0x3fef9b66affed31b,
|
||||
0x3fef9301d0125b51, 0x3fef8abdc06c31cc, 0x3fef829aaea92de0,
|
||||
0x3fef7a98c8a58e51, 0x3fef72b83c7d517b, 0x3fef6af9388c8dea,
|
||||
0x3fef635beb6fcb75, 0x3fef5be084045cd4, 0x3fef54873168b9aa,
|
||||
0x3fef4d5022fcd91d, 0x3fef463b88628cd6, 0x3fef3f49917ddc96,
|
||||
0x3fef387a6e756238, 0x3fef31ce4fb2a63f, 0x3fef2b4565e27cdd,
|
||||
0x3fef24dfe1f56381, 0x3fef1e9df51fdee1, 0x3fef187fd0dad990,
|
||||
0x3fef1285a6e4030b, 0x3fef0cafa93e2f56, 0x3fef06fe0a31b715,
|
||||
0x3fef0170fc4cd831, 0x3feefc08b26416ff, 0x3feef6c55f929ff1,
|
||||
0x3feef1a7373aa9cb, 0x3feeecae6d05d866, 0x3feee7db34e59ff7,
|
||||
0x3feee32dc313a8e5, 0x3feedea64c123422, 0x3feeda4504ac801c,
|
||||
0x3feed60a21f72e2a, 0x3feed1f5d950a897, 0x3feece086061892d,
|
||||
0x3feeca41ed1d0057, 0x3feec6a2b5c13cd0, 0x3feec32af0d7d3de,
|
||||
0x3feebfdad5362a27, 0x3feebcb299fddd0d, 0x3feeb9b2769d2ca7,
|
||||
0x3feeb6daa2cf6642, 0x3feeb42b569d4f82, 0x3feeb1a4ca5d920f,
|
||||
0x3feeaf4736b527da, 0x3feead12d497c7fd, 0x3feeab07dd485429,
|
||||
0x3feea9268a5946b7, 0x3feea76f15ad2148, 0x3feea5e1b976dc09,
|
||||
0x3feea47eb03a5585, 0x3feea34634ccc320, 0x3feea23882552225,
|
||||
0x3feea155d44ca973, 0x3feea09e667f3bcd, 0x3feea012750bdabf,
|
||||
0x3fee9fb23c651a2f, 0x3fee9f7df9519484, 0x3fee9f75e8ec5f74,
|
||||
0x3fee9f9a48a58174, 0x3fee9feb564267c9, 0x3feea0694fde5d3f,
|
||||
0x3feea11473eb0187, 0x3feea1ed0130c132, 0x3feea2f336cf4e62,
|
||||
0x3feea427543e1a12, 0x3feea589994cce13, 0x3feea71a4623c7ad,
|
||||
0x3feea8d99b4492ed, 0x3feeaac7d98a6699, 0x3feeace5422aa0db,
|
||||
0x3feeaf3216b5448c, 0x3feeb1ae99157736, 0x3feeb45b0b91ffc6,
|
||||
0x3feeb737b0cdc5e5, 0x3feeba44cbc8520f, 0x3feebd829fde4e50,
|
||||
0x3feec0f170ca07ba, 0x3feec49182a3f090, 0x3feec86319e32323,
|
||||
0x3feecc667b5de565, 0x3feed09bec4a2d33, 0x3feed503b23e255d,
|
||||
0x3feed99e1330b358, 0x3feede6b5579fdbf, 0x3feee36bbfd3f37a,
|
||||
0x3feee89f995ad3ad, 0x3feeee07298db666, 0x3feef3a2b84f15fb,
|
||||
0x3feef9728de5593a, 0x3feeff76f2fb5e47, 0x3fef05b030a1064a,
|
||||
0x3fef0c1e904bc1d2, 0x3fef12c25bd71e09, 0x3fef199bdd85529c,
|
||||
0x3fef20ab5fffd07a, 0x3fef27f12e57d14b, 0x3fef2f6d9406e7b5,
|
||||
0x3fef3720dcef9069, 0x3fef3f0b555dc3fa, 0x3fef472d4a07897c,
|
||||
0x3fef4f87080d89f2, 0x3fef5818dcfba487, 0x3fef60e316c98398,
|
||||
0x3fef69e603db3285, 0x3fef7321f301b460, 0x3fef7c97337b9b5f,
|
||||
0x3fef864614f5a129, 0x3fef902ee78b3ff6, 0x3fef9a51fbc74c83,
|
||||
0x3fefa4afa2a490da, 0x3fefaf482d8e67f1, 0x3fefba1bee615a27,
|
||||
0x3fefc52b376bba97, 0x3fefd0765b6e4540, 0x3fefdbfdad9cbe14,
|
||||
0x3fefe7c1819e90d8, 0x3feff3c22b8f71f1,
|
||||
# elif N == 256
|
||||
0x3ff0000000000000, 0x3feffb1afa5abcbf, 0x3feff63da9fb3335,
|
||||
0x3feff168143b0281, 0x3fefec9a3e778061, 0x3fefe7d42e11bbcc,
|
||||
0x3fefe315e86e7f85, 0x3fefde5f72f654b1, 0x3fefd9b0d3158574,
|
||||
0x3fefd50a0e3c1f89, 0x3fefd06b29ddf6de, 0x3fefcbd42b72a836,
|
||||
0x3fefc74518759bc8, 0x3fefc2bdf66607e0, 0x3fefbe3ecac6f383,
|
||||
0x3fefb9c79b1f3919, 0x3fefb5586cf9890f, 0x3fefb0f145e46c85,
|
||||
0x3fefac922b7247f7, 0x3fefa83b23395dec, 0x3fefa3ec32d3d1a2,
|
||||
0x3fef9fa55fdfa9c5, 0x3fef9b66affed31b, 0x3fef973028d7233e,
|
||||
0x3fef9301d0125b51, 0x3fef8edbab5e2ab6, 0x3fef8abdc06c31cc,
|
||||
0x3fef86a814f204ab, 0x3fef829aaea92de0, 0x3fef7e95934f312e,
|
||||
0x3fef7a98c8a58e51, 0x3fef76a45471c3c2, 0x3fef72b83c7d517b,
|
||||
0x3fef6ed48695bbc0, 0x3fef6af9388c8dea, 0x3fef672658375d2f,
|
||||
0x3fef635beb6fcb75, 0x3fef5f99f8138a1c, 0x3fef5be084045cd4,
|
||||
0x3fef582f95281c6b, 0x3fef54873168b9aa, 0x3fef50e75eb44027,
|
||||
0x3fef4d5022fcd91d, 0x3fef49c18438ce4d, 0x3fef463b88628cd6,
|
||||
0x3fef42be3578a819, 0x3fef3f49917ddc96, 0x3fef3bdda27912d1,
|
||||
0x3fef387a6e756238, 0x3fef351ffb82140a, 0x3fef31ce4fb2a63f,
|
||||
0x3fef2e85711ece75, 0x3fef2b4565e27cdd, 0x3fef280e341ddf29,
|
||||
0x3fef24dfe1f56381, 0x3fef21ba7591bb70, 0x3fef1e9df51fdee1,
|
||||
0x3fef1b8a66d10f13, 0x3fef187fd0dad990, 0x3fef157e39771b2f,
|
||||
0x3fef1285a6e4030b, 0x3fef0f961f641589, 0x3fef0cafa93e2f56,
|
||||
0x3fef09d24abd886b, 0x3fef06fe0a31b715, 0x3fef0432edeeb2fd,
|
||||
0x3fef0170fc4cd831, 0x3feefeb83ba8ea32, 0x3feefc08b26416ff,
|
||||
0x3feef96266e3fa2d, 0x3feef6c55f929ff1, 0x3feef431a2de883b,
|
||||
0x3feef1a7373aa9cb, 0x3feeef26231e754a, 0x3feeecae6d05d866,
|
||||
0x3feeea401b7140ef, 0x3feee7db34e59ff7, 0x3feee57fbfec6cf4,
|
||||
0x3feee32dc313a8e5, 0x3feee0e544ede173, 0x3feedea64c123422,
|
||||
0x3feedc70df1c5175, 0x3feeda4504ac801c, 0x3feed822c367a024,
|
||||
0x3feed60a21f72e2a, 0x3feed3fb2709468a, 0x3feed1f5d950a897,
|
||||
0x3feecffa3f84b9d4, 0x3feece086061892d, 0x3feecc2042a7d232,
|
||||
0x3feeca41ed1d0057, 0x3feec86d668b3237, 0x3feec6a2b5c13cd0,
|
||||
0x3feec4e1e192aed2, 0x3feec32af0d7d3de, 0x3feec17dea6db7d7,
|
||||
0x3feebfdad5362a27, 0x3feebe41b817c114, 0x3feebcb299fddd0d,
|
||||
0x3feebb2d81d8abff, 0x3feeb9b2769d2ca7, 0x3feeb8417f4531ee,
|
||||
0x3feeb6daa2cf6642, 0x3feeb57de83f4eef, 0x3feeb42b569d4f82,
|
||||
0x3feeb2e2f4f6ad27, 0x3feeb1a4ca5d920f, 0x3feeb070dde910d2,
|
||||
0x3feeaf4736b527da, 0x3feeae27dbe2c4cf, 0x3feead12d497c7fd,
|
||||
0x3feeac0827ff07cc, 0x3feeab07dd485429, 0x3feeaa11fba87a03,
|
||||
0x3feea9268a5946b7, 0x3feea84590998b93, 0x3feea76f15ad2148,
|
||||
0x3feea6a320dceb71, 0x3feea5e1b976dc09, 0x3feea52ae6cdf6f4,
|
||||
0x3feea47eb03a5585, 0x3feea3dd1d1929fd, 0x3feea34634ccc320,
|
||||
0x3feea2b9febc8fb7, 0x3feea23882552225, 0x3feea1c1c70833f6,
|
||||
0x3feea155d44ca973, 0x3feea0f4b19e9538, 0x3feea09e667f3bcd,
|
||||
0x3feea052fa75173e, 0x3feea012750bdabf, 0x3fee9fdcddd47645,
|
||||
0x3fee9fb23c651a2f, 0x3fee9f9298593ae5, 0x3fee9f7df9519484,
|
||||
0x3fee9f7466f42e87, 0x3fee9f75e8ec5f74, 0x3fee9f8286ead08a,
|
||||
0x3fee9f9a48a58174, 0x3fee9fbd35d7cbfd, 0x3fee9feb564267c9,
|
||||
0x3feea024b1ab6e09, 0x3feea0694fde5d3f, 0x3feea0b938ac1cf6,
|
||||
0x3feea11473eb0187, 0x3feea17b0976cfdb, 0x3feea1ed0130c132,
|
||||
0x3feea26a62ff86f0, 0x3feea2f336cf4e62, 0x3feea3878491c491,
|
||||
0x3feea427543e1a12, 0x3feea4d2add106d9, 0x3feea589994cce13,
|
||||
0x3feea64c1eb941f7, 0x3feea71a4623c7ad, 0x3feea7f4179f5b21,
|
||||
0x3feea8d99b4492ed, 0x3feea9cad931a436, 0x3feeaac7d98a6699,
|
||||
0x3feeabd0a478580f, 0x3feeace5422aa0db, 0x3feeae05bad61778,
|
||||
0x3feeaf3216b5448c, 0x3feeb06a5e0866d9, 0x3feeb1ae99157736,
|
||||
0x3feeb2fed0282c8a, 0x3feeb45b0b91ffc6, 0x3feeb5c353aa2fe2,
|
||||
0x3feeb737b0cdc5e5, 0x3feeb8b82b5f98e5, 0x3feeba44cbc8520f,
|
||||
0x3feebbdd9a7670b3, 0x3feebd829fde4e50, 0x3feebf33e47a22a2,
|
||||
0x3feec0f170ca07ba, 0x3feec2bb4d53fe0d, 0x3feec49182a3f090,
|
||||
0x3feec674194bb8d5, 0x3feec86319e32323, 0x3feeca5e8d07f29e,
|
||||
0x3feecc667b5de565, 0x3feece7aed8eb8bb, 0x3feed09bec4a2d33,
|
||||
0x3feed2c980460ad8, 0x3feed503b23e255d, 0x3feed74a8af46052,
|
||||
0x3feed99e1330b358, 0x3feedbfe53c12e59, 0x3feede6b5579fdbf,
|
||||
0x3feee0e521356eba, 0x3feee36bbfd3f37a, 0x3feee5ff3a3c2774,
|
||||
0x3feee89f995ad3ad, 0x3feeeb4ce622f2ff, 0x3feeee07298db666,
|
||||
0x3feef0ce6c9a8952, 0x3feef3a2b84f15fb, 0x3feef68415b749b1,
|
||||
0x3feef9728de5593a, 0x3feefc6e29f1c52a, 0x3feeff76f2fb5e47,
|
||||
0x3fef028cf22749e4, 0x3fef05b030a1064a, 0x3fef08e0b79a6f1f,
|
||||
0x3fef0c1e904bc1d2, 0x3fef0f69c3f3a207, 0x3fef12c25bd71e09,
|
||||
0x3fef16286141b33d, 0x3fef199bdd85529c, 0x3fef1d1cd9fa652c,
|
||||
0x3fef20ab5fffd07a, 0x3fef244778fafb22, 0x3fef27f12e57d14b,
|
||||
0x3fef2ba88988c933, 0x3fef2f6d9406e7b5, 0x3fef33405751c4db,
|
||||
0x3fef3720dcef9069, 0x3fef3b0f2e6d1675, 0x3fef3f0b555dc3fa,
|
||||
0x3fef43155b5bab74, 0x3fef472d4a07897c, 0x3fef4b532b08c968,
|
||||
0x3fef4f87080d89f2, 0x3fef53c8eacaa1d6, 0x3fef5818dcfba487,
|
||||
0x3fef5c76e862e6d3, 0x3fef60e316c98398, 0x3fef655d71ff6075,
|
||||
0x3fef69e603db3285, 0x3fef6e7cd63a8315, 0x3fef7321f301b460,
|
||||
0x3fef77d5641c0658, 0x3fef7c97337b9b5f, 0x3fef81676b197d17,
|
||||
0x3fef864614f5a129, 0x3fef8b333b16ee12, 0x3fef902ee78b3ff6,
|
||||
0x3fef953924676d76, 0x3fef9a51fbc74c83, 0x3fef9f7977cdb740,
|
||||
0x3fefa4afa2a490da, 0x3fefa9f4867cca6e, 0x3fefaf482d8e67f1,
|
||||
0x3fefb4aaa2188510, 0x3fefba1bee615a27, 0x3fefbf9c1cb6412a,
|
||||
0x3fefc52b376bba97, 0x3fefcac948dd7274, 0x3fefd0765b6e4540,
|
||||
0x3fefd632798844f8, 0x3fefdbfdad9cbe14, 0x3fefe1d802243c89,
|
||||
0x3fefe7c1819e90d8, 0x3fefedba3692d514, 0x3feff3c22b8f71f1,
|
||||
0x3feff9d96b2a23d9,
|
||||
# endif
|
||||
};
|
||||
@@ -1,122 +0,0 @@
|
||||
/*
|
||||
* Single-precision vector e^x function.
|
||||
*
|
||||
* Copyright (c) 2019-2023, Arm Limited.
|
||||
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
|
||||
*/
|
||||
|
||||
#include "mathlib.h"
|
||||
#include "v_math.h"
|
||||
|
||||
static const struct data
|
||||
{
|
||||
float32x4_t poly[5];
|
||||
float32x4_t shift, inv_ln2, ln2_hi, ln2_lo;
|
||||
uint32x4_t exponent_bias;
|
||||
#if !WANT_SIMD_EXCEPT
|
||||
float32x4_t special_bound, scale_thresh;
|
||||
#endif
|
||||
} data = {
|
||||
/* maxerr: 1.45358 +0.5 ulp. */
|
||||
.poly = { V4 (0x1.0e4020p-7f), V4 (0x1.573e2ep-5f), V4 (0x1.555e66p-3f),
|
||||
V4 (0x1.fffdb6p-2f), V4 (0x1.ffffecp-1f) },
|
||||
.shift = V4 (0x1.8p23f),
|
||||
.inv_ln2 = V4 (0x1.715476p+0f),
|
||||
.ln2_hi = V4 (0x1.62e4p-1f),
|
||||
.ln2_lo = V4 (0x1.7f7d1cp-20f),
|
||||
.exponent_bias = V4 (0x3f800000),
|
||||
#if !WANT_SIMD_EXCEPT
|
||||
.special_bound = V4 (126.0f),
|
||||
.scale_thresh = V4 (192.0f),
|
||||
#endif
|
||||
};
|
||||
|
||||
#define C(i) d->poly[i]
|
||||
|
||||
#if WANT_SIMD_EXCEPT
|
||||
|
||||
# define TinyBound v_u32 (0x20000000) /* asuint (0x1p-63). */
|
||||
# define BigBound v_u32 (0x42800000) /* asuint (0x1p6). */
|
||||
# define SpecialBound v_u32 (0x22800000) /* BigBound - TinyBound. */
|
||||
|
||||
static float32x4_t VPCS_ATTR NOINLINE
|
||||
special_case (float32x4_t x, float32x4_t y, uint32x4_t cmp)
|
||||
{
|
||||
/* If fenv exceptions are to be triggered correctly, fall back to the scalar
|
||||
routine for special lanes. */
|
||||
return v_call_f32 (expf, x, y, cmp);
|
||||
}
|
||||
|
||||
#else
|
||||
|
||||
# define SpecialOffset v_u32 (0x82000000)
|
||||
# define SpecialBias v_u32 (0x7f000000)
|
||||
|
||||
static float32x4_t VPCS_ATTR NOINLINE
|
||||
special_case (float32x4_t poly, float32x4_t n, uint32x4_t e, uint32x4_t cmp1,
|
||||
float32x4_t scale, const struct data *d)
|
||||
{
|
||||
/* 2^n may overflow, break it up into s1*s2. */
|
||||
uint32x4_t b = vandq_u32 (vclezq_f32 (n), SpecialOffset);
|
||||
float32x4_t s1 = vreinterpretq_f32_u32 (vaddq_u32 (b, SpecialBias));
|
||||
float32x4_t s2 = vreinterpretq_f32_u32 (vsubq_u32 (e, b));
|
||||
uint32x4_t cmp2 = vcagtq_f32 (n, d->scale_thresh);
|
||||
float32x4_t r2 = vmulq_f32 (s1, s1);
|
||||
float32x4_t r1 = vmulq_f32 (vfmaq_f32 (s2, poly, s2), s1);
|
||||
/* Similar to r1 but avoids double rounding in the subnormal range. */
|
||||
float32x4_t r0 = vfmaq_f32 (scale, poly, scale);
|
||||
float32x4_t r = vbslq_f32 (cmp1, r1, r0);
|
||||
return vbslq_f32 (cmp2, r2, r);
|
||||
}
|
||||
|
||||
#endif
|
||||
|
||||
float32x4_t VPCS_ATTR V_NAME_F1 (exp) (float32x4_t x)
|
||||
{
|
||||
const struct data *d = ptr_barrier (&data);
|
||||
float32x4_t n, r, r2, scale, p, q, poly, z;
|
||||
uint32x4_t cmp, e;
|
||||
|
||||
#if WANT_SIMD_EXCEPT
|
||||
/* asuint(|x|) - TinyBound >= BigBound - TinyBound. */
|
||||
cmp = vcgeq_u32 (
|
||||
vsubq_u32 (vandq_u32 (vreinterpretq_u32_f32 (x), v_u32 (0x7fffffff)),
|
||||
TinyBound),
|
||||
SpecialBound);
|
||||
float32x4_t xm = x;
|
||||
/* If any lanes are special, mask them with 1 and retain a copy of x to allow
|
||||
special case handler to fix special lanes later. This is only necessary if
|
||||
fenv exceptions are to be triggered correctly. */
|
||||
if (unlikely (v_any_u32 (cmp)))
|
||||
x = vbslq_f32 (cmp, v_f32 (1), x);
|
||||
#endif
|
||||
|
||||
/* exp(x) = 2^n (1 + poly(r)), with 1 + poly(r) in [1/sqrt(2),sqrt(2)]
|
||||
x = ln2*n + r, with r in [-ln2/2, ln2/2]. */
|
||||
z = vfmaq_f32 (d->shift, x, d->inv_ln2);
|
||||
n = vsubq_f32 (z, d->shift);
|
||||
r = vfmsq_f32 (x, n, d->ln2_hi);
|
||||
r = vfmsq_f32 (r, n, d->ln2_lo);
|
||||
e = vshlq_n_u32 (vreinterpretq_u32_f32 (z), 23);
|
||||
scale = vreinterpretq_f32_u32 (vaddq_u32 (e, d->exponent_bias));
|
||||
|
||||
#if !WANT_SIMD_EXCEPT
|
||||
cmp = vcagtq_f32 (n, d->special_bound);
|
||||
#endif
|
||||
|
||||
r2 = vmulq_f32 (r, r);
|
||||
p = vfmaq_f32 (C (1), C (0), r);
|
||||
q = vfmaq_f32 (C (3), C (2), r);
|
||||
q = vfmaq_f32 (q, p, r2);
|
||||
p = vmulq_f32 (C (4), r);
|
||||
poly = vfmaq_f32 (p, q, r2);
|
||||
|
||||
if (unlikely (v_any_u32 (cmp)))
|
||||
#if WANT_SIMD_EXCEPT
|
||||
return special_case (xm, vfmaq_f32 (scale, poly, scale), cmp);
|
||||
#else
|
||||
return special_case (poly, n, e, cmp, scale, d);
|
||||
#endif
|
||||
|
||||
return vfmaq_f32 (scale, poly, scale);
|
||||
}
|
||||
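The WANT_SIMD_EXCEPT check in the routine above folds two range tests into one unsigned comparison. A scalar sketch of the idea follows; is_special_lane and asuint are hypothetical names, and the bounds are the TinyBound and BigBound bit patterns from the source. Because the subtraction wraps around for values below TinyBound, inputs that are too small, too large, infinite or NaN all end up above the threshold.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the bits of a float. */
static inline uint32_t asuint (float f) { uint32_t u; memcpy (&u, &f, sizeof u); return u; }

/* Nonzero when |x| < 0x1p-63 or |x| >= 0x1p6 (including inf and NaN). */
int
is_special_lane (float x)
{
  const uint32_t tiny_bound = 0x20000000; /* asuint (0x1p-63f). */
  const uint32_t big_bound = 0x42800000;  /* asuint (0x1p6f). */

  uint32_t ia = asuint (x) & 0x7fffffff;  /* asuint (|x|). */

  /* If ia < tiny_bound the difference wraps to a huge unsigned value,
     so a single >= covers both ends of the range. */
  return ia - tiny_bound >= big_bound - tiny_bound;
}
```

For instance, is_special_lane (1.0f) is 0 while is_special_lane (0.0f) and is_special_lane (64.0f) are both 1, so those lanes would be masked to 1 and later recomputed by the scalar routine.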
@@ -1,77 +0,0 @@
|
||||
/*
|
||||
* Single-precision vector e^x function.
|
||||
*
|
||||
* Copyright (c) 2019-2023, Arm Limited.
|
||||
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
|
||||
*/
|
||||
|
||||
#include "mathlib.h"
|
||||
#include "v_math.h"
|
||||
|
||||
static const float Poly[] = {
|
||||
/* maxerr: 0.36565 +0.5 ulp. */
|
||||
0x1.6a6000p-10f,
|
||||
0x1.12718ep-7f,
|
||||
0x1.555af0p-5f,
|
||||
0x1.555430p-3f,
|
||||
0x1.fffff4p-2f,
|
||||
};
|
||||
#define C0 v_f32 (Poly[0])
|
||||
#define C1 v_f32 (Poly[1])
|
||||
#define C2 v_f32 (Poly[2])
|
||||
#define C3 v_f32 (Poly[3])
|
||||
#define C4 v_f32 (Poly[4])
|
||||
|
||||
#define Shift v_f32 (0x1.8p23f)
|
||||
#define InvLn2 v_f32 (0x1.715476p+0f)
|
||||
#define Ln2hi v_f32 (0x1.62e4p-1f)
|
||||
#define Ln2lo v_f32 (0x1.7f7d1cp-20f)
|
||||
|
||||
static float32x4_t VPCS_ATTR NOINLINE
|
||||
specialcase (float32x4_t poly, float32x4_t n, uint32x4_t e, float32x4_t absn)
|
||||
{
|
||||
/* 2^n may overflow, break it up into s1*s2. */
|
||||
uint32x4_t b = (n <= v_f32 (0.0f)) & v_u32 (0x83000000);
|
||||
float32x4_t s1 = vreinterpretq_f32_u32 (v_u32 (0x7f000000) + b);
|
||||
float32x4_t s2 = vreinterpretq_f32_u32 (e - b);
|
||||
uint32x4_t cmp = absn > v_f32 (192.0f);
|
||||
float32x4_t r1 = s1 * s1;
|
||||
float32x4_t r0 = poly * s1 * s2;
|
||||
return vreinterpretq_f32_u32 ((cmp & vreinterpretq_u32_f32 (r1))
|
||||
| (~cmp & vreinterpretq_u32_f32 (r0)));
|
||||
}
|
||||
|
||||
float32x4_t VPCS_ATTR
|
||||
_ZGVnN4v_expf_1u (float32x4_t x)
|
||||
{
|
||||
float32x4_t n, r, scale, poly, absn, z;
|
||||
uint32x4_t cmp, e;
|
||||
|
||||
/* exp(x) = 2^n * poly(r), with poly(r) in [1/sqrt(2),sqrt(2)]
|
||||
x = ln2*n + r, with r in [-ln2/2, ln2/2]. */
|
||||
#if 1
|
||||
z = vfmaq_f32 (Shift, x, InvLn2);
|
||||
n = z - Shift;
|
||||
r = vfmaq_f32 (x, n, -Ln2hi);
|
||||
r = vfmaq_f32 (r, n, -Ln2lo);
|
||||
e = vreinterpretq_u32_f32 (z) << 23;
|
||||
#else
|
||||
z = x * InvLn2;
|
||||
n = vrndaq_f32 (z);
|
||||
r = vfmaq_f32 (x, n, -Ln2hi);
|
||||
r = vfmaq_f32 (r, n, -Ln2lo);
|
||||
e = vreinterpretq_u32_s32 (vcvtaq_s32_f32 (z)) << 23;
|
||||
#endif
|
||||
scale = vreinterpretq_f32_u32 (e + v_u32 (0x3f800000));
|
||||
absn = vabsq_f32 (n);
|
||||
cmp = absn > v_f32 (126.0f);
|
||||
poly = vfmaq_f32 (C1, C0, r);
|
||||
poly = vfmaq_f32 (C2, poly, r);
|
||||
poly = vfmaq_f32 (C3, poly, r);
|
||||
poly = vfmaq_f32 (C4, poly, r);
|
||||
poly = vfmaq_f32 (v_f32 (1.0f), poly, r);
|
||||
poly = vfmaq_f32 (v_f32 (1.0f), poly, r);
|
||||
if (unlikely (v_any_u32 (cmp)))
|
||||
return specialcase (poly, n, e, absn);
|
||||
return scale * poly;
|
||||
}
|
||||
@@ -1,100 +0,0 @@
|
||||
/*
|
||||
* Double-precision vector log(x) function.
|
||||
*
|
||||
* Copyright (c) 2019-2023, Arm Limited.
|
||||
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
|
||||
*/
|
||||
|
||||
#include "mathlib.h"
|
||||
#include "v_math.h"
|
||||
|
||||
static const struct data
|
||||
{
|
||||
uint64x2_t min_norm;
|
||||
uint32x4_t special_bound;
|
||||
float64x2_t poly[5];
|
||||
float64x2_t ln2;
|
||||
uint64x2_t sign_exp_mask;
|
||||
} data = {
|
||||
/* Worst-case error: 1.17 + 0.5 ulp.
|
||||
Rel error: 0x1.6272e588p-56 in [ -0x1.fc1p-9 0x1.009p-8 ]. */
|
||||
.poly = { V2 (-0x1.ffffffffffff7p-2), V2 (0x1.55555555170d4p-2),
|
||||
V2 (-0x1.0000000399c27p-2), V2 (0x1.999b2e90e94cap-3),
|
||||
V2 (-0x1.554e550bd501ep-3) },
|
||||
.ln2 = V2 (0x1.62e42fefa39efp-1),
|
||||
.min_norm = V2 (0x0010000000000000),
|
||||
.special_bound = V4 (0x7fe00000), /* asuint64(inf) - min_norm. */
|
||||
.sign_exp_mask = V2 (0xfff0000000000000)
|
||||
};
|
||||
|
||||
#define A(i) d->poly[i]
|
||||
#define N (1 << V_LOG_TABLE_BITS)
|
||||
#define IndexMask (N - 1)
|
||||
#define Off v_u64 (0x3fe6900900000000)
|
||||
|
||||
struct entry
|
||||
{
|
||||
float64x2_t invc;
|
||||
float64x2_t logc;
|
||||
};
|
||||
|
||||
static inline struct entry
lookup (uint64x2_t i)
{
  /* Since N is a power of 2, n % N = n & (N - 1). */
  struct entry e;
  uint64_t i0 = (i[0] >> (52 - V_LOG_TABLE_BITS)) & IndexMask;
  uint64_t i1 = (i[1] >> (52 - V_LOG_TABLE_BITS)) & IndexMask;
  float64x2_t e0 = vld1q_f64 (&__v_log_data.table[i0].invc);
  float64x2_t e1 = vld1q_f64 (&__v_log_data.table[i1].invc);
  e.invc = vuzp1q_f64 (e0, e1);
  e.logc = vuzp2q_f64 (e0, e1);
  return e;
}
|
||||
static float64x2_t VPCS_ATTR NOINLINE
special_case (float64x2_t x, float64x2_t y, float64x2_t hi, float64x2_t r2,
              uint32x2_t cmp)
{
  return v_call_f64 (log, x, vfmaq_f64 (hi, y, r2), vmovl_u32 (cmp));
}
|
||||
float64x2_t VPCS_ATTR V_NAME_D1 (log) (float64x2_t x)
|
||||
{
|
||||
const struct data *d = ptr_barrier (&data);
|
||||
float64x2_t z, r, r2, p, y, kd, hi;
|
||||
uint64x2_t ix, iz, tmp;
|
||||
uint32x2_t cmp;
|
||||
int64x2_t k;
|
||||
struct entry e;
|
||||
|
||||
ix = vreinterpretq_u64_f64 (x);
|
||||
cmp = vcge_u32 (vsubhn_u64 (ix, d->min_norm),
|
||||
vget_low_u32 (d->special_bound));
|
||||
|
||||
/* x = 2^k z; where z is in range [Off,2*Off) and exact.
|
||||
The range is split into N subintervals.
|
||||
The ith subinterval contains z and c is near its center. */
|
||||
tmp = vsubq_u64 (ix, Off);
|
||||
k = vshrq_n_s64 (vreinterpretq_s64_u64 (tmp), 52); /* arithmetic shift. */
|
||||
iz = vsubq_u64 (ix, vandq_u64 (tmp, d->sign_exp_mask));
|
||||
z = vreinterpretq_f64_u64 (iz);
|
||||
e = lookup (tmp);
|
||||
|
||||
/* log(x) = log1p(z/c-1) + log(c) + k*Ln2. */
|
||||
r = vfmaq_f64 (v_f64 (-1.0), z, e.invc);
|
||||
kd = vcvtq_f64_s64 (k);
|
||||
|
||||
/* hi = r + log(c) + k*Ln2. */
|
||||
hi = vfmaq_f64 (vaddq_f64 (e.logc, r), kd, d->ln2);
|
||||
/* y = r2*(A0 + r*A1 + r2*(A2 + r*A3 + r2*A4)) + hi. */
|
||||
r2 = vmulq_f64 (r, r);
|
||||
y = vfmaq_f64 (A (2), A (3), r);
|
||||
p = vfmaq_f64 (A (0), A (1), r);
|
||||
y = vfmaq_f64 (y, A (4), r2);
|
||||
y = vfmaq_f64 (p, y, r2);
|
||||
|
||||
if (unlikely (v_any_u32h (cmp)))
|
||||
return special_case (x, y, hi, r2, cmp);
|
||||
return vfmaq_f64 (hi, y, r2);
|
||||
}
|
||||
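The range reduction in the vector log routine above (x = 2^k z, followed by a table index taken from the top mantissa bits) can be modelled one lane at a time in portable C. The sketch below is illustrative only: `reduce`, `struct reduced` and `TABLE_BITS` are hypothetical names, and `TABLE_BITS = 7` is an assumption based on the `N=128` stated in the table comment, not something read from `V_LOG_TABLE_BITS` itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-cast helpers; memcpy avoids strict-aliasing problems. */
static uint64_t
asuint64 (double f)
{
  uint64_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

static double
asdouble (uint64_t u)
{
  double f;
  memcpy (&f, &u, sizeof f);
  return f;
}

#define TABLE_BITS 7                       /* Assumed V_LOG_TABLE_BITS. */
#define OFF 0x3fe6900900000000ULL          /* Same bits as Off above. */
#define SIGN_EXP_MASK 0xfff0000000000000ULL

struct reduced
{
  int64_t k;    /* Exponent such that x = 2^k * z. */
  double z;     /* Reduced argument in [a, 2a), a = asdouble (OFF). */
  unsigned idx; /* Subinterval index used for the invc/logc lookup. */
};

/* Scalar model of the reduction. x must be positive, finite and normal;
   everything else is what the vector code sends to special_case. */
static struct reduced
reduce (double x)
{
  struct reduced out;
  uint64_t ix = asuint64 (x);
  assert (ix - 0x0010000000000000ULL < 0x7fe0000000000000ULL);

  uint64_t tmp = ix - OFF;
  /* Arithmetic shift by 52, written portably: sign-extend the top 12 bits. */
  out.k = (int64_t) (tmp >> 52) - ((tmp >> 63) ? 4096 : 0);
  /* Remove k from the exponent field; the mantissa bits are untouched. */
  out.z = asdouble (ix - (tmp & SIGN_EXP_MASK));
  out.idx = (unsigned) ((tmp >> (52 - TABLE_BITS)) & ((1u << TABLE_BITS) - 1));
  return out;
}
```

Because only the exponent field is adjusted, z is exact, which is what the "and exact" remark in the source comment refers to. For example, `reduce (1.5)` yields k = 1 and z = 0.75.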
@@ -1,156 +0,0 @@
/*
 * Lookup table for double-precision log(x) vector function.
 *
 * Copyright (c) 2019-2023, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 */

#include "v_math.h"

#define N (1 << V_LOG_TABLE_BITS)

const struct v_log_data __v_log_data = {
  /* Algorithm:

        x = 2^k z
        log(x) = k ln2 + log(c) + poly(z/c - 1)

     where z is in [a;2a) which is split into N subintervals (a=0x1.69009p-1,
     N=128) and log(c) and 1/c for the ith subinterval come from lookup tables:

        table[i].invc = 1/c
        table[i].logc = (double)log(c)

     where c is near the center of the subinterval and is chosen by trying several
     floating point invc candidates around 1/center and selecting one for which
     the error in (double)log(c) is minimized (< 0x1p-74), except the subinterval
     that contains 1 and the previous one got tweaked to avoid cancellation. */
  .table = { { 0x1.6a133d0dec120p+0, -0x1.62fe995eb963ap-2 },
             { 0x1.6815f2f3e42edp+0, -0x1.5d5a48dad6b67p-2 },
             { 0x1.661e39be1ac9ep+0, -0x1.57bde257d2769p-2 },
             { 0x1.642bfa30ac371p+0, -0x1.52294fbf2af55p-2 },
             { 0x1.623f1d916f323p+0, -0x1.4c9c7b598aa38p-2 },
             { 0x1.60578da220f65p+0, -0x1.47174fc5ff560p-2 },
             { 0x1.5e75349dea571p+0, -0x1.4199b7fa7b5cap-2 },
             { 0x1.5c97fd387a75ap+0, -0x1.3c239f48cfb99p-2 },
             { 0x1.5abfd2981f200p+0, -0x1.36b4f154d2aebp-2 },
             { 0x1.58eca051dc99cp+0, -0x1.314d9a0ff32fbp-2 },
             { 0x1.571e526d9df12p+0, -0x1.2bed85cca3cffp-2 },
             { 0x1.5554d555b3fcbp+0, -0x1.2694a11421af9p-2 },
             { 0x1.539015e2a20cdp+0, -0x1.2142d8d014fb2p-2 },
             { 0x1.51d0014ee0164p+0, -0x1.1bf81a2c77776p-2 },
             { 0x1.50148538cd9eep+0, -0x1.16b452a39c6a4p-2 },
             { 0x1.4e5d8f9f698a1p+0, -0x1.11776ffa6c67ep-2 },
             { 0x1.4cab0edca66bep+0, -0x1.0c416035020e0p-2 },
             { 0x1.4afcf1a9db874p+0, -0x1.071211aa10fdap-2 },
             { 0x1.495327136e16fp+0, -0x1.01e972e293b1bp-2 },
             { 0x1.47ad9e84af28fp+0, -0x1.f98ee587fd434p-3 },
             { 0x1.460c47b39ae15p+0, -0x1.ef5800ad716fbp-3 },
             { 0x1.446f12b278001p+0, -0x1.e52e160484698p-3 },
             { 0x1.42d5efdd720ecp+0, -0x1.db1104b19352ep-3 },
             { 0x1.4140cfe001a0fp+0, -0x1.d100ac59e0bd6p-3 },
             { 0x1.3fafa3b421f69p+0, -0x1.c6fced287c3bdp-3 },
             { 0x1.3e225c9c8ece5p+0, -0x1.bd05a7b317c29p-3 },
             { 0x1.3c98ec29a211ap+0, -0x1.b31abd229164fp-3 },
             { 0x1.3b13442a413fep+0, -0x1.a93c0edadb0a3p-3 },
             { 0x1.399156baa3c54p+0, -0x1.9f697ee30d7ddp-3 },
             { 0x1.38131639b4cdbp+0, -0x1.95a2efa9aa40ap-3 },
             { 0x1.36987540fbf53p+0, -0x1.8be843d796044p-3 },
             { 0x1.352166b648f61p+0, -0x1.82395ecc477edp-3 },
             { 0x1.33adddb3eb575p+0, -0x1.7896240966422p-3 },
             { 0x1.323dcd99fc1d3p+0, -0x1.6efe77aca8c55p-3 },
             { 0x1.30d129fefc7d2p+0, -0x1.65723e117ec5cp-3 },
             { 0x1.2f67e6b72fe7dp+0, -0x1.5bf15c0955706p-3 },
             { 0x1.2e01f7cf8b187p+0, -0x1.527bb6c111da1p-3 },
             { 0x1.2c9f518ddc86ep+0, -0x1.491133c939f8fp-3 },
             { 0x1.2b3fe86e5f413p+0, -0x1.3fb1b90c7fc58p-3 },
             { 0x1.29e3b1211b25cp+0, -0x1.365d2cc485f8dp-3 },
             { 0x1.288aa08b373cfp+0, -0x1.2d13758970de7p-3 },
             { 0x1.2734abcaa8467p+0, -0x1.23d47a721fd47p-3 },
             { 0x1.25e1c82459b81p+0, -0x1.1aa0229f25ec2p-3 },
             { 0x1.2491eb1ad59c5p+0, -0x1.117655ddebc3bp-3 },
             { 0x1.23450a54048b5p+0, -0x1.0856fbf83ab6bp-3 },
             { 0x1.21fb1bb09e578p+0, -0x1.fe83fabbaa106p-4 },
             { 0x1.20b415346d8f7p+0, -0x1.ec6e8507a56cdp-4 },
             { 0x1.1f6fed179a1acp+0, -0x1.da6d68c7cc2eap-4 },
             { 0x1.1e2e99b93c7b3p+0, -0x1.c88078462be0cp-4 },
             { 0x1.1cf011a7a882ap+0, -0x1.b6a786a423565p-4 },
             { 0x1.1bb44b97dba5ap+0, -0x1.a4e2676ac7f85p-4 },
             { 0x1.1a7b3e66cdd4fp+0, -0x1.9330eea777e76p-4 },
             { 0x1.1944e11dc56cdp+0, -0x1.8192f134d5ad9p-4 },
             { 0x1.18112aebb1a6ep+0, -0x1.70084464f0538p-4 },
             { 0x1.16e013231b7e9p+0, -0x1.5e90bdec5cb1fp-4 },
             { 0x1.15b1913f156cfp+0, -0x1.4d2c3433c5536p-4 },
             { 0x1.14859cdedde13p+0, -0x1.3bda7e219879ap-4 },
             { 0x1.135c2dc68cfa4p+0, -0x1.2a9b732d27194p-4 },
             { 0x1.12353bdb01684p+0, -0x1.196eeb2b10807p-4 },
             { 0x1.1110bf25b85b4p+0, -0x1.0854be8ef8a7ep-4 },
             { 0x1.0feeafd2f8577p+0, -0x1.ee998cb277432p-5 },
             { 0x1.0ecf062c51c3bp+0, -0x1.ccadb79919fb9p-5 },
             { 0x1.0db1baa076c8bp+0, -0x1.aae5b1d8618b0p-5 },
             { 0x1.0c96c5bb3048ep+0, -0x1.89413015d7442p-5 },
             { 0x1.0b7e20263e070p+0, -0x1.67bfe7bf158dep-5 },
             { 0x1.0a67c2acd0ce3p+0, -0x1.46618f83941bep-5 },
             { 0x1.0953a6391e982p+0, -0x1.2525df1b0618ap-5 },
             { 0x1.0841c3caea380p+0, -0x1.040c8e2f77c6ap-5 },
             { 0x1.07321489b13eap+0, -0x1.c62aad39f738ap-6 },
             { 0x1.062491aee9904p+0, -0x1.847fe3bdead9cp-6 },
             { 0x1.05193497a7cc5p+0, -0x1.43183683400acp-6 },
             { 0x1.040ff6b5f5e9fp+0, -0x1.01f31c4e1d544p-6 },
             { 0x1.0308d19aa6127p+0, -0x1.82201d1e6b69ap-7 },
             { 0x1.0203beedb0c67p+0, -0x1.00dd0f3e1bfd6p-7 },
             { 0x1.010037d38bcc2p+0, -0x1.ff6fe1feb4e53p-9 },
             { 1.0, 0.0 },
             { 0x1.fc06d493cca10p-1, 0x1.fe91885ec8e20p-8 },
             { 0x1.f81e6ac3b918fp-1, 0x1.fc516f716296dp-7 },
             { 0x1.f44546ef18996p-1, 0x1.7bb4dd70a015bp-6 },
             { 0x1.f07b10382c84bp-1, 0x1.f84c99b34b674p-6 },
             { 0x1.ecbf7070e59d4p-1, 0x1.39f9ce4fb2d71p-5 },
             { 0x1.e91213f715939p-1, 0x1.7756c0fd22e78p-5 },
             { 0x1.e572a9a75f7b7p-1, 0x1.b43ee82db8f3ap-5 },
             { 0x1.e1e0e2c530207p-1, 0x1.f0b3fced60034p-5 },
             { 0x1.de5c72d8a8be3p-1, 0x1.165bd78d4878ep-4 },
             { 0x1.dae50fa5658ccp-1, 0x1.3425d2715ebe6p-4 },
             { 0x1.d77a71145a2dap-1, 0x1.51b8bd91b7915p-4 },
             { 0x1.d41c51166623ep-1, 0x1.6f15632c76a47p-4 },
             { 0x1.d0ca6ba0bb29fp-1, 0x1.8c3c88ecbe503p-4 },
             { 0x1.cd847e8e59681p-1, 0x1.a92ef077625dap-4 },
             { 0x1.ca4a499693e00p-1, 0x1.c5ed5745fa006p-4 },
             { 0x1.c71b8e399e821p-1, 0x1.e27876de1c993p-4 },
             { 0x1.c3f80faf19077p-1, 0x1.fed104fce4cdcp-4 },
             { 0x1.c0df92dc2b0ecp-1, 0x1.0d7bd9c17d78bp-3 },
             { 0x1.bdd1de3cbb542p-1, 0x1.1b76986cef97bp-3 },
             { 0x1.baceb9e1007a3p-1, 0x1.295913d24f750p-3 },
             { 0x1.b7d5ef543e55ep-1, 0x1.37239fa295d17p-3 },
             { 0x1.b4e749977d953p-1, 0x1.44d68dd78714bp-3 },
             { 0x1.b20295155478ep-1, 0x1.52722ebe5d780p-3 },
             { 0x1.af279f8e82be2p-1, 0x1.5ff6d12671f98p-3 },
             { 0x1.ac5638197fdf3p-1, 0x1.6d64c2389484bp-3 },
             { 0x1.a98e2f102e087p-1, 0x1.7abc4da40fddap-3 },
             { 0x1.a6cf5606d05c1p-1, 0x1.87fdbda1e8452p-3 },
             { 0x1.a4197fc04d746p-1, 0x1.95295b06a5f37p-3 },
             { 0x1.a16c80293dc01p-1, 0x1.a23f6d34abbc5p-3 },
             { 0x1.9ec82c4dc5bc9p-1, 0x1.af403a28e04f2p-3 },
             { 0x1.9c2c5a491f534p-1, 0x1.bc2c06a85721ap-3 },
             { 0x1.9998e1480b618p-1, 0x1.c903161240163p-3 },
             { 0x1.970d9977c6c2dp-1, 0x1.d5c5aa93287ebp-3 },
             { 0x1.948a5c023d212p-1, 0x1.e274051823fa9p-3 },
             { 0x1.920f0303d6809p-1, 0x1.ef0e656300c16p-3 },
             { 0x1.8f9b698a98b45p-1, 0x1.fb9509f05aa2ap-3 },
             { 0x1.8d2f6b81726f6p-1, 0x1.04041821f37afp-2 },
             { 0x1.8acae5bb55badp-1, 0x1.0a340a49b3029p-2 },
             { 0x1.886db5d9275b8p-1, 0x1.105a7918a126dp-2 },
             { 0x1.8617ba567c13cp-1, 0x1.1677819812b84p-2 },
             { 0x1.83c8d27487800p-1, 0x1.1c8b405b40c0ep-2 },
             { 0x1.8180de3c5dbe7p-1, 0x1.2295d16cfa6b1p-2 },
             { 0x1.7f3fbe71cdb71p-1, 0x1.28975066318a2p-2 },
             { 0x1.7d055498071c1p-1, 0x1.2e8fd855d86fcp-2 },
             { 0x1.7ad182e54f65ap-1, 0x1.347f83d605e59p-2 },
             { 0x1.78a42c3c90125p-1, 0x1.3a666d1244588p-2 },
             { 0x1.767d342f76944p-1, 0x1.4044adb6f8ec4p-2 },
             { 0x1.745c7ef26b00ap-1, 0x1.461a5f077558cp-2 },
             { 0x1.7241f15769d0fp-1, 0x1.4be799e20b9c8p-2 },
             { 0x1.702d70d396e41p-1, 0x1.51ac76a6b79dfp-2 },
             { 0x1.6e1ee3700cd11p-1, 0x1.57690d5744a45p-2 },
             { 0x1.6c162fc9cbe02p-1, 0x1.5d1d758e45217p-2 } }
};
@@ -1,74 +0,0 @@
/*
 * Single-precision vector log function.
 *
 * Copyright (c) 2019-2023, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 */

#include "mathlib.h"
#include "v_math.h"

static const struct data
{
  uint32x4_t min_norm;
  uint16x8_t special_bound;
  float32x4_t poly[7];
  float32x4_t ln2, tiny_bound;
  uint32x4_t off, mantissa_mask;
} data = {
  /* 3.34 ulp error. */
  .poly = { V4 (-0x1.3e737cp-3f), V4 (0x1.5a9aa2p-3f), V4 (-0x1.4f9934p-3f),
            V4 (0x1.961348p-3f), V4 (-0x1.00187cp-2f), V4 (0x1.555d7cp-2f),
            V4 (-0x1.ffffc8p-2f) },
  .ln2 = V4 (0x1.62e43p-1f),
  .tiny_bound = V4 (0x1p-126),
  .min_norm = V4 (0x00800000),
  .special_bound = V8 (0x7f00), /* asuint32(inf) - min_norm. */
  .off = V4 (0x3f2aaaab), /* 0.666667. */
  .mantissa_mask = V4 (0x007fffff)
};

#define P(i) d->poly[7 - i]

static float32x4_t VPCS_ATTR NOINLINE
special_case (float32x4_t x, float32x4_t y, float32x4_t r2, float32x4_t p,
              uint16x4_t cmp)
{
  /* Fall back to scalar code. */
  return v_call_f32 (logf, x, vfmaq_f32 (p, y, r2), vmovl_u16 (cmp));
}

float32x4_t VPCS_ATTR V_NAME_F1 (log) (float32x4_t x)
{
  const struct data *d = ptr_barrier (&data);
  float32x4_t n, p, q, r, r2, y;
  uint32x4_t u;
  uint16x4_t cmp;

  u = vreinterpretq_u32_f32 (x);
  cmp = vcge_u16 (vsubhn_u32 (u, d->min_norm),
                  vget_low_u16 (d->special_bound));

  /* x = 2^n * (1+r), where 2/3 < 1+r < 4/3. */
  u = vsubq_u32 (u, d->off);
  n = vcvtq_f32_s32 (
      vshrq_n_s32 (vreinterpretq_s32_u32 (u), 23)); /* signextend. */
  u = vandq_u32 (u, d->mantissa_mask);
  u = vaddq_u32 (u, d->off);
  r = vsubq_f32 (vreinterpretq_f32_u32 (u), v_f32 (1.0f));

  /* y = log(1+r) + n*ln2. */
  r2 = vmulq_f32 (r, r);
  /* n*ln2 + r + r2*(P1 + r*P2 + r2*(P3 + r*P4 + r2*(P5 + r*P6 + r2*P7))). */
  p = vfmaq_f32 (P (5), P (6), r);
  q = vfmaq_f32 (P (3), P (4), r);
  y = vfmaq_f32 (P (1), P (2), r);
  p = vfmaq_f32 (p, P (7), r2);
  q = vfmaq_f32 (q, p, r2);
  y = vfmaq_f32 (y, q, r2);
  p = vfmaq_f32 (r, d->ln2, n);

  if (unlikely (v_any_u16h (cmp)))
    return special_case (x, y, r2, p, cmp);
  return vfmaq_f32 (p, y, r2);
}
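The `cmp` computation in the single-precision log routine above detects zero, subnormal, negative, infinite and NaN inputs with one unsigned subtract-and-compare. A scalar sketch of that check follows; `needs_special_case` and `needs_special_case_narrow` are hypothetical names used only for illustration, and the constants are the `min_norm` and `special_bound` values from the data block.

```c
#include <stdint.h>
#include <string.h>

static uint32_t
asuint32 (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

static float
asfloat (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Nonzero when x is outside [min_norm, inf). Subtracting min_norm makes
   zero and subnormals wrap around to huge unsigned values, and negative
   inputs already have the sign bit set, so one >= covers every case. */
static int
needs_special_case (float x)
{
  return (uint32_t) (asuint32 (x) - 0x00800000u) >= 0x7f000000u;
}

/* Same test on the high 16 bits only, mirroring the narrowing subtract
   (vsubhn_u32) and the 0x7f00 bound used by the vector code. */
static int
needs_special_case_narrow (float x)
{
  uint16_t hi = (uint16_t) ((uint32_t) (asuint32 (x) - 0x00800000u) >> 16);
  return hi >= 0x7f00;
}
```

The two forms agree because the low 16 bits of the bound are zero, which is why the vector code can get away with a half-width comparison.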
@@ -1,135 +0,0 @@
/*
 * Vector math abstractions.
 *
 * Copyright (c) 2019-2023, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 */

#ifndef _V_MATH_H
#define _V_MATH_H

#if !__aarch64__
# error "Cannot build without AArch64"
#endif

#define VPCS_ATTR __attribute__ ((aarch64_vector_pcs))

#define V_NAME_F1(fun) _ZGVnN4v_##fun##f
#define V_NAME_D1(fun) _ZGVnN2v_##fun
#define V_NAME_F2(fun) _ZGVnN4vv_##fun##f
#define V_NAME_D2(fun) _ZGVnN2vv_##fun

#include <stdint.h>
#include "../math_config.h"
#include <arm_neon.h>

/* Shorthand helpers for declaring constants. */
# define V2(X) { X, X }
# define V4(X) { X, X, X, X }
# define V8(X) { X, X, X, X, X, X, X, X }

static inline int
v_any_u16h (uint16x4_t x)
{
  return vget_lane_u64 (vreinterpret_u64_u16 (x), 0) != 0;
}

static inline int
v_lanes32 (void)
{
  return 4;
}

static inline float32x4_t
v_f32 (float x)
{
  return (float32x4_t) V4 (x);
}
static inline uint32x4_t
v_u32 (uint32_t x)
{
  return (uint32x4_t) V4 (x);
}
/* True if any element of a v_cond result is non-zero. */
static inline int
v_any_u32 (uint32x4_t x)
{
  /* assume elements in x are either 0 or -1u. */
  return vpaddd_u64 (vreinterpretq_u64_u32 (x)) != 0;
}
static inline int
v_any_u32h (uint32x2_t x)
{
  return vget_lane_u64 (vreinterpret_u64_u32 (x), 0) != 0;
}
static inline float32x4_t
v_lookup_f32 (const float *tab, uint32x4_t idx)
{
  return (float32x4_t){tab[idx[0]], tab[idx[1]], tab[idx[2]], tab[idx[3]]};
}
static inline uint32x4_t
v_lookup_u32 (const uint32_t *tab, uint32x4_t idx)
{
  return (uint32x4_t){tab[idx[0]], tab[idx[1]], tab[idx[2]], tab[idx[3]]};
}
static inline float32x4_t
v_call_f32 (float (*f) (float), float32x4_t x, float32x4_t y, uint32x4_t p)
{
  return (float32x4_t){p[0] ? f (x[0]) : y[0], p[1] ? f (x[1]) : y[1],
                       p[2] ? f (x[2]) : y[2], p[3] ? f (x[3]) : y[3]};
}
static inline float32x4_t
v_call2_f32 (float (*f) (float, float), float32x4_t x1, float32x4_t x2,
             float32x4_t y, uint32x4_t p)
{
  return (float32x4_t){p[0] ? f (x1[0], x2[0]) : y[0],
                       p[1] ? f (x1[1], x2[1]) : y[1],
                       p[2] ? f (x1[2], x2[2]) : y[2],
                       p[3] ? f (x1[3], x2[3]) : y[3]};
}

static inline int
v_lanes64 (void)
{
  return 2;
}
static inline float64x2_t
v_f64 (double x)
{
  return (float64x2_t) V2 (x);
}
static inline uint64x2_t
v_u64 (uint64_t x)
{
  return (uint64x2_t) V2 (x);
}
/* True if any element of a v_cond result is non-zero. */
static inline int
v_any_u64 (uint64x2_t x)
{
  /* assume elements in x are either 0 or -1u. */
  return vpaddd_u64 (x) != 0;
}
static inline float64x2_t
v_lookup_f64 (const double *tab, uint64x2_t idx)
{
  return (float64x2_t){tab[idx[0]], tab[idx[1]]};
}
static inline uint64x2_t
v_lookup_u64 (const uint64_t *tab, uint64x2_t idx)
{
  return (uint64x2_t){tab[idx[0]], tab[idx[1]]};
}
static inline float64x2_t
v_call_f64 (double (*f) (double), float64x2_t x, float64x2_t y, uint64x2_t p)
{
  double p1 = p[1];
  double x1 = x[1];
  if (likely (p[0]))
    y[0] = f (x[0]);
  if (likely (p1))
    y[1] = f (x1);
  return y;
}

#endif
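The `v_call_f64` and `v_any_u64` helpers in the header above implement a per-lane fallback: lanes flagged in the mask are recomputed by a scalar routine, and all other lanes keep the fast-path result. The following is a plain-C model of that pattern, written without NEON types so it builds on any target. `struct vec2`, `struct mask2`, `call_lanes`, `any_lane` and `scalar_fallback` are hypothetical names, and `scalar_fallback` merely stands in for a real scalar routine such as `log` or `sin`.

```c
#include <stdint.h>

/* Two-lane stand-ins for float64x2_t and uint64x2_t. */
struct vec2
{
  double lane[2];
};

struct mask2
{
  uint64_t lane[2]; /* Each lane is 0 or all-ones, as a compare produces. */
};

/* Counterpart of v_any_u64: nonzero if any lane of the mask is set. */
static int
any_lane (struct mask2 p)
{
  return (p.lane[0] | p.lane[1]) != 0;
}

/* Counterpart of v_call_f64: recompute flagged lanes with f and keep the
   fast-path value y in the remaining lanes. */
static struct vec2
call_lanes (double (*f) (double), struct vec2 x, struct vec2 y,
            struct mask2 p)
{
  for (int i = 0; i < 2; i++)
    if (p.lane[i])
      y.lane[i] = f (x.lane[i]);
  return y;
}

/* Placeholder scalar routine, used only to make the sketch testable. */
static double
scalar_fallback (double x)
{
  return x * 2.0;
}
```

The vector routines test the mask once with the `v_any_*` helper and only then enter the out-of-line special-case function, so the common path costs a single branch.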
@@ -1,22 +0,0 @@
/*
 * Double-precision vector pow function.
 *
 * Copyright (c) 2020-2023, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 */

#include "mathlib.h"
#include "v_math.h"

float64x2_t VPCS_ATTR V_NAME_D2 (pow) (float64x2_t x, float64x2_t y)
{
  float64x2_t z;
  for (int lane = 0; lane < v_lanes64 (); lane++)
    {
      double sx = x[lane];
      double sy = y[lane];
      double sz = pow (sx, sy);
      z[lane] = sz;
    }
  return z;
}
@@ -1,148 +0,0 @@
/*
 * Single-precision vector powf function.
 *
 * Copyright (c) 2019-2023, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 */

#include "v_math.h"

#define Min v_u32 (0x00800000)
#define Max v_u32 (0x7f800000)
#define Thresh v_u32 (0x7f000000) /* Max - Min. */
#define MantissaMask v_u32 (0x007fffff)

#define A data.log2_poly
#define C data.exp2f_poly

/* 2.6 ulp ~ 0.5 + 2^24 (128*Ln2*relerr_log2 + relerr_exp2). */
#define Off v_u32 (0x3f35d000)

#define V_POWF_LOG2_TABLE_BITS 5
#define V_EXP2F_TABLE_BITS 5
#define Log2IdxMask v_u32 ((1 << V_POWF_LOG2_TABLE_BITS) - 1)
#define Scale ((double) (1 << V_EXP2F_TABLE_BITS))

static const struct
{
  struct
  {
    double invc, logc;
  } log2_tab[1 << V_POWF_LOG2_TABLE_BITS];
  double log2_poly[4];
  uint64_t exp2f_tab[1 << V_EXP2F_TABLE_BITS];
  double exp2f_poly[3];
} data = {
  .log2_tab = {{0x1.6489890582816p+0, -0x1.e960f97b22702p-2 * Scale},
               {0x1.5cf19b35e3472p+0, -0x1.c993406cd4db6p-2 * Scale},
               {0x1.55aac0e956d65p+0, -0x1.aa711d9a7d0f3p-2 * Scale},
               {0x1.4eb0022977e01p+0, -0x1.8bf37bacdce9bp-2 * Scale},
               {0x1.47fcccda1dd1fp+0, -0x1.6e13b3519946ep-2 * Scale},
               {0x1.418ceabab68c1p+0, -0x1.50cb8281e4089p-2 * Scale},
               {0x1.3b5c788f1edb3p+0, -0x1.341504a237e2bp-2 * Scale},
               {0x1.3567de48e9c9ap+0, -0x1.17eaab624ffbbp-2 * Scale},
               {0x1.2fabc80fd19bap+0, -0x1.f88e708f8c853p-3 * Scale},
               {0x1.2a25200ce536bp+0, -0x1.c24b6da113914p-3 * Scale},
               {0x1.24d108e0152e3p+0, -0x1.8d02ee397cb1dp-3 * Scale},
               {0x1.1facd8ab2fbe1p+0, -0x1.58ac1223408b3p-3 * Scale},
               {0x1.1ab614a03efdfp+0, -0x1.253e6fd190e89p-3 * Scale},
               {0x1.15ea6d03af9ffp+0, -0x1.e5641882c12ffp-4 * Scale},
               {0x1.1147b994bb776p+0, -0x1.81fea712926f7p-4 * Scale},
               {0x1.0ccbf650593aap+0, -0x1.203e240de64a3p-4 * Scale},
               {0x1.0875408477302p+0, -0x1.8029b86a78281p-5 * Scale},
               {0x1.0441d42a93328p+0, -0x1.85d713190fb9p-6 * Scale},
               {0x1p+0, 0x0p+0 * Scale},
               {0x1.f1d006c855e86p-1, 0x1.4c1cc07312997p-5 * Scale},
               {0x1.e28c3341aa301p-1, 0x1.5e1848ccec948p-4 * Scale},
               {0x1.d4bdf9aa64747p-1, 0x1.04cfcb7f1196fp-3 * Scale},
               {0x1.c7b45a24e5803p-1, 0x1.582813d463c21p-3 * Scale},
               {0x1.bb5f5eb2ed60ap-1, 0x1.a936fa68760ccp-3 * Scale},
               {0x1.afb0bff8fe6b4p-1, 0x1.f81bc31d6cc4ep-3 * Scale},
               {0x1.a49badf7ab1f5p-1, 0x1.2279a09fae6b1p-2 * Scale},
               {0x1.9a14a111fc4c9p-1, 0x1.47ec0b6df5526p-2 * Scale},
               {0x1.901131f5b2fdcp-1, 0x1.6c71762280f1p-2 * Scale},
               {0x1.8687f73f6d865p-1, 0x1.90155070798dap-2 * Scale},
               {0x1.7d7067eb77986p-1, 0x1.b2e23b1d3068cp-2 * Scale},
               {0x1.74c2c1cf97b65p-1, 0x1.d4e21b0daa86ap-2 * Scale},
               {0x1.6c77f37cff2a1p-1, 0x1.f61e2a2f67f3fp-2 * Scale},},
  .log2_poly = { /* rel err: 1.5 * 2^-30. */
                 -0x1.6ff5daa3b3d7cp-2 * Scale, 0x1.ec81d03c01aebp-2 * Scale,
                 -0x1.71547bb43f101p-1 * Scale, 0x1.7154764a815cbp0 * Scale,},
  .exp2f_tab = {0x3ff0000000000000, 0x3fefd9b0d3158574, 0x3fefb5586cf9890f,
                0x3fef9301d0125b51, 0x3fef72b83c7d517b, 0x3fef54873168b9aa,
                0x3fef387a6e756238, 0x3fef1e9df51fdee1, 0x3fef06fe0a31b715,
                0x3feef1a7373aa9cb, 0x3feedea64c123422, 0x3feece086061892d,
                0x3feebfdad5362a27, 0x3feeb42b569d4f82, 0x3feeab07dd485429,
                0x3feea47eb03a5585, 0x3feea09e667f3bcd, 0x3fee9f75e8ec5f74,
                0x3feea11473eb0187, 0x3feea589994cce13, 0x3feeace5422aa0db,
                0x3feeb737b0cdc5e5, 0x3feec49182a3f090, 0x3feed503b23e255d,
                0x3feee89f995ad3ad, 0x3feeff76f2fb5e47, 0x3fef199bdd85529c,
                0x3fef3720dcef9069, 0x3fef5818dcfba487, 0x3fef7c97337b9b5f,
                0x3fefa4afa2a490da, 0x3fefd0765b6e4540,},
  .exp2f_poly = { /* rel err: 1.69 * 2^-34. */
                  0x1.c6af84b912394p-5 / Scale / Scale / Scale,
                  0x1.ebfce50fac4f3p-3 / Scale / Scale,
                  0x1.62e42ff0c52d6p-1 / Scale}};

static float32x4_t VPCS_ATTR NOINLINE
special_case (float32x4_t x, float32x4_t y, float32x4_t ret, uint32x4_t cmp)
{
  return v_call2_f32 (powf, x, y, ret, cmp);
}

float32x4_t VPCS_ATTR V_NAME_F2 (pow) (float32x4_t x, float32x4_t y)
{
  uint32x4_t u = vreinterpretq_u32_f32 (x);
  uint32x4_t cmp = vcgeq_u32 (vsubq_u32 (u, Min), Thresh);
  uint32x4_t tmp = vsubq_u32 (u, Off);
  uint32x4_t i = vandq_u32 (vshrq_n_u32 (tmp, (23 - V_POWF_LOG2_TABLE_BITS)),
                            Log2IdxMask);
  uint32x4_t top = vbicq_u32 (tmp, MantissaMask);
  uint32x4_t iz = vsubq_u32 (u, top);
  int32x4_t k = vshrq_n_s32 (vreinterpretq_s32_u32 (top),
                             23 - V_EXP2F_TABLE_BITS); /* arithmetic shift. */

  float32x4_t ret;
  for (int lane = 0; lane < 4; lane++)
    {
      /* Use double precision for each lane. */
      double invc = data.log2_tab[i[lane]].invc;
      double logc = data.log2_tab[i[lane]].logc;
      double z = (double) asfloat (iz[lane]);

      /* log2(x) = log1p(z/c-1)/ln2 + log2(c) + k. */
      double r = __builtin_fma (z, invc, -1.0);
      double y0 = logc + (double) k[lane];

      /* Polynomial to approximate log1p(r)/ln2. */
      double logx = A[0];
      logx = r * logx + A[1];
      logx = r * logx + A[2];
      logx = r * logx + A[3];
      logx = r * logx + y0;
      double ylogx = y[lane] * logx;
      cmp[lane] = (asuint64 (ylogx) >> 47 & 0xffff)
                          >= asuint64 (126.0 * (1 << V_EXP2F_TABLE_BITS)) >> 47
                      ? 1
                      : cmp[lane];

      /* N*x = k + r with r in [-1/2, 1/2]. */
      double kd = round (ylogx);
      uint64_t ki = lround (ylogx);
      r = ylogx - kd;

      /* exp2(x) = 2^(k/N) * 2^r ~= s * (C0*r^3 + C1*r^2 + C2*r + 1). */
      uint64_t t = data.exp2f_tab[ki % (1 << V_EXP2F_TABLE_BITS)];
      t += ki << (52 - V_EXP2F_TABLE_BITS);
      double s = asdouble (t);
      double p = C[0];
      p = __builtin_fma (p, r, C[1]);
      p = __builtin_fma (p, r, C[2]);
      p = __builtin_fma (p, s * r, s);

      ret[lane] = p;
    }
  if (unlikely (v_any_u32 (cmp)))
    return special_case (x, y, ret, cmp);
  return ret;
}
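The `exp2f_tab` lookup in the powf routine above builds 2^(k/N) from one table load and one integer add: each entry stores the bits of 2^(i/N) with `i << (52 - bits)` already subtracted, so adding `ki << (52 - bits)` restores the mantissa and adds floor(k/N) to the exponent field at once. The sketch below shows the idea with a two-entry table (N = 2) so that it stays short. `BITS`, `tab` and `exp2_scale` are hypothetical names, and the second entry equals `exp2f_tab[16]` from the source, since 16/32 = 1/2.

```c
#include <stdint.h>
#include <string.h>

static double
asdouble (uint64_t u)
{
  double f;
  memcpy (&f, &u, sizeof f);
  return f;
}

#define BITS 1 /* Illustrative; the source uses V_EXP2F_TABLE_BITS = 5. */

/* tab[i] = asuint64 (2^(i/N)) - (i << (52 - BITS)), with N = 1 << BITS. */
static const uint64_t tab[1 << BITS] = {
  0x3ff0000000000000ULL, /* asuint64 (1.0). */
  0x3feea09e667f3bcdULL, /* asuint64 (sqrt (2)) - (1 << 51). */
};

/* Returns 2^(k/N) for modest |k| (no overflow or subnormal handling). */
static double
exp2_scale (int64_t k)
{
  uint64_t ki = (uint64_t) k; /* Two's complement, as in the source. */
  uint64_t t = tab[ki % (1u << BITS)];
  /* The low BITS of ki undo the bias stored in the table entry; the
     remaining bits land in the exponent field. */
  t += ki << (52 - BITS);
  return asdouble (t);
}
```

Negative k works without a branch because the unsigned wrap-around of `ki` carries into the exponent field correctly: `exp2_scale (-2)` gives 0.5 and `exp2_scale (3)` gives exactly twice `exp2_scale (1)`.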
@@ -1,97 +0,0 @@
/*
 * Double-precision vector sin function.
 *
 * Copyright (c) 2019-2023, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 */

#include "mathlib.h"
#include "v_math.h"

static const struct data
{
  float64x2_t poly[7];
  float64x2_t range_val, inv_pi, shift, pi_1, pi_2, pi_3;
} data = {
  .poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7),
            V2 (-0x1.a01a019936f27p-13), V2 (0x1.71de37a97d93ep-19),
            V2 (-0x1.ae633919987c6p-26), V2 (0x1.60e277ae07cecp-33),
            V2 (-0x1.9e9540300a1p-41) },

  .range_val = V2 (0x1p23),
  .inv_pi = V2 (0x1.45f306dc9c883p-2),
  .pi_1 = V2 (0x1.921fb54442d18p+1),
  .pi_2 = V2 (0x1.1a62633145c06p-53),
  .pi_3 = V2 (0x1.c1cd129024e09p-106),
  .shift = V2 (0x1.8p52),
};

#if WANT_SIMD_EXCEPT
# define TinyBound v_u64 (0x3000000000000000) /* asuint64 (0x1p-255). */
# define Thresh v_u64 (0x1160000000000000)    /* RangeVal - TinyBound. */
#endif

#define C(i) d->poly[i]

static float64x2_t VPCS_ATTR NOINLINE
special_case (float64x2_t x, float64x2_t y, uint64x2_t odd, uint64x2_t cmp)
{
  y = vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd));
  return v_call_f64 (sin, x, y, cmp);
}

/* Vector (AdvSIMD) sin approximation.
   Maximum observed error in [-pi/2, pi/2], where argument is not reduced,
   is 2.87 ULP:
   _ZGVnN2v_sin (0x1.921d5c6a07142p+0) got 0x1.fffffffa7dc02p-1
                                      want 0x1.fffffffa7dc05p-1
   Maximum observed error in the entire non-special domain ([-2^23, 2^23])
   is 3.22 ULP:
   _ZGVnN2v_sin (0x1.5702447b6f17bp+22) got 0x1.ffdcd125c84fbp-3
                                       want 0x1.ffdcd125c84f8p-3. */
float64x2_t VPCS_ATTR V_NAME_D1 (sin) (float64x2_t x)
{
  const struct data *d = ptr_barrier (&data);
  float64x2_t n, r, r2, r3, r4, y, t1, t2, t3;
  uint64x2_t odd, cmp;

#if WANT_SIMD_EXCEPT
  /* Detect |x| <= TinyBound or |x| >= RangeVal. If fenv exceptions are to be
     triggered correctly, set any special lanes to 1 (which is neutral w.r.t.
     fenv). These lanes will be fixed by special-case handler later. */
  uint64x2_t ir = vreinterpretq_u64_f64 (vabsq_f64 (x));
  cmp = vcgeq_u64 (vsubq_u64 (ir, TinyBound), Thresh);
  r = vbslq_f64 (cmp, vreinterpretq_f64_u64 (cmp), x);
#else
  r = x;
  cmp = vcageq_f64 (x, d->range_val);
#endif

  /* n = rint(|x|/pi). */
  n = vfmaq_f64 (d->shift, d->inv_pi, r);
  odd = vshlq_n_u64 (vreinterpretq_u64_f64 (n), 63);
  n = vsubq_f64 (n, d->shift);

  /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2). */
  r = vfmsq_f64 (r, d->pi_1, n);
  r = vfmsq_f64 (r, d->pi_2, n);
  r = vfmsq_f64 (r, d->pi_3, n);

  /* sin(r) poly approx. */
  r2 = vmulq_f64 (r, r);
  r3 = vmulq_f64 (r2, r);
  r4 = vmulq_f64 (r2, r2);

  t1 = vfmaq_f64 (C (4), C (5), r2);
  t2 = vfmaq_f64 (C (2), C (3), r2);
  t3 = vfmaq_f64 (C (0), C (1), r2);

  y = vfmaq_f64 (t1, C (6), r4);
  y = vfmaq_f64 (t2, y, r4);
  y = vfmaq_f64 (t3, y, r4);
  y = vfmaq_f64 (r, y, r3);

  if (unlikely (v_any_u64 (cmp)))
    return special_case (x, y, odd, cmp);
  return vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd));
}
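The `n = rint(|x|/pi)` step in the sin routine above rounds by adding and subtracting the constant `0x1.8p52`. After the add, the sum lies in [2^52, 2^53), where doubles are spaced exactly 1 apart, so the hardware rounding performs the round-to-integer, and the lowest mantissa bit of the sum is the parity of n. A scalar sketch follows, assuming round-to-nearest mode and true double-precision evaluation; `round_shift` and `apply_sign` are hypothetical names, and the sketch takes v = x/pi as its input rather than fusing the multiply as the vector code does.

```c
#include <stdint.h>
#include <string.h>

static uint64_t
asuint64 (double f)
{
  uint64_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

static double
asdouble (uint64_t u)
{
  double f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Rounds v to the nearest integer for |v| well below 2^51. On return,
   *odd holds the parity of that integer moved into the sign-bit position. */
static double
round_shift (double v, uint64_t *odd)
{
  const double shift = 0x1.8p52;
  /* volatile keeps the intermediate in double precision on targets that
     would otherwise evaluate in a wider format. */
  volatile double t = v + shift;
  *odd = asuint64 (t) << 63;
  return t - shift;
}

/* Flips the sign of y when n was odd, since sin(r + n*pi) = (-1)^n sin(r). */
static double
apply_sign (double y, uint64_t odd)
{
  return asdouble (asuint64 (y) ^ odd);
}
```

Keeping the parity as a pre-shifted sign mask is what lets the routine finish with a single XOR instead of a compare and select.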
@@ -1,82 +0,0 @@
/*
 * Single-precision vector sin function.
 *
 * Copyright (c) 2019-2023, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 */

#include "mathlib.h"
#include "v_math.h"

static const struct data
{
  float32x4_t poly[4];
  float32x4_t range_val, inv_pi, shift, pi_1, pi_2, pi_3;
} data = {
  /* 1.886 ulp error. */
  .poly = { V4 (-0x1.555548p-3f), V4 (0x1.110df4p-7f), V4 (-0x1.9f42eap-13f),
            V4 (0x1.5b2e76p-19f) },

  .pi_1 = V4 (0x1.921fb6p+1f),
  .pi_2 = V4 (-0x1.777a5cp-24f),
  .pi_3 = V4 (-0x1.ee59dap-49f),

  .inv_pi = V4 (0x1.45f306p-2f),
  .shift = V4 (0x1.8p+23f),
  .range_val = V4 (0x1p20f)
};

#if WANT_SIMD_EXCEPT
# define TinyBound v_u32 (0x21000000) /* asuint32(0x1p-61f). */
# define Thresh v_u32 (0x28800000)    /* RangeVal - TinyBound. */
#endif

#define C(i) d->poly[i]

static float32x4_t VPCS_ATTR NOINLINE
special_case (float32x4_t x, float32x4_t y, uint32x4_t odd, uint32x4_t cmp)
{
  /* Fall back to scalar code. */
  y = vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd));
  return v_call_f32 (sinf, x, y, cmp);
}

float32x4_t VPCS_ATTR V_NAME_F1 (sin) (float32x4_t x)
{
  const struct data *d = ptr_barrier (&data);
  float32x4_t n, r, r2, y;
  uint32x4_t odd, cmp;

#if WANT_SIMD_EXCEPT
  uint32x4_t ir = vreinterpretq_u32_f32 (vabsq_f32 (x));
  cmp = vcgeq_u32 (vsubq_u32 (ir, TinyBound), Thresh);
  /* If fenv exceptions are to be triggered correctly, set any special lanes
     to 1 (which is neutral w.r.t. fenv). These lanes will be fixed by
     special-case handler later. */
  r = vbslq_f32 (cmp, vreinterpretq_f32_u32 (cmp), x);
#else
  r = x;
  cmp = vcageq_f32 (x, d->range_val);
#endif

  /* n = rint(|x|/pi) */
  n = vfmaq_f32 (d->shift, d->inv_pi, r);
  odd = vshlq_n_u32 (vreinterpretq_u32_f32 (n), 31);
  n = vsubq_f32 (n, d->shift);

  /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2) */
  r = vfmsq_f32 (r, d->pi_1, n);
  r = vfmsq_f32 (r, d->pi_2, n);
  r = vfmsq_f32 (r, d->pi_3, n);

  /* y = sin(r) */
  r2 = vmulq_f32 (r, r);
  y = vfmaq_f32 (C (2), C (3), r2);
  y = vfmaq_f32 (C (1), y, r2);
  y = vfmaq_f32 (C (0), y, r2);
  y = vfmaq_f32 (r, vmulq_f32 (y, r2), r);

  if (unlikely (v_any_u32 (cmp)))
    return special_case (x, y, odd, cmp);
  return vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd));
}
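The `y = sin(r)` step in the single-precision routine above evaluates an odd polynomial r + r^3 (C0 + C1 r^2 + C2 r^4 + C3 r^6) by Horner's rule in r^2. Below is a scalar sketch using the four coefficients from the data block; `sin_poly` is a hypothetical name, and the sketch uses separate multiplies and adds where the vector code uses fused multiply-adds, so the last bit can differ from the vector result.

```c
/* Approximates sin(r) for r already reduced to roughly [-pi/2, pi/2]. */
static float
sin_poly (float r)
{
  /* Coefficients copied from the .poly initializer above; they are close
     to -1/3!, 1/5!, -1/7! and 1/9!. */
  static const float c[4] = { -0x1.555548p-3f, 0x1.110df4p-7f,
                              -0x1.9f42eap-13f, 0x1.5b2e76p-19f };
  float r2 = r * r;
  /* Horner evaluation in r2, highest-order coefficient first. */
  float y = c[2] + c[3] * r2;
  y = c[1] + y * r2;
  y = c[0] + y * r2;
  /* The leading r term is added last so that it dominates the rounding. */
  return r + (y * r2) * r;
}
```

Because the polynomial contains only odd powers, `sin_poly (-r)` is the exact negation of `sin_poly (r)`, matching the symmetry of sin itself.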
+3
-3
@@ -1,8 +1,8 @@
 /*
  * Single-precision cos function.
  *
- * Copyright (c) 2018-2021, Arm Limited.
- * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
+ * Copyright (c) 2018-2019, Arm Limited.
+ * SPDX-License-Identifier: MIT
  */

 #include <stdint.h>
@@ -22,7 +22,7 @@ cosf (float y)
   int n;
   const sincos_t *p = &__sincosf_table[0];

-  if (abstop12 (y) < abstop12 (pio4f))
+  if (abstop12 (y) < abstop12 (pio4))
     {
       double x2 = x * x;

+1
-1
@@ -2,7 +2,7 @@
  * Double-precision erf(x) function.
  *
  * Copyright (c) 2020, Arm Limited.
- * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
+ * SPDX-License-Identifier: MIT
  */

 #include "math_config.h"
+1
-1
@@ -2,7 +2,7 @@
  * Shared data between erf and erfc.
  *
  * Copyright (c) 2019-2020, Arm Limited.
- * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
+ * SPDX-License-Identifier: MIT
  */

 #include "math_config.h"
+1
-1
@@ -2,7 +2,7 @@
  * Single-precision erf(x) function.
  *
  * Copyright (c) 2020, Arm Limited.
- * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
+ * SPDX-License-Identifier: MIT
  */

 #include <stdint.h>
+1
-1
@@ -2,7 +2,7 @@
  * Data for approximation of erff.
  *
  * Copyright (c) 2019-2020, Arm Limited.
- * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
+ * SPDX-License-Identifier: MIT
  */

 #include "math_config.h"
+1
-1
@@ -2,7 +2,7 @@
  * Double-precision e^x function.
  *
  * Copyright (c) 2018-2019, Arm Limited.
- * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
+ * SPDX-License-Identifier: MIT
  */

 #include <float.h>
-129
@@ -1,129 +0,0 @@
/*
 * Double-precision 10^x function.
 *
 * Copyright (c) 2023, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 */

#include "math_config.h"

#define N (1 << EXP_TABLE_BITS)
#define IndexMask (N - 1)
#define OFlowBound 0x1.34413509f79ffp8 /* log10(DBL_MAX). */
#define UFlowBound -0x1.5ep+8 /* -350. */
#define SmallTop 0x3c6 /* top12(0x1p-57). */
#define BigTop 0x407 /* top12(0x1p8). */
#define Thresh 0x41 /* BigTop - SmallTop. */
#define Shift __exp_data.shift
#define C(i) __exp_data.exp10_poly[i]

static double
special_case (uint64_t sbits, double_t tmp, uint64_t ki)
{
  double_t scale, y;

  if (ki - (1ull << 16) < 0x80000000)
    {
      /* The exponent of scale might have overflowed by 1. */
      sbits -= 1ull << 52;
      scale = asdouble (sbits);
      y = 2 * (scale + scale * tmp);
      return check_oflow (eval_as_double (y));
    }

  /* n < 0, need special care in the subnormal range. */
  sbits += 1022ull << 52;
  scale = asdouble (sbits);
  y = scale + scale * tmp;

  if (y < 1.0)
    {
      /* Round y to the right precision before scaling it into the subnormal
         range to avoid double rounding that can cause 0.5+E/2 ulp error where
         E is the worst-case ulp error outside the subnormal range. So this
         is only useful if the goal is better than 1 ulp worst-case error. */
      double_t lo = scale - y + scale * tmp;
      double_t hi = 1.0 + y;
      lo = 1.0 - hi + y + lo;
      y = eval_as_double (hi + lo) - 1.0;
      /* Avoid -0.0 with downward rounding. */
      if (WANT_ROUNDING && y == 0.0)
        y = 0.0;
      /* The underflow exception needs to be signaled explicitly. */
      force_eval_double (opt_barrier_double (0x1p-1022) * 0x1p-1022);
    }
  y = 0x1p-1022 * y;

  return check_uflow (y);
}

/* Double-precision 10^x approximation. Largest observed error is ~0.513 ULP. */
double
exp10 (double x)
{
  uint64_t ix = asuint64 (x);
  uint32_t abstop = (ix >> 52) & 0x7ff;

  if (unlikely (abstop - SmallTop >= Thresh))
    {
      if (abstop - SmallTop >= 0x80000000)
        /* Avoid spurious underflow for tiny x.
           Note: 0 is common input. */
        return x + 1;
      if (abstop == 0x7ff)
        return ix == asuint64 (-INFINITY) ? 0.0 : x + 1.0;
      if (x >= OFlowBound)
        return __math_oflow (0);
      if (x < UFlowBound)
        return __math_uflow (0);

      /* Large x is special-cased below. */
      abstop = 0;
    }

  /* Reduce x: z = x * N / log10(2), k = round(z). */
  double_t z = __exp_data.invlog10_2N * x;
  double_t kd;
  int64_t ki;
#if TOINT_INTRINSICS
  kd = roundtoint (z);
  ki = converttoint (z);
#else
  kd = eval_as_double (z + Shift);
  kd -= Shift;
  ki = kd;
#endif

  /* r = x - k * log10(2), r in [-0.5, 0.5]. */
  double_t r = x;
  r = __exp_data.neglog10_2hiN * kd + r;
  r = __exp_data.neglog10_2loN * kd + r;

  /* exp10(x) = 2^(k/N) * 2^(r/N).
     Approximate the two components separately. */

  /* s = 2^(k/N), using lookup table. */
  uint64_t e = ki << (52 - EXP_TABLE_BITS);
  uint64_t i = (ki & IndexMask) * 2;
  uint64_t u = __exp_data.tab[i + 1];
  uint64_t sbits = u + e;

  double_t tail = asdouble (__exp_data.tab[i]);

  /* 2^(r/N) ~= 1 + r * Poly(r). */
  double_t r2 = r * r;
  double_t p = C (0) + r * C (1);
  double_t y = C (2) + r * C (3);
  y = y + r2 * C (4);
  y = p + r2 * y;
  y = tail + y * r;

  if (unlikely (abstop == 0))
    return special_case (sbits, y, ki);

  /* Assemble components:
     y = 2^(r/N) * 2^(k/N)
     ~= (y + 1) * s. */
  double_t s = asdouble (sbits);
  return eval_as_double (s * y + s);
}
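The reduction step of the removed `exp10` (the `!TOINT_INTRINSICS` branch) can be tried in isolation. This is a minimal sketch, not the library code: it assumes N = 128 (consistent with the `log10(2)/256` Remez interval quoted in the coefficient comments) and a shift constant of `0x1.8p52`, neither of which is stated in this diff. The names `reduce_exp10`, `struct reduced`, and `SKETCH_N` are hypothetical.

```c
#include <stdint.h>

/* Assumed table size, i.e. 1 << EXP_TABLE_BITS with EXP_TABLE_BITS == 7. */
#define SKETCH_N 128

/* 1/log10(2), the factor that appears in .invlog10_2N in the diff. */
static const double InvLog10_2 = 0x1.a934f0979a371p1;
/* log10(2) as a decimal literal. */
static const double Log10_2 = 0.30102999566398120;
/* Assumed shift: adding and subtracting 1.5 * 2^52 rounds to an integer
   in the default round-to-nearest mode for moderately sized inputs. */
static const double SketchShift = 0x1.8p52;

struct reduced
{
  int64_t k; /* Rounded multiple of log10(2)/N. */
  double r;  /* Remainder, |r| <= log10(2) / (2 * N). */
};

/* Split x so that 10^x = 2^(k/N) * 10^r. */
struct reduced
reduce_exp10 (double x)
{
  /* volatile keeps the compiler from folding the add/subtract pair away. */
  volatile double shifted = x * InvLog10_2 * SKETCH_N + SketchShift;
  double kd = shifted - SketchShift;
  struct reduced out = { (int64_t) kd, x - kd * (Log10_2 / SKETCH_N) };
  return out;
}
```

For `x = 2.0` this yields `k = 850` and a small positive `r`. The library version differs in that it subtracts `k * log10(2)/N` in two pieces (`neglog10_2hiN` and `neglog10_2loN`) to keep `r` accurate; the single multiplication here is a simplification.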
+1
-1
@@ -2,7 +2,7 @@
 * Double-precision 2^x function.
 *
 * Copyright (c) 2018-2019, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include <float.h>
+1
-1
@@ -2,7 +2,7 @@
 * Single-precision 2^x function.
 *
 * Copyright (c) 2017-2018, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include <math.h>
+1
-1
@@ -2,7 +2,7 @@
 * Shared data between expf, exp2f and powf.
 *
 * Copyright (c) 2017-2018, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include "math_config.h"
+1
-22
@@ -2,7 +2,7 @@
 * Shared data between exp, exp2 and pow.
 *
 * Copyright (c) 2018, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include "math_config.h"
@@ -12,7 +12,6 @@
const struct exp_data __exp_data = {
// N/ln2
.invln2N = 0x1.71547652b82fep0 * N,
.invlog10_2N = 0x1.a934f0979a371p1 * N,
// -ln2/N
#if N == 64
.negln2hiN = -0x1.62e42fefa0000p-7,
@@ -27,8 +26,6 @@ const struct exp_data __exp_data = {
.negln2hiN = -0x1.62e42fef80000p-10,
.negln2loN = -0x1.1cf79abc9e3b4p-45,
#endif
.neglog10_2hiN = -0x1.3441350ap-2 / N,
.neglog10_2loN = 0x1.0c0219dc1da99p-39 / N,
// Used for rounding when !TOINT_INTRINSICS
#if EXP_USE_TOINT_NARROW
.shift = 0x1800000000.8p0,
@@ -150,24 +147,6 @@ const struct exp_data __exp_data = {
0x1.3b2ab786ee1dap-7,
#endif
},
.exp10_poly = {
#if EXP10_POLY_WIDE
/* Range is wider if using shift-based reduction: coeffs generated
   using Remez in [-log10(2)/128, log10(2)/128 ]. */
0x1.26bb1bbb55515p1,
0x1.53524c73cd32bp1,
0x1.0470591e1a108p1,
0x1.2bd77b12fe9a8p0,
0x1.14289fef24b78p-1
#else
/* Coeffs generated using Remez in [-log10(2)/256, log10(2)/256 ]. */
0x1.26bb1bbb55516p1,
0x1.53524c73ce9fep1,
0x1.0470591ce4b26p1,
0x1.2bd76577fe684p0,
0x1.1446eeccd0efbp-1
#endif
},
// 2^(k/N) ~= H[k]*(1 + T[k]) for int k in [0,N)
// tab[2*k] = asuint64(T[k])
// tab[2*k+1] = asuint64(H[k]) - (k << 52)/N
+1
-1
@@ -2,7 +2,7 @@
 * Single-precision e^x function.
 *
 * Copyright (c) 2017-2019, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include <math.h>
+55
-14
@@ -1,8 +1,8 @@
/*
 * Public API.
 *
 * Copyright (c) 2015-2023, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * Copyright (c) 2015-2020, Arm Limited.
 * SPDX-License-Identifier: MIT
 */

#ifndef _MATHLIB_H
@@ -18,33 +18,74 @@ float cosf (float);
void sincosf (float, float*, float*);

double exp (double);
double exp10 (double);
double exp2 (double);
double log (double);
double log2 (double);
double pow (double, double);

/* Scalar functions using the vector algorithm with identical result. */
float __s_sinf (float);
float __s_cosf (float);
float __s_expf (float);
float __s_expf_1u (float);
float __s_exp2f (float);
float __s_exp2f_1u (float);
float __s_logf (float);
float __s_powf (float, float);
double __s_sin (double);
double __s_cos (double);
double __s_exp (double);
double __s_log (double);
double __s_pow (double, double);

#if __aarch64__
# if __GNUC__ >= 5
#if __GNUC__ >= 5
typedef __Float32x4_t __f32x4_t;
typedef __Float64x2_t __f64x2_t;
# elif __clang_major__*100+__clang_minor__ >= 305
#elif __clang_major__*100+__clang_minor__ >= 305
typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
typedef __attribute__((__neon_vector_type__(2))) double __f64x2_t;
# else
# error Unsupported compiler
# endif
#else
#error Unsupported compiler
#endif

# if __GNUC__ >= 9 || __clang_major__ >= 8
# undef __vpcs
# define __vpcs __attribute__((__aarch64_vector_pcs__))
/* Vector functions following the base PCS. */
__f32x4_t __v_sinf (__f32x4_t);
__f32x4_t __v_cosf (__f32x4_t);
__f32x4_t __v_expf (__f32x4_t);
__f32x4_t __v_expf_1u (__f32x4_t);
__f32x4_t __v_exp2f (__f32x4_t);
__f32x4_t __v_exp2f_1u (__f32x4_t);
__f32x4_t __v_logf (__f32x4_t);
__f32x4_t __v_powf (__f32x4_t, __f32x4_t);
__f64x2_t __v_sin (__f64x2_t);
__f64x2_t __v_cos (__f64x2_t);
__f64x2_t __v_exp (__f64x2_t);
__f64x2_t __v_log (__f64x2_t);
__f64x2_t __v_pow (__f64x2_t, __f64x2_t);

#if __GNUC__ >= 9 || __clang_major__ >= 8
#define __vpcs __attribute__((__aarch64_vector_pcs__))

/* Vector functions following the vector PCS. */
__vpcs __f32x4_t __vn_sinf (__f32x4_t);
__vpcs __f32x4_t __vn_cosf (__f32x4_t);
__vpcs __f32x4_t __vn_expf (__f32x4_t);
__vpcs __f32x4_t __vn_expf_1u (__f32x4_t);
__vpcs __f32x4_t __vn_exp2f (__f32x4_t);
__vpcs __f32x4_t __vn_exp2f_1u (__f32x4_t);
__vpcs __f32x4_t __vn_logf (__f32x4_t);
__vpcs __f32x4_t __vn_powf (__f32x4_t, __f32x4_t);
__vpcs __f64x2_t __vn_sin (__f64x2_t);
__vpcs __f64x2_t __vn_cos (__f64x2_t);
__vpcs __f64x2_t __vn_exp (__f64x2_t);
__vpcs __f64x2_t __vn_log (__f64x2_t);
__vpcs __f64x2_t __vn_pow (__f64x2_t, __f64x2_t);

/* Vector functions following the vector PCS using ABI names. */
__vpcs __f32x4_t _ZGVnN4v_sinf (__f32x4_t);
__vpcs __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
__vpcs __f32x4_t _ZGVnN4v_expf_1u (__f32x4_t);
__vpcs __f32x4_t _ZGVnN4v_expf (__f32x4_t);
__vpcs __f32x4_t _ZGVnN4v_exp2f_1u (__f32x4_t);
__vpcs __f32x4_t _ZGVnN4v_exp2f (__f32x4_t);
__vpcs __f32x4_t _ZGVnN4v_logf (__f32x4_t);
__vpcs __f32x4_t _ZGVnN4vv_powf (__f32x4_t, __f32x4_t);
@@ -53,7 +94,7 @@ __vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t);
__vpcs __f64x2_t _ZGVnN2v_exp (__f64x2_t);
__vpcs __f64x2_t _ZGVnN2v_log (__f64x2_t);
__vpcs __f64x2_t _ZGVnN2vv_pow (__f64x2_t, __f64x2_t);
# endif
#endif
#endif

#endif
+1
-1
@@ -2,7 +2,7 @@
 * Double-precision log(x) function.
 *
 * Copyright (c) 2018-2019, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include <float.h>
+1
-1
@@ -2,7 +2,7 @@
 * Double-precision log2(x) function.
 *
 * Copyright (c) 2018-2019, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include <float.h>
+1
-1
@@ -2,7 +2,7 @@
 * Data for log2.
 *
 * Copyright (c) 2018, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include "math_config.h"
+1
-1
@@ -2,7 +2,7 @@
 * Single-precision log2 function.
 *
 * Copyright (c) 2017-2018, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include <math.h>
+1
-1
@@ -2,7 +2,7 @@
 * Data definition for log2f.
 *
 * Copyright (c) 2017-2018, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include "math_config.h"
+1
-1
@@ -2,7 +2,7 @@
 * Data for log.
 *
 * Copyright (c) 2018, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include "math_config.h"
+3
-3
@@ -1,8 +1,8 @@
/*
 * Single-precision log function.
 *
 * Copyright (c) 2017-2023, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * Copyright (c) 2017-2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */

#include <math.h>
@@ -57,7 +57,7 @@ logf (float x)
  tmp = ix - OFF;
  i = (tmp >> (23 - LOGF_TABLE_BITS)) % N;
  k = (int32_t) tmp >> 23; /* arithmetic shift */
  iz = ix - (tmp & 0xff800000);
  iz = ix - (tmp & 0x1ff << 23);
  invc = T[i].invc;
  logc = T[i].logc;
  z = (double_t) asfloat (iz);
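The two `iz` lines in the `logf` hunk select the same bits: the sign and the 8 exponent bits of a binary32 value. They differ in how the mask is spelled. `0x1ff << 23` shifts an `int`, and with a 32-bit `int` the result does not fit, which is undefined behaviour for a signed left shift; the literal `0xff800000` avoids that. A small sketch of the same mask built with an unsigned operand follows (the helper names are hypothetical, not from the library):

```c
#include <stdint.h>

/* Mask covering the sign bit and the 8 exponent bits of a binary32 value.
   The operand is unsigned, so the shift is well defined. */
static inline uint32_t
top9_mask (void)
{
  return UINT32_C (0x1ff) << 23;
}

/* Clear the 23 mantissa bits, as the hunk does with tmp. */
static inline uint32_t
keep_sign_and_exponent (uint32_t bits)
{
  return bits & top9_mask ();
}
```

`top9_mask ()` evaluates to `0xff800000`, so either spelling masks the same field; only the unsigned or literal form does so without relying on signed overflow.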
+1
-1
@@ -2,7 +2,7 @@
 * Data definition for logf.
 *
 * Copyright (c) 2017-2019, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include "math_config.h"
+2
-32
@@ -1,8 +1,8 @@
/*
 * Configuration for math routines.
 *
 * Copyright (c) 2017-2023, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * Copyright (c) 2017-2020, Arm Limited.
 * SPDX-License-Identifier: MIT
 */

#ifndef _MATH_CONFIG_H
@@ -92,17 +92,6 @@
# define unlikely(x) (x)
#endif

/* Return ptr but hide its value from the compiler so accesses through it
   cannot be optimized based on the contents. */
#define ptr_barrier(ptr) \
  ({ \
    __typeof (ptr) __ptr = (ptr); \
    __asm("" : "+r"(__ptr)); \
    __ptr; \
  })

/* Symbol renames to avoid libc conflicts. */

#if HAVE_FAST_ROUND
/* When set, the roundtoint and converttoint functions are provided with
   the semantics documented below. */
@@ -392,22 +381,15 @@ extern const struct powf_log2_data
#define EXP_USE_TOINT_NARROW 0
#define EXP2_POLY_ORDER 5
#define EXP2_POLY_WIDE 0
/* Wider exp10 polynomial necessary for good precision in non-nearest rounding
   and !TOINT_INTRINSICS. */
#define EXP10_POLY_WIDE 0
extern const struct exp_data
{
  double invln2N;
  double invlog10_2N;
  double shift;
  double negln2hiN;
  double negln2loN;
  double neglog10_2hiN;
  double neglog10_2loN;
  double poly[4]; /* Last four coefficients. */
  double exp2_shift;
  double exp2_poly[EXP2_POLY_ORDER];
  double exp10_poly[5];
  uint64_t tab[2*(1 << EXP_TABLE_BITS)];
} __exp_data HIDDEN;

@@ -477,16 +459,4 @@ extern const struct erf_data
  double erfc_poly_F[ERFC_POLY_F_NCOEFFS];
} __erf_data HIDDEN;

#define V_EXP_TABLE_BITS 7
extern const uint64_t __v_exp_data[1 << V_EXP_TABLE_BITS] HIDDEN;

#define V_LOG_TABLE_BITS 7
extern const struct v_log_data
{
  struct
  {
    double invc, logc;
  } table[1 << V_LOG_TABLE_BITS];
} __v_log_data HIDDEN;

#endif
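The `ptr_barrier` macro removed in the `math_config.h` hunk above passes a pointer through an empty inline-asm statement so the compiler loses track of what it points to. The following is a rough sketch of how such a barrier might be used, assuming a GNU C compatible compiler (statement expressions, `__typeof`, extended asm). The macro is renamed `ptr_barrier_sketch`, and `sketch_table` and `lookup` are made up for illustration.

```c
#include <stddef.h>

/* Same shape as the removed macro; GNU C extensions are required. */
#define ptr_barrier_sketch(ptr) \
  ({ \
    __typeof (ptr) p_ = (ptr); \
    __asm ("" : "+r"(p_)); \
    p_; \
  })

/* Hypothetical constant table standing in for library data. */
static const double sketch_table[4] = { 1.0, 2.0, 4.0, 8.0 };

double
lookup (size_t i)
{
  /* After the barrier the compiler cannot assume d points at sketch_table,
     so loads through d are not folded into immediate constants. */
  const double *d = ptr_barrier_sketch (&sketch_table[0]);
  return d[i & 3];
}
```

The value returned by the macro is the same pointer, so behaviour is unchanged; only the optimizer's knowledge of the pointee is hidden.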
+1
-1
@@ -2,7 +2,7 @@
 * Double-precision math error handling.
 *
 * Copyright (c) 2018, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include "math_config.h"
+1
-1
@@ -2,7 +2,7 @@
 * Single-precision math error handling.
 *
 * Copyright (c) 2017-2020, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include "math_config.h"
+1
-1
@@ -2,7 +2,7 @@
 * Double-precision x^y function.
 *
 * Copyright (c) 2018-2020, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include <float.h>
+1
-1
@@ -2,7 +2,7 @@
 * Data for the log part of pow.
 *
 * Copyright (c) 2018, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include "math_config.h"
+1
-1
@@ -2,7 +2,7 @@
 * Single-precision pow function.
 *
 * Copyright (c) 2017-2019, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include <math.h>
@@ -2,7 +2,7 @@
 * Data definition for powf.
 *
 * Copyright (c) 2017-2019, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include "math_config.h"
@@ -0,0 +1,6 @@
/*
 * Copyright (c) 2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */
#define SCALAR 1
#include "v_cos.c"
@@ -0,0 +1,6 @@
/*
 * Copyright (c) 2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */
#define SCALAR 1
#include "v_cosf.c"
@@ -0,0 +1,6 @@
/*
 * Copyright (c) 2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */
#define SCALAR 1
#include "v_exp.c"
@@ -0,0 +1,6 @@
/*
 * Copyright (c) 2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */
#define SCALAR 1
#include "v_exp2f.c"
@@ -0,0 +1,6 @@
/*
 * Copyright (c) 2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */
#define SCALAR 1
#include "v_exp2f_1u.c"
@@ -0,0 +1,6 @@
/*
 * Copyright (c) 2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */
#define SCALAR 1
#include "v_expf.c"
@@ -0,0 +1,6 @@
/*
 * Copyright (c) 2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */
#define SCALAR 1
#include "v_expf_1u.c"
@@ -0,0 +1,6 @@
/*
 * Copyright (c) 2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */
#define SCALAR 1
#include "v_log.c"
@@ -0,0 +1,6 @@
/*
 * Copyright (c) 2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */
#define SCALAR 1
#include "v_logf.c"
@@ -0,0 +1,6 @@
/*
 * Copyright (c) 2020, Arm Limited.
 * SPDX-License-Identifier: MIT
 */
#define SCALAR 1
#include "v_pow.c"
@@ -0,0 +1,6 @@
/*
 * Copyright (c) 2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */
#define SCALAR 1
#include "v_powf.c"
@@ -0,0 +1,6 @@
/*
 * Copyright (c) 2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */
#define SCALAR 1
#include "v_sin.c"
@@ -0,0 +1,6 @@
/*
 * Copyright (c) 2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */
#define SCALAR 1
#include "v_sinf.c"
+3
-3
@@ -1,8 +1,8 @@
/*
 * Single-precision sin/cos function.
 *
 * Copyright (c) 2018-2021, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * Copyright (c) 2018-2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */

#include <stdint.h>
@@ -22,7 +22,7 @@ sincosf (float y, float *sinp, float *cosp)
  int n;
  const sincos_t *p = &__sincosf_table[0];

  if (abstop12 (y) < abstop12 (pio4f))
  if (abstop12 (y) < abstop12 (pio4))
    {
      double x2 = x * x;
+3
-3
@@ -1,8 +1,8 @@
/*
 * Header for sinf, cosf and sincosf.
 *
 * Copyright (c) 2018-2021, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * Copyright (c) 2018, Arm Limited.
 * SPDX-License-Identifier: MIT
 */

#include <stdint.h>
@@ -12,7 +12,7 @@
/* 2PI * 2^-64. */
static const double pi63 = 0x1.921FB54442D18p-62;
/* PI / 4. */
static const float pio4f = 0x1.921FB6p-1f;
static const double pio4 = 0x1.921FB54442D18p-1;

/* The constants and polynomials for sine and cosine. */
typedef struct
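The `pio4f` and `pio4` lines in the hunk above differ only in type, and the `sinf`/`sincosf` hunks pass whichever one is defined to `abstop12`. The sketch below assumes `abstop12` takes a `float` and returns the top 12 bits of its representation with the sign cleared; the helper is not shown in this diff, so `abstop12_sketch` is a stand-in. Under that assumption the `double` constant is converted to `float` at the call and both spellings produce the same comparison value.

```c
#include <stdint.h>
#include <string.h>

/* The two constants from the sincosf.h hunk. */
static const float pio4f = 0x1.921FB6p-1f;
static const double pio4 = 0x1.921FB54442D18p-1;

/* Hypothetical stand-in for the library helper: top 12 bits of the
   binary32 representation, sign bit masked off. */
static inline uint32_t
abstop12_sketch (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u); /* Bit copy without aliasing problems. */
  return (u >> 20) & 0x7ff;
}
```

Calling `abstop12_sketch (pio4)` converts the argument to `float` first, and the nearest `float` to `pio4` is exactly `pio4f`, so the threshold is the same in both versions of the code.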
+1
-1
@@ -2,7 +2,7 @@
 * Data definition for sinf, cosf and sincosf.
 *
 * Copyright (c) 2018-2019, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * SPDX-License-Identifier: MIT
 */

#include <stdint.h>
+3
-3
@@ -1,8 +1,8 @@
/*
 * Single-precision sin function.
 *
 * Copyright (c) 2018-2021, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * Copyright (c) 2018-2019, Arm Limited.
 * SPDX-License-Identifier: MIT
 */

#include <math.h>
@@ -21,7 +21,7 @@ sinf (float y)
  int n;
  const sincos_t *p = &__sincosf_table[0];

  if (abstop12 (y) < abstop12 (pio4f))
  if (abstop12 (y) < abstop12 (pio4))
    {
      s = x * x;
+265
-134
@@ -1,8 +1,8 @@
/*
 * Microbenchmark for math functions.
 *
 * Copyright (c) 2018-2022, Arm Limited.
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 * Copyright (c) 2018-2020, Arm Limited.
 * SPDX-License-Identifier: MIT
 */

#undef _GNU_SOURCE
@@ -15,6 +15,11 @@
#include <math.h>
#include "mathlib.h"

#ifndef WANT_VMATH
/* Enable the build of vector math code. */
# define WANT_VMATH 1
#endif

/* Number of measurements, best result is reported. */
#define MEASURE 60
/* Array size. */
@@ -29,9 +34,8 @@ static float Af[N];
static long measurecount = MEASURE;
static long itercount = ITER;

#ifdef __vpcs
#include <arm_neon.h>
typedef float64x2_t v_double;
#if __aarch64__ && WANT_VMATH
typedef __f64x2_t v_double;

#define v_double_len() 2
@@ -47,7 +51,7 @@ v_double_dup (double x)
  return (v_double){x, x};
}

typedef float32x4_t v_float;
typedef __f32x4_t v_float;

#define v_float_len() 4
@@ -72,49 +76,6 @@ typedef float v_float;
#define v_float_len(x) 1
#define v_float_load(x) (x)[0]
#define v_float_dup(x) (x)

#endif

#if WANT_SVE_MATH
#include <arm_sve.h>
typedef svbool_t sv_bool;
typedef svfloat64_t sv_double;

#define sv_double_len() svcntd()

static inline sv_double
sv_double_load (const double *p)
{
  svbool_t pg = svptrue_b64();
  return svld1(pg, p);
}

static inline sv_double
sv_double_dup (double x)
{
  return svdup_n_f64(x);
}

typedef svfloat32_t sv_float;

#define sv_float_len() svcntw()

static inline sv_float
sv_float_load (const float *p)
{
  svbool_t pg = svptrue_b32();
  return svld1(pg, p);
}

static inline sv_float
sv_float_dup (float x)
{
  return svdup_n_f32(x);
}
#else
/* dummy definitions to make things compile. */
#define sv_double_len(x) 1
#define sv_float_len(x) 1
#endif

static double
@@ -128,6 +89,21 @@ dummyf (float x)
{
  return x;
}

#if WANT_VMATH
#if __aarch64__
static v_double
__v_dummy (v_double x)
{
  return x;
}

static v_float
__v_dummyf (v_float x)
{
  return x;
}

#ifdef __vpcs
__vpcs static v_double
__vn_dummy (v_double x)
@@ -140,23 +116,101 @@ __vn_dummyf (v_float x)
{
  return x;
}
#endif
#if WANT_SVE_MATH
static sv_double
__sv_dummy (sv_double x, sv_bool pg)

__vpcs static v_float
xy__vn_powf (v_float x)
{
  return x;
  return __vn_powf (x, x);
}

static sv_float
__sv_dummyf (sv_float x, sv_bool pg)
__vpcs static v_float
xy_Z_powf (v_float x)
{
  return x;
  return _ZGVnN4vv_powf (x, x);
}

__vpcs static v_double
xy__vn_pow (v_double x)
{
  return __vn_pow (x, x);
}

__vpcs static v_double
xy_Z_pow (v_double x)
{
  return _ZGVnN2vv_pow (x, x);
}
#endif

#include "test/mathbench_wrappers.h"
static v_float
xy__v_powf (v_float x)
{
  return __v_powf (x, x);
}

static v_double
xy__v_pow (v_double x)
{
  return __v_pow (x, x);
}
#endif

static float
xy__s_powf (float x)
{
  return __s_powf (x, x);
}

static double
xy__s_pow (double x)
{
  return __s_pow (x, x);
}
#endif

static double
xypow (double x)
{
  return pow (x, x);
}

static float
xypowf (float x)
{
  return powf (x, x);
}

static double
xpow (double x)
{
  return pow (x, 23.4);
}

static float
xpowf (float x)
{
  return powf (x, 23.4f);
}

static double
ypow (double x)
{
  return pow (2.34, x);
}

static float
ypowf (float x)
{
  return powf (2.34f, x);
}

static float
sincosf_wrap (float x)
{
  float s, c;
  sincosf (x, &s, &c);
  return s + c;
}

static const struct fun
{
@@ -169,40 +223,127 @@ static const struct fun
  {
    double (*d) (double);
    float (*f) (float);
    v_double (*vd) (v_double);
    v_float (*vf) (v_float);
#ifdef __vpcs
    __vpcs v_double (*vnd) (v_double);
    __vpcs v_float (*vnf) (v_float);
#endif
#if WANT_SVE_MATH
    sv_double (*svd) (sv_double, sv_bool);
    sv_float (*svf) (sv_float, sv_bool);
#endif
  } fun;
} funtab[] = {
#define D(func, lo, hi) {#func, 'd', 0, lo, hi, {.d = func}},
#define F(func, lo, hi) {#func, 'f', 0, lo, hi, {.f = func}},
#define VD(func, lo, hi) {#func, 'd', 'v', lo, hi, {.vd = func}},
#define VF(func, lo, hi) {#func, 'f', 'v', lo, hi, {.vf = func}},
#define VND(func, lo, hi) {#func, 'd', 'n', lo, hi, {.vnd = func}},
#define VNF(func, lo, hi) {#func, 'f', 'n', lo, hi, {.vnf = func}},
#define SVD(func, lo, hi) {#func, 'd', 's', lo, hi, {.svd = func}},
#define SVF(func, lo, hi) {#func, 'f', 's', lo, hi, {.svf = func}},
D (dummy, 1.0, 2.0)
D (exp, -9.9, 9.9)
D (exp, 0.5, 1.0)
D (exp2, -9.9, 9.9)
D (log, 0.01, 11.1)
D (log, 0.999, 1.001)
D (log2, 0.01, 11.1)
D (log2, 0.999, 1.001)
{"pow", 'd', 0, 0.01, 11.1, {.d = xypow}},
D (xpow, 0.01, 11.1)
D (ypow, -9.9, 9.9)
D (erf, -6.0, 6.0)

F (dummyf, 1.0, 2.0)
F (expf, -9.9, 9.9)
F (exp2f, -9.9, 9.9)
F (logf, 0.01, 11.1)
F (log2f, 0.01, 11.1)
{"powf", 'f', 0, 0.01, 11.1, {.f = xypowf}},
F (xpowf, 0.01, 11.1)
F (ypowf, -9.9, 9.9)
{"sincosf", 'f', 0, 0.1, 0.7, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 0.8, 3.1, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, -3.1, 3.1, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 3.3, 33.3, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 100, 1000, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 1e6, 1e32, {.f = sincosf_wrap}},
F (sinf, 0.1, 0.7)
F (sinf, 0.8, 3.1)
F (sinf, -3.1, 3.1)
F (sinf, 3.3, 33.3)
F (sinf, 100, 1000)
F (sinf, 1e6, 1e32)
F (cosf, 0.1, 0.7)
F (cosf, 0.8, 3.1)
F (cosf, -3.1, 3.1)
F (cosf, 3.3, 33.3)
F (cosf, 100, 1000)
F (cosf, 1e6, 1e32)
F (erff, -4.0, 4.0)
#if WANT_VMATH
D (__s_sin, -3.1, 3.1)
D (__s_cos, -3.1, 3.1)
D (__s_exp, -9.9, 9.9)
D (__s_log, 0.01, 11.1)
{"__s_pow", 'd', 0, 0.01, 11.1, {.d = xy__s_pow}},
F (__s_expf, -9.9, 9.9)
F (__s_expf_1u, -9.9, 9.9)
F (__s_exp2f, -9.9, 9.9)
F (__s_exp2f_1u, -9.9, 9.9)
F (__s_logf, 0.01, 11.1)
{"__s_powf", 'f', 0, 0.01, 11.1, {.f = xy__s_powf}},
F (__s_sinf, -3.1, 3.1)
F (__s_cosf, -3.1, 3.1)
#if __aarch64__
VD (__v_dummy, 1.0, 2.0)
VD (__v_sin, -3.1, 3.1)
VD (__v_cos, -3.1, 3.1)
VD (__v_exp, -9.9, 9.9)
VD (__v_log, 0.01, 11.1)
{"__v_pow", 'd', 'v', 0.01, 11.1, {.vd = xy__v_pow}},
VF (__v_dummyf, 1.0, 2.0)
VF (__v_expf, -9.9, 9.9)
VF (__v_expf_1u, -9.9, 9.9)
VF (__v_exp2f, -9.9, 9.9)
VF (__v_exp2f_1u, -9.9, 9.9)
VF (__v_logf, 0.01, 11.1)
{"__v_powf", 'f', 'v', 0.01, 11.1, {.vf = xy__v_powf}},
VF (__v_sinf, -3.1, 3.1)
VF (__v_cosf, -3.1, 3.1)
#ifdef __vpcs
VND (__vn_dummy, 1.0, 2.0)
VND (__vn_exp, -9.9, 9.9)
VND (_ZGVnN2v_exp, -9.9, 9.9)
VND (__vn_log, 0.01, 11.1)
VND (_ZGVnN2v_log, 0.01, 11.1)
{"__vn_pow", 'd', 'n', 0.01, 11.1, {.vnd = xy__vn_pow}},
{"_ZGVnN2vv_pow", 'd', 'n', 0.01, 11.1, {.vnd = xy_Z_pow}},
VND (__vn_sin, -3.1, 3.1)
VND (_ZGVnN2v_sin, -3.1, 3.1)
VND (__vn_cos, -3.1, 3.1)
VND (_ZGVnN2v_cos, -3.1, 3.1)
VNF (__vn_dummyf, 1.0, 2.0)
VNF (__vn_expf, -9.9, 9.9)
VNF (_ZGVnN4v_expf, -9.9, 9.9)
VNF (__vn_expf_1u, -9.9, 9.9)
VNF (__vn_exp2f, -9.9, 9.9)
VNF (_ZGVnN4v_exp2f, -9.9, 9.9)
VNF (__vn_exp2f_1u, -9.9, 9.9)
VNF (__vn_logf, 0.01, 11.1)
VNF (_ZGVnN4v_logf, 0.01, 11.1)
{"__vn_powf", 'f', 'n', 0.01, 11.1, {.vnf = xy__vn_powf}},
{"_ZGVnN4vv_powf", 'f', 'n', 0.01, 11.1, {.vnf = xy_Z_powf}},
VNF (__vn_sinf, -3.1, 3.1)
VNF (_ZGVnN4v_sinf, -3.1, 3.1)
VNF (__vn_cosf, -3.1, 3.1)
VNF (_ZGVnN4v_cosf, -3.1, 3.1)
#endif
#endif
#if WANT_SVE_MATH
SVD (__sv_dummy, 1.0, 2.0)
SVF (__sv_dummyf, 1.0, 2.0)
#endif
#include "test/mathbench_funcs.h"
{0},
#undef F
#undef D
#undef VF
#undef VD
#undef VNF
#undef VND
#undef SVF
#undef SVD
};

static void
@@ -301,6 +442,38 @@ runf_latency (float f (float))
    prev = f (Af[i] + prev * z);
}

static void
run_v_thruput (v_double f (v_double))
{
  for (int i = 0; i < N; i += v_double_len ())
    f (v_double_load (A+i));
}

static void
runf_v_thruput (v_float f (v_float))
{
  for (int i = 0; i < N; i += v_float_len ())
    f (v_float_load (Af+i));
}

static void
run_v_latency (v_double f (v_double))
{
  v_double z = v_double_dup (zero);
  v_double prev = z;
  for (int i = 0; i < N; i += v_double_len ())
    prev = f (v_double_load (A+i) + prev * z);
}

static void
runf_v_latency (v_float f (v_float))
{
  v_float z = v_float_dup (zero);
  v_float prev = z;
  for (int i = 0; i < N; i += v_float_len ())
    prev = f (v_float_load (Af+i) + prev * z);
}

#ifdef __vpcs
static void
run_vn_thruput (__vpcs v_double f (v_double))
@@ -319,57 +492,19 @@ runf_vn_thruput (__vpcs v_float f (v_float))
static void
run_vn_latency (__vpcs v_double f (v_double))
{
  volatile uint64x2_t vsel = (uint64x2_t) { 0, 0 };
  uint64x2_t sel = vsel;
  v_double prev = v_double_dup (0);
  v_double z = v_double_dup (zero);
  v_double prev = z;
  for (int i = 0; i < N; i += v_double_len ())
    prev = f (vbslq_f64 (sel, prev, v_double_load (A+i)));
    prev = f (v_double_load (A+i) + prev * z);
}

static void
runf_vn_latency (__vpcs v_float f (v_float))
{
  volatile uint32x4_t vsel = (uint32x4_t) { 0, 0, 0, 0 };
  uint32x4_t sel = vsel;
  v_float prev = v_float_dup (0);
  v_float z = v_float_dup (zero);
  v_float prev = z;
  for (int i = 0; i < N; i += v_float_len ())
    prev = f (vbslq_f32 (sel, prev, v_float_load (Af+i)));
}
#endif

#if WANT_SVE_MATH
static void
run_sv_thruput (sv_double f (sv_double, sv_bool))
{
  for (int i = 0; i < N; i += sv_double_len ())
    f (sv_double_load (A+i), svptrue_b64 ());
}

static void
runf_sv_thruput (sv_float f (sv_float, sv_bool))
{
  for (int i = 0; i < N; i += sv_float_len ())
    f (sv_float_load (Af+i), svptrue_b32 ());
}

static void
run_sv_latency (sv_double f (sv_double, sv_bool))
{
  volatile sv_bool vsel = svptrue_b64 ();
  sv_bool sel = vsel;
  sv_double prev = sv_double_dup (0);
  for (int i = 0; i < N; i += sv_double_len ())
    prev = f (svsel_f64 (sel, sv_double_load (A+i), prev), svptrue_b64 ());
}

static void
runf_sv_latency (sv_float f (sv_float, sv_bool))
{
  volatile sv_bool vsel = svptrue_b32 ();
  sv_bool sel = vsel;
  sv_float prev = sv_float_dup (0);
  for (int i = 0; i < N; i += sv_float_len ())
    prev = f (svsel_f32 (sel, sv_float_load (Af+i), prev), svptrue_b32 ());
    prev = f (v_float_load (Af+i) + prev * z);
}
#endif
@@ -404,10 +539,10 @@ bench1 (const struct fun *f, int type, double lo, double hi)
const char *s = type == 't' ? "rthruput" : "latency";
int vlen = 1;

if (f->vec == 'n')
vlen = f->prec == 'd' ? v_double_len() : v_float_len();
else if (f->vec == 's')
vlen = f->prec == 'd' ? sv_double_len() : sv_float_len();
if (f->vec && f->prec == 'd')
vlen = v_double_len();
else if (f->vec && f->prec == 'f')
vlen = v_float_len();

if (f->prec == 'd' && type == 't' && f->vec == 0)
TIMEIT (run_thruput, f->fun.d);
@@ -417,6 +552,14 @@ bench1 (const struct fun *f, int type, double lo, double hi)
TIMEIT (runf_thruput, f->fun.f);
else if (f->prec == 'f' && type == 'l' && f->vec == 0)
TIMEIT (runf_latency, f->fun.f);
else if (f->prec == 'd' && type == 't' && f->vec == 'v')
TIMEIT (run_v_thruput, f->fun.vd);
else if (f->prec == 'd' && type == 'l' && f->vec == 'v')
TIMEIT (run_v_latency, f->fun.vd);
else if (f->prec == 'f' && type == 't' && f->vec == 'v')
TIMEIT (runf_v_thruput, f->fun.vf);
else if (f->prec == 'f' && type == 'l' && f->vec == 'v')
TIMEIT (runf_v_latency, f->fun.vf);
#ifdef __vpcs
else if (f->prec == 'd' && type == 't' && f->vec == 'n')
TIMEIT (run_vn_thruput, f->fun.vnd);
@@ -427,32 +570,20 @@ bench1 (const struct fun *f, int type, double lo, double hi)
else if (f->prec == 'f' && type == 'l' && f->vec == 'n')
TIMEIT (runf_vn_latency, f->fun.vnf);
#endif
#if WANT_SVE_MATH
else if (f->prec == 'd' && type == 't' && f->vec == 's')
TIMEIT (run_sv_thruput, f->fun.svd);
else if (f->prec == 'd' && type == 'l' && f->vec == 's')
TIMEIT (run_sv_latency, f->fun.svd);
else if (f->prec == 'f' && type == 't' && f->vec == 's')
TIMEIT (runf_sv_thruput, f->fun.svf);
else if (f->prec == 'f' && type == 'l' && f->vec == 's')
TIMEIT (runf_sv_latency, f->fun.svf);
#endif

if (type == 't')
{
ns100 = (100 * dt + itercount * N / 2) / (itercount * N);
printf ("%9s %8s: %4u.%02u ns/elem %10llu ns in [%g %g] vlen %d\n",
f->name, s,
printf ("%9s %8s: %4u.%02u ns/elem %10llu ns in [%g %g]\n", f->name, s,
(unsigned) (ns100 / 100), (unsigned) (ns100 % 100),
(unsigned long long) dt, lo, hi, vlen);
(unsigned long long) dt, lo, hi);
}
else if (type == 'l')
{
ns100 = (100 * dt + itercount * N / vlen / 2) / (itercount * N / vlen);
printf ("%9s %8s: %4u.%02u ns/call %10llu ns in [%g %g] vlen %d\n",
f->name, s,
printf ("%9s %8s: %4u.%02u ns/call %10llu ns in [%g %g]\n", f->name, s,
(unsigned) (ns100 / 100), (unsigned) (ns100 % 100),
(unsigned long long) dt, lo, hi, vlen);
(unsigned long long) dt, lo, hi);
}
fflush (stdout);
}

@@ -1,62 +0,0 @@
/*
* Function entries for mathbench.
*
* Copyright (c) 2022-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/
/* clang-format off */
D (exp, -9.9, 9.9)
D (exp, 0.5, 1.0)
D (exp10, -9.9, 9.9)
D (exp2, -9.9, 9.9)
D (log, 0.01, 11.1)
D (log, 0.999, 1.001)
D (log2, 0.01, 11.1)
D (log2, 0.999, 1.001)
{"pow", 'd', 0, 0.01, 11.1, {.d = xypow}},
D (xpow, 0.01, 11.1)
D (ypow, -9.9, 9.9)
D (erf, -6.0, 6.0)

F (expf, -9.9, 9.9)
F (exp2f, -9.9, 9.9)
F (logf, 0.01, 11.1)
F (log2f, 0.01, 11.1)
{"powf", 'f', 0, 0.01, 11.1, {.f = xypowf}},
F (xpowf, 0.01, 11.1)
F (ypowf, -9.9, 9.9)
{"sincosf", 'f', 0, 0.1, 0.7, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 0.8, 3.1, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, -3.1, 3.1, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 3.3, 33.3, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 100, 1000, {.f = sincosf_wrap}},
{"sincosf", 'f', 0, 1e6, 1e32, {.f = sincosf_wrap}},
F (sinf, 0.1, 0.7)
F (sinf, 0.8, 3.1)
F (sinf, -3.1, 3.1)
F (sinf, 3.3, 33.3)
F (sinf, 100, 1000)
F (sinf, 1e6, 1e32)
F (cosf, 0.1, 0.7)
F (cosf, 0.8, 3.1)
F (cosf, -3.1, 3.1)
F (cosf, 3.3, 33.3)
F (cosf, 100, 1000)
F (cosf, 1e6, 1e32)
F (erff, -4.0, 4.0)
#ifdef __vpcs
VND (_ZGVnN2v_exp, -9.9, 9.9)
VND (_ZGVnN2v_log, 0.01, 11.1)
{"_ZGVnN2vv_pow", 'd', 'n', 0.01, 11.1, {.vnd = xy_Z_pow}},
VND (_ZGVnN2v_sin, -3.1, 3.1)
VND (_ZGVnN2v_cos, -3.1, 3.1)
VNF (_ZGVnN4v_expf, -9.9, 9.9)
VNF (_ZGVnN4v_expf_1u, -9.9, 9.9)
VNF (_ZGVnN4v_exp2f, -9.9, 9.9)
VNF (_ZGVnN4v_exp2f_1u, -9.9, 9.9)
VNF (_ZGVnN4v_logf, 0.01, 11.1)
{"_ZGVnN4vv_powf", 'f', 'n', 0.01, 11.1, {.vnf = xy_Z_powf}},
VNF (_ZGVnN4v_sinf, -3.1, 3.1)
VNF (_ZGVnN4v_cosf, -3.1, 3.1)
#endif
/* clang-format on */
@@ -1,66 +0,0 @@
/*
* Function wrappers for mathbench.
*
* Copyright (c) 2022-2023, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
*/

#ifdef __vpcs

__vpcs static v_float
xy_Z_powf (v_float x)
{
return _ZGVnN4vv_powf (x, x);
}

__vpcs static v_double
xy_Z_pow (v_double x)
{
return _ZGVnN2vv_pow (x, x);
}

#endif

static double
xypow (double x)
{
return pow (x, x);
}

static float
xypowf (float x)
{
return powf (x, x);
}

static double
xpow (double x)
{
return pow (x, 23.4);
}

static float
xpowf (float x)
{
return powf (x, 23.4f);
}

static double
ypow (double x)
{
return pow (2.34, x);
}

static float
ypowf (float x)
{
return powf (2.34f, x);
}

static float
sincosf_wrap (float x)
{
float s, c;
sincosf (x, &s, &c);
return s + c;
}
+4
-12
@@ -1,8 +1,8 @@
/*
* mathtest.c - test rig for mathlib
*
* Copyright (c) 1998-2022, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* Copyright (c) 1998-2019, Arm Limited.
* SPDX-License-Identifier: MIT
*/

#include <assert.h>
@@ -196,11 +196,9 @@ int is_complex_rettype(int rettype) {
#define TFUNCARM(arg,ret,name,tolerance) { t_func, arg, ret, (void*)& ARM_PREFIX(name), m_none, tolerance, #name }
#define MFUNC(arg,ret,name,tolerance) { t_macro, arg, ret, NULL, m_##name, tolerance, #name }

#ifndef PL
/* sincosf wrappers for easier testing. */
static float sincosf_sinf(float x) { float s,c; sincosf(x, &s, &c); return s; }
static float sincosf_cosf(float x) { float s,c; sincosf(x, &s, &c); return c; }
#endif

test_func tfuncs[] = {
/* trigonometric */
@@ -220,10 +218,9 @@ test_func tfuncs[] = {
TFUNCARM(at_s,rt_s, tanf, 4*ULPUNIT),
TFUNCARM(at_s,rt_s, sinf, 3*ULPUNIT/4),
TFUNCARM(at_s,rt_s, cosf, 3*ULPUNIT/4),
#ifndef PL
TFUNCARM(at_s,rt_s, sincosf_sinf, 3*ULPUNIT/4),
TFUNCARM(at_s,rt_s, sincosf_cosf, 3*ULPUNIT/4),
#endif

/* hyperbolic */
TFUNC(at_d, rt_d, atanh, 4*ULPUNIT),
TFUNC(at_d, rt_d, asinh, 4*ULPUNIT),
@@ -254,7 +251,6 @@ test_func tfuncs[] = {
TFUNCARM(at_s,rt_s, expf, 3*ULPUNIT/4),
TFUNCARM(at_s,rt_s, exp2f, 3*ULPUNIT/4),
TFUNC(at_s,rt_s, expm1f, ULPUNIT),
TFUNC(at_d,rt_d, exp10, ULPUNIT),

/* power */
TFUNC(at_d2,rt_d, pow, 3*ULPUNIT/4),
@@ -1022,7 +1018,6 @@ int runtest(testdetail t) {
DO_DOP(d_arg1,op1r);
DO_DOP(d_arg2,op2r);
s_arg1.i = t.op1r[0]; s_arg2.i = t.op2r[0];
s_res.i = 0;

/*
* Detect NaNs, infinities and denormals on input, and set a
@@ -1157,25 +1152,22 @@ int runtest(testdetail t) {
tresultr[0] = t.resultr[0];
tresultr[1] = t.resultr[1];
resultr[0] = d_res.i[dmsd]; resultr[1] = d_res.i[dlsd];
resulti[0] = resulti[1] = 0;
wres = 2;
break;
case rt_i:
tresultr[0] = t.resultr[0];
resultr[0] = intres;
resulti[0] = 0;
wres = 1;
break;
case rt_s:
case rt_s2:
tresultr[0] = t.resultr[0];
resultr[0] = s_res.i;
resulti[0] = 0;
wres = 1;
break;
default:
puts("unhandled rettype in runtest");
abort ();
wres = 0;
}
if(t.resultc != rc_none) {
int err = 0;

@@ -2,7 +2,7 @@
* dotest.c - actually generate mathlib test cases
*
* Copyright (c) 1999-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/

#include <stdio.h>

@@ -2,7 +2,7 @@
* intern.h
*
* Copyright (c) 1999-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/

#ifndef mathtest_intern_h

@@ -2,7 +2,7 @@
* main.c
*
* Copyright (c) 1999-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/

#include <assert.h>

@@ -2,7 +2,7 @@
* random.c - random number generator for producing mathlib test cases
*
* Copyright (c) 1998-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/

#include "types.h"

@@ -2,7 +2,7 @@
* random.h - header for random.c
*
* Copyright (c) 2009-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/

#include "types.h"

@@ -2,7 +2,7 @@
* semi.c: test implementations of mathlib seminumerical functions
*
* Copyright (c) 1999-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/

#include <stdio.h>

@@ -2,7 +2,7 @@
* semi.h: header for semi.c
*
* Copyright (c) 1999-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/

#ifndef test_semi_h

@@ -2,7 +2,7 @@
* types.h
*
* Copyright (c) 2005-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/

#ifndef mathtest_types_h

@@ -2,7 +2,7 @@
* wrappers.c - wrappers to modify output of MPFR/MPC test functions
*
* Copyright (c) 2014-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/

#include <assert.h>

@@ -2,7 +2,7 @@
* wrappers.h - wrappers to modify output of MPFR/MPC test functions
*
* Copyright (c) 2014-2019, Arm Limited.
* SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
* SPDX-License-Identifier: MIT
*/

typedef struct {

+80
-47
@@ -2,8 +2,8 @@

# ULP error check script.
#
# Copyright (c) 2019-2023, Arm Limited.
# SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
# Copyright (c) 2019-2020, Arm Limited.
# SPDX-License-Identifier: MIT

#set -x
set -eu
@@ -72,16 +72,6 @@ t pow 0x1.ffffffffffff0p-1 0x1.0000000000008p0 x 0x1p60 0x1p68 50000
t pow 0x1.ffffffffff000p-1 0x1p0 x 0x1p50 0x1p52 50000
t pow -0x1.ffffffffff000p-1 -0x1p0 x 0x1p50 0x1p52 50000

L=0.02
t exp10 0 0x1p-47 5000
t exp10 -0 -0x1p-47 5000
t exp10 0x1p-47 1 50000
t exp10 -0x1p-47 -1 50000
t exp10 1 0x1.34413509f79ffp8 50000
t exp10 -1 -0x1.434e6420f4374p8 50000
t exp10 0x1.34413509f79ffp8 inf 5000
t exp10 -0x1.434e6420f4374p8 -inf 5000

L=1.0
Ldir=0.9
t erf 0 0xffff000000000000 10000
@@ -153,10 +143,15 @@ Ldir=0.5
done

# vector functions

Ldir=0.5
r='n'
flags="${ULPFLAGS:--q}"
flags="${ULPFLAGS:--q} -f"
runs=
check __s_exp 1 && runs=1
runv=
check __v_exp 1 && runv=1
runvn=
check __vn_exp 1 && runvn=1

range_exp='
0 0xffff000000000000 10000
@@ -182,10 +177,9 @@ range_pow='
'

range_sin='
0 0x1p23 500000
-0 -0x1p23 500000
0x1p23 inf 10000
-0x1p23 -inf 10000
0 0xffff000000000000 10000
0x1p-4 0x1p4 400000
-0x1p-23 0x1p23 400000
'
range_cos="$range_sin"

@@ -205,10 +199,9 @@ range_logf='
'

range_sinf='
0 0x1p20 500000
-0 -0x1p20 500000
0x1p20 inf 10000
-0x1p20 -inf 10000
0 0xffff0000 10000
0x1p-4 0x1p4 300000
-0x1p-9 -0x1p9 300000
'
range_cosf="$range_sinf"

@@ -236,8 +229,9 @@ L_sinf=1.4
L_cosf=1.4
L_powf=2.1

while read G F D
while read G F R
do
[ "$R" = 1 ] || continue
case "$G" in \#*) continue ;; esac
eval range="\${range_$G}"
eval L="\${L_$G}"
@@ -245,35 +239,74 @@ do
do
[ -n "$X" ] || continue
case "$X" in \#*) continue ;; esac
disable_fenv=""
if [ -z "$WANT_SIMD_EXCEPT" ] || [ $WANT_SIMD_EXCEPT -eq 0 ]; then
# If library was built with SIMD exceptions
# disabled, disable fenv checking in ulp
# tool. Otherwise, fenv checking may still be
# disabled by adding -f to the end of the run
# line.
disable_fenv="-f"
fi
t $D $disable_fenv $F $X
t $F $X
done << EOF
$range

EOF
done << EOF
# group symbol run
exp _ZGVnN2v_exp
log _ZGVnN2v_log
pow _ZGVnN2vv_pow -f
sin _ZGVnN2v_sin -z
cos _ZGVnN2v_cos
expf _ZGVnN4v_expf
expf_1u _ZGVnN4v_expf_1u -f
exp2f _ZGVnN4v_exp2f
exp2f_1u _ZGVnN4v_exp2f_1u -f
logf _ZGVnN4v_logf
sinf _ZGVnN4v_sinf -z
cosf _ZGVnN4v_cosf
powf _ZGVnN4vv_powf -f
exp __s_exp $runs
exp __v_exp $runv
exp __vn_exp $runvn
exp _ZGVnN2v_exp $runvn

log __s_log $runs
log __v_log $runv
log __vn_log $runvn
log _ZGVnN2v_log $runvn

pow __s_pow $runs
pow __v_pow $runv
pow __vn_pow $runvn
pow _ZGVnN2vv_pow $runvn

sin __s_sin $runs
sin __v_sin $runv
sin __vn_sin $runvn
sin _ZGVnN2v_sin $runvn

cos __s_cos $runs
cos __v_cos $runv
cos __vn_cos $runvn
cos _ZGVnN2v_cos $runvn

expf __s_expf $runs
expf __v_expf $runv
expf __vn_expf $runvn
expf _ZGVnN4v_expf $runvn

expf_1u __s_expf_1u $runs
expf_1u __v_expf_1u $runv
expf_1u __vn_expf_1u $runvn

exp2f __s_exp2f $runs
exp2f __v_exp2f $runv
exp2f __vn_exp2f $runvn
exp2f _ZGVnN4v_exp2f $runvn

exp2f_1u __s_exp2f_1u $runs
exp2f_1u __v_exp2f_1u $runv
exp2f_1u __vn_exp2f_1u $runvn

logf __s_logf $runs
logf __v_logf $runv
logf __vn_logf $runvn
logf _ZGVnN4v_logf $runvn

sinf __s_sinf $runs
sinf __v_sinf $runv
sinf __vn_sinf $runvn
sinf _ZGVnN4v_sinf $runvn

cosf __s_cosf $runs
cosf __v_cosf $runv
cosf __vn_cosf $runvn
cosf _ZGVnN4v_cosf $runvn

powf __s_powf $runs
powf __v_powf $runv
powf __vn_powf $runvn
powf _ZGVnN4vv_powf $runvn
EOF

[ 0 -eq $FAIL ] || {

@@ -1,7 +1,7 @@
; cosf.tst - Directed test cases for SP cosine
;
; Copyright (c) 2007-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT

func=cosf op1=7fc00001 result=7fc00001 errno=0
func=cosf op1=ffc00001 result=7fc00001 errno=0

@@ -1,7 +1,7 @@
; erf.tst - Directed test cases for erf
;
; Copyright (c) 2007-2020, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT

func=erf op1=7ff80000.00000001 result=7ff80000.00000001 errno=0
func=erf op1=fff80000.00000001 result=7ff80000.00000001 errno=0

@@ -1,7 +1,7 @@
; erff.tst
;
; Copyright (c) 2007-2020, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT

func=erff op1=7fc00001 result=7fc00001 errno=0
func=erff op1=ffc00001 result=7fc00001 errno=0

@@ -1,7 +1,7 @@
; Directed test cases for exp
;
; Copyright (c) 2018-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT

func=exp op1=7ff80000.00000001 result=7ff80000.00000001 errno=0
func=exp op1=fff80000.00000001 result=7ff80000.00000001 errno=0

@@ -1,15 +0,0 @@
; Directed test cases for exp10
;
; Copyright (c) 2023, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception

func=exp10 op1=7ff80000.00000001 result=7ff80000.00000001 errno=0
func=exp10 op1=fff80000.00000001 result=7ff80000.00000001 errno=0
func=exp10 op1=7ff00000.00000001 result=7ff80000.00000001 errno=0 status=i
func=exp10 op1=fff00000.00000001 result=7ff80000.00000001 errno=0 status=i
func=exp10 op1=7ff00000.00000000 result=7ff00000.00000000 errno=0
func=exp10 op1=7fefffff.ffffffff result=7ff00000.00000000 errno=ERANGE status=ox
func=exp10 op1=fff00000.00000000 result=00000000.00000000 errno=0
func=exp10 op1=ffefffff.ffffffff result=00000000.00000000 errno=ERANGE status=ux
func=exp10 op1=00000000.00000000 result=3ff00000.00000000 errno=0
func=exp10 op1=80000000.00000000 result=3ff00000.00000000 errno=0
@@ -1,7 +1,7 @@
; Directed test cases for exp2
;
; Copyright (c) 2018-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT

func=exp2 op1=7ff80000.00000001 result=7ff80000.00000001 errno=0
func=exp2 op1=fff80000.00000001 result=7ff80000.00000001 errno=0

@@ -1,7 +1,7 @@
; exp2f.tst - Directed test cases for exp2f
;
; Copyright (c) 2017-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT

func=exp2f op1=7fc00001 result=7fc00001 errno=0
func=exp2f op1=ffc00001 result=7fc00001 errno=0

@@ -1,7 +1,7 @@
; expf.tst - Directed test cases for expf
;
; Copyright (c) 2007-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT

func=expf op1=7fc00001 result=7fc00001 errno=0
func=expf op1=ffc00001 result=7fc00001 errno=0

@@ -1,7 +1,7 @@
; Directed test cases for log
;
; Copyright (c) 2018-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT

func=log op1=7ff80000.00000001 result=7ff80000.00000001 errno=0
func=log op1=fff80000.00000001 result=7ff80000.00000001 errno=0

@@ -1,7 +1,7 @@
; Directed test cases for log2
;
; Copyright (c) 2018-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT

func=log2 op1=7ff80000.00000001 result=7ff80000.00000001 errno=0
func=log2 op1=fff80000.00000001 result=7ff80000.00000001 errno=0

@@ -1,7 +1,7 @@
; log2f.tst - Directed test cases for log2f
;
; Copyright (c) 2017-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT

func=log2f op1=7fc00001 result=7fc00001 errno=0
func=log2f op1=ffc00001 result=7fc00001 errno=0

@@ -1,7 +1,7 @@
; logf.tst - Directed test cases for logf
;
; Copyright (c) 2007-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT

func=logf op1=7fc00001 result=7fc00001 errno=0
func=logf op1=ffc00001 result=7fc00001 errno=0

@@ -1,7 +1,7 @@
; Directed test cases for pow
;
; Copyright (c) 2018-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT

func=pow op1=00000000.00000000 op2=00000000.00000000 result=3ff00000.00000000 errno=0
func=pow op1=00000000.00000000 op2=00000000.00000001 result=00000000.00000000 errno=0

@@ -1,7 +1,7 @@
; powf.tst - Directed test cases for powf
;
; Copyright (c) 2007-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT

func=powf op1=7f800001 op2=7f800001 result=7fc00001 errno=0 status=i
func=powf op1=7f800001 op2=ff800001 result=7fc00001 errno=0 status=i

@@ -1,7 +1,7 @@
; Directed test cases for SP sincos
;
; Copyright (c) 2007-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT


func=sincosf_sinf op1=7fc00001 result=7fc00001 errno=0

@@ -1,7 +1,7 @@
; sinf.tst - Directed test cases for SP sine
;
; Copyright (c) 2007-2019, Arm Limited.
; SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
; SPDX-License-Identifier: MIT


func=sinf op1=7fc00001 result=7fc00001 errno=0

@@ -1,7 +1,7 @@
!! double.tst - Random test case specification for DP functions
!!
!! Copyright (c) 1999-2019, Arm Limited.
!! SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
!! SPDX-License-Identifier: MIT

test exp 10000
test exp2 10000

Some files were not shown because too many files have changed in this diff.