Commit Graph

85 Commits

Author SHA1 Message Date
Janne Grunau
e2710e790c arm: add a cpu flag for the VFPv2 vector mode
The vector mode was deprecated in ARMv7-A/VFPv3 and various cpu
implementations do not support it in hardware. Depending on the OS,
vector mode code will either be emulated in software or result in an
illegal instruction on cpus which do not support it. This was not really
a problem in practice since NEON implementations of the same functions
are preferred. It will however become a problem for checkasm which tests
every cpu flag separately.

Since this is a cpu feature that newer cpus no longer support, the
behaviour of this flag differs from the other flags. It can only be
activated by runtime cpu feature selection.
2015-12-14 16:42:35 +01:00
Martin Storsjö
dcae2e32f7 arm: Suppress tags about used cpu arch and extensions
When all the codepaths using manually set .arch/.fpu code are
behind runtime detection, the elf attributes should be suppressed.

This allows tools to know that the final built binary doesn't
strictly require these extensions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2015-03-07 17:10:08 +02:00
Peter Meerwald
76ce9bd8e2 libavutil: Add ARM av_clip_intp2_arm
add ARM code for implementing av_clip_intp2 using the ssat instruction

on Cortex-A8, av_clip_intp2_arm() is faster than av_clip_intp2_c() and
the generic av_clip(), about -19%

Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-02-21 00:54:40 +01:00
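
A minimal portable sketch of the clipping operation the commit above accelerates, assuming av_clip_intp2(a, p) clamps a to the signed range [-(2^p), 2^p - 1]. The name clip_intp2_sketch is hypothetical, and plain comparisons stand in for the ssat instruction; this is not the libavutil implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: clamp a to [-(2^p), 2^p - 1], i.e. to a
 * signed value that fits in p + 1 bits. */
static int32_t clip_intp2_sketch(int32_t a, int p)
{
    assert(p >= 0 && p <= 30);   /* keep the shift below well defined */

    int32_t max = (int32_t)((UINT32_C(1) << p) - 1);
    int32_t min = -max - 1;

    if (a > max)
        return max;
    if (a < min)
        return min;
    return a;
}
```

With p = 3 the representable range is [-8, 7], so clip_intp2_sketch(100, 3) yields 7 and clip_intp2_sketch(-100, 3) yields -8.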
Martin Storsjö
f963f80399 arm: Use .data.rel.ro for const data with relocations
Signed-off-by: Martin Storsjö <martin@martin.st>
2014-12-09 11:43:25 +02:00
Ben Avison
6869612f5c arm: Macroize the test for 'setend' CPU instruction support
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-07-21 15:08:01 -07:00
Ben Avison
5a272190a0 armv6: Accelerate butterflies_float
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in butterflies_float_c() / ff_butterflies_float_vfp() for the
same sample AAC stream:

                   Before          After
                   Mean   StdDev   Mean   StdDev  Confidence  Change
Audio decode       1542.8 43.7     1470.5 41.5    100.0%      +4.9%
butterflies_float  130.0  11.9     70.2   12.1    100.0%      +85.2%

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-07-18 01:34:38 +03:00
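
The commit above does not say how the "Change" column is computed. The figures are consistent with before/after - 1 (1542.8 / 1470.5 ≈ 1.049 and 130.0 / 70.2 ≈ 1.852), so the sketch below assumes that formula; change_percent is a hypothetical helper, not part of the benchmark tooling.

```c
#include <assert.h>

/* Hypothetical helper: relative change in percent, assuming the
 * "Change" column is (before / after - 1) * 100 on the sample means. */
static double change_percent(double before_mean, double after_mean)
{
    assert(after_mean > 0.0);   /* avoid dividing by zero */
    return (before_mean / after_mean - 1.0) * 100.0;
}
```

Under the same assumption, the vector_fmul_window figures in the next entry work out the same way: 244.0 / 188.9 ≈ 1.292.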
Ben Avison
5edad2c4a1 armv6: Accelerate vector_fmul_window
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in vector_fmul_window_c() / ff_vector_fmul_window_vfp() for the
same sample AAC stream:

                    Before          After
                    Mean   StdDev   Mean   StdDev  Confidence  Change
Audio decode        1598.2 47.4     1529.2 25.4    100.0%      +4.5%
vector_fmul_window  244.0  22.1     188.9  22.3    100.0%      +29.2%

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-07-18 01:34:31 +03:00
Martin Storsjö
7b0c7c9163 arm: Detect 32 bit cpu features on ARMv8 when running on a 64 bit kernel
When running on a 64 bit kernel, /proc/cpuinfo lists different
optional features than on 32 bit kernels (because some of them
are mandatory in the 64 bit implementations).

The kernel does list the old features properly if they are queried
via /proc/self/auxv though - however this file is not always readable
(e.g. on most android systems). The getauxval function could also
provide the same info as /proc/self/auxv even if this file isn't
readable, but this function is not always available (and thus would
need to be loaded with dlsym for compatibility with older android
versions).

The android cpufeatures library does this slightly differently,
by assuming that these are available if the "CPU architecture"
line is >= 8, see [1] for details.

It has been suggested to include the old, non-optional features in
/proc/cpuinfo as well, but that suggested patch never was merged.
See [2] for the discussion around this suggestion.

[1] https://android-review.googlesource.com/91380
[2] http://marc.info/?l=linux-arm-kernel&m=139087240101974

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-06-28 22:16:59 +03:00
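
The rule described in the commit above (treat the legacy 32 bit features as present when the "CPU architecture" line is >= 8) could be sketched roughly as follows. The function names are hypothetical and the parsing is an assumption for illustration; this is not the code from the commit or from the android cpufeatures library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the number on the "CPU architecture"
 * line of /proc/cpuinfo-style text, or -1 if it is missing or not
 * numeric. */
static int cpuinfo_arch(const char *cpuinfo)
{
    assert(cpuinfo != NULL);

    const char *line = strstr(cpuinfo, "CPU architecture");
    int arch;

    if (!line)
        return -1;
    line = strchr(line, ':');
    if (!line || sscanf(line + 1, "%d", &arch) != 1)
        return -1;
    return arch;
}

/* Assumption: on architecture 8 and up, the old optional 32 bit
 * features are taken to be available even if they are not listed. */
static int assume_legacy_features(const char *cpuinfo)
{
    return cpuinfo_arch(cpuinfo) >= 8;
}
```

The helpers take the file contents as a string so they can be exercised without reading /proc/cpuinfo; a non-numeric architecture value makes cpuinfo_arch return -1 and the assumption is not applied.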
Janne Grunau
d5a5598198 build: check if AS supports the '.func' directive
Not supported by Clang's integrated assembler. Since it just adds
debug information it can safely be omitted.
2014-06-03 14:23:03 +02:00
Diego Biurrun
831a118078 Update dsputil- and SIMD-related comments to match reality more closely
2014-03-13 05:50:29 -07:00
Janne Grunau
cbddee1cca arm: hpeldsp: prevent overreads in armv6 asm
Based on a patch by Russell King <rmk+libav@arm.linux.org.uk>

Bug-Id: 646
CC: libav-stable@libav.org
2014-03-05 14:30:57 +01:00
Martin Storsjö
543156d751 arm: Mark the stack as non-executable
If linking in an object file without this attribute set, the
linker will assume that an executable stack might be needed.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-02-19 09:57:19 +02:00
Martin Storsjö
e3fec3f095 arm: Add EXTERN_ASM to the .func and .type declarations for exported symbols
This makes the generated assembly more internally consistent,
avoiding declaring two labels for the same function (for cases
where EXTERN_ASM is empty) and not declaring a separate unprefixed
label in other cases.

This also makes sure the .func and .type declarations have the same
prefix. They have previously not been used on the platforms
that have prefixed symbols on arm (iOS), but gas-preprocessor
has recently started using the .func declarations for adding
.thumb_func declarations for such functions.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-02-07 15:14:06 +02:00
Martin Storsjö
44a0a98f92 arm: Add an option for making sure NEON registers aren't clobbered
This is pretty much based on the same test for XMM registers.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-01-11 00:03:00 +02:00
Martin Storsjö
5dae487235 arm: Allow overriding the alignment set in the function macro
The function macro always sets .align 2 before declaring the
function label (since 5c5e1ea3) and always sets the section to
.text (since 278caa6a).

The .align 5 directives before certain functions, added in fc252eba,
predate the addition of .text and .align to the function macro and thus
became useless/unused when the function macro got them.

This restores the original intention, to align the loop entry
points.

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-01-07 19:29:56 +02:00
Diego Biurrun
7ffda66fd5 arm: float_dsp: Propagate cpu_flags to vfp initialization function
2013-08-29 11:24:14 +02:00
Diego Biurrun
8410d6e93c avutil: Refactor CPU extension availability macros
2013-08-28 23:54:14 +02:00
Diego Biurrun
b78b10c4b7 avutil: Move internal CPU detection function declarations to private header
2013-08-28 23:54:14 +02:00
Diego Biurrun
439902e0d6 Employ consistent LIBAV_COMPAT_ multiple inclusion guards in compat/
Also fix a comment and an #endif comment.
2013-07-18 18:12:38 +02:00
Martin Storsjö
be7952b5c3 arm: Only output eabi attributes if building for ELF
This matches the other eabi attribute in the same file. This is
required in order to build for arm/hardfloat with other object
file formats than ELF.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-05-27 00:55:33 +03:00
Diego Biurrun
1fda184a85 avutil: Add av_cold attributes to init functions missing them
2013-05-04 22:48:05 +02:00
Martin Storsjö
ab8f1a6989 arm: Fall back to runtime cpu feature detection via /proc/cpuinfo
On recent android versions, /proc/self/auxv is unreadable
(unless the process is running under the shell uid or
in debuggable mode, which makes it hard to notice). See
http://b.android.com/43055 and
https://android-review.googlesource.com/51271 for more information
about the issue.

This makes sure e.g. neon optimizations are enabled at runtime in
android apps even when built in release mode, if configured to
use the runtime detection.

CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-02-11 17:15:15 +02:00
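
A rough sketch of the kind of /proc/cpuinfo check such a fallback needs: looking for a feature name as a whole token on the "Features" line. The helper cpuinfo_has_feature is hypothetical and works on text already read from the file; it is an illustration, not the code added by the commit above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return 1 if `name` appears as a whole
 * whitespace-separated token on the "Features" line, else 0. */
static int cpuinfo_has_feature(const char *cpuinfo, const char *name)
{
    assert(cpuinfo != NULL && name != NULL);

    size_t len = strlen(name);
    const char *p = strstr(cpuinfo, "Features");

    if (len == 0 || !p || !(p = strchr(p, ':')))
        return 0;
    p++;                                    /* skip the ':' */

    while (*p && *p != '\n') {
        while (*p == ' ' || *p == '\t')     /* skip separators */
            p++;
        const char *tok = p;
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        /* Compare whole tokens so "vfp" does not match "vfpv3". */
        if ((size_t)(p - tok) == len && strncmp(tok, name, len) == 0)
            return 1;
    }
    return 0;
}
```

Matching whole tokens rather than substrings matters here because feature names share prefixes, for example "vfp" and "vfpv3".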
Ronald S. Bultje
d56668bd80 floatdsp: move scalarproduct_float from dsputil to avfloatdsp.
This makes the aac decoder and all voice codecs independent of dsputil.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje
5959bfaca3 floatdsp: move butterflies_float from dsputil to avfloatdsp.
This makes wmadec/enc, twinvq and mpegaudiodec (i.e. mp2/mp3)
independent of dsputil.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje
42d3246948 floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp.
Now, nellymoserenc and aacenc no longer depend on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje
55aa03b9f8 floatdsp: move vector_fmul_add from dsputil to avfloatdsp.
2013-01-22 11:55:42 -08:00
Justin Ruggles
e034cc6c60 lavc: Move vector_fmul_window to AVFloatDSPContext
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-16 10:45:45 +01:00
Mans Rullgard
b57c1da81e arm: detect cpu features at runtime on Linux
This allows compiling optimised functions for features not enabled
in the core build and selecting these at runtime if the system has
the necessary support.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-12-07 16:54:04 +00:00
Mans Rullgard
b326755989 arm: rename ARMVFP config symbol to VFP
This is consistent with usual ARM nomenclature as well as with the
VFPV3 and NEON symbols which both lack the ARM prefix.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-12-07 16:54:04 +00:00
Mans Rullgard
a7831d509f arm: use HAVE*_INLINE/EXTERNAL macros for conditional compilation
These macros reflect the actual capabilities required here.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-12-07 16:54:03 +00:00
Justin Ruggles
284ea790d8 dsputil: move vector_fmul_scalar() to AVFloatDSPContext in libavutil
2012-11-26 11:29:06 -05:00
Diego Biurrun
9734b8ba56 Move avutil tables only used in libavcodec to libavcodec.
2012-10-11 18:29:36 +02:00
Mans Rullgard
51a15ed740 ARM: use numeric ID for Tag_ABI_align_preserved
Some old assemblers still in use do not support named tags.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-10-03 11:49:55 +01:00
Mans Rullgard
1ca3b62b10 ARM: bswap: drop armcc version of av_bswap16()
This function causes several versions of armcc to miscompile code,
and the performance impact is small.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-10-02 19:47:56 +01:00
Mans Rullgard
5e826fd65e ARM: set Tag_ABI_align_preserved in all asm files
All our ARM asm preserves alignment so setting this attribute
in a common location is simpler.  This removes numerous warnings
when linking with armcc.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-10-02 19:47:56 +01:00
Mans Rullgard
7bda4ed780 ARM: fix Thumb PIC on Apple
LDR with register offset and PC as base register is not available in
the Thumb instruction set so the addition must be done separately.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-10-02 13:12:33 +01:00
Mans Rullgard
8995d34972 ARM: use 2-operand syntax for ADD Rd, PC in Apple PIC code
The Apple assembler refuses to assemble the 3-operand form
in Thumb2 even though it is valid syntax.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-09-21 07:07:58 +01:00
Mans Rullgard
cdb7db5acd ARM: align PIC offset pools to 4 bytes
When building Thumb2 code, the end of a function, where the PIC
offsets are placed, need not be aligned.  Although the values
are only accessed with instructions allowing unaligned addresses,
keeping them aligned is preferable.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-09-21 07:07:58 +01:00
Mans Rullgard
a27a690fac ARM: swap source operands in some add instructions
This allows using a 16-bit opcode when generating Thumb2 code.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-09-20 17:07:18 +01:00
Mans Rullgard
7689eea49a flacdsp: arm optimised lpc filter
2012-09-15 23:54:21 +01:00
Mans Rullgard
87fa05a0da ARM: intmath: use native-size return types for clipping functions
This avoids having the compiler redundantly mask the values to
the smaller size.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-13 14:51:52 +01:00
Mans Rullgard
6c4975eaaf libavutil: add saturating addition functions
Fixed-point audio codecs often use saturating arithmetic, and
special instructions for these operations are common.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-13 01:03:10 +01:00
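
For readers unfamiliar with saturating arithmetic as mentioned in the commit above: instead of wrapping on overflow, the result is clamped to the nearest representable value. A minimal portable sketch, using a hypothetical name and a 64-bit intermediate rather than any special instruction:

```c
#include <stdint.h>

/* Hypothetical helper: add two 32-bit values and clamp the result to
 * [INT32_MIN, INT32_MAX] instead of letting it overflow. */
static int32_t sat_add32_sketch(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + b;   /* cannot overflow in 64 bits */

    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}
```

On CPUs with a dedicated saturating add instruction, a helper like this can be replaced by a single instruction, which is the motivation the commit gives.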
Mans Rullgard
0d735ca214 ARM: add missing "cc" clobber in av_clipl_int32_arm()
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-10 10:51:10 +01:00
Mans Rullgard
ec9d2c15c1 ARM: use Q/R inline asm operand modifiers only if supported
Some compilers do not support the Q/R modifiers used to access
the low/high parts of a 64-bit register pair.  Check for this
and disable all uses of it when not supported.

Fixes bug #337.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 21:13:30 +01:00
Mans Rullgard
62634158b7 ARM: generate position independent code to access data symbols
This creates proper position independent code when accessing
data symbols if CONFIG_PIC is set.

References to external symbols should now use the movrelx macro.
Some additional code changes are required since this macro may
need a register to hold the GOT pointer.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-07-01 11:25:06 +01:00
Diego Biurrun
a5a93fa8f5 cosmetics: do not use full path for local headers
2012-06-22 10:49:40 +02:00
Justin Ruggles
cb5042d02c float_dsp: Move vector_fmac_scalar() from libavcodec to libavutil
2012-06-18 18:01:14 -04:00
Mans Rullgard
a839d6abf8 ARM: fix float_dsp breakage from d5a7229
Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-06-08 19:45:37 +01:00
Justin Ruggles
d5a7229ba4 Add a float DSP framework to libavutil
Move vector_fmul() from DSPContext to AVFloatDSPContext.
2012-06-08 13:14:38 -04:00
Justin Ruggles
94d2b0d2fd ARM: Move asm.S from libavcodec to libavutil
This will allow for easier implementation of ARM-optimized functions in
libraries other than libavcodec.
2012-06-08 13:14:38 -04:00