927 Commits

Author SHA1 Message Date
Lynne
151b41c8cc
fft: remove 16-bit FFT and MDCT code
No longer used by anything.
Unfortunately the old FFT_FLOAT/FFT_FIXED_32 is left as-is. It's
simply too much work for code meant to be all removed anyway.
2021-01-14 01:44:21 +01:00
Lynne
9e05421dbe
ac3enc_fixed: drop unnecessary fixed-point DSP code 2021-01-14 01:44:20 +01:00
Anton Khirnov
e15371061d lavu/mem: move the DECLARE_ALIGNED macro family to mem_internal on next+1 bump
They are not properly namespaced and not intended for public use.
2021-01-01 14:14:57 +01:00
Anton Khirnov
c8c2dfbc37 lavu: move LOCAL_ALIGNED from internal.h to mem_internal.h
That is a more appropriate place for it.
2021-01-01 14:11:01 +01:00
Martin Storsjö
b252178321 libavcodec: arm: Add a NEON implementation of pixblockdsp
                         Cortex A7     A8     A9    A53    A72
get_pixels_c:                144.7  146.0  143.0  137.7   69.0
get_pixels_armv6:            112.0  106.7   90.2   95.0   72.5
get_pixels_neon:              69.0   29.7   68.7   40.2   19.0
get_pixels_unaligned_c:      144.7  146.2  143.0  137.7   69.0
get_pixels_unaligned_neon:    77.0   36.5   72.5   48.5   19.0
diff_pixels_c:               376.7  319.7  265.5  307.7  148.0
diff_pixels_armv6:           179.0  159.5  205.5  139.0  142.0
diff_pixels_neon:             69.0   40.2   77.5   53.2   26.0
diff_pixels_unaligned_c:     376.7  319.7  265.5  307.7  148.0
diff_pixels_unaligned_neon:   85.0   54.5   93.5   66.7   26.0

Signed-off-by: Martin Storsjö <martin@martin.st>
2020-05-15 23:37:43 +03:00
qoroliang
cacdac819f lavc/hevcdec: fix the HEVC decoder crash when memory over-read
Fix an occasional crash in the HEVC decoder on 32-bit ARM platforms. The
root cause is a memory over-read (a read across the memory boundary)
in the SAO NEON functions ff_hevc_sao_band_filter_neon_8 and
ff_hevc_sao_edge_filter_neon_8.

After this fix, the crash no longer appears in large-scale testing on
Android phones.

Signed-off-by: qoroliang <qoroliang@tencent.com>
2020-04-20 10:28:04 +08:00
Aman Gupta
0e49560806 avcodec/arm/mlpdsp: add missing dependency for truehd
Signed-off-by: Aman Gupta <aman@tmm1.net>
2019-11-11 11:29:55 -08:00
James Almer
47e12966b7 Merge commit '0676de935b1e81bc5b5698fef3e7d48ff2ea77ff'
* commit '0676de935b1e81bc5b5698fef3e7d48ff2ea77ff':
  arm: Implement a NEON version of 422 h264_h_loop_filter_chroma

Merged-by: James Almer <jamrial@gmail.com>
2019-03-22 16:06:04 -03:00
Martin Storsjö
0676de935b arm: Implement a NEON version of 422 h264_h_loop_filter_chroma
Previously, the 420 version was used even for 422.

This fixes occasional checkasm failures.

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-03-21 22:03:46 +02:00
James Almer
d6b62ce1ac Merge commit 'cef914e08310166112ac09567e66452a7679bfc8'
* commit 'cef914e08310166112ac09567e66452a7679bfc8':
  arm: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2

Merged-by: James Almer <jamrial@gmail.com>
2019-03-14 16:19:41 -03:00
James Almer
7b9ca44cbc arm/h264dsp: change loop filter stride argument to ptrdiff_t
This was missed in d5d699ab6e

Signed-off-by: James Almer <jamrial@gmail.com>
2019-02-20 19:38:55 -03:00
Martin Storsjö
cef914e083 arm: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2
This makes it similar to put_epel16_v6, and gives a 10-25%
speedup of this function.

Before:                   Cortex A7       A8       A9      A53     A72
vp8_put_epel16_h6v6_neon:    3058.0   2218.5   2459.8   2183.0  1572.2
After:
vp8_put_epel16_h6v6_neon:    2670.8   1934.2   2244.4   1729.4  1503.9

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:18 +02:00
Meng Wang
3b2fd96048 avcodec/arm/hevcdsp_sao : add NEON optimization for sao
Signed-off-by: Meng Wang <wangmeng.kids@bytedance.com>
Reviewed-by: Shengbin Meng <shengbinmeng@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-04-09 03:45:15 +02:00
Martin Storsjö
5f83935de4 arm: hevcdsp: Add commas between macro arguments
When targeting darwin, clang requires commas between arguments,
while the no-comma form is allowed for other targets.

Since Xcode 9.3, the bundled clang supports altmacro and doesn't
require using gas-preprocessor any longer.

Signed-off-by: Martin Storsjö <martin@martin.st>
2018-03-31 21:59:01 +03:00
Martin Storsjö
6660bc034d arm: hevcdsp: Avoid using macro expansion counters
Clang supports the macro expansion counter (used for making unique
labels within macro expansions), but not when targeting darwin.

Convert uses of the counter into normal local labels, as used
elsewhere.

Since Xcode 9.3, the bundled clang supports altmacro and doesn't
require using gas-preprocessor any longer.

Signed-off-by: Martin Storsjö <martin@martin.st>
2018-03-31 21:55:32 +03:00
James Almer
a7109b82c4 Merge commit 'ab05d3934de8e932dbd77979a687e6598e67535c'
* commit 'ab05d3934de8e932dbd77979a687e6598e67535c':
  arm: vc1dsp: Add commas between macro arguments

Merged-by: James Almer <jamrial@gmail.com>
2018-03-30 15:47:31 -03:00
Martin Storsjö
ab05d3934d arm: vc1dsp: Add commas between macro arguments
When targeting darwin, clang requires commas between arguments,
while the no-comma form is allowed for other targets.

Since Xcode 9.3, the bundled clang supports altmacro and doesn't
require using gas-preprocessor any longer.

Signed-off-by: Martin Storsjö <martin@martin.st>
2018-03-30 15:47:24 +03:00
Aurelien Jacobs
f677718bc8 sbcenc: add armv6 and neon asm optimizations
This was originally based on libsbc, and was fully integrated into ffmpeg.
2018-03-07 22:26:53 +01:00
Michael Niedermayer
7dbbb75ee3 avcodec/arm/sbrdsp_neon: Use a free register instead of putting 2 things in one
Fixes high pitched shriek
Fixes: 25420848_1478428308873746_4255813235963330560_n.mp4

Reported-by: Dale Curtis <dalecurtis@google.com>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-01-12 22:45:02 +01:00
James Almer
36de24d5b7 arm/hevc_idct: fix compilation on Android
Fixes the "out of range" compilation error for armeabi-v7a. Compilation failed
when building libvlc.aar for ARMv7 Android on an Ubuntu 16.04 host, with the
error message "Offset out of range". The cause is that the assembler LDR
directives in the function "ff_hevc_transform_luma_4x4_neon_8" need local storage
within a range of <1k, but no such storage was provided.

Based on a patch by Ihor Bobalo <bob@eleks.com>

Suggested-by: wbs
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-09 21:46:34 +02:00
Alexandra Hájková
7993ec19af hevc: Add hevc_get_pixel_4/8/12/16/24/32/48/64
Checkasm timings:
block size bitdepth       C   NEON
4           8 bit:    146.7   48.7
           10 bit:    146.7   52.7
8           8 bit:    430.3   84.4
           10 bit:    430.4  119.5
12          8 bit:    812.8  141.0
           10 bit:    812.8  195.0
16          8 bit:   1499.1  268.0
           10 bit:   1498.9  368.4
24          8 bit:   4394.2  574.8
           10 bit:   3696.3  804.8
32          8 bit:   5108.6  568.9
           10 bit:   4249.6  918.8
48          8 bit:  16819.6 2304.9
           10 bit:  13882.0 3178.5
64          8 bit:  13490.8 1799.5
           10 bit:  11018.5 2519.4

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-12-08 23:41:01 +02:00
James Almer
68e479e3ad Merge commit 'b487add7ecf78efda36d49815f8f8757bd24d4cb'
* commit 'b487add7ecf78efda36d49815f8f8757bd24d4cb':
  arm: Remove a redundant check in fmtconvert_init_arm.c

Merged-by: James Almer <jamrial@gmail.com>
2017-11-11 23:30:31 -03:00
James Almer
640073eceb Merge commit '9dde6ab06c48f9447cd16f39bee33569cddb7be4'
* commit '9dde6ab06c48f9447cd16f39bee33569cddb7be4':
  arm: Fix SIGBUS on ARM when compiled with binutils 2.29

Merged-by: James Almer <jamrial@gmail.com>
2017-11-11 13:44:07 -03:00
James Almer
921993503b Merge commit 'd7320ca3ed10f0d35b3740fa03341161e74275ea'
* commit 'd7320ca3ed10f0d35b3740fa03341161e74275ea':
  arm: Avoid using .dn register aliases

Merged-by: James Almer <jamrial@gmail.com>
2017-10-30 21:00:51 -03:00
James Almer
62d86c41b7 Merge commit 'ce080f47b8b55ab3d41eb00487b138d9906d114d'
* commit 'ce080f47b8b55ab3d41eb00487b138d9906d114d':
  hevc: Add NEON 32x32 IDCT

Merged-by: James Almer <jamrial@gmail.com>
2017-10-30 19:59:01 -03:00
James Almer
e9e7e1cc6b Merge commit '118dd4a321a2d67f67c21b076abd0b4d939ab642'
* commit '118dd4a321a2d67f67c21b076abd0b4d939ab642':
  hevc: 16x16 NEON idct: Use the right element size for loads/stores

Merged-by: James Almer <jamrial@gmail.com>
2017-10-30 19:56:29 -03:00
James Almer
31a4112936 Merge commit 'edbf0fffb15dde7a1de70b05855529d5fc769f14'
* commit 'edbf0fffb15dde7a1de70b05855529d5fc769f14':
  hevc: Add NEON add_residual for bitdepth 10

Merged-by: James Almer <jamrial@gmail.com>
2017-10-30 18:07:31 -03:00
James Almer
05beee44c6 Merge commit 'e1c2453a4fac1f7116244d0d05310935c20887e6'
* commit 'e1c2453a4fac1f7116244d0d05310935c20887e6':
  arm: hevc_idct: Tune the add_res_8x8 and add_res_32x32 functions

Merged-by: James Almer <jamrial@gmail.com>
2017-10-30 17:41:08 -03:00
James Almer
999c2271a5 Merge commit '0d4d43513786f1df4d561e1fac924fb0722c6700'
* commit '0d4d43513786f1df4d561e1fac924fb0722c6700':
  hevc: Add NEON add_residual for bitdepth 8

See 03cecf45c1

Merged-by: James Almer <jamrial@gmail.com>
2017-10-30 17:39:37 -03:00
James Almer
f9c3fbc00c Merge commit '3d69dd65c6771c28d3bf4e8e53a905aa8cd01fd9'
* commit '3d69dd65c6771c28d3bf4e8e53a905aa8cd01fd9':
  hevc: Add support for bitdepth 10 for IDCT DC

Merged-by: James Almer <jamrial@gmail.com>
2017-10-30 16:03:27 -03:00
James Almer
cc8c2d3609 Merge commit '358adef0305618219522858e471edf7e0cb4043e'
* commit '358adef0305618219522858e471edf7e0cb4043e':
  hevc: Add NEON IDCT DC functions for bitdepth 8

See 03cecf45c1

Merged-by: James Almer <jamrial@gmail.com>
2017-10-30 15:58:40 -03:00
James Almer
9840ca70e7 Merge commit '89d9869d2491d4209d707a8e7f29c58227ae5a4e'
* commit '89d9869d2491d4209d707a8e7f29c58227ae5a4e':
  hevc: Add NEON 16x16 IDCT

Merged-by: James Almer <jamrial@gmail.com>
2017-10-27 18:22:39 -03:00
James Almer
c0683dce89 Merge commit '0b9a237b2386ff84a6f99716bd58fa27a1b767e7'
* commit '0b9a237b2386ff84a6f99716bd58fa27a1b767e7':
  hevc: Add NEON 4x4 and 8x8 IDCT

[15:12:59] <@ubitux> hevc_idct_4x4_8_c: 389.1
[15:13:00] <@ubitux> hevc_idct_4x4_8_neon: 126.6
[15:13:02] <@ubitux> our ^
[15:13:06] <@ubitux> hevc_idct_4x4_8_c: 389.3
[15:13:08] <@ubitux> hevc_idct_4x4_8_neon: 107.8
[15:13:10] <@ubitux> hevc_idct_4x4_10_c: 418.6
[15:13:12] <@ubitux> hevc_idct_4x4_10_neon: 108.1
[15:13:14] <@ubitux> libav ^
[15:13:30] <@ubitux> so yeah, we can probably trash our versions here

Merged-by: James Almer <jamrial@gmail.com>
2017-10-24 19:10:22 -03:00
Martin Storsjö
b487add7ec arm: Remove a redundant check in fmtconvert_init_arm.c
This was missed in e2710e790c, where have_vfp && !have_vfpv3 were
converted into have_vfp_vm.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-10-24 09:07:01 +03:00
Martin Storsjö
9dde6ab06c arm: Fix SIGBUS on ARM when compiled with binutils 2.29
In binutils 2.29, the behavior of the ADR instruction changed so that 1 is
added to the address of a Thumb function (previously nothing was added). This
allows the loaded address to be passed to a BLX instruction and the correct
mode change will occur.

See: https://sourceware.org/bugzilla/show_bug.cgi?id=21458

By using adr with a label that isn't annotated as a thumb function,
we avoid the new behaviour in binutils 2.29 and get the same behaviour
as in prior releases, and as in other assemblers (ms armasm.exe,
clang's built in assembler) - an idea that Janne Grunau came up with.
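
As a rough illustration of the convention described above: on ARM, bit 0 of a
branch target selects Thumb state for interworking branches such as BLX, so an
address that carries the Thumb bit is no longer the plain address of the first
instruction. The C sketch below only models that bit arithmetic. The helper
names are hypothetical and this is not code from the commit.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of FFmpeg: they model the ARM interworking
 * convention where bit 0 of a branch target selects Thumb state. */

/* Address as binutils 2.29 ADR yields it for a Thumb function label.
 * The label is at least 2-byte aligned, so OR-ing 1 equals adding 1. */
static inline uintptr_t with_thumb_bit(uintptr_t label_addr)
{
    return label_addr | (uintptr_t)1;
}

/* Nonzero if an interworking branch to this target would enter Thumb state. */
static inline int targets_thumb(uintptr_t target)
{
    return (int)(target & (uintptr_t)1);
}

/* Plain instruction address, usable as a base for offset arithmetic. */
static inline uintptr_t strip_thumb_bit(uintptr_t target)
{
    return target & ~(uintptr_t)1;
}
```

With a label that is not annotated as a Thumb function, the assembler leaves
bit 0 clear, which corresponds to the value `strip_thumb_bit` would return.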

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-09-02 22:18:20 +03:00
Muhammad Faiz
0780ad9c68 avcodec/rdft: remove sintable
It is redundant with costable. The first half of sintable is
identical to the second half of costable. The second half
of sintable is the negation of the first half of sintable.

The computation is changed to handle the sign of the sin values, in
both the C code and the ARM assembly code.

Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
2017-07-11 13:22:02 +07:00
Clément Bœsch
b12a36170b lavc/aacpsdsp: use ptrdiff_t for stride in hybrid_analysis 2017-06-28 12:22:39 +02:00
Clément Bœsch
e4a27e2f2d lavc/arm: fix lack of precision in ff_ps_stereo_interpolate_neon
The code originally pre-multiplied the steps by 2, causing the running sum
of the h factors to drift due to the lack of precision. This quickly
caused an inaccuracy > 0.01.

I tried various approaches, such as multiplying by 2.0 (instead of adding
the value to itself), without success.

I'm unable to benchmark the impact of this change; feel free to compare.

This commit fixes the incoming aacpsdsp tests.

Following is an alternative simplified function (matching the incoming
AArch64 code) that may be used:

function ff_ps_stereo_interpolate_neon, export=1
        vld1.32         {q0}, [r2]
        vld1.32         {q1}, [r3]
        ldr             r12, [sp]
        vmov.f32        q8, q0
        vmov.f32        q9, q1
        vzip.32         q8, q0
        vzip.32         q9, q1
1:
        vld1.32         {d4}, [r0,:64]
        vld1.32         {d6}, [r1,:64]
        vadd.f32        q8, q8, q9
        vadd.f32        q0, q0, q1
        vmov.f32        d5, d4
        vmov.f32        d7, d6
        vmul.f32        q2, q2, q8
        vmla.f32        q2, q3, q0
        vst1.32         {d4}, [r0,:64]!
        vst1.32         {d5}, [r1,:64]!
        subs            r12, r12, #1
        bgt             1b
        bx              lr
endfunc
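
For readers who do not read NEON, the loop above corresponds roughly to the
scalar C below, derived from the assembly listing itself. The function name and
the argument layout (four h coefficients, four per-sample steps, and l and r as
arrays of {re, im} pairs) are assumptions made for illustration. Unlike the
assembly, this sketch does nothing when len is 0.

```c
/* Scalar sketch of the NEON loop above (hypothetical name and signature).
 * l, r: len complex samples as {re, im} pairs, updated in place.
 * h: four mixing coefficients; h_step: their per-sample increments. */
static void ps_stereo_interpolate_sketch(float (*l)[2], float (*r)[2],
                                         const float h[4],
                                         const float h_step[4], int len)
{
    float h0 = h[0], h1 = h[1], h2 = h[2], h3 = h[3];

    for (int n = 0; n < len; n++) {
        const float l_re = l[n][0], l_im = l[n][1];
        const float r_re = r[n][0], r_im = r[n][1];

        /* Add the step once per sample (the vadd.f32 pair),
         * never a pre-doubled step. */
        h0 += h_step[0];
        h1 += h_step[1];
        h2 += h_step[2];
        h3 += h_step[3];

        /* vmul.f32 followed by vmla.f32 */
        l[n][0] = h0 * l_re + h2 * r_re;
        l[n][1] = h0 * l_im + h2 * r_im;
        r[n][0] = h1 * l_re + h3 * r_re;
        r[n][1] = h1 * l_im + h3 * r_im;
    }
}
```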
2017-06-28 11:59:34 +02:00
Martin Storsjö
d7320ca3ed arm: Avoid using .dn register aliases
clang now (in the upcoming 5.0 version) is capable of building our
arm assembly without relying on gas-preprocessor, although clang/LLVM
doesn't support .dn register aliases.

The VC1 MC assembly was only built and used if the chosen assembler
supported the .dn directives though. This was supported as long as
gas-preprocessor was used.

This means that VC1 decoding got a speed regression on clang 5.0,
unless the user manually chose using gas-preprocessor again.

By avoiding using the .dn register aliases, we can build the VC1 MC
assembly with the latest clang version.

Support for the .dn/.qn directives in clang/LLVM isn't actively planned,
see https://bugs.llvm.org/show_bug.cgi?id=18199.

This partially reverts 896a5bff64.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-05-15 09:52:18 +03:00
Alexandra Hájková
ce080f47b8 hevc: Add NEON 32x32 IDCT
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-05-04 14:08:39 +02:00
Alexandra Hájková
118dd4a321 hevc: 16x16 NEON idct: Use the right element size for loads/stores
This doesn't change the actual behaviour of the code but improves
readability.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-05-04 14:08:27 +02:00
Alexandra Hájková
edbf0fffb1 hevc: Add NEON add_residual for bitdepth 10
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-05-01 23:39:55 +03:00
Martin Storsjö
e1c2453a4f arm: hevc_idct: Tune the add_res_8x8 and add_res_32x32 functions
Before:              Cortex     A7      A8      A9     A53
hevc_add_res_8x8_8_neon:     116.0    58.7    80.2    90.7
hevc_add_res_32x32_8_neon:  1230.0   737.5  1187.5   974.4
After:
hevc_add_res_8x8_8_neon:      97.7    57.0    73.7    80.0
hevc_add_res_32x32_8_neon:  1216.0   698.7  1127.5   827.1

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-04-28 12:02:14 +03:00
Seppo Tomperi
0d4d435137 hevc: Add NEON add_residual for bitdepth 8
Optimized by Alexandra Hájková.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-04-27 23:05:27 +03:00
Alexandra Hájková
3d69dd65c6 hevc: Add support for bitdepth 10 for IDCT DC
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-04-25 22:48:45 +03:00
Seppo Tomperi
358adef030 hevc: Add NEON IDCT DC functions for bitdepth 8
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
Signed-off-by: Martin Storsjö <martin@martin.st>
2017-04-25 22:48:45 +03:00
Alexandra Hájková
89d9869d24 hevc: Add NEON 16x16 IDCT
The speedup vs C code is around 6-13x.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-04-12 22:40:54 +03:00
Ronald S. Bultje
40cbd686dc idct_arm: remove use of ff_put/add_pixels_clamped function pointer.
Instead, hardcode the use of the _arm implementation of add_pixels,
and use the C version for put_pixels (as no arm-optimized version
exists). Since there are separate implementations of idct{,_put,_add}
for neon, this has no practical impact on performance.
2017-04-06 10:03:27 -04:00
Ronald S. Bultje
0c46641784 vp9: split out generic decoding skeleton interface API from VP9 types.
This allows vp9dsp.h to only include the VP9 types header, and not the
decoder skeleton interface which is for hardware decoders (dxva2/vaapi).
2017-03-28 18:04:27 -04:00
Ronald S. Bultje
f8c019944d vp9: re-split the decoder/format/dsp interface header files.
The advantage here is that the internal software decoder interface is
not exposed to the DSP functions or the hardware accelerations.
2017-03-28 18:04:26 -04:00