FFmpeg/libavcodec/arm
Martin Storsjö 600f4c9b03 arm: vp9itxfm: Avoid reloading the idct32 coefficients
The idct32x32 function actually pushed q4-q7 onto the stack even
though it didn't clobber them; there are plenty of registers that
can be used to allow keeping all the idct coefficients in registers
without having to reload different subsets of them at different
stages in the transform.

Since the idct16 core transform avoids clobbering q4-q7 (but clobbers
q2-q3 instead, to avoid needing to back up and restore q4-q7 at all
in the idct16 function), and the lanewise vmul needs a register in
the q0-q3 range, we move the stored coefficients from q2-q3 into q4-q5
while doing idct16.

Even with these coefficients kept in registers, we can still skip
pushing q7.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_32x32_sub32_add_neon:  18553.8  17182.7  14303.3  12089.7
After:
vp9_inv_dct_dct_32x32_sub32_add_neon:  18470.3  16717.7  14173.6  11860.8
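
The relative gain per core is easier to judge as a percentage. The following is a minimal sketch that recomputes it from the Before/After figures quoted above. It is not part of the commit: the variable names are illustrative, and the unit of the figures is left open because the message does not state it.

```python
# Before/After figures copied from the table above for
# vp9_inv_dct_dct_32x32_sub32_add_neon. The column labels follow the
# table header; the unit is not stated in the commit message.
cores = ["Cortex A7", "A8", "A9", "A53"]
before = [18553.8, 17182.7, 14303.3, 12089.7]
after = [18470.3, 16717.7, 14173.6, 11860.8]

# Percentage reduction relative to the "Before" figure for each core.
reduction = {
    core: (b - a) / b * 100.0
    for core, b, a in zip(cores, before, after)
}

for core, pct in reduction.items():
    print(f"{core:>9}: {pct:.2f}% lower")
```

By this calculation the change is largest on the A8 column (roughly 2.7%) and smallest on the Cortex A7 column (under 0.5%).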

This is cherrypicked from libav commit
402546a172.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-11 13:14:51 +02:00
aac.h
aacpsdsp_init_arm.c Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
aacpsdsp_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
ac3dsp_arm.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
ac3dsp_armv6.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
ac3dsp_init_arm.c Merge commit '4958f35a2ebc307049ff2104ffb944f5f457feb3' 2013-12-09 04:12:40 +01:00
ac3dsp_neon.S Merge commit '4958f35a2ebc307049ff2104ffb944f5f457feb3' 2013-12-09 04:12:40 +01:00
asm-offsets.h Merge commit '6a13505c069890cb0e2a07e29fd819a0cf2e73c1' 2014-04-30 00:23:01 +02:00
audiodsp_arm.h Merge commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2' 2014-06-22 17:58:28 +02:00
audiodsp_init_arm.c Merge commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2' 2014-06-22 17:58:28 +02:00
audiodsp_init_neon.c Merge commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2' 2014-06-22 17:58:28 +02:00
audiodsp_neon.S Merge commit '9a9e2f1c8aa4539a261625145e5c1f46a8106ac2' 2014-06-22 17:58:28 +02:00
blockdsp_arm.h blockdsp: remove high bitdepth parameter 2015-10-02 04:38:40 +02:00
blockdsp_init_arm.c blockdsp: remove high bitdepth parameter 2015-10-02 04:38:40 +02:00
blockdsp_init_neon.c blockdsp: reindent after parameter removal 2015-10-03 23:34:56 +02:00
blockdsp_neon.S Merge commit 'e74433a8e6fc00c8dbde293c97a3e45384c2c1d9' 2014-06-19 04:54:38 +02:00
cabac.h avcodec/arm/cabac: fix inline cabac reader with the UNCHECKED bitstream reader 2014-03-15 01:08:45 +01:00
dca.h avcodec/dca: remove old decoder 2016-01-31 17:09:38 +01:00
fft_fixed_init_arm.c Merge commit '97aec6e75ef36ed0402653519daa8e1fc8ddb555' 2016-04-12 15:43:09 +01:00
fft_fixed_neon.S Merge commit 'f963f80399deb1a2b44c1bac3af7123e8a0c9e46' 2014-12-09 11:58:13 +01:00
fft_init_arm.c Merge commit '4c297249ac0f513a610a62691ce96d6b62f65b94' 2016-04-12 15:43:34 +01:00
fft_neon.S Merge commit 'f963f80399deb1a2b44c1bac3af7123e8a0c9e46' 2014-12-09 11:58:13 +01:00
fft_vfp.S Merge commit 'f963f80399deb1a2b44c1bac3af7123e8a0c9e46' 2014-12-09 11:58:13 +01:00
flacdsp_arm.S
flacdsp_init_arm.c lavc/flac: Fix encoding and decoding with high lpc. 2015-05-17 02:08:58 +02:00
fmtconvert_init_arm.c Merge commit '90b1b9350c0a97c4065ae9054b83e57f48a0de1f' 2016-01-02 11:21:36 +01:00
fmtconvert_neon.S Merge commit '90b1b9350c0a97c4065ae9054b83e57f48a0de1f' 2016-01-02 11:21:36 +01:00
fmtconvert_vfp.S Merge commit 'f0389eb777b1ab4291329d4f709098cdfa7384dc' 2013-08-29 16:10:39 +02:00
g722dsp_init_arm.c Merge commit '702458538d4e52809bcef460d39baabf061b16b5' 2015-02-16 02:16:29 +01:00
g722dsp_neon.S Merge commit '702458538d4e52809bcef460d39baabf061b16b5' 2015-02-16 02:16:29 +01:00
h264chroma_init_arm.c Merge commit '79dad2a932534d1155079f937649e099f9e5cc27' 2013-02-07 13:09:35 +01:00
h264cmc_neon.S avcodec: fix vc1dsp dependencies 2016-09-25 13:11:45 +02:00
h264dsp_init_arm.c lavc/arm: Use the neon vertical chroma loop filter also for H.264 4:2:2. 2015-01-31 10:05:24 +01:00
h264dsp_neon.S
h264idct_neon.S Merge commit '5bcbb516f2ff45290ef7995b081762e668693672' 2014-02-08 00:48:26 +01:00
h264pred_init_arm.c Merge commit '256ef19844892c6cf8e0386e3287bae970ec6320' 2015-07-18 02:13:22 +02:00
h264pred_neon.S
h264qpel_init_arm.c Merge commit '7fb993d338d88f2f62e0a358b6c9f3eb9a3a08ac' 2014-07-25 13:05:08 +02:00
h264qpel_neon.S
hevcdsp_arm.h hevcdsp: fix compilation for arm and aarch64 2015-03-12 20:01:01 +01:00
hevcdsp_deblock_neon.S hevcdsp: HEVC deblocking ARM NEON register clobber fix 2015-02-16 13:27:41 +01:00
hevcdsp_idct_neon.S Merge commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d' 2017-01-31 15:31:34 +01:00
hevcdsp_init_arm.c hevcdsp: fix compilation for arm and aarch64 2015-03-12 20:01:01 +01:00
hevcdsp_init_neon.c Merge commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d' 2017-01-31 15:31:34 +01:00
hevcdsp_qpel_neon.S avcodec/hevcdsp: ARM NEON optimized qpel functions 2015-02-25 18:39:51 +01:00
hpeldsp_arm.h Merge commit '7151c5d04aed3b496c21f713dcb603e2cbdb9c49' 2014-01-14 14:38:10 +01:00
hpeldsp_arm.S Merge commit '831a1180785a786272cdcefb71566a770bfb879e' 2014-03-13 23:59:56 +01:00
hpeldsp_armv6.S Merge commit '61985ad72c47bbb668f2d3923bf5c9df83e79323' 2014-03-09 01:16:21 +01:00
hpeldsp_init_arm.c Merge commit '322a1dda973e802db7b57f2007fad3efcd5bab81' 2014-03-22 22:53:33 +01:00
hpeldsp_init_armv6.c Merge commit '7384b7a71338d960e421d6dc3d77da09b0a442cb' 2013-04-20 14:19:08 +02:00
hpeldsp_init_neon.c Merge commit '7384b7a71338d960e421d6dc3d77da09b0a442cb' 2013-04-20 14:19:08 +02:00
hpeldsp_neon.S arm: hpeldsp: Move half-pel assembly from dsputil to hpeldsp 2013-04-19 23:19:08 +03:00
idct.h Merge commit '4de8b60684ce13dff3e3d372dae4f49b9e53f755' 2014-07-21 01:56:22 +02:00
idctdsp_arm.h Merge commit 'e3fcb14347466095839c2a3c47ebecff02da891e' 2014-07-01 15:22:11 +02:00
idctdsp_arm.S avcodec/idctdsp: change {put,add}_pixels_clamped to ptrdiff_t line_size 2014-09-24 21:43:19 -03:00
idctdsp_armv6.S Merge commit 'e3fcb14347466095839c2a3c47ebecff02da891e' 2014-07-01 15:22:11 +02:00
idctdsp_init_arm.c Merge commit '7c6eb0a1b7bf1aac7f033a7ec6d8cacc3b5c2615' 2015-07-27 22:10:35 +02:00
idctdsp_init_armv5te.c Merge commit '4de8b60684ce13dff3e3d372dae4f49b9e53f755' 2014-07-21 01:56:22 +02:00
idctdsp_init_armv6.c Merge commit '7c6eb0a1b7bf1aac7f033a7ec6d8cacc3b5c2615' 2015-07-27 22:10:35 +02:00
idctdsp_init_neon.c avcodec/idctdsp: change {put,add}_pixels_clamped to ptrdiff_t line_size 2014-09-24 21:43:19 -03:00
idctdsp_neon.S Merge commit 'e3fcb14347466095839c2a3c47ebecff02da891e' 2014-07-01 15:22:11 +02:00
int_neon.S Merge commit '054013a0fc6f2b52c60cee3e051be8cc7f82cef3' 2014-05-30 00:59:15 +02:00
jrevdct_arm.S
lossless_audiodsp_init_arm.c apedsp: move to llauddsp 2014-06-05 20:31:59 +02:00
lossless_audiodsp_neon.S apedsp: move to llauddsp 2014-06-05 20:31:59 +02:00
Makefile arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:35:59 +02:00
mathops.h
mdct_fixed_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
mdct_neon.S Merge commit '5bcbb516f2ff45290ef7995b081762e668693672' 2014-02-08 00:48:26 +01:00
mdct_vfp.S armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6) 2014-07-18 01:34:08 +03:00
me_cmp_armv6.S Merge commit '2d60444331fca1910510038dd3817bea885c2367' 2014-07-17 23:27:40 +02:00
me_cmp_init_arm.c Merge commit '9c12c6ff9539e926df0b2a2299e915ae71872600' 2014-11-24 12:13:00 +01:00
mlpdsp_armv5te.S Merge commit '4c81613df499ba81d64ea102b38d0c6686cc304c' 2014-12-10 00:51:26 +01:00
mlpdsp_armv6.S Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' 2016-06-21 21:55:34 +02:00
mlpdsp_init_arm.c Merge remote-tracking branch 'qatar/master' 2014-03-26 21:23:09 +01:00
mpegaudiodsp_fixed_armv6.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
mpegaudiodsp_init_arm.c Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
mpegvideo_arm.c Merge commit '835f798c7d20bca89eb4f3593846251ad0d84e4b' 2014-08-15 20:11:56 +02:00
mpegvideo_arm.h Merge commit '835f798c7d20bca89eb4f3593846251ad0d84e4b' 2014-08-15 20:11:56 +02:00
mpegvideo_armv5te_s.S
mpegvideo_armv5te.c Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' 2016-06-21 21:55:34 +02:00
mpegvideo_neon.S Merge commit '5bcbb516f2ff45290ef7995b081762e668693672' 2014-02-08 00:48:26 +01:00
mpegvideoencdsp_armv6.S Merge commit 'c166148409fe8f0dbccef2fe684286a40ba1e37d' 2014-07-07 15:36:58 +02:00
mpegvideoencdsp_init_arm.c Merge commit 'c166148409fe8f0dbccef2fe684286a40ba1e37d' 2014-07-07 15:36:58 +02:00
neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
neontest.c avcodec: fix arguments on xmm/neon clobber test wrappers 2016-10-02 02:15:47 -03:00
pixblockdsp_armv6.S Merge commit 'f46bb608d9d76c543e4929dc8cffe36b84bd789e' 2014-07-10 01:22:14 +02:00
pixblockdsp_init_arm.c avcodec: Change get_pixels() to ptrdiff_t linesize 2014-08-06 15:50:54 +02:00
rdft_init_arm.c arm/rdft_init: fix license header 2016-04-12 15:01:19 -03:00
rdft_neon.S
rv34dsp_init_arm.c Merge commit 'a846dccb29d2bb0798af1d47d06100eda9ca87cc' 2013-02-07 13:35:49 +01:00
rv34dsp_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
rv40dsp_init_arm.c Merge commit '7fb993d338d88f2f62e0a358b6c9f3eb9a3a08ac' 2014-07-25 13:05:08 +02:00
rv40dsp_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
sbrdsp_init_arm.c Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
sbrdsp_neon.S Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
simple_idct_arm.S Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' 2016-06-21 21:55:34 +02:00
simple_idct_armv5te.S
simple_idct_armv6.S
simple_idct_neon.S
startcode_armv6.S h264: Move start code search functions into separate source files. 2014-08-04 22:22:54 +02:00
startcode.h Merge commit 'db7f1c7c5a1d37e7f4da64a79a97bea1c4b6e9f8' 2014-08-05 12:46:10 +02:00
synth_filter_init_arm.c avcodec/synth_filter: split off remaining code from dcadec files 2016-01-25 14:57:38 -03:00
synth_filter_neon.S
synth_filter_vfp.S Merge commit '7e18a727d2c2a19f22fcf68875d1b05fd2eafcef' 2014-07-18 13:17:29 +02:00
vc1dsp_init_arm.c Fix compile error on arm4/arm5 platform 2014-09-23 21:11:05 +02:00
vc1dsp_init_neon.c Merge commit '896a5bff64264f4d01ed98eacc97a67260c1e17e' 2014-06-03 18:19:21 +02:00
vc1dsp_neon.S Merge commit '896a5bff64264f4d01ed98eacc97a67260c1e17e' 2014-06-03 18:19:21 +02:00
vc1dsp.h Merge commit '832e19063209a5f355af733d1a45f5051f49ce33' 2013-12-20 23:12:16 +01:00
videodsp_arm.h
videodsp_armv5te.S arm: use a local label instead of the function symbol in ff_prefetch_arm 2015-07-20 23:10:29 +02:00
videodsp_init_arm.c Merge commit '620289a20e022b9c16c10d546ef86cc0bb77cc84' 2013-02-06 13:27:24 +01:00
videodsp_init_armv5te.c Merge commit '620289a20e022b9c16c10d546ef86cc0bb77cc84' 2013-02-06 13:27:24 +01:00
vorbisdsp_init_arm.c Merge commit '620289a20e022b9c16c10d546ef86cc0bb77cc84' 2013-02-06 13:27:24 +01:00
vorbisdsp_neon.S
vp3dsp_init_arm.c Merge commit '3dc6272bed7890a49080e18eacf3c7a4a6594b0d' 2014-04-05 18:54:15 +02:00
vp3dsp_neon.S Merge remote-tracking branch 'qatar/master' 2014-01-08 05:44:56 +01:00
vp6dsp_init_arm.c Merge commit '8506ff97c9ea4a1f52983497ecf8d4ef193403a9' 2013-08-24 11:04:11 +02:00
vp6dsp_neon.S Merge commit '8506ff97c9ea4a1f52983497ecf8d4ef193403a9' 2013-08-24 11:04:11 +02:00
vp8_armv6.S
vp8.h arm: asm decode_block_coeffs_internal is vp8 specific 2014-04-04 10:39:29 +02:00
vp8dsp_armv6.S Merge commit '5f74bd31a9bd1ac7655103b11743c12d38e0419f' 2016-11-17 15:05:07 +01:00
vp8dsp_init_arm.c Merge commit 'ac4b32df71bd932838043a4838b86d11e169707f' 2014-04-04 14:46:10 +02:00
vp8dsp_init_armv6.c Merge commit 'ac4b32df71bd932838043a4838b86d11e169707f' 2014-04-04 14:46:10 +02:00
vp8dsp_init_neon.c Merge commit 'ac4b32df71bd932838043a4838b86d11e169707f' 2014-04-04 14:46:10 +02:00
vp8dsp_neon.S Merge commit 'e8b96a77010dd62624c3c65c357d7ae3b397ceaa' 2016-11-14 15:21:49 +01:00
vp8dsp.h Merge commit 'ac4b32df71bd932838043a4838b86d11e169707f' 2014-04-04 14:46:10 +02:00
vp9dsp_init_10bpp_arm.c arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9dsp_init_12bpp_arm.c arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9dsp_init_16bpp_arm_template.c arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:35:59 +02:00
vp9dsp_init_arm.c arm: vp9lpf: Implement the mix2_44 function with one single filter pass 2017-03-11 13:14:51 +02:00
vp9dsp_init.h arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9itxfm_16bpp_neon.S arm: Add NEON optimizations for 10 and 12 bit vp9 itxfm 2017-01-24 22:35:56 +02:00
vp9itxfm_neon.S arm: vp9itxfm: Avoid reloading the idct32 coefficients 2017-03-11 13:14:51 +02:00
vp9lpf_16bpp_neon.S arm: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:35:59 +02:00
vp9lpf_neon.S arm: vp9lpf: Implement the mix2_44 function with one single filter pass 2017-03-11 13:14:51 +02:00
vp9mc_16bpp_neon.S arm: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:35:50 +02:00
vp9mc_neon.S arm: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter 2017-03-11 13:14:47 +02:00
vp56_arith.h