Commit Graph

84235 Commits

Author SHA1 Message Date
Clément Bœsch
4563a86f01 Merge commit 'ab3554e1a7c04a5ea30f9c905de92348478ef7c8'
* commit 'ab3554e1a7c04a5ea30f9c905de92348478ef7c8':
  configure: Drop check_lib()/require() in favor of check_lib2()/require2()

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 12:23:02 +01:00
Clément Bœsch
8e9dfe0d29 Merge commit '468bfe38c66d4d020984158e53b09a6a5749f394'
* commit '468bfe38c66d4d020984158e53b09a6a5749f394':
  ppc: mpegvideo: Add proper runtime AltiVec detection

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 12:08:11 +01:00
Clément Bœsch
7c54e5870f Merge commit '6ce93757ee6b81fe727bfdc9f546fd0ddf9139c3'
* commit '6ce93757ee6b81fe727bfdc9f546fd0ddf9139c3':
  ppc: Update #endif comments

This commit is mostly a noop as we seem to support PPC LE (see
902ce2a6c4). Only the h264 chunks are updated.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 12:06:51 +01:00
Clément Bœsch
9e8fd5c423 Merge commit 'caccb3a0cdc7ee32cbed7eab156d35025133eadc'
* commit 'caccb3a0cdc7ee32cbed7eab156d35025133eadc':
  audiodsp: ppc: Add VSX variant

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 11:57:32 +01:00
Clément Bœsch
3c8f7a8f6b Merge commit 'e89cef40506d990a982aefedfde7d3ca4f88c524'
* commit 'e89cef40506d990a982aefedfde7d3ca4f88c524':
  checkasm: Read the unsigned value as it should

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 11:55:20 +01:00
Clément Bœsch
9785b1e21b Merge commit '75d642a944d5579e4ef20ff3701422a64692afcf'
* commit '75d642a944d5579e4ef20ff3701422a64692afcf':
  vaapi_vp8: Explicitly include libva vp8 decode header
  vaapi_decode: Ignore the profile when not useful
  lavc/vaapi: Add VP8 decode hwaccel
  vp8: Add hwaccel hooks

This merge is a noop as these commits are already under review on the
mailing list. doc/libav-merge.txt is updated to track its progress.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 11:54:29 +01:00
Clément Bœsch
eed8ccde3e Merge commit '131a85a1fed9966bbd38517f76abfac0237e39dc'
* commit '131a85a1fed9966bbd38517f76abfac0237e39dc':
  utvideo: Change type of array stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 11:33:48 +01:00
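
The stride/pitch commits in this series switch those parameters to ptrdiff_t. A minimal sketch of why a signed, pointer-sized stride is convenient follows; sum_first_column is a hypothetical helper for illustration, not code from utvideo or any other FFmpeg file. It assumes the stride is a byte distance between rows and may be negative for bottom-up images.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not FFmpeg code): add up the first byte of each row.
 * `stride` is the distance in bytes between consecutive rows. It is a
 * ptrdiff_t so that it can be negative and still has pointer width. */
static unsigned sum_first_column(const uint8_t *src, ptrdiff_t stride, int height)
{
    unsigned sum = 0;

    assert(height >= 0);
    for (int y = 0; y < height; y++) {
        /* The row offset is computed in ptrdiff_t, so it cannot be
         * truncated to 32 bits on a 64-bit target. */
        sum += src[(ptrdiff_t)y * stride];
    }
    return sum;
}
```

With a 3-row, 2-byte-wide buffer, passing the first row and a stride of 2 walks the image top-down, while passing the last row and a stride of -2 walks the same rows bottom-up.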
Clément Bœsch
8316a0e08b Merge commit '52730e0f867fe77b7d2353d8b44e92edb7079ca5'
* commit '52730e0f867fe77b7d2353d8b44e92edb7079ca5':
  iir_filter: Change type of array stride parameters to ptrdiff_t

The merge also updates the MIPS code and drops the extra log.h include.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 11:27:48 +01:00
Clément Bœsch
d36a423445 Merge commit '6b52762951fa138eef59e2628dabb389e0500e40'
* commit '6b52762951fa138eef59e2628dabb389e0500e40':
  error_resilience: Change type of array stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 11:10:46 +01:00
Clément Bœsch
100026bed6 Merge commit 'ec903058447ad5be34d89533962e9ae1aa1c78f7'
* commit 'ec903058447ad5be34d89533962e9ae1aa1c78f7':
  configure: Simplify clock_gettime() test

nanosleep check also updated.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 11:04:50 +01:00
Clément Bœsch
38343651a8 Merge commit '3aa9d37d03da3c9b482d19b3988659287815280e'
* commit '3aa9d37d03da3c9b482d19b3988659287815280e':
  build: Fix directory dependencies of tests/pixfmts.mak target

This might not be necessary given our mkdirs in the configure, but it
probably doesn't hurt.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 11:01:02 +01:00
Clément Bœsch
4ae80c3753 Merge commit '0e5dde739943168d6f61d3fb40b3f622e7abfeff'
* commit '0e5dde739943168d6f61d3fb40b3f622e7abfeff':
  configure: Fix --disable-pod2man / --disable-texi2html

This commit is a noop, we have a dedicated documentation option for this
purpose.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 10:47:01 +01:00
Clément Bœsch
d0db00c808 configure: remove pod2man from the config list
configure has the --disable-manpages option for this purpose, and
--disable-pod2man is currently ignored due to that. This is also
consistent with the other documentation options.
2017-03-20 10:45:48 +01:00
Clément Bœsch
715f781834 Merge commit 'b8c2d407efa41c3db6813ad67fadd51b814765bd'
* commit 'b8c2d407efa41c3db6813ad67fadd51b814765bd':
  configure: Simplify libopenjpeg check

This commit is a noop, our libopenjpeg check is already "simpler".

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 09:48:22 +01:00
Clément Bœsch
6d6f79c737 Merge commit '2610c9528f86286e4c6e174411a26ff5b4815cde'
* commit '2610c9528f86286e4c6e174411a26ff5b4815cde':
  configure: Move initial VAAPI check to a more sensible place

This commit is a noop, see 17989dcf54

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 09:46:33 +01:00
Clément Bœsch
7317b69630 Merge commit '5b5ed92d92252a685e891a5d636870e223b63228'
* commit '5b5ed92d92252a685e891a5d636870e223b63228':
  sanm: Change type of array pitch parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 09:43:52 +01:00
Clément Bœsch
64926292a6 lavc/copy_block: style fix
2017-03-20 09:23:15 +01:00
Clément Bœsch
21c18b0878 Merge commit '73f5e17a203713c4ac4e5a821809823b383b195f'
* commit '73f5e17a203713c4ac4e5a821809823b383b195f':
  copy_block: Change type of array stride parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 09:22:36 +01:00
Clément Bœsch
e59d8d030f Merge commit '21e500ba647aec233d5930d3d1081489d0d53ceb'
* commit '21e500ba647aec233d5930d3d1081489d0d53ceb':
  svq1dec: Change type of array pitch parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 09:17:34 +01:00
Clément Bœsch
bb3ad401fc Merge commit '746c56b7730ce09397d3a8354acc131285e9d829'
* commit '746c56b7730ce09397d3a8354acc131285e9d829':
  indeo: Change type of array pitch parameters to ptrdiff_t

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 09:07:57 +01:00
Clément Bœsch
3835283293 Merge commit '4fb311c804098d78e5ce5f527f9a9c37536d3a08'
* commit '4fb311c804098d78e5ce5f527f9a9c37536d3a08':
  Drop memalign hack

Merged, as this may indeed be unneeded since
46e3936fb0.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:54:44 +01:00
Clément Bœsch
a5cf6628d6 Merge commit 'f01f7a7846529b7c3ef343f117eaa2c0a1457af0'
* commit 'f01f7a7846529b7c3ef343f117eaa2c0a1457af0':
  hwcontext_dxva2: use the special UC copy for downloading frames

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:37:40 +01:00
Clément Bœsch
8200b16a9c Merge commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5'
* commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5':
  imgutils: add a function for copying image data from GPU mapped memory

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:34:10 +01:00
Clément Bœsch
5d23543277 Merge commit '24da430324735f95880c4a4a54298dc8023125bb'
* commit '24da430324735f95880c4a4a54298dc8023125bb':
  Changelog: mark the release 12 branch

This commit is a noop.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:26:09 +01:00
Clément Bœsch
518961bc99 Merge commit '851960f6f8cf1f946fe42fa36cf6598fac68072c'
* commit '851960f6f8cf1f946fe42fa36cf6598fac68072c':
  lavc: Remove old vaapi decode infrastructure
  avconv_vaapi: Convert to use hw_frames_ctx only
  vaapi_mpeg4: Convert to use the new VAAPI hwaccel code
  vaapi_vc1: Convert to use the new VAAPI hwaccel code
  vaapi_mpeg2: Convert to use the new VAAPI hwaccel code
  vaapi_h264: Convert to use the new VAAPI hwaccel code
  lavc: Rewrite VAAPI decode infrastructure

This merge is a noop, these commits have already been cherry-picked.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:25:01 +01:00
Clément Bœsch
464fcc979c Merge commit '72eba6558ee4f10239ba3f472c0b033ec70082a7'
* commit '72eba6558ee4f10239ba3f472c0b033ec70082a7':
  wmavoice: Simplify GetBitContext initialization

This commit is a noop. We don't have that code anymore since
3deb4b54a2.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:21:09 +01:00
Clément Bœsch
e514a1d404 Merge commit '80fc75d51e3312e1890591048eb6a3d499b6e49d'
* commit '80fc75d51e3312e1890591048eb6a3d499b6e49d':
  Changelog: Mention mov with multiple stsd

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:19:03 +01:00
Clément Bœsch
45982bdcd0 Merge commit '728e80cd2e1d4b7c3e26489efcd77bd7a9e84a99'
* commit '728e80cd2e1d4b7c3e26489efcd77bd7a9e84a99':
  High Definition Compatible Digital (HDCD) decoder filter, using libhdcd

This commit is a noop, we have that code natively.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:17:09 +01:00
Clément Bœsch
b1a80bdb62 Merge commit '95f80293456d9d4b1b096621260c38bc90325ec0'
* commit '95f80293456d9d4b1b096621260c38bc90325ec0':
  avprobe: Fix memory leak

This commit is a noop, ffprobe is not affected.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:12:57 +01:00
Clément Bœsch
5e5e793552 doc/APIchanges: fill date & hash for AV_PIX_FMT_FLAG_BAYER
2017-03-20 08:10:54 +01:00
Clément Bœsch
6557d784d2 Merge commit '8db804e8f549d5b86a1edf62736e0ef80f160da9'
* commit '8db804e8f549d5b86a1edf62736e0ef80f160da9':
  mov: Remove old b-frame/video delay heuristic

This commit is a noop, see 425be3c810

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:09:15 +01:00
Clément Bœsch
64722057b4 Merge commit 'eb96505b761eb02b6a3efc76d854afa6a41941ff'
* commit 'eb96505b761eb02b6a3efc76d854afa6a41941ff':
  mov: Remove ancient heuristic hack

This commit is a noop, see 04f8d31287

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:08:31 +01:00
Clément Bœsch
e811f84a2e swscale: cosmetics in is{RGB,BGR}inInt
Reduce diff with Libav.
2017-03-20 08:02:30 +01:00
Clément Bœsch
d6635daded swscale: remove unused is{RGB,BGR}inBytes
2017-03-20 08:02:30 +01:00
Clément Bœsch
ff6bc16c5a swscale: use a (more correct) function for isPacked
2017-03-20 08:02:30 +01:00
Clément Bœsch
2b9a52bcca swscale: use a function for isAnyRGB
2017-03-20 08:02:30 +01:00
Clément Bœsch
c30875e8b2 swscale: use a function for isBayer
2017-03-20 08:02:30 +01:00
Clément Bœsch
9c2436e1e7 lavu: add AV_PIX_FMT_FLAG_BAYER
2017-03-20 08:02:30 +01:00
Clément Bœsch
f052b1b40f swscale: use a function for isGray
2017-03-20 08:02:30 +01:00
Clément Bœsch
08e1376d81 fate: add fate-sws-pixdesc-query
Test the pixel format querying within libswscale.
2017-03-20 08:02:30 +01:00
Michael Niedermayer
23f3f92361 avcodec/mjpegdec: quant_matrixes can be up to 65535, use uint16_t
Fixes invalid shift
Fixes: 870/clusterfuzz-testcase-5649105424482304

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-20 01:38:04 +01:00
Michael Niedermayer
656a17e126 avcodec/mjpegdec: Check quant_matrixes values for being non zero
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-20 01:38:02 +01:00
Michael Niedermayer
98da63b3f5 avcodec/vp56: Check avctx->error_concealment before enabling EC
Fixes timeout with 847/clusterfuzz-testcase-5291877358108672
Fixes timeout with 850/clusterfuzz-testcase-5721296509861888

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-20 01:33:08 +01:00
Michael Niedermayer
a84d610b37 avcodec/h264_direct: Fix runtime error: signed integer overflow: -9 - 2147483647 cannot be represented in type 'int'
Fixes: 864/clusterfuzz-testcase-4774385942528000

See: [FFmpeg-devel] [PATCH 1/2] avcodec/h264_direct: Fix runtime error: signed integer overflow: 2147483647 - -14133 cannot be represented in type 'int'
See: [FFmpeg-devel] [PATCH 2/2] avcodec/h264_direct: Fix runtime error: signed integer overflow: -9 - 2147483647 cannot be represented in type 'int'

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-20 01:33:08 +01:00
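
The reported error, -9 - 2147483647 in type 'int', is undefined behaviour in C. The actual patch is not shown in this log; the sketch below only illustrates one common way to avoid such an overflow, by subtracting in a wider type and clipping the result. sub_clip_int is a hypothetical name, and the clipping policy is an assumption, not necessarily what h264_direct does.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not the h264_direct fix): subtract two ints without
 * signed overflow by widening to 64 bits, then clip back to int range. */
static int sub_clip_int(int a, int b)
{
    int64_t d;

    /* Widening only helps if int is narrower than 64 bits. */
    assert(sizeof(int) < sizeof(int64_t));

    d = (int64_t)a - b;
    if (d > INT_MAX)
        return INT_MAX;
    if (d < INT_MIN)
        return INT_MIN;
    return (int)d;
}
```

Both expressions quoted in the commit message stay well defined with this helper: the first saturates at INT_MIN and the second at INT_MAX.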
Michael Niedermayer
5d996b5649 avcodec/tiff: Check stripsize strippos for overflow
Fixes: 861/clusterfuzz-testcase-5688284384591872

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-20 01:33:08 +01:00
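
A strip position and size read from a file can wrap around when added together, so a naive "pos + size <= buf_size" test may pass for invalid input. The sketch below shows an overflow-safe form of that check under assumed 32-bit unsigned fields; strip_in_bounds and its parameters are hypothetical and are not taken from the tiff decoder.

```c
#include <stdint.h>

/* Hypothetical helper (not the tiff.c fix): return 1 if the byte range
 * [pos, pos + size) lies inside a buffer of buf_size bytes, else 0.
 * The sum pos + size is never formed, so it cannot wrap around. */
static int strip_in_bounds(uint32_t pos, uint32_t size, uint32_t buf_size)
{
    if (pos > buf_size)
        return 0;
    /* buf_size - pos cannot underflow because pos <= buf_size here. */
    return size <= buf_size - pos;
}
```

For example, pos = 16 and size = 0xFFFFFFF0 sum to zero in 32-bit arithmetic, which the naive comparison would accept; the form above rejects it.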
Martin Storsjö
61b8a9ea29 aarch64: vp9itxfm16: Do a simpler half/quarter idct16/idct32 when possible
This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 21512 bytes to 31400 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:
vp9_inv_dct_dct_16x16_sub1_add_10_neon:     284.6
vp9_inv_dct_dct_16x16_sub2_add_10_neon:    1902.7
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    1903.0
vp9_inv_dct_dct_16x16_sub8_add_10_neon:    2201.1
vp9_inv_dct_dct_16x16_sub12_add_10_neon:   2510.0
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   2821.3
vp9_inv_dct_dct_32x32_sub1_add_10_neon:    1011.6
vp9_inv_dct_dct_32x32_sub2_add_10_neon:    9716.5
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    9704.9
vp9_inv_dct_dct_32x32_sub8_add_10_neon:   10641.7
vp9_inv_dct_dct_32x32_sub12_add_10_neon:  11555.7
vp9_inv_dct_dct_32x32_sub16_add_10_neon:  12499.8
vp9_inv_dct_dct_32x32_sub20_add_10_neon:  13403.7
vp9_inv_dct_dct_32x32_sub24_add_10_neon:  14335.8
vp9_inv_dct_dct_32x32_sub28_add_10_neon:  15253.6
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  16179.5

After:
vp9_inv_dct_dct_16x16_sub1_add_10_neon:     282.8
vp9_inv_dct_dct_16x16_sub2_add_10_neon:    1142.4
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    1139.0
vp9_inv_dct_dct_16x16_sub8_add_10_neon:    1772.9
vp9_inv_dct_dct_16x16_sub12_add_10_neon:   2515.2
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   2823.5
vp9_inv_dct_dct_32x32_sub1_add_10_neon:    1012.7
vp9_inv_dct_dct_32x32_sub2_add_10_neon:    6944.4
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    6944.2
vp9_inv_dct_dct_32x32_sub8_add_10_neon:    7609.8
vp9_inv_dct_dct_32x32_sub12_add_10_neon:   9953.4
vp9_inv_dct_dct_32x32_sub16_add_10_neon:  10770.1
vp9_inv_dct_dct_32x32_sub20_add_10_neon:  13418.8
vp9_inv_dct_dct_32x32_sub24_add_10_neon:  14330.7
vp9_inv_dct_dct_32x32_sub28_add_10_neon:  15257.1
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  16190.6

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:54:37 +02:00
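
To put the "pretty substantial speedup" in numbers, the before/after timings can be turned into ratios. The sketch below copies four pairs from the aarch64 table above and assumes lower timings are better; speedup, best_speedup and the samples table are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: ratio of timings; a value above 1.0 means faster. */
static double speedup(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}

/* A few before/after pairs copied from the aarch64 table above. */
static const struct {
    const char *name;
    double before, after;
} samples[] = {
    { "16x16_sub2",   1902.7,  1142.4 },
    { "16x16_sub8",   2201.1,  1772.9 },
    { "32x32_sub2",   9716.5,  6944.4 },
    { "32x32_sub16", 12499.8, 10770.1 },
};

/* Largest speedup among the sampled rows. */
static double best_speedup(void)
{
    double best = 0.0;

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        double s = speedup(samples[i].before, samples[i].after);
        if (s > best)
            best = s;
    }
    return best;
}
```

On these rows the 16x16 sub2 case gains the most, roughly a factor of 1.67, while the 32x32 sub2 case improves by about 1.4.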
Martin Storsjö
eabc5abf94 arm: vp9itxfm16: Do a simpler half/quarter idct16/idct32 when possible
This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 14516 bytes to 22484 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:                                 Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_10_neon:     454.0    270.7    418.5    295.4
vp9_inv_dct_dct_16x16_sub2_add_10_neon:    3840.2   3244.8   3700.1   2337.9
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    4212.5   3575.4   3996.9   2571.6
vp9_inv_dct_dct_16x16_sub8_add_10_neon:    5174.4   4270.5   4615.5   3031.9
vp9_inv_dct_dct_16x16_sub12_add_10_neon:   5676.0   4908.5   5226.5   3491.3
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6403.9   5589.0   5839.8   3948.5
vp9_inv_dct_dct_32x32_sub1_add_10_neon:    1710.7    944.7   1582.1   1045.4
vp9_inv_dct_dct_32x32_sub2_add_10_neon:   21040.7  16706.1  18687.7  13193.1
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   22197.7  18282.7  19577.5  13918.6
vp9_inv_dct_dct_32x32_sub8_add_10_neon:   24511.5  20911.5  21472.5  15367.5
vp9_inv_dct_dct_32x32_sub12_add_10_neon:  26939.5  24264.3  23239.1  16830.3
vp9_inv_dct_dct_32x32_sub16_add_10_neon:  29419.5  26845.1  25020.6  18259.9
vp9_inv_dct_dct_32x32_sub20_add_10_neon:  31146.4  29633.5  26803.3  19721.7
vp9_inv_dct_dct_32x32_sub24_add_10_neon:  33376.3  32507.8  28642.4  21174.2
vp9_inv_dct_dct_32x32_sub28_add_10_neon:  35629.4  35439.6  30416.5  22625.7
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37269.9  37914.9  32271.9  24078.9

After:                                  Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_10_neon:     454.0    276.0    418.5    295.1
vp9_inv_dct_dct_16x16_sub2_add_10_neon:    2336.2   1886.0   2251.0   1458.6
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    2531.0   2054.7   2402.8   1591.1
vp9_inv_dct_dct_16x16_sub8_add_10_neon:    3848.6   3491.1   3845.7   2554.8
vp9_inv_dct_dct_16x16_sub12_add_10_neon:   5703.8   4831.6   5230.8   3493.4
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6399.5   5567.0   5832.4   3951.5
vp9_inv_dct_dct_32x32_sub1_add_10_neon:    1722.1    938.5   1577.3   1044.5
vp9_inv_dct_dct_32x32_sub2_add_10_neon:   15003.5  11576.8  13105.8   9602.2
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   15768.5  12677.2  13726.0  10138.1
vp9_inv_dct_dct_32x32_sub8_add_10_neon:   17278.8  14825.4  14907.5  11185.7
vp9_inv_dct_dct_32x32_sub12_add_10_neon:  22335.7  21544.5  20379.5  15019.8
vp9_inv_dct_dct_32x32_sub16_add_10_neon:  24165.6  23881.7  21938.6  16308.2
vp9_inv_dct_dct_32x32_sub20_add_10_neon:  31082.2  30860.9  26835.3  19711.3
vp9_inv_dct_dct_32x32_sub24_add_10_neon:  33102.6  31922.8  28638.3  21161.0
vp9_inv_dct_dct_32x32_sub28_add_10_neon:  35104.9  34867.5  30411.7  22621.2
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37438.1  39103.4  32217.8  24067.6

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:54:33 +02:00
Martin Storsjö
d564c9018f aarch64: vp9itxfm16: Move the load_add_store macro out from the itxfm16 pass2 function
This allows reusing the macro for a separate implementation of the
pass2 function.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:54:30 +02:00
Martin Storsjö
0f2705e66b aarch64: vp9itxfm16: Make the larger core transforms standalone functions
This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/aarch64/vp9itxfm_16bpp_neon.o from
26288 to 21512 bytes.

This gives a small slowdown of a couple of tens of cycles, but makes
it more feasible to add more optimized versions of these transforms.

Before:
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    1887.4
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   2801.5
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    9691.4
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  16154.9

After:
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    1899.5
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   2827.2
vp9_inv_dct_dct_32x32_sub4_add_10_neon:    9714.7
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  16175.9

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:54:26 +02:00
Martin Storsjö
0ea603203d arm: vp9itxfm16: Make the larger core transforms standalone functions
This work is sponsored by, and copyright, Google.

This reduces the code size of libavcodec/arm/vp9itxfm_16bpp_neon.o from
17500 to 14516 bytes.

This gives a small slowdown of a couple of tens of cycles, up to around
150 cycles for the full case of the largest transform, but makes
it more feasible to add more optimized versions of these transforms.

Before:                                 Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    4237.4   3561.5   3971.8   2525.3
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6371.9   5452.0   5779.3   3910.5
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   22068.8  17867.5  19555.2  13871.6
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37268.9  38684.2  32314.2  23969.0

After:                                  Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub4_add_10_neon:    4375.1   3571.9   4283.8   2567.2
vp9_inv_dct_dct_16x16_sub16_add_10_neon:   6415.6   5578.9   5844.6   3948.3
vp9_inv_dct_dct_32x32_sub4_add_10_neon:   22653.7  18079.7  19603.7  13905.3
vp9_inv_dct_dct_32x32_sub32_add_10_neon:  37593.2  38862.2  32235.8  24070.9

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-19 22:54:19 +02:00