Commit Graph

35 Commits

Author SHA1 Message Date
Magnus Röös
833fed5253 libavcodec: vp8 neon optimizations for aarch64
Partial port of the ARM Neon for aarch64.

Benchmarks from fate:

benchmarking with Linux Perf Monitoring API
nop: 58.6
checkasm: using random seed 1760970128
NEON:
 - vp8dsp.idct       [OK]
 - vp8dsp.mc         [OK]
 - vp8dsp.loopfilter [OK]
checkasm: all 21 tests passed
vp8_idct_add_c: 201.6
vp8_idct_add_neon: 83.1
vp8_idct_dc_add_c: 107.6
vp8_idct_dc_add_neon: 33.8
vp8_idct_dc_add4y_c: 426.4
vp8_idct_dc_add4y_neon: 59.4
vp8_loop_filter8uv_h_c: 688.1
vp8_loop_filter8uv_h_neon: 216.3
vp8_loop_filter8uv_inner_h_c: 649.3
vp8_loop_filter8uv_inner_h_neon: 195.3
vp8_loop_filter8uv_inner_v_c: 544.8
vp8_loop_filter8uv_inner_v_neon: 131.3
vp8_loop_filter8uv_v_c: 706.1
vp8_loop_filter8uv_v_neon: 141.1
vp8_loop_filter16y_h_c: 668.8
vp8_loop_filter16y_h_neon: 242.8
vp8_loop_filter16y_inner_h_c: 647.3
vp8_loop_filter16y_inner_h_neon: 224.6
vp8_loop_filter16y_inner_v_c: 647.8
vp8_loop_filter16y_inner_v_neon: 128.8
vp8_loop_filter16y_v_c: 721.8
vp8_loop_filter16y_v_neon: 154.3
vp8_loop_filter_simple_h_c: 387.8
vp8_loop_filter_simple_h_neon: 187.6
vp8_loop_filter_simple_v_c: 384.1
vp8_loop_filter_simple_v_neon: 78.6
vp8_put_epel8_h4v4_c: 3971.1
vp8_put_epel8_h4v4_neon: 855.1
vp8_put_epel8_h4v6_c: 5060.1
vp8_put_epel8_h4v6_neon: 989.6
vp8_put_epel8_h6v4_c: 4320.8
vp8_put_epel8_h6v4_neon: 1007.3
vp8_put_epel8_h6v6_c: 5449.3
vp8_put_epel8_h6v6_neon: 1158.1
vp8_put_epel16_h6_c: 6683.8
vp8_put_epel16_h6_neon: 831.8
vp8_put_epel16_h6v6_c: 11110.8
vp8_put_epel16_h6v6_neon: 2214.8
vp8_put_epel16_v6_c: 7024.8
vp8_put_epel16_v6_neon: 799.6
vp8_put_pixels8_c: 112.8
vp8_put_pixels8_neon: 78.1
vp8_put_pixels16_c: 131.3
vp8_put_pixels16_neon: 129.8

Signed-off-by: Magnus Röös <mla2.roos@gmail.com>
2019-01-31 20:17:51 +01:00
Hendrik Leppkes
e999a4ed6c Merge commit '2866d108c9e9da7baf53ff57a51d470691049a57'
* commit '2866d108c9e9da7baf53ff57a51d470691049a57':
  vp8dsp: Remove the comment saying that the height is equal to the width

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-17 15:06:28 +01:00
Hendrik Leppkes
5a447edd47 Merge commit 'dc08bbf63a217c839aa4c143f2a1d0b7e2e6d997'
* commit 'dc08bbf63a217c839aa4c143f2a1d0b7e2e6d997':
  vp8dsp: Clarify the first dimension of the mc function tables

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-11-14 15:21:24 +01:00
Martin Storsjö
2866d108c9 vp8dsp: Remove the comment saying that the height is equal to the width
This comment isn't true, the height can be different from the width
for these functions (which is why the height is passed as a parameter
to them).

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-07-10 14:52:16 +03:00
Martin Storsjö
dc08bbf63a vp8dsp: Clarify the first dimension of the mc function tables
Index 0 is w=16, 1 is wd=8, 2 is wd=4.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-07-06 22:45:29 +03:00
Shivraj Patil
dec16372df avcodec/mips: MSA (MIPS-SIMD-Arch) optimizations for VP8 functions
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
2015-08-04 11:15:06 -04:00
Michael Niedermayer
fb61ed1e9f Merge commit 'ac4b32df71bd932838043a4838b86d11e169707f'
* commit 'ac4b32df71bd932838043a4838b86d11e169707f':
  On2 VP7 decoder

Conflicts:
	Changelog
	libavcodec/arm/h264pred_init_arm.c
	libavcodec/arm/vp8dsp.h
	libavcodec/arm/vp8dsp_init_arm.c
	libavcodec/arm/vp8dsp_init_armv6.c
	libavcodec/arm/vp8dsp_init_neon.c
	libavcodec/avcodec.h
	libavcodec/h264pred.c
	libavcodec/version.h
	libavcodec/vp8.c
	libavcodec/vp8.h
	libavcodec/vp8data.h
	libavcodec/vp8dsp.c
	libavcodec/vp8dsp.h
	libavcodec/x86/h264_intrapred_init.c
	libavcodec/x86/vp8dsp_init.c

See: 89f2f5dbd7 and others
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-04 14:46:10 +02:00
Peter Ross
ac4b32df71 On2 VP7 decoder
Further performance improvements and security fixes by
Vittorio Giovara, Luca Barbato and Diego Biurrun.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-04-04 04:00:11 +02:00
Michael Niedermayer
ae3313e154 Merge commit '53c20f17c78d1d8a0fc2505868f201e69ff59cc5'
* commit '53c20f17c78d1d8a0fc2505868f201e69ff59cc5':
  vp8: K&R formatting cosmetics

Conflicts:
	libavcodec/vp8.c
	libavcodec/vp8.h
	libavcodec/vp8data.h
	libavcodec/vp8dsp.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-30 01:55:31 +01:00
Vittorio Giovara
53c20f17c7 vp8: K&R formatting cosmetics
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-03-29 16:11:09 +01:00
Peter Ross
b8664c9294 avcodec/vp8dsp: add VP7 idct and loop filter
Signed-off-by: Peter Ross <pross@xvid.org>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-15 02:15:35 +01:00
Michael Niedermayer
cae8f469fe Merge commit '38282149b6ce8f4b8361e3b84542ba9aa8a1f32f'
* commit '38282149b6ce8f4b8361e3b84542ba9aa8a1f32f':
  ppc: More consistent arch initialization

Conflicts:
	libavcodec/fft.h
	libavcodec/mpegaudiodsp.c
	libavcodec/mpegaudiodsp.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-05-01 18:08:13 +02:00
Diego Biurrun
38282149b6 ppc: More consistent arch initialization 2013-04-30 12:19:45 +02:00
Michael Niedermayer
ac8987591f Merge commit '88bd7fdc821aaa0cbcf44cf075c62aaa42121e3f'
* commit '88bd7fdc821aaa0cbcf44cf075c62aaa42121e3f':
  Drop DCTELEM typedef

Conflicts:
	libavcodec/alpha/dsputil_alpha.h
	libavcodec/alpha/motion_est_alpha.c
	libavcodec/arm/dsputil_init_armv6.c
	libavcodec/bfin/dsputil_bfin.h
	libavcodec/bfin/pixels_bfin.S
	libavcodec/cavs.c
	libavcodec/cavsdec.c
	libavcodec/dct-test.c
	libavcodec/dnxhdenc.c
	libavcodec/dsputil.c
	libavcodec/dsputil.h
	libavcodec/dsputil_template.c
	libavcodec/eamad.c
	libavcodec/h264_cavlc.c
	libavcodec/h264idct_template.c
	libavcodec/mpeg12.c
	libavcodec/mpegvideo.c
	libavcodec/mpegvideo.h
	libavcodec/mpegvideo_enc.c
	libavcodec/ppc/dsputil_altivec.c
	libavcodec/proresdsp.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-23 17:44:56 +01:00
Diego Biurrun
88bd7fdc82 Drop DCTELEM typedef
It does not help as an abstraction and adds dsputil dependencies.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2013-01-22 18:32:56 -08:00
Diego Biurrun
511cf612ac miscellaneous typo fixes 2012-12-21 00:18:34 +01:00
Lou Logan
2d38081b4f cosmetics: fix some typos
Patch attached.
From 2d4094fc0dcb4ccd0735eb7e1719e228ebb56bb9 Mon Sep 17 00:00:00 2001
From: Lou Logan <lou@lrcd.com>
Date: Mon, 12 Mar 2012 14:13:44 -0800
Subject: [PATCH] cosmetics: fix some typos

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-03-13 01:14:04 +01:00
Michael Niedermayer
268098d8b2 Merge remote-tracking branch 'qatar/master'
* qatar/master: (29 commits)
  amrwb: remove duplicate arguments from extrapolate_isf().
  amrwb: error out early if mode is invalid.
  h264: change underread for 10bit QPEL to overread.
  matroska: check buffer size for RM-style byte reordering.
  vp8: disable mmx functions with sse/sse2 counterparts on x86-64.
  vp8: change int stride to ptrdiff_t stride.
  wma: fix invalid buffer size assumptions causing random overreads.
  Windows Media Audio Lossless decoder
  rv10/20: Fix slice overflow with checked bitstream reader.
  h263dec: Disallow width/height changing with frame threads.
  rv10/20: Fix a buffer overread caused by losing track of the remaining buffer size.
  rmdec: Honor .RMF tag size rather than assuming 18.
  g722: Fix the QMF scaling
  r3d: don't set codec timebase.
  electronicarts: set timebase for tgv video.
  electronicarts: parse the framerate for cmv video.
  ogg: don't set codec timebase
  electronicarts: don't set codec timebase
  avs: don't set codec timebase
  wavpack: Fix an integer overflow
  ...

Conflicts:
	libavcodec/arm/vp8dsp_init_arm.c
	libavcodec/fraps.c
	libavcodec/h264.c
	libavcodec/mpeg4videodec.c
	libavcodec/mpegvideo.c
	libavcodec/msmpeg4.c
	libavcodec/pnmdec.c
	libavcodec/qpeg.c
	libavcodec/rawenc.c
	libavcodec/ulti.c
	libavcodec/vcr1.c
	libavcodec/version.h
	libavcodec/wmalosslessdec.c
	libavformat/electronicarts.c
	libswscale/ppc/yuv2rgb_altivec.c
	tests/ref/acodec/g722
	tests/ref/fate/ea-cmv

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-03-03 00:23:10 +01:00
Ronald S. Bultje
bd66f073fe vp8: change int stride to ptrdiff_t stride.
On 64bit platforms with 32bit int, this means we won't have to sign-
extend the integer anymore.
2012-03-02 10:31:50 -08:00
Michael Niedermayer
4a519b6e03 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  mov: Use defines for sample flags in fragments
  mov: Use defines for trun flags
  mov: Use defines for tfhd flags
  proresenc: force bitrate not to exceed given limit
  vc1parse: call vc1_init_common().
  wma: don't return 0 on invalid packets.
  asf: prevent packet_size_left from going negative if hdrlen > pktlen.
  mjpegb: don't return 0 at the end of frame decoding.
  rtpdec: Identify incorrectly signalled H263
  vp8dsp: split long line.
  aiff: don't skip block_align==0 check on COMM-after-SSND files.
  dpcm: ignore extra unpaired bytes in stereo streams.
  mp3on4: require a minimum framesize.
  mpc7: assign an error level + context to av_log() msg.
  huffyuv: error out on bit overrun.
  dct-test: Add the missing ff_ prefix to the altivec functions
  dct-test: Remove a stray declaration of a nonexistent function
  movenc: Write the unknown duration as 64 bit fields in ismv
  movenc: Write track durations with all bits set if duration is unknown

Conflicts:
	libavcodec/dct-test.c
	libavcodec/wmadec.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-02-19 01:47:10 +01:00
Ronald S. Bultje
b1af4e9c27 vp8dsp: split long line.
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-02-18 02:32:09 +02:00
Michael Niedermayer
042f9d62ca Merge remote-tracking branch 'qatar/master'
* qatar/master:
  configure: Automatically add more flags required on symbian
  mem.h: switch doxygen parameter order to match function prototype
  doxygen: replace @sa tag by the more readable but equivalent @see
  doxygen: use Doxygen markup for authors and web links where appropriate
  doxygen: do not include license boilerplate in Doxygen documentation
  ac3enc: Mark AVClasses const
  ffserver: Replace two loops with one loop.
  ffmpeg: Fix the check for experimental codecs
  swscale: extend mmx padding.
  swscale: clip unscaled colorspace conversion path.
  doxygen: misc consistency cosmetics
  doc: remove file name from @file directive in Doxygen usage example
  doxygen: consistently place brief description
  doxygen: place empty line between brief description and detailed description
  avformat_open_input(): Add braces to shut up gcc warning.

Conflicts:
	libavcodec/8svx.c
	libavcodec/tiff.c
	libavcodec/tiff.h
	libavcodec/vaapi_h264.c
	libavcodec/vorbis.c
	libavcodec/vorbisdec.c
	libavcodec/vp6.c
	libswscale/swscale_unscaled.c
	libswscale/utils.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2011-07-15 17:51:09 +02:00
Diego Biurrun
6168781f70 doxygen: do not include license boilerplate in Doxygen documentation 2011-07-15 00:52:09 +02:00
Mans Rullgard
2912e87a6c Replace FFmpeg with Libav in licence headers
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Mans Rullgard
ef15d71c1f VP8: ARM NEON optimisations for dsp functions
This adds NEON optimised versions of all functions in VP8DSPContext.
Based on initial work by Rob Clark.

Signed-off-by: Mans Rullgard <mans@mansr.com>
(cherry picked from commit a1c1d3c003)
2011-02-09 03:31:21 +01:00
Mans Rullgard
a1c1d3c003 VP8: ARM NEON optimisations for dsp functions
This adds NEON optimised versions of all functions in VP8DSPContext.
Based on initial work by Rob Clark.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-02-07 16:08:23 +00:00
Jason Garrett-Glaser
f311208cf1 VP8: much faster DC transform handling
A lot of the time the DC block is empty: don't do the WHT in this case.
A lot of the rest of the time, there's only one coefficient: make a special
DC-only transform for that case.
When the block is empty, don't incorrectly mark luma DCT blocks as having DC
coefficients.

Originally committed as revision 24670 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-08-02 20:57:03 +00:00
Jason Garrett-Glaser
3ae079a3c8 VP8: optimize DC-only chroma case in the same way as luma.
Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.

Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 06:02:52 +00:00
Jason Garrett-Glaser
8a467b2d44 VP8: 30% faster idct_mb
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?

Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 02:58:27 +00:00
Ronald S. Bultje
3facfc99da Change function prototypes for width=8 inner and mbedge loopfilter functions
so that it does both U and V planes at the same time. This will have speed
advantages when using SSE2 (or higher) optimizations, since we can do both
the U and V rows together in a single xmm register.

This also renames filter16 to filter16y and filter8 to filter8uv so that it's
more obvious what each function is used for.

Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 21:18:04 +00:00
David Conrad
982fac7357 Altivec VP8 MC functions
Originally committed as revision 23884 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 06:42:17 +00:00
Jason Garrett-Glaser
0178d14fe5 First shot at VP8 optimizations:
- MMXEXT, SSE2 and SSSE3 MC functions
- MMX and SSE4 IDCT dc_add functions

Patch by Jason Garrett-Glaser <darkshikari gmail com> and myself.

Originally committed as revision 23815 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-27 02:01:45 +00:00
David Conrad
0ef1dbedcb VP8 bilinear filter
Originally committed as revision 23813 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-27 01:46:29 +00:00
Jason Garrett-Glaser
d6f8476be4 Make VP8 DSP functions take two strides
This isn't useful for the C functions, but will allow re-using H and V functions
for HV functions without adding separate H and V wrappers.

Originally committed as revision 23782 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-25 18:14:07 +00:00
David Conrad
3b636f21da Native VP8 decoder.
Patch by David Conrad <lessen42 gmail com> and myself.

Originally committed as revision 23719 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-22 19:24:09 +00:00