FFmpeg

Mirror of https://github.com/xenia-project/FFmpeg.git, synced 2024-11-29 14:30:27 +00:00

Author	SHA1	Message	Date
Diego Biurrun	3b9e832e17	x86: Drop silly "_yasm" suffixes from filenames	2012-08-12 17:13:05 +02:00
Mans Rullgard	ec7c501ed5	x86: remove libmpeg2 mmx(ext) idct functions. These functions are not faster than other mmx implementations on any hardware I have been able to test on, and they are horribly inaccurate. There is thus no reason to ever use them. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-02 12:14:52 +01:00
Ronald S. Bultje	b6a3849adb	fft: port FFT/IMDCT 3dnow functions to yasm, and disable on x86-64. 64-bit CPUs always have SSE available, thus there is no need to compile in the 3dnow functions. This results in smaller binaries.	2012-07-31 21:20:47 -07:00
Mans Rullgard	28f9ab7029	vp3: move idct and loop filter pointers to new vp3dsp context. This moves all VP3-specific function pointers from dsputil to a new vp3dsp context. There is no reason to ever use the VP3 IDCT where an MPEG2 IDCT is expected or vice versa. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-07-18 10:32:19 +01:00
Mans Rullgard	ab9f987661	build: add CONFIG_VP3DSP, reduce repetition in OBJS lists Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-07-18 10:32:18 +01:00
Mans Rullgard	8299260470	x86: fft: convert sse inline asm to yasm	2012-06-25 13:31:00 +01:00
Diego Biurrun	7bb3a302fe	build: Consistently handle conditional compilation for all optimization OBJS.	2012-04-12 09:00:49 +02:00
Diego Biurrun	ad0e31f134	build: prettyprinting cosmetics	2012-03-26 13:00:10 +02:00
Diego Biurrun	915a2a0a65	x86: conditionally compile H.264 QPEL optimizations	2012-03-25 11:50:45 +02:00
Christophe GISQUET	34454c761f	SBR DSP x86: implement SSE sbr_sum_square_sse. The 32bits targets have been compiled with -mfpmath=sse for proper reference. sbr_sum_square timings: C/32bits: 82c (unrolled) / 102c; C/64bits: 69c (unrolled) / 82c; SSE/32bits: 42c; SSE/64bits: 31c. Use of SSE4.1 dpps to perform the final sum is slower. Not unrolling to perform 8 operations in a loop yields 10 more cycles. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-02-23 15:50:06 -08:00
Ronald S. Bultje	7e4d9d5d45	win64: add a XMM clobber test configure option. This will be useful to test more aggressively for failures to mark XMM registers as clobbered in Win64 builds, and prevent regressions thereof. Based on a patch by Ramiro Polla <ramiro.polla@gmail.com>	2012-02-02 12:00:48 -08:00
Christophe Gisquet	e5c9de2ab7	rv40: x86 SIMD for biweight. Provide MMX, SSE2 and SSSE3 versions, with a fast-path when the weights are multiples of 512 (which is often the case when the values round up nicely). *_TIMER report for the 16x16 and 8x8 cases: C: 9015 decicycles in 16, 524257 runs, 31 skips; 2656 decicycles in 8, 524271 runs, 17 skips. MMX: 4156 decicycles in 16, 262090 runs, 54 skips; 1206 decicycles in 8, 262131 runs, 13 skips. MMX on fast-path: 2760 decicycles in 16, 524222 runs, 66 skips; 995 decicycles in 8, 524252 runs, 36 skips. SSE2: 2163 decicycles in 16, 262131 runs, 13 skips; 832 decicycles in 8, 262137 runs, 7 skips. SSE2 with fast path: 1783 decicycles in 16, 524276 runs, 12 skips; 711 decicycles in 8, 524283 runs, 5 skips. SSSE3: 2117 decicycles in 16, 262136 runs, 8 skips; 814 decicycles in 8, 262143 runs, 1 skips. SSSE3 with fast path: 1315 decicycles in 16, 524285 runs, 3 skips; 578 decicycles in 8, 524286 runs, 2 skips. This means around a 4% speedup for some sequences. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-01-30 23:58:25 +01:00
Diego Biurrun	91bafb52ae	x86: Give RV40 init file a more suitable name.	2012-01-30 23:58:24 +01:00
Ronald S. Bultje	59f474b49d	png: convert DSP functions to yasm.	2012-01-29 18:47:50 -08:00
Ronald S. Bultje	e92003514d	png: move DSP functions to their own DSP context.	2012-01-29 08:11:18 -08:00
Christophe GISQUET	3faa303a47	rv34: DC-only inverse transform. When decoding coefficients, detect whether the block is DC-only, and take advantage of this knowledge to perform DC-only inverse transform. This is achieved by: first, changing the 108x4 element modulo_three_table into a 108 element table (kind of base4), and accessing each value using mask and shifts; then, checking low bits for 0 (as they represent the presence of higher frequency coefficients). Also provide x86 SIMD code for the DC-only inverse transform. Signed-off-by: Kostya Shishkov <kostya.shishkov@gmail.com>	2012-01-12 09:52:33 +01:00
Vitor Sessak	39df0c434c	mpegaudiodec: optimized iMDCT transform. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-01-08 17:40:55 -08:00
Diego Biurrun	30bbd5cbc0	x86: conditionally compile dnxhd encoder optimizations	2011-12-19 13:54:10 +01:00
Diego Biurrun	88b9735753	build: conditionally compile x86 H.264 chroma optimizations	2011-12-14 11:58:45 +01:00
Ronald S. Bultje	e3f530feca	prores: idct sse2/sse4 optimizations. ~3.0-3.5x as fast as original C version, 1.6x as fast overall.	2011-10-11 07:50:48 -07:00
Kostya Shishkov	d241f51e0f	Move RV3/4-specific DSP functions into their own context. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-08-11 16:07:15 -07:00
Daniel Kang	9bfa5363da	H.264: Add x86 assembly for 10-bit H.264 qpel functions. Mainly ported from 8-bit H.264 qpel. Some code ported from x264. LGPL ok by author. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-07-03 07:43:38 -07:00
Daniel Kang	84e70ef004	h264: Add x86 assembly for 10-bit weight/biweight H.264 functions. Mainly ported from 8-bit H.264 weight/biweight. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2011-06-21 15:24:13 +02:00
Daniel Kang	f188a1e0ca	H.264: Add x86 assembly for 10-bit MC Chroma H.264 functions. Mainly ported from 8-bit H.264 MC Chroma. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-06-18 07:52:19 -04:00
Daniel Kang	a8d44f9dd5	Add x86 assembly for some 10-bit H.264 intra predict functions. Parts are inspired from the 8-bit H.264 predict code in Libav. Other parts ported from x264 with relicensing permission from author. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2011-06-06 01:31:02 +02:00
Daniel Kang	836f47d34b	Add IDCT functions for 10-bit H.264. Ports the majority of IDCT functions for 10-bit H.264. Parts are inspired from 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author. Signed-off-by: Ronald S. Bultje <rbultje@google.com>	2011-05-31 15:02:32 -07:00
Vitor Sessak	3758eb0eb9	dct32: port SSE 32-point DCT to YASM	2011-05-21 17:42:26 +02:00
Mans Rullgard	0b5e44ed29	mpegaudiodsp: fix x86 and ppc makefiles. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-05-19 16:32:24 +01:00
Jason Garrett-Glaser	9f3d6ca4f1	Port x86 10-bit H.264 deblock asm from x264	2011-05-10 20:02:15 -07:00
Mans Rullgard	a5444fee06	Add CONFIG_AC3DSP symbol to simplify makefiles. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-03-12 11:35:26 +00:00
Justin Ruggles	dda3f0ef48	Add x86-optimized versions of exponent_min(). Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2011-02-10 15:32:47 -05:00
Justin Ruggles	c73d99e672	Separate format conversion DSP functions from DSPContext. This will be beneficial for use with the audio conversion API without requiring it to depend on all of dsputil. Signed-off-by: Mans Rullgard <mans@mansr.com>	2011-02-02 02:44:53 +00:00
Ronald S. Bultje	d0acc2d2e9	Move sse16_sse2() from inline asm to yasm. It is one of the functions causing Win64/FATE issues. Originally committed as revision 25136 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-17 01:44:17 +00:00
Ronald S. Bultje	1d16a1cf99	Rename h264_idct_sse2.asm to h264_idct.asm; move inline IDCT asm from h264dsp_mmx.c to h264_idct.asm (as yasm code). Because the loops are now coded in asm instead of C, this is (depending on the function) up to 50% faster for cases where gcc didn't do a great job at looping. Since h264_idct_add8() is now faster than the manual loop setup in h264.c, in-asm idct calling can now be enabled for chroma as well (see r16207). For MMX, this is 5% faster. For SSE2 (which isn't done for chroma if h264.c does the looping), this makes it up to 50% faster. Speed gain overall is ~0.5-1.0%. Originally committed as revision 25119 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-14 13:36:26 +00:00
Jason Garrett-Glaser	8acb554aff	LGPL SSE2 H.264 iDCT. This leaves no more GPL-only H.264 decoding asm code. Approved by Loren. Originally committed as revision 25092 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-10 02:25:12 +00:00
Stefano Sabatini	c6c98d0897	Move mm_support() from libavcodec to libavutil, make it a public function and rename it to av_get_cpu_flags(). Originally committed as revision 25076 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-08 15:07:14 +00:00
Ronald S. Bultje	2c166c3af1	Port latest x264 deblock asm (before they moved to using NV12 as internal format), LGPL'ed with permission from Jason and Loren. This includes mmx2 code, so remove inline asm from h264dsp_mmx.c accordingly. Originally committed as revision 25031 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-03 16:52:46 +00:00
Ronald S. Bultje	a33a2562c1	Rename h264_weight_sse2.asm to h264_weight.asm; add 16x8/8x16/8x4 non-square biweight code to sse2/ssse3; add sse2 weight code; and use that same code to create mmx2 functions also, so that the inline asm in h264dsp_mmx.c can be removed. OK'ed by Jason on IRC. Originally committed as revision 25019 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 20:56:16 +00:00
Ronald S. Bultje	14bc1f2485	Split h264dsp_mmx.c (which was #included in dsputil_mmx.c) in h264_qpel_mmx.c, still #included in dsputil_mmx.c and is part of DSPContext, and h264dsp_mmx.c, which represents H264DSPContext and is now compiled on its own. Originally committed as revision 25018 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-09-01 20:48:59 +00:00
Ronald S. Bultje	5929b3a651	Fix vertical align. Originally committed as revision 25009 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-31 12:32:24 +00:00
Ronald S. Bultje	de1c253bab	Split intra prediction initialization (i.e. assigning of function pointers) into its own file, it doesn't belong in h264dsp_mmx.c (much less so in dsputil_mmx.c). Originally committed as revision 24990 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:34:13 +00:00
Ronald S. Bultje	d0eb5a1174	Move H264 chroma MC from inline asm to yasm. This fixes VP3/5/6 and VC-1 fate failures on Win64. Originally committed as revision 24989 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:31:04 +00:00
Ronald S. Bultje	e9f5f020c6	Move VP3 IDCT functions from inline ASM to YASM. This fixes part of the VP3/5/6 issues on Win64. Originally committed as revision 24988 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-30 16:25:46 +00:00
Ronald S. Bultje	89fa3504ed	Move vp6_filter_diag4() x86 SIMD code from inline ASM to YASM. This should help in fixing the Win64 fate failures. Originally committed as revision 24922 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 13:44:16 +00:00
Ronald S. Bultje	3a0885146c	Move vp6_filter_diag4() from DSPContext to VP56DSPContext. Originally committed as revision 24921 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-25 13:42:28 +00:00
Jason Garrett-Glaser	4a384de5b8	Split h264dsp and h264pred in configure. Many H.264 derivatives, like RV40 and VP8, use the H.264 prediction functions but not the weight/loopfilter functions. This should reduce the size of builds with one of these derivatives but without H.264 decoding itself. Originally committed as revision 24741 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-07 23:10:25 +00:00
Eli Friedman	c12d6955e2	H.264: SSE2/SSSE3 weighted prediction asm. Patch by Eli Friedman <eli.friedman at gmail dot com>. Originally committed as revision 24702 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-08-05 00:13:38 +00:00
Vitor Sessak	de4bc44abb	Convert deinterlacing MMX code to YASM. Originally committed as revision 24615 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-31 14:50:51 +00:00
Loren Merritt	c7b1d9768c	relicense h264 deblock sse2 to lgpl. Originally committed as revision 24408 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-22 00:39:49 +00:00
David Conrad	faa26db28b	MMX/SSE VC1 loop filter. Originally committed as revision 24208 to svn://svn.ffmpeg.org/ffmpeg/trunk	2010-07-11 22:53:01 +00:00


58 Commits