11674 -> 10877 decicycles on my Phenom II.
Overall speedup was unfortunately within measurement error.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
This fixes two issues preventing suncc from building this code.
The undocumented 'a' operand modifier, causing gcc to omit a $ in
front of immediate operands (as required in addresses), is not
supported by suncc. Luckily, the also undocumented 'c' modifer
has the same effect and is supported.
On some asm statements with a large number of operands, suncc for no
obvious reason fails to correctly substitute some of the operands.
Fortunately, some of the operands in these statements are plain
numbers which can be inserted directly into the code block instead
of passed as operands.
With these changes, the code builds correctly with both gcc and
suncc.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
wtv: Check the return value from gmtime
x86: fft: convert sse inline asm to yasm
x86: place some inline asm under #if HAVE_INLINE_ASM
Conflicts:
libavcodec/x86/fft_sse.c
libavformat/wtv.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
log: Only include unistd.h if configure found it
ape: create audio stream before reading tags.
mov: make a length variable larger.
image2: Add "start_number" private option to the demuxer
image2: Add "start_number" private option to the muxer
avconv: remove a forgotten debugging printf.
avconv: use more descriptive names for hardcoded filters.
avconv: remove redundant handling of async.
doc/filters: fix typo.
h264: use asm cabac reader under a generic condition
Conflicts:
ffmpeg.c
libavformat/img2dec.c
libavformat/img2enc.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This removes a dependency on implementation details from generic
code and allows easy addition of the equivalent optimisation for
other architectures than x86.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register.
There is a surprisingly large performance improvement over the c version (more
so than the generated assembly seems to suggest) just in get_cabac, I measured
roughly 40% faster for get_cabac on a K8. However, overall the difference is
not that big, I measured roughly 5% on a test clip on a K8 and a Core2.
Hopefully it still compiles on x86 32bit...
Now that only one table is used, there's some chance even darwin as compiles
this (apparently the label arithmetic used previously doesn't work if it
involves symbols defined in a different file, thanks to Ronald S. Bultje for
helping me with this).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The reason is this is easier for PIC code (in particular on darwin...).
Keep the old names as pointers (static in cabac_functions.h so gcc
knows these are just immediate offsets) so the c code can nicely stay the same
(alternatively could use offsets directly in the functions needing the
tables). This should produce the same code as before with non-pic and better
code (confirmed) with pic.
The assembly uses the new table but still won't work for PIC case.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register. get_cabac() gets about 40% faster, for
an overall speedup of about 5%.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The reason is this is easier for PIC code (in particular on darwin...).
Keep the old names as pointers (static in cabac_functions.h so gcc
knows these are just immediate offsets) so the c code can nicely stay the same
(alternatively could use offsets directly in the functions needing the
tables). This should produce the same code as before with non-pic and better
code (confirmed) with pic.
The assembly uses the new table but still won't work for PIC case.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
This adds a hand-optimized assembly version for get_cabac much like the
existing one, but it works if the table offsets are RIP-relative.
Compared to the non-RIP-relative version this adds 2 lea instructions
and it needs one extra register.
There is a surprisingly large performance improvement over the c version (more
so than the generated assembly seems to suggest) just in get_cabac, I measured
roughly 40% faster for get_cabac on a K8. However, overall the difference is
not that big, I measured roughly 5% on a test clip on a K8 and a Core2.
Hopefully it still compiles on x86 32bit...
v2: incorporated feedback from Loren Merritt to avoid rip-relative movs
for every table, and got rid of unnecessary @GOTPCREL.
v3: apply similar fixes to the the decode_significance functions, and use
same macro arguments for non-pic case.
v4: prettify inline asm arguments, add a non-fast-cmov version (as I expect
the c code to be faster otherwise since both cmov and sbb suck hard on a
Prescott, even can't construct the mask with a 64bit shift as that's just as
terrible - it's quite difficult to find usable instructions on that chip...).
This is tested to work but not on a P4, in theory it _should_ be fast there.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
asf: only set index_read if the index contained entries.
cabac: add overread protection to BRANCHLESS_GET_CABAC().
cabac: increment jump locations by one in callers of BRANCHLESS_GET_CABAC().
cabac: remove unused argument from BRANCHLESS_GET_CABAC_UPDATE().
cabac: use struct+offset instead of memory operand in BRANCHLESS_GET_CABAC().
h264: add overread protection to get_cabac_bypass_sign_x86().
h264: reindent get_cabac_bypass_sign_x86().
h264: use struct offsets in get_cabac_bypass_sign_x86().
h264: fix overreads in cabac reader.
wmall: fix seeking.
lagarith: fix buffer overreads.
dvdec: drop unnecessary dv_tablegen.h #include
build: fix doc generation errors in parallel builds
Replace memset(0) by zero initializations.
faandct: Remove FAAN_POSTSCALE define and related code.
dvenc: print allowed profiles if the video doesn't conform to any of them.
avcodec_encode_{audio,video}: only reallocate output packet when it has non-zero size.
FATE: add a test for vp8 with changing frame size.
fate: add kgv1 fate test.
oggdec: calculate correct timestamps in Ogg/FLAC
Conflicts:
libavcodec/4xm.c
libavcodec/cook.c
libavcodec/dvdata.c
libavcodec/dvdsubdec.c
libavcodec/lagarith.c
libavcodec/lagarithrac.c
libavcodec/utils.c
tests/fate/video.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Author: Mans Rullgard <mans@mansr.com>
Date: Sun Dec 11 21:41:59 2011 +0000
x86: cabac: replace explicit memory references with "m" operands
This replaces the explicit offset(reg) memory references with
"m" operands for the same locations. As a result, one fewer
register operand is needed for these inline asm statements.
This change appears to have broken compilation on darwin, and subsequent
fixes by martin (which did not fix compilation) removed the register
advantage, thus this change seems not a good idea to keep.
See: http://fate.ffmpeg.org/log.cgi?time=20120103122446&log=compile&slot=i386-darwin-llvm-gcc-4.2.1
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
mpegvideo_enc: K&R cosmetics
doxygen: remove unreplaced variables from custom header and footer
threads: test for sys/param.h and include it for sysctl on OpenBSD
v4l2: remove unneded linux specific asm/types.h include
x86: Fix constraints for decode_significance*_x86
Conflicts:
libavcodec/mpegvideo_enc.c
libavdevice/v4l2.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Originally, prior to 8742a4ff8, the caller code was compiled
within this condition:
ARCH_X86 && HAVE_7REGS && HAVE_EBX_AVAILABLE && !defined(BROKEN_RELOCATIONS)
Since HAVE_7REGS is defined as
(ARCH_X86_64 || (HAVE_EBX_AVAILABLE && HAVE_EBP_AVAILABLE))
the subcondition HAVE_7REGS && HAVE_EBX_AVAILABLE is equal
to HAVE_7REGS (for 32 bit at least). The correct simplification
of the original condition thus is HAVE_7REGS, not
HAVE_EBX_AVAILABLE.
This fixes compilation in some cases where HAVE_EBP_AVAILABLE = 0
and HAVE_EBX_AVAILABLE = 1.
Signed-off-by: Martin Storsjö <martin@martin.st>
* qatar/master: (27 commits)
asfdec: add side data to ASFStream packet instead of output packet.
idroqdec: set AVFMTCTX_NOHEADER and create streams as they occur.
nellymoserdec: Indicate that the decoder can handle changed parameters
libavcodec: Apply parameter change side data when decoding audio
flvdec: Add param change side data if the sample rate or channels have changed
libavformat: Add a utility function for adding parameter change side data
libavcodec: Define a side data type for parameter changes
aacdec: Handle new extradata passed as side data
flvdec: Export new AAC/H.264 extradata as side data on the next packet
libavcodec: Define a side data type for new extradata
flacdec: skip all track indices at once instead of looping.
mxf: Add PictureEssenceCoding UL for V210.
mxfdec: consider QuantizationBits between 17 and 24 to be pcm_s24*
mxfenc: Add support for MPEG-2 MP@HL-14 in mxf container.
mxf: H.264/MPEG-4 AVC Intra support
configure: Show whether the safe bitstream reader is enabled
x86: Tighten register constraints for decode_significance*_x86.
Replace Subversion revisions in comments by Git hashes.
h264_cabac: synchronize decode_significance_*_x86 conditionals
w32threads: wait for the waked thread in pthread_cond_signal.
...
Conflicts:
libavcodec/avcodec.h
libavcodec/version.h
libavformat/flvdec.c
libavformat/utils.c
tests/ref/lavfi/pixdesc
tests/ref/lavfi/pixfmts_copy
tests/ref/lavfi/pixfmts_null
tests/ref/lavfi/pixfmts_scale
tests/ref/lavfi/pixfmts_vflip
Merged-by: Michael Niedermayer <michaelni@gmx.at>
On 32-bit OS X with gcc 4.0/4.2 and shared libraries enabled, the ebx register
is not available, but required to assemble the functions.
This reverts commit 8742a4f to a simplified version of the original constraints.
* qatar/master:
x86: cabac: replace explicit memory references with "m" operands
avplay: don't request a stereo downmix
wmapro: use av_float2int()
lavc: avoid invalid memcpy() in avcodec_default_release_buffer()
lavu: replace int/float punning functions
lavfi: install libavfilter/vsrc_buffer.h
Remove extraneous semicolons
sdp: Restore the original mp4 format h264 extradata if converted
rtpenc: Add support for mp4 format h264
rtpenc: Simplify code by introducing a separate end pointer
movenc: Use the actual converted sample for RTP hinting
Fix a bunch of common typos.
Conflicts:
doc/developer.texi
doc/eval.texi
doc/filters.texi
doc/protocols.texi
ffmpeg.c
ffplay.c
libavcodec/mpegvideo.h
libavcodec/x86/cabac.h
libavfilter/Makefile
libavformat/avformat.h
libavformat/cafdec.c
libavformat/flvdec.c
libavformat/flvenc.c
libavformat/gxfenc.c
libavformat/img2.c
libavformat/movenc.c
libavformat/mpegts.c
libavformat/rtpenc_h264.c
libavformat/utils.c
libavformat/wtv.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This replaces the explicit offset(reg) memory references with
"m" operands for the same locations. As a result, one fewer
register operand is needed for these inline asm statements.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
tls: Use ERR_get_error() in do_tls_poll
indeo3: Fix a fencepost error.
mxfdec: Fix comparison of unsigned expression < 0.
mpegts: set stream id on just created stream, not an unrelated variable
ra288: return error if input buffer is too small
ra288: utilize DSPContext.vector_fmul()
ra288: use memcpy() to copy decoded samples to output
mace: only calculate output buffer size once
Remove redundant filename self-references inside files.
indeo3data: add missing config.h #include for HAVE_BIGENDIAN
x86: drop pointless ARCH_X86 #ifdef from files in x86 subdirectory
avplay: reset rdft when closing stream.
doc/git-howto: expand format-patch and send-email notes.
lavf: expand doxy for some AVFormatContext fields.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
mxfdec: Include FF_INPUT_BUFFER_PADDING_SIZE when allocating extradata.
H.264: tweak some other x86 asm for Atom
probe: Fix insane flow control.
mpegts: remove invalid error check
s302m: use nondeprecated audio sample format API
lavc: use designated initialisers for all codecs.
x86: cabac: add operand size suffixes missing from 6c32576
Conflicts:
libavcodec/ac3enc_float.c
libavcodec/flacenc.c
libavcodec/frwu.c
libavcodec/pictordec.c
libavcodec/qtrleenc.c
libavcodec/v210enc.c
libavcodec/wmv2dec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
h263dec: Propagate AV_LOG_ERRORs from slice decoding through frame decoding with sufficient error recognition
x86: cabac: don't load/store context values in asm
H.264: optimize CABAC x86 asm for Atom
vp3/theora: flush after seek.
doc/fftools-common-opts: wording fixes missing from the previous commit.
doc: document using AVOptions in fftools.
cmdutils: add codec_opts parameter to setup_find_stream_info_opts()
cmdutils: clarify documentation for filter_codec_opts()
cmdutils: clarify documentation for setup_find_stream_info_opts()
lavf: add forgotten attribute_deprecated to av_find_stream_info()
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Inspection of compiled code shows gcc handles these fine on its own.
Benchmarking also shows no measurable speed difference.
Removing the remaining cases in get_cabac_bypass_sign_x86() does
cause more substantial changes to the compiled code with uncertain
impact.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Some operands need to be accessed in byte mode, which restricts the
available registers in 32-bit mode. Using the 'q' constraint selects
a suitable register.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
swscale: remove misplaced comment.
ffmpeg: fix streaming to ffserver.
swscale: split out RGB48 output functions from yuv2packed[12X]_c().
build: move vpath directives to main Makefile
swscale: fix JPEG-range YUV scaling artifacts.
build: move ALLFFLIBS to a more logical place
ARM: factor some repetitive code into macros
Fix SVQ3 after adding 4:4:4 H.264 support
H.264: fix CODEC_FLAG_GRAY
4:4:4 H.264 decoding support
ac3enc: fix allocation of floating point samples.
Conflicts:
ffmpeg.c
libavcodec/dsputil_template.c
libavcodec/h264.c
libavcodec/mpegvideo.c
libavcodec/snow.c
libswscale/swscale.c
libswscale/swscale_internal.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>