Using the not so new init and uninit callbacks, avcodec can now take
care of creating and destroying the VDPAU decoder instance.
The application is still responsible for creating the VDPAU device
and allocating video surfaces - this is necessary to keep video
surfaces on the GPU all the way to the output. But the application
will no longer needs to care about any codec-specific aspects.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This is similar to what is done in libx264.c
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
These function pointers already existed in the ARM code. Adding them globally
allows calls to the function pointers to access arch-optimized versions of the
functions transparently.
Only set a value if _WIN32_WINNT is undefined or smaller than 0x0600. This is
cleaner than unconditional definition and avoids a number of redefinition
warnings. Also only define a value in one of the two dxva2 headers.
MpegEncContext based decoders are only fully initialized after the first
ff_thread_get_buffer() call. The RV30/40 decoders may fail before a frame
buffer was requested. ff_mpeg_update_thread_context() fails on half
initialized MpegEncContexts. Since this can only happen before a the
first frame was decoded there is no need to call
ff_mpeg_update_thread_context().
Based on patches by John Stebbins and tested by John Stebbins.
CC: libav-stable@libav.org
The packet buffer allocation considers the alpha channel as DCT-coded,
while it is actually run-coded and thus requires a larger buffer.
CC: libav-stable@libav.org
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The buffer allocation may be incorrect (e.g. with an alpha plane),
and currently causes the buffer to be set to NULL by init_put_bits,
causing a crash later on.
So, detect that situation, and if detected, reallocate the buffer
and ask for a sample that shows the problem.
CC: libav-stable@libav.org
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
If the allocated size, despite best efforts, is too small, exit
with the appropriate error.
CC: libav-stable@libav.org
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The LZMA support is a semi-official extension supported by libtiff 4.0.0
and later.
Signed-off-by: Diego Elio Pettenò <flameeyes@flameeyes.eu>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Such changes are neither allowed nor supported
Found-by: ami_stuff
Bug-Id: CVE-2013-7020
CC: libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Reduces the number of calls to tmvp derivation from 933685 to 586271 on
a sequence.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The position is either rounded or not checked, so delay the wait to
check the proper value.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Only use PAL8 if palette is present, else use GRAY8 for pixfmt.
Instead of simulating a grayscale palette, use real grayscale pixels, if no
palette is actually defined.
Signed-off-by: Diego Elio Pettenò <flameeyes@flameeyes.eu>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
1) each of the loops run within a single CTB, so the relevant reference
list is constant
2) when that CTB is, or lies on the same slice as, the current one, we
can use a simple access instead of a relatively expensive call to
ff_hevc_get_ref_list()
It allows attaching other external, opaque data to the frame and passing it
through the reordering process, for cases when the caller wants other data
than just the plain packet pts. There is no way to cleanly achieve this
without the field.
The input data must remain constant, make a copy instead. This is in
theory a performance hit, but since I failed to find any samples
using this feature, this should not matter in practice.
Also, check the size of the header, avoiding invalid reads on truncated
data.
CC:libav-stable@libav.org
The previous implementation of the parser made four passes over each input
buffer (reduced to two if the container format already guaranteed the input
buffer corresponded to frames, such as with MKV). But these buffers are
often 200K in size, certainly enough to flush the data out of L1 cache, and
for many CPUs, all the way out to main memory. The passes were:
1) locate frame boundaries (not needed for MKV etc)
2) copy the data into a contiguous block (not needed for MKV etc)
3) locate the start codes within each frame
4) unescape the data between start codes
After this, the unescaped data was parsed to extract certain header fields,
but because the unescape operation was so large, this was usually also
effectively operating on uncached memory. Most of the unescaped data was
simply thrown away and never processed further. Only step 2 - because it
used memcpy - was using prefetch, making things even worse.
This patch reorganises these steps so that, aside from the copying, the
operations are performed in parallel, maximising cache utilisation. No more
than the worst-case number of bytes needed for header parsing is unescaped.
Most of the data is, in practice, only read in order to search for a start
code, for which optimised implementations already existed in the H264 codec
(notably the ARM version uses prefetch, so we end up doing both remaining
passes at maximum speed). For MKV files, we know when we've found the last
start code of interest in a given frame, so we are able to avoid doing even
that one remaining pass for most of the buffer.
In some use-cases (such as the Raspberry Pi) video decode is handled by the
GPU, but the entire elementary stream is still fed through the parser to
pick out certain elements of the header which are necessary to manage the
decode process. As you might expect, in these cases, the performance of the
parser is significant.
To measure parser performance, I used the same VC-1 elementary stream in
either an MPEG-2 transport stream or a MKV file, and fed it through avconv
with -c:v copy -c:a copy -f null. These are the gperftools counts for
those streams, both filtered to only include vc1_parse() and its callees,
and unfiltered (to include the whole binary). Lower numbers are better:
Before After
File Filtered Mean StdDev Mean StdDev Confidence Change
M2TS No 861.7 8.2 650.5 8.1 100.0% +32.5%
MKV No 868.9 7.4 731.7 9.0 100.0% +18.8%
M2TS Yes 250.0 11.2 27.2 3.4 100.0% +817.9%
MKV Yes 149.0 12.8 1.7 0.8 100.0% +8526.3%
Yes, that last case shows vc1_parse() running 86 times faster! The M2TS
case does show a larger absolute improvement though, since it was worse
to begin with.
This patch has been tested with the FATE suite (albeit on x86 for speed).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Initialise VC1DSPContext for parser as well as for decoder.
Note, the VC-1 code doesn't actually use the function pointer yet.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
The rationale is that you have a packed format in form
<greyscale sample> <alpha sample> <greyscale sample> <alpha sample>
and shortening greyscale to 'G' might make one thing about Greenscale instead.
An alias pixel format and color space name are provided for compatibility.
Bug-Id: CVE-2013-0868
inspired by a patch from Michael Niedermayer <michaelni@gmx.at>
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Diego Biurrun <diego@biurrun.de>
CC: libav-stable@libav.org
llvm's integrated assembler does not accept spaces as macro argument
delimiter when targeting darwin. Using a explicit delimiter is a good
idea in principle since it makes case like 'macro 4 -2' vs 'macro 4 - 2'
clear.
Properly address CVE-2011-3946 and parse bitstream as described in the spec.
CC: libav-stable@libav.org
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Make sure the buffer size does not exceed the expected
RLE size.
Prevent an out of array bound write.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Bug-Id: CVE-2013-0852
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Additional contributions by James Almer <jamrial@gmail.com>,
Carl Eugen Hoyos <cehoyos@ag.or.at>, Fiona Glaser <fiona@x264.com> and
Anton Khirnov <anton@khirnov.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The typedefs also exist in the avfft.h header and since typedefs cannot be
legally redefined in C, the code fails to compile with some compilers.
This reverts commits 11c7155cce and 57f1b1dcc7.
Intrinsics only used on aarch64 since the existing ARMv7 NEON asm
is slightly faster (Cortex-A9, gcc-4.8, micro-benchmarks and full
decoding time).
Signed-off-by: James Yu <james.yu@linaro.org>
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Such files can be created using the --bff x264 option.
Sample-Id: h264_direct_temporal_mvs_bff.mkv
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
The previous implementation targeted DTS Coherent Acoustics, which only
requires nbits == 4 (fft16()). This case was (and still is) linked directly
rather than being indirected through ff_fft_calc_vfp(), but now the full
range from radix-4 up to radix-65536 is available. This benefits other codecs
such as AAC and AC3.
The implementaion is based upon the C version, with each routine larger than
radix-16 calling a hierarchy of smaller FFT functions, then performing a
post-processing pass. This pass benefits a lot from loop unrolling to
counter the long pipelines in the VFP. A relaxed calling standard also
reduces the overhead of the call hierarchy, and avoiding the excessive
inlining performed by GCC probably helps with I-cache utilisation too.
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in the FFT routines (fft4() to fft512() and pass()) for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 2245.5 53.1 1599.6 43.8 100.0% +40.4%
FFT routines 940.6 22.0 348.1 20.8 100.0% +170.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.
In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.
I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
aac_decode_frame 2368.1 35.8 2117.2 35.3 100.0% +11.8%
ff_imdct_half_* 457.5 22.4 251.2 16.2 100.0% +82.1%
Signed-off-by: Martin Storsjö <martin@martin.st>
For implicit signaling cases (as possible for Spectral Band Replication
and Parametric Stereo Tools), the decoder must decode the first frame to
correctly identify the stream configuration (as called from
avformat_find_stream_info). The mechanism for this is built-in and only
requires adding CODEC_CAP_CHANNEL_CONF to the libfdk-aacdec AVCodec
struct.
Signed-off-by: Omer Osman <omer.osman@iis.fraunhofer.de>
Signed-off-by: Martin Storsjö <martin@martin.st>
The default error concealment method if none is set via
aacDecoder_SetParam(AAC_CONCEAL_METHOD) is set in
CConcealment_InitCommonData within the fdk-aac library
and is set to Energy Interpolation. This method requires one frame
delay to the output. To reduce the default decoder output delay and
avoid missing the last frame in file based decoding, use Noise
Substitution as the default concealment method.
Signed-off-by: Omer Osman <omer.osman@iis.fraunhofer.de>
Signed-off-by: Martin Storsjö <martin@martin.st>
Add ability to handle multiple palettes and objects simultaneously.
Each simultaneous object is given its own AVSubtitleRect.
Note that there can be up to 64 currently valid objects, but only
2 at any one time can be "presented".
Signed-off-by: Anton Khirnov <anton@khirnov.net>