Based on the aarch64 asm. CPU cycle counts on cortex-a9 compared to
gcc 4.8.2:
before: 475 decicycles in get_cabac_noinline, 67106035 runs, 2829 skips
after: 393 decicycles in get_cabac_noinline, 67106474 runs, 2390 skips
Overall speedup is above 2%. Code generated by clang 3.4 is slower on
the same hardware and the relative change is a little larger.
Based on the x86 branchless get_cabac asm. get_cabac_noinline() gets
approximately 20% faster (no cycle counts available) compared to clang
from Xcode 5.1 beta5. More than 6% faster overall. A part of the overall
speedup might be explained by additional inlining of get_cabac().
Initially written by Guillaume Martres <smarter@ubuntu.com> as a GSoC
project. Further contributions by the OpenHEVC project and other
developers, namely:
Mickaël Raulet <mraulet@insa-rennes.fr>
Seppo Tomperi <seppo.tomperi@vtt.fi>
Gildas Cocherel <gildas.cocherel@laposte.net>
Khaled Jerbi <khaled_jerbi@yahoo.fr>
Wassim Hamidouche <wassim.hamidouche@insa-rennes.fr>
Vittorio Giovara <vittorio.giovara@gmail.com>
Jan Ekström <jeebjp@gmail.com>
Anton Khirnov <anton@khirnov.net>
Martin Storsjö <martin@martin.st>
Luca Barbato <lu_zero@gentoo.org>
Yusuke Nakamura <muken.the.vfrmaniac@gmail.com>
Reimar Döffinger <Reimar.Doeffinger@gmx.de>
Diego Biurrun <diego@biurrun.de>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Fixes overreads in HEVC
Fixes Ticket3070
Also fixed remaining issues from Ticket3075 and Ticket3076
Some lines of code taken from 0c5f839693da2276c2da23400f67a67be4ea0af1:libavcodec/x86/cabac.h
and 0c5f839693da2276c2da23400f67a67be4ea0af1:libavcodec/cabac_functions.h
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Initially written by Guillaume Martres <smarter@ubuntu.com> as a GSoC
project. Further contributions by the OpenHEVC project and other
developers, namely:
Mickaël Raulet <mraulet@insa-rennes.fr>
Seppo Tomperi <seppo.tomperi@vtt.fi>
Gildas Cocherel <gildas.cocherel@laposte.net>
Khaled Jerbi <khaled_jerbi@yahoo.fr>
Wassim Hamidouche <wassim.hamidouche@insa-rennes.fr>
Vittorio Giovara <vittorio.giovara@gmail.com>
Jan Ekström <jeebjp@gmail.com>
Anton Khirnov <anton@khirnov.net>
Martin Storsjö <martin@martin.st>
Luca Barbato <lu_zero@gentoo.org>
Yusuke Nakamura <muken.the.vfrmaniac@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'e6d8acf6a8fba4743eb56eabe72a741d1bbee3cb':
indeo: use a typedef for the mc function pointer
cabac: x86 version of get_cabac_bypass
aic: use chroma scan tables while decoding luma component in progressive mode
Conflicts:
libavcodec/aic.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The reason is this is easier for PIC code (in particular on darwin...).
Keep the old names as pointers (static in cabac_functions.h so gcc
knows these are just immediate offsets) so the c code can nicely stay the same
(alternatively could use offsets directly in the functions needing the
tables). This should produce the same code as before with non-pic and better
code (confirmed) with pic.
The assembly uses the new table but still won't work for PIC case.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
asf: only set index_read if the index contained entries.
cabac: add overread protection to BRANCHLESS_GET_CABAC().
cabac: increment jump locations by one in callers of BRANCHLESS_GET_CABAC().
cabac: remove unused argument from BRANCHLESS_GET_CABAC_UPDATE().
cabac: use struct+offset instead of memory operand in BRANCHLESS_GET_CABAC().
h264: add overread protection to get_cabac_bypass_sign_x86().
h264: reindent get_cabac_bypass_sign_x86().
h264: use struct offsets in get_cabac_bypass_sign_x86().
h264: fix overreads in cabac reader.
wmall: fix seeking.
lagarith: fix buffer overreads.
dvdec: drop unnecessary dv_tablegen.h #include
build: fix doc generation errors in parallel builds
Replace memset(0) by zero initializations.
faandct: Remove FAAN_POSTSCALE define and related code.
dvenc: print allowed profiles if the video doesn't conform to any of them.
avcodec_encode_{audio,video}: only reallocate output packet when it has non-zero size.
FATE: add a test for vp8 with changing frame size.
fate: add kgv1 fate test.
oggdec: calculate correct timestamps in Ogg/FLAC
Conflicts:
libavcodec/4xm.c
libavcodec/cook.c
libavcodec/dvdata.c
libavcodec/dvdsubdec.c
libavcodec/lagarith.c
libavcodec/lagarithrac.c
libavcodec/utils.c
tests/fate/video.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This fixes standalone compilation of some decoders with --disable-optimizations.
cabac.h defines some inline functions that use symbols from cabac.c. Without
optimizations these inline functions are not eliminated and linking fails with
references to non-existing symbols.
Splitting the inline functions off into their own header and only #including
it in the places where the inline functions are used allows #including cabac.h
from anywhere without ill effects.
The reason is this is easier for PIC code (in particular on darwin...).
Keep the old names as pointers (static in cabac_functions.h so gcc
knows these are just immediate offsets) so the c code can nicely stay the same
(alternatively could use offsets directly in the functions needing the
tables). This should produce the same code as before with non-pic and better
code (confirmed) with pic.
The assembly uses the new table but still won't work for PIC case.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>