James Almer
0fbc7a2169
x86/float_dsp: remove usage of integer instructions
2017-05-12 23:34:49 -03:00
James Almer
f1d80bc630
x86/float_dsp: add ff_vector_fmul_reverse_avx2
...
~20% faster than AVX.
Signed-off-by: James Almer <jamrial@gmail.com>
2017-04-11 21:35:35 -03:00
James Almer
ed9b25a148
x86/float_dsp: add ff_vector_dmac_scalar_{sse2,avx,fma3}
2017-04-10 12:18:55 -03:00
Clément Bœsch
f291a9a1ad
Merge commit '99434f4df81b6801b2b535d5b9143305595784f6'
...
* commit '99434f4df81b6801b2b535d5b9143305595784f6':
float_dsp: Have implementation match function pointer prototype
Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-03-30 10:23:25 +02:00
James Almer
c97e986e90
Merge commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8'
...
* commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8':
emms: Give apriv_emms_yasm() a more general name
Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:28:56 -03:00
James Almer
29db87af52
Merge commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4'
...
* commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4':
x86: Add missing colons after assembly labels
Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:05:27 -03:00
James Almer
d8962ffbd8
avutil/x86util: don't use movss in VBROADCASTSS macro when src and dst args are the same
...
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-03-21 19:15:00 -03:00
Clément Bœsch
3898e346b3
Merge commit '07e1f99a1bb41d1a615676140eefc85cf69fa793'
...
* commit '07e1f99a1bb41d1a615676140eefc85cf69fa793':
x86util: Document SBUTTERFLY macro
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 18:38:07 +01:00
Clément Bœsch
8200b16a9c
Merge commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5'
...
* commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5':
imgutils: add a function for copying image data from GPU mapped memory
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:34:10 +01:00
James Darnley
5336887867
avcodec/h264: sse2, avx h luma mbaff deblock/loop filter
...
x86-64 only
Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)
Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)
Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx: ~3.29x (370 vs. 112 cycles)
2017-02-18 20:26:52 +01:00
James Darnley
7627df15d4
x86util: import MOVHL macro
...
Originally committed to x264 in 1637239a by Henrik Gramner who has
agreed to re-license it as LGPL. Original commit message follows.
x86: Avoid some bypass delays and false dependencies
A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
between int and float domains, so try to avoid that if possible.
2017-02-18 20:26:51 +01:00
James Darnley
9d815b7424
avcodec/x86: deduplicate PASS8ROWS macro
2017-02-18 20:26:49 +01:00
James Almer
8d5df204d0
Merge commit '8e9cd81d291b1010c625b2766058aadf4affb537'
...
* commit '8e9cd81d291b1010c625b2766058aadf4affb537':
x86: cpu: Detect Conroe CPUs and their slow shuffle unit
Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 15:20:54 -03:00
James Almer
2eab48177d
Merge commit '7d7355aa92bb36ca0765c49a569a999bcb96f332'
...
* commit '7d7355aa92bb36ca0765c49a569a999bcb96f332':
x86: Add SSSE3_SLOW CPU flag and related convenience macros
Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 15:17:19 -03:00
Henrik Gramner
cd09e3b349
x86inc: Avoid using eax/rax for storing the stack pointer
...
When allocating stack space with an alignment requirement that is larger
than the current stack alignment we need to store a copy of the original
stack pointer in order to be able to restore it later.
If we chose to use another register for this purpose we should not pick
eax/rax since it can be overwritten as a return value.
2017-01-09 16:00:29 +01:00
Diego Biurrun
99434f4df8
float_dsp: Have implementation match function pointer prototype
...
libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 1 different from declaration
libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 2 different from declaration
2016-11-03 17:43:55 +01:00
Michael Niedermayer
051517648b
avutil/x86/emms: Document the emms_c() vs alloc/free relation.
...
Reviewed-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-10-23 13:02:37 +02:00
Diego Biurrun
7911186ed6
emms: Give apriv_emms_yasm() a more general name
2016-10-18 13:09:09 +02:00
Diego Biurrun
6be7944ee2
x86: Add missing colons after assembly labels
...
This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error
2016-10-17 16:31:26 +02:00
Alexandra Hájková
07e1f99a1b
x86util: Document SBUTTERFLY macro
...
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-09-19 10:02:43 +02:00
Anton Khirnov
d7bc52bf45
imgutils: add a function for copying image data from GPU mapped memory
...
See https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers
2016-08-31 08:15:47 +02:00
Fiona Glaser
8e9cd81d29
x86: cpu: Detect Conroe CPUs and their slow shuffle unit
2016-07-20 18:43:28 +02:00
Diego Biurrun
7d7355aa92
x86: Add SSSE3_SLOW CPU flag and related convenience macros
2016-07-20 18:43:28 +02:00
James Almer
fd5e6a095f
x86util: Extend SPLATW for avx2
...
Integration to Libav by Josh de Kock <josh@itanimul.li>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
2016-07-18 15:27:13 +02:00
Ronald S. Bultje
f0a2b6249b
vp9: add 16x16 idct avx2 (8-bit).
...
checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows
that it's about 1.65x as fast as the AVX version for the full IDCT, and
similar speedups for the sub-IDCTs:
nop: 24.6
vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
2016-07-11 10:14:58 -04:00
Matthieu Bouron
9eb3da2f99
asm: FF_-prefix internal macros used in inline assembly
...
See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.
2016-06-27 17:21:18 +02:00
Clément Bœsch
8ef57a0d61
Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb'
...
* commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb':
cosmetics: Fix spelling mistakes
Merged-by: Clément Bœsch <u@pkh.me>
2016-06-21 21:55:34 +02:00
Matt Oliver
5ca44ebd99
lavu/intmath.h: fix compilation with msvc10.
...
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
2016-06-13 13:49:24 +10:00
James Almer
172af20852
x86/showcqt: use three operand format for some instructions
...
Fixes failures with yasm 1.1.0 and older
Signed-off-by: James Almer <jamrial@gmail.com>
2016-06-08 19:37:08 -03:00
James Almer
99b899483e
avutil/x86util: move haddps sse emulation from showcqt
...
Signed-off-by: James Almer <jamrial@gmail.com>
2016-06-08 14:18:00 -03:00
Diego Biurrun
1e9c5bf4c1
asm: FF_-prefix internal macros used in inline assembly
...
These warnings conflict with system macros on Solaris, producing
truckloads of warnings about macro redefinition.
2016-05-28 19:18:26 +02:00
Anton Mitrofanov
2fb1d17a5a
x86inc: Enable AVX emulation in additional cases
...
Allows emulation to work when dst is equal to src2 as long as the
instruction is commutative, e.g. `addps m0, m1, m0`.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-05-16 10:31:24 +02:00
Anton Mitrofanov
300fb0df84
x86inc: Improve handling of %ifid with multi-token parameters
...
The yasm/nasm preprocessor only checks the first token, which means that
parameters such as `dword [rax]` are treated as identifiers, which is
generally not what we want.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-05-16 10:31:20 +02:00
Anton Mitrofanov
8d02579fae
x86inc: Fix AVX emulation of some instructions
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-05-16 10:31:17 +02:00
Henrik Gramner
ba3eb745cc
x86inc: Fix AVX emulation of scalar float instructions
...
Those instructions are not commutative since they only change the first
element in the vector and leave the rest unmodified.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-05-16 10:31:13 +02:00
Vittorio Giovara
41ed7ab45f
cosmetics: Fix spelling mistakes
...
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-05-04 18:16:21 +02:00
Anton Mitrofanov
e428f3b30c
x86inc: Enable AVX emulation in additional cases
...
Allows emulation to work when dst is equal to src2 as long as the
instruction is commutative, e.g. `addps m0, m1, m0`.
2016-04-20 19:16:22 +02:00
Anton Mitrofanov
4bd5583ace
x86inc: Improve handling of %ifid with multi-token parameters
...
The yasm/nasm preprocessor only checks the first token, which means that
parameters such as `dword [rax]` are treated as identifiers, which is
generally not what we want.
2016-04-20 19:16:22 +02:00
Anton Mitrofanov
42be240ad6
x86inc: Fix AVX emulation of some instructions
2016-04-20 19:16:22 +02:00
Henrik Gramner
8dd3ee9ddd
x86inc: Fix AVX emulation of scalar float instructions
...
Those instructions are not commutative since they only change the first
element in the vector and leave the rest unmodified.
2016-04-20 19:16:22 +02:00
James Almer
70d685a77f
x86: use the new helper macros where useful
...
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-14 20:00:21 -03:00
James Almer
73a4589d4b
x86: add some more helper macros to check for slow cpuflags
...
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-14 20:00:17 -03:00
James Almer
be22bd32fe
x86/cpu: set avxslow cpuflag on btver2 CPUs
...
They are also slow when using 256 bit wide registers
Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-07 16:39:21 -03:00
James Almer
b3b0ecee15
x86/emms: empty the mmx state unconditionally on supported targets
...
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-04 01:49:01 -03:00
Timothy Gu
44304ae322
all: Add missing header guards
2016-01-28 19:49:48 -08:00
James Almer
b624f0660b
x86: Add ymm_reg struct
...
Needed to declare 32-byte long constants
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-01-28 00:41:19 +01:00
Geza Lore
cc602061ee
x86inc: Add debug symbols indicating sizes of compiled functions
...
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they get can confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Currently only implemented for ELF.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-01-23 20:46:28 +01:00
Henrik Gramner
002c47798d
x86inc: Avoid creating unnecessary local labels
...
The REP_RET workaround is only needed on old AMD cpus, and the labels clutter
up the symbol table and confuse debugging/profiling tools, so use EQU to
create SHN_ABS symbols instead of creating local labels. Furthermore, skip
the workaround completely in functions that definitely won't run on such cpus.
Note that EQU is just creating a local label when using nasm instead of yasm.
This is probably a bug, but at least it doesn't break anything.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-01-23 20:44:25 +01:00
Henrik Gramner
fd6ecac38e
x86inc: Simplify AUTO_REP_RET
...
cpuflags is never undefined any more, it's set to 0 instead.
Also fix an incorrect comment.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-01-23 20:43:39 +01:00
Henrik Gramner
5ca8e195e5
x86inc: Use more consistent indentation
...
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-01-23 20:42:59 +01:00