Commit Graph

469 Commits

Author SHA1 Message Date
James Almer
0fbc7a2169 x86/float_dsp: remove usage of integer instructions 2017-05-12 23:34:49 -03:00
James Almer
f1d80bc630 x86/float_dsp: add ff_vector_fmul_reverse_avx2
~20% faster than AVX.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-04-11 21:35:35 -03:00
James Almer
ed9b25a148 x86/float_dsp: add ff_vector_dmac_scalar_{sse2,avx,fma3} 2017-04-10 12:18:55 -03:00
Clément Bœsch
f291a9a1ad Merge commit '99434f4df81b6801b2b535d5b9143305595784f6'
* commit '99434f4df81b6801b2b535d5b9143305595784f6':
  float_dsp: Have implementation match function pointer prototype

Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-03-30 10:23:25 +02:00
James Almer
c97e986e90 Merge commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8'
* commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8':
  emms: Give apriv_emms_yasm() a more general name

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:28:56 -03:00
James Almer
29db87af52 Merge commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4'
* commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4':
  x86: Add missing colons after assembly labels

Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:05:27 -03:00
James Almer
d8962ffbd8 avutil/x86util: don't use movss in VBROADCASTSS macro when src and dst args are the same
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-03-21 19:15:00 -03:00
Clément Bœsch
3898e346b3 Merge commit '07e1f99a1bb41d1a615676140eefc85cf69fa793'
* commit '07e1f99a1bb41d1a615676140eefc85cf69fa793':
  x86util: Document SBUTTERFLY macro

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 18:38:07 +01:00
Clément Bœsch
8200b16a9c Merge commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5'
* commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5':
  imgutils: add a function for copying image data from GPU mapped memory

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:34:10 +01:00
James Darnley
5336887867 avcodec/h264: sse2, avx h luma mbaff deblock/loop filter
x86-64 only

Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)

Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)

Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx:  ~3.29x (370 vs. 112 cycles)
2017-02-18 20:26:52 +01:00
James Darnley
7627df15d4 x86util: import MOVHL macro
Originally committed to x264 in 1637239a by Henrik Gramner who has
agreed to re-license it as LGPL.  Original commit message follows.

    x86: Avoid some bypass delays and false dependencies

    A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
    between int and float domains, so try to avoid that if possible.
2017-02-18 20:26:51 +01:00
James Darnley
9d815b7424 avcodec/x86: deduplicate PASS8ROWS macro 2017-02-18 20:26:49 +01:00
James Almer
8d5df204d0 Merge commit '8e9cd81d291b1010c625b2766058aadf4affb537'
* commit '8e9cd81d291b1010c625b2766058aadf4affb537':
  x86: cpu: Detect Conroe CPUs and their slow shuffle unit

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 15:20:54 -03:00
James Almer
2eab48177d Merge commit '7d7355aa92bb36ca0765c49a569a999bcb96f332'
* commit '7d7355aa92bb36ca0765c49a569a999bcb96f332':
  x86: Add SSSE3_SLOW CPU flag and related convenience macros

Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 15:17:19 -03:00
Henrik Gramner
cd09e3b349 x86inc: Avoid using eax/rax for storing the stack pointer
When allocating stack space with an alignment requirement that is larger
than the current stack alignment we need to store a copy of the original
stack pointer in order to be able to restore it later.

If we chose to use another register for this purpose we should not pick
eax/rax since it can be overwritten as a return value.
2017-01-09 16:00:29 +01:00
Diego Biurrun
99434f4df8 float_dsp: Have implementation match function pointer prototype
libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 1 different from declaration
libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 2 different from declaration
2016-11-03 17:43:55 +01:00
Michael Niedermayer
051517648b avutil/x86/emms: Document the emms_c() vs alloc/free relation.
Reviewed-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-10-23 13:02:37 +02:00
Diego Biurrun
7911186ed6 emms: Give apriv_emms_yasm() a more general name 2016-10-18 13:09:09 +02:00
Diego Biurrun
6be7944ee2 x86: Add missing colons after assembly labels
This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error
2016-10-17 16:31:26 +02:00
Alexandra Hájková
07e1f99a1b x86util: Document SBUTTERFLY macro
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-09-19 10:02:43 +02:00
Anton Khirnov
d7bc52bf45 imgutils: add a function for copying image data from GPU mapped memory
See https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers
2016-08-31 08:15:47 +02:00
Fiona Glaser
8e9cd81d29 x86: cpu: Detect Conroe CPUs and their slow shuffle unit 2016-07-20 18:43:28 +02:00
Diego Biurrun
7d7355aa92 x86: Add SSSE3_SLOW CPU flag and related convenience macros 2016-07-20 18:43:28 +02:00
James Almer
fd5e6a095f x86util: Extend SPLATW for avx2
Integration to Libav by Josh de Kock <josh@itanimul.li>.

Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
2016-07-18 15:27:13 +02:00
Ronald S. Bultje
f0a2b6249b vp9: add 16x16 idct avx2 (8-bit).
checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows
that it's about 1.65x as fast as the AVX version for the full IDCT, and
similar speedups for the sub-IDCTs:

nop: 24.6
vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
2016-07-11 10:14:58 -04:00
Matthieu Bouron
9eb3da2f99 asm: FF_-prefix internal macros used in inline assembly
See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.
2016-06-27 17:21:18 +02:00
Clément Bœsch
8ef57a0d61 Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb'
* commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb':
  cosmetics: Fix spelling mistakes

Merged-by: Clément Bœsch <u@pkh.me>
2016-06-21 21:55:34 +02:00
Matt Oliver
5ca44ebd99 lavu/intmath.h: fix compilation with msvc10.
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
2016-06-13 13:49:24 +10:00
James Almer
172af20852 x86/showcqt: use three operand format for some instructions
Fixes failures with yasm 1.1.0 and older

Signed-off-by: James Almer <jamrial@gmail.com>
2016-06-08 19:37:08 -03:00
James Almer
99b899483e avutil/x86util: move haddps sse emulation from showcqt
Signed-off-by: James Almer <jamrial@gmail.com>
2016-06-08 14:18:00 -03:00
Diego Biurrun
1e9c5bf4c1 asm: FF_-prefix internal macros used in inline assembly
These warnings conflict with system macros on Solaris, producing
truckloads of warnings about macro redefinition.
2016-05-28 19:18:26 +02:00
Anton Mitrofanov
2fb1d17a5a x86inc: Enable AVX emulation in additional cases
Allows emulation to work when dst is equal to src2 as long as the
instruction is commutative, e.g. `addps m0, m1, m0`.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-05-16 10:31:24 +02:00
Anton Mitrofanov
300fb0df84 x86inc: Improve handling of %ifid with multi-token parameters
The yasm/nasm preprocessor only checks the first token, which means that
parameters such as `dword [rax]` are treated as identifiers, which is
generally not what we want.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-05-16 10:31:20 +02:00
Anton Mitrofanov
8d02579fae x86inc: Fix AVX emulation of some instructions
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-05-16 10:31:17 +02:00
Henrik Gramner
ba3eb745cc x86inc: Fix AVX emulation of scalar float instructions
Those instructions are not commutative since they only change the first
element in the vector and leave the rest unmodified.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-05-16 10:31:13 +02:00
Vittorio Giovara
41ed7ab45f cosmetics: Fix spelling mistakes
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-05-04 18:16:21 +02:00
Anton Mitrofanov
e428f3b30c x86inc: Enable AVX emulation in additional cases
Allows emulation to work when dst is equal to src2 as long as the
instruction is commutative, e.g. `addps m0, m1, m0`.
2016-04-20 19:16:22 +02:00
Anton Mitrofanov
4bd5583ace x86inc: Improve handling of %ifid with multi-token parameters
The yasm/nasm preprocessor only checks the first token, which means that
parameters such as `dword [rax]` are treated as identifiers, which is
generally not what we want.
2016-04-20 19:16:22 +02:00
Anton Mitrofanov
42be240ad6 x86inc: Fix AVX emulation of some instructions 2016-04-20 19:16:22 +02:00
Henrik Gramner
8dd3ee9ddd x86inc: Fix AVX emulation of scalar float instructions
Those instructions are not commutative since they only change the first
element in the vector and leave the rest unmodified.
2016-04-20 19:16:22 +02:00
James Almer
70d685a77f x86: use the new helper macros where useful
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-14 20:00:21 -03:00
James Almer
73a4589d4b x86: add some more helper macros to check for slow cpuflags
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-14 20:00:17 -03:00
James Almer
be22bd32fe x86/cpu: set avxslow cpuflag on btver2 CPUs
They are also slow when using 256 bit wide registers

Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-07 16:39:21 -03:00
James Almer
b3b0ecee15 x86/emms: empty the mmx state unconditionally on supported targets
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-04 01:49:01 -03:00
Timothy Gu
44304ae322 all: Add missing header guards 2016-01-28 19:49:48 -08:00
James Almer
b624f0660b x86: Add ymm_reg struct
Needed to declare 32-byte long constants

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-01-28 00:41:19 +01:00
Geza Lore
cc602061ee x86inc: Add debug symbols indicating sizes of compiled functions
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they get can confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.

Currently only implemented for ELF.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-01-23 20:46:28 +01:00
Henrik Gramner
002c47798d x86inc: Avoid creating unnecessary local labels
The REP_RET workaround is only needed on old AMD cpus, and the labels clutter
up the symbol table and confuse debugging/profiling tools, so use EQU to
create SHN_ABS symbols instead of creating local labels. Furthermore, skip
the workaround completely in functions that definitely won't run on such cpus.

Note that EQU is just creating a local label when using nasm instead of yasm.
This is probably a bug, but at least it doesn't break anything.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-01-23 20:44:25 +01:00
Henrik Gramner
fd6ecac38e x86inc: Simplify AUTO_REP_RET
cpuflags is never undefined any more, it's set to 0 instead.

Also fix an incorrect comment.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-01-23 20:43:39 +01:00
Henrik Gramner
5ca8e195e5 x86inc: Use more consistent indentation
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-01-23 20:42:59 +01:00