9758 Commits

Author SHA1 Message Date
Ryan Houdek
4060f4018e
Frontend: Fixes invalid VSIB Index problem
In regular SIB encoding, an index register field of 0b100 means "no
register". This lets you encode SIB addressing without an index
register for flexibility.

In VSIB encoding this isn't the expected behaviour: there are no
encodings where the index register is missing, which allows all
sixteen registers to be encoded as an index register.

This was causing an abort in `AVX128_LoadVSIB` because the index turned
into an invalid register.

Working instruction:
`vgatherdps ymm2, dword [eax+ymm5*4], ymm7`

Broken instruction:
`vgatherdps ymm0, dword [eax+ymm4*4], ymm7`

This fixes a crash in libfmod, which uses gathers in the wild.
This also fixes a crash in Ender Lilies.
2024-06-27 20:55:30 -07:00
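The difference between the two index decodings described in this commit can be sketched as follows. This is a minimal illustration, not FEX's decoder: `decode_index`, its parameters, and the `SIB_NO_INDEX` constant are hypothetical names, and the extension bit is assumed to be already un-inverted when it comes from a VEX prefix.

```python
from typing import Optional

# Index field value that means "no index register" in a regular SIB byte.
SIB_NO_INDEX = 0b100


def decode_index(sib: int, ext_x: int, is_vsib: bool) -> Optional[int]:
    """Return the index register number encoded by a SIB byte, or None.

    sib     -- raw SIB byte: scale in bits 7:6, index in bits 5:3, base in bits 2:0
    ext_x   -- index extension bit (0 or 1), assumed already un-inverted
    is_vsib -- True when the instruction uses vector SIB addressing
    """
    index_field = (sib >> 3) & 0b111
    index = (ext_x << 3) | index_field
    # Regular SIB: index 0b100 without the extension bit means "no index".
    # VSIB: all sixteen encodings name a vector register, so nothing is skipped.
    if not is_vsib and index == SIB_NO_INDEX:
        return None
    return index
```

With the SIB byte for `[eax+ymm4*4]` (0xA0), the regular rule yields no index while the VSIB rule yields register 4, which matches the broken instruction listed in the commit message.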
Ryan Houdek
739ac0f18f
Merge pull request #3775 from Sonicadvance1/avx_bugfixes
AVX128: Some quick bugfixes
2024-06-27 17:44:12 -07:00
Ryan Houdek
98d62a7eb1
InstcountCI: Update 2024-06-27 17:21:12 -07:00
Ryan Houdek
aba7a3a830
AVX128: Fixes vblendps lower and upper selector 2024-06-27 17:20:39 -07:00
Ryan Houdek
9027d1eee7
AVX128: Fixes bug in vector immediate shift 2024-06-27 16:22:14 -07:00
Ryan Houdek
4e5da4946d
Merge pull request #3773 from bylaws/win-fixes
Windows: Small fixes for compat with newer toolchains/wine versions
2024-06-27 15:14:20 -07:00
Billy Laws
a70e3e42b2 FEXCore: Drop unneeded MinGW library naming workaround
It's generally expected for libraries to use the .a suffix with MinGW,
and DLLs are still correctly named without the prior special handling.
2024-06-27 23:01:21 +01:00
Billy Laws
09f476924f FEXCore: Fix missing return in win32 SetSignalMask path 2024-06-27 23:01:21 +01:00
Billy Laws
230e3245fd FileLoading: Fix compilation with newer libc++ 2024-06-27 23:01:21 +01:00
Billy Laws
8de876daf2 Windows: Use newer wine unixcall API
__wine_unix_call is no longer exported in recent wine versions.
2024-06-27 23:01:19 +01:00
Ryan Houdek
53b1d155cc
Merge pull request #3772 from Sonicadvance1/fix_addrsize_override
FEXCore: Fixes address size override on GPR sources and destinations
2024-06-27 15:01:08 -07:00
Ryan Houdek
b0eb63ab9a
FEXCore: Fixes address size override on GPR sources and destinations
When the source or destination is a register, the address size override
doesn't apply. We were accidentally applying it on all sources
regardless of type which was causing us to zero extend on operations
that aren't affected by address size override.

This fixes the OpenSSL cert error in every application, but most
importantly Steam.
2024-06-27 14:12:01 -07:00
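A small sketch of the rule this commit describes, assuming 64-bit mode where the address size override truncates a computed address to 32 bits. The function name and parameters are hypothetical and do not correspond to FEX code.

```python
MASK32 = 0xFFFF_FFFF


def apply_address_size_override(value: int, is_memory_operand: bool,
                                has_override: bool) -> int:
    """Apply the address size override only where it is meaningful.

    value             -- computed address (memory operand) or register contents
    is_memory_operand -- True when the operand is a memory reference
    has_override      -- True when the instruction carries the override prefix
    """
    # Only computed addresses are truncated; register operands pass through
    # unchanged, so they are never zero extended by this prefix.
    if is_memory_operand and has_override:
        return value & MASK32
    return value
```

The bug described above corresponds to dropping the `is_memory_operand` check, which would truncate 64-bit register values whenever the prefix is present.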
Ryan Houdek
2e3242682d
Merge pull request #3771 from alyssarosenzweig/opt/asimd-masked
OpcodeDispatcher: optimize nzcv with asimd masked load/store
2024-06-27 10:27:10 -07:00
Ryan Houdek
ad4d4c9e67
Merge pull request #3770 from alyssarosenzweig/opt/vzeroall
Tiny opt for vzeroall
2024-06-27 10:25:35 -07:00
Alyssa Rosenzweig
3250d4e405 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-27 10:37:11 -04:00
Alyssa Rosenzweig
196a0531e0 OpcodeDispatcher: optimize nzcv with asimd masked load/store
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-27 10:37:06 -04:00
Alyssa Rosenzweig
e61cb5b2c3 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-27 10:30:45 -04:00
Alyssa Rosenzweig
f9b53c6b51 AVX_128: save a move in vzeroall
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-27 10:30:25 -04:00
Mai
58e949e148
Merge pull request #3769 from Sonicadvance1/avx2_cpuid
CPUID: Oops, forgot to enable AVX2
2024-06-26 21:17:44 -04:00
Ryan Houdek
dad47b7bda
CPUID: Oops, forgot to enable AVX2 2024-06-26 17:43:56 -07:00
Ryan Houdek
e519bf5978
Merge pull request #3768 from Sonicadvance1/avx128_letsgo
AVX128: Enable all the things
2024-06-26 17:40:21 -07:00
Ryan Houdek
fc50e52157
InstCountCI: Adds AVX128 tests 2024-06-26 16:49:00 -07:00
Ryan Houdek
7669df0e16
InstCountCI: SVE256: Fixes behaviour change 2024-06-26 16:49:00 -07:00
Ryan Houdek
4d56fec5f1
AVX128: Work around glibc fault testing 2024-06-26 16:49:00 -07:00
Ryan Houdek
8181552b16
AVX128: Actually install AVX helpers per thread.
How this didn't break the world in my testing I don't know.
2024-06-26 16:49:00 -07:00
Ryan Houdek
c6c147daf6
unittests: Updates vcvtps2ph test for failure case of writing too much memory. 2024-06-26 16:49:00 -07:00
Ryan Houdek
975069825e
AVX128: Fix a real bug with VCVTPS2PH 2024-06-26 16:49:00 -07:00
Ryan Houdek
5133f480d1
InstcountCI: Update for xsave/xrstor behaviour changes with AVX 2024-06-26 16:49:00 -07:00
Ryan Houdek
ce4b252e5c
InstCountCI: Stop disabling AVX if SVE256 is disabled. 2024-06-26 15:06:03 -07:00
Ryan Houdek
031d56de35
HostFeatures: Enables AVX unconditionally 2024-06-26 15:03:21 -07:00
Ryan Houdek
3cdaf6736b
InstcountCI: Update for SVE256 FMA implementation 2024-06-26 14:56:01 -07:00
Ryan Houdek
b5e696b3cb
CPUID: Implement support for XCR0 when AVX is enabled
This enables AVX, AVX2, FMA3 for the entire CPUID!

```bash
$ FEX_HOSTFEATURES=enableavx,enableavx2 ./Bin/FEXInterpreter /usr/bin/cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Cortex-A78AE
stepping        : 0
microcode       : 0x0
cpu MHz         : 3000
cache size      : 512 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 12
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 22
wp              : yes
flags           : fpu vme tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht tm syscall nx mmxext fxsr_opt rdtscp lm 3dnow 3dnowext constant_tsc art rep_good nopl xtoplogy nonstop_tsc cpuid tsc_known_freq pni pclmulqdq dtes64 monitor tm2 ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx hypervisor lahf_lm cmp_legacy extapic abm 3dnowprefetch tce fsgsbase bmi1 avx2 smep bmi2 erms invpcid adx clflushopt clwb sha_ni clzero arat vpclmulqdq rdpid fsrm
bugs            :
bogomips        : 8000.0
TLB size        : 2560 4K pages
clflush size    : 64
cache_alignment  : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
```

Notice avx, avx2, and fma in the flags line.
2024-06-26 14:56:01 -07:00
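Rather than scanning the flags line by eye, the check can be scripted. The sketch below is a hypothetical helper (the name `missing_flags` is not part of FEX) that takes the text of `/proc/cpuinfo` and reports which of the wanted flags are absent from the first `flags` line.

```python
def missing_flags(cpuinfo_text: str, wanted=("avx", "avx2", "fma")) -> list:
    """Return the wanted CPU flags that the first 'flags' line does not list."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in wanted if flag not in present]
    # No flags line at all: treat every wanted flag as missing.
    return list(wanted)
```

One possible use is to capture the output of the command shown in the commit message and pass it to `missing_flags`; an empty list means all three features are advertised.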
Ryan Houdek
43aef377d7
HostFeatures: Allow enabling AVX without SVE256 2024-06-26 14:56:01 -07:00
Ryan Houdek
add0e7a8db
HostFeatures: Removes distinction between AVX and AVX2
We no longer care about AVX versions, so consolidate them into a
single config option which enables both.
2024-06-26 14:56:01 -07:00
Ryan Houdek
52e541d453
Unittests: Stop using AVX2 flag 2024-06-26 14:56:01 -07:00
Mai
a031a49546
Merge pull request #3767 from Sonicadvance1/avx128_fix_wide_shift
AVX128: Fixes wide shifts
2024-06-26 17:29:09 -04:00
Alyssa Rosenzweig
4d821b8dd8
Merge pull request #3765 from Sonicadvance1/avx128_f16c
AVX128: F16C support
2024-06-26 17:25:05 -04:00
Ryan Houdek
f277025c9a
AVX128: Fixes wide shifts
During refactoring this was missed and rerunning unittests locally
caught it. 256-bit operations get their shift only from the lower half
of the vector register.
2024-06-26 14:16:39 -07:00
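To illustrate the behaviour described in this commit, here is a sketch of a 256-bit dword left shift split into two 128-bit halves, in the style of an AVX128 decomposition. It assumes the x86 rule that the count comes from the low 64 bits of the count register and that counts above 31 zero the result. The function name and data layout are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def shift_left_dwords_256(src_lo, src_hi, count_lo_half: int):
    """Shift eight 32-bit lanes left by a single shared count.

    src_lo, src_hi -- lower and upper 128-bit halves, four 32-bit ints each
    count_lo_half  -- lower 128-bit half of the count register, as an integer
    """
    # The count is read once from the lower half and reused for both halves.
    count = count_lo_half & MASK64

    def shift_half(half):
        if count > 31:
            return [0] * len(half)
        return [(lane << count) & MASK32 for lane in half]

    return shift_half(src_lo), shift_half(src_hi)
```

The point of the sketch is that the upper half of the operation never consults the upper half of the count register.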
Ryan Houdek
ba28e6f82e
unittests: Adds vcvtps2ph tests that use mxcsr 2024-06-26 14:08:20 -07:00
Ryan Houdek
3a89df9bed
AVX128: Implement support for F16C 2024-06-26 14:05:12 -07:00
Ryan Houdek
f6a0866fbb
IR: Split Vector_FToF2 into VFCVTL2 and VCVTFN2
I forgot in the narrowing case we need to be careful about insert. No IR
op used Vector_FToF2 with narrowing.
2024-06-26 14:03:41 -07:00
Ryan Houdek
756fa2ecc5
Merge pull request #3766 from alyssarosenzweig/opt/f16c-round
Optimize vcvtps2ph
2024-06-26 14:03:24 -07:00
Alyssa Rosenzweig
cf834aa6da InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-26 16:46:21 -04:00
Alyssa Rosenzweig
d2324f4a93 OpcodeDispatcher: optimize vcvtps2ph
We can avoid a LOT of pointless work with some dedicated IR ops for specifically
overriding the round mode.

Small behaviour change here: we no longer reset FTZ. I think this is a bug fix?
But if it's not it's not hard to fix.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-26 16:46:21 -04:00
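For context on what "overriding the round mode" means for `vcvtps2ph`: the instruction's immediate either supplies a rounding mode directly or defers to MXCSR. The sketch below models only that selection, using the architectural bit positions (immediate bit 2 selects MXCSR, MXCSR rounding control in bits 14:13). The function and table names are hypothetical and this is not how FEX's IR ops are written.

```python
# Rounding control encodings shared by the immediate and MXCSR.RC.
ROUND_MODES = ("nearest-even", "down", "up", "toward-zero")


def vcvtps2ph_round_mode(imm8: int, mxcsr: int) -> str:
    """Return the rounding mode a vcvtps2ph conversion would use."""
    if imm8 & 0b100:
        # Immediate bit 2 set: take the rounding control from MXCSR bits 14:13.
        rc = (mxcsr >> 13) & 0b11
    else:
        # Otherwise the low two immediate bits override MXCSR for this conversion.
        rc = imm8 & 0b11
    return ROUND_MODES[rc]
```

Only the second branch requires temporarily changing the host rounding mode, which is the case a dedicated IR op would need to cover.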
Ryan Houdek
6226c7f4f3
Merge pull request #3757 from Sonicadvance1/avx_16
AVX128: Implement support for gathers
2024-06-26 13:29:58 -07:00
Ryan Houdek
991ecd558e InstcountCI: Update for SVE256 gathers! 2024-06-26 16:00:53 -04:00
Ryan Houdek
a4fa3a460e OpcodeDispatcher: Implement AVX gathers with SVE256
Just to ensure we still have feature parity.
2024-06-26 16:00:53 -04:00
Ryan Houdek
77ba708933 AVX128: Implement support for gather load instructions
This is the last family of instructions that we needed to implement for
AVX2 to be properly advertised!
2024-06-26 16:00:53 -04:00
Ryan Houdek
662d50a966 X86Tables: Describe VPGather in the VEX tables 2024-06-26 16:00:53 -04:00
Ryan Houdek
5472d1cc04 Arm64: Implement VLoadVectorGatherMasked operation
This does a gather load three ways, SVE256, SVE128, and ASIMD.

This operation is a bit special since it can't quite handle all
gather loadstores in the 256-bit case and requires the frontend to
decompose the operation in the case that the striding hits a mode that
SVE doesn't support!

The 128-bit case is a lot simpler since both support all the cases where
stride doesn't match. I find this to be a nice compromise while there
aren't any SVE256 products on the market.

In the 128-bit case there is an SVE path which is utilized if the passed
in stride supports what SVE understands, otherwise it falls back to an
ASIMD implementation which manually emulates everything that is
necessary.

This instruction is very explicitly doing basically exactly what AVX
gather instructions want, because it's complex enough that we don't want
to try and make this a generic solution.
2024-06-26 16:00:53 -04:00
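As a reference for what a masked gather load has to compute, the following sketch models the x86 semantics of a dword gather in plain Python: lanes whose mask element has its top bit set are loaded from `base + index * scale`, other lanes keep the destination value, and the mask is cleared afterwards. It is a behavioural model only; the function name and the flat `bytes` memory are assumptions, and it says nothing about how the SVE or ASIMD paths are emitted.

```python
def gather_dwords(memory: bytes, base: int, indices, scale: int, mask, dest):
    """Model a masked gather of 32-bit elements.

    memory  -- flat little-endian byte buffer standing in for guest memory
    base    -- base address (offset into memory)
    indices -- per-lane signed element indices
    scale   -- 1, 2, 4, or 8
    mask    -- per-lane 32-bit mask values; only the top bit is consulted
    dest    -- per-lane previous destination values
    """
    result = list(dest)
    for lane, (index, lane_mask) in enumerate(zip(indices, mask)):
        if lane_mask & 0x8000_0000:
            address = base + index * scale
            result[lane] = int.from_bytes(memory[address:address + 4], "little")
    # The mask register is zeroed once the gather completes.
    return result, [0] * len(mask)
```

The per-lane loop is essentially what a fallback path has to emulate by hand when the stride does not map onto a native gather addressing mode.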