9779 Commits

Author SHA1 Message Date
Ryan Houdek
c282239077
InstcountCI: Add SVE128 VEX_map3 2024-06-30 16:27:58 -07:00
Ryan Houdek
8d28a441ab
Arm64: Minor VBSL optimization with SVE128
This is a very minor performance change. Cortex CPUs that support SVE
perform movprfx+<instruction> fusion, which removes two cycles and a
dependency from the backend.

Because of this fusion, converting from ASIMD mov+bsl to SVE movprfx+bsl
is a minor win, saving two cycles and a dependency on Cortex-A710 and
A715. It is slightly less of a win on Cortex-A720/A725 because they support
zero-cycle vector register renames, but it is still a win on Cortex-X925
because that is an older core design that doesn't support zero-cycle
vector register renames.

Very silly little thing.
2024-06-30 16:22:29 -07:00
Ryan Houdek
5821054d91
Merge pull request #3789 from Sonicadvance1/avx128_minor_pshufb_opt
AVX128: Minor optimization to 256-bit vpshufb
2024-06-30 15:45:11 -07:00
Ryan Houdek
4626145374
Merge pull request #3792 from Sonicadvance1/avx128_fix_scalar_fma
AVX128: Fixes scalar FMA accidentally using vector wide
2024-06-30 15:36:09 -07:00
Ryan Houdek
a786d3621d
InstcountCI: Update for Scalar FMA 2024-06-30 14:36:56 -07:00
Ryan Houdek
1393dc2a5b
AVX128: Fixes scalar FMA accidentally using vector wide 2024-06-30 14:36:33 -07:00
Ryan Houdek
c4604465ba
InstcountCI: Update 2024-06-30 13:41:14 -07:00
Ryan Houdek
cffae9cb0f
AVX128: Minor optimization to 256-bit vpshufb 2024-06-30 13:41:03 -07:00
Ryan Houdek
cf24d3c33f
Merge pull request #3781 from Sonicadvance1/optimize_vmovlh
AVX128: Minor optimization to vmov{l,h}{ps,pd}
2024-06-29 23:15:53 -07:00
Ryan Houdek
cc0509c0f3
InstcountCI: Update 2024-06-29 19:27:39 -07:00
Ryan Houdek
ebfa65fedc
AVX128: Minor optimization to vmov{l,h}{ps,pd} 2024-06-29 19:27:16 -07:00
Ryan Houdek
76f3391ebc
Merge pull request #3779 from Sonicadvance1/cpuinfo_cyclecounter
Linux: Calculate cycle counter frequency for cpuinfo
2024-06-29 11:58:32 -07:00
Ryan Houdek
be6ff52709
Linux: Calculate cycle counter frequency for cpuinfo
Some applications don't measure the rdtsc frequency correctly and instead
read cpuinfo to get the CPU core's base clock speed, which for most x86
CPUs also matches the cycle counter speed.

Did this as a quick test to see if this would help `Unbound: Worlds
Apart` stuttering while BinaryNinja was disassembling the binary.

Turns out the game doesn't use cpuinfo for its cycle counter speed
determination, but it is good to implement this regardless.
2024-06-28 16:38:49 -07:00
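Note on be6ff52709 above: a minimal sketch of the idea, assuming the cycle counter frequency is already known in Hz. `FormatCpuMHzLine` is a hypothetical helper and not FEX's actual code, and the exact `cpu MHz` line layout is an assumption modeled on typical x86 Linux cpuinfo output.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not taken from FEX): renders a cycle counter
// frequency in Hz as a cpuinfo-style "cpu MHz" line.
inline std::string FormatCpuMHzLine(uint64_t cycle_counter_hz) {
  char buf[64];
  // Assumed layout: "cpu MHz<TAB><TAB>: <MHz with three decimals>".
  std::snprintf(buf, sizeof(buf), "cpu MHz\t\t: %.3f",
                static_cast<double>(cycle_counter_hz) / 1e6);
  return std::string(buf);
}

// Small self-check: a 3 GHz counter is reported as 3000.000 MHz.
inline void CheckFormatCpuMHzLine() {
  assert(FormatCpuMHzLine(3000000000ull) == "cpu MHz\t\t: 3000.000");
}
```

An application that reads this line instead of timing rdtsc would then see a clock speed that agrees with the cycle counter it is given.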
Ryan Houdek
e99e252188
Merge pull request #3731 from Sonicadvance1/avx_5
HostFeatures: Always disable AVX in 32-bit mode to protect from stack overflows
2024-06-28 13:37:55 -07:00
Ryan Houdek
98b980f7e3
TestHarnessRunner: Ensure we are still reconstructing XMM registers if we don't support AVX
Also fixes a bug where we were destroying the thread context before
reading the data from it, spooky.
2024-06-28 13:05:52 -07:00
Ryan Houdek
f2f90eeb82
FEXCore: Make more distinctions between host register size and guest vector register size
We can support a few combinations of guest and host vector sizes:
Host: 128-bit or 256-bit
Guest: 128-bit or 256-bit

The typical case is Host = 128-bit and Guest = 256-bit now that AVX is
implemented.
On 32-bit this changes to Host=128-bit and Guest=128-bit because we
disable AVX.

In the vixl simulator, 32-bit turns into Host=256-bit and Guest=128-bit,
and 64-bit turns into Host=256-bit and Guest=256-bit.

We cover all four combinations of guest and host vector register sizes!

Fixes a few places that assumed SVE256 = AVX256.
2024-06-28 13:05:52 -07:00
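Note on f2f90eeb82 above: the four host/guest combinations listed in the message can be sketched as a small selection function. This is only an illustration under stated assumptions: `VectorSizes` and `SelectVectorSizes` are hypothetical names, the host is assumed to be 256-bit only when it has 256-bit vectors (as in the vixl simulator configuration described), and the guest is assumed to be 256-bit only in 64-bit mode, where AVX stays enabled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type: host and guest vector register widths in bits.
struct VectorSizes {
  uint32_t host_bits;
  uint32_t guest_bits;
};

// Hypothetical selector (not FEX's actual logic).
// host_has_256bit_vectors: true for a 256-bit host, e.g. the vixl simulator.
// guest_is_64bit: false for 32-bit guests, where AVX is disabled.
constexpr VectorSizes SelectVectorSizes(bool host_has_256bit_vectors,
                                        bool guest_is_64bit) {
  return VectorSizes{host_has_256bit_vectors ? 256u : 128u,
                     guest_is_64bit ? 256u : 128u};
}

// Self-check covering all four combinations from the commit message.
inline void CheckAllFourCombinations() {
  assert(SelectVectorSizes(false, true).guest_bits == 256);   // typical case
  assert(SelectVectorSizes(false, false).guest_bits == 128);  // 32-bit
  assert(SelectVectorSizes(true, false).host_bits == 256);    // vixl sim, 32-bit
  assert(SelectVectorSizes(true, true).guest_bits == 256);    // vixl sim, 64-bit
}
```

Keeping the two widths as separate fields is the point of the sketch: code that needs the guest width cannot silently pick up the host width.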
Ryan Houdek
f267fd2250
HostFeatures: Always disable AVX in 32-bit mode to protect from stack overflows 2024-06-28 13:05:52 -07:00
Ryan Houdek
500ad34769
Merge pull request #3778 from pmatos/LargeX87Blocks
Largest x87 blocks of code from games
2024-06-28 09:40:19 -07:00
Ryan Houdek
1700d54012
Merge pull request #3776 from Sonicadvance1/fix_vsib_invalid_index 2024-06-28 08:43:24 -07:00
Paulo Matos
70d8a10484 Largest x87 blocks of code from games 2024-06-28 16:50:58 +02:00
Ryan Houdek
9e94784e26
unittests: Adds test for xmm4 VSIB bug 2024-06-27 20:55:30 -07:00
Ryan Houdek
4060f4018e
Frontend: Fixes invalid VSIB Index problem
In regular SIB land, an index register encoding of 0b100 means "no
register". This lets you get SIB encodings without an index register
for flexibility.

In VSIB encoding this isn't the expected behaviour. There are no
encodings where an index register is missing, which allows you to encode
all sixteen registers as an index register.

This was causing an abort in `AVX128_LoadVSIB` because the index turned
into an invalid register.

Working instruction:
`vgatherdps ymm2, dword [eax+ymm5*4], ymm7`

Broken instruction:
`vgatherdps ymm0, dword [eax+ymm4*4], ymm7`

This fixes a crash in libfmod, which uses gathers in the wild, and in
turn fixes a crash in Ender Lilies.
2024-06-27 20:55:30 -07:00
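Note on 4060f4018e above: the SIB versus VSIB index rule can be sketched as follows. `DecodeIndex` is a hypothetical helper and not FEX's decoder; it assumes C++17 and that the REX.X/VEX.X extension bit has already been extracted and un-inverted by the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper (not taken from FEX).
// sib:     raw SIB byte (scale[7:6], index[5:3], base[2:0]).
// x_bit:   index extension bit, already un-inverted.
// is_vsib: true when the instruction uses VSIB addressing (e.g. gathers).
// Returns the index register number, or nullopt when there is no index.
inline std::optional<uint8_t> DecodeIndex(uint8_t sib, bool x_bit, bool is_vsib) {
  const uint8_t index =
      static_cast<uint8_t>(((sib >> 3) & 0b111) | (x_bit ? 0b1000 : 0));
  // Only regular SIB treats 0b100 as "no index register".
  if (!is_vsib && index == 0b100) {
    return std::nullopt;
  }
  return index;
}

// Self-check using the SIB bytes of the two instructions in the commit:
// 0xA8 encodes [eax + reg5*4], 0xA0 encodes [eax + reg4*4].
inline void CheckCommitExamples() {
  assert(DecodeIndex(0xA8, false, true) == std::optional<uint8_t>(5));
  assert(DecodeIndex(0xA0, false, true) == std::optional<uint8_t>(4));
  assert(!DecodeIndex(0xA0, false, false).has_value());
}
```

With `is_vsib` set, index 4 decodes to a real vector register, which is the case that previously produced an invalid register.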
Ryan Houdek
739ac0f18f
Merge pull request #3775 from Sonicadvance1/avx_bugfixes
AVX128: Some quick bugfixes
2024-06-27 17:44:12 -07:00
Ryan Houdek
98d62a7eb1
InstcountCI: Update 2024-06-27 17:21:12 -07:00
Ryan Houdek
aba7a3a830
AVX128: Fixes vblendps lower and upper selector 2024-06-27 17:20:39 -07:00
Ryan Houdek
9027d1eee7
AVX128: Fixes bug in vector immediate shift 2024-06-27 16:22:14 -07:00
Ryan Houdek
4e5da4946d
Merge pull request #3773 from bylaws/win-fixes
Windows: Small fixes for compat with newer toolchains/wine versions
2024-06-27 15:14:20 -07:00
Billy Laws
a70e3e42b2 FEXCore: Drop unneeded MinGW library naming workaround
It's generally expected for libraries to use the .a suffix with MinGW,
and DLLs are still correctly named without the prior special handling.
2024-06-27 23:01:21 +01:00
Billy Laws
09f476924f FEXCore: Fix missing return in win32 SetSignalMask path 2024-06-27 23:01:21 +01:00
Billy Laws
230e3245fd FileLoading: Fix compilation with newer libc++ 2024-06-27 23:01:21 +01:00
Billy Laws
8de876daf2 Windows: Use newer wine unixcall API
__wine_unix_call is no longer exported in recent wine versions.
2024-06-27 23:01:19 +01:00
Ryan Houdek
53b1d155cc
Merge pull request #3772 from Sonicadvance1/fix_addrsize_override
FEXCore: Fixes address size override on GPR sources and destinations
2024-06-27 15:01:08 -07:00
Ryan Houdek
b0eb63ab9a
FEXCore: Fixes address size override on GPR sources and destinations
When the source or destination is a register, the address size override
doesn't apply. We were accidentally applying it on all sources
regardless of type which was causing us to zero extend on operations
that aren't affected by address size override.

This fixes the OpenSSL cert error in every application, but most
importantly Steam.
2024-06-27 14:12:01 -07:00
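Note on b0eb63ab9a above: a minimal sketch of the rule, assuming 64-bit mode, where the address size override selects 32-bit addressing. `OperandKind` and `ApplyAddressSizeOverride` are hypothetical names for illustration and are not FEX's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operand classification.
enum class OperandKind { Register, Memory };

// Hypothetical helper: truncates (zero extends from 32 bits) only the
// address of a memory operand. Register operands pass through untouched.
constexpr uint64_t ApplyAddressSizeOverride(OperandKind kind, uint64_t value,
                                            bool has_override) {
  if (kind == OperandKind::Memory && has_override) {
    return value & 0xFFFFFFFFull;  // assumed 64-bit mode: 32-bit address
  }
  return value;
}

// Self-check: a register value keeps its upper bits, an address does not.
inline void CheckAddressSizeOverride() {
  assert(ApplyAddressSizeOverride(OperandKind::Register, 0x100000001ull, true) ==
         0x100000001ull);
  assert(ApplyAddressSizeOverride(OperandKind::Memory, 0x100000001ull, true) ==
         0x1ull);
}
```

The bug described in the message corresponds to taking the masking branch for register operands as well.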
Ryan Houdek
2e3242682d
Merge pull request #3771 from alyssarosenzweig/opt/asimd-masked
OpcodeDispatcher: optimize nzcv with asimd masked load/store
2024-06-27 10:27:10 -07:00
Ryan Houdek
ad4d4c9e67
Merge pull request #3770 from alyssarosenzweig/opt/vzeroall
Tiny opt for vzeroall
2024-06-27 10:25:35 -07:00
Alyssa Rosenzweig
3250d4e405 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-27 10:37:11 -04:00
Alyssa Rosenzweig
196a0531e0 OpcodeDispatcher: optimize nzcv with asimd masked load/store
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-27 10:37:06 -04:00
Alyssa Rosenzweig
e61cb5b2c3 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-27 10:30:45 -04:00
Alyssa Rosenzweig
f9b53c6b51 AVX_128: save a move in vzeroall
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-27 10:30:25 -04:00
Mai
58e949e148
Merge pull request #3769 from Sonicadvance1/avx2_cpuid
CPUID: Oops, forgot to enable AVX2
2024-06-26 21:17:44 -04:00
Ryan Houdek
dad47b7bda
CPUID: Oops, forgot to enable AVX2 2024-06-26 17:43:56 -07:00
Ryan Houdek
e519bf5978
Merge pull request #3768 from Sonicadvance1/avx128_letsgo
AVX128: Enable all the things
2024-06-26 17:40:21 -07:00
Ryan Houdek
fc50e52157
InstCountCI: Adds AVX128 tests 2024-06-26 16:49:00 -07:00
Ryan Houdek
7669df0e16
InstCountCI: SVE256: Fixes behaviour change 2024-06-26 16:49:00 -07:00
Ryan Houdek
4d56fec5f1
AVX128: Work around glibc fault testing 2024-06-26 16:49:00 -07:00
Ryan Houdek
8181552b16
AVX128: Actually install AVX helpers per thread.
How this didn't break the world in my testing I don't know.
2024-06-26 16:49:00 -07:00
Ryan Houdek
c6c147daf6
unittests: Updates vcvtps2ph test for failure case of writing too much memory. 2024-06-26 16:49:00 -07:00
Ryan Houdek
975069825e
AVX128: Fix a real bug with VCVTPS2PH 2024-06-26 16:49:00 -07:00
Ryan Houdek
5133f480d1
InstcountCI: Update for xsave/xrstor behaviour changes with AVX 2024-06-26 16:49:00 -07:00
Ryan Houdek
ce4b252e5c
InstCountCI: Stop disabling AVX if SVE256 is disabled. 2024-06-26 15:06:03 -07:00