Ryan Houdek
7e8d734e43
AVX256: Initial fixes just to get my unittest working
...
This is the initial split to decouple AVX256 composed operations from
their MMX/SSE counterparts. This is to work around the subtle
differences with AVX/SSE zext/insert behaviour.
2024-07-11 18:43:31 -07:00
Ryan Houdek
3d90d1ab4f
InstcountCI: Update for vmovq fix
2024-07-11 18:34:06 -07:00
Ryan Houdek
3c7318d7c8
AVX128: Fixes vmovq loading too much data
...
This was doing a 128-bit load from memory followed by a 64-bit zero
extend. The zero extend looked like a spurious move, but it was there to
match the behaviour of vmovq, which requires the zero extend.
Also adds a unit test to ensure that we aren't loading too much data by
loading right up against a page boundary.
Fixes #3787
2024-07-11 18:34:05 -07:00
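A rough sketch of the page-boundary idea behind the unit test mentioned in 3c7318d7c8, assuming a POSIX host. The helper name `Load64AtPageEnd` is hypothetical and this is not the actual FEX test (which exercises the emulated vmovq instruction). It only shows how placing the 64-bit source directly before an inaccessible page makes any wider load fault.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: stores `value` in the last 8 bytes of a readable page
// that is immediately followed by a PROT_NONE page, then reads it back with
// an 8-byte access. A 16-byte load from the same address would cross into
// the guard page and fault.
uint64_t Load64AtPageEnd(uint64_t value) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  void* map = mmap(nullptr, 2 * page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (map == MAP_FAILED) {
    return 0;
  }
  auto* bytes = static_cast<uint8_t*>(map);

  // Make the second page inaccessible so over-reads are caught.
  if (mprotect(bytes + page, page, PROT_NONE) != 0) {
    munmap(map, 2 * page);
    return 0;
  }

  uint8_t* src = bytes + page - sizeof(uint64_t);
  std::memcpy(src, &value, sizeof(value));

  uint64_t out = 0;
  std::memcpy(&out, src, sizeof(out)); // exactly 64 bits, stays in bounds

  munmap(map, 2 * page);
  return out;
}
```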
Ryan Houdek
fc0b233046
Merge pull request #3859 from neobrain/refactor_opdispatch_templates
...
OpcodeDispatcher: Replace hand-written wrapper templates with a generic utility
2024-07-11 18:18:23 -07:00
Mai
e25918d846
Merge pull request #3858 from Sonicadvance1/implement_nt_load
...
Implement support for SSE4.1/AVX NT loads
2024-07-11 14:22:41 -04:00
Tony Wasserka
b9829ed316
OpcodeDispatcher: Replace even more hand-written wrapper templates
2024-07-11 16:19:15 +02:00
Tony Wasserka
4ccec17676
OpcodeDispatcher: Replace more hand-written wrapper templates
2024-07-11 16:19:15 +02:00
Tony Wasserka
f45082043b
OpcodeDispatcher: Replace hand-written wrapper templates with a generic utility
2024-07-11 16:19:14 +02:00
Tony Wasserka
3222f13dde
Fix comment formatting
2024-07-11 16:19:14 +02:00
Mai
b282620a48
Merge pull request #3857 from Sonicadvance1/sve_bitperm
...
Arm64: Implement support for SVE bitperm
2024-07-11 05:05:41 -04:00
Ryan Houdek
3ff1ff8f74
InstcountCI: Update for svebitperm
2024-07-11 01:46:35 -07:00
Ryan Houdek
e24b01b6cb
Arm64: Implement support for SVE bitperm
2024-07-11 01:46:35 -07:00
Tony Wasserka
9a8694c2f3
Merge pull request #3853 from neobrain/refactor_warn_fixes
...
Fix all the warnings
2024-07-11 10:12:41 +02:00
Tony Wasserka
070a9148aa
Merge pull request #3852 from neobrain/refactor_opdispatch_codesize
...
OpcodeDispatcher: Avoid template monomorphization to reduce FEXLoader binary size
2024-07-11 09:58:49 +02:00
Tony Wasserka
f19fe3b6f3
Fix warning about an expression with side effects being passed to __builtin_assume
...
LOGMAN_THROW_AA_FMT has no benefit over LOGMAN_THROW_A_FMT here, so just use
the latter.
2024-07-11 09:54:31 +02:00
Tony Wasserka
8d2b15665d
Fix unused-variable warnings
2024-07-11 09:54:30 +02:00
Tony Wasserka
4dec8f22f8
Fix packed-non-pod warnings
2024-07-11 09:54:30 +02:00
Tony Wasserka
a39b3aca78
Fix invalid-offsetof warnings due to JsonAllocator not being standard layout
...
Inheritance can be used here instead, which allows the JsonAllocator to be
reconstructed using a downcast.
2024-07-11 09:54:30 +02:00
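A minimal sketch of the inheritance-plus-downcast pattern that a39b3aca78 describes, using made-up types. `AllocatorBase` and `PoolAllocator` are hypothetical stand-ins, not the real JsonAllocator. The callback receives a pointer to the base and recovers the full object with `static_cast`, so no `offsetof` on a non-standard-layout type is needed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical C-style allocator interface: callbacks only see the base.
struct AllocatorBase {
  void* (*alloc)(AllocatorBase* self, std::size_t size);
};

// Derived allocator with extra state. It is not standard layout, so
// recovering it from a member pointer via offsetof would trigger
// -Winvalid-offsetof; a downcast from the base avoids that.
struct PoolAllocator : AllocatorBase {
  std::vector<std::vector<unsigned char>> blocks;

  PoolAllocator() : AllocatorBase{&PoolAllocator::Alloc} {}

  static void* Alloc(AllocatorBase* base, std::size_t size) {
    // Reconstruct the derived object from the base pointer.
    auto* self = static_cast<PoolAllocator*>(base);
    self->blocks.emplace_back(size);
    return self->blocks.back().data();
  }
};
```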
Tony Wasserka
5dc4ab062d
Fix invalid-offsetof warnings due to InternalThreadState not being standard layout
...
See https://github.com/llvm/llvm-project/issues/53021 for more information
about unique_ptr turning non-standard-layout.
2024-07-11 09:54:30 +02:00
Ryan Houdek
31f82c1d96
InstcountCI: Update for SVE NT load support
2024-07-10 23:07:58 -07:00
Ryan Houdek
548fd9daf8
OpcodeDispatcher: Implement support for SSE4.1 NT load
2024-07-10 23:07:37 -07:00
Ryan Houdek
f831f5a0e1
AVX128: Implement support for NT Load
2024-07-10 23:07:14 -07:00
Ryan Houdek
4c21aa2604
Arm64: Implement support for NT Loads with ASIMD fallback
2024-07-10 23:06:46 -07:00
Ryan Houdek
c9efb75714
CodeEmitter: Implement support for SVE NT loads
2024-07-10 23:06:19 -07:00
Ryan Houdek
5e56bdc0fd
InstcountCI: Add support for SVE bitperm
2024-07-10 21:48:37 -07:00
Ryan Houdek
3554d5c2f7
HostFeatures: Check for SVE bit permute extension
2024-07-10 21:45:07 -07:00
Mai
5fe405e1fb
Merge pull request #3855 from neobrain/fix_aotir_uniqueptr
...
AOTIR: Change std::unique_ptr to fextl::unique_ptr
2024-07-10 17:04:12 -04:00
Tony Wasserka
8381d44bbd
Merge pull request #3854 from neobrain/fix_default_delete
...
fextl: Properly handle nullptr arguments in fextl::default_delete
2024-07-10 23:00:52 +02:00
Tony Wasserka
56bb3744a5
AOTIR: Change std::unique_ptr to fextl::unique_ptr
2024-07-10 19:34:24 +02:00
Tony Wasserka
470b435afd
fextl: Properly handle nullptr arguments in fextl::default_delete
...
This reflects behavior of std::default_delete.
2024-07-10 19:17:50 +02:00
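For illustration, a deleter that mirrors the nullptr behaviour of std::default_delete could look roughly like the sketch below. The names `custom_free`, `custom_new` and `null_safe_delete` are hypothetical and the code is not taken from fextl. The point is only that a deleter routing through a custom deallocation function needs an explicit null check, whereas `delete nullptr` is already a no-op.

```cpp
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical deallocation hook; counts calls so the behaviour is observable.
int g_free_calls = 0;

void custom_free(void* ptr) {
  ++g_free_calls;
  std::free(ptr);
}

// Hypothetical allocation helper paired with custom_free.
template <class T, class... Args>
T* custom_new(Args&&... args) {
  void* mem = std::malloc(sizeof(T));
  if (mem == nullptr) {
    throw std::bad_alloc{};
  }
  return new (mem) T(std::forward<Args>(args)...);
}

// Deleter that treats nullptr as a no-op, like std::default_delete.
template <class T>
struct null_safe_delete {
  void operator()(T* ptr) const {
    if (ptr == nullptr) {
      return; // nothing to destroy or free
    }
    ptr->~T();
    custom_free(ptr);
  }
};
```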
Tony Wasserka
441187470e
OpcodeDispatcher: Avoid monomorphization of some AVX functions
2024-07-10 17:01:30 +02:00
Tony Wasserka
59fd13cc2f
OpcodeDispatcher: Avoid monomorphization of even more functions
2024-07-10 17:01:30 +02:00
Tony Wasserka
c9e7bfdf16
OpcodeDispatcher: Avoid monomorphization of more functions
2024-07-10 17:01:30 +02:00
Tony Wasserka
2d700c381e
OpcodeDispatcher: Avoid monomorphization of large functions
2024-07-10 17:01:30 +02:00
Ryan Houdek
72d6c8ebd6
Merge pull request #3820 from alyssarosenzweig/ir/drop-deferred
...
Drop deferred flag infrastructure
2024-07-09 17:06:25 -07:00
Ryan Houdek
991c6941c1
Merge pull request #3849 from alyssarosenzweig/ir/drop-parser-2
...
Scripts: drop remnant of IR parser
2024-07-09 16:48:36 -07:00
Alyssa Rosenzweig
f974696e34
Scripts: drop remnant of IR parser
...
unused.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-09 16:08:38 -04:00
Alyssa Rosenzweig
3ef9ea94e5
Merge pull request #3848 from pmatos/FTSTX87Tests
...
Tests for X87 FTST
2024-07-09 09:10:29 -04:00
Paulo Matos
381ce23fd7
Tests for X87 FTST
2024-07-09 13:36:16 +02:00
Mai
af6a0be832
Merge pull request #3842 from Sonicadvance1/fix_f64_to_i32
...
VCVT{T,}PD2DQ fixes and optimization
2024-07-09 03:49:31 -04:00
Ryan Houdek
287fe5beac
InstcountCI: Update
2024-07-09 00:38:48 -07:00
Ryan Houdek
b9c214e6e8
OpcodeDispatcher: Use new IR op for vcvt{t,}pd2dq
...
Also fixes a bug where the AVX128 implementation was failing to zero the
upper bits of the destination register, which the updated unit tests now
check.
Fixes a minor precision issue that was reported in #2995. We still don't
return correct values for overflow: x86 always returns the most negative
int32_t on overflow, while ARM returns the most negative or most positive
int32_t depending on the sign of the double.
2024-07-09 00:38:47 -07:00
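The overflow mismatch noted in b9c214e6e8 can be modelled in scalar code roughly as follows. This is a sketch of the truncating conversion only. The function names are hypothetical, and the NaN handling (0x80000000 on x86, 0 on ARM) is an addition based on the architectures' documented behaviour rather than something stated in the commit.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Models x86 truncating F64 -> I32: any out-of-range input (or NaN) yields
// the "integer indefinite" value, which is the most negative int32_t.
int32_t X86StyleTruncate(double v) {
  if (std::isnan(v) || v >= 2147483648.0 || v <= -2147483649.0) {
    return std::numeric_limits<int32_t>::min();
  }
  return static_cast<int32_t>(v); // in range, truncates toward zero
}

// Models ARM truncating F64 -> I32: out-of-range inputs saturate towards
// the sign of the input, and NaN converts to zero.
int32_t ArmStyleTruncate(double v) {
  if (std::isnan(v)) {
    return 0;
  }
  if (v >= 2147483648.0) {
    return std::numeric_limits<int32_t>::max();
  }
  if (v <= -2147483649.0) {
    return std::numeric_limits<int32_t>::min();
  }
  return static_cast<int32_t>(v);
}
```

The two models agree for in-range values and for negative overflow, and differ only for positive overflow and NaN.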
Ryan Houdek
d3d76aa8ce
IR: Adds new F64 -> I32 operation that changes behaviour depending on SVE
...
SVE added the ability to do F64 -> I32 conversions directly without an
fcvtn in between, so make sure to support them.
2024-07-09 00:38:47 -07:00
Ryan Houdek
3bea08da5f
Merge pull request #3843 from Sonicadvance1/remove_half_moves_fma3
...
Arm64: Remove one move if possible in FMA operations
2024-07-09 00:25:07 -07:00
Mai
7ccb252069
Merge pull request #3837 from Sonicadvance1/optimize_sve_vpgatherdq
...
AVX128: Extends 32-bit indexes path for 128-bit operations
2024-07-08 22:01:02 -04:00
Ryan Houdek
31547462bb
InstcountCI: Update for final SVE AVX128 improvements.
2024-07-08 18:44:07 -07:00
Ryan Houdek
b3a7a973a1
AVX128: Extends 32-bit indexes path for 128-bit operations
...
The codepath from #3826 was only targeting 256-bit sized operations.
This missed the vpgatherdq/vgatherdpd 128-bit operations. By extending
the codepath to understand 128-bit operations, we now hit these
instruction variants.
With this PR, we now have SVE128 codepaths that handle ALL variants of
x86 gather instructions! There are zero ASIMD fallbacks used in this
case!
Of course depending on the instruction, the performance still leaves a
lot to be desired, and there is no way to emulate x86 TSO behaviour
without an ASIMD fallback, which we will likely need to add as a
fallback at some point.
Based on #3836 until that is merged.
2024-07-08 18:44:07 -07:00
Mai
22b26696ba
Merge pull request #3836 from Sonicadvance1/optimize_sve_vpgatherdd
...
AVX128: Optimize the vpgatherdd/vgatherdps cases that would fall back to ASIMD
2024-07-08 21:43:36 -04:00
Ryan Houdek
495241f8ca
InstcountCI: Update for wide gather vpgatherdd SVE usage
2024-07-08 18:12:28 -07:00
Ryan Houdek
4afbfcae17
AVX128: Optimize the vpgatherdd/vgatherdps cases that would fall back to ASIMD
...
The introduction of the wide gathers in #3828 has opened new avenues for
optimizing these cases that would typically fall back to ASIMD. In the
cases where 32-bit SVE scaling doesn't fit, we can instead sign extend the
elements into double-width address registers.
This then feeds naturally into the SVE path, even though we end up
needing to allocate 512 bits' worth of address registers. This is still
significantly better than the ASIMD path.
Relies on #3828 to be merged first
Fixes #3829
2024-07-08 18:12:28 -07:00
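The sign-extension step described in 4afbfcae17 can be pictured with a scalar model. This is an illustrative sketch, not the SVE code FEX emits. The names `SignExtendIndices` and `Gather32` and the eight-lane width are assumptions chosen so that the widened indices occupy 8 x 64 = 512 bits, matching the figure in the commit message.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kLanes = 8; // eight 32-bit elements (256-bit gather)

// Widen each signed 32-bit index to 64 bits. Eight widened indices take
// 512 bits, i.e. twice the space of the original index vector.
std::array<int64_t, kLanes> SignExtendIndices(const std::array<int32_t, kLanes>& idx) {
  std::array<int64_t, kLanes> wide{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    wide[i] = static_cast<int64_t>(idx[i]);
  }
  return wide;
}

// Scalar model of a 32-bit-element gather using the widened indices:
// each lane loads from base + index * scale.
std::array<uint32_t, kLanes> Gather32(const uint8_t* base,
                                      const std::array<int32_t, kLanes>& idx,
                                      int64_t scale) {
  const auto wide = SignExtendIndices(idx);
  std::array<uint32_t, kLanes> out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    const uint8_t* addr = base + wide[i] * scale;
    std::memcpy(&out[i], addr, sizeof(uint32_t));
  }
  return out;
}
```

Negative indices are the case that makes sign extension (rather than zero extension) necessary: an index of -1 must address the element just before the base.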