Alyssa Rosenzweig
1e709d1150
OpcodeDispatcher: add RecordX87 helper
...
calls will be generated.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-16 09:07:35 +02:00
Alyssa Rosenzweig
476ee0cd7d
IR: track whether x87 is used in header
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-16 09:07:35 +02:00
Ryan Houdek
b8e864ffdf
Merge pull request #3865 from Sonicadvance1/telemetry_atexit
...
Telemetry: Change how visibility of telemetry values work
2024-07-15 09:53:37 -07:00
Ryan Houdek
d79b7fcc49
Merge pull request #3808 from alyssarosenzweig/rclse/3
...
Try to delete RCLSE again
2024-07-12 20:38:06 -07:00
Ryan Houdek
b9a6caea8d
Merge pull request #3844 from Sonicadvance1/fix_vmovq
...
AVX128: Fixes vmovq loading too much data
2024-07-12 17:07:32 -07:00
Ryan Houdek
97a68cb643
Telemetry: Change how visibility of telemetry values work
...
Removes global initializer for telemetry values since their address is
visible and PIC relative code loading handles the address fetching for
us.
2024-07-12 03:18:23 -07:00
Ryan Houdek
870e395ac4
Merge pull request #3862 from Sonicadvance1/remove_atexit_logman
...
LogManager: Removes fextl::vector usage
2024-07-12 02:05:02 -07:00
Ryan Houdek
04592f82f5
Merge pull request #3861 from Sonicadvance1/remove_atexit_vdso
...
VDSO: Stop using a vector for a static
2024-07-12 02:04:25 -07:00
Ryan Houdek
5ef0db994d
VDSO: Stop using a vector for a static
...
This causes a global initializer that registers an atexit handler.
Be smarter, use an std::array and pass its data around using a span
instead.
Removes the global initializer and removes the atexit installation
2024-07-11 23:53:57 -07:00
Ryan Houdek
b523407a3e
LogManager: Removes fextl::vector usage
...
We never use more than one logging method at a time so this was
overengineered for what it is doing.
Instead, allow only one handler each for messages and throw messages,
each of which is just a pointer.
Removes a global initializer and an atexit handler being installed
2024-07-11 22:51:56 -07:00
Ryan Houdek
8021dc10a1
OpcodeDispatcher: Force noinline for the function call in the Bind helper
...
Clang was inlining a few of the functions it was calling, so force it
never to inline, since this is supposed to be a little shim trampoline
only.
2024-07-11 19:00:42 -07:00
Ryan Houdek
7e8d734e43
AVX256: Initial fixes just to get my unittest working
...
This is the initial split to decouple AVX256 composed operations from
their MMX/SSE counterparts. This is to work around the subtle
differences with AVX/SSE zext/insert behaviour.
2024-07-11 18:43:31 -07:00
Ryan Houdek
3c7318d7c8
AVX128: Fixes vmovq loading too much data
...
This was doing a 128-bit load from memory and then a 64-bit zero extend
which looked like a spurious move but it was trying to match the
behaviour of vmovq where it needed the zero extend.
Also adds a unit test to ensure that we aren't loading too much data by
loading right up against a page boundary.
Fixes #3787
2024-07-11 18:34:05 -07:00
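The page-boundary technique mentioned in this commit can be illustrated on the host side with a short sketch, assuming a Linux/POSIX system. It is not the actual unit test: MapQwordAtPageEnd and LoadQword are hypothetical helpers, and the mapping is deliberately never unmapped to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Maps two pages, makes the second one inaccessible, and stores Value in the
// last 8 bytes of the first page. Returns nullptr on failure.
uint8_t* MapQwordAtPageEnd(uint64_t Value) {
  const size_t PageSize = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  void* Base = mmap(nullptr, PageSize * 2, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (Base == MAP_FAILED) {
    return nullptr;
  }

  uint8_t* Bytes = static_cast<uint8_t*>(Base);
  // Any access that crosses into the second page will now fault.
  if (mprotect(Bytes + PageSize, PageSize, PROT_NONE) != 0) {
    munmap(Base, PageSize * 2);
    return nullptr;
  }

  uint8_t* Slot = Bytes + PageSize - sizeof(Value);
  std::memcpy(Slot, &Value, sizeof(Value));
  return Slot;
}

// A 64-bit load stays inside the accessible page. A 128-bit load from the same
// address would touch the protected page and fault.
uint64_t LoadQword(const uint8_t* Ptr) {
  uint64_t Result;
  std::memcpy(&Result, Ptr, sizeof(Result));
  return Result;
}
```

Placing the data right against a protected page turns an over-wide load into an immediate fault, so the bug cannot pass silently.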
Ryan Houdek
fc0b233046
Merge pull request #3859 from neobrain/refactor_opdispatch_templates
...
OpcodeDispatcher: Replace hand-written wrapper templates with a generic utility
2024-07-11 18:18:23 -07:00
Mai
e25918d846
Merge pull request #3858 from Sonicadvance1/implement_nt_load
...
Implement support for SSE4.1/AVX NT loads
2024-07-11 14:22:41 -04:00
Alyssa Rosenzweig
3a334c4585
Reapply "IR: drop RCLSE"
...
This reverts commit 78aee4d96e39c9ef6415a7dca21fd6b81dabe12e.
2024-07-11 13:21:14 -04:00
Alyssa Rosenzweig
8dae4bcd44
OpcodeDispatcher: drop stale comment
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-11 13:21:14 -04:00
Alyssa Rosenzweig
294f10fdd0
OpcodeDispatcher: reg cache mmx
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-11 13:21:14 -04:00
Tony Wasserka
b9829ed316
OpcodeDispatcher: Replace even more hand-written wrapper templates
2024-07-11 16:19:15 +02:00
Tony Wasserka
4ccec17676
OpcodeDispatcher: Replace more hand-written wrapper templates
2024-07-11 16:19:15 +02:00
Tony Wasserka
f45082043b
OpcodeDispatcher: Replace hand-written wrapper templates with a generic utility
2024-07-11 16:19:14 +02:00
Tony Wasserka
3222f13dde
Fix comment formatting
2024-07-11 16:19:14 +02:00
Mai
b282620a48
Merge pull request #3857 from Sonicadvance1/sve_bitperm
...
Arm64: Implement support for SVE bitperm
2024-07-11 05:05:41 -04:00
Ryan Houdek
e24b01b6cb
Arm64: Implement support for SVE bitperm
2024-07-11 01:46:35 -07:00
Tony Wasserka
9a8694c2f3
Merge pull request #3853 from neobrain/refactor_warn_fixes
...
Fix all the warnings
2024-07-11 10:12:41 +02:00
Tony Wasserka
070a9148aa
Merge pull request #3852 from neobrain/refactor_opdispatch_codesize
...
OpcodeDispatcher: Avoid template monomorphization to reduce FEXLoader binary size
2024-07-11 09:58:49 +02:00
Tony Wasserka
f19fe3b6f3
Fix warning about an expression with side effects being passed to __builtin_assume
...
LOGMAN_THROW_AA_FMT has no benefit over LOGMAN_THROW_A_FMT here, so just use
the latter.
2024-07-11 09:54:31 +02:00
Tony Wasserka
8d2b15665d
Fix unused-variable warnings
2024-07-11 09:54:30 +02:00
Ryan Houdek
548fd9daf8
OpcodeDispatcher: Implement support for SSE4.1 NT load
2024-07-10 23:07:37 -07:00
Ryan Houdek
f831f5a0e1
AVX128: Implement support for NT Load
2024-07-10 23:07:14 -07:00
Ryan Houdek
4c21aa2604
Arm64: Implement support for NT Loads with ASIMD fallback
2024-07-10 23:06:46 -07:00
Ryan Houdek
3554d5c2f7
HostFeatures: Check for SVE bit permute extension
2024-07-10 21:45:07 -07:00
Tony Wasserka
56bb3744a5
AOTIR: Change std::unique_ptr to fextl::unique_ptr
2024-07-10 19:34:24 +02:00
Alyssa Rosenzweig
a4f8bbff02
OpcodeDispatcher: reg cache avx high
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-10 11:36:18 -04:00
Alyssa Rosenzweig
cf5ab05b90
OpcodeDispatcher: reg cache AbridgedFTW
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-10 11:36:18 -04:00
Alyssa Rosenzweig
3a2ce240f9
OpcodeDispatcher: reg cache DF
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-10 11:36:18 -04:00
Alyssa Rosenzweig
1f01dd53f7
OpcodeDispatcher: reg cache fprs
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-10 11:36:18 -04:00
Alyssa Rosenzweig
72d41d70b6
OpcodeDispatcher: introduce GPR-only reg cache
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-10 11:36:18 -04:00
Alyssa Rosenzweig
42b5b1f64c
Core: partially flush register cache per instruction
...
This will mitigate problems later.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-10 11:36:18 -04:00
Alyssa Rosenzweig
2949bc211d
OpcodeDispatcher: thunk through FlushRegisterCache
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-10 11:36:18 -04:00
Tony Wasserka
441187470e
OpcodeDispatcher: Avoid monomorphization of some AVX functions
2024-07-10 17:01:30 +02:00
Tony Wasserka
59fd13cc2f
OpcodeDispatcher: Avoid monomorphization of even more functions
2024-07-10 17:01:30 +02:00
Tony Wasserka
c9e7bfdf16
OpcodeDispatcher: Avoid monomorphization of more functions
2024-07-10 17:01:30 +02:00
Tony Wasserka
2d700c381e
OpcodeDispatcher: Avoid monomorphization of large functions
2024-07-10 17:01:30 +02:00
Ryan Houdek
72d6c8ebd6
Merge pull request #3820 from alyssarosenzweig/ir/drop-deferred
...
Drop deferred flag infrastructure
2024-07-09 17:06:25 -07:00
Mai
af6a0be832
Merge pull request #3842 from Sonicadvance1/fix_f64_to_i32
...
VCVT{T,}PD2DQ fixes and optimization
2024-07-09 03:49:31 -04:00
Ryan Houdek
b9c214e6e8
OpcodeDispatcher: Use new IR op for vcvt{t,}pd2dq
...
Also fixes a bug where it was failing to zero the upper bits of the
destination register in the AVX128 implementation, which the updated
unit tests now check.
Fixes a minor precision issue that was reported in #2995. We still don't
return correct values for overflow. x86 always returns maximum negative
int32_t on overflow, ARM will return maximum negative or positive
depending on sign of the double.
2024-07-09 00:38:47 -07:00
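The overflow difference described above can be modelled in scalar code. This is a sketch under stated assumptions, not FEX code: both function names are hypothetical, only the truncating conversion is modelled, and the NaN handling (x86 returns 0x80000000, ARM returns 0) comes from the architectures' documented behaviour rather than from this log.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Model of the x86 truncating conversion: NaN and any out-of-range input
// produce the most negative int32_t.
int32_t X86TruncF64ToI32(double Value) {
  if (std::isnan(Value) || Value >= 2147483648.0 || Value <= -2147483649.0) {
    return std::numeric_limits<int32_t>::min();
  }
  return static_cast<int32_t>(Value);
}

// Model of the ARM truncating conversion: out-of-range inputs saturate toward
// the sign of the input, and NaN converts to zero.
int32_t ArmTruncF64ToI32(double Value) {
  if (std::isnan(Value)) {
    return 0;
  }
  if (Value >= 2147483648.0) {
    return std::numeric_limits<int32_t>::max();
  }
  if (Value <= -2147483649.0) {
    return std::numeric_limits<int32_t>::min();
  }
  return static_cast<int32_t>(Value);
}
```

The two models agree for in-range inputs and for negative overflow. They differ for positive overflow, which is the case the commit says is still not handled correctly.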
Ryan Houdek
d3d76aa8ce
IR: Adds new F64 -> I32 operation that changes behaviour depending on SVE
...
SVE added the ability to do F64 -> I32 conversions directly without an
fcvtn in between. So make sure to support them.
2024-07-09 00:38:47 -07:00
Ryan Houdek
3bea08da5f
Merge pull request #3843 from Sonicadvance1/remove_half_moves_fma3
...
Arm64: Remove one move if possible in FMA operations
2024-07-09 00:25:07 -07:00
Ryan Houdek
b3a7a973a1
AVX128: Extends 32-bit indexes path for 128-bit operations
...
The codepath from #3826 was only targeting 256-bit sized operations.
This missed the vpgatherdq/vgatherdpd 128-bit operations. By extending
the codepath to understand 128-bit operations, we now hit these
instruction variants.
With this PR, we now have SVE128 codepaths that handle ALL variants of
x86 gather instructions! There are zero ASIMD fallbacks used in this
case!
Of course depending on the instruction, the performance still leaves a
lot to be desired, and there is no way to emulate x86 TSO behaviour
without an ASIMD fallback, which we will likely need to add as a
fallback at some point.
Based on #3836 until that is merged.
2024-07-08 18:44:07 -07:00