Ryan Houdek
6ec628fa31
Merge pull request #3433 from bylaws/arm64ec-pt1
...
Arm64Emitter: Introduce ARM64EC SRA mappings
2024-02-23 14:48:43 -08:00
Alyssa Rosenzweig
5378ae2e76
Merge pull request #3436 from alyssarosenzweig/ir/af-simplify
...
Simplify CalculateAF
2024-02-22 08:17:07 -04:00
Ryan Houdek
d4be2dc636
Merge pull request #3434 from bylaws/arm64ec-pt3
...
FEXCore: Expose AbsoluteLoopTopAddress to the frontend
2024-02-21 14:31:04 -08:00
Alyssa Rosenzweig
2bcd285851
Merge pull request #3430 from Sonicadvance1/tsc_scale
...
Implement small TSC scaling
2024-02-21 13:16:27 -04:00
Alyssa Rosenzweig
8762bc1fa3
OpcodeDispatcher: simplify CalculateAF signature
...
- Res is unused
- SrcSize doesn't matter: we ignore the high bits, so we might as well always
use 32-bit
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-21 12:48:15 -04:00
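The size argument in the commit above can be illustrated with the textbook x86 definition of the auxiliary carry flag for an addition: AF is the carry out of bit 3, which equals bit 4 of `A ^ B ^ (A + B)`. The sketch below is an assumption-laden illustration, not FEX's `CalculateAF`; the helper name `AuxCarryAdd` is hypothetical. It only shows that the high operand bits never influence the result, which is why a fixed 32-bit operation is sufficient.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not FEX code: textbook AF for an x86 ADD.
// AF is the carry out of bit 3, i.e. bit 4 of (A ^ B ^ (A + B)).
inline bool AuxCarryAdd(uint32_t A, uint32_t B) {
  const uint32_t Result = A + B; // unsigned wraparound is well defined
  return ((A ^ B ^ Result) >> 4) & 1u;
}

// Same computation after truncating the operands to 8 bits first.
// Bit 4 depends only on the low five bits of A and B, so this must
// agree with the 32-bit version for every input.
inline bool AuxCarryAdd8(uint32_t A, uint32_t B) {
  return AuxCarryAdd(A & 0xFFu, B & 0xFFu);
}

// Sanity check that both widths agree for one operand pair.
inline void CheckWidthIndependence(uint32_t A, uint32_t B) {
  assert(AuxCarryAdd(A, B) == AuxCarryAdd8(A, B));
}
```

Because the two versions agree for every input, an emitter can pick a single operation width for the flag calculation regardless of the guest operand size.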
Billy Laws
5b4162b712
FEXCore: Expose AbsoluteLoopTopAddress to the frontend
...
ARM64EC has a shared SRA mapping between ARM64 and X64 code, so there
needs to be a public way to enter the dispatcher without refilling SRA
from the in-memory context struct.
2024-02-21 11:46:24 +00:00
Billy Laws
cb5c07f4b1
Arm64Emitter: Introduce ARM64EC SRA mappings
...
See https://learn.microsoft.com/en-us/cpp/build/arm64ec-windows-abi-conventions?view=msvc-170
Note that since mm registers are volatile, there is no need to match the
mapping for them when in JIT, so they can be used as scratch regs.
Disallowed regs are also wiped on context switches, so they cannot be
taken advantage of to e.g. avoid spilling.
2024-02-21 11:18:10 +00:00
Ryan Houdek
b902b8edab
Implement small TSC scaling
...
Game engines expect >1GHz cycle counters. Scale them to work
around the issue.
Resolves the excessive busy waiting in Unreal Engine 5 games.
2024-02-20 12:05:44 -08:00
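One way to picture the scaling described in the commit above is a power-of-two multiplier applied to both the counter value and the frequency reported to the guest. This is a minimal sketch under stated assumptions: the commit does not say how the scale factor is chosen or applied, and the names `ComputeScaleShift`, `ScaleCounter`, and `ScaleFrequency` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the goal is a guest-visible counter frequency of at least 1 GHz.
constexpr uint64_t TargetFrequencyHz = 1000000000ULL;

// Smallest left shift that brings the host counter frequency to >= 1 GHz.
inline unsigned ComputeScaleShift(uint64_t HostFrequencyHz) {
  assert(HostFrequencyHz != 0);
  unsigned Shift = 0;
  while ((HostFrequencyHz << Shift) < TargetFrequencyHz) {
    ++Shift;
  }
  return Shift;
}

// The raw counter and the reported frequency must be scaled by the same
// factor, otherwise elapsed-time calculations in the guest would be wrong.
inline uint64_t ScaleCounter(uint64_t RawCounter, unsigned Shift) {
  return RawCounter << Shift;
}

inline uint64_t ScaleFrequency(uint64_t HostFrequencyHz, unsigned Shift) {
  return HostFrequencyHz << Shift;
}
```

With a 24 MHz host counter this sketch picks a shift of 6, giving a reported frequency of 1.536 GHz. Shifting adds no real resolution; the low bits of the scaled counter are always zero.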
Alyssa Rosenzweig
0503c89ff6
OpcodeDispatcher: use NZCV update helpers
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-19 14:12:54 -04:00
Alyssa Rosenzweig
6dd410698a
OpcodeDispatcher: add helpers for updating NZCV metadata
...
to reduce error-prone copypaste
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-19 14:12:54 -04:00
Ryan Houdek
808ced455d
FEXCore: Add a frontend pointer to InternalThreadState
...
FEXCore is guaranteed not to touch this pointer, so frontends can use it
to store thread-specific data.
2024-02-15 02:06:16 -08:00
Ryan Houdek
9cab746aa7
Merge pull request #3407 from neobrain/feature_libfwd_arguments_on_guest_stack
...
Library Forwarding: Allocate packed arguments on the guest stack if needed
2024-02-12 16:31:34 -08:00
Alyssa Rosenzweig
68232366e4
OpcodeDispatcher: don't mask add/sub sources
...
not needed in the new approach
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-12 12:36:28 -04:00
Alyssa Rosenzweig
d7ff1b78fb
IR: handle 8/16-bit AddNZCV/SubNZCV
...
we can do it more effectively than the current s/w lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-12 12:36:09 -04:00
Mai
780b48620b
Merge pull request #3420 from Sonicadvance1/preserve_all_3419
...
Fix #3419
2024-02-10 23:24:38 -05:00
Ryan Houdek
4a0878fa92
Fix #3419
2024-02-10 19:55:51 -08:00
Ryan Houdek
df3d6938ae
Merge pull request #3410 from alyssarosenzweig/opt/nzcv-pass-2
...
Add NZCV+PF/AF optimization pass
2024-02-10 05:03:12 -08:00
Ryan Houdek
ba41da7da0
Merge pull request #3414 from Sonicadvance1/fix_one_mutex_hang
...
Fixes one mutex hang
2024-02-09 05:54:40 -08:00
Ryan Houdek
2480bab409
Fixes one mutex hang
...
When code invalidation is happening we currently have the issue that a
thread can acquire the code invalidation mutex in the middle of
invalidation. This is due to us acquiring and releasing the mutex
between each thread's code invalidation.
We need to hold the mutex for the entire duration of all threads' code
invalidation.
This fixes a rare hang on proton startup and resolves a consistent hang
on Proton application shutdown.
This now puts us on par with FEX-2312.1 with respect to hanging.
This does not fix a relatively rare hang on fork (which also existed with FEX-2312.1).
This also does not fix the issue that the intersection of our mutexes
between frontend and backend is very convoluted. Part of the work
that is going to fix the rare fork mutex hang will change more of this.
2024-02-08 18:18:00 -08:00
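The locking change described in the commit above can be sketched as moving a lock guard from inside the per-thread loop to outside it. The types and names below (`ThreadCodeCache`, `InvalidationManager`, and its methods) are hypothetical stand-ins, not FEX's classes; only the scope of the guard is the point.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for one thread's translated-code cache.
struct ThreadCodeCache {
  bool Valid = true;
};

// Hypothetical manager that only illustrates the locking pattern.
class InvalidationManager {
public:
  explicit InvalidationManager(std::size_t ThreadCount) : Threads(ThreadCount) {
    assert(ThreadCount > 0);
  }

  // Problematic shape: the mutex is released between threads, so another
  // thread can acquire it while the invalidation is only partially done.
  void InvalidateAllReleasingBetweenThreads() {
    for (auto &Thread : Threads) {
      std::lock_guard<std::mutex> Guard(InvalidationMutex);
      Thread.Valid = false;
    }
  }

  // Fixed shape: a single guard spans the whole loop, so no other thread
  // can observe a half-finished invalidation.
  void InvalidateAllHoldingMutex() {
    std::lock_guard<std::mutex> Guard(InvalidationMutex);
    for (auto &Thread : Threads) {
      Thread.Valid = false;
    }
  }

  // Counts caches that are still valid, under the same mutex.
  std::size_t CountValid() {
    std::lock_guard<std::mutex> Guard(InvalidationMutex);
    std::size_t Count = 0;
    for (const auto &Thread : Threads) {
      Count += Thread.Valid ? 1 : 0;
    }
    return Count;
  }

private:
  std::mutex InvalidationMutex;
  std::vector<ThreadCodeCache> Threads;
};
```

Both variants leave the same final state when run alone; they differ only in what a concurrent thread can observe while the loop is in progress.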
Alyssa Rosenzweig
ad7202e7d7
OpcodeDispatcher: optimize test -1
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-08 14:10:13 -04:00
Alyssa Rosenzweig
175a57dd27
OpcodeDispatcher: emit AndWithFlags directly for primary alu
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 14:22:28 -04:00
Alyssa Rosenzweig
e2ce60148c
OpcodeDispatcher: emit AndWithFlags directly for 2ndary alu
...
rely on opt pass to drop the flags.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 14:22:28 -04:00
Alyssa Rosenzweig
99660129f3
IR: implement AndWithFlags for 8/16-bit
...
easier to deal with in the JIT
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 14:22:28 -04:00
Alyssa Rosenzweig
308d9a751c
RedundantFlagCalculationElimination: optimize rmif
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig
4bd28c0ed8
RedundantFlagCalculationElimination: optimize condaddnzcv
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig
8397f3ac99
RedundantFlagCalculationElimination: refine AXFLAG
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig
0452bc7212
RedundantFlagCalculationElimination: optimize condjump
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig
3d7ed89ffb
RedundantFlagCalculationElimination: optimize NZCVSelect
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:25 -04:00
Alyssa Rosenzweig
23ab0a978e
RedundantFlagCalculationElimination: also handle InvalidateFlags
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:25 -04:00
Alyssa Rosenzweig
7f47a9ef0e
IR: add local dead flag elimination pass
...
RCLSE ignores NZCV, and it doesn't optimize stores, so it doesn't help us with
PF/AF either. So, we add a new pass for dead flag elimination (cannibalizing the old
and broken dead flag elimination pass). This is a simple local optimizer that
walks each block backwards, converging in linear time & constant space in a
single iteration.
Right now, it doesn't do a ton (other than a nice reduction in silliness in
the hot Sonic block), but it provides the framework to fuse comparisons.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:25 -04:00
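The backward walk described in the commit above can be sketched on a toy instruction model. This is an illustration of the general technique (local backward liveness over a flag bitmask), not FEX's IR or pass; `Instr`, the `Flag` enum, and `EliminateDeadFlags` are hypothetical names, and the model assumes every flag is live at the end of a block.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical flag bitmask: NZCV plus the x86-only PF and AF.
enum Flag : uint8_t {
  FlagN = 1, FlagZ = 2, FlagC = 4, FlagV = 8,
  FlagPF = 16, FlagAF = 32,
  FlagNZCV = 15, FlagAll = 63,
};

// Hypothetical instruction model: which flags it reads and writes, and
// whether producing flags is its only effect.
struct Instr {
  uint8_t Reads;
  uint8_t Writes;
  bool FlagsOnly;
  bool Dead = false;
};

// Single backward walk over one block. The only state is one byte of
// liveness, so the pass is linear in time and constant in space.
inline void EliminateDeadFlags(std::vector<Instr> &Block) {
  uint8_t Live = FlagAll; // conservative: successors may read any flag
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    assert((It->Reads | It->Writes) <= FlagAll);
    if (It->FlagsOnly && It->Writes != 0 && (It->Writes & Live) == 0) {
      It->Dead = true; // every flag it writes is overwritten before use
      continue;        // a removed instruction contributes no reads
    }
    // Flags written here are dead above this point; flags read are live.
    Live = static_cast<uint8_t>((Live & ~It->Writes) | It->Reads);
  }
}
```

For a block with two back-to-back flag-only compares followed by a conditional jump, the walk marks the first compare dead because the second one overwrites all of its flags before anything reads them.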
Alyssa Rosenzweig
4331753ca0
Merge pull request #3408 from alyssarosenzweig/opt/tst
...
Optimize TST
2024-02-06 11:28:02 -04:00
Paulo Matos
fa8bcfd67a
Clean up access to possible nullptr
...
Patch suggested by @Sonicadvance1
2024-02-06 12:31:38 +00:00
Tony Wasserka
a1343e9296
Revert "Add cmake option DISABLE_CLANG_PRESERVE_ALL"
2024-02-05 22:31:45 +01:00
Alyssa Rosenzweig
235f32ce8c
Merge pull request #3401 from Sonicadvance1/runtime_preserve_all
...
HostFeatures: Supports runtime disabling of preserve_all
2024-02-05 15:34:46 -04:00
Alyssa Rosenzweig
2e0cb2fbd4
OpcodeDispatcher: optimize TST
...
it's just an AndWithFlags setting the PF.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-05 15:32:21 -04:00
Alyssa Rosenzweig
4790a7ba79
IR: add AndWithFlags
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-05 15:32:21 -04:00
Tony Wasserka
df3e51fc8c
Library Forwarding: Allocate packed arguments on the guest stack if needed
...
This is required for host-side calls to guest functions on 32-bit guests.
Since the host stack is allocated before FEX blocks memory inaccessible to
the guest, the guest would otherwise fail to read the packed argument data.
2024-02-05 18:10:34 +01:00
Ryan Houdek
0139498072
SpinWaitLock: Removes unused variable in spin-loop fallback
...
Tmp was no longer being used; forgot to remove it.
2024-02-05 07:22:52 -08:00
Ryan Houdek
472a701e2b
Merge pull request #3403 from Sonicadvance1/fix_spinlock_contended_lock
...
SpinLockWait: Fixes unexpected lock success
2024-02-05 06:51:42 -08:00
Ryan Houdek
cce6011205
SpinLockWait: Fixes unexpected lock success
...
With a contended unique lock, we forgot to reset the `Expected` value to
zero. This was causing a contended mutex to incorrectly succeed.
Noticed this when converting some pthread mutexes over to spinloops to
remove strace noise.
The reference wfe_mutex library I wrote didn't have this problem since
the implementation is slightly different.
2024-02-03 01:10:57 -08:00
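The class of bug described in the commit above can be reproduced with a plain compare-and-swap loop. This sketch assumes a `std::atomic`-based lock where 0 means unlocked and 1 means locked; the actual FEX implementation is not shown in the log and may differ, and the names `SpinLock`, `TryLockStaleExpected`, and `TryLockResetExpected` are hypothetical. A failed `compare_exchange_strong` overwrites `Expected` with the observed value, so a loop that does not reset it will compare against the locked value on the next attempt and appear to succeed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical minimal lock word: 0 = unlocked, 1 = locked.
struct SpinLock {
  std::atomic<uint32_t> State{0};
};

// Buggy shape: Expected is initialized once. After the first failed CAS it
// holds 1 (the locked value), so the second CAS compares 1 against 1 and
// "acquires" a lock that another owner still holds.
inline bool TryLockStaleExpected(SpinLock &Lock, int Attempts) {
  assert(Attempts > 0);
  uint32_t Expected = 0;
  for (int i = 0; i < Attempts; ++i) {
    if (Lock.State.compare_exchange_strong(Expected, 1, std::memory_order_acquire)) {
      return true;
    }
  }
  return false;
}

// Fixed shape: Expected is reset to zero before every attempt, so the CAS
// can only succeed when the lock is actually free.
inline bool TryLockResetExpected(SpinLock &Lock, int Attempts) {
  assert(Attempts > 0);
  for (int i = 0; i < Attempts; ++i) {
    uint32_t Expected = 0;
    if (Lock.State.compare_exchange_strong(Expected, 1, std::memory_order_acquire)) {
      return true;
    }
  }
  return false;
}
```

With the lock word already set to 1, two attempts of the stale variant report success, while the reset variant correctly keeps failing until the word returns to 0.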
Ryan Houdek
c437129ed8
Revert "Revert "FEXLoader: Moves thread management to the frontend""
...
This reverts commit 5358af7794d9568398f7b84fe09b4c8198448f2c.
2024-02-03 00:57:36 -08:00
Alyssa Rosenzweig
8d3f0b6f02
OpcodeDispatcher: reassociate and sink W in sha1
...
We only need each part of W extracted in the corresponding round, so sink the
extract into the round to reduce pressure.
Further, W and E are added and then never used again. So, by reassociating we
can do the add upfront, killing W and E at the start and further reducing
pressure.
Eliminates spilling in sha1rnds4.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
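The reassociation argument in the commit above relies on 32-bit modular addition being associative and commutative, so the order of the adds can be changed freely. The sketch below uses a generic SHA-1 style round sum as an assumption; it is not FEX's `sha1rnds4` lowering, and `Rol`, `RoundLateAdd`, and `RoundEarlyAdd` are hypothetical names. `F` stands for the already-computed round function value and `K` for the round constant.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by N bits, valid for 0 < N < 32.
inline uint32_t Rol(uint32_t X, unsigned N) {
  assert(N > 0 && N < 32);
  return (X << N) | (X >> (32 - N));
}

// Original association: E and W are consumed last, so both values stay
// live while the rest of the sum is computed.
inline uint32_t RoundLateAdd(uint32_t A, uint32_t F, uint32_t E, uint32_t W, uint32_t K) {
  return ((Rol(A, 5) + F) + K) + E + W;
}

// Reassociated: E + W is formed up front. After that single add neither
// input is needed again, which frees two registers for the rest of the round.
inline uint32_t RoundEarlyAdd(uint32_t A, uint32_t F, uint32_t E, uint32_t W, uint32_t K) {
  const uint32_t EW = E + W;
  return Rol(A, 5) + F + K + EW;
}
```

Both functions return the same value for all inputs; the difference is only in how long `E` and `W` must be kept around, which is what affects register pressure.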
Alyssa Rosenzweig
60f7b9bcc4
OpcodeDispatcher: optimize sha1's 2/3 expr
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
a487557173
OpcodeDispatcher: extract BitwiseAtLeastTwo
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
394b4888bb
OpcodeDispatcher: reassociate and remat C0, G0
...
costs 2 moves and eliminates the rest of our spilling
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
142cbdd852
OpcodeDispatcher: expand, reassociate, and interleave sha256 calc
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
2f9102f78d
OpcodeDispatcher: expand & interleave sha256 calc
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
c9824d04cb
OpcodeDispatcher: sink sha256 extracts
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
9c2a569539
OpcodeDispatcher: reexpress Major in sha256
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
515aa4ce3e
OpcodeDispatcher: fuse eor+ror in sha256
...
This reduces instructions a ton.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00