953 Commits

Author SHA1 Message Date
Alyssa Rosenzweig
df5bdefb8a OpcodeDispatcher: merge secondary ALU with primary ALU
It's the same, stop copypasting. This gets our flag and arithmetic opts (current
and future) applied to secondary ALU too.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
3d1fb7701c OpcodeDispatcher: optimize sub
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
140976d322 OpcodeDispatcher: prep primary ALU for better flags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
9a11d3b1a2 OpcodeDispatcher: fuse NEG
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
96e652879f OpcodeDispatcher: fuse DEC
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
cc1c1dd047 OpcodeDispatcher: return result from SUB flag calculate
for fusion

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
c1d572951f OpcodeDispatcher: drop unused GenerateFlags_SUB arg
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
dd9d3264dd OpcodeDispatcher: smarten SUB flag generation
we don't need the result, we can use subs and come out ahead in practice. also a
step towards better fusion

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
2aaf957ad8 RedundantFlagCalculationElimination: DCE as we go
This is required to ensure single-iteration convergence with a sequence like:

  write C
  whatever = load C
  rmif C, whatever
  invalidate C

avoids regressing the "DEC dead" case with future work in the series.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
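The sequence in the commit above can be modeled with a toy backward walk. This is a hedged sketch, not FEX's pass: the tuple-based IR, the op names as strings, and the `eliminate` helper are all hypothetical, and the reading of "DCE as we go" (drop an unused load before it counts as a flag read) is an assumption drawn from the commit message.

```python
def eliminate(block, dce_as_we_go):
    """Walk a block backwards once, dropping dead flag writes.

    Each instruction is (dest, op, flag, sources). Toy IR, not FEX's.
    """
    # Conservatively assume every flag is live at block exit.
    flag_live = {flag for _, _, flag, _ in block}
    value_used = set()  # values read by instructions we decided to keep
    kept = []
    for dest, op, flag, srcs in reversed(block):
        if op == "invalidate":
            flag_live.discard(flag)  # flag is garbage from here upwards
        elif op in ("write", "rmif"):
            if flag not in flag_live:
                continue  # dead flag write: drop it, its sources stay unused
            flag_live.discard(flag)
        elif op == "load":
            if dce_as_we_go and dest not in value_used:
                continue  # unused result: drop before it counts as a flag read
            flag_live.add(flag)
        value_used.update(srcs)
        kept.append((dest, op, flag, srcs))
    kept.reverse()
    return kept


block = [
    (None, "write", "C", ["x"]),
    ("whatever", "load", "C", []),
    (None, "rmif", "C", ["whatever"]),
    (None, "invalidate", "C", []),
]
one_pass = eliminate(block, dce_as_we_go=True)
needs_more = eliminate(block, dce_as_we_go=False)
```

With `dce_as_we_go=True` only the `invalidate` survives a single walk. Without it, the stale load keeps `write C` alive, so a separate DCE run and a second walk would be needed.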
Alyssa Rosenzweig
d459b2f9b5 IR: propagate 0 into sub
now that we have to handle it, we may as well take advantage of it.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
22ab7f2b3e IR: add SubWithFlags op (arm64 subs)
with 8/16-bit handling to keep everything uniform.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
f7e32373ce JIT: allow #0 in sub
turns into neg, this will be generated via SubWithFlags -> Sub opts.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
25d422e92b JIT: use GetZeroableReg for NZCVSelect
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
cfbeece09f JIT: use GetZeroableReg for CondAddNZCV
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
a597a09825 JIT: use GetZeroableReg for SubNZCV
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
99854ff310 JIT: add GetZeroableReg helper
for inlining constant zeroes in applicable sources

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Ryan Houdek
0a64f8a9c5
Moves SignalDelegator TLS tracking to the frontend
FEXCore doesn't need to track the TLS state of the SignalDelegator; this is
a frontend concept.

Removes the tracking from the backend and keeps it in the frontend.
2024-02-24 01:07:29 -08:00
Ryan Houdek
59ec88f48d
Merge pull request #3438 from Sonicadvance1/move_tls_allocation
Moves JITSymbol allocation
2024-02-23 14:49:04 -08:00
Ryan Houdek
6ec628fa31
Merge pull request #3433 from bylaws/arm64ec-pt1
Arm64Emitter: Introduce ARM64EC SRA mappings
2024-02-23 14:48:43 -08:00
Alyssa Rosenzweig
5378ae2e76
Merge pull request #3436 from alyssarosenzweig/ir/af-simplify
Simplify CalculateAF
2024-02-22 08:17:07 -04:00
Ryan Houdek
bd4a81a2a1
Moves JITSymbol allocation
This isn't actually using TLS allocations. Instead it is an allocation
tied to the InternalThreadState object.
2024-02-21 17:57:34 -08:00
Ryan Houdek
d4be2dc636
Merge pull request #3434 from bylaws/arm64ec-pt3
FEXCore: Expose AbsoluteLoopTopAddress to the frontend
2024-02-21 14:31:04 -08:00
Alyssa Rosenzweig
2bcd285851
Merge pull request #3430 from Sonicadvance1/tsc_scale
Implement small TSC scaling
2024-02-21 13:16:27 -04:00
Alyssa Rosenzweig
8762bc1fa3 OpcodeDispatcher: simplify CalculateAF signature
- Res is unused
- SrcSize doesn't matter since we ignore the high bits, so we might as well
  always use 32-bit

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-21 12:48:15 -04:00
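Why the source size is irrelevant can be checked with the textbook x86 definition of AF for an add (carry out of bit 3, which is bit 4 of `a ^ b ^ result`). This is a sketch of that general definition only; `adjust_flag_add` is a hypothetical helper and does not reflect how FEX's CalculateAF is written.

```python
def adjust_flag_add(a, b, width_bits):
    """Textbook AF for a + b computed at the given operand width."""
    mask = (1 << width_bits) - 1
    result = (a + b) & mask
    # Bit 4 of a ^ b ^ result is the carry into bit 4, i.e. out of bit 3.
    return ((a ^ b ^ result) >> 4) & 1


# Only the low bits matter, so 8-bit and 32-bit arithmetic agree.
same = all(
    adjust_flag_add(a, b, 8) == adjust_flag_add(a, b, 32)
    for a in range(256)
    for b in range(256)
)
```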
Billy Laws
5b4162b712 FEXCore: Expose AbsoluteLoopTopAddress to the frontend
ARM64EC has a shared SRA mapping between ARM64 and X64 code, so there
needs to be a public way to enter the dispatcher without refilling SRA
from the in-memory context struct.
2024-02-21 11:46:24 +00:00
Billy Laws
cb5c07f4b1 Arm64Emitter: Introduce ARM64EC SRA mappings
See https://learn.microsoft.com/en-us/cpp/build/arm64ec-windows-abi-conventions?view=msvc-170
note that since mm registers are volatile there is no need to match the
mapping for them when in JIT, so they can be used as scratch regs.
Disallowed regs are also wiped on context switches, so they cannot be
taken advantage of to e.g. avoid spilling.
2024-02-21 11:18:10 +00:00
Ryan Houdek
b902b8edab
Implement small TSC scaling
Game engines expect >1GHz cycle counters. Scale them to work
around the issue.

Resolves the excessive busy waiting in Unreal Engine 5 games.
2024-02-20 12:05:44 -08:00
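The commit does not say how the scale factor is chosen. One simple possibility, shown here purely as an assumption, is a power-of-two multiplier picked so the reported frequency reaches the threshold. `tsc_scale_shift`, `scaled_tsc`, and the 24 MHz input are hypothetical.

```python
def tsc_scale_shift(counter_hz, minimum_hz=1_000_000_000):
    """Smallest left shift that lifts counter_hz to at least minimum_hz."""
    shift = 0
    while (counter_hz << shift) < minimum_hz:
        shift += 1
    return shift


def scaled_tsc(raw_cycles, shift):
    """Scale a raw counter reading by the same power of two."""
    return raw_cycles << shift


shift = tsc_scale_shift(24_000_000)      # hypothetical 24 MHz host counter
reported_hz = 24_000_000 << shift        # frequency the guest would observe
```

A shift keeps the scaled counter monotonic and cheap to compute, at the cost of a coarser tick as seen by the guest.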
Alyssa Rosenzweig
0503c89ff6 OpcodeDispatcher: use NZCV update helpers
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-19 14:12:54 -04:00
Alyssa Rosenzweig
6dd410698a OpcodeDispatcher: add helpers for updating NZCV metadata
to reduce error-prone copypaste

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-19 14:12:54 -04:00
Ryan Houdek
808ced455d
FEXCore: Add a frontend pointer to InternalThreadState
FEXCore is guaranteed to not touch this pointer, so frontends can use it
to store thread-specific data.
2024-02-15 02:06:16 -08:00
Ryan Houdek
9cab746aa7
Merge pull request #3407 from neobrain/feature_libfwd_arguments_on_guest_stack
Library Forwarding: Allocate packed arguments on the guest stack if needed
2024-02-12 16:31:34 -08:00
Alyssa Rosenzweig
68232366e4 OpcodeDispatcher: don't mask add/sub sources
not needed in the new approach

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-12 12:36:28 -04:00
Alyssa Rosenzweig
d7ff1b78fb IR: handle 8/16-bit AddNZCV/SubNZCV
we can do it more effectively than the current s/w lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-12 12:36:09 -04:00
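One well-known way to get correct 8/16-bit flags out of a 32-bit flag-setting subtract is to shift both operands into the top bits first. The commit does not state which lowering it uses, so treat this as an illustrative sketch; `subs32` and `subs_narrow` are hypothetical helpers that model arm64 NZCV conventions (C set means no borrow).

```python
def subs32(a, b):
    """NZCV of a 32-bit subtract using arm64 conventions."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a - b) & 0xFFFFFFFF
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                          # carry set when there is no borrow
    v = ((a ^ b) & (a ^ result)) >> 31       # signed overflow
    return n, z, c, v


def subs_narrow(a, b, width_bits):
    """NZCV of an 8- or 16-bit subtract via the shifted 32-bit form."""
    mask = (1 << width_bits) - 1
    shift = 32 - width_bits
    return subs32((a & mask) << shift, (b & mask) << shift)


flags = subs_narrow(0x80, 0x01, 8)   # -128 - 1 overflows in 8 bits
```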
Mai
780b48620b
Merge pull request #3420 from Sonicadvance1/preserve_all_3419
Fix #3419
2024-02-10 23:24:38 -05:00
Ryan Houdek
4a0878fa92 Fix #3419
2024-02-10 19:55:51 -08:00
Ryan Houdek
df3d6938ae
Merge pull request #3410 from alyssarosenzweig/opt/nzcv-pass-2
Add NZCV+PF/AF optimization pass
2024-02-10 05:03:12 -08:00
Ryan Houdek
ba41da7da0
Merge pull request #3414 from Sonicadvance1/fix_one_mutex_hang
Fixes one mutex hang
2024-02-09 05:54:40 -08:00
Ryan Houdek
2480bab409 Fixes one mutex hang
When code invalidation is happening we currently have the issue that a
thread can acquire the code invalidation mutex in the middle of
invalidation. This is due to us acquiring and releasing the mutex
between each thread's code invalidation.

We need to hold the mutex for the entire duration of all threads' code
invalidation.
This fixes a rare hang on proton startup and resolves a consistent hang
on Proton application shutdown.

This now puts us on par with FEX-2312.1 with respect to hangs.

This does not fix a relatively rare hang on fork (which also existed with FEX-2312.1).

This also does not fix the issue that the interaction of our mutexes
between frontend and backend is very convoluted. Part of the work
that is going to fix the rare fork mutex hang will change more of this.
2024-02-08 18:18:00 -08:00
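The locking change described in the commit above can be sketched as follows. This is a minimal model, not FEX code: `CountingLock`, both function names, and the `invalidate` callback are hypothetical, and the counter exists only to make the difference observable.

```python
import threading


class CountingLock:
    """threading.Lock wrapper that counts acquisitions (demo only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


def invalidate_per_thread_locking(lock, threads, invalidate):
    """Old shape: the lock is free between threads, so others can sneak in."""
    for thread in threads:
        with lock:
            invalidate(thread)


def invalidate_holding_lock(lock, threads, invalidate):
    """New shape: one acquisition covers every thread's invalidation."""
    with lock:
        for thread in threads:
            invalidate(thread)


old_lock, new_lock = CountingLock(), CountingLock()
done_old, done_new = [], []
invalidate_per_thread_locking(old_lock, ["t0", "t1", "t2"], done_old.append)
invalidate_holding_lock(new_lock, ["t0", "t1", "t2"], done_new.append)
```

Each release in the old shape is a window in which another thread can take the mutex mid-invalidation. The new shape has no such window.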
Alyssa Rosenzweig
ad7202e7d7 OpcodeDispatcher: optimize test -1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-08 14:10:13 -04:00
Alyssa Rosenzweig
175a57dd27 OpcodeDispatcher: emit AndWithFlags directly for primary alu
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 14:22:28 -04:00
Alyssa Rosenzweig
e2ce60148c OpcodeDispatcher: emit AndWithFlags directly for 2ndary alu
rely on opt pass to drop the flags.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 14:22:28 -04:00
Alyssa Rosenzweig
99660129f3 IR: implement AndWithFlags for 8/16-bit
easier to deal with in the JIT

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 14:22:28 -04:00
Alyssa Rosenzweig
308d9a751c RedundantFlagCalculationElimination: optimize rmif
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig
4bd28c0ed8 RedundantFlagCalculationElimination: optimize condaddnzcv
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig
8397f3ac99 RedundantFlagCalculationElimination: refine AXFLAG
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig
0452bc7212 RedundantFlagCalculationElimination: optimize condjump
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig
3d7ed89ffb RedundantFlagCalculationElimination: optimize NZCVSelect
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:25 -04:00
Alyssa Rosenzweig
23ab0a978e RedundantFlagCalculationElimination: also handle InvalidateFlags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:25 -04:00
Alyssa Rosenzweig
7f47a9ef0e IR: add local dead flag elimination pass
RCLSE ignores NZCV and doesn't optimize stores, which doesn't help us with PF/AF
either. So, we add a new pass for dead flag elimination (cannibalizing the old
and broken dead flag elimination pass). This is a simple local optimizer that
walks each block backwards, converging in linear time & constant space in a
single iteration.

Right now, it doesn't do a ton (other than a nice reduction in silliness in
the hot Sonic block), but it provides the framework to fuse comparisons.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:25 -04:00
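A local backward walk of the kind described in the commit above can be sketched with a single liveness bitmask. This is an illustration under assumptions, not the FEX pass: the `(name, reads, writes)` tuples, the bit assignments, and `dead_flag_writes` are hypothetical, and every op with a nonzero write mask is assumed to do nothing except write flags.

```python
N, Z, C, V = 8, 4, 2, 1
ALL = N | Z | C | V


def dead_flag_writes(block, live_out=ALL):
    """Return indices of flag-only ops whose writes are never read.

    The liveness state is one integer, and the block is visited once.
    """
    live = live_out
    dead = []
    for index in range(len(block) - 1, -1, -1):
        _name, reads, writes = block[index]
        if writes and not (writes & live):
            dead.append(index)
            continue  # a removed op contributes no reads
        # Flags written here are dead above this op; flags read are live.
        live = (live & ~writes) | reads
    dead.reverse()
    return dead


block = [
    ("cmp a, b", 0, ALL),    # overwritten by the next cmp before any read
    ("cmp c, d", 0, ALL),
    ("jump if Z", Z, 0),
]
dead = dead_flag_writes(block)
```

Because liveness only flows upwards within one block, a single reverse pass is enough and nothing has to be iterated to a fixed point.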
Alyssa Rosenzweig
4331753ca0
Merge pull request #3408 from alyssarosenzweig/opt/tst
Optimize TST
2024-02-06 11:28:02 -04:00