Commit Graph

1082 Commits

Author SHA1 Message Date
Ryan Houdek
d4be2dc636
Merge pull request #3434 from bylaws/arm64ec-pt3
FEXCore: Expose AbsoluteLoopTopAddress to the frontend
2024-02-21 14:31:04 -08:00
Alyssa Rosenzweig
2bcd285851
Merge pull request #3430 from Sonicadvance1/tsc_scale
Implement small TSC scaling
2024-02-21 13:16:27 -04:00
Alyssa Rosenzweig
8762bc1fa3 OpcodeDispatcher: simplify CalculateAF signature
- Res is unused
- SrcSize doesn't matter since we ignore the high bits, might as well always use
  32-bit, it doesn't matter

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-21 12:48:15 -04:00
Billy Laws
5b4162b712 FEXCore: Expose AbsoluteLoopTopAddress to the frontend
ARM64EC has a shared SRA mapping between ARM64 and X64 code, so there
needs to be a public way to enter the dispatcher without refilling SRA
from the in-memory context struct.
2024-02-21 11:46:24 +00:00
Billy Laws
cb5c07f4b1 Arm64Emitter: Introduce ARM64EC SRA mappings
See https://learn.microsoft.com/en-us/cpp/build/arm64ec-windows-abi-conventions?view=msvc-170
note that since mm registers are volatile there is no need to match the
mapping for them when in JIT, so they can be used as scratch regs.
Disallowed regs are also wiped on context switches, so they cannot be
taken advantage of to e.g. avoid spilling.
2024-02-21 11:18:10 +00:00
Ryan Houdek
b902b8edab
Implement small TSC scaling
Game engines expect >1GHz cycle counters. Scale them to work
around the issue.

Resolves the excessive busy waiting in Unreal Engine 5 games.
2024-02-20 12:05:44 -08:00
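
A minimal sketch of the idea in b902b8edab, assuming the scaling is a power-of-two shift applied to the raw counter. The commit message does not say how the scale is chosen; `ComputeTSCShift`, `ScaleCounter`, and `kMinFrequencyHz` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Assumption: guests want a cycle counter that ticks at 1 GHz or faster.
constexpr uint64_t kMinFrequencyHz = 1'000'000'000ULL;

// Hypothetical helper: smallest left shift that lifts the host counter
// frequency to at least kMinFrequencyHz. A frequency of 0 yields shift 0.
constexpr unsigned ComputeTSCShift(uint64_t HostFrequencyHz) {
  unsigned Shift = 0;
  while (HostFrequencyHz != 0 && (HostFrequencyHz << Shift) < kMinFrequencyHz) {
    ++Shift;
  }
  return Shift;
}

// The guest-visible counter and the reported frequency use the same shift,
// so elapsed-time calculations in the guest stay consistent.
constexpr uint64_t ScaleCounter(uint64_t RawCounter, unsigned Shift) {
  return RawCounter << Shift;
}
```

With a 24 MHz host counter this sketch picks a shift of 6, giving a reported frequency of 1.536 GHz.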
Alyssa Rosenzweig
0503c89ff6 OpcodeDispatcher: use NZCV update helpers
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-19 14:12:54 -04:00
Alyssa Rosenzweig
6dd410698a OpcodeDispatcher: add helpers for updating NZCV metadata
to reduce error-prone copypaste

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-19 14:12:54 -04:00
Ryan Houdek
808ced455d
FEXCore: Add a frontend pointer to InternalThreadState
FEXCore is guaranteed to not touch this pointer, so frontends can use
it to store thread-specific data.
2024-02-15 02:06:16 -08:00
Ryan Houdek
9cab746aa7
Merge pull request #3407 from neobrain/feature_libfwd_arguments_on_guest_stack
Library Forwarding: Allocate packed arguments on the guest stack if needed
2024-02-12 16:31:34 -08:00
Alyssa Rosenzweig
68232366e4 OpcodeDispatcher: don't mask add/sub sources
not needed in the new approach

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-12 12:36:28 -04:00
Alyssa Rosenzweig
d7ff1b78fb IR: handle 8/16-bit AddNZCV/SubNZCV
we can do it more effectively than the current s/w lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-12 12:36:09 -04:00
Mai
780b48620b
Merge pull request #3420 from Sonicadvance1/preserve_all_3419
Fix #3419
2024-02-10 23:24:38 -05:00
Ryan Houdek
4a0878fa92 Fix #3419 2024-02-10 19:55:51 -08:00
Ryan Houdek
df3d6938ae
Merge pull request #3410 from alyssarosenzweig/opt/nzcv-pass-2
Add NZCV+PF/AF optimization pass
2024-02-10 05:03:12 -08:00
Ryan Houdek
ba41da7da0
Merge pull request #3414 from Sonicadvance1/fix_one_mutex_hang
Fixes one mutex hang
2024-02-09 05:54:40 -08:00
Ryan Houdek
2480bab409 Fixes one mutex hang
When code invalidation is happening we currently have the issue that a
thread can acquire the code invalidation mutex in the middle of
invalidation. This is due to us acquiring and releasing the mutex
between each thread's code invalidation.

We need to hold the mutex for the entire duration of all threads' code
invalidation.
This fixes a rare hang on proton startup and resolves a consistent hang
on Proton application shutdown.

This now puts us on par with FEX-2312.1 with respect to hanging.

This does not fix a relatively rare hang on fork (which also existed with FEX-2312.1).

This also does not fix the issue that the intersection of our mutexes
between frontend and backend is very convoluted. Part of the work
that is going to fix the rare fork mutex hang will change more of this.
2024-02-08 18:18:00 -08:00
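
The locking change described in 2480bab409 can be sketched as follows. This is an illustration using `std::mutex`, not FEX's code; `Invalidator`, `ThreadState`, and both method names are hypothetical.

```cpp
#include <mutex>
#include <vector>

// Hypothetical stand-in for per-thread JIT state.
struct ThreadState {
  int InvalidatedRanges = 0;
};

struct Invalidator {
  std::mutex CodeInvalidationMutex;
  std::vector<ThreadState> Threads;

  // Problematic shape: the mutex is released between threads, so another
  // thread can acquire it in the middle of the overall invalidation.
  void InvalidatePerThreadLocking() {
    for (auto& Thread : Threads) {
      std::lock_guard<std::mutex> Lock{CodeInvalidationMutex};
      ++Thread.InvalidatedRanges;
    }
  }

  // Fixed shape: one acquisition covers every thread's invalidation.
  void InvalidateHoldingLock() {
    std::lock_guard<std::mutex> Lock{CodeInvalidationMutex};
    for (auto& Thread : Threads) {
      ++Thread.InvalidatedRanges;
    }
  }
};
```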
Alyssa Rosenzweig
ad7202e7d7 OpcodeDispatcher: optimize test -1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-08 14:10:13 -04:00
Alyssa Rosenzweig
175a57dd27 OpcodeDispatcher: emit AndWithFlags directly for primary alu
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 14:22:28 -04:00
Alyssa Rosenzweig
e2ce60148c OpcodeDispatcher: emit AndWithFlags directly for 2ndary alu
rely on opt pass to drop the flags.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 14:22:28 -04:00
Alyssa Rosenzweig
99660129f3 IR: implement AndWithFlags for 8/16-bit
easier to deal with in the JIT

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 14:22:28 -04:00
Alyssa Rosenzweig
308d9a751c RedundantFlagCalculationElimination: optimize rmif
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig
4bd28c0ed8 RedundantFlagCalculationElimination: optimize condaddnzcv
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig
8397f3ac99 RedundantFlagCalculationElimination: refine AXFLAG
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig
0452bc7212 RedundantFlagCalculationElimination: optimize condjump
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig
3d7ed89ffb RedundantFlagCalculationElimination: optimize NZCVSelect
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:25 -04:00
Alyssa Rosenzweig
23ab0a978e RedundantFlagCalculationElimination: also handle InvalidateFlags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:25 -04:00
Alyssa Rosenzweig
7f47a9ef0e IR: add local dead flag elimination pass
RCLSE ignores NZCV and doesn't optimize stores which doesn't help us with PF/AF
either. So, we add a new pass for dead flag elimination (cannibalizing the old
and broken dead flag elimination pass). This is a simple local optimizer that
walks each block backwards, converging in linear time & constant space in a
single iteration.

Right now, it doesn't do a ton (other than a nice reduction in silliness in
the hot Sonic block), but it provides the framework to fuse comparisons.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-06 13:06:25 -04:00
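
A rough sketch of a local backward dead-flag pass of the kind 7f47a9ef0e describes: one reverse walk over a block, with a single bitmask of live flags as the only state. The `Op` structure and flag constants are hypothetical simplifications, not FEX's IR.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical flag groups tracked by the pass.
constexpr uint8_t FLAG_NZCV = 1;
constexpr uint8_t FLAG_PF = 2;
constexpr uint8_t FLAG_AF = 4;
constexpr uint8_t FLAG_ALL = FLAG_NZCV | FLAG_PF | FLAG_AF;

struct Op {
  uint8_t Reads;      // flags this op consumes
  uint8_t Writes;     // flags this op fully overwrites
  bool FlagsOnly;     // true if writing flags is the op's only effect
  bool Dead = false;  // set by the pass
};

// Walk the block backwards once. Live is the set of flags that some later
// op (or a successor block) may still read.
void EliminateDeadFlags(std::vector<Op>& Block) {
  uint8_t Live = FLAG_ALL;  // conservative: everything is live at block exit
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    if (It->FlagsOnly && It->Writes != 0 && (It->Writes & Live) == 0) {
      It->Dead = true;  // nobody observes these flags before they are rewritten
      continue;
    }
    Live = static_cast<uint8_t>((Live & ~It->Writes) | It->Reads);
  }
}
```

Because the state is one byte and each op is visited once, the pass is linear in the block length and needs constant extra space.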
Alyssa Rosenzweig
4331753ca0
Merge pull request #3408 from alyssarosenzweig/opt/tst
Optimize TST
2024-02-06 11:28:02 -04:00
Paulo Matos
fa8bcfd67a Clean up access to possible nullptr
Patch suggested by @Sonicadvance1
2024-02-06 12:31:38 +00:00
Tony Wasserka
a1343e9296
Revert "Add cmake option DISABLE_CLANG_PRESERVE_ALL" 2024-02-05 22:31:45 +01:00
Alyssa Rosenzweig
235f32ce8c
Merge pull request #3401 from Sonicadvance1/runtime_preserve_all
HostFeatures: Supports runtime disabling of preserve_all
2024-02-05 15:34:46 -04:00
Alyssa Rosenzweig
2e0cb2fbd4 OpcodeDispatcher: optimize TST
it's just an AndWithFlags setting the PF.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-05 15:32:21 -04:00
Alyssa Rosenzweig
4790a7ba79 IR: add AndWithFlags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-05 15:32:21 -04:00
Tony Wasserka
df3e51fc8c Library Forwarding: Allocate packed arguments on the guest stack if needed
This is required for host-side calls to guest functions on 32-bit guests.
Since the host stack is allocated before FEX blocks memory inaccessible to
the guest, the guest would otherwise fail to read the packed argument data.
2024-02-05 18:10:34 +01:00
Ryan Houdek
0139498072 SpinWaitLock: Removes unused variable in spin-loop fallback
Tmp was no longer being used, forgot to remove it.
2024-02-05 07:22:52 -08:00
Ryan Houdek
472a701e2b
Merge pull request #3403 from Sonicadvance1/fix_spinlock_contended_lock
SpinLockWait: Fixes unexpected lock success
2024-02-05 06:51:42 -08:00
Ryan Houdek
cce6011205 SpinLockWait: Fixes unexpected lock success
With a contended unique lock, we forgot to reset the `Expected` value to
zero. This was causing a contended mutex to incorrectly succeed.

Noticed this when converting some pthread mutexes over to spinloops to
remove strace noise.

The reference wfe_mutex library I wrote didn't have this problem since
the implementation is slightly different.
2024-02-03 01:10:57 -08:00
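
The class of bug fixed in cce6011205 can be modelled with `std::atomic`. This is only an analogy under the assumption that the lock word is 0 when free and 1 when held; the function names are hypothetical. A failed compare-exchange overwrites `Expected` with the observed value, so a retry that does not reset it compares against the held state and "succeeds".

```cpp
#include <atomic>
#include <cstdint>

// Buggy shape: Expected is not reset after the first failed attempt.
inline bool TryLockTwiceBuggy(std::atomic<uint32_t>& Lock) {
  uint32_t Expected = 0;
  if (Lock.compare_exchange_strong(Expected, 1)) {
    return true;
  }
  // Expected now holds the observed value (1 if the lock is held), so this
  // second attempt compares 1 against 1 and reports success on a held lock.
  return Lock.compare_exchange_strong(Expected, 1);
}

// Fixed shape: reset Expected to the "unlocked" value before retrying.
inline bool TryLockTwiceFixed(std::atomic<uint32_t>& Lock) {
  uint32_t Expected = 0;
  if (Lock.compare_exchange_strong(Expected, 1)) {
    return true;
  }
  Expected = 0;
  return Lock.compare_exchange_strong(Expected, 1);
}
```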
Ryan Houdek
c437129ed8 Revert "Revert "FEXLoader: Moves thread management to the frontend""
This reverts commit 5358af7794.
2024-02-03 00:57:36 -08:00
Alyssa Rosenzweig
8d3f0b6f02 OpcodeDispatcher: reassociate and sink W in sha1
We only need each part of W extracted in the corresponding round, so sink the
extract into the round to reduce pressure.

Further, W and E are added and then never used again. So, by reassociating we
can do the add upfront, killing W and E at the start and further reducing
pressure.

Eliminates spilling in sha1rnds4.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
60f7b9bcc4 OpcodeDispatcher: optimize sha1's 2/3 expr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
a487557173 OpcodeDispatcher: extract BitwiseAtLeastTwo
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
394b4888bb OpcodeDispatcher: reassociate and remat C0, G0
costs 2 moves and eliminates the rest of our spilling

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
142cbdd852 OpcodeDispatcher: expand, reassociate, and interleave sha256 calc
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
2f9102f78d OpcodeDispatcher: expand & interleave sha256 calc
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
c9824d04cb OpcodeDispatcher: sink sha256 extracts
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
9c2a569539 OpcodeDispatcher: reexpress Major in sha256
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
515aa4ce3e OpcodeDispatcher: fuse eor+ror in sha256
This reduces instructions a ton.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
f616beb992 OpcodeDispatcher: CSE sha
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
0dcf1e12b8 OpcodeDispatcher: copyprop sha logic
prepare for clever

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig
2cbf544ef5 OpcodeDispatcher: expand sha logic
no functional change, just preparing for cleverness.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-02 13:03:07 -04:00
Ryan Houdek
0eed73beeb HostFeatures: Supports runtime disabling of preserve_all
This is used for instcountci to ensure instruction counts don't change
whether or not a compiler supports this feature. Always runtime disable
when running in instcountci.

CMake option from #3394 can still be useful so leaving that in place.
2024-02-02 08:59:04 -08:00
Mai
6993f4fd8d
Merge pull request #3400 from Sonicadvance1/revert_runtime_longmode_switch
Revert #3303
2024-02-02 11:53:44 -05:00
Mai
920a8db369
Merge pull request #3397 from pmatos/XCHGOp
Improve XCHG operations
2024-02-02 11:53:22 -05:00
Paulo Matos
4623544f69 Improve XCHG operations
Marking loads as allowing upper garbage simplifies some operations.
Update InstCountCI as well.
2024-02-02 08:16:13 +00:00
Ryan Houdek
ccf1402fe6 Revert "FEXCore: Accurately store segment descriptors"
This reverts commit 8648fb1485.
2024-02-01 18:14:30 -08:00
Ryan Houdek
da0e1b515a Revert "OpcodeDispatcher: Initial support for runtime long-mode switch"
This reverts commit 9e5d7aa5fe.
2024-02-01 18:14:24 -08:00
Ryan Houdek
690cb6fa48 Config: Fixes JSON parsing of "ArgumentHandler" types
When 4d109c9ce0 fixed parsing strenum
types in the json, it also added `ArgumentHandler` types to the json
parsing. This was incorrect as those types are already stored in the
json in their decoded numerical format.

Without this change, all config options with `ArgumentHandler` will
decode as "0" which is incorrect. The main killer here is that SMCChecks
gets disabled (visible in both FEXConfig and when applications are
running) which was causing spurious failures.
2024-02-01 16:20:57 -08:00
Ryan Houdek
cec1814a09
Merge pull request #3384 from pmatos/CDQOp-Opt
Optimize CDQOp
2024-01-31 17:51:23 -08:00
Mai
4d49ac7c3d
Merge pull request #3387 from alyssarosenzweig/opt/rotates
Optimize rotates
2024-01-31 18:20:40 -05:00
Alyssa Rosenzweig
6d13d9fb56
Merge pull request #3395 from pmatos/StaticAnalysis
Code cleanup - mainly dead store removal; NFC
2024-01-31 17:24:48 -04:00
Mai
ae7dc250db
Merge pull request #3386 from alyssarosenzweig/opt/shift
Optimize shifts a bit
2024-01-31 14:11:58 -05:00
Mai
f4086b25e6
Merge pull request #3385 from alyssarosenzweig/opt/bmi
Optimize bit manipulation instructions
2024-01-31 14:07:23 -05:00
Paulo Matos
e4560ed0c8 Code cleanup - mainly dead store removal; NFC
scan-build found a few dead stores that can be easily cleaned-up
2024-01-31 08:35:55 +00:00
Paulo Matos
6d58ea31b9 Add cmake option DISABLE_CLANG_PRESERVE_ALL
Forces disabling use of __attribute__((preserve_all)).
Until CI uses clang17, where this attribute was added, instcountci fails
when FEX is compiled with clang>=17.
2024-01-31 08:29:20 +00:00
Alyssa Rosenzweig
f3eee8f305 OpcodeDispatcher: optimize bextr's length sanitize
reordering the operations saves an immediate move.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:28:06 -04:00
Alyssa Rosenzweig
f66085f4a7 OpcodeDispatcher: optimize bextr's (1 << x) - 1
little algebraic trick I cribbed from llvm

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:28:06 -04:00
Alyssa Rosenzweig
c9461d9997 OpcodeDispatcher: optimize BEXTR flag setting
use native test.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:28:06 -04:00
Alyssa Rosenzweig
d5eb99fac8 OpcodeDispatcher: optimize popcount flags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:28:06 -04:00
Alyssa Rosenzweig
f3175848b1 OpcodeDispatcher: use lzcnt flag gen for tzcnt
as far as flags go, they're identical: set ZF for zero output, set CF for output
= DestSize, undef the rest. merge the impls, so we get the optimized lzcnt impl
for tzcnt.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:28:06 -04:00
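
The shared flag rule stated in f3175848b1 (ZF for a zero output, CF for an output equal to DestSize) can be written as one helper that serves both instructions. The reference counters below are plain loops for illustration; all names are hypothetical.

```cpp
#include <cstdint>

struct CountFlags {
  bool ZF;  // set when the count is zero
  bool CF;  // set when the count equals the operand size (input was zero)
};

// One flag rule shared by lzcnt and tzcnt.
constexpr CountFlags FlagsForCount(uint32_t Count, uint32_t DestSizeBits) {
  return CountFlags{Count == 0, Count == DestSizeBits};
}

// Reference leading-zero count: returns 32 for an input of 0.
constexpr uint32_t Lzcnt32(uint32_t Value) {
  uint32_t Count = 0;
  for (uint32_t Bit = 0x80000000u; Bit != 0 && (Value & Bit) == 0; Bit >>= 1) {
    ++Count;
  }
  return Count;
}

// Reference trailing-zero count: returns 32 for an input of 0.
constexpr uint32_t Tzcnt32(uint32_t Value) {
  uint32_t Count = 0;
  for (uint32_t Bit = 1; Bit != 0 && (Value & Bit) == 0; Bit <<= 1) {
    ++Count;
  }
  return Count;
}
```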
Alyssa Rosenzweig
3dd597a591 OpcodeDispatcher: optimize lzcnt
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:28:06 -04:00
Alyssa Rosenzweig
e8e05252f0 OpcodeDispatcher: optimize BLSI
and explain why the suss thing we did before was actually right all along.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:28:06 -04:00
Alyssa Rosenzweig
93cef53ec0 OpcodeDispatcher: optimize blsr flags
reorder to avoid nzcv clobber

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:28:06 -04:00
Alyssa Rosenzweig
3a19133267 OpcodeDispatcher: fix inverted BLSR carry
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:28:06 -04:00
Alyssa Rosenzweig
9b309b2102 OpcodeDispatcher: optimize blsmsk flags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:28:06 -04:00
Alyssa Rosenzweig
fe88b904c9 OpcodeDispatcher: fix missing SF set with blsmsk 2024-01-30 22:28:06 -04:00
Alyssa Rosenzweig
2e63c6d547 OpcodeDispatcher: fix inverted CF with blsmsk
CF set if SRC = 0

per https://www.felixcloutier.com/x86/blsmsk

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:28:06 -04:00
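
For reference, the BLSMSK result and flags touched by 2e63c6d547 and fe88b904c9 can be modelled as below for the 32-bit case, following the definition on the page linked above. The struct and function names are hypothetical.

```cpp
#include <cstdint>

struct BlsmskResult {
  uint32_t Value;
  bool CF;  // set if the source is zero
  bool SF;  // copy of the result's most significant bit
  bool ZF;  // always cleared: the result is never zero
};

// BLSMSK: mask up to and including the lowest set bit of Src.
constexpr BlsmskResult Blsmsk32(uint32_t Src) {
  const uint32_t Value = Src ^ (Src - 1);
  return BlsmskResult{Value, Src == 0, (Value >> 31) != 0, false};
}
```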
Alyssa Rosenzweig
0bc9e1a409 OpcodeDispatcher: clobber OF with shift immediate
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:26:59 -04:00
Alyssa Rosenzweig
338f12845d OpcodeDispatcher: save a constant in shld
one weird trick

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:26:59 -04:00
Alyssa Rosenzweig
b3ae81f75f OpcodeDispatcher: allow garbage on shld shift
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:26:59 -04:00
Alyssa Rosenzweig
c1a1c37980 OpcodeDispatcher: mark ideas to improve SHLD
a bit tricky right now.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:26:59 -04:00
Alyssa Rosenzweig
fb6f850bb4 OpcodeDispatcher: remove rcl sub
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
b6d8749525 OpcodeDispatcher: remove select from rcl
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
d3f1397325 OpcodeDispatcher: eliminate constants in RCR
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
0a164428fa OpcodeDispatcher: eliminate select in RCR
the nzcv clobber I actually came for

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
7496175100 OpcodeDispatcher: optimize 32-bit rcl/rcr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
0616a9cef1 OpcodeDispatcher: eliminate move in rcr 1-bit
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
97f8775354 OpcodeDispatcher: optimize <32-bit rcr op1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
c92099aa98 OpcodeDispatcher: fuse orlshl in rcr 1-bit
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
7c288b09f1 OpcodeDispatcher: rmif mask rcl smaller OF
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
680af7b1b0 OpcodeDispatcher: rcr op 8x1 cleanup
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
349bc9efab OpcodeDispatcher: unify rcr op 1bit codepaths
get additional opt for <32-bit

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
ad5c3cb268 OpcodeDispatcher: rmif mask for OF in rcr smaller
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
be8d37ef3d OpcodeDispatcher: optimize 32-bit rol/ror imm
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
6ad2514bfe OpcodeDispatcher: rmif mask rcl smaller cf
better on flagm. extra moves on non-flagm but, meh.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
3fa6129a14 OpcodeDispatcher: rmif mask rcr smaller cf
and do some constant folding to do so more.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
a57cebaf58 OpcodeDispatcher: skip OF calc for constant rotate >= 2
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
34fdb14da1 OpcodeDispatcher: add and use AndConst
this skips the constant folding, which saves the branching in the rotate
immediate implementations.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
974baca09c OpcodeDispatcher: allow upper garbage with rcl/rcr smaller
we're masking immediately to something smaller

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
f22094a493 OpcodeDispatcher: use a branch for 8/16-bit rotate flags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
d979b3a1da OpcodeDispatcher: note idea to further optimize rcl
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
6d82c957fa OpcodeDispatcher: fuse orlshl in rcl
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Mai
fa3352004e
Merge pull request #3381 from alyssarosenzweig/opt/masking
Allow upper garbage on a bunch of instructions
2024-01-30 10:07:53 -05:00
Ryan Houdek
ce2924731e vixl/simulator: Enlarge simulator stack size
Simulator stack size defaults to 8KB. This new unit test requires at
least 15360 stack size. Just push it up to 8MB.
2024-01-29 19:48:38 -08:00
Ryan Houdek
bc67910ee4
Merge pull request #3382 from pmatos/TypoFix
Fix typos; NFC
2024-01-29 16:10:14 -08:00
Mai
31a4158957
Merge pull request #3383 from alyssarosenzweig/opt/ptest
Optimize PTEST and VTESTP
2024-01-29 13:30:53 -05:00
Mai
58f3d3caf5
Merge pull request #3380 from alyssarosenzweig/opt/pdep
Optimize PDEP
2024-01-29 13:27:15 -05:00
Alyssa Rosenzweig
ae48228943 OpcodeDispatcher: optimize vtestps/vtestpd
I don't really care about AVX but do the same thing we did for vptest.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:24:11 -04:00
Alyssa Rosenzweig
e8e35e48c7 OpcodeDispatcher: optimize ptest with tst
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:20:17 -04:00
Alyssa Rosenzweig
8b8f27a88f OpcodeDispatcher: optimize ptest with umaxv
to check if the vector is zero, umaxv its elements and check if the reduced
scalar is zero.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:20:17 -04:00
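
A scalar model of the zero check in 8b8f27a88f: reduce the vector with an unsigned maximum and compare the single result with zero. The byte element size and all names here are assumptions for illustration; the PTEST flag definitions follow the x86 instruction reference, with AND/ANDN computed per byte.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

using Vec128 = std::array<uint8_t, 16>;

// Model of a UMAXV-style reduction: the largest unsigned element.
inline uint8_t UMaxV(const Vec128& V) {
  return *std::max_element(V.begin(), V.end());
}

// A vector is all zero exactly when its unsigned maximum is zero.
inline bool IsZeroVector(const Vec128& V) {
  return UMaxV(V) == 0;
}

struct PtestFlags {
  bool ZF;  // (Src AND Dest) == 0
  bool CF;  // (Src AND NOT Dest) == 0
};

inline PtestFlags Ptest(const Vec128& Dest, const Vec128& Src) {
  Vec128 And{};
  Vec128 AndNot{};
  for (std::size_t i = 0; i < Dest.size(); ++i) {
    And[i] = static_cast<uint8_t>(Src[i] & Dest[i]);
    AndNot[i] = static_cast<uint8_t>(Src[i] & ~Dest[i]);
  }
  return PtestFlags{IsZeroVector(And), IsZeroVector(AndNot)};
}
```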
Alyssa Rosenzweig
8e7906a665 IR: add UMaxV
will be used to accelerate ptest

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:19:22 -04:00
Paulo Matos
027fbbf051 Optimize CDQOp 2024-01-29 17:18:02 +00:00
Paulo Matos
ca31a0404c ConstProp should generate 32bit constants when required 2024-01-29 17:15:47 +00:00
Paulo Matos
f644959c7c Fixing some typos; NFC 2024-01-29 17:14:53 +00:00
Alyssa Rosenzweig
16a54742e6 OpcodeDispatcher: optimize 32-bit tzcnt
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
ad9aa0bc87 OpcodeDispatcher: optimize 32-bit lzcnt
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
bd2b3f35a3 OpcodeDispatcher: optimize 32-bit popcnt
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
50169ce640 OpcodeDispatcher: optimize 32-bit pext
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
baae2d68f9 OpcodeDispatcher: optimize 32-bit bextr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
ad8d038b8a OpcodeDispatcher: optimize 32-bit blsi
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
820932e3c7 OpcodeDispatcher: optimize 32-bit blsmsk
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
6f11f2e6f4 OpcodeDispatcher: optimize 32-bit blsr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
f5ad7682c3 OpcodeDispatcher: optimize 32-bit pdep
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:06:56 -04:00
Alyssa Rosenzweig
04805f351b JIT: rewrite pdep implementation
- use better algorithm that is O(# set bits) instead of O(# total bits)
- eliminate spilling by careful management of our temporaries
- fix nzcv clobber bug (whoops)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:06:56 -04:00
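
One way to get a PDEP loop that runs once per set mask bit, as the first bullet of 04805f351b describes, is to peel off the lowest set bit of the mask each iteration. This C++ model is a sketch of that technique, not the JIT's emitted code; the commit does not spell out the exact algorithm.

```cpp
#include <cstdint>

// Deposit the low bits of Src into the positions selected by Mask.
// The loop body executes once per set bit in Mask.
constexpr uint64_t Pdep(uint64_t Src, uint64_t Mask) {
  uint64_t Result = 0;
  for (uint64_t SrcBit = 1; Mask != 0; SrcBit <<= 1) {
    const uint64_t Lowest = Mask & (~Mask + 1);  // isolate lowest set mask bit
    if ((Src & SrcBit) != 0) {
      Result |= Lowest;
    }
    Mask &= Mask - 1;  // clear the mask bit just handled
  }
  return Result;
}
```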
Mai
750b0b70bc
Merge pull request #3356 from Sonicadvance1/modify_code_lock
Jitarm64: Implements spin-loop futex for JIT blocks
2024-01-23 13:46:59 -05:00
Ryan Houdek
56d8080ec9
Merge pull request #3345 from Sonicadvance1/fix_syscall_registers
OpcodeDispatcher: Fixes syscall rcx/r11 generation
2024-01-22 15:21:13 -08:00
Ryan Houdek
c0be974272
Merge pull request #3368 from bylaws/preprcr
FEXCore: Fix RCL/RCR shift wraparound behaviour
2024-01-21 13:44:49 -08:00
Billy Laws
e323938173 FEXCore: Fix RCL/RCR shift wraparound behaviour
This ends up being cleaner to handle outside of
CalculateFlags_ShiftVariable as constant masking is only needed for
RCL/RCR.
2024-01-21 18:15:50 +00:00
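
For context on e323938173, the x86 definition of RCL/RCR reduces the rotate count differently per operand size, because the rotation goes through CF and so has a period of size + 1. The helper below restates that architectural rule; it is not taken from the commit, and the function name is hypothetical.

```cpp
#include <cstdint>

// Effective rotate-through-carry count for RCL/RCR per the x86 definition.
constexpr uint32_t EffectiveRotateCount(uint32_t Count, uint32_t OpSizeBits) {
  switch (OpSizeBits) {
    case 8:
      return (Count & 0x1F) % 9;   // 8 data bits + CF
    case 16:
      return (Count & 0x1F) % 17;  // 16 data bits + CF
    case 64:
      return Count & 0x3F;
    default:
      return Count & 0x1F;         // 32-bit operands
  }
}
```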
Billy Laws
407e26bfee FEXCore: Use TMP1-4 for values that need preserving across spills
The ARM64EC SRA layout will use x0-3 for x86_64 registers, as such any
arguments passed to C ABI functions need to proxy their arguments
through the temporaries and move as appropriate.
2024-01-21 16:21:13 +00:00
Ryan Houdek
a6c57f71e9 SpinWaitLock: Fixes potential extra wait that would occur on contended lock
We had a chance of doing an additional bogus wfe if the expected value
was hit in one iteration of a loop. Not the biggest problem on current
hardware where WFE only ever sleeps for 1-4 system cycles, but on future
hardware where WFE might actually sleep for longer, this could have
been an issue.
2024-01-17 10:41:16 -08:00
Ryan Houdek
2af7e997f4 Spinlocks: Fix assembly
Need to have a source be +r so it doesn't get overwritten.
2024-01-17 10:19:38 -08:00
Ryan Houdek
ab6c00bbcf FEXCore/Utils: Rename FutexSpinWait to SpinWaitLock 2024-01-17 10:19:38 -08:00
Ryan Houdek
e18453cb57 Jitarm64: Implements spin-loop futex for JIT blocks
This will ensure that multiple concurrent SIGBUS handlers in the same
code block don't modify the same code.
2024-01-17 10:19:38 -08:00
Ryan Houdek
39f49782da Arm64: Move ParanoidTSO checks up out of the non-paranoid code path 2024-01-17 10:19:38 -08:00
Ryan Houdek
2c5dd20f3c FutexSpinWait: Implement spin-loop Unique mutex. 2024-01-17 10:19:38 -08:00
Ryan Houdek
136fa78825 FEXCore: Implements an efficient spin-loop API
This will only be used internally inside of FEXCore for efficient shared
code cache backpatch spin-loops.
2024-01-17 10:19:38 -08:00
Ryan Houdek
f956f008ea
Merge pull request #3372 from alyssarosenzweig/opt/cmpxchg-review
Optimize GPR cmpxchg
2024-01-15 05:11:12 -08:00
Ryan Houdek
1f7a619c79 OpcodeDispatcher: Fixes syscall rcx/r11 generation
Noticed this while writing #3342.

Fixes #3343

The documentation defines the syscall instruction as setting RCX to the
next instruction's RIP and R11 to RFLAGS. We entirely skipped this,
which I noticed while writing unit tests.

Adds unittests to test both 32-bit and 64-bit behaviour because our
helper shares code with both.

I don't know if anything actually relied on this behaviour but we should
definitely support it.
2024-01-12 19:14:30 -08:00
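
The register effect described in 1f7a619c79 amounts to two assignments before the syscall is handled. A minimal model, with a hypothetical `GuestState` struct and function name:

```cpp
#include <cstdint>

// Hypothetical subset of guest register state.
struct GuestState {
  uint64_t RIP = 0;     // address of the syscall instruction itself
  uint64_t RCX = 0;
  uint64_t R11 = 0;
  uint64_t RFLAGS = 0;
};

// The syscall instruction is encoded as 0F 05, so it is two bytes long.
constexpr uint64_t kSyscallLength = 2;

// Architectural side effects of syscall that the guest can observe:
// RCX receives the address of the next instruction, R11 receives RFLAGS.
inline void ApplySyscallEntryState(GuestState& State) {
  State.RCX = State.RIP + kSyscallLength;
  State.R11 = State.RFLAGS;
}
```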
Alyssa Rosenzweig
58127bd0e8 OpcodeDispatcher: optimize trivial cmpxchgs
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-12 12:23:34 -04:00
Alyssa Rosenzweig
e8945dfb6d OpcodeDispatcher: optimize gpr cmpxchg
NZCV stuff.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-12 12:03:28 -04:00
Ryan Houdek
8c3163096b
Merge pull request #3363 from Sonicadvance1/fix_label_allocations
ArmEmitter: Support single use forward labels
2024-01-12 00:26:31 -08:00
Ryan Houdek
615cfe0246
Merge pull request #3361 from Sonicadvance1/decompose_std_function
FEXCore: Decompose some std::function usage to regular pointers
2024-01-10 16:55:29 -08:00
Ryan Houdek
3d5f876585 Fixes some new glibc allocations that cropped up
I guess this was handled by brk things before.
2024-01-09 13:55:04 -08:00
Ryan Houdek
37102400b5 Arm64: Switches uses of forward label over to SingleUse if possible
Primary goal for this is to ensure that the delinker doesn't need to
allocate any memory. This delinker can end up getting hit heavily with
JIT code so we don't want it to be allocating memory.
2024-01-08 22:18:20 -08:00
Ryan Houdek
c01e6283ae CodeEmitter: Support a single use forward label
Currently every use of the forward label calls into jemalloc to allocate
memory. This allows a forward label that doesn't require any memory
allocation, which is the common case in FEX.
2024-01-08 22:18:20 -08:00
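
A toy illustration of why a single-use forward label needs no allocation, in the spirit of c01e6283ae: a general forward label must remember an unbounded list of branch sites, while a single-use label stores exactly one site inline. Every type and method here is hypothetical and much simpler than a real emitter; the patched value is a plain instruction distance, not a real branch encoding.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// General case: any number of branches may target the label, so the patch
// list grows dynamically and may allocate.
struct ForwardLabel {
  std::vector<std::size_t> PatchOffsets;
};

// Common case: exactly one branch targets the label, stored inline.
struct SingleUseForwardLabel {
  std::size_t PatchOffset = 0;
  bool Used = false;
};

struct TinyEmitter {
  std::array<uint32_t, 16> Code{};
  std::size_t Cursor = 0;

  // Emit a placeholder word and remember where it is.
  void EmitPlaceholderBranch(SingleUseForwardLabel& Label) {
    Label.PatchOffset = Cursor;
    Label.Used = true;
    Code[Cursor++] = 0;
  }

  void EmitNop() { Code[Cursor++] = 0xD503201Fu; }  // AArch64 NOP encoding

  // Patch the placeholder with the distance (in instructions) to here.
  void Bind(const SingleUseForwardLabel& Label) {
    if (Label.Used) {
      Code[Label.PatchOffset] = static_cast<uint32_t>(Cursor - Label.PatchOffset);
    }
  }
};
```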
Ryan Houdek
248dc97993 FEXCore: Decompose some std::function usage to regular pointers
The delinker step of the JIT was using std::function with capture
lambdas that required memory allocation when unnecessary.
Because the compiler can't see through our std::function usage it could
never decompose these by itself.

By passing the Thread's frame and record to the function as arguments
then we can have the signature be a raw function pointer.

This fixes an area of concern from:
https://github.com/FEX-Emu/FEX/blob/main/docs/ProgrammingConcerns.md#stdfunction-and-lambdas
2024-01-06 19:39:54 -08:00
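
The shape of the change in 248dc97993 can be sketched as replacing a capturing `std::function` with a raw function pointer that receives its context as arguments. The types and names below are hypothetical stand-ins, not FEXCore's actual signatures.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-ins for the thread frame and the link record.
struct CpuStateFrame {
  int DelinkCount = 0;
};
struct BlockLinkRecord {
  uint64_t HostBranchTarget = 0;
};

// Before: a type-erased callable. Capturing lambdas stored here may need a
// heap allocation, and the compiler cannot see through the erasure.
using DelinkerStdFunction = std::function<void()>;

// After: a plain function pointer. The state that used to be captured is
// passed explicitly, so nothing is allocated.
using DelinkerFn = void (*)(CpuStateFrame* Frame, BlockLinkRecord* Record);

inline void ResetLink(CpuStateFrame* Frame, BlockLinkRecord* Record) {
  Record->HostBranchTarget = 0;
  ++Frame->DelinkCount;
}
```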
Ryan Houdek
d488592eda
Merge pull request #3339 from Sonicadvance1/pass_thread_unaligned_fault_handler
FEXCore: Pass thread object to HandleUnalignedAccess
2024-01-04 18:20:37 -08:00
Ryan Houdek
743df8dfae
Merge pull request #3327 from Sonicadvance1/remove_syscall_indirection
Arm64: Removes a vtable indirection in syscalls
2024-01-04 18:19:40 -08:00
Ryan Houdek
4b3792196f
Merge pull request #3303 from Sonicadvance1/initial_runtime_longmode_switch
OpcodeDispatcher: Initial support for runtime long-mode switch
2024-01-04 18:17:54 -08:00
Ryan Houdek
db7d7a6bd7
Merge pull request #3349 from Sonicadvance1/revert_frontend_ownership
Revert "FEXLoader: Moves thread management to the frontend"
2024-01-03 14:25:04 -08:00
Alyssa Rosenzweig
04a88ed3ab
Merge pull request #3353 from Sonicadvance1/public_interface_cleaning
FEXCore interface cleaning
2024-01-03 15:14:54 -04:00
Alyssa Rosenzweig
9da08b40bd
Merge pull request #3344 from Sonicadvance1/xbyak_upstream
Externals: Update xbyak to v7.02 and switch away from fork
2024-01-03 15:13:58 -04:00
Alyssa Rosenzweig
5467c3e478
Merge pull request #3357 from Sonicadvance1/remove_non_sra
FEXCore: Removes SRA option, it's now permanently enabled
2024-01-03 15:10:04 -04:00
wannacu
4e7bab849c JIT: Fixes broken register in VTBX1
If the Dst register is allocated as VectorIndices or VectorTable,
using Dst as an operand to perform the tbx operation will result in an error.
For example:
%131(FPR0) i128 = LoadNamedVectorIndexedConstant u8:Tmp:RegisterSize, #0x6, #0xaa0
%132(FPR0) i128 = VTBX1 u8:Tmp:RegisterSize, %129(FPRFixed6) i32v4, %126(FPRFixed10) i16v8, %131(FPR0) i128
Since the tbx instruction's destination register is also the original operand,
this is consistent with the semantics of VTBX1. Therefore,
directly using VectorSrcDst as the destination operand for the tbx instruction is safe.
2023-12-29 16:18:40 +08:00
Ryan Houdek
d098545c20 FEXCore: Removes SRA option, it's now permanently enabled 2023-12-28 18:28:02 -08:00
Ryan Houdek
5358af7794 Revert "FEXLoader: Moves thread management to the frontend"
This reverts commit 58f2693954.
2023-12-27 04:33:50 -08:00
Ryan Houdek
25bcddf3a5 FEXCore: Removes context wide and map lookup
While locking a shared_lock and doing an empty table lookup is fairly
fast, just remove them from the hot path entirely if no custom IR
handlers are installed.

This is only used for our IRLoader, which is losing its importance
significantly and should probably be removed anyway.
2023-12-26 11:11:44 -08:00
Ryan Houdek
f785b38e4d
Merge pull request #3352 from Sonicadvance1/remove_irloader
Removes IRLoader, unittests, and public interface
2023-12-26 11:08:26 -08:00
Ryan Houdek
b115c144fb FEXCore: Removes NetStream from public API
Only used by GDBServer.
NFC.
2023-12-25 07:07:17 -08:00
Ryan Houdek
d8f20751fe FEXCore: Moves IREmitter from the public API to backend
No functional change
2023-12-25 07:00:29 -08:00
Ryan Houdek
1977747fc2 Removes IRLoader, unittests, and public interface
This unit test hasn't really served any purpose for a while now and
mostly just causes pain when reworking things in the IR.

Just remove the IRLoader, its unit tests, the github action steps and
the public FEXCore interface to it, since it isn't used by anything
other than Thunks.

Also moves some IR definitions from the public API to the backend.
2023-12-25 07:00:29 -08:00
Ryan Houdek
257016bf12 FEXCore: Moves BucketList out of public API
NFC
2023-12-25 06:58:22 -08:00
Ryan Houdek
69d65fba4a FEXCore: Removes unused SyscallVisitor
This was expected to be part of the syscall optimizations we did but
ended up getting manifested in a different way. Remove it.
2023-12-25 06:42:11 -08:00
Ryan Houdek
bce694ebb5 FEXCore: Moves BitUtils to FHU
No functional change
2023-12-25 06:38:51 -08:00
Ryan Houdek
5d37d5db1a FEXCore: Optimize HostFeatures and CPUID feature calculation
Need #3348 merged first.

As I was casually thinking, this code made me realize that it was quite
branch heavy and could likely be optimized to logic.

The previous code generated some fairly nasty branch heavy code. This
can be optimized to be branchless and take roughly five instructions
per flag. Using a bitfield for each feature would turn each calculation
into 3-4 instructions but that seems overkill.

Very minor thing.
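
As a rough illustration of the branchy versus branchless forms, assuming a small set of boolean host features. The struct and its field names are hypothetical, not the real HostFeatures layout, and whether a compiler actually emits branches for the first form depends on the compiler and flags.

```cpp
#include <cstdint>

// Hypothetical host feature set; field names are illustrative only.
struct HostFeaturesSketch {
  bool SupportsAES;
  bool SupportsCRC;
  bool SupportsSHA;
};

// Branch-heavy form: one conditional per feature.
constexpr uint32_t FeatureBitsBranchy(const HostFeaturesSketch& f) {
  uint32_t bits = 0;
  if (f.SupportsAES) bits |= 1u << 0;
  if (f.SupportsCRC) bits |= 1u << 1;
  if (f.SupportsSHA) bits |= 1u << 2;
  return bits;
}

// Branchless form: each bool converts to 0 or 1 and is shifted into place.
constexpr uint32_t FeatureBitsBranchless(const HostFeaturesSketch& f) {
  return (static_cast<uint32_t>(f.SupportsAES) << 0) |
         (static_cast<uint32_t>(f.SupportsCRC) << 1) |
         (static_cast<uint32_t>(f.SupportsSHA) << 2);
}
```

Both functions return the same mask for every input; only the shape of the generated code is expected to differ.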
2023-12-25 04:58:15 -08:00
Ryan Houdek
4d109c9ce0 Config: Fixes parsing strenum inside of json files
This wasn't wired up before.
2023-12-23 22:32:59 -08:00
Ryan Houdek
db9b326534 FEXCore: Support disabling CPUID features based on config
Need to be able to disable sha by config.
2023-12-23 22:32:29 -08:00
Ryan Houdek
1c34b25538 FEX: Removes legacy kernel 32-bit allocator
We only used this so that our Xavier CI systems, which were running old
kernels, could run unit tests. We have now removed the Xaviers from CI
and this is no longer necessary.

Stop pretending that we support kernels older than 5.0 and allowing this
fallback.

The 32-bit allocator is still used for the MAP_32BIT mmap flag, so the
load bearing code can't be fully removed. Just remove the config and the
frontend things using it.
2023-12-21 06:21:01 -08:00
Ryan Houdek
38ad3f0e05 FEXCore: Pass thread object to HandleUnalignedAccess
Currently no functional change but public API breaks should come early.

The thread state object will be used for looking up thread specific
codebuffers in the future when we support MDWE with code mirrors.
2023-12-21 01:55:25 -08:00
Ryan Houdek
266f7feecb Arm64: Removes a vtable indirection in syscalls
We can safely call virtual functions through the JIT with a little bit
of work.

FEX's JIT has quite a few steps before it gets to a syscall handler.

Before this commit:
JIT->static HandleSyscall->SyscallHandler::HandleSyscall->SyscallHandler

After this commit:
JIT->SyscallHandler::HandleSyscall->SyscallHandler

A bit hard to notice this when this interface can spin at 67-million
calls per second though.
2023-12-21 01:55:02 -08:00
Ryan Houdek
f9902142f7 Utils: Add ability to get VTable entries to PMF helper
This will be useful to remove an indirection.
2023-12-21 01:55:02 -08:00
Ryan Houdek
9e5d7aa5fe OpcodeDispatcher: Initial support for runtime long-mode switch
This has the Frontend and OpcodeDispatcher select their operating mode
depending on the incoming code segment long-mode flag.

Adds some asserts since currently it is unexpected if the configuration
changes at runtime.

This is fairly straightforward for an initial setup but isn't fully
fleshed out.

Right now FEX's x86 tables aren't set up in a way to support choosing a
different instruction decoding depending on runtime operating mode
change, so that would break in interesting ways.

Primarily this just gets FEX set up to start piping the operating mode
through from the frontend to the backend. This is a long term task, so
it is going to take a long time to iron out all the issues.
2023-12-21 01:54:19 -08:00
Ryan Houdek
8648fb1485 FEXCore: Accurately store segment descriptors
Previously we were only storing the 32-bit base address which isn't
actually how segment descriptors work.

In reality segment descriptors are 64-bit descriptors that are laid out
in a particular layout depending on the 4-bit type value. We
only care about code and data segment layouts since the rest are
bonkers.

Describe these descriptors correctly and set up a default code descriptor
for the operating mode that FEX is starting in.
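
For reference, a sketch of how the fields of a 64-bit code/data segment descriptor are scattered, following the standard x86 descriptor layout. The struct name and accessors are hypothetical and do not mirror FEXCore's own definitions.

```cpp
#include <cstdint>

// Field extraction for a 64-bit x86 code/data segment descriptor.
struct SegmentDescriptorSketch {
  uint64_t raw;

  // Base[23:0] lives in bits 16..39, Base[31:24] in bits 56..63.
  constexpr uint32_t Base() const {
    return static_cast<uint32_t>(((raw >> 16) & 0xFFFFFFull) |
                                 (((raw >> 56) & 0xFFull) << 24));
  }

  // Limit[15:0] lives in bits 0..15, Limit[19:16] in bits 48..51.
  constexpr uint32_t Limit() const {
    return static_cast<uint32_t>((raw & 0xFFFFull) |
                                 (((raw >> 48) & 0xFull) << 16));
  }

  // 4-bit type field in bits 40..43.
  constexpr uint8_t Type() const {
    return static_cast<uint8_t>((raw >> 40) & 0xFull);
  }

  // L flag (bit 53): set for a 64-bit code segment.
  constexpr bool LongMode() const {
    return ((raw >> 53) & 1ull) != 0;
  }
};
```

With the commonly used flat 64-bit code descriptor value `0x00AF9A000000FFFF`, `Base()` is 0, `Limit()` is `0xFFFFF`, `Type()` is `0xA`, and `LongMode()` is true, which shows why storing only a 32-bit base loses information.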
2023-12-21 01:54:18 -08:00
Ryan Houdek
8b24f7fc26 Externals: Update xbyak to v7.02 and switch away from fork
The last few patches we need have been upstreamed so we shouldn't need
our downstream fork anymore.
2023-12-21 01:52:05 -08:00
Ryan Houdek
00669a1c89
Merge pull request #3336 from Sonicadvance1/warn_on_mdwe
FEXCore: Warn if MDWE is set
2023-12-20 13:27:43 -08:00
Ryan Houdek
1cedc3d85a FEXCore: Warn if MDWE is set
This will result in FEX not being able to allocate executable memory.
We can use shared memory in the future to work around this but for now
we don't support that as a fix.
2023-12-20 13:19:28 -08:00
Ryan Houdek
58f2693954 FEXLoader: Moves thread management to the frontend
Lots going on here.

This moves OS thread object lifetime management and internal thread
state lifetime management to the frontend. This causes a bunch of thread
handling to move from the FEXCore Context to the frontend.

Looking at `FEXCore/include/FEXCore/Core/Context.h` really shows how
much of the API has moved to the frontend that FEXCore no longer needs
to manage. Primarily this makes FEXCore itself no longer need to care
about most of the management of the emulation state.

A large amount of the behaviour moved wholesale from Core.cpp to
LinuxEmulation's ThreadManager.cpp, which manages the lifetimes of
both the OS threads and the FEXCore thread state objects.

One feature lost was the instruction capability, but this was already
buggy and is going to be rewritten/fixed when gdbserver work continues.

Now that all of this management is moved to the frontend, the gdbserver
can start improving since it can start managing all thread state
directly.
2023-12-19 17:43:04 -08:00
Mai
b4b8e81f24
Merge pull request #3321 from Sonicadvance1/thread_frontend_ownership_take2
FEXCore: Changes ParentThread ownership from the CTX to the frontend, take 2
2023-12-19 20:37:59 -05:00
Ryan Houdek
a8f797d36b Dispatcher: Convert GetCompileBlockPtr to using PMF helper
This was older code that was written before the PMF helper was
available.
Switch it over.
2023-12-19 17:20:37 -08:00
Ryan Houdek
93ec676ce8
Merge pull request #3340 from Sonicadvance1/exitfunctionlink_data
FEXCore: Describe exit function linking object with a structure
2023-12-19 16:17:21 -08:00
Mai
3d2cbc5d08
Merge pull request #3317 from Sonicadvance1/fix_imul_flags2
OpcodeDispatcher: Fixes flags generation in imul
2023-12-19 11:43:21 -05:00
Mai
81c85d73b2
Merge pull request #3330 from Sonicadvance1/optimize_sib_addr_calc
OpcodeDispatcher: Optimize SIB addr calculation
2023-12-19 11:41:42 -05:00
Mai
5b4e9c6907
Merge pull request #3323 from Sonicadvance1/remove_unused_check
Dispatcher: Removes unused asserting CompileBlock function
2023-12-19 11:38:57 -05:00
Ryan Houdek
aa2e8704bc FEXCore: Changes ParentThread ownership from the CTX to the frontend, take 2
Similar to #3284 but works around some of the bugs that one introduced.

This is the minimal amount of changes to move the ownership from FEXCore
to the frontend. Since the frontends don't yet have a full thread state
tracking, there is an opaque pointer that needs to be managed.

In the followup commits this will be changed to have the syscall handler
to be the thread object manager.
2023-12-18 14:54:07 -08:00
Ryan Houdek
cf86ae6b65 FEXCore: Describe exit function linking object with a structure
Instead of just poking raw uint64_t data values, describe it with a
struct.

This will be read-only in the future.
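
A small sketch of the difference between poking raw `uint64_t` slots and naming them with a struct. The struct and field names below are hypothetical and chosen for illustration; the actual layout in FEXCore may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout: two 64-bit slots that were previously accessed by index.
struct ExitLinkSketch {
  uint64_t HostBranch;  // assumed: host address to branch to
  uint64_t GuestRIP;    // assumed: guest address being linked
};

static_assert(sizeof(ExitLinkSketch) == 2 * sizeof(uint64_t),
              "struct must overlay exactly two raw slots");
static_assert(offsetof(ExitLinkSketch, GuestRIP) == sizeof(uint64_t),
              "second slot must sit directly after the first");

// Reads the two raw slots through the named view instead of raw[0] / raw[1].
inline ExitLinkSketch ReadExitLink(const uint64_t* raw) {
  ExitLinkSketch link;
  std::memcpy(&link, raw, sizeof(link));
  return link;
}
```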
2023-12-18 13:31:53 -08:00
Ryan Houdek
86654907bf
Merge pull request #3334 from Sonicadvance1/remove_old_x86jit_references
FEXCore: Removes stale references to x86 JIT
2023-12-18 04:15:29 -08:00
Ryan Houdek
12b72f908b
Merge pull request #3335 from Sonicadvance1/remove_internalthreadstate_header
FEXCore: Removes old InternalThreadState header
2023-12-18 04:14:57 -08:00
Ryan Houdek
6c8a54ff84 FEXCore: Removes old InternalThreadState header
This was a temporary header to help with when this header was migrated
to our public API headers.

Its temporary nature is no longer necessary, just get rid of it.
2023-12-15 18:51:25 -08:00
Ryan Houdek
bcc2901d7f FEXCore: Removes stale references to x86 JIT
It doesn't exist anymore.
2023-12-15 18:46:57 -08:00
Ryan Houdek
358bbb51ff CPUID: Removes Init and just uses constructor
No need to wait on initialization for this anymore.
Ever since Init was refactored to do basically no work, this hasn't been
necessary.

CPUID does need to still be initialized after HostFeatures though, so
need to ensure correct member ordering there.
2023-12-15 18:43:23 -08:00
Ryan Houdek
1a2f41922c OpcodeDispatcher: Optimize SIB addr calculation
When the address calculation for SIB has both index and base then we can
optimize this to an add with a shifted register. This will convert a
three instruction sequence into one instruction in most cases.
2023-12-15 13:08:46 -08:00
Ryan Houdek
0ede707e0b IR: Adds support for AddShift IR op
This matches x86 SIB's operation of `scale * index + base`
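
The arithmetic can be written out as a reference function. This is only a sketch of the semantics; the function name is hypothetical and it is not the IR op's implementation. The scale in a SIB byte is 1, 2, 4, or 8, so it is expressed here as a 2-bit shift amount.

```cpp
#include <cstdint>

// Reference semantics for an x86 SIB address: base + index * scale,
// with scale encoded as a shift of 0..3 (scale = 1 << scale_shift).
constexpr uint64_t SibAddress(uint64_t base, uint64_t index, unsigned scale_shift) {
  return base + (index << (scale_shift & 3u));
}
```

On AArch64 this shape corresponds to a single add with a shifted register operand, for example `add x0, x1, x2, lsl #3` for a scale of 8.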
2023-12-15 13:08:06 -08:00
Ryan Houdek
12923ba1b7
Merge pull request #3322 from Sonicadvance1/remove_unused_exithandler
PassManager: Removes unused exit handler
2023-12-14 01:55:06 -08:00
Ryan Houdek
e657a27607 Dispatcher: Removes unused asserting CompileBlock function
While we were calling this function, its asserting nature hasn't been
used for a long time.

This used to trigger more frequently when CompileBlock would fail to
compile code, either due to not being able to decode an instruction or
hitting an instruction that FEX doesn't understand.

When these cases are hit today we still generate code blocks which
generate SIGILL. This means that this code was actually never hit.

Completely remove this function and have the JIT's dispatcher call the
CompileBlock function directly. Signature is slightly different since we
need to set x3 to be 0.
2023-12-11 17:20:51 -08:00
Ryan Houdek
8bb5462554 PassManager: Removes unused exit handler
git blame shows that 718b3e6b4c added this
handler.

It doesn't explain why this was desired but it was never wired up to
anything. Just remove it.
2023-12-11 17:14:41 -08:00
Ryan Houdek
98f21a2a28 X86Tables: Converts tables to be mostly consteval
Reduces the ELF's VM size from 9.8MB down to 9.37MB and should reduce
initialization time a smidge.

Slammed this out while waiting for other PRs to get reviewed.
2023-12-11 10:03:52 -08:00
Ryan Houdek
5660065eea FEXCore: Moves OS thread creation to the frontend
Fairly lightweight since it is almost 1:1 transplanting the code from
FEXCore into the SyscallHandler's thread creation code.

Minor changes:
- ExecutionThreadHandler gets freed before executing the thread
   - Saves 16 bytes of memory per thread
- Start all threads paused by default
   - Since I moved the code to the frontend, I noticed we needed to do
     some post thread-creation setup.
   - Without the pause we were racing code execution with TLS setup and
     a few other things.
2023-12-11 06:22:50 -08:00
Ryan Houdek
7524029a06
Merge pull request #3294 from Sonicadvance1/mov_xid_check
FEXCore: Moves XID check to the frontend
2023-12-07 01:17:45 -08:00
Ryan Houdek
5c6f229e76 OpcodeDispatcher: Fixes flags generation in imul
On overflow with 32-bit we weren't setting the flags correctly.
2023-12-07 01:08:02 -08:00
Ryan Houdek
acdb4c7061 IR: Adds support for {S,U}Mull
Lets us do a 32-bit multiply returning a 64-bit result, signed and
unsigned.
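
A reference sketch of the widening multiplies, plus one illustrative use: x86 `imul` sets CF and OF when the full signed product does not fit in the destination width, which a 64-bit product makes easy to check for 32-bit operands. The function names are hypothetical and this is not the IR implementation.

```cpp
#include <cstdint>

// 32-bit x 32-bit -> 64-bit signed multiply.
constexpr int64_t SMull(int32_t a, int32_t b) {
  return static_cast<int64_t>(a) * static_cast<int64_t>(b);
}

// 32-bit x 32-bit -> 64-bit unsigned multiply.
constexpr uint64_t UMull(uint32_t a, uint32_t b) {
  return static_cast<uint64_t>(a) * static_cast<uint64_t>(b);
}

// True when a 32-bit signed multiply overflows, i.e. when the full product
// cannot be represented as an int32_t. This is the condition under which
// x86 imul sets CF and OF for 32-bit operands.
constexpr bool ImulOverflows32(int32_t a, int32_t b) {
  const int64_t full = SMull(a, b);
  return full < INT32_MIN || full > INT32_MAX;
}
```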
2023-12-07 01:06:58 -08:00