611 Commits

Author SHA1 Message Date
Alyssa Rosenzweig
92211bf8c6 OpcodeDispatcher: Add AllowUpperGarbage option
To load 8-bit sources without bfe'ing for al/bl/cl if the caller knows it
doesn't need masking behaviour, but without lying about the size so the extract
for ah/bh/ch will still work properly.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-26 19:08:20 -04:00
Ryan Houdek
ca87d8688d
Merge pull request #3153 from alyssarosenzweig/opt/adcs
Use adcs
2023-09-26 09:57:01 -07:00
Ryan Houdek
e32601f49d
Merge pull request #3161 from neobrain/fix_ctest_silent_failures
unittests: Instruct CTest to print output from tests on failure
2023-09-26 08:26:15 -07:00
Tony Wasserka
f4dd456c80 unittests: Instruct CTest to print output from tests on failure 2023-09-26 17:16:28 +02:00
Alyssa Rosenzweig
7a06cc9727 IR: Use adcs/sbcs
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-26 09:06:46 -04:00
Alyssa Rosenzweig
5facb21d30 OpcodeDispatcher: Don't mask small add/sub carries
For the GPR result, the masking already happens as part of the bfi. So the only
point of masking is for the flag calculation. But actually, every flag except
carry will ignore the upper bits anyway. And the carry calculation actually
WANTS the upper bit as a faster impl.

Deletes a pile of code both in FEX and the output :-)

ADC/SBC could probably get similar treatment later.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-25 18:25:30 -04:00
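A minimal sketch of the carry argument in 5facb21d30, assuming both 8-bit sources are zero-extended in a wider register. The function name is hypothetical and this is not FEX code. The unmasked sum keeps the carry in the bit just above the operand width, so it can be read directly; only the value written back needs masking.

```python
def add8(a, b):
    """Add two zero-extended 8-bit values; return (8-bit result, carry flag)."""
    wide = a + b             # unmasked sum, as it would sit in a wider register
    carry = (wide >> 8) & 1  # the bit just above the operand width is the carry
    result = wide & 0xFF     # masking is only needed for the value written back
    return result, carry
```

For example, add8(0xFF, 0x01) yields a zero result with the carry set.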
Ryan Houdek
234e029391
Merge pull request #3145 from Sonicadvance1/optimize_inline_calls
PassManager: Optimize out CPUID and XGetBV calls
2023-09-24 18:09:18 -07:00
Alyssa Rosenzweig
c8519b0b87 OpcodeDispatcher: Remove LoadPF
Now unused, its former users all prefer LoadPFRaw since they can fold in some of
this math into the use.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-24 20:59:28 -04:00
Alyssa Rosenzweig
68d32ad70d OpcodeDispatcher: Optimize PF in lahf
Use the raw popcount rather than the final PF and use some sneaky bit math to
come out 1 instruction ahead.

Closes #3117

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-24 20:59:28 -04:00
Alyssa Rosenzweig
1f02a6da34 IR: Add Ornror op
Mostly copypaste of Orlshl... we really should deduplicate this mess somehow.
Maybe a shift enum on the core Or op?

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-24 20:47:50 -04:00
Alyssa Rosenzweig
86063411dc Revert "OpcodeDispatcher: Use plain Lshl for flags"
This logic is unused since 8adfaa9aa ("OpcodeDispatcher: Use SelectCC for x87"),
which addressed the underlying issue.

This reverts commit df3833edbe3d34da4df28269f31340076238e420.
2023-09-24 20:47:50 -04:00
Ryan Houdek
9968e6431f Passes: Rename SyscallOptimization
This is now inlining multiple external calls out of the JIT. Rename it
to InlineCallOptimization.
2023-09-24 17:25:38 -07:00
Ryan Houdek
ff24f64b2a PassManager: Optimize out CPUID and XGetBV calls
If we const-prop the required functions and leafs then we can directly
encode the CPUID information rather than jumping out of the JIT.
In testing almost all CPUID executions const-prop which function is
getting called. Worst case that I found was only 85% const-prop rate.

This isn't quite 100% optimal since we need to call the RCLSE and
Constprop passes after we optimize these, which would remove some
redundant moves.

Sadly there seems to be a bug in the constprop pass that starts crashing
applications if that is done.
This is easily tested by running Half-Life 2, which immediately hits
SIGILL.

Even without this optimization, this is still a significant saving since
we aren't jumping out of the JIT anymore for these optimized CPUIDs.
2023-09-24 17:25:38 -07:00
Ryan Houdek
e9a7ef2534 CPUID: Describe CPUID functions if they return constant state or not
Most CPUID routines return constant data, there are four that don't.
Some CPUID functions also need the leaf descriptor, so we need to
describe that as well.

Functions that don't return constant data:
- function 1Ah - Returns different data depending on current CPU core
- function 8000_000{2,3,4} - Different data based on CPU core

Functions that need leaf constprop:
- 4h, 7h, Dh, 4000_0001h, 8000_001Dh
2023-09-24 17:25:38 -07:00
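The classification in e9a7ef2534 can be sketched as a lookup. The function numbers come from the commit message above; the set and function names are hypothetical and do not reflect the actual CPUIDEmu interface.

```python
# Functions whose result depends on the current CPU core (from the commit message).
NON_CONSTANT_FUNCTIONS = {0x1A, 0x8000_0002, 0x8000_0003, 0x8000_0004}
# Functions that also need a known leaf before the result is known.
NEEDS_LEAF = {0x4, 0x7, 0xD, 0x4000_0001, 0x8000_001D}

def can_fold_cpuid(function, leaf=None):
    """Return True if a CPUID call with a known function (and optionally a
    known leaf) could be replaced by constant data."""
    if function in NON_CONSTANT_FUNCTIONS:
        return False
    if function in NEEDS_LEAF and leaf is None:
        return False
    return True
```

Under these assumptions, function 7h folds only when the leaf is also known, and function 1Ah never folds.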
Ryan Houdek
842c57e221 CPUID: Constify some functions
These don't modify CPUIDEmu state.
2023-09-24 17:25:38 -07:00
Alyssa Rosenzweig
8798e0cba0 Arm64: Rewrite Set/GetRoundingMode
I went auditing for places to use cset and what I found was hot garbage.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-24 19:52:35 -04:00
Alyssa Rosenzweig
c5fc03dac4 OpcodeDispatcher: Use cset for blsr/etc flags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-24 19:52:35 -04:00
Alyssa Rosenzweig
e63871ed2e OpcodeDispatcher: Handle sub in CalculateOF
Gets us the constant source optimization without more code duplication. And
honestly I prefer the combined presentation.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-24 19:52:35 -04:00
Alyssa Rosenzweig
ea8b7633eb OpcodeDispatcher: Optimize OF calc of immediates
If we know the sign of one of the sources, we can do better when calculating OF.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-24 18:16:09 -04:00
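One way to see why a known sign helps in ea8b7633eb, as a sketch for addition only. The helper names are hypothetical and the simplified forms are derived here from the textbook overflow formula; they are not taken from the FEX source. With the sign of the immediate fixed, one XOR drops out of the calculation.

```python
def of_add_generic(a, b, bits=8):
    """Textbook signed-overflow test: sources agree in sign, result differs."""
    sign = 1 << (bits - 1)
    res = (a + b) & ((1 << bits) - 1)
    return bool(~(a ^ b) & (a ^ res) & sign)

def of_add_known_sign(a, imm, bits=8):
    """Same flag, specialised on the (known) sign of the immediate."""
    sign = 1 << (bits - 1)
    res = (a + imm) & ((1 << bits) - 1)
    if imm & sign:
        # Negative immediate: overflow only if a is negative and res is not.
        return bool(a & ~res & sign)
    # Non-negative immediate: overflow only if a is non-negative and res is negative.
    return bool(~a & res & sign)
```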
Ryan Houdek
9ab2967d71 Arm64: Fixes wide shifts
movprfx is invalid to use when the source register matches the movprfx
destination.

This was getting picked up on by `TwoByte/0F_D1.asm` now that RCLSE is
working better.
2023-09-23 06:06:18 -07:00
Ryan Houdek
d01b457727 RCLSE: Optimize redundant store->load operations
The bug that was causing crashes with this was due to inline syscalls.
Now that this is fixed we can re-enable store->load operations.

This allows constant propagation to work significantly better, which
means inline syscalls start working again. This can significantly
improve syscall performance in some cases.

This is most likely to improve performance in dxsetup and vc_redist but
hard to get a real profile.

Additionally this will let us inline cpuid results in the future which
is pretty nice.
2023-09-23 06:06:18 -07:00
Mai
4e9a114858
Merge pull request #3142 from Sonicadvance1/inline_syscall_fix
Arm64: Fixes inline syscalls
2023-09-23 09:03:49 -04:00
Mai
72d092e951
Merge pull request #3141 from Sonicadvance1/fix_simm9_range
ConstProp: Fixes unscaled signed 9-bit range
2023-09-23 09:03:01 -04:00
Ryan Houdek
28fa0bda31 Arm64: Fixes inline syscalls
Ever since we reordered registers in `X86Enums.h` this has silently been
broken. This wasn't hit because RCLSE has been broken ever since SRA was
added, so inlinesyscalls just weren't ever happening.

Quick fix while I think of a way to more strictly correlate these
registers so it doesn't happen again.
2023-09-23 02:56:32 -07:00
Ryan Houdek
1f2a3cfa8b ConstProp: Fixes unscaled signed 9-bit range
The range was slightly incorrect which mostly wouldn't have caused
issues.

The lowest byte would have just generated slightly less optimal code.
The upper byte could have generated broken code, which our CI couldn't
catch since TSO instructions only get enabled when multiple threads are
in-flight.

Easy enough to fix.
2023-09-23 01:13:54 -07:00
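For reference, an unscaled signed 9-bit immediate covers -256 through 255 inclusive. A minimal range check might look like the following; the names are hypothetical, not the ConstProp code.

```python
SIMM9_MIN = -(1 << 8)      # -256
SIMM9_MAX = (1 << 8) - 1   # 255

def fits_simm9(offset):
    """Return True if offset is encodable as a signed 9-bit immediate."""
    return SIMM9_MIN <= offset <= SIMM9_MAX
```

An off-by-one at the lower end only rejects an encodable offset, while an off-by-one at the upper end accepts an offset that cannot be encoded.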
Ryan Houdek
571b0fe47e Config: Fixes core sanitization
This would have caused core to try and initialize a custom core on
Arm64, which causes a std::function assert because it doesn't support
that.

Users would likely get hit by this immediately since we deleted the
interpreter and shifted all the core numbers.
2023-09-23 00:52:23 -07:00
Alyssa Rosenzweig
223a6562ff IR: Support <32-bit TestNZ
Originally this was going to use setf8/setf16, but it looks like the approach of
shift-and-test turns out to be faster. As a bonus this is a nice delete-the-code
win :-)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-22 19:08:26 -04:00
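The shift-and-test idea in 223a6562ff can be modelled as follows, assuming a 32-bit test. The function name is hypothetical. Shifting the small value so that its top bit lands in bit 31 makes the ordinary 32-bit N and Z flags describe the small value, and it also discards any bits above the operand width.

```python
def flags_nz(value, size_bits):
    """Return (N, Z) for the low size_bits of value using a 32-bit test."""
    shifted = (value << (32 - size_bits)) & 0xFFFF_FFFF  # move operand to the top
    n = shifted >> 31           # sign bit of the small operand
    z = int(shifted == 0)       # zero if every bit of the small operand is clear
    return n, z
```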
Alyssa Rosenzweig
b1231c24ef OpcodeDispatcher: Omit AF xor for common constants
The only reason we need to XOR arguments for AF is to get bit 4 correct. But if
the operand in question is known to have bit 4 clear, the XOR will be an
effective no-op and can be skipped. This saves an instruction in a bunch of
common cases, like inc/dec. If we dedicated a register to AF to eliminate the
store, we would not save an instruction from this but would still come out ahead
due to an eor turning into a (zero cycle?) mov that can be handled by the
renamer.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-22 19:08:26 -04:00
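A quick check of the reasoning in b1231c24ef, written as a sketch with hypothetical helper names. AF is bit 4 of the XOR of both sources and the result; when one source has bit 4 clear, dropping it from the XOR leaves bit 4 unchanged.

```python
def af_full(a, b, res):
    """AF as bit 4 of a ^ b ^ res."""
    return ((a ^ b ^ res) >> 4) & 1

def af_skip_xor(a, res):
    """AF when the other source is known to have bit 4 clear."""
    return ((a ^ res) >> 4) & 1

def check_inc(a):
    """Compare both forms for an 8-bit inc, where the constant source is 1."""
    res = (a + 1) & 0xFF
    return af_full(a, 1, res) == af_skip_xor(a, res)
```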
Alyssa Rosenzweig
699aa85c4b OpcodeDispatcher: Opt PF selection
Fold the and in.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-22 19:07:42 -04:00
Alyssa Rosenzweig
2d65a3677b OpcodeDispatcher: Optimize NZCV selects
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-22 19:07:42 -04:00
Alyssa Rosenzweig
2a2619c0f5 IR: Add bit masking selects
Add new synthetic condition codes that do an AND as their relational operator,
testing the result. This is 1 IR op for things like

  (A & B) == 0 ? C : D

This can translate to

  tst A, B
  csel A, B, eq

In the future, if A is the NZCV register and B is a supported immediate, eg

  (NZCV & 0x80000000) == 0 ? C : D

this will be able to translate to a single instruction with the appropriate
condition

  csel dst, C, D, pl

but that needs RA support.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-22 19:07:42 -04:00
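The semantics of the new select in 2a2619c0f5 can be modelled in a few lines. This is a behavioural sketch with hypothetical names, not the IR definition. The NZCV case uses the N flag in bit 31, matching the 0x80000000 mask in the commit message.

```python
N_FLAG = 0x8000_0000  # N is bit 31 of NZCV

def select_and_zero(a, b, c, d):
    """Model of (A & B) == 0 ? C : D."""
    return c if (a & b) == 0 else d

def select_if_n_clear(nzcv, c, d):
    """The NZCV special case: pick c when N is clear (the 'pl' condition)."""
    return select_and_zero(nzcv, N_FLAG, c, d)
```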
Ryan Houdek
797c890ff6
Merge pull request #2874 from bylaws/wowfex
Add WOW64 JIT frontend
2023-09-22 15:47:59 -07:00
Ryan Houdek
0fbf403787 Adds back in host testharnessrunner CI
Necessary for asm tests to still run in the host "core".
Useful for ensuring correct behaviour of our assembly tests.
2023-09-22 14:46:03 -07:00
Billy Laws
51f8c83c76 Context: Add an alternative thread-oriented execute function 2023-09-22 10:12:40 -07:00
Billy Laws
d641d3f61e OpcodeDispatcher: Avoid redundantly passing args to WIN32 ABI syscalls 2023-09-22 10:12:39 -07:00
Ryan Houdek
b5cc9a12f2 FEXCore: Removes x86 JIT.
This is blocking performance improvements. This backend is almost
entirely unused except when I'm testing whether games run on Radeon
video drivers.

Hopefully AmpereOne and Orin/Grace can fulfill this role when they
launch next year.
2023-09-21 18:30:02 -07:00
Ryan Houdek
31564354b1 FEXCore: Removes vestigial Interpreter code 2023-09-21 15:49:49 -07:00
Ryan Houdek
fea72ce19c
Merge pull request #3120 from Sonicadvance1/more_optimal_x87
FEXCore: Support preserve_all ABI for interpreter fallbacks
2023-09-21 15:35:37 -07:00
Ryan Houdek
2b7e1d10ec
Merge pull request #3131 from Sonicadvance1/optimize_btr
OpcodeDispatcher: Optimize lock btr
2023-09-21 15:06:55 -07:00
Ryan Houdek
5444810d64
Merge pull request #3132 from alyssarosenzweig/opt/orlshl
Optimize reconstructing x87, harder
2023-09-21 15:02:37 -07:00
Ryan Houdek
1a4d1d820b OpcodeDispatcher: Optimize lock btr
This is an atomicFetchCLR, removes two mvn instructions that are back to
back negating the source.

We didn't have this instruction combination in InstCountCI so will be a
bit hard to see.
2023-09-21 14:54:51 -07:00
Ryan Houdek
0ae4bbb9c5 IR: Implements support for AtomicFetchCLR
This is the native ARM operation rather than fetchAnd. Will make an
instruction slightly more optimal.
2023-09-21 14:54:51 -07:00
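Why the two inversions cancel can be shown with a non-atomic model. All names are hypothetical and atomicity is deliberately not modelled. If the native primitive clears bits, a fetch-and has to invert its operand first; lock btr already wants to clear a bit, so going through fetch-and inverts the mask twice.

```python
MASK64 = (1 << 64) - 1

def fetch_clr(mem, addr, bits_to_clear):
    """Clear the given bits at addr and return the previous value."""
    old = mem[addr]
    mem[addr] = old & ~bits_to_clear & MASK64
    return old

def fetch_and(mem, addr, value):
    """AND expressed through the clear primitive: one inversion of the operand."""
    return fetch_clr(mem, addr, ~value & MASK64)

def lock_btr(mem, addr, bit):
    """Bit test and reset using the clear primitive directly, with no inversions."""
    old = fetch_clr(mem, addr, 1 << bit)
    return (old >> bit) & 1
```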
Alyssa Rosenzweig
c52741c813 FEXCore: Gut interpreter
It is scarcely used today, and like the x86 jit, it is a significant
maintenance burden complicating work on FEXCore and arm64 optimization. Remove
it, bringing us down to 2 backends.

1 down, 1 to go.

Some interpreter scaffolding remains for x87 fallbacks. That is not a problem
here.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-21 12:48:12 -04:00
Alyssa Rosenzweig
1596e33f58 OpcodeDispatcher: Remove pointless or
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-21 09:13:41 -04:00
Alyssa Rosenzweig
07d03f1610 OpcodeDispatcher: Don't opencode bfe, badly
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-21 09:13:41 -04:00
Alyssa Rosenzweig
a8b48dcacd OpcodeDispatcher: Swap some selects
...if it lets us use cset.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-21 09:13:41 -04:00
Alyssa Rosenzweig
bb87b2a19d OpcodeDispatcher: Use more Orlshl
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-21 09:13:41 -04:00
Alyssa Rosenzweig
19eff62c77 OpcodeDispatcher: Use orlshl for FCW
Potentially easier on the RA (bfi has a tied operand), mostly whatever here.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-21 08:55:25 -04:00
Mai
5fc8699db9
Merge pull request #3130 from Sonicadvance1/optimize_fsw
OpcodeDispatcher: Optimize reconstructing FSW
2023-09-21 08:35:16 -04:00
Ryan Houdek
5664195e49 OpcodeDispatcher: Optimize reconstructing FSW
Minor optimization using Bfi to insert C0, C1, C2, & C3
2023-09-21 02:07:27 -07:00