movprfx is invalid to use when the source register matches the movprfx
destination.
This is getting picked up by `TwoByte/0F_D1.asm` now that RCLSE is
working better.
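As a rough illustration of the constraint, here is a minimal sketch of how an emitter might decide whether a movprfx prefix is usable for a destructive SVE op. It assumes the op has the form `op Dst, Pg/M, Dst, Src2` with `Src1` copied into `Dst` first. `PrefixPlan` and `PlanDestructiveOp` are hypothetical names for this example, not FEX's actual helpers.

```cpp
#include <cstdint>

// Hypothetical plan for emitting `Dst = op(Src1, Src2)` with a destructive op.
enum class PrefixPlan {
  DirectOp,  // Dst already holds Src1, no prefix needed.
  Movprfx,   // `movprfx Dst, Src1` followed by the destructive op is valid.
  NeedsTemp, // Dst aliases the remaining source; movprfx would be invalid.
};

// Register numbers are plain indices here (assumption for illustration).
constexpr PrefixPlan PlanDestructiveOp(uint8_t Dst, uint8_t Src1, uint8_t Src2) {
  if (Dst == Src1) {
    return PrefixPlan::DirectOp;
  }
  if (Dst == Src2) {
    // The movprfx destination must not appear as another source operand.
    return PrefixPlan::NeedsTemp;
  }
  return PrefixPlan::Movprfx;
}

static_assert(PlanDestructiveOp(0, 0, 1) == PrefixPlan::DirectOp, "in-place op");
static_assert(PlanDestructiveOp(0, 1, 2) == PrefixPlan::Movprfx, "prefix is fine");
static_assert(PlanDestructiveOp(0, 1, 0) == PrefixPlan::NeedsTemp, "aliasing source");
```

In the `NeedsTemp` case a real emitter could copy through a scratch register, or swap the operands when the op is commutative.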
The bug that was causing crashes with this was due to inline syscalls.
Now that this is fixed we can re-enable store->load operations.
This allows constant propagation to work significantly better, which
means inline syscalls start working again. This can significantly
improve syscall performance in some cases.
This is most likely to improve performance in dxsetup and vc_redist, but
it is hard to get a real profile.
Additionally this will let us inline cpuid results in the future which
is pretty nice.
Ever since we reordered registers in `X86Enums.h` this has silently been
broken. This wasn't hit because RCLSE has been broken ever since SRA was
added, so inline syscalls just weren't ever happening.
Quick fix while I think of a way to more strictly correlate these
registers so it doesn't happen again.
The range was slightly incorrect which mostly wouldn't have caused
issues.
The lowest byte would have just generated slightly less optimal code.
The upper byte could have generated broken code, which our CI couldn't
catch since TSO instructions only get enabled when multiple threads are
in-flight.
Easy enough to fix.
This would have caused core to try and initialize a custom core on
Arm64, which causes a std::function assert because it doesn't support
that.
Users would likely get hit by this immediately since we deleted the
interpreter and shifted all the core numbers.
Originally this was going to use setf8/setf16, but it looks like the approach of
shift-and-test turns out to be faster. As a bonus this is a nice delete-the-code
win :-)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The only reason we need to XOR arguments for AF is to get bit 4 correct. But if
the operand in question is known to have bit 4 clear, the XOR will be an
effective no-op and can be skipped. This saves an instruction in a bunch of
common cases, like inc/dec. If we dedicated a register to AF to eliminate the
store, we would not save an instruction from this but would still come out ahead
due to an eor turning into a (zero cycle?) mov that can be handled by the
renamer.
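A small sketch of why the XOR can be dropped, assuming the usual x86 definition of AF as bit 4 of `A ^ B ^ Result`. The function names are made up for this example. For inc/dec the second operand is the constant 1, whose bit 4 is clear, so both forms agree.

```cpp
#include <cstdint>

// AF as bit 4 of (A ^ B ^ Result), the general form.
constexpr uint32_t AFFull(uint32_t A, uint32_t B, uint32_t Res) {
  return ((A ^ B ^ Res) >> 4) & 1;
}

// Shortcut form: valid only when B is known to have bit 4 clear.
constexpr uint32_t AFSkipXor(uint32_t A, uint32_t Res) {
  return ((A ^ Res) >> 4) & 1;
}

// Exhaustively compare both forms for 8-bit inc and dec (B == 1).
constexpr bool IncDecAgree() {
  for (uint32_t A = 0; A < 256; ++A) {
    const uint32_t Inc = (A + 1) & 0xFF;
    const uint32_t Dec = (A - 1) & 0xFF;
    if (AFFull(A, 1, Inc) != AFSkipXor(A, Inc)) return false;
    if (AFFull(A, 1, Dec) != AFSkipXor(A, Dec)) return false;
  }
  return true;
}

static_assert(IncDecAgree(), "XOR with an operand whose bit 4 is clear is a no-op for AF");
```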
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Add new synthetic condition codes that do an AND as their relational operator,
testing the result. This is 1 IR op for things like
    (A & B) == 0 ? C : D
This can translate to
    tst A, B
    csel C, D, eq
In the future, if A is the NZCV register and B is a supported immediate, e.g.
    (NZCV & 0x80000000) == 0 ? C : D
this will be able to translate to a single instruction with the appropriate
condition
    csel C, D, pl
but that needs RA support.
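For reference, the semantics of the new condition can be written out as a minimal sketch. The function names are hypothetical and only model the IR op's behaviour, they are not the IR interface itself. Bit 31 of NZCV is the N flag, which is why the second form maps to the `pl` condition.

```cpp
#include <cstdint>

// Models the "test is zero" select: (A & B) == 0 ? C : D.
constexpr uint64_t SelectIfTestZero(uint64_t A, uint64_t B, uint64_t C, uint64_t D) {
  return (A & B) == 0 ? C : D;
}

// Special case where A is the NZCV value and B selects the N flag (bit 31).
// N clear corresponds to the `pl` condition.
constexpr uint64_t SelectIfNClear(uint32_t NZCV, uint64_t C, uint64_t D) {
  return SelectIfTestZero(NZCV, 0x80000000u, C, D);
}

static_assert(SelectIfTestZero(0b1010, 0b0101, 1, 2) == 1, "no common bits selects C");
static_assert(SelectIfTestZero(0b1010, 0b0010, 1, 2) == 2, "common bit selects D");
```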
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This is blocking performance improvements. This backend is almost
universally unused except for when I'm testing if games run on Radeon
video drivers.
Hopefully AmpereOne and Orin/Grace can fulfill this role when they
launch next year.
This is an atomicFetchCLR, which removes two back-to-back mvn instructions
that negate the source.
We didn't have this instruction combination in InstCountCI, so the
improvement will be a bit hard to see.
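The identity behind this can be sketched with `std::atomic`, assuming a fetch-clear that behaves like the Arm LSE `ldclr` family (memory becomes `old & ~Bits`, old value returned). `FetchClr` and `FetchAndInverted` are names invented for this example. An atomic AND whose mask is already an inverted value would otherwise invert once for the mask and once more for the clear, and the two inversions cancel.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Models a fetch-clear: memory becomes (old & ~Bits), returns the old value.
inline uint32_t FetchClr(std::atomic<uint32_t>& Mem, uint32_t Bits) {
  return Mem.fetch_and(~Bits);
}

// Atomic AND with the mask ~Src, written via fetch-clear.
// Naively this is FetchClr(Mem, ~(~Src)); the double negation folds away.
inline uint32_t FetchAndInverted(std::atomic<uint32_t>& Mem, uint32_t Src) {
  return FetchClr(Mem, Src);
}

// Small self-check comparing the folded form against a plain fetch_and.
inline void FetchClrExample() {
  std::atomic<uint32_t> A{0xFFu};
  std::atomic<uint32_t> B{0xFFu};
  const uint32_t Src = 0x0Fu;
  const uint32_t OldA = FetchAndInverted(A, Src);
  const uint32_t OldB = B.fetch_and(~Src);
  assert(OldA == OldB);
  assert(A.load() == B.load());
}
```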
It is scarcely used today, and like the x86 jit, it is a significant
maintenance burden complicating work on FEXCore and arm64 optimization. Remove
it, bringing us down to 2 backends.
1 down, 1 to go.
Some interpreter scaffolding remains for x87 fallbacks. That is not a problem
here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>