For the GPR result, the masking already happens as part of the bfi. So the only
point of masking is for the flag calculation. But every flag except carry
ignores the upper bits anyway, and the carry calculation actually wants the
upper bit, since reading it directly is a faster implementation.
Deletes a pile of code both in FEX and the output :-)
ADC/SBC could probably get similar treatment later.
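
A minimal sketch of the carry argument, assuming the op is an 8-bit add done in a wider register. `Add8` and `Add8Result` are hypothetical names for illustration, not FEX code:

```cpp
#include <cstdint>

// Hypothetical model of an 8-bit x86 ADD computed in a 32-bit register.
struct Add8Result {
  uint8_t Value;  // what lands in the low byte of the GPR
  bool Carry;     // x86 CF
  bool Zero;      // x86 ZF
};

constexpr Add8Result Add8(uint8_t A, uint8_t B) {
  // Add at 32 bits and deliberately leave the sum unmasked.
  const uint32_t Wide = uint32_t{A} + uint32_t{B};

  // CF is bit 8 of the unmasked sum; masking first would throw it away.
  const bool Carry = ((Wide >> 8) & 1) != 0;

  // The truncation stands in for the insert into the low byte of the GPR.
  const uint8_t Value = static_cast<uint8_t>(Wide);

  // ZF only looks at the low 8 bits, so the upper bits never matter here.
  return {Value, Carry, Value == 0};
}
```

With this model, `Add8(0xFF, 0x01)` produces a zero value with the carry set, and the carry comes straight from the bit that masking would have cleared.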
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Now unused. Its former users all prefer LoadPFRaw since they can fold some of
this math into the use.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Use the raw popcount rather than the final PF and use some sneaky bit math to
come out 1 instruction ahead.
Closes #3117
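
As a rough illustration of why the raw popcount is handy (this is not the actual bit trick from the change, which the message does not spell out): x86 PF is set when the low byte has an even number of set bits, so the final PF is the inverted low bit of the popcount. A consumer that wants "parity odd" can use the raw low bit and skip the inversion. All names below are hypothetical:

```cpp
#include <cstdint>

// Count the set bits in the low byte only; x86 PF ignores everything above.
constexpr uint32_t Popcount8(uint32_t Result) {
  uint32_t Count = 0;
  for (uint32_t I = 0; I < 8; ++I) {
    Count += (Result >> I) & 1;
  }
  return Count;
}

// Final PF: set when the number of set bits is even, so invert the low bit.
constexpr bool ParityFlag(uint32_t Result) {
  return ((Popcount8(Result) & 1) ^ 1) != 0;
}

// A "parity odd" consumer can read the raw low bit directly, no inversion.
constexpr bool ParityOdd(uint32_t Result) {
  return (Popcount8(Result) & 1) != 0;
}
```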
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Mostly copypaste of Orlshl... we really should deduplicate this mess somehow.
Maybe a shift enum on the core Or op?
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This logic is unused since 8adfaa9aa ("OpcodeDispatcher: Use SelectCC for x87"),
which addressed the underlying issue.
This reverts commit df3833edbe3d34da4df28269f31340076238e420.
If we const-prop the required functions and leaves then we can directly
encode the CPUID information rather than jumping out of the JIT.
In testing almost all CPUID executions const-prop which function is
getting called. Worst case that I found was only 85% const-prop rate.
This isn't quite optimal: removing some redundant moves would require
running the RCLSE and Constprop passes again after we optimize these.
Sadly there seems to be a bug in the constprop pass that starts crashing
applications if that is done.
This is easy to reproduce: running Half-Life 2 immediately hits SIGILL.
Even without this optimization, this is still a significant saving since
we aren't jumping out of the JIT anymore for these optimized CPUIDs.
Most CPUID functions return constant data; there are four that don't.
Some CPUID functions also need the leaf descriptor, so we need to
describe that as well.
Functions that don't return constant data:
- function 1Ah - Returns different data depending on current CPU core
- function 8000_000{2,3,4} - Different data based on CPU core
Functions that need leaf constprop:
- 4h, 7h, Dh, 4000_0001h, 8000_001Dh
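
The two lists above could be encoded roughly like this. This is a sketch only: `CPUIDInfo`, `Classify` and `CanInline` are hypothetical names, and treating every unlisted function as constant with no leaf dependency is an assumption made for the example:

```cpp
#include <cstdint>

struct CPUIDInfo {
  bool Constant;   // result does not depend on the current CPU core
  bool NeedsLeaf;  // result also depends on the leaf (ECX) value
};

constexpr CPUIDInfo Classify(uint32_t Function) {
  switch (Function) {
    // Functions that return different data depending on the CPU core.
    case 0x1A:
    case 0x8000'0002:
    case 0x8000'0003:
    case 0x8000'0004:
      return {false, false};
    // Constant functions that also need the leaf to be const-propped.
    case 0x4:
    case 0x7:
    case 0xD:
    case 0x4000'0001:
    case 0x8000'001D:
      return {true, true};
    // Assumption for this sketch: everything else is constant, no leaf.
    default:
      return {true, false};
  }
}

// Function is assumed to be a const-propped value already; LeafIsConstant
// says whether the leaf was const-propped too.
constexpr bool CanInline(uint32_t Function, bool LeafIsConstant) {
  const CPUIDInfo Info = Classify(Function);
  return Info.Constant && (!Info.NeedsLeaf || LeafIsConstant);
}
```

Anything for which `CanInline` returns false would keep taking the existing path out of the JIT.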
Gets us the constant source optimization without more code duplication. And
honestly I prefer the combined presentation.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This option was disabled a few months ago when we switched the server
socket from a filesystem unix socket to an abstract socket.
This partially broke our chroot scripts which relied on this option
existing.
Re-adds support for an explicitly named abstract socket, with the name
taken from config.
This is a workaround for dealing with chroots that change users.
They end up changing users while doing operations and then can't
connect to the FEXServer anymore because environment variables have been
wiped away.
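
For reference, a minimal sketch of how a named abstract socket address is built on Linux. `MakeAbstractAddress` is a hypothetical helper and the socket name in the usage note is a placeholder, not FEX's real config value:

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>
#include <sys/socket.h>
#include <sys/un.h>

// Fills Addr for an abstract unix socket called Name.
// Returns the length to pass to bind()/connect(), or 0 if Name is empty
// or does not fit in sun_path.
inline socklen_t MakeAbstractAddress(std::string_view Name, sockaddr_un* Addr) {
  if (Name.empty() || Name.size() + 1 > sizeof(Addr->sun_path)) {
    return 0;
  }

  std::memset(Addr, 0, sizeof(*Addr));
  Addr->sun_family = AF_UNIX;

  // A leading NUL byte in sun_path selects the abstract namespace, so the
  // name starts at index 1 and nothing is created on the filesystem.
  std::memcpy(Addr->sun_path + 1, Name.data(), Name.size());

  // The length covers the family, the leading NUL and the name bytes only.
  return static_cast<socklen_t>(offsetof(sockaddr_un, sun_path) + 1 + Name.size());
}
```

A client that lost its environment can still connect as long as it knows the name, for example by passing the result of `MakeAbstractAddress("example.socket", &Addr)` to `connect()`.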
movprfx is invalid to use when the source register matches the movprfx
destination.
This was getting caught by `TwoByte/0F_D1.asm` now that RCLSE is
working better.
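
A sketch of the kind of check this implies when lowering a three-operand IR op `Dst = Op(Src1, Src2)` to a destructive SVE instruction. The `Plan` enum and `ChoosePlan` are hypothetical, and the temporary-register fallback is an assumption rather than what FEX does:

```cpp
#include <cstdint>

// Hypothetical lowering plan for Dst = Op(Src1, Src2) where the hardware
// instruction is destructive: Zdn = Op(Zdn, Zm).
enum class Plan {
  InPlace,      // Dst already holds Src1: op Dst, Dst, Src2
  Movprfx,      // movprfx Dst, Src1 ; op Dst, Dst, Src2
  CopyViaTemp,  // Dst aliases Src2: movprfx is not allowed here
};

constexpr Plan ChoosePlan(uint32_t Dst, uint32_t Src1, uint32_t Src2) {
  if (Dst == Src1) {
    return Plan::InPlace;
  }
  // The movprfx destination must not also be another source operand of the
  // prefixed instruction, so this case needs a different sequence.
  if (Dst == Src2) {
    return Plan::CopyViaTemp;
  }
  return Plan::Movprfx;
}
```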
The bug that was causing crashes with this was due to inline syscalls.
Now that this is fixed we can re-enable store->load operations.
This allows constant propagation to work significantly better, which
means inline syscalls start working again. This can significantly
improve syscall performance in some cases.
This is most likely to improve performance in dxsetup and vc_redist, but
it is hard to get a real profile.
Additionally this will let us inline cpuid results in the future which
is pretty nice.