682 Commits

Author SHA1 Message Date
Ryan Houdek
cb56728e57 FEXCore/X86Tables: Removes unused supports REP flag
This flag was being set in the tables but was actually unused.
2023-11-25 16:54:51 -08:00
Ryan Houdek
b89c3a4573 FEXCore: Removes x86 DebugInfo table
This has long been unused. It was originally implemented for some fuzzing
tests, but it has been abandoned, and that testing should likely be
implemented some other way.
2023-11-25 16:50:24 -08:00
Ryan Houdek
98f9a65202 Merge pull request #3275 from bylaws/x87fix
FEXCore: Work around broken preserve_all support in Windows clang
2023-11-22 05:53:14 -08:00
Ryan Houdek
8726c8fb73 Merge pull request #2691 from neobrain/refactor_scoped_signal_mask
ScopedSignalMask: Clean up API and use std::unique_lock/shared_lock
2023-11-19 04:53:05 -08:00
Tony Wasserka
d33b0cb9e3 SignalScopeGuards: Improve code gen for GuardSignalDeferringSectionWithFallback
2023-11-18 12:10:02 +01:00
Ryan Houdek
11993daec4 FEXCore: Hides eflags reconstruction information in the core
The frontend shouldn't need to know any information about how to
reconstruct eflags. Just give us the information we need and it'll work
out.
There are still some inherent limitations of this and some edge cases
that might give invalid data, but it is roughly as close as it was
before.

Just provide whether the PC was in the JIT, the host GPRs, and the PState object from the signal
information, and FEXCore does the rest.

We don't need to change the signature for `SetFlagsFromCompactedEFLAGS`
because reloading the register state does this for us automatically.
2023-11-17 20:38:42 -04:00
Billy Laws
05b78339f6 FEXCore: Work around broken preserve_all support in Windows clang
While clang side code to support preserve_all on Windows is in place
(and thus there are no errors for using it), there are still some parts
missing on the LLVM side. [1]

[1] 2402b14046/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (L88)
2023-11-17 22:13:39 +00:00
Alyssa Rosenzweig
f60608a9c0 OpcodeDispatcher: allow garbage for 32bit inc/dec
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
153d871be2 OpcodeDispatcher: allow garbage for 32-bit cmp
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
3dfb94b524 OpcodeDispatcher: use axflag for x87-f64 fcmp
only 1 instr saved but meh.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
d1e43d94e9 OpcodeDispatcher: use axflag for fcmp faster
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
1b490e0e53 CodeEmitter: add ax/xaflag
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
094146d630 IR: add axflag
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
23c2a53683 OpcodeDispatcher: move fcmp flag fixup to dispatcher
simpler *and* much faster

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
82b7689ca4 OpcodeDispatcher: remove fcmp deferral
no longer load bearing, delete the abstraction.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
149f3e6f6d OpcodeDispatcher: rm flagsOp unused since select rework
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
2dcae23776 Arm64Emitter: Dedicate registers for PF/AF
Many flag-generating instructions like cmp need to save calculations for
deferred PF and AF flag calculation. Currently, they require a store per flag,
which is prohibitively expensive for hot instructions like cmp. By instead
pinning PF/AF temporary results to registers (x26/x27 by convention here), we
eliminate many stores altogether and turn the rest into zero-cycle moves (on
64-bit at least; this isn't optimal for 32-bit emulation due to CTX->GetGPRSize
shenanigans, and we still need to check whether this requirement can be lifted).

To implement, we model as SRA and then the existing SRA code is able to generate
good code with little manual tuning. (Future work will get us to excellent code
with more tuning ;) ).

The tradeoff is reducing the working dynamic GPR set by 2 registers, which might
increase spilling in some cases. I think it's worth it in practice, though.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
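
The deferred PF/AF idea described in 2dcae23776 can be modelled in plain C++. This is only a sketch under stated assumptions: `DeferredFlags`, `RecordSub`, `ComputePF` and `ComputeAF` are hypothetical names, the two fields stand in for the pinned registers, and the exact raw values FEXCore keeps may differ. It shows that a cmp-like operation only needs to record raw values, with the actual flag bits derived later on demand.

```cpp
#include <cstdint>

// Hypothetical container standing in for the two pinned registers.
struct DeferredFlags {
  uint64_t PFRaw = 0; // raw result; only the low byte matters for PF
  uint64_t AFRaw = 0; // assumed encoding: a ^ b ^ result; bit 4 is the aux carry
};

// Model of a flag-generating subtraction such as cmp: record raw values
// instead of computing PF/AF eagerly.
inline void RecordSub(DeferredFlags& Flags, uint64_t A, uint64_t B) {
  const uint64_t Result = A - B;
  Flags.PFRaw = Result;
  Flags.AFRaw = A ^ B ^ Result;
}

// x86 PF is set when the low byte of the result has an even number of set bits.
inline bool ComputePF(const DeferredFlags& Flags) {
  unsigned V = static_cast<unsigned>(Flags.PFRaw & 0xFF);
  V ^= V >> 4; // fold the parity of the byte down into bit 0
  V ^= V >> 2;
  V ^= V >> 1;
  return (V & 1) == 0;
}

// x86 AF is the carry or borrow between bit 3 and bit 4.
inline bool ComputeAF(const DeferredFlags& Flags) {
  return ((Flags.AFRaw >> 4) & 1) != 0;
}
```

In this model a hot cmp costs two plain value updates, and the parity fold only runs when something actually reads PF.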
Tony Wasserka
92e4e75217 Merge DeferredSignalMutex.h and ScopedSignalMask.h into a single file
Using a single file makes sense now that the individual files are much
shorter and share common utility classes.
2023-11-17 10:56:34 +01:00
Tony Wasserka
5ca35bf77c ForkableMutex: Simplify WIN32 implementation
2023-11-17 10:56:34 +01:00
Tony Wasserka
c956b82d27 ScopedSignalMask/DeferredSignalMutex: Clean up API and use std::unique_lock/shared_lock
2023-11-17 10:56:34 +01:00
Ryan Houdek
c1d5fae018 Merge pull request #3273 from alyssarosenzweig/opt/shifts
Optimize shifts/rotates
2023-11-14 14:13:56 -08:00
Alyssa Rosenzweig
56841f0e50 OpcodeDispatcher: avoid moves with 64bit imul
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-14 08:40:41 -04:00
Alyssa Rosenzweig
723146050b OpcodeDispatcher: allow garbage with multiplies
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-14 08:40:41 -04:00
Alyssa Rosenzweig
85b1aa4c2d OpcodeDispatcher: optimize mul flags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-14 08:40:41 -04:00
Ryan Houdek
c69082b1a4 OpcodeDispatcher: Optimize three sha instructions
- sha1nexte
   - Takes advantage of sha1h if supported
   - Does the operation in a vector otherwise
- sha1msg2
   - Instead of dumping everything to GPRs, we can do this with vectors
   - Mostly matches ARM's sha1su1 instruction, but it is /just/
     different enough to be annoying.
- sha256msg1
   - Directly matches sha256u0
   - Leaves the previous implementation alone
2023-11-13 18:38:02 -08:00
Ryan Houdek
74b2548982 IR: Implement support for VUSHRAI IR op
This matches Arm64 usra semantics. This instruction is useful for
implementing vector element rotate.
2023-11-13 18:22:45 -08:00
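
To illustrate why a usra-style operation (74b2548982) helps with vector element rotates, here is a scalar C++ model. It is a sketch, not FEXCore code: `Vec4`, `UShrAccumulate` and `RotateRight` are hypothetical names, and the lanes are assumed to be 32 bits wide. A rotate right becomes one left shift followed by one shift-right-and-accumulate, because the two shifted halves never overlap, so the add behaves like an OR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<uint32_t, 4>;

// Scalar model of an unsigned shift-right-and-accumulate on 32-bit lanes:
// each lane of Acc gets (Src lane >> Shift) added to it.
inline Vec4 UShrAccumulate(Vec4 Acc, const Vec4& Src, unsigned Shift) {
  assert(Shift >= 1 && Shift <= 31); // keep the C++ shifts well defined
  for (std::size_t i = 0; i < Acc.size(); ++i) {
    Acc[i] += Src[i] >> Shift;
  }
  return Acc;
}

// Rotate every lane right by Shift bits using a left shift plus the
// accumulate step above.
inline Vec4 RotateRight(const Vec4& Src, unsigned Shift) {
  assert(Shift >= 1 && Shift <= 31);
  Vec4 High{};
  for (std::size_t i = 0; i < Src.size(); ++i) {
    High[i] = Src[i] << (32 - Shift); // bits that wrap around to the top
  }
  return UShrAccumulate(High, Src, Shift); // add in the bits that stay low
}
```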
Ryan Houdek
f31656ec65 IR: Support sha1h and sha256u0
These match our needs so wire them up
2023-11-13 18:22:45 -08:00
Ryan Houdek
e91420c405 HostFeatures: fixup SHA checks
- Simulator doesn't support SHA
- Use DisableCrypto option to disable sha as well
- Only enable SHA if ARM cpu supports both SHA1 and SHA2
2023-11-13 18:22:45 -08:00
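
The three conditions listed in e91420c405 combine into a single predicate. The following is a minimal sketch assuming a hypothetical `HostCaps` struct and `SupportsSHA` function; the real HostFeatures code and its field names are not shown in this log.

```cpp
// Hypothetical summary of what was detected on the host.
struct HostCaps {
  bool HasSHA1 = false;     // host CPU reports SHA1 support
  bool HasSHA2 = false;     // host CPU reports SHA2 support
  bool IsSimulator = false; // running under the simulator
};

// SHA is enabled only when every condition from the commit message holds.
inline bool SupportsSHA(const HostCaps& Caps, bool DisableCrypto) {
  if (Caps.IsSimulator) {
    return false; // the simulator doesn't support SHA
  }
  if (DisableCrypto) {
    return false; // the DisableCrypto option turns SHA off as well
  }
  return Caps.HasSHA1 && Caps.HasSHA2; // both extensions are required
}
```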
Ryan Houdek
25df59a65d ArmEmitter: Fixes sha256u1 emitter
Noticed this was actually emitting sha256h2
2023-11-13 18:22:45 -08:00
Alyssa Rosenzweig
83fdd5720f IR: Add ccmn
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 22:05:02 -04:00
Alyssa Rosenzweig
89b00c89aa OpcodeDispatcher: optimize rcl 1-bit
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
d38917b5f0 OpcodeDispatcher: rm pointless constant
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
910e0242c1 OpcodeDispatcher: optimize rcr 1-bit
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
651b7bb75d OpcodeDispatcher: optimize RCL
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
c9f13ae1dd OpcodeDispatcher: optimize RCR the usual ways
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
862e575100 OpcodeDispatcher: avoid some ubfx for rcr with flagm
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
4669c4541c OpcodeDispatcher: don't zero for flagm ror
missed earlier in the PR, would be annoying to rebase in.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
769a8c41c4 OpcodeDispatcher: Branch over shift=0 flags
A shift by zero is not supposed to touch flags at all, so don't! Branch over the
flag update instead of making a terrible mess of csels. Far fewer instructions, and
probably faster because the branch should be predicted correctly in practice in hot loops.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
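
The behaviour that 769a8c41c4 relies on can be shown with a small C++ model of a 32-bit x86 SHL. This is a sketch of the guest semantics only, not of the generated Arm64 code: `Flags` and `Shl32` are hypothetical names, and only CF, ZF and SF are modelled. The count is masked to 5 bits, and a masked count of zero returns early so the flags are left exactly as they were.

```cpp
#include <cstdint>

// Hypothetical subset of the guest flags, enough for the illustration.
struct Flags {
  bool CF = false;
  bool ZF = false;
  bool SF = false;
};

// Model of a 32-bit x86 SHL: a masked count of zero must not modify flags.
inline uint32_t Shl32(uint32_t Value, uint8_t Count, Flags& F) {
  const unsigned Masked = Count & 31u; // 32-bit shifts use the low 5 bits
  if (Masked == 0) {
    return Value; // branch over the whole flag update
  }
  const uint32_t Result = Value << Masked;
  F.CF = ((Value >> (32 - Masked)) & 1u) != 0; // last bit shifted out
  F.ZF = Result == 0;
  F.SF = (Result >> 31) != 0;
  return Result;
}
```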
Alyssa Rosenzweig
bec9dba2b1 OpcodeDispatcher: use shifted xor + rmif for rotates
eliminates lots of Bfe on flagm.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
205ba2ea13 IR: add shifted xor
for rotates.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
2073f6d287 OpcodeDispatcher: don't zero nzcv for flagm shifts
Faster for flagm. would be slower for !flagm because bfi slowness...

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
0f25a960ee OpcodeDispatcher: remove bfe for small shl imm
We allow the garbage in flags calculation, it's ignored.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
282ed3e309 OpcodeDispatcher: optimize bsf/bsr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
cd031a7d38 OpcodeDispatcher: avoid some ubfx for flagm
Do the masking as part of the rmif, for free.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
e1885ed0bd OpcodeDispatcher: Use 64-bit ubfx
for larger shifts.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
238e52f74a OpcodeDispatcher: Don't mask 32-bit bzhi either
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:36:46 -04:00
Alyssa Rosenzweig
9398b931fb Arm64Emitter: Handle 32-bit negatives
Noticed in the area.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:36:42 -04:00
Alyssa Rosenzweig
b2a9785959 OpcodeDispatcher: optimize bzhi
Trickery to save an instruction :')

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
224a1f19a3 OpcodeDispatcher: improve bzhi flag gen
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
e25849b2cb OpcodeDispatcher: fix BZHI flag calculation
needs SF.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00