Ryan Houdek
cb56728e57
FEXCore/X86Tables: Removes unused supports REP flag
...
This flag was being set in the tables but was actually unused.
2023-11-25 16:54:51 -08:00
Ryan Houdek
b89c3a4573
FEXCore: Removes x86 DebugInfo table
...
This has long been unused. It was originally implemented for some fuzzing
tests, which have since been abandoned; that testing should likely be
implemented some other way.
2023-11-25 16:50:24 -08:00
Ryan Houdek
98f9a65202
Merge pull request #3275 from bylaws/x87fix
...
FEXCore: Work around broken preserve_all support in Windows clang
2023-11-22 05:53:14 -08:00
Ryan Houdek
8726c8fb73
Merge pull request #2691 from neobrain/refactor_scoped_signal_mask
...
ScopedSignalMask: Clean up API and use std::unique_lock/shared_lock
2023-11-19 04:53:05 -08:00
Tony Wasserka
d33b0cb9e3
SignalScopeGuards: Improve code gen for GuardSignalDeferringSectionWithFallback
2023-11-18 12:10:02 +01:00
Ryan Houdek
11993daec4
FEXCore: Hides eflags reconstruction information in the core
...
The frontend shouldn't need to know any information about how to
reconstruct eflags. Just give us the information we need and it'll work
out.
There are still some inherent limitations to this, and some edge cases
that might give invalid data, but it is roughly as close as it was
before.
Just provide whether the PC was in the JIT, the host GPRs, and the PState
object from the signal information, and FEXCore does the rest.
We don't need to change the signature for `SetFlagsFromCompactedEFLAGS`
because reloading the register state automatically does this for
us.
2023-11-17 20:38:42 -04:00
Billy Laws
05b78339f6
FEXCore: Work around broken preserve_all support in Windows clang
...
While the clang-side code to support preserve_all on Windows is in place
(and thus there are no errors for using it), some parts are still
missing on the LLVM side. [1]
[1] 2402b14046/llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp (L88)
2023-11-17 22:13:39 +00:00
Alyssa Rosenzweig
f60608a9c0
OpcodeDispatcher: allow garbage for 32bit inc/dec
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
153d871be2
OpcodeDispatcher: allow garbage for 32-bit cmp
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
3dfb94b524
OpcodeDispatcher: use axflag for x87-f64 fcmp
...
only 1 instr saved but meh.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
d1e43d94e9
OpcodeDispatcher: use axflag for fcmp faster
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
1b490e0e53
CodeEmitter: add ax/xaflag
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
094146d630
IR: add axflag
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
23c2a53683
OpcodeDispatcher: move fcmp flag fixup to dispatcher
...
simpler *and* much faster
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
82b7689ca4
OpcodeDispatcher: remove fcmp deferral
...
no longer load bearing, delete the abstraction.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
149f3e6f6d
OpcodeDispatcher: rm flagsOp unused since select rework
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
Alyssa Rosenzweig
2dcae23776
Arm64Emitter: Dedicate registers for PF/AF
...
Many flag-generating instructions like cmp need to save calculations for
deferred PF and AF flag calculation. Currently, they require a store per flag,
which is prohibitively expensive for hot instructions like cmp. By instead
pinning PF/AF temporary results to registers (x26/x27 by convention here), we
eliminate many stores altogether and turn the rest into zero-cycle moves (on
64-bit at least; this isn't optimal for 32-bit emulation due to CTX->GetGPRSize
shenanigans, and I need to check whether this requirement can be lifted).
To implement, we model as SRA and then the existing SRA code is able to generate
good code with little manual tuning. (Future work will get us to excellent code
with more tuning ;) ).
The tradeoff is reducing the working dynamic GPR set by 2 registers, which might
increase spilling in some cases. I think it's worth it in practice, though.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-17 17:37:24 -04:00
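To illustrate the deferred PF/AF calculation described in the commit above, here is a minimal scalar sketch. The struct and function names are hypothetical, and the choice of which raw values to keep (the result for PF, `src1 ^ src2` for AF) is an assumption for illustration rather than a description of FEX's actual layout. The point is that the flag-setting instruction only records raw values, and the parity and carry math runs later, if at all.

```cpp
#include <cstdint>

// Hypothetical model of deferred PF/AF state; not FEX's real data layout.
struct DeferredFlags {
  uint64_t pf_raw; // result of the last flag-setting op (only the low byte matters for PF)
  uint64_t af_raw; // src1 ^ src2 of the last op (combined with the result to get AF)
};

// Record raw values for a compare (a - b). No parity or carry math happens here,
// which is what makes keeping these values in registers cheap.
inline DeferredFlags RecordCmp(uint64_t a, uint64_t b) {
  const uint64_t result = a - b;
  return DeferredFlags{result, a ^ b};
}

// x86 PF is set when the low byte of the result has an even number of set bits.
inline bool ResolvePF(const DeferredFlags& f) {
  uint8_t x = static_cast<uint8_t>(f.pf_raw);
  x ^= x >> 4;
  x ^= x >> 2;
  x ^= x >> 1;
  return (x & 1) == 0;
}

// x86 AF is the carry/borrow out of bit 3, which equals bit 4 of (a ^ b ^ result).
inline bool ResolveAF(const DeferredFlags& f) {
  return ((f.af_raw ^ f.pf_raw) >> 4) & 1;
}
```

For example, `RecordCmp(0x10, 0x01)` yields a result of `0x0F`, so resolving later gives PF set (four bits in the low byte) and AF set (a borrow out of the low nibble).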
Tony Wasserka
92e4e75217
Merge DeferredSignalMutex.h and ScopedSignalMask.h into a single file
...
Using a single file makes sense now that the individual files are much
shorter and share common utility classes.
2023-11-17 10:56:34 +01:00
Tony Wasserka
5ca35bf77c
ForkableMutex: Simplify WIN32 implementation
2023-11-17 10:56:34 +01:00
Tony Wasserka
c956b82d27
ScopedSignalMask/DeferredSignalMutex: Clean up API and use std::unique_lock/shared_lock
2023-11-17 10:56:34 +01:00
Ryan Houdek
c1d5fae018
Merge pull request #3273 from alyssarosenzweig/opt/shifts
...
Optimize shifts/rotates
2023-11-14 14:13:56 -08:00
Alyssa Rosenzweig
56841f0e50
OpcodeDispatcher: avoid moves with 64bit imul
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-14 08:40:41 -04:00
Alyssa Rosenzweig
723146050b
OpcodeDispatcher: allow garbage with multiplies
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-14 08:40:41 -04:00
Alyssa Rosenzweig
85b1aa4c2d
OpcodeDispatcher: optimize mul flags
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-14 08:40:41 -04:00
Ryan Houdek
c69082b1a4
OpcodeDispatcher: Optimize three sha instructions
...
- sha1nexte
  - Takes advantage of sha1h if supported
  - Does the operation in a vector otherwise
- sha1msg2
  - Instead of dumping everything to GPRs, we can do this with vectors
  - Mostly matches ARM's sha1su1 instruction, but it is /just/
    different enough to be annoying.
- sha256msg1
  - Directly matches sha256u0
- Leaves the previous implementation alone
2023-11-13 18:38:02 -08:00
Ryan Houdek
74b2548982
IR: Implement support for VUSHRAI IR op
...
This matches Arm64 usra semantics. This instruction is useful for
implementing vector element rotate.
2023-11-13 18:22:45 -08:00
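As a rough illustration of why usra-style semantics help with vector element rotates (the VUSHRAI commit above), the following scalar sketch models a 4 x 32-bit vector. The helper names `Usra`, `Shl`, and `RotateLeft` are made up for this example and are not the IR or emitter API. USRA shifts each source lane right and accumulates into the destination, so a left shift followed by one USRA yields a rotate.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<uint32_t, 4>;

// Scalar model of Arm64 USRA on 32-bit lanes: acc[i] += src[i] >> shift.
inline Vec4 Usra(Vec4 acc, const Vec4& src, unsigned shift) {
  for (std::size_t i = 0; i < acc.size(); ++i) {
    acc[i] += src[i] >> shift;
  }
  return acc;
}

// Per-lane logical shift left.
inline Vec4 Shl(Vec4 v, unsigned shift) {
  for (auto& lane : v) {
    lane <<= shift;
  }
  return v;
}

// Rotate every lane left by n, for n in 1..31. The two shifted halves occupy
// disjoint bits, so the accumulate in USRA behaves like a bitwise OR.
inline Vec4 RotateLeft(const Vec4& v, unsigned n) {
  return Usra(Shl(v, n), v, 32 - n);
}
```

The sketch restricts `n` to 1..31 so that neither shift amount reaches the lane width.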
Ryan Houdek
f31656ec65
IR: Support sha1h and sha256u0
...
These match our needs so wire them up
2023-11-13 18:22:45 -08:00
Ryan Houdek
e91420c405
HostFeatures: fixup SHA checks
...
- Simulator doesn't support SHA
- Use DisableCrypto option to disable sha as well
- Only enable SHA if ARM cpu supports both SHA1 and SHA2
2023-11-13 18:22:45 -08:00
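The three conditions listed in the HostFeatures commit above can be sketched as a single predicate. The `HostCaps` struct and `ShouldEnableSHA` function are hypothetical names for illustration; only the conditions themselves come from the commit message.

```cpp
// Hypothetical capability flags; the real HostFeatures class is not shown here.
struct HostCaps {
  bool has_sha1;     // host CPU reports SHA1 support
  bool has_sha2;     // host CPU reports SHA2 support
  bool is_simulator; // running under the simulator
};

// SHA is enabled only when every condition from the commit message holds.
inline bool ShouldEnableSHA(const HostCaps& caps, bool disable_crypto) {
  if (caps.is_simulator) {
    return false; // the simulator doesn't support SHA
  }
  if (disable_crypto) {
    return false; // the DisableCrypto option also disables SHA
  }
  return caps.has_sha1 && caps.has_sha2; // require both, not just one
}
```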
Ryan Houdek
25df59a65d
ArmEmitter: Fixes sha256u1 emitter
...
Noticed this was actually emitting sha256h2
2023-11-13 18:22:45 -08:00
Alyssa Rosenzweig
83fdd5720f
IR: Add ccmn
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 22:05:02 -04:00
Alyssa Rosenzweig
89b00c89aa
OpcodeDispatcher: optimize rcl 1-bit
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
d38917b5f0
OpcodeDispatcher: rm pointless constant
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
910e0242c1
OpcodeDispatcher: optimize rcr 1-bit
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
651b7bb75d
OpcodeDispatcher: optimize RCL
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
c9f13ae1dd
OpcodeDispatcher: optimize RCR the usual ways
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
862e575100
OpcodeDispatcher: avoid some ubfx for rcr with flagm
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
4669c4541c
OpcodeDispatcher: don't zero for flagm ror
...
missed earlier in the PR, would be annoying to rebase in.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
769a8c41c4
OpcodeDispatcher: Branch over shift=0 flags
...
A shift by zero is not supposed to touch flags at all, so don't! Branch over the
flag calculation instead of making a terrible mess of csels. That takes far fewer
instructions, and is probably faster because the branch should be predicted
correctly in practice in hot loops.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
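The x86 behaviour behind the shift=0 commit above can be modelled in a few lines. This is a scalar sketch of a 32-bit SHL with a hypothetical `Flags` struct holding only a subset of the flags; it shows the early exit that leaves flags untouched, not the JIT code the dispatcher actually emits.

```cpp
#include <cstdint>

// Hypothetical subset of x86 flags, for illustration only.
struct Flags {
  bool cf;
  bool zf;
  bool sf;
};

// Model of a 32-bit x86 SHL. The count is masked to 5 bits, and a masked count
// of zero must leave every flag exactly as it was.
inline uint32_t Shl32(uint32_t value, uint8_t count, Flags& flags) {
  const unsigned n = count & 31;
  if (n == 0) {
    return value; // branch over the flag update entirely
  }
  const uint32_t result = value << n;
  flags.cf = ((value >> (32 - n)) & 1) != 0; // last bit shifted out
  flags.zf = result == 0;
  flags.sf = (result >> 31) != 0;
  return result;
}
```

Note that a count of 32 masks to zero, so `Shl32(x, 32, flags)` returns `x` and changes no flags.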
Alyssa Rosenzweig
bec9dba2b1
OpcodeDispatcher: use shifted xor + rmif for rotates
...
eliminates lots of Bfe on flagm.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
205ba2ea13
IR: add shifted xor
...
for rotates.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
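One way a shifted xor helps with rotate flags: for a 1-bit ROR, x86 defines OF as the xor of the two most significant bits of the result, and CF as the most significant bit. The scalar sketch below computes both; the function names are hypothetical, and whether the dispatcher uses exactly this formulation is an assumption.

```cpp
#include <cstdint>

// Rotate right on 32 bits; the count is masked to 5 bits as on x86.
inline uint32_t Ror32(uint32_t v, unsigned n) {
  n &= 31;
  return n ? (v >> n) | (v << (32 - n)) : v;
}

struct RotFlags {
  bool cf;
  bool of;
};

// Flags after a 1-bit ROR, given the rotated result.
// result ^ (result << 1) places (bit31 ^ bit30) in bit 31, which is OF.
inline RotFlags RorFlags(uint32_t result) {
  const uint32_t x = result ^ (result << 1); // the shifted xor
  return RotFlags{(result >> 31) != 0, (x >> 31) != 0};
}
```

A single xor with a shifted operand replaces two bit extracts and a separate xor.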
Alyssa Rosenzweig
2073f6d287
OpcodeDispatcher: don't zero nzcv for flagm shifts
...
Faster for flagm. Would be slower for !flagm because of bfi slowness.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
0f25a960ee
OpcodeDispatcher: remove bfe for small shl imm
...
We allow the garbage in flags calculation, it's ignored.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
282ed3e309
OpcodeDispatcher: optimize bsf/bsr
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
cd031a7d38
OpcodeDispatcher: avoid some ubfx for flagm
...
Do the masking as part of the rmif, for free.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
e1885ed0bd
OpcodeDispatcher: Use 64-bit ubfx
...
for larger shifts.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
238e52f74a
OpcodeDispatcher: Don't mask 32-bit bzhi either
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:36:46 -04:00
Alyssa Rosenzweig
9398b931fb
Arm64Emitter: Handle 32-bit negatives
...
Noticed in the area.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:36:42 -04:00
Alyssa Rosenzweig
b2a9785959
OpcodeDispatcher: optimize bzhi
...
Trickery to save an instruction :')
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
224a1f19a3
OpcodeDispatcher: improve bzhi flag gen
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
e25849b2cb
OpcodeDispatcher: fix BZHI flag calculation
...
needs SF.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
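For reference on the BZHI flag commits above, here is a scalar model of the 32-bit instruction as documented for x86: bits at and above the index are zeroed, CF is set when the index exceeds the operand size minus one, and ZF and SF follow the result. The struct and function names are hypothetical. SF matters in the out-of-range case, where the source passes through unchanged and may have its top bit set.

```cpp
#include <cstdint>

// Hypothetical result bundle: the value plus the flags BZHI defines.
struct BzhiResult {
  uint32_t value;
  bool cf;
  bool zf;
  bool sf;
};

// Model of 32-bit BZHI. Only the low 8 bits of the index operand are used.
inline BzhiResult Bzhi32(uint32_t src, uint32_t index) {
  const unsigned n = index & 0xFF;
  const bool in_range = n < 32;
  // In range: keep bits [n-1:0]. Out of range: the source is copied unchanged.
  const uint32_t value = in_range ? (src & ((uint32_t{1} << n) - 1)) : src;
  return BzhiResult{value, !in_range, value == 0, (value >> 31) != 0};
}
```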