807 Commits

Author SHA1 Message Date
Alyssa Rosenzweig
ad5c3cb268 OpcodeDispatcher: rmif mask for OF in rcr smaller
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
be8d37ef3d OpcodeDispatcher: optimize 32-bit rol/ror imm
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
6ad2514bfe OpcodeDispatcher: rmif mask rcl smaller cf
better on flagm. extra moves on non-flagm but, meh.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
3fa6129a14 OpcodeDispatcher: rmif mask rcr smaller cf
and do some constant folding to do so more.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
a57cebaf58 OpcodeDispatcher: skip OF calc for constant rotate >= 2
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
34fdb14da1 OpcodeDispatcher: add and use AndConst
this skips the constant folding, which saves the branching in the rotate
immediate implementations.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
974baca09c OpcodeDispatcher: allow upper garbage with rcl/rcr smaller
we're masking immediately to something smaller

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
f22094a493 OpcodeDispatcher: use a branch for 8/16-bit rotate flags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
d979b3a1da OpcodeDispatcher: note idea to further optimize rcl
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
6d82c957fa OpcodeDispatcher: fuse orlshl in rcl
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Mai
fa3352004e
Merge pull request #3381 from alyssarosenzweig/opt/masking
Allow upper garbage on a bunch of instructions
2024-01-30 10:07:53 -05:00
Ryan Houdek
ce2924731e vixl/simulator: Enlarge simulator stack size
Simulator stack size defaults to 8KB. This new unit test requires a
stack size of at least 15360. Just push it up to 8MB.
2024-01-29 19:48:38 -08:00
Ryan Houdek
bc67910ee4
Merge pull request #3382 from pmatos/TypoFix
Fix typos; NFC
2024-01-29 16:10:14 -08:00
Mai
31a4158957
Merge pull request #3383 from alyssarosenzweig/opt/ptest
Optimize PTEST and VTESTP
2024-01-29 13:30:53 -05:00
Mai
58f3d3caf5
Merge pull request #3380 from alyssarosenzweig/opt/pdep
Optimize PDEP
2024-01-29 13:27:15 -05:00
Alyssa Rosenzweig
ae48228943 OpcodeDispatcher: optimize vtestps/vtestpd
I don't really care about AVX but do the same thing we did for vptest.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:24:11 -04:00
Alyssa Rosenzweig
e8e35e48c7 OpcodeDispatcher: optimize ptest with tst
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:20:17 -04:00
Alyssa Rosenzweig
8b8f27a88f OpcodeDispatcher: optimize ptest with umaxv
to check if the vector is zero, umaxv its elements and check if the reduced
scalar is zero.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:20:17 -04:00
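The zero check described in the commit above can be illustrated with a minimal sketch. This is not FEX's code: `umaxv` and `ptest_flags` are hypothetical helper names, vectors are modelled as lists of unsigned lanes, and the 16-bit lane width is an arbitrary assumption. The flag definitions follow the x86 PTEST rules (ZF is set when `src AND dest` is all zero, CF is set when `src AND NOT dest` is all zero).

```python
from typing import List, Tuple


def umaxv(lanes: List[int]) -> int:
    """Model of an unsigned max reduction across vector lanes (hypothetical helper)."""
    return max(lanes)


def ptest_flags(dest: List[int], src: List[int], lane_bits: int = 16) -> Tuple[bool, bool]:
    """Return (ZF, CF) as PTEST would set them for two equally sized vectors."""
    lane_mask = (1 << lane_bits) - 1
    and_lanes = [d & s for d, s in zip(dest, src)]
    andn_lanes = [(~d & lane_mask) & s for d, s in zip(dest, src)]
    # A vector of unsigned lanes is all zero exactly when its maximum lane is zero,
    # so one reduction plus one scalar compare replaces a lane-by-lane check.
    zf = umaxv(and_lanes) == 0
    cf = umaxv(andn_lanes) == 0
    return zf, cf
```

The reduction turns "is every lane zero?" into a single scalar comparison, which is the property the commit message relies on.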
Alyssa Rosenzweig
8e7906a665 IR: add UMaxV
will be used to accelerate ptest

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:19:22 -04:00
Paulo Matos
f644959c7c Fixing some typos; NFC
2024-01-29 17:14:53 +00:00
Alyssa Rosenzweig
16a54742e6 OpcodeDispatcher: optimize 32-bit tzcnt
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
ad9aa0bc87 OpcodeDispatcher: optimize 32-bit lzcnt
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
bd2b3f35a3 OpcodeDispatcher: optimize 32-bit popcnt
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
50169ce640 OpcodeDispatcher: optimize 32-bit pext
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
baae2d68f9 OpcodeDispatcher: optimize 32-bit bextr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
ad8d038b8a OpcodeDispatcher: optimize 32-bit blsi
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
820932e3c7 OpcodeDispatcher: optimize 32-bit blsmsk
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
6f11f2e6f4 OpcodeDispatcher: optimize 32-bit blsr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
f5ad7682c3 OpcodeDispatcher: optimize 32-bit pdep
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:06:56 -04:00
Alyssa Rosenzweig
04805f351b JIT: rewrite pdep implementation
- use better algorithm that is O(# set bits) instead of O(# total bits)
- eliminate spilling by careful management of our temporaries
- fix nzcv clobber bug (whoops)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:06:56 -04:00
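As an illustration of the "O(# set bits)" idea in the commit above, here is a minimal sketch of PDEP semantics whose loop runs once per set bit of the mask. It is not the JIT's implementation (which emits Arm64 code); the function name `pdep` and the `width` parameter are assumptions made for this example.

```python
def pdep(src: int, mask: int, width: int = 64) -> int:
    """Deposit the low bits of src into the positions of the set bits of mask."""
    full = (1 << width) - 1
    src &= full
    mask &= full
    result = 0
    while mask:
        lowest = mask & -mask  # isolate the lowest set bit of the mask
        if src & 1:
            result |= lowest   # deposit the next source bit at that position
        src >>= 1              # consume one source bit per mask bit
        mask &= mask - 1       # clear the lowest set bit of the mask
    return result
```

A bit-by-bit version would iterate `width` times regardless of the mask; this one stops as soon as the mask is exhausted, so sparse masks finish in a handful of iterations.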
Mai
750b0b70bc
Merge pull request #3356 from Sonicadvance1/modify_code_lock
Jitarm64: Implements spin-loop futex for JIT blocks
2024-01-23 13:46:59 -05:00
Ryan Houdek
56d8080ec9
Merge pull request #3345 from Sonicadvance1/fix_syscall_registers
OpcodeDispatcher: Fixes syscall rcx/r11 generation
2024-01-22 15:21:13 -08:00
Ryan Houdek
c0be974272
Merge pull request #3368 from bylaws/preprcr
FEXCore: Fix RCL/RCR shift wraparound behaviour
2024-01-21 13:44:49 -08:00
Billy Laws
e323938173 FEXCore: Fix RCL/RCR shift wraparound behaviour
This ends up being cleaner to handle outside of
CalculateFlags_ShiftVariable as constant masking is only needed for
RCL/RCR.
2024-01-21 18:15:50 +00:00
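For background, the architectural wraparound rule for rotate-through-carry can be sketched as below. This models x86 RCL as a rotation of a (width + 1)-bit quantity formed by CF and the operand; it is not FEX's code, and the commit does not state which part of the rule was previously mishandled. The function name `rcl` and its signature are assumptions for illustration.

```python
from typing import Tuple


def rcl(value: int, cf: int, count: int, width: int) -> Tuple[int, int]:
    """Rotate value left through carry; returns (new_value, new_cf)."""
    # The count is first masked (6 bits for 64-bit operands, 5 bits otherwise).
    count &= 0x3F if width == 64 else 0x1F
    # The rotation covers width + 1 bits, so the count wraps modulo width + 1.
    # This only changes anything for 8-bit and 16-bit operands.
    total = width + 1
    count %= total
    wide = ((cf & 1) << width) | (value & ((1 << width) - 1))
    rotated = ((wide << count) | (wide >> (total - count))) & ((1 << total) - 1)
    return rotated & ((1 << width) - 1), rotated >> width
```

For example, an 8-bit RCL by 9 leaves both the operand and CF unchanged, and a count of 31 behaves like a count of 4.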
Billy Laws
407e26bfee FEXCore: Use TMP1-4 for values that need preserving across spills
The ARM64EC SRA layout will use x0-3 for x86_64 registers, as such any
arguments passed to C ABI functions need to proxy their arguments
through the temporaries and move as appropriate.
2024-01-21 16:21:13 +00:00
Ryan Houdek
a6c57f71e9 SpinWaitLock: Fixes potential extra wait that would occur on contended lock
We had a chance of doing an additional bogus wfe if the expected value
was hit in one iteration of a loop. Not the biggest problem on current
hardware where WFE only ever sleeps for 1-4 system cycles, but on future
hardware where WFE might actually sleep for longer, then this could have
been an issue.
2024-01-17 10:41:16 -08:00
Ryan Houdek
2af7e997f4 Spinlocks: Fix assembly
Need to have a source be +r so it doesn't get overwritten.
2024-01-17 10:19:38 -08:00
Ryan Houdek
ab6c00bbcf FEXCore/Utils: Rename FutexSpinWait to SpinWaitLock
2024-01-17 10:19:38 -08:00
Ryan Houdek
e18453cb57 Jitarm64: Implements spin-loop futex for JIT blocks
This will ensure that multiple concurrent SIGBUS handlers in the same
code block don't modify the same code.
2024-01-17 10:19:38 -08:00
Ryan Houdek
39f49782da Arm64: Move ParanoidTSO checks up out of the non-paranoid code path
2024-01-17 10:19:38 -08:00
Ryan Houdek
2c5dd20f3c FutexSpinWait: Implement spin-loop Unique mutex.
2024-01-17 10:19:38 -08:00
Ryan Houdek
136fa78825 FEXCore: Implements an efficient spin-loop API
This will only be used internally inside of FEXCore for efficient shared
codecache backpatch spin-loops.
2024-01-17 10:19:38 -08:00
Ryan Houdek
f956f008ea
Merge pull request #3372 from alyssarosenzweig/opt/cmpxchg-review
Optimize GPR cmpxchg
2024-01-15 05:11:12 -08:00
Ryan Houdek
1f7a619c79 OpcodeDispatcher: Fixes syscall rcx/r11 generation
Noticed this while writing #3342.

Fixes #3343

The documentation defines the syscall instruction as setting RCX to the
next instruction's RIP and R11 to RFLAGS. We skipped this entirely,
which I noticed while writing unit tests.

Adds unit tests to test both 32-bit and 64-bit behaviour because our
helper shares code with both.

I don't know if anything actually relied on this behaviour but we should
definitely support it.
2024-01-12 19:14:30 -08:00
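The register effects described in the commit above can be sketched as follows. This is a toy model, not the OpcodeDispatcher code: the register file is a plain dictionary, `syscall_register_effects` is a hypothetical name, and the only instruction detail assumed is that the `syscall` encoding (`0F 05`) is two bytes long.

```python
from typing import Dict

SYSCALL_INSN_LENGTH = 2  # the syscall opcode is the two bytes 0F 05


def syscall_register_effects(regs: Dict[str, int]) -> Dict[str, int]:
    """Return a copy of regs with the RCX/R11 side effects of syscall applied."""
    out = dict(regs)
    # RCX receives the address of the instruction following the syscall.
    out["rcx"] = regs["rip"] + SYSCALL_INSN_LENGTH
    # R11 receives the flags value at the time of the syscall.
    out["r11"] = regs["rflags"]
    return out
```

A guest that reads RCX or R11 after a syscall would see stale values if an emulator skipped these two writes, which is the gap the commit message describes.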
Alyssa Rosenzweig
58127bd0e8 OpcodeDispatcher: optimize trivial cmpxchgs
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-12 12:23:34 -04:00
Alyssa Rosenzweig
e8945dfb6d OpcodeDispatcher: optimize gpr cmpxchg
NZCV stuff.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-12 12:03:28 -04:00
Ryan Houdek
8c3163096b
Merge pull request #3363 from Sonicadvance1/fix_label_allocations
ArmEmitter: Support single use forward labels
2024-01-12 00:26:31 -08:00
Ryan Houdek
615cfe0246
Merge pull request #3361 from Sonicadvance1/decompose_std_function
FEXCore: Decompose some std::function usage to regular pointers
2024-01-10 16:55:29 -08:00
Ryan Houdek
3d5f876585 Fixes some new glibc allocations that cropped up
I guess this was handled by brk things before.
2024-01-09 13:55:04 -08:00
Ryan Houdek
37102400b5 Arm64: Switches uses of forward label over to SingleUse if possible
Primary goal for this is to ensure that the delinker doesn't need to
allocate any memory. This delinker can end up getting hit heavily with
JIT code so we don't want it to be allocating memory.
2024-01-08 22:18:20 -08:00