799 Commits

Author SHA1 Message Date
Alyssa Rosenzweig
b3ae81f75f OpcodeDispatcher: allow garbage on shld shift
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:26:59 -04:00
Alyssa Rosenzweig
c1a1c37980 OpcodeDispatcher: mark ideas to improve SHLD
a bit tricky right now.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:26:59 -04:00
Mai
fa3352004e
Merge pull request #3381 from alyssarosenzweig/opt/masking
Allow upper garbage on a bunch of instructions
2024-01-30 10:07:53 -05:00
Ryan Houdek
ce2924731e vixl/simulator: Enlarge simulator stack size
Simulator stack size defaults to 8KB. This new unit test requires at
least 15360 stack size. Just push it up to 8MB.
2024-01-29 19:48:38 -08:00
Ryan Houdek
bc67910ee4
Merge pull request #3382 from pmatos/TypoFix
Fix typos; NFC
2024-01-29 16:10:14 -08:00
Mai
31a4158957
Merge pull request #3383 from alyssarosenzweig/opt/ptest
Optimize PTEST and VTESTP
2024-01-29 13:30:53 -05:00
Mai
58f3d3caf5
Merge pull request #3380 from alyssarosenzweig/opt/pdep
Optimize PDEP
2024-01-29 13:27:15 -05:00
Alyssa Rosenzweig
ae48228943 OpcodeDispatcher: optimize vtestps/vtestpd
I don't really care about AVX but do the same thing we did for vptest.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:24:11 -04:00
Alyssa Rosenzweig
e8e35e48c7 OpcodeDispatcher: optimize ptest with tst
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:20:17 -04:00
Alyssa Rosenzweig
8b8f27a88f OpcodeDispatcher: optimize ptest with umaxv
to check if the vector is zero, umaxv its elements and check if the reduced
scalar is zero.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:20:17 -04:00
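The zero check described in this commit can be illustrated with a scalar model. This is only a sketch: it treats a 128-bit vector as four 32-bit lanes, and the name `is_all_zero` is hypothetical, not taken from FEX. The idea is that an unsigned maximum across all lanes is zero exactly when every lane is zero, so one reduction plus one scalar compare replaces a lane-by-lane test.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical scalar model of a 128-bit vector as four 32-bit lanes.
using Vec4 = std::array<uint32_t, 4>;

// Reduce the lanes with an unsigned max (what UMAXV does in hardware),
// then compare the reduced scalar against zero.
bool is_all_zero(const Vec4& v) {
  const uint32_t reduced = *std::max_element(v.begin(), v.end());
  return reduced == 0;
}
```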
Alyssa Rosenzweig
8e7906a665 IR: add UMaxV
will be used to accelerate ptest

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:19:22 -04:00
Paulo Matos
f644959c7c Fixing some typos; NFC
2024-01-29 17:14:53 +00:00
Alyssa Rosenzweig
16a54742e6 OpcodeDispatcher: optimize 32-bit tzcnt
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
ad9aa0bc87 OpcodeDispatcher: optimize 32-bit lzcnt
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
bd2b3f35a3 OpcodeDispatcher: optimize 32-bit popcnt
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
50169ce640 OpcodeDispatcher: optimize 32-bit pext
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
baae2d68f9 OpcodeDispatcher: optimize 32-bit bextr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
ad8d038b8a OpcodeDispatcher: optimize 32-bit blsi
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
820932e3c7 OpcodeDispatcher: optimize 32-bit blsmsk
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
6f11f2e6f4 OpcodeDispatcher: optimize 32-bit blsr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
f5ad7682c3 OpcodeDispatcher: optimize 32-bit pdep
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:06:56 -04:00
Alyssa Rosenzweig
04805f351b JIT: rewrite pdep implementation
- use better algorithm that is O(# set bits) instead of O(# total bits)
- eliminate spilling by careful management of our temporaries
- fix nzcv clobber bug (whoops)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:06:56 -04:00
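The commit only states the complexity of the new algorithm, so the following is a reference sketch in portable C++ of one way to get O(# set bits) behaviour, not the JIT's emitted code. The name `pdep_sketch` is hypothetical. Each iteration consumes one set bit of the mask and one bit of the source, so the loop runs once per set mask bit rather than once per bit position.

```cpp
#include <cstdint>

// Deposit the low-order bits of `src` into the positions of the set bits
// of `mask`, lowest mask bit first. All other result bits are zero.
constexpr uint64_t pdep_sketch(uint64_t src, uint64_t mask) {
  uint64_t result = 0;
  while (mask != 0) {
    const uint64_t lowest = mask & (~mask + 1);  // isolate lowest set bit
    if (src & 1) {
      result |= lowest;  // deposit the current source bit at that position
    }
    src >>= 1;           // advance to the next source bit
    mask &= mask - 1;    // clear the mask bit just handled
  }
  return result;
}
```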
Mai
750b0b70bc
Merge pull request #3356 from Sonicadvance1/modify_code_lock
Jitarm64: Implements spin-loop futex for JIT blocks
2024-01-23 13:46:59 -05:00
Ryan Houdek
56d8080ec9
Merge pull request #3345 from Sonicadvance1/fix_syscall_registers
OpcodeDispatcher: Fixes syscall rcx/r11 generation
2024-01-22 15:21:13 -08:00
Ryan Houdek
c0be974272
Merge pull request #3368 from bylaws/preprcr
FEXCore: Fix RCL/RCR shift wraparound behaviour
2024-01-21 13:44:49 -08:00
Billy Laws
e323938173 FEXCore: Fix RCL/RCR shift wraparound behaviour
This ends up being cleaner to handle outside of
CalculateFlags_ShiftVariable as constant masking is only needed for
RCL/RCR.
2024-01-21 18:15:50 +00:00
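The commit message does not spell out the wraparound rule itself. As background, the sketch below models the effective rotate count that the x86 architecture defines for RCL/RCR: the count is masked first, and for 8-bit and 16-bit operands it is then reduced modulo the operand size plus one, because the carry flag takes part in the rotation. The function name is hypothetical and this is not FEX's implementation.

```cpp
#include <cstdint>

// Effective rotate count for RCL/RCR given the raw count and operand size.
// 8/16-bit operands rotate through size+1 bits (operand plus carry flag).
constexpr unsigned rotate_through_carry_count(unsigned count, unsigned size_bits) {
  switch (size_bits) {
    case 8:  return (count & 0x1F) % 9;
    case 16: return (count & 0x1F) % 17;
    case 32: return count & 0x1F;
    default: return count & 0x3F;  // treated as the 64-bit case
  }
}
```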
Billy Laws
407e26bfee FEXCore: Use TMP1-4 for values that need preserving across spills
The ARM64EC SRA layout will use x0-3 for x86_64 registers, as such any
arguments passed to C ABI functions need to proxy their arguments
through the temporaries and move as appropriate.
2024-01-21 16:21:13 +00:00
Ryan Houdek
a6c57f71e9 SpinWaitLock: Fixes potential extra wait that would occur on contended lock
We had a chance of doing an additional bogus wfe if the expected value
was hit in one iteration of a loop. Not the biggest problem on current
hardware where WFE only ever sleeps for 1-4 system cycles, but on future
hardware where WFE might actually sleep for longer, this could have
been an issue.
2024-01-17 10:41:16 -08:00
Ryan Houdek
2af7e997f4 Spinlocks: Fix assembly
Need to have a source be +r so it doesn't get overwritten.
2024-01-17 10:19:38 -08:00
Ryan Houdek
ab6c00bbcf FEXCore/Utils: Rename FutexSpinWait to SpinWaitLock
2024-01-17 10:19:38 -08:00
Ryan Houdek
e18453cb57 Jitarm64: Implements spin-loop futex for JIT blocks
This will ensure that multiple concurrent SIGBUS handlers in the same
code block don't modify the same code.
2024-01-17 10:19:38 -08:00
Ryan Houdek
39f49782da Arm64: Move ParanoidTSO checks up out of the non-paranoid code path
2024-01-17 10:19:38 -08:00
Ryan Houdek
2c5dd20f3c FutexSpinWait: Implement spin-loop Unique mutex.
2024-01-17 10:19:38 -08:00
Ryan Houdek
136fa78825 FEXCore: Implements an efficient spin-loop API
This will only be used internally inside of FEXCore for efficient shared
code cache backpatch spin-loops.
2024-01-17 10:19:38 -08:00
Ryan Houdek
f956f008ea
Merge pull request #3372 from alyssarosenzweig/opt/cmpxchg-review
Optimize GPR cmpxchg
2024-01-15 05:11:12 -08:00
Ryan Houdek
1f7a619c79 OpcodeDispatcher: Fixes syscall rcx/r11 generation
Noticed this while writing #3342.

Fixes #3343

The documentation defines the syscall instruction as setting RCX to the
next instruction's RIP and R11 to RFLAGS. We skipped this entirely,
which I noticed while writing unit tests.

Adds unittests to test both 32-bit and 64-bit behaviour because our
helper shares code with both.

I don't know if anything actually relied on this behaviour but we should
definitely support it.
2024-01-12 19:14:30 -08:00
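The register side effects described in this commit can be modelled in a few lines. This is a sketch under assumptions: `GuestState` and `model_syscall_side_effects` are hypothetical names, not FEX types, and only the 64-bit form is shown. The one hard fact used is that the `syscall` opcode (`0F 05`) is two bytes long, so the next instruction's RIP is the current RIP plus two.

```cpp
#include <cstdint>

// Hypothetical, minimal guest register file for illustration only.
struct GuestState {
  uint64_t rip;     // address of the syscall instruction itself
  uint64_t rcx;
  uint64_t r11;
  uint64_t rflags;
};

constexpr uint64_t kSyscallLength = 2;  // encoding 0F 05

// Apply the architectural side effects of `syscall` on RCX and R11.
void model_syscall_side_effects(GuestState& s) {
  s.rcx = s.rip + kSyscallLength;  // RCX <- RIP of the next instruction
  s.r11 = s.rflags;                // R11 <- RFLAGS
}
```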
Alyssa Rosenzweig
58127bd0e8 OpcodeDispatcher: optimize trivial cmpxchgs
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-12 12:23:34 -04:00
Alyssa Rosenzweig
e8945dfb6d OpcodeDispatcher: optimize gpr cmpxchg
NZCV stuff.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-12 12:03:28 -04:00
Ryan Houdek
8c3163096b
Merge pull request #3363 from Sonicadvance1/fix_label_allocations
ArmEmitter: Support single use forward labels
2024-01-12 00:26:31 -08:00
Ryan Houdek
615cfe0246
Merge pull request #3361 from Sonicadvance1/decompose_std_function
FEXCore: Decompose some std::function usage to regular pointers
2024-01-10 16:55:29 -08:00
Ryan Houdek
3d5f876585 Fixes some new glibc allocations that cropped up
I guess this was handled by brk things before.
2024-01-09 13:55:04 -08:00
Ryan Houdek
37102400b5 Arm64: Switches uses of forward label over to SingleUse if possible
Primary goal for this is to ensure that the delinker doesn't need to
allocate any memory. This delinker can end up getting hit heavily with
JIT code so we don't want it to be allocating memory.
2024-01-08 22:18:20 -08:00
Ryan Houdek
c01e6283ae CodeEmitter: Support a single use forward label
Currently all uses of the forward label call into jemalloc to allocate
memory. This allows a forward label that doesn't require any memory
allocation, which is the common case in FEX.
2024-01-08 22:18:20 -08:00
Ryan Houdek
248dc97993 FEXCore: Decompose some std::function usage to regular pointers
The delinker step of the JIT was using std::function with capture
lambdas that required memory allocation when unnecessary.
Because the compiler can't see through our std::function usage it could
never decompose these by itself.

By passing the Thread's frame and record to the function as arguments,
the signature can be a raw function pointer.

This fixes an area of concern from:
https://github.com/FEX-Emu/FEX/blob/main/docs/ProgrammingConcerns.md#stdfunction-and-lambdas
2024-01-06 19:39:54 -08:00
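The change described here follows a common pattern, sketched below with hypothetical types (`Frame`, `Record`, `DelinkerFn`, and both functions are illustrative names, not FEX's). Instead of a `std::function` holding a capturing lambda, which may heap-allocate its captured state, the state is passed as explicit arguments so a plain function pointer suffices.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the thread's frame and the per-block record.
struct Frame  { uint64_t delinked_count = 0; };
struct Record { uint64_t host_code_addr; };

// Raw function pointer type: no captured state, so no allocation.
using DelinkerFn = void (*)(Frame*, Record*);

// Everything the callback needs arrives through its parameters.
void delink_block(Frame* frame, Record* record) {
  frame->delinked_count += 1;
  record->host_code_addr = 0;  // mark the link as removed
}

// The caller stores and invokes only the bare pointer.
void run_delinker(DelinkerFn fn, Frame* frame, Record* record) {
  fn(frame, record);
}
```

A captureless lambda also converts implicitly to `DelinkerFn`, so call sites can stay terse as long as nothing is captured.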
Ryan Houdek
d488592eda
Merge pull request #3339 from Sonicadvance1/pass_thread_unaligned_fault_handler
FEXCore: Pass thread object to HandleUnalignedAccess
2024-01-04 18:20:37 -08:00
Ryan Houdek
743df8dfae
Merge pull request #3327 from Sonicadvance1/remove_syscall_indirection
Arm64: Removes a vtable indirection in syscalls
2024-01-04 18:19:40 -08:00
Ryan Houdek
4b3792196f
Merge pull request #3303 from Sonicadvance1/initial_runtime_longmode_switch
OpcodeDispatcher: Initial support for runtime long-mode switch
2024-01-04 18:17:54 -08:00
Ryan Houdek
db7d7a6bd7
Merge pull request #3349 from Sonicadvance1/revert_frontend_ownership
Revert "FEXLoader: Moves thread management to the frontend"
2024-01-03 14:25:04 -08:00
Alyssa Rosenzweig
04a88ed3ab
Merge pull request #3353 from Sonicadvance1/public_interface_cleaning
FEXCore interface cleaning
2024-01-03 15:14:54 -04:00
Alyssa Rosenzweig
9da08b40bd
Merge pull request #3344 from Sonicadvance1/xbyak_upstream
Externals: Update xbyak to v7.02 and switch away from fork
2024-01-03 15:13:58 -04:00