Alyssa Rosenzweig
b3ae81f75f
OpcodeDispatcher: allow garbage on shld shift
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:26:59 -04:00
Alyssa Rosenzweig
c1a1c37980
OpcodeDispatcher: mark ideas to improve SHLD
...
a bit tricky right now.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:26:59 -04:00
Mai
fa3352004e
Merge pull request #3381 from alyssarosenzweig/opt/masking
...
Allow upper garbage on a bunch of instructions
2024-01-30 10:07:53 -05:00
Ryan Houdek
ce2924731e
vixl/simulator: Enlarge simulator stack size
...
Simulator stack size defaults to 8KB. This new unit test requires at
least 15360 bytes of stack. Just push it up to 8MB.
2024-01-29 19:48:38 -08:00
Ryan Houdek
bc67910ee4
Merge pull request #3382 from pmatos/TypoFix
...
Fix typos; NFC
2024-01-29 16:10:14 -08:00
Mai
31a4158957
Merge pull request #3383 from alyssarosenzweig/opt/ptest
...
Optimize PTEST and VTESTP
2024-01-29 13:30:53 -05:00
Mai
58f3d3caf5
Merge pull request #3380 from alyssarosenzweig/opt/pdep
...
Optimize PDEP
2024-01-29 13:27:15 -05:00
Alyssa Rosenzweig
ae48228943
OpcodeDispatcher: optimize vtestps/vtestpd
...
I don't really care about AVX but do the same thing we did for vptest.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:24:11 -04:00
Alyssa Rosenzweig
e8e35e48c7
OpcodeDispatcher: optimize ptest with tst
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:20:17 -04:00
Alyssa Rosenzweig
8b8f27a88f
OpcodeDispatcher: optimize ptest with umaxv
...
To check whether the vector is zero, umaxv its elements and check whether the
reduced scalar is zero.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:20:17 -04:00
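The zero test described in the commit above can be modeled in scalar C++. This is a minimal sketch, not the FEX implementation: `Vec4`, `ReduceUMax`, `IsVectorZero`, and `PtestZF` are hypothetical names, and the vector is assumed to be four 32-bit lanes. An unsigned maximum over all lanes is zero only when every lane is zero, so one reduction plus one scalar compare answers the question.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 128-bit vector as four 32-bit lanes.
using Vec4 = std::array<uint32_t, 4>;

// Scalar stand-in for an unsigned-max reduction across lanes.
uint32_t ReduceUMax(const Vec4& v) {
  uint32_t reduced = 0;
  for (uint32_t lane : v) {
    reduced = std::max(reduced, lane);
  }
  return reduced;
}

// The unsigned maximum is zero only if every lane is zero.
bool IsVectorZero(const Vec4& v) { return ReduceUMax(v) == 0; }

// PTEST sets ZF when (a AND b) is all zero.
bool PtestZF(const Vec4& a, const Vec4& b) {
  Vec4 anded{};
  for (std::size_t i = 0; i < anded.size(); ++i) {
    anded[i] = a[i] & b[i];
  }
  return IsVectorZero(anded);
}

// Small self-check, callable from any test driver.
void CheckZeroTest() {
  assert(IsVectorZero({0, 0, 0, 0}));
  assert(!IsVectorZero({0, 0, 7, 0}));
}
```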
Alyssa Rosenzweig
8e7906a665
IR: add UMaxV
...
will be used to accelerate ptest
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:19:22 -04:00
Paulo Matos
f644959c7c
Fixing some typos; NFC
2024-01-29 17:14:53 +00:00
Alyssa Rosenzweig
16a54742e6
OpcodeDispatcher: optimize 32-bit tzcnt
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
ad9aa0bc87
OpcodeDispatcher: optimize 32-bit lzcnt
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
bd2b3f35a3
OpcodeDispatcher: optimize 32-bit popcnt
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
50169ce640
OpcodeDispatcher: optimize 32-bit pext
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
baae2d68f9
OpcodeDispatcher: optimize 32-bit bextr
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
ad8d038b8a
OpcodeDispatcher: optimize 32-bit blsi
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
820932e3c7
OpcodeDispatcher: optimize 32-bit blsmsk
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
6f11f2e6f4
OpcodeDispatcher: optimize 32-bit blsr
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
f5ad7682c3
OpcodeDispatcher: optimize 32-bit pdep
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:06:56 -04:00
Alyssa Rosenzweig
04805f351b
JIT: rewrite pdep implementation
...
- use better algorithm that is O(# set bits) instead of O(# total bits)
- eliminate spilling by careful management of our temporaries
- fix nzcv clobber bug (whoops)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:06:56 -04:00
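The commit above states only the complexity of the new pdep algorithm, not its details. As an illustration of how a deposit can run in O(# set bits), here is a minimal reference sketch in portable C++; the function name `Pdep` is hypothetical and this is not claimed to be the sequence the JIT emits. Each iteration consumes one set bit of the mask, so the loop runs once per set bit rather than once per bit position.

```cpp
#include <cassert>
#include <cstdint>

// Reference model of x86 PDEP: deposit the low bits of `src` into the
// positions of the set bits of `mask`, lowest set bit first.
uint64_t Pdep(uint64_t src, uint64_t mask) {
  const uint64_t original_mask = mask;
  uint64_t result = 0;
  while (mask != 0) {
    // Isolate the lowest set bit still remaining in the mask.
    const uint64_t lowest = mask & (~mask + 1);
    if (src & 1) {
      result |= lowest;
    }
    src >>= 1;          // Move on to the next source bit.
    mask &= mask - 1;   // Clear the lowest set bit of the mask.
  }
  // Invariant: the result never has bits outside the mask.
  assert((result & ~original_mask) == 0);
  return result;
}
```

For example, `Pdep(0b101, 0b11010)` places source bits 0 and 2 at mask positions 1 and 4, giving `0b10010`.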
Mai
750b0b70bc
Merge pull request #3356 from Sonicadvance1/modify_code_lock
...
Jitarm64: Implements spin-loop futex for JIT blocks
2024-01-23 13:46:59 -05:00
Ryan Houdek
56d8080ec9
Merge pull request #3345 from Sonicadvance1/fix_syscall_registers
...
OpcodeDispatcher: Fixes syscall rcx/r11 generation
2024-01-22 15:21:13 -08:00
Ryan Houdek
c0be974272
Merge pull request #3368 from bylaws/preprcr
...
FEXCore: Fix RCL/RCR shift wraparound behaviour
2024-01-21 13:44:49 -08:00
Billy Laws
e323938173
FEXCore: Fix RCL/RCR shift wraparound behaviour
...
This ends up being cleaner to handle outside of
CalculateFlags_ShiftVariable as constant masking is only needed for
RCL/RCR.
2024-01-21 18:15:50 +00:00
Billy Laws
407e26bfee
FEXCore: Use TMP1-4 for values that need preserving across spills
...
The ARM64EC SRA layout will use x0-3 for x86_64 registers; as such, any
arguments passed to C ABI functions need to be proxied through the
temporaries and moved as appropriate.
2024-01-21 16:21:13 +00:00
Ryan Houdek
a6c57f71e9
SpinWaitLock: Fixes potential extra wait that would occur on contended lock
...
We had a chance of doing an additional bogus wfe if the expected value
was hit in one iteration of a loop. This is not the biggest problem on
current hardware, where WFE only ever sleeps for 1-4 system cycles, but on
future hardware where WFE might actually sleep for longer, this could have
been an issue.
2024-01-17 10:41:16 -08:00
Ryan Houdek
2af7e997f4
Spinlocks: Fix assembly
...
Need to have a source be +r so it doesn't get overwritten.
2024-01-17 10:19:38 -08:00
Ryan Houdek
ab6c00bbcf
FEXCore/Utils: Rename FutexSpinWait to SpinWaitLock
2024-01-17 10:19:38 -08:00
Ryan Houdek
e18453cb57
Jitarm64: Implements spin-loop futex for JIT blocks
...
This will ensure that multiple concurrent SIGBUS handlers in the same
code block don't modify the same code.
2024-01-17 10:19:38 -08:00
Ryan Houdek
39f49782da
Arm64: Move ParanoidTSO checks up out of the non-paranoid code path
2024-01-17 10:19:38 -08:00
Ryan Houdek
2c5dd20f3c
FutexSpinWait: Implement spin-loop Unique mutex.
2024-01-17 10:19:38 -08:00
Ryan Houdek
136fa78825
FEXCore: Implements an efficient spin-loop API
...
This will only be used internally inside of FEXCore for efficient shared
codecache backpatch spin-loops.
2024-01-17 10:19:38 -08:00
Ryan Houdek
f956f008ea
Merge pull request #3372 from alyssarosenzweig/opt/cmpxchg-review
...
Optimize GPR cmpxchg
2024-01-15 05:11:12 -08:00
Ryan Houdek
1f7a619c79
OpcodeDispatcher: Fixes syscall rcx/r11 generation
...
Noticed this while writing #3342.
Fixes #3343
The documentation defines the syscall instruction as setting RCX to the
next instruction's RIP and R11 to RFLAGS. We skipped this entirely,
which I noticed while writing unit tests.
Adds unittests to test both 32-bit and 64-bit behaviour because our
helper shares code with both.
I don't know if anything actually relied on this behaviour but we should
definitely support it.
2024-01-12 19:14:30 -08:00
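The register side effects described in the commit above can be sketched as follows. This is an illustrative model only: `GuestState`, `kSyscallLength`, and `ApplySyscallEntryState` are hypothetical names, and `rip` is assumed to point at the syscall instruction itself, whose encoding (0F 05) is two bytes long.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of guest register state.
struct GuestState {
  uint64_t rip;     // Assumed to point at the syscall instruction.
  uint64_t rcx;
  uint64_t r11;
  uint64_t rflags;
};

// The syscall opcode (0F 05) is two bytes long.
constexpr uint64_t kSyscallLength = 2;

// Model the architectural side effects of executing syscall:
// RCX receives the address of the following instruction and
// R11 receives the current RFLAGS.
void ApplySyscallEntryState(GuestState& state) {
  state.rcx = state.rip + kSyscallLength;
  state.r11 = state.rflags;
}

// Small self-check, callable from any test driver.
void CheckSyscallState() {
  GuestState state{0x1000, 0, 0, 0x246};
  ApplySyscallEntryState(state);
  assert(state.rcx == 0x1002);
  assert(state.r11 == 0x246);
}
```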
Alyssa Rosenzweig
58127bd0e8
OpcodeDispatcher: optimize trivial cmpxchgs
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-12 12:23:34 -04:00
Alyssa Rosenzweig
e8945dfb6d
OpcodeDispatcher: optimize gpr cmpxchg
...
NZCV stuff.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-12 12:03:28 -04:00
Ryan Houdek
8c3163096b
Merge pull request #3363 from Sonicadvance1/fix_label_allocations
...
ArmEmitter: Support single use forward labels
2024-01-12 00:26:31 -08:00
Ryan Houdek
615cfe0246
Merge pull request #3361 from Sonicadvance1/decompose_std_function
...
FEXCore: Decompose some std::function usage to regular pointers
2024-01-10 16:55:29 -08:00
Ryan Houdek
3d5f876585
Fixes some new glibc allocations that cropped up
...
I guess this was handled by brk things before.
2024-01-09 13:55:04 -08:00
Ryan Houdek
37102400b5
Arm64: Switches uses of forward label over to SingleUse if possible
...
Primary goal for this is to ensure that the delinker doesn't need to
allocate any memory. This delinker can end up getting hit heavily with
JIT code so we don't want it to be allocating memory.
2024-01-08 22:18:20 -08:00
Ryan Houdek
c01e6283ae
CodeEmitter: Support a single use forward label
...
Currently every use of a forward label calls into jemalloc to allocate
memory. This allows a forward label that doesn't require any memory
allocation, which is the common case in FEX.
2024-01-08 22:18:20 -08:00
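A rough sketch of why a single-use forward label can avoid allocation, under stated assumptions: a label referenced by many branches needs a growable list of patch sites, while a label referenced at most once can store its one patch site inline. All names here (`SingleUseForwardLabel`, `CodeBuffer`, `EmitBranch`, `Bind`) are hypothetical and do not reflect the actual emitter API; the toy code buffer is a `std::vector` only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy code buffer: each entry stands in for one instruction word.
using CodeBuffer = std::vector<int32_t>;

// Hypothetical label that records at most one pending branch, so the
// label itself never needs heap storage.
struct SingleUseForwardLabel {
  std::size_t pending = 0;
  bool used = false;
};

// Emit a placeholder branch whose offset is patched once the label is bound.
void EmitBranch(CodeBuffer& code, SingleUseForwardLabel& label) {
  assert(!label.used && "single-use label may be referenced only once");
  label.pending = code.size();
  label.used = true;
  code.push_back(0);
}

// Bind the label to the current end of the buffer and patch the branch
// with the distance from the branch to the bind point.
void Bind(CodeBuffer& code, const SingleUseForwardLabel& label) {
  if (label.used) {
    code[label.pending] = static_cast<int32_t>(code.size() - label.pending);
  }
}
```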
Ryan Houdek
248dc97993
FEXCore: Decompose some std::function usage to regular pointers
...
The delinker step of the JIT was using std::function with capturing
lambdas, which required unnecessary memory allocation.
Because the compiler can't see through our std::function usage it could
never decompose these by itself.
By passing the Thread's frame and record to the function as arguments,
we can have the signature be a raw function pointer.
This fixes an area of concern from:
https://github.com/FEX-Emu/FEX/blob/main/docs/ProgrammingConcerns.md#stdfunction-and-lambdas
2024-01-06 19:39:54 -08:00
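The change described above can be illustrated with a minimal sketch. The types and names (`ThreadFrame`, `LinkRecord`, `DelinkerFn`, `PatchBackToExit`, `RunDelinker`) are hypothetical and are not the FEXCore definitions. The point is that once the state a lambda would have captured is passed as explicit arguments, the callback fits in a plain function pointer, which never allocates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the thread frame and link record.
struct ThreadFrame {
  uint64_t delinked = 0;
};
struct LinkRecord {
  uint64_t host_address;
};

// Raw function pointer signature: state arrives as arguments rather than
// through lambda captures, so no std::function storage is needed.
using DelinkerFn = void (*)(ThreadFrame* frame, LinkRecord* record);

// Example delinker: reset the record and count the delink on the frame.
void PatchBackToExit(ThreadFrame* frame, LinkRecord* record) {
  record->host_address = 0;
  ++frame->delinked;
}

// The caller invokes the pointer directly with the explicit state.
void RunDelinker(DelinkerFn fn, ThreadFrame* frame, LinkRecord* record) {
  assert(fn != nullptr);
  fn(frame, record);
}
```

A captureless lambda converts implicitly to `DelinkerFn`, so call sites can still use lambda syntax as long as nothing is captured.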
Ryan Houdek
d488592eda
Merge pull request #3339 from Sonicadvance1/pass_thread_unaligned_fault_handler
...
FEXCore: Pass thread object to HandleUnalignedAccess
2024-01-04 18:20:37 -08:00
Ryan Houdek
743df8dfae
Merge pull request #3327 from Sonicadvance1/remove_syscall_indirection
...
Arm64: Removes a vtable indirection in syscalls
2024-01-04 18:19:40 -08:00
Ryan Houdek
4b3792196f
Merge pull request #3303 from Sonicadvance1/initial_runtime_longmode_switch
...
OpcodeDispatcher: Initial support for runtime long-mode switch
2024-01-04 18:17:54 -08:00
Ryan Houdek
db7d7a6bd7
Merge pull request #3349 from Sonicadvance1/revert_frontend_ownership
...
Revert "FEXLoader: Moves thread management to the frontend"
2024-01-03 14:25:04 -08:00
Alyssa Rosenzweig
04a88ed3ab
Merge pull request #3353 from Sonicadvance1/public_interface_cleaning
...
FEXCore interface cleaning
2024-01-03 15:14:54 -04:00
Alyssa Rosenzweig
9da08b40bd
Merge pull request #3344 from Sonicadvance1/xbyak_upstream
...
Externals: Update xbyak to v7.02 and switch away from fork
2024-01-03 15:13:58 -04:00