953 Commits

Author SHA1 Message Date
Alyssa Rosenzweig
b3ae81f75f OpcodeDispatcher: allow garbage on shld shift
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:26:59 -04:00
Alyssa Rosenzweig
c1a1c37980 OpcodeDispatcher: mark ideas to improve SHLD
a bit tricky right now.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:26:59 -04:00
Alyssa Rosenzweig
fb6f850bb4 OpcodeDispatcher: remove rcl sub
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
b6d8749525 OpcodeDispatcher: remove select from rcl
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
d3f1397325 OpcodeDispatcher: eliminate constants in RCR
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
0a164428fa OpcodeDispatcher: eliminate select in RCR
the nzcv clobber I actually came for

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
7496175100 OpcodeDispatcher: optimize 32-bit rcl/rcr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
0616a9cef1 OpcodeDispatcher: eliminate move in rcr 1-bit
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
97f8775354 OpcodeDispatcher: optimize <32-bit rcr op1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
c92099aa98 OpcodeDispatcher: fuse orlshl in rcr 1-bit
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
7c288b09f1 OpcodeDispatcher: rmif mask rcl smaller OF
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
680af7b1b0 OpcodeDispatcher: rcr op 8x1 cleanup
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
349bc9efab OpcodeDispatcher: unify rcr op 1bit codepaths
get additional opt for <32-bit

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
ad5c3cb268 OpcodeDispatcher: rmif mask for OF in rcr smaller
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
be8d37ef3d OpcodeDispatcher: optimize 32-bit rol/ror imm
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
6ad2514bfe OpcodeDispatcher: rmif mask rcl smaller cf
better on flagm. extra moves on non-flagm but, meh.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
3fa6129a14 OpcodeDispatcher: rmif mask rcr smaller cf
and do some constant folding to do so more.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
a57cebaf58 OpcodeDispatcher: skip OF calc for constant rotate >= 2
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
34fdb14da1 OpcodeDispatcher: add and use AndConst
this skips the constant folding, which saves the branching in the rotate
immediate implementations.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
974baca09c OpcodeDispatcher: allow upper garbage with rcl/rcr smaller
we're masking immediately to something smaller

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
f22094a493 OpcodeDispatcher: use a branch for 8/16-bit rotate flags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
d979b3a1da OpcodeDispatcher: note idea to further optimize rcl
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Alyssa Rosenzweig
6d82c957fa OpcodeDispatcher: fuse orlshl in rcl
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-30 22:22:57 -04:00
Mai
fa3352004e
Merge pull request #3381 from alyssarosenzweig/opt/masking
Allow upper garbage on a bunch of instructions
2024-01-30 10:07:53 -05:00
Ryan Houdek
ce2924731e vixl/simulator: Enlarge simulator stack size
Simulator stack size defaults to 8KB. This new unit test requires at
least 15360 bytes of stack. Just push it up to 8MB.
2024-01-29 19:48:38 -08:00
Ryan Houdek
bc67910ee4
Merge pull request #3382 from pmatos/TypoFix
Fix typos; NFC
2024-01-29 16:10:14 -08:00
Mai
31a4158957
Merge pull request #3383 from alyssarosenzweig/opt/ptest
Optimize PTEST and VTESTP
2024-01-29 13:30:53 -05:00
Mai
58f3d3caf5
Merge pull request #3380 from alyssarosenzweig/opt/pdep
Optimize PDEP
2024-01-29 13:27:15 -05:00
Alyssa Rosenzweig
ae48228943 OpcodeDispatcher: optimize vtestps/vtestpd
I don't really care about AVX but do the same thing we did for vptest.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:24:11 -04:00
Alyssa Rosenzweig
e8e35e48c7 OpcodeDispatcher: optimize ptest with tst
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:20:17 -04:00
Alyssa Rosenzweig
8b8f27a88f OpcodeDispatcher: optimize ptest with umaxv
to check if the vector is zero, umaxv its elements and check if the reduced
scalar is zero.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:20:17 -04:00
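The zero check described in this commit can be modelled in scalar C++. This is a minimal sketch and not the code the JIT emits: `Vec128`, `ReduceUMax`, `IsZero`, and `PTest` are hypothetical names, and the vector is assumed to be reduced as four 32-bit lanes. The flag definitions follow the architectural PTEST semantics (ZF from `src AND dest`, CF from `src AND NOT dest`).

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a 128-bit vector register, viewed as 4 x 32-bit lanes.
using Vec128 = std::array<uint32_t, 4>;

// Scalar model of an unsigned max-across-lanes reduction (what UMAXV computes).
inline uint32_t ReduceUMax(const Vec128& v) {
  uint32_t m = 0;
  for (uint32_t lane : v) {
    m = std::max(m, lane);
  }
  return m;
}

// A vector is all-zero exactly when its largest unsigned lane is zero.
inline bool IsZero(const Vec128& v) {
  return ReduceUMax(v) == 0;
}

struct PTestFlags {
  bool ZF; // set when (src AND dest) is all zero
  bool CF; // set when (src AND NOT dest) is all zero
};

inline PTestFlags PTest(const Vec128& dest, const Vec128& src) {
  Vec128 both{};
  Vec128 srcOnly{};
  for (std::size_t i = 0; i < both.size(); ++i) {
    both[i] = src[i] & dest[i];
    srcOnly[i] = src[i] & ~dest[i];
  }
  return PTestFlags{IsZero(both), IsZero(srcOnly)};
}
```

The point of the reduction is that one horizontal max plus one scalar compare replaces a lane-by-lane comparison.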
Alyssa Rosenzweig
8e7906a665 IR: add UMaxV
will be used to accelerate ptest

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:19:22 -04:00
Paulo Matos
027fbbf051 Optimize CDQOp
2024-01-29 17:18:02 +00:00
Paulo Matos
ca31a0404c ConstProp should generate 32bit constants when required
2024-01-29 17:15:47 +00:00
Paulo Matos
f644959c7c Fixing some typos; NFC
2024-01-29 17:14:53 +00:00
Alyssa Rosenzweig
16a54742e6 OpcodeDispatcher: optimize 32-bit tzcnt
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
ad9aa0bc87 OpcodeDispatcher: optimize 32-bit lzcnt
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
bd2b3f35a3 OpcodeDispatcher: optimize 32-bit popcnt
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
50169ce640 OpcodeDispatcher: optimize 32-bit pext
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
baae2d68f9 OpcodeDispatcher: optimize 32-bit bextr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
ad8d038b8a OpcodeDispatcher: optimize 32-bit blsi
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
820932e3c7 OpcodeDispatcher: optimize 32-bit blsmsk
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
6f11f2e6f4 OpcodeDispatcher: optimize 32-bit blsr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:11:25 -04:00
Alyssa Rosenzweig
f5ad7682c3 OpcodeDispatcher: optimize 32-bit pdep
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:06:56 -04:00
Alyssa Rosenzweig
04805f351b JIT: rewrite pdep implementation
- use better algorithm that is O(# set bits) instead of O(# total bits)
- eliminate spilling by careful management of our temporaries
- fix nzcv clobber bug (whoops)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-01-29 13:06:56 -04:00
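For reference, a PDEP whose loop count depends on the number of set mask bits rather than the total bit width can be sketched in portable C++ as below. This only illustrates the general technique. It is not the ARM64 sequence this commit emits, and the function name `PDep` is hypothetical.

```cpp
#include <cstdint>

// Deposit the low bits of `src` into the positions of the set bits of `mask`.
// The loop runs once per set bit in `mask`, not once per bit of the operand.
inline uint64_t PDep(uint64_t src, uint64_t mask) {
  uint64_t result = 0;
  for (uint64_t bit = 1; mask != 0; bit <<= 1) {
    const uint64_t lowest = mask & (~mask + 1); // isolate the lowest set mask bit
    if (src & bit) {
      result |= lowest; // the current source bit lands at that mask position
    }
    mask &= mask - 1; // clear the lowest set mask bit
  }
  return result;
}
```

For example, `PDep(0b101, 0b11010)` places source bits 0 and 2 at mask positions 1 and 4, giving `0b10010`.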
Mai
750b0b70bc
Merge pull request #3356 from Sonicadvance1/modify_code_lock
Jitarm64: Implements spin-loop futex for JIT blocks
2024-01-23 13:46:59 -05:00
Ryan Houdek
56d8080ec9
Merge pull request #3345 from Sonicadvance1/fix_syscall_registers
OpcodeDispatcher: Fixes syscall rcx/r11 generation
2024-01-22 15:21:13 -08:00
Ryan Houdek
c0be974272
Merge pull request #3368 from bylaws/preprcr
FEXCore: Fix RCL/RCR shift wraparound behaviour
2024-01-21 13:44:49 -08:00
Billy Laws
e323938173 FEXCore: Fix RCL/RCR shift wraparound behaviour
This ends up being cleaner to handle outside of
CalculateFlags_ShiftVariable as constant masking is only needed for
RCL/RCR.
2024-01-21 18:15:50 +00:00
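As background, the architectural count handling for RCL/RCR can be sketched as follows, assuming "wraparound" here refers to the x86 rule that the count is first masked (5 bits, or 6 bits for 64-bit operands) and then, for 8-bit and 16-bit operands, reduced modulo 9 or 17 because the carry flag takes part in the rotation. The helper names `EffectiveRcCount` and `Rcl8` are hypothetical and are not taken from FEXCore.

```cpp
#include <cassert>
#include <cstdint>

// Effective rotate count for RCL/RCR on an operand of `width` bits.
inline uint32_t EffectiveRcCount(uint32_t count, uint32_t width) {
  assert(width == 8 || width == 16 || width == 32 || width == 64);
  const uint32_t masked = count & (width == 64 ? 63u : 31u);
  if (width == 8) {
    return masked % 9; // 8 data bits + CF form a 9-bit rotation
  }
  if (width == 16) {
    return masked % 17; // 16 data bits + CF form a 17-bit rotation
  }
  return masked; // 32/64-bit: the masked count is already below width + 1
}

struct Rcl8Result {
  uint8_t Value;
  bool CF;
};

// 8-bit rotate-through-carry left, modelled as a 9-bit rotate of {CF, value}.
inline Rcl8Result Rcl8(uint8_t value, bool cf, uint32_t count) {
  const uint32_t n = EffectiveRcCount(count, 8);
  const uint32_t wide = (static_cast<uint32_t>(cf) << 8) | value;
  const uint32_t rotated = ((wide << n) | (wide >> (9 - n))) & 0x1FFu;
  return Rcl8Result{static_cast<uint8_t>(rotated & 0xFFu), (rotated & 0x100u) != 0};
}
```

Under this model, an 8-bit rotate by 9 leaves both the value and CF unchanged, since the effective count is zero.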
Billy Laws
407e26bfee FEXCore: Use TMP1-4 for values that need preserving across spills
The ARM64EC SRA layout will use x0-3 for x86_64 registers, as such any
arguments passed to C ABI functions need to proxy their arguments
through the temporaries and move as appropriate.
2024-01-21 16:21:13 +00:00