Alyssa Rosenzweig
2073f6d287
OpcodeDispatcher: don't zero nzcv for flagm shifts
...
Faster for flagm; it would be slower for !flagm because of bfi slowness.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
0f25a960ee
OpcodeDispatcher: remove bfe for small shl imm
...
We allow the garbage in the flags calculation; it's ignored there.
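To illustrate (an editor's sketch, not FEX code): for a small 8-bit shl, CF is one well-defined bit of the widened shift result, so garbage in the upper bits never matters and the bfe/mask is unnecessary:

```c
#include <stdint.h>

// Editor's sketch (not FEX code): CF after an 8-bit shl can be read
// straight from the unmasked widened result, because only a single
// well-defined bit is extracted; garbage above bit 8 is ignored.
static int shl8_cf(uint8_t x, unsigned shift) {  // shift in 1..8
    unsigned wide = (unsigned)x << shift;  // upper bits may hold "garbage"
    return (wide >> 8) & 1;                // CF = last bit shifted out
}
```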
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
282ed3e309
OpcodeDispatcher: optimize bsf/bsr
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
cd031a7d38
OpcodeDispatcher: avoid some ubfx for flagm
...
Do the masking as part of the rmif, for free.
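For context, an editor's C model of rmif's semantics (rotate, then merge selected low bits into NZCV), which is why the masking can fold in for free:

```c
#include <stdint.h>

// Editor's model of arm64 FEAT_FlagM rmif: rotate the source right by
// `shift`, then copy bits 3..0 of the result into NZCV wherever the
// 4-bit `mask` is set (bit 3 = N ... bit 0 = V). Unmasked flags keep
// their old values, and the rotate doubles as a free bitfield extract.
static unsigned rmif_model(uint64_t src, unsigned shift, unsigned mask,
                           unsigned nzcv) {
    uint64_t rot = shift ? (src >> shift) | (src << (64 - shift)) : src;
    return (nzcv & ~mask & 0xf) | ((unsigned)rot & mask & 0xf);
}
```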
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
e1885ed0bd
OpcodeDispatcher: Use 64-bit ubfx
...
for larger shifts.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-13 21:21:01 -04:00
Alyssa Rosenzweig
238e52f74a
OpcodeDispatcher: Don't mask 32-bit bzhi either
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:36:46 -04:00
Alyssa Rosenzweig
9398b931fb
Arm64Emitter: Handle 32-bit negatives
...
Noticed in the area.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:36:42 -04:00
Alyssa Rosenzweig
b2a9785959
OpcodeDispatcher: optimize bzhi
...
Trickery to save an instruction :')
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
224a1f19a3
OpcodeDispatcher: improve bzhi flag gen
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
e25849b2cb
OpcodeDispatcher: fix BZHI flag calculation
...
needs SF.
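For reference, an editor's C model of 32-bit BZHI per the x86 definition: the result zeroes bits at the index and above, CF reports an out-of-range index, and SF/ZF come from the result, hence needing SF here:

```c
#include <stdint.h>

// Editor's model of 32-bit x86 BZHI: zero all bits of src at positions
// >= index, CF = index out of range, SF/ZF from the result (OF = 0).
static uint32_t bzhi32(uint32_t src, uint32_t index,
                       int *cf, int *sf, int *zf) {
    unsigned n = index & 0xff;               // only the low 8 index bits count
    uint32_t res = (n < 32) ? (src & ((1u << n) - 1u)) : src;
    *cf = (n > 31);
    *sf = (int32_t)res < 0;
    *zf = (res == 0);
    return res;
}
```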
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
57978accc1
OpcodeDispatcher: Add heuristic to prefer rmif for InsertNZCV
...
Decent instcountci win.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
ff37177f4d
IR: Remove AND(N)Z conditions
...
Unused since TestNZ+NZCVSelect accomplishes the same, and good riddance. We might
bring them back later for tbz/tbnz, but certainly not in this Selectful form.
(I added them when I thought we were going to RA the flags. With the more
effective static approach, we don't need this for that.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
472d143021
OpcodeDispatcher: Use TestNZ directly for SelectBit
...
Lets us drop the bitwise select variants; this was the last use.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
157f95b08f
IR: Extend TestNZ to two sources
...
Unlock the power of AND.
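i.e. the arm64 tst form. An editor's sketch of flags computed from the AND of two sources without materializing the result:

```c
#include <stdint.h>

// Editor's sketch of a two-source TestNZ, like arm64 `tst a, b`
// (ands with a zero destination): N and Z come from a & b, and the
// AND result itself never needs a register.
static void test_nz2(uint64_t a, uint64_t b, int *n, int *z) {
    uint64_t r = a & b;
    *n = (int64_t)r < 0;
    *z = (r == 0);
}
```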
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
6bf7ab0778
IR: Remove complex branches
...
Performance footgun and now unused. Don't bring it back.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
0de958be2a
IR: Remove Abs
...
Now unused. If we bring it back, it should be brought back as CSSC only. On
non-CSSC platforms, an explicit cmp + predicated neg can be better.
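An editor's sketch of the non-CSSC formulation mentioned here, absolute value as a compare plus conditional negate:

```c
#include <stdint.h>

// Editor's sketch: abs without a native abs instruction. The ternary
// typically lowers to cmp + cneg (compare, then predicated negate)
// on arm64 targets without CSSC.
static int64_t abs64(int64_t x) {
    return (x < 0) ? -x : x;
}
```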
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
ef544fecf2
OpcodeDispatcher: Use predicated neg for x87 fild
...
Saves an instruction on non-CSSC platforms by deleting a redundant cmp.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
5eea68d6c6
IR: Support predicated Neg
...
i.e. cneg. This will be used for the x87 hell op.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
5471367db1
OpcodeDispatcher: Optimize branches
...
For native cases. Big perf win.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
b8265b1067
OpcodeDispatcher: Rework SelectCC for branches
...
Separate out the NZCV bits from the more complex stuff so we can specially
optimize the branches.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
099c683a5a
IR: Add FromNZCV mode to CondJump
...
In this mode, rather than the branch comparing its arguments and then jumping
based on the result, the branch simply jumps by the native comparison based on
the NZCV value. This allows us to map x86 branches to arm64 branches 1:1.
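An editor's sketch of the 1:1 mapping, evaluating a few arm64 condition codes directly from NZCV, which is exactly what b.cond does:

```c
#include <string.h>

// Editor's sketch: how a FromNZCV branch resolves. Each x86 Jcc maps
// to one arm64 condition evaluated straight from the NZCV bits.
static int cond_holds(const char *cc, int n, int z, int c, int v) {
    if (!strcmp(cc, "eq")) return z;        // x86 je
    if (!strcmp(cc, "ne")) return !z;       // x86 jne
    if (!strcmp(cc, "hi")) return c && !z;  // x86 ja (unsigned >)
    if (!strcmp(cc, "lt")) return n != v;   // x86 jl (signed <)
    return 0;
}
```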
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
ec14a65e23
OpcodeDispatcher: optimize LOOP invert
...
Need to reorder since the select clobbers nzcv. Before/after diff on the loopne
unit test:
> 4308: [INFO] cset w20, ne
40c41
< 4308: [INFO] mrs x20, nzcv
---
> 4308: [INFO] mrs x21, nzcv
42,49c43,48
< 4308: [INFO] cset x21, ne
< 4308: [INFO] ubfx w22, w20, #30, #1
< 4308: [INFO] eor x22, x22, #0x1
< 4308: [INFO] and x21, x21, x22
< 4308: [INFO] msr nzcv, x20
< 4308: [INFO] cbnz x21, #+0x8 (addr 0xfffed66f8094)
< 4308: [INFO] b #+0x1c (addr 0xfffed66f80ac)
< 4308: [INFO] ldr x0, pc+8 (addr 0xfffed66f809c)
---
> 4308: [INFO] cset x22, ne
> 4308: [INFO] and x20, x22, x20
> 4308: [INFO] msr nzcv, x21
> 4308: [INFO] cbnz x20, #+0x8 (addr 0xfffec94e8090)
> 4308: [INFO] b #+0x1c (addr 0xfffec94e80a8)
> 4308: [INFO] ldr x0, pc+8 (addr 0xfffec94e8098)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 21:03:58 -04:00
Alyssa Rosenzweig
0d70c6a0d0
OpcodeDispatcher: Use cset for getting nzcv flags
...
Usually better in practice... some rotates are slightly regressed by this, but
they were already terrible.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 21:03:58 -04:00
Alyssa Rosenzweig
bfa069c4d5
OpcodeDispatcher: Dirty NZCV on new blocks
...
This worked only by accident before.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 21:03:58 -04:00
Alyssa Rosenzweig
61bdf64e15
OpcodeDispatcher: Cleanup PF select
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 21:03:58 -04:00
Alyssa Rosenzweig
482b35c283
OpcodeDispatcher: Optimize JA/JNA selects
...
Chain two csel instructions together, which should be optimal.
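An editor's sketch of the chaining: x86 JA maps to arm64's hi condition (C set and Z clear, since arm's carry is the inverted x86 borrow), so two single-flag selects compose into the full condition:

```c
// Editor's sketch: selecting for x86 JA ("above", arm64 hi: C set and
// Z clear) with two chained single-flag selects, each a csel on arm64.
static int select_ja(int c, int z, int tval, int fval) {
    int t1 = z ? fval : tval;  // first csel: rule out Z set
    return c ? t1 : fval;      // second csel: require C set
}
```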
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 10:46:59 -04:00
Alyssa Rosenzweig
b74d886017
OpcodeDispatcher: Cleanup selectcc
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 10:46:59 -04:00
Alyssa Rosenzweig
1667abad7e
OpcodeDispatcher: Don't fold flags in SelectCC
...
Not beneficial in the new approach.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 10:46:59 -04:00
Alyssa Rosenzweig
b187a853e7
OpcodeDispatcher: Use NZCVSelect for SelectCC
...
Massively better codegen.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 10:46:58 -04:00
Alyssa Rosenzweig
228c7d142e
IR: Allow inline 0 on NZCVSelect
...
via the 0 reg. We really need a more generalized approach to taking advantage of
wzr, but this optimizes the special case I care about for seta (saving a move to
make the impl optimal).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 09:42:26 -04:00
Alyssa Rosenzweig
041199644c
IR: Add NZCVSelect op
...
This will replace Select soon, as it lets us take advantage of
NZCV-generating instructions and it doesn't clobber NZCV.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 09:04:42 -04:00
Alyssa Rosenzweig
3767f3633d
Merge pull request #3263 from alyssarosenzweig/opt/not-garbage
...
OpcodeDispatcher: Make "not" not garbage
2023-11-09 19:58:02 -04:00
Ryan Houdek
af3253947e
Merge pull request #3162 from alyssarosenzweig/opt/nzcv-native
...
Keep guest SF/ZF/CF/OF flags resident in host NZCV
2023-11-09 15:10:45 -08:00
Ryan Houdek
efc5eb2933
Merge pull request #3250 from Sonicadvance1/gdbserver_frontend_move
...
FEXLoader: Wire up gdbserver in the frontend
2023-11-09 14:48:59 -08:00
Alyssa Rosenzweig
da3e3fc7a3
OpcodeDispatcher: Optimize some selects
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 15:21:10 -04:00
Alyssa Rosenzweig
b5c83f0628
JIT: Optimize 8/16-bit TestNZ
...
This is cursed. I blame the darling.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 14:50:36 -04:00
Alyssa Rosenzweig
1ce3c16b30
OpcodeDispatcher: Make "not" not garbage
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 12:02:20 -04:00
Alyssa Rosenzweig
584c4cc05e
OpcodeDispatcher: Mask with rmif sometimes
...
For CF/OF calculation, this saves an instruction on flagm platforms.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 10:05:51 -04:00
Alyssa Rosenzweig
279afd88bb
OpcodeDispatcher: Generalize rmif trick
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 10:05:51 -04:00
Alyssa Rosenzweig
c0a6d82025
OpcodeDispatcher: Use rmif for NZCV inserts
...
Optimizes piles of s/w flag generation on flagm.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
3a03e1c93c
OpcodeDispatcher: rework InsertNZCV
...
in prep for rmif.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
bdaa70405f
OpcodeDispatcher: simplify flag control ops
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
1281145982
OpcodeDispatcher: Optimize popf
...
mostly for easier debugging tbh
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
87cac09477
OpcodeDispatcher: optimize cmc
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
5336129b58
IR: Optimize sub/sbb
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
72fc2b522d
IR: Optimize add/adc
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
b6f6c84790
IR: Optimize tests
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
afdb8753ba
IR: Remove some implicit flag clobbers
...
Do it explicitly for sve-256 and punt on optimizing, so we avoid regressing code
gen otherwise.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
f6a2e6739d
IR: VCMPEQ doesn't clobber nzcv
...
cmeq not cmpeq!
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
d6569d510d
Arm64: Keep host flags resident in NZCV
...
Rather than the context. Effectively a static register allocation scheme for
flags. This will let us optimize out a LOT of flag handling code, keeping things
in NZCV rather than needing to copy between NZCV and memory all the time.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00