8237 Commits

Author SHA1 Message Date
Alyssa Rosenzweig
099c683a5a IR: Add FromNZCV mode to CondJump
In this mode, rather than comparing its arguments and then jumping based on
the result, the branch simply jumps on a native condition code evaluated
against the current NZCV value. This allows us to map x86 branches to arm64 branches 1:1.
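A minimal Python model of why the 1:1 mapping works. It assumes guest SF/ZF/OF sit in host N/Z/V, which is the NZCV-resident flag scheme described elsewhere in this log. The model checks every flag combination against the architectural condition semantics. `X86_TO_ARM64` is a hypothetical table, not FEX's code. Carry-based conditions (jb, ja, and so on) are left out: arm64 sets C to the inverted borrow on subtraction, so their mapping depends on how the guest CF is stored, and this message does not say.

```python
from itertools import product

def arm64_cond(cond: str, n: int, z: int, c: int, v: int) -> bool:
    """Evaluate an arm64 condition code against NZCV bits (architectural semantics)."""
    table = {
        "eq": z == 1, "ne": z == 0,
        "mi": n == 1, "pl": n == 0,
        "vs": v == 1, "vc": v == 0,
        "ge": n == v, "lt": n != v,
        "gt": z == 0 and n == v, "le": z == 1 or n != v,
    }
    return table[cond]

def x86_cond(jcc: str, sf: int, zf: int, of: int) -> bool:
    """Evaluate an x86 Jcc condition from SF/ZF/OF (Intel SDM semantics)."""
    table = {
        "je": zf == 1, "jne": zf == 0,
        "js": sf == 1, "jns": sf == 0,
        "jo": of == 1, "jno": of == 0,
        "jge": sf == of, "jl": sf != of,
        "jg": zf == 0 and sf == of, "jle": zf == 1 or sf != of,
    }
    return table[jcc]

# Hypothetical table: x86 branch mnemonic -> arm64 condition code.
X86_TO_ARM64 = {
    "je": "eq", "jne": "ne", "js": "mi", "jns": "pl", "jo": "vs",
    "jno": "vc", "jge": "ge", "jl": "lt", "jg": "gt", "jle": "le",
}

def check_mapping() -> bool:
    for sf, zf, of, cf in product((0, 1), repeat=4):
        for jcc, cond in X86_TO_ARM64.items():
            # Guest flags resident in host NZCV: N=SF, Z=ZF, C=CF, V=OF.
            if x86_cond(jcc, sf, zf, of) != arm64_cond(cond, sf, zf, cf, of):
                return False
    return True

print(check_mapping())  # True
```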

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-12 17:32:24 -04:00
Alyssa Rosenzweig
ec14a65e23 OpcodeDispatcher: optimize LOOP invert
Need to reorder, since the select clobbers NZCV. Before/after diff on the LOOPNE
unit test:

> 4308: [INFO] cset w20, ne
40c41
< 4308: [INFO] mrs x20, nzcv
---
> 4308: [INFO] mrs x21, nzcv
42,49c43,48
< 4308: [INFO] cset x21, ne
< 4308: [INFO] ubfx w22, w20, #30, #1
< 4308: [INFO] eor x22, x22, #0x1
< 4308: [INFO] and x21, x21, x22
< 4308: [INFO] msr nzcv, x20
< 4308: [INFO] cbnz x21, #+0x8 (addr 0xfffed66f8094)
< 4308: [INFO] b #+0x1c (addr 0xfffed66f80ac)
< 4308: [INFO] ldr x0, pc+8 (addr 0xfffed66f809c)
---
> 4308: [INFO] cset x22, ne
> 4308: [INFO] and x20, x22, x20
> 4308: [INFO] msr nzcv, x21
> 4308: [INFO] cbnz x20, #+0x8 (addr 0xfffec94e8090)
> 4308: [INFO] b #+0x1c (addr 0xfffec94e80a8)
> 4308: [INFO] ldr x0, pc+8 (addr 0xfffec94e8098)
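A Python model of the two sequences in the diff, for illustration. It assumes register roles inferred from the diff rather than stated in it: in the old code `x20` holds the guest NZCV saved by `mrs`, and `count` is the already-decremented loop counter whose comparison clobbers NZCV. Both sequences compute the LOOPNE condition; the new one reads Z with a single `cset ne` before NZCV is clobbered instead of extracting and inverting bit 30 afterwards.

```python
# NZCV system register layout: N = bit 31, Z = bit 30, C = bit 29, V = bit 28.
Z_BIT = 30

def ubfx(value: int, lsb: int, width: int) -> int:
    """Model of arm64 ubfx: extract `width` bits starting at bit `lsb`."""
    return (value >> lsb) & ((1 << width) - 1)

def cset_ne(nzcv: int) -> int:
    """Model of `cset rd, ne`: 1 when the Z flag is clear, else 0."""
    return 1 if ubfx(nzcv, Z_BIT, 1) == 0 else 0

def loopne_old(saved_nzcv: int, count: int) -> int:
    count_nonzero = 1 if count != 0 else 0     # cset x21, ne (after the count compare)
    not_zf = ubfx(saved_nzcv, Z_BIT, 1) ^ 1    # ubfx w22, w20, #30, #1 ; eor x22, x22, #0x1
    return count_nonzero & not_zf              # and x21, x21, x22

def loopne_new(guest_nzcv: int, count: int) -> int:
    not_zf = cset_ne(guest_nzcv)               # cset w20, ne (before NZCV is clobbered)
    count_nonzero = 1 if count != 0 else 0     # cset x22, ne
    return count_nonzero & not_zf              # and x20, x22, x20

def loopne_reference(zf: int, count: int) -> int:
    """Intel SDM LOOPNE: branch if the decremented count != 0 and ZF == 0."""
    return int(count != 0 and zf == 0)

for nzcv in (0x0, 0x40000000, 0xF0000000, 0xB0000000):
    for count in (0, 1, 7):
        zf = ubfx(nzcv, Z_BIT, 1)
        assert loopne_old(nzcv, count) == loopne_new(nzcv, count) == loopne_reference(zf, count)
print("old and new sequences agree")
```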

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 21:03:58 -04:00
Alyssa Rosenzweig
0d70c6a0d0 OpcodeDispatcher: Use cset for getting nzcv flags
Usually better in practice. Some rotates regress slightly, but they were
already terrible.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 21:03:58 -04:00
Alyssa Rosenzweig
bfa069c4d5 OpcodeDispatcher: Dirty NZCV on new blocks
This worked only by accident before.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 21:03:58 -04:00
Alyssa Rosenzweig
61bdf64e15 OpcodeDispatcher: Cleanup PF select
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 21:03:58 -04:00
Alyssa Rosenzweig
482b35c283 OpcodeDispatcher: Optimize JA/JNA selects
Chain two csel instructions together, which should be optimal.
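A Python model of the chained selects, for illustration only. x86 "above" is CF == 0 and ZF == 0. The arm64 `hi` condition (C == 1 and Z == 0) only matches that when C holds the inverted borrow an arm64 `cmp` produces, so the exact condition codes FEX uses depend on its carry convention, which this message does not state. The model therefore keys the two selects on the x86 flags directly; `select_ja` and `select_jna` are hypothetical names.

```python
def csel(if_true: int, if_false: int, cond: bool) -> int:
    """Model of arm64 csel: pick if_true when the condition holds."""
    return if_true if cond else if_false

def select_ja(true_val: int, false_val: int, cf: int, zf: int) -> int:
    # x86 "above" (JA/SETA/CMOVA): CF == 0 and ZF == 0.
    tmp = csel(true_val, false_val, zf == 0)   # first csel keyed on Z
    return csel(tmp, false_val, cf == 0)       # second csel keyed on the carry

def select_jna(true_val: int, false_val: int, cf: int, zf: int) -> int:
    # "Not above" (JNA/JBE) is the complement, so swap the operands.
    return select_ja(false_val, true_val, cf, zf)

print([select_ja(1, 0, cf, zf) for cf in (0, 1) for zf in (0, 1)])  # [1, 0, 0, 0]
```

Two selects avoid materializing a boolean and comparing it again, and since csel only reads NZCV, the flags stay intact for later users.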

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 10:46:59 -04:00
Alyssa Rosenzweig
b74d886017 OpcodeDispatcher: Cleanup selectcc
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 10:46:59 -04:00
Alyssa Rosenzweig
1667abad7e OpcodeDispatcher: Don't fold flags in SelectCC
Not beneficial in the new approach.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 10:46:59 -04:00
Alyssa Rosenzweig
b187a853e7 OpcodeDispatcher: Use NZCVSelect for SelectCC
Massively better codegen.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 10:46:58 -04:00
Alyssa Rosenzweig
228c7d142e IR: Allow inline 0 on NZCVSelect
Inline 0 via the zero register. We really need a more general approach to taking
advantage of wzr, but this optimizes the special case I care about for seta
(saving a move, which makes the implementation optimal).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 09:42:26 -04:00
Alyssa Rosenzweig
041199644c IR: Add NZCVSelect op
This will replace Select soon, as it lets us take advantage of
NZCV-generating instructions and it doesn't clobber NZCV.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-10 09:04:42 -04:00
Alyssa Rosenzweig
3767f3633d
Merge pull request #3263 from alyssarosenzweig/opt/not-garbage
OpcodeDispatcher: Make "not" not garbage
2023-11-09 19:58:02 -04:00
Ryan Houdek
af3253947e
Merge pull request #3162 from alyssarosenzweig/opt/nzcv-native
Keep guest SF/ZF/CF/OF flags resident in host NZCV
2023-11-09 15:10:45 -08:00
Ryan Houdek
efc5eb2933
Merge pull request #3250 from Sonicadvance1/gdbserver_frontend_move
FEXLoader: Wire up gdbserver in the frontend
2023-11-09 14:48:59 -08:00
Ryan Houdek
b4eeb96375
Merge pull request #3261 from Sonicadvance1/tuning_options
FEX: Only pass CPU tunables to FEXCore and FEXLoader
2023-11-09 14:48:42 -08:00
Alyssa Rosenzweig
c9832e3d34 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 15:22:25 -04:00
Alyssa Rosenzweig
da3e3fc7a3 OpcodeDispatcher: Optimize some selects
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 15:21:10 -04:00
Alyssa Rosenzweig
b5c83f0628 JIT: Optimize 8/16-bit TestNZ
This is cursed. I blame the darling.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 14:50:36 -04:00
Alyssa Rosenzweig
03087a55ba InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 12:02:20 -04:00
Alyssa Rosenzweig
1ce3c16b30 OpcodeDispatcher: Make "not" not garbage
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 12:02:20 -04:00
Alyssa Rosenzweig
bf702850a9 InstCountCI: Add not bh case
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 11:32:45 -04:00
Alyssa Rosenzweig
584c4cc05e OpcodeDispatcher: Mask with rmif sometimes
For CF/OF calculation, this saves an instruction on flagm platforms.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 10:05:51 -04:00
Alyssa Rosenzweig
279afd88bb OpcodeDispatcher: Generalize rmif trick
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 10:05:51 -04:00
Alyssa Rosenzweig
c0a6d82025 OpcodeDispatcher: Use rmif for NZCV inserts
Optimizes lots of software flag generation on flagm platforms.
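A Python model of `rmif` (FEAT_FlagM, mandatory from Armv8.4). It rotates the source register right and copies the selected low 4 bits into N, Z, C and V, leaving unselected flags alone, so a single instruction can insert a computed bit into one flag. Without FlagM the same insert typically needs a read-modify-write of NZCV (`mrs`, a bit insert, `msr`). `insert_flag` is a hypothetical helper, not FEX's emitter code.

```python
MASK64 = (1 << 64) - 1

def ror64(x: int, shift: int) -> int:
    """Rotate a 64-bit value right by `shift` bits."""
    shift %= 64
    return ((x >> shift) | (x << (64 - shift))) & MASK64

def rmif(nzcv4: int, xn: int, shift: int, mask: int) -> int:
    """Model of RMIF. nzcv4 is the 4-bit flag value: bit 3 = N, 2 = Z, 1 = C, 0 = V."""
    tmp = ror64(xn & MASK64, shift)
    for bit in range(4):
        if mask & (1 << bit):
            # Replace this flag with the matching bit of the rotated source.
            nzcv4 = (nzcv4 & ~(1 << bit)) | (((tmp >> bit) & 1) << bit)
    return nzcv4

def insert_flag(nzcv4: int, xn: int, src_bit: int, flag_bit: int) -> int:
    """Hypothetical helper: copy bit `src_bit` of xn into flag `flag_bit` with one rmif."""
    # After rotating right by (src_bit - flag_bit), source bit src_bit lands at flag_bit.
    shift = (src_bit - flag_bit) % 64
    return rmif(nzcv4, xn, shift, 1 << flag_bit)

print(bin(insert_flag(0b0000, 0x1, 0, 1)))  # 0b10: bit 0 of the source becomes C
```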

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
3a03e1c93c OpcodeDispatcher: rework InsertNZCV
In preparation for rmif.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
bdaa70405f OpcodeDispatcher: simplify flag control ops
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
1281145982 OpcodeDispatcher: Optimize popf
Mostly for easier debugging, to be honest.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
87cac09477 OpcodeDispatcher: optimize cmc
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
5336129b58 IR: Optimize sub/sbb
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
72fc2b522d IR: Optimize add/adc
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
b6f6c84790 IR: Optimize tests
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
11e9be13b1 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
afdb8753ba IR: Remove some implicit flag clobbers
Do it explicitly for SVE-256 and punt on optimizing that case, so we avoid
regressing codegen elsewhere.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
f6a2e6739d IR: VCMPEQ doesn't clobber nzcv
cmeq, not cmpeq! Unlike SVE's cmpeq, NEON's cmeq does not write NZCV.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
d6569d510d Arm64: Keep host flags resident in NZCV
Keep the flags in NZCV rather than in the context. This is effectively a static
register allocation scheme for flags. It will let us optimize out a LOT of
flag-handling code, keeping things in NZCV rather than copying between NZCV and
memory all the time.
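As an illustration of the static mapping, a Python sketch that converts between x86 EFLAGS bits and the arm64 NZCV register layout (N = bit 31, Z = bit 30, C = bit 29, V = bit 28). It assumes a direct SF to N, ZF to Z, CF to C, OF to V mapping; whether FEX stores CF inverted is not stated in this message. The function names are hypothetical.

```python
# x86 EFLAGS bit positions (Intel SDM).
CF, ZF, SF, OF = 0, 6, 7, 11
# arm64 NZCV system register bit positions.
N, Z, C, V = 31, 30, 29, 28

# Assumed direct guest -> host mapping.
GUEST_TO_HOST = {SF: N, ZF: Z, CF: C, OF: V}

def eflags_to_nzcv(eflags: int) -> int:
    """Pack the four guest flags into an NZCV register value."""
    out = 0
    for g, h in GUEST_TO_HOST.items():
        out |= ((eflags >> g) & 1) << h
    return out

def nzcv_to_eflags(nzcv: int, eflags: int) -> int:
    """Write the four host flags back into an EFLAGS value, keeping other bits."""
    for g, h in GUEST_TO_HOST.items():
        eflags = (eflags & ~(1 << g)) | (((nzcv >> h) & 1) << g)
    return eflags

print(hex(eflags_to_nzcv(1 << ZF)))  # 0x40000000
```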

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
04e4993d9b OpcodeDispatcher: Add a kludge to save NZCV less
Some opcodes only clobber NZCV under certain circumstances, and we don't yet
have a good way of encoding that. In the meantime, this hotfixes some would-be
instcountci regressions.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
783e09d67d ConstProp: Remove select code motion
Problematic in the new approach, and it's not clear what it was trying to accomplish.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
314f478225 ConstProp: remove select+branch fusion
Not beneficial in the new approach to flags.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
c1dbc28aa2 OpcodeDispatcher: Implement SaveNZCV
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
8f7e393ffb Arm64: Don't clobber NZCV in CondJump
Again, we need to handle this one specially because the dispatcher can't insert
restore code after the branch. It should be optimized in the near future, so
don't worry.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
b3055523b4 IR: Switch to dedicated NZCV load/store
The semantics differ markedly from the non-NZCV flags; splitting this out makes
it a lot easier to do things correctly, imho. This gets the dest/src size correct
(important for spilling) and makes our existing optimization passes skip these
ops, which is needed for correctness at the moment anyway.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
cf6b21564c InstCountCI: disable flagm explicitly more
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Mai
996a4c023c
Merge pull request #3260 from Sonicadvance1/ender_lilies_nzcv_unittest
unittests/ASM: Adds unittest found from Ender Lilies that crashed with NZCV
2023-11-08 20:24:09 +01:00
Ryan Houdek
0dcbdcc0e2 FEX: Only pass CPU tunables to FEXCore and FEXLoader
This fixes an issue where CPU tunables were ending up in the thunk
generator, which meant that if the *Builder* CPU didn't support all of the
targeted features, the generator would crash with SIGILL. This was
happening on Canonical's runners, which typically only support ARMv8.2,
while we compile packages to run on ARMv8.4 devices.

cc: FEX-2311.1
2023-11-08 05:50:33 -08:00
Ryan Houdek
5bdd422db6 unittests/ASM: Adds unittest found from Ender Lilies that crashed with NZCV
SHA instructions currently generate very large code, which causes
register spilling. Ender Lilies has a really large block in a function
called `sha1_block_data_order` that was causing FEX to spill NZCV flags
incorrectly. Before the NZCV optimizations, the assumption held that
every flag was either 1 bit in an 8-bit container or a full 8-bit value
(the x87 TOP flag).

Host NZCV flags broke this assumption by being 32-bit, which caused
breakage whenever they were spilled.
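A small Python model of the failure mode, assuming a spill slot sized by the old 8-bit flag assumption. The NZCV flags live in bits 28-31 of the 32-bit value, so an 8-bit store keeps none of them, while a 1-bit x86 flag in an 8-bit container survives. `spill_and_reload` is hypothetical and stands in for the register allocator's store and reload.

```python
def spill_and_reload(value: int, slot_bytes: int) -> int:
    """Hypothetical model: only the low `slot_bytes` bytes survive a spill and reload."""
    return value & ((1 << (8 * slot_bytes)) - 1)

nzcv = 0x60000000  # Z (bit 30) and C (bit 29) set in the host NZCV layout
x86_cf = 1         # a 1-bit guest flag in an 8-bit container

print(hex(spill_and_reload(nzcv, 1)))  # 0x0: every NZCV flag is lost
print(hex(spill_and_reload(nzcv, 4)))  # 0x60000000: a 32-bit slot preserves them
print(spill_and_reload(x86_cf, 1))     # 1: the old 8-bit assumption was fine here
```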
2023-11-08 04:52:22 -08:00
Alyssa Rosenzweig
bf147f47b5
Merge pull request #3258 from Sonicadvance1/remove_warnings_16
Arm64Emitter: Fixes warning
2023-11-08 07:05:26 -04:00
Alyssa Rosenzweig
3f1f7faf34
Merge pull request #3257 from alyssarosenzweig/helper/derive-op
Add helper for deriving ops by opcode
2023-11-08 06:53:45 -04:00
Ryan Houdek
1fc6725826 Arm64Emitter: Fixes warning
2023-11-08 01:27:27 -08:00
Ryan Houdek
fa8c35feba Docs: Update for release FEX-2311 (tag: FEX-2311)
2023-11-07 09:42:16 -08:00
Alyssa Rosenzweig
73958b9163 OpcodeDispatcher: Use DeriveOp
Replace every instance of the Op-overwrite pattern and ban that anti-pattern
from the codebase going forward. This will prevent lots of NZCV-related
regressions.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-07 12:05:00 -04:00