In this mode, rather than the branch comparing its arguments and then jumping
based on the result, the branch jumps directly on a native condition code
evaluated from the NZCV value. This lets us map x86 branches to arm64 branches 1:1.
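As an illustrative sketch (not FEX's actual tables), the x86-to-arm64 condition
correspondence this relies on, valid once NZCV holds the result of a native cmp:

```python
# Sketch: x86 Jcc condition -> arm64 b.<cond>, valid when NZCV was
# produced by a native cmp. Arm's carry is the inverse of x86's
# borrow, so below/above-or-equal land on lo/hs rather than a
# literal CF test.
X86_TO_A64_COND = {
    "e":  "eq", "ne": "ne",
    "l":  "lt", "ge": "ge", "le": "le", "g":  "gt",
    "b":  "lo", "ae": "hs", "be": "ls", "a":  "hi",
    "s":  "mi", "ns": "pl", "o":  "vs", "no": "vc",
}
```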
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Usually better in practice. Some rotates regress slightly with this, but
they were already terrible.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
via the zero register. We really need a more generalized approach to taking
advantage of wzr, but this optimizes the special case I care about for seta
(saving a move to make the impl optimal).
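A toy model of the seta condition in terms of the NZCV bits (bit positions are
architectural; the helper name is hypothetical, not FEX's IR):

```python
# NZCV bit positions within the 32-bit flags value (architectural).
NZCV_N, NZCV_Z, NZCV_C, NZCV_V = 31, 30, 29, 28

def seta_from_nzcv(nzcv):
    # x86 "above" (CF=0 and ZF=0) corresponds to arm64 "hi"
    # (C set and Z clear) once the flags come from a native cmp.
    c = (nzcv >> NZCV_C) & 1
    z = (nzcv >> NZCV_Z) & 1
    return 1 if (c == 1 and z == 0) else 0
```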
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This will replace Select soon, as it lets us take advantage of
NZCV-generating instructions and it doesn't clobber NZCV.
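A sketch of the semantics, assuming a csel-style select (helper names are
illustrative, not FEX's IR):

```python
def cond_holds(cond, n, z, c, v):
    # arm64 condition predicates over the NZCV bits (architectural
    # definitions from the condition-code table).
    return {
        "eq": z == 1,            "ne": z == 0,
        "hs": c == 1,            "lo": c == 0,
        "mi": n == 1,            "pl": n == 0,
        "vs": v == 1,            "vc": v == 0,
        "hi": c == 1 and z == 0, "ls": c == 0 or z == 1,
        "ge": n == v,            "lt": n != v,
        "gt": z == 0 and n == v, "le": z == 1 or n != v,
    }[cond]

def nzcv_select(cond, nzcv, if_true, if_false):
    # Models a csel-style select: it reads NZCV but never writes it,
    # so the flags stay live across the select.
    return if_true if cond_holds(cond, *nzcv) else if_false
```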
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Do it explicitly for sve-256 and punt on optimizing, so we avoid regressing code
gen otherwise.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Rather than the context. Effectively a static register allocation scheme for
flags. This will let us optimize out a LOT of flag handling code, keeping things
in NZCV rather than needing to copy between NZCV and memory all the time.
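A toy before/after of what static allocation buys, with a hypothetical context
slot name (FEX's actual codegen differs):

```python
# Before: flags live in the context, so a flag-consuming branch
# round-trips memory on every use.
before = [
    "ldr w0, [ctx, #FLAGS_OFFSET]",  # reload the saved flags
    "msr nzcv, x0",                  # restore them into host NZCV
    "b.eq target",
]

# After: flags are statically allocated to host NZCV, so the branch
# consumes them directly with no memory traffic.
after = [
    "b.eq target",
]
```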
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Some opcodes only clobber NZCV under certain circumstances, and we don't yet
have a good way of encoding that. In the meantime, this hotfixes some would-be
instcountci regressions.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Again we need to handle this one specially because the dispatcher can't insert
restore code after the branch. It should be optimized in the near future, don't
worry.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Semantics differ markedly from the non-NZCV flags; splitting this out makes it a
lot easier to do things correctly imho. It gets the dest/src size correct
(important for spilling) and makes our existing opt passes skip this, which is
needed for correctness at the moment anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This fixes an issue where CPU tunables were ending up in the thunk
generator, which meant that if the *builder's* CPU didn't support all the
targeted features, it would crash with SIGILL. This was happening with
Canonical's runners because they typically only support ARMv8.2, while we
are compiling packages to run on ARMv8.4 devices.
cc: FEX-2311.1
SHA instructions are very large right now and cause register spilling
due to their codegen. Ender Lilies has a really large block in a
function called `sha1_block_data_order` that was causing FEX to spill
NZCV flags incorrectly. The assumption, which held true before NZCV
optimizations existed, was that every flag was either 1-bit in an 8-bit
container or plain 8-bit (the x87 TOP flag).
NZCV host flags broke this assumption by being 32-bit, which ended up breaking
when spilling situations were encountered.
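A minimal model of the fix, with hypothetical names: spill/fill must key off
the flag's container size instead of assuming a byte:

```python
# Container sizes in bytes; the pre-NZCV code effectively assumed
# every entry here was 1.
FLAG_CONTAINER_BYTES = {
    "cf": 1, "zf": 1, "sf": 1, "of": 1,  # 1-bit flags in 8-bit containers
    "x87_top": 1,                        # plain 8-bit flag
    "nzcv": 4,                           # host NZCV copy is 32-bit
}

def spill_size(flag):
    # A 32-bit NZCV spill done with a 1-byte store would truncate
    # the flags -- the class of bug hit in sha1_block_data_order.
    return FLAG_CONTAINER_BYTES[flag]
```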
Replace every instance of the Op overwrite pattern, and ban that anti-pattern
from the codebase in the future. This will prevent piles of NZCV related
regressions.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>