These should always be used in the dispatcher rather than the raw jumps they
translate to, as they ensure that flags are flushed. This eliminates a class of
bugs that will become much easier to hit with the new NZCV work.
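The commit title that names these helpers is not part of this text. The sketch below therefore uses a hypothetical `DispatcherSketch` class, with invented `Jump`, `RawJump` and `FlushFlags` members and a textual log in place of real code emission, to show the pattern: the helper flushes any pending flags first and only then emits the branch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: flags may be computed lazily and must be written back
// ("flushed") to NZCV before control leaves the current block.
class DispatcherSketch {
public:
  std::vector<std::string> Emitted;  // textual log of emitted operations

  // Mark that flags are pending, i.e. not yet materialized in NZCV.
  void SetFlagsPending() { FlagsPending = true; }

  // Raw jump: emits only the branch. Calling this directly is the bug class
  // the helper guards against, since pending flags would be silently lost.
  void RawJump(const std::string &Target) { Emitted.push_back("b " + Target); }

  // Helper: always flush pending flags, then emit the raw jump.
  void Jump(const std::string &Target) {
    FlushFlags();
    RawJump(Target);
  }

private:
  bool FlagsPending = false;

  void FlushFlags() {
    if (FlagsPending) {
      Emitted.push_back("flush_nzcv");
      FlagsPending = false;
    }
  }
};
```

Routing every jump through one helper means correctness no longer depends on each call site remembering to flush.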
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This PR has a bug in the flags calculation around REP LODS{B,W,D,Q}.
This test currently passes on main but fails on #3162.
The bug only occurs in 32-bit mode, not in 64-bit mode with the same test.
This should help diagnose the bugs in #3162.
When SubShift (LSL) has both sources constant, fold the calculation away.
Additionally, if an add has an immediate constant whose negation fits in
the ImmAddSub range, negate the constant and convert the add into a sub.
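A minimal sketch of both folds, assuming 64-bit operations and the AArch64 ADD/SUB (immediate) encoding, which accepts a 12-bit unsigned value optionally shifted left by 12. The function and type names are hypothetical. The rule of only rewriting when the original constant does not already encode is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// AArch64 ADD/SUB (immediate): 12-bit unsigned value, optionally LSL #12.
bool IsImmAddSub(uint64_t Imm) {
  return (Imm & ~uint64_t{0xFFF}) == 0 || (Imm & ~uint64_t{0xFFF000}) == 0;
}

// SubShift with LSL computes Src1 - (Src2 << Shift).
// If both sources are known constants, fold to a single constant.
std::optional<uint64_t> FoldSubShiftLSL(std::optional<uint64_t> Src1,
                                        std::optional<uint64_t> Src2,
                                        unsigned Shift) {
  assert(Shift < 64);
  if (!Src1 || !Src2) {
    return std::nullopt;  // not foldable, keep the IR op
  }
  return *Src1 - (*Src2 << Shift);  // unsigned wraparound, like 64-bit hw
}

enum class Op { Add, Sub };
struct AddSubImm {
  Op Kind;
  uint64_t Imm;
};

// add x, #C where C does not encode but -C does  ->  sub x, #(-C)
AddSubImm CanonicalizeAddImm(uint64_t Imm) {
  uint64_t Negated = uint64_t{0} - Imm;
  if (!IsImmAddSub(Imm) && IsImmAddSub(Negated)) {
    return {Op::Sub, Negated};
  }
  return {Op::Add, Imm};
}
```

For example, `add x0, x1, #-16` cannot be encoded directly, but `CanonicalizeAddImm` turns it into `sub x0, x1, #16`, avoiding a separate constant materialization.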
This optimizes the cases where the direction flag is known up front in an
instruction.
Previously this moved two constants, did a compare, and did a csel: four
instructions in total. It also clobbered NZCV, which we want to use for
other things.
The new codegen emits one constant and one subtract instruction, two
instructions in total, and doesn't touch NZCV.
More optimal!
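One way a single constant plus a shifted subtract can produce the string-op stride, assuming the direction flag is held as a 0/1 value in a register: `Size - (DF << (log2(Size) + 1))` gives `+Size` for DF=0 and `-Size` for DF=1. This is an illustration of the arithmetic under that assumption, not necessarily the exact lowering used here. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Old shape (4 instructions, clobbers NZCV):
//   mov a, #Size ; mov b, #-Size ; cmp df, #0 ; csel res, a, b, eq
int64_t StrideWithCsel(uint64_t DF, unsigned Size) {
  return DF == 0 ? int64_t(Size) : -int64_t(Size);
}

// Sketched new shape (2 instructions, leaves NZCV alone):
//   mov tmp, #Size ; sub res, tmp, df, lsl #(log2(Size) + 1)
int64_t StrideWithSubShift(uint64_t DF, unsigned Size) {
  assert(DF <= 1);
  assert(Size == 1 || Size == 2 || Size == 4 || Size == 8);
  unsigned Log2Size = 0;
  while ((1u << Log2Size) < Size) {
    ++Log2Size;
  }
  // Size - DF * 2 * Size: DF=0 -> +Size, DF=1 -> -Size (two's complement).
  uint64_t Result = uint64_t(Size) - (DF << (Log2Size + 1));
  return static_cast<int64_t>(Result);
}
```

Because the sketched shape never uses `cmp`, no NZCV state has to be saved around it. When DF is a known constant, the SubShift fold from the previous commit can reduce the whole expression to a constant.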
Audit the codebase and mark every instruction that implicitly clobbers flags, so
the dispatcher can give it special handling and spill NZCV before emitting it.
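A minimal sketch of that special handling, using a hypothetical `IROp` record and a textual emitter. When NZCV holds live flags, the emitter saves it to a scratch register before an op marked as an implicit clobber. Restoring it immediately afterwards is an assumption of this sketch, since a real dispatcher might instead reload lazily. The register name `x_tmp` is a placeholder.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical IR node with only the fields this sketch needs.
struct IROp {
  std::string Name;
  bool ImplicitFlagClobber;  // set by the audit, e.g. for Abs without CSSC
};

// Emit a block, spilling NZCV around every implicitly clobbering op while
// the flags in NZCV are live.
std::vector<std::string> EmitBlock(const std::vector<IROp> &Ops, bool NZCVLive) {
  std::vector<std::string> Out;
  for (const IROp &Op : Ops) {
    const bool Spill = NZCVLive && Op.ImplicitFlagClobber;
    if (Spill) {
      Out.push_back("mrs x_tmp, nzcv");  // save flags ahead of emitting
    }
    Out.push_back(Op.Name);
    if (Spill) {
      Out.push_back("msr nzcv, x_tmp");  // put the saved flags back
    }
  }
  return Out;
}
```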
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Many instructions clobber NZCV as a side effect but, from the IR's point of
view, are not meant to write the host flags. For example, Abs logically has no
side effects, but on hardware without CSSC it physically clobbers NZCV because
it is implemented with cmp/csneg. Add infrastructure to model this in the IR so
we can deal with it once we start using NZCV for other things.
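A sketch of the distinction being modeled, with a hypothetical opcode subset and a `HostFeatures` struct: one query for whether an op logically writes NZCV, and a separate, host-dependent query for whether its lowering clobbers NZCV anyway. FEAT_CSSC adds a scalar ABS instruction, so only the non-CSSC lowering of Abs is marked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opcode subset for illustration.
enum class IROpcode : uint8_t { Add, AddNZCV, Abs };

struct HostFeatures {
  bool SupportsCSSC;  // FEAT_CSSC provides a scalar ABS instruction
};

// Logical view: does the IR op intentionally define the host flags?
bool WritesNZCV(IROpcode Op) {
  return Op == IROpcode::AddNZCV;
}

// Physical view: does the host lowering clobber NZCV as a side effect,
// even though the IR op does not "write" flags?
bool ImplicitlyClobbersNZCV(IROpcode Op, const HostFeatures &HW) {
  switch (Op) {
  case IROpcode::Abs:
    // Without CSSC, Abs lowers to cmp + csneg, and cmp overwrites NZCV.
    return !HW.SupportsCSSC;
  case IROpcode::Add:
  case IROpcode::AddNZCV:  // an explicit write, not an implicit clobber
    return false;
  }
  return false;
}
```

Keeping the two queries separate lets the IR stay side-effect free for passes like dead code elimination, while the backend still knows when it must protect live flags.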
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This way we don't need to mark VInsertElement as an implicit clobber in the
common case. It only affects sve256, which doesn't exist yet.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This lets most of the ASM tests run on 16K-page Linux hosts, which is good
because I have a Mac and I'm bad at computer.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
With the previous RCLSE pass optimization that fixes store->load
forwarding, this pass started optimizing more aggressively.
This exposed a latent bug in the vmov removal that had not been hit before.
In particular, it would eliminate vmov IR operations even when they were
zero-extending a vector.
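A small model of why such a vmov is not a pure copy. A 256-bit register is represented as four 64-bit lanes (a hypothetical `Vec256`), and a vmov of a narrower source copies the low lanes and zeroes the rest. Dropping it leaves stale upper lanes. The `IsRemovableVMov` check only illustrates the condition the old removal violated. This commit removes the optimization entirely rather than adding such a check.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec256 = std::array<uint64_t, 4>;  // four 64-bit lanes

// Models a vmov of SrcBytes: copies the low bytes, zeroes everything above.
Vec256 VMov(const Vec256 &Src, unsigned SrcBytes) {
  assert(SrcBytes % 8 == 0 && SrcBytes <= 32);
  Vec256 Dst{};  // value-initialized: all lanes start at zero
  for (unsigned i = 0; i < SrcBytes / 8; ++i) {
    Dst[i] = Src[i];
  }
  return Dst;
}

// Only a full-width vmov is a plain copy that could be safely eliminated.
bool IsRemovableVMov(unsigned SrcBytes, unsigned RegBytes) {
  return SrcBytes == RegBytes;
}
```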
Since we have dramatically reduced the number of vmov IR operations we
generate, remove this optimization entirely. Of the games I tested, the
only one that hit this "optimization" was Ender Lilies, and it generated
broken code for the single block of instructions where it did.
This adds a unit test for this case, in case it comes back in the future
for some reason.
Fixes an issue where Ender Lilies would flash the screen to black every
time an enemy hit the player character.