This lets most of the ASM tests run on 16K Linux hosts, which is good because I
have a Mac and I'm bad at computer.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
With the previous RCLSE pass optimization that fixes store->load
forwarding, this pass started optimizing harder.
This exposed a bug in the vmov removal that previously wasn't hit.
In particular, it would eliminate vmov IR operations even if they were
zero extending a vector.
Since we have dramatically reduced the number of vmov IR operations
we are generating, remove this optimization entirely. In the games I
tested, the only game that hit this "optimization" was Ender Lilies, and
it generated broken code for the single block of instructions
that hit it.
Adds a unit test for this case just in case it comes back in the future
for some reason.
Fixes an issue where Ender Lilies would flash the screen to black every
time an enemy hit the player character.
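
To illustrate why removing a zero-extending vmov is unsafe, here is a minimal Python model. The 256-bit register width, the byte-list representation, and the helper names `vmov` and `is_removable` are assumptions made for this sketch; they are not FEX's actual IR or pass code.

```python
REG_BYTES = 32  # assumed 256-bit vector register, for illustration only


def vmov(src, size):
    """Model a vector move that copies the low `size` bytes and zeroes the rest."""
    assert len(src) == REG_BYTES
    return src[:size] + [0] * (REG_BYTES - size)


def is_removable(src, size):
    """A move may only be dropped if it leaves every byte of the register unchanged."""
    return vmov(src, size) == src


# Upper half holds stale non-zero data: the 16-byte move is NOT a no-op.
dirty = list(range(1, REG_BYTES + 1))
# Upper half is already zero: the same move changes nothing.
clean = vmov(dirty, 16)
```

In this model, `is_removable(dirty, 16)` is false because the move clears the upper 16 bytes, which is the case the removed optimization got wrong.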
These instructions aren't super amazing due to the fact that they have
both a source mask and a destination duplication mask.
Set up a case where we can generate more optimal code in /most/ cases.
There are a few that still fall down a "bad" path for the result
broadcast but in most cases they are optimal. Still to be seen what
games typically use the broadcast mask as.
AVX in its infinite wisdom expanded DPPS to 256-bit, while leaving DPPD
to only support 128-bit still. This leaves the original implementation
alone for 256-bit DPPS since I don't want to break it.
This is another instruction that gets a free optimization when
SVE-128bit is supported!
If no registers alias, then we can move the first source directly into the
destination and then perform the FCADD operation as opposed to using a
temporary.
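
The aliasing decision can be sketched as follows. This is an illustrative Python model, not the JIT's code: it assumes a destructive instruction form (`op zdn, zdn, zm`), and the function name, register names, and `tmp` register are hypothetical.

```python
def plan_destructive_op(op, dst, src1, src2):
    """Pick pseudo-instructions for dst = op(src1, src2) when the
    instruction overwrites its first operand and is NOT commutative."""
    if dst == src1:
        # Destination already holds the first source: operate in place.
        return [f"{op} {dst}, {dst}, {src2}"]
    if dst != src2:
        # No aliasing: move the first source into dst, then operate there.
        return [f"mov {dst}, {src1}", f"{op} {dst}, {dst}, {src2}"]
    # dst aliases the second source: moving src1 into dst would clobber
    # src2, so fall back to a temporary register.
    return [
        f"mov tmp, {src1}",
        f"{op} tmp, tmp, {src2}",
        f"mov {dst}, tmp",
    ]
```

With no aliasing, `plan_destructive_op("fcadd", "z0", "z1", "z2")` yields two instructions instead of the three needed on the temporary path.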
These annotations allow for a given type or parameter to be treated as
"compatible" even if data layout analysis can't infer this automatically.
assume_compatible_data_layout is more powerful than is_opaque, since it
allows for structs containing members of a certain type to be automatically
inferred as "compatible".
Conversely however, is_opaque enforces that the underlying data is never
accessed directly, since non-pointer uses of the type would still be
detected as "incompatible".
This annotation can be used for data types that can't be repacked
automatically even with custom repack annotations. With ptr_passthrough,
the types are wrapped in guest_layout and passed to the host like that.
Previously, two functions with the same signature would always be wrapped
in the same logic. This change allows customizing one function with
annotations while leaving the other one unchanged.
We can perform fewer moves by checking for scenarios where aliasing
occurs. Since addition is commutative (usually, general-case anyway),
order of inputs doesn't strictly matter here.
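
A minimal Python sketch of how commutativity helps here, assuming a destructive instruction form (`op zdn, zdn, zm`). The function and register names are hypothetical and only illustrate the operand swap.

```python
def plan_commutative_op(op, dst, src1, src2):
    """Pick pseudo-instructions for dst = op(src1, src2) when the
    instruction overwrites its first operand and IS commutative."""
    if dst == src1:
        # Destination already holds the first source.
        return [f"{op} {dst}, {dst}, {src2}"]
    if dst == src2:
        # Destination holds the second source: swap the operands
        # instead of moving anything, since op(a, b) == op(b, a).
        return [f"{op} {dst}, {dst}, {src1}"]
    # No aliasing: one move into dst, then operate in place.
    return [f"mov {dst}, {src1}", f"{op} {dst}, {dst}, {src2}"]
```

In this model no aliasing pattern needs a temporary register, and both aliasing cases need no move at all.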
In the event no source vectors alias the destination,
we can just move the first source vector into it and
then perform the divide without needing to move afterward.
When a syscall from the *at series is provided an FD but the path is
absolute, dirfd should be ignored. We weren't correctly doing this.
Now, if the path is absolute, set the dirfd argument to the special
AT_FDCWD.
Fixes #3204
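
The rule can be sketched in a few lines of Python. This is only a model of the behavior described above: `effective_dirfd` is a hypothetical helper name, and `AT_FDCWD` is defined locally with its Linux value.

```python
AT_FDCWD = -100  # Linux value of the special "current working directory" dirfd


def effective_dirfd(dirfd, path):
    """Return the dirfd an *at syscall should actually use.

    An absolute path makes dirfd irrelevant, so it is replaced with
    AT_FDCWD; a relative path is resolved against the given dirfd.
    """
    if path.startswith("/"):
        return AT_FDCWD
    return dirfd
```

For example, `effective_dirfd(5, "/etc/passwd")` returns `AT_FDCWD`, while `effective_dirfd(5, "data/file")` keeps the original descriptor.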
We can avoid needing to use movprfx here by moving
directly into the destination when possible and just
doing the UMAX directly.
Also expands the unsigned max tests to test values with
the sign bit set to ensure all behavior is caught.
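
To show why test values with the sign bit set matter, here is a small Python model of per-element unsigned and signed max. The helper names and the 8-bit default element width are assumptions chosen for illustration.

```python
def to_signed(value, bits=8):
    """Reinterpret the low `bits` of value as a two's complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def umax(a, b, bits=8):
    """Unsigned max: compare the raw bit patterns."""
    mask = (1 << bits) - 1
    return max(a & mask, b & mask)


def smax(a, b, bits=8):
    """Signed max: compare as two's complement, return the raw bit pattern."""
    mask = (1 << bits) - 1
    return max(to_signed(a, bits), to_signed(b, bits)) & mask
```

Inputs without the sign bit set give the same answer from both functions, so a test using only such values could not tell an unsigned max from a signed one. With `0x80` and `0x01`, the unsigned result is `0x80` and the signed result is `0x01`.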
Since SMAX performs a comparison and returns the max value regardless
of how the operands are provided, we can check for when the second
input aliases the destination.
Removes the truncating move that we perform inside the StoreResult
function and instead delegates the responsibility to the instruction
implementations themselves.
This removes a lot of redundant moves that occur on 128-bit variants
of AVX instructions.
Also fixes a weird case where we were handling 128-bit SVE
in VBroadcastFromMem when we already have AdvSIMD instructions
that will perform the zero-extension behavior for us.