8226 Commits

Author SHA1 Message Date
Alyssa Rosenzweig
4c801d594a FEXLoader: Query runtime page size
This lets most of the ASM tests run on 16K Linux hosts which is good because I
have a Mac and I'm bad at computer.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-10-23 09:35:22 -04:00
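A minimal sketch of querying the page size at runtime instead of assuming 4 KiB, using POSIX `sysconf`. The helper names `QueryHostPageSize` and `AlignUpToPage` are hypothetical and are not taken from FEXLoader; the 4 KiB fallback is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: returns the host page size as reported at runtime.
inline std::size_t QueryHostPageSize() {
  const long Size = sysconf(_SC_PAGESIZE);
  // sysconf returns -1 on error; fall back to 4 KiB in that unlikely case.
  return Size > 0 ? static_cast<std::size_t>(Size) : 4096;
}

// Hypothetical helper: rounds Value up to the next multiple of PageSize.
// PageSize must be a non-zero power of two.
inline std::size_t AlignUpToPage(std::size_t Value, std::size_t PageSize) {
  assert(PageSize != 0 && (PageSize & (PageSize - 1)) == 0);
  return (Value + PageSize - 1) & ~(PageSize - 1);
}
```

On a 16K host, `AlignUpToPage(1, QueryHostPageSize())` would yield 16384 rather than the 4096 a hardcoded constant would give.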
Ryan Houdek
d4403edea9 OpcodeDispatcher: Updates COMIS to eliminate scalar moves
This was one of the few things that managed to hit the previously
removed optimization. Just fix the OpcodeDispatcher instead.
2023-10-21 21:33:07 -07:00
Ryan Houdek
06ef012fb2 FEXCore: Fixes bug in vector ZextAndMaskingElimination pass
With the previous RCLSE pass optimization that fixes store->load
forwarding, this pass started optimizing harder.

This hit a bug with this vmov removal that previously didn't get hit.
In particular this would eliminate vmov IR operations even if they were
zero extending a vector.
Since we have dramatically cleaned up the amount of vmov IR operations
we are generating, remove this optimization entirely. In the games I
tested, the only game that hit this "optimization" was Ender Lilies and
it started generating broken code for the single block of instructions
that did.

Adds a unit test for this case in case it comes back in the future
for some reason.

Fixes an issue where Ender Lilies would flash the screen to black every
time an enemy hit the player character.
2023-10-21 21:21:14 -07:00
Mai
8f8f37684a
Merge pull request #3213 from Sonicadvance1/fix_repres
JITArm64: Fixes bug in rpres scalar operations
2023-10-22 05:05:14 +02:00
Ryan Houdek
7140b8d901 InstCountCI: Update for RPRES fix 2023-10-21 15:29:11 -07:00
Ryan Houdek
d5beba9423 JITArm64: Fixes bug in rpres scalar operations
Noticed this during code investigation, these two operations were
swapped.

Would have caused issues if anything supported RPRES today.
2023-10-21 15:24:43 -07:00
Ryan Houdek
826e15aea9 unittests/ASM: Adds dpps/dppd broadcast mask tests
Ensures that the optimization around the broadcast mask is correct.
2023-10-20 18:15:43 +02:00
Ryan Houdek
14e80ce228 InstCountCI: Update for DPPS/DPPD
Adds some new destination broadcast masks to ensure we handle most of
them.
2023-10-20 18:15:43 +02:00
Ryan Houdek
165d3d3d4d Arm64JIT: Fixes VDupElement so it respects 64-bit vector duping
In some cases when we want the upper bits to be zero, this is the
desired behaviour.
2023-10-20 18:15:43 +02:00
Ryan Houdek
887200e571 OpcodeDispatcher: Optimize 128-bit DPPS and DPPD
These instructions aren't super amazing due to the fact that they have
both a source mask and a destination duplication mask.

Set up a case where we can generate more optimal code in /most/ cases.

There are a few that still fall down a "bad" path for the result
broadcast but in most cases they are optimal. Still to be seen what
games typically use the broadcast mask as.

AVX in its infinite wisdom expanded DPPS to 256-bit, while leaving DPPD
to only support 128-bit still. This leaves the original implementation
alone for 256-bit DPPS since I don't want to break it.

This is another instruction that gets a free optimization when
SVE-128bit is supported!
2023-10-20 18:02:27 +02:00
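For reference, the two masks mentioned above can be modelled in scalar code. This is a sketch of the architectural behaviour of 128-bit DPPS as commonly documented, not the IR that FEX generates; `ReferenceDPPS` and `ExampleUsage` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar reference model of 128-bit DPPS (hypothetical helper, not FEX code).
// Imm[7:4] is the source mask: which lane products feed the sum.
// Imm[3:0] is the destination mask: which lanes receive the sum; the rest are zeroed.
inline std::array<float, 4> ReferenceDPPS(const std::array<float, 4>& A,
                                          const std::array<float, 4>& B,
                                          std::uint8_t Imm) {
  float Prod[4];
  for (int i = 0; i < 4; ++i) {
    Prod[i] = (Imm & (1u << (4 + i))) ? A[i] * B[i] : 0.0f;
  }
  // Pairwise summation order, matching the documented operation.
  const float Sum = (Prod[0] + Prod[1]) + (Prod[2] + Prod[3]);

  std::array<float, 4> Result{};  // value-initialized to all zeros
  for (int i = 0; i < 4; ++i) {
    if (Imm & (1u << i)) Result[i] = Sum;
  }
  return Result;
}

// 0xF1 sums all four products and writes the sum to lane 0 only.
inline void ExampleUsage() {
  const std::array<float, 4> A{1.0f, 2.0f, 3.0f, 4.0f};
  const std::array<float, 4> B{5.0f, 6.0f, 7.0f, 8.0f};
  const auto R = ReferenceDPPS(A, B, 0xF1);
  assert(R[0] == 70.0f && R[1] == 0.0f);
}
```

A destination mask of 0xF broadcasts the sum to every lane, which is the kind of case the broadcast optimization has to get right.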
Ryan Houdek
2c0bc0654d IR: Adds new VFAddV operation
SVE added this instruction natively, we can take advantage of it on
SVE-128bit systems which is quite nice.

Will be used soon.
2023-10-19 16:38:11 +02:00
Ryan Houdek
b3d76bd2f1 IR: Adds DPPS and DPPD source masks
This will get used for these instructions soon
2023-10-19 16:36:19 +02:00
Ryan Houdek
2e694412f4
Merge pull request #3211 from lioncash/ext
VectorOps: Handle SVE VExtr a little better
2023-10-19 16:04:02 +02:00
Lioncache
d84577c36c VectorOps: Handle SVE VExtr a little better
If the source registers don't alias the destination, then we can
safely move the lower bits over to it without using a temporary.
2023-10-19 15:11:23 +02:00
Ryan Houdek
cf9c2aa72c
Merge pull request #3206 from Sonicadvance1/fix_syscall
Linux: Fixes issue with *at syscalls with absolute paths not working
2023-10-19 15:05:34 +02:00
Ryan Houdek
1cb8e4891c
Merge pull request #3210 from lioncash/fcadd
VectorOps: Handle SVE VFCADD a little better
2023-10-19 15:05:15 +02:00
Lioncache
24f2796141 VectorOps: Handle SVE VFCADD a little better
If no registers alias, then we can move the first source directly into the
destination and then perform the FCADD operation as opposed to using a
temporary.
2023-10-19 14:48:46 +02:00
Tony Wasserka
cb215b5f21 FEXLinuxTests/thunks: Add assume_compatible_data_layout tests 2023-10-19 12:49:00 +02:00
Tony Wasserka
0cf2695772 FEXLinuxTests/thunks: Add tests for opaque types 2023-10-19 12:49:00 +02:00
Tony Wasserka
6a6886305e Thunks/gen: Add assume_compatible/is_opaque annotations
These annotations allow for a given type or parameter to be treated as
"compatible" even if data layout analysis can't infer this automatically.

assume_compatible_data_layout is more powerful than is_opaque, since it
allows for structs containing members of a certain type to be automatically
inferred as "compatible".

Conversely however, is_opaque enforces that the underlying data is never
accessed directly, since non-pointer uses of the type would still be
detected as "incompatible".
2023-10-19 12:49:00 +02:00
Tony Wasserka
5ef7537e61 unittests/thunks: Add ptr_passthrough tests 2023-10-19 12:49:00 +02:00
Tony Wasserka
167fe85cc3 Thunks: Implement ptr_passthrough annotation
This annotation can be used for data types that can't be repacked
automatically even with custom repack annotations. With ptr_passthrough,
the types are wrapped in guest_layout and passed to the host like that.
2023-10-19 12:49:00 +02:00
Tony Wasserka
cf65747667 Thunks: Introduce an intermediate guest_layout wrapper to unpack callback arguments
This will be used later to aid automatic struct repacking.
2023-10-19 12:48:59 +02:00
Tony Wasserka
27bb28b47f Thunks: Carry annotations in callback wrappers of host functions
Previously, two functions with the same signature would always be wrapped
in the same logic. This change allows customizing one function with
annotations while leaving the other one unchanged.
2023-10-19 12:48:59 +02:00
Tony Wasserka
a00da800e7 Thunks: Rename funcptr_types to thunked_funcptrs
This reflects its purpose slightly better, particularly since future patches
will add more information to this object.
2023-10-19 12:48:59 +02:00
Tony Wasserka
bf835e80ac Thunks: Bump compiler requirements to C++20 2023-10-19 12:48:59 +02:00
Tony Wasserka
8f246b206b
Merge pull request #3209 from neobrain/refactor_revert_vulkan_reorder 2023-10-19 12:45:14 +02:00
Ryan Houdek
3c5c23bf36
Merge pull request #3208 from lioncash/avg
VectorOps: Handle SVE VURAvg a little better
2023-10-19 12:38:56 +02:00
Tony Wasserka
5bcfaf4b9f Thunks/vulkan: Revert reordering changes from 180d16af7a99fb8e6b7105f06a2c11d9fdb9b4e3
These interfere heavily with ongoing work. Let's reapply the reordering
once the dust has settled instead.
2023-10-19 12:31:33 +02:00
Lioncache
1f6c6345d9 VectorOps: Handle SVE VURAvg a little better
We can perform fewer moves by checking for scenarios where aliasing
occurs. Since addition is commutative (usually, general-case anyway),
order of inputs doesn't strictly matter here.
2023-10-19 12:14:12 +02:00
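The commutativity argument can be checked on a scalar model. The sketch below assumes VURAvg is an unsigned rounding average per element, (a + b + 1) >> 1 computed without overflow; that reading of the IR op is an assumption, and `URAvg8` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of an 8-bit unsigned rounding average.
// The sum is widened so that 255 + 255 + 1 does not wrap.
inline std::uint8_t URAvg8(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>((static_cast<unsigned>(A) + B + 1u) >> 1);
}

// Swapping the operands never changes the result, so either input may
// occupy the destination register.
inline bool IsCommutativeFor(std::uint8_t A, std::uint8_t B) {
  return URAvg8(A, B) == URAvg8(B, A);
}
```

Because the operand order does not matter in this model, a destination that aliases the second input can be treated the same way as one that aliases the first.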
Ryan Houdek
93792577eb
Merge pull request #3207 from lioncash/div
VectorOps: Handle SVE VFDiv a little better
2023-10-19 11:53:45 +02:00
Lioncache
3d23cd5765 VectorOps: Handle SVE VFDiv a little better
In the event no source vectors alias the destination,
we can just move the first source vector into it and
then perform the divide without needing to move afterward.
2023-10-19 11:45:35 +02:00
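The aliasing check behind this family of SVE changes can be sketched as a small decision function for a destructive, non-commutative operation of the form Dst = Dst op Src2. This is an illustrative model only; `Plan` and `PlanDestructiveOp` are hypothetical names, and registers are plain integer indices here.

```cpp
#include <cassert>

// Hypothetical description of the instruction sequence to emit.
enum class Plan {
  InPlace,       // op Dst, Dst, Src2
  MoveThenOp,    // mov Dst, Src1; op Dst, Dst, Src2
  UseTemporary,  // mov Tmp, Src1; op Tmp, Tmp, Src2; mov Dst, Tmp
};

// Chooses a plan for a destructive op whose first operand is also its output.
inline Plan PlanDestructiveOp(int Dst, int Src1, int Src2) {
  // Register indices are non-negative in this model.
  assert(Dst >= 0 && Src1 >= 0 && Src2 >= 0);
  if (Dst == Src1) return Plan::InPlace;
  // No aliasing at all: Src1 can be moved into Dst without clobbering Src2.
  if (Dst != Src2) return Plan::MoveThenOp;
  // Dst aliases Src2: moving Src1 into Dst first would destroy Src2.
  return Plan::UseTemporary;
}
```

For a commutative operation the last case can usually swap the operands instead of using a temporary, which is the extra freedom that the max/min and average commits rely on.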
Ryan Houdek
fcc239552c Linux: Fixes issue with *at syscalls with absolute paths not working
When a syscall from the *at series is provided an FD but the path is
absolute then dirfd should be ignored. We weren't correctly doing this.
Now, if the path is absolute, we set the dirfd argument to the special
AT_FDCWD value.
Fixes #3204
2023-10-19 09:48:50 +02:00
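A minimal sketch of the rule described above: when the path is absolute, the caller's dirfd is replaced with `AT_FDCWD` before the syscall is forwarded. `SelectDirFD` is a hypothetical helper and not the function FEX uses.

```cpp
#include <cassert>
#include <fcntl.h>  // AT_FDCWD

// Hypothetical helper: picks the dirfd to forward for an *at syscall.
// POSIX specifies that dirfd is ignored when the path is absolute, so
// substituting AT_FDCWD keeps the behaviour the guest expects.
inline int SelectDirFD(int DirFD, const char* Path) {
  assert(Path != nullptr);  // this sketch does not model null paths
  if (Path[0] == '/') {
    return AT_FDCWD;
  }
  return DirFD;
}
```

With this helper, `SelectDirFD(5, "/etc/hostname")` returns `AT_FDCWD`, while `SelectDirFD(5, "relative/file")` keeps the original descriptor.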
Ryan Houdek
8238de024f
Merge pull request #3205 from lioncash/max
VectorOps: Handle SVE VSMax/VSMin and VUMax/VUMin paths a little better
2023-10-18 19:24:35 +02:00
Lioncache
39e658f02a VectorOps: Handle more VUMin SVE cases better
We can avoid needing to use movprfx here by moving
directly into the destination when possible and just
doing the UMIN directly.
2023-10-18 18:48:13 +02:00
Lioncache
e89dd27f2a VectorOps: Handle more VSMin SVE cases better
We can avoid needing to use movprfx here by moving
directly into the destination when possible and just
doing the SMIN directly.
2023-10-18 18:48:13 +02:00
Lioncache
f85fae0041 VectorOps: Handle more VUMax SVE cases better
We can avoid needing to use movprfx here by moving
directly into the destination when possible and just
doing the UMAX directly.

Also expands the unsigned max tests to test values with
the sign bit set to ensure all behavior is caught.
2023-10-18 18:48:12 +02:00
Lioncache
65eec673fc VectorOps: Handle more VSMax SVE cases better
Since SMAX performs a comparison and returns the max value regardless
of how the operands are provided, we can check for when the second
input aliases the destination.
2023-10-18 18:48:03 +02:00
Ryan Houdek
5c93a085d2
Merge pull request #3203 from lioncash/movs
OpcodeDispatcher: Handle SSE vector moves into themselves a little better
2023-10-18 16:28:45 +02:00
Lioncache
4b356a7c2c OpcodeDispatcher: Have MOVNTSD go down the non-temporal path
For some reason this was using the regular unaligned path.
2023-10-18 14:59:02 +02:00
Lioncache
2b67f87054 OpcodeDispatcher: Handle SSE vector moves into themselves a little better
Obviously, it's silly to do this, but we should still be generating
optimal code for this case (which is none at all).
2023-10-18 14:58:57 +02:00
Ryan Houdek
1ea40ae676
Merge pull request #3201 from neobrain/fix_flt_thunks_64bit_only
FEXLinuxTests: Temporarily limit thunk test execution to 64-bit guests
2023-10-18 12:40:29 +02:00
Ryan Houdek
e0ef32e0bf
Merge pull request #3202 from Sonicadvance1/oopsies_vulkan
Thunks: Oops deleted an entry point
2023-10-18 12:38:05 +02:00
Ryan Houdek
a2b53c8eb0 Thunks: Oops deleted an entry point
Moving some entries around I managed to delete one.
Fixes Vulkan thunks.
2023-10-18 12:21:28 +02:00
Tony Wasserka
21b6cccb4e FEXLinuxTests: Temporarily limit thunk test execution to 64-bit guests
Thunking isn't fully functional on 32-bit guests currently, so non-trivial
tests would currently hang in that context.
2023-10-18 12:09:10 +02:00
Tony Wasserka
d539829251 FEXLinuxTests: Drop .32/.64 suffixes from test names 2023-10-18 12:09:10 +02:00
Ryan Houdek
ef321e4bf8
Merge pull request #3200 from lioncash/mov
OpcodeDispatcher: Remove unnecessary 128-bit truncating moves from StoreResult
2023-10-17 12:12:48 +02:00
Lioncache
47a0f14537 OpcodeDispatcher: Remove unnecessary 128-bit truncating moves from StoreResult
Removes the truncating move that we perform inside the StoreResult
function and instead delegates the responsibility to the instruction
implementations themselves.

This removes a lot of redundant moves that occur on 128-bit variants
of AVX instructions.

Also fixes a weird case where we were handling 128-bit SVE
in VBroadcastFromMem when we already have AdvSIMD instructions
that will perform the zero-extension behavior for us.
2023-10-17 11:07:04 +02:00
Ryan Houdek
6d39f369b0
Merge pull request #3199 from lioncash/loadops
OpcodeDispatcher: Put extra LoadSource options in a struct
2023-10-16 09:42:27 +02:00
Lioncache
2304cfc530 OpcodeDispatcher: Remove prefixing from MemoryAccessType enum
Since this is an enum class, we don't need to add a prefix.
2023-10-16 03:10:33 +02:00
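To illustrate why the prefix is redundant: enumerators of a scoped enum are only reachable through the enum's name, so they cannot clash with other identifiers. The enumerator names below are made up for the example and are not the actual members of `MemoryAccessType`; `LegacyAccessType` and `IsNonTemporal` are hypothetical as well.

```cpp
#include <cassert>
#include <cstdint>

// Unscoped enum: enumerators leak into the enclosing scope,
// so a prefix is the usual way to avoid name clashes.
enum LegacyAccessType : std::uint8_t { ACCESS_DEFAULT, ACCESS_NONTEMPORAL };

// Scoped enum: enumerators must be qualified, so no prefix is needed.
enum class MemoryAccessType : std::uint8_t { Default, NonTemporal };

inline bool IsNonTemporal(MemoryAccessType Type) {
  return Type == MemoryAccessType::NonTemporal;
}

inline void ExampleUsage() {
  // The qualified name already says which enum the value belongs to.
  assert(IsNonTemporal(MemoryAccessType::NonTemporal));
  assert(!IsNonTemporal(MemoryAccessType::Default));
}
```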