FEX-Emu/FEX

mirror of https://github.com/FEX-Emu/FEX.git synced 2025-02-24 08:42:31 +00:00

Author	SHA1	Message	Date
Alyssa Rosenzweig	4c801d594a	FEXLoader: Query runtime page size This lets most of the ASM tests run on 16K Linux hosts which is good because I have a Mac and I'm bad at computer. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2023-10-23 09:35:22 -04:00
Ryan Houdek	d4403edea9	OpcodeDispatcher: Updates COMIS to eliminate scalar moves This was one of the few things that managed to hit the previously removed optimization. Just fix the OpcodeDispatcher instead.	2023-10-21 21:33:07 -07:00
Ryan Houdek	06ef012fb2	FEXCore: Fixes bug in vector `ZextAndMaskingElimination` pass With the previous RCLSE pass optimization that fixes store->load forwarding, this pass started optimizing harder. This hit a bug with this vmov removal that previously didn't get hit. In particular this would eliminate vmov IR operations even if they were zero extending a vector. Since we have dramatically cleaned up the number of vmov IR operations we are generating, remove this optimization entirely. In the games I tested, the only game that hit this "optimization" was Ender Lilies and it started generating broken code for the single block of instructions that did. Adds a unit test for this case just in case it comes back in the future for some reason. Fixes an issue where Ender Lilies would flash the screen to black every time an enemy hit the player character.	2023-10-21 21:21:14 -07:00
Mai	8f8f37684a	Merge pull request #3213 from Sonicadvance1/fix_repres JITArm64: Fixes bug in rpres scalar operations	2023-10-22 05:05:14 +02:00
Ryan Houdek	7140b8d901	InstCountCI: Update for RPRES fix	2023-10-21 15:29:11 -07:00
Ryan Houdek	d5beba9423	JITArm64: Fixes bug in rpres scalar operations Noticed this during code investigation, these two operations were swapped. Would have caused issues if anything supported RPRES today.	2023-10-21 15:24:43 -07:00
Ryan Houdek	826e15aea9	unittests/ASM: Adds dpps/dppd broadcast mask tests Ensures that the optimization around the broadcast mask is correct.	2023-10-20 18:15:43 +02:00
Ryan Houdek	14e80ce228	InstCountCI: Update for DPPS/DPPD Adds some new destination broadcast masks to ensure we handle most of them.	2023-10-20 18:15:43 +02:00
Ryan Houdek	165d3d3d4d	Arm64JIT: Fixes VDupElement so it respects 64-bit vector duping In some cases when we want the upper bits to be zero, this is the desired behaviour	2023-10-20 18:15:43 +02:00
Ryan Houdek	887200e571	OpcodeDispatcher: Optimize 128-bit DPPS and DPPD These instructions aren't super amazing due to the fact that they have both a source mask and a destination duplication mask. Setup a case where we can generate more optimal code in /most/ cases. There are a few that still fall down a "bad" path for the result broadcast but in most cases they are optimal. Still to be seen what games typically use the broadcast mask as. AVX in its infinite wisdom expanded DPPS to 256-bit, while leaving DPPD to only support 128-bit still. This leaves the original implementation alone for 256-bit DPPS since I don't want to break it. This is another instruction that gets a free optimization when SVE-128bit is supported!	2023-10-20 18:02:27 +02:00
Ryan Houdek	2c0bc0654d	IR: Adds new VFAddV operation SVE added this instruction natively, we can take advantage of it on SVE-128bit systems which is quite nice. Will be used soon.	2023-10-19 16:38:11 +02:00
Ryan Houdek	b3d76bd2f1	IR: Adds DPPS and DPPD source masks This will get used for these instructions soon	2023-10-19 16:36:19 +02:00
Ryan Houdek	2e694412f4	Merge pull request #3211 from lioncash/ext VectorOps: Handle SVE VExtr a little better	2023-10-19 16:04:02 +02:00
Lioncache	d84577c36c	VectorOps: Handle SVE VExtr a little better If the source registers don't alias the destination, then we can safely move the lower bits over to it without using a temporary.	2023-10-19 15:11:23 +02:00
Ryan Houdek	cf9c2aa72c	Merge pull request #3206 from Sonicadvance1/fix_syscall Linux: Fixes issue with *at syscalls with absolute paths not working	2023-10-19 15:05:34 +02:00
Ryan Houdek	1cb8e4891c	Merge pull request #3210 from lioncash/fcadd VectorOps: Handle SVE VFCADD a little better	2023-10-19 15:05:15 +02:00
Lioncache	24f2796141	VectorOps: Handle SVE VFCADD a little better If no registers alias, then we can move the first source directly into the destination and then perform the FCADD operation as opposed to using a temporary.	2023-10-19 14:48:46 +02:00
Tony Wasserka	cb215b5f21	FEXLinuxTests/thunks: Add assume_compatible_data_layout tests	2023-10-19 12:49:00 +02:00
Tony Wasserka	0cf2695772	FEXLinuxTests/thunks: Add tests for opaque types	2023-10-19 12:49:00 +02:00
Tony Wasserka	6a6886305e	Thunks/gen: Add assume_compatible/is_opaque annotations These annotations allow for a given type or parameter to be treated as "compatible" even if data layout analysis can't infer this automatically. assume_compatible_data_layout is more powerful than is_opaque, since it allows for structs containing members of a certain type to be automatically inferred as "compatible". Conversely however, is_opaque enforces that the underlying data is never accessed directly, since non-pointer uses of the type would still be detected as "incompatible".	2023-10-19 12:49:00 +02:00
Tony Wasserka	5ef7537e61	unittests/thunks: Add ptr_passthrough tests	2023-10-19 12:49:00 +02:00
Tony Wasserka	167fe85cc3	Thunks: Implement ptr_passthrough annotation This annotation can be used for data types that can't be repacked automatically even with custom repack annotations. With ptr_passthrough, the types are wrapped in guest_layout and passed to the host like that.	2023-10-19 12:49:00 +02:00
Tony Wasserka	cf65747667	Thunks: Introduce an intermediate guest_layout wrapper to unpack callback arguments This will be used later to aid automatic struct repacking.	2023-10-19 12:48:59 +02:00
Tony Wasserka	27bb28b47f	Thunks: Carry annotations in callback wrappers of host functions Previously, two functions with the same signature would always be wrapped in the same logic. This change allows customizing one function with annotations while leaving the other one unchanged.	2023-10-19 12:48:59 +02:00
Tony Wasserka	a00da800e7	Thunks: Rename funcptr_types to thunked_funcptrs This reflects its purpose slightly better, particularly since future patches will add more information to this object.	2023-10-19 12:48:59 +02:00
Tony Wasserka	bf835e80ac	Thunks: Bump compiler requirements to C++20	2023-10-19 12:48:59 +02:00
Tony Wasserka	8f246b206b	Merge pull request #3209 from neobrain/refactor_revert_vulkan_reorder	2023-10-19 12:45:14 +02:00
Ryan Houdek	3c5c23bf36	Merge pull request #3208 from lioncash/avg VectorOps: Handle SVE VURAvg a little better	2023-10-19 12:38:56 +02:00
Tony Wasserka	5bcfaf4b9f	Thunks/vulkan: Revert reordering changes from 180d16af7a99fb8e6b7105f06a2c11d9fdb9b4e3 These interfere heavily with ongoing work. Let's reapply the reordering once the dust has settled instead.	2023-10-19 12:31:33 +02:00
Lioncache	1f6c6345d9	VectorOps: Handle SVE VURAvg a little better We can perform fewer moves by checking for scenarios where aliasing occurs. Since addition is commutative (usually, general-case anyway), order of inputs doesn't strictly matter here.	2023-10-19 12:14:12 +02:00
Ryan Houdek	93792577eb	Merge pull request #3207 from lioncash/div VectorOps: Handle SVE VFDiv a little better	2023-10-19 11:53:45 +02:00
Lioncache	3d23cd5765	VectorOps: Handle SVE VFDiv a little better In the event no source vectors alias the destination, we can just move the first source vector into it and then perform the divide without needing to move afterward.	2023-10-19 11:45:35 +02:00
Ryan Houdek	fcc239552c	Linux: Fixes issue with *at syscalls with absolute paths not working When a syscall from the *at series is provided an FD but the path is absolute, then dirfd should be ignored. We weren't correctly doing this. Now, if the path is absolute, we set the dirfd argument to the special AT_FDCWD. Fixes #3204	2023-10-19 09:48:50 +02:00
Ryan Houdek	8238de024f	Merge pull request #3205 from lioncash/max VectorOps: Handle SVE VSMax/VSMin and VUMax/VUMin paths a little better	2023-10-18 19:24:35 +02:00
Lioncache	39e658f02a	VectorOps: Handle more VUMin SVE cases better We can avoid needing to use movprfx here by moving directly into the destination when possible and just doing the UMIN directly	2023-10-18 18:48:13 +02:00
Lioncache	e89dd27f2a	VectorOps: Handle more VSMin SVE cases better We can avoid needing to use movprfx here by moving directly into the destination when possible and just doing the SMIN directly.	2023-10-18 18:48:13 +02:00
Lioncache	f85fae0041	VectorOps: Handle more VUMax SVE cases better We can avoid needing to use movprfx here by moving directly into the destination when possible and just doing the UMAX directly. Also expands the unsigned max tests to test values with the sign bit set to ensure all behavior is caught.	2023-10-18 18:48:12 +02:00
Lioncache	65eec673fc	VectorOps: Handle more VSMax SVE cases better Since SMAX performs a comparison and returns the max value regardless of how the operands are provided, we can check for when the second input aliases the destination.	2023-10-18 18:48:03 +02:00
Ryan Houdek	5c93a085d2	Merge pull request #3203 from lioncash/movs OpcodeDispatcher: Handle SSE vector moves into themselves a little better	2023-10-18 16:28:45 +02:00
Lioncache	4b356a7c2c	OpcodeDispatcher: Have MOVNTSD go down the non-temporal path For some reason this was using the regular unaligned path.	2023-10-18 14:59:02 +02:00
Lioncache	2b67f87054	OpcodeDispatcher: Handle SSE vector moves into themselves a little better Obviously, it's silly to do this, but we should still be generating optimal code for this case (which is none at all).	2023-10-18 14:58:57 +02:00
Ryan Houdek	1ea40ae676	Merge pull request #3201 from neobrain/fix_flt_thunks_64bit_only FEXLinuxTests: Temporarily limit thunk test execution to 64-bit guests	2023-10-18 12:40:29 +02:00
Ryan Houdek	e0ef32e0bf	Merge pull request #3202 from Sonicadvance1/oopsies_vulkan Thunks: Oops deleted an entry point	2023-10-18 12:38:05 +02:00
Ryan Houdek	a2b53c8eb0	Thunks: Oops deleted an entry point Moving some entries around I managed to delete one. Fixes Vulkan thunks.	2023-10-18 12:21:28 +02:00
Tony Wasserka	21b6cccb4e	FEXLinuxTests: Temporarily limit thunk test execution to 64-bit guests Thunking isn't fully functional on 32-bit guests currently, so non-trivial tests would currently hang in that context.	2023-10-18 12:09:10 +02:00
Tony Wasserka	d539829251	FEXLinuxTests: Drop .32/.64 suffixes from test names	2023-10-18 12:09:10 +02:00
Ryan Houdek	ef321e4bf8	Merge pull request #3200 from lioncash/mov OpcodeDispatcher: Remove unnecessary 128-bit truncating moves from StoreResult	2023-10-17 12:12:48 +02:00
Lioncache	47a0f14537	OpcodeDispatcher: Remove unnecessary 128-bit truncating moves from StoreResult Removes the truncating move that we perform inside the StoreResult function and instead delegates the responsibility to the instruction implementations themselves. This removes a lot of redundant moves that occur on 128-bit variants of AVX instructions. Also fixes a weird case where we were handling 128-bit SVE in VBroadcastFromMem when we already have AdvSIMD instructions that will perform the zero-extension behavior for us.	2023-10-17 11:07:04 +02:00
Ryan Houdek	6d39f369b0	Merge pull request #3199 from lioncash/loadops OpcodeDispatcher: Put extra LoadSource options in a struct	2023-10-16 09:42:27 +02:00
Lioncache	2304cfc530	OpcodeDispatcher: Remove prefixing from MemoryAccessType enum Since this is an enum class, we don't need to add a prefix.	2023-10-16 03:10:33 +02:00
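
Commit fcc239552c describes the rule for the *at syscall family: when the path is absolute, the dirfd argument is ignored. A minimal sketch of that rule follows. The helper name `EffectiveDirFD` is hypothetical and this is not FEX's actual code; it only assumes the POSIX `AT_FDCWD` constant from `<fcntl.h>`.

```cpp
#include <fcntl.h>      // AT_FDCWD
#include <string_view>

// Hypothetical helper (not from the FEX source tree).
// Returns the dirfd that should be forwarded to the host *at syscall.
// For an absolute path the kernel ignores dirfd, so substituting
// AT_FDCWD keeps a guest FD out of the host call.
inline int EffectiveDirFD(int dirfd, std::string_view path) {
  const bool is_absolute = !path.empty() && path.front() == '/';
  return is_absolute ? AT_FDCWD : dirfd;
}
```

With this sketch, `EffectiveDirFD(5, "/etc/passwd")` yields `AT_FDCWD`, while a relative path such as `"relative/file"` keeps the caller's dirfd.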

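Commit 4c801d594a has FEXLoader query the page size at runtime so that tests can run on 16K hosts. One common way to do this on Linux is `sysconf(_SC_PAGESIZE)`; the sketch below assumes that approach and is not taken from the FEX source. The names `QueryPageSize` and `AlignUpToPage`, and the 4096-byte fallback, are illustrative assumptions.

```cpp
#include <unistd.h>   // sysconf, _SC_PAGESIZE
#include <cstdint>

// Hypothetical helper: ask the host for its page size instead of
// hard-coding 4096. Falls back to 4096 (an assumption) if the query fails.
inline uint64_t QueryPageSize() {
  const long size = sysconf(_SC_PAGESIZE);
  return size > 0 ? static_cast<uint64_t>(size) : 4096;
}

// Rounds value up to the next multiple of page_size.
// page_size must be a power of two.
inline uint64_t AlignUpToPage(uint64_t value, uint64_t page_size) {
  return (value + page_size - 1) & ~(page_size - 1);
}
```

On a 16K host, a one-byte mapping request would round up to 16384 bytes rather than 4096.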

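Commit 887200e571 notes that DPPS carries both a source mask and a destination broadcast mask. As a scalar reference model of the 128-bit x86 instruction (not FEX's IR or generated code), the upper four bits of the immediate select which element products are summed and the lower four bits select which destination elements receive the sum. The function name `ReferenceDPPS` is hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar reference model of 128-bit DPPS.
// imm[7:4]: source mask, picks which a[i]*b[i] products enter the sum.
// imm[3:0]: destination mask, picks which result elements get the sum;
//           unselected elements are zero.
inline std::array<float, 4> ReferenceDPPS(const std::array<float, 4>& a,
                                          const std::array<float, 4>& b,
                                          uint8_t imm) {
  std::array<float, 4> products{};
  for (int i = 0; i < 4; ++i) {
    if (imm & (1u << (4 + i))) {
      products[i] = a[i] * b[i];
    }
  }
  // Pairwise summation order: (p0 + p1) + (p2 + p3).
  const float sum = (products[0] + products[1]) + (products[2] + products[3]);

  std::array<float, 4> result{};
  for (int i = 0; i < 4; ++i) {
    if (imm & (1u << i)) {
      result[i] = sum;
    }
  }
  return result;
}
```

For example, an immediate of `0xF1` sums all four products and writes the sum only to element 0, while `0x3F` sums the first two products and broadcasts the sum to every element.
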
8226 Commits