Commit Graph

9883 Commits

Author SHA1 Message Date
Ryan Houdek
b3a7a973a1
AVX128: Extends the 32-bit index path to 128-bit operations
The codepath from #3826 only targeted 256-bit operations. This missed
the vpgatherdq/vgatherdpd 128-bit operations. By extending the codepath
to understand 128-bit operations, we now hit these instruction variants.

With this PR, we now have SVE128 codepaths that handle ALL variants of
x86 gather instructions! There are zero ASIMD fallbacks used in this
case!

Of course, depending on the instruction, the performance still leaves a
lot to be desired, and there is no way to emulate x86 TSO behaviour
without an ASIMD fallback, which we will likely need to add at some
point.

Based on #3836 until that is merged.
2024-07-08 18:44:07 -07:00
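
For reference, here is a minimal sketch of the 128-bit guest instruction forms the commit above refers to, written with AVX2 intrinsics. The table arrays and function names are illustrative only and are not FEX code:

```cpp
#include <immintrin.h>

// vpgatherdq/vgatherdpd xmm forms: 32-bit indices gathering 64-bit elements.
// Requires AVX2; arrays are placeholders.
long long table_q[256];
double table_d[256];

__m128i gather_dq(__m128i idx32) {
  // Two 64-bit integer loads addressed by the low two 32-bit indices.
  return _mm_i32gather_epi64(table_q, idx32, 8);
}

__m128d gather_dpd(__m128i idx32) {
  // Two double-precision loads addressed by the low two 32-bit indices.
  return _mm_i32gather_pd(table_d, idx32, 8);
}
```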
Mai
22b26696ba
Merge pull request #3836 from Sonicadvance1/optimize_sve_vpgatherdd
AVX128: Optimize the vpgatherdd/vgatherdps cases that would fall back to ASIMD
2024-07-08 21:43:36 -04:00
Ryan Houdek
495241f8ca
InstcountCI: Update for wide gather vpgatherdd SVE usage 2024-07-08 18:12:28 -07:00
Ryan Houdek
4afbfcae17
AVX128: Optimize the vpgatherdd/vgatherdps cases that would fall back to ASIMD
The introduction of the wide gathers in #3828 opened new avenues for
optimizing these cases, which would typically fall back to ASIMD. In the
cases where 32-bit SVE scaling doesn't fit, we can instead sign extend
the elements into double-width address registers.

This then feeds naturally into the SVE path, even though we end up
needing to allocate 512 bits worth of address registers. It is still
significantly better than the ASIMD path.

Relies on #3828 to be merged first
Fixes #3829
2024-07-08 18:12:28 -07:00
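
As a rough illustration of the approach described in the commit above, here is a scalar reference sketch (assumed semantics, not FEX's JIT code; the function name is hypothetical) of gathering 32-bit data once the indices have been sign extended into double-width address elements:

```cpp
#include <cstdint>
#include <cstddef>
#include <cstring>
#include <array>

// When the 32-bit scaled-offset SVE form can't express the x86 scale, sign
// extend each 32-bit index to a 64-bit address element and scale there instead.
std::array<uint32_t, 8> GatherDDRef(const uint8_t* base,
                                    const std::array<int32_t, 8>& idx,
                                    uint32_t scale /* 1, 2, 4 or 8 */) {
  std::array<uint32_t, 8> result{};
  for (std::size_t i = 0; i < idx.size(); ++i) {
    const int64_t wide_index = static_cast<int64_t>(idx[i]);  // sign extend to 64-bit
    const uint8_t* addr = base + wide_index * scale;          // double-width address element
    std::memcpy(&result[i], addr, sizeof(uint32_t));
  }
  return result;
}
```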
Mai
3627de4cbc
Merge pull request #3828 from Sonicadvance1/optimize_wide_gathers
AVX128: Optimize QPS/QD variant of gather loads!
2024-07-08 21:11:36 -04:00
Ryan Houdek
007c07e612
InstcountCI: Update for wide gathers 2024-07-08 17:19:18 -07:00
Ryan Houdek
ec7c8fd922
AVX128: Optimize QPS/QD variant of gather loads!
SVE has a special version of its gather instruction that gets similar
behaviour to x86's VGATHERQPS/VPGATHERQD instructions.

The quirk that the previous SVE implementation didn't handle, which
forced an ASIMD fallback, is that most gather instructions require the
data element size and the address element size to match. These x86
instructions use a 64-bit address size while loading 32-bit elements.
That matches this specific variant of the SVE instruction, but the data
is zero-extended once loaded, requiring us to shuffle it afterwards.

This isn't the worst, but the implementation is different enough that
stuffing it into the other gather load path would cause headaches.

Basically gets 32 instruction variants to use the SVE version!

Fixes #3827
2024-07-08 17:19:18 -07:00
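
To make the quirk above concrete, here is a scalar reference sketch of the VGATHERQPS/VPGATHERQD shape (assumed semantics, not FEX's code; names are hypothetical): 64-bit address elements gather 32-bit data, each element comes back zero-extended to 64 bits, and the results then have to be shuffled back down to packed 32-bit lanes.

```cpp
#include <cstdint>
#include <cstddef>
#include <cstring>
#include <array>

std::array<uint32_t, 2> GatherQDRef(const uint8_t* base,
                                    const std::array<int64_t, 2>& idx,
                                    uint32_t scale) {
  std::array<uint64_t, 2> widened{};  // what the SVE gather produces per lane
  for (std::size_t i = 0; i < idx.size(); ++i) {
    uint32_t element;
    std::memcpy(&element, base + idx[i] * scale, sizeof(element));
    widened[i] = element;  // 32-bit data zero-extended into a 64-bit lane
  }
  // The post-load shuffle: keep only the low 32 bits of each 64-bit lane.
  return {static_cast<uint32_t>(widened[0]), static_cast<uint32_t>(widened[1])};
}
```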
Ryan Houdek
c5a0ae7b34
IR: Adds new QPS gather load variant! 2024-07-08 17:19:18 -07:00
Ryan Houdek
4bd207ebf3
Arm64: Moves 128Bit gather ASIMD emulation to its own helper
It is going to get reused.
2024-07-08 17:19:18 -07:00
Tony Wasserka
45011234d9
Merge pull request #3845 from pmatos/TESTJOBCOUNTFix
Use nproc only if TEST_JOB_COUNT not specified
2024-07-08 22:31:23 +02:00
Paulo Matos
24017f379e Use nproc only if TEST_JOB_COUNT not specified 2024-07-08 21:38:56 +02:00
Mai
aad7656b38
Merge pull request #3826 from Sonicadvance1/scale_32bit_gather
AVX128: Extend 32-bit address indices when possible
2024-07-08 15:29:44 -04:00
Mai
95a9f32bf0
Merge pull request #3840 from Sonicadvance1/extend_vinsert128_tests
unittests: Extends vinsert{i,f}128 tests for garbage data
2024-07-07 13:39:20 -04:00
Mai
c4ae761a0e
Merge pull request #3841 from Sonicadvance1/add_missing_cpu_names
CPUID: Adds a few missing CPU names for new CPU cores
2024-07-07 13:38:27 -04:00
Ryan Houdek
0653b346e0
CPUID: Adds a few missing CPU names for new CPU cores
These should be making their way to market sooner rather than later, so
make sure we have the descriptor text for them.
2024-07-07 02:40:19 -07:00
Ryan Houdek
fa587398bd
unittests: Extends vinsert{i,f}128 tests for garbage data
Just to ensure we don't hit an issue with masking the immediate bits.

Fixes #3753
2024-07-07 02:16:21 -07:00
Ryan Houdek
6b67857151
InstcountCI: Adds a missing gather instruction invariant
Oops, must have accidentally deleted this while copying things around.
2024-07-06 18:32:36 -07:00
Ryan Houdek
81165f0c40
InstcountCI: Update for 32-bit gather sign extend optimization 2024-07-06 18:32:35 -07:00
Ryan Houdek
df40515087
AVX128: Extend 32-bit address indices when possible
When loading 256 bits of data with only 128 bits of address indices, we
can sign extend the source indices to 64-bit. This falls down the ideal
path for SVE, where each 128-bit lane loads data from addresses in a 1:1
element ratio.

This means we use the SVE path more often.

Based on top of #3825 because the prescaling behaviour was introduced
there. This implements its own prescaling when the sign extension occurs,
because ARM's SSHLL{,2} instruction gives us that for free.

This additionally fixes a bug where we were accidentally loading the top
128-bit half of the addresses for gathers when it was unnecessary, and
on the AVX256 side it was duplicating and doing some additional work
that it shouldn't have.

It'll be good to walk the commits when looking at this one, as there are
a couple of incremental changes that are easier to follow that way.

Fixes #3806
2024-07-06 18:32:35 -07:00
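
A small illustrative sketch of why SSHLL{,2} gives the prescaling for free (assumed semantics, not FEX's emitter code; the function name is hypothetical): the sign extension and the scale-by-shift happen in one step.

```cpp
#include <cstdint>
#include <cstddef>
#include <array>

// SSHLL-like step: sign extend each 32-bit index to 64 bits and shift left by
// an immediate, so shift = log2(scale) yields ready-to-use byte offsets.
std::array<uint64_t, 2> SshllLike(const std::array<int32_t, 2>& idx, unsigned shift) {
  std::array<uint64_t, 2> out{};
  for (std::size_t i = 0; i < idx.size(); ++i) {
    const uint64_t wide = static_cast<uint64_t>(static_cast<int64_t>(idx[i]));  // sign extend
    out[i] = wide << shift;  // the shift bakes in the x86 scale (e.g. 3 for scale 8)
  }
  return out;
}
```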
Ryan Houdek
c77922e3e5
InstcountCI: Update for previous fix 2024-07-06 18:32:35 -07:00
Ryan Houdek
0f9abe68b9
AVX128: Fixes accidentally loading high addr register when unnecessary
Was missing a clamp on the high half when encountering a 128-bit gather
instruction. This was causing us to unconditionally load the top half
when it was unnecessary.
2024-07-06 18:32:35 -07:00
Ryan Houdek
c168ee6940
Arm64: Implements VSSHLL{,2} IR ops 2024-07-06 18:32:35 -07:00
Ryan Houdek
0d4414fdd0
AVX128: Removes templated AddrElementSize and adds it as an argument
NFC
2024-07-06 18:32:35 -07:00
Ryan Houdek
968d5e0d8f
Merge pull request #3774 from bylaws/win-ci
FEXCore ARM64EC CI support
2024-07-06 18:22:57 -07:00
Ryan Houdek
635182b57c
Merge pull request #3832 from bylaws/wow64-wine
WOW64: Mark the FEX dll as a wine builtin
2024-07-06 17:58:00 -07:00
Ryan Houdek
9d0b6ce75e
Merge pull request #3835 from bylaws/ec-topdown
AllocatorHooks: Allocate from the top down on windows
2024-07-06 17:40:36 -07:00
Ryan Houdek
2fdd80fe3a
Merge pull request #3833 from bylaws/common-tso
Windows: Commonise TSOHandlerConfig
2024-07-06 17:38:45 -07:00
Ryan Houdek
dbac23b749
Merge pull request #3834 from bylaws/ec-amd64
Windows: Report as an AMD64 processor when targeting ARM64EC
2024-07-06 17:38:13 -07:00
Billy Laws
7fa7061aa5 Windows: Report as an AMD64 processor when targeting ARM64EC 2024-07-06 20:37:15 +00:00
Billy Laws
e45e631199 AllocatorHooks: Allocate from the top down on windows
FEX allocations can get in the way of allocations that are 4GB-limited
even in 64-bit mode (e.g. those from LuaJIT), so allocate starting from
the top of the address space to prevent conflicts.
2024-07-06 20:35:38 +00:00
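
A hedged sketch of the idea (not FEX's actual allocator hook; the function name is hypothetical): on Windows, MEM_TOP_DOWN asks the kernel to allocate from the high end of the address space, keeping these allocations clear of the low 4GB that guest allocators such as LuaJIT depend on.

```cpp
#include <windows.h>
#include <cstddef>

// Illustrative only: reserve and commit memory from the top of the address
// space so it stays out of the low 4GB region the guest may need.
void* AllocateTopDown(std::size_t size) {
  return VirtualAlloc(nullptr, size,
                      MEM_RESERVE | MEM_COMMIT | MEM_TOP_DOWN,
                      PAGE_READWRITE);
}
```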
Billy Laws
b21e77c1e0 Windows: Commonise TSOHandlerConfig 2024-07-06 19:20:49 +00:00
Billy Laws
ba33294225 WOW64: Mark the FEX dll as a wine builtin
Allows it to be automatically picked up by wine during prefix setup,
without a manual dll override.

Thanks to AndreRH for pointing me to this.
2024-07-06 19:19:36 +00:00
Billy Laws
97c21cc3a7 CI: Add ARM64EC build CI 2024-07-06 17:27:41 +01:00
Billy Laws
7d7e6f5326 CMake: Disable WOW64 module for ARM64EC 2024-07-06 17:27:41 +01:00
Billy Laws
5e15bd935e CMake: Disable glibc jemalloc for MinGW builds 2024-07-06 17:27:41 +01:00
Ryan Houdek
9bad09c45f
Merge pull request #3823 from alyssarosenzweig/bug/shl-var-small
Fix CF with small shifts
2024-07-06 01:33:57 -07:00
Ryan Houdek
47d077ff22
Merge pull request #3825 from Sonicadvance1/scale_64bit_gather
AVX128: Prescale addresses in gathers if possible
2024-07-05 19:10:43 -07:00
Ryan Houdek
bbf8dde3ca
Merge pull request #3824 from alyssarosenzweig/bug/rc2
OpcodeDispatcher: Fix 8/16-bit rcr masking
2024-07-05 17:01:16 -07:00
Ryan Houdek
6e8ca3bc6c
InstcountCI: Update for gather prescaling 2024-07-05 16:47:11 -07:00
Ryan Houdek
11a494d7b3
AVX128: Prescale addresses in gathers if possible
If the host supports SVE128, the address element size and data size are
both 64-bit, and the scale is not one of the two supported by SVE, then
prescale the addresses.

64-bit address overflow masks the top bits, so it is well defined that we
can scale the vector elements and still execute the SVE code path in
that case. This removes the ASIMD code paths from a lot of gathers.

Fixes #3805
2024-07-05 16:47:11 -07:00
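
To illustrate the overflow argument in the commit above, a scalar sketch (not the JIT code; the function name is hypothetical): x86 gather address math wraps modulo 2^64, so multiplying the 64-bit indices by the scale up front and then using SVE's unscaled 64-bit offset addressing computes the same addresses.

```cpp
#include <cstdint>
#include <cstddef>
#include <array>

std::array<uint64_t, 2> PrescaleIndices(const std::array<uint64_t, 2>& idx, uint64_t scale) {
  std::array<uint64_t, 2> out{};
  for (std::size_t i = 0; i < idx.size(); ++i) {
    out[i] = idx[i] * scale;  // unsigned multiply wraps mod 2^64, matching x86 address math
  }
  return out;
}
```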
Alyssa Rosenzweig
9b570de33f InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-05 18:44:21 -04:00
Ryan Houdek
b67343fc5a unittests: Adds a test for small shift flags calculation
Currently we calculate CF incorrectly in the case of small shifts with
large shift amounts.
2024-07-05 18:38:12 -04:00
Alyssa Rosenzweig
5a3c0eb83c OpcodeDispatcher: fix shl with 8/16-bit variable
The special case here lines up with the existing special case of using a
larger shift for a smaller result, so we can just grab CF from the larger result.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-05 18:38:12 -04:00
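
A hedged interpretation of the trick described in the commit above, as a scalar sketch (assumed semantics, not the actual OpcodeDispatcher code; the function name is hypothetical): perform the shift at the larger width, and the bit that crossed the small operand's boundary lands at a fixed position in the wide result, which is exactly CF.

```cpp
#include <cstdint>

bool SmallShlCF(uint32_t value, unsigned width /* 8 or 16 */, unsigned count) {
  count &= 0x1f;  // x86 masks shift counts to 5 bits even for 8/16-bit operands
  if (count == 0) {
    return false;  // real hardware leaves the flags untouched for a zero count
  }
  const uint32_t wide = value << count;  // shift performed at the larger width
  return (wide >> width) & 1u;           // last bit shifted out of the small operand
}
```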
Alyssa Rosenzweig
10391608a0 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-05 18:34:18 -04:00
Ryan Houdek
51c57cc5ae unittests: More rotate with carry unit tests
Looks like we missed some edge cases with small rotate-with-carry
operations. Adds even more unit tests.
2024-07-05 18:34:18 -04:00
Alyssa Rosenzweig
05e4678e65 OpcodeDispatcher: fix missing masking on smaller RCR
I probably broke this when working on eliminating crossblock liveness.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-05 18:34:18 -04:00
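
For context, a scalar sketch of the masking rule involved (standard documented x86 behaviour, not the dispatcher code itself; the function name is hypothetical): rotate-through-carry treats CF as an extra bit, so the effective count wraps at width + 1.

```cpp
#include <cstdint>

// x86 masks rotate counts to 5 bits; for RCL/RCR on 8/16-bit operands the
// effective count is then reduced modulo width + 1 because CF takes part in
// the rotation.
unsigned RcrEffectiveCount(unsigned count, unsigned width /* 8 or 16 */) {
  count &= 0x1f;
  return count % (width + 1);
}
```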
Alyssa Rosenzweig
0f0e402db4 OpcodeDispatcher: fix CF with 8/16-bit immediate
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-05 18:24:34 -04:00
Ryan Houdek
653bf04db0
Merge pull request #3819 from alyssarosenzweig/bug/rcr-smol
Fix 8/16-bit RCR
2024-07-05 12:49:23 -07:00
Ryan Houdek
b77a25b21a
Merge pull request #3818 from alyssarosenzweig/jit/shiftbymaskstozero
JIT: fix ShiftFlags masking
2024-07-05 12:49:16 -07:00
Alyssa Rosenzweig
9db6931cea InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-05 10:49:12 -04:00