At the moment we always run ctest with the maximum number of CPUs. If
TEST_JOB_COUNT is undefined we keep the current behaviour, otherwise we
honour it.
So to run ctest one test at a time, use
`cmake ... -DTEST_JOB_COUNT=1`
We had failed to enable these implementations for the
`ExtendedMemOperand` helpers. The non-helper forms were already
implemented and are already tested in CI; these helpers just never got
updated.
Noticed this when running libaom's SSE4.1 tests, where it managed to
execute a pmovzxbq instruction with a reg+reg memory source, which was
breaking the test results.
There are /very/ few vector register operations that access only 8 or
16 bits of a vector, so this flew under the radar for quite a while.
Fixes libaom's unit tests.
Also adds a unit test using SSE4.1 pmovzxbq to ensure we support the
reg+reg case, plus a few other instructions to cover 8-bit and 16-bit
vector loads and stores.
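For illustration only (this isn't the FEX unit test, and `widen_low_bytes` is a made-up name), this is the kind of SSE4.1 intrinsics pattern that produces the problematic encoding; with optimizations on, compilers typically fold the load into pmovzxbq with a [base + index] (reg+reg) memory source:

```cpp
#include <smmintrin.h> // SSE4.1 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical kernel: zero-extend two bytes at buf[i] into the two 64-bit
// lanes of an XMM register. Assumes at least 16 readable bytes at buf + i so
// the unfolded 128-bit load is also safe.
__m128i widen_low_bytes(const uint8_t* buf, size_t i) {
  __m128i bytes = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf + i));
  // Optimizing compilers usually fold the load into the conversion, emitting
  // something like: pmovzxbq xmm0, word ptr [rdi + rsi]
  return _mm_cvtepu8_epi64(bytes);
}
```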
We didn't have a unit test for this and we weren't implementing it at
all; we accidentally treated it as vmovhps/vmovhpd. Once again caught by
the libaom intrinsics unit tests.
This is a very minor performance change. Cortex CPUs that support SVE
fuse movprfx+<instruction> to remove two cycles and a dependency from
the backend.
Because of this, converting from ASIMD mov+bsl to SVE movprfx+bsl is a
minor win, saving two cycles and a dependency on Cortex-A710 and
Cortex-A715. It is slightly less of a win on Cortex-A720/A725 since
those support zero-cycle vector register renames, but it is still a win
on Cortex-X925 because that is an older core design that doesn't
support zero-cycle vector register renames.
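For context, bsl is a destructive bitwise select (one source shares a register with the destination), which is why the non-fused sequence needs a register copy in front of it. A minimal scalar model of the select itself, with made-up operand names:

```cpp
#include <cstdint>

// Bitwise select: take bits from `a` where `mask` is set, from `b` elsewhere.
// The vector forms are destructive, so emitting the operation with a distinct
// destination needs the destination seeded first (ASIMD mov, or SVE movprfx,
// which the listed cores can fuse with the following instruction).
uint64_t bitwise_select(uint64_t mask, uint64_t a, uint64_t b) {
  return (a & mask) | (b & ~mask);
}
```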
Very silly little thing.
The canonical way to generate a zeroed vector register in x86 is to XOR
the register with itself. Capture this and convert it to the canonical
zero register instead, which can get zero-cycle renamed on the latest
CPUs.
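A minimal sketch of the guest-side idiom (the function name is made up); both `_mm_setzero_si128()` and XORing a register with itself typically lower to the same self-XOR instruction:

```cpp
#include <emmintrin.h> // SSE2 intrinsics

// Canonical x86 zeroing idiom: XOR a register with itself (pxor xmm, xmm).
// The result is all-zero regardless of the incoming value, so the JIT can
// substitute its canonical zero register instead of emitting a real XOR.
__m128i zero_vector(__m128i v) {
  return _mm_xor_si128(v, v);
}
```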
x86 doesn't have a lot of non-temporal vector stores, but there are a
few of them:
- MMX: MOVNTQ
- SSE2: MOVNTDQ, MOVNTPS, MOVNTPD
- AVX: VMOVNTDQ (128-bit & 256-bit), VMOVNTPD
Additionally, SSE4a adds 32-bit and 64-bit scalar vector non-temporal
stores, which we keep as regular stores since ARM doesn't have matching
semantics for those.
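For context, a minimal sketch of how guest code typically reaches these stores through the stock Intel intrinsics (the copy routine itself is made up for illustration):

```cpp
#include <emmintrin.h> // SSE2: _mm_stream_si128 lowers to MOVNTDQ
#include <cstddef>
#include <cstdint>

// Hypothetical streaming copy: non-temporal stores bypass the cache via
// write-combining. dst must be 16-byte aligned and len a multiple of 16.
void stream_copy(uint8_t* dst, const uint8_t* src, size_t len) {
  for (size_t i = 0; i < len; i += 16) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
    _mm_stream_si128(reinterpret_cast<__m128i*>(dst + i), v); // MOVNTDQ
  }
  _mm_sfence(); // order the non-temporal stores before anything that follows
}
```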
Additionally, SSE4.1 adds non-temporal vector LOADS, which this change
doesn't touch:
- SSE4.1: MOVNTDQA
- AVX: VMOVNTDQA (128-bit)
- AVX2: VMOVNTDQA (256-bit)
Fixes #3364
Arm64's SVE load instruction can be minorly optimized in the case where
a base GPR isn't provided, as there is a form of the instruction that
doesn't require one.
The limitation of that form is that it doesn't support scaling at all,
so it only works if the offset scale is 1.
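As a sanity check on why the scale matters, here's a scalar model of what a gathered load computes (`gather_u64` is a hypothetical helper, not FEX code); the form without a base register uses the index values as-is, which only lines up when the scale is 1:

```cpp
#include <cstddef>
#include <cstdint>

// Scalar model of a gathered 64-bit load: lane i comes from
// base + index[i] * scale. A load form without a base register has to use the
// index values unmodified, so it can only be substituted when the base is
// absent and scale == 1.
void gather_u64(uint64_t* dst, const uint8_t* base, const uint64_t* index,
                size_t lanes, size_t scale) {
  for (size_t i = 0; i < lanes; ++i) {
    dst[i] = *reinterpret_cast<const uint64_t*>(base + index[i] * scale);
  }
}
```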
When FEX hits the optimal case where the destination isn't one of the
incoming sources (other than the incomingDest source), we can optimize
out two moves per 128-bit lane.
Cuts 256-bit non-SVE gather loads from 50 instructions down to 46.
Some applications don't measure rdtsc correctly and instead use cpuinfo
to get the CPU core's base clock speed, which for most x86 CPUs also
matches their cycle counter speed.
Did this as a quick test to see if it would help with `Unbound: Worlds
Apart` stuttering while BinaryNinja was disassembling the binary.
Turns out the game doesn't use cpuinfo for its cycle counter speed
determination, but it is good to implement this regardless.
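A rough sketch of the heuristic such applications rely on, assuming they scrape the advertised base clock out of /proc/cpuinfo's "model name" line (the exact parsing varies per application; this snippet is illustrative only):

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical sketch: read the advertised base clock, e.g. "... @ 3.60GHz",
// from /proc/cpuinfo and treat it as the cycle counter (TSC) frequency.
int main() {
  std::ifstream cpuinfo("/proc/cpuinfo");
  std::string line;
  while (std::getline(cpuinfo, line)) {
    if (line.find("model name") == std::string::npos) continue;
    auto at = line.rfind('@');
    auto ghz = line.rfind("GHz");
    if (at == std::string::npos || ghz == std::string::npos || ghz <= at) break;
    double base_ghz = std::stod(line.substr(at + 1, ghz - at - 1));
    std::cout << "Assumed cycle counter speed: " << base_ghz << " GHz\n";
    return 0;
  }
  std::cout << "No base clock advertised in cpuinfo\n";
  return 1;
}
```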
We can support a few combinations of guest and host vector sizes, with
the host and guest each being 128-bit or 256-bit:
- Host=128-bit, Guest=256-bit: the typical case now that AVX is
  implemented.
- Host=128-bit, Guest=128-bit: 32-bit guests, because we disable AVX
  there.
- Host=256-bit, Guest=128-bit: 32-bit guests under the vixl simulator.
- Host=256-bit, Guest=256-bit: 64-bit guests under the vixl simulator.
We cover all four combinations of guest and host vector register sizes!
Fixes a few assumptions that SVE256 = AVX256 basically.