The two halves are provided as two uint64_t values that must not be
sign-extended when combined. Treat them as uint64_t until they are
merged into a single int128_t. Fixes long signed divide.
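A minimal sketch of the idea, using GCC/Clang's `__int128` extension
(names are illustrative, not the actual implementation):
```cpp
#include <cstdint>

// The pitfall: widening the low half as a *signed* value smears its
// top bit across the upper 64 bits before the real high half is OR'd
// in. Keeping both halves unsigned while combining avoids that.
__int128 CombineHalves(uint64_t High, uint64_t Low) {
  unsigned __int128 Combined =
      (static_cast<unsigned __int128>(High) << 64) | Low;
  // Only reinterpret as signed once all 128 bits are in place.
  return static_cast<__int128>(Combined);
}
```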
This fails on current main with blocksize=500 due to the mentioned RA
bug; it passes with blocksize=1.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
At the moment we always run ctest with the maximum number of CPUs. If
TEST_JOB_COUNT is undefined, the current behaviour is kept; otherwise
ctest honours TEST_JOB_COUNT.
Therefore, to run ctest one test at a time, use
`cmake ... -DTEST_JOB_COUNT=1`
We had failed to enable these implementations for the
`ExtendedMemOperand` helpers. We had already implemented the non-helper
forms, which are tested in CI; these helpers just never got updated.
Noticed this when running libaom's SSE4.1 tests, which managed to
execute a pmovzxbq instruction with a reg+reg memory source, breaking
the test results.
There are /very/ few vector register operations that access only 8-bit
or 16-bit elements, so this flew under the radar for quite a while.
Fixes libaom's unit tests.
Also adds a unit test using SSE4.1 pmovzxbq to ensure we support the
reg+reg case, along with a few other instructions to exercise 8-bit and
16-bit vector loads and stores.
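For reference, a minimal sketch of what pmovzxbq with a reg+reg memory
source computes (plain C++, names illustrative, not the emulator's
implementation):
```cpp
#include <cstdint>

struct Vec128 {
  uint64_t Lanes[2];
};

// pmovzxbq with a reg+reg address: read two bytes from [Base + Index]
// and zero-extend each byte into its own 64-bit lane.
Vec128 Pmovzxbq(const uint8_t* Base, uint64_t Index) {
  const uint8_t* Addr = Base + Index; // the reg+reg memory source
  return Vec128{{Addr[0], Addr[1]}};  // byte -> uint64_t, zero-extended
}
```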
We didn't have a unit test for this, and we weren't implementing it at
all: we accidentally treated it as vmovhps/vmovhpd. Once again caught
by the libaom Intrinsics unit tests.
This is a very minor performance change. Cortex CPUs that support SVE
fuse movprfx+<instruction> pairs, removing two cycles and a dependency
from the backend.
Because of this, converting from ASIMD mov+bsl to SVE movprfx+bsl is a
minor win, saving two cycles and a dependency on Cortex-A710 and A715.
It is slightly less of a win on Cortex-A720/A725 because those cores
support zero-cycle vector register renames, but it is still a win on
Cortex-X925 because that is an older core design that doesn't support
zero-cycle vector register renames.
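A rough sketch of the before/after sequences in AArch64 assembly
(register choices illustrative, and operand roles differ slightly
between the ASIMD and SVE2 encodings):
```
// ASIMD form: bsl is destructive, so a non-destructive select needs an
// explicit register copy first.
mov  v0.16b, v1.16b
bsl  v0.16b, v2.16b, v3.16b

// SVE2 form: movprfx provides the copy; Cortex-A710/A715 fuse it with
// the following bsl, removing the copy's two cycles and its dependency.
movprfx z0, z1
bsl     z0.d, z0.d, z2.d, z3.d
```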
Very silly little thing.
The canonical way to generate a zeroed vector register in x86 is to xor
it with itself. Capture this idiom and convert it to the canonical zero
register instead, which can get zero-cycle renamed on the latest CPUs.
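An illustrative peephole sketch of the idiom capture; the IR types and
names here are hypothetical, not the actual implementation:
```cpp
#include <cstdint>
#include <optional>

enum class Op { VXor, Other };

struct Inst {
  Op Code;
  uint32_t Dest, Src1, Src2;
};

// `pxor xmm0, xmm0` (and friends) always produce zero regardless of
// the register's prior contents, so the xor can be rewritten as a move
// from a canonical zero vector, which the newest cores rename in zero
// cycles instead of executing.
std::optional<uint32_t> MatchXorZeroIdiom(const Inst& I) {
  if (I.Code == Op::VXor && I.Src1 == I.Src2) {
    return I.Dest; // rewrite as: Dest = canonical zero
  }
  return std::nullopt;
}
```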
x86 doesn't have a lot of non-temporal vector stores, but there are a
few:
- MMX: MOVNTQ
- SSE2: MOVNTDQ, MOVNTPS, MOVNTPD
- AVX: VMOVNTDQ (128-bit & 256-bit), VMOVNTPD
Additionally, SSE4a adds 32-bit and 64-bit scalar vector non-temporal
stores, which we keep as regular stores since ARM doesn't have matching
semantics for those.
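As a hedged illustration of what the stores listed above mean on the
x86 side (x86-only code, names illustrative): `_mm_stream_si128`
compiles to MOVNTDQ. How the emulator lowers it on AArch64 is not shown
here.
```cpp
#include <emmintrin.h> // SSE2

void StreamStore128(__m128i* Dest, __m128i Value) {
  // MOVNTDQ: a write-combining store that bypasses the cache hierarchy
  // and is weakly ordered relative to normal stores.
  _mm_stream_si128(Dest, Value);
}
```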
Additionally, SSE4.1 adds non-temporal vector LOADS, which this change
doesn't touch:
- SSE4.1: MOVNTDQA
- AVX: VMOVNTDQA (128-bit)
- AVX2: VMOVNTDQA (256-bit)
Fixes #3364