The two halves are provided as two uint64_t values and must not be sign extended individually. Treat them as uint64_t until they are combined into a single int128_t. Fixes long signed divide.
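A minimal sketch of the combining step, assuming GCC/Clang's `__int128` extension (hypothetical helper names, not FEX's actual code):

```cpp
#include <cstdint>

// Widen both halves as unsigned so neither half gets sign extended on its
// own; only the combined 128-bit value is treated as signed.
__int128 CombineHalves(uint64_t High, uint64_t Low) {
  unsigned __int128 Combined = (static_cast<unsigned __int128>(High) << 64) | Low;
  return static_cast<__int128>(Combined);
}

// A long signed divide then operates on the combined value.
int64_t SignedDivide(uint64_t High, uint64_t Low, int64_t Divisor) {
  return static_cast<int64_t>(CombineHalves(High, Low) / Divisor);
}
```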
At the moment we always run ctest with the maximum number of CPUs. If TEST_JOB_COUNT is undefined, the current behaviour is kept; otherwise it is honoured.
Therefore, to run ctest one test at a time, use
`cmake ... -DTEST_JOB_COUNT=1`
We didn't have a unit test for this and we weren't implementing it at
all.
We treated it as vmovhps/vmovhpd accidentally. Once again caught by the
libaom Intrinsics unit tests.
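For reference, the low/high half-move semantics look like this with standard SSE2 intrinsics, assuming the instruction at issue is the low-half counterpart (vmovlps/vmovlpd):

```cpp
#include <emmintrin.h>

// movlpd/vmovlpd: replace element [0] from memory, keep element [1].
__m128d LoadLow(__m128d Vec, const double* Mem) {
  return _mm_loadl_pd(Vec, Mem);
}

// movhpd/vmovhpd: keep element [0], replace element [1] from memory.
__m128d LoadHigh(__m128d Vec, const double* Mem) {
  return _mm_loadh_pd(Vec, Mem);
}
```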
This is a very minor performance change. Cortex CPUs that support SVE fuse
movprfx+<instruction>, removing two cycles and a dependency from the backend.
Because of this, converting from ASIMD mov+bsl to SVE movprfx+bsl is a minor
win, saving two cycles and a dependency on Cortex-A710 and A715. It is
slightly less of a win on Cortex-A720/A725 because those cores support
zero-cycle vector register renames, but it is still a win on Cortex-X925,
which doesn't support zero-cycle vector register renames.
Very silly little thing.
The canonical way to generate a zeroed vector register in x86 is to xor it
with itself. Capture this and convert it to the canonical zero register
instead, which can get zero-cycle renamed on the latest CPUs.
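A minimal sketch of the pattern capture, using a hypothetical simplified IR node rather than FEX's actual IR API:

```cpp
#include <cstdint>

// Hypothetical, simplified IR node; the real FEX IR looks different.
struct VectorOp {
  enum class Kind { Xor, Zero };
  Kind Type;
  uint32_t Src1, Src2;
};

// Peephole: "xor reg, reg" / "vpxor x, x, x" always produces zero, so
// replace it with a canonical zero vector that the Arm64 backend can map to
// a zero register the CPU can rename at zero cost.
VectorOp FoldSelfXor(VectorOp Op) {
  if (Op.Type == VectorOp::Kind::Xor && Op.Src1 == Op.Src2) {
    return {VectorOp::Kind::Zero, 0, 0};
  }
  return Op;
}
```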
x86 doesn't have a lot of non-temporal vector stores but we do have a
few of them.
- MMX: MOVNTQ
- SSE2: MOVNTDQ, MOVNTPS, MOVNTPD
- AVX: VMOVNTDQ (128-bit & 256-bit), VMOVNTPD
Additionally, SSE4a adds 32-bit and 64-bit scalar vector non-temporal
stores, which we keep as regular stores since ARM doesn't have matching
semantics for those.
Additionally, SSE4.1 adds non-temporal vector LOADS, which this doesn't
touch:
- SSE4.1: MOVNTDQA
- AVX: VMOVNTDQA (128-bit)
- AVX2: VMOVNTDQA (256-bit)
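For reference, this is how guest code typically reaches these instructions, using standard Intel intrinsics (nothing FEX-specific; the streaming load needs SSE4.1 enabled):

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2: _mm_stream_si128 -> MOVNTDQ
#include <smmintrin.h>  // SSE4.1: _mm_stream_load_si128 -> MOVNTDQA

void StreamCopy(__m128i* Dst, __m128i* Src, size_t Count) {
  for (size_t i = 0; i < Count; ++i) {
    __m128i Data = _mm_stream_load_si128(&Src[i]);  // non-temporal load
    _mm_stream_si128(&Dst[i], Data);                // non-temporal store
  }
  _mm_sfence();  // order the streaming stores before later accesses
}
```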
Fixes #3364
Arm64's SVE load instructions can be slightly optimized in the case that a
base GPR register isn't provided, as there is a version of the instruction
that doesn't require one.
The limitation of that form is that it doesn't support scaling at all, so it
only works if the offset scale is 1.
When FEX hits the optimal case where the destination isn't one of the
incoming sources (other than the incomingDest source), we can optimize out
two moves per 128-bit lane.
This cuts 256-bit non-SVE gather loads from 50 instructions down to 46.
We can support a few combinations of guest and host vector sizes:
- Host: 128-bit or 256-bit
- Guest: 128-bit or 256-bit
The typical case is Host = 128-bit and Guest = 256-bit now that AVX is
implemented.
On 32-bit this changes to Host=128-bit and Guest=128-bit because we
disable AVX.
In the vixl simulator, 32-bit turns into Host=256-bit and Guest=128-bit, and
64-bit turns into Host=256-bit and Guest=256-bit.
We cover all four combinations of guest and host vector register sizes!
This basically fixes a few assumptions that SVE256 = AVX256.
In regular SIB land, an index register encoding of 0b100 encodes to "no
register"; this feature lets you get SIB encodings without an index register
for flexibility.
In VSIB encoding this isn't the expected behaviour; instead there are no
encodings where the index register is missing, allowing you to encode all
sixteen registers as an index register.
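A rough sketch of the decode difference, with hypothetical helpers rather than FEX's actual decoder:

```cpp
#include <cstdint>

// Regular SIB: a 4-bit index of 0b0100 (index=100 with REX.X/VEX.X clear)
// means "no index register"; every other value selects a GPR.
int DecodeSIBIndex(uint8_t IndexField) {
  return IndexField == 0b0100 ? -1 : IndexField;
}

// VSIB: there is no "no index" encoding; all sixteen values select a vector
// index register, so 0b0100 must map to xmm4/ymm4 instead of aborting.
int DecodeVSIBIndex(uint8_t IndexField) {
  return IndexField;
}
```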
This was causing an abort in `AVX128_LoadVSIB` because the index turned into
an invalid register.
Working instruction:
`vgatherdps ymm2, dword [eax+ymm5*4], ymm7`
Broken instruction:
`vgatherdps ymm0, dword [eax+ymm4*4], ymm7`
This fixes a crash in libfmod, which uses gathers in the wild, and with it a
crash in Ender Lilies.
When the source or destination is a register, the address size override
doesn't apply. We were accidentally applying it on all sources regardless of
type, which caused us to zero-extend on operations that aren't affected by
the address size override.
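A sketch of the corrected rule, with a hypothetical helper rather than FEX's decoder API:

```cpp
#include <cstdint>

// The 0x67 address size override narrows addresses, so it should only be
// applied when the operand is actually a memory address; register operands
// must be left untouched.
uint64_t ApplyAddressSizeOverride(uint64_t Value, bool IsMemoryOperand, bool HasOverride) {
  if (IsMemoryOperand && HasOverride) {
    return static_cast<uint32_t>(Value);  // zero-extend the 32-bit address
  }
  return Value;
}
```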
This fixes the OpenSSL cert error in every application, but most
importantly Steam.