FEX-Emu/FEX

mirror of https://github.com/FEX-Emu/FEX.git synced 2025-01-09 23:30:37 +00:00

Author	SHA1	Message	Date
Ryan Houdek	ce4b252e5c	InstCountCI: Stop disabling AVX if SVE256 is disabled.	2024-06-26 15:06:03 -07:00
Ryan Houdek	031d56de35	HostFeatures: Enables AVX unconditionally	2024-06-26 15:03:21 -07:00
Ryan Houdek	3cdaf6736b	InstcountCI: Update for SVE256 FMA implementation	2024-06-26 14:56:01 -07:00
Ryan Houdek	b5e696b3cb	CPUID: Implement support for XCR0 when AVX is enabled This enables AVX, AVX2, FMA3 for the entire CPUID! ```bash $ FEX_HOSTFEATURES=enableavx,enableavx2 ./Bin/FEXInterpreter /usr/bin/cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Cortex-A78AE stepping : 0 microcode : 0x0 cpu MHz : 3000 cache size : 512 KB physical id : 0 siblings : 12 core id : 0 cpu cores : 12 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht tm syscall nx mmxext fxsr_opt rdtscp lm 3dnow 3dnowext constant_tsc art rep_good nopl xtoplogy nonstop_tsc cpuid tsc_known_freq pni pclmulqdq dtes64 monitor tm2 ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx hypervisor lahf_lm cmp_legacy extapic abm 3dnowprefetch tce fsgsbase bmi1 avx2 smep bmi2 erms invpcid adx clflushopt clwb sha_ni clzero arat vpclmulqdq rdpid fsrm bugs : bogomips : 8000.0 TLB size : 2560 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ``` Notice avx, avx2, and fma	2024-06-26 14:56:01 -07:00
Ryan Houdek	43aef377d7	HostFeatures: Allow enabling AVX without SVE256	2024-06-26 14:56:01 -07:00
Ryan Houdek	add0e7a8db	HostFeatures: Removes distinction between AVX and AVX2 We now no longer care about AVX versions, consolidate them into a single config option which enables both.	2024-06-26 14:56:01 -07:00
Ryan Houdek	52e541d453	Unittests: Stop using AVX2 flag	2024-06-26 14:56:01 -07:00
Mai	a031a49546	Merge pull request #3767 from Sonicadvance1/avx128_fix_wide_shift AVX128: Fixes wide shifts	2024-06-26 17:29:09 -04:00
Alyssa Rosenzweig	4d821b8dd8	Merge pull request #3765 from Sonicadvance1/avx128_f16c AVX128: F16C support	2024-06-26 17:25:05 -04:00
Ryan Houdek	f277025c9a	AVX128: Fixes wide shifts During refactoring this was missed and rerunning unittests locally caught it. 256-bit operations get their shift only from the lower half of the vector register.	2024-06-26 14:16:39 -07:00
Ryan Houdek	ba28e6f82e	unittests: Adds vcvtps2ph tests that use mxcsr	2024-06-26 14:08:20 -07:00
Ryan Houdek	3a89df9bed	AVX128: Implement support for F16C	2024-06-26 14:05:12 -07:00
Ryan Houdek	f6a0866fbb	IR: Split Vector_FToF2 in to VFCVTL2 and VCVTFN2 I forgot in the narrowing case we need to be careful about insert. No IR op used Vector_FToF2 with narrowing.	2024-06-26 14:03:41 -07:00
Ryan Houdek	756fa2ecc5	Merge pull request #3766 from alyssarosenzweig/opt/f16c-round Optimize vcvtps2ph	2024-06-26 14:03:24 -07:00
Alyssa Rosenzweig	cf834aa6da	InstCountCI: Update Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-26 16:46:21 -04:00
Alyssa Rosenzweig	d2324f4a93	OpcodeDispatcher: optimize vcvtps2ph We can avoid a LOT of pointless work with some dedicated IR ops for specifically overriding the round mode. Small behaviour change here: we no longer reset FTZ. I think this is a bug fix? But if it's not it's not hard to fix. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-26 16:46:21 -04:00
Ryan Houdek	6226c7f4f3	Merge pull request #3757 from Sonicadvance1/avx_16 AVX128: Implement support for gathers	2024-06-26 13:29:58 -07:00
Ryan Houdek	991ecd558e	InstcountCI: Update for SVE256 gathers!	2024-06-26 16:00:53 -04:00
Ryan Houdek	a4fa3a460e	OpcodeDispatcher: Implement AVX gathers with SVE256 Just to ensure we still have feature parity.	2024-06-26 16:00:53 -04:00
Ryan Houdek	77ba708933	AVX128: Implement support for gather load instructions This is the last family of instructions that we needed to implement for AVX2 to be properly advertised!	2024-06-26 16:00:53 -04:00
Ryan Houdek	662d50a966	X86Tables: Describe VPGather in the VEX tables	2024-06-26 16:00:53 -04:00
Ryan Houdek	5472d1cc04	Arm64: Implement VLoadVectorGatherMasked operation This does a gather load three ways, SVE256, SVE128, and ASIMD. This operation is a bit special since it can't quite handle all gather loadstores in the 256-bit case and requires the frontend to decompose the operation in the case that the striding hits a mode that SVE doesn't support! The 128-bit case is a lot simpler since both support all the cases where stride doesn't match. I find this to be a nice compromise while there aren't any SVE256 products on the market. In the 128-bit case there is an SVE path which is utilized if the passed in stride supports what SVE understands, otherwise it falls back to an ASIMD implementation which manually emulates everything that is necessary. This instruction is very explicitly doing basically exactly what AVX gather instructions want, because it's complex enough that we don't want to try and make this a generic solution.	2024-06-26 16:00:53 -04:00
Alyssa Rosenzweig	d1d41f5645	Merge pull request #3763 from alyssarosenzweig/rclse/less-aggressive Remove RCLSE	2024-06-26 15:14:14 -04:00
Ryan Houdek	94fd100fc7	Merge pull request #3719 from lioncash/f16c OpcodeDispatcher: Handle F16C operations	2024-06-26 12:12:13 -07:00
Lioncache	b9ff36b5d9	CPUID: Signify F16C support if AVX is available On Aarch64 hardware, if we have SVE2 available (which we use in the AVX implementation), then we can also enable F16C support.	2024-06-26 15:05:03 -04:00
Lioncache	cd5a809ec9	OpcodeDispatcher: Handle VCVTPS2PH	2024-06-26 15:05:03 -04:00
Lioncache	045a8efbeb	OpcodeDispatcher: Handle VCVTPH2PS Fairly straightforward, since we already have handling for half-float conversions.	2024-06-26 15:05:00 -04:00
Ryan Houdek	54a1f7d833	Merge pull request #3764 from Sonicadvance1/rorx_masking BMI2: Ensure rorx immediate masks by operation size correctly.	2024-06-26 11:52:47 -07:00
Alyssa Rosenzweig	1b496cda8f	InstCountCI: Update Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-26 14:49:58 -04:00
Alyssa Rosenzweig	a5b24bfe4c	IR: drop RCLSE Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-26 14:49:05 -04:00
Alyssa Rosenzweig	46676ca376	OpcodeDispatcher: add cache Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-26 14:49:05 -04:00
Alyssa Rosenzweig	7d939a3b3d	Merge pull request #3758 from Sonicadvance1/avx_17 AVX128: FMA3	2024-06-26 14:18:32 -04:00
Ryan Houdek	a515061465	BMI2: Ensure rorx immediate masks by operation size correctly.	2024-06-26 11:11:37 -07:00
Ryan Houdek	1c24d63f73	Merge pull request #3762 from alyssarosenzweig/bug/constprop-bextr	2024-06-26 09:28:18 -07:00
Alyssa Rosenzweig	7e10dba5e2	unittests: add test for a BEXTR bug Ryan reduced this test while debugging openssl. This fails without the constprop fix. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-26 12:06:47 -04:00
Alyssa Rosenzweig	e2d73014f1	ConstProp: fix LSHR constant prop Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-26 12:06:47 -04:00
Ryan Houdek	53aa30596e	InstcountCI: Update	2024-06-25 11:37:18 -07:00
Ryan Houdek	122ae5b710	unittests: Adds FMA3 unittests	2024-06-25 11:37:18 -07:00
Ryan Houdek	45c27b2965	CPUID: Enable support for FMA3 when AVX is enabled	2024-06-25 11:24:53 -07:00
Ryan Houdek	832b247fc1	SVE256: Implement support for FMA3	2024-06-25 11:24:46 -07:00
Ryan Houdek	0e8b53d566	AVX128: Implement FMA3 instructions	2024-06-25 11:23:50 -07:00
Ryan Houdek	d03d69273b	X86Tables: Describe FMA3 instructions	2024-06-25 11:22:27 -07:00
Ryan Houdek	efa05ba19d	IR: Adds support for new SUBADD FMA constants ADDSUB didn't cover this new variant.	2024-06-25 11:22:22 -07:00
Ryan Houdek	5da205d91a	Merge pull request #3760 from alyssarosenzweig/avx/vpclmulqdql AVX128: fix VPCLMULQDQl	2024-06-25 10:31:52 -07:00
Ryan Houdek	41923bac99	OpcodeDispatcher: Fixes PCMUL with weird selectors and zero-extend We had a bug where we weren't correctly ignoring the non-used bits in the selector. This was causing an assert in the ARM backend.	2024-06-25 12:54:03 -04:00
Alyssa Rosenzweig	c6148f6bf1	AVX128: fix VPCLMULQDQl use the helper. I assumed the lack of zero extension here was intentional. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-25 12:51:21 -04:00
Alyssa Rosenzweig	77aaa9af4d	Merge pull request #3748 from Sonicadvance1/avx_15 AVX128: More instructions Part 4	2024-06-25 12:39:48 -04:00
Ryan Houdek	00cf8d530c	Merge pull request #3752 from Sonicadvance1/fma_ir_operations ARM64: Adds new FMA vector instructions	2024-06-25 09:07:06 -07:00
Alyssa Rosenzweig	98aa58e9f5	InstCountCI: Update Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-25 10:03:33 -04:00
Ryan Houdek	6911917819	Disable vpclmulqdq_256 on simulator	2024-06-25 10:03:33 -04:00


9780 Commits