FEX-Emu/FEX

mirror of https://github.com/FEX-Emu/FEX.git synced 2024-12-13 17:15:41 +00:00

Author	SHA1	Message	Date
Mai	58e949e148	Merge pull request #3769 from Sonicadvance1/avx2_cpuid CPUID: Oops, forgot to enable AVX2	2024-06-26 21:17:44 -04:00
Ryan Houdek	dad47b7bda	CPUID: Oops, forgot to enable AVX2	2024-06-26 17:43:56 -07:00
Ryan Houdek	e519bf5978	Merge pull request #3768 from Sonicadvance1/avx128_letsgo AVX128: Enable all the things	2024-06-26 17:40:21 -07:00
Ryan Houdek	fc50e52157	InstCountCI: Adds AVX128 tests	2024-06-26 16:49:00 -07:00
Ryan Houdek	7669df0e16	InstCountCI: SVE256: Fixes behaviour change	2024-06-26 16:49:00 -07:00
Ryan Houdek	4d56fec5f1	AVX128: Work around glibc fault testing	2024-06-26 16:49:00 -07:00
Ryan Houdek	8181552b16	AVX128: Actually install AVX helpers per thread. How this didn't break the world in my testing I don't know.	2024-06-26 16:49:00 -07:00
Ryan Houdek	c6c147daf6	unittests: Updates vcvtps2ph test for failure case of writing too much memory.	2024-06-26 16:49:00 -07:00
Ryan Houdek	975069825e	AVX128: Fix a real bug with VCVTPS2PH	2024-06-26 16:49:00 -07:00
Ryan Houdek	5133f480d1	InstcountCI: Update for xsave/xrstor behaviour changes with AVX	2024-06-26 16:49:00 -07:00
Ryan Houdek	ce4b252e5c	InstCountCI: Stop disabling AVX if SVE256 is disabled.	2024-06-26 15:06:03 -07:00
Ryan Houdek	031d56de35	HostFeatures: Enables AVX unconditionally	2024-06-26 15:03:21 -07:00
Ryan Houdek	3cdaf6736b	InstcountCI: Update for SVE256 FMA implementation	2024-06-26 14:56:01 -07:00
Ryan Houdek	b5e696b3cb	CPUID: Implement support for XCR0 when AVX is enabled This enables AVX, AVX2, FMA3 for the entire CPUID!
```bash
$ FEX_HOSTFEATURES=enableavx,enableavx2 ./Bin/FEXInterpreter /usr/bin/cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Cortex-A78AE
stepping : 0
microcode : 0x0
cpu MHz : 3000
cache size : 512 KB
physical id : 0
siblings : 12
core id : 0
cpu cores : 12
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht tm syscall nx mmxext fxsr_opt rdtscp lm 3dnow 3dnowext constant_tsc art rep_good nopl xtoplogy nonstop_tsc cpuid tsc_known_freq pni pclmulqdq dtes64 monitor tm2 ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx hypervisor lahf_lm cmp_legacy extapic abm 3dnowprefetch tce fsgsbase bmi1 avx2 smep bmi2 erms invpcid adx clflushopt clwb sha_ni clzero arat vpclmulqdq rdpid fsrm
bugs :
bogomips : 8000.0
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
```
Notice avx, avx2, and fma	2024-06-26 14:56:01 -07:00
Ryan Houdek	43aef377d7	HostFeatures: Allow enabling AVX without SVE256	2024-06-26 14:56:01 -07:00
Ryan Houdek	add0e7a8db	HostFeatures: Removes distinction between AVX and AVX2 We now no longer care about AVX versions, consolidate them in to a single config option which enables both.	2024-06-26 14:56:01 -07:00
Ryan Houdek	52e541d453	Unittests: Stop using AVX2 flag	2024-06-26 14:56:01 -07:00
Mai	a031a49546	Merge pull request #3767 from Sonicadvance1/avx128_fix_wide_shift AVX128: Fixes wide shifts	2024-06-26 17:29:09 -04:00
Alyssa Rosenzweig	4d821b8dd8	Merge pull request #3765 from Sonicadvance1/avx128_f16c AVX128: F16C support	2024-06-26 17:25:05 -04:00
Ryan Houdek	f277025c9a	AVX128: Fixes wide shifts During refactoring this was missed and rerunning unittests locally caught it. 256-bit operations get their shift only from the lower half of the vector register.	2024-06-26 14:16:39 -07:00
Ryan Houdek	ba28e6f82e	unittests: Adds vcvtps2ph tests that use mxcsr	2024-06-26 14:08:20 -07:00
Ryan Houdek	3a89df9bed	AVX128: Implement support for F16C	2024-06-26 14:05:12 -07:00
Ryan Houdek	f6a0866fbb	IR: Split Vector_FToF2 in to VFCVTL2 and VCVTFN2 I forgot in the narrowing case we need to be careful about insert. No IR op used Vector_FToF2 with narrowing.	2024-06-26 14:03:41 -07:00
Ryan Houdek	756fa2ecc5	Merge pull request #3766 from alyssarosenzweig/opt/f16c-round Optimize vcvtps2ph	2024-06-26 14:03:24 -07:00
Alyssa Rosenzweig	cf834aa6da	InstCountCI: Update Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-26 16:46:21 -04:00
Alyssa Rosenzweig	d2324f4a93	OpcodeDispatcher: optimize vcvtps2ph We can avoid a LOT of pointless work with some dedicated IR ops for specifically overriding the round mode. Small behaviour change here: we no longer reset FTZ. I think this is a bug fix? But if it's not it's not hard to fix. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-26 16:46:21 -04:00
Ryan Houdek	6226c7f4f3	Merge pull request #3757 from Sonicadvance1/avx_16 AVX128: Implement support for gathers	2024-06-26 13:29:58 -07:00
Ryan Houdek	991ecd558e	InstcountCI: Update for SVE256 gathers!	2024-06-26 16:00:53 -04:00
Ryan Houdek	a4fa3a460e	OpcodeDispatcher: Implement AVX gathers with SVE256 Just to ensure we still have feature parity.	2024-06-26 16:00:53 -04:00
Ryan Houdek	77ba708933	AVX128: Implement support for gather load instructions This is the last family of instructions that we needed to implement for AVX2 to be properly advertised!	2024-06-26 16:00:53 -04:00
Ryan Houdek	662d50a966	X86Tables: Describe VPGather in the VEX tables	2024-06-26 16:00:53 -04:00
Ryan Houdek	5472d1cc04	Arm64: Implement VLoadVectorGatherMasked operation This does a gather load three ways, SVE256, SVE128, and ASIMD. This operation is a bit special since it can't quite handle all gather loadstores in the 256-bit case and requires the frontend to decompose the operation in the case that the striding hits a mode that SVE doesn't support! The 128-bit case is a lot simpler since both support all the cases where stride doesn't match. I find this to be a nice compromise while there aren't any SVE256 products on the market. In the 128-bit case there is an SVE path which is utilized if the passed in stride supports what SVE understands, otherwise it falls back to an ASIMD implementation which manually emulates everything that is necessary. This instruction is very explicitly doing basically exactly what AVX gather instructions want, because it's complex enough that we don't want to try and make this a generic solution.	2024-06-26 16:00:53 -04:00
Alyssa Rosenzweig	d1d41f5645	Merge pull request #3763 from alyssarosenzweig/rclse/less-aggressive Remove RCLSE	2024-06-26 15:14:14 -04:00
Ryan Houdek	94fd100fc7	Merge pull request #3719 from lioncash/f16c OpcodeDispatcher: Handle F16C operations	2024-06-26 12:12:13 -07:00
Lioncache	b9ff36b5d9	CPUID: Signify F16C support if AVX is available On Aarch64 hardware, if we have SVE2 available (which we use in the AVX implementation), then we can also enable F16C support.	2024-06-26 15:05:03 -04:00
Lioncache	cd5a809ec9	OpcodeDispatcher: Handle VCVTPS2PH	2024-06-26 15:05:03 -04:00
Lioncache	045a8efbeb	OpcodeDispatcher: Handle VCVTPH2PS Fairly straightforward, since we already have handling for half-float conversions.	2024-06-26 15:05:00 -04:00
Ryan Houdek	54a1f7d833	Merge pull request #3764 from Sonicadvance1/rorx_masking BMI2: Ensure rorx immediate masks by operation size correctly.	2024-06-26 11:52:47 -07:00
Alyssa Rosenzweig	1b496cda8f	InstCountCI: Update Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-26 14:49:58 -04:00
Alyssa Rosenzweig	a5b24bfe4c	IR: drop RCLSE Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-26 14:49:05 -04:00
Alyssa Rosenzweig	46676ca376	OpcodeDispatcher: add cache Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-26 14:49:05 -04:00
Alyssa Rosenzweig	7d939a3b3d	Merge pull request #3758 from Sonicadvance1/avx_17 AVX128: FMA3	2024-06-26 14:18:32 -04:00
Ryan Houdek	a515061465	BMI2: Ensure rorx immediate masks by operation size correctly.	2024-06-26 11:11:37 -07:00
Ryan Houdek	1c24d63f73	Merge pull request #3762 from alyssarosenzweig/bug/constprop-bextr	2024-06-26 09:28:18 -07:00
Alyssa Rosenzweig	7e10dba5e2	unittests: add test for a BEXTR bug Ryan reduced this test while debugging openssl. This fails without the constprop fix. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-26 12:06:47 -04:00
Alyssa Rosenzweig	e2d73014f1	ConstProp: fix LSHR constant prop Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-26 12:06:47 -04:00
Ryan Houdek	53aa30596e	InstcountCI: Update	2024-06-25 11:37:18 -07:00
Ryan Houdek	122ae5b710	unittests: Adds FMA3 unittests	2024-06-25 11:37:18 -07:00
Ryan Houdek	45c27b2965	CPUID: Enable support for FMA3 when AVX is enabled	2024-06-25 11:24:53 -07:00
Ryan Houdek	832b247fc1	SVE256: Implement support for FMA3	2024-06-25 11:24:46 -07:00
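
For readers unfamiliar with the behaviour named in commit a515061465 ("BMI2: Ensure rorx immediate masks by operation size correctly."), the following is a minimal sketch of the x86 semantics involved: RORX rotates right by an immediate that is masked to 5 bits for 32-bit operands and 6 bits for 64-bit operands. This is not FEX's code. The function name `rorx` and its parameters are hypothetical and chosen only for illustration.

```python
def rorx(value: int, imm8: int, op_size: int) -> int:
    """Model x86 RORX: rotate right with the count masked by operand size.

    Hypothetical helper for illustration only; not taken from FEX.
    """
    if op_size not in (32, 64):
        raise ValueError("RORX only has 32-bit and 64-bit forms")

    mask = (1 << op_size) - 1
    value &= mask

    # The rotate count is the immediate masked to the operand width:
    # imm8 & 31 for 32-bit operands, imm8 & 63 for 64-bit operands.
    count = imm8 & (op_size - 1)
    if count == 0:
        return value

    return ((value >> count) | (value << (op_size - count))) & mask


if __name__ == "__main__":
    # The same immediate gives different rotate counts at each width.
    print(hex(rorx(0x1, 33, 32)))
    print(hex(rorx(0x1, 33, 64)))
```

Masking with a fixed 6-bit mask regardless of width would give the wrong result for a 32-bit operand whenever the immediate is 32 or larger, which is the kind of mismatch the commit title describes.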


9940 Commits