Commit Graph

9772 Commits

Alyssa Rosenzweig
edf1a7970d X86Tables: add Literal() helper
Any time we get the value of a Literal, we want to assert that it's actually a
literal. We've been open-coding this pattern sporadically throughout the
OpcodeDispatcher. Let's add an ergonomic helper that fetches the value of a
literal, asserting that the value is indeed a literal.
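
A minimal sketch of the idea with made-up types (FEX's real IR operand types
and accessor names differ):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical operand kinds for illustration only.
enum class OpKind { Literal, Register, Memory };

struct Operand {
  OpKind Kind;
  uint64_t Value;
};

// Fetch the value of an operand, asserting that it really is a literal,
// instead of open-coding the check at every call site.
inline uint64_t Literal(const Operand& Op) {
  assert(Op.Kind == OpKind::Literal && "expected a literal operand");
  return Op.Value;
}
```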

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-21 14:46:46 -04:00
Ryan Houdek
fac9972bad
Merge pull request #3741 from alyssarosenzweig/cleanup/comiss
OpcodeDispatcher: refactor Comiss helper
2024-06-21 11:43:05 -07:00
Alyssa Rosenzweig
9ecb960f3a OpcodeDispatcher: refactor Comiss helper
AVX128 will use this; it's not SSE-specific.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-21 14:23:20 -04:00
Ryan Houdek
3d26e23891
Merge pull request #3737 from Sonicadvance1/avx_10
Arm64: Implement support for emulated masked vector loadstores
2024-06-21 11:04:01 -07:00
Ryan Houdek
7bbbd95775
Merge pull request #3736 from Sonicadvance1/avx_9
AVX128: Some pun pickles, moves and conversions
2024-06-21 10:55:19 -07:00
Ryan Houdek
bb308899b9
Frontend: Expose AVX W flag
Previously we could always tell the size of the operation from how this flag
affects the operating size of the instruction, for example converting a
64-bit operation down to 32-bit.

AVX gather instructions are the first instruction class where this can't be
inferred. The element load size is determined by the W flag, but the 128-bit
or 256-bit operating size is determined by other means.

Expose this flag so we can determine this difference. The FMA
instructions are going to need this flag as well.
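
An illustrative sketch of the distinction, using made-up names rather than
FEX's decoder API:

```cpp
// For gathers, VEX.W selects the element load size while VEX.L selects the
// 128-/256-bit operating size, so the frontend has to surface W explicitly.
struct VexInfo {
  bool W; // element size: 0 = 32-bit, 1 = 64-bit
  bool L; // operating size: 0 = 128-bit, 1 = 256-bit
};

constexpr unsigned GatherElementSizeBytes(VexInfo Vex) {
  return Vex.W ? 8 : 4;   // element load size comes from W
}

constexpr unsigned OperatingSizeBytes(VexInfo Vex) {
  return Vex.L ? 32 : 16; // operating width comes from L instead
}
```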
2024-06-21 10:54:42 -07:00
Ryan Houdek
e95c8d703c
Arm64: Implement support for emulated masked vector loadstores
In order to support `vmaskmov{ps,pd}` without SVE128, this is required.
It's pretty gnarly, but these instructions aren't used often, so that's fine
from a compatibility perspective.

Example SVE128 implementation:
```json
    "vmaskmovps ymm0, ymm1, [rax]": {
      "ExpectedInstructionCount": 9,
      "Comment": [
        "Map 2 0b01 0x2c 256-bit"
      ],
      "ExpectedArm64ASM": [
        "ldr q2, [x28, #32]",
        "mrs x20, nzcv",
        "cmplt p0.s, p6/z, z17.s, #0",
        "ld1w {z16.s}, p0/z, [x4]",
        "add x21, x4, #0x10 (16)",
        "cmplt p0.s, p6/z, z2.s, #0",
        "ld1w {z2.s}, p0/z, [x21]",
        "str q2, [x28, #16]",
        "msr nzcv, x20"
      ]
    },
```

Example ASIMD implementation:
```json
    "vmaskmovps ymm0, ymm1, [rax]": {
      "ExpectedInstructionCount": 37,
      "Comment": [
        "Map 2 0b01 0x2c 256-bit"
      ],
      "ExpectedArm64ASM": [
        "ldr q2, [x28, #32]",
        "mrs x20, nzcv",
        "movi v0.2d, #0x0",
        "mov x1, x4",
        "mov x0, v17.d[0]",
        "tbz x0, #63, #+0x8",
        "ld1 {v0.s}[0], [x1]",
        "add x1, x1, #0x4 (4)",
        "tbz w0, #31, #+0x8",
        "ld1 {v0.s}[1], [x1]",
        "add x1, x1, #0x4 (4)",
        "mov x0, v17.d[1]",
        "tbz x0, #63, #+0x8",
        "ld1 {v0.s}[2], [x1]",
        "add x1, x1, #0x4 (4)",
        "tbz w0, #31, #+0x8",
        "ld1 {v0.s}[3], [x1]",
        "mov v16.16b, v0.16b",
        "add x21, x4, #0x10 (16)",
        "movi v0.2d, #0x0",
        "mov x1, x21",
        "mov x0, v2.d[0]",
        "tbz x0, #63, #+0x8",
        "ld1 {v0.s}[0], [x1]",
        "add x1, x1, #0x4 (4)",
        "tbz w0, #31, #+0x8",
        "ld1 {v0.s}[1], [x1]",
        "add x1, x1, #0x4 (4)",
        "mov x0, v2.d[1]",
        "tbz x0, #63, #+0x8",
        "ld1 {v0.s}[2], [x1]",
        "add x1, x1, #0x4 (4)",
        "tbz w0, #31, #+0x8",
        "ld1 {v0.s}[3], [x1]",
        "mov v2.16b, v0.16b",
        "str q2, [x28, #16]",
        "msr nzcv, x20"
      ]
    },
```

There's a small improvement available here in that nzcv doesn't actually need
to be touched in the ASIMD implementation, but I'll leave that for a future
improvement.
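
For reference, a rough C++ model of what the ASIMD fallback above computes
per 128-bit half (names are illustrative, not FEX code):

```cpp
#include <cstdint>

// Each 32-bit element is loaded only when the sign bit of the corresponding
// mask element is set; otherwise the destination element is zeroed.
static void MaskedLoad128(uint32_t Dst[4], const int32_t Mask[4], const uint32_t* Mem) {
  for (int i = 0; i < 4; ++i) {
    Dst[i] = (Mask[i] < 0) ? Mem[i] : 0; // sign bit set => load, else zero
  }
}
```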
2024-06-21 08:21:32 -07:00
Ryan Houdek
903d6a742e
CPUBackend: Removes SupportsSaturatingRoundingShifts option
This has always been true ever since we removed the x86 JIT and
Interpreter. It was left over and only added more code for no reason.
2024-06-21 08:11:22 -07:00
Ryan Houdek
424218e327
AVX128: Implement support for vpsign{b,w,d} 2024-06-21 08:11:22 -07:00
Ryan Houdek
17dc03d414
AVX128: Implement support for vpack{s,u}{wb,dw} 2024-06-21 08:11:21 -07:00
Ryan Houdek
baf699c6e1
AVX128: Implements support for vandnps and vpandn
This can't use the previous binary operator handler since the register
sources need to be swapped.
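
For reference, a small sketch of the semantics (not FEX code): the first
source is the one that gets complemented, which is why a handler that simply
forwards the sources in their encoded order can't be reused here.

```cpp
#include <cstdint>

// vandnps/vpandn compute (~src1) & src2 per element.
static inline uint64_t PAndN(uint64_t Src1, uint64_t Src2) {
  return ~Src1 & Src2;
}
```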
2024-06-21 08:11:21 -07:00
Ryan Houdek
1431af1ff5
AVX128: Implements support for vcvt{t,}s{s,d}2si 2024-06-21 08:11:21 -07:00
Ryan Houdek
775a41b903
AVX128: Implement support for vcvtsi2s{s,d} 2024-06-21 08:11:21 -07:00
Ryan Houdek
2da1e90dd5
Merge pull request #3738 from Sonicadvance1/cpuid_label
CPUID: Update labeling on some reserved bits
2024-06-21 07:41:57 -07:00
Ryan Houdek
e614340c0c
CPUID: Update labeling on some reserved bits
These aren't reserved and I was confused that they were missing.
2024-06-21 05:34:44 -07:00
Ryan Houdek
3c293b9aed
Arm64: Loosen restrictions on V{Load,Store}VectorMasked to allow 128-bit operation 2024-06-21 04:26:09 -07:00
Ryan Houdek
283c2861c9
AVX128: Implement support for vlddqu 2024-06-21 00:56:36 -07:00
Ryan Houdek
757dc95116
AVX128: Implement support for the punpckh instructions 2024-06-21 00:56:32 -07:00
Ryan Houdek
6192250b8a
AVX128: Implement support for the punpckl instructions 2024-06-21 00:56:28 -07:00
Ryan Houdek
f489135b1d
Merge pull request #3734 from Sonicadvance1/avx_8
AVX128: Move moves!
2024-06-21 00:53:41 -07:00
Ryan Houdek
4d00a52761
Merge pull request #3732 from Sonicadvance1/avx_6
unittests: Split up vtestps unittest to accumulate flags in independent registers.
2024-06-21 00:52:02 -07:00
Ryan Houdek
6941a59223
unittests: Split up vtestps unittest to accumulate flags in independent registers.
Makes it easier to see what is failing on the 128-bit side versus the
256-bit side.
2024-06-21 00:45:30 -07:00
Ryan Houdek
3f232e631e
Merge pull request #3730 from Sonicadvance1/avx_4
Vector: Helper refactorings
2024-06-21 00:31:14 -07:00
Ryan Houdek
6e3643c3ef
Merge pull request #3714 from pmatos/FSTstiTagSet
Set tag properly in X87 FST(reg)
2024-06-21 00:27:24 -07:00
Ryan Houdek
d7348c8aff
Merge pull request #3683 from Sonicadvance1/fix_broken_mprotect
SMCTracking: Fix incorrect mprotect tracking
2024-06-20 22:49:51 -07:00
Ryan Houdek
e7bdb8679d
Merge pull request #3735 from alyssarosenzweig/instcountci/seg-reg-cases
InstCountCI: add segment register cases
2024-06-20 09:43:42 -07:00
Ryan Houdek
c28824f94d
AVX128: Implements support for vbroadcast* 2024-06-20 09:43:10 -07:00
Ryan Houdek
664d766b45
AVX128: Implement support for vmovshdup 2024-06-20 09:43:10 -07:00
Ryan Houdek
fce694ed92
AVX128: Implement support for vmovsldup 2024-06-20 09:43:10 -07:00
Ryan Houdek
96aafb4f07
AVX128: Implement support for vmovddup
This instruction is a little weird. When accessing memory, the 128-bit
operating size of the instruction only loads 64 bits, while the 256-bit
operating size fetches a full 256 bits.

Theoretically the hardware could get away with two 64-bit loads or a wacky
24-byte load, but it looks like, to simplify the hardware, they spec'd the
256-bit version to always load the full range.
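
A rough C++ model of the two memory forms described above (illustration only,
not FEX code):

```cpp
#include <cstdint>
#include <cstring>

static void VMovDDup128(uint64_t Dst[2], const void* Mem) {
  uint64_t Q;
  std::memcpy(&Q, Mem, 8);   // the 128-bit form fetches only 64 bits
  Dst[0] = Dst[1] = Q;
}

static void VMovDDup256(uint64_t Dst[4], const void* Mem) {
  uint64_t Q[4];
  std::memcpy(Q, Mem, 32);   // the 256-bit form fetches the full 256 bits
  Dst[0] = Dst[1] = Q[0];    // low lane duplicates qword 0
  Dst[2] = Dst[3] = Q[2];    // high lane duplicates qword 2
}
```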
2024-06-20 09:43:10 -07:00
Alyssa Rosenzweig
a474f86ea8 InstCountCI: add segment register cases
add a bit of coverage for this funny addressing corner. We do handle this
optimally but I had to write this to check ;)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-20 11:37:35 -04:00
Ryan Houdek
dbaf95a8f3
AVX128: Implement support for vmovhps/d 2024-06-20 06:53:21 -07:00
Ryan Houdek
e67df96ad9
AVX128: Implement support for movlps/d 2024-06-20 06:53:17 -07:00
Ryan Houdek
56de94578d
AVX128: Implement support for vmovq 2024-06-20 06:53:13 -07:00
Ryan Houdek
06fc2f5ef0
AVX128: Implement support for non-temporal moves. 2024-06-20 06:53:09 -07:00
Ryan Houdek
b3ba315cbd
AVX128: Implements unary/binary lambda helper 2024-06-20 06:53:05 -07:00
Ryan Houdek
e5a531e683
Vector: Refactor MPSADBWOpImpl so AVX128 can use it. 2024-06-20 06:43:57 -07:00
Ryan Houdek
e2de57bd04
Vector: Refactor PSADBWOpImpl so AVX128 can use it. 2024-06-20 06:43:57 -07:00
Ryan Houdek
4eebca93e3
Vector: Refactor PSHUFBOpImpl. This will be reused for AVX128 2024-06-20 06:33:27 -07:00
Ryan Houdek
3919ec9692
Vector: Expose VBLENDOpImpl in the OpcodeDispatcher. It will be reused by AVX128 2024-06-20 06:33:21 -07:00
Ryan Houdek
02aeb0ac1a
Vector: Restructure PMADDWDOpImpl. It's going to get reused for AVX128 2024-06-20 06:33:15 -07:00
Ryan Houdek
206544ad09
Vector: Reconfigure PMADDUBSWOpImpl, it's going to get reused for AVX128 2024-06-20 06:33:08 -07:00
Ryan Houdek
3854cd2b2f
Vector: Restructure SHUFOpImpl. AVX128 is going to reuse it. 2024-06-20 06:32:58 -07:00
Alyssa Rosenzweig
b2eb8aaf66
Merge pull request #3718 from Sonicadvance1/avx128_3
OpcodeDispatcher: Adds initial groundwork for decomposed AVX operations
2024-06-20 08:57:35 -04:00
Ryan Houdek
acbd920c9a OpcodeDispatcher: Adds initial groundwork for decomposed AVX operations
Only installs the tables if SVE256 isn't supported but AVX is explicitly
enabled with HostFeatures, to protect against accidental early enablement.

- Only implements 85 instructions starting out
- Basic vector moves
- Basic vector unary operations
- Basic vector binary operations
- VZeroUpper/VZeroAll

The bulk of the implementation is currently the handling for loading and
storing the halves of the registers from the context or from memory.

This means the load/store helpers must always return a pair unless only the
bottom half of the register is requested, which occurs with 128-bit AVX
operations. The store side then needs to consume the named zero register
when it shows up, since those cases zero the upper bits.
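
A minimal sketch of that pair-of-halves bookkeeping, under the assumption of
a simple handle type; the Ref type, the named zero, and the helper name here
are all hypothetical:

```cpp
struct Ref { int Id; };           // opaque IR value handle (made up)
inline const Ref NamedZero{-1};   // stands in for "upper half is zero"

// A 256-bit guest register handled as two 128-bit halves.
struct RefPair {
  Ref Low;                        // low 128 bits, always meaningful
  Ref High;                       // high 128 bits, or NamedZero
};

// 128-bit AVX ops only produce a low half; carrying NamedZero in the pair
// lets the store side clear the guest register's upper bits, per VEX.128 rules.
inline RefPair Make128BitResult(Ref Low) {
  return {Low, NamedZero};
}
```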

This implementation approach has a few benefits.
- I can pound this out extremely quickly
- SSE implementations are unaffected and don't need to deal with the
  insert behaviour of SVE256.
- We still keep the SVE256 implementation for the inevitable future when
  hardware vendors actually do implement it (Give it 8 years or
  something).
- We can actually unit test this path in CI once it is complete.
- We can partially optimize some paths with SVE128 (Gathers) and support
  a full ASIMD path if necessary.

One downside is that I can't enable this in CI yet because it can't pass
all unittests, but that's a non-issue since it is going to be in heavy
flux as I'm hammering out the implementation. It'll get switched on at
the end when it's passing all 1265 AVX unittests. Currently at 1001 of
those.
2024-06-20 08:44:14 -04:00
Alyssa Rosenzweig
db0bdd48e5
Merge pull request #3729 from alyssarosenzweig/refactor/address-modes
OpcodeDispatcher: Refactor address modes
2024-06-20 08:18:33 -04:00
Ryan Houdek
da21ee3cda
Merge pull request #3692 from pmatos/AFP_RPRES_fix
Fixes AFP.NEP handling on scalar insertions
2024-06-19 19:23:49 -07:00
Ryan Houdek
d2baef2b36
Merge pull request #3727 from Sonicadvance1/vaes
VAES support
2024-06-19 19:22:56 -07:00
Ryan Houdek
df96bc83cc
Merge pull request #3726 from Sonicadvance1/oryon_errata
HostFeatures: Work around Qualcomm Oryon RNG errata
2024-06-19 19:21:14 -07:00
Alyssa Rosenzweig
ec03831a21 OpcodeDispatcher: plumb A.NonTSO deeper
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-19 08:52:07 -04:00