These IR operations are required to support AFP's NEP mode, which does a
vector insert into the destination register. Additionally, they give us
tracking information that allows optimizing out redundant inserts on
devices that don't support AFP natively.
In order to match x86 semantics we need to support binary and unary
scalar operations that do a final insert into a vector, with optional
zeroing of the upper 128 bits for AVX variants.
A tricky detail is that for binary operations the destination and first
source have an intrinsically linked relationship depending on whether
the instruction is SSE or AVX.
SSE example:
- addss xmm0, xmm1
- xmm0 is both the destination and the first source.
- This means xmm0[31:0] = xmm0[31:0] + xmm1[31:0]
- Bits [127:32] are UNMODIFIED.
FEX's JIT jumps through some hoops so that if the destination register
equals the first source register, it hits the optimal path where AFP's
NEP behaviour inserts into the result. AVX throws a small wrench into
this due to its changed behaviour.
AVX example:
- vaddss xmm0, xmm1, xmm2
- xmm0 is ONLY the destination, xmm1 and xmm2 are the sources
- This operation copies the bits above the scalar result from the
first source (xmm1).
- Additionally, this zeroes the bits above the original 128-bit xmm
  register.
- xmm0[31:0] = xmm1[31:0] + xmm2[31:0]
- xmm0[127:32] = xmm1[127:32]
- ymm0[255:128] = 0
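To make the two merge behaviours concrete, here is a minimal C++ sketch
of the semantics described above. It only models the architectural
behaviour; it is not FEX's IR or JIT implementation.
```cpp
#include <array>

// [0..3] = low 128-bit xmm half, [4..7] = upper ymm half.
using Vec256 = std::array<float, 8>;

// SSE addss: scalar add merged into the destination, everything else
// in the register is left unmodified.
void AddssSSE(Vec256 &dst, const Vec256 &src) {
  dst[0] = dst[0] + src[0];
  // dst[1..7] untouched.
}

// AVX vaddss: scalar add, bits above the scalar come from the first
// source, and bits above the original 128-bit register are zeroed.
void VaddssAVX(Vec256 &dst, const Vec256 &src1, const Vec256 &src2) {
  dst[0] = src1[0] + src2[0];
  dst[1] = src1[1];
  dst[2] = src1[2];
  dst[3] = src1[3];
  dst[4] = dst[5] = dst[6] = dst[7] = 0.0f; // ymm[255:128] = 0
}
```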
This means these instructions end up with a fairly large handling table
depending on whether the instruction is SSE or AVX, plus whether the
host CPU supports AFP or not.
So while fairly complex, this handles all the edge cases and gives us
optimization opportunities as we move forward. Currently, on devices
without AFP support, there is a minor benefit that these IR operations
remove one temporary register, lowering the register allocation
overhead.
In the coming weeks I am likely to introduce an optimization pass that
removes redundant inserts, because FEX currently does /really/ badly
with scalar code loops.
Needs #3184 merged first.
When FEX is in the JIT we need to make sure to enable NEP and AH, and
then disable them when leaving.
This is explicitly disabled when the vixl simulator is used, since even
attempting to set the bits will cause it to fault. This ensures
InstCountCI keeps working.
There are some cases where we want to test multiple instructions
together so we can do optimizations that would otherwise be hard to see.
eg:
```asm
; Can be optimized to a single stp
push eax
push ebx
; Can remove half of the copy since we know the direction
cld
rep movsb
; Can remove a redundant insert
addss xmm0, xmm1
addss xmm0, xmm2
```
This lets us have arbitrarily sized code in instruction count CI, with
the original json key becoming only a label if the instruction array is
provided.
There are still some major limitations: instructions that generate
side-effects might leave "garbage" after the end of the block that isn't
correctly accounted for, so care must be taken.
Example in the json:
```json
"push ax, bx": {
"ExpectedInstructionCount": 4,
"Optimal": "No",
"Comment": "0x50",
"x86Insts": [
"push ax",
"push bx"
],
"ExpectedArm64ASM": [
"uxth w20, w4",
"strh w20, [x8, #-2]!",
"uxth w20, w7",
"strh w20, [x8, #-2]!"
]
}
```
This adds all the missing atomic tests into their own test files.
This includes all of them except a few choice ones that stay in their
original files:
- BTC, BTR, BTS are in their Secondary/SecondaryGroup files
- CMPXCHG, CMPXCHG8B, CMPXCHG16B are in their Secondary/SecondaryGroup
files
- These always imply lock semantics even without the prefix.
Six of the EFLAGS can't be used directly in a bitmask because they are
either contained in a different flags location or have multiple bits
stored in them.
SF, ZF, CF, and OF are stored in ARM's NZCV format at offset 24.
The PF calculation is deferred but stored at the regular offset.
AF is also deferred, relative to PF, but stored at the regular offset.
These /need/ to be reconstructed using the `ReconstructCompactedEFLAGS`
function when wanting to read the EFLAGS.
When setting these flags they /need/ to be set using
`SetFlagsFromCompactedEFLAGS`.
If either of these functions is not used when managing EFLAGS then the
internal representation will get mangled and the state will be
corrupted.
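For illustration, here is a minimal C++ sketch of why reconstruction is
needed: SF/ZF/CF/OF live in an NZCV-formatted word and PF/AF are stored
as deferred values that still need a final calculation. The struct
layout, field names, and the PF/AF handling are simplified assumptions,
not FEX's real storage or `ReconstructCompactedEFLAGS` itself.
```cpp
#include <cstdint>

// Simplified stand-in for the compacted flag storage.
struct CompactedFlags {
  uint32_t NZCV;       // N = bit 31, Z = bit 30, C = bit 29, V = bit 28
  uint8_t  DeferredPF; // result byte whose parity still needs computing
  uint8_t  DeferredAF; // deferred value, relative to the PF storage
};

uint32_t ReconstructEFLAGS_Sketch(const CompactedFlags &f) {
  uint32_t eflags = 0;
  eflags |= ((f.NZCV >> 31) & 1) << 7;   // SF <- N
  eflags |= ((f.NZCV >> 30) & 1) << 6;   // ZF <- Z
  eflags |= ((f.NZCV >> 29) & 1) << 0;   // CF <- C (carry-sense fixups elided)
  eflags |= ((f.NZCV >> 28) & 1) << 11;  // OF <- V
  // PF is set when the low result byte has even parity.
  eflags |= (__builtin_parity(f.DeferredPF) ^ 1u) << 2;
  // AF reconstruction relative to the deferred PF value is elided here.
  return eflags;
}
```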
Having a little `_RAW` on these to signify that they aren't just regular
single-bit representations like the other flags in EFLAGS should make us
stop and puzzle over this before writing more broken code that tries to
access them directly.
This allows us to use reciprocal instructions that match the precision
x86 expects, rather than converting everything to float divides.
Currently no hardware supports this, and even the upcoming X4/A720/A520
won't support it, but it was trivial to implement so wire it up.
Suggested by Alyssa. Adding an IR operation can be a little tedious
since you need to add the definition to JIT.cpp for the dispatch switch,
the function declaration to JITClass.h, and then actually define the
implementation in the correct file.
Instead, support the common case where an IR operation just gets
dispatched through to the regular handler. This lets the developer just
put the function definition into the json and the relevant cpp file and
it gets picked up automatically.
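As a rough picture of the boilerplate being removed, here is a
hypothetical C++ sketch of the switch-plus-declaration pattern that the
json-driven generation takes care of. All names here are made up and do
not match FEX's actual generated code.
```cpp
#include <cstdint>
#include <cstdio>

// Minimal stand-in for an IR operation header.
struct IROp { uint16_t Op; };

class JITCoreSketch {
public:
  void Dispatch(const IROp &op) {
    // Generated dispatch switch: one case per IR operation in the json.
    switch (op.Op) {
      case 0:  Op_VAdd(op); break;      // "VAdd" json entry
      case 1:  Op_VMul(op); break;      // "VMul" json entry
      default: Op_Unhandled(op); break; // e.g. x87 / SSE4.2 string ops
    }
  }

private:
  // Generated declarations; the developer only writes the handler bodies
  // in the relevant cpp file.
  void Op_VAdd(const IROp &) { std::puts("VAdd handler"); }
  void Op_VMul(const IROp &) { std::puts("VMul handler"); }
  void Op_Unhandled(const IROp &) { std::puts("Unhandled"); }
};
```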
Some minor things:
- Needs to support dynamic dispatch for {Load,Store}Register and
{Load,Store}Mem
- This is just a bool in the json
- It needs to not output JIT dispatch for some IR operations
- SSE4.2 string instructions and x87 operations
- These go down the "Unhandled" path
- Needs to support a Dispatcher function override
- This is just for handling NoOp IR operations that get used for
other reasons.
- Finally removes VSMul and VUMul, consolidating to VMul
- Unlike V{U,S}Mull, signed or unsigned doesn't change behaviour here
- Fixed a couple of random handler names not matching the IR operation
name.
This syscall requires a valid pointer, otherwise it returns EFAULT.
When going through the glibc helper it can even crash before reaching
the raw syscall.
Enables this in InstCountCI so Pi users can run the tests without
breaking on crypto operations.
When crypto is enabled or disabled, just wholesale change AES, CRC32,
and PMULL 128-bit in one step. We don't really care about partial
support here.
The motivation for having just a pointer array in CpuState was that
initialization was fairly cheap and that we have limited space inside
the encoding depending on what we want to do.
Initialization cost is still a concern, but doing a memcpy of 128 bytes
isn't that big of a deal.
Limited space in CpuState, while a concern, isn't a significant one:
- It currently needs to be less than 1 page in size
- It needs to stay under the architectural offset limits of load-store
  scaled offsets, which is 65KB for 128-bit vectors
The pointer array is still kept around for cases where we would need to
synthesize an address offset and it's just easier to load the
process-wide table.
The performance improvement here comes from removing the dependency in
the ldr+ldr chain. In microbenchmarks on Cortex-X1C this has shown an
improvement of ~4% from removing that dependency chain.
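A rough C++ sketch of the layout change, with hypothetical names and
sizes: the point is that an inline table turns a dependent ldr+ldr pair
into a single scaled-offset load. This is not FEX's actual CpuState.
```cpp
#include <cstdint>
#include <cstring>

struct alignas(16) Vec128 { uint8_t Bytes[16]; };

// Old approach: load the table pointer from CpuState, then load the
// constant from the table; two dependent loads (ldr + ldr).
struct CpuState_PointerOnly {
  const Vec128 *NamedConstantsTable;
};

// New approach: memcpy the small table directly into CpuState at thread
// start, so reading a constant is a single ldr with a scaled immediate
// offset. The pointer is kept for cases that need the process-wide table.
struct CpuState_Inline {
  Vec128 NamedConstants[8];          // 8 * 16 bytes = 128 bytes copied at init
  const Vec128 *NamedConstantsTable;
};

void InitThreadState(CpuState_Inline &state, const Vec128 (&ProcessTable)[8]) {
  std::memcpy(state.NamedConstants, ProcessTable, sizeof(state.NamedConstants));
  state.NamedConstantsTable = ProcessTable;
}
```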
This is the cause of a bunch of redundant moves that show up in
InstCountCI. Fixing this aliasing and pre-colouring issue causes a ton
of 256-bit operations to become optimal.
Using the cached zero value is less efficient than loading it into the
register in all of these cases.
This lets us use the rename hardware more efficiently and removes a
dependency chain on a single register.
Original:
```
movi v2.2d, #0x0
mov z16.d, p7/m, z2.d
<... 16 more times>
mov z31.d, p7/m, z2.d
```
Result:
```
movi v16.2d, #0x0
<... 16 more times>
movi v31.2d, #0x0
```