This lets us load 8-bit sources without bfe'ing for al/bl/cl when the caller
knows it doesn't need masking behaviour, but without lying about the size, so
the extract for ah/bh/ch will still work properly.
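A rough sketch of the intent (hypothetical names, not FEX's actual helpers):

#include <cstdint>

// Load an 8-bit x86 source from a 64-bit host register. `high` selects
// ah/bh/ch/dh; `needsMask` is false when the caller tolerates garbage in
// the upper bits of the returned value.
uint64_t Load8BitSource(uint64_t gpr, bool high, bool needsMask) {
  if (high) {
    return (gpr >> 8) & 0xff; // ah/bh/ch: the extract still happens
  }
  if (!needsMask) {
    return gpr; // al/bl/cl: skip the bfe, the consumer ignores upper bits
  }
  return gpr & 0xff; // the usual bfe
}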
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
For the GPR result, the masking already happens as part of the bfi. So the only
point of masking is for the flag calculation. But actually, every flag except
carry will ignore the upper bits anyway. And the carry calculation actually
WANTS the upper bit for a faster implementation.
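For example (an illustrative sketch, assuming the 8-bit operands arrive
zero-extended): the carry out of an 8-bit add is just bit 8 of the wide sum, so
masking the result before the flag calculation only throws that bit away.

#include <cstdint>

bool CarryFrom8BitAdd(uint64_t a, uint64_t b) {
  uint64_t sum = a + b;  // keep the upper bits
  return (sum >> 8) & 1; // carry lives in bit 8 of the unmasked sum
}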
Deletes a pile of code both in FEX and the output :-)
ADC/SBC could probably get similar treatment later.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Now unused; its former users all prefer LoadPFRaw since they can fold some of
this math into the use.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Use the raw popcount rather than the final PF, plus some sneaky bit math, to
come out 1 instruction ahead.
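The trick, roughly (an illustrative sketch, not the exact IR): PF means the low
byte has even parity, i.e. bit 0 of popcount(result & 0xff) is clear. Keeping
the raw popcount around lets a consumer test that bit directly and fold the
final inversion into its own condition.

#include <bit>
#include <cstdint>

int RawPF(uint64_t result) {
  return std::popcount(static_cast<uint8_t>(result)); // raw, not yet inverted
}

bool ParityFlag(uint64_t result) {
  return (RawPF(result) & 1) == 0; // the inversion folds into the use
}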
Closes #3117
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Mostly a copy-paste of Orlshl... we really should deduplicate this mess somehow.
Maybe a shift enum on the core Or op?
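Something like this, purely hypothetically:

#include <cstdint>

enum class ShiftType { None, LSL, LSR, ASR };

// One Or op with an optional shift on the second source, so Orlshl and
// Orlshr collapse into Or with ShiftType::LSL / ShiftType::LSR.
struct OrOp {
  uint32_t Src1;
  uint32_t Src2;
  ShiftType Shift;
  uint8_t ShiftAmount;
};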
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This logic is unused since 8adfaa9aa ("OpcodeDispatcher: Use SelectCC for x87"),
which addressed the underlying issue.
This reverts commit df3833edbe3d34da4df28269f31340076238e420.
If we const-prop the required functions and leafs then we can directly
encode the CPUID information rather than jumping out of the JIT.
In testing, almost all CPUID executions const-prop which function is getting
called. The worst case I found was an 85% const-prop rate.
This isn't quite 100% optimal since we need to call the RCLSE and
Constprop passes after we optimize these, which would remove some
redundant moves.
Sadly there seems to be a bug in the constprop pass that starts crashing
applications if that is done.
Easy enough to test by running Half-Life 2, which immediately hits SIGILL.
Even without this optimization, this is still a significant savings since
we aren't jumping out of the JIT anymore for these optimized CPUIDs.
Most CPUID routines return constant data; there are four that don't.
Some CPUID functions also need the leaf descriptor, so we need to
describe that as well.
Functions that don't return constant data:
- function 1Ah - Returns different data depending on current CPU core
- function 8000_000{2,3,4} - Different data based on CPU core
Functions that need leaf constprop:
- 4h, 7h, Dh, 4000_0001h, 8000_001Dh
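The shape of the fold, roughly (hypothetical names, not the actual dispatcher
code):

#include <cstdint>

struct CPUIDResult { uint32_t eax, ebx, ecx, edx; };

// Assumed helpers: whether this (function, leaf) pair returns constant
// data, and the handler that computes it.
bool IsConstantCPUID(uint32_t function, uint32_t leaf);
CPUIDResult RunCPUIDHandler(uint32_t function, uint32_t leaf);

// In the dispatcher: if the function (and the leaf, where needed)
// const-props and the pair is constant, run the handler at compile time
// and emit the four results as immediates; otherwise fall back to
// jumping out of the JIT as before.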
Gets us the constant source optimization without more code duplication. And
honestly I prefer the combined presentation.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
movprfx is invalid to use when the source register matches the movprfx
destination.
This was getting picked up by `TwoByte/0F_D1.asm` now that RCLSE is
working better.
The bug that was causing crashes with this was due to inline syscalls.
Now that this is fixed we can re-enable store->load operations.
This allows constant propagation to work significantly better, which
means inline syscalls start working again. This can significantly
improve syscall performance in some cases.
This is most likely to improve performance in dxsetup and vc_redist, but
it's hard to get a real profile.
Additionally, this will let us inline CPUID results in the future, which
is pretty nice.
Ever since we reordered registers in `X86Enums.h` this has silently been
broken. This wasn't hit because RCLSE has been broken ever since SRA was
added, so inline syscalls just weren't ever happening.
Quick fix while I think of a way to more strictly correlate these
registers so it doesn't happen again.
The range was slightly incorrect which mostly wouldn't have caused
issues.
The lowest byte would have just generated slightly less optimal code.
The upper byte could have generated broken code, which our CI couldn't
catch since TSO instructions only get enabled when multiple threads are
in-flight.
Easy enough to fix.
This would have caused the core to try to initialize a custom core on
Arm64, which triggers a std::function assert because that isn't
supported.
Users would likely get hit by this immediately since we deleted the
interpreter and shifted all the core numbers.
Originally this was going to use setf8/setf16, but the shift-and-test approach
turns out to be faster. As a bonus this is a nice delete-the-code win :-)
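The shift-and-test idea, sketched in C++ (illustrative, not the emitted code;
assumes `result` holds the zero-extended 8-bit result): shift the byte into the
top of a 32-bit value, then a single flag-setting test yields both SF and ZF.

#include <cstdint>

void NZFrom8BitResult(uint32_t result, bool &sf, bool &zf) {
  uint32_t shifted = result << 24; // byte now occupies bits 24..31
  sf = (shifted >> 31) != 0;       // N: sign bit of the byte
  zf = shifted == 0;               // Z: the byte was zero
}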
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
The only reason we need to XOR arguments for AF is to get bit 4 correct. But if
the operand in question is known to have bit 4 clear, the XOR will be an
effective no-op and can be skipped. This saves an instruction in a bunch of
common cases, like inc/dec. If we dedicated a register to AF to eliminate the
store, we would not save an instruction from this but would still come out ahead
due to an eor turning into a (zero cycle?) mov that can be handled by the
renamer.
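Concretely (an illustrative sketch of the standard identity, not FEX's exact
code): AF is the carry out of bit 3, which is bit 4 of src1 ^ src2 ^ result, so
an operand with bit 4 known clear contributes nothing to it.

#include <cstdint>

bool AuxFlag(uint64_t src1, uint64_t src2, uint64_t result) {
  return ((src1 ^ src2 ^ result) >> 4) & 1;
}

bool AuxFlagBit4ClearSrc2(uint64_t src1, uint64_t result) {
  // e.g. inc/dec, where src2 is the constant 1: bit 4 of src2 is clear,
  // so the second XOR is an effective no-op and gets skipped.
  return ((src1 ^ result) >> 4) & 1;
}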
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Add new synthetic condition codes that do an AND as their relational operator,
testing the result. This is 1 IR op for things like
(A & B) == 0 ? C : D
This can translate to
tst A, B
csel dst, C, D, eq
In the future, if A is the NZCV register and B is a supported immediate, eg
(NZCV & 0x80000000) == 0 ? C : D
this will be able to translate to a single instruction with the appropriate
condition
csel dst, C, D, pl
but that needs RA support.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This is blocking performance improvements. This backend is almost
universally unused, except when I'm testing whether games run on Radeon
video drivers.
Hopefully AmpereOne and Orin/Grace can fulfill this role when they
launch next year.
This is an atomicFetchCLR; it removes two back-to-back mvn instructions
negating the source.
We didn't have this instruction combination in InstCountCI, so it will be
a bit hard to see.
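For reference, the semantics being matched (a hedged C++ sketch; ldclr
atomically clears the bits named by its source operand):

#include <atomic>
#include <cstdint>

// old = *mem; *mem = old & ~clearMask; return old -- i.e. ARM64 ldclr.
uint64_t AtomicFetchCLR(std::atomic<uint64_t> &mem, uint64_t clearMask) {
  return mem.fetch_and(~clearMask);
}

So a fetch-AND whose source is already a NOT can feed ldclr directly instead of
negating twice.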
It is scarcely used today, and like the x86 JIT, it is a significant
maintenance burden complicating work on FEXCore and arm64 optimization. Remove
it, bringing us down to 2 backends.
1 down, 1 to go.
Some interpreter scaffolding remains for x87 fallbacks. That is not a problem
here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>