FEX-Emu/FEX

mirror of https://github.com/FEX-Emu/FEX.git synced 2025-02-07 15:09:05 +00:00

Author	SHA1	Message	Date
Ryan Houdek	6ec628fa31	Merge pull request #3433 from bylaws/arm64ec-pt1 Arm64Emitter: Introduce ARM64EC SRA mappings	2024-02-23 14:48:43 -08:00
Alyssa Rosenzweig	5378ae2e76	Merge pull request #3436 from alyssarosenzweig/ir/af-simplify Simplify CalculateAF	2024-02-22 08:17:07 -04:00
Ryan Houdek	d4be2dc636	Merge pull request #3434 from bylaws/arm64ec-pt3 FEXCore: Expose AbsoluteLoopTopAddress to the frontend	2024-02-21 14:31:04 -08:00
Alyssa Rosenzweig	2bcd285851	Merge pull request #3430 from Sonicadvance1/tsc_scale Implement small TSC scaling	2024-02-21 13:16:27 -04:00
Alyssa Rosenzweig	8762bc1fa3	OpcodeDispatcher: simplify CalculateAF signature. Res is unused, and SrcSize doesn't matter since we ignore the high bits, so we might as well always use 32-bit. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-21 12:48:15 -04:00
Billy Laws	5b4162b712	FEXCore: Expose AbsoluteLoopTopAddress to the frontend ARM64EC has a shared SRA mapping between ARM64 and X64 code, so there needs to be a public way to enter the dispatcher without refilling SRA from the in-memory context struct.	2024-02-21 11:46:24 +00:00
Billy Laws	cb5c07f4b1	Arm64Emitter: Introduce ARM64EC SRA mappings See https://learn.microsoft.com/en-us/cpp/build/arm64ec-windows-abi-conventions?view=msvc-170 note that since mm registers are volatile there is no need to match the mapping for them when in JIT, so they can be used as scratch regs. Disallowed regs are also wiped on context switches, so they cannot be taken advantage of to e.g. avoid spilling.	2024-02-21 11:18:10 +00:00
Ryan Houdek	b902b8edab	Implement small TSC scaling Game engines expect >1GHz cycle counters. Scale them to work around the issue. Resolves the excessive busy waiting in Unreal Engine 5 games.	2024-02-20 12:05:44 -08:00
Alyssa Rosenzweig	0503c89ff6	OpcodeDispatcher: use NZCV update helpers Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-19 14:12:54 -04:00
Alyssa Rosenzweig	6dd410698a	OpcodeDispatcher: add helpers for updating NZCV metadata to reduce error-prone copypaste Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-19 14:12:54 -04:00
Ryan Houdek	808ced455d	FEXCore: Add a frontend pointer to InternalThreadState FEXCore is guaranteed to not touch this pointer and can be used by frontends to store thread-specific data.	2024-02-15 02:06:16 -08:00
Ryan Houdek	9cab746aa7	Merge pull request #3407 from neobrain/feature_libfwd_arguments_on_guest_stack Library Forwarding: Allocate packed arguments on the guest stack if needed	2024-02-12 16:31:34 -08:00
Alyssa Rosenzweig	68232366e4	OpcodeDispatcher: don't mask add/sub sources not needed in the new approach Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-12 12:36:28 -04:00
Alyssa Rosenzweig	d7ff1b78fb	IR: handle 8/16-bit AddNZCV/SubNZCV we can do it more effectively than the current s/w lowering. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-12 12:36:09 -04:00
Mai	780b48620b	Merge pull request #3420 from Sonicadvance1/preserve_all_3419 Fix #3419	2024-02-10 23:24:38 -05:00
Ryan Houdek	4a0878fa92	Fix #3419	2024-02-10 19:55:51 -08:00
Ryan Houdek	df3d6938ae	Merge pull request #3410 from alyssarosenzweig/opt/nzcv-pass-2 Add NZCV+PF/AF optimization pass	2024-02-10 05:03:12 -08:00
Ryan Houdek	ba41da7da0	Merge pull request #3414 from Sonicadvance1/fix_one_mutex_hang Fixes one mutex hang	2024-02-09 05:54:40 -08:00
Ryan Houdek	2480bab409	Fixes one mutex hang When code invalidation is happening, we currently have the issue that a thread can acquire the code invalidation mutex in the middle of invalidation. This is due to us acquiring and releasing the mutex between each thread's code invalidation. We need to hold the mutex for the entire duration of all threads' code invalidation. This fixes a rare hang on Proton startup and resolves a consistent hang on Proton application shutdown. This now puts us on par with FEX-2312.1 with respect to hanging. This does not fix a relatively rare hang on fork (which also existed with FEX-2312.1). This also does not fix the issue that the intersection of our mutexes between frontend and backend is very convoluted. Part of the work that is going to fix the rare fork mutex hang will change more of this.	2024-02-08 18:18:00 -08:00
Alyssa Rosenzweig	ad7202e7d7	OpcodeDispatcher: optimize test -1 Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-08 14:10:13 -04:00
Alyssa Rosenzweig	175a57dd27	OpcodeDispatcher: emit AndWithFlags directly for primary alu Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-06 14:22:28 -04:00
Alyssa Rosenzweig	e2ce60148c	OpcodeDispatcher: emit AndWithFlags directly for 2ndary alu rely on opt pass to drop the flags. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-06 14:22:28 -04:00
Alyssa Rosenzweig	99660129f3	IR: implement AndWithFlags for 8/16-bit easier to deal with in the JIT Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-06 14:22:28 -04:00
Alyssa Rosenzweig	308d9a751c	RedundantFlagCalculationElimination: optimize rmif Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig	4bd28c0ed8	RedundantFlagCalculationElimination: optimize condaddnzcv Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig	8397f3ac99	RedundantFlagCalculationElimination: refine AXFLAG Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig	0452bc7212	RedundantFlagCalculationElimination: optimize condjump Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-06 13:06:26 -04:00
Alyssa Rosenzweig	3d7ed89ffb	RedundantFlagCalculationElimination: optimize NZCVSelect Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-06 13:06:25 -04:00
Alyssa Rosenzweig	23ab0a978e	RedundantFlagCalculationElimination: also handle InvalidateFlags Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-06 13:06:25 -04:00
Alyssa Rosenzweig	7f47a9ef0e	IR: add local dead flag elimination pass RCLSE ignores NZCV and doesn't optimize stores which doesn't help us with PF/AF either. So, we add a new pass for dead flag elimination (cannibalizing the old and broken dead flag elimination pass). This is a simple local optimizer that walks each block backwards, converging in linear time & constant space in a single iteration. Right now, it doesn't do a ton (other than a nice reduction in silliness in the hot Sonic block), but it provides the framework to fuse comparisons. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-06 13:06:25 -04:00
Alyssa Rosenzweig	4331753ca0	Merge pull request #3408 from alyssarosenzweig/opt/tst Optimize TST	2024-02-06 11:28:02 -04:00
Paulo Matos	fa8bcfd67a	Clean up access to possible nullptr Patch suggested by @Sonicadvance1	2024-02-06 12:31:38 +00:00
Tony Wasserka	a1343e9296	Revert "Add cmake option DISABLE_CLANG_PRESERVE_ALL"	2024-02-05 22:31:45 +01:00
Alyssa Rosenzweig	235f32ce8c	Merge pull request #3401 from Sonicadvance1/runtime_preserve_all HostFeatures: Supports runtime disabling of preserve_all	2024-02-05 15:34:46 -04:00
Alyssa Rosenzweig	2e0cb2fbd4	OpcodeDispatcher: optimize TST it's just an AndWithFlags setting the PF. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-05 15:32:21 -04:00
Alyssa Rosenzweig	4790a7ba79	IR: add AndWithFlags Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-05 15:32:21 -04:00
Tony Wasserka	df3e51fc8c	Library Forwarding: Allocate packed arguments on the guest stack if needed This is required for host-side calls to guest functions on 32-bit guests. Since the host stack is allocated before FEX blocks memory inaccessible to the guest, the guest would otherwise fail to read the packed argument data.	2024-02-05 18:10:34 +01:00
Ryan Houdek	0139498072	SpinWaitLock: Removes unused variable in spin-loop fallback Tmp was no longer being used, forgot to remove it.	2024-02-05 07:22:52 -08:00
Ryan Houdek	472a701e2b	Merge pull request #3403 from Sonicadvance1/fix_spinlock_contended_lock SpinLockWait: Fixes unexpected lock success	2024-02-05 06:51:42 -08:00
Ryan Houdek	cce6011205	SpinLockWait: Fixes unexpected lock success With a contended unique lock, we forgot to reset the `Expected` value to zero. This was causing a contended mutex to incorrectly succeed. Noticed this when converting some pthread mutexes over to spinloops to remove strace noise. The reference wfe_mutex library I wrote didn't have this problem since the implementation is slightly different.	2024-02-03 01:10:57 -08:00
Ryan Houdek	c437129ed8	Revert "Revert "FEXLoader: Moves thread management to the frontend"" This reverts commit 5358af7794d9568398f7b84fe09b4c8198448f2c.	2024-02-03 00:57:36 -08:00
Alyssa Rosenzweig	8d3f0b6f02	OpcodeDispatcher: reassociate and sink W in sha1 We only need each part of W extracted in the corresponding round, so sink the extract into the round to reduce pressure. Further, W and E are added and then never used again. So, by reassociating we can do the add upfront, killing W and E at the start and further reducing pressure. Eliminates spilling in sha1rnds4. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig	60f7b9bcc4	OpcodeDispatcher: optimize sha1's 2/3 expr Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig	a487557173	OpcodeDispatcher: extract BitwiseAtLeastTwo Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig	394b4888bb	OpcodeDispatcher: reassociate and remat C0, G0 costs 2 moves and eliminates the rest of our spilling Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig	142cbdd852	OpcodeDispatcher: expand, reassociate, and interleave sha256 calc Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig	2f9102f78d	OpcodeDispatcher: expand & interleave sha256 calc Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig	c9824d04cb	OpcodeDispatcher: sink sha256 extracts Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig	9c2a569539	OpcodeDispatcher: reexpress Major in sha256 Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-02 13:03:07 -04:00
Alyssa Rosenzweig	515aa4ce3e	OpcodeDispatcher: fuse eor+ror in sha256 This reduces instructions a ton. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-02-02 13:03:07 -04:00
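
Note on cce6011205 (SpinLockWait: Fixes unexpected lock success): the commit message says the `Expected` value was not reset to zero on a contended lock. The sketch below shows how that class of bug can arise with a compare-exchange loop in standard C++. It is not FEX's implementation; `LockWord`, `TryLock`, `Lock`, `Unlock` and `StaleExpectedFalselySucceeds` are hypothetical names, and the 0 = unlocked / 1 = locked encoding is an assumption.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical lock word (assumption): 0 = unlocked, 1 = locked.
using LockWord = std::atomic<uint32_t>;

// Single acquisition attempt with a fresh Expected value.
inline bool TryLock(LockWord& Word) {
  uint32_t Expected = 0;
  return Word.compare_exchange_strong(Expected, 1, std::memory_order_acquire,
                                      std::memory_order_relaxed);
}

// Spin until the lock is acquired.
inline void Lock(LockWord& Word) {
  uint32_t Expected = 0;
  while (!Word.compare_exchange_weak(Expected, 1, std::memory_order_acquire,
                                     std::memory_order_relaxed)) {
    // A failed compare-exchange stores the observed value into Expected.
    // Without this reset the next attempt would compare against 1 and
    // "succeed" while another thread still holds the lock.
    Expected = 0;
  }
}

inline void Unlock(LockWord& Word) {
  Word.store(0, std::memory_order_release);
}

// Demonstrates the failure mode in isolation: returns true when a retry
// with a stale Expected value wrongly reports success on a held lock.
inline bool StaleExpectedFalselySucceeds() {
  LockWord Word{1};       // Lock is held by someone else.
  uint32_t Expected = 0;
  const bool First = Word.compare_exchange_strong(Expected, 1);  // Fails; Expected becomes 1.
  const bool Second = Word.compare_exchange_strong(Expected, 1); // Compares 1 against 1.
  return !First && Second;
}
```

The reset inside the loop is the whole point: `compare_exchange_*` overwrites its first argument on failure, so any retry loop has to restore the value it actually wants to compare against.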

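Note on 7f47a9ef0e (IR: add local dead flag elimination pass): the commit message describes a local optimizer that walks each block backwards in a single iteration. A minimal sketch of that general technique follows. It does not use FEX's IR; the `FlagOp` struct, the 4-bit flag mask, the `FlagsOnly` field and the function name are all hypothetical, and treating every flag as live at block exit is a conservative assumption made here for a block-local pass.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical IR op summary: bit masks over four flags (e.g. N, Z, C, V).
struct FlagOp {
  uint8_t FlagsRead;    // Flags this op consumes.
  uint8_t FlagsWritten; // Flags this op overwrites.
  bool FlagsOnly;       // True if the op has no effect besides writing flags.
};

// Walks the block backwards once and returns, per op, the mask of flag
// writes that no later op in the block observes.
inline std::vector<uint8_t> FindDeadFlagWrites(const std::vector<FlagOp>& Block) {
  uint8_t Live = 0xF; // Assumption: all flags may be read after the block.
  std::vector<uint8_t> Dead(Block.size(), 0);

  for (std::size_t i = Block.size(); i-- > 0;) {
    const FlagOp& O = Block[i];
    Dead[i] = static_cast<uint8_t>(O.FlagsWritten & ~Live);

    // A flags-only op whose writes are all dead can be removed entirely,
    // so its reads must not keep earlier flag writes alive.
    const bool Removable =
        O.FlagsOnly && O.FlagsWritten != 0 && Dead[i] == O.FlagsWritten;
    if (Removable) {
      continue;
    }

    // A write kills liveness above this point; a read makes the flag live.
    Live = static_cast<uint8_t>((Live & ~O.FlagsWritten) | O.FlagsRead);
  }
  return Dead;
}
```

Each op is visited exactly once and the only state carried between ops is the `Live` mask, which matches the linear-time, constant-space description in the commit message.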

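Note on a487557173 (OpcodeDispatcher: extract BitwiseAtLeastTwo) and 9c2a569539 (reexpress Major in sha256): judging by the helper's name, it appears to compute the bitwise "at least two of three" function, which is the Majority function used by SHA-1 and SHA-256. The sketch below compares the textbook definition with one well-known form that needs fewer operations. Which expression FEX actually emits is not stated in the log, so both function names and the chosen rewrite are illustrative assumptions.

```cpp
#include <cstdint>

// Textbook definition as used in the SHA specifications:
// each result bit is set when at least two of the three input bits are set.
constexpr uint32_t MajorityReference(uint32_t A, uint32_t B, uint32_t C) {
  return (A & B) ^ (A & C) ^ (B & C);
}

// Equivalent form with four bitwise operations instead of five.
// Where A is 1 the result is (B | C); where A is 0 it is (B & C).
constexpr uint32_t MajorityFewerOps(uint32_t A, uint32_t B, uint32_t C) {
  return (A & (B | C)) | (B & C);
}

// The inputs 0xF0, 0xCC and 0xAA cover all eight bit combinations.
static_assert(MajorityReference(0xF0, 0xCC, 0xAA) == 0xE8);
static_assert(MajorityFewerOps(0xF0, 0xCC, 0xAA) == 0xE8);
```

Because the function is purely bitwise, checking the eight input combinations once proves the two forms agree for every 32-bit input.
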
934 Commits