Same situation as the last stack memory leak fix: this is fairly tricky
since it is dealing with stack pivoting. Fixes the memory leak around
pthread stack allocations, lowering memory usage for applications
that constantly spin up and destroy threads (like Steam).
We need to let glibc allocate a minimum-sized stack (128KB, and we can't
control it) to work around a race condition with DTV/TLS regions. This
means we need to do a stack pivot once the thread starts executing.
We also need to be careful because the `PThread` object is deleted
inside the execution thread, which was resulting in a use-after-free
bug.
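A minimal sketch of the shape this takes, assuming Linux/glibc; the names (`SpawnThread`, `StackTracking`, `ExecutionStack`) are illustrative and the architecture-specific pivot itself is left out:

```cpp
#include <pthread.h>
#include <sys/mman.h>
#include <limits.h>
#include <cstddef>

struct StackTracking {
  void* ExecutionStack;        // the large stack the thread pivots onto
  size_t ExecutionStackSize;
};

pthread_t SpawnThread(void* (*Entry)(void*), StackTracking* Tracking) {
  // Let glibc allocate its own minimum-sized stack so it can set up the
  // DTV/TLS regions without racing against us.
  pthread_attr_t Attr;
  pthread_attr_init(&Attr);
  pthread_attr_setstacksize(&Attr, PTHREAD_STACK_MIN);

  // Allocate the real execution stack ourselves so it can be tracked and
  // munmap'ed when the thread exits instead of being leaked.
  Tracking->ExecutionStackSize = 8 * 1024 * 1024;
  Tracking->ExecutionStack =
    mmap(nullptr, Tracking->ExecutionStackSize, PROT_READ | PROT_WRITE,
         MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);

  // Entry pivots onto ExecutionStack before doing real work and must not
  // touch the tracking/PThread object once it has been freed.
  pthread_t Thread;
  pthread_create(&Thread, &Attr, Entry, Tracking);
  pthread_attr_destroy(&Attr);
  return Thread;
}
```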
There are definitely some more memory leaks that I'm still fighting, and I have
noticed in my abusive thread creation program that we might want to
change some jemalloc options to more aggressively cut down on residency.
This is just one out of many.
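For reference, the sort of knob meant here is jemalloc's decay configuration, which controls how quickly unused dirty/muzzy pages are returned to the kernel; the values below are purely illustrative, not a decided-on configuration:

```cpp
// Either via the environment at launch:
//   MALLOC_CONF="dirty_decay_ms:1000,muzzy_decay_ms:0"
// or compiled in through jemalloc's application-provided config symbol:
extern "C" const char* malloc_conf = "dirty_decay_ms:1000,muzzy_decay_ms:0";
```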
I remember seeing some application last year that closed a FEX-owned FD,
but now I don't remember what it was. This can really mess us up, so add
some debug tracking so we can try to find it again.
It might be something specifically around Flatpak, AppImage, or Chrome's
sandbox. I have some ideas for how to work around these problems if
they crop up, but I need to find the problem applications again.
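A rough sketch of what the debug tracking could look like; the names are hypothetical and this is not the actual syscall-handler code:

```cpp
#include <unordered_set>
#include <mutex>
#include <cstdio>

namespace FDTracking {
std::mutex Lock;
std::unordered_set<int> FEXOwnedFDs;

// Called whenever FEX opens an FD for its own use (logs, IPC, rootfs, ...).
void TrackFEXFD(int FD) {
  std::lock_guard<std::mutex> lk(Lock);
  FEXOwnedFDs.insert(FD);
}

// Called from the guest close() syscall handler before forwarding it.
void CheckGuestClose(int FD) {
  std::lock_guard<std::mutex> lk(Lock);
  if (FEXOwnedFDs.count(FD)) {
    fprintf(stderr, "Guest application closed FEX-owned FD %d\n", FD);
  }
}
} // namespace FDTracking
```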
Since we do an immediate overwrite of the file we are copying, we can
instead do a rename. Failure on rename is fine: it either means the
telemetry file didn't exist initially, or there was some other permission
error, in which case the telemetry would get lost regardless.
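A minimal sketch of the idea; the paths and function name are illustrative:

```cpp
#include <cstdio>

// Move the telemetry file to its destination instead of copying it byte by byte.
void CommitTelemetry(const char* SourcePath, const char* DestPath) {
  if (std::rename(SourcePath, DestPath) != 0) {
    // Intentionally ignored: either the file never existed or we hit a
    // permission error, and in both cases the telemetry is lost regardless.
  }
}
```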
This may be useful for tracking TSO faults that manage to fetch stale
data. While most TSO crashes are due to nullptr dereferences, this can
still check for the corruption case.
In 64-bit mode, the LOOP instruction's counter register is RCX or ECX,
selected by the address-size override prefix.
In 32-bit mode, it is ECX or CX.
FEX wasn't handling the 16-bit case at all, which caused the LOOP
instruction to effectively always operate at 32-bit size. Now this is
correctly supported, and it also stops treating the operation as 64-bit.
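A sketch of the selection logic (illustrative, not FEX's decoder code):

```cpp
#include <cstdint>

enum class CPUMode { Bit64, Bit32 };

// Width in bytes of the LOOP counter register, given the operating mode and
// whether the 0x67 address-size override prefix is present.
uint32_t LoopCounterWidth(CPUMode Mode, bool HasAddressSizeOverride) {
  if (Mode == CPUMode::Bit64) {
    return HasAddressSizeOverride ? 4 : 8;  // ECX instead of RCX
  }
  return HasAddressSizeOverride ? 2 : 4;    // CX instead of ECX
}
```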
It was a funny joke that this was here, but it is fundamentally
incompatible with what we're doing. All of those users are running proot
anyway because of how broken running directly under Termux is.
Just remove this from here.
Take e.g. a forward rep movsb copy from addr 0 to 1. Since this is a
bytewise copy, the expected behaviour is:
before: aaabbbb...
after: aaaaaaa...
but by copying in 32-byte chunks we end up with:
after: aaaabbbb...
due to the self-overwrites not occurring within a single 32-byte copy.
When TSO is disabled, vector LDP/STP can be used for a two-instruction
32-byte memory copy, which is significantly faster than the current
byte-by-byte copy. Performing two such copies directly after one another
also marginally increases copy speed for all sizes >= 64.
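As a sketch of the safety condition this implies (illustrative, not the actual backend code), the wide path only matches bytewise forward-copy semantics when the destination either trails the source or leads it by at least one full chunk:

```cpp
#include <cstdint>

bool CanUseWideForwardCopy(uint64_t Dst, uint64_t Src, uint64_t Length,
                           uint64_t ChunkSize) {
  // Non-overlapping ranges are always safe.
  if (Dst >= Src + Length || Src >= Dst + Length) {
    return true;
  }
  // Overlapping forward copy: bytewise self-overwrite semantics are only
  // preserved if the destination trails the source, or leads it by at least
  // one full chunk, so every chunk is written before any later chunk that
  // covers those bytes is read.
  return Dst <= Src || (Dst - Src) >= ChunkSize;
}
```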
I was looking at some other JIT overheads and this cropped up. Instead of
materializing a constant using mov+movk+movk+movk, load it from the named
vector constant array.
In a micro-benchmark this improved performance by 34%.
In bytemark this improved a sub-bench by 0.82%.
Missed this instruction when implementing rdtscp. It returns the same ID
result in a register just like rdtscp, but without the cycle counter
results, and just like rdtscp it doesn't touch any flags.
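Assuming the instruction is rdpid (which matches this description), a rough host-side comparison purely for illustration:

```cpp
#include <cstdint>

// rdtscp: cycle counter in EDX:EAX, IA32_TSC_AUX (the ID) in ECX.
uint32_t ReadIDViaRdtscp() {
  uint32_t Lo, Hi, Aux;
  asm volatile("rdtscp" : "=a"(Lo), "=d"(Hi), "=c"(Aux));
  return Aux;
}

// rdpid: the same IA32_TSC_AUX value, no cycle counter read, no flags touched.
uint64_t ReadIDViaRdpid() {
  uint64_t ID;
  asm volatile("rdpid %0" : "=r"(ID));
  return ID;
}
```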
This unit test recreates the error condition that #3478 causes.
With a string operation that is a backwards copy, the optimization will
read past the end of the page and result in a crash.
This seemingly only happens with backwards string operations, but this
test covers both forward and backward.
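Roughly the shape of the setup (a sketch, not the actual unit test in the tree): put the source at the very top of an accessible page, guard the page after it, and run a backwards copy right below the boundary; the real test covers the forward direction as well.

```cpp
#include <sys/mman.h>
#include <cstdint>

int main() {
  const size_t Page = 4096;
  // Two pages: the first accessible, the second PROT_NONE so that any read
  // past the end of the first page faults immediately.
  void* Mapping = mmap(nullptr, 2 * Page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (Mapping == MAP_FAILED) return 1;
  uint8_t* Base = static_cast<uint8_t*>(Mapping);
  mprotect(Base + Page, Page, PROT_NONE);

  uint8_t* End = Base + Page;  // first unmapped byte
  const uint64_t Size = 64;
  uint64_t Src = reinterpret_cast<uint64_t>(End - 1);         // last readable byte
  uint64_t Dst = reinterpret_cast<uint64_t>(End - 1 - Size);
  uint64_t Count = Size;

  // Backwards copy: DF=1, RSI/RDI point at the highest byte and walk down.
  // A bytewise copy never touches the guarded page; an over-reading
  // optimization does, and crashes.
  asm volatile("std\n\t"
               "rep movsb\n\t"
               "cld"
               : "+S"(Src), "+D"(Dst), "+c"(Count)
               :
               : "memory", "cc");
  return 0;
}
```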
x86 has a few prefetch instructions.
- prefetch - One of two classic 3DNow! instructions
  - Prefetch into the L1 data cache
- prefetchw - One of two classic 3DNow! instructions
  - Implies a prefetch into the L1 data cache
  - Prefetch cacheline with intent to write and exclusive ownership
- prefetchnta
  - Prefetch non-temporal data with respect to /all/ cache levels
  - Assumes inclusive caches?
- prefetch{t0,t1,t2}
  - Prefetch data with respect to each cache level
  - T0 = L1 and higher
  - T1 = L2 and higher
  - T2 = L3 and higher
**Some silly duplicates**
- prefetchwt1
  - Duplicate of prefetchw but explicitly the L1 data cache
- prefetch_exclusive
  - Duplicate of prefetch
God Of War 2018 uses prefetchw as a hint for exclusive ownership of the
cacheline in some very aggressive spin-loops. Let's implement the
operations to help it along.
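As a rough host-side analogue of what these hints mean (using the compiler builtin purely for illustration; this is not what the JIT emits):

```cpp
// rw: 0 = read, 1 = write intent. locality: 3 = keep in all levels, 0 = non-temporal.
inline void Prefetch(const void* Addr)    { __builtin_prefetch(Addr, 0, 3); } // 3DNow! prefetch, L1
inline void PrefetchW(const void* Addr)   { __builtin_prefetch(Addr, 1, 3); } // write intent / exclusive
inline void PrefetchNTA(const void* Addr) { __builtin_prefetch(Addr, 0, 0); } // non-temporal
inline void PrefetchT0(const void* Addr)  { __builtin_prefetch(Addr, 0, 3); } // L1 and higher
inline void PrefetchT1(const void* Addr)  { __builtin_prefetch(Addr, 0, 2); } // L2 and higher
inline void PrefetchT2(const void* Addr)  { __builtin_prefetch(Addr, 0, 1); } // L3 and higher
```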
This function can be unit-tested more easily, and the stack special case is
more cleanly handled as a post-collection step.
There is a minor functional change: the stack special case previously didn't
trigger if the range end was within the stack mapping. This is now fixed.