
mirror of https://github.com/FEX-Emu/FEX.git synced 2024-12-15 18:08:35 +00:00

Author	SHA1	Message	Date
Alyssa Rosenzweig	7b1bb159fa	OpcodeDispatcher: use ForeachDirection for scas Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-31 20:31:38 -04:00
Alyssa Rosenzweig	5c7f2934de	OpcodeDispatcher: use ForeachDirection for lods eliminates xblock live Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-31 20:29:29 -04:00
Alyssa Rosenzweig	5d79d4eb50	OpcodeDispatcher: use ForeachDirection for CMPS eliminates xblock liveness Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-31 20:29:16 -04:00
Alyssa Rosenzweig	3f66173bc7	OpcodeDispatcher: add ForeachDirection helper Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-31 20:28:56 -04:00
Alyssa Rosenzweig	1a1545da0f	OpcodeDispatcher: rework rep cmp 1. pull flag calculation out of the loop body for perf 2. fully rotate the inner loop to save an instruction per iteration 3. hoist the rcx=0 jump to avoid computing df when rcx=0 Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-31 19:54:00 -04:00
Alyssa Rosenzweig	a70ea30c02	IR: add CondSubNZCV (ccmp) instruction Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-31 17:50:57 -04:00
Alyssa Rosenzweig	67baff8a57	Merge pull request #3537 from Sonicadvance1/remove_vla_ra RA: Removes VLA usage	2024-03-31 14:44:37 -04:00
Ryan Houdek	fedc24be1e	RA: Removes VLA usage Just like #3508, clang-18 complains about VLA usage. This vector is relatively small, only around 18 elements but is semi-dynamic depending on arch and if FEXCore is targeting Linux or Win32.	2024-03-30 16:50:04 -07:00
Alyssa Rosenzweig	706065b0e2	OpcodeDispatcher: accelerate cmpxchg with flagm Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-30 14:12:59 -04:00
Alyssa Rosenzweig	9fd32f07cb	JIT: preserve nzcv for the slow atomic path Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-30 14:12:59 -04:00
Alyssa Rosenzweig	deba6a1b76	JIT: add comment about unaligned backpatching save future me some grief. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-30 14:12:59 -04:00
Alyssa Rosenzweig	d25ace43aa	Merge pull request #3528 from alyssarosenzweig/ra/xsave-xrstor Eliminate crossblock liveness in xsave/xrstor	2024-03-30 14:11:25 -04:00
Ryan Houdek	8564290f76	FEXCore: Remove DebugStore map This hasn't been used and is blocking refactoring more code.	2024-03-29 14:58:44 -07:00
Alyssa Rosenzweig	c513b9685d	OpcodeDispatcher: eliminate crossblock liveness in xsave/xrstor Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-29 09:57:16 -04:00
Ryan Houdek	aa26b6288e	Merge pull request #3522 from alyssarosenzweig/ra/cmpxchg8 OpcodeDispatcher: eliminate branch in cmpxchg pair	2024-03-27 21:56:19 -07:00
Ryan Houdek	624bc3fce5	Merge pull request #3520 from Sonicadvance1/sleep_process FEXLoader: Add a way to sleep a process on startup	2024-03-27 18:35:06 -07:00
Alyssa Rosenzweig	61758ea47d	OpcodeDispatcher: eliminate branch in cmpxchg pair In the old case: * if we take the branch, 1 instruction * if we don't take the branch, 3 instructions * branch predictor fun * 3 instructions of icache pressure In the new case: * unconditionally 2 instructions * no branch predictor dependence * 2 instructions of icache pressure This should not be non-negligibly worse, and it simplifies things for RA. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-27 12:40:06 -04:00
Ryan Houdek	7b74ca1931	Merge pull request #3514 from alyssarosenzweig/opt/demon rewrite Demon Addition Adjust (DAA) and other demonic opcodes	2024-03-26 23:24:00 -07:00
Ryan Houdek	24fd28ed9e	Merge pull request #3511 from Sonicadvance1/more_tso_levers FEXCore: Adds more TSO control levers	2024-03-26 23:23:41 -07:00
Ryan Houdek	970d5d5b13	Merge pull request #3509 from Sonicadvance1/allow_telemetry_redirect Telemetry: Allow redirecting directory that data is written to	2024-03-26 23:23:05 -07:00
Ryan Houdek	7f90ca53f7	Merge pull request #3505 from Sonicadvance1/telemetry_noncanonical Telemetry: Adds tracker for non-canonical memory access crash	2024-03-26 23:21:32 -07:00
Ryan Houdek	ade0c46845	FEXLoader: Add a way to sleep a process on startup I find myself reimplementing this nearly monthly. Actually codify it so I can stop reimplementing it.	2024-03-26 07:48:09 -07:00
Ryan Houdek	6f29e75f67	FEXCore: Removes vestigial mman SMC checking This wasn't actually wired up to anything ever since some refactoring occurred two years ago.	2024-03-26 02:56:26 -07:00
Alyssa Rosenzweig	dfe0bdd7f2	OpcodeDispatcher: rewrite DAS exhaustively checked against the Intel pseudocode since this is tricky: def intel(AL, CF, AF): old_AL = AL old_CF = CF CF = False if (AL & 0x0F) > 9 or AF: Borrow = AL < 6 AL = (AL - 6) & 0xff CF = old_CF or Borrow AF = True else: AF = False if (old_AL > 0x99) or old_CF: AL = (AL - 0x60) & 0xff CF = True return (AL & 0xff, CF, AF) def fex(AL, CF, AF): AF = AF | ((AL & 0xf) > 9) CF = CF | (AL > 0x99) NewCF = CF | (AF if (AL < 6) else CF) AL = (AL - 6) if AF else AL AL = (AL - 0x60) if CF else AL return (AL & 0xff, NewCF, AF) for AL in range(256): for CF in [False, True]: for AF in [False, True]: ref = intel(AL, CF, AF) test = fex(AL, CF, AF) print(AL, "CF" if CF else "", "AF" if AF else "", ref, test) assert(ref == test) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-25 19:43:10 -04:00
Alyssa Rosenzweig	e26481e3cc	OpcodeDispatcher: simplify AAM in the area. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-25 19:43:10 -04:00
Alyssa Rosenzweig	86b5a2f352	OpcodeDispatcher: simplify AAD noticed in the area. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-25 19:43:10 -04:00
Alyssa Rosenzweig	2bf880c43a	OpcodeDispatcher: rewrite AAS Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-25 19:43:10 -04:00
Alyssa Rosenzweig	583d4f8f94	OpcodeDispatcher: factor out CalculateAFForDecimal Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-25 19:43:10 -04:00
Alyssa Rosenzweig	3ca2c4377f	OpcodeDispatcher: rewrite AAA Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-25 19:43:10 -04:00
Ryan Houdek	76983476b9	Merge pull request #3504 from Sonicadvance1/fix_loop_a16 OpcodeDispatcher: Fixes 32-bit mode LOOP RCX register usage	2024-03-25 12:18:14 -07:00
Alyssa Rosenzweig	949717a95f	OpcodeDispatcher: rewrite DAA implementation Based on https://www.righto.com/2023/01/ New implementation is branchless, which is theoretically easier to RA. It's also massively simpler which is good for a demon opcode. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-25 13:00:59 -04:00
Alyssa Rosenzweig	693d86dd67	OpcodeDispatcher: add SetAFAndFixup helper Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-25 12:59:19 -04:00
Ryan Houdek	3034edb0aa	RA: Adds RIP when a block panic spills I find myself adding this every time I find a game that panic spills. Let's just print it out.	2024-03-24 17:11:29 -07:00
Ryan Houdek	64f47d1ec2	FEXCore: Adds more TSO control levers Lets us control TSO visibility for vector load/stores and memcpy/memset. This just gives us a bit more configuration rather than TSO off or on.	2024-03-24 16:34:18 -07:00
Ryan Houdek	70befc216f	Telemetry: Allow redirecting directory that data is written to This will be necessary	2024-03-24 00:47:35 -07:00
Ryan Houdek	4952b2e16c	Telemetry: Rename old file instead of copying Since we do an immediate overwrite of the file we are copying, we can instead do a rename. Failure on rename is fine, will either mean the telemetry file didn't exist initially, or some other permission error so the telemetry will get lost regardless.	2024-03-21 22:51:20 -07:00
Ryan Houdek	5a35e119fe	Telemetry: Adds tracker for non-canonical memory access crash This may be useful for tracking TSO faulting when it manages to fetch stale data. While most TSO crashes are due to nullptr dereferences, this can still check for the corruption case.	2024-03-21 20:47:36 -07:00
Ryan Houdek	824f122680	OpcodeDispatcher: Fixes 32-bit mode LOOP RCX register usage In 64-bit mode, the LOOP instruction's RCX register usage is 64-bit or 32-bit. In 32-bit mode, the LOOP instruction's RCX register usage is 32-bit or 16-bit. FEX wasn't handling the 16-bit case at all which was causing the LOOP instruction to effectively always operate at 32-bit size. Now this is correctly supported, and it also stops treating the operation as 64-bit.	2024-03-21 20:13:15 -07:00
Ryan Houdek	8852d94416	Merge pull request #3503 from alyssarosenzweig/opt/loop OpcodeDispatcher: optimize LOOP/N/E	2024-03-21 20:05:50 -07:00
Alyssa Rosenzweig	82ba16c6ed	OpcodeDispatcher: optimize LOOP/N/E Don't clobber NZCV. Before/after assembly from the Primary_E1 unit test: < 4340: [INFO] cset w20, ne < 4340: [INFO] mrs x21, nzcv < 4340: [INFO] cmp x5, #0x0 (0) < 4340: [INFO] cset x22, ne < 4340: [INFO] and x20, x22, x20 < 4340: [INFO] msr nzcv, x21 < 4340: [INFO] cbnz x20, #+0x8 (addr 0xffff896f8084) < 4340: [INFO] b #+0x1c (addr 0xffff896f809c) < 4340: [INFO] ldr x0, pc+8 (addr 0xffff896f808c) --- > 4340: [INFO] csel x20, x5, xzr, ne > 4340: [INFO] cbnz x20, #+0x8 (addr 0xfffed7308070) > 4340: [INFO] b #+0x1c (addr 0xfffed7308088) > 4340: [INFO] ldr x0, pc+8 (addr 0xfffed7308078) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-03-21 12:08:40 -04:00
Ryan Houdek	45ea0cd782	Removes false termux support This was a funny joke that this was here, but it is fundamentally incompatible with what we're doing. All those users are running proot anyway because of how broken running under termux directly is. Just remove this from here.	2024-03-20 22:04:32 -07:00
Billy Laws	d490cb1b79	FEXCore: Fallback to the memcpy slow path for overlaps within 32 bytes Take e.g. a forward rep movsb copy from addr 0 to 1, the expected behaviour since this is a bytewise copy is: before: aaabbbb... after: aaaaaaa... but by copying in 32-byte chunks we end up with: after: aaaabbbb... due to the self overwrites not occurring within a single 32-byte copy.	2024-03-20 20:54:19 +00:00
Billy Laws	94fecb9dad	FEXCore: Remove needless alignment checks for the mem{cpy,set} fastpath	2024-03-20 20:54:09 +00:00
Ryan Houdek	7dcacfe990	Merge pull request #3478 from bylaws/memcpy FEXCore: Add non-atomic Memcpy and Memset IR fast paths	2024-03-18 18:56:44 -07:00
Billy Laws	8d4d8fe3e5	FEXCore: Add non-atomic Memcpy and Memset IR fast paths When TSO is disabled, vector LDP/STP can be used for a two instruction 32 byte memory copy which is significantly faster than the current byte-by-byte copy. Performing two such copies directly after one another also marginally increases copy speed for all sizes >=64.	2024-03-18 23:28:50 +00:00
Ryan Houdek	ab8ee64352	Merge pull request #3497 from Sonicadvance1/movmaskb_constant JIT: Optimize pmovmaskb with a named vector constant	2024-03-18 16:08:40 -07:00
Alyssa Rosenzweig	2a9fcc6a66	Merge pull request #3492 from Sonicadvance1/implement_prefetch OpcodeDispatcher: Implement support for the various prefetch instructions	2024-03-18 07:49:47 -04:00
Ryan Houdek	fd391b1b18	JIT: Optimize pmovmaskb with a named vector constant I was looking at some other JIT overheads and this cropped up as some overhead. Instead of materializing a constant using mov+movk+movk+movk, load it from the named vector constant array. In a micro-benchmark this improved performance by 34%. In bytemark this improved one subbench by 0.82%	2024-03-17 18:40:46 -07:00
Ryan Houdek	f79991a9d8	OpcodeDispatcher: Implement rdpid Missed this instruction when implementing rdtscp. Returns the same ID result in a register just like rdtscp, but without the cycle counter results. Doesn't touch any flags just like rdtscp.	2024-03-14 20:07:58 -07:00
Ryan Houdek	ca6b2e43e6	Merge pull request #3491 from alyssarosenzweig/rclse/waw RCLSE: Optimize store-after-store	2024-03-14 03:23:05 -07:00
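
The exhaustive DAS check quoted in commit dfe0bdd7f2 lost its indentation when the message was flattened into one table cell. The block nesting below is a reconstruction of that script and therefore an assumption, as is the `check_all` wrapper, which stands in for the original top-level loop and drops its `print` call.

```python
# Reconstruction of the check from commit dfe0bdd7f2. Indentation is inferred
# (assumption); check_all() is a hypothetical wrapper around the original loop.

def intel(AL, CF, AF):
    """Reference model following the Intel DAS pseudocode quoted in the commit."""
    old_AL = AL
    old_CF = CF
    CF = False
    if (AL & 0x0F) > 9 or AF:
        Borrow = AL < 6
        AL = (AL - 6) & 0xff
        CF = old_CF or Borrow
        AF = True
    else:
        AF = False
    if (old_AL > 0x99) or old_CF:
        AL = (AL - 0x60) & 0xff
        CF = True
    return (AL & 0xff, CF, AF)


def fex(AL, CF, AF):
    """Branchless formulation quoted in the commit message."""
    AF = AF | ((AL & 0xf) > 9)
    CF = CF | (AL > 0x99)
    NewCF = CF | (AF if (AL < 6) else CF)
    AL = (AL - 6) if AF else AL
    AL = (AL - 0x60) if CF else AL
    return (AL & 0xff, NewCF, AF)


def check_all():
    """Compare both models over every (AL, CF, AF) input; return the case count."""
    cases = 0
    for AL in range(256):
        for CF in [False, True]:
            for AF in [False, True]:
                assert intel(AL, CF, AF) == fex(AL, CF, AF)
                cases += 1
    return cases


if __name__ == "__main__":
    print(check_all(), "cases checked")
```

The input space is only 256 × 2 × 2 combinations, so checking every case is cheap and leaves no untested corner.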

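The overlap problem described in commit d490cb1b79 can be reproduced with a small model. This is a sketch over a Python `bytearray`, not FEX's implementation; both function names and the buffer layout are hypothetical. It compares a forward byte-by-byte copy with a copy that reads a whole 32-byte chunk before writing it, for a destination one byte past the source.

```python
# Illustrative model only (assumption): memory is a bytearray, and the
# chunked copy loads a full chunk before storing it.

def bytewise_forward_copy(mem, dst, src, n):
    """Forward copy one byte at a time, like a forward rep movsb."""
    for i in range(n):
        mem[dst + i] = mem[src + i]


def chunked_forward_copy(mem, dst, src, n, chunk=32):
    """Forward copy that loads up to `chunk` bytes, then stores them."""
    for off in range(0, n, chunk):
        size = min(chunk, n - off)
        block = bytes(mem[src + off:src + off + size])  # load the whole chunk
        mem[dst + off:dst + off + size] = block         # then store it


def demo():
    """Copy 64 bytes from address 0 to address 1 with both strategies."""
    initial = b"a" + b"b" * 64
    bytewise = bytearray(initial)
    chunked = bytearray(initial)
    bytewise_forward_copy(bytewise, dst=1, src=0, n=64)
    chunked_forward_copy(chunked, dst=1, src=0, n=64)
    return bytes(bytewise), bytes(chunked)


if __name__ == "__main__":
    expected, fast = demo()
    print(expected[:8], fast[:8])
```

In the bytewise copy each store feeds the next load, so the first byte propagates through the whole range. The chunked copy reads stale bytes within each chunk, which is why overlaps closer than the chunk size need the slow path.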

1032 Commits