FEX-Emu/FEX

mirror of https://github.com/FEX-Emu/FEX.git synced 2025-02-01 20:16:20 +00:00

Author	SHA1	Message	Date
Alyssa Rosenzweig	4d503d3155	RegisterAllocationPass: drop prewritable check always true. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:24:41 -04:00
Alyssa Rosenzweig	7e663b91df	IR: drop IRParser Aside from its own self-test, the parser is unused and should remain that way, since it's a maintenance burden with no real benefit. Burn it. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:16:54 -04:00
Alyssa Rosenzweig	3c3ba62c10	MemoryOps: optimize 32-bit SRA case Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig	34fe56dfb2	DeadStoreElimination: CSE block info Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig	ecf8cde5e0	DeadStoreElimination: group common logic slightly less obnoxious copypaste. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig	55284aad7e	DeadStoreElimination: don't handle partial stores SRA replaces the whole contents of the destination. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig	3afc35f7b4	DeadStoreElimination: simplify use registers internally, not synthesized offsets Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig	1058428a51	IR: document invariant on SRA This lets us simplify a lot! Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig	a2fc51fc7b	IR: specify registers, not offsets for SRA SRA is fundamentally about hardware registers, not stores into a software-defined context. So, it should take a register instead of an offset. This makes all the unaligned special cases unrepresentable (by design). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig	1848629ba5	RegisterAllocationPass: drop aliasable check always true with the new ir invariants. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig	3399577330	JIT: clean up fpr sra Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig	18bfc8afd0	JIT: clean up gpr sra handling Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig	76b023ed3e	JIT: drop unaligned and partial SRA handling This is all dead, assert as much so it stays that way. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig	b91b0e9d65	IR: infer SRA static class no need to stick it in the IR. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig	74489a4177	IR: remove dead SRA flags I don't know what these were meant for, and I don't care (-: Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:01:42 -04:00
Billy Laws	bd24364c1b	FEXCore: Switch stacks before exiting the JIT on ARM64EC This removes the need for the frontend to have any knowledge of FEX's SRA layout.	2024-05-06 15:41:34 +00:00
Billy Laws	ab516d7b79	Dispatcher: Implement ARM64EC SRA setup entrypoints While the ARM64EC ABI mostly matches FEX's SRA, the stack still needs to be switched to the emulator stack and target RIP stored into the FEX context before jumping to the dispatcher loop.	2024-05-06 15:41:34 +00:00
Billy Laws	d25ed4b0bf	Dispatcher: Block system call callbacks when compiling code These callbacks are used for code invalidation and setting the right emulated CPU features, neither of which are necessary for syscalls made from within FEX. Avoid calling them to prevent deadlocks caused by nested locks during compilation.	2024-05-06 15:41:28 +00:00
Ryan Houdek	5f0427c253	CPUID: Adds Qualcomm Oryon product name From https://github.com/llvm/llvm-project/pull/91022 Easy enough	2024-05-03 20:16:46 -07:00
Ryan Houdek	faa494c288	Merge pull request #3605 from Sonicadvance1/move_fex_versionstring_cpuid CPUID: Removes FEX version string from CPU model name	2024-05-02 11:20:49 -07:00
Ryan Houdek	6228226c08	CPUID: Fix inverted RDTSCP check This was inverted, always enabling the RDTSCP CPUID bit for Wine and thus always disabling it elsewhere.	2024-05-01 18:31:41 -07:00
Ryan Houdek	31341bb7c2	CPUID: Removes FEX version string from CPU model name Moves it to the hypervisor leaves. Before: ```bash $ FEXBash 'cat /proc/cpuinfo | grep "model name"' model name : FEX-2404-101-gf9effcb Cortex-A78C model name : FEX-2404-101-gf9effcb Cortex-A78C model name : FEX-2404-101-gf9effcb Cortex-A78C model name : FEX-2404-101-gf9effcb Cortex-A78C model name : FEX-2404-101-gf9effcb Cortex-X1C model name : FEX-2404-101-gf9effcb Cortex-X1C model name : FEX-2404-101-gf9effcb Cortex-X1C model name : FEX-2404-101-gf9effcb Cortex-X1C ``` After: ```bash $ FEXBash 'cat /proc/cpuinfo | grep "model name"' model name : Cortex-A78C model name : Cortex-A78C model name : Cortex-A78C model name : Cortex-A78C model name : Cortex-X1C model name : Cortex-X1C model name : Cortex-X1C model name : Cortex-X1C ``` Now the FEX string is in the hypervisor functions as a leaf, so if some utility wants the FEX version they can query that directly Ex: ```bash $ ./Bin/FEXInterpreter get_cpuid_fex Maximum 4000_0001h sub-leaf: 2 We are running under FEX on host: 2 FEX version string is: 'FEX-2404-113-g820494d' ```	2024-05-01 16:27:13 -07:00
Alyssa Rosenzweig	76b5ca4bcc	OpcodeDispatcher: optimize 8/16-bit adc Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-04-29 20:00:34 -04:00
Alyssa Rosenzweig	28fa88ff39	OpcodeDispatcher: fix 8/16-bit adc/sbc flags Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-04-29 20:00:34 -04:00
Ryan Houdek	1069cabad0	Merge pull request #3598 from Sonicadvance1/half_barrier_delete_hack Arm64: Adds another TSO hack to disable half-barrier TSO	2024-04-26 18:24:49 -07:00
Ryan Houdek	fe70ec7277	Merge pull request #3599 from alyssarosenzweig/jit/fix-faddv JIT: fix neon vec4 faddv	2024-04-24 18:20:11 -07:00
Alyssa Rosenzweig	4a4fa64254	JIT: fix neon vec4 faddv We were previously generating nonsense code if the destination != source: faddp v2.4s, v4.4s, v4.4s faddp s2, v4.2s The result of the first faddp is ignored, so the second merely calculates the sum of the first 2 sources (not all 4 as needed). The correct fix is to feed the first add into the second, regardless of the final destination: faddp v2.4s, v4.4s, v4.4s faddp s2, v2.2s Hit in an ASM test with new RA. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-04-24 21:13:02 -04:00
Ryan Houdek	6463054fa3	Arm64: Adds another TSO hack to disable half-barrier TSO A feature of FEX's JIT is that when an unaligned atomic load/store operation occurs, the instructions will be backpatched into a barrier plus a non-atomic memory instruction. This is the half-barrier technique that still ensures correct visibility of loadstores in an unaligned context. The problem with this approach is that the dmb instructions are HEAVY, because they effectively stop the world until all memory operations in flight are visible. But it is a necessary evil since unaligned atomics aren't a thing on ARM processors. FEAT_LSE only gives you unaligned atomics inside of a 16-byte granularity, which doesn't match x86 behaviour of cacheline size (effectively always 64B). This adds a new TSO option to disable the half-barrier on unaligned atomics and instead only convert them to a regular loadstore instruction, omitting the half-barrier. This gives more insight into how good a CPU's LRCPC implementation is by not stalling on DMB instructions when possible. Originally implemented as a test to see if this makes Sonic Adventure 2 run full speed with TSO enabled (but all available TSO options disabled) on NVIDIA Orin. Unfortunately this basically makes the code no longer stall on dmb instructions and instead just shows how bad the LRCPC implementation is, since the stalls show up on `ldapur` instructions instead. Tested Sonic Adventure 2 on X13s and it ran at 60FPS there without the hack anyway.	2024-04-24 13:09:00 -07:00
Ryan Houdek	a0bf6a4255	Merge pull request #3595 from alyssarosenzweig/ir/before Factor out SetWriteCursorBefore	2024-04-23 13:34:05 -07:00
Ryan Houdek	308488c419	Allocator: Fixes compiling on Fedora 40 This header was missing. Either libstdc++14 or clang-18 changed includes and we were only getting this indirectly before.	2024-04-23 12:12:33 -07:00
Alyssa Rosenzweig	2372c9458b	ConstProp: use SetWriteCursorBefore Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-04-23 13:09:45 -04:00
Alyssa Rosenzweig	1a11343f34	IREmitter: add SetWriteCursorBefore helper This is subtle, add an ergonomic helper for it. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-04-23 13:09:45 -04:00
Ryan Houdek	84d5b3ee59	CPUID: Enable enhanced rep movs in more situations Instead of only enabling enhanced rep movs if software TSO is disabled, enable it if software TSO is disabled OR memcpysettso is disabled. This is because now we hit the fast path when memcpysettso is disabled alone but global TSO is disabled. Retested Hades and performance was fine in this configuration.	2024-04-21 18:50:17 -07:00
Ryan Houdek	376936c808	Merge pull request #3591 from alyssarosenzweig/ra/fix JIT: fix ShiftFlags shuffles	2024-04-19 13:50:45 -07:00
Alyssa Rosenzweig	932b8f38f4	JIT: fix ShiftFlags shuffles messed up my RA. fixes ShiftPF.asm with jit_1 with a pathological register allocation Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-04-18 14:11:03 -04:00
Ryan Houdek	c8704a7f71	OpcodeDispatcher: Implement support for SMSW Found out that Far Cry uses this instruction and it is viable to use in CPL-3. This only returns constant data but its behaviour is a little quirky. The instruction has the weird behaviour that the 32-bit operation does an insert into the 64-bit destination, which might be an Intel versus AMD difference. I don't have an Intel machine available to test whether that theory is true, though. This assumption would match similar behaviour where segment registers are inserted instead of zext. Gets the game farther but then it crashes in a `___ascii_strnicmp` function where the arguments end up being `___ascii_strnicmp(nullptr, "Color", 5);`.	2024-04-18 07:41:39 -07:00
Alyssa Rosenzweig	352dcdb478	RCLSE: disable store-after-store optimization Functional revert of 92f31648b ("RCLSE: optimize out pointless stores"), which reportedly regressed some titles due to RA doom. We'll revisit later, leaving in the code for when RA is ready to light this up. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-04-17 14:53:11 -04:00
Paulo Matos	905aa935f5	Reformat until fixed-point Followup to 2b4ec88daebd35fefb5bf5c73d7fc2b4155771ed. Some files needed a couple of calls to clang-format 16.0.6 to reach a fixed point.	2024-04-15 09:40:00 +02:00
Ryan Houdek	7614ac9f14	Merge pull request #3573 from pmatos/RemoveClangTidy Remove trace of clang-tidy experiment from CMakeLists.txt	2024-04-12 17:13:53 -07:00
Paulo Matos	2b4ec88dae	Whole-tree reformat This follows discussions from #3413. Followup commits add clang-format file, script and blame ignore lists.	2024-04-12 16:26:02 +02:00
Paulo Matos	6524716404	Move const to the left in preparation for reformatting clang-format-16 had some issues with const placement, so we are manually changing these.	2024-04-12 16:06:57 +02:00
Paulo Matos	20559853ee	Remove trace of clang-tidy experiment from CMakeLists.txt	2024-04-12 12:31:04 +02:00
Ryan Houdek	a9b7ad841c	Merge pull request #3570 from bylaws/ec_pt8 Enable jemalloc for ARM64EC	2024-04-11 12:57:36 -07:00
Ryan Houdek	271700e9f6	Merge pull request #3568 from lioncash/const X87: Simplify constant loading for FLD family	2024-04-11 00:32:26 -07:00
Ryan Houdek	1ba678f631	Merge pull request #3562 from Sonicadvance1/fix_rsp_store_tso OpcodeDispatcher: Fixes disabling TSO access on RSP SIB stores	2024-04-11 00:32:14 -07:00
Ryan Houdek	a0f2cae1cb	Merge pull request #3567 from lioncash/veczero IR: Remove VectorZero	2024-04-09 20:04:26 -07:00
Ryan Houdek	66cbb66732	Merge pull request #3566 from lioncash/address OpcodeDispatcher: Add helper for making segment offset addresses	2024-04-09 17:31:43 -07:00
Billy Laws	f1f0c47f16	AllocatorHooks: Allow using jemalloc on win32	2024-04-09 23:42:23 +00:00
Lioncache	4cb2432b5c	OpcodeDispatcher: Make use of new x87 constants Now we can load these directly instead of needing to manually materialize them.	2024-04-09 10:17:15 -04:00
Lioncache	27ba66a181	IRDumper: Extend printer for NamedVectorConstant Makes it aware of the x87 constants.	2024-04-09 10:13:35 -04:00
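
The `JIT: fix neon vec4 faddv` entry (4a4fa64254) in the list above describes a two-step pairwise reduction in which the second step read the wrong register. The sketch below is a minimal Python model of that description, not FEX code: `faddp_vec4`, `faddp_scalar`, `faddv_buggy`, and `faddv_fixed` are hypothetical helper names, and the lane layout assumes the usual AArch64 `FADDP` semantics (pairwise sums of the first source in the low lanes, then of the second source).

```python
def faddp_vec4(vn, vm):
    """Model of `faddp vd.4s, vn.4s, vm.4s`: pairwise sums of vn, then of vm."""
    return [vn[0] + vn[1], vn[2] + vn[3], vm[0] + vm[1], vm[2] + vm[3]]


def faddp_scalar(vn):
    """Model of `faddp sd, vn.2s`: sum of the two lowest lanes of vn."""
    return vn[0] + vn[1]


def faddv_buggy(src):
    # faddp v2.4s, v4.4s, v4.4s ; faddp s2, v4.2s
    # The first result is computed but never read, so only lanes 0 and 1 are summed.
    _unused = faddp_vec4(src, src)
    return faddp_scalar(src)


def faddv_fixed(src):
    # faddp v2.4s, v4.4s, v4.4s ; faddp s2, v2.2s
    # The second step consumes the first step's result, covering all four lanes.
    tmp = faddp_vec4(src, src)
    return faddp_scalar(tmp)


v4 = [1.0, 2.0, 4.0, 8.0]
print(faddv_buggy(v4))  # 3.0  (only 1.0 + 2.0)
print(faddv_fixed(v4))  # 15.0 (1.0 + 2.0 + 4.0 + 8.0)
```

Powers of two are used as inputs so that each partial sum is distinct and it is easy to see which lanes contributed to the result.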

1357 Commits