Moves the FEX version string to the hypervisor CPUID leaves.
Before:
```bash
$ FEXBash 'cat /proc/cpuinfo | grep "model name"'
model name : FEX-2404-101-gf9effcb Cortex-A78C
model name : FEX-2404-101-gf9effcb Cortex-A78C
model name : FEX-2404-101-gf9effcb Cortex-A78C
model name : FEX-2404-101-gf9effcb Cortex-A78C
model name : FEX-2404-101-gf9effcb Cortex-X1C
model name : FEX-2404-101-gf9effcb Cortex-X1C
model name : FEX-2404-101-gf9effcb Cortex-X1C
model name : FEX-2404-101-gf9effcb Cortex-X1C
```
After:
```bash
$ FEXBash 'cat /proc/cpuinfo | grep "model name"'
model name : Cortex-A78C
model name : Cortex-A78C
model name : Cortex-A78C
model name : Cortex-A78C
model name : Cortex-X1C
model name : Cortex-X1C
model name : Cortex-X1C
model name : Cortex-X1C
```
Now the FEX version string lives in the hypervisor leaves, so if some
utility wants the FEX version it can query that directly.
For example:
```bash
$ ./Bin/FEXInterpreter get_cpuid_fex
Maximum 4000_0001h sub-leaf: 2
We are running under FEX on host: 2
FEX version string is: 'FEX-2404-113-g820494d'
```
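A guest-side utility can reach the same information with a plain `cpuid`. A minimal C++ sketch (guest x86 code compiled and run under FEX; leaf 4000_0000h returning the maximum hypervisor leaf in EAX and a vendor string in EBX/ECX/EDX is the standard hypervisor-range convention, while the FEX-specific data sits in the 4000_0001h sub-leaves shown above):
```cpp
// Minimal sketch: query the hypervisor CPUID range from guest x86 code.
// Leaf 4000_0000h is the conventional hypervisor ID leaf: EAX holds the
// maximum hypervisor leaf, EBX/ECX/EDX a 12-byte vendor string.
#include <cpuid.h>
#include <cstdio>
#include <cstring>

int main() {
  unsigned int eax, ebx, ecx, edx;
  __cpuid(0x40000000, eax, ebx, ecx, edx);

  char vendor[13] = {};
  memcpy(vendor + 0, &ebx, sizeof(ebx));
  memcpy(vendor + 4, &ecx, sizeof(ecx));
  memcpy(vendor + 8, &edx, sizeof(edx));
  printf("Maximum hypervisor leaf: %08xh, vendor: '%s'\n", eax, vendor);
  return 0;
}
```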
We were previously generating nonsense code if the destination != source:
```
faddp v2.4s, v4.4s, v4.4s
faddp s2, v4.2s
```
The result of the first faddp is ignored, so the second merely calculates the
sum of the first 2 sources (not all 4 as needed).
The correct fix is to feed the first add into the second, regardless of the
final destination:
```
faddp v2.4s, v4.4s, v4.4s
faddp s2, v2.2s
```
Hit in an ASM test with new RA.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
A feature of FEX's JIT is that when an unaligned atomic load/store
operation occurs, the instructions will be backpatched into a barrier
plus a non-atomic memory instruction. This is the half-barrier technique
that still ensures correct visibility of loadstores in an unaligned
context.
The problem with this approach is that the dmb instructions are HEAVY,
because they effectively stop the world until all memory operations in
flight are visible. But it is a necessary evil since unaligned atomics
aren't a thing on ARM processors. FEAT_LSE2 only gives you unaligned
atomics inside of a 16-byte granularity, which doesn't match the x86
behaviour of cacheline granularity (effectively always 64B).
This adds a new TSO option to disable the half-barrier on unaligned
atomics and instead only convert them to regular loadstore instructions,
omitting the half-barrier. This gives more insight into how good a
CPU's LRCPC implementation is by not stalling on DMB instructions when
possible.
Originally implemented as a test to see if this makes Sonic Adventure 2
run full speed with TSO enabled (but all available TSO options disabled)
on NVIDIA Orin. Unfortunately this basically means the code no longer
stalls on dmb instructions and instead just shows how bad the LRCPC
implementation is, since the stalls show up on `ldapur` instructions
instead.
Tested Sonic Adventure 2 on X13s and it ran at 60FPS there without the
hack anyway.
Instead of only enabling enhanced rep movs if software TSO is disabled,
enable it if software TSO is disabled OR memcpysettso is disabled. This
is because we now hit the fast path when memcpysettso alone is disabled
while global TSO is still enabled.
Retested Hades and performance was fine in this configuration.
Found out that Far Cry uses this instruction and it is viable to use in
CPL-3. This only returns constant data but its behaviour is a little
quirky.
This instruction has a weird behaviour where the 32-bit operation does an
insert into the 64-bit destination, which might be an Intel versus AMD
behaviour difference. I don't have an Intel machine available to test
that theory though. This assumption would match similar behaviour
where segment registers are inserted instead of zero-extended.
Gets the game farther but then it crashes in a `___ascii_strnicmp`
function where the arguments end up being `___ascii_strnicmp(nullptr, "Color", 5);`.
Functional revert of 92f31648b ("RCLSE: optimize out pointless stores"), which
reportedly regressed some titles due to RA doom. We'll revisit later, leaving in
the code for when RA is ready to light this up.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
There are quite a few places where the segment offset appending is open-coded
throughout the opcode dispatcher, but we can pull these out into a few
helpers to make the sites a little more compact and declarative.
Now that we have successfully eliminated crossblock liveness from the IR we
generate, validate as much to ensure it doesn't come back. We will take
advantage of this new invariant in RA in the future.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This ensures we put the StoreNZCV in the right block, which will fix validation
failures later in the series.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Rather than checking the actual EC bitmap in the dispatcher (~6 instrs), this
indirection through the code cache allows just 1 instr on the hot path
of repeated calls into EC code/x64 code.
The frontend will provide the return logic via ExitFunctionEC, which
will be jumped to whenever there is an indirect branch/return to an addr
such that RtlIsEcCode(addr) returns true.
Executable mapped memory is treated as x86 code by default when
running under EC; VirtualAlloc2 needs to be used together with a
special flag to map JIT'd arm64 code.
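A hedged sketch of what that allocation looks like, following the documented VirtualAlloc2 extended-parameter pattern rather than FEX's actual code:
```cpp
// Sketch: allocate memory the kernel will treat as arm64 EC code rather
// than x86 code. Assumes a recent Windows SDK on ARM64 and a library that
// exports VirtualAlloc2 (e.g. onecore.lib).
#include <windows.h>

void* AllocateECCode(size_t Size) {
  MEM_EXTENDED_PARAMETER Param{};
  Param.Type = MemExtendedParameterAttributeFlags;
  Param.ULong64 = MEM_EXTENDED_PARAMETER_EC_CODE;

  return VirtualAlloc2(GetCurrentProcess(), nullptr, Size,
                       MEM_RESERVE | MEM_COMMIT,
                       PAGE_EXECUTE_READWRITE,
                       &Param, 1);
}
```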
Generates flags for a variable shift as a dedicated IR op. This lets us optimize
around it (without generating control flow, relying on deferred flag infra,
etc.), and it neatly solves our RA problem for shifts.
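The wrinkle is that x86 leaves all flags untouched when the masked shift count is zero, which is what would otherwise force control flow or deferred flags. A rough C++ model of the 32-bit SHL case (illustrative only; not the IR op itself, and OF/AF/PF are omitted):
```cpp
#include <cstdint>

struct Flags { bool CF, ZF, SF; };

// Rough model of 32-bit SHL flag behaviour: if the masked count is zero the
// flags are left untouched, otherwise CF is the last bit shifted out and
// ZF/SF come from the result.
uint32_t ShlWithFlags(uint32_t Value, uint32_t Count, Flags& F) {
  Count &= 31;                 // x86 masks the count for 32-bit operands
  if (Count == 0) {
    return Value;              // flags preserved - the awkward case
  }
  F.CF = (Value >> (32 - Count)) & 1;
  uint32_t Result = Value << Count;
  F.ZF = Result == 0;
  F.SF = Result >> 31;
  return Result;
}
```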
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This is something the new shift flag code will do. Backporting the opt since
that's stalled and this reduces the diff.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This environment variable had an incorrect priority in the configuration
system. The expectation was that it would have higher priority than most
other layers. Now the only layer with higher priority is the environment
variables.
1. pull flag calculation out of the loop body for perf
2. fully rotate the inner loop to save an instruction per iteration
3. hoist the rcx=0 jump to avoid computing df when rcx=0
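For illustration, (2) and (3) amount to the following shape (a C++ sketch of the rewritten control flow, not the JIT's actual output; the names are made up):
```cpp
#include <cstdint>

// Stand-in for the per-element work of the string op.
static void Body(int64_t /*Stride*/) {}

// Illustrative shape of the rewritten loop.
static void RepOp(uint64_t Rcx, bool DF) {
  if (Rcx == 0) return;                // (3) hoisted rcx==0 exit: DF never computed for empty copies
  const int64_t Stride = DF ? -1 : 1;  // (1) flag/direction work pulled out of the body
  do {
    Body(Stride);
  } while (--Rcx != 0);                // (2) rotated loop: the decrement feeds the backwards branch
}
```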
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Single unified implementation for ROL & ROR (instead of 4 cases). No more
deferred flags, because it's easy to shoot ourselves in the foot with deferred
flags w.r.t. the new RA design, and rotates are rare enough, with very efficient
flag calculations, that the extra JIT overhead should be minimal to DCE the
resulting calculations later.
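The flag calculations are cheap because CF and OF fall straight out of the rotated result. A rough 32-bit C++ model (illustrative only, not the JIT code; OF is architecturally defined only for a count of 1 but is computed unconditionally here for brevity):
```cpp
#include <cstdint>

struct Flags { bool CF, OF; };

// With a non-zero masked count, CF is the bit that was rotated around
// (LSB of the result for ROL, MSB for ROR) and OF is derived from the
// result and CF.
uint32_t Rol32(uint32_t Value, uint32_t Count, Flags& F) {
  Count &= 31;
  if (Count == 0) return Value;  // flags untouched
  uint32_t Result = (Value << Count) | (Value >> (32 - Count));
  F.CF = Result & 1;
  F.OF = ((Result >> 31) & 1) ^ F.CF;
  return Result;
}

uint32_t Ror32(uint32_t Value, uint32_t Count, Flags& F) {
  Count &= 31;
  if (Count == 0) return Value;  // flags untouched
  uint32_t Result = (Value >> Count) | (Value << (32 - Count));
  F.CF = (Result >> 31) & 1;
  F.OF = ((Result >> 31) ^ (Result >> 30)) & 1;
  return Result;
}
```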
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Just like #3508, clang-18 complains about VLA usage.
This vector is relatively small, only around 18 elements, but is
semi-dynamic depending on the arch and whether FEXCore is targeting Linux or
Win32.
It has been a long time coming that FEX no longer needs to leak IR
implementation details to the frontend; this was legacy behaviour due to IR CI
and various other problems.
Now that the last bits of IR leaking have been removed, move everything
that we can internally to the implementation.
We still have a couple of minor details in the IR.h exposed to the
frontend, but these are limited to a few enums and some thunking struct
information rather than all the implementation details.
No functional change with this, just moving headers around.
FEXCore's include directory was including an FHU header, which would result in
compilation failure for external projects trying to link to libFEXCore.
Moves it over to fix this; it was the only FHU usage in FEXCore/include.
NFC
This no longer needs to be part of the public API. Moves the
header internally.
Needed to pass through `IsAddressInCodeBuffer` from CPUBackend through
the Context object, but otherwise no functional change.
In the old case:
* if we take the branch, 1 instruction
* if we don't take the branch, 3 instructions
* branch predictor fun
* 3 instructions of icache pressure
In the new case:
* unconditionally 2 instructions
* no branch predictor dependence
* 2 instructions of icache pressure
This should not be non-negligibly worse, and it simplifies things for RA.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Exhaustively checked against the Intel pseudocode since this is tricky:
```python
def intel(AL, CF, AF):
    old_AL = AL
    old_CF = CF
    CF = False
    if (AL & 0x0F) > 9 or AF:
        Borrow = AL < 6
        AL = (AL - 6) & 0xff
        CF = old_CF or Borrow
        AF = True
    else:
        AF = False
    if (old_AL > 0x99) or old_CF:
        AL = (AL - 0x60) & 0xff
        CF = True
    return (AL & 0xff, CF, AF)

def fex(AL, CF, AF):
    AF = AF | ((AL & 0xf) > 9)
    CF = CF | (AL > 0x99)
    NewCF = CF | (AF if (AL < 6) else CF)
    AL = (AL - 6) if AF else AL
    AL = (AL - 0x60) if CF else AL
    return (AL & 0xff, NewCF, AF)

for AL in range(256):
    for CF in [False, True]:
        for AF in [False, True]:
            ref = intel(AL, CF, AF)
            test = fex(AL, CF, AF)
            print(AL, "CF" if CF else "", "AF" if AF else "", ref, test)
            assert(ref == test)
```
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Based on https://www.righto.com/2023/01/
The new implementation is branchless, which is theoretically easier to RA. It's also
massively simpler, which is good for a demon opcode.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Since we do an immediate overwrite of the file we are copying, we can
instead do a rename. Failure on rename is fine; it will either mean the
telemetry file didn't exist initially, or some other permission error occurred,
so the telemetry will get lost regardless.
This may be useful for tracking TSO faulting when it manages to fetch
stale data. While most TSO crashes are due to nullptr dereferences, this
can still check for the corruption case.
In 64-bit mode, the LOOP instruction's RCX register usage is 64-bit or
32-bit.
In 32-bit mode, the LOOP instruction's RCX register usage is 32-bit or
16-bit.
FEX wasn't handling the 16-bit case at all, which caused the LOOP
instruction to effectively always operate at 32-bit size. Now this is
correctly supported, and it also stops treating the operation as 64-bit.
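For illustration, the counter handling reduces to masking and writing back at the width chosen by the address-size attribute; a rough C++ model (not FEX code):
```cpp
#include <cstdint>

// Rough model of the LOOP counter update: the address-size attribute picks
// the register width, and the decrement/compare happens at that width only.
// Widths: 8 = RCX, 4 = ECX, 2 = CX.
bool LoopTaken(uint64_t& Rcx, unsigned CounterWidthBytes) {
  const uint64_t Mask = CounterWidthBytes == 8 ? ~0ULL
                      : CounterWidthBytes == 4 ? 0xFFFFFFFFULL
                                               : 0xFFFFULL;
  const uint64_t Count = (Rcx - 1) & Mask;
  // A 32-bit counter write zero-extends into RCX; a 16-bit write leaves the
  // upper bits intact, matching normal x86 partial-register behaviour.
  Rcx = CounterWidthBytes == 2 ? (Rcx & ~Mask) | Count : Count;
  return Count != 0;  // the branch is taken while the counter is non-zero
}
```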
It was a funny joke that this was here, but it is fundamentally
incompatible with what we're doing. All those users are running proot
anyway because of how broken running directly under termux is.
Just remove this from here.
Take e.g. a forward rep movsb copy from addr 0 to 1; since this is a bytewise
copy, the expected behaviour is:
before: aaabbbb...
after:  aaaaaaa...
but by copying in 32-byte chunks we end up with:
after:  aaaabbbb...
due to the self-overwrites not occurring within a single 32-byte copy.
When TSO is disabled, vector LDP/STP can be used for a two-instruction
32-byte memory copy, which is significantly faster than the current
byte-by-byte copy. Performing two such copies directly after
one another also marginally increases copy speed for all sizes >= 64.
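The overlap hazard above is easy to reproduce with a plain C++ model of the two copy strategies (illustrative only, not FEX's emitted code):
```cpp
#include <cstdio>
#include <cstring>

int main() {
  // before: "aaabbbb..."; copy 32 bytes forward from offset 0 to offset 1.
  char bytewise[64], chunked[64];
  memset(bytewise, 'b', sizeof(bytewise));
  memcpy(bytewise, "aaa", 3);
  memcpy(chunked, bytewise, sizeof(chunked));

  // Byte-wise, matching rep movsb semantics: each write is visible to the
  // next read, so the leading 'a' smears forward.
  for (size_t i = 0; i < 32; ++i)
    bytewise[1 + i] = bytewise[i];

  // 32-byte chunk: the whole block is read before any of it is written,
  // so the self-overwrite never happens.
  char tmp[32];
  memcpy(tmp, chunked, 32);
  memcpy(chunked + 1, tmp, 32);

  printf("byte-wise: %.8s...\n", bytewise);  // aaaaaaaa...
  printf("chunked:   %.8s...\n", chunked);   // aaaabbbb...
  return 0;
}
```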
I was looking at some other JIT overheads and this cropped up.
Instead of materializing a constant using mov+movk+movk+movk,
load it from the named vector constant array.
In a micro-benchmark this improved performance by 34%.
In bytemark this improved a subbench by 0.82%.
Missed this instruction when implementing rdtscp. Returns the same ID
result in a register just like rdtscp, but without the cycle counter
results. Doesn't touch any flags just like rdtscp.
x86 has a few prefetch instructions.
- prefetch - One of two classic 3DNow! instructions
  - Prefetch into the L1 data cache
- prefetchw - One of two classic 3DNow! instructions
  - Implies a prefetch into the L1 data cache
  - Prefetch the cacheline with intent to write and exclusive ownership
- prefetchnta
  - Prefetch non-temporal data with respect to /all/ cache levels
  - Assumes inclusive caches?
- prefetch{t0,t1,t2}
  - Prefetch data with respect to each cache level
    - T0 = L1 and higher
    - T1 = L2 and higher
    - T2 = L3 and higher

**Some silly duplicates**
- prefetchwt1
  - Duplicate of prefetchw but explicitly the L1 data cache
- prefetch_exclusive
  - Duplicate of prefetch
God Of War 2018 uses prefetchw as a hint for exclusive ownership of the
cacheline in some very aggressive spin-loops. Let's implement the
operations to help it along.
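For reference, these hints are reachable from C/C++ through the compiler builtin; a hedged sketch of the rough mapping (the exact instruction the compiler picks depends on target flags and available extensions):
```cpp
// __builtin_prefetch(addr, rw, locality): rw 0 = read, 1 = write;
// locality 0 = non-temporal ... 3 = keep in all cache levels. On x86 these
// map roughly onto the instructions listed above.
void WarmSpinLoopLine(void* Line) {
  __builtin_prefetch(Line, 1, 3);  // write intent, high locality ~ prefetchw
}

void WarmReadOnly(const void* P) {
  __builtin_prefetch(P, 0, 3);     // read, keep in caches ~ prefetcht0
}

void StreamOnce(const void* P) {
  __builtin_prefetch(P, 0, 0);     // read, non-temporal ~ prefetchnta
}
```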
This function can be unit-tested more easily, and the stack special case is more
cleanly handled as a post-collection step.
There is a minor functional change: The stack special case didn't trigger
previously if the range end was within the stack mapping. This is now fixed.
Can help a lot of x86 code because x86 is 2-address and a64 is 3-address, so x86
ends up with piles of movs that end up dead after translation.
It's not a win across the board because our RA isn't aware of tied registers so
sometimes we regress moves. But it's a win on average, and the RA bits can be
improved with time.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Arguments and the conditional don't get optimized out in release builds
for the inline function call versus the define.
Was showing up as an annoying amount of time when testing.
Folds a reg+const memory address into the addressing mode
if the constant is within 16KB.
Update instcountci files.
Add test 32Bit_ASM/FEX_bugs/SubAddrBug.asm
Fixes an issue where TestHarnessRunner was managing to reserve the space
below the stack again, resulting in stack growth breaking. Would typically
only show up when using the vixl simulator under gdb for some reason.
This is likely the last bandage on this code before it gets completely
rewritten to be more readable.
ldswpal doesn't overwrite the source register and only reads the bits
required for the sized operation.
Not sure exactly why we were doing a copy here.
Removing it improves Skyrim's hottest code block, as seen in #3472.