Allows us to consume an array of strings and convert it to a mask of
enum values. This is a quality-of-life change that lets us specify
a mask of options.
The first configuration option added to support this controls the
vixl disassembler. Now by default the vixl disassembler doesn't
disassemble any blocks and needs to be enabled individually.
e.g.:
```
FEXLoader --disassemble=blocks <args>
FEXLoader --disassemble=dispatcher <args>
FEXLoader --disassemble=blocks,dispatcher <args>
```
It also has the convenience option of just passing in numbers:
```
FEXLoader --disassemble=2 <args>
FEXLoader --disassemble=1 <args>
FEXLoader --disassemble=3 <args>
```
Also of course all of this works through environment variables.
```
FEX_DISASSEMBLE=blocks FEXInterpreter <args>
FEX_DISASSEMBLE=dispatcher FEXInterpreter <args>
FEX_DISASSEMBLE=blocks,dispatcher FEXInterpreter <args>
```
While only used fairly sparingly now, additional configuration options
are likely to use this in the future, since we already have some configs
that are effectively enums but are handled through string comparisons.
This was asked for by a developer, so I figured I would throw it
together quick.
Ensures that we handle the AVX2 VSIB byte in a decent way.
As is, we can't precompute the [index * scale] portion
of the entire address operand, since the scale needs to act
on every element of the index vector after sign extension.
What we can do, though, is compute the base address and add
the displacement to it ahead of time.
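Roughly, the split between the scalar and per-element parts looks like this (an illustrative helper, not FEX's IR):
```cpp
#include <cstdint>

// The base + displacement part is a plain scalar that can be computed ahead
// of time; index * scale has to be applied per element after sign extension.
void ComputeGatherAddresses(uint64_t Base, int32_t Displacement,
                            const int32_t Index[4], uint8_t Scale,
                            uint64_t OutAddr[4]) {
  const uint64_t BaseAddr = Base + static_cast<int64_t>(Displacement);
  for (int i = 0; i < 4; ++i) {
    OutAddr[i] = BaseAddr + static_cast<int64_t>(Index[i]) * Scale;
  }
}
```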
All of these IR operations were fairly inefficient in their
address calculation. They are all known to use power-of-2 stride
indexing, so each can be converted from three instructions to
one.
These are always used for x87 stack accesses, so each one gets an
improvement.
Before:
```asm
0x0000ffff6a800248 d2800200 mov x0, #0x10
0x0000ffff6a80024c 9b007e80 mul x0, x20, x0
0x0000ffff6a800250 8b000380 add x0, x28, x0
0x0000ffff6a800254 fd417805 ldr d5, [x0, #752]
```
After:
```asm
0x0000ffff91e80240 8b141380 add x0, x28, x20, lsl #4
0x0000ffff91e80244 fd417805 ldr d5, [x0, #752]
```
Currently we're clearing the icache including the data that lives on the
tail of the block. Instead, only clear the code that was emitted and
not the tail data.
Additionally, only disassemble the code rather than all the tail data as
well, since that gets unwieldy to view.
If we are loading exactly the flags we need from the RFLAGS (ensuring we
don't load the reserved flag in bit 1) then we don't need to do a mask
on the result.
Additionally there is some bad code-motion around selects that was
causing SBFE operations to occur on constants. Ensure that we const-prop
any SBFE operations to clean this up.
This PR along with #2783 takes the FMOV blow-up from 41 instructions
to 31 instructions.
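The fold itself is trivial; a standalone sketch of evaluating an SBFE on a known constant (not FEX's actual const-prop helper), assuming a width of at least 1:
```cpp
#include <cstdint>

// Sign-extracting bitfield extract of Width bits starting at Lsb: shift the
// field to the top of the register, then arithmetic-shift it back down.
int64_t FoldSbfe(uint64_t Value, unsigned Width, unsigned Lsb) {
  const unsigned Shift = 64 - Width;
  return static_cast<int64_t>((Value >> Lsb) << Shift) >> Shift;
}
```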
Noticed this when inspecting some code that was moving the constant
`0x80808080` into a register. It was using two move instructions when it
could have used a single bitmask move.
This now checks whether a constant can be encoded as a 32-bit logical
bitmask move and uses that when possible.
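For background, a 32-bit constant is encodable as a logical bitmask immediate when it is a replication of a rotated run of ones; FEX presumably just asks its emitter (vixl) whether a given value qualifies, but a standalone sketch of the property looks roughly like this:
```cpp
#include <cstdint>
#include <cstdio>

// Simplified check: all-zero/all-one values are unencodable; otherwise find
// the smallest repeating element and see whether some rotation of it is a
// contiguous run of ones (i.e. 2^k - 1).
bool IsLogicalImm32(uint32_t Value) {
  if (Value == 0 || Value == ~0u) return false;

  uint32_t Size = 32;
  while (Size > 2) {
    const uint32_t Half = Size / 2;
    const uint32_t HalfMask = (1u << Half) - 1;
    if ((Value & HalfMask) != ((Value >> Half) & HalfMask)) break;
    Size = Half;
  }

  const uint32_t Mask = (Size == 32) ? ~0u : ((1u << Size) - 1);
  const uint32_t Elem = Value & Mask;
  for (uint32_t Rot = 0; Rot < Size; ++Rot) {
    const uint32_t Rotated = Rot ? (((Elem >> Rot) | (Elem << (Size - Rot))) & Mask) : Elem;
    if ((Rotated & (Rotated + 1)) == 0) return true; // 2^k - 1 pattern
  }
  return false;
}

int main() {
  std::printf("0x80808080 encodable: %d\n", IsLogicalImm32(0x80808080u)); // prints 1
}
```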
This instruction doesn't match ARM semantics very well since it returns
the position of the minimum element.
But at the very least the insert into the final result can be a
bit more optimal: this converts a 5-instruction eor+mov+mov+mov+mov
sequence into 2 instructions, mov+mov.
This works because `VUMinV` already zero extends the vector so the
position only needs to be inserted at the end.
32-bit and 64-bit SH{L,R}D match the behaviour of EXTR. Optimize to use
this op in that case.
This converts the lsl+lsr+orr sequence into a single extr instruction.
16-bit still goes down the old path.
Weirdly this code manages to have a bad insert for no reason? But that's
unrelated, since it happens with the old code as well.
```
%4(GPRFixed3) i64 = LoadRegister #0x0, #0x20, GPR, GPRFixed, u8:Tmp:Size
%5(GPR0) i64 = LoadRegister #0x0, #0x8, GPR, GPRFixed, u8:Tmp:Size
%6(GPRFixed0) i64 = Extr %5(GPR0) i64, %4(GPRFixed3) i64, #0x3e
```
Not sure why the SRA fails on that second LoadRegister.
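For reference, the reason the 64-bit form maps directly onto EXTR: SHRD shifts the concatenation src:dst right by the count, which is exactly what EXTR extracts from a register pair. A plain sketch of the semantics (not FEX's emitter code):
```cpp
#include <cstdint>

// SHRD dst, src, count (64-bit): result is the low 64 bits of (src:dst) >> count.
// ARM64 EXTR Rd, Rn, Rm, #count computes the same thing with Rn = src (high
// half) and Rm = dst (low half), so the lsl+lsr+orr sequence collapses to it.
uint64_t Shrd64(uint64_t Dst, uint64_t Src, unsigned Count) {
  if (Count == 0) return Dst;          // hardware masks the count to 0-63
  return (Dst >> Count) | (Src << (64 - Count));
}
```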
There was one holdout variable that was in a TLS object in FEXCore. Move
it to the frontend with the rest of the TLS variables.
Allows the "Frontend" TLS management to be the only TLS
management.
This pass is currently doing nothing in main.
Ever since we enforced that LoadContext/StoreContext doesn't touch
GPRs and FPRs, this pass has only been eliminating flags.
Remove that usage of LoadContext/StoreContext and replace it with
LoadRegister/StoreRegister for tracking GPR and FPR accesses.
Stripped from #2700 since this is safe to merge.
Noticed while looking at #2700.
Testing doesn't currently see this as a bug but will once #2700 starts
optimizing StoreRegister+LoadRegister pairs.
This doesn't fix the issues in that PR, but it is one of them.
While we're in the area implementing the Scalar + Vector variants,
we may as well cross off the Vector + Immediate variants and
complete all of the load variants for the regular LD1{*} loads.
This is less noisy with no loss of clarity, and follows the notation
used by both LLVM IR and NIR. (So, it should be familiar.)
Change done with:
sed -i -e 's/%ssa/%/g' $(git grep -l '%ssa')
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This does duplicate the _Constant(1) but it doesn't matter because it
gets inlined into the eor anyway. There is no functional change here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
We store garbage in the upper bits. That's ok, but it means we need to
mask on read for correct behaviour.
Closes #2767
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
We can fold the Not into the And. This requires flipping the arguments
to Andn, but we do not flip the order of the assignments since that
requires an extra register in a test I'm looking at.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
WIN32 already has a define called `GetObject`, which causes our
symbol to have an A appended to it and breaks linking.
Just rename it to `GetTelemetryValue`.
Noticed during introspection that we were generating zero constants
redundantly. Bunch of single cycle hits or zero-register renames.
Every time a `SetRFLAG` helper was called, it was /always/ doing a BFE
on everything passed in to extract the lowest bit. In nearly all cases
the data getting passed in is already only the lowest bit.
Instead, stop the helper from doing this BFE, and ensure the
OpcodeDispatcher does the BFE in the couple of cases where it is still needed.
As I was skimming through all these to ensure BFE isn't necessary, I did
notice that some of the BCD instructions are wrong or questionable. So I
left a comment on those so we can come back to it.
These address calculations were failing to understand that they can be
optimized. When TSO emulation is disabled these were fine, but with TSO
we were eating one more instruction.
Before:
```
add x20, x12, #0x4 (4)
dmb ish
ldr s16, [x20]
dmb ish
```
After:
```
dmb ish
ldr s16, [x12, #4]
dmb ish
```
Also left a note that once LRCPC3 is supported in hardware that we can do a similar optimization there.
When this instruction returns the index into the ecx register, this is
defined as a 32-bit result. This means it actually gets zero-extended to
the full 64-bit GPR size on 64-bit processes.
Previously FEX was doing a 32-bit insert, which leaves garbage data in
the upper 32 bits of the RCX register.
Adds a unit test to ensure the result is zero extended.
Fixes running Java games under FEX now that SSE4.2 is exposed.
ARM64 BFI doesn't allow you to encode two source registers here to match
our SSA semantics. Also since we don't support RA constraints to ensure
that these match, just do the optimal case in the backend.
Leave a comment for future RA constraint excavators to make this more
optimal.
When a fork occurs FEX needs to be incredibly careful as any thread
(that isn't forking) that holds a lock will vanish when the fork occurs.
At this point if the newly forked process tries to use these mutexes
then the process hangs indefinitely.
The three major mutexes that need to be held during a fork:
- Code Invalidation mutex
- This is the highest priority and causes us to hang frequently.
- This is highly likely to occur when one thread is loading shared
libraries and another thread is forking.
- Happens frequently with Wine and steam.
- VMA tracking mutex
- This one happens when one thread is allocating memory while a fork
occurs.
- This closely relates to the code invalidation mutex, just happens at
the syscall layer instead of the FEXCore layer.
- Happens as frequently as the code invalidation mutex.
- Allocation mutex
- This mutex is used for FEX's 64-bit Allocator; this happens when FEX
is allocating memory on one thread and a fork occurs.
- Fairly infrequent because jemalloc doesn't allocate VMA regions that
often.
While this likely doesn't hit all of the FEX mutexes, this hits the ones
that are burning fires and are happening frequently.
- FEXCore: Adds forkable mutex/locks
Necessary since we have a few locations in FEX that need to be locked
before and after a fork.
When a fork occurs, the locks must be taken prior to the fork.
Afterwards they either need to be unlocked or be reset to the default
initialization state (a minimal sketch of this pattern follows the list below).
- Parent
- Does an unlock
- Child
- Sets the lock to default initialization state
- This is because pthreads does TID-based ownership checking on
unique locks and refcount-based waiting for shared locks.
- There is no way to "unlock" after the fork in this case other than default
initializing.
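A minimal sketch of that pattern (hypothetical class, not FEX's actual implementation, which also covers shared locks):
```cpp
#include <pthread.h>

// Lock before fork(), unlock in the parent, and re-initialize in the child:
// pthreads tracks lock ownership by TID, and the owning threads no longer
// exist on the child side, so an unlock there is not possible.
class ForkableMutex {
public:
  void Lock() { pthread_mutex_lock(&Mutex); }
  void Unlock() { pthread_mutex_unlock(&Mutex); }

  void LockBeforeFork() { pthread_mutex_lock(&Mutex); }
  void UnlockAfterForkParent() { pthread_mutex_unlock(&Mutex); }
  void ResetAfterForkChild() {
    // The previous owner vanished with the fork; reset to the
    // default-initialized state instead of attempting an unlock.
    pthread_mutex_init(&Mutex, nullptr);
  }

private:
  pthread_mutex_t Mutex = PTHREAD_MUTEX_INITIALIZER;
};
```
These hooks would be invoked around the fork itself, for example from the fork path of the syscall handler or via pthread_atfork.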
This has been around since the initial commit. It was a bad idea that was
never thought through: something about remapping guest virtual and host
virtual memory, which will never be a thing.
Currently RegisterClassType and FenceType are passed into logs, which
fmt 10.0.0 is more strict about. Adds the formatters that were missing
so that compilation can succeed without needing to change all log sites.
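A sketch of the kind of formatter that keeps existing log sites compiling, with a hypothetical enum standing in for RegisterClassType/FenceType:
```cpp
#include <fmt/format.h>
#include <cstdint>

enum class FenceKind : uint32_t { Load = 0, Store = 1, LoadStore = 2 };

// fmt 10 no longer formats such types implicitly, so a formatter (or a
// format_as overload) is needed; forwarding to the integer formatter keeps
// the existing "{}" log sites working unchanged.
template <>
struct fmt::formatter<FenceKind> : fmt::formatter<uint32_t> {
  template <typename FormatContext>
  auto format(FenceKind Kind, FormatContext& ctx) const {
    return fmt::formatter<uint32_t>::format(static_cast<uint32_t>(Kind), ctx);
  }
};

int main() {
  fmt::print("Fence type: {}\n", FenceKind::Store); // prints "Fence type: 1"
}
```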
We can handle this in the dispatcher itself, so that we don't need to pass along
the register size as a member of the opcode. This gets rid of some unnecessary duplication
of functionality in the backends and makes it so potential backends don't need to deal
with this.
Previously, the bits that we support in the MXCSR weren't being saved,
which means that some opcode patterns may fail to restore the rounding mode
properly.
e.g. FXSAVE, followed by FNINIT, followed by FXRSTOR wouldn't restore the
rounding mode properly
This fixes that.
FEX's current implementation of RIP reconstruction is limited to the
entrypoint that a single block has. This will cause the RIP to be
incorrect past the first instruction in that block.
While this is fine for a decent number of games, especially since fault
handling isn't super common, it doesn't work for all situations.
When testing Ultimate Chicken Horse, we found out that changing the
block size to 1 worked around an early crash in the game's startup.
This game is likely relying on Mono/Unity's AOT compilation step, which
does some more robust faulting than the runtime JIT and needs the RIP to
be correct, since it does some sort of checking of where the code came
from.
This fixes Ultimate Chicken Horse specifically, but will likely fix
other games that are built the same way.
When executing a 32-bit application we were failing to allocate a single
GPR pair. This meant we only had 7 pairs when we could have had 8.
This was because r30 was ending up in the middle of the allocation
arrays so we couldn't safely create a sequential pair of registers.
Organize the register allocation arrays to be unique for each bitness
being executed and then access them through spans instead.
Also works around a bug where the RA validation doesn't understand when pair
indexes don't correlate directly to GPR indexes. So while the previous
PR fixed the RA pass, it didn't fix the RA validation pass.
Noticed this when pr57018 32-bit gcc test was run with the #2700 PR
which improved the RA allocation a bit.
When FEX was updated to reclaim 64-bit registers in #2494, I had
mistakenly messed up pair register class conflicts.
The problem is that FEX has r30 stuck in the middle of the RA which
causes the paired registers to need to offset their index half way.
This meant the conflict index was incorrect, so 32-bit applications have
been broken ever since that PR.
Keep the intersection indexes in their own array so they can be correctly
indexed at runtime.
Thanks to @asahilina for finding out that Osmos started crashing a few
months ago; I finally just got around to bisecting what the problem
was.
This now stops Osmos from crashing, although the motes are still
invisible in the 32-bit application. Not sure what other havoc this has
been causing.
So, uh, this was a little silly to track down. Having the upper limit
as unsigned was a mistake, since this would cause negative valid lengths to
convert into an unsigned value within the first two flag comparison cases.
A -1 valid length can occur if one of the strings starts with a null character
in the vector's first element (it will be zero and we then subtract one to
make the length zero-based).
Fixes this edge-case up and expands a test to check for this in the future.
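A reduced illustration of the signed/unsigned pitfall (hypothetical variable names):
```cpp
#include <cstdio>

int main() {
  // A zero-based valid length can legitimately be -1 when the string starts
  // with a null character in the vector's first element.
  int ValidLength = -1;
  unsigned UpperLimit = 16; // declaring this unsigned is the mistake

  // The usual arithmetic conversions turn -1 into 0xFFFFFFFF for the
  // comparison, so the "length is in range" check takes the wrong branch.
  if (ValidLength < UpperLimit) {
    std::puts("expected path");
  } else {
    std::puts("bug: -1 compared as a huge unsigned value");
  }
}
```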
Allows us to generate a header at compile time for OS specific features.
Should fix compiling on Android since they have a different function
declaration for `malloc_usable_size` compared to Linux.
We spent a bit of effort removing 8 bits from this header to get it down
to three bytes. This ended up in PRs #2319 and #2320.
There was no explicit need to go down to three bytes; the other two
arguments we were removing were just better served as lookups instead
of adding IR overhead for each operation.
This introduced alignment issues that were brought up in #2472.
Apparently the Android NDK's clang will pad nested structs like this,
maybe to match alignment? Regardless, we should just make it 32-bit.
This fixes Android execution of FEXCore.
This fixes #2472
Pros:
- Initialization now turns into a single str because it's 32-bit
- We have 8 more bits of space that we can abuse in the IR op now
- If we need more, then 64-bit and 128-bit are easy bumps in the
future
Cons:
- Each IR operation takes at minimum 25% more space in the intrusive
allocators
- Not really that big of a deal since we are talking 3 bytes versus
4.
FEXCore has no need to understand how to load these layers, which
requires JSON parsing.
Move these to the frontend which is already doing the configuration
layer setup and initialization tasks anyway.
This means FEXCore itself no longer needs to link against tiny-json, which
can be left to the frontend.
Regular LoadStoreTSO operations have gained support for LRCPC and LRCPC2
which changes the semantics of the operation by letting it support
immediate offsets.
The paranoid version of these operations didn't support the immediate
offsets yet which was causing incorrect memory loadstores.
Bring over the new semantics from the regular LoadStoreTSO but without
any nop padding.
`eor <reg>, <reg>, #-1` can't be encoded as an instruction. Instead use
mvn which does the same thing.
Removes a single instruction from each OF calculation for ADC and ADD.
Also, there's no reason to use a switch statement for the source size; just
use _Bfe and calculate the offset based on operation size.
SBB got caught in the crossfire to ensure it also isn't using a switch
statement.
This is part of FEXCore since it pulls in InternalThreadData, but is
related to the FHU signal mutex class.
Necessary to allow deferring signals in C++ code rather than right in
the JIT.
When a signal handler is not installed and the signal is a terminal failure,
make sure to save telemetry before faulting.
We know when an application is going down in this case so we can make
sure to have the telemetry data saved.
Adds a telemetry signal mask data point as well to know which signal
took it down.
These two extensions rely on AVX being supported in order to be used,
primarily because they are VEX encoded.
GTA5 is using these flags to determine if it should enable its AVX
support.
Some code in FEX's Arm64 emitter was making the assumption that once
SpillStaticRegs was called, it was safe to still use the SRA
register state.
This wasn't actually true, since FEX was using one SRA register to
optimize FPR stores, assuming the SRA registers were safe to use
because they had just been saved and were no longer necessary.
Correct this assumption hell by forcing users of the function to provide
the temporary register directly. In all cases the users have a temporary
available that they can use.
Probably fixes some very weird edge case bugs.
This returns the `XFEATURE_ENABLED_MASK` register which reports what
features are enabled on the CPU.
This behaves similarly to CPUID where it uses an index register in ecx.
This is a prerequisite to enabling XSAVE/XRSTOR and AVX since
applications will expect this to exist.
xsetbv is a privileged instruction and doesn't need to be implemented.
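For context, this is roughly the guest-side check applications perform before enabling AVX paths; a sketch using the standard `_xgetbv` intrinsic (guest x86 code, not FEX's implementation):
```cpp
#include <immintrin.h>
#include <cstdio>

int main() {
  // Index 0 (passed in ecx) selects XCR0 / XFEATURE_ENABLED_MASK.
  // Requires compiling with -mxsave for an x86 target; illustrative only.
  const unsigned long long xcr0 = _xgetbv(0);

  // Bit 1 = SSE (XMM) state saved by the OS, bit 2 = AVX (YMM) state saved.
  const bool avx_usable = (xcr0 & 0x6) == 0x6;
  std::printf("XCR0=0x%llx AVX state enabled: %d\n", xcr0, avx_usable);
}
```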
I forgot that x11 was part of the custom ABI of thunks. #2672 had broken
thunks on ARM64. I thought I had tested a game with them enabled but
apparently I tested the wrong game.
Not a full revert since we can still ldr with a literal, but we also
still need to adr x11 and nop pad. At least removes the data dependency
on x11 from the ldr.
Currently WINE's longjump doesn't work, so instead set a flag so that if
HLT is attempted, we just exit the JIT.
This will get our unit tests executing at least.
InferFromOS doesn't work under WINE.
InferFromIDRegisters doesn't work under Windows but it will under Wine.
Since we don't support Windows, just use InferFromIDRegisters.
No need to use adr for getting the PC-relative literal; we can use LDR
(literal) to load the PC-relative address directly.
Reduces trampoline instructions from 3 to 2, and also reduces trampoline
size from 24 bytes to 16 bytes.
Wine syscalls need to end the code block at the point of the syscall.
This is because syscalls may update RIP which means the JIT loop needs
to immediately restart.
Additionally since they can update CPU state, make wine syscalls not
return a result and instead refill the register state from the CPU
state. This will mean the syscall handler will need to update its
result register (RAX?) before returning.
Disabling SRA has been broken for quite a while. Disabling it was
instrumental in figuring out the VC redistributable crash.
Ensure it works by reintroducing non-SRA load/store register handlers,
and by supporting runtime selectable dispatch pointers for the JIT.
Side-bonus, moves the {LOAD,STORE}MEMTSO ops over to this dispatch as
well to make it consistent and probably slightly quicker.
From https://github.com/AsahiLinux/linux/commits/bits/220-tso
This fails gracefully in the case the upstream kernel doesn't support
this feature, so can go in early.
This feature allows FEX to use hardware's TSO emulation capability to
reduce emulation overhead from our atomic/lrcpc implementation.
In the case that the TSO emulation feature is enabled in FEX, we will
check if the hardware supports this feature and then enable it.
If the hardware feature is supported it will then use regular memory
accesses with the expectation that these are x86-TSO in strength.
The only hardware that anyone cares about that supports this is Apple's
M-class SoCs. Theoretically NVIDIA Denver/Carmel supports sequentially
consistent execution, which isn't quite the same thing, and I haven't
cared to check whether its multithreaded SC gives guarantees as strong.
But since Carmel/Denver hardware is fairly rare, it's hard to care about
for our use case.
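A sketch of the probe-and-enable flow; the `PR_GET_MEM_MODEL`/`PR_SET_MEM_MODEL`/`PR_SET_MEM_MODEL_TSO` names are taken from the linked branch's uapi headers and are an assumption here, since they are not in mainline:
```cpp
#include <sys/prctl.h>

// Only compiles against the patched kernel headers from the linked branch;
// the constants used below are assumptions based on that branch, not mainline uapi.
bool TryEnableHardwareTSO() {
  // An unsupported kernel rejects the prctl entirely, which is the graceful
  // failure path: fall back to the atomic/LRCPC based emulation.
  if (prctl(PR_GET_MEM_MODEL, 0, 0, 0, 0) == -1) {
    return false;
  }
  return prctl(PR_SET_MEM_MODEL, PR_SET_MEM_MODEL_TSO, 0, 0, 0) == 0;
}
```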
This can be done in an OS-agnostic fashion. FEXCore knows the details of
its JIT, so this should be done in FEXCore itself.
The frontend is only necessary to inform FEXCore where the fault occurred
and provide the array of GPRs for accessing and modifying the signal
state.
This is necessary for supporting both Linux and Wine signal contexts
with their unaligned access handlers.
We don't have a sane way to query cpu index under wine. We could
technically still use the syscall since we know that we are still
executing under Linux, but that seems a bit terrible.
Disable for now until something can be worked out. Not like it is used
heavily anyway.
This will be used with the TestHarnessRunner in the future to map
specific memory regions.
This is only used as a hint rather than an exact placement that fails when
it can't be mapped. This also hits the fun quirk of 64k allocation
granularity which developers need to be careful about.
Related to #2659 but not necessary directly.
Currently x30(LR) is unused in our RA. In all locations that call out to
code, we are already preserving LR and bringing it back after the fact.
This was just a missed opportunity since we aren't doing any call-ret
stack manipulations that would facilitate LR needing to stick around.
Since x18 is a reserved platform register on win32, we can replace its
usage with r19, and then replace r19 usage with x30 and everything just
works happily. Now x18 is the unused register instead of x30 and we can
come back in the future to gain one more register for RA on Linux
platforms.
All code paths to this are already guaranteed to own the lock.
The rest of the codepaths haven't been vetted to actually need a
recursive_mutex yet, but it seems likely that it can be
converted to a regular mutex with some more work.
All variants of the PCMPXSTRX instructions will take their arguments in
the same manner, so we don't need to specify them for each handler.
We can also rename the function to PCMPXSTRXOpImpl, since this will
be extended to handle the masking variants of the string instructions.
This is a very OS specific operation and it living in FEXCore doesn't
make much sense. This still requires some strong collaboration between
FEXCore and the frontend but it is now split between the locations.
There's still a bit more cleanup work that can be done after this is
merged, but we need to get this burning fire out of the way.
This is necessary for llvm-mingw and requires all previous PRs to be
merged first.
After this is merged, most of the llvm-mingw work is complete, just some
minor cleanups.
To be merged first:
- #2602
- #2604
- #2605
- #2607
- #2610
- #2615
- #2619
- #2621
- #2622
- #2624
- #2625
- #2626
- #2627
- #2628
- #2629
We can reuse the same helper we have for handling VMASKMOVPD and VMASKMOVPS,
though we need to move some handling around to account for the fact that
VPMASKMOVD and VPMASKMOVQ 'hijack' the REX.W bit to signify the element
size of the operation.
This was only used for the unit test fuzzing framework, which has been
removed and was unused for pretty much its entire lifespan.
These can now be internal only.
Adds in the handling of destination type size differences with AVX.
Also fixes cases where the SSE operations would load 128-bit vectors
from memory, rather than only loading 64-bit vectors with VCVTPS2PD.
In order to implement the SSE4.2 string instructions in a reasonable
manner, we can make use of a fallback implementation for the time
being.
This implementation just returns the intermediate result and leaves it
up to the function making use of it to derive the final result from said
intermediate result. This is fine, considering we have the immediate
control byte that tells us exactly what is desired as far as output
formats go.
Given that the result of this IR op will never take up more than
16 bits, we store the flags we need to set in the upper 16 bits of the
result to avoid needing to implement multiple return values in the JIT.
Also, since the IR op just returns the intermediate result, this can be
used to implement all of the explicit string instructions with a single IR op.
The implementation is pretty heavily documented to help make heads or
tails of these monster instructions.
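A reduced sketch of the packing scheme (hypothetical helper names):
```cpp
#include <cstdint>

// The intermediate result fits in 16 bits, so the flag bits ride in the upper
// half of the single return value and get split back out by the caller.
struct StringOpResult {
  uint16_t IntermediateResult;
  uint16_t Flags;
};

uint32_t PackResult(StringOpResult Res) {
  return static_cast<uint32_t>(Res.IntermediateResult) |
         (static_cast<uint32_t>(Res.Flags) << 16);
}

StringOpResult UnpackResult(uint32_t Packed) {
  return {static_cast<uint16_t>(Packed & 0xFFFF),
          static_cast<uint16_t>(Packed >> 16)};
}
```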
This will use the correct `__cpuid` define, either in cpuid.h or
self-defined depending on environment.
Otherwise we would need to define our own cpuid helpers to match the
difference between mingw and Linux.
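A sketch of the usual shape of such a shim (not FEX's exact code): prefer the `__cpuid` macro from cpuid.h when the header exists, otherwise fall back to a self-defined helper.
```cpp
#include <cstdint>
#if __has_include(<cpuid.h>)
#include <cpuid.h> // GCC/Clang: provides __cpuid(level, a, b, c, d) as a macro
#endif

void RunCPUID(uint32_t Function, uint32_t Regs[4]) {
#if __has_include(<cpuid.h>)
  __cpuid(Function, Regs[0], Regs[1], Regs[2], Regs[3]);
#else
  // Self-defined fallback (GCC/Clang inline asm syntax, x86 hosts only).
  __asm__ volatile("cpuid"
                   : "=a"(Regs[0]), "=b"(Regs[1]), "=c"(Regs[2]), "=d"(Regs[3])
                   : "a"(Function), "c"(0u));
#endif
}
```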