Commit Graph

4056 Commits

Author SHA1 Message Date
Alyssa Rosenzweig
7e6bb04db1 OpcodeDispatcher: Extract CalculatePF
This does duplicate the _Constant(1) but it doesn't matter because it
gets inlined into the eor anyway. There is no functional change here.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-07-13 10:05:18 -04:00
Alyssa Rosenzweig
716cac35a8 OpcodeDispatcher: Fix PF calculation
We store garbage in the upper bits. That's ok, but it means we need to
mask on read for correct behaviour.
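
To illustrate the mask-on-read idea, here is a minimal sketch. It is not FEX's IR: `ReadPF` is a hypothetical helper, and the assumption is that the raw result is stored with arbitrary bits above the low byte.

```cpp
#include <cstdint>

// Hypothetical helper: derive the x86 parity flag from a stored raw result
// whose bits above the low byte may be garbage.
constexpr uint32_t ReadPF(uint64_t RawResult) {
  uint64_t Folded = RawResult;
  Folded ^= Folded >> 4;
  Folded ^= Folded >> 2;
  Folded ^= Folded >> 1;
  // Bit 0 is now the XOR of bits 0..7 only (1 = odd number of set bits).
  // x86 sets PF for an even count, so invert bit 0 and then mask, which
  // throws away whatever the garbage upper bits folded into bits 1 and up.
  return static_cast<uint32_t>((Folded ^ 1) & 1);
}
```

Without the final `& 1` the caller would see the folded garbage in the upper bits, which is the kind of bug described above.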

Closes #2767

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-07-13 08:38:46 -04:00
Ryan Houdek
9722c4c5a4
Merge pull request #2766 from alyssarosenzweig/flags/add-of
OpcodeDispatcher: Optimize ADD/ADC OF flag packing
2023-07-12 15:47:21 -07:00
Alyssa Rosenzweig
e8c0e19afc OpcodeDispatcher: "Calculcate" -> "Calculate"
Typofix.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-07-12 18:07:04 -04:00
Alyssa Rosenzweig
c559fec959 OpcodeDispatcher: Optimize ADD/ADC OF flag packing
We can fold the Not into the And. This requires flipping the arguments
to Andn, but we do not flip the order of the assignments since that
requires an extra register in a test I'm looking at.
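
A sketch of the fold, assuming `Andn(a, b)` means `a & ~b` (the standalone `Andn` and `AddOF32` below are illustrative names, not the dispatcher's code). The OF condition for an add is "sources had the same sign and the result's sign differs", which needs a Not feeding an And:

```cpp
#include <cstdint>

// Assumed semantics: and-not, i.e. A & ~B in one operation.
constexpr uint32_t Andn(uint32_t A, uint32_t B) { return A & ~B; }

// Overflow flag of a 32-bit add, written with the Not folded into the And.
constexpr bool AddOF32(uint32_t Src1, uint32_t Src2) {
  const uint32_t Res = Src1 + Src2;
  const uint32_t SourcesDiffer = Src1 ^ Src2;  // sign bit set if signs differ
  const uint32_t ResultDiffers = Src1 ^ Res;   // sign bit set if sign changed
  // Equivalent to (~SourcesDiffer & ResultDiffers), with the operands flipped
  // so that the inverted value is the second argument of Andn.
  return (Andn(ResultDiffers, SourcesDiffer) >> 31) & 1;
}
```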

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-07-12 18:06:36 -04:00
Alyssa Rosenzweig
8d2fabe705 OpcodeDispatcher: Deduplicate ADD/ADC OF generation
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-07-12 18:06:36 -04:00
Ryan Houdek
5dbd1b8dc2 FEXCore: Removes unused TLS variable
Not sure why this still existed.
2023-07-12 13:05:47 -07:00
Ryan Houdek
5fef0c29aa FEXCore: Rename Telemetry helper function GetObject
WIN32 has a define already called `GetObject` and will cause our
symbol to have an A appended to it and break linking.

Just rename it to `GetTelemetryValue`
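
The rename problem can be reproduced without the Windows SDK. This sketch defines the macro itself as a stand-in for what the platform headers do on ANSI builds; the `STRINGIFY` helpers are only there to make the preprocessor's effect visible.

```cpp
#include <string_view>

// Stand-in for the platform header; on a real WIN32 build this macro comes
// from the SDK, not from user code.
#define GetObject GetObjectA

#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

// After preprocessing, every `GetObject` token has become `GetObjectA`.
constexpr std::string_view RenamedByMacro = STRINGIFY(GetObject);

// A name the platform headers do not claim survives untouched.
constexpr std::string_view SafeName = STRINGIFY(GetTelemetryValue);
```

Any translation unit that sees the macro declares or calls `GetObjectA`, while one that does not see it uses `GetObject`, so the two no longer agree on the symbol.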
2023-07-12 11:53:13 -07:00
Ryan Houdek
d387c46aab FEXCore: Fixes WIN32 compiling again
Mostly a quick bandage while I'm getting ready to set up the
runners to test this for us.
2023-07-12 11:53:13 -07:00
Mai
ddd6dbfdcc
Merge pull request #2759 from Sonicadvance1/redundant_bfe_flags
OpcodeDispatcher: Remove spurious bfe with flag storing
2023-07-10 22:19:21 -04:00
Mai
7f2557e322
Merge pull request #2757 from Sonicadvance1/optimize_movss_reg
OpcodeDispatcher: Optimize MOVSS to register
2023-07-10 21:22:47 -04:00
Mai
810c7d926c
Merge pull request #2758 from Sonicadvance1/optimize_tso_vector_loadstores
IR: Optimize vector TSO loadstore address calculation
2023-07-10 21:21:46 -04:00
Ryan Houdek
04c325661c OpcodeDispatcher: Remove spurious bfe with flag storing
Noticed during introspection that we were generating zero constants
redundantly. Bunch of single cycle hits or zero-register renames.

Every time a `SetRFLAG` helper was called, it was /always/ doing a BFE
on everything passed in to extract the lowest bit. In nearly all cases
the data getting passed in is already only the lowest bit.

Instead, stop the helper from doing this BFE, and ensure the
OpcodeDispatcher does BFE in the couple of cases it still needs to do.

As I was skimming through all these to ensure BFE isn't necessary, I did
notice that some of the BCD instructions are wrong or questionable. So I
left a comment on those so we can come back to it.
2023-07-10 18:03:23 -07:00
Ryan Houdek
2d800b2627 IR: Optimize vector TSO loadstore address calculation
These address calculations were failing to understand that they can be
optimized. When TSO emulation is disabled these were fine, but with TSO
we were eating one more instruction.

Before:
```
add x20, x12, #0x4 (4)
dmb ish
ldr s16, [x20]
dmb ish
```

After:
```
dmb ish
ldr s16, [x12, #4]
dmb ish
```

Also left a note that once LRCPC3 is supported in hardware we can do a similar optimization there.
2023-07-10 15:21:46 -07:00
Ryan Houdek
55ed3e0549 OpcodeDispatcher: Optimize MOVSS to register
Easily fixed. Found through inspection.

Before:
```
eor v0.16b, v0.16b, v0.16b
mov v0.s[0], v17.s[0]
mov v4.16b, v0.16b
mov v16.s[0], v4.s[0]
```

After:
```
mov v16.s[0], v17.s[0]
```
2023-07-10 14:36:27 -07:00
Ryan Houdek
55d084ebb0 OpcodeDispatcher: Optimize MOVSS to memory destination
Easily fixed. Found through inspection.

Before:
```
eor v0.16b, v0.16b, v0.16b
mov v0.s[0], v16.s[0]
mov v4.16b, v0.16b
str s4, [x11]
```

After:
```
str s16, [x11]
```
2023-07-10 14:25:01 -07:00
Mai
98eda5e163
Merge pull request #2749 from Sonicadvance1/optimize_away_redundant_masks
OpcodeDispatcher: Optimize some shifts size masking
2023-07-10 08:08:57 -04:00
Ryan Houdek
92d0344d6a OpcodeDispatcher: Fixes bug with pcmpestri
When this instruction returns the index into the ecx register, this is
defined as a 32-bit result. This means it actually gets zero-extended to
the full 64-bit GPR size on 64-bit processes.
Previously FEX was doing a 32-bit insert, which leaves garbage data in
the upper 32 bits of the RCX register.

Adds a unit test to ensure the result is zero extended.
Fixes running Java games under FEX now that SSE4.2 is exposed.
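
The difference between the two behaviours, as a small sketch (both function names are made up for illustration):

```cpp
#include <cstdint>

// Old behaviour: a 32-bit insert replaces only the low half, so whatever was
// in the upper 32 bits of RCX survives.
constexpr uint64_t Insert32(uint64_t OldRCX, uint32_t Index) {
  return (OldRCX & 0xFFFFFFFF00000000ULL) | Index;
}

// x86-64 behaviour for a 32-bit register write: the upper half is zeroed,
// regardless of the old register contents.
constexpr uint64_t Write32ZeroExtend(uint64_t /*OldRCX*/, uint32_t Index) {
  return Index;
}
```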
2023-07-08 18:08:47 -07:00
Ryan Houdek
9327435f97 OpcodeDispatcher: Optimize some shifts size masking
Inspired by #2561, these shifts don't need to be masked if we know
their operating size up front.

Causes a handful of these to become more optimal.
2023-07-08 16:41:15 -07:00
Mai
8a4bfba47c
Merge pull request #2745 from Sonicadvance1/optimize_fcmov
OpcodeDispatcher: Optimize GetPackedRFLAG
2023-07-07 22:29:52 -04:00
Mai
69ea03f0eb
Merge pull request #2746 from Sonicadvance1/optimize_maskmov
OpcodeDispatcher: Optimize MASKMOVDQU and MASKMOVQ
2023-07-07 22:29:37 -04:00
Ryan Houdek
15f5fe658b OpcodeDispatcher: Optimize MASKMOVDQU and MASKMOVQ
This previous implementation was particularly gnarly. Because these
instructions are both weakly ordered and have implementation-dependent
exception and trap behaviour, they can be converted fairly conveniently
to a load + cmlt + bsl + str sequence.

For the XMM variant this reduces code blowup from 80x to 15x!
For the MMX variant this reduces code blowup from 46x to 17x!

Both of these improvements are significant wins! There's still a minor
improvement possible around bsl, which currently requires some
redundant moves, but since we don't have constraint support for this we
still eat two additional instructions.

Before:
```asm
0x0000ffff7b800718  10ffffe0    adr x0, #-0x4 (addr 0xffff7b800714)
0x0000ffff7b80071c  f9005f80    str x0, [x28, #184]
0x0000ffff7b800720  4eb11e24    mov v4.16b, v17.16b
0x0000ffff7b800724  4eb01e05    mov v5.16b, v16.16b
0x0000ffff7b800728  aa0b03f4    mov x20, x11
0x0000ffff7b80072c  4e083c95    mov x21, v4.d[0]
0x0000ffff7b800730  4e083cb6    mov x22, v5.d[0]
0x0000ffff7b800734  d3471eb7    ubfx x23, x21, #7, #1
0x0000ffff7b800738  b4000077    cbz x23, #+0xc (addr 0xffff7b800744)
0x0000ffff7b80073c  d3401ed7    uxtb x23, w22
0x0000ffff7b800740  39000297    strb w23, [x20]
0x0000ffff7b800744  d34f3eb7    ubfx x23, x21, #15, #1
0x0000ffff7b800748  b4000077    cbz x23, #+0xc (addr 0xffff7b800754)
0x0000ffff7b80074c  d3483ed7    ubfx x23, x22, #8, #8
0x0000ffff7b800750  39000697    strb w23, [x20, #1]
0x0000ffff7b800754  d3575eb7    ubfx x23, x21, #23, #1
0x0000ffff7b800758  b4000077    cbz x23, #+0xc (addr 0xffff7b800764)
0x0000ffff7b80075c  d3505ed7    ubfx x23, x22, #16, #8
0x0000ffff7b800760  39000a97    strb w23, [x20, #2]
0x0000ffff7b800764  d35f7eb7    ubfx x23, x21, #31, #1
0x0000ffff7b800768  b4000077    cbz x23, #+0xc (addr 0xffff7b800774)
0x0000ffff7b80076c  d3587ed7    ubfx x23, x22, #24, #8
0x0000ffff7b800770  39000e97    strb w23, [x20, #3]
0x0000ffff7b800774  d3679eb7    ubfx x23, x21, #39, #1
0x0000ffff7b800778  b4000077    cbz x23, #+0xc (addr 0xffff7b800784)
0x0000ffff7b80077c  d3609ed7    ubfx x23, x22, #32, #8
0x0000ffff7b800780  39001297    strb w23, [x20, #4]
0x0000ffff7b800784  d36fbeb7    ubfx x23, x21, #47, #1
0x0000ffff7b800788  b4000077    cbz x23, #+0xc (addr 0xffff7b800794)
0x0000ffff7b80078c  d368bed7    ubfx x23, x22, #40, #8
0x0000ffff7b800790  39001697    strb w23, [x20, #5]
0x0000ffff7b800794  d377deb7    ubfx x23, x21, #55, #1
0x0000ffff7b800798  b4000077    cbz x23, #+0xc (addr 0xffff7b8007a4)
0x0000ffff7b80079c  d370ded7    ubfx x23, x22, #48, #8
0x0000ffff7b8007a0  39001a97    strb w23, [x20, #6]
0x0000ffff7b8007a4  d37ffeb5    lsr x21, x21, #63
0x0000ffff7b8007a8  b4000075    cbz x21, #+0xc (addr 0xffff7b8007b4)
0x0000ffff7b8007ac  d378fed5    lsr x21, x22, #56
0x0000ffff7b8007b0  39001e95    strb w21, [x20, #7]
0x0000ffff7b8007b4  4e183c95    mov x21, v4.d[1]
0x0000ffff7b8007b8  4e183cb6    mov x22, v5.d[1]
0x0000ffff7b8007bc  d3471eb7    ubfx x23, x21, #7, #1
0x0000ffff7b8007c0  b4000077    cbz x23, #+0xc (addr 0xffff7b8007cc)
0x0000ffff7b8007c4  d3401ed7    uxtb x23, w22
0x0000ffff7b8007c8  39002297    strb w23, [x20, #8]
0x0000ffff7b8007cc  d34f3eb7    ubfx x23, x21, #15, #1
0x0000ffff7b8007d0  b4000077    cbz x23, #+0xc (addr 0xffff7b8007dc)
0x0000ffff7b8007d4  d3483ed7    ubfx x23, x22, #8, #8
0x0000ffff7b8007d8  39002697    strb w23, [x20, #9]
0x0000ffff7b8007dc  d3575eb7    ubfx x23, x21, #23, #1
0x0000ffff7b8007e0  b4000077    cbz x23, #+0xc (addr 0xffff7b8007ec)
0x0000ffff7b8007e4  d3505ed7    ubfx x23, x22, #16, #8
0x0000ffff7b8007e8  39002a97    strb w23, [x20, #10]
0x0000ffff7b8007ec  d35f7eb7    ubfx x23, x21, #31, #1
0x0000ffff7b8007f0  b4000077    cbz x23, #+0xc (addr 0xffff7b8007fc)
0x0000ffff7b8007f4  d3587ed7    ubfx x23, x22, #24, #8
0x0000ffff7b8007f8  39002e97    strb w23, [x20, #11]
0x0000ffff7b8007fc  d3679eb7    ubfx x23, x21, #39, #1
0x0000ffff7b800800  b4000077    cbz x23, #+0xc (addr 0xffff7b80080c)
0x0000ffff7b800804  d3609ed7    ubfx x23, x22, #32, #8
0x0000ffff7b800808  39003297    strb w23, [x20, #12]
0x0000ffff7b80080c  d36fbeb7    ubfx x23, x21, #47, #1
0x0000ffff7b800810  b4000077    cbz x23, #+0xc (addr 0xffff7b80081c)
0x0000ffff7b800814  d368bed7    ubfx x23, x22, #40, #8
0x0000ffff7b800818  39003697    strb w23, [x20, #13]
0x0000ffff7b80081c  d377deb7    ubfx x23, x21, #55, #1
0x0000ffff7b800820  b4000077    cbz x23, #+0xc (addr 0xffff7b80082c)
0x0000ffff7b800824  d370ded7    ubfx x23, x22, #48, #8
0x0000ffff7b800828  39003a97    strb w23, [x20, #14]
0x0000ffff7b80082c  d37ffeb5    lsr x21, x21, #63
0x0000ffff7b800830  b4000075    cbz x21, #+0xc (addr 0xffff7b80083c)
0x0000ffff7b800834  d378fed5    lsr x21, x22, #56
0x0000ffff7b800838  39003e95    strb w21, [x20, #15]
0x0000ffff7b80083c  58000040    ldr x0, pc+8 (addr 0xffff7b800844)
0x0000ffff7b800840  d63f0000    blr x0
```

After:
```asm
0x0000ffff7ac00718  10ffffe0            adr x0, #-0x4 (addr 0xffff7ac00714)
0x0000ffff7ac0071c  f9005f80            str x0, [x28, #184]
0x0000ffff7ac00720  4e20aa24            cmlt v4.16b, v17.16b, #0
0x0000ffff7ac00724  3dc00165            ldr q5, [x11]
0x0000ffff7ac00728  4ea41c80            mov v0.16b, v4.16b
0x0000ffff7ac0072c  6e651e00            bsl v0.16b, v16.16b, v5.16b
0x0000ffff7ac00730  4ea01c04            mov v4.16b, v0.16b
0x0000ffff7ac00734  3d800164            str q4, [x11]
0x0000ffff7ac00738  58000040            ldr x0, pc+8 (addr 0xffff7ac00740)
0x0000ffff7ac0073c  d63f0000            blr x0
```
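
For readers who don't speak NEON, the new sequence amounts to the following scalar model. This is a sketch, assuming C++17; `Vec16` and `MaskedByteStore` are illustrative names, and the returned vector stands for what the final `str` writes back.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec16 = std::array<uint8_t, 16>;

// Models load + cmlt + bsl + str for the 16-byte (XMM) variant.
constexpr Vec16 MaskedByteStore(const Vec16& Mask, const Vec16& Src, const Vec16& Mem) {
  Vec16 Out{};
  for (std::size_t i = 0; i < Out.size(); ++i) {
    // cmlt #0: all-ones where the mask byte is negative (top bit set).
    const uint8_t Select = (Mask[i] & 0x80) ? 0xFF : 0x00;
    // bsl: take Src bits where Select is set, loaded memory bits elsewhere.
    Out[i] = static_cast<uint8_t>((Src[i] & Select) | (Mem[i] & ~Select));
  }
  return Out;
}
```

Note that this rewrites the unselected bytes with their old values instead of skipping them, which is the trade the message above justifies by the instruction's weak ordering and implementation-dependent fault behaviour.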
2023-07-07 18:37:17 -07:00
Ryan Houdek
052aa4317b OpcodeDispatcher: Optimize GetPackedRFLAG
Only return the particular flags that are being requested in the moment
since compacting them all when requested is fairly slow.

x87 fcmov in particular was requesting all the flags when it only needs
a couple.
This reduces a `fcmovb` instruction count blowup from 103x to 38x. Still
more room to go but this one stood out as being particularly bad.

Old:
```asm
0x0000000265a002bc  10ffffe0    adr x0, #-0x4 (addr 0x265a002b8)
0x0000000265a002c0  f9005f80    str x0, [x28, #184]
0x0000000265a002c4  d2800014    mov x20, #0x0
0x0000000265a002c8  d2800035    mov x21, #0x1
0x0000000265a002cc  d2800056    mov x22, #0x2
0x0000000265a002d0  394b0397    ldrb w23, [x28, #704]
0x0000000265a002d4  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a002d8  aa1702d6    orr x22, x22, x23
0x0000000265a002dc  394b0b97    ldrb w23, [x28, #706]
0x0000000265a002e0  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a002e4  531e76f7    lsl w23, w23, #2
0x0000000265a002e8  aa1702d6    orr x22, x22, x23
0x0000000265a002ec  394b1397    ldrb w23, [x28, #708]
0x0000000265a002f0  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a002f4  531c6ef7    lsl w23, w23, #4
0x0000000265a002f8  aa1702d6    orr x22, x22, x23
0x0000000265a002fc  394b1b97    ldrb w23, [x28, #710]
0x0000000265a00300  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00304  531a66f7    lsl w23, w23, #6
0x0000000265a00308  aa1702d6    orr x22, x22, x23
0x0000000265a0030c  394b1f97    ldrb w23, [x28, #711]
0x0000000265a00310  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00314  531962f7    lsl w23, w23, #7
0x0000000265a00318  aa1702d6    orr x22, x22, x23
0x0000000265a0031c  394b2397    ldrb w23, [x28, #712]
0x0000000265a00320  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00324  53185ef7    lsl w23, w23, #8
0x0000000265a00328  aa1702d6    orr x22, x22, x23
0x0000000265a0032c  394b2797    ldrb w23, [x28, #713]
0x0000000265a00330  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00334  53175af7    lsl w23, w23, #9
0x0000000265a00338  aa1702d6    orr x22, x22, x23
0x0000000265a0033c  394b2b97    ldrb w23, [x28, #714]
0x0000000265a00340  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00344  531656f7    lsl w23, w23, #10
0x0000000265a00348  aa1702d6    orr x22, x22, x23
0x0000000265a0034c  394b2f97    ldrb w23, [x28, #715]
0x0000000265a00350  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00354  531552f7    lsl w23, w23, #11
0x0000000265a00358  aa1702d6    orr x22, x22, x23
0x0000000265a0035c  394b3397    ldrb w23, [x28, #716]
0x0000000265a00360  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00364  53144ef7    lsl w23, w23, #12
0x0000000265a00368  aa1702d6    orr x22, x22, x23
0x0000000265a0036c  394b3b97    ldrb w23, [x28, #718]
0x0000000265a00370  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00374  531246f7    lsl w23, w23, #14
0x0000000265a00378  aa1702d6    orr x22, x22, x23
0x0000000265a0037c  394b4397    ldrb w23, [x28, #720]
0x0000000265a00380  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00384  53103ef7    lsl w23, w23, #16
0x0000000265a00388  aa1702d6    orr x22, x22, x23
0x0000000265a0038c  394b4797    ldrb w23, [x28, #721]
0x0000000265a00390  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00394  530f3af7    lsl w23, w23, #17
0x0000000265a00398  aa1702d6    orr x22, x22, x23
0x0000000265a0039c  394b4b97    ldrb w23, [x28, #722]
0x0000000265a003a0  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a003a4  530e36f7    lsl w23, w23, #18
0x0000000265a003a8  aa1702d6    orr x22, x22, x23
0x0000000265a003ac  394b4f97    ldrb w23, [x28, #723]
0x0000000265a003b0  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a003b4  530d32f7    lsl w23, w23, #19
0x0000000265a003b8  aa1702d6    orr x22, x22, x23
0x0000000265a003bc  394b5397    ldrb w23, [x28, #724]
0x0000000265a003c0  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a003c4  530c2ef7    lsl w23, w23, #20
0x0000000265a003c8  aa1702d6    orr x22, x22, x23
0x0000000265a003cc  394b5797    ldrb w23, [x28, #725]
0x0000000265a003d0  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a003d4  530b2af7    lsl w23, w23, #21
0x0000000265a003d8  aa1702d6    orr x22, x22, x23
0x0000000265a003dc  924002d6    and x22, x22, #0x1
0x0000000265a003e0  93400294    sbfx x20, x20, #0, #1
0x0000000265a003e4  934002b5    sbfx x21, x21, #0, #1
0x0000000265a003e8  f10002df    cmp x22, #0x0 (0)
0x0000000265a003ec  9a950294    csel x20, x20, x21, eq
0x0000000265a003f0  4e080e84    dup v4.2d, x20
0x0000000265a003f4  394baf94    ldrb w20, [x28, #747]
0x0000000265a003f8  91000695    add x21, x20, #0x1 (1)
0x0000000265a003fc  92400ab5    and x21, x21, #0x7
0x0000000265a00400  d2800200    mov x0, #0x10
0x0000000265a00404  9b007e80    mul x0, x20, x0
0x0000000265a00408  8b000380    add x0, x28, x0
0x0000000265a0040c  3dc0bc05    ldr q5, [x0, #752]
0x0000000265a00410  d2800200    mov x0, #0x10
0x0000000265a00414  9b007ea0    mul x0, x21, x0
0x0000000265a00418  8b000380    add x0, x28, x0
0x0000000265a0041c  3dc0bc06    ldr q6, [x0, #752]
0x0000000265a00420  4ea41c80    mov v0.16b, v4.16b
0x0000000265a00424  6e651cc0    bsl v0.16b, v6.16b, v5.16b
0x0000000265a00428  4ea01c04    mov v4.16b, v0.16b
0x0000000265a0042c  d2800200    mov x0, #0x10
0x0000000265a00430  9b007e80    mul x0, x20, x0
0x0000000265a00434  8b000380    add x0, x28, x0
0x0000000265a00438  3d80bc04    str q4, [x0, #752]
0x0000000265a0043c  58000040    ldr x0, pc+8 (addr 0x265a00444)
0x0000000265a00440  d63f0000    blr x0
```

New:
```asm
0x0000000265a002bc  10ffffe0    adr x0, #-0x4 (addr 0x265a002b8)
0x0000000265a002c0  f9005f80    str x0, [x28, #184]
0x0000000265a002c4  d2800014    mov x20, #0x0
0x0000000265a002c8  d2800035    mov x21, #0x1
0x0000000265a002cc  d2800056    mov x22, #0x2
0x0000000265a002d0  394b0397    ldrb w23, [x28, #704]
0x0000000265a002d4  330002f6    bfxil w22, w23, #0, #1
0x0000000265a002d8  924002d6    and x22, x22, #0x1
0x0000000265a002dc  93400294    sbfx x20, x20, #0, #1
0x0000000265a002e0  934002b5    sbfx x21, x21, #0, #1
0x0000000265a002e4  f10002df    cmp x22, #0x0 (0)
0x0000000265a002e8  9a950294    csel x20, x20, x21, eq
0x0000000265a002ec  4e080e84    dup v4.2d, x20
0x0000000265a002f0  394baf94    ldrb w20, [x28, #747]
0x0000000265a002f4  91000695    add x21, x20, #0x1 (1)
0x0000000265a002f8  92400ab5    and x21, x21, #0x7
0x0000000265a002fc  d2800200    mov x0, #0x10
0x0000000265a00300  9b007e80    mul x0, x20, x0
0x0000000265a00304  8b000380    add x0, x28, x0
0x0000000265a00308  3dc0bc05    ldr q5, [x0, #752]
0x0000000265a0030c  d2800200    mov x0, #0x10
0x0000000265a00310  9b007ea0    mul x0, x21, x0
0x0000000265a00314  8b000380    add x0, x28, x0
0x0000000265a00318  3dc0bc06    ldr q6, [x0, #752]
0x0000000265a0031c  4ea41c80    mov v0.16b, v4.16b
0x0000000265a00320  6e651cc0    bsl v0.16b, v6.16b, v5.16b
0x0000000265a00324  4ea01c04    mov v4.16b, v0.16b
0x0000000265a00328  d2800200    mov x0, #0x10
0x0000000265a0032c  9b007e80    mul x0, x20, x0
0x0000000265a00330  8b000380    add x0, x28, x0
0x0000000265a00334  3d80bc04    str q4, [x0, #752]
0x0000000265a00338  58000040    ldr x0, pc+8 (addr 0x265a00340)
0x0000000265a0033c  d63f0000    blr x0
```
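
The idea in sketch form, assuming one byte of storage per flag bit as the `ldrb`-per-flag sequences above suggest (`FlagBytes` and `PackFlags` are hypothetical names):

```cpp
#include <array>
#include <cstdint>

// Hypothetical storage: one byte per flag bit.
using FlagBytes = std::array<uint8_t, 32>;

// Pack only the flags the caller asks for instead of all of them.
constexpr uint32_t PackFlags(const FlagBytes& Flags, uint32_t RequestedMask) {
  uint32_t Packed = 0;
  for (unsigned Bit = 0; Bit < 32; ++Bit) {
    if ((RequestedMask >> Bit) & 1) {
      Packed |= static_cast<uint32_t>(Flags[Bit] & 1) << Bit;
    }
  }
  return Packed;
}
```

A caller like `fcmovb`, which only tests CF, would pass a mask of `0x1` and touch a single flag byte.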
2023-07-07 17:01:59 -07:00
Ryan Houdek
debcb0e047 Arm64: Optimize BFI in the case that Dst == srcDst
ARM64 BFI doesn't allow you to encode two source registers here to match
our SSA semantics. Also since we don't support RA constraints to ensure
that these match, just do the optimal case in the backend.

Leave a comment for future RA constraint excavators to make this more
optimal.
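
A reference model of what BFI computes may help here. This is a sketch with a made-up free function, assuming `Lsb < 64` and `Lsb + Width <= 64`:

```cpp
#include <cstdint>

// Insert the low Width bits of Src into Dst starting at bit Lsb,
// leaving every other bit of Dst unchanged.
constexpr uint64_t Bfi(uint64_t Dst, uint64_t Src, unsigned Lsb, unsigned Width) {
  const uint64_t FieldMask = (Width >= 64) ? ~0ULL : ((1ULL << Width) - 1);
  return (Dst & ~(FieldMask << Lsb)) | ((Src & FieldMask) << Lsb);
}
```

In SSA form this is a three-operand operation (result, `Dst`, `Src`), while the hardware instruction reads and writes the same destination register. When the allocator happens to give the result the same register as `Dst`, the backend can emit the single instruction; otherwise it needs a move first.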
2023-07-07 16:43:41 -07:00
Ryan Houdek
baf04b6a41 FEXCore: Minor cleanup
This isn't required anymore since we are exposing the virtual class
directly.
2023-07-07 15:06:14 -07:00
Ryan Houdek
f9b352a093 Linux: Fixes hangs due to mutexes locked while fork happens.
When a fork occurs FEX needs to be incredibly careful as any thread
(that isn't forking) that holds a lock will vanish when the fork occurs.

At this point if the newly forked process tries to use these mutexes
then the process hangs indefinitely.

The three major mutexes that need to be held during a fork:
- Code Invalidation mutex
  - This is the highest priority and causes us to hang frequently.
  - This is highly likely to occur when one thread is loading shared
    libraries and another thread is forking.
     - Happens frequently with Wine and steam.
- VMA tracking mutex
  - This one happens when one thread is allocating memory while a fork
    occurs.
  - This closely relates to the code invalidation mutex, just happens at
    the syscall layer instead of the FEXCore layer.
  - Happens as frequently as the code invalidation mutex.
- Allocation mutex
  - This mutex is used for FEX's 64-bit Allocator, this happens when FEX
    is allocating memory on one thread and a fork occurs.
  - Fairly infrequent because jemalloc doesn't allocate VMA regions that
    often.

While this likely doesn't hit all of the FEX mutexes, this hits the ones
that are burning fires and are happening frequently.

- FEXCore: Adds forkable mutex/locks

Necessary since we have a few locations in FEX that need to be locked
before and after a fork.

When a fork occurs the locks must be locked prior to the fork. Then
afterwards they either need to unlock or be set to default
initialization state.
- Parent
   - Does an unlock
- Child
   - Sets the lock to default initialization state
   - This is because pthreads does TID-based ownership checking on
     unique locks and refcount-based waiting for shared locks.
   - No way to "unlock" after fork in this case other than default
     initializing.
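
The lock/unlock/reinitialize protocol described above can be sketched like this. `ForkSafeMutex` is a hypothetical wrapper, not the class this commit adds, and it only covers the unique-lock case.

```cpp
#include <pthread.h>

// Hypothetical wrapper illustrating the fork protocol for a unique lock.
class ForkSafeMutex {
public:
  ForkSafeMutex() { pthread_mutex_init(&Mutex, nullptr); }
  ~ForkSafeMutex() { pthread_mutex_destroy(&Mutex); }
  ForkSafeMutex(const ForkSafeMutex&) = delete;
  ForkSafeMutex& operator=(const ForkSafeMutex&) = delete;

  void Lock() { pthread_mutex_lock(&Mutex); }
  void Unlock() { pthread_mutex_unlock(&Mutex); }
  bool TryLock() { return pthread_mutex_trylock(&Mutex) == 0; }

  // Forking thread, before fork(): take the lock so that no other thread
  // can be holding it at the moment the process is duplicated.
  void PrepareFork() { Lock(); }

  // Parent, after fork(): a plain unlock is enough.
  void ParentAfterFork() { Unlock(); }

  // Child, after fork(): put the lock back into its default-initialized
  // state instead of unlocking it.
  void ChildAfterFork() { pthread_mutex_init(&Mutex, nullptr); }

private:
  pthread_mutex_t Mutex;
};
```

One way to wire the three hooks up is `pthread_atfork`, which takes prepare, parent, and child callbacks; whether FEX does it that way or calls them by hand around the fork syscall is not stated here.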
2023-07-04 02:13:06 -07:00
Ryan Houdek
d2032da452
Merge pull request #2737 from bylaws/main
Some small fixes for android building
2023-07-01 14:59:18 -07:00
Billy Laws
35c52f20f9 AllocatorHooks: Avoid referencing valloc on Android
This is not implemented in bionic, so follow the MINGW approach and implement it with _aligned_alloc.
2023-07-01 22:21:16 +01:00
Billy Laws
17c82c22a6 JitSymbols: Store symbol mappings in /data/local/tmp on Android 2023-07-01 22:13:44 +01:00
Ryan Houdek
df03a7b101 unittests/Emitter: Adds CSSC tests 2023-06-30 19:34:35 -07:00
Ryan Houdek
c859540d7e Emitter: Adds support for CSSC
Not used currently but will be used in the future.
2023-06-30 19:34:35 -07:00
Ryan Houdek
20794593e7 unittests/Emitter: Update tests for updated vixl
Output in vixl changed for some of these. Most for the better but not
all of them.
2023-06-30 19:34:35 -07:00
Ryan Houdek
a80a2bf569 External/vixl: Update 2023-06-30 19:11:22 -07:00
Mai
1a4d5a1abb
Merge pull request #2733 from Sonicadvance1/fix_jemalloc_checks
External/jemalloc: Updates external jemallocs
2023-06-30 17:57:29 -04:00
Ryan Houdek
677b72c9a5 External/jemalloc: Updates external jemallocs
Fixes their `malloc_usable_size` checks.
2023-06-28 09:26:45 -07:00
Ryan Houdek
71a8c66c95 Context: Removes dead AddVirtualMemoryMapping function
This has been around since the initial commit. Bad idea that wasn't ever
thought through. Something about remapping guest virtual and host
virtual memory which will never be a thing.
2023-06-28 09:18:36 -07:00
Lioncache
bf773452ac IR: Add missing formatters
Currently RegisterClassType and FenceType are passed into logs, which
fmt 10.0.0 is more strict about. Adds the formatters that were missing
so that compilation can succeed without needing to change all log sites.
2023-06-17 09:42:31 -04:00
Lioncache
95dbccc0ab Externals: Update fmt to 10.0.0
Keeps ourselves up to date with the latest major release.
2023-06-17 09:25:20 -04:00
Ryan Houdek
e5189d63a2
Merge pull request #2708 from Sonicadvance1/fix_paranoidtso
Arm64: Fixes paranoidtso option for CPUs that support LRCPC/2
2023-06-16 13:32:43 -07:00
Ryan Houdek
9dcc1deec0
Merge pull request #2722 from Sonicadvance1/rip_reconstruction
JIT: Implement support for per-instruction RIP reconstruction
2023-06-16 13:02:14 -07:00
Ryan Houdek
66d4206cd7
Merge pull request #2719 from lioncash/flags
OpcodeDispatcher: Ensure MXCSR is saved/restored with FXSAVE/FXRSTOR
2023-06-16 13:01:56 -07:00
Lioncache
01837b3ad6 IR: Remove HasSideEffects for VPCMPXSTRX ops
This is a leftover from early on and not necessary, since we
don't operate on any state other than what is provided to the
IR op itself.
2023-06-16 11:53:31 -04:00
Lioncache
bdb68840e3 IR: Move VPCMPESTRX REX handling to OpcodeDispatcher
We can handle this in the dispatcher itself, so that we don't need to pass along
the register size as a member of the opcode. This gets rid of some unnecessary duplication
of functionality in the backends and makes it so potential backends don't need to deal
with this.
2023-06-16 11:49:36 -04:00
Lioncache
4e2dcf3298 OpcodeDispatcher: Ensure MXCSR is saved/restored with FXSAVE/FXRSTOR
Previously, the bits that we support in the MXCSR weren't being saved,
which means that some opcode patterns may fail to restore the rounding mode
properly.

e.g. FXSAVE, followed by FNINIT, followed by FXRSTOR wouldn't restore the
     rounding mode properly

This fixes that.
2023-06-16 09:25:53 -04:00
Ryan Houdek
628f825416 JIT: Implement support for per-instruction RIP reconstruction
FEX's current implementation of RIP reconstruction is limited to the
entrypoint that a single block has. This will cause the RIP to be
incorrect past the first instruction in that block.

While this is fine for a decent number of games, especially since fault
handling isn't super common, it doesn't work for all situations.

When testing Ultimate Chicken Horse, we found out that changing the
block size to 1 worked around an early crash in the game's startup.
This game is likely relying on Mono/Unity's AOT compilation step, which
does some more robust faulting than the runtime JIT. It needs the RIP to
be correct since it does some sort of checking for where the code came
from.

This fixes Ultimate Chicken Horse specifically, but will likely fix
other games that are built the same way.
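
One common way to do per-instruction reconstruction is a side table that maps host code offsets back to guest RIPs. The sketch below shows that general technique only; the `RIPEntry` layout and `ReconstructRIP` are assumptions, not the format this commit introduces.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical table entry: the host code offset (within the block) at which
// the code for a guest instruction starts, and that instruction's RIP.
struct RIPEntry {
  uint32_t HostPCOffset;
  uint64_t GuestRIP;
};

// Entries must be sorted by HostPCOffset. Returns the RIP of the guest
// instruction whose host code contains FaultOffset.
inline uint64_t ReconstructRIP(const std::vector<RIPEntry>& Entries,
                               uint64_t BlockEntryRIP, uint32_t FaultOffset) {
  // Find the first entry that starts after the faulting offset.
  auto It = std::upper_bound(
      Entries.begin(), Entries.end(), FaultOffset,
      [](uint32_t Offset, const RIPEntry& E) { return Offset < E.HostPCOffset; });
  if (It == Entries.begin()) {
    // Fault before any recorded instruction: fall back to the block entry,
    // which is all the old scheme could report.
    return BlockEntryRIP;
  }
  return std::prev(It)->GuestRIP;
}
```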
2023-06-14 17:28:56 -07:00
Ryan Houdek
a80327f6df X86Tables: Adds some missing MEM_ACCESS flags to REP instructions 2023-06-14 17:04:50 -07:00
Ryan Houdek
c9712e45cb Arm64: Fixes GPR pair allocation to get one pair back
When executing a 32-bit application we were failing to allocate a single
GPR pair. This meant we only have 7 pairs when we could have had 8.

This was because r30 was ending up in the middle of the allocation
arrays so we couldn't safely create a sequential pair of registers.

Organize the register allocation arrays to be unique for each bitness
being executed and then access them through spans instead.

Also works around bug where the RA validation doesn't understand when pair
indexes don't correlate directly to GPR indexes. So while the previous
PR fixed the RA pass, it didn't fix the RA validation pass.

Noticed this when pr57018 32-bit gcc test was run with the #2700 PR
which improved the RA allocation a bit.
2023-06-13 20:04:51 -07:00
Lioncache
9017325c95 CPUID: Signify support for XSAVE if AVX is enabled
Now that XSAVE and XRSTOR are implemented, we can enable the
CPUID bits for them when AVX support is enabled.
2023-06-13 19:21:14 -04:00
Lioncache
ae536e44d7 OpcodeDispatcher: Handle XRSTOR 2023-06-13 17:47:45 -04:00
Lioncache
7679485cc3 OpcodeDispatcher: Handle XSAVE 2023-06-13 15:01:33 -04:00
Ryan Houdek
537562fab7 Arm64: Fixes register pair conflict.
When FEX was updated to reclaim 64-bit registers in #2494, I had
mistakenly messed up pair register class conflicts.

The problem is that FEX has r30 stuck in the middle of the RA which
causes the paired registers to need to offset their index half way.

This meant that the conflict index was incorrect, and so has been broken
on 32-bit applications ever since that PR.

Keep the intersection indexes in their own array so they can be correctly
indexed at runtime.

Thanks to @asahilina for finding out that Osmos started crashing a few
months ago; I finally just got around to bisecting what the problem
was.
This now fixes Osmos from crashing, although the motes are still
invisible on the 32-bit application. Not sure what other havoc this has
been causing.
2023-06-12 23:31:16 -07:00
Mai
f8721992c2
Merge pull request #2712 from Sonicadvance1/fix_jemalloc_generate
External: Update jemalloc trees
2023-06-12 17:12:24 -04:00
Lioncache
755600c371 CPUID: Signify support for SSE4.2
With all the kinks worked out of these instructions, we can finally enable SSE4.2.
2023-06-12 13:19:38 -04:00
Lioncache
bec8b70e5d VectorFallbacks: Fix PCMPSTR fallback ZF/SF flag setting
So, uh, this was a little silly to track down. Having the upper limit
as unsigned was a mistake, since this would cause negative valid lengths to
convert into an unsigned value within the first two flag comparison cases.

A -1 valid length can occur if one of the strings starts with a null character
in a vector's first element. (It will be zero and we then subtract it to
make the length zero-based).

Fixes this edge-case up and expands a test to check for this in the future.
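
The signedness trap in isolation, as a sketch. The function names and the exact comparison are illustrative assumptions; only the -1 behaviour is the point.

```cpp
#include <cstdint>

// Buggy shape: the comparison happens in unsigned arithmetic, so a last
// valid index of -1 turns into 0xFFFFFFFF and is never "less than" the limit.
constexpr bool EndsEarlyUnsigned(int32_t LastValidIndex, uint32_t MaxIndex) {
  return static_cast<uint32_t>(LastValidIndex) < MaxIndex;
}

// Fixed shape: keep both sides signed so -1 compares as expected.
constexpr bool EndsEarlySigned(int32_t LastValidIndex, int32_t MaxIndex) {
  return LastValidIndex < MaxIndex;
}
```

For every non-negative index the two agree, which is presumably why only the "string starts with a null" case misbehaved.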
2023-06-12 13:13:24 -04:00
Ryan Houdek
bef8ddde48 External: Update jemalloc trees
Allows us to generate a header at compile time for OS specific features.
Should fix compiling on Android since they have a different function
declaration for `malloc_usable_size` compared to Linux.
2023-06-12 09:34:30 -07:00
Mai
fe06f1b151
Merge pull request #2711 from Sonicadvance1/pad_ir_header_32bit
IR: Pad IROp_Header to be 32-bit in width
2023-06-11 05:49:00 -04:00
Ryan Houdek
92a15e00c7 IR: Pad IROp_Header to be 32-bit in width
We spent a bit of effort removing 8-bits from this header to get it down
to three bytes. This ended up in PRs #2319 and #2320

There was no explicit need to go down to three bytes, the other two
arguments we were removing were just better served to be lookups instead
of adding IR overhead for each operation.

This introduced alignment issues that were brought up in #2472.
Apparently the Android NDK's clang will pad nested structs like this,
maybe to match alignment? Regardless we should just make it be 32-bit.

This fixes Android execution of FEXCore.
This fixes #2472

Pros:
- Initialization now turns into a single str because it's 32-bit
- We have 8-bits more space that we can abuse in the IR op now
   - If we need more, then 64-bit and 128-bit are easy bumps in the
     future

Cons:
- Each IR operation takes at minimum 25% more space in the intrusive
  allocators
   - Not really that big of a deal since we are talking 3 bytes versus
     4.
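
The size change in sketch form, assuming C++17. The field names are placeholders, not the real `IROp_Header` members.

```cpp
#include <cstdint>
#include <cstring>

// Placeholder three-byte layout, as it was after the earlier shrinking work.
struct Header3 {
  uint8_t Op;
  uint8_t Size;
  uint8_t ElementSize;
};

// Explicitly padded layout: the size no longer depends on how a compiler
// chooses to pad a nested three-byte struct.
struct Header4 {
  uint8_t Op;
  uint8_t Size;
  uint8_t ElementSize;
  uint8_t Pad;
};

static_assert(sizeof(Header3) == 3);
static_assert(sizeof(Header4) == 4);

// With four bytes the whole header can be moved as one 32-bit word.
inline uint32_t AsWord(const Header4& H) {
  uint32_t Word;
  std::memcpy(&Word, &H, sizeof(Word));
  return Word;
}
```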
2023-06-10 12:38:03 -07:00
Ryan Houdek
7ceadc6b5b Move config layers to the frontend
FEXCore has no need to understand how to load these layers, which
requires json parsing.

Move these to the frontend which is already doing the configuration
layer setup and initialization tasks anyway.

Means FEXCore itself no longer needs to link to tiny-json which can be
left to the frontend.
2023-06-09 18:15:40 -07:00
Ryan Houdek
8c41e8f7d8 Arm64: Fixes paranoidtso option for CPUs that support LRCPC/2
Regular LoadStoreTSO operations have gained support for LRCPC and LRCPC2
which changes the semantics of the operation by letting it support
immediate offsets.

The paranoid version of these operations didn't support the immediate
offsets yet which was causing incorrect memory loadstores.

Bring over the new semantics from the regular LoadStoreTSO but without
any nop padding.
2023-06-09 16:32:28 -07:00
Ryan Houdek
784b3064fc ArchHelpers: Convert a couple of magic numbers to constants
Makes this easier to read.
2023-06-09 16:31:44 -07:00
Ryan Houdek
5b5808218b
Merge pull request #2703 from Sonicadvance1/minor_of_opt
OpcodeDispatcher: Optimize ADC/ADD OF flag calculation
2023-06-07 12:54:55 -07:00
Ryan Houdek
41ec987f3e OpcodeDispatcher: Optimize ADC/ADD OF flag calculation
`eor <reg>, <reg>, #-1` can't be encoded as an instruction. Instead use
mvn which does the same thing.

Removes a single instruction from each OF calculation for ADC and ADD.

Also no reason to use a switch statement for the source size, just use
_Bfe and calculate the offset based on operation size.

SBB caught in the crossfire to ensure it also isn't using a switch
statement.
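
Both points as a small sketch. The free functions below are stand-ins for the IR helpers, and their argument order is an assumption.

```cpp
#include <cstdint>

// Bitfield extract: Width bits of Value starting at bit Lsb.
constexpr uint64_t Bfe(uint64_t Value, unsigned Width, unsigned Lsb) {
  const uint64_t Mask = (Width >= 64) ? ~0ULL : ((1ULL << Width) - 1);
  return (Value >> Lsb) & Mask;
}

// XOR with all-ones is a bitwise NOT. AArch64 logical immediates cannot
// encode an all-ones value, so the eor form would need the constant in a
// register first, while mvn does the same job in one instruction.
constexpr uint64_t Not(uint64_t Value) { return ~Value; }
static_assert((0x1234ULL ^ ~0ULL) == Not(0x1234ULL));

// No switch over the operation size: the bit position is computed from it.
constexpr uint64_t TopBit(uint64_t Value, unsigned SizeInBytes) {
  return Bfe(Value, 1, SizeInBytes * 8 - 1);
}
```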
2023-06-07 12:40:51 -07:00
Ryan Houdek
03f73531d3 IRDumper: Fixes ssa number in arguments.
This can spuriously end up as a hex number which makes it hard to reason
why DCE wasn't deleting IR operations. Ensure it is always a decimal.
2023-06-07 09:52:04 -07:00
Ryan Houdek
a2cbfccb3b OpcodeDispatcher: Optimize EFLAG unpacking
Noticed this was slightly suboptimal. Results in an 18% code reduction
in the case of a simple four-instruction test ASM case.
2023-06-06 17:56:25 -07:00
Mai
4e01452a65
Merge pull request #2699 from Sonicadvance1/minor_fcmov_opt
X87: Super minor FCMOV optimization
2023-06-06 20:22:40 -04:00
Mai
cc7a56b1a6
Merge pull request #2689 from Sonicadvance1/fix_bmi
CPUID: Only enable BMI1 and BMI2 if AVX is supported
2023-06-06 20:21:57 -04:00
Ryan Houdek
0b0dd3891e X87: Super minor FCMOV optimization
This caught my eye as I was skimming, remove one IR op per FCMOV
instruction.

This was just duplicating the generated GPR mask across the FPR.
2023-06-04 06:39:35 -07:00
Ryan Houdek
96a0364a86 Review comments 2023-06-02 21:53:52 -07:00
Ryan Houdek
c0a783997d Convert remaining memory tracking to deferred signals 2023-06-01 11:35:22 -07:00
Ryan Houdek
f78537109d Core: Convert mtrack code invalidation over to deferred signals 2023-06-01 11:35:22 -07:00
Ryan Houdek
0c156ed6f9 Context: Switch over to deferred signals 2023-06-01 11:28:04 -07:00
Ryan Houdek
8840b2154c Allocator: Allow more optimal deferred signals path 2023-06-01 11:28:04 -07:00
Ryan Houdek
e02be8073e FEXCore: Support deferred signal mutex
This is part of FEXCore since it pulls in InternalThreadData, but is
related to the FHU signal mutex class.

Necessary to allow deferring signals in C++ code rather than right in
the JIT.
2023-06-01 11:28:04 -07:00
Ryan Houdek
f75d3550b4 Jit64: Use deferred signals in dispatcher 2023-06-01 11:28:04 -07:00
Ryan Houdek
802c588695 Arm64: Use deferred signals in dispatcher 2023-06-01 11:28:04 -07:00
Ryan Houdek
fd962f40d7 SignalDelegator: Support deferring signals 2023-06-01 11:28:04 -07:00
Ryan Houdek
a9b660af69 CoreState: Add new members to track deferred signal capability 2023-06-01 11:28:04 -07:00
Ryan Houdek
5be798e9e6
Merge pull request #2693 from Sonicadvance1/remove_debug
Context: Remove debug namespace
2023-06-01 11:26:05 -07:00
Ryan Houdek
c9d1f0d75a
Merge pull request #2687 from Sonicadvance1/telemetry_save_crash
Telemetry: Save on signal terminate
2023-05-30 10:26:03 -07:00
Ryan Houdek
1dc4f8c429 Context: Remove debug namespace
Unused and broken
2023-05-30 09:00:57 -07:00
Ryan Houdek
45d3b83143 Telemetry: Save on signal terminate
When a signal handler is not installed and is a terminal failure, make
sure to save telemetry before faulting.

We know when an application is going down in this case so we can make
sure to have the telemetry data saved.

Adds a telemetry signal mask data point as well to know which signal
took it down.
2023-05-30 08:49:33 -07:00
Ryan Houdek
c9101d3f68 CPUID: Only enable BMI1 and BMI2 if AVX is supported
These two extensions rely on AVX being supported to be used. Primarily
because they are VEX encoded.

GTA5 is using these flags to determine if it should enable its AVX
support.
2023-05-26 20:48:36 -07:00
Ryan Houdek
a6c6248bcb ArmEmitter: Fixes bug in SpillStaticRegs
Some code in FEX's Arm64 emitter was making an assumption that once
SpillStaticRegs was called that it was safe to still use the SRA
register state.
This wasn't actually true since FEX was using one SRA register to
optimize FPR stores, on the assumption that the SRA registers were safe
to use since they had just been saved and were no longer necessary.

Correct this assumption hell by forcing users of the function to provide
the temporary register directly. In all cases the users have a temporary
available that it can use.

Probably fixes some very weird edge case bugs.
2023-05-22 16:48:07 -07:00
Ryan Houdek
5646428640 FEXCore: Implements support for xgetbv
This returns the `XFEATURE_ENABLED_MASK` register which reports what
features are enabled on the CPU.
This behaves similarly to CPUID where it uses an index register in ecx.

This is a prerequisite to enabling XSAVE/XRSTOR and AVX since
applications will expect this to exist.

xsetbv is a privileged instruction and doesn't need to be implemented.
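
A rough sketch of what such an emulated read could look like, assuming only index 0 (XCR0) is modelled. The bit positions are the architectural XCR0 bits; `EmulateXGetBV` and its `supports_avx` parameter are hypothetical names, and returning an empty optional for other indices is an assumption made for illustration.

```cpp
#include <cstdint>
#include <optional>

// XCR0 (XFEATURE_ENABLED_MASK) bits as defined by the x86 architecture.
constexpr std::uint64_t kXCR0_X87 = 1ull << 0;
constexpr std::uint64_t kXCR0_SSE = 1ull << 1;
constexpr std::uint64_t kXCR0_AVX = 1ull << 2;

// Hypothetical helper: `index` is the guest's ECX value.
// Only index 0 is handled in this sketch; anything else yields nullopt.
constexpr std::optional<std::uint64_t> EmulateXGetBV(std::uint32_t index,
                                                     bool supports_avx) {
  if (index != 0) {
    return std::nullopt;
  }
  std::uint64_t mask = kXCR0_X87 | kXCR0_SSE;
  if (supports_avx) {
    mask |= kXCR0_AVX;
  }
  return mask;
}
```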
2023-05-22 16:48:07 -07:00
Ryan Houdek
6ef6d9c391 Thunks: Mostly reverts #2672
I forgot that x11 was part of the custom ABI of thunks. #2672 had broken
thunks on ARM64. I thought I had tested a game with them enabled but
apparently I tested the wrong game.

Not a full revert since we can still ldr with a literal, but we also
still need to adr x11 and nop pad. At least removes the data dependency
on x11 from the ldr.
2023-05-18 15:50:55 -07:00
Ryan Houdek
3a4a965347 TestHarnessRunner: Support exiting on HLT
Currently WINE's longjump doesn't work, so instead set a flag that if
HLT is attempted, just exit the JIT.

This will get our unittests executing at least.
2023-05-17 21:09:31 -07:00
Ryan Houdek
45cdab2ac3 HostFeatures: Use ID registers under Wine
InferFromOS doesn't work under WINE.
InferFromIDRegisters doesn't work under Windows but it will under Wine.

Since we don't support Windows, just use InferFromIDRegisters.
2023-05-17 21:07:40 -07:00
Ryan Houdek
d675b4af6f External: Update vixl 2023-05-17 21:07:40 -07:00
Ryan Houdek
363411f0c7 ArchHelpers: Adds missing stub function 2023-05-17 21:05:55 -07:00
Ryan Houdek
5bc418407c FEXCore: Disable emitter unit tests on win32 2023-05-17 21:05:55 -07:00
Ryan Houdek
61ca651fe1 FEXCore: Don't initialize ThunkHandler on Win32
Adds a couple pointer checks to ensure it won't crash.

Doesn't work and will cause assertions.
2023-05-17 21:05:55 -07:00
Mai
77e8be1215
Merge pull request #2671 from Sonicadvance1/wine_syscalls
FEXCore: Support Wine syscalls
2023-05-18 00:04:25 -04:00
Lioncache
f7c663240e OpcodeDispatcher: Handle PCMPESTRM/VPCMPESTRM
...and with that all of the SSE4.2 string instructions are implemented now
2023-05-17 00:21:55 -04:00
Lioncache
82b4aef30d OpcodeDispatcher: Handle PCMPISTRM/VPCMPISTRM 2023-05-16 22:59:54 -04:00
Lioncache
22919a5b65 OpcodeDispatcher: Add mask variant handling to PCMPXSTXOpImpl()
Will be used to handle PCMPESTRM/PCMPISTRM instruction variants.
2023-05-16 22:59:52 -04:00
Ryan Houdek
f47caf48c6
Merge pull request #2669 from Sonicadvance1/aotir_mutex
AOTIR: Stop passing a mutex around. It's already guarded
2023-05-12 18:56:55 -07:00
Ryan Houdek
5674d3a871
Merge pull request #2667 from Sonicadvance1/fextl_file
FEXCore: Convert Core and Telemetry over to fextl::file::File
2023-05-12 18:56:45 -07:00
Mai
e03b859c20
Merge pull request #2673 from Sonicadvance1/remove_warnings_13
OpcodeDispatcher: Removes a warning that cropped up.
2023-05-12 21:49:43 -04:00
Ryan Houdek
7d822ba1c8 OpcodeDispatcher: Removes a warning that cropped up. 2023-05-12 17:34:20 -07:00
Ryan Houdek
f90dcd2eb1 FEXCore: Convert Core and Telemetry over to FEXCore::File::File
This way telemetry and IR dumping can work under Wine.
2023-05-12 17:32:48 -07:00
Ryan Houdek
adbdd33ece fextl/fmt: Adds write handler for FEXCore::File::File 2023-05-12 17:32:48 -07:00
Ryan Houdek
06250d806d FEXCore/Utils: Adds File type
OS agnostic file class since we can't use std::FILE
2023-05-12 17:32:48 -07:00
Ryan Houdek
613ed559e7 Thunks: Optimize ARM64 trampoline
No need to use adr for getting the PC relative literal, we can use LDR
(literal) to load the PC relative address directly.

Reduces trampoline instructions from 3 to 2 and trampoline size
from 24 bytes to 16 bytes.
2023-05-12 17:28:36 -07:00
Ryan Houdek
8ac3841946 FEXCore: Support Wine syscalls
Wine syscalls need to end the code block at the point of the syscall.
This is because syscalls may update RIP which means the JIT loop needs
to immediately restart.

Additionally since they can update CPU state, make wine syscalls not
return a result and instead refill the register state from the CPU
state. This will mean the syscall handler will need to update their
result register (RAX?) before returning.
2023-05-12 16:42:26 -07:00
Ryan Houdek
458259bf47 FEXCore: Move EnumOperators to FEXCore
fextl needs this and can't depend on FHU
2023-05-12 15:23:00 -07:00
Ryan Houdek
2fc529d5b7 AOTIR: Stop passing a mutex around. It's already guarded 2023-05-11 03:56:33 -07:00
Ryan Houdek
ea489567da ARM64: Fixes SRA disabled codepath
Disabling SRA has been broken for quite a while. Disabling this was
instrumental in figuring out the VC redistributable crash.

Ensure it works by reintroducing non-SRA load/store register handlers,
and by supporting runtime selectable dispatch pointers for the JIT.

Side-bonus, moves the {LOAD,STORE}MEMTSO ops over to this dispatch as
well to make it consistent and probably slightly quicker.
2023-05-11 03:25:19 -07:00
Ryan Houdek
6eae064511 FEXCore: Adds support for hardware x86-TSO prctl
From https://github.com/AsahiLinux/linux/commits/bits/220-tso

This fails gracefully in the case the upstream kernel doesn't support
this feature, so can go in early.

This feature allows FEX to use hardware's TSO emulation capability to
reduce emulation overhead from our atomic/lrcpc implementation.
In the case that the TSO emulation feature is enabled in FEX, we will
check if the hardware supports this feature and then enable it.

If the hardware feature is supported it will then use regular memory
accesses with the expectation that these are x86-TSO in strength.

The only hardware that anyone cares about that supports this is Apple's
M class SoCs. Theoretically NVIDIA Denver/Carmel supports sequentially
consistent, which isn't quite the same thing. I haven't cared to check
if multithreaded SC has as strong of guarantees. But also since
Carmel/Denver hardware is fairly rare, it's hard to care about for our
use case.
2023-05-08 20:12:03 -07:00
Ryan Houdek
2d4bf97cac FEXCore: Moves SIGBUS handler to FEXCore/Utils
This can be done in an OS agnostic fashion. FEXCore knows the details of
its JIT and should be done in FEXCore itself.

The frontend is only necessary to inform FEXCore where the fault occurred
and provide the array of GPRs for accessing and modifying the signal
state.

This is necessary for supporting both Linux and Wine signal contexts
with their unaligned access handlers.
2023-05-05 17:04:26 -07:00
Mai
f7d827a26a
Merge pull request #2662 from Sonicadvance1/disable_rdtscp
CPUID: Disable RDTSCP under wine
2023-05-05 17:33:20 -04:00
Ryan Houdek
37b5bc49c6
Merge pull request #2656 from Sonicadvance1/fexcore_no_exceptions
FEXCore: Compile without exceptions
2023-05-05 14:32:37 -07:00
Ryan Houdek
dcb3f182d6 CPUID: Disable RDTSCP under wine
We don't have a sane way to query cpu index under wine. We could
technically still use the syscall since we know that we are still
executing under Linux, but that seems a bit terrible.

Disable for now until something can be worked out. Not like it is used
heavily anyway.
2023-05-05 13:52:39 -07:00
Mai
ba45bf4ae7
Merge pull request #2661 from Sonicadvance1/virtual_alloc_base
Allocator: Adds VirtualAlloc with memory Base hint function
2023-05-05 14:35:24 -04:00
Mai
121d9fda2d
Merge pull request #2660 from Sonicadvance1/arm64_win32_ra
Arm64Emitter: Replace x18 usage with x30
2023-05-05 14:35:02 -04:00
Mai
73ede9d000
Merge pull request #2659 from Sonicadvance1/save_platform_register
ARM64Emitter: Ensure platform register is saved on win32
2023-05-05 14:34:25 -04:00
Mai
6dfea8a80f
Merge pull request #2657 from Sonicadvance1/remove_unnecessary_guard
LookupCache: Removes unnecessary recursive lock_guard
2023-05-05 14:33:50 -04:00
Ryan Houdek
ef6c220a75 Allocator: Adds VirtualAlloc with memory Base hint function
This will be used with the TestHarnessRunner in the future to map
specific memory regions.

This is only used as a hint rather than exact placement with failure on
inability to map. This also hits the fun quirk of 64k allocation
granularity which developers need to be careful about.
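
The granularity quirk can be illustrated with a small helper. This is a sketch that assumes the usual 64 KiB Win32 allocation granularity; the constant and function names are hypothetical. A base hint that is not a multiple of the granularity cannot be honoured exactly, so a caller may want to round it down first.

```cpp
#include <cstdint>

// Assumed value: the typical Win32 allocation granularity (64 KiB).
constexpr std::uint64_t kAllocationGranularity = 64 * 1024;

// Hypothetical helper: rounds a base hint down to the granularity.
constexpr std::uint64_t AlignDownToGranularity(std::uint64_t addr) {
  return addr & ~(kAllocationGranularity - 1);
}

// Hypothetical helper: true if a hint can be used without rounding.
constexpr bool IsGranularityAligned(std::uint64_t addr) {
  return (addr & (kAllocationGranularity - 1)) == 0;
}
```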
2023-05-04 15:39:32 -07:00
Ryan Houdek
1e4a6d432c
Merge pull request #2658 from Sonicadvance1/remove_unused_log
LogManager: Remove unused handler
2023-05-04 15:32:04 -07:00
Ryan Houdek
4ebd180147 Arm64Emitter: Replace x18 usage with x30
Related to #2659 but not necessary directly.

Currently x30(LR) is unused in our RA. In all locations that call out to
code, we are already preserving LR and bringing it back after the fact.
This was just a missed opportunity since we aren't doing any call-ret
stack manipulations that would facilitate LR needing to stick around.

Since x18 is a reserved platform register on win32, we can replace its
usage with r19, and then replace r19 usage with x30 and everything just
works happily. Now x18 is the unused register instead of x30 and we can
come back in the future to gain one more register for RA on Linux
platforms.
2023-05-04 15:25:47 -07:00
Ryan Houdek
ac4ef63ae6 ARM64Emitter: Ensure platform register is saved on win32
Platform register stores the TEB region on win32 and needs to be
preserved if we're going to overwrite it.

Ensure we do so.
2023-05-04 15:12:52 -07:00
Ryan Houdek
b2392ef1c6 LogManager: Remove unused handler
This non-fmt handler is now entirely unused and can be removed.
2023-05-04 14:52:45 -07:00
Ryan Houdek
8e4d52396b LookupCache: Removes unnecessary recursive lock_guard
All code paths to this are already guaranteed to own the lock.

The rest of the codepaths haven't been vetted to actually need
recursive_mutex yet, but seems likely that it will be able to get
converted to a regular mutex with some more work.
2023-05-04 14:45:19 -07:00
Ryan Houdek
6eeb45b2dc FEXCore: Compile without exceptions
This disables some unwinding overhead when FEXCore is already guaranteed
to not throw.
2023-05-04 14:42:02 -07:00
Ryan Houdek
22cf2696da fextl/memory: Don't allow arrays in fextl::make_unique
This ensures we don't hit a programming error since we don't support the
array version of this.
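
One way to reject the array form at compile time is sketched below. The mechanism actually used in fextl is not stated here, so the `static_assert` approach and the `sketch` namespace are assumptions for illustration.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

namespace sketch {
// Hypothetical stand-in for fextl::make_unique: single objects only.
template <typename T, typename... Args>
std::unique_ptr<T> make_unique(Args&&... args) {
  // Instantiating this with T = U[] or U[N] fails to compile.
  static_assert(!std::is_array_v<T>, "array form is not supported");
  return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
}  // namespace sketch
```

With this in place, `sketch::make_unique<int>(42)` compiles while `sketch::make_unique<int[]>(4)` is rejected by the compiler.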
2023-05-04 14:38:12 -07:00
Alexandre Julliard
8081ac61e5 AllocatorHooks: Fix parameter order for Win32 _aligned_malloc.
The prototype is the opposite of memalign().
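
The argument-order difference is easy to get wrong, so a thin wrapper helps. The sketch below uses hypothetical wrapper names and `posix_memalign` on the non-Windows side (alignment first, like `memalign`); it is not FEX's AllocatorHooks code.

```cpp
#include <cstddef>
#include <cstdlib>
#ifdef _WIN32
#include <malloc.h>
#endif

// Hypothetical wrapper with a single, alignment-first argument order.
inline void* AlignedAlloc(std::size_t alignment, std::size_t size) {
#ifdef _WIN32
  // Win32 CRT takes (size, alignment): the reverse of memalign().
  return _aligned_malloc(size, alignment);
#else
  // POSIX takes the alignment before the size.
  void* ptr = nullptr;
  return posix_memalign(&ptr, alignment, size) == 0 ? ptr : nullptr;
#endif
}

// Memory from _aligned_malloc must go back through _aligned_free.
inline void AlignedFree(void* ptr) {
#ifdef _WIN32
  _aligned_free(ptr);
#else
  std::free(ptr);
#endif
}
```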
2023-05-03 16:15:07 +02:00
Alexandre Julliard
435b4daae1 AllocatorHooks: Pass valid parameters to the Win32 VirtualAlloc. 2023-05-03 16:13:37 +02:00
Lioncache
5ee913bc75 OpcodeDispatcher: Simplify PCMPXSTRIOpImpl
All variants of the PCMPXSTRX instructions will take their arguments in
the same manner, so we don't need to specify them for each handler.

We can also rename the function to PCMPXSTRXOpImpl, since this will
be extended to handle the masking variants of the string instructions.
2023-05-02 18:48:35 -04:00
Lioncache
f502154f96 OpcodeDispatcher: Handle VPCMPISTRI 2023-05-02 14:00:05 -04:00
Lioncache
7a59fb3e25 IR: Add IR fallback for VPCMPISTRX
Will be the fallback that handles the implicit length string instruction emulation.
2023-05-02 13:52:30 -04:00
Mai
590422b295
Merge pull request #2641 from Sonicadvance1/remove_unittest_gen
FEXCore: Stop exposing the x86 table data symbols
2023-04-26 05:09:23 -04:00
Ryan Houdek
9d268df91f Softfloat: Disable some duplicate BIGFLOAT handlers
Since mingw has its reduced precision type as double, these handlers were
duplicated and caused a compile failure.
2023-04-26 01:48:37 -07:00
Ryan Houdek
699541485d Arm64: Disable ProcessorID and Break on mingw
Currently unsupported on mingw
2023-04-26 01:48:37 -07:00
Ryan Houdek
46a63186a2 FEXCore: Name libFEXCore correctly and use sync library 2023-04-26 01:48:37 -07:00
Ryan Houdek
90f347839d InterruptableConditionVariable: Implement for mingw 2023-04-26 01:48:37 -07:00
Ryan Houdek
c9e7d9f331 FEXCore: Disable IRDumper on mingw 2023-04-26 01:48:37 -07:00
Ryan Houdek
8c3a3bfb7c FEXCore: Resolve some header includes
Some aren't necessary anymore. Some need to not exist on mingw.
2023-04-26 01:48:37 -07:00
Ryan Houdek
9034946b43 Move UContext from FEXCore to frontend.
FEXCore no longer needs this since all the signal handling is done in
the frontend.
2023-04-26 01:48:37 -07:00
Ryan Houdek
056f44be0b SignalDelegator: Moves all signal handling to the frontend
This is a very OS specific operation and it living in FEXCore doesn't
make much sense. This still requires some strong collaboration between
FEXCore and the frontend but it is now split between the locations.

There's still a bit more cleanup work that can be done after this is
merged, but we need to get this burning fire out of the way.

This is necessary for llvm-mingw, this requires all previous PRs to be
merged first.

After this is merged, most of the llvm-mingw work is complete, just some
minor cleanups.

To be merged first:
- #2602
- #2604
- #2605
- #2607
- #2610
- #2615
- #2619
- #2621
- #2622
- #2624
- #2625
- #2626
- #2627
- #2628
- #2629
2023-04-26 01:24:11 -07:00
Mai
b5420f5db3
Merge pull request #2629 from Sonicadvance1/fexcore_cmake_mingw
FEXCore: Fixup cmake file for mingw
2023-04-25 10:12:17 -04:00
Mai
c94268789b
Merge pull request #2619 from Sonicadvance1/fileloading_mingw
FileLoading: Add WIN32 specific loading path
2023-04-25 10:11:14 -04:00
Mai
af15277fc4
Merge pull request #2615 from Sonicadvance1/fhu_mingw
FHU/FS: Create WIN32 helpers for some functions.
2023-04-25 10:09:35 -04:00
Mai
86e09a00f0
Merge pull request #2610 from Sonicadvance1/mingw_virtual_alloc
AllocatorHooks: Adds some mingw allocator helpers
2023-04-25 10:08:44 -04:00
Lioncache
c94721a04b OpcodeDispatcher: Handle VPMASKMOVD/VPMASKMOVQ
We can reuse the same helper we have for handling VMASKMOVPD and VMASKMOVPS,
though we need to move some handling around to account for the fact that
VPMASKMOVD and VPMASKMOVQ 'hijack' the REX.W bit to signify the element
size of the operation.
2023-04-24 10:50:11 -04:00
Ryan Houdek
c87f361bb5 FEXCore: Stop exposing the x86 table data symbols
This was only used for the unit test fuzzing framework. Which has been
removed and unused for pretty much its entire lifespan.

These can now be internal only.
2023-04-23 09:38:03 -07:00
Mai
0fa4390e47
Merge pull request #2622 from Sonicadvance1/dispatcher_signals
Dispatcher: Disable signal handling under mingw
2023-04-21 21:43:30 -04:00
Mai
7a774a8d80
Merge pull request #2624 from Sonicadvance1/fexcore_cpuid
FEXCore: Switch to xbyak for CPUID fetch helpers.
2023-04-21 21:42:54 -04:00
Mai
361e684c64
Merge pull request #2628 from Sonicadvance1/objectcache_mingw
Disable AOT and object cache under mingw
2023-04-21 21:42:25 -04:00
Mai
4c74913edf
Merge pull request #2627 from Sonicadvance1/disable_break_mingw
Disable Break/INT operations on mingw
2023-04-21 21:42:08 -04:00
Mai
059472fcef
Merge pull request #2621 from Sonicadvance1/object_cache_packed
ObjectCache: Ensure correctly packed config option
2023-04-21 21:41:34 -04:00
Mai
4a11111abd
Merge pull request #2626 from Sonicadvance1/thunks_mingw
Thunks: Disable under mingw
2023-04-21 21:40:40 -04:00
Mai
f673afc38f
Merge pull request #2625 from Sonicadvance1/gdbserver_mingw
GdbServer: Disable under mingw
2023-04-21 21:40:24 -04:00
Mai
1fad26d72f
Merge pull request #2613 from Sonicadvance1/cpuinfo_mingw
CPUInfo: Add mingw helper for CalculateNumberOfCPUs
2023-04-21 21:39:52 -04:00
Mai
2b5ddb6b93
Merge pull request #2607 from Sonicadvance1/mingw_softflow
llvm-mingw: Fix SoftFloat compiling
2023-04-21 21:38:27 -04:00
Mai
c140dd7da8
Merge pull request #2605 from Sonicadvance1/aligned_alloc
Allocator: Ensure uses of aligned allocations use aligned_free
2023-04-21 21:38:08 -04:00
Mai
da126141d3
Merge pull request #2604 from Sonicadvance1/move_config_paths
Config: Move path generation to the frontend
2023-04-21 21:37:23 -04:00
Mai
874ae5b0fc
Merge pull request #2602 from Sonicadvance1/move_thread_creation
Threads: Moves pthread logic to FEXLoader
2023-04-21 21:36:28 -04:00
Lioncache
651c6f8ddf OpcodeDispatcher: Handle VCVTPS2PD/VCVTPD2PS 2023-04-18 10:29:57 -04:00
Lioncache
73ca4e5687 OpcodeDispatcher: Move vector float conversion to helper
Will be used for implementing the equivalent AVX instructions.
2023-04-18 10:07:30 -04:00
Lioncache
cb9cc74fcc OpcodeDispatcher: Handle AVX variants of float-to-float conversions
Adds in the handling of destination type size differences with AVX.

Also fixes cases where the SSE operations would load 128-bit vectors
from memory, rather than only loading 64-bit vectors with VCVTPS2PD.
2023-04-18 09:52:28 -04:00
Lioncache
d1116456fc OpcodeDispatcher: Handle VCVTSD2SS/VCVTSS2SD 2023-04-18 08:13:23 -04:00
Lioncache
84985952c9 OpcodeDispatcher: Factor out scalar floating-point conversion to helper
Will be used to implement the AVX variants of VCVTSD2SS and VCVTSS2SD
2023-04-18 07:16:37 -04:00
Mai
a351620c60
Merge pull request #2634 from Sonicadvance1/missing_avx
VEXTables: Adds a missing class of AVX instructions
2023-04-18 06:55:35 -04:00
Ryan Houdek
9117f7e724 VEXTables: Adds a missing class of AVX instructions
These are all AVX1, not sure how I missed this.
Sorry @lioncash, four more instructions.
2023-04-17 20:39:59 -07:00
Lioncache
8e391e7a61 Interpreter: Move PCMPESTRX fallback to VectorFallbacks
Now that OpHandlers isn't coupled to the F80 ops anymore, we can
move this over to its own file dedicated to vector fallbacks.
2023-04-17 22:57:09 -04:00
Lioncache
98fbc4a46d Interpreter: Move OpHandler struct into its own header
We can also provide a general rundown for hooking up interpreter fallbacks here for the uninitiated.
2023-04-17 22:55:02 -04:00
Lioncache
8481aeccb5 Interpreter: Move F80Ops.h into Fallback directory
We can also rename it to F80Fallbacks.h to make the file purpose a little more explicit.
2023-04-17 22:54:58 -04:00
Lioncache
b1df63f425 Interpreter: Move fallbacks into new directory
Will be used to store fallbacks and separate the definition struct from the F80 fallbacks
2023-04-17 22:05:00 -04:00
Lioncache
39c73d975b OpcodeDispatcher: Handle PCMPESTRI/VPCMPESTRI 2023-04-17 21:42:58 -04:00
Lioncache
30cb1aaaed IR: Add VPCMPESTRX fallback
In order to implement the SSE4.2 string instructions in a reasonable
manner, we can make use of a fallback implementation for the time
being.

This implementation just returns the intermediate result and leaves it
up to the function making use of it to derive the final result from said
intermediate result. This is fine, considering we have the immediate
control byte that tells us exactly what is desired as far as output
formats go.

Given that the result of this IR op will never take up more than
16-bits, we store the flags we need to set in the upper 16 bits of the
result to avoid needing to implement multiple return values in the JIT.

Also, since the IR op just returns the intermediate result, this can be
used to implement all of the explicit string instructions with a single IR op.

The implementation is pretty heavily documented to help make heads or
tails of these monster instructions.
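
The packing scheme described above can be sketched as follows. The 16/16 split comes from the description; the function names and the meaning of individual flag bits are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical view of the single 32-bit value returned by the IR op.
struct Unpacked {
  std::uint16_t intermediate;  // low 16 bits: intermediate result
  std::uint16_t flags;         // high 16 bits: flags to set
};

constexpr std::uint32_t PackResult(std::uint16_t intermediate,
                                   std::uint16_t flags) {
  return (static_cast<std::uint32_t>(flags) << 16) | intermediate;
}

constexpr Unpacked UnpackResult(std::uint32_t packed) {
  return Unpacked{static_cast<std::uint16_t>(packed & 0xFFFF),
                  static_cast<std::uint16_t>(packed >> 16)};
}
```

The consumer splits the value with one mask and one shift, which avoids needing a second return value from the op.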
2023-04-17 21:39:32 -04:00
Ryan Houdek
cbf41448fc Thunks: Disable under mingw 2023-04-17 03:10:04 -07:00
Ryan Houdek
47bdc9af12 Config: Move realpath usage to FHU 2023-04-17 03:05:25 -07:00
Ryan Houdek
0fad5b88c1 FEXCore: Fixup cmake file for mingw
- 64-bit allocator doesn't work under mingw atm.
- Can't link against libdl
- Can't have a SONAME because it is a PE, not a shared library.
2023-04-17 02:57:27 -07:00
Ryan Houdek
8c9fe0dd31 AOTIR: Disable loading and saving on mingw 2023-04-17 02:55:15 -07:00
Ryan Houdek
dda3afcfaf ObjectCache: Disable job handling on mingw
This isn't wired up anyway, but this needs to be disabled for now.
2023-04-17 02:55:11 -07:00
Ryan Houdek
34ceefb2c3 JIT64: Disable Break op on mingw
No way to handle this currently.
2023-04-17 02:54:30 -07:00
Ryan Houdek
25ef63a069 OpcodeDispatcher: Disable INT instruction entirely under mingw
Not yet able to handle this there.
2023-04-17 02:54:25 -07:00
Ryan Houdek
99a9c88f3f GdbServer: Disable under mingw
This needs to move to the frontend at some point.
2023-04-17 02:53:40 -07:00
Ryan Houdek
77f56199e8 Dispatcher: Disable signal handling under mingw
This needs some hefty reconstructing
2023-04-17 02:53:03 -07:00
Ryan Houdek
6c13b629af FEXCore: Switch to xbyak for CPUID fetch helpers.
This will use the correct `__cpuid` define, either in cpuid.h or
self-defined depending on environment.

Otherwise we would need to define our own cpuid helpers to match the
difference between mingw and linux.
2023-04-17 02:52:17 -07:00
Ryan Houdek
3ebe9f7b04 CPUInfo: Add mingw helper for CalculateNumberOfCPUs 2023-04-16 17:30:30 -07:00
Ryan Houdek
005389f8c1 llvm-mingw: Fix SoftFloat compiling 2023-04-16 00:30:28 -07:00
Mai
a33443db62
Merge pull request #2611 from Sonicadvance1/arm64_mingw
ARM64Dispatcher: Fix compiling with mingw
2023-04-16 03:30:09 -04:00
Mai
68599bf124
Merge pull request #2618 from Sonicadvance1/corestate_mingw
CoreState: Fix SynchronousFaultData padding type
2023-04-16 03:26:32 -04:00
Mai
4bffdc6345
Merge pull request #2612 from Sonicadvance1/frontend_mingw
Frontend: Remove errant header
2023-04-16 03:25:03 -04:00
Mai
cbe55b0765
Merge pull request #2616 from Sonicadvance1/telemetry_mingw
Telemetry: Disable on WIN32
2023-04-16 03:23:20 -04:00
Mai
2ff5096103
Merge pull request #2617 from Sonicadvance1/netstream_mingw
Netstream: Disable on WIN32
2023-04-16 03:23:06 -04:00
Mai
797737a84d
Merge pull request #2609 from Sonicadvance1/mingw_threadname
Threads: Adds SetThreadName helper
2023-04-16 03:22:46 -04:00
Mai
cfc1aa593b
Merge pull request #2620 from Sonicadvance1/ra_helper
RA: Use FindFirstSetBit helper
2023-04-16 03:20:09 -04:00
Ryan Houdek
6b964f70e0 CPUID: Fix std::min type cast 2023-04-16 00:09:36 -07:00
Ryan Houdek
78844ee975 ObjectCache: Ensure correctly packed config option 2023-04-15 18:41:57 -07:00
Ryan Houdek
51afcb7143 RA: Use FindFirstSetBit helper 2023-04-15 18:41:35 -07:00
Ryan Houdek
5258b1972b FileLoading: Add WIN32 specific loading path 2023-04-15 18:41:11 -07:00
Ryan Houdek
c6616d64d8 CoreState: Fix SynchronousFaultData padding type 2023-04-15 18:40:34 -07:00
Ryan Houdek
d9b9ce804b Netstream: Disable on WIN32 2023-04-15 18:40:11 -07:00
Ryan Houdek
132aa7e4d3 Telemetry: Disable on WIN32 2023-04-15 18:39:44 -07:00
Ryan Houdek
fc00a31aee Frontend: Remove errant header 2023-04-15 18:37:53 -07:00
Ryan Houdek
1de84110e8 ARM64Dispatcher: Fix compiling with mingw 2023-04-15 18:37:31 -07:00
Ryan Houdek
1962f036e1 ObjectCacheService: Use ThreadName helper 2023-04-15 18:21:42 -07:00
Ryan Houdek
879a081556 Threads: Adds SetThreadName helper 2023-04-15 18:21:42 -07:00
Ryan Houdek
105060363f FEXCore: Move mmap allocators over to VirtualAlloc 2023-04-15 18:07:54 -07:00
Ryan Houdek
bbb3a6439f AllocatorHooks: Adds some mingw allocator helpers 2023-04-15 18:07:49 -07:00
Ryan Houdek
40b67462b7 Allocator: Ensure uses of aligned allocations use aligned_free
This will be used by mingw.
2023-04-15 15:25:17 -07:00
Ryan Houdek
d853de39ff Config: Move path generation to the frontend
This lets all the path generation for the config to be in the frontend.
This then informs FEXCore where things should live.

This is for llvm-mingw. While paths aren't quite generated correctly,
this gets the code closer to compiling.
2023-04-15 15:25:01 -07:00
Ryan Houdek
75a62f856b Threads: Moves pthread logic to FEXLoader
This is not an attempt to clean up the various issues with the pthread
logic, instead just moving the pthread specific logic out of FEXCore in
to FEXLoader.

FEXCore needs to know how to create threads in an agnostic way, which is
why we obfuscate the details with this interface.

Initially this was implemented with the pthread handlers in FEXCore and
expected eventually for those to get moved to the frontend. This is the
time when it has been moved.

This is the first step towards compiling with llvm-mingw.

Still a long way to go.
2023-04-15 15:24:30 -07:00
Ryan Houdek
1a91d849f0
Merge pull request #2591 from Sonicadvance1/new_jemalloc
Add in jemalloc glibc hooking again
2023-04-14 13:32:11 -07:00
Ryan Houdek
1cc9f2107d Add in jemalloc glibc hooking again
We still need to hook glibc for thunks to work with
`IsHostHeapAllocation`.
So now we link in two jemalloc allocators in different namespaces.

As usual we have multiple heap allocators that we need to be careful about.

1. jemalloc with `je_` namespace.
  - This is FEX's regular heap allocator and what gets used for all the
    fextl objects.
  - This allocator is the one that the FEX mmap/munmap hooks hook in to
     - This mmap hooking gives this allocator the full 48-bit VA even in
       32-bit space.

2. jemalloc with `glibc_je_` namespace.
  - This is the allocator that overrides the glibc allocator routines
  - This is the allocator that thunks will use.
  - This is what `IsHostHeapAllocation` will check for.

3. Guest glibc allocator
  - We don't touch this one. But it is distinct from the host side
    allocators.
  - The guest side of thunks will use this heap allocator.

4. Host glibc allocator
  - #2 replaces this one unless explicitly disabled.
  - Always expected to override the allocator, so this configuration
    isn't expected.

Already tested this with Dota Underlords to ensure this works with
thunks.
2023-04-14 13:16:22 -07:00
Ryan Houdek
46e5343a0e External/drm: Update to v6.2 2023-04-14 13:07:52 -07:00
Ryan Houdek
f45ea1e0f6
Merge pull request #2600 from Sonicadvance1/fix_stringconv_allocate
StringUtils: Stop allocating TrimTokens
2023-04-12 03:50:20 -07:00
Ryan Houdek
92593162b0 StringUtils: Stop allocating TrimTokens
Use a string_view instead of a fextl::string so this stops allocating
memory (stack in the cases I have seen).

Fixes #2562
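
A minimal sketch of allocation-free trimming with `std::string_view`. The function name and default token set are hypothetical; this is not the actual StringUtils code.

```cpp
#include <string_view>

// Hypothetical helper: strips any character in `tokens` from both ends.
// The result is a view into the caller's buffer, so nothing is allocated.
constexpr std::string_view Trim(std::string_view s,
                                std::string_view tokens = " \t\n\r") {
  const auto first = s.find_first_not_of(tokens);
  if (first == std::string_view::npos) {
    return {};  // the input consisted only of trim tokens
  }
  const auto last = s.find_last_not_of(tokens);
  return s.substr(first, last - first + 1);
}
```

The returned view is only valid while the original buffer is alive.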
2023-04-12 03:34:58 -07:00
Ryan Houdek
5c62ea21f4
Merge pull request #2598 from Sonicadvance1/stop_leaking_avx
FEXCore: Stop leaking AVX configuration state
2023-04-11 19:57:58 -07:00
Ryan Houdek
0d0b99f344 OpcodeDispatcher: Move usages of And(Not( to Andn
Fixes #2199

Very few uses actually, we were pretty good at this already.
2023-04-11 15:35:12 -07:00
Ryan Houdek
44e06185b7 FEXCore: Stop leaking AVX configuration state
The dispatcher was saving AVX state even though FEX doesn't support it
currently. This is due to it checking for the config option rather than
the HostFeatures option.

The `EnableAVX` config option is supposed to be used to inform FEXCore
if we want AVX disabled or not when the host supports the feature. In
this case it is universally enabled because we haven't encountered any
games that have issues with AVX state being saved with signals. (We know
they exist, we just don't have configurations for them).

The HostFeatures option `SupportsAVX` is the option that is supposed to
be getting used for determining if the runtime AVX feature is enabled.
This also had an issue though that this was **also** always enabled if
running on an x86 host with AVX, or an ARM host with SVE2-256bit.
It was then disabled if the config option was disabled; But, since
FEX-Emu doesn't support AVX fully yet, we need to ensure this isn't yet
enabled.

But this only solves half the problem. In order for our CI to test AVX
features before fully supporting AVX, it needs to be able to enable AVX
so that the CPU state is correctly saved.

So we need to change the default configuration option to be false, and
have CI enable it for the tests that matter before AVX is fully
implemented.
2023-04-11 15:21:32 -07:00
Ryan Houdek
49abe8afb5 Allocator: Remove pointer indirection overhead
Every time we are calling a function in `FEXCore::Allocator::` this is a
pointer indirection. Which means on x86 it is always a `call [rdi]` and
on AArch64 it is a `ldr x17, [x0]; blr x17;`.

Instead of doing this, use inline functions in the header that call the
correct allocation function directly. This function gets inlined and is
no longer an indirect call.

When compiling with jemalloc, we forward declare the jemalloc function
definitions so we don't have to pull in the entire jemalloc interface in
to the public header definitions.
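
The difference between the two call styles can be sketched like this. `std::malloc` and `std::free` stand in for the real backing allocator, and all names in the `sketch` namespace are hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>

namespace sketch {
// Before (illustrative): every call loads a function pointer and then
// branches through it, which the compiler cannot inline.
struct Hooks {
  void* (*MallocPtr)(std::size_t) = std::malloc;
  void (*FreePtr)(void*) = std::free;
};
inline Hooks g_hooks;

// After (illustrative): header-inline wrappers that name the target
// directly, so the call site becomes a plain direct call.
inline void* Malloc(std::size_t size) { return std::malloc(size); }
inline void Free(void* ptr) { std::free(ptr); }
}  // namespace sketch
```

In the jemalloc configuration, the wrappers would call forward-declared jemalloc entry points instead of the standard library functions used here.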
2023-04-11 10:29:35 -07:00
Ryan Houdek
d7bc0370ee
Merge pull request #2592 from Sonicadvance1/remove_fwrite
fextl::fmt: Remove fwrite usage
2023-04-11 03:32:06 -07:00
Ryan Houdek
0e007d2724 StringConv: Convert to conversion functions that don't use std::string
`std::stoul` and `std::stoull` take a std::string, so the string_view was
being converted to a std::string first. Glibc fault testing caught this;
it went unnoticed until now since not much uses it.

These will be added to the documentation.
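
One allocation-free alternative is `std::from_chars`, which parses directly from a character range. Which functions this commit actually switched to is not stated here, so the sketch below is an assumption, and `ParseU64` is a hypothetical name.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parses the whole view as an unsigned integer
// without constructing a std::string.
inline std::optional<std::uint64_t> ParseU64(std::string_view s, int base = 10) {
  std::uint64_t value{};
  const char* first = s.data();
  const char* last = s.data() + s.size();
  const auto [ptr, ec] = std::from_chars(first, last, value, base);
  // Reject parse errors and trailing characters.
  if (ec != std::errc{} || ptr != last) {
    return std::nullopt;
  }
  return value;
}
```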
2023-04-10 18:56:05 -07:00
Ryan Houdek
466edf7744 fextl::fmt: Remove fwrite usage
fwrite allocates some backing memory for buffering outputs which FEX
can't track.

Switch to using `fileno` to get the fd from the FILE and write directly.
This will need to be changed for llvm-mingw support but that will come after
this.

This will be added to the documentation that we can't use fwrite.
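
The `fileno` approach can be sketched as below. This is a POSIX-only illustration with a hypothetical helper name, not the fextl::fmt implementation.

```cpp
#include <cstddef>
#include <cstdio>
#include <string_view>
#include <unistd.h>

// Hypothetical helper: writes `msg` straight to the descriptor behind
// `file`, bypassing stdio's internal buffer.
inline bool WriteUnbuffered(std::FILE* file, std::string_view msg) {
  const int fd = fileno(file);
  if (fd < 0) {
    return false;
  }
  while (!msg.empty()) {
    const ssize_t written = ::write(fd, msg.data(), msg.size());
    if (written < 0) {
      return false;  // a fuller version might retry on EINTR
    }
    msg.remove_prefix(static_cast<std::size_t>(written));
  }
  return true;
}
```

Mixing this with buffered stdio calls on the same `FILE` can reorder output, so a caller would normally pick one path per stream.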
2023-04-10 17:43:47 -07:00
Ryan Houdek
0bb59ade53 Updates jemalloc
Needed to change some symbol names due to proper jemalloc namespacing
now.
2023-04-10 16:21:33 -07:00
Ryan Houdek
1864a1d3b5 fextl::fmt: Add print with std::FILE* handler
This is to match prior behaviour, but it is so far untested whether
fwrite itself allocates any memory.
2023-04-10 15:38:55 -07:00
Ryan Houdek
e98a46aa5f Review comments 2023-04-07 17:01:53 -07:00
Ryan Houdek
265c918d90 Move fextl::String_from_path to the only usage in FEXConfig
Ensures that people won't be tempted to use this elsewhere.
2023-04-07 17:01:53 -07:00
Ryan Houdek
46b306e861 Config: Remove to_string usage 2023-04-07 17:01:52 -07:00
Ryan Houdek
32d7fae373 GdbServer: Convert to_string usage 2023-04-07 17:01:52 -07:00
Ryan Houdek
11402b637a fextl: Remove now unused string_from_string 2023-04-07 17:01:52 -07:00
Ryan Houdek
f5ed9c4ff3 CodeCache: Convert std::fs to FHU 2023-04-07 17:01:52 -07:00
Ryan Houdek
1306e597dd CodeSerialize: Convert unique_ptr to fextl 2023-04-07 17:01:52 -07:00
Ryan Houdek
4d70f4fc4e Remove some unused headers now. 2023-04-07 17:01:52 -07:00
Ryan Houdek
bb922f9a9e External: Update robin-map 2023-04-07 17:01:52 -07:00
Ryan Houdek
4ab822aebb IRParser: Convert to fextl 2023-04-07 17:01:52 -07:00
Ryan Houdek
efada6a0ea Config: Convert some std::filesystem to FHU 2023-04-07 17:01:52 -07:00
Ryan Houdek
2d18156e15 AOT: Convert fstream to fextl and raw files 2023-04-07 17:01:52 -07:00
Ryan Houdek
60fe987e09 NetStream: Add operator new/delete because of raw pointer usage. 2023-04-07 17:01:51 -07:00
Ryan Houdek
7180bb1496 GdbServer: Convert fstream to fextl 2023-04-07 17:01:51 -07:00
Ryan Houdek
001a086d85 Convert remaining fmt::format to fextl 2023-04-07 17:01:51 -07:00
Ryan Houdek
8a711383bb fextl/fmt: Adds print helper that takes FD 2023-04-07 17:01:51 -07:00
Ryan Houdek
257a3a54dc Context: Convert over to a unique_ptr 2023-04-07 17:01:51 -07:00
Ryan Houdek
e4fadd6992
Merge pull request #2587 from Sonicadvance1/disable_sbrk
Allocator: Disable glibc sbrk allocations
2023-04-07 17:01:19 -07:00
Ryan Houdek
9e5971b89c Allocator: Disable glibc sbrk allocations
This is done by consuming a single page at the end of the current sbrk
memory region, then consuming any remaining bytes that could have
potentially ended up in it.

This ensures that glibc won't be able to return 64-bit pointers to
32-bit thunks once the remaining work is in place.
2023-04-06 12:27:44 -07:00
André Zwing
f944709139 Dispatcher: Fixes restoring of AVX state 2023-04-05 21:15:05 +02:00
Ryan Houdek
aac4e25ca4
Merge pull request #2549 from Sonicadvance1/glibc_remaining_allocations
Move FEX away from the remaining glibc allocations that we can
2023-04-01 09:46:29 -07:00
Ryan Houdek
546a1edb55 CodeReview 2023-04-01 09:27:01 -07:00
Tony Wasserka
21838fe03f
Merge pull request #2574 from neobrain/feature_thunk_wayland
Add support for thunking Wayland
2023-04-01 17:15:49 +02:00
Tony Wasserka
99ba648a71 Thunks: Fix thunking libraries with "-" in their name
The LOAD_LIB and EXPORTS macros behave slightly differently in this regard:
* Use LOAD_LIB(libwayland-client) in Guest.cpp (library name with dash)
* Use EXPORTS(libwayland_client) in Host.cpp (library name with underscore)
2023-04-01 16:52:49 +02:00
Ryan Houdek
97daec3dba Review comments 2023-03-31 06:03:06 -07:00
Ryan Houdek
2990a9d820 FaultingAllocator: Review comments 2023-03-30 16:28:34 -07:00
Ryan Houdek
53bbbd5a4f Review code 2023-03-30 16:28:34 -07:00
Ryan Houdek
047dddb023 Rebase patching 2023-03-30 16:28:34 -07:00
Ryan Houdek
4f66ff6ec4 Paths: Remove unique_ptr usage 2023-03-30 16:28:34 -07:00
Ryan Houdek
067f807405 InternalThreadState: Convert tsl to fextl 2023-03-30 16:28:34 -07:00
Ryan Houdek
463b4b748c fextl: Add fextl::fmt::print 2023-03-30 16:28:34 -07:00
Ryan Houdek
3eae668cec X86Jit: Fix xbyak allocating through glibc 2023-03-30 16:28:34 -07:00
Ryan Houdek
fc8bf9f0f6 Config: Remove static which is allocated 2023-03-30 16:28:34 -07:00
Ryan Houdek
3cfc1de410 Common: Convert cpp-optparse over to fextl and use. 2023-03-30 16:28:34 -07:00
Ryan Houdek
a14353bc3c FEXLoader: Convert remaining usages away from glibc 2023-03-30 16:28:34 -07:00
Ryan Houdek
629e547e5f AllocatorOverride: Remove AFmt, it can try allocating memory and infinite loop. 2023-03-30 16:28:34 -07:00
Ryan Houdek
bbd0d26c16 Telemetry: Convert to FHU to remove glibc 2023-03-30 16:28:34 -07:00
Ryan Houdek
1eac7e7105 AOTIR: Convert to FHU to remove glibc 2023-03-30 16:28:34 -07:00
Ryan Houdek
1f9458a3c3 Config: Convert to FHU to remove glibc 2023-03-30 16:28:34 -07:00
Ryan Houdek
a3be4b77fa Paths: Convert to FHU to remove glibc 2023-03-30 16:28:33 -07:00
Ryan Houdek
b2ec28503d LookupCache: Move over to fextl::pmr 2023-03-30 16:28:33 -07:00
Ryan Houdek
170c9ee9e4 LookupCache: Switch to fextl 2023-03-30 16:28:33 -07:00
Ryan Houdek
1eb36b8b31 Convert a ton of things over to fextl 2023-03-30 16:28:33 -07:00
Ryan Houdek
465ecd9b19 Mark code regions that require glibc memory allocations.
This ensures that when we enable glibc fault testing these sections
won't break CI.
2023-03-30 16:28:33 -07:00
Mai
df354e37dd
Merge pull request #2578 from Sonicadvance1/support_salc
OpcodeDispatcher: Implement support for 32-bit SALC instruction
2023-03-30 18:12:15 -04:00
Ryan Houdek
43e6d398b6 X86Dispatcher: Move xbyak to custom types 2023-03-30 08:49:26 -07:00
Ryan Houdek
0d7c856775 Update xbyak 2023-03-30 08:49:26 -07:00
Ryan Houdek
141dddc83e CMake: Adds glibc allocator fault option
This will be used for CI to ensure FEX doesn't use the glibc allocator
2023-03-30 08:49:26 -07:00
Ryan Houdek
64aa3bfabe Switch FEX to fextl::fmt 2023-03-30 08:49:26 -07:00
Ryan Houdek
f02a111d33 fextl: add memory for unique_ptr and make_unique 2023-03-30 08:49:26 -07:00
Ryan Houdek
79f7baffe3 fextl: Add pmr default resource 2023-03-30 08:49:26 -07:00
Ryan Houdek
7150c532f3 fextl: add robin_map 2023-03-30 08:49:26 -07:00
Ryan Houdek
e48fb1850e fextl: add unordered_multimap 2023-03-30 08:49:26 -07:00
Mai
88dba60bee
Merge pull request #2579 from Sonicadvance1/invalid_instruction_log
Core: Add a new log message for unsupported instruction
2023-03-29 22:52:08 -04:00
Ryan Houdek
d615ae9c6a Core: Add a new log message for unsupported instruction
The previous log in the frontend is super useful when an instruction
decoding wasn't supported.
Now that most of AVX is covered, a game will crash on SIGILL (and
usually catch it) and close without any indication.

Now if the instruction is decoded but it is invalid for the
configuration, still output a message as a good indicator that the game
is using instructions that the host doesn't support.

Will let us still pick up on games crashing due to lack of SVE very
easily.
2023-03-29 14:38:16 -07:00
Ryan Houdek
7629edcf61 OpcodeDispatcher: Implement support for 32-bit SALC instruction
This is an undocumented but supported instruction. It behaves just like
an `sbb al, al` but doesn't set flags and is one byte shorter.

The end result is that al is set to 0xFF or 0 depending on if CF is set
or not.
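
The behaviour described above can be modelled in a few lines. This is a
standalone illustration with hypothetical function names, not the
OpcodeDispatcher implementation.

```cpp
#include <cstdint>

// SALC: AL becomes 0xFF when CF is set and 0x00 otherwise; flags are
// left untouched.
constexpr std::uint8_t salc(bool carry_flag) {
  return carry_flag ? 0xFF : 0x00;
}

// The value `sbb al, al` computes: AL - AL - CF, truncated to 8 bits.
// The real instruction also updates flags, which this model ignores.
constexpr std::uint8_t sbb_al_al(std::uint8_t al, bool carry_flag) {
  return static_cast<std::uint8_t>(al - al - (carry_flag ? 1 : 0));
}

// The result does not depend on the previous value of AL.
static_assert(salc(true) == sbb_al_al(0x5A, true));
static_assert(salc(false) == sbb_al_al(0x5A, false));
```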
2023-03-29 14:34:51 -07:00
Lioncache
73d250c555 ARMEmitter: Handle SVE2 integer add/subtract wide category 2023-03-29 17:02:18 -04:00
Lioncache
337f8b06a3 ARMEmitter: Convert SVE2 integer multiply long to wide helper
Unifies the emitted ops under the same underlying emitter function.
2023-03-29 16:52:37 -04:00
Lioncache
87fa545bd0 ARMEmitter: Convert SVE2 integer add/subtract long to wide helper
The generic helper will be used to implement the remaining unimplemented
category from this group
2023-03-29 16:52:34 -04:00
Ryan Houdek
7747ac8de8
Merge pull request #2576 from lioncash/mul
ARMEmitter: Handle SVE Integer Multiply-Add - Unpredicated group
2023-03-29 13:14:29 -07:00
Lioncache
deb1c9e933 ARMEmitter: Handle SVE mixed sign dot product category 2023-03-29 15:43:07 -04:00
Lioncache
672a88395d ARMEmitter: Handle SVE2 saturating multiply-add high category 2023-03-29 15:38:38 -04:00
Lioncache
88524ce718 ARMEmitter: Handle SVE2 saturating multiply-add long category 2023-03-29 15:33:49 -04:00
Lioncache
bb153054f9 ARMEmitter: Handle SVE2 integer multiply-add long category 2023-03-29 15:28:32 -04:00
Lioncache
22a7a49042 ARMEmitter: Handle SVE2 complex integer multiply-add 2023-03-29 15:19:09 -04:00
Lioncache
9876f3eb5c ARMEmitter: Handle SVE2 saturating multiply-add interleaved long category 2023-03-29 15:08:30 -04:00
Lioncache
b9e4ce4029 ARMEmitter: Handle SVE integer dot product (unpredicated) category 2023-03-29 14:55:06 -04:00
Lioncache
0c048772e0 ARMEmitter: Handle CDOT (vectors) 2023-03-29 14:54:35 -04:00
Lioncache
830c1884d1 OpcodeDispatcher: Handle store variants of VMASKMOVPD/VMASKMOVPS
And with that, we support all of the AVX1-only instructions.

The remaining instructions for full AVX1 support are now just the SSE4.2
string instructions.
2023-03-29 14:03:23 -04:00
Lioncache
5abf9de8a5 IR: Add VStoreVectorMasked IR op
Will be used to implement the store variants of VPMASKMOV and
VMASKMOVP{D, S}
2023-03-29 14:03:20 -04:00
Lioncache
25960fe6b1 OpcodeDispatcher: Handle load variants of VMASKMOVP{D, S} 2023-03-28 10:35:23 -04:00
Lioncache
eb8626c1f7 IR: Add VLoadVectorMasked IR op
Will be used to implement the load variants of VMASKMOVP{D, S} and
VPMASKMOV{D, Q}

Particularly useful, since with SVE this behavior can be collapsed into
two instructions (CMPGT followed by the relevant LD1 load instruction)
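
For reference, a scalar model of the x86 masked-load semantics for 32-bit
elements. It is a sketch only: `masked_load` is a hypothetical name, and
the IR op itself is not shown here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Each destination element is loaded from memory only if the sign bit of
// the corresponding mask element is set; otherwise it is zeroed.
// Masked-out elements are never read from memory.
template <std::size_t N>
std::array<std::uint32_t, N> masked_load(const std::uint32_t* memory,
                                         const std::array<std::uint32_t, N>& mask) {
  std::array<std::uint32_t, N> result{}; // Zero-initialised.
  for (std::size_t i = 0; i < N; ++i) {
    if ((mask[i] & 0x8000'0000u) != 0) {
      result[i] = memory[i];
    }
  }
  return result;
}
```

Selecting on the sign bit is the same as testing whether the mask
element is negative when read as a signed integer, which is presumably
the comparison the CMPGT step performs to build the SVE predicate.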
2023-03-28 01:57:25 -04:00
Lioncache
ef7853ca4a ARMEmitter: Fix treating 32-bit elements as 64-bit with ld1w
These conditionals were accidentally inverted and were treating 32-bit
elements as 64-bit ones, when this is unintended.

Also add missing tests to ensure this doesn't slip through in the
future.
2023-03-28 00:55:04 -04:00
Ryan Houdek
55d65f3aea
Merge pull request #2570 from lioncash/mov
Arm64/VectorOps: Remove a few unnecessary EORs from comparisons in SVE path
2023-03-27 13:35:48 -07:00
Ryan Houdek
2c2abc550b
Merge pull request #2569 from lioncash/psadbw
OpcodeDispatcher: Handle VMPSADBW
2023-03-27 13:31:52 -07:00
Lioncache
a1dc132f03 Arm64/VectorOps: Eliminate unnecessary EOR and MOV in FP compares
We can use the zeroing variant of MOVPRFX to perform the same behavior.
2023-03-27 16:16:28 -04:00
Lioncache
ef31e0c7c7 OpcodeDispatcher: Handle VMPSADBW 2023-03-27 16:00:24 -04:00
Lioncache
eecd016ba8 Arm64/VectorOps: Eliminate unnecessary EOR and MOV in VCMP{EQ,GT}
We can use the zeroing version of MOVPRFX to perform the same behavior.
2023-03-27 15:38:03 -04:00
Lioncache
b2ec6d5208 OpcodeDispatcher: Move MPSADBW implementation into helper
This will be used for implementing the AVX variant of this instruction.
2023-03-27 12:40:28 -04:00
Lioncache
416a7b825d ARMEmitter: Move SVE2IntegerSaturatingAddSub over to using SVE2IntegerPredicated helper
Deduplicates some code.
2023-03-27 12:21:42 -04:00
Lioncache
c3511ffa48 ARMEmitter: Move SVEIntegerPairwiseArithmetic over to using SVE2IntegerPredicated helper
Deduplicates some code.
2023-03-27 12:21:42 -04:00
Lioncache
b9c3277b09 ARMEmitter: Move SVE2IntegerHalvingPredicated over to using SVE2IntegerPredicated helper
Deduplicates some code.
2023-03-27 12:21:38 -04:00