6759 Commits

Author SHA1 Message Date
Elias James Howell
b953433404 fix spelling errors
Fixing some minor spelling errors which should not affect functionality but improve the overall quality of documentation.
2023-07-13 11:23:59 -04:00
Ryan Houdek
9722c4c5a4
Merge pull request #2766 from alyssarosenzweig/flags/add-of
OpcodeDispatcher: Optimize ADD/ADC OF flag packing
2023-07-12 15:47:21 -07:00
Alyssa Rosenzweig
e8c0e19afc OpcodeDispatcher: "Calculcate" -> "Calculate"
Typofix.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-07-12 18:07:04 -04:00
Alyssa Rosenzweig
c559fec959 OpcodeDispatcher: Optimize ADD/ADC OF flag packing
We can fold the Not into the And. This requires flipping the arguments
to Andn, but we do not flip the order of the assignments since that
requires an extra register in a test I'm looking at.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-07-12 18:06:36 -04:00
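As a rough illustration of the fold described in c559fec959: the overflow flag of an addition can be written as `(a ^ res) & ~(a ^ b)`, which maps onto a single and-not operation instead of a Not followed by an And. The sketch below is a standalone C++ model; the helper names `Andn` and `AddOverflow32` are hypothetical and are not FEX's IR API.

```cpp
#include <cstdint>

// Hypothetical helper modelling an and-not operation: lhs & ~rhs.
constexpr uint32_t Andn(uint32_t lhs, uint32_t rhs) {
  return lhs & ~rhs;
}

// OF for a 32-bit add: set when both inputs share a sign bit and the
// result's sign bit differs from them.
constexpr bool AddOverflow32(uint32_t a, uint32_t b) {
  const uint32_t res = a + b;
  // Unfolded form: Not(a ^ b) followed by And with (a ^ res).
  // Folded form: one Andn with the arguments arranged as below.
  const uint32_t of = Andn(a ^ res, a ^ b);
  return (of >> 31) != 0;
}
```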
Alyssa Rosenzweig
8d2fabe705 OpcodeDispatcher: Deduplicate ADD/ADC OF generation
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-07-12 18:06:36 -04:00
Mai
6455c4817a
Merge pull request #2764 from Sonicadvance1/remove_unused_tls
FEXCore: Removes unused TLS variable
2023-07-12 17:25:08 -04:00
Ryan Houdek
5dbd1b8dc2 FEXCore: Removes unused TLS variable
Not sure why this still existed.
2023-07-12 13:05:47 -07:00
Mai
7765bbc7b8
Merge pull request #2762 from Sonicadvance1/FEXRootFSFetcherPercent
FEXRootFSFetcher: Make verification percent easier to read
2023-07-12 15:34:07 -04:00
Mai
2283c73fae
Merge pull request #2742 from Sonicadvance1/fix_win32
FEXCore: Fixes WIN32 compiling again
2023-07-12 15:33:46 -04:00
Ryan Houdek
5fef0c29aa FEXCore: Rename Telemetry helper function GetObject
WIN32 already has a define called `GetObject`, which causes our
symbol to have an A appended to it and breaks linking.

Just rename it to `GetTelemetryValue`.
2023-07-12 11:53:13 -07:00
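The name collision described in 5fef0c29aa can be reproduced without any Windows headers. The sketch below defines a stand-in macro itself (an assumption made purely for illustration) and uses `__func__` to show which name the compiler actually sees.

```cpp
#include <string>

// Stand-in for the platform define mentioned in the commit message.
// This macro is written here for illustration only.
#define GetObject GetObjectA

namespace Telemetry {
// After preprocessing, this function is really named GetObjectA.
inline std::string GetObject() {
  return __func__;
}

// A name that no platform macro rewrites keeps its symbol stable.
inline std::string GetTelemetryValue() {
  return __func__;
}
} // namespace Telemetry
```

A translation unit that does not see the macro would look for `GetObject` while this one emits `GetObjectA`, which is the kind of mismatch that breaks linking.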
Ryan Houdek
d387c46aab FEXCore: Fixes WIN32 compiling again
Mostly a quick bandage while I'm getting ready to set up the
runners to test this for us.
2023-07-12 11:53:13 -07:00
Ryan Houdek
3bb7f9d6b5 FEXRootFSFetcher: Make verification percent easier to read
Multiply it by 100 to actually show as a percentage, only show two
digits past the decimal, and update every second to be more responsive.
2023-07-11 20:03:06 -07:00
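A minimal sketch of the formatting change described in 3bb7f9d6b5, assuming the progress value is a fraction between 0 and 1. The function name `FormatPercent` is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Formats a completion fraction in [0, 1] as a percentage with two
// digits past the decimal point.
inline std::string FormatPercent(double fraction) {
  char buffer[32];
  std::snprintf(buffer, sizeof(buffer), "%.2f%%", fraction * 100.0);
  return buffer;
}
```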
Ryan Houdek
70d54122b2
Merge pull request #2761 from bylaws/main
FHU: Fix WIN32 getcpu implementation
2023-07-11 17:37:06 -07:00
Billy Laws
4e266cf9fd FHU: Fix WIN32 getcpu implementation
Both parameters to getcpu are nullable and FEX relies upon this for
CPUID.
2023-07-12 00:02:49 +01:00
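The nullable-parameter requirement from 4e266cf9fd can be sketched as follows. `GetCPUShim` is a hypothetical stand-in that takes the current CPU and node as plain arguments; it is not the FHU implementation.

```cpp
#include <cstdint>

// Hypothetical getcpu-style shim: either out-parameter may be null, in
// which case that value is simply not written.
inline int GetCPUShim(uint32_t *cpu, uint32_t *node,
                      uint32_t current_cpu, uint32_t current_node) {
  if (cpu != nullptr) {
    *cpu = current_cpu;
  }
  if (node != nullptr) {
    *node = current_node;
  }
  return 0;
}
```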
Mai
ddd6dbfdcc
Merge pull request #2759 from Sonicadvance1/redundant_bfe_flags
OpcodeDispatcher: Remove spurious bfe with flag storing
2023-07-10 22:19:21 -04:00
Mai
7f2557e322
Merge pull request #2757 from Sonicadvance1/optimize_movss_reg
OpcodeDispatcher: Optimize MOVSS to register
2023-07-10 21:22:47 -04:00
Mai
810c7d926c
Merge pull request #2758 from Sonicadvance1/optimize_tso_vector_loadstores
IR: Optimize vector TSO loadstore address calculation
2023-07-10 21:21:46 -04:00
Ryan Houdek
04c325661c OpcodeDispatcher: Remove spurious bfe with flag storing
Noticed during introspection that we were generating zero constants
redundantly. Bunch of single cycle hits or zero-register renames.

Every time a `SetRFLAG` helper was called, it was /always/ doing a BFE
on everything passed in to extract the lowest bit. In nearly all cases
the data getting passed in is already only the lowest bit.

Instead, stop the helper from doing this BFE, and ensure the
OpcodeDispatcher does BFE in the couple of cases it still needs to do.

As I was skimming through all these to ensure BFE isn't necessary, I did
notice that some of the BCD instructions are wrong or questionable. So I
left a comment on those so we can come back to it.
2023-07-10 18:03:23 -07:00
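To see why the BFE removed in 04c325661c was usually redundant, consider a plain C++ model of a bitfield extract. `Bfe` below is a hypothetical helper written for this sketch, not the FEX IR op; it assumes `lsb` is less than 64.

```cpp
#include <cstdint>

// Bitfield extract: returns `width` bits of `value` starting at bit `lsb`.
constexpr uint64_t Bfe(uint64_t value, unsigned width, unsigned lsb) {
  const uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);
  return (value >> lsb) & mask;
}

// When the input is already 0 or 1, extracting the lowest bit is a no-op.
static_assert(Bfe(0, 1, 0) == 0, "0 stays 0");
static_assert(Bfe(1, 1, 0) == 1, "1 stays 1");
```

The extract only matters for callers that pass wider values, which corresponds to the few cases where the commit keeps the BFE in the dispatcher.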
Mai
0121e858f2
Merge pull request #2756 from Sonicadvance1/movss_optimize
OpcodeDispatcher: Optimize MOVSS to memory destination
2023-07-10 18:48:13 -04:00
Mai
2c3361be9e
Merge pull request #2755 from Sonicadvance1/stop_installing_fmt
CMake: Stop installing fmt
2023-07-10 18:47:30 -04:00
Ryan Houdek
2d800b2627 IR: Optimize vector TSO loadstore address calculation
These address calculations were failing to understand that they can be
optimized. When TSO emulation is disabled these were fine, but with TSO
we were eating one more instruction.

Before:
```
add x20, x12, #0x4 (4)
dmb ish
ldr s16, [x20]
dmb ish
```

After:
```
dmb ish
ldr s16, [x12, #4]
dmb ish
```

Also left a note that once LRCPC3 is supported in hardware we can do a similar optimization there.
2023-07-10 15:21:46 -07:00
Ryan Houdek
55ed3e0549 OpcodeDispatcher: Optimize MOVSS to register
Easily fixed. Found through inspection.

Before:
```
eor v0.16b, v0.16b, v0.16b
mov v0.s[0], v17.s[0]
mov v4.16b, v0.16b
mov v16.s[0], v4.s[0]
```

After:
```
mov v16.s[0], v17.s[0]
```
2023-07-10 14:36:27 -07:00
Ryan Houdek
55d084ebb0 OpcodeDispatcher: Optimize MOVSS to memory destination
Easily fixed. Found through inspection.

Before:
```
eor v0.16b, v0.16b, v0.16b
mov v0.s[0], v16.s[0]
mov v4.16b, v0.16b
str s4, [x11]
```

After:
```
str s16, [x11]
```
2023-07-10 14:25:01 -07:00
Ryan Houdek
457dc5dd90 CMake: Stop installing fmt
Fixes #2751

Luckily fmt provides an option to disable this.
2023-07-10 12:26:39 -07:00
Mai
98eda5e163
Merge pull request #2749 from Sonicadvance1/optimize_away_redundant_masks
OpcodeDispatcher: Optimize some shifts size masking
2023-07-10 08:08:57 -04:00
Ryan Houdek
592935790e
Merge pull request #2750 from Sonicadvance1/fix_pcmpestri
OpcodeDispatcher: Fixes bug with pcmpestri
2023-07-08 18:47:38 -07:00
Ryan Houdek
92d0344d6a OpcodeDispatcher: Fixes bug with pcmpestri
When this instruction returns the index into the ecx register, this is
defined as a 32-bit result. This means it actually gets zero-extended to
the full 64-bit GPR size on 64-bit processes.
Previously FEX was doing a 32-bit insert, which leaves garbage data in
the upper 32 bits of the RCX register.

Adds a unit test to ensure the result is zero extended.
Fixes running Java games under FEX now that SSE4.2 is exposed.
2023-07-08 18:08:47 -07:00
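The difference between the old and new behaviour in 92d0344d6a can be modelled on plain integers. Both helper names below are hypothetical.

```cpp
#include <cstdint>

// Old behaviour per the commit text: a 32-bit insert keeps whatever was
// in the upper half of RCX.
constexpr uint64_t Insert32(uint64_t rcx, uint32_t index) {
  return (rcx & 0xFFFFFFFF00000000ULL) | index;
}

// Fixed behaviour: a 32-bit result written to ECX is zero-extended to
// the full 64-bit register.
constexpr uint64_t WriteZeroExtended32(uint32_t index) {
  return static_cast<uint64_t>(index);
}
```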
Ryan Houdek
9327435f97 OpcodeDispatcher: Optimize some shifts size masking
Inspired from #2561, these shifts don't need to be masked if we know
their operating size up front.

Causes a handful of these to become more optimal.
2023-07-08 16:41:15 -07:00
Mai
573f339647
Merge pull request #2748 from Sonicadvance1/fix_missing_header
unittests: Adds missing header
2023-07-08 18:11:33 -04:00
Ryan Houdek
c1f18951ab unittests: Adds missing header
Newer libstdc++ moved an internal header include, and this now fails to
compile.
2023-07-08 14:50:10 -07:00
Mai
8a4bfba47c
Merge pull request #2745 from Sonicadvance1/optimize_fcmov
OpcodeDispatcher: Optimize GetPackedRFLAG
2023-07-07 22:29:52 -04:00
Mai
69ea03f0eb
Merge pull request #2746 from Sonicadvance1/optimize_maskmov
OpcodeDispatcher: Optimize MASKMOVDQU and MASKMOVQ
2023-07-07 22:29:37 -04:00
Mai
462feec2a6
Merge pull request #2743 from Sonicadvance1/minor_cleanup
FEXCore: Minor cleanup
2023-07-07 22:25:53 -04:00
Ryan Houdek
15f5fe658b OpcodeDispatcher: Optimize MASKMOVDQU and MASKMOVQ
The previous implementation was particularly gnarly. Because these
instructions are both weakly ordered and have implementation-dependent
exception and trap behaviour, they can actually be fairly conveniently
converted over to a load + cmlt + bsl + str sequence.

For the XMM variant this reduces code blowup from 80x to 15x!
For the MMX variant this reduces code blowup from 46x to 17x!

Both of these improvements are significant wins! There's still a
minor improvement possible: bsl currently requires some
redundant moves, but since we don't have constraint support for this we
still eat two additional instructions.

Before:
```asm
0x0000ffff7b800718  10ffffe0    adr x0, #-0x4 (addr 0xffff7b800714)
0x0000ffff7b80071c  f9005f80    str x0, [x28, #184]
0x0000ffff7b800720  4eb11e24    mov v4.16b, v17.16b
0x0000ffff7b800724  4eb01e05    mov v5.16b, v16.16b
0x0000ffff7b800728  aa0b03f4    mov x20, x11
0x0000ffff7b80072c  4e083c95    mov x21, v4.d[0]
0x0000ffff7b800730  4e083cb6    mov x22, v5.d[0]
0x0000ffff7b800734  d3471eb7    ubfx x23, x21, #7, #1
0x0000ffff7b800738  b4000077    cbz x23, #+0xc (addr 0xffff7b800744)
0x0000ffff7b80073c  d3401ed7    uxtb x23, w22
0x0000ffff7b800740  39000297    strb w23, [x20]
0x0000ffff7b800744  d34f3eb7    ubfx x23, x21, #15, #1
0x0000ffff7b800748  b4000077    cbz x23, #+0xc (addr 0xffff7b800754)
0x0000ffff7b80074c  d3483ed7    ubfx x23, x22, #8, #8
0x0000ffff7b800750  39000697    strb w23, [x20, #1]
0x0000ffff7b800754  d3575eb7    ubfx x23, x21, #23, #1
0x0000ffff7b800758  b4000077    cbz x23, #+0xc (addr 0xffff7b800764)
0x0000ffff7b80075c  d3505ed7    ubfx x23, x22, #16, #8
0x0000ffff7b800760  39000a97    strb w23, [x20, #2]
0x0000ffff7b800764  d35f7eb7    ubfx x23, x21, #31, #1
0x0000ffff7b800768  b4000077    cbz x23, #+0xc (addr 0xffff7b800774)
0x0000ffff7b80076c  d3587ed7    ubfx x23, x22, #24, #8
0x0000ffff7b800770  39000e97    strb w23, [x20, #3]
0x0000ffff7b800774  d3679eb7    ubfx x23, x21, #39, #1
0x0000ffff7b800778  b4000077    cbz x23, #+0xc (addr 0xffff7b800784)
0x0000ffff7b80077c  d3609ed7    ubfx x23, x22, #32, #8
0x0000ffff7b800780  39001297    strb w23, [x20, #4]
0x0000ffff7b800784  d36fbeb7    ubfx x23, x21, #47, #1
0x0000ffff7b800788  b4000077    cbz x23, #+0xc (addr 0xffff7b800794)
0x0000ffff7b80078c  d368bed7    ubfx x23, x22, #40, #8
0x0000ffff7b800790  39001697    strb w23, [x20, #5]
0x0000ffff7b800794  d377deb7    ubfx x23, x21, #55, #1
0x0000ffff7b800798  b4000077    cbz x23, #+0xc (addr 0xffff7b8007a4)
0x0000ffff7b80079c  d370ded7    ubfx x23, x22, #48, #8
0x0000ffff7b8007a0  39001a97    strb w23, [x20, #6]
0x0000ffff7b8007a4  d37ffeb5    lsr x21, x21, #63
0x0000ffff7b8007a8  b4000075    cbz x21, #+0xc (addr 0xffff7b8007b4)
0x0000ffff7b8007ac  d378fed5    lsr x21, x22, #56
0x0000ffff7b8007b0  39001e95    strb w21, [x20, #7]
0x0000ffff7b8007b4  4e183c95    mov x21, v4.d[1]
0x0000ffff7b8007b8  4e183cb6    mov x22, v5.d[1]
0x0000ffff7b8007bc  d3471eb7    ubfx x23, x21, #7, #1
0x0000ffff7b8007c0  b4000077    cbz x23, #+0xc (addr 0xffff7b8007cc)
0x0000ffff7b8007c4  d3401ed7    uxtb x23, w22
0x0000ffff7b8007c8  39002297    strb w23, [x20, #8]
0x0000ffff7b8007cc  d34f3eb7    ubfx x23, x21, #15, #1
0x0000ffff7b8007d0  b4000077    cbz x23, #+0xc (addr 0xffff7b8007dc)
0x0000ffff7b8007d4  d3483ed7    ubfx x23, x22, #8, #8
0x0000ffff7b8007d8  39002697    strb w23, [x20, #9]
0x0000ffff7b8007dc  d3575eb7    ubfx x23, x21, #23, #1
0x0000ffff7b8007e0  b4000077    cbz x23, #+0xc (addr 0xffff7b8007ec)
0x0000ffff7b8007e4  d3505ed7    ubfx x23, x22, #16, #8
0x0000ffff7b8007e8  39002a97    strb w23, [x20, #10]
0x0000ffff7b8007ec  d35f7eb7    ubfx x23, x21, #31, #1
0x0000ffff7b8007f0  b4000077    cbz x23, #+0xc (addr 0xffff7b8007fc)
0x0000ffff7b8007f4  d3587ed7    ubfx x23, x22, #24, #8
0x0000ffff7b8007f8  39002e97    strb w23, [x20, #11]
0x0000ffff7b8007fc  d3679eb7    ubfx x23, x21, #39, #1
0x0000ffff7b800800  b4000077    cbz x23, #+0xc (addr 0xffff7b80080c)
0x0000ffff7b800804  d3609ed7    ubfx x23, x22, #32, #8
0x0000ffff7b800808  39003297    strb w23, [x20, #12]
0x0000ffff7b80080c  d36fbeb7    ubfx x23, x21, #47, #1
0x0000ffff7b800810  b4000077    cbz x23, #+0xc (addr 0xffff7b80081c)
0x0000ffff7b800814  d368bed7    ubfx x23, x22, #40, #8
0x0000ffff7b800818  39003697    strb w23, [x20, #13]
0x0000ffff7b80081c  d377deb7    ubfx x23, x21, #55, #1
0x0000ffff7b800820  b4000077    cbz x23, #+0xc (addr 0xffff7b80082c)
0x0000ffff7b800824  d370ded7    ubfx x23, x22, #48, #8
0x0000ffff7b800828  39003a97    strb w23, [x20, #14]
0x0000ffff7b80082c  d37ffeb5    lsr x21, x21, #63
0x0000ffff7b800830  b4000075    cbz x21, #+0xc (addr 0xffff7b80083c)
0x0000ffff7b800834  d378fed5    lsr x21, x22, #56
0x0000ffff7b800838  39003e95    strb w21, [x20, #15]
0x0000ffff7b80083c  58000040    ldr x0, pc+8 (addr 0xffff7b800844)
0x0000ffff7b800840  d63f0000    blr x0
```

After:
```asm
0x0000ffff7ac00718  10ffffe0            adr x0, #-0x4 (addr 0xffff7ac00714)
0x0000ffff7ac0071c  f9005f80            str x0, [x28, #184]
0x0000ffff7ac00720  4e20aa24            cmlt v4.16b, v17.16b, #0
0x0000ffff7ac00724  3dc00165            ldr q5, [x11]
0x0000ffff7ac00728  4ea41c80            mov v0.16b, v4.16b
0x0000ffff7ac0072c  6e651e00            bsl v0.16b, v16.16b, v5.16b
0x0000ffff7ac00730  4ea01c04            mov v4.16b, v0.16b
0x0000ffff7ac00734  3d800164            str q4, [x11]
0x0000ffff7ac00738  58000040            ldr x0, pc+8 (addr 0xffff7ac00740)
0x0000ffff7ac0073c  d63f0000            blr x0
```
2023-07-07 18:37:17 -07:00
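The load + cmlt + bsl + str sequence from 15f5fe658b can be modelled in scalar C++ to check that it produces the same memory contents as a byte-by-byte conditional store. This is a sketch on a 16-byte array, not FEX code, and the function names are hypothetical. Unlike the bytewise version, the select version rewrites all 16 bytes, which the commit message treats as acceptable given the instructions' ordering and trap behaviour.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec16 = std::array<uint8_t, 16>;

// Reference semantics: store a source byte only when the corresponding
// mask byte has its most significant bit set.
inline void MaskMovBytewise(Vec16 &memory, const Vec16 &src, const Vec16 &mask) {
  for (std::size_t i = 0; i < memory.size(); ++i) {
    if (mask[i] & 0x80) {
      memory[i] = src[i];
    }
  }
}

// Model of load + cmlt + bsl + store on the whole vector.
inline void MaskMovSelect(Vec16 &memory, const Vec16 &src, const Vec16 &mask) {
  const Vec16 loaded = memory; // ldr
  Vec16 result{};
  for (std::size_t i = 0; i < result.size(); ++i) {
    // cmlt #0: all ones when the byte is negative as a signed value.
    const uint8_t select = (static_cast<int8_t>(mask[i]) < 0) ? 0xFF : 0x00;
    // bsl: take src bits where select is set, loaded bits elsewhere.
    result[i] = static_cast<uint8_t>((src[i] & select) | (loaded[i] & ~select));
  }
  memory = result; // str
}
```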
Ryan Houdek
052aa4317b OpcodeDispatcher: Optimize GetPackedRFLAG
Only return the particular flags that are being requested in the moment
since compacting them all when requested is fairly slow.

x87 fcmov in particular was requesting all the flags when it only needs
a couple.
This reduces a `fcmovb` instruction count blowup from 103x to 38x. Still
more room to go but this one stood out as being particularly bad.

Old:
```asm
0x0000000265a002bc  10ffffe0    adr x0, #-0x4 (addr 0x265a002b8)
0x0000000265a002c0  f9005f80    str x0, [x28, #184]
0x0000000265a002c4  d2800014    mov x20, #0x0
0x0000000265a002c8  d2800035    mov x21, #0x1
0x0000000265a002cc  d2800056    mov x22, #0x2
0x0000000265a002d0  394b0397    ldrb w23, [x28, #704]
0x0000000265a002d4  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a002d8  aa1702d6    orr x22, x22, x23
0x0000000265a002dc  394b0b97    ldrb w23, [x28, #706]
0x0000000265a002e0  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a002e4  531e76f7    lsl w23, w23, #2
0x0000000265a002e8  aa1702d6    orr x22, x22, x23
0x0000000265a002ec  394b1397    ldrb w23, [x28, #708]
0x0000000265a002f0  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a002f4  531c6ef7    lsl w23, w23, #4
0x0000000265a002f8  aa1702d6    orr x22, x22, x23
0x0000000265a002fc  394b1b97    ldrb w23, [x28, #710]
0x0000000265a00300  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00304  531a66f7    lsl w23, w23, #6
0x0000000265a00308  aa1702d6    orr x22, x22, x23
0x0000000265a0030c  394b1f97    ldrb w23, [x28, #711]
0x0000000265a00310  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00314  531962f7    lsl w23, w23, #7
0x0000000265a00318  aa1702d6    orr x22, x22, x23
0x0000000265a0031c  394b2397    ldrb w23, [x28, #712]
0x0000000265a00320  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00324  53185ef7    lsl w23, w23, #8
0x0000000265a00328  aa1702d6    orr x22, x22, x23
0x0000000265a0032c  394b2797    ldrb w23, [x28, #713]
0x0000000265a00330  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00334  53175af7    lsl w23, w23, #9
0x0000000265a00338  aa1702d6    orr x22, x22, x23
0x0000000265a0033c  394b2b97    ldrb w23, [x28, #714]
0x0000000265a00340  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00344  531656f7    lsl w23, w23, #10
0x0000000265a00348  aa1702d6    orr x22, x22, x23
0x0000000265a0034c  394b2f97    ldrb w23, [x28, #715]
0x0000000265a00350  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00354  531552f7    lsl w23, w23, #11
0x0000000265a00358  aa1702d6    orr x22, x22, x23
0x0000000265a0035c  394b3397    ldrb w23, [x28, #716]
0x0000000265a00360  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00364  53144ef7    lsl w23, w23, #12
0x0000000265a00368  aa1702d6    orr x22, x22, x23
0x0000000265a0036c  394b3b97    ldrb w23, [x28, #718]
0x0000000265a00370  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00374  531246f7    lsl w23, w23, #14
0x0000000265a00378  aa1702d6    orr x22, x22, x23
0x0000000265a0037c  394b4397    ldrb w23, [x28, #720]
0x0000000265a00380  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00384  53103ef7    lsl w23, w23, #16
0x0000000265a00388  aa1702d6    orr x22, x22, x23
0x0000000265a0038c  394b4797    ldrb w23, [x28, #721]
0x0000000265a00390  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a00394  530f3af7    lsl w23, w23, #17
0x0000000265a00398  aa1702d6    orr x22, x22, x23
0x0000000265a0039c  394b4b97    ldrb w23, [x28, #722]
0x0000000265a003a0  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a003a4  530e36f7    lsl w23, w23, #18
0x0000000265a003a8  aa1702d6    orr x22, x22, x23
0x0000000265a003ac  394b4f97    ldrb w23, [x28, #723]
0x0000000265a003b0  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a003b4  530d32f7    lsl w23, w23, #19
0x0000000265a003b8  aa1702d6    orr x22, x22, x23
0x0000000265a003bc  394b5397    ldrb w23, [x28, #724]
0x0000000265a003c0  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a003c4  530c2ef7    lsl w23, w23, #20
0x0000000265a003c8  aa1702d6    orr x22, x22, x23
0x0000000265a003cc  394b5797    ldrb w23, [x28, #725]
0x0000000265a003d0  d3407ef7    ubfx x23, x23, #0, #32
0x0000000265a003d4  530b2af7    lsl w23, w23, #21
0x0000000265a003d8  aa1702d6    orr x22, x22, x23
0x0000000265a003dc  924002d6    and x22, x22, #0x1
0x0000000265a003e0  93400294    sbfx x20, x20, #0, #1
0x0000000265a003e4  934002b5    sbfx x21, x21, #0, #1
0x0000000265a003e8  f10002df    cmp x22, #0x0 (0)
0x0000000265a003ec  9a950294    csel x20, x20, x21, eq
0x0000000265a003f0  4e080e84    dup v4.2d, x20
0x0000000265a003f4  394baf94    ldrb w20, [x28, #747]
0x0000000265a003f8  91000695    add x21, x20, #0x1 (1)
0x0000000265a003fc  92400ab5    and x21, x21, #0x7
0x0000000265a00400  d2800200    mov x0, #0x10
0x0000000265a00404  9b007e80    mul x0, x20, x0
0x0000000265a00408  8b000380    add x0, x28, x0
0x0000000265a0040c  3dc0bc05    ldr q5, [x0, #752]
0x0000000265a00410  d2800200    mov x0, #0x10
0x0000000265a00414  9b007ea0    mul x0, x21, x0
0x0000000265a00418  8b000380    add x0, x28, x0
0x0000000265a0041c  3dc0bc06    ldr q6, [x0, #752]
0x0000000265a00420  4ea41c80    mov v0.16b, v4.16b
0x0000000265a00424  6e651cc0    bsl v0.16b, v6.16b, v5.16b
0x0000000265a00428  4ea01c04    mov v4.16b, v0.16b
0x0000000265a0042c  d2800200    mov x0, #0x10
0x0000000265a00430  9b007e80    mul x0, x20, x0
0x0000000265a00434  8b000380    add x0, x28, x0
0x0000000265a00438  3d80bc04    str q4, [x0, #752]
0x0000000265a0043c  58000040    ldr x0, pc+8 (addr 0x265a00444)
0x0000000265a00440  d63f0000    blr x0
```

New:
```asm
0x0000000265a002bc  10ffffe0    adr x0, #-0x4 (addr 0x265a002b8)
0x0000000265a002c0  f9005f80    str x0, [x28, #184]
0x0000000265a002c4  d2800014    mov x20, #0x0
0x0000000265a002c8  d2800035    mov x21, #0x1
0x0000000265a002cc  d2800056    mov x22, #0x2
0x0000000265a002d0  394b0397    ldrb w23, [x28, #704]
0x0000000265a002d4  330002f6    bfxil w22, w23, #0, #1
0x0000000265a002d8  924002d6    and x22, x22, #0x1
0x0000000265a002dc  93400294    sbfx x20, x20, #0, #1
0x0000000265a002e0  934002b5    sbfx x21, x21, #0, #1
0x0000000265a002e4  f10002df    cmp x22, #0x0 (0)
0x0000000265a002e8  9a950294    csel x20, x20, x21, eq
0x0000000265a002ec  4e080e84    dup v4.2d, x20
0x0000000265a002f0  394baf94    ldrb w20, [x28, #747]
0x0000000265a002f4  91000695    add x21, x20, #0x1 (1)
0x0000000265a002f8  92400ab5    and x21, x21, #0x7
0x0000000265a002fc  d2800200    mov x0, #0x10
0x0000000265a00300  9b007e80    mul x0, x20, x0
0x0000000265a00304  8b000380    add x0, x28, x0
0x0000000265a00308  3dc0bc05    ldr q5, [x0, #752]
0x0000000265a0030c  d2800200    mov x0, #0x10
0x0000000265a00310  9b007ea0    mul x0, x21, x0
0x0000000265a00314  8b000380    add x0, x28, x0
0x0000000265a00318  3dc0bc06    ldr q6, [x0, #752]
0x0000000265a0031c  4ea41c80    mov v0.16b, v4.16b
0x0000000265a00320  6e651cc0    bsl v0.16b, v6.16b, v5.16b
0x0000000265a00324  4ea01c04    mov v4.16b, v0.16b
0x0000000265a00328  d2800200    mov x0, #0x10
0x0000000265a0032c  9b007e80    mul x0, x20, x0
0x0000000265a00330  8b000380    add x0, x28, x0
0x0000000265a00334  3d80bc04    str q4, [x0, #752]
0x0000000265a00338  58000040    ldr x0, pc+8 (addr 0x265a00340)
0x0000000265a0033c  d63f0000    blr x0
```
2023-07-07 17:01:59 -07:00
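The idea behind 052aa4317b, packing only the flags a caller asks for, can be sketched as below. The layout (one byte per flag bit) is an assumption suggested by the `ldrb` sequence in the listing above, and `GetPackedFlags` is a hypothetical name, not the FEX helper's signature.

```cpp
#include <array>
#include <cstdint>

// Assumption for illustration: each flag bit is stored in its own byte.
using FlagBytes = std::array<uint8_t, 32>;

// Packs only the flag bits selected by `requested_mask`, skipping the
// load, shift and or for every flag the caller does not need.
inline uint32_t GetPackedFlags(const FlagBytes &flags, uint32_t requested_mask) {
  uint32_t packed = 0;
  for (unsigned bit = 0; bit < 32; ++bit) {
    if (requested_mask & (1u << bit)) {
      packed |= static_cast<uint32_t>(flags[bit] & 1) << bit;
    }
  }
  return packed;
}
```

A caller that only needs the carry flag, as `fcmovb` does, would pass a mask with just bit 0 set and pay for a single load.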
Ryan Houdek
debcb0e047 Arm64: Optimize BFI in the case that Dst == srcDst
ARM64 BFI doesn't allow you to encode two source registers here to match
our SSA semantics. Also since we don't support RA constraints to ensure
that these match, just do the optimal case in the backend.

Leave a comment for future RA constraint excavators to make this more
optimal.
2023-07-07 16:43:41 -07:00
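For readers unfamiliar with bitfield insert, the operation discussed in debcb0e047 can be modelled as a pure function. On ARM64 the destination register doubles as the value being inserted into, so an SSA-style op with a separate destination needs an extra move unless the two registers already match. `Bfi` below is a hypothetical model that assumes `lsb` is less than 64.

```cpp
#include <cstdint>

// Bitfield insert: copies the low `width` bits of `src` into `dst`
// starting at bit `lsb`, leaving all other bits of `dst` untouched.
constexpr uint64_t Bfi(uint64_t dst, uint64_t src, unsigned lsb, unsigned width) {
  const uint64_t field = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);
  const uint64_t mask = field << lsb;
  return (dst & ~mask) | ((src << lsb) & mask);
}
```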
Ryan Houdek
baf04b6a41 FEXCore: Minor cleanup
This isn't required anymore since we are exposing the virtual class
directly.
2023-07-07 15:06:14 -07:00
Ryan Houdek
cc85a6a722 Docs: Update for release FEX-2307
FEX-2307
2023-07-07 08:33:30 -07:00
Ryan Houdek
e72fa02897
Merge pull request #2739 from Sonicadvance1/fork_mutexes
Linux: Fixes hangs due to mutexes locked while fork happens.
2023-07-05 15:01:05 -07:00
Mai
8a4c5bcc65
Merge pull request #2741 from Sonicadvance1/workaround_stdc++_bug
FHU: Workaround libstdc++ version 13+ bug
2023-07-05 17:20:35 -04:00
Ryan Houdek
7a13a24c05 FHU: Workaround libstdc++ version 13+ bug
In libstdc++ version 13, they moved the implementation of
`polymorphic_allocator` to `bits/memory_resource.h`.
In doing so they forgot to move the template's default argument to that
header. This causes the problem that `bits/memory_resource.h` is
included first without the template's default argument defined. This
breaks the automatic type deduction of `std::byte`.

Still broken in
[upstream](be240fc6ac/libstdc%2B%2B-v3/include/std/memory_resource (L79-L83))
and is unlikely to be fixed and backported. Since this is the only place
we use this type, just fix it here.
2023-07-05 13:52:23 -07:00
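One general way to avoid depending on a template's default argument is to spell the value type out explicitly. The sketch below shows that pattern with `std::pmr::polymorphic_allocator<std::byte>`; it is not necessarily the fix applied in 7a13a24c05, and the alias and function names are hypothetical.

```cpp
#include <cstddef>
#include <memory_resource>

// Naming the value type explicitly means the default template argument
// is never consulted.
using ByteAllocator = std::pmr::polymorphic_allocator<std::byte>;

// Allocates and releases a small block from a stack-backed resource.
inline bool AllocateRoundTrip() {
  std::byte storage[256];
  std::pmr::monotonic_buffer_resource resource(
      storage, sizeof(storage), std::pmr::null_memory_resource());
  ByteAllocator allocator(&resource);
  std::byte *block = allocator.allocate(16);
  const bool ok = block != nullptr;
  allocator.deallocate(block, 16);
  return ok;
}
```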
Mai
5a53931b92
Merge pull request #2738 from Sonicadvance1/xattr_emulatedpath
Linux: Handle xattr syscalls with emulated paths.
2023-07-05 15:19:18 -04:00
Ryan Houdek
f9b352a093 Linux: Fixes hangs due to mutexes locked while fork happens.
When a fork occurs FEX needs to be incredibly careful as any thread
(that isn't forking) that holds a lock will vanish when the fork occurs.

At this point if the newly forked process tries to use these mutexes
then the process hangs indefinitely.

The three major mutexes that need to be held during a fork:
- Code Invalidation mutex
  - This is the highest priority and causes us to hang frequently.
  - This is highly likely to occur when one thread is loading shared
    libraries and another thread is forking.
     - Happens frequently with Wine and Steam.
- VMA tracking mutex
  - This one happens when one thread is allocating memory while a fork
    occurs.
  - This closely relates to the code invalidation mutex, just happens at
    the syscall layer instead of the FEXCore layer.
  - Happens as frequently as the code invalidation mutex.
- Allocation mutex
  - This mutex is used for FEX's 64-bit Allocator, this happens when FEX
    is allocating memory on one thread and a fork occurs.
  - Fairly infrequent because jemalloc doesn't allocate VMA regions that
    often.

While this likely doesn't hit all of the FEX mutexes, this hits the ones
that are burning fires and are happening frequently.

- FEXCore: Adds forkable mutex/locks

Necessary since we have a few locations in FEX that need to be locked
before and after a fork.

When a fork occurs the locks must be locked prior to the fork. Then
afterwards they either need to unlock or be set to default
initialization state.
- Parent
   - Does an unlock
- Child
   - Sets the lock to default initialization state
   - This is because pthreads does TID-based ownership checking on
     unique locks and refcount-based waiting for shared locks.
   - No way to "unlock" after fork in this case other than default
     initializing.
2023-07-04 02:13:06 -07:00
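The lock-before-fork protocol described in f9b352a093 can be sketched with a plain pthread mutex. `ForkableMutex` and `ForkHoldingLock` are hypothetical names for this illustration and do not reflect FEX's actual classes; the child side re-initializes the mutex rather than unlocking it, as the commit message describes.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical wrapper around a pthread mutex that can be reset in a
// forked child.
class ForkableMutex {
public:
  ForkableMutex() { pthread_mutex_init(&Mutex, nullptr); }
  void lock() { pthread_mutex_lock(&Mutex); }
  void unlock() { pthread_mutex_unlock(&Mutex); }
  bool try_lock() { return pthread_mutex_trylock(&Mutex) == 0; }
  // Child side: return the mutex to its default-initialized state.
  void ResetAfterFork() { pthread_mutex_init(&Mutex, nullptr); }

private:
  pthread_mutex_t Mutex;
};

// Forks while holding the lock. Returns the child's exit code (0 when
// the child could take the lock after the reset) or -1 on failure.
inline int ForkHoldingLock(ForkableMutex &mutex) {
  mutex.lock(); // Lock prior to the fork.
  const pid_t pid = fork();
  if (pid < 0) {
    mutex.unlock();
    return -1;
  }
  if (pid == 0) {
    mutex.ResetAfterFork(); // Child: default initialize.
    _exit(mutex.try_lock() ? 0 : 1);
  }
  mutex.unlock(); // Parent: plain unlock.
  int status = 0;
  if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
    return -1;
  }
  return WEXITSTATUS(status);
}
```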
Mai
f444b03317
Merge pull request #2740 from Sonicadvance1/faccessat2
Linux: Stop using faccessat2 for faccessat emulation
2023-07-03 14:35:31 -04:00
Ryan Houdek
ed05846dd0 Linux: Stop using faccessat2 for faccessat emulation
This can cause issues when running on devices with a kernel older than 5.8.
2023-07-02 17:22:15 -07:00
Ryan Houdek
f609990f90 Linux: Handle xattr syscalls with emulated paths.
Fixes a spurious `No such file or directory` error when `ls` is trying
to query a path's xattributes that come from the emulated rootfs.

These syscalls don't support the *at variants, so it can't use the optimized `GetEmulatedFDPath` implementation.
It must also return an error on a found file path, which makes their
implementation slightly different from the other users of
`GetEmulatedPath`. In the case of error, it must only return an error
from the emulated path if it is /not/ ENOENT.

Before:
```
$ FEXInterpreter /usr/bin/ls -alth /usr/bin/wine-stable
/usr/bin/ls: /usr/bin/wine-stable: No such file or directory
-rwxr-xr-x 1 ryanh ryanh 1.1K Sep 24  2022 /usr/bin/wine-stable
```

After:
```
$ FEXInterpreter /usr/bin/ls -alth /usr/bin/wine-stable
-rwxr-xr-x 1 ryanh ryanh 1.1K Sep 24  2022 /usr/bin/wine-stable
```
2023-07-01 16:47:19 -07:00
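The error-handling rule from f609990f90 (try the emulated rootfs first, and fall through to the host path only on ENOENT) can be sketched as below. The callable-based interface and the name `TryEmulatedThenHost` are assumptions for illustration, with results following the kernel convention of a negative errno on failure.

```cpp
#include <cerrno>
#include <functional>

// Result convention assumed here: >= 0 on success, -errno on failure.
using PathOp = std::function<long()>;

// Runs the operation against the emulated path first. Success, or any
// error other than "not found", is final; only -ENOENT falls through to
// the host path.
inline long TryEmulatedThenHost(const PathOp &emulated, const PathOp &host) {
  const long result = emulated();
  if (result != -ENOENT) {
    return result;
  }
  return host();
}
```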
Ryan Houdek
d2032da452
Merge pull request #2737 from bylaws/main
Some small fixes for android building
2023-07-01 14:59:18 -07:00
Mai
8047007a7a
Merge pull request #2734 from Sonicadvance1/add_cssc
Emitter: Adds support for CSSC
2023-07-01 17:58:31 -04:00
Billy Laws
1f7e82ea09 CMake: Allow for disabling FEXConfig building
It's useful even in non-termux builds to be able to disable FEXConfig due to its build-time dependencies.
2023-07-01 22:21:17 +01:00
Billy Laws
35c52f20f9 AllocatorHooks: Avoid referencing valloc on Android
This is not implemented in bionic, so follow the MINGW approach and implement it with _aligned_alloc.
2023-07-01 22:21:16 +01:00