1357 Commits

Author SHA1 Message Date
Alyssa Rosenzweig
4d503d3155 RegisterAllocationPass: drop prewritable check
always true.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:24:41 -04:00
Alyssa Rosenzweig
7e663b91df IR: drop IRParser
Aside from its own self-test, the parser is unused and should remain that way,
since it's a maintenance burden with no real benefit. Burn it.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:16:54 -04:00
Alyssa Rosenzweig
3c3ba62c10 MemoryOps: optimize 32-bit SRA case
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig
34fe56dfb2 DeadStoreElimination: CSE block info
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig
ecf8cde5e0 DeadStoreElimination: group common logic
slightly less obnoxious copypaste.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig
55284aad7e DeadStoreElimination: don't handle partial stores
SRA replaces the whole contents of the destination.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig
3afc35f7b4 DeadStoreElimination: simplify
use registers internally, not synthesized offsets

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig
1058428a51 IR: document invariant on SRA
This lets us simplify a lot!

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig
a2fc51fc7b IR: specify registers, not offsets for SRA
SRA is fundamentally about hardware registers, not stores into a
software-defined context. So, it should take a register instead of an offset.
This makes all the unaligned special cases unrepresentable (by design).
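Here is a minimal C++ sketch of the "make unaligned cases unrepresentable" idea. Everything in it is hypothetical and is not FEX's actual IR types:

- the context layout (`kGPRBytes`, `kNumGPRs`)
- the `GPR` type
- `GPRFromOffset`

An offset-based operand can name any byte. A register-typed operand can only name a whole register, so converting from an offset must reject unaligned or out-of-range values.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical layout: 16 general-purpose registers, 8 bytes each,
// stored contiguously from byte offset 0 of a software context.
constexpr uint32_t kGPRBytes = 8;
constexpr uint32_t kNumGPRs = 16;

// Strongly typed register index: names a whole register, never a byte offset.
enum class GPR : uint8_t {};

// An offset-based operand can point at any byte, so every consumer must
// handle unaligned or partial cases. Converting to a register index
// rejects those cases up front.
std::optional<GPR> GPRFromOffset(uint32_t offset) {
  if (offset % kGPRBytes != 0 || offset / kGPRBytes >= kNumGPRs) {
    return std::nullopt;  // unaligned or out of range: not a register
  }
  return static_cast<GPR>(offset / kGPRBytes);
}
```

For example, offset 16 maps to register 2, while offset 12 (unaligned) and offset 128 (past the last register) have no register form at all.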

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig
1848629ba5 RegisterAllocationPass: drop aliasable check
always true with the new IR invariants.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig
3399577330 JIT: clean up fpr sra
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig
18bfc8afd0 JIT: clean up gpr sra handling
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig
76b023ed3e JIT: drop unaligned and partial SRA handling
This is all dead, assert as much so it stays that way.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig
b91b0e9d65 IR: infer SRA static class
no need to stick it in the IR.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:01:42 -04:00
Alyssa Rosenzweig
74489a4177 IR: remove dead SRA flags
I don't know what these were meant for, and I don't care (-:

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:01:42 -04:00
Billy Laws
bd24364c1b FEXCore: Switch stacks before exiting the JIT on ARM64EC
This removes the need for the frontend to have any knowledge of FEX's
SRA layout.
2024-05-06 15:41:34 +00:00
Billy Laws
ab516d7b79 Dispatcher: Implement ARM64EC SRA setup entrypoints
While the ARM64EC ABI mostly matches FEX's SRA, the stack still needs to
be switched to the emulator stack and target RIP stored into the FEX
context before jumping to the dispatcher loop.
2024-05-06 15:41:34 +00:00
Billy Laws
d25ed4b0bf Dispatcher: Block system call callbacks when compiling code
These callbacks are used for code invalidation and setting the right
emulated CPU features, neither of which are necessary for syscalls made
from within FEX. Avoid calling them to prevent deadlocks caused by
nested locks during compilation.
2024-05-06 15:41:28 +00:00
Ryan Houdek
5f0427c253 CPUID: Adds Qualcomm Oryon product name
From https://github.com/llvm/llvm-project/pull/91022

Easy enough.
2024-05-03 20:16:46 -07:00
Ryan Houdek
faa494c288 Merge pull request #3605 from Sonicadvance1/move_fex_versionstring_cpuid
CPUID: Removes FEX version string from CPU model name
2024-05-02 11:20:49 -07:00
Ryan Houdek
6228226c08 CPUID: Fix inverted RDTSCP check
This check was inverted, so the RDTSCP CPUID bit was always enabled for Wine
and always disabled elsewhere.
2024-05-01 18:31:41 -07:00
Ryan Houdek
31341bb7c2 CPUID: Removes FEX version string from CPU model name
Moves it to the hypervisor leaves.

Before:
```bash
$ FEXBash 'cat /proc/cpuinfo | grep "model name"'
model name      : FEX-2404-101-gf9effcb           Cortex-A78C
model name      : FEX-2404-101-gf9effcb           Cortex-A78C
model name      : FEX-2404-101-gf9effcb           Cortex-A78C
model name      : FEX-2404-101-gf9effcb           Cortex-A78C
model name      : FEX-2404-101-gf9effcb           Cortex-X1C
model name      : FEX-2404-101-gf9effcb           Cortex-X1C
model name      : FEX-2404-101-gf9effcb           Cortex-X1C
model name      : FEX-2404-101-gf9effcb           Cortex-X1C
```

After:
```bash
$ FEXBash 'cat /proc/cpuinfo | grep "model name"'
model name      : Cortex-A78C
model name      : Cortex-A78C
model name      : Cortex-A78C
model name      : Cortex-A78C
model name      : Cortex-X1C
model name      : Cortex-X1C
model name      : Cortex-X1C
model name      : Cortex-X1C
```

Now the FEX version string lives in a hypervisor CPUID leaf, so any
utility that wants the FEX version can query that leaf directly.

Ex:
```bash
$ ./Bin/FEXInterpreter get_cpuid_fex
Maximum 4000_0001h sub-leaf: 2
We are running under FEX on host: 2
FEX version string is: 'FEX-2404-113-g820494d'
```
2024-05-01 16:27:13 -07:00
Alyssa Rosenzweig
76b5ca4bcc OpcodeDispatcher: optimize 8/16-bit adc
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-29 20:00:34 -04:00
Alyssa Rosenzweig
28fa88ff39 OpcodeDispatcher: fix 8/16-bit adc/sbc flags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-29 20:00:34 -04:00
Ryan Houdek
1069cabad0 Merge pull request #3598 from Sonicadvance1/half_barrier_delete_hack
Arm64: Adds another TSO hack to disable half-barrier TSO
2024-04-26 18:24:49 -07:00
Ryan Houdek
fe70ec7277 Merge pull request #3599 from alyssarosenzweig/jit/fix-faddv
JIT: fix neon vec4 faddv
2024-04-24 18:20:11 -07:00
Alyssa Rosenzweig
4a4fa64254 JIT: fix neon vec4 faddv
We were previously generating nonsense code if the destination != source:

         faddp v2.4s, v4.4s, v4.4s
         faddp s2, v4.2s

The result of the first faddp is ignored, so the second merely calculates the
sum of the first 2 sources (not all 4 as needed).

The correct fix is to feed the first add into the second, regardless of the
final destination:

         faddp v2.4s, v4.4s, v4.4s
         faddp s2, v2.2s
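The two sequences can be modelled in scalar C++ to check the arithmetic. This sketch assumes standard AArch64 FADDP semantics:

- The vector `.4s` form pairwise-adds the concatenation of its two sources, with the lower half of the result coming from the first source.
- The scalar `.2s` form adds the two lowest lanes.

The function names are hypothetical.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

// FADDP Vd.4S, Vn.4S, Vm.4S: pairwise add over Vn:Vm.
// The lower two result lanes come from Vn, the upper two from Vm.
Vec4 FaddpVector(const Vec4& n, const Vec4& m) {
  return {n[0] + n[1], n[2] + n[3], m[0] + m[1], m[2] + m[3]};
}

// FADDP Sd, Vn.2S: adds the two lowest lanes of Vn.
float FaddpScalar(const Vec4& n) {
  return n[0] + n[1];
}

// Buggy sequence: the scalar FADDP reads the original source v4,
// so the vector FADDP's result (written to v2) is thrown away.
float BuggyFaddv(const Vec4& v4) {
  Vec4 v2 = FaddpVector(v4, v4);
  (void)v2;  // result unused, mirroring the broken codegen
  return FaddpScalar(v4);
}

// Fixed sequence: the first FADDP feeds the second.
float FixedFaddv(const Vec4& v4) {
  Vec4 v2 = FaddpVector(v4, v4);
  return FaddpScalar(v2);
}
```

With a source of `{1, 2, 3, 4}`, the buggy sequence returns 3 (lanes 0 and 1 only). The fixed sequence returns 10.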

Hit in an ASM test with new RA.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-24 21:13:02 -04:00
Ryan Houdek
6463054fa3 Arm64: Adds another TSO hack to disable half-barrier TSO
A feature of FEX's JIT is that when an unaligned atomic load/store
operation occurs, the instructions are backpatched into a barrier
plus a non-atomic memory instruction. This is the half-barrier technique,
which still ensures correct visibility of loadstores in an unaligned
context.

The problem with this approach is that DMB instructions are HEAVY,
because they effectively stop the world until all in-flight memory
operations are visible. But it is a necessary evil, since unaligned atomics
aren't a thing on ARM processors. FEAT_LSE2 only gives you unaligned
atomicity within a 16-byte aligned granule, which doesn't match the x86
behaviour of cacheline granularity (effectively always 64B).
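The granularity mismatch can be expressed as a simple boundary check. This is a hypothetical C++ sketch, not FEX's backpatching logic. It takes as given the 16-byte Arm granule and the 64-byte x86 cacheline described above. It reports whether an access of `size` bytes at `addr` stays inside one aligned block of each size. Accesses that fit in a 64-byte line but straddle a 16-byte boundary illustrate the gap the message describes.

```cpp
#include <cassert>
#include <cstdint>

// True if [addr, addr + size) lies entirely inside one aligned block of
// `granule` bytes. Precondition: size >= 1.
constexpr bool FitsInGranule(uint64_t addr, uint64_t size, uint64_t granule) {
  return (addr / granule) == ((addr + size - 1) / granule);
}

// Hypothetical helpers for the two granularities discussed above.
constexpr bool FitsArm16ByteGranule(uint64_t addr, uint64_t size) {
  return FitsInGranule(addr, size, 16);
}
constexpr bool FitsX86Cacheline(uint64_t addr, uint64_t size) {
  return FitsInGranule(addr, size, 64);
}
```

For example, an 8-byte access at 0x1C crosses the 16-byte boundary at 0x20. It still stays within the cacheline that starts at 0x00.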

This adds a new TSO option that disables the half-barrier on unaligned
atomics and instead converts them into regular loadstore instructions,
omitting the barriers. By not stalling on DMB instructions where possible,
this gives more insight into how good a CPU's LRCPC implementation is.

Originally implemented as a test to see if this makes Sonic Adventure 2
run at full speed with TSO enabled (but all available TSO options disabled)
on NVIDIA Orin. Unfortunately, while the code no longer stalls on DMB
instructions, this just shows how bad the LRCPC implementation is, since
the stalls show up on `ldapur` instructions instead.

Tested Sonic Adventure 2 on X13s and it ran at 60FPS there without the
hack anyway.
2024-04-24 13:09:00 -07:00
Ryan Houdek
a0bf6a4255 Merge pull request #3595 from alyssarosenzweig/ir/before
Factor out SetWriteCursorBefore
2024-04-23 13:34:05 -07:00
Ryan Houdek
308488c419 Allocator: Fixes compiling on Fedora 40
This header was missing.

Either libstdc++ 14 or Clang 18 changed its includes, and we were only
getting this header indirectly before.
2024-04-23 12:12:33 -07:00
Alyssa Rosenzweig
2372c9458b ConstProp: use SetWriteCursorBefore
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-23 13:09:45 -04:00
Alyssa Rosenzweig
1a11343f34 IREmitter: add SetWriteCursorBefore helper
This is subtle; add an ergonomic helper for it.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-23 13:09:45 -04:00
Ryan Houdek
84d5b3ee59 CPUID: Enable enhanced rep movs in more situations
Instead of only enabling enhanced rep movs if software TSO is disabled,
enable it if software TSO is disabled OR memcpysettso is disabled. This
is because we now hit the fast path when memcpysettso alone is disabled,
even while global TSO is enabled.
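The enable condition is a plain OR of the two negated settings. In the minimal C++ sketch below, `TSOSettings` and its field names are hypothetical stand-ins for FEX's actual configuration options. The meaning given for `MemcpySetTSO` is an assumption based on the option name.

```cpp
#include <cassert>

// Hypothetical stand-in for the relevant configuration options.
struct TSOSettings {
  bool SoftwareTSO;   // global software TSO emulation enabled
  bool MemcpySetTSO;  // assumed: TSO applied to memcpy/memset-style rep ops
};

// Previous rule: enhanced rep movs only when software TSO is disabled.
constexpr bool EnhancedRepMovsOld(const TSOSettings& s) {
  return !s.SoftwareTSO;
}

// New rule: also enabled when only the memcpy/memset TSO option is disabled.
constexpr bool EnhancedRepMovsNew(const TSOSettings& s) {
  return !s.SoftwareTSO || !s.MemcpySetTSO;
}
```

The only configuration that changes is global TSO on with memcpysettso off. That case gains the feature bit under the new rule.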

Retested Hades and performance was fine in this configuration.
2024-04-21 18:50:17 -07:00
Ryan Houdek
376936c808 Merge pull request #3591 from alyssarosenzweig/ra/fix
JIT: fix ShiftFlags shuffles
2024-04-19 13:50:45 -07:00
Alyssa Rosenzweig
932b8f38f4 JIT: fix ShiftFlags shuffles
messed up my RA.

fixes ShiftPF.asm with jit_1 under a pathological register allocation

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-18 14:11:03 -04:00
Ryan Houdek
c8704a7f71 OpcodeDispatcher: Implement support for SMSW
Found out that Far Cry uses this instruction, and it is usable at
CPL 3. This only returns constant data, but its behaviour is a little
quirky.

This instruction has a weird behaviour: the 32-bit operation inserts into
the 64-bit destination instead of zero-extending, which might be an Intel
versus AMD difference. I don't have an Intel machine available to test
whether that theory is true, though. This assumption would match similar
behaviour where segment registers are inserted instead of zero-extended.
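The difference between the usual x86-64 rule for 32-bit register writes and the insert behaviour described above can be shown as two register-update functions. This is an illustrative C++ sketch with hypothetical names, not FEX's OpcodeDispatcher code.

```cpp
#include <cassert>
#include <cstdint>

// Usual x86-64 rule: a 32-bit register write zero-extends to 64 bits,
// so the previous upper half is discarded.
constexpr uint64_t WriteZext32(uint64_t /*old*/, uint32_t value) {
  return value;
}

// Insert behaviour: only the low 32 bits are replaced and the upper
// 32 bits of the old register value are preserved.
constexpr uint64_t WriteInsert32(uint64_t old, uint32_t value) {
  return (old & 0xFFFF'FFFF'0000'0000ULL) | value;
}
```

With an old value of `0x1122334455667788` and a 32-bit result of `0x12345678`, zero-extension yields `0x12345678`. Insertion yields `0x1122334412345678`.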

Gets the game further, but then it crashes in a `___ascii_strnicmp`
function where the arguments end up being `___ascii_strnicmp(nullptr, "Color", 5);`.
2024-04-18 07:41:39 -07:00
Alyssa Rosenzweig
352dcdb478 RCLSE: disable store-after-store optimization
Functional revert of 92f31648b ("RCLSE: optimize out pointless stores"), which
reportedly regressed some titles due to RA doom. We'll revisit later, leaving in
the code for when RA is ready to light this up.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-17 14:53:11 -04:00
Paulo Matos
905aa935f5 Reformat until fixed-point
Followup to 2b4ec88daebd35fefb5bf5c73d7fc2b4155771ed.
Some files needed a couple of calls to clang-format 16.0.6 to
reach a fixed point.
2024-04-15 09:40:00 +02:00
Ryan Houdek
7614ac9f14 Merge pull request #3573 from pmatos/RemoveClangTidy
Remove trace of clang-tidy experiment from CMakeLists.txt
2024-04-12 17:13:53 -07:00
Paulo Matos
2b4ec88dae Whole-tree reformat
This follows discussions from #3413.
Followup commits add clang-format file, script and blame ignore lists.
2024-04-12 16:26:02 +02:00
Paulo Matos
6524716404 Move const to the left in preparation for reformatting
clang-format-16 had some issues with const placement, so we are manually changing these.
2024-04-12 16:06:57 +02:00
Paulo Matos
20559853ee Remove trace of clang-tidy experiment from CMakeLists.txt
2024-04-12 12:31:04 +02:00
Ryan Houdek
a9b7ad841c Merge pull request #3570 from bylaws/ec_pt8
Enable jemalloc for ARM64EC
2024-04-11 12:57:36 -07:00
Ryan Houdek
271700e9f6 Merge pull request #3568 from lioncash/const
X87: Simplify constant loading for FLD family
2024-04-11 00:32:26 -07:00
Ryan Houdek
1ba678f631 Merge pull request #3562 from Sonicadvance1/fix_rsp_store_tso
OpcodeDispatcher: Fixes disabling TSO access on RSP SIB stores
2024-04-11 00:32:14 -07:00
Ryan Houdek
a0f2cae1cb Merge pull request #3567 from lioncash/veczero
IR: Remove VectorZero
2024-04-09 20:04:26 -07:00
Ryan Houdek
66cbb66732 Merge pull request #3566 from lioncash/address
OpcodeDispatcher: Add helper for making segment offset addresses
2024-04-09 17:31:43 -07:00
Billy Laws
f1f0c47f16 AllocatorHooks: Allow using jemalloc on win32
2024-04-09 23:42:23 +00:00
Lioncache
4cb2432b5c OpcodeDispatcher: Make use of new x87 constants
Now we can load these directly instead of needing to manually materialize them.
2024-04-09 10:17:15 -04:00
Lioncache
27ba66a181 IRDumper: Extend printer for NamedVectorConstant
Makes it aware of the x87 constants.
2024-04-09 10:13:35 -04:00