Commit Graph

1338 Commits

Author SHA1 Message Date
Ryan Houdek
faa494c288
Merge pull request #3605 from Sonicadvance1/move_fex_versionstring_cpuid
CPUID: Removes FEX version string from CPU model name
2024-05-02 11:20:49 -07:00
Ryan Houdek
6228226c08
CPUID: Fix inverted RDTSCP check
The check was inverted, so the RDTSCP cpuid bit was always enabled for wine
and always disabled elsewhere.
2024-05-01 18:31:41 -07:00
Ryan Houdek
31341bb7c2
CPUID: Removes FEX version string from CPU model name
Moves it to the hypervisor leafs.

Before:
```bash
$ FEXBash 'cat /proc/cpuinfo | grep "model name"'
model name      : FEX-2404-101-gf9effcb           Cortex-A78C
model name      : FEX-2404-101-gf9effcb           Cortex-A78C
model name      : FEX-2404-101-gf9effcb           Cortex-A78C
model name      : FEX-2404-101-gf9effcb           Cortex-A78C
model name      : FEX-2404-101-gf9effcb           Cortex-X1C
model name      : FEX-2404-101-gf9effcb           Cortex-X1C
model name      : FEX-2404-101-gf9effcb           Cortex-X1C
model name      : FEX-2404-101-gf9effcb           Cortex-X1C
```

After:
```bash
$ FEXBash 'cat /proc/cpuinfo | grep "model name"'
model name      : Cortex-A78C
model name      : Cortex-A78C
model name      : Cortex-A78C
model name      : Cortex-A78C
model name      : Cortex-X1C
model name      : Cortex-X1C
model name      : Cortex-X1C
model name      : Cortex-X1C
```

Now the FEX string is in the hypervisor functions as a leaf, so if some
utility wants the FEX version it can query that directly.

Ex:
```bash
$ ./Bin/FEXInterpreter get_cpuid_fex
Maximum 4000_0001h sub-leaf: 2
We are running under FEX on host: 2
FEX version string is: 'FEX-2404-113-g820494d'
```
2024-05-01 16:27:13 -07:00
Alyssa Rosenzweig
76b5ca4bcc OpcodeDispatcher: optimize 8/16-bit adc
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-29 20:00:34 -04:00
Alyssa Rosenzweig
28fa88ff39 OpcodeDispatcher: fix 8/16-bit adc/sbc flags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-29 20:00:34 -04:00
Ryan Houdek
1069cabad0
Merge pull request #3598 from Sonicadvance1/half_barrier_delete_hack
Arm64: Adds another TSO hack to disable half-barrier TSO
2024-04-26 18:24:49 -07:00
Ryan Houdek
fe70ec7277
Merge pull request #3599 from alyssarosenzweig/jit/fix-faddv
JIT: fix neon vec4 faddv
2024-04-24 18:20:11 -07:00
Alyssa Rosenzweig
4a4fa64254 JIT: fix neon vec4 faddv
We were previously generating nonsense code if the destination != source:

         faddp v2.4s, v4.4s, v4.4s
         faddp s2, v4.2s

The result of the first faddp is ignored, so the second merely calculates the
sum of the first 2 sources (not all 4 as needed).

The correct fix is to feed the first add into the second, regardless of the
final destination:

         faddp v2.4s, v4.4s, v4.4s
         faddp s2, v2.2s

Hit in an ASM test with new RA.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-24 21:13:02 -04:00
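The difference between the two `faddp` sequences in the commit above can be modelled in a few lines. This is a minimal sketch, assuming the architectural behaviour of `faddp` (the vector form adds adjacent pairs of the concatenated sources, the scalar form adds the two low lanes). The helper names are illustrative and are not FEX code.

```python
def faddp_vector(vn, vm):
    """Model of `faddp Vd.4s, Vn.4s, Vm.4s`: pairwise add over Vn followed by Vm."""
    lanes = list(vn) + list(vm)
    return [lanes[i] + lanes[i + 1] for i in range(0, len(lanes), 2)]


def faddp_scalar(vn):
    """Model of `faddp Sd, Vn.2s`: add the two low lanes of Vn."""
    return vn[0] + vn[1]


v4 = [1.0, 2.0, 3.0, 4.0]

# First instruction in both sequences: faddp v2.4s, v4.4s, v4.4s
v2 = faddp_vector(v4, v4)  # [3.0, 7.0, 3.0, 7.0]

# Old sequence: the second faddp reads v4 again, ignoring v2.
buggy = faddp_scalar(v4)   # 1.0 + 2.0, only the first two lanes

# Fixed sequence: the second faddp consumes the partial sums in v2.
fixed = faddp_scalar(v2)   # (1.0 + 2.0) + (3.0 + 4.0)
```

With these inputs the old sequence yields the sum of only two lanes, while the fixed one yields the sum of all four.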
Ryan Houdek
6463054fa3
Arm64: Adds another TSO hack to disable half-barrier TSO
A feature of FEX's JIT is that when an unaligned atomic load/store
operation occurs, the instructions will be backpatched in to a barrier
plus a non-atomic memory instruction. This is the half-barrier technique
that still ensures correct visibility of loadstores in an unaligned
context.

The problem with this approach is that the dmb instructions are HEAVY,
because they effectively stop the world until all memory operations in
flight are visible. But it is a necessary evil since unaligned atomics
aren't a thing on ARM processors. FEAT_LSE2 only gives you unaligned
atomics inside of a 16-byte granularity, which doesn't match x86
behaviour of cacheline size (effectively always 64B).

This adds a new TSO option to disable the half-barrier on unaligned
atomics and instead only convert them to a regular loadstore instruction,
omitting the half-barrier. This gives more insight in to how good a
CPU's LRCPC implementation is by not stalling on DMB instructions when
possible.

Originally implemented as a test to see if this makes Sonic Adventure 2
run full speed with TSO enabled (but all available TSO options disabled)
on NVIDIA Orin. Unfortunately this basically makes the code no longer
stall on dmb instructions and instead just shows how bad the LRCPC
implementation is, since the stalls show up on `ldapur` instructions
instead.

Tested Sonic Adventure 2 on X13s and it ran at 60FPS there without the
hack anyway.
2024-04-24 13:09:00 -07:00
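The granularity mismatch described in the commit above (16-byte granule on ARM versus a 64-byte cacheline on x86) can be illustrated with a small boundary check. This is a sketch only: the function name and the sample address are made up for illustration, and the two sizes are the ones quoted in the commit message.

```python
ARM_GRANULE = 16    # bytes, the 16-byte granularity quoted in the commit message
X86_CACHELINE = 64  # bytes, "effectively always 64B" per the commit message


def crosses_boundary(addr, size, granule):
    """True if the access [addr, addr + size) spans two granules."""
    return (addr // granule) != ((addr + size - 1) // granule)


# Hypothetical 4-byte atomic at an unaligned address.
addr, size = 0x100E, 4

splits_on_arm = crosses_boundary(addr, size, ARM_GRANULE)
splits_on_x86 = crosses_boundary(addr, size, X86_CACHELINE)
```

Such an access stays inside one x86 cacheline but straddles two 16-byte granules, which is the kind of case the backpatching path has to handle.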
Ryan Houdek
a0bf6a4255
Merge pull request #3595 from alyssarosenzweig/ir/before
Factor out SetWriteCursorBefore
2024-04-23 13:34:05 -07:00
Ryan Houdek
308488c419
Allocator: Fixes compiling on Fedora 40
This header was missing.

Either libstdc++14 or clang-18 changed includes and we were only getting
this indirectly before.
2024-04-23 12:12:33 -07:00
Alyssa Rosenzweig
2372c9458b ConstProp: use SetWriteCursorBefore
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-23 13:09:45 -04:00
Alyssa Rosenzweig
1a11343f34 IREmitter: add SetWriteCursorBefore helper
This is subtle, add an ergonomic helper for it.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-23 13:09:45 -04:00
Ryan Houdek
84d5b3ee59
CPUID: Enable enhanced rep movs in more situations
Instead of only enabling enhanced rep movs if software TSO is disabled,
enable it if software TSO is disabled OR memcpysettso is disabled. This
is because we now hit the fast path when memcpysettso is disabled alone
but global TSO is enabled.

Retested Hades and performance was fine in this configuration.
2024-04-21 18:50:17 -07:00
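The change in the enabling condition from the commit above can be written as a pair of predicates. This is a minimal sketch: the function and parameter names are hypothetical and do not correspond to FEX's configuration API.

```python
def enhanced_rep_movs_old(tso_enabled):
    """Previous rule: advertise the feature only when software TSO is off."""
    return not tso_enabled


def enhanced_rep_movs_new(tso_enabled, memcpyset_tso_enabled):
    """New rule: advertise it when either TSO or memcpy/memset TSO is off."""
    return (not tso_enabled) or (not memcpyset_tso_enabled)


# The configuration that changes: global TSO on, memcpy/memset TSO off.
before = enhanced_rep_movs_old(tso_enabled=True)
after = enhanced_rep_movs_new(tso_enabled=True, memcpyset_tso_enabled=False)
```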
Ryan Houdek
376936c808
Merge pull request #3591 from alyssarosenzweig/ra/fix
JIT: fix ShiftFlags shuffles
2024-04-19 13:50:45 -07:00
Alyssa Rosenzweig
932b8f38f4 JIT: fix ShiftFlags shuffles
messed up my RA.

fixes ShiftPF.asm with jit_1 with a pathological register allocation

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-18 14:11:03 -04:00
Ryan Houdek
c8704a7f71
OpcodeDispatcher: Implement support for SMSW
Found out that Far Cry uses this instruction and it is viable to use in
CPL-3. This only returns constant data but its behaviour is a little
quirky.

This instruction has a weird behaviour that the 32-bit operation does an
insert in to the 64-bit destination, which might be an Intel versus AMD
behaviour. I don't have an Intel machine available to test if that
theory is true, though. This assumption would match similar behaviour
where segment registers are inserted instead of zext.

Gets the game farther but then it crashes in a `___ascii_strnicmp`
function where the arguments end up being `___ascii_strnicmp(nullptr, "Color", 5);`.
2024-04-18 07:41:39 -07:00
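The "insert" versus "zext" distinction from the SMSW commit above can be shown on plain integers. This is a sketch under assumptions: the helper names are illustrative, and the 32-bit value is an arbitrary placeholder, not the constant SMSW actually returns.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def write32_insert(dst64, value32):
    """Replace the low 32 bits of the destination, keeping the upper 32 bits."""
    return (dst64 & (MASK64 ^ MASK32)) | (value32 & MASK32)


def write32_zext(value32):
    """Usual 64-bit-mode behaviour for 32-bit writes: upper 32 bits become zero."""
    return value32 & MASK32


old_dst = 0x1122_3344_5566_7788
value = 0xAABB_CCDD  # placeholder, not the real SMSW result

inserted = write32_insert(old_dst, value)
zero_extended = write32_zext(value)
```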
Alyssa Rosenzweig
352dcdb478 RCLSE: disable store-after-store optimization
Functional revert of 92f31648b ("RCLSE: optimize out pointless stores"), which
reportedly regressed some titles due to RA doom. We'll revisit later, leaving in
the code for when RA is ready to light this up.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-17 14:53:11 -04:00
Paulo Matos
905aa935f5 Reformat until fixed-point
Followup to 2b4ec88dae.
Some files needed a couple of calls to clang-format 16.0.6 to
reach a fixed point.
2024-04-15 09:40:00 +02:00
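"Reformat until fixed-point" means running the formatter repeatedly until a pass changes nothing. The idea can be sketched generically; `toy_formatter` below is a stand-in that needs several passes to settle, and it is not clang-format.

```python
def format_until_fixed_point(text, format_once, max_passes=10):
    """Apply format_once until the output stops changing.

    Returns the stable text and the number of passes that were run,
    counting the final pass that confirmed nothing changed.
    """
    for passes in range(1, max_passes + 1):
        formatted = format_once(text)
        if formatted == text:
            return text, passes
        text = formatted
    raise RuntimeError("formatter did not converge")


def toy_formatter(text):
    """Stand-in formatter: collapses only one doubled space per pass."""
    return text.replace("  ", " ", 1)


stable, passes = format_until_fixed_point("a   b", toy_formatter)
```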
Ryan Houdek
7614ac9f14
Merge pull request #3573 from pmatos/RemoveClangTidy
Remove trace of clang-tidy experiment from CMakeLists.txt
2024-04-12 17:13:53 -07:00
Paulo Matos
2b4ec88dae Whole-tree reformat
This follows discussions from #3413.
Followup commits add clang-format file, script and blame ignore lists.
2024-04-12 16:26:02 +02:00
Paulo Matos
6524716404 Move const to the left in preparation for reformatting
clang-format-16 had some issues with const placement, so we are manually changing these.
2024-04-12 16:06:57 +02:00
Paulo Matos
20559853ee Remove trace of clang-tidy experiment from CMakeLists.txt
2024-04-12 12:31:04 +02:00
Ryan Houdek
a9b7ad841c
Merge pull request #3570 from bylaws/ec_pt8
Enable jemalloc for ARM64EC
2024-04-11 12:57:36 -07:00
Ryan Houdek
271700e9f6
Merge pull request #3568 from lioncash/const
X87: Simplify constant loading for FLD family
2024-04-11 00:32:26 -07:00
Ryan Houdek
1ba678f631
Merge pull request #3562 from Sonicadvance1/fix_rsp_store_tso
OpcodeDispatcher: Fixes disabling TSO access on RSP SIB stores
2024-04-11 00:32:14 -07:00
Ryan Houdek
a0f2cae1cb
Merge pull request #3567 from lioncash/veczero
IR: Remove VectorZero
2024-04-09 20:04:26 -07:00
Ryan Houdek
66cbb66732
Merge pull request #3566 from lioncash/address
OpcodeDispatcher: Add helper for making segment offset addresses
2024-04-09 17:31:43 -07:00
Billy Laws
f1f0c47f16 AllocatorHooks: Allow using jemalloc on win32
2024-04-09 23:42:23 +00:00
Lioncache
4cb2432b5c OpcodeDispatcher: Make use of new x87 constants
Now we can load these directly instead of needing to manually materialize them.
2024-04-09 10:17:15 -04:00
Lioncache
27ba66a181 IRDumper: Extend printer for NamedVectorConstant
Makes it aware of the x87 constants.
2024-04-09 10:13:35 -04:00
Lioncache
65b5281d7c IR: Add constants for FLD variants
2024-04-09 10:13:33 -04:00
Ryan Houdek
1a8b61b9fc
Merge pull request #3560 from bylaws/ec-pt6
FEXCore: Support x64 -> arm64ec calls
2024-04-09 07:08:38 -07:00
Ryan Houdek
f0dad86332
Merge pull request #3559 from bylaws/ec-pt5
LookupCache: Track ARM64EC page state in the code cache
2024-04-09 07:08:29 -07:00
Mai
eedb120fd0
Merge pull request #3563 from Sonicadvance1/fill_spill_pairs
JIT: Adds support for spilling/Filling GPRPair
2024-04-08 23:05:11 -04:00
Lioncache
98841fe07a IR: Remove VectorZero
We have LoadNamedVectorConstant that now performs this behavior while
also being more flexible.
2024-04-08 22:42:58 -04:00
Lioncache
b0aeb501f4 OpcodeDispatcher: Add helper for making segment offset addresses
There's quite a few places where the segment offset appending is open-coded
throughout the opcode dispatcher, but we can pull these out into a few
helpers to make the sites a little more compact and declarative.
2024-04-08 17:50:58 -04:00
Lioncache
b26bf2eaf6 DebugData: Remove header
This isn't included or used anywhere, so it can be removed.
2024-04-08 16:05:40 -04:00
Ryan Houdek
574fdcef32
JIT: Adds support for spilling/Filling GPRPair
Tony noticed this last week. I encountered it this week.
Add support for spilling and filling GPR pairs.
2024-04-08 11:55:29 -07:00
Ryan Houdek
0e93fd0f3e
OpcodeDispatcher: Fixes disabling TSO access on RSP SIB stores
GPR Direct/Indirect already had this and the SIB version was also already
supported on the load side. Fixes this missed behaviour.
2024-04-08 11:32:01 -07:00
Alyssa Rosenzweig
063954c9b3 ValueDominanceValidation: rm deadcode
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-08 13:49:35 -04:00
Alyssa Rosenzweig
7b3e031678 ValueDominanceValidation: forbid crossblock liveness
Now that we have successfully eliminated crossblock liveness from the IR we
generate, validate as much to ensure it doesn't come back. We will take
advantage of this new invariant in RA in the future.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-08 13:49:35 -04:00
Alyssa Rosenzweig
a775e474d5 ValueDominanceValidation: do not validate inline constants
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-08 13:49:35 -04:00
Alyssa Rosenzweig
0e99019586 ValueDominanceValidation: actually validate
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-08 13:49:35 -04:00
Alyssa Rosenzweig
d9493e5d9b OpcodeDispatcher: fix xblock liveness in xsave/xrstr
didn't fix this hard enough before. caught by validation.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-08 13:48:29 -04:00
Alyssa Rosenzweig
eb83c9e7f2 Core: use safe CondJump for self-modifying code
this ensures we put the StoreNZCV in the right block, which will fix validation
fails later in the series.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-08 13:48:29 -04:00
Ryan Houdek
bb24e1419c
Merge pull request #3558 from bylaws/ec-pt3
AllocatorHooks: Mark JIT code memory as EC code on ARM64EC
2024-04-08 09:32:51 -07:00
Billy Laws
526e3e654f LookupCache: Track ARM64EC page state in the code cache
Rather than checking the actual EC bitmap in the dispatcher (~6 instrs), this
indirection through the code cache allows just 1 instr for the hot path
of calling repeated EC code/x64 code.
2024-04-08 16:08:17 +01:00
Billy Laws
243bb45a68 FEXCore: Support x64 -> arm64ec calls
The frontend will provide the return logic via ExitFunctionEC, which
will be jumped to whenever there is an indirect branch/return to an addr
such that RtlIsEcCode(addr) returns true.
2024-04-06 13:20:48 +00:00
Billy Laws
bd5b817c3a AllocatorHooks: Mark JIT code memory as EC code on ARM64EC
Executable mapped memory is treated as x86 code by default when
running under EC, VirtualAlloc2 needs to be used together with a
special flag to map JIT arm64 code.
2024-04-06 12:40:52 +00:00
Alyssa Rosenzweig
95589f6172 OpcodeDispatcher: rm deferred variable shift flag calcs
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-05 20:34:05 -04:00
Alyssa Rosenzweig
098859caf7 OpcodeDispatcher: use _ShiftFlags for ASHR
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-05 20:34:05 -04:00
Alyssa Rosenzweig
c632543451 OpcodeDispatcher: use _ShiftFlags for SHRD
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-05 20:34:05 -04:00
Alyssa Rosenzweig
801cf72f95 OpcodeDispatcher: use _ShiftFlags for SHLD
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-05 20:34:05 -04:00
Alyssa Rosenzweig
650cd2c46e OpcodeDispatcher: use _ShiftFlags for SHR
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-05 20:34:05 -04:00
Alyssa Rosenzweig
5b48ce2228 OpcodeDispatcher: use _ShiftFlags for SHL
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-05 20:34:05 -04:00
Alyssa Rosenzweig
2173c26fd8 OpcodeDispatcher: add HandleShift helper
all the variable shift impls need to do this dance, make it common.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-05 20:34:05 -04:00
Alyssa Rosenzweig
982391ba9d IR: add ShiftFlags op
Generates flags for a variable shift as a dedicated IR op. This lets us optimize
around it (without generating control flow, relying on deferred flag infra,
etc). And it neatly solves our RA problem for shifts.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-05 20:34:05 -04:00
Alyssa Rosenzweig
a99c48b7a3 RedundantFlagCalculationElimination: do not eliminate if there are uses
we'll hit this with _ShiftFlags.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-05 20:34:05 -04:00
Alyssa Rosenzweig
3661da1bc6 OpcodeDispatcher: calculate deferred flags before RMW on NZCV
otherwise we might have the wrong input NZCV.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-05 20:34:05 -04:00
Alyssa Rosenzweig
859df5e0b2 OpcodeDispatcher: optimize shl flag
This is something the new shift flag code will do. Backporting the opt since
that's stalled and this reduces the diff.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-05 19:38:44 -04:00
Ryan Houdek
7786c23405
Merge pull request #3556 from Sonicadvance1/move_app_config
FEXCore: Fixes priority of FEX_APP_CONFIG
2024-04-05 15:18:24 -07:00
Ryan Houdek
904646e93b
FEXCore: Fixes priority of FEX_APP_CONFIG
This environment variable had an incorrect priority on the configuration
system. The expectation was higher priority than most other layers.

Now the only layer that has higher priority is the environment
variables.
2024-04-05 13:10:43 -07:00
Alyssa Rosenzweig
a05cc06ab4 OpcodeDispatcher: unify imm/1-bit ASHR
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-04 07:42:15 -04:00
Alyssa Rosenzweig
031e756a78 OpcodeDispatcher: unify imm/1-bit SHR
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-04 07:42:15 -04:00
Alyssa Rosenzweig
2a9f1ce8cb OpcodeDispatcher: unify imm/1-bit SHL
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-04 07:42:15 -04:00
Alyssa Rosenzweig
8c53a9f051 OpcodeDispatcher: use LoadConstantShift for rotates
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-04 07:42:15 -04:00
Alyssa Rosenzweig
cf26ec7898 OpcodeDispatcher: use LoadConstantShift for SHRD
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-04 07:42:15 -04:00
Alyssa Rosenzweig
582c3dae6e OpcodeDispatcher: use LoadConstantShift for SHLD
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-04 07:42:15 -04:00
Alyssa Rosenzweig
2abac03ab0 OpcodeDispatcher: add LoadConstantShift helper
shows up a bunch

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-04 07:42:15 -04:00
Alyssa Rosenzweig
8cc684fa12 OpcodeDispatcher: drop misinformed comment
tbnz only tests a single bit, not a mask.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-04 07:42:15 -04:00
Alyssa Rosenzweig
d92de1d947 OpcodeDispatcher: drop result masking for shifts
flag calcs are fine with upper garbage.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-04 07:42:15 -04:00
Alyssa Rosenzweig
202a60b77a
Merge pull request #3549 from alyssarosenzweig/constprop/dce
ConstProp: drop dead code
2024-04-03 11:22:30 -04:00
Alyssa Rosenzweig
e07c81a5e7 ConstProp: also negate sub -> add
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-02 13:56:59 -04:00
Alyssa Rosenzweig
fa76961873 ConstProp: negate adds -> subs
the arm ops are equiv, even though the x86 isn't (due to inverted carry).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-02 13:53:21 -04:00
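One reading of the "inverted carry" remark in the commit above is that ARM's subtract sets carry as NOT borrow, while x86's subtract sets CF as borrow. The sketch below checks this on an 8-bit model for nonzero constants only. The helper names are illustrative, and whether the FEX pass restricts constants this way is not stated in the commit.

```python
BITS = 8
MASK = (1 << BITS) - 1


def arm_adds_carry(x, y):
    return (x + y) > MASK


def arm_subs_carry(x, y):
    # ARM computes x + ~y + 1; carry set means "no borrow".
    return (x + (~y & MASK) + 1) > MASK


def x86_add_cf(x, y):
    return (x + y) > MASK


def x86_sub_cf(x, y):
    # x86 CF after SUB is the borrow.
    return x < y


def check_negated_constant():
    for x in range(1 << BITS):
        for c in range(1, 1 << BITS):  # nonzero constants only
            neg_c = (-c) & MASK
            # The result value is identical either way.
            assert ((x + neg_c) & MASK) == ((x - c) & MASK)
            # ARM: carry agrees between add-of-negated and sub.
            assert arm_adds_carry(x, neg_c) == arm_subs_carry(x, c)
            # x86: CF is inverted between the two forms.
            assert x86_add_cf(x, neg_c) != x86_sub_cf(x, c)
    return True


ok = check_negated_constant()
```

The constant 0 is excluded because in this model the ARM carry differs there: subtracting 0 sets carry, adding 0 does not.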
Alyssa Rosenzweig
b92c206db9 ConstProp: rm your deadcode
not sure who this is supposed to be helping.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-02 13:21:48 -04:00
Alyssa Rosenzweig
efff942724 ConstProp: drop my deadcode
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-02 13:21:48 -04:00
Alyssa Rosenzweig
8d32113521 ConstProp: rm relic of x86 jit
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-02 13:21:48 -04:00
Alyssa Rosenzweig
37f2b417e4
Merge pull request #3546 from alyssarosenzweig/flag/cleanup
Minor cleanups around flags
2024-04-02 11:29:19 -04:00
Alyssa Rosenzweig
bd0b5eceb8
Merge pull request #3545 from alyssarosenzweig/opt/pf-scalar
Use scalar integer code to calculate PF
2024-04-02 11:28:53 -04:00
Alyssa Rosenzweig
b632f7215c
Merge pull request #3544 from alyssarosenzweig/ra/zero-multiple
OpcodeDispatcher: drop ZeroMultipleFlags
2024-04-02 11:27:45 -04:00
Ryan Houdek
e8abc88702
Merge pull request #3542 from alyssarosenzweig/ra/rep
Eliminate xblock liveness with rep cmp/lod/scas
2024-04-02 04:24:24 -07:00
Ryan Houdek
29c6281e11
Merge pull request #3539 from alyssarosenzweig/ra/rol-ror2
rewrite ROL/ROR
2024-04-02 00:17:08 -07:00
Alyssa Rosenzweig
067a5444dc OpcodeDispatcher: use HandleNZ00Write
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-01 16:42:38 -04:00
Alyssa Rosenzweig
c7f159972d OpcodeDispatcher: rm pointless NZCV loads
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-01 16:42:38 -04:00
Alyssa Rosenzweig
a70d0a5dd4 OpcodeDispatcher: rm unnecessary NZCV dirtying
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-01 16:42:38 -04:00
Ryan Houdek
cd9ffd2045
Merge pull request #3536 from alyssarosenzweig/ra/rcl-rcr
OpcodeDispatcher: eliminate xblock liveness for rcl/rcr
2024-04-01 11:44:37 -07:00
Alyssa Rosenzweig
eb4bb5875e OpcodeDispatcher: absorb invert into PF calculation
with xorn

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-01 14:12:33 -04:00
Alyssa Rosenzweig
3b052e826f OpcodeDispatcher: calculate PF with integer ops
based on clang's __builtin_parity

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-01 14:12:32 -04:00
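Computing the parity flag with scalar integer ops usually means an XOR fold over the low byte. The following is a minimal sketch of that general technique, not the exact IR sequence FEX emits. x86 sets PF when the low 8 bits of the result contain an even number of set bits.

```python
def parity_flag(result):
    """x86 PF: 1 if the low byte of `result` has an even number of set bits."""
    x = result & 0xFF
    x ^= x >> 4  # fold 8 bits into 4
    x ^= x >> 2  # fold 4 bits into 2
    x ^= x >> 1  # fold 2 bits into 1
    return (x & 1) ^ 1  # invert: an odd popcount gives PF = 0


pf_zero = parity_flag(0x00)   # no bits set, even
pf_one = parity_flag(0x01)    # one bit set, odd
pf_three = parity_flag(0x03)  # two bits set, even
```

The final inversion is the step that a later commit in this series describes folding into the calculation itself.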
Alyssa Rosenzweig
65ec191dc1 IR: add XornShift
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-01 14:12:32 -04:00
Alyssa Rosenzweig
b1ddd8cd3b
Merge pull request #3541 from alyssarosenzweig/opt/clc
optimize clc
2024-04-01 13:51:10 -04:00
Alyssa Rosenzweig
f2d001e721
Merge pull request #3543 from alyssarosenzweig/ra/dead-code
RA: drop dead block interference code
2024-04-01 13:51:00 -04:00
Alyssa Rosenzweig
7852909cc4 OpcodeDispatcher: simplify IsNZCV
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-01 13:50:00 -04:00
Alyssa Rosenzweig
f8b68d8b5a OpcodeDispatcher: drop ZeroMultipleFlags
lot of complexity for only a single interesting case. we can massively simplify.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-04-01 13:48:11 -04:00
Ryan Houdek
e2a095372e
Merge pull request #3534 from Sonicadvance1/move_ir_defines
FEXCore: Move nearly all IR definitions to internal
2024-04-01 10:00:20 -07:00
Ryan Houdek
5c29c9d464
Merge pull request #3527 from Sonicadvance1/move_type_defines
Moves FHU TypeDefines to FEXCore includes
2024-04-01 08:57:22 -07:00
Ryan Houdek
3bed305660
Merge pull request #3526 from Sonicadvance1/move_codeloader
FEXCore: Moves CodeLoader to frontend
2024-04-01 07:52:02 -07:00
Ryan Houdek
f6639c3594
Merge pull request #3525 from Sonicadvance1/move_cpubackend
FEXCore: Moves CPUBackend definition internal
2024-04-01 06:47:34 -07:00
Alyssa Rosenzweig
ca1ec232c9 RA: drop dead block interference code
Unused, and new RA won't use it either. Torch it.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-31 20:51:11 -04:00
Alyssa Rosenzweig
7b1bb159fa OpcodeDispatcher: use ForeachDirection for scas
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-31 20:31:38 -04:00
Alyssa Rosenzweig
5c7f2934de OpcodeDispatcher: use ForeachDirection for lods
eliminates xblock live

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-31 20:29:29 -04:00
Alyssa Rosenzweig
5d79d4eb50 OpcodeDispatcher: use ForeachDirection for CMPS
eliminates xblock liveness

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-31 20:29:16 -04:00
Alyssa Rosenzweig
3f66173bc7 OpcodeDispatcher: add ForeachDirection helper
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-31 20:28:56 -04:00
Alyssa Rosenzweig
4452f0acba ConstProp: optimize rmif with 0 for clc
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-31 20:01:44 -04:00
Alyssa Rosenzweig
1a1545da0f OpcodeDispatcher: rework rep cmp
1. pull flag calculation out of the loop body for perf
2. fully rotate the inner loop to save an instruction per iteration
3. hoist the rcx=0 jump to avoid computing df when rcx=0

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-31 19:54:00 -04:00
Alyssa Rosenzweig
a70ea30c02 IR: add CondSubNZCV (ccmp) instruction
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-31 17:50:57 -04:00
Alyssa Rosenzweig
15b86e4c5a OpcodeDispatcher: rewrite ROL/ROR
single unified implementation for ROL & ROR (instead of 4 cases). no more
deferred flags because it's easy to shoot ourselves in the foot with deferred
flags w.r.t the new RA design, and rotates are rare enough with very efficient
flag calculations such that the extra JIT overhead should be minimal to DCE the
resulting calculations later.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-31 14:47:19 -04:00
Alyssa Rosenzweig
67baff8a57
Merge pull request #3537 from Sonicadvance1/remove_vla_ra
RA: Removes VLA usage
2024-03-31 14:44:37 -04:00
Ryan Houdek
fedc24be1e
RA: Removes VLA usage
Just like #3508, clang-18 complains about VLA usage.

This vector is relatively small, only around 18 elements but is
semi-dynamic depending on arch and if FEXCore is targeting Linux or
Win32.
2024-03-30 16:50:04 -07:00
Alyssa Rosenzweig
6f5e4fd34b OpcodeDispatcher: add non-flag calc version of ShiftVariable
more correct for rcl, etc

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-30 14:30:36 -04:00
Alyssa Rosenzweig
bdda99e44f ConstProp: constant fold Neg
will come up with rotate in the next patch

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-30 14:15:11 -04:00
Alyssa Rosenzweig
706065b0e2 OpcodeDispatcher: accelerate cmpxchg with flagm
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-30 14:12:59 -04:00
Alyssa Rosenzweig
9fd32f07cb JIT: preserve nzcv for the slow atomic path
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-30 14:12:59 -04:00
Alyssa Rosenzweig
deba6a1b76 JIT: add comment about unaligned backpatching
save future me some grief.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-30 14:12:59 -04:00
Alyssa Rosenzweig
d25ace43aa
Merge pull request #3528 from alyssarosenzweig/ra/xsave-xrstor
Eliminate crossblock liveness in xsave/xrstor
2024-03-30 14:11:25 -04:00
Alyssa Rosenzweig
9010b3c117 OpcodeDispatcher: use neg trick for rcl smaller
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-30 13:42:41 -04:00
Alyssa Rosenzweig
b0e001b660 OpcodeDispatcher: elim xblock live for smaller rcl
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-30 13:42:41 -04:00
Alyssa Rosenzweig
eadacbd67b OpcodeDispatcher: elim xblock live with smaller rcr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-30 13:42:03 -04:00
Alyssa Rosenzweig
7de29749be OpcodeDispatcher: eliminate xblock live for rcl
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-30 13:42:03 -04:00
Alyssa Rosenzweig
1f3843ccad OpcodeDispatcher: eliminate xblock live for rcr
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-30 13:42:03 -04:00
Alyssa Rosenzweig
bc76df9901 OpcodeDispatcher: add non-flag calc version of ShiftVariable
more correct for rcl, etc

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-30 13:42:03 -04:00
Alyssa Rosenzweig
6e92cc454d ConstProp: constant fold Neg
will come up with rotate in the next patch

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-30 13:42:03 -04:00
Ryan Houdek
ed3af580c5
FEXCore: Move nearly all IR definitions to internal
It has been a long time coming that FEX no longer needs to leak IR
implementation details to the frontend; this was legacy due to IR CI and
various other problems.

Now that the last bits of IR leaking have been removed, move everything
that we can internally to the implementation.
We still have a couple of minor details in the exposed IR.h to the
frontend, but these are limited to a few enums and some thunking struct
information rather than all the implementation details.

No functional change with this, just moving headers around.
2024-03-29 17:20:18 -07:00
Ryan Houdek
8564290f76
FEXCore: Remove DebugStore map
This hasn't been used and is blocking refactoring more code.
2024-03-29 14:58:44 -07:00
Alyssa Rosenzweig
c513b9685d OpcodeDispatcher: eliminate crossblock liveness in xsave/xrstor
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-29 09:57:16 -04:00
Ryan Houdek
d11a36eaea
Moves FHU TypeDefines to FEXCore includes
FEXCore includes was including an FHU header which would result in
compilation failure for external projects trying to link to libFEXCore.

Moves it over to fix this, it was the only FHU usage in FEXCore/include
NFC
2024-03-29 02:54:54 -07:00
Ryan Houdek
f46e88ebdb
FEXCore: Moves CPUBackend definition internal
This is no longer necessary to be part of the public API. Moves the
header internally.

Needed to pass through `IsAddressInCodeBuffer` from CPUBackend through
the Context object, but otherwise no functional change.
2024-03-29 02:27:29 -07:00
Ryan Houdek
20eb338644
FEXCore: Moves CodeLoader to frontend
FEXCore no longer has a need for this since a bunch of related code was
already moved to the frontend. Move the CodeLoader now.
2024-03-29 02:24:53 -07:00
Ryan Houdek
aa26b6288e
Merge pull request #3522 from alyssarosenzweig/ra/cmpxchg8
OpcodeDispatcher: eliminate branch in cmpxchg pair
2024-03-27 21:56:19 -07:00
Ryan Houdek
624bc3fce5
Merge pull request #3520 from Sonicadvance1/sleep_process
FEXLoader: Add a way to sleep a process on startup
2024-03-27 18:35:06 -07:00
Alyssa Rosenzweig
61758ea47d OpcodeDispatcher: eliminate branch in cmpxchg pair
In the old case:

* if we take the branch, 1 instruction
* if we don't take the branch, 3 instructions
* branch predictor fun
* 3 instructions of icache pressure

In the new case:

* unconditionally 2 instructions
* no branch predictor dependence
* 2 instructions of icache pressure

This should not be non-negligibly worse, and it simplifies things for RA.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-27 12:40:06 -04:00
Ryan Houdek
7b74ca1931
Merge pull request #3514 from alyssarosenzweig/opt/demon
rewrite Demon Addition Adjust (DAA) and other demonic opcodes
2024-03-26 23:24:00 -07:00
Ryan Houdek
24fd28ed9e
Merge pull request #3511 from Sonicadvance1/more_tso_levers
FEXCore: Adds more TSO control levers
2024-03-26 23:23:41 -07:00
Ryan Houdek
970d5d5b13
Merge pull request #3509 from Sonicadvance1/allow_telemetry_redirect
Telemetry: Allow redirecting directory that data is written to
2024-03-26 23:23:05 -07:00
Ryan Houdek
7f90ca53f7
Merge pull request #3505 from Sonicadvance1/telemetry_noncanonical
Telemetry: Adds tracker for non-canonical memory access crash
2024-03-26 23:21:32 -07:00
Ryan Houdek
ade0c46845
FEXLoader: Add a way to sleep a process on startup
I find myself reimplementing this nearly monthly. Actually codify it so
I can stop reimplementing it.
2024-03-26 07:48:09 -07:00
Ryan Houdek
6f29e75f67
FEXCore: Removes vestigial mman SMC checking
This wasn't actually wired up to anything ever since some refactoring
occurred two years ago.
2024-03-26 02:56:26 -07:00
Alyssa Rosenzweig
dfe0bdd7f2 OpcodeDispatcher: rewrite DAS
exhaustively checked against the Intel pseudocode since this is tricky:

  def intel(AL, CF, AF):
      old_AL = AL
      old_CF = CF
      CF = False

      if (AL & 0x0F) > 9 or AF:
          Borrow = AL < 6
          AL = (AL - 6) & 0xff
          CF = old_CF or Borrow
          AF = True
      else:
          AF = False

      if (old_AL > 0x99) or old_CF:
          AL = (AL - 0x60) & 0xff
          CF = True

      return (AL & 0xff, CF, AF)

  def fex(AL, CF, AF):
      AF = AF | ((AL & 0xf) > 9)
      CF = CF | (AL > 0x99)
      NewCF = CF | (AF if (AL < 6) else CF)
      AL = (AL - 6) if AF else AL
      AL = (AL - 0x60) if CF else AL
      return (AL & 0xff, NewCF, AF)

  for AL in range(256):
      for CF in [False, True]:
          for AF in [False, True]:
              ref = intel(AL, CF, AF)
              test = fex(AL, CF, AF)
              print(AL, "CF" if CF else "", "AF" if AF else "", ref, test)
              assert(ref == test)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-25 19:43:10 -04:00
Alyssa Rosenzweig
e26481e3cc OpcodeDispatcher: simplify AAM
in the area.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-25 19:43:10 -04:00
Alyssa Rosenzweig
86b5a2f352 OpcodeDispatcher: simplify AAD
noticed in the area.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-25 19:43:10 -04:00
Alyssa Rosenzweig
2bf880c43a OpcodeDispatcher: rewrite AAS
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-25 19:43:10 -04:00
Alyssa Rosenzweig
583d4f8f94 OpcodeDispatcher: factor out CalculateAFForDecimal
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-25 19:43:10 -04:00
Alyssa Rosenzweig
3ca2c4377f OpcodeDispatcher: rewrite AAA
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-25 19:43:10 -04:00
Ryan Houdek
76983476b9
Merge pull request #3504 from Sonicadvance1/fix_loop_a16
OpcodeDispatcher: Fixes 32-bit mode LOOP RCX register usage
2024-03-25 12:18:14 -07:00
Alyssa Rosenzweig
949717a95f OpcodeDispatcher: rewrite DAA implementation
Based on https://www.righto.com/2023/01/

New implementation is branchless, which is theoretically easier to RA. It's also
massively simpler which is good for a demon opcode.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-25 13:00:59 -04:00
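In the same spirit as the DAS check script above, a branchless DAA formulation can be compared exhaustively against the Intel pseudocode. This is a sketch under assumptions: `daa_branchless` illustrates the general branch-free idea and is not taken from the FEX source.

```python
def daa_reference(al, cf, af):
    """DAA following the Intel pseudocode."""
    old_al, old_cf = al, cf
    cf = False
    if (al & 0x0F) > 9 or af:
        carry = (al + 6) > 0xFF
        al = (al + 6) & 0xFF
        cf = old_cf or carry
        af = True
    else:
        af = False
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF
        cf = True
    else:
        cf = False
    return (al, cf, af)


def daa_branchless(al, cf, af):
    """Branch-free form: compute both flags first, then apply both adjustments."""
    af = af or (al & 0x0F) > 9
    cf = cf or al > 0x99
    al = (al + 6 * af + 0x60 * cf) & 0xFF
    return (al, cf, af)


mismatches = [
    (al, cf, af)
    for al in range(256)
    for cf in (False, True)
    for af in (False, True)
    if daa_reference(al, cf, af) != daa_branchless(al, cf, af)
]
```

An empty `mismatches` list means the two versions agree on every input.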
Alyssa Rosenzweig
693d86dd67 OpcodeDispatcher: add SetAFAndFixup helper
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-25 12:59:19 -04:00
Ryan Houdek
3034edb0aa
RA: Adds RIP when a block panic spills
I find myself adding this every time I find a game that panic spills.
Let's just print it out.
2024-03-24 17:11:29 -07:00
Ryan Houdek
64f47d1ec2
FEXCore: Adds more TSO control levers
Lets us control vector loadstores and memcpy/memset TSO visibility.
This just gives us a bit more configuration rather than TSO off or on.
2024-03-24 16:34:18 -07:00
Ryan Houdek
70befc216f
Telemetry: Allow redirecting directory that data is written to
This will be necessary
2024-03-24 00:47:35 -07:00
Ryan Houdek
4952b2e16c
Telemetry: Rename old file instead of copying
Since we do an immediate overwrite of the file we are copying, we can
instead do a rename. Failure on rename is fine: it either means the
telemetry file didn't exist initially, or there was some other permission
error, so the telemetry will get lost regardless.
2024-03-21 22:51:20 -07:00
Ryan Houdek
5a35e119fe
Telemetry: Adds tracker for non-canonical memory access crash
This may be useful for tracking TSO faulting when it manages to fetch
stale data. While most TSO crashes are due to nullptr dereferences, this
can still check for the corruption case.
2024-03-21 20:47:36 -07:00
Ryan Houdek
824f122680
OpcodeDispatcher: Fixes 32-bit mode LOOP RCX register usage
In 64-bit mode, the LOOP instruction's RCX register usage is 64-bit or
32-bit.
In 32-bit mode, the LOOP instruction's RCX register usage is 32-bit or
16-bit.

FEX wasn't handling the 16-bit case at all which was causing the LOOP
instruction to effectively always operate at 32-bit size. Now this is
correctly supported, and it also stops treating the operation as 64-bit.
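
The width rule above can be sketched as follows. This is an illustrative
model only, not FEX's code: `loop_counter_bits` and `loop_step` are
hypothetical names, and the assumption that the narrower width is selected by
the address-size override prefix comes from the x86 architecture rather than
from this commit.

```python
def loop_counter_bits(is_64bit_mode, has_addr_size_prefix):
    """Width in bits of the counter register that LOOP decrements."""
    if is_64bit_mode:
        return 32 if has_addr_size_prefix else 64
    return 16 if has_addr_size_prefix else 32


def loop_step(reg_value, bits):
    """Decrement the low `bits` of the counter; return (counter, branch_taken)."""
    mask = (1 << bits) - 1
    counter = ((reg_value & mask) - 1) & mask
    return counter, counter != 0


# With CX == 1 but garbage in the upper half of ECX, a 16-bit LOOP must fall
# through, while treating the counter as 32-bit would wrongly take the branch.
print(loop_step(0x10001, loop_counter_bits(False, True)))   # (0, False)
print(loop_step(0x10001, loop_counter_bits(False, False)))  # (65536, True)
```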
2024-03-21 20:13:15 -07:00
Ryan Houdek
8852d94416
Merge pull request #3503 from alyssarosenzweig/opt/loop
OpcodeDispatcher: optimize LOOP/N/E
2024-03-21 20:05:50 -07:00
Alyssa Rosenzweig
82ba16c6ed OpcodeDispatcher: optimize LOOP/N/E
Don't clobber NZCV.

Before/after assembly from the Primary_E1 unit test:

< 4340: [INFO] cset w20, ne
< 4340: [INFO] mrs x21, nzcv
< 4340: [INFO] cmp x5, #0x0 (0)
< 4340: [INFO] cset x22, ne
< 4340: [INFO] and x20, x22, x20
< 4340: [INFO] msr nzcv, x21
< 4340: [INFO] cbnz x20, #+0x8 (addr 0xffff896f8084)
< 4340: [INFO] b #+0x1c (addr 0xffff896f809c)
< 4340: [INFO] ldr x0, pc+8 (addr 0xffff896f808c)
---
> 4340: [INFO] csel x20, x5, xzr, ne
> 4340: [INFO] cbnz x20, #+0x8 (addr 0xfffed7308070)
> 4340: [INFO] b #+0x1c (addr 0xfffed7308088)
> 4340: [INFO] ldr x0, pc+8 (addr 0xfffed7308078)
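
A quick brute-force check, in Python, that the two sequences above decide the
branch the same way. This is only a model of the logic as I read the
assembly: `taken_before` and `taken_after` are hypothetical names, `cond`
stands for the `ne` condition and `rcx` for the value in x5.

```python
def taken_before(cond, rcx):
    # cset w20, ne ; cmp x5, #0 ; cset x22, ne ; and x20, x22, x20 ; cbnz x20
    a = 1 if cond else 0
    b = 1 if rcx != 0 else 0
    return (a & b) != 0


def taken_after(cond, rcx):
    # csel x20, x5, xzr, ne ; cbnz x20
    selected = rcx if cond else 0
    return selected != 0


for cond in [False, True]:
    for rcx in [0, 1, 2, 0xFFFFFFFFFFFFFFFF]:
        assert taken_before(cond, rcx) == taken_after(cond, rcx)
```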

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-21 12:08:40 -04:00
Ryan Houdek
45ea0cd782
Removes false termux support
Having this here was a funny joke, but it is fundamentally
incompatible with what we're doing. All those users are running proot
anyway because of how broken running under termux directly is.

Just remove this from here.
2024-03-20 22:04:32 -07:00
Billy Laws
d490cb1b79 FEXCore: Fallback to the memcpy slow path for overlaps within 32 bytes
Take e.g. a forward rep movsb copy from addr 0 to 1, the expected
behaviour since this is a bytewise copy is:
before: aaabbbb...
after: aaaaaaa...
but by copying in 32-byte chunks we end up with:
after: aaaabbbb...
due to the self overwrites not occurring within a single 32-byte copy.
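
The difference can be reproduced with a small Python model. This is a sketch,
not the FEX implementation: `copy_bytewise`, `copy_chunked` and
`overlaps_within_chunk` are hypothetical names, and the overlap test is only a
conservative guess at the kind of check that would select the slow path.

```python
def copy_bytewise(buf, src, dst, count):
    """Forward byte-by-byte copy, as rep movsb behaves with DF=0."""
    out = bytearray(buf)
    for i in range(count):
        out[dst + i] = out[src + i]
    return bytes(out)


def copy_chunked(buf, src, dst, count, chunk=32):
    """Forward copy that loads a whole chunk before storing it."""
    out = bytearray(buf)
    for off in range(0, count, chunk):
        n = min(chunk, count - off)
        block = bytes(out[src + off:src + off + n])  # load the chunk first
        out[dst + off:dst + off + n] = block         # then store it
    return bytes(out)


def overlaps_within_chunk(src, dst, chunk=32):
    """Assumed check: source and destination are closer than one chunk."""
    return abs(dst - src) < chunk


before = b"aaabbbb"
print(copy_bytewise(before, 0, 1, 6))  # b'aaaaaaa'
print(copy_chunked(before, 0, 1, 6))   # b'aaaabbb'
print(overlaps_within_chunk(0, 1))     # True
```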
2024-03-20 20:54:19 +00:00
Billy Laws
94fecb9dad FEXCore: Remove needless alignment checks for the mem{cpy,set} fastpath 2024-03-20 20:54:09 +00:00
Ryan Houdek
7dcacfe990
Merge pull request #3478 from bylaws/memcpy
FEXCore: Add non-atomic Memcpy and Memset IR fast paths
2024-03-18 18:56:44 -07:00
Billy Laws
8d4d8fe3e5 FEXCore: Add non-atomic Memcpy and Memset IR fast paths
When TSO is disabled, vector LDP/STP can be used for a two
instruction 32 byte memory copy which is significantly faster than the
current byte-by-byte copy. Performing two such copies directly after
one another also marginally increases copy speed for all sizes >=64.
2024-03-18 23:28:50 +00:00
Ryan Houdek
ab8ee64352
Merge pull request #3497 from Sonicadvance1/movmaskb_constant
JIT: Optimize pmovmaskb with a named vector constant
2024-03-18 16:08:40 -07:00
Alyssa Rosenzweig
2a9fcc6a66
Merge pull request #3492 from Sonicadvance1/implement_prefetch
OpcodeDispatcher: Implement support for the various prefetch instructions
2024-03-18 07:49:47 -04:00
Ryan Houdek
fd391b1b18
JIT: Optimize pmovmaskb with a named vector constant
I was looking at some other JIT overheads and this cropped up as some
overhead. Instead of materializing a constant using mov+movk+movk+movk,
load it from the named vector constant array.

In a micro-benchmark this improved performance by 34%.
In bytemark this improved one subbench by 0.82%
2024-03-17 18:40:46 -07:00
Ryan Houdek
f79991a9d8
OpcodeDispatcher: Implement rdpid
Missed this instruction when implementing rdtscp. Returns the same ID
result in a register just like rdtscp, but without the cycle counter
results. Doesn't touch any flags just like rdtscp.
2024-03-14 20:07:58 -07:00
Ryan Houdek
ca6b2e43e6
Merge pull request #3491 from alyssarosenzweig/rclse/waw
RCLSE: Optimize store-after-store
2024-03-14 03:23:05 -07:00
Ryan Houdek
8a3d08e1d8
Merge pull request #3483 from neobrain/refactor_stealmemoryregion
Allocator: Cleanup StealMemoryRegions implementation
2024-03-14 03:21:09 -07:00
Ryan Houdek
8056bee82b
OpcodeDispatcher: Implement support for the various prefetch instructions
x86 has a few prefetch instructions.
- prefetch - One of two classic 3DNow! instructions
   - Prefetch into L1 data cache
- prefetchw - One of two classic 3DNow! instructions
   - Implies prefetch into L1 data cache
   - Prefetch cacheline with intent to write and exclusive ownership

- prefetchnta
   - Prefetch non-temporal data with respect to /all/ cache levels
   - Assumes inclusive caches?
- prefetch{t0,t1,t2}
   - Prefetch data with respect to each cache level
   - T0 = L1 and higher
   - T1 = L2 and higher
   - T2 = L3 and higher

**Some silly duplicates**
- prefetchwt1
   - Duplicate of prefetchw but explicitly L1 data cache
- prefetch_exclusive
   - Duplicate of prefetch

God Of War 2018 uses prefetchw as a hint for exclusive ownership of the
cacheline in some very aggressive spin-loops. Let's implement the
operations to help it along.
2024-03-12 21:37:31 -07:00
Ryan Houdek
cc635a54f8
IR: Implements support for prefetch operation 2024-03-12 21:19:50 -07:00
Ryan Houdek
217d9d8c50
ARMEmitter: Fixes prfm with negative or unaligned offsets 2024-03-12 21:18:23 -07:00
Tony Wasserka
a047ac1699 Allocator: Test CollectMemoryGaps instead of StealMemoryRegions and restore the original interfaces 2024-03-12 10:49:31 +01:00
Tony Wasserka
bb0b114fc8 Allocator: Miscellaneous cleanups 2024-03-12 10:49:30 +01:00
Tony Wasserka
ccd6c15316 Allocator: Use std::from_chars instead of parsing digits manually 2024-03-12 10:49:30 +01:00
Tony Wasserka
0a1fe1c8c2 Allocator: Parse process mappings per-line instead of per-character 2024-03-12 10:49:30 +01:00
Tony Wasserka
f43fe5fd63 Allocator: Stop parsing more eagerly
This is a soft-revert of eaf83aa. That change is no longer needed, since the
stack special case is handled externally now.
2024-03-12 10:49:30 +01:00
Tony Wasserka
dce9f651fd Allocator: Split off memory gap collection to a separate function
This function can be unit-tested more easily, and the stack special case is more
cleanly handled as a post-collection step.

There is a minor functional change: The stack special case didn't trigger
previously if the range end was within the stack mapping. This is now fixed.
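
A rough Python sketch of what a standalone gap-collection step could look
like. It is not the FEX C++ code: `parse_mapping` and `collect_memory_gaps`
are hypothetical names, the input is assumed to be address-sorted lines in
the /proc/<pid>/maps format, and the stack special case is deliberately left
out.

```python
def parse_mapping(line):
    """Return (start, end) from one /proc/<pid>/maps-style line."""
    addr_range = line.split(maxsplit=1)[0]
    start, end = addr_range.split("-")
    return int(start, 16), int(end, 16)


def collect_memory_gaps(lines, begin, end):
    """Return the unmapped [lo, hi) ranges inside [begin, end)."""
    gaps = []
    cursor = begin
    for line in lines:
        if not line.strip():
            continue
        map_start, map_end = parse_mapping(line)
        if map_end <= cursor:
            continue  # mapping lies entirely below the range of interest
        if map_start >= end:
            break     # mapping lies entirely above the range of interest
        if map_start > cursor:
            gaps.append((cursor, map_start))
        cursor = max(cursor, map_end)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps


maps = [
    "1000-2000 r-xp 00000000 00:00 0",
    "4000-5000 rw-p 00000000 00:00 0 [stack]",
]
# Range end inside the stack mapping: no gap is reported past 0x4000.
print(collect_memory_gaps(maps, 0x0, 0x4800))  # [(0, 4096), (8192, 16384)]
```

Keeping this step free of side effects is what makes it easy to unit-test:
it takes text in and returns a list of ranges.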
2024-03-12 10:49:30 +01:00
Tony Wasserka
0d71f169d0 Allocator: Adopt a more testable interface for StealMemoryRegions 2024-03-12 10:49:30 +01:00
Tony Wasserka
430ac0f70a Allocator: Fix format strings 2024-03-12 10:49:30 +01:00
Alyssa Rosenzweig
7629007cfa OpcodeDispatcher: allow upper garbage on STOS
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
Alyssa Rosenzweig
03c6abdad4 OpcodeDispatcher: optimize DF add
fuse the shift the right way

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
Alyssa Rosenzweig
c99cbe6d0a JIT: switch DF representation
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
Alyssa Rosenzweig
e3ee65e491 OpcodeDispatcher: use transformed DF for memset/memcpy
Use the 1/-1 representation instead of 0/1. This will be better by the end of
the series.
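
To illustrate the idea in Python: with the flag stored as 0/1, every string
op has to turn it into a direction first, whereas a stored 1/-1 can be
multiplied into the element size directly. The helper names below are
hypothetical and this is not the IR that FEX emits.

```python
def df_to_step_sign(df_flag):
    """Map the architectural DF bit (0 = forward, 1 = backward) to +1 / -1."""
    return 1 - 2 * df_flag


def advance(ptr, elem_size, df_sign):
    """Step a string-op pointer using the transformed 1/-1 representation."""
    return ptr + df_sign * elem_size


print(hex(advance(0x1000, 4, df_to_step_sign(0))))  # 0x1004
print(hex(advance(0x1000, 4, df_to_step_sign(1))))  # 0xffc
```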

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
Alyssa Rosenzweig
aee00f524c OpcodeDispatcher: use DF retrieval helpers
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
Alyssa Rosenzweig
a76321c6c1 OpcodeDispatcher: add DF retrieval helpers
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
Alyssa Rosenzweig
f7586f4459 CoreState: use x86 enums for readability
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
Ryan Houdek
c37a12e806
Merge pull request #3490 from Sonicadvance1/disable_assert
Disable assert in release
2024-03-11 15:48:18 -07:00
Alyssa Rosenzweig
92f31648b9 RCLSE: optimize out pointless stores
can help a lot of x86 code because x86 is 2-address and a64 is 3-address, so x86
ends up with piles of movs that end up dead after translation

It's not a win across the board because our RA isn't aware of tied registers so
sometimes we regress moves. But it's a win on average, and the RA bits can be
improved with time.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:41:23 -04:00
Alyssa Rosenzweig
85f8ad3842 JIT: fix sha256msg1 encoding
botched move in the !tied reg case.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:41:23 -04:00
Ryan Houdek
ff0c7637c9
Merge pull request #3421 from pmatos/AddressingModes32
Improve 32bit ld/st addressing mode propagation
2024-03-11 15:20:20 -07:00
Ryan Houdek
54403e2146
Disable assert in release
Arguments and conditionals don't get optimized out in release builds
for the inline function call versus the define.

Was showing up an annoying amount of time when testing.
2024-03-10 22:01:50 -07:00
Paulo Matos
a86f2d3e2c Improve 32bit constant usage in memory addressing
Folds reg+const memory address into addressing mode,
if the constant is within 16KB.
Update instcountci files.
Add test 32Bit_ASM/FEX_bugs/SubAddrBug.asm
2024-03-05 14:01:32 +00:00
Tony Wasserka
6edba49784 Update Catch2 to v3.5.3 2024-03-05 12:15:29 +01:00
Alyssa Rosenzweig
11880459a5 OpcodeDispatcher: use SETF for DEC
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-01 19:40:53 -04:00
Alyssa Rosenzweig
0ef0bb2c97 OpcodeDispatcher: use SETF for INC
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-01 19:40:53 -04:00
Alyssa Rosenzweig
72edee7c6f IR: add SETF8/SETF16 ir ops
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-01 19:40:53 -04:00
Ryan Houdek
009ae55ff0
Merge pull request #3475 from alyssarosenzweig/opt/lock-dec
Optimize lock dec
2024-02-29 08:44:24 -08:00
Ryan Houdek
98572b9e23
Merge pull request #3473 from Sonicadvance1/remove_mov_swap
Arm64: Stop moving source in atomic swap
2024-02-29 08:44:16 -08:00
Alyssa Rosenzweig
fed5e6d546 OpcodeDispatcher: use fetchadd for atomic DEC
Avoids a NEG.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-29 09:28:21 -04:00
Alyssa Rosenzweig
f27e2246e2
Merge pull request #3468 from alyssarosenzweig/opt/miscs
Misc little opts
2024-02-29 09:18:17 -04:00
Ryan Houdek
eaf83aa6b4
Fix reserving range check
Fixes an issue where TestHarnessRunner was managing to reserve the space
below stack again, resulting in stack growth breaking. Would typically
only show up when using the vixl simulator under gdb for some reason.

This is likely the last bandage on this code before it gets completely
rewritten to be more readable.
2024-02-29 04:02:05 -08:00
Ryan Houdek
c318947695
Arm64: Stop moving source in atomic swap
ldswpal doesn't overwrite the source register and only reads the bits
required for the sized operation.
Not sure exactly why we were doing a copy here.

Removing it means improving Skyrim's hottest code block, as seen in #3472
2024-02-29 03:07:05 -08:00
Alyssa Rosenzweig
811487ad98 OpcodeDispatcher: use real branch for INT
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-28 10:35:12 -04:00