Commit Graph

962 Commits

Author SHA1 Message Date
Ryan Houdek
c37a12e806
Merge pull request #3490 from Sonicadvance1/disable_assert
Disable assert in release
2024-03-11 15:48:18 -07:00
Ryan Houdek
54403e2146
Disable assert in release
Arguments and conditional don't get optimized out in release builds
for the inline function call versus the define.

Was showing up an annoying amount of time when testing.
2024-03-10 22:01:50 -07:00
Paulo Matos
a86f2d3e2c Improve 32bit constant usage in memory addressing
Folds reg+const memory address into addressing mode,
if the constant is within 16KB.
Update instcountci files.
Add test 32Bit_ASM/FEX_bugs/SubAddrBug.asm
2024-03-05 14:01:32 +00:00
Alyssa Rosenzweig
11880459a5 OpcodeDispatcher: use SETF for DEC
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-01 19:40:53 -04:00
Alyssa Rosenzweig
0ef0bb2c97 OpcodeDispatcher: use SETF for INC
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-01 19:40:53 -04:00
Alyssa Rosenzweig
72edee7c6f IR: add SETF8/SETF16 ir ops
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-01 19:40:53 -04:00
Ryan Houdek
009ae55ff0
Merge pull request #3475 from alyssarosenzweig/opt/lock-dec
Optimize lock dec
2024-02-29 08:44:24 -08:00
Ryan Houdek
98572b9e23
Merge pull request #3473 from Sonicadvance1/remove_mov_swap
Arm64: Stop moving source in atomic swap
2024-02-29 08:44:16 -08:00
Alyssa Rosenzweig
fed5e6d546 OpcodeDispatcher: use fetchadd for atomic DEC
Avoids a NEG.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-29 09:28:21 -04:00
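
Note on fed5e6d546: the listing does not include the diff, so the following is only a sketch of the general idea as I read the commit message. A decrement can be expressed as a fetch-and-add of the constant -1, so no separate negation of the operand is needed at run time. The helper names `AtomicDecViaFetchAdd` and `CheckAtomicDec` are hypothetical and do not come from the FEX source.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; this is not FEX code.
// Decrements `value` atomically by adding the constant 0xFFFFFFFF, which is
// -1 in two's complement, and returns the value held before the decrement.
inline uint32_t AtomicDecViaFetchAdd(std::atomic<uint32_t>& value) {
  return value.fetch_add(UINT32_MAX);
}

// Small self-check: the result matches an ordinary decrement.
inline void CheckAtomicDec() {
  std::atomic<uint32_t> counter{5};
  assert(AtomicDecViaFetchAdd(counter) == 5);
  assert(counter.load() == 4);
}
```

Because unsigned arithmetic wraps modulo 2^32, adding `UINT32_MAX` produces the same stored value as subtracting 1, including the wrap from 0 to `UINT32_MAX`.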
Alyssa Rosenzweig
f27e2246e2
Merge pull request #3468 from alyssarosenzweig/opt/miscs
Misc little opts
2024-02-29 09:18:17 -04:00
Ryan Houdek
eaf83aa6b4
Fix reserving range check
Fixes an issue where TestHarnessRunner was managing to reserve the space
below stack again, resulting in stack growth breaking. Would typically
only show up when using the vixl simulator under gdb for some reason.

This is likely the last bandage on this code before it gets completely
rewritten to be more readable.
2024-02-29 04:02:05 -08:00
Ryan Houdek
c318947695
Arm64: Stop moving source in atomic swap
ldswpal doesn't overwrite the source register and only reads the bits
required for the sized operation.
Not sure exactly why we were doing a copy here.

Removing it means improving Skyrim's hottest code block, as seen in #3472
2024-02-29 03:07:05 -08:00
Alyssa Rosenzweig
811487ad98 OpcodeDispatcher: use real branch for INT
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-28 10:35:12 -04:00
Alyssa Rosenzweig
4f4e38ace2 OpcodeDispatcher: use real branch for rep cmps
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-28 10:35:12 -04:00
Alyssa Rosenzweig
edd6becc56 OpcodeDispatcher: use real branch for rep scas
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-28 10:35:12 -04:00
Alyssa Rosenzweig
e47a94cae7 OpcodeDispatcher: skip mask with shld
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-28 10:18:29 -04:00
Alyssa Rosenzweig
cc82dba1ca OpcodeDispatcher: use mvn for AF with constants
This reduces pointless constant usage. For now, it's no net change to
instcountci, but it should make it easier to get wins later. Hopefully.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-28 10:18:28 -04:00
Alyssa Rosenzweig
8232669b22 OpcodeDispatcher: simplify
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-28 10:00:13 -04:00
Alyssa Rosenzweig
28936073c4 OpcodeDispatcher: allow upper garbage on NEG
like SUB.

due to RA silliness, this is a loss for inst count but a win for cycles.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-28 09:59:57 -04:00
Alyssa Rosenzweig
ef2559d911 OpcodeDispatcher: allow garbage with SCAS
it's just feeding SUB flags which allow it

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-28 09:16:08 -04:00
Ryan Houdek
f346f89678
OpcodeDispatcher: Don't use AddShift with no shift
This accidentally removed optimizations elsewhere that were only checking
for Add.
2024-02-27 19:56:12 -08:00
Ryan Houdek
139367d248
Merge pull request #3463 from Sonicadvance1/update_xxhash
Update xxhash to v0.8.2
2024-02-27 16:39:38 -08:00
Alyssa Rosenzweig
49e798ab2b
Merge pull request #3461 from alyssarosenzweig/opt/sbc
Optimize SBC
2024-02-27 11:29:45 -04:00
Ryan Houdek
78a362581d
Update xxhash to v0.8.2
Switches to using upstream cmake files.
2024-02-26 23:57:25 -08:00
Ryan Houdek
946c805d84
Merge pull request #3459 from Sonicadvance1/fix_591
Capture a 64-bit process trying to jump to 32-bit syscall handler
2024-02-26 21:57:03 -08:00
Ryan Houdek
0b34035085
Merge pull request #3439 from Sonicadvance1/allocate_first_4gb_of_64bit
FEXLoader: Allocate the second 4GB of virtual memory when executing 32-bit
2024-02-26 18:46:22 -08:00
Alyssa Rosenzweig
12cc980603 OpcodeDispatcher: shuffle adc flag order
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-26 16:48:07 -04:00
Alyssa Rosenzweig
f3d55dd721 OpcodeDispatcher: shuffle SBC flag order
avoids clobbering nzcv

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-26 16:48:07 -04:00
Alyssa Rosenzweig
91cef6b76f OpcodeDispatcher: use native ADC even for 8/16-bit
we mask off the upper bits, and they agree in the lower bits.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-26 16:48:07 -04:00
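
Note on 91cef6b76f: the message's reasoning ("we mask off the upper bits, and they agree in the lower bits") can be checked with a small sketch. It assumes a 32-bit add stands in for the native operation and covers only the result value, not flag generation. The function names are hypothetical and not taken from FEX.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not FEX code: compute an 8-bit add-with-carry by
// adding at 32-bit width (upper bits may hold garbage) and masking afterwards.
constexpr uint8_t Adc8ViaWideAdd(uint32_t a, uint32_t b, bool cf) {
  const uint32_t wide = a + b + (cf ? 1u : 0u);
  return static_cast<uint8_t>(wide & 0xFFu);
}

// Reference: the same operation with both inputs truncated to 8 bits first.
constexpr uint8_t Adc8Reference(uint8_t a, uint8_t b, bool cf) {
  return static_cast<uint8_t>(static_cast<unsigned>(a) + b + (cf ? 1u : 0u));
}

// Carries only propagate upward, so garbage above bit 7 cannot change bits 0..7.
static_assert(Adc8ViaWideAdd(0xDEADBE7Fu, 0x12345601u, true) ==
              Adc8Reference(0x7F, 0x01, true));

// Exhaustive check over all 8-bit inputs with a fixed garbage pattern on top.
inline void CheckAdc8() {
  for (uint32_t a = 0; a < 256; ++a) {
    for (uint32_t b = 0; b < 256; ++b) {
      for (int cf = 0; cf < 2; ++cf) {
        assert(Adc8ViaWideAdd(0xABCDEF00u | a, 0x13579B00u | b, cf != 0) ==
               Adc8Reference(static_cast<uint8_t>(a), static_cast<uint8_t>(b), cf != 0));
      }
    }
  }
}
```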
Alyssa Rosenzweig
270cbf39b5 OpcodeDispatcher: specialize SALC
this gets rid of the awkward non-flag SBB case, which streamlines SBB while
getting better codegen for the demon opcode (-:

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-26 16:33:41 -04:00
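
Note on 270cbf39b5: SALC sets AL to 0xFF when CF is set and to 0x00 otherwise, leaving the flags untouched. That matches the result of `sbb al, al` without its flag update, which is presumably the "non-flag SBB case" the message refers to. A minimal sketch of that equivalence, with hypothetical function names that are not FEX code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of SALC: AL = CF ? 0xFF : 0x00, flags unchanged.
constexpr uint8_t Salc(bool cf) {
  return cf ? 0xFF : 0x00;
}

// Hypothetical model of the 8-bit SBB result only (flags are ignored here).
constexpr uint8_t SbbResult8(uint8_t dst, uint8_t src, bool cf) {
  return static_cast<uint8_t>(dst - src - (cf ? 1 : 0));
}

// "sbb al, al" yields -CF regardless of the previous AL value.
static_assert(SbbResult8(0x42, 0x42, true) == Salc(true));
static_assert(SbbResult8(0x42, 0x42, false) == Salc(false));

// Check the equivalence for every possible AL value.
inline void CheckSalc() {
  for (unsigned al = 0; al < 256; ++al) {
    const uint8_t v = static_cast<uint8_t>(al);
    assert(SbbResult8(v, v, true) == Salc(true));
    assert(SbbResult8(v, v, false) == Salc(false));
  }
}
```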
Alyssa Rosenzweig
2e0be0a5e7 OpcodeDispatcher: allow more upper garbage with adc
missed this last series.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-26 15:42:13 -04:00
Alyssa Rosenzweig
d60c089697 OpcodeDispatcher: allow upper garbage with sbb
for the usual reasons

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-26 15:35:14 -04:00
Alyssa Rosenzweig
e76ebeab58 OpcodeDispatcher: use 1-op "src + CF"
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-26 15:35:14 -04:00
Alyssa Rosenzweig
333271d490 OpcodeDispatcher: fuse sbb when flags calculated
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-26 15:35:14 -04:00
Alyssa Rosenzweig
a750870abf OpcodeDispatcher: use fused sbcs calculations
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-26 15:25:03 -04:00
Alyssa Rosenzweig
15db72ef60 IR: add Sbb, SbbWithFlags ops
For fusing sbc+sbcs

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-26 15:25:03 -04:00
Alyssa Rosenzweig
5f16f357af
Merge pull request #3456 from alyssarosenzweig/opt/adc
Optimize ADC
2024-02-26 09:39:37 -04:00
Ryan Houdek
4f028b8614
Capture a 64-bit process trying to jump to 32-bit syscall handler
Fixes #591

Adds a simple unittest
2024-02-26 05:37:29 -08:00
Alyssa Rosenzweig
7deb4976a3 RedundantFlagCalculationElimination: fix missing NEG case
can be predicated.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-25 17:37:16 -04:00
Alyssa Rosenzweig
1e153e0c81 OpcodeDispatcher: allow garbage with adcs
for the usual reasons

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-25 10:50:02 -04:00
Alyssa Rosenzweig
6994fc3a01 IR,OpcodeDispatcher,JIT: fuse adcs flags
The usual tricks, also requires introducing a bare adc op to optimize adcs to,
but we wanted that anyway!

Also support a zero source, so we can calculate "foo + CF" in one instruction to
optimize the "lock adc" cases.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-25 10:49:32 -04:00
Ryan Houdek
d703f3ccee
Fixes zero register flag generation
Fixes 140976d322

Adds a unit test to ensure it keeps working.
2024-02-24 16:32:25 -08:00
Alyssa Rosenzweig
80e632db8a OpcodeDispatcher: garbage collect
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
852e3c4e93 OpcodeDispatcher: fuse XADD
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
e86547bbcb OpcodeDispatcher: fuse INC
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
883cca2e8f OpcodeDispatcher: use AddWithFlags
give it the same treatment we just gave sub.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
8540332520 IR: add AddWithFlags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
df5bdefb8a OpcodeDispatcher: merge secondary ALU with primary ALU
It's the same, stop copypasting. This gets our flag and arithmetic opts (current
and future) applied to secondary ALU too.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
3d1fb7701c OpcodeDispatcher: optimize sub
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00
Alyssa Rosenzweig
140976d322 OpcodeDispatcher: prep primary ALU for better flags
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-02-24 15:54:49 -04:00