9098 Commits

Author SHA1 Message Date
Ryan Houdek
20da1e4244
InstcountCI: Update for pmovmaskb 2024-03-17 18:52:21 -07:00
Ryan Houdek
fd391b1b18
JIT: Optimize pmovmaskb with a named vector constant
I was looking at some other JIT overheads and this cropped up as some
overhead. Instead of materializing a constant using mov+movk+movk+movk,
load it from the named vector constant array.

In a micro-benchmark this improved performance by 34%.
In bytemark this improved one subbench by 0.82%.
2024-03-17 18:40:46 -07:00
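To illustrate why the mov+movk+movk+movk sequence mentioned above costs four instructions, here is a minimal sketch of how a 64-bit immediate breaks into the four 16-bit halfwords that such a sequence would carry. The function name and the sample constant in the usage note are hypothetical and are not taken from the FEX source; a real emitter may also use other encodings.

```cpp
#include <array>
#include <cstdint>

// Splits a 64-bit immediate into four 16-bit halfwords, lowest first.
// A naive materialization would use one mov for halfword 0 and one movk
// (with a shift of 16, 32, 48) for each of the remaining three.
constexpr std::array<uint16_t, 4> SplitIntoHalfwords(uint64_t Imm) {
  return {static_cast<uint16_t>(Imm),
          static_cast<uint16_t>(Imm >> 16),
          static_cast<uint16_t>(Imm >> 32),
          static_cast<uint16_t>(Imm >> 48)};
}

// Rebuilds the immediate from its halfwords, mirroring what the
// mov+movk sequence leaves in the destination register.
constexpr uint64_t JoinHalfwords(const std::array<uint16_t, 4>& Parts) {
  return static_cast<uint64_t>(Parts[0]) |
         (static_cast<uint64_t>(Parts[1]) << 16) |
         (static_cast<uint64_t>(Parts[2]) << 32) |
         (static_cast<uint64_t>(Parts[3]) << 48);
}
```

For an illustrative constant such as 0x0102040810204080, all four halfwords are non-zero, so the naive sequence needs all four instructions, whereas a load from a constant table is a single memory access.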
Mai
ba3029b1f6
Merge pull request #3495 from Sonicadvance1/implement_rdpid
OpcodeDispatcher: Implement rdpid
2024-03-15 17:29:12 -04:00
Ryan Houdek
6757a80365
InstcountCI: Update for prefetch changes 2024-03-15 13:20:28 -07:00
Ryan Houdek
f79991a9d8
OpcodeDispatcher: Implement rdpid
Missed this instruction when implementing rdtscp. Returns the same ID
result in a register, just like rdtscp, but without the cycle counter
results. Doesn't touch any flags, just like rdtscp.
2024-03-14 20:07:58 -07:00
Ryan Houdek
ca6b2e43e6
Merge pull request #3491 from alyssarosenzweig/rclse/waw
RCLSE: Optimize store-after-store
2024-03-14 03:23:05 -07:00
Ryan Houdek
8a3d08e1d8
Merge pull request #3483 from neobrain/refactor_stealmemoryregion
Allocator: Cleanup StealMemoryRegions implementation
2024-03-14 03:21:09 -07:00
Ryan Houdek
cd2a6ce820
Merge pull request #3469 from alyssarosenzweig/opt/df
Optimize DF representation
2024-03-14 03:18:41 -07:00
Ryan Houdek
4e269d8b80
Merge pull request #3494 from neobrain/fix_libfwd_float_as_int
Library Forwarding: Don't map float/double to fixed-size integers
2024-03-14 03:11:11 -07:00
Tony Wasserka
552e76c001 Library Forwarding: Don't map float/double to fixed-size integers
Fixes #3455.
2024-03-14 10:14:57 +01:00
Mai
caff3cb799
Merge pull request #3493 from Sonicadvance1/bug_for_3478
unittests/ASM: Implements a unit test for #3478
2024-03-13 22:57:39 -04:00
Ryan Houdek
0d33dacc37
unittests/ASM: Implements a unit test for #3478
This unit test recreates the error condition that #3478 causes.
When a string operation is a backwards copy, the optimization
reads past the end of the page and crashes.

This seemingly only happens with backwards string operations, but the
test covers both forward and backward copies.
2024-03-13 18:36:19 -07:00
Ryan Houdek
ba7b69eea2
InstCountCI: Adds prefetch addressing limits 2024-03-12 21:38:28 -07:00
Ryan Houdek
8056bee82b
OpcodeDispatcher: Implement support for the various prefetch instructions
x86 has a few prefetch instructions.
- prefetch - One of two classic 3DNow! instructions
   - Prefetch into L1 data cache
- prefetchw - One of two classic 3DNow! instructions
   - Implies prefetch into L1 data cache
   - Prefetch cacheline with intent to write and exclusive ownership

- prefetchnta
   - Prefetch non-temporal data with respect to /all/ cache levels
   - Assumes inclusive caches?
- prefetch{t0,t1,t2}
   - Prefetch data with respect to each cache level
   - T0 = L1 and higher
   - T1 = L2 and higher
   - T2 = L3 and higher

**Some silly duplicates**
- prefetchwt1
   - Duplicate of prefetchw but explicitly L1 data cache
- prefetch_exclusive
   - Duplicate of prefetch

God Of War 2018 uses prefetchw as a hint for exclusive ownership of the
cacheline in some very aggressive spin-loops. Let's implement the
operations to help it along.
2024-03-12 21:37:31 -07:00
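The hint list in the message above can be summarized as a small lookup table. This is only a sketch of the information as stated in the commit message; the enum, struct, and function names are hypothetical, and the cache level recorded for prefetchnta is a placeholder because the message only says it is non-temporal with respect to all levels.

```cpp
#include <cstdint>

// Hypothetical names; these do not come from the FEX source.
enum class PrefetchHint { Prefetch, PrefetchW, NTA, T0, T1, T2 };

struct PrefetchInfo {
  int NearestCacheLevel; // 1 = L1, 2 = L2, 3 = L3
  bool ForWrite;         // intent to write / exclusive ownership
  bool NonTemporal;      // data is not expected to be reused
};

constexpr PrefetchInfo Describe(PrefetchHint Hint) {
  switch (Hint) {
  case PrefetchHint::Prefetch:  return {1, false, false};
  case PrefetchHint::PrefetchW: return {1, true, false};
  case PrefetchHint::T0:        return {1, false, false};
  case PrefetchHint::T1:        return {2, false, false};
  case PrefetchHint::T2:        return {3, false, false};
  case PrefetchHint::NTA:       return {1, false, true}; // level is a placeholder
  }
  return {1, false, false};
}
```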
Ryan Houdek
cc635a54f8
IR: Implements support for prefetch operation 2024-03-12 21:19:50 -07:00
Ryan Houdek
217d9d8c50
ARMEmitter: Fixes prfm with negative or unaligned offsets 2024-03-12 21:18:23 -07:00
Tony Wasserka
a047ac1699 Allocator: Test CollectMemoryGaps instead of StealMemoryRegions and restore the original interfaces 2024-03-12 10:49:31 +01:00
Tony Wasserka
bb0b114fc8 Allocator: Miscellaneous cleanups 2024-03-12 10:49:30 +01:00
Tony Wasserka
ccd6c15316 Allocator: Use std::from_chars instead of parsing digits manually 2024-03-12 10:49:30 +01:00
Tony Wasserka
0a1fe1c8c2 Allocator: Parse process mappings per-line instead of per-character 2024-03-12 10:49:30 +01:00
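As a rough illustration of the two parsing commits above, the following sketch reads the address range from one line of a process mappings file with std::from_chars rather than a hand-written digit loop. The struct and function names are hypothetical, the line layout is assumed to be the usual "begin-end perms ..." form, and this is not the FEX implementation.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical type for one parsed mapping.
struct MappedRange {
  uint64_t Begin;
  uint64_t End;
};

// Parses the leading "begin-end" hex pair of a single mappings line.
// Returns std::nullopt if the line does not start with that pattern.
inline std::optional<MappedRange> ParseMapsLine(std::string_view Line) {
  const char* First = Line.data();
  const char* Last = Line.data() + Line.size();

  MappedRange Range {};
  auto Result = std::from_chars(First, Last, Range.Begin, 16);
  if (Result.ec != std::errc {} || Result.ptr == Last || *Result.ptr != '-') {
    return std::nullopt;
  }

  // Skip the '-' separator and parse the end address.
  Result = std::from_chars(Result.ptr + 1, Last, Range.End, 16);
  if (Result.ec != std::errc {}) {
    return std::nullopt;
  }
  return Range;
}
```

Working on whole lines keeps each call self-contained, and std::from_chars reports both where parsing stopped and whether it failed, so no manual digit accumulation is needed.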
Tony Wasserka
f43fe5fd63 Allocator: Stop parsing more eagerly
This is a soft-revert of eaf83aa. That change is no longer needed, since the
stack special case is handled externally now.
2024-03-12 10:49:30 +01:00
Tony Wasserka
dce9f651fd Allocator: Split off memory gap collection to a separate function
This function can be unit-tested more easily, and the stack special case is
more cleanly handled as a post-collection step.

There is a minor functional change: The stack special case didn't trigger
previously if the range end was within the stack mapping. This is now fixed.
2024-03-12 10:49:30 +01:00
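A gap-collection function of the kind described above might look like the sketch below: given mappings sorted by address, it returns the unmapped holes inside a requested range. The names and signature are hypothetical, the input is assumed to be sorted and non-overlapping, and the stack special case is deliberately not modeled here.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical half-open address range [Begin, End).
struct AddressRange {
  uint64_t Begin;
  uint64_t End;
};

// Collects the unmapped gaps inside [Begin, End).
// Assumes Mappings is sorted by address and non-overlapping.
inline std::vector<AddressRange> CollectGaps(const std::vector<AddressRange>& Mappings,
                                             uint64_t Begin, uint64_t End) {
  std::vector<AddressRange> Gaps;
  uint64_t Cursor = Begin;
  for (const auto& Mapping : Mappings) {
    if (Mapping.End <= Cursor) {
      continue; // Entirely below the region still being scanned.
    }
    if (Mapping.Begin >= End) {
      break; // Entirely above the requested range.
    }
    if (Mapping.Begin > Cursor) {
      Gaps.push_back({Cursor, Mapping.Begin});
    }
    Cursor = std::max(Cursor, Mapping.End);
  }
  if (Cursor < End) {
    Gaps.push_back({Cursor, End});
  }
  return Gaps;
}
```

Because the function takes plain data in and returns plain data out, a unit test can feed it synthetic mappings, including the case where the range end falls inside a mapping.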
Tony Wasserka
0d71f169d0 Allocator: Adopt a more testable interface for StealMemoryRegions 2024-03-12 10:49:30 +01:00
Tony Wasserka
430ac0f70a Allocator: Fix format strings 2024-03-12 10:49:30 +01:00
Alyssa Rosenzweig
063b81da1d InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
Alyssa Rosenzweig
7629007cfa OpcodeDispatcher: allow upper garbage on STOS
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
Alyssa Rosenzweig
03c6abdad4 OpcodeDispatcher: optimize DF add
fuse the shift the right way

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
Alyssa Rosenzweig
c99cbe6d0a JIT: switch DF representation
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
Alyssa Rosenzweig
e3ee65e491 OpcodeDispatcher: use transformed DF for memset/memcpy
Use the 1/-1 representation instead of 0/1. This will be better by the end of
the series.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
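The difference between the 0/1 and 1/-1 representations can be sketched as follows. With the architectural flag, a string operation has to select between a positive and a negative step; with the transformed value, the step is a single multiply (or an equivalent shift). The function names are hypothetical and the sketch does not reflect how FEX stores the value internally.

```cpp
#include <cstdint>

// Architectural DF: false = forward, true = backward.
// Transformed DF: +1 = forward, -1 = backward.
constexpr int64_t ToTransformedDF(bool DF) {
  return DF ? -1 : 1;
}

constexpr bool ToArchitecturalDF(int64_t Transformed) {
  return Transformed < 0;
}

// Pointer step for one element using the 0/1 flag: needs a select.
constexpr int64_t StepFromFlag(bool DF, int64_t ElementSize) {
  return DF ? -ElementSize : ElementSize;
}

// Pointer step using the 1/-1 form: a single multiply.
constexpr int64_t StepFromTransformed(int64_t Transformed, int64_t ElementSize) {
  return Transformed * ElementSize;
}
```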
Alyssa Rosenzweig
aee00f524c OpcodeDispatcher: use DF retrieval helpers
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
Alyssa Rosenzweig
a76321c6c1 OpcodeDispatcher: add DF retrieval helpers
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
Alyssa Rosenzweig
f7586f4459 CoreState: use x86 enums for readability
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:50:31 -04:00
Ryan Houdek
e33a76a2ad
Merge pull request #3489 from Sonicadvance1/linux_v6.8
Linux: Expose support for v6.8
2024-03-11 15:48:37 -07:00
Ryan Houdek
c37a12e806
Merge pull request #3490 from Sonicadvance1/disable_assert
Disable assert in release
2024-03-11 15:48:18 -07:00
Alyssa Rosenzweig
ed59f73a65 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:41:33 -04:00
Alyssa Rosenzweig
92f31648b9 RCLSE: optimize out pointless stores
can help a lot of x86 code because x86 is 2-address and a64 is 3-address, so x86
ends up with piles of movs that end up dead after translation

It's not a win across the board because our RA isn't aware of tied registers so
sometimes we regress moves. But it's a win on average, and the RA bits can be
improved with time.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:41:23 -04:00
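The idea of removing pointless stores can be sketched on a toy IR: a register store is dead if the same register is stored again later in the block with no load in between. The types and names below are hypothetical, the analysis covers a single basic block only, and the real RCLSE pass is certainly more involved.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical toy IR: each op loads or stores one guest register.
enum class OpKind { LoadReg, StoreReg };

struct Op {
  OpKind Kind;
  int Reg;
};

// Marks stores that are overwritten before being read.
// Walks the block backwards, tracking registers whose next access is a store.
inline std::vector<bool> FindDeadStores(const std::vector<Op>& Block) {
  std::vector<bool> Dead(Block.size(), false);
  std::unordered_set<int> OverwrittenLater;

  for (std::size_t i = Block.size(); i-- > 0;) {
    const Op& Current = Block[i];
    if (Current.Kind == OpKind::StoreReg) {
      if (OverwrittenLater.count(Current.Reg) != 0) {
        Dead[i] = true; // A later store replaces this value unread.
      } else {
        OverwrittenLater.insert(Current.Reg);
      }
    } else {
      // A load needs the value from the closest earlier store.
      OverwrittenLater.erase(Current.Reg);
    }
  }
  return Dead;
}
```

The last store to each register is always kept, since the value must be visible after the block ends.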
Alyssa Rosenzweig
85f8ad3842 JIT: fix sha256msg1 encoding
botched move in the !tied reg case.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:41:23 -04:00
Alyssa Rosenzweig
64f71d87bb unittests: disable rdtsc test on sim
Patch written by Sonicadvance1. Unclear how this wasn't already broken, but we
need this to keep CI happy with the rest of this series.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-03-11 18:41:23 -04:00
Ryan Houdek
ff0c7637c9
Merge pull request #3421 from pmatos/AddressingModes32
Improve 32bit ld/st addressing mode propagation
2024-03-11 15:20:20 -07:00
Ryan Houdek
54403e2146
Disable assert in release
Arguments and conditionals don't get optimized out in release builds
for the inline function call, unlike with the define.

This was showing up an annoying amount of the time when testing.
2024-03-10 22:01:50 -07:00
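The difference between the two assert styles can be shown with a small sketch. When an assert is a function, its arguments are still evaluated at the call site even if the body is empty; when it is a macro that expands to nothing, the argument expression never reaches the compiler. All names here are hypothetical and are not the FEX macros.

```cpp
// Hypothetical counter so the difference is observable.
static int Evaluations = 0;

// Stands in for a condition that is costly or has side effects.
static bool ExpensiveCheck() {
  ++Evaluations;
  return true;
}

// Function-style assert with an empty release body.
// The argument is still evaluated before the call.
inline void AssertAsFunction(bool /*Condition*/) {}

// Macro-style assert that expands to nothing in release.
// The argument tokens are discarded by the preprocessor.
#define ASSERT_AS_MACRO(Condition) \
  do {                             \
  } while (false)
```

Calling `AssertAsFunction(ExpensiveCheck())` increments the counter, while `ASSERT_AS_MACRO(ExpensiveCheck())` leaves it unchanged. A compiler can drop side-effect-free arguments on its own, but it cannot remove work it is unable to prove unnecessary.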
Ryan Houdek
d8202335e0
Linux: Expose support for v6.8
The new syscalls for futexes are the most interesting part
2024-03-10 15:48:55 -07:00
Ryan Houdek
9ec20c4bef
Linux/Ioctls: Update ioctl emulation for v6.8
- v3d added an ioctl
- drm base added a new ioctl
- pvr and xe are new drivers in v6.8
2024-03-10 15:46:21 -07:00
Ryan Houdek
ce7acd9b71
External/drm-headers: Update to v6.8 2024-03-10 15:45:34 -07:00
Ryan Houdek
8a607135fd
Linux: Update syscalls for v6.8 2024-03-10 15:22:51 -07:00
Tony Wasserka
26a66790ab
Merge pull request #3486 from neobrain/fix_libfwd_accidental_copy
Library Forwarding: Fix accidental data copying when converting from host to guest layout
2024-03-08 12:40:44 +01:00
Mai
6d94d79409
Merge pull request #3485 from Sonicadvance1/nouveau_ioctl
IoctlEmulation: Add missing nouveau ioctl
2024-03-07 07:57:47 -05:00
Tony Wasserka
2359a9899c Library Forwarding: Fix accidental data copying when converting from host to guest layout 2024-03-07 10:52:13 +01:00
Ryan Houdek
aeb41e9ae2
IoctlEmulation: Add missing nouveau ioctl
The NVIF ioctl isn't publicly described in the nouveau headers and it is
required for anything to work with Nouveau.

Pass the ioctl command through without modification and hope that this
ioctl is architecture agnostic.
2024-03-05 16:05:13 -08:00
Tony Wasserka
b892da72f3
Merge pull request #3484 from neobrain/feature_catch2_v3
Externals: Update Catch2 to v3.5.3
2024-03-05 19:30:30 +01:00
Paulo Matos
a86f2d3e2c Improve 32bit constant usage in memory addressing
Folds reg+const memory addresses into the addressing mode
if the constant is within 16 KB.
Update instcountci files.
Add test 32Bit_ASM/FEX_bugs/SubAddrBug.asm
2024-03-05 14:01:32 +00:00
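The 16 KB limit lines up with the AArch64 load/store unsigned-offset form, where a 12-bit immediate is scaled by the access size, giving 0 to 16380 in steps of 4 for 4-byte accesses. The sketch below checks whether a constant fits that form. The function name is hypothetical, and whether the patch performs exactly this check is an assumption.

```cpp
#include <cstdint>

// Returns true if Offset can be encoded as an unsigned, size-scaled
// 12-bit immediate for an access of AccessSize bytes (1, 2, 4, or 8).
constexpr bool FitsUnsignedScaledOffset(int64_t Offset, uint32_t AccessSize) {
  if (Offset < 0) {
    return false; // This form has no negative offsets.
  }
  const uint64_t Unsigned = static_cast<uint64_t>(Offset);
  return (Unsigned % AccessSize) == 0 && (Unsigned / AccessSize) < 4096;
}
```

When the check passes, an add followed by a load can become a single load with an immediate offset; otherwise the address still has to be computed in a register first.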