611 Commits

Author SHA1 Message Date
Alyssa Rosenzweig
3767f3633d
Merge pull request #3263 from alyssarosenzweig/opt/not-garbage
OpcodeDispatcher: Make "not" not garbage
2023-11-09 19:58:02 -04:00
Ryan Houdek
af3253947e
Merge pull request #3162 from alyssarosenzweig/opt/nzcv-native
Keep guest SF/ZF/CF/OF flags resident in host NZCV
2023-11-09 15:10:45 -08:00
Ryan Houdek
efc5eb2933
Merge pull request #3250 from Sonicadvance1/gdbserver_frontend_move
FEXLoader: Wire up gdbserver in the frontend
2023-11-09 14:48:59 -08:00
Alyssa Rosenzweig
da3e3fc7a3 OpcodeDispatcher: Optimize some selects
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 15:21:10 -04:00
Alyssa Rosenzweig
b5c83f0628 JIT: Optimize 8/16-bit TestNZ
This is cursed. I blame the darling.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 14:50:36 -04:00
Alyssa Rosenzweig
1ce3c16b30 OpcodeDispatcher: Make "not" not garbage
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 12:02:20 -04:00
Alyssa Rosenzweig
584c4cc05e OpcodeDispatcher: Mask with rmif sometimes
For CF/OF calculation, this saves an instruction on flagm platforms.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 10:05:51 -04:00
Alyssa Rosenzweig
279afd88bb OpcodeDispatcher: Generalize rmif trick
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 10:05:51 -04:00
Alyssa Rosenzweig
c0a6d82025 OpcodeDispatcher: Use rmif for NZCV inserts
Optimizes piles of s/w flag generation on flagm.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
3a03e1c93c OpcodeDispatcher: rework InsertNZCV
in prep for rmif.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
bdaa70405f OpcodeDispatcher: simplify flag control ops
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
1281145982 OpcodeDispatcher: Optimize popf
mostly for easier debugging tbh

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
87cac09477 OpcodeDispatcher: optimize cmc
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
5336129b58 IR: Optimize sub/sbb
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
72fc2b522d IR: Optimize add/adc
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
b6f6c84790 IR: Optimize tests
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
afdb8753ba IR: Remove some implicit flag clobbers
Do it explicitly for sve-256 and punt on optimizing, so we avoid regressing code
gen otherwise.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
f6a2e6739d IR: VCMPEQ doesn't clobber nzcv
cmeq not cmpeq!

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
d6569d510d Arm64: Keep host flags resident in NZCV
Rather than the context. Effectively a static register allocation scheme for
flags. This will let us optimize out a LOT of flag handling code, keeping things
in NZCV rather than needing to copy between NZCV and memory all the time.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
04e4993d9b OpcodeDispatcher: Add a kludge to save NZCV less
Some opcodes only clobber NZCV under certain circumstances, and we don't yet have
a good way of encoding that. In the meantime, this hotfixes some would-be
instcountci regressions.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
783e09d67d ConstProp: Remove select code motion
Problematic in the new approach and not sure what it's trying to accomplish tbh.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
314f478225 ConstProp: remove select+branch fusion
Not beneficial in the new approach to flags.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
c1dbc28aa2 OpcodeDispatcher: Implement SaveNZCV
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
8f7e393ffb Arm64: Don't clobber NZCV in CondJump
Again we need to handle this one specially because the dispatcher can't insert
restore code after the branch. It should be optimized in the near future, don't
worry.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Alyssa Rosenzweig
b3055523b4 IR: Switch to dedicated NZCV load/store
Semantics differ markedly from the non-NZCV flags, splitting this out makes it a
lot easier to do things correctly imho. Gets the dest/src size correct
(important for spilling), as well as makes our existing opt passes skip this
which is needed for correctness at the moment anyway.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-09 09:40:51 -04:00
Ryan Houdek
0dcbdcc0e2 FEX: Only pass CPU tunables to FEXCore and FEXLoader
This fixes an issue where CPU tunables were ending up in the thunk
generator which means if your CPU doesn't support all the features on
the *Builder* then it would crash with SIGILL. This was happening with
Canonical's runners because they typically only support ARMv8.2 but we
are compiling packages to run on ARMv8.4 devices.

cc: FEX-2311.1
2023-11-08 05:50:33 -08:00
Alyssa Rosenzweig
bf147f47b5
Merge pull request #3258 from Sonicadvance1/remove_warnings_16
Arm64Emitter: Fixes warning
2023-11-08 07:05:26 -04:00
Ryan Houdek
1fc6725826 Arm64Emitter: Fixes warning
2023-11-08 01:27:27 -08:00
Alyssa Rosenzweig
73958b9163 OpcodeDispatcher: Use DeriveOp
Replace every instance of the Op overwrite pattern, and ban that anti-pattern
from the codebase in the future. This will prevent piles of NZCV related
regressions.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-07 12:05:00 -04:00
Alyssa Rosenzweig
9b81a83894 OpcodeDispatcher: Add DeriveOp helper
The "create op with wrong opcode, then change the opcode" pattern is REALLY
dangerous. This does not address that. But when we start doing NZCV trickery, it
will get /more/ dangerous, and so it's time to add a helper and make the
convenient thing the safe(r) thing. This helper correctly saves NZCV /before/
the instruction like the real builders would. It also provides a spot for future
safety asserts if someone is motivated.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-07 12:05:00 -04:00
Alyssa Rosenzweig
8ee5b5cf50
Merge pull request #3253 from Sonicadvance1/fix_double_munmap
JITArm64: Fixes double munmap issue that was causing crashes
2023-11-06 07:57:01 -04:00
Ryan Houdek
829384e488 JITArm64: Fixes double munmap issue that was causing crashes
While tracking issues in #3162, I had encountered a random crash that I
started hunting. It was very quickly apparent that this crash was
unrelated to that PR. I just happened to be running a unittest that was
creating and tearing down a bunch of threads that exacerbated the
problem.

See the following strace output:
```
[pid 269497] munmap(0x7fffde1ff000, 16777216) = 0
[pid 269497] munmap(0x7fffde1ff000, 16777216 <unfinished ...>
[pid 268982] mmap(NULL, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fffde1ff000
[pid 269497] <... munmap resumed>)      = 0
```

One thread is freeing some memory with munmap, another one then does a mmap and gets the same address back.
Nothing too crazy at first glance, but taking a closer look, we can
see two oddities:
1) We are double unmapping the same address range through munmap
2) The second munmap is interrupted and returns AFTER the mmap.

This has the unfortunate side-effect that the mmap that just returned
the same address has actually just been unmapped! This was resulting in
spurious crashes around thread creation that were SUPER hard to nail
down.

The problem comes down to how code buffer objects are managed, in
particular how the Arm64Emitter and Dispatcher handled their buffers.

Arm64Emitter is inherited by two classes: Dispatcher and Arm64JITCore.
On class destruction the emitter would free its internal tracking
buffer. Additionally on destruction, the Arm64JITCore would walk through
all of its CodeBuffers and free them. The problem ends up being that in
the Arm64JITCore, it would free its code buffers which also ended up
being the current active buffer bound to the Arm64Emitter. Thus causing
the Arm64Emitter to come back around and try to free the same buffer
again.

This is a double-free problem, and it was only visible on thread exit!
Current tooling can't track double frees made through mmap and munmap!

This problem typically didn't occur because destruction is usually fast,
and having jemalloc in between also typically hides it. I initially
thought this was a threaded pool allocator bug, because the new
allocation would typically end up in there once a new thread was
spinning up.

Now we change behaviour: Arm64Emitter doesn't do any buffer management
itself, instead just passing an initial buffer on to its internal buffer
tracking if given one up front.

This leaves the Dispatcher and the Arm64JITCore to do their own buffer
management, ensuring there is no double free.

The day is saved!
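
The ownership rule described above can be sketched as follows. This is a minimal illustration, not FEX's actual code: `OwnedCodeBuffer` and `EmitterView` are hypothetical names standing in for "whoever owns the mapping" and "the emitter that only writes into it". The owner unmaps exactly once, and the view never unmaps.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <sys/mman.h>

// Hypothetical type for illustration; not one of FEX's real classes.
// Owns one anonymous mapping and unmaps it exactly once.
class OwnedCodeBuffer {
public:
  explicit OwnedCodeBuffer(size_t Size) {
    void* Mapping = ::mmap(nullptr, Size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (Mapping != MAP_FAILED) {
      Ptr = Mapping;
      Length = Size;
    }
  }

  // Copying would create two owners of one mapping, so it is forbidden.
  OwnedCodeBuffer(const OwnedCodeBuffer&) = delete;
  OwnedCodeBuffer& operator=(const OwnedCodeBuffer&) = delete;

  // Moving transfers ownership and leaves the source empty.
  OwnedCodeBuffer(OwnedCodeBuffer&& Other) noexcept
    : Ptr {std::exchange(Other.Ptr, nullptr)}
    , Length {std::exchange(Other.Length, 0)} {}

  ~OwnedCodeBuffer() {
    if (Ptr != nullptr) {
      [[maybe_unused]] int Result = ::munmap(Ptr, Length);
      assert(Result == 0);
      ++UnmapCalls();
    }
  }

  void* Data() const { return Ptr; }
  size_t Size() const { return Length; }

  // Counts munmap calls so the single-unmap property can be checked.
  static int& UnmapCalls() {
    static int Count = 0;
    return Count;
  }

private:
  void* Ptr = nullptr;
  size_t Length = 0;
};

// Hypothetical non-owning view standing in for the emitter: it records where
// to write but has no destructor logic and never calls munmap.
struct EmitterView {
  void* Cursor = nullptr;
  size_t Remaining = 0;

  void Bind(const OwnedCodeBuffer& Buffer) {
    Cursor = Buffer.Data();
    Remaining = Buffer.Size();
  }
};
```

Binding an `EmitterView` to a buffer and then destroying both results in a single munmap, because only the owner has teardown logic.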
2023-11-05 19:18:40 -08:00
Ryan Houdek
0af0427efd ARMEmitter: Fix GPR fill mask in FillStaticRegs
This mask was being used incorrectly: it's a spill mask for host GPRs,
not an index into the SRA array. Search the array of SRA registers
for the first one present in the mask to use as a temporary.

Fixes an issue with 32-bit inline syscalls where the first register
being spilled was r8, which was beyond the size of SRA registers on
32-bit processes. This would cause FEX to read the value just after
x32::SRA which is x32::RA. This would mean it would use r20 as a
temporary, corrupting the register in the process.

I noticed this while poking at #3162, but also when I was looking at a
memory buffer ownership problem.
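
The corrected lookup can be sketched roughly as below. This is an illustration under assumptions, not the FillStaticRegs code itself: `FirstSRAInMask` is a hypothetical helper, and the register numbers in the usage note are made up. The point is that the mask is tested per host register number, and the SRA array is only ever iterated, never indexed by a bit position from the mask.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper for illustration; FEX's real SRA tables and types differ.
// Returns the first statically allocated host register whose bit is set in
// SpillMask, or nullopt if none of them is in the mask.
template <size_t N>
std::optional<unsigned> FirstSRAInMask(const std::array<unsigned, N>& SRA,
                                       uint32_t SpillMask) {
  for (unsigned HostReg : SRA) {
    // The mask is indexed by host register number, so it must fit in 32 bits.
    assert(HostReg < 32);
    if (SpillMask & (uint32_t {1} << HostReg)) {
      return HostReg;
    }
  }
  return std::nullopt;
}
```

With a four-entry SRA array of host registers 4 to 7, a mask containing only bit 8 yields no temporary instead of reading past the end of the array.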
2023-11-05 02:54:48 -08:00
Ryan Houdek
5dee921300 FEXLoader: Wire up gdbserver in the frontend
Requires #3249 to be merged first

Library alerting has been disabled for now, and storing IR while
gdbserver is running is removed.

Otherwise no functional change.
2023-11-03 20:23:51 -07:00
Ryan Houdek
c0dcf8925a Move gdbserver to frontend
2023-11-03 20:23:51 -07:00
Ryan Houdek
e91c5ff906
Merge pull request #3249 from Sonicadvance1/gdbserver_frontend_prep
GDBServer: Preparation work to get this moved to the frontend
2023-11-03 20:21:28 -07:00
Ryan Houdek
190f7c27e0
Merge pull request #3243 from Sonicadvance1/loadfile_unsized
FEXCore/FileLoading: Updates helper to load file that is backed by memory
2023-11-03 20:21:12 -07:00
Ryan Houdek
b15f0b5d36 FEXCore/FileLoading: Updates helper to load file that is backed by memory
Our current read-file helpers fail on files that aren't backed by a
filesystem, since they query the file size up front.

Change the helper so that it doesn't query the size and just reads the file if it
can be opened. This lets us read `/proc/self/maps` using helpers.
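
A minimal sketch of the size-agnostic approach, assuming a hypothetical `ReadWholeFile` helper (the name and signature are illustrative and are not FEXCore's API): read fixed-size chunks until `read` reports end of file, rather than sizing the buffer from a stat-style query that returns 0 for procfs entries.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper for illustration; not FEXCore's actual interface.
// Reads an entire file without asking for its size first, so it also works
// for files such as /proc/self/maps whose reported size is 0.
bool ReadWholeFile(const std::string& Path, std::string* Out) {
  assert(Out != nullptr);

  int FD = ::open(Path.c_str(), O_RDONLY | O_CLOEXEC);
  if (FD == -1) {
    return false;
  }

  Out->clear();
  char Chunk[4096];
  for (;;) {
    ssize_t Got = ::read(FD, Chunk, sizeof(Chunk));
    if (Got > 0) {
      Out->append(Chunk, static_cast<size_t>(Got));
      continue;
    }
    if (Got == 0) {
      break; // End of file.
    }
    if (errno == EINTR) {
      continue; // Interrupted by a signal, retry the read.
    }
    ::close(FD);
    return false;
  }

  ::close(FD);
  return true;
}
```

Calling `ReadWholeFile("/proc/self/maps", &Contents)` on Linux fills `Contents` with the current process mappings even though the file reports no size.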
2023-11-03 07:01:39 -07:00
Ryan Houdek
e2c65189ff GDBServer: Preparation work to get this moved to the frontend
GDBServer is inherently OS specific which is why all this code is
removed when compiling for mingw/win32. This should get moved to the
frontend before we start landing more work to clean this interface up.

Not really any functional change.

Changes:

FEXCore/Context: Adds new public interfaces, these were previously
private.
- WaitForIdle
   - If `Pause` was called or the process is shutting down then this
     will wait until all threads have paused or exited.
- WaitForThreadsToRun
   - If `Pause` was previously called and then `Run` was called to get
     them running again, this waits until all the threads have come out
     of idle to avoid races.
- GetThreads
   - Returns the `InternalThreadData` for all the current threads.
   - GDBServer needs to know all the internal thread data state when the
     threads are paused which is what this gives it.

GDBServer:
- Removes usages of internal data structures where possible.
   - This gets it clean enough that moving it out of FEXCore is now
     possible.
2023-11-02 20:11:01 -07:00
Ryan Houdek
03f63f99a8 FEXCore: Moves StringUtils to FEXCore headers
Once gdbserver gets moved to the frontend this will need to be in the
includes.
2023-11-02 20:09:12 -07:00
Tony Wasserka
3a90dbbb35 Thunks: Print error if guest-provided callbacks are called asynchronously from the host
2023-11-02 19:50:58 +01:00
Ryan Houdek
5103f2d92b
Merge pull request #3247 from alyssarosenzweig/refactor/nzcv-prereq
Preparatory patches for nzcv
2023-11-01 14:07:14 -07:00
Alyssa Rosenzweig
d4a6b031ea
Merge pull request #3245 from Sonicadvance1/remove_gdbpausecheck
FEXCore: Removes gdb pause check handler
2023-11-01 15:46:40 -04:00
Alyssa Rosenzweig
319cf4bf3d Arm64: Preserve flags in ExitFunction
Mostly harmless (except for fusion on cortexes), prepares us for nzcv work.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-01 15:44:36 -04:00
Alyssa Rosenzweig
d75c0f2c50 IR: Add missing flag clobbers
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-01 15:44:36 -04:00
Alyssa Rosenzweig
a586d3823d IR: VFCMPEQ does not clobber flag
Oversight.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-01 15:44:36 -04:00
Alyssa Rosenzweig
5522c6db9c OpcodeDispatcher: Use jump wrappers
Mostly automated replacement + renaming for build fixing.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-01 15:44:36 -04:00
Alyssa Rosenzweig
367e1658ad OpcodeDispatcher: Add jump wrappers
These should always be used in the dispatcher rather than the raw jumps they
translate to, as they ensure that flags are flushed. Eliminates a class of bugs
that will become a lot easier to hit with the new nzcv work.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-01 15:44:36 -04:00
Alyssa Rosenzweig
bbad06f81a ArchHelpers: Add cfinv()
FlagM goodness.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-11-01 15:44:35 -04:00
Mai
9612b2fe4b
Merge pull request #3240 from Sonicadvance1/optimize_palignr_zero
OpcodeDispatcher: Optimize palignr with zero immediate
2023-11-01 05:55:52 +01:00