Commit Graph

209 Commits

Author SHA1 Message Date
Ryan Houdek
97a68cb643
Telemetry: Change how visibility of telemetry values work
Removes global initializer for telemetry values since their address is
visible and PIC relative code loading handles the address fetching for
us.
2024-07-12 03:18:23 -07:00
Ryan Houdek
870e395ac4
Merge pull request #3862 from Sonicadvance1/remove_atexit_logman
LogManager: Removes fextl::vector usage
2024-07-12 02:05:02 -07:00
Ryan Houdek
5ef0db994d
VDSO: Stop using a vector for a static
This causes a global initializer that registers an atexit handler.

Be smarter, use an std::array and pass its data around using a span
instead.

Removes the global initializer and removes the atexit installation
2024-07-11 23:53:57 -07:00
Ryan Houdek
b523407a3e
LogManager: Removes fextl::vector usage
We never use more than one logging method at a time so this was
overengineered for what it is doing.

Instead only allow one handler each for messages and throw messages,
which is just a pointer.

Removes a global initializer and an atexit handler being installed
2024-07-11 22:51:56 -07:00
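A rough sketch of the single-handler idea described above, assuming C++17. The namespace, type aliases, and function names here are hypothetical and only loosely modelled on the commit message; FEX's real LogManager interface may differ.

```cpp
#include <cassert>

namespace LogSketch {
// Hypothetical handler signatures.
using MsgHandler = void (*)(int Level, const char* Message);
using ThrowHandler = void (*)(const char* Message);

// Plain function pointers are constant-initialized to nullptr, so there is
// no global constructor and nothing to tear down at exit.
inline MsgHandler Handler = nullptr;
inline ThrowHandler Thrower = nullptr;

inline void InstallHandler(MsgHandler NewHandler) { Handler = NewHandler; }
inline void InstallThrowHandler(ThrowHandler NewHandler) { Thrower = NewHandler; }

// Messages are silently dropped while no handler is installed.
inline void Message(int Level, const char* Text) {
  assert(Text != nullptr);
  if (Handler) {
    Handler(Level, Text);
  }
}
} // namespace LogSketch
```

Installing a second handler simply replaces the first, which matches the observation that only one logging method is ever in use at a time.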
Mai
b282620a48
Merge pull request #3857 from Sonicadvance1/sve_bitperm
Arm64: Implement support for SVE bitperm
2024-07-11 05:05:41 -04:00
Tony Wasserka
5dc4ab062d Fix invalid-offsetof warnings due to InternalThreadState not being standard layout
See https://github.com/llvm/llvm-project/issues/53021 for more information
about unique_ptr turning non-standard-layout.
2024-07-11 09:54:30 +02:00
Ryan Houdek
3554d5c2f7
HostFeatures: Check for SVE bit permute extension 2024-07-10 21:45:07 -07:00
Tony Wasserka
470b435afd fextl: Properly handle nullptr arguments in fextl::default_delete
This reflects behavior of std::default_delete.
2024-07-10 19:17:50 +02:00
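To illustrate why the nullptr check matters: `delete` on a null pointer is a no-op, so `std::default_delete` tolerates it, but a custom deleter that calls the destructor explicitly does not get that behaviour for free. The sketch below assumes a malloc/free-backed allocator; `DefaultDeleteSketch`, `Tracked`, and `MakeTracked` are hypothetical stand-ins, not the fextl implementation.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical deleter for objects placed in malloc'd storage.
template<typename T>
struct DefaultDeleteSketch {
  void operator()(T* Ptr) const {
    if (Ptr == nullptr) {
      return; // Mirror `delete nullptr`, which does nothing.
    }
    Ptr->~T();      // Would be undefined behaviour on a null pointer.
    std::free(Ptr);
  }
};

// Small helper type that counts destructor calls.
struct Tracked {
  static inline int Destroyed = 0;
  ~Tracked() { ++Destroyed; }
};

inline Tracked* MakeTracked() {
  void* Storage = std::malloc(sizeof(Tracked));
  assert(Storage != nullptr);
  return new (Storage) Tracked {};
}
```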
Billy Laws
e45e631199 AllocatorHooks: Allocate from the top down on windows
FEX allocations can get in the way of allocations that are 4gb-limited
even in 64-bit mode (i.e. those from LuaJIT), so allocate starting from
the top of the AS to prevent conflicts.
2024-07-06 20:35:38 +00:00
Ryan Houdek
add0e7a8db
HostFeatures: Removes distinction between AVX and AVX2
We now no longer care about AVX versions, consolidate them in to a
single config option which enables both.
2024-06-26 14:56:01 -07:00
Ryan Houdek
efa05ba19d
IR: Adds support for new SUBADD FMA constants
ADDSUB didn't cover this new variant.
2024-06-25 11:22:22 -07:00
Ryan Houdek
d52a1da501 FEXCore: Implement support for fetching/setting YMM registers
Because we have two views of the YMM registers depending on if the host
supports SVE256 or not, add helper functions to fetch them correctly.

We fetch them in the way that Linux desires them in signal handlers, if
we want to return the converged view directly, that is easy to add
support for. It's unnecessary for now.
2024-06-21 17:13:56 -04:00
Ryan Houdek
542ed8b6ad
Implement support for querying AES256 support
This is a different feature flag than regular AES as the default AES+AVX
only operates on 128-bit wide vectors.

With the newer `VAES` extension this is expanded to 256-bit.
2024-06-19 05:51:47 -07:00
Ryan Houdek
bf812aae8f CoreState: Adds avx_high structure for tracking decoupled AVX halves.
Needed something in between the `InlineJITBlockHeader` and `avx_high` in
order to match alignment requirements of 16-byte for avx_high. Chose the
`DeferredSignalRefCount` because we hit it quite frequently and it is
basically the only 64-bit variable that we end up touching
significantly.

In the future the CPUState object is going to need to change its view of
the object depending on if the device supports SVE256 or not, but we
don't need to frontload the work right now. It'll become significantly
easier to support that path once the RCLSE pass gets deleted.
2024-06-18 12:00:45 -04:00
Ryan Houdek
9a71443005 CoreState: Adds a gregs offset check
This is required to be less than the maximum range for LDP and STP in
the Arm64 Dispatcher otherwise it breaks. Necessary to ensure this when
reorganizing the CoreState.
2024-06-18 12:00:45 -04:00
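A compile-time check of this kind could look roughly like the sketch below. The struct is a heavily simplified, hypothetical layout, and checking both the first and the last register pair is an assumption about what such a check might cover. The encoding fact it relies on is that AArch64 LDP/STP of 64-bit registers take a signed 7-bit immediate scaled by 8, so positive offsets must be multiples of 8 no larger than 504.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified layout; the real CoreState has many more members.
struct CoreStateSketch {
  std::uint64_t InlineJITBlockHeader;
  std::uint64_t DeferredSignalRefCount;
  std::uint64_t rip;
  std::uint64_t gregs[16];
};

// Largest positive offset encodable by a 64-bit LDP/STP (imm7 * 8).
constexpr std::size_t LdpStp64MaxOffset = 504;

constexpr bool FitsLdpStp64(std::size_t Offset) {
  return Offset % 8 == 0 && Offset <= LdpStp64MaxOffset;
}

// Offset of the last pair (gregs[14], gregs[15]) a dispatcher would touch.
constexpr std::size_t LastGregPairOffset =
  offsetof(CoreStateSketch, gregs) + 14 * sizeof(std::uint64_t);

static_assert(FitsLdpStp64(offsetof(CoreStateSketch, gregs)),
              "gregs must be reachable by LDP/STP from the state pointer");
static_assert(FitsLdpStp64(LastGregPairOffset),
              "last gregs pair must be reachable by LDP/STP");
```

With a check like this in place, reordering members so that `gregs` drifts out of range fails the build instead of producing a broken dispatcher.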
Ryan Houdek
1ce27a5e6b
FEXCore: Disentangle the SVE256 feature from AVX
In quite a few locations we are mixing the case that SVE256 == AVX or
that AVX means the guest register size is 256-bit.

While this is true today, this entanglement is going to change very
quickly and cause confusion in follow-up PRs.

Now we have SVE128, SVE256, and SVE2 HostFeatures to disambiguate the
different features which mean different things.

This PR keeps the alias that `SupportsAVX` = `SupportsSVE256 && SupportsSVE2`
but that alias is going to very quickly change its definition.
2024-06-17 17:20:32 -07:00
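The split described above might be modelled as follows. The flag names come from the commit message, but the struct, the `Detect` helper, and its parameters are hypothetical; in particular, how the real code derives the SVE width flags is an assumption.

```cpp
// Hypothetical sketch of separate host feature flags.
struct HostFeaturesSketch {
  bool SupportsSVE128 = false;
  bool SupportsSVE256 = false;
  bool SupportsSVE2 = false;
  bool SupportsAVX = false;
};

// HasSVE, VectorBits and HasSVE2 stand in for whatever the host reports.
constexpr HostFeaturesSketch Detect(bool HasSVE, unsigned VectorBits, bool HasSVE2) {
  HostFeaturesSketch Features {};
  Features.SupportsSVE128 = HasSVE && VectorBits >= 128;
  Features.SupportsSVE256 = HasSVE && VectorBits >= 256;
  Features.SupportsSVE2 = HasSVE2;
  // The alias the commit keeps for now; it is expected to be redefined later.
  Features.SupportsAVX = Features.SupportsSVE256 && Features.SupportsSVE2;
  return Features;
}
```

Keeping the alias as a derived value means callers that ask "is AVX available" keep working while code that really cares about the vector width can query the SVE flags directly.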
Ryan Houdek
a9bacc1b6b
CoreState: Move InlineJITBlockHeader to the start of the struct
This currently doesn't do much but soon this will be very important to
ensure the data prefetcher of Cortex keeps the cachelines following this
variable in L1.
2024-06-17 02:59:56 -07:00
Alyssa Rosenzweig
cb00d9171f IR: merge general DCE with flag DCE
Flag DCE needs to do general DCE anyway to converge in one pass. So we can move
the special syscall/atomic logic over to flag DCE and then drop the second DCE
pass altogether. Now local dead code of both is eliminated in a single pass.

Flag DCE is carefully written to converge in a single iteration which makes this
scheme work.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-24 15:44:49 -04:00
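As a rough illustration of why one backward walk can handle both kinds of dead code, consider the sketch below. It is not FEX's pass: the `Op` structure, the flag bitmask, and the assumption of a single SSA basic block are all simplifications made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical IR op for a single SSA basic block.
struct Op {
  int Dest;                  // SSA value defined by this op, or -1 for none.
  std::vector<int> Args;     // SSA values read.
  std::uint8_t FlagsRead;    // Bitmask of flags consumed.
  std::uint8_t FlagsWritten; // Bitmask of flags produced.
  bool SideEffects;          // Stores, syscalls, atomics and similar.
};

// Walks the block backwards once, tracking live values and live flags
// together, and returns only the ops that must be kept.
std::vector<Op> EliminateDeadCode(const std::vector<Op>& Block, std::size_t NumValues,
                                  std::uint8_t FlagsLiveOut) {
  std::vector<bool> LiveValue(NumValues, false);
  std::vector<bool> Keep(Block.size(), false);
  std::uint8_t LiveFlags = FlagsLiveOut;

  for (std::size_t i = Block.size(); i-- > 0;) {
    const Op& Current = Block[i];
    const bool ResultUsed = Current.Dest >= 0 && LiveValue[Current.Dest];
    const bool FlagsUsed = (Current.FlagsWritten & LiveFlags) != 0;
    if (!Current.SideEffects && !ResultUsed && !FlagsUsed) {
      continue; // Dead: its operands never become live because of it.
    }
    Keep[i] = true;
    // Flags written here are dead above this op; flags read become live.
    LiveFlags = static_cast<std::uint8_t>((LiveFlags & ~Current.FlagsWritten) | Current.FlagsRead);
    for (int Arg : Current.Args) {
      assert(Arg >= 0 && static_cast<std::size_t>(Arg) < NumValues);
      LiveValue[Arg] = true;
    }
  }

  std::vector<Op> Result;
  for (std::size_t i = 0; i < Block.size(); ++i) {
    if (Keep[i]) {
      Result.push_back(Block[i]);
    }
  }
  return Result;
}
```

Since operands of a removed op are never marked live, chains of dead ops disappear in the same walk, which is the property that makes a second DCE pass unnecessary in this simplified setting.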
Alyssa Rosenzweig
a10f984b1c clang-format: left-align escaped newlines
alternative to #3638. this is theoretically better for side-by-side diffs. in
practice it may make other diffs worse since all the \'s change when part of the
macro changes.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-20 09:47:21 -04:00
Ryan Houdek
d19b57a52e
FEXCore: Get rid of DeferredSignalFaultAddress and use the InterruptFaultPage
Arm64ec introduced the InterruptFaultPage which is lower overhead since
instead of ldr+str it just turns in to a single str. We were already
allocating the space, FEXCore and the frontend signal delegator just
needed to be updated to understand the new location.

We can additionally use this in the future if we want to make deferred
async signals INSIDE the JIT only cost a single str as well.
2024-05-10 15:31:28 -07:00
Ryan Houdek
2cae2f2462
Merge pull request #3617 from bylaws/arm64ec-dispatcher
FEXCore: ARM64EC x64 entry/exit support
2024-05-08 12:25:26 -07:00
Alyssa Rosenzweig
a2fc51fc7b IR: specify registers, not offsets for SRA
SRA is fundamentally about hardware registers, not stores into a
software-defined context. So, it should take a register instead of an offset.
This makes all the unaligned special cases unrepresentable (by design).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-05-08 14:01:42 -04:00
Billy Laws
ab516d7b79 Dispatcher: Implement ARM64EC SRA setup entrypoints
While the ARM64EC ABI mostly matches FEX's SRA, the stack still needs to
be switched to the emulator stack and target RIP stored into the FEX
context before jumping to the dispatcher loop.
2024-05-06 15:41:34 +00:00
Ryan Houdek
6463054fa3
Arm64: Adds another TSO hack to disable half-barrier TSO
A feature of FEX's JIT is that when an unaligned atomic load/store
operation occurs, the instructions will be backpatched in to a barrier
plus a non-atomic memory instruction. This is the half-barrier technique
that still ensures correct visibility of loadstores in an unaligned
context.

The problem with this approach is that the dmb instructions are HEAVY,
because they effectively stop the world until all memory operations in
flight are visible. But it is a necessary evil since unaligned atomics
aren't a thing on ARM processors. FEAT_LSE2 only gives you unaligned
atomics inside of a 16-byte granularity, which doesn't match x86
behaviour of cacheline size (effectively always 64B).

This adds a new TSO option to disable the half-barrier on unaligned
atomic and instead only convert it to a regular loadstore instruction,
omitting the half-barrier. This gives more insight into how good a
CPU's LRCPC implementation is by not stalling on DMB instructions when
possible.

Originally implemented as a test to see if this makes Sonic Adventure 2
run full speed with TSO enabled (but all available TSO options disabled)
on NVIDIA Orin. Unfortunately this basically makes the code no longer
stall on dmb instructions and instead just shows how bad the LRCPC
implementation is, since the stalls show up on `ldapur` instructions
instead.

Tested Sonic Adventure 2 on X13s and it ran at 60FPS there without the
hack anyway.
2024-04-24 13:09:00 -07:00
Paulo Matos
905aa935f5 Reformat until fixed-point
Followup to 2b4ec88dae.
Some files needed a couple of calls to clang-format 16.0.6 to
reach a fixed point.
2024-04-15 09:40:00 +02:00
Paulo Matos
2b4ec88dae Whole-tree reformat
This follows discussions from #3413.
Followup commits add clang-format file, script and blame ignore lists.
2024-04-12 16:26:02 +02:00
Ryan Houdek
a9b7ad841c
Merge pull request #3570 from bylaws/ec_pt8
Enable jemalloc for ARM64EC
2024-04-11 12:57:36 -07:00
Billy Laws
f1f0c47f16 AllocatorHooks: Allow using jemalloc on win32 2024-04-09 23:42:23 +00:00
Lioncache
65b5281d7c IR: Add constants for FLD variants 2024-04-09 10:13:33 -04:00
Ryan Houdek
1a8b61b9fc
Merge pull request #3560 from bylaws/ec-pt6
FEXCore: Support x64 -> arm64ec calls
2024-04-09 07:08:38 -07:00
Billy Laws
243bb45a68 FEXCore: Support x64 -> arm64ec calls
The frontend will provide the return logic via ExitFunctionEC, which
will be jumped to whenever there is an indirect branch/return to an addr
such that RtlIsEcCode(addr) returns true.
2024-04-06 13:20:48 +00:00
Billy Laws
bd5b817c3a AllocatorHooks: Mark JIT code memory as EC code on ARM64EC
Executable mapped memory is treated as x86 code by default when
running under EC, VirtualAlloc2 needs to be used together with a
special flag to map JIT arm64 code.
2024-04-06 12:40:52 +00:00
Ryan Houdek
904646e93b
FEXCore: Fixes priority of FEX_APP_CONFIG
This environment variable had an incorrect priority on the configuration
system. The expectation was higher priority than most other layers.

Now the only layer that has higher priority is the environment
variables.
2024-04-05 13:10:43 -07:00
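Layered configuration with priorities can be sketched as below. The layer names and the `Lookup` helper are hypothetical and do not reflect FEX's actual layer list; the only property taken from the commit message is that the `FEX_APP_CONFIG` layer sits directly below the environment variables.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical layers, declared from lowest to highest priority.
enum class Layer {
  Main,
  AppConfig,
  AppConfigOverride, // Stands in for the FEX_APP_CONFIG layer.
  Environment,
};

using LayerValues = std::map<std::string, std::string>;
using ConfigSketch = std::map<Layer, LayerValues>;

// Returns the value from the highest-priority layer that defines Key.
std::optional<std::string> Lookup(const ConfigSketch& Config, const std::string& Key) {
  assert(!Key.empty());
  // std::map iterates keys in ascending order, so walk it in reverse to
  // visit the highest-priority layer first.
  for (auto It = Config.rbegin(); It != Config.rend(); ++It) {
    const auto Found = It->second.find(Key);
    if (Found != It->second.end()) {
      return Found->second;
    }
  }
  return std::nullopt;
}
```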
Ryan Houdek
e2a095372e
Merge pull request #3534 from Sonicadvance1/move_ir_defines
FEXCore: Move nearly all IR definitions to internal
2024-04-01 10:00:20 -07:00
Ryan Houdek
5c29c9d464
Merge pull request #3527 from Sonicadvance1/move_type_defines
Moves FHU TypeDefines to FEXCore includes
2024-04-01 08:57:22 -07:00
Ryan Houdek
3bed305660
Merge pull request #3526 from Sonicadvance1/move_codeloader
FEXCore: Moves CodeLoader to frontend
2024-04-01 07:52:02 -07:00
Ryan Houdek
f6639c3594
Merge pull request #3525 from Sonicadvance1/move_cpubackend
FEXCore: Moves CPUBackend definition internal
2024-04-01 06:47:34 -07:00
Ryan Houdek
ed3af580c5
FEXCore: Move nearly all IR definitions to internal
It has been a long time coming that FEX no longer needed to leak IR
implementation details to the frontend. This was legacy due to IR CI and
various other problems.

Now that the last bits of IR leaking has been removed, move everything
that we can internally to the implementation.
We still have a couple of minor details in the exposed IR.h to the
frontend, but these are limited to a few enums and some thunking struct
information rather than all the implementation details.

No functional change with this, just moving headers around.
2024-03-29 17:20:18 -07:00
Ryan Houdek
8564290f76
FEXCore: Remove DebugStore map
This hasn't been used and is blocking refactoring more code.
2024-03-29 14:58:44 -07:00
Ryan Houdek
d11a36eaea
Moves FHU TypeDefines to FEXCore includes
FEXCore includes was including an FHU header which would result in
compilation failure for external projects trying to link to libFEXCore.

Moves it over to fix this, it was the only FHU usage in FEXCore/include
NFC
2024-03-29 02:54:54 -07:00
Ryan Houdek
f46e88ebdb
FEXCore: Moves CPUBackend definition internal
This is no longer necessary to be part of the public API. Moves the
header internally.

Needed to pass through `IsAddressInCodeBuffer` from CPUBackend through
the Context object, but otherwise no functional change.
2024-03-29 02:27:29 -07:00
Ryan Houdek
20eb338644
FEXCore: Moves CodeLoader to frontend
FEXCore no longer has a need for this since a bunch of related code was
already moved to the frontend. Move the CodeLoader now.
2024-03-29 02:24:53 -07:00
Ryan Houdek
7f90ca53f7
Merge pull request #3505 from Sonicadvance1/telemetry_noncanonical
Telemetry: Adds tracker for non-canonical memory access crash
2024-03-26 23:21:32 -07:00
Ryan Houdek
6f29e75f67
FEXCore: Removes vestigial mman SMC checking
This wasn't actually wired up to anything ever since some refactoring
occurred two years ago.
2024-03-26 02:56:26 -07:00
Ryan Houdek
5a35e119fe
Telemetry: Adds tracker for non-canonical memory access crash
This may be useful for tracking TSO faulting when it manages to fetch
stale data. While most TSO crashes are due to nullptr dereferences, this
can still check for the corruption case.
2024-03-21 20:47:36 -07:00
Ryan Houdek
fd391b1b18
JIT: Optimize pmovmaskb with a named vector constant
I was looking at some other JIT overheads and this cropped up as some
overhead. Instead of materializing a constant using mov+movk+movk+movk,
load it from the named vector constant array.

In a micro-benchmark this improved performance by 34%.
In bytemark this improved one subbench by 0.82%.
2024-03-17 18:40:46 -07:00
Ryan Houdek
8a3d08e1d8
Merge pull request #3483 from neobrain/refactor_stealmemoryregion
Allocator: Cleanup StealMemoryRegions implementation
2024-03-14 03:21:09 -07:00
Tony Wasserka
a047ac1699 Allocator: Test CollectMemoryGaps instead of StealMemoryRegions and restore the original interfaces 2024-03-12 10:49:31 +01:00
Tony Wasserka
dce9f651fd Allocator: Split off memory gap collection to a separate function
This function can be unit-tested more easily, and the stack special case is more
cleanly handled as a post-collection step.

There is a minor functional change: The stack special case didn't trigger
previously if the range end was within the stack mapping. This is now fixed.
2024-03-12 10:49:30 +01:00
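Gap collection as a pure function is easy to unit-test, which a small sketch can show. This is not the FEX implementation: `Range`, `CollectGapsSketch`, and the assumption that mappings arrive sorted by start address (as they would from a listing such as /proc/self/maps) are illustrative, and the stack special case is deliberately left out as a separate post-collection step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open address range [Begin, End).
struct Range {
  std::uint64_t Begin;
  std::uint64_t End;
};

// Returns the unmapped gaps inside [Begin, End), given existing mappings
// sorted by start address.
std::vector<Range> CollectGapsSketch(const std::vector<Range>& Mappings,
                                     std::uint64_t Begin, std::uint64_t End) {
  assert(Begin <= End);
  std::vector<Range> Gaps;
  std::uint64_t Cursor = Begin; // First address not yet accounted for.

  for (const Range& Mapping : Mappings) {
    if (Mapping.End <= Cursor) {
      continue; // Entirely below the region of interest.
    }
    if (Mapping.Begin >= End) {
      break; // Entirely above it; later mappings are too.
    }
    if (Mapping.Begin > Cursor) {
      Gaps.push_back({Cursor, Mapping.Begin});
    }
    Cursor = std::max(Cursor, Mapping.End);
    if (Cursor >= End) {
      break;
    }
  }

  if (Cursor < End) {
    Gaps.push_back({Cursor, End}); // Trailing gap after the last mapping.
  }
  return Gaps;
}
```

Because the function takes the mappings as data rather than reading them from the process, tests can feed it arbitrary layouts, including ranges that end inside a mapping.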
Tony Wasserka
0d71f169d0 Allocator: Adopt a more testable interface for StealMemoryRegions 2024-03-12 10:49:30 +01:00