FEX-Emu/FEX - FEX - Gitea: Git with a cup of tea

mirror of https://github.com/FEX-Emu/FEX.git synced 2025-01-05 21:09:56 +00:00

Author	SHA1	Message	Date
Ryan Houdek	97a68cb643	Telemetry: Change how visibility of telemetry values work Removes global initializer for telemetry values since their address is visible and PIC relative code loading handles the address fetching for us.	2024-07-12 03:18:23 -07:00
Ryan Houdek	870e395ac4	Merge pull request #3862 from Sonicadvance1/remove_atexit_logman LogManager: Removes fextl::vector usage	2024-07-12 02:05:02 -07:00
Ryan Houdek	5ef0db994d	VDSO: Stop using a vector for a static This causes a global initializer that registers an atexit handler. Be smarter, use an std::array and pass its data around using a span instead. Removes the global initializer and removes the atexit installation	2024-07-11 23:53:57 -07:00
Ryan Houdek	b523407a3e	LogManager: Removes fextl::vector usage We never use more than one logging method at a time so this was overengineered for what it is doing. Instead only allow one handler for messages and throw messages each which just is a pointer. Removes a global initializer and an atexit handler being installed	2024-07-11 22:51:56 -07:00
Mai	b282620a48	Merge pull request #3857 from Sonicadvance1/sve_bitperm Arm64: Implement support for SVE bitperm	2024-07-11 05:05:41 -04:00
Tony Wasserka	5dc4ab062d	Fix invalid-offsetof warnings due to InternalThreadState not being standard layout See https://github.com/llvm/llvm-project/issues/53021 for more information about unique_ptr turning non-standard-layout.	2024-07-11 09:54:30 +02:00
Ryan Houdek	3554d5c2f7	HostFeatures: Check for SVE bit permute extension	2024-07-10 21:45:07 -07:00
Tony Wasserka	470b435afd	fextl: Properly handle nullptr arguments in fextl::default_delete This reflects behavior of std::default_delete.	2024-07-10 19:17:50 +02:00
Billy Laws	e45e631199	AllocatorHooks: Allocate from the top down on windows FEX allocations can get in the way of allocations that are 4gb-limited even in 65-bit mode (i.e. those from LuaJIT), so allocate starting from the top of the AS to prevent conflicts.	2024-07-06 20:35:38 +00:00
Ryan Houdek	add0e7a8db	HostFeatures: Removes distinction between AVX and AVX2 We now no longer care about AVX versions, consolidate them in to a single config option which enables both.	2024-06-26 14:56:01 -07:00
Ryan Houdek	efa05ba19d	IR: Adds support for new SUBADD FMA constants ADDSUB didn't cover this new variant.	2024-06-25 11:22:22 -07:00
Ryan Houdek	d52a1da501	FEXCore: Implement support for fetching/setting YMM registers Because we have two views of the YMM registers depending on if the host supports SVE256 or not, add helper functions to fetch them correctly. We fetch them in the way that Linux desires them in signal handlers, if we want to return the converged view directly, that is easy to add support for. It's unnecessary for now.	2024-06-21 17:13:56 -04:00
Ryan Houdek	542ed8b6ad	Implement support for querying AES256 support This is a different feature flag than regular AES as the default AES+AVX only operates on 128-bit wide vectors. With the newer `VAES` extension this is expanded to 256-bit.	2024-06-19 05:51:47 -07:00
Ryan Houdek	bf812aae8f	CoreState: Adds avx_high structure for tracking decoupled AVX halves. Needed something inbetween the `InlineJITBlockHeader` and `avx_high` in order to match alignment requirements of 16-byte for avx_high. Chose the `DeferredSignalRefCount` because we hit it quite frequently and it is basically the only 64-bit variable that we end up touching significantly. In the future the CPUState object is going to need to change its view of the object depending on if the device supports SVE256 or not, but we don't need to frontload the work right now. It'll become significantly easier to support that path once the RCLSE pass gets deleted.	2024-06-18 12:00:45 -04:00
Ryan Houdek	9a71443005	CoreState: Adds a gregs offset check This is required to be less than the maximum range for LDP and STP in the Arm64 Dispatcher otherwise it breaks. Necessary to ensure this when reorganizing the CoreState.	2024-06-18 12:00:45 -04:00
Ryan Houdek	1ce27a5e6b	FEXCore: Disentangle the SVE256 feature from AVX In quite a few locations we are mixing the case that SVE256 == AVX or that AVX means the guest register size is 256-bit. While this is true today, this is entanglement is going to change very quickly and cause confusion in follow-up PRs. Now we have SVE128, SVE256, and SVE2 HostFeatures to disambiguate the different features which mean different things. This PR keeps the alias that `SupportsAVX` = `SupportsSVE256 && SupportsSVE2` but that alias is going to very quickly change its definition.	2024-06-17 17:20:32 -07:00
Ryan Houdek	a9bacc1b6b	CoreState: Move `InlineJITBlockHeader` to the start of the struct This currently doesn't do much but soon this will be very important to ensure the data prefetcher of Cortex keeps the cachelines following this variable in L1.	2024-06-17 02:59:56 -07:00
Alyssa Rosenzweig	cb00d9171f	IR: merge general DCE with flag DCE Flag DCE needs to do general DCE anyway to converge in one pass. So we can move the special syscall/atomic logic over to flag DCE and then drop the second DCE pass altogether. Now local dead code of both is eliminated in a single pass. Flag DCE is carefully written to converge in a single iteration which makes this scheme work. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-24 15:44:49 -04:00
Alyssa Rosenzweig	a10f984b1c	clang-format: left-align escaped newlines alternative to #3638. this is theoretically better for side-by-side diffs. in practice it may make other diffs worse since all the \'s change when part of the macro change. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-20 09:47:21 -04:00
Ryan Houdek	d19b57a52e	FEXCore: Get rid of DeferredSignalFaultAddress and use the InterruptFaultPage Arm64ec introduced the InterruptFaultPage which is lower overhead since instead of ldr+str it just turns in to a single str. We were already allocating the space, FEXCore and the frontend signal delegator just needed to be updated to understand the new location. We can additionally use this in the future if we want to make deferred async signals INSIDE the JIT only cost a single str as well.	2024-05-10 15:31:28 -07:00
Ryan Houdek	2cae2f2462	Merge pull request #3617 from bylaws/arm64ec-dispatcher FEXCore: ARM64EC x64 entry/exit support	2024-05-08 12:25:26 -07:00
Alyssa Rosenzweig	a2fc51fc7b	IR: specify registers, not offsets for SRA SRA is fundamentally about hardware registers, not stores into a software-defined context. So, it should take a register instead of an offset. This makes all the unaligned special cases unrepresentable (by design). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-05-08 14:01:42 -04:00
Billy Laws	ab516d7b79	Dispatcher: Implement ARM64EC SRA setup entrypoints While the ARM64EC ABI mostly matches FEX's SRA, the stack still needs to be switched to the emulator stack and target RIP stored into the FEX context before jumping to the dispatcher loop.	2024-05-06 15:41:34 +00:00
Ryan Houdek	6463054fa3	Arm64: Adds another TSO hack to disable half-barrier TSO A feature of FEX's JIT is that when an unaligned atomic load/store operation occurs, the instructions will be backpatched in to a barrier plus a non-atomic memory instruction. This is the half-barrier technique that still ensures correct visibility of loadstores in an unaligned context. The problem with this approach is that the dmb instructions are HEAVY, because they effectively stop the world until all memory operations in flight are visible. But it is a necessary evil since unaligned atomics aren't a thing on ARM processors. FEAT_LSE only gives you unaligned atomics inside of a 16-byte granularity, which doesn't match x86 behaviour of cacheline size (effectively always 64B). This adds a new TSO option to disable the half-barrier on unaligned atomic and instead only convert it to a regular loadstore instruction, ommiting the half-barrier. This gives more insight in to how well a CPU's LRCPC implementation is by not stalling on DMB instructions when possible. Originally implemented as a test to see if this makes Sonic Adventure 2 run full speed with TSO enabled (but all available TSO options disabled) on NVIDIA Orin. Unfortunately this basically makes the code no longer stall on dmb instructions and instead just showing how bad the LRCPC implementation is, since the stalls show up on `ldapur` instructions instead. Tested Sonic Adventure 2 on X13s and it ran at 60FPS there without the hack anyway.	2024-04-24 13:09:00 -07:00
Paulo Matos	905aa935f5	Reformat until fixed-point Followup to `2b4ec88dae`. Some files needed a couple of calls to clang-format 16.0.6 to reach a fixed point.	2024-04-15 09:40:00 +02:00
Paulo Matos	2b4ec88dae	Whole-tree reformat This follows discussions from #3413. Followup commits add clang-format file, script and blame ignore lists.	2024-04-12 16:26:02 +02:00
Ryan Houdek	a9b7ad841c	Merge pull request #3570 from bylaws/ec_pt8 Enable jemalloc for ARM64EC	2024-04-11 12:57:36 -07:00
Billy Laws	f1f0c47f16	AllocatorHooks: Allow using jemalloc on win32	2024-04-09 23:42:23 +00:00
Lioncache	65b5281d7c	IR: Add constants for FLD variants	2024-04-09 10:13:33 -04:00
Ryan Houdek	1a8b61b9fc	Merge pull request #3560 from bylaws/ec-pt6 FEXCore: Support x64 -> arm64ec calls	2024-04-09 07:08:38 -07:00
Billy Laws	243bb45a68	FEXCore: Support x64 -> arm64ec calls The frontend will provide the return logic via ExitFunctionEC, which will be jumped to whenever there is an indirect branch/return to an addr such that RtlIsEcCode(addr) returns true.	2024-04-06 13:20:48 +00:00
Billy Laws	bd5b817c3a	AllocatorHooks: Mark JIT code memory as EC code on ARM64EC Executable mapped memory is treated as x86 code by default when running under EC, VirtualAlloc2 needs to be used together with a special flag to map JIT arm64 code.	2024-04-06 12:40:52 +00:00
Ryan Houdek	904646e93b	FEXCore: Fixes priority of FEX_APP_CONFIG This environment variable had an incorrect priority on the configuration system. The expectation was higher priority than most other layers. Now the only layer that has higher priority is the environment variables.	2024-04-05 13:10:43 -07:00
Ryan Houdek	e2a095372e	Merge pull request #3534 from Sonicadvance1/move_ir_defines FEXCore: Move nearly all IR definitions to internal	2024-04-01 10:00:20 -07:00
Ryan Houdek	5c29c9d464	Merge pull request #3527 from Sonicadvance1/move_type_defines Moves FHU TypeDefines to FEXCore includes	2024-04-01 08:57:22 -07:00
Ryan Houdek	3bed305660	Merge pull request #3526 from Sonicadvance1/move_codeloader FEXCore: Moves CodeLoader to frontend	2024-04-01 07:52:02 -07:00
Ryan Houdek	f6639c3594	Merge pull request #3525 from Sonicadvance1/move_cpubackend FEXCore: Moves CPUBackend definition internal	2024-04-01 06:47:34 -07:00
Ryan Houdek	ed3af580c5	FEXCore: Move nearly all IR definitions to internal It has been a long time coming that FEX no longer needed to leak IR implementation details to the frontend, this was legacy due to IR CI and various other problems. Now that the last bits of IR leaking has been removed, move everything that we can internally to the implementation. We still have a couple of minor details in the exposed IR.h to the frontend, but these are limited to a few enums and some thunking struct information rather than all the implementation details. No functional change with this, just moving headers around.	2024-03-29 17:20:18 -07:00
Ryan Houdek	8564290f76	FEXCore: Remove DebugStore map This hasn't been used and is blocking refactoring more code.	2024-03-29 14:58:44 -07:00
Ryan Houdek	d11a36eaea	Moves FHU TypeDefines to FEXCore includes FEXCore includes was including an FHU header which would result in compilation failure for external projects trying to link to libFEXCore. Moves it over to fix this, it was the only FHU usage in FEXCore/include NFC	2024-03-29 02:54:54 -07:00
Ryan Houdek	f46e88ebdb	FEXCore: Moves CPUBackend definition internal This is no longer necessary to be part of the public API. Moves the header internally. Needed to pass through `IsAddressInCodeBuffer` from CPUBackend through the Context object, but otherwise no functional change.	2024-03-29 02:27:29 -07:00
Ryan Houdek	20eb338644	FEXCore: Moves CodeLoader to frontend FEXCore no longer has a need for this since a bunch of related code was already moved to the frontend. Move the CodeLoader now.	2024-03-29 02:24:53 -07:00
Ryan Houdek	7f90ca53f7	Merge pull request #3505 from Sonicadvance1/telemetry_noncanonical Telemetry: Adds tracker for non-canonical memory access crash	2024-03-26 23:21:32 -07:00
Ryan Houdek	6f29e75f67	FEXCore: Removes vestigial mman SMC checking This wasn't actually wired up to anything ever since some refactoring occured two years ago.	2024-03-26 02:56:26 -07:00
Ryan Houdek	5a35e119fe	Telemetry: Adds tracker for non-canonical memory access crash This may be useful for tracking TSO faulting when it manages to fetch stale data. While most TSO crashes are due to nullptr dereferences, this can still check for the corruption case.	2024-03-21 20:47:36 -07:00
Ryan Houdek	fd391b1b18	JIT: Optimize pmovmaskb with a named vector constant I was looking at some other JIT overheads and this cropped up as some overhead. Instead of materializing a constant using mov+movk+movk+movk, load it from the named vector constant array. In a micro-benchmark this improved performance by 34%. In bytemark this improved on subbench by 0.82%	2024-03-17 18:40:46 -07:00
Ryan Houdek	8a3d08e1d8	Merge pull request #3483 from neobrain/refactor_stealmemoryregion Allocator: Cleanup StealMemoryRegions implementation	2024-03-14 03:21:09 -07:00
Tony Wasserka	a047ac1699	Allocator: Test CollectMemoryGaps instead of StealMemoryRegions and restore the original interfaces	2024-03-12 10:49:31 +01:00
Tony Wasserka	dce9f651fd	Allocator: Split off memory gap collection to a separate function This function can be unit-tested more easily, and the stack special is more cleanly handled as a post-collection step. There is a minor functional change: The stack special case didn't trigger previously if the range end was within the stack mapping. This is now fixed.	2024-03-12 10:49:30 +01:00
Tony Wasserka	0d71f169d0	Allocator: Adopt a more testable interface for StealMemoryRegions	2024-03-12 10:49:30 +01:00

1 2 3 4 5

209 Commits