Mirror of https://github.com/FEX-Emu/FEX.git, synced 2025-02-11 09:56:46 +00:00
Ryan Houdek
A feature of FEX's JIT is that when an unaligned atomic load/store operation occurs, the instruction is backpatched into a barrier plus a non-atomic memory instruction. This is the half-barrier technique, which still ensures correct visibility of loads and stores in an unaligned context. The problem with this approach is that `dmb` instructions are HEAVY: they effectively stop the world until all in-flight memory operations are visible. But it is a necessary evil, since unaligned atomics aren't a thing on ARM processors. FEAT_LSE2 only gives you unaligned atomics within a 16-byte granule, which doesn't match the x86 behaviour of cacheline-sized granularity (effectively always 64 bytes).

This adds a new TSO option that disables the half-barrier on unaligned atomics and instead converts them to a regular load/store instruction only, omitting the barrier. By not stalling on `dmb` instructions where possible, this gives more insight into how well a CPU's LRCPC implementation performs.

This was originally implemented as a test to see whether it would make Sonic Adventure 2 run at full speed on NVIDIA Orin with TSO enabled (but all available TSO options disabled). Unfortunately, while the code no longer stalls on `dmb` instructions, this just shows how bad the LRCPC implementation is: the stalls show up on `ldapur` instructions instead. On the X13s, Sonic Adventure 2 was tested and ran at 60 FPS even without the hack.
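The sketch below models the two pieces of reasoning in this change: why an access can be too unaligned for ARM's 16-byte atomic granule yet still fit in one 64-byte x86 cacheline, and where the barrier goes in the backpatched sequence with and without the half-barrier. All names (`FitsInGranule`, `AccessKind`, `Patch`, `BackpatchUnalignedAtomic`) are hypothetical and are not FEX's API. The barrier placement is also an assumption: it follows the textbook acquire-load/release-store mapping (barrier after a load, before a store), and FEX's actual sequence may differ.

```cpp
#include <cstdint>

// Hypothetical model for illustration only; these are not FEX's real types or functions.

// Granule sizes taken from the description (each must be a power of two).
constexpr uint64_t kArmAtomicGranule = 16; // unaligned-atomic granule on ARM
constexpr uint64_t kX86CacheLine = 64;     // x86 cacheline size

// True if the access [addr, addr + size) stays inside one aligned block of `granule` bytes.
constexpr bool FitsInGranule(uint64_t addr, uint64_t size, uint64_t granule) {
  const uint64_t mask = ~(granule - 1);
  return (addr & mask) == ((addr + size - 1) & mask);
}

// An 8-byte access at offset 12 covers bytes 12..19: it straddles a 16-byte
// boundary (so it cannot stay atomic on ARM) but stays within one 64-byte line.
static_assert(!FitsInGranule(12, 8, kArmAtomicGranule), "crosses the 16-byte granule");
static_assert(FitsInGranule(12, 8, kX86CacheLine), "stays inside the cacheline");

enum class AccessKind { Load, Store };

// Where a full barrier (`dmb ish`) is placed around the plain memory access.
struct Patch {
  bool barrierBefore;
  bool barrierAfter;
};

// Assumption: acquire-style loads get the barrier after the access and
// release-style stores get it before. With the half-barrier disabled
// (the new TSO option), only the plain load/store remains.
constexpr Patch BackpatchUnalignedAtomic(AccessKind kind, bool halfBarrier) {
  if (!halfBarrier) {
    return Patch{false, false};
  }
  return kind == AccessKind::Load ? Patch{false, true}   // plain load, then dmb
                                  : Patch{true, false};  // dmb, then plain store
}
```

Everything here is `constexpr`, so the `static_assert`s double as the worked example: the file fails to compile if the granule arithmetic is wrong.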