
The Ballistic JIT Engine

“Dynarmic but faster”

Overview

This is a rewrite of the dynarmic recompiler, with the goal of fixing its many flaws.

Dynarmic Flaws

  • The JIT state structure exceeds 64 bytes (the typical size of a CPU cache line). This makes it hard to keep resident in a single cache line, leading to eviction issues and inefficient access patterns (see the state-struct sketch after this list).
  • The JIT state pointer is constantly read from and written to. CPU branch mispredictions regarding the next incoming store/read result in significant performance penalties.
  • The allocation strategy for compiled code blocks causes new blocks to evict older blocks, but the execution flow often jumps back to those older, now-evicted blocks, causing instruction cache misses.
  • Calls made from JIT code back to the C++ host environment result in total cache thrashing and stack clobbering, disrupting the execution pipeline.
  • The setup and teardown code for each basic block is stupidly large (approximately 128 bytes per block), wasting memory and instruction cache space.
  • Unlike GCC or LLVM, the current JIT backend lacks a peephole optimizer to perform local code improvements (e.g., instruction combining or redundant instruction removal).
  • XMM (SSE/AVX) register spilling is not properly implemented, leading to potential data corruption or inefficient register usage.
  • The code relies heavily on loading absolute pointers. Switching to base[index] addressing would let the CPU compute addresses with a fast LEA (Load Effective Address) instruction instead of loading 8-byte pointers (which may also be unaligned); see the addressing sketch after this list.
  • The current intrusive list implementation relies on pointer chasing, which destroys data locality. Proposed solution: switch to dense linked lists, i.e. a backing array of elements plus an array of indices. Swapping indices is faster than swapping pointers, and keeping the data contiguous improves cache locality (see the dense-list sketch after this list).
  • The Intermediate Representation layer is too heavy, creating a large memory footprint and performance hotspots.
  • The Argument class uses excessive memory for every instance. Since arguments are ubiquitous in the IR, this results in significant cumulative memory waste.
  • The C++ compiler fails to devirtualize critical calls, particularly terminal handlers and coprocessor logic. This adds the overhead of virtual function lookups to hot code paths.
  • The IsImmediate() function utilizes recursion, which constantly clobbers the micro-op (uop) cache, degrading CPU front-end performance.
  • There are mmap "shenanigans" where the code attempts to outsmart the OS and compiler but likely ends up with suboptimal memory management.
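
As a minimal sketch of the state-structure point, a JIT state forced to fit within one 64-byte cache line might look like the following. The struct name and every field are hypothetical, not dynarmic's actual layout; the point is the alignas(64) plus a static_assert that keeps the structure inside a single line.

```cpp
#include <cstdint>

// Hypothetical layout, not dynarmic's: a JIT state trimmed to exactly one
// 64-byte cache line and aligned to it.
struct alignas(64) JitState {
    std::uint64_t guest_pc;          // guest program counter
    std::uint64_t nzcv;              // cached guest flags
    std::uint64_t cycles_remaining;  // dispatcher downcount
    const void*   dispatcher_ret;    // return target into the dispatch loop
    std::uint32_t fpsr;              // guest floating-point status
    std::uint32_t halt_reason;
    std::uint8_t  reserved[24];      // pad to exactly 64 bytes
};

static_assert(sizeof(JitState) == 64, "JIT state must fit in one cache line");
static_assert(alignof(JitState) == 64, "JIT state must be cache-line aligned");
```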
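To illustrate the base[index] suggestion, here is a hedged sketch contrasting a table of absolute pointers with a table of small indices into one contiguous pool. All names (BlockEntry, block_ptrs, block_pool, block_index) are made up for the example; the idea is that the final address is computed from a base plus a scaled index, which the backend can fold into an addressing mode or LEA instead of loading a full 8-byte pointer.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative only: BlockEntry stands in for whatever per-block data the JIT keeps.
struct BlockEntry {
    std::uint8_t code[64];  // placeholder for emitted code / metadata
};

// Scheme A: absolute pointers. Every lookup loads an 8-byte pointer
// (possibly unaligned) before the target address is known.
BlockEntry* block_ptrs[4096];

inline BlockEntry* lookup_absolute(std::size_t slot) {
    return block_ptrs[slot];  // dependent 8-byte pointer load
}

// Scheme B: base[index]. The table stores 16-bit indices into one contiguous
// pool, and the final address is base + index * sizeof(BlockEntry), i.e.
// address arithmetic the compiler can lower to LEA-style addressing
// instead of a full pointer load.
BlockEntry    block_pool[4096];
std::uint16_t block_index[4096];

inline BlockEntry* lookup_indexed(std::size_t slot) {
    return &block_pool[block_index[slot]];
}
```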
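And for the dense linked list idea, a rough sketch (the type and field names are mine, not from the codebase): elements live in one contiguous backing array, links are 32-bit indices, and relinking swaps indices rather than pointers.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical "dense" doubly linked list: contiguous element storage,
// index-based links instead of node pointers.
template <typename T>
class DenseList {
public:
    static constexpr std::uint32_t npos = 0xFFFFFFFF;

    // Append a value; returns its index in the backing array.
    std::uint32_t push_back(const T& value) {
        const std::uint32_t idx = static_cast<std::uint32_t>(values.size());
        values.push_back(value);
        prev.push_back(tail);
        next.push_back(npos);
        if (tail != npos) next[tail] = idx; else head = idx;
        tail = idx;
        return idx;
    }

    // Unlink a node in O(1) by rewiring indices; the element stays put in the
    // backing array, so nothing is moved and no pointers are invalidated.
    void unlink(std::uint32_t idx) {
        if (prev[idx] != npos) next[prev[idx]] = next[idx]; else head = next[idx];
        if (next[idx] != npos) prev[next[idx]] = prev[idx]; else tail = prev[idx];
        prev[idx] = next[idx] = npos;
    }

    // Visit elements in list order.
    template <typename Fn>
    void for_each(Fn fn) const {
        for (std::uint32_t i = head; i != npos; i = next[i]) fn(values[i]);
    }

private:
    std::vector<T>             values;      // contiguous element storage
    std::vector<std::uint32_t> prev, next;  // index-based links
    std::uint32_t head = npos, tail = npos;
};
```

Traversal still follows links, but the payloads stay contiguous, so iterating in list order touches far fewer cache lines than chasing individually heap-allocated nodes.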