The ARM64EC SRA layout will use x0-3 for x86_64 registers, so any
arguments passed to C ABI functions need to be proxied through the
temporaries and moved into place as appropriate.
We had a chance of doing an additional bogus WFE if the expected value
was hit in one iteration of a loop. Not the biggest problem on current
hardware, where WFE only ever sleeps for 1-4 system cycles, but on future
hardware where WFE might actually sleep for longer this could have
been an issue.
Noticed this while writing #3342.
Fixes #3343
The documentation defines the syscall instruction as setting RCX to the
next instruction's RIP and R11 to RFLAGS. We entirely skipped this,
which I noticed while writing unit tests.
Adds unit tests to test both 32-bit and 64-bit behaviour because our
helper shares code between both.
I don't know if anything actually relied on this behaviour but we should
definitely support it.
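The register side effects described above can be sketched as follows. This is a minimal illustration for the 64-bit case only, not FEX's actual helper: `GuestState`, `kSyscallInstructionSize` and `ApplySyscallSideEffects` are hypothetical names, and the only facts relied on are that SYSCALL is encoded as the two bytes 0F 05 and that it writes RCX and R11 as stated.

```cpp
#include <cstdint>

// Hypothetical guest register state; FEX's real CPU state struct differs.
struct GuestState {
  uint64_t rip;    // Address of the SYSCALL instruction being emulated.
  uint64_t rcx;
  uint64_t r11;
  uint64_t rflags;
};

// SYSCALL is encoded as the two bytes 0F 05.
constexpr uint64_t kSyscallInstructionSize = 2;

// Applies the architectural register side effects of a 64-bit SYSCALL:
// RCX receives the address of the next instruction, R11 receives RFLAGS.
inline void ApplySyscallSideEffects(GuestState& state) {
  state.rcx = state.rip + kSyscallInstructionSize;
  state.r11 = state.rflags;
}
```

With `rip = 0x1000` and `rflags = 0x246`, the sketch leaves `rcx == 0x1002` and `r11 == 0x246`.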
Primary goal for this is to ensure that the delinker doesn't need to
allocate any memory. This delinker can end up getting hit heavily with
JIT code so we don't want it to be allocating memory.
Currently every use of the forward label calls into jemalloc to allocate
memory. This adds a forward label that doesn't require any memory
allocation, which is the common case in FEX.
The delinker step of the JIT was using std::function with capturing
lambdas, which allocated memory unnecessarily.
Because the compiler can't see through our std::function usage it could
never decompose these by itself.
By passing the Thread's frame and record to the function as arguments,
we can have the signature be a raw function pointer.
This fixes an area of concern from:
https://github.com/FEX-Emu/FEX/blob/main/docs/ProgrammingConcerns.md#stdfunction-and-lambdas
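A rough sketch of the pattern, assuming hypothetical stand-in types: `Frame`, `Record`, `Link`, `DelinkerFn`, `CountingDelinker` and `Delink` are illustrative names, not FEX's real definitions. The point is only that once the state is passed as arguments the callback needs no captures, so it fits in a plain function pointer and never allocates.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the thread frame and the link record.
struct Frame {
  uint64_t delinked = 0;
};
struct Record {
  uint64_t host_address;
};

// Raw function pointer signature: all state arrives as arguments,
// so nothing has to be captured and nothing is heap allocated.
using DelinkerFn = void (*)(Frame* frame, Record* record);

// A capture-free function that is usable directly as a DelinkerFn.
inline void CountingDelinker(Frame* frame, Record* record) {
  record->host_address = 0;  // Forget the linked target.
  frame->delinked += 1;      // Track how many links were undone.
}

// A link stores only the pointer and its record, both trivially copyable.
struct Link {
  DelinkerFn delinker;
  Record record;
};

inline void Delink(Frame* frame, Link& link) {
  link.delinker(frame, &link.record);
}
```

Because `DelinkerFn` is a plain pointer, the compiler sees a direct indirect call rather than the type-erased wrapper that std::function introduces.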
If the Dst register is allocated as VectorIndices or VectorTable,
using Dst as an operand to perform the tbx operation will result in an error.
For example:
%131(FPR0) i128 = LoadNamedVectorIndexedConstant u8:Tmp:RegisterSize, #0x6, #0xaa0
%132(FPR0) i128 = VTBX1 u8:Tmp:RegisterSize, %129(FPRFixed6) i32v4, %126(FPRFixed10) i16v8, %131(FPR0) i128
Since the tbx instruction's destination register is also the original operand,
this is consistent with the semantics of VTBX1. Therefore,
directly using VectorSrcDst as the destination operand for the tbx instruction is safe.
While locking a shared_lock and doing an empty table lookup is fairly
fast, just remove them from the hot path entirely if no custom IR
handlers are installed.
This is only used for our IRLoader, which is losing its importance
significantly and should probably be removed anyway.
This unit test hasn't really served any purpose for a while now and
mostly just causes pain when reworking things in the IR.
Just remove the IRLoader, its unit tests, the github action steps and
the public FEXCore interface to it, since it isn't used by anything
other than Thunks.
Also moves some IR definitions from the public API to the backend.
Need #3348 merged first.
While casually thinking about this code, I realized that it was quite
branch heavy and could likely be reduced to plain logic operations.
The previous code generated some fairly nasty branch heavy code. This
can be optimized to be branchless and take roughly five instructions
per flag. Using a bitfield for each feature would turn each calculation
into 3-4 instructions but that seems overkill.
Very minor thing.
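As a generic illustration of the branchy-to-branchless idea (not the actual code touched here), the sketch below builds a flag word from two conditions. The feature bits and function names are hypothetical, and a compiler may already turn the branchy form into something similar on its own.

```cpp
#include <cstdint>

// Hypothetical feature bits, for illustration only.
constexpr uint32_t kFeatureA = 1u << 0;
constexpr uint32_t kFeatureB = 1u << 1;

// Branchy form: one conditional per flag.
inline uint32_t FlagsBranchy(bool a, bool b) {
  uint32_t flags = 0;
  if (a) {
    flags |= kFeatureA;
  }
  if (b) {
    flags |= kFeatureB;
  }
  return flags;
}

// Branchless form: a bool converts to 0 or 1, so shifting it places the
// bit directly at the flag position without any conditional.
inline uint32_t FlagsBranchless(bool a, bool b) {
  return (static_cast<uint32_t>(a) << 0) | (static_cast<uint32_t>(b) << 1);
}
```

Both functions return the same value for every input combination; only the generated control flow differs.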
We only used this so that our Xavier CI systems, which were running old
kernels, could run unit tests. We have now removed the Xaviers from CI
and this is no longer necessary.
Stop pretending that we support kernels older than 5.0 and stop allowing
this fallback.
The 32-bit allocator is still used for the MAP_32BIT mmap flag, so the
load bearing code can't be fully removed. Just remove the config and the
frontend things using it.
Currently no functional change but public API breaks should come early.
The thread state object will be used for looking up thread specific
codebuffers in the future when we support MDWE with code mirrors.
We can safely call virtual functions through the JIT with a little bit
of work.
FEX's JIT has quite a few steps before it gets to a syscall handler.
Before this commit:
JIT->static HandleSyscall->SyscallHandler::HandleSyscall->SyscallHandler
After this commit:
JIT->SyscallHandler::HandleSyscall->SyscallHandler
A bit hard to notice this when this interface can spin at 67-million
calls per second though.
This has the Frontend and OpcodeDispatcher select their operating mode
depending on the incoming code segment long-mode flag.
Adds some asserts since currently it is unexpected if the configuration
changes at runtime.
This is fairly straightforward for an initial setup but isn't fully
fleshed out.
Right now FEX's x86 tables aren't set up in a way to support choosing a
different instruction decoding depending on runtime operating mode
change, so that would break in interesting ways.
Primarily this just gets FEX set up to start piping the operating mode
through from the frontend to the backend. This is a long term task, so
it is going to take a long time to iron out all the issues.
Previously we were only storing the 32-bit base address which isn't
actually how segment descriptors work.
In reality segment descriptors are 64-bit descriptors whose layout
depends on the 4-bit type value. We only care about code and data
segment layouts since the rest are bonkers.
Describe these descriptors correctly and set up a default code descriptor
for the operating mode that FEX is starting in.
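For reference, the code/data descriptor layout can be decoded roughly like this. It is a sketch following the standard x86 GDT entry layout, not FEX's actual structures: `DecodedSegment` and `DecodeSegmentDescriptor` are hypothetical names, and the long-mode (L) bit is only meaningful for code segments.

```cpp
#include <cstdint>

// Hypothetical decoded view of a code or data segment descriptor.
struct DecodedSegment {
  uint32_t base;       // 32-bit base, split across bits 16-39 and 56-63.
  uint32_t limit;      // 20-bit limit, split across bits 0-15 and 48-51.
  uint8_t type;        // 4-bit type field, bits 40-43.
  bool present;        // P bit, bit 47.
  bool long_mode;      // L bit, bit 53 (code segments only).
  bool default_32bit;  // D/B bit, bit 54.
};

inline DecodedSegment DecodeSegmentDescriptor(uint64_t desc) {
  DecodedSegment out{};
  out.base = static_cast<uint32_t>(((desc >> 16) & 0xFFFFFF) |
                                   (((desc >> 56) & 0xFF) << 24));
  out.limit = static_cast<uint32_t>((desc & 0xFFFF) |
                                    (((desc >> 48) & 0xF) << 16));
  out.type = static_cast<uint8_t>((desc >> 40) & 0xF);
  out.present = ((desc >> 47) & 1) != 0;
  out.long_mode = ((desc >> 53) & 1) != 0;
  out.default_32bit = ((desc >> 54) & 1) != 0;
  return out;
}
```

For example, the commonly used flat 64-bit code descriptor `0x00AF9A000000FFFF` decodes to base 0, limit 0xFFFFF, type 0xA, with the long-mode bit set and the D/B bit clear, while the flat 32-bit code descriptor `0x00CF9A000000FFFF` has the long-mode bit clear and D/B set.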