If the Dst register is allocated to the same register as VectorIndices
or VectorTable, using Dst as an operand of the tbx operation will result
in an error.
For example:
%131(FPR0) i128 = LoadNamedVectorIndexedConstant u8:Tmp:RegisterSize, #0x6, #0xaa0
%132(FPR0) i128 = VTBX1 u8:Tmp:RegisterSize, %129(FPRFixed6) i32v4, %126(FPRFixed10) i16v8, %131(FPR0) i128
Since the tbx instruction's destination register is also the original operand,
this is consistent with the semantics of VTBX1. Therefore,
directly using VectorSrcDst as the destination operand for the tbx instruction is safe.
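To illustrate why the destination also counts as a source, here is a minimal sketch of single-table TBX semantics on byte lanes. The `Vec128` alias and the `Tbx1` helper are hypothetical names for illustration only, not FEX code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec128 = std::array<uint8_t, 16>;

// Models a single-table TBX on byte elements (illustration only).
// Dst is an input because TBX reads it: lanes whose index is out of
// range keep their previous Dst value, whereas TBL would zero them.
inline Vec128 Tbx1(Vec128 Dst, const Vec128& Table, const Vec128& Indices) {
  for (std::size_t i = 0; i < Dst.size(); ++i) {
    if (Indices[i] < Table.size()) {
      Dst[i] = Table[Indices[i]];
    }
  }
  return Dst;
}
```

If the indices or the table lived in the same register as Dst, the original Dst contents would already be gone by the time the out-of-range lanes needed them.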
While locking a shared_lock and doing an empty table lookup is fairly
fast, just remove them from the hot path entirely if no custom IR
handlers are installed.
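A rough sketch of the idea, assuming a handler table keyed by guest address. `HandlerTable`, `CustomHandler`, and the atomic flag are hypothetical and only show the shape of the fast path, not the actual FEX implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical stand-in for a custom IR handler.
using CustomHandler = int (*)(uint64_t);

class HandlerTable {
public:
  void Add(uint64_t Address, CustomHandler Handler) {
    assert(Handler != nullptr);
    std::unique_lock Lock(Mutex);
    Handlers[Address] = Handler;
    HasHandlers.store(true, std::memory_order_release);
  }

  std::optional<CustomHandler> Find(uint64_t Address) const {
    // Fast path: skip the lock and the lookup when nothing is installed.
    if (!HasHandlers.load(std::memory_order_acquire)) {
      return std::nullopt;
    }
    std::shared_lock Lock(Mutex);
    auto It = Handlers.find(Address);
    if (It == Handlers.end()) {
      return std::nullopt;
    }
    return It->second;
  }

private:
  mutable std::shared_mutex Mutex;
  std::unordered_map<uint64_t, CustomHandler> Handlers;
  std::atomic<bool> HasHandlers{false};
};
```

With no handlers installed, `Find` costs a single atomic load on the hot path.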
This is only used for our IRLoader, which is losing its importance
significantly and should probably be removed anyway.
This unit test hasn't really served any purpose for a while now and
mostly just causes pain when reworking things in the IR.
Just remove the IRLoader, its unit tests, the github action steps and
the public FEXCore interface to it, since it isn't used by anything
other than Thunks.
Also moves some IR definitions from the public API to the backend.
Need #3348 merged first.
While casually looking at this code, I realized that it was quite
branch heavy and could likely be optimized to plain logic.
The previous code generated some fairly nasty branch heavy code. This
can be optimized to be branchless and take roughly five instructions
per flag. Using a bitfield for each feature would turn each calculation
into 3-4 instructions but that seems overkill.
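As a sketch of the general technique (not the code this change touches), the two forms below pack the same flags. The `Features` struct, its member names, and the bit positions are all made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature set; names and bit positions are arbitrary.
struct Features {
  bool FeatureA;
  bool FeatureB;
  bool FeatureC;
};

// Branchy form: one conditional per flag.
constexpr uint32_t PackBranchy(const Features& F) {
  uint32_t Result = 0;
  if (F.FeatureA) Result |= 1u << 1;
  if (F.FeatureB) Result |= 1u << 5;
  if (F.FeatureC) Result |= 1u << 28;
  return Result;
}

// Logic-only form: a bool converts to 0 or 1, so shifting it into
// place needs no conditional in the source.
constexpr uint32_t PackBranchless(const Features& F) {
  return (static_cast<uint32_t>(F.FeatureA) << 1) |
         (static_cast<uint32_t>(F.FeatureB) << 5) |
         (static_cast<uint32_t>(F.FeatureC) << 28);
}
```

Whether the compiler actually emits branches for the first form depends on the target and optimization level, so the instruction counts quoted above are specific to the code in question.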
Very minor thing.
We only used this so that our Xavier CI systems, which were running old
kernels, could run unit tests. We have now removed the Xaviers from CI
and this is no longer necessary.
Stop pretending that we support kernels older than 5.0 and stop
allowing this fallback.
The 32-bit allocator is still used for the MAP_32BIT mmap flag, so the
load-bearing code can't be fully removed. Just remove the config and the
frontend pieces using it.
Currently no functional change but public API breaks should come early.
The thread state object will be used for looking up thread specific
codebuffers in the future when we support MDWE with code mirrors.
We can safely call virtual functions through the JIT with a little bit
of work.
FEX's JIT has quite a few steps before it gets to a syscall handler.
Before this commit:
JIT->static HandleSyscall->SyscallHandler::HandleSyscall->SyscallHandler
After this commit:
JIT->SyscallHandler::HandleSyscall->SyscallHandler
The difference is a bit hard to notice when this interface can spin at
67 million calls per second, though.
This has the Frontend and OpcodeDispatcher select their operating mode
depending on the incoming code segment long-mode flag.
Adds some asserts since currently it is unexpected if the configuration
changes at runtime.
This is fairly straightforward for an initial setup but isn't fully
fleshed out.
Right now FEX's x86 tables aren't set up in a way that supports choosing
a different instruction decoding when the operating mode changes at
runtime, so that would break in interesting ways.
Primarily this just gets FEX set up to start piping the operating mode
through from the frontend to the backend. This is a long term task, so
it is going to take a long time to iron out all the issues.
Previously we were only storing the 32-bit base address, which isn't
actually how segment descriptors work.
In reality segment descriptors are 64-bit values whose layout depends
on the 4-bit type value. We only care about code and data segment
layouts since the rest are bonkers.
Describe these descriptors correctly and set up a default code descriptor
for the operating mode that FEX is starting in.
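For reference, a minimal sketch of how the fields of a code or data segment descriptor can be pulled out of the raw 64-bit value. The `SegmentDescriptor` struct and its accessor names are hypothetical; the bit positions follow the standard x86 descriptor layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper around a raw 64-bit code/data segment descriptor.
struct SegmentDescriptor {
  uint64_t Raw;

  // Base is split across bits 39:16 (low 24 bits) and 63:56 (high 8 bits).
  constexpr uint32_t Base() const {
    return static_cast<uint32_t>(((Raw >> 16) & 0xFFFFFF) | (((Raw >> 56) & 0xFF) << 24));
  }
  // Limit is split across bits 15:0 and 51:48.
  constexpr uint32_t Limit() const {
    return static_cast<uint32_t>((Raw & 0xFFFF) | (((Raw >> 48) & 0xF) << 16));
  }
  constexpr uint8_t Type() const { return static_cast<uint8_t>((Raw >> 40) & 0xF); }
  constexpr bool IsCodeOrData() const { return (Raw >> 44) & 1; } // S bit
  constexpr bool Present() const { return (Raw >> 47) & 1; }      // P bit
  constexpr bool LongMode() const { return (Raw >> 53) & 1; }     // L bit
  constexpr bool Default32() const { return (Raw >> 54) & 1; }    // D/B bit
  constexpr bool PageGranular() const { return (Raw >> 55) & 1; } // G bit
};

// A commonly used flat 64-bit code descriptor: L=1, D=0.
static_assert(SegmentDescriptor{0x00AF9A000000FFFFULL}.LongMode());
// A commonly used flat 32-bit code descriptor: L=0, D=1.
static_assert(SegmentDescriptor{0x00CF9A000000FFFFULL}.Default32());
```

The L bit is what distinguishes a 64-bit code segment from a 32-bit one.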
This will result in FEX not being able to allocate executable memory.
We can use shared memory in the future to work around this but for now
we don't support that as a fix.
Lots going on here.
This moves OS thread object lifetime management and internal thread
state lifetime management to the frontend. This causes a bunch of thread
handling to move from the FEXCore Context to the frontend.
Looking at `FEXCore/include/FEXCore/Core/Context.h` really shows how
much of the API has moved to the frontend that FEXCore no longer needs
to manage. Primarily this makes FEXCore itself no longer need to care
about most of the management of the emulation state.
A large amount of the behaviour moved wholesale from Core.cpp to
LinuxEmulation's ThreadManager.cpp, which now manages the lifetimes of
both the OS threads and the FEXCore thread state objects.
One feature lost was the instruction capability, but this was already
buggy and is going to be rewritten/fixed when gdbserver work continues.
Now that all of this management is moved to the frontend, the gdbserver
can start improving since it can start managing all thread state
directly.
Similar to #3284 but works around some of the bugs that one introduced.
This is the minimal amount of changes to move the ownership from FEXCore
to the frontend. Since the frontends don't yet have a full thread state
tracking, there is an opaque pointer that needs to be managed.
In the followup commits this will be changed so that the syscall handler
is the thread object manager.
This was a temporary header to help while this header was being migrated
to our public API headers.
It is no longer necessary, so just get rid of it.
No need to wait for initialization for this anymore.
Ever since Init was refactored to do basically no work, this hasn't been
necessary.
CPUID does need to still be initialized after HostFeatures though, so
need to ensure correct member ordering there.
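The ordering constraint comes from C++ initializing non-static data members in declaration order. A minimal sketch, with heavily simplified stand-ins (`CPUIDEmu`, `ContextSketch`, and the `SupportsAES` field are hypothetical, not the real FEX classes):

```cpp
#include <cassert>

// Hypothetical stand-in for the host feature detection object.
struct HostFeatures {
  bool SupportsAES = true;
};

// Hypothetical stand-in for the CPUID emulation, which reads
// HostFeatures while being constructed.
struct CPUIDEmu {
  explicit CPUIDEmu(const HostFeatures& Features)
    : ReportAES(Features.SupportsAES) {}
  bool ReportAES;
};

struct ContextSketch {
  // Members are initialized in declaration order, so Features must be
  // declared before CPUID. Swapping these two lines would make CPUID
  // read an object that has not been constructed yet.
  HostFeatures Features {};
  CPUIDEmu CPUID {Features};
};
```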
When the address calculation for SIB has both an index and a base, we
can optimize this to an add with a shifted register. This will convert a
three-instruction sequence into one instruction in most cases.
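A small sketch of the arithmetic involved. `DecodeSIB` and `EffectiveAddress` are hypothetical helpers; displacement handling and the special index/base encodings are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Fields of an x86 SIB byte: scale in bits 7:6, index in bits 5:3,
// base in bits 2:0.
struct SIB {
  uint8_t ScaleLog2;
  uint8_t Index;
  uint8_t Base;
};

constexpr SIB DecodeSIB(uint8_t Byte) {
  return SIB{static_cast<uint8_t>(Byte >> 6),
             static_cast<uint8_t>((Byte >> 3) & 7),
             static_cast<uint8_t>(Byte & 7)};
}

// Base + (Index << Scale). AArch64 can express this as a single ADD
// with a shifted register operand, e.g. `add x0, x1, x2, lsl #2`.
constexpr uint64_t EffectiveAddress(uint64_t BaseValue, uint64_t IndexValue, unsigned ScaleLog2) {
  return BaseValue + (IndexValue << ScaleLog2);
}
```

For example, a SIB byte of 0x88 decodes to scale 4, index register 1, base register 0.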
While we were calling this function, its asserting nature hasn't been
used for a long time.
This used to trigger more frequently when CompileBlock would fail to
compile code, either due to not being able to decode an instruction or
hitting an instruction that FEX doesn't understand.
When these cases are hit today we still generate code blocks which
generate SIGILL. This means that this code was actually never hit.
Completely remove this function and have the JIT's dispatcher call the
CompileBlock function directly. Signature is slightly different since we
need to set x3 to be 0.
git blame shows that 718b3e6b4cc577cda8710b8c1f7ac5b59563814c added this
handler.
It doesn't explain why this was desired but it was never wired up to
anything. Just remove it.
Reduces the ELF's VM size from 9.8MB down to 9.37MB and should reduce
initialization time a smidge.
Slammed this out while waiting for other PRs to get reviewed.
Fairly lightweight since it is almost 1:1 transplanting the code from
FEXCore in to the SyscallHandler's thread creation code.
Minor changes:
- ExecutionThreadHandler gets freed before executing the thread
  - Saves 16 bytes of memory per thread
- Start all threads paused by default
  - Since I moved the code to the frontend, I noticed we needed to do
    some post thread-creation setup.
  - Without the pause we were racing code execution with TLS setup and
    a few other things.
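The start-paused pattern can be sketched with standard C++ primitives. `PausedThread` is a hypothetical helper, not the FEX thread object: the worker blocks until `Resume` is called, which gives the creator a window for post-creation setup.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: the thread is created immediately but does not
// run its body until Resume() is called.
class PausedThread {
public:
  explicit PausedThread(std::function<void()> Body)
    : Worker([this, Body = std::move(Body)] {
        std::unique_lock Lock(Mutex);
        CV.wait(Lock, [this] { return Running; });
        Lock.unlock();
        Body();
      }) {}

  ~PausedThread() {
    Resume();
    if (Worker.joinable()) {
      Worker.join();
    }
  }

  void Resume() {
    {
      std::lock_guard Lock(Mutex);
      Running = true;
    }
    CV.notify_one();
  }

  void Join() { Worker.join(); }

private:
  // Declared before Worker so they exist before the thread starts.
  std::mutex Mutex;
  std::condition_variable CV;
  bool Running = false;
  std::thread Worker;
};
```

Anything written before `Resume()` is visible to the thread body, because both sides synchronize on the same mutex.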