This reduces the instruction's implementation from 17 instructions
down to 5 or 6, depending on whether the host supports SVE.
I would say this is now optimal.
The implicit size calculation in GPR operations has bitten us more
times than we can count, and it was a mistake from the start of the
project. The vector based operations never had this problem since they
have been explicitly sized for a long time now.
This converts the base IR operations to be explicitly sized, but adds
implicitly sized helpers for the moment while we work on removing
implicit usage from the OpcodeDispatcher.
Should be NFC at this moment but it is a big enough change that I want
it in before the "real" work starts.
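
As a rough illustration of the split between explicitly sized
operations and the temporary implicit helpers, here is a minimal sketch.
The names `Value`, `Add` and `AddImplicit` are hypothetical, and the
"largest operand wins" rule is only an assumed stand-in for the real
implicit size calculation.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical, simplified IR value: it only carries its size in bytes.
struct Value {
  uint8_t SizeBytes;
};

// Explicitly sized operation: the caller states the result size up front.
inline Value Add(uint8_t SizeBytes, Value /*A*/, Value /*B*/) {
  return Value{SizeBytes};
}

// Implicitly sized helper kept for existing callers.
// Assumption for illustration: the implicit rule picks the larger operand.
inline Value AddImplicit(Value A, Value B) {
  return Add(std::max(A.SizeBytes, B.SizeBytes), A, B);
}
```

With the explicit form, a 32-bit add on two 64-bit values stays 32-bit
because the caller said so. The implicit helper would quietly make it
64-bit, which is the class of bug the explicit sizing is meant to avoid.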
Use a named constant for loading the sign inversion, then EOR the second
source and just FAdd it all.
In a vacuum it isn't a significant improvement, but as soon as more than
one instruction is in a block it will eventually get optimized with
named constant caching and be a significant win.
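
A minimal scalar sketch of the EOR + FAdd idea, written as plain C++
rather than JIT output. `EorThenAdd`, `Vec4` and `Mask4` are
hypothetical names, and which lanes carry the sign bit in the mask is an
assumption chosen for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Vec4 = std::array<float, 4>;
using Mask4 = std::array<uint32_t, 4>;

// XOR each lane of B with the mask (flipping the sign where the mask has
// bit 31 set), then add the result to A. One EOR plus one FAdd per vector.
inline Vec4 EorThenAdd(const Vec4& A, const Vec4& B, const Mask4& SignMask) {
  Vec4 Result{};
  for (std::size_t i = 0; i < 4; ++i) {
    uint32_t Bits;
    std::memcpy(&Bits, &B[i], sizeof(Bits));
    Bits ^= SignMask[i];  // sign inversion for the selected lanes
    float Flipped;
    std::memcpy(&Flipped, &Bits, sizeof(Flipped));
    Result[i] = A[i] + Flipped;
  }
  return Result;
}
```

Lanes whose mask entry is 0x80000000 end up computing A - B, lanes whose
mask entry is 0 compute A + B.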
Thanks to @rygorous for the idea!
VRev32 matches Arm64 semantics directly.
LoadNamedVectorConstant allows FEX to quickly load "named constants".
This will allow us to have specific hardcoded vector constant values
that we can load with a ldr(State)+ldr(Value) and will be more abused in
the future.
This also allows us to do a very simple optimization in the future where
we can optimize away redundant loads of these constants if they are used
multiple times in the same block. (Not implemented here).
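
A hedged sketch of the two-load pattern and of what the future per-block
caching could look like. Everything here (`CpuState`, `Vec128`, the enum
entries, `BlockConstantCache`) is a hypothetical simplification, not the
actual FEX structures.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical constant names; the real set is not specified here.
enum NamedConstant : std::size_t {
  CONST_SIGN_MASK_32,
  CONST_SIGN_MASK_64,
  CONST_COUNT,
};

struct Vec128 {
  uint64_t Lo, Hi;
};

// Assumption: the CPU state holds a pointer to the constant table.
struct CpuState {
  const Vec128* NamedConstants;
};

inline Vec128 LoadNamedVectorConstant(const CpuState& State, NamedConstant Name) {
  const Vec128* Table = State.NamedConstants;  // ldr(State)
  return Table[Name];                          // ldr(Value)
}

// Possible future optimization: remember constants already loaded in the
// current block so repeated uses do not reload them.
struct BlockConstantCache {
  std::array<bool, CONST_COUNT> Valid{};
  std::array<Vec128, CONST_COUNT> Values{};
  std::size_t LoadsEmitted = 0;

  Vec128 Get(const CpuState& State, NamedConstant Name) {
    if (!Valid[Name]) {
      Values[Name] = LoadNamedVectorConstant(State, Name);
      Valid[Name] = true;
      ++LoadsEmitted;
    }
    return Values[Name];
  }
};
```

The cache would be reset at every block boundary, so a constant used N
times in one block costs one load pair instead of N.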
This takes a similar approach to deferred signal handling and allows any given
thread to be interrupted while running JIT code by protecting the appropriate
page as RO. When the thread then enters a new block, it will try to access
that page and segfault. This is safer than just sending a signal to the thread
as that could stop in a place where JIT context couldn't be recovered correctly.
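
The mechanism can be sketched on Linux with `mmap`, `mprotect` and a
SIGSEGV handler. This is only an illustration under assumptions: the
function names are hypothetical, the block-entry access is assumed to be
a store, and the handler simply records the interrupt and unprotects the
page so the faulting store can be retried.

```cpp
#include <cstddef>
#include <cstdint>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static uint8_t* g_page = nullptr;  // hypothetical per-thread interrupt page
static std::size_t g_page_size = 0;
static volatile sig_atomic_t g_interrupted = 0;

static void OnSegv(int, siginfo_t* Info, void*) {
  const auto Addr = reinterpret_cast<uintptr_t>(Info->si_addr);
  const auto Base = reinterpret_cast<uintptr_t>(g_page);
  if (Addr >= Base && Addr < Base + g_page_size) {
    g_interrupted = 1;  // the thread was stopped at a known-safe point
    mprotect(g_page, g_page_size, PROT_READ | PROT_WRITE);
    return;  // the faulting store is retried and now succeeds
  }
  signal(SIGSEGV, SIG_DFL);  // unrelated fault: fall back to default handling
}

bool SetupInterruptPage() {
  g_page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  void* P = mmap(nullptr, g_page_size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (P == MAP_FAILED) return false;
  g_page = static_cast<uint8_t*>(P);

  struct sigaction Sa {};
  Sa.sa_sigaction = OnSegv;
  Sa.sa_flags = SA_SIGINFO;
  sigemptyset(&Sa.sa_mask);
  return sigaction(SIGSEGV, &Sa, nullptr) == 0;
}

// Called by another thread that wants to interrupt the JIT thread.
void RequestInterrupt() { mprotect(g_page, g_page_size, PROT_READ); }

// Stand-in for the store the JIT would perform on entering a block.
void BlockEntry() { *static_cast<volatile uint8_t*>(g_page) = 1; }
```

Because the fault can only happen at the block-entry store, the handler
always sees the thread at a point where the JIT context is recoverable.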
Due to Intel dropping support for legacy segment registers[1] there is a
concern that this will break legacy 32-bit software that is doing some
magic segment register handling.
Adds some simple telemetry for 32-bit applications: when they encounter
an instruction that sets or uses a segment register, the JIT will do a
/relatively/ quick four instruction check to see if it is not a null
segment.
It's not enough to just check if the segment index is 0 or not: 32-bit
Linux software starts with non-zero segment register indexes, but the
LDT entry for each segment index is a null descriptor.
Once the segment address is loaded, the IR operation will do a quick
check against zero and if it /isn't/ zero then set the telemetry value.
As a very minor optimization, segment registers only get checked once
per block to ensure overhead stays low.
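
The check described above can be sketched roughly as follows. All names
(`ThreadState`, `BlockEmitter`, `NonNullSegmentSeen`) are hypothetical,
and the sketch collapses JIT-time and run-time work into one function
for brevity. It is not the actual four-instruction sequence.

```cpp
#include <array>
#include <cstdint>

enum Segment : uint8_t { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS, SEG_COUNT };

struct ThreadState {
  std::array<uint16_t, SEG_COUNT> Selector{};  // segment indexes, may be non-zero
  std::array<uint32_t, SEG_COUNT> Base{};      // address resolved from the descriptor
  bool NonNullSegmentSeen = false;             // the telemetry value
};

struct BlockEmitter {
  uint8_t CheckedMask = 0;     // which segments were already checked this block
  unsigned ChecksEmitted = 0;  // how many checks this block actually paid for

  void OnSegmentUse(ThreadState& State, Segment Seg) {
    const uint8_t Bit = static_cast<uint8_t>(1u << Seg);
    if (CheckedMask & Bit) return;  // once per block per segment
    CheckedMask |= Bit;
    ++ChecksEmitted;
    // Compare the loaded segment address, not the selector, against zero.
    if (State.Base[Seg] != 0) State.NonNullSegmentSeen = true;
  }
};
```

A non-zero selector with a zero base (the usual 32-bit Linux startup
state) leaves the telemetry value untouched. Only a non-zero base sets
it.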
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html
- 3.6 - Restricted Subset of Segmentation
- `Bases are supported for FS, GS, GDT, IDT, LDT, and TSS
registers; the base for CS, DS, ES, and SS is ignored for 32-bit
mode, same as 64-bit mode (treated as zero).`
- 4.2.17 - MOV to Segment Register
- Will fault if SS is written (Breaking anything that writes to
SS).
- Will not fault if CS, DS, ES are written (Thus it sets the
segment but gets ignored due to 3.6).
It is not an external component, and it makes paths needlessly long.
Ryan seemed amenable to this when we discussed on IRC earlier.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>