This moves the CPU feature querying to the frontend. The primary purpose
here is for the wow64 frontend to not require linux-isms for querying
these features. This is required since non-Linux environments don't have
the "CPUID" feature for reading EL1 MSRs in EL0.
Wiring up the remaining wow64 registry querying is left for a future
exercise.
This also technically removes an xbyak requirement from FEXCore for when
building the x86 Test harness runner, but that doesn't really matter for
regular use cases.
Started by cherry-picking some cases from the variants that appeared when running
Steam, games, AV1 convolve tests, openssl, ffmpeg, libjpeg-turbo,
openh264, libvpx, gemmlowp, libyuv, and dav1d.
Then turned it around and optimized all of them: since every variant ends up
needing to be split into two halves, we effectively need 16
implementations, plus a couple of special cases for duplicated
results.
Fixes #3795
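To illustrate why splitting into halves caps the work at 16 implementations, here is a small model. It assumes the instruction is a four-element 64-bit permute such as VPERMQ, where an 8-bit immediate holds a 2-bit source index per destination element. That assumption is not stated in the message, and the helper names are hypothetical. Each 128-bit destination half then depends on only 4 immediate bits, giving 2^4 = 16 possible halves. Halves whose two indices match are the "duplicated" cases.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a VPERMQ-style permute: four 64-bit elements,
// 2 immediate bits per destination element selecting a source element.
using Qwords = std::array<uint64_t, 4>;

// Each 128-bit destination half is controlled by 4 immediate bits,
// so there are only 2^4 = 16 distinct ways to build one half.
constexpr uint8_t HalfSelector(uint8_t imm, int half) {
  return static_cast<uint8_t>((imm >> (half * 4)) & 0xF);
}

// Builds one 128-bit half (two qwords) from its 4-bit selector.
inline std::array<uint64_t, 2> BuildHalf(const Qwords& src, uint8_t sel4) {
  return {src[sel4 & 3], src[(sel4 >> 2) & 3]};
}

// Assembles the full result from two independently built halves.
inline Qwords PermuteQwords(const Qwords& src, uint8_t imm) {
  const auto lo = BuildHalf(src, HalfSelector(imm, 0));
  const auto hi = BuildHalf(src, HalfSelector(imm, 1));
  return {lo[0], lo[1], hi[0], hi[1]};
}

// True when both elements of a half come from the same source element,
// i.e. the half is a duplicated (broadcast) result.
constexpr bool IsDuplicatedHalf(uint8_t sel4) {
  return (sel4 & 3) == ((sel4 >> 2) & 3);
}
```

With this split, a backend only needs one code sequence per 4-bit selector value, and both halves reuse the same table.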
- IsImmLogical already existed in our CodeEmitter. We just forgot to
allow nullptr arguments and to use it.
- Adds an equivalent IsImmAddSub helper and uses it.
This gets us closer to removing vixl's global initializers from FEXCore.
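As a rough sketch of what an IsImmAddSub-style check does: an AArch64 ADD/SUB (immediate) instruction encodes a 12-bit unsigned value, optionally shifted left by 12. The function below is hypothetical and its name and signature are not FEXCore's. It follows the pattern described above, where output arguments may be nullptr when the caller only wants to know whether the value is encodable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical IsImmAddSub-style helper (not FEXCore's actual signature).
// AArch64 ADD/SUB (immediate) accepts imm12 or (imm12 << 12).
// The out-parameters are optional and may be nullptr.
inline bool IsImmAddSubSketch(uint64_t imm, uint32_t* imm12 = nullptr,
                              bool* shift12 = nullptr) {
  uint32_t encoded = 0;
  bool shifted = false;
  if (imm < (1u << 12)) {
    // Fits directly in the 12-bit field.
    encoded = static_cast<uint32_t>(imm);
  } else if ((imm & 0xFFF) == 0 && (imm >> 12) < (1u << 12)) {
    // Low 12 bits are zero and the rest fits: use the LSL #12 form.
    encoded = static_cast<uint32_t>(imm >> 12);
    shifted = true;
  } else {
    return false;
  }
  if (imm12) *imm12 = encoded;
  if (shift12) *shift12 = shifted;
  return true;
}
```

Callers that only need a yes/no answer can call `IsImmAddSubSketch(value)` and skip the outputs entirely.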
The prior approach using the L2 cache was flawed as it assumed L2
page entries had a 1-1 correspondence with actual pages. While the L2
cache could be extended to handle aliases with EC, this could lead to
thrashing etc. The cost of a lookup in the actual EC code bitmap is
cheap enough to perform every time considering the infrequency of calls
to ARM64EC code when compared to X86 L2 hits.
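To show why a direct bitmap lookup is cheap, here is a model of a one-bit-per-page code bitmap. It is an illustrative stand-in, not the real EC code bitmap. The class name, the 4 KiB page size and the flat layout are assumptions. A query is a shift, a load and a mask on the queried address itself, so it does not depend on a separate cache of page entries staying consistent.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: one bit per 4 KiB page, set when the page holds
// ARM64EC code. Not the actual OS-maintained data structure.
class ECCodeBitmap {
public:
  explicit ECCodeBitmap(uint64_t addressSpaceBytes)
    : Bits((addressSpaceBytes / PageSize + 63) / 64, 0) {}

  // Marks the page containing addr as ARM64EC code.
  void MarkECPage(uint64_t addr) {
    const uint64_t page = addr / PageSize;
    Bits[page / 64] |= uint64_t{1} << (page % 64);
  }

  // Direct lookup by address: shift, load, mask. Out-of-range is not EC.
  bool IsECCode(uint64_t addr) const {
    const uint64_t page = addr / PageSize;
    if (page / 64 >= Bits.size()) return false;
    return (Bits[page / 64] >> (page % 64)) & 1;
  }

private:
  static constexpr uint64_t PageSize = 4096;
  std::vector<uint64_t> Bits;
};
```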
In the environment section, this was causing the next environment
variable line to be merged with the strenum options.
Also makes it so the strenum options don't have a spurious comma at the
end of the list.
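A minimal sketch of the two fixes: emit a separator only between options, so no comma trails the list, and end the list with a newline so the next line is not appended to it. The function name and the ", " separator are hypothetical. The actual generator's format is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical formatter for a strenum option list.
// The separator goes only *between* entries, so no trailing comma,
// and the trailing newline keeps the following line separate.
inline std::string FormatStrEnumOptions(const std::vector<std::string>& options) {
  std::string out;
  for (std::size_t i = 0; i < options.size(); ++i) {
    if (i != 0) out += ", ";
    out += options[i];
  }
  out += '\n';
  return out;
}
```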
Uses a brute-force solver to add more optimized code paths:
- Adds 12 single VInsElement implementations
- Adds 4 implementations that use two IR operations
Does not add any of the two- or three-IR-operation implementations that
use VInsElement, because they interact badly with SRA and end up worse
than the VTBX implementation.
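The brute-force idea can be sketched in a toy setting. Assume a single-source shuffle of four 32-bit lanes with 2 immediate bits per lane, in the style of PSHUFD. Note that this is only an assumption, and the instruction these paths target may use two sources or a different search space. Under it, enumerating all 256 immediates and keeping those where exactly one lane differs from the identity finds the shuffles a single element insert can implement. This toy search yields 4 lanes × 3 other sources = 12 hits. Whether the commit's 12 arise exactly this way is not confirmed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Decodes a 4 x 32-bit single-source shuffle immediate: 2 bits per lane.
inline std::array<int, 4> DecodeShuffle(uint8_t imm) {
  return {imm & 3, (imm >> 2) & 3, (imm >> 4) & 3, (imm >> 6) & 3};
}

// One element insert suffices when exactly one destination lane differs
// from the identity mapping: that lane gets another lane of the source.
inline bool IsSingleInsert(uint8_t imm) {
  const auto sel = DecodeShuffle(imm);
  int changed = 0;
  for (int lane = 0; lane < 4; ++lane) {
    if (sel[lane] != lane) ++changed;
  }
  return changed == 1;
}

// Brute force: try every immediate and keep the single-insert ones.
inline std::vector<uint8_t> FindSingleInsertShuffles() {
  std::vector<uint8_t> found;
  for (int imm = 0; imm < 256; ++imm) {
    if (IsSingleInsert(static_cast<uint8_t>(imm))) {
      found.push_back(static_cast<uint8_t>(imm));
    }
  }
  return found;
}
```

A real solver would also search multi-operation sequences and score them, but the enumerate-and-check structure is the same.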
This state has been leaked into FEXCore for quite a while. FEXCore never
actually needed this information, so this moves the bits that are
necessary to the frontend.
Minor behaviour change: `RunUntilExit` now just assumes the primary
thread is using it. This behaviour is on the chopping block to get
removed next anyway.
Instead of passing the TID back to the exit handler, just pass the whole
thread object. This will allow some cleanups with the frontend thread
tracking soon.
NFC
Optimizes the AVX128 blends by reusing the prior SSE4.1 implementation.
The only difference is that the destination register isn't reused as a
source register.
One confusing thing is that Felix Cloutier's documentation has a typo for
the 256-bit VPBLENDW instruction, where the top 128-bit lane reuses the
destination instead of the sources. So I wrote a unit test to
ensure correctness.
Fixes #3796
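For reference, here is a scalar model of 256-bit VPBLENDW as the Intel SDM specifies it. The same 8-bit immediate is applied to both 128-bit lanes, and every word is taken from one of the two sources, never from the old destination. The function name is hypothetical. A model like this can serve as an oracle for a unit test of the kind mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Words256 = std::array<uint16_t, 16>;

// Reference model of VEX.256 VPBLENDW: for word i, immediate bit (i % 8)
// selects src2 when set and src1 when clear. The upper lane reuses the
// same 8 immediate bits and still reads only from the two sources.
inline Words256 Vpblendw256(const Words256& src1, const Words256& src2, uint8_t imm) {
  Words256 dst{};
  for (int i = 0; i < 16; ++i) {
    dst[i] = ((imm >> (i % 8)) & 1) ? src2[i] : src1[i];
  }
  return dst;
}
```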
Stop prefixing the arguments when we generate allocate ops (in particular);
this is more convenient and simpler. In exchange, we need to prefix Op to
avoid a collision on fcmpscalarinsert, which has an argument named Op, but
that's a local change at least.
This came up when experimenting with new IR, but I think it is probably a
win by itself.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>