We were only running SVE256 and SVE128 with AVX disabled.
Enable asm tests with SVE256, SVE128, and ASIMD, all running with AVX
enabled to hit all the tests.
FEX_ENABLEAVX was removed. Settings for host features should use
FEX_HOSTFEATURES, but FEX_HOSTFEATURES=enableavx actually has different
behaviour, so we don't enable it.
Sometimes GitHub or the CI runner times out trying to check out the
source and keeps timing out forever.
Give it a three-minute timeout, otherwise the CI runner will stall
forever.
Adapted from LLVM version of pr-code-format.yml.
Copies a few scripts from LLVM to External/.
Runs self-hosted on X64.
Assumes clang-format 16.0.6 for formatting.
This unit test hasn't really served any purpose for a while now and
mostly just causes pain when reworking things in the IR.
Just remove the IRLoader, its unit tests, the GitHub action steps and
the public FEXCore interface to it, since it isn't used by anything
other than Thunks.
Also moves some IR definitions from the public API to the backend.
We only used this so that our Xavier CI systems, which were running old
kernels, could run unit tests. We have now removed the Xaviers from CI
and this is no longer necessary.
Stop pretending that we support kernels older than 5.0 and stop allowing
this fallback.
The 32-bit allocator is still used for the MAP_32BIT mmap flag, so the
load-bearing code can't be fully removed. Just remove the config and the
frontend pieces that use it.
It is scarcely used today, and like the x86 jit, it is a significant
maintenance burden complicating work on FEXCore and arm64 optimization. Remove
it, bringing us down to 2 backends.
1 down, 1 to go.
Some interpreter scaffolding remains for x87 fallbacks. That is not a problem
here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Due to Intel dropping support for legacy segment registers[1] there is a
concern that this will break legacy 32-bit software that is doing some
magic segment register handling.
Adds some simple telemetry for 32-bit applications. When they encounter
an instruction that sets or uses a segment register, the JIT will do a
/relatively/ quick four-instruction check to see if it is not a null
segment.
It's not enough to just check if the segment index is 0 or not: 32-bit
Linux software starts with non-zero segment register indexes, but the LDT
entry for each segment index is a null descriptor.
Once the segment address is loaded, the IR operation will do a quick
check against zero and if it /isn't/ zero then set the telemetry value.
As a very minor optimization, segment registers only get checked once
per block to ensure overhead stays low.
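The check described above can be modeled roughly as follows. This is a
minimal Python sketch of the logic only, not the code the JIT emits; the
names `Telemetry`, `nonzero_segment_seen` and `check_segments_for_block`
are hypothetical and only exist for illustration.

```python
from dataclasses import dataclass


@dataclass
class Telemetry:
    # Hypothetical flag: set once any non-null segment base is observed.
    nonzero_segment_seen: bool = False


def check_segments_for_block(block_segment_uses, segment_bases, telemetry):
    """Check each segment register touched by a block at most once.

    block_segment_uses: segment names in the order the block touches them.
    segment_bases: resolved base address per segment (0 for a null descriptor).
    Returns the number of checks actually performed.
    """
    checked = set()
    for seg in block_segment_uses:
        if seg in checked:
            continue  # already checked in this block, skip to keep overhead low
        checked.add(seg)
        # Compare the loaded base address, not the selector index.
        if segment_bases[seg] != 0:
            telemetry.nonzero_segment_seen = True
    return len(checked)
```

A block that touches `ds` twice and `es` once performs two checks, and
the flag is only set if one of the resolved bases is non-zero.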
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html
  - 3.6 - Restricted Subset of Segmentation
    - `Bases are supported for FS, GS, GDT, IDT, LDT, and TSS
      registers; the base for CS, DS, ES, and SS is ignored for 32-bit
      mode, same as 64-bit mode (treated as zero).`
  - 4.2.17 - MOV to Segment Register
    - Will fault if SS is written (Breaking anything that writes to
      SS).
    - Will not fault if CS, DS, ES are written (Thus it sets the
      segment but gets ignored due to 3.6).
Implements CI for tracking instruction counts for generated blocks of
code when transforming from x86 to ARM64 assembly.
This will end up encompassing every instruction in our instruction
tables similarly to how our assembly tests try to test everything in our
instruction tables.
Incidentally, the data for this CI is generated using our assembly
tests. By enabling disassembly and instruction stats when executing a
suite of instructions, this gives the stats that can be added to a json
file.
The current implementation only implements the SecondGroup table of
instructions because it is a relatively small table and has known
inefficiencies in the instruction implementations. As this gets merged I
will be adding more tables of instructions to additional json files for
testing.
These JSON files will support adjusting CPU features regardless of the
host features so it can test implementations depending on different CPU
features. This will let us test things like one instruction having
different "optimal" implementations depending on if it supports SVE128,
SVE256, SVEI8MM, etc.
This initial instruction auditing is what found the bug in our vector
shift instructions by size of zero. If inspecting the result of the CI
run, you can tell that these instructions still aren't "optimal" because
they are doing loads and stores that can be eliminated.
The "Optimal" in the JSON is purely there for human readability and
grepping, to see what is optimal versus not. Same with the "Comment"
section.
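A rough idea of how such a JSON file could be checked against measured
counts, as a minimal Python sketch. Only the "Optimal" and "Comment"
fields come from the description above; the "Instructions" and
"ExpectedInstructionCount" key names, the example entry, and the
`find_mismatches` helper are assumptions made up for illustration.

```python
import json

# Hypothetical file contents; only "Optimal" and "Comment" are named in the text.
EXAMPLE_JSON = """
{
  "Instructions": {
    "example instruction": {
      "ExpectedInstructionCount": 3,
      "Optimal": "No",
      "Comment": "hypothetical entry"
    }
  }
}
"""


def find_mismatches(expected_json, measured_counts):
    """Return {name: (expected, measured)} for every count that differs."""
    expected = json.loads(expected_json)["Instructions"]
    mismatches = {}
    for name, entry in expected.items():
        measured = measured_counts.get(name)
        # "Optimal" and "Comment" are informational only and never compared.
        if measured != entry["ExpectedInstructionCount"]:
            mismatches[name] = (entry["ExpectedInstructionCount"], measured)
    return mismatches
```

With this shape, CI fails a run whenever `find_mismatches` returns a
non-empty dictionary, whether the count went up or down.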
According to my auditing spreadsheet, the total number of instructions
that will end up in these json files will be about 1000, but we will
likely end up with more since there will be edge cases that can be more
optimal depending on arguments.
We don't currently have a device in CI that can run SVE with 128-bit
width registers. Until we have a device with this, make sure the vixl
simulator is also running the ASM tests in this width.
While the ENABLE_LLD and ENABLE_MOLD options are nice, they don't handle
the case when the linker of `lld` or `mold` doesn't match the compiler.
This particularly crops up when overriding the C compiler to a new
version of clang but the globally installed `ld.lld` is still the old
clang version.
This then causes clang to fail with unusual errors when upstream breaks
compatibility with itself.
Easy enough to use by passing the linker to cmake:
`-DUSE_LINKER=/usr/bin/ld.lld-15`
This also removes the ENABLE_LLD and ENABLE_MOLD options to use
USE_LINKER directly.
- lld: `-DUSE_LINKER=lld`
- mold: `-DUSE_LINKER=mold`
Example of compiler failure when built with clang-15 but attempting to
link with ld.lld 14:
```bash
ld.lld-14: error: unittests/APITests/CMakeFiles/Filesystem.dir/Filesystem.cpp.o: Opaque pointers are only supported in -opaque-pointers mode (Producer: 'LLVM15.0.7' Reader: 'LLVM 14.0.6')
```
Currently only does a build. Doing a CI run means figuring out why
TestHarnessRunner doesn't find libraries correctly.
I want to ensure we don't break building at least while sorting out the
rest of this.
Usually uploading of results takes about two seconds.
Sometimes GitHub's connection to the runner flakes and it stalls out the
upload action for some reason.
GitHub's default timeout is SIX HOURS.
Change this to a one-minute timeout on the upload step so it quickly
goes away when GitHub's internet flakes out.
All of these IR operations were being fairly inefficient in their
address calculation. All of them are known to use power-of-two stride
indexing, so the address calculation can be converted from three
instructions to one.
These are always used for x87 stack accesses so each one gets an
improvement.
Before:
```asm
0x0000ffff6a800248 d2800200 mov x0, #0x10
0x0000ffff6a80024c 9b007e80 mul x0, x20, x0
0x0000ffff6a800250 8b000380 add x0, x28, x0
0x0000ffff6a800254 fd417805 ldr d5, [x0, #752]
```
After:
```asm
0x0000ffff91e80240 8b141380 add x0, x28, x20, lsl #4
0x0000ffff91e80244 fd417805 ldr d5, [x0, #752]
```
Removes some annotation warnings that have been showing up on the
actions results page.
v2 is deprecated so going to v3 is necessary. Apparently this upgrades
from Node.js 12 to 16.
Some of the unit tests we run will leak shm regions. Presumably this is
because they never called `shmctl(IPC_RMID)` so the ID is leaked forever.
This can be seen by querying `/proc/sysvipc/shm` to see a list of old
shm regions that eventually hit the maximum capacity of 4096 shm ids.
Once CI is done running, run a utility application whose only job is to
check for SHM IDs that have zero attachments (thus unused), were created
by the UID of the runner, and are older than ten minutes. Any ID that
matches all three is erased.
This will fix spurious failures in our CI caused by running out of SHM
IDs. Previously I had a cron job set up to restart the CI runners every
hour or so, which caused its own spurious failure problems.
FINALLY this bug was triaged which has been annoying us for...years?
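The selection rule can be sketched in Python as below. This is not the
utility itself, just a model of the three conditions. It assumes the
header row of `/proc/sysvipc/shm` names the columns `shmid`, `nattch`,
`cuid` and `ctime`, and it assumes "older than ten minutes" is measured
against `ctime`; the function name and the sample rows are hypothetical.
Actually removing an ID would still need `shmctl(IPC_RMID)`, which this
sketch leaves out.

```python
# Made-up sample in the layout assumed for /proc/sysvipc/shm.
SAMPLE = """\
key shmid perms size cpid lpid nattch uid gid cuid cgid atime dtime ctime rss swap
0 1 1600 4096 100 100 0 1000 1000 1000 1000 0 0 1000 0 0
0 2 1600 4096 100 100 2 1000 1000 1000 1000 0 0 1000 0 0
0 3 1600 4096 100 100 0 1000 1000 2000 2000 0 0 1000 0 0
0 4 1600 4096 100 100 0 1000 1000 1000 1000 0 0 1900 0 0
"""


def stale_shm_ids(proc_text, runner_uid, now, min_age_seconds=600):
    """Return shm IDs that are unattached, owned by the runner, and old."""
    lines = proc_text.strip().splitlines()
    header = lines[0].split()
    stale = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        unused = int(row["nattch"]) == 0          # zero attachments
        ours = int(row["cuid"]) == runner_uid     # created by the runner's UID
        old = now - int(row["ctime"]) > min_age_seconds  # assumed age source
        if unused and ours and old:
            stale.append(int(row["shmid"]))
    return stale
```

In the sample, only ID 1 qualifies at `now=2000`: ID 2 is still
attached, ID 3 was created by another UID, and ID 4 is too recent.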
This is based on our regular CI runner file with a couple of things
stripped out.
- gvisor tests removed because they cause our CI machines pain with
tmp/shm mounts
- Thunks disabled since glibc fault testing is incompatible with them
- ARMEmitter tests removed since we don't want to test vixl.
All glibc conversion PRs will need to be merged before this and then
this needs to be rebased.
The dispatcher was saving AVX state even though FEX doesn't support it
currently. This is due to it checking for the config option rather than
the HostFeatures option.
The `EnableAVX` config option is supposed to be used to inform FEXCore
if we want AVX disabled or not when the host supports the feature. In
this case it is universally enabled because we haven't encountered any
games that have issues with AVX state being saved with signals. (We know
they exist, we just don't have configurations for them).
The HostFeatures option `SupportsAVX` is the option that is supposed to
be getting used for determining if the runtime AVX feature is enabled.
This also had an issue, though: it was **also** always enabled when
running on an x86 host with AVX, or an ARM host with SVE2-256bit.
It was then disabled if the config option was disabled. But since
FEX-Emu doesn't support AVX fully yet, we need to ensure this isn't yet
enabled.
But this only solves half the problem. In order for our CI to test AVX
features before fully supporting AVX, it needs to be able to enable AVX
so that the CPU state is correctly saved.
So we need to change the default configuration option to be false, and
have CI enable it for the tests that matter before AVX is fully
implemented.
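The intended relationship between the two options, as a minimal Python
sketch. The function and parameter names are hypothetical stand-ins for
the `EnableAVX` config option and the `SupportsAVX` host feature, not
FEXCore's actual code.

```python
def runtime_avx_enabled(host_supports_avx: bool, config_enable_avx: bool = False) -> bool:
    # The config option defaults to False until AVX is fully implemented;
    # CI passes True for the tests that need AVX state saved.
    return host_supports_avx and config_enable_avx


def dispatcher_saves_avx_state(host_supports_avx: bool, config_enable_avx: bool = False) -> bool:
    # The dispatcher follows the resolved runtime feature, not the raw config option.
    return runtime_avx_enabled(host_supports_avx, config_enable_avx)
```

With the default configuration nothing saves AVX state, even on a host
that supports it; CI opts in explicitly.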
This is the bare minimum. It only tests glxinfo and vulkaninfo with and
without thunks, nothing more special than that. Already found the .1 bug
with libvulkan host library loading.
The build folder doesn't exist on a freshly started runner.
The rootfs fetch script doesn't care what the working directory is, so
it doesn't need to run in `build/`, which doesn't exist on a fresh
runner and makes the chdir fail.
By default we won't build with the interpreter to reduce user confusion.
The interpreter isn't really useful to end users so remove it.
Completely removes it from building except for the fallback operations.
This also removes the selection from FEXConfig to remove selection
confusion there.
File Stats:
FEXLoader Size with Interpreter: 3422768 bytes
FEXLoader Size without Interpreter: 3301944 bytes
Size ratio (without / with): 96.4699915%
Bytes removed: 120824 bytes
4k pages removed: 29.498046875 -> 30 rounded up
VM Stats (Reported from bloaty):
Memory Size with Interpreter: 6.50Mi
Memory Size without Interpreter: 6.38Mi
Size ratio (without / with): 98.1538462%
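The file stats above can be reproduced with a few lines of Python. The
byte counts are the ones quoted in this message; the `size_stats` helper
is hypothetical. Note that the percentage is the new size as a fraction
of the old size.

```python
import math


def size_stats(with_interp, without_interp, page_size=4096):
    """Return (bytes removed, new/old size in percent, pages, pages rounded up)."""
    removed = with_interp - without_interp
    ratio_percent = without_interp / with_interp * 100
    pages = removed / page_size
    return removed, ratio_percent, pages, math.ceil(pages)


removed, ratio_percent, pages, pages_rounded = size_stats(3422768, 3301944)
```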