A dictionary lookup was only ever returning the .ero result since we
were adding multiple values to the same key. Adds them as a list now and
walks them.
Additionally, if we forget to add a new version again, fall back to
checking using our file format layout.
Fixes an issue where a user running 23.10 was faulting the python
script.
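A minimal sketch of the pattern, assuming nothing about the real script: the names `build_table`, `lookup`, `accept` and `fallback` are hypothetical and only illustrate storing every value under a key as a list, walking that list, and deferring to a fallback when nothing matches.

```python
from collections import defaultdict


def build_table(pairs):
    """Group values by key so a repeated key keeps every value."""
    table = defaultdict(list)
    for key, value in pairs:
        table[key].append(value)
    return table


def lookup(table, key, accept, fallback=None):
    """Walk every value stored under `key`; return the first accepted one.

    If nothing matches (for example, a new version was never added to the
    table), defer to the optional `fallback` callable.
    """
    for value in table.get(key, []):
        if accept(value):
            return value
    return fallback(key) if fallback is not None else None
```

With a plain `dict` assignment, a later value for the same key would silently replace the earlier one; appending to a list keeps all of them available to the walk.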
Instruction count CI has transformed the way we work on FEX… I love the system
and want to make it better. There’s one part of instruction count CI that isn’t
so lovable: the problematic “optimal” flag on instructions.
There are several issues with this flag, both philosophical and practical.
– It is tedious to update the optimal flag when making an implementation
optimal. The effect is to discourage people from making instructions
optimal, or to encourage people to skip updating the flag, diluting its
value. Either way, since we care far more about optimal implementations than we
do about updating the flag, clearly we should prioritize the implementation and
not the flag. This issue was not obvious at the outset, when instruction count
CI was introduced and still quite small. The problem magnified when we started
duplicating instructions in bulk for different combinations of CPU features
(flagm, AFP, etc.), which in turn multiplies the manual work required to update
the flags by the corresponding constant factor. If it comes down to a choice
between removing this extra coverage and removing the flag, I think we all agree
that removing the flag is the lesser evil.
– The definition of “optimal” is fundamentally problematic. I have often
improved the instruction count of an instruction that was already “optimal”.
This is all kinds of silly, and calls into question whether there’s any value
whatsoever in the existing classifications of the flag. Furthermore, it is often
unknowable whether an implementation really is optimal. Is it possible to
implement BZHI (with flag calculations) in fewer than eight instructions? We
don’t know, and it’s silly to pretend that we do.
– As a consequence of the problematic definitions, there are so many errors in
both directions that I don’t think there’s much value in preserving the existing
classification at the expense of progress. Being able to say “32% of
instructions are translated optimally” is neat, but it really doesn’t tell us
anything whatsoever when you dig a little deeper.
So, as the flag is misleading at best and perhaps harmful at worst, let’s remove
it and make the instruction count CI more useful overall. Let’s let the
expected count and the assembly speak for themselves, and cut away the chaff. If
we want a meaningless number to report to management, we can instead calculate
the average blowup factor ;-)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
There are some cases where we want to test multiple instructions where
we can do optimizations that would otherwise be hard to see.
eg:
```asm
; Can be optimized to a single stp
push eax
push ebx
; Can remove half of the copy since we know the direction
cld
rep movsb
; Can remove a redundant insert
addss xmm0, xmm1
addss xmm0, xmm2
```
This lets us have arbitrary sized code in instruction count CI, with the
original json key becoming only a label if the instruction array is
provided.
There are still some major limitations to this: instructions that
generate side-effects might have "garbage" after the end of the block
that isn't correctly accounted for, so care must be taken.
Example in the JSON:
```json
"push ax, bx": {
  "ExpectedInstructionCount": 4,
  "Optimal": "No",
  "Comment": "0x50",
  "x86Insts": [
    "push ax",
    "push bx"
  ],
  "ExpectedArm64ASM": [
    "uxth w20, w4",
    "strh w20, [x8, #-2]!",
    "uxth w20, w7",
    "strh w20, [x8, #-2]!"
  ]
}
```
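How a consumer of these files might pick the instructions to assemble can be sketched as follows. This is an illustration only: the helper `x86_instructions` is hypothetical, and the second entry in the sample document is made up to show the case without an `x86Insts` array.

```python
import json


def x86_instructions(label, entry):
    """Return the x86 instructions to assemble for one JSON entry.

    When "x86Insts" is present, the key is only a label; otherwise the key
    itself is the single instruction under test.
    """
    insts = entry.get("x86Insts")
    if insts:
        return list(insts)
    return [label]


# Illustrative document: one multi-instruction entry, one plain entry.
doc = json.loads("""
{
  "push ax, bx": {"x86Insts": ["push ax", "push bx"]},
  "cld": {"Comment": "hypothetical entry without an x86Insts array"}
}
""")

sources = {label: x86_instructions(label, entry) for label, entry in doc.items()}
```

Here `sources["push ax, bx"]` holds the two `push` instructions, while `sources["cld"]` falls back to the key itself.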
This allows us to use reciprocal instructions that match the precision
x86 expects, rather than converting everything to float divides.
Currently no hardware supports this, and even the upcoming X4/A720/A520
won't support it, but it was trivial to implement so wire it up.
Now that PF calculation is deferred, the cost of calculating PF correctly should
be tolerable. Remove the speed hack to skip PF. It's fundamentally broken, and
there are enough broken things in FEX as it is that we don't need to maintain
this one ;-)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This will allow investigating the Arm64 assembly directly next to the test,
plus publicly linking directly to badly behaving tests.
Perfect for nerdsniping implementations.
Pretty sure this is why CI is unhappy. If a test in a different file has
the same name, the two are highly likely to conflict when nasm is
generating files: one output overwrites the other, causing CI to break.
Include the incoming json filename as part of the asm keys so it can't
conflict here.
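A minimal sketch of the idea, assuming a key of the form `<json file stem>.<test name>`; the function name `asm_key` and the exact key format are hypothetical, not the script's actual scheme.

```python
import os


def asm_key(json_path, test_name):
    """Build a key that is unique across JSON files.

    Prefixing the test name with the JSON file's base name keeps two tests
    with the same name in different files from mapping to the same output.
    """
    stem = os.path.splitext(os.path.basename(json_path))[0]
    return f"{stem}.{test_name}"
```

Two files that both define a test named `push ax` now produce distinct keys, so neither generated file overwrites the other.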
Implements CI for tracking instruction counts for generated blocks of
code when transforming from x86 to ARM64 assembly.
This will end up encompassing every instruction in our instruction
tables similarly to how our assembly tests try to test everything in our
instruction tables.
Incidentally, the data for this CI is generated using our assembly
tests. By enabling disassembly and instruction stats when executing a
suite of instructions, this gives the stats that can be added to a json
file.
The current implementation only implements the SecondGroup table of
instructions because it is a relatively small table and has known
inefficiencies in the instruction implementations. As this gets merged I
will be adding more tables of instructions to additional json files for
testing.
These JSON files will support adjusting CPU features regardless of the
host features so it can test implementations depending on different CPU
features. This will let us test things like one instruction having
different "optimal" implementations depending on if it supports SVE128,
SVE256, SVEI8MM, etc.
This initial instruction auditing is what found the bug in our vector
shift instructions by size of zero. If inspecting the result of the CI
run, you can tell that these instructions still aren't "optimal" because
they are doing loads and stores that can be eliminated.
The "Optimal" in the JSON is purely for human readability and for grepping
to see what is optimal versus not. Same with the "Comment"
section.
According to my auditing spreadsheet, the total number of instructions
that will end up in these json files will be about 1000, but we will
likely end up with more since there will be edge cases that can be more
optimal depending on arguments.
Fixes #2724
If catchsegv doesn't exist then just remove it from the execution
environment.
While nice to have, this shouldn't be mandatory especially with Debian
no longer shipping it.
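One way to make the wrapper optional, sketched with the standard `shutil.which` lookup. The function name `wrap_with_catchsegv` and the injectable `which` parameter are hypothetical; the latter exists only so the behaviour can be exercised without the tool installed.

```python
import shutil


def wrap_with_catchsegv(command, which=shutil.which):
    """Prefix `command` with catchsegv only when the tool is installed."""
    tool = which("catchsegv")
    if tool is None:
        # Tool is missing on this host: run the command unwrapped.
        return list(command)
    return [tool] + list(command)
```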
Cortex-X1C and A78C are relatively minor changes to their non-C
counterparts. Support classifying them in case clang understands them.
Fixes a minor perf regression noticed on the Lenovo X13s while testing.
Because the binaries have no metadata, we allow a .json file indicating
required features to be placed alongside a test in a requirements
directory.
We also check whether the system itself supports those features and run tests
based on that.
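The check can be sketched roughly as below. The file layout (`<requirements dir>/<test name>.json`) and the `RequiredFeatures` key are assumptions made for illustration; the actual schema is not described here.

```python
import json
import os


def required_features(requirements_dir, test_name):
    """Read the optional requirements file for a test.

    A missing file means the test has no feature requirements.
    """
    path = os.path.join(requirements_dir, test_name + ".json")
    if not os.path.isfile(path):
        return set()
    with open(path) as f:
        # "RequiredFeatures" is a hypothetical key name.
        return set(json.load(f).get("RequiredFeatures", []))


def should_run(requirements_dir, test_name, host_features):
    """Run the test only if the host provides every required feature."""
    return required_features(requirements_dir, test_name) <= set(host_features)
```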
X11 has an attribute that was causing function declarations to be
missed.
These declarations exist in Xlibint.h
eg:
```cpp
extern void _XEatData(
    Display*      /* dpy */,
    unsigned long /* n */
) _X_COLD;
```
This `_X_COLD` attribute was causing these function declarations to get
missed.
The x86 runner had unattended-upgrades accidentally still enabled. It
upgraded a bunch of development packages which broke CI.
Fix the struct verifier so it works with the new packages.
Sadly python3-clang doesn't support all the new CursorKind types so we
need to self-define some of them for now.
Once this tool gets converted over to C++ it will be a non-issue.
If a test is marked as a flake then it will be tried five times before
giving up.
As long as we have all the flaky tests marked, this works around the problem
of needing to babysit CI once a PR is pushed.
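The retry behaviour can be sketched like this. The function name `run_test` and its parameters are hypothetical; only the five-attempt limit for flake-marked tests comes from the description above.

```python
import subprocess


def run_test(command, is_flake, max_flake_attempts=5, run=subprocess.call):
    """Run a test; retry only when it is marked as a flake.

    Returns True as soon as one attempt exits with status 0, and False once
    every allowed attempt has failed.
    """
    attempts = max_flake_attempts if is_flake else 1
    for _ in range(attempts):
        if run(command) == 0:
            return True
    return False
```

Tests that are not marked still fail on the first bad exit status, so a real regression is not hidden behind retries.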
Termux uses defines for these, so our token pasting fails, but we also
still want to use their defines so we can fall back to their emulation
library whenever possible.
Prefix an underscore to be able to use both our number definitions and
their defines in the same file.
For these unit tests we no longer need to put them in the disabled tests
file. Instead they will be skipped if the host doesn't support the
required feature.
Necessary for tests that depend on the state of the running context.
Since we support an SSE mode and an AVX mode, the FPR store truncate
test will fail on hosts that don't support AVX as the register offsets
are going to be different between the two. So we can conditionally
enable support for these tests.