useful for checking blow up ratios. I keep calculating this by hand so let's
just add it to the json
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
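A rough sketch of the calculation, assuming "blow up ratio" means emitted Arm64 instructions per guest x86 instruction. `ExpectedInstructionCount` is the field used by the instruction count JSON; the `BlowupRatio` key and the `with_blowup` helper are hypothetical names used only for illustration.

```python
def with_blowup(entry, x86_count=1, key="BlowupRatio"):
    """Return a copy of a JSON entry with a blow-up ratio attached.

    `key` is a made-up field name; the real JSON may call it something else.
    """
    if x86_count <= 0:
        raise ValueError("x86_count must be positive")
    out = dict(entry)
    # Arm64 instructions emitted per guest x86 instruction.
    out[key] = entry["ExpectedInstructionCount"] / x86_count
    return out


annotated = with_blowup({"ExpectedInstructionCount": 4}, x86_count=2)
```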
Different approach from #3579
Instead of completely dropping the deprecated path, support both the new
path and the old path by catching the import error with a Python try-except.
This allows us to continue using old packages in CI, while supporting
the future API once pkg_resources gets deprecated and removed. Best of
both worlds.
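A minimal sketch of the pattern. What the script actually uses pkg_resources for is not stated here, so this assumes a version lookup; `installed_version` is a hypothetical helper, and `importlib.metadata` stands in for "the new path".

```python
try:
    # Newer path: standard library replacement for pkg_resources (Python 3.8+).
    from importlib.metadata import PackageNotFoundError, version as _version

    def installed_version(name):
        """Return the installed version string of `name`, or None."""
        try:
            return _version(name)
        except PackageNotFoundError:
            return None

except ImportError:
    # Older path: only reached on interpreters without importlib.metadata.
    import pkg_resources

    def installed_version(name):
        """Return the installed version string of `name`, or None."""
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
```

Callers only ever see `installed_version`, so the fallback can be deleted later without touching them.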
Disabled_Tests was mostly a copy of Known_Failures. Leave only the race
condition on SIGPROF test (mcount_pic.c).
Also remove pr57275.c from known failures. It passes now that we support
AVX.
Also, if a test is disabled, just skip it.
This is a different feature flag than regular AES as the default AES+AVX
only operates on 128-bit wide vectors.
With the newer `VAES` extension this is expanded to 256-bit.
This is helpful for devs working on FEXCore, I've been using this locally but it
might make sense to stick it in tree.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Installing PPA with the script now installs a different
URL. This means that GetPPAStatus always returns false.
For the sake of backwards compatibility match the end
of the line to check.
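Roughly, the check could look like the sketch below. This is not the actual GetPPAStatus code; the helper name, the sample lines, and the suffix are invented for illustration.

```python
def ppa_line_present(lines, expected_suffix):
    """Return True if any non-comment line ends with `expected_suffix`."""
    for line in lines:
        stripped = line.strip()
        # Skip blanks and commented-out entries.
        if not stripped or stripped.startswith("#"):
            continue
        if stripped.endswith(expected_suffix):
            return True
    return False


# Illustrative values only; real sources.list entries may differ.
OLD_LINE = "deb http://old.example.invalid/fex-emu/fex/ubuntu jammy main"
NEW_LINE = "deb https://new.example.invalid/fex-emu/fex/ubuntu jammy main"
SUFFIX = "fex-emu/fex/ubuntu jammy main"
```

Matching on the suffix accepts both the old and the new URL because only the host part differs.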
Adds paths for clang-format to ignore, plus a .clang-format file.
Also adds a wrapper script around clang-format that reads .clang-format-ignore.
To format the whole tree, run this at the root of the repository:
`find . \( -iname '*.h' -o -iname '*.cpp' \) -exec python3 Scripts/clang-format.py -i \{\} \;`
Add reformat target to reformat the whole tree
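For illustration, the ignore check in such a wrapper might look like this. It is a sketch and not the contents of Scripts/clang-format.py: it assumes the ignore file holds one glob pattern per line with `#` comments, and the `External/*` pattern is made up.

```python
import fnmatch


def load_ignore_patterns(text):
    """Parse ignore-file text into a list of glob patterns."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(path, patterns):
    """Return True if `path` matches any ignore pattern."""
    # `find .` emits paths such as ./External/foo.h, so drop the leading ./
    normalized = path[2:] if path.startswith("./") else path
    return any(fnmatch.fnmatch(normalized, p) for p in patterns)


patterns = load_ignore_patterns("# third-party code\nExternal/*\n\n")
```

A wrapper would call `is_ignored` on each file argument and only pass the remaining files on to clang-format.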
This was a funny joke that this was here, but it is fundamentally
incompatible with what we're doing. All those users are running proot
anyway because of how broken running under termux directly is.
Just remove this from here.
A dictionary lookup was only ever returning one result since we
were adding multiple values to the same key. Adds them as a list now and
walks them.
Additionally if we forget to add a new version again, have a fallback to
try checking using our file format layout.
Fixes an issue where a user running 23.10 was faulting the python
script.
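The bug class is easy to reproduce in isolation. The data below is made up; only the pattern (one key, several values) reflects the description above.

```python
from collections import defaultdict

# Hypothetical (key, value) pairs where one key legitimately has many values.
pairs = [("ubuntu", "22.04"), ("ubuntu", "23.04"), ("ubuntu", "23.10")]

# Buggy shape: each assignment replaces the previous value for the key.
overwritten = {}
for key, value in pairs:
    overwritten[key] = value

# Fixed shape: keep every value in a list and walk the list on lookup.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)


def is_known(key, value):
    """Walk all values stored under `key` instead of looking at just one."""
    return any(candidate == value for candidate in grouped.get(key, []))
```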
Instruction count CI has transformed the way we work on FEX… I love the system
and want to make it better. There’s one part of instruction count CI that isn’t
so lovable: the problematic “optimal” flag on instructions.
There are several issues with this flag, both philosophical and practical.
– It is tedious to update the optimal flag when making an implementation
optimal. The effect is to discourage people from making instructions
optimal, or to encourage people to skip updating the flag, diluting its
value. Either way, since we care far more about optimal implementations than we
do about updating the flag, clearly we should prioritize the implementation and
not the flag. This issue was not obvious at the outset, when instruction count
CI was introduced and still quite small. The problem magnified when we started
duplicating instructions in bulk for different combinations of CPU features
(flagm, AFP, etc.), which in turn multiplies the manual work required to update
the flags by the corresponding constant factor. If it comes down to a choice
between removing this extra coverage and removing the flag, I think we all agree
that removing the flag is the lesser evil.
– The definition of “optimal” is fundamentally problematic. I have often
improved the instruction count of an instruction that was already “optimal”.
This is all kinds of silly, and calls into question whether there’s any value
whatsoever in the existing classifications of the flag. Furthermore, it is often
unknowable whether an implementation really is optimal. Is it possible to
implement BZHI (with flag calculations) in fewer than eight instructions? We
don’t know, and it’s silly to pretend that we do.
– As a consequence of the problematic definition, there are so many errors in
both directions that I don’t think there’s much value in preserving the existing
classification at the expense of progress. Being able to say “32% of
instructions are translated optimally” is neat, but it really doesn’t tell us
anything whatsoever when you dig a little deeper.
So, as the flag is misleading at best and perhaps harmful at worst, let’s remove
it and make the instruction count CI more useful overall. Let’s let the
expected count and the assembly speak for themselves, and cut away the chaff. If
we want a meaningless number to report to management, we can instead calculate
the average blowup factor ;-)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
There are some cases where we want to test multiple instructions where
we can do optimizations that would otherwise be hard to see.
e.g.:
```asm
; Can be optimized to a single stp
push eax
push ebx
; Can remove half of the copy since we know the direction
cld
rep movsb
; Can remove a redundant insert
addss xmm0, xmm1
addss xmm0, xmm2
```
This lets us have arbitrary sized code in instruction count CI, with the
original json key becoming only a label if the instruction array is
provided.
There are still some major limitations to this: instructions that
generate side effects might leave "garbage" after the end of the block
that isn't correctly accounted for, so care must be taken.
Example in the JSON:
```json
"push ax, bx": {
  "ExpectedInstructionCount": 4,
  "Optimal": "No",
  "Comment": "0x50",
  "x86Insts": [
    "push ax",
    "push bx"
  ],
  "ExpectedArm64ASM": [
    "uxth w20, w4",
    "strh w20, [x8, #-2]!",
    "uxth w20, w7",
    "strh w20, [x8, #-2]!"
  ]
}
```
```
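The label-versus-instructions rule above could be expressed as follows. This is a sketch of the described behaviour, not the CI script itself; `guest_instructions` is a hypothetical helper name.

```python
def guest_instructions(label, entry):
    """Return the x86 instructions to assemble for one JSON entry.

    If the entry carries an "x86Insts" array, the JSON key is only a label.
    Otherwise the key itself is the single instruction under test.
    """
    insts = entry.get("x86Insts")
    return list(insts) if insts else [label]


multi = guest_instructions("push ax, bx", {"x86Insts": ["push ax", "push bx"]})
single = guest_instructions("push ax", {"ExpectedInstructionCount": 2})
```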
This allows us to use reciprocal instructions that match the precision
x86 expects, rather than converting everything to float divides.
Currently no hardware supports this, and even the upcoming X4/A720/A520
won't support it, but it was trivial to implement so wire it up.
Now that PF calculation is deferred, the cost of calculating PF correctly should
be tolerable. Remove the speed hack to skip PF. It's fundamentally broken, and
there are enough broken things in FEX as it is that we don't need to maintain
this one ;-)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This will allow investigating the Arm64 directly next to the test, plus
publicly linking directly to badly behaving tests.
Perfect for nerdsniping implementations.
Pretty sure this is why CI is unhappy. If a test in a different file has
the same name then it is highly likely to conflict when nasm is
generating files and will overwrite and erase, causing CI to break.
Include the incoming json filename as part of the asm keys so it can't
conflict here.
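A sketch of the key scheme, assuming the key is built from the JSON file's base name plus the test name. The separator, the helper name, and the example file names are illustrative.

```python
import os


def asm_key(json_path, test_name):
    """Build a key that is unique per (JSON file, test name) pair."""
    stem = os.path.splitext(os.path.basename(json_path))[0]
    return f"{stem}.{test_name}"


# Two files that both contain a test called "push ax" no longer collide.
key_a = asm_key("InstructionCountCI/Primary.json", "push ax")
key_b = asm_key("InstructionCountCI/Secondary.json", "push ax")
```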
Implements CI for tracking instruction counts for generated blocks of
code when transforming from x86 to ARM64 assembly.
This will end up encompassing every instruction in our instruction
tables similarly to how our assembly tests try to test everything in our
instruction tables.
Incidentally, the data for this CI is generated using our assembly
tests. By enabling disassembly and instruction stats when executing a
suite of instructions, this gives the stats that can be added to a json
file.
The current implementation only implements the SecondGroup table of
instructions because it is a relatively small table and has known
inefficiencies in the instruction implementations. As this gets merged I
will be adding more tables of instructions to additional json files for
testing.
These JSON files will support adjusting CPU features regardless of the
host features so it can test implementations depending on different CPU
features. This will let us test things like one instruction having
different "optimal" implementations depending on if it supports SVE128,
SVE256, SVEI8MM, etc.
This initial instruction auditing is what found the bug in our vector
shift instructions by size of zero. If inspecting the result of the CI
run, you can tell that these instructions still aren't "optimal" because
they are doing loads and stores that can be eliminated.
The "Optimal" field in the JSON is purely for human readability and
grepping, to see what is optimal versus not. Same with the "Comment"
field.
According to my auditing spreadsheet, the total number of instructions
that will end up in these json files will be about 1000, but we will
likely end up with more since there will be edge cases that can be more
optimal depending on arguments.
Fixes #2724
If catchsegv doesn't exist then just remove it from the execution
environment.
While nice to have, this shouldn't be mandatory especially with Debian
no longer shipping it.
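One way to make the wrapper optional is shown below. It is a sketch, not the test runner's code; `build_command` is a hypothetical helper, and `which` is injectable only so the behaviour can be checked without catchsegv installed.

```python
import shutil


def build_command(test_command, which=shutil.which):
    """Prefix the test command with catchsegv only when it is installed."""
    wrapper = which("catchsegv")
    prefix = [wrapper] if wrapper else []
    return prefix + list(test_command)
```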
Cortex-X1C and A78C are relatively minor changes to their non-C
counterparts. Support classifying them in case clang understands them.
Fixes a minor perf regression noticed on the Lenovo X13s while testing.
Because the binaries have no metadata, we allow a .json file that
indicates the required features to be placed alongside a test, in a
requirements directory.
We also check if the system itself supports those features and run tests
based off of that.
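The gating logic amounts to a subset check. The JSON layout is not shown here, so the `RequiredFeatures` key and the feature names in this sketch are assumptions.

```python
import json


def required_features(requirements_json):
    """Parse the (assumed) requirements JSON into a set of feature names."""
    data = json.loads(requirements_json)
    return set(data.get("RequiredFeatures", []))


def should_run(requirements_json, host_features):
    """Run a test only if the host supports every required feature."""
    return required_features(requirements_json) <= set(host_features)


example = '{"RequiredFeatures": ["avx", "aes"]}'
```

A test with no requirements file, or an empty feature list, would always run under this scheme.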