This adds all the missing atomic tests into their own test files.
This includes all of them except a few choice ones that are in their
original files.
- BTC, BTR, BTS are in their Secondary/SecondaryGroup files
- CMPXCHG, CMPXCHG8B, CMPXCHG16B are in their Secondary/SecondaryGroup
  files
  - These always imply lock semantics even without the prefix.
This is a quality-of-life improvement for people who want to tinker
with InstCountCI but may not have an Arm64 device immediately
available for poking.
As long as the vixl disassembler is enabled, the InstCountCI tests
can run and get bit-accurate encodings just like on an Arm64 device.
This also ensures that behaviour is consistent with or without the vixl
simulator enabled, which is very important when running on x86 hosts.
This runs the data layout analysis pass added in the previous change twice:
Once for the host architecture and once for the guest architecture. This
allows the new DataLayoutCompareAction to query architecture differences for
each type, which can then be used to instruct code generation accordingly.
Currently, type compatibility is classified into 3 categories:
* Fully compatible (same size/alignment for the type itself and any members)
* Repackable (incompatibility can be resolved with emission of automatable
repacking code, e.g. when struct members are located at differing offsets
due to padding bytes)
* Incompatible
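The three categories can be sketched roughly as below. This is only an
illustration, not the actual DataLayoutCompareAction: `TypeLayout`,
`MemberLayout`, and `Classify` are hypothetical names, and the rules are
a simplification (members are compared pairwise in declaration order,
and nested types are assumed to be already flattened).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout records; not FEX's real data structures.
struct MemberLayout {
  std::size_t offset; // byte offset of the member inside its parent
  std::size_t size;   // byte size of the member
};

struct TypeLayout {
  std::size_t size;
  std::size_t align;
  std::vector<MemberLayout> members; // empty for scalar types
};

enum class TypeCompatibility { Full, Repackable, Incompatible };

// Compare the host and guest layout of the same declared type.
inline TypeCompatibility Classify(const TypeLayout& host, const TypeLayout& guest) {
  // Precondition of this sketch: a valid layout has a non-zero alignment.
  assert(host.align != 0 && guest.align != 0);

  if (host.members.size() != guest.members.size()) {
    return TypeCompatibility::Incompatible;
  }

  // Scalars of differing size cannot be fixed up by moving bytes around.
  if (host.members.empty() && host.size != guest.size) {
    return TypeCompatibility::Incompatible;
  }

  bool same_offsets = true;
  for (std::size_t i = 0; i < host.members.size(); ++i) {
    // A member that changes size is treated as incompatible here.
    if (host.members[i].size != guest.members[i].size) {
      return TypeCompatibility::Incompatible;
    }
    same_offsets = same_offsets && host.members[i].offset == guest.members[i].offset;
  }

  if (same_offsets && host.size == guest.size && host.align == guest.align) {
    return TypeCompatibility::Full;
  }

  // Same members, but padding or alignment differs: copy member by member.
  return TypeCompatibility::Repackable;
}
```

For example, a struct holding a 4-byte and an 8-byte member lands in the
repackable bucket if one architecture places the second member at
offset 8 and the other at offset 4.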
Similar to previous tests, vpgatherqq and vgatherqpd are equivalent
instructions. So the tests are the same with the mnemonic changed.
This adds tests for an additional two sets of instructions, getting us
full coverage of all eight instructions if we include the tests from
PR #3167 and #3166.
Tests the same things as described in #3165.
In addition, since these tests use 64-bit indices for address
calculation, we can easily generate an index vector that tests
overflow. So every test at every displacement ALSO gains an additional
overflow test to ensure correct behaviour around pointer overflow
calculation.
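One way to build such an index is sketched below. It is an assumption
about how an overflowing index could be constructed, not a description
of what the tests actually do; `IndexReaching` and `EffectiveAddress`
are hypothetical helpers. The idea is to pick the 64-bit index so that
`base + index * scale` wraps around modulo 2^64 and still lands on a
known address.

```cpp
#include <cassert>
#include <cstdint>

// Address a 64-bit gather element would access, with wrap-around.
// Unsigned arithmetic wraps modulo 2^64, matching pointer overflow.
inline uint64_t EffectiveAddress(uint64_t base, uint64_t index, uint64_t scale) {
  return base + index * scale;
}

// Returns the 64-bit index bit pattern for which
// base + index * scale == target (mod 2^64).
inline uint64_t IndexReaching(uint64_t base, uint64_t target, uint64_t scale) {
  assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
  const uint64_t delta = target - base; // wraps if target < base
  assert(delta % scale == 0);           // target must be reachable at this scale
  return delta / scale;
}
```

With a base of 0xFFFFFFFFFFFFF000 and a scale of 1, an index of 0x2000
overflows the addition and ends up at address 0x1000.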
Similar to previous tests, vpgatherqd and vgatherqps are equivalent
instructions. So the tests are the same with the mnemonic changed.
This adds tests for an additional two sets of instructions, getting us
up to six of the eight if we include the tests from #3166.
Tests the same things as described in #3165.
In addition, since these tests use 64-bit indices for address
calculation, we can easily generate an index vector that tests
overflow. So every test at every displacement ALSO gains an additional
overflow test to ensure correct behaviour around pointer overflow
calculation.
Just like the previous tests, vpgatherdq and vgatherdpd are equivalent
instructions, so the tests are the same except for the instruction
mnemonic again.
This adds unit tests for two more of the eight gather instructions,
bringing the total tested to four.
Specifically, this adds tests for the instructions that use 32-bit
indices to load 64-bit elements.
It tests, and leaves untested, the same things as PR #3165.
vpgatherdd and vgatherdps are effectively the same instructions, so the
tests are the same except for the instruction mnemonic.
This adds unit tests for two of the eight gather instructions.
Specifically, this adds tests for the instructions that use 32-bit
indices to load 32-bit elements.
What it tests:
- Tests all displacement scales
- Tests multiple mask arrangements
- Ensures the mask register is zeroed after the instruction
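
For reference, the behaviour being checked can be modelled in scalar
code roughly as follows. This is a simplified sketch of a 128-bit
vpgatherdd with no displacement, not FEX's implementation;
`Gather4x32` and `GatherDD` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical register pair for a 128-bit gather of 32-bit elements.
struct Gather4x32 {
  std::array<uint32_t, 4> dest;
  std::array<uint32_t, 4> mask;
};

// Scalar model: load each element whose mask MSB is set, leave the
// other destination elements untouched, then zero the whole mask.
inline void GatherDD(Gather4x32& regs, const uint8_t* base,
                     const std::array<int32_t, 4>& index, unsigned scale) {
  assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
  for (std::size_t i = 0; i < 4; ++i) {
    if (regs.mask[i] & 0x80000000u) {
      // 32-bit indices are sign-extended before scaling.
      const int64_t offset = static_cast<int64_t>(index[i]) * static_cast<int64_t>(scale);
      std::memcpy(&regs.dest[i], base + offset, sizeof(uint32_t));
    }
  }
  regs.mask.fill(0);
}
```

Only the top bit of each mask element matters, which is why the tests
use several mask arrangements rather than just all-ones and all-zeros.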
What it doesn't test:
- Doesn't test address size calculation overflow
  - Only would happen on 32-bit with 32-bit indices, or /really/ high
    base addresses
  - The instruction should mask the result to the address size
  - Effectively behaves like `(uint64_t)(base + (index << ilog2(scale)))`
  - Better idea is to just not expose AVX to 32-bit applications
- Doesn't test VSIB immediate displacement
  - This just ends up being base_addr + imm so it isn't too interesting
  - We can add more tests in the future if we think we messed that up
- Doesn't test partial fault behaviour
  - Because that's a nightmare.
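
The untested address-size masking can be illustrated with a small
sketch. `GatherElementAddress` is a hypothetical helper and the
`address_bits` parameter is an assumption made for illustration; the
index is passed already sign-extended to 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Compute base + index * scale, then truncate to the address size.
inline uint64_t GatherElementAddress(uint64_t base, uint64_t index,
                                     unsigned scale, unsigned address_bits) {
  assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
  assert(address_bits == 32 || address_bits == 64);
  // Unsigned arithmetic wraps modulo 2^64.
  const uint64_t full = base + index * scale;
  // With 32-bit addressing only the low 32 bits of the sum are used.
  return address_bits == 32 ? (full & 0xFFFFFFFFull) : full;
}
```

With a base of 0xFFFFFFF0, an index of 8, and a scale of 4, a 32-bit
address size wraps to 0x10 while a 64-bit address size yields
0x100000010.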
Specifically, this keeps each instruction test small and isolated so
that if a single register fails it is very easy to nail down which
operation did it.
I know some of our ASM tests do a chunk of work and spit out a result at
the end, which can be difficult to debug in some cases. Didn't want to do
that, which is why the tests are spread out across 16 files for this
single class of instructions.
This is blocking performance improvements. This backend is almost
universally unused except for when I'm testing if games run on Radeon
video drivers.
Hopefully AmpereOne and Orin/Grace can fulfill this role when they
launch next year.
It is scarcely used today, and like the x86 jit, it is a significant
maintenance burden complicating work on FEXCore and arm64 optimization. Remove
it, bringing us down to 2 backends.
1 down, 1 to go.
Some interpreter scaffolding remains for x87 fallbacks. That is not a problem
here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>