This is the cause of a bunch of redundant moves that show up in
InstCountCI. Fixing this aliasing and pre-colouring issue causes a ton
of 256-bit operations to become optimal.
Using the cached zero value is less efficient than loading it into the
register for all these cases.
Lets us use rename hardware more efficiently and removes a dependency
chain on a single register.
Original:
```
movi v2.2d, #0x0
mov z16.d, p7/m, z2.d
<... 16 more times>
mov z31.d, p7/m, z2.d
```
Result:
```
movi v16.2d, #0x0
<... 16 more times>
movi v31.2d, #0x0
```
This is a quality-of-life improvement for people who want to tinker
with InstCountCI but may not have an Arm64 device immediately
available for poking.
As long as the vixl disassembler is enabled then the InstCountCI tests
can run and get bit-accurate encodings just like on an Arm64 device.
This also ensures that behaviour is consistent with or without the vixl
simulator enabled which is very important when running on x86 hosts.
This runs the data layout analysis pass added in the previous change twice:
Once for the host architecture and once for the guest architecture. This
allows the new DataLayoutCompareAction to query architecture differences for
each type, which can then be used to instruct code generation accordingly.
Currently, type compatibility is classified into 3 categories:
* Fully compatible (same size/alignment for the type itself and any members)
* Repackable (incompatibility can be resolved with emission of automatable
repacking code, e.g. when struct members are located at differing offsets
due to padding bytes)
* Incompatible
The set of these types is tracked in AnalysisAction, to which extensive
verification logic is added to detect potential incompatibilities and to
enforce use of annotations where needed.
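The three categories above can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not the logic of DataLayoutCompareAction: the `Layout`, `Member`, `Compat` and `classify` names are hypothetical, and "repackable" is approximated as "same members with the same sizes, differing only in offsets, padding or alignment".

```python
from dataclasses import dataclass
from enum import Enum


class Compat(Enum):
    FULL = "fully compatible"
    REPACKABLE = "repackable"
    INCOMPATIBLE = "incompatible"


@dataclass(frozen=True)
class Member:  # hypothetical description of one struct member
    name: str
    type_name: str
    size: int    # bytes
    offset: int  # bytes from the start of the struct


@dataclass(frozen=True)
class Layout:  # hypothetical description of one type on one architecture
    size: int
    alignment: int
    members: tuple = ()


def classify(host: Layout, guest: Layout) -> Compat:
    # Same size/alignment for the type itself and every member.
    if host == guest:
        return Compat.FULL
    # Assumption: repacking is possible when the members match one-to-one
    # and only their placement (offsets/padding) differs.
    same_shape = (
        len(host.members) > 0
        and len(host.members) == len(guest.members)
        and all(
            h.name == g.name and h.type_name == g.type_name and h.size == g.size
            for h, g in zip(host.members, guest.members)
        )
    )
    return Compat.REPACKABLE if same_shape else Compat.INCOMPATIBLE


# struct { uint32_t a; uint64_t b; }: a 32-bit x86 guest places b at
# offset 4, while a 64-bit host pads it out to offset 8.
guest_pair = Layout(12, 4, (Member("a", "uint32_t", 4, 0), Member("b", "uint64_t", 8, 4)))
host_pair = Layout(16, 8, (Member("a", "uint32_t", 4, 0), Member("b", "uint64_t", 8, 8)))

# struct { long v; }: the member itself changes size between the two.
guest_long = Layout(4, 4, (Member("v", "long", 4, 0),))
host_long = Layout(8, 8, (Member("v", "long", 8, 0),))

print(classify(host_pair, guest_pair).value)
print(classify(host_long, guest_long).value)
print(classify(Layout(4, 4), Layout(4, 4)).value)
```

The real pass works on compiler type information and has more cases to consider; the sketch only shows how running the same analysis once per architecture makes a per-type comparison possible.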
This was only required on x86 devices trying to escape the emulation.
Since x86 is now removed, this is entirely unnecessary.
When Steam launches applications with `/bin/sh`, this will remain under
the emulation and not escape these days.
With the removal of the x86 JIT, there is no need to have these be
independent classes.
Merges the Arm64Dispatcher into the base Dispatcher class.
No functional change, just moving code.
Similar to previous tests, vpgatherqq and vgatherqpd are equivalent
instructions. So the tests are the same with the mnemonic changed.
This adds tests for an additional two sets of instructions, getting us
full coverage of all eight instructions if we include the tests from
PR #3167 and #3166.
Tests the same things as described in #3165
In addition, since these tests use 64-bit indices for address
calculation, we can easily generate an index vector that tests
overflow. So every test at every displacement ALSO gains an additional
overflow test to ensure correct behaviour around pointer overflow
calculation.
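As a rough illustration of how a 64-bit index can force the address calculation to wrap, here is a small Python sketch. It is not taken from the test files: `effective_address` and `overflowing_index` are hypothetical helpers, and the base and target addresses are made up.

```python
MASK64 = (1 << 64) - 1


def effective_address(base: int, index: int, scale: int) -> int:
    """base + index * scale, truncated to a 64-bit address."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale) & MASK64


def overflowing_index(base: int, target: int, scale: int) -> int:
    """Unsigned 64-bit index that pushes base + index * scale past 2**64
    so that the truncated result lands on target (which is below base)."""
    distance = base - target
    if distance <= 0 or distance % scale != 0:
        raise ValueError("target must sit below base at a multiple of scale")
    return ((1 << 64) - distance) // scale


base = 0x1000_0000
target = 0x0FFF_FF00  # 0x100 bytes below the base address

for scale in (1, 2, 4, 8):
    index = overflowing_index(base, target, scale)
    # The untruncated sum does not fit in 64 bits...
    overflowed = base + index * scale > MASK64
    # ...but after truncation it resolves to the intended element.
    print(scale, hex(index), overflowed, hex(effective_address(base, index, scale)))
```

With a scale of 1 the index is simply the two's complement of the distance; with larger scales it is a large positive value whose scaled form still carries the sum past 2**64.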
Similar to previous tests, vpgatherqd and vgatherqps are equivalent
instructions. So the tests are the same with the mnemonic changed.
This adds tests for an additional two sets of instructions, getting us
up to six of the eight if we include the tests from #3166.
Tests the same things as described in #3165
In addition, since these tests use 64-bit indices for address
calculation, we can easily generate an index vector that tests
overflow. So every test at every displacement ALSO gains an additional
overflow test to ensure correct behaviour around pointer overflow
calculation.
Just like the previous tests, vpgatherdq and vgatherdpd are equivalent
instructions. So the tests are the same except for the instruction
mnemonic again.
This adds unit tests for two more of the eight gather instructions,
getting us up to testing four in total.
Specifically this adds tests for the instructions that use 32-bit
indices to load 64-bit elements.
Same thing as PR #3165 for what it tests versus doesn't.
vpgatherdd and vgatherdps are effectively the same instructions, so the
tests are the same except for the instruction mnemonic.
This adds unit tests for two of the eight gather instructions.
Specifically this adds tests for the instructions that use 32-bit
indices to load 32-bit elements.
What it tests:
- Tests all displacement scales
- Tests multiple mask arrangements
- Ensures the mask register is zero'd after the instruction
What it doesn't test:
- Doesn't test address size calculation overflow
  - Would only happen on 32-bit with 32-bit indices, or /really/ high
    base addresses
  - The instruction should behave as a mask to the address size
  - Effectively behaves like `(uint64_t)(base + (index << ilog2(scale)))`
  - Better idea is to just not expose AVX to 32-bit applications
- Doesn't test VSIB immediate displacement
  - This just ends up being base_addr + imm so it isn't too interesting
  - We can add more tests in the future if we think we messed that up
- Doesn't test partial fault behaviour
  - Because that's a nightmare.
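A minimal Python model of the behaviour these tests check, offered as a sketch rather than FEX's implementation: elements are loaded only where the mask element's top bit is set, unselected elements keep the destination value, the mask comes back zeroed, and the address is masked to the address size. The `gather` function, its parameters and the dict-based `memory` are hypothetical names used purely for illustration.

```python
def gather(memory, base, indices, scale, mask, dest, elem_bits, addr_bits=64):
    """Model one gather: returns (new destination, new mask).

    memory    dict mapping an address to the element stored there (assumed)
    indices   signed per-element indices
    mask      per-element mask values; only the top bit is significant
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    shift = scale.bit_length() - 1  # ilog2(scale)
    top_bit = 1 << (elem_bits - 1)
    addr_mask = (1 << addr_bits) - 1

    result = list(dest)
    for lane, (index, lane_mask) in enumerate(zip(indices, mask)):
        if lane_mask & top_bit:
            # Address wraps to the address size instead of overflowing.
            address = (base + (index << shift)) & addr_mask
            result[lane] = memory[address]
    # The mask register reads as zero once the instruction completes.
    return result, [0] * len(mask)


memory = {0x0FF8: 44, 0x1000: 11, 0x1008: 22, 0x1010: 33}
top = 1 << 63

loaded, new_mask = gather(
    memory, base=0x1000, indices=[0, 1, 2, -1], scale=8,
    mask=[top, 0, top, top], dest=[0xAA] * 4, elem_bits=64,
)
print(loaded, new_mask)

# 32-bit address size: 0xFFFFFFF8 + (1 << 3) wraps around to address 0.
wrapped, _ = gather(
    {0x0: 7}, base=0xFFFF_FFF8, indices=[1], scale=8,
    mask=[1 << 31], dest=[0], elem_bits=32, addr_bits=32,
)
print(wrapped)
```

The second call exercises the address-size wrap that the list above says the tests skip; it is included only to show what "behaves as a mask to the address size" means in practice.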
Specifically keeps each instruction test small and isolated so if a
single register fails it is very easy to nail down which operation did
it.
I know some of our ASM tests do a chunk of work and spit out a result at
the end which can be difficult to debug in some cases. Didn't want to do
that which is why the tests are spread out across 16 files for this
single class of instructions.