This instruction is a little weird.
When accessing memory, the 128-bit operating size of the instruction
only loads 64 bits.
Meanwhile the 256-bit operating size of the instruction fetches a full
256 bits.
Theoretically the hardware could get away with two 64-bit loads or a
wacky 24-byte load, but it looks like, to simplify the hardware, they
just spec'd that the 256-bit version always loads the full range.
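A minimal sketch of the load-size rule described above. `MemoryLoadSize` is a hypothetical helper name, not something taken from the codebase, and it assumes the operating size is passed in bytes (16 or 32).

```cpp
#include <cstddef>

// Hypothetical helper (illustration only): number of bytes this instruction
// reads from memory for a given vector operating size in bytes.
constexpr std::size_t MemoryLoadSize(std::size_t OpSizeBytes) {
  // 128-bit form (16 bytes): only the low 64 bits are read from memory.
  // 256-bit form (32 bytes): the full 256 bits are read.
  return OpSizeBytes == 16 ? 8 : 32;
}
```

The point is only that the memory access size is not a simple function of the register width for this instruction.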
Only installs the tables if SVE256 isn't supported, yet AVX is explicitly
enabled with HostFeatures, to protect against accidental early enablement.
- Only implements 85 instructions starting out
  - Basic vector moves
  - Basic vector unary operations
  - Basic vector binary operations
  - VZeroUpper/VZeroAll
The bulk of the implementation is currently the handling for loading and
storing the halves of the registers from the context or from memory.
This means the load/store helpers must always return a pair unless only
the bottom half of the register is requested, which occurs with 128-bit
AVX operations. The store side then needs to consume the named zero
register if it occurs, since those cases will zero the upper bits.
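A rough sketch of the pair-returning helpers described above, under stated assumptions: `Half`, `VectorHalves`, `GuestContext`, `LoadRegister` and `StoreRegister` are all hypothetical names for illustration, and an empty `High` stands in for the named zero register. The real helpers operate on IR values rather than byte arrays.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

using Half = std::array<std::uint8_t, 16>;  // one 128-bit half of a register

// Hypothetical pair type: High is empty when only the bottom half is in use.
struct VectorHalves {
  Half Low{};
  std::optional<Half> High{};
};

// Hypothetical context layout: low and high halves stored separately.
struct GuestContext {
  std::array<Half, 16> Low{};
  std::array<Half, 16> High{};
};

// Loads both halves, unless a 128-bit operation only needs the bottom half.
inline VectorHalves LoadRegister(const GuestContext& Ctx, std::size_t Reg, bool Is128Bit) {
  assert(Reg < Ctx.Low.size());
  VectorHalves Result;
  Result.Low = Ctx.Low[Reg];
  if (!Is128Bit) {
    Result.High = Ctx.High[Reg];
  }
  return Result;
}

// Stores both halves. A missing High half models the zero register:
// 128-bit AVX operations zero the upper bits of the destination.
inline void StoreRegister(GuestContext& Ctx, std::size_t Reg, const VectorHalves& Value) {
  assert(Reg < Ctx.Low.size());
  Ctx.Low[Reg] = Value.Low;
  Ctx.High[Reg] = Value.High.value_or(Half{});
}
```

Storing the result of a 128-bit load into another register therefore clears that register's upper half, which matches the zeroing behaviour mentioned above.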
This implementation approach has a few benefits.
- I can pound this out extremely quickly
- SSE implementations are unaffected and don't need to deal with the
insert behaviour of SVE256.
- We still keep the SVE256 implementation for the inevitable future when
hardware vendors actually do implement it (Give it 8 years or
something).
- We can actually unit test this path in CI once it is complete.
- We can partially optimize some paths with SVE128 (Gathers) and support
a full ASIMD path if necessary.
One downside is that I can't enable this in CI yet because it can't pass
all unittests, but that's a non-issue since it is going to be in heavy
flux as I'm hammering out the implementation. It'll get switched on at
the end when it's passing all 1265 AVX unittests. Currently at 1001 on
this.
This is a different feature flag than regular AES as the default AES+AVX
only operates on 128-bit wide vectors.
With the newer `VAES` extension this is expanded to 256-bit.
Fixes #3690
When doing scalar insertions, upper bits come from different arguments
depending on the operation. These are listed in the ARM spec under the
NEP bit documentation.
The Oryon is the first CPU we know of that implemented support for the
RNG extension. It also has an erratum where reading the RNDRRS register
never returns success. x86's RDSEED guarantees forward progress with
enough retries.
When an x86 processor messed this up at one point, some Linux systems
would infinite loop (presumably when something in boot was filling an
entropy pool). This required a microcode change to fix that processor.
The rdseed unittest infinite loops on this platform if RNG is exposed.
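To illustrate why a seed source that never succeeds turns a retry loop into a hang, here is a small sketch with a bounded retry count. This is not what the change itself does (it is about not exposing RNG on the affected platform). `ReadSeedBounded` and the `TrySeed` callable are hypothetical stand-ins for a single seed read attempt.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical: TrySeed models one read attempt. It returns a value on
// success and std::nullopt when the hardware reports failure.
template <typename TrySeedFn>
std::optional<std::uint64_t> ReadSeedBounded(TrySeedFn&& TrySeed, unsigned MaxAttempts) {
  assert(MaxAttempts > 0);
  for (unsigned Attempt = 0; Attempt < MaxAttempts; ++Attempt) {
    if (std::optional<std::uint64_t> Value = TrySeed()) {
      return Value;  // Forward progress: some attempt eventually succeeded.
    }
  }
  // A source that never succeeds would spin forever without the bound.
  return std::nullopt;
}
```

Guest code written against RDSEED's forward-progress guarantee typically has no such bound, which is why the unittest hangs when the underlying register never reports success.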
to be consistent with the scalar _Andn opcode, which is specifically named _Andn
and not _Bic.
noticed while reviewing AVX patches
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Needed something in between the `InlineJITBlockHeader` and `avx_high` in
order to match the 16-byte alignment requirement of avx_high. Chose the
`DeferredSignalRefCount` because we hit it quite frequently and it is
basically the only 64-bit variable that we end up touching
significantly.
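A sketch of the layout idea, assuming for illustration that `InlineJITBlockHeader` is 8 bytes wide and sits at offset 0. The struct name `ExampleCpuState`, the field types, and the array dimensions are assumptions, not the real definition.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative layout only. Field types and sizes are assumptions.
struct alignas(16) ExampleCpuState {
  std::uint64_t InlineJITBlockHeader;    // assumed 8 bytes at offset 0
  std::uint64_t DeferredSignalRefCount;  // hot 64-bit field fills the gap
  std::uint8_t avx_high[16][16];         // upper halves, needs 16-byte alignment
};

// Catch layout regressions at compile time instead of at runtime.
static_assert(offsetof(ExampleCpuState, avx_high) % 16 == 0,
              "avx_high must start on a 16-byte boundary");
```

Filling the gap with a frequently used field avoids wasting the 8 bytes on dead padding.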
In the future the CPUState object is going to need to change its view of
the object depending on if the device supports SVE256 or not, but we
don't need to frontload the work right now. It'll become significantly
easier to support that path once the RCLSE pass gets deleted.
This is required to be less than the maximum range for LDP and STP in
the Arm64 Dispatcher, otherwise it breaks. This needs to be ensured when
reorganizing the CoreState.
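One way to guard this at compile time, sketched under assumptions: AArch64 LDP/STP encode a signed 7-bit immediate scaled by the access size, so for 64-bit registers the reachable offsets are multiples of 8 from -512 to 504. `FitsLdpStpOffset`, `ExampleCoreState`, and its fields are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// True if Offset is encodable as an LDP/STP immediate for the given access
// size in bytes (4, 8 or 16): a multiple of the size, scaled value in [-64, 63].
constexpr bool FitsLdpStpOffset(std::int64_t Offset, std::int64_t AccessSize) {
  return Offset % AccessSize == 0 &&
         Offset / AccessSize >= -64 &&
         Offset / AccessSize <= 63;
}

// Hypothetical state layout, only here to show the check in use.
struct ExampleCoreState {
  std::uint64_t Header[4];
  std::uint64_t FieldUsedByDispatcher;
};

static_assert(FitsLdpStpOffset(offsetof(ExampleCoreState, FieldUsedByDispatcher), 8),
              "field must stay reachable by LDP/STP from the state pointer");
```

A check like this fails the build as soon as a reorganization pushes the field out of range, rather than breaking the dispatcher at runtime.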