Arm64's SVE gather load instruction can be slightly optimized when a
base GPR isn't provided, as it has a version of the instruction
that doesn't require one.
The limitation of this version is that it doesn't support scaling at
all, so it only works if the offset scale is 1.
When FEX hits the optimal case where the destination isn't one of the
incoming sources (other than the incomingDest source), we can
optimize out two moves per 128-bit lane.
Cuts 256-bit non-SVE gather loads from 50 instructions down to 46.
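
A minimal sketch of the base-less selection rule described above. The names `GatherAddress`, `GatherForm` and `SelectGatherForm` are hypothetical and only illustrate the condition (no base GPR and an offset scale of 1); they are not FEX's actual code.

```cpp
#include <cstdint>

// Hypothetical description of a gather's address operand.
struct GatherAddress {
  bool HasBaseGPR; // true if the guest instruction supplied a base register
  uint8_t Scale;   // offset scale: 1, 2, 4 or 8
};

enum class GatherForm {
  VectorOnly, // SVE form that takes no base GPR and cannot scale
  Fallback,   // any other path (assumption: whatever the generic path is)
};

// The base-less form is only usable when there is no base GPR and the
// offsets are unscaled, since that form has no scaling support.
constexpr GatherForm SelectGatherForm(GatherAddress Addr) {
  return (!Addr.HasBaseGPR && Addr.Scale == 1) ? GatherForm::VectorOnly
                                               : GatherForm::Fallback;
}
```

For example, `SelectGatherForm({false, 1})` picks `VectorOnly`, while a missing base with a scale of 4 still takes the fallback path.
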
Some applications don't measure rdtsc correctly and instead use cpuinfo
to get the CPU core's base clock speed. On most x86 CPUs the base clock
speed also matches the cycle counter speed.
Did this as a quick test to see if this would help `Unbound: Worlds
Apart` stuttering while BinaryNinja was disassembling the binary.
Turns out the game doesn't use cpuinfo for its cycle counter speed
determination, but it is good to implement this regardless.
We can support a few combinations of guest and host vector sizes:
Host: 128-bit or 256-bit
Guest: 128-bit or 256-bit
The typical case is Host = 128-bit and Guest = 256-bit now that AVX is
implemented.
On 32-bit this changes to Host=128-bit and Guest=128-bit because we
disable AVX.
In the vixl simulator, 32-bit turns into Host=256-bit and Guest=128-bit.
In the vixl simulator, 64-bit turns into Host=256-bit and
Guest=256-bit.
We cover all four combinations of guest and host vector register sizes!
Fixes a few places that assumed SVE256 = AVX256.
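
The four combinations can be sketched as two independent choices. This assumes the host width follows 256-bit SVE availability (as in the vixl simulator) and the guest width follows whether AVX is enabled (it is disabled for 32-bit). `VectorConfig`, `SelectVectorConfig` and `GuestWiderThanHost` are hypothetical names for illustration.

```cpp
// Hypothetical pairing of host and guest vector register widths, in bits.
struct VectorConfig {
  unsigned HostBits;
  unsigned GuestBits;
};

// Assumption: host width depends only on 256-bit SVE support, and guest
// width depends only on whether AVX is exposed to the guest.
constexpr VectorConfig SelectVectorConfig(bool HostHasSVE256, bool GuestAVXEnabled) {
  return VectorConfig{HostHasSVE256 ? 256u : 128u, GuestAVXEnabled ? 256u : 128u};
}

// True when a guest register is wider than a host register, which is the
// typical case (Host = 128-bit, Guest = 256-bit).
constexpr bool GuestWiderThanHost(VectorConfig Config) {
  return Config.GuestBits > Config.HostBits;
}
```

Keeping the two widths separate is the point: code that checks only one of them ends up treating SVE256 and AVX256 as the same thing.
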
In regular SIB land the index register encoding of 0b100 encodes to "no
register". This feature lets you get SIB encodings without an index
register for flexibility.
In VSIB encoding this isn't expected behaviour; there are no
encodings where an index register is missing, which allows you to encode
all sixteen registers as an index register.
This was causing an abort in `AVX128_LoadVSIB` because the index turned
into an invalid register.
Working instruction:
`vgatherdps ymm2, dword [eax+ymm5*4], ymm7`
Broken instruction:
`vgatherdps ymm0, dword [eax+ymm4*4], ymm7`
This fixes a crash in libfmod, which uses gathers in the wild.
Fixes a crash in Ender Lilies.
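
A minimal sketch of the decoding difference, assuming the index field sits in bits 5:3 of the SIB byte and is extended by the X bit from REX/VEX. `DecodeIndex` and `NoIndex` are hypothetical names, not FEX's decoder.

```cpp
#include <cstdint>

// Sentinel for "this SIB byte encodes no index register".
constexpr int NoIndex = -1;

// Decode the index register number from a SIB/VSIB byte.
// X is the index extension bit (REX.X or VEX.X).
// Regular SIB: index 0b100 with X clear means "no index register".
// VSIB: every encoding names a vector index register, including 0b100.
constexpr int DecodeIndex(uint8_t SIB, bool X, bool IsVSIB) {
  return (!IsVSIB && !X && ((SIB >> 3) & 0x7) == 0x4)
           ? NoIndex
           : static_cast<int>(((SIB >> 3) & 0x7) | (X ? 0x8 : 0x0));
}
```

With the two instructions above, `[eax+ymm5*4]` has the SIB byte 0xA8 and decodes to index 5 either way, while `[eax+ymm4*4]` has the SIB byte 0xA0 and must decode to index 4 under VSIB rather than `NoIndex`.
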
When the source or destination is a register, the address size override
doesn't apply. We were accidentally applying it to all sources
regardless of type, which was causing us to zero extend on operations
that aren't affected by the address size override.
This fixes the OpenSSL cert error in every application, but most
importantly Steam.
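
The rule can be sketched as follows, assuming a 64-bit guest where the override truncates an effective address to 32 bits. `OperandKind` and `ApplyAddressSize` are hypothetical names for illustration only.

```cpp
#include <cstdint>

// Hypothetical classification of a decoded operand.
enum class OperandKind { Register, Memory };

// Only memory operands are affected by the address size override.
// Register operands pass through untouched, so no zero extension happens.
constexpr uint64_t ApplyAddressSize(uint64_t Value, OperandKind Kind,
                                    bool HasAddressSizeOverride) {
  return (Kind == OperandKind::Memory && HasAddressSizeOverride)
           ? (Value & 0xFFFFFFFFull) // truncate the address to 32 bits
           : Value;
}
```

The bug described above corresponds to dropping the `Kind == OperandKind::Memory` check, which would also truncate register values.
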