This is a very minor performance change. Cortex CPUs that support SVE
fuse movprfx+<instruction>, which removes two cycles and a dependency
from the backend.
Because of this, converting from ASIMD mov+bsl to SVE movprfx+bsl is a
minor win, saving two cycles and a dependency on Cortex-A710 and
Cortex-A715. It is slightly less of a win on Cortex-A720/A725 because
they support zero-cycle vector register renames, but it is still a win
on Cortex-X925 because that is an older core design that doesn't support
zero-cycle vector register renames.
Very silly little thing.
Some applications don't measure the rdtsc frequency correctly and
instead use cpuinfo to get the CPU core's base clock speed, which on
most x86 CPUs also matches the cycle counter speed.
Did this as a quick test to see if this would help
`Unbound: Worlds Apart` stuttering while BinaryNinja was disassembling
the binary.
Turns out the game doesn't use cpuinfo for its cycle counter speed
determination, but it is good to implement this regardless.
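As a rough illustration of what such an application might do, here is a
minimal sketch that pulls the first `cpu MHz` value out of
/proc/cpuinfo-style text. `ParseCpuMHz` is a hypothetical helper written
for this example, not code from any particular game or from this change.

```cpp
#include <exception>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: returns the first "cpu MHz" value found in
// /proc/cpuinfo-style text, or std::nullopt if the field is missing
// or cannot be parsed as a number.
std::optional<double> ParseCpuMHz(const std::string& CpuInfo) {
  std::istringstream Stream(CpuInfo);
  std::string Line;
  while (std::getline(Stream, Line)) {
    // Only consider lines that begin with the "cpu MHz" key.
    if (Line.rfind("cpu MHz", 0) != 0) {
      continue;
    }
    const auto Colon = Line.find(':');
    if (Colon == std::string::npos) {
      continue;
    }
    try {
      // std::stod skips the whitespace that follows the colon.
      return std::stod(Line.substr(Colon + 1));
    } catch (const std::exception&) {
      return std::nullopt;
    }
  }
  return std::nullopt;
}
```

An application relying on this would treat the returned value as the
cycle counter frequency, which only holds if cpuinfo reports one.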
We can support a few combinations of guest and host vector sizes:
Host: 128-bit or 256-bit
Guest: 128-bit or 256-bit
The typical case is Host = 128-bit and Guest = 256-bit now that AVX is
implemented.
On 32-bit this changes to Host=128-bit and Guest=128-bit because we
disable AVX.
In the vixl simulator 32-bit turns into Host=256-bit and Guest=128-bit.
And then in the vixl sim 64-bit turns into Host=256-bit and
Guest=256-bit.
We cover all four combinations of guest and host vector register sizes!
Fixes a few places that basically assumed SVE256 = AVX256.
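The four combinations above can be summarised in a small sketch. The
struct and function names are hypothetical and only restate the table in
this message: the guest size follows whether AVX is enabled (64-bit
only), and the host size follows whether the host exposes 256-bit
vectors, which in the cases listed here means the vixl simulator.

```cpp
// Hypothetical types for illustration only; not the real configuration API.
struct VectorConfig {
  int HostBits;
  int GuestBits;
};

// Assumption: AVX (256-bit guest) is only enabled for 64-bit guests, and
// the host is 256-bit only when it provides 256-bit SVE (vixl simulator).
constexpr VectorConfig SelectVectorConfig(bool Is64BitGuest, bool HostHas256BitSVE) {
  return VectorConfig {
    HostHas256BitSVE ? 256 : 128,
    Is64BitGuest ? 256 : 128,
  };
}

// Typical hardware case: Host=128-bit, Guest=256-bit.
static_assert(SelectVectorConfig(true, false).HostBits == 128);
static_assert(SelectVectorConfig(true, false).GuestBits == 256);
```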
In regular SIB land an index register encoding of 0b100 means "no
register". This lets you get SIB encodings without an index register,
for flexibility.
In VSIB encoding this isn't the expected behaviour. There are no
encodings where the index register is missing, which allows all sixteen
registers to be encoded as an index register.
This was causing an abort in `AVX128_LoadVSIB` because the index turned
into an invalid register.
Working instruction:
`vgatherdps ymm2, dword [eax+ymm5*4], ymm7`
Broken instruction:
`vgatherdps ymm0, dword [eax+ymm4*4], ymm7`
This fixes a crash in libfmod, which uses gathers in the wild.
Fixes a crash in Ender Lilies.
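The difference between the two decodings can be sketched as follows.
The function names are hypothetical and this is not the actual decoder
code, just a minimal model: the index field is SIB bits [5:3], extended
by the X bit from REX/VEX. The SIB bytes in the comments are worked out
by hand for the two instructions above (scale=4, base=eax).

```cpp
#include <cstdint>
#include <optional>

// Extracts the 4-bit index: SIB bits [5:3] plus the REX.X / VEX.X extension bit.
constexpr uint8_t IndexField(uint8_t SIB, bool X) {
  return static_cast<uint8_t>(((SIB >> 3) & 0b111) | (X ? 0b1000 : 0));
}

// Regular SIB: a full index of 0b0100 means "no index register".
constexpr std::optional<uint8_t> DecodeSIBIndex(uint8_t SIB, bool X) {
  const uint8_t Index = IndexField(SIB, X);
  if (Index == 0b0100) {
    return std::nullopt;
  }
  return Index;
}

// VSIB: every encoding names a vector index register, including 0b0100.
constexpr uint8_t DecodeVSIBIndex(uint8_t SIB, bool X) {
  return IndexField(SIB, X);
}

// [eax+ymm5*4] -> SIB 0xA8, [eax+ymm4*4] -> SIB 0xA0.
static_assert(DecodeVSIBIndex(0xA8, false) == 5);
static_assert(DecodeVSIBIndex(0xA0, false) == 4);
```

Running the broken instruction's SIB byte (0xA0) through the regular SIB
path yields no index at all, which matches the invalid register seen in
`AVX128_LoadVSIB`.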
When the source or destination is a register, the address size override
doesn't apply. We were accidentally applying it on all sources
regardless of type which was causing us to zero extend on operations
that aren't affected by address size override.
This fixes the OpenSSL cert error in every application, but most
importantly Steam.
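A minimal model of the corrected behaviour, assuming a 64-bit guest
where the address size override truncates an effective address to 32
bits. `OperandKind` and `ApplyAddressSizeOverride` are hypothetical
names for this sketch, not the real frontend code.

```cpp
#include <cstdint>

// Hypothetical operand classification for illustration.
enum class OperandKind { Register, Memory };

// For memory operands, Value is the computed effective address and the
// override zero-extends it from 32 bits. Register operands pass through
// untouched, since the address size override doesn't apply to them.
constexpr uint64_t ApplyAddressSizeOverride(OperandKind Kind, uint64_t Value, bool HasOverride) {
  if (Kind == OperandKind::Memory && HasOverride) {
    return Value & 0xFFFF'FFFFULL;
  }
  return Value;
}
```

The bug described above corresponds to taking the masking branch for
`OperandKind::Register` as well.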