These instructions essentially have the same behavior. This also lets
us remove the only remaining use of FLAGS_SF_HIGH_XMM_REG, whose meaning
is ambiguous now that we support AVX.
While we're at it, we can expand the tests to cover the store-to-memory
variant.
Also removes an erroneous comment about ZEXTing that was copy-pasted
from the MOVQ implementation function. MOVHPS/MOVHPD don't do any
ZEXTing; they either store to memory or insert into a register.
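For reference, a minimal sketch of the two legacy SSE forms described above. The Xmm struct and the MovhLoad/MovhStore helpers are hypothetical names used only for illustration; they are not FEX's implementation.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical model of a 128-bit XMM register as two 64-bit lanes.
// lane[0] holds bits 63:0, lane[1] holds bits 127:64.
struct Xmm {
  std::array<uint64_t, 2> lane{};
};

// Load form (MOVHPS/MOVHPD xmm, m64): insert 64 bits into the upper lane.
// The lower lane is preserved; nothing is zero-extended.
inline void MovhLoad(Xmm& dst, const void* src) {
  std::memcpy(&dst.lane[1], src, sizeof(uint64_t));
}

// Store form (MOVHPS/MOVHPD m64, xmm): write the upper lane to memory.
inline void MovhStore(void* dst, const Xmm& src) {
  std::memcpy(dst, &src.lane[1], sizeof(uint64_t));
}
```

Neither helper touches the lower lane, which is why a ZEXT comment does not belong here.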
These unit tests no longer need to be listed in the disabled tests
file. Instead, each one is skipped if the host doesn't support the
required feature.
This test doesn't increase coverage significantly, since OP_THUNK is called
with an invalid library name. An ideal test would verify that thunk symbols
are loaded properly, whereas this one only ensures that the opcode 0xF 0x3F
is recognized by FEX at all. That's better than nothing, but a regression
here would likely show up in other tests anyway.
This just takes the regular non-atomic unit tests and changes them to
have lock prefixes.
These are all handled as byte sized atomics so there aren't any
alignment problems.
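As a rough illustration of why byte-sized atomics sidestep alignment, here is a sketch using std::atomic. LockXaddByte is a hypothetical helper loosely analogous to a byte-sized `lock xadd`; it is not how the tests or FEX are written.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical analogue of a byte-sized `lock xadd`: atomically adds `imm`
// to the byte and returns the previous value. A single byte can never
// straddle an alignment or cacheline boundary.
inline uint8_t LockXaddByte(std::atomic<uint8_t>& mem, uint8_t imm) {
  return mem.fetch_add(imm, std::memory_order_seq_cst);
}
```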
cmpxchg/cmpxchg8b don't have alignment requirements on x86, which means
applications rely on them working with unaligned addresses.
Steam relies on this to work for a linked list array of jobs in some
internal job queueing system. It will end up always aligning by offsets
of 4 since it stores a couple of pointers.
Doesn't currently support the case of an unaligned cmpxchg8b crossing a
cacheline, which ends up being semi-broken depending on which x86
behaviour the application expects.
Intel CPUs do the "Big ring lock" or "split locks", which means accesses
across cachelines are atomic.
AMD CPUs will tear the value across the cacheline, which is expected x86
behaviour by spec.
If an application expects Intel behaviour, then it is just broken on
non-Intel platforms unless it is fine with a tear.
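The unsupported case can be identified with a check along these lines. This is only a sketch: the 64-byte cacheline size and both helper names are assumptions for illustration, not values or functions taken from FEX.

```cpp
#include <cstdint>

// Assumed cacheline size. 64 bytes is typical but not universal.
constexpr uint64_t kCachelineSize = 64;

// True if the access is not naturally aligned to its size.
// `size` must be a non-zero power of two.
constexpr bool IsUnaligned(uint64_t addr, uint64_t size) {
  return (addr & (size - 1)) != 0;
}

// True if [addr, addr + size) touches more than one cacheline.
// `size` must be non-zero.
constexpr bool CrossesCacheline(uint64_t addr, uint64_t size) {
  return (addr / kCachelineSize) != ((addr + size - 1) / kCachelineSize);
}
```

An 8-byte access at offset 4 is unaligned but stays within one line, while the same access at offset 60 spans two lines and is the case that can tear.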
Most of these were relying on the upper 16 bits of the 80-bit MM registers
being zero.
This isn't necessarily true as one will find out when running this under
the host runner.
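A small model of why the assumption fails on hardware: an MMX write sets the sign and exponent bits (79:64) of the aliased x87 register to all ones, not zero. The X87Reg struct and helper names below are hypothetical and only sketch the idea of comparing the low 64 bits alone.

```cpp
#include <cstdint>

// Hypothetical model of one 80-bit x87 register that an MM register aliases.
struct X87Reg {
  uint64_t low64 = 0;   // bits 63:0, the part visible as the MM register
  uint16_t high16 = 0;  // bits 79:64, sign and exponent
};

// Write through the MMX alias. Real x86 hardware sets the upper 16 bits
// to all ones on such a write rather than clearing them.
inline void WriteMM(X87Reg& reg, uint64_t value) {
  reg.low64 = value;
  reg.high16 = 0xFFFF;
}

// Compare only the 64 bits an MMX test actually cares about.
inline bool MMEquals(const X87Reg& reg, uint64_t expected) {
  return reg.low64 == expected;
}
```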