stop prefixing the arguments when we generate allocate ops (in particular).
this is more convenient and simpler. in exchange we need to prefix Op to avoid
a collision on fcmpscalarinsert, which has an argument named Op, but that's a
local change at least.
came up when experimenting with new IR, but I think this is probably a win by
itself.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
now that we do everything via NZCV, this is mostly vestigial. DF/x87 flags are
sufficiently rare to be "don't care"s here, and we don't even have multiblock
enabled yet!
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
This is no longer necessary and no longer provides us any useful
information. Since we expose the AVX CPUID flag, nearly everything
uses VEX encoding now, so it is almost always set.
Some locations could end up with SRA registers that only spilled one
register.
Allow passing in temporaries from the call site.
Fixes rpid and syscalls asserting.
When AFP is supported we can actually support DAZ. This might also
fix the audio corruption in Animal Well, but I can't test it until Steam
is running on Oryon. Requires a bit of plumbing for MXCSR, which we were
hacking around before, but now we actually want to store the value.
Fixes #3856
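A minimal sketch of the DAZ half of that mapping, for illustration only. The
bit positions are architectural (MXCSR.DAZ is bit 6, and FPCR.FIZ from FEAT_AFP
is bit 0), but the function name and the idea of folding the bit straight into
an FPCR value are assumptions, not the code from this change. FTZ handling is
deliberately left out.

```cpp
#include <cstdint>

// Architectural bit positions.
constexpr uint32_t MXCSR_DAZ = 1u << 6; // x86: denormals-are-zero
constexpr uint32_t FPCR_FIZ = 1u << 0;  // Arm FEAT_AFP: flush inputs to zero

// Hypothetical helper: mirror MXCSR.DAZ into FPCR.FIZ and leave every other
// FPCR bit as it was. Only meaningful when the host supports AFP.
constexpr uint32_t FPCRFromMXCSR(uint32_t MXCSR, uint32_t FPCR) {
  FPCR &= ~FPCR_FIZ;
  if (MXCSR & MXCSR_DAZ) {
    FPCR |= FPCR_FIZ;
  }
  return FPCR;
}
```

This only works if the guest MXCSR value is actually stored somewhere, which is
why the plumbing is needed.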
A disowned buffer could be unclaimed or claimed by a different thread in
the time between the !IsFree check and locking the allocation mutex.
Fix this and prevent such errors in the future by always checking
ownership with the allocator locked before attempting to unclaim
buffers.
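The pattern can be sketched roughly as below. All of the names here (Buffer,
Allocator, UnclaimIfOwnedBy, integer thread IDs) are hypothetical and chosen
for illustration; the point is only that the free/owner check and the unclaim
happen under the same lock, so no other thread can change ownership in between.

```cpp
#include <mutex>

// Hypothetical buffer record; the real structure differs.
struct Buffer {
  bool IsFree = true; // true when no thread has claimed the buffer
  int Owner = -1;     // illustrative thread ID of the claimer, -1 if none
};

class Allocator {
public:
  // Claim a buffer on behalf of a thread.
  void Claim(Buffer& Buf, int Thread) {
    std::lock_guard<std::mutex> Lock {Mutex};
    Buf.IsFree = false;
    Buf.Owner = Thread;
  }

  // Unclaim only if the buffer is still claimed by Thread. The ownership
  // check is done with the allocator locked, never before taking the lock.
  bool UnclaimIfOwnedBy(Buffer& Buf, int Thread) {
    std::lock_guard<std::mutex> Lock {Mutex};
    if (Buf.IsFree || Buf.Owner != Thread) {
      return false; // already unclaimed, or claimed by someone else
    }
    Buf.IsFree = true;
    Buf.Owner = -1;
    return true;
  }

private:
  std::mutex Mutex;
};
```

Checking !IsFree before taking the lock would reintroduce the window described
above, so the sketch exposes no unlocked check at all.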
We support leaf functions now, so add the few that were calling for it.
We will be gaining support for the xsave ones relatively soon, so it's
good to have them supported.
Also deletes a couple of cdt/cqm things that aren't exposed and we won't
be supporting.
With a brute-force search of methods between 1 and 3 instructions long, we
cover a lot more cases optimally.
There are definitely still more cases (and probably some that can be reduced
from 3 instructions to 2), but covering 44 cases is a pretty good margin
already.
VPSHUFD and VPERMILPS are aliases of each other.
Reuses the implementation path from the PSHUFD implementation which has
a few swizzles and then a table lookup.
VPERMILPD is a very simple swizzle per 128-bit lane.
Fixes #3797
Fixes #3784
nothing is optimizing around this, it's just adding pointless complexity. if we
want to actually optimize F80Cmp, the right way would be to lift the
implementation into the OpcodeDispatcher or JIT. it wouldn't be terribly
difficult. This kludge doesn't get us any closer to that.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>