We can avoid needing to use movprfx here by moving
directly into the destination when possible and just
doing the UMAX directly.
Also expands the unsigned max tests to test values with
the sign bit set to ensure all behavior is caught.
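As a rough illustration of why values with the sign bit set matter, here is a minimal Python sketch of unsigned versus signed max on 8-bit lanes. The helper names `umax8` and `smax8` are hypothetical and are not FEX code.

```python
def to_signed8(value):
    """Interpret the low 8 bits of value as a two's-complement integer."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def umax8(a, b):
    """Unsigned max of two 8-bit lanes."""
    return max(a & 0xFF, b & 0xFF)


def smax8(a, b):
    """Signed max of two 8-bit lanes, returned as the raw 8-bit pattern."""
    return a & 0xFF if to_signed8(a) >= to_signed8(b) else b & 0xFF
```

With only positive inputs both helpers agree, so a test that never sets the sign bit could not tell a UMAX from an SMAX.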
Since SMAX performs a comparison and returns the max value regardless
of how the operands are provided, we can check for when the second
input aliases the destination.
Similar to previous tests, vpgatherqq and vgatherqpd are equivalent
instructions. So the tests are the same with the mnemonic changed.
This adds tests for an additional two sets of instructions, getting us
full coverage of all eight instructions if we include the tests from
PR #3167 and #3166.
Tests the same things as described in #3165
In addition, since these tests use 64-bit indices for address
calculation, we can easily generate an index vector that tests
overflow. So every test at every displacement ALSO gains an additional
overflow test to ensure correct behaviour around pointer overflow
calculation.
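A minimal sketch of the wrapping address calculation these overflow tests exercise, assuming a 64-bit address size. The function name `gather_address` and the example values are illustrative only.

```python
MASK64 = (1 << 64) - 1


def gather_address(base, index, scale, disp=0):
    """Effective address of one gather lane, wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    # Python integers are unbounded, so the final mask models the
    # wrap-around a 64-bit address calculation would have.
    return (base + index * scale + disp) & MASK64
```

An index of `0xFFFFFFFFFFFFFFFF` behaves like -1 here, so the computed address lands just below the base instead of far above it.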
Similar to previous tests, vgatherqd and vgatherqps are equivalent
instructions. So the tests are the same with the mnemonic changed.
This adds tests for an additional two sets of instructions, getting us
up to six of the eight if we include the tests from #3166.
Tests the same things as described in #3165
In addition, since these tests use 64-bit indices for address
calculation, we can easily generate an index vector that tests
overflow. So every test at every displacement ALSO gains an additional
overflow test to ensure correct behaviour around pointer overflow
calculation.
Just like the previous tests, vpgatherdq and vgatherdpd are equivalent
instructions. So the tests are the same except for the instruction
mnemonic again.
This adds unit tests for two more of the eight gather instructions,
getting us up to testing four in total.
Specifically this adds tests for the instructions that use 32-bit
indices to load 64-bit elements.
Same thing as PR #3165 for what it tests versus doesn't.
vpgatherdd and vgatherdps are effectively the same instructions, so the
tests are the same except for the instruction mnemonic.
This adds unit tests for two of the eight gather instructions.
Specifically this adds tests for the instructions that use 32-bit
indices to load 32-bit elements.
What it tests:
- Tests all displacement scales
- Tests multiple mask arrangements
- Ensures the mask register is zeroed after the instruction
What it doesn't test:
- Doesn't test address size calculation overflow
  - Would only happen on 32-bit with 32-bit indices, or /really/ high
    base addresses
  - The instruction should behave as a mask to the address size
  - Effectively behaves like `(uint64_t)(base + (index << ilog2(scale)))`
  - Better idea is to just not expose AVX to 32-bit applications
- Doesn't test VSIB immediate displacement
  - This just ends up being base_addr + imm so it isn't too interesting
  - We can add more tests in the future if we think we messed that up
- Doesn't test partial fault behaviour
  - Because that's a nightmare.
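For reference, the behaviour covered by the tested items can be modelled roughly as below. This is a simplified Python sketch, not the FEX implementation: `gather32` is a hypothetical helper, memory is a plain dict keyed by address, and faults are not modelled at all.

```python
MASK64 = (1 << 64) - 1


def gather32(memory, base, indices, scale, mask, dest):
    """Model a gather of 32-bit elements.

    A lane is loaded only when the top bit of its mask element is set;
    other lanes keep their previous destination value. The mask comes
    back fully zeroed, matching what the tests check for.
    """
    result = list(dest)
    for lane, (index, lane_mask) in enumerate(zip(indices, mask)):
        if lane_mask & 0x80000000:
            address = (base + index * scale) & MASK64
            result[lane] = memory[address]
    return result, [0] * len(mask)
```

Running it with a partially set mask shows both properties at once: untouched lanes survive and the returned mask is all zeroes.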
Specifically keeps each instruction test small and isolated so if a
single register fails it is very easy to nail down which operation did
it.
I know some of our ASM tests do a chunk of work and spit out a result at
the end which can be difficult to debug in some cases. Didn't want to do
that, which is why the tests are spread out across 16 files for this
single class of instructions.
It is scarcely used today, and like the x86 jit, it is a significant
maintenance burden complicating work on FEXCore and arm64 optimization. Remove
it, bringing us down to 2 backends.
1 down, 1 to go.
Some interpreter scaffolding remains for x87 fallbacks. That is not a problem
here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
An Arm64 store with writeback where the source register is the same
register as the address is undefined behaviour.
Depending on hardware details this can do a whole bunch of things.
This situation happens when the x86 code does `push rsp`, which is quite
common for applications to do. We would then convert this to a
`str x8, [x8, #-8]!`, which results in undefined behaviour.
Now that redundant loads are optimized this showed up as an issue. Adds
a unit test to ensure we don't hit this again.
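The x86 semantics that the emitted code has to preserve can be sketched as follows. This is a hedged Python model with a hypothetical `push_rsp` helper and a dict standing in for memory; it shows the required result, not the Arm64 sequence FEX actually emits.

```python
MASK64 = (1 << 64) - 1


def push_rsp(memory, rsp):
    """Model `push rsp` in 64-bit mode.

    The value written is RSP as it was before the decrement, so the
    address update and the stored value must be computed separately.
    """
    old_rsp = rsp
    new_rsp = (rsp - 8) & MASK64
    memory[new_rsp] = old_rsp
    return new_rsp
```

Keeping the stored value and the updated address in separate steps is the property a writeback store with the same source and base register cannot guarantee.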
Logical ops leave AF undefined so we can't expect it to be zero after. Mask the
result of lahf to avoid testing UB. These unit tests would regress from the work
in this MR otherwise.
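A small sketch of the masking idea, assuming the usual `lahf` layout where AH holds SF:ZF:0:AF:0:PF:1:CF from bit 7 down to bit 0. The helper name `mask_undefined_af` is made up for illustration.

```python
AF_BIT = 1 << 4  # AF sits in bit 4 of AH after lahf


def mask_undefined_af(ah):
    """Clear AF in an 8-bit lahf result so undefined state is not compared."""
    return ah & ~AF_BIT & 0xFF
```

Two runs that differ only in AF then compare equal, which is what a test needs when AF is undefined.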
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
For a bunch of cases that act as broadcasts (where all
indices in the imm8 specify the same element), we
can use VDupElement here rather than iterating through.
While it would be bizarre if this actually occurred frequently
in practice, we can still tune it so there's no subpar assembly
output in the cases it actually does happen.
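The broadcast check could look roughly like this, assuming an imm8 made of four 2-bit element selectors (the layout used by shuffles such as pshufd). `broadcast_element` is a hypothetical helper, not the name used in FEX.

```python
def broadcast_element(imm8):
    """Return the selected element if all four selectors match, else None."""
    selectors = [(imm8 >> (2 * lane)) & 0b11 for lane in range(4)]
    if len(set(selectors)) == 1:
        return selectors[0]
    return None
```

Only four immediates satisfy this under that assumption (`0x00`, `0x55`, `0xAA`, `0xFF`); everything else would fall back to the general path.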
When a 32-bit adcx instruction was encountered, it was getting treated
as a 16-bit adcx instruction instead. This is because of the 0x66 prefix
required to handle this instruction.
Adds a unit test to ensure it doesn't break again.
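To illustrate the distinction, here is a simplified sketch of operand-size selection in 64-bit mode. The function and its `prefix_is_mandatory` parameter are assumptions made for this example and do not mirror FEX's decoder.

```python
def operand_size_bits(has_66_prefix, rex_w, prefix_is_mandatory):
    """Pick an operand size for a 64-bit-mode instruction.

    When 0x66 is part of the opcode encoding itself (as it is for adcx),
    it must not also be read as the 16-bit operand-size override.
    """
    if rex_w:
        return 64
    if has_66_prefix and not prefix_is_mandatory:
        return 16
    return 32
```

The bug described above corresponds to treating the adcx prefix as if `prefix_is_mandatory` were false.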
Tests for the regression from 7e6bb04db ("OpcodeDispatcher: Extract
CalculatePF"). This fails on main but passes with this PR.
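For context, x86 defines PF from the low byte of a result only: it is set when that byte has an even number of 1 bits. A minimal reference model, with a hypothetical `parity_flag` helper:

```python
def parity_flag(result):
    """Return 1 if the low 8 bits of result contain an even number of 1 bits."""
    low_byte = result & 0xFF
    return 1 if bin(low_byte).count("1") % 2 == 0 else 0
```

Bits above the low byte never influence the flag, which is an easy thing to get wrong when the calculation is refactored.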
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
When this instruction returns the index into the ecx register, this is
defined as a 32-bit result. This means it actually gets zero-extended to
the full 64-bit GPR size on 64-bit processes.
Previously FEX was doing a 32-bit insert which leaves garbage data in
the upper 32 bits of the RCX register.
Adds a unit test to ensure the result is zero extended.
Fixes running Java games under FEX now that SSE4.2 is exposed.
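The difference between the two behaviours can be sketched like this. Both helper names are invented for the example.

```python
UPPER32 = 0xFFFFFFFF00000000
LOWER32 = 0x00000000FFFFFFFF


def write32_zero_extend(old_rcx, value):
    """A 32-bit register write on x86-64: the upper half is cleared."""
    return value & LOWER32


def insert32_preserve_upper(old_rcx, value):
    """A 32-bit insert: the stale upper half of the old value survives."""
    return (old_rcx & UPPER32) | (value & LOWER32)
```

With stale data in the upper half of RCX, only the zero-extending form yields the index the application expects when it reads the full 64-bit register.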
Allows us to have a place to put helper includes and files that contain
macro utilities. This will be nice for making macro files that cut down
on verbosity across tests (e.g. making tests for XSAVE would be way less
copy-pastey).
So, uh, this was a little silly to track down. So, having the upper limit
as unsigned was a mistake, since this would cause negative valid lengths to
convert into an unsigned value within the first two flag comparison cases.
A -1 valid length can occur if one of the strings starts with a null character
in a vector's first element. (The length will be zero and we then subtract one
from it to make the length zero-based).
Fixes this edge-case up and expands a test to check for this in the future.
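A minimal sketch of how the comparison goes wrong, modelling the unsigned conversion with a 32-bit mask. The upper limit of 15 and the helper names are assumptions for illustration, not values taken from the FEX source.

```python
def as_u32(value):
    """Reinterpret an integer as an unsigned 32-bit value."""
    return value & 0xFFFFFFFF


def below_limit_signed(valid_length, upper_limit):
    return valid_length < upper_limit


def below_limit_unsigned(valid_length, upper_limit):
    # Mirrors what happens when the limit is unsigned: the other
    # operand is converted to unsigned before the comparison.
    return as_u32(valid_length) < as_u32(upper_limit)


# A string starting with a null character: length 0, minus one to be zero-based.
valid_length = 0 - 1
```

With `valid_length` at -1 the signed comparison against the limit is true, while the unsigned one sees a huge value and is false.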
This returns the `XFEATURE_ENABLED_MASK` register which reports what
features are enabled on the CPU.
This behaves similarly to CPUID where it uses an index register in ecx.
This is a prerequisite to enabling XSAVE/XRSTOR and AVX since
applications will expect this to exist.
xsetbv is a privileged instruction and doesn't need to be implemented.
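The register interface can be modelled roughly as below. This sketch only handles index 0 (XCR0) and uses a Python exception as a stand-in for the fault on an unsupported index; the function name and error handling are illustrative assumptions.

```python
XCR0_X87 = 1 << 0
XCR0_SSE = 1 << 1
XCR0_AVX = 1 << 2


def xgetbv(ecx, xcr0):
    """Return (eax, edx) for xgetbv, modelling only XCR0 (ecx == 0)."""
    if ecx != 0:
        raise ValueError("only XCR0 is modelled in this sketch")
    eax = xcr0 & 0xFFFFFFFF
    edx = (xcr0 >> 32) & 0xFFFFFFFF
    return eax, edx
```

An application checking for AVX would typically test that both the SSE and AVX bits are set in the returned EAX.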
We can reuse the same helper we have for handling VMASKMOVPD and VMASKMOVPS,
though we need to move some handling around to account for the fact that
VPMASKMOVD and VPMASKMOVQ 'hijack' the REX.W bit to signify the element
size of the operation.
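As a rough model of the load form, the W bit picks the element width and the top bit of each mask element decides whether that element is loaded or zeroed. The helper names here are hypothetical and the store form is not covered.

```python
def element_bits_from_w(w_bit):
    """The W bit selects 64-bit elements when set, 32-bit otherwise."""
    return 64 if w_bit else 32


def masked_load(memory_elements, mask_elements, element_bits):
    """Load elements whose mask has its top bit set; zero the rest."""
    sign_bit = 1 << (element_bits - 1)
    return [
        value if mask & sign_bit else 0
        for value, mask in zip(memory_elements, mask_elements)
    ]
```

The same routine serves both widths, which is the reason the existing VMASKMOVPS/VMASKMOVPD helper can be reused once the element size is derived from W.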
And with that, we support all of the AVX1-only instructions.
The remaining instructions for full AVX1 support are now just the SSE4.2
string instructions.
These instructions essentially have the same behavior. This also allows
us to remove the only used instance of FLAGS_SF_HIGH_XMM_REG, which,
given that we now support AVX, has ambiguous use.
While we're at it, we can expand the tests to make use of the store to
memory variant.
Also removes an erroneous copy-pasted comment about ZEXTing. This is
from the MOVQ implementation function. MOVHPS/MOVHPD don't do any
ZEXTing, they either store to memory or insert into a register.
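The two forms can be sketched on a 128-bit value held in a Python integer. The helper names are invented for this example.

```python
LOW64 = (1 << 64) - 1


def movhps_load(dest128, src64):
    """Register form: replace the upper 64 bits, keep the lower 64 intact."""
    return (dest128 & LOW64) | ((src64 & LOW64) << 64)


def movhps_store(src128):
    """Memory form: write out only the upper 64 bits of the source."""
    return (src128 >> 64) & LOW64
```

Neither direction clears any part of the destination register, which is why a comment about zero extension does not apply.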
Makes the behavior consistent with the x86 JIT.
We need to treat shift values larger than 31 as if they were shifts by 31 in
order to handle sign-extending behavior properly.
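A minimal sketch of the clamping, assuming 32-bit elements (the width implied by the limit of 31). `sra32` is a hypothetical helper.

```python
def sra32(value, count):
    """Arithmetic right shift of a 32-bit lane with the count clamped to 31."""
    count = min(count, 31)
    # Reinterpret the raw bits as signed so Python's >> sign-extends.
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> count) & 0xFFFFFFFF
```

With the clamp, an oversized count fills the lane with copies of the sign bit rather than producing zero for negative inputs.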
All of these operations were only testing positive integers which is why
they didn't show 16-bit failures.
Adds a bunch of negative tests to each one now that #2314 is merged,
which would have caught them.