Commit Graph

358 Commits

Author SHA1 Message Date
Mai
af6a0be832
Merge pull request #3842 from Sonicadvance1/fix_f64_to_i32
VCVT{T,}PD2DQ fixes and optimization
2024-07-09 03:49:31 -04:00
Ryan Houdek
c9c163cd7b
unittests: Update vcv{t,tt}pd2dq tests to ensure upper bits of destination are cleared 2024-07-08 03:30:10 -07:00
Ryan Houdek
fa587398bd
unittests: Extends vinsert{i,f}128 tests for garbage data
Just to ensure we don't hit an issue with masking the immediate bits.

Fixes #3753
2024-07-07 02:16:21 -07:00
Ryan Houdek
8b9b1a90e4
unittests: Fixes typo in vpcmpgtw test 2024-07-01 14:42:23 -07:00
Ryan Houdek
babde31bf0
AVX128: Fixes vmovlhps
We didn't have a unit test for this and we weren't implementing it at
all.
We treated it as vmovhps/vmovhpd accidentally. Once again caught by the
libaom Intrinsics unit tests.
2024-07-01 13:54:11 -07:00
Ryan Houdek
aba7a3a830
AVX128: Fixes vblendps lower and upper selector 2024-06-27 17:20:39 -07:00
Ryan Houdek
9027d1eee7
AVX128: Fixes bug in vector immediate shift 2024-06-27 16:22:14 -07:00
Ryan Houdek
c6c147daf6
unittests: Updates vcvtps2ph test for failure case of writing too much memory. 2024-06-26 16:49:00 -07:00
Ryan Houdek
52e541d453
Unittests: Stop using AVX2 flag 2024-06-26 14:56:01 -07:00
Ryan Houdek
ba28e6f82e
unittests: Adds vcvtps2ph tests that use mxcsr 2024-06-26 14:08:20 -07:00
Ryan Houdek
94fd100fc7
Merge pull request #3719 from lioncash/f16c
OpcodeDispatcher: Handle F16C operations
2024-06-26 12:12:13 -07:00
Lioncache
cd5a809ec9 OpcodeDispatcher: Handle VCVTPS2PH 2024-06-26 15:05:03 -04:00
Lioncache
045a8efbeb OpcodeDispatcher: Handle VCVTPH2PS
Fairly straightforward, since we already have handling for half-float conversions.
2024-06-26 15:05:00 -04:00
Ryan Houdek
54a1f7d833
Merge pull request #3764 from Sonicadvance1/rorx_masking
BMI2: Ensure rorx immediate masks by operation size correctly.
2024-06-26 11:52:47 -07:00
Ryan Houdek
a515061465
BMI2: Ensure rorx immediate masks by operation size correctly. 2024-06-26 11:11:37 -07:00
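The masking rule named in this commit title can be sketched as follows. This is a minimal Python model of the architectural behaviour, not the project's code: the function name `rorx` and its parameters are assumptions made for illustration. The rotate count is taken from the immediate modulo the operation size, so a 32-bit operation masks with 31 and a 64-bit operation masks with 63.

```python
def rorx(value, imm8, op_size):
    """Rotate `value` right by imm8, masking the count by the operation size."""
    assert op_size in (32, 64)
    mask = (1 << op_size) - 1
    amount = imm8 & (op_size - 1)  # 0x1F for 32-bit, 0x3F for 64-bit
    value &= mask
    # For amount == 0 the left-shifted copy falls entirely outside the mask.
    return ((value >> amount) | (value << (op_size - amount))) & mask
```

With an immediate of 33, the 32-bit form rotates by 1 while the 64-bit form rotates by 33, which is the distinction a test for this fix would need to cover.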
Ryan Houdek
122ae5b710
unittests: Adds FMA3 unittests 2024-06-25 11:37:18 -07:00
Ryan Houdek
41923bac99 OpcodeDispatcher: Fixes PCLMUL with weird selectors and zero-extend
We had a bug where we weren't correctly ignoring the non-used bits in
the selector. This was causing an assert in the ARM backend.
2024-06-25 12:54:03 -04:00
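To illustrate what "ignoring the non-used bits in the selector" means, here is a small Python model of a carry-less multiply with an imm8 selector. It is a sketch under the assumption that only bit 0 and bit 4 of the immediate are meaningful, as in PCLMULQDQ; the names `clmul64` and `pclmulqdq` are illustrative and do not come from the repository.

```python
MASK64 = (1 << 64) - 1

def clmul64(a, b):
    """Carry-less (XOR) multiply of two 64-bit values, giving up to 127 bits."""
    result = 0
    for bit in range(64):
        if (b >> bit) & 1:
            result ^= a << bit
    return result

def pclmulqdq(src1, src2, imm8):
    """Model on 128-bit integers: bit 0 picks the qword of src1, bit 4 of src2.

    All other selector bits are ignored.
    """
    a = (src1 >> (64 if imm8 & 0x01 else 0)) & MASK64
    b = (src2 >> (64 if imm8 & 0x10 else 0)) & MASK64
    return clmul64(a, b)
```

In this model a selector such as 0xEE behaves exactly like 0x00, and 0xFF behaves like 0x11.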
Alyssa Rosenzweig
77aaa9af4d
Merge pull request #3748 from Sonicadvance1/avx_15
AVX128: More instructions Part 4
2024-06-25 12:39:48 -04:00
Ryan Houdek
48e7aae38f unittests: Adds support for 256-bit vpclmulqdq
It's easy because the test was already written with this in mind.
2024-06-25 10:03:33 -04:00
Ryan Houdek
3a310b8815
Merge pull request #3756 from Sonicadvance1/fix_vmovhlps
Fix VMOVLHPS instruction
2024-06-24 19:14:56 -07:00
Ryan Houdek
bd24ebc96a
unittests: Adds VMOVHLPS unit test
A bit confusing because VMOVHLPS and VMOVLPS share the same instruction
encoding, so this unittest was missed.

Implement the test to ensure it stays working.
2024-06-24 17:22:55 -07:00
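The shared encoding mentioned in this commit can be sketched in a few lines. Both mnemonics use opcode 0F 12, and the ModRM `mod` field tells them apart: a register operand selects VMOVHLPS, a memory operand selects VMOVLPS. The helper name `decode_0f12` is hypothetical and this is not the project's decoder.

```python
def decode_0f12(modrm):
    """Pick the mnemonic for VEX-encoded opcode 0F 12 from the ModRM byte."""
    mod = (modrm >> 6) & 0b11
    # mod == 0b11: the r/m operand is a register, which selects VMOVHLPS.
    # Any other mod value: a memory operand, which selects VMOVLPS.
    return "VMOVHLPS" if mod == 0b11 else "VMOVLPS"
```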
Ryan Houdek
99b2018d0e
unittests: Extend vmovntpd test 2024-06-24 16:32:13 -07:00
Ryan Houdek
f0d9c8c10a
AVX128: Fix vmovntdqa failing to zero upper 128-bits 2024-06-24 16:32:09 -07:00
Ryan Houdek
6941a59223
unittests: Split up vtestps unittest to accumulate flags in independent registers.
Makes it easier to see what is failing on the 128-bit side versus
256-bit side.
2024-06-21 00:45:30 -07:00
Ryan Houdek
8fb801069f
unittests: Adds new VAES tests 2024-06-19 05:51:47 -07:00
Mikhail Nitenko
99a43283be unittests/bextr: add SrcSize tests
dougallj mentioned that adding these tests might expose
a bug in bextr. Since the bextr implementation has since been changed,
it apparently works correctly now, which is good.
2024-06-10 05:45:12 +00:00
Ryan Houdek
0f26bc20a3 unittests/ASM: Adds palignr tests for zero immediate
These effectively turn into moves.
2023-10-27 15:07:52 -07:00
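Why a zero immediate turns into a move can be seen from a small model of 128-bit PALIGNR. This is an illustrative Python sketch, with byte strings standing in for registers in little-endian order; the function name and argument layout are assumptions.

```python
def palignr(dst, src, imm8):
    """Concatenate dst:src, shift right by imm8 bytes, keep the low 16 bytes."""
    assert len(dst) == 16 and len(src) == 16
    combined = src + dst                    # src is the low half, dst the high half
    shifted = combined[imm8:] + bytes(32)   # bytes shifted in from the top are zero
    return shifted[:16]
```

With `imm8 == 0` the result is just `src`, and with `imm8 == 16` it is just `dst`, so neither case needs a real shift.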
Lioncache
24f2796141 VectorOps: Handle SVE VFCADD a little better
If no registers alias, then we can move the first source directly into the
destination and then perform the FCADD operation as opposed to using a
temporary.
2023-10-19 14:48:46 +02:00
Lioncache
1f6c6345d9 VectorOps: Handle SVE VURAvg a little better
We can perform fewer moves by checking for scenarios where aliasing
occurs. Since addition is commutative (usually, general-case anyway),
the order of inputs doesn't strictly matter here.
2023-10-19 12:14:12 +02:00
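The commutativity argument can be checked with a lane-wise model of an unsigned rounding average. This sketch assumes the operation computes `(a + b + 1) >> 1` per lane without intermediate overflow; the name `urhadd` and the list-of-lanes representation are illustrative only.

```python
def urhadd(a, b, bits=8):
    """Lane-wise unsigned rounding average of two equal-length lane lists."""
    mask = (1 << bits) - 1
    # Python integers do not overflow, so the sum keeps its carry bit.
    return [((x & mask) + (y & mask) + 1) >> 1 for x, y in zip(a, b)]
```

Because swapping the inputs gives the same lanes, a destination that aliases the second source can be treated as if it aliased the first.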
Lioncache
3d23cd5765 VectorOps: Handle SVE VFDiv a little better
In the event no source vectors alias the destination,
we can just move the first source vector into it and
then perform the divide without needing to move afterward.
2023-10-19 11:45:35 +02:00
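The aliasing cases for a destructive, non-commutative operation such as a divide can be laid out as a small planner. This is a hypothetical Python sketch that emits simplified pseudo-assembly strings (predicates omitted, `ztmp` standing for a scratch register); it is not the backend's actual code generation.

```python
def plan_destructive_op(op, dst, src1, src2):
    """Plan dst = op(src1, src2) for an op that overwrites its first operand."""
    if dst == src1:
        # Destination already holds the first source: operate in place.
        return [f"{op} z{dst}, z{dst}, z{src2}"]
    if dst != src2:
        # No source aliases the destination: move first, no move afterward.
        return [f"mov z{dst}, z{src1}", f"{op} z{dst}, z{dst}, z{src2}"]
    # Destination aliases the second source: fall back to a temporary.
    return [
        f"mov ztmp, z{src1}",
        f"{op} ztmp, ztmp, z{src2}",
        f"mov z{dst}, ztmp",
    ]
```

Only the last case needs three instructions; the no-alias case described in the commit drops to two.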
Lioncache
39e658f02a VectorOps: Handle more VUMin SVE cases better
We can avoid needing to use movprfx here by moving
directly into the destination when possible and just
doing the UMIN directly.
2023-10-18 18:48:13 +02:00
Lioncache
e89dd27f2a VectorOps: Handle more VSMin SVE cases better
We can avoid needing to use movprfx here by moving
directly into the destination when possible and just
doing the SMIN directly.
2023-10-18 18:48:13 +02:00
Lioncache
f85fae0041 VectorOps: Handle more VUMax SVE cases better
We can avoid needing to use movprfx here by moving
directly into the destination when possible and just
doing the UMAX directly.

Also expands the unsigned max tests to test values with
the sign bit set to ensure all behavior is caught.
2023-10-18 18:48:12 +02:00
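The reason for testing values with the sign bit set is that signed and unsigned maximum only disagree on such values. A minimal Python sketch, with lanes as lists of raw 8-bit patterns and helper names chosen for illustration:

```python
def to_signed(value, bits=8):
    """Reinterpret a raw bit pattern as a two's-complement signed value."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def umax(a, b):
    """Lane-wise unsigned maximum on raw bit patterns."""
    return [max(x, y) for x, y in zip(a, b)]

def smax(a, b, bits=8):
    """Lane-wise signed maximum, returning raw bit patterns."""
    return [max(x, y, key=lambda v: to_signed(v, bits)) for x, y in zip(a, b)]
```

For the inputs 0x80 and 0x7F, `umax` picks 0x80 and `smax` picks 0x7F, so a test using only small positive values could not tell the two apart.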
Lioncache
65eec673fc VectorOps: Handle more VSMax SVE cases better
Since SMAX performs a comparison and returns the max value regardless
of how the operands are provided, we can check for when the second
input aliases the destination.
2023-10-18 18:48:03 +02:00
Mai
ab4642af38
Merge pull request #3167 from Sonicadvance1/gatherqdps
unittests/ASM: Implements tests for vpgatherqd/vgatherqps
2023-09-29 12:16:43 -04:00
Mai
d94e5ce7f4
Merge pull request #3168 from Sonicadvance1/gatherqqpd
unittests/ASM: Implements tests for vpgatherqq/vgatherqpd
2023-09-29 12:16:12 -04:00
Ryan Houdek
a21def7d74 unittests/ASM: Implements tests for vpgatherqq/vgatherqpd
Similar to previous tests, vpgatherqq and vgatherqpd are equivalent
instructions. So the tests are the same with the mnemonic changed.

This adds tests for an additional two sets of instructions, getting us
full coverage of all eight instructions if we include the tests from
PR #3167 and #3166.

Tests the same things as described in #3165

In addition, since these tests use 64-bit indices for address
calculation, we can easily generate an index vector that tests
overflow. So every test at every displacement ALSO gains an additional
overflow test to ensure correct behaviour around pointer overflow
calculation.
2023-09-29 08:04:47 -07:00
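One way to construct an overflowing index for such a test is sketched below. It assumes 64-bit address arithmetic that wraps modulo 2^64; both helper names are hypothetical and are not taken from the test suite.

```python
MASK64 = (1 << 64) - 1

def effective_address(base, index, scale):
    """Scaled-index address with 64-bit wraparound."""
    return (base + index * scale) & MASK64

def wrapping_index(base, target, scale):
    """Find a 64-bit index whose scaled sum with base wraps around to target."""
    delta = (target - base) & MASK64
    assert delta % scale == 0, "target must be reachable at this scale"
    return delta // scale
```

For example, with a base of 0x1000, a scale of 8 and a target of 0xFF8, the index comes out as 2^61 - 1, and the scaled sum exceeds 64 bits before wrapping back to the target.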
Ryan Houdek
0d8d5444a4 unittests/ASM: Implements tests for vpgatherqd/vgatherqps
Similar to previous tests, vpgatherqd and vgatherqps are equivalent
instructions. So the tests are the same with the mnemonic changed.

This adds tests for an additional two sets of instructions, getting us
up to six total of the eight if we include the tests from #3166.

Tests the same things as described in #3165

In addition, since these tests use 64-bit indices for address
calculation, we can easily generate an index vector that tests
overflow. So every test at every displacement ALSO gains an additional
overflow test to ensure correct behaviour around pointer overflow
calculation.
2023-09-29 07:20:07 -07:00
Ryan Houdek
eedfad5036 unittests/ASM: Implements tests for vpgatherdq/vgatherdpd
Just like the previous tests, vpgatherdq and vgatherdpd are equivalent
instructions. So the tests are the same except for the instruction
mnemonic again.

This adds unittests for two more of the eight gather instructions.
Getting us up to testing four in total.
Specifically this adds tests for 32-bit indices while loading 64-bit
element instructions.

Same thing as PR #3165 for what it tests versus doesn't.
2023-09-28 22:49:03 -07:00
Ryan Houdek
9a01b440e3 unittests/ASM: Implements tests for vpgatherdd/vgatherdps
vpgatherdd and vgatherdps are effectively the same instructions, so the
tests are the same except for the instruction mnemonic.

This adds unit tests for two of the eight gather instructions.
Specifically this adds tests for the 32-bit indices loading 32-bit
elements instructions.

What it tests:
- Tests all displacement scales
- Tests multiple mask arrangements
- Ensures the mask register is zeroed after the instruction

What it doesn't test:
- Doesn't test address size calculation overflow
   - Only would happen on 32-bit with 32-bit indices, or /really/ high
     base addresses
   - The instruction should mask the result to the address size
   - Effectively behaves like `(uint64_t)(base + (index << ilog2(scale)))`
   - Better idea is to just not expose AVX to 32-bit applications
- Doesn't test VSIB immediate displacement
   - This just ends up being base_addr + imm so it isn't too interesting
   - We can add more tests in the future if we think we messed that up
- Doesn't test partial fault behaviour
   - Because that's a nightmare.

Specifically keeps each instruction test small and isolated so if a
single register fails it is very easy to nail down which operation did
it.
I know some of our ASM tests do a chunk of work and spit out a result at
the end which can be difficult to debug in some cases. Didn't want to do
that which is why the tests are spread out across 16 files for this
single class of instructions.
2023-09-28 19:58:34 -07:00
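The behaviour this commit tests can be summarised in a short model of a gather with 32-bit indices and 32-bit elements. It is a simplified Python sketch, not the emulator's implementation: memory is a dictionary keyed by address, faults are not modelled, and the names `sign_extend` and `gather32` are assumptions. It covers the three tested points: scaled addressing, per-lane masking on the top mask bit, and the mask being zeroed afterwards.

```python
ADDR_MASK = (1 << 64) - 1

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def gather32(memory, base, indices, scale, mask, dest):
    """Return (new_dest, new_mask) for a gather of 32-bit elements."""
    assert scale in (1, 2, 4, 8)
    result = list(dest)
    for lane, index in enumerate(indices):
        if (mask[lane] >> 31) & 1:  # only the top bit of each mask lane matters
            addr = (base + sign_extend(index, 32) * scale) & ADDR_MASK
            result[lane] = memory[addr]
        # Lanes with a clear mask bit keep their previous destination value.
    return result, [0] * len(mask)
```

Keeping the model per lane mirrors the commit's choice of small isolated tests: a wrong lane points directly at one address calculation.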
Lioncache
b0c8ff0ea6 Arm64/ALUOps: Remove spills in PEXT
Reduces the number of emitted instructions for a
corresponding PEXT instruction.

We no longer spill for this IR op.
2023-09-15 19:39:51 -04:00
Lioncache
4a37ea4819 OpcodeDispatcher: Handle RORX corner cases better
There are a few cases where we were emitting code when we
didn't really need to, or could emit less.
2023-09-15 17:36:36 -04:00
Lioncache
bbed4d73ed OpcodeDispatcher: Improve VPERMQ/VPERMPD broadcast cases
For a bunch of cases that act as broadcasts (where all
indices in the imm8 specify the same element), we
can use VDupElement here rather than iterating through.
2023-08-19 01:08:30 -04:00
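Detecting the broadcast case comes down to checking whether all four 2-bit fields of the imm8 are equal. A minimal Python sketch, with a hypothetical helper name:

```python
def vpermq_broadcast_element(imm8):
    """Return the element index if imm8 selects one element for all four lanes.

    Returns None when the lanes select different elements.
    """
    selectors = [(imm8 >> (2 * lane)) & 0b11 for lane in range(4)]
    return selectors[0] if len(set(selectors)) == 1 else None
```

Only four of the 256 immediates qualify (0x00, 0x55, 0xAA and 0xFF), one per source element.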
Lioncache
9e54ec2724 OpcodeDispatcher: Improve {V}PSRLDQ shift by 0
While it would be bizarre if this actually occurred frequently
in practice, we can still tune it so there's no subpar assembly
output in the cases it actually does happen.
2023-08-17 19:33:09 -04:00
Lioncache
f7c663240e OpcodeDispatcher: Handle PCMPESTRM/VPCMPESTRM
...and with that all of the SSE4.2 string instructions are implemented now
2023-05-17 00:21:55 -04:00
Lioncache
82b4aef30d OpcodeDispatcher: Handle PCMPISTRM/VPCMPISTRM 2023-05-16 22:59:54 -04:00
Ryan Houdek
88247141d7
Merge pull request #2649 from lioncash/istri
OpcodeDispatcher: Handle PCMPISTRI/VPCMPISTRI
2023-05-02 14:35:47 -07:00
Lioncache
f502154f96 OpcodeDispatcher: Handle VPCMPISTRI 2023-05-02 14:00:05 -04:00
Lioncache
8369f9c25b unittests: Add missing VPMASKMOVQ store test
Realized I forgot to add this in the commit that added
VPMASKMOVD/VPMASKMOVQ support.
2023-05-02 11:13:44 -04:00
Lioncache
c94721a04b OpcodeDispatcher: Handle VPMASKMOVD/VPMASKMOVQ
We can reuse the same helper we have for handling VMASKMOVPD and VMASKMOVPS,
though we need to move some handling around to account for the fact that
VPMASKMOVD and VPMASKMOVQ 'hijack' the REX.W bit to signify the element
size of the operation.
2023-04-24 10:50:11 -04:00
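How the W bit changes a masked load can be sketched as follows. This Python model assumes the load form, where unselected elements are zeroed, and takes memory and mask as lists of raw element values; the name `vpmaskmov_load` and the `w_bit` parameter are illustrative, not the shared helper's actual interface.

```python
def vpmaskmov_load(memory_elems, mask_elems, w_bit):
    """Masked load: W=0 uses 32-bit elements, W=1 uses 64-bit elements."""
    elem_bits = 64 if w_bit else 32
    sign_bit = 1 << (elem_bits - 1)
    # An element is loaded only when the top bit of its mask element is set.
    return [m if (k & sign_bit) else 0 for m, k in zip(memory_elems, mask_elems)]
```

The same mask value can select an element at one size and not at the other: 0x80000000 has its top bit set as a 32-bit element but not as a 64-bit one.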