9825 Commits

Author SHA1 Message Date
Ryan Houdek
add0e7a8db
HostFeatures: Removes distinction between AVX and AVX2
We no longer care about AVX versions, so consolidate them into a
single config option which enables both.
2024-06-26 14:56:01 -07:00
Ryan Houdek
52e541d453
Unittests: Stop using AVX2 flag 2024-06-26 14:56:01 -07:00
Mai
a031a49546
Merge pull request #3767 from Sonicadvance1/avx128_fix_wide_shift
AVX128: Fixes wide shifts
2024-06-26 17:29:09 -04:00
Alyssa Rosenzweig
4d821b8dd8
Merge pull request #3765 from Sonicadvance1/avx128_f16c
AVX128: F16C support
2024-06-26 17:25:05 -04:00
Ryan Houdek
f277025c9a
AVX128: Fixes wide shifts
During refactoring this was missed and rerunning unittests locally
caught it. 256-bit operations get their shift only from the lower half
of the vector register.
2024-06-26 14:16:39 -07:00
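To make the fixed behaviour concrete, here is a minimal C sketch of a register-count logical right shift (VPSRLQ-style) on a 256-bit value modelled as two 128-bit halves. The `vec128`/`vec256` types and the function names are hypothetical and are not FEX's IR. The point it shows is that both halves take their count from the low 64 bits of the lower half of the count register, and that a count above 63 zeroes every lane.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 128-bit half holds two 64-bit lanes,
   a 256-bit register is a lower and an upper half. */
typedef struct { uint64_t q[2]; } vec128;
typedef struct { vec128 lo, hi; } vec256;

/* Logical right shift of each 64-bit lane; x86 zeroes the lane when the
   count exceeds 63 instead of wrapping it. */
static vec128 psrlq128(vec128 src, uint64_t count) {
    vec128 r;
    for (int i = 0; i < 2; ++i)
        r.q[i] = count > 63 ? 0 : src.q[i] >> count;
    return r;
}

/* 256-bit form emulated as two 128-bit operations: BOTH halves use the
   count from the low 64 bits of the count register's lower half.
   Everything else in the count register is ignored. */
static vec256 vpsrlq256(vec256 src, vec256 count_reg) {
    uint64_t c = count_reg.lo.q[0];
    vec256 r = { psrlq128(src.lo, c), psrlq128(src.hi, c) };
    return r;
}
```

Reading the count from `count_reg.hi` for the upper half would be exactly the kind of mistake the commit describes.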
Ryan Houdek
ba28e6f82e
unittests: Adds vcvtps2ph tests that use mxcsr 2024-06-26 14:08:20 -07:00
Ryan Houdek
3a89df9bed
AVX128: Implement support for F16C 2024-06-26 14:05:12 -07:00
Ryan Houdek
f6a0866fbb
IR: Split Vector_FToF2 into VFCVTL2 and VCVTFN2
I forgot that in the narrowing case we need to be careful about the insert. No IR
op used Vector_FToF2 with narrowing.
2024-06-26 14:03:41 -07:00
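The asymmetry behind this split follows Arm's semantics: the widening "2" form (FCVTL2) reads the upper half of its source and writes a whole register, while the narrowing "2" form (FCVTN2) writes only the upper half of the destination and must leave the lower half intact. The sketch below models that with float/double rather than half/float for brevity; the types and function names are hypothetical and this is not FEX's implementation.

```c
#include <assert.h>

/* Hypothetical 128-bit register viewed as 4 floats or 2 doubles. */
typedef struct { float s[4]; } v4f;
typedef struct { double d[2]; } v2d;

/* Widening "2" form (FCVTL2-like): converts the UPPER two floats of the
   source into a full register of doubles. No old state is preserved. */
static v2d fcvtl2_model(v4f src) {
    v2d r = { { (double)src.s[2], (double)src.s[3] } };
    return r;
}

/* Narrowing "2" form (FCVTN2-like): converts two doubles and INSERTS them
   into the upper half of the destination. The lower half keeps its old
   contents, so the previous destination value is an input. */
static v4f fcvtn2_model(v4f dest, v2d src) {
    dest.s[2] = (float)src.d[0];
    dest.s[3] = (float)src.d[1];
    return dest;
}
```

Because the narrowing form consumes the old destination, it cannot share a signature with the widening form, which is one plausible reading of why a single Vector_FToF2 op was awkward.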
Ryan Houdek
756fa2ecc5
Merge pull request #3766 from alyssarosenzweig/opt/f16c-round
Optimize vcvtps2ph
2024-06-26 14:03:24 -07:00
Alyssa Rosenzweig
cf834aa6da InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-26 16:46:21 -04:00
Alyssa Rosenzweig
d2324f4a93 OpcodeDispatcher: optimize vcvtps2ph
We can avoid a LOT of pointless work with some dedicated IR ops for specifically
overriding the round mode.

Small behaviour change here: we no longer reset FTZ. I think this is a bug fix?
But if it's not, it's not hard to fix.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-26 16:46:21 -04:00
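The round mode that vcvtps2ph needs comes from its immediate: per the x86 definition, imm8 bit 2 selects "use MXCSR.RC", otherwise imm8 bits [1:0] give the mode directly, and MXCSR.RC lives in bits 14:13. A small C sketch of that selection follows; the enum and function name are illustrative and say nothing about how FEX's dedicated IR ops encode it.

```c
#include <assert.h>
#include <stdint.h>

/* x86 rounding-control encoding (same for imm8[1:0] and MXCSR.RC). */
enum round_mode { RM_NEAREST = 0, RM_DOWN = 1, RM_UP = 2, RM_TRUNC = 3 };

/* Pick the rounding mode VCVTPS2PH uses for a given imm8 and MXCSR. */
static enum round_mode vcvtps2ph_round_mode(uint8_t imm8, uint32_t mxcsr) {
    if (imm8 & 0x4)                                   /* defer to MXCSR */
        return (enum round_mode)((mxcsr >> 13) & 0x3);
    return (enum round_mode)(imm8 & 0x3);             /* explicit mode */
}
```

Only the immediate-selected case can be known at translation time; the bit-2 case depends on the runtime MXCSR value.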
Ryan Houdek
6226c7f4f3
Merge pull request #3757 from Sonicadvance1/avx_16
AVX128: Implement support for gathers
2024-06-26 13:29:58 -07:00
Ryan Houdek
991ecd558e InstcountCI: Update for SVE256 gathers! 2024-06-26 16:00:53 -04:00
Ryan Houdek
a4fa3a460e OpcodeDispatcher: Implement AVX gathers with SVE256
Just to ensure we still have feature parity.
2024-06-26 16:00:53 -04:00
Ryan Houdek
77ba708933 AVX128: Implement support for gather load instructions
This is the last family of instructions that we needed to implement for
AVX2 to be properly advertised!
2024-06-26 16:00:53 -04:00
Ryan Houdek
662d50a966 X86Tables: Describe VPGather in the VEX tables 2024-06-26 16:00:53 -04:00
Ryan Houdek
5472d1cc04 Arm64: Implement VLoadVectorGatherMasked operation
This does a gather load three ways, SVE256, SVE128, and ASIMD.

This operation is a bit special since it can't handle every
gather load/store in the 256-bit case and requires the frontend to
decompose the operation when the striding hits a mode that
SVE doesn't support!

The 128-bit case is a lot simpler since, between the two paths, every
stride is supported, including ones SVE can't match. I find this to be a
nice compromise while there aren't any SVE256 products on the market.

In the 128-bit case there is an SVE path which is used if the passed-in
stride is one that SVE understands; otherwise it falls back to an
ASIMD implementation which manually emulates everything that is
necessary.

This operation deliberately does exactly what the AVX gather
instructions need, because it's complex enough that we don't want
to try to make it a generic solution.
2024-06-26 16:00:53 -04:00
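For reference, this is roughly what an element-by-element fallback has to reproduce: the architectural behaviour of a 128-bit VPGATHERDD. Each lane loads from `base + sign_extend(index) * scale` only if the top bit of its mask lane is set, unselected lanes keep their old value, and the mask register is cleared afterwards. The sketch below is a scalar model with a hypothetical function name; it ignores fault/restart behaviour and is not FEX's Arm64 code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a 128-bit VPGATHERDD: four signed 32-bit indices,
   four 32-bit results, masked per lane. */
static void gather_dd_128(uint32_t dest[4], uint32_t mask[4],
                          const uint8_t *base, const int32_t index[4],
                          int scale) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (int i = 0; i < 4; ++i) {
        /* Only lanes whose mask MSB is set are loaded; the others
           keep their previous destination value. */
        if (mask[i] & 0x80000000u) {
            const uint8_t *addr = base + (int64_t)index[i] * scale;
            memcpy(&dest[i], addr, sizeof dest[i]);
        }
    }
    /* On completion the instruction zeroes the whole mask register. */
    for (int i = 0; i < 4; ++i)
        mask[i] = 0;
}
```

The `scale` parameter is where a hardware gather can run into trouble: an SVE path only applies when the scale is one its addressing modes can express, which is why the commit describes a fallback for the remaining cases.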
Alyssa Rosenzweig
d1d41f5645
Merge pull request #3763 from alyssarosenzweig/rclse/less-aggressive
Remove RCLSE
2024-06-26 15:14:14 -04:00
Ryan Houdek
94fd100fc7
Merge pull request #3719 from lioncash/f16c
OpcodeDispatcher: Handle F16C operations
2024-06-26 12:12:13 -07:00
Lioncache
b9ff36b5d9 CPUID: Signify F16C support if AVX is available
On AArch64 hardware, if we have SVE2 available (which we use in the AVX implementation),
then we can also enable F16C support.
2024-06-26 15:05:03 -04:00
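On the guest side, both features are reported in CPUID leaf 1 ECX: AVX is bit 28 and F16C is bit 29. The sketch below shows the "F16C follows AVX" policy as bit manipulation; `leaf1_ecx` and its `avx_enabled` flag are hypothetical stand-ins for FEX's host-feature detection, and other bits that AVX depends on (such as OSXSAVE) are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX feature bits. */
#define CPUID1_ECX_AVX  (1u << 28)
#define CPUID1_ECX_F16C (1u << 29)

/* Hypothetical helper: advertise F16C exactly when AVX is advertised. */
static uint32_t leaf1_ecx(uint32_t ecx, bool avx_enabled) {
    if (avx_enabled)
        ecx |= CPUID1_ECX_AVX | CPUID1_ECX_F16C;
    else
        ecx &= ~(CPUID1_ECX_AVX | CPUID1_ECX_F16C);
    return ecx;
}
```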
Lioncache
cd5a809ec9 OpcodeDispatcher: Handle VCVTPS2PH 2024-06-26 15:05:03 -04:00
Lioncache
045a8efbeb OpcodeDispatcher: Handle VCVTPH2PS
Fairly straightforward, since we already have handling for half-float conversions.
2024-06-26 15:05:00 -04:00
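VCVTPH2PS widens each IEEE 754 binary16 lane to binary32; on Arm the same per-lane conversion is what FCVTL performs. As a portable reference for what each lane should produce, here is a scalar C sketch (the function name is illustrative). Every half value is exactly representable as a float, so no rounding is involved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert one binary16 bit pattern to a float. */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1F) {                   /* Inf or NaN: keep the payload */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {               /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112) << 23) | (mant << 13);
    } else if (mant == 0) {              /* signed zero */
        bits = sign;
    } else {                             /* subnormal: normalise it */
        int e = -1;
        do { mant <<= 1; ++e; } while ((mant & 0x400u) == 0);
        bits = sign | ((uint32_t)(112 - e) << 23) | ((mant & 0x3FFu) << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);         /* reinterpret the bit pattern */
    return f;
}
```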
Ryan Houdek
54a1f7d833
Merge pull request #3764 from Sonicadvance1/rorx_masking
BMI2: Ensure rorx immediate masks by operation size correctly.
2024-06-26 11:52:47 -07:00
Alyssa Rosenzweig
1b496cda8f InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-26 14:49:58 -04:00
Alyssa Rosenzweig
a5b24bfe4c IR: drop RCLSE
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-26 14:49:05 -04:00
Alyssa Rosenzweig
46676ca376 OpcodeDispatcher: add cache
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-26 14:49:05 -04:00
Alyssa Rosenzweig
7d939a3b3d
Merge pull request #3758 from Sonicadvance1/avx_17
AVX128: FMA3
2024-06-26 14:18:32 -04:00
Ryan Houdek
a515061465
BMI2: Ensure rorx immediate masks by operation size correctly. 2024-06-26 11:11:37 -07:00
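RORX masks its immediate rotate count to the operand size: 5 bits for 32-bit operands and 6 bits for 64-bit operands. A C sketch of those semantics (function names illustrative) follows. The explicit `n == 0` case avoids C's undefined shift by the full type width.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit RORX: only imm8[4:0] is used. */
static uint32_t rorx32(uint32_t src, uint8_t imm8) {
    unsigned n = imm8 & 0x1F;
    return n ? (src >> n) | (src << (32 - n)) : src;
}

/* 64-bit RORX: only imm8[5:0] is used. */
static uint64_t rorx64(uint64_t src, uint8_t imm8) {
    unsigned n = imm8 & 0x3F;
    return n ? (src >> n) | (src << (64 - n)) : src;
}
```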
Ryan Houdek
1c24d63f73
Merge pull request #3762 from alyssarosenzweig/bug/constprop-bextr 2024-06-26 09:28:18 -07:00
Alyssa Rosenzweig
7e10dba5e2 unittests: add test for a BEXTR bug
Ryan reduced this test while debugging OpenSSL. It fails without the constprop
fix.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-26 12:06:47 -04:00
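The commits don't spell out the LSHR bug itself. As a hedged illustration of the edge cases a constant folder has to get right for BEXTR, here is a reference implementation for 64-bit operands: start is ctrl[7:0], length is ctrl[15:8], a start of 64 or more yields 0, and a length of 64 or more keeps every remaining bit. Shifts by 64 or more are undefined in C, so they are handled explicitly rather than passed to `>>`. The function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Reference BEXTR (64-bit): extract len bits of src starting at start. */
static uint64_t bextr64(uint64_t src, uint64_t ctrl) {
    unsigned start = ctrl & 0xFF;
    unsigned len = (ctrl >> 8) & 0xFF;
    if (start >= 64 || len == 0)
        return 0;
    uint64_t shifted = src >> start;     /* start < 64, so this is defined */
    if (len >= 64)
        return shifted;                  /* mask would cover all bits */
    return shifted & ((UINT64_C(1) << len) - 1);
}
```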
Alyssa Rosenzweig
e2d73014f1 ConstProp: fix LSHR constant prop
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-26 12:06:47 -04:00
Ryan Houdek
53aa30596e
InstcountCI: Update 2024-06-25 11:37:18 -07:00
Ryan Houdek
122ae5b710
unittests: Adds FMA3 unittests 2024-06-25 11:37:18 -07:00
Ryan Houdek
45c27b2965
CPUID: Enable support for FMA3 when AVX is enabled 2024-06-25 11:24:53 -07:00
Ryan Houdek
832b247fc1
SVE256: Implement support for FMA3 2024-06-25 11:24:46 -07:00
Ryan Houdek
0e8b53d566
AVX128: Implement FMA3 instructions 2024-06-25 11:23:50 -07:00
Ryan Houdek
d03d69273b
X86Tables: Describe FMA3 instructions 2024-06-25 11:22:27 -07:00
Ryan Houdek
efa05ba19d
IR: Adds support for new SUBADD FMA constants
ADDSUB didn't cover this new variant.
2024-06-25 11:22:22 -07:00
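The two alternating FMA3 forms differ only in which lanes subtract. VFMADDSUB computes `a*b - c` in even lanes and `a*b + c` in odd lanes, the same sign pattern as SSE3 ADDSUBPS. VFMSUBADD is the opposite. That is presumably why an ADDSUB constant alone couldn't describe the new variant. The sketch models only the sign pattern: it uses a plain multiply-add, which, unlike real FMA, is not guaranteed to round only once, and the function names are illustrative.

```c
#include <assert.h>

/* VFMADDSUB pattern: even lanes subtract, odd lanes add. */
static void fmaddsub_ps(float d[4], const float a[4],
                        const float b[4], const float c[4]) {
    for (int i = 0; i < 4; ++i)
        d[i] = (i & 1) ? a[i] * b[i] + c[i] : a[i] * b[i] - c[i];
}

/* VFMSUBADD pattern: even lanes add, odd lanes subtract. */
static void fmsubadd_ps(float d[4], const float a[4],
                        const float b[4], const float c[4]) {
    for (int i = 0; i < 4; ++i)
        d[i] = (i & 1) ? a[i] * b[i] - c[i] : a[i] * b[i] + c[i];
}
```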
Ryan Houdek
5da205d91a
Merge pull request #3760 from alyssarosenzweig/avx/vpclmulqdql
AVX128: fix VPCLMULQDQl
2024-06-25 10:31:52 -07:00
Ryan Houdek
41923bac99 OpcodeDispatcher: Fixes PCMUL with weird selectors and zero-extend
We had a bug where we weren't correctly ignoring the non-used bits in
the selector. This was causing an assert in the ARM backend.
2024-06-25 12:54:03 -04:00
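For PCLMULQDQ only two selector bits matter: imm8 bit 0 picks the low or high qword of the first source, and bit 4 picks the low or high qword of the second. Every other bit must be ignored, which is the behaviour this fix restores. A scalar C sketch follows, with a 64x64 to 128-bit carry-less multiply; the `u128` type and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;

/* Carry-less (GF(2) polynomial) multiply of two 64-bit values. */
static u128 clmul64(uint64_t a, uint64_t b) {
    u128 r = {0, 0};
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i != 0)                      /* avoid shifting by 64 */
                r.hi ^= a >> (64 - i);
        }
    }
    return r;
}

/* Only imm8 bits 0 and 4 select operands; all other bits are ignored. */
static u128 pclmulqdq(u128 src1, u128 src2, uint8_t imm8) {
    uint64_t a = (imm8 & 0x01) ? src1.hi : src1.lo;
    uint64_t b = (imm8 & 0x10) ? src2.hi : src2.lo;
    return clmul64(a, b);
}
```

For example, selector `0xEE` has bits 0 and 4 clear, so it must behave exactly like `0x00`.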
Alyssa Rosenzweig
c6148f6bf1 AVX128: fix VPCLMULQDQl
Use the helper. I had assumed the lack of zero extension here was intentional.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-25 12:51:21 -04:00
Alyssa Rosenzweig
77aaa9af4d
Merge pull request #3748 from Sonicadvance1/avx_15
AVX128: More instructions Part 4
2024-06-25 12:39:48 -04:00
Ryan Houdek
00cf8d530c
Merge pull request #3752 from Sonicadvance1/fma_ir_operations
ARM64: Adds new FMA vector instructions
2024-06-25 09:07:06 -07:00
Alyssa Rosenzweig
98aa58e9f5 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-25 10:03:33 -04:00
Ryan Houdek
6911917819 Disable vpclmulqdq_256 on simulator 2024-06-25 10:03:33 -04:00
Ryan Houdek
a8255aa475 CPUID: Expose support for VPCLMULQDQ
Wasn't exposed before since we couldn't unit test the SVE256
implementation.
2024-06-25 10:03:33 -04:00
Ryan Houdek
48e7aae38f unittests: Adds support for 256-bit vpclmulqdq
It's easy because the test was already written for this in mind.
2024-06-25 10:03:33 -04:00
Ryan Houdek
7069643ae6 AVX128: Implement support for VPCLMULQDQ
This is just the 128-bit version twice.
2024-06-25 10:03:33 -04:00
Ryan Houdek
34272fc134 AVX128: Implement support for vperm{d,ps}! 2024-06-25 10:03:33 -04:00
Ryan Houdek
1d41002dfe AVX128: Implement support for variable vpermil{ps,pd} 2024-06-25 10:03:33 -04:00