1519 Commits

Author SHA1 Message Date
Ryan Houdek
45c27b2965
CPUID: Enable support for FMA3 when AVX is enabled 2024-06-25 11:24:53 -07:00
Ryan Houdek
832b247fc1
SVE256: Implement support for FMA3 2024-06-25 11:24:46 -07:00
Ryan Houdek
0e8b53d566
AVX128: Implement FMA3 instructions 2024-06-25 11:23:50 -07:00
Ryan Houdek
d03d69273b
X86Tables: Describe FMA3 instructions 2024-06-25 11:22:27 -07:00
Ryan Houdek
efa05ba19d
IR: Adds support for new SUBADD FMA constants
ADDSUB didn't cover this new variant.
2024-06-25 11:22:22 -07:00
Ryan Houdek
41923bac99 OpcodeDispatcher: Fixes PCLMUL with weird selectors and zero-extend
We had a bug where we weren't correctly ignoring the unused bits in
the selector. This was causing an assert in the ARM backend.
2024-06-25 12:54:03 -04:00
Alyssa Rosenzweig
c6148f6bf1 AVX128: fix VPCLMULQDQ
use the helper. I assumed the lack of zero extension here was intentional.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-25 12:51:21 -04:00
Alyssa Rosenzweig
77aaa9af4d
Merge pull request #3748 from Sonicadvance1/avx_15
AVX128: More instructions Part 4
2024-06-25 12:39:48 -04:00
Ryan Houdek
00cf8d530c
Merge pull request #3752 from Sonicadvance1/fma_ir_operations
ARM64: Adds new FMA vector instructions
2024-06-25 09:07:06 -07:00
Ryan Houdek
a8255aa475 CPUID: Expose support for VPCLMULQDQ
Wasn't exposed before since we couldn't unit test the SVE256
implementation.
2024-06-25 10:03:33 -04:00
Ryan Houdek
7069643ae6 AVX128: Implement support for VPCLMULQDQ
This is just the 128-bit version twice.
2024-06-25 10:03:33 -04:00
Ryan Houdek
34272fc134 AVX128: Implement support for vperm{d,ps} 2024-06-25 10:03:33 -04:00
Ryan Houdek
1d41002dfe AVX128: Implement support for variable vpermil{ps,pd} 2024-06-25 10:03:33 -04:00
Ryan Houdek
563bf342d5 AVX128: Implement support for vptest 2024-06-25 10:03:33 -04:00
Ryan Houdek
efd5fabb95 AVX128: Implement support for vtest{ps,pd} 2024-06-25 10:03:33 -04:00
Ryan Houdek
c1da525110 AVX128: Implement support for vperm2{f128,i128} 2024-06-25 10:03:33 -04:00
Ryan Houdek
5ce6c88a88 AVX128: Reenable {ld,st}mxcsr. Can use the regular implementation. 2024-06-25 10:03:33 -04:00
Ryan Houdek
64cce7c6fa AVX128: Implement support for xsave/xrstor 2024-06-25 10:03:33 -04:00
Ryan Houdek
4544e5b51f AVX128: Implement support for vblendv{ps,pd}/vpblendvb 2024-06-25 10:03:33 -04:00
Ryan Houdek
eb3e314946 AVX128: Implement support for vmaskmovdqu 2024-06-25 10:03:33 -04:00
Ryan Houdek
8b65c3de10 AVX128: Implement vmaskmov{ps,pd}, vpmaskmov{d,q} using SVE2 gather loadstores. 2024-06-25 10:03:33 -04:00
Ryan Houdek
c2beb27a9d AVX128: Implement support for vpalignr 2024-06-25 10:03:33 -04:00
Ryan Houdek
05fdec9e72 AVX128: Implement support for vmpsadbw 2024-06-25 10:03:33 -04:00
Ryan Houdek
e8e3c95349 AVX128: Implement support for vpsadbw 2024-06-25 10:03:33 -04:00
Ryan Houdek
a87fa3f246 AVX128: Implement support for vpshufb 2024-06-25 10:03:33 -04:00
Ryan Houdek
b31ad523f5 AVX128: Implement support for hsub{ps,pd} 2024-06-25 10:03:33 -04:00
Ryan Houdek
34bce540ff AVX128: Implement support for vpblendw/vpblendd/vblendps/vblendpd 2024-06-25 10:03:33 -04:00
Ryan Houdek
8ea38e1d80 AVX128: Implement support for vpmaddwd 2024-06-25 10:03:33 -04:00
Ryan Houdek
ce591a9541 AVX128: Implement support for vpmaddubsw 2024-06-25 10:03:33 -04:00
Ryan Houdek
a48c65cd65 AVX128: Implement support for vphaddsw 2024-06-25 10:03:33 -04:00
Ryan Houdek
c283f80f48 AVX128: Implement support for vhaddpd/vphadd{w,d} 2024-06-25 10:03:33 -04:00
Ryan Houdek
d6bf276b5a AVX128: Implement support for imm vpermil{ps,pd} 2024-06-25 10:03:33 -04:00
Ryan Houdek
96a51650b1 AVX128: Implement support for vshuf{ps,pd} 2024-06-25 10:03:33 -04:00
Ryan Houdek
f35a9c74a2 AVX128: Implement support for vpshuf{lw,hw,d} 2024-06-25 10:03:33 -04:00
Ryan Houdek
e2457943f5 AVX128: Implement support for vperm{q,pd} 2024-06-25 10:03:33 -04:00
Ryan Houdek
a05644172a AVX128: Implement support for vdp{ps,pd} 2024-06-25 10:03:33 -04:00
Ryan Houdek
cc168ce0fb VectorOps: Restructure DPPOpImpl. This will get reused by AVX128 2024-06-25 10:03:33 -04:00
Alyssa Rosenzweig
76bd22d279 OpcodeDispatcher: rm gratuitous lambda
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-25 10:03:33 -04:00
Alyssa Rosenzweig
18574f3cf1 OpcodeDispatcher: extract VPERMILRegOpImpl
for avx

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-25 10:03:33 -04:00
Alyssa Rosenzweig
665215ab47 OpcodeDispatcher: extract PTestOpImpl
for avx128

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-25 09:52:48 -04:00
Alyssa Rosenzweig
6009f36403 OpcodeDispatcher: extract VPERMDIndices
and rename things accordingly.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-25 09:46:17 -04:00
Alyssa Rosenzweig
2580efda0d OpcodeDispatcher: tweak VTestOp signature
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-25 09:46:17 -04:00
Ryan Houdek
3a310b8815
Merge pull request #3756 from Sonicadvance1/fix_vmovhlps
Fix VMOVLHPS instruction
2024-06-24 19:14:56 -07:00
Ryan Houdek
7ff96227c0
Merge pull request #3755 from Sonicadvance1/fix_avx128_vmovntdqa
AVX128: Fix vmovntdqa failing to zero upper 128-bits
2024-06-24 19:14:48 -07:00
Ryan Houdek
6d3745b8f1
AVX128: Fixes VMOVLHPS instruction
We didn't have unit tests for this
2024-06-24 17:22:51 -07:00
Ryan Houdek
d0f0b975be
SVE256: Fixes VMOVLHPS instruction
We didn't have unit tests for this
2024-06-24 17:22:47 -07:00
Ryan Houdek
ff2e6ed59f
X86Tables: Fixes instruction encoding for VMOVLP{S,D}
These can have both register and memory modrm encoding
2024-06-24 17:22:43 -07:00
Ryan Houdek
f0d9c8c10a
AVX128: Fix vmovntdqa failing to zero upper 128-bits 2024-06-24 16:32:09 -07:00
Ryan Houdek
b47e981932
AVX128: Fixes SSE4.2 string compare instructions 2024-06-24 15:54:06 -07:00
Ryan Houdek
dc44eb4caf
Merge pull request #3749 from Sonicadvance1/contigous_mask_optimization_removal
Arm64: Remove contiguous masked element optimization
2024-06-24 15:22:23 -07:00