Ryan Houdek
|
45c27b2965
|
CPUID: Enable support for FMA3 when AVX is enabled
|
2024-06-25 11:24:53 -07:00 |
|
Ryan Houdek
|
832b247fc1
|
SVE256: Implement support for FMA3
|
2024-06-25 11:24:46 -07:00 |
|
Ryan Houdek
|
0e8b53d566
|
AVX128: Implement FMA3 instructions
|
2024-06-25 11:23:50 -07:00 |
|
Ryan Houdek
|
d03d69273b
|
X86Tables: Describe FMA3 instructions
|
2024-06-25 11:22:27 -07:00 |
|
Ryan Houdek
|
efa05ba19d
|
IR: Adds support for new SUBADD FMA constants
ADDSUB didn't cover this new variant.
|
2024-06-25 11:22:22 -07:00 |
|
Ryan Houdek
|
41923bac99
|
OpcodeDispatcher: Fixes PCLMUL with weird selectors and zero-extend
We had a bug where we weren't correctly ignoring the non-used bits in
the selector. This was causing an assert in the ARM backend.
|
2024-06-25 12:54:03 -04:00 |
|
Alyssa Rosenzweig
|
c6148f6bf1
|
AVX128: fix VPCLMULQDQ
use the helper. I assumed the lack of zero extension here was intentional.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
|
2024-06-25 12:51:21 -04:00 |
|
Alyssa Rosenzweig
|
77aaa9af4d
|
Merge pull request #3748 from Sonicadvance1/avx_15
AVX128: More instructions Part 4
|
2024-06-25 12:39:48 -04:00 |
|
Ryan Houdek
|
00cf8d530c
|
Merge pull request #3752 from Sonicadvance1/fma_ir_operations
ARM64: Adds new FMA vector instructions
|
2024-06-25 09:07:06 -07:00 |
|
Ryan Houdek
|
a8255aa475
|
CPUID: Expose support for VPCLMULQDQ
Wasn't exposed before since we couldn't unit test the SVE256
implementation.
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
7069643ae6
|
AVX128: Implement support for VPCLMULQDQ
This is just the 128-bit version twice.
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
34272fc134
|
AVX128: Implement support for vperm{d,ps}
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
1d41002dfe
|
AVX128: Implement support for variable vpermil{ps,pd}
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
563bf342d5
|
AVX128: Implement support for vptest
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
efd5fabb95
|
AVX128: Implement support for vtest{ps,pd}
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
c1da525110
|
AVX128: Implement support for vperm2{f128,i128}
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
5ce6c88a88
|
AVX128: Reenable {ld,st}mxcsr. Can use the regular implementation.
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
64cce7c6fa
|
AVX128: Implement support for xsave/xrstor
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
4544e5b51f
|
AVX128: Implement support for vblend{ps,pd}/vpblendvb
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
eb3e314946
|
AVX128: Implement support for vmaskmovdqu
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
8b65c3de10
|
AVX128: Implement vmaskmov{ps,pd}, vpmaskmov{d,q} using SVE2 gather loadstores.
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
c2beb27a9d
|
AVX128: Implement support for vpalignr
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
05fdec9e72
|
AVX128: Implement support for vmpsadbw
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
e8e3c95349
|
AVX128: Implement support for vpsadbw
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
a87fa3f246
|
AVX128: Implement support for vpshufb
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
b31ad523f5
|
AVX128: Implement support for hsub{ps,pd}
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
34bce540ff
|
AVX128: Implement support for vpblendw/vpblendd/vblendps/vblendpd
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
8ea38e1d80
|
AVX128: Implement support for vpmaddwd
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
ce591a9541
|
AVX128: Implement support for vpmaddubsw
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
a48c65cd65
|
AVX128: Implement support for vphaddsw
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
c283f80f48
|
AVX128: Implement support for vhaddpd/vphadd{w,d}
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
d6bf276b5a
|
AVX128: Implement support for imm vpermil{ps,pd}
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
96a51650b1
|
AVX128: Implement support for vshuf{ps,pd}
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
f35a9c74a2
|
AVX128: Implement support for vpshuf{lw,hw,d}
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
e2457943f5
|
AVX128: Implement support for vperm{q,pd}
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
a05644172a
|
AVX128: Implement support for vdp{ps,pd}
|
2024-06-25 10:03:33 -04:00 |
|
Ryan Houdek
|
cc168ce0fb
|
VectorOps: Restructure DPPOpImpl. This will get reused by AVX128
|
2024-06-25 10:03:33 -04:00 |
|
Alyssa Rosenzweig
|
76bd22d279
|
OpcodeDispatcher: rm gratuitous lambda
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
|
2024-06-25 10:03:33 -04:00 |
|
Alyssa Rosenzweig
|
18574f3cf1
|
OpcodeDispatcher: extract VPERMILRegOpImpl
for avx
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
|
2024-06-25 10:03:33 -04:00 |
|
Alyssa Rosenzweig
|
665215ab47
|
OpcodeDispatcher: extract PTestOpImpl
for avx128
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
|
2024-06-25 09:52:48 -04:00 |
|
Alyssa Rosenzweig
|
6009f36403
|
OpcodeDispatcher: extract VPERMDIndices
and rename things accordingly.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
|
2024-06-25 09:46:17 -04:00 |
|
Alyssa Rosenzweig
|
2580efda0d
|
OpcodeDispatcher: tweak VTestOp signature
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
|
2024-06-25 09:46:17 -04:00 |
|
Ryan Houdek
|
3a310b8815
|
Merge pull request #3756 from Sonicadvance1/fix_vmovhlps
Fix VMOVLHPS instruction
|
2024-06-24 19:14:56 -07:00 |
|
Ryan Houdek
|
7ff96227c0
|
Merge pull request #3755 from Sonicadvance1/fix_avx128_vmovntdqa
AVX128: Fix vmovntdqa failing to zero upper 128-bits
|
2024-06-24 19:14:48 -07:00 |
|
Ryan Houdek
|
6d3745b8f1
|
AVX128: Fixes VMOVLHPS instruction
We didn't have unit tests for this
|
2024-06-24 17:22:51 -07:00 |
|
Ryan Houdek
|
d0f0b975be
|
SVE256: Fixes VMOVLHPS instruction
We didn't have unit tests for this
|
2024-06-24 17:22:47 -07:00 |
|
Ryan Houdek
|
ff2e6ed59f
|
X86Tables: Fixes instruction encoding for VMOVLP{S,D}
These can have both register and memory modrm encoding
|
2024-06-24 17:22:43 -07:00 |
|
Ryan Houdek
|
f0d9c8c10a
|
AVX128: Fix vmovntdqa failing to zero upper 128-bits
|
2024-06-24 16:32:09 -07:00 |
|
Ryan Houdek
|
b47e981932
|
AVX128: Fixes SSE4.2 string compare instructions
|
2024-06-24 15:54:06 -07:00 |
|
Ryan Houdek
|
dc44eb4caf
|
Merge pull request #3749 from Sonicadvance1/contigous_mask_optimization_removal
Arm64: Remove contiguous masked element optimization
|
2024-06-24 15:22:23 -07:00 |
|