5415 Commits

Author SHA1 Message Date
Mai
9a8852f9b6
Merge pull request #2250 from Sonicadvance1/optimize_spilling_filling
Arm64: Optimizing spilling and filling
2022-12-16 04:47:22 +00:00
Mai
65e8bf9d72
Merge pull request #2253 from Sonicadvance1/single_page_dispatcher
Arm64: Reduce dispatcher to 1 page
2022-12-16 04:44:55 +00:00
Ryan Houdek
344ec33ba5
Merge pull request #2252 from lioncash/fadd
Arm64/VectorOps: Simplify FADDP result merging
2022-12-15 20:37:11 -08:00
Ryan Houdek
5dc7dfacb3 Arm64: Reduce dispatcher to 1 page
We currently only use 2236 bytes, so there is no need for two pages.
Once #2250 is merged we will use 1716 bytes.
2022-12-15 20:33:28 -08:00
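The size arithmetic in 5dc7dfacb3 can be checked with a short sketch. The 4096-byte page size and the name `PagesNeeded` are assumptions for illustration; the commit states only the byte counts.

```cpp
#include <cassert>
#include <cstddef>

// Assumed page size for illustration; the commit does not state it.
constexpr std::size_t kAssumedPageSize = 4096;

// Number of whole pages needed to hold `bytes` bytes (rounds up).
constexpr std::size_t PagesNeeded(std::size_t bytes,
                                  std::size_t page_size = kAssumedPageSize) {
  return (bytes + page_size - 1) / page_size;
}

// Both byte counts quoted in the commit fit in one page under this assumption.
static_assert(PagesNeeded(2236) == 1, "current dispatcher size fits in one page");
static_assert(PagesNeeded(1716) == 1, "size after #2250 fits in one page");
```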
lioncash
122aa8a69a Arm64/VectorOps: Simplify FADDP result merging
Keeps the implementation in sync with VAddP.
2022-12-16 04:19:46 +00:00
Ryan Houdek
8ce6c08152
Merge pull request #2251 from lioncash/hadd
OpcodeDispatcher: Handle VPHADDW/VPHADDD
2022-12-15 20:11:07 -08:00
Ryan Houdek
1beb791d52 Arm64: Optimizing spilling and filling
Just makes these a little more optimal when jumping out of the JIT.

Noticed these while working on the new emitter.
2022-12-15 20:04:16 -08:00
lioncash
27c0d4a9f5 OpcodeDispatcher: Handle VPHADDD 2022-12-16 03:28:57 +00:00
lioncash
dd4ba7562f OpcodeDispatcher: Handle VPHADDW 2022-12-16 03:28:57 +00:00
lioncash
bd9d8e8fe5 x86_64: Correct handling for 128-bit/256-bit VAddP
Makes the behavior consistent with the ARM JIT.
2022-12-16 03:28:57 +00:00
lioncash
c7ac204322 Arm64/VectorOps: Simplify VAddP merging operation
We can just merge the two results together instead of shifting to the
left and then ORing together.
2022-12-16 03:28:52 +00:00
Ryan Houdek
4c013c867f
Merge pull request #2249 from lioncash/clear
Crypto: Explicitly clear upper lane with VPCLMULQDQ
2022-12-15 17:33:57 -08:00
lioncash
5e634fcbc9 Crypto: Explicitly clear upper lane with VPCLMULQDQ
Ensures the 128-bit case will be handled when extending for 256-bit
2022-12-16 01:08:48 +00:00
Ryan Houdek
91c00d2cb6
Merge pull request #2248 from lioncash/acc
X86Tables: Restrict CVTDQ2PD and CVTTSD2SI to 64-bit memory accesses
2022-12-15 16:48:42 -08:00
lioncash
e985dcdb22 X86Tables: Restrict CVTTSD2SI src to 64 bit
When accessing memory, this should only be doing a 64-bit access, rather
than a 128-bit one.
2022-12-15 23:59:15 +00:00
lioncash
ee9778480d X86Tables: Restrict CVTDQ2PD src to 64 bit
When accessing memory, this should only be doing a 64-bit access, rather
than a 128-bit one.
2022-12-15 23:46:48 +00:00
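A minimal scalar sketch of why e985dcdb22 and ee9778480d narrow the source to 64 bits: both instructions consume only 8 bytes from a memory operand. The function names are hypothetical models, not the project's code, and out-of-range or NaN inputs are not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of CVTTSD2SI with a memory source: reads exactly 8 bytes.
// Out-of-range and NaN inputs are not modeled here.
int64_t ModelCvttsd2si(const void* src) {
  double value;
  std::memcpy(&value, src, sizeof(value)); // 64-bit access, not 128-bit
  return static_cast<int64_t>(value);      // truncates toward zero
}

// Hypothetical model of CVTDQ2PD with a memory source: reads two packed
// 32-bit signed integers (8 bytes total) and widens each to a double.
std::array<double, 2> ModelCvtdq2pd(const void* src) {
  std::array<int32_t, 2> ints;
  std::memcpy(ints.data(), src, sizeof(ints)); // 64-bit access, not 128-bit
  return {static_cast<double>(ints[0]), static_cast<double>(ints[1])};
}
```

Under this model, a 128-bit load would touch 8 bytes that the instruction never uses and that may not be mapped.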
Mai
048daa4579
Merge pull request #2244 from Sonicadvance1/move_to_header
ARM64: Moves RA functions to header
2022-12-15 23:13:32 +00:00
Ryan Houdek
6ae8a1e55f ARM64: Moves RA functions to header
These are just some basic address calculations and a load; we want these
to be inlined as much as possible.
2022-12-15 15:00:33 -08:00
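The idea behind 6ae8a1e55f can be sketched as follows. `RegisterTable` and `LookupRegister` are hypothetical names; the commit does not show the actual register allocation (RA) functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical register-allocation table: one physical register id per IR node.
struct RegisterTable {
  const uint8_t* entries;
  std::size_t count;
};

// Defined `inline` in a header so that every translation unit including it
// sees the body and the compiler can inline the call at each use site.
inline uint8_t LookupRegister(const RegisterTable& table, std::size_t node) {
  assert(node < table.count);
  return table.entries[node]; // one address calculation plus one load
}
```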
Ryan Houdek
dc2eaf6511
Merge pull request #2246 from lioncash/extend
OpcodeDispatcher: Handle VPMOVSXB{D, W, Q}/VPMOVSXW{D, Q}/VPMOVSXDQ/VPMOVZXB{D, W, Q}/VPMOVZXW{D, Q}/VPMOVZXDQ
2022-12-15 14:19:28 -08:00
Ryan Houdek
0e233a96f0
Merge pull request #2247 from lioncash/roundacc
OpcodeDispatcher: Narrow memory access with scalar rounding operations
2022-12-15 14:17:49 -08:00
lioncash
ba5fafcd7f OpcodeDispatcher: Narrow memory access with scalar rounding operations
These should only be accessing a 32-bit or 64-bit portion of memory,
depending on whether the single or double precision variant is used.
Previously we'd be doing a full 128-bit load.
2022-12-15 19:42:37 +00:00
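A rough scalar model of the access width described in ba5fafcd7f: the load size follows the element type, 4 bytes for single precision and 8 bytes for double precision. `ModelScalarRound` and `RoundMode` are hypothetical names, and only three directed rounding modes are modeled.

```cpp
#include <cassert>
#include <cmath>
#include <cstring>

// Hypothetical subset of rounding modes, for illustration only.
enum class RoundMode { Floor, Ceil, Trunc };

// Loads exactly sizeof(T) bytes from `src` and rounds the value.
// T = float models the single precision variant (32-bit access),
// T = double models the double precision variant (64-bit access).
template <typename T>
T ModelScalarRound(const void* src, RoundMode mode) {
  static_assert(sizeof(T) == 4 || sizeof(T) == 8, "expects float or double");
  T value;
  std::memcpy(&value, src, sizeof(T)); // never a full 128-bit load
  switch (mode) {
    case RoundMode::Floor: return std::floor(value);
    case RoundMode::Ceil:  return std::ceil(value);
    case RoundMode::Trunc: return std::trunc(value);
  }
  return value; // unreachable for valid modes
}
```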
lioncash
b12503fe32 OpcodeDispatcher: Handle VPMOVSXDQ 2022-12-15 18:10:38 +00:00
lioncash
aa63c7b94d OpcodeDispatcher: Handle VPMOVSXWQ 2022-12-15 18:08:00 +00:00
lioncash
cccbb7f595 OpcodeDispatcher: Handle VPMOVSXWD 2022-12-15 18:01:43 +00:00
lioncash
ce12ed60ae OpcodeDispatcher: Handle VPMOVSXBQ 2022-12-15 17:58:42 +00:00
lioncash
d7eab5f787 OpcodeDispatcher: Handle VPMOVSXBD 2022-12-15 17:54:51 +00:00
lioncash
21537a3636 OpcodeDispatcher: Handle VPMOVSXBW 2022-12-15 17:50:58 +00:00
lioncash
588a2611a7 OpcodeDispatcher: Handle VPMOVZXDQ 2022-12-15 17:45:14 +00:00
lioncash
2895a09101 OpcodeDispatcher: Handle VPMOVZXWQ 2022-12-15 17:41:15 +00:00
lioncash
5c8d40d9be OpcodeDispatcher: Handle VPMOVZXWD 2022-12-15 17:37:51 +00:00
lioncash
b4079cfea3 OpcodeDispatcher: Handle VPMOVZXBQ 2022-12-15 17:32:18 +00:00
lioncash
2b5570a910 OpcodeDispatcher: Handle VPMOVZXBD 2022-12-15 17:28:35 +00:00
lioncash
6bb0c5b24c OpcodeDispatcher: Handle VPMOVZXBW 2022-12-15 17:18:49 +00:00
lioncash
bc31f98f16 OpcodeDispatcher: Move ExtendVectorElements impl to regular function
This can be reused for the AVX versions.
2022-12-15 17:11:02 +00:00
Ryan Houdek
4b891d6147
Merge pull request #2245 from lioncash/split
OpcodeDispatcher: Move template impl to regular function where applicable
2022-12-14 18:18:43 -08:00
lioncash
58c3e20bd1 OpcodeDispatcher: Move template impl to regular function where applicable
Reduces the amount of code generated by the specializations.

Only targets ones that are heavily templated like the generic op helper
functions.
2022-12-15 01:54:12 +00:00
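The technique in 58c3e20bd1 can be illustrated with a generic sketch: the template keeps its compile-time parameters but forwards them as ordinary arguments to one non-template function, so the body is compiled once instead of once per specialization. `VecOp`, `ApplyOp`, and `ApplyOpImpl` are hypothetical names, not the project's helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical operation kinds, for illustration only.
enum class VecOp { Add, Sub };

// Non-template worker: the body exists once in the binary, no matter how
// many (Op, ElemSize) combinations the wrappers below are instantiated with.
uint64_t ApplyOpImpl(VecOp op, std::size_t elem_size, uint64_t a, uint64_t b) {
  // Mask the result down to the element width (elem_size is in bytes).
  const uint64_t mask =
      elem_size >= 8 ? ~0ULL : ((1ULL << (elem_size * 8)) - 1);
  const uint64_t result = (op == VecOp::Add) ? a + b : a - b;
  return result & mask;
}

// Thin template wrapper: each specialization is only a forwarding call.
template <VecOp Op, std::size_t ElemSize>
uint64_t ApplyOp(uint64_t a, uint64_t b) {
  return ApplyOpImpl(Op, ElemSize, a, b);
}
```

The trade-off is that parameters become runtime values inside the worker, which is why this suits heavily templated helpers where the duplicated body dominates code size.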
Ryan Houdek
d5f3a091d0
Merge pull request #2216 from Sonicadvance1/32bit_host_thunk_support
Initial 32-bit host thunk feature support
2022-12-14 12:05:37 -08:00
Ryan Houdek
a14e03f35d Update guest thunk lib register usage comment 2022-12-14 11:40:33 -08:00
Ryan Houdek
5c1789952e GuestThunks: Disable stack protector on 32-bit 2022-12-14 11:29:19 -08:00
Ryan Houdek
f5809f24f7 GuestLibs: Fixes accidental guest lib setting 2022-12-14 11:29:19 -08:00
Ryan Houdek
122a9114a3 Thunks: 32-bit host library support 2022-12-14 11:29:19 -08:00
Ryan Houdek
d8f226b460 Support 32-bit thunks ABI 2022-12-14 11:29:19 -08:00
Ryan Houdek
7171c5ae39 Support 32-bit thunksdb 2022-12-14 11:29:19 -08:00
Ryan Houdek
798a78534a Support Indirect thunk callback with mm0 as custom ABI 2022-12-14 11:24:18 -08:00
Ryan Houdek
ae4a04b560 Fix incorrect THUNK_ABI prefix 2022-12-14 11:24:18 -08:00
Ryan Houdek
1971c8d505 32bit host thunk lib config path support 2022-12-14 11:24:18 -08:00
Ryan Houdek
1ca356371d
Merge pull request #2242 from lioncash/round
OpcodeDispatcher: Handle VROUNDS{D, S}/VROUNDP{D, S}
2022-12-13 23:00:51 -08:00
lioncash
27ea6096a2 OpcodeDispatcher: Handle VROUNDSD 2022-12-14 06:41:36 +00:00
lioncash
2244dd9847 OpcodeDispatcher: Handle VROUNDSS 2022-12-14 06:34:58 +00:00
lioncash
ca2f4bd468 OpcodeDispatcher: Handle VROUNDPD 2022-12-14 06:28:17 +00:00