Mai
9a8852f9b6
Merge pull request #2250 from Sonicadvance1/optimize_spilling_filling
Arm64: Optimizing spilling and filling
2022-12-16 04:47:22 +00:00
Mai
65e8bf9d72
Merge pull request #2253 from Sonicadvance1/single_page_dispatcher
Arm64: Reduce dispatcher to 1 page
2022-12-16 04:44:55 +00:00
Ryan Houdek
344ec33ba5
Merge pull request #2252 from lioncash/fadd
Arm64/VectorOps: Simplify FADDP result merging
2022-12-15 20:37:11 -08:00
Ryan Houdek
5dc7dfacb3
Arm64: Reduce dispatcher to 1 page
We currently use only 2236 bytes, so there is no need for two pages.
Once #2250 is merged, we will use 1716 bytes.
2022-12-15 20:33:28 -08:00
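A quick check of the page arithmetic in the message above. This is a minimal sketch that assumes a 4 KiB page size (AArch64 kernels can also be configured for 16 KiB or 64 KiB pages, which only increases the margin). `pages_needed` is a hypothetical helper, not FEX code.

```python
# Assumption: 4 KiB pages. AArch64 kernels may also use 16 KiB or 64 KiB pages.
PAGE_SIZE = 4096


def pages_needed(code_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Hypothetical helper: number of whole pages needed to hold code_bytes."""
    return -(-code_bytes // page_size)  # ceiling division without floats


if __name__ == "__main__":
    for size in (2236, 1716):
        spare = pages_needed(size) * PAGE_SIZE - size
        print(f"{size} bytes -> {pages_needed(size)} page(s), {spare} bytes spare")
```

Under the 4 KiB assumption, the dispatcher leaves 1860 bytes spare at 2236 bytes and 2380 bytes spare at 1716 bytes.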
lioncash
122aa8a69a
Arm64/VectorOps: Simplify FADDP result merging
Keeps the implementation in sync with VAddP.
2022-12-16 04:19:46 +00:00
Ryan Houdek
8ce6c08152
Merge pull request #2251 from lioncash/hadd
OpcodeDispatcher: Handle VPHADDW/VPHADDD
2022-12-15 20:11:07 -08:00
Ryan Houdek
1beb791d52
Arm64: Optimizing spilling and filling
Just makes these slightly more efficient when jumping out of the JIT.
Noticed these while working on the new emitter.
2022-12-15 20:04:16 -08:00
lioncash
27c0d4a9f5
OpcodeDispatcher: Handle VPHADDD
2022-12-16 03:28:57 +00:00
lioncash
dd4ba7562f
OpcodeDispatcher: Handle VPHADDW
2022-12-16 03:28:57 +00:00
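For reference, the 256-bit VPHADDW/VPHADDD forms operate independently within each 128-bit half rather than across the full register. The 256-bit handlers have to reproduce this lane layout. The Python below is a hypothetical reference model of that layout. `hadd_256` is an illustrative name, not FEX code, and memory operands are not modelled.

```python
def hadd_256(a, b, elem_bits):
    """Reference model of AVX2 VPHADDW (elem_bits=16) / VPHADDD (elem_bits=32).

    a and b are lists of element values, element 0 first. Within each 128-bit
    half, the pairwise sums of `a` fill the low elements and the pairwise sums
    of `b` fill the high elements; nothing crosses the 128-bit boundary.
    """
    mask = (1 << elem_bits) - 1
    per_half = 128 // elem_bits
    assert len(a) == len(b) == 2 * per_half

    def pairwise(v):
        # Sum adjacent elements, wrapping to the element width.
        return [(v[i] + v[i + 1]) & mask for i in range(0, len(v), 2)]

    result = []
    for half in range(2):
        lo, hi = half * per_half, (half + 1) * per_half
        result += pairwise(a[lo:hi]) + pairwise(b[lo:hi])
    return result
```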
lioncash
bd9d8e8fe5
x86_64: Correct handling for 128-bit/256-bit VAddP
Makes the behavior consistent with the ARM JIT.
2022-12-16 03:28:57 +00:00
lioncash
c7ac204322
Arm64/VectorOps: Simplify VAddP merging operation
We can just merge the two results together instead of shifting to the
left and then ORing together.
2022-12-16 03:28:52 +00:00
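The merge simplification can be illustrated with a small model of a 128-bit pairwise add on 16-bit elements. This is a sketch that assumes each pairwise result occupies 64 bits. The function names are hypothetical, and it does not reproduce the Arm64 emitter code. It checks that shifting the second result left by 64 bits and ORing it in yields the same bits as placing the two results side by side.

```python
# Hypothetical model of a 128-bit pairwise (horizontal) add on 16-bit elements.
LANE_BITS = 16
MASK = (1 << LANE_BITS) - 1


def pairwise(v):
    """Add adjacent elements, wrapping each sum to 16 bits (8 elements -> 4)."""
    return [(v[i] + v[i + 1]) & MASK for i in range(0, len(v), 2)]


def pack(elements):
    """Pack elements into one integer, element 0 in the least significant bits."""
    value = 0
    for i, element in enumerate(elements):
        value |= (element & MASK) << (i * LANE_BITS)
    return value


def merge_shift_or(a, b):
    # Each 64-bit result is computed separately; the second is shifted into
    # the upper half and ORed with the first.
    return pack(pairwise(a)) | (pack(pairwise(b)) << 64)


def merge_direct(a, b):
    # The two 64-bit results are simply placed side by side.
    return pack(pairwise(a) + pairwise(b))
```

The model only shows that both approaches give identical bit patterns. Which Arm64 instructions the emitter actually uses is not shown here.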
Ryan Houdek
4c013c867f
Merge pull request #2249 from lioncash/clear
Crypto: Explicitly clear upper lane with VPCLMULQDQ
2022-12-15 17:33:57 -08:00
lioncash
5e634fcbc9
Crypto: Explicitly clear upper lane with VPCLMULQDQ
Ensures the 128-bit case is handled when extending to 256-bit.
2022-12-16 01:08:48 +00:00
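A hypothetical reference model of the instruction's semantics, sketched under the assumption that a register is represented as a plain integer. In each 128-bit lane, the model carry-less multiplies one selected qword from each source. Lanes beyond the operand width stay zero, as the VEX.128 form requires for the upper half of the destination. `clmul64` and `vpclmulqdq` are illustrative names, not FEX functions.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def clmul64(a: int, b: int) -> int:
    """Carry-less (GF(2) polynomial) multiply of two 64-bit values, up to 127 bits."""
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            result ^= a << i  # XOR instead of ADD: no carries propagate
    return result


def vpclmulqdq(src1: int, src2: int, imm8: int, vector_bits: int) -> int:
    """Reference model; bit 0 of each int is the register's least significant bit.

    imm8 bit 0 picks the qword of src1 and imm8 bit 4 picks the qword of src2,
    within every 128-bit lane. Only vector_bits worth of lanes are computed, so
    every bit above that stays zero (the explicit upper-lane clear).
    """
    result = 0
    for lane in range(vector_bits // 128):
        base = lane * 128
        q1 = (src1 >> (base + 64 * (imm8 & 1))) & MASK64
        q2 = (src2 >> (base + 64 * ((imm8 >> 4) & 1))) & MASK64
        result |= clmul64(q1, q2) << base
    return result
```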
Ryan Houdek
91c00d2cb6
Merge pull request #2248 from lioncash/acc
X86Tables: Restrict CVTDQ2PD and CVTTSD2SI to 64-bit memory accesses
2022-12-15 16:48:42 -08:00
lioncash
e985dcdb22
X86Tables: Restrict CVTTSD2SI src to 64 bit
When accessing memory, this should only be doing a 64-bit access, rather
than a 128-bit one.
2022-12-15 23:59:15 +00:00
lioncash
ee9778480d
X86Tables: Restrict CVTDQ2PD src to 64 bit
When accessing memory, this should only be doing a 64-bit access, rather
than a 128-bit one.
2022-12-15 23:46:48 +00:00
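In their memory forms, both instructions read an m64 operand. CVTDQ2PD reads two int32 values and CVTTSD2SI reads one double. A 128-bit load can therefore touch 8 bytes the guest never asked for. The Python below is a minimal sketch of the 8-byte reads. The function names are hypothetical, and out-of-range/NaN conversions (x86's "integer indefinite" result) are not modelled.

```python
import math
import struct


def cvtdq2pd_m64(mem: bytes, addr: int) -> tuple:
    """CVTDQ2PD xmm, m64: two little-endian int32 values -> two doubles (8-byte read)."""
    lo, hi = struct.unpack_from("<2i", mem, addr)
    return float(lo), float(hi)


def cvttsd2si_m64(mem: bytes, addr: int) -> int:
    """CVTTSD2SI r, m64: one little-endian double truncated toward zero (8-byte read).

    math.trunc raises for NaN and infinities, where x86 would instead return
    the integer indefinite value; that case is not modelled here.
    """
    (value,) = struct.unpack_from("<d", mem, addr)
    return math.trunc(value)


if __name__ == "__main__":
    # Place the 8-byte operand in the last 8 bytes of a 16-byte buffer.
    mem = bytes(8) + struct.pack("<2i", -3, 7)
    print(cvtdq2pd_m64(mem, 8))
    try:
        struct.unpack_from("<4i", mem, 8)  # a 16-byte read from the same address
    except struct.error as exc:
        print("128-bit read overruns the buffer:", exc)
```

In the demo, the 8-byte read succeeds. A 16-byte read from the same address runs past the end of the buffer, which is analogous to touching a following, possibly unmapped, guest page.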
Mai
048daa4579
Merge pull request #2244 from Sonicadvance1/move_to_header
ARM64: Moves RA functions to header
2022-12-15 23:13:32 +00:00
Ryan Houdek
6ae8a1e55f
ARM64: Moves RA functions to header
These are just some basic address calculations and a load; we want these
to be inlined as much as possible.
2022-12-15 15:00:33 -08:00
Ryan Houdek
dc2eaf6511
Merge pull request #2246 from lioncash/extend
OpcodeDispatcher: Handle VPMOVSXB{D, W, Q}/VPMOVSXW{D, Q}/VPMOVSXDQ/VPMOVZXB{D, W, Q}/VPMOVZXW{D, Q}/VPMOVZXDQ
2022-12-15 14:19:28 -08:00
Ryan Houdek
0e233a96f0
Merge pull request #2247 from lioncash/roundacc
OpcodeDispatcher: Narrow memory access with scalar rounding operations
2022-12-15 14:17:49 -08:00
lioncash
ba5fafcd7f
OpcodeDispatcher: Narrow memory access with scalar rounding operations
These should only access a 32-bit or 64-bit portion of memory,
depending on whether the single- or double-precision variant is used.
Previously we'd be doing a full 128-bit load.
2022-12-15 19:42:37 +00:00
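A minimal sketch of the narrowed access, assuming the standard imm8 encoding for ROUNDSS/ROUNDSD. When imm8[2] is clear, imm8[1:0] selects the rounding mode. The load width is 4 bytes for the single-precision form and 8 bytes for the double-precision form. `round_scalar_from_memory` is a hypothetical name. MXCSR-selected rounding, signed zero, NaN and infinities are not modelled.

```python
import math
import struct

# imm8[1:0] rounding-control encodings for ROUNDSS/ROUNDSD when imm8[2] is clear.
ROUNDERS = {
    0b00: lambda x: float(round(x)),       # nearest, ties to even (Python's round)
    0b01: lambda x: float(math.floor(x)),  # toward negative infinity
    0b10: lambda x: float(math.ceil(x)),   # toward positive infinity
    0b11: lambda x: float(math.trunc(x)),  # toward zero
}


def round_scalar_from_memory(mem: bytes, addr: int, imm8: int, double_precision: bool) -> float:
    # The memory operand is m32 for ROUNDSS and m64 for ROUNDSD, never 128 bits.
    fmt = "<d" if double_precision else "<f"
    (value,) = struct.unpack_from(fmt, mem, addr)
    if imm8 & 0b100:
        raise NotImplementedError("MXCSR.RC-selected rounding is not modelled")
    return ROUNDERS[imm8 & 0b11](value)
```

Because only 4 or 8 bytes are unpacked, an operand at the very end of the buffer reads fine, as it would at the end of a guest page.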
lioncash
b12503fe32
OpcodeDispatcher: Handle VPMOVSXDQ
2022-12-15 18:10:38 +00:00
lioncash
aa63c7b94d
OpcodeDispatcher: Handle VPMOVSXWQ
2022-12-15 18:08:00 +00:00
lioncash
cccbb7f595
OpcodeDispatcher: Handle VPMOVSXWD
2022-12-15 18:01:43 +00:00
lioncash
ce12ed60ae
OpcodeDispatcher: Handle VPMOVSXBQ
2022-12-15 17:58:42 +00:00
lioncash
d7eab5f787
OpcodeDispatcher: Handle VPMOVSXBD
2022-12-15 17:54:51 +00:00
lioncash
21537a3636
OpcodeDispatcher: Handle VPMOVSXBW
2022-12-15 17:50:58 +00:00
lioncash
588a2611a7
OpcodeDispatcher: Handle VPMOVZXDQ
2022-12-15 17:45:14 +00:00
lioncash
2895a09101
OpcodeDispatcher: Handle VPMOVZXWQ
2022-12-15 17:41:15 +00:00
lioncash
5c8d40d9be
OpcodeDispatcher: Handle VPMOVZXWD
2022-12-15 17:37:51 +00:00
lioncash
b4079cfea3
OpcodeDispatcher: Handle VPMOVZXBQ
2022-12-15 17:32:18 +00:00
lioncash
2b5570a910
OpcodeDispatcher: Handle VPMOVZXBD
2022-12-15 17:28:35 +00:00
lioncash
6bb0c5b24c
OpcodeDispatcher: Handle VPMOVZXBW
2022-12-15 17:18:49 +00:00
lioncash
bc31f98f16
OpcodeDispatcher: Move ExtendVectorElements impl to regular function
This can be reused for the AVX versions.
2022-12-15 17:11:02 +00:00
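The VPMOVSX*/VPMOVZX* family differs only in source and destination element size, signedness, and destination width. That is why one shared helper can serve both the SSE and the AVX forms. The Python below is a hypothetical reference model of that parameterisation. `extend_vector_elements` is illustrative and is not FEX's ExtendVectorElements. It also shows that the source needs only a fraction of the destination width, depending on the element ratio.

```python
def extend_vector_elements(src: bytes, src_size: int, dst_size: int,
                           signed: bool, dst_bits: int) -> bytes:
    """Sign- or zero-extend packed little-endian elements.

    src_size/dst_size are element sizes in bytes (e.g. 1 -> 2 for *BW, 4 -> 8
    for *DQ); dst_bits is 128 for the XMM forms and 256 for the YMM forms.
    """
    count = (dst_bits // 8) // dst_size  # number of destination elements
    needed = count * src_size            # bytes actually read from the source
    assert len(src) >= needed, "source operand too small"
    out = bytearray()
    for i in range(count):
        chunk = src[i * src_size:(i + 1) * src_size]
        value = int.from_bytes(chunk, "little", signed=signed)
        out += value.to_bytes(dst_size, "little", signed=signed)
    return bytes(out)
```

For example, the YMM form of VPMOVSXBW consumes 16 source bytes, while the YMM form of VPMOVSXBQ consumes only 4.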
Ryan Houdek
4b891d6147
Merge pull request #2245 from lioncash/split
OpcodeDispatcher: Move template impl to regular function where applicable
2022-12-14 18:18:43 -08:00
lioncash
58c3e20bd1
OpcodeDispatcher: Move template impl to regular function where applicable
Reduces the amount of code generated by the specializations.
Only targets the heavily templated ones, such as the generic op helper
functions.
2022-12-15 01:54:12 +00:00
Ryan Houdek
d5f3a091d0
Merge pull request #2216 from Sonicadvance1/32bit_host_thunk_support
Initial 32-bit host thunk feature support
2022-12-14 12:05:37 -08:00
Ryan Houdek
a14e03f35d
Update guest thunk lib register usage comment
2022-12-14 11:40:33 -08:00
Ryan Houdek
5c1789952e
GuestThunks: Disable stack protector on 32-bit
2022-12-14 11:29:19 -08:00
Ryan Houdek
f5809f24f7
GuestLibs: Fixes accidental guest lib setting
2022-12-14 11:29:19 -08:00
Ryan Houdek
122a9114a3
Thunks: 32-bit host library support
2022-12-14 11:29:19 -08:00
Ryan Houdek
d8f226b460
Support 32-bit thunks ABI
2022-12-14 11:29:19 -08:00
Ryan Houdek
7171c5ae39
Support 32-bit thunksdb
2022-12-14 11:29:19 -08:00
Ryan Houdek
798a78534a
Support Indirect thunk callback with mm0 as custom ABI
2022-12-14 11:24:18 -08:00
Ryan Houdek
ae4a04b560
Fix incorrect THUNK_ABI prefix
2022-12-14 11:24:18 -08:00
Ryan Houdek
1971c8d505
32bit host thunk lib config path support
2022-12-14 11:24:18 -08:00
Ryan Houdek
1ca356371d
Merge pull request #2242 from lioncash/round
OpcodeDispatcher: Handle VROUNDS{D, S}/VROUNDP{D, S}
2022-12-13 23:00:51 -08:00
lioncash
27ea6096a2
OpcodeDispatcher: Handle VROUNDSD
2022-12-14 06:41:36 +00:00
lioncash
2244dd9847
OpcodeDispatcher: Handle VROUNDSS
2022-12-14 06:34:58 +00:00
lioncash
ca2f4bd468
OpcodeDispatcher: Handle VROUNDPD
2022-12-14 06:28:17 +00:00