5456 Commits

Author SHA1 Message Date
lioncash
f57debeb29 OpcodeDispatcher: Handle VPACKSSWB 2022-12-17 02:42:09 +00:00
lioncash
4ac031df59 OpcodeDispatcher: Move PACKSSOp impl to a regular function
We can reuse it with AVX versions.
2022-12-17 02:13:26 +00:00
Ryan Houdek
78b53bfa49
Merge pull request #2266 from lioncash/arith
OpcodeDispatcher: Handle vector versions of VPSRA{D, W}
2022-12-16 18:05:05 -08:00
lioncash
c53fb7d697 OpcodeDispatcher: Handle VPSRAD (vector) 2022-12-17 01:51:42 +00:00
lioncash
a1a52450cb OpcodeDispatcher: Handle VPSRAW (vector) 2022-12-17 01:40:25 +00:00
Ryan Houdek
fabf453046
Merge pull request #2265 from lioncash/pextrw
OpcodeDispatcher: Handle remaining PEXTRW opcode
2022-12-16 17:35:33 -08:00
lioncash
68916ae2d9 OpcodeDispatcher: Move PSRAOp implementation to regular function
We can reuse this with the AVX variant.
2022-12-17 01:23:02 +00:00
lioncash
bf56b7b2da OpcodeDispatcher: Handle remaining PEXTRW opcode 2022-12-17 01:14:22 +00:00
Ryan Houdek
905eb015c0
Merge pull request #2264 from lioncash/addsub
OpcodeHandler: Handle VADDSUBP{D, S}
2022-12-16 16:54:35 -08:00
lioncash
858f13e76a OpcodeDispatcher: Handle VADDSUBPD 2022-12-17 00:41:25 +00:00
lioncash
169d7bbf50 OpcodeDispatcher: Handle VADDSUBPS 2022-12-17 00:29:29 +00:00
lioncash
31c8d4acac OpcodeDispatcher: Factor out ADDSUB impl into regular function
We can reuse this with the AVX versions.
2022-12-17 00:16:38 +00:00
lioncash
8291e600fa OpcodeDispatcher: Simplify ADDSUBPOp
Rather than looping over the vector elements, we can interleave the vectors
directly with IR ops.
2022-12-17 00:11:51 +00:00
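As a rough illustration of the ADDSUBPOp simplification described in the commit above (a sketch only, not FEX's actual IR code): ADDSUBPS/ADDSUBPD subtract in even-numbered lanes and add in odd-numbered lanes. Instead of walking the lanes one at a time, both whole-vector results can be computed and then interleaved. The function `addsub` and its list-based vectors are hypothetical stand-ins for the IR ops.

```python
def addsub(a, b):
    """Model ADDSUBPS/ADDSUBPD on lists of floats (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")

    # Two whole-vector operations, standing in for a vector add and a vector sub.
    sums = [x + y for x, y in zip(a, b)]
    diffs = [x - y for x, y in zip(a, b)]

    # Interleave: even lanes come from the difference, odd lanes from the sum.
    result = list(sums)
    result[0::2] = diffs[0::2]
    return result
```

For example, `addsub([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5])` takes lanes 0 and 2 from the subtraction and lanes 1 and 3 from the addition.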
Ryan Houdek
b26e4109fa
Merge pull request #2263 from lioncash/mull
OpcodeDispatcher: Handle VPMULL{D, W}
2022-12-16 15:40:53 -08:00
lioncash
dcc218a168 OpcodeDispatcher: Handle VPMULLD 2022-12-16 23:20:25 +00:00
lioncash
49b9b18b4a OpcodeDispatcher: Handle VPMULLW 2022-12-16 23:07:06 +00:00
Ryan Houdek
ad3bf189c0
Merge pull request #2262 from lioncash/rlog
OpcodeDispatcher: Handle vector variants of VPSRL{D, Q, W}
2022-12-16 14:56:56 -08:00
lioncash
47b21fa758 OpcodeDispatcher: Handle VPSRLQ (vector)
Also mark VPMOVMSKB as UNDEC, since it's not implemented yet.
2022-12-16 22:18:40 +00:00
lioncash
b6e82965df OpcodeDispatcher: Handle VPSRLD (vector) 2022-12-16 22:09:52 +00:00
lioncash
8dc8785340 OpcodeDispatcher: Handle VPSRLW (vector) 2022-12-16 22:00:53 +00:00
lioncash
c710ab60b0 OpcodeDispatcher: Factor out PSRLDOp implementation to regular function
This will also be used with the AVX variants of the shifts.
2022-12-16 21:43:11 +00:00
Ryan Houdek
c86ba7646c
Merge pull request #2259 from lioncash/pextr
OpcodeDispatcher: Handle VPEXTR{B, D, Q, W}/VEXTRACTPS
2022-12-16 11:25:38 -08:00
Ryan Houdek
c1e301a5ed Merge pull request #2257 from lioncash/limm
OpcodeDispatcher: Handle immediate variants of VPSLL{D, Q, W}
2022-12-16 11:23:16 -08:00
Mai
0ebb15c732
Merge pull request #2258 from Sonicadvance1/fixed_syscall_spill
Arm64: Inline Syscall spill optimization
2022-12-16 18:48:08 +00:00
lioncash
37c743b616 OpcodeDispatcher: Handle VPSLLQ (immediate) 2022-12-16 18:37:27 +00:00
lioncash
c810ae4018 OpcodeDispatcher: Handle VPSLLD (immediate) 2022-12-16 18:37:27 +00:00
lioncash
d3481c8271 OpcodeDispatcher: Handle VPSLLW (immediate) 2022-12-16 18:37:27 +00:00
lioncash
7c1e152441 OpcodeDispatcher: Extract PSLLI impl to regular function
This will be reused for the AVX variants.
2022-12-16 18:37:20 +00:00
lioncash
f11ac8674d OpcodeDispatcher: Handle VEXTRACTPS 2022-12-16 18:13:55 +00:00
Ryan Houdek
1fecf89bfc Arm64: Inline Syscall spill optimization
This was likely an issue with signals racing to the spill handler, an area
where we have fixed bugs over the past few months.

This means we no longer need to spill all SRA GPR registers; at most, we
need to spill three registers that intersect with syscall arguments.
2022-12-16 10:04:16 -08:00
lioncash
21ad0fa334 OpcodeDispatcher: Handle VPEXTRQ
VPEXTRQ shares an encoding slot with VPEXTRD and uses VEX.W to select the
element size, so we need to handle it a little differently.
2022-12-16 18:02:24 +00:00
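To make the VEX.W distinction in the commit above concrete, here is a minimal sketch of the architectural behaviour, not of FEX's handler. VPEXTRD and VPEXTRQ share one opcode; VEX.W picks a 32-bit or 64-bit element, and the immediate selects the lane. The function name `vpextr_dq` and the use of a plain Python integer for the 128-bit register are assumptions made for illustration.

```python
def vpextr_dq(xmm: int, imm8: int, vex_w: bool) -> int:
    """Extract one element from a 128-bit value (hypothetical helper).

    vex_w=False models VPEXTRD (32-bit element, lane = imm8[1:0]).
    vex_w=True  models VPEXTRQ (64-bit element, lane = imm8[0]).
    """
    elem_bits = 64 if vex_w else 32
    lanes = 128 // elem_bits          # 2 lanes for Q, 4 lanes for D
    index = imm8 & (lanes - 1)        # only the low index bits are used
    mask = (1 << elem_bits) - 1
    return (xmm >> (index * elem_bits)) & mask
```

The same immediate can therefore select different bits of the source depending on VEX.W, which is why a decoder cannot treat the two instructions as a single fixed-size extract.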
lioncash
3429815103 OpcodeDispatcher: Handle VPEXTRD 2022-12-16 17:33:01 +00:00
lioncash
559ff1582e OpcodeDispatcher: Handle VPEXTRW 2022-12-16 17:29:16 +00:00
lioncash
2e973ae079 OpcodeDispatcher: Handle VPEXTRB 2022-12-16 14:37:47 +00:00
Mai
1ab4471ef9
Merge pull request #2255 from Sonicadvance1/optimize_sve_spillfill
Arm64: Optimize SVE register spilling and filling
2022-12-16 13:19:05 +00:00
Ryan Houdek
40e073c8b2 Arm64: Optimize SVE register spilling and filling
Causes the dispatcher to drop from 4476 bytes to 3900 bytes for targets
that support 256-bit SVE.

This is done by significantly reducing the number of SVE load/store ops,
going from 8 instructions per 4 registers down to 2 instructions.

The reduction comes from switching from 1-register load/store instructions
to 4-register load/store instructions, which should significantly improve
performance on future SVE platforms.

Filling and spilling to the context still use the old code path
because SVE doesn't offer non-interleaving load/stores.
Spilling and filling on the stack is fine because we don't need to match
context state.
2022-12-16 00:25:50 -08:00
Ryan Houdek
58fab721b3
Merge pull request #2254 from lioncash/logical
OpcodeDispatcher: Handle vector variants of VPSLL{D, Q, W}
2022-12-15 22:52:05 -08:00
lioncash
8fac21e43f OpcodeDispatcher: Handle VPSLLQ (vector) 2022-12-16 06:34:00 +00:00
lioncash
d9a1e97bc1 OpcodeDispatcher: Handle VPSLLD (vector) 2022-12-16 06:34:00 +00:00
lioncash
848f1a2f78 OpcodeDispatcher: Handle VPSLLW (vector) 2022-12-16 06:34:00 +00:00
lioncash
7b8a46d934 OpcodeDispatcher: Move PSLL impl into a regular function 2022-12-16 06:33:58 +00:00
Mai
9a8852f9b6
Merge pull request #2250 from Sonicadvance1/optimize_spilling_filling
Arm64: Optimizing spilling and filling
2022-12-16 04:47:22 +00:00
Mai
65e8bf9d72
Merge pull request #2253 from Sonicadvance1/single_page_dispatcher
Arm64: Reduce dispatcher to 1 page
2022-12-16 04:44:55 +00:00
Ryan Houdek
344ec33ba5
Merge pull request #2252 from lioncash/fadd
Arm64/VectorOps: Simplify FADDP result merging
2022-12-15 20:37:11 -08:00
Ryan Houdek
5dc7dfacb3 Arm64: Reduce dispatcher to 1 page
We currently use only 2236 bytes, so there is no need for two pages.
Once #2250 is merged, we will use 1716 bytes.
2022-12-15 20:33:28 -08:00
lioncash
122aa8a69a Arm64/VectorOps: Simplify FADDP result merging
Keeps the implementation in sync with VAddP.
2022-12-16 04:19:46 +00:00
Ryan Houdek
8ce6c08152
Merge pull request #2251 from lioncash/hadd
OpcodeDispatcher: Handle VPHADDW/VPHADDD
2022-12-15 20:11:07 -08:00
Ryan Houdek
1beb791d52 Arm64: Optimizing spilling and filling
Just makes these a little more optimal when jumping out of the JIT.

Noticed these while working on the new emitter.
2022-12-15 20:04:16 -08:00
lioncash
27c0d4a9f5 OpcodeDispatcher: Handle VPHADDD 2022-12-16 03:28:57 +00:00
lioncash
dd4ba7562f OpcodeDispatcher: Handle VPHADDW 2022-12-16 03:28:57 +00:00