lioncash
f57debeb29
OpcodeDispatcher: Handle VPACKSSWB
2022-12-17 02:42:09 +00:00
lioncash
4ac031df59
OpcodeDispatcher: Move PACKSSOp impl to a regular function
We can reuse it with the AVX versions.
2022-12-17 02:13:26 +00:00
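As a sketch of why a shared, size-parameterized PACKSS helper is useful, the following C++ models PACKSSWB's signed saturation for a 16- or 32-byte vector. It is a plain-C++ model of the instruction semantics, not FEX's IR-emitting code, and the name `pack_ss_wb` is hypothetical. The lane loop reflects that the 256-bit VPACKSSWB packs each 128-bit lane independently rather than across the whole register.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Clamp a signed 16-bit value into the signed 8-bit range.
inline int8_t saturate_s16_to_s8(int16_t v) {
  return static_cast<int8_t>(std::clamp<int>(v, INT8_MIN, INT8_MAX));
}

// Hypothetical shared helper modeling PACKSSWB for a 16-byte (SSE/VEX.128)
// or 32-byte (VEX.256) vector. a and b each hold size_bytes / 2 words and
// out receives size_bytes bytes. Each 128-bit lane is packed on its own:
// its low 8 bytes come from a's words, its high 8 bytes from b's words.
void pack_ss_wb(const int16_t* a, const int16_t* b, int8_t* out,
                std::size_t size_bytes) {
  assert(size_bytes == 16 || size_bytes == 32);
  const std::size_t lanes = size_bytes / 16;
  for (std::size_t lane = 0; lane < lanes; ++lane) {
    for (std::size_t i = 0; i < 8; ++i) {
      out[lane * 16 + i] = saturate_s16_to_s8(a[lane * 8 + i]);
      out[lane * 16 + 8 + i] = saturate_s16_to_s8(b[lane * 8 + i]);
    }
  }
}
```

Passing 16 gives the SSE/VEX.128 behaviour and 32 the VEX.256 behaviour, which is the sort of reuse the commit message describes.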
Ryan Houdek
78b53bfa49
Merge pull request #2266 from lioncash/arith
OpcodeDispatcher: Handle vector versions of VPSRA{D, W}
2022-12-16 18:05:05 -08:00
lioncash
c53fb7d697
OpcodeDispatcher: Handle VPSRAD (vector)
2022-12-17 01:51:42 +00:00
lioncash
a1a52450cb
OpcodeDispatcher: Handle VPSRAW (vector)
2022-12-17 01:40:25 +00:00
Ryan Houdek
fabf453046
Merge pull request #2265 from lioncash/pextrw
OpcodeDispatcher: Handle remaining PEXTRW opcode
2022-12-16 17:35:33 -08:00
lioncash
68916ae2d9
OpcodeDispatcher: Move PSRAOp implementation to regular function
We can reuse this with the AVX variant.
2022-12-17 01:23:02 +00:00
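The "(vector)" shift variants take their count from a register rather than an immediate. Below is a minimal sketch of the PSRAW/VPSRAW count handling, assuming the documented x86 behaviour: the count comes from the low 64 bits of an XMM operand (even for the 256-bit form), and counts above 15 act like 15. `psraw_vector` is a hypothetical name that models the instruction, not FEX's PSRAOp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of PSRAW/VPSRAW with a register-supplied count.
// The count is the full low 64 bits of the count operand; anything above
// 15 behaves like 15, so each word becomes a copy of its sign bit.
void psraw_vector(const int16_t* src, uint64_t count, int16_t* out,
                  std::size_t num_words) {
  const unsigned shift = count > 15 ? 15u : static_cast<unsigned>(count);
  for (std::size_t i = 0; i < num_words; ++i) {
    // src[i] is promoted to int; >> on a negative int is an arithmetic
    // shift in C++20 (and on mainstream compilers before that).
    out[i] = static_cast<int16_t>(src[i] >> shift);
  }
}
```

Clamping the count instead of masking it is the key difference from a naive host shift, where an oversized count would be undefined or wrap.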
lioncash
bf56b7b2da
OpcodeDispatcher: Handle remaining PEXTRW opcode
2022-12-17 01:14:22 +00:00
Ryan Houdek
905eb015c0
Merge pull request #2264 from lioncash/addsub
OpcodeHandler: Handle VADDSUBP{D, S}
2022-12-16 16:54:35 -08:00
lioncash
858f13e76a
OpcodeDispatcher: Handle VADDSUBPD
2022-12-17 00:41:25 +00:00
lioncash
169d7bbf50
OpcodeDispatcher: Handle VADDSUBPS
2022-12-17 00:29:29 +00:00
lioncash
31c8d4acac
OpcodeDispatcher: Factor out ADDSUB impl into regular function
We can reuse this with the AVX versions.
2022-12-17 00:16:38 +00:00
lioncash
8291e600fa
OpcodeDispatcher: Simplify ADDSUBPOp
Rather than looping over the vectors, we can interleave them together
directly with IR ops.
2022-12-17 00:11:51 +00:00
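ADDSUBPS subtracts in the even lanes and adds in the odd lanes. One way to build that from whole-vector operations instead of a per-element loop is sketched below, modeled on the AArch64 NEON REV64 and TRN1 permutes on 32-bit elements. The helper names are hypothetical, and this illustrates the idea rather than the exact IR sequence the commit uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;

// Each helper stands in for a single whole-vector op.
Vec4 vadd(const Vec4& a, const Vec4& b) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = a[i] + b[i];
  return r;
}

Vec4 vsub(const Vec4& a, const Vec4& b) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = a[i] - b[i];
  return r;
}

// REV64 on 32-bit elements: swap each adjacent pair -> {x1, x0, x3, x2}.
Vec4 rev64(const Vec4& x) { return {x[1], x[0], x[3], x[2]}; }

// TRN1: even elements of n interleaved with even elements of m
// -> {n0, m0, n2, m2}.
Vec4 trn1(const Vec4& n, const Vec4& m) { return {n[0], m[0], n[2], m[2]}; }

// ADDSUBPS: even lanes get a - b, odd lanes get a + b.
Vec4 addsubps(const Vec4& a, const Vec4& b) {
  const Vec4 sub = vsub(a, b);
  const Vec4 add = vadd(a, b);
  // rev64(add) moves add1 and add3 into even slots, so TRN1 yields
  // {sub0, add1, sub2, add3}.
  return trn1(sub, rev64(add));
}
```

Both arithmetic results are computed in full and one permute merges them, so no per-lane branching is needed.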
Ryan Houdek
b26e4109fa
Merge pull request #2263 from lioncash/mull
OpcodeDispatcher: Handle VPMULL{D, B}
2022-12-16 15:40:53 -08:00
lioncash
dcc218a168
OpcodeDispatcher: Handle VPMULLD
2022-12-16 23:20:25 +00:00
lioncash
49b9b18b4a
OpcodeDispatcher: Handle VPMULLW
2022-12-16 23:07:06 +00:00
Ryan Houdek
ad3bf189c0
Merge pull request #2262 from lioncash/rlog
OpcodeDispatcher: Handle vector variants of VPSRL{D, Q, W}
2022-12-16 14:56:56 -08:00
lioncash
47b21fa758
OpcodeDispatcher: Handle VPSRLQ (vector)
Also mark VPMOVMSKB as UNDEC, since it's not implemented yet.
2022-12-16 22:18:40 +00:00
lioncash
b6e82965df
OpcodeDispatcher: Handle VPSRLD (vector)
2022-12-16 22:09:52 +00:00
lioncash
8dc8785340
OpcodeDispatcher: Handle VPSRLW (vector)
2022-12-16 22:00:53 +00:00
lioncash
c710ab60b0
OpcodeDispatcher: Factor out PSRLDOp implementation to regular function
This will also be used with the AVX variants of the shifts.
2022-12-16 21:43:11 +00:00
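Logical shifts differ from arithmetic ones when the count is too large: PSRLW/PSRLD/PSRLQ clear the destination once the count exceeds 15, 31, or 63 respectively, instead of saturating. Below is a sketch of one helper templated on element width, in the spirit of factoring a single implementation out for the W, D, and Q forms; `psrl_vector` is hypothetical and models instruction semantics, not FEX IR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of PSRLW/PSRLD/PSRLQ with a register-supplied count,
// shared across element widths by templating on the element type.
// Counts at or above the element width clear every element.
template <typename T>
void psrl_vector(const T* src, uint64_t count, T* out, std::size_t n) {
  constexpr uint64_t kBits = sizeof(T) * 8;
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = count >= kBits ? T{0} : static_cast<T>(src[i] >> count);
  }
}
```

The explicit width check matters because shifting a host integer by its full width or more is undefined behaviour in C++.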
Ryan Houdek
c86ba7646c
Merge pull request #2259 from lioncash/pextr
OpcodeDispatcher: Handle VPEXTR{B, D, Q, W}/VEXTRACTPS
2022-12-16 11:25:38 -08:00
Ryan Houdek
c1e301a5ed
Merge pull request #2257 from lioncash/limm
OpcodeDispatcher: Handle immediate variants of VPSLL{D, Q, W}
2022-12-16 11:23:16 -08:00
Mai
0ebb15c732
Merge pull request #2258 from Sonicadvance1/fixed_syscall_spill
Arm64: Inline Syscall spill optimization
2022-12-16 18:48:08 +00:00
lioncash
37c743b616
OpcodeDispatcher: Handle VPSLLQ (immediate)
2022-12-16 18:37:27 +00:00
lioncash
c810ae4018
OpcodeDispatcher: Handle VPSLLD (immediate)
2022-12-16 18:37:27 +00:00
lioncash
d3481c8271
OpcodeDispatcher: Handle VPSLLW (immediate)
2022-12-16 18:37:27 +00:00
lioncash
7c1e152441
OpcodeDispatcher: Extract PSLLI impl to regular function
This will be reused for the AVX variants.
2022-12-16 18:37:20 +00:00
lioncash
f11ac8674d
OpcodeDispatcher: Handle VEXTRACTPS
2022-12-16 18:13:55 +00:00
Ryan Houdek
1fecf89bfc
Arm64: Inline Syscall spill optimization
This was likely an issue with signals racing to the spill handler, a class
of bugs we have fixed over the past few months.
This means we no longer need to spill all SRA GPRs; at most, we need to
spill three registers that intersect with syscall arguments.
2022-12-16 10:04:16 -08:00
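The spill rule in this commit amounts to a set intersection: only SRA-allocated host registers that overlap the syscall argument registers need spilling. Below is a minimal sketch, assuming the Linux AArch64 convention of passing syscall arguments in x0-x5. The helper names are hypothetical, and the SRA register set passed in is a stand-in; FEX's actual allocation is not shown here.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Bit n set means host register xn is in the set.
using RegMask = std::bitset<32>;

// Linux AArch64 passes syscall arguments in x0-x5.
RegMask syscall_arg_regs() {
  RegMask m;
  for (std::size_t r = 0; r <= 5; ++r) m.set(r);
  return m;
}

// Only SRA registers that intersect the syscall argument registers need
// to be spilled; every other SRA register can stay live across the call.
RegMask regs_to_spill(const RegMask& sra_regs) {
  return sra_regs & syscall_arg_regs();
}
```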
lioncash
21ad0fa334
OpcodeDispatcher: Handle VPEXTRQ
VPEXTRQ shares an encoding with VPEXTRD and uses VEX.W to select the
operand size, so we need to handle it a little differently.
2022-12-16 18:02:24 +00:00
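Below is a sketch of the shared encoding, assuming 64-bit mode only: opcode 0x16 in the VEX 0F3A map with a 66 prefix is VPEXTRD when VEX.W is 0 and VPEXTRQ when VEX.W is 1, and the immediate selects the element index. `decode_vpextr_16`, `ExtractInfo`, and `extract_element` are hypothetical names, not FEX's decoder tables.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

struct ExtractInfo {
  std::string_view mnemonic;
  unsigned element_bytes;  // 4 for VPEXTRD, 8 for VPEXTRQ
  unsigned index;          // element index taken from imm8
};

// VEX.W picks between the two instructions sharing this encoding.
ExtractInfo decode_vpextr_16(bool vex_w, uint8_t imm8) {
  if (vex_w) {
    // VPEXTRQ: 64-bit element, imm8[0] selects one of two qwords.
    return {"VPEXTRQ", 8, imm8 & 0x1u};
  }
  // VPEXTRD: 32-bit element, imm8[1:0] selects one of four dwords.
  return {"VPEXTRD", 4, imm8 & 0x3u};
}

// Read the selected element from a 16-byte register image, assembling the
// bytes little-endian so the result does not depend on host byte order.
uint64_t extract_element(const uint8_t (&xmm)[16], const ExtractInfo& info) {
  uint64_t value = 0;
  const unsigned base = info.index * info.element_bytes;
  for (unsigned i = 0; i < info.element_bytes; ++i) {
    value |= static_cast<uint64_t>(xmm[base + i]) << (8 * i);
  }
  return value;
}
```

Because one opcode slot covers both sizes, the decoder has to look at VEX.W before it knows the element width, which is the extra step the commit message refers to.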
lioncash
3429815103
OpcodeDispatcher: Handle VPEXTRD
2022-12-16 17:33:01 +00:00
lioncash
559ff1582e
OpcodeDispatcher: Handle VPEXTRW
2022-12-16 17:29:16 +00:00
lioncash
2e973ae079
OpcodeDispatcher: Handle VPEXTRB
2022-12-16 14:37:47 +00:00
Mai
1ab4471ef9
Merge pull request #2255 from Sonicadvance1/optimize_sve_spillfill
Arm64: Optimize SVE register spilling and filling
2022-12-16 13:19:05 +00:00
Ryan Houdek
40e073c8b2
Arm64: Optimize SVE register spilling and filling
Shrinks the dispatcher from 4476 bytes to 3900 bytes on targets that
support 256-bit SVE.
This is done by significantly reducing the number of SVE load/store ops,
going from 8 instructions per 4 registers down to 2.
We achieve this by switching from single-register load/store instructions
to 4-register load/store instructions, which should significantly improve
performance on future SVE platforms.
Filling and spilling to the context still uses the old code path,
because SVE doesn't offer non-interleaving multi-register load/stores.
Spilling and filling on the stack is fine, because the stack layout
doesn't need to match the context state.
2022-12-16 00:25:50 -08:00
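Assume the 4-register instructions are the SVE structure load/stores (ST4D/LD4D); the commit message does not name them. Under that assumption, their interleaved memory layout explains why they work for the stack but not for the context. The model below uses hypothetical helper names, 64-bit elements, and 256-bit registers. It shows that an interleaved store followed by the matching load restores every register, while the data in memory differs from a per-register contiguous layout.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kElems = 4;  // 256-bit register / 64-bit elements
using Reg = std::array<uint64_t, kElems>;
using RegGroup = std::array<Reg, 4>;
using Memory = std::array<uint64_t, 4 * kElems>;

// Interleaving store (modeled on ST4D): element i of register r lands in
// slot 4 * i + r.
Memory store4_interleaved(const RegGroup& regs) {
  Memory mem{};
  for (std::size_t i = 0; i < kElems; ++i)
    for (std::size_t r = 0; r < 4; ++r) mem[4 * i + r] = regs[r][i];
  return mem;
}

// De-interleaving load (modeled on LD4D): the exact inverse of the above.
RegGroup load4_interleaved(const Memory& mem) {
  RegGroup regs{};
  for (std::size_t i = 0; i < kElems; ++i)
    for (std::size_t r = 0; r < 4; ++r) regs[r][i] = mem[4 * i + r];
  return regs;
}

// One single-register store per register: register r occupies slots
// r * kElems .. r * kElems + kElems - 1, like a per-register context layout.
Memory store_contiguous(const RegGroup& regs) {
  Memory mem{};
  for (std::size_t r = 0; r < 4; ++r)
    for (std::size_t i = 0; i < kElems; ++i) mem[r * kElems + i] = regs[r][i];
  return mem;
}
```

A stack spill only has to round-trip through the matching load, so the scrambled layout is harmless there. Code that reads the context expects each register's elements to be contiguous, so it cannot use the interleaved form.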
Ryan Houdek
58fab721b3
Merge pull request #2254 from lioncash/logical
OpcodeDispatcher: Handle vector variants of VPSLL{D, Q, W}
2022-12-15 22:52:05 -08:00
lioncash
8fac21e43f
OpcodeDispatcher: Handle VPSLLQ (vector)
2022-12-16 06:34:00 +00:00
lioncash
d9a1e97bc1
OpcodeDispatcher: Handle VPSLLD (vector)
2022-12-16 06:34:00 +00:00
lioncash
848f1a2f78
OpcodeDispatcher: Handle VPSLLW (vector)
2022-12-16 06:34:00 +00:00
lioncash
7b8a46d934
OpcodeDispatcher: Move PSLL impl into a regular function
2022-12-16 06:33:58 +00:00
Mai
9a8852f9b6
Merge pull request #2250 from Sonicadvance1/optimize_spilling_filling
Arm64: Optimizing spilling and filling
2022-12-16 04:47:22 +00:00
Mai
65e8bf9d72
Merge pull request #2253 from Sonicadvance1/single_page_dispatcher
Arm64: Reduce dispatcher to 1 page
2022-12-16 04:44:55 +00:00
Ryan Houdek
344ec33ba5
Merge pull request #2252 from lioncash/fadd
Arm64/VectorOps: Simplify FADDP result merging
2022-12-15 20:37:11 -08:00
Ryan Houdek
5dc7dfacb3
Arm64: Reduce dispatcher to 1 page
We currently use only 2236 bytes, so there is no need for two pages.
Once #2250 is merged, we will use 1716 bytes.
2022-12-15 20:33:28 -08:00
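The page-count arithmetic is sketched below, assuming 4 KiB host pages; 16 KiB or 64 KiB pages would change the thresholds. `pages_needed` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Assumes 4 KiB pages; pass a different page_size for other hosts.
constexpr std::size_t kPageSize = 4096;

// Round a code size up to a whole number of pages.
constexpr std::size_t pages_needed(std::size_t code_bytes,
                                   std::size_t page_size = kPageSize) {
  return (code_bytes + page_size - 1) / page_size;
}

static_assert(pages_needed(2236) == 1, "current dispatcher fits one page");
static_assert(pages_needed(1716) == 1, "post-#2250 dispatcher fits one page");
```

By the same arithmetic, the 4476-byte figure quoted in the SVE spilling commit would need two 4 KiB pages, while 3900 bytes fits in one.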
lioncash
122aa8a69a
Arm64/VectorOps: Simplify FADDP result merging
Keeps the implementation similar to, and in sync with, VAddP.
2022-12-16 04:19:46 +00:00
Ryan Houdek
8ce6c08152
Merge pull request #2251 from lioncash/hadd
OpcodeDispatcher: Handle VPHADDW/VPHADDD
2022-12-15 20:11:07 -08:00
Ryan Houdek
1beb791d52
Arm64: Optimizing spilling and filling
Just makes these slightly more efficient when jumping out of the JIT.
Noticed these while working on the new emitter.
2022-12-15 20:04:16 -08:00
lioncash
27c0d4a9f5
OpcodeDispatcher: Handle VPHADDD
2022-12-16 03:28:57 +00:00
lioncash
dd4ba7562f
OpcodeDispatcher: Handle VPHADDW
2022-12-16 03:28:57 +00:00