Ryan Houdek
2501ebc1cd
Merge pull request #2980 from lioncash/avg
...
Arm64/VectorOps: Remove redundant moves from SVE VURAvg if possible
2023-08-23 14:05:51 -07:00
Lioncache
8431ab43a0
Arm64/VectorOps: Remove redundant moves from SVE VURAvg if possible
...
If Dst and Vector1 alias one another, then we can perform the operation
without needing to move any data around.
2023-08-23 16:51:17 -04:00
Mai
4b06069c0d
Merge pull request #2972 from Sonicadvance1/optimize_scalar_mov
...
OpcodeDispatcher: Optimizes scalar movd/movq
2023-08-23 16:30:45 -04:00
Ryan Houdek
b646f4b781
Merge pull request #2978 from lioncash/misc
...
OpcodeDispatcher: Remove redundant moves from remaining AVX ops
2023-08-23 13:18:25 -07:00
Ryan Houdek
8836ab8988
OpcodeDispatcher: Optimizes scalar movd/movq
...
MMX and SSE versions are now optimal.
2023-08-23 12:56:11 -07:00
Lioncache
ea9747289a
OpcodeDispatcher: Remove redundant moves from remaining AVX ops
...
Zero-extension will occur automatically if necessary upon storing.
2023-08-23 15:31:59 -04:00
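The zero-extension argument in these commits can be illustrated with a small sketch (hypothetical names; the real StoreResult handles many operand widths): when a 128-bit result is stored into the tracked 256-bit AVX register, the store itself clears the upper lanes, so an explicit zeroing move beforehand adds nothing.

```cpp
#include <cstdint>

// Hypothetical sketch: a 256-bit AVX register as four 64-bit lanes.
struct Ymm {
  uint64_t lane[4];
};

// Storing a 128-bit result zero-extends to the full register width,
// so a separate "clear the upper half" move before this call is redundant.
void StoreResult128(Ymm& dst, uint64_t lo, uint64_t hi) {
  dst.lane[0] = lo;
  dst.lane[1] = hi;
  dst.lane[2] = 0;  // upper lanes are zeroed as part of the store itself
  dst.lane[3] = 0;
}
```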
Lioncache
735e2060a3
OpcodeDispatcher: Remove redundant moves from VPACKUSOP/VPACKSSOp
...
Zero-extension will occur automatically if necessary.
2023-08-23 15:09:57 -04:00
Ryan Houdek
86ef6fe48d
Merge pull request #2976 from lioncash/mov
...
OpcodeDispatcher: Remove unnecessary moves from AVX move ops where applicable
2023-08-23 12:02:05 -07:00
Lioncache
8e7e91d61f
OpcodeDispatcher: Remove redundant moves in VMOVLPOp
...
Zero-extension will automatically occur upon storing if necessary.
2023-08-23 14:23:57 -04:00
Lioncache
bcba3700c8
OpcodeDispatcher: Remove redundant moves from VMOVVectorNTOp
...
Zero-extension will automatically occur if necessary upon storing.
We can also join the SSE and AVX implementations.
2023-08-23 14:17:27 -04:00
Lioncache
7d05797e82
OpcodeDispatcher: Remove redundant moves from VMOVHPOp
...
Zero-extension will automatically occur if necessary upon storing.
2023-08-23 14:14:35 -04:00
Lioncache
c409ea78bc
OpcodeDispatcher: Remove unnecessary moves from VMOV{A,U}PS/VMOV{A,U}PD
...
Zero-extension will occur automatically upon storing if necessary.
We can also join the SSE and AVX implementations together.
2023-08-23 14:10:49 -04:00
Ryan Houdek
a62ba75ede
Merge pull request #2975 from lioncash/scalar
...
Arm64/ConversionOps: Add scalar support to Vector_FToI
2023-08-23 10:57:38 -07:00
Lioncache
4a7ef3da13
OpcodeDispatcher: Remove unnecessary moves in AVXVectorRound
...
Zero-extension will occur automatically if necessary upon storing.
2023-08-23 13:41:56 -04:00
Lioncache
990b70dcd6
OpcodeDispatcher: Use scalar rounding for scalar round instructions
2023-08-23 13:34:01 -04:00
Lioncache
393cea2e8b
Arm64/ConversionOps: Correct AdvSIMD round-to-nearest Vector_FToI case
...
This was previously using frinti, which rounds according to the host's
current rounding mode rather than always rounding to nearest.
2023-08-23 13:27:44 -04:00
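The distinction matters because frinti rounds using whatever mode the FPCR currently holds, while frintn always rounds to nearest (ties to even). A small host-side C++ sketch of the same hazard, using the C fenv rounding mode as a stand-in for the FPCR:

```cpp
#include <cfenv>
#include <cmath>

// frinti-like behavior: the result depends on the current rounding
// mode, not on a fixed round-to-nearest rule.
double round_in_mode(double x, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile double v = x;              // keep the call out of constant folding
  const double r = std::nearbyint(v); // honors the current rounding mode
  std::fesetround(saved);
  return r;
}
```

Under FE_TONEAREST this returns 2.0 for an input of 2.5, but under FE_UPWARD it returns 3.0 for the same input — exactly the host-mode dependence the frinti-to-frintn fix removes.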
Ryan Houdek
6624f50abf
Merge pull request #2974 from lioncash/extend
...
OpcodeDispatcher: Remove unnecessary moves from AVXExtendVectorElements
2023-08-23 10:25:56 -07:00
Ryan Houdek
cd1f401363
Merge pull request #2973 from lioncash/vfcmp
...
OpcodeDispatcher: Remove unnecessary moves in AVXVFCMPOp
2023-08-23 10:25:16 -07:00
Lioncache
e89321dc60
Arm64/ConversionOps: Add scalar support to Vector_FToI
...
This can be used for the scalar conversions instead of always using the
vector variants.
2023-08-23 13:23:07 -04:00
Lioncache
d99bcbf01b
OpcodeDispatcher: Remove unnecessary moves from AVXExtendVectorElements
...
Zero-extension will already occur if necessary upon storing.
We can also join the AVX and SSE implementations and remove some
template instantiations, now that the only differing behavior is gone.
2023-08-23 12:47:31 -04:00
Mai
0819338dbf
Merge pull request #2970 from Sonicadvance1/optimize_pminmax
...
Arm64: Optimize VFMin/VFMax
2023-08-23 12:39:37 -04:00
Lioncache
f516aed4b7
OpcodeDispatcher: Remove unnecessary moves in AVXVFCMPOp
...
Zero-extension will already occur if necessary upon storing.
2023-08-23 12:37:45 -04:00
Lioncache
3858e4124b
OpcodeDispatcher: Remove redundant moves in AVX blend special cases
...
Zero-extension will happen if necessary upon storing.
2023-08-23 00:08:33 -04:00
Mai
819fe110da
Merge pull request #2967 from Sonicadvance1/optimize_storeelement
...
OpcodeDispatcher: Optimize MOVHP{S,D}
2023-08-23 00:01:25 -04:00
Ryan Houdek
f0b1030e54
Arm64: Optimize VFMin/VFMax
...
The selects can be generated more optimally. This makes the 3DNow! and
SSE packed min/max operations optimal.
2023-08-22 20:59:11 -07:00
Ryan Houdek
adfd6787c0
Merge pull request #2969 from lioncash/insert
...
OpcodeDispatcher: Remove unnecessary moves from AVX inserts
2023-08-22 20:51:55 -07:00
Ryan Houdek
0ee2579a5e
OpcodeDispatcher: Optimize MOVHP{S,D}
...
Loads can turn into element loads.
Stores can turn into element stores.
These four instruction variants are now optimal.
2023-08-22 20:42:24 -07:00
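An element load here means writing one 64-bit lane of the vector in place rather than loading, inserting, and moving — the behavior of Arm64 `ld1 {v0.d}[1], [x0]`. A hypothetical C++ sketch of the lane semantics:

```cpp
#include <cstdint>

// A 128-bit SSE register modeled as two 64-bit lanes.
struct Xmm {
  uint64_t lane[2];
};

// MOVHPS/MOVHPD load: only the high lane is replaced from memory;
// the low lane is preserved (Arm64 ld1 single-element semantics).
void MovhpLoad(Xmm& dst, const uint64_t* mem) {
  dst.lane[1] = *mem;
}

// MOVHPS/MOVHPD store: only the high lane is written to memory
// (Arm64 st1 single-element semantics).
void MovhpStore(uint64_t* mem, const Xmm& src) {
  *mem = src.lane[1];
}
```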
Ryan Houdek
6aa2cab41c
IR: Implement support for vector store element
...
Matches Arm64 ST1 semantics.
2023-08-22 20:42:24 -07:00
Mai
bb2f7107cd
Merge pull request #2963 from Sonicadvance1/optimize_loadelement
...
OpcodeDispatcher: Optimize MOVLP{S,D} loads
2023-08-22 23:41:59 -04:00
Lioncache
c33f3ff8df
OpcodeDispatcher: Remove unnecessary moves from AVX inserts
...
We already zero-extend on stores when necessary.
2023-08-22 23:29:40 -04:00
Ryan Houdek
de239cde67
OpcodeDispatcher: Optimize MOVLP{S,D} loads
...
This now uses the new load element IR operation and makes these
instructions optimal.
FEAT_LRCPC3 will introduce instructions in the future that help these
operations under TSO emulation, but that extension doesn't exist in
hardware today.
2023-08-22 20:15:16 -07:00
Ryan Houdek
5e57ec94cf
IR: Implement support for vector load element
...
Matches Arm64 LD1 semantics.
2023-08-22 20:15:16 -07:00
Lioncache
2f5fae7677
OpcodeDispatcher: Remove unnecessary moves from AVX register shifts
...
Zero-extension will occur automatically when necessary upon storing.
2023-08-22 23:06:53 -04:00
Lioncache
8f8062eb4e
OpcodeDispatcher: Remove redundant moves from AVX immediate shifts
...
These zero-extensions will occur automatically when applicable.
2023-08-22 22:50:10 -04:00
Lioncache
e5f5629ffc
OpcodeDispatcher: Remove unnecessary moves from AVX conversion operations
...
These zero-extensions will already happen automatically if necessary.
2023-08-22 22:20:13 -04:00
Ryan Houdek
ead141fd90
Merge pull request #2962 from lioncash/variable
...
OpcodeDispatcher: Remove unnecessary moves from AVXVariableShiftImpl
2023-08-22 18:52:47 -07:00
Lioncache
2b071e282e
OpcodeDispatcher: Remove unnecessary moves from AVXVariableShiftImpl
...
We already zero-extend on a store if necessary.
2023-08-22 21:20:34 -04:00
Ryan Houdek
14144523f7
Merge pull request #2961 from lioncash/minpos
...
OpcodeDispatcher: Remove unnecessary move from VPHMINPOSUW
2023-08-22 18:19:46 -07:00
Ryan Houdek
1f2c5fc6c6
Merge pull request #2960 from lioncash/index
...
Arm64: Optimize SVE VInsElement
2023-08-22 18:19:13 -07:00
Lioncache
fa17d9fae9
OpcodeDispatcher: Remove unnecessary move from VPHMINPOSUW
...
We already do a zero-extend if necessary in StoreResult.
This also lets us unify both the SSE and AVX handling code.
2023-08-22 20:59:56 -04:00
Lioncache
398a70312e
Arm64: Optimize SVE VInsElement
...
This can be done without storing any data to memory, while also
reducing the number of instructions used.
Thanks to @dougallj for the optimization suggestions.
2023-08-22 20:36:29 -04:00
Ryan Houdek
d3ed9766e8
X86Tables: Optimize MOVLPD stores
...
Just use the full register size and store the lower bits.
2023-08-22 17:33:34 -07:00
Ryan Houdek
c795d42d21
OpcodeDispatcher: Optimize phminposuw
...
I would now consider the XMM version of this to be optimal.
Thanks to @rygorous for giving the idea for how to optimize this!
2023-08-22 16:29:06 -07:00
Ryan Houdek
bbf9cb9d52
IR: Implements new VRev32 and LoadNamedVectorConstant ops
...
VRev32 matches Arm64 semantics directly.
LoadNamedVectorConstant allows FEX to quickly load "named constants".
This will allow us to have specific hardcoded vector constant values
that we can load with an ldr(State)+ldr(Value) pair, and it will see
heavier use in the future.
This also enables a very simple future optimization: redundant loads of
these constants can be elided when they are used multiple times in the
same block. (Not implemented here.)
2023-08-22 16:29:06 -07:00
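The ldr(State)+ldr(Value) pattern can be sketched as a constant pool reachable through the CPU state — two dependent loads and no materialization sequence. The names below are hypothetical, not FEX's actual types:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct Vec128 {
  uint64_t lo, hi;
};

// Hypothetical named-constant indices; real FEX defines its own set.
enum NamedConstant : std::size_t {
  NAMED_SIGNMASK_PD,
  NAMED_ALL_ONES,
  NAMED_CONSTANT_COUNT,
};

struct ConstantPool {
  std::array<Vec128, NAMED_CONSTANT_COUNT> table{};
};

struct CpuState {
  const ConstantPool* constants;  // first load: ldr from the state pointer
};

// Two loads total: one to fetch the pool pointer out of the state,
// one to fetch the 128-bit value itself.
Vec128 LoadNamedVectorConstant(const CpuState& state, NamedConstant idx) {
  return state.constants->table[idx];  // second load: ldr of the value
}
```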
Mai
6c7933e7b1
Merge pull request #2957 from Sonicadvance1/optimize_pfnacc
...
OpcodeDispatcher: Optimize PFNACC
2023-08-22 10:10:11 -04:00
Ryan Houdek
364f084604
Merge pull request #2956 from Sonicadvance1/optimize_hsubp
...
OpcodeDispatcher: Optimize hsubp
2023-08-21 20:47:50 -07:00
Ryan Houdek
ffa8f1e3dc
Merge pull request #2955 from lioncash/sign
...
OpcodeDispatcher: Remove redundant move from VPSIGN
2023-08-21 20:47:41 -07:00
Ryan Houdek
ad6738939b
OpcodeDispatcher: Optimize PFNACC
...
It turns out this can be made even more optimal.
2023-08-21 20:38:18 -07:00
Ryan Houdek
dcb3e4ee86
OpcodeDispatcher: Optimize hsubp
...
This makes the SSE version optimal.
This dramatically improves the AVX version as well.
2023-08-21 20:22:14 -07:00
Lioncache
dbbe6288de
OpcodeDispatcher: Remove redundant move from VPSIGN
...
StoreResult will already zero-extend if the vector is 128-bit.
2023-08-21 23:11:07 -04:00