121 Commits

Author SHA1 Message Date
Ryan Houdek
2501ebc1cd
Merge pull request #2980 from lioncash/avg
Arm64/VectorOps: Remove redundant moves from SVE VURAvg if possible
2023-08-23 14:05:51 -07:00
Lioncache
8431ab43a0 Arm64/VectorOps: Remove redundant moves from SVE VURAvg if possible
If Dst and Vector1 alias one another, then we can perform the operation
without needing to move any data around.
2023-08-23 16:51:17 -04:00
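
The pattern behind this commit: SVE URHADD is a destructive encoding, meaning
the destination register must also be the first source operand, so the JIT
only needs a copy when Dst and Vector1 differ. A minimal reference of the
per-lane semantics (a sketch, assuming VURAvg implements the usual unsigned
rounding average, i.e. x86 PAVGB-style):

    #include <cstddef>
    #include <cstdint>

    // Unsigned rounding halving add: result = (a + b + 1) >> 1 per lane.
    // SVE's destructive URHADD computes this in place in the destination.
    void URAvg8(uint8_t* dst, const uint8_t* a, const uint8_t* b, size_t lanes) {
      for (size_t i = 0; i < lanes; ++i) {
        dst[i] = static_cast<uint8_t>((uint16_t(a[i]) + uint16_t(b[i]) + 1) >> 1);
      }
    }
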
Mai
4b06069c0d
Merge pull request #2972 from Sonicadvance1/optimize_scalar_mov
OpcodeDispatcher: Optimizes scalar movd/movq
2023-08-23 16:30:45 -04:00
Ryan Houdek
b646f4b781
Merge pull request #2978 from lioncash/misc
OpcodeDispatcher: Remove redundant moves from remaining AVX ops
2023-08-23 13:18:25 -07:00
Ryan Houdek
8836ab8988 OpcodeDispatcher: Optimizes scalar movd/movq
MMX and SSE versions are now optimal.
2023-08-23 12:56:11 -07:00
Lioncache
ea9747289a OpcodeDispatcher: Remove redundant moves from remaining AVX ops
Zero-extension will occur automatically if necessary upon storing.
2023-08-23 15:31:59 -04:00
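
Several commits below repeat this zero-extension rationale, so one sketch for
all of them: when a VEX-encoded 128-bit result is stored back to a guest
register, bits [255:128] are zeroed as part of the store itself, making any
explicit zeroing move beforehand dead. A toy model of that store path
(illustrative names only, not FEX's actual API):

    #include <array>
    #include <cstdint>
    #include <cstring>

    struct YmmReg {
      std::array<uint64_t, 4> q{};
    };

    // Storing a 128-bit result zero-extends into the upper YMM lanes, so a
    // separate "zero the upper half" move before this call is redundant.
    void StoreResult(YmmReg& guest, const uint64_t* result, unsigned opBytes) {
      std::memcpy(guest.q.data(), result, opBytes);
      if (opBytes == 16) {
        guest.q[2] = 0;
        guest.q[3] = 0;
      }
    }
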
Lioncache
735e2060a3 OpcodeDispatcher: Remove redundant moves from VPACKUSOP/VPACKSSOp
Zero-extension will occur automatically if necessary.
2023-08-23 15:09:57 -04:00
Ryan Houdek
86ef6fe48d
Merge pull request #2976 from lioncash/mov
OpcodeDispatcher: Remove unnecessary moves from AVX move ops where applicable
2023-08-23 12:02:05 -07:00
Lioncache
8e7e91d61f OpcodeDispatcher: Remove redundant moves in VMOVLPOp
Zero-extension will automatically occur upon storing if necessary.
2023-08-23 14:23:57 -04:00
Lioncache
bcba3700c8 OpcodeDispatcher: Remove redundant moves from VMOVVectorNTOp
Zero-extension will automatically occur if necessary upon storing.

We can also join the SSE and AVX implementations.
2023-08-23 14:17:27 -04:00
Lioncache
7d05797e82 OpcodeDispatcher: Remove redundant moves from VMOVHPOp
Zero-extension will automatically occur if necessary upon storing.
2023-08-23 14:14:35 -04:00
Lioncache
c409ea78bc OpcodeDispatcher: Remove unnecessary moves from VMOV{A,U}PS/VMOV{A,U}PD
Zero-extension will occur automatically upon storing if necessary.

We can also join the SSE and AVX implementations together.
2023-08-23 14:10:49 -04:00
Ryan Houdek
a62ba75ede
Merge pull request #2975 from lioncash/scalar
Arm64/ConversionOps: Add scalar support to Vector_FToI
2023-08-23 10:57:38 -07:00
Lioncache
4a7ef3da13 OpcodeDispatcher: Remove unnecessary moves in AVXVectorRound
Zero-extension will occur automatically if necessary upon storing.
2023-08-23 13:41:56 -04:00
Lioncache
990b70dcd6 OpcodeDispatcher: Use scalar rounding for scalar round instructions
2023-08-23 13:34:01 -04:00
Lioncache
393cea2e8b Arm64/ConversionOps: Correct AdvSIMD round-to-nearest Vector_FToI case
This was previously using frinti, which uses the host rounding mode, rather
than round to nearest.
2023-08-23 13:27:44 -04:00
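
Why frinti was wrong here: frinti rounds using the current (host) rounding
mode, while the instruction being emulated requires round-to-nearest, which
frintn provides regardless of mode. A small standalone demonstration of the
difference:

    #include <cfenv>
    #include <cmath>
    #include <cstdio>

    int main() {
      // std::nearbyint mirrors frinti: it honors the current rounding mode.
      std::fesetround(FE_UPWARD);
      std::printf("%.1f\n", std::nearbyint(2.25));  // 3.0 under FE_UPWARD
      std::fesetround(FE_TONEAREST);
      std::printf("%.1f\n", std::nearbyint(2.25));  // 2.0, what frintn always gives
      return 0;
    }
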
Ryan Houdek
6624f50abf
Merge pull request #2974 from lioncash/extend
OpcodeDispatcher: Remove unnecessary moves from AVXExtendVectorElements
2023-08-23 10:25:56 -07:00
Ryan Houdek
cd1f401363
Merge pull request #2973 from lioncash/vfcmp
OpcodeDispatcher: Remove unnecessary moves in AVXVFCMPOp
2023-08-23 10:25:16 -07:00
Lioncache
e89321dc60 Arm64/ConversionOps: Add scalar support to Vector_FToI
This can be used for the scalar conversions instead of always using the
vector variants.
2023-08-23 13:23:07 -04:00
Lioncache
d99bcbf01b OpcodeDispatcher: Remove unnecessary moves from AVXExtendVectorElements
Zero-extension will already occur if necessary upon storing.

We can also join the AVX and SSE implementations together and get
rid of some template instantiations, now that the only differing
behavior has been removed.
2023-08-23 12:47:31 -04:00
Mai
0819338dbf
Merge pull request #2970 from Sonicadvance1/optimize_pminmax
Arm64: Optimize VFMin/VFMax
2023-08-23 12:39:37 -04:00
Lioncache
f516aed4b7 OpcodeDispatcher: Remove unnecessary moves in AVXVFCMPOp
Zero-extension will already occur if necessary upon storing.
2023-08-23 12:37:45 -04:00
Lioncache
3858e4124b OpcodeDispatcher: Remove redundant moves in AVX blend special cases
Zero-extension will happen if necessary upon storing.
2023-08-23 00:08:33 -04:00
Mai
819fe110da
Merge pull request #2967 from Sonicadvance1/optimize_storeelement
OpcodeDispatcher: Optimize MOVHP{S,D}
2023-08-23 00:01:25 -04:00
Ryan Houdek
f0b1030e54 Arm64: Optimize VFMin/VFMax
We can emit more optimal selects. This makes 3DNow! and SSE packed
min/max operations optimal.
2023-08-22 20:59:11 -07:00
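
For context on the selects: x86 MINPS/MAXPS are defined as a per-lane select,
not an IEEE min/max. MINPS returns the second operand whenever the comparison
is false, including NaN inputs and +/-0 pairs, so ARM64's fmin/fmax (which
propagate NaNs per IEEE rules) can't be substituted directly. A reference of
the semantics the lowering must preserve:

    #include <cstddef>

    // Per-lane x86 MINPS: NaN in either operand, or equal values (e.g. +0
    // vs -0), yield b[i] -- exactly "a < b ? a : b".
    void MinPs(float* dst, const float* a, const float* b, size_t lanes) {
      for (size_t i = 0; i < lanes; ++i) {
        dst[i] = (a[i] < b[i]) ? a[i] : b[i];
      }
    }
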
Ryan Houdek
adfd6787c0
Merge pull request #2969 from lioncash/insert
OpcodeDispatcher: Remove unnecessary moves from AVX inserts
2023-08-22 20:51:55 -07:00
Ryan Houdek
0ee2579a5e OpcodeDispatcher: Optimize MOVHP{S,D}
Loads can turn into element loads.
Stores can turn into element stores.

These four instruction variants are now optimal.
2023-08-22 20:42:24 -07:00
Ryan Houdek
6aa2cab41c IR: Implement support for vector store element
Matches ARM64 ST1 semantics
2023-08-22 20:42:24 -07:00
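
The two commits above pair up: MOVHPS/MOVHPD with a memory operand touch only
the upper 64 bits of the XMM register, which is exactly what the ARM64 lane
forms LD1 {Vn.D}[1], [Xm] and ST1 {Vn.D}[1], [Xm] do, and what the new
load/store-element IR ops expose. Reference semantics:

    #include <cstdint>
    #include <cstring>

    struct Xmm {
      uint64_t lo, hi;
    };

    // MOVHPS load: insert 64 bits into element 1, leaving element 0 intact.
    void MovHpsLoad(Xmm& reg, const void* mem) {
      std::memcpy(&reg.hi, mem, sizeof(reg.hi));
    }

    // MOVHPS store: write element 1 to memory; the register is unchanged.
    void MovHpsStore(void* mem, const Xmm& reg) {
      std::memcpy(mem, &reg.hi, sizeof(reg.hi));
    }
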
Mai
bb2f7107cd
Merge pull request #2963 from Sonicadvance1/optimize_loadelement
OpcodeDispatcher: Optimize MOVLP{S,D} loads
2023-08-22 23:41:59 -04:00
Lioncache
c33f3ff8df OpcodeDispatcher: Remove unnecessary moves from AVX inserts
We already zero-extend on stores when necessary.
2023-08-22 23:29:40 -04:00
Ryan Houdek
de239cde67 OpcodeDispatcher: Optimize MOVLP{S,D} loads
This now uses the new load element IR operation and makes these
instructions optimal.

LRCPC3 will introduce instructions in the future that help TSO emulation
for these operations, but hardware support doesn't exist today.
2023-08-22 20:15:16 -07:00
Ryan Houdek
5e57ec94cf IR: Implement support for vector load element
Matches Arm64 LD1 semantics.
2023-08-22 20:15:16 -07:00
Lioncache
2f5fae7677 OpcodeDispatcher: Remove unnecessary moves from AVX register shifts
Zero-extension will occur automatically when necessary upon storing.
2023-08-22 23:06:53 -04:00
Lioncache
8f8062eb4e OpcodeDispatcher: Remove redundant moves from AVX immediate shifts
These zero-extensions will occur automatically when applicable.
2023-08-22 22:50:10 -04:00
Lioncache
e5f5629ffc OpcodeDispatcher: Remove unnecessary moves from AVX conversion operations
These zero-extensions will already happen automatically if necessary.
2023-08-22 22:20:13 -04:00
Ryan Houdek
ead141fd90
Merge pull request #2962 from lioncash/variable
OpcodeDispatcher: Remove unnecessary moves from AVXVariableShiftImpl
2023-08-22 18:52:47 -07:00
Lioncache
2b071e282e OpcodeDispatcher: Remove unnecessary moves from AVXVariableShiftImpl
We already zero-extend on a store if necessary.
2023-08-22 21:20:34 -04:00
Ryan Houdek
14144523f7
Merge pull request #2961 from lioncash/minpos
OpcodeDispatcher: Remove unnecessary move from VPHMINPOSUW
2023-08-22 18:19:46 -07:00
Ryan Houdek
1f2c5fc6c6
Merge pull request #2960 from lioncash/index
Arm64: Optimize SVE VInsElement
2023-08-22 18:19:13 -07:00
Lioncache
fa17d9fae9 OpcodeDispatcher: Remove unnecessary move from VPHMINPOSUW
We already do a zero-extend if necessary in StoreResult.

This also lets us unify both the SSE and AVX handling code.
2023-08-22 20:59:56 -04:00
Lioncache
398a70312e Arm64: Optimize SVE VInsElement
This can be done without storing any data to memory, while also
reducing the number of instructions used.

Thanks to @dougallj for the optimization suggestions.
2023-08-22 20:36:29 -04:00
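
Reference semantics of VInsElement, for orientation (the commit's SVE lowering
itself isn't shown here): copy a single element from the source vector into
one slot of the destination, leaving every other destination element intact.
The win is doing this entirely in registers instead of bouncing through
memory.

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // dst[dstIdx] = src[srcIdx]; all other dst elements are preserved.
    void VInsElement(uint8_t* dst, const uint8_t* src, size_t elemBytes,
                     size_t dstIdx, size_t srcIdx) {
      std::memcpy(dst + dstIdx * elemBytes, src + srcIdx * elemBytes, elemBytes);
    }
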
Ryan Houdek
d3ed9766e8 X86Tables: Optimize MOVLPD stores
Just use the full register size and store the lower bits.
2023-08-22 17:33:34 -07:00
Ryan Houdek
c795d42d21 OpcodeDispatcher: Optimize phminposuw
I would now consider the XMM version of this to be optimal.

Thanks to @rygorous for giving the idea for how to optimize this!
2023-08-22 16:29:06 -07:00
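
What PHMINPOSUW has to compute (the ARM64 lowering isn't spelled out in the
commit, so this is just the x86 semantics for reference): find the minimum
unsigned 16-bit lane, write it to lane 0, write its index to lane 1, and zero
the remaining lanes.

    #include <cstdint>

    void Phminposuw(uint16_t dst[8], const uint16_t src[8]) {
      uint16_t minVal = src[0];
      uint16_t minIdx = 0;
      for (uint16_t i = 1; i < 8; ++i) {
        if (src[i] < minVal) {  // Strict '<' keeps the lowest index on ties.
          minVal = src[i];
          minIdx = i;
        }
      }
      dst[0] = minVal;
      dst[1] = minIdx;
      for (int i = 2; i < 8; ++i) {
        dst[i] = 0;
      }
    }
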
Ryan Houdek
bbf9cb9d52 IR: Implements new VRev32 and LoadNamedVectorConstant ops
VRev32 matches Arm64 semantics directly.
LoadNamedVectorConstant allows FEX to quickly load "named constants".
This lets us keep specific hardcoded vector constant values that we can
load with an ldr(State)+ldr(Value) pair, and it will see heavier use in
the future.
This also enables a very simple future optimization: redundant loads of
these constants can be eliminated when they are used multiple times in the
same block (not implemented here).
2023-08-22 16:29:06 -07:00
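
A sketch of the LoadNamedVectorConstant idea (layout and names here are
illustrative, not FEX's actual structures): the named constants sit in a table
reachable from the JIT state pointer, so materializing one is two dependent
loads -- ldr of the table pointer from state, then ldr of the 128-bit value --
rather than a sequence of moves building the constant inline.

    #include <cstdint>

    struct alignas(16) Vec128 {
      uint64_t lo, hi;
    };

    // Hypothetical constant names; real entries would be whatever values the
    // dispatcher needs hardcoded.
    enum NamedVectorConstant : uint32_t {
      NAMED_VECTOR_ZERO = 0,
      NAMED_VECTOR_INCREMENTAL_U16,
      NAMED_VECTOR_MAX,
    };

    struct ThreadState {
      const Vec128* NamedConstants;  // ldr (State) fetches this pointer...
    };

    Vec128 LoadNamedConstant(const ThreadState& state, NamedVectorConstant c) {
      return state.NamedConstants[c];  // ...ldr (Value) fetches the constant.
    }
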
Mai
6c7933e7b1
Merge pull request #2957 from Sonicadvance1/optimize_pfnacc
OpcodeDispatcher: Optimize PFNACC
2023-08-22 10:10:11 -04:00
Ryan Houdek
364f084604
Merge pull request #2956 from Sonicadvance1/optimize_hsubp
OpcodeDispatcher: Optimize hsubp
2023-08-21 20:47:50 -07:00
Ryan Houdek
ffa8f1e3dc
Merge pull request #2955 from lioncash/sign
OpcodeDispatcher: Remove redundant move from VPSIGN
2023-08-21 20:47:41 -07:00
Ryan Houdek
ad6738939b OpcodeDispatcher: Optimize PFNACC
Turns out this can be even more optimal.
2023-08-21 20:38:18 -07:00
Ryan Houdek
dcb3e4ee86 OpcodeDispatcher: Optimize hsubp
This makes the SSE version optimal.
This dramatically improves the AVX version as well.
2023-08-21 20:22:14 -07:00
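
Reference semantics for the two horizontal subtracts above. 3DNow! PFNACC
packs a[0]-a[1] and b[0]-b[1]; SSE HSUBPS subtracts the odd lane from the
even lane of each pair across both sources. Pair-wise de-interleaving
(uzp1/uzp2) followed by a single subtract is the natural ARM64 shape for
this, which is presumably where the win comes from.

    // SSE HSUBPS semantics.
    void HSubPs(float dst[4], const float a[4], const float b[4]) {
      dst[0] = a[0] - a[1];
      dst[1] = a[2] - a[3];
      dst[2] = b[0] - b[1];
      dst[3] = b[2] - b[3];
    }

    // 3DNow! PFNACC semantics.
    void PfNAcc(float dst[2], const float a[2], const float b[2]) {
      dst[0] = a[0] - a[1];
      dst[1] = b[0] - b[1];
    }
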
Lioncache
dbbe6288de OpcodeDispatcher: Remove redundant move from VPSIGN
StoreResult will already zero-extend if the vector is 128-bit.
2023-08-21 23:11:07 -04:00