This now improves the instruction implementation from 17 instructions
down to 5 or 6, depending on whether the host supports SVE.
I would say this is now optimal.
The range check and clamping are necessary in the cases where x86 shift
amounts are passed directly through VUSHL/VSSHR.
Some AVX operations are still using these with range clamping. A future
investigation task should be to check whether they can be switched over to
the wide variants that we implemented for the SSE instructions.
When consuming our own controlled data, we don't want the range clamping
to be enabled.
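As a rough illustration of that distinction (helper names are hypothetical, not FEX's actual API, and clamping to the element width is an assumption of this sketch):

```cpp
#include <algorithm>
#include <cstdint>

// When the shift count comes straight from guest x86 state it can be any
// value, so saturate it to the element width before handing it to the
// per-element Arm64 shift; x86 treats counts >= the element width the same
// as shifting by the full element width, so the clamp keeps the
// guest-visible result.
uint64_t ClampGuestShiftCount(uint64_t Count, uint64_t ElementSizeInBits) {
  return std::min(Count, ElementSizeInBits);
}

// When the count was produced by our own IR it is already in range, so the
// clamp is pure overhead and gets skipped.
uint64_t TrustedShiftCount(uint64_t Count) {
  return Count;
}
```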
The number of times the implicit size calculation in GPR operations has
bitten us is immeasurable; it was a mistake from the start of the project.
The vector-based operations never had this problem since they have been
explicitly sized for a long time now.
This converts the base IR operations to be explicitly sized, but adds
implicitly sized helpers for the moment while we work on removing implicit
usage from the OpcodeDispatcher.
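A minimal sketch of the shape of that split, with hypothetical emitter names:

```cpp
#include <cstdint>

// Sketch only; the real IR emitter differs. The base operation takes an
// explicit size, and a temporary implicit-size overload stays around while
// the OpcodeDispatcher is migrated off of it.
enum class OpSize : uint8_t { i32Bit = 4, i64Bit = 8 };

struct OrderedNode;  // opaque IR SSA value

struct IREmitter {
  // Explicitly sized base operation: the caller always spells out the size.
  OrderedNode* _Add(OpSize Size, OrderedNode* Src1, OrderedNode* Src2);

  // Implicit-size helper kept for the moment; it guesses a size on the
  // caller's behalf, which is exactly the behaviour being phased out.
  OrderedNode* _Add(OrderedNode* Src1, OrderedNode* Src2) {
    return _Add(GuessOpSize(Src1, Src2), Src1, Src2);
  }

  OpSize GuessOpSize(OrderedNode* Src1, OrderedNode* Src2);
};
```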
Should be NFC at this moment but it is a big enough change that I want
it in before the "real" work starts.
Noticed that we hadn't ever enabled this, which was a concern back when
our GPR operations weren't as strict and could leave garbage in the upper
bits when operating as a 32-bit operation.
Now that our ALU operations are stricter about enforcing upper-bit
zeroing, we can enable this.
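For illustration of the invariant being relied on (this is not FEX code): a 32-bit ALU result must leave the upper 32 bits of the 64-bit register zeroed.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>

// A 32-bit ALU result written back to a 64-bit guest register must leave
// bits 63:32 zeroed, matching both x86-64 and AArch64 "w register"
// semantics; the cast below models that zeroing.
uint64_t Add32(uint64_t Rn, uint64_t Rm) {
  return static_cast<uint32_t>(Rn + Rm);
}

int main() {
  uint64_t dirty = 0xdeadbeef00000001ULL;
  // If the zeroing were skipped, later code could observe the stale
  // 0xdeadbeef garbage in the upper half instead of zeroes.
  std::printf("%016" PRIx64 "\n", Add32(dirty, 1));  // 0000000000000002
  return 0;
}
```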
This finally gets Half-Life: Source above 200 FPS. It causes significant
performance improvements for 32-bit games because we're no longer
redundantly moving registers before and after every operation, converting
a bunch of 3-4 instruction sequences into 1.
RAValidation was assuming that the GPR register class would only have up
to 16 registers for either SRA or dynamic registers.
When running a 32-bit application we allow 17 GPRs to be dynamically
allocated, since we can take 8 back from SRA in that case.
Just split the two classes in the RAValidation pass since they will
never overlap their allocation.
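A hedged sketch of that split, with illustrative names and sizes only:

```cpp
#include <bitset>
#include <cstdint>

// The SRA and dynamically allocated GPR classes get validated against
// independent sets, so the 17th dynamic register in the 32-bit case no
// longer trips a shared 16-entry assumption.
struct GPRUsage {
  std::bitset<32> DynamicGPRs{};  // up to 17 when running 32-bit apps
  std::bitset<32> StaticGPRs{};   // SRA registers, tracked on their own

  void MarkDynamic(uint32_t Reg) { DynamicGPRs.set(Reg); }
  void MarkStatic(uint32_t Reg) { StaticGPRs.set(Reg); }

  // The two classes never overlap their allocation, so each register is
  // only checked against its own class.
  bool DynamicInUse(uint32_t Reg) const { return DynamicGPRs.test(Reg); }
  bool StaticInUse(uint32_t Reg) const { return StaticGPRs.test(Reg); }
};
```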
Fixes validation in `32Bit_Secondary/15_XX_0.asm` locally that changed
behaviour due to tinkering.
This previously used `Round_Nearest`, which had a bug on Arm64 where it
was actually always using `Round_Host`, aka frinti.
Ever since 393cea2e8ba47a15a3ce31d07a6088a2ff91653c[1] this has been fixed
so that `Round_Nearest` actually uses frintn for nearest.
This instruction actually wants to use the host rounding mode.
One issue with this is that x87 and SSE have different rounding mode
flags and currently we conflate the two in our JIT. This will need to be
fixed in the future.
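A minimal sketch of the mapping in question, with hypothetical names:

```cpp
#include <cstdint>

// Round-to-nearest must lower to frintn (round to nearest, ties to even),
// while the host mode must lower to frinti (use the current FPCR rounding
// mode). The instruction fixed here wants the frinti behaviour.
enum class RoundType : uint8_t {
  Nearest,  // lowers to frintn
  Host,     // lowers to frinti
};

constexpr const char* LoweredRounding(RoundType Type) {
  return Type == RoundType::Nearest ? "frintn" : "frinti";
}
```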
In the meantime this restores the behaviour of actually using the host
rounding mode, which fixes the black screen and broken vertices in Grim
Fandango Remastered.
[1] e89321dc602e35cbb1382b35ac2b35e7e417ef92 for scalar.
1) In the case that we are converting a GPR, don't zero-extend it first.
2) In the case that the scalar comes from memory, load it into an FPR
first and convert it in place.
These are now optimal in the case where AFP is unsupported.
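A hedged sketch of the two cases above, assuming a scalar integer-to-float conversion; the helper names and the Arm64 sequences in the comments are illustrative only:

```cpp
#include <cstdio>

enum class SrcKind { GPR, Memory };

void DescribeScalarConvert(SrcKind Kind) {
  if (Kind == SrcKind::GPR) {
    // Case 1: the integer already lives in a GPR, so convert it directly
    // without the previously emitted zero-extension of the source,
    // e.g. `scvtf s0, w1`.
    std::puts("convert straight from the GPR");
  } else {
    // Case 2: the integer lives in memory, so load it straight into an
    // FPR and convert it in place instead of bouncing through a GPR,
    // e.g. `ldr s0, [x1]` then `scvtf s0, s0`.
    std::puts("load into an FPR, convert in place");
  }
}
```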
This extension was seemingly added with Cortex-A710 and turns this
instruction into two instructions, which is quite good.
Needs #2994 merged first.
Huge thanks to @dougallj for the optimization idea!
If the named constant of that size gets used multiple times, just reuse
the previous value if it is still in scope.
Makes addsubp{s,d} and phminposuw more optimal when more than one of them
is in a block.
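A minimal sketch of the caching idea, with hypothetical types and names:

```cpp
#include <cstdint>
#include <map>
#include <utility>

struct OrderedNode;  // opaque IR value

// The first use of a (constant, size) pair in a block materializes the
// load; later uses reuse the earlier value while it is still in scope.
struct NamedConstantCache {
  std::map<std::pair<uint32_t, uint8_t>, OrderedNode*> Cache;

  OrderedNode* Get(uint32_t Constant, uint8_t Size,
                   OrderedNode* (*EmitLoad)(uint32_t, uint8_t)) {
    const auto Key = std::make_pair(Constant, Size);
    if (auto It = Cache.find(Key); It != Cache.end()) {
      return It->second;  // Already loaded in this block, reuse it.
    }
    OrderedNode* Node = EmitLoad(Constant, Size);
    Cache.emplace(Key, Node);
    return Node;
  }

  // Cleared when the cached values go out of scope, e.g. at block ends.
  void Clear() { Cache.clear(); }
};
```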
Needs #2993 merged first.
Use a named constant for loading the sign inversion, then EOR the second
source with it and just FAdd it all.
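A scalar model of the trick, for illustration only:

```cpp
#include <array>
#include <bit>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// XOR-ing the sign bit of a float negates it, so one EOR against a named
// constant with the sign bit set in the "subtract" lanes turns the single
// FADD that follows into the add/subtract pattern addsubps wants.
float FlipSign(float v) {
  return std::bit_cast<float>(std::bit_cast<uint32_t>(v) ^ 0x80000000u);
}

int main() {
  std::array<float, 4> a{1.0f, 2.0f, 3.0f, 4.0f};
  std::array<float, 4> b{10.0f, 20.0f, 30.0f, 40.0f};

  // addsubps subtracts in the even lanes and adds in the odd lanes.
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float src2 = (i % 2 == 0) ? FlipSign(b[i]) : b[i];
    std::printf("%g\n", a[i] + src2);  // -9, 22, -27, 44
  }
  return 0;
}
```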
In a vacuum it isn't a significant improvement, but as soon as more than
one instruction is in a block it will eventually get optimized with
named constant caching and be a significant win.
Thanks to @rygorous for the idea!