mirror of https://github.com/FEX-Emu/FEX.git synced 2025-01-23 06:56:31 +00:00

Author	SHA1	Message	Date
Ryan Houdek	87fbcf754d	Vector: Optimize pblendw Using a brute force solver to add in more optimized code paths - Adds 12 single VInsElement implementations - Adds 4 two IR operation implementations Not adding any of the two or three IR operation implementations that use VInsElement because SRA interacts badly and becomes worse than the VTBX implementation.	2024-07-27 19:25:51 -07:00
Ryan Houdek	f8ef6feff9	AVX128: Optimize blends Optimizes the AVX128 blends by reusing the prior SSE4.1 implementation. Only difference is the destination register isn't reused as a source register. One confusing thing is that Felix Cloutier's documentation has a typo on the 256-bit VPBLENDW instruction where it had the top 128-bit lane reusing the destination instead of sources. So I wrote a unittest to ensure correctness. Fixes #3796	2024-07-23 19:24:19 -07:00
Ryan Houdek	3c5b59d985	AVX128: Implement support for scalar FMA with AFP Now that I have AFP supporting hardware I felt better implementing this since I can run unit tests. Fixes #3793	2024-07-22 12:58:19 -07:00
Alyssa Rosenzweig	587b924de9	json_ir_generator: stop prefixing arguments stop prefixing the arguments when we generate allocate ops (in particular), this is more convenient and simpler. in exchange we need to prefix Op to avoid a collision on fcmpscalarinsert which has an argument named Op, but that's a local change at least. came up when experimenting with new IR, but I think this is probably a win by itself. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-07-22 13:50:21 -04:00
Paulo Matos	a1378f94ce	X87 Code Refactoring and Optimization Pass	2024-07-22 08:44:45 +02:00
Alyssa Rosenzweig	610caf8529	ConstProp: treat StoreContext as zeroable todo: FPR equivalent. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-07-21 15:49:09 -04:00
Alyssa Rosenzweig	d20b46e46f	IR: drop LoadFlag/StoreFlag ops pointless, we can just load/store the context now. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-07-21 15:49:09 -04:00
Alyssa Rosenzweig	4094aa1b9a	DeadStoreElimination: drop flag handling now that we do everything via NZCV, this is mostly vestigial. DF/x87 flags are sufficiently rare to be "don't care"s here, and we don't even have multiblock enabled yet! Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-07-21 15:49:08 -04:00
Ryan Houdek	f8c6baae97	Merge pull request #3883 from Sonicadvance1/implement_daz Arm64: Implements support for DAZ using AFP.FIZ	2024-07-21 10:03:34 -07:00
Ryan Houdek	f7b4d25803	Telemetry: Remove VEX flag This is no longer necessary and it also no longer provides us any useful information. Since we expose the AVX CPUID flag, basically everything uses VEX encoding now, so it is basically always set.	2024-07-20 17:24:00 -07:00
Ryan Houdek	95b15d788b	Arm64: Fix filling static registers Some locations could end up with SRA registers that only spilled one register. Allow passing in temporaries from the call site. Fixes rpid and syscalls asserting.	2024-07-20 15:57:01 -07:00
Ryan Houdek	b78da2e5ad	Arm64: Implements support for DAZ using AFP.FIZ When AFP is supported then we can actually support DAZ. This might also fix the audio corruption in Animal Well but I can't test it until Steam is running on Oryon. Requires a bit of plumbing for MXCSR which we were hacking around before but now we actually want to store the value. Fixes #3856	2024-07-20 15:34:54 -07:00
Ryan Houdek	1c35eeffeb	Vector: Optimize PSHUFD with brute force search With a brute force search of methods between 1-3 instructions we cover a lot more cases more optimally. There are definitely still more cases (and probably some that can reduce from 3 instructions to 2), but covering 44 cases is a pretty good margin already.	2024-07-18 04:10:58 -07:00
Ryan Houdek	b0bd8a62a2	AVX128: Improve VPERMILPS/PD and VPSHUFD VPSHUFD and VPERMILPS are aliases of each other. Reuses the implementation path from the PSHUFD implementation which has a few swizzles and then a table lookup. VPERMILPD is a very simple swizzle per 128-bit lane. Fixes #3797 Fixes #3784	2024-07-18 04:10:58 -07:00
Ryan Houdek	da51169ba9	Merge pull request #3875 from alyssarosenzweig/ir/gethostflag IR: garbage collect premature F80Cmp optimizations	2024-07-17 03:05:48 -07:00
Ryan Houdek	f72cee480f	Merge pull request #3874 from alyssarosenzweig/opt/reconstructftw X87: save uop in ReconstructFTW	2024-07-17 03:05:37 -07:00
Alyssa Rosenzweig	e7d5a01c5f	IR: remove F80Cmp flags nothing is optimizing around this, it's just adding pointless complexity. if we want to actually optimize F80Cmp, the right way would be to lift the implementation into the OpcodeDispatcher or JIT. it wouldn't be terribly difficult. This kludge doesn't get us closer there. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-07-16 14:53:58 -04:00
Alyssa Rosenzweig	0c3a8d0bc8	IR: remove GetHostFlag it doesn't get host flags, it's just an extra Bfe used in x87. pointless and confusing! Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-07-16 14:44:34 -04:00
Alyssa Rosenzweig	c4ba7eee87	X87: save uop in ReconstructFTW noticed while reviewing Paulo's work Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-07-16 13:54:09 -04:00
Paulo Matos	8d89adef2e	Add IR stack operations These IR operations deal implicitly with the x87 stack and are removed by the x87 stack optimization pass.	2024-07-16 09:07:35 +02:00
Alyssa Rosenzweig	1e709d1150	OpcodeDispatcher: add RecordX87 helper calls will be generated. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-07-16 09:07:35 +02:00
Alyssa Rosenzweig	476ee0cd7d	IR: track whether x87 is used in header Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-07-16 09:07:35 +02:00
Ryan Houdek	b8e864ffdf	Merge pull request #3865 from Sonicadvance1/telemetry_atexit Telemetry: Change how visibility of telemetry values work	2024-07-15 09:53:37 -07:00
Ryan Houdek	d79b7fcc49	Merge pull request #3808 from alyssarosenzweig/rclse/3 Try to delete RCLSE again	2024-07-12 20:38:06 -07:00
Ryan Houdek	b9a6caea8d	Merge pull request #3844 from Sonicadvance1/fix_vmovq AVX128: Fixes vmovq loading too much data	2024-07-12 17:07:32 -07:00
Ryan Houdek	97a68cb643	Telemetry: Change how visibility of telemetry values work Removes global initializer for telemetry values since their address is visible and PIC relative code loading handles the address fetching for us.	2024-07-12 03:18:23 -07:00
Ryan Houdek	870e395ac4	Merge pull request #3862 from Sonicadvance1/remove_atexit_logman LogManager: Removes fextl::vector usage	2024-07-12 02:05:02 -07:00
Ryan Houdek	04592f82f5	Merge pull request #3861 from Sonicadvance1/remove_atexit_vdso VDSO: Stop using a vector for a static	2024-07-12 02:04:25 -07:00
Ryan Houdek	5ef0db994d	VDSO: Stop using a vector for a static This causes a global initializer that registers an atexit handler. Be smarter, use an std::array and pass its data around using a span instead. Removes the global initializer and removes the atexit installation	2024-07-11 23:53:57 -07:00
Ryan Houdek	b523407a3e	LogManager: Removes fextl::vector usage We never use more than one logging method at a time so this was overengineered for what it is doing. Instead only allow one handler each for messages and throw messages, each of which is just a pointer. Removes a global initializer and an atexit handler being installed	2024-07-11 22:51:56 -07:00
Ryan Houdek	8021dc10a1	OpcodeDispatcher: Force noinline for the function call in the Bind helper Clang was inlining a few of the functions it was calling. So force it never to inline since this is supposed to be a little shim trampoline only.	2024-07-11 19:00:42 -07:00
Ryan Houdek	7e8d734e43	AVX256: Initial fixes just to get my unittest working This is the initial split to decouple AVX256 composed operations from their MMX/SSE counterparts. This is to work around the subtle differences with AVX/SSE zext/insert behaviour.	2024-07-11 18:43:31 -07:00
Ryan Houdek	3c7318d7c8	AVX128: Fixes vmovq loading too much data This was doing a 128-bit load from memory and then a 64-bit zero extend which looked like a spurious move but it was trying to match the behaviour of vmovq where it needed the zero extend. Also adds a unit test to ensure that we aren't loading too much data by loading right up against a page boundary. Fixes #3787	2024-07-11 18:34:05 -07:00
Ryan Houdek	fc0b233046	Merge pull request #3859 from neobrain/refactor_opdispatch_templates OpcodeDispatcher: Replace hand-written wrapper templates with a generic utility	2024-07-11 18:18:23 -07:00
Mai	e25918d846	Merge pull request #3858 from Sonicadvance1/implement_nt_load Implement support for SSE4.1/AVX NT loads	2024-07-11 14:22:41 -04:00
Alyssa Rosenzweig	3a334c4585	Reapply "IR: drop RCLSE" This reverts commit 78aee4d96e39c9ef6415a7dca21fd6b81dabe12e.	2024-07-11 13:21:14 -04:00
Alyssa Rosenzweig	8dae4bcd44	OpcodeDispatcher: drop stale comment Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-07-11 13:21:14 -04:00
Alyssa Rosenzweig	294f10fdd0	OpcodeDispatcher: reg cache mmx Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-07-11 13:21:14 -04:00
Tony Wasserka	b9829ed316	OpcodeDispatcher: Replace even more hand-written wrapper templates	2024-07-11 16:19:15 +02:00
Tony Wasserka	4ccec17676	OpcodeDispatcher: Replace more hand-written wrapper templates	2024-07-11 16:19:15 +02:00
Tony Wasserka	f45082043b	OpcodeDispatcher: Replace hand-written wrapper templates with a generic utility	2024-07-11 16:19:14 +02:00
Tony Wasserka	3222f13dde	Fix comment formatting	2024-07-11 16:19:14 +02:00
Mai	b282620a48	Merge pull request #3857 from Sonicadvance1/sve_bitperm Arm64: Implement support for SVE bitperm	2024-07-11 05:05:41 -04:00
Ryan Houdek	e24b01b6cb	Arm64: Implement support for SVE bitperm	2024-07-11 01:46:35 -07:00
Tony Wasserka	9a8694c2f3	Merge pull request #3853 from neobrain/refactor_warn_fixes Fix all the warnings	2024-07-11 10:12:41 +02:00
Tony Wasserka	070a9148aa	Merge pull request #3852 from neobrain/refactor_opdispatch_codesize OpcodeDispatcher: Avoid template monomorphization to reduce FEXLoader binary size	2024-07-11 09:58:49 +02:00
Tony Wasserka	f19fe3b6f3	Fix warning about an expression with side effects being passed to __builtin_assume LOGMAN_THROW_AA_FMT has no benefit over LOGMAN_THROW_A_FMT here, so just use the latter.	2024-07-11 09:54:31 +02:00
Tony Wasserka	8d2b15665d	Fix unused-variable warnings	2024-07-11 09:54:30 +02:00
Ryan Houdek	548fd9daf8	OpcodeDispatcher: Implement support for SSE4.1 NT load	2024-07-10 23:07:37 -07:00
Ryan Houdek	f831f5a0e1	AVX128: Implement support for NT Load	2024-07-10 23:07:14 -07:00

1628 Commits