FEX-Emu/FEX

mirror of https://github.com/FEX-Emu/FEX.git synced 2024-12-14 01:18:46 +00:00

Author	SHA1	Message	Date
Ryan Houdek	664d766b45	AVX128: Implement support for vmovshdup	2024-06-20 09:43:10 -07:00
Ryan Houdek	fce694ed92	AVX128: Implement support for vmovsldup	2024-06-20 09:43:10 -07:00
Ryan Houdek	96aafb4f07	AVX128: Implement support for vmovddup This instruction is a little weird. When accessing memory, the 128-bit operating size of the instruction only loads 64-bits. Meanwhile the 256-bit operating size of the instruction fetches a full 256-bits. Theoretically the hardware could get away with two 64-bit loads or a wacky 24-byte load, but it looks like to simplify hardware they just spec'd it that the 256-bit version will always load the full range.	2024-06-20 09:43:10 -07:00
Ryan Houdek	dbaf95a8f3	AVX128: Implement support for vmovhps/d	2024-06-20 06:53:21 -07:00
Ryan Houdek	e67df96ad9	AVX128: Implement support for movlps/d	2024-06-20 06:53:17 -07:00
Ryan Houdek	56de94578d	AVX128: Implement support for vmovq	2024-06-20 06:53:13 -07:00
Ryan Houdek	06fc2f5ef0	AVX128: Implement support for non-temporal moves.	2024-06-20 06:53:09 -07:00
Ryan Houdek	b3ba315cbd	AVX128: Implements unary/binary lambda helper	2024-06-20 06:53:05 -07:00
Alyssa Rosenzweig	b2eb8aaf66	Merge pull request #3718 from Sonicadvance1/avx128_3 OpcodeDispatcher: Adds initial groundwork for decomposed AVX operations	2024-06-20 08:57:35 -04:00
Ryan Houdek	acbd920c9a	OpcodeDispatcher: Adds initial groundwork for decomposed AVX operations Only installs the tables if SVE256 isn't supported yet AVX is explicitly enabled with HostFeatures, to protect against accidental enablement early. - Only implements 85 instructions starting out - Basic vector moves - Basic vector unary operations - Basic vector binary operations - VZeroUpper/VZeroAll The bulk of the implementation is currently the handling for loading and storing the halves of the registers from the context or from memory. This means the load/store helpers must always return a pair unless only requesting the bottom half of the register, which occurs with 128-bit AVX operations. The store side then needs to consume the named zero register if it occurs, since those cases will zero the upper bits. This implementation approach has a few benefits. - I can pound this out extremely quickly - SSE implementations are unaffected and don't need to deal with the insert behaviour of SVE256. - We still keep the SVE256 implementation for the inevitable future when hardware vendors actually do implement it (Give it 8 years or something). - We can actually unit test this path in CI once it is complete. - We can partially optimize some paths with SVE128 (Gathers) and support a full ASIMD path if necessary. One downside is that I can't enable this in CI yet because it can't pass all unittests, but that's a non-issue since it is going to be in heavy flux as I'm hammering out the implementation. It'll get switched on at the end when it's passing all 1265 AVX unittests. Currently at 1001 on this.	2024-06-20 08:44:14 -04:00
Alyssa Rosenzweig	db0bdd48e5	Merge pull request #3729 from alyssarosenzweig/refactor/address-modes OpcodeDispatcher: Refactor address modes	2024-06-20 08:18:33 -04:00
Ryan Houdek	da21ee3cda	Merge pull request #3692 from pmatos/AFP_RPRES_fix Fixes AFP.NEP handling on scalar insertions	2024-06-19 19:23:49 -07:00
Ryan Houdek	d2baef2b36	Merge pull request #3727 from Sonicadvance1/vaes VAES support	2024-06-19 19:22:56 -07:00
Ryan Houdek	df96bc83cc	Merge pull request #3726 from Sonicadvance1/oryon_errata HostFeatures: Work around Qualcomm Oryon RNG errata	2024-06-19 19:21:14 -07:00
Alyssa Rosenzweig	ec03831a21	OpcodeDispatcher: plumb A.NonTSO deeper Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-19 08:52:07 -04:00
Alyssa Rosenzweig	9ca821316a	OpcodeDispatcher: factor out DecodeAddress this is the common guts of the load/store routines. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-19 08:52:07 -04:00
Alyssa Rosenzweig	025a060337	OpcodeDispatcher: extract IsNonTSOReg Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-19 08:52:07 -04:00
Alyssa Rosenzweig	371d6f0730	OpcodeDispatcher: extract IsOperandMem Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-19 08:52:07 -04:00
Ryan Houdek	643bc10d52	CPUID: Expose VAES if supported	2024-06-19 05:51:47 -07:00
Ryan Houdek	8fb801069f	unittests: Adds new VAES tests	2024-06-19 05:51:47 -07:00
Ryan Houdek	542ed8b6ad	Implement support for querying AES256 support This is a different feature flag than regular AES as the default AES+AVX only operates on 128-bit wide vectors. With the newer `VAES` extension this is expanded to 256-bit.	2024-06-19 05:51:47 -07:00
Ryan Houdek	053620f4f5	Merge pull request #3728 from pmatos/PythonIgnore Ignore python files for clang-format	2024-06-19 05:51:25 -07:00
Paulo Matos	88b01a0ca9	Ignore python files for clang-format	2024-06-19 14:23:27 +02:00
Alyssa Rosenzweig	197140498b	Merge pull request #3721 from alyssarosenzweig/scripts/instcountci Scripts: add update_instcountci.sh script	2024-06-19 06:50:00 -04:00
Paulo Matos	9acd325aa4	instcountci: Fixes AFP.NEP handling on scalar insertions	2024-06-19 10:02:54 +02:00
Paulo Matos	2483329ef6	Fixes AFP.NEP handling on scalar insertions Fixes #3690 When doing scalar insertions, upper bits come from different arguments depending on the operation. These are listed in the ARM spec under the NEP bit documentation.	2024-06-19 10:02:54 +02:00
Ryan Houdek	87fe1d672e	Merge pull request #3715 from pmatos/FXCHFlag FXCH should set C1 to zero	2024-06-19 00:20:01 -07:00
Paulo Matos	9257221b3b	instcountci: FXCH should set C1 to zero	2024-06-19 09:11:49 +02:00
Paulo Matos	f9b38a1de7	FXCH should set C1 to zero	2024-06-19 08:57:48 +02:00
Ryan Houdek	67e1ac0442	Merge pull request #3725 from alyssarosenzweig/ir/vbic IR: rename _VBic -> _VAndn	2024-06-18 16:34:26 -07:00
Ryan Houdek	c57e9e008f	Merge pull request #3723 from alyssarosenzweig/fexcore/zero-helper OpcodeDispatcher: refactor zero vector loads	2024-06-18 16:34:15 -07:00
Ryan Houdek	b34c23fe3d	HostFeatures: Work around Qualcomm Oryon RNG errata The Oryon is the first CPU we know of that implemented support for the RNG extension. It also has an errata where reading the RNDRRS register never returns success. X86's RDSEED guarantees forward progress with enough retries. When an x86 processor messed this up at one point, some Linux systems would infinite loop (presumably when something in boot was filling an entropy pool). This required a microcode change to fix that processor. The rdseed unittest infinite loops on this platform if RNG was exposed.	2024-06-18 16:29:53 -07:00
Ryan Houdek	29f644235d	Merge pull request #3724 from alyssarosenzweig/ryan-avx-cut First few commits from Ryan's AVX branch	2024-06-18 11:39:51 -07:00
Alyssa Rosenzweig	01da5972fc	IR: rename _VBic -> _VAndn to be consistent with the scalar _Andn opcode, which is specifically named _Andn and not _Bic. noticed while reviewing AVX patches Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-18 14:00:01 -04:00
Alyssa Rosenzweig	643e964edd	Merge pull request #3694 from Sonicadvance1/fix_3691 FEX: Consolidate JSON allocators and fix 3691	2024-06-18 13:54:10 -04:00
Ryan Houdek	30e3d795da	FEX: Consolidate JSON allocators and fix 3691 Fixes #3691 We weren't checking if the file was empty before using its `at` function member. This was causing an early crash if the config file existed but was empty. Consolidates the three locations that copied and pasted the json allocator tools and adds an empty check for all of them. Also adds two missing checks to the ThunksDB handler that could have resulted in the same crash if ThunksDB was an empty file.	2024-06-18 13:31:25 -04:00
Alyssa Rosenzweig	89b05a2ea4	Merge pull request #3706 from Sonicadvance1/threadstateobject_cast LinuxEmulation: Add a helper for getting the ThreadStateObject from CPU frame	2024-06-18 13:28:46 -04:00
Alyssa Rosenzweig	2e009be27c	Merge pull request #3708 from Sonicadvance1/fexgetconfig_tsoemulation_facts FEXGetConfig: Support the ability to get TSO emulation facts	2024-06-18 13:28:02 -04:00
Alyssa Rosenzweig	32150cf7b5	InstCountCI: Update Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-18 12:01:23 -04:00
Ryan Houdek	bf812aae8f	CoreState: Adds avx_high structure for tracking decoupled AVX halves. Needed something in between the `InlineJITBlockHeader` and `avx_high` in order to match the 16-byte alignment requirement of avx_high. Chose the `DeferredSignalRefCount` because we hit it quite frequently and it is basically the only 64-bit variable that we end up touching significantly. In the future the CPUState object is going to need to change its view of the object depending on if the device supports SVE256 or not, but we don't need to frontload the work right now. It'll become significantly easier to support that path once the RCLSE pass gets deleted.	2024-06-18 12:00:45 -04:00
Ryan Houdek	9a71443005	CoreState: Adds a gregs offset check This is required to be less than the maximum range for LDP and STP in the Arm64 Dispatcher otherwise it breaks. Necessary to ensure this when reorganizing the CoreState.	2024-06-18 12:00:45 -04:00
Ryan Houdek	ee165249bc	Dispatcher: Fix ARM64EC We don't have CI for this, so it was missed.	2024-06-18 12:00:45 -04:00
Mai	7c7d767195	Merge pull request #3722 from alyssarosenzweig/instcountci/disable-afp InstCountCI: explicitly disable AFP everywhere	2024-06-18 11:52:21 -04:00
Alyssa Rosenzweig	af8cfb79e5	OpcodeDispatcher: refactor zero vector loads AVX128 is going to slam this, so make it more ergonomic. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-18 11:44:46 -04:00
Alyssa Rosenzweig	27c8bf3021	InstCountCI: explicitly disable AFP everywhere (except for when we explicitly enable AFP). Since AFP gets saved/restored, we get `msr fpcr` garbage in random instructions when AFP is enabled. Explicitly disable everywhere since it's not worth our time to triage which files might hit that path. Fixes instcountci on AFP-supporting hosts now that we have AFP enabled. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-18 11:40:20 -04:00
Alyssa Rosenzweig	b0a09b31bb	Scripts: add update_instcountci.sh script This is helpful for devs working on FEXCore, I've been using this locally but it might make sense to stick it in tree. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2024-06-18 09:07:49 -04:00
Ryan Houdek	13ebfb1a49	Merge pull request #3711 from Sonicadvance1/avx128_2 FEXCore: Disentangle the SVE256 feature from AVX	2024-06-17 17:35:15 -07:00
Ryan Houdek	f863b30951	Merge pull request #3716 from alyssarosenzweig/ir-dump/unrecoverable json_ir_generator: don't print unrecoverable temps	2024-06-17 17:25:27 -07:00
Ryan Houdek	1ce27a5e6b	FEXCore: Disentangle the SVE256 feature from AVX In quite a few locations we are mixing the case that SVE256 == AVX or that AVX means the guest register size is 256-bit. While this is true today, this entanglement is going to change very quickly and cause confusion in follow-up PRs. Now we have SVE128, SVE256, and SVE2 HostFeatures to disambiguate the different features which mean different things. This PR keeps the alias that `SupportsAVX` = `SupportsSVE256 && SupportsSVE2` but that alias is going to very quickly change its definition.	2024-06-17 17:20:32 -07:00
Ryan Houdek	933d622860	Merge pull request #3710 from Sonicadvance1/avx128_1 CoreState: Move `InlineJITBlockHeader` to the start of the struct	2024-06-17 17:17:56 -07:00
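
Note on 96aafb4f07 (vmovddup) and acbd920c9a (decomposed AVX): the load-size asymmetry described in the vmovddup message, and the "pair of 128-bit halves" representation described in the groundwork message, can be illustrated with a small model. This is only a sketch in Python, not FEX code: the function name, the byte-string register model, and the returned load size are all assumptions made for illustration. It follows the architectural behaviour of the instruction, where the 128-bit form reads 64 bits from memory and the 256-bit form reads the full 256 bits.

```python
def vmovddup_from_memory(memory: bytes, offset: int, op_size_bits: int):
    """Hypothetical model of VMOVDDUP with a memory source operand.

    Returns (low_half, high_half, bytes_loaded), where each half is a
    16-byte value, mirroring the idea of tracking a 256-bit register as
    two 128-bit halves. This is an illustration, not FEX's implementation.
    """
    if op_size_bits == 128:
        load_size = 8  # the 128-bit form only loads 64 bits
    elif op_size_bits == 256:
        load_size = 32  # the 256-bit form loads the full 256 bits
    else:
        raise ValueError("operating size must be 128 or 256 bits")

    loaded = memory[offset:offset + load_size]
    if len(loaded) != load_size:
        raise ValueError("load runs past the end of the modelled memory")

    # Low half: element 0 duplicated into both 64-bit lanes.
    low_half = loaded[0:8] * 2

    if op_size_bits == 128:
        # A 128-bit AVX operation zeroes the upper half of the register.
        high_half = bytes(16)
    else:
        # High half: element 2 (bytes 16..23) duplicated into both lanes.
        # Bytes 8..15 and 24..31 are loaded but never used.
        high_half = loaded[16:24] * 2

    return low_half, high_half, load_size


if __name__ == "__main__":
    mem = bytes(range(32))
    for size in (128, 256):
        low, high, n = vmovddup_from_memory(mem, 0, size)
        print(size, n, low.hex(), high.hex())
```

In this model the 256-bit form only consumes bytes 0..7 and 16..23 of the 32 bytes it loads, which is the redundancy the commit message points out when it mentions two 64-bit loads or a 24-byte load as theoretical alternatives.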

9532 Commits