9532 Commits

Author SHA1 Message Date
Ryan Houdek
664d766b45
AVX128: Implement support for vmovshdup 2024-06-20 09:43:10 -07:00
Ryan Houdek
fce694ed92
AVX128: Implement support for vmovsldup 2024-06-20 09:43:10 -07:00
Ryan Houdek
96aafb4f07
AVX128: Implement support for vmovddup
This instruction is a little weird.
When accessing memory, the 128-bit operating size of the instruction
only loads 64 bits.
Meanwhile, the 256-bit operating size of the instruction fetches a full
256 bits.

Theoretically the hardware could get away with two 64-bit loads or a
wacky 24-byte load, but it looks like, to simplify hardware, they just
spec'd the 256-bit version to always load the full range.
2024-06-20 09:43:10 -07:00
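The load-width asymmetry described in the vmovddup commit above can be modelled in a few lines of C++. This is only a minimal sketch of the instruction's memory-source semantics, not the FEX implementation; `Vec128`, `Vec256`, `MovDDup128` and `MovDDup256` are hypothetical names made up for illustration.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical value types for the sketch: 64-bit lanes of a vector register.
using Vec128 = std::array<uint64_t, 2>;
using Vec256 = std::array<uint64_t, 4>;

// 128-bit form: reads only 8 bytes from memory and broadcasts them to both lanes.
inline Vec128 MovDDup128(const void* Src) {
  uint64_t Low;
  std::memcpy(&Low, Src, sizeof(Low));
  return {Low, Low};
}

// 256-bit form: reads the full 32 bytes, then duplicates the even-indexed lanes.
// Only bytes 0-7 and 16-23 affect the result, which is where the "24-byte load"
// remark in the commit message comes from.
inline Vec256 MovDDup256(const void* Src) {
  Vec256 In;
  std::memcpy(In.data(), Src, sizeof(In));
  return {In[0], In[0], In[2], In[2]};
}
```

With a source buffer holding the lanes 1, 2, 3, 4, the 128-bit form yields (1, 1) and the 256-bit form yields (1, 1, 3, 3).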
Ryan Houdek
dbaf95a8f3
AVX128: Implement support for vmovhps/d 2024-06-20 06:53:21 -07:00
Ryan Houdek
e67df96ad9
AVX128: Implement support for movlps/d 2024-06-20 06:53:17 -07:00
Ryan Houdek
56de94578d
AVX128: Implement support for vmovq 2024-06-20 06:53:13 -07:00
Ryan Houdek
06fc2f5ef0
AVX128: Implement support for non-temporal moves. 2024-06-20 06:53:09 -07:00
Ryan Houdek
b3ba315cbd
AVX128: Implements unary/binary lambda helper 2024-06-20 06:53:05 -07:00
Alyssa Rosenzweig
b2eb8aaf66
Merge pull request #3718 from Sonicadvance1/avx128_3
OpcodeDispatcher: Adds initial groundwork for decomposed AVX operations
2024-06-20 08:57:35 -04:00
Ryan Houdek
acbd920c9a OpcodeDispatcher: Adds initial groundwork for decomposed AVX operations
Only installs the tables if SVE256 isn't supported yet AVX is explicitly
enabled with HostFeatures, to protect against accidental enablement early on.

- Only implements 85 instructions starting out
- Basic vector moves
- Basic vector unary operations
- Basic vector binary operations
- VZeroUpper/VZeroAll

The bulk of the implementation is currently the handling for loading and
storing the halves of the registers from the context or from memory.

This means the load/store helpers must always return a pair unless only
requesting the bottom half of the register, which occurs with 128-bit
AVX operations. The store side then needs to consume the named zero
register when it occurs, since those cases zero the upper bits.

This implementation approach has a few benefits.
- I can pound this out extremely quickly
- SSE implementations are unaffected and don't need to deal with the
  insert behaviour of SVE256.
- We still keep the SVE256 implementation for the inevitable future when
  hardware vendors actually do implement it (Give it 8 years or
  something).
- We can actually unit test this path in CI once it is complete.
- We can partially optimize some paths with SVE128 (Gathers) and support
  a full ASIMD path if necessary.

One downside is that I can't enable this in CI yet because it can't pass
all unittests, but that's a non-issue since it is going to be in heavy
flux as I'm hammering out the implementation. It'll get switched on at
the end when it's passing all 1265 AVX unittests. Currently at 1001 on
this.
2024-06-20 08:44:14 -04:00
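The pair-returning load/store helpers described in the commit above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the OpcodeDispatcher code: `Half`, `RegPair`, `GuestReg`, `LoadPair` and `StorePair` are hypothetical names, and an empty `std::optional` stands in for the named zero register.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Hypothetical: one 128-bit half of a 256-bit guest register.
using Half = std::array<uint64_t, 2>;

// Hypothetical result of a load helper. High is empty when only the bottom
// half was requested, as happens with 128-bit AVX operations.
struct RegPair {
  Half Low{};
  std::optional<Half> High;
};

// Hypothetical guest register state, stored as two decoupled halves.
struct GuestReg {
  Half Low{};
  Half High{};
};

inline RegPair LoadPair(const GuestReg& Reg, bool Is128Bit) {
  if (Is128Bit) {
    return {Reg.Low, std::nullopt};
  }
  return {Reg.Low, Reg.High};
}

inline void StorePair(GuestReg& Reg, const RegPair& Value) {
  Reg.Low = Value.Low;
  // A missing high half models the zero register: 128-bit operations
  // zero the upper bits of the destination.
  Reg.High = Value.High.value_or(Half{0, 0});
}
```

Storing the result of a 128-bit load into a destination clears that destination's upper half, while a 256-bit load and store carries both halves across unchanged.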
Alyssa Rosenzweig
db0bdd48e5
Merge pull request #3729 from alyssarosenzweig/refactor/address-modes
OpcodeDispatcher: Refactor address modes
2024-06-20 08:18:33 -04:00
Ryan Houdek
da21ee3cda
Merge pull request #3692 from pmatos/AFP_RPRES_fix
Fixes AFP.NEP handling on scalar insertions
2024-06-19 19:23:49 -07:00
Ryan Houdek
d2baef2b36
Merge pull request #3727 from Sonicadvance1/vaes
VAES support
2024-06-19 19:22:56 -07:00
Ryan Houdek
df96bc83cc
Merge pull request #3726 from Sonicadvance1/oryon_errata
HostFeatures: Work around Qualcomm Oryon RNG errata
2024-06-19 19:21:14 -07:00
Alyssa Rosenzweig
ec03831a21 OpcodeDispatcher: plumb A.NonTSO deeper
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-19 08:52:07 -04:00
Alyssa Rosenzweig
9ca821316a OpcodeDispatcher: factor out DecodeAddress
this is the common guts of the load/store routines.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-19 08:52:07 -04:00
Alyssa Rosenzweig
025a060337 OpcodeDispatcher: extract IsNonTSOReg
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-19 08:52:07 -04:00
Alyssa Rosenzweig
371d6f0730 OpcodeDispatcher: extract IsOperandMem
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-19 08:52:07 -04:00
Ryan Houdek
643bc10d52
CPUID: Expose VAES if supported 2024-06-19 05:51:47 -07:00
Ryan Houdek
8fb801069f
unittests: Adds new VAES tests 2024-06-19 05:51:47 -07:00
Ryan Houdek
542ed8b6ad
Implement support for querying AES256 support
This is a different feature flag than regular AES as the default AES+AVX
only operates on 128-bit wide vectors.

With the newer `VAES` extension this is expanded to 256-bit.
2024-06-19 05:51:47 -07:00
Ryan Houdek
053620f4f5
Merge pull request #3728 from pmatos/PythonIgnore
Ignore python files for clang-format
2024-06-19 05:51:25 -07:00
Paulo Matos
88b01a0ca9 Ignore python files for clang-format 2024-06-19 14:23:27 +02:00
Alyssa Rosenzweig
197140498b
Merge pull request #3721 from alyssarosenzweig/scripts/instcountci
Scripts: add update_instcountci.sh script
2024-06-19 06:50:00 -04:00
Paulo Matos
9acd325aa4 instcountci: Fixes AFP.NEP handling on scalar insertions 2024-06-19 10:02:54 +02:00
Paulo Matos
2483329ef6 Fixes AFP.NEP handling on scalar insertions
Fixes #3690

When doing scalar insertions, upper bits come from different arguments
depending on the operation. These are listed in the ARM spec under the
NEP bit documentation.
2024-06-19 10:02:54 +02:00
Ryan Houdek
87fe1d672e
Merge pull request #3715 from pmatos/FXCHFlag
FXCH should set C1 to zero
2024-06-19 00:20:01 -07:00
Paulo Matos
9257221b3b instcountci: FXCH should set C1 to zero 2024-06-19 09:11:49 +02:00
Paulo Matos
f9b38a1de7 FXCH should set C1 to zero 2024-06-19 08:57:48 +02:00
Ryan Houdek
67e1ac0442
Merge pull request #3725 from alyssarosenzweig/ir/vbic
IR: rename _VBic -> _VAndn
2024-06-18 16:34:26 -07:00
Ryan Houdek
c57e9e008f
Merge pull request #3723 from alyssarosenzweig/fexcore/zero-helper
OpcodeDispatcher: refactor zero vector loads
2024-06-18 16:34:15 -07:00
Ryan Houdek
b34c23fe3d
HostFeatures: Work around Qualcomm Oryon RNG errata
The Oryon is the first CPU we know of that implemented support for the
RNG extension. It also has an erratum where reading the RNDRRS register
never returns success. X86's RDSEED guarantees forward progress with
enough retries.

When an x86 processor messed this up at one point, some Linux systems
would infinite loop (presumably when something in boot was filling an
entropy pool). This required a microcode change to fix that processor.

The rdseed unittest loops infinitely on this platform if RNG is exposed.
2024-06-18 16:29:53 -07:00
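To see why a seed source that never succeeds is a problem for retry-based callers, consider the following minimal sketch. It is not the workaround from the commit above and it does not touch real hardware: `SeedSource` and `ReadSeed` are hypothetical names, and the retry bound exists only so that the sketch terminates.

```cpp
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical seed source: returns no value on failure, like a failed
// RDSEED or RNDRRS read.
using SeedSource = std::function<std::optional<uint64_t>()>;

// Retries the source up to MaxRetries times. Code that assumes forward
// progress typically loops without a bound, which never terminates when
// the source fails on every read.
inline std::optional<uint64_t> ReadSeed(const SeedSource& Source, int MaxRetries) {
  for (int Attempt = 0; Attempt < MaxRetries; ++Attempt) {
    if (auto Seed = Source()) {
      return Seed;
    }
  }
  return std::nullopt;
}
```

A source that fails a couple of times and then succeeds is handled fine by the loop. A source that always fails exhausts every retry, and an unbounded version of the same loop would spin forever.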
Ryan Houdek
29f644235d
Merge pull request #3724 from alyssarosenzweig/ryan-avx-cut
First few commits from Ryan's AVX branch
2024-06-18 11:39:51 -07:00
Alyssa Rosenzweig
01da5972fc IR: rename _VBic -> _VAndn
to be consistent with the scalar _Andn opcode, which is specifically named _Andn
and not _Bic.

noticed while reviewing AVX patches

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-18 14:00:01 -04:00
Alyssa Rosenzweig
643e964edd
Merge pull request #3694 from Sonicadvance1/fix_3691
FEX: Consolidate JSON allocators and fix 3691
2024-06-18 13:54:10 -04:00
Ryan Houdek
30e3d795da FEX: Consolidate JSON allocators and fix 3691
Fixes #3691

We weren't checking if the file was empty before using its `at` function
member. This was causing an early crash if the config file existed but
was empty.

Consolidates the three locations that copy-pasted the JSON allocator
tools and adds an empty check for all of them.

Also adds two missing checks to the ThunksDB handler that could have
resulted in the same crash if ThunksDB was an empty file.
2024-06-18 13:31:25 -04:00
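The guard described in the commit above amounts to checking for emptiness before calling `at`. A minimal sketch of that pattern using standard containers follows; `FirstByte` is a hypothetical helper, and FEX's actual file loading and JSON handling are not shown here.

```cpp
#include <optional>
#include <vector>

// Hypothetical helper: returns the first byte of a loaded file, or nothing
// if the file was empty. Without the empty() check, at(0) on an empty
// vector throws std::out_of_range (or aborts when exceptions are disabled).
inline std::optional<char> FirstByte(const std::vector<char>& Data) {
  if (Data.empty()) {
    return std::nullopt;
  }
  return Data.at(0);
}
```

Callers can then treat an empty config file the same way as a missing one instead of crashing early.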
Alyssa Rosenzweig
89b05a2ea4
Merge pull request #3706 from Sonicadvance1/threadstateobject_cast
LinuxEmulation: Add a helper for getting the ThreadStateObject from CPU frame
2024-06-18 13:28:46 -04:00
Alyssa Rosenzweig
2e009be27c
Merge pull request #3708 from Sonicadvance1/fexgetconfig_tsoemulation_facts
FEXGetConfig: Support the ability to get TSO emulation facts
2024-06-18 13:28:02 -04:00
Alyssa Rosenzweig
32150cf7b5 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-18 12:01:23 -04:00
Ryan Houdek
bf812aae8f CoreState: Adds avx_high structure for tracking decoupled AVX halves.
Needed something in between the `InlineJITBlockHeader` and `avx_high` in
order to match the 16-byte alignment requirement of avx_high. Chose the
`DeferredSignalRefCount` because we hit it quite frequently and it is
basically the only 64-bit variable that we end up touching
significantly.

In the future the CPUState object is going to need to change its view of
the object depending on if the device supports SVE256 or not, but we
don't need to frontload the work right now. It'll become significantly
easier to support that path once the RCLSE pass gets deleted.
2024-06-18 12:00:45 -04:00
Ryan Houdek
9a71443005 CoreState: Adds a gregs offset check
This is required to be less than the maximum range for LDP and STP in
the Arm64 Dispatcher, otherwise it breaks. It is necessary to ensure this
when reorganizing the CoreState.
2024-06-18 12:00:45 -04:00
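The two CoreState commits above both come down to compile-time layout checks. The sketch below shows the idea on a made-up struct: `CoreStateSketch`, its field sizes, and the position of `gregs` are assumptions for illustration, not the real FEX layout. The 504-byte limit is the largest positive immediate offset that 64-bit LDP/STP can encode (a signed 7-bit immediate scaled by 8); the exact bound the real check uses is not stated in the commit message.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified layout. Only the ordering of the first three
// fields is taken from the commit messages.
struct CoreStateSketch {
  uint64_t InlineJITBlockHeader;         // offset 0
  uint64_t DeferredSignalRefCount;       // hot 64-bit field filling offset 8
  alignas(16) uint64_t avx_high[16][2];  // upper 128 bits of each 256-bit register
  uint64_t gregs[16];                    // general-purpose registers (position assumed)
};

// avx_high must start on a 16-byte boundary.
static_assert(offsetof(CoreStateSketch, avx_high) % 16 == 0,
              "avx_high must be 16-byte aligned");

// Assumed bound: largest positive LDP/STP immediate for 64-bit registers.
constexpr std::size_t MaxLdpStpOffset = 504;
static_assert(offsetof(CoreStateSketch, gregs) <= MaxLdpStpOffset,
              "gregs must be reachable with an LDP/STP immediate offset");
```

With checks like these in place, reordering fields in a way that breaks either constraint fails at compile time rather than at runtime in the dispatcher.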
Ryan Houdek
ee165249bc Dispatcher: Fix ARM64EC
We don't have CI for this, so it was missed.
2024-06-18 12:00:45 -04:00
Mai
7c7d767195
Merge pull request #3722 from alyssarosenzweig/instcountci/disable-afp
InstCountCI: explicitly disable AFP everywhere
2024-06-18 11:52:21 -04:00
Alyssa Rosenzweig
af8cfb79e5 OpcodeDispatcher: refactor zero vector loads
AVX128 is going to slam this, so make it more ergonomic.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-18 11:44:46 -04:00
Alyssa Rosenzweig
27c8bf3021 InstCountCI: explicitly disable AFP everywhere
(except for when we explicitly enable AFP).

Since AFP gets saved/restored, we get `msr fpcr` garbage in random instructions
when AFP is enabled. Explicitly disable everywhere since it's not worth our time
to triage which files might hit that path. Fixes instcountci on AFP-supporting
hosts now that we have AFP enabled.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-18 11:40:20 -04:00
Alyssa Rosenzweig
b0a09b31bb Scripts: add update_instcountci.sh script
This is helpful for devs working on FEXCore. I've been using it locally, but it
might make sense to stick it in tree.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-06-18 09:07:49 -04:00
Ryan Houdek
13ebfb1a49
Merge pull request #3711 from Sonicadvance1/avx128_2
FEXCore: Disentangle the SVE256 feature from AVX
2024-06-17 17:35:15 -07:00
Ryan Houdek
f863b30951
Merge pull request #3716 from alyssarosenzweig/ir-dump/unrecoverable
json_ir_generator: don't print unrecoverable temps
2024-06-17 17:25:27 -07:00
Ryan Houdek
1ce27a5e6b
FEXCore: Disentangle the SVE256 feature from AVX
In quite a few locations we conflate SVE256 with AVX, or assume
that AVX means the guest register size is 256-bit.

While this is true today, this entanglement is going to change very
quickly and cause confusion in follow-up PRs.

Now we have SVE128, SVE256, and SVE2 HostFeatures to disambiguate the
different features which mean different things.

This PR keeps the alias that `SupportsAVX` = `SupportsSVE256 && SupportsSVE2`
but that alias is going to very quickly change its definition.
2024-06-17 17:20:32 -07:00
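The alias described in the commit above can be pictured as a derived flag over the separate host features. This is a minimal sketch only: `HostFeaturesSketch` is a hypothetical struct, and expressing `SupportsAVX` as a member function rather than a stored field is an assumption made for brevity.

```cpp
// Hypothetical stand-in for the HostFeatures flags named in the commit message.
struct HostFeaturesSketch {
  bool SupportsSVE128 = false;
  bool SupportsSVE256 = false;
  bool SupportsSVE2 = false;

  // The alias as stated in the commit: AVX is exposed only when both
  // SVE256 and SVE2 are available on the host.
  bool SupportsAVX() const {
    return SupportsSVE256 && SupportsSVE2;
  }
};
```

Keeping the alias in one place means a later redefinition, for example to allow a 128-bit path, only has to change this single expression.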
Ryan Houdek
933d622860
Merge pull request #3710 from Sonicadvance1/avx128_1
CoreState: Move `InlineJITBlockHeader` to the start of the struct
2024-06-17 17:17:56 -07:00