8111 Commits

Author SHA1 Message Date
Ryan Houdek
2671246fef IR: Adds scalar vector insert operations
These IR operations are required to support AFP's NEP mode, which does a
vector insert into the destination register. Additionally, they give us
tracking information that allows optimizing out redundant inserts on
devices that don't support AFP natively.

In order to match x86 semantics we need to support binary and unary
scalar operations that do a final insert into a vector, with optional
zeroing of the top 128 bits for AVX variants.

A tricky thing is that in binary operations this means that the
destination and first source have an intrinsically linked property
depending on if it is SSE or AVX.

SSE example:
- addss xmm0, xmm1
   - xmm0 is both the destination and the first source.
   - This means xmm0[31:0] = xmm0[31:0] + xmm1[31:0]
   - Bits [127:32] are UNMODIFIED.

FEX's JIT jumps through some hoops so that if the destination register
equals the first source register, it hits the optimal path where
AFP.NEP inserts into the result. AVX throws a small wrench into
this due to its changed behaviour.

AVX example:
- vaddss xmm0, xmm1, xmm2
  - xmm0 is ONLY the destination, xmm1 and xmm2 are the sources
  - This operation copies the bits above the scalar result from the
    first source (xmm1).
  - Additionally this will zero bits above the original 128-bit xmm
    register.
  - xmm0[31:0] = xmm1[31:0] + xmm2[31:0]
  - xmm0[127:32] = xmm1[127:32]
  - ymm0[255:128] = 0

This causes these instructions to support a fairly large table depending
on if the instruction is an SSE or AVX instruction, plus if the host CPU
supports AFP or not.
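
The difference between the two forms can be modelled in a few lines. This is only an illustrative sketch, not FEX's IR or JIT code: the function names are made up, a 256-bit register is represented as a list of eight 32-bit lanes, and Python floats are doubles, so it shows where lanes come from rather than single-precision rounding.

```python
# Illustrative model only: a 256-bit register is a list of eight 32-bit lanes,
# with lane 0 holding bits [31:0]. Function names are hypothetical.

def addss_sse(dst, src):
    """SSE `addss dst, src`: dst is both the destination and the first source."""
    result = list(dst)            # every lane above lane 0 is left unmodified
    result[0] = dst[0] + src[0]   # dst[31:0] = dst[31:0] + src[31:0]
    return result


def vaddss_avx(src1, src2):
    """AVX `vaddss dst, src1, src2`: dst is only written, never read."""
    result = [0.0] * 8                # bits [255:128] end up zeroed
    result[0] = src1[0] + src2[0]     # dst[31:0] = src1[31:0] + src2[31:0]
    result[1:4] = src1[1:4]           # dst[127:32] = src1[127:32]
    return result


xmm0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
xmm1 = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
xmm2 = [0.5] * 8

print(addss_sse(xmm0, xmm1))   # only lane 0 changes
print(vaddss_avx(xmm1, xmm2))  # lanes 1..3 come from xmm1, lanes 4..7 are zero
```

In this model, when the AVX destination is the same register as the first source, the low 128 bits behave exactly like the SSE insert and only the upper zeroing differs.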

So while fairly complex, it's handling all the edge cases and gives us
optimization opportunities as we move forward. Currently on non-AFP
supporting devices this has a minor benefit that these IR operations
remove one temporary register, lowering the Register Allocation
overhead.

In the coming weeks I am likely to introduce an optimization pass that
removes redundant inserts because FEX currently does /really/ badly with
scalar code loops.

Needs #3184 merged first.
2023-10-10 03:17:19 -07:00
Ryan Houdek
8cb8f090dd Arm64Emitter: enable/disable AFP on Fill/Spill
When FEX is in the JIT we need to make sure to enable NEP and AH, and
then disable them when leaving.

Explicitly disabled when the vixl simulator is used since even
attempting to set the bits will cause it to fault out. Ensures
InstCountCI keeps working.
2023-10-10 03:17:18 -07:00
Ryan Houdek
a37d89a7d5 HostFeatures: Disable AFP until verified that it is working
Need to audit scalar instruction usage to ensure all uses are okay with
garbage in the upper bits.
2023-10-10 03:17:18 -07:00
Ryan Houdek
252d7712ea Arm64: Save if the host supports AFP 2023-10-10 03:17:18 -07:00
Ryan Houdek
c548625fbe InstCountCI: Update tests for disabling AFP
Doesn't change behaviour yet, just prep work.
2023-10-10 03:17:18 -07:00
Ryan Houdek
f036a0b84f
Merge pull request #3191 from Sonicadvance1/instcountci_multiple
InstCountCI: Support multiple instructions in the tests
2023-10-10 02:53:14 -07:00
Ryan Houdek
2e1389b25e
Merge pull request #3184 from Sonicadvance1/armemitter_sized_scalars
ArmEmitter: Adds sized Scalar 1 source and 2 source helpers
2023-10-10 02:53:06 -07:00
Alyssa Rosenzweig
a5f82a57fa
Merge pull request #3190 from Sonicadvance1/atomic_instcountci
InstCountCI: Adds missing atomic tests
2023-10-10 05:22:36 -04:00
Ryan Houdek
cd83d3eb24 InstCountCI: Support multiple instructions in the tests
There are some cases where we want to test multiple instructions together,
because the optimizations across them would otherwise be hard to see.

eg:
```asm
; Can be optimized to a single stp
push eax
push ebx

; Can remove half of the copy since we know the direction
cld
rep movsb

; Can remove a redundant insert
addss xmm0, xmm1
addss xmm0, xmm2
```

This lets us have arbitrary sized code in instruction count CI, with the
original json key becoming only a label if the instruction array is
provided.

There are still some major limitations to this: instructions that
generate side-effects might have "garbage" after the end of the block
that isn't correctly accounted for, so care must be taken.

Example in the json
```json
"push ax, bx": {
  "ExpectedInstructionCount": 4,
  "Optimal": "No",
  "Comment": "0x50",
  "x86Insts": [
    "push ax",
    "push bx"
  ],
  "ExpectedArm64ASM": [
    "uxth w20, w4",
    "strh w20, [x8, #-2]!",
    "uxth w20, w7",
    "strh w20, [x8, #-2]!"
  ]
}
```
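
A rough sketch of the key-versus-array rule, assuming a test entry is a plain JSON object. The helper name `instructions_for` and the second entry are hypothetical and only here for illustration; this is not the InstCountCI loader itself.

```python
import json


def instructions_for(label, entry):
    """Return the x86 instructions a test entry should assemble."""
    # With an "x86Insts" array the key is only a label; without it, the key
    # itself is assumed to be the single instruction under test.
    return entry.get("x86Insts", [label])


tests = json.loads("""
{
  "push ax, bx": {
    "ExpectedInstructionCount": 4,
    "x86Insts": ["push ax", "push bx"]
  },
  "cld": {
    "Comment": "hypothetical single-instruction entry"
  }
}
""")

for label, entry in tests.items():
    print(label, "->", instructions_for(label, entry))
```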
2023-10-09 21:49:53 -07:00
Ryan Houdek
93ab8ab23c InstCountCI: Adds missing atomic tests
This adds all the missing atomic tests into their own test files.
This includes all of them except a few choice ones that remain in their
original files.

- BTC, BTR, BTS are in their Secondary/SecondaryGroup files
- CMPXCHG, CMPXCHG8B, CMPXCHG16B are in their Secondary/SecondaryGroup
  files
   - These always imply lock semantics even without the prefix.
2023-10-09 21:18:08 -07:00
Ryan Houdek
462fff2c67
Merge pull request #3189 from Sonicadvance1/remove_warnings_15
FEXCore: Removes a warning about assume discarding side-effects
2023-10-09 17:26:14 -07:00
Alyssa Rosenzweig
8dab35cbf8
Merge pull request #3188 from Sonicadvance1/reconstruct_flags_naming
FEXCore: Renames raw FLAGS location names to signify they can't be used directly
2023-10-09 19:41:09 -04:00
Ryan Houdek
a1a479e69f FEXCore: Removes a warning about assume discarding side-effects 2023-10-09 16:04:57 -07:00
Ryan Houdek
6403290019 FEXCore: Renames raw FLAGS location names to signify they can't be used directly
Six of the EFLAGS can't be used directly in a bitmask because they are
either contained in a different flags location or have multiple bits
stored in them.

SF, ZF, CF, OF are stored in ARM's NZCV format in offset 24.
PF calculation is deferred but stored in the regular offset.
AF is also deferred in relation to the PF but stored in the regular
offset.

These /need/ to be reconstructed using the `ReconstructCompactedEFLAGS`
function when wanting to read the EFLAGS.

When setting these flags they /need/ to be set using
`SetFlagsFromCompactedEFLAGS`.

If either of these functions is not used when managing EFLAGS then the
internal representation will get mangled and the state will be
corrupted.

Having a little `_RAW` on these to signify that these aren't just
regular single bit representations like the other flags in EFLAGS should
make us puzzle about this issue before writing more broken code that
tries accessing it directly.
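
To illustrate why a plain bitmask goes wrong, here is a minimal sketch that rebuilds only SF, ZF, CF and OF from an NZCV word. It is not `ReconstructCompactedEFLAGS`: the helper name is hypothetical, the deferred PF/AF calculation is left out, and it assumes the stored C bit already has x86 carry meaning.

```python
# Architectural bit positions: x86 EFLAGS and the Arm64 NZCV word.
X86_CF, X86_ZF, X86_SF, X86_OF = 0, 6, 7, 11
ARM_N, ARM_Z, ARM_C, ARM_V = 31, 30, 29, 28


def reconstruct_nzcv_flags(nzcv):
    """Hypothetical helper: rebuild SF/ZF/CF/OF in x86 EFLAGS layout."""
    def bit(pos):
        return (nzcv >> pos) & 1

    # Assumption: the stored C bit already carries x86 carry meaning.
    return (
        (bit(ARM_N) << X86_SF)
        | (bit(ARM_Z) << X86_ZF)
        | (bit(ARM_C) << X86_CF)
        | (bit(ARM_V) << X86_OF)
    )


nzcv = (1 << ARM_Z) | (1 << ARM_C)               # Z and C are set
naive = nzcv & ((1 << X86_ZF) | (1 << X86_CF))   # treating the raw word as EFLAGS

print(hex(reconstruct_nzcv_flags(nzcv)))  # ZF | CF in x86 layout
print(hex(naive))                         # the naive mask sees neither flag
```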
2023-10-08 11:51:11 -07:00
Ryan Houdek
580bd50a00 unittests/ASM: Removes eflags comparison option
This was not used and is also broken.
2023-10-08 11:51:11 -07:00
Ryan Houdek
b2a8b0ca12
Merge pull request #3187 from Sonicadvance1/implement_rpres
FEXCore: Implements support for RPRES
2023-10-08 09:57:27 -07:00
Ryan Houdek
f78bdf0852 unittests/Emitter: Adds sized scalar unittests. 2023-10-08 09:48:37 -07:00
Ryan Houdek
5652eb4c5d ARMEmitter: Removes templated ptrue/ptrues
Non-templated version exists and templated version gets us nothing.
2023-10-07 23:51:32 -07:00
Ryan Houdek
a52bb47551 unittests: Update for rpres optimization 2023-10-07 23:22:51 -07:00
Ryan Houdek
22590dde77 FEXCore: Implements support for RPRES
This allows us to use reciprocal instructions that match the precision
x86 expects, rather than converting everything to float divides.

Currently no hardware supports this, and even the upcoming X4/A720/A520
won't support it, but it was trivial to implement so wire it up.
2023-10-07 23:13:47 -07:00
Ryan Houdek
6543a80ff9
Merge pull request #3185 from Sonicadvance1/ir_dispatcher_emit
FEXCore/IR: Changes over to automated IR dispatch generation
2023-10-07 21:21:44 -07:00
Ryan Houdek
9c36d1061b
Merge pull request #3182 from Sonicadvance1/instcountci_stacking_test_names
InstCountCI: Fixes recursive tests with same filename
2023-10-07 21:21:10 -07:00
Alyssa Rosenzweig
5a3cc7b469
Merge pull request #3183 from Sonicadvance1/instcountci_support_afp_override
InstCountCI: Support overriding AFP features
2023-10-07 19:57:33 -04:00
Ryan Houdek
4cff3e5f1f FEXCore/IR: Changes over to automated IR dispatch generation
Suggested by Alyssa. Adding an IR operation can be a little tedious
since you need to add the definition to JIT.cpp for the dispatch switch,
add the function declaration to JITClass.h, and then actually define the
implementation in the correct file.

Instead, support the common case where an IR operation just gets
dispatched through to the regular handler. This lets the developer just
put the function definition into the json and the relevant cpp file and
it just gets picked up.

Some minor things:
- Needs to support dynamic dispatch for {Load,Store}Register and
  {Load,Store}Mem
   - This is just a bool in the json
- It needs to not output JIT dispatch for some IR operations
   - SSE4.2 string instructions and x87 operations
   - These go down the "Unhandled" path
- Needs to support a Dispatcher function override
   - This is just for handling NoOp IR operations that get used for
     other reasons.
- Finally removes VSMul and VUMul, consolidating to VMul
   - Unlike V{U,S}Mull, signed or unsigned doesn't change behaviour here
- Fixed a couple random handler names not matching the IR operation
  name.
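
The rules above could be sketched roughly as follows. This is a toy model rather than the real generator: the field names (`jit_dispatch`, `dynamic_dispatch`, `dispatch_override`), the `Op_` prefix, and the `X87Op` and `Marker` entries are all assumptions made for illustration.

```python
# Hypothetical IR definitions; the field names are illustrative, not FEX's schema.
IR_OPS = {
    "VMul": {},                                 # common case: regular handler
    "LoadRegister": {"dynamic_dispatch": True}, # handler picked at runtime
    "X87Op": {"jit_dispatch": False},           # goes down the "Unhandled" path
    "Marker": {"dispatch_override": "NoOp"},    # reuses another handler
}


def build_dispatch(ops):
    """Map each IR operation name to the handler the dispatcher would call."""
    table = {}
    for name, attrs in ops.items():
        if not attrs.get("jit_dispatch", True):
            table[name] = "Unhandled"
        elif attrs.get("dynamic_dispatch", False):
            table[name] = "<chosen at runtime>"
        else:
            # Default to a handler named after the operation unless overridden.
            table[name] = "Op_" + attrs.get("dispatch_override", name)
    return table


for op, handler in build_dispatch(IR_OPS).items():
    print(op, "->", handler)
```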
2023-10-07 15:01:47 -07:00
Ryan Houdek
a1eb571630 ArmEmitter: Adds sized Scalar 1 source and 2 source helpers
Removes the need for an annoying switch statement with scalar operations
for the most part.
2023-10-07 11:51:32 -07:00
Ryan Houdek
559cf6491a InstCountCI: Support overriding AFP features
Also disable AFP under the vixl simulator by default since it doesn't support it.
2023-10-07 11:48:42 -07:00
Ryan Houdek
4bdda1eeb5 InstCountCI: Fixes recursive tests with same filename
This will be used to move AFP tests to a sub-directory
2023-10-07 11:47:16 -07:00
Mai
fc70fc3506
Merge pull request #3179 from Sonicadvance1/support_hostfeature_crypto
FEXCore: Support crypto extensions in HostFeatures override
2023-10-06 16:01:59 -04:00
Mai
26ee63cc24
Merge pull request #3181 from Sonicadvance1/remove_spurious_license
External: Remove a spurious license
2023-10-06 15:59:41 -04:00
Ryan Houdek
0092ea7c0b External: Remove a spurious license
This doesn't exist anymore
2023-10-06 09:37:17 -07:00
Ryan Houdek
439a3b9c3a HostFeatures: Use a define 2023-10-06 09:33:41 -07:00
Alyssa Rosenzweig
b4ddf36582
Merge pull request #3180 from Sonicadvance1/remove_warnings_14
Linux: Fixes warning in 32-bit clock_settime
2023-10-06 08:13:52 -04:00
Ryan Houdek
12c44f26e5 Linux: Fixes warning in 32-bit clock_settime
This syscall requires a valid pointer, otherwise it returns EFAULT.
When going through the glibc helper it can crash before even reaching
the raw syscall.
2023-10-05 17:44:33 -07:00
Ryan Houdek
5b7ba06d5c FEXCore: Support crypto extensions in HostFeatures override
Enables this in InstCountCI so Pi users can run the tests
without breaking on crypto operations.

When crypto is enabled or disabled just wholesale change AES, CRC32, and
PMULL 128-bit in one step. We don't really care about partial support
here.
2023-10-05 17:41:08 -07:00
Ryan Houdek
ee0c1457d8 Docs: Update for release FEX-2310 FEX-2310 2023-10-05 14:39:10 -07:00
Alyssa Rosenzweig
3413eb3d98
Merge pull request #3169 from Sonicadvance1/remove_constant_indirection
FEXCore: Support CpuState relative vector named constants
2023-10-05 08:22:48 -04:00
Ryan Houdek
2e0753a244 InstCountCI: Update for named vector constant optimization 2023-10-04 20:57:09 -07:00
Ryan Houdek
8a51bb7a61 FEXCore: Support CpuState relative vector named constants
The motivation towards just having a pointer array in CpuState was that
initialization was fairly cheap and that we have limited space inside
the encoding depending on what we want to do.

Initialization cost is still a concern, but doing a memcpy of 128 bytes
isn't that big of a deal.

Limited space in CpuState, while a concern, isn't a significant one.
   - Needs to currently be less than 1 page in size
   - Needs to be under the architectural offset limitations of load/store
     scaled offsets, which is 65520 bytes (just under 64KiB) for 128-bit
     vectors
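
The scaled-offset limit follows from the A64 unsigned-offset load/store form, which encodes a 12-bit immediate scaled by the access size. A small sketch of that arithmetic (the helper names are hypothetical, and this only models the encoding range, not anything in FEX):

```python
IMM12_MAX = 4095  # largest value of the 12-bit unsigned immediate field


def max_scaled_offset(access_bytes):
    """Largest byte offset reachable with the unsigned scaled-offset form."""
    return IMM12_MAX * access_bytes


def fits_scaled_offset(offset, access_bytes):
    """True if `offset` is encodable: non-negative, aligned, and in range."""
    return (
        offset >= 0
        and offset % access_bytes == 0
        and offset <= max_scaled_offset(access_bytes)
    )


print(max_scaled_offset(16))          # 128-bit vector accesses
print(fits_scaled_offset(65536, 16))  # one step past the limit
```

Any constant placed in CpuState beyond that range, or at a misaligned offset, would need its address synthesized separately.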

The pointer array is still kept around for cases where we would need to
synthesize an address offset and it's just easier to load the
process-wide table.

The performance improvement here comes from removing the dependency in
the ldr+ldr chain. In microbenchmarks, removing this dependency chain has
shown an improvement of ~4% on Cortex-X1C.
2023-10-04 20:56:29 -07:00
Ryan Houdek
ee6debe8fd FEXCore: Adds DividePow2 helper 2023-10-04 20:56:29 -07:00
Mai
3ba1c7912c
Merge pull request #3178 from Sonicadvance1/fix_avx_alias_precolour
Minor AVX optimizations
2023-10-04 21:31:20 -04:00
Ryan Houdek
a408afaeb0 InstCountCI: Update for optimized AVX 2023-10-04 10:05:09 -07:00
Ryan Houdek
fba7c4bedc IR/RA: Fixes register aliasing and pre-colouring for AVX
This is the cause of a bunch of redundant moves that shows up in
InstCountCI. Fixing this aliasing and pre-colouring issue causes a ton
of 256-bit operations to become optimal.
2023-10-04 10:04:06 -07:00
Ryan Houdek
c52753e9c8 OpcodeDispatcher: Minor optimization in vzeroall
Using the cached zero value is less efficient than materializing zero
directly in each register for all these cases.

This lets us use rename hardware more efficiently and removes a
dependency chain on a single register.

Original:
```
movi v2.2d, #0x0
mov z16.d, p7/m, z2.d
<... 16 more times>
mov z31.d, p7/m, z2.d
```

Result:
```
movi v16.2d, #0x0
<... 16 more times>
movi v31.2d, #0x0
```
2023-10-04 10:01:13 -07:00
Ryan Houdek
e39634d314 Arm64: Fixes assert in VSQSHL/VSQSHR with SVE
When Dst != Vector we need to pass Dst in to both Zd and Zdn.
This would have worked fine in a release build, but the assert build
managed to catch it.
2023-10-04 09:59:59 -07:00
Ryan Houdek
507cf82dad
Merge pull request #3176 from neobrain/fix_thunks_unused_artifacts
Thunks: Only build guest target for libfex_thunk_test if FEXLinuxTests are enabled
2023-10-04 07:07:18 -07:00
Ryan Houdek
48fa4f1121
Merge pull request #3156 from neobrain/feature_thunk_data_layout_analysis
Thunks: Analyze data layout to detect platform differences
2023-10-04 07:06:49 -07:00
Tony Wasserka
e06d609bf0 Thunks: Drop unused STRUCT_VERIFIER define from CMake 2023-10-03 11:43:29 +02:00
Tony Wasserka
0a09e04e33 Thunks: Only build guest target for libfex_thunk_test if FEXLinuxTests are enabled 2023-10-03 11:43:27 +02:00
Ryan Houdek
a1a709f948
Merge pull request #3170 from Sonicadvance1/vixl_sim_instcountci
InstCountCI: Enable running on x86 hosts
2023-10-02 16:38:25 -07:00
Ryan Houdek
5925eef213 Github/InstCountCI: Enables x86 runner
To ensure we don't break this path for developers.
2023-10-02 16:26:14 -07:00