Commit Graph

4056 Commits

Author SHA1 Message Date
Alyssa Rosenzweig
af21b8f3c7 Move External/FEXCore/ to FEXCore/
It is not an external component, and it makes paths needlessly long.
Ryan seemed amenable to this when we discussed on IRC earlier.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-17 16:32:16 -04:00
Lioncache
e14e71aaff IR: Allow 128-bit broadcasts in VBroadcastFromMem
Now all vbroadcast implementations go down the more optimal path.

For non-SVE 128-bit cases where we only have 128-bit wide registers,
we behave like ld1rqb and just act as a normal 128-bit load for
interface convenience.
2023-08-17 14:20:12 -04:00
Ryan Houdek
a9dea29f03
Merge pull request #2917 from lioncash/quad
ARMEmitter: Handle SVE load and broadcast quadword groups
2023-08-17 10:59:27 -07:00
Lioncache
f97df2a40f ARMEmitter: Handle SVE load and broadcast quadword (scalar plus scalar) group 2023-08-17 13:32:20 -04:00
Lioncache
6f9bc1e2fe ARMEmitter: Handle SVE load and broadcast quadword (scalar plus imm) category 2023-08-17 13:32:17 -04:00
Ryan Houdek
49b8b7cd2c
Merge pull request #2916 from Sonicadvance1/128bit_predicate
Arm64Emitter: Ensure that 128-bit predicate is generated with SVE
2023-08-17 09:56:56 -07:00
Ryan Houdek
059c022255 Arm64Emitter: Ensure that 128-bit predicate is generated with SVE
When running on a 128-bit SVE system, this predicate wasn't set up.
Since we never had any predicate usage before, this wasn't an
issue. Now that #2914 is using the 128-bit predicate, we need to make
sure that we are generating it.
2023-08-17 09:37:55 -07:00
Lioncache
25708be807 IR: Add TSO handling to VBroadcastFromMem 2023-08-17 12:30:57 -04:00
Lioncache
c8e3ca481f OpcodeDispatcher: Remove explicit zero-extending in VBROADCASTOp
Since the implementations zero the upper lanes when appropriate, we can
remove the unnecessary explicit move.
2023-08-17 12:17:24 -04:00
Lioncache
879bc5176e IR: Add VBroadcastFromMem opcode
Allows the implementations of the vbroadcast instructions to perform the
load and broadcast in one operation as opposed to doing the load and then
broadcast separately.

Notably, the broadcasting loads can also be used on systems that have 128-bit SVE
support, not only 256-bit.

On non-SVE systems, we use the equivalent AdvSIMD instructions.
2023-08-17 12:17:24 -04:00
Ryan Houdek
6d562f8b3b
Merge pull request #2911 from Sonicadvance1/stop_abusing_orr
Arm64: Stop abusing orr in LoadConstant
2023-08-17 09:13:36 -07:00
Ryan Houdek
1029bb1fae
Merge pull request #2910 from Sonicadvance1/minor_bfi_opt
Arm64: Optimize non-optimal BFI move case
2023-08-17 09:12:37 -07:00
Ryan Houdek
1343c14db0
Merge pull request #2909 from Sonicadvance1/optimize_clzero_clear
Arm64: Optimize CacheLine{Clear,Clean}
2023-08-17 09:10:45 -07:00
Lioncache
77c64285cb OpcodeDispatcher: Remove unused variable in AVXVectorUnaryOpImpl
Forgot to remove this when getting rid of the unnecessary
explicit zero-extending behavior.
2023-08-17 11:26:41 -04:00
Ryan Houdek
fe37c89109 FEXCore/Config: Stop making temporary string copies
For config values that were string objects, we were unnecessarily creating
copies each time the string was accessed.

Convert the () operator over to returning a reference.
2023-08-16 21:35:13 -07:00
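The change described above can be illustrated with a minimal sketch. `StringOption` and `AccessesShareStorage` are hypothetical names invented for this example, not FEX's actual config classes; the point is only that a `()` operator returning a const reference hands out the stored string instead of a fresh copy.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a string-valued config entry (not FEX's real class).
class StringOption {
public:
  explicit StringOption(std::string Value) : Value_{std::move(Value)} {}

  // Returning by value would copy the string on every access:
  //   std::string operator()() const { return Value_; }

  // Returning a const reference exposes the stored object without copying.
  const std::string& operator()() const { return Value_; }

private:
  std::string Value_;
};

// Returns true when two accesses refer to the same stored string,
// which is only possible if no temporary copy was made.
inline bool AccessesShareStorage(const StringOption& Option) {
  return &Option() == &Option();
}
```

The usual caveat applies: a caller holding the returned reference must not outlive the option object it came from.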
Ryan Houdek
23fd79a3b3 Arm64: Stop abusing orr in LoadConstant
The current implementation uses orr excessively. This has FEX missing
hardware optimization opportunities where some CPU cores will zero-cycle
move constants that fit into the 16 bits of movz/movk.

First evaluate up front whether the number of 16-bit segments is > 1. In
those cases we should check whether the constant is a bitfield that can be
moved in one instruction with orr.

After that point we will use movz for 16-bit constant moves.

Additionally this optimizes the case where a constant of zero is loaded
to be a `mov <reg>, zr` which gets renamed in most hardware.
2023-08-16 19:35:15 -07:00
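A rough sketch of the segment-based selection described in this commit message follows. `CountNonZeroSegments` and `PlanLoadConstant` are hypothetical helpers written for illustration and are not taken from FEX. The sketch returns mnemonics rather than emitting code, and it deliberately leaves out the check for whether a multi-segment constant fits a single orr.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Counts how many of the four 16-bit segments of Value are non-zero.
inline int CountNonZeroSegments(uint64_t Value) {
  int Count = 0;
  for (int Shift = 0; Shift < 64; Shift += 16) {
    if ((Value >> Shift) & 0xFFFF) {
      ++Count;
    }
  }
  return Count;
}

// Hypothetical planner: lists the mnemonics that would be used to
// materialize Value. Assumption: the single-orr bitfield check that the
// commit performs for multi-segment constants is omitted here.
inline std::vector<std::string> PlanLoadConstant(uint64_t Value) {
  if (Value == 0) {
    // A zero constant becomes a plain move from the zero register.
    return {"mov <reg>, zr"};
  }

  std::vector<std::string> Plan;
  for (int Shift = 0; Shift < 64; Shift += 16) {
    const uint64_t Segment = (Value >> Shift) & 0xFFFF;
    if (Segment == 0) {
      continue;
    }
    // The first non-zero segment uses movz; later ones are merged with movk.
    Plan.push_back(Plan.empty() ? "movz" : "movk");
  }
  return Plan;
}
```

Under these assumptions a constant such as `0x1234` needs a single movz, while `0x0000'1234'0000'ABCD` needs a movz followed by one movk.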
Ryan Houdek
a3b40c37c2 Arm64: Optimize non-optimal BFI move case
Commonly we are doing a BFI into a 32-bit register, which is hitting the
ubfx (lsr alias) path.

In the case of a 32-bit destination we can also do a regular move, which
will take advantage of the CPU's rename functionality and give a minor speed
boost.
2023-08-16 14:35:41 -07:00
Ryan Houdek
4522a766e0 Arm64: Optimize CacheLine{Clear,Clean}
When the cacheline size matches the expected x86 cacheline size then we
can remove the spurious move + add.
2023-08-16 14:20:22 -07:00
Ryan Houdek
fc12958095 FEXCore/IR: Fixes bug in IRDumper without specification
Didn't notice this in the previous PR. When DUMPIR=stderr is set without any
selection of where to place it in PASSMANAGERDUMPIR, it was supposed to
put the dumper at the end of the passes.

We need to make sure that it is placed at the end of the passes rather
than at the current `it`.
2023-08-16 13:51:03 -07:00
Mai
df3d4efc80
Merge pull request #2904 from Sonicadvance1/instcountci_only_arm
Github: Only enable InstCountCI on an ARM platform
2023-08-15 17:33:10 -04:00
Ryan Houdek
1441cb76b9 HostFeatures: Adds support for overriding ARMv8.1 LSE atomics
Always enable it on the InstCountCI.
2023-08-15 14:12:27 -07:00
Lioncache
17956eac5f OpcodeDispatcher: Eliminate unnecessary moves in AVXVectorUnaryOpImpl
We no longer need to do any manual zero-extending here, since this
will occur automatically on hardware with SVE when 128-bit AdvSIMD
is used.
2023-08-15 15:43:34 -04:00
Lioncache
2708374d95 Arm64/VectorOps: Remove redundant move in VFRSqrt SVE path
We can perform the SQRT first and then broadcast 1.0 into the destination
since all the intermediary work is done, meaning we don't have to worry
about Dst and Vector aliasing one another.
2023-08-15 15:22:21 -04:00
Lioncache
6acce60855 ARMEmitter: Handle SVE load and broadcast element group
These can be used to improve vbroadcast implementations from
doing a mem load+dup in the non-GPR case into just directly
loading into the destination.
2023-08-15 13:47:12 -04:00
Lioncache
81115f64f6 ARMEmitter: Handle SVE Store Multiple Structures (scalar plus scalar) 2023-08-15 10:18:37 -04:00
Lioncache
0176efa3bb ARMEmitter: Handle SVE Load Multiple Structures (scalar plus scalar) group 2023-08-15 10:01:15 -04:00
Ryan Houdek
398e76be89 X86Tables: Fixes typo 2023-08-14 16:04:05 -07:00
Ryan Houdek
f248e7f3e7 Config: If DumpIR is enabled, default enable a passmanager option
If DumpIR is enabled but the PassManagerDumpIR option isn't enabled then
this currently does nothing.

As a convenience, enable dumping the final optimized IR if an option
hasn't been specified.
2023-08-14 12:29:56 -07:00
Ryan Houdek
e51606c669 Config: Fixes mixup in PassManagerDumpIR
The opt and pass options were inverted in PassManager.
Renames the enum to make this more clear.
2023-08-14 12:28:37 -07:00
Ryan Houdek
648d8aeb65 Config: Adds missing server option to DumpIR description
This was accepted but I failed to describe it when added.
2023-08-14 12:22:35 -07:00
Ryan Houdek
112c463655 Config: Ensure OutputLog to server doesn't try to expand path
"server" isn't a path; this was missed when it was added.
2023-08-14 12:20:58 -07:00
Alyssa Rosenzweig
7ecbbd6c04 ConstProp: Fix set-but-not-used mask variable
I think this was the intended logic?

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-14 11:59:59 -04:00
Ryan Houdek
ef3887ca4f X86Tables: Fixes typo in VEX table 2023-08-13 12:32:14 -07:00
Lioncache
8fd810c3c0 ARMEmitter: Migrate adr off SVEMemOperand
We need to move the modifier enum out of the SVEMemOperand class
since it's also used with adr. Plus, this can also be convenient
not being tied down to the class itself.

This also makes accessing modifiers less noisy, since the class
2023-08-11 22:17:46 -04:00
Lioncache
068db933bf ARMEmitter: Handle SVE ADR 2023-08-11 19:50:21 -04:00
Lioncache
0bf74a1f3e ARMEmitter: Use signed imm8 handler with dup_imm
Lets us deduplicate the behavior used for dup and cpy.
2023-08-11 18:30:45 -04:00
Lioncache
78f06c7fcb ARMEmitter: Handle SVE CPY (immediate)
Also adds the relevant aliases.
2023-08-11 18:24:26 -04:00
Ryan Houdek
6d1fcfce09
Merge pull request #2877 from Sonicadvance1/classification_adds
InstructionCountCI: Adds three more instruction tables
2023-08-11 15:09:29 -07:00
Ryan Houdek
5a0a6dd0ca
Merge pull request #2880 from lioncash/vfp
ARMEmitter: Migrate off vixl float utils
2023-08-11 14:35:23 -07:00
Mai
da17e24996
Merge pull request #2879 from Sonicadvance1/fix_adcx
FEXCore: Fixes bug with 32-bit adcx
2023-08-11 17:27:00 -04:00
Lioncache
aef0795dc8 ARMEmitter: Make FloatToEquivalentUInt a little more robust
Rather than compare sizes, we should be comparing the types directly, which
prevents any shenanigans from happening if interface changes occur to
Float16.
2023-08-11 17:16:14 -04:00
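The difference between dispatching on size and dispatching on type can be sketched as below. The `Float16` struct and this version of `FloatToEquivalentUInt` are assumptions made for illustration (the real definitions may differ), and the bit patterns assume IEEE-754 `float` and `double`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical 16-bit float wrapper; only its raw bits matter here.
struct Float16 {
  uint16_t Bits;
};

// Selects the integer type by comparing types, not sizeof(T). A size-based
// check would silently pick the wrong branch if Float16 ever grew a member.
template <typename T>
inline auto FloatToEquivalentUInt(T Value) {
  if constexpr (std::is_same_v<T, Float16>) {
    return Value.Bits;
  } else if constexpr (std::is_same_v<T, float>) {
    uint32_t Result;
    std::memcpy(&Result, &Value, sizeof(Result));
    return Result;
  } else {
    static_assert(std::is_same_v<T, double>, "Unsupported floating-point type");
    uint64_t Result;
    std::memcpy(&Result, &Value, sizeof(Result));
    return Result;
  }
}
```

With this shape, passing any type other than the three listed fails at compile time instead of being matched by an accidental size coincidence.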
Lioncache
f63498a558 ARMEmitter: Migrate off vixl float utils 2023-08-11 17:12:34 -04:00
Ryan Houdek
9aa3fde174 FEXCore: Fixes bug with 32-bit adcx
When a 32-bit adcx instruction was encountered, it was getting treated
as a 16-bit adcx instruction instead. This is because of the 0x66 prefix
required to handle this instruction.

Adds a unit test to ensure it doesn't break again.
2023-08-11 14:09:31 -07:00
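The class of bug described here can be shown with a generic sketch. It is not FEX's decoder: `DecodedPrefixes` and `OperandSizeBits` are hypothetical names, and the logic only illustrates that a 0x66 byte which is part of the opcode encoding (as it is for adcx) must not be treated as an operand-size override.

```cpp
#include <cassert>

// Hypothetical summary of the prefixes seen while decoding one instruction.
struct DecodedPrefixes {
  bool Has66;   // a 0x66 byte was present
  bool HasRexW; // REX.W was set
};

// Returns the operand size in bits. When the 0x66 byte is a mandatory part
// of the opcode, it selects the instruction and leaves the size at 32 bits.
inline int OperandSizeBits(const DecodedPrefixes& Prefixes, bool Prefix66IsMandatory) {
  if (Prefixes.HasRexW) {
    return 64;
  }
  if (Prefixes.Has66 && !Prefix66IsMandatory) {
    return 16;
  }
  return 32;
}
```

In this model, treating the mandatory prefix as an override (passing `false` for `Prefix66IsMandatory`) reproduces the 16-bit misdecode that the commit describes.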
Ryan Houdek
8fce13386a
Merge pull request #2878 from lioncash/fcpy
ARMEmitter: Handle SVE FCPY (predicated)
2023-08-11 13:58:31 -07:00
Lioncache
247c7ce784 ARMEmitter: Handle SVE FCPY (predicated)
While we're at it, we can reduce our dependence on vixl's utils by
implementing our own based off the pseudocode of VFPExpandImm.
2023-08-11 16:18:20 -04:00
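For reference, the single-precision case of the Arm VFPExpandImm pseudocode can be sketched as follows. This is an independent illustration with a hypothetical function name, not the helper added in this commit, and it assumes `float` is an IEEE-754 binary32 value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Expands an 8-bit floating-point immediate to a 32-bit float:
//   sign     = imm8<7>
//   exponent = NOT(imm8<6>) : Replicate(imm8<6>, 5) : imm8<5:4>
//   fraction = imm8<3:0> : Zeros(19)
inline float VFPExpandImm32(uint8_t Imm8) {
  const uint32_t Sign = (Imm8 >> 7) & 1u;
  const uint32_t Bit6 = (Imm8 >> 6) & 1u;
  const uint32_t Exponent = ((Bit6 ^ 1u) << 7) | ((Bit6 ? 0x1Fu : 0u) << 2) | ((Imm8 >> 4) & 0x3u);
  const uint32_t Fraction = static_cast<uint32_t>(Imm8 & 0xFu) << 19;
  const uint32_t Bits = (Sign << 31) | (Exponent << 23) | Fraction;

  float Result;
  std::memcpy(&Result, &Bits, sizeof(Result));
  return Result;
}
```

For example, the immediate `0x70` expands to 1.0 and `0x00` expands to 2.0. An emitter would typically need the inverse mapping as well, checking whether a given float is representable as such an immediate.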
Ryan Houdek
e8e52af2e8 CodeSizeValidation: Adds support for overriding CLZero support
The vixl simulator doesn't support this by default.
2023-08-11 11:12:46 -07:00
Ryan Houdek
b8b4dd8008 FEXCore/Utils: Add the ability to write a fextl::string 2023-08-11 09:08:54 -07:00
Ryan Houdek
ec8855f8fb Arm64: Consolidate simulator and disassembler code into Arm64Emitter
This was confusingly split between Arm64Emitter, Arm64Dispatcher, and
Arm64JIT.

- Arm64JIT objects were unnecessary and free to be deleted.
- Arm64Dispatcher simulator and decoder moved to Arm64Emitter
- Arm64Emitter disassembler and decoder renamed
  - Dropped usage of the PrintDisassembler since it is hardcoded to go
    through a FILE* type
  - We instead want its output to go through LogMan, which means using a
    split Decoder+Disassembler object pair.
  - Can't reuse the object from the vixl simulator since the simulator
    registers the decoder as a visitor, causing the simulator to execute
    while disassembling instructions if reused.
- Disassembly for blocks and dispatcher is now output through LogMan
  - Blocks wrapped in Begin/End text for tracking purposes for CI.
2023-08-11 08:05:10 -07:00
Ryan Houdek
969ad9b3b0
Merge pull request #2869 from Sonicadvance1/sve_128bit_ci
Github: Adds a CI runner for 128-bit SVE testing
2023-08-11 07:37:47 -07:00
Ryan Houdek
5de7eeea20
Merge pull request #2876 from lioncash/comment
ARMEmitter: Remove resolved TODO comment
2023-08-11 07:22:19 -07:00