7054 Commits

Author SHA1 Message Date
Ryan Houdek
e8e52af2e8 CodeSizeValidation: Adds support for overriding CLZero support
Vixl simulator by default doesn't support this.
2023-08-11 11:12:46 -07:00
Ryan Houdek
6c7371af60
Merge pull request #2872 from Sonicadvance1/instruction_count_ci
FEX: Adds instruction count CI
2023-08-11 11:12:00 -07:00
Ryan Houdek
acc7f2fa8f FEX: Adds instruction count CI
Implements CI for tracking instruction counts of generated blocks of
code when translating from x86 to ARM64 assembly.

This will end up encompassing every instruction in our instruction
tables similarly to how our assembly tests try to test everything in our
instruction tables.

Incidentally, the data for this CI is generated using our assembly
tests. Running a suite of instructions with disassembly and instruction
stats enabled produces the stats that get added to a JSON file.

The current implementation only covers the SecondGroup table of
instructions because it is a relatively small table and has known
inefficiencies in the instruction implementations. Once this is merged I
will add more instruction tables to additional JSON files for testing.

These JSON files will support adjusting CPU features regardless of the
host features, so the CI can test implementations that depend on
different CPU features. This will let us test things like one instruction
having different "optimal" implementations depending on whether the host
supports SVE128, SVE256, SVEI8MM, etc.

This initial instruction audit is what found the bug in our vector
shift-by-zero instructions. Inspecting the results of the CI run shows
that these instructions still aren't "optimal", because they perform
loads and stores that can be eliminated.

The "Optimal" field in the JSON exists purely for human readability and
grepping, to see what is optimal versus not. The same goes for the
"Comment" field.

According to my auditing spreadsheet, the total number of instructions
that will end up in these json files will be about 1000, but we will
likely end up with more since there will be edge cases that can be more
optimal depending on arguments.
2023-08-11 09:10:36 -07:00
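The instruction count CI described above boils down to comparing freshly generated counts against recorded expectations. The sketch below is a minimal illustration under stated assumptions: the `CountTable` type, the `FindCountMismatches` function, and the instruction names are hypothetical, and the real JSON schema is not reproduced here.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one JSON file: key = x86 instruction entry,
// value = expected number of ARM64 instructions. Not FEX's actual schema.
using CountTable = std::map<std::string, int>;

// Returns the entries whose generated count differs from the expectation,
// or which produced no result at all. Output follows the map's key order.
std::vector<std::string> FindCountMismatches(const CountTable& Expected,
                                             const CountTable& Actual) {
  std::vector<std::string> Mismatches;
  for (const auto& [Name, ExpectedCount] : Expected) {
    auto It = Actual.find(Name);
    if (It == Actual.end() || It->second != ExpectedCount) {
      Mismatches.push_back(Name);
    }
  }
  return Mismatches;
}
```

Treating a missing entry as a mismatch keeps the check strict, so an instruction silently dropped from the test run also fails CI.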
Ryan Houdek
b8b4dd8008 FEXCore/Utils: Add the ability to write a fextl::string 2023-08-11 09:08:54 -07:00
Ryan Houdek
ec8855f8fb Arm64: Consolidate simulator and disassembler code into Arm64Emitter
This was confusingly split between Arm64Emitter, Arm64Dispatcher, and
Arm64JIT.

- Arm64JIT objects were unnecessary and free to be deleted.
- Arm64Dispatcher simulator and decoder moved to Arm64Emitter
- Arm64Emitter disassembler and decoder renamed
  - Dropped usage of the PrintDisassembler since it is hardcoded to go
    through a FILE* type
  - We instead want its output to go through LogMan, which means using a
    split Decoder+Disassembler object pair.
  - Can't reuse the object from the vixl simulator since the simulator
    registers the decoder as a visitor, causing the simulator to execute
    while disassembling instructions if reused.
- Disassembly output for blocks and the dispatcher now goes through LogMan
  - Blocks wrapped in Begin/End text for tracking purposes for CI.
2023-08-11 08:05:10 -07:00
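The commit above wraps each block's disassembly in Begin/End text so CI can track it. A minimal sketch of how a consumer might count instructions per block, assuming marker lines that simply start with `Begin` and `End`; the exact marker text FEX logs and the `CountInstructionsPerBlock` helper are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Counts non-empty lines between assumed "Begin..." and "End..." marker
// lines. Lines outside any Begin/End pair are ignored.
std::vector<size_t> CountInstructionsPerBlock(const std::string& Log) {
  std::vector<size_t> Counts;
  std::istringstream Stream(Log);
  std::string Line;
  bool InBlock = false;
  size_t Count = 0;
  while (std::getline(Stream, Line)) {
    if (Line.rfind("Begin", 0) == 0) {        // line starts with "Begin"
      InBlock = true;
      Count = 0;
    } else if (Line.rfind("End", 0) == 0) {   // line starts with "End"
      if (InBlock) {
        Counts.push_back(Count);
      }
      InBlock = false;
    } else if (InBlock && !Line.empty()) {
      ++Count;                                // one disassembled instruction
    }
  }
  return Counts;
}
```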
Ryan Houdek
969ad9b3b0
Merge pull request #2869 from Sonicadvance1/sve_128bit_ci
Github: Adds a CI runner for 128-bit SVE testing
2023-08-11 07:37:47 -07:00
Ryan Houdek
5de7eeea20
Merge pull request #2876 from lioncash/comment
ARMEmitter: Remove resolved TODO comment
2023-08-11 07:22:19 -07:00
Ryan Houdek
0109e88082
Merge pull request #2875 from lioncash/ff
ARMEmitter: Handle contiguous first fault load (scalar plus scalar) group
2023-08-11 07:09:32 -07:00
Lioncache
73288f377f ARMEmitter: Remove resolved TODO comment
I forgot to remove this after implementing the normal gather instruction handling.
2023-08-11 10:03:10 -04:00
Lioncache
0aaa9503c9 ARMEmitter: Add missing ld1w (scalar plus scalar) tests
ld1sw was mistakenly tested twice.

Also groups the tests by data sizes.
2023-08-11 09:51:04 -04:00
Lioncache
14cc23b6c3 ARMEmitter: Handle contiguous first fault load (scalar plus scalar) group
Adds the only missing implementation category for the first-faulting loads,
making the interface more consistent.
2023-08-11 09:46:40 -04:00
Ryan Houdek
5404dba360 Github: Adds a CI runner for 128-bit SVE testing
We don't currently have a device in CI that can run SVE with 128-bit
width registers. Until we have a device with this, make sure the vixl
simulator is also running the ASM tests in this width.
2023-08-10 22:27:59 -07:00
Ryan Houdek
833c07e9e2 CoreState: Zero initialize some important members
This was causing test failures locally where some values were left
uninitialized. Ensure that gregs, YMM, and MMX registers are all
zero initialized.
2023-08-10 22:27:59 -07:00
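The CoreState fix above relies on zero initializing register arrays. A minimal sketch of one idiomatic way to do that in C++, using default member initializers; `CPUStateSketch` and its layout are hypothetical and much simpler than FEX's real CoreState.

```cpp
#include <cstdint>

// Hypothetical, simplified CPU state. The empty braces are default member
// initializers, so every element is zeroed even when an object is declared
// without an initializer (e.g. `CPUStateSketch S;`).
struct CPUStateSketch {
  uint64_t gregs[16]{};   // general purpose registers
  uint64_t ymm[16][4]{};  // 16 YMM registers, 256 bits each
  uint64_t mm[8]{};       // MMX registers, 64 bits each
};
```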
Ryan Houdek
186ec201aa Config: Stop passing a temporary std::string_view outside of scope
strenum variables were parsed through a temporary std::string_view, and
leaving scope would break the string it referred to.
2023-08-10 22:27:59 -07:00
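The Config fix above concerns a `std::string_view` outliving the string it points into. A minimal sketch of the hazard and the usual remedy, owning the characters; `ConfigEntry` and `ParseEntry` are hypothetical names, not FEX's Config code.

```cpp
#include <string>
#include <string_view>

// Hypothetical config entry. It owns its characters; a std::string_view
// member would dangle if it pointed into a string destroyed in ParseEntry.
struct ConfigEntry {
  std::string Value;
};

ConfigEntry ParseEntry(std::string_view Raw) {
  // Temporary owned by this scope; destroyed when the function returns.
  std::string Trimmed(Raw);
  while (!Trimmed.empty() && Trimmed.back() == ' ') {
    Trimmed.pop_back();
  }
  // Unsafe alternative: keeping std::string_view{Trimmed} would leave the
  // entry pointing at freed memory once Trimmed goes out of scope.
  return ConfigEntry{Trimmed};  // copy into owning storage
}
```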
Ryan Houdek
887c47c451 Config: Adds an option to override SVE width for CI 2023-08-10 21:25:57 -07:00
Ryan Houdek
0f3460e025 Config: Fixes typo in HostFeatures disable{sve,avx} 2023-08-10 21:25:57 -07:00
Ryan Houdek
fadba9a3e1 External: Update vixl
Fixes simulator bug
2023-08-10 21:25:57 -07:00
Ryan Houdek
9d26af95ab
Merge pull request #2873 from neobrain/refactor_warning_fixes
Various warning fixes
2023-08-10 16:58:14 -07:00
Tony Wasserka
aed4dda3e4 Arm64: Remove unused function 2023-08-10 18:45:14 +02:00
Tony Wasserka
e0d21e61cc Syscalls: Fix warnings about unused variables in Release builds 2023-08-10 18:45:14 +02:00
Tony Wasserka
45d0f0d349 ARMEmitter: Fix warnings about unused variables in Release builds 2023-08-10 18:45:14 +02:00
Tony Wasserka
f1cc76614b Include VIXL as a system library
This suppresses warnings from VIXL headers.
2023-08-10 18:45:14 +02:00
Ryan Houdek
099f29f1ed
Merge pull request #2871 from Sonicadvance1/fix_stats_missing_member
FEXCore: Fixes Arm64 stats disassembly
2023-08-10 06:23:02 -07:00
Ryan Houdek
b1a3f82923 FEXCore: Fixes Arm64 stats disassembly
Requires the IR headerop to house the number of host instructions this
code is translating for the stats.

Fixes compiling with disassembly enabled; this will be used by the
instruction count CI.
2023-08-10 03:23:25 -07:00
Ryan Houdek
6f4a23dd15
Merge pull request #2870 from lioncash/indexed
ARMEmitter: Handle SVE FP multiply-add long groups
2023-08-09 21:35:33 -07:00
Ryan Houdek
f3182036bc
Merge pull request #2867 from Sonicadvance1/dummy_thin_handlers
FEX: Create a CommonTools static library
2023-08-09 21:34:46 -07:00
Lioncache
444961ad79 ARMEmitter: Handle SVE FP multiply-add long group 2023-08-09 15:20:04 -04:00
Lioncache
48a3271fbc ARMEmitter: Handle SVE FP multiply-add long (indexed) group 2023-08-09 15:19:49 -04:00
Mai
ea8fbc61c2
Merge pull request #2868 from Sonicadvance1/irdumper_passmanager
IR: Adds Option to run the IRDumper with more configurations
2023-08-09 10:28:25 -04:00
Ryan Houdek
35e97ec9bc IR: Adds Option to run the IRDumper with more configurations
This is incredibly useful and I find myself hacking this feature in
every time I am optimizing IR. Adds a new configuration option which
allows dumping IR at various points:

- Before any optimization passes have happened
- After all optimization passes have happened
- Before and after each IR pass, to see what is breaking something

Needs #2864 merged first
2023-08-09 05:58:20 -07:00
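The three dump points described in the commit above map naturally onto a bitmask checked by a pass manager. A minimal sketch, assuming IR is modeled as a string and each pass as a function; `DumpFlags`, `Pass`, and `RunPasses` are hypothetical and do not reflect FEX's actual option names or PassManager API.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical dump flags, one per dump point.
enum DumpFlags : uint32_t {
  DUMP_BEFORE_OPT  = 1u << 0,  // before any pass runs
  DUMP_AFTER_OPT   = 1u << 1,  // after all passes have run
  DUMP_AROUND_PASS = 1u << 2,  // before and after each individual pass
};

struct Pass {
  std::string Name;
  std::function<void(std::string&)> Run;  // mutates the IR in place
};

// Runs every pass over IR and returns the dumps requested by Flags.
std::vector<std::string> RunPasses(std::string& IR,
                                   const std::vector<Pass>& Passes,
                                   uint32_t Flags) {
  std::vector<std::string> Dumps;
  if (Flags & DUMP_BEFORE_OPT) Dumps.push_back("Before opt: " + IR);
  for (const auto& P : Passes) {
    if (Flags & DUMP_AROUND_PASS) Dumps.push_back("Before " + P.Name + ": " + IR);
    P.Run(IR);
    if (Flags & DUMP_AROUND_PASS) Dumps.push_back("After " + P.Name + ": " + IR);
  }
  if (Flags & DUMP_AFTER_OPT) Dumps.push_back("After opt: " + IR);
  return Dumps;
}
```

Dumping around each pass is the mode that isolates which pass breaks something, since the first "After" dump that looks wrong names the culprit.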
Ryan Houdek
53ac8abce9
Merge pull request #2863 from Sonicadvance1/stats
Arm64: Adds stats to the disassembly
2023-08-09 04:06:22 -07:00
Ryan Houdek
fe351353f6
Merge pull request #2865 from Sonicadvance1/first_sve_opt
Arm64: Implement first SVE-128bit optimization
2023-08-09 04:06:05 -07:00
Ryan Houdek
a23cb0447b Arm64: Implement first SVE-128bit optimization
This is a /very/ simple optimization purely because of a choice that ARM
made with SVE in the latest Cortex cores.

Cortex-A715:
   - sxtl/sxtl2/uxtl/uxtl2 can execute 1 instruction per cycle.
   - sunpklo/sunpkhi/uunpklo/uunpkhi can execute 2 instructions per cycle.

Cortex-X3:
   - sxtl/sxtl2/uxtl/uxtl2 can execute 2 instructions per cycle.
   - sunpklo/sunpkhi/uunpklo/uunpkhi can execute 4 instructions per cycle.

This is fairly quirky since this optimization only works on SVE systems
with a 128-bit vector length. Since that covers all current consumer
SVE platforms, it works in practice.
2023-08-09 03:51:57 -07:00
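The optimization above swaps NEON widening instructions for their SVE unpack equivalents. A minimal instruction-selection sketch, assuming a flag that reports 128-bit SVE; `WidenOp` and `SelectWiden` are hypothetical, not FEX's emitter code. At a 128-bit vector length the Z register holds exactly the 128-bit V register, so sunpklo/sunpkhi match sxtl/sxtl2 (and uunpklo/uunpkhi match uxtl/uxtl2). With a wider vector length, sunpkhi would read the upper half of the wider register rather than bytes 8-15, which is one reason the substitution is limited to 128-bit.

```cpp
// Hypothetical selection between NEON extends and SVE unpacks.
enum class WidenOp { Sxtl, Sxtl2, Uxtl, Uxtl2, Sunpklo, Sunpkhi, Uunpklo, Uunpkhi };

WidenOp SelectWiden(bool Signed, bool HighHalf, bool HostSupportsSVE128) {
  if (HostSupportsSVE128) {
    // Higher throughput on the cores listed above, same result at VL=128.
    if (Signed) return HighHalf ? WidenOp::Sunpkhi : WidenOp::Sunpklo;
    return HighHalf ? WidenOp::Uunpkhi : WidenOp::Uunpklo;
  }
  // Fallback: plain AdvSIMD widening.
  if (Signed) return HighHalf ? WidenOp::Sxtl2 : WidenOp::Sxtl;
  return HighHalf ? WidenOp::Uxtl2 : WidenOp::Uxtl;
}
```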
Ryan Houdek
f2aa2ce4bb Arm64: Rename HostSupportsSVE
We need to know the difference between the host supporting SVE with
128-bit registers versus 256-bit registers. Ensure we know the
difference.

No functional change here.
2023-08-09 03:51:56 -07:00
Ryan Houdek
cf93652708 Config: Adds support for overriding host features
This allows us to both enable and disable features regardless of what the
host supports. This replaces the old `EnableAVX` option.

Unlike the old EnableAVX option, which was a binary option that could
only disable, each of these options is technically a trinary state.
Not setting an option gives you the default detection, while explicitly
enabling or disabling will toggle the option regardless of what the host
supports.

This will be used by the instruction count CI in the future.
2023-08-09 03:51:37 -07:00
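The trinary behavior described above (unset, forced on, forced off) maps directly onto `std::optional<bool>`. A minimal sketch, assuming each override is stored that way; `ResolveHostFeature` is a hypothetical name, not FEX's config code.

```cpp
#include <optional>

// Hypothetical resolution of one host feature override:
//   std::nullopt -> option not set: use what host detection reported
//   true / false -> force enable / disable, regardless of the host
bool ResolveHostFeature(bool HostSupports, std::optional<bool> Override) {
  return Override.value_or(HostSupports);
}
```

Forcing a feature on that the host lacks is only safe when execution happens somewhere that does support it (for example a simulator) or when only code generation is being inspected, as in the instruction count CI.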
Ryan Houdek
eaed5c4704
Merge pull request #2862 from Sonicadvance1/optimize_vector_zero
ARM64: Optimize vector zeroing
2023-08-09 03:51:04 -07:00
Mai
c77ed78f5a
Merge pull request #2861 from Sonicadvance1/fix_vector_shift_by_zero
FEXCore: Fixes vector shifts by zero
2023-08-09 05:52:10 -04:00
Ryan Houdek
348844a95b FEX: Create a CommonTools static library
Moves the dummy handlers over to this library. This will end up getting
used for more than the mingw test harness runner once the instruction
count CI is operational.
2023-08-09 02:27:13 -07:00
Ryan Houdek
e8fb322025 unittests: Adds tests for vector shifts with zero immediate
To ensure FEX doesn't encounter the encoding bug again.
2023-08-09 02:16:17 -07:00
Ryan Houdek
5f0efda8fe ARM64: Fixes shift by immediate zero
These would emit invalid instructions in most cases. Turn them into a
move, or a no-op, if the shift is zero.
2023-08-09 02:16:17 -07:00
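The fix above lowers a zero shift to a move or nothing. A minimal sketch for a vector logical right shift, where AArch64 USHR only encodes immediates from 1 up to the element size, so a shift of zero has no valid encoding. `Lowering` and `LowerVUShrImm` are hypothetical names, not FEX's emitter API.

```cpp
#include <cstdint>

// Hypothetical lowering decision for a vector logical right shift by an
// immediate.
enum class Lowering { Nothing, Move, UShr };

Lowering LowerVUShrImm(uint32_t DstReg, uint32_t SrcReg, uint32_t Shift) {
  if (Shift == 0) {
    // A zero shift leaves the value unchanged: copy it if the registers
    // differ, otherwise emit nothing at all.
    return DstReg == SrcReg ? Lowering::Nothing : Lowering::Move;
  }
  return Lowering::UShr;  // encodable: 1 <= Shift <= element size
}
```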
Ryan Houdek
d198d701aa OpcodeDispatcher: Fixes vector shifts by immediate zero
pslldq logic was wrong in the case of zero shift.
The rest should just return their source in the case of zero shift.
2023-08-09 02:16:17 -07:00
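For reference, the guest semantics the OpcodeDispatcher fix must match can be written out directly. A minimal model of x86 PSLLDQ, which shifts the whole 128-bit register left by an immediate number of bytes, shifts in zeros, and clears the register for immediates above 15. `Vec128` and `Pslldq` are hypothetical helpers for illustration, not FEX code.

```cpp
#include <array>
#include <cstdint>

// 128-bit XMM value as 16 bytes; index 0 is the least significant byte.
using Vec128 = std::array<uint8_t, 16>;

Vec128 Pslldq(const Vec128& Src, uint8_t Imm) {
  if (Imm == 0) {
    return Src;  // a zero shift must return the source unchanged
  }
  Vec128 Result{};  // value-initialized: all bytes zero
  if (Imm > 15) {
    return Result;  // every byte shifted out
  }
  for (int i = Imm; i < 16; ++i) {
    Result[i] = Src[i - Imm];  // move each byte up by Imm positions
  }
  return Result;
}
```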
Mai
c4c7620ed5
Merge pull request #2866 from Sonicadvance1/remove_unnecessary_loadconstant
Arm64: Remove erroneous LoadConstant
2023-08-09 05:10:11 -04:00
Ryan Houdek
bf5719770e Arm64: Remove erroneous LoadConstant
This was a debug LoadConstant that would load the entry into a temporary
register to make it easier to see what RIP a block was in.

This was implemented when FEX stopped storing the RIP in the CPU state
for every block. This is no longer necessary since FEX stores the RIP
in the tail data of the block.

This was affecting the instruction count CI in debug builds.
2023-08-08 22:56:36 -07:00
Ryan Houdek
0f6a268243 Arm64: Adds stats to the disassembly
I use this locally when looking for optimization opportunities in the
JIT.
The instruction count CI in the future will use this as well.
Just get it upstreamed right away.
2023-08-08 22:28:52 -07:00
Ryan Houdek
e0461497a0 ARM64: Optimize vector zeroing
`eor <reg>, <reg>, <reg>` is not the optimal way to zero a vector
register on ARM CPUs. Instead we should move by constant or zero
register to take advantage of zero-latency moves.
2023-08-08 22:24:11 -07:00
Ryan Houdek
e9e5b6fb0b Docs: Update for release FEX-2308 FEX-2308 2023-08-06 02:34:55 -07:00
Ryan Houdek
68cb6e61d1
Merge pull request #2860 from lioncash/alias
ARMEmitter: Add missing atomic aliases
2023-08-06 02:05:30 -07:00
Mai
5d0b2060e2
Merge pull request #2858 from Sonicadvance1/fix_clzero
X86Tables: Fixes CLZero destination address
2023-08-06 05:05:07 -04:00
Lioncache
b7d05a65c7 ARMEmitter: Add missing atomic aliases 2023-08-04 21:49:59 -04:00
Lioncache
93fe2fe06c ARMEmitter: Detemplatize LoadStoreAtomicLSE
Lets us lessen some template instantiations.
2023-08-04 19:47:03 -04:00