4074 Commits

Author SHA1 Message Date
Ryan Houdek
ef3887ca4f X86Tables: Fixes typo in VEX table 2023-08-13 12:32:14 -07:00
Lioncache
8fd810c3c0 ARMEmitter: Migrate adr off SVEMemOperand
We need to move the modifier enum out of the SVEMemOperand class
since it's also used with adr. It is also convenient for the enum
not to be tied down to the class itself.

This also makes accessing modifiers less noisy, since the class
2023-08-11 22:17:46 -04:00
Lioncache
068db933bf ARMEmitter: Handle SVE ADR 2023-08-11 19:50:21 -04:00
Lioncache
0bf74a1f3e ARMEmitter: Use signed imm8 handler with dup_imm
Lets us deduplicate the behavior used for dup and cpy.
2023-08-11 18:30:45 -04:00
Lioncache
78f06c7fcb ARMEmitter: Handle SVE CPY (immediate)
Also adds the relevant aliases.
2023-08-11 18:24:26 -04:00
Ryan Houdek
6d1fcfce09
Merge pull request #2877 from Sonicadvance1/classification_adds
InstructionCountCI: Adds three more instruction tables
2023-08-11 15:09:29 -07:00
Ryan Houdek
5a0a6dd0ca
Merge pull request #2880 from lioncash/vfp
ARMEmitter: Migrate off vixl float utils
2023-08-11 14:35:23 -07:00
Mai
da17e24996
Merge pull request #2879 from Sonicadvance1/fix_adcx
FEXCore: Fixes bug with 32-bit adcx
2023-08-11 17:27:00 -04:00
Lioncache
aef0795dc8 ARMEmitter: Make FloatToEquivalentUInt a little more robust
Rather than compare sizes, we should be comparing the types directly.
This prevents any shenanigans from happening if interface changes occur to
Float16.
2023-08-11 17:16:14 -04:00
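A minimal sketch of the type-based dispatch described in aef0795dc8, not the actual FEX implementation. The `Float16` struct here is a hypothetical stand-in, and the function body is an assumption about how such a helper could look. Matching on the type itself means a change to the layout of `Float16` cannot silently select the wrong branch, which a `sizeof` comparison could.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a half-precision wrapper type.
struct Float16 {
  uint16_t Bits;
};

// Returns the raw bit pattern of a floating-point value as the unsigned
// integer of matching width. Branches are chosen by type, not by size.
template<typename T>
auto FloatToEquivalentUInt(T Value) {
  if constexpr (std::is_same_v<T, Float16>) {
    return Value.Bits;
  } else if constexpr (std::is_same_v<T, float>) {
    uint32_t Result;
    std::memcpy(&Result, &Value, sizeof(Result));
    return Result;
  } else {
    static_assert(std::is_same_v<T, double>, "Unsupported floating-point type");
    uint64_t Result;
    std::memcpy(&Result, &Value, sizeof(Result));
    return Result;
  }
}
```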
Lioncache
f63498a558 ARMEmitter: Migrate off vixl float utils 2023-08-11 17:12:34 -04:00
Ryan Houdek
9aa3fde174 FEXCore: Fixes bug with 32-bit adcx
When a 32-bit adcx instruction was encountered, it was getting treated
as a 16-bit adcx instruction instead. This is because of the mandatory
0x66 prefix in this instruction's encoding.

Adds a unit test to ensure it doesn't break again.
2023-08-11 14:09:31 -07:00
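The adcx bug fixed in 9aa3fde174 can be illustrated with a small sketch. This is not FEX's decoder: `PrefixState` and both functions are hypothetical names. It only shows why a 0x66 byte that is part of the opcode must not be read as an operand-size override.

```cpp
#include <cstdint>

// Hypothetical summary of the prefixes seen while decoding one instruction.
struct PrefixState {
  bool Has66;   // 0x66 byte was present
  bool HasRexW; // REX.W bit was set
};

// Generic rule for most integer instructions: 0x66 selects 16-bit operands.
constexpr uint32_t GenericOperandSize(PrefixState Prefixes) {
  if (Prefixes.HasRexW) {
    return 8;
  }
  return Prefixes.Has66 ? 2 : 4;
}

// adcx always carries 0x66 as part of its encoding, so only REX.W may
// change the operand size. Applying the generic rule gives 16-bit by mistake.
constexpr uint32_t AdcxOperandSize(PrefixState Prefixes) {
  return Prefixes.HasRexW ? 8 : 4;
}

static_assert(GenericOperandSize({true, false}) == 2, "generic rule picks 16-bit");
static_assert(AdcxOperandSize({true, false}) == 4, "adcx stays 32-bit");
```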
Ryan Houdek
8fce13386a
Merge pull request #2878 from lioncash/fcpy
ARMEmitter: Handle SVE FCPY (predicated)
2023-08-11 13:58:31 -07:00
Lioncache
247c7ce784 ARMEmitter: Handle SVE FCPY (predicated)
While we're at it, we can reduce our dependence on vixl's utils by
implementing our own based off the pseudocode of VFPExpandImm.
2023-08-11 16:18:20 -04:00
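For reference, the VFPExpandImm pseudocode mentioned in 247c7ce784 can be sketched for the 32-bit case as follows. The function name is hypothetical and this is an illustration of the Arm pseudocode, not the code that landed in ARMEmitter.

```cpp
#include <cstdint>
#include <cstring>

// Expands an 8-bit AArch64 floating-point immediate to an IEEE-754 single,
// following the Arm pseudocode for VFPExpandImm with N == 32.
inline float ExpandImm8ToFloat(uint8_t Imm8) {
  const uint32_t Sign = (Imm8 >> 7) & 1;
  const uint32_t Bit6 = (Imm8 >> 6) & 1;
  const uint32_t ExpLow = (Imm8 >> 4) & 0x3; // imm8<5:4>
  const uint32_t Frac = Imm8 & 0xF;          // imm8<3:0>

  // exp = NOT(imm8<6>) : Replicate(imm8<6>, 5) : imm8<5:4>
  const uint32_t Exp = ((Bit6 ^ 1) << 7) | ((Bit6 ? 0x1Fu : 0u) << 2) | ExpLow;

  // frac occupies the top four bits of the 23-bit mantissa.
  const uint32_t Bits = (Sign << 31) | (Exp << 23) | (Frac << 19);

  float Result;
  std::memcpy(&Result, &Bits, sizeof(Result));
  return Result;
}
```

With this expansion, an immediate of 0x70 yields 1.0 and 0x00 yields 2.0.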
Ryan Houdek
e8e52af2e8 CodeSizeValidation: Adds support for overriding CLZero support
Vixl simulator by default doesn't support this.
2023-08-11 11:12:46 -07:00
Ryan Houdek
b8b4dd8008 FEXCore/Utils: Add the ability to write a fextl::string 2023-08-11 09:08:54 -07:00
Ryan Houdek
ec8855f8fb Arm64: Consolidate simulator and disassembler code into Arm64Emitter
This was confusingly split between Arm64Emitter, Arm64Dispatcher, and
Arm64JIT.

- Arm64JIT objects were unnecessary and free to be deleted.
- Arm64Dispatcher simulator and decoder moved to Arm64Emitter
- Arm64Emitter disassembler and decoder renamed
  - Dropped usage of the PrintDisassembler since it is hardcoded to go
    through a FILE* type
  - We instead want its output to go through LogMan, which means using a
    split Decoder+Disassembler object pair.
  - Can't reuse the object from the vixl simulator since the simulator
    registers the decoder as a visitor, causing the simulator to execute
    while disassembling instructions if reused.
- Disassembly output for blocks and dispatcher now output through LogMan
  - Blocks wrapped in Begin/End text for tracking purposes for CI.
2023-08-11 08:05:10 -07:00
Ryan Houdek
969ad9b3b0
Merge pull request #2869 from Sonicadvance1/sve_128bit_ci
Github: Adds a CI runner for 128-bit SVE testing
2023-08-11 07:37:47 -07:00
Ryan Houdek
5de7eeea20
Merge pull request #2876 from lioncash/comment
ARMEmitter: Remove resolved TODO comment
2023-08-11 07:22:19 -07:00
Lioncache
73288f377f ARMEmitter: Remove resolved TODO comment
I forgot to remove this after implementing the normal gather instruction handling.
2023-08-11 10:03:10 -04:00
Lioncache
0aaa9503c9 ARMEmitter: Add missing ld1w (scalar plus scalar) tests
ld1sw was mistakenly tested twice.

Also groups the tests by data sizes.
2023-08-11 09:51:04 -04:00
Lioncache
14cc23b6c3 ARMEmitter: Handle contiguous first fault load (scalar plus scalar) group
Adds the only missing implementation category for the first-faulting loads,
making the interface more consistent.
2023-08-11 09:46:40 -04:00
Ryan Houdek
833c07e9e2 CoreState: Zero initialize some important members
This was causing test failures locally where some values were set to
uninitialized data. Ensure that gregs, YMM, and MMX registers are all
zero initialized.
2023-08-10 22:27:59 -07:00
Ryan Houdek
186ec201aa Config: Stop passing a temporary std::string_view outside of scope
The strenum variables were parsed from the temporary, and the string
then broke once the temporary left scope.
2023-08-10 22:27:59 -07:00
Ryan Houdek
887c47c451 Config: Adds an option to override SVE width for CI 2023-08-10 21:25:57 -07:00
Ryan Houdek
0f3460e025 Config: Fixes typo in HostFeatures disable{sve,avx} 2023-08-10 21:25:57 -07:00
Ryan Houdek
fadba9a3e1 External: Update vixl
Fixes simulator bug
2023-08-10 21:25:57 -07:00
Tony Wasserka
aed4dda3e4 Arm64: Remove unused function 2023-08-10 18:45:14 +02:00
Tony Wasserka
45d0f0d349 ARMEmitter: Fix warnings about unused variables in Release builds 2023-08-10 18:45:14 +02:00
Ryan Houdek
b1a3f82923 FEXCore: Fixes Arm64 stats disassembly
Requires the IR headerop to house the number of host instructions this
code is translating for the stats.

Fixes compiling with disassembly enabled. This will be used with the
instruction count CI.
2023-08-10 03:23:25 -07:00
Lioncache
444961ad79 ARMEmitter: Handle SVE FP multiply-add long group 2023-08-09 15:20:04 -04:00
Lioncache
48a3271fbc ARMEmitter: Handle SVE FP multiply-add long (indexed) group 2023-08-09 15:19:49 -04:00
Ryan Houdek
35e97ec9bc IR: Adds Option to run the IRDumper with more configurations
This is incredibly useful and I find myself hacking this feature in
every time I am optimizing IR. Adds a new configuration option which
allows dumping IR at various times.

- Before any optimization passes have happened
- After all optimization passes have happened
- Before and after each IRPass to see what is breaking something.

Needs #2864 merged first
2023-08-09 05:58:20 -07:00
Ryan Houdek
53ac8abce9
Merge pull request #2863 from Sonicadvance1/stats
Arm64: Adds stats to the disassembly
2023-08-09 04:06:22 -07:00
Ryan Houdek
a23cb0447b Arm64: Implement first SVE-128bit optimization
This is a /very/ simple optimization purely because of a choice that ARM
made with SVE in the latest Cortex cores.

Cortex-A715:
   - sxtl/sxtl2/uxtl/uxtl2 can execute 1 instruction per cycle.
   - sunpklo/sunpkhi/uunpklo/uunpkhi can execute 2 instructions per cycle.

Cortex-X3:
   - sxtl/sxtl2/uxtl/uxtl2 can execute 2 instructions per cycle.
   - sunpklo/sunpkhi/uunpklo/uunpkhi can execute 4 instructions per cycle.

This is fairly quirky since this optimization only works on SVE systems
with a 128-bit vector length. Since that covers all of the current
consumer platforms, it will work.
2023-08-09 03:51:57 -07:00
Ryan Houdek
f2aa2ce4bb Arm64: Rename HostSupportsSVE
We need to know the difference between the host supporting SVE with
128-bit registers versus 256-bit registers. Ensure we know the
difference.

No functional change here.
2023-08-09 03:51:56 -07:00
Ryan Houdek
cf93652708 Config: Adds support for overriding host features
This allows us to both enable and disable regardless of what the host
supports. This replaces the old `EnableAVX` option.

Unlike the old EnableAVX option, which was a binary option that could
only disable, each of these options is technically a trinary state.
Not setting an option gives you the default detection, while explicitly
enabling or disabling will toggle the option regardless of what the host
supports.

This will be used by the instruction count CI in the future.
2023-08-09 03:51:37 -07:00
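The trinary behaviour described in cf93652708 maps naturally onto an optional boolean. The sketch below is an assumption about one way to model it. `ResolveFeature` is a hypothetical name and not FEX's config API.

```cpp
#include <optional>

// An unset override falls back to host detection. An explicit true or false
// wins regardless of what the host reports.
constexpr bool ResolveFeature(std::optional<bool> Override, bool HostDetected) {
  return Override.value_or(HostDetected);
}

static_assert(ResolveFeature(std::nullopt, true), "default follows detection");
static_assert(!ResolveFeature(false, true), "explicit disable wins");
static_assert(ResolveFeature(true, false), "explicit enable wins");
```

The third case is what the old binary option could not express: forcing a feature on when the host does not report it.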
Ryan Houdek
eaed5c4704
Merge pull request #2862 from Sonicadvance1/optimize_vector_zero
ARM64: Optimize vector zeroing
2023-08-09 03:51:04 -07:00
Ryan Houdek
5f0efda8fe ARM64: Fixes shift by immediate zero
These would emit invalid instructions in most cases. Turn into a move
or a no-op if the shift is zero.
2023-08-09 02:16:17 -07:00
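The decision described in 5f0efda8fe can be sketched as a small classifier. The enum and function names are hypothetical, and the real backend works on its own register and IR types rather than plain integers.

```cpp
#include <cstdint>

// Hypothetical outcome of lowering a vector shift by immediate.
enum class ShiftLowering {
  Nop,   // shift of zero, destination already holds the source
  Move,  // shift of zero, copy source to destination
  Shift, // non-zero shift, emit the real shift instruction
};

constexpr ShiftLowering ClassifyImmediateShift(uint32_t DstReg, uint32_t SrcReg, uint32_t ShiftAmount) {
  if (ShiftAmount != 0) {
    return ShiftLowering::Shift;
  }
  // A zero shift never reaches the shift encoder.
  return DstReg == SrcReg ? ShiftLowering::Nop : ShiftLowering::Move;
}
```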
Ryan Houdek
d198d701aa OpcodeDispatcher: Fixes vector shifts by immediate zero
pslldq logic was wrong in the case of zero shift.
The rest should just return their source in the case of zero shift.
2023-08-09 02:16:17 -07:00
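As a reference for the zero-shift case in d198d701aa, here is a scalar model of the x86 pslldq semantics on a 128-bit value. It is a sketch for illustration and not the OpcodeDispatcher code. `ByteShiftLeft` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec128 = std::array<uint8_t, 16>;

// Shifts the 128-bit value left by whole bytes, filling with zeros.
// A shift of zero returns the source unchanged. A shift above 15 gives zero.
inline Vec128 ByteShiftLeft(const Vec128& Src, unsigned Shift) {
  Vec128 Result{};
  if (Shift > 15) {
    return Result;
  }
  for (std::size_t i = Shift; i < Result.size(); ++i) {
    Result[i] = Src[i - Shift];
  }
  return Result;
}
```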
Ryan Houdek
bf5719770e Arm64: Remove erroneous LoadConstant
This was a debug LoadConstant that would load the entry into a temporary
register to make it easier to see what RIP a block was in.

This was implemented when FEX stopped storing the RIP in the CPU state
for every block. This is no longer necessary since FEX stores the RIP
in the tail data of the block.

This was affecting instructioncountci when in a debug build.
2023-08-08 22:56:36 -07:00
Ryan Houdek
0f6a268243 Arm64: Adds stats to the disassembly
I use this locally when looking for optimization opportunities in the
JIT.
The instruction count CI in the future will use this as well.
Just get it upstreamed right away.
2023-08-08 22:28:52 -07:00
Ryan Houdek
e0461497a0 ARM64: Optimize vector zeroing
`eor <reg>, <reg>, <reg>` is not the optimal way to zero a vector
register on ARM CPUs. Instead we should move by constant or zero
register to take advantage of zero-latency moves.
2023-08-08 22:24:11 -07:00
Ryan Houdek
68cb6e61d1
Merge pull request #2860 from lioncash/alias
ARMEmitter: Add missing atomic aliases
2023-08-06 02:05:30 -07:00
Mai
5d0b2060e2
Merge pull request #2858 from Sonicadvance1/fix_clzero
X86Tables: Fixes CLZero destination address
2023-08-06 05:05:07 -04:00
Lioncache
b7d05a65c7 ARMEmitter: Add missing atomic aliases 2023-08-04 21:49:59 -04:00
Lioncache
93fe2fe06c ARMEmitter: Detemplatize LoadStoreAtomicLSE
Lets us lessen some template instantiations.
2023-08-04 19:47:03 -04:00
Lioncache
1b53337925 ARMEmitter: Simplify LoadStoreAtomicLSE variant
We can move the base opcode into the implementation function.
2023-08-04 19:35:54 -04:00
Billy Laws
52e5b8ccd9 OpcodeDispatcher: Fix 16-bit popa insertion behaviour
The 16-bit writes shouldn't overwrite the upper half of the 32-bit register for
POPA.
2023-08-04 17:24:55 +01:00
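The insertion behaviour fixed in 52e5b8ccd9 comes down to a masked merge. A minimal sketch, with a hypothetical helper name:

```cpp
#include <cstdint>

// Writes a 16-bit value into the low half of a 32-bit register while
// preserving the upper half, as a 16-bit POPA must.
constexpr uint32_t Insert16(uint32_t OldValue, uint16_t NewLow) {
  return (OldValue & 0xFFFF0000u) | NewLow;
}

static_assert(Insert16(0xDEADBEEFu, 0x1234) == 0xDEAD1234u, "upper half preserved");
```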
Ryan Houdek
617977357a X86Tables: Fixes CLZero destination address
This needs to default to 64-bit addresses. It was previously
defaulting to 32-bit, which meant the destination address was
getting truncated. In a 32-bit process the address is still 32-bit.

I'm actually surprised this hasn't caused spurious SIGSEGV before this
point.

Adds a 32-bit test to ensure that side is tested as well.
2023-08-04 02:31:30 -07:00
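The truncation fixed in 617977357a can be shown with a short sketch. The function name is hypothetical, and the 64-byte cache line alignment is an assumption based on how CLZERO is documented rather than something stated in the commit.

```cpp
#include <cstdint>

// Computes the address CLZERO would clear from the value in rAX.
// In a 64-bit process the full address is used. In a 32-bit process only
// the low 32 bits are meaningful.
constexpr uint64_t ClzeroAddress(uint64_t Rax, bool Is64BitMode) {
  const uint64_t Address = Is64BitMode ? Rax : (Rax & 0xFFFFFFFFull);
  // Assumption: the low bits are ignored and a 64-byte line is cleared.
  return Address & ~uint64_t{63};
}

// Treating a 64-bit process like a 32-bit one drops the upper address bits.
static_assert(ClzeroAddress(0x00007FFF12345678ull, true) == 0x00007FFF12345640ull, "full address kept");
static_assert(ClzeroAddress(0x00007FFF12345678ull, false) == 0x12345640ull, "address truncated");
```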
Alyssa Rosenzweig
a996e5300e JIT: Use TST instead of CMN
This is more obvious. llvm-mca says TST is half the cycle count of CMN
for whatever it's defaulting to. dougallj's reference shows both as the
same performance.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 17:51:33 -04:00