7038 Commits

Author SHA1 Message Date
Lioncache
14cc23b6c3 ARMEmitter: Handle contiguous first fault load (scalar plus scalar) group
Adds the only missing implementation category for the first-faulting loads,
making the interface more consistent.
2023-08-11 09:46:40 -04:00
Ryan Houdek
9d26af95ab
Merge pull request #2873 from neobrain/refactor_warning_fixes
Various warning fixes
2023-08-10 16:58:14 -07:00
Tony Wasserka
aed4dda3e4 Arm64: Remove unused function 2023-08-10 18:45:14 +02:00
Tony Wasserka
e0d21e61cc Syscalls: Fix warnings about unused variables in Release builds 2023-08-10 18:45:14 +02:00
Tony Wasserka
45d0f0d349 ARMEmitter: Fix warnings about unused variables in Release builds 2023-08-10 18:45:14 +02:00
Tony Wasserka
f1cc76614b Include VIXL as a system library
This suppresses warnings from VIXL headers.
2023-08-10 18:45:14 +02:00
Ryan Houdek
099f29f1ed
Merge pull request #2871 from Sonicadvance1/fix_stats_missing_member
FEXCore: Fixes Arm64 stats disassembly
2023-08-10 06:23:02 -07:00
Ryan Houdek
b1a3f82923 FEXCore: Fixes Arm64 stats disassembly
Requires the IR headerop to house the number of host instructions this
code is translating for the stats.

Fixes compiling with disassembly enabled, will be used with the
instruction count CI.
2023-08-10 03:23:25 -07:00
Ryan Houdek
6f4a23dd15
Merge pull request #2870 from lioncash/indexed
ARMEmitter: Handle SVE FP multiply-add long groups
2023-08-09 21:35:33 -07:00
Ryan Houdek
f3182036bc
Merge pull request #2867 from Sonicadvance1/dummy_thin_handlers
FEX: Create a CommonTools static library
2023-08-09 21:34:46 -07:00
Lioncache
444961ad79 ARMEmitter: Handle SVE FP multiply-add long group 2023-08-09 15:20:04 -04:00
Lioncache
48a3271fbc ARMEmitter: Handle SVE FP multiply-add long (indexed) group 2023-08-09 15:19:49 -04:00
Mai
ea8fbc61c2
Merge pull request #2868 from Sonicadvance1/irdumper_passmanager
IR: Adds Option to run the IRDumper with more configurations
2023-08-09 10:28:25 -04:00
Ryan Houdek
35e97ec9bc IR: Adds Option to run the IRDumper with more configurations
This is incredibly useful and I find myself hacking this feature in
every time I am optimizing IR. Adds a new configuration option which
allows dumping IR at various times.

Before any optimization passes have happened
After all optimization passes have happened
Before and After each IRPass to see what is breaking something.

Needs #2864 merged first
2023-08-09 05:58:20 -07:00
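The dump points listed in 35e97ec9bc can be pictured as a small bitmask. This is only a sketch: the enumerator names, the `DumpPoint` type, and the `ShouldDump` helper are hypothetical and are not taken from FEX's configuration code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag names for the dump points named in the commit message.
// These are assumptions for illustration, not FEX's real option values.
enum DumpPoint : uint32_t {
  DUMP_BEFORE_OPT  = 1u << 0, // before any optimization pass has run
  DUMP_AFTER_OPT   = 1u << 1, // after all optimization passes have run
  DUMP_BEFORE_PASS = 1u << 2, // before each individual IR pass
  DUMP_AFTER_PASS  = 1u << 3, // after each individual IR pass
};

// Returns true when the configured mask requests a dump at the given point.
constexpr bool ShouldDump(uint32_t ConfigMask, DumpPoint Point) {
  return (ConfigMask & Point) != 0;
}
```

Combining `DUMP_BEFORE_PASS | DUMP_AFTER_PASS` would correspond to the "before and after each IRPass" mode that helps locate which pass breaks something.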
Ryan Houdek
53ac8abce9
Merge pull request #2863 from Sonicadvance1/stats
Arm64: Adds stats to the disassembly
2023-08-09 04:06:22 -07:00
Ryan Houdek
fe351353f6
Merge pull request #2865 from Sonicadvance1/first_sve_opt
Arm64: Implement first SVE-128bit optimization
2023-08-09 04:06:05 -07:00
Ryan Houdek
a23cb0447b Arm64: Implement first SVE-128bit optimization
This is a /very/ simple optimization purely because of a choice that ARM
made with SVE in latest Cortex.

Cortex-A715:
   - sxtl/sxtl2/uxtl/uxtl2 can execute 1 instruction per cycle.
   - sunpklo/sunpkhi/uunpklo/uunpkhi can execute 2 instructions per cycle.

Cortex-X3:
   - sxtl/sxtl2/uxtl/uxtl2 can execute 2 instructions per cycle.
   - sunpklo/sunpkhi/uunpklo/uunpkhi can execute 4 instructions per cycle.

This is fairly quirky since this optimization only works on SVE systems
with a 128-bit vector length. Since that covers all of the current
consumer platforms, it will work.
2023-08-09 03:51:57 -07:00
Ryan Houdek
f2aa2ce4bb Arm64: Rename HostSupportsSVE
We need to know the difference between the host supporting SVE with
128-bit registers versus 256-bit registers. Ensure we know the
difference.

No functional change here.
2023-08-09 03:51:56 -07:00
Ryan Houdek
cf93652708 Config: Adds support for overriding host features
This allows us to both enable and disable regardless of what the host
supports. This replaces the old `EnableAVX` option.

Unlike the old EnableAVX option, which was a binary option that could
only disable, each of these options is technically a trinary state.
Not setting an option gives you the default detection, while explicitly
enabling or disabling will toggle the option regardless of what the host
supports.

This will be used by the instruction count CI in the future.
2023-08-09 03:51:37 -07:00
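The trinary behaviour described in cf93652708 can be sketched with `std::optional<bool>`. This is a minimal illustration under stated assumptions: `ResolveFeature` is a hypothetical helper and does not reflect how FEX actually stores or resolves these options.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper, not FEX's real API.
// Override has three states:
//   std::nullopt -> option not set, fall back to host detection
//   true         -> force the feature on, whatever the host reports
//   false        -> force the feature off, whatever the host reports
constexpr bool ResolveFeature(std::optional<bool> Override, bool HostDetected) {
  return Override.value_or(HostDetected);
}
```

The old binary option could only express the "force off" case; the third, unset state is what keeps default detection available.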
Ryan Houdek
eaed5c4704
Merge pull request #2862 from Sonicadvance1/optimize_vector_zero
ARM64: Optimize vector zeroing
2023-08-09 03:51:04 -07:00
Mai
c77ed78f5a
Merge pull request #2861 from Sonicadvance1/fix_vector_shift_by_zero
FEXCore: Fixes vector shifts by zero
2023-08-09 05:52:10 -04:00
Ryan Houdek
348844a95b FEX: Create a CommonTools static library
Moves the dummy handlers over to this library. This will end up getting
used for more than the mingw test harness runner once the instruction
count CI is operational.
2023-08-09 02:27:13 -07:00
Ryan Houdek
e8fb322025 unittests: Adds tests for vector shifts with zero immediate
To ensure FEX doesn't encounter the encoding bug again.
2023-08-09 02:16:17 -07:00
Ryan Houdek
5f0efda8fe ARM64: Fixes shift by immediate zero
These would emit invalid instructions in most cases. Turn them into a
move or a no-op if the shift is zero.
2023-08-09 02:16:17 -07:00
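The lowering choice described in 5f0efda8fe can be sketched as a small selection function. This assumes the no-op case applies when destination and source are the same register and the move case otherwise; the `Lowering` enum and `SelectShiftLowering` are hypothetical names, not the backend's real code.

```cpp
#include <cassert>

// Hypothetical classification of what the backend would emit.
enum class Lowering { Nop, Move, Shift };

// Picks a lowering for a vector shift by immediate.
// Assumption: a zero shift leaves the value unchanged, so it needs no shift
// instruction at all; only a register copy when Dst and Src differ.
constexpr Lowering SelectShiftLowering(unsigned Dst, unsigned Src, unsigned ShiftImm) {
  if (ShiftImm != 0) {
    return Lowering::Shift; // a real shift with a valid immediate
  }
  return Dst == Src ? Lowering::Nop : Lowering::Move;
}
```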
Ryan Houdek
d198d701aa OpcodeDispatcher: Fixes vector shifts by immediate zero
pslldq logic was wrong in the case of zero shift.
The rest should just return their source in the case of zero shift.
2023-08-09 02:16:17 -07:00
Mai
c4c7620ed5
Merge pull request #2866 from Sonicadvance1/remove_unnecessary_loadconstant
Arm64: Remove erroneous LoadConstant
2023-08-09 05:10:11 -04:00
Ryan Houdek
bf5719770e Arm64: Remove erroneous LoadConstant
This was a debug LoadConstant that would load the entry in to a temporary
register to make it easier to see what RIP a block was in.

This was implemented when FEX stopped storing the RIP in the CPU state
for every block. This is now no longer necessary since FEX stores the
RIP in the tail data of the block.

This was affecting instructioncountci when in a debug build.
2023-08-08 22:56:36 -07:00
Ryan Houdek
0f6a268243 Arm64: Adds stats to the disassembly
I use this locally when looking for optimization opportunities in the
JIT.
The instruction count CI in the future will use this as well.
Just get it upstreamed right away.
2023-08-08 22:28:52 -07:00
Ryan Houdek
e0461497a0 ARM64: Optimize vector zeroing
`eor <reg>, <reg>, <reg>` is not the optimal way to zero a vector
register on ARM CPUs. Instead we should move by constant or zero
register to take advantage of zero-latency moves.
2023-08-08 22:24:11 -07:00
Ryan Houdek
e9e5b6fb0b Docs: Update for release FEX-2308 FEX-2308 2023-08-06 02:34:55 -07:00
Ryan Houdek
68cb6e61d1
Merge pull request #2860 from lioncash/alias
ARMEmitter: Add missing atomic aliases
2023-08-06 02:05:30 -07:00
Mai
5d0b2060e2
Merge pull request #2858 from Sonicadvance1/fix_clzero
X86Tables: Fixes CLZero destination address
2023-08-06 05:05:07 -04:00
Lioncache
b7d05a65c7 ARMEmitter: Add missing atomic aliases 2023-08-04 21:49:59 -04:00
Lioncache
93fe2fe06c ARMEmitter: Detemplatize LoadStoreAtomicLSE
Lets us lessen some template instantiations.
2023-08-04 19:47:03 -04:00
Ryan Houdek
21eb6e03c7
Merge pull request #2859 from bylaws/ooo
Fix 16-bit popa insertion behaviour
2023-08-04 16:37:09 -07:00
Lioncache
1b53337925 ARMEmitter: Simplify LoadStoreAtomicLSE variant
We can move the base opcode into the implementation function.
2023-08-04 19:35:54 -04:00
Billy Laws
52e5b8ccd9 OpcodeDispatcher: Fix 16-bit popa insertion behaviour
The 16-bit writes shouldn't overwrite the upper half of the 32-bit register for
POPA.
2023-08-04 17:24:55 +01:00
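The insertion behaviour fixed in 52e5b8ccd9 amounts to replacing only the low 16 bits of a 32-bit register. A minimal sketch follows; `Insert16` is a hypothetical helper for illustration, not the OpcodeDispatcher's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: writes a 16-bit value into the low half of a 32-bit
// register value while preserving the upper half, as 16-bit POPA requires.
constexpr uint32_t Insert16(uint32_t OldReg, uint16_t Value) {
  return (OldReg & 0xFFFF0000u) | Value;
}
```

A plain zero-extending write would instead produce `0x00001234` from the inputs `0xAABBCCDD` and `0x1234`, which is the behaviour the fix removes.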
Billy Laws
8c8a8c84df unittests: Test for 16-bit popa insertion behaviour 2023-08-04 17:24:52 +01:00
Ryan Houdek
0d6837f1a1
Merge pull request #2856 from Sonicadvance1/allow_override_linker
CMake: Allow overriding linker
2023-08-04 03:37:17 -07:00
Ryan Houdek
7ef3cb88f9 CMake: Allow overriding linker
While the ENABLE_LLD and ENABLE_MOLD options are nice, they don't handle
the case when the linker of `lld` or `mold` doesn't match the compiler.

This particularly crops up when overriding the C compiler to a new
version of clang but the globally installed `ld.lld` is still the old
clang version.
This then causes clang to fail with unusual errors when upstream breaks
compatibility with itself.

Easy enough to use by passing the linker to cmake:
`-DUSE_LINKER=/usr/bin/ld.lld-15`

This also removes the ENABLE_LLD and ENABLE_MOLD options to use
USE_LINKER directly.
- lld: `-DUSE_LINKER=lld`
- mold: `-DUSE_LINKER=mold`

Example of compiler failure when built with clang-15 but attempting to
link with ld.lld 14:
```bash
ld.lld-14: error: unittests/APITests/CMakeFiles/Filesystem.dir/Filesystem.cpp.o: Opaque pointers are only supported in -opaque-pointers mode (Producer: 'LLVM15.0.7' Reader: 'LLVM 14.0.6')
```
2023-08-04 02:34:15 -07:00
Ryan Houdek
617977357a X86Tables: Fixes CLZero destination address
This needs to default to 64-bit addresses, this was previously
defaulting to 32-bit, which meant the destination address was
getting truncated. In a 32-bit process the address is still 32-bit.

I'm actually surprised this hasn't caused spurious SIGSEGV before this
point.

Adds a 32-bit test to ensure that side is tested as well.
2023-08-04 02:31:30 -07:00
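The truncation described in 617977357a can be illustrated with plain integer arithmetic. This is a sketch only: `EffectiveAddress` and its `Is64BitProcess` parameter are hypothetical and stand in for the address-size default in the X86 tables.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the address-size default.
// In a 64-bit process the full address is used; in a 32-bit process the
// address is 32 bits wide, so the upper half is dropped.
constexpr uint64_t EffectiveAddress(uint64_t Addr, bool Is64BitProcess) {
  return Is64BitProcess ? Addr : static_cast<uint32_t>(Addr);
}
```

Applying the 32-bit path to a 64-bit process address such as `0x00007FFF12345678` yields `0x12345678`, an unrelated location, which is why the old default could plausibly have led to faults.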
Ryan Houdek
5a53c9231b
Merge pull request #2855 from alyssarosenzweig/tst-instead-of-cmn
JIT: Use TST instead of CMN
2023-08-02 15:07:28 -07:00
Alyssa Rosenzweig
a996e5300e JIT: Use TST instead of CMN
This is more obvious. llvm-mca says TST is half the cycle count of CMN
for whatever it's defaulting to. dougallj's reference shows both as the
same performance.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 17:51:33 -04:00
Ryan Houdek
25b2af14fd
Merge pull request #2854 from alyssarosenzweig/flags/rotate-harder
OpcodeDispatcher: Optimize rotates
2023-08-02 14:09:40 -07:00
Alyssa Rosenzweig
76059949ea
Merge pull request #2852 from Sonicadvance1/optimize_phsubsw
OpcodeDispatcher: Optimize phsubsw/phaddsw
2023-08-02 17:02:13 -04:00
Ryan Houdek
6e15c9c213
Merge pull request #2851 from Sonicadvance1/optimize_cas128_select
OpcodeDispatcher: Optimize CMPXCHG{8B,16B} final comparison
2023-08-02 14:01:58 -07:00
Alyssa Rosenzweig
01fcca884b OpcodeDispatcher: Optimize rotates
In the non-immediate cases, we can amortize some work between the two
flags to come out 1 instruction ahead.

In the immediate case, costs us an extra 2 instructions compared to
before we packed NZCV flags, but this mitigates a bigger instr count
regression that this PR would otherwise have. Coming out ahead will
require FlagM and smarter RA, but is doable.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 16:54:56 -04:00
Ryan Houdek
91bd3aa62a
Merge pull request #2832 from alyssarosenzweig/flags/pack-nzcv
Pack NZCV flags
2023-08-02 13:42:56 -07:00
Alyssa Rosenzweig
7a0119b092 OpcodeDispatcher: Optimize right shifts
Same technique as the left shifts. Gets rid of all our COND_FLAG_SET
use, which is good because it's a performance footgun.

Overall saves 17 instructions (!!!!) from the flag calculation code for
`sar eax, cl`.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 14:45:44 -04:00
Alyssa Rosenzweig
fa42c1616e OpcodeDispatcher: Preserve AF for non-immediate shift
The selection logic is expensive. Saves 5 instructions.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-08-02 14:45:10 -04:00