Requires the IR headerop to house the number of host instructions this
code is translating for the stats.
Fixes compiling with disassembly enabled; this will be used by the
instruction count CI.
This is incredibly useful and I find myself hacking this feature in
every time I am optimizing IR. Adds a new configuration option which
allows dumping IR at various times:
- Before any optimization passes have happened
- After all optimization passes have happened
- Before and after each IRPass, to see what is breaking something
Needs #2864 merged first
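A minimal sketch of how such a dump hook could sit around the pass loop; the option values, `Pass` type, and function names here are hypothetical, not FEX's actual API.

```cpp
#include <string>
#include <vector>

// Hypothetical dump points; names are illustrative only.
enum class DumpIR { Off, BeforeOpt, AfterOpt, BeforeAndAfterPass };

struct Pass { std::string Name; };

// Returns the labels of every IR dump the configured mode would emit.
std::vector<std::string> PlanDumps(DumpIR When, const std::vector<Pass>& Passes) {
  std::vector<std::string> Dumps;
  if (When == DumpIR::BeforeOpt) Dumps.push_back("before-opt");
  for (const auto& P : Passes) {
    if (When == DumpIR::BeforeAndAfterPass) Dumps.push_back("before-" + P.Name);
    // P.Run(IR) would execute here.
    if (When == DumpIR::BeforeAndAfterPass) Dumps.push_back("after-" + P.Name);
  }
  if (When == DumpIR::AfterOpt) Dumps.push_back("after-opt");
  return Dumps;
}
```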
This is a /very/ simple optimization purely because of a choice that ARM
made with SVE in latest Cortex.
Cortex-A715:
- sxtl/sxtl2/uxtl/uxtl2 can execute 1 instruction per cycle.
- sunpklo/sunpkhi/uunpklo/uunpkhi can execute 2 instructions per cycle.
Cortex-X3:
- sxtl/sxtl2/uxtl/uxtl2 can execute 2 instructions per cycle.
- sunpklo/sunpkhi/uunpklo/uunpkhi can execute 4 instructions per cycle.
This is fairly quirky since this optimization only works on SVE systems
with a 128-bit vector length. Since that covers all of the current
consumer platforms, it will work there.
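The substitution logic can be sketched like this (function and argument names are made up, not FEX's API). With a 128-bit SVE vector length, the unpack instructions read exactly the register halves that the xtl forms read, so the higher-throughput form is a safe stand-in; with a larger vector length they read different elements and the swap is invalid.

```cpp
#include <string>

// Hypothetical instruction-selector sketch.
// VL == 128: [s|u]unpklo/hi widen the same low/high halves as
// [s|u]xtl/[s|u]xtl2, at higher throughput on recent Cortex cores.
std::string SelectWiden(bool HostHasSVE, unsigned VectorLengthBits,
                        bool Signed, bool HighHalf) {
  if (HostHasSVE && VectorLengthBits == 128) {
    if (Signed) return HighHalf ? "sunpkhi" : "sunpklo";
    return HighHalf ? "uunpkhi" : "uunpklo";
  }
  if (Signed) return HighHalf ? "sxtl2" : "sxtl";
  return HighHalf ? "uxtl2" : "uxtl";
}
```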
We need to know the difference between the host supporting SVE with
128-bit registers versus 256-bit registers, so ensure we track that
distinction.
No functional change here.
This allows us to both enable and disable features regardless of what
the host supports. This replaces the old `EnableAVX` option.
Unlike the old EnableAVX option, which was a binary option that could
only disable, each of these options is technically a tri-state.
Not setting an option gives you the default detection, while explicitly
enabling or disabling will toggle the option regardless of what the host
supports.
This will be used by the instruction count CI in the future.
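The tri-state resolution described above can be sketched in a few lines (a sketch with hypothetical names, not FEX's actual config code):

```cpp
#include <optional>

// Unset -> fall back to host detection; explicitly set -> force the
// feature on or off regardless of what the host supports.
bool ResolveFeature(std::optional<bool> UserSetting, bool HostSupports) {
  return UserSetting.value_or(HostSupports);
}
```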
Moves the dummy handlers over to this library. This will end up getting
used for more than the mingw test harness runner once the instruction
count CI is operational.
This was a debug LoadConstant that would load the entry into a temporary
register to make it easier to see which RIP a block was in.
This was implemented when FEX stopped storing the RIP in the CPU state
for every block. It is now no longer necessary since FEX stores the RIP
in the tail data of the block.
This was affecting the instruction count CI when in a debug build.
I use this locally when looking for optimization opportunities in the
JIT.
The instruction count CI in the future will use this as well.
Just get it upstreamed right away.
`eor <reg>, <reg>, <reg>` is not the optimal way to zero a vector
register on ARM CPUs. Instead we should move from a constant or the zero
register to take advantage of zero-latency moves.
While the ENABLE_LLD and ENABLE_MOLD options are nice, they don't handle
the case where the `lld` or `mold` linker doesn't match the compiler.
This particularly crops up when overriding the C compiler to a new
version of clang but the globally installed `ld.lld` is still the old
clang version.
This then causes clang to fail with unusual errors when upstream breaks
compatibility with itself.
Easy enough to use by passing the linker to cmake:
`-DUSE_LINKER=/usr/bin/ld.lld-15`
This also removes the ENABLE_LLD and ENABLE_MOLD options to use
USE_LINKER directly.
- lld: `-DUSE_LINKER=lld`
- mold: `-DUSE_LINKER=mold`
Example of compiler failure when built with clang-15 but attempting to
link with ld.lld 14:
```bash
ld.lld-14: error: unittests/APITests/CMakeFiles/Filesystem.dir/Filesystem.cpp.o: Opaque pointers are only supported in -opaque-pointers mode (Producer: 'LLVM15.0.7' Reader: 'LLVM 14.0.6')
```
This needs to default to 64-bit addresses; it was previously defaulting
to 32-bit, which meant the destination address was getting truncated.
In a 32-bit process the address is still 32-bit.
I'm actually surprised this hasn't caused spurious SIGSEGV before this
point.
Adds a 32-bit test to ensure that side is tested as well.
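The truncation can be demonstrated in isolation (a sketch; the helper name is made up):

```cpp
#include <cstdint>

// A 64-bit destination address survives only when the calculation
// defaults to 64-bit; a 32-bit default masks off the top half, which is
// the bug described above. In a 32-bit process the mask is harmless
// because addresses are 32-bit anyway.
uint64_t CalculateDestination(uint64_t Address, bool Is64BitProcess) {
  return Is64BitProcess ? Address : (Address & 0xFFFF'FFFFull);
}
```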
This is more obvious. llvm-mca says TST is half the cycle count of CMN
for whatever CPU it defaults to. dougallj's reference shows both as the
same performance.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
In the non-immediate cases, we can amortize some work between the two
flags to come out 1 instruction ahead.
In the immediate case, costs us an extra 2 instructions compared to
before we packed NZCV flags, but this mitigates a bigger instr count
regression that this PR would otherwise have. Coming out ahead will
require FlagM and smarter RA, but is doable.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Same technique as the left shifts. Gets rid of all our COND_FLAG_SET
use, which is good because it's a performance footgun.
Overall saves 17 instructions (!!!!) from the flag calculation code for
`sar eax, cl`.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>