This closely matches x86 vector shift behaviour for ps{rl,ra,ll}{w,d,q},
where the vector is shifted by a scalar value that is 64 bits wide.
Any shift amount larger than the element size sets that element to zero.
With SVE we gain new wide-element shifts that match this behaviour
exactly (except that they take wide shift sources rather than a scalar).
This is a significant improvement even on platforms that only support
128-bit SVE.
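For reference, a scalar sketch of the logical-shift behaviour being matched
(psrlw-style, 16-bit elements; purely illustrative):

```cpp
#include <cstddef>
#include <cstdint>

// Every element is shifted by the same 64-bit count, and a count at or above
// the element width zeroes the element instead of being masked the way scalar
// x86 shifts are.
void ShiftRightLogical16(uint16_t* Elements, size_t Count, uint64_t ShiftAmount) {
  for (size_t i = 0; i < Count; ++i) {
    Elements[i] = ShiftAmount >= 16 ? 0 : static_cast<uint16_t>(Elements[i] >> ShiftAmount);
  }
}
```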
With ASIMD, FEX would never optimize BSL out of fear that overlapping
registers would break things, so it previously always moved to a
temporary first and then moved the result back out when done.
Now we instead check up front whether any of the source registers
overlap the destination. If the destination register overlaps one of
the three sources we can use bsl, bit, or bif depending on which source
is overlapped.
In the worst case the destination doesn't overlap any of the source
registers and still needs these moves.
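A minimal sketch of the new selection for a select computing
(Mask & True) | (~Mask & False); the types and emitter hooks below are
stand-ins, not FEX's actual API:

```cpp
// Hypothetical register and emitter types for illustration only.
struct VReg { int Idx; bool operator==(const VReg&) const = default; };
struct Emitter {
  void bsl(VReg Vd, VReg Vn, VReg Vm); // Vd = (Vd & Vn) | (~Vd & Vm): Vd holds the mask
  void bif(VReg Vd, VReg Vn, VReg Vm); // Vd = (Vd & Vm) | (Vn & ~Vm): Vd holds the "true" value
  void bit(VReg Vd, VReg Vn, VReg Vm); // Vd = (Vd & ~Vm) | (Vn & Vm): Vd holds the "false" value
  void mov(VReg Vd, VReg Vn);
};

void EmitSelect(Emitter& E, VReg Dst, VReg Mask, VReg True, VReg False) {
  if (Dst == Mask) {
    E.bsl(Dst, True, False);
  } else if (Dst == True) {
    E.bif(Dst, False, Mask);
  } else if (Dst == False) {
    E.bit(Dst, True, Mask);
  } else {
    // No overlap at all: fall back to copying the mask in first.
    E.mov(Dst, Mask);
    E.bsl(Dst, True, False);
  }
}
```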
When clearing multiple flags it is more optimal to load the mask
constant into a register and then clear them all with a single and/bic.
Back-to-back bfi is actually less optimal due to the dependency chain
it creates.
With #2911 this is a total win, since it hits an edge case with
constant loading that #2911 fixes.
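For illustration, the combined-mask form (the flag bit positions here are the
architectural EFLAGS ones, not necessarily FEX's internal packing):

```cpp
#include <cstdint>

constexpr uint64_t CF_MASK = 1ULL << 0;
constexpr uint64_t PF_MASK = 1ULL << 2;
constexpr uint64_t AF_MASK = 1ULL << 4;

uint64_t ClearFlags(uint64_t PackedFlags) {
  // One combined constant lets the JIT materialize the mask once and clear all
  // three flags with a single and/bic, instead of a chain of bfi ops where
  // each one waits on the previous result.
  constexpr uint64_t FlagsToClear = CF_MASK | PF_MASK | AF_MASK;
  return PackedFlags & ~FlagsToClear;
}
```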
Since the final operation is just an insert, in the cases where our
source is a vector we can specify the size of the vector rather than
the size of the element, avoiding unnecessary zero-extension.
When dealing with source vectors, we can use the vector length rather
than a smaller size that requires zero-extending the register,
especially since the resulting value is just inserted into another
vector.
We can specify the full vector length when dealing with a source vector
to avoid zero-extending the vector unnecessarily. When dealing with a
memory operand, however, we only want to load the exact source size.
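Roughly the distinction being drawn, with illustrative helper names standing
in for the dispatcher's real ones:

```cpp
// All names here (Ref, Operand, LoadSource, VInsElement) are stand-ins used
// only to show the size selection.
struct Ref {};
struct Operand { bool IsRegister() const; };

Ref LoadSource(const Operand& Src, unsigned AccessSize);
Ref VInsElement(unsigned VectorSize, unsigned ElementSize, unsigned DstIdx,
                unsigned SrcIdx, Ref Dst, Ref Src);

Ref EmitInsert(const Operand& Src, Ref Dst, unsigned VectorSize,
               unsigned ElementSize, unsigned DstIdx) {
  Ref SrcValue;
  if (Src.IsRegister()) {
    // Vector register source: read it at full vector width. The insert only
    // consumes the element it needs, so zero-extending a narrower read first
    // is wasted work.
    SrcValue = LoadSource(Src, VectorSize);
  } else {
    // Memory source: load exactly the element being inserted and nothing more.
    SrcValue = LoadSource(Src, ElementSize);
  }
  return VInsElement(VectorSize, ElementSize, DstIdx, 0, Dst, SrcValue);
}
```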
These have the same behavior and only differ based on element size,
so we can join the implementations together instead of duplicating
them across both functions.
Like the changes made to the xmm-to-xmm case, since we're going to be
storing a 64-bit value, we don't need to zero-extend the vector on
load. In the event that we have a full-length vector, we can just load
and move from it, which gets rid of a little bit of mov noise. Since
all we intend to do is perform an insert from one vector into another,
we don't need the zero-extending behavior that a 64-bit vector load
would perform.
For a number of cases that act as broadcasts (where all indices in the
imm8 specify the same element), we can use VDupElement rather than
iterating through each element.
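As an example, the broadcast case for a pshufd-style imm8 can be detected with
a check like this (a sketch, not the code as merged):

```cpp
#include <cstdint>

// Each 2-bit field of the imm8 selects the source element for one destination
// lane; if every field picks the same element, the shuffle is a broadcast.
bool IsBroadcast(uint8_t Imm8) {
  const uint8_t Index = Imm8 & 0b11;
  return ((Imm8 >> 2) & 0b11) == Index &&
         ((Imm8 >> 4) & 0b11) == Index &&
         ((Imm8 >> 6) & 0b11) == Index;
}
// When this holds, a single element duplicate (VDupElement) of that index
// replaces the per-element loop.
```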
This paves the way to optimizing pushes into both push operations and
push-pair operations, to more optimally match Arm64 push support.
While this takes the first step of supporting the base push, we'll
leave optimizing push pairs to future work.
This is a bit of a tricky operation: due to our usage of SSA, the
incoming source isn't guaranteed to end its live range at this
instruction.
This means that to be optimal we need to take different paths depending
on whether the incoming address register is the same as the destination
node.
Once we have some form of RA constraints, or a non-SSA IR form that can
guarantee this restriction, this will go away.
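Roughly the two paths, with hypothetical emitter hooks (and ignoring the extra
care needed when the value register also aliases the destination):

```cpp
struct Reg { int Idx; bool operator==(const Reg&) const = default; };
struct Emitter {
  void str_preindex(Reg Value, Reg Addr, int Offset); // str Value, [Addr, #Offset]!
  void sub(Reg Dst, Reg Src, int Imm);
  void str(Reg Value, Reg Addr);
};

// Push = decrement the address by Size, then store Value at the new address.
void EmitPush(Emitter& E, Reg Dst, Reg Addr, Reg Value, int Size) {
  if (Dst == Addr) {
    // RA gave the result the same register as the incoming address, so a
    // single pre-index store with writeback does the whole push.
    E.str_preindex(Value, Dst, -Size);
  } else {
    // The SSA source stays live past this op and can't be modified in place:
    // compute the new address into Dst and store through it.
    E.sub(Dst, Addr, Size);
    E.str(Value, Dst);
  }
}
```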
FEX doesn't use the platform register on Wine platforms, so there is no
reason to save and restore it.
On Linux we may still use it at some point, but for now it isn't part
of our RA.
Changes the idiom used for constant mask generation to a ternary. This
pattern is definitely used elsewhere in the code, but we can at least
get rid of all instances of it here.
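For illustration, the kind of ternary being referred to here (the actual masks
at these call sites may differ):

```cpp
#include <cstdint>

// Build an all-ones mask for an operand size in bytes without shifting by the
// full register width, which would be undefined behaviour for an 8-byte size.
uint64_t SizeMask(unsigned SizeInBytes) {
  return SizeInBytes == 8 ? ~0ULL : (1ULL << (SizeInBytes * 8)) - 1;
}
```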
This takes a similar approach to deferred signal handling and allows any given
thread to be interrupted while running JIT code by protecting the appropriate
page as RO. When the thread then enters a new block, it will try to access
that page and segfault. This is safer than just sending a signal to the thread,
as that could stop it in a place where the JIT context couldn't be recovered
correctly.
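A minimal sketch of the mechanism, assuming a per-thread page that every block
entry writes to; the structure and function names are assumptions, not FEX's
actual interfaces:

```cpp
#include <sys/mman.h>
#include <cstddef>

// Page the JIT writes to at every block entry for this thread.
struct ThreadInterruptPage {
  void* Page;      // page-aligned, normally mapped read-write
  size_t PageSize;
};

// Requester side: flip the page to read-only. The target thread keeps running
// until it enters its next block, faults on the write, and the SIGSEGV handler
// recognizes the address as "interrupt requested" at a point where the JIT
// context is fully recoverable.
void RequestInterrupt(ThreadInterruptPage& P) {
  mprotect(P.Page, P.PageSize, PROT_READ);
}

// Target side (from the fault handler): restore write access and service the
// interrupt before resuming guest execution.
void AcknowledgeInterrupt(ThreadInterruptPage& P) {
  mprotect(P.Page, P.PageSize, PROT_READ | PROT_WRITE);
}
```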
With WOW, all allocations from 64-bit code use the full address space
and limiting is handled on the syscall thunk side, so there's no need
to worry about STL allocations stealing address space.
Due to Intel dropping support for legacy segment registers[1], there is
a concern that this will break legacy 32-bit software that does some
magic segment register handling.
Adds some simple telemetry for 32-bit applications: when they encounter
an instruction that sets or uses a segment register, the JIT does a
/relatively/ quick four-instruction check to see whether the segment is
non-null.
It's not enough to just check whether the segment index is 0: 32-bit
Linux software starts with non-zero segment register indexes, but the
LDT entry for each of those indexes is a null descriptor.
Once the segment address is loaded, the IR operation does a quick check
against zero and, if it /isn't/ zero, sets the telemetry value.
As a very minor optimization, segment registers only get checked once
per block to keep the overhead low.
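Conceptually, the emitted check amounts to the following (illustrative names;
the real version is the four-instruction sequence described above):

```cpp
#include <cstdint>

// Stand-ins for the thread's cached segment state and the telemetry slot.
struct ThreadState {
  uint64_t SegmentBases[6]; // bases already resolved through the GDT/LDT
  uint64_t TelemetryNonZeroSegment;
};

void CheckSegment(ThreadState& State, unsigned SegmentReg) {
  // Checking the selector alone isn't enough: 32-bit Linux programs run with
  // non-zero selectors whose descriptors are null, so test the resolved base.
  if (State.SegmentBases[SegmentReg] != 0) {
    State.TelemetryNonZeroSegment = 1;
  }
}
```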
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/envisioning-future-simplified-architecture.html
- 3.6 - Restricted Subset of Segmentation
- `Bases are supported for FS, GS, GDT, IDT, LDT, and TSS
registers; the base for CS, DS, ES, and SS is ignored for 32-bit
mode, same as 64-bit mode (treated as zero).`
- 4.2.17 - MOV to Segment Register
- Will fault if SS is written (breaking anything that writes to SS).
- Will not fault if CS, DS, or ES are written (the segment gets set,
  but its base is ignored due to 3.6).