Commit Graph

1749 Commits

Author SHA1 Message Date
Ryan Houdek
85d1b573ef
Merge pull request #3927 from bylaws/winafp
ARM64EC: Set appropriate AFP and SVE256 state on JIT entry/exit
2024-08-08 22:21:23 -07:00
Ryan Houdek
1007f874bf
Merge pull request #3926 from bylaws/windef
FEXCore: Drop deferred signal handling on Windows
2024-08-08 17:33:44 -07:00
Mai
4882f10536
Merge pull request #3888 from Sonicadvance1/avx128_optimize_blends
AVX128: Optimize blends
2024-08-07 17:08:19 -04:00
Billy Laws
fe43a2bcb2 ARM64EC: Set appropriate AFP and SVE256 state on JIT entry/exit 2024-08-07 18:34:35 +00:00
Billy Laws
6700511cdf FEXCore: Drop deferred signal handling on Windows
The async signal issues this handles do not exist on Windows.
2024-08-07 18:31:48 +00:00
Ryan Houdek
e84848b16b
FEX: Moves HostFeatures querying to the frontend
This moves the CPU feature querying to the frontend. The primary purpose
here is for the wow64 frontend to not require linux-isms for querying
these features. This is required since non-Linux environments don't have
the "CPUID" feature for reading EL1 MSRs in EL0.

Wiring up the remaining wow64 registry querying is left for a future
exercise.

This also technically removes an xbyak requirement from FEXCore for when
building the x86 Test harness runner, but that doesn't really matter for
regular use cases.
2024-08-07 05:26:02 -07:00
Ryan Houdek
69ed39d49e
Merge pull request #3892 from Sonicadvance1/optimize_vpermq
AVX128: Optimize all cases of vpermq
2024-08-06 20:07:28 -07:00
Ryan Houdek
e613876e9d
AVX128: Optimize all cases of vpermq
Started by cherry-picking some cases from the variants that appeared when running
Steam, games, AV1 convolve tests, openssl, ffmpeg, libjpeg-turbo,
openh264, libvpx, gemmlowp, libyuv, and dav1d.

Then turned it around and optimized them all. Since all variants end up
needing to be split into two halves, that effectively means we need to
have 16 implementations, plus a couple of special cases for duplicated
results.

Fixes #3795
2024-08-06 09:08:30 -07:00
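Note: a minimal reference sketch of the 256-bit vpermq semantics, added to illustrate the message above. It is not FEX's implementation. The function names are hypothetical. One way to read the count of 16 is that each 128-bit half of the result is controlled by a 4-bit slice of the immediate, which gives 16 possible selectors per half; that reading is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Reference semantics of 256-bit VPERMQ: each 64-bit destination element is
// picked from the four source elements by a 2-bit field of the immediate.
inline std::array<uint64_t, 4> VPermQSketch(const std::array<uint64_t, 4>& Src, uint8_t Imm) {
  std::array<uint64_t, 4> Dst{};
  for (int i = 0; i < 4; ++i) {
    Dst[i] = Src[(Imm >> (2 * i)) & 0b11];
  }
  return Dst;
}

// Hypothetical helper: the 4-bit selector that controls one 128-bit half of
// the result (two 64-bit elements, two selector bits each).
inline uint8_t HalfSelectorSketch(uint8_t Imm, bool UpperHalf) {
  return UpperHalf ? static_cast<uint8_t>(Imm >> 4) : static_cast<uint8_t>(Imm & 0xF);
}
```

With this split, a backend that only has 128-bit registers can handle each half independently by dispatching on its 4-bit selector.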
Mai
1473129a8f
Merge pull request #3920 from Sonicadvance1/fix_newline_asm
SpinWaitLock: Fixes missing newline in asm
2024-08-06 12:08:14 -04:00
Alyssa Rosenzweig
a7424416d9
Merge pull request #3921 from bylaws/reloadf
Arm64Emitter: Reload STATE before SRA fill on ARM64EC
2024-08-06 09:28:23 -04:00
Billy Laws
ccf332d48e Arm64Emitter: Reload STATE before SRA fill on ARM64EC
While ARM64EC code cannot use x28, it can be cleared by the kernel
when performing syscalls etc so restore it from the TEB to be safe.
2024-08-05 17:31:01 +00:00
Ryan Houdek
802a32ce8a
SpinWaitLock: Fixes missing newline in asm
This would cause the atomic load after the wfe to be dropped,
effectively returning stale data.
2024-08-04 06:57:05 -07:00
Ryan Houdek
70c02d5c58
ARM64Emitter: Removes unused vixl CPU object 2024-08-03 22:26:00 -07:00
Ryan Houdek
2e4fb47848
HostFeatures: Read VL ourselves
Instead of calling out to vixl
2024-08-03 22:26:00 -07:00
Ryan Houdek
a4d5302369
Arm64: Adds Int helpers
One more vixl step removed.
2024-08-03 21:40:28 -07:00
Ryan Houdek
6ff3c90af3
CodeEmitter: Removes vestigial vixl usage
- IsImmLogical already existed in our CodeEmitter. We just forgot to
  allow nullptr arguments and to use it.
- Adds an equivalent IsImmAddSub helper and uses it

This gets us closer to removing vixl's global initializers from FEXCore.
2024-08-03 21:04:56 -07:00
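Note: a minimal sketch of the check an IsImmAddSub-style helper performs, assuming the standard AArch64 ADD/SUB (immediate) encoding: a 12-bit unsigned value, optionally shifted left by 12. The name and signature below are hypothetical and may not match FEX's CodeEmitter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an IsImmAddSub helper, not FEX's actual signature.
// Returns true if Imm can be encoded directly in an AArch64 ADD/SUB immediate.
inline bool IsImmAddSubSketch(uint64_t Imm) {
  const bool FitsUnshifted = (Imm & ~0xFFFULL) == 0;     // imm12, LSL #0
  const bool FitsShifted = (Imm & ~0xFFF000ULL) == 0;    // imm12, LSL #12
  return FitsUnshifted || FitsShifted;
}
```

Values that fail the check have to be materialized in a register first.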
Ryan Houdek
201fe6ee23
Merge pull request #3909 from bylaws/ec-bitmap
Directly use the EC code bitmap for determining page arch
2024-08-02 10:55:43 -07:00
Ryan Houdek
83fedd6c8f
Merge pull request #3912 from bylaws/addroverride
Don't apply the address-size flag to segment addresses
2024-08-01 18:35:06 -07:00
Ryan Houdek
c3c2b6115d
Merge pull request #3910 from bylaws/f80
F80: Drop dependency on state stored in TLS
2024-08-01 12:03:41 -07:00
Ryan Houdek
3e59fc0a8c
Merge pull request #3911 from bylaws/x80bug
x87StackOptimizationPass: Default initialise StackMemberInfo members
2024-07-31 22:57:16 -07:00
Ryan Houdek
f98c010854
Merge pull request #3907 from bylaws/ec-mema
AllocatorHooks: Correct memory API usage on Windows
2024-07-31 17:48:13 -07:00
Billy Laws
be4777110c OpcodeDispatcher: Don't apply the address-size flag to segment addresses
The address-size flag only applies to the offset from the segment base,
rather than the segment address itself.
2024-07-31 18:14:32 +00:00
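Note: a minimal sketch of the address calculation described above, assuming 64-bit mode where the address-size override truncates the offset to 32 bits. The function name is hypothetical and this is not the OpcodeDispatcher code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration: the address-size override narrows only the
// offset; the segment base is then added at full width.
inline uint64_t LinearAddressSketch(uint64_t SegmentBase, uint64_t Offset, bool AddressSizeOverride) {
  if (AddressSizeOverride) {
    Offset &= 0xFFFFFFFFULL;  // truncate the offset, not the final sum
  }
  return SegmentBase + Offset;
}
```

Truncating the sum instead would discard the upper bits of a segment base above 4 GiB.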
Billy Laws
696503680a F80: Drop dependency on state stored in TLS
Windows cannot support the implicit TLS as was used prior, so introduce
a state structure and pass it in to functions where necessary.
2024-07-31 18:51:42 +01:00
Billy Laws
dd4d3bcf38 AllocatorHooks: Correct memory API usage on Windows
These issues end up being tolerated by Wine but not actual Windows.
2024-07-31 18:36:50 +01:00
Billy Laws
f26bb6bf53 x87StackOptimizationPass: Default initialise StackMemberInfo members
Not doing so is UB.
2024-07-31 17:30:29 +00:00
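Note: a minimal sketch of the kind of fix described, using default member initialisers. The struct and its fields are hypothetical; the real StackMemberInfo layout is not shown in this log.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct standing in for StackMemberInfo. Without the
// initialisers, a default-constructed local would hold indeterminate
// values, and reading them is undefined behaviour.
struct StackSlotInfoSketch {
  uint32_t Source = 0;
  bool Valid = false;
};

inline StackSlotInfoSketch MakeSlotSketch() {
  StackSlotInfoSketch Slot;  // members are initialised by the defaults above
  return Slot;
}
```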
Billy Laws
2c4fd79304 FEXCore: Add a generic spill/fill-all syscall ABI and use for Windows
Also drop the legacy hangover ABI as it has no users.
2024-07-31 17:25:59 +00:00
Billy Laws
910ec4aadd Dispatcher: Directly use the EC code bitmap for determining page arch
The prior approach using the L2 cache was flawed as it assumed L2
page entries had a 1-1 correspondence with actual pages. While the L2
cache could be extended to handle aliases with EC, this could lead to
thrashing etc. The cost of a lookup in the actual EC code bitmap is
cheap enough to perform every time considering the infrequency of calls
to ARM64EC code when compared to X86 L2 hits.
2024-07-31 17:24:50 +00:00
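Note: a minimal sketch of a per-page bitmap lookup of the kind described above, assuming one bit per 4 KiB page. The names, the page size, and the use of std::vector are assumptions for illustration; this is not the Dispatcher code and does not reflect how the real bitmap is obtained.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t PageShiftSketch = 12;  // assumed 4 KiB pages

// Hypothetical lookup: returns true if the page containing Address is marked.
inline bool IsECPageSketch(const std::vector<uint64_t>& Bitmap, uint64_t Address) {
  const uint64_t Page = Address >> PageShiftSketch;
  const uint64_t Word = Page / 64;
  if (Word >= Bitmap.size()) {
    return false;  // outside the range covered by this sketch's bitmap
  }
  return ((Bitmap[Word] >> (Page % 64)) & 1) != 0;
}

// Hypothetical helper used only to populate the bitmap in examples.
inline void MarkECPageSketch(std::vector<uint64_t>& Bitmap, uint64_t Address) {
  const uint64_t Page = Address >> PageShiftSketch;
  Bitmap.at(Page / 64) |= 1ULL << (Page % 64);
}
```

The lookup is a shift, an index, and a mask, which is consistent with the message's point that it is cheap enough to do on every call.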
Billy Laws
fb7275b3d8 Revert: "LookupCache: Track ARM64EC page state in the code cache"
This reverts the commit 526e3e654f.
2024-07-31 17:24:50 +00:00
Billy Laws
51b4bfc6a6 FEXCore: Move ARM64EC TEB offset constants to Arm64Emitter
These need to be used from outside the dispatcher, and there are already
similar defines for EC registers in the emitter header.
2024-07-31 17:24:50 +00:00
Alyssa Rosenzweig
941fd9c6ea
Merge pull request #3901 from pmatos/TopUsage
Reuse Top in ReconstructFSW_Helper
2024-07-31 08:01:05 -04:00
Paulo Matos
5933a59c09 Intersperse flag retrieval and FSW insertion 2024-07-31 12:04:54 +02:00
Paulo Matos
c2136272bf Reuse Top in ReconstructFSW_Helper
This is a non-functional change. Instead of fetching top again, we use the one
obtained through the fast path calculation.
2024-07-31 11:56:14 +02:00
Ryan Houdek
3f3e937967
man: Fixes newline issue with strenum
In the environment section this was causing the next environment
variable line to be merged with the strenum options

Also makes it so strenum options don't have a spurious comma at the
end of the list.
2024-07-31 02:02:05 -07:00
Ryan Houdek
87fbcf754d
Vector: Optimize pblendw
Using a brute force solver to add in more optimized code paths

- Adds 12 single VInsElement implementations
- Adds 4 two IR operation implementations

Not adding any of the two or three IR operation implementations that use
VInsElement because SRA interacts badly and becomes worse than the VTBX
implementation.
2024-07-27 19:25:51 -07:00
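Note: a minimal reference sketch of the 128-bit pblendw semantics that the optimized paths above have to reproduce. It is a scalar model for illustration, with a hypothetical function name, and says nothing about which IR operations FEX emits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Reference semantics of 128-bit PBLENDW: bit i of the immediate selects
// 16-bit word i from Src; clear bits keep the word from Dst.
inline std::array<uint16_t, 8> PBlendWSketch(const std::array<uint16_t, 8>& Dst,
                                             const std::array<uint16_t, 8>& Src, uint8_t Imm) {
  std::array<uint16_t, 8> Result = Dst;
  for (int i = 0; i < 8; ++i) {
    if (Imm & (1u << i)) {
      Result[i] = Src[i];
    }
  }
  return Result;
}
```

An immediate with a single bit set reduces to inserting one element, which is the shape a single element-insert operation can cover.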
Ryan Houdek
403fd62b34
Merge pull request #3890 from Sonicadvance1/refactor_frontend_threadmanager
FEXCore: Removes ThreadManager
2024-07-26 13:27:43 -07:00
Ryan Houdek
d92b6a9ac4
Merge pull request #3898 from alyssarosenzweig/ir/creative-refs
OpcodeDispatcher/X87: use less creative Refs
2024-07-26 13:26:43 -07:00
Ryan Houdek
c2092bfed0
Merge pull request #3893 from pmatos/FNINITFix
Fix call to FNINITF64 and refactor
2024-07-26 13:25:49 -07:00
Alyssa Rosenzweig
5ff09f5091 OpcodeDispatcher/X87: use less creative Refs
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-26 14:30:42 -04:00
Paulo Matos
d1e36f264f Fix call to FNINITF64 and refactor 2024-07-26 14:56:06 +02:00
Mai
93eead243f
Merge pull request #3864 from Sonicadvance1/threads_atexit_remove
Threads: Setup the stack tracker to not need global initialization
2024-07-26 06:38:29 -04:00
Ryan Houdek
3b2e657fd4
FEXCore: Removes ThreadManager
This state has been leaking into FEXCore for quite a while. FEXCore never
actually needed this information, so this moves the necessary bits to the
frontend.

Minor behaviour change that `RunUntilExit` now just assumes the primary
thread is using it. This behaviour is on the chopping block to get
removed next anyway.
2024-07-25 14:54:10 -07:00
Ryan Houdek
380ba0a014
Merge pull request #3889 from Sonicadvance1/refactor_frontend_exithandler
FEXCore: Refactor ExitHandler slightly
2024-07-25 14:53:25 -07:00
Ryan Houdek
7816b150d0
FEXCore: Removes CPUBackendFeatures
We were only ever hardcoding true for TBL2 and Flags now. Get rid of it.
2024-07-24 17:19:30 -07:00
Ryan Houdek
ce8bc9d25c
FEXCore: Refactor ExitHandler slightly
Instead of passing the TID back to the exit handler, just pass the whole
thread object. This will allow some cleanups with the frontend thread
tracking soon.

NFC
2024-07-24 14:39:56 -07:00
Ryan Houdek
f8ef6feff9
AVX128: Optimize blends
Optimizes the AVX128 blends by reusing the prior SSE4.1 implementation.
Only difference is the destination register isn't reused as a source
register.

One confusing thing is that Felix Cloutier's documentation has a typo on
the 256-bit VPBLENDW instruction where it had the top 128-bit lane
reusing the destination instead of sources. So I wrote a unittest to
ensure correctness.

Fixes #3796
2024-07-23 19:24:19 -07:00
Ryan Houdek
3c5b59d985
AVX128: Implement support for scalar FMA with AFP
Now that I have AFP supporting hardware I felt better implementing this
since I can run unit tests.

Fixes #3793
2024-07-22 12:58:19 -07:00
Alyssa Rosenzweig
587b924de9 json_ir_generator: stop prefixing arguments
stop prefixing the arguments when we generate allocate ops (in particular), this
is more convenient and simpler. in exchange we need to prefix Op to avoid a
collision on fcmpscalarinsert which has an argument named Op, but that's a local
change at least.

came up when experimenting with new IR, but I think this is probably a win by
itself.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-22 13:50:21 -04:00
Paulo Matos
a1378f94ce X87 Code Refactoring and Optimization Pass 2024-07-22 08:44:45 +02:00
Alyssa Rosenzweig
610caf8529 ConstProp: treat StoreContext as zeroable
todo: FPR equivalent.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-21 15:49:09 -04:00
Alyssa Rosenzweig
d20b46e46f IR: drop LoadFlag/StoreFlag ops
pointless, we can just load/store the context now.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-21 15:49:09 -04:00