1628 Commits

Author SHA1 Message Date
Ryan Houdek
87fbcf754d
Vector: Optimize pblendw
Using a brute force solver to add in more optimized code paths

- Adds 12 single VInsElement implementations
- Adds 4 two IR operation implementations

Not adding any of the two or three IR operation implementations that use
VInsElement because SRA interacts badly and becomes worse than the VTBX
implementation.
2024-07-27 19:25:51 -07:00
Ryan Houdek
f8ef6feff9
AVX128: Optimize blends
Optimizes the AVX128 blends by reusing the prior SSE4.1 implementation.
Only difference is the destination register isn't reused as a source
register.

One confusing thing is that Felix Cloutier's documentation has a typo on
the 256-bit VPBLENDW instruction where it had the top 128-bit lane
reusing the destination instead of sources. So I wrote a unittest to
ensure correctness.

Fixes #3796
2024-07-23 19:24:19 -07:00
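
For reference, the 256-bit VPBLENDW behaviour this commit tests can be modelled in scalar code. This is a minimal sketch, not FEX code: `Vec256W`, `Splat` and `RefVPBlendW256` are hypothetical names introduced here. It assumes the architectural semantics where the same 8-bit immediate is applied to both 128-bit lanes and every word comes from one of the two sources, never from the previous destination contents.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical reference model; names are illustrative, not FEX identifiers.
using Vec256W = std::array<uint16_t, 16>;

// Helper for building test vectors: every word set to the same value.
constexpr Vec256W Splat(uint16_t Value) {
  Vec256W V{};
  for (auto& Element : V) {
    Element = Value;
  }
  return V;
}

// Scalar model of 256-bit VPBLENDW: bit (i % 8) of Imm8 selects Src2 for word i,
// so the upper 128-bit lane reuses the same mask as the lower lane.
constexpr Vec256W RefVPBlendW256(const Vec256W& Src1, const Vec256W& Src2, uint8_t Imm8) {
  Vec256W Result{};
  for (std::size_t i = 0; i < Result.size(); ++i) {
    const bool TakeSrc2 = ((Imm8 >> (i % 8)) & 1) != 0;
    Result[i] = TakeSrc2 ? Src2[i] : Src1[i];
  }
  return Result;
}
```

A unit test in this spirit would check that words 8 to 15 follow the same mask bits as words 0 to 7.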
Ryan Houdek
3c5b59d985
AVX128: Implement support for scalar FMA with AFP
Now that I have AFP-supporting hardware, I feel better about implementing
this since I can run the unit tests.

Fixes #3793
2024-07-22 12:58:19 -07:00
Alyssa Rosenzweig
587b924de9 json_ir_generator: stop prefixing arguments
stop prefixing the arguments when we generate allocate ops (in particular), this
is more convenient and simpler. in exchange we need to prefix Op to avoid a
collision on fcmpscalarinsert which has an argument named Op, but that's a local
change at least.

came up when experimenting with new IR, but I think this is probably a win by
itself.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-22 13:50:21 -04:00
Paulo Matos
a1378f94ce X87 Code Refactoring and Optimization Pass 2024-07-22 08:44:45 +02:00
Alyssa Rosenzweig
610caf8529 ConstProp: treat StoreContext as zeroable
todo: FPR equivalent.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-21 15:49:09 -04:00
Alyssa Rosenzweig
d20b46e46f IR: drop LoadFlag/StoreFlag ops
pointless, we can just load/store the context now.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-21 15:49:09 -04:00
Alyssa Rosenzweig
4094aa1b9a DeadStoreElimination: drop flag handling
now that we do everything via NZCV, this is mostly vestigial. DF/x87 flags are
sufficiently rare to be "don't care"s here, and we don't even have multiblock
enabled yet!

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-21 15:49:08 -04:00
Ryan Houdek
f8c6baae97
Merge pull request #3883 from Sonicadvance1/implement_daz
Arm64: Implements support for DAZ using AFP.FIZ
2024-07-21 10:03:34 -07:00
Ryan Houdek
f7b4d25803
Telemetry: Remove VEX flag
This is no longer necessary and it also no longer provides us any useful
information. Since we expose the AVX CPUID flag, basically everything
uses VEX encoding now, so it is basically always set.
2024-07-20 17:24:00 -07:00
Ryan Houdek
95b15d788b
Arm64: Fix filling static registers
Some locations could end up with SRA registers that only spilled one
register.
Allow passing in temporaries from the call site.
Fixes rpid and syscalls asserting.
2024-07-20 15:57:01 -07:00
Ryan Houdek
b78da2e5ad
Arm64: Implements support for DAZ using AFP.FIZ
When AFP is supported then we can actually support DAZ. This might also
fix the audio corruption in Animal Well but I can't test it until Steam
is running on Oryon. Requires a bit of plumbing for MXCSR which we were
hacking around before but now we actually want to store the value.

Fixes #3856
2024-07-20 15:34:54 -07:00
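
As a rough illustration of what DAZ means on the x86 side, the sketch below models it on raw binary32 bit patterns. It is not how FEX implements the feature (the commit maps it onto AFP.FIZ in hardware). `RefApplyDAZ` is a hypothetical helper, and the sketch assumes the usual x86 behaviour that a denormal input is replaced by a zero of the same sign when MXCSR bit 6 is set.

```cpp
#include <cstdint>

// MXCSR bit 6 is the DAZ (denormals-are-zero) control bit.
constexpr uint32_t MXCSR_DAZ = 1u << 6;

// Hypothetical reference helper: returns the bits of a binary32 input operand
// as the operation would see them under the given MXCSR value.
constexpr uint32_t RefApplyDAZ(uint32_t Bits, uint32_t MXCSR) {
  const uint32_t Exponent = (Bits >> 23) & 0xFFu;
  const uint32_t Mantissa = Bits & 0x007FFFFFu;
  const bool IsDenormal = Exponent == 0 && Mantissa != 0;
  if ((MXCSR & MXCSR_DAZ) != 0 && IsDenormal) {
    // Treat the input as zero, keeping only the sign bit.
    return Bits & 0x80000000u;
  }
  return Bits;
}
```

Because the behaviour depends on the guest's MXCSR value, the value has to be stored rather than hacked around, which matches the plumbing mentioned in the commit message.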
Ryan Houdek
1c35eeffeb
Vector: Optimize PSHUFD with brute force search
With a brute force search of methods between 1-3 instructions we cover a
lot more cases more optimally.

There are definitely still more cases (and probably some that can be
reduced from 3 instructions to 2), but covering 44 cases is a pretty good
margin already.
2024-07-18 04:10:58 -07:00
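
A brute force search like this needs a reference definition of PSHUFD to compare candidate instruction sequences against. The commit message does not show the solver, so the following is only a minimal scalar model of the instruction itself. `Vec128D` and `RefPSHUFD` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical reference model; names are illustrative, not FEX identifiers.
using Vec128D = std::array<uint32_t, 4>;

// Scalar model of PSHUFD: each 2-bit field of Imm8 picks the source element
// for the corresponding destination element.
constexpr Vec128D RefPSHUFD(const Vec128D& Src, uint8_t Imm8) {
  Vec128D Result{};
  for (std::size_t i = 0; i < Result.size(); ++i) {
    const std::size_t Select = (Imm8 >> (i * 2)) & 0b11;
    Result[i] = Src[Select];
  }
  return Result;
}
```

With this model, an immediate of 0xE4 is the identity shuffle and 0x1B reverses the four elements. A solver could check a candidate 1 to 3 instruction sequence against the model for each of the 256 immediates.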
Ryan Houdek
b0bd8a62a2
AVX128: Improve VPERMILPS/PD and VPSHUFD
VPSHUFD and VPERMILPS are aliases of each other.

Reuses the implementation path from the PSHUFD implementation which has
a few swizzles and then a table lookup.

VPERMILPD is a very simple swizzle per 128-bit lane.

Fixes #3797
Fixes #3784
2024-07-18 04:10:58 -07:00
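
The "very simple swizzle per 128-bit lane" for VPERMILPD can be sketched as a scalar model of the immediate form on a 256-bit vector. This is illustrative only and not FEX code. `Vec256Q` and `RefVPERMILPD256` are hypothetical names, and the sketch assumes the standard semantics where bit i of the immediate selects the low or high 64-bit element within the same lane.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical reference model; names are illustrative, not FEX identifiers.
using Vec256Q = std::array<uint64_t, 4>;

// Scalar model of VPERMILPD (imm8 form, 256-bit): destination element i takes
// the low (bit clear) or high (bit set) element of its own 128-bit lane.
constexpr Vec256Q RefVPERMILPD256(const Vec256Q& Src, uint8_t Imm8) {
  Vec256Q Result{};
  for (std::size_t i = 0; i < Result.size(); ++i) {
    const std::size_t LaneBase = i & ~std::size_t{1}; // 0 for lane 0, 2 for lane 1
    const std::size_t Select = (Imm8 >> i) & 1;
    Result[i] = Src[LaneBase + Select];
  }
  return Result;
}
```

Elements never cross a lane boundary, which is why each lane can be handled independently when the operation is split into two 128-bit halves.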
Ryan Houdek
da51169ba9
Merge pull request #3875 from alyssarosenzweig/ir/gethostflag
IR: garbage collect premature F80Cmp optimizations
2024-07-17 03:05:48 -07:00
Ryan Houdek
f72cee480f
Merge pull request #3874 from alyssarosenzweig/opt/reconstructftw
X87: save uop in ReconstructFTW
2024-07-17 03:05:37 -07:00
Alyssa Rosenzweig
e7d5a01c5f IR: remove F80Cmp flags
nothing is optimizing around this, it's just adding pointless complexity. if we
want to actually optimize F80Cmp, the right way would be to lift the
implementation into the OpcodeDispatcher or JIT. it wouldn't be terribly
difficult. This kludge doesn't get us closer there.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-16 14:53:58 -04:00
Alyssa Rosenzweig
0c3a8d0bc8 IR: remove GetHostFlag
it doesn't get host flags, it's just an extra Bfe used in x87. pointless and
confusing!

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-16 14:44:34 -04:00
Alyssa Rosenzweig
c4ba7eee87 X87: save uop in ReconstructFTW
noticed while reviewing Paulo's work

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-16 13:54:09 -04:00
Paulo Matos
8d89adef2e Add IR stack operations
These IR operations deal implicitly with the x87 stack and are removed
by the x87 stack optimization pass.
2024-07-16 09:07:35 +02:00
Alyssa Rosenzweig
1e709d1150 OpcodeDispatcher: add RecordX87 helper
calls will be generated.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-16 09:07:35 +02:00
Alyssa Rosenzweig
476ee0cd7d IR: track whether x87 is used in header
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-16 09:07:35 +02:00
Ryan Houdek
b8e864ffdf
Merge pull request #3865 from Sonicadvance1/telemetry_atexit
Telemetry: Change how visibility of telemetry values work
2024-07-15 09:53:37 -07:00
Ryan Houdek
d79b7fcc49
Merge pull request #3808 from alyssarosenzweig/rclse/3
Try to delete RCLSE again
2024-07-12 20:38:06 -07:00
Ryan Houdek
b9a6caea8d
Merge pull request #3844 from Sonicadvance1/fix_vmovq
AVX128: Fixes vmovq loading too much data
2024-07-12 17:07:32 -07:00
Ryan Houdek
97a68cb643
Telemetry: Change how visibility of telemetry values work
Removes global initializer for telemetry values since their address is
visible and PIC relative code loading handles the address fetching for
us.
2024-07-12 03:18:23 -07:00
Ryan Houdek
870e395ac4
Merge pull request #3862 from Sonicadvance1/remove_atexit_logman
LogManager: Removes fextl::vector usage
2024-07-12 02:05:02 -07:00
Ryan Houdek
04592f82f5
Merge pull request #3861 from Sonicadvance1/remove_atexit_vdso
VDSO: Stop using a vector for a static
2024-07-12 02:04:25 -07:00
Ryan Houdek
5ef0db994d
VDSO: Stop using a vector for a static
This causes a global initializer that registers an atexit handler.

Be smarter, use an std::array and pass its data around using a span
instead.

Removes the global initializer and removes the atexit installation
2024-07-11 23:53:57 -07:00
Ryan Houdek
b523407a3e
LogManager: Removes fextl::vector usage
We never use more than one logging method at a time so this was
overengineered for what it is doing.

Instead, only allow one handler each for messages and throw messages,
each of which is just a pointer.

Removes a global initializer and an atexit handler being installed
2024-07-11 22:51:56 -07:00
Ryan Houdek
8021dc10a1
OpcodeDispatcher: Force noinline for the function call in the Bind helper
Clang was inlining a few of the functions it was calling. So force it
never to inline, since this is supposed to be a little shim trampoline
only.
2024-07-11 19:00:42 -07:00
Ryan Houdek
7e8d734e43
AVX256: Initial fixes just to get my unittest working
This is the initial split to decouple AVX256 composed operations from
their MMX/SSE counterparts. This is to work around the subtle
differences with AVX/SSE zext/insert behaviour.
2024-07-11 18:43:31 -07:00
Ryan Houdek
3c7318d7c8
AVX128: Fixes vmovq loading too much data
This was doing a 128-bit load from memory followed by a 64-bit zero
extend. The zero extend looked like a spurious move, but it was there to
match the behaviour of vmovq, which needs the zero extend.

Also adds a unit test to ensure that we aren't loading too much data by
loading right up against a page boundary.

Fixes #3787
2024-07-11 18:34:05 -07:00
Ryan Houdek
fc0b233046
Merge pull request #3859 from neobrain/refactor_opdispatch_templates
OpcodeDispatcher: Replace hand-written wrapper templates with a generic utility
2024-07-11 18:18:23 -07:00
Mai
e25918d846
Merge pull request #3858 from Sonicadvance1/implement_nt_load
Implement support for SSE4.1/AVX NT loads
2024-07-11 14:22:41 -04:00
Alyssa Rosenzweig
3a334c4585 Reapply "IR: drop RCLSE"
This reverts commit 78aee4d96e39c9ef6415a7dca21fd6b81dabe12e.
2024-07-11 13:21:14 -04:00
Alyssa Rosenzweig
8dae4bcd44 OpcodeDispatcher: drop stale comment
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-11 13:21:14 -04:00
Alyssa Rosenzweig
294f10fdd0 OpcodeDispatcher: reg cache mmx
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-11 13:21:14 -04:00
Tony Wasserka
b9829ed316 OpcodeDispatcher: Replace even more hand-written wrapper templates 2024-07-11 16:19:15 +02:00
Tony Wasserka
4ccec17676 OpcodeDispatcher: Replace more hand-written wrapper templates 2024-07-11 16:19:15 +02:00
Tony Wasserka
f45082043b OpcodeDispatcher: Replace hand-written wrapper templates with a generic utility 2024-07-11 16:19:14 +02:00
Tony Wasserka
3222f13dde Fix comment formatting 2024-07-11 16:19:14 +02:00
Mai
b282620a48
Merge pull request #3857 from Sonicadvance1/sve_bitperm
Arm64: Implement support for SVE bitperm
2024-07-11 05:05:41 -04:00
Ryan Houdek
e24b01b6cb
Arm64: Implement support for SVE bitperm 2024-07-11 01:46:35 -07:00
Tony Wasserka
9a8694c2f3
Merge pull request #3853 from neobrain/refactor_warn_fixes
Fix all the warnings
2024-07-11 10:12:41 +02:00
Tony Wasserka
070a9148aa
Merge pull request #3852 from neobrain/refactor_opdispatch_codesize
OpcodeDispatcher: Avoid template monomorphization to reduce FEXLoader binary size
2024-07-11 09:58:49 +02:00
Tony Wasserka
f19fe3b6f3 Fix warning about an expression with side effects being passed to __builtin_assume
LOGMAN_THROW_AA_FMT has no benefit over LOGMAN_THROW_A_FMT here, so just use
the latter.
2024-07-11 09:54:31 +02:00
Tony Wasserka
8d2b15665d Fix unused-variable warnings 2024-07-11 09:54:30 +02:00
Ryan Houdek
548fd9daf8
OpcodeDispatcher: Implement support for SSE4.1 NT load 2024-07-10 23:07:37 -07:00
Ryan Houdek
f831f5a0e1
AVX128: Implement support for NT Load 2024-07-10 23:07:14 -07:00