10044 Commits

Author SHA1 Message Date
Paulo Matos
b1ec50c7c2 Test running scripts tell ctest of skipped tests
CMake sets 125 as the skipped test exit code that the scripts use.
2024-07-26 14:04:54 +02:00
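A minimal sketch of the convention this commit describes, assuming the test's `SKIP_RETURN_CODE` property is set to 125 on the CMake side. The helper name and the `disabled_tests` set are hypothetical and are not taken from the actual runner scripts.

```python
# Exit code that, per the commit message, CMake is configured to treat as "skipped".
SKIP_EXIT_CODE = 125


def exit_code_for(test_name, passed, disabled_tests):
    """Map a test outcome to a process exit code (hypothetical helper)."""
    if test_name in disabled_tests:
        # ctest reports the test as skipped rather than passed or failed.
        return SKIP_EXIT_CODE
    return 0 if passed else 1
```

A runner script would pass the returned value to `sys.exit()` so that ctest sees it as the process exit status.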
Ryan Houdek
9201ac5a6b
Merge pull request #3882 from Sonicadvance1/scalar_afp_fma
AVX128: Implement support for scalar FMA with AFP
2024-07-22 13:19:59 -07:00
Ryan Houdek
8ebf049fb9
InstcountCI: Update for Scalar FMA with AFP
2024-07-22 12:58:20 -07:00
Ryan Houdek
3c5b59d985
AVX128: Implement support for scalar FMA with AFP
Now that I have AFP supporting hardware I felt better implementing this
since I can run unit tests.

Fixes #3793
2024-07-22 12:58:19 -07:00
Ryan Houdek
6b91e0cb0e
Merge pull request #3887 from alyssarosenzweig/ir/prefix
json_ir_generator: stop prefixing arguments
2024-07-22 12:57:09 -07:00
Alyssa Rosenzweig
587b924de9 json_ir_generator: stop prefixing arguments
stop prefixing the arguments when we generate allocate ops (in particular), this
is more convenient and simpler. in exchange we need to prefix Op to avoid a
collision on fcmpscalarinsert which has an argument named Op, but that's a local
change at least.

came up when experimenting with new IR, but I think this is probably a win by
itself.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-22 13:50:21 -04:00
Tony Wasserka
d507f4c9b1
Merge pull request #3547 from pmatos/wip_x87_stack
x87 Stack Optimization
2024-07-22 14:42:02 +02:00
Paulo Matos
39bc2a82c1 instcountci: X87 Pass and refactoring
2024-07-22 08:50:01 +02:00
Paulo Matos
774325dcf2 Tests: X87 Refactoring and Pass
2024-07-22 08:44:45 +02:00
Paulo Matos
a1378f94ce X87 Code Refactoring and Optimization Pass
2024-07-22 08:44:45 +02:00
Ryan Houdek
77ec950ff2
Merge pull request #3885 from alyssarosenzweig/opt/zero-flag
Optimize zero x87 flags
2024-07-21 13:06:25 -07:00
Alyssa Rosenzweig
592d6cc43f InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-21 15:50:10 -04:00
Alyssa Rosenzweig
610caf8529 ConstProp: treat StoreContext as zeroable
todo: FPR equivalent.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-21 15:49:09 -04:00
Alyssa Rosenzweig
d20b46e46f IR: drop LoadFlag/StoreFlag ops
pointless, we can just load/store the context now.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-21 15:49:09 -04:00
Alyssa Rosenzweig
4094aa1b9a DeadStoreElimination: drop flag handling
now that we do everything via NZCV, this is mostly vestigial. DF/x87 flags are
sufficiently rare to be "don't care"s here, and we don't even have multiblock
enabled yet!

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-21 15:49:08 -04:00
Ryan Houdek
f8c6baae97
Merge pull request #3883 from Sonicadvance1/implement_daz
Arm64: Implements support for DAZ using AFP.FIZ
2024-07-21 10:03:34 -07:00
Ryan Houdek
5c9bb6594c
Merge pull request #3884 from Sonicadvance1/remove_vex_telem
Telemetry: Remove VEX flag
2024-07-20 19:44:28 -07:00
Ryan Houdek
56df57e980
InstcountCI: Update
2024-07-20 17:26:27 -07:00
Ryan Houdek
f7b4d25803
Telemetry: Remove VEX flag
This is no longer necessary and it also no longer provides us any useful
information. Since we expose the AVX CPUID flag, basically everything
uses VEX encoding now, so it is basically always set.
2024-07-20 17:24:00 -07:00
Ryan Houdek
4fffe68f81
InstcountCI: Update
2024-07-20 15:57:01 -07:00
Ryan Houdek
95b15d788b
Arm64: Fix filling static registers
Some locations could end up with SRA registers that only spilled one
register.
Allow passing in temporaries from the call site.
Fixes rpid and syscalls asserting.
2024-07-20 15:57:01 -07:00
Ryan Houdek
ae9312bdab
unittests: Implements a DAZ test
Specifically does a vector add with and without DAZ enabled and ensures
the value is different when the source values contain a denormal.
2024-07-20 15:34:54 -07:00
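The behaviour this test checks can be sketched as follows. This is a simplified software model of a float32 vector add, not the unit test itself; all function names are hypothetical.

```python
import math
import struct


def to_f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def is_denormal_f32(x):
    """True if x, encoded as float32, has a zero exponent and non-zero mantissa."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF == 0 and (bits & 0x7FFFFF) != 0


def vector_add(a, b, daz=False):
    """Element-wise float32 add; with daz=True, denormal inputs read as signed zero."""
    def load(v):
        if daz and is_denormal_f32(v):
            return math.copysign(0.0, v)
        return to_f32(v)
    return [to_f32(load(x) + load(y)) for x, y in zip(a, b)]
```

With a denormal source such as `1e-40`, the sum is non-zero without DAZ and zero with it, while normal elements are unaffected.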
Ryan Houdek
b78da2e5ad
Arm64: Implements support for DAZ using AFP.FIZ
When AFP is supported then we can actually support DAZ. This might also
fix the audio corruption in Animal Well but I can't test it until Steam
is running on Oryon. Requires a bit of plumbing for MXCSR which we were
hacking around before but now we actually want to store the value.

Fixes #3856
2024-07-20 15:34:54 -07:00
Ryan Houdek
54fc8cb0bd
TestHarnessRunner: Support querying AFP for features
Also fixes desync of flags
2024-07-20 15:34:54 -07:00
Ryan Houdek
228009c283
Merge pull request #3881 from bylaws/race
FixedSizePooledAllocation: Fix a race when unclaiming disowned buffers
2024-07-20 12:35:35 -07:00
Billy Laws
1f878ce4cd FixedSizePooledAllocation: Fix a race when unclaiming disowned buffers
A disowned buffer could be unclaimed or claimed by a different thread in
the time between the !IsFree check and locking the allocation mutex.
Fix this and prevent such errors in the future by always checking
ownership with the allocator locked before attempting to unclaim
buffers.
2024-07-20 00:12:52 +00:00
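The pattern behind this fix can be sketched as a check-then-lock race versus a lock-then-check fix. The `Buffer` and `Pool` classes below are hypothetical, heavily simplified stand-ins and do not mirror the real FixedSizePooledAllocation code.

```python
import threading


class Buffer:
    def __init__(self):
        self.owner = None  # id of the owning thread, or None if disowned


class Pool:
    """Hypothetical stand-in for a pooled allocator."""

    def __init__(self):
        self.lock = threading.Lock()
        self.free = []

    def unclaim_racy(self, buf):
        # Check-then-lock: another thread may claim or unclaim `buf` between
        # the ownership check and taking the lock, so the check can go stale.
        if buf.owner is None:
            with self.lock:
                self.free.append(buf)
                return True
        return False

    def unclaim(self, buf):
        # Lock-then-check: ownership is only inspected while the allocator
        # lock is held, so the decision cannot change before it is acted on.
        with self.lock:
            if buf.owner is not None or buf in self.free:
                return False
            self.free.append(buf)
            return True

    def claim(self, buf, thread_id):
        with self.lock:
            if buf.owner is not None:
                return False
            if buf in self.free:
                self.free.remove(buf)
            buf.owner = thread_id
            return True
```

In the locked variant, a buffer that was claimed or already unclaimed by another thread is simply left alone.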
Alyssa Rosenzweig
e4b7a65a49
Merge pull request #3880 from pmatos/InstCountMemcpy
Add x87 memcpy instcountci tests
2024-07-19 08:53:23 -04:00
Paulo Matos
c77a707dbe Add x87 memcpy instcountci tests
2024-07-19 09:09:34 +02:00
Ryan Houdek
f81fc4e4f0
Merge pull request #3866 from Sonicadvance1/ArgumentLoader_atexit_remove
ArgumentLoader: Removes static fextl::vector usage
2024-07-18 13:12:03 -07:00
Ryan Houdek
d385e496d3
Merge pull request #3879 from Sonicadvance1/cpuid_leafs
EmulatedFiles: Adds a few leaf CPUID flags
2024-07-18 13:10:21 -07:00
Mai
f8c4c543e3
Merge pull request #3871 from Sonicadvance1/improve_vpshufd_vpermilps
AVX128: Improve VPERMILPS/PD and VPSHUFD
2024-07-18 15:58:47 -04:00
Ryan Houdek
d2f903ae55
EmulatedFiles: Adds a few leaf CPUID flags
We support leaf functions now, so add the few that were calling for it.
We will be gaining support for the xsave ones relatively soon, so it's
good to have them supported.

Also deletes a couple of cdt/cqm things that aren't exposed and we won't
be supporting.
2024-07-18 07:02:49 -07:00
Ryan Houdek
d1249ec5cf
Merge pull request #3878 from neobrain/refactor_fix_format_oops
EmulatedFiles: Fix bad formatting
2024-07-18 06:44:44 -07:00
Tony Wasserka
bf9a6d763c EmulatedFiles: Fix bad formatting
2024-07-18 15:06:57 +02:00
Ryan Houdek
0b829d2c46
unittests: Adds a test for full pshufd imm coverage
2024-07-18 04:13:03 -07:00
Ryan Houdek
bddb533fa0
InstcountCI: Add some more of the cases
2024-07-18 04:13:03 -07:00
Ryan Houdek
1c35eeffeb
Vector: Optimize PSHUFD with brute force search
With a brute force search of methods between 1-3 instructions we cover a
lot more cases more optimally.

There are definitely still more cases (and probably some that can be reduced
from 3 instructions to 2), but covering 44 cases is a pretty good margin
already.
2024-07-18 04:10:58 -07:00
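For reference, any optimized sequence found by such a search has to match the PSHUFD selection rule for its immediate. The sketch below models that rule and enumerates one easy class of immediates; the function names are hypothetical and the search itself is not reproduced here.

```python
def pshufd(src, imm8):
    """Reference PSHUFD: result element i takes src element (imm8 >> 2*i) & 3."""
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]


def broadcast_immediates():
    """Immediates whose four selectors are all equal (one element broadcast)."""
    lanes = [0, 1, 2, 3]
    return [imm for imm in range(256) if len(set(pshufd(lanes, imm))) == 1]
```

A candidate instruction sequence can be validated by comparing its result against `pshufd` for the immediate it claims to cover.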
Ryan Houdek
c7254e31ed
InstcountCI: Update for VPERM/VPSHUFD improvements
2024-07-18 04:10:58 -07:00
Ryan Houdek
b0bd8a62a2
AVX128: Improve VPERMILPS/PD and VPSHUFD
VPSHUFD and VPERMILPS are aliases of each other.

Reuses the implementation path from the PSHUFD implementation which has
a few swizzles and then a table lookup.

VPERMILPD is a very simple swizzle per 128-bit lane.

Fixes #3797
Fixes #3784
2024-07-18 04:10:58 -07:00
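The per-lane swizzle mentioned for VPERMILPD can be sketched as a reference model of the immediate form, where bit i of the immediate picks the low or high 64-bit element within element i's own 128-bit lane. The function name is hypothetical and this is not the AVX128 implementation.

```python
def vpermilpd_imm(src, imm8):
    """Immediate-form VPERMILPD over 64-bit elements (two per 128-bit lane)."""
    out = []
    for i in range(len(src)):
        lane_base = i & ~1  # index of the first element in this 128-bit lane
        out.append(src[lane_base + ((imm8 >> i) & 1)])
    return out
```

For a 256-bit source `[10, 11, 20, 21]`, the immediate `0b0101` swaps the two elements inside each lane and never moves data across lanes.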
Ryan Houdek
17a55fbb39
Merge pull request #3876 from alyssarosenzweig/json/x87
Autogenerate LoweredX87() query, misc json_ir_generator cleanup in the area
2024-07-18 04:10:21 -07:00
Alyssa Rosenzweig
dedec83881 json_ir_generator: autoderive array names
these are purely internal.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-17 15:32:51 -04:00
Alyssa Rosenzweig
9fd5c73633 json_ir_generator: generate IsLoweredX87 helper
X87 pass will use this query.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-17 15:31:46 -04:00
Alyssa Rosenzweig
bdb890a8b0 json_ir_generator: rename X87 -> LoweredX87
to reflect its actual meaning

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-17 15:31:14 -04:00
Alyssa Rosenzweig
5043d09771 json_ir_generator: use textwrap.dedent, f-string
for size

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-17 15:28:59 -04:00
Ryan Houdek
da51169ba9
Merge pull request #3875 from alyssarosenzweig/ir/gethostflag
IR: garbage collect premature F80Cmp optimizations
2024-07-17 03:05:48 -07:00
Ryan Houdek
f72cee480f
Merge pull request #3874 from alyssarosenzweig/opt/reconstructftw
X87: save uop in ReconstructFTW
2024-07-17 03:05:37 -07:00
Alyssa Rosenzweig
7546160811 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-16 14:53:58 -04:00
Alyssa Rosenzweig
e7d5a01c5f IR: remove F80Cmp flags
nothing is optimizing around this, it's just adding pointless complexity. if we
want to actually optimize F80Cmp, the right way would be to lift the
implementation into the OpcodeDispatcher or JIT. it wouldn't be terribly
difficult. This kludge doesn't get us closer there.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-16 14:53:58 -04:00
Alyssa Rosenzweig
0c3a8d0bc8 IR: remove GetHostFlag
it doesn't get host flags, it's just an extra Bfe used in x87. pointless and
confusing!

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-16 14:44:34 -04:00
Alyssa Rosenzweig
19e58cac62 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2024-07-16 13:54:28 -04:00