7929 Commits

Ryan Houdek
a21def7d74 unittests/ASM: Implements tests for vpgatherqq/vgatherqpd
Similar to the previous tests, vpgatherqq and vgatherqpd are equivalent
instructions, so the tests are the same with only the mnemonic changed.

This adds tests for an additional two sets of instructions, giving us
full coverage of all eight instructions when combined with the tests
from PRs #3167 and #3166.

Tests the same things as described in #3165

In addition, since these tests use 64-bit indices for address
calculation, we can easily generate an index vector that tests
overflow. So every test at every displacement ALSO gains an additional
overflow test to ensure correct behaviour around pointer overflow in
the address calculation.
2023-09-29 08:04:47 -07:00
Ryan Houdek
85da0f0640
Merge pull request #3165 from Sonicadvance1/gatherddps
unittests/ASM: Implements tests for vpgatherdd/vgatherps
2023-09-28 22:44:38 -07:00
Ryan Houdek
9a01b440e3 unittests/ASM: Implements tests for vpgatherdd/vgatherps
vpgatherdd and vgatherps are effectively the same instructions, so the
tests are the same except for the instruction mnemonic.

This adds unit tests for two of the eight gather instructions.
Specifically this adds tests for the 32-bit indices loading 32-bit
elements instructions.

What it tests:
- Tests all displacement scales
- Tests multiple mask arrangements
- Ensures the mask register is zeroed after the instruction

What it doesn't test:
- Doesn't test address size calculation overflow
   - Would only happen on 32-bit with 32-bit indices, or /really/ high
     base addresses
   - The instruction should behave as a mask to the address size
   - Effectively behaves like `(uint64_t)(base + (index << ilog2(scale)))`
   - Better idea is to just not expose AVX to 32-bit applications
- Doesn't test VSIB immediate displacement
   - This just ends up being base_addr + imm so it isn't too interesting
   - We can add more tests in the future if we think we messed that up
- Doesn't test partial fault behaviour
   - Because that's a nightmare.
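The address-size wrap described above can be modelled roughly as follows. This is an illustrative Python sketch, not FEX code; the function name and the ilog2 table are mine:

```python
def gather_effective_address(base, index, scale, addr_bits=64):
    """Rough model of gather address math: the sum wraps to the
    address size rather than faulting, i.e. (base + (index << ilog2(scale)))
    masked to addr_bits."""
    ilog2 = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    mask = (1 << addr_bits) - 1
    return (base + (index << ilog2)) & mask
```

With a base near the top of the 64-bit address space, the computed element address simply wraps around zero, which is what the overflow tests in these files exercise.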

Each instruction test is deliberately kept small and isolated, so if a
single register fails it is very easy to nail down which operation
caused it.
Some of our ASM tests do a chunk of work and spit out a result at the
end, which can be difficult to debug in some cases. To avoid that, these
tests are spread out across 16 files for this single class of
instructions.
2023-09-28 19:58:34 -07:00
Ryan Houdek
228ee7fa47 TestHarnessRunner: Support AVX2 flag detection 2023-09-28 19:58:34 -07:00
Ryan Houdek
98789a8039 FEXCore: Implement support for AVX2 feature detection 2023-09-28 19:57:08 -07:00
Ryan Houdek
14398742c3
Merge pull request #3164 from neobrain/fix_thunks_asan
Thunks: Fix AddressSanitizer build
2023-09-28 12:05:55 -07:00
Tony Wasserka
5a7e3192da Thunks: Fix AddressSanitizer build 2023-09-28 15:13:03 +02:00
Ryan Houdek
6b4ff4ae81
Merge pull request #3163 from alyssarosenzweig/opt/ascii-flags
Optimize ASCII flags
2023-09-27 10:42:47 -07:00
Ryan Houdek
d1d3de80d1
Merge pull request #3157 from alyssarosenzweig/opt/unmask-in
OpcodeDispatcher: Don't mask logic op inputs
2023-09-27 10:38:12 -07:00
Alyssa Rosenzweig
2e32e1367d InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-27 10:55:57 -04:00
Alyssa Rosenzweig
711583aa76 OpcodeDispatcher: Optimize PTEST flags
Zero NZCV first to avoid RMW.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-27 10:55:57 -04:00
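A rough model of why zeroing NZCV first avoids a read-modify-write (illustrative Python, assuming AArch64's NZCV layout with Z at bit 30; the function names are hypothetical, not FEX APIs):

```python
def set_z_rmw(old_nzcv, z):
    # Read-modify-write: insert Z into the existing flags value,
    # which requires reading the old register first.
    return (old_nzcv & ~(1 << 30)) | (z << 30)

def set_z_zeroed(z):
    # Zeroing first: the other flags are defined to be zero, so the
    # new value is built without reading the old register at all.
    return z << 30
```

Since PTEST defines all the flags it produces, the old NZCV value never needs to be read.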
Alyssa Rosenzweig
3efac9646c OpcodeDispatcher: Optimize ASCII flags
Make the zeroing of undefined NZCV more obvious. Mitigates regressions from
future work.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-27 10:31:31 -04:00
Alyssa Rosenzweig
095a362046 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-26 20:30:09 -04:00
Alyssa Rosenzweig
3bb64c64e3 OpcodeDispatcher: Don't mask for TEST
Like AND.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-26 20:30:02 -04:00
Alyssa Rosenzweig
a4de164944 OpcodeDispatcher: Use lshr for ah/bh with AllowUpperGarbage
If we ever get around to fusing ops with shifts in the ConstProp optimizer (may
or may not be worthwhile), this will delete an instruction from things like "or
al, bh".

Even though lsr is the same speed as bfe on Firestorm, I feel if you ask for
garbage you should get garbage C:

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-26 20:28:01 -04:00
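The difference between the masked extract and the garbage-tolerant shift can be sketched like this (illustrative Python; names are mine, not FEX APIs):

```python
def load_ah_allow_upper_garbage(rax):
    # lsr #8: bits above [7:0] of the result may hold garbage from the
    # rest of the register; callers that only read the low 8 bits
    # don't care.
    return rax >> 8

def load_ah_masked(rax):
    # bfe/ubfx equivalent: extract exactly bits [15:8].
    return (rax >> 8) & 0xFF
```

Both agree on the low 8 bits, which is all an AllowUpperGarbage caller is permitted to observe.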
Alyssa Rosenzweig
45a645fbbc OpcodeDispatcher: Don't mask logic op inputs
The masking is pointless since the upper bits are ignored anyway.
Deletes piles of uxt instructions and even some 32-bit moves.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-26 19:12:22 -04:00
Alyssa Rosenzweig
92211bf8c6 OpcodeDispatcher: Add AllowUpperGarbage option
Allows loading 8-bit sources without bfe'ing for al/bl/cl when the
caller knows it doesn't need masking behaviour, while still reporting
the real size so the extract for ah/bh/ch will work properly.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-26 19:08:20 -04:00
Alyssa Rosenzweig
728d3f8ac7 InstCountCI: Add a case with a hi 8-bit reg
Noticeably different code pattern.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-26 18:33:55 -04:00
Ryan Houdek
ca87d8688d
Merge pull request #3153 from alyssarosenzweig/opt/adcs
Use adcs
2023-09-26 09:57:01 -07:00
Ryan Houdek
e32601f49d
Merge pull request #3161 from neobrain/fix_ctest_silent_failures
unittests: Instruct CTest to print output from tests on failure
2023-09-26 08:26:15 -07:00
Tony Wasserka
f4dd456c80 unittests: Instruct CTest to print output from tests on failure 2023-09-26 17:16:28 +02:00
Alyssa Rosenzweig
7b22dbfe24 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-26 10:05:59 -04:00
Alyssa Rosenzweig
7a06cc9727 IR: Use adcs/sbcs
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-26 09:06:46 -04:00
Ryan Houdek
8b3881b5db
Merge pull request #3154 from alyssarosenzweig/opt/smol-carry
Optimize 8/16-bit CF calculation
2023-09-26 05:49:07 -07:00
Ryan Houdek
76d4637d9c
Merge pull request #3159 from neobrain/feature_update_vulkan
Thunks: Update Vulkan thunk to v1.3.261.1
2023-09-26 05:20:18 -07:00
Alyssa Rosenzweig
0d12cce74f
Merge pull request #3158 from Sonicadvance1/unittest_for_3153
unittests/ASM: Adds unit test caught by #3153
2023-09-26 08:15:40 -04:00
Tony Wasserka
04592af609 Thunks: Update Vulkan thunk to v1.3.261.1 2023-09-26 12:14:58 +02:00
Ryan Houdek
d8366c04dc unittests/ASM: Adds unit test caught by #3153 2023-09-26 00:28:45 -07:00
Ryan Houdek
533f35934c
Merge pull request #3155 from neobrain/opt_thunks_rebuilds
Thunks: Avoid recompiling thunk interfaces on FEXLoader changes
2023-09-25 19:21:09 -07:00
Alyssa Rosenzweig
35bb7cc801 InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-25 19:41:31 -04:00
Alyssa Rosenzweig
5facb21d30 OpcodeDispatcher: Don't mask small add/sub carries
For the GPR result, the masking already happens as part of the bfi. So the only
point of masking is for the flag calculation. But actually, every flag except
carry will ignore the upper bits anyway. And the carry calculation actually
WANTS the upper bit as a faster impl.

Deletes a pile of code both in FEX and the output :-)

ADC/SBC could probably get similar treatment later.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-25 18:25:30 -04:00
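A rough model of the carry trick (illustrative Python sketch assuming the 8-bit inputs are already zero-extended; names are hypothetical):

```python
def add8_with_cf(a, b):
    # Carry-out of an 8-bit add, taken from the unmasked sum: bit 8 of
    # (a + b) is exactly the carry, so the inputs need no masking and
    # the GPR result is masked anyway by the bfi insert.
    unmasked = a + b
    result = unmasked & 0xFF  # happens as part of the bfi
    cf = (unmasked >> 8) & 1
    return result, cf
```

Every other flag only looks at the low 8 bits, so leaving the upper bits of the sum intact costs nothing.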
Tony Wasserka
adead832a5 Thunks: Avoid recompiling thunk interfaces on FEXLoader changes
The interface files themselves don't use FEXLoader. Only the final library
does.
2023-09-25 23:04:09 +02:00
Ryan Houdek
5eed24a242
Merge pull request #3152 from Sonicadvance1/instcountci_x87_f64
InstCountCI: Support f64 reduced precision mode tests
2023-09-24 19:29:37 -07:00
Ryan Houdek
7907f70ed2 InstCountCI: Adds new x87 reduced precision mode tests 2023-09-24 18:50:05 -07:00
Ryan Houdek
7141332f6f InstCountCI: Support setting environment variables in tests
This will allow us to enable FEX options through environment variables
just like the ASM tests.
2023-09-24 18:50:01 -07:00
Ryan Houdek
234e029391
Merge pull request #3145 from Sonicadvance1/optimize_inline_calls
PassManager: Optimize out CPUID and XGetBV calls
2023-09-24 18:09:18 -07:00
Ryan Houdek
19a7b514e6
Merge pull request #3150 from alyssarosenzweig/opt/ornror
Optimize PF calculation in lahf
2023-09-24 18:05:57 -07:00
Ryan Houdek
220761a0e8
Merge pull request #3151 from Sonicadvance1/unique_name_workflow_jobs
Github: Changes jobs to have unique names
2023-09-24 18:03:57 -07:00
Alyssa Rosenzweig
cbd4daddff InstCountCI: Update
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-24 20:59:28 -04:00
Alyssa Rosenzweig
c8519b0b87 OpcodeDispatcher: Remove LoadPF
Now unused; its former users all prefer LoadPFRaw since they can fold
some of this math into the use.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-24 20:59:28 -04:00
Alyssa Rosenzweig
68d32ad70d OpcodeDispatcher: Optimize PF in lahf
Use the raw popcount rather than the final PF and use some sneaky bit math to
come out 1 instruction ahead.

Closes #3117

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-24 20:59:28 -04:00
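The commit doesn't show the bit math itself, but the PF definition that the raw popcount feeds can be modelled as (illustrative Python, not FEX code):

```python
def pf_from_raw_popcount(result):
    # x86 PF is set when the low 8 bits of the result contain an even
    # number of 1 bits; from the raw popcount, PF is just the inverted
    # least significant bit.
    popcnt = bin(result & 0xFF).count("1")
    return (~popcnt) & 1
```

Keeping the raw popcount around lets users like lahf fold the inversion and bit placement into their own instruction sequence.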
Ryan Houdek
62890f148f Github: Changes jobs to have unique names
These overlapping names make it impossible to ensure all checks are
required to pass before merge.

Unique names will fix this.
2023-09-24 17:52:47 -07:00
Alyssa Rosenzweig
1f02a6da34 IR: Add Ornror op
Mostly copypaste of Orlshl... we really should deduplicate this mess somehow.
Maybe a shift enum on the core Or op?

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2023-09-24 20:47:50 -04:00
Alyssa Rosenzweig
86063411dc Revert "OpcodeDispatcher: Use plain Lshl for flags"
This logic is unused since 8adfaa9aa ("OpcodeDispatcher: Use SelectCC for x87"),
which addressed the underlying issue.

This reverts commit df3833edbe3d34da4df28269f31340076238e420.
2023-09-24 20:47:50 -04:00
Ryan Houdek
9968e6431f Passes: Rename SyscallOptimization
This is now inlining multiple external calls out of the JIT. Rename it
to InlineCallOptimization.
2023-09-24 17:25:38 -07:00
Ryan Houdek
ff24f64b2a PassManager: Optimize out CPUID and XGetBV calls
If we const-prop the required functions and leafs then we can directly
encode the CPUID information rather than jumping out of the JIT.
In testing, almost all CPUID executions const-prop which function is
being called. The worst case I found was only an 85% const-prop rate.

This isn't quite 100% optimal, since we would need to run the RCLSE and
ConstProp passes again after optimizing these, which would remove some
redundant moves.

Sadly there seems to be a bug in the ConstProp pass that starts crashing
applications when that is done; it is easily reproduced by running
Half-Life 2, which immediately hits SIGILL.

Even without that, this is still a significant savings since we aren't
jumping out of the JIT anymore for these optimized CPUIDs.
2023-09-24 17:25:38 -07:00
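The overall shape of the optimization might be sketched like this (illustrative Python pseudocode, not FEX's actual pass; all names and table contents are hypothetical):

```python
def try_inline_cpuid(function, leaf, constant_functions):
    """Sketch: if the CPUID function number const-propped to a value
    known to return constant data, return that data for direct encoding
    in the JIT; otherwise signal a fall-back to the runtime call."""
    handler = constant_functions.get(function)
    if handler is None:
        return None  # e.g. function 1Ah: data varies per CPU core
    return handler(leaf)
```

Functions whose results depend on the current core (like 1Ah) stay as runtime calls, which is why the pass also needs the constant/non-constant description added in the CPUID commit below.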
Ryan Houdek
e9a7ef2534 CPUID: Describe CPUID functions if they return constant state or not
Most CPUID functions return constant data; there are four that don't.
Some CPUID functions also need the leaf descriptor, so we need to
describe that as well.

Functions that don't return constant data:
- function 1Ah - Returns different data depending on current CPU core
- function 8000_000{2,3,4} - Different data based on CPU core

Functions that need leaf constprop:
- 4h, 7h, Dh, 4000_0001h, 8000_001Dh
2023-09-24 17:25:38 -07:00
Ryan Houdek
842c57e221 CPUID: Constify some functions
These don't modify CPUIDEmu state.
2023-09-24 17:25:38 -07:00
Ryan Houdek
93aeb157b4
Merge pull request #3149 from Sonicadvance1/fail_on_change
InstCountCI: Fail CI if there was any difference.
2023-09-24 17:23:52 -07:00
Ryan Houdek
02ff9f200c InstCountCI: Upload diff and check for failure 2023-09-24 17:14:08 -07:00