6950 Commits

Author SHA1 Message Date
Triang3l
3d30b2eec3 [Vulkan] Shader memory export (#145) 2024-05-25 16:31:50 +03:00
Triang3l
210ac4b2d2 [GPU] Fix gamma ramp writing after RegisterFile API change (#2262) 2024-05-18 23:53:09 +03:00
Triang3l
8e7301f4d8 [SPIR-V] Use a helper class for most if/else branching
Simplifies emission of the blocks themselves (including inserting blocks
into the function's block list in the correct order), as well as phi after
the branching.

Also fixes 64bpp storing with blending in the fragment shader interlock
render backend implementation (had a typo that caused the high 32 bits to
overwrite the low ones).
2024-05-16 23:05:49 +03:00
Triang3l
3189a0e259 [GPU] Check memexport stream constant upper bits in range gathering 2024-05-12 20:26:14 +03:00
Triang3l
a3304d252f [Base/GPU] Cleanup float comparisons and NaN and -0 in clamping
C++ relational operators are supposed to raise FE_INVALID if an argument is
NaN, so std::isless/greater[equal] are now used instead where such
comparisons were easy to locate (there may be other places; mostly min/max
and clamp usage was checked).

Also fixes a copy-paste error making the CPU shader interpreter execute
MINs as MAXs instead.
2024-05-12 19:21:37 +03:00
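A minimal sketch of the kind of NaN-aware clamp that a3304d252f describes. `ClampQuiet` is a hypothetical helper, not a function from the Xenia source, and mapping NaN to the lower bound is an assumed policy chosen for illustration only:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the Xenia source): clamps x to [lo, hi]
// using the quiet comparison functions from <cmath>, which do not raise
// FE_INVALID when an argument is a quiet NaN.
// Assumed policy: a NaN input is mapped to lo.
inline float ClampQuiet(float x, float lo, float hi) {
  assert(std::islessequal(lo, hi));  // Also rejects NaN bounds.
  // isgreaterequal is false for NaN, so this branch catches both
  // "x < lo" and "x is NaN" without a separate isnan check.
  if (!std::isgreaterequal(x, lo)) return lo;
  if (std::isgreater(x, hi)) return hi;
  return x;
}
```

With this shape, `ClampQuiet(NAN, 0.0f, 1.0f)` returns the lower bound rather than propagating the NaN.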
Triang3l
f964290ea8 [Base] Relax the system clock difference allowance in the test
This should hopefully reduce the CI failure rate, although this testing
approach is fundamentally flawed as it depends on OS scheduling.
2024-05-12 17:44:52 +03:00
Triang3l
376bad5056 [GPU] Remove register reinterpret_casts + WAIT_REG_MEM volatility
Hopefully prevents some potential #1971-like situations.

WAIT_REG_MEM's implementation also allowed the compiler to load the value
only once, which caused an infinite loop with the other changes in the
commit (even in debug builds), so it's now accessed as volatile. Possibly
it would be even better to replace it with some (acquire/release?) atomic
load/store some day at least for the registers actually seen as
participating in those waits.

Also fixes the endianness being handled only on the first wait iteration in
WAIT_REG_MEM.
2024-05-12 17:28:17 +03:00
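The atomic alternative floated in 376bad5056 could look roughly like the sketch below. This is not what the commit does (it uses volatile); `WaitRegEqual`, `ByteSwap32`, and the `max_polls` bound are hypothetical names introduced for illustration. The two points it shows are that the register is reloaded on every iteration and that the byte swap is applied on every iteration, not just the first:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: reverses the byte order of a 32-bit value.
constexpr uint32_t ByteSwap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// Hypothetical polling loop: returns true once the (optionally swapped)
// register value equals ref, or false after max_polls attempts.
inline bool WaitRegEqual(const std::atomic<uint32_t>& reg, uint32_t ref,
                         bool swap, unsigned max_polls) {
  assert(max_polls > 0);
  for (unsigned i = 0; i < max_polls; ++i) {
    // The atomic load cannot be hoisted out of the loop by the compiler.
    uint32_t value = reg.load(std::memory_order_acquire);
    // Endianness is handled on every iteration, not only the first one.
    if (swap) value = ByteSwap32(value);
    if (value == ref) return true;
  }
  return false;
}
```

A real wait would also yield or sleep between polls; that part is omitted here.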
Triang3l
f0ad4f4587 [Base] Add aliasing-safe xe::memory::Reinterpret
Accessing the same memory as different types (other than char) using
reinterpret_cast or a union is undefined behavior that has already caused
issues like #1971.

Also adds a XE_RESTRICT_VAR definition for declaring non-aliasing pointers
in performance-critical areas in the future.
2024-05-12 17:28:16 +03:00
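One common way to get an aliasing-safe reinterpretation, as f0ad4f4587 calls for, is to copy the bytes with `std::memcpy`. The sketch below shows that general technique; `ReinterpretSketch` is a hypothetical name, and the actual signature and implementation of `xe::memory::Reinterpret` are not shown in this log:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for an aliasing-safe reinterpret: copies the object
// representation instead of accessing it through a pointer of another type.
template <typename Dst, typename Src>
inline Dst ReinterpretSketch(const Src& src) {
  static_assert(sizeof(Dst) == sizeof(Src), "sizes must match");
  static_assert(std::is_trivially_copyable<Dst>::value &&
                    std::is_trivially_copyable<Src>::value,
                "both types must be trivially copyable");
  Dst dst;
  std::memcpy(&dst, &src, sizeof(Dst));
  return dst;
}

// Usage check, assuming IEEE 754 binary32 floats: 1.0f is 0x3F800000.
inline void ReinterpretSketchExample() {
  assert(ReinterpretSketch<uint32_t>(1.0f) == 0x3F800000u);
}
```

Compilers typically lower the fixed-size `memcpy` to a plain register move, so this usually costs nothing over the undefined `reinterpret_cast` version.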
Triang3l
a90f83d44c [Vulkan] Non-seamless cube map filtering 2024-05-05 15:20:23 +03:00
Triang3l
e9f7a8bd48 [Vulkan] Optional functionality usage improvements
Functional changes:
- Enable only actually used features, as drivers may take more optimal
  paths when certain features are disabled.
- Support VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE.
- Fix the separateStencilMaskRef check doing the opposite.
- Support shaderRoundingModeRTEFloat32.
- Fix vkGetDeviceBufferMemoryRequirements pointer not passed to the Vulkan
  Memory Allocator.

Stylistic changes:
- Move all device extensions, properties and features to one structure,
  especially simplifying portability subset feature checks, and also making
  it easier to request new extension functionality in the future.
- Remove extension suffixes from usage of promoted extensions.
2024-05-04 22:47:14 +03:00
Triang3l
f87c6afdeb [Vulkan] Update headers to 1.3.278 2024-05-04 19:59:28 +03:00
Triang3l
9ebe25fd77 [GPU] Declare unused register fields explicitly 2024-05-02 23:31:13 +03:00
Gliniak
f6b5424a9f [VFS] Fixed invalid month decoding in decode_fat_timestamp 2023-09-14 12:32:51 +03:00
Gliniak
0f331b5313 [Testing] Added test project for vfs
- Added test case for: decode_fat_timestamp
- Changed location of: decode_fat_timestamp
2023-09-14 12:32:51 +03:00
Gliniak
c5e6352c34 [CPU] Added constant propagation pass for: OPCODE_AND_NOT 2023-07-27 23:41:45 +03:00
Adriano Martins
1887ea0795 [Base] Add missing #include <cstdint> to utf8.cc 2023-07-27 13:02:54 +03:00
Gliniak
00aba94b98 [NET] NetDll___WSAFDIsSet: Fixed incorrect endianness of fd_count
Plus: limit it to 64 entries
Thanks to Bo98 for pointing that out
2023-06-09 19:47:56 -05:00
Roy Stewart
07e81fe172 [Base] Filter out relative directories on linux 2023-06-09 19:47:28 -05:00
Roy Stewart
41c423109f [Base] Set the path for posix file info 2023-06-09 19:43:49 -05:00
Adrian
4a3b04d4ee [XAM] Implemented XamGetCurrentTitleId 2023-06-09 19:43:15 -05:00
Gliniak
858af5ae75 [XAM] xeXamContentCreate - Disposition cleanup 2023-06-09 19:42:48 -05:00
Gliniak
e110527bfe [Base] ListFiles: Prevent leakage of file descriptors 2023-06-09 19:41:27 -05:00
Wunkolo
6ee2e3718f [x64] Add AVX512 optimizations for OPCODE_VECTOR_COMPARE_UGT(Integer)
AVX512 has native unsigned integer comparison instructions, removing
the need to XOR the most-significant bit with a constant in memory to
use the signed comparison instructions. These instructions only write to
a k-mask register though, and need an additional call to `vpmovm2*` to
turn the mask register into a vector mask register.

As of Icelake:
`vpcmpu*` is all L3/T1
`vpmovm2d` is L1/T0.33
`vpmovm2{b,w}` is L3/T0.33

As of Zen4:
`vpcmpu*` is all L3/T0.50
`vpmovm2*` is all L1/T0.25
2023-05-29 14:57:09 -05:00
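The XOR trick that 6ee2e3718f avoids can be illustrated in scalar form. This is only a sketch of the pre-AVX512 idea on a single 32-bit lane, not the emitted x64 code; `UnsignedGreaterViaSigned` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar illustration: flipping the most-significant bit of
// both operands maps unsigned order onto signed order, so a signed compare
// yields the unsigned result. The uint32_t -> int32_t conversion relies on
// two's-complement wrapping (guaranteed since C++20, and the behavior of
// mainstream compilers before that).
inline bool UnsignedGreaterViaSigned(uint32_t a, uint32_t b) {
  const uint32_t kSignBit = 0x80000000u;
  return static_cast<int32_t>(a ^ kSignBit) >
         static_cast<int32_t>(b ^ kSignBit);
}

// Usage check against the native unsigned comparison.
inline void UnsignedGreaterExample() {
  assert(UnsignedGreaterViaSigned(0xFFFFFFFFu, 1u) == (0xFFFFFFFFu > 1u));
  assert(UnsignedGreaterViaSigned(1u, 0xFFFFFFFFu) == (1u > 0xFFFFFFFFu));
}
```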
Wunkolo
121bf93cbe [PPC] Implement vsubcuw
Other half of #2125. I don't know of any title that utilizes this instruction, but I went ahead and implemented it for completeness.

Verified the implementation with `instr__gen_vsubcuw` from #1348. Can be grabbed with:
```
git checkout origin/gen_tests -- src\xenia\cpu\ppc\testing\*vsubcuw.s
```
2023-05-29 14:56:12 -05:00
Wunkolo
93b77fb775 [PPC] Implement vaddcuw
I don't know of any title that utilizes this instruction, but I went
ahead and implemented it for completeness.

Verified the implementation with `instr__gen_vaddcuw` from #1348. Can be
grabbed with:
```
git checkout origin/gen_tests -- src\xenia\cpu\ppc\testing\*vaddcuw.s
```
2023-05-29 14:56:00 -05:00
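For reference, a scalar sketch of what vaddcuw and its vsubcuw counterpart above compute per 32-bit word, assuming the usual AltiVec definitions (carry-out of `a + b`, and carry-out of `a + ~b + 1`, respectively). The function names and the `Vec4u` type are hypothetical, and this is not the Xenia implementation:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4u = std::array<uint32_t, 4>;  // Hypothetical 4 x 32-bit vector.

// Per word: 1 if a + b overflows 32 bits, else 0.
inline Vec4u AddCarryOutWords(const Vec4u& a, const Vec4u& b) {
  Vec4u d{};
  for (std::size_t i = 0; i < 4; ++i) {
    const uint64_t sum = uint64_t(a[i]) + uint64_t(b[i]);
    d[i] = uint32_t(sum >> 32);
  }
  return d;
}

// Per word: carry-out of a + ~b + 1, which is 1 when a >= b (no borrow).
inline Vec4u SubCarryOutWords(const Vec4u& a, const Vec4u& b) {
  Vec4u d{};
  for (std::size_t i = 0; i < 4; ++i) {
    const uint64_t sum = uint64_t(a[i]) + uint64_t(uint32_t(~b[i])) + 1u;
    d[i] = uint32_t(sum >> 32);
  }
  return d;
}

// Usage check on a few hand-computed words.
inline void CarryOutExample() {
  const Vec4u a = {0xFFFFFFFFu, 1u, 0u, 3u};
  const Vec4u b = {1u, 1u, 0u, 5u};
  const Vec4u add = AddCarryOutWords(a, b);
  const Vec4u sub = SubCarryOutWords(a, b);
  assert(add[0] == 1 && add[1] == 0 && add[2] == 0 && add[3] == 0);
  assert(sub[0] == 1 && sub[1] == 1 && sub[2] == 1 && sub[3] == 0);
}
```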
Triang3l
ed64e3072b [GPU] Remove implicit bool cast in memexport checks 2023-05-05 21:38:45 +03:00
Triang3l
0e81293b02 [GPU] Remove a dangerous comment about break after exece [ci skip]
There can be jumps across an exece, so the code beyond it may still be
executed.
2023-05-05 21:32:02 +03:00
Triang3l
53f98d1fe6 [GPU/D3D12] Memexport from anywhere in control flow + 8/16bpp memexport
There's no limit on the number of memory exports in a shader on the real
Xenos, and exports can be done anywhere, including in loops. Now, instead
of deferring the exports to the end of the shader, and assuming that export
allocs are executed only once, Xenia flushes exports when it reaches an
alloc or the end of the shader, or when a pixel with outstanding exports is
killed. (On Xenos, memory exports are terminated by allocs as well as by
individual ALU instructions with `serialize`; the latter case is not handled
for simplicity, as it's only truly mandatory to flush memory exports before
starting a new one.)

To know which eM# registers need to be flushed to the memory, Xenia
traverses the successors of each exec potentially writing any eM#, and
records that certain eM# registers might have been written before each
reached control flow instruction, until a flush point or the end of the
shader is reached.

Also, some games export to sub-32bpp formats. These are now supported via
atomic AND clearing the bits of the dword to replace followed by an atomic
OR inserting the new byte/short.
2023-05-05 21:32:02 +03:00
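The sub-32bpp store described at the end of 53f98d1fe6 happens in shader code, but the same AND-then-OR idea can be sketched on the CPU with `std::atomic`. `StoreByteAtomic` is a hypothetical name, and the relaxed memory order is an assumption made for the illustration:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical CPU-side illustration: replaces one byte of a 32-bit word
// using only dword-sized atomics. Bytes owned by other writers are left
// untouched by both steps.
inline void StoreByteAtomic(std::atomic<uint32_t>& dword, unsigned byte_index,
                            uint8_t value) {
  assert(byte_index < 4);
  const unsigned shift = byte_index * 8;
  // Step 1: atomic AND clears the bits of the dword that will be replaced.
  dword.fetch_and(~(uint32_t(0xFFu) << shift), std::memory_order_relaxed);
  // Step 2: atomic OR inserts the new byte.
  dword.fetch_or(uint32_t(value) << shift, std::memory_order_relaxed);
}
```

A 16-bit store would follow the same pattern with a `0xFFFF` mask. Note that the two steps are not atomic as a pair, so two writers targeting the same byte can still interleave; only writers to different bytes of the same dword are protected from each other.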
Triang3l
8aaa6f1f7d [SPIR-V] Wrap 4-operand ops and 1-3-operand GLSL std calls 2023-04-19 21:44:24 +03:00
Triang3l
19d56001d2 [SPIR-V] Wrap NoContraction operations 2023-04-19 11:53:45 +03:00
Triang3l
78f1d55a36 [SPIR-V] Use Builder createSelectionMerge directly 2023-04-19 11:11:28 +03:00
Triang3l
64d2a80f79 [SPIR-V] Cleanup ALU emulation conditionals 2023-04-19 10:35:09 +03:00
Triang3l
eede38ff63 [SPIR-V] Remove more vec2-4 reserve calls 2023-04-18 22:05:02 +03:00
Triang3l
887fda55c2 [SPIR-V] Remove temp reserve for 4 or less elements 2023-04-13 22:43:44 +03:00
Triang3l
75d805245d [DXBC] discard pixels from kill with ROV instead of returning
Keep the current lane active as it may be needed for derivatives.
2023-04-09 20:13:22 +03:00
Triang3l
88c645d818 [D3D12] Don't use emit_then_cut due to RDNA 3 crash 2023-04-09 18:07:44 +03:00
Triang3l
baa2ff78d8 [Vulkan] Add missing stencil reference unpack in RT transfer + formatting fix 2023-03-30 22:40:40 +03:00
Triang3l
c238d8af55 [Vulkan] Fix FragStencilRef store type 2023-03-30 22:28:56 +03:00
Wunkolo
f357f26eae [Build] Add parallel PPC test generation
Utilizes `multiprocessing` to allow multiple PowerPC assembly tests
to be generated in parallel.

Some results on my i9-11900k(8c/16t):

Before:
```
Measure-Command {.\xb gentests}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 11
Milliseconds      : 200
Ticks             : 112007585
TotalDays         : 0.000129638408564815
TotalHours        : 0.00311132180555556
TotalMinutes      : 0.186679308333333
TotalSeconds      : 11.2007585
TotalMilliseconds : 11200.7585
```

After:
```
Measure-Command {.\xb gentests}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 5
Milliseconds      : 426
Ticks             : 54265895
TotalDays         : 6.28077488425926E-05
TotalHours        : 0.00150738597222222
TotalMinutes      : 0.0904431583333333
TotalSeconds      : 5.4265895
TotalMilliseconds : 5426.5895
```

This is a speedup of over **2x**!
2023-02-05 20:56:37 -06:00
Shoegzer
4a2f4d9cfe Add include to fix compiling 2023-01-29 21:10:20 +03:00
Gliniak
4e87d1f9d1 [Kernel/Thread] Set TLS slot to 0 while freeing 2023-01-28 17:49:12 -06:00
Wunkolo
e55cb737c1 [x64] Add AVX512 optimization for OPCODE_SELECT(F64) 2022-12-28 14:20:20 -06:00
Wunkolo
ba75a016b4 [x64] Add AVX512 optimization for OPCODE_SELECT(V128)
Uses `vpternlogd` to collapse the bitwise select operation into one
instruction. It still needs a `vmovdqa` instruction, though, since
`vpternlogd` both reads and writes its first argument.
2022-12-28 14:20:20 -06:00
Wunkolo
7c21b327ff [x64] Add x64_util.h
Used to help with generating instruction-specific constants. Currently
used for the ternary-logic constants (`vpternlog*`).
2022-12-28 14:20:20 -06:00
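A sketch of how a ternary-logic immediate for `vpternlog*` can be derived from the truth-table columns of its three operands. The constant and function names here are hypothetical and are not taken from x64_util.h; which operand carries the select mask in Xenia's emitted code is also an assumption:

```cpp
#include <cassert>
#include <cstdint>

// Truth-table columns of the three operands. Bit i of the immediate is the
// result for the input combination i = (a << 2) | (b << 1) | c.
constexpr uint8_t kTernA = 0xF0;  // First operand (also the destination).
constexpr uint8_t kTernB = 0xCC;  // Second operand.
constexpr uint8_t kTernC = 0xAA;  // Third operand.

// Bitwise select "a ? b : c", assuming the mask is the first operand.
constexpr uint8_t kSelectImm = (kTernA & kTernB) | (~kTernA & kTernC);
static_assert(kSelectImm == 0xCA, "select immediate should be 0xCA");

// Scalar reference model of the per-bit table lookup (hypothetical helper).
inline uint32_t TernaryLogic(uint32_t a, uint32_t b, uint32_t c, uint8_t imm) {
  uint32_t result = 0;
  for (unsigned bit = 0; bit < 32; ++bit) {
    const unsigned index = (((a >> bit) & 1u) << 2) |
                           (((b >> bit) & 1u) << 1) | ((c >> bit) & 1u);
    result |= uint32_t((imm >> index) & 1u) << bit;
  }
  return result;
}

// Usage check: the table lookup matches the plain bitwise select.
inline void TernaryLogicExample() {
  const uint32_t mask = 0xFFFF0000u, x = 0x12345678u, y = 0x9ABCDEF0u;
  assert(TernaryLogic(mask, x, y, kSelectImm) == ((mask & x) | (~mask & y)));
}
```

Writing the immediate as an expression over the three column constants keeps the intent readable at the call site instead of a bare magic number.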
Gliniak
eb25fe4f4a [CPU] Increase amount of possible labels used in FinalizationPass
Instead of using decimal notation for labels, let's use hexadecimal.
That will increase the number of possible combinations by a lot.
2022-12-28 14:19:55 -06:00
Joel Linn
9eef64d3fb [SDL2] Print version on startup 2022-12-28 14:19:02 -06:00
Joel Linn
76561d5add [SDL2] Update to version 2.24.2 2022-12-28 14:19:02 -06:00
p01arst0rm
12c8d5348c added fxaa LICENSE file 2022-12-28 14:18:25 -06:00
p01arst0rm
2c1aadd2d2 remove dlmalloc 2022-12-28 14:17:50 -06:00
p01arst0rm
a1bb6cc142 moved vswhere to tools directory 2022-12-28 14:17:24 -06:00