Commit Graph

8711 Commits

Author SHA1 Message Date
M4xw
99ce3125df [Softgpu] Fix AArch64 oversight 2023-06-27 17:20:11 +02:00
Unknown W. Brackets
fedb92b0e9 softgpu: Ensure early depth test uses SIMD. 2023-06-25 10:18:21 -07:00
Henrik Rydgård
08d578dce9
Merge pull request #17618 from unknownbrackets/softgpu-opt-cast
Optimize casts in softgpu
2023-06-25 07:55:30 +02:00
Henrik Rydgård
ec92675c5e
Merge pull request #17619 from unknownbrackets/softgpu-opt-z
softgpu: Improve Z interpolation SIMD
2023-06-25 07:55:03 +02:00
Unknown W. Brackets
d42642edd2 softgpu: Improve Z interpolation SIMD. 2023-06-24 22:17:11 -07:00
Unknown W. Brackets
15b66ba6c0 softgpu: Make SIMD on x86_32 a bit safer. 2023-06-24 14:49:23 -07:00
Unknown W. Brackets
ae9d34370e softgpu: Move wsum_recip out of the triangle loop.
Seems like a small benefit, but not seeing any issues from this.
Noticed by fp64.
2023-06-24 12:38:05 -07:00
Unknown W. Brackets
795de9b164 softgpu: Use SIMD for more Vec4 casts.
A number of these were falling back to some pretty terrible code.
Thanks to fp64 for noticing.
2023-06-24 12:36:44 -07:00
Unknown W. Brackets
76990aec70
Merge pull request #17609 from fp64/optimize-softgpu-tex-linear
softgpu: Optimize (bi-)linear texture filtering
2023-06-21 23:39:15 -07:00
fp64
159faaa2ec softgpu: Optimize (bi-)linear texture filtering
Seeing as SampleLinearLevel is near the top in the profiler,
optimize actual bilinear filtering using SSE2. Solid win in the
synthetic benchmark (https://godbolt.org/z/fqh3xvbGx, also doubles
as correctness check), no visible difference in actual PPSSPP.
Note: profiler suggests that hot part of SampleLinearLevel is
elsewhere.
2023-06-21 20:02:34 +03:00
Henrik Rydgård
7cc8c6cea4 OSD: Add semantics, move the the OSD state to common (while keeping the renderer in the UI). 2023-06-20 14:40:46 +02:00
Unknown W. Brackets
efd8565ffe
Merge pull request #17592 from fp64/anymask-movemask
Use _mm_movemask_ps for AnyMask
2023-06-17 09:48:09 -07:00
fp64
ab85c46161 Use _mm_movemask_ps for AnyMask
Probably very minor speed improvement, but it's rather neat.
2023-06-17 01:05:02 -04:00
Henrik Rydgård
5b4fa06b00 Revert Dot33 on 32-bit x86 only. See #17584 2023-06-16 23:43:33 +02:00
Henrik Rydgård
6d4e5a0f3e
Merge pull request #17584 from fp64/sse2-dot33
Convert Dot33 to SSE2
2023-06-15 20:08:23 +02:00
Henrik Rydgård
def09bf575 Update the uvscale uniform a bit more conservatively on framebuffer changes
Plus fixes a few minor oversights

Fixes #17581 and possibly #17522
2023-06-15 11:57:30 +02:00
fp64
f0d844a5a3 Convert Dot33 to SSE2
Simpler, lower requirements, and doesn't seem to hurt speed. See #17571.
2023-06-14 22:02:50 -04:00
Henrik Rydgård
6d8069dfd1 Vulkan: Remove the remains of the input attachment experiment
Haven't been using these for a while.

I've come to the conclusion here that I think it's better to try to
deal with the issues using safe workarounds like copies, instead of
relying on features with somewhat iffy driver support that are not
universal across APIs anyway.
2023-06-13 20:46:27 +02:00
Henrik Rydgård
df7bd89b7d Division->shift. since it's a signed integer, gets rid of a cdq instruction. 2023-06-13 11:57:28 +02:00
Henrik Rydgård
0eb3702ecb Then add the early-outs for NEON too. 2023-06-13 11:48:04 +02:00
Henrik Rydgård
9647872a09 Same for NEON, first the refactor... 2023-06-13 11:48:04 +02:00
Henrik Rydgård
77da36c03f SSE addstrip: Add the early-outs. 2023-06-13 11:47:53 +02:00
Henrik Rydgård
39034586a4 SSE: Refactor AddStrip to prepare for early out 2023-06-13 11:45:59 +02:00
Henrik Rydgård
22632b82bd
Merge pull request #17565 from hrydgard/breakout-vcache-vulkan
Vulkan: Breakout the vertex cache logic from DoFlush()
2023-06-13 09:56:52 +02:00
Henrik Rydgård
963ca50ba7
Merge pull request #17567 from hrydgard/uvscale-as-argument
Pass uvScale in as a fourth argument to the vertex decoder
2023-06-13 09:49:31 +02:00
Henrik Rydgård
71a34d4ffc
Merge pull request #17569 from hrydgard/arm64dec-optimize-saved-regs
ARM64: Optimize saved registers in vertex decoder.
2023-06-13 09:49:08 +02:00
Unknown W. Brackets
a7fa37d114 softgpu: Use SIMD more for dot products. 2023-06-12 19:54:32 -07:00
Henrik Rydgård
cdcf3b272e ARM64: Optimize saved registers in vertex decoder.
Simplify away some arrays with unused elements
2023-06-13 00:26:38 +02:00
Henrik Rydgård
4af6fac726 Nop-align the ARM and ARM64 loops too. Many CPUs benefit somewhat from hot loops being 16-byte aligned. 2023-06-13 00:05:48 +02:00
Henrik Rydgård
5ae9c9c64e
Merge pull request #17568 from hrydgard/extract-some-changes
Extract some minor changes from #17497
2023-06-12 23:38:14 +02:00
Henrik Rydgård
c4e44d66b0 x86/x64: Nop-align the main loop of vertex decoder loops 2023-06-12 20:39:39 +02:00
Henrik Rydgård
01cea7f088 Pass uvScale in as an argument to the vertex decoder
Cleaner than overwriting/restoring gstate_c.uvScale in the decoder
loop. A small cleanup I've been wanting to do for ages.

Expecting a negligble perf boost if any.
2023-06-12 20:25:18 +02:00
Henrik Rydgård
880379c15d Extract some minor changes from #17497 2023-06-12 20:20:06 +02:00
Henrik Rydgård
d957f6b0be Of course got the check backwards 2023-06-12 19:45:34 +02:00
Henrik Rydgård
1a1462ecb0 x86 buildfix, warning fix 2023-06-12 17:46:57 +02:00
Henrik Rydgård
c9aa3479a4 Make vertexFullAlpha-in-register work the same as on ARM. 2023-06-12 16:08:14 +02:00
Henrik Rydgård
a164f77f47 VertexDecoderX86 (64-bit only): Avoid a memory access per loop iteration for alpha 2023-06-12 15:58:55 +02:00
Henrik Rydgård
f5516d3248 Actually switch away from XXH to a custom hash, to de-risk 2023-06-12 14:24:20 +02:00
Henrik Rydgård
2f90ec6093 Breakout the vertex caching (just code cleanup) 2023-06-12 13:16:14 +02:00
Henrik Rydgård
468757b93a Add comment about possible UV scale/offset bug. Move loop-max to local. 2023-06-12 13:16:14 +02:00
Henrik Rydgård
d90671e877 Add some comments. 2023-06-12 13:16:13 +02:00
Henrik Rydgård
186b0f105c Simplify the vertex cache ID handling 2023-06-12 13:16:13 +02:00
Henrik Rydgård
53aa2cc596 Apply stencil writemask when clearing properly again, see #17478
Also renames vpAndScissor to vpAndScissor_ for consistency.
2023-06-12 11:49:44 +02:00
Henrik Rydgård
e9e95d23ce VulkanDebug log fix, reduce log spam 2023-05-30 18:32:33 +02:00
Henrik Rydgård
49ecc01556 Fix image leak bug when pausing and we're just displaying a framebuffer in memory 2023-05-30 18:29:50 +02:00
Henrik Rydgård
7c4b9bac90 Cache textures created by MakePixelsTexture and reuse where appropriate. 2023-05-30 14:07:44 +02:00
Henrik Rydgård
ea552bc573 Add a new GPU stat (DrawPixels), kinda heavy since creates textures. GoW does 20-ish per frame. 2023-05-30 10:15:34 +02:00
Henrik Rydgård
f54f905be5 Vulkan: Remove support for other index types than 16-bit.
We don't have any use for them anyway.
2023-05-30 10:15:34 +02:00
Henrik Rydgård
ad8827ae70 Cleanup, address feedback 2023-05-26 10:28:10 +02:00
Henrik Rydgård
5c94a20ecb SoftGPU: implement CheckConfigChanged, have it check postshaders. Fixes #17511. 2023-05-26 09:48:51 +02:00