31484 Commits

Author SHA1 Message Date
Unknown W. Brackets
8a00c2d233 GPU: Allow gcc/clang/icc runtime SSE4 usage.
All our builds before were only using SSE4 in jit...
2022-01-08 17:09:09 -08:00
Henrik Rydgård
eee62849fe Merge pull request #15284 from unknownbrackets/softgpu-opt
Improve softgpu lighting accuracy and speed
2022-01-08 22:05:06 +01:00
Henrik Rydgård
d11357caca Merge pull request #15285 from unknownbrackets/softgpu-xrange
Skip part of row easily outside triangles in softgpu
2022-01-08 22:03:55 +01:00
Unknown W. Brackets
c7fc448869 softgpu: Use some SSE4 in triangle interpolation. 2022-01-08 11:38:07 -08:00
Unknown W. Brackets
3b1cc0d3b8 softgpu: Limit minX/maxX per line.
Only helps when single-threaded, though.
2022-01-08 10:04:52 -08:00
Unknown W. Brackets
9458610d96 softgpu: Avoid rsqrt path for normals.
In LittleBigPlanet, it's noticeable that the lighting is very off due to
the slight loss of accuracy - possibly due to cutoff or similar.
2022-01-07 23:22:57 -08:00
Unknown W. Brackets
43f71884ee softgpu: Clarify internal matrix multiply usage. 2022-01-07 17:53:24 -08:00
Henrik Rydgård
49e7d72f41 Remove QWEmct from credits as requested 2022-01-07 11:11:17 +01:00
Henrik Rydgård
2e1ef5dfe8 Merge pull request #15283 from unknownbrackets/warnings
UI: Fix some sign/size comparison warnings
2022-01-07 09:36:24 +01:00
Unknown W. Brackets
ce8a49b1c1 softgpu: Retain floats in diffuse/specular.
This seems to be a bit more accurate.  Color blending seems correct now,
but the factors and especially pow results are off.

Also, normalize normal to 0, 0, 1, which seems to match results better.
2022-01-06 21:52:31 -08:00
Unknown W. Brackets
bd354164bc softgpu: Cleanup -NAN and diffuse factor. 2022-01-06 21:52:27 -08:00
Unknown W. Brackets
537e357741 softgpu: Correct NAN spotlight exponent/direction. 2022-01-06 21:19:48 -08:00
Unknown W. Brackets
b86bdc9456 softgpu: Correct handling of NAN attenuation. 2022-01-06 21:19:47 -08:00
Unknown W. Brackets
fa80c448ee softgpu: More closely match PSP light rounding. 2022-01-06 21:19:47 -08:00
Unknown W. Brackets
e7d66f2029 softgpu: Reuse SSE/NEON matrix code. 2022-01-06 21:19:47 -08:00
Unknown W. Brackets
079b67e7ed softgpu: Use common SIMD matrix multiplies. 2022-01-06 21:19:47 -08:00
Unknown W. Brackets
cba2374abd softgpu: Separate calculation of S/T.
We could probably reuse, but we're not right now and it complicates the
logic.
2022-01-06 21:19:47 -08:00
Unknown W. Brackets
a397bf811b UI: Fix some sign/size comparison warnings.
Mostly size_t vs int.
2022-01-06 20:40:29 -08:00
Henrik Rydgård
683289402c Merge pull request #15279 from unknownbrackets/samplerjit-fastpath
softgpu: Correct mirroring in fastpath+nearest
2022-01-05 09:43:20 +01:00
Henrik Rydgård
f82f24a9bb Merge pull request #15280 from unknownbrackets/samplerjit-dxt
Correct some recent regressions in samplerjit
2022-01-05 09:42:30 +01:00
Unknown W. Brackets
0993771104 samplerjit: Fix standard bufw check.
Oops, bufw could be intentionally higher while w is 16 bytes.
2022-01-05 00:11:34 -08:00
Unknown W. Brackets
741a9b0a4d samplerjit: Fix DXT compilation. 2022-01-05 00:00:03 -08:00
Unknown W. Brackets
19998976c7 samplerjit: Correct linear compile failure.
It was resetting to nullptr, because `nearest` was nullptr.
2022-01-04 23:58:07 -08:00
Unknown W. Brackets
e2f8cf8bf2 softgpu: Correct mirroring in fastpath+nearest. 2022-01-04 23:42:31 -08:00
Henrik Rydgård
40093634a6 Merge pull request #15277 from unknownbrackets/softjit-opt
Small optimizations to raster and sampler, lighting optimization
2022-01-03 23:29:52 +01:00
Unknown W. Brackets
d98e5bfc97 softgpu: Improve usage of SSE for lighting.
Gives about a 2% improvement in many places.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
2aa57679fa softjit: Keep mip S/T calc in SIMD.
This is only a tiny bit faster, though.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
a309ed791b softjit: Use RIP access in color/depth off.
Seems to help, though it's small.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
612cc0ab5c softjit: Optimize depth range checks.
This was higher than I expected on the profile.  Not a huge improvement,
but a bit faster.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
961cfcd75c softjit: Add describes here too.
Helpful to aggregate when there are multiple rasterizers.
2022-01-03 06:45:10 -08:00
Henrik Rydgård
b2bb0be05d Merge pull request #15275 from unknownbrackets/samplerjit-avx2
Use AVX2 gather for samplerjit
2022-01-03 09:27:44 +01:00
Unknown W. Brackets
26e7768a67 samplerjit: Remove old linear nearest paths.
We only use it for DXT now, so let's not keep the dead code around.
2022-01-02 17:28:52 -08:00
Unknown W. Brackets
5e3bef7e14 samplerjit: Avoid gather if overread could crash.
This should be rare, but a game could easily shove a CLUT4 texture at the
end of VRAM, and then accessing the last index would segfault.
2022-01-02 17:28:52 -08:00
Unknown W. Brackets
7806dfddea samplerjit: Use VPGATHERDD for all types. 2022-01-02 17:19:18 -08:00
Unknown W. Brackets
ce6ea8da11 samplerjit: Apply gather lookup to all CLUT4. 2022-01-02 17:19:18 -08:00
Unknown W. Brackets
22f770c828 samplerjit: Use VPGATHERDD for simple CLUT4 loads.
Planning to expand this to more paths.
2022-01-02 17:19:17 -08:00
Unknown W. Brackets
65c84d5dd5 samplerjit: Avoid a couple more copies in AVX.
From looking at assembly, just trying to keep it small.
2022-01-02 17:01:14 -08:00
Henrik Rydgård
daf9e7020a Merge pull request #15274 from unknownbrackets/softgpu-mask
softgpu: Skip sample lookup if masked
2022-01-02 23:30:51 +01:00
Unknown W. Brackets
7594187538 softgpu: Skip sample lookup if masked.
Was hoping making other things faster would make this unnecessary or
worse, but it hasn't seemed to.  This gives a pretty decent improvement in
most places (~4%.)
2022-01-02 13:47:14 -08:00
Unknown W. Brackets
a0fe4d06bf softgpu: Stop specializing on miplevels.
Now that samplerjit is processing mips, it no longer helps.  Just
complexity now.
2022-01-02 13:47:14 -08:00
Unknown W. Brackets
e4673a5fa4 softgpu: Separately profile verts and lighting. 2022-01-02 13:46:11 -08:00
Henrik Rydgård
d3f0af7458 Merge pull request #15273 from unknownbrackets/softjit-bloom
Optimize software renderer handling of common bloom operations
2022-01-02 18:11:07 +01:00
Henrik Rydgård
c07ca2d89d Merge pull request #15272 from unknownbrackets/softgpu-meminfo
softgpu: Add code for tracking GPU writes
2022-01-02 18:09:16 +01:00
Henrik Rydgård
c7062d7063 Merge pull request #15271 from unknownbrackets/samplerjit-color16
samplerjit: Decode colors in parallel
2022-01-02 17:55:46 +01:00
Unknown W. Brackets
a259761262 samplerjit: Use nearest func in fast path too.
This uses the more optimal tex funcs.
2022-01-02 08:48:16 -08:00
Unknown W. Brackets
ba17f538d6 softjit: Avoid const temp registers.
Was trying to make sure register allocation was okay in the worst case.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
e93c709f5c softjit: Correctly poison memory.
Noticed this wasn't breakpoints when reviewing some assembly output.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
745c35f320 softjit: Small bloom optimization.
Another common case, src*dst + dst*0.  Can skip the add.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
355bad666c softjit: Optimize common case bloom blending.
Bloom often uses fixed ONE + ONE, which is a lot less work for us.  And
bloom often runs over and over again on pixels, so saving work is good.
2022-01-02 08:47:04 -08:00
Henrik Rydgård
6fb5d82fe0 Merge pull request #15264 from unknownbrackets/samplerjit-vec
A couple more smaller samplerjit optimizations
2022-01-02 17:32:54 +01:00