965 Commits

Author SHA1 Message Date
Unknown W. Brackets
dffc333120 softgpu: Avoid thread ordering hazard.
Must run the primitives in the right order.  No shortcutting allowed.
2022-01-13 23:03:42 -08:00
Unknown W. Brackets
970e9c2f51 softgpu: Move threading into BinManager.
This threads much more effectively, across entire prim call.
2022-01-13 22:45:23 -08:00
Unknown W. Brackets
48ef4a18b1 softgpu: Handle scissor/range in BinManager. 2022-01-13 19:07:41 -08:00
Unknown W. Brackets
a0a9b1e89b softgpu: Add class to manage and enqueue for bins.
For now, just forwarding.
2022-01-13 09:26:59 -08:00
Unknown W. Brackets
6839aac109 Debugger: Cache list PC for softgpu tagging.
Still slow, but improved.
2022-01-12 21:23:49 -08:00
Unknown W. Brackets
d962fb35d3 softgpu: Centralize more prim drawing state. 2022-01-12 21:23:49 -08:00
Unknown W. Brackets
d06f17d27b softgpu: Move tex filter setting check to state. 2022-01-11 00:07:24 -08:00
Unknown W. Brackets
75ff3e44e6 softgpu: Move texture addresses to prim state. 2022-01-11 00:00:03 -08:00
Unknown W. Brackets
d5c5e9478e softgpu: Prepare more state per prim call. 2022-01-10 22:12:35 -08:00
Unknown W. Brackets
9ec7d65c49 softgpu: Use func IDs instead of gstate more. 2022-01-10 22:12:35 -08:00
Unknown W. Brackets
d7a82ab7b8 softgpu: Compute func IDs once per batch of verts.
This saves a decent chunk of time, especially when many verts are being
drawn.
2022-01-10 22:12:35 -08:00
Unknown W. Brackets
e57730a97d softgpu: Output normals to GE debugger. 2022-01-09 21:33:45 -08:00
Unknown W. Brackets
b915a82c41 softgpu: Correct decal doubling without alpha. 2022-01-09 12:23:55 -08:00
Unknown W. Brackets
72aa4be879 samplerjit: Skip processing alpha if unused. 2022-01-09 12:23:55 -08:00
Unknown W. Brackets
fe0b3dbd01 samplerjit: Fix alpha for 565 in linear lookup. 2022-01-09 11:08:46 -08:00
Henrik Rydgård
2d7a7fd34e Merge pull request #15288 from unknownbrackets/softgpu-self
softgpu: Draw top left of rectangles first
2022-01-09 08:33:28 +01:00
Unknown W. Brackets
88ef2d1ac1 softgpu: Skip threading when rendering to self.
This will probably always be a problem to thread.
2022-01-08 21:05:08 -08:00
Unknown W. Brackets
6367d5dc8f softgpu: Draw top left of rectangles first.
This helps when things do self-rendering, since this way we won't read
from things we've just written to when scaling down.  See #11623.
2022-01-08 20:53:01 -08:00
Unknown W. Brackets
8a00c2d233 GPU: Allow gcc/clang/icc runtime SSE4 usage.
All our builds before were only using SSE4 in jit...
2022-01-08 17:09:09 -08:00
Henrik Rydgård
eee62849fe Merge pull request #15284 from unknownbrackets/softgpu-opt
Improve softgpu lighting accuracy and speed
2022-01-08 22:05:06 +01:00
Unknown W. Brackets
c7fc448869 softgpu: Use some SSE4 in triangle interpolation. 2022-01-08 11:38:07 -08:00
Unknown W. Brackets
3b1cc0d3b8 softgpu: Limit minX/maxX per line.
Only helps when single-threaded, though.
2022-01-08 10:04:52 -08:00
Unknown W. Brackets
9458610d96 softgpu: Avoid rsqrt path for normals.
In LittleBigPlanet, it's noticeable that the lighting is very off due to
the slight loss of accuracy - possibly due to cutoff or similar.
2022-01-07 23:22:57 -08:00
Unknown W. Brackets
ce8a49b1c1 softgpu: Retain floats in diffuse/specular.
This seems to be a bit more accurate.  Color blending seems correct now,
but the factors and especially pow results are off.

Also, normalize normal to 0, 0, 1, which seems to match results better.
2022-01-06 21:52:31 -08:00
Unknown W. Brackets
bd354164bc softgpu: Cleanup -NAN and diffuse factor. 2022-01-06 21:52:27 -08:00
Unknown W. Brackets
537e357741 softgpu: Correct NAN spotlight exponent/direction. 2022-01-06 21:19:48 -08:00
Unknown W. Brackets
b86bdc9456 softgpu: Correct handling of NAN attenuation. 2022-01-06 21:19:47 -08:00
Unknown W. Brackets
fa80c448ee softgpu: More closely match PSP light rounding. 2022-01-06 21:19:47 -08:00
Unknown W. Brackets
079b67e7ed softgpu: Use common SIMD matrix multiplies. 2022-01-06 21:19:47 -08:00
Unknown W. Brackets
cba2374abd softgpu: Separate calculation of S/T.
We could probably reuse, but we're not right now and it complicates the
logic.
2022-01-06 21:19:47 -08:00
Henrik Rydgård
683289402c Merge pull request #15279 from unknownbrackets/samplerjit-fastpath
softgpu: Correct mirroring in fastpath+nearest
2022-01-05 09:43:20 +01:00
Henrik Rydgård
f82f24a9bb Merge pull request #15280 from unknownbrackets/samplerjit-dxt
Correct some recent regressions in samplerjit
2022-01-05 09:42:30 +01:00
Unknown W. Brackets
0993771104 samplerjit: Fix standard bufw check.
Oops, bufw could be intentionally higher while w is 16 bytes.
2022-01-05 00:11:34 -08:00
Unknown W. Brackets
741a9b0a4d samplerjit: Fix DXT compilation. 2022-01-05 00:00:03 -08:00
Unknown W. Brackets
19998976c7 samplerjit: Correct linear compile failure.
It was resetting to nullptr, because `nearest` was nullptr.
2022-01-04 23:58:07 -08:00
Unknown W. Brackets
e2f8cf8bf2 softgpu: Correct mirroring in fastpath+nearest. 2022-01-04 23:42:31 -08:00
Unknown W. Brackets
d98e5bfc97 softgpu: Improve usage of SSE for lighting.
Gives about a 2% improvement in many places.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
2aa57679fa softjit: Keep mip S/T calc in SIMD.
This is only a tiny bit faster, though.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
a309ed791b softjit: Use RIP access in color/depth off.
Seems to help, though it's small.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
612cc0ab5c softjit: Optimize depth range checks.
This was higher than I expected on the profile.  Not a huge improvement,
but a bit faster.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
961cfcd75c softjit: Add describes here too.
Helpful to aggregate when there are multiple rasterizers.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
26e7768a67 samplerjit: Remove old linear nearest paths.
We only use it for DXT now, so let's not keep the dead code around.
2022-01-02 17:28:52 -08:00
Unknown W. Brackets
5e3bef7e14 samplerjit: Avoid gather if overread could crash.
This should be rare, but a game could easily shove a CLUT4 texture at the
end of VRAM, and then accessing the last index would segfault.
2022-01-02 17:28:52 -08:00
Unknown W. Brackets
7806dfddea samplerjit: Use VPGATHERDD for all types. 2022-01-02 17:19:18 -08:00
Unknown W. Brackets
ce6ea8da11 samplerjit: Apply gather lookup to all CLUT4. 2022-01-02 17:19:18 -08:00
Unknown W. Brackets
22f770c828 samplerjit: Use VPGATHERDD for simple CLUT4 loads.
Planning to expand this to more paths.
2022-01-02 17:19:17 -08:00
Unknown W. Brackets
65c84d5dd5 samplerjit: Avoid a couple more copies in AVX.
From looking at assembly, just trying to keep it small.
2022-01-02 17:01:14 -08:00
Unknown W. Brackets
7594187538 softgpu: Skip sample lookup if masked.
Was hoping making other things faster would make this unnecessary or
worse, but it hasn't seemed to.  This gives a pretty decent improvement in
most places (~4%.)
2022-01-02 13:47:14 -08:00
Unknown W. Brackets
a0fe4d06bf softgpu: Stop specializing on miplevels.
Now that samplerjit is processing mips, it no longer helps.  Just
complexity now.
2022-01-02 13:47:14 -08:00
Unknown W. Brackets
e4673a5fa4 softgpu: Separately profile verts and lighting. 2022-01-02 13:46:11 -08:00
Henrik Rydgård
d3f0af7458 Merge pull request #15273 from unknownbrackets/softjit-bloom
Optimize software renderer handling of common bloom operations
2022-01-02 18:11:07 +01:00
Henrik Rydgård
c07ca2d89d Merge pull request #15272 from unknownbrackets/softgpu-meminfo
softgpu: Add code for tracking GPU writes
2022-01-02 18:09:16 +01:00
Henrik Rydgård
c7062d7063 Merge pull request #15271 from unknownbrackets/samplerjit-color16
samplerjit: Decode colors in parallel
2022-01-02 17:55:46 +01:00
Unknown W. Brackets
a259761262 samplerjit: Use nearest func in fast path too.
This uses the more optimal tex funcs.
2022-01-02 08:48:16 -08:00
Unknown W. Brackets
ba17f538d6 softjit: Avoid const temp registers.
Was trying to make sure register allocation was okay in the worst case.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
e93c709f5c softjit: Correctly poison memory.
Noticed this wasn't breakpoints when reviewing some assembly output.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
745c35f320 softjit: Small bloom optimization.
Another common case, src*dst + dst*0.  Can skip the add.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
355bad666c softjit: Optimize common case bloom blending.
Bloom often uses fixed ONE + ONE, which is a lot less work for us.  And
bloom often runs over and over again on pixels, so saving work is good.
2022-01-02 08:47:04 -08:00
Henrik Rydgård
6fb5d82fe0 Merge pull request #15264 from unknownbrackets/samplerjit-vec
A couple more smaller samplerjit optimizations
2022-01-02 17:32:54 +01:00
Unknown W. Brackets
496545e55c softgpu: Add code for tracking GPU writes.
Unfortunately, it has a pretty noticeable speed impact, even at the basic
"assume everything's written" level.  Compiled off by default, but at
least it's there.

Doesn't account for tests (i.e. alpha test skipping write) so still not
perfectly accurate.
2022-01-02 08:28:30 -08:00
Unknown W. Brackets
0eec4e7e4d samplerjit: Decode colors in parallel.
Not used in a ton of games, but a decent improvement where it is used.
2022-01-02 08:27:55 -08:00
Henrik Rydgård
cb1f26122d Merge pull request #15269 from unknownbrackets/softgpu-opt
softgpu: Reduce interpolation if not needed
2022-01-02 09:47:19 +01:00
Henrik Rydgård
da38c027b5 Merge pull request #15268 from unknownbrackets/samplerjit-nearest
Implement nearest in samplerjit, like linear
2022-01-02 09:46:29 +01:00
Unknown W. Brackets
025ac99f2f softgpu: Reduce interpolation if not needed.
About 3% gain in some areas.
2022-01-01 18:34:04 -08:00
Unknown W. Brackets
7060035303 samplerjit: Implement nearest in jit.
This uses the tex func and similar within jit.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
91c9343e87 samplerjit: Refactor and reuse constant pool.
It's just here to be rip accessible, the fixed values can be output just
once.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
40240be91c samplerjit: Update nearest args, temp disable jit.
This temporarily disables jit for nearest, but refactors to use the new
arg structure.  It now matches linear.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
5f84de7de7 softjit: Small optimizations. 2022-01-01 16:58:04 -08:00
Unknown W. Brackets
06e954fe2a samplerjit: Create a separate fetch func.
This allows nearest to become more similar to linear, where it applies the
texture function.
2022-01-01 16:58:04 -08:00
Unknown W. Brackets
3bc6009158 samplerjit: Refactor sampler ID calculation.
Make it the same as pixel func IDs.
2022-01-01 16:58:04 -08:00
Unknown W. Brackets
d41e42d247 softgpu: Correct off-by-one scissor mask.
Fixes Brave Story in the software renderer.  Was overwriting display list
data in the stride gap.
2022-01-01 16:42:36 -08:00
Unknown W. Brackets
b35ca3d472 softgpu: Cleanup min/max tri range handling.
The previous code looked like it had off-by-one errors.  This is simpler.
2022-01-01 16:42:36 -08:00
Unknown W. Brackets
e82fd3bd33 GPU: Avoid spline crashes on bad data.
If we get 0 prims, we can generate confusing index bounds and go out of
bounds.  Similarly, if we get a crazy number of control points and fail to
allocate, we can crash.
2022-01-01 16:40:59 -08:00
Unknown W. Brackets
12405709f0 softgpu: Skip processing scissored triangles.
If only one side was scissored (common), we might even put it on a thread,
which ended up as a lot of overhead.  Gives 3-4% improvement in some
places.
2022-01-01 16:40:34 -08:00
Unknown W. Brackets
6aec68aa5c samplerjit: Correct wrong bufw at mip levels.
Oops, was always using the base bufw.
2022-01-01 16:40:02 -08:00
Unknown W. Brackets
dbb015f427 samplerjit: Oops, fix Linux mipmap handling. 2022-01-01 16:40:02 -08:00
Unknown W. Brackets
8c31f1bb38 softjit: Fix regcache error when clearing.
Happens for non-through clears.
2022-01-01 16:40:01 -08:00
Unknown W. Brackets
8ea67b571b samplerjit: Tiny dependency optimizations.
This had a small but measurable impact (~0.3%.)
2021-12-31 08:11:57 -08:00
Unknown W. Brackets
fc3688d273 samplerjit: Small AVX optimization to modulate.
Only gives about 0.5% but it's still something.
2021-12-31 08:10:04 -08:00
Henrik Rydgård
244b0a86f6 Merge pull request #15262 from unknownbrackets/samplerjit-vec
samplerjit: Use SSSE3/SSE4 in linear filtering
2021-12-31 09:29:59 +01:00
Unknown W. Brackets
33e9841a4a softgpu: Skip zero size triangles.
These were drawing before, incorrectly, which caused artifacts.
Noticeable in Blade Dancer.
2021-12-31 00:20:12 -08:00
Unknown W. Brackets
1addf84e90 samplerjit: Use SSSE3/SSE4 in linear filtering. 2021-12-30 23:22:56 -08:00
Unknown W. Brackets
147b81d6f7 x64jit: Add AVX/AVX2 encodings.
Also fix the FMA double ones, which were passing W wrongly.
2021-12-29 19:46:26 -08:00
Unknown W. Brackets
4bd94a4e5e samplerjit: Pass funcs as an argument.
Seeing computing the ID in some profiles, so want to avoid computing per
thread/invocation.
2021-12-29 07:11:53 -08:00
Unknown W. Brackets
28cfbe0e5a samplerjit: Add an alternate profiling method.
This is more useful to group common operations together for profiling.
2021-12-29 07:11:39 -08:00
Unknown W. Brackets
3aedea89eb samplerjit: Correct level lookup offset. 2021-12-29 07:09:36 -08:00
Unknown W. Brackets
bf06342f9d samplerjit: Minor SSE4 optimizations.
These seem to be a bit faster.
2021-12-29 07:07:35 -08:00
Unknown W. Brackets
631706a8ba samplerjit: Set stackArgPos_ early.
Unfortunately, this has to match the value set lower...
2021-12-28 20:21:21 -08:00
Unknown W. Brackets
74eb450e76 samplerjit: Move texture function into jit.
Could do this also for nearest, might end up with a third set of functions
there for a direct sample lookup (for debug funcs.)
2021-12-28 17:52:17 -08:00
Unknown W. Brackets
940e6bb1d7 samplerjit: Lookup both mip tex values. 2021-12-28 16:22:54 -08:00
Unknown W. Brackets
6b55d328e5 samplerjit: Use regcache for linear filtering.
This makes it easier to reuse for mipmap filtering.
2021-12-28 15:37:25 -08:00
Unknown W. Brackets
cdf14c8579 samplerjit: Calculate mip level U/V/offsets.
Not actually doing the sampling for the second mip level in the single jit
pass yet, but close.
2021-12-28 14:12:58 -08:00
Unknown W. Brackets
a4558a5736 samplerjit: Take texptr/bufw as arrays.
Prep for moving mip map sampling into linear.
2021-12-28 12:04:16 -08:00
Unknown W. Brackets
4864850b3b samplerjit: Handle mipmap width/height in S/T calc. 2021-12-28 11:29:29 -08:00
Unknown W. Brackets
a84accf713 samplerjit: Move S/T calculation into jit.
Gives a pretty decent 5-10% improvement in many places.
2021-12-28 09:58:23 -08:00
Unknown W. Brackets
476dfdf731 samplerjit: Add more bits for S/T, skip multiply.
For now, we're not using those other bits yet.
2021-12-27 18:24:37 -08:00
Unknown W. Brackets
9cc0883d53 softgpu: Correct non-SSE T clamp. 2021-12-27 15:31:37 -08:00
Unknown W. Brackets
39d5b1c221 softgpu: Reduce mipmap fraction to 4 bits.
For CONST (and SLOPE with flat w), this produces accurate values.
SLOPE is still wrong in its handling of w, and AUTO seems to calculate
using a different and less accurate ramp.  But they both produce values
with 16 steps, in any case.
2021-12-27 11:37:33 -08:00
Unknown W. Brackets
d6b6ef4cb1 softgpu: Correct nearest filtering too.
Turns out to have the same behavior as linear, when it comes to the
subpixel offset.
2021-12-27 11:37:33 -08:00
Unknown W. Brackets
1dfaea9062 softgpu: Remove no longer possible report.
Also, it's known how this behaves, now.
2021-12-27 11:37:33 -08:00
Unknown W. Brackets
75f105f84b softgpu: Make linear filtering more accurate.
This matches tests for various u/v offsets and x/y subpixel offsets.
Mipmaps are probably still wrong.
2021-12-27 11:37:32 -08:00
Unknown W. Brackets
3cd19b02ac samplerjit: Handle unswizzled offsets too. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
820361f34b samplerjit: Calculate texel byte offset as vector. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
4d6a2f3919 samplerjit: Blend linear using integers. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
6f4e735757 samplerjit: Accumulate results in an XMM. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
b00a66e34c samplerjit: Pass u/v coords as vector. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
ce3e29a649 softjit: Fix a function arg template warning.
We're just ignoring it because it's a false positive in this case.
2021-12-11 10:45:27 -08:00
Unknown W. Brackets
0d4ec5ca20 softjit: Fix an enum type comparison error.
Same values, though, so didn't matter.
2021-12-11 10:45:27 -08:00
Henrik Rydgård
818f33d979 Merge pull request #15225 from unknownbrackets/softjit-cond-fix
softjit: Throw away regs allocated in conditionals
2021-12-11 09:30:43 +01:00
Unknown W. Brackets
5593b8ff64 softjit: Skip a common case CMP. 2021-12-11 00:06:45 -08:00
Unknown W. Brackets
d35ef352c3 softjit: Throw away regs allocated in conditionals.
If this happens, the register no longer has a deterministic value.
2021-12-11 00:06:14 -08:00
Unknown W. Brackets
b3cd135000 samplerjit: Fix DXT1/DXT5 register releasing.
Oops, broke this while refactoring.
2021-12-09 08:17:29 -08:00
Unknown W. Brackets
3180e6c043 softgpu: Correct alpha on add + invalid texfuncs. 2021-12-05 16:28:37 -08:00
Unknown W. Brackets
325a1f75aa softgpu: Match texenv blend texfunc accurately. 2021-12-05 16:09:26 -08:00
Unknown W. Brackets
0b6e7c421f softgpu: Make decal tex func more accurate.
Tested for all values of A * B + 0 * (255 - B), as well as A * 127 + B *
(255 - 127), and matches accurately.  Spot checked other values, but not
exhaustively.
2021-12-05 13:34:19 -08:00
Unknown W. Brackets
154bb53744 softgpu: Correct accuracy on fast path modulate. 2021-12-05 13:10:18 -08:00
Unknown W. Brackets
73460f7461 softgpu: Correct accuracy of MODULATE texfunc.
This matches hardware tests for every value of A * B.
Interesting that it's a different formula than alpha blend.
2021-12-05 12:06:52 -08:00
Unknown W. Brackets
891fa8c613 softgpu: Template away uncommon mip usage.
Improves general case about 10%.
2021-12-04 15:45:06 -08:00
Unknown W. Brackets
48e9404419 softgpu: Remove useless switch by UV gen mode.
They're all handled earlier now, and the switch is on a value & 3, so the
default wasn't even possible.
2021-12-04 15:45:06 -08:00
Unknown W. Brackets
ff94974df9 softgpu: Avoid texlevel check when maxlevel is 0. 2021-12-04 15:45:06 -08:00
Unknown W. Brackets
823c4adb15 softgpu: Keep arguments in vectors for sampling. 2021-12-04 15:45:06 -08:00
Unknown W. Brackets
d7c25b3e7c samplerjit: Refactor nearest using reg cache. 2021-12-04 13:04:53 -08:00
Unknown W. Brackets
4aa5bee14c softjit: Make it an error to unlock a temp.
Also fix some register usage in logic ops.
2021-12-01 21:50:02 -08:00
Unknown W. Brackets
75a918f96f softjit: Get rid of pointless AGE00 tests. 2021-12-01 21:44:10 -08:00
Unknown W. Brackets
f47fb7e14e softjit: Normalize some stencil test patterns. 2021-12-01 21:43:52 -08:00
Unknown W. Brackets
ba69e39256 softjit: Avoid tests for greater than 0.
They take more instructions, and can be somewhat common.
2021-12-01 21:40:10 -08:00
Unknown W. Brackets
aec41b34d6 softjit: Reduce ditherMatrix to 8-bit.
Oops, not sure why I made it 16 bit.
2021-12-01 21:39:29 -08:00
Unknown W. Brackets
1c5615624a softjit: Oops, correct allocation typo.
Decided to leave these for paired operations.
2021-12-01 21:37:55 -08:00
Unknown W. Brackets
bfe82e417d softjit: Fix locked stencil reg. 2021-11-28 20:26:01 -08:00
Unknown W. Brackets
99c213f244 softjit: Centralize argument register allocation. 2021-11-28 15:53:24 -08:00
Unknown W. Brackets
7aea6d2ab0 softjit: Fix fog typo causing locking bug. 2021-11-28 12:26:23 -08:00
Unknown W. Brackets
9653c33d9c softjit: Fix PixelFuncID arg on non-Windows x64.
Oops, this is of course not put on the stack, it's in R8.
2021-11-28 08:54:36 -08:00
Unknown W. Brackets
2d8fdd8cf4 Math3D: Allow construction from NEON vectors.
This makes it match SSE and easier to keep things generic.  Will impact
alignment of non-packed Vec2/Vec3.
2021-11-28 08:24:53 -08:00
Unknown W. Brackets
96a7554053 softjit: Move common types to reg cache header.
This makes it easier to use vectors elsewhere.
2021-11-28 08:03:15 -08:00
Unknown W. Brackets
3d5bced296 softjit: Rename reg cache so it can be reused.
Intentionally just the name changes in this commit.
2021-11-28 08:03:15 -08:00
Unknown W. Brackets
4703b6cb56 softjit: Cleanup, add other arch types to regcache. 2021-11-28 08:03:15 -08:00
Unknown W. Brackets
c1882fa1c0 softjit: Disallow use of register after unlock. 2021-11-28 08:03:14 -08:00
Unknown W. Brackets
2f039abd13 softjit: Simplify regcache usage as purpose only.
Dealing with types was annoying, and this helps validate the right
register is released.
2021-11-28 08:03:14 -08:00
Unknown W. Brackets
722c04c5e2 samplerjit: Allow disabling linear too, oops. 2021-11-28 08:03:14 -08:00
Unknown W. Brackets
cc099c73f1 softjit: Decide stack offset on compile.
This makes it easier to compile different entries or push regs.
2021-11-28 08:03:14 -08:00
Unknown W. Brackets
e1ed49a3e4 softjit: Ensure all regs are released. 2021-11-28 08:03:14 -08:00
Unknown W. Brackets
d53e13b862 softjit: Manage args in the register cache. 2021-11-28 08:03:13 -08:00
Unknown W. Brackets
6fbcf67093 softjit: Fix disabled cache. 2021-11-27 11:32:47 -08:00
Unknown W. Brackets
1cb48a7bd2 softjit: Reduce jit pool size a bit. 2021-11-26 10:30:00 -08:00
Unknown W. Brackets
1f9dc3a568 softjit: Precalculate write mask and dither.
This is slightly abusing PixelFuncID, but the intent is to provide some
memory that's easily accessible from the jit func, but still associated
with that calculation (i.e. not global.)
2021-11-26 10:12:54 -08:00
Unknown W. Brackets
4e6a5ce760 softjit: Log any failed compiles. 2021-11-26 09:30:49 -08:00
Unknown W. Brackets
446eec0dff softjit: Keep color 16-bit when useful.
Reuse it expanded where we can, in case of dither+fog+blend, etc.
2021-11-26 09:30:48 -08:00
Unknown W. Brackets
c62457bb33 softjit: Optimize common blend inverse alpha case. 2021-11-26 09:30:48 -08:00
Unknown W. Brackets
1fa4e6ba2c softjit: Add alpha blending factors. 2021-11-26 09:30:48 -08:00
Unknown W. Brackets
bc8d5ad372 softjit: Cache zero vector to avoid recreating. 2021-11-26 09:30:48 -08:00
Unknown W. Brackets
a07017dbb0 softjit: Prefer easier to refill regs. 2021-11-26 09:30:47 -08:00
Unknown W. Brackets
932481d3cd softjit: Minor tweak to reg order for XCHG.
It's easier to use it in these places, but seems it stalls longer on the
dest reg.
2021-11-26 09:30:47 -08:00
Unknown W. Brackets
7f167c3660 softjit: Implement min/max/absdiff blending.
Alpha not yet implemented.
2021-11-26 09:30:47 -08:00
Unknown W. Brackets
771d459025 softjit: Use SSE4.1 for fog and dither a bit. 2021-11-26 08:42:17 -08:00
Unknown W. Brackets
cf888257ab softjit: Fix dithering bug. 2021-11-26 08:21:15 -08:00
Unknown W. Brackets
3f3e0ea8cf softjit: Optimize typical alpha/depth test.
Messed with SSE4 then realized there's no point, just use SHR.
2021-11-26 08:21:14 -08:00
Unknown W. Brackets
6644c4225c softjit: Apply logic ops. 2021-11-26 08:21:14 -08:00
Unknown W. Brackets
961273fcf5 softjit: Apply color write mask. 2021-11-26 08:21:14 -08:00
Unknown W. Brackets
a49a189962 softjit: Refactor color conv to dedicated funcs.
Will use this for masking too.
2021-11-26 08:21:14 -08:00
Unknown W. Brackets
2b4b4ae064 softjit: Add config setting to enable/disable.
Also use it for samplerjit.
2021-11-26 08:21:14 -08:00
Unknown W. Brackets
edb21b57bb softjit: Initial color write.
At this point, it's used in some areas in some games.
Alpha blending is the main unimplemented path, then logic/masking.
2021-11-26 08:21:13 -08:00
Unknown W. Brackets
0e63b357b3 softjit: Add dithering. 2021-11-26 08:21:13 -08:00
Unknown W. Brackets
bd99448863 softjit: Keep x and y args for dither.
But let's still special case the 512 path, since it's so common.
2021-11-26 08:21:13 -08:00
Unknown W. Brackets
5ee4bdbe05 softjit: Depth and stencil testing. 2021-11-26 08:21:13 -08:00
Unknown W. Brackets
f3f32cebeb softjit: Optimize some imm sizes. 2021-11-26 08:21:13 -08:00
Unknown W. Brackets
2423285831 softjit: Add helpers to get framebuf offsets. 2021-11-26 08:21:12 -08:00
Unknown W. Brackets
f8819308ff softjit: Add levels of register locking.
Locking also in helpers, so need to nest locks.
2021-11-26 08:21:12 -08:00
Unknown W. Brackets
1e00a3b842 softjit: Add color test. 2021-11-26 08:21:12 -08:00
Unknown W. Brackets
14d322956a softjit: Add alpha test. 2021-11-26 08:21:12 -08:00
Unknown W. Brackets
d9f7b9cca2 softjit: Initial depthrange, fog.
Not really tested, just filling out parts.
2021-11-26 08:21:12 -08:00
Unknown W. Brackets
9fed7ea732 softjit: Add register cache for softjit. 2021-11-26 08:21:11 -08:00
Unknown W. Brackets
91787e63d9 softjit: Switch to the __vectorcall convention. 2021-11-26 08:21:11 -08:00
Unknown W. Brackets
ae3299ea04 softjit: Add stubbed DrawPixel for x64. 2021-11-26 08:21:11 -08:00
Unknown W. Brackets
ce5ae95854 softgpu: Correct alpha blend subtract on negative.
Oops, we need to subtract signed, but then clamp to unsigned.
2021-11-25 22:06:48 -08:00
Unknown W. Brackets
dad85b97f1 softgpu: Use KEEP for any invalid stencil ops.
This just keeps the ID more consistent.
2021-11-25 21:02:20 -08:00
Unknown W. Brackets
d4bf7ea392 softgpu: Disable alpha blend for invalid equations. 2021-11-25 19:23:41 -08:00
Unknown W. Brackets
35444b3051 softgpu: Accurately alpha blend. 2021-11-25 19:23:41 -08:00
Unknown W. Brackets
2acf7f4edf softgpu: Use 0 alpha for 565 alpha blending.
We were previously blending as 0xFF.
2021-11-25 19:23:40 -08:00
Unknown W. Brackets
2ef7dd6b03 softgpu: Correct tagging of vertexjit. 2021-11-25 19:21:56 -08:00
Unknown W. Brackets
73de8db996 softgpu: Fix stencil DECR on 5551. 2021-11-25 19:21:56 -08:00
Unknown W. Brackets
53c6a3933d softgpu: Use ALWAYS for alpha/depth test in clear. 2021-11-25 19:21:55 -08:00
Unknown W. Brackets
876c8cd368 softgpu: Fix PixelFuncID size.
Oops, can't use unions in bitfields.  Also improve typesafety.
2021-11-21 09:40:13 -08:00
Unknown W. Brackets
28bc91bd79 softgpu: Add func to tersely name pixel funcs. 2021-11-21 08:23:32 -08:00
Unknown W. Brackets
f8bc6e5b9e softgpu: Template draw pixel on fb format.
This introduces a small 5-10% perf improvement.
2021-11-21 08:23:32 -08:00
Unknown W. Brackets
09dc38080a softgpu: Move draw pixel code to separate file.
This separates things better anyway.  No major perf impact.
2021-11-21 08:23:32 -08:00
Henrik Rydgård
824805ec1e Merge pull request #15154 from unknownbrackets/softjit
Use a pixel func ID in software rendering
2021-11-21 10:50:06 +01:00
Unknown W. Brackets
e2f0713cc2 softgpu: Clamp and round fog by mantissa bits.
This matches hardware calculated fog values much better.
2021-11-20 20:54:52 -08:00
Unknown W. Brackets
9abf2a4725 softgpu: Confirm mask doesn't hit stencil REPLACE. 2021-11-20 18:53:51 -08:00
Unknown W. Brackets
aa3786ed21 softgpu: Force off alpha blend if uselessly on.
This is a simple optimization to prevent some work games sometimes waste.
2021-11-20 15:27:04 -08:00
Unknown W. Brackets
26378f9c89 softgpu: Specialize sprite based on pixel func ID. 2021-11-20 15:27:04 -08:00
Unknown W. Brackets
f7a31c992d softgpu: Use pixel func ID to draw pixels.
This just reduces reliance on gstate directly, and should help keep things
consistent.
2021-11-20 15:27:04 -08:00
Unknown W. Brackets
953200c995 softgpu: Add func to calculate pixel func ID.
This normalizes some things, and eventually can be used for a jit key.
2021-11-20 15:27:04 -08:00
Unknown W. Brackets
b6bdd69572 softgpu: Clear by dividing out subpixel first. 2021-11-15 06:26:11 -08:00
Unknown W. Brackets
f802c3bc6d softgpu: Add some comments and cleanup. 2021-11-15 06:09:12 -08:00
Unknown W. Brackets
babd63c644 softgpu: Tune thread minimums better.
Darkstalkers seems more sensitive to these than many other games, this
improves performance more.
2021-11-14 18:44:30 -08:00
Unknown W. Brackets
66f635cba0 softgpu: Use threads to apply clears. 2021-11-14 18:31:46 -08:00
Unknown W. Brackets
2ab7499d8d softgpu: Combine sliced rectangles.
This mostly affects clears, and reduces overhead.  Only about 2%
improvement, but it's a small change.
2021-11-14 18:31:46 -08:00
Unknown W. Brackets
0281e2f017 softgpu: Split out rectangle path for combining. 2021-11-14 18:31:46 -08:00
Unknown W. Brackets
9545e3b0e2 softgpu: Fixup range cull for fans and fast path. 2021-11-14 18:31:45 -08:00
Unknown W. Brackets
fb6fadbbb7 softgpu: Fast path rectangles as fans.
Some games, such as Legend of Heroes III, use fans instead of strips.
2021-11-14 18:31:45 -08:00