Commit Graph

965 Commits

Author SHA1 Message Date
Henrik Rydgård
d3f0af7458
Merge pull request from unknownbrackets/softjit-bloom
Optimize software renderer handling of common bloom operations
2022-01-02 18:11:07 +01:00
Henrik Rydgård
c07ca2d89d
Merge pull request from unknownbrackets/softgpu-meminfo
softgpu: Add code for tracking GPU writes
2022-01-02 18:09:16 +01:00
Henrik Rydgård
c7062d7063
Merge pull request from unknownbrackets/samplerjit-color16
samplerjit: Decode colors in parallel
2022-01-02 17:55:46 +01:00
Unknown W. Brackets
a259761262 samplerjit: Use nearest func in fast path too.
This uses the more optimal tex funcs.
2022-01-02 08:48:16 -08:00
Unknown W. Brackets
ba17f538d6 softjit: Avoid const temp registers.
Was trying to make sure register allocation was okay in the worst case.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
e93c709f5c softjit: Correctly poison memory.
Noticed this wasn't breakpoints when reviewing some assembly output.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
745c35f320 softjit: Small bloom optimization.
Another common case, src*dst + dst*0.  Can skip the add.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
355bad666c softjit: Optimize common case bloom blending.
Bloom often uses fixed ONE + ONE, which is a lot less work for us.  And
bloom often runs over and over again on pixels, so saving work is good.
2022-01-02 08:47:04 -08:00
Henrik Rydgård
6fb5d82fe0
Merge pull request from unknownbrackets/samplerjit-vec
A couple more smaller samplerjit optimizations
2022-01-02 17:32:54 +01:00
Unknown W. Brackets
496545e55c softgpu: Add code for tracking GPU writes.
Unfortunately, it has a pretty noticeable speed impact, even at the basic
"assume everything's written" level.  Compiled off by default, but at
least it's there.

Doesn't account for tests (i.e. alpha test skipping write) so still not
perfectly accurate.
2022-01-02 08:28:30 -08:00
Unknown W. Brackets
0eec4e7e4d samplerjit: Decode colors in parallel.
Not used in a ton of games, but a decent improvement where it is used.
2022-01-02 08:27:55 -08:00
Henrik Rydgård
cb1f26122d
Merge pull request from unknownbrackets/softgpu-opt
softgpu: Reduce interpolation if not needed
2022-01-02 09:47:19 +01:00
Henrik Rydgård
da38c027b5
Merge pull request from unknownbrackets/samplerjit-nearest
Implement nearest in samplerjit, like linear
2022-01-02 09:46:29 +01:00
Unknown W. Brackets
025ac99f2f softgpu: Reduce interpolation if not needed.
About 3% gain in some areas.
2022-01-01 18:34:04 -08:00
Unknown W. Brackets
7060035303 samplerjit: Implement nearest in jit.
This uses the tex func and similar within jit.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
91c9343e87 samplerjit: Refactor and reuse constant pool.
It's just here to be rip accessible, the fixed values can be output just
once.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
40240be91c samplerjit: Update nearest args, temp disable jit.
This temporarily disables jit for nearest, but refactors to use the new
arg structure.  It now matches linear.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
5f84de7de7 softjit: Small optimizations. 2022-01-01 16:58:04 -08:00
Unknown W. Brackets
06e954fe2a samplerjit: Create a separate fetch func.
This allows nearest to become more similar to linear, where it applies the
texture function.
2022-01-01 16:58:04 -08:00
Unknown W. Brackets
3bc6009158 samplerjit: Refactor sampler ID calculation.
Make it the same as pixel func IDs.
2022-01-01 16:58:04 -08:00
Unknown W. Brackets
d41e42d247 softgpu: Correct off-by-one scissor mask.
Fixes Brave Story in the software renderer.  Was overwriting display list
data in the stride gap.
2022-01-01 16:42:36 -08:00
Unknown W. Brackets
b35ca3d472 softgpu: Cleanup min/max tri range handling.
The previous looked like it had off by one errors.  This is simpler.
2022-01-01 16:42:36 -08:00
Unknown W. Brackets
e82fd3bd33 GPU: Avoid spline crashes on bad data.
If we get 0 prims, we can generate confusing index bounds and go out of
bounds.  Similarly, if we get a crazy number of control points and fail to
allocate, we can crash.
2022-01-01 16:40:59 -08:00
Unknown W. Brackets
12405709f0 softgpu: Skip processing scissored triangles.
If only one side was scissored (common), we might even put it on a thread,
which ended up as a lot of overhead.  Gives 3-4% improvement in some
places.
2022-01-01 16:40:34 -08:00
Unknown W. Brackets
6aec68aa5c samplerjit: Correct wrong bufw at mip levels.
Oops, was always using the base bufw.
2022-01-01 16:40:02 -08:00
Unknown W. Brackets
dbb015f427 samplerjit: Oops, fix Linux mipmap handling. 2022-01-01 16:40:02 -08:00
Unknown W. Brackets
8c31f1bb38 softjit: Fix regcache error when clearing.
Happens for non-through clears.
2022-01-01 16:40:01 -08:00
Unknown W. Brackets
8ea67b571b samplerjit: Tiny dependency optimizations.
This had a small but measurable impact (~0.3%.)
2021-12-31 08:11:57 -08:00
Unknown W. Brackets
fc3688d273 samplerjit: Small AVX optimization to modulate.
Only gives about 0.5% but it's still something.
2021-12-31 08:10:04 -08:00
Henrik Rydgård
244b0a86f6
Merge pull request from unknownbrackets/samplerjit-vec
samplerjit: Use SSSE3/SSE4 in linear filtering
2021-12-31 09:29:59 +01:00
Unknown W. Brackets
33e9841a4a softgpu: Skip zero size triangles.
These were drawing before, incorrectly, which caused artifacts.
Noticeable in Blade Dancer.
2021-12-31 00:20:12 -08:00
Unknown W. Brackets
1addf84e90 samplerjit: Use SSSE3/SSE4 in linear filtering. 2021-12-30 23:22:56 -08:00
Unknown W. Brackets
147b81d6f7 x64jit: Add AVX/AVX2 encodings.
Also fix the FMA double ones, which were passing W wrongly.
2021-12-29 19:46:26 -08:00
Unknown W. Brackets
4bd94a4e5e samplerjit: Pass funcs as an argument.
Seeing computing the ID in some profiles, so want to avoid computing per
thread/invocation.
2021-12-29 07:11:53 -08:00
Unknown W. Brackets
28cfbe0e5a samplerjit: Add an alternate profiling method.
This is more useful to group common operations together for profiling.
2021-12-29 07:11:39 -08:00
Unknown W. Brackets
3aedea89eb samplerjit: Correct level lookup offset. 2021-12-29 07:09:36 -08:00
Unknown W. Brackets
bf06342f9d samplerjit: Minor SSE4 optimizations.
These seem to be a bit faster.
2021-12-29 07:07:35 -08:00
Unknown W. Brackets
631706a8ba samplerjit: Set stackArgPos_ early.
Unfortunately, this has to match the value set lower...
2021-12-28 20:21:21 -08:00
Unknown W. Brackets
74eb450e76 samplerjit: Move texture function into jit.
Could do this also for nearest, might end up with a third set of functions
there for a direct sample lookup (for debug funcs.)
2021-12-28 17:52:17 -08:00
Unknown W. Brackets
940e6bb1d7 samplerjit: Lookup both mip tex values. 2021-12-28 16:22:54 -08:00
Unknown W. Brackets
6b55d328e5 samplerjit: Use regcache for linear filtering.
This makes it easier to reuse for mipmap filtering.
2021-12-28 15:37:25 -08:00
Unknown W. Brackets
cdf14c8579 samplerjit: Calculate mip level U/V/offsets.
Not actually doing the sampling for the second mip level in the single jit
pass yet, but close.
2021-12-28 14:12:58 -08:00
Unknown W. Brackets
a4558a5736 samplerjit: Take texptr/bufw as arrays.
Prep for moving mip map sampling into linear.
2021-12-28 12:04:16 -08:00
Unknown W. Brackets
4864850b3b samplerjit: Handle mipmap width/height in S/T calc. 2021-12-28 11:29:29 -08:00
Unknown W. Brackets
a84accf713 samplerjit: Move S/T calculation into jit.
Gives a pretty decent 5-10% improvement in many places.
2021-12-28 09:58:23 -08:00
Unknown W. Brackets
476dfdf731 samplerjit: Add more bits for S/T, skip multiply.
For now, we're not using those other bits yet.
2021-12-27 18:24:37 -08:00
Unknown W. Brackets
9cc0883d53 softgpu: Correct non-SSE T clamp. 2021-12-27 15:31:37 -08:00
Unknown W. Brackets
39d5b1c221 softgpu: Reduce mipmap fraction to 4 bits.
For CONST (and SLOPE with flat w), this produces accurate values.
SLOPE is still wrong in its handling of w, and AUTO seems to calculate
using a different and less accurate ramp.  But they both produce values
with 16 steps, in any case.
2021-12-27 11:37:33 -08:00
Unknown W. Brackets
d6b6ef4cb1 softgpu: Correct nearest filtering too.
Turns out to have the same behavior as linear, when it comes to the
subpixel offset.
2021-12-27 11:37:33 -08:00
Unknown W. Brackets
1dfaea9062 softgpu: Remove no longer possible report.
Also, it's known how this behaves, now.
2021-12-27 11:37:33 -08:00
Unknown W. Brackets
75f105f84b softgpu: Make linear filtering more accurate.
This matches tests for various u/v offsets and x/y subpixel offsets.
Mipmaps are probably still wrong.
2021-12-27 11:37:32 -08:00
Unknown W. Brackets
3cd19b02ac samplerjit: Handle unswizzled offsets too. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
820361f34b samplerjit: Calculate texel byte offset as vector. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
4d6a2f3919 samplerjit: Blend linear using integers. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
6f4e735757 samplerjit: Accumulate results in an XMM. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
b00a66e34c samplerjit: Pass u/v coords as vector. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
ce3e29a649 softjit: Fix a function arg template warning.
We're just ignoring it because it's a false positive in this case.
2021-12-11 10:45:27 -08:00
Unknown W. Brackets
0d4ec5ca20 softjit: Fix an enum type comparison error.
Same values, though, so didn't matter.
2021-12-11 10:45:27 -08:00
Henrik Rydgård
818f33d979
Merge pull request from unknownbrackets/softjit-cond-fix
softjit: Throw away regs allocated in conditionals
2021-12-11 09:30:43 +01:00
Unknown W. Brackets
5593b8ff64 softjit: Skip a common case CMP. 2021-12-11 00:06:45 -08:00
Unknown W. Brackets
d35ef352c3 softjit: Throw away regs allocated in conditionals.
If this happens, the register no longer has a deterministic value.
2021-12-11 00:06:14 -08:00
Unknown W. Brackets
b3cd135000 samplerjit: Fix DXT1/DXT5 register releasing.
Oops, broke this while refactoring.
2021-12-09 08:17:29 -08:00
Unknown W. Brackets
3180e6c043 softgpu: Correct alpha on add + invalid texfuncs. 2021-12-05 16:28:37 -08:00
Unknown W. Brackets
325a1f75aa softgpu: Match texenv blend texfunc accurately. 2021-12-05 16:09:26 -08:00
Unknown W. Brackets
0b6e7c421f softgpu: Make decal tex func more accurate.
Tested for all values of A * B + 0 * (255 - B), as well as A * 127 + B *
(255 - 127), and matches accurately.  Spot checked other values, but not
exhaustively.
2021-12-05 13:34:19 -08:00
Unknown W. Brackets
154bb53744 softgpu: Correct accuracy on fast path modulate. 2021-12-05 13:10:18 -08:00
Unknown W. Brackets
73460f7461 softgpu: Correct accuracy of MODULATE texfunc.
This matches hardware tests for every value of A * B.
Interesting that it's a different formula than alpha blend.
2021-12-05 12:06:52 -08:00
Unknown W. Brackets
891fa8c613 softgpu: Template away uncommon mip usage.
Improves general case about 10%.
2021-12-04 15:45:06 -08:00
Unknown W. Brackets
48e9404419 softgpu: Remove useless switch by UV gen mode.
They're all handled earlier now, and the switch is on a value & 3, so the
default wasn't even possible.
2021-12-04 15:45:06 -08:00
Unknown W. Brackets
ff94974df9 softgpu: Avoid texlevel check when maxlevel is 0. 2021-12-04 15:45:06 -08:00
Unknown W. Brackets
823c4adb15 softgpu: Keep arguments in vectors for sampling. 2021-12-04 15:45:06 -08:00
Unknown W. Brackets
d7c25b3e7c samplerjit: Refactor nearest using reg cache. 2021-12-04 13:04:53 -08:00
Unknown W. Brackets
4aa5bee14c softjit: Make it an error to unlock a temp.
Also fix some register usage in logic ops.
2021-12-01 21:50:02 -08:00
Unknown W. Brackets
75a918f96f softjit: Get rid of pointless AGE00 tests. 2021-12-01 21:44:10 -08:00
Unknown W. Brackets
f47fb7e14e softjit: Normalize some stencil test patterns. 2021-12-01 21:43:52 -08:00
Unknown W. Brackets
ba69e39256 softjit: Avoid tests for greater than 0.
They take more instructions, and can be somewhat common.
2021-12-01 21:40:10 -08:00
Unknown W. Brackets
aec41b34d6 softjit: Reduce ditherMatrix to 8-bit.
Oops, not sure why I made it 16 bit.
2021-12-01 21:39:29 -08:00
Unknown W. Brackets
1c5615624a softjit: Oops, correct allocation typo.
Decided to leave these for paired operations.
2021-12-01 21:37:55 -08:00
Unknown W. Brackets
bfe82e417d softjit: Fix locked stencil reg. 2021-11-28 20:26:01 -08:00
Unknown W. Brackets
99c213f244 softjit: Centralize argument register allocation. 2021-11-28 15:53:24 -08:00
Unknown W. Brackets
7aea6d2ab0 softjit: Fix fog typo causing locking bug. 2021-11-28 12:26:23 -08:00
Unknown W. Brackets
9653c33d9c softjit: Fix PixelFuncID arg on non-Windows x64.
Oops, this is of course not put on the stack, it's in R8.
2021-11-28 08:54:36 -08:00
Unknown W. Brackets
2d8fdd8cf4 Math3D: Allow construction from NEON vectors.
This makes it match SSE and easier to keep things generic.  Will impact
alignment of non-packed Vec2/Vec3.
2021-11-28 08:24:53 -08:00
Unknown W. Brackets
96a7554053 softjit: Move common types to reg cache header.
This makes it easier to use vectors elsewhere.
2021-11-28 08:03:15 -08:00
Unknown W. Brackets
3d5bced296 softjit: Rename reg cache so it can be reused.
Intentionally just the name changes in this commit.
2021-11-28 08:03:15 -08:00
Unknown W. Brackets
4703b6cb56 softjit: Cleanup, add other arch types to regcache. 2021-11-28 08:03:15 -08:00
Unknown W. Brackets
c1882fa1c0 softjit: Disallow use of register after unlock. 2021-11-28 08:03:14 -08:00
Unknown W. Brackets
2f039abd13 softjit: Simplify regcache usage as purpose only.
Dealing with types was annoying, and this helps validate the right
register is released.
2021-11-28 08:03:14 -08:00
Unknown W. Brackets
722c04c5e2 samplerjit: Allow disabling linear too, oops. 2021-11-28 08:03:14 -08:00
Unknown W. Brackets
cc099c73f1 softjit: Decide stack offset on compile.
This makes it easier to compile different entries or push regs.
2021-11-28 08:03:14 -08:00
Unknown W. Brackets
e1ed49a3e4 softjit: Ensure all regs are released. 2021-11-28 08:03:14 -08:00
Unknown W. Brackets
d53e13b862 softjit: Manage args in the register cache. 2021-11-28 08:03:13 -08:00
Unknown W. Brackets
6fbcf67093 softjit: Fix disabled cache. 2021-11-27 11:32:47 -08:00
Unknown W. Brackets
1cb48a7bd2 softjit: Reduce jit pool size a bit. 2021-11-26 10:30:00 -08:00
Unknown W. Brackets
1f9dc3a568 softjit: Precalculate write mask and dither.
This is slightly abusing PixelFuncID, but the intent is to provide some
memory that's easily accessible from the jit func, but still associated
with that calculation (i.e. not global.)
2021-11-26 10:12:54 -08:00
Unknown W. Brackets
4e6a5ce760 softjit: Log any failed compiles. 2021-11-26 09:30:49 -08:00
Unknown W. Brackets
446eec0dff softjit: Keep color 16-bit when useful.
Reuse it expanded where we can, in case of dither+fog+blend, etc.
2021-11-26 09:30:48 -08:00
Unknown W. Brackets
c62457bb33 softjit: Optimize common blend inverse alpha case. 2021-11-26 09:30:48 -08:00
Unknown W. Brackets
1fa4e6ba2c softjit: Add alpha blending factors. 2021-11-26 09:30:48 -08:00
Unknown W. Brackets
bc8d5ad372 softjit: Cache zero vector to avoid recreating. 2021-11-26 09:30:48 -08:00
Unknown W. Brackets
a07017dbb0 softjit: Prefer easier to refill regs. 2021-11-26 09:30:47 -08:00
Unknown W. Brackets
932481d3cd softjit: Minor tweak to reg order for XCHG.
It's easier to use it in these places, but seems it stalls longer on the
dest reg.
2021-11-26 09:30:47 -08:00
Unknown W. Brackets
7f167c3660 softjit: Implement min/max/absdiff blending.
Alpha not yet implemented.
2021-11-26 09:30:47 -08:00
Unknown W. Brackets
771d459025 softjit: Use SSE4.1 for fog and dither a bit. 2021-11-26 08:42:17 -08:00
Unknown W. Brackets
cf888257ab softjit: Fix dithering bug. 2021-11-26 08:21:15 -08:00
Unknown W. Brackets
3f3e0ea8cf softjit: Optimize typical alpha/depth test.
Messed with SSE4 then realized there's no point, just use SHR.
2021-11-26 08:21:14 -08:00
Unknown W. Brackets
6644c4225c softjit: Apply logic ops. 2021-11-26 08:21:14 -08:00
Unknown W. Brackets
961273fcf5 softjit: Apply color write mask. 2021-11-26 08:21:14 -08:00
Unknown W. Brackets
a49a189962 softjit: Refactor color conv to dedicated funcs.
Will use this for masking too.
2021-11-26 08:21:14 -08:00
Unknown W. Brackets
2b4b4ae064 softjit: Add config setting to enable/disable.
Also use it for samplerjit.
2021-11-26 08:21:14 -08:00
Unknown W. Brackets
edb21b57bb softjit: Initial color write.
At this point, it's used in some areas in some games.
Alpha blending is the main unimplemented path, then logic/masking.
2021-11-26 08:21:13 -08:00
Unknown W. Brackets
0e63b357b3 softjit: Add dithering. 2021-11-26 08:21:13 -08:00
Unknown W. Brackets
bd99448863 softjit: Keep x and y args for dither.
But let's still special case the 512 path, since it's so common.
2021-11-26 08:21:13 -08:00
Unknown W. Brackets
5ee4bdbe05 softjit: Depth and stencil testing. 2021-11-26 08:21:13 -08:00
Unknown W. Brackets
f3f32cebeb softjit: Optimize some imm sizes. 2021-11-26 08:21:13 -08:00
Unknown W. Brackets
2423285831 softjit: Add helpers to get framebuf offsets. 2021-11-26 08:21:12 -08:00
Unknown W. Brackets
f8819308ff softjit: Add levels of register locking.
Locking also in helpers, so need to nest locks.
2021-11-26 08:21:12 -08:00
Unknown W. Brackets
1e00a3b842 softjit: Add color test. 2021-11-26 08:21:12 -08:00
Unknown W. Brackets
14d322956a softjit: Add alpha test. 2021-11-26 08:21:12 -08:00
Unknown W. Brackets
d9f7b9cca2 softjit: Initial depthrange, fog.
Not really tested, just filling out parts.
2021-11-26 08:21:12 -08:00
Unknown W. Brackets
9fed7ea732 softjit: Add register cache for softjit. 2021-11-26 08:21:11 -08:00
Unknown W. Brackets
91787e63d9 softjit: Switch to the __vectorcall convention. 2021-11-26 08:21:11 -08:00
Unknown W. Brackets
ae3299ea04 softjit: Add stubbed DrawPixel for x64. 2021-11-26 08:21:11 -08:00
Unknown W. Brackets
ce5ae95854 softgpu: Correct alpha blend subtract on negative.
Oops, we need to subtract signed, but then clamp to unsigned.
2021-11-25 22:06:48 -08:00
Unknown W. Brackets
dad85b97f1 softgpu: Use KEEP for any invalid stencil ops.
This just keeps the ID more consistent.
2021-11-25 21:02:20 -08:00
Unknown W. Brackets
d4bf7ea392 softgpu: Disable alpha blend for invalid equations. 2021-11-25 19:23:41 -08:00
Unknown W. Brackets
35444b3051 softgpu: Accurately alpha blend. 2021-11-25 19:23:41 -08:00
Unknown W. Brackets
2acf7f4edf softgpu: Use 0 alpha for 565 alpha blending.
We were previously blending as 0xFF.
2021-11-25 19:23:40 -08:00
Unknown W. Brackets
2ef7dd6b03 softgpu: Correct tagging of vertexjit. 2021-11-25 19:21:56 -08:00
Unknown W. Brackets
73de8db996 softgpu: Fix stencil DECR on 5551. 2021-11-25 19:21:56 -08:00
Unknown W. Brackets
53c6a3933d softgpu: Use ALWAYS for alpha/depth test in clear. 2021-11-25 19:21:55 -08:00
Unknown W. Brackets
876c8cd368 softgpu: Fix PixelFuncID size.
Oops, can't use unions in bitfields.  Also improve typesafety.
2021-11-21 09:40:13 -08:00
Unknown W. Brackets
28bc91bd79 softgpu: Add func to tersely name pixel funcs. 2021-11-21 08:23:32 -08:00
Unknown W. Brackets
f8bc6e5b9e softgpu: Template draw pixel on fb format.
This introduces a small 5-10% perf improvement.
2021-11-21 08:23:32 -08:00
Unknown W. Brackets
09dc38080a softgpu: Move draw pixel code to separate file.
This separates things better anyway.  No major perf impact.
2021-11-21 08:23:32 -08:00
Henrik Rydgård
824805ec1e
Merge pull request from unknownbrackets/softjit
Use a pixel func ID in software rendering
2021-11-21 10:50:06 +01:00
Unknown W. Brackets
e2f0713cc2 softgpu: Clamp and round fog by mantissa bits.
This matches hardware calculated fog values much better.
2021-11-20 20:54:52 -08:00
Unknown W. Brackets
9abf2a4725 softgpu: Confirm mask doesn't hit stencil REPLACE. 2021-11-20 18:53:51 -08:00
Unknown W. Brackets
aa3786ed21 softgpu: Force off alpha blend if uselessly on.
This is a simple optimization to prevent some work games sometimes waste.
2021-11-20 15:27:04 -08:00
Unknown W. Brackets
26378f9c89 softgpu: Specialize sprite based on pixel func ID. 2021-11-20 15:27:04 -08:00
Unknown W. Brackets
f7a31c992d softgpu: Use pixel func ID to draw pixels.
This just reduces reliance on gstate directly, and should help keep things
consistent.
2021-11-20 15:27:04 -08:00
Unknown W. Brackets
953200c995 softgpu: Add func to calculate pixel func ID.
This normalizes some things, and eventually can be used for a jit key.
2021-11-20 15:27:04 -08:00
Unknown W. Brackets
b6bdd69572 softgpu: Clear by dividing out subpixel first. 2021-11-15 06:26:11 -08:00
Unknown W. Brackets
f802c3bc6d softgpu: Add some comments and cleanup. 2021-11-15 06:09:12 -08:00
Unknown W. Brackets
babd63c644 softgpu: Tune thread minimums better.
Darkstalkers seems more sensitive to these than many other games, this
improves performance more.
2021-11-14 18:44:30 -08:00
Unknown W. Brackets
66f635cba0 softgpu: Use threads to apply clears. 2021-11-14 18:31:46 -08:00
Unknown W. Brackets
2ab7499d8d softgpu: Combine sliced rectangles.
This mostly affects clears, and reduces overhead.  Only about 2%
improvement, but it's a small change.
2021-11-14 18:31:46 -08:00
Unknown W. Brackets
0281e2f017 softgpu: Split out rectangle path for combining. 2021-11-14 18:31:46 -08:00
Unknown W. Brackets
9545e3b0e2 softgpu: Fixup range cull for fans and fast path. 2021-11-14 18:31:45 -08:00
Unknown W. Brackets
fb6fadbbb7 softgpu: Fast path rectangles as fans.
Some games, such as Legend of Heroes III, use fans instead of strips.
2021-11-14 18:31:45 -08:00
Unknown W. Brackets
09a9927b82 softgpu: Use range loops for sprite fast path. 2021-11-14 18:31:45 -08:00
Unknown W. Brackets
55cde6bd6a softgpu: Check flat z in fast path. 2021-11-14 12:27:39 -08:00
Unknown W. Brackets
361c8f966c softgpu: Fast path triangles without textures.
The fast path may still be useful in this case.
2021-11-14 12:27:39 -08:00
Unknown W. Brackets
5bb6245b1f softgpu: Fix leaked range flag on cull.
Fixes some backgrounds in Final Fantasy 4, probably others.
2021-11-14 08:43:52 -08:00
Unknown W. Brackets
f66e243727 softgpu: Correct scissor for pixel centers. 2021-11-07 11:19:41 -08:00
Unknown W. Brackets
8db2d37e64 softgpu: Fix depth cull in softgpu.
Was improperly skipping cull for positive Z.
2021-11-05 21:38:13 -07:00
Unknown W. Brackets
fe440d40e5 softgpu: Clip full weighted Z without truncating.
In case wsum_recip is nan or similar, we want to make sure we still
properly clip to minz/maxz.
2021-11-05 21:36:38 -07:00
Unknown W. Brackets
f03fa2b0b8 softgpu: Improve accuracy of line drawing.
Needs higher precision to change pixel at the right time.

This makes the lines in Persona 1 look right, see .
2021-11-04 00:11:09 -07:00
Unknown W. Brackets
b1009f70f9 softgpu: Allow end coordinate at bounds.
Oops, was excluding some valid usage that wouldn't wrap.
2021-09-30 06:33:25 -07:00
Unknown W. Brackets
953916a842 softgpu: Avoid fast path for clamp/wrap cases.
It doesn't clamp or wrap, and those are uncommon for the fast path.
Fixes .
2021-09-29 19:19:21 -07:00
Unknown W. Brackets
08816a544d softgpu: Implement DXT5 in samplerx86. 2021-09-12 17:17:09 -07:00
Unknown W. Brackets
c4de5bfb9f softgpu: Implement DXT3 in samplerx86. 2021-09-12 14:53:55 -07:00
Unknown W. Brackets
ee9d19430f softgpu: Implement DXT1 decoding in samplerx86. 2021-09-12 13:57:28 -07:00
Unknown W. Brackets
a0eeb52444 softgpu: Decode DXT texels directly.
This improves performance a lot compared to decoding the whole block.
Eventually we may implement a cache, but threading makes that complex to
make properly fast.
2021-09-12 09:37:34 -07:00
Unknown W. Brackets
121c56e6db softgpu: Clip only on -Z, cull if entirely outside.
This is important for several issues, like  or , where
something is drawn entirely outside valid Z, and should be culled.
2021-09-09 20:13:42 -07:00
Unknown W. Brackets
0b73c1ce83 softgpu: Correct guardband cull behavior.
Culling is based on whether clipping happens, not whether clamping
happens.  This is important for issues like .
2021-09-09 20:05:41 -07:00
Unknown W. Brackets
b5ba469826 softgpu: Prevent pixel gaps when drawing sprites.
If you end a sprite at 255.9, it draws the pixel at 255.  This uses the
same logic to handle that as in the triangle path.
2021-09-06 22:05:39 -07:00
Unknown W. Brackets
7addc18a6b softgpu: Avoid overflow infinite loop.
For certain large values, it would overflow and continue looping
endlessly.
2021-09-05 23:24:08 -07:00
Henrik Rydgård
3be5c7bd9a Make the minimum items per thread explicit. Found some bugs, optional arguments are evil. 2021-06-12 21:21:28 +02:00
Henrik Rydgård
73871b9b7e Implement new thread manager, port stuff to it. 2021-06-12 13:03:53 +02:00
Unknown W. Brackets
3304814fd6 GPU: Minor cleanup duplicate header/conditions. 2021-05-08 09:12:22 -07:00
Unknown W. Brackets
de46b0998a GPU: Correctly initialize HW tessellation support.
Oops, shouldn't call a virtual in a constructor.
2021-05-08 09:10:23 -07:00
Unknown W. Brackets
8a8328c431 Common: Move ColorConv to a more appropriate place. 2021-05-01 11:20:05 -07:00
Unknown W. Brackets
ee749804fc Debugger: Note GPU block transfer src as well. 2021-04-03 18:11:44 -07:00
Unknown W. Brackets
4178f09e57 Build: More consistently avoid _M_ defines.
We use PPSSPP_ARCH in several places already, this makes it more complete.
2021-03-02 21:49:21 -08:00
Unknown W. Brackets
d9aecffd72 Build: Remove old ARM define. 2021-03-02 21:26:03 -08:00
Henrik Rydgård
0facd4d4a6
Merge pull request from unknownbrackets/texreplace
Support texture replacement filtering overrides
2021-02-28 18:09:38 +01:00
Unknown W. Brackets
2f63f9999d GPU: Normalize 0 to 1 always in software lighting.
See .  This seems to be consistent.
2021-02-27 23:51:45 -08:00
Unknown W. Brackets
fb3ad1df4b Replacement: Read in texture filtering overrides.
If you're replacing, you can know more information about linear safety for
tests.
2021-02-27 17:16:16 -08:00
Henrik Rydgård
2f3bc2d373
Merge pull request from unknownbrackets/debugger-mem
Track memory allocations and writes for debug info
2021-02-21 10:18:11 +01:00
aliaspider
9a3e5879bb Global: Correct many endian types and casts. 2021-02-18 22:25:24 -08:00
Unknown W. Brackets
f7740edc6d Debugger: Add more metadata for memory usage. 2021-02-15 15:01:21 -08:00
Unknown W. Brackets
f32f89dd90 Global: Remove some unused variables. 2021-02-15 11:59:45 -08:00
Unknown W. Brackets
5e3579a780 SoftGPU: Fix sprite provoking vertex in fast path.
It was right everywhere else.
2021-01-16 20:13:16 -08:00
Henrik Rydgård
3f01cbb98c Initialize/Deinitialize the shader translation system once globally.
Fixes .
2021-01-04 23:51:34 +01:00
Unknown W. Brackets
e1050fe855 UWP: Don't try compiling samplerjit. 2021-01-02 09:54:35 -08:00
Unknown W. Brackets
ed65bc2327 SoftGPU: Allow rendering with no backend at all. 2021-01-02 09:25:41 -08:00
Unknown W. Brackets
6a2b3f8f78 SoftGPU: Update PPGe draw context.
Oops, this was missing.
2021-01-02 09:23:25 -08:00
Henrik Rydgård
32c9728c0c Some cleanups in GL feature and shader language detection.
Gets rid of many wrong or bad checks for IsCoreContext.
2020-12-14 19:46:11 +01:00
Henrik Rydgård
766dbc5a9f Move ShaderTranslation.cpp/h to Common/GPU. 2020-11-09 11:18:43 +01:00
Henrik Rydgård
03e8eac6ef Merge the two ShaderLanguage enums. 2020-11-04 09:40:11 +01:00
Henrik Rydgård
b7d674411e Test parsing of generated OpenGL shaders too (by using glslang). 2020-10-31 18:32:43 +01:00
Henrik Rydgård
886a8b1ac6 Remove Timer.cpp/h. Move various collections into Common/Data/Collections. 2020-10-05 21:05:23 +02:00
Henrik Rydgård
0e3a84b4a8 Move most GPU things to Common.
It works after the move, on Windows and Android at least.

Deletes the D3DX9 shader compiler loader, which was not used.
2020-10-04 23:39:02 +02:00
Henrik Rydgård
b7edf75437 Move Display.cpp/h to Common. 2020-10-04 11:42:16 +02:00
Henrik Rydgård
821817e6d4 Move the profiler to Common 2020-10-04 11:42:16 +02:00
Henrik Rydgård
9e41fafd0d Move math and some file and data conversion files out from native to Common.
Buildfixing

Move some file util files

Buildfix

Move KeyMap.cpp/h to Core where they belong better.

libretro buildfix attempt

Move ini_file

More buildfixes
2020-10-04 09:12:46 +02:00
Henrik Rydgård
3162f30158 Merge base/basictypes.h into Common/Common.h (mostly). 2020-09-29 15:51:51 +02:00
Henrik Rydgård
1b3413945b Some header include cleanup 2020-09-16 09:20:41 +02:00
Henrik Rydgård
cea35007ae Always use a linear filter for video, unless forcing NEAREST filtering. 2020-09-13 16:40:37 +02:00
Henrik Rydgård
60a6bf6d43 Optimize the DarkStalkers software rendering path a little more. 2020-09-12 16:10:17 +02:00
Unknown W. Brackets
5fae2171cc softgpu: Correct cull handling for sprites. 2020-09-08 16:29:45 -07:00
Unknown W. Brackets
3055deeba6 GPU: Fix some case warnings.
Better to avoid the warnings.
2020-08-19 21:18:44 -07:00
Henrik Rydgård
2e06386cf6 Software renderer clipper: Don't clip on the sides. Fixes and should fix for the SW renderer. 2020-08-16 21:38:07 +02:00
Henrik Rydgård
c5e0b799d9 Remove category from _assert_msg_ functions. We don't filter these by category anyway.
Fixes the inconsistency where _assert_ didn't take a category but
_assert_msg_ did.
2020-07-19 20:33:25 +02:00
Henrik Rydgård
defa8aa480 DarkStalkers: Handle the "normal" screen stretch too, not just "wide", to avoid a surprising performance drop. 2020-05-24 16:53:44 +02:00
Henrik Rydgård
fabe987c8f Add a name tag for all render steps (GL/Vulkan). Helps with debugging and should be cheap enough (a single pointer per "step"). 2020-05-21 11:24:05 +02:00
Henrik Rydgård
7a6489ebb4
Merge pull request from unknownbrackets/postshader
Allow chained post-processing shaders
2020-05-17 16:09:05 +02:00
Unknown W. Brackets
7910b4029a arm64jit: Track writable and non-writable pointers.
Switch uses different memory regions.  We can handle this, might as well
cleanup some const abuse.
2020-05-17 00:15:12 -07:00
Unknown W. Brackets
b79ecc159f GPU: Update postshader uniforms for each. 2020-05-16 12:04:36 -07:00
Henrik Rydgård
864d138cd9 Fix DarkStalkers after the just-merged refactoring. 2020-05-14 23:28:37 +02:00
Unknown W. Brackets
7024a2877d GPU: Take A off RGB565 conversion funcs. 2020-05-13 18:17:58 -07:00
Unknown W. Brackets
03e3a935da GPU: Cleanup presentation flipping a bit. 2020-05-13 18:11:25 -07:00
Unknown W. Brackets
a41fbb9225 softgpu: Fix postshader on 5551.
This also fixes rendering on Windows 7 Direct3D 11.
2020-05-13 18:10:09 -07:00
Unknown W. Brackets
762b656ea2 GPU: Use a texture directly for MakePixelTexture.
This makes it easier to do things with it.
2020-05-13 18:10:09 -07:00
Unknown W. Brackets
2653e50200 softgpu: Avoid RB swizzle when using a postshader.
So that it can post-process correctly.
2020-05-13 18:10:09 -07:00
Unknown W. Brackets
3aa8287b74 softgpu: Enable postshader support. 2020-05-13 18:10:09 -07:00
Unknown W. Brackets
cb94487a16 GPU: Move post shader handling to new class.
Currently, Vulkan is not working properly and direct (RAM -> output) is
not hooked up.  But in general, it works.
2020-05-13 18:10:06 -07:00
Unknown W. Brackets
57bd88fc33 softgpu: Allow display rotation. 2020-05-13 18:07:25 -07:00
Unknown W. Brackets
a03e368566 GPU: Move cardboard/etc. to PresentationCommon.
Now this works on softgpu as well.

Some hacks for backend differences...
2020-05-13 18:07:25 -07:00
Unknown W. Brackets
d39b0bdca2 GPU: Split FramebufferCommon into two classes.
Only some things moved over so far.

FramebufferCommon does too much, we want to share it with softgpu without
all the buffer management stuff.
2020-05-13 18:07:22 -07:00
Unknown W. Brackets
1b9440611a softgpu: Fix texture overlap.
Mainly happened when we had wide textures and split them up between GPUs.
2020-05-13 17:53:00 -07:00
Unknown W. Brackets
ac60e2ecd4 GPU: Track HW tess at start of frame too.
This also makes it so we don't force the setting off when you change
backends, and just ignore it if unsupported.
2020-04-04 11:52:32 -07:00
Unknown W. Brackets
4a0109d273 GPU: Treat negative light exp same as 0.
Based on  and some tests, seems like negative exponents are also
fixed to a 1.0f result.
2020-03-22 22:28:05 -07:00
Unknown W. Brackets
f1dfb25427 softgpu: Correct clear/solid rect BR corner.
The scissor is inclusive, not exclusive.
2020-03-09 18:57:55 -07:00
Unknown W. Brackets
cebcfb1bbd GPU: Use old frame when presenting a skip.
If we flip using a skipped frame, we may show an even older frame causing
weird flickering.
2020-03-01 13:55:28 -08:00
Unknown W. Brackets
072041a63d SoftGPU: Convert from 16-bit if unsupported.
Should help , but not actually tested on an affected device.
2019-12-24 11:08:44 -08:00
Henrik Rydgård
54823a87cc Oops 2019-10-28 13:13:52 +01:00
Henrik Rydgård
970adfbcc9 Isolate most of the softgpu specialization code to RasterizerRectangle.
See comments.
2019-10-28 09:33:30 +01:00
Henrik Rydgård
1966c8fe75 Fix a backwards check 2019-10-27 20:55:32 +01:00
Henrik Rydgård
6c8186d046 Remove unused textureswizzle support (we use shaders instead). Universally support presenting 5551 format directly. 2019-10-27 20:55:32 +01:00
Henrik Rydgård
86c781e434 Hack around most of the problems with the save/load dialog. Software stretch gets enabled in non-wide mode, so wallpapers work at a cost of speed. 2019-10-27 20:55:32 +01:00
Henrik Rydgård
102a70b4a5 Scissor fix 2019-10-27 20:55:32 +01:00
Henrik Rydgård
a84f4a0caa Even more speed. 2019-10-27 20:55:32 +01:00
Henrik Rydgård
eb53609cb0 More speed 2019-10-27 20:55:32 +01:00
Henrik Rydgård
bbbd7f8acc Buildfix 2019-10-27 20:55:32 +01:00
Henrik Rydgård
714f83f614 Further specialization. 2019-10-27 20:54:36 +01:00
Henrik Rydgård
290e9971a7 More specialization work. 2019-10-27 20:54:36 +01:00
Henrik Rydgård
4f7c23fe79 DarkStalkers: Fix display on OpenGL ES. 2019-10-27 20:54:36 +01:00
Henrik Rydgård
796539ad7f DarkStalkers: Fix display in the D3D backends. Still broken in OpenGL. 2019-10-27 20:54:36 +01:00
Henrik Rydgård
9099441973 Darkstalkers: Gross hack to avoid the game's own stretch, and present the raw buffer instead for a sharper image. 2019-10-27 20:54:36 +01:00
Henrik Rydgård
2dd7a9aa12 More darkstalkers work 2019-10-27 20:54:36 +01:00
Henrik Rydgård
c7f6724f7e Detect sprite drawing (1:1 texture mapping), run a simpler function without the triangle state tracking.
This will allow further simplification and specialization.
2019-10-27 20:54:36 +01:00
Henrik Rydgård
510229b68b SoftGPU: Detect through-mode rectangles from triangle strips 2019-10-27 20:54:36 +01:00
Henrik Rydgård
58568632e8 Software renderer: Use hardware color conversion on Vulkan in 5551 16-bit mode 2019-10-27 20:54:36 +01:00
Henrik Rydgård
3a0804a7dd Start slowly migrating from macros 2019-10-27 20:54:36 +01:00
Henrik Rydgård
ae286aef86 Vulkan+SoftwareRenderer: Fix screen rotation on Android.
(Missed this because software rendering is normally disabled on Android)
2019-10-22 22:08:21 +02:00
Unknown W. Brackets
5871ab0538 UI: Stop caching the draw context in coreParam.
This is possibly getting outdated in some paths of graphics reinit, and
then causing crashes.  Let's just always get it from the graphicsContext.
2019-09-28 21:58:15 -07:00
Unknown W. Brackets
7412e13767 SoftGPU: Implement dithering.
Note: it applies even in 8888, so it can be used as a slight brightness
adjustment.
2019-05-26 09:52:34 -07:00
Unknown W. Brackets
0b48c6d066 SoftGPU: Apply color doubling only to RGB.
Broken in  - accidentally applied to the alpha value.  See 
for an example where this caused issues with blending.
2019-03-16 19:40:33 -07:00
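
Illustration (not part of the commit history): several messages above describe blending shortcuts, namely 355bad666c (fixed ONE + ONE), 745c35f320 (src*dst + dst*0, where the add can be skipped), and ce5ae95854 (subtract signed, then clamp to unsigned). The sketch below shows those three ideas on a single 8-bit channel. The function names are hypothetical and do not come from the PPSSPP source, and the (a * b) / 255 scaling is an assumption made for simplicity; the log itself notes that the hardware-accurate formulas differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; names are not from the PPSSPP source.

// ONE + ONE: both blend factors are 1, so no multiplies are needed,
// only a saturating add per 8-bit channel.
inline uint8_t BlendAddOneOne(uint8_t src, uint8_t dst) {
    int sum = int(src) + int(dst);
    return uint8_t(std::min(sum, 255));
}

// src*dst + dst*0: the second term is always zero, so the add is skipped.
// Assumption: a plain (a * b) / 255 scale, not the hardware-exact formula.
inline uint8_t BlendSrcTimesDst(uint8_t src, uint8_t dst) {
    return uint8_t((int(src) * int(dst)) / 255);
}

// Subtract: do the arithmetic in a signed type, then clamp at zero
// before narrowing back to unsigned.
inline uint8_t BlendSubtract(uint8_t src, uint8_t dst) {
    int diff = int(src) - int(dst);
    return uint8_t(std::max(diff, 0));
}

// Small self-check showing the intended behavior at the edges.
inline void CheckBlendExamples() {
    assert(BlendAddOneOne(200, 100) == 255);  // saturates instead of wrapping
    assert(BlendSubtract(10, 20) == 0);       // negative result clamps to 0
}
```

Doing the arithmetic in int and narrowing at the end keeps the wraparound cases explicit, which is the kind of bug ce5ae95854 describes.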