Commit Graph

6945 Commits

Author SHA1 Message Date
Unknown W. Brackets
3387ab1711 samplerjit: Fix reg corruption in DXT funcs.
We'd cache something in a reg, but it'd no longer be there.
2022-01-29 20:29:08 -08:00
Unknown W. Brackets
d200ef40de samplerjit: Compile sampler funcs together.
We can't have the cache clear between nearest/linear, because then we'll
call a bunch of int3's.
2022-01-29 20:28:20 -08:00
Unknown W. Brackets
0d93200faf softjit: Add tests for compile success. 2022-01-29 18:47:36 -08:00
Unknown W. Brackets
3dde3efa9f softjit: Fix stencil bug running out of regs.
To apply the stencil test mask, we need another gen reg.
2022-01-29 18:31:40 -08:00
Henrik Rydgård
078c61cfc9
Merge pull request #15360 from unknownbrackets/samplerjit-opt
Cleanup register usage a bit in samplerjit
2022-01-30 00:31:01 +01:00
Unknown W. Brackets
ce775af76d softgpu: Skip new CLUT if identical.
Games often reupload CLUT data that is already there, this skips some
copying later in the bin manager.
2022-01-29 12:55:34 -08:00
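As an illustration of the idea in the commit above, the following is a minimal sketch of skipping a palette upload when the incoming bytes match what is already cached. The `ClutCache` struct and `UpdateClut` function are hypothetical names invented for this example and are not taken from the repository.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical cache of the most recently uploaded CLUT (palette) bytes.
struct ClutCache {
    uint8_t data[1024];
    size_t size = 0;
};

// Copies src into the cache unless the same bytes are already there.
// Returns true when a copy happened, false when the upload was skipped
// (identical data, or data too large for this illustrative buffer).
inline bool UpdateClut(ClutCache &cache, const uint8_t *src, size_t bytes) {
    if (bytes > sizeof(cache.data))
        return false;
    if (bytes == cache.size && std::memcmp(cache.data, src, bytes) == 0)
        return false;  // Identical re-upload: nothing to do.
    std::memcpy(cache.data, src, bytes);
    cache.size = bytes;
    return true;
}
```

Returning whether a copy happened lets a caller decide if later stages need a new CLUT state at all.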
Unknown W. Brackets
a181c9f7c4 unittest: Correct failing unit test.
Was just an invalid flag combination.
2022-01-29 12:22:11 -08:00
Unknown W. Brackets
5976cad797 samplerjit: Reduce register waste.
A few registers were allocated longer than needed, which made requiring
stack more likely.
2022-01-29 09:47:06 -08:00
Unknown W. Brackets
eb70a90347 samplerjit: Avoid frac uv transfer to gen regs.
It should just stay in vec, this is more convenient anyway.
2022-01-28 23:50:54 -08:00
Henrik Rydgård
54053b6b8b
Merge pull request #15347 from unknownbrackets/samplerjit-opt
Improve samplerjit code a bit with mipmaps
2022-01-26 09:19:06 +01:00
Unknown W. Brackets
e82b54e4b6 softgpu: Cull no-pos and through s8 pos verts.
Seems like these just don't draw anything, ever.
2022-01-25 19:29:11 -08:00
Unknown W. Brackets
61e30e8f8b softgpu: Fix cull in throughmode.
Was only an issue for triangles used to draw rectangles, but caused our
test to fail.

Also move a test that was failing due to an outdated prx to passing.
2022-01-25 19:07:33 -08:00
Unknown W. Brackets
99d6d569f0 samplerjit: Reduce transfers in nearest texel calc.
This benefits a few games, mostly where there's lots of UI or similar.
2022-01-24 21:28:04 -08:00
Unknown W. Brackets
c1e657ed47 samplerjit: Better vectorize UV linear calc.
Gives about 1-2% when mips are used.
2022-01-24 20:42:07 -08:00
Unknown W. Brackets
733046962f samplerjit: Reuse XMM reg for sizes.
Gives just under 1% overall improvement in games using mips.
2022-01-24 19:01:23 -08:00
Henrik Rydgård
0e2f5d66b6
Merge pull request #15345 from unknownbrackets/softgpu-blend
Fix some minor softgpu blending bugs
2022-01-24 09:37:59 +01:00
Henrik Rydgård
fbc965fb59
Merge pull request #15343 from unknownbrackets/gpu-region
GPU: Log and report when region1 is non-zero
2022-01-24 09:18:17 +01:00
Henrik Rydgård
1c18c172a1
Merge pull request #15339 from unknownbrackets/softgpu-flags
Use dirty flags for softgpu state updates
2022-01-24 09:17:37 +01:00
Unknown W. Brackets
07b67ef572 softgpu: Fix pixel ID for invalid blend factors.
They should still be treated as FIX, we were accidentally using our
special values.
2022-01-24 00:08:33 -08:00
Unknown W. Brackets
6c723c0517 softjit: Fix src blend factor handling.
This was causing us to skip a shift, oops.
2022-01-24 00:05:00 -08:00
Henrik Rydgård
c1b8fb737e
Merge pull request #15344 from unknownbrackets/gpu-signal-jumps
GPU: Allow relative jumps and calls
2022-01-24 08:38:45 +01:00
Henrik Rydgård
eba93f2ee0
Merge pull request #15340 from unknownbrackets/softgpu-textures
Correct UV rotation and through mipmaps, optimize texenv blend a bit
2022-01-24 08:19:34 +01:00
Unknown W. Brackets
e47ee79899 GE Debugger: Allow GPU stepping while CPU stepping.
This can happen if you step into an update stall address call or similar.
2022-01-23 23:06:33 -08:00
Unknown W. Brackets
511c822312 GPU: Allow relative jumps and calls.
These are tested in gpu/signals/jumps, so they ought to work.
Doesn't seem like games generally use them, though.
2022-01-23 23:03:30 -08:00
Unknown W. Brackets
8efb99801e GPU: Log and report when region1 is non-zero. 2022-01-23 19:38:51 -08:00
Unknown W. Brackets
818d17183b softgpu: Correct clear mode dither.
It does apply, but have to be careful about alpha.
2022-01-23 12:39:50 -08:00
Unknown W. Brackets
3010cd56d1 softgpu: Correct simple rectangles with mipmaps.
Might be used for fonts, we could potentially check for bias/slope, but
mipmaps are uncommon in direct through draws anyway.
2022-01-23 12:26:58 -08:00
Unknown W. Brackets
d8c5c35b1a samplerjit: Optimize texenv blending a bit.
This reduces to a single multiply, which is much faster.
2022-01-23 11:43:34 -08:00
Unknown W. Brackets
648b71616e softgpu: Correct UV rotation for transformed rects. 2022-01-23 08:15:15 -08:00
Unknown W. Brackets
d74001f4fa softgpu: Reuse transform state. 2022-01-23 08:08:41 -08:00
Unknown W. Brackets
9ea5367a8c softgpu: Add dirty flags for rasterization state. 2022-01-23 08:08:41 -08:00
Unknown W. Brackets
a27da25cd6 softgpu: Use dirty flags for render overlap checks. 2022-01-23 08:08:40 -08:00
Unknown W. Brackets
77db9c818f softgpu: Fix state race on screen offset.
Caused glitches in Motorstorm.
2022-01-23 08:08:40 -08:00
Unknown W. Brackets
76f9103e97 softgpu: Add a table and initial dirty flags.
Not actually using the dirty flags to skip state, but have moved to
Execute_* functions and everything else like other graphics backends.
2022-01-23 08:08:40 -08:00
Henrik Rydgård
5a6bf8b435
Merge pull request #15338 from unknownbrackets/ge-debugger
Allow flushing at will via the GE debugger
2022-01-23 00:30:52 +01:00
Unknown W. Brackets
eb95b99523 GE Debugger: Add option to auto flush.
This makes it easier to see what's happening in each draw.
2022-01-22 13:12:59 -08:00
Unknown W. Brackets
4262e657b4 samplerjit: Oops, forgot about 64 unpack.
Just a minor codegen tweak.  Always forget there are more of these than
pack instructions.
2022-01-22 10:49:36 -08:00
Unknown W. Brackets
0425b8d630 samplerjit: Fix Linux stack corruption.
Oops, nearest was not using the red zone correctly.
2022-01-22 10:47:32 -08:00
Henrik Rydgård
b5e8c21042
Merge pull request #15334 from unknownbrackets/headless
Update pspautotests, require passing in GitHub Actions
2022-01-22 09:36:30 +01:00
Unknown W. Brackets
ce0e872d37 softgpu: Define constexpr var for older C++. 2022-01-22 00:14:15 -08:00
Unknown W. Brackets
212e730e98 samplerjit: Fix some Linux register issues. 2022-01-22 00:14:15 -08:00
Unknown W. Brackets
c0c3f7284a softgpu: Avoid flush texturing from stride.
This generally detects overlap more accurately using a dirty rectangles
approach.  Also detects render to self much more accurately, including
with depth.
2022-01-20 18:39:01 -08:00
Unknown W. Brackets
dec0ba7b79 softgpu: Flush framebuf only on change.
Sometimes games are reasserting the same framebuf, which was causing
unnecessary flushing.
2022-01-20 17:02:23 -08:00
Unknown W. Brackets
c4c54730bf softgpu: Remove bin asserts.
These are active in release and used in tight loops.
2022-01-20 16:59:38 -08:00
Unknown W. Brackets
55c11425e4 softgpu: Use persistent bin task state.
It's constant, so it's better to avoid the copying and allocation.  A
small win, but removes new from the profile.
2022-01-20 16:58:43 -08:00
Unknown W. Brackets
3e4d768e7a softgpu: Pack vertexdata a bit better.
This reduces the BinItem size by 15%.
2022-01-19 23:17:09 -08:00
Unknown W. Brackets
6ec819878a samplerjit: Reduce prolog/epilog spill.
Track reg usage so we only push/pop what we need.
2022-01-19 00:03:59 -08:00
Unknown W. Brackets
357e2e9d68 softjit: Simplify constant writes. 2022-01-19 00:03:59 -08:00
Unknown W. Brackets
c2985bca31 softjit: Centralize some common funcs from sampler.
No need to duplicate this code.
2022-01-19 00:03:59 -08:00
Henrik Rydgård
b1d158e3e6
Merge pull request #15327 from unknownbrackets/softjit-const
softjit: Switch to constant pool for draw pixel
2022-01-18 09:08:44 +01:00
Unknown W. Brackets
ac2b96cec0 softjit: Switch to constant pool.
This is simpler without RIP access checks, and tends to be fast.
2022-01-17 19:50:37 -08:00
Unknown W. Brackets
0ba2d05da5 samplerjit: Simplify AVX shift-copies.
These have been the most common and the fallback is safe.  Let's just add
a helper.
2022-01-17 15:15:36 -08:00
Henrik Rydgård
4ea1c08551
Merge pull request #15323 from unknownbrackets/softgpu-opt2
softgpu: Guide more SSE light factor handling
2022-01-17 15:56:46 +01:00
Unknown W. Brackets
7218fbbe97 softgpu: Guide more SSE light factor handling.
Missed these others in computed state.  Helps mostly to do this inside
Process().
2022-01-17 06:25:52 -08:00
Unknown W. Brackets
abef17caca softgpu: Simplify mask check.
This performs a bit better.
2022-01-16 23:40:57 -08:00
Unknown W. Brackets
89bc87a388 softgpu: Reduce copying during clipping.
Common case is nothing needs to be clipped.
2022-01-16 23:33:46 -08:00
Henrik Rydgård
128e2fa14e
Merge pull request #15318 from unknownbrackets/softgpu-opt
softgpu: Heuristic to avoid over-draining
2022-01-17 07:43:34 +01:00
Henrik Rydgård
5c15054181
Merge pull request #15321 from unknownbrackets/debugger
Debugger: Fix crash in software renderer
2022-01-17 07:41:59 +01:00
Henrik Rydgård
e603e201da
Merge pull request #15320 from unknownbrackets/softgpu-flush
softgpu: Fix block transfer flush detection
2022-01-17 07:41:01 +01:00
Unknown W. Brackets
653c036ac8 Debugger: Fix crash in software renderer.
The clut isn't set by sampler state, it's set normally by the binner.
2022-01-16 21:53:55 -08:00
Unknown W. Brackets
206d586c1f softgpu: Fix block transfer flush detection.
Fixes video graphics in Gods Eater Burst.
2022-01-16 21:40:19 -08:00
Unknown W. Brackets
fcc3b7684e softgpu: Use SSE in lighting param computation.
The compiler couldn't figure this out.  Halves time in this func.
2022-01-16 21:31:53 -08:00
Unknown W. Brackets
73c143c44c softgpu: Precompute some of screen space multiply.
This at least avoids the shifts and makes it easier to vectorize.
Only helps a little.
2022-01-16 21:31:53 -08:00
Unknown W. Brackets
31745110e8 softgpu: Premultiply matrix transforms.
Where possible, we can skip some multiplies per vertex.
2022-01-16 21:31:52 -08:00
Unknown W. Brackets
12a4c63fc7 softgpu: Precompute state for vertex transform.
Doesn't help a ton, but with lots of verts can improve a percent or two.
2022-01-16 21:31:52 -08:00
Unknown W. Brackets
423ec76258 softgpu: Correct texsize flush annotation. 2022-01-16 21:09:43 -08:00
Unknown W. Brackets
83adc44c2b softgpu: Heuristic to avoid over-draining.
Some games (e.g. VC3) benefit from an early drain, since they get more
done while processing more verts.  Others finish the draw quickly, and
then cause significant overhead in queueing new threads.

This attempts to balance the two, and improves Call of Duty and Blade
Dancer.
2022-01-16 21:09:28 -08:00
Henrik Rydgård
bdc69f5171
Merge pull request #15317 from unknownbrackets/softgpu-lighting
softgpu: Precompute lighting parameters
2022-01-17 01:06:35 +01:00
Unknown W. Brackets
1764111a4b softgpu: Reduce wasted memory. 2022-01-16 11:49:41 -08:00
Unknown W. Brackets
2797e035df softgpu: Precompute lighting parameters.
In many cases, games use lighting just for diffuse or something, this
helps skip what's not needed too.  Good improvement in a scene from a
Naruto game.
2022-01-16 11:27:53 -08:00
Unknown W. Brackets
cb5ac04d16 softgpu: Tune some queue sizes for perf.
Using a chunk of RAM for this, but mostly with many threads.
2022-01-16 11:27:43 -08:00
Unknown W. Brackets
d95475e021 softgpu: Expose flush reasons/times in debug stats. 2022-01-16 11:27:42 -08:00
Unknown W. Brackets
7e5f03eed1 softgpu: Reduce flushing for smaller textures. 2022-01-16 08:23:52 -08:00
Unknown W. Brackets
86749a3fe0 softgpu: Flush block xfer only on overlap too. 2022-01-16 08:23:17 -08:00
Unknown W. Brackets
2de7993dc5 softgpu: Decorate some stats for flushes. 2022-01-16 08:23:15 -08:00
Unknown W. Brackets
cc155ec460 softgpu: Avoid texture/CLUT flush unless overlap.
Only need to flush here if there's some overlap in the target.
2022-01-16 08:22:13 -08:00
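The overlap condition mentioned in the commit above can be sketched as a simple address-range test. This is only an illustration under assumed names: `RangesOverlap` and its parameters are hypothetical, and the real check may consider stride, format, or rectangles rather than flat byte ranges.

```cpp
#include <cstdint>

// Hypothetical helper: true if [aStart, aStart + aSize) and
// [bStart, bStart + bSize) share at least one byte.
// Sums are done in 64 bits so ranges near the top of a 32-bit
// address space do not wrap around.
inline bool RangesOverlap(uint32_t aStart, uint32_t aSize,
                          uint32_t bStart, uint32_t bSize) {
    if (aSize == 0 || bSize == 0)
        return false;  // An empty range cannot overlap anything.
    const uint64_t aEnd = static_cast<uint64_t>(aStart) + aSize;
    const uint64_t bEnd = static_cast<uint64_t>(bStart) + bSize;
    return aStart < bEnd && bStart < aEnd;
}
```

A flush would then be needed only when the texture or CLUT source range overlaps the range being rendered to.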
Unknown W. Brackets
9466dc6397 softgpu: Flush on offset changes. 2022-01-16 08:14:10 -08:00
Unknown W. Brackets
d6fa301ab1 softgpu: Track CLUTs as states for binning.
This way we can have multiple CLUTs in process at once, which helps.
2022-01-16 08:14:09 -08:00
Henrik Rydgård
ba63d9cf09
Merge pull request #15312 from unknownbrackets/softgpu-state
softgpu: Fix alpha blend with one/zero
2022-01-16 10:32:28 +01:00
Unknown W. Brackets
18f2a45a6a softgpu: Allow binning across prim calls. 2022-01-16 00:49:49 -08:00
Henrik Rydgård
9bef900cd7
Merge pull request #15311 from unknownbrackets/softgpu-state
Avoid gstate references in rasterization
2022-01-16 09:40:25 +01:00
Unknown W. Brackets
2ad7d8ed29 softgpu: Fix alpha blend with one/zero.
Wasn't setting the fixed value constants in these cases, so need to handle
in the C++ version.
2022-01-16 00:38:49 -08:00
Unknown W. Brackets
fc292b127b softgpu: Correct dither matrix lookup.
Oops, need to wrap x/y, of course...
2022-01-15 23:51:21 -08:00
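For context on the fix above, here is a small sketch of wrapping pixel coordinates before indexing a 4x4 dither table. The table values and the names `kDither` and `DitherOffset` are made up for illustration; only the wrapping step reflects what the commit message describes.

```cpp
#include <cstdint>

// Hypothetical 4x4 ordered-dither offsets; the values are illustrative only.
static const int8_t kDither[4][4] = {
    {-4,  0, -3,  1},
    { 2, -2,  3, -1},
    {-3,  1, -4,  0},
    { 3, -1,  2, -2},
};

// Wrap x and y into 0..3 before the lookup, so any screen coordinate
// maps into the table instead of reading out of bounds.
inline int DitherOffset(int x, int y) {
    return kDither[y & 3][x & 3];
}
```

Masking with 3 works because the table dimension is a power of two, so it is equivalent to taking the coordinate modulo 4 for non-negative inputs.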
Unknown W. Brackets
6da7765309 softgpu: Correct logic op state update. 2022-01-15 22:31:28 -08:00
Unknown W. Brackets
b42ebe15d8 softgpu: Fix off-by-one size limit on bin queues. 2022-01-15 21:59:23 -08:00
Unknown W. Brackets
2539fb7c3c softgpu: Tune queue push/pop to reduce overhead.
These aren't safely atomic with concurrent pushers or poppers, but as
long as there's only one of each, they're still safe.

Shaves a decent % off Drain time for heavy scenes.
2022-01-15 20:18:49 -08:00
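The single-pusher, single-popper constraint described above is the usual contract of a single-producer/single-consumer ring buffer. The sketch below shows that general technique, assuming a fixed capacity and a copyable item type; `SpscQueue` is a hypothetical class written for this example and is not the queue used in the repository.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical SPSC ring buffer. Safe only with exactly one pushing
// thread and one popping thread. Holds at most N - 1 items.
template <typename T, size_t N>
class SpscQueue {
public:
    // Producer thread only. Returns false if the queue is full.
    bool Push(const T &item) {
        const size_t tail = tail_.load(std::memory_order_relaxed);
        const size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire))
            return false;
        items_[tail] = item;
        // Release publishes the item write before the new tail is visible.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns false if the queue is empty.
    bool Pop(T &out) {
        const size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return false;
        out = items_[head];
        // Release frees the slot only after the item has been read.
        head_.store((head + 1) % N, std::memory_order_release);
        return true;
    }

private:
    T items_[N];
    std::atomic<size_t> head_{0};
    std::atomic<size_t> tail_{0};
};
```

Because each index is written by only one thread, no compare-and-swap loop or lock is needed, which is where the overhead savings of this kind of design typically come from. With a second pusher or popper, two threads could claim the same slot, so the contract must be enforced by the caller.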
Unknown W. Brackets
6896a7a64e softgpu: Use cached state for screen offset. 2022-01-15 18:20:25 -08:00
Unknown W. Brackets
edb79d968f softgpu: Cache CLUT params in sampler state.
And now there's no more gstate for pixel drawing or sampling.  Just a
little left in rasterization.
2022-01-15 18:09:09 -08:00
Unknown W. Brackets
c0e85e6170 softgpu: Move texenv color into sampler state. 2022-01-15 17:52:40 -08:00
Unknown W. Brackets
ad3635c82a softgpu: Move tex size to cached state. 2022-01-15 17:22:43 -08:00
Unknown W. Brackets
02c5559393 softgpu: Remove z from DrawingCoords.
It's not really used much of anywhere, anyway.
2022-01-15 15:38:56 -08:00
Unknown W. Brackets
bf2e060735 softgpu: Move c++ tex func to sampler.
It's not used anywhere else now.
2022-01-15 15:28:07 -08:00
Unknown W. Brackets
a228b2ab6c softgpu: Use cached sampler state outside jit. 2022-01-15 15:26:26 -08:00
Unknown W. Brackets
a2abf9402b softgpu: Cache line drawing state. 2022-01-15 13:17:40 -08:00
Unknown W. Brackets
58455c8cf1 softgpu: Use cached state for clear write mask. 2022-01-15 13:03:11 -08:00
Unknown W. Brackets
092b03bd67 softgpu: Move fixed blend factor to draw pix state.
This is the last of the gstate.
2022-01-15 13:03:11 -08:00
Unknown W. Brackets
f4f7ea2736 softgpu: Cache colortest params in draw pix state. 2022-01-15 13:03:11 -08:00
Unknown W. Brackets
aa9d751248 softgpu: Cache alpha/stencil test masks in state. 2022-01-15 13:03:11 -08:00
Unknown W. Brackets
acad2640dd softgpu: Cache logicOp in draw pixel state. 2022-01-15 13:03:10 -08:00
Unknown W. Brackets
c0d548846f softgpu: Use cached write mask in draw pixel. 2022-01-15 13:03:10 -08:00