Unknown W. Brackets
eb95b99523
GE Debugger: Add option to auto flush.
...
This makes it easier to see what's happening in each draw.
2022-01-22 13:12:59 -08:00
Unknown W. Brackets
4262e657b4
samplerjit: Oops, forgot about 64 unpack.
...
Just a minor codegen tweak. Always forget there are more of these than
pack instructions.
2022-01-22 10:49:36 -08:00
Unknown W. Brackets
0425b8d630
samplerjit: Fix Linux stack corruption.
...
Oops, nearest was not using the red zone correctly.
2022-01-22 10:47:32 -08:00
Henrik Rydgård
b5e8c21042
Merge pull request #15334 from unknownbrackets/headless
...
Update pspautotests, require passing in GitHub Actions
2022-01-22 09:36:30 +01:00
Unknown W. Brackets
ce0e872d37
softgpu: Define constexpr var for older C++.
2022-01-22 00:14:15 -08:00
Unknown W. Brackets
212e730e98
samplerjit: Fix some Linux register issues.
2022-01-22 00:14:15 -08:00
Unknown W. Brackets
c0c3f7284a
softgpu: Avoid flush texturing from stride.
...
This generally detects overlap more accurately using a dirty rectangles
approach. Also detects render to self much more accurately, including
with depth.
2022-01-20 18:39:01 -08:00
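Note on c0c3f7284a: the dirty rectangles idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the emulator's code: `Rect`, `Overlaps`, and `DirtyTracker` are hypothetical names, and rectangles are assumed to be half-open pixel ranges. Pending writes are recorded as rectangles, and a flush is only needed when a later read intersects one of them.

```cpp
#include <vector>

// Hypothetical half-open rectangle: covers [x1, x2) x [y1, y2).
struct Rect {
    int x1, y1, x2, y2;
};

// True when the two rectangles share at least one pixel.
inline bool Overlaps(const Rect &a, const Rect &b) {
    return a.x1 < b.x2 && b.x1 < a.x2 && a.y1 < b.y2 && b.y1 < a.y2;
}

// Hypothetical tracker for areas with queued (not yet drawn) writes.
class DirtyTracker {
public:
    // Record an area that a queued draw will write to.
    void MarkDirty(const Rect &r) { pending_.push_back(r); }

    // A read (e.g. texturing) only forces a flush if it touches a pending write.
    bool NeedsFlush(const Rect &read) const {
        for (const Rect &r : pending_) {
            if (Overlaps(r, read))
                return true;
        }
        return false;
    }

    // After the queued draws complete, nothing is pending anymore.
    void Flush() { pending_.clear(); }

private:
    std::vector<Rect> pending_;
};
```

A stride-only check would flush whenever two buffers share a row pitch. Comparing actual rectangles avoids flushing when the areas merely sit side by side.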
Unknown W. Brackets
dec0ba7b79
softgpu: Flush framebuf only on change.
...
Sometimes games are reasserting the same framebuf, which was causing
unnecessary flushing.
2022-01-20 17:02:23 -08:00
Unknown W. Brackets
c4c54730bf
softgpu: Remove bin asserts.
...
These are active in release and used in tight loops.
2022-01-20 16:59:38 -08:00
Unknown W. Brackets
55c11425e4
softgpu: Use persistent bin task state.
...
It's constant, so it's better to avoid the copying and allocation. A
small win, but removes new from the profile.
2022-01-20 16:58:43 -08:00
Unknown W. Brackets
3e4d768e7a
softgpu: Pack vertexdata a bit better.
...
This reduces the BinItem size by 15%.
2022-01-19 23:17:09 -08:00
Unknown W. Brackets
6ec819878a
samplerjit: Reduce prolog/epilog spill.
...
Track reg usage so we only push/pop what we need.
2022-01-19 00:03:59 -08:00
Unknown W. Brackets
357e2e9d68
softjit: Simplify constant writes.
2022-01-19 00:03:59 -08:00
Unknown W. Brackets
c2985bca31
softjit: Centralize some common funcs from sampler.
...
No need to duplicate this code.
2022-01-19 00:03:59 -08:00
Henrik Rydgård
b1d158e3e6
Merge pull request #15327 from unknownbrackets/softjit-const
...
softjit: Switch to constant pool for draw pixel
2022-01-18 09:08:44 +01:00
Unknown W. Brackets
ac2b96cec0
softjit: Switch to constant pool.
...
This is simpler without RIP access checks, and tends to be fast.
2022-01-17 19:50:37 -08:00
Unknown W. Brackets
0ba2d05da5
samplerjit: Simplify AVX shift-copies.
...
These have been the most common and the fallback is safe. Let's just add
a helper.
2022-01-17 15:15:36 -08:00
Henrik Rydgård
4ea1c08551
Merge pull request #15323 from unknownbrackets/softgpu-opt2
...
softgpu: Guide more SSE light factor handling
2022-01-17 15:56:46 +01:00
Unknown W. Brackets
7218fbbe97
softgpu: Guide more SSE light factor handling.
...
Missed these others in computed state. Helps mostly to do this inside
Process().
2022-01-17 06:25:52 -08:00
Unknown W. Brackets
abef17caca
softgpu: Simplify mask check.
...
This performs a bit better.
2022-01-16 23:40:57 -08:00
Unknown W. Brackets
89bc87a388
softgpu: Reduce copying during clipping.
...
Common case is nothing needs to be clipped.
2022-01-16 23:33:46 -08:00
Henrik Rydgård
128e2fa14e
Merge pull request #15318 from unknownbrackets/softgpu-opt
...
softgpu: Heuristic to avoid over-draining
2022-01-17 07:43:34 +01:00
Henrik Rydgård
5c15054181
Merge pull request #15321 from unknownbrackets/debugger
...
Debugger: Fix crash in software renderer
2022-01-17 07:41:59 +01:00
Henrik Rydgård
e603e201da
Merge pull request #15320 from unknownbrackets/softgpu-flush
...
softgpu: Fix block transfer flush detection
2022-01-17 07:41:01 +01:00
Unknown W. Brackets
653c036ac8
Debugger: Fix crash in software renderer.
...
The clut isn't set by sampler state, it's set normally by the binner.
2022-01-16 21:53:55 -08:00
Unknown W. Brackets
206d586c1f
softgpu: Fix block transfer flush detection.
...
Fixes video graphics in Gods Eater Burst.
2022-01-16 21:40:19 -08:00
Unknown W. Brackets
fcc3b7684e
softgpu: Use SSE in lighting param computation.
...
The compiler couldn't figure this out. Halves time in this func.
2022-01-16 21:31:53 -08:00
Unknown W. Brackets
73c143c44c
softgpu: Precompute some of screen space multiply.
...
This at least avoids the shifts and makes it easier to vectorize.
Only helps a little.
2022-01-16 21:31:53 -08:00
Unknown W. Brackets
31745110e8
softgpu: Premultiply matrix transforms.
...
Where possible, we can skip some multiplies per vertex.
2022-01-16 21:31:52 -08:00
Unknown W. Brackets
12a4c63fc7
softgpu: Precompute state for vertex transform.
...
Doesn't help a ton, but with lots of verts can improve a percent or two.
2022-01-16 21:31:52 -08:00
Unknown W. Brackets
423ec76258
softgpu: Correct texsize flush annotation.
2022-01-16 21:09:43 -08:00
Unknown W. Brackets
83adc44c2b
softgpu: Heuristic to avoid over-draining.
...
Some games (e.g. VC3) benefit from an early drain, since they get more
done while processing more verts. Others finish the draw quickly, and
then cause significant overhead in queueing new threads.
This attempts to balance the two, and improves Call of Duty and Blade
Dancer.
2022-01-16 21:09:28 -08:00
Henrik Rydgård
bdc69f5171
Merge pull request #15317 from unknownbrackets/softgpu-lighting
...
softgpu: Precompute lighting parameters
2022-01-17 01:06:35 +01:00
Unknown W. Brackets
1764111a4b
softgpu: Reduce wasted memory.
2022-01-16 11:49:41 -08:00
Unknown W. Brackets
2797e035df
softgpu: Precompute lighting parameters.
...
In many cases, games use lighting only for diffuse or similar, so this
also helps skip what's not needed. Good improvement in a scene from a
Naruto game.
2022-01-16 11:27:53 -08:00
Unknown W. Brackets
cb5ac04d16
softgpu: Tune some queue sizes for perf.
...
Using a chunk of RAM for this, but mostly with many threads.
2022-01-16 11:27:43 -08:00
Unknown W. Brackets
d95475e021
softgpu: Expose flush reasons/times in debug stats.
2022-01-16 11:27:42 -08:00
Unknown W. Brackets
7e5f03eed1
softgpu: Reduce flushing for smaller textures.
2022-01-16 08:23:52 -08:00
Unknown W. Brackets
86749a3fe0
softgpu: Flush block xfer only on overlap too.
2022-01-16 08:23:17 -08:00
Unknown W. Brackets
2de7993dc5
softgpu: Decorate some stats for flushes.
2022-01-16 08:23:15 -08:00
Unknown W. Brackets
cc155ec460
softgpu: Avoid texture/CLUT flush unless overlap.
...
Only need to flush here if there's some overlap in the target.
2022-01-16 08:22:13 -08:00
Unknown W. Brackets
9466dc6397
softgpu: Flush on offset changes.
2022-01-16 08:14:10 -08:00
Unknown W. Brackets
d6fa301ab1
softgpu: Track CLUTs as states for binning.
...
This way we can have multiple CLUTs in process at once, which helps.
2022-01-16 08:14:09 -08:00
Henrik Rydgård
ba63d9cf09
Merge pull request #15312 from unknownbrackets/softgpu-state
...
softgpu: Fix alpha blend with one/zero
2022-01-16 10:32:28 +01:00
Unknown W. Brackets
18f2a45a6a
softgpu: Allow binning across prim calls.
2022-01-16 00:49:49 -08:00
Henrik Rydgård
9bef900cd7
Merge pull request #15311 from unknownbrackets/softgpu-state
...
Avoid gstate references in rasterization
2022-01-16 09:40:25 +01:00
Unknown W. Brackets
2ad7d8ed29
softgpu: Fix alpha blend with one/zero.
...
Wasn't setting the fixed value constants in these cases, so need to handle
in the C++ version.
2022-01-16 00:38:49 -08:00
Unknown W. Brackets
fc292b127b
softgpu: Correct dither matrix lookup.
...
Oops, need to wrap x/y, of course...
2022-01-15 23:51:21 -08:00
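Note on fc292b127b: a minimal sketch of the wrapping this fix describes, assuming a 4x4 dither matrix indexed by screen position. The matrix values and the names `kDitherMatrix` and `DitherAt` are hypothetical, chosen only for illustration. Without the wrap, any x or y of 4 or more would index outside the table.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 4x4 dither offsets in row-major order (illustrative values only).
constexpr std::array<int8_t, 16> kDitherMatrix = {
    -4,  0, -3,  1,
     2, -2,  3, -1,
    -3,  1, -4,  0,
     3, -1,  2, -2,
};

// Wrap x and y into 0..3 before the lookup so any screen coordinate is valid.
constexpr int DitherAt(int x, int y) {
    return kDitherMatrix[(y & 3) * 4 + (x & 3)];
}
```

Masking with `& 3` works because the matrix dimension is a power of two, so it is equivalent to `% 4` for non-negative coordinates.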
Unknown W. Brackets
6da7765309
softgpu: Correct logic op state update.
2022-01-15 22:31:28 -08:00
Unknown W. Brackets
b42ebe15d8
softgpu: Fix off-by-one size limit on bin queues.
2022-01-15 21:59:23 -08:00
Unknown W. Brackets
2539fb7c3c
softgpu: Tune queue push/pop to reduce overhead.
...
These aren't safely atomic with concurrent pushers or poppers, but as
long as there's only one of each, they're still safe.
Shaves a decent % off Drain time for heavy scenes.
2022-01-15 20:18:49 -08:00
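Note on 2539fb7c3c: the "one pusher, one popper" condition is the usual single-producer, single-consumer ring buffer arrangement. The sketch below shows that general technique and is not the emulator's queue: `SpscQueue` and its members are hypothetical names. Each index is written by only one thread, so a plain load/store pair with acquire/release ordering is enough, but a second pusher or popper would race.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical fixed-size ring buffer. Safe only with exactly one thread
// calling Push() and exactly one thread calling Pop().
// Holds at most N - 1 items, since one slot distinguishes full from empty.
template <typename T, size_t N>
class SpscQueue {
public:
    // Producer thread only. Returns false when the queue is full.
    bool Push(const T &item) {
        size_t tail = tail_.load(std::memory_order_relaxed);
        size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire))
            return false;
        items_[tail] = item;
        // Publish the item to the consumer.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer thread only. Returns false when the queue is empty.
    bool Pop(T &out) {
        size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return false;
        out = items_[head];
        // Hand the slot back to the producer.
        head_.store((head + 1) % N, std::memory_order_release);
        return true;
    }

private:
    T items_[N];
    std::atomic<size_t> head_{0};
    std::atomic<size_t> tail_{0};
};
```

Avoiding read-modify-write operations such as compare-and-swap is what makes this cheaper than a general multi-producer queue, at the cost of the single-pusher, single-popper restriction.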
Unknown W. Brackets
6896a7a64e
softgpu: Use cached state for screen offset.
2022-01-15 18:20:25 -08:00
Unknown W. Brackets
edb79d968f
softgpu: Cache CLUT params in sampler state.
...
And now there's no more gstate for pixel drawing or sampling. Just a
little left in rasterization.
2022-01-15 18:09:09 -08:00
Unknown W. Brackets
c0e85e6170
softgpu: Move texenv color into sampler state.
2022-01-15 17:52:40 -08:00
Unknown W. Brackets
ad3635c82a
softgpu: Move tex size to cached state.
2022-01-15 17:22:43 -08:00
Unknown W. Brackets
02c5559393
softgpu: Remove z from DrawingCoords.
...
It's not really used much of anywhere, anyway.
2022-01-15 15:38:56 -08:00
Unknown W. Brackets
bf2e060735
softgpu: Move C++ tex func to sampler.
...
It's not used anywhere else now.
2022-01-15 15:28:07 -08:00
Unknown W. Brackets
a228b2ab6c
softgpu: Use cached sampler state outside jit.
2022-01-15 15:26:26 -08:00
Unknown W. Brackets
a2abf9402b
softgpu: Cache line drawing state.
2022-01-15 13:17:40 -08:00
Unknown W. Brackets
58455c8cf1
softgpu: Use cached state for clear write mask.
2022-01-15 13:03:11 -08:00
Unknown W. Brackets
092b03bd67
softgpu: Move fixed blend factor to draw pix state.
...
This is the last of the gstate.
2022-01-15 13:03:11 -08:00
Unknown W. Brackets
f4f7ea2736
softgpu: Cache colortest params in draw pix state.
2022-01-15 13:03:11 -08:00
Unknown W. Brackets
aa9d751248
softgpu: Cache alpha/stencil test masks in state.
2022-01-15 13:03:11 -08:00
Unknown W. Brackets
acad2640dd
softgpu: Cache logicOp in draw pixel state.
2022-01-15 13:03:10 -08:00
Unknown W. Brackets
c0d548846f
softgpu: Use cached write mask in draw pixel.
2022-01-15 13:03:10 -08:00
Unknown W. Brackets
f1ce2e7715
softgpu: Cache minz/maxz in draw pixel state.
2022-01-15 13:03:10 -08:00
Unknown W. Brackets
0b3f096c01
softgpu: Cache strides in draw pixel state.
2022-01-15 13:03:10 -08:00
Unknown W. Brackets
e9f3720e20
softgpu: Cache fog color draw pixel state.
2022-01-15 13:03:10 -08:00
Henrik Rydgård
165e0a12a9
Merge pull request #15305 from unknownbrackets/softgpu-opt
...
softgpu: Avoid double calculating screenpos
2022-01-15 20:58:09 +01:00
Unknown W. Brackets
880826bab4
softgpu: Remove disable of cached pixel state.
...
That mode is slower now (with the other state changes), and we don't want
to read gstate anymore anyway.
2022-01-15 11:22:50 -08:00
Unknown W. Brackets
cf3384c993
softgpu: Avoid double calculating screenpos.
2022-01-15 11:22:36 -08:00
Unknown W. Brackets
3134bd1ff9
softgpu: Cleanup push/pop atomic handling.
...
Two concurrent push/pops would hazard, though we don't do that.
This improves perf a bit by avoiding an atomic read again.
2022-01-15 00:02:31 -08:00
Unknown W. Brackets
c86a0157d8
softgpu: Remove old task.
...
Oops.
2022-01-14 20:52:20 -08:00
Unknown W. Brackets
f091225572
softgpu: Stop storing model pos.
...
We don't even use this anywhere else. Also skip needless Lerp on clip.
2022-01-14 20:36:09 -08:00
Unknown W. Brackets
d6a8cb2a0e
softgpu: Stop storing normal/worldnormal/worldpos.
...
This is only needed for lighting, which is applied right away.
This improves perf just simply from less data being copied.
2022-01-14 20:32:18 -08:00
Unknown W. Brackets
5a35525fd4
softgpu: Enqueue batches of prims when binning.
...
This cuts some thread overhead.
2022-01-14 20:19:32 -08:00
Unknown W. Brackets
46e3c71522
softgpu: Adjust binning thresholds.
...
This improves Persona 3 and LBP.
2022-01-13 23:14:45 -08:00
Unknown W. Brackets
dffc333120
softgpu: Avoid thread ordering hazard.
...
Must run the primitives in the right order. No shortcutting allowed.
2022-01-13 23:03:42 -08:00
Unknown W. Brackets
970e9c2f51
softgpu: Move threading into BinManager.
...
This threads much more effectively, across entire prim call.
2022-01-13 22:45:23 -08:00
Unknown W. Brackets
48ef4a18b1
softgpu: Handle scissor/range in BinManager.
2022-01-13 19:07:41 -08:00
Unknown W. Brackets
a0a9b1e89b
softgpu: Add class to manage and enqueue for bins.
...
For now, just forwarding.
2022-01-13 09:26:59 -08:00
Unknown W. Brackets
6839aac109
Debugger: Cache list PC for softgpu tagging.
...
Still slow, but improved.
2022-01-12 21:23:49 -08:00
Unknown W. Brackets
d962fb35d3
softgpu: Centralize more prim drawing state.
2022-01-12 21:23:49 -08:00
Unknown W. Brackets
d06f17d27b
softgpu: Move tex filter setting check to state.
2022-01-11 00:07:24 -08:00
Unknown W. Brackets
75ff3e44e6
softgpu: Move texture addresses to prim state.
2022-01-11 00:00:03 -08:00
Unknown W. Brackets
d5c5e9478e
softgpu: Prepare more state per prim call.
2022-01-10 22:12:35 -08:00
Unknown W. Brackets
9ec7d65c49
softgpu: Use func IDs instead of gstate more.
2022-01-10 22:12:35 -08:00
Unknown W. Brackets
d7a82ab7b8
softgpu: Compute func IDs once per batch of verts.
...
This saves a decent chunk of time, especially when many verts are being
drawn.
2022-01-10 22:12:35 -08:00
Unknown W. Brackets
e57730a97d
softgpu: Output normals to GE debugger.
2022-01-09 21:33:45 -08:00
Unknown W. Brackets
b915a82c41
softgpu: Correct decal doubling without alpha.
2022-01-09 12:23:55 -08:00
Unknown W. Brackets
72aa4be879
samplerjit: Skip processing alpha if unused.
2022-01-09 12:23:55 -08:00
Unknown W. Brackets
fe0b3dbd01
samplerjit: Fix alpha for 565 in linear lookup.
2022-01-09 11:08:46 -08:00
Henrik Rydgård
2d7a7fd34e
Merge pull request #15288 from unknownbrackets/softgpu-self
...
softgpu: Draw top left of rectangles first
2022-01-09 08:33:28 +01:00
Unknown W. Brackets
88ef2d1ac1
softgpu: Skip threading when rendering to self.
...
This will probably always be a problem to thread.
2022-01-08 21:05:08 -08:00
Unknown W. Brackets
6367d5dc8f
softgpu: Draw top left of rectangles first.
...
This helps when things do self-rendering, since this way we won't read
from things we've just written to when scaling down. See #11623 .
2022-01-08 20:53:01 -08:00
Unknown W. Brackets
8a00c2d233
GPU: Allow gcc/clang/icc runtime SSE4 usage.
...
All our builds before were only using SSE4 in jit...
2022-01-08 17:09:09 -08:00
Henrik Rydgård
eee62849fe
Merge pull request #15284 from unknownbrackets/softgpu-opt
...
Improve softgpu lighting accuracy and speed
2022-01-08 22:05:06 +01:00
Unknown W. Brackets
c7fc448869
softgpu: Use some SSE4 in triangle interpolation.
2022-01-08 11:38:07 -08:00
Unknown W. Brackets
3b1cc0d3b8
softgpu: Limit minX/maxX per line.
...
Only helps when single-threaded, though.
2022-01-08 10:04:52 -08:00
Unknown W. Brackets
9458610d96
softgpu: Avoid rsqrt path for normals.
...
In LittleBigPlanet, it's noticeable that the lighting is very off due to
the slight loss of accuracy - possibly due to cutoff or similar.
2022-01-07 23:22:57 -08:00