965 Commits

Author SHA1 Message Date
Henrik Rydgård
cd92151de7 Add ARM64_NEON compile arch flag
This allows doing ARM64 builds without NEON support, and allows simplifying some checks.
2022-06-25 07:29:20 +02:00
Unknown W. Brackets
b60b1e84b2 softgpu: Correct stencil debugging.
Fixes crashes in GE debugger when viewing stencil.
2022-05-08 14:26:25 -07:00
Unknown W. Brackets
9367ce82ac softgpu: Fix viewport flag clean/dirty.
Fixes Split/Second effects, see #15501.
2022-05-08 14:24:21 -07:00
Unknown W. Brackets
f2bba34f84 softgpu: Combine memcpy into single. 2022-03-20 12:05:31 -07:00
Unknown W. Brackets
7389a36fad softgpu: Avoid unnecessary clearMode checks.
Already baked into the flag.
2022-03-20 12:05:31 -07:00
Unknown W. Brackets
374ccafa73 softgpu: Plug bad leak of bin queue data. 2022-03-13 13:34:37 -07:00
Unknown W. Brackets
a3f682fc5a softgpu: Reduce memory usage on 32-bit. 2022-03-13 13:34:14 -07:00
Unknown W. Brackets
e68b16af69 softgpu: Ensure aligned bin queues.
On 32-bit, we still want these aligned to 16 bytes.
2022-03-13 13:33:19 -07:00
Unknown W. Brackets
da4b9e82f3 softgpu: Fix build with basic logging. 2022-03-05 00:08:09 -08:00
Henrik Rydgård
eb765a80f8 Merge pull request #15411 from unknownbrackets/softgpu-range
softgpu: Apply region x2/y2 as a scissor
2022-02-20 21:42:00 +01:00
Unknown W. Brackets
1d0936ea79 Debugger: Improve drawing range in softgpu.
We don't always want all of region, particularly if scissor is a regular
screen size.  This improves debugging in GoW.
2022-02-20 12:21:48 -08:00
Unknown W. Brackets
e3aabdc86c softgpu: Use region as a second scissor.
It's effectively a scissor in the common case of REGION1 being zero.
2022-02-20 12:01:35 -08:00
Unknown W. Brackets
ff5edb2bbc softgpu: Correct accounting for pixel center.
Filtering is still not perfect but this makes different orientations
better.
2022-02-20 10:50:59 -08:00
Unknown W. Brackets
df1a91ee25 samplerjit: Correct nearest negative texture clamp.
Was not clamping to zero when negative.
2022-02-20 10:25:00 -08:00
Unknown W. Brackets
e1eb4ba94a softgpu: Directly implement rectangle drawing. 2022-02-20 09:58:01 -08:00
Unknown W. Brackets
cc6491342e softgpu: Prepare dedicated rectangle path.
We're still sometimes using the slow rect-as-triangles path, let's do
something faster.  As a first step, just handle binning.
2022-02-20 09:38:51 -08:00
Unknown W. Brackets
6737d69f0a softgpu: Cleanup some now unused state. 2022-02-20 09:19:48 -08:00
Unknown W. Brackets
a88c9a0680 softgpu: Remove incorrect offsetting for X/Y. 2022-02-20 09:13:20 -08:00
Unknown W. Brackets
1bc3acf2ed softgpu: Use a const for subpixel screenpos factor. 2022-02-19 21:03:49 -08:00
Unknown W. Brackets
a66377fdf1 softgpu: Remove offset from screenpos.
This simplifies tighter calculations, and reduces the common magnitudes
we'll be dealing with.
2022-02-19 20:38:44 -08:00
Unknown W. Brackets
ad18833a4f samplerjit: Fix non-SSE4 bugs in jit. 2022-02-15 20:13:38 -08:00
Henrik Rydgård
df1a15938d Merge pull request #15399 from unknownbrackets/softgpu-vertices
Convert more verts to rects, fix strip/fan skew on clip
2022-02-13 15:28:16 +01:00
Unknown W. Brackets
7cef06c191 softgpu: Track dirty vs really dirty per buffer.
When games draw and display with a frame lag, it becomes important that we
indicate really dirty for the correct buffer.  Since some triple buffer,
this attempts to track at the buffer level using 1024 byte granularity.
2022-02-12 15:27:18 -08:00
Unknown W. Brackets
3d4c1548b6 softgpu: Allow tri -> rect in transform. 2022-02-12 12:03:55 -08:00
Unknown W. Brackets
259b10d42a softgpu: Turn more tri strips into rects.
This catches a common case in Valkyrie Profile.
Rotation is resolved by just always using tl/br.
2022-02-12 11:33:42 -08:00
Unknown W. Brackets
2381f355c2 softgpu: Combine tris to rects with ignored z too. 2022-02-12 11:33:36 -08:00
Unknown W. Brackets
85cb4101dc softgpu: Cleanup todos on perspective correctness.
Only the texture appears to be perspective corrected.  Color is simply
linear.
2022-02-12 10:55:53 -08:00
Unknown W. Brackets
8e7bc80e4b softgpu: Avoid modifying source vertex data.
This was dangerous for strips and fans, which reuse the verts for
subsequent primitives.
2022-02-12 10:39:25 -08:00
Unknown W. Brackets
80e054b797 Debugger: Avoid write tag lookup on small alloc. 2022-02-06 09:28:48 -08:00
Unknown W. Brackets
99d7703d33 samplerjit: Precalculate DXT1/3/5 offsets.
This improves WALL-E by 8% overall.
2022-02-05 13:04:17 -08:00
Unknown W. Brackets
c91b51c8e1 samplerjit: Reduce DXT5 decode code size a bit. 2022-02-03 20:42:34 -08:00
Henrik Rydgård
f58d4dfcfe Merge pull request #15372 from unknownbrackets/bmi2
Optimize jits with a bit of BMI2
2022-02-01 09:43:35 +01:00
Unknown W. Brackets
c2dd59084d samplerjit: Optimize DXT calc using BMI2. 2022-02-01 00:18:56 -08:00
Unknown W. Brackets
3e4afe2a0c samplerjit: Avoid RCX gymnastics with BMI2. 2022-01-31 22:33:09 -08:00
Unknown W. Brackets
4cadcea6da samplerjit: Decode colors with BMI2.
This only happens with nearest, though, so very small benefit.
2022-01-31 22:05:34 -08:00
Unknown W. Brackets
367525f875 softjit: Use PEXT to downsample colors.
This gives between 1-2% in many games.
2022-01-31 21:40:54 -08:00
Unknown W. Brackets
10bf375712 softjit: Use BMI2 to speed up dst color loads.
This is about 1% overall gain in some games.
2022-01-31 21:27:51 -08:00
Unknown W. Brackets
ad43380ef6 softjit: Use BMI to simplify some masking. 2022-01-31 19:50:48 -08:00
Unknown W. Brackets
be8c74cabe softgpu: Avoid flush on END.
We only, but always, flush when exiting list interp in FinishDeferred.
It's not necessary at END, and hurts for simple signals that don't stop
list processing.
2022-01-31 19:32:46 -08:00
Unknown W. Brackets
2479d52202 Global: Reduce includes of common headers.
In many places, string, map, or Common.h were included but not needed.
2022-01-30 16:35:33 -08:00
Unknown W. Brackets
1b2cf52bfe samplerjit: Fix non-shared CLUT on Linux.
Oops, good that CI will catch this now - I've broken this more than once.
2022-01-29 22:20:46 -08:00
Unknown W. Brackets
a40d32d581 samplerjit: Validate compile in a unit test. 2022-01-29 20:31:18 -08:00
Unknown W. Brackets
26a8d498d7 samplerjit: Correct level lookup in nearest. 2022-01-29 20:29:43 -08:00
Unknown W. Brackets
3387ab1711 samplerjit: Fix reg corruption in DXT funcs.
We'd cache something in a reg, but it'd no longer be there.
2022-01-29 20:29:08 -08:00
Unknown W. Brackets
d200ef40de samplerjit: Compile sampler funcs together.
We can't have the cache clear between nearest/linear, because then we'll
call a bunch of int3's.
2022-01-29 20:28:20 -08:00
Unknown W. Brackets
0d93200faf softjit: Add tests for compile success. 2022-01-29 18:47:36 -08:00
Unknown W. Brackets
3dde3efa9f softjit: Fix stencil bug running out of regs.
To apply the stencil test mask, we need another gen reg.
2022-01-29 18:31:40 -08:00
Unknown W. Brackets
ce775af76d softgpu: Skip new CLUT if identical.
Games often reupload CLUT data that is already there, this skips some
copying later in the bin manager.
2022-01-29 12:55:34 -08:00
Unknown W. Brackets
5976cad797 samplerjit: Reduce register waste.
A few registers were allocated longer than needed, which made requiring
stack more likely.
2022-01-29 09:47:06 -08:00
Unknown W. Brackets
eb70a90347 samplerjit: Avoid frac uv transfer to gen regs.
It should just stay in vec, this is more convenient anyway.
2022-01-28 23:50:54 -08:00