Henrik Rydgård
9bbdd1907d
Kind-of optimized ARM software skinning (non-NEON)
2013-11-13 18:11:35 +01:00
Henrik Rydgård
821a2f10f8
Delete obsolete code
2013-11-13 18:10:58 +01:00
Henrik Rydgard
6221dbaf5d
Optimize software skinning for x86.
...
Can't seem to get a win on Windows vs hardware skinning though, even though
draw calls drop by 2/3rd...
2013-11-13 18:10:58 +01:00
Henrik Rydgard
9333d3ea76
Vtx dec jit: Combine the scale and offset registers to save 1 more xmm register.
2013-11-13 18:10:58 +01:00
Henrik Rydgard
f0cacf46d0
No reason to involve the FPU when loading matrices
2013-11-13 18:10:58 +01:00
Henrik Rydgard
6976d6a3a0
Enable the softskinning optimizations that let us merge drawcalls
2013-11-13 18:10:57 +01:00
Henrik Rydgard
179934ec9f
Decode step by step when sw skinning
2013-11-13 18:10:57 +01:00
Henrik Rydgard
46313ced55
Prepare transform pipeline for step by step decoding
2013-11-13 18:10:57 +01:00
Henrik Rydgard
4f78eda23b
Save a couple of registers in the x86 vertex decoder jit by SIMD-ing prescale UV
2013-11-13 18:10:57 +01:00
Henrik Rydgard
7e67476b00
Simple unoptimized software skinning.
...
Does not take advantage of the possible reduction in state changes yet.
2013-11-13 18:10:57 +01:00
Henrik Rydgård
ab3fe9ba86
Extract the software transform code into its own file.
2013-11-13 14:56:34 +01:00
Henrik Rydgård
67ca4419fe
Fix minor bug where we didn't clear dirtyUniforms if early-outing from UpdateUniform.
2013-11-13 10:02:58 +01:00
Sacha
04b338f39e
Buildfix for sse2 builds.
2013-11-13 12:08:46 +10:00
Henrik Rydgard
cf15ec8a53
Add BBOX support (very conservative test)
2013-11-12 17:06:03 +01:00
Henrik Rydgard
54217deb16
Speed up UpdateUniforms a little
2013-11-12 17:06:03 +01:00
Henrik Rydgard
4b98e0d6d6
Optimize LoadClut a little
2013-11-12 17:06:03 +01:00
Henrik Rydgard
84f20a1cad
Small optimizations
2013-11-12 14:05:50 +01:00
raven02
2bca62b26e
Don't reset texture width/height unless the size is different
2013-11-11 21:12:43 +08:00
raven02
d2546bed5b
Regression fix c69ac64
2013-11-11 08:53:47 +08:00
Henrik Rydgard
c69ac64d83
Don't reconvert light colors if they don't change.
...
Also prepare for a possible further optimization in GLES_GPU::FastRunLoop
2013-11-10 11:18:58 +01:00
Henrik Rydgard
f4ad7c64e5
Fix issue with texcoord speed hack (bPrescaleUV) in software transform
...
(and also thus rectangles of course even when hw transform is enabled)
2013-11-10 11:18:26 +01:00
Henrik Rydgård
179068823c
Merge pull request #4491 from raven02/patch-15
...
Attempt for another matching framebuffer logic
2013-11-10 01:58:29 -08:00
Unknown W. Brackets
1633aa689c
Remove the extra process queues hack.
...
It seems like it's not helping anymore so it could be hurting.
2013-11-09 23:08:44 -08:00
raven02
7c6a4cf87e
Attempt for another matching framebuffer logic
2013-11-10 10:38:33 +08:00
Unknown W. Brackets
3f57f1f447
Disable the secondary texcache on mobile.
...
It helps in games like Final Fantasy 2 and Popolocrois, but it seems to
cause out of memory errors despite the checks.
2013-11-09 12:52:58 -08:00
Henrik Rydgard
63334698e1
Add temporary setting to disable the vertex decoder jit while we debug it
2013-11-09 18:16:26 +01:00
Henrik Rydgård
d04415ac5d
Merge pull request #4479 from unknownbrackets/perf
...
Small optimizations (armjit) for imms
2013-11-08 11:45:16 -08:00
Unknown W. Brackets
385f9a457e
Use GetPointerUnchecked in hot, pre-checked areas.
2013-11-08 11:39:24 -08:00
Henrik Rydgard
0e542b6ecc
vtxdec: small omission
2013-11-08 20:38:49 +01:00
Henrik Rydgard
22d6b36005
Vertexdecoder "double": fix for x86 and very minor optimization for arm.
2013-11-08 20:03:28 +01:00
Henrik Rydgard
6b45c321b6
vtxdecjit: turn off excessive logging
2013-11-08 18:49:17 +01:00
Henrik Rydgard
381b6d0f05
VertexDecoder JIT: Add the last missing ones except morph, I think.
2013-11-08 12:43:46 +01:00
Henrik Rydgard
502cbc170a
Revert "Another attempt to sizing framebuffer based on fmt"
...
This reverts commit c0e8893560.
2013-11-07 15:29:11 +01:00
Henrik Rydgård
c213c44050
Merge pull request #4468 from hrydgard/vtxdec-prescale
...
Add support for prescaled UV in vertex decoder JIT
2013-11-07 02:51:14 -08:00
Henrik Rydgard
c4e3dd14fd
Add commented-out code to save XMM4/XMM5.
...
According to all calling convention manuals I can find, we don't really
need to preserve them. If they become problematic as mentioned, we can
activate this.
2013-11-07 09:54:58 +01:00
raven02
c0e8893560
Another attempt to sizing framebuffer based on fmt
2013-11-07 10:56:41 +08:00
Henrik Rydgard
64bdb5e21d
Tiny optimization (32-bit) in GLES_GPU::FastRunLoop
2013-11-07 01:34:43 +01:00
Henrik Rydgard
b203da05e9
Prescale UV in vtx-dec-jit: Fix bugs, add ARM support
2013-11-07 01:24:53 +01:00
Henrik Rydgård
367bcf6d4f
Prescale in the vertex dec jit. Needs debugging.
2013-11-07 01:24:53 +01:00
Unknown W. Brackets
34398b7d0c
Avoid literal loads in the arm vertexjit.
2013-11-06 08:45:00 -08:00
Unknown W. Brackets
78400fd460
Avoid some dereferencing in gpu FastRunLoop.
2013-11-06 07:50:16 -08:00
Henrik Rydgård
51995a3d43
Vtx dec: After generating ARM, remember to flush the icache.
...
Will hopefully fix the random crashes in #4461.
2013-11-06 16:14:40 +01:00
Henrik Rydgård
e3f6f25390
Buildfix for non-Windows non-ARM
2013-11-06 13:54:26 +01:00
Henrik Rydgård
ea9da85bdb
Missed one possible unaligned access
2013-11-06 13:14:49 +01:00
Henrik Rydgård
b3fdfc01c8
ARM vtx dec: Avoid all unaligned accesses entirely.
...
Seeing so much contradictory information on the support and performance
of these.
2013-11-06 12:17:41 +01:00
Henrik Rydgård
1e158fa652
ARM vtx dec: Preserving our FP scratch register appears to improve
...
stability.
Also added some logging.
2013-11-06 11:47:26 +01:00
Henrik Rydgård
b19d41f9a8
Now that LDRH works, use it where appropriate
2013-11-06 10:51:21 +01:00
Henrik Rydgard
0eb3d79de9
x86 VertexDecoder jit: Fix typo in 16-bit weight decoder. May fix crashes.
2013-11-05 21:06:43 +01:00
Henrik Rydgård
a03e5c6de0
Merge pull request #4460 from hrydgard/vertex-decoder-jit
...
Vertex decoder JIT
2013-11-05 07:30:58 -08:00
Henrik Rydgård
7bf8a4dc5e
We need to use a signed VCVT to float in PosS*Through
2013-11-05 12:17:18 +01:00