77 Commits

Author SHA1 Message Date
Unknown W. Brackets
dfe113e846
Merge pull request #17634 from fp64/macro-x86-loadu
Streamline x86 SSE workaround
2023-06-27 23:01:41 -07:00
M4xw
99ce3125df [Softgpu] Fix AArch64 oversight 2023-06-27 17:20:11 +02:00
fp64
436b49c4f2 Streamline x86 SSE workaround
Seems clearer than using #ifdef's at each site. Also rationale
is clearly spelled out, one 'Go to definition' away from any instance.
2023-06-27 00:30:01 -04:00
Henrik Rydgård
08d578dce9
Merge pull request #17618 from unknownbrackets/softgpu-opt-cast
Optimize casts in softgpu
2023-06-25 07:55:30 +02:00
Unknown W. Brackets
15b66ba6c0 softgpu: Make SIMD on x86_32 a bit safer. 2023-06-24 14:49:23 -07:00
Unknown W. Brackets
795de9b164 softgpu: Use SIMD for more Vec4 casts.
A number of these were falling back to some pretty terrible code.
Thanks to fp64 for noticing.
2023-06-24 12:36:44 -07:00
Unknown W. Brackets
a7fa37d114 softgpu: Use SIMD more for dot products. 2023-06-12 19:54:32 -07:00
Unknown W. Brackets
b55dbdab7f softgpu: Use NEON for some color conv. 2023-01-07 19:06:34 -08:00
Unknown W. Brackets
a7b7bf7826 Global: Set many read-only params as const.
This makes what they do and which args to use clearer, if nothing else.
2022-12-10 21:13:36 -08:00
Henrik Rydgård
37b0c90a2d Silence address-sanitizer warnings in Math3D.h on ARM64 (not very serious but good to fix) 2022-12-09 23:47:42 +01:00
Unknown W. Brackets
b2e6a086dc softgpu: Reduce size of VertexData texture coords.
There's no real benefit to this with only two values.
Not much of a gain perf wise, but still good to transfer less data.
2022-09-12 21:10:46 -07:00
Unknown W. Brackets
b90fc7137f softgpu: Correct accuracy of fog calculation.
This matches values from a PSP exactly, with the help of immediate mode
vertex values (since this directly allows specifying the fog factor
without any floating point math.)
2022-09-11 08:24:40 -07:00
Henrik Rydgård
cd92151de7 Add ARM64_NEON compile arch flag
This allows doing ARM64 builds without NEON support, and allows simplifying some checks.
2022-06-25 07:29:20 +02:00
Unknown W. Brackets
163fa352e8 softgpu: Avoid some unaligned access on x86_32. 2022-03-13 12:44:58 -07:00
Unknown W. Brackets
8a00c2d233 GPU: Allow gcc/clang/icc runtime SSE4 usage.
All our builds before were only using SSE4 in jit...
2022-01-08 17:09:09 -08:00
Unknown W. Brackets
43f71884ee softgpu: Clarify internal matrix multiply usage. 2022-01-07 17:53:24 -08:00
Unknown W. Brackets
e7d66f2029 softgpu: Reuse SSE/NEON matrix code. 2022-01-06 21:19:47 -08:00
Unknown W. Brackets
079b67e7ed softgpu: Use common SIMD matrix multiplies. 2022-01-06 21:19:47 -08:00
Unknown W. Brackets
2d8fdd8cf4 Math3D: Allow construction from NEON vectors.
This makes it match SSE and easier to keep things generic.  Will impact
alignment of non-packed Vec2/Vec3.
2021-11-28 08:24:53 -08:00
Unknown W. Brackets
fb6fadbbb7 softgpu: Fast path rectangles as fans.
Some games, such as Legend of Heroes III, use fans instead of strips.
2021-11-14 18:31:45 -08:00
Henrik Rydgård
a498f164ee vmulq_laneq_f32 not supported on ARM32 2021-10-31 16:32:45 +01:00
Henrik Rydgård
fdacf751ce NEON/SSE-optimize some matrix multiplications used by software transform
Will hopefully reclaim any potential speed loss from the recent
refactor.
2021-10-31 13:36:34 +01:00
Unknown W. Brackets
2f63f9999d GPU: Normalize 0 to 1 always in software lighting.
See #14167.  This seems to be consistent.
2021-02-27 23:51:45 -08:00
Henrik Rydgård
9e41fafd0d Move math and some file and data conversion files out from native to Common.
Buildfixing

Move some file util files

Buildfix

Move KeyMap.cpp/h to Core where they belong better.

libretro buildfix attempt

Move ini_file

More buildfixes
2020-10-04 09:12:46 +02:00
Henrik Rydgård
510229b68b SoftGPU: Detect through-mode rectangles from triangle strips 2019-10-27 20:54:36 +01:00
xebra
62aaf6336a Math3D: The hand-written SIMD optimization in vec2<float> was causing a severe slowdown.
The compiler's own optimization is fast enough, so remove it.
2018-10-07 23:54:17 +09:00
xebra
d0682d7829 [spline/bezier]Move SIMD optimization of vector operations to Math3D.h.
Needs rebuild to avoid a dialog confirmation on Visual Studio.
2018-10-07 23:53:43 +09:00
xebra
62ad5fe546 Fix namespace Vec2f. 2018-10-07 23:53:41 +09:00
Henrik Rydgård
45cfda4aa0 Small refactoring in VertexDecoderCommon 2018-03-05 00:03:47 +01:00
Henrik Rydgård
7bb427e6f1 Buildfix 2017-08-31 17:24:34 +02:00
Henrik Rydgård
6a1fa728d8 Remove Globals.h 2017-08-31 17:15:22 +02:00
Henrik Rydgård
91783a3281 SIMD-optimize some data conv routines used in uniform updates. 2017-08-20 11:43:35 +02:00
Unknown W. Brackets
4fb7e43af8 SoftGPU: Grab 4 S/T coords in non-through too. 2017-04-23 11:11:16 -07:00
Unknown W. Brackets
3142462ac6 SoftGPU: Rasterize triangles in chunks of 4 pixels.
Not very optimal yet.
2017-04-23 10:37:11 -07:00
Unknown W. Brackets
5ee062c681 Try to optimize bezier color sampling. 2015-04-18 12:47:21 -07:00
Unknown W. Brackets
f070d6f5ed Use SSE when generating spline normals. 2015-02-25 19:22:48 -08:00
Unknown W. Brackets
90605520a1 Add conversions between Vec3f and Vec3Packedf. 2015-02-22 13:16:07 -08:00
Unknown W. Brackets
ef73487fca Fix Vec4::SetZero() not clearing all lanes. 2014-12-13 10:35:16 -08:00
Unknown W. Brackets
9f7dbec050 Missing include for Linux/etc. 2014-10-31 09:51:17 -07:00
Unknown W. Brackets
eee3ac79f4 Always clamp in ToRGB[A]?().
Before we only clamped with SSE, better to be consistent.  This may also
be slightly faster.
2014-10-31 09:07:54 -07:00
Henrik Rydgard
6304d60b40 Convert 4x4 to 4x3 matrices where possible (except bones) 2014-09-18 23:08:46 +02:00
Henrik Rydgard
bf7a4f9097 D3D: Use fixed constant registers for vertex shaders too. 2014-09-10 13:43:35 +02:00
Tony Wasserka
d09b9fa6a1 Math3D: Change the vector swizzlers to return const objects.
Otherwise, people might be tempted to do things like "some_vec4.xyz() = some_vec3", which compiles fine but does not do the expected thing because xyz() does not return references.
2014-08-17 18:39:02 +02:00
Unknown W. Brackets
56b83af1f0 Don't use aligned loads in non-inlined funcs.
I'm wanting things to stay in registers, but that's not realistic for
arguments.  Force inline the others.  May help #5699.
2014-03-23 12:09:17 -07:00
Henrik Rydgard
bc121242b3 Use fast_math matrix multiplication for culling and sw transform 2014-03-22 14:40:09 +01:00
Unknown W. Brackets
a8a299c2e3 Fix ToRGB/ToRGBA possible accuracy loss.
It was always like this, but not used as much before.  Shifts are fast and
it needs to sum anyway, there should not be any benefit to multiplying as
floats, and it will probably lose accuracy.
2014-03-18 22:56:27 -07:00
Unknown W. Brackets
416df17088 Inline From/ToRGB(A) to avoid losing SSE.
Otherwise it has to store it, which I'd like to avoid.
2014-03-17 23:03:04 -07:00
Unknown W. Brackets
6630e45eff Just add a packed version of Vec3f.
This way we can have it aligned to memory where needed.  I think it'd be
better to avoid this if possible so that we can actually vectorize
spline/etc. code.

Fixes #5673.
2014-03-17 06:59:40 -07:00
Unknown W. Brackets
dd140b73bb softgpu: Use SSE for gouraud shading. 2014-03-16 14:29:22 -07:00
Unknown W. Brackets
473fb866e6 softgpu: Implement vertex preview.
And move ConvertMatrix4x3To4x4() into a common place since there were
differing implementations, which was only confusing.
2013-12-29 13:45:10 -08:00
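
Note on commit d09b9fa6a1 (swizzlers returning const objects): the pitfall described in that message can be reproduced with a minimal sketch. The types below (Vec3, Vec4Mutable, Vec4Const) and the helper AssignThroughSwizzle are hypothetical stand-ins written for illustration, not the actual Math3D.h definitions. The sketch only assumes what the commit message states: xyz() returns a copy rather than a reference, so assigning to its result changes a temporary and leaves the source vector untouched. Declaring the return type const turns that silent no-op into a compile error.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical minimal 3-component vector (illustration only).
struct Vec3 {
    float x, y, z;
};

// Swizzler returns a plain (non-const) copy: assignment to it compiles.
struct Vec4Mutable {
    float x, y, z, w;
    Vec3 xyz() const { return Vec3{x, y, z}; }
};

// Swizzler returns a const copy: assignment to it is rejected.
struct Vec4Const {
    float x, y, z, w;
    const Vec3 xyz() const { return Vec3{x, y, z}; }
};

// Assigning to the non-const temporary is well-formed...
static_assert(
    std::is_assignable<decltype(std::declval<Vec4Mutable&>().xyz()), Vec3>::value,
    "non-const swizzle result accepts assignment");

// ...while assigning to the const temporary is not.
static_assert(
    !std::is_assignable<decltype(std::declval<Vec4Const&>().xyz()), Vec3>::value,
    "const swizzle result rejects assignment");

// Demonstrates the silent no-op: the assignment writes to a temporary copy,
// so v keeps its original components. Returns v.x after the attempt.
inline float AssignThroughSwizzle(Vec4Mutable v, Vec3 replacement) {
    v.xyz() = replacement;  // compiles, but only the temporary is modified
    return v.x;
}
```

With Vec4Const, the line `v.xyz() = replacement;` would fail to compile, which is the behavior the commit message argues for.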