Unknown W. Brackets
dfe113e846
Merge pull request #17634 from fp64/macro-x86-loadu
Streamline x86 SSE workaround
2023-06-27 23:01:41 -07:00
M4xw
99ce3125df
[Softgpu] Fix AArch64 oversight
2023-06-27 17:20:11 +02:00
fp64
436b49c4f2
Streamline x86 SSE workaround
Seems clearer than using #ifdefs at each site. Also, the rationale
is clearly spelled out, one 'Go to definition' away from any instance.
2023-06-27 00:30:01 -04:00
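The log does not show the macro itself. The following is a minimal sketch of the idea under stated assumptions: SSE2 intrinsics from `<emmintrin.h>` are available, and `SKETCH_LOAD_PS`, `AlignedVec4` and `Scale` are hypothetical names for illustration, not PPSSPP's code. The x86_32 special case lives in one macro, so the call site needs no per-architecture `#ifdef`.

```cpp
#include <cassert>

// Detect SSE2 (GCC/Clang define __SSE2__; MSVC x64 always has SSE2).
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

#if SKETCH_HAVE_SSE
// Hypothetical macro. Assumption: on 32-bit x86, 16-byte alignment of vector
// data may not be guaranteed (e.g. for values passed through the stack), so an
// aligned load could fault there. The workaround and its rationale are stated
// once here instead of in an #ifdef at every call site.
#if defined(__i386__) || defined(_M_IX86)
#define SKETCH_LOAD_PS(ptr) _mm_loadu_ps(ptr)  // x86_32: unaligned-safe load
#else
#define SKETCH_LOAD_PS(ptr) _mm_load_ps(ptr)   // x86_64: alignment is trusted
#endif
#endif

struct alignas(16) AlignedVec4 { float v[4]; };

// Call site: identical on x86_32 and x86_64.
inline AlignedVec4 Scale(const AlignedVec4 &a, float s) {
    AlignedVec4 r;
#if SKETCH_HAVE_SSE
    _mm_storeu_ps(r.v, _mm_mul_ps(SKETCH_LOAD_PS(a.v), _mm_set1_ps(s)));
#else
    // Scalar fallback so the sketch also builds on non-SSE targets (e.g. ARM).
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * s;
#endif
    return r;
}
```

For example, `Scale(AlignedVec4{{1.0f, 2.0f, 3.0f, 4.0f}}, 2.0f)` yields `{2, 4, 6, 8}` on every target.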
Henrik Rydgård
08d578dce9
Merge pull request #17618 from unknownbrackets/softgpu-opt-cast
Optimize casts in softgpu
2023-06-25 07:55:30 +02:00
Unknown W. Brackets
15b66ba6c0
softgpu: Make SIMD on x86_32 a bit safer.
2023-06-24 14:49:23 -07:00
Unknown W. Brackets
795de9b164
softgpu: Use SIMD for more Vec4 casts.
A number of these were falling back to some pretty terrible code.
Thanks to fp64 for noticing.
2023-06-24 12:36:44 -07:00
Unknown W. Brackets
a7fa37d114
softgpu: Use SIMD more for dot products.
2023-06-12 19:54:32 -07:00
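The commit does not show its code. One common way to compute a 4-lane dot product with SSE is sketched below; it is not necessarily how Math3D.h does it. `Dot4` is a hypothetical helper, and the SSE path uses only SSE1 shuffles and adds rather than the SSE4.1 `_mm_dp_ps` instruction.

```cpp
#include <cassert>

#if defined(__SSE2__) || defined(_M_X64)
#include <xmmintrin.h>
#define DOT_SKETCH_SSE 1
#endif

// Dot product of two 4-float vectors (hypothetical helper).
inline float Dot4(const float *a, const float *b) {
#ifdef DOT_SKETCH_SSE
    // p = [a0*b0, a1*b1, a2*b2, a3*b3]
    __m128 p = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    // Swap neighbouring lanes and add: s = [p0+p1, p1+p0, p2+p3, p3+p2]
    __m128 s = _mm_add_ps(p, _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1)));
    // Bring lane 2 into lane 0 and add: lane 0 = p0+p1+p2+p3
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 0, 3, 2)));
    return _mm_cvtss_f32(s);
#else
    // Scalar fallback for non-SSE targets.
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

With `a = {1, 2, 3, 4}` and `b = {5, 6, 7, 8}`, `Dot4(a, b)` returns `70`.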
Unknown W. Brackets
b55dbdab7f
softgpu: Use NEON for some color conv.
2023-01-07 19:06:34 -08:00
Unknown W. Brackets
a7b7bf7826
Global: Set many read-only params as const.
This makes what they do and which args to use clearer, if nothing else.
2022-12-10 21:13:36 -08:00
Henrik Rydgård
37b0c90a2d
Silence address-sanitizer warnings in Math3D.h on ARM64 (not very serious but good to fix)
2022-12-09 23:47:42 +01:00
Unknown W. Brackets
b2e6a086dc
softgpu: Reduce size of VertexData texture coords.
There's no real benefit to this with only two values.
Not much of a gain perf wise, but still good to transfer less data.
2022-09-12 21:10:46 -07:00
Unknown W. Brackets
b90fc7137f
softgpu: Correct accuracy of fog calculation.
This matches values from a PSP exactly, with the help of immediate mode
vertex values (since this directly allows specifying the fog factor
without any floating point math.)
2022-09-11 08:24:40 -07:00
Henrik Rydgård
cd92151de7
Add ARM64_NEON compile arch flag
This allows doing ARM64 builds without NEON support, and allows simplifying some checks.
2022-06-25 07:29:20 +02:00
Unknown W. Brackets
163fa352e8
softgpu: Avoid some unaligned access on x86_32.
2022-03-13 12:44:58 -07:00
Unknown W. Brackets
8a00c2d233
GPU: Allow gcc/clang/icc runtime SSE4 usage.
All our builds before were only using SSE4 in jit...
2022-01-08 17:09:09 -08:00
Unknown W. Brackets
43f71884ee
softgpu: Clarify internal matrix multiply usage.
2022-01-07 17:53:24 -08:00
Unknown W. Brackets
e7d66f2029
softgpu: Reuse SSE/NEON matrix code.
2022-01-06 21:19:47 -08:00
Unknown W. Brackets
079b67e7ed
softgpu: Use common SIMD matrix multiplies.
2022-01-06 21:19:47 -08:00
Unknown W. Brackets
2d8fdd8cf4
Math3D: Allow construction from NEON vectors.
This makes it match SSE and easier to keep things generic. Will impact
alignment of non-packed Vec2/Vec3.
2021-11-28 08:24:53 -08:00
Unknown W. Brackets
fb6fadbbb7
softgpu: Fast path rectangles as fans.
Some games, such as Legend of Heroes III, use fans instead of strips.
2021-11-14 18:31:45 -08:00
Henrik Rydgård
a498f164ee
vmulq_laneq_f32 not supported on ARM32
2021-10-31 16:32:45 +01:00
Henrik Rydgård
fdacf751ce
NEON/SSE-optimize some matrix multiplications used by software transform
Will hopefully reclaim any potential speed loss from the recent
refactor.
2021-10-31 13:36:34 +01:00
Unknown W. Brackets
2f63f9999d
GPU: Normalize 0 to 1 always in software lighting.
See #14167 . This seems to be consistent.
2021-02-27 23:51:45 -08:00
Henrik Rydgård
9e41fafd0d
Move math and some file and data conversion files out from native to Common.
Buildfixing
Move some file util files
Buildfix
Move KeyMap.cpp/h to Core where they belong better.
libretro buildfix attempt
Move ini_file
More buildfixes
2020-10-04 09:12:46 +02:00
Henrik Rydgård
510229b68b
SoftGPU: Detect through-mode rectangles from triangle strips
2019-10-27 20:54:36 +01:00
xebra
62aaf6336a
Math3D: Something was wrong with the hand SIMD optimization in vec2<float>, causing a severe slowdown.
However, compiler optimization is fast enough, so it was removed.
2018-10-07 23:54:17 +09:00
xebra
d0682d7829
[spline/bezier]Move SIMD optimization of vector operations to Math3D.h.
Needs rebuild to avoid a dialog confirmation on Visual Studio.
2018-10-07 23:53:43 +09:00
xebra
62ad5fe546
Fix namespace Vec2f.
2018-10-07 23:53:41 +09:00
Henrik Rydgård
45cfda4aa0
Small refactoring in VertexDecoderCommon
2018-03-05 00:03:47 +01:00
Henrik Rydgård
7bb427e6f1
Buildfix
2017-08-31 17:24:34 +02:00
Henrik Rydgård
6a1fa728d8
Remove Globals.h
2017-08-31 17:15:22 +02:00
Henrik Rydgård
91783a3281
SIMD-optimize some data conv routines used in uniform updates.
2017-08-20 11:43:35 +02:00
Unknown W. Brackets
4fb7e43af8
SoftGPU: Grab 4 S/T coords in non-through too.
2017-04-23 11:11:16 -07:00
Unknown W. Brackets
3142462ac6
SoftGPU: Rasterize triangles in chunks of 4 pixels.
Not very optimal yet.
2017-04-23 10:37:11 -07:00
Unknown W. Brackets
5ee062c681
Try to optimize bezier color sampling.
2015-04-18 12:47:21 -07:00
Unknown W. Brackets
f070d6f5ed
Use SSE when generating spline normals.
2015-02-25 19:22:48 -08:00
Unknown W. Brackets
90605520a1
Add conversions between Vec3f and Vec3Packedf.
2015-02-22 13:16:07 -08:00
Unknown W. Brackets
ef73487fca
Fix Vec4::SetZero() not clearing all lanes.
2014-12-13 10:35:16 -08:00
Unknown W. Brackets
9f7dbec050
Missing include for Linux/etc.
2014-10-31 09:51:17 -07:00
Unknown W. Brackets
eee3ac79f4
Always clamp in ToRGB[A]?().
Before we only clamped with SSE, better to be consistent. This may also
be slightly faster.
2014-10-31 09:07:54 -07:00
Henrik Rydgard
6304d60b40
Convert 4x4 to 4x3 matrices where possible (except bones)
2014-09-18 23:08:46 +02:00
Henrik Rydgard
bf7a4f9097
D3D: Use fixed constant registers for vertex shaders too.
2014-09-10 13:43:35 +02:00
Tony Wasserka
d09b9fa6a1
Math3D: Change the vector swizzlers to return const objects.
Otherwise, people might be tempted to do things like "some_vec4.xyz() = some_vec3", which compiles fine but does not do the expected thing because xyz() does not return references.
2014-08-17 18:39:02 +02:00
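A minimal sketch of the pitfall described in the swizzler commit, using simplified hypothetical stand-ins (`MiniVec3`, `MiniVec4`) rather than Math3D's actual templates. The `static_assert`s show that a non-const temporary accepts assignment, while the const temporary returned by `xyz()` does not.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Simplified stand-ins for Math3D's vector templates (hypothetical).
struct MiniVec3 { float x, y, z; };

struct MiniVec4 {
    float x, y, z, w;
    // Returns a copy of the first three components, not a reference.
    // The const return type turns "v.xyz() = someVec3" into a compile
    // error instead of a silent assignment to a temporary.
    const MiniVec3 xyz() const { return MiniVec3{x, y, z}; }
};

// The pitfall: a non-const temporary of class type accepts assignment.
static_assert(std::is_assignable<MiniVec3, MiniVec3>::value,
              "assigning to a non-const temporary compiles");
// With the const return type, the same assignment is rejected.
static_assert(!std::is_assignable<decltype(std::declval<MiniVec4>().xyz()), MiniVec3>::value,
              "assigning to xyz() does not compile");
```

Reading through the swizzler (`MiniVec3 c = v.xyz();`) still works normally. Because the components are trivially copyable floats, the const return type costs nothing here; for types with expensive moves, returning const by value would inhibit move construction from the result.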
Unknown W. Brackets
56b83af1f0
Don't use aligned loads in non-inlined funcs.
I'm wanting things to stay in registers, but that's not realistic for
arguments. Force inline the others. May help #5699 .
2014-03-23 12:09:17 -07:00
Henrik Rydgard
bc121242b3
Use fast_math matrix multiplication for culling and sw transform
2014-03-22 14:40:09 +01:00
Unknown W. Brackets
a8a299c2e3
Fix ToRGB/ToRGBA possible accuracy loss.
It was always like this, but not used as much before. Shifts are fast and
it needs to sum anyway; there should not be any benefit to multiplying as
floats, and it will probably lose accuracy.
2014-03-18 22:56:27 -07:00
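The accuracy argument can be checked with a small sketch. `ClampU8`, `PackRGBA_Shift` and `PackRGBA_Float` are hypothetical names, and the float variant is only an illustration of the pitfall, not the code the commit replaced. A `float` has a 24-bit significand, so once the alpha term pushes the sum past 2^24, the low channel bits can be rounded away. Combining 8-bit channels with shifts is always exact.

```cpp
#include <cassert>
#include <cstdint>

// Clamp an int channel to [0, 255].
inline uint32_t ClampU8(int v) {
    return v < 0 ? 0u : (v > 255 ? 255u : static_cast<uint32_t>(v));
}

// Integer packing: shifts and ORs are exact for any 8-bit channel values.
inline uint32_t PackRGBA_Shift(int r, int g, int b, int a) {
    return ClampU8(r) | (ClampU8(g) << 8) | (ClampU8(b) << 16) | (ClampU8(a) << 24);
}

// Hypothetical float packing, shown only to illustrate the precision loss:
// near 2^32 adjacent floats are 256 apart, so a small red value can vanish.
inline uint32_t PackRGBA_Float(int r, int g, int b, int a) {
    float f = ClampU8(r) + ClampU8(g) * 256.0f + ClampU8(b) * 65536.0f
            + ClampU8(a) * 16777216.0f;
    return static_cast<uint32_t>(f);
}
```

For `(r, g, b, a) = (1, 0, 0, 255)`, the shift version gives `0xFF000001`, while the float version rounds to `0xFF000000` on targets that evaluate `float` expressions in single precision (e.g. x86_64 SSE, AArch64).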
Unknown W. Brackets
416df17088
Inline From/ToRGB(A) to avoid losing SSE.
Otherwise it has to store it, which I'd like to avoid.
2014-03-17 23:03:04 -07:00
Unknown W. Brackets
6630e45eff
Just add a packed version of Vec3f.
This way we can have it aligned to memory where needed. I think it'd be
better to avoid this if possible so that we can actually vectorize
spline/etc. code.
Fixes #5673 .
2014-03-17 06:59:40 -07:00
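The trade-off behind a separate packed type can be sketched as follows. As a simplifying assumption, the SIMD-friendly vector here is padded to 16 bytes, so it can be loaded with one aligned 128-bit load, while the packed variant is exactly three floats and matches tightly packed data in memory. `AlignedVec3`, `PackedVec3` and `ToAligned` are hypothetical names, not Math3D's `Vec3f`/`Vec3Packedf`.

```cpp
#include <cassert>

// SIMD-friendly: 16-byte aligned, so 4 bytes of padding follow z.
struct alignas(16) AlignedVec3 {
    float x, y, z;
};

// Packed: exactly three floats, suitable for tightly packed arrays.
struct PackedVec3 {
    float x, y, z;
    PackedVec3() = default;
    explicit PackedVec3(const AlignedVec3 &v) : x(v.x), y(v.y), z(v.z) {}
    AlignedVec3 ToAligned() const { return AlignedVec3{x, y, z}; }
};

static_assert(sizeof(AlignedVec3) == 16, "padded to a full 128-bit lane");
static_assert(sizeof(PackedVec3) == 12, "no padding between elements");
```

Converting at the boundary keeps arrays of vertex data compact while arithmetic can still use the padded, vector-friendly layout.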
Unknown W. Brackets
dd140b73bb
softgpu: Use SSE for gouraud shading.
2014-03-16 14:29:22 -07:00
Unknown W. Brackets
473fb866e6
softgpu: Implement vertex preview.
And move ConvertMatrix4x3To4x4() into a common place since there were
differing implementations, which was only confusing.
2013-12-29 13:45:10 -08:00
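For reference, a 4x3 to 4x4 conversion can be sketched as below. This assumes the 4x3 matrix is stored as four columns of three floats (12 floats, translation in the last column) and that the result is a column-major 4x4 matrix. `ConvertMatrix4x3To4x4Sketch` is a hypothetical name and not necessarily identical to the shared `ConvertMatrix4x3To4x4()`.

```cpp
#include <cassert>

// Expand a 4x3 matrix (4 columns x 3 rows) into a column-major 4x4 matrix
// by appending a (0, 0, 0, 1) bottom row. The buffers must not overlap.
inline void ConvertMatrix4x3To4x4Sketch(float *m4x4, const float *m4x3) {
    assert(m4x4 != nullptr && m4x3 != nullptr);
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 3; ++row)
            m4x4[col * 4 + row] = m4x3[col * 3 + row];
        // Bottom row: 0 for the three basis columns, 1 for the translation column.
        m4x4[col * 4 + 3] = (col == 3) ? 1.0f : 0.0f;
    }
}
```

With input `1..12`, the 4x4 result's last column is `{10, 11, 12, 1}`.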