This fix targets, and was tested with, the menu of the German demo of Passport to Peril.
The original demo has no Songs item in its menu. In this fix we disable the item (grey it out)
instead of removing it, because removing a menu item shifts the positions of all subsequent items,
which makes the code unreliable and invites future bugs if the shift is not accounted for.
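
A minimal sketch of why disabling is safer than removing. The MenuItem struct, the labels and both helper functions are hypothetical and only model the index shift; they are not the game's actual menu code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical menu model, for illustration only.
struct MenuItem {
    std::string label;
    bool enabled;
};

// Hypothetical three-item menu; the labels are made up.
std::vector<MenuItem> makeDemoMenu() {
    return { {"Play", true}, {"Songs", true}, {"Quit", true} };
}

// Disabling keeps every item at its original index.
void disableItem(std::vector<MenuItem> &menu, std::size_t index) {
    if (index < menu.size())
        menu[index].enabled = false;
}

// Removing moves every later item down by one, so code that addresses
// items by a fixed index would now reach the wrong entry.
void removeItem(std::vector<MenuItem> &menu, std::size_t index) {
    if (index < menu.size())
        menu.erase(menu.begin() + static_cast<std::ptrdiff_t>(index));
}
```

After disableItem(menu, 1) the last item is still at index 2; after removeItem(menu, 1) it has moved to index 1.
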
Here I moved the SIMD paths to their own translation units and removed
their unnecessary header files. I also reduced the number of
translation units that have template forward declarations.
surface.cpp now chooses at runtime which SIMD path it
should take.
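
A rough sketch of runtime path selection. Every name here (CpuFeatures, SimdPath, choosePath, blitFor and the blit functions) is hypothetical, and the three blit bodies are plain copies standing in for the real per-extension code, so this is not the code in surface.cpp.

```cpp
#include <cstdint>

// Hypothetical feature flags. A real build would fill these from a
// platform query instead of hard-coding them.
struct CpuFeatures {
    bool sse2;
    bool neon;
};

enum class SimdPath { Generic, SSE2, NEON };

using BlitFn = void (*)(const std::uint32_t *src, std::uint32_t *dst, int count);

// Stand-ins for functions that would each live in their own translation
// unit, compiled with the matching instruction-set flags.
void blitGeneric(const std::uint32_t *src, std::uint32_t *dst, int count) {
    for (int i = 0; i < count; ++i)
        dst[i] = src[i];
}
void blitSSE2(const std::uint32_t *src, std::uint32_t *dst, int count) {
    blitGeneric(src, dst, count); // placeholder body
}
void blitNEON(const std::uint32_t *src, std::uint32_t *dst, int count) {
    blitGeneric(src, dst, count); // placeholder body
}

// Pick the best available path, falling back to the generic one.
SimdPath choosePath(const CpuFeatures &cpu) {
    if (cpu.sse2)
        return SimdPath::SSE2;
    if (cpu.neon)
        return SimdPath::NEON;
    return SimdPath::Generic;
}

// Map the chosen path to the function that implements it.
BlitFn blitFor(SimdPath path) {
    switch (path) {
    case SimdPath::SSE2:
        return blitSSE2;
    case SimdPath::NEON:
        return blitNEON;
    default:
        return blitGeneric;
    }
}
```

The caller resolves the function pointer once and reuses it for every blit, so the feature check stays out of the inner loop.
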
I'm taking GSoC in a slightly different direction. I will finish the
PowerPC blending/blitting optimizations, but first I'm going to focus
on the general Graphics::Surface and Graphics::ManagedSurface code.
PowerPC's <altivec.h> header redefines bool to be __vector(4) __bool which
is weird, so I changed the prototypes of the functions to use int instead
of bool. Hopefully this fixes things.
Not all Arm NEON intrinsics are included in the iOS simulator's
arm_neon.h file, so we no longer compile Arm NEON code for the
simulator. Also, arm_neon.h on Windows seems to be either an empty header
or at least a header with only a few of the many intrinsics that should
be there.
Made it so that iOS doesn't use Arm NEON since it only supports a very
limited set of instructions (it apparently doesn't even have intrinsics
for something as simple as bit shifting). I also changed every
floating-point literal in surface_simd_sse from a double literal to a
float literal because the Windows x64 build was complaining about it.
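
For reference, a small sketch of the literal change. The function halve is hypothetical. An unsuffixed literal such as 0.5 has type double, so mixing it into float arithmetic promotes the expression to double and then narrows the result back to float, which some compilers warn about.

```cpp
#include <type_traits>

// An unsuffixed floating-point literal is a double; the f suffix makes it a float.
static_assert(std::is_same<decltype(0.5), double>::value, "0.5 is a double literal");
static_assert(std::is_same<decltype(0.5f), float>::value, "0.5f is a float literal");

// Hypothetical helper: with 0.5f the whole expression stays in float,
// so nothing is narrowed on return.
float halve(float x) {
    return x * 0.5f;
}
```
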
I was using only the GCC and Clang macros to detect what platform ScummVM
was being compiled for, and neglected the MSVC ones, so it failed to
compile with MSVC. I fixed that by adding them. I also
added the fallback SIMD implementation .cpp file to module.mk for the
AGS engine.
Finished writing the code in surface_simd_sse.cpp. I also added a fallback
in case no processor SIMD extensions are found. In that case it
just defaults to the normal drawInnerGeneric. I also made
drawInnerGeneric a bit faster by moving certain decisions to compile
time. Tests were changed to also include SSE2.
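
One common way to move a decision to compile time is a bool template parameter. The sketch below assumes that technique; copyRow and copyRowDispatch are hypothetical and much simpler than the real drawInnerGeneric.

```cpp
#include <cstdint>

// SkipTransparent is a compile-time constant, so the transparency test is
// either compiled into the loop or dropped entirely for each instantiation.
template<bool SkipTransparent>
void copyRow(const std::uint32_t *src, std::uint32_t *dst, int count,
             std::uint32_t transColor) {
    for (int i = 0; i < count; ++i) {
        if (SkipTransparent && src[i] == transColor)
            continue;
        dst[i] = src[i];
    }
}

// The runtime flag is tested once per row instead of once per pixel.
void copyRowDispatch(const std::uint32_t *src, std::uint32_t *dst, int count,
                     std::uint32_t transColor, bool skipTransparent) {
    if (skipTransparent)
        copyRow<true>(src, dst, count, transColor);
    else
        copyRow<false>(src, dst, count, transColor);
}
```
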
Added a template specialization for 2bpp to 2bpp blits in
BITMAP::drawInner, which makes 2bpp to 2bpp blitting around 2 times as
fast as normal 4bpp to 4bpp blitting.
Optimized most if not all code paths in BITMAP::draw. All blending modes
have been optimized with ARM NEON intrinsics, and multiple different
source and destination formats are optimized. (The following source and
destination bytes-per-pixel pairs have been optimized: 1 and 1, 2 and 2,
4 and 4, 2 and 4.)
After this, I am going to clean up this code and apply more optimizations
where I can, then make the SSE versions of the functions, and try to
optimize the slow path as much as I can. Then I will see what I can do
with BITMAP::stretched draw.