TransparentSurface now scales in place instead of making a copy. This
is much faster than before.
Also, BlendBlit::blit now takes a scale offset parameter to help with
very large images being cropped; callers that don't need it can leave
it at 0.
I optimized the NEON and Generic paths for ManagedSurface::blendBlitFrom
and the new TransparentSurface::blit. On ARM, the new blit function now
matches the speed of the old blit function, even with the added
indirections that the runtime extension detection code introduces.
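
For context, the indirection mentioned above is roughly of this shape. This
is only a minimal sketch, not the actual code: BlitFunc, blitGeneric,
blitSimd, cpuHasSimd and this free-standing blit are hypothetical names, the
"SIMD" path is a plain unrolled copy standing in for real NEON code, and the
detection here is a compile-time stand-in for a real runtime CPU query.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical signature; not the real BlendBlit API.
typedef void (*BlitFunc)(uint32_t *dst, const uint32_t *src, size_t count);

// Portable fallback: a plain copy standing in for a real blend.
static void blitGeneric(uint32_t *dst, const uint32_t *src, size_t count) {
    for (size_t i = 0; i < count; ++i)
        dst[i] = src[i];
}

// Stand-in for a vectorized path: handles 4 pixels per iteration,
// then finishes the remainder one pixel at a time.
static void blitSimd(uint32_t *dst, const uint32_t *src, size_t count) {
    size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < count; ++i)
        dst[i] = src[i];
}

// Assumption: a real build would query the CPU at runtime here.
static bool cpuHasSimd() {
#if defined(__ARM_NEON)
    return true;
#else
    return false;
#endif
}

// Chosen once on first use; every later call goes through this pointer,
// which is the extra indirection.
static BlitFunc blitFunc = nullptr;

void blit(uint32_t *dst, const uint32_t *src, size_t count) {
    if (!blitFunc)
        blitFunc = cpuHasSimd() ? blitSimd : blitGeneric;
    blitFunc(dst, src, count);
}
```

The cost per call is one pointer load and an indirect call, which is why it
only matters for small blits.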
I also added a benchmark for this code. You can build it with this
command:
CFLAGS="-DTEST_BLEND_SPEED" make test
I reverted the Wii build so it no longer uses AltiVec, since the Wii
doesn't support it.
I also removed graphics/blit-neon.cpp from graphics/module.mk. Simply
including the .cpp file in graphics/blit-alpha.cpp was a better option
because I no longer need to explicitly instantiate every version of the
templates that I use.
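
To illustrate the trade-off, here is a small sketch. blitPixel and its
template parameters are made up for the example and are not the real
kernels. When a template is defined in a separately compiled .cpp file,
every combination used elsewhere has to be instantiated explicitly; when the
definition is visible at the call site (for example through an #include),
the compiler instantiates what is used on its own.

```cpp
#include <cstdint>

// Hypothetical kernel with compile-time options (not the real code).
template<bool doScale, bool flipX>
uint32_t blitPixel(uint32_t src) {
    uint32_t out = src;
    if (doScale)
        out >>= 1; // placeholder for scaling work
    if (flipX)
        out = ~out; // placeholder for flipping work
    return out;
}

// Separate translation unit: each variant must be listed by hand.
template uint32_t blitPixel<false, false>(uint32_t);
template uint32_t blitPixel<false, true>(uint32_t);
template uint32_t blitPixel<true, false>(uint32_t);
template uint32_t blitPixel<true, true>(uint32_t);

// Definition visible at the call site: this call alone is enough to
// instantiate blitPixel<true, false>, with no list to maintain.
uint32_t useBlit(uint32_t p) {
    return blitPixel<true, false>(p);
}
```

With more template parameters the explicit list grows quickly, which is what
including the file avoids.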