This takes the bilinear interpolation code from Skia. It
uses 4 bits of precision instead of 8. This lets it interpolate
two components at a time because the results fit in 16 bits.
The lower precision code is only used in the fallback code
and not in any of the specialized code for NEON. This means
pixman gives different results depending on the cpu which isn't
great. However, this was easiest and the NEON code doesn't
gain as much from using lower precision.
Skia actually uses even lower interpolation when working with
565 but that's harder to plug in right now, and this gives
a reasonable improvement.