However, disable it for now, as it's slightly slower
than SSE1 for the small number of taps we're using.
In testing, it's 10-20% faster when the number of taps is increased.
The AVX path might need some more tuning, but it's fair to
assume the algorithm is memory bound.