Switch from Euclid's algorithm to Stein's algorithm for computing GCD. This
avoids the (expensive) APInt division operation in favour of bit operations.
Remove all memory allocation from within the GCD loop by tweaking our `lshr`
implementation so it can operate in-place.
Differential Revision: https://reviews.llvm.org/D31968
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@300252 91177308-0d34-0410-b5e6-96231b3b80d8