mp_words are used only on machines that support long long arithmetic.
s_mp_mod_d() was deleted. It was not being used and was not part of the
public API. The code that computes squares in s_mp_sqr was broken out
into a separate new function s_mpv_sqr_add_prop(), which is a target for
assembly language optimization. A new function, s_mpv_div_2dx1d(), was
added; it is also a target for assembly optimization. These changes
reduced the x86 benchmark time from 22.5 seconds to 8.3 seconds on my
reference test system.
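
As a rough illustration of the kind of work these two helpers do, here
is a minimal portable C sketch. It is not the library's code: the names
sketch_div_2dx1d and sketch_sqr_add_prop, the digit_t and word_t types,
the 32-bit digit size, and the argument lists are all assumptions made
for this example. The first function divides a two-digit value by one
digit using a double-width type; the second adds the square of each
digit into a product vector and carries between digits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the library's digit and double-width word types. */
typedef uint32_t digit_t;
typedef uint64_t word_t;

#define DIGIT_BITS 32
#define DIGIT_MASK 0xFFFFFFFFu

/* Hypothetical sketch: divide the two-digit value (hi:lo) by one digit.
 * Requires hi < divisor so that the quotient fits in a single digit. */
static void sketch_div_2dx1d(digit_t hi, digit_t lo, digit_t divisor,
                             digit_t *q, digit_t *r)
{
    assert(divisor != 0 && hi < divisor);
    word_t n = ((word_t)hi << DIGIT_BITS) | lo;
    *q = (digit_t)(n / divisor);
    *r = (digit_t)(n % divisor);
}

/* Hypothetical sketch: for each digit a[i], add a[i]*a[i] into
 * ps[2*i] and ps[2*i + 1], carrying into the next digit pair.
 * ps must hold 2*len digits. A carry out of the last digit is
 * discarded here; it is zero whenever the full square fits. */
static void sketch_sqr_add_prop(const digit_t *a, size_t len, digit_t *ps)
{
    word_t carry = 0;
    for (size_t i = 0; i < len; i++) {
        word_t sq = (word_t)a[i] * a[i];
        word_t lo = (sq & DIGIT_MASK) + ps[2 * i] + carry;
        ps[2 * i] = (digit_t)lo;
        word_t hi = (sq >> DIGIT_BITS) + ps[2 * i + 1] + (lo >> DIGIT_BITS);
        ps[2 * i + 1] = (digit_t)hi;
        carry = hi >> DIGIT_BITS;
    }
}
```

In a full squaring routine, ps would already hold the doubled
cross-products when the second function is called, so adding the
per-digit squares completes the result. Loops of this shape are
natural candidates for hand-written assembly because a single
multiply or divide instruction can produce or consume a double-width
value directly.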