Current GCC versions know how to generate these instructions
properly, and avoiding inline asm gives better code. The MULH
function for ARMv5 uses the same instruction and is also not
needed any more.

The MLS64 macro remains, since negating an input is normally
not allowed: it would overflow for INT_MIN. In our uses, the
inputs never have this value, so negating is safe.
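
As an illustration only, the trade-off can be sketched in plain C.
The helper names below are hypothetical and are not the macros from
the tree; the sketch assumes a 32-bit int.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not the macros from the tree. */

/* d + a*b, with the product formed in 64 bits. */
static inline int64_t mac64_sketch(int64_t d, int a, int b)
{
    return d + (int64_t)a * b;
}

/* d - a*b computed directly; no input is negated, so a == INT_MIN is fine. */
static inline int64_t mls64_direct(int64_t d, int a, int b)
{
    return d - (int64_t)a * b;
}

/*
 * d - a*b expressed as a multiply-accumulate with a negated input.
 * -a is not representable in int when a == INT_MIN, so the caller
 * must guarantee that value never occurs.
 */
static inline int64_t mls64_negate(int64_t d, int a, int b)
{
    assert(a != INT_MIN);
    return mac64_sketch(d, -a, b);
}
```

Both variants agree whenever the negated input is representable;
only the direct form is valid for every int input, which is why a
compiler cannot apply the negation on its own.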

Signed-off-by: Mans Rullgard <mans@mansr.com>

Prior to ARMv6, the destination registers of the SMULL instruction
must be distinct from the first source register. Marking the
output early-clobber ensures it is allocated unique registers.
This restriction is dropped in ARMv6 and later, so allowing overlap
between input and output registers there might give better code.
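
A minimal sketch of the constraint, assuming GCC-style extended asm.
The function name is hypothetical and this is not the code from the
tree; on targets other than ARM state (including Thumb builds) the
sketch falls back to plain C so that it still compiles.

```c
#include <stdint.h>

/* Hypothetical 32x32->64 signed multiply, for illustration only. */
static inline int64_t mull64_sketch(int a, int b)
{
#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
    int lo, hi;
    /*
     * "=&r" marks each output as early-clobber, so the register
     * allocator never assigns lo or hi the same register as an input.
     */
    __asm__ ("smull %0, %1, %2, %3"
             : "=&r"(lo), "=&r"(hi)
             : "r"(a), "r"(b));
    /* Combine the halves through unsigned types to avoid shifting a negative value. */
    return (int64_t)(((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo);
#else
    /* Portable fallback: let the compiler pick the instruction. */
    return (int64_t)a * b;
#endif
}
```

With plain "=r" outputs the allocator would be free to reuse an
input register for lo or hi, which is the overlap that pre-ARMv6
cores do not permit.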

Signed-off-by: Mans Rullgard <mans@mansr.com>