ARMv7 NEON VQRDMULH instruction implementation
VQRDMULH: Vector Saturating Rounding Doubling Multiply Returning High Half. VQRDMULH multiplies corresponding elements in two vectors, doubles the results, and places the most significant half of the final results in the destination vector.

Reference implementation: https://github.com/google/gemmlowp/blob/master/fixedpoint/fixedpoint.h#L329

<code>
// This function implements the same computation as the ARMv7 NEON VQRDMULH
// instruction.
template <>
inline std::int32_t SaturatingRoundingDoublingHighMul(std::int32_t a,
                                                      std::int32_t b) {
  bool overflow = a == b && a == std::numeric_limits<std::int32_t>::min();
  std::int64_t a_64(a);
  std::int64_t b_64(b);
  std::int64_t ab_64 = a_64 * b_64;
  std::int32_t nudge = ab_64 >= 0 ? (1 << 30) : (1 - (1 << 30));
  std::int32_t ab_x2_high32 =
      static_cast<std::int32_t>((ab_64 + nudge) / (1ll << 31));
  return overflow ? std::numeric_limits...