2015 Nov 20
2
[Aarch64 00/11] Patches to enable Aarch64
> On Nov 19, 2015, at 5:47 PM, John Ridges <jridges at masque.com> wrote:
>
> Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT)))
Yes, you're right. I forgot to run the vectors under qemu with my previous version (oh, the embarrassment!). Fix forthcoming.
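As a rough illustration of the saturating conversion suggested above, here is a minimal sketch. The function name `sig_to_word16_sat` and the shift value of 12 are assumptions made for this example, not taken from the patch; the rounding shift is written out by hand rather than using the Opus `PSHR32` macro. On AArch64 it uses `vqmovns_s32`, and elsewhere it falls back to a plain clamp so the sketch still builds.

```c
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Assumed shift for this sketch only; the real SIG_SHIFT comes from Opus. */
#define SKETCH_SIG_SHIFT 12

/* Hypothetical helper: rounding right shift, then saturate to 16 bits.
 * Like a plain rounding shift, it can overflow for x very close to INT32_MAX.
 * Right-shifting a negative value is assumed to be arithmetic (GCC/Clang). */
static int16_t sig_to_word16_sat(int32_t x)
{
    int32_t r = (x + (1 << (SKETCH_SIG_SHIFT - 1))) >> SKETCH_SIG_SHIFT;
#if defined(__aarch64__)
    /* Scalar saturating narrow, 32 -> 16 bits. */
    return vqmovns_s32(r);
#else
    /* Portable fallback: clamp to the int16_t range. */
    if (r > INT16_MAX) return INT16_MAX;
    if (r < INT16_MIN) return INT16_MIN;
    return (int16_t)r;
#endif
}
```

Without the saturating narrow, a value outside the 16-bit range would wrap instead of clipping, which is presumably why the saturating form was recommended.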
2015 Nov 23
1
[Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.
...et_high_s32(coef0));
you could use:
int64x2_t b1 = vmlal_high_s32(b0, a0, coef0);
and instead of:
int64x1_t c = vadd_s64(vget_low_s64(b3), vget_high_s64(b3));
int64x1_t cS = vshr_n_s64(c, 16);
int32x2_t d = vreinterpret_s32_s64(cS);
out = vget_lane_s32(d, 0);
you could use:
out = (opus_int32)(vaddvq_s64(b3) >> 16);
I understand that ARM added these intrinsics because "vget_high_xxx" generates an instruction in ARM64, and isn't just free the way it was in ARMv7 ("vget_low_xxx" is of course still free on both platforms).
Other than the one-intrinsic optimizations, I'd...
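The reduction suggested above can be sketched side by side as follows. The function names `reduce_long` and `reduce_short` are hypothetical, and the array-based interface is only there to make the sketch self-contained; on non-AArch64 targets both fall back to the same scalar expression. The sketch assumes a little-endian target (so lane 0 of the reinterpreted vector holds the low 32 bits) and an arithmetic right shift for negative values.

```c
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Four-intrinsic form: add the two halves, shift, reinterpret, extract. */
static int32_t reduce_long(const int64_t lanes[2])
{
#if defined(__aarch64__)
    int64x2_t b3 = vld1q_s64(lanes);
    int64x1_t c  = vadd_s64(vget_low_s64(b3), vget_high_s64(b3));
    int64x1_t cS = vshr_n_s64(c, 16);
    int32x2_t d  = vreinterpret_s32_s64(cS);
    return vget_lane_s32(d, 0);
#else
    /* Scalar fallback for targets without AArch64 NEON. */
    return (int32_t)((lanes[0] + lanes[1]) >> 16);
#endif
}

/* Single across-vector add, then a scalar shift and truncating cast. */
static int32_t reduce_short(const int64_t lanes[2])
{
#if defined(__aarch64__)
    int64x2_t b3 = vld1q_s64(lanes);
    return (int32_t)(vaddvq_s64(b3) >> 16);
#else
    return (int32_t)((lanes[0] + lanes[1]) >> 16);
#endif
}
```

Both forms keep only the low 32 bits of the shifted sum, so they should agree for any input.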
2015 Nov 23
0
[Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.
...et_high_s32(coef0));
you could use:
int64x2_t b1 = vmlal_high_s32(b0, a0, coef0);
and instead of:
int64x1_t c = vadd_s64(vget_low_s64(b3), vget_high_s64(b3));
int64x1_t cS = vshr_n_s64(c, 16);
int32x2_t d = vreinterpret_s32_s64(cS);
out = vget_lane_s32(d, 0);
you could use:
out = (opus_int32)(vaddvq_s64(b3) >> 16);
I understand that ARM added these intrinsics because "vget_high_xxx"
generates an instruction in ARM64, and isn't just free the way it was in
ARMv7 ("vget_low_xxx" is of course still free on both platforms).
Regards,
John Ridges
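For reference, the widening multiply-accumulate substitution mentioned in this message can be sketched as below. The helper name `mlal_high` and the array interface are hypothetical. The sketch relies on `vmlal_high_s32(acc, a, b)` multiplying the upper two 32-bit lanes of `a` and `b`, widening the products to 64 bits, and adding them to `acc`; a scalar fallback is included for non-AArch64 builds.

```c
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: acc[i] += (int64_t)a[2 + i] * b[2 + i], for i = 0, 1. */
static void mlal_high(int64_t acc[2], const int32_t a[4], const int32_t b[4])
{
#if defined(__aarch64__)
    int64x2_t r = vmlal_high_s32(vld1q_s64(acc), vld1q_s32(a), vld1q_s32(b));
    vst1q_s64(acc, r);
#else
    /* Scalar fallback: same arithmetic on the upper two lanes. */
    acc[0] += (int64_t)a[2] * b[2];
    acc[1] += (int64_t)a[3] * b[3];
#endif
}
```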