thr3ads.net - search: "vget_low_xxx"

Displaying 3 results from an estimated 3 matches for "vget_low_xxx".

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 20


> On Nov 19, 2015, at 5:47 PM, John Ridges <jridges at masque.com> wrote:
>
> Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT)))

Yes, you're right. I forgot to run the vectors under qemu with my previous version (oh, the embarrassment!) Fixed forthcoming

[Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.

2015 Nov 23


...t32x2_t d = vreinterpret_s32_s64(cS); out = vget_lane_s32(d, 0); you could use: out = (opus_int32)(vaddvq_s64(b3) >> 16); I understand that ARM added these intrinsics because "vget_high_xxx" generates an instruction in ARM64, and isn't just free the way it was in ARMv7 ("vget_low_xxx" is of course still free on both platforms). Other than the one-intrinsic optimizations, I'd rather keep the Neon intrinsics code compilable on ARMv7 as well as ARM64 - the Neon code is a performance boost for both platforms, and I'd rather not litter it with #ifdef's unless there's a large d...

[Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.

2015 Nov 23


...2x2_t d = vreinterpret_s32_s64(cS); out = vget_lane_s32(d, 0); you could use: out = (opus_int32)(vaddvq_s64(b3) >> 16); I understand that ARM added these intrinsics because "vget_high_xxx" generates an instruction in ARM64, and isn't just free the way it was in ARMv7 ("vget_low_xxx" is of course still free on both platforms). Regards, John Ridges
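For readers following the suggestion above, here is a rough sketch of the two forms side by side. The excerpt is truncated, so the steps before "vreinterpret_s32_s64(cS)" are an assumption (add the low and high halves, then shift right by 16), and the helper name sum_lanes_shr16 is hypothetical, not something from the patch. A scalar fallback is included so the sketch also builds on non-ARM hosts.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the patch): add the two 64-bit lanes,
 * shift the sum right by 16, and return the low 32 bits. */
static int32_t sum_lanes_shr16(const int64_t v[2])
{
#if defined(__aarch64__)
    /* AArch64 only: vaddvq_s64 reduces across both lanes directly,
     * so no vget_high_s64 is needed. */
    int64x2_t b3 = vld1q_s64(v);
    return (int32_t)(vaddvq_s64(b3) >> 16);
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* ARMv7 Neon: assumed shape of the longer sequence in the quoted code. */
    int64x2_t b3 = vld1q_s64(v);
    int64x1_t c  = vadd_s64(vget_low_s64(b3), vget_high_s64(b3));
    int64x1_t cS = vshr_n_s64(c, 16);
    int32x2_t d  = vreinterpret_s32_s64(cS);
    return vget_lane_s32(d, 0);
#else
    /* Scalar reference for hosts without Neon. */
    return (int32_t)((v[0] + v[1]) >> 16);
#endif
}
```

Calling it with the lanes { 3 * 65536, 4 * 65536 } should give 7 on every path. The #if split here is exactly the kind of per-platform branching the reply would rather avoid for anything larger than a one-intrinsic change.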
