Displaying 3 results from an estimated 3 matches for "vmovl_high_s16".
2015 Nov 20
2
[Aarch64 00/11] Patches to enable Aarch64
> On Nov 19, 2015, at 5:47 PM, John Ridges <jridges at masque.com> wrote:
>
> Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT)))
Yes, you're right. I forgot to run the vectors under qemu with my previous version (oh, the embarrassment!) Fixed forthcoming
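To make the suggested SIG2WORD16 definition concrete, here is a minimal sketch of a rounding shift followed by a saturating narrow to 16 bits. The names sketch_pshr32, sketch_sat16, sketch_sig2word16 and SKETCH_SIG_SHIFT are hypothetical stand-ins, not the Opus macros themselves, and the shift value of 12 is an assumption. On AArch64 the narrowing step uses vqmovns_s32; elsewhere a scalar clamp models the same behaviour so the sketch can be checked on any host.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Assumed value for illustration; the real SIG_SHIFT comes from the codec headers. */
#define SKETCH_SIG_SHIFT 12

/* Rounding right shift in the spirit of PSHR32: add half an LSB, then shift.
 * The sum is done in 64 bits so inputs near INT32_MAX cannot overflow.
 * Right-shifting a negative value is assumed to be an arithmetic shift,
 * which holds on the common compilers. */
static int32_t sketch_pshr32(int32_t x, int shift)
{
    int64_t half = ((int64_t)1 << shift) >> 1;
    return (int32_t)(((int64_t)x + half) >> shift);
}

/* Saturating narrow from 32 to 16 bits. */
static int16_t sketch_sat16(int32_t x)
{
#if defined(__aarch64__)
    return vqmovns_s32(x);              /* scalar saturating narrow */
#else
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
#endif
}

/* Shift first, then saturate, matching the order in the suggested macro. */
static int16_t sketch_sig2word16(int32_t x)
{
    return sketch_sat16(sketch_pshr32(x, SKETCH_SIG_SHIFT));
}
```

The point of the saturating form is that out-of-range values clamp to the int16 limits instead of wrapping, which a plain truncating narrow would not do.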
2015 Nov 23
1
[Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.
...> wrote:
Hi Jonathan.
I really, really hate to bring this up this late in the game, but I just noticed that your NEON code doesn't use any of the "high" intrinsics for ARM64, e.g. instead of:
int32x4_t coef1 = vmovl_s16(vget_high_s16(coef16));
you could use:
int32x4_t coef1 = vmovl_high_s16(coef16);
and instead of:
int64x2_t b1 = vmlal_s32(b0, vget_high_s32(a0), vget_high_s32(coef0));
you could use:
int64x2_t b1 = vmlal_high_s32(b0, a0, coef0);
and instead of:
int64x1_t c = vadd_s64(vget_low_s64(b3), vget_high_s64(b3));
int64x1_t cS = vshr_n_s64(c, 16);
int32x2_t d = vreinterpre...
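The first two substitutions suggested above can be sketched as small helpers. This is an illustration under stated assumptions, not code from the patch series: sketch_widen_high and sketch_mlal_high are hypothetical names, and the scalar fallback is there only so the lane arithmetic can be checked on a non-AArch64 host. On AArch64 each helper uses the single "high" intrinsic, with the two-step equivalent noted in a comment.

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Sign-extend the upper four of eight int16 lanes to int32. */
static void sketch_widen_high(const int16_t in[8], int32_t out[4])
{
#if defined(__aarch64__)
    int16x8_t v = vld1q_s16(in);
    /* Equivalent two-step form: vmovl_s16(vget_high_s16(v)) */
    vst1q_s32(out, vmovl_high_s16(v));
#else
    for (int i = 0; i < 4; i++)
        out[i] = in[4 + i];
#endif
}

/* acc[i] += (int64_t)a[2 + i] * b[2 + i] for i = 0, 1:
 * a widening multiply-accumulate on the upper halves of a and b. */
static void sketch_mlal_high(int64_t acc[2], const int32_t a[4],
                             const int32_t b[4])
{
#if defined(__aarch64__)
    int64x2_t r = vld1q_s64(acc);
    /* Equivalent two-step form:
     * vmlal_s32(r, vget_high_s32(va), vget_high_s32(vb)) */
    r = vmlal_high_s32(r, vld1q_s32(a), vld1q_s32(b));
    vst1q_s64(acc, r);
#else
    for (int i = 0; i < 2; i++)
        acc[i] += (int64_t)a[2 + i] * b[2 + i];
#endif
}
```

Both forms compute the same lanes; the "high" variants simply let the operation read the upper half of the 128-bit register directly rather than extracting it first.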
2015 Nov 23
0
[Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.
Hi Jonathan.
I really, really hate to bring this up this late in the game, but I just
noticed that your NEON code doesn't use any of the "high" intrinsics for
ARM64, e.g. instead of:
int32x4_t coef1 = vmovl_s16(vget_high_s16(coef16));
you could use:
int32x4_t coef1 = vmovl_high_s16(coef16);
and instead of:
int64x2_t b1 = vmlal_s32(b0, vget_high_s32(a0), vget_high_s32(coef0));
you could use:
int64x2_t b1 = vmlal_high_s32(b0, a0, coef0);
and instead of:
int64x1_t c = vadd_s64(vget_low_s64(b3), vget_high_s64(b3));
int64x1_t cS = vshr_n_s64(c, 16);
int32x2_t d = vreinterpre...