search for: vmovl_high_s16

Displaying 3 results from an estimated 3 matches for "vmovl_high_s16".

2015 Nov 20
2
[Aarch64 00/11] Patches to enable Aarch64
> On Nov 19, 2015, at 5:47 PM, John Ridges <jridges at masque.com> wrote: > > Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT))) Yes, you?re right. I forgot to run the vectors under qemu with my previous version (oh, the embarrassment!) Fixed forthcoming
2015 Nov 23
1
[Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.
...;> wrote: Hi Jonathan. I really, really hate to bring this up this late in the game, but I just noticed that your NEON code doesn't use any of the "high" intrinsics for ARM64, e.g. instead of: int32x4_t coef1 = vmovl_s16(vget_high_s16(coef16)); you could use: int32x4_t coef1 = vmovl_high_s16(coef16); and instead of: int64x2_t b1 = vmlal_s32(b0, vget_high_s32(a0), vget_high_s32(coef0)); you could use: int64x2_t b1 = vmlal_high_s32(b0, a0, coef0); and instead of: int64x1_t c = vadd_s64(vget_low_s64(b3), vget_high_s64(b3)); int64x1_t cS = vshr_n_s64(c, 16); int32x2_t d = vreinterpre...
2015 Nov 23
0
[Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.
Hi Jonathan. I really, really hate to bring this up this late in the game, but I just noticed that your NEON code doesn't use any of the "high" intrinsics for ARM64, e.g. instead of: int32x4_t coef1 = vmovl_s16(vget_high_s16(coef16)); you could use: int32x4_t coef1 = vmovl_high_s16(coef16); and instead of: int64x2_t b1 = vmlal_s32(b0, vget_high_s32(a0), vget_high_s32(coef0)); you could use: int64x2_t b1 = vmlal_high_s32(b0, a0, coef0); and instead of: int64x1_t c = vadd_s64(vget_low_s64(b3), vget_high_s64(b3)); int64x1_t cS = vshr_n_s64(c, 16); int32x2_t d = vreinterpre...