search for: vmlal_high_s32

Displaying 3 results from an estimated 3 matches for "vmlal_high_s32".

2015 Nov 20
2
[Aarch64 00/11] Patches to enable Aarch64
> On Nov 19, 2015, at 5:47 PM, John Ridges <jridges at masque.com> wrote: > > Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT))) Yes, you?re right. I forgot to run the vectors under qemu with my previous version (oh, the embarrassment!) Fixed forthcoming
2015 Nov 23
1
[Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.
...any of the "high" intrinsics for ARM64, e.g. instead of: int32x4_t coef1 = vmovl_s16(vget_high_s16(coef16)); you could use: int32x4_t coef1 = vmovl_high_s16(coef16); and instead of: int64x2_t b1 = vmlal_s32(b0, vget_high_s32(a0), vget_high_s32(coef0)); you could use: int64x2_t b1 = vmlal_high_s32(b0, a0, coef0); and instead of: int64x1_t c = vadd_s64(vget_low_s64(b3), vget_high_s64(b3)); int64x1_t cS = vshr_n_s64(c, 16); int32x2_t d = vreinterpret_s32_s64(cS); out = vget_lane_s32(d, 0); you could use: out = (opus_int32)(vaddvq_s64(b3) >> 16); I understand that ARM added these int...
2015 Nov 23
0
[Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.
...ny of the "high" intrinsics for ARM64, e.g. instead of: int32x4_t coef1 = vmovl_s16(vget_high_s16(coef16)); you could use: int32x4_t coef1 = vmovl_high_s16(coef16); and instead of: int64x2_t b1 = vmlal_s32(b0, vget_high_s32(a0), vget_high_s32(coef0)); you could use: int64x2_t b1 = vmlal_high_s32(b0, a0, coef0); and instead of: int64x1_t c = vadd_s64(vget_low_s64(b3), vget_high_s64(b3)); int64x1_t cS = vshr_n_s64(c, 16); int32x2_t d = vreinterpret_s32_s64(cS); out = vget_lane_s32(d, 0); you could use: out = (opus_int32)(vaddvq_s64(b3) >> 16); I understand that ARM added these int...