search for: t_s32x2

Displaying 6 results from an estimated 6 matches for "t_s32x2".

2017 Apr 26
2
2 patches related to silk_biquad_alt() optimization
...NEON kernels which use vqrdmulh_lane_s32() to do the multiplication and rounding, where A_Q28_s32x{2,4} stores doubled -A_Q28[]: static inline void silk_biquad_alt_stride1_kernel(const int32x2_t A_Q28_s32x2, const int32x4_t t_s32x4, int32x2_t *S_s32x2, int32x2_t *out32_Q14_s32x2) { int32x2_t t_s32x2; *out32_Q14_s32x2 = vadd_s32(*S_s32x2, vget_low_s32(t_s32x4)); /* silk_SMLAWB( S[ 0 ], B_Q28[ 0 ], in[ k ] ) */ *S_s32x2 = vreinterpret_s32_u64(vshr_n_u64(vreinterpret_u64_s32(*S_s32x2), 32)); /* S[ 0 ] = S[ 1 ]; S[ 1 ] = 0;...
2017 May 15
2
2 patches related to silk_biquad_alt() optimization
...iplication and rounding, where A_Q28_s32x{2,4} stores doubled
> -A_Q28[]:
>
> static inline void silk_biquad_alt_stride1_kernel(const int32x2_t
> A_Q28_s32x2, const int32x4_t t_s32x4, int32x2_t *S_s32x2, int32x2_t
> *out32_Q14_s32x2)
> {
> int32x2_t t_s32x2;
>
> *out32_Q14_s32x2 = vadd_s32(*S_s32x2, vget_low_s32(t_s32x4));
> /* silk_SMLAWB( S[ 0 ], B_Q28[ 0 ], in[ k ]
> ) */
> *S_s32x2 =
> vreinterpret_s32_u64(vshr_n_u64(vreinterpret_u64_s3...
2017 May 08
0
2 patches related to silk_biquad_alt() optimization
...lane_s32() to do the
> multiplication and rounding, where A_Q28_s32x{2,4} stores doubled -A_Q28[]:
>
> static inline void silk_biquad_alt_stride1_kernel(const int32x2_t
> A_Q28_s32x2, const int32x4_t t_s32x4, int32x2_t *S_s32x2, int32x2_t
> *out32_Q14_s32x2)
> {
> int32x2_t t_s32x2;
>
> *out32_Q14_s32x2 = vadd_s32(*S_s32x2, vget_low_s32(t_s32x4));
> /* silk_SMLAWB( S[ 0 ], B_Q28[ 0 ], in[ k ] )
> */
> *S_s32x2 = vreinterpret_s32_u64(vshr_n_u64(vreinterpret_u64_s32(*S_s32x2), 32)); /* S[ 0 ] = S[ 1 ];...
2017 May 17
0
2 patches related to silk_biquad_alt() optimization
...28_s32x{2,4} stores doubled
> > -A_Q28[]:
> >
> > static inline void silk_biquad_alt_stride1_kernel(const int32x2_t
> > A_Q28_s32x2, const int32x4_t t_s32x4, int32x2_t *S_s32x2, int32x2_t
> > *out32_Q14_s32x2)
> > {
> > int32x2_t t_s32x2;
> >
> > *out32_Q14_s32x2 = vadd_s32(*S_s32x2, vget_low_s32(t_s32x4));
> > /* silk_SMLAWB( S[ 0 ], B_Q28[ 0 ], in[ k ]
> > ) */
> > *S_s32x2 =
> > vreinterpret_s32_u64(vsh...
2017 Apr 25
2
2 patches related to silk_biquad_alt() optimization
On Mon, Apr 24, 2017 at 5:52 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> On 24/04/17 08:03 PM, Linfeng Zhang wrote:
> > Tested on my chromebook, when stride (channel) == 1, the optimization
> > has no gain compared with C function.
>
> You mean that the Neon code is the same speed as the C code for
> stride==1? This is not terribly surprising for an IIRC
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches to the current master with minor changes. Patches 1 to 3 replace all my previously submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang
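The kernel quoted in the silk_biquad_alt() results above relies on two NEON idioms that are hard to read in flattened snippets: vqrdmulh (a saturating, rounding, doubling multiply that returns the high half) and a 64-bit right shift used to slide the two-element state vector. The following is a minimal scalar sketch in plain C of what each does. The names qrdmulh_s32_model, shift_state_model and check_models are hypothetical and do not come from the patch, the sketch assumes arithmetic right shift of negative values (true of common compilers), and it does not reproduce the patch's actual kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of vqrdmulh_s32:
 * sat((2 * a * b + 2^31) >> 32). */
int32_t qrdmulh_s32_model(int32_t a, int32_t b)
{
    /* The only input pair that overflows, so the only one that saturates. */
    if (a == INT32_MIN && b == INT32_MIN) {
        return INT32_MAX;
    }
    int64_t p = (int64_t)a * (int64_t)b;
    /* Double, add the rounding constant, keep the high 32 bits.
     * Assumes >> on a negative int64_t is an arithmetic shift. */
    return (int32_t)((2 * p + ((int64_t)1 << 31)) >> 32);
}

/* Hypothetical scalar model of
 * vreinterpret_s32_u64(vshr_n_u64(vreinterpret_u64_s32(S), 32)):
 * lane 0 occupies the low 32 bits, so the shift moves lane 1 into
 * lane 0 and clears lane 1. */
void shift_state_model(int32_t S[2])
{
    uint64_t packed = ((uint64_t)(uint32_t)S[1] << 32) | (uint32_t)S[0];
    packed >>= 32;
    S[0] = (int32_t)(uint32_t)packed;
    S[1] = (int32_t)(uint32_t)(packed >> 32);
}

/* Small self-check that doubles as a usage example. */
void check_models(void)
{
    int32_t S[2] = { 5, -7 };

    /* 3 * 0.5 = 1.5 in Q31 terms, rounded half up. */
    assert(qrdmulh_s32_model(3, (int32_t)1 << 30) == 2);

    shift_state_model(S);
    assert(S[0] == -7 && S[1] == 0);
}
```

Calling shift_state_model on {5, -7} leaves {-7, 0}, which matches the "S[ 0 ] = S[ 1 ]; S[ 1 ] = 0" comment in the snippet. The snippet does not show why the coefficients are stored doubled and negated, so the sketch leaves that part out.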