Displaying 6 results from an estimated 6 matches for "t_s32x2".
2017 Apr 26 · 2 · 2 patches related to silk_biquad_alt() optimization
...NEON kernels which uses vqrdmulh_lane_s32() to do the multiplication and rounding, where A_Q28_s32x{2,4} stores doubled -A_Q28[]:

static inline void silk_biquad_alt_stride1_kernel(const int32x2_t A_Q28_s32x2,
    const int32x4_t t_s32x4, int32x2_t *S_s32x2, int32x2_t *out32_Q14_s32x2)
{
    int32x2_t t_s32x2;

    *out32_Q14_s32x2 = vadd_s32(*S_s32x2, vget_low_s32(t_s32x4));                            /* silk_SMLAWB( S[ 0 ], B_Q28[ 0 ], in[ k ] ) */
    *S_s32x2         = vreinterpret_s32_u64(vshr_n_u64(vreinterpret_u64_s32(*S_s32x2), 32)); /* S[ 0 ] = S[ 1 ]; S[ 1 ] = 0;...
2017 May 15 · 2 · 2 patches related to silk_biquad_alt() optimization
2017 May 08 · 0 · 2 patches related to silk_biquad_alt() optimization
2017 May 17 · 0 · 2 patches related to silk_biquad_alt() optimization
2017 Apr 25 · 2 · 2 patches related to silk_biquad_alt() optimization
On Mon, Apr 24, 2017 at 5:52 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> On 24/04/17 08:03 PM, Linfeng Zhang wrote:
> > Tested on my chromebook, when stride (channel) == 1, the optimization
> > has no gain compared with C function.
>
> You mean that the Neon code is the same speed as the C code for
> stride==1? This is not terribly surprising for an IIRC
2016 Jul 14 · 6 · Several patches of ARM NEON optimization
I rebased my previous 3 patches to the current master with minor changes.
Patches 1 to 3 replace all my previous submitted patches.
Patches 4 and 5 are new.
Thanks,
Linfeng Zhang