search for: state_q

Displaying 13 results from an estimated 13 matches for "state_q".

silk_warped_autocorrelation_FIX() NEON optimization

2016 Jul 01

1

Hi all, I'm sending the patch "Optimize silk_warped_autocorrelation_FIX() for ARM NEON" in a separate email. It is based on Tim’s aarch64v8 branch: https://git.xiph.org/?p=users/tterribe/opus.git;a=shortlog;h=refs/heads/aarch64v8 Thanks for your comments. Linfeng

Several patches of ARM NEON optimization

2016 Jul 14

6

I rebased my previous 3 patches onto the current master with minor changes. Patches 1 to 3 replace all my previously submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Jan 31

6

Hi, Attached is a patch with ARM NEON optimizations for silk_warped_autocorrelation_FIX(). Please review. Thanks, Felicia

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 06

2

...lation_FIX_c()'s kernel part is

    for( n = 0; n < length; n++ ) {
        tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS );
        /* Loop over allpass sections */
        for( i = 0; i < order; i++ ) {
            /* Output of allpass section */
            tmp2_QS = silk_SMLAWB( state_QS[ i ], state_QS[ i + 1 ] - tmp1_QS, warping_Q16 );
            state_QS[ i ] = tmp1_QS;
            corr_QC[ i ] += silk_RSHIFT64( silk_SMULL( tmp1_QS, state_QS[ 0 ] ), 2 * QS - QC );
            tmp1_QS = tmp2_QS;
        }
        state_QS[ order ] = tmp1_QS;
        corr_QC[ order ] += silk_R...
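For readers without the Opus tree at hand, the quoted kernel can be approximated by the following self-contained sketch. It is not the Opus source: the function name `warped_autocorr_sketch`, the helper `smlawb`, and the constants `QS = 14`, `QC = 10` and `MAX_ORDER = 24` are assumptions made for illustration, and the final `corr_QC[ order ]` update is assumed to mirror the in-loop update because the quoted snippet is cut off at that point.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative Q-format constants; assumed values, not taken from the thread. */
enum { QS = 14, QC = 10, MAX_ORDER = 24 };

/* a + ((b * (int16_t)c) >> 16): a plain-C stand-in for silk_SMLAWB.
 * Relies on arithmetic right shift of negative values (true on mainstream compilers). */
static int32_t smlawb(int32_t a, int32_t b, int32_t c)
{
    return a + (int32_t)(((int64_t)b * (int16_t)c) >> 16);
}

/* Scalar warped autocorrelation; corr_QC must hold order + 1 entries. */
static void warped_autocorr_sketch(int64_t *corr_QC, const int16_t *input,
                                   int32_t warping_Q16, int length, int order)
{
    int32_t state_QS[MAX_ORDER + 1] = { 0 };
    int n, i;

    assert(order >= 0 && order <= MAX_ORDER);
    for (i = 0; i <= order; i++) corr_QC[i] = 0;

    for (n = 0; n < length; n++) {
        /* Multiply instead of shifting so negative samples stay well defined. */
        int32_t tmp1_QS = (int32_t)input[n] * (1 << QS);
        /* Loop over allpass sections */
        for (i = 0; i < order; i++) {
            int32_t tmp2_QS = smlawb(state_QS[i], state_QS[i + 1] - tmp1_QS, warping_Q16);
            state_QS[i] = tmp1_QS;
            corr_QC[i] += ((int64_t)tmp1_QS * state_QS[0]) >> (2 * QS - QC);
            tmp1_QS = tmp2_QS;
        }
        state_QS[order] = tmp1_QS;
        /* Assumed to mirror the in-loop update; the quoted snippet is truncated here. */
        corr_QC[order] += ((int64_t)tmp1_QS * state_QS[0]) >> (2 * QS - QC);
    }
}
```

With `warping_Q16 = 0` the allpass sections reduce to unit delays, so the sketch returns the ordinary autocorrelation scaled by 2^QC. For the input {1, 2, 3, 4} and order 2 that gives 30, 20 and 11, each multiplied by 1024.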

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

2

...PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
> > in6(si+1) in7(si+0))
> > END FOR
> >
> > The worst thing is that corr_QC[] is so sensitive that any extra
> > processing will make them wrong and propagate to the next loop (next 8
> > inputs). state_QS[] is a little better but still very sensitive. For
> > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7)
> > in5(s6) in6(s5) in7(s4)) to the kernel loop (by looping one more time)
> > and remove epilog 0, then all final results will be wrong.
> >
>...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

3

...(si+2)
> > > in6(si+1) in7(si+0))
> > > END FOR
> > >
> > > The worst thing is that corr_QC[] is so sensitive that any extra
> > > processing will make them wrong and propagate to the next loop (next 8
> > > inputs). state_QS[] is a little better but still very sensitive. For
> > > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7)
> > > in5(s6) in6(s5) in7(s4)) to the kernel loop (by looping one more time)
> > > and remove epilog 0, then all final r...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 06

0

...=0 TO order-6 WITH i++
> PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
> in6(si+1) in7(si+0))
> END FOR
>
> The worst thing is that corr_QC[] is so sensitive that any extra
> processing will make them wrong and propagate to the next loop (next 8
> inputs). state_QS[] is a little better but still very sensitive. For
> instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7)
> in5(s6) in6(s5) in7(s4)) to the kernel loop (by looping one more time)
> and remove epilog 0, then all final results will be wrong.
>
> That's why the...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

2

...>>> > > END FOR
>>> > >
>>> > > The worst thing is that corr_QC[] is so sensitive that any extra
>>> > > processing will make them wrong and propagate to the next loop (next 8
>>> > > inputs). state_QS[] is a little better but still very sensitive. For
>>> > > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7)
>>> > > in5(s6) in6(s5) in7(s4)) to the kernel loop (by looping one more time)
>>>...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

0

...) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
> > in6(si+1) in7(si+0))
> > END FOR
> >
> > The worst thing is that corr_QC[] is so sensitive that any extra
> > processing will make them wrong and propagate to the next loop (next 8
> > inputs). state_QS[] is a little better but still very sensitive. For
> > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7)
> > in5(s6) in6(s5) in7(s4)) to the kernel loop (by looping one more time)
> > and remove epilog 0, then all final results will be wrong.
>...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 03

0

...> in6(si+1) in7(si+0))
>> > > END FOR
>> > >
>> > > The worst thing is that corr_QC[] is so sensitive that any extra
>> > > processing will make them wrong and propagate to the next loop (next 8
>> > > inputs). state_QS[] is a little better but still very sensitive. For
>> > > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7)
>> > > in5(s6) in6(s5) in7(s4)) to the kernel loop (by looping one more time)
>> > > and remo...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

4

...>
> > > > The worst thing is that corr_QC[] is so sensitive that any extra
> > > > processing will make them wrong and propagate to the next loop (next 8
> > > > inputs). state_QS[] is a little better but still very sensitive. For
> > > > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7)
> > > > in5(s6) in6(s5) in7(s4)) to the kernel loop (by
> >...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

0

...> >
> > > The worst thing is that corr_QC[] is so sensitive that any extra
> > > processing will make them wrong and propagate to the next loop (next 8
> > > inputs). state_QS[] is a little better but still very sensitive. For
> > > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7)
> > > in5(s6) in6(s5) in7(s4)) to the kernel loop (by looping one mo...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 06

0

...sing up to 24 "taps" at the same time. If that's causing a slowdown, then it should be possible to do the processing in "sections". By that, I mean that you can implement (e.g.) an order-8 "kernel" that computes the correlations and also outputs the last element of state_QS_s32x4[0][0] back to input_QS[], so that it can be used to compute a new section. 4) It's a minor detail, but the last element of corr_QC[] that's not currently vectorized could simply be vectorized independently outside the loop (and it's the same for all orders). Cheers, Jean-Marc...
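The "sections" idea and point 4 can be illustrated with a scalar sketch. This is one possible reading of the suggestion, not the NEON patch and not code from the thread: every name here (`full_autocorr`, `section_kernel`, `sectioned_autocorr`, `chain`, `ref`) and every constant is hypothetical. Each section runs a few allpass stages over the whole frame, accumulates its correlations against the original input, and writes its output back so the next section can continue the cascade. The last correlation is then computed on its own outside the section loop.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; all values are assumptions for this sketch. */
enum { QS = 14, QC = 10, SHIFT = 2 * QS - QC, MAX_ORDER = 24, MAX_LEN = 64 };

/* Plain-C stand-in for silk_SMLAWB: a + ((b * (int16_t)c) >> 16). */
static int32_t smlawb(int32_t a, int32_t b, int32_t c)
{
    return a + (int32_t)(((int64_t)b * (int16_t)c) >> 16);
}

/* Reference: all allpass stages in a single pass over the input. */
static void full_autocorr(int64_t *corr, const int16_t *input,
                          int32_t warping_Q16, int length, int order)
{
    int32_t state[MAX_ORDER + 1] = { 0 };
    int n, i;
    assert(order >= 0 && order <= MAX_ORDER);
    for (i = 0; i <= order; i++) corr[i] = 0;
    for (n = 0; n < length; n++) {
        int32_t tmp1 = (int32_t)input[n] * (1 << QS);
        for (i = 0; i < order; i++) {
            int32_t tmp2 = smlawb(state[i], state[i + 1] - tmp1, warping_Q16);
            state[i] = tmp1;
            corr[i] += ((int64_t)tmp1 * state[0]) >> SHIFT;
            tmp1 = tmp2;
        }
        state[order] = tmp1;
        corr[order] += ((int64_t)tmp1 * state[0]) >> SHIFT;
    }
}

/* One section: runs `stages` allpass stages over chain[], accumulates their
 * correlations against ref[] (the original input in QS), and overwrites
 * chain[] with the section output for the next section to consume. */
static void section_kernel(int64_t *corr, int32_t *chain, const int32_t *ref,
                           int32_t warping_Q16, int length, int stages)
{
    int32_t state[MAX_ORDER + 1] = { 0 };
    int n, i;
    for (n = 0; n < length; n++) {
        int32_t tmp1 = chain[n];
        for (i = 0; i < stages; i++) {
            int32_t tmp2 = smlawb(state[i], state[i + 1] - tmp1, warping_Q16);
            state[i] = tmp1;
            corr[i] += ((int64_t)tmp1 * ref[n]) >> SHIFT;
            tmp1 = tmp2;
        }
        state[stages] = tmp1;   /* read by the last stage at the next sample */
        chain[n] = tmp1;        /* input of the next section */
    }
}

/* Processes the cascade `section` stages at a time. */
static void sectioned_autocorr(int64_t *corr, const int16_t *input,
                               int32_t warping_Q16, int length, int order, int section)
{
    int32_t ref[MAX_LEN], chain[MAX_LEN];
    int n, i, base;
    assert(order >= 0 && order <= MAX_ORDER && section > 0);
    assert(length >= 0 && length <= MAX_LEN);
    for (i = 0; i <= order; i++) corr[i] = 0;
    for (n = 0; n < length; n++) ref[n] = chain[n] = (int32_t)input[n] * (1 << QS);
    for (base = 0; base < order; base += section) {
        int stages = (order - base < section) ? order - base : section;
        section_kernel(corr + base, chain, ref, warping_Q16, length, stages);
    }
    /* Last correlation, computed independently outside the section loop. */
    for (n = 0; n < length; n++) corr[order] += ((int64_t)chain[n] * ref[n]) >> SHIFT;
}
```

Under these assumptions the sectioned version performs the same operations on the same values as the single-pass reference, so the two should agree bit for bit. The cost is an extra pass over the frame per section and a scratch buffer for the intermediate signal.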