search for: corr_qc

Displaying 16 results from an estimated 16 matches for "corr_qc".

silk_warped_autocorrelation_FIX() NEON optimization

2016 Jul 01

1

Hi all, I'm sending patch "Optimize silk_warped_autocorrelation_FIX() for ARM NEON" in a separate email. It is based on Tim’s aarch64v8 branch https://git.xiph.org/?p=users/tterribe/opus.git;a=shortlog;h=refs/heads/aarch64v8 Thanks for your comments. Linfeng

Several patches of ARM NEON optimization

2016 Jul 14

6

I rebased my previous 3 patches to the current master with minor changes. Patches 1 to 3 replace all my previous submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang

Pointer size bugs when compiling for android arm64?

2018 May 08

2

...piling for android arm64 using clang: CC silk/fixed/arm/warped_autocorrelation_FIX_neon_intr.lo silk/fixed/arm/warped_autocorrelation_FIX_neon_intr.c:43:37: warning: incompatible pointer types assigning to 'const long *' from 'long long *' [-Wincompatible-pointer-types] corr_QC_s64x2[ 0 ] = vld1q_s64( corr_QC + offset + 0 ); ^~~~~~~~~~~~~~~~~~~~ /Users/andrewl/android/toolchain-r16b-arm64-v8a/lib64/clang/5.0.300080/include/arm_neon.h:7628:46: note: expanded from macro 'vld1q_s64' __ret = (int64x2_t) __builtin_neon_vld1q_v(__p0...
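The warning quoted above says a `long long *` is being passed where a `const long *` is expected: on that toolchain the 64-bit lane type is evidently `long`, which has the same width as `long long` but is a distinct type. A minimal portable sketch of the idea follows. `load_two_s64` is a hypothetical stand-in for a two-lane 64-bit load, not part of Opus or `arm_neon.h`, and it assumes `long long` is exactly 64 bits wide.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an Opus or NEON function): copies two 64-bit
 * values from a 'long long' array into int64_t storage. Because the copy
 * is byte-wise, it does not matter whether int64_t is spelled 'long' or
 * 'long long' on the target, so no incompatible-pointer warning arises. */
static void load_two_s64(int64_t dst[2], const long long *src)
{
    /* Assumption of this sketch: both types are exactly 64 bits wide. */
    _Static_assert(sizeof(long long) == sizeof(int64_t),
                   "expected 64-bit long long");
    memcpy(dst, src, 2 * sizeof(int64_t));
}
```

In actual NEON code, a common alternative (again an assumption here, not something taken from the thread) is to cast the argument explicitly, for example `vld1q_s64( (const int64_t *)( corr_QC + offset ) )`.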

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 06

2

...opus_int32)input[ n ], QS ); /* Loop over allpass sections */ for( i = 0; i < order; i++ ) { /* Output of allpass section */ tmp2_QS = silk_SMLAWB( state_QS[ i ], state_QS[ i + 1 ] - tmp1_QS, warping_Q16 ); state_QS[ i ] = tmp1_QS; corr_QC[ i ] += silk_RSHIFT64( silk_SMULL( tmp1_QS, state_QS[ 0 ] ), 2 * QS - QC ); tmp1_QS = tmp2_QS; } state_QS[ order ] = tmp1_QS; corr_QC[ order ] += silk_RSHIFT64( silk_SMULL( tmp1_QS, state_QS[ 0 ] ), 2 * QS - QC ); } in which corr_QC[0, 1, ..., order] is the...
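The loop quoted in this snippet can be written out as a self-contained scalar sketch. This is a reading of the excerpt, not the Opus source: `smlawb` is a portable stand-in for `silk_SMLAWB`, the values of `QS`, `QC`, and `MAX_ORDER` are assumptions, the function name is hypothetical, and right shifts of negative 64-bit values are assumed to be arithmetic (true on mainstream compilers, but implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

#define QS 13          /* assumed Q-format of the filter state */
#define QC 10          /* assumed Q-format of the correlations */
#define MAX_ORDER 24   /* assumed upper bound for this sketch */

/* Portable stand-in for silk_SMLAWB: a + ((b * (int16_t)c) >> 16). */
static int32_t smlawb(int32_t a, int32_t b, int32_t c)
{
    return (int32_t)(a + (((int64_t)b * (int16_t)c) >> 16));
}

/* Scalar warped autocorrelation following the quoted loop.
 * corr_QC must have room for order + 1 entries. */
static void warped_autocorr_sketch(int64_t *corr_QC, const int16_t *input,
                                   int32_t warping_Q16, int length, int order)
{
    int32_t state_QS[MAX_ORDER + 1] = { 0 };
    int n, i;

    assert(order >= 0 && order <= MAX_ORDER);
    for (i = 0; i <= order; i++) {
        corr_QC[i] = 0;
    }

    for (n = 0; n < length; n++) {
        /* Bring the input sample into the QS domain. */
        int32_t tmp1_QS = (int32_t)input[n] * (1 << QS);

        /* Loop over allpass sections. */
        for (i = 0; i < order; i++) {
            /* Output of allpass section. */
            int32_t tmp2_QS = smlawb(state_QS[i],
                                     state_QS[i + 1] - tmp1_QS, warping_Q16);
            state_QS[i] = tmp1_QS;
            corr_QC[i] += ((int64_t)tmp1_QS * state_QS[0]) >> (2 * QS - QC);
            tmp1_QS = tmp2_QS;
        }
        state_QS[order] = tmp1_QS;
        corr_QC[order] += ((int64_t)tmp1_QS * state_QS[0]) >> (2 * QS - QC);
    }
}
```

With `warping_Q16` set to 0 each allpass section reduces to a one-sample delay, so the sketch returns the ordinary autocorrelation scaled by `1 << QC`, which is a convenient sanity check.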

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Jan 31

6

Hi, Attached is a patch with arm neon optimizations for silk_warped_autocorrelation_FIX(). Please review. Thanks, Felicia -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170131/9a912bb4/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name:

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 11

2

...ions and also outputs the last element of > state_QS_s32x4[0][0] back to input_QS[], so that it can be used to > compute a new section. > Done. The speed is almost identical (slightly slower), however the extra bonus is code size saving. 4) It's a minor detail, but the last element of corr_QC[] that's not > currently vectorized could simply be vectorized independently outside > the loop (and it's the same for all orders). > Done. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170411...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 13

0

...ent of >> state_QS_s32x4[0][0] back to input_QS[], so that it can be used to >> compute a new section. >> > > Done. The speed is almost identical (slightly slower), however the extra > bonus is code size saving. > > 4) It's a minor detail, but the last element of corr_QC[] that's not >> currently vectorized could simply be vectorized independently outside >> the loop (and it's the same for all orders). >> > > Done. > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

4

...> > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) > > in4(si+3) in5(si+2) > > > > in6(si+1) in7(si+0)) > > > > END FOR > > > > > > > > The worst thing is that corr_QC[] is so sensitive > > that any extra > > > > processing will make them wrong and propagate to the > > next loop (next 8 > > > > inputs). state_QS[] is a little better but still > > very sen...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

2

...3) in7(s2)) > > /* continue */ > > is actually the expansion of the kernel loop > > FOR i=0 TO order-6 WITH i++ > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2) > > in6(si+1) in7(si+0)) > > END FOR > > > > The worst thing is that corr_QC[] is so sensitive that any extra > > processing will make them wrong and propagate to the next loop (next 8 > > inputs). state_QS[] is a little better but still very sensitive. For > > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7) > > in5(s6) in6(s5...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 06

0

...ections". By that, I mean that you can implement (e.g.) an order-8 "kernel" that computes the correlations and also outputs the last element of state_QS_s32x4[0][0] back to input_QS[], so that it can be used to compute a new section. 4) It's a minor detail, but the last element of corr_QC[] that's not currently vectorized could simply be vectorized independently outside the loop (and it's the same for all orders). Cheers, Jean-Marc On 05/04/17 02:13 PM, Linfeng Zhang wrote: > Thank Jean-Marc! > > The speedup percentages are all relative to the entire encoder. ...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

3

...ally the expansion of the kernel loop > > > FOR i=0 TO order-6 WITH i++ > > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2) > > > in6(si+1) in7(si+0)) > > > END FOR > > > > > > The worst thing is that corr_QC[] is so sensitive that any extra > > > processing will make them wrong and propagate to the next loop > (next 8 > > > inputs). state_QS[] is a little better but still very sensitive. > For > > > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 06

0

...) in2(s7) in3(s6) in4(s5) in5(s4) in6(s3) in7(s2)) > /* continue */ > is actually the expansion of the kernel loop > FOR i=0 TO order-6 WITH i++ > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2) > in6(si+1) in7(si+0)) > END FOR > > The worst thing is that corr_QC[] is so sensitive that any extra > processing will make them wrong and propagate to the next loop (next 8 > inputs). state_QS[] is a little better but still very sensitive. For > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7) > in5(s6) in6(s5) in7(s4)) to the ke...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

2

...> FOR i=0 TO order-6 WITH i++ >>> > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) >>> in5(si+2) >>> > > in6(si+1) in7(si+0)) >>> > > END FOR >>> > > >>> > > The worst thing is that corr_QC[] is so sensitive that any extra >>> > > processing will make them wrong and propagate to the next loop >>> (next 8 >>> > > inputs). state_QS[] is a little better but still very sensitive. >>> For >>> > > instance, if adding...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

0

...tinue */ > > is actually the expansion of the kernel loop > > FOR i=0 TO order-6 WITH i++ > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2) > > in6(si+1) in7(si+0)) > > END FOR > > > > The worst thing is that corr_QC[] is so sensitive that any extra > > processing will make them wrong and propagate to the next loop (next 8 > > inputs). state_QS[] is a little better but still very sensitive. For > > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7) > >...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 03

0

...e kernel loop >> > > FOR i=0 TO order-6 WITH i++ >> > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2) >> > > in6(si+1) in7(si+0)) >> > > END FOR >> > > >> > > The worst thing is that corr_QC[] is so sensitive that any extra >> > > processing will make them wrong and propagate to the next loop >> (next 8 >> > > inputs). state_QS[] is a little better but still very sensitive. >> For >> > > instance, if adding PROC(in0(s11')...

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

0

...ITH i++ > > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) > in4(si+3) in5(si+2) > > > in6(si+1) in7(si+0)) > > > END FOR > > > > > > The worst thing is that corr_QC[] is so sensitive > that any extra > > > processing will make them wrong and propagate to the > next loop (next 8 > > > inputs). state_QS[] is a little better but still > very sensitive. For >...