search for: corr_qc

Displaying 16 results from an estimated 16 matches for "corr_qc".

2016 Jul 01
1
silk_warped_autocorrelation_FIX() NEON optimization
Hi all, I'm sending the patch "Optimize silk_warped_autocorrelation_FIX() for ARM NEON" in a separate email. It is based on Tim’s aarch64v8 branch https://git.xiph.org/?p=users/tterribe/opus.git;a=shortlog;h=refs/heads/aarch64v8 Thanks for your comments. Linfeng
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches onto the current master with minor changes. Patches 1 to 3 replace all my previously submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang
2018 May 08
2
Pointer size bugs when compiling for android arm64?
...piling for android arm64 using clang:
  CC       silk/fixed/arm/warped_autocorrelation_FIX_neon_intr.lo
silk/fixed/arm/warped_autocorrelation_FIX_neon_intr.c:43:37: warning: incompatible pointer types assigning to 'const long *' from 'long long *' [-Wincompatible-pointer-types]
    corr_QC_s64x2[ 0 ] = vld1q_s64( corr_QC + offset + 0 );
                                    ^~~~~~~~~~~~~~~~~~~
/Users/andrewl/android/toolchain-r16b-arm64-v8a/lib64/clang/5.0.300080/include/arm_neon.h:7628:46: note: expanded from macro 'vld1q_s64'
  __ret = (int64x2_t) __builtin_neon_vld1q_v(__p0...
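The warning is about pointer types rather than sizes. On Android arm64, an LP64 target, int64_t (the element type whose pointer vld1q_s64() takes) is long, while corr_QC in that build was evidently long long * (consistent with opus_int64 falling back to long long, though that is an inference from the message, not something it states). Both are 64 bits wide, but C treats long and long long as distinct types, so a cast would silence the warning while still reading long long objects through a long pointer, which the strict aliasing rules technically do not allow. A minimal sketch of the alternative, under the assumption that the buffer's declaration can be changed: give the array the NEON code loads from exactly the type the intrinsic expects. Here load_s64x2() and sum4_lags() are hypothetical stand-ins so the example compiles off-target; they are not Opus or arm_neon.h functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for vld1q_s64(): like the intrinsic, it takes a
   const int64_t *, and int64_t is `long` on LP64 targets such as Android
   arm64, whereas a long long * argument would be a different type. */
typedef struct { int64_t val[2]; } s64x2;

static s64x2 load_s64x2(const int64_t *p)
{
    s64x2 r;
    r.val[0] = p[0];
    r.val[1] = p[1];
    return r;
}

/* Sum four consecutive lags starting at `offset`, loading two at a time.
   corr_QC is declared as int64_t here, so corr_QC + offset already has
   the exact pointer type the loader expects and no cast is needed. */
static int64_t sum4_lags(const int64_t *corr_QC, int offset)
{
    assert(offset >= 0);
    s64x2 lo = load_s64x2(corr_QC + offset + 0);
    s64x2 hi = load_s64x2(corr_QC + offset + 2);
    return lo.val[0] + lo.val[1] + hi.val[0] + hi.val[1];
}
```

For example, with int64_t corr_QC[6] = { 1, 2, 3, 4, 5, 6 }, sum4_lags(corr_QC, 1) adds elements 1 through 4 and returns 14.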
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...opus_int32)input[ n ], QS );
    /* Loop over allpass sections */
    for( i = 0; i < order; i++ ) {
        /* Output of allpass section */
        tmp2_QS = silk_SMLAWB( state_QS[ i ], state_QS[ i + 1 ] - tmp1_QS, warping_Q16 );
        state_QS[ i ] = tmp1_QS;
        corr_QC[ i ] += silk_RSHIFT64( silk_SMULL( tmp1_QS, state_QS[ 0 ] ), 2 * QS - QC );
        tmp1_QS = tmp2_QS;
    }
    state_QS[ order ] = tmp1_QS;
    corr_QC[ order ] += silk_RSHIFT64( silk_SMULL( tmp1_QS, state_QS[ 0 ] ), 2 * QS - QC );
}
in which corr_QC[0, 1, ..., order] is the...
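To experiment with the loop above off-target, it can be wrapped in a self-contained scalar function. This is a sketch under assumptions: the QS and QC values, the MAX_ORDER bound and the name warped_corr_ref() are placeholders rather than Opus's definitions, smlawb() is a portable stand-in written to match silk_SMLAWB's 32x16-bit multiply-accumulate, and the conversion of corr_QC[] into the real function's final output is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QS 13          /* assumed Q-format of the allpass states */
#define QC 10          /* assumed Q-format of the correlations */
#define MAX_ORDER 24   /* hypothetical bound on the allpass order */

/* Portable stand-in for silk_SMLAWB(a, b, c): a + ((b * (int16)c) >> 16). */
static int32_t smlawb(int32_t a, int32_t b, int32_t c)
{
    return a + (int32_t)(((int64_t)b * (int16_t)c) >> 16);
}

/* Hypothetical scalar reference: accumulates corr_QC[0..order] over one
   frame, mirroring the quoted loop. */
static void warped_corr_ref(int64_t *corr_QC, const int16_t *input,
                            int warping_Q16, int length, int order)
{
    int32_t state_QS[MAX_ORDER + 1] = { 0 };
    assert(order >= 0 && order <= MAX_ORDER);
    memset(corr_QC, 0, (size_t)(order + 1) * sizeof(*corr_QC));
    for (int n = 0; n < length; n++) {
        /* silk_LSHIFT32(input[n], QS), written as a multiply */
        int32_t tmp1_QS = (int32_t)input[n] * (1 << QS);
        for (int i = 0; i < order; i++) {
            /* Output of allpass section i */
            int32_t tmp2_QS = smlawb(state_QS[i], state_QS[i + 1] - tmp1_QS,
                                     warping_Q16);
            state_QS[i] = tmp1_QS;
            /* silk_RSHIFT64(silk_SMULL(tmp1_QS, state_QS[0]), 2 * QS - QC) */
            corr_QC[i] += ((int64_t)tmp1_QS * state_QS[0]) >> (2 * QS - QC);
            tmp1_QS = tmp2_QS;
        }
        state_QS[order] = tmp1_QS;
        corr_QC[order] += ((int64_t)tmp1_QS * state_QS[0]) >> (2 * QS - QC);
    }
}
```

With warping_Q16 = 0 every allpass section degenerates into a one-sample delay, so corr_QC[k] reduces to the ordinary autocorrelation sum of input[n] * input[n - k], scaled by 2^QC: input { 1, 2, 3 } with order 2 gives 14, 8 and 3, each times 1024.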
2017 Jan 31
6
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi, attached is a patch with ARM NEON optimizations for silk_warped_autocorrelation_FIX(). Please review. Thanks, Felicia
2017 Apr 11
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ions and also outputs the last element of
> state_QS_s32x4[0][0] back to input_QS[], so that it can be used to
> compute a new section.
Done. The speed is almost identical (slightly slower); however, the extra bonus is the code size saving.
> 4) It's a minor detail, but the last element of corr_QC[] that's not
> currently vectorized could simply be vectorized independently outside
> the loop (and it's the same for all orders).
Done.
2017 Apr 13
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ent of
>> state_QS_s32x4[0][0] back to input_QS[], so that it can be used to
>> compute a new section.
>
> Done. The speed is almost identical (slightly slower); however, the extra
> bonus is the code size saving.
>
>> 4) It's a minor detail, but the last element of corr_QC[] that's not
>> currently vectorized could simply be vectorized independently outside
>> the loop (and it's the same for all orders).
>
> Done.
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...> > > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
> > > > in6(si+1) in7(si+0))
> > > > END FOR
> > > >
> > > > The worst thing is that corr_QC[] is so sensitive that any extra
> > > > processing will make them wrong and propagate to the next loop (next 8
> > > > inputs). state_QS[] is a little better but still very sen...
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...3) in7(s2))
> > /* continue */
> > is actually the expansion of the kernel loop
> > FOR i=0 TO order-6 WITH i++
> > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
> > in6(si+1) in7(si+0))
> > END FOR
> >
> > The worst thing is that corr_QC[] is so sensitive that any extra
> > processing will make them wrong and propagate to the next loop (next 8
> > inputs). state_QS[] is a little better but still very sensitive. For
> > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7)
> > in5(s6) in6(s5...
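One reading of the quoted kernel loop: PROC(in0(si+7) ... in7(si+0)) advances eight consecutive input samples at once, with sample 0 at allpass stage i+7 and sample 7 at stage i, so each sample trails its predecessor by exactly one stage. That skew is what makes the lanes independent: when sample m enters stage j, sample m-1 has just produced its input to stage j+1, which is the state_QS[j + 1] value stage j needs. The sketch below is a scalar emulation of that diagonal schedule, not the patch's NEON code; it processes the leading lane first where NEON would pass values between vector lanes. It also shows why the ramp-up and ramp-down diagonals must touch only valid (sample, stage) pairs: an extra stage update would overwrite state_QS[] and add a spurious term to corr_QC[], the sensitivity described in the thread. QS, QC, MAX_ORDER, LANES, stage(), autocorr_scalar() and autocorr_wavefront() are hypothetical names and values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QS 13          /* assumed Q-format of the allpass states */
#define QC 10          /* assumed Q-format of the correlations */
#define MAX_ORDER 24   /* hypothetical bound on the allpass order */
#define LANES 8        /* samples in flight, like in0..in7 in the pseudocode */

/* Portable stand-in for silk_SMLAWB(a, b, c). */
static int32_t smlawb(int32_t a, int32_t b, int32_t c)
{
    return a + (int32_t)(((int64_t)b * (int16_t)c) >> 16);
}

/* One PROC step: value t enters allpass stage j for a sample whose
   undelayed input is x; returns the value entering stage j + 1. */
static int32_t stage(int64_t *corr_QC, int32_t *state_QS, int order, int j,
                     int32_t t, int32_t x, int warping_Q16)
{
    int32_t out = 0;
    if (j < order)
        out = smlawb(state_QS[j], state_QS[j + 1] - t, warping_Q16);
    state_QS[j] = t;
    corr_QC[j] += ((int64_t)t * x) >> (2 * QS - QC);
    return out;
}

/* Sample-by-sample order, as in the scalar C code. */
static void autocorr_scalar(int64_t *corr_QC, const int16_t *in, int w,
                            int length, int order)
{
    int32_t state_QS[MAX_ORDER + 1] = { 0 };
    assert(order >= 0 && order <= MAX_ORDER);
    memset(corr_QC, 0, (size_t)(order + 1) * sizeof(*corr_QC));
    for (int n = 0; n < length; n++) {
        int32_t x = (int32_t)in[n] * (1 << QS), t = x;
        for (int j = 0; j <= order; j++)
            t = stage(corr_QC, state_QS, order, j, t, x, w);
    }
}

/* Skewed order: on diagonal d, lane k (sample n0 + k) is at stage d - k. */
static void autocorr_wavefront(int64_t *corr_QC, const int16_t *in, int w,
                               int length, int order)
{
    int32_t state_QS[MAX_ORDER + 1] = { 0 };
    assert(order >= 0 && order <= MAX_ORDER);
    memset(corr_QC, 0, (size_t)(order + 1) * sizeof(*corr_QC));
    for (int n0 = 0; n0 < length; n0 += LANES) {
        int lanes = length - n0 < LANES ? length - n0 : LANES;
        int32_t x[LANES], t[LANES];
        for (int k = 0; k < lanes; k++)
            x[k] = t[k] = (int32_t)in[n0 + k] * (1 << QS);
        for (int d = 0; d < order + lanes; d++) {
            /* Leading lane first, so lane k sees state_QS[j + 1] already
               updated by lane k - 1 (its previous sample) on this diagonal. */
            for (int k = 0; k < lanes; k++) {
                int j = d - k;
                if (j >= 0 && j <= order) /* only valid (sample, stage) pairs */
                    t[k] = stage(corr_QC, state_QS, order, j, t[k], x[k], w);
            }
        }
    }
}
```

Because only the interleaving of exact integer operations changes, autocorr_wavefront() should reproduce autocorr_scalar() bit for bit, including when the frame length is not a multiple of LANES.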
2017 Apr 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ections". By that, I mean that you can implement (e.g.) an order-8 "kernel" that computes the correlations and also outputs the last element of state_QS_s32x4[0][0] back to input_QS[], so that it can be used to compute a new section. 4) It's a minor detail, but the last element of corr_QC[] that's not currently vectorized could simply be vectorized independently outside the loop (and it's the same for all orders). Cheers, Jean-Marc
On 05/04/17 02:13 PM, Linfeng Zhang wrote:
> Thanks, Jean-Marc!
>
> The speedup percentages are all relative to the entire encoder.
> ...
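Suggestions 3) and 4) can be illustrated with a portable sketch. Each allpass stage depends only on its own input sequence (its state is just its previous input and its previous output) and on the undelayed input, state_QS[0] in the scalar loop. The cascade can therefore be cut into sections that each run over the whole frame and hand their final output sequence on as the next section's input, the role input_QS[] plays in the suggestion. Once every section has run, the remaining lag corr_QC[order] is a plain dot product of that final sequence with the input, so it can be computed on its own after the loop, independent of the order. Everything here is hypothetical: run_section(), warped_autocorr_sections(), the QS/QC values and the MAX_ORDER/MAX_LEN bounds are illustrative, the code is plain C rather than NEON, and smlawb() is a portable stand-in for silk_SMLAWB.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QS 13          /* assumed Q-format of the allpass states */
#define QC 10          /* assumed Q-format of the correlations */
#define MAX_ORDER 24   /* hypothetical bound on the allpass order */
#define MAX_LEN 512    /* hypothetical bound on the frame length */

/* Portable stand-in for silk_SMLAWB(a, b, c). */
static int32_t smlawb(int32_t a, int32_t b, int32_t c)
{
    return a + (int32_t)(((int64_t)b * (int16_t)c) >> 16);
}

/* Run allpass stages [first, first + count) over the whole frame.
   On entry sig_QS[n] is the signal entering stage `first`; on return it is
   the signal leaving the last stage, i.e. the next section's input.
   x_QS[n] is the undelayed input (state_QS[0] in the scalar loop). */
static void run_section(int64_t *corr_QC, int32_t *sig_QS, const int32_t *x_QS,
                        int length, int first, int count, int warping_Q16)
{
    int32_t s[MAX_ORDER + 1] = { 0 };   /* this section's allpass states */
    for (int n = 0; n < length; n++) {
        int32_t t = sig_QS[n];
        for (int j = 0; j < count; j++) {
            int32_t t2 = smlawb(s[j], s[j + 1] - t, warping_Q16);
            s[j] = t;
            corr_QC[first + j] += ((int64_t)t * x_QS[n]) >> (2 * QS - QC);
            t = t2;
        }
        s[count] = t;
        sig_QS[n] = t;
    }
}

static void warped_autocorr_sections(int64_t *corr_QC, const int16_t *input,
                                     int warping_Q16, int length, int order,
                                     int section)
{
    int32_t x_QS[MAX_LEN], sig_QS[MAX_LEN];
    assert(length >= 0 && length <= MAX_LEN);
    assert(order >= 0 && order <= MAX_ORDER && section > 0);
    for (int n = 0; n < length; n++)
        x_QS[n] = sig_QS[n] = (int32_t)input[n] * (1 << QS);
    memset(corr_QC, 0, (size_t)(order + 1) * sizeof(*corr_QC));
    for (int first = 0; first < order; first += section) {
        int count = order - first < section ? order - first : section;
        run_section(corr_QC, sig_QS, x_QS, length, first, count, warping_Q16);
    }
    /* Suggestion 4): the last lag is a plain dot product, done once here. */
    for (int n = 0; n < length; n++)
        corr_QC[order] += ((int64_t)sig_QS[n] * x_QS[n]) >> (2 * QS - QC);
}
```

Calling it with a section size of 1, 2 or order should give identical corr_QC[] values, since only the grouping of the same integer operations changes.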
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ally the expansion of the kernel loop
> > > FOR i=0 TO order-6 WITH i++
> > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
> > > in6(si+1) in7(si+0))
> > > END FOR
> > >
> > > The worst thing is that corr_QC[] is so sensitive that any extra
> > > processing will make them wrong and propagate to the next loop (next 8
> > > inputs). state_QS[] is a little better but still very sensitive. For
> > > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3...
2017 Feb 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...) in2(s7) in3(s6) in4(s5) in5(s4) in6(s3) in7(s2))
> /* continue */
> is actually the expansion of the kernel loop
> FOR i=0 TO order-6 WITH i++
> PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
> in6(si+1) in7(si+0))
> END FOR
>
> The worst thing is that corr_QC[] is so sensitive that any extra
> processing will make them wrong and propagate to the next loop (next 8
> inputs). state_QS[] is a little better but still very sensitive. For
> instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7)
> in5(s6) in6(s5) in7(s4)) to the ke...
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...> FOR i=0 TO order-6 WITH i++
>>> > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
>>> > > in6(si+1) in7(si+0))
>>> > > END FOR
>>> > >
>>> > > The worst thing is that corr_QC[] is so sensitive that any extra
>>> > > processing will make them wrong and propagate to the next loop (next 8
>>> > > inputs). state_QS[] is a little better but still very sensitive. For
>>> > > instance, if adding...
2017 Feb 07
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...tinue */
> > is actually the expansion of the kernel loop
> > FOR i=0 TO order-6 WITH i++
> > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
> > in6(si+1) in7(si+0))
> > END FOR
> >
> > The worst thing is that corr_QC[] is so sensitive that any extra
> > processing will make them wrong and propagate to the next loop (next 8
> > inputs). state_QS[] is a little better but still very sensitive. For
> > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7)
> > ...
2017 Apr 03
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...e kernel loop
>> > > FOR i=0 TO order-6 WITH i++
>> > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
>> > > in6(si+1) in7(si+0))
>> > > END FOR
>> > >
>> > > The worst thing is that corr_QC[] is so sensitive that any extra
>> > > processing will make them wrong and propagate to the next loop (next 8
>> > > inputs). state_QS[] is a little better but still very sensitive. For
>> > > instance, if adding PROC(in0(s11')...
2017 Apr 05
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ITH i++
> > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
> > > in6(si+1) in7(si+0))
> > > END FOR
> > >
> > > The worst thing is that corr_QC[] is so sensitive that any extra
> > > processing will make them wrong and propagate to the next loop (next 8
> > > inputs). state_QS[] is a little better but still very sensitive. For...