Displaying 16 results from an estimated 16 matches for "corr_qc".
2016 Jul 01
1
silk_warped_autocorrelation_FIX() NEON optimization
Hi all,
I'm sending the patch "Optimize silk_warped_autocorrelation_FIX() for ARM NEON" in a separate email.
It is based on Tim’s aarch64v8 branch https://git.xiph.org/?p=users/tterribe/opus.git;a=shortlog;h=refs/heads/aarch64v8
Thanks for your comments.
Linfeng
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches to the current master with minor changes.
Patches 1 to 3 replace all my previously submitted patches.
Patches 4 and 5 are new.
Thanks,
Linfeng Zhang
2018 May 08
2
Pointer size bugs when compiling for android arm64?
...piling for android arm64 using clang:
CC silk/fixed/arm/warped_autocorrelation_FIX_neon_intr.lo
silk/fixed/arm/warped_autocorrelation_FIX_neon_intr.c:43:37: warning:
incompatible pointer types assigning to 'const long *' from 'long long
*' [-Wincompatible-pointer-types]
corr_QC_s64x2[ 0 ] = vld1q_s64( corr_QC + offset + 0 );
^~~~~~~~~~~~~~~~~~~~
/Users/andrewl/android/toolchain-r16b-arm64-v8a/lib64/clang/5.0.300080/include/arm_neon.h:7628:46:
note: expanded from macro 'vld1q_s64'
__ret = (int64x2_t) __builtin_neon_vld1q_v(__p0...
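The warning above arises because, on this toolchain, the intrinsic expects a pointer to 'long' (presumably its int64_t) while corr_QC points to 'long long'. Both are 64 bits wide on arm64 but are distinct C types. A minimal portable sketch of one way to bridge the two follows. The names sketch_int64 and sketch_load_two_lanes are hypothetical and not part of Opus or arm_neon.h, and this is not necessarily the fix that was adopted upstream.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a codec-side 64-bit type declared as 'long long'. */
typedef long long sketch_int64;

/* Compile-time check: the array size is negative (an error) if the sizes differ. */
typedef char sketch_same_size[(sizeof(sketch_int64) == sizeof(int64_t)) ? 1 : -1];

/* Copy two 64-bit lanes from a 'long long' buffer into an int64_t buffer.
 * memcpy sidesteps both the pointer-type mismatch and aliasing concerns. */
static void sketch_load_two_lanes(int64_t out[2], const sketch_int64 *src)
{
    assert(out != NULL && src != NULL);
    memcpy(out, src, 2 * sizeof(int64_t));
}
```

In NEON code, a commonly used alternative is an explicit cast at the call site, for example vld1q_s64( (const int64_t *)( corr_QC + offset ) ), which relies on the two types having the same size and representation.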
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...opus_int32)input[ n ], QS );
    /* Loop over allpass sections */
    for( i = 0; i < order; i++ ) {
        /* Output of allpass section */
        tmp2_QS = silk_SMLAWB( state_QS[ i ], state_QS[ i + 1 ] - tmp1_QS, warping_Q16 );
        state_QS[ i ] = tmp1_QS;
        corr_QC[ i ] += silk_RSHIFT64( silk_SMULL( tmp1_QS, state_QS[ 0 ] ), 2 * QS - QC );
        tmp1_QS = tmp2_QS;
    }
    state_QS[ order ] = tmp1_QS;
    corr_QC[ order ] += silk_RSHIFT64( silk_SMULL( tmp1_QS, state_QS[ 0 ] ), 2 * QS - QC );
}
in which corr_QC[0, 1, ..., order] is the...
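For readers who want to run the quoted loop in isolation, here is a self-contained sketch of the scalar recursion. It is an illustration, not the Opus source. The SKETCH_* constants (QS = 14, QC = 10, maximum order 24), the helper sketch_smlawb (a stand-in for silk_SMLAWB), and the function name are assumptions made for this example. It also omits any final scaling of the correlations.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_ORDER 24  /* assumption: upper bound on the filter order */
#define SKETCH_QS 14         /* assumption: Q-format of the allpass state */
#define SKETCH_QC 10         /* assumption: Q-format of the correlations */

/* Stand-in for silk_SMLAWB: acc + ((a * (int16_t)b) >> 16). */
static int32_t sketch_smlawb(int32_t acc, int32_t a, int32_t b)
{
    return acc + (int32_t)(((int64_t)a * (int16_t)b) >> 16);
}

/* Accumulates corr_QC[0..order] over `length` input samples. */
void sketch_warped_autocorr(int64_t *corr_QC, const int16_t *input,
                            int32_t warping_Q16, int length, int order)
{
    int32_t state_QS[SKETCH_MAX_ORDER + 1] = { 0 };
    int32_t tmp1_QS, tmp2_QS;
    int n, i;

    assert(order >= 0 && order <= SKETCH_MAX_ORDER);
    for (i = 0; i <= order; i++) corr_QC[i] = 0;

    for (n = 0; n < length; n++) {
        /* Multiply instead of shifting so negative samples stay well defined. */
        tmp1_QS = (int32_t)input[n] * (1 << SKETCH_QS);
        /* Loop over allpass sections */
        for (i = 0; i < order; i++) {
            tmp2_QS = sketch_smlawb(state_QS[i], state_QS[i + 1] - tmp1_QS, warping_Q16);
            state_QS[i] = tmp1_QS;
            /* Right shift of a negative value is implementation-defined in C;
             * common compilers shift arithmetically. */
            corr_QC[i] += ((int64_t)tmp1_QS * state_QS[0]) >> (2 * SKETCH_QS - SKETCH_QC);
            tmp1_QS = tmp2_QS;
        }
        state_QS[order] = tmp1_QS;
        corr_QC[order] += ((int64_t)tmp1_QS * state_QS[0]) >> (2 * SKETCH_QS - SKETCH_QC);
    }
}
```

With warping_Q16 set to 0 the allpass chain reduces to a plain delay line, so corr_QC[k] becomes the ordinary autocorrelation at lag k scaled by 2^QC. For the input {1, 2, 3} and order 2 this gives 14336, 8192 and 3072. The loop-carried dependence on state_QS[] is also visible here: each section's output feeds the next section and the next sample.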
2017 Jan 31
6
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi,
Attached is a patch with arm neon optimizations for
silk_warped_autocorrelation_FIX(). Please review.
Thanks,
Felicia
2017 Apr 11
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ions and also outputs the last element of
> state_QS_s32x4[0][0] back to input_QS[], so that it can be used to
> compute a new section.
>
Done. The speed is almost identical (slightly slower), however the extra
bonus is code size saving.
4) It's a minor detail, but the last element of corr_QC[] that's not
> currently vectorized could simply be vectorized independently outside
> the loop (and it's the same for all orders).
>
Done.
2017 Apr 13
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ent of
>> state_QS_s32x4[0][0] back to input_QS[], so that it can be used to
>> compute a new section.
>>
>
> Done. The speed is almost identical (slightly slower), however the extra
> bonus is code size saving.
>
> 4) It's a minor detail, but the last element of corr_QC[] that's not
>> currently vectorized could simply be vectorized independently outside
>> the loop (and it's the same for all orders).
>>
>
> Done.
>
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...> > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4)
> > in4(si+3) in5(si+2)
> > > > in6(si+1) in7(si+0))
> > > > END FOR
> > > >
> > > > The worst thing is that corr_QC[] is so sensitive
> > that any extra
> > > > processing will make them wrong and propagate to the
> > next loop (next 8
> > > > inputs). state_QS[] is a little better but still
> > very sen...
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...3) in7(s2))
> > /* continue */
> > is actually the expansion of the kernel loop
> > FOR i=0 TO order-6 WITH i++
> > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
> > in6(si+1) in7(si+0))
> > END FOR
> >
> > The worst thing is that corr_QC[] is so sensitive that any extra
> > processing will make them wrong and propagate to the next loop (next 8
> > inputs). state_QS[] is a little better but still very sensitive. For
> > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7)
> > in5(s6) in6(s5...
2017 Apr 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ections".
By that, I mean that you can implement (e.g.) an order-8 "kernel" that
computes the correlations and also outputs the last element of
state_QS_s32x4[0][0] back to input_QS[], so that it can be used to
compute a new section.
4) It's a minor detail, but the last element of corr_QC[] that's not
currently vectorized could simply be vectorized independently outside
the loop (and it's the same for all orders).
Cheers,
Jean-Marc
On 05/04/17 02:13 PM, Linfeng Zhang wrote:
> Thanks, Jean-Marc!
>
> The speedup percentages are all relative to the entire encoder.
...
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ally the expansion of the kernel loop
> > > FOR i=0 TO order-6 WITH i++
> > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
> > > in6(si+1) in7(si+0))
> > > END FOR
> > >
> > > The worst thing is that corr_QC[] is so sensitive that any extra
> > > processing will make them wrong and propagate to the next loop
> (next 8
> > > inputs). state_QS[] is a little better but still very sensitive.
> For
> > > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3...
2017 Feb 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...) in2(s7) in3(s6) in4(s5) in5(s4) in6(s3) in7(s2))
> /* continue */
> is actually the expansion of the kernel loop
> FOR i=0 TO order-6 WITH i++
> PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
> in6(si+1) in7(si+0))
> END FOR
>
> The worst thing is that corr_QC[] is so sensitive that any extra
> processing will make them wrong and propagate to the next loop (next 8
> inputs). state_QS[] is a little better but still very sensitive. For
> instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7)
> in5(s6) in6(s5) in7(s4)) to the ke...
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...> FOR i=0 TO order-6 WITH i++
>>> > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3)
>>> in5(si+2)
>>> > > in6(si+1) in7(si+0))
>>> > > END FOR
>>> > >
>>> > > The worst thing is that corr_QC[] is so sensitive that any extra
>>> > > processing will make them wrong and propagate to the next loop
>>> (next 8
>>> > > inputs). state_QS[] is a little better but still very sensitive.
>>> For
>>> > > instance, if adding...
2017 Feb 07
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...tinue */
> > is actually the expansion of the kernel loop
> > FOR i=0 TO order-6 WITH i++
> > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
> > in6(si+1) in7(si+0))
> > END FOR
> >
> > The worst thing is that corr_QC[] is so sensitive that any extra
> > processing will make them wrong and propagate to the next loop (next 8
> > inputs). state_QS[] is a little better but still very sensitive. For
> > instance, if adding PROC(in0(s11') in1(s10) in2(s9) in3(s8) in4(s7)
> >...
2017 Apr 03
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...e kernel loop
>> > > FOR i=0 TO order-6 WITH i++
>> > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4) in4(si+3) in5(si+2)
>> > > in6(si+1) in7(si+0))
>> > > END FOR
>> > >
>> > > The worst thing is that corr_QC[] is so sensitive that any extra
>> > > processing will make them wrong and propagate to the next loop
>> (next 8
>> > > inputs). state_QS[] is a little better but still very sensitive.
>> For
>> > > instance, if adding PROC(in0(s11')...
2017 Apr 05
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ITH i++
> > > PROC(in0(si+7) in1(si+6) in2(si+5) in3(si+4)
> in4(si+3) in5(si+2)
> > > in6(si+1) in7(si+0))
> > > END FOR
> > >
> > > The worst thing is that corr_QC[] is so sensitive
> that any extra
> > > processing will make them wrong and propagate to the
> next loop (next 8
> > > inputs). state_QS[] is a little better but still
> very sensitive. For
>...