Displaying 20 results from an estimated 20 matches for "silk_warped_autocorrelation_fix".
2017 Jan 31
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Felicia,
Thanks for the patch. Can you give more details on what checks/tests
you've done so far on this patch?
Thanks,
Jean-Marc
On 31/01/17 12:30 PM, Felicia Lim wrote:
> Hi,
>
> Attached is a patch with arm neon optimizations for
> silk_warped_autocorrelation_FIX(). Please review.
>
> Thanks,
> Felicia
>
>
> _______________________________________________
> opus mailing list
> opus at xiph.org
> http://lists.xiph.org/mailman/listinfo/opus
>
2017 Feb 02
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...that so much unrolled prolog/epilog
code is needed. Did you try having less unrolling for the prolog/epilog?
That would be nicer to the I-cache (if possible).
Cheers,
Jean-Marc
On 31/01/17 12:30 PM, Felicia Lim wrote:
> Hi,
>
> Attached is a patch with arm neon optimizations for
> silk_warped_autocorrelation_FIX(). Please review.
>
> Thanks,
> Felicia
2017 Feb 05
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...C = C + X*Y
I think something similar to this (assuming I didn't mess up any
details) should give you the correlations in vector C. Did I miss anything?
Cheers,
Jean-Marc
On 31/01/17 12:30 PM, Felicia Lim wrote:
> Hi,
>
> Attached is a patch with arm neon optimizations for
> silk_warped_autocorrelation_FIX(). Please review.
>
> Thanks,
> Felicia
2017 Jan 31
6
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi,
Attached is a patch with arm neon optimizations for
silk_warped_autocorrelation_FIX(). Please review.
Thanks,
Felicia
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001...
2017 Apr 13
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...>
>
> Done.
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Optimize-silk_warped_autocorrelation_FIX-for-ARM-NEO.patch
Type: text/x-patch
Size: 29705 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20170412/c3ac1cd0/attachment-0001.bin>
2017 Apr 11
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...all orders).
>
Done.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Optimize-silk_warped_autocorrelation_FIX-for-ARM-NEO.patch
Type: text/x-patch
Size: 29664 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20170411/300b590e/attachment-0001.bin>
2017 Feb 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...(s10) in2(s9) in3(s8) in4(s7)
> in5(s6) in6(s5) in7(s4)) to the kernel loop (by looping one more time)
> and remove epilog 0, then all final results will be wrong.
>
> That's why the prolog and epilog cannot be saved to the best of my
> knowledge.
>
> The assembly size of silk_warped_autocorrelation_FIX_neon() is about
> 2,744 bytes. Compared with the C code size (about 452 bytes), it's 2.3
> KB larger. Considering silk_warped_autocorrelation_FIX_c() is the second
> place CPU heavy function in fixed-point, and our testing shows up to 7%
> CPU run time saving of the total encoder wi...
2017 Apr 03
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,
Attached is the silk_warped_autocorrelation_FIX_neon() which implements
your idea.
Speed improvement vs the previous optimization:
Complexity 0-4: Doesn't call this function.
Complexity 5: 2.1% (order = 16)
Complexity 6: 1.0% (order = 20)
Complexity 8: 0.1% (order = 24)
Complexity 10: 0.1% (order = 24)
Code size of silk_warped_autocorrela...
2017 Feb 07
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...n7(s4)) to the kernel loop (by looping one more time)
> > and remove epilog 0, then all final results will be wrong.
> >
> > That's why the prolog and epilog cannot be saved to the best of my
> > knowledge.
> >
> > The assembly size of silk_warped_autocorrelation_FIX_neon() is about
> > 2,744 bytes. Compared with the C code size (about 452 bytes), it's 2.3
> > KB larger. Considering silk_warped_autocorrelation_FIX_c() is the
> second
> > place CPU heavy function in fixed-point, and our testing shows up
> to 7%
>...
2016 Jul 01
1
silk_warped_autocorrelation_FIX() NEON optimization
Hi all,
I'm sending the patch "Optimize silk_warped_autocorrelation_FIX() for ARM NEON" in a separate email.
It is based on Tim’s aarch64v8 branch: https://git.xiph.org/?p=users/tterribe/opus.git;a=shortlog;h=refs/heads/aarch64v8
Thanks for your comments.
Linfeng
2017 Apr 05
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...attached a new patch with small cleanup (disassembly is identical as
> the last patch). We have done the same internal testing as usual.
>
> Also, attached 2 failed temporary versions which try to reduce code size
> (just for code review reference purpose).
>
> The new patch of silk_warped_autocorrelation_FIX_neon() has a code size
> of 3,228 bytes (with gcc).
> smaller_slower.c has a code size of 2,304 bytes, but the encoder is
> about 1.8% - 2.7% slower.
> smallest_slowest.c has a code size of 1,656 bytes, but the encoder is
> about 2.3% - 3.6% slower.
>
> Thanks,
> Linfeng
>...
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,
Thanks a lot for reviewing this huge assembly function!
silk_warped_autocorrelation_FIX_c()'s kernel part is
for( n = 0; n < length; n++ ) {
    tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS );
    /* Loop over allpass sections */
    for( i = 0; i < order; i++ ) {
        /* Output of allpass section */
        tmp2_QS = silk_SMLAWB( state_QS[...
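For readers following the truncated kernel above, the loop structure can be sketched in floating point. This is only an illustration under stated assumptions, not the Opus fixed-point code: the name warped_autocorr_sketch, the double-precision types, and the SKETCH_MAX_ORDER bound (24, the largest order mentioned in this thread) are all hypothetical, and the Q-format shifts and scaling of the real function are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_ORDER 24  /* hypothetical bound chosen for this sketch */

/* Floating-point sketch of a warped autocorrelation (NOT the Opus
 * fixed-point implementation). corr must hold order + 1 values. */
void warped_autocorr_sketch(double *corr, const double *input,
                            double warping, size_t length, size_t order)
{
    double state[SKETCH_MAX_ORDER + 1] = { 0.0 };
    size_t n, i;

    assert(order <= SKETCH_MAX_ORDER);
    for (i = 0; i <= order; i++) {
        corr[i] = 0.0;
    }

    /* Loop over samples */
    for (n = 0; n < length; n++) {
        double tmp1 = input[n];
        /* Loop over allpass sections */
        for (i = 0; i < order; i++) {
            /* Output of first-order allpass section i */
            double tmp2 = state[i] + warping * (state[i + 1] - tmp1);
            state[i] = tmp1;
            corr[i] += state[0] * tmp1;
            tmp1 = tmp2;
        }
        state[order] = tmp1;
        corr[order] += state[0] * tmp1;
    }
}
```

With warping set to 0 the allpass chain degenerates into a plain delay line, so the sketch reduces to an ordinary autocorrelation, which makes it easy to sanity-check by hand. Each section depends on the output of the previous one within the same sample, and that serial dependency is what the prolog/epilog discussion in this thread is about.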
2017 Apr 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...y is identical as
> > the last patch). We have done the same internal testing as usual.
> >
> > Also, attached 2 failed temporary versions which try to reduce code size
> > (just for code review reference purpose).
> >
> > The new patch of silk_warped_autocorrelation_FIX_neon() has a code size
> > of 3,228 bytes (with gcc).
> > smaller_slower.c has a code size of 2,304 bytes, but the encoder is
> > about 1.8% - 2.7% slower.
> > smallest_slowest.c has a code size of 1,656 bytes, but the encoder is
> > about 2.3% - 3....
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...> > in5(s6) in6(s5) in7(s4)) to the kernel loop (by looping one more time)
> > and remove epilog 0, then all final results will be wrong.
> >
> > That's why the prolog and epilog cannot be saved to the best of my
> > knowledge.
> >
> > The assembly size of silk_warped_autocorrelation_FIX_neon() is about
> > 2,744 bytes. Compared with the C code size (about 452 bytes), it's 2.3
> > KB larger. Considering silk_warped_autocorrelation_FIX_c() is the second
> > place CPU heavy function in fixed-point, and our testing shows up to 7%
> > CPU run time saving of...
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ing one more
> time)
> > > and remove epilog 0, then all final results will be wrong.
> > >
> > > That's why the prolog and epilog cannot be saved to the best of my
> > > knowledge.
> > >
> > > The assembly size of silk_warped_autocorrelation_FIX_neon() is
> about
> > > 2,744 bytes. Compared with the C code size (about 452 bytes), it's
> 2.3
> > > KB larger. Considering silk_warped_autocorrelation_FIX_c() is the
> > second
> > > place CPU heavy function in fixed-point, and our testi...
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
I attached a new patch with a small cleanup (the disassembly is identical to
that of the last patch). We have done the same internal testing as usual.
I also attached 2 unsuccessful experimental versions that try to reduce code
size (for code review reference only).
The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of
3,228 bytes (with gcc).
smaller_slower.c has a code size of 2,304 bytes, but the encoder is about
1.8% - 2.7% slower.
smallest_slowest.c has a code size of 1,656 bytes, but the encoder is about
2.3% - 3.6% slower.
Thanks,
Linfeng
On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Z...
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...mall cleanup (disassembly is identical as
> > the last patch). We have done the same internal testing as usual.
> >
> > Also, attached 2 failed temporary versions which try to reduce code size
> > (just for code review reference purpose).
> >
> > The new patch of silk_warped_autocorrelation_FIX_neon() has a code size
> > of 3,228 bytes (with gcc).
> > smaller_slower.c has a code size of 2,304 bytes, but the encoder is
> > about 1.8% - 2.7% slower.
> > smallest_slowest.c has a code size of 1,656 bytes, but the encoder is
> > about 2.3% - 3.6% slower.
> >...
2017 Jun 02
0
[PATCH] Don't use MAY_HAVE_NEON in arm_silk_map.c.
..._dec_c, /* EDSP */
silk_NSQ_del_dec_c, /* Media */
- MAY_HAVE_NEON(silk_NSQ_del_dec), /* Neon */
+ silk_NSQ_del_dec_neon, /* Neon */
};
/*There is no table for silk_noise_shape_quantizer_short_prediction because the
@@ -115,7 +115,7 @@ void (*const SILK_WARPED_AUTOCORRELATION_FIX_IMPL[OPUS_ARCHMASK + 1])(
silk_warped_autocorrelation_FIX_c, /* ARMv4 */
silk_warped_autocorrelation_FIX_c, /* EDSP */
silk_warped_autocorrelation_FIX_c, /* Media */
- MAY_HAVE_NEON(silk_warped_autocorrelation_FIX), /* Neon */
+...
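The table being patched above follows a run-time dispatch pattern: an array of function pointers indexed by the detected CPU level. A minimal sketch of that pattern, assuming four levels as in the diff; every name here (SKETCH_ARCHMASK, sum_c, sum_neon_stub, SUM_IMPL, sum_dispatch) is hypothetical, and the "Neon" entry is a plain-C stand-in rather than real NEON code:

```c
#include <stddef.h>

#define SKETCH_ARCHMASK 3  /* hypothetical: four slots, mirroring ARMv4/EDSP/Media/Neon */

typedef int (*sum_fn)(const int *x, size_t n);

/* Portable C implementation. */
static int sum_c(const int *x, size_t n)
{
    int acc = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}

/* Stand-in for an optimized variant; a real build would use NEON here. */
static int sum_neon_stub(const int *x, size_t n)
{
    return sum_c(x, n);
}

/* One entry per CPU level; lower levels fall back to the C version. */
static const sum_fn SUM_IMPL[SKETCH_ARCHMASK + 1] = {
    sum_c,         /* ARMv4 */
    sum_c,         /* EDSP */
    sum_c,         /* Media */
    sum_neon_stub, /* Neon */
};

/* Select the implementation for the detected CPU level at run time. */
int sum_dispatch(const int *x, size_t n, int arch)
{
    return SUM_IMPL[arch & SKETCH_ARCHMASK](x, n);
}
```

In a table like this, naming the optimized function directly only links if that function is actually compiled in, which is presumably why the patch is concerned with which macro guards the entry.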
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches to the current master with minor changes.
Patches 1 to 3 replace all my previous submitted patches.
Patches 4 and 5 are new.
Thanks,
Linfeng Zhang
2017 Jun 02
2
Opus floating-point NEON jump table question
Thanks, Jonathan!
I'll fix the MAY_HAVE_NEON() in silk/arm/arm_silk_map.c.
Linfeng
On Thu, Jun 1, 2017 at 3:34 PM, Jonathan Lennox <jonathan at vidyo.com> wrote:
> Semantically, OPUS_ARM_MAY_HAVE_NEON is supposed to mean the compiler
> supports, and the CPU may support, Neon assembly code, which isn’t
> necessarily the same thing as the compiler supporting Neon intrinsics.
>