search for: silk_warped_autocorrelation_fix

Displaying 20 results from an estimated 20 matches for "silk_warped_autocorrelation_fix".

2017 Jan 31
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Felicia, Thanks for the patch. Can you give more details on what checks/tests you've done so far on this patch? Thanks, Jean-Marc On 31/01/17 12:30 PM, Felicia Lim wrote: > Hi, > > Attached is a patch with arm neon optimizations for > silk_warped_autocorrelation_FIX(). Please review. > > Thanks, > Felicia > > > _______________________________________________ > opus mailing list > opus at xiph.org > http://lists.xiph.org/mailman/listinfo/opus >
2017 Feb 02
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...that so much unrolled prolog/epilog code is needed. Did you try having less unrolling for the prolog/epilog? That would be nicer to the I-cache (if possible). Cheers, Jean-Marc On 31/01/17 12:30 PM, Felicia Lim wrote: > Hi, > > Attached is a patch with arm neon optimizations for > silk_warped_autocorrelation_FIX(). Please review. > > Thanks, > Felicia
2017 Feb 05
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...C = C + X*Y I think something similar to this (assuming I didn't mess up any details) should give you the correlations in vector C. Did I miss anything? Cheers, Jean-Marc On 31/01/17 12:30 PM, Felicia Lim wrote: > Hi, > > Attached is a patch with arm neon optimizations for > silk_warped_autocorrelation_FIX(). Please review. > > Thanks, > Felicia
2017 Jan 31
6
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi, Attached is a patch with arm neon optimizations for silk_warped_autocorrelation_FIX(). Please review. Thanks, Felicia -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170131/9a912bb4/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001...
2017 Apr 13
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...> > > Done. > -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170412/c3ac1cd0/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Optimize-silk_warped_autocorrelation_FIX-for-ARM-NEO.patch Type: text/x-patch Size: 29705 bytes Desc: not available URL: <http://lists.xiph.org/pipermail/opus/attachments/20170412/c3ac1cd0/attachment-0001.bin>
2017 Apr 11
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...all orders). > Done. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170411/300b590e/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Optimize-silk_warped_autocorrelation_FIX-for-ARM-NEO.patch Type: text/x-patch Size: 29664 bytes Desc: not available URL: <http://lists.xiph.org/pipermail/opus/attachments/20170411/300b590e/attachment-0001.bin>
2017 Feb 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...(s10) in2(s9) in3(s8) in4(s7) > in5(s6) in6(s5) in7(s4)) to the kernel loop (by looping one more time) > and remove epilog 0, then all final results will be wrong. > > That's why the prolog and epilog cannot be saved to the best of my > knowledge. > > The assembly size of silk_warped_autocorrelation_FIX_neon() is about > 2,744 bytes. Compared with the C code size (about 452 bytes), it's 2.3 > KB larger. Considering silk_warped_autocorrelation_FIX_c() is the second > place CPU heavy function in fixed-point, and our testing shows up to 7% > CPU run time saving of the total encoder wi...
2017 Apr 03
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc, Attached is the silk_warped_autocorrelation_FIX_neon() which implements your idea. Speed improvement vs the previous optimization: Complexity 0-4: Doesn't call this function. Complexity 5: 2.1% (order = 16) Complexity 6: 1.0% (order = 20) Complexity 8: 0.1% (order = 24) Complexity 10: 0.1% (order = 24) Code size of silk_warped_autocorrela...
2017 Feb 07
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...n7(s4)) to the kernel loop (by looping one more time) > > and remove epilog 0, then all final results will be wrong. > > > > That's why the prolog and epilog cannot be saved to the best of my > > knowledge. > > > > The assembly size of silk_warped_autocorrelation_FIX_neon() is about > > 2,744 bytes. Compared with the C code size (about 452 bytes), it's 2.3 > > KB larger. Considering silk_warped_autocorrelation_FIX_c() is the > second > > place CPU heavy function in fixed-point, and our testing shows up > to 7% >...
2016 Jul 01
1
silk_warped_autocorrelation_FIX() NEON optimization
Hi all, I'm sending the patch "Optimize silk_warped_autocorrelation_FIX() for ARM NEON" in a separate email. It is based on Tim’s aarch64v8 branch https://git.xiph.org/?p=users/tterribe/opus.git;a=shortlog;h=refs/heads/aarch64v8 Thanks for your comments. Linfeng
2017 Apr 05
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...attached a new patch with small cleanup (disassembly is identical as > the last patch). We have done the same internal testing as usual. > > Also, attached 2 failed temporary versions which try to reduce code size > (just for code review reference purpose). > > The new patch of silk_warped_autocorrelation_FIX_neon() has a code size > of 3,228 bytes (with gcc). > smaller_slower.c has a code size of 2,304 bytes, but the encoder is > about 1.8% - 2.7% slower. > smallest_slowest.c has a code size of 1,656 bytes, but the encoder is > about 2.3% - 3.6% slower. > > Thanks, > Linfeng >...
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc, Thanks a lot for reviewing this huge assembly function! silk_warped_autocorrelation_FIX_c()'s kernel part is for( n = 0; n < length; n++ ) { tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS ); /* Loop over allpass sections */ for( i = 0; i < order; i++ ) { /* Output of allpass section */ tmp2_QS = silk_SMLAWB( state_QS[...
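For readers following the kernel quoted above, here is a minimal floating-point sketch of the same loop structure in C. It assumes that silk_SMLAWB( a, b, c ) amounts to a + b * c once the Q16 scaling is folded into a plain warping coefficient, and it drops the QS/QC fixed-point shifts entirely. The function name warped_autocorr_sketch, the constant MAX_ORDER_SKETCH, and the use of double are illustrative assumptions, not the Opus code.

```c
#include <stddef.h>

/* Hypothetical upper bound on the filter order for this sketch only. */
#define MAX_ORDER_SKETCH 24

/* Floating-point sketch of a warped autocorrelation.
 * corr must have room for order + 1 values.
 * Returns 0 on success, -1 if the arguments are out of range. */
static int warped_autocorr_sketch(double *corr, const double *input,
                                  double warping, int length, int order)
{
    double state[MAX_ORDER_SKETCH + 1] = { 0.0 };
    int n, i;

    if (order < 0 || order > MAX_ORDER_SKETCH || length < 0) {
        return -1;
    }
    for (i = 0; i <= order; i++) {
        corr[i] = 0.0;
    }

    /* Loop over samples */
    for (n = 0; n < length; n++) {
        double tmp1 = input[n];
        /* Loop over allpass sections */
        for (i = 0; i < order; i++) {
            /* Output of allpass section i */
            double tmp2 = state[i] + warping * (state[i + 1] - tmp1);
            state[i] = tmp1;
            /* Correlate the newest sample with the input of section i */
            corr[i] += state[0] * tmp1;
            tmp1 = tmp2;
        }
        state[order] = tmp1;
        corr[order] += state[0] * tmp1;
    }
    return 0;
}
```

With warping set to 0 every allpass section reduces to a one-sample delay, so the sketch returns the ordinary autocorrelation: for the input {1, 2, 3, 4} and order 2 it yields 30, 20 and 11. The dependency of each section on the output of the previous one is what makes the inner loop awkward to vectorize directly.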
2017 Apr 06
0
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...y is identical as > > the last patch). We have done the same internal testing as usual. > > > > Also, attached 2 failed temporary versions which try to reduce code size > > (just for code review reference purpose). > > > > The new patch of silk_warped_autocorrelation_FIX_neon() has a code size > > of 3,228 bytes (with gcc). > > smaller_slower.c has a code size of 2,304 bytes, but the encoder is > > about 1.8% - 2.7% slower. > > smallest_slowest.c has a code size of 1,656 bytes, but the encoder is > > about 2.3% - 3....
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...> > in5(s6) in6(s5) in7(s4)) to the kernel loop (by looping one more time) > > and remove epilog 0, then all final results will be wrong. > > > > That's why the prolog and epilog cannot be saved to the best of my > > knowledge. > > > > The assembly size of silk_warped_autocorrelation_FIX_neon() is about > > 2,744 bytes. Compared with the C code size (about 452 bytes), it's 2.3 > > KB larger. Considering silk_warped_autocorrelation_FIX_c() is the second > > place CPU heavy function in fixed-point, and our testing shows up to 7% > > CPU run time saving of...
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...ing one more > time) > > > and remove epilog 0, then all final results will be wrong. > > > > > > That's why the prolog and epilog cannot be saved to the best of my > > > knowledge. > > > > > > The assembly size of silk_warped_autocorrelation_FIX_neon() is > about > > > 2,744 bytes. Compared with the C code size (about 452 bytes), it's > 2.3 > > > KB larger. Considering silk_warped_autocorrelation_FIX_c() is the > > second > > > place CPU heavy function in fixed-point, and our testi...
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
I attached a new patch with small cleanup (disassembly is identical as the last patch). We have done the same internal testing as usual. Also, attached 2 failed temporary versions which try to reduce code size (just for code review reference purpose). The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of 3,228 bytes (with gcc). smaller_slower.c has a code size of 2,304 bytes, but the encoder is about 1.8% - 2.7% slower. smallest_slowest.c has a code size of 1,656 bytes, but the encoder is about 2.3% - 3.6% slower. Thanks, Linfeng On Mon, Apr 3, 2017 at 3:01 PM, Linfeng Z...
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...mall cleanup (disassembly is identical as > > the last patch). We have done the same internal testing as usual. > > > > Also, attached 2 failed temporary versions which try to reduce code size > > (just for code review reference purpose). > > > > The new patch of silk_warped_autocorrelation_FIX_neon() has a code size > > of 3,228 bytes (with gcc). > > smaller_slower.c has a code size of 2,304 bytes, but the encoder is > > about 1.8% - 2.7% slower. > > smallest_slowest.c has a code size of 1,656 bytes, but the encoder is > > about 2.3% - 3.6% slower. > >...
2017 Jun 02
0
[PATCH] Don't use MAY_HAVE_NEON in arm_silk_map.c.
..._dec_c, /* EDSP */ silk_NSQ_del_dec_c, /* Media */ - MAY_HAVE_NEON(silk_NSQ_del_dec), /* Neon */ + silk_NSQ_del_dec_neon, /* Neon */ }; /*There is no table for silk_noise_shape_quantizer_short_prediction because the @@ -115,7 +115,7 @@ void (*const SILK_WARPED_AUTOCORRELATION_FIX_IMPL[OPUS_ARCHMASK + 1])( silk_warped_autocorrelation_FIX_c, /* ARMv4 */ silk_warped_autocorrelation_FIX_c, /* EDSP */ silk_warped_autocorrelation_FIX_c, /* Media */ - MAY_HAVE_NEON(silk_warped_autocorrelation_FIX), /* Neon */ +...
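The table touched by this diff follows a common run-time dispatch pattern: an array of function pointers indexed by the detected CPU level, with the portable C routine filling every slot below Neon. A minimal sketch of that pattern in C follows; ARCHMASK_SKETCH, kernel_c, kernel_neon and select_kernel are hypothetical stand-ins, and both kernels are ordinary C here so the example runs anywhere.

```c
#include <stddef.h>

/* Hypothetical stand-in for OPUS_ARCHMASK: four CPU levels, 0..3. */
#define ARCHMASK_SKETCH 3

typedef int (*kernel_fn)(int);

/* Stand-in for the portable C implementation. */
static int kernel_c(int x)    { return x + 1; }
/* Stand-in for the Neon implementation; plain C in this sketch. */
static int kernel_neon(int x) { return x + 1; }

/* One entry per CPU level, mirroring the ARMv4 / EDSP / Media / Neon layout. */
static kernel_fn const KERNEL_IMPL[ARCHMASK_SKETCH + 1] = {
    kernel_c,    /* ARMv4 */
    kernel_c,    /* EDSP  */
    kernel_c,    /* Media */
    kernel_neon, /* Neon  */
};

/* Masking the index keeps the lookup inside the table. */
static kernel_fn select_kernel(int arch)
{
    return KERNEL_IMPL[arch & ARCHMASK_SKETCH];
}
```

A caller would fetch the pointer once per call site, for example select_kernel(arch)(value), so the choice of implementation costs a single indexed load.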
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches to the current master with minor changes. Patches 1 to 3 replace all my previous submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang
2017 Jun 02
2
Opus floating-point NEON jump table question
Thanks, Jonathan! I'll fix the MAY_HAVE_NEON() in silk/arm/arm_silk_map.c. Linfeng On Thu, Jun 1, 2017 at 3:34 PM, Jonathan Lennox <jonathan at vidyo.com> wrote: > Semantically, OPUS_ARM_MAY_HAVE_NEON is supposed to mean the compiler > supports, and the CPU may support, Neon assembly code, which isn’t > necessarily the same thing as the compiler supporting Neon intrinsics. >
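The distinction Jonathan draws can be illustrated with a token-pasting macro whose expansion depends on an assembly-only feature flag. The sketch below is an assumption about the general shape of such a macro, not a copy of the Opus headers: SKETCH_MAY_HAVE_NEON_ASM, SKETCH_MAY_HAVE_NEON, scale_c, scale_neon and pick_scale are all hypothetical names.

```c
/* Hypothetical flag: defined when the build can use Neon *assembly*. */
#define SKETCH_MAY_HAVE_NEON_ASM 1

/* Expands to name_neon when the assembly flag is set, name_c otherwise. */
#if defined(SKETCH_MAY_HAVE_NEON_ASM)
# define SKETCH_MAY_HAVE_NEON(name) name ## _neon
#else
# define SKETCH_MAY_HAVE_NEON(name) name ## _c
#endif

typedef int (*scale_fn)(int);

/* Portable implementation. */
static int scale_c(int x)    { return 2 * x; }
/* Stand-in for a kernel written with intrinsics; plain C in this sketch. */
static int scale_neon(int x) { return 2 * x; }

/* The symbol chosen here is decided by the assembly flag alone. */
static scale_fn pick_scale(void)
{
    return SKETCH_MAY_HAVE_NEON(scale);
}
```

If a kernel is built from intrinsics, gating it on the assembly flag can go wrong in both directions: the macro may name a _neon symbol that was never compiled, or fall back to _c even though the intrinsics version is available. Naming the intrinsics function directly in the table, under a guard that reflects intrinsics support, avoids that mismatch.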