Timothy B. Terriberry
2016-Sep-28 01:42 UTC
[opus] [PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON
Linfeng Zhang wrote:
> +#ifdef SMALL_FOOTPRINT
> + for (i=0;i<N-7;i+=8)
> + {
> [snip over 80 lines of complicated NEON intrinsics code]
> + }
> +#else

So, one of the points of SMALL_FOOTPRINT is to reduce the code size on targets where this matters (even if it means running slower), but this is an awful lot of code. I think it makes much more sense to expose the existing xcorr_kernel asm and use that. I wrote a simple patch demonstrating this (attached... it applies on top of your full series, so it'd be a little work to rebase it into place here). It adds one 16-byte table and 16 instructions, and even gives speed-ups on non-NEON CPUs by reusing the existing EDSP asm.

Testing on comp48-stereo.sw encoded to 64 kbps and decoded with a 15% loss rate on a Novena using opus_demo (by using RTCD and changing the function pointers to the version of the code to test), optimizing xcorr_kernel gives almost as much speed-up as intrinsics for all of celt_fir:

celt_fir_c, xcorr_kernel_c:    1753 ms (stddev 9)  [1730 1740 {1740 1740 1740 1750 1750 1750 1750 1750 1750 1750 1750 1750 1750 1750 1760 1760 1760 1760 1770 1770} 1780 1860]
celt_fir_c, xcorr_kernel_neon: 1710 ms (stddev 12) [1680 1690 {1690 1690 1700 1700 1700 1700 1710 1710 1710 1710 1710 1710 1710 1710 1710 1720 1720 1730 1730 1730} 1740 1810]
celt_fir_neon:                 1695 ms (stddev 9)  [1670 1680 {1680 1680 1680 1690 1690 1690 1690 1690 1690 1690 1700 1700 1700 1700 1700 1700 1700 1700 1710 1710} 1720 1790]

It might even be enough to use this for the non-SMALL_FOOTPRINT case. What do you think?
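The idea of building celt_fir on top of an xcorr_kernel-style routine can be sketched in portable C. This is a minimal sketch, not the code in the attached patch: it assumes the filter y[i] = x[i] + sum_j num[j]*x[i-j-1] with taps in a hypothetical Q12 format, and `xcorr4()`, `round_sat16()`, `fir_via_xcorr()`, `MAX_ORD` and `COEF_SHIFT` are hypothetical names. Reversing the taps turns the convolution into a correlation, so one call to a four-lag kernel produces four consecutive outputs.

```c
#include <stdint.h>

#define MAX_ORD 32     /* hypothetical upper bound on filter order */
#define COEF_SHIFT 12  /* hypothetical: taps stored in Q12 */

/* Portable stand-in for an xcorr_kernel-style routine: accumulates four
   correlation lags at once, sum[k] += sum_j x[j] * y[j + k] for k = 0..3.
   Reads y[0] .. y[len + 2]. */
static void xcorr4(const int16_t *x, const int16_t *y, int32_t sum[4], int len)
{
    for (int j = 0; j < len; j++) {
        int32_t xj = x[j];
        sum[0] += xj * y[j];
        sum[1] += xj * y[j + 1];
        sum[2] += xj * y[j + 2];
        sum[3] += xj * y[j + 3];
    }
}

/* Round a Q(COEF_SHIFT) accumulator back to an integer sample and saturate.
   Assumes arithmetic right shift of negative values (common ARM/x86 compilers). */
static int16_t round_sat16(int32_t acc)
{
    int32_t v = (acc + (1 << (COEF_SHIFT - 1))) >> COEF_SHIFT;
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* y[i] = x[i] + sum_{j=0}^{ord-1} num[j] * x[i - j - 1], num in Q12.
   x must point ord samples into a buffer holding ord samples of history.
   Returns 0 on success, -1 if ord is out of range. No intermediate
   overflow checks: this is a sketch, not production code. */
static int fir_via_xcorr(const int16_t *x, const int16_t *num, int16_t *y,
                         int n, int ord)
{
    int16_t rnum[MAX_ORD];
    int i;
    if (ord < 1 || ord > MAX_ORD) return -1;
    /* Reverse the taps so the convolution becomes a correlation. */
    for (i = 0; i < ord; i++) rnum[i] = num[ord - 1 - i];
    /* Main loop: one kernel call yields four consecutive outputs. */
    for (i = 0; i + 3 < n; i += 4) {
        int32_t sum[4];
        for (int k = 0; k < 4; k++) sum[k] = (int32_t)x[i + k] * (1 << COEF_SHIFT);
        xcorr4(rnum, x + i - ord, sum, ord);
        for (int k = 0; k < 4; k++) y[i + k] = round_sat16(sum[k]);
    }
    /* Tail: remaining outputs when n is not a multiple of 4. */
    for (; i < n; i++) {
        int32_t sum = (int32_t)x[i] * (1 << COEF_SHIFT);
        for (int j = 0; j < ord; j++) sum += (int32_t)rnum[j] * x[i + j - ord];
        y[i] = round_sat16(sum);
    }
    return 0;
}
```

Because all the arithmetic lives in the kernel, an optimized NEON or EDSP version can replace `xcorr4()` without adding any FIR-specific code, which is the code-size argument for SMALL_FOOTPRINT.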
Timothy B. Terriberry
2016-Sep-28 01:45 UTC
[opus] [PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON
Timothy B. Terriberry wrote:
> and use that. I wrote a simple patch demonstrating this (attached... it

Err, actually attached this time.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Test-celt_xcorr_kernel-speed.patch
Type: text/x-patch
Size: 15319 bytes
Desc: not available
URL: <http://lists.xiph.org/pipermail/opus/attachments/20160927/da90702e/attachment.bin>
Linfeng Zhang
2016-Sep-28 16:45 UTC
[opus] [PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON
Hi Tim,

Thanks for the speed testing and the patch. Yes, I agree that it makes sense to reuse xcorr_kernel_neon() in SMALL_FOOTPRINT to save code size.

For non-SMALL_FOOTPRINT, celt_fir_neon() is only (1710-1695)/1710 = 0.88% faster than celt_fir_c() with xcorr_kernel_neon(). Maybe it's still worth keeping celt_fir_neon() there. However, considering that celt_fir() is now disabled by default, it's up to you whether this patch (and the previous one) should be skipped. Let me know and I'll rebase all the following patches that are based on them.

Thanks,
Linfeng
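The percentages in this exchange can be rechecked from the mean timings in Tim's message (1753, 1710 and 1695 ms). A minimal sketch; `speedup_pct()` and `gain_share()` are hypothetical helper names:

```c
/* Relative run-time reduction, in percent, of opt_ms versus base_ms. */
static double speedup_pct(double base_ms, double opt_ms)
{
    return (base_ms - opt_ms) / base_ms * 100.0;
}

/* Fraction of the full gain (base_ms -> full_ms) captured by an
   intermediate optimization (base_ms -> partial_ms). */
static double gain_share(double base_ms, double partial_ms, double full_ms)
{
    return (base_ms - partial_ms) / (base_ms - full_ms);
}
```

With those means, xcorr_kernel_neon alone cuts run time by about 2.45% and celt_fir_neon by about 3.31%, so the kernel-only approach captures roughly 74% (43 of 58 ms) of the full gain; the remaining difference is the 0.88% above.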