Linfeng Zhang
2017-Mar-01 19:30 UTC
[opus] [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Timothy,

> Do you think it would be possible to improve the API of xcorr_kernel() so
> that calling it in a loop is more efficient?

If it could be inlined, it would be more efficient. Besides the memory bouncing, frequent function calls are expensive.

> The other advantage to wiring up xcorr_kernel() is that it applies in more
> places than your intrinsics-only celt_fir() implementation.

I agree.

One solution is to put the outer for(N) loop inside xcorr_kernel() so that it returns N results instead of 4 (similar to what the celt_fir() NEON intrinsics did). This would make it both efficient and universal.

Thanks,
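For context, here is a minimal sketch of what "returns 4 results" means for a cross-correlation kernel, assuming float samples and a kernel that accumulates four consecutive lags per call. The name xcorr_kernel_sketch() is hypothetical; this is a simplified illustration, not the Opus xcorr_kernel() source, whose fixed-point and SIMD variants differ.

```c
#include <assert.h>

/* Hypothetical, simplified float sketch of a 4-lag cross-correlation kernel
 * (not the Opus source). It accumulates, for k = 0..3:
 *   sum[k] += x[0]*y[k] + x[1]*y[k+1] + ... + x[len-1]*y[k+len-1]
 * so y must hold at least len + 3 samples, and the caller zeroes sum[]. */
static void xcorr_kernel_sketch(const float *x, const float *y,
                                float sum[4], int len)
{
    assert(len >= 1);
    for (int j = 0; j < len; j++) {
        const float xj = x[j];
        sum[0] += xj * y[j];
        sum[1] += xj * y[j + 1];
        sum[2] += xj * y[j + 2];
        sum[3] += xj * y[j + 3];
    }
}
```

Because each call yields only 4 outputs, computing N lags takes roughly N/4 calls; that per-call overhead is what matters when the call cannot be inlined.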
Ulrich Windl
2017-Mar-02 07:27 UTC
[opus] Antw: Re: [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi!

I'm not deep in the code, but from my experience even older gcc (4.3.4) does function inlining at -O2, and at -O3 it inlines almost any function inside one module. Once I even let it inline across modules (-combine). I'm not talking about explicit inline functions, just about automatic optimization.

So did you check that frequent function calls actually happen? I'm a bit afraid that after all the suggested optimizations the code may be rather hard to understand. I think compilers should do the dirty work (i.e., optimizing and inlining). Sometimes "static" and "const" attributes help the compiler to optimize...

Regards,
Ulrich
Linfeng Zhang
2017-Mar-02 21:12 UTC
[opus] [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Thanks, Ulrich!

Yes, but when the jump table is active, the platform-specific optimization functions cannot be inlined, because they are called indirectly through function pointers selected at run time.
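A minimal sketch of why a dispatch ("jump") table blocks inlining, assuming a function-pointer table indexed by an arch value that is only known at run time. All names here (inner_prod_c, inner_prod_unrolled, INNER_PROD_IMPL, inner_prod_dispatch) are hypothetical, and the unrolled entry merely stands in for a NEON path.

```c
#include <assert.h>

typedef float (*inner_prod_fn)(const float *a, const float *b, int n);

/* Generic C implementation (hypothetical) */
static float inner_prod_c(const float *a, const float *b, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Stand-in for a platform-specific path: plain C unrolled by 2 */
static float inner_prod_unrolled(const float *a, const float *b, int n)
{
    float s0 = 0.0f, s1 = 0.0f;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n)                      /* odd trailing element */
        s0 += a[i] * b[i];
    return s0 + s1;
}

/* Hypothetical table indexed by an arch value detected at run time */
static const inner_prod_fn INNER_PROD_IMPL[2] = {
    inner_prod_c,        /* arch 0: generic C */
    inner_prod_unrolled  /* arch 1: stand-in for an optimized path */
};

static float inner_prod_dispatch(const float *a, const float *b, int n,
                                 int arch)
{
    assert(arch >= 0 && arch < 2);
    /* Indirect call: with arch unknown at compile time, the compiler
     * generally cannot inline the selected function here. */
    return INNER_PROD_IMPL[arch](a, b, n);
}
```

Since the callee is picked through a pointer at run time, each call is a real indirect call; moving a loop inside the dispatched function lets a single such call produce many outputs.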
Timothy B. Terriberry
2017-Mar-21 22:56 UTC
[opus] [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Linfeng Zhang wrote:
> One solution is to put the outer for(N) loop inside xcorr_kernel() to
> let it return N results instead of 4 (similar to the celt_fir() NEON
> intrinsics did). This will make it efficient plus universal.

Sorry for not replying to this earlier, but isn't this what celt_pitch_xcorr() does? Or am I missing something?
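A simplified sketch of the structure being referred to, assuming float samples: an outer loop over lags calls a 4-lag kernel and finishes any remaining lags with a plain inner product. xcorr4_sketch() and pitch_xcorr_sketch() are hypothetical names; the real celt_pitch_xcorr() has fixed-point and platform-specific variants not shown here.

```c
#include <assert.h>

/* Hypothetical 4-lag kernel: sum[k] += sum_j x[j] * y[j + k], k = 0..3 */
static void xcorr4_sketch(const float *x, const float *y, float sum[4],
                          int len)
{
    for (int j = 0; j < len; j++)
        for (int k = 0; k < 4; k++)
            sum[k] += x[j] * y[j + k];
}

/* Hypothetical pitch cross-correlation:
 *   xcorr[i] = sum_{j < len} x[j] * y[i + j],  i = 0..max_pitch-1
 * y must hold at least len + max_pitch - 1 samples. */
static void pitch_xcorr_sketch(const float *x, const float *y, float *xcorr,
                               int len, int max_pitch)
{
    assert(len > 0 && max_pitch > 0);
    int i = 0;
    /* Outer loop: 4 lags per kernel call */
    for (; i + 3 < max_pitch; i += 4) {
        float sum[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        xcorr4_sketch(x, y + i, sum, len);
        for (int k = 0; k < 4; k++)
            xcorr[i + k] = sum[k];
    }
    /* Remaining lags (fewer than 4): one inner product each */
    for (; i < max_pitch; i++) {
        float s = 0.0f;
        for (int j = 0; j < len; j++)
            s += x[j] * y[i + j];
        xcorr[i] = s;
    }
}
```

Here one call to pitch_xcorr_sketch() returns max_pitch results, which is the "N results instead of 4" shape proposed above; whether the inner kernel call is inlined is a separate question from where the loop lives.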