search for: celt_fir_permit_overflow

Displaying 9 results from an estimated 9 matches for "celt_fir_permit_overflow".

2017 Feb 15
4
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi, Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(), and Patch 2 optimizes the new function celt_fir_permit_overflow() for ARM NEON. Please recommend a better function name. We did the same internal code review and testing already. Thanks, Linfeng
2017 Feb 15
2
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...messy. It has 2 branches chosen by #ifdef SMALL_FOOTPRINT. For floating-point, the 2 branches are identical (except for the order in which x[i] is accumulated into the sum, which is not a big deal). For fixed-point, the 2 branches differ. I separated them into 2 functions: the new celt_fir(), and celt_fir_permit_overflow(), which is the SMALL_FOOTPRINT branch. The only difference for fixed-point is: celt_fir(): the sum is truncated first, then accumulated to x[i] and saturated. celt_fir_permit_overflow(): x[i] is accumulated to the sum first, and the result is then truncated and saturated. Maybe this is the reason why silk_LPC_ana...
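The two fixed-point orderings described in this snippet can be sketched in plain C. This is a minimal illustration, not the Opus code: the function names, the `SHIFT` value of 12, and the use of a rounding right shift for the scale-down step are all assumptions made for the example. The filter accumulator is passed in as `sum`, and only the final step that combines it with `x[i]` is shown.

```c
#include <stdint.h>

#define SHIFT 12  /* assumed Q-format shift, chosen for illustration only */

/* Clamp a 32-bit value to the 16-bit signed range. */
static int16_t saturate16(int32_t v)
{
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Rounding right shift, computed in 64 bits so the rounding offset cannot
 * overflow. Assumes an arithmetic right shift for negative values, as on
 * GCC and Clang. */
static int32_t pshr32(int32_t v, int shift)
{
    return (int32_t)(((int64_t)v + ((int64_t)1 << (shift - 1))) >> shift);
}

/* Ordering 1 (hypothetical name): scale the sum down first, then add x
 * and saturate. The addition cannot overflow 32 bits. */
static int16_t round_then_add(int32_t sum, int16_t x)
{
    return saturate16((int32_t)x + pshr32(sum, SHIFT));
}

/* Ordering 2 (hypothetical name): add the scaled-up x to the sum first,
 * then scale down and saturate. The addition is done on unsigned values,
 * so it wraps modulo 2^32 instead of being undefined behaviour. Converting
 * the result back to int32_t is implementation-defined before C23 and
 * gives the two's-complement value on mainstream compilers. */
static int16_t add_then_round(int32_t sum, int16_t x)
{
    uint32_t acc = (uint32_t)sum + ((uint32_t)x << SHIFT);
    return saturate16(pshr32((int32_t)acc, SHIFT));
}
```

With these definitions, the two orderings agree whenever the 32-bit addition does not wrap, because the scaled-up `x` is an exact multiple of `1 << SHIFT`. They diverge only when the accumulator is close to the 32-bit limits, where the second ordering wraps around before saturation is applied.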
2017 Feb 18
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...ring the #else case a bit (see below). > For floating-point, the 2 branches are identical (except the operation > sequence of accumulating x[i] to sum, which is not a big deal). > For fixed-point, the 2 branches are different. I separate them into 2 > functions: the new celt_fir(), and celt_fir_permit_overflow() which is > the SMALL_FOOTPRINT branch. > The only difference for fixed-point is: > celt_fir(): the sum is truncated first and then accumulated to x[i] and > saturated. > celt_fir_permit_overflow(): x[i] is accumulated to the sum first and > then truncated and saturated. Actually, t...
2017 Feb 15
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...It seems to me like it's mostly duplicating the celt_fir() optimizations? Did I miss anything? Cheers, Jean-Marc On 15/02/17 02:22 PM, Linfeng Zhang wrote: > Hi, > > Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(). > And Patch 2 optimizes the new function celt_fir_permit_overflow() for > ARM NEON. > > Please recommend a better function name. > > We did the same internal code review and testing already. > > Thanks, > Linfeng
2017 Feb 15
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Linfeng, On 15/02/17 02:22 PM, Linfeng Zhang wrote: > Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(). > And Patch 2 optimizes the new function celt_fir_permit_overflow() for > ARM NEON. > > Please recommend a better function name. In most other cases, we've just added the _ovflw() suffix to functions/macros where signed overflow is allowed (suppressed using unsigned cast). > We did the same internal code review and testing already. Thanks, I...
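The "suppressed using unsigned cast" idea mentioned in this reply can be sketched as follows. The helper name `add32_wrapping` is hypothetical and used only for illustration; it is not taken from the Opus sources. The point is that unsigned arithmetic in C is defined to wrap modulo 2^32, so routing a signed addition through `uint32_t` avoids the undefined behaviour of signed overflow.

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit addition where wrap-around is intended.
 * Both operands are converted to uint32_t, so the addition is well defined
 * and wraps modulo 2^32. Converting the result back to int32_t is
 * implementation-defined before C23 and gives the two's-complement value
 * on mainstream compilers. */
static int32_t add32_wrapping(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```

A plain `a + b` on `int32_t` values would be flagged by overflow checkers and may be optimized on the assumption that overflow never happens; the cast makes the intent explicit at the call site.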
2017 Mar 01
2
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...is > appropriate when we have --enable-assertions without --enable-check-asm, > whereas 1) is appropriate the rest of the time. That way we can test for > overflows in the CELT code, without preventing optimization of the SILK > code. > > > Because of silk_LPC_analysis_filter(), celt_fir_permit_overflow() must > > behave the same for both floating-point and fixed-point, and this is why > > we defined ADD32_FIXED(), ..., PSHR32_FIXED() etc. > > I don't think you will need these anymore, but if you ever need > fixed-point macros that remain integer for float compilation, the...
2017 Mar 01
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Linfeng Zhang wrote: > xcorr_kernel() itself is great and provides many gains. The only issue > is that calling it in a for loop makes it less efficient. Do you think it would be possible to improve the API of xcorr_kernel() so that calling it in a loop is more efficient? I haven't looked at an instruction-level profile, but I find it hard to believe that the function
2017 Mar 02
0
Antw: Re: [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi! I'm not deep in the code, but from my experience even older gcc (4.3.4) does function inlining at -O2, and at -O3 it inlines almost any function inside one module. Once I even let it inline across modules (-combine). I'm not talking about explicit inline functions; just about automatic optimization. So did you check that frequent function calls actually happen? I'm a bit afraid
2017 Mar 01
3
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Timothy, > Do you think it would be possible to improve the API of xcorr_kernel() so > that calling it in a loop is more efficient? If it could be inlined, it would be more efficient. Besides memory bouncing, frequent function calls are expensive. > The other advantage to wiring up xcorr_kernel() is that it applies in more > places than your intrinsics-only celt_fir() implementation.
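The inlining point raised in this exchange can be illustrated with a small sketch. The kernel below is a hypothetical stand-in, not the real xcorr_kernel(): its name, signature, and data types are assumptions. It computes four lagged dot products per call, and it is declared `static inline` in the same translation unit as its caller, which gives the compiler the opportunity (but no guarantee) to inline it into the loop and avoid a call per iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: accumulates four lagged dot products of x against y.
 * sum[k] += x[0]*y[k] + x[1]*y[k+1] + ... for k = 0..3.
 * y must hold at least len + 3 elements. */
static inline void lag4_kernel(const int16_t *x, const int16_t *y,
                               int32_t sum[4], size_t len)
{
    for (size_t j = 0; j < len; j++) {
        for (int k = 0; k < 4; k++)
            sum[k] += (int32_t)x[j] * y[j + k];
    }
}

/* Calls the kernel once per group of four outputs. n_out is assumed to be
 * a multiple of 4, and y must hold at least n_out + len - 1 elements. */
static void correlate(const int16_t *x, const int16_t *y, int32_t *out,
                      size_t len, size_t n_out)
{
    for (size_t i = 0; i + 4 <= n_out; i += 4) {
        int32_t sum[4] = {0, 0, 0, 0};
        lag4_kernel(x, y + i, sum, len);
        for (int k = 0; k < 4; k++)
            out[i + k] = sum[k];
    }
}
```

Whether the call is actually inlined depends on the compiler and optimization level, so the generated assembly or a profile would need to be checked before drawing conclusions about call overhead.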