Displaying 9 results from an estimated 9 matches for "celt_fir_permit_overflow".
2017 Feb 15
4
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi,
Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(). And
Patch 2 optimizes the new function celt_fir_permit_overflow() for ARM NEON.
Please recommend a better function name.
We did the same internal code review and testing already.
Thanks,
Linfeng
2017 Feb 15
2
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...messy. It has 2 branches chosen by
#ifdef SMALL_FOOTPRINT.
For floating-point, the 2 branches are identical (except the operation
sequence of accumulating x[i] to sum, which is not a big deal).
For fixed-point, the 2 branches are different. I separate them into 2
functions: the new celt_fir(), and celt_fir_permit_overflow() which is the
SMALL_FOOTPRINT branch.
The only difference for fixed-point is:
celt_fir(): the sum is truncated first and then accumulated to x[i] and
saturated.
celt_fir_permit_overflow(): x[i] is accumulated to the sum first and then
truncated and saturated.
Maybe this is the reason why silk_LPC_ana...
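To make the ordering difference described in this snippet concrete, here is a minimal C sketch. It is not the Opus source: the helper names (pshr32, sat16, combine_truncate_first, combine_accumulate_first) and the value SIG_SHIFT = 12 are assumptions chosen for illustration. It only shows that the two orderings agree while the 32-bit accumulator stays in range, and diverge once adding the shifted x[i] wraps around.

```c
#include <assert.h>
#include <stdint.h>

#define SIG_SHIFT 12 /* assumed number of fractional bits, for illustration */

/* Rounding right shift. Computed in 64 bits so the rounding term cannot
   overflow. Assumes >> on a negative value is an arithmetic shift, which is
   implementation-defined in C but holds on mainstream compilers. */
static int32_t pshr32(int32_t a, int shift) {
    assert(shift > 0 && shift < 32);
    return (int32_t)(((int64_t)a + ((int64_t)1 << (shift - 1))) >> shift);
}

/* Clamp a 32-bit value to the 16-bit range. */
static int16_t sat16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Ordering described for celt_fir(): truncate the sum, add x, saturate. */
static int16_t combine_truncate_first(int32_t sum, int16_t x) {
    return sat16((int32_t)x + pshr32(sum, SIG_SHIFT));
}

/* Ordering described for celt_fir_permit_overflow(): add the shifted x to
   the sum (wrapping on overflow), then truncate and saturate. The unsigned
   addition wraps by definition; converting back to int32_t is
   implementation-defined and assumed to be two's complement here. */
static int16_t combine_accumulate_first(int32_t sum, int16_t x) {
    uint32_t wide = (uint32_t)sum + (uint32_t)((int32_t)x * (1 << SIG_SHIFT));
    return sat16(pshr32((int32_t)wide, SIG_SHIFT));
}
```

With sum = 40960 and x = 5 both helpers return 15. With sum = 2100000000 and x = 32767 the first saturates to 32767, while the second wraps negative before the shift and saturates to -32768.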
2017 Feb 18
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...ring the #else case a bit (see below).
> For floating-point, the 2 branches are identical (except the operation
> sequence of accumulating x[i] to sum, which is not a big deal).
> For fixed-point, the 2 branches are different. I separate them into 2
> functions: the new celt_fir(), and celt_fir_permit_overflow() which is
> the SMALL_FOOTPRINT branch.
> The only difference for fixed-point is:
> celt_fir(): the sum is truncated first and then accumulated to x[i] and
> saturated.
> celt_fir_permit_overflow(): x[i] is accumulated to the sum first and
> then truncated and saturated.
Actually, t...
2017 Feb 15
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...It seems to me like it's mostly duplicating the celt_fir()
optimizations? Did I miss anything?
Cheers,
Jean-Marc
On 15/02/17 02:22 PM, Linfeng Zhang wrote:
> Hi,
>
> Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter().
> And Patch 2 optimizes the new function celt_fir_permit_overflow() for
> ARM NEON.
>
> Please recommend a better function name.
>
> We did the same internal code review and testing already.
>
> Thanks,
> Linfeng
2017 Feb 15
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Linfeng,
On 15/02/17 02:22 PM, Linfeng Zhang wrote:
> Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter().
> And Patch 2 optimizes the new function celt_fir_permit_overflow() for
> ARM NEON.
>
> Please recommend a better function name.
In most other cases, we've just added the _ovflw() suffix to
functions/macros where signed overflow is allowed (suppressed using
unsigned cast).
> We did the same internal code review and testing already.
Thanks, I...
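The "_ovflw() suffix ... suppressed using unsigned cast" convention mentioned in this reply can be sketched as follows. This is an illustrative guess at the pattern, not a copy of the Opus macros: add32_ovflw and shl32_ovflw are hypothetical function names, and the conversion back to a signed type is implementation-defined in C (assumed to be two's complement wraparound, as on mainstream compilers).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 32-bit addition where wraparound is intended.
   Unsigned arithmetic wraps by definition, so no signed-overflow undefined
   behavior occurs and overflow sanitizers stay quiet. */
static int32_t add32_ovflw(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Hypothetical helper: left shift that may push bits into or past the
   sign bit, done on the unsigned representation for the same reason. */
static int32_t shl32_ovflw(int32_t a, int shift) {
    assert(shift >= 0 && shift < 32);
    return (int32_t)((uint32_t)a << shift);
}
```

Keeping the wrapping variants under a distinct suffix leaves the plain operations free to be checked for overflow in assertion builds.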
2017 Mar 01
2
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...is
> appropriate when we have --enable-assertions without --enable-check-asm,
> whereas 1) is appropriate the rest of the time. That way we can test for
> overflows in the CELT code, without preventing optimization of the SILK
> code.
>
> > Because of silk_LPC_analysis_filter(), celt_fir_permit_overflow() must
> > behave the same for both floating-point and fixed-point, and this is why
> > we defined ADD32_FIXED(), ..., PSHR32_FIXED() etc.
>
> I don't think you will need these anymore, but if you ever need
> fixed-point macros that remain integer for float compilation, the...
2017 Mar 01
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Linfeng Zhang wrote:
> xcorr_kernel() itself is great and provides many gains. The only issue
> is that calling it in a for loop makes it less efficient.
Do you think it would be possible to improve the API of xcorr_kernel()
so that calling it in a loop is more efficient?
I haven't looked at an instruction-level profile, but I find it hard to
believe that the function
2017 Mar 02
0
Antw: Re: [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi!
I'm not deep in the code, but in my experience even older gcc (4.3.4) does function inlining at -O2, and at -O3 it inlines almost any function inside one module. Once I even let it inline across modules (-combine). I'm not talking about explicit inline functions, just about automatic optimization.
So did you check that frequent function calls actually happen? I'm a bit afraid
2017 Mar 01
3
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Timothy,
> Do you think it would be possible to improve the API of xcorr_kernel() so
> that calling it in a loop is more efficient?
>
If it could be inlined, it would be more efficient. Besides memory bouncing,
frequent function calls are expensive.
> The other advantage to wiring up xcorr_kernel() is that it applies in more
> places than your intrinsics-only celt_fir() implementation.