thr3ads.net - similar to: "[PATCH] Optimize silk_LPC_analysis

Displaying 14 results from an estimated 14 matches similar to: "[PATCH] Optimize silk_LPC_analysis_filter() for ARM NEON"

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

2017 Feb 18

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

Hi Linfeng, On 15/02/17 04:05 PM, Linfeng Zhang wrote: > The original celt_fir() is a little bit messy. It has 2 branches chosen > by #ifdef SMALL_FOOTPRINT. Yeah, I agree that the #ifdef SMALL_FOOTPRINT in celt_fir() is a bit of overkill since it's not saving much code space. I just pushed a commit that gets rid of it, also refactoring the #else case a bit (see below). > For

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

2017 Feb 15

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

Hi Jean-Marc, The original celt_fir() is a little bit messy. It has 2 branches chosen by #ifdef SMALL_FOOTPRINT. For floating-point, the 2 branches are identical (except the operation sequence of accumulating x[i] to sum, which is not a big deal). For fixed-point, the 2 branches are different. I separate them into 2 functions: the new celt_fir(), and celt_fir_permit_overflow() which is the

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

2017 Feb 15

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

Hi, Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(). And Patch 2 optimizes the new function celt_fir_permit_overflow() for ARM NEON. Please recommend a better function name. We did the same internal code review and testing already. Thanks, Linfeng -------------- next part -------------- An HTML attachment was scrubbed... URL:

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

2017 Mar 01

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

> > I believe the solution would be to always have either: > 1) USE_CELT_FIR=1 and use ovflw() macros in the xcorr code; or > 2) USE_CELT_FIR=0 and no ovflw() in the xcorr code > I prefer to create a function named silk_fir() with optimization to do the calculation when USE_CELT_FIR=0. xcorr_kernel() itself is great and provides many gains. The only issue is that calling it in a

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

2017 Feb 15

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

Hi Linfeng, Can you give me a bit more details about the purpose of this patchset. It seems to me like it's mostly duplicating the celt_fir() optimizations? Did I miss anything? Cheers, Jean-Marc On 15/02/17 02:22 PM, Linfeng Zhang wrote: > Hi, > > Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(). > And Patch 2 optimizes the new function

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

2017 Feb 15

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

Hi Linfeng, On 15/02/17 02:22 PM, Linfeng Zhang wrote: > Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(). > And Patch 2 optimizes the new function celt_fir_permit_overflow() for > ARM NEON. > > Please recommend a better function name. In most other cases, we've just added the _ovflw() suffix to functions/macros where signed overflow is allowed

[PATCH 9/9] Optimize silk_inner_prod_aligned_scale() for ARM NEON

2016 Aug 26

[PATCH 9/9] Optimize silk_inner_prod_aligned_scale() for ARM NEON

Created corresponding unit test, and the optimization is bit exact with C function. --- silk/SigProc_FIX.h | 7 ++- silk/arm/arm_silk_map.c | 12 ++++ silk/arm/inner_prod_aligned_arm.h | 58 +++++++++++++++++++ silk/arm/inner_prod_aligned_neon_intr.c | 66 ++++++++++++++++++++++ silk/enc_API.c

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

2017 Mar 01

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

Linfeng Zhang wrote: > xcorr_kernel() itself is great and provides many gains. The only issue > is that calling it in a for loop makes it less efficient. Do you think it would be possible to improve the API of xcorr_kernel() so that calling it in a loop is more efficient? I haven't looked at an instruction-level profile, but I find it hard to believe that the function

Antw: Re: [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

2017 Mar 02

Antw: Re: [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

Hi! I'm not deep i the code, but from my experience even older gcc (4.3.4) does function inlining at -O2, and at -O3 it inlines almost any function inside one module. Once I even let it inline across modules (-combine). I'm not talking about explicit inline functions; just about automatic optimization. So did you check that frequent function calls actually happen? I'm a bit afraid

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

2017 Mar 01

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

Hi Timothy, Do you think it would be possible to improve the API of xcorr_kernel() so > that calling it in a loop is more efficient? > If it could be inlined, it will be more efficient. Besides memory bouncing, frequent function call is expensive. The other advantage to wiring up xcorr_kernel() is that it applies in more > places than your intrinsics-only celt_fir() implementation.

ARM NEON optimization -- celt_fir()

2016 Jun 17

ARM NEON optimization -- celt_fir()

Hi all, This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the next few months. I'm submitting 2 patches in the following couple of emails, which have the new created celt_fir_neon(). I revised celt_fir_c() to not pass in argument "mem" in Patch 1. If there are concerns to this change, please let me know. Many thanks to your comments. Linfeng Zhang

[PATCH 8/8] Optimize silk_NSQ_del_dec() for ARM NEON

2016 Aug 23

[PATCH 8/8] Optimize silk_NSQ_del_dec() for ARM NEON

Created corresponding unit test, and the optimization is bit exact with C function. This optimization speeds up SILK encoder on NEON as following. Fixed-point: Complexity 0-5: 0% Complexity 6-7: 6% Complexity 8-9: 10% Complexity 10: 8% Got similar results on floating-point. --- silk/NSQ_del_dec.c | 6 +- silk/SigProc_FIX.h | 4

Several patches of ARM NEON optimization

2016 Jul 14

Several patches of ARM NEON optimization

I rebased my previous 3 patches to the current master with minor changes. Patches 1 to 3 replace all my previous submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang

[PATCH 7/8] Update NSQ_LPC_BUF_LENGTH macro.

2016 Aug 23

[PATCH 7/8] Update NSQ_LPC_BUF_LENGTH macro.

NSQ_LPC_BUF_LENGTH is independent of DECISION_DELAY. --- silk/define.h | 4 ---- 1 file changed, 4 deletions(-) diff --git a/silk/define.h b/silk/define.h index 781cfdc..1286048 100644 --- a/silk/define.h +++ b/silk/define.h @@ -173,11 +173,7 @@ extern "C" #define MAX_MATRIX_SIZE MAX_LPC_ORDER /* Max of LPC Order and LTP order */ -#if( MAX_LPC_ORDER >

similar to: [PATCH] Optimize silk_LPC_analysis_filter() for ARM NEON