Displaying 14 results from an estimated 14 matches similar to: "[PATCH] Optimize silk_LPC_analysis_filter() for ARM NEON"
2017 Feb 18
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Linfeng,
On 15/02/17 04:05 PM, Linfeng Zhang wrote:
> The original celt_fir() is a little bit messy. It has 2 branches chosen
> by #ifdef SMALL_FOOTPRINT.
Yeah, I agree that the #ifdef SMALL_FOOTPRINT in celt_fir() is a bit of
overkill since it's not saving much code space. I just pushed a commit
that gets rid of it, also refactoring the #else case a bit (see below).
> For
2017 Feb 15
2
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Jean-Marc,
The original celt_fir() is a little bit messy. It has 2 branches chosen by
#ifdef SMALL_FOOTPRINT.
For floating-point, the 2 branches are identical (except the operation
sequence of accumulating x[i] to sum, which is not a big deal).
For fixed-point, the 2 branches are different. I separate them into 2
functions: the new celt_fir(), and celt_fir_permit_overflow() which is the
2017 Feb 15
4
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi,
Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(). And
Patch 2 optimizes the new function celt_fir_permit_overflow() for ARM NEON.
Please recommend a better function name.
We did the same internal code review and testing already.
Thanks,
Linfeng
2017 Mar 01
2
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
>
> I believe the solution would be to always have either:
> 1) USE_CELT_FIR=1 and use ovflw() macros in the xcorr code; or
> 2) USE_CELT_FIR=0 and no ovflw() in the xcorr code
>
I prefer to create a function named silk_fir() with optimization to do the
calculation when USE_CELT_FIR=0.
xcorr_kernel() itself is great and provides many gains. The only issue is
that calling it in a
2017 Feb 15
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Linfeng,
Can you give me a bit more detail about the purpose of this patchset?
It seems to me like it's mostly duplicating the celt_fir()
optimizations? Did I miss anything?
Cheers,
Jean-Marc
On 15/02/17 02:22 PM, Linfeng Zhang wrote:
> Hi,
>
> Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter().
> And Patch 2 optimizes the new function
2017 Feb 15
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Linfeng,
On 15/02/17 02:22 PM, Linfeng Zhang wrote:
> Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter().
> And Patch 2 optimizes the new function celt_fir_permit_overflow() for
> ARM NEON.
>
> Please recommend a better function name.
In most other cases, we've just added the _ovflw() suffix to
functions/macros where signed overflow is allowed
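
A minimal sketch of what an overflow-permitting add can look like in C. The helper name add32_ovflw() is hypothetical and only borrows the _ovflw suffix mentioned above; it is not taken from the patch. The idea is to do the arithmetic on unsigned operands, where wrap-around is well defined, instead of on signed operands, where overflow is undefined behaviour.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): 32-bit add that is allowed to
   wrap. The sum is computed on uint32_t, where overflow wraps modulo 2^32.
   Converting the result back to int32_t is implementation-defined in C, but
   yields the two's-complement value on common compilers and targets. */
static int32_t add32_ovflw(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```

With this form, adding 1 to INT32_MAX wraps to INT32_MIN on two's-complement targets rather than invoking undefined behaviour.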
2016 Aug 26
2
[PATCH 9/9] Optimize silk_inner_prod_aligned_scale() for ARM NEON
Created a corresponding unit test; the optimization is bit-exact with the C
function.
---
silk/SigProc_FIX.h | 7 ++-
silk/arm/arm_silk_map.c | 12 ++++
silk/arm/inner_prod_aligned_arm.h | 58 +++++++++++++++++++
silk/arm/inner_prod_aligned_neon_intr.c | 66 ++++++++++++++++++++++
silk/enc_API.c
2017 Mar 01
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Linfeng Zhang wrote:
> xcorr_kernel() itself is great and provides many gains. The only issue
> is that calling it in a for loop makes it less efficient.
Do you think it would be possible to improve the API of xcorr_kernel()
so that calling it in a loop is more efficient?
I haven't looked at an instruction-level profile, but I find it hard to
believe that the function
2017 Mar 02
0
Antw: Re: [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi!
I'm not deep in the code, but from my experience even older gcc (4.3.4) does function inlining at -O2, and at -O3 it inlines almost any function inside one module. Once I even let it inline across modules (-combine). I'm not talking about explicit inline functions; just about automatic optimization.
So did you check that frequent function calls actually happen? I'm a bit afraid
2017 Mar 01
3
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Timothy,
> Do you think it would be possible to improve the API of xcorr_kernel() so
> that calling it in a loop is more efficient?
>
If it could be inlined, it would be more efficient. Besides memory bouncing,
frequent function calls are expensive.
> The other advantage to wiring up xcorr_kernel() is that it applies in more
> places than your intrinsics-only celt_fir() implementation.
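
To make the trade-off under discussion concrete, here is a minimal sketch of a FIR filter built by calling a 4-output kernel in a loop. The names kernel4() and fir_by_kernel() and their signatures are assumptions made for illustration; this is not the actual xcorr_kernel() API or the code from the patch. Marking the kernel static inline is one way to let the compiler remove the per-call overhead mentioned above.

```c
#include <stdint.h>

/* Hypothetical 4-output kernel: sum[k] += c[j] * x[j + k] for k = 0..3.
   Reads x[0 .. ord + 2]. Declared static inline so the compiler may fold
   it into the caller's loop instead of emitting a call per block. */
static inline void kernel4(const int16_t *c, const int16_t *x,
                           int32_t sum[4], int ord)
{
    for (int j = 0; j < ord; j++) {
        for (int k = 0; k < 4; k++) {
            sum[k] += (int32_t)c[j] * x[j + k];
        }
    }
}

/* Computes out[i] = sum over j < ord of c[j] * x[i + j], for i = 0..n-1.
   x must hold at least n + ord - 1 samples. */
static void fir_by_kernel(const int16_t *c, const int16_t *x,
                          int32_t *out, int n, int ord)
{
    int i = 0;

    /* Main loop: one kernel call produces four outputs. */
    for (; i + 4 <= n; i += 4) {
        int32_t sum[4] = {0, 0, 0, 0};
        kernel4(c, x + i, sum, ord);
        for (int k = 0; k < 4; k++) {
            out[i + k] = sum[k];
        }
    }

    /* Tail: remaining 0..3 outputs, computed one at a time. */
    for (; i < n; i++) {
        int32_t sum = 0;
        for (int j = 0; j < ord; j++) {
            sum += (int32_t)c[j] * x[i + j];
        }
        out[i] = sum;
    }
}
```

Whether the call overhead matters in practice depends on whether the compiler inlines the kernel, which is the question raised in this thread; an instruction-level profile would settle it.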
2016 Jun 17
5
ARM NEON optimization -- celt_fir()
Hi all,
This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the
next few months.
I'm submitting 2 patches in the following couple of emails, which contain the
newly created celt_fir_neon().
I revised celt_fir_c() to not pass in the argument "mem" in Patch 1. If there are
concerns about this change, please let me know.
Many thanks for your comments.
Linfeng Zhang
2016 Aug 23
0
[PATCH 8/8] Optimize silk_NSQ_del_dec() for ARM NEON
Created a corresponding unit test; the optimization is bit-exact with the C
function.
This optimization speeds up the SILK encoder on NEON as follows.
Fixed-point:
Complexity 0-5: 0%
Complexity 6-7: 6%
Complexity 8-9: 10%
Complexity 10: 8%
Got similar results on floating-point.
---
silk/NSQ_del_dec.c | 6 +-
silk/SigProc_FIX.h | 4
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches to the current master with minor changes.
Patches 1 to 3 replace all my previous submitted patches.
Patches 4 and 5 are new.
Thanks,
Linfeng Zhang
2016 Aug 23
2
[PATCH 7/8] Update NSQ_LPC_BUF_LENGTH macro.
NSQ_LPC_BUF_LENGTH is independent of DECISION_DELAY.
---
silk/define.h | 4 ----
1 file changed, 4 deletions(-)
diff --git a/silk/define.h b/silk/define.h
index 781cfdc..1286048 100644
--- a/silk/define.h
+++ b/silk/define.h
@@ -173,11 +173,7 @@ extern "C"
#define MAX_MATRIX_SIZE MAX_LPC_ORDER /* Max of LPC Order and LTP order */
-#if( MAX_LPC_ORDER >