similar to: [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

Displaying 20 results from an estimated 700 matches similar to: "[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON"

2017 Feb 15
2
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Jean-Marc, The original celt_fir() is a little bit messy. It has 2 branches selected by #ifdef SMALL_FOOTPRINT. For floating-point, the 2 branches are identical (except for the order in which x[i] is accumulated into sum, which is not a big deal). For fixed-point, the 2 branches are different. I separated them into 2 functions: the new celt_fir(), and celt_fir_permit_overflow() which is the
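For readers unfamiliar with the filter under discussion, the following is a minimal floating-point sketch of a direct-form FIR loop. It is not the Opus celt_fir() source: the name fir_sketch, its signature, and the convention that the history samples sit immediately before x in memory are all assumptions made for illustration. It shows where x[i] enters the sum, which is the accumulation order the message refers to.

```c
#include <stddef.h>

/* Hypothetical direct-form FIR sketch (NOT the Opus celt_fir() source).
 * Assumption: x points at the first new sample and is preceded in memory
 * by `ord` history samples, so x[-ord] .. x[-1] are valid reads. */
static void fir_sketch(const float *x, const float *num, float *y,
                       size_t n, size_t ord)
{
    for (size_t i = 0; i < n; i++) {
        /* Start from the current input sample. Adding x[i] last instead
         * of first can change floating-point rounding slightly. */
        float sum = x[i];
        for (size_t j = 0; j < ord; j++) {
            /* num[j] weights the sample j+1 steps in the past. */
            sum += num[j] * x[(ptrdiff_t)i - (ptrdiff_t)j - 1];
        }
        y[i] = sum;
    }
}
```

With a buffer {1, 2, 3, 4, 5}, two history samples, and num = {0.5, 0.25}, calling fir_sketch(buf + 2, num, y, 3, 2) fills y with three output samples, the first being 3 + 0.5*2 + 0.25*1.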
2017 Mar 01
2
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
> > I believe the solution would be to always have either: > 1) USE_CELT_FIR=1 and use ovflw() macros in the xcorr code; or > 2) USE_CELT_FIR=0 and no ovflw() in the xcorr code > I prefer to create a function named silk_fir() with optimization to do the calculation when USE_CELT_FIR=0. xcorr_kernel() itself is great and provides many gains. The only issue is that calling it in a
2017 Feb 18
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Linfeng, On 15/02/17 04:05 PM, Linfeng Zhang wrote: > The original celt_fir() is a little bit messy. It has 2 branches chosen > by #ifdef SMALL_FOOTPRINT. Yeah, I agree that the #ifdef SMALL_FOOTPRINT in celt_fir() is a bit of overkill since it's not saving much code space. I just pushed a commit that gets rid of it, also refactoring the #else case a bit (see below). > For
2017 Mar 01
3
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Timothy, > Do you think it would be possible to improve the API of xcorr_kernel() so > that calling it in a loop is more efficient? > If it could be inlined, it would be more efficient. Besides memory bouncing, frequent function calls are expensive. > The other advantage to wiring up xcorr_kernel() is that it applies in more > places than your intrinsics-only celt_fir() implementation.
2016 Jun 17
5
ARM NEON optimization -- celt_fir()
Hi all, This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the next few months. I'm submitting 2 patches in the following couple of emails, which contain the newly created celt_fir_neon(). I revised celt_fir_c() to no longer take the argument "mem" in Patch 1. If there are concerns about this change, please let me know. Many thanks for your comments. Linfeng Zhang
2017 Feb 15
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Linfeng, Can you give me a bit more detail about the purpose of this patchset? It seems to me like it's mostly duplicating the celt_fir() optimizations? Did I miss anything? Cheers, Jean-Marc On 15/02/17 02:22 PM, Linfeng Zhang wrote: > Hi, > > Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(). > And Patch 2 optimizes the new function
2017 Feb 15
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Linfeng, On 15/02/17 02:22 PM, Linfeng Zhang wrote: > Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(). > And Patch 2 optimizes the new function celt_fir_permit_overflow() for > ARM NEON. > > Please recommend a better function name. In most other cases, we've just added the _ovflw() suffix to functions/macros where signed overflow is allowed
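As background on why overflow-permitting variants get their own name: signed integer overflow is undefined behavior in C, so code that wants two's-complement wraparound has to ask for it explicitly. The sketch below shows one common way to do that. The helper name add32_wrap is hypothetical, and this is an assumption about the general technique, not a copy of any macro in the Opus tree.

```c
#include <stdint.h>

/* Hypothetical helper, named for illustration only (not an Opus macro). */
static int32_t add32_wrap(int32_t a, int32_t b)
{
    /* Unsigned addition wraps modulo 2^32 by definition, so the sum is
     * computed in uint32_t. Converting an out-of-range result back to
     * int32_t is implementation-defined before C23, but yields the
     * two's-complement value on mainstream compilers. */
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```

Under that assumption, add32_wrap(INT32_MAX, 1) wraps to INT32_MIN instead of invoking undefined behavior, while in-range sums behave like ordinary addition.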
2017 Mar 01
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Linfeng Zhang wrote: > xcorr_kernel() itself is great and provides many gains. The only issue > is that calling it in a for loop makes it less efficient. Do you think it would be possible to improve the API of xcorr_kernel() so that calling it in a loop is more efficient? I haven't looked at an instruction-level profile, but I find it hard to believe that the function
2017 Mar 02
0
Antw: Re: [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi! I'm not deep in the code, but from my experience even older gcc (4.3.4) does function inlining at -O2, and at -O3 it inlines almost any function inside one module. Once I even let it inline across modules (-combine). I'm not talking about explicit inline functions; just about automatic optimization. So did you check that frequent function calls actually happen? I'm a bit afraid
2016 Sep 28
2
[PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON
Linfeng Zhang wrote: > +#ifdef SMALL_FOOTPRINT > + for (i=0;i<N-7;i+=8) > + { > [snip over 80 lines of complicated NEON intrinsics code] > + } > +#else So, one of the points of SMALL_FOOTPRINT is to reduce the code size on targets where this matters (even if it means running slower), but this is an awful lot of code. I think it makes much more sense to expose the
2015 Nov 05
2
AVX Optimizations
Yes, Thank you. I'll follow up with the AVX code and tests for pitch code. Radu -----Original Message----- From: opus-bounces at xiph.org [mailto:opus-bounces at xiph.org] On Behalf Of Timothy B. Terriberry Sent: Thursday, November 5, 2015 10:31 AM To: opus at xiph.org Subject: Re: [opus] AVX Optimizations Velea, Radu wrote: > I've created a pull request[1] to enable configuration
2016 Jul 14
6
Several patches of ARM NEON optimization
I rebased my previous 3 patches to the current master with minor changes. Patches 1 to 3 replace all my previous submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang
2016 Jul 28
0
[PATCH] Optimize silk_LPC_analysis_filter() for ARM NEON
Created corresponding unit test. --- silk/LPC_analysis_filter.c | 8 +- silk/SigProc_FIX.h | 8 +- silk/arm/LPC_analysis_filter_arm.h | 60 +++++++ silk/arm/LPC_analysis_filter_neon_intr.c | 176 +++++++++++++++++++++ silk/arm/arm_silk_map.c | 14 ++
2013 Jun 07
2
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Hi JM, At line 221 in celt_lpc.c (the celt_iir function) I think you really want the RESTORE_STACK statement to be before the #endif instead of after it. Also, I couldn't help but notice that your SSE code for xcorr_kernel reads more than "len" elements of "_x". I don't know if that's really a problem when running the codec, but a tool like valgrind will have a
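To make the over-read concern concrete, here is a scalar reference sketch of a four-lag cross-correlation kernel with its read bounds spelled out. The name xcorr4_sketch and its signature are hypothetical, and this is not the Opus xcorr_kernel() source. A vectorized version that loads several x values at a time would need a separate tail loop to stay within the same bounds.

```c
#include <stddef.h>

/* Hypothetical scalar reference (NOT the Opus xcorr_kernel() source).
 * Accumulates sum[k] += x[j] * y[j + k] for k = 0..3.
 * Reads exactly x[0 .. len-1] and y[0 .. len+2], nothing more. */
static void xcorr4_sketch(const float *x, const float *y,
                          float sum[4], size_t len)
{
    for (size_t j = 0; j < len; j++) {
        for (size_t k = 0; k < 4; k++) {
            /* Highest y index touched is (len - 1) + 3 = len + 2. */
            sum[k] += x[j] * y[j + k];
        }
    }
}
```

The caller zeroes sum before the first call and must supply at least len + 3 elements of y. Any SIMD variant can be checked against this loop, and against a memory checker, on buffers sized to exactly these bounds.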
2013 Jun 07
2
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Hi JM, I have no doubt that Mr. Zanelli's NEON code is faster, since hand-tuned assembly is bound to be faster than using intrinsics. However, I notice that his code can also read past the y buffer. Cheers, --John On 6/6/2013 9:22 PM, Jean-Marc Valin wrote: > Hi John, > > Thanks for the two fixes. They're in git now. Your SSE version seems to > also be slightly faster than
2016 Jun 17
0
ARM NEON optimization -- celt_fir()
Hi, Linfeng — Please note the aarch64 optimization patches I submitted in November and December (which Tim still hasn’t gotten around to reviewing). As they used Neon intrinsics, several of these actually applied to both armv7 and aarch64 Neon. In particular, note http://lists.xiph.org/pipermail/opus/2015-December/003339.html , which added a Neon-optimized version of xcorr_kernel. xcorr_kernel
2013 Jun 07
1
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Unfortunately I don't have a setup that lets me easily profile ARM code, so I really can't tell which method is faster (though I suspect Mr. Zanelli's code is). Let me offer up another intrinsic version of the NEON xcorr_kernel that is almost identical to the SSE version, and more in line with Mr. Zanelli's code: static inline void xcorr_kernel_neon(const opus_val16 *x, const
2017 Feb 15
2
[PATCH] Optimize silk_LPC_inverse_pred_gain() for ARM NEON
Hi Jean-Marc, (forgot cc opus@) Thanks for creating the unit test code. Attached is the updated optimization patch. On Mon, Feb 13, 2017 at 10:17 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > On 13/02/17 01:09 PM, Linfeng Zhang wrote: > > For 1), I agree that an explicit unit test would be a good plus to cover > > the cases that "make check" cannot
2017 Jun 06
3
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Linfeng, On 05/06/17 03:31 PM, Linfeng Zhang wrote: > Yes we'll have one more patch set related to xcorr in next week. Please > don't wait if it's too late for 1.2 release. Assuming there's no issue with the patches, next week isn't too late. Also, I've started looking at your patches. So far there's one thing that puzzles me a bit. In the OPUS_CHECK_ASM
2017 Jun 05
4
celt_inner_prod() and dual_inner_prod() NEON intrinsics
Hi Jean-Marc, I attached the new version in inner_prod_5patches_v2.zip, which is synced to the current master. For fixed-point ARM, only 0003-Optimize-fixed-point-celt_inner_prod-and-dual_inner_.patch changes the performance. For floating-point ARM, only 0004-Optimize-floating-point-celt_inner_prod-and-dual_inn.patch changes the performance. Patches 1 and 2 are code clean-up and can only affect x86