search for: xcorr_kernel

Displaying 20 results from an estimated 34 matches for "xcorr_kernel".

2017 Mar 01
2
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...> I believe the solution would be to always have either: > 1) USE_CELT_FIR=1 and use ovflw() macros in the xcorr code; or > 2) USE_CELT_FIR=0 and no ovflw() in the xcorr code > I prefer to create a function named silk_fir() with optimization to do the calculation when USE_CELT_FIR=0. xcorr_kernel() itself is great and provides many gains. The only issue is that calling it in a for loop makes it less efficient. xcorr_kernel() is called in several functions including celt_fir(), celt_pitch_xcorr() and celt_iir(). All these functions are not heavy hitters. silk_LPC_analysis_filter()'s CPU...
2017 Feb 15
2
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...o. Because of silk_LPC_analysis_filter(), celt_fir_permit_overflow() must behave the same for both floating-point and fixed-point, and this is why we defined ADD32_FIXED(), ..., PSHR32_FIXED() etc. It's still messy. For the NEON optimization part, the previous celt_fir() optimization calls xcorr_kernel(). We tested and found that calling the xcorr_kernel() optimization didn't help too much here. The optimization in the patch is about 1% faster than simply calling xcorr_kernel() for the whole encoder. Considering the really small size of the new optimization, it's better to not call xcorr_...
2013 Jun 07
2
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Hi JM, At line 221 in celt_lpc.c (the celt_iir function) I think you really want the RESTORE_STACK statement to be before the #endif instead of after it. Also, I couldn't help notice that your SSE code for xcorr_kernel reads more than "len" elements of "_x". I don't know if that's really a problem when running the codec, but a tool like valgrind will have a fit if it's accessing uninitialized memory. Here's a version I wrote a few days ago you're welcome to use that does...
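The over-read concern raised in this message can be illustrated with a minimal plain-C sketch. It is not the Opus source and not the version attached to the email: the name xcorr_kernel_blocked, the float types, and the block size of four are assumptions made for illustration. It accumulates four adjacent lags, takes the unrolled path only while a full block of x remains, and finishes with a scalar tail, so it reads x[0..len-1] and y[0..len+2] and nothing beyond.

```c
#include <assert.h>

/* Hypothetical sketch (not the Opus implementation).
   For k = 0..3: sum[k] += x[0]*y[k] + x[1]*y[k+1] + ... + x[len-1]*y[len-1+k].
   Reads exactly x[0..len-1] and y[0..len+2]. */
static void xcorr_kernel_blocked(const float *x, const float *y,
                                 float sum[4], int len)
{
    int j = 0;
    int k;
    assert(len >= 0);
    /* Main loop: consume four x samples per iteration, but only while a
       complete block of four is still inside the x buffer. */
    for (; j + 4 <= len; j += 4) {
        for (k = 0; k < 4; k++) {
            sum[k] += x[j]     * y[j + k]
                    + x[j + 1] * y[j + k + 1]
                    + x[j + 2] * y[j + k + 2]
                    + x[j + 3] * y[j + k + 3];
        }
    }
    /* Scalar tail: handle the remaining 0..3 samples one at a time, so no
       load ever touches x[len] or y[len + 3]. */
    for (; j < len; j++) {
        for (k = 0; k < 4; k++) {
            sum[k] += x[j] * y[j + k];
        }
    }
}
```

A vectorized version would replace the inner k loop with SIMD loads; the point of the sketch is only the loop bounds, which keep every access inside the ranges a caller can be expected to have initialized.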
2013 Jun 07
0
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
...Jean-Marc On 06/06/2013 08:07 PM, John Ridges wrote: > Hi JM, > > At line 221 in celt_lpc.c (the celt_iir function) I think you really > want the RESTORE_STACK statement to be before the #endif instead of > after it. Also, I couldn't help notice that your SSE code for > xcorr_kernel reads more than "len" elements of "_x". I don't know if > that's really a problem when running the codec, but a tool like valgrind > will have a fit if it's accessing uninitialized memory. Here's a version > I wrote a few days ago you're welcome t...
2015 Nov 05
2
AVX Optimizations
Yes, Thank you. I'll follow up with the AVX code and tests for pitch code. Radu -----Original Message----- From: opus-bounces at xiph.org [mailto:opus-bounces at xiph.org] On Behalf Of Timothy B. Terriberry Sent: Thursday, November 5, 2015 10:31 AM To: opus at xiph.org Subject: Re: [opus] AVX Optimizations Velea, Radu wrote: > I've created a pull request[1] to enable configuration
2017 Mar 01
3
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Timothy, Do you think it would be possible to improve the API of xcorr_kernel() so > that calling it in a loop is more efficient? > If it could be inlined, it will be more efficient. Besides memory bouncing, frequent function call is expensive. The other advantage to wiring up xcorr_kernel() is that it applies in more > places than your intrinsics-only celt_fir()...
2015 Nov 21
0
[Aarch64 v2 08/18] Add Neon fixed-point implementation of xcorr_kernel.
...+58,23 @@ void (*const CELT_PITCH_XCORR_IMPL[OPUS_ARCHMASK+1])(const opus_val16 *, # endif # endif /* FIXED_POINT */ +#if defined(FIXED_POINT) && defined(OPUS_HAVE_RTCD) && \ + defined(OPUS_ARM_MAY_HAVE_NEON_INTR) && !defined(OPUS_ARM_PRESUME_NEON_INTR) + +void (*const XCORR_KERNEL_IMPL[OPUS_ARCHMASK + 1])( + const opus_val16 *x, + const opus_val16 *y, + opus_val32 sum[4], + int len +) = { + xcorr_kernel_c, /* ARMv4 */ + xcorr_kernel_c, /* EDSP */ + xcorr_kernel_c, /* Media */ +...
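The table in this patch follows a common run-time CPU detection pattern: an array of function pointers indexed by a detected architecture level, with the plain C routine filling the slots that have no optimized version. The sketch below shows the idea under stated assumptions. ARCHMASK_SKETCH, XCORR_IMPL_SKETCH, xcorr_fast_sketch and the call counter are hypothetical stand-ins, not names from the Opus tree, and the "fast" routine simply forwards to the C one.

```c
#include <assert.h>

#define ARCHMASK_SKETCH 3 /* hypothetical stand-in for OPUS_ARCHMASK */

typedef void (*xcorr_fn_sketch)(const float *x, const float *y,
                                float sum[4], int len);

/* Portable fallback: accumulates four adjacent correlation lags. */
static void xcorr_c_sketch(const float *x, const float *y,
                           float sum[4], int len)
{
    int j, k;
    for (j = 0; j < len; j++)
        for (k = 0; k < 4; k++)
            sum[k] += x[j] * y[j + k];
}

/* Counts calls so the dispatch path can be observed (illustration only). */
static int fast_calls_sketch = 0;

/* Placeholder for an optimized routine; here it just forwards to C. */
static void xcorr_fast_sketch(const float *x, const float *y,
                              float sum[4], int len)
{
    fast_calls_sketch++;
    xcorr_c_sketch(x, y, sum, len);
}

/* One slot per architecture level; lower levels fall back to C. */
static const xcorr_fn_sketch XCORR_IMPL_SKETCH[ARCHMASK_SKETCH + 1] = {
    xcorr_c_sketch,   /* level 0 */
    xcorr_c_sketch,   /* level 1 */
    xcorr_c_sketch,   /* level 2 */
    xcorr_fast_sketch /* level 3 */
};

/* Mask the index so an out-of-range arch value cannot leave the table. */
static void xcorr_dispatch_sketch(const float *x, const float *y,
                                  float sum[4], int len, int arch)
{
    XCORR_IMPL_SKETCH[arch & ARCHMASK_SKETCH](x, y, sum, len);
}
```

The cost of this design is one indirect call per invocation, which is part of why the later threads discuss the overhead of calling the kernel inside a loop.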
2017 Feb 18
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...on't think you will need these anymore, but if you ever need fixed-point macros that remain integer for float compilation, then you should use the silk_*() fixed-point macros (and the code should be in silk/). > For the NEON optimization part, the previous celt_fir() optimization > calls xcorr_kernel(). We tested and found that calling > the xcorr_kernel() optimization didn't help too much here. The > optimization in the patch is about 1% faster than simply > calling xcorr_kernel() for the whole encoder. Considering the really > small size of the new optimization, it's bette...
2013 Jun 11
0
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
...ctly in ASM since typically neither compilers do what you want. > > Cliff On 6/11/2013 1:00 PM, opus-request at xiph.org wrote: > Date: Tue, 11 Jun 2013 09:31:31 +0200 > From: Aurélien Zanelli<aurelien.zanelli at parrot.com> > Subject: Re: [opus] Bug fix in celt_lpc.c and some xcorr_kernel > optimizations > To:<opus at xiph.org> > Message-ID:<51B6D253.9030505 at parrot.com> > Content-Type: text/plain; charset="ISO-8859-1"; format=flowed > > Hi, > > I compared C version, John's versions and azanelli's version. > > I encoded...
2015 Nov 05
0
AVX Optimizations
...+ 1])( int ord, opus_val16 *mem, int arch ) = { celt_fir_c, /* non-sse */ celt_fir_c, celt_fir_c, MAY_HAVE_SSE4_1(celt_fir), /* sse4.1 */ + MAY_HAVE_SSE4_1(celt_fir) /* avx */ }; void (*const XCORR_KERNEL_IMPL[OPUS_ARCHMASK + 1])( const opus_val16 *x, const opus_val16 *y, opus_val32 sum[4], int len ) = { xcorr_kernel_c, /* non-sse */ xcorr_kernel_c, xcorr_kernel_c, MAY_HAVE_SSE4_1(xcorr_kernel), /* sse4.1...
2013 Jun 07
1
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Unfortunately I don't have a setup that lets me easily profile ARM code, so I really can't tell which method is faster (though I suspect Mr. Zanelli's code is). Let me offer up another intrinsic version of the NEON xcorr_kernel that is almost identical to the SSE version, and more in line with Mr. Zanelli's code: static inline void xcorr_kernel_neon(const opus_val16 *x, const opus_val16 *y, opus_val32 sum[4], int len) { int j; int32x4_t xsum1 = vld1q_s32(sum); int32x4_t xsum2 = vdupq_n_s32(0);...
2013 Jun 07
0
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
On 06/07/2013 02:33 PM, John Ridges wrote: > I have no doubt that Mr. Zanelli's NEON code is faster, since hand tuned > assembly is bound to be faster than using intrinsics. I was mostly curious about comparing vectorization approaches (assuming the two are different) than exact code. > However I notice > that his code can also read past the y buffer. Yeah we'd need to
2015 Nov 05
2
AVX Optimizations
...+ 1])( int ord, opus_val16 *mem, int arch ) = { celt_fir_c, /* non-sse */ celt_fir_c, celt_fir_c, MAY_HAVE_SSE4_1(celt_fir), /* sse4.1 */ + MAY_HAVE_SSE4_1(celt_fir) /* avx */ }; void (*const XCORR_KERNEL_IMPL[OPUS_ARCHMASK + 1])( const opus_val16 *x, const opus_val16 *y, opus_val32 sum[4], int len ) = { xcorr_kernel_c, /* non-sse */ xcorr_kernel_c, xcorr_kernel_c, MAY_HAVE_SSE4_1(xcorr_kernel), /* sse4.1...
2013 Jun 07
2
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Hi JM, I have no doubt that Mr. Zanelli's NEON code is faster, since hand tuned assembly is bound to be faster than using intrinsics. However I notice that his code can also read past the y buffer. Cheers, --John On 6/6/2013 9:22 PM, Jean-Marc Valin wrote: > Hi John, > > Thanks for the two fixes. They're in git now. Your SSE version seems to > also be slightly faster than
2017 Feb 15
4
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi, Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(). And Patch 2 optimizes the new function celt_fir_permit_overflow() for ARM NEON. Please recommend a better function name. We did the same internal code review and testing already. Thanks, Linfeng
2017 Mar 01
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Linfeng Zhang wrote: > xcorr_kernel() itself is great and provides many gains. The only issue > is that calling it in a for loop makes it less efficient. Do you think it would be possible to improve the API of xcorr_kernel() so that calling it in a loop is more efficient? I haven't looked at an instruction-level profile, bu...
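To make "calling it in a for loop" concrete, here is a rough float-only sketch of an FIR filter that produces four outputs per kernel call. It mirrors the general structure being discussed but is not the Opus celt_fir() source: kernel4_sketch, fir_sketch, the reversed-tap convention for rnum, and the requirement that ord history samples precede x[0] are all assumptions of this illustration.

```c
#include <assert.h>

/* Hypothetical 4-lag kernel: sum[k] += x[0]*y[k] + ... + x[len-1]*y[len-1+k]. */
static void kernel4_sketch(const float *x, const float *y,
                           float sum[4], int len)
{
    int j, k;
    for (j = 0; j < len; j++)
        for (k = 0; k < 4; k++)
            sum[k] += x[j] * y[j + k];
}

/* y[i] = x[i] + rnum[0]*x[i-ord] + ... + rnum[ord-1]*x[i-1].
   rnum holds the filter taps in reversed order, and x must have `ord`
   valid history samples stored before x[0] (both are assumptions). */
static void fir_sketch(const float *x, const float *rnum, float *y,
                       int N, int ord)
{
    int i, j;
    assert(N >= 0 && ord >= 0);
    /* One kernel call per four outputs; this is the per-call overhead
       (call, argument setup, sum[] round trip) the thread refers to. */
    for (i = 0; i + 4 <= N; i += 4) {
        float sum[4];
        sum[0] = x[i];
        sum[1] = x[i + 1];
        sum[2] = x[i + 2];
        sum[3] = x[i + 3];
        kernel4_sketch(rnum, x + i - ord, sum, ord);
        y[i]     = sum[0];
        y[i + 1] = sum[1];
        y[i + 2] = sum[2];
        y[i + 3] = sum[3];
    }
    /* Remaining 0..3 outputs, computed directly. */
    for (; i < N; i++) {
        float s = x[i];
        for (j = 0; j < ord; j++)
            s += rnum[j] * x[i + j - ord];
        y[i] = s;
    }
}
```

When ord is small, each kernel call does little arithmetic relative to its fixed overhead, which is one plausible reading of why inlining or a wider API is being asked about here.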
2017 Mar 02
0
Antw: Re: [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...r to optimize... Regards, Ulrich >>> Linfeng Zhang <linfengz at google.com> wrote on 01.03.2017 at 20:30 in message <CAKoqLCANyWDPpy4rccL3TJ37gbhWxRWkCrqR9GCATGhTFoaDyA at mail.gmail.com>: > Hi Timothy, > > Do you think it would be possible to improve the API of xcorr_kernel() so >> that calling it in a loop is more efficient? >> > > If it could be inlined, it will be more efficient. Besides memory bouncing, > frequent function call is expensive. > > The other advantage to wiring up xcorr_kernel() is that it applies in more >> places...
2016 Jun 17
0
ARM NEON optimization -- celt_fir()
...n November and December (which Tim still hasn’t gotten around to reviewing). As they used Neon intrinsics, several of these actually applied to both armv7 and aarch64 Neon. In particular, note http://lists.xiph.org/pipermail/opus/2015-December/003339.html , which added a Neon-optimized version of xcorr_kernel. xcorr_kernel is used in celt_fir, celt_iir, and celt_pitch_xcorr. > On Jun 17, 2016, at 5:09 PM, Linfeng Zhang <linfengz at google.com> wrote: > > Hi all, > > This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the > next few months. > >...
2013 Jun 10
0
opus Digest, Vol 53, Issue 2
...body 'help' to opus-request at xiph.org You can reach the person managing the list at opus-owner at xiph.org When replying, please edit your Subject line so it is more specific than "Re: Contents of opus digest..." Today's Topics: 1. Re: Bug fix in celt_lpc.c and some xcorr_kernel optimizations (John Ridges) 2. Invitation to connect on LinkedIn (casey guan) 3. Invitation to connect on LinkedIn (casey guan) ---------------------------------------------------------------------- Message: 1 Date: Fri, 07 Jun 2013 16:50:48 -0600 From: John Ridges <jridges at mas...
2016 Sep 28
2
[PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON
...ver 80 lines of complicated NEON intrinsics code] > + } > +#else So, one of the points of SMALL_FOOTPRINT is to reduce the code size on targets where this matters (even if it means running slower), but this is an awful lot of code. I think it makes much more sense to expose the existing xcorr_kernel asm and use that. I wrote a simple patch demonstrating this (attached... it applies on top of your full series, so it'd be a little work to rebase it into place here). It adds one 16-byte table and 16 instructions, and even gives speed-ups on non-NEON CPUs by reusing the existing EDSP asm....