thr3ads.net - search: "celt

Bug fix in celt_lpc.c and some xcorr_kernel optimizations

2013 Jun 07

2


Hi JM, At line 221 in celt_lpc.c (the celt_iir function) I think you really want the RESTORE_STACK statement to be before the #endif instead of after it. Also, I couldn't help notice that your SSE code for xcorr_kernel reads more than "len" elements of "_x". I don't know if that's really a problem when runnin...
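For context on the over-read concern raised above, here is a minimal scalar sketch of a 4-lag cross-correlation kernel. It is a float-only illustration under assumed names (`xcorr_kernel_sketch` is hypothetical and its signature is not copied from the Opus tree). It shows that a straightforward scalar form reads exactly `len` elements of `x` and `len + 3` elements of `y`, so a vectorized version that loads `x` in blocks of four could plausibly touch memory past `x[len-1]` unless the tail is handled separately.

```c
#include <stddef.h>

/* Hypothetical, simplified sketch: accumulates four cross-correlation
   lags at once. Not the Opus implementation. */
static void xcorr_kernel_sketch(const float *x, const float *y,
                                float sum[4], int len)
{
    int j, k;
    for (j = 0; j < len; j++) {
        /* x is read only at indices 0 .. len-1 */
        for (k = 0; k < 4; k++) {
            /* y is read at indices 0 .. len+2 */
            sum[k] += x[j] * y[j + k];
        }
    }
}
```

A caller would zero `sum` first and must supply at least `len + 3` valid elements in `y`.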

Bug fix in celt_lpc.c and some xcorr_kernel optimizations

2013 Jun 07

2


.... Your SSE version seems to > also be slightly faster than mine -- probably due to the partial sums. > As for the NEON code, it would be good to compare the performance with > the code Aurélien Zanelli posted at > http://darkosphere.fr/public/0002-Add-optimized-NEON-version-of-celt_fir-celt_iir-and-.patch > > Cheers, > > Jean-Marc > >

Bug fix in celt_lpc.c and some xcorr_kernel optimizations

2013 Jun 07

0


...ey're in git now. Your SSE version seems to also be slightly faster than mine -- probably due to the partial sums. As for the NEON code, it would be good to compare the performance with the code Aurélien Zanelli posted at http://darkosphere.fr/public/0002-Add-optimized-NEON-version-of-celt_fir-celt_iir-and-.patch Cheers, Jean-Marc On 06/06/2013 08:07 PM, John Ridges wrote: > Hi JM, > > At line 221 in celt_lpc.c (the celt_iir function) I think you really > want the RESTORE_STACK statement to be before the #endif instead of > after it. Also, I couldn't help notice that you...

Bug fix in celt_lpc.c and some xcorr_kernel optimizations

2013 Jun 07

0


...on seems to >> also be slightly faster than mine -- probably due to the partial sums. >> As for the NEON code, it would be good to compare the performance with >> the code Aurélien Zanelli posted at >> http://darkosphere.fr/public/0002-Add-optimized-NEON-version-of-celt_fir-celt_iir-and-.patch >> >> >> Cheers, >> >> Jean-Marc >> >> > > >

ARM NEON optimization -- celt_fir()

2016 Jun 17

0


...otten around to reviewing). As they used Neon intrinsics, several of these actually applied to both armv7 and aarch64 Neon. In particular, note http://lists.xiph.org/pipermail/opus/2015-December/003339.html , which added a Neon-optimized version of xcorr_kernel. xcorr_kernel is used in celt_fir, celt_iir, and celt_pitch_xcorr. > On Jun 17, 2016, at 5:09 PM, Linfeng Zhang <linfengz at google.com> wrote: > > Hi all, > > This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the > next few months. > > I'm submitting 2 patches in the following...

[Aarch64 v2 08/18] Add Neon fixed-point implementation of xcorr_kernel.

2015 Nov 21

0


Used for celt_pitch_xcorr on aarch64, and celt_fir and celt_iir on both armv7 and aarch64.
---
 celt/arm/arm_celt_map.c   | 17 +++++++++++++
 celt/arm/celt_neon_intr.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++-
 celt/arm/pitch_arm.h      | 31 +++++++++++++++++++++++-
 3 files changed, 107 insertions(+), 2 deletions(-)
diff --git a/celt/arm/arm_celt_m...

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

2017 Mar 01

2


...named silk_fir() with optimization to do the calculation when USE_CELT_FIR=0. xcorr_kernel() itself is great and provides many gains. The only issue is that calling it in a for loop makes it less efficient. xcorr_kernel() is called in several functions including celt_fir(), celt_pitch_xcorr() and celt_iir(). None of these functions are heavy hitters. silk_LPC_analysis_filter() accounts for 6.8% of the whole encoder's CPU cycles at complexity 8 and 8.9% at complexity 5. It probably makes sense to have a specific optimization that avoids calling xcorr_kernel() too many times, to save 1% to 1.5% CPU cycle...

Bug fix in celt_lpc.c and some xcorr_kernel optimizations

2013 Jun 07

1


...>> also be slightly faster than mine -- probably due to the partial sums. >>> As for the NEON code, it would be good to compare the performance with >>> the code Aurélien Zanelli posted at >>> http://darkosphere.fr/public/0002-Add-optimized-NEON-version-of-celt_fir-celt_iir-and-.patch >>> >>> >>> Cheers, >>> >>> Jean-Marc >>> >>> >> >> >

[PATCH] 02-

2013 May 21

0


...for (j=0;j<ord;j++) { - sum += MULT16_16(num[j],mem[j]); + sum = MAC16_16(sum, num[j], mem[j]); } for (j=ord-1;j>=1;j--) { @@ -111,6 +116,7 @@ void celt_fir(const opus_val16 *x, y[i] = ROUND16(sum, SIG_SHIFT); } } +#endif void celt_iir(const opus_val32 *x, const opus_val16 *den, @@ -136,6 +142,7 @@ void celt_iir(const opus_val32 *x, } } +#ifndef OVERRIDE_CELT_AUTOCORR void _celt_autocorr( const opus_val16 *x, /* in: [0...n-1] samples x */ opus_val32 *ac, /* out...
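The hunk above swaps `sum += MULT16_16(...)` for `sum = MAC16_16(sum, ...)`. The sketch below assumes generic fixed-point definitions (the `_SKETCH` macros and both helper functions are hypothetical stand-ins, not the library's headers) and shows that the two spellings give the same result in portable C. The likely motivation is that a platform header can then map the multiply-accumulate form onto a single instruction, though the snippet itself does not say so.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the fixed-point macros named in the diff. */
#define MULT16_16_SKETCH(a, b) ((int32_t)(int16_t)(a) * (int32_t)(int16_t)(b))
#define MAC16_16_SKETCH(c, a, b) ((c) + MULT16_16_SKETCH((a), (b)))

/* Old form: multiply, then add with += */
static int32_t dot_mult_sketch(const int16_t *num, const int16_t *mem, int ord)
{
    int32_t sum = 0;
    int j;
    for (j = 0; j < ord; j++)
        sum += MULT16_16_SKETCH(num[j], mem[j]);
    return sum;
}

/* New form: explicit multiply-accumulate */
static int32_t dot_mac_sketch(const int16_t *num, const int16_t *mem, int ord)
{
    int32_t sum = 0;
    int j;
    for (j = 0; j < ord; j++)
        sum = MAC16_16_SKETCH(sum, num[j], mem[j]);
    return sum;
}
```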

ARM NEON optimization -- celt_fir()

2016 Jun 17

5


Hi all, This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the next few months. I'm submitting 2 patches in the following couple of emails, which contain the newly created celt_fir_neon(). I revised celt_fir_c() to not pass in the argument "mem" in Patch 1. If there are concerns about this change, please let me know. Many thanks for your comments. Linfeng Zhang

[PATCH] 02-Add CELT filter optimizations

2013 May 21

2


...for (j=0;j<ord;j++) { - sum += MULT16_16(num[j],mem[j]); + sum = MAC16_16(sum, num[j], mem[j]); } for (j=ord-1;j>=1;j--) { @@ -111,6 +116,7 @@ void celt_fir(const opus_val16 *x, y[i] = ROUND16(sum, SIG_SHIFT); } } +#endif void celt_iir(const opus_val32 *x, const opus_val16 *den, @@ -136,6 +142,7 @@ void celt_iir(const opus_val32 *x, } } +#ifndef OVERRIDE_CELT_AUTOCORR void _celt_autocorr( const opus_val16 *x, /* in: [0...n-1] samples x */ opus_val32 *ac, /* out...

opus Digest, Vol 53, Issue 2

2013 Jun 10

0


...>> also be slightly faster than mine -- probably due to the partial sums. >>> As for the NEON code, it would be good to compare the performance with >>> the code Aurélien Zanelli posted at >>> http://darkosphere.fr/public/0002-Add-optimized-NEON-version-of-celt_fir-celt_iir-and-.patch >>> >>> >>> Cheers, >>> >>> Jean-Marc >>> >>> >> >> > ------------------------------ Message: 2 Date: Sat, 8 Jun 2013 02:54:03 +0000 (UTC) From: casey guan <guanxiansun at gmail.com> Subject: [o...

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

2017 Feb 15

2


Hi Jean-Marc, The original celt_fir() is a little bit messy. It has 2 branches chosen by #ifdef SMALL_FOOTPRINT. For floating-point, the 2 branches are identical (except the operation sequence of accumulating x[i] to sum, which is not a big deal). For fixed-point, the 2 branches are different. I separate them into 2 functions: the new celt_fir(), and celt_fir_permit_overflow() which is the

ASM runtime detection and optimizations

2013 May 23

2


...} -#ifndef OVERRIDE_CELT_FIR -void celt_fir(const opus_val16 *x, +void celt_fir_c(const opus_val16 *x, const opus_val16 *num, opus_val16 *y, int N, @@ -116,7 +127,6 @@ void celt_fir(const opus_val16 *x, y[i] = ROUND16(sum, SIG_SHIFT); } } -#endif void celt_iir(const opus_val32 *x, const opus_val16 *den, @@ -142,7 +152,6 @@ void celt_iir(const opus_val32 *x, } } -#ifndef OVERRIDE_CELT_AUTOCORR void _celt_autocorr( const opus_val16 *x, /* in: [0...n-1] samples x */ opus_val32 *ac, /* out...
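The hunk above drops the compile-time `OVERRIDE_CELT_FIR` guard and renames the portable routine to `celt_fir_c`, which is consistent with moving from compile-time overrides to run-time selection. A minimal sketch of that general pattern follows; every name in it (`dot_fn`, `dot_c_sketch`, `dot_unrolled_sketch`, the `ARCH_*_SKETCH` values, `dot_dispatch_sketch`) is hypothetical, and the table layout is an assumption rather than the patch's actual mechanism.

```c
#include <stddef.h>

typedef int (*dot_fn)(const short *a, const short *b, int n);

/* Portable reference version (plays the role of a "_c" function). */
static int dot_c_sketch(const short *a, const short *b, int n)
{
    int i, sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Stand-in for an optimized version: unrolled by four with a scalar tail. */
static int dot_unrolled_sketch(const short *a, const short *b, int n)
{
    int i = 0, sum = 0;
    for (; i + 4 <= n; i += 4)
        sum += a[i] * b[i] + a[i + 1] * b[i + 1]
             + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3];
    for (; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

enum { ARCH_GENERIC_SKETCH = 0, ARCH_OPT_SKETCH = 1, ARCH_COUNT_SKETCH = 2 };

/* Table indexed by a CPU-capability value detected once at startup. */
static const dot_fn DOT_IMPL_SKETCH[ARCH_COUNT_SKETCH] = {
    dot_c_sketch, dot_unrolled_sketch
};

static int dot_dispatch_sketch(int arch, const short *a, const short *b, int n)
{
    if (arch < 0 || arch >= ARCH_COUNT_SKETCH)
        arch = ARCH_GENERIC_SKETCH; /* unknown CPUs fall back to portable C */
    return DOT_IMPL_SKETCH[arch](a, b, n);
}
```

Keeping the portable function under its own name means it remains available as a fallback and as a reference when testing the optimized variants.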

[Aarch64 00/11] Patches to enable Aarch64 (arm64) optimizations, rebased to current master.

2015 Nov 07

12


...es rebased to the current tip of Opus master. They're largely the same as my previous patch set, with the addition of the final one (the Neon fixed-point implementation of xcorr_kernel). This replaces Viswanath's Neon fixed-point celt_pitch_xcorr, since xcorr_kernel is used in celt_fir and celt_iir as well. These have been tested for correctness under qemu (including running the test vectors), but not yet performance tested on a live aarch64 CPU (which will probably be an iPhone). I should be able to do this Monday or Tuesday. Jonathan Lennox (11): Move ARM-specific macro overrides to ar...

[AArch64 neon intrinsics v4 0/5] Rework Neon intrinsic code for Aarch64 patchset

2015 Dec 23

6


Following Tim's comments, here are my reworked patches for the Neon intrinsic function patches of my Aarch64 patchset, i.e. replacing patches 5-8 of the v2 series. Patches 1-4 and 9-18 of the old series still apply unmodified. The one new (as opposed to changed) patch is the first one in this series, to add named constants for the ARM architecture variants. There are also some minor code

[Aarch64 v2 00/18] Patches to enable Aarch64 (version 2)

2015 Nov 21

12


As promised, here's a re-send of all my Aarch64 patches, following comments by John Ridges. Note that they actually affect more than just Aarch64 -- other than the ones specifically guarded by AARCH64_NEON defines, the Neon intrinsics all also apply on armv7; and the OPUS_FAST_INT64 patches apply on any 64-bit machine. The patches should largely be independent and independently useful, other

Several patches of ARM NEON optimization

2016 Jul 14

6


I rebased my previous 3 patches to the current master with minor changes. Patches 1 to 3 replace all my previous submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang
