Displaying 20 results from an estimated 40 matches for "celt_fir".
2017 Feb 15
2
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Jean-Marc,
The original celt_fir() is a little bit messy. It has 2 branches chosen by
#ifdef SMALL_FOOTPRINT.
For floating-point, the 2 branches are identical (except the operation
sequence of accumulating x[i] to sum, which is not a big deal).
For fixed-point, the 2 branches are different. I separate them into 2
functions: the ne...
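For readers unfamiliar with the filter under discussion, the following is a minimal floating-point sketch of an FIR filter with an explicit history buffer. It is an illustration only, not the libopus source: the name `fir_sketch`, the use of `float`, and the history layout (most recent input first) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified floating-point FIR sketch (hypothetical, not libopus code).
 *   y[i] = x[i] + sum_{j=0}^{ord-1} num[j] * x[i-1-j]
 * mem[] holds the `ord` inputs that precede x[0], most recent first,
 * and is updated so that consecutive calls can be chained. */
void fir_sketch(const float *x, const float *num, float *y,
                size_t n, size_t ord, float *mem)
{
    assert(ord == 0 || mem != NULL);
    for (size_t i = 0; i < n; i++) {
        float in = x[i];   /* read first so that x == y (in place) works */
        float sum = in;
        for (size_t j = 0; j < ord; j++)
            sum += num[j] * mem[j];
        /* Shift the history by one sample; mem[0] becomes the newest input. */
        for (size_t j = ord; j > 1; j--)
            mem[j - 1] = mem[j - 2];
        if (ord > 0)
            mem[0] = in;
        y[i] = sum;
    }
}
```

Whether x[i] is added to the sum before or after the inner loop only changes the order of the floating-point additions, which is the "operation sequence" difference mentioned in the message.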
2013 May 23
2
ASM runtime detection and optimizations
...ndex);
@@ -496,7 +499,7 @@ static void celt_decode_lost(CELTDecoder * OPUS_RESTRICT st, opus_val16 * OPUS_R
ROUND16(buf[DECODE_BUFFER_SIZE-exc_length-1-i], SIG_SHIFT);
}
/* Compute the excitation for exc_length samples before the loss. */
- celt_fir(exc+MAX_PERIOD-exc_length, lpc+c*LPC_ORDER,
+ celt_fir[st->arch&OPUS_ARCHMASK](exc+MAX_PERIOD-exc_length, lpc+c*LPC_ORDER,
exc+MAX_PERIOD-exc_length, exc_length, LPC_ORDER, lpc_mem);
}
diff --git a/celt/celt_encoder.c b/celt/celt_encoder.c
index 26e6...
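The call `celt_fir[st->arch&OPUS_ARCHMASK](...)` in the hunk above selects an implementation at run time through a table of function pointers. A minimal sketch of that dispatch pattern follows. All names here (`ARCH_MASK`, `dot_c`, `dot_unrolled`, `DOT_IMPL`, `dot`) are hypothetical and chosen for illustration; the unrolled function merely stands in for a SIMD version.

```c
#include <assert.h>

#define ARCH_MASK 3  /* hypothetical stand-in for OPUS_ARCHMASK */

typedef long (*dot_fn)(const short *a, const short *b, int n);

/* Portable reference implementation. */
long dot_c(const short *a, const short *b, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/* Stand-in for an optimized version: unrolled by 4 with a scalar tail. */
long dot_unrolled(const short *a, const short *b, int n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += (long)a[i]     * b[i];
        s1 += (long)a[i + 1] * b[i + 1];
        s2 += (long)a[i + 2] * b[i + 2];
        s3 += (long)a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)
        s0 += (long)a[i] * b[i];
    return s0 + s1 + s2 + s3;
}

/* One entry per architecture level; lower levels fall back to the C code. */
dot_fn const DOT_IMPL[ARCH_MASK + 1] = {
    dot_c, dot_c, dot_unrolled, dot_unrolled
};

/* Masking the index keeps it inside the table for any value of arch. */
long dot(const short *a, const short *b, int n, int arch)
{
    assert(n >= 0);
    return DOT_IMPL[arch & ARCH_MASK](a, b, n);
}
```

The caller only needs the `arch` value detected once at start-up; every table entry must produce the same result, so the choice affects speed only.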
2017 Feb 18
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi Linfeng,
On 15/02/17 04:05 PM, Linfeng Zhang wrote:
> The original celt_fir() is a little bit messy. It has 2 branches chosen
> by #ifdef SMALL_FOOTPRINT.
Yeah, I agree that the #ifdef SMALL_FOOTPRINT in celt_fir() is a bit of
overkill since it's not saving much code space. I just pushed a commit
that gets rid of it, also refactoring the #else case a bit (see below...
2016 Jun 17
0
ARM NEON optimization -- celt_fir()
...l hasn’t gotten around to reviewing). As they used Neon intrinsics, several of these actually applied to both armv7 and aarch64 Neon.
In particular, note http://lists.xiph.org/pipermail/opus/2015-December/003339.html , which added a Neon-optimized version of xcorr_kernel. xcorr_kernel is used in celt_fir, celt_iir, and celt_pitch_xcorr.
> On Jun 17, 2016, at 5:09 PM, Linfeng Zhang <linfengz at google.com> wrote:
>
> Hi all,
>
> This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the
> next few months.
>
> I'm submitting 2 patches in the...
2017 Feb 15
4
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
Hi,
Attached are two patches. Patch 1 refactors silk_LPC_analysis_filter(), and
Patch 2 optimizes the new function celt_fir_permit_overflow() for ARM NEON.
Please recommend a better function name.
We did the same internal code review and testing already.
Thanks,
Linfeng
2016 Jun 17
5
ARM NEON optimization -- celt_fir()
Hi all,
This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the
next few months.
I'm submitting 2 patches in the following couple of emails, which contain the
newly created celt_fir_neon().
I revised celt_fir_c() to not pass in argument "mem" in Patch 1. If there are
concerns about this change, please let me know.
Many thanks for your comments.
Linfeng Zhang
2017 Mar 01
2
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
>
> I believe the solution would be to always have either:
> 1) USE_CELT_FIR=1 and use ovflw() macros in the xcorr code; or
> 2) USE_CELT_FIR=0 and no ovflw() in the xcorr code
>
I prefer to create a function named silk_fir() with optimization to do the
calculation when USE_CELT_FIR=0.
xcorr_kernel() itself is great and provides many gains. The only issue is
that ca...
2017 Mar 01
3
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...corr_kernel() so
> that calling it in a loop is more efficient?
>
If it could be inlined, it would be more efficient. Besides memory bouncing,
frequent function calls are expensive.
> The other advantage to wiring up xcorr_kernel() is that it applies in more
> places than your intrinsics-only celt_fir() implementation.
>
I agree.
One solution is to put the outer for(N) loop inside xcorr_kernel() to let
it return N results instead of 4 (similar to what the celt_fir() NEON
intrinsics did). This will make it efficient plus universal.
Thanks,
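The proposal above can be sketched as follows. This is a hypothetical illustration, not code from the patches: it assumes a 4-wide kernel that accumulates sum[k] += x[j]*y[j+k], and the names `xcorr4_sketch` and `xcorr_n_sketch` are invented for this example. Plain C stands in for the NEON intrinsics.

```c
#include <assert.h>

/* Four correlations per call: sum[k] += x[j] * y[j + k], k = 0..3.
 * Reads y[0 .. len + 2]. Marked inline so the caller pays no call cost. */
static inline void xcorr4_sketch(const float *x, const float *y,
                                 float sum[4], int len)
{
    for (int j = 0; j < len; j++) {
        float xj = x[j];
        sum[0] += xj * y[j];
        sum[1] += xj * y[j + 1];
        sum[2] += xj * y[j + 2];
        sum[3] += xj * y[j + 3];
    }
}

/* Outer loop moved inside one function: out[i] = sum_j x[j] * y[i + j]
 * for i = 0..n-1. Reads y[0 .. n + len - 2] and never beyond. */
void xcorr_n_sketch(const float *x, const float *y, float *out,
                    int len, int n)
{
    assert(len > 0 && n >= 0);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        float s[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        xcorr4_sketch(x, y + i, s, len);
        out[i] = s[0];
        out[i + 1] = s[1];
        out[i + 2] = s[2];
        out[i + 3] = s[3];
    }
    /* Scalar tail for the remaining 0..3 outputs. */
    for (; i < n; i++) {
        float s = 0.0f;
        for (int j = 0; j < len; j++)
            s += x[j] * y[i + j];
        out[i] = s;
    }
}
```

With this shape the caller makes one call per block of N outputs rather than one call per 4 outputs, which is the call-overhead saving the message argues for.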
2015 Nov 05
2
AVX Optimizations
Yes,
Thank you. I'll follow up with the AVX code and tests for pitch code.
Radu
-----Original Message-----
From: opus-bounces at xiph.org [mailto:opus-bounces at xiph.org] On Behalf Of Timothy B. Terriberry
Sent: Thursday, November 5, 2015 10:31 AM
To: opus at xiph.org
Subject: Re: [opus] AVX Optimizations
Velea, Radu wrote:
> I've created a pull request[1] to enable configuration
2013 Jun 07
2
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
...n git now. Your SSE version seems to
> also be slightly faster than mine -- probably due to the partial sums.
> As for the NEON code, it would be good to compare the performance with
> the code Aurélien Zanelli posted at
> http://darkosphere.fr/public/0002-Add-optimized-NEON-version-of-celt_fir-celt_iir-and-.patch
>
> Cheers,
>
> Jean-Marc
>
>
2017 Mar 01
0
[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...elieve that the function prologue/epilogue is really responsible for 1%
to 1.5% of the whole decoder cost. Perhaps it is just bouncing the
values in and out of memory from the NEON pipeline or something like
that which is expensive? Otherwise it seems to be doing exactly the same
things as your celt_fir() (unless I've missed something, which is
certainly possible).
The other advantage to wiring up xcorr_kernel() is that it applies in
more places than your intrinsics-only celt_fir() implementation.
2017 Mar 02
0
Antw: Re: [PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON
...g it in a loop is more efficient?
>>
>
> If it could be inlined, it would be more efficient. Besides memory bouncing,
> frequent function calls are expensive.
>
>> The other advantage to wiring up xcorr_kernel() is that it applies in more
>> places than your intrinsics-only celt_fir() implementation.
>>
>
> I agree.
>
> One solution is to put the outer for(N) loop inside xcorr_kernel() to let
> it return N results instead of 4 (similar to what the celt_fir() NEON
> intrinsics did). This will make it efficient plus universal.
>
> Thanks,
2016 Jul 14
0
[PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON
Create the fixed-point intrinsics optimization celt_fir_neon() for ARM NEON.
Create test tests/test_unit_optimization to unit test the optimization.
---
.gitignore | 1 +
Makefile.am | 39 ++++-
celt/arm/arm_celt_map.c | 17 +++
celt/arm/celt_lpc_arm.h | 65 ++...
2015 Nov 05
0
AVX Optimizations
...arch(void);
#else
#define OPUS_ARCHMASK 0
static OPUS_INLINE int opus_select_arch(void)
{
return 0;
diff --git a/celt/x86/x86_celt_map.c b/celt/x86/x86_celt_map.c
index 1ed2acb..8e5e449 100644
--- a/celt/x86/x86_celt_map.c
+++ b/celt/x86/x86_celt_map.c
@@ -48,44 +48,47 @@ void (*const CELT_FIR_IMPL[OPUS_ARCHMASK + 1])(
int ord,
opus_val16 *mem,
int arch
) = {
celt_fir_c, /* non-sse */
celt_fir_c,
celt_fir_c,
MAY_HAVE_SSE4_1(celt_fir), /* sse4.1 */
+ MAY_HAVE_SSE4_1(celt_fir) /* avx */...
2016 Sep 28
2
[PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON
...e existing EDSP asm.
Testing on comp48-stereo.sw encoded to 64 kbps and decoded with a 15%
loss rate on a Novena using opus_demo (by using RTCD and changing the
function pointers to the version of the code to test), optimizing
xcorr_kernel gives almost as much speed-up as intrinsics for all of
celt_fir:
celt_fir_c, xcorr_kernel_c:
1753 ms (stddev 9) [1730 1740 {1740 1740 1740 1750 1750 1750 1750 1750
1750 1750 1750 1750 1750 1750 1760 1760 1760 1760 1770 1770} 1780 1860]
celt_fir_c, xcorr_kernel_neon:
1710 ms (stddev 12) [1680 1690 {1690 1690 1700 1700 1700 1700 1710 1710
1710 1710 1710 1710...
2013 Jun 07
1
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
...s to
>>> also be slightly faster than mine -- probably due to the partial sums.
>>> As for the NEON code, it would be good to compare the performance with
>>> the code Aurélien Zanelli posted at
>>> http://darkosphere.fr/public/0002-Add-optimized-NEON-version-of-celt_fir-celt_iir-and-.patch
>>>
>>>
>>> Cheers,
>>>
>>> Jean-Marc
>>>
>>>
>>
>>
>
2015 Nov 05
2
AVX Optimizations
...arch(void);
#else
#define OPUS_ARCHMASK 0
static OPUS_INLINE int opus_select_arch(void)
{
return 0;
diff --git a/celt/x86/x86_celt_map.c b/celt/x86/x86_celt_map.c
index 1ed2acb..8e5e449 100644
--- a/celt/x86/x86_celt_map.c
+++ b/celt/x86/x86_celt_map.c
@@ -48,44 +48,47 @@ void (*const CELT_FIR_IMPL[OPUS_ARCHMASK + 1])(
int ord,
opus_val16 *mem,
int arch
) = {
celt_fir_c, /* non-sse */
celt_fir_c,
celt_fir_c,
MAY_HAVE_SSE4_1(celt_fir), /* sse4.1 */
+ MAY_HAVE_SSE4_1(celt_fir) /* avx */...
2013 Jun 07
2
Bug fix in celt_lpc.c and some xcorr_kernel optimizations
Hi JM,
At line 221 in celt_lpc.c (the celt_iir function) I think you really
want the RESTORE_STACK statement to be before the #endif instead of
after it. Also, I couldn't help notice that your SSE code for
xcorr_kernel reads more than "len" elements of "_x". I don't know if
that's really a problem when running the codec, but a tool like valgrind
will have a
2013 May 21
0
[PATCH] 02-
..."
+#ifdef ARM_HAVE_NEON
+#include "celt_lpc_neon.h"
+#endif
+
void _celt_lpc(
opus_val16 *_lpc, /* out: [0...p-1] LPC coefficients */
const opus_val32 *ac, /* in: [0...p] autocorrelation values */
@@ -87,6 +91,7 @@ int p
#endif
}
+#ifndef OVERRIDE_CELT_FIR
void celt_fir(const opus_val16 *x,
const opus_val16 *num,
opus_val16 *y,
@@ -101,7 +106,7 @@ void celt_fir(const opus_val16 *x,
opus_val32 sum = SHL32(EXTEND32(x[i]), SIG_SHIFT);
for (j=0;j<ord;j++)
{
- sum += MULT16_16(num[j],mem[j]);
+...
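The `#ifndef OVERRIDE_CELT_FIR` guard in the hunk above is a compile-time override: a platform header defines the macro and supplies its own version, and the generic definition is then skipped. A minimal single-file sketch of the pattern follows. The names `FAST_ENERGY`, `OVERRIDE_ENERGY` and `energy` are hypothetical and used only for illustration.

```c
#include <assert.h>

/* Part 1: what a platform header might contain. FAST_ENERGY is a
 * hypothetical build flag; when set, the header claims the function. */
#ifdef FAST_ENERGY
#define OVERRIDE_ENERGY
long energy(const short *x, int n)
{
    assert(n >= 0);
    long s0 = 0, s1 = 0;
    int i = 0;
    for (; i + 2 <= n; i += 2) {   /* two accumulators, stand-in for SIMD */
        s0 += (long)x[i] * x[i];
        s1 += (long)x[i + 1] * x[i + 1];
    }
    if (i < n)
        s0 += (long)x[i] * x[i];
    return s0 + s1;
}
#endif

/* Part 2: the generic source file. The portable version is compiled
 * only when no platform header has claimed the function. */
#ifndef OVERRIDE_ENERGY
long energy(const short *x, int n)
{
    assert(n >= 0);
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += (long)x[i] * x[i];
    return sum;
}
#endif
```

Unlike a run-time function-pointer table, this choice is fixed when the library is built, so exactly one definition of the function ends up in the binary.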
2013 May 21
2
[PATCH] 02-Add CELT filter optimizations
..."
+#ifdef ARM_HAVE_NEON
+#include "celt_lpc_neon.h"
+#endif
+
void _celt_lpc(
opus_val16 *_lpc, /* out: [0...p-1] LPC coefficients */
const opus_val32 *ac, /* in: [0...p] autocorrelation values */
@@ -87,6 +91,7 @@ int p
#endif
}
+#ifndef OVERRIDE_CELT_FIR
void celt_fir(const opus_val16 *x,
const opus_val16 *num,
opus_val16 *y,
@@ -101,7 +106,7 @@ void celt_fir(const opus_val16 *x,
opus_val32 sum = SHL32(EXTEND32(x[i]), SIG_SHIFT);
for (j=0;j<ord;j++)
{
- sum += MULT16_16(num[j],mem[j]);
+...