thr3ads.net - search: "sig

2016 Jun 17

5

ARM NEON optimization -- celt_fir()

Hi all, This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the next few months. I'm submitting 2 patches in the following couple of emails, which add the newly created celt_fir_neon(). I revised celt_fir_c() to not pass in argument "mem" in Patch 1. If there are concerns about this change, please let me know. Many thanks for your comments. Linfeng Zhang

[PATCH 2/5] Optimize fixed-point celt_fir_c() for ARM NEON

2016 Jul 14

0

...ord-i-4))); + for (;i<ord;i++) + rnum[i] = num[ord-i-1]; + rnum[ord] = rnum[ord+1] = rnum[ord+2] = 0; + (void)arch; + +#ifdef SMALL_FOOTPRINT + for (i=0;i<N-7;i+=8) + { + int16x8_t x_s16x8 = vld1q_s16(_x+i); + int32x4_t sum0_s32x4 = vshll_n_s16(vget_low_s16 (x_s16x8), SIG_SHIFT); + int32x4_t sum1_s32x4 = vshll_n_s16(vget_high_s16(x_s16x8), SIG_SHIFT); + for (j=0;j<ord;j+=4) + { + const int16x4_t rnum_s16x4 = vld1_s16(rnum+j); + x_s16x8 = vld1q_s16(x+i+j+0); + sum0_s32x4 = vmlal_lane_s16(sum0_s32x4, vget_low_s16 (x_s16x8), rnum_s16...

Possible bug in "pitch_downsample"

2010 Nov 16

1

Hi Jean-Marc, I could be way off base here, but it seems to me that line 115 in pitch.c in the function "pitch_downsample": x_lp[i] = SHR32(HALF32(HALF32(x[1][(2*i-1)]+x[1][(2*i+1)])+x[1][2*i]), SIG_SHIFT+2); should actually be: x_lp[i] += SHR32(HALF32(HALF32(x[1][(2*i-1)]+x[1][(2*i+1)])+x[1][2*i]), SIG_SHIFT+2); Sorry if I'm totally misreading things and just wasting your time. Cheers, John Ridges

Opus 1.1.1 beta breaks floating point integrity?

2014 Oct 15

1

Hi, trying to build the libopus beta, the compiler complains about more fixed-point-specific names referenced in common unconditional code. Particularly: ./celt/x86/celt_lpc_sse.c(100): error: identifier "SIG_SHIFT" is undefined noA = EXTEND32(1) << SIG_SHIFT >> 1; ^ SIG_SHIFT is defined in arch.h provided FIXED_POINT is used (not on a floating-point build), and silk/x86/x86_silk_map.c unconditionally references a lot of names defined in fixed-point sources, for exam...

Several patches of ARM NEON optimization

2016 Jul 14

6

I rebased my previous 3 patches to the current master with minor changes. Patches 1 to 3 replace all my previously submitted patches. Patches 4 and 5 are new. Thanks, Linfeng Zhang

[PATCH] 02-

2013 May 21

0

.....p] autocorrelation values */ @@ -87,6 +91,7 @@ int p #endif } +#ifndef OVERRIDE_CELT_FIR void celt_fir(const opus_val16 *x, const opus_val16 *num, opus_val16 *y, @@ -101,7 +106,7 @@ void celt_fir(const opus_val16 *x, opus_val32 sum = SHL32(EXTEND32(x[i]), SIG_SHIFT); for (j=0;j<ord;j++) { - sum += MULT16_16(num[j],mem[j]); + sum = MAC16_16(sum, num[j], mem[j]); } for (j=ord-1;j>=1;j--) { @@ -111,6 +116,7 @@ void celt_fir(const opus_val16 *x, y[i] = ROUND16(sum, SIG_SHIFT); } } +#endif voi...

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 19

2

> On Nov 16, 2015, at 4:42 PM, Jonathan Lennox <jonathan at vidyo.com> wrote: > > I haven't yet tried replacing SIG2WORD16 (or silk_ADD_SAT32/silk_SUB_SAT32) with Neon intrinsics. That's an obvious next step. This doesn't show any appreciable speed difference in my tests, but the code is obviously better by inspection (all three of these map directly to a single Aarch64

[PATCH] 02-Add CELT filter optimizations

2013 May 21

2

.....p] autocorrelation values */ @@ -87,6 +91,7 @@ int p #endif } +#ifndef OVERRIDE_CELT_FIR void celt_fir(const opus_val16 *x, const opus_val16 *num, opus_val16 *y, @@ -101,7 +106,7 @@ void celt_fir(const opus_val16 *x, opus_val32 sum = SHL32(EXTEND32(x[i]), SIG_SHIFT); for (j=0;j<ord;j++) { - sum += MULT16_16(num[j],mem[j]); + sum = MAC16_16(sum, num[j], mem[j]); } for (j=ord-1;j>=1;j--) { @@ -111,6 +116,7 @@ void celt_fir(const opus_val16 *x, y[i] = ROUND16(sum, SIG_SHIFT); } } +#endif voi...
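The loop in the diff above has the shape of a direct-form FIR filter in fixed point: the accumulator is seeded with the input shifted up by SIG_SHIFT, the products of coefficients and delay-line samples are accumulated, and the result is rounded back down. The following is only a sketch of that general technique. The `DEMO_*` macros, the shift value of 12, the function names, and the delay-line update are assumptions for illustration and are not taken from the truncated diff.

```c
#include <stdint.h>

/* Hypothetical shift value and macros, chosen for illustration only. */
#define DEMO_SIG_SHIFT 12
#define DEMO_MULT16_16(a, b) ((int32_t)(a) * (int32_t)(b))
#define DEMO_MAC16_16(c, a, b) ((c) + DEMO_MULT16_16((a), (b)))

/* Rounding downshift: add half, then floor-divide by 2^shift.
 * Written with division so it does not rely on >> of negative values. */
int32_t demo_round_shift(int32_t a, int shift)
{
    int32_t d = (int32_t)1 << shift;
    int32_t biased = a + (d >> 1);
    int32_t q = biased / d;
    if (biased % d != 0 && biased < 0) q -= 1;
    return q;
}

/* Direct-form FIR with an explicit delay line `mem` of length `ord`.
 * No saturation or overflow handling is modeled. */
void demo_fir(const int16_t *x, const int16_t *num, int16_t *y,
              int n, int ord, int16_t *mem)
{
    for (int i = 0; i < n; i++) {
        /* Seed the accumulator with the input in the shifted domain. */
        int32_t sum = (int32_t)x[i] * ((int32_t)1 << DEMO_SIG_SHIFT);
        for (int j = 0; j < ord; j++)
            sum = DEMO_MAC16_16(sum, num[j], mem[j]);
        /* Textbook delay-line update (an assumption, see lead-in). */
        for (int j = ord - 1; j >= 1; j--)
            mem[j] = mem[j - 1];
        mem[0] = x[i];
        y[i] = (int16_t)demo_round_shift(sum, DEMO_SIG_SHIFT);
    }
}
```

With a single coefficient of 2048 (0.5 when coefficients are read as Q12) and a zeroed delay line, each output is the current input plus half of the previous one.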

innov_save, what is it? why does it hurt me so?

2007 Sep 13

2

...'s it never worked. I found innov_save will write over memory it shouldn't be when I fill memory with 1's but not when I fill it with zeros. When my DSP gets to this code chunk: if (innov_save){ for (i=0;i<st->subframeSize;i++) innov_save[i] = EXTRACT16(PSHR32(innov[i], SIG_SHIFT)); } it will just start filling data in, which it shouldn't. I see that innov_save is set at the beginning of a for loop at: for (sub=0;sub<st->nbSubframes;sub++) { int offset; spx_word16_t *exc; spx_word16_t *sp; spx_word16_t *innov_save = NULL; spx_word...

fir_mem16,iir_mem16 and filter_mem16 optimisations

2008 Aug 02

2

Hi! I have some questions about these functions: fir_mem16, iir_mem16 and filter_mem16. Filtering is very slow on TI DSP, and I want to optimise it. Can somebody give me formulas which describe how these filters work? Or any suggestions about how to transform the code for better performance? I am going to implement these functions in assembler, but it is hard to do without full understanding how functions

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

2017 Feb 18

0

...rated. > celt_fir_permit_overflow(): x[i] is accumulated to the sum first and > then truncated saturated. Actually, the two branches are bit-exact for fixed-point. There is indeed a difference in where x[i] gets accumulated, but because in the SMALL_FOOTPRINT case it first gets shifted up by SIG_SHIFT, the result of the downshift (also by SIG_SHIFT) is the same no matter when it gets added. That being said, I thought adding at the beginning was nicer so I changed the remaining code to do that. > Maybe this is the reason why silk_LPC_analysis_filter() switched the FIR > from celt_fir() to...
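The bit-exactness argument in the reply above can be checked with a small model: if x[i] is first shifted up by the same amount that the final rounding downshift removes, then adding it before the downshift gives the same result as adding the unshifted x[i] afterwards. This is a sketch under stated assumptions: `DEMO_SIG_SHIFT` and all function names are hypothetical, the shift value of 12 is chosen for illustration, and overflow and saturation are not modeled.

```c
#include <stdint.h>

/* Hypothetical shift value for illustration. */
#define DEMO_SIG_SHIFT 12

/* Floor division by 2^shift, avoiding >> on negative values. */
int32_t demo_floor_shift(int32_t a, int shift)
{
    int32_t d = (int32_t)1 << shift;
    int32_t q = a / d;
    if (a % d != 0 && a < 0) q -= 1;
    return q;
}

/* Rounding downshift: add half, then floor-shift. */
int32_t demo_pshr(int32_t a, int shift)
{
    return demo_floor_shift(a + ((int32_t)1 << (shift - 1)), shift);
}

/* Variant A: add x (shifted up) to the filter sum, then downshift. */
int32_t demo_add_first(int16_t x, int32_t acc)
{
    int32_t sum = (int32_t)x * ((int32_t)1 << DEMO_SIG_SHIFT);
    return demo_pshr(sum + acc, DEMO_SIG_SHIFT);
}

/* Variant B: downshift the filter sum, then add the unshifted x. */
int32_t demo_add_last(int16_t x, int32_t acc)
{
    return demo_pshr(acc, DEMO_SIG_SHIFT) + x;
}
```

The two variants agree because x shifted up by DEMO_SIG_SHIFT is an exact multiple of 2^DEMO_SIG_SHIFT, so it passes through the floor division unchanged.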

[ANNOUNCE] PocketPC Port for speex-1.1.5 with sample code

2004 Aug 06

0

..._t; typedef spx_word16_t spx_coef_t; typedef spx_word16_t spx_lsp_t; typedef spx_word32_t spx_sig_t; #define LPC_SCALING 8192 #define SIG_SCALING 16384 #define LSP_SCALING 8192. #define GAMMA_SCALING 32768. #define GAIN_SCALING 64 #define GAIN_SCALING_1 0.015625 #define LPC_SHIFT 13 #define SIG_SHIFT 14 #define VERY_SMALL 0 #ifdef ARM_ASM #include "fixed_arm.h" #elif defined (FIXED_DEBUG) #include "fixed_debug.h" #else #include "fixed_generic.h" #endif #else typedef float spx_mem_t; typedef float spx_coef_t; typedef float spx_lsp_t; typedef float spx_sig_...
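In the header quoted above, each `*_SCALING` constant is 2 raised to the matching `*_SHIFT` (16384 = 2^14, 8192 = 2^13). The sketch below checks that relationship at compile time and shows a round trip between a 16-bit sample and a value scaled by SIG_SCALING. The constants are copied from the snippet. The helper names are hypothetical, and the assumption that signal values are samples scaled by SIG_SCALING is made for illustration only.

```c
#include <stdint.h>

/* Values copied from the quoted header. */
#define LPC_SCALING 8192
#define SIG_SCALING 16384
#define LPC_SHIFT 13
#define SIG_SHIFT 14

/* Each scaling constant equals 1 << the corresponding shift. */
_Static_assert(SIG_SCALING == (1 << SIG_SHIFT), "SIG scaling/shift mismatch");
_Static_assert(LPC_SCALING == (1 << LPC_SHIFT), "LPC scaling/shift mismatch");

/* Hypothetical helper: lift a 16-bit sample into the scaled 32-bit domain. */
int32_t demo_sample_to_sig(int16_t s)
{
    return (int32_t)s * SIG_SCALING;
}

/* Hypothetical helper: rounding conversion back to sample scale.
 * Uses floor division so negative values round consistently.
 * Saturation is not modeled. */
int32_t demo_sig_to_sample(int32_t sig)
{
    int32_t biased = sig + (SIG_SCALING / 2);
    int32_t q = biased / SIG_SCALING;
    if (biased % SIG_SCALING != 0 && biased < 0) q -= 1;
    return q;
}
```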

Speex on TI C6x, Problem with TI C5x Patch

2005 May 25

3

..._t; typedef spx_word16_t spx_coef_t; typedef spx_word16_t spx_lsp_t; typedef spx_word32_t spx_sig_t; #define LPC_SCALING 8192 #define SIG_SCALING 16384 #define LSP_SCALING 8192. #define GAMMA_SCALING 32768. #define GAIN_SCALING 64 #define GAIN_SCALING_1 0.015625 #define LPC_SHIFT 13 #define SIG_SHIFT 14 #define VERY_SMALL 0 #ifdef ARM5E_ASM #include "fixed_arm5e.h" #elif defined (ARM4_ASM) #include "fixed_arm4.h" #elif defined (FIXED_DEBUG) #include "fixed_debug.h" #elif defined (C55X_ASM) #include "fixed_c55x.h" #else #include "fixed_generic.h...

11kbps narrowband on a 24bit DSP

2007 Aug 06

2

...32 bit registers. Can anyone tell me what scaling values like #define LPC_SCALING 8192 #define SIG_SCALING 16384 #define LSP_SCALING 8192. #define GAMMA_SCALING 32768. #define GAIN_SCALING 64 #define GAIN_SCALING_1 0.015625 #define LPC_SHIFT 13 #define LSP_SHIFT 13 #define SIG_SHIFT 14 would be required for a 24 bit register? 2. I would like to include only the data tables required to support 11kbps narrowband mode. I think they would be the following signed byte tables cdbk_nb[640] cdbk_nb_low1[320] cdbk_nb_high1[320] gain_cdbk_lbr[128] e...

[PATCH] Blackfin: cleanup astat/cc/hardware loop asm clobbers

2009 Apr 24

2

...ig_t *x, spx_word16_t *y, spx_sig_t max_scale, int le "LOOP_END norm_max%=;\n\t" : "=&d" (max_val) : "a" (x), "a" (len) - : "R1", "R2" + : "R1", "R2", "ASTAT" BFIN_HWLOOP0_REGS ); sig_shift=0; @@ -74,7 +76,7 @@ int normalize16(const spx_sig_t *x, spx_word16_t *y, spx_sig_t max_scale, int le "R1 = ASHIFT R0 by %2.L;\n\t" "W[P1++] = R1;\n\t" : : "a" (x), "a" (y), "d" (-sig_shift), "a" (len-1) - : "I0", &...

[PATCH] Refactor silk_LPC_analysis_filter() & Optimize celt_fir_permit_overflow() for ARM NEON

2017 Feb 15

2

Hi Jean-Marc, The original celt_fir() is a little bit messy. It has 2 branches chosen by #ifdef SMALL_FOOTPRINT. For floating-point, the 2 branches are identical (except the operation sequence of accumulating x[i] to sum, which is not a big deal). For fixed-point, the 2 branches are different. I separate them into 2 functions: the new celt_fir(), and celt_fir_permit_overflow() which is the

[ANNOUNCE] PocketPC Port for speex-1.1.5 with sample code

2004 Aug 06

2

Hi Jean-Marc, Based on the wonderful Speex project, I've created SpeexOutLoud, essentially a Speex codec port for Windows Mobile 2003 devices. I've included a sample project intended to show the usage of SpeexOutLoud codec in a Pocket PC application based on .NET Compact Framework. I'd request you to please go through the attached build, and include it as a contribution to the

innov_save, what is it? why does it hurt me so?

2007 Sep 13

0

...emory it shouldn't be when > I fill memory with 1's but not when I fill it with zeros. Which is normal, see below. > When my DSP > gets to this code chunk: > if (innov_save){ > for (i=0;i<st->subframeSize;i++) > innov_save[i] = EXTRACT16(PSHR32(innov[i], SIG_SHIFT)); > } This bit of code is for copying the innov variable to a buffer owned by the wideband decoder -- but only if there is actually a wideband decoder. > it will just start filling data in, which it shouldn't. I see that > innov_save is set at the beginning of a for loop at: > f...

Speex crashing on ARM with assembler optimization enabled.

2007 Dec 12

1

Alexander Chemeris wrote: > Ok, if I comment out inclusion of "filters_arm4.h" (or comment out its > only overridden function - normalize16()) it works fine. > > Also it works fine if I use -O0 for compilation. Specifying -O1 or -O2 lead > to segfault (if "filters_arm4.h" is included, sure). OK, so either I screwed up the alignment/constraints in the

[Aarch64 00/11] Patches to enable Aarch64

2015 Nov 19

0

Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT))) On 11/19/2015 2:52 PM, Jonathan Lennox wrote: >> On Nov 16, 2015, at 4:42 PM, Jonathan Lennox <jonathan at vidyo.com> wrote: >> >> I haven't yet tried replacing SIG2WORD16 (or silk_ADD_SAT32/silk_SUB_SAT32) with Neon intrinsics. That's an obvious next step. > This d...
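The expression suggested in this reply combines a rounding downshift with a saturating narrow from 32 to 16 bits, which is what the AArch64 scalar intrinsic vqmovns_s32 does. A portable C model of that behaviour might look like the sketch below. `DEMO_SIG_SHIFT`, its value of 12, and the function names are hypothetical. This models only the arithmetic and is not the macro from the patch series.

```c
#include <stdint.h>

/* Hypothetical shift value for illustration. */
#define DEMO_SIG_SHIFT 12

/* Portable model of a saturating narrow from 32 to 16 bits. */
int16_t demo_saturate16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Rounding downshift (add half, floor-divide), written with division
 * so it does not rely on >> of negative values. The widening to
 * 64 bits keeps the bias addition from overflowing near INT32_MAX. */
int32_t demo_pshr32(int32_t a, int shift)
{
    int64_t d = (int64_t)1 << shift;
    int64_t biased = (int64_t)a + (d >> 1);
    int64_t q = biased / d;
    if (biased % d != 0 && biased < 0) q -= 1;
    return (int32_t)q;
}

/* Model of "round down by the shift, then saturate to 16 bits". */
int16_t demo_sig2word16(int32_t x)
{
    return demo_saturate16(demo_pshr32(x, DEMO_SIG_SHIFT));
}
```

Values that fit in 16 bits after the shift pass through unchanged, while larger magnitudes clamp to 32767 or -32768 instead of wrapping.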

search for: sig_shift