thr3ads.net - search: "mac16

[PATCH 1/2] arm: Use the UAL syntax for ldr<cc>h instructions

2014 Feb 08

3

On Fri, 7 Feb 2014, Timothy B. Terriberry wrote:
> Martin Storsjo wrote:
>> This is required in order to build using the built-in assembler
>> in clang.
>
> These patches break the gcc build (with "Error: bad instruction").

Ah, right, sorry about that.

> Documentation I've seen is contradictory on which order ({cond}{size} or
> {size}{cond}) is correct.

[PATCH v2] arm: Use the UAL syntax for instructions

2014 Feb 08

0

...orr_arm.s b/celt/arm/celt_pitch_xcorr_arm.s
index 09917b1..598e45b 100644
--- a/celt/arm/celt_pitch_xcorr_arm.s
+++ b/celt/arm/celt_pitch_xcorr_arm.s
@@ -309,7 +309,7 @@ xcorr_kernel_edsp_process4_done
   SUBS r2, r2, #1 ; j--
   ; Stall
   SMLABB r6, r12, r10, r6 ; sum[0] = MAC16_16(sum[0],x,y_0)
-  LDRGTH r14, [r4], #2 ; r14 = *x++
+  LDRHGT r14, [r4], #2 ; r14 = *x++
   SMLABT r7, r12, r10, r7 ; sum[1] = MAC16_16(sum[1],x,y_1)
   SMLABB r8, r12, r11, r8 ; sum[2] = MAC16_16(sum[2],x,y_2)
   SMLABT r9, r12, r11, r9 ; sum[3] = MAC16...

[PATCH 1/2] arm: Use the UAL syntax for ldr<cc>h instructions

2014 Feb 07

3

...orr_arm.s b/celt/arm/celt_pitch_xcorr_arm.s
index 09917b1..3c4b950 100644
--- a/celt/arm/celt_pitch_xcorr_arm.s
+++ b/celt/arm/celt_pitch_xcorr_arm.s
@@ -309,7 +309,7 @@ xcorr_kernel_edsp_process4_done
   SUBS r2, r2, #1 ; j--
   ; Stall
   SMLABB r6, r12, r10, r6 ; sum[0] = MAC16_16(sum[0],x,y_0)
-  LDRGTH r14, [r4], #2 ; r14 = *x++
+  LDRHGT r14, [r4], #2 ; r14 = *x++
   SMLABT r7, r12, r10, r7 ; sum[1] = MAC16_16(sum[1],x,y_1)
   SMLABB r8, r12, r11, r8 ; sum[2] = MAC16_16(sum[2],x,y_2)
   SMLABT r9, r12, r11, r9 ; sum[3] = MAC16...

Re: speex echo cancellation limitations

2006 May 01

2

> I am writing to gain a better understanding of the limitations of speex echo
> cancellation, esp. with respect to the fixed point implementation.
> If these limitations have been documented elsewhere already, please let me
> know!

Nothing officially documented, sorry.

> I observe experimentally that when one or both of the echo or ref data for
> speex_echo_cancel() have

[PATCH] 02-Add CELT filter optimizations

2013 May 21

2

Please ignore my previous mail and patch, there is a new version :).

Patch changes are:
- Use MAC16_16 macros instead of (sum += a*b) and unroll a loop by 2. It increases performance when using optimized macros (e.g. ARMv5E). A possible side effect of the loop unrolling is that I don't check for odd length here.
- Add NEON version of FIR filter and autocorr
- Add a section in autoconf in order to chec...
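To illustrate the unroll-by-2 change and the odd-length caveat mentioned in this message, here is a minimal sketch in plain C. It is not code from the patch: the function name dot_unrolled2 and the helper mac16_16_c are hypothetical, and the tail handling for odd lengths is an addition that the message says the patch itself does not perform.

```c
#include <stdint.h>

/* Hypothetical stand-in for MAC16_16: c + a*b with 16-bit inputs
   and a 32-bit accumulator (no saturation). */
static int32_t mac16_16_c(int32_t c, int16_t a, int16_t b)
{
    return c + (int32_t)a * (int32_t)b;
}

/* Dot product with the loop unrolled by 2.
   The trailing "if" handles an odd len; without it the last
   sample of an odd-length input would be silently skipped. */
static int32_t dot_unrolled2(const int16_t *x, const int16_t *y, int len)
{
    int32_t sum = 0;
    int i;
    for (i = 0; i + 1 < len; i += 2) {
        sum = mac16_16_c(sum, x[i], y[i]);
        sum = mac16_16_c(sum, x[i + 1], y[i + 1]);
    }
    if (i < len)
        sum = mac16_16_c(sum, x[i], y[i]);
    return sum;
}
```

If the caller can guarantee an even length, the tail check can be dropped, which appears to be the trade-off the message describes.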

[PATCH 12/15] Replace call of celt_inner_prod_c() (step 1)

2016 Sep 13

4

Should call celt_inner_prod().
---
 celt/bands.c | 7 ++++---
 celt/bands.h | 2 +-
 celt/celt_encoder.c | 6 +++---
 celt/pitch.c | 2 +-
 src/opus_multistream_encoder.c | 2 +-
 5 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/celt/bands.c b/celt/bands.c
index bbe8a4c..1ab24aa 100644
--- a/celt/bands.c
+++ b/celt/bands.c

[PATCH] 02-

2013 May 21

0

- Use MAC16_16 macros instead of (sum += a*b) and unroll a loop by 2. It increases performance when using optimized macros (e.g. ARMv5E). A possible side effect of the loop unrolling is that I don't check for odd length here.
- Add NEON version of FIR filter and autocorr

--
Aurélien Zanelli
Parrot SA
174, quai d...

Re: speex echo cancellation limitations

2006 May 02

0

...agnitude +/- 32767 -- 2nd arg is file containing all zeroes

The division by zero appears to be caused by the calculation:

See = inner_prod(st->e+st->frame_size, st->e+st->frame_size, st->frame_size)

which returns negative due to overflow occurring in mdf.c:inner_prod() :

part = MAC16_16(part,*x++,*y++);
part = MAC16_16(part,*x++,*y++);
part = MAC16_16(part,*x++,*y++);
part = MAC16_16(part,*x++,*y++);
sum = ADD32(sum,SHR32(part,6));

This overflow can be avoided by rewriting this as:

part = part + ((*x++ * *y++)>>1);
part = part + ((*x++ * *y...
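The arithmetic behind this report can be checked with a small sketch. It is not taken from mdf.c: the helper names are hypothetical, and the sums are computed in 64 bits so that the check itself cannot overflow. It assumes an arithmetic right shift for signed values, which is what common compilers provide.

```c
#include <stdint.h>

/* Exact sum of four 16x16 products, as accumulated in "part"
   before the shift, but held in 64 bits. */
static int64_t mac4_exact(const int16_t *x, const int16_t *y)
{
    int64_t part = 0;
    for (int i = 0; i < 4; i++)
        part += (int64_t)x[i] * y[i];
    return part;
}

/* Same accumulation with each product shifted right by 1 first,
   mirroring the rewrite proposed in the message. */
static int64_t mac4_halved(const int16_t *x, const int16_t *y)
{
    int64_t part = 0;
    for (int i = 0; i < 4; i++)
        part += ((int32_t)x[i] * y[i]) >> 1;
    return part;
}

/* True when v is representable as a signed 32-bit integer. */
static int fits_in_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

With four samples of magnitude 32767 multiplied by themselves, the exact sum is 4 * 32767^2 = 4294705156, which does not fit in 32 bits, while the halved version gives 2147352576, which does. The margin is small: four samples of -32768 would reach 2^31 even after halving.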

ARM NEON optimization -- celt_fir()

2016 Jun 17

5

Hi all,

This is Linfeng Zhang from Google. I'll work on ARM NEON optimization in the next few months.

I'm submitting 2 patches in the following couple of emails, which contain the newly created celt_fir_neon(). I revised celt_fir_c() to not pass in the argument "mem" in Patch 1. If there are concerns about this change, please let me know.

Many thanks for your comments.

Linfeng Zhang

Re: speex echo cancellation limitations

2006 May 02

3

..._prod(st->e+st->frame_size, st->e+st->frame_size, st->frame_size)

Does that also happen with "real life" signals or just high-amplitude sinusoids? (Probably worth fixing anyway.)

> which returns negative due to overflow occurring in mdf.c:inner_prod() :
> part = MAC16_16(part,*x++,*y++);
> part = MAC16_16(part,*x++,*y++);
> part = MAC16_16(part,*x++,*y++);
> part = MAC16_16(part,*x++,*y++);
> sum = ADD32(sum,SHR32(part,6));
> This overflow can be avoided by rewriting this as:
> part = part + ((*x++ * *y++)>>1);
> ...

high-pass filter issues

2007 Aug 29

2

...num;
   if (filtID>4)
      filtID=4;

   den = Pcoef[filtID];
   num = Zcoef[filtID];
   /*return;*/
   for (i=0;i<len;i++)
   {
      spx_word16_t yi;
      spx_word32_t vout = ADD32(MULT16_16(num[0], x[i]),mem[0]);
      yi = EXTRACT16(SATURATE(PSHR32(vout,14),32767));
      mem[0] = ADD32(MAC16_16(mem[1], num[1],x[i]), SHL32(MULT16_32_Q15(-den[1],vout),1));
      mem[1] = ADD32(MULT16_16(num[2],x[i]), SHL32(MULT16_32_Q15(-den[2],vout),1));
      y[i] = yi;
   }
}

I can step into the function just fine, but when I run it, even just from the initial variable declarations to the top of that...
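The loop quoted above has the shape of a second-order (biquad) filter in transposed direct form II, with two state words in mem. As a reading aid, here is a floating-point sketch of that structure. It is an assumption-laden simplification, not the Speex code: the function name biquad_df2t is hypothetical, den[0] is assumed to be 1, and the Q14/Q15 scaling, rounding and saturation of the fixed-point version are left out.

```c
/* Transposed direct-form II biquad, floating point for readability.
   num = {b0, b1, b2}, den = {1, a1, a2}; mem holds the two state values
   and is updated in place so the filter can be called block by block. */
static void biquad_df2t(const float num[3], const float den[3],
                        const float *x, float *y, int len, float mem[2])
{
    for (int i = 0; i < len; i++) {
        float xi = x[i];
        float yi = num[0] * xi + mem[0];              /* output sample */
        mem[0] = mem[1] + num[1] * xi - den[1] * yi;  /* first state   */
        mem[1] = num[2] * xi - den[2] * yi;           /* second state  */
        y[i] = yi;
    }
}
```

For example, num = {1, -1, 0} with den = {1, 0, 0} reduces to a first difference, a crude high-pass filter that turns a constant input into zeros after the first sample.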

[Patch]01-Add ARM5E macros

2013 May 17

1

...us_val16 a, opus_val32 b)
+{
+  int res;
+  __asm__(
+      "smlawb %0, %1, %2, %3;\n"
+      : "=&r"(res)
+      : "%r"(b<<1),"r"(a), "r"(c)
+  );
+  return res;
+}
+
+/** 16x16 multiply-add where the result fits in 32 bits */
+#undef MAC16_16
+static inline opus_val32 MAC16_16(opus_val32 c, opus_val16 a, opus_val16 b)
+{
+  __asm__(
+      "smlabb %0, %1, %2, %0;\n"
+      : "=&r"(c), "=r"(a), "=r"(b)
+      : "0"(c), "1"(a), "2"(b)
+  );
+  return c;
+}
+
+/*...
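For comparison, a 16x16 multiply-accumulate wrapper around SMLABB can also be written with a separate output register and plain input operands. The following is only a sketch under stated assumptions, not the patch's code: the name mac16_16_sketch is hypothetical, the assembly path is only selected when the compiler defines __arm__ and the ACLE macro __ARM_FEATURE_DSP, and every other target falls back to plain C.

```c
#include <stdint.h>

/* Hypothetical helper; computes c + a*b with 16-bit a and b. */
static inline int32_t mac16_16_sketch(int32_t c, int16_t a, int16_t b)
{
#if defined(__arm__) && defined(__ARM_FEATURE_DSP)
    int32_t res;
    /* SMLABB: res = (bottom half of a) * (bottom half of b) + c */
    __asm__("smlabb %0, %1, %2, %3"
            : "=r"(res)
            : "r"(a), "r"(b), "r"(c));
    return res;
#else
    /* Portable fallback, same result as long as the sum fits in 32 bits */
    return c + (int32_t)a * (int32_t)b;
#endif
}
```

Keeping a, b and c as inputs only leaves the compiler free to reuse those registers afterwards, since the asm statement no longer claims to modify them.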

[PATCH] compute_weighted_codebook a little bit faster

2005 Dec 09

1

Hi,

here is a patch making the function compute_weighted_codebook a little bit faster. Not so impressive, but it avoids a loop and is really faster on small platforms like the MIPS I'm working on.

Enjoy,
Matthieu Poullet

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cwc_patch
Type: application/octet-stream
Size: 1226 bytes
Desc: not available
Url :

fixed point macros

2004 Aug 06

1

...ally and are defined to short (16 bits) and int (32 bits)
> for fixed-point. As for the macros, here are some of them (the rest
> should be easy to guess):
>
> ADD16, ADD32 adders for 16 and 32 bits
> MULT16_16 multiply a 16 bit value by another 16 bit value (result in
> 32)
> MAC16_16 same but also adds to the first argument
> MULT16_16_Q15 multiply a 16 bit value by another 16 bit value and shift
> right by 15 (result assumed to fit in 16 bits)
>
> Note that all these functions DO NOT perform saturation, so you need to
> make sure that the operations can't po...
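The descriptions above translate fairly directly into C. The following is a minimal sketch written from those descriptions alone, not a copy of the Speex headers: the type names word16_t and word32_t are hypothetical, and the Q15 macro assumes an arithmetic right shift for negative values, which is what common compilers provide. As the message notes, nothing here saturates.

```c
#include <stdint.h>

typedef int16_t word16_t;  /* hypothetical 16-bit fixed-point type */
typedef int32_t word32_t;  /* hypothetical 32-bit fixed-point type */

/* Adders for 16 and 32 bits (no saturation) */
#define ADD16(a,b) ((word16_t)((a) + (b)))
#define ADD32(a,b) ((word32_t)(a) + (word32_t)(b))

/* 16x16 multiply with a 32-bit result */
#define MULT16_16(a,b) (((word32_t)(word16_t)(a)) * ((word32_t)(word16_t)(b)))

/* Same, but also adds the product to the first argument */
#define MAC16_16(c,a,b) (ADD32((c), MULT16_16((a),(b))))

/* 16x16 multiply shifted right by 15; result assumed to fit in 16 bits */
#define MULT16_16_Q15(a,b) ((word16_t)(MULT16_16((a),(b)) >> 15))
```

In Q15, 16384 represents 0.5, so MULT16_16_Q15(16384, 16384) yields 8192, that is 0.25.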

[PATCH] Pitch now quantised at the band level, got rid of all the VQ code.

2009 Jan 14

0

...- for (i=0;i<entries;i++)
-   {
-      celt_word32_t dist=0;
-      const celt_pgain_t *inp = in;
-      j=0; do {
-         celt_pgain_t tmp1 = SUB16(*inp++,PGAIN_EVEN14(codebook, ind));
-         celt_pgain_t tmp2 = SUB16(*inp++,PGAIN_ODD14(codebook, ind));
-         ind++;
-         dist = MAC16_16(dist, tmp1, tmp1);
-         dist = MAC16_16(dist, tmp2, tmp2);
-      } while (++j<len>>1);
-      if (dist<min_dist)
-      {
-         min_dist=dist;
-         best_index=i;
-      }
-   }
-   return best_index;
-}
-
-int quant_pitch(celt_pgain_t *gains, int len)
-{
-   int i, id;
-...

[ANNOUNCE] PocketPC Port for speex-1.1.5 with sample code

2004 Aug 06

0

...) (a)
#define SHL(a,shift) (a)
#define SATURATE(x,a) (x)
#define ADD16(a,b) ((a)+(b))
#define SUB16(a,b) ((a)-(b))
#define ADD32(a,b) ((a)+(b))
#define SUB32(a,b) ((a)-(b))
#define ADD64(a,b) ((a)+(b))
#define MULT16_16_16(a,b) ((a)*(b))
#define MULT16_16(a,b) ((a)*(b))
#define MAC16_16(c,a,b) ((c)+(a)*(b))
#define MULT16_32_Q11(a,b) ((a)*(b))
#define MULT16_32_Q13(a,b) ((a)*(b))
#define MULT16_32_Q14(a,b) ((a)*(b))
#define MULT16_32_Q15(a,b) ((a)*(b))
#define MAC16_32_Q11(c,a,b) ((c)+(a)*(b))
#define MAC16_32_Q15(c,a,b) ((c)+(a)*(b))
#define MAC16_1...

[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.

2015 Mar 13

1

...));
+   xsum1 = _mm_add_ss(xsum1, _mm_shuffle_ps(xsum1, xsum1, 0x55));
+   _mm_store_ss(xy1, xsum1);
+   xsum2 = _mm_add_ps(xsum2, _mm_movehl_ps(xsum2, xsum2));
+   xsum2 = _mm_add_ss(xsum2, _mm_shuffle_ps(xsum2, xsum2, 0x55));
+   _mm_store_ss(xy2, xsum2);
+   for (;i<N;i++)
+   {
+      *xy1 = MAC16_16(*xy1, x[i], y01[i]);
+      *xy2 = MAC16_16(*xy2, x[i], y02[i]);
+   }
 }
-#endif
-#if defined(OPUS_X86_MAY_HAVE_SSE2)
-opus_val32 celt_inner_prod_sse2(const opus_val16 *x, const opus_val16 *y,
+opus_val32 celt_inner_prod_sse(const opus_val16 *x, const opus_val16 *y,
       int N)
 {
-   opus_in...

[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.

2015 Mar 12

1

...));
+   xsum1 = _mm_add_ss(xsum1, _mm_shuffle_ps(xsum1, xsum1, 0x55));
+   _mm_store_ss(xy1, xsum1);
+   xsum2 = _mm_add_ps(xsum2, _mm_movehl_ps(xsum2, xsum2));
+   xsum2 = _mm_add_ss(xsum2, _mm_shuffle_ps(xsum2, xsum2, 0x55));
+   _mm_store_ss(xy2, xsum2);
+   for (;i<N;i++)
+   {
+      *xy1 = MAC16_16(*xy1, x[i], y01[i]);
+      *xy2 = MAC16_16(*xy2, x[i], y02[i]);
+   }
 }
-#endif
-#if defined(OPUS_X86_MAY_HAVE_SSE2)
-opus_val32 celt_inner_prod_sse2(const opus_val16 *x, const opus_val16 *y,
+opus_val32 celt_inner_prod_sse(const opus_val16 *x, const opus_val16 *y,
       int N)
 {
-   opus_in...

[ANNOUNCE] PocketPC Port for speex-1.1.5 with sample code

2004 Aug 06

2

Hi Jean-Marc, Based on the wonderful Speex project, I've created SpeexOutLoud, essentially a Speex codec port for Windows Mobile 2003 devices. I've included a sample project intended to show the usage of SpeexOutLoud codec in a Pocket PC application based on .NET Compact Framework. I'd request you to please go through the attached build, and include it as a contribution to the

Speex for TI 5509 DSP

2005 Mar 02

7

I saw a thread in the list archives about a speex port to TI 55x DSP. Wondering how that worked out (is working out)? Also wondering if there is a source archive for it, or if the patch in the email archives is still current, or if there's been updates. Any info appreciated. Thanks Paul
