thr3ads.net - search: "inner

Displaying 20 results from an estimated 43 matches for "inner_prod".

Speex inner_prod()

2006 Feb 03

Hi, basically, inner_prod() can and should be adapted to the architecture it will run on. It is not really sensitive to noise, so it's possible to tweak it a lot. Also, in the current code, I saturate it to ±16384, which is enough to prevent overflows. I'm not concerned with the case of a constant -16384 value because...
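
The ±16384 limit only helps if the vectors passed to inner_prod() are brought into that range first. The excerpt does not show how that is done, so the following is a hypothetical sketch, not Speex's normalize16(): it finds the smallest right shift that brings the peak magnitude down to the bound and then scales the samples. The names shift_to_bound and scale_down are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest right shift s such that (|x[i]| >> s) <= bound
   for every sample. |x[i]| is computed in 32 bits so that -32768 is handled. */
static int shift_to_bound(const int16_t *x, int len, int32_t bound)
{
    int32_t peak = 0;
    int s = 0;

    assert(bound > 0);
    for (int i = 0; i < len; i++) {
        int32_t a = x[i] < 0 ? -(int32_t)x[i] : (int32_t)x[i];
        if (a > peak)
            peak = a;
    }
    while ((peak >> s) > bound)
        s++;
    return s;
}

/* Divide every sample by 2^s (truncating toward zero), so |out[i]| <= bound. */
static void scale_down(const int16_t *x, int16_t *out, int len, int s)
{
    for (int i = 0; i < len; i++)
        out[i] = (int16_t)(x[i] / (1 << s));
}
```

For example, a frame that contains -32768 needs s = 1, which maps that sample to -16384.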

Speex inner_prod()

2006 Feb 03

...before the shift. I can fix this by accumulating each term into a long, but if the code scales the x[], y[] vectors to avoid this problem, I could use parallel 16x16 multiply/adds. You can see this problem with the following test case:

    for (i = 0; i < 40; i++) {
        x[i] = -16384;
        y[i] = -32768;
    }
    sum0 = inner_prod(x, y, 40);
    fprintf(stderr, "inner_prod0(%8d).\n", sum0);

Jerry J. Trantow
Applied Signal Processing, Inc.
jtrantow@ieee.org
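
Jerry's test case can be reproduced without the Speex sources. The sketch below models an inner product that adds four 16x16 products into a partial sum and shifts that partial sum right before adding it to the total. The four-term grouping follows the MAC16_16 pattern quoted in the echo-cancellation message further down. The 6-bit shift and all function names are assumptions. With x = -16384 and y = -32768, each product is 2^29, so four of them reach 2^31, which is one past INT32_MAX.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the low 32 bits of v and read them as two's complement, avoiding
   implementation-defined signed conversions. */
static int32_t wrap32(int64_t v)
{
    uint32_t u = (uint32_t)v;                 /* modular reduction is well defined */
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - 2147483648u) - INT32_MAX - 1;
}

/* Arithmetic (floor) right shift that is also well defined for negative v. */
static int64_t shr_floor(int64_t v, int s)
{
    return v >= 0 ? v >> s : -((-(v + 1)) >> s) - 1;
}

/* Hypothetical model: four products per partial sum, partial shifted right
   by 6 (assumed) before accumulation. With wrap != 0 the partial sum behaves
   like a 32-bit register; with wrap == 0 it is exact (64-bit). */
static int64_t inner_prod_model(const int16_t *x, const int16_t *y, int len, int wrap)
{
    int64_t sum = 0;

    assert(len % 4 == 0);
    for (int i = 0; i < len; i += 4) {
        int64_t part = 0;
        for (int k = 0; k < 4; k++) {
            part += (int64_t)x[i + k] * y[i + k];
            if (wrap)
                part = wrap32(part);          /* emulate 32-bit overflow */
        }
        sum += shr_floor(part, 6);
    }
    return sum;
}

/* The test case from the message: 40 terms of (-16384) * (-32768). */
static int64_t overflow_test_case(int wrap)
{
    int16_t x[40], y[40];
    for (int i = 0; i < 40; i++) {
        x[i] = -16384;
        y[i] = -32768;
    }
    return inner_prod_model(x, y, 40, wrap);
}
```

Under these assumptions, the exact total is 335544320. With a 32-bit partial sum, however, the fourth product wraps it to -2^31, and the result comes out as -335544320. Either fix avoids the wrap: accumulating each term into a wider type, or keeping both inputs within ±16384, which limits four products to 2^30.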

Speex inner_prod(), normalize, C64 MIPS

2006 Feb 04

...What do you mean here? The C64x has a _dotp2() instruction that does two 16x16 multiplies and adds the products together. Since the values are scaled to 16384, I can add the results of the two _dotp2()s together before the long add without worrying about overflow. I didn't understand that inner_prod() was always passed scaled vectors. That's the danger of optimizing routines without knowing how they are called. I split a norm_shift() out of your normalize16(). This function can also be used twice in pitch_gain_search_3tap(). Are there any other places that would benefit from this optim...
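
_dotp2() is a TI C64x compiler intrinsic. The portable C below only emulates the behaviour described in the message (two signed 16x16 products added together), so the overflow argument can be checked on any host. The packing layout and all names are assumptions for illustration. With every sample within ±16384, each product is at most 2^28 and one dotp2 result is at most 2^29, so the sum of two is at most 2^30 and still fits in a 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 16-bit samples into one word: lo in bits 0-15, hi in bits 16-31. */
static uint32_t pack2(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Extract a signed 16-bit half without implementation-defined conversions. */
static int32_t half16(uint32_t w, int shift)
{
    uint32_t h = (w >> shift) & 0xFFFFu;
    return h < 0x8000u ? (int32_t)h : (int32_t)h - 65536;
}

/* Emulated dotp2: lo(a)*lo(b) + hi(a)*hi(b), computed exactly in 64 bits. */
static int32_t dotp2_emul(uint32_t a, uint32_t b)
{
    int64_t r = (int64_t)half16(a, 0) * half16(b, 0)
              + (int64_t)half16(a, 16) * half16(b, 16);
    assert(r >= INT32_MIN && r <= INT32_MAX);  /* only -32768 pairs can exceed */
    return (int32_t)r;
}

/* Inner product over packed pairs: two dotp2 results are added in 32 bits
   before the wide accumulation, as described in the message. */
static int64_t inner_prod_dotp2(const uint32_t *xp, const uint32_t *yp, int npairs)
{
    int64_t sum = 0;

    assert(npairs % 2 == 0);
    for (int i = 0; i < npairs; i += 2) {
        int64_t part = (int64_t)dotp2_emul(xp[i], yp[i])
                     + dotp2_emul(xp[i + 1], yp[i + 1]);
        assert(part >= INT32_MIN && part <= INT32_MAX);  /* holds if |x|,|y| <= 16384 */
        sum += part;
    }
    return sum;
}
```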

Speex at ARM Devices (Symbian OS)

2006 Feb 03

> That's possible. In any case, u-law conversion can be done with far less
> than 1 MHz... About Speex, you would likely need to enable ARM
> optimizations and set the complexity to 1 (default is 2).

Done with ARM optimizations and I was still getting high load... I guess it's from GStreamer somewhere. I'll check that next week.

Thanks,
- Christophe

TI 6xxx platform performance

2006 Jan 18

...encoder (CBR only, 8 kHz, 8 kbps). To get a feel for the computational load, I am running 1 second (50 frames) of voice through the encoder. My profile of the 6416 indicates I'm at 27.4M cycles/channel. I need to get below 720 MHz / 32 channels = 22.5M cycles per channel. I did a little work on inner_prod() and normalize16() and I'm confident I can get to 32 channels by optimizing 5 or 6 functions. I expect these numbers to carry over to the DM642.

    Symbol Name                  Count   cycle.Total: Incl.   cycle.Total: Excl.
    compute_weighted_codebook      200              4511420              4511420
    iir_mem2                       599              3338308              3338308
    filter_...
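
The channel-count arithmetic can be checked directly. Because the profile covers 1 second (50 frames) of audio, the cycle counts are effectively cycles per second per channel. The per-channel budget is therefore the clock rate divided by the channel count. The helper names below are hypothetical, and values are kept in kilocycles so that integer division is exact where it matters.

```c
#include <assert.h>
#include <stdint.h>

/* Per-channel budget in kilocycles per second, for a clock given in kHz
   (1 kHz of clock = 1 kilocycle per second). */
static int64_t budget_kcycles(int64_t clock_khz, int64_t channels)
{
    assert(channels > 0);
    return clock_khz / channels;
}

/* Whole number of channels that fit if each costs kcycles_per_channel per second. */
static int64_t channels_supported(int64_t clock_khz, int64_t kcycles_per_channel)
{
    assert(kcycles_per_channel > 0);
    return clock_khz / kcycles_per_channel;
}
```

At the measured 27.4M cycles per channel, 720 MHz supports 26 channels. Reaching 32 channels means cutting about 4.9M cycles per channel, roughly 18% of the current load.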

[PATCH] Make SSE Run Time option.

2004 Aug 06

...ly useless for Speex IMO and I doubt it's worth writing 3DNow non-ext code (or even 3DNow! at all). Same for SSE2: Speex simply doesn't use doubles at all. That's why I think only defining NONE, SSE and ALTIVEC (maybe 3DNow?) would be enough.

> We already have it implemented for the inner_prod function. After it is
> stable and fully tested, we will send you a patch. If you have never done
> Altivec coding it is quite simple since it is all C macros / functions.
> Not nearly as nasty as inline asm code, although the 16-byte alignment
> issues can be quite a pain. Our...

[PATCH] Make SSE Run Time option.

2004 Aug 06

...s MMX. 3DNOWEXT implies SSE. SSE2 implies SSEFP. SSEFP implies SSE. Either way, all the current Speex SSE code should be flag-checked against SSEFP.

>Do you already have that implemented? I know it's possible, but the code
>will likely be really ugly.

We already have it implemented for the inner_prod function. After it is stable and fully tested, we will send you a patch. If you have never done Altivec coding, it is quite simple since it is all C macros / functions. Not nearly as nasty as inline asm code, although the 16-byte alignment issues can be quite a pain. Our current working cod...
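
The implication rules listed in this message can be written as a small closure over a bit mask, so that code using the XMM registers is gated on SSEFP rather than on SSE alone. This is a hypothetical sketch. The flag names and values are made up for illustration, and only the rules that survive in the excerpt are encoded (the truncated rule ending in "...s MMX" is left out). Real detection would still need CPUID.

```c
#include <assert.h>

/* Hypothetical CPU feature bits (illustrative values only). */
enum {
    CPU_SSE      = 1 << 0,   /* SSE instructions */
    CPU_SSEFP    = 1 << 1,   /* SSE floating point with XMM registers */
    CPU_SSE2     = 1 << 2,
    CPU_3DNOWEXT = 1 << 3,
    CPU_ALL      = CPU_SSE | CPU_SSEFP | CPU_SSE2 | CPU_3DNOWEXT
};

/* Add every flag implied by the ones already set, following the rules quoted
   in the message: SSE2 => SSEFP, SSEFP => SSE, 3DNOWEXT => SSE. */
static unsigned close_cpu_flags(unsigned f)
{
    assert((f & ~(unsigned)CPU_ALL) == 0);   /* only known bits */
    if (f & CPU_SSE2)
        f |= CPU_SSEFP;
    if (f & CPU_SSEFP)       /* checked after SSE2, so SSE2 also yields SSE */
        f |= CPU_SSE;
    if (f & CPU_3DNOWEXT)
        f |= CPU_SSE;
    return f;
}

/* SSE floating-point code paths are gated on SSEFP, not on SSE alone. */
static int can_use_sse_float(unsigned f)
{
    return (close_cpu_flags(f) & CPU_SSEFP) != 0;
}
```

The last test case below covers the situation that another message in these results attributes to 32-bit AMD chips: SSE instructions are available, but the XMM registers are not.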

[PATCH] Make SSE Run Time option. Add Win32 SSE code

2004 Aug 06

...>Actually, SSE also requires 16-byte alignment for most instructions
>(except movups, which is slow anyway). That's why I have those kludges
>with the pointer masks in the current code. I think we should find a
>general solution for the problem. Also, there's one place (inner_prod,
>called by the open-loop pitch estimator) where non-16-byte-aligned loads
>are really required. It's probably possible to work around that, but it
>might require 4 copies of the data (with 4-byte offsets).

Agreed, although the inner_prod isn't that big a deal since you can do cle...
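
The "4 copies of the data (with 4-byte offsets)" idea can be sketched in portable C. Copy k holds the signal starting at element k. For any start index i, copy i % 4 therefore contains x + i at an index that is a multiple of four floats, which is a 16-byte-aligned address. The names and the fixed capacity are hypothetical, and no SSE intrinsics are used; the sketch only shows the address arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define MAXN 256  /* hypothetical capacity; a multiple of 4 keeps every row aligned */

/* copy[k][j] == x[j + k]; each row starts on a 16-byte boundary. */
typedef struct {
    _Alignas(16) float copy[4][MAXN];
} shifted_copies;

static void make_copies(shifted_copies *c, const float *x, int n)
{
    assert(n > 0 && n <= MAXN);
    for (int k = 0; k < 4; k++)
        for (int j = 0; j < MAXN; j++)
            c->copy[k][j] = (j + k < n) ? x[j + k] : 0.0f;  /* zero-pad the tail */
}

/* Pointer to the same values as x + i, but 16-byte aligned. */
static const float *aligned_view(const shifted_copies *c, int i)
{
    assert(i >= 0 && i < MAXN);
    const float *p = &c->copy[i % 4][i - i % 4];
    assert(((uintptr_t)p & 15u) == 0);
    return p;
}
```

The trade-off is four times the memory plus the copying, so this approach only pays off if the aligned loads save more time than the copies cost.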

Errors in speex lib with Blackfin

2006 Jan 18

Hello! I've downloaded speex lib 1.1.11.1. I am trying to port speex lib to the Blackfin processor, using VisualDSP++ 4.0. If I compile the source code using floating point, everything is OK. When I compile with FIXED_POINT defined, everything is still OK and the code runs about two times faster. But when I define BFIN_ASM, I get several compilation errors in Blackfin assembler

[PATCH] build warnings in mdf.c

2005 Nov 08

...====
    --- libspeex/mdf.c (revision 10357)
    +++ libspeex/mdf.c (working copy)
    @@ -314,11 +314,10 @@
           /*printf ("%f\n", leak_estimate);*/
        }
    -   float adapt_rate;
    +   float adapt_rate = 0;
        if (!st->adapted)
        {
           float Sxx;
    -      adapt_rate;
           Sxx = inner_prod(st->x+st->frame_size, st->x+st->frame_size, st->frame_size);
           /* We consider that the filter is adapted if the following is true*/

[PATCH] Make SSE Run Time option.

2004 Aug 06

...IMO and I doubt it's
>worth writing 3DNow non-ext code (or even 3DNow! at all). Same for SSE2:
>Speex simply doesn't use doubles at all. That's why I think only
>defining NONE, SSE and ALTIVEC (maybe 3DNow?) would be enough.
>
> > We already have it implemented for the inner_prod function. After it is
> > stable and fully tested, we will send you a patch. If you have never done
> > Altivec coding it is quite simple since it is all C macros / functions.
> > Not nearly as nasty as inline asm code, although the 16-byte alignment
> > issues can be q...

Re: speex echo cancellation limitations

2006 May 01

> I am writing to gain a better understanding of the limitations of speex echo
> cancellation, esp. with respect to the fixed point implementation.
> If these limitations have been documented elsewhere already, please let me
> know!

Nothing officially documented, sorry.

> I observe experimentally that when one or both of the echo or ref data for
> speex_echo_cancel() have

Resampler saturation, blackfin performance

2009 Jun 14

> ...t resample.patch converts the "unrolled
> by four" loop into a plain one that's easier on DSPs, right?

Yes, exactly, plus a little explanation in comments. I really have no idea of the performance difference on x86, but I think gcc/MSVC can unroll. Up to you. Anyway, I can override it via OVERRIDE_INNER_PRODUCT_SINGLE.

Talking about performance (still using the generic version with the VDSP compiler):
1. I got a pretty good boost by using a scratch buffer in SRAM.
2. Wideband encode+decode takes 79.1 + 7.2 MIPS on my BF536 (400/133 MHz).
3. Profiler says: vq_nbest 33.05% vq_nbest_sign...
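
For reference, the difference between the two loop shapes looks roughly like the following. This is a generic 16-bit dot product, not the actual resampler code, and the names are hypothetical. For integer data, both versions return the same value as long as the sum fits in 32 bits. The plain version leaves unrolling to the compiler, which the message suggests is easier on DSPs.

```c
#include <assert.h>
#include <stdint.h>

/* Plain loop: one multiply-accumulate per iteration; unrolling is left to the compiler. */
static int32_t dot_plain(const int16_t *a, const int16_t *b, int len)
{
    int32_t sum = 0;
    assert(len >= 0);
    for (int j = 0; j < len; j++)
        sum += (int32_t)a[j] * b[j];   /* caller keeps the total within 32 bits */
    return sum;
}

/* Hand-unrolled by four with separate partial sums, plus a tail loop for the last 0-3 terms. */
static int32_t dot_unrolled4(const int16_t *a, const int16_t *b, int len)
{
    int32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int j = 0;

    assert(len >= 0);
    for (; j + 3 < len; j += 4) {
        s0 += (int32_t)a[j]     * b[j];
        s1 += (int32_t)a[j + 1] * b[j + 1];
        s2 += (int32_t)a[j + 2] * b[j + 2];
        s3 += (int32_t)a[j + 3] * b[j + 3];
    }
    for (; j < len; j++)
        s0 += (int32_t)a[j] * b[j];
    return s0 + s1 + s2 + s3;
}
```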

Re: speex echo cancellation limitations

2006 May 02

...Pyy),14));

where st->Pey and st->Pyy are both zero, which happens for the following input data to the testecho program:
-- first arg is a file containing a sine wave of magnitude ±32767
-- 2nd arg is a file containing all zeroes

The division by zero appears to be caused by the calculation:

    See = inner_prod(st->e+st->frame_size, st->e+st->frame_size, st->frame_size)

which returns a negative value due to overflow occurring in mdf.c:inner_prod():

    part = MAC16_16(part,*x++,*y++);
    part = MAC16_16(part,*x++,*y++);
    part = MAC16_16(part,*x++,*y++);
    part = MAC16_16(part,*x++,*y+...
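
The overflow is easy to quantify. In the worst case, a full-scale sample of 32767 squares to 1073676289. A signed 32-bit accumulator can therefore hold only two such products before exceeding INT32_MAX. Summing four of them, as the MAC16_16 sequence above does, wraps the partial sum to a small negative number. The helpers below (hypothetical names) compute both facts with well-defined unsigned arithmetic instead of relying on signed overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Largest number of amp*amp products that a signed 32-bit accumulator can
   hold without exceeding INT32_MAX. */
static int64_t max_terms_without_overflow(int32_t amp)
{
    int64_t sq = (int64_t)amp * amp;
    assert(sq > 0);
    return INT32_MAX / sq;
}

/* Value a 32-bit two's-complement accumulator would hold after adding
   `terms` copies of amp*amp (wraparound done in unsigned arithmetic). */
static int64_t wrapped_energy(int32_t amp, int terms)
{
    uint32_t acc = 0;
    assert(terms >= 0);
    for (int i = 0; i < terms; i++)
        acc += (uint32_t)((int64_t)amp * amp);
    return acc <= (uint32_t)INT32_MAX ? (int64_t)acc : (int64_t)acc - 4294967296LL;
}
```

Four full-scale products come out as -262140, consistent with the negative See reported in the message. With inputs scaled to ±16384, up to seven products fit in a 32-bit partial sum.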

TI 6xxx platform performance

2006 Jan 19

...voice through the encoder.

You might want to use a bit more, just so you don't see the initialization complexity at all.

> My profile of the 6416 indicates I'm at 27.4M cycles/channel. I need to get
> below 720 MHz / 32 channels = 22.5M cycles per channel. I did a little work on
> inner_prod() and normalize16() and I'm confident I can get 32 channels by
> optimizing 5 or 6 functions. I expect these numbers to translate over the
> DM642.

Have you tried defining PRECISION16? That should reduce the computation cost.

> A lower cost option would be to use a floating point 6...

[PATCH] Make SSE Run Time option. Add Win32 SSE code

2004 Aug 06

Jean-Marc,

There is a big difference between SSE and SSEFP. SSEFP means that the CPU supports the XMM registers. All Intel chips with SSE support do; however, no current 32-bit AMD chips support the XMM registers. They support the SSE instructions but not those registers. You are right about SSE2 not being used. The AMD Opterons are the first AMD CPUs which support

TI 6xxx platform performance

2006 Jan 18

Info on Symbian, ARM and OFFSET_IMM8 relocation error

2007 Apr 02

Hi all, I'm using Speex under Symbian. When I compiled the lib for the ARM platform, I got the following error: "Error: Can not represent OFFSET_IMM8 relocation in this object file format (1)". I have defined FIXED_POINT 1 and ARM4_ASM. The error is in the function forced_pitch_quant in ltp.c. The line that produces the error is:

Fixed Point on wideband-mode: Single Frame loss on 2000 Hz sine causes "freak off"

2009 Dec 18

...tp.c: line 68
ADD32: output is not int: 1455757562 in ltp.c: line 69

Call stack for this (I only set the breakpoint once; there may be other call stacks on error, too):

    sb_decode ( ) at sb_celp.c:898
    nb_decode ( ) at nb_celp.c:1471
    multicomb ( ) at filter.c:709
    interp_pitch ( ) at filter.c:603
    inner_prod ( ) at ltp.c:68

Can you help, please?

Best regards,
Frank

Fwd: Fixed Point on wideband-mode: Single Frame loss on 2000 Hz sine causes "freak off"

2009 Dec 21

...tp.c: line 68
ADD32: output is not int: 1455757562 in ltp.c: line 69

Call stack for this (I only set the breakpoint once; there may be other call stacks on error, too):

    sb_decode ( ) at sb_celp.c:898
    nb_decode ( ) at nb_celp.c:1471
    multicomb ( ) at filter.c:709
    interp_pitch ( ) at filter.c:603
    inner_prod ( ) at ltp.c:68

Can you help, please?

Best regards,
Frank
