thr3ads.net - Speex dev - [Speex-dev] Speex echo canceller on TI C55 DSP [May 2006]

If this information is useful, please help other people find it:
Share via:

Jean-Marc Valin

2006-May-02 17:14 UTC

[Speex-dev] Re: speex echo cancellation limitations

Hi Ted,

Thanks a lot for this analysis.
> In FLOAT_DIVU() it hangs at the following:
>    while (a.m >= b.m)
>    {
>       e++;
>       a.m >>= 1;
>    }
> for the case where a and b are both zero (yes, division by zero).
> This happens from mdf.c:
True, that needs to be fixed even after I fix the rest.
> leak_estimate = FLOAT_EXTRACT16(FLOAT_SHL(FLOAT_DIVU(st->Pey,
st->Pyy),14));
> where st->Pey and st->Pyy are both zero, which happens for the
following input
> data to testecho program:
> -- first arg is file containing sine wave of magnitude +/- 32767
> -- 2nd arg is file containing all zeroes
> 
> The division by zero appears to be caused by the calculation:
> See = inner_prod(st->e+st->frame_size, st->e+st->frame_size,
st->frame_size)
Does that also happen with "real life" signals or just high-amplitude
sinusoids (probably worth fixing anyway).
> which returns negative due to overflow occuring in mdf.c:inner_prod() :
>       part = MAC16_16(part,*x++,*y++);
>       part = MAC16_16(part,*x++,*y++);
>       part = MAC16_16(part,*x++,*y++);
>       part = MAC16_16(part,*x++,*y++);
>      sum = ADD32(sum,SHR32(part,6));
> This overflow can be avoided by rewriting this as:
>       part = part + ((*x++ * *y++)>>1);
>       part = part + ((*x++ * *y++)>>1);
>       part = part + ((*x++ * *y++)>>1);
>       part = part + ((*x++ * *y++)>>1);
>       sum += part>>5;
> I am assuming that the value 0x8000 is avoided in the input.
I think we can safely assume that -- or actually enforce that because it
would likely break other stuff.
> Even with this fix there is definitely some bad stuff going on; the output 
> data is corrupted looking.
> I put assertions into FLOAT_MUL32U(), FLOAT_DIV32_FLOAT() and FLOAT_DIV32()
to
> assert that the "a" arguments were non-negative. 
Technically these functions *should* work (they don't at the moment) for
negative inputs, but mdf.c isn't supposed to use that in the first
place. I guess it comes down to the same problem as above...
> Using some real life data, i 
> found that i had to shift the real life data right by two (i shifted both 
> inputs by same amount) to avoid asserting; shifting by just one almost
worked
> but failed for some case (i don't remember what that case was... not
the one
> i mentioned above).
I'd be interested in that sample if you can find it.
> [Alternately i could just clip the input to some threshold].
Except for the -32768 value, I don't want to do that because clipping is
highly non-linear and thus something the AEC doesn't like at all.
> To be on the safe side with the above functions, i removed the asserts and 
> modified the tests for == 0 to <= 0 ...
> 
> Clearly, some sort of automated testing scheme could help here...
One thing I used to do (should do it more often) is test with
FIXED_DEBUG (--enable-fixed-point-debug) which puts assertions in all
fixed-point operators (except the pseudofloat stuff, which is rather
new). Helps a lot tracking problems down.

	Jean-Marc

Jim Crichton

2006-May-08 09:12 UTC

head link

[Speex-dev] Speex echo canceller on TI C55 DSP

Jean-Marc,

I recently started looking at running the echo canceller on a TI C55 DSP 
along with the 8kbps narrowband Speex encoder/decoder.  This is one of those 
"braindead compilers" that you refer to from time to time, and cannot
handle
the float struct assignments in the return statements in pseudofloat.h.

Most of these were eliminated in build 11311 (patch by Brian Retford), but 
there were four left that I had to break apart.  I started with build 11343.

I got several compiler warnings for "shift out of range" in mdf.c,
which I
fixed by adding EXTEND32 to all of the SHL32s with 16 bit operands 
(st->frame_size in 6 places, st->wtmp2 in 1 place).  I have not sent
patches
for these two changes, because I still have other problems.

If fftwrap.c, I ifdefed out the spx_fft_float and spx_ifft_float routines, 
because there were not used and required smallft.c (which is not so small at 
all) to be added to the build.

With these changes, the link was successful, using testecho.c with some 
modifications for the C55 environment.  The code and data memory 
requirements were a lot more than I had hoped (>20kbytes of dynamic data 
memory for block size=128, tail length = 1024), and I will probably not be 
able to fit it in the production build without some trimming.

When I run the build, it goes into an infinite loop in FLOAT_DIV32 (mdf.c 
line 660), which occurs because adapt_rate is < 0, which happens when 
FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}.  The rounding is causing 
the result to go negative.  I worked around this by changing

      return (a.m+(1<<(-a.e-1)))>>-a.e;
to
      return (((spx_uint16_t) a.m)+(1<<(-a.e-1)))>>-a.e;

in FLOAT_EXTRACT16.  This changes the returned value from 0xfc00 to 0x400. 
Now it runs on for a while, then hits another infinite loop at mdf.c line 
641:

         st->power_1[i] = 
FLOAT_SHL(FLOAT_DIV32_FLOAT(MULT16_32_Q15(M_1,r),FLOAT_MUL32U(e,st->power[i]+10)),WEIGHT_SHIFT+16);

I have not had time to trace this, but it looks like a similar problem, 
where the result of MULT16_32_Q15(M_1,r) is negative, and FLOAT_DIV32_FLOAT 
bombs.  Maybe the best thing to do next is to instrument the routines in 
pseudofloat.h which have loops, but I will not get to that for a day or two.

I need to make a decision soon whether to seriously pursue making this work. 
With that in mind, here are some questions:

1.  speex_echo_state_init takes about 20M instructions, which is a little 
frightening, and the calls to speex_echo_cancel take about 630K instructions 
for 128 samples.  Given your recent experience with the fixed point 
canceller, does this sound rational?  The MIPs for the canceller are similar 
to the MIPs for the encoder running 8kbps, complexity 1.

2.  The testecho example uses a frame length and tail size that are powers 
of two (128, 1024).  Are there any implications to using sizes which are not 
powers of two?  It would be most convenient to use the encoder frame size 
(160), and some multiple of that for the tail size.  How does the frame size 
affect performance (I understand that the tail length determines what echo 
signals are cancelable)?

3.  Do you have any suggestions for code/data memory reduction for the 
canceller, other than to make the tail length no longer than necessary (this 
is a line echo canceller for a local phone, so I should be able to keep it 
to 40ms).  I was surprised by the size of the FFT code, but I guess that it 
is doing much more than the radix2 version in the TI library.

Regards,

Jim Crichton

Jim Crichton

2006-May-08 15:18 UTC

head link

[Speex-dev] Speex echo canceller on TI C55 DSP

Jean-Marc,

I have traced the second infinite loop further.  When st->adapted becomes 
true (mdf.c line 623), the first Yf[i] value is 4, the leak_estimate is 
0xd4e, the resulting r is 3.  The first value in st->Rf is 0, so e is 1, and 
r is set to e>>1, or 0.  A little later there is a divide by r, and there
is
the hang.

It seems that the 0 in Rf[0] is the problem, but I am not sure what else to 
look at there.  The first few 32-bit values in Rf are:

0 0 1 1 0 2 D D 5 0 8 11 19 3D 51 74 3A 8 11 8 4

Do you have any suggestions?

- Jim

----- Original Message ----- 
From: "Jim Crichton" <jim.crichton@comcast.net>
To: <speex-dev@xiph.org>
Sent: Monday, May 08, 2006 12:11 PM
Subject: [Speex-dev] Speex echo canceller on TI C55 DSP

> Jean-Marc,
>
> I recently started looking at running the echo canceller on a TI C55 DSP 
> along with the 8kbps narrowband Speex encoder/decoder.  This is one of 
> those "braindead compilers" that you refer to from time to time,
and
> cannot handle the float struct assignments in the return statements in 
> pseudofloat.h.
>
> Most of these were eliminated in build 11311 (patch by Brian Retford), but 
> there were four left that I had to break apart.  I started with build 
> 11343.
>
> I got several compiler warnings for "shift out of range" in
mdf.c, which I
> fixed by adding EXTEND32 to all of the SHL32s with 16 bit operands 
> (st->frame_size in 6 places, st->wtmp2 in 1 place).  I have not sent 
> patches for these two changes, because I still have other problems.
>
> If fftwrap.c, I ifdefed out the spx_fft_float and spx_ifft_float routines, 
> because there were not used and required smallft.c (which is not so small 
> at all) to be added to the build.
>
> With these changes, the link was successful, using testecho.c with some 
> modifications for the C55 environment.  The code and data memory 
> requirements were a lot more than I had hoped (>20kbytes of dynamic data
> memory for block size=128, tail length = 1024), and I will probably not be 
> able to fit it in the production build without some trimming.
>
> When I run the build, it goes into an infinite loop in FLOAT_DIV32 (mdf.c 
> line 660), which occurs because adapt_rate is < 0, which happens when 
> FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}.  The rounding is causing 
> the result to go negative.  I worked around this by changing
>
>      return (a.m+(1<<(-a.e-1)))>>-a.e;
> to
>      return (((spx_uint16_t) a.m)+(1<<(-a.e-1)))>>-a.e;
>
> in FLOAT_EXTRACT16.  This changes the returned value from 0xfc00 to 0x400. 
> Now it runs on for a while, then hits another infinite loop at mdf.c line 
> 641:
>
>         st->power_1[i] = 
>
FLOAT_SHL(FLOAT_DIV32_FLOAT(MULT16_32_Q15(M_1,r),FLOAT_MUL32U(e,st->power[i]+10)),WEIGHT_SHIFT+16);
>
> I have not had time to trace this, but it looks like a similar problem, 
> where the result of MULT16_32_Q15(M_1,r) is negative, and 
> FLOAT_DIV32_FLOAT bombs.  Maybe the best thing to do next is to instrument 
> the routines in pseudofloat.h which have loops, but I will not get to that 
> for a day or two.
>
> I need to make a decision soon whether to seriously pursue making this 
> work. With that in mind, here are some questions:
>
> 1.  speex_echo_state_init takes about 20M instructions, which is a little 
> frightening, and the calls to speex_echo_cancel take about 630K 
> instructions for 128 samples.  Given your recent experience with the fixed 
> point canceller, does this sound rational?  The MIPs for the canceller are 
> similar to the MIPs for the encoder running 8kbps, complexity 1.
>
> 2.  The testecho example uses a frame length and tail size that are powers 
> of two (128, 1024).  Are there any implications to using sizes which are 
> not powers of two?  It would be most convenient to use the encoder frame 
> size (160), and some multiple of that for the tail size.  How does the 
> frame size affect performance (I understand that the tail length 
> determines what echo signals are cancelable)?
>
> 3.  Do you have any suggestions for code/data memory reduction for the 
> canceller, other than to make the tail length no longer than necessary 
> (this is a line echo canceller for a local phone, so I should be able to 
> keep it to 40ms).  I was surprised by the size of the FFT code, but I 
> guess that it is doing much more than the radix2 version in the TI 
> library.
>
> Regards,
>
> Jim Crichton
>
> _______________________________________________
> Speex-dev mailing list
> Speex-dev@xiph.org
> http://lists.xiph.org/mailman/listinfo/speex-dev
>

Jean-Marc Valin

2006-May-08 15:49 UTC

head link

[Speex-dev] Speex echo canceller on TI C55 DSP

Hi Jim,

I've just been made aware of these problems (look for the thread "speex
echo cancellation limitations"). It's on my short-term TODO list.
> If fftwrap.c, I ifdefed out the spx_fft_float and spx_ifft_float routines, 
> because there were not used and required smallft.c (which is not so small
at
> all) to be added to the build.
Right, need to cleanup that part...
> With these changes, the link was successful, using testecho.c with some 
> modifications for the C55 environment.  The code and data memory 
> requirements were a lot more than I had hoped (>20kbytes of dynamic data
> memory for block size=128, tail length = 1024), and I will probably not be 
> able to fit it in the production build without some trimming.
Yes, there may be a bit of memory reduction possible here. Of course,
decreasing the tail length is also a rather easy way.
> When I run the build, it goes into an infinite loop in FLOAT_DIV32 (mdf.c 
> line 660), which occurs because adapt_rate is < 0, which happens when 
> FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}.  The rounding is causing 
> the result to go negative.  I worked around this by changing
I think that was mentioned in the previous thread...
>       return (a.m+(1<<(-a.e-1)))>>-a.e;
> to
>       return (((spx_uint16_t) a.m)+(1<<(-a.e-1)))>>-a.e;
Is that sufficient to remove all the overflows at this place?
> I have not had time to trace this, but it looks like a similar problem, 
> where the result of MULT16_32_Q15(M_1,r) is negative, and FLOAT_DIV32_FLOAT
> bombs.  Maybe the best thing to do next is to instrument the routines in 
> pseudofloat.h which have loops, but I will not get to that for a day or
two.
Yeah, r is never supposed to be negative and the float routines assume
that.
> 1.  speex_echo_state_init takes about 20M instructions, which is a little 
> frightening, 
That's the fft initialization that calls a lot of float cos() functions.
If you have a fixed version of cos() you can use it there, otherwise a
fixed table (for a certain size) would work.
> and the calls to speex_echo_cancel take about 630K instructions 
> for 128 samples.  Given your recent experience with the fixed point 
> canceller, does this sound rational?  The MIPs for the canceller are
similar
> to the MIPs for the encoder running 8kbps, complexity 1.
The order of magnitude seems right. It may be possible to reduce that a
bit, though. If you have an optimized FFT, you could replace kiss_fft
with it and get a big improvement right there.
> 2.  The testecho example uses a frame length and tail size that are powers 
> of two (128, 1024).  Are there any implications to using sizes which are
not
> powers of two?  It would be most convenient to use the encoder frame size 
> (160), and some multiple of that for the tail size.  How does the frame
size
> affect performance (I understand that the tail length determines what echo 
> signals are cancelable)?
Non powers of two will be a bit slower because of the FFT, but that's
all. I made sure the echo canceller works with 160, precisely because
it's the frame size used by Speex. Note that I don't recommend using
frames more than 20 ms long (at any sampling rate).
> 3.  Do you have any suggestions for code/data memory reduction for the 
> canceller, other than to make the tail length no longer than necessary
(this
> is a line echo canceller for a local phone, so I should be able to keep it 
> to 40ms).  I was surprised by the size of the FFT code, but I guess that it
> is doing much more than the radix2 version in the TI library.
The FFT code has more than just the radix two, so you can save there. It
wasn't meant to be an optimized FFT, so if TI supplies you with one,
it's probably a good idea to use it (that's what fft_wrap is for). Also,
given that the memory use is almost directly proportional to the tail
length, reducing that one to 40 ms will make a huge difference.

	Jean-Marc

Apparently Analagous Threads

Search for more apparently analagous threads

Speex dev - May 2006 - Speex echo canceller on TI C55 DSP

[Speex-dev] Re: speex echo cancellation limitations

[Speex-dev] Speex echo canceller on TI C55 DSP

[Speex-dev] Speex echo canceller on TI C55 DSP

[Speex-dev] Speex echo canceller on TI C55 DSP

Apparently Analagous Threads