thr3ads.net - Speex dev - [Speex-dev] Speex echo canceller on TI C55 DSP [May 2006]

If this information is useful, please help other people find it:
Share via:

Jean-Marc Valin

2006-May-08 15:49 UTC

[Speex-dev] Speex echo canceller on TI C55 DSP

Hi Jim,

I've just been made aware of these problems (look for the thread "speex
echo cancellation limitations"). It's on my short-term TODO list.
> If fftwrap.c, I ifdefed out the spx_fft_float and spx_ifft_float routines, 
> because there were not used and required smallft.c (which is not so small
at
> all) to be added to the build.
Right, need to cleanup that part...
> With these changes, the link was successful, using testecho.c with some 
> modifications for the C55 environment.  The code and data memory 
> requirements were a lot more than I had hoped (>20kbytes of dynamic data
> memory for block size=128, tail length = 1024), and I will probably not be 
> able to fit it in the production build without some trimming.
Yes, there may be a bit of memory reduction possible here. Of course,
decreasing the tail length is also a rather easy way.
> When I run the build, it goes into an infinite loop in FLOAT_DIV32 (mdf.c 
> line 660), which occurs because adapt_rate is < 0, which happens when 
> FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}.  The rounding is causing 
> the result to go negative.  I worked around this by changing
I think that was mentioned in the previous thread...
>       return (a.m+(1<<(-a.e-1)))>>-a.e;
> to
>       return (((spx_uint16_t) a.m)+(1<<(-a.e-1)))>>-a.e;
Is that sufficient to remove all the overflows at this place?
> I have not had time to trace this, but it looks like a similar problem, 
> where the result of MULT16_32_Q15(M_1,r) is negative, and FLOAT_DIV32_FLOAT
> bombs.  Maybe the best thing to do next is to instrument the routines in 
> pseudofloat.h which have loops, but I will not get to that for a day or
two.
Yeah, r is never supposed to be negative and the float routines assume
that.
> 1.  speex_echo_state_init takes about 20M instructions, which is a little 
> frightening, 
That's the fft initialization that calls a lot of float cos() functions.
If you have a fixed version of cos() you can use it there, otherwise a
fixed table (for a certain size) would work.
> and the calls to speex_echo_cancel take about 630K instructions 
> for 128 samples.  Given your recent experience with the fixed point 
> canceller, does this sound rational?  The MIPs for the canceller are
similar
> to the MIPs for the encoder running 8kbps, complexity 1.
The order of magnitude seems right. It may be possible to reduce that a
bit, though. If you have an optimized FFT, you could replace kiss_fft
with it and get a big improvement right there.
> 2.  The testecho example uses a frame length and tail size that are powers 
> of two (128, 1024).  Are there any implications to using sizes which are
not
> powers of two?  It would be most convenient to use the encoder frame size 
> (160), and some multiple of that for the tail size.  How does the frame
size
> affect performance (I understand that the tail length determines what echo 
> signals are cancelable)?
Non powers of two will be a bit slower because of the FFT, but that's
all. I made sure the echo canceller works with 160, precisely because
it's the frame size used by Speex. Note that I don't recommend using
frames more than 20 ms long (at any sampling rate).
> 3.  Do you have any suggestions for code/data memory reduction for the 
> canceller, other than to make the tail length no longer than necessary
(this
> is a line echo canceller for a local phone, so I should be able to keep it 
> to 40ms).  I was surprised by the size of the FFT code, but I guess that it
> is doing much more than the radix2 version in the TI library.
The FFT code has more than just the radix two, so you can save there. It
wasn't meant to be an optimized FFT, so if TI supplies you with one,
it's probably a good idea to use it (that's what fft_wrap is for). Also,
given that the memory use is almost directly proportional to the tail
length, reducing that one to 40 ms will make a huge difference.

	Jean-Marc

Jim Crichton

2006-May-08 17:06 UTC

head link

[Speex-dev] Speex echo canceller on TI C55 DSP

> I've just been made aware of these problems (look for the thread
"speex
> echo cancellation limitations"). It's on my short-term TODO list.
I saw the other thread, my problems happened in different (but similar) 
routines.
>> If fftwrap.c, I ifdefed out the spx_fft_float and spx_ifft_float 
>> routines,
>> because there were not used and required smallft.c (which is not so
small
>> at
>> all) to be added to the build.
>
> Right, need to cleanup that part...
>
>> With these changes, the link was successful, using testecho.c with some
>> modifications for the C55 environment.  The code and data memory
>> requirements were a lot more than I had hoped (>20kbytes of dynamic
data
>> memory for block size=128, tail length = 1024), and I will probably not
>> be
>> able to fit it in the production build without some trimming.
>
> Yes, there may be a bit of memory reduction possible here. Of course,
> decreasing the tail length is also a rather easy way.
>
>> When I run the build, it goes into an infinite loop in FLOAT_DIV32
(mdf.c
>> line 660), which occurs because adapt_rate is < 0, which happens
when
>> FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}.  The rounding is
causing
>> the result to go negative.  I worked around this by changing
>
> I think that was mentioned in the previous thread...
>
>>       return (a.m+(1<<(-a.e-1)))>>-a.e;
>> to
>>       return (((spx_uint16_t) a.m)+(1<<(-a.e-1)))>>-a.e;
>
> Is that sufficient to remove all the overflows at this place?
The rounding takes the value to exactly 0x8000, and it is followed by a 
right shift, so you just need to avoid the sign extension.
>> I have not had time to trace this, but it looks like a similar problem,
>> where the result of MULT16_32_Q15(M_1,r) is negative, and 
>> FLOAT_DIV32_FLOAT
>> bombs.  Maybe the best thing to do next is to instrument the routines
in
>> pseudofloat.h which have loops, but I will not get to that for a day or
>> two.
>
> Yeah, r is never supposed to be negative and the float routines assume
> that.
No, it was a divide by zero, as explained in my second post.  I will try a 
build on the C6x DSP to see if this is a 16 vs. 32-bit problem.  I sent the 
test files off-list.
>> 1.  speex_echo_state_init takes about 20M instructions, which is a
little
>> frightening,
>
> That's the fft initialization that calls a lot of float cos()
functions.
> If you have a fixed version of cos() you can use it there, otherwise a
> fixed table (for a certain size) would work.
>
>> and the calls to speex_echo_cancel take about 630K instructions
>> for 128 samples.  Given your recent experience with the fixed point
>> canceller, does this sound rational?  The MIPs for the canceller are 
>> similar
>> to the MIPs for the encoder running 8kbps, complexity 1.
>
> The order of magnitude seems right. It may be possible to reduce that a
> bit, though. If you have an optimized FFT, you could replace kiss_fft
> with it and get a big improvement right there.
Yeah, but then I have to try to actually understand the algorithm.  I am not 
sure that those brain cells are still alive.
>> 2.  The testecho example uses a frame length and tail size that are 
>> powers
>> of two (128, 1024).  Are there any implications to using sizes which
are
>> not
>> powers of two?  It would be most convenient to use the encoder frame
size
>> (160), and some multiple of that for the tail size.  How does the frame
>> size
>> affect performance (I understand that the tail length determines what 
>> echo
>> signals are cancelable)?
>
> Non powers of two will be a bit slower because of the FFT, but that's
> all. I made sure the echo canceller works with 160, precisely because
> it's the frame size used by Speex. Note that I don't recommend
using
> frames more than 20 ms long (at any sampling rate).
>
>> 3.  Do you have any suggestions for code/data memory reduction for the
>> canceller, other than to make the tail length no longer than necessary 
>> (this
>> is a line echo canceller for a local phone, so I should be able to keep
>> it
>> to 40ms).  I was surprised by the size of the FFT code, but I guess
that
>> it
>> is doing much more than the radix2 version in the TI library.
>
> The FFT code has more than just the radix two, so you can save there. It
> wasn't meant to be an optimized FFT, so if TI supplies you with one,
> it's probably a good idea to use it (that's what fft_wrap is for).
Also,
> given that the memory use is almost directly proportional to the tail
> length, reducing that one to 40 ms will make a huge difference.
Thanks for the advice.

- Jim

Jean-Marc Valin

2006-May-09 01:08 UTC

head link

[Speex-dev] Speex echo canceller on TI C55 DSP

> >>       return (a.m+(1<<(-a.e-1)))>>-a.e;
> >> to
> >>       return (((spx_uint16_t)
a.m)+(1<<(-a.e-1)))>>-a.e;
> >
> > Is that sufficient to remove all the overflows at this place?
> 
> The rounding takes the value to exactly 0x8000, and it is followed by a 
> right shift, so you just need to avoid the sign extension.
OK, I'll have a closer look to understand what happens.
> No, it was a divide by zero, as explained in my second post.  I will try a 
> build on the C6x DSP to see if this is a 16 vs. 32-bit problem.  I sent the
> test files off-list.
Will have a closer look at "r". Let me know the results on a C64.
> > The order of magnitude seems right. It may be possible to reduce that
a
> > bit, though. If you have an optimized FFT, you could replace kiss_fft
> > with it and get a big improvement right there.
> 
> Yeah, but then I have to try to actually understand the algorithm.  I am
not
> sure that those brain cells are still alive.
No need to understand the algorithm. All you need to do is adapt 
fft_wrap.c to wrap your fft (making sure the scaling is the same) and 
the echo canceller will use that FFT.

	Jean-Marc

Jean-Marc Valin

2006-May-09 03:59 UTC

head link

[Speex-dev] Speex echo canceller on TI C55 DSP

Just tried your files and I'm not running into any infinite loops and
the cancellation works fine. Unless the C6x has the same problem, I
suspect a 16-bit problem. I'll check and see if I find something. About
the r=0 problem, I can't find where it ends up in a denominator, so I
suspect is not (directly) the problem.

	Jean-Marc

Le lundi 08 mai 2006 ? 20:05 -0400, Jim Crichton a ?crit
:> > I've just been made aware of these problems (look for the thread
"speex
> > echo cancellation limitations"). It's on my short-term TODO
list.
> 
> I saw the other thread, my problems happened in different (but similar) 
> routines.
> 
> >> If fftwrap.c, I ifdefed out the spx_fft_float and spx_ifft_float 
> >> routines,
> >> because there were not used and required smallft.c (which is not
so small
> >> at
> >> all) to be added to the build.
> >
> > Right, need to cleanup that part...
> >
> >> With these changes, the link was successful, using testecho.c with
some
> >> modifications for the C55 environment.  The code and data memory
> >> requirements were a lot more than I had hoped (>20kbytes of
dynamic data
> >> memory for block size=128, tail length = 1024), and I will
probably not
> >> be
> >> able to fit it in the production build without some trimming.
> >
> > Yes, there may be a bit of memory reduction possible here. Of course,
> > decreasing the tail length is also a rather easy way.
> >
> >> When I run the build, it goes into an infinite loop in FLOAT_DIV32
(mdf.c
> >> line 660), which occurs because adapt_rate is < 0, which
happens when
> >> FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}.  The rounding is
causing
> >> the result to go negative.  I worked around this by changing
> >
> > I think that was mentioned in the previous thread...
> >
> >>       return (a.m+(1<<(-a.e-1)))>>-a.e;
> >> to
> >>       return (((spx_uint16_t)
a.m)+(1<<(-a.e-1)))>>-a.e;
> >
> > Is that sufficient to remove all the overflows at this place?
> 
> The rounding takes the value to exactly 0x8000, and it is followed by a 
> right shift, so you just need to avoid the sign extension.
> 
> >> I have not had time to trace this, but it looks like a similar
problem,
> >> where the result of MULT16_32_Q15(M_1,r) is negative, and 
> >> FLOAT_DIV32_FLOAT
> >> bombs.  Maybe the best thing to do next is to instrument the
routines in
> >> pseudofloat.h which have loops, but I will not get to that for a
day or
> >> two.
> >
> > Yeah, r is never supposed to be negative and the float routines assume
> > that.
> 
> No, it was a divide by zero, as explained in my second post.  I will try a 
> build on the C6x DSP to see if this is a 16 vs. 32-bit problem.  I sent the
> test files off-list.
> 
> >> 1.  speex_echo_state_init takes about 20M instructions, which is a
little
> >> frightening,
> >
> > That's the fft initialization that calls a lot of float cos()
functions.
> > If you have a fixed version of cos() you can use it there, otherwise a
> > fixed table (for a certain size) would work.
> >
> >> and the calls to speex_echo_cancel take about 630K instructions
> >> for 128 samples.  Given your recent experience with the fixed
point
> >> canceller, does this sound rational?  The MIPs for the canceller
are
> >> similar
> >> to the MIPs for the encoder running 8kbps, complexity 1.
> >
> > The order of magnitude seems right. It may be possible to reduce that
a
> > bit, though. If you have an optimized FFT, you could replace kiss_fft
> > with it and get a big improvement right there.
> 
> Yeah, but then I have to try to actually understand the algorithm.  I am
not
> sure that those brain cells are still alive.
> 
> >> 2.  The testecho example uses a frame length and tail size that
are
> >> powers
> >> of two (128, 1024).  Are there any implications to using sizes
which are
> >> not
> >> powers of two?  It would be most convenient to use the encoder
frame size
> >> (160), and some multiple of that for the tail size.  How does the
frame
> >> size
> >> affect performance (I understand that the tail length determines
what
> >> echo
> >> signals are cancelable)?
> >
> > Non powers of two will be a bit slower because of the FFT, but
that's
> > all. I made sure the echo canceller works with 160, precisely because
> > it's the frame size used by Speex. Note that I don't recommend
using
> > frames more than 20 ms long (at any sampling rate).
> >
> >> 3.  Do you have any suggestions for code/data memory reduction for
the
> >> canceller, other than to make the tail length no longer than
necessary
> >> (this
> >> is a line echo canceller for a local phone, so I should be able to
keep
> >> it
> >> to 40ms).  I was surprised by the size of the FFT code, but I
guess that
> >> it
> >> is doing much more than the radix2 version in the TI library.
> >
> > The FFT code has more than just the radix two, so you can save there.
It
> > wasn't meant to be an optimized FFT, so if TI supplies you with
one,
> > it's probably a good idea to use it (that's what fft_wrap is
for). Also,
> > given that the memory use is almost directly proportional to the tail
> > length, reducing that one to 40 ms will make a huge difference.
> 
> Thanks for the advice.
> 
> - Jim 
> 
> 
>

Jim Crichton

2006-May-11 07:19 UTC

head link

[Speex-dev] Echo Canceller Memory Usage, Frame Size

(from thread Re: [Speex-dev] Speex echo canceller on TI C55 DSP, but this is 
a more general topic)
>> With these changes, the link was successful, using testecho.c with some
>> modifications for the C55 environment.  The code and data memory
>> requirements were a lot more than I had hoped (>20kbytes of dynamic
data
>> memory for block size=128, tail length = 1024), and I will probably not
>> be
>> able to fit it in the production build without some trimming.
>
> Yes, there may be a bit of memory reduction possible here. Of course,
> decreasing the tail length is also a rather easy way.
>
>> 2.  The testecho example uses a frame length and tail size that are 
>> powers
>> of two (128, 1024).  Are there any implications to using sizes which
are
>> not
>> powers of two?  It would be most convenient to use the encoder frame
size
>> (160), and some multiple of that for the tail size.  How does the frame
>> size
>> affect performance (I understand that the tail length determines what 
>> echo
>> signals are cancelable)?
>
> Non powers of two will be a bit slower because of the FFT, but that's
> all. I made sure the echo canceller works with 160, precisely because
> it's the frame size used by Speex. Note that I don't recommend
using
> frames more than 20 ms long (at any sampling rate).
>
>> 3.  Do you have any suggestions for code/data memory reduction for the
>> canceller, other than to make the tail length no longer than necessary 
>> (this
>> is a line echo canceller for a local phone, so I should be able to keep
>> it
>> to 40ms).  I was surprised by the size of the FFT code, but I guess
that
>> it
>> is doing much more than the radix2 version in the TI library.
>
> The FFT code has more than just the radix two, so you can save there. It
> wasn't meant to be an optimized FFT, so if TI supplies you with one,
> it's probably a good idea to use it (that's what fft_wrap is for).
Also,
> given that the memory use is almost directly proportional to the tail
> length, reducing that one to 40 ms will make a huge difference.
The overall allocated memory usage of the echo canceler is, in bytes:
4*frame_size *(27 + 5*ceiling(filter_length/frame_size)) + C

Where C = 420 on a TI C55 DSP (16 bit machine), and C = 760 on a TI C64 DSP 
(32 bit machine).

Where the tail length is an integer multiple of the frame size, this reduces 
to:
108*frame_size + 20*filter_length.

So the memory usage is a much stronger function of the frame length than the 
tail length, but both factors are pretty large.  I recall seeing in an 
earlier thread that performance is degraded when the filter length is too 
long because of added noise (but I cannot find that thread at the moment). 
Of course, if the filter length is too short, then it will not be able to 
cancel all of the echo.  For the frame length, however, the tradeoffs are 
not so obvious.  You said the following last week in the thread "Re: speex 
echo cancellation limitations":

"I recommend using frame sizes of about 5-20 ms (samples depend on
sampling rate) and tail lengths of 100-200 ms for acoustic echo. Of
course, line echo would require less than that, but I've focused mainly
on acoustic echo, which is a harder problem (but line echo should work
as well)."

Could you elaborate a bit on the effect of changing the frame size, other 
than memory usage?  In the small test case I have been using, there is a 
20ms delay between the speaker and microphone signals.  Test-echo.c has 
defaults frame_len=128, filter_len=1024.  I ran this case, and also 
frame_len=80, filter_len=320 (10 and 40ms at 8000 Hz).  The second case 
attenuated the echo better, probably because the first filter length is much 
longer than the echo path delay.  How low a frame length would you recommend 
for 8000Hz sample rate?

- Jim

Jean-Marc Valin

2006-May-11 08:03 UTC

head link

[Speex-dev] Echo Canceller Memory Usage, Frame Size

Note to everyone here, I'll be traveling for the next two weeks, so I'll
be a bit less responsive.
> The overall allocated memory usage of the echo canceler is, in bytes:
> 4*frame_size *(27 + 5*ceiling(filter_length/frame_size)) + C
> 
> Where C = 420 on a TI C55 DSP (16 bit machine), and C = 760 on a TI C64 DSP
> (32 bit machine).
> 
> Where the tail length is an integer multiple of the frame size, this
reduces
> to:
> 108*frame_size + 20*filter_length.
Hmm, hadn't realized the coef in front of the frame size was that big. I
guess some of that would probably be reduced a bit. In general, there's
a lot of things that can be adapted. For example, it would be easy to
reduce the 20*filter_length to 12*filter_length at the cost of slightly
more noise during the adaptation phase. 8*filter_length would be
possible at the cost of less steady-state cancellation and/or shorter
tail lengths. I'll have to look for the 108*frame_size term, but I
suspect there could be a bit of waste there.
> So the memory usage is a much stronger function of the frame length than
the
> tail length, but both factors are pretty large.  
Well, considering that filter_length is usually 5-20 times larger than
frame_size, then both terms are usually in the same order of magnitude.
> I recall seeing in an 
> earlier thread that performance is degraded when the filter length is too 
> long because of added noise (but I cannot find that thread at the moment). 
> Of course, if the filter length is too short, then it will not be able to 
> cancel all of the echo.  
The problem with long tail lengths is not only the noise, but the fact
that the adaptation time is more or less proportional to the tail
length.
> For the frame length, however, the tradeoffs are 
> not so obvious.  You said the following last week in the thread "Re:
speex
> echo cancellation limitations":
> 
> "I recommend using frame sizes of about 5-20 ms (samples depend on
> sampling rate) and tail lengths of 100-200 ms for acoustic echo. Of
> course, line echo would require less than that, but I've focused mainly
> on acoustic echo, which is a harder problem (but line echo should work
> as well)."
I've mainly observed that 5-20 ms seems to work good and I've designed
the fixed-point according to that. In general, the longer the frame
size, the less CPU it takes (for equal tail size). Otherwise, there's a
tradeoff. Large frame sizes provide better decorrelation of the data
(good), but the adaptation is done less often (bad). That's about all I
can say here. You just have to test and see what works best. In many
cases, however, the application dictates the frame size. 
> Could you elaborate a bit on the effect of changing the frame size, other 
> than memory usage?  In the small test case I have been using, there is a 
> 20ms delay between the speaker and microphone signals. 
You should probably delay the speaker signal so you don't waste
CPU/memory/efficiency because of that delay.
>  Test-echo.c has 
> defaults frame_len=128, filter_len=1024.  I ran this case, and also 
> frame_len=80, filter_len=320 (10 and 40ms at 8000 Hz).  The second case 
> attenuated the echo better, probably because the first filter length is
much
> longer than the echo path delay.  
Exactly.
> How low a frame length would you recommend 
> for 8000Hz sample rate?
5-20 ms, as noted above.

	Jean-Marc

Apparently Analagous Threads

Search for more possibly parallel threads

Speex dev - May 2006 - Speex echo canceller on TI C55 DSP

[Speex-dev] Speex echo canceller on TI C55 DSP

[Speex-dev] Speex echo canceller on TI C55 DSP

[Speex-dev] Speex echo canceller on TI C55 DSP

[Speex-dev] Speex echo canceller on TI C55 DSP

[Speex-dev] Echo Canceller Memory Usage, Frame Size

[Speex-dev] Echo Canceller Memory Usage, Frame Size

Apparently Analagous Threads