thr3ads.net - Speex dev - [Speex-dev] Speex echo canceller on TI C55 DSP [May 2006]

If this information is useful, please help other people find it:
Share via:

Jean-Marc Valin

2006-May-09 04:36 UTC

[Speex-dev] Speex echo canceller on TI C55 DSP

> I built and ran the same test on the TI C64 simulator, and the echo was 
> canceled nicely (about 10:1 reduction in the peak amplitude during the 
> second of two brief speech bursts).  So, my problem must again be related
to
> the 16-bit processing on the C5X DSPs.
Good. At least we've narrowed it down a bit.
> Also, the line where it is hanging is:
>          st->power_1[i] = 
>
FLOAT_SHL(FLOAT_DIV32_FLOAT(MULT16_32_Q15(M_1,r),FLOAT_MUL32U(e,st->power[i]+10)),WEIGHT_SHIFT+16);
Actually, I just found a 16-bit bug in FLOAT_DIV32_FLOAT. Could you
update svn and let me know if it works?
> and it is e that is in the denominator, not r (sorry for the confusion).  I
> can now run the simulations side-by-side and look for differences.
Can e really go to 0? I don't see how.

	Jean-Marc
> - Jim
> 
> Le lundi 08 mai 2006 ? 20:05 -0400, Jim Crichton a ?crit :
> > > I've just been made aware of these problems (look for the
thread "speex
> > > echo cancellation limitations"). It's on my short-term
TODO list.
> >
> > I saw the other thread, my problems happened in different (but
similar)
> > routines.
> >
> > >> If fftwrap.c, I ifdefed out the spx_fft_float and
spx_ifft_float
> > >> routines,
> > >> because there were not used and required smallft.c (which is
not so
> > >> small
> > >> at
> > >> all) to be added to the build.
> > >
> > > Right, need to cleanup that part...
> > >
> > >> With these changes, the link was successful, using testecho.c
with some
> > >> modifications for the C55 environment.  The code and data
memory
> > >> requirements were a lot more than I had hoped (>20kbytes
of dynamic
> > >> data
> > >> memory for block size=128, tail length = 1024), and I will
probably not
> > >> be
> > >> able to fit it in the production build without some trimming.
> > >
> > > Yes, there may be a bit of memory reduction possible here. Of
course,
> > > decreasing the tail length is also a rather easy way.
> > >
> > >> When I run the build, it goes into an infinite loop in
FLOAT_DIV32
> > >> (mdf.c
> > >> line 660), which occurs because adapt_rate is < 0, which
happens when
> > >> FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}.  The
rounding is
> > >> causing
> > >> the result to go negative.  I worked around this by changing
> > >
> > > I think that was mentioned in the previous thread...
> > >
> > >>       return (a.m+(1<<(-a.e-1)))>>-a.e;
> > >> to
> > >>       return (((spx_uint16_t)
a.m)+(1<<(-a.e-1)))>>-a.e;
> > >
> > > Is that sufficient to remove all the overflows at this place?
> >
> > The rounding takes the value to exactly 0x8000, and it is followed by
a
> > right shift, so you just need to avoid the sign extension.
> >
> > >> I have not had time to trace this, but it looks like a
similar problem,
> > >> where the result of MULT16_32_Q15(M_1,r) is negative, and
> > >> FLOAT_DIV32_FLOAT
> > >> bombs.  Maybe the best thing to do next is to instrument the
routines
> > >> in
> > >> pseudofloat.h which have loops, but I will not get to that
for a day or
> > >> two.
> > >
> > > Yeah, r is never supposed to be negative and the float routines
assume
> > > that.
> >
> > No, it was a divide by zero, as explained in my second post.  I will
try a
> > build on the C6x DSP to see if this is a 16 vs. 32-bit problem.  I
sent
> > the
> > test files off-list.
> >
> > >> 1.  speex_echo_state_init takes about 20M instructions, which
is a
> > >> little
> > >> frightening,
> > >
> > > That's the fft initialization that calls a lot of float cos()
functions.
> > > If you have a fixed version of cos() you can use it there,
otherwise a
> > > fixed table (for a certain size) would work.
> > >
> > >> and the calls to speex_echo_cancel take about 630K
instructions
> > >> for 128 samples.  Given your recent experience with the fixed
point
> > >> canceller, does this sound rational?  The MIPs for the
canceller are
> > >> similar
> > >> to the MIPs for the encoder running 8kbps, complexity 1.
> > >
> > > The order of magnitude seems right. It may be possible to reduce
that a
> > > bit, though. If you have an optimized FFT, you could replace
kiss_fft
> > > with it and get a big improvement right there.
> >
> > Yeah, but then I have to try to actually understand the algorithm.  I
am
> > not
> > sure that those brain cells are still alive.
> >
> > >> 2.  The testecho example uses a frame length and tail size
that are
> > >> powers
> > >> of two (128, 1024).  Are there any implications to using
sizes which
> > >> are
> > >> not
> > >> powers of two?  It would be most convenient to use the
encoder frame
> > >> size
> > >> (160), and some multiple of that for the tail size.  How does
the frame
> > >> size
> > >> affect performance (I understand that the tail length
determines what
> > >> echo
> > >> signals are cancelable)?
> > >
> > > Non powers of two will be a bit slower because of the FFT, but
that's
> > > all. I made sure the echo canceller works with 160, precisely
because
> > > it's the frame size used by Speex. Note that I don't
recommend using
> > > frames more than 20 ms long (at any sampling rate).
> > >
> > >> 3.  Do you have any suggestions for code/data memory
reduction for the
> > >> canceller, other than to make the tail length no longer than
necessary
> > >> (this
> > >> is a line echo canceller for a local phone, so I should be
able to keep
> > >> it
> > >> to 40ms).  I was surprised by the size of the FFT code, but I
guess
> > >> that
> > >> it
> > >> is doing much more than the radix2 version in the TI library.
> > >
> > > The FFT code has more than just the radix two, so you can save
there. It
> > > wasn't meant to be an optimized FFT, so if TI supplies you
with one,
> > > it's probably a good idea to use it (that's what fft_wrap
is for). Also,
> > > given that the memory use is almost directly proportional to the
tail
> > > length, reducing that one to 40 ms will make a huge difference.
> >
> > Thanks for the advice.
> >
> > - Jim
> >
> >
> >
> 
> 
>

Jim Crichton

2006-May-10 04:29 UTC

head link

[Speex-dev] Speex echo canceller on TI C55 DSP

Jean-Marc,

Well I finally tracked down the problem.  Then I checked my mail and found 
that you had fixed it several hours earlier.  :(

Build 11387 produces the same result as my modified build 11343.  Because of 
compiler limitations in the TI tools, I did have to make modifications to 
pseudofloat.h (separating return of float values) and nb_celp.c (adding 
braces around a variable declaration in the middle of code).  I have 
attached a patch.  You might prefer to do the nb_celp.c change in a 
different way.

Also in the pseudofloat.h is the change to EXTRACT16 which I mentioned 
earlier in the thread.  The change that I made (casting as unsigned to 
prevent MSB extension) solved my problem, but maybe it would be safer to use 
EXTEND32 here.  That is, after all, the way all of the other 16 bit issues 
get solved.

Outside of what is in the patch, I ifdefed out the unused spx_fft_float and 
spx_ifft_float in fftwrap.c (I described this earlier, it prevents the need 
to include smallft.c) and commented out the memory.h and malloc.h includes 
in kiss_fft.h (the TI compiler does not have these files).  When the dust 
settles, I will send a patch with additions to the TI directory for the echo 
canceler example, including instructions on these last changes.  Also, I 
would like to modify the memory allocation to follow the same structure as 
the encoder/decoder, including the ability to override the malloc function.

Finally, there is a lingering issue.  The C55 and C64 builds cancel the echo 
similarly, but they do not produce the same results.  The files match for 
the first 0x1000 bytes (2048 samples, or 16 frames).  We recently sorted out 
something like this in the encoder, but I am not sure if the expected 
results are the same here.  Do you expect bit-exact results for all 
fixed-point builds?  I can look into this some more, but first I need to 
figure out if I can fit the echo canceler into my application at all.

- Jim

----- Original Message ----- 
From: "Jean-Marc Valin" <Jean-Marc.Valin@USherbrooke.ca>
To: "Jim Crichton" <jim.crichton@comcast.net>
Cc: <speex-dev@xiph.org>
Sent: Tuesday, May 09, 2006 7:36 AM
Subject: Re: [Speex-dev] Speex echo canceller on TI C55 DSP

> I built and ran the same test on the TI C64 simulator, and the echo was
> canceled nicely (about 10:1 reduction in the peak amplitude during the
> second of two brief speech bursts).  So, my problem must again be related 
> to
> the 16-bit processing on the C5X DSPs.
Good. At least we've narrowed it down a bit.
> Also, the line where it is hanging is:
>          st->power_1[i] >
FLOAT_SHL(FLOAT_DIV32_FLOAT(MULT16_32_Q15(M_1,r),FLOAT_MUL32U(e,st->power[i]+10)),WEIGHT_SHIFT+16);
Actually, I just found a 16-bit bug in FLOAT_DIV32_FLOAT. Could you
update svn and let me know if it works?
> and it is e that is in the denominator, not r (sorry for the confusion). 
> I
> can now run the simulations side-by-side and look for differences.
Can e really go to 0? I don't see how.

Jean-Marc
> - Jim
>
> Le lundi 08 mai 2006 ? 20:05 -0400, Jim Crichton a ?crit :
> > > I've just been made aware of these problems (look for the
thread
> > > "speex
> > > echo cancellation limitations"). It's on my short-term
TODO list.
> >
> > I saw the other thread, my problems happened in different (but
similar)
> > routines.
> >
> > >> If fftwrap.c, I ifdefed out the spx_fft_float and
spx_ifft_float
> > >> routines,
> > >> because there were not used and required smallft.c (which is
not so
> > >> small
> > >> at
> > >> all) to be added to the build.
> > >
> > > Right, need to cleanup that part...
> > >
> > >> With these changes, the link was successful, using testecho.c
with
> > >> some
> > >> modifications for the C55 environment.  The code and data
memory
> > >> requirements were a lot more than I had hoped (>20kbytes
of dynamic
> > >> data
> > >> memory for block size=128, tail length = 1024), and I will
probably
> > >> not
> > >> be
> > >> able to fit it in the production build without some trimming.
> > >
> > > Yes, there may be a bit of memory reduction possible here. Of
course,
> > > decreasing the tail length is also a rather easy way.
> > >
> > >> When I run the build, it goes into an infinite loop in
FLOAT_DIV32
> > >> (mdf.c
> > >> line 660), which occurs because adapt_rate is < 0, which
happens when
> > >> FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}.  The
rounding is
> > >> causing
> > >> the result to go negative.  I worked around this by changing
> > >
> > > I think that was mentioned in the previous thread...
> > >
> > >>       return (a.m+(1<<(-a.e-1)))>>-a.e;
> > >> to
> > >>       return (((spx_uint16_t)
a.m)+(1<<(-a.e-1)))>>-a.e;
> > >
> > > Is that sufficient to remove all the overflows at this place?
> >
> > The rounding takes the value to exactly 0x8000, and it is followed by
a
> > right shift, so you just need to avoid the sign extension.
> >
> > >> I have not had time to trace this, but it looks like a
similar
> > >> problem,
> > >> where the result of MULT16_32_Q15(M_1,r) is negative, and
> > >> FLOAT_DIV32_FLOAT
> > >> bombs.  Maybe the best thing to do next is to instrument the
routines
> > >> in
> > >> pseudofloat.h which have loops, but I will not get to that
for a day
> > >> or
> > >> two.
> > >
> > > Yeah, r is never supposed to be negative and the float routines
assume
> > > that.
> >
> > No, it was a divide by zero, as explained in my second post.  I will
try
> > a
> > build on the C6x DSP to see if this is a 16 vs. 32-bit problem.  I
sent
> > the
> > test files off-list.
> >
> > >> 1.  speex_echo_state_init takes about 20M instructions, which
is a
> > >> little
> > >> frightening,
> > >
> > > That's the fft initialization that calls a lot of float cos()
> > > functions.
> > > If you have a fixed version of cos() you can use it there,
otherwise a
> > > fixed table (for a certain size) would work.
> > >
> > >> and the calls to speex_echo_cancel take about 630K
instructions
> > >> for 128 samples.  Given your recent experience with the fixed
point
> > >> canceller, does this sound rational?  The MIPs for the
canceller are
> > >> similar
> > >> to the MIPs for the encoder running 8kbps, complexity 1.
> > >
> > > The order of magnitude seems right. It may be possible to reduce
that
> > > a
> > > bit, though. If you have an optimized FFT, you could replace
kiss_fft
> > > with it and get a big improvement right there.
> >
> > Yeah, but then I have to try to actually understand the algorithm.  I
am
> > not
> > sure that those brain cells are still alive.
> >
> > >> 2.  The testecho example uses a frame length and tail size
that are
> > >> powers
> > >> of two (128, 1024).  Are there any implications to using
sizes which
> > >> are
> > >> not
> > >> powers of two?  It would be most convenient to use the
encoder frame
> > >> size
> > >> (160), and some multiple of that for the tail size.  How does
the
> > >> frame
> > >> size
> > >> affect performance (I understand that the tail length
determines what
> > >> echo
> > >> signals are cancelable)?
> > >
> > > Non powers of two will be a bit slower because of the FFT, but
that's
> > > all. I made sure the echo canceller works with 160, precisely
because
> > > it's the frame size used by Speex. Note that I don't
recommend using
> > > frames more than 20 ms long (at any sampling rate).
> > >
> > >> 3.  Do you have any suggestions for code/data memory
reduction for
> > >> the
> > >> canceller, other than to make the tail length no longer than 
> > >> necessary
> > >> (this
> > >> is a line echo canceller for a local phone, so I should be
able to
> > >> keep
> > >> it
> > >> to 40ms).  I was surprised by the size of the FFT code, but I
guess
> > >> that
> > >> it
> > >> is doing much more than the radix2 version in the TI library.
> > >
> > > The FFT code has more than just the radix two, so you can save
there.
> > > It
> > > wasn't meant to be an optimized FFT, so if TI supplies you
with one,
> > > it's probably a good idea to use it (that's what fft_wrap
is for).
> > > Also,
> > > given that the memory use is almost directly proportional to the
tail
> > > length, reducing that one to 40 ms will make a huge difference.
> >
> > Thanks for the advice.
> >
> > - Jim
> >
> >
> >
>
>
>-------------- next part --------------
A non-text attachment was scrubbed...
Name: echo11387.zip
Type: application/x-zip-compressed
Size: 837 bytes
Desc: not available
Url :
http://lists.xiph.org/pipermail/speex-dev/attachments/20060510/ff85a753/echo11387-0001.bin

Jean-Marc Valin

2006-May-10 04:44 UTC

head link

[Speex-dev] Speex echo canceller on TI C55 DSP

> Build 11387 produces the same result as my modified build 11343.  Because
of
> compiler limitations in the TI tools, I did have to make modifications to 
> pseudofloat.h (separating return of float values) and nb_celp.c (adding 
> braces around a variable declaration in the middle of code).  I have 
> attached a patch.  You might prefer to do the nb_celp.c change in a 
> different way.
Funny thing is I applied exactly the same patch to nb_celp.c a few hours
ago!
> Also in the pseudofloat.h is the change to EXTRACT16 which I mentioned 
> earlier in the thread.  The change that I made (casting as unsigned to 
> prevent MSB extension) solved my problem, but maybe it would be safer to
use
> EXTEND32 here.  That is, after all, the way all of the other 16 bit issues 
> get solved.
I'll need to have a closer look on that one.
> Outside of what is in the patch, I ifdefed out the unused spx_fft_float and
> spx_ifft_float in fftwrap.c (I described this earlier, it prevents the need
> to include smallft.c) and commented out the memory.h and malloc.h includes 
> in kiss_fft.h (the TI compiler does not have these files).  
True, forgot about that one.
> When the dust 
> settles, I will send a patch with additions to the TI directory for the
echo
> canceler example, including instructions on these last changes.  Also, I 
> would like to modify the memory allocation to follow the same structure as 
> the encoder/decoder, including the ability to override the malloc function.
What do you mean here?
> Finally, there is a lingering issue.  The C55 and C64 builds cancel the
echo
> similarly, but they do not produce the same results.  The files match for 
> the first 0x1000 bytes (2048 samples, or 16 frames).  We recently sorted
out
> something like this in the encoder, but I am not sure if the expected 
> results are the same here.  Do you expect bit-exact results for all 
> fixed-point builds?  I can look into this some more, but first I need to 
> figure out if I can fit the echo canceler into my application at all.
Assuming the FFT tables are exactly the same (that's the only place
using float), then I would definitely expect bit-exact results.

	Jean-Marc
> - Jim
> 
> ----- Original Message ----- 
> From: "Jean-Marc Valin" <Jean-Marc.Valin@USherbrooke.ca>
> To: "Jim Crichton" <jim.crichton@comcast.net>
> Cc: <speex-dev@xiph.org>
> Sent: Tuesday, May 09, 2006 7:36 AM
> Subject: Re: [Speex-dev] Speex echo canceller on TI C55 DSP
> 
> 
> > I built and ran the same test on the TI C64 simulator, and the echo
was
> > canceled nicely (about 10:1 reduction in the peak amplitude during the
> > second of two brief speech bursts).  So, my problem must again be
related
> > to
> > the 16-bit processing on the C5X DSPs.
> 
> Good. At least we've narrowed it down a bit.
> 
> > Also, the line where it is hanging is:
> >          st->power_1[i] > >
FLOAT_SHL(FLOAT_DIV32_FLOAT(MULT16_32_Q15(M_1,r),FLOAT_MUL32U(e,st->power[i]+10)),WEIGHT_SHIFT+16);
> 
> Actually, I just found a 16-bit bug in FLOAT_DIV32_FLOAT. Could you
> update svn and let me know if it works?
> 
> > and it is e that is in the denominator, not r (sorry for the
confusion).
> > I
> > can now run the simulations side-by-side and look for differences.
> 
> Can e really go to 0? I don't see how.
> 
> Jean-Marc
> 
> > - Jim
> >
> > Le lundi 08 mai 2006 ? 20:05 -0400, Jim Crichton a ?crit :
> > > > I've just been made aware of these problems (look for
the thread
> > > > "speex
> > > > echo cancellation limitations"). It's on my
short-term TODO list.
> > >
> > > I saw the other thread, my problems happened in different (but
similar)
> > > routines.
> > >
> > > >> If fftwrap.c, I ifdefed out the spx_fft_float and
spx_ifft_float
> > > >> routines,
> > > >> because there were not used and required smallft.c
(which is not so
> > > >> small
> > > >> at
> > > >> all) to be added to the build.
> > > >
> > > > Right, need to cleanup that part...
> > > >
> > > >> With these changes, the link was successful, using
testecho.c with
> > > >> some
> > > >> modifications for the C55 environment.  The code and
data memory
> > > >> requirements were a lot more than I had hoped
(>20kbytes of dynamic
> > > >> data
> > > >> memory for block size=128, tail length = 1024), and I
will probably
> > > >> not
> > > >> be
> > > >> able to fit it in the production build without some
trimming.
> > > >
> > > > Yes, there may be a bit of memory reduction possible here.
Of course,
> > > > decreasing the tail length is also a rather easy way.
> > > >
> > > >> When I run the build, it goes into an infinite loop in
FLOAT_DIV32
> > > >> (mdf.c
> > > >> line 660), which occurs because adapt_rate is < 0,
which happens when
> > > >> FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}.  The
rounding is
> > > >> causing
> > > >> the result to go negative.  I worked around this by
changing
> > > >
> > > > I think that was mentioned in the previous thread...
> > > >
> > > >>       return (a.m+(1<<(-a.e-1)))>>-a.e;
> > > >> to
> > > >>       return (((spx_uint16_t)
a.m)+(1<<(-a.e-1)))>>-a.e;
> > > >
> > > > Is that sufficient to remove all the overflows at this
place?
> > >
> > > The rounding takes the value to exactly 0x8000, and it is
followed by a
> > > right shift, so you just need to avoid the sign extension.
> > >
> > > >> I have not had time to trace this, but it looks like a
similar
> > > >> problem,
> > > >> where the result of MULT16_32_Q15(M_1,r) is negative,
and
> > > >> FLOAT_DIV32_FLOAT
> > > >> bombs.  Maybe the best thing to do next is to instrument
the routines
> > > >> in
> > > >> pseudofloat.h which have loops, but I will not get to
that for a day
> > > >> or
> > > >> two.
> > > >
> > > > Yeah, r is never supposed to be negative and the float
routines assume
> > > > that.
> > >
> > > No, it was a divide by zero, as explained in my second post.  I
will try
> > > a
> > > build on the C6x DSP to see if this is a 16 vs. 32-bit problem. 
I sent
> > > the
> > > test files off-list.
> > >
> > > >> 1.  speex_echo_state_init takes about 20M instructions,
which is a
> > > >> little
> > > >> frightening,
> > > >
> > > > That's the fft initialization that calls a lot of float
cos()
> > > > functions.
> > > > If you have a fixed version of cos() you can use it there,
otherwise a
> > > > fixed table (for a certain size) would work.
> > > >
> > > >> and the calls to speex_echo_cancel take about 630K
instructions
> > > >> for 128 samples.  Given your recent experience with the
fixed point
> > > >> canceller, does this sound rational?  The MIPs for the
canceller are
> > > >> similar
> > > >> to the MIPs for the encoder running 8kbps, complexity 1.
> > > >
> > > > The order of magnitude seems right. It may be possible to
reduce that
> > > > a
> > > > bit, though. If you have an optimized FFT, you could replace
kiss_fft
> > > > with it and get a big improvement right there.
> > >
> > > Yeah, but then I have to try to actually understand the
algorithm.  I am
> > > not
> > > sure that those brain cells are still alive.
> > >
> > > >> 2.  The testecho example uses a frame length and tail
size that are
> > > >> powers
> > > >> of two (128, 1024).  Are there any implications to using
sizes which
> > > >> are
> > > >> not
> > > >> powers of two?  It would be most convenient to use the
encoder frame
> > > >> size
> > > >> (160), and some multiple of that for the tail size.  How
does the
> > > >> frame
> > > >> size
> > > >> affect performance (I understand that the tail length
determines what
> > > >> echo
> > > >> signals are cancelable)?
> > > >
> > > > Non powers of two will be a bit slower because of the FFT,
but that's
> > > > all. I made sure the echo canceller works with 160,
precisely because
> > > > it's the frame size used by Speex. Note that I don't
recommend using
> > > > frames more than 20 ms long (at any sampling rate).
> > > >
> > > >> 3.  Do you have any suggestions for code/data memory
reduction for
> > > >> the
> > > >> canceller, other than to make the tail length no longer
than
> > > >> necessary
> > > >> (this
> > > >> is a line echo canceller for a local phone, so I should
be able to
> > > >> keep
> > > >> it
> > > >> to 40ms).  I was surprised by the size of the FFT code,
but I guess
> > > >> that
> > > >> it
> > > >> is doing much more than the radix2 version in the TI
library.
> > > >
> > > > The FFT code has more than just the radix two, so you can
save there.
> > > > It
> > > > wasn't meant to be an optimized FFT, so if TI supplies
you with one,
> > > > it's probably a good idea to use it (that's what
fft_wrap is for).
> > > > Also,
> > > > given that the memory use is almost directly proportional to
the tail
> > > > length, reducing that one to 40 ms will make a huge
difference.
> > >
> > > Thanks for the advice.
> > >
> > > - Jim
> > >
> > >
> > >
> >
> >
> >
> _______________________________________________
> Speex-dev mailing list
> Speex-dev@xiph.org
> http://lists.xiph.org/mailman/listinfo/speex-dev

Apparently Analagous Threads

Search for more maybe matching threads

Speex dev - May 2006 - Speex echo canceller on TI C55 DSP

[Speex-dev] Speex echo canceller on TI C55 DSP

[Speex-dev] Speex echo canceller on TI C55 DSP

[Speex-dev] Speex echo canceller on TI C55 DSP

Apparently Analagous Threads