Just tried your files and I'm not running into any infinite loops and the cancellation works fine. Unless the C6x has the same problem, I suspect a 16-bit problem. I'll check and see if I find something. About the r=0 problem, I can't find where it ends up in a denominator, so I suspect is not (directly) the problem. Jean-Marc Le lundi 08 mai 2006 ? 20:05 -0400, Jim Crichton a ?crit :> > I've just been made aware of these problems (look for the thread "speex > > echo cancellation limitations"). It's on my short-term TODO list. > > I saw the other thread, my problems happened in different (but similar) > routines. > > >> If fftwrap.c, I ifdefed out the spx_fft_float and spx_ifft_float > >> routines, > >> because there were not used and required smallft.c (which is not so small > >> at > >> all) to be added to the build. > > > > Right, need to cleanup that part... > > > >> With these changes, the link was successful, using testecho.c with some > >> modifications for the C55 environment. The code and data memory > >> requirements were a lot more than I had hoped (>20kbytes of dynamic data > >> memory for block size=128, tail length = 1024), and I will probably not > >> be > >> able to fit it in the production build without some trimming. > > > > Yes, there may be a bit of memory reduction possible here. Of course, > > decreasing the tail length is also a rather easy way. > > > >> When I run the build, it goes into an infinite loop in FLOAT_DIV32 (mdf.c > >> line 660), which occurs because adapt_rate is < 0, which happens when > >> FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}. The rounding is causing > >> the result to go negative. I worked around this by changing > > > > I think that was mentioned in the previous thread... > > > >> return (a.m+(1<<(-a.e-1)))>>-a.e; > >> to > >> return (((spx_uint16_t) a.m)+(1<<(-a.e-1)))>>-a.e; > > > > Is that sufficient to remove all the overflows at this place? > > The rounding takes the value to exactly 0x8000, and it is followed by a > right shift, so you just need to avoid the sign extension. > > >> I have not had time to trace this, but it looks like a similar problem, > >> where the result of MULT16_32_Q15(M_1,r) is negative, and > >> FLOAT_DIV32_FLOAT > >> bombs. Maybe the best thing to do next is to instrument the routines in > >> pseudofloat.h which have loops, but I will not get to that for a day or > >> two. > > > > Yeah, r is never supposed to be negative and the float routines assume > > that. > > No, it was a divide by zero, as explained in my second post. I will try a > build on the C6x DSP to see if this is a 16 vs. 32-bit problem. I sent the > test files off-list. > > >> 1. speex_echo_state_init takes about 20M instructions, which is a little > >> frightening, > > > > That's the fft initialization that calls a lot of float cos() functions. > > If you have a fixed version of cos() you can use it there, otherwise a > > fixed table (for a certain size) would work. > > > >> and the calls to speex_echo_cancel take about 630K instructions > >> for 128 samples. Given your recent experience with the fixed point > >> canceller, does this sound rational? The MIPs for the canceller are > >> similar > >> to the MIPs for the encoder running 8kbps, complexity 1. > > > > The order of magnitude seems right. It may be possible to reduce that a > > bit, though. If you have an optimized FFT, you could replace kiss_fft > > with it and get a big improvement right there. > > Yeah, but then I have to try to actually understand the algorithm. I am not > sure that those brain cells are still alive. > > >> 2. The testecho example uses a frame length and tail size that are > >> powers > >> of two (128, 1024). Are there any implications to using sizes which are > >> not > >> powers of two? It would be most convenient to use the encoder frame size > >> (160), and some multiple of that for the tail size. How does the frame > >> size > >> affect performance (I understand that the tail length determines what > >> echo > >> signals are cancelable)? > > > > Non powers of two will be a bit slower because of the FFT, but that's > > all. I made sure the echo canceller works with 160, precisely because > > it's the frame size used by Speex. Note that I don't recommend using > > frames more than 20 ms long (at any sampling rate). > > > >> 3. Do you have any suggestions for code/data memory reduction for the > >> canceller, other than to make the tail length no longer than necessary > >> (this > >> is a line echo canceller for a local phone, so I should be able to keep > >> it > >> to 40ms). I was surprised by the size of the FFT code, but I guess that > >> it > >> is doing much more than the radix2 version in the TI library. > > > > The FFT code has more than just the radix two, so you can save there. It > > wasn't meant to be an optimized FFT, so if TI supplies you with one, > > it's probably a good idea to use it (that's what fft_wrap is for). Also, > > given that the memory use is almost directly proportional to the tail > > length, reducing that one to 40 ms will make a huge difference. > > Thanks for the advice. > > - Jim > > >
>Just tried your files and I'm not running into any infinite loops and >the cancellation works fine. Unless the C6x has the same problem, I >suspect a 16-bit problem. I'll check and see if I find something. About >the r=0 problem, I can't find where it ends up in a denominator, so I >suspect is not (directly) the problem.I built and ran the same test on the TI C64 simulator, and the echo was canceled nicely (about 10:1 reduction in the peak amplitude during the second of two brief speech bursts). So, my problem must again be related to the 16-bit processing on the C5X DSPs. Also, the line where it is hanging is: st->power_1[i] = FLOAT_SHL(FLOAT_DIV32_FLOAT(MULT16_32_Q15(M_1,r),FLOAT_MUL32U(e,st->power[i]+10)),WEIGHT_SHIFT+16); and it is e that is in the denominator, not r (sorry for the confusion). I can now run the simulations side-by-side and look for differences. - Jim Le lundi 08 mai 2006 ? 20:05 -0400, Jim Crichton a ?crit :> > I've just been made aware of these problems (look for the thread "speex > > echo cancellation limitations"). It's on my short-term TODO list. > > I saw the other thread, my problems happened in different (but similar) > routines. > > >> If fftwrap.c, I ifdefed out the spx_fft_float and spx_ifft_float > >> routines, > >> because there were not used and required smallft.c (which is not so > >> small > >> at > >> all) to be added to the build. > > > > Right, need to cleanup that part... > > > >> With these changes, the link was successful, using testecho.c with some > >> modifications for the C55 environment. The code and data memory > >> requirements were a lot more than I had hoped (>20kbytes of dynamic > >> data > >> memory for block size=128, tail length = 1024), and I will probably not > >> be > >> able to fit it in the production build without some trimming. > > > > Yes, there may be a bit of memory reduction possible here. Of course, > > decreasing the tail length is also a rather easy way. > > > >> When I run the build, it goes into an infinite loop in FLOAT_DIV32 > >> (mdf.c > >> line 660), which occurs because adapt_rate is < 0, which happens when > >> FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}. The rounding is > >> causing > >> the result to go negative. I worked around this by changing > > > > I think that was mentioned in the previous thread... > > > >> return (a.m+(1<<(-a.e-1)))>>-a.e; > >> to > >> return (((spx_uint16_t) a.m)+(1<<(-a.e-1)))>>-a.e; > > > > Is that sufficient to remove all the overflows at this place? > > The rounding takes the value to exactly 0x8000, and it is followed by a > right shift, so you just need to avoid the sign extension. > > >> I have not had time to trace this, but it looks like a similar problem, > >> where the result of MULT16_32_Q15(M_1,r) is negative, and > >> FLOAT_DIV32_FLOAT > >> bombs. Maybe the best thing to do next is to instrument the routines > >> in > >> pseudofloat.h which have loops, but I will not get to that for a day or > >> two. > > > > Yeah, r is never supposed to be negative and the float routines assume > > that. > > No, it was a divide by zero, as explained in my second post. I will try a > build on the C6x DSP to see if this is a 16 vs. 32-bit problem. I sent > the > test files off-list. > > >> 1. speex_echo_state_init takes about 20M instructions, which is a > >> little > >> frightening, > > > > That's the fft initialization that calls a lot of float cos() functions. > > If you have a fixed version of cos() you can use it there, otherwise a > > fixed table (for a certain size) would work. > > > >> and the calls to speex_echo_cancel take about 630K instructions > >> for 128 samples. Given your recent experience with the fixed point > >> canceller, does this sound rational? The MIPs for the canceller are > >> similar > >> to the MIPs for the encoder running 8kbps, complexity 1. > > > > The order of magnitude seems right. It may be possible to reduce that a > > bit, though. If you have an optimized FFT, you could replace kiss_fft > > with it and get a big improvement right there. > > Yeah, but then I have to try to actually understand the algorithm. I am > not > sure that those brain cells are still alive. > > >> 2. The testecho example uses a frame length and tail size that are > >> powers > >> of two (128, 1024). Are there any implications to using sizes which > >> are > >> not > >> powers of two? It would be most convenient to use the encoder frame > >> size > >> (160), and some multiple of that for the tail size. How does the frame > >> size > >> affect performance (I understand that the tail length determines what > >> echo > >> signals are cancelable)? > > > > Non powers of two will be a bit slower because of the FFT, but that's > > all. I made sure the echo canceller works with 160, precisely because > > it's the frame size used by Speex. Note that I don't recommend using > > frames more than 20 ms long (at any sampling rate). > > > >> 3. Do you have any suggestions for code/data memory reduction for the > >> canceller, other than to make the tail length no longer than necessary > >> (this > >> is a line echo canceller for a local phone, so I should be able to keep > >> it > >> to 40ms). I was surprised by the size of the FFT code, but I guess > >> that > >> it > >> is doing much more than the radix2 version in the TI library. > > > > The FFT code has more than just the radix two, so you can save there. It > > wasn't meant to be an optimized FFT, so if TI supplies you with one, > > it's probably a good idea to use it (that's what fft_wrap is for). Also, > > given that the memory use is almost directly proportional to the tail > > length, reducing that one to 40 ms will make a huge difference. > > Thanks for the advice. > > - Jim > > >
> I built and ran the same test on the TI C64 simulator, and the echo was > canceled nicely (about 10:1 reduction in the peak amplitude during the > second of two brief speech bursts). So, my problem must again be related to > the 16-bit processing on the C5X DSPs.Good. At least we've narrowed it down a bit.> Also, the line where it is hanging is: > st->power_1[i] = > FLOAT_SHL(FLOAT_DIV32_FLOAT(MULT16_32_Q15(M_1,r),FLOAT_MUL32U(e,st->power[i]+10)),WEIGHT_SHIFT+16);Actually, I just found a 16-bit bug in FLOAT_DIV32_FLOAT. Could you update svn and let me know if it works?> and it is e that is in the denominator, not r (sorry for the confusion). I > can now run the simulations side-by-side and look for differences.Can e really go to 0? I don't see how. Jean-Marc> - Jim > > Le lundi 08 mai 2006 ? 20:05 -0400, Jim Crichton a ?crit : > > > I've just been made aware of these problems (look for the thread "speex > > > echo cancellation limitations"). It's on my short-term TODO list. > > > > I saw the other thread, my problems happened in different (but similar) > > routines. > > > > >> If fftwrap.c, I ifdefed out the spx_fft_float and spx_ifft_float > > >> routines, > > >> because there were not used and required smallft.c (which is not so > > >> small > > >> at > > >> all) to be added to the build. > > > > > > Right, need to cleanup that part... > > > > > >> With these changes, the link was successful, using testecho.c with some > > >> modifications for the C55 environment. The code and data memory > > >> requirements were a lot more than I had hoped (>20kbytes of dynamic > > >> data > > >> memory for block size=128, tail length = 1024), and I will probably not > > >> be > > >> able to fit it in the production build without some trimming. > > > > > > Yes, there may be a bit of memory reduction possible here. Of course, > > > decreasing the tail length is also a rather easy way. > > > > > >> When I run the build, it goes into an infinite loop in FLOAT_DIV32 > > >> (mdf.c > > >> line 660), which occurs because adapt_rate is < 0, which happens when > > >> FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}. The rounding is > > >> causing > > >> the result to go negative. I worked around this by changing > > > > > > I think that was mentioned in the previous thread... > > > > > >> return (a.m+(1<<(-a.e-1)))>>-a.e; > > >> to > > >> return (((spx_uint16_t) a.m)+(1<<(-a.e-1)))>>-a.e; > > > > > > Is that sufficient to remove all the overflows at this place? > > > > The rounding takes the value to exactly 0x8000, and it is followed by a > > right shift, so you just need to avoid the sign extension. > > > > >> I have not had time to trace this, but it looks like a similar problem, > > >> where the result of MULT16_32_Q15(M_1,r) is negative, and > > >> FLOAT_DIV32_FLOAT > > >> bombs. Maybe the best thing to do next is to instrument the routines > > >> in > > >> pseudofloat.h which have loops, but I will not get to that for a day or > > >> two. > > > > > > Yeah, r is never supposed to be negative and the float routines assume > > > that. > > > > No, it was a divide by zero, as explained in my second post. I will try a > > build on the C6x DSP to see if this is a 16 vs. 32-bit problem. I sent > > the > > test files off-list. > > > > >> 1. speex_echo_state_init takes about 20M instructions, which is a > > >> little > > >> frightening, > > > > > > That's the fft initialization that calls a lot of float cos() functions. > > > If you have a fixed version of cos() you can use it there, otherwise a > > > fixed table (for a certain size) would work. > > > > > >> and the calls to speex_echo_cancel take about 630K instructions > > >> for 128 samples. Given your recent experience with the fixed point > > >> canceller, does this sound rational? The MIPs for the canceller are > > >> similar > > >> to the MIPs for the encoder running 8kbps, complexity 1. > > > > > > The order of magnitude seems right. It may be possible to reduce that a > > > bit, though. If you have an optimized FFT, you could replace kiss_fft > > > with it and get a big improvement right there. > > > > Yeah, but then I have to try to actually understand the algorithm. I am > > not > > sure that those brain cells are still alive. > > > > >> 2. The testecho example uses a frame length and tail size that are > > >> powers > > >> of two (128, 1024). Are there any implications to using sizes which > > >> are > > >> not > > >> powers of two? It would be most convenient to use the encoder frame > > >> size > > >> (160), and some multiple of that for the tail size. How does the frame > > >> size > > >> affect performance (I understand that the tail length determines what > > >> echo > > >> signals are cancelable)? > > > > > > Non powers of two will be a bit slower because of the FFT, but that's > > > all. I made sure the echo canceller works with 160, precisely because > > > it's the frame size used by Speex. Note that I don't recommend using > > > frames more than 20 ms long (at any sampling rate). > > > > > >> 3. Do you have any suggestions for code/data memory reduction for the > > >> canceller, other than to make the tail length no longer than necessary > > >> (this > > >> is a line echo canceller for a local phone, so I should be able to keep > > >> it > > >> to 40ms). I was surprised by the size of the FFT code, but I guess > > >> that > > >> it > > >> is doing much more than the radix2 version in the TI library. > > > > > > The FFT code has more than just the radix two, so you can save there. It > > > wasn't meant to be an optimized FFT, so if TI supplies you with one, > > > it's probably a good idea to use it (that's what fft_wrap is for). Also, > > > given that the memory use is almost directly proportional to the tail > > > length, reducing that one to 40 ms will make a huge difference. > > > > Thanks for the advice. > > > > - Jim > > > > > > > > >