> I am writing to gain a better understanding of the limitations of speex echo > cancellation, esp. with respect to the fixed point implementation. > If these limitations have been documented elsewhere already, please let me > know!Nothing officially documented, sorry.> I observe experimentally that when one or both of the echo or ref data for > speex_echo_cancel() have values outside of the range +/- 2^13 (and especially > +/- 2^14) that overflows occur leading to the obvious symptom of infinite > looping and probably less obvious results.Where does the infinite loop happen?> What was your intended domain for the input data?I was targeting about that type of range (a few samples outside of +-2^14), so I'd be interested in samples that cause incorrect behaviour. Because of the limited resolution (16 bits for most of the things) I've chosen, I sort of accepted I could not *guarantee* there wouldn't be overflows, but I thought they wouldn't happen in practice. This is why I'm interested in any help fixing that (especially if you can check where the overflows happen. In any case, an implementation on a "real DSP" (i.e. not ARM) should saturate the additions.> I observe experimentally that under some pathological conditions of double > talk where the echo and ref have the same frequency, the output is attenuated > to near zero.Well, if the ref and echo are perfectly correlated for a sufficient amount of time, it's simply not possible to distinguish one from the other, so this behaviour would be expected. But do you really have an example of that happening in the real world?> What assurance is there that in real life telephony we will not run into such > problems?No guarantee, but I'm targeting normal use of telephony. If I missed anything, let me know.> Non-power of two frame sizes and tail lengths seem to work just fine. What > should i know about acceptable or optimal frame sizes and tail lengths?I recommend using frame sizes of about 5-20 ms (samples depend on sampling rate) and tail lengths of 100-200 ms for acoustic echo. Of course, line echo would require less than that, but I've focused mainly on acoustic echo, which is a harder problem (but line echo should work as well).> I'm looking for ways to further reduce the cycle count besides enabling fixed > point and possibly providing some inline assembler for a fixed*.h file (my > target processor is mips4k series... no floating point).I think most of the gain would come from using an FFT optimised for your CPU. I've made it (relatively) easy to swap FFTs through my fftwrap.c wrapper.> I observe experimentally that the computation time on a per sample basis is > not heavily dependent on the tail length and is almost independent of the > frame size except for smaller frame sizes... i think... does this seem > correct?The complexity has many different components that depend on frame size, tail size and constant terms.> Is there a reasonable way to e.g. perform certain calculations only every > other frame or something of that manner?You could maybe do that for a few things, but the speed improvement probably would be worth the performance degradation. There are a few tricks you could use. For example, two "#if 1" that could be replaced by "#if 0" and reduce the complexity at the price of a bit of noise when adaptation is fast. I guess you could also do without the re-normalization of the weights and save a bit there as well. Jean-Marc
Thanks for your prompt and helpful reply. In FLOAT_DIVU() it hangs at the following: while (a.m >= b.m) { e++; a.m >>= 1; } for the case where a and b are both zero (yes, division by zero). This happens from mdf.c: leak_estimate = FLOAT_EXTRACT16(FLOAT_SHL(FLOAT_DIVU(st->Pey, st->Pyy),14)); where st->Pey and st->Pyy are both zero, which happens for the following input data to testecho program: -- first arg is file containing sine wave of magnitude +/- 32767 -- 2nd arg is file containing all zeroes The division by zero appears to be caused by the calculation: See = inner_prod(st->e+st->frame_size, st->e+st->frame_size, st->frame_size) which returns negative due to overflow occuring in mdf.c:inner_prod() : part = MAC16_16(part,*x++,*y++); part = MAC16_16(part,*x++,*y++); part = MAC16_16(part,*x++,*y++); part = MAC16_16(part,*x++,*y++); sum = ADD32(sum,SHR32(part,6)); This overflow can be avoided by rewriting this as: part = part + ((*x++ * *y++)>>1); part = part + ((*x++ * *y++)>>1); part = part + ((*x++ * *y++)>>1); part = part + ((*x++ * *y++)>>1); sum += part>>5; I am assuming that the value 0x8000 is avoided in the input. Even with this fix there is definitely some bad stuff going on; the output data is corrupted looking. I put assertions into FLOAT_MUL32U(), FLOAT_DIV32_FLOAT() and FLOAT_DIV32() to assert that the "a" arguments were non-negative. Using some real life data, i found that i had to shift the real life data right by two (i shifted both inputs by same amount) to avoid asserting; shifting by just one almost worked but failed for some case (i don't remember what that case was... not the one i mentioned above). [Alternately i could just clip the input to some threshold]. To be on the safe side with the above functions, i removed the asserts and modified the tests for == 0 to <= 0 ... Clearly, some sort of automated testing scheme could help here... -Ted On Monday 01 May 2006 21:33, Jean-Marc Valin wrote:> > I am writing to gain a better understanding of the limitations of speex > > echo cancellation, esp. with respect to the fixed point implementation. > > If these limitations have been documented elsewhere already, please let > > me know! > > Nothing officially documented, sorry. > > > I observe experimentally that when one or both of the echo or ref data > > for speex_echo_cancel() have values outside of the range +/- 2^13 (and > > especially +/- 2^14) that overflows occur leading to the obvious symptom > > of infinite looping and probably less obvious results. > > Where does the infinite loop happen? > > > What was your intended domain for the input data? > > I was targeting about that type of range (a few samples outside of > +-2^14), so I'd be interested in samples that cause incorrect behaviour. > Because of the limited resolution (16 bits for most of the things) I've > chosen, I sort of accepted I could not *guarantee* there wouldn't be > overflows, but I thought they wouldn't happen in practice. This is why > I'm interested in any help fixing that (especially if you can check > where the overflows happen. In any case, an implementation on a "real > DSP" (i.e. not ARM) should saturate the additions. > > > I observe experimentally that under some pathological conditions of > > double talk where the echo and ref have the same frequency, the output is > > attenuated to near zero. > > Well, if the ref and echo are perfectly correlated for a sufficient > amount of time, it's simply not possible to distinguish one from the > other, so this behaviour would be expected. But do you really have an > example of that happening in the real world? > > > What assurance is there that in real life telephony we will not run into > > such problems? > > No guarantee, but I'm targeting normal use of telephony. If I missed > anything, let me know. > > > Non-power of two frame sizes and tail lengths seem to work just fine. > > What should i know about acceptable or optimal frame sizes and tail > > lengths? > > I recommend using frame sizes of about 5-20 ms (samples depend on > sampling rate) and tail lengths of 100-200 ms for acoustic echo. Of > course, line echo would require less than that, but I've focused mainly > on acoustic echo, which is a harder problem (but line echo should work > as well). > > > I'm looking for ways to further reduce the cycle count besides enabling > > fixed point and possibly providing some inline assembler for a fixed*.h > > file (my target processor is mips4k series... no floating point). > > I think most of the gain would come from using an FFT optimised for your > CPU. I've made it (relatively) easy to swap FFTs through my fftwrap.c > wrapper. > > > I observe experimentally that the computation time on a per sample basis > > is not heavily dependent on the tail length and is almost independent of > > the frame size except for smaller frame sizes... i think... does this > > seem correct? > > The complexity has many different components that depend on frame size, > tail size and constant terms. > > > Is there a reasonable way to e.g. perform certain calculations only every > > other frame or something of that manner? > > You could maybe do that for a few things, but the speed improvement > probably would be worth the performance degradation. There are a few > tricks you could use. For example, two "#if 1" that could be replaced by > "#if 0" and reduce the complexity at the price of a bit of noise when > adaptation is fast. I guess you could also do without the > re-normalization of the weights and save a bit there as well. > > Jean-Marc
Hi Ted, Thanks a lot for this analysis.> In FLOAT_DIVU() it hangs at the following: > while (a.m >= b.m) > { > e++; > a.m >>= 1; > } > for the case where a and b are both zero (yes, division by zero). > This happens from mdf.c:True, that needs to be fixed even after I fix the rest.> leak_estimate = FLOAT_EXTRACT16(FLOAT_SHL(FLOAT_DIVU(st->Pey, st->Pyy),14)); > where st->Pey and st->Pyy are both zero, which happens for the following input > data to testecho program: > -- first arg is file containing sine wave of magnitude +/- 32767 > -- 2nd arg is file containing all zeroes > > The division by zero appears to be caused by the calculation: > See = inner_prod(st->e+st->frame_size, st->e+st->frame_size, st->frame_size)Does that also happen with "real life" signals or just high-amplitude sinusoids (probably worth fixing anyway).> which returns negative due to overflow occuring in mdf.c:inner_prod() : > part = MAC16_16(part,*x++,*y++); > part = MAC16_16(part,*x++,*y++); > part = MAC16_16(part,*x++,*y++); > part = MAC16_16(part,*x++,*y++); > sum = ADD32(sum,SHR32(part,6)); > This overflow can be avoided by rewriting this as: > part = part + ((*x++ * *y++)>>1); > part = part + ((*x++ * *y++)>>1); > part = part + ((*x++ * *y++)>>1); > part = part + ((*x++ * *y++)>>1); > sum += part>>5; > I am assuming that the value 0x8000 is avoided in the input.I think we can safely assume that -- or actually enforce that because it would likely break other stuff.> Even with this fix there is definitely some bad stuff going on; the output > data is corrupted looking. > I put assertions into FLOAT_MUL32U(), FLOAT_DIV32_FLOAT() and FLOAT_DIV32() to > assert that the "a" arguments were non-negative.Technically these functions *should* work (they don't at the moment) for negative inputs, but mdf.c isn't supposed to use that in the first place. I guess it comes down to the same problem as above...> Using some real life data, i > found that i had to shift the real life data right by two (i shifted both > inputs by same amount) to avoid asserting; shifting by just one almost worked > but failed for some case (i don't remember what that case was... not the one > i mentioned above).I'd be interested in that sample if you can find it.> [Alternately i could just clip the input to some threshold].Except for the -32768 value, I don't want to do that because clipping is highly non-linear and thus something the AEC doesn't like at all.> To be on the safe side with the above functions, i removed the asserts and > modified the tests for == 0 to <= 0 ... > > Clearly, some sort of automated testing scheme could help here...One thing I used to do (should do it more often) is test with FIXED_DEBUG (--enable-fixed-point-debug) which puts assertions in all fixed-point operators (except the pseudofloat stuff, which is rather new). Helps a lot tracking problems down. Jean-Marc