>> Generate a test signal (10+x sine waves per frame), where x increases by
>> one for each iteration, and wraps around at 100.
>
> Testing with sine waves is usually not a good idea. If you intend on
> cancelling speech, then test with speech.

Ok, I tested more extensively with both music and two-way speech. More on
this below.

>> However, when peeking at the values, it seems that the weights for
>> frame 0 (newest) are very low.
>
> Peeking at the values tells you nothing unless you do the inverse FFT and
> all, so you can see them in the time domain. Even then, it's not that
> useful.

Actually, computing the "power spectrum" for each frame of W shows how large
an amount of the original signal at time offset j the echo canceller thinks
should be removed from the current input frame. If you compute W*X for each
j and ifft, you'll get the original signal with each frequency component
scaled and time-shifted according to what W was (for that j).

Anyway, I did some proper testing. I took my headset, bent the microphone
arm so it's resting inside the .. uh.. whatever you call that large muffler
thing that goes around your ear. This is an important testcase, as a lot of
our users have complained about hearing echo that is propagated at the
remote end either directly through the air from the "speaker" to the
microphone (common with open headsets), and with closed headsets we see echo
propagated mechanically down the arm of the microphone.

Playing regular pop music (Garbage: Push It), things work out well, and the
canceller ends up with semi-stable weights, almost entirely in the (j==M-1)
bin (0-20 ms delay, which is quite natural). It's the same with normal
speech as long as it's spoken reasonably fast.

I see some "banding" of the output; it seems there's more output signal (and
more to cancel) in the 1-3 kHz and 5-6 kHz area, but I blame that on the
headphones; they're cheap.

However, when switching to AC/DC: Big Gun, we see and hear a large residual
echo from the opening electric guitar. This seems to be a result of a
semi-stable sound that lasts more than 20 ms; the canceller finds a
correlation in 4-5 time bins instead of just one. We could reproduce the
same result by playing a human voice saying "aaaaaaaaaa" without variation
in pitch; the weights for those frequency bins would increase for all the
time slots in W.

Now, people don't say "aaaaaaaaaaaaa" all that often, but they do play music
that has a few "long" sounds, and saying "aaaaanyway" is enough to trigger
this.

Next test: what happens if the user has an external (physical) on-off
switch? Same setup, playing Big Gun as loud as it gets. Apart from the
problems with the opening guitar everything is good, and we see the weights
set as they should be and things are cancelled out.

So, I switch the mic off externally with the switch. Input becomes
practically zero, so the weights readjust to zero as well. Turn the
microphone back on and the echo canceller doesn't adapt. That is, no echo
cancellation, and the weights all stay at their zero values.

This can happen quite frequently, so it would be nice if the echo canceller
could deal with this situation without a complete reset.

Now, when trying to visualize the weights to see a bit of what was going on,
I also computed the phase for each frequency bin. When looking just at the
phase, I can see a very clear and distinct pattern of going from -pi to +pi
in the areas where I know there is echo (specifically, the lower 7 kHz of
j==M-1), and what looks like random noise for the rest.

Do you have any idea where this pattern originates from, and more
importantly, could it be used as additional conditioning of W? (i.e. if the
phase doesn't match the pattern, reduce the amplitude as it's a false
match.)
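For reference, this is roughly how I'm extracting the per-tap power and
phase I'm talking about above. The packed real/imag layout is my guess from
reading power_spectrum(), so the indexing may well be off by one, and the
names here are mine, not from the code:

#include <math.h>

/* Dump power and phase for one tap Wj of the weight array (length N).
   Assumed packing: Wj[0] = DC (real only), Wj[2*k-1]/Wj[2*k] = real/imag
   of bin k for k = 1..N/2-1, Wj[N-1] = Nyquist (real only).
   power[] and phase[] need N/2+1 elements. */
void dump_tap(const float *Wj, int N, float *power, float *phase)
{
    int k;
    power[0]   = Wj[0]*Wj[0];         phase[0]   = 0.0f;
    power[N/2] = Wj[N-1]*Wj[N-1];     phase[N/2] = 0.0f;
    for (k = 1; k < N/2; k++) {
        float re = Wj[2*k-1];
        float im = Wj[2*k];
        power[k] = re*re + im*im;     /* what I plot as the "power spectrum" */
        phase[k] = atan2f(im, re);    /* what I plot as the phase, in (-pi, pi] */
    }
}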
> Actually, computing the "power spectrum" for each frame of W shows
> how large an amount of the original signal at time offset j the
> echo canceller thinks should be removed from the current input frame.

Careful when looking at W because of how the real and imaginary parts
are packed in the array.

> If you compute W*X for each j and ifft, you'll get the
> original signal with each frequency component scaled and time-shifted
> according to what W was (for that j).

Yes, that's the Y/y signal in the code (see the sketch further down in this
message).

> Anyway, I did some proper testing. I took my headset, bent the microphone
> arm so it's resting inside the .. uh.. whatever you call that large
> muffler thing that goes around your ear. This is an important testcase, as
> a lot of our users have complained about hearing echo that is propagated
> at the remote end either directly through the air from the "speaker" to
> the microphone (common with open headsets), and with closed headsets we
> see echo propagated mechanically down the arm of the microphone.

If you hold that in your hand, you're probably making it harder than for a
real scenario, because any movement causes the echo path to change.

> Playing regular pop music (Garbage: Push It), things work out well, and
> the canceller ends up with semi-stable weights, almost entirely in the
> (j==M-1) bin (0-20 ms delay, which is quite natural). It's the same with
> normal speech as long as it's spoken reasonably fast.

Fine.

> I see some "banding" of the output; it seems there's more output signal
> (and more to cancel) in the 1-3 kHz and 5-6 kHz area, but I blame that on
> the headphones; they're cheap.

Not sure what you mean, but it doesn't seem to be a problem.

> However, when switching to AC/DC: Big Gun, we see and hear a large
> residual echo from the opening electric guitar. This seems to be a result
> of a semi-stable sound that lasts more than 20 ms; the canceller finds a
> correlation in 4-5 time bins instead of just one. We could reproduce the
> same result by playing a human voice saying "aaaaaaaaaa" without variation
> in pitch; the weights for those frequency bins would increase for all the
> time slots in W.
>
> Now, people don't say "aaaaaaaaaaaaa" all that often, but they do play
> music that has a few "long" sounds, and saying "aaaaanyway" is enough to
> trigger this.

Can you send a pair of files so I can run testecho on them?

> Next test: what happens if the user has an external (physical) on-off
> switch? Same setup, playing Big Gun as loud as it gets. Apart from the
> problems with the opening guitar everything is good, and we see the
> weights set as they should be and things are cancelled out.
>
> So, I switch the mic off externally with the switch. Input becomes
> practically zero, so the weights readjust to zero as well. Turn the
> microphone back on and the echo canceller doesn't adapt. That is, no echo
> cancellation, and the weights all stay at their zero values.
>
> This can happen quite frequently, so it would be nice if the echo
> canceller could deal with this situation without a complete reset.

That can be predicted from the code. It's sort of hard to fix without
hurting accuracy for the general case. I'll have to think about it.
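Coming back to the Y/y point above, here is roughly what that
multiply-accumulate looks like: for each tap j you do a complex multiply of
the stored far-end spectrum X_j with W_j (same kind of packed layout) and
accumulate over the taps. This is only an illustration of the structure, not
code lifted from mdf.c, and the actual packing there may differ slightly:

/* Illustrative echo-estimate spectrum: Y = sum over j of W_j * X_j.
   W and X each hold M blocks of N packed values. */
void estimate_echo(const float *W, const float *X, float *Y, int N, int M)
{
    int j, k;
    for (k = 0; k < N; k++)
        Y[k] = 0.0f;
    for (j = 0; j < M; j++) {
        const float *Wj = W + j*N;
        const float *Xj = X + j*N;
        Y[0]   += Wj[0]*Xj[0];          /* DC bin, real only */
        Y[N-1] += Wj[N-1]*Xj[N-1];      /* Nyquist bin, real only */
        for (k = 1; k < N/2; k++) {
            float wr = Wj[2*k-1], wi = Wj[2*k];
            float xr = Xj[2*k-1], xi = Xj[2*k];
            Y[2*k-1] += wr*xr - wi*xi;  /* real part of the product */
            Y[2*k]   += wr*xi + wi*xr;  /* imaginary part of the product */
        }
    }
    /* an inverse FFT of Y then gives the time-domain echo estimate y */
}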
> Now, when trying to visualize the weights to see a bit of what was going
> on, I also computed the phase for each frequency bin. When looking just at
> the phase, I can see a very clear and distinct pattern of going from -pi
> to +pi in the areas where I know there is echo (specifically, the lower
> 7 kHz of j==M-1),

What you see is a "linear phase", which is the frequency equivalent of a
delay in the time domain. So basically, the phase you see is just the
representation of where the "main impulse" is in the time domain version of
W (i.e. the time offset between the two signals you sent to the AEC).

> and what looks like random noise for the rest. Do you have any idea where
> this pattern originates from, and more importantly, could it be used as
> additional conditioning of W? (i.e. if the phase doesn't match the
> pattern, reduce the amplitude as it's a false match.)

A random phase is expected. I don't see much useful info you can get from
that.

	Jean-Marc
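P.S. If you want to see the delay/linear-phase equivalence directly, here is
a throwaway program (toy N and delay, nothing to do with the actual filter;
compile with -lm): a pure delay of D samples gives bin k a phase of
-2*pi*k*D/N, i.e. a straight ramp that wraps around at +/- pi, which is
exactly the pattern you describe.

#include <stdio.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N 64   /* toy transform size */
#define D 5    /* toy delay in samples */

int main(void)
{
    double x[N] = {0};
    int k, n;
    x[D] = 1.0;                       /* an impulse delayed by D samples */

    for (k = 0; k <= N/2; k++) {      /* naive DFT, fine for a toy size */
        double re = 0.0, im = 0.0;
        for (n = 0; n < N; n++) {
            re += x[n] * cos(2.0*M_PI*k*n/N);
            im -= x[n] * sin(2.0*M_PI*k*n/N);
        }
        /* the phase falls linearly with k and wraps at +/- pi */
        printf("bin %2d  phase %+.4f\n", k, atan2(im, re));
    }
    return 0;
}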
>> Actually, computing the "power spectrum" for each frame of W shows
>> how large an amount of the original signal at time offset j the
>> echo canceller thinks should be removed from the current input frame.
>
> Careful when looking at W because of how the real and imaginary parts
> are packed in the array.

Err. Ok, as I got it, bin 0 has its amplitude in W[0], bin i (for i = 1 to
N/2-1) has its real part in W[2*i-1] and its imaginary part in W[2*i], and
finally the Nyquist amplitude is in W[N-1]. I took this from how
power_spectrum() computes, so I might be off :)

>> Anyway, I did some proper testing. I took my headset, bent the microphone
>> arm so it's resting inside the .. uh.. whatever you call that large
>> muffler thing that goes around your ear. This is an important testcase,
>> as a lot of our users have complained about hearing echo that is
>> propagated at the remote end either directly through the air from the
>> "speaker" to the microphone (common with open headsets), and with closed
>> headsets we see echo propagated mechanically down the arm of the
>> microphone.
>
> If you hold that in your hand, you're probably making it harder than for
> a real scenario, because any movement causes the echo path to change.

Actually, with maximum volume (which I used to make sure the echo really
dominated over the noise), it's quite loud, so I left it in the corner.

>> Now, people don't say "aaaaaaaaaaaaa" all that often, but they do play
>> music that has a few "long" sounds, and saying "aaaaanyway" is enough to
>> trigger this.
>
> Can you send a pair of files so I can run testecho on them?

I'll need to add support for saving audio to my program, so I can give you
the "actual" sampled loudspeaker and mic files, and I'll also need to get
hold of a test person again. (I know a friend of a friend who has an
exceptionally clear voice; my own "aaaaaa" is far too muddy to cause this.)
I'll try to get this done this week, but it might be delayed until after
Christmas.

>> This can happen quite frequently, so it would be nice if the echo
>> canceller could deal with this situation without a complete reset.
>
> That can be predicted from the code. It's sort of hard to fix without
> hurting accuracy for the general case. I'll have to think about it.

An idea might be to let the noise cancellation "feed back" into the echo
canceller. If, after noise cancellation, there's nothing left at all, then
stop adapting the echo canceller.
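Roughly along these lines (completely made-up names and threshold, nothing
from the actual speex API, just the shape of the idea):

/* Hypothetical gate on adaptation: if the near-end frame is essentially
   silent (e.g. the mic was switched off), skip the weight update for that
   frame instead of letting W decay towards zero.  The decision could just
   as well come from the noise suppressor's output instead of raw energy. */

#define MUTE_THRESHOLD 1e-4f   /* made-up "dead microphone" level */

static int should_adapt(const float *mic_frame, int n)
{
    float energy = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        energy += mic_frame[i]*mic_frame[i];
    return (energy / n) > MUTE_THRESHOLD;
}

/* In the per-frame processing, something like:
 *
 *     if (should_adapt(mic_frame, frame_size))
 *         update_weights(...);    // the normal W update
 *     // else: leave W untouched for this frame
 */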
>> Now, when trying to visualize the weights to see a bit of what was going
>> on, I also computed the phase for each frequency bin. When looking just
>> at the phase, I can see a very clear and distinct pattern of going from
>> -pi to +pi in the areas where I know there is echo (specifically, the
>> lower 7 kHz of j==M-1),
>
> What you see is a "linear phase", which is the frequency equivalent of a
> delay in the time domain. So basically, the phase you see is just the
> representation of where the "main impulse" is in the time domain version
> of W (i.e. the time offset between the two signals you sent to the AEC).

Ah, yes. I'm reading up on my DFT now. Amazing how much stuff you can
forget.

>> and what looks like random noise for the rest. Do you have any idea where
>> this pattern originates from, and more importantly, could it be used as
>> additional conditioning of W? (i.e. if the phase doesn't match the
>> pattern, reduce the amplitude as it's a false match.)
>
> A random phase is expected. I don't see much useful info you can get from
> that.

Well, from what I can see in this testcase, it's only "random" where there
is no correlation. For example, in the 20-40 ms timeslot the amplitude can
spike a bit (such as on those "aaaaaa"), but the phase is still random,
whereas in the 0-20 ms slot it's very regular.

My thought was to use the "regularity" of the phase shift as an indication
of a good match. So, if arg(W[i+1])-arg(W[i]) == arg(W[i])-arg(W[i-1]), we
know it's a steady increase, so it's probably a good match. It's quite
hackish, and probably not grounded in any solid scientific basis, but it's
an idea for dealing better with the specific kind of echo I see here.

Then again, it will likely fail horribly if you have two echoes: one delayed
by 5 ms with equal amplitude, and another delayed by 15 ms with a much lower
amplitude. I have no idea what the "phase diagram" will look like then.
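To make that a bit more concrete, this is the sort of thing I'm imagining.
It's a pure sketch with made-up names, and the +/- pi wrap-around is
probably where it gets tricky:

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Wrap an angle difference into (-pi, pi]. */
static float wrap_phase(float d)
{
    while (d >   M_PI) d -= 2.0f*M_PI;
    while (d <= -M_PI) d += 2.0f*M_PI;
    return d;
}

/* Hypothetical "how linear is the phase" measure for one tap of W.
   phase[] holds arg(W) per bin; a true delay gives a constant step from
   bin to bin, so the second difference should be near zero.  Returns a
   value in [0,1]; 1 means a perfectly regular (linear) phase. */
static float phase_regularity(const float *phase, int nbins)
{
    float err = 0.0f;
    int i;
    for (i = 1; i < nbins - 1; i++) {
        float step1 = wrap_phase(phase[i]   - phase[i-1]);
        float step2 = wrap_phase(phase[i+1] - phase[i]);
        err += fabsf(wrap_phase(step2 - step1));
    }
    err /= (nbins - 2);             /* mean deviation, 0..pi */
    return 1.0f - err/(float)M_PI;  /* map to a 0..1 "confidence" */
}

The weight (or its update) for that tap could then be scaled down when the
score is low, but as I said above, I have no idea how this behaves with two
overlapping echoes.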