>> Actually, computing the "power spectrum" for each frame of W shows
>> how large an amount of the original signal at time offset j the
>> echo canceller thinks should be removed from the current input frame.
>
> Careful when looking at W because of how the real and imaginary parts
> are packed in the array.

Err. Ok, as I got it, 'bin 0' has its amplitude in W[0], each bin i from 1 to
N-1 has its real part in W[i*2-1] and its imag in W[i*2], and finally the
Nyquist amplitude is in W[N-1].

I took this from how power_spectrum() computes, so I might be off :)

>> Anyway, I did some proper testing. I took my headset, bent the microphone
>> arm so it's resting inside the .. uh.. whatever you call that large
>> muffler thing that goes around your ear. This is an important testcase, as
>> a lot of our users have complained about hearing echo that is propagated
>> at the remote end either directly through the air from the "speaker" to the
>> microphone (common with open headsets), and with closed headsets we see
>> echo propagated mechanically down the arm of the microphone.
>
> If you hold that in your hand, you're probably making it harder than
> for a real scenario because any movement causes the echo path to change.

Actually, with maximum volume (which I used to make sure the echo really
dominated over the noise), it's quite loud, so I left it in the corner.

>> Now, people don't say "aaaaaaaaaaaaa" all that often, but they do play
>> music that has a few "long" sounds, and saying "aaaaanyway" is enough to
>> trigger this.
>
> Can you send a pair of files I can run testecho on?

I'll need to add support for saving audio to my program, so I can give you
the "actual" sampled loudspeaker and mic files, and I'll also need to get
hold of a test person again. (I had a friend with a friend who has an
exceptionally clear voice. My own "aaaaaa" is far too muddy to cause this).
I'll try to get this done this week, but it might be delayed until after
Christmas.

>> This can happen quite frequently, so it would be nice if the echo
>> canceller could deal with this situation without a complete reset.
>
> That can be predicted from the code. It's sort of hard to fix without
> hurting accuracy for the general case. I'll have to think about it.

An idea might be to enable the noise cancellation to "feed back" into the
echo canceller. If, after noise cancellation, there's nothing left at all,
then stop adapting the echo canceller.

>> Now, when trying to visualize the weights to see a bit of what was going
>> on, I also computed the phase for each frequency bin. When looking just at
>> the phase, I can see a very clear and distinct pattern of going from -pi
>> to +pi in the areas where I know there is echo (specifically, the lower
>> 7 kHz of j==M-1),
>
> What you see is a "linear phase", which is the frequency equivalent of a
> delay in the time domain. So basically, the phase you see is just the
> representation of where the "main impulse" is in the time domain version
> of W (i.e. the time offset between the two signals you sent to the AEC).

Ah, yes. I'm reading up on my DFT now. Amazing how much stuff you can forget.

>> and what looks like random noise for the rest. Do you
>> have any idea where this pattern originates from, and more importantly,
>> could it be used as additional conditioning of W? (i.e. if the phase
>> doesn't match the pattern, reduce the amplitude as it's a false match).
>
> A random phase is expected. I don't see much useful info you can get
> from that.

Well, from what I can see in this testcase, it's only "random" where there
is no correlation. For example, in the 20ms-40ms timeslot, the amplitude can
spike a bit (such as on those "aaaaaa"), but the phase is still random,
whereas in the '0-20ms' slot, it's very regular. My thought was to use the
"regularity" of the phase shift as an indication of a good match.
So, if arg(W[i+1])-arg(W[i])==arg(W[i])-arg(W[i-1]), we know it's a steady
increase, so it's probably a good match. It's quite hackish, and probably
not grounded in any good science, but it's an idea for dealing better with
the specific kind of echo I see here.

Then again, it will likely fail horribly if you have two echoes: one delayed
by 5ms with equal amplitude, and another delayed by 15ms with a much lower
amplitude. I have no idea what the "phase diagram" will look like then.
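
To make the phase-regularity idea above concrete, here is a minimal sketch in Python. It is not Speex code: dft(), wrap(), phase_irregularity() and impulse_response() are hypothetical helpers, and the 64-tap filters with delays of 5 and 15 samples are made-up test data. It measures how far the phase is from "arg(W[i+1])-arg(W[i]) == arg(W[i])-arg(W[i-1])" by averaging the absolute second difference of the phase across bins.

```python
import cmath
import math


def dft(x):
    """Naive DFT of a real sequence, returning bins 0..len(x)//2."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def wrap(angle):
    """Wrap an angle in radians into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def phase_irregularity(spectrum):
    """Mean absolute second difference of the phase across bins.

    Zero means arg(W[i+1])-arg(W[i]) == arg(W[i])-arg(W[i-1]) everywhere,
    i.e. a perfectly linear phase.
    """
    phase = [cmath.phase(c) for c in spectrum]
    step = [wrap(b - a) for a, b in zip(phase, phase[1:])]
    bend = [abs(wrap(b - a)) for a, b in zip(step, step[1:])]
    return sum(bend) / len(bend)


def impulse_response(taps, length=64):
    """Build a filter from {delay_in_samples: amplitude} (illustrative data)."""
    h = [0.0] * length
    for delay, amplitude in taps.items():
        h[delay] = amplitude
    return h


# One pure delay versus a direct path plus a weaker, later reflection.
single = phase_irregularity(dft(impulse_response({5: 1.0})))
double = phase_irregularity(dft(impulse_response({5: 1.0, 15: 0.5})))
print(f"single echo: {single:.6f}")
print(f"two echoes:  {double:.6f}")
```

With a single delay the measure should come out at essentially zero, while adding the second, weaker echo already bends the phase noticeably. That suggests the measure reacts to the number and strength of paths as much as to whether the match is good.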
> Err. Ok, as I got it, 'bin 0' has its amplitude in W[0], each bin i from 1
> to N-1 has its real part in W[i*2-1] and its imag in W[i*2], and finally
> the Nyquist amplitude is in W[N-1]

Not quite, it's packed "real, real, imag, real, imag, ...".

> I took this from how power_spectrum() computes, so I might be off :)

But power_spectrum() handles that fine, you're right.

> > If you hold that in your hand, you're probably making it harder than
> > for a real scenario because any movement causes the echo path to change.
>
> Actually, with maximum volume (which I used to make sure the echo really
> dominated over the noise), it's quite loud, so I left it in the corner.

That's fine then... as long as the max volume doesn't cause too much
distortion (the AEC models only linear effects).

> I'll need to add support for saving audio to my program, so I can give you
> the "actual" sampled loudspeaker and mic files, and I'll also need to get
> hold of a test person again. (I had a friend with a friend who has an
> exceptionally clear voice. My own "aaaaaa" is far too muddy to cause
> this). I'll try to get this done this week, but it might be delayed until
> after Christmas.

Let me know if you have files that cause the problem. Otherwise, it's
pretty much impossible to debug.

> >> This can happen quite frequently, so it would be nice if the echo
> >> canceller could deal with this situation without a complete reset.
> >
> > That can be predicted from the code. It's sort of hard to fix without
> > hurting accuracy for the general case. I'll have to think about it.
>
> An idea might be to enable the noise cancellation to "feed back" into the
> echo canceller. If, after noise cancellation, there's nothing left at
> all, then stop adapting the echo canceller.

There's always "something" left after the echo cancellation, if only the
input noise. And even then it wouldn't fix the problem.

> Well, from what I can see in this testcase, it's only "random" where there
> is no correlation.
> For example, in the 20ms-40ms timeslot, the amplitude
> can spike a bit (such as on those "aaaaaa"), but the phase is still
> random, whereas in the '0-20ms' slot, it's very regular. My thought was to
> use the "regularity" of the phase shift as an indication of a good match.

No. All the regularity means is that you have a dominant pulse in the
transfer function. That's expected for the first section of the filter
(because of the direct sound path), but not the others (that are really just
a lot of incoherent stuff).

> So, if arg(W[i+1])-arg(W[i])==arg(W[i])-arg(W[i-1]), we know it's a steady
> increase, so it's probably a good match. It's quite hackish, and probably
> not grounded in any good science, but it's an idea for
> dealing better with the specific kind of echo I see here.

What would it really tell you anyway?

> Then again, it will likely fail horribly if you have two echoes: one delayed
> by 5ms with equal amplitude, and another delayed by 15ms with a much lower
> amplitude.

That's actually common if you have a wall (or the floor) not too far from
the mic or the speaker.

> I have no idea what the "phase diagram" will look like then.

Messy and dependent on the amplitudes too.

Jean-Marc
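
For anyone else poking at W: the "real, real, imag, real, imag, ..." packing mentioned earlier in this message can be unpacked as in the sketch below. This is only an illustration in Python, not the Speex power_spectrum() source. unpack_bins() and packed_power_spectrum() are hypothetical names, and the sketch assumes an even-length array whose first entry is the (real) DC bin and whose last entry is the (real) Nyquist bin.

```python
def unpack_bins(w):
    """Turn a packed real array into a list of complex bins.

    Assumed layout (even length n):
      w[0]             DC bin, real only
      w[2*i-1], w[2*i] real and imaginary part of bin i, for 1 <= i < n/2
      w[n-1]           Nyquist bin, real only
    """
    n = len(w)
    if n < 2 or n % 2 != 0:
        raise ValueError("expected an even-length packed array")
    bins = [complex(w[0], 0.0)]
    for i in range(1, n // 2):
        bins.append(complex(w[2 * i - 1], w[2 * i]))
    bins.append(complex(w[n - 1], 0.0))
    return bins


def packed_power_spectrum(w):
    """Squared magnitude of each bin of a packed array."""
    return [c.real * c.real + c.imag * c.imag for c in unpack_bins(w)]


# Made-up example: DC=1, bin 1 = 3+4j, bin 2 = 0+2j, bin 3 = 5+12j, Nyquist=-2.
example = [1.0, 3.0, 4.0, 0.0, 2.0, 5.0, 12.0, -2.0]
print(packed_power_spectrum(example))
```

Going through complex bins first makes it easy to take the phase of each bin as well, at the cost of an extra list compared with computing the squares in place.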
>> I'll need to add support for saving audio to my program, so I can give you
>> the "actual" sampled loudspeaker and mic files, and I'll also need to get
>> hold of a test person again. (I had a friend with a friend who has an
>> exceptionally clear voice. My own "aaaaaa" is far too muddy to cause
>> this). I'll try to get this done this week, but it might be delayed until
>> after Christmas.
>
> Let me know if you have files that cause the problem. Otherwise, it's
> pretty much impossible to debug.

Ah. It seems I made a "minor" mistake. Remember I said I used my headset for
testing and stuck the microphone into one of its "speakers"? Well.. headsets
ARE stereo, and the music it had trouble cancelling was music with strong
stereo separation -- the residual echo was the sound from the other speaker.
Which is quite natural. Same goes for the voice, as we use positional audio.
(Helps phenomenally to have positional audio when two people try to talk at
the same time; without positioning it is very hard to distinguish them).

So, redoing the simple tests after adjusting the balance so all the sounds
came from just the right side, everything worked PERFECTLY. Sorry for the
erroneous bug report :(

Anyway, I found a few papers on multisource echo cancellation. Unless it's
already a feature on the short-term "TODO" list, I'll take a stab at adding
simple multisource cancelling after Christmas.