thr3ads.net - Speex dev - [Speex-dev] mdf -- better adaption of W? [Dec 2005]

Thorvald Natvig

2005-Dec-12 22:29 UTC

[Speex-dev] mdf -- better adaption of W?

>> Actually, computing the "power spectrum" for each frame of W shows
>> how large an amount of the original signal at time offset j the
>> echo canceller thinks should be removed from the current input frame.
>
> Careful when looking at W because of how the real and imaginary parts
> are packed in the array.
Err. Ok, as I got it, 'bin 0' has its amplitude in W[0], bin 1 to N-1 has
its real part in W[i*2-1] and its imag in W[i*2], and finally the
Nyquist amplitude is in W[N-1].

I took this from how power_spectrum() computes, so I might be off :)
>> Anyway, I did some proper testing. I took my headset, bent the microphone
>> arm so it's resting inside the .. uh.. whatever you call that large
>> muffler thing that goes around your ear. This is an important testcase, as
>> a lot of our users have complained about hearing echo that is propagated
>> at the remote end either directly through the air from the "speaker" to the
>> microphone (common with open headsets), and with closed headsets we see
>> echo propagated mechanically down the arm of the microphone.
>
> If you hold that in your hand, you're probably making it harder than
> for a real scenario because any movement causes the echo path to change.
Actually, with maximum volume (which I used to make sure the echo really 
dominated over the noise), it's quite loud, so I left it in the corner.
>> Now, people don't say "aaaaaaaaaaaaa" all that often, but they do play
>> music that has a few "long" sounds, and saying "aaaaanyway" is enough to
>> trigger this.
>
> Can you send a pair of files so I can run testecho on them?
I'll need to add support for saving audio to my program, so I can give you
the "actual" sampled loudspeaker and mic files, and I'll also need to get
hold of a test person again. (I had a friend with a friend who has an
exceptionally clear voice. My own "aaaaaa" is far too muddy to cause
this). I'll try to get this done this week, but it might be delayed until
after Christmas.
>> This can happen quite frequently, so it would be nice if the echo
>> canceller could deal with this situation without a complete reset.
>
> That can be predicted from the code. It's sort of hard to fix without
> hurting accuracy for the general case. I'll have to think about it.
An idea might be to enable the noise cancellation to "feed back" into the
echo canceller. If, after noise cancellation, there's nothing left at
all, then stop adapting the echo canceller.
>> Now, when trying to visualize the weights to see a bit of what was going
>> on, I also computed the phase for each frequency bin. When looking just at
>> the phase, I can see a very clear and distinct pattern of going from -pi
>> to +pi in the areas where I know there is echo (specifically, the lower
>> 7 kHz of j==M-1),
>
> What you see is a "linear phase", which is the frequency equivalent of a
> delay in the time domain. So basically, the phase you see is just the
> representation of where the "main impulse" is in the time domain version
> of W (i.e. the time offset between the two signals you sent to the AEC).
Ah, yes. I'm reading up on my DFT now. Amazing how much stuff you can 
forget.
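
The "linear phase equals delay" point can be checked numerically. The following is a minimal sketch, not Speex code: it uses a plain textbook DFT in pure Python, and the frame size `N` and `DELAY` are arbitrary values chosen for illustration. A single impulse delayed by `DELAY` samples gives a phase that drops by the same amount from each bin to the next, and that step recovers the delay.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) DFT, good enough for a small illustration."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def wrap(angle):
    """Wrap an angle into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


N = 64       # illustrative frame size (assumption)
DELAY = 5    # illustrative delay in samples (assumption)

impulse = [0.0] * N
impulse[DELAY] = 1.0

spectrum = dft(impulse)
phases = [cmath.phase(v) for v in spectrum[: N // 2 + 1]]

# Phase step between neighbouring bins; constant for a pure delay.
steps = [wrap(phases[k + 1] - phases[k]) for k in range(N // 2)]

# The slope of the phase gives the delay back: step = -2*pi*delay/N.
estimated_delay = -steps[0] * N / (2 * math.pi)
print(estimated_delay)
```

The raw phase appears to jump from -pi to +pi only because it is wrapped; the underlying slope is constant.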
>> and what looks like random noise for the rest. Do you
>> have any idea where this pattern originates from, and more importantly,
>> could it be used as additional conditioning of W? (ie: if the phase
>> doesn't match the pattern, reduce the amplitude as it's a false match).
>
> A random phase is expected. I don't see much useful info you can get
> from that.
Well, from what I can see in this testcase, it's only "random" where there
is no correlation. For example, in the 20ms-40ms timeslot, the amplitude
can spike a bit (such as on those "aaaaaa"), but the phase is still
random, whereas in the '0-20ms' slot, it's very regular. My thought was to
use the "regularity" of the phase shift as an indication for a good match.
So, if arg(W[i+1])-arg(W[i])==arg(W[i])-arg(W[i-1]), we know it's a steady
increase, so it's probably a good match. It's quite hackish, and probably
not grounded in any solid scientific basis, but it's an idea for
dealing better with the specific kind of echo I see here.
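
The regularity test proposed above could be sketched roughly as follows. This is only an illustration of the idea as stated in the message, not something present in Speex: the function name `phase_regularity`, the `tolerance` value, and the test data are all hypothetical, and the bins are taken as ordinary complex numbers rather than the packed layout of W.

```python
import cmath
import math
import random


def wrap(angle):
    """Wrap an angle into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def phase_regularity(bins, tolerance=0.1):
    """Fraction of bins where the phase step equals the previous step.

    Implements arg(W[i+1]) - arg(W[i]) == arg(W[i]) - arg(W[i-1]),
    up to `tolerance` radians and modulo 2*pi.
    """
    phases = [cmath.phase(b) for b in bins]
    steps = [wrap(phases[i + 1] - phases[i]) for i in range(len(phases) - 1)]
    if len(steps) < 2:
        return 0.0
    regular = sum(
        1 for a, b in zip(steps, steps[1:]) if abs(wrap(b - a)) <= tolerance
    )
    return regular / (len(steps) - 1)


# A pure 5-sample delay in a 64-point frame: perfectly regular phase.
delay_bins = [cmath.exp(-2j * math.pi * k * 5 / 64) for k in range(33)]

# Unit-magnitude bins with random phase, as in an uncorrelated section.
rng = random.Random(1)
random_bins = [cmath.rect(1.0, rng.uniform(-math.pi, math.pi)) for _ in range(33)]

score_delay = phase_regularity(delay_bins)
score_random = phase_regularity(random_bins)
print(score_delay, score_random)
```

A high score here only says that one delay dominates the section, so it is not by itself evidence that the weights are a good match.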

Then again, it will likely fail horribly if you have 2 echoes; one delayed
by 5ms with equal amplitude, and another delayed by 15ms with a much lower
amplitude. I have no idea what the "phase diagram" will look like then.

Jean-Marc Valin

2005-Dec-13 01:34 UTC

> Err. Ok, as I got it, 'bin 0' has its amplitude in W[0], bin 1 to N-1 has
> its real part in W[i*2-1] and its imag in W[i*2], and finally the
> Nyquist amplitude is in W[N-1]
Not quite, it's packed "real, real, imag, real, imag, ...".
> I took this from how power_spectrum() computes, so I might be off :)
But power_spectrum() handles that fine, you're right.
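
For reference, the packing discussed above can be sketched as follows. This is a minimal illustration, not the Speex source: the function name `packed_power_spectrum` is hypothetical, and the layout assumed is the one described in this thread for an even-length frame, namely `[re0, re1, im1, re2, im2, ..., re(N/2)]` with purely real DC and Nyquist bins.

```python
def packed_power_spectrum(packed):
    """Power per frequency bin from a packed real-FFT frame.

    Assumed layout for even length N:
    [re0, re1, im1, re2, im2, ..., re(N/2-1), im(N/2-1), re(N/2)]
    """
    n = len(packed)
    if n < 2 or n % 2:
        raise ValueError("expected an even-length packed frame")
    power = [packed[0] ** 2]                 # DC bin is purely real
    for i in range(1, n - 1, 2):             # (real, imag) pairs
        power.append(packed[i] ** 2 + packed[i + 1] ** 2)
    power.append(packed[n - 1] ** 2)         # Nyquist bin is purely real
    return power


# 8 packed values give 8/2 + 1 = 5 bins.
frame = [1.0, 3.0, 4.0, 0.0, 2.0, 1.0, 1.0, 2.0]
spectrum = packed_power_spectrum(frame)
print(spectrum)  # [1.0, 25.0, 4.0, 2.0, 4.0]
```

The same index pairs apply when computing a per-bin phase, which is why the packing matters when reading W directly.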
> > If you hold that in your hand, you're probably making it harder than
> > for a real scenario because any movement causes the echo path to change.
> 
> Actually, with maximum volume (which I used to make sure the echo really 
> dominated over the noise), it's quite loud, so I left it in the corner.
That's fine then... as long as the max volume doesn't cause too much
distortion (the AEC models only linear effects).
> I'll need to add support for saving audio to my program, so I can give you
> the "actual" sampled loudspeaker and mic files, and I'll also need to get
> hold of a test person again. (I had a friend with a friend who has an
> exceptionally clear voice. My own "aaaaaa" is far too muddy to cause
> this). I'll try to get this done this week, but it might be delayed until
> after Christmas.
Let me know if you have files that cause the problem. Otherwise, it's
pretty much impossible to debug.
> >> This can happen quite frequently, so it would be nice if the echo
> >> canceller could deal with this situation without a complete reset.
> >
> > That can be predicted from the code. It's sort of hard to fix without
> > hurting accuracy for the general case. I'll have to think about it.
>
> An idea might be to enable the noise cancellation to "feed back" into the
> echo canceller. If, after noise cancellation, there's nothing left at
> all, then stop adapting the echo canceller.
There's always "something" left after the echo cancellation, if only the
input noise. And even then it wouldn't fix the problem.
> Well, from what I can see in this testcase, it's only "random" where there
> is no correlation. For example, in the 20ms-40ms timeslot, the amplitude
> can spike a bit (such as on those "aaaaaa"), but the phase is still
> random, whereas in the '0-20ms' slot, it's very regular. My thought was to
> use the "regularity" of the phase shift as an indication for a good match.
No. All the regularity means is that you have a dominant pulse in the
transfer function. That's expected for the first section of the filter
(because of the direct sound path), but not the others (that are really
just a lot of incoherent stuff).
> So, if arg(W[i+1])-arg(W[i])==arg(W[i])-arg(W[i-1]), we know it's a steady
> increase, so it's probably a good match. It's quite hackish, and probably
> not grounded in any solid scientific basis, but it's an idea for
> dealing better with the specific kind of echo I see here.
What good would it really tell you anyway?
> Then again, it will likely fail horribly if you have 2 echoes; one delayed
> by 5ms with equal amplitude, and another delayed by 15ms with a much lower
> amplitude.
That's actually common if you have a wall (or the floor) not too far
from the mic or the speaker.
> I have no idea what the "phase diagram" will look like then.
Messy and dependent on the amplitudes too.

	Jean-Marc
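
The two-echo case discussed above can be illustrated with a small sketch. Everything here is an assumption made for illustration: an idealised response of two pure delays (5 and 15 samples rather than milliseconds), a 256-point frame, and a hypothetical helper `phase_step_spread`. It measures how far the bin-to-bin phase step strays from a constant as the second path gets louder.

```python
import cmath
import math


def wrap(angle):
    """Wrap an angle into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def phase_step_spread(n, delay_a, delay_b, gain_b):
    """Max minus min phase step between neighbouring bins.

    The response is a unit impulse at delay_a plus an impulse of
    amplitude gain_b at delay_b, evaluated on bins 0..n/2.
    """
    bins = [
        cmath.exp(-2j * math.pi * k * delay_a / n)
        + gain_b * cmath.exp(-2j * math.pi * k * delay_b / n)
        for k in range(n // 2 + 1)
    ]
    phases = [cmath.phase(b) for b in bins]
    steps = [wrap(phases[k + 1] - phases[k]) for k in range(len(phases) - 1)]
    return max(steps) - min(steps)


single_path = phase_step_spread(256, 5, 15, 0.0)   # one echo only
weak_second = phase_step_spread(256, 5, 15, 0.3)   # quiet second echo
loud_second = phase_step_spread(256, 5, 15, 0.8)   # loud second echo
print(single_path, weak_second, loud_second)
```

With a single path the spread is essentially zero (linear phase). It grows as the second path's amplitude rises, which is the amplitude dependence mentioned in the reply.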

Thorvald Natvig

2005-Dec-15 22:49 UTC

>> I'll need to add support for saving audio to my program, so I can give you
>> the "actual" sampled loudspeaker and mic files, and I'll also need to get
>> hold of a test person again. (I had a friend with a friend who has an
>> exceptionally clear voice. My own "aaaaaa" is far too muddy to cause
>> this). I'll try to get this done this week, but it might be delayed until
>> after Christmas.
>
> Let me know if you have files that cause the problem. Otherwise, it's
> pretty much impossible to debug.
Ah. It seems I made a "minor" mistake. Remember I said I used my
headset for testing and stuck the microphone into one of its "speakers"?
Well.. headsets ARE stereo, and the music it had trouble cancelling was
music with strong stereo separation -- the residual echo was the sound
from the other speaker. Which is quite natural.

Same goes for the voice, as we use positional audio. (Helps phenomenally 
to have positional audio when two people try to talk at the same time; 
without positioning it is very hard to distinguish them).

So, redoing the simple tests after adjusting the balance so all the sounds 
came from just the right side, everything worked PERFECTLY.
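
The effect described above can be reproduced with a toy model. This sketch makes several assumptions: it uses a small time-domain NLMS filter rather than the MDF algorithm in Speex, white-noise "music", and a made-up echo path (gain 0.5, one sample of delay). The canceller only receives the left channel as its reference, so anything the microphone picks up from the right channel cannot be removed.

```python
import random


def nlms_residual(reference, mic, taps=4, mu=0.5, eps=1e-9):
    """Run a time-domain NLMS filter; return mean residual power
    over the second half of the signal (after convergence)."""
    w = [0.0] * taps
    history = [0.0] * taps
    half = len(mic) // 2
    residual_power = 0.0
    for n, (x, d) in enumerate(zip(reference, mic)):
        history = [x] + history[:-1]
        estimate = sum(wi * xi for wi, xi in zip(w, history))
        e = d - estimate
        norm = sum(xi * xi for xi in history) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
        if n >= half:
            residual_power += e * e
    return residual_power / (len(mic) - half)


def echo(signal, gain=0.5, delay=1):
    """Hypothetical echo path: attenuate and delay."""
    return [0.0] * delay + [gain * s for s in signal[:-delay]]


rng = random.Random(0)
left = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
right = [rng.uniform(-1.0, 1.0) for _ in range(4000)]

# Balance set to one side: the mic hears only the referenced channel.
mic_one_side = echo(left)
# Strong stereo separation: the mic also hears the unreferenced channel.
mic_both = [a + b for a, b in zip(echo(left), echo(right))]

res_one_side = nlms_residual(left, mic_one_side)
res_both = nlms_residual(left, mic_both)
print(res_one_side, res_both)
```

In this model the one-sided case cancels almost completely, while the stereo case leaves roughly the power of the right-channel echo as residual.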

Sorry for the erroneous bug report :(

Anyway, I found a few papers on multisource echo cancellation. Unless it's
already a feature on the short-term "TODO" list, I'll take a stab at
adding simple multisource cancelling after Christmas.
