On Jul 2, 2007, at 7:34 PM, Jean-Marc Valin wrote:

> Selon "Coffey, Michael" <mcoffey@avistar.com>:
>> Believe me; I've "played with" priorities and buffering.
>
> Then either you haven't played well enough or you're using a
> braindead OS.

This is sort of what I was talking about with nibbling. Imagine you have a
microphone that captures 128 samples at a time, filling a 256-byte buffer,
and a player that writes 256 samples at a time, or 512 bytes. You have to
nibble off a frame every 160 samples, so you get the picture below, where
each digit represents 32 samples (so 00000 is 160 samples):

       0   1   2   3       4   5   6   7  <- timestamp when each speex frame is read
0000111122223333444455556666777788889999  <- input frames (128 samples each)
0000011111222223333344444555556666677777  <- speex frames (160 samples each)

00000 11111 22222 33333 44444 55555 66666 77777

0000011111222223333344444555556666677777  <- speex frames (160 samples each)
0000000011111111222222223333333344444444  <- output frames (256 samples each)
0       2       4       5       7         <- timestamp when each speex frame is written
1       3               6

00000 11111 22222 33333 44444 55555 66666 77777

I've shown the points in time when an input buffer can be passed into a
speex frame, or a speex frame can be passed into an output buffer.

The echo canceler can't assume that each input/output pair is going to
arrive perfectly synced and at the same time. Due to threading delays and
other issues, it could easily get 2 inputs and 1 output briefly, or vice
versa.

I THINK that, looking at this from a high level, the echo canceler IS
guaranteed to get an input frame for every output frame, as long as it
doesn't look at the frame's timestamp. Perhaps internally it has a queue
that can save up frames until it has both an input and an output frame. In
that case, it needs to stop writing warnings about extra or missing frames
to the console, which seems to happen every time I run.

But if the echo canceler IS using each frame's timestamp when it's trying
to converge, it's almost guaranteed to fail on most operating systems,
because the time between frames is so variable, and can even be 0 for the
output buffer in this example. Also, I think that many machines have
separate input/output hardware that can suffer from clock drift.

I'd really like to see an echo canceler that can work even when
input/output frames are fed in with a large random time delta. I should be
able to skip the first few input or output frames, and the echo canceler
should be able to figure out what the time delta is, knowing that from that
point on it will be relatively constant between any given pair of
input/output frames. The easiest way to do this might be to look at the
maximum of the cross-covariance of the input and output, or to find the
phase offset between the input and output FFTs. Maybe it already does this,
and someone can say if so?

P.S. The above situation is almost exactly what happens on my Mac, and
would be exacerbated for people with third-party sound cards.

------------------------------------------------------------------------
Zack Morris              Z Sculpt Entertainment              This Space
zmorris@zsculpt.com      http://www.zsculpt.com              For Rent
------------------------------------------------------------------------
If the doors of perception were cleansed, everything would appear to man
as it is, infinite.  -William Blake, The Marriage of Heaven and Hell
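A minimal sketch of the kind of "nibbling" buffer described above, assuming
16-bit mono samples; the repack_* names and RING_CAP size are made up for
illustration and are not part of the Speex API:

/*
 * Accept capture buffers of any size and hand back fixed 160-sample frames
 * once enough audio has accumulated.
 */
#include <string.h>

#define SPEEX_FRAME 160   /* samples per narrowband Speex frame */
#define RING_CAP    4096  /* must exceed the largest burst we expect to hold */

typedef struct {
    short buf[RING_CAP];
    int   count;          /* samples currently stored */
} repack_t;

/* Append an input buffer of arbitrary length (e.g. 128 samples from the mic). */
static int repack_push(repack_t *r, const short *in, int n)
{
    if (r->count + n > RING_CAP)
        return -1;        /* overflow: pushing faster than pulling */
    memcpy(r->buf + r->count, in, n * sizeof(short));
    r->count += n;
    return 0;
}

/* Pull one 160-sample frame if available; returns 1 on success, 0 if starved. */
static int repack_pull(repack_t *r, short *frame)
{
    if (r->count < SPEEX_FRAME)
        return 0;
    memcpy(frame, r->buf, SPEEX_FRAME * sizeof(short));
    r->count -= SPEEX_FRAME;
    memmove(r->buf, r->buf + SPEEX_FRAME, r->count * sizeof(short));
    return 1;
}

With 128-sample capture callbacks this yields a frame on most callbacks but
occasionally produces nothing (for example on the sixth callback), which is
exactly the jitter the diagram above shows.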
Selon zmorris@mac.com:

> This is sort of what I was talking about with nibbling. Imagine you
> have a microphone that captures 128 samples at a time, filling a
> 256-byte buffer, and a player that writes 256 samples at a time, or
> 512 bytes. You have to nibble off a frame every 160 samples, so you
> get the picture below, where each digit represents 32 samples (so
> 00000 is 160 samples):
...
> I've shown the points in time when an input buffer can be passed into
> a speex frame, or a speex frame can be passed into an output buffer.
>
> The echo canceler can't assume that each input/output pair is going
> to arrive perfectly synced and at the same time. Due to threading
> delays and other issues, it could easily get 2 inputs and 1 output
> briefly, or vice versa.

As long as the capture and playback clocks are in sync, there's no problem.
If the frame sizes don't match, you'll just need to do a bit of buffering.
No big deal. The only requirement is that the first playback sample you
send the AEC has to arrive as echo in the capture with a fixed delay (one
that isn't too large compared to the tail length).

> I THINK that, looking at this from a high level, the echo canceler IS
> guaranteed to get an input frame for every output frame, as long as
> it doesn't look at the frame's timestamp. Perhaps internally it has
> a queue that can save up frames until it has both an input and an
> output frame. In that case, it needs to stop writing warnings about
> extra or missing frames to the console, which seems to happen every
> time I run.

One of those warnings is OK. If you get many, something's wrong.

> But if the echo canceler IS using each frame's timestamp when it's
> trying to converge, it's almost guaranteed to fail on most operating
> systems, because the time between frames is so variable, and can even
> be 0 for the output buffer in this example.

Don't know what you mean about timestamps. The AEC doesn't use/need
timestamps. But it does require that you send the audio in the same order
you capture/play it.

> Also, I think that many machines have separate input/output hardware
> that can suffer from clock drift. I'd really like to see an echo
> canceler that can work even when input/output frames are fed in with
> a large random time delta. I should be able to skip the first few
> input or output frames, and the echo canceler should be able to
> figure out what the time delta is, knowing that from that point on it
> will be relatively constant between any given pair of input/output
> frames.

This is a lot harder than you may think. Estimating the drift accurately
enough is highly non-trivial. It's much easier to make sure the clocks are
in sync (e.g. tell the user to use the same card for both).

> The easiest way to do this might be to look at the maximum of the
> cross-covariance of the input and output, or to find the phase offset
> between the input and output FFTs. Maybe it already does this, and
> someone can say if so?

If you think it's easy, then I guess I'll be waiting for your patch...

> P.S. The above situation is almost exactly what happens on my Mac,
> and would be exacerbated for people with third-party sound cards.

You mean Apple can't ship a soundcard that records and plays at the same
rate? I have a hard time believing that.

	Jean-Marc
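For concreteness, here is a sketch of the "bit of buffering" suggested
above. It assumes the speex_echo_state_init()/speex_echo_cancellation()
entry points from <speex/speex_echo.h> in the 1.2 betas (older releases
expose speex_echo_cancel() instead), and it reuses the hypothetical
repack_t helper from the earlier sketch:

#include <speex/speex_echo.h>

#define FRAME 160
#define TAIL  (FRAME * 8)  /* roughly 100 ms tail at 8 kHz; tune for the room */

/*
 * Drain whatever matched capture/playback frames are available and run the
 * AEC on each pair, in order.  No timestamps are involved: the n-th capture
 * frame is simply paired with the n-th playback frame.
 */
void drain_aec(SpeexEchoState *st, repack_t *cap, repack_t *play)
{
    short mic[FRAME], spk[FRAME], clean[FRAME];

    /* Check both counts before pulling, so a frame is never consumed from
     * one side while the other side is still starved. */
    while (cap->count >= FRAME && play->count >= FRAME) {
        repack_pull(cap, mic);
        repack_pull(play, spk);
        speex_echo_cancellation(st, mic, spk, clean);
        /* ...encode `clean` with Speex and send it... */
    }
}

/* At startup: SpeexEchoState *st = speex_echo_state_init(FRAME, TAIL); */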
On Jul 2, 2007, at 9:48 PM, Jean-Marc Valin wrote:

> Selon zmorris@mac.com:
>> But if the echo canceler IS using each frame's timestamp when it's
>> trying to converge, it's almost guaranteed to fail on most operating
>> systems, because the time between frames is so variable, and can
>> even be 0 for the output buffer in this example.
>
> Don't know what you mean about timestamps. The AEC doesn't use/need
> timestamps. But it does require that you send the audio in the same
> order you capture/play it.

I just mean that if the echo canceler is using timing information to try to
find the echo, then it probably won't work, but if it just works on each
pair of buffers, then it will. It sounds like that is how it works, which
is good to know, thanx.

>> Also, I think that many machines have separate input/output hardware
>> that can suffer from clock drift. I'd really like to see an echo
>> canceler that can work even when input/output frames are fed in with
>> a large random time delta. I should be able to skip the first few
>> input or output frames, and the echo canceler should be able to
>> figure out what the time delta is, knowing that from that point on
>> it will be relatively constant between any given pair of
>> input/output frames.
>
> This is a lot harder than you may think. Estimating the drift
> accurately enough is highly non-trivial. It's much easier to make
> sure the clocks are in sync (e.g. tell the user to use the same card
> for both).

Unfortunately, I think that with things like GarageBand introducing the
world to electronic music, we're going to see more and more strange
configurations with third-party sound cards, which never used to be an
issue on the Mac. As a shareware game designer, I have to cope with the
fact that people only tolerate about 3 seconds of fuss before they toss my
game in the garbage. So even though it is difficult, I think that echo
canceler 2.0 should be able to tolerate things like multiple audio sources
(5.1, etc.), even multiple mics, from multiple sound cards. This is
probably beyond the scope of Speex, but it's going to become more of an
issue as people want high-fidelity video chat/telepresence that "just
works". I haven't read the manual you suggested, and I haven't even tried
the newest Speex beta yet, so maybe all of this isn't needed.

>> The easiest way to do this might be to look at the maximum of the
>> cross-covariance of the input and output, or to find the phase
>> offset between the input and output FFTs. Maybe it already does
>> this, and someone can say if so?
>
> If you think it's easy, then I guess I'll be waiting for your patch...

Hah ya, blah. One way to explore something like that would be in something
like Matlab, which makes brainstorming easy. The covariance thing is
actually only 1 line of code in Matlab, but it can be computationally
intensive. It would be better to reuse FFTs from elsewhere in Speex if you
have them, but I dunno how your underlying implementation works.

>> P.S. The above situation is almost exactly what happens on my Mac,
>> and would be exacerbated for people with third-party sound cards.
>
> You mean Apple can't ship a soundcard that records and plays at the
> same rate? I have a hard time believing that.

It is actually very likely that input and output are on the same card at
44100 Hz. What's problematic is getting the right-sized hardware buffers.
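Purely to illustrate the "maximum of the covariance" idea, here is a
brute-force sketch in C. This is not something the Speex AEC does; the
estimate_delay name is invented, and an FFT-based correlation would be far
cheaper for long windows:

#include <stddef.h>

/*
 * Estimate the fixed offset between a playback signal and the echo of it in
 * the capture signal by brute-force cross-correlation.  Returns the lag
 * (in samples, 0..max_lag-1) with the largest correlation.
 */
static int estimate_delay(const short *play, const short *cap,
                          size_t n, size_t max_lag)
{
    size_t lag, i, best_lag = 0;
    double best = 0.0;

    for (lag = 0; lag < max_lag; lag++) {
        double acc = 0.0;
        for (i = 0; i + lag < n; i++)
            acc += (double)play[i] * (double)cap[i + lag];
        if (acc > best) {
            best = acc;
            best_lag = lag;
        }
    }
    return (int)best_lag;
}

The Matlab one-liner alluded to above is essentially the same computation
done with xcorr(), with the peak picked out by max().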
I wrote a whole library to do the nibbling and give me the 160 samples I
need, but I have no idea what is happening under the hood, or whether I can
even ask Apple for the same-size input and output buffers, because I am
using their classic sound engine, which is archaic by today's standards. I
could probably switch to Core Audio or QuickTime to get the right buffers;
it's just a lot of work. It sounds like none of that is necessary, though,
if, as you said above, nibbling won't affect things because the AEC doesn't
use timestamps.

Thanx for the info,

--Zack