thr3ads.net - Speex dev - [Speex-dev] Stream Synchronization for Echo Cancellation [Nov 2006]

Jean-Marc Valin

2006-Nov-01 07:55 UTC

[Speex-dev] Stream Synchronization for Echo Cancellation

> In those cases, when you get, let's say, 1000 packets of 20ms from the
> mic, you may have only 990 packets of 20ms from the incoming RTP stream.
>
> Thus, before sending the outgoing mic/RTP stream, you would wait for 1000
> incoming packets, where the last packet in fact arrives 10*20ms = 200ms
> after it was supposed to. In my experience, I have already seen 4s
> of clock deviation each minute between one USB headset and another
> sound card....
>
> In this case, synchronisation is a nightmare. It seems to be a similar
> issue to the one described in your link, but the difference is really
> unpredictable and the resolution does not seem as simple...
>
> Does anybody wish to share experience on this?

Actually, the jitter buffer in Speex tends to cope relatively well with
non-synchronised clocks. The only thing that really doesn't like it is
the echo canceller. Even a drift of one sample means that the echo
canceller needs to re-adapt. So as soon as the (local) clocks aren't
*perfectly* synchronised, the echo cancellation performance goes down to
the point where it's mostly unusable.

	Jean-Marc

> Tks,
> Aymeric MOIZARD / ANTISIP
> amsip - http://www.antisip.com
> osip2 - http://www.osip.org
> eXosip2 - http://savannah.nongnu.org/projects/exosip/
> 
> 
> On Wed, 1 Nov 2006, Tom Grandgent wrote:
> 
>> Isn't this the same problem described starting at the bottom of
>> this page?
>> http://www.embeddedstar.com/articles/2003/7/article20030720-11.html
>>
>> Jean-Marc Valin <jean-marc.valin@usherbrooke.ca> wrote:
>>>
>>>> As it says in 5.4.1 of the good book "Using a different soundcard to do
>>>> the capture and playback will *not* work, regardless of what you may
>>>> think. The only exception to that is if the two cards can be made to
>>>> have their sampling clock 'locked' on the same clock source."
>>>>
>>>> It seems to me that it should be possible to achieve synchronization
>>>> using some combination of cross-correlation, clock skew estimation, and
>>>> sample interpolation. But there are so many details to consider, I bet
>>>> it would take a long time to get right.
>>>
>>> When you get that to work, please let me know and we'll publish some
>>> papers about it. Until then, your best hope is in echo *suppression*
>>> (i.e. frequency-dependent gain), although even that could be a bit
>>> tricky.
>>>
>>>     Jean-Marc
>>> _______________________________________________
>>> Speex-dev mailing list
>>> Speex-dev@xiph.org
>>> http://lists.xiph.org/mailman/listinfo/speex-dev

Aymeric Moizard

2006-Nov-01 09:29 UTC

[Speex-dev] Stream Synchronization for Echo Cancellation

On Thu, 2 Nov 2006, Jean-Marc Valin wrote:
>> In those cases, when you get, let's say, 1000 packets of 20ms from the
>> mic, you may have only 990 packets of 20ms from the incoming RTP stream.
>>
>> Thus, before sending the outgoing mic/RTP stream, you would wait for 1000
>> incoming packets, where the last packet in fact arrives 10*20ms = 200ms
>> after it was supposed to. In my experience, I have already seen 4s
>> of clock deviation each minute between one USB headset and another
>> sound card....
>>
>> In this case, synchronisation is a nightmare. It seems to be a similar
>> issue to the one described in your link, but the difference is really
>> unpredictable and the resolution does not seem as simple...
>>
>> Does anybody wish to share experience on this?
>
> Actually, the jitter buffer in Speex tends to cope relatively well with
> non-synchronised clocks.

Can you explain why?

My problem is not at all related to non-synchronised local input/output
clocks: my problem is really the non-synchronised clocks between one
PC and another...

> The only thing that really doesn't like it is the echo canceller.

In my case above, if I regularly add 10 extra packets to the incoming
stream (the one that is missing 10 packets), the echo canceller works
perfectly.

I was just trying to comment on the paper you linked to. My opinion is
that the problem doesn't come only from local hardware (where
non-synchronised clocks lead to problems with AEC). There are other
problems with different clocks on two remote machines, where the lack of
synchronisation does not lead to an AEC issue but leads to missing data
(sometimes no data is played) or too much data (the application has to
discard some, or else the voice delay grows because a buffer is growing).

The only way would be to extend or shorten frames, so my question was:
has anybody here ever tried this in real time on audio streaming?
Any simple idea for how to do this?

I would love to see code for this...
Tks,
Aymeric MOIZARD / ANTISIP
amsip - http://www.antisip.com
osip2 - http://www.osip.org
eXosip2 - http://savannah.nongnu.org/projects/exosip/

Jim Crichton

2006-Nov-01 14:00 UTC

[Speex-dev] Stream Synchronization for Echo Cancellation

>>> In those cases, when you get, let's say, 1000 packets of 20ms from the
>>> mic, you may have only 990 packets of 20ms from the incoming RTP stream.
>>>
>>> Thus, before sending the outgoing mic/RTP stream, you would wait for 1000
>>> incoming packets, where the last packet in fact arrives 10*20ms = 200ms
>>> after it was supposed to. In my experience, I have already seen 4s
>>> of clock deviation each minute between one USB headset and another
>>> sound card....
>>>
>>> In this case, synchronisation is a nightmare. It seems to be a similar
>>> issue to the one described in your link, but the difference is really
>>> unpredictable and the resolution does not seem as simple...
>>>
>>> Does anybody wish to share experience on this?
>>
>> Actually, the jitter buffer in Speex tends to cope relatively well with
>> non-synchronised clocks.
>
> Can you explain why?
>
> My problem is not at all related to non-synchronised local input/output
> clocks: my problem is really the non-synchronised clocks between one
> PC and another...
>
>> The only thing that really doesn't like it is the echo canceller.
>
> In my case above, if I regularly add 10 extra packets to the incoming
> stream (the one that is missing 10 packets), the echo canceller works
> perfectly.

If your microphone and speaker clocks are locked, then the echo canceller
will be happy. However, the Speex decoder needs to run according to the
local timing, not according to the RTP packet arrival rate. Otherwise, the
output sample stream will over/underrun, and that will kill the echo
canceller. That is one function of a jitter buffer. If you monitor the
fill level of the buffer, you can drop or duplicate frames when some
threshold is reached (rather than doing this at fixed intervals based on a
measured packet arrival rate). This is less disruptive than having the
jitter buffer delay rebuild when it overruns/underruns. In the presence of
jitter, the measurement gets more difficult, of course. I do not know how
the Speex jitter buffer works in this situation, since I use something
different in my application.
> I was just trying to comment on the paper you linked to. My opinion is
> that the problem doesn't come only from local hardware (where
> non-synchronised clocks lead to problems with AEC). There are other
> problems with different clocks on two remote machines, where the lack of
> synchronisation does not lead to an AEC issue but leads to missing data
> (sometimes no data is played) or too much data (the application has to
> discard some, or else the voice delay grows because a buffer is growing).
>
> The only way would be to extend or shorten frames, so my question was:
> has anybody here ever tried this in real time on audio streaming?
> Any simple idea for how to do this?
If there is no jitter or packet loss, then it is easy:  just buffer a couple 
of frames, and then repeat a frame if you run out, and drop a frame if the 
buffer gets too full.
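
A minimal sketch of that repeat/drop scheme, assuming no jitter or packet loss as stated above. The class name, the `prefill` and `max_frames` parameters and their default values are illustrative assumptions, not part of Speex or of any application mentioned in this thread:

```python
from collections import deque


class FrameBuffer:
    """Toy playout buffer: repeat a frame on underrun, drop one when too full.

    Hypothetical helper for illustration only; thresholds are arbitrary.
    """

    def __init__(self, prefill=2, max_frames=4):
        self.frames = deque()
        self.prefill = prefill        # frames to collect before playout starts
        self.max_frames = max_frames  # above this, the oldest frame is dropped
        self.started = False
        self.last = None              # last frame handed to the sound card
        self.dropped = 0
        self.repeated = 0

    def put(self, frame):
        """Called whenever a decoded frame arrives from the network."""
        self.frames.append(frame)
        if len(self.frames) > self.max_frames:
            self.frames.popleft()     # buffer too full: drop the oldest frame
            self.dropped += 1
        if len(self.frames) >= self.prefill:
            self.started = True

    def get(self):
        """Called once per local audio tick, i.e. paced by the local clock."""
        if not self.started:
            return None               # still pre-buffering: caller plays silence
        if self.frames:
            self.last = self.frames.popleft()
        else:
            self.repeated += 1        # ran out: repeat the previous frame
        return self.last
```

Because `get()` is driven by the local sound card, the playback stream stays on local timing whatever the remote clock does; the remote drift only shows up as an occasional repeated or dropped frame.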

You need some kind of jitter buffer, certainly, but I wonder if the USB 
problem is really the sample clock rate.  You are talking about a 1% error 
in frequency.  A typical spec for voiceband modems (e.g. V.32) is 0.01%.  To 
get a 1% error, the USB device would have to have a resistor/capacitor 
timing circuit instead of a crystal oscillator, and that is hard to believe. 
I suppose in a cheap headset, anything is possible.
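
For reference, the figures in this thread can be put on a common scale with a few lines of arithmetic. This is only a worked calculation on the numbers quoted above; the function name is made up for the example:

```python
def drift_percent(expected, observed):
    """Relative clock error in percent, from two counts over the same interval."""
    return 100.0 * (expected - observed) / expected


# 990 packets received from RTP while 1000 were captured from the mic.
packet_drift = drift_percent(1000, 990)

# 4 s of deviation per 60 s, the figure reported for the USB headset.
headset_drift = drift_percent(60.0, 56.0)

# The 0.01% modem figure expressed in parts per million.
modem_ppm = 0.01 / 100 * 1_000_000

# Seconds of audio gained or lost per minute at the packet-count drift.
seconds_per_minute = 60.0 * packet_drift / 100.0
```

The packet counts give 1%, or 0.6 s per minute, which is 100 times the modem figure of 100 ppm. The reported 4 s per minute would be about 6.7%, larger still than the 1% discussed here.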

Is it possible that there is a processing problem such that samples are 
being dropped on the USB interface, which creates the apparently low sample 
rate (only 99 of every 100 samples are encoded)?  You could still compensate 
for that with jitter buffer adjustments, as above, but the audio would 
certainly be degraded.

- Jim

Jean-Marc Valin

2006-Nov-01 14:59 UTC

[Speex-dev] Stream Synchronization for Echo Cancellation

>> Actually, the jitter buffer in Speex tends to cope relatively well with
>> non-synchronised clocks.
>
> Can you explain why?
>
> My problem is not at all related to non-synchronised local input/output
> clocks: my problem is really the non-synchronised clocks between one
> PC and another...

What happens is that my jitter buffer is designed without any explicit
clock. The pace at which you get data from it is assumed to be the local
clock. The jitter buffer is designed to buffer just enough packets to
prevent most packets from arriving late. Consider what that means when we
are in steady-state conditions (jitter not changing). In that case:
1) If the packets arrive too fast (remote clock is faster), then it will
discard packets once in a while to maintain the optimal buffer size.
2) If the packets arrive too slowly, interpolation will happen because the
buffer becomes too small.

Note that there is no explicit buffer size, the cases 1) and 2) above
are determined only based on the histogram of the packet arrival time. I
don't even attempt to know whether 1) the network delay is changing 2)
the clocks are drifting. It would be impossible anyway without an
accurate clock.
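
As a rough illustration of deciding from arrival statistics alone, here is a simplified sketch. It is not the Speex jitter buffer code: the class, the window size and both thresholds are assumptions made for the example, and a plain window of recent margins stands in for the histogram.

```python
from collections import deque


class ArrivalMargins:
    """Decide drop/interpolate from recent packet arrival margins only.

    A margin is how many frames ahead of its playout slot a packet arrived;
    a negative margin means the packet was late. Hypothetical helper.
    """

    def __init__(self, window=50, late_tolerance=0.05, early_slack=2):
        self.margins = deque(maxlen=window)
        self.late_tolerance = late_tolerance  # acceptable fraction of late packets
        self.early_slack = early_slack        # margin that counts as "too early"

    def observe(self, margin_frames):
        self.margins.append(margin_frames)

    def decide(self):
        if len(self.margins) < self.margins.maxlen:
            return "hold"                     # not enough statistics yet
        late = sum(1 for m in self.margins if m < 0)
        if late / len(self.margins) > self.late_tolerance:
            self.margins.clear()              # old margins are stale after a shift
            return "interpolate"              # case 2: buffer too small
        if min(self.margins) >= self.early_slack:
            self.margins.clear()
            return "drop"                     # case 1: buffer larger than needed
        return "hold"
```

Nothing here measures the remote clock or the network delay; a faster remote clock simply shows up as margins that keep growing until a drop is triggered.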
>> The only thing that really doesn't like it is the echo canceller.
>
> In my case above, if I regularly add 10 extra packets to the incoming
> stream (the one that is missing 10 packets), the echo canceller works
> perfectly.

Well, if your local capture clock and your local playback clock are
synchronised it's fine. The AEC doesn't care if the remote clock is
drifting.

> I was just trying to comment on the paper you linked to. My opinion is
> that the problem doesn't come only from local hardware (where
> non-synchronised clocks lead to problems with AEC). There are other
> problems with different clocks on two remote machines, where the lack of
> synchronisation does not lead to an AEC issue but leads to missing data
> (sometimes no data is played) or too much data (the application has to
> discard some, or else the voice delay grows because a buffer is growing).

As long as the clock drift is small (typically <1%), I don't see any
problem from a jitter buffer point of view.

> The only way would be to extend or shorten frames, so my question was:
> has anybody here ever tried this in real time on audio streaming?
> Any simple idea for how to do this?

You can try the jitter buffer I have. The only thing that I need to
improve is to wait for silence periods before adding/removing packets.

	Jean-Marc

Coffey, Michael

2006-Nov-10 19:04 UTC

[Speex-dev] Stream Synchronization for Echo Cancellation

Following up on the original topic of synchronization between the local
mic and local speaker streams:

We can separate this problem into two sub-problems: (1) compensating for
differences in sampling rates; and (2) compensating for delay between
the two streams.

For estimating the delay, what do you think of the idea of using
cross-correlation?
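
A minimal sketch of what such an estimate could look like, assuming a fixed delay, no clock drift during the measurement, and a brute-force search over candidate lags. The function name and the synthetic test signal are made up for the example:

```python
import random


def estimate_delay(reference, captured, max_lag):
    """Return the lag (in samples) at which `captured` best matches `reference`.

    Plain unnormalised cross-correlation, O(len * max_lag); illustration only.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(captured) - lag)
        if n <= 0:
            break
        score = sum(reference[i] * captured[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic check: the "microphone" hears the speaker signal 7 samples late,
# at half gain, with no noise or room response.
rng = random.Random(0)
speaker = [rng.uniform(-1.0, 1.0) for _ in range(400)]
mic = [0.0] * 7 + [0.5 * s for s in speaker]
delay = estimate_delay(speaker, mic, max_lag=40)
```

This only addresses sub-problem (2). A sampling-rate difference makes the best lag move over time, so a single estimate like this would go stale.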

-mjc


Jean-Marc Valin

2006-Nov-10 19:46 UTC

[Speex-dev] Stream Synchronization for Echo Cancellation

> For estimating the delay, what do you think of the idea of using
> cross-correlation?
That would surely work fine. What I prefer though is to simply *know*
what the delay is based on the soundcard settings.

	Jean-Marc
