Jean-Marc Valin
2006-Nov-01 07:55 UTC
[Speex-dev] Stream Synchronization for Echo Cancellation
> In those cases, when you get let's say 1000 packets of 20ms from the mic
> you may have only 990 packets of 20ms from RTP incoming stream.
>
> Thus, before sending outgoing mic/RTP stream, you would wait for 1000
> incoming packets: where last packet in fact arrive 10*20ms = 200ms
> after it was supposed to. I have from my experience already seen 4s
> of clock deviation each minutes between one USB headset and other
> sound card....
>
> In this case, synchronisation is a nightmare. It seems to be similar
> issue than the one described in your link, but the difference is really
> unpredictable and the resolution does not seems as simple...
>
> Anybody that wish to share experience on this?

Actually, the jitter buffer in Speex tends to cope relatively well with non-synchronised clocks. The only thing that really doesn't like it is the echo canceller. Even a drift of one sample means that the echo canceller needs to re-adapt. So as soon as the (local) clocks aren't *perfectly* synchronised, echo cancellation performance degrades to the point where it's mostly unusable.

Jean-Marc

> Tks,
> Aymeric MOIZARD / ANTISIP
> amsip - http://www.antisip.com
> osip2 - http://www.osip.org
> eXosip2 - http://savannah.nongnu.org/projects/exosip/
>
> On Wed, 1 Nov 2006, Tom Grandgent wrote:
>
>> Isn't this the same problem described starting at the bottom of
>> this page?
>> http://www.embeddedstar.com/articles/2003/7/article20030720-11.html
>>
>> Jean-Marc Valin <jean-marc.valin@usherbrooke.ca> wrote:
>>>
>>>> As it says in 5.4.1 of the good book "Using a different soundcard to do
>>>> the capture and playback will *not* work, regardless of what you may
>>>> think. The only exception to that is if the two cards can be made to
>>>> have their sampling clock 'locked' on the same clock source."
>>>>
>>>> It seems to me that it should be possible to achieve synchronization
>>>> using some combination of cross-correlation, clock skew estimation, and
>>>> sample interpolation. But there are so many details to consider, I bet
>>>> it would take a long time to get right.
>>>
>>> When you get that to work, please let me know and we'll publish some
>>> papers about it. Until then, your best hope is in echo *suppression*
>>> (i.e. frequency-dependent gain), although even that could be a bit
>>> tricky.
>>>
>>> Jean-Marc
>>> _______________________________________________
>>> Speex-dev mailing list
>>> Speex-dev@xiph.org
>>> http://lists.xiph.org/mailman/listinfo/speex-dev
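To put a rough number on "a drift of one sample", the sketch below works out how quickly two nominally identical sample clocks slip apart. The 8 kHz rate used in the note that follows and all function names are assumptions chosen for illustration; they are not taken from Speex or from this thread.

```python
def mismatch_from_packet_counts(expected_packets, received_packets):
    """Relative clock mismatch implied by a packet deficit (0.01 means 1%)."""
    return (expected_packets - received_packets) / expected_packets


def samples_of_drift_per_second(sample_rate_hz, mismatch_fraction):
    """Number of samples by which the two clocks slip apart each second."""
    return sample_rate_hz * mismatch_fraction


def seconds_until_one_sample_of_drift(sample_rate_hz, mismatch_fraction):
    """Time until capture and playback are offset by one full sample."""
    return 1.0 / samples_of_drift_per_second(sample_rate_hz, mismatch_fraction)
```

With 990 packets received where 1000 were expected, the mismatch is 1%, which at an assumed 8 kHz is about 80 samples of slip per second. Even a 0.01% mismatch slips by one sample roughly every 1.25 seconds, so the echo canceller would be asked to re-adapt almost continuously.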
Aymeric Moizard
2006-Nov-01 09:29 UTC
[Speex-dev] Stream Synchronization for Echo Cancellation
On Thu, 2 Nov 2006, Jean-Marc Valin wrote:

>> In those cases, when you get let's say 1000 packets of 20ms from the mic
>> you may have only 990 packets of 20ms from RTP incoming stream.
>>
>> Thus, before sending outgoing mic/RTP stream, you would wait for 1000
>> incoming packets: where last packet in fact arrive 10*20ms = 200ms
>> after it was supposed to. I have from my experience already seen 4s
>> of clock deviation each minutes between one USB headset and other
>> sound card....
>>
>> In this case, synchronisation is a nightmare. It seems to be similar
>> issue than the one described in your link, but the difference is really
>> unpredictable and the resolution does not seems as simple...
>>
>> Anybody that wish to share experience on this?
>
> Actually, the jitter buffer in Speex tends to cope relatively well with
> non-synchronised clocks.

Can you explain why?

My problem is not related to unsynchronised local input/output clocks at all: it is really about unsynchronised clocks between one PC and another...

> The only thing that really doesn't like it is the echo canceller.

In my case above, if I regularly add 10 extra packets to the incoming stream (the one that is missing 10 packets), the echo canceller works perfectly.

I was just trying to comment on the paper you linked to. My opinion is that the problem doesn't come only from local hardware (where unsynchronised clocks lead to problems with AEC). There are other problems with different clocks on two remote machines, where the lack of synchronisation does not lead to an AEC issue but to missing data (sometimes no data is played) or too much data (the application has to discard some, otherwise the voice delay grows because a buffer keeps growing).

The only way would be to extend or reduce frames, so my question was: has anybody here ever tried this in real time on audio streaming? Any simple idea to do this? I would love to see code on this...

Tks,
Aymeric MOIZARD / ANTISIP
amsip - http://www.antisip.com
osip2 - http://www.osip.org
eXosip2 - http://savannah.nongnu.org/projects/exosip/
Jim Crichton
2006-Nov-01 14:00 UTC
[Speex-dev] Stream Synchronization for Echo Cancellation
>>> In those cases, when you get let's say 1000 packets of 20ms from the mic
>>> you may have only 990 packets of 20ms from RTP incoming stream.
>>>
>>> Thus, before sending outgoing mic/RTP stream, you would wait for 1000
>>> incoming packets: where last packet in fact arrive 10*20ms = 200ms
>>> after it was supposed to. I have from my experience already seen 4s
>>> of clock deviation each minutes between one USB headset and other
>>> sound card....
>>>
>>> In this case, synchronisation is a nightmare. It seems to be similar
>>> issue than the one described in your link, but the difference is really
>>> unpredictable and the resolution does not seems as simple...
>>>
>>> Anybody that wish to share experience on this?
>>
>> Actually, the jitter buffer in Speex tends to cope relatively well with
>> non-synchronised clocks.
>
> Can you explain why?
>
> My problem is not at all related to local input/output non-synchronised
> clocks: my problem is really between non-synchronised clock between one
> PC and another...
>
>> The only thing that really doesn't like it is the echo canceller.
>
> In my above case, If I add 10 extra packets regularly in the incoming
> stream (the one that miss 10 packets), the echo canceller is working
> perfectly.
>

If your microphone and speaker clocks are locked, then the echo canceller will be happy. However, the Speex decoder needs to run according to the local timing, not according to the RTP packet arrival rate. Otherwise, the output sample stream will overrun or underrun, and that will kill the echo canceller.

Decoupling the two is one function of a jitter buffer. If you monitor the fill level of the buffer, you can drop or duplicate frames when some threshold is reached (rather than doing this at fixed intervals based on a measured packet arrival rate). This is less disruptive than having the jitter buffer rebuild its delay whenever it overruns or underruns. In the presence of jitter, the measurement gets more difficult, of course. I do not know how the Speex jitter buffer works in this situation, since I use something different in my application.

> I was just trying to comment on the paper you linked to: My opinion is
> that the problem don't only comes from local hardware (where non-synchro
> clocks leads to problem with aec). There are other problems with different
> clocks on 2 remote hardware. (where non-synchro does not lead to aec
> issue, but leads to missing data (sometimes no data is played) or too much
> data (the application has to discard else the voice delay is growing
> because a buffer is growing)
>
> The only way would be to extend or reduce frames: so my question was:
> does anybody here have ever tried this in real time on audio streaming?
> Any simple idea to do this?

If there is no jitter or packet loss, then it is easy: just buffer a couple of frames, and then repeat a frame if you run out, and drop a frame if the buffer gets too full.

You need some kind of jitter buffer, certainly, but I wonder if the USB problem is really the sample clock rate. You are talking about a 1% error in frequency. A typical spec for voiceband modems (e.g. V.32) is 0.01%. To get a 1% error, the USB device would have to have a resistor/capacitor timing circuit instead of a crystal oscillator, and that is hard to believe. I suppose in a cheap headset, anything is possible.

Is it possible that there is a processing problem such that samples are being dropped on the USB interface, which creates the apparently low sample rate (only 99 of every 100 samples are encoded)? You could still compensate for that with jitter buffer adjustments, as above, but the audio would certainly be degraded.

- Jim
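The "buffer a couple of frames, repeat when empty, drop when too full" idea can be sketched roughly as follows. This is a minimal illustration for the no-jitter, no-loss case only; the class name, the `prefill` and `high_water` thresholds and their default values are hypothetical and are not taken from Speex or from Jim's application.

```python
from collections import deque


class FillLevelBuffer:
    """Frame queue that drops or repeats frames based on its fill level."""

    def __init__(self, prefill=2, high_water=4):
        self.frames = deque()
        self.prefill = prefill        # hypothetical: frames to queue before playback starts
        self.high_water = high_water  # hypothetical: maximum queued frames
        self.started = False
        self.last_frame = None
        self.dropped = 0
        self.repeated = 0

    def push(self, frame):
        """Call whenever a decoded frame arrives from the network."""
        self.frames.append(frame)
        if len(self.frames) > self.high_water:
            # The sender's clock runs faster than ours: discard the oldest
            # frame so the queue (and the delay) stops growing.
            self.frames.popleft()
            self.dropped += 1

    def pop(self):
        """Call once per frame period of the local playback clock."""
        if not self.started:
            if len(self.frames) < self.prefill:
                return None  # still prefilling; the caller plays silence
            self.started = True
        if self.frames:
            self.last_frame = self.frames.popleft()
        else:
            # The sender's clock runs slower than ours: repeat the last
            # frame rather than letting the output underrun.
            self.repeated += 1
        return self.last_frame
```

Because `pop()` is driven by the local sound card, the playback stream always advances at the local rate, which is what the echo canceller needs. A real implementation would also have to handle reordering, loss and jitter, which this sketch ignores.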
Jean-Marc Valin
2006-Nov-01 14:59 UTC
[Speex-dev] Stream Synchronization for Echo Cancellation
>> Actually, the jitter buffer in Speex tends to cope relatively well with
>> non-synchronised clocks.
>
> Can you explain why?
>
> My problem is not at all related to local input/output non-synchronised
> clocks: my problem is really between non-synchronised clock between one
> PC and another...

What happens is that my jitter buffer is designed without any explicit clock. The pace at which you get data from it is assumed to be the local clock. The jitter buffer is designed to buffer just enough packets to prevent most packets from arriving late. Consider steady-state conditions (jitter not changing). In that case:

1) If the packets arrive too fast (remote clock is faster), then it will discard packets once in a while to maintain the optimal buffer size.
2) If the packets arrive too slowly, interpolation will happen because the buffer becomes too small.

Note that there is no explicit buffer size; cases 1) and 2) above are determined only from the histogram of the packet arrival times. I don't even attempt to know whether 1) the network delay is changing or 2) the clocks are drifting. It would be impossible anyway without an accurate clock.

>> The only thing that really doesn't like it is the echo canceller.
>
> In my above case, If I add 10 extra packets regularly in the incoming
> stream (the one that miss 10 packets), the echo canceller is working
> perfectly.

Well, if your local capture clock and your local playback clock are synchronised it's fine. The AEC doesn't care if the remote clock is drifting.

> I was just trying to comment on the paper you linked to: My opinion is
> that the problem don't only comes from local hardware (where non-synchro
> clocks leads to problem with aec). There are other problems with different
> clocks on 2 remote hardware. (where non-synchro does not lead to aec
> issue, but leads to missing data (sometimes no data is played) or too
> much data (the application has to discard else the voice delay is
> growing because a buffer is growing)

As long as the clock drift is small (typically <1%), I don't see any problem from a jitter buffer point of view.

> The only way would be to extend or reduce frames: so my question was:
> does anybody here have ever tried this in real time on audio streaming?
> Any simple idea to do this?

You can try the jitter buffer I have. The only thing that I need to improve is to wait for silence periods before adding/removing packets.

Jean-Marc
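One way to picture a decision driven only by an arrival-time histogram is sketched below. This is not the Speex jitter buffer code: the class, the "margin" measure (how many whole frames before its playout time a packet arrived) and both ratio thresholds are assumptions made up for illustration.

```python
from collections import Counter


class ArrivalHistogram:
    """Histogram of how early (in whole frames) packets arrive."""

    def __init__(self, late_limit=0.04, early_limit=0.98):
        self.bins = Counter()           # margin in frames -> packet count
        self.late_limit = late_limit    # hypothetical tolerated ratio of late packets
        self.early_limit = early_limit  # hypothetical ratio that counts as "too early"

    def record(self, margin_frames):
        """Record one packet; margin_frames < 0 means it arrived too late."""
        self.bins[margin_frames] += 1

    def decide(self):
        """Return "interpolate", "drop" or "hold" from the histogram alone."""
        total = sum(self.bins.values())
        if total == 0:
            return "hold"
        late = sum(n for m, n in self.bins.items() if m < 0)
        spare = sum(n for m, n in self.bins.items() if m >= 1)
        if late / total > self.late_limit:
            return "interpolate"  # buffer too small: add a frame of delay
        if spare / total >= self.early_limit:
            return "drop"         # a whole frame of slack: remove a frame
        return "hold"
```

Note that nothing here distinguishes a drifting remote clock from a changing network delay: both simply shift the histogram, which matches the point made above. A real implementation would also age or shift the histogram after each adjustment.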
Coffey, Michael
2006-Nov-10 19:04 UTC
[Speex-dev] Stream Synchronization for Echo Cancellation
Following up on the original topic of synchronization between the local mic and local speaker streams: we can separate this problem into two sub-problems: (1) compensating for differences in sampling rates; and (2) compensating for delay between the two streams.

For estimating the delay, what do you think of the idea of using cross-correlation?

-mjc

-----Original Message-----
From: Jean-Marc Valin [mailto:jean-marc.valin@usherbrooke.ca]
Sent: Wednesday, November 01, 2006 7:51 AM
To: Aymeric Moizard
Cc: Tom Grandgent; Coffey, Michael; speex-dev@xiph.org
Subject: Re: [Speex-dev] Stream Synchronization for Echo Cancellation

> In those cases, when you get let's say 1000 packets of 20ms from the mic
> you may have only 990 packets of 20ms from RTP incoming stream.
>
> Thus, before sending outgoing mic/RTP stream, you would wait for 1000
> incoming packets: where last packet in fact arrive 10*20ms = 200ms
> after it was supposed to. I have from my experience already seen 4s
> of clock deviation each minutes between one USB headset and other
> sound card....
>
> In this case, synchronisation is a nightmare. It seems to be similar
> issue than the one described in your link, but the difference is really
> unpredictable and the resolution does not seems as simple...
>
> Anybody that wish to share experience on this?

Actually, the jitter buffer in Speex tends to cope relatively well with non-synchronised clocks. The only thing that really doesn't like it is the echo canceller. Even a drift by one sample means that the echo canceller needs to re-adapt. So as soon as the (local) clocks aren't *perfectly* synchronised, the echo cancellation performance goes down to a point where it's mainly unusable.

Jean-Marc
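For the delay sub-problem, a brute-force cross-correlation search might look like the sketch below. It is only an illustration under simple assumptions: the function name and `max_lag` parameter are hypothetical, the delay is assumed to be a constant, non-negative whole number of samples, and the correlation is not normalised.

```python
def estimate_delay(reference, captured, max_lag):
    """Return the lag (in samples) at which `captured` best matches `reference`.

    Only lags from 0 to max_lag are searched, so `captured` is assumed to be
    a (possibly attenuated) delayed copy of `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate reference[n] with captured[n + lag] over their overlap.
        overlap = min(len(reference), len(captured) - lag)
        score = sum(reference[n] * captured[n + lag] for n in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Here `reference` would be the samples sent to the speaker and `captured` the samples read from the microphone. This addresses only sub-problem (2); it does nothing about a sampling-rate difference, under which the best lag would keep moving over time.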
Jean-Marc Valin
2006-Nov-10 19:46 UTC
[Speex-dev] Stream Synchronization for Echo Cancellation
> For estimating the delay, what do you think of the idea of using
> cross-correlation?

That would surely work fine. What I prefer though is to simply *know* what the delay is based on the soundcard settings.

Jean-Marc