Is it possible to give a short hint about how the jitter buffer would "catch up" when network conditions have been bad and then get better?

I'm using the jitter buffer with success now, but sometimes I have a long delay caused by bad network conditions, and later, when the conditions get better, I would think we would want the audio to gradually catch up with real time to minimize the latency in the voice. Is it unrealistic to expect the jitter buffer to do this sort of "catching up" (presumably by "skipping" some of the older received audio)?

I understand the basic idea of the jitter.c code but am apparently not bright enough to get the whole point of the short- and long-term margin values etc. Would it be possible to get a short description of each of these variables, their purpose, and how they apply to the overall jitter buffer functionality?

Thank you very much.

Baldvin
> Is it possible to give a short hint about how the jitter buffer would
> "catch up" when network conditions have been bad and then get better?
> [...]
> Baldvin

FYI: The below is just my interpretation of the code; I might be wrong.

Each time a new packet arrives, the jitter buffer calculates how far ahead of or behind the "current" timestamp it is; this is called arrival_margin. The "current" timestamp is simply that of the last frame successfully decoded.

It maintains a list of bins for margins; these are the short- and long-term margins. Think of the bins like this:

-60ms  -40ms  -20ms  0ms  +20ms  +40ms  +60ms

When a packet arrives, the bin matching its arrival_margin is increased, so if this packet was 40ms after the current timestamp, the +40ms bin would be increased. If this packet arrived 60ms too late (and hence is useless), the -60ms bin would increase.

early_ratio_XX is the sum of all the positive bins; late_ratio_XX is the sum of all the negative bins. The difference between _long and _short is just how fast they change.

If a packet's timestamp falls outside the bins, it is not used for the calculation.
Now, clearly, if early_ratio is high and late_ratio is very low, the buffer is buffering more than it needs to; it will skip a frame to reduce latency. Alternately, if late_ratio is even marginally above 0, more buffering is needed, and it duplicates a frame. This decision is made at decode time.

Depending on your chosen transmission method, during network hiccups you'll either have lost packets or they'll come in a burst when the network conditions restore themselves. In either case, after missing 20 packets or so the jitter buffer will prepare to "reset", and its new current timestamp will be the timestamp of whatever packet arrives next. It will also hold decoding until at least buffer_size frames have arrived.

Since it sounds like you're using reliable transmission (packets are not lost), what will happen is that a whole stream of packets suddenly arrives, filling the buffer much faster than it's emptied. In fact, you're likely to fill it so fast that the buffer runs out of room, meaning the first few packets get dropped to make room for the later ones. However, as the current timestamp was set to the first arriving packet, the decoder won't find the packet it's looking for, meaning the jitter buffer will soon reset again.

So no, it doesn't "catch up"; it tries to keep latency to an absolute minimum whatever the circumstances, so most of the late frames will be dropped.

To achieve the effect you're describing, you'd need to increase SPEEX_JITTER_MAX_BUFFER_SIZE to the longest delay you're expecting, and then inside the block at line 231 (which says)

if (late_ratio_short + ontime_ratio_short < .005 &&
    late_ratio_long + ontime_ratio_long < .01 &&
    early_ratio_short > .8)

add something that multiplies all the margins by 0.75 or so at the end. This will force the jitter buffer to skip only one frame at a time and wait a bit before it skips the next one.
Thank you for a very good explanation, which shed light on some of the questions I had after reading the source code. Reading your text, however, I wonder if I'm perhaps missing an important point about the proper use of the jitter buffer:

> Now, clearly, if early_ratio is high and late_ratio is very low, the
> buffer is buffering more than it needs to; it will skip a frame to
> reduce latency.

Question: Do I understand correctly that I should not put every incoming packet through the jitter buffer? The way my code works today is:

1) Read a packet from the socket.
2) Call speex_jitter_put(...) with the just-arrived packet.
3) Read one packet from the jitter buffer using speex_jitter_get(...).
4) Feed the packet just read from the jitter buffer to the sound card for playback.

This in fact feeds one 20 msec batch of sound to the sound card for every packet received from the Speex encoder at the other end. I know I may sound a bit slow on the pickup here, but at the risk of sounding very beginner-like (which I'll gladly admit I am), I wonder if this is totally wrong?

Question: Should the jitter buffer have no packet to return (the data is simply missing), should I bother feeding a 20 msec packet of silence (comfort noise, perhaps?) to the speaker? Or should the jitter buffer perhaps hint to me (with a return value?) that no packet was available and there is no need to feed anything to the sound card?

In my current implementation, running on a Windows XP box, I have a growing number of outstanding packets queued to the soundcard. I believe this is happening because, when packets are delayed (in my test case I have no packet loss, just delays), the jitter buffer interpolates and returns a packet to play. When the delayed packets finally arrive, they too are queued to the soundcard, resulting in an increasing, non-recoverable delay in the speech coming out of the sound card.

Your feedback is greatly appreciated.
I thank you for taking the time to respond with any relevant details or hints.

Respectfully,

Baldvin

> -----Original Message-----
> From: speex-dev-bounces@xiph.org [mailto:speex-dev-bounces@xiph.org] On Behalf Of Thorvald Natvig
> Sent: 18. september 2005 16:25
> To: speex-dev@xiph.org
> Subject: Re: [Speex-dev] How does the jitter buffer "catch up"?
>
> [...]
> FYI: The below is just my interpretation of the code, I might be wrong.

Most of it is right. Actually, would you mind if I use part of your email for documenting the jitter buffer in the manual?

> Each time a new packet arrives, the jitter buffer calculates how far
> ahead or behind the "current" timestamp it is; this is called
> arrival_margin. The "current" timestamp is simply the last frame
> successfully decoded.

Minor detail: it's the last frame played (whether it was successfully decoded or not).

> It maintains a list of bins for margins, this is short and longterm
> margin. Think of the bins like this:
> -60ms -40ms -20ms 0ms +20ms +40ms +60ms
> When a packet arrives, the bin matching its arrival_margin is
> increased, so if this packet was 40ms after the current timestamp, the
> 40ms bin would be increased. If this packet arrived 60ms too late (and
> hence is useless), the -60ms bin would increase.

Right.

> early_ratio_XX is the sum of all the positive bins.
> late_ratio_XX is the sum of all the negative bins.

Right. And only the packets that are "just in time" don't get counted in either ratio.

> The difference between _long and _short is just how fast they change.
>
> If a packet has a timestamp outside the bins, it's not used for
> calculation.
>
> Now, clearly, if early_ratio is high and late_ratio is very low, the
> buffer is buffering more than it needs to; it will skip a frame to
> reduce latency. Alternately, if late_ratio is even marginally above 0,
> more buffering is needed, and it duplicates a frame. This decision is
> done when decoding.

Right.

> Depending on your chosen transmission method, during network hiccups
> you'll either have lost packets or they'll come in a burst when the
> network conditions restore themselves. In either case, after missing
> 20 packets or so the jitter buffer will prepare to "reset", and its
> new current timestamp will be the timestamp on whatever packet
> arrives. It will also hold decoding until at least buffer_size frames
> have arrived.

Right, except it will only actually reset when receiving the first new packet.

> Since it sounds like you're using reliable transmission (packets are
> not lost), what will happen is that there's a whole stream of packets
> suddenly arriving, and they'll fill up the buffer much faster than
> it's emptied. In fact, you're likely to fill it so fast the buffer
> runs out of room, meaning the first few packets get dropped to make
> room for the later ones. However, as the current timestamp was set to
> the first arriving packet, the decoder won't find the packet it's
> looking for, meaning the jitter buffer will soon reset again.

I'm not sure what will happen here. Normally, you'd want to make the buffer larger than what you expect to have in it. In that case, the jitter buffer would likely drop frames until it catches up.

> So no, it doesn't "catch up", it tries to keep latency to an absolute
> minimum whatever the circumstances, so most of the late frames will be
> dropped.

Yes. Actually, the best way to handle that would be to (eventually) change the code to drop frames in silence or low-energy periods.

> To achieve the effect you're describing, you'd need to increase
> SPEEX_JITTER_MAX_BUFFER_SIZE to the longest delay you're expecting,
> and then inside the block on line 231 (which says)
> if (late_ratio_short + ontime_ratio_short < .005 && late_ratio_long +
> ontime_ratio_long < .01 && early_ratio_short > .8)
> add something that multiplies all the margins by 0.75 or so at the
> end. This will force the jitter buffer to only skip 1 frame at a time
> and wait a bit before it skips the next one.

I don't think that's necessary, since there's already some code that shifts the histogram whenever I skip or interpolate a packet. This means that if the packets are, on average, 20 ms in advance when we drop a frame, they will all be considered "on time" (0 ms) after that.
Jean-Marc

--
Jean-Marc Valin <Jean-Marc.Valin@USherbrooke.ca>
Université de Sherbrooke