thr3ads.net - Speex dev - [Speex-dev] How does the jitter buffer "catch up"? [Sep 2005]

Jean-Marc Valin

2005-Sep-18 16:21 UTC

[Speex-dev] How does the jitter buffer "catch up"?

> FYI: The below is just my interpretation of the code, I might be wrong.
Most of it is right. Actually, would you mind if I use part of your
email for documenting the jitter buffer in the manual?
> Each time a new packet arrives, the jitter buffer calculates how far ahead 
> or behind the "current" timestamp it is; this is called arrival_margin.
> The "current" timestamp is simply the last frame successfully decoded.
Minor detail, it's the last played (whether it was successfully decoded
or not).
> It maintains a list of bins for margins, in a short-term and a long-term 
> version.
> Think of the bins like this:
> -60ms -40ms -20ms 0ms +20ms +40ms +60ms
> When a packet arrives, the bin matching its arrival_margin is 
> increased, so if this packet was 40ms after the current timestamp, the 
> +40ms bin would be increased. If this packet arrived 60ms too late (and 
> hence is useless), the -60ms bin would increase.
Right.
> early_ratio_XX is the sum of all the positive bins.
> late_ratio_XX is the sum of all the negative bins.
Right. And only the packets that are "just in time" don't get counted in
any ratio.
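
A minimal sketch of the bins and ratios described above, not the actual
Speex source. The names (MarginHistogram, histogram_add, the decay
parameter) are hypothetical, and the update rule (age every bin, then add
to the matching one) is an assumption chosen so that the total never
exceeds 1.0. Margins are assumed to be whole multiples of the frame size.

```c
#include <assert.h>

#define FRAME_MS   20                 /* spacing between bins */
#define NUM_BINS   7                  /* -60 ms .. +60 ms, as in the example */
#define CENTER_BIN (NUM_BINS / 2)     /* the 0 ms ("just in time") bin */

typedef struct {
    float bins[NUM_BINS];
} MarginHistogram;

/* Record one arriving packet. margin_ms > 0 is early, < 0 is late.
 * A decay close to 1 gives a slowly changing (long-term) histogram,
 * a smaller decay gives a faster (short-term) one.
 * Returns 1 if the packet was counted, 0 if it fell outside the bins. */
int histogram_add(MarginHistogram *h, int margin_ms, float decay)
{
    int i;
    int bin = margin_ms / FRAME_MS + CENTER_BIN;

    assert(decay > 0.0f && decay < 1.0f);
    if (bin < 0 || bin >= NUM_BINS)
        return 0;                     /* outside the bins: not used */

    for (i = 0; i < NUM_BINS; i++)    /* age the old statistics */
        h->bins[i] *= decay;
    h->bins[bin] += 1.0f - decay;     /* keeps the total at or below 1 */
    return 1;
}

/* Sum of all negative (late) bins. */
float late_ratio(const MarginHistogram *h)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < CENTER_BIN; i++)
        sum += h->bins[i];
    return sum;
}

/* The 0 ms bin on its own. */
float ontime_ratio(const MarginHistogram *h)
{
    return h->bins[CENTER_BIN];
}

/* Sum of all positive (early) bins. */
float early_ratio(const MarginHistogram *h)
{
    float sum = 0.0f;
    int i;
    for (i = CENTER_BIN + 1; i < NUM_BINS; i++)
        sum += h->bins[i];
    return sum;
}
```

Keeping two such histograms, fed with the same margins but different decay
values, would give the _short and _long variants.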
> The difference between _long and _short is just how fast they change.
> 
> If a packet's timestamp falls outside the bins, it's not used in the
> calculation.
> 
> Now, clearly, if early_ratio is high and late_ratio is very low, the 
> buffer is buffering more than it needs to; it will skip a frame to reduce 
> latency. Alternately, if late_ratio is even marginally above 0, more 
> buffering is needed, and it duplicates a frame. This decision is done when 
> decoding.
Right.
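
The decision just described could be sketched roughly as follows. The skip
test uses the thresholds from the source line quoted further down in this
message; everything else is an assumption for illustration: the type and
function names are hypothetical, late_tolerance stands in for "even
marginally above 0", and checking the late case first is a guess.

```c
#include <assert.h>

typedef enum { JB_PLAY_NORMAL, JB_SKIP_FRAME, JB_DUPLICATE_FRAME } JitterAction;

typedef struct {
    float early_short, ontime_short, late_short;
    float early_long,  ontime_long,  late_long;
} MarginRatios;

/* Decide, at decode time, what to do with the next frame. */
JitterAction jitter_decide(const MarginRatios *r, float late_tolerance)
{
    assert(late_tolerance >= 0.0f);

    /* Packets are arriving late: buffer more by repeating a frame. */
    if (r->late_short > late_tolerance || r->late_long > late_tolerance)
        return JB_DUPLICATE_FRAME;

    /* Nearly everything is early: buffering more than needed, so skip. */
    if (r->late_short + r->ontime_short < .005f &&
        r->late_long + r->ontime_long < .01f &&
        r->early_short > .8f)
        return JB_SKIP_FRAME;

    return JB_PLAY_NORMAL;
}
```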
> Depending on your chosen transmission method, during network hiccups 
> you'll either have lost packets or they'll come in a burst when the
> network conditions restore themselves. In either case, after missing 20 
> packets or so the jitter buffer will prepare to "reset", and its new
> current timestamp will be the timestamp on whatever packet arrives. It 
> will also hold decoding until at least buffer_size frames have arrived.
Right, except it will only actually reset when receiving the first new
packet.
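
A sketch of that reset behaviour under stated assumptions: the struct, the
function names and the exact threshold of 20 are hypothetical stand-ins
for "after missing 20 packets or so", not the real Speex API. The point it
illustrates is that the reset is only prepared when frames go missing and
only carried out when the next packet arrives.

```c
#include <assert.h>

#define RESET_AFTER_LOST 20   /* "after missing 20 packets or so" */

typedef struct {
    int current_ts;     /* timestamp of the last played frame */
    int lost_count;     /* frames found missing at playback time */
    int reset_pending;  /* set once too many frames were lost */
    int buffered;       /* packets received since the last reset */
    int buffer_size;    /* packets to collect before decoding resumes */
} JitterState;

/* Called when the decoder asked for a frame and it was not there. */
void on_frame_missing(JitterState *jb)
{
    if (++jb->lost_count >= RESET_AFTER_LOST)
        jb->reset_pending = 1;        /* prepare only, do not reset yet */
}

/* Called for every arriving packet. */
void on_packet_arrival(JitterState *jb, int packet_ts)
{
    if (jb->reset_pending) {
        jb->current_ts = packet_ts;   /* the actual reset happens here */
        jb->lost_count = 0;
        jb->buffered = 0;
        jb->reset_pending = 0;
    }
    jb->buffered++;
}

/* Decoding is held until buffer_size packets have arrived. */
int can_play(const JitterState *jb)
{
    assert(jb->buffer_size > 0);
    return !jb->reset_pending && jb->buffered >= jb->buffer_size;
}
```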
> Since it sounds like you're using reliable transmission (packets are not
> lost), what will happen is that there's a whole stream of packets suddenly
> arriving, and they'll fill up the buffer much much faster than it's
> emptied. In fact, you're likely to fill it so fast the buffer runs out of
> room, meaning the first few packets get dropped to make room for the 
> later ones. However, as the current timestamp was set to the first 
> arriving packet, the decoder won't find the packet it's looking for,
> meaning the jitter buffer will soon reset again.
I'm not sure here what will happen. Normally, you'd want to make the
buffer larger than what you expect to have in it. In that case, the
jitter buffer would likely drop frames until it catches up.
> So no, it doesn't "catch up", it tries to keep latency to an absolute
> minimum whatever the circumstances, so most of the late frames will be 
> dropped.
Yes. Actually, the best way to handle that would be to (eventually)
change the code to drop frames in silence or low-energy periods.
> To achieve the effect you're describing, you'd need to increase
> SPEEX_JITTER_MAX_BUFFER_SIZE to the longest delay you're expecting, and
> then inside the block on line 231 (which says)
>     if (late_ratio_short + ontime_ratio_short < .005 &&
>         late_ratio_long + ontime_ratio_long < .01 && early_ratio_short > .8)
> .. add something that multiplies all the margins by 0.75 or so at the 
> end. This will force the jitter buffer to only skip 1 frame at a time and 
> wait a bit before it skips the next one.
Don't think it's necessary since there's already some code that shifts
the histogram whenever I skip or interpolate a packet. This means that
if the packets are on average 20 ms in advance when we drop a frame,
then they will be considered all "on time" (0 ms) after that.
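
The histogram shift could look roughly like this. It is a sketch with
hypothetical function names, and folding the outermost bin into its
neighbour instead of discarding it is an assumption. Bins are ordered from
most late (index 0) to most early (index n - 1).

```c
#include <assert.h>

/* After skipping (dropping) a frame, every packet is one frame "less
 * early": move each bin's content one step toward the late side. */
void shift_after_skip(float *bins, int n)
{
    int i;
    assert(n >= 2);
    bins[0] += bins[1];           /* assumption: fold into the latest bin */
    for (i = 1; i < n - 1; i++)
        bins[i] = bins[i + 1];
    bins[n - 1] = 0.0f;
}

/* After interpolating (inserting) a frame, shift the other way. */
void shift_after_interpolate(float *bins, int n)
{
    int i;
    assert(n >= 2);
    bins[n - 1] += bins[n - 2];   /* assumption: fold into the earliest bin */
    for (i = n - 2; i > 0; i--)
        bins[i] = bins[i - 1];
    bins[0] = 0.0f;
}
```

With seven bins centred on 0 ms, weight sitting in the +20 ms bin lands in
the 0 ms bin after one call to shift_after_skip, which matches the "on
time after that" behaviour described above.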

	Jean-Marc

-- 
Jean-Marc Valin <Jean-Marc.Valin@USherbrooke.ca>
Université de Sherbrooke

Thorvald Natvig

2005-Sep-18 18:44 UTC

>> FYI: The below is just my interpretation of the code, I might be wrong.
>
> Most of it is right. Actually, would you mind if I use part of your
> email for documenting the jitter buffer in the manual?
It would be my pleasure :)
>> early_ratio_XX is the sum of all the positive bins.
>> late_ratio_XX is the sum of all the negative bins.
>
> Right. And only the packets that are "just in time" don't get counted in
> any ratio.
Well.. they're counted in the ontime_ratio_long and _short, right?

One thing that might be worth mentioning: the sum of all the margins will 
never be higher than 1.0, so a test for early_ratio_short > 0.7 means 
(roughly) that 70% or more of the packets in the last short-term time 
period were early.
>> Depending on your chosen transmission method, during network hiccups
>> you'll either have lost packets or they'll come in a burst when the
>> network conditions restore themselves. In either case, after missing 20
>> packets or so the jitter buffer will prepare to "reset", and its new
>> current timestamp will be the timestamp on whatever packet arrives. It
>> will also hold decoding until at least buffer_size frames have arrived.
>
> Right, except it will only actually reset when receiving the first new
> packet.
That's what I meant with "will be the timestamp on whatever packet 
arrives". Could have been clearer though, I totally agree.
>> Since it sounds like you're using reliable transmission (packets are not
>> lost), what will happen is that there's a whole stream of packets suddenly
>> arriving, and they'll fill up the buffer much much faster than it's
>> emptied. In fact, you're likely to fill it so fast the buffer runs out of
>> room, meaning the first few packets get dropped to make room for the
>> later ones. However, as the current timestamp was set to the first
>> arriving packet, the decoder won't find the packet it's looking for,
>> meaning the jitter buffer will soon reset again.
>
> I'm not sure here what will happen. Normally, you'd want to make the
> buffer larger than what you expect to have in it. In that case, the
> jitter buffer would likely drop frames until it catches up.
There's a problem with increasing the buffer size, btw: you need to change 
the header, which means you need to recompile both speex and your 
application. So changing the maximum number of buffered packets means you 
can't share libspeex.dll/.so with other applications.
>> To achieve the effect you're describing, you'd need to increase
>> SPEEX_JITTER_MAX_BUFFER_SIZE to the longest delay you're expecting, and
>> then inside the block on line 231 (which says)
>>     if (late_ratio_short + ontime_ratio_short < .005 &&
>>         late_ratio_long + ontime_ratio_long < .01 && early_ratio_short > .8)
>> .. add something that multiplies all the margins by 0.75 or so at the
>> end. This will force the jitter buffer to only skip 1 frame at a time and
>> wait a bit before it skips the next one.
>
> Don't think it's necessary since there's already some code that shifts
> the histogram whenever I skip or interpolate a packet. This means that
> if the packets are on average 20 ms in advance when we drop a frame,
> then they will be considered all "on time" (0 ms) after that.
Yes, but assume that after a long steady period, your network latency 
suddenly drops by 100ms. (100ms is excessive, but I see 60ms quite 
frequently from users on DSL/Cable connections who also do a bit of P2P 
on the same line)

What happens now is that the +100ms bin starts increasing steadily,
and suddenly it's enough to skip a frame.

A frame is skipped, and the histogram gets shifted.

On the next call to _get(), it's now the +80ms bin that has that high 
value, and the ratio is still more than high enough to skip a frame.

A frame is skipped, and the histogram gets shifted.

Repeat for +60, +40 and +20. In short, over the time it takes to decode 5 
frames, we're also skipping 5 frames, which means you have 100ms of audio 
that sounds weird.

It works well for me though, I prefer that sudden network jumps result in 
an audible "jump" in dialogue rather than users not being sure that 
latency is at an absolute minimum.

Come to think of it, it might actually be better if it just skipped 5 
frames at once. Might be doable by shifting the histogram, and if it still 
meets the criteria, keep skipping and shifting it until it doesn't meet 
the criteria anymore. More work though, and less clear code.
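
A sketch of that "keep skipping and shifting" idea, under stated
assumptions: it looks at a single (short-term) histogram only, uses 11
hypothetical bins from -100 ms to +100 ms, borrows the .005 and .8
thresholds from the condition quoted earlier, and all function names are
made up for illustration.

```c
#include <assert.h>

#define NUM_BINS   11                 /* -100 ms .. +100 ms, 20 ms apart */
#define CENTER_BIN (NUM_BINS / 2)     /* the 0 ms bin */

/* Late plus on-time share: bins at or below 0 ms. */
float late_ontime_sum(const float *bins)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i <= CENTER_BIN; i++)
        sum += bins[i];
    return sum;
}

/* Early share: bins above 0 ms. */
float early_sum(const float *bins)
{
    float sum = 0.0f;
    int i;
    for (i = CENTER_BIN + 1; i < NUM_BINS; i++)
        sum += bins[i];
    return sum;
}

/* Move every bin one step toward the late side after a skipped frame. */
void shift_after_skip(float *bins)
{
    int i;
    bins[0] += bins[1];
    for (i = 1; i < NUM_BINS - 1; i++)
        bins[i] = bins[i + 1];
    bins[NUM_BINS - 1] = 0.0f;
}

/* Shift the histogram until the skip criteria no longer hold and
 * return how many frames should be skipped in one go. */
int frames_to_skip(float *bins, int max_skips)
{
    int skipped = 0;
    assert(max_skips >= 0);
    while (skipped < max_skips &&
           late_ontime_sum(bins) < .005f &&
           early_sum(bins) > .8f) {
        shift_after_skip(bins);
        skipped++;
    }
    return skipped;
}
```

With 90% of the weight in the +100 ms bin, frames_to_skip(bins, 10)
returns 5, the same five frames as in the walk-through above but decided
in a single call. Passing max_skips = 1 gives the one-frame-per-call
behaviour instead.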

Jean-Marc Valin

2005-Sep-18 19:17 UTC

> > Most of it is right. Actually, would you mind if I use part of your
> > email for documenting the jitter buffer in the manual?
> 
> It would be my pleasure :)
Thanks. Whenever I have some time to update the manual I'll put that in.
> >> early_ratio_XX is the sum of all the positive bins.
> >> late_ratio_XX is the sum of all the negative bins.
> >
> > Right. And only the packets that are "just in time" don't get counted in
> > any ratio.
> 
> Well.. they're counted in the ontime_ratio_long and _short, right?
Right. It's there so I know how many late packets I'll have if I drop a
frame.
> One thing that might be worth mentioning: the sum of all the margins will 
> never be higher than 1.0, so a test for early_ratio_short > 0.7 means 
> (roughly) that 70% or more of the packets in the last short-term time 
> period were early.
Note that the sum can be <1 if the buffer had a reset recently.
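
A short worked example of why that is, assuming (this update rule is an
illustration, not quoted from the source in this thread) that each counted
packet multiplies every bin by a decay factor a and then adds 1 - a to the
matching bin, with all bins at zero right after a reset. Writing S_n for
the sum of all bins after n packets:

```latex
S_{n+1} = a\,S_n + (1 - a), \qquad S_0 = 0
\quad\Longrightarrow\quad
S_n = 1 - a^{n} \le 1 .
```

With a hypothetical a = 0.98, the sum after 50 packets (one second of 20 ms
frames) is about 0.64, and it only approaches 1 as more packets arrive.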
> > I'm not sure here what will happen. Normally, you'd want to make the
> > buffer larger than what you expect to have in it. In that case, the
> > jitter buffer would likely drop frames until it catches up.
> 
> There's a problem with increasing the buffer size, btw: you need to change
> the header, which means you need to recompile both speex and your 
> application. So changing the maximum number of buffered packets means you 
> can't share libspeex.dll/.so with other applications.
I agree, which is why making the buffer dynamic is on the TODO list.
> Yes, but assume that after a long steady period, your network latency 
> suddenly drops by 100ms. (100ms is excessive, but I see 60ms quite 
> frequently from users on DSL/Cable connections who also do a bit of P2P 
> on the same line)
> What happens now is that the +100ms bin starts increasing steadily,
> and suddenly it's enough to skip a frame.
> A frame is skipped, and the histogram gets shifted.
> On the next call to _get(), it's now the +80ms bin that has that high 
> value, and the ratio is still more than high enough to skip a frame.
> A frame is skipped, and the histogram gets shifted.
> Repeat for +60, +40 and +20. In short, over the time it takes to decode 5 
> frames, we're also skipping 5 frames, which means you have 100ms of audio 
> that sounds weird.
Yes. And the fix would simply be to wait for silence periods (e.g.
between words) before dropping frames. It's also on the TODO list.
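
One way dropping during silence might be sketched, purely as an
illustration of the TODO item: the functions below are hypothetical, the
energy measure (mean squared sample value of 16-bit PCM) is an assumption,
and the threshold would need tuning against real speech.

```c
#include <assert.h>

/* Mean energy (average squared sample value) of one decoded frame. */
double frame_energy(const short *pcm, int n)
{
    double sum = 0.0;
    int i;
    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += (double)pcm[i] * (double)pcm[i];
    return sum / n;
}

/* A frame counts as quiet when its energy is below the threshold. */
int frame_is_quiet(const short *pcm, int n, double threshold)
{
    return frame_energy(pcm, n) < threshold;
}

/* Drop only when the histogram asks for a skip AND the frame is quiet,
 * so the missing 20 ms falls between words rather than inside one. */
int may_drop_frame(int skip_wanted, const short *pcm, int n, double threshold)
{
    return skip_wanted && frame_is_quiet(pcm, n, threshold);
}
```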
> Come to think of it, it might actually be better if it just skipped 5 
> frames at once. Might be doable by shifting the histogram, and if it still 
> meets the criteria, keep skipping and shifting it until it doesn't meet
> the criteria anymore. More work though, and less clear code.
I can probably do that after the drop during silence.

	Jean-Marc

-- 
Jean-Marc Valin <Jean-Marc.Valin@USherbrooke.ca>
Université de Sherbrooke
