> We just return a frame with the return value JB_DROP, which tells the
> caller to drop this frame, and call jb_get again.
>
> When the caller is done with the jitterbuffer, it calls jb_getall()
> repeatedly, until it's empty, and then it can discard all the frames.

Hmm, looks a bit error-prone to me. Especially considering I still have
to explain that "no, you can't pass ulaw instead of float to
speex_encode and expect 4x better compression" :-)
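For readers following along, the drop-and-drain discipline being
discussed would look roughly like this. A minimal sketch only: the
jitterbuf/jb_frame types and the exact jb_get()/jb_getall() signatures
here are assumptions based on the names used in this thread, not the
actual asterisk/iaxclient headers.

    #include <stdlib.h>

    typedef struct jitterbuf jitterbuf;               /* opaque state */
    typedef struct { void *data; long ts; long ms; } jb_frame;

    enum { JB_OK, JB_DROP, JB_EMPTY };                /* assumed codes */

    extern int jb_get(jitterbuf *jb, jb_frame *out, long now, long interp);
    extern int jb_getall(jitterbuf *jb, jb_frame *out);

    /* One playout poll: JB_DROP means "this frame is too late to be
     * useful; free it yourself and ask again". The buffer itself never
     * frees anything. The 20 is an assumed frame length in ms. */
    void poll_once(jitterbuf *jb, long now)
    {
        jb_frame f;
        int ret;

        while ((ret = jb_get(jb, &f, now, 20)) == JB_DROP)
            free(f.data);                             /* caller owns it */

        if (ret == JB_OK) {
            /* decode and play f.data, then free(f.data) */
        }
    }

    /* Teardown: drain the buffer so no malloc'd frame leaks. */
    void shutdown_jb(jitterbuf *jb)
    {
        jb_frame f;
        while (jb_getall(jb, &f) == JB_OK)
            free(f.data);
    }

This is the loop Jean-Marc is objecting to: every caller must remember
both the JB_DROP retry and the final drain, or it leaks.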
> Perhaps it's not expensive, but it's unnecessary. It also means
> that the jitterbuffer's pointers can point to structures, or other
> types of data, and the jitterbuffer doesn't need to understand them.
> In particular, it's designed to be able to buffer and reorder frames
> (things) which aren't audio -- like video and control frames.

How are video frames or control frames different? My jitter buffer only
takes raw packets (i.e. N bytes); it doesn't care about the content or
meaning. Also, why would you want to give it structs? AFAIK, IP packets
can only contain bytes anyway.

>> A couple things I don't understand. Why do you need both the local
>> clock and the remote clock, and how do you define those anyway?
>
> There's the "timestamp", which the remote side puts on frames, and
> the local time, which is used for jb_put and jb_get. They're defined
> in milliseconds.

I think this is equivalent to what I'm doing with jitter_buffer_tick(),
except that my approach doesn't require explicit knowledge of the local
time (I don't care what the local time is, just how it's incremented).
Also, I think milliseconds (which I used before) are a bad choice
because 1) RTP uses the sampling clock and 2) in many cases the
duration isn't an integer number of milliseconds. I've even used my
jitter_buffer to send raw PCM packets of 1/3 ms each (16 samples at
48 kHz).

>> BTW, does your jitter buffer consider that there can be overlaps or
>> holes between frames, especially if using PCM?
>
> There shouldn't be overlaps, but I think it currently deals with some
> types of messy timestamps that are, for example, +- one frame length
> from precise (i.e. sequences of 20 ms frames with timestamps like
> 0, 18, 42, 55, 82, instead of 0, 20, 40, 60, 80).

Why no overlap? What if you want to include a bit of redundancy
(doesn't have to be 100% either) to make your app more robust to packet
loss? You could want to send a packet that covers 0-60 ms, followed by
40-100 ms, followed by 80-140 ms, ...

> For reference, the API I've shown is already used in asterisk and
> iaxclient. There are two other jitterbuffers that I know of which
> are using basically the same API: a non-adaptive jitterbuffer, and
> another one, by Jesse Kaijen, which more closely follows the research
> models I was looking at when I wrote mine. The former is in one of
> asterisk's SVN branches and the bugtracker, and I think that Jesse's
> is at speakup.nl.

Well, that API clearly has limitations that mean I can't use it to do
what I need. Unless you're willing to change that (and even then I'm
not sure), there's no way we can use the same API. I still suspect it
may be possible to wrap my current API in that API. Of course, some
features would just not be available.

> My jitterbuffer is by no means perfect -- mainly it seems to still
> have situations where there's garbage input which confuses it in a
> way that I'd hope it could recover from (i.e. nonsensical timestamps
> that it gets sent).

I have yet to see any situation that confuses my jitter buffer to that
point. For example, if you feed it two streams with different timestamp
sequences, it will just follow one of them. The only case where it can
get a little confused is if there's a sudden jump in the timestamp
sequence, but even then it will automatically re-sync after a second or
two.

	Jean-Marc
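The tick-driven usage Jean-Marc describes above would look something
like the sketch below. The signatures are hypothetical, modeled on his
description (timestamps counted in samples rather than milliseconds);
they are not a published speex_jitter.h API.

    typedef struct JitterBuffer JitterBuffer;         /* opaque state */

    /* Timestamps are in the codec's own clock (here 48 kHz samples);
     * the buffer never sees wall-clock time. */
    extern void jitter_buffer_put(JitterBuffer *jb, const char *packet,
                                  int len, int timestamp);
    extern int  jitter_buffer_get(JitterBuffer *jb, char *packet,
                                  int *len, int *timestamp);
    extern void jitter_buffer_tick(JitterBuffer *jb);

    #define FRAME_SAMPLES 16    /* 1/3 ms at 48 kHz, as in the example */

    /* Called once per playout interval by the audio output path. */
    void playout_interval(JitterBuffer *jb)
    {
        char packet[1024];
        int len = sizeof(packet);
        int ts;

        if (jitter_buffer_get(jb, packet, &len, &ts) == 0) {
            /* decode and play len bytes starting at sample ts */
        } else {
            /* nothing usable: conceal (interpolate) one frame */
        }

        /* The only clock the buffer ever gets: "one more interval
         * has elapsed". */
        jitter_buffer_tick(jb);
    }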
On May 3, 2006, at 9:12 PM, Jean-Marc Valin wrote:

>> We just return a frame with the return value JB_DROP, which tells
>> the caller to drop this frame, and call jb_get again.
>>
>> When the caller is done with the jitterbuffer, it calls jb_getall()
>> repeatedly, until it's empty, and then it can discard all the
>> frames.
>
> Hmm, looks a bit error-prone to me. Especially considering I still
> have to explain that "no, you can't pass ulaw instead of float to
> speex_encode and expect 4x better compression" :-)

Perhaps, but then you need to assume that the jitterbuffer can just
throw away the data, and that limits how you can use it. In
object-oriented terms, you might want to pass objects to the JB, and
then call a destructor on them. In C terms, you may want to allocate
frames via malloc(), and then call free() on them later. You might want
to pass in reference-counted objects of some sort, etc.
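Concretely, the ownership model Steve has in mind might look like the
following; the struct and names are purely illustrative, not taken from
either implementation.

    #include <stdlib.h>

    /* A frame as the application might hand it to a pointer-based JB:
     * the buffer reorders it but never looks inside or frees it. */
    struct media_frame {
        void *payload;                  /* malloc'd bytes, a struct, ... */
        void (*destroy)(void *payload); /* free(), refcount drop, etc.   */
        long  ts;                       /* remote timestamp              */
    };

    /* When jb_get() reports JB_DROP, the caller -- the sole owner --
     * runs whatever cleanup the payload requires. */
    static void frame_drop(struct media_frame *f)
    {
        if (f->destroy)
            f->destroy(f->payload);
        free(f);
    }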
>> Perhaps it's not expensive, but it's unnecessary. It also means
>> that the jitterbuffer's pointers can point to structures, or other
>> types of data, and the jitterbuffer doesn't need to understand them.
>> In particular, it's designed to be able to buffer and reorder frames
>> (things) which aren't audio -- like video and control frames.
>
> How are video frames or control frames different? My jitter buffer
> only takes raw packets (i.e. N bytes); it doesn't care about the
> content or meaning.

Mainly they're different because you don't ever want the jitterbuffer
to throw them away -- you always want to deliver them. They probably
have zero duration (are impulses), and will overlap in timestamps with
the audio frames. You may not want to consider them in your jitter
calculations.

> Also, why would you want to give it structs? AFAIK, IP packets can
> only contain bytes anyway.

Of course. But, in the way I've used the JB, and I would imagine in
most cases, the application which uses it is going to be parsing the
network stuff before putting it into a JB, and would put it into a
structure or object. Clearly, everything is just bytes, and you could
do something similar with your JB API by passing in pointers and
len==4, _if_ your jitterbuffer didn't have the ability to just drop
frames internally.

>>> A couple things I don't understand. Why do you need both the local
>>> clock and the remote clock, and how do you define those anyway?
>>
>> There's the "timestamp", which the remote side puts on frames, and
>> the local time, which is used for jb_put and jb_get. They're defined
>> in milliseconds.
>
> I think this is equivalent to what I'm doing with
> jitter_buffer_tick(), except that my approach doesn't require
> explicit knowledge of the local time (I don't care what the local
> time is, just how it's incremented). Also, I think milliseconds
> (which I used before) are a bad choice because 1) RTP uses the
> sampling clock and 2) in many cases the duration isn't an integer
> number of milliseconds. I've even used my jitter_buffer to send raw
> PCM packets of 1/3 ms each (16 samples at 48 kHz).

The time in my implementation doesn't need to be wall time, nor do
timestamps; they're all relative to each other, and to the beginning of
the "session". I think everything would work OK +- some constants if
the scale were different.

>>> BTW, does your jitter buffer consider that there can be overlaps
>>> or holes between frames, especially if using PCM?
>>
>> There shouldn't be overlaps, but I think it currently deals with
>> some types of messy timestamps that are, for example, +- one frame
>> length from precise (i.e. sequences of 20 ms frames with timestamps
>> like 0, 18, 42, 55, 82, instead of 0, 20, 40, 60, 80).
>
> Why no overlap? What if you want to include a bit of redundancy
> (doesn't have to be 100% either) to make your app more robust to
> packet loss? You could want to send a packet that covers 0-60 ms,
> followed by 40-100 ms, followed by 80-140 ms, ...

I see now. I hadn't considered this, but it could also be expressed as
a sequence of 20 ms frames, some of which are dups, and some of which
have identical arrival times. I'm not sure how my implementation would
handle this, but I don't think it breaks the API.

>> For reference, the API I've shown is already used in asterisk and
>> iaxclient. There are two other jitterbuffers that I know of which
>> are using basically the same API: a non-adaptive jitterbuffer, and
>> another one, by Jesse Kaijen, which more closely follows the
>> research models I was looking at when I wrote mine. The former is
>> in one of asterisk's SVN branches and the bugtracker, and I think
>> that Jesse's is at speakup.nl.
>
> Well, that API clearly has limitations that mean I can't use it to do
> what I need. Unless you're willing to change that (and even then I'm
> not sure), there's no way we can use the same API. I still suspect it
> may be possible to wrap my current API in that API. Of course, some
> features would just not be available.

I think it would, except that your API lets the jb destroy data on its
own, which would be bad, for example, if the data was a control frame,
or in every case, because frames are usually malloced.

>> My jitterbuffer is by no means perfect -- mainly it seems to still
>> have situations where there's garbage input which confuses it in a
>> way that I'd hope it could recover from (i.e. nonsensical timestamps
>> that it gets sent).
>
> I have yet to see any situation that confuses my jitter buffer to
> that point. For example, if you feed it two streams with different
> timestamp sequences, it will just follow one of them. The only case
> where it can get a little confused is if there's a sudden jump in the
> timestamp sequence, but even then it will automatically re-sync after
> a second or two.

Yours may indeed be better than mine, but before you say it won't get
confused, let's see what happens if it gets into asterisk and a lot of
real-world broken streams get thrown at it :)

What really would help in the long run is if we had some kind of test
harness to run these things in, and good test data culled from
real-world situations. I had some hacky tools like this I used when I
built my implementation, but nothing really good.

	-SteveK
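The test harness Steve asks for could start as nothing more than a
replayable arrival trace. A hypothetical sketch, with the
implementation under test hidden behind a callback so the same capture
can drive either API:

    #include <stdio.h>

    /* Capture side: one line per received packet -- local arrival
     * time, the sender's timestamp, and the payload length. */
    void record_packet(FILE *log, long arrival, long timestamp, int len)
    {
        fprintf(log, "%ld %ld %d\n", arrival, timestamp, len);
    }

    /* Replay side: feed a captured trace to any implementation; the
     * put callback adapts the trace to that implementation's API. */
    void replay_trace(FILE *log, void *jb,
                      void (*put)(void *jb, long arrival, long ts, int len))
    {
        long arrival, ts;
        int len;

        while (fscanf(log, "%ld %ld %d", &arrival, &ts, &len) == 3)
            put(jb, arrival, ts, len);
    }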
> Perhaps, but then you need to assume that the jitterbuffer can just
> throw away the data, and that limits how you can use it. In
> object-oriented terms, you might want to pass objects to the JB, and
> then call a destructor on them. In C terms, you may want to allocate
> frames via malloc(), and then call free() on them later. You might
> want to pass in reference-counted objects of some sort, etc.

Are we talking about the same thing here? I'm talking about IP packets,
more specifically datagrams. These contain a bunch of bytes and a
length. With RTP you get a timestamp as well. That's all. No
object-oriented stuff until you decode them (which happens after they
leave the jitter buffer).

> Mainly they're different because you don't ever want the jitterbuffer
> to throw them away -- you always want to deliver them. They probably
> have zero duration (are impulses), and will overlap in timestamps
> with the audio frames. You may not want to consider them in your
> jitter calculations.

Depending on how the control stuff works, you probably don't *need* a
jitter buffer in the first place. At best you'll want to reorder the
data, no?

>> Also, why would you want to give it structs? AFAIK, IP packets can
>> only contain bytes anyway.
>
> Of course. But, in the way I've used the JB, and I would imagine in
> most cases, the application which uses it is going to be parsing the
> network stuff before putting it into a JB, and would put it into a
> structure or object. Clearly, everything is just bytes, and you could
> do something similar with your JB API by passing in pointers and
> len==4, _if_ your jitterbuffer didn't have the ability to just drop
> frames internally.

Why would you parse and do work *before* putting it in the jitter
buffer, especially not even knowing whether you'll actually use it?

> The time in my implementation doesn't need to be wall time, nor do
> timestamps; they're all relative to each other, and to the beginning
> of the "session". I think everything would work OK +- some constants
> if the scale were different.

But why do you need that time in the first place?

>> Why no overlap? What if you want to include a bit of redundancy
>> (doesn't have to be 100% either) to make your app more robust to
>> packet loss? You could want to send a packet that covers 0-60 ms,
>> followed by 40-100 ms, followed by 80-140 ms, ...
>
> I see now. I hadn't considered this, but it could also be expressed
> as a sequence of 20 ms frames, some of which are dups, and some of
> which have identical arrival times. I'm not sure how my
> implementation would handle this, but I don't think it breaks the
> API.

What do you mean, expressed as a sequence? You mean you'd break the
frames down before sending them? Sounds complicated and even
technically impossible for the general case (what if the frames *can't*
be broken down for a particular codec?).
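On the sending side, the overlapping-redundancy scheme Jean-Marc keeps
returning to would look like this. put_span() is a hypothetical call
carrying an explicit duration; times are in ms here purely for
readability.

    /* Hypothetical put that records both a start time and a span, so
     * packets are allowed to overlap. */
    extern void put_span(void *jb, const void *data, int len,
                         int start_ms, int span_ms);

    void send_with_overlap(void *jb, const void *p0, const void *p1,
                           const void *p2, int len)
    {
        put_span(jb, p0, len,  0, 60);  /* covers  0-60  ms           */
        put_span(jb, p1, len, 40, 60);  /* covers 40-100 ms (overlap) */
        put_span(jb, p2, len, 80, 60);  /* covers 80-140 ms           */
        /* Losing p1 leaves only 60-80 ms uncovered instead of a full
         * 60 ms hole -- 50% extra data, not 100% duplication. */
    }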
>> Well, that API clearly has limitations that mean I can't use it to
>> do what I need. Unless you're willing to change that (and even then
>> I'm not sure), there's no way we can use the same API. I still
>> suspect it may be possible to wrap my current API in that API. Of
>> course, some features would just not be available.
>
> I think it would, except that your API lets the jb destroy data on
> its own, which would be bad, for example, if the data was a control
> frame, or in every case, because frames are usually malloced.

What's the problem with the jitter buffer destroying control frames? If
you need to send them reliably, don't use UDP in the first place and
don't use a jitter buffer.

> Yours may indeed be better than mine, but before you say it won't get
> confused, let's see what happens if it gets into asterisk and a lot
> of real-world broken streams get thrown at it :)

Of course, I'm always interested in more testing. However, I've already
(voluntarily and especially involuntarily) abused it with nonsensical
data, and I have yet to see it fail (i.e. go into an irrecoverable
state).

> What really would help in the long run is if we had some kind of test
> harness to run these things in, and good test data culled from
> real-world situations. I had some hacky tools like this I used when I
> built my implementation, but nothing really good.

Sure. I guess it comes down to collecting (recording timestamps and
all) data from real applications in real scenarios. Then you figure out
the "right" behaviour and compare with what you obtain.

	Jean-Marc