> Perhaps, but then you need to assume that the jitterbuffer can just
> throw away the data, and that limits how you can use it. In object-
> oriented terms, you might want to pass objects to the JB, and then
> call a destructor on them. In C terms, you may want to allocate
> frames via malloc(), and then call free() on them later. You might
> want to pass in reference-counted objects of some sort, etc.

Are we talking about the same thing here? I'm talking about IP packets,
more specifically datagrams. These contain a bunch of bytes and a
length. With RTP you get a timestamp as well. That's all. No
object-oriented stuff until you decode them (which is done after they
leave the jitter buffer).

> Mainly they're different because you don't ever want the jitterbuffer
> to throw them away -- you always want to deliver them. They probably
> have zero duration (are impulses), and will overlap in timestamps
> with the audio frames. You may not want to consider them in your
> jitter calculations.

Depending on how the control stuff works, you probably don't *need* a
jitter buffer in the first place. At best you'll want to reorder the
data, no?

> > Also, why would you want to give it structs? AFAIK, IP packets
> > can only contain bytes anyway.
>
> Of course. But, in the way I've used the JB, and I would imagine in
> most cases, the application which uses it is going to be parsing the
> network stuff before putting it into a JB, and would put it into a
> structure or object. Clearly, everything is just bytes, and you
> could do something similar with your JB api by passing in pointers
> and len==4, _if_ your jitterbuffer didn't have the ability to just
> drop frames internally.

Why would you parse and do work *before* putting it in the jitter
buffer, especially not even knowing whether you'll actually use it?

> The time in my implementation doesn't need to be wall time, nor do
> timestamps; they're all relative to each other, and the beginning of
> the "session". I think everything would work OK +- some constants
> if the scale were different.

But why do you need that time in the first place?

> > Why no overlap? What if you want to include a bit of redundancy
> > (doesn't have to be 100% either) to make your app more robust to
> > packet loss? You could want to send a packet that covers 0-60 ms,
> > followed by 40-100 ms, followed by 80-140 ms, ...
>
> I see now. I hadn't considered this, but it could also be expressed
> as a sequence of 20ms frames, some of which are dups, and some which
> have identical arrival times. I'm not sure how my implementation
> would handle this, but I don't think it breaks the API.

What do you mean, expressed as a sequence? You mean you'd break the
frames down before sending them? Sounds complicated and even
technically impossible in the general case (what if the frames *can't*
be broken down for a particular codec?).

> > Well, that API clearly has limitations that mean I can't use it to
> > do what I need. Unless you're willing to change that (and even then
> > I'm not sure), there's no way we can use the same API. I still
> > suspect it may be possible to wrap my current API in that API. Of
> > course, some features would just not be available.
>
> I think it would, except that your API lets the jb destroy data on
> its own, which would be bad, for example, if the data was a control
> frame, or in every case, because frames are usually malloced.

What's the problem with the jitter buffer destroying control frames?
If you need to send them reliably, don't use UDP in the first place
and don't use a jitter buffer.
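To be concrete about the datagram point above: the unit I have in mind
is nothing more than this (a sketch; the struct and field names are
made up, not from my actual code):

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical view of what the jitter buffer stores: opaque
       payload bytes, a length, and the RTP timestamp. No parsed,
       object-level structure until after the data leaves the buffer. */
    struct jb_packet {
        const unsigned char *data; /* raw datagram payload          */
        size_t               len;  /* payload length in bytes       */
        uint32_t             ts;   /* RTP timestamp (sample units)  */
    };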
> Yours may indeed be better than mine, but before you say it won't get
> confused, let's see what happens if it gets into asterisk and a lot
> of real-world broken streams get thrown at it :)

Of course, I'm always interested in more testing. However, I've
already (voluntarily and especially involuntarily) abused it with
nonsensical data and I have yet to see it fail (i.e. go into an
irrecoverable state).

> What really would help in the long run is if we had some kind of test
> harness to run these things in, and good test data culled from real-
> world situations. I had some hacky tools like this I used when I
> built my implementation, but nothing really good.

Sure. I guess it comes down to collecting (recording timestamps and
all) data from real applications in real scenarios. Then you figure
out the "right" behaviour and compare with what you obtain.

	Jean-Marc
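(As a sketch of what such recorded data could look like in C -- the
trace format and names here are invented, not from either
implementation. One line per received packet, giving arrival time, RTP
timestamp and payload length, so the stream can later be replayed
through a jitter buffer offline:)

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Append one received packet to a plain-text trace file:
       arrival time (ms), RTP timestamp, payload length (bytes). */
    static void trace_packet(FILE *f, uint32_t arrival_ms,
                             uint32_t rtp_ts, size_t len)
    {
        fprintf(f, "%u %u %lu\n", (unsigned)arrival_ms,
                (unsigned)rtp_ts, (unsigned long)len);
    }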
On May 3, 2006, at 9:54 PM, Jean-Marc Valin wrote:

>> Perhaps, but then you need to assume that the jitterbuffer can just
>> throw away the data, and that limits how you can use it. In object-
>> oriented terms, you might want to pass objects to the JB, and then
>> call a destructor on them. In C terms, you may want to allocate
>> frames via malloc(), and then call free() on them later. You might
>> want to pass in reference-counted objects of some sort, etc.
>
> Are we talking about the same thing here? I'm talking about IP
> packets, more specifically datagrams. These contain a bunch of bytes
> and a length. With RTP you get a timestamp as well. That's all. No
> object-oriented stuff until you decode them (which is done after
> they leave the jitter buffer).

It depends on the protocol. RTP packets, though, don't just have the
audio payload and a length; they also have the timestamp, the payload
type, etc. Some RTP packets may be audio data, some may be video, some
may be DTMF digits, etc. It's not brain surgery, but you'd generally
parse these into some kind of structure, even if the structure is just
a mapping onto a buffer.

Anyway, the point is that if you want an abstraction above the
jitterbuffer, it makes sense that the jitterbuffer would treat its
payload as opaque. That would mean it can't just throw it away. Then
if it wants to destroy it, you could either allow the jitterbuffer to
call back to an application-passed destroy function, or have it return
it with a flag of some kind.
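Sketching roughly what I mean -- these names are hypothetical, not my
actual code:

    #include <stdint.h>

    /* The buffer holds frames as opaque pointers and never frees them
       itself; when it must drop one, it hands the pointer back through
       an application-supplied destructor, so malloc()ed or
       reference-counted frames stay safe. */
    typedef void (*jb_destroy_cb)(void *frame, void *userdata);

    struct jb; /* opaque jitter buffer type */

    void jb_set_destroy_cb(struct jb *jb, jb_destroy_cb cb, void *userdata);
    void jb_put(struct jb *jb, void *frame, uint32_t ts, uint32_t duration);

The alternative is the flag: jb_get() returns the dropped frame marked
as dropped, and the caller does the freeing.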
>> Mainly they're different because you don't ever want the jitterbuffer
>> to throw them away -- you always want to deliver them. They probably
>> have zero duration (are impulses), and will overlap in timestamps
>> with the audio frames. You may not want to consider them in your
>> jitter calculations.
>
> Depending on how the control stuff works, you probably don't *need* a
> jitter buffer in the first place. At best you'll want to reorder the
> data, no?

Hmm, I just had this discussion here:
http://bugs.digium.com/view.php?id=6011

Here's the case where you would want it: the theory behind doing that
is that, in general, you want control to be synchronized with voice.
Imagine a situation where you are leaving a voicemail, and you have a
big (say 1500ms) jitterbuffer. The VM system lets you press "#" to end
the recording. So you say "I'll pay you a million dollars; just
kidding!", and press "#". If the system acted on the DTMF immediately,
your message might be seriously misinterpreted. The same concept holds
true for processing HANGUP frames, etc.

I don't know what is done by other applications, and buffering control
frames, etc. certainly leads to complication.

>>> Also, why would you want to give it structs? AFAIK, IP packets
>>> can only contain bytes anyway.
>>
>> Of course. But, in the way I've used the JB, and I would imagine in
>> most cases, the application which uses it is going to be parsing the
>> network stuff before putting it into a JB, and would put it into a
>> structure or object. Clearly, everything is just bytes, and you
>> could do something similar with your JB api by passing in pointers
>> and len==4, _if_ your jitterbuffer didn't have the ability to just
>> drop frames internally.
>
> Why would you parse and do work *before* putting it in the jitter
> buffer, especially not even knowing whether you'll actually use it?

In the case of IAX2, there are lots of reasons:

1) You need to determine which call it belongs to.
2) If it's a reliably-sent frame, you acknowledge it immediately.
3) Some frame types are not buffered.

>> The time in my implementation doesn't need to be wall time, nor do
>> timestamps; they're all relative to each other, and the beginning of
>> the "session". I think everything would work OK +- some constants
>> if the scale were different.
>
> But why do you need that time in the first place?

Time is just what drives the output of the jitterbuffer. Time could be
determined by the number of samples needed to keep a soundcard full, or
it could be time according to the system clock, if you're not (or not
able to) synchronize to a soundcard.

I don't think it's very different from your concept of ticks. I use
actual measures of time because the implementation of my jitterbuffer
sometimes makes decisions where knowing what time means helps.
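For instance, roughly (hypothetical names again):

    #include <stdint.h>

    /* Soundcard-driven time: count the samples you have written to
       the card and convert to milliseconds; this is the "now" you
       would pass to a jb_get()-style call. With no soundcard to
       synchronize to, you would use the system clock relative to
       session start instead. */
    static uint32_t soundcard_time_ms(uint64_t samples_written,
                                      uint32_t sample_rate)
    {
        return (uint32_t)(samples_written * 1000 / sample_rate);
    }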
>>> Why no overlap? What if you want to include a bit of redundancy
>>> (doesn't have to be 100% either) to make your app more robust to
>>> packet loss? You could want to send a packet that covers 0-60 ms,
>>> followed by 40-100 ms, followed by 80-140 ms, ...
>>
>> I see now. I hadn't considered this, but it could also be expressed
>> as a sequence of 20ms frames, some of which are dups, and some which
>> have identical arrival times. I'm not sure how my implementation
>> would handle this, but I don't think it breaks the API.
>
> What do you mean, expressed as a sequence? You mean you'd break the
> frames down before sending them? Sounds complicated and even
> technically impossible in the general case (what if the frames
> *can't* be broken down for a particular codec?).

Again, I don't think the API prohibits the use that you've described.
You can call jb_put() with that overlapping data. My _implementation_
may not like it, but the API can support it.

>>> Well, that API clearly has limitations that mean I can't use it to
>>> do what I need. Unless you're willing to change that (and even then
>>> I'm not sure), there's no way we can use the same API. I still
>>> suspect it may be possible to wrap my current API in that API. Of
>>> course, some features would just not be available.
>>
>> I think it would, except that your API lets the jb destroy data on
>> its own, which would be bad, for example, if the data was a control
>> frame, or in every case, because frames are usually malloced.
>
> What's the problem with the jitter buffer destroying control frames?
> If you need to send them reliably, don't use UDP in the first place
> and don't use a jitter buffer.

See above. I'm not married to the idea of buffering control frames,
but there's a valid argument to be made for it.

>> Yours may indeed be better than mine, but before you say it won't get
>> confused, let's see what happens if it gets into asterisk and a lot
>> of real-world broken streams get thrown at it :)
>
> Of course, I'm always interested in more testing. However, I've
> already (voluntarily and especially involuntarily) abused it with
> nonsensical data and I have yet to see it fail (i.e. go into an
> irrecoverable state).
>
>> What really would help in the long run is if we had some kind of test
>> harness to run these things in, and good test data culled from real-
>> world situations. I had some hacky tools like this I used when I
>> built my implementation, but nothing really good.
>
> Sure. I guess it comes down to collecting (recording timestamps and
> all) data from real applications in real scenarios. Then you figure
> out the "right" behaviour and compare with what you obtain.

There may not always be a single "right" behavior, because the system
is always trying to predict future events. But there are definitely
"wrong" behaviors that can easily be tested objectively.

-SteveK
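(Against the hypothetical jb_put() sketched earlier, the overlapping
redundancy case from the quote is just three puts; whether an
implementation copes with it gracefully is the separate question raised
above:)

    #include <stdint.h>

    struct jb;
    void jb_put(struct jb *jb, void *frame, uint32_t ts, uint32_t duration);

    /* Each packet spans 60 ms and advances by 40 ms, so every region
       is covered twice (timestamps in ms for readability). */
    static void put_redundant(struct jb *jb, void *f0, void *f1, void *f2)
    {
        jb_put(jb, f0,  0, 60);  /* covers   0-60 ms  */
        jb_put(jb, f1, 40, 60);  /* covers  40-100 ms */
        jb_put(jb, f2, 80, 60);  /* covers  80-140 ms */
    }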
> It depends on the protocol. RTP packets, though, don't just have the
> audio payload and a length; they also have the timestamp, the
> payload type, etc. Some RTP packets may be audio data, some may be
> video, some may be DTMF digits, etc.

Timestamps are already supported. Different payload types should
require different jitter buffers (one jitter buffer per payload). I'm
planning on supporting synchronization between jitter buffers, but
haven't done it yet. I first need to have a better understanding of
how it needs to be done when the timestamp units and offsets are
different.

> It's not brain surgery, but you'd generally parse these into some
> kind of structure, even if the structure is just a mapping onto a
> buffer. Anyway, the point is that if you want an abstraction above
> the jitterbuffer, it makes sense that the jitterbuffer would treat
> its payload as opaque. That would mean it can't just throw it away.
> Then if it wants to destroy it, you could either allow the
> jitterbuffer to call back to an application-passed destroy function,
> or have it return it with a flag of some kind.

I'm willing to consider the destroy function (callback) method. But it
still means the jitter buffer can destroy stuff, albeit while telling
the application (which I think is reasonable to expect).

> Hmm, I just had this discussion here:
> http://bugs.digium.com/view.php?id=6011
>
> Here's the case where you would want it: the theory behind doing
> that is that, in general, you want control to be synchronized with
> voice. Imagine a situation where you are leaving a voicemail, and
> you have a big (say 1500ms) jitterbuffer. The VM system lets you
> press "#" to end the recording. So you say "I'll pay you a million
> dollars; just kidding!", and press "#". If the system acted on the
> DTMF immediately, your message might be seriously misinterpreted.

I still think you need a different type of buffering for stuff like
DTMF. You want to reorder and maybe synchronize, but that's it. Plus
I'd say sending DTMF over UDP is a bad idea in the first place.

> The same concept holds true for processing HANGUP frames, etc.

Same. If you send the HANGUP over UDP, what do you do if it goes
missing?

> I don't know what is done by other applications, and buffering
> control frames, etc. certainly leads to complication.

Which is why you need to treat them separately and not attempt to make
them fit in a framework they don't belong to.

> > Why would you parse and do work *before* putting it in the jitter
> > buffer, especially not even knowing whether you'll actually use it?
>
> In the case of IAX2, there are lots of reasons:
>
> 1) You need to determine which call it belongs to.

No big deal if you still send the bytes...

> 2) If it's a reliably-sent frame, you acknowledge it immediately.

See above.

> 3) Some frame types are not buffered.

Then why do you use a jitter buffer on them?

> Time is just what drives the output of the jitterbuffer. Time could
> be determined by the number of samples needed to keep a soundcard
> full, or it could be time according to the system clock, if you're
> not (or not able to) synchronize to a soundcard.
>
> I don't think it's very different from your concept of ticks. I use
> actual measures of time because the implementation of my
> jitterbuffer sometimes makes decisions where knowing what time means
> helps.

I just think that creates confusion. What do you do if the user sends
the system clock and the soundcard is slightly out of sync with it?

Also, the only reason I now use ticks (the previous version didn't) is
that it gives me slightly finer time estimates (for adjusting the
buffer delay) in case the audio (or whatever data) frames are bigger
than the soundcard period. The jitter buffer doesn't even have an
explicit "buffering time" value in it.
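To be concrete about the one-buffer-per-payload idea above (a sketch
with invented names; synchronization between the buffers is exactly the
part I haven't worked out yet):

    #include <stdint.h>

    struct jb;
    void jb_put(struct jb *jb, void *frame, uint32_t ts, uint32_t duration);

    /* One jitter buffer per RTP payload type (the PT field is 7 bits,
       so 0-127); each incoming packet is routed to the buffer for its
       own payload type. */
    struct session {
        struct jb *jb_by_pt[128];
    };

    static void route_packet(struct session *s, unsigned pt, void *frame,
                             uint32_t ts, uint32_t duration)
    {
        if (pt < 128 && s->jb_by_pt[pt] != NULL)
            jb_put(s->jb_by_pt[pt], frame, ts, duration);
    }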
> Again, I don't think the API prohibits the use that you've described.
> You can call jb_put() with that overlapping data. My _implementation_
> may not like it, but the API can support it.

Does it have a way to tell you "OK, you asked for data for timestamps
60-80 ms, but I'm giving you a packet that spans 70-90"... or anything
that could have overlaps and/or holes compared to what you're asking
for?

> See above. I'm not married to the idea of buffering control frames,
> but there's a valid argument to be made for it.

I guess it truly depends on the meaning of the "control frames". If
they are "rendered" (e.g. DTMF), they might be representable with
timestamps and durations (e.g. the duration of the tone), so they
*may* fit in (as long as you consider they may be lost). I really
don't see stuff like hangup going into a jitter buffer. You may want
to sync it with the rest, but that's it.

> There may not always be a single "right" behavior, because the system
> is always trying to predict future events. But there are definitely
> "wrong" behaviors that can easily be tested objectively.

By "right", I mean "does the get() return roughly the right stuff
without too many gaps?" or "how big is the latency?". Of course, "does
the jitter buffer segfault?" is also good to test :-)

	Jean-Marc
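(Those questions translate directly into numbers over a replayed trace.
A sketch with invented result codes -- average latency is
latency_sum / frames, the gap count measures "too many gaps", and a
crash-free run covers the last test:)

    #include <stdint.h>

    /* Hypothetical outcomes of one get(): a frame was delivered, or
       there was a gap the application must conceal. */
    enum jb_ret { JB_OK, JB_GAP };

    struct jb_stats {
        unsigned frames;      /* frames delivered                  */
        unsigned gaps;        /* get() calls that returned no data */
        uint64_t latency_sum; /* sum of (delivery - arrival) in ms */
    };

    /* Accumulate the objective measures after each get(). */
    static void account(struct jb_stats *st, enum jb_ret r,
                        uint32_t arrival_ms, uint32_t deliver_ms)
    {
        if (r == JB_GAP) {
            st->gaps++;
        } else {
            st->frames++;
            st->latency_sum += deliver_ms - arrival_ms;
        }
    }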