Ok, this is a silly question, but what does the jitter buffer do? I'm
really new to audio, so please bear with me.

From what I gather (primarily from the list archive), the jitter buffer
is a wrapper around the Speex decoder. I give it the packets I receive,
in whatever order I receive them, and it gives me back a clean stream of
audio samples. What I don't entirely understand is how this differs from
working with the decoder directly.

Right now, I feed my RTP packets directly into the Speex decoder and
then queue the output for playback. This works reasonably well.

However, it doesn't accommodate dropped packets well. If I drop samples
10-20, I'll just queue 0-10 and then 20-30 immediately after, which
isn't great. I think I read that the jitter buffer will fabricate a
replacement for the missing samples 10-20, thus improving playback
quality. Is this correct?

But what else does it do? I see mention of "clock skew", but I don't
know what that means in this context. What am I missing? Most
importantly, what does it have to do with jitter, and how can I use it
to solve my problems? Specifically:

1) Assuming lossless, in-order, but highly irregular delivery of packets
(as I'm witnessing), what advantage does the jitter buffer offer over
going straight to the Speex decoder?

2) Assuming samples arrive at an average rate of 22 kHz, but in a highly
irregular fashion, is there any way to ensure regular playback other
than waiting some "prebuffer" duration before beginning playback? How do
I pick the smallest prebuffer duration that accommodates a given
connection's jitter?

3) Assuming I want to deliver samples at a rate of 22 kHz, what's the
best granularity at which to encode and broadcast? Granted, I need to
stay beneath the MTU. But should I be going for the largest granularity
that fits under the MTU, or for the smallest granularity that my CPU can
churn out?

Thanks!
-david

Jean-Marc Valin wrote:
> Have you looked at the Speex (adaptive) jitter buffer? See
> speex_jitter.h
>
>	Jean-Marc
>
> On Tuesday, 14 June 2005 at 17:50 -0700, David Barrett wrote:
>
>> What is the best way to pick a prebuffering length for a streaming
>> audio application using UDP transport?
>>
>> I'm using Speex in a VoIP application with RTP transport, currently
>> with a fixed 500ms prebuffer on the playback side. However, I'd like
>> something a bit more adaptive to accommodate high-jitter connections.
>>
>> For example, in one test configuration there is a very low average
>> round-trip latency (50ms), but it spikes all over the place (sometimes
>> 10ms, sometimes 500ms). Thus I can't make my prebuffer duration
>> proportional to latency, but somehow proportional to "jitter". But I'm
>> not sure of the best way to quantify this, nor how to transform that
>> into a reasonable prebuffer length.
>>
>> Thus I'm curious what experience you've had in this area, and what you
>> can recommend as a good way to adaptively compute a prebuffer duration.
>> Thanks!
>>
>> -david
>> _______________________________________________
>> Speex-dev mailing list
>> Speex-dev@xiph.org
>> http://lists.xiph.org/mailman/listinfo/speex-dev
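The dropped-packet case described above (queueing 0-10 and then 20-30 back to back) can be sketched as follows. This is a minimal illustration, not the Speex jitter buffer itself: it assumes in-order delivery, assumes each packet carries a 16-bit RTP sequence number, and uses a hypothetical `decode` callback where `decode(None)` stands in for asking the decoder to conceal one lost frame (the Speex decoder documentation describes passing a NULL bit-stream for a lost packet). The name `decode_with_gap_fill` and the `max_gap` limit are assumptions made for this example.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Frame = List[int]  # decoded PCM samples for one packet


def decode_with_gap_fill(
    packets: Iterable[Tuple[int, bytes]],
    decode: Callable[[Optional[bytes]], Frame],
    max_gap: int = 5,  # assumed cap on concealed frames per gap
) -> List[Frame]:
    """Decode packets in arrival order, filling sequence gaps.

    For every missing sequence number, decode(None) is called once so
    that the playback queue keeps its timing instead of skipping ahead.
    """
    frames: List[Frame] = []
    expected: Optional[int] = None
    for seq, payload in packets:
        if expected is not None:
            # RTP sequence numbers are 16 bits and wrap around.
            gap = (seq - expected) & 0xFFFF
            if gap >= 0x8000:
                # Sequence number is behind us: late or duplicate, drop it.
                continue
            for _ in range(min(gap, max_gap)):
                frames.append(decode(None))  # concealment frame
        frames.append(decode(payload))
        expected = (seq + 1) & 0xFFFF
    return frames
```

With a real decoder, the concealed frames would be extrapolated audio rather than silence, which is what keeps a short loss from producing an audible jump.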
I strongly suggest you start by reading the Speex manual (you can skip
the technical parts about CELP). If you still have questions after that,
post them.

	Jean-Marc

On Tuesday, 14 June 2005 at 22:30 -0700, David Barrett wrote:
> Ok, this is a silly question, but what does the jitter buffer do? I'm
> really new to audio, so please bear with me.
> [...]

-- 
Jean-Marc Valin <Jean-Marc.Valin@USherbrooke.ca>
Université de Sherbrooke
Ah, I'm sorry, I have read the manual and believe I have a reasonably
good grasp on how to use the Speex encoder and decoder altogether. In
fact I've been using it with great success in my P2P SIP/RTP VoIP
application for almost a year now; it's been working wonderfully and I
can't thank you enough.

However, the manual makes no mention of the jitter buffer, nor does it
(so far as I can tell) address the questions I've raised. The list
archive has been more helpful in this regard, but I still have holes in
my understanding.

Specifically, I'm trying to refine my working system to work even better
over high-jitter connections. I'm eager and open to using the jitter
buffer as you suggest, and I see how it can considerably improve
playback quality in high packet-loss situations, but I haven't yet
wrapped my head around what benefit it offers in reliable, high-jitter
environments.

So far as I can tell, the only solution to jittery transport is an
adequate prebuffer, and thus I'm looking for advice on how to determine
what "adequate" means. Likewise, I can easily broadcast audio packets of
anywhere from 33ms to 500ms (I currently use 50ms), but I'd like to hear
your real-world advice on what packet size I should ideally be using.

Thanks for all your help!

-david
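One common way to put a number on "jitter" is the smoothed interarrival estimator that RTP receivers report (RFC 3550): track how much the transit time changes from one packet to the next and average it with a gain of 1/16. The sketch below shows that estimator plus a hypothetical rule for turning it into a prebuffer length. The function names, the `safety` multiplier of 4, and the two-frame floor are assumptions chosen for illustration, not values taken from Speex or from this thread.

```python
from typing import Iterable, Optional, Tuple


def interarrival_jitter(packets: Iterable[Tuple[float, float]]) -> float:
    """Smoothed jitter estimate in seconds, in the style of RFC 3550.

    packets yields (send_time_s, arrival_time_s). The two clocks need
    not be synchronized: a constant offset cancels in the difference.
    """
    jitter = 0.0
    prev_transit: Optional[float] = None
    for sent, arrived in packets:
        transit = arrived - sent
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0  # exponential smoothing, gain 1/16
        prev_transit = transit
    return jitter


def prebuffer_seconds(
    jitter_s: float,
    frame_s: float,
    safety: float = 4.0,    # assumed multiplier, tune per application
    floor_frames: int = 2,  # assumed minimum of two frames buffered
) -> float:
    """Hypothetical rule: a multiple of the jitter, never below a floor."""
    return max(safety * jitter_s, floor_frames * frame_s)
```

Note that this estimator is a smoothed average deviation, so on a link that spikes from 10ms to 500ms it will sit well below the worst case; a design that must survive the spikes would more likely track a high percentile or recent maximum of the transit-time variation instead.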