Jean-Marc Valin wrote:
>>OK, I'm actually about ready to start working on this now.
>>
>>If people in the speex community are interested in working with me on
>>this, I can probably start with the speex buffer, but I imagine
>>there's going to be a lot more work needed to get this where I'd
>>like it to go.
>>
>>
>
>And where would you like it to go? ;-)
>
>
Heh. I guess after playing with different jitter buffers long enough,
I've realized that there are always situations that you haven't
properly accounted for when designing one.
>>At the API level, it seems pretty easy to make the speex
>>implementation become speex-independent. Instead of having
>>speex_jitter_get call any particular speex_decode or speex_read_bits,
>>etc functions, it could instead just return the "thing" it got, and a
>>flag. I.e.
>>
>>
>
>It's not as simple as it may look -- otherwise that's what I would have
>done. These are some of the things that you can't do easily if you "just
>return the thing":
>- Allow more than one frame per packet, especially if the frames don't
>end on a byte boundary
>- Let the jitter buffer drop/interpolate frames during silence periods
>- Anything that requires the jitter buffer to know about what is being
>decoded.
>
>
I think the only difficult part in your list is dealing with multiple
frames per packet without that information being available to the
jitter buffer. If the jitter buffer can be told, when a packet is
added, that the packet contains Xms of audio, then the jitter buffer
won't have a problem handling this.
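To make that concrete, here's a rough sketch of the put side I have in
mind; all of the names (jb_put, jb_frame, and so on) are hypothetical,
not anything that exists in speex today:

    struct jb;            /* opaque jitter buffer state */

    /* Opaque payload plus the metadata the buffer actually needs.
     * The buffer never parses data; the caller declares how much
     * audio the packet spans. */
    struct jb_frame {
        void *data;       /* encoded packet, opaque to the buffer */
        int   len;        /* payload length in bytes */
        long  ts;         /* timestamp of the first sample, in ms */
        long  ms;         /* duration of audio in this packet, in ms */
    };

    /* The buffer records that [ts, ts+ms) is covered, so it can spot
     * gaps and lost frames without decoding anything. */
    void jb_put(struct jb *jb, const struct jb_frame *frame);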
This is something I've encountered in trying to make a particular
asterisk application properly handle IAX2 frames which contain either
20ms or 40ms of speex data. For a CBR case, where the bitrate is known,
this is fairly easy to do, especially if the frames _do_ always end on
byte boundaries. For a VBR case, it is more difficult, because it
doesn't look like there's a way to just parse the speex bitstream and
break it up into its constituent 20ms frames.
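The CBR split itself is just arithmetic. A sketch, assuming the bitrate
is known and the frames end on byte boundaries (handle_subframe is a
made-up callback):

    /* 20ms at a fixed bitrate occupies a fixed byte count:
     * bytes_per_frame = bitrate_bps * 0.020 / 8 = bitrate_bps / 400 */
    int bytes_per_frame = bitrate_bps / 400;
    int nframes = packet_len / bytes_per_frame;
    int i;
    for (i = 0; i < nframes; i++)
        handle_subframe(packet + i * bytes_per_frame,
                        bytes_per_frame,
                        base_ts + i * 20 /* ms */);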
The problem isn't so much that the jb can't return the right thing, but
that internally it can't know if it just passed back a packet that
contained 40ms of data or 20ms of data, so later it can't know if it's
lost a frame or not.
The other things can be handled based on the return value of the _get
method: dropping frames, interpolating, etc.
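Something like this is what I have in mind for the return value
(hypothetical names again):

    #define JB_OK      0  /* *out holds a frame: decode or forward it */
    #define JB_EMPTY   1  /* nothing due for this timeslot yet */
    #define JB_INTERP  2  /* frame missing: caller should interpolate */
    #define JB_DROP    3  /* buffer shrinking: caller should skip ahead */

    int jb_get(struct jb *jb, struct jb_frame *out, long now_ms);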
>>We could then have a second-level API wrapper around this for speex
>>which would then call speex_decode, etc, as necessary.
>>
>>Basically, I'd like the jitterbuffer to do all the history, length,
>>etc calculations, but not actually know anything about what it's
>>managing.
>>
>>
>
>I would suggest the opposite. You can just think of the current
>implementation as being callback-based. If you look at the
>implementation of speex_decode, it merely looks for a callback function
>in a struct (mode definition). It would not be very hard to provide
>similar (callback structures) wrappers for other codecs. I'm willing to
>modify the current implementation to make that easier (though it's
>already not very hard).
>
>
I can see how you'd do that, but I don't think that would work for me.
I really don't want the jitterbuffer to handle decoding at all, because
in some cases, I want to dejitter the stream, but not decode it.
For example, I will be running this in front of a conferencing
application. This conferencing application handles participants, each
of which can use a different codec. Often, we "optimize" the path
through the conferencing application by passing the encoded stream
straight-through to listeners when there is only one speaker, and the
speaker and participant use the same codec(*). In this case, I want to
pass back the actual encoded frame, and also the information about what
to do with it, so that I can pass along the frame to some participants,
and decode (and possibly transcode) it for others.
(*) (Yes, I do understand that this violates the expectations of the
codecs, but so far, it seems to work well for GSM, and for speex, since
the changes generally occur surrounded by silence, and there is a huge
benefit in scalability, and also in clarity because there is no
generational loss; the only drawback is that there is _sometimes_ some
distortion when the contexts change).
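In rough terms, the per-listener dispatch for that optimization looks
something like this (the types and helpers are made up, just to show
the shape of it):

    /* The frame comes out of the jitter buffer still encoded, with a
     * flag saying what it is; each listener gets the cheapest path. */
    if (listener->codec == frame->codec) {
        forward_encoded(listener, frame);  /* pass through, no generational loss */
    } else {
        short pcm[FRAME_SAMPLES];
        decode_frame(frame, pcm);          /* fall back to transcoding */
        send_transcoded(listener, pcm);
    }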
>>In asterisk and iaxclient (my project), the things I'd pass into the
>>jitterbuffer would be different kinds of structures. Some of these
>>may be audio frames, some might be control frames (DTMF, etc) that we
>>want synchronized with the audio stream, etc. In the future, we'd
>>also want to have video frames thrown in there, which would need to be
>>synchronized.
>>
>>
>
>I'm not sure of the best way to do that. Audio has different constraints
>than video when you're doing jitter buffering. For example, it's much
>easier (in terms of perceptual degradation) to skip frames with video
>than with audio, which means that the algorithm to handle that optimally
>may be quite different. Don't you think?
>
>
Yes. While I don't plan on doing video in the first pass, the API for
this would be that when you pass in your "thing", you also pass along:
a) The timestamp for that "thing"
b) Some flags for the "thing". This might be a "stream number" for the
thing, or it might just be a flag saying this "thing" is audio, or this
"thing" is a control frame which must never be dropped, etc, or this is
a video frame (and maybe a keyframe), etc.
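Concretely, the flags might extend the jb_frame sketch from earlier
with something like this (hypothetical, again):

    #define JB_TYPE_AUDIO     0x01
    #define JB_TYPE_VIDEO     0x02
    #define JB_TYPE_CONTROL   0x04  /* DTMF, HANGUP: never drop these */
    #define JB_FLAG_KEYFRAME  0x08  /* video: preferred resync point */
    #define JB_FLAG_STREAM(n) ((n) << 8)  /* substream id, if several */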
A jitterbuffer for an audio/video system needs to be integrated, of
course, so that the audio and video are synchronized. In my first
implementation, I haven't decided if I want to do more than audio, but
presently, in asterisk, all frames go through the jitterbuffer, and it
makes some sense to include, e.g., DTMF frames in there, HANGUP frames,
etc. (imagine leaving a voicemail: if DTMF or HANGUP isn't
jitterbuffered, and you have a _lot_ of jitter, you could lose part of
your message. If DTMF frames don't go through there, they may be
processed out-of-order, etc.).
>>So, I guess my questions (for Jean-Marc mostly, but others as well):
>>
>>1) Is it OK with you to add this extra abstraction layer to your
>>jitter buffer?
>>
>>
>
>I think there might be better ways to abstract the codec out of that
>(callbacks and all).
>
>
Callbacks or flags, I think, would work the same way, except for the
above. I think it could work for your use as well, though, if you had
some way to tell the jb out-of-band how long the "packet" you sent
along was (how many milliseconds).
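For the IAX2 case above, that would mean the caller does something like
this (using the hypothetical names from the earlier sketches):

    struct jb_frame f;
    f.data = pkt->data;
    f.len  = pkt->len;
    f.ts   = pkt->ts;
    f.ms   = pkt->is_40ms ? 40 : 20;  /* the out-of-band duration */
    jb_put(jb, &f);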
-SteveK