>> The main idea is that Speex supports many bit-rates, but for one reason
>> or another, some modes may be left out in implementations (e.g. for RAM
>> or network reasons). What we're saying here is that you should make an
>> effort to at least support (and offer) the 8 kbps mode to maximise
>> compatibility.
>
> I understood this. But as you may know: the SDP parameters are PROPOSAL
> only and a remote application might use another "mode": this typically
> leads to interoperability issues and you should advise in the specification
> to always support all "modes". I understand this can be seen as a
> limitation, but in the real world, it will not be acceptable to support
> only a few modes among the provided ones.

Consider a device that only has enough ROM to store one set of
quantization tables (the limitation could also be about speed, network,
...). If you specify MUST be able to decode, then it means that this
device simply *cannot* implement the spec *at all*. This is bad for
interoperability.

> I understand that speex needs multiple 20ms packets: for speex
> "packetisation interval MUST be a multiple of 20ms", but you have
> to provide a specification compliant with other ones: "ptime" can
> have any other value and there can't be a MUST there.
>
> Round it up is a much better idea: usually, 30ms is used when
> 20ms would introduce too much bandwidth overhead: if you round
> it down, then you would get less quality.

Fair enough.

> Some applications "allocate" a buffer based on the "ptime": thus
> they copy 20ms of PCMU data each time they get a packet even
> if they receive packets every 30ms...
>
> The sound cards play 2/3 of data received... This happens more
> than you would imagine. Look at this implementation of current
> iLBC in asterisk:

Oh, I've seen a lot worse... like calling speex_encode() with 640 u-law
samples instead of 160 floats (hey, it's the same number of bytes) to
get 4x more compression!
Though in this case, no IETF draft can save you :-)

>>> Also, this table exists for narrowband, but still it does not for
>>> wideband or ultrawideband: it would be nice to get also those ones. I
>>> was really lost implementing this in my SIP application.
>>
>> Yes, I just checked that into svn. Will be part of the 1.2beta2
>> manual (expected soon).
>
> And will you add those tables in the draft?

Hadn't thought about it, but why not. Wouldn't take too much space and
it would make things simpler.

>> Well, the idea is what happens if all modes can't be supported for some
>> reason. This is why we were saying 8 kbps (mode 3) SHOULD be supported.
>> In practice, we can also strongly recommend supporting all modes, but
>> I'm not sure I want to say MUST for that.
>
> I guess you already have my idea about this: all modes should be supported
> unless you know you won't have issues.
>
> One good thing with g729 and its extension (g729 annex b?) is that you
> can still receive g729b if you support only g729: this is transparent
> (as far as I understood it).
>
> For speex, the modes are not transparent and thus, if I was the one
> to choose, I would add in the draft: ALL MODES MUST BE SUPPORTED ON
> THE RECEIVER SIDE. That's experience of the real world.

As I said, it's not possible unless you explicitly exclude some devices
from being able to implement this. However, I'm not against saying
something to the effect that if the client/device is physically capable
of encoding/decoding a mode, then it MUST do it -- or something like
that. Again, I'm open to any suggestion that doesn't involve banning
certain devices outright.

Another thing to consider.
Even if I'm able to do everything and all, if I'm on a 33.6 kbps modem
link and you attempt to send me 24.6 kbps with a ptime of 20 ms, it won't
work, no matter what, and the client might as well try something else
(even if that something else is LPC10!).

> The other way would be to make it transparent like g729.

Not sure what kind of transparency you mean? The Speex decoder (unless you
remove some tables) is able to decode anything without even knowing how it
was encoded.

>> I'm just trying to allow that while still taking into account the fact
>> that some clients just don't have enough bandwidth or even enough
>> RAM/ROM/MIPS to really handle anything that is sent to them.
>> I'm definitely interested in any suggestion that can make both
>> possible though.
>
> Make "mode" transparent! or forget about this. My own opinion...

Again, what do you mean exactly by transparent?

> I mean always the same: be prepared to decode all modes, no matter what
> you sent in the SDP as preference.
>
> For example, xlite used to have a speex "quality" parameter and no
> negotiation was done: if you were sending data with another mode,
> the audio was not decoded. This was exactly the same issue as
> the one described above for the iLBC decoder in asterisk.

If all you mean is "do your best to decode anything you get no matter how
different it is from what you asked for", then I agree.

Jean-Marc
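
To put numbers on two points from the message above (rounding ptime up to a multiple of 20 ms, and the 33.6 kbps modem example), here is a minimal C sketch. The function names are hypothetical, and the 40 bytes of per-packet overhead are an assumption (uncompressed IPv4 + UDP + RTP headers, no link-layer framing); none of this comes from the Speex sources or from the draft.

```c
#include <assert.h>

#define SPEEX_FRAME_MS 20  /* one Speex frame covers 20 ms */
#define HEADER_BYTES   40  /* assumption: IPv4 (20) + UDP (8) + RTP (12) */

/* Round a requested ptime up to the next multiple of 20 ms. */
int round_up_ptime_ms(int ptime_ms)
{
    assert(ptime_ms > 0);
    return ((ptime_ms + SPEEX_FRAME_MS - 1) / SPEEX_FRAME_MS) * SPEEX_FRAME_MS;
}

/* Approximate bit-rate on the wire for a codec bit-rate and a ptime. */
int wire_bitrate_bps(int codec_bps, int ptime_ms)
{
    assert(codec_bps > 0 && ptime_ms > 0);
    int payload_bits  = codec_bps * ptime_ms / 1000;
    int payload_bytes = (payload_bits + 7) / 8;   /* pad to whole bytes */
    /* one packet every ptime_ms milliseconds, each carrying the headers */
    return (payload_bytes + HEADER_BYTES) * 8 * 1000 / ptime_ms;
}
```

Under these assumptions, wire_bitrate_bps(24600, 20) comes out at 40800 bit/s, which is more than a 33.6 kbps link can carry, while wire_bitrate_bps(8000, 20) comes out at 24000 bit/s. A ptime of 30 ms is rounded up to 40 ms by round_up_ptime_ms().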
On Wed, 16 May 2007, Jean-Marc Valin wrote:

>>> The main idea is that Speex supports many bit-rates, but for one reason
>>> or another, some modes may be left out in implementations (e.g. for RAM
>>> or network reasons). What we're saying here is that you should make an
>>> effort to at least support (and offer) the 8 kbps mode to maximise
>>> compatibility.
>>
>> I understood this. But as you may know: the SDP parameters are PROPOSAL
>> only and a remote application might use another "mode": this typically
>> leads to interoperability issues and you should advise in the specification
>> to always support all "modes". I understand this can be seen as a
>> limitation, but in the real world, it will not be acceptable to support
>> only a few modes among the provided ones.
>
> Consider a device that only has enough ROM to store one set of
> quantization tables (the limitation could also be about speed, network,
> ...). If you specify MUST be able to decode, then it means that this
> device simply *cannot* implement the spec *at all*. This is bad for
> interoperability.

For me: devices which don't have all modes implemented are already banned:
the SDP offer/answer model is based on codec and parameter *proposal*:
still the sender is *allowed* to not send the given mode. That said, UAs
that don't support all modes are already not capable of interoperability:
they will sometimes receive modes that they did not propose and do not
support.

One possible way would be to use "mode=any" to clearly indicate that the
UA supports any mode. The preferred mode could be specified by order of
preference: "mode=5;mode=any". Other "limited devices" would announce all
modes supported: "mode=5;mode=3;mode=1"

Samples for SDP negotiation must clearly indicate this behaviour:

m=audio 8088 RTP/AVP 97
a=rtpmap:97 speex/8000

-> equivalent to "mode=any" (for smooth evolution of current
   implementations)

m=audio 8088 RTP/AVP 97
a=rtpmap:97 speex/8000
a=fmtp:97 mode=3

-> ONLY mode=3 is supported.
   (LIMITED DEVICE)

m=audio 8088 RTP/AVP 97
a=rtpmap:97 speex/8000
a=fmtp:97 mode=3;mode=1
(LIMITED DEVICE)

-> mode=3 and mode=1 are supported, mode=3 is preferred.

m=audio 8088 RTP/AVP 97
a=rtpmap:97 speex/8000
a=fmtp:97 mode=4;mode=any

-> ALL modes are supported but mode=4 is preferred.

The above would make sense and compliant applications would
be interoperable, right?

That would be perfect for me, and I guess would be fine for you?

>> I understand that speex needs multiple 20ms packets: for speex
>> "packetisation interval MUST be a multiple of 20ms", but you have
>> to provide a specification compliant with other ones: "ptime" can
>> have any other value and there can't be a MUST there.
>>
>> Round it up is a much better idea: usually, 30ms is used when
>> 20ms would introduce too much bandwidth overhead: if you round
>> it down, then you would get less quality.
>
> Fair enough.
>
>> Some applications "allocate" a buffer based on the "ptime": thus
>> they copy 20ms of PCMU data each time they get a packet even
>> if they receive packets every 30ms...
>>
>> The sound cards play 2/3 of data received... This happens more
>> than you would imagine. Look at this implementation of current
>> iLBC in asterisk:
>
> Oh, I've seen a lot worse... like calling speex_encode() with 640 u-law
> samples instead of 160 floats (hey, it's the same number of bytes) to
> get 4x more compression! Though in this case, no IETF draft can save you :-)

;(

>>>> Also, this table exists for narrowband, but still it does not for
>>>> wideband or ultrawideband: it would be nice to get also those ones. I
>>>> was really lost implementing this in my SIP application.
>>>
>>> Yes, I just checked that into svn. Will be part of the 1.2beta2
>>> manual (expected soon).
>>
>> And will you add those tables in the draft?
>
> Hadn't thought about it, but why not. Wouldn't take too much space and
> it would make things simpler.

Good.
This will allow implementors to not always ask you what is a "mode"!

>>> Well, the idea is what happens if all modes can't be supported for some
>>> reason. This is why we were saying 8 kbps (mode 3) SHOULD be supported.
>>> In practice, we can also strongly recommend supporting all modes, but
>>> I'm not sure I want to say MUST for that.
>>
>> I guess you already have my idea about this: all modes should be supported
>> unless you know you won't have issues.
>>
>> One good thing with g729 and its extension (g729 annex b?) is that you
>> can still receive g729b if you support only g729: this is transparent
>> (as far as I understood it).
>>
>> For speex, the modes are not transparent and thus, if I was the one
>> to choose, I would add in the draft: ALL MODES MUST BE SUPPORTED ON
>> THE RECEIVER SIDE. That's experience of the real world.
>
> As I said, it's not possible unless you explicitly exclude some devices
> from being able to implement this. However, I'm not against saying
> something to the effect that if the client/device is physically capable
> of encoding/decoding a mode, then it MUST do it -- or something like
> that. Again, I'm open to any suggestion that doesn't involve banning
> certain devices outright.

I guess what I proposed might reach a consensus between you and me, at
least!

> Another thing to consider. Even if I'm able to do everything and all, if
> I'm on a 33.6 kbps modem link and you attempt to send me 24.6 kbps with a
> ptime of 20 ms, it won't work, no matter what, and the client might as
> well try something else (even if that something else is LPC10!).

This is one reason why ptime should be rounded up!

Anyway, if a mode is not supported for network reasons, the application
that supports any mode should propose a fmtp line (acting like a limited
device):

a=fmtp:97 mode=1;mode=2

>> The other way would be to make it transparent like g729.
>
> Not sure what kind of transparency you mean?
> The Speex decoder (unless you remove some tables) is able to decode
> anything without even knowing how it was encoded.

Except for limited devices which are not capable of decoding the "modes"
that were removed from the code ;)

If I understood the speex stuff well: a decoder without wideband
implementation would be capable of locating the narrowband part
of the speex bitstream and return at least narrowband audio, right?

Is this possible? (I guess it's not that complex):
  --->wideband data --> NARROW BAND DECODER --> narrow band decoded audio!

This does not seem possible:
  --->mode=4 nband data --> LIMITED DEVICE ---> correct nband data
                            WITH MODE=3 ONLY

With g729 / g729 annex b:
  ---> g729 annex b --> LIMITED DEVICE ---> correct audio data.
                        WITHOUT ANNEX B

This is what I mean by "transparency".

> If all you mean is "do your best to decode anything you get no matter
> how different it is from what you asked for", then I agree.

This has to be at least a recommendation so people are aware that speex
implementations allow this behavior: it was unclear to people that they
don't need to set options before starting the decode process. Because they
got a lot of decode failures before, I think speex users are more prepared
now. Anyway, it's good to inform people.

Sometimes, drafts provide some C code: I would love to see some kind
of annex with a sample routine to decode speex data.

Something like: static void dec_process(MSFilter *f)
http://cvs.savannah.nongnu.org/viewvc/linphone/mediastreamer2/src/msspeex.c?root=linphone&view=markup

I'm not sure of my implementation (mainly I have a strange behavior of
"speex_bits_remaining"...)

tks,
Aymeric

> Jean-Marc
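
The "mode=5;mode=any" announcement proposed in the message above could be read on the receiving side roughly as follows. This is only a sketch of the syntax as written in this thread, not of any published payload format; the type ModePrefs and the functions parse_mode_prefs() and accepts_mode() are hypothetical names, and treating a missing or unusable "mode=" entry as "mode=any" follows the first SDP sample above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MODES 16

typedef struct {
    int modes[MAX_MODES]; /* explicitly listed modes, most preferred first */
    int count;            /* number of entries in modes[] */
    int any;              /* non-zero if every mode is accepted */
} ModePrefs;

/* Parse the parameters of an fmtp line, e.g. "mode=4;mode=any".
 * NULL, an empty string, or no usable "mode=" entry means "mode=any". */
void parse_mode_prefs(const char *params, ModePrefs *prefs)
{
    assert(prefs != NULL);
    prefs->count = 0;
    prefs->any = 0;
    const char *s = params ? params : "";
    while (*s != '\0') {
        while (*s == ' ' || *s == ';')          /* skip separators */
            s++;
        size_t len = strcspn(s, ";");           /* length of one parameter */
        if (len > 5 && strncmp(s, "mode=", 5) == 0) {
            const char *value = s + 5;
            if (len == 8 && strncmp(value, "any", 3) == 0) {
                prefs->any = 1;
            } else if (prefs->count < MAX_MODES) {
                char *end;
                long mode = strtol(value, &end, 10);
                /* keep it only if the whole value is a non-negative number */
                if (end == s + len && mode >= 0)
                    prefs->modes[prefs->count++] = (int)mode;
            }
        }
        s += len;
    }
    if (prefs->count == 0 && !prefs->any)
        prefs->any = 1;
}

/* Non-zero if the announced preferences accept the given mode. */
int accepts_mode(const ModePrefs *prefs, int mode)
{
    if (prefs->any)
        return 1;
    for (int i = 0; i < prefs->count; i++)
        if (prefs->modes[i] == mode)
            return 1;
    return 0;
}
```

With this reading, "mode=3;mode=1" describes a limited device that accepts only modes 3 and 1, while "mode=4;mode=any" accepts every mode and lists mode 4 first.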
>> Consider a device that only has enough ROM to store one set of
>> quantization tables (the limitation could also be about speed, network,
>> ...). If you specify MUST be able to decode, then it means that this
>> device simply *cannot* implement the spec *at all*. This is bad for
>> interoperability.
>
> For me: devices which don't have all modes implemented are already banned:
> the SDP offer/answer model is based on codec and parameter *proposal*:
> still the sender is *allowed* to not send the given mode. That said, UAs
> that don't support all modes are already not capable of
> interoperability: they will sometimes receive modes that they did not
> propose and do not support.

The same reasoning could apply to other codecs as well: if a client
doesn't support G.729, you can't say it's broken if someone unexpectedly
starts sending that. Of course, in the case of Speex, we try and do our
best...

> One possible way would be to use "mode=any" to clearly indicate that the
> UA supports any mode. The preferred mode could be specified by order of
> preference: "mode=5;mode=any". Other "limited devices" would announce all
> modes supported: "mode=5;mode=3;mode=1"

Yes, I agree with that.

> Samples for SDP negotiation must clearly indicate this behaviour:
>
> m=audio 8088 RTP/AVP 97
> a=rtpmap:97 speex/8000
>
> -> equivalent to "mode=any" (for smooth evolution of current
> implementations)

Shouldn't clients have at least a preferred mode? Otherwise, if everyone
says "any", which bit-rate do you choose?

<snip>

> The above would make sense and compliant applications would
> be interoperable, right?

I think so.

> That would be perfect for me, and I guess would be fine for you?

Yes, I think it's the best compromise -- and actually not very far from
what we have now. I would still keep the thing that says 8 kbps SHOULD be
supported.
Or maybe we can say "all modes SHOULD be supported and if it's not
possible, then at least 8 kbps (mode 3) SHOULD be supported".

>>> The other way would be to make it transparent like g729.
>>
>> Not sure what kind of transparency you mean? The Speex decoder (unless
>> you remove some tables) is able to decode anything without even knowing
>> how it was encoded.
>
> Except for limited devices which are not capable of decoding the "modes"
> that were removed from the code ;)
>
> If I understood the speex stuff well: a decoder without wideband
> implementation would be capable of locating the narrowband part
> of the speex bitstream and return at least narrowband audio, right?

Actually, if you give wideband data to a narrowband decoder, it will
decode it as narrowband without even *realising* it's wideband (the
wideband bits are transparently ignored).

> Is this possible? (I guess it's not that complex):
> --->wideband data --> NARROW BAND DECODER --> narrow band decoded audio!

Yes, that's possible.

> This does not seem possible:
> --->mode=4 nband data --> LIMITED DEVICE ---> correct nband data
> WITH MODE=3 ONLY

That's not possible.

>> If all you mean is "do your best to decode anything you get no matter
>> how different it is from what you asked for", then I agree.
>
> This has to be at least a recommendation so people are aware that speex
> implementations allow this behavior: it was unclear to people that they
> don't need to set options before starting the decode process. Because they
> got a lot of decode failures before, I think speex users are more prepared
> now. Anyway, it's good to inform people.

What do you mean by "they got a lot of decode failures"?

> Sometimes, drafts provide some C code: I would love to see some kind
> of annex with a sample routine to decode speex data.
>
> Something like: static void dec_process(MSFilter *f)
> http://cvs.savannah.nongnu.org/viewvc/linphone/mediastreamer2/src/msspeex.c?root=linphone&view=markup
>
> I'm not sure of my implementation (mainly I have a strange behavior of
> "speex_bits_remaining"...)

I'm not really sure I understand your code, but it looks much more
complicated than it should be. Normally, the decode code should look like:

    speex_bits_init(&bits);
    speex_bits_read_from(&bits, packet, length);
    while (1) {
        /* 0 = frame decoded, -1 = no more frames, -2 = corrupt stream */
        err = speex_decode_int(state, &bits, out_buffer);
        if (err != 0)
            break;
        play_audio(out_buffer);
    }
    if (err == -2)
        ms_warning("there was an error decoding");

I see from your code that you use speex_bits_remaining(&bits) but I can't
really understand why. Could you explain?

Jean-Marc
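
On the question raised in this message (if everyone says "any", which bit-rate do you choose?), one possible sender-side policy is sketched below in C. It is an assumption made for illustration, not something the draft or this thread settles: honour the peer's announced order where the local encoder can, and fall back to the sender's own first choice when the peer accepts anything. The function name choose_send_mode() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the mode to send.
 * remote_modes: modes announced by the peer, most preferred first
 * remote_any:   non-zero if the peer also announced "mode=any"
 * local_modes:  modes the local encoder can produce, most preferred first
 * Returns the chosen mode, or -1 if there is no common mode. */
int choose_send_mode(const int *remote_modes, size_t n_remote, int remote_any,
                     const int *local_modes, size_t n_local)
{
    assert(n_remote == 0 || remote_modes != NULL);
    assert(n_local == 0 || local_modes != NULL);

    /* 1. Honour the peer's preference order where we can. */
    for (size_t i = 0; i < n_remote; i++)
        for (size_t j = 0; j < n_local; j++)
            if (remote_modes[i] == local_modes[j])
                return remote_modes[i];

    /* 2. Peer accepts anything: fall back to our own first choice. */
    if (remote_any && n_local > 0)
        return local_modes[0];

    return -1;
}
```

A device that can only encode modes 3 and 1, talking to a peer that announced "mode=5;mode=3;mode=1", would send mode 3 under this policy. If the peer announced only "mode=4" without "any", the function returns -1 and the client might as well try another codec.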