> Page 3:
>
>   To be compliant with this specification, implementations MUST support
>   8 kHz sampling rate (narrowband)" and SHOULD support 8 kbps bitrate.
>   The sampling rate MUST be 8, 16 or 32 kHz.
>
> There is a typo above: after (narrowband), there is an extra " character.
>
> I don't understand what is the motivation to specify "SHOULD support 8
> kbps bitrate".

The main idea is that Speex supports many bit-rates, but for one reason
or another, some modes may be left out in implementations (e.g. for RAM
or network reasons). What we're saying here is that you should make an
effort to at least support (and offer) the 8 kbps mode to maximise
compatibility.

> Page 8:
>
>   Optional parameters:
>
>   ptime: see RFC 4566. SHOULD be a multiple of 20 msec.
>
>   maxptime: see RFC 4566. SHOULD be a multiple of 20 msec.
>
> In the real world, many SIP applications use either 20 or 30ms. This
> ptime parameter is really not reliable for negotiation... One possible
> way to handle non-multiples would be to take the next value above:
> if 30ms is specified, then recommend using 40ms for speex.

Actually, it needs to be "MUST" be a multiple of 20 ms because no matter
what happens, Speex frames are 20 ms long. I expect most clients would
use 20 ms, as it corresponds to one packet. As to what we need to do if
the ptime is invalid, I'm not quite sure, though maybe as you say round
it up (or down?).

> Page 10:
>
>   The value of the sampling frequency is typically 8000 for narrow band
>   operation, 16000 for wide band operation, and 32000 for ultra-wide
>   band operation.
>
> The word "typically" means to me that it could be something else than
> 8000, 16000 or 32000: I would recommend to make it clear:
>
>   The value of the sampling frequency MUST be either 8000 for narrow band
>   operation, 16000 for wide band operation, and 32000 for ultra-wide
>   band operation.

Agreed.

>   ptime: duration of each packet in milliseconds.
>
> http://www.ietf.org/rfc/rfc4566.txt specifies in the ptime definition:
> "it is intended as a recommendation for the encoding/packetisation of
> audio". Thus, I would recommend to specify the same text as in rfc3264
> for sdp offer/answer model:
>
>   "If the ptime attribute is present for a stream, it indicates the
>   desired packetization interval that the offerer would like to
>   receive. The ptime attribute MUST be greater than zero."
>
> It might also be a good idea to say that even if an offerer would like
> to receive 20ms, the sender MAY use a different packetization interval...
> This is the origin of numerous interop issues with speex in SIP
> applications.

Sounds fair. Just curious, what's the exact interop issue?

>   sr: actual sample rate in Hz.
>
>   ebw: encoding bandwidth - either 'narrow' or 'wide' or 'ultra'
>   (corresponds to nominal 8000, 16000, and 32000 Hz sampling rates).
>
> Both the "sr" and "ebw" conflict with the speex/XXXX rtpmap. I really
> recommend to remove both those definitions so that applications will
> configure themselves using either speex/8000, speex/16000, speex/32000.
> Having 3 ways to specify the sampling rate is a nightmare for interop.

Had missed that one. It definitely makes sense. The original draft
allowed selecting the narrowband/wideband encoder independently
of the sampling rate, but in retrospect, that was just plain wrong.

> Page 11:
>
>   mode: Speex encoding mode. Can be {1,2,3,4,5,6,any} defaults to 3
>   in narrowband, 6 in wide and ultra-wide.
>
> I always asked for a "table" in the specification here providing link
> between "mode" and "bitrate". Else, you get those mails:
>
> http://lists.xiph.org/pipermail/speex-dev/2006-March/004288.html
>
> If I get it right, the table is there:
> http://www.speex.org/docs/manual/speex-manual/node10.html
> Table 4: Quality versus bit-rate
>
> Also, this table exists for narrowband, but still it does not for
> wideband or ultrawideband: it would be nice to get also those ones.
> I was really lost implementing this in my SIP application.

Yes, I just checked that into svn. Will be part of the 1.2beta2 manual
(expected soon).

>   Examples:
>
>   m=audio 8008 RTP/AVP 97
>   a=rtpmap:97 speex/8000
>   a=fmtp:97 mode=4
>
>   This example illustrates an offerer that wishes to receive a Speex
>   stream at 8000Hz, but only using speex mode 4.
>
> Is it a recommendation or a MUST: for me, and to allow better
> interoperability, an application is sending "mode=4" because it
> wishes to receive "mode=4": but, in case the remote application
> can only send "mode=3", the receiver MUST be prepared to receive
> ANY mode. We can't get interoperability without this and I would
> recommend to specify that such use-case will often happen in real
> world and that it MUST be supported.

Well, the idea is what happens if all modes can't be supported for some
reason. This is why we were saying 8 kbps (mode 3) SHOULD be supported.
In practice, we can also strongly recommend supporting all modes, but
I'm not sure I want to say MUST for that.

>   Several Speex specific parameters can be given in a single a=fmtp
>   line provided that they are separated by a semi-colon:
>
>   a=fmtp:97 mode=any;mode=1
>
> No error here: just curious why you want to allow this? Wouldn't it
> be nice to specify that the order of mode parameter is significant?
> I guess this is what you want? (in that case, "mode=1,mode=any" might
> be more meaningful?)

No longer sure why we had that... Albert?
Greg?

> More generally, I would really like to have a line specifying that
> whatever you proposed (ptime, mode, vbr, cng), the sender could
> use a different encoder configuration for any reason (bandwidth reason
> or lazy developer): a speex decoder doesn't have to be configured before
> decoding so an application MUST be able to decode any speex stream
> it receives provided that the sample rate was correctly negotiated.

Actually, even with an incorrect sampling rate (narrowband vs wideband),
the Speex decoder will be able to cope. Again, I totally agree with the
idea of getting clients to accept pretty much anything, I'm just trying
to allow that while still taking into account the fact that some clients
just don't have enough bandwidth or even enough RAM/ROM/MIPS to really
handle anything that is sent to them. I'm definitely interested in any
suggestion that can make both possible though.

> today, many speex applications I've seen are broken on the receiver side,
> because they configure decoders using SDP negotiation "wish" or "static
> configuration": providing information about this can be valuable.

Not sure I understand what you mean here. Again suggestions welcome.

Jean-Marc

> tks,
> Aymeric MOIZARD / ANTISIP
> amsip - http://www.antisip.com
> osip2 - http://www.osip.org
> eXosip2 - http://savannah.nongnu.org/projects/exosip/
>
>
> On Tue, 15 May 2007, Alfred E. Heggestad wrote:
>
>> Hi all
>>
>> We are about to send an updated version of the internet draft
>> "RTP Payload Format for the Speex Codec" to the IETF AVT working group.
>> Before submitting we would like your input, if you have any comments
>> or input please send them to the mailing list.
>>
>> If we don't get any comments in 1 week (by 22. May 2007) we will go ahead
>> and submit it to the IETF. Of course you can comment on it also after it
>> has been submitted, but we would like to get the input from the Speex
>> community first..
>>
>> The Internet Draft is attached.
>>
>> /alfred
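
The two constraints discussed in the message above (the rtpmap clock rate
must be 8000, 16000 or 32000, and a packet must hold a whole number of
20 ms Speex frames) could be checked roughly as follows. This is only a
minimal sketch: the enum and function names are hypothetical, chosen for
illustration, and are not part of libspeex or of the draft.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only (not libspeex API). */
enum speex_band {
    SPEEX_BAND_INVALID = -1,
    SPEEX_BAND_NARROW,      /* speex/8000  */
    SPEEX_BAND_WIDE,        /* speex/16000 */
    SPEEX_BAND_ULTRAWIDE    /* speex/32000 */
};

#define SPEEX_FRAME_MS 20   /* a Speex frame always covers 20 ms */

/* Map the clock rate from "a=rtpmap:<pt> speex/<rate>" to a band. */
enum speex_band speex_band_from_rate(int rate_hz)
{
    switch (rate_hz) {
    case 8000:  return SPEEX_BAND_NARROW;
    case 16000: return SPEEX_BAND_WIDE;
    case 32000: return SPEEX_BAND_ULTRAWIDE;
    default:    return SPEEX_BAND_INVALID;
    }
}

/* True if ptime_ms is positive and a whole number of 20 ms frames. */
bool speex_ptime_is_frame_multiple(int ptime_ms)
{
    return ptime_ms > 0 && ptime_ms % SPEEX_FRAME_MS == 0;
}
```

With a single source of truth for the rate (the rtpmap line), no separate
"sr" or "ebw" parameter is needed to pick the band.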
Comments inline.

On Wed, 16 May 2007, Jean-Marc Valin wrote:

>> Page 3:
>>
>>   To be compliant with this specification, implementations MUST support
>>   8 kHz sampling rate (narrowband)" and SHOULD support 8 kbps bitrate.
>>   The sampling rate MUST be 8, 16 or 32 kHz.
>>
>> There is a typo above: after (narrowband), there is an extra " character.
>>
>> I don't understand what is the motivation to specify "SHOULD support 8
>> kbps bitrate".
>
> The main idea is that Speex supports many bit-rates, but for one reason
> or another, some modes may be left out in implementations (e.g. for RAM
> or network reasons). What we're saying here is that you should make an
> effort to at least support (and offer) the 8 kbps mode to maximise
> compatibility.

I understood this. But as you may know: the SDP parameters are PROPOSAL
only and a remote application might use another "mode": this typically
leads to interoperability issues and you should advise in the specification
to always support all "modes". I understand this can be seen as a
limitation, but in real world, it will not be acceptable to support
only a few modes among the provided ones.

>> Page 8:
>>
>>   Optional parameters:
>>
>>   ptime: see RFC 4566. SHOULD be a multiple of 20 msec.
>>
>>   maxptime: see RFC 4566. SHOULD be a multiple of 20 msec.
>>
>> In the real world, many SIP applications use either 20 or 30ms. This
>> ptime parameter is really not reliable for negotiation... One possible
>> way to handle non-multiples would be to take the next value above:
>> if 30ms is specified, then recommend using 40ms for speex.
>
> Actually, it needs to be "MUST" be a multiple of 20 ms because no matter
> what happens, Speex frames are 20 ms long. I expect most clients would
> use 20 ms, as it corresponds to one packet.
> As to what we need to do if the ptime is invalid, I'm not quite sure,
> though maybe as you say round it up (or down?).

I understand that speex needs multiple 20ms packets: for speex
"packetisation interval MUST be a multiple of 20ms", but you have
to provide a specification compliant with other ones: "ptime" can
have any other value and there can't be a MUST there.

Rounding it up is a much better idea: usually, 30ms is used when
20ms would introduce too much bandwidth overhead: if you round
it down, then you would get less quality.

>> Page 10:
>>
>>   The value of the sampling frequency is typically 8000 for narrow band
>>   operation, 16000 for wide band operation, and 32000 for ultra-wide
>>   band operation.
>>
>> The word "typically" means to me that it could be something else than
>> 8000, 16000 or 32000: I would recommend to make it clear:
>>
>>   The value of the sampling frequency MUST be either 8000 for narrow band
>>   operation, 16000 for wide band operation, and 32000 for ultra-wide
>>   band operation.
>
> Agreed.

Good.

>>   ptime: duration of each packet in milliseconds.
>>
>> http://www.ietf.org/rfc/rfc4566.txt specifies in the ptime definition:
>> "it is intended as a recommendation for the encoding/packetisation of
>> audio". Thus, I would recommend to specify the same text as in rfc3264
>> for sdp offer/answer model:
>>
>>   "If the ptime attribute is present for a stream, it indicates the
>>   desired packetization interval that the offerer would like to
>>   receive. The ptime attribute MUST be greater than zero."
>>
>> It might also be a good idea to say that even if an offerer would like
>> to receive 20ms, the sender MAY use a different packetization interval...
>> This is the origin of numerous interop issues with speex in SIP
>> applications.
>
> Sounds fair. Just curious, what's the exact interop issue?

Some applications "allocate" a buffer based on the "ptime": thus
they copy 20ms of PCMU data each time they get a packet even
if they receive packets each 30ms...
The sound cards play 2/3 of the data received... This happens more
than you would imagine. Look at this implementation of current
iLBC in asterisk:

http://www.asteriskpbx.org/doxygen/trunk/codec__ilbc_8c-source.html

The initEncode is called with a #define:

    #define ILBC_MS 30
    /* #define ILBC_MS 20 */

    static int ilbctolin_new(struct ast_trans_pvt *pvt)
    {
        struct ilbc_coder_pvt *tmp = pvt->pvt;

        initDecode(&tmp->dec, ILBC_MS, USE_ILBC_ENHANCER);

        return 0;
    }

The above means: If you negotiate a different packetisation interval:
just recompile ;(

Weird... Old speex implementation in asterisk was doing the same, it
seems it's now fixed in latest version.

>>   sr: actual sample rate in Hz.
>>
>>   ebw: encoding bandwidth - either 'narrow' or 'wide' or 'ultra'
>>   (corresponds to nominal 8000, 16000, and 32000 Hz sampling rates).
>>
>> Both the "sr" and "ebw" conflict with the speex/XXXX rtpmap. I really
>> recommend to remove both those definitions so that applications will
>> configure themselves using either speex/8000, speex/16000, speex/32000.
>> Having 3 ways to specify the sampling rate is a nightmare for interop.
>
> Had missed that one. It definitely makes sense. The original draft
> allowed selecting the narrowband/wideband encoder independently
> of the sampling rate, but in retrospect, that was just plain wrong.

I'm so happy with that answer: I hope there is a consensus here.

>> Page 11:
>>
>>   mode: Speex encoding mode. Can be {1,2,3,4,5,6,any} defaults to 3
>>   in narrowband, 6 in wide and ultra-wide.
>>
>> I always asked for a "table" in the specification here providing link
>> between "mode" and "bitrate".
>> Else, you get those mails:
>>
>> http://lists.xiph.org/pipermail/speex-dev/2006-March/004288.html
>>
>> If I get it right, the table is there:
>> http://www.speex.org/docs/manual/speex-manual/node10.html
>> Table 4: Quality versus bit-rate
>>
>> Also, this table exists for narrowband, but still it does not for
>> wideband or ultrawideband: it would be nice to get also those ones. I
>> was really lost implementing this in my SIP application.
>
> Yes, I just checked that into svn. Will be part of the 1.2beta2
> manual (expected soon).

And will you add those tables in the draft?

>>   Examples:
>>
>>   m=audio 8008 RTP/AVP 97
>>   a=rtpmap:97 speex/8000
>>   a=fmtp:97 mode=4
>>
>>   This example illustrates an offerer that wishes to receive a Speex
>>   stream at 8000Hz, but only using speex mode 4.
>>
>> Is it a recommendation or a MUST: for me, and to allow better
>> interoperability, an application is sending "mode=4" because it
>> wishes to receive "mode=4": but, in case the remote application
>> can only send "mode=3", the receiver MUST be prepared to receive
>> ANY mode. We can't get interoperability without this and I would
>> recommend to specify that such use-case will often happen in real
>> world and that it MUST be supported.
>
> Well, the idea is what happens if all modes can't be supported for some
> reason. This is why we were saying 8 kbps (mode 3) SHOULD be supported.
> In practice, we can also strongly recommend supporting all modes, but
> I'm not sure I want to say MUST for that.

I guess you already have my idea about this: all modes should be supported
unless you know you won't have issues.

One good thing with g729 and its extension (g729 annex b?) is that you
can still receive g729b if you support only g729: this is transparent
(as far as I understood it).

For speex, the modes are not transparent and thus, if I was the one
to choose, I would add in the draft: ALL MODES MUST BE SUPPORTED ON
THE RECEIVER SIDE. That's experience of real world.
The other way would be to make it transparent like g729.

>>   Several Speex specific parameters can be given in a single a=fmtp
>>   line provided that they are separated by a semi-colon:
>>
>>   a=fmtp:97 mode=any;mode=1
>>
>> No error here: just curious why you want to allow this? Wouldn't it
>> be nice to specify that the order of mode parameter is significant?
>> I guess this is what you want? (in that case, "mode=1,mode=any" might
>> be more meaningful?)
>
> No longer sure why we had that... Albert? Greg?
>
>> More generally, I would really like to have a line specifying that
>> whatever you proposed (ptime, mode, vbr, cng), the sender could
>> use a different encoder configuration for any reason (bandwidth reason
>> or lazy developer): a speex decoder doesn't have to be configured before
>> decoding so an application MUST be able to decode any speex stream
>> it receives provided that the sample rate was correctly negotiated.
>
> Actually, even with an incorrect sampling rate (narrowband vs wideband),
> the Speex decoder will be able to cope. Again, I totally agree with the
> idea of getting clients to accept pretty much anything,

Good.

> I'm just trying to allow that while still taking into account the fact
> that some clients just don't have enough bandwidth or even enough
> RAM/ROM/MIPS to really handle anything that is sent to them. I'm
> definitely interested in any suggestion that can make both possible
> though.

Make "mode" transparent! or forget about this. My own opinion...

>> today, many speex applications I've seen are broken on the receiver side,
>> because they configure decoders using SDP negotiation "wish" or "static
>> configuration": providing information about this can be valuable.
>
> Not sure I understand what you mean here. Again suggestions welcome.

I mean always the same: be prepared to decode all modes, no matter what
you sent in the SDP as preference.
For example, xlite used to have a speex "quality" parameter and no
negotiation was done: if you were sending data with another mode,
the audio was not decoded. This was exactly the same issue as
the one described above for the iLBC decoder in asterisk.

Aymeric

> Jean-Marc
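
The "round it up" suggestion in the message above, and the playback-buffer
problem it describes, can be illustrated with a little arithmetic. The
sketch below is an assumption-laden illustration, not text from the draft:
the function names are hypothetical, and it assumes the receiver counts the
frames it actually decoded from a packet instead of trusting the ptime it
offered.

```c
#define SPEEX_FRAME_MS 20   /* a Speex frame always covers 20 ms */

/* Round a requested ptime up to the next multiple of 20 ms
 * (for example 30 -> 40). Returns 0 for a non-positive ptime. */
int speex_round_ptime_up(int ptime_ms)
{
    if (ptime_ms <= 0)
        return 0;
    return ((ptime_ms + SPEEX_FRAME_MS - 1) / SPEEX_FRAME_MS) * SPEEX_FRAME_MS;
}

/* Number of 20 ms frames a sender would put in each packet. */
int speex_frames_per_packet(int ptime_ms)
{
    return speex_round_ptime_up(ptime_ms) / SPEEX_FRAME_MS;
}

/* PCM samples to hand to the sound card for one received packet,
 * computed from the number of frames actually decoded rather than
 * from the negotiated ptime. */
int speex_pcm_samples_in_packet(int frames_decoded, int rate_hz)
{
    return frames_decoded * (rate_hz / 1000) * SPEEX_FRAME_MS;
}
```

Under these assumptions an offered ptime of 30 becomes 40 ms, i.e. two
frames per packet, and a receiver at 8000 Hz plays 320 samples for such a
packet whatever value it put in its own SDP.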
>> The main idea is that Speex supports many bit-rates, but for one reason
>> or another, some modes may be left out in implementations (e.g. for RAM
>> or network reasons). What we're saying here is that you should make an
>> effort to at least support (and offer) the 8 kbps mode to maximise
>> compatibility.
>
> I understood this. But as you may know: the SDP parameters are PROPOSAL
> only and a remote application might use another "mode": this typically
> leads to interoperability issues and you should advise in the specification
> to always support all "modes". I understand this can be seen as a
> limitation, but in real world, it will not be acceptable to support
> only a few modes among the provided ones.

Consider a device that only has enough ROM to store one set of
quantization tables (the limitation could also be about speed, network,
...). If you specify MUST be able to decode, then it means that this
device simply *cannot* implement the spec *at all*. This is bad for
interoperability.

> I understand that speex needs multiple 20ms packets: for speex
> "packetisation interval MUST be a multiple of 20ms", but you have
> to provide a specification compliant with other ones: "ptime" can
> have any other value and there can't be a MUST there.
>
> Rounding it up is a much better idea: usually, 30ms is used when
> 20ms would introduce too much bandwidth overhead: if you round
> it down, then you would get less quality.

Fair enough.

> Some applications "allocate" a buffer based on the "ptime": thus
> they copy 20ms of PCMU data each time they get a packet even
> if they receive packets each 30ms...
>
> The sound cards play 2/3 of the data received... This happens more
> than you would imagine. Look at this implementation of current
> iLBC in asterisk:

Oh, I've seen a lot worse... like calling speex_encode() with 640 u-law
samples instead of 160 floats (hey, it's the same number of bytes) to
get 4x more compression!
Though in this case, no IETF draft can save you :-)

>>> Also, this table exists for narrowband, but still it does not for
>>> wideband or ultrawideband: it would be nice to get also those ones. I
>>> was really lost implementing this in my SIP application.
>>
>> Yes, I just checked that into svn. Will be part of the 1.2beta2
>> manual (expected soon).
>
> And will you add those tables in the draft?

Hadn't thought about it, but why not. Wouldn't take too much space and
it would make things simpler.

>> Well, the idea is what happens if all modes can't be supported for some
>> reason. This is why we were saying 8 kbps (mode 3) SHOULD be supported.
>> In practice, we can also strongly recommend supporting all modes, but
>> I'm not sure I want to say MUST for that.
>
> I guess you already have my idea about this: all modes should be supported
> unless you know you won't have issues.
>
> One good thing with g729 and its extension (g729 annex b?) is that you
> can still receive g729b if you support only g729: this is transparent
> (as far as I understood it).
>
> For speex, the modes are not transparent and thus, if I was the one
> to choose, I would add in the draft: ALL MODES MUST BE SUPPORTED ON
> THE RECEIVER SIDE. That's experience of real world.

As I said, it's not possible unless you explicitly exclude some devices
from being able to implement this. However, I'm not against saying
something to the effect that if the client/device is physically capable
of encoding/decoding a mode, then it MUST do it -- or something like
that. Again, I'm open to any suggestion that doesn't involve banning
certain devices outright.

Another thing to consider.
Even if I'm able to do everything, if I'm on a 33.6 kbps modem link and
you attempt to send me 24.6 kbps with a ptime of 20 ms, it won't work,
no matter what, and the client might as well try something else (even if
that something else is LPC10!).

> The other way would be to make it transparent like g729.

Not sure what kind of transparency you mean? The Speex decoder (unless
you remove some tables) is able to decode anything without even knowing
how it was encoded.

>> I'm just trying to allow that while still taking into account the fact
>> that some clients just don't have enough bandwidth or even enough
>> RAM/ROM/MIPS to really handle anything that is sent to them.
>> I'm definitely interested in any suggestion that can make both
>> possible though.
>
> Make "mode" transparent! or forget about this. My own opinion...

Again, what do you mean exactly by transparent?

> I mean always the same: be prepared to decode all modes, no matter what
> you sent in the SDP as preference.
>
> For example, xlite used to have a speex "quality" parameter and no
> negotiation was done: if you were sending data with another mode,
> the audio was not decoded. This was exactly the same issue as
> the one described above for the iLBC decoder in asterisk.

If all you mean is "do your best to decode anything you get no matter
how different it is from what you asked for", then I agree.

Jean-Marc
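
The modem example in the message above can be checked with a rough
calculation of the bit-rate on the wire. The sketch below is only an
estimate under stated assumptions: IPv4 without options (20 bytes), UDP
(8 bytes) and a basic RTP header with no CSRC list or extension
(12 bytes), no header compression, and link-layer framing ignored. The
function name is hypothetical.

```c
/* Assumed per-packet overhead: IPv4 (20) + UDP (8) + RTP (12) bytes. */
#define IPV4_UDP_RTP_HEADER_BYTES (20 + 8 + 12)

/* Approximate on-the-wire bit-rate, in bits per second, of an RTP
 * stream carrying codec_bps of audio in packets of ptime_ms each. */
long rtp_wire_bitrate_bps(long codec_bps, int ptime_ms)
{
    long payload_bits  = codec_bps * ptime_ms / 1000;  /* audio bits per packet */
    long payload_bytes = (payload_bits + 7) / 8;       /* padded to whole octets */
    long packet_bits   = (payload_bytes + IPV4_UDP_RTP_HEADER_BYTES) * 8;

    return packet_bits * 1000 / ptime_ms;              /* packets/s * bits/packet */
}
```

With these assumptions, 24.6 kbps at a ptime of 20 ms comes to 40.8 kbps
on the wire, which is more than a 33.6 kbps modem link can carry, while
the 8 kbps mode at the same ptime comes to 24 kbps.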