Hello,

*Background:*
RFC 5574 defines the RTP payload format for the Speex codec. Forming the payload is straightforward: the encoded frames are concatenated one after another. Once the desired number of frames has been appended, the stream is padded with a 0 followed by 1s (e.g. 01111) so that the payload ends on an octet boundary.

*Observation:*
I am using the Speex encoder at 2150 bps (by setting the quality to 0). For a 20 ms frame of 160 samples (at a sampling rate of 8000 samples per second), the encoder gives me 6 bytes of encoded output. As a test case, I encoded 10 frames one after another, each time getting 6 bytes of encoded output, and concatenated the 6-byte outputs.

As suggested in a couple of posts, I tried to decode this stream of encoded voice by calling the decoder repeatedly until the bits-remaining API returned a value less than 1.

What I observed was this sequence: the first call returned a successful decode; the second returned end of stream; the third returned a successful decode; the fourth returned end of stream; ...

That is: decode success, EoS, decode success, EoS, decode success, EoS, ....

*Hypothesis:*
Based on the above observation, what might be happening is this: for a 20 ms frame (=> 50 frames per second), the encoder (running at 2150 bps) produces 43 bits of encoded stream. Since it has to return whole bytes, it pads with the 01111 sequence to give a 48-bit output. When decoding, the first 43 bits are decoded; then the 01111 sequence is interpreted as end of stream; then the next 43 bits are decoded and the next 01111 is interpreted as end of stream, and so on.

*Query:*
For Speex, when packing multiple encoded frames into an RTP packet, should we
a. pack each encoded frame in full bytes as received from the encoder (i.e. 48 bits), or
b. chop the end-of-stream marker (0 followed by 1s) from each frame (i.e. keep strictly 43 bits) and use the 0-followed-by-1s sequence only to pad the payload to an octet boundary?

*Reason for the query:*
I want to implement RTP packetization that is interoperable. If the receiver is not under my control, it should still be able to decode the stream that I am sending.

Regards,
Manish S. Jalan
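The arithmetic behind the hypothesis can be checked with a few lines of Python. This is only a sketch of the numbers quoted in the message (2150 bps, 20 ms frames, 10 frames); the helper names `frame_bits` and `padded_bytes` are made up for illustration and are not part of any Speex API.

```python
def frame_bits(bitrate_bps: int, frame_ms: int) -> int:
    """Bits produced per frame at a constant bitrate."""
    return bitrate_bps * frame_ms // 1000


def padded_bytes(total_bits: int) -> int:
    """Whole octets needed to hold total_bits (rounding up)."""
    return (total_bits + 7) // 8


BITRATE_BPS = 2150   # from the message
FRAME_MS = 20        # from the message
N_FRAMES = 10        # size of the test case in the message

bits = frame_bits(BITRATE_BPS, FRAME_MS)
per_frame_bytes = padded_bytes(bits)
per_frame_pad = per_frame_bytes * 8 - bits

# Option (a): every frame padded to whole bytes, then concatenated.
bytes_option_a = per_frame_bytes * N_FRAMES

# Option (b): frames packed back to back, padded once at the end.
packed_bits = bits * N_FRAMES
bytes_option_b = padded_bytes(packed_bits)
final_pad = bytes_option_b * 8 - packed_bits

print(f"bits per frame:          {bits}")
print(f"bytes per padded frame:  {per_frame_bytes} ({per_frame_pad} pad bits)")
print(f"option (a) payload size: {bytes_option_a} bytes")
print(f"option (b) payload size: {bytes_option_b} bytes ({final_pad} pad bits)")
```

So a single frame carries 5 padding bits, which matches the 01111 pattern, while ten frames packed contiguously would need only 2 padding bits in total.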
You cannot concatenate bytes because Speex frames don't necessarily end on octet boundaries. You need to call the encoder multiple times on the same SpeexBits bitpacket.

Jean-Marc
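To make the difference concrete, here is a minimal Python sketch that models the two approaches at the bit level. It does not call libspeex: `pack_frames` is a hypothetical helper, the frame values are arbitrary stand-ins for real encoder output, and MSB-first packing is an assumption made for the illustration. In libspeex terms, the second approach would presumably correspond to encoding every frame into one `SpeexBits` and writing the bytes out once at the end.

```python
FRAME_BITS = 43  # bits per 20 ms frame at 2150 bps, as computed in the thread


def pack_frames(frames, nbits=FRAME_BITS):
    """Append each frame's nbits MSB-first, then pad once to an octet boundary.

    Padding is a single 0 followed by 1s, as described for RFC 5574.
    """
    bits = []
    for value in frames:
        bits.extend((value >> shift) & 1 for shift in range(nbits - 1, -1, -1))
    pad = -len(bits) % 8
    if pad:
        bits.extend([0] + [1] * (pad - 1))
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[pos:pos + 8]))
        for pos in range(0, len(bits), 8)
    )


# Stand-in 43-bit frame payloads (hypothetical data, not real Speex output).
frames = [(1 << FRAME_BITS) - 1 - i for i in range(10)]

# Concatenating per-frame byte buffers: each frame carries its own padding,
# so a 01111 pattern sits between consecutive frames.
naive = b"".join(pack_frames([f]) for f in frames)

# Packing all frames into one bit buffer: padding appears only once, at the end.
payload = pack_frames(frames)

print(len(naive), len(payload))
```

With these inputs the per-frame concatenation is 60 bytes with padding after every frame, while the single bit buffer is 54 bytes with padding only in the last octet.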
Hello Jean-Marc,

We really appreciate your input. If I understand correctly, we should call the encoder on the same SpeexBits structure, passing it one frame at a time, as many times as the number of frames we want to pack into the RTP payload. The output obtained from the encoder will then have the necessary padding at the end, with no separators between the individual encoded frames.

That would be really cool. Otherwise, I was scratching my head over how we would have to chop the padding from each encoded frame before it could be put in the RTP payload. This saves a lot of bit operations that the application would otherwise have required.

Thanks and regards,
-Manish S. Jalan

On Thu, Dec 10, 2009 at 5:22 PM, Jean-Marc Valin <jean-marc.valin at usherbrooke.ca> wrote:
> You cannot concatenate bytes because Speex frames don't necessarily end
> on octet boundaries. You need to call the encoder multiple times on the
> same SpeexBits bitpacket.
>
> Jean-Marc