Hi all,

I am quite new to voice encoding and to the Speex library. I have some programming questions that I have not been able to answer even after reading the documentation (from start to end and back again), the sample code provided in the documentation, and the speexdec.c and speexenc.c programs.

I am currently working on an interface between Java code and C code so that the C version of the Speex library can be used inside Java programs.

I have to encode a 44100 Hz, 16-bit stereo stream, and I would like to encode it in packets containing about 1/10 s of unencoded speech (this value is not fixed; it will be an integer multiple of the Speex encoder frame size, chosen to give a duration of about 1/10 of a second). The duration imposed by the Speex encoder frame size is a little too short for the application.

My problems are the following:

0) Is it possible to manipulate voice data representing more than the frame size of the encoder? (This seems possible in the Speex Java code I have used until now.)

1) How can I tell the encoder that I am using 16-bit stereo data? The ctl function only allows setting the sampling rate. I have not found any details anywhere. In speexenc.c I have seen some stereo-related functions used, but I do not understand well how to use them.

2) What exactly is the frame size? Is it an integer representing a number of samples, a number of bytes, or a number of shorts? (In sampleenc.c it seems to be a number of shorts.)

3) How (if it is possible) do I decode a known amount of coded data when that data represents more than one encoder frame? This situation arises when coded data covering a duration greater than the decoder frame size is used. Should I call the decoding procedure an integer number of times (as is done in the Java code of Speex)?

4) What is the difference between the "SPEEX_GET_FRAME_SIZE" and "SPEEX_MODE_FRAME_SIZE" operations? They should not be identical, since they are provided by two different functions, but I do not see any explanation of the difference in the documentation.

I would greatly appreciate it if somebody has answers to these questions.

Thanks and best regards,

Alain Aubord
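The packet sizing described in the question (a whole number of codec frames adding up to roughly 1/10 s) comes down to a little integer arithmetic. The following is only a sketch: the function name frames_per_packet is made up for illustration, and the sample rate and frame size are whatever the chosen encoder mode reports, passed in by the caller.

```c
#include <assert.h>

/* Hypothetical helper (not part of Speex): number of codec frames whose
 * total duration is closest to target_ms.
 * sample_rate is in Hz; frame_size is in samples per channel. */
static int frames_per_packet(int sample_rate, int frame_size, int target_ms)
{
    assert(sample_rate > 0 && frame_size > 0 && target_ms > 0);

    /* Samples (per channel) covered by the target duration. */
    long target_samples = (long)sample_rate * target_ms / 1000;

    /* Round to the nearest whole number of frames. */
    long frames = (target_samples + frame_size / 2) / frame_size;

    /* A packet must carry at least one frame. */
    return frames < 1 ? 1 : (int)frames;
}
```

For example, with 160-sample frames at 8000 Hz each frame lasts 20 ms, so frames_per_packet(8000, 160, 100) gives 5 frames per packet.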
> 0) Is it possible to manipulate voice data representing more than
> the frame size of encoder (this seems possible in Speex Java code
> used until now).

Speex itself manipulates audio one frame at a time, but what you do after that is up to you.

> 1) How can I specify to the encoder that I use 16 bits stereo data ?
> The ctl function allows only to set the sampling rate. I have
> found nowhere some details. In speexenc.c I have seen the use
> of some related to stereo functions but without understanding
> well how to use them.

speex_encode is for float and speex_encode_int is for 16-bit short integer.

> 2) What is exactly the frame size ? It is an integer representing
> a number of sample or a number of bytes or even a number of
> shorts (in sampleenc.c this seems to be a number of shorts)

Always the number of samples. Speex doesn't care what the original encoding was.

> 3) How (if this is possible) to decode a certain amount (known) of
> coded data if these coded data represent more than the frame size
> of encoder. This situation arise when the coded data for a
> duration greater than the frame size of decoder are used. Should
> I call some integer number of times the decoding procedure (like
> it is done in the java code of Speex).

Same as encoding. Decoding decodes one frame, but you may call it as many times as you like.

> 4) What is the difference "SPEEX_GET_FRAME_SIZE" and
> "SPEEX_MODE_FRAME_SIZE" operations. This should not be identical
> since these operations are provided by two different functions
> but I don't see any explanation of a difference in the
> documentation.

Easy. You don't have to create an encoder/decoder state in order to use SPEEX_MODE_FRAME_SIZE.

Jean-Marc

--
Jean-Marc Valin <Jean-Marc.Valin@USherbrooke.ca>
Université de Sherbrooke
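To illustrate the frame-at-a-time approach from the reply above, here is a minimal sketch in C of walking a larger buffer one frame at a time. It deliberately does not call into Speex: for_each_frame, frame_fn and count_samples are hypothetical names invented for this example, and in a real program the callback would presumably wrap one call to speex_encode_int (or the decode function) per frame. The frame size is counted in samples, as stated in the answer to question 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame callback. In a real program this is where one
 * frame of frame_size samples would be handed to the codec. */
typedef void (*frame_fn)(const int16_t *frame, int frame_size, void *user);

/* Feed total_samples samples to fn, one frame at a time.
 * Returns the number of whole frames processed; a tail shorter than
 * one frame is not passed to fn and is left for the caller to buffer. */
static size_t for_each_frame(const int16_t *pcm, size_t total_samples,
                             int frame_size, frame_fn fn, void *user)
{
    assert(pcm != NULL && fn != NULL && frame_size > 0);

    size_t frames = total_samples / (size_t)frame_size;
    for (size_t i = 0; i < frames; i++)
        fn(pcm + i * (size_t)frame_size, frame_size, user);
    return frames;
}

/* Example callback that only counts the samples it is given. */
static void count_samples(const int16_t *frame, int frame_size, void *user)
{
    (void)frame;                      /* contents unused in this demo */
    *(size_t *)user += (size_t)frame_size;
}
```

Calling for_each_frame on a 1000-sample buffer with a frame size of 160 invokes the callback 6 times and leaves the last 40 samples unprocessed, so the caller has to carry them over into the next buffer.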
Alain,

> I have to encode a stream with 44100 Khz 16 bits Stereo and I would encode
> it by using packets containing a duration of about 1/10 sec of unencoded
> speech (this value is not fixed and will be an integer multiple of the
> Speex encoder frame size to have a duration of about 1/10 of a second).
> The duration imposed by the Speex encoder frame size is a little too short
> for the application.

You are better off using the Ogg Vorbis codec. Speex is meant specifically for telephonic voice. It takes a single human voice and compresses it well. It cannot handle multiple voices or music very well. For instance, a whistle can, after a while, completely fade out on Speex.

> My problems are the following:
> 0) Is it possible to manipulate voice data representing more than
> the frame size of encoder (this seems possible in Speex Java code
> used until now).

Whatever size chunks you have, for Speex you will have to repack them as 160 samples per frame.

> 1) How can I specify to the encoder that I use 16 bits stereo data ?
> The ctl function allows only to set the sampling rate. I have
> found nowhere some details. In speexenc.c I have seen the use
> of some related to stereo functions but without understanding
> well how to use them.

Simply use two instances of Speex state structures, one for each channel.

> 2) What is exactly the frame size ? It is an integer representing
> a number of sample or a number of bytes or even a number of
> shorts (in sampleenc.c this seems to be a number of shorts)

For sanity's sake, just accept that Speex will work only with 160 samples per frame.

> 3) How (if this is possible) to decode a certain amount (known) of
> coded data if these coded data represent more than the frame size
> of encoder. This situation arise when the coded data for a
> duration greater than the frame size of decoder are used. Should
> I call some integer number of times the decoding procedure (like
> it is done in the java code of Speex).

The decode function will internally decode the data 160 samples at a time from each frame. It can detect that the bit stream has multiple frames (each decoding to 160 samples).

> 4) What is the difference "SPEEX_GET_FRAME_SIZE" and
> "SPEEX_MODE_FRAME_SIZE" operations. This should not be identical
> since these operations are provided by two different functions
> but I don't see any explanation of a difference in the
> documentation.

SPEEX_GET_FRAME_SIZE will return the size of the uncompressed frame (which is 160). It is the number of PCM samples that each Speex frame represents. Speex is a variable bit rate codec. Hence, the 160 samples are compressed into a Speex packet of a size that is determined by the selected mode of compression. SPEEX_MODE_FRAME_SIZE returns the size of the compressed frame, and it will vary depending upon the mode selected.

- farhan
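The suggestion of one codec state per channel implies splitting the stereo stream into two mono streams before encoding. A minimal sketch in C, assuming the 16-bit PCM is interleaved as left, right, left, right (the thread does not say how the samples are laid out, so that ordering is an assumption); deinterleave_stereo is a hypothetical helper, not a Speex function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split interleaved stereo PCM (assumed order
 * L, R, L, R, ...) into two mono buffers.
 * frames is the number of sample pairs; left and right must each have
 * room for that many samples. */
static void deinterleave_stereo(const int16_t *interleaved, size_t frames,
                                int16_t *left, int16_t *right)
{
    assert(interleaved != NULL && left != NULL && right != NULL);

    for (size_t i = 0; i < frames; i++) {
        left[i]  = interleaved[2 * i];      /* even index: left channel */
        right[i] = interleaved[2 * i + 1];  /* odd index: right channel */
    }
}
```

Each mono buffer could then be cut into frames and passed to its own encoder state. Note that a buffer of N sample pairs yields N samples per channel, so the frame size applies to each channel separately.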