I don't know why you're getting sound carrying over to the next time you encode - that doesn't sound normal to me. Have you tried saving and examining the raw audio you're feeding to the encoder? Have you tried encoding and decoding that using speexenc/speexdec?

I use Speex in VBR mode for a VoIP app. I'm always recording and running the audio through the preprocessor (denoise, AGC) while a session is established. I only encode and transmit while speech is detected (if the user has chosen VAD mode) or while a button is held down (if the user has chosen PTT mode). (VAD and PTT here are application-level constructs - I'm not using Speex VAD or DTX.) No sound is carried over from the end of one burst to the beginning of the next. I use the same encoder/decoder objects for the life of the session and don't do anything to reset state between bursts.

As for VBR quality, VBR mode is designed to target a specified level of quality without guaranteeing how much bandwidth might be used at any particular moment. So it's ideal when you want the best tradeoff between quality and bandwidth and don't need a strict cap on bandwidth. Try watching a graph of bandwidth utilization as you're talking and transmitting using VBR at a particular VBR quality setting. Try varying the VBR quality while listening for the difference in audio quality and watching the difference in the bandwidth graph. It behaves as one would expect - I doubt you'll be surprised by the results.

Tom

"Chris Weiland" <hobbiticus at gmail.com> wrote:
>
> I'm writing a voice communication application, and I've got a few
> issues that I'd like to get ironed out, but I don't know enough about
> the speex implementation.
>
> First of all, this application is mainly used for conferencing - many
> people are in a room and only 1-2 are ever talking at a time. So,
> always encoding and transmitting everyone's audio stream would be
> rather wasteful.
> I also do not want to be creating and destroying
> encoder and decoder objects every time someone starts/stops talking.
>
> Ideally, what I'd like to do is record/encode when someone pushes the
> talk key, then stop when they release it, AND reuse the same
> encoder/decoder objects every time this happens. However, if the
> person lets go of the talk key while there is still audible sound,
> that sound will carry over to the next time they start transmitting.
> I know why this happens, but I don't know the proper way to prevent
> this from happening. Right now, I encode a few frames of silence and
> send them over the network in order to "reset" the encoder state, but
> this is not ideal bandwidth-wise.
>
> Is there a better way to do this? I know that DTX handles a similar
> problem, but I'm not sure if it would do any good.
>
> Secondly, I'm curious to know if VBR really has any drawbacks in terms
> of quality. Specifically, can I be guaranteed to achieve the same
> quality with AT MOST the same bandwidth as with CBR, or can a VBR
> encoded frame actually be significantly bigger than a CBR encoded
> frame of the same quality? I also noticed that VBR has its own quality
> setting - does this override the main quality setting when VBR is
> enabled? Is there any noticeable difference in quality (audibly or
> mathematically) between a CBR and VBR stream with the same quality
> setting?
>
> Basically, I'd like to know if there is any reason to NOT have VBR
> enabled for my application. It's meant to run over broadband
> internet, but I know that whoever is hosting would like to conserve
> their bandwidth. However, I also want to maximize voice quality for
> the users.
>
>
> -Chris Weiland
> _______________________________________________
> Speex-dev mailing list
> Speex-dev at xiph.org
> http://lists.xiph.org/mailman/listinfo/speex-dev
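
A rough way to get the bandwidth graph suggested in the reply is to log the size in bytes of each encoded frame and convert runs of frames to kbit/s. The sketch below is plain Python and is not tied to any Speex binding; the function names, the one-second window, and the sample frame sizes are illustrative assumptions, not measurements. It assumes 20 ms frames and counts payload bytes only, so packet headers (RTP/UDP/IP) are not included.

```python
FRAME_MS = 20  # assumed frame duration in milliseconds


def bitrate_kbps(frame_sizes_bytes, frame_ms=FRAME_MS):
    """Average payload bitrate in kbit/s over a run of encoded frames."""
    if not frame_sizes_bytes:
        return 0.0
    total_bits = 8 * sum(frame_sizes_bytes)
    duration_ms = frame_ms * len(frame_sizes_bytes)
    # bits per millisecond is numerically equal to kbit/s
    return total_bits / duration_ms


def windowed_bitrate(frame_sizes_bytes, window=50, frame_ms=FRAME_MS):
    """Bitrate for each consecutive window of frames.

    With 20 ms frames, window=50 gives one value per second.
    """
    return [
        bitrate_kbps(frame_sizes_bytes[i:i + window], frame_ms)
        for i in range(0, len(frame_sizes_bytes), window)
    ]


# Hypothetical log: one second of larger frames, then one second of smaller ones.
sizes = [20] * 50 + [10] * 50
per_second = windowed_bitrate(sizes)
```

Recording a size of 0 for frames that were not transmitted (between VAD or PTT bursts) makes the same plot show the savings from not sending silence. Repeating the run at several VBR quality settings gives curves that can be compared directly.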