As I understand it, there are two separate ways to get VAD information
from Speex:

1) Using the encoder.
2) Using speex_preprocess().

I present the following observations from an application developer's
perspective. They may be wrong, in which case I would appreciate
corrections.

- The two VAD systems are implemented differently.
- speex_preprocess()'s VAD provides more accurate detection than the
  encoder's VAD, at the cost of more CPU usage.
- speex_preprocess()'s VAD is affected by the AGC and/or denoise state
  more directly than the encoder's VAD.
- Possibly as a result of the previous point, speex_preprocess()'s VAD
  can get into a bad state when given an input that varies drastically
  in amplitude/behavior. After that point its accuracy is ruined, and
  the only solution is to destroy and recreate the preprocess state.

Tom

"Paul Gryting" <paul.gryting@teligy.com> wrote:
> In speexenc.c, speex_preprocess() is not called unless AGC or denoise is
> enabled.
> If only VAD is enabled, it does not get called.
>
> speex_preprocess() has vad_enabled specific code to detect voice activity.
> speex_preprocess()
> {
>   ...
>   ...
>   if (st->vad_enabled)
>     is_speech = speex_compute_vad(st, ps, mean_prior, mean_post);
>
>   ...
>   ...
>   return is_speech;
> }
>
> Some questions for the knowledgeable:
> Is speex_preprocess() needed to use VAD?
>
> Can speex_preprocess() be used to detect silent frames if VAD is enabled,
> but not AGC or denoise?
> What does Speex do differently internally for silent frames when VAD is
> enabled?
>
>
> Paul
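
To make the destroy/recreate workaround from the reply above a little more concrete, here is a minimal sketch in C of one way an application might decide when to rebuild the preprocess state. It is only an illustration under stated assumptions: VadWatchdog, vad_watchdog_init() and vad_watchdog_update() are hypothetical names, not part of Speex, and "the VAD decision has not changed for a long run of frames" is an assumed, crude signal that the VAD may be stuck. A long stretch of genuine silence or speech would trigger it too, so the threshold needs tuning per application.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Speex): tracks how long the per-frame
 * VAD decision has stayed unchanged. */
typedef struct {
    int last_decision; /* last decision seen: 0 = silence, 1 = speech, -1 = none yet */
    int run_length;    /* consecutive frames with that same decision */
    int max_run;       /* assumed threshold, in frames, before we suspect a stuck VAD */
} VadWatchdog;

/* Prepare the watchdog; max_run is an application-chosen frame count. */
void vad_watchdog_init(VadWatchdog *wd, int max_run)
{
    assert(wd != NULL && max_run > 0);
    wd->last_decision = -1;
    wd->run_length = 0;
    wd->max_run = max_run;
}

/* Feed one VAD decision per frame. Returns 1 when the decision has been
 * identical for max_run consecutive frames, which this sketch treats as a
 * hint to destroy and recreate the preprocess state; returns 0 otherwise.
 * The counter restarts after each such hint. */
int vad_watchdog_update(VadWatchdog *wd, int is_speech)
{
    assert(wd != NULL);
    is_speech = is_speech ? 1 : 0; /* normalise any non-zero value to 1 */

    if (is_speech == wd->last_decision) {
        wd->run_length++;
    } else {
        wd->last_decision = is_speech;
        wd->run_length = 1;
    }

    if (wd->run_length >= wd->max_run) {
        wd->last_decision = -1; /* start counting afresh after the hint */
        wd->run_length = 0;
        return 1;
    }
    return 0;
}
```

In an application, the value returned by the preprocessor for each frame would be passed to vad_watchdog_update(), and a return of 1 would be the point at which to call speex_preprocess_state_destroy(), then speex_preprocess_state_init() again, and reapply the VAD/AGC/denoise settings. Whether that actually restores accuracy in a given case is something I have only observed, not verified against the Speex internals.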