Lis,

I suggest you try tweaking Speex's VAD probabilities as Steve suggested. But consider a simple threshold-based approach as a backup option.

Personally, I struggled with Speex's VAD algorithms (both encoder and preprocessor) for a long time. I tweaked the probabilities, wrote special-case code to work around the mistakes, and was still never satisfied with the results:

- In times of really obvious silence, it would detect speech.
- It would often detect brief background noises, such as clicks or typing, as speech.
- Sometimes at the beginning or end of speech, it would detect silence. (It seemed to vary based on the frequency content of the speech.)
- There were issues with VAD and AGC together.

I finally switched to a very simple power threshold check and all of my problems went away. It worked far better than I ever expected. Background noise is not a problem if you just use the Speex denoiser (which is VERY effective) and calculate the power of the signal after that.

This is the function I use to calculate the power:

    // Returns the power of a signal (sample_t is signed 16-bit int).
    // The result is normalized to the range 0.0 .. 1.0.
    float getPower(sample_t *signal, int numSamples)
    {
        float powerSum = 0.0f;
        for (int i = 0; i < numSamples; i++) {
            float amp = (float) abs(signal[i]);
            powerSum += amp * amp;
        }
        return powerSum / (32768.0f * 32768.0f * (float) numSamples);
    }

I can't say that this is optimal or even correct, but it works very well for me. And users rarely have to adjust the threshold as long as they're using AGC to bring the signal up to a proper range.

I don't mean to bash Speex VAD. I really wanted it to work. But for me, in a wideband PC-based VoIP application that relies heavily on VAD, a simple power-threshold approach ended up working much more reliably.

(By the way, I'm curious about power vs. energy here. Doesn't it make more sense to use power instead of energy for VAD? Or maybe the terms are sometimes used interchangeably?)
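To make the threshold check concrete, here is a minimal sketch of how the power value might drive a speech/silence decision. It is only an illustration, not the exact code from my application: the names power_vad_t, power_vad_init and power_vad_process are hypothetical, the threshold value is something you would have to tune, and the "hangover" counter (holding the speech state for a few frames after the power drops) is an added assumption meant to avoid clipping the ends of words. It is assumed that the frame has already been denoised and gain-adjusted before it gets here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int16_t sample_t;  /* signed 16-bit PCM, as in getPower */

/* Mean power of one frame, normalized to 0.0 .. 1.0 (same formula as getPower). */
static float frame_power(const sample_t *signal, int numSamples)
{
    assert(numSamples > 0);
    float powerSum = 0.0f;
    for (int i = 0; i < numSamples; i++) {
        float amp = (float) signal[i];
        powerSum += amp * amp;
    }
    return powerSum / (32768.0f * 32768.0f * (float) numSamples);
}

/* Hypothetical detector state; both parameters are illustrative and need tuning. */
typedef struct {
    float threshold;       /* frames at or above this power count as speech */
    int   hangover_frames; /* frames to keep reporting speech after power drops */
    int   frames_left;     /* hangover frames remaining */
} power_vad_t;

static void power_vad_init(power_vad_t *vad, float threshold, int hangover_frames)
{
    vad->threshold = threshold;
    vad->hangover_frames = hangover_frames;
    vad->frames_left = 0;
}

/* Returns true if the frame should be treated as speech (encode and send). */
static bool power_vad_process(power_vad_t *vad, const sample_t *frame, int numSamples)
{
    if (frame_power(frame, numSamples) >= vad->threshold) {
        vad->frames_left = vad->hangover_frames;  /* re-arm the hangover */
        return true;
    }
    if (vad->frames_left > 0) {
        vad->frames_left--;                       /* still within the hangover */
        return true;
    }
    return false;
}
```

You would call power_vad_init once, then power_vad_process on each frame after the preprocessor has run, and only encode and send the frames for which it returns true. Note that for a fixed frame length, power is just the frame energy divided by the sample count, so a threshold on one is equivalent to a scaled threshold on the other.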

Tom

Steve Kann <stevek@stevek.com> wrote:
>
> Lis,
>
> The Voice Activity Detection (VAD) algorithm in the speex
> preprocessor does not work simply by detecting the energy level (volume
> or loudness) in the audio frames, but it uses a more complex algorithm
> which (a) tries to ignore background noise, and (b) tries to detect
> speech, in particular, and not just energy.
>
> If you need to adjust the sensitivity of this, you can use these
> settings:
>
> #define SPEEX_PREPROCESS_SET_PROB_START 14
> #define SPEEX_PREPROCESS_GET_PROB_START 15
>
> #define SPEEX_PREPROCESS_SET_PROB_CONTINUE 16
> #define SPEEX_PREPROCESS_GET_PROB_CONTINUE 17
>
> which adjusts the 'probabilities' that are used to define speech and
> non-speech, for the start of speech, and to continue speech.
>
> -SteveK
>
>
> Lis wrote:
>
> > Sorry.
> >
> > I forgotten the words volume or loudness.
> > But it is know as microphone stroke too, i think.
> > If something can tell me something about that
> > procedure it would complete my pleasure.
> > To bring back memories,
> > i only wanted to know wheather i can change a
> > variable that holds the sound intensity (loudness)
> > needet to start "encoding >> sending" if the speex codec
> > is in voice activation mode.
> > If that isnt implementet yet it would enjoy me
> > to get information about the preprocess->loudness2
> > for example, or a function (if the lib contains one) that returns a
> > value whitch equals to the overall
> > loudness of a frame.
> >
> > So i can do some simple interactions with users
> > whitch doesnt want to yell in their microphone
> > for talking something.
> > Other ones got headsets that record their breathing
> > and anyone can listen to.
> > This is not funny the whole day...
> >
> > Greets Lis
> >
> > ----- Original Message -----
> > From: "Jean-Marc Valin" <jean-marc.valin@usherbrooke.ca>
> > To: "Lis" <lis@1234567890qwertzuiopasdfghjklyxcvbnm.de>
> > Cc: <speex-dev@xiph.org>
> > Sent: Thursday, March 02, 2006 2:35 AM
> > Subject: Re: [Speex-dev] Voice Activation Level (speex 1.1.11.1)
> >
> >
> >> Please define what you mean by "voice activation level".
> >>
> >> Jean-Marc
> >>
> >> On Thu, 2006-03-02 at 02:22 +0100, Lis wrote:
> >>
> >>> I havent had found anything in the documentation about voice
> >>> activation levels.
> >>> Does i can change a variable to change the accuracy for activations?
> >>>
> >>> If not does the speex lib already implement a function for read out
> >>> the sound level of a frame?
> >>>
> >>> Thanks for the advance.
> >>>
> >>> Lis (Louis Hoefler)
> >>> _______________________________________________
> >>> Speex-dev mailing list
> >>> Speex-dev@xiph.org
> >>> http://lists.xiph.org/mailman/listinfo/speex-dev