Thorvald Natvig
2005-Jun-20 10:16 UTC
[Speex-dev] Speech detection in preprocessor with echo
Echo cancellation works like a charm, but it seems to confuse the preprocessor a bit.

If listening to background music (properly fed through the echo canceller), the music is removed, but the result is still detected as speech even though almost only silence remains in the signal.

Also, the AGC keeps adjusting to the minute remains in the signal, meaning that sooner or later it will amplify the remains enough that they are clearly audible on the other side. If I cough or say a word, the AGC readjusts and all is fine.

Looking at the members of the speex_preprocess structure, I see that during these long periods of "silence" (only the background music or only the other end talking while I shut up):

- Zlast (which looks like an SNR variable) is at 0.05-0.2, but jumps up above 1.0 if I actually say something.
- loudness2 keeps decreasing from the "normal" of ~6000 to 1000 or so, at which point the residual echo is amplified enough that it's clearly audible at the other end. If I say something, it adjusts.
- speech_prob is at 0.999 or 1.000 as long as the other end talks.

This is all with an up-to-date SVN version of speex, and in a fairly noisy environment (it's hot, so I have the window open, so passing cars on the nearby road are quite audible, as is my air cleaner).

Is there something I can do to tune this away, a way to tell the AGC to never go that low, and a way to tell the speech detector that echo remains are not speech?
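The AGC part of the question above (a lower bound on the loudness estimate, plus not adapting while only residual echo is present) can be sketched in isolation. This is a minimal sketch and not Speex code: the struct, the field names, and the function are hypothetical, and the idea of gating on an SNR-like value is only an assumption drawn from the Zlast observation above (0.05-0.2 during residual echo, above 1.0 during actual speech).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only: none of these names are Speex fields or API. */
typedef struct {
    float loudness;       /* running loudness estimate that drives the AGC gain */
    float loudness_floor; /* the estimate is never allowed to fall below this   */
    float snr_gate;       /* adapt only when the SNR-like measure exceeds this  */
    float rate;           /* smoothing factor in (0, 1]                         */
} AgcSketch;

/* Update the loudness estimate from one frame and return the new estimate.
 * frame_level: level of the current frame (same scale as loudness).
 * snr_like:    an SNR-like measure for the frame (assumed comparable to Zlast). */
static float agc_sketch_update(AgcSketch *agc, float frame_level, float snr_like)
{
    assert(agc != NULL);

    /* Freeze adaptation when the frame looks like residual echo or noise. */
    if (snr_like > agc->snr_gate) {
        agc->loudness += agc->rate * (frame_level - agc->loudness);
    }

    /* Hard floor: a low estimate means a high gain, so bound it from below. */
    if (agc->loudness < agc->loudness_floor) {
        agc->loudness = agc->loudness_floor;
    }
    return agc->loudness;
}
```

With a gate of 1.0, frames with an SNR-like value of 0.05-0.2 leave the estimate untouched, so the gain stops creeping upward during residual echo. The floor acts as a second safeguard for frames that do pass the gate.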
Jean-Marc Valin
2005-Jun-22 01:25 UTC
[Speex-dev] Speech detection in preprocessor with echo
This is mainly a problem with the VAD, which wasn't designed to differentiate background speech (or echo residual) from foreground speech. It could probably be fixed by using the echo residual estimation from the VAD.

Jean-Marc
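One way to picture a speech detector that makes use of an echo residual estimate is to add that estimate to the noise floor before measuring how much energy is left over. This is a rough sketch under that assumption and not the Speex preprocessor's actual VAD: the function name, the per-band arrays, and the averaging rule are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only: not the Speex VAD and not a Speex API.
 * Returns the mean per-band ratio of energy in excess of the combined
 * (noise + residual echo) floor. Values near 0 mean "nothing but noise and
 * echo residual"; larger values suggest foreground speech.
 *
 * ps:            power spectrum of the current frame, per band
 * noise:         estimated stationary noise power, per band
 * echo_residual: estimated residual echo power, per band
 * nbands:        number of bands (must be > 0)                              */
static float speech_evidence_sketch(const float *ps, const float *noise,
                                    const float *echo_residual, int nbands)
{
    float sum = 0.0f;
    int i;

    assert(ps != NULL && noise != NULL && echo_residual != NULL);
    assert(nbands > 0);

    for (i = 0; i < nbands; i++) {
        /* Treat residual echo as part of the floor, not as a speech candidate. */
        float band_floor = noise[i] + echo_residual[i];
        float excess = ps[i] - band_floor;
        if (excess < 0.0f) {
            excess = 0.0f;
        }
        /* Small constant avoids division by zero for an empty band. */
        sum += excess / (band_floor + 1e-9f);
    }
    return sum / (float)nbands;
}
```

With the residual estimate set to zero, a frame containing only leftover echo still shows energy above the noise floor and scores as speech. With the estimate included, the same frame scores close to zero.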