thr3ads.net - Speex dev - [Speex-dev] Speech detection in preprocessor with echo [Jun 2005]


Tom Grandgent

2005-Jun-22 07:46 UTC

[Speex-dev] Speech detection in preprocessor with echo

agc_gain seemed to fit what I wanted to do: its units and behavior 
were easy to understand, and freezing it produced the desired 
results.  I also wanted to cap it, so that's done in the same place, 
and that definitely works.

All I want to do is be able to freeze AGC adaptation and put an 
upper bound on the AGC (for example, 2x amplification).  Both of 
these things seem necessary in a real-world app because:

1) AGC gain should not increase when speech is not detected.  If it 
does, then it will inevitably rise during periods of inactivity on 
the part of the speaker, and then background sounds will end up 
being amplified too much and detected as speech.  This is a problem 
regardless of echo.

2) The upper bound is necessary in some situations when VAD is not 
sufficient to distinguish between desired and undesired sounds.  
For example, consider a person using a headset and communicating 
infrequently while constantly using a nearby and noisy peripheral 
such as a force-feedback steering wheel.  Noises from the wheel are 
going to get picked up and detected as speech, but they usually 
won't be as loud as speech.  By capping the AGC at the right level, 
it's possible to prevent the AGC from amplifying the wheel noises 
too much while still allowing it to do its job for the speech.

I see now that st->loudness2 is also used in the VAD.  Maybe this 
explains some problems I was having. :)  I'll have to give the 
preprocessor's VAD another try now that I'm aware of this.

So, do you think it's better to use st->loudness2 for both freezing 
and capping the AGC?

Tom

Jean-Marc Valin <Jean-Marc.Valin@USherbrooke.ca> wrote:
> 
> Just curious, why are you freezing agc_gain instead of freezing
> st->loudness2 ?
> 
> Jean-Marc
> 
> 
> On Monday, 20 June 2005 at 14:40 -0400, Tom Grandgent wrote:
> > I think you'll have to modify Speex to get the functionality you're
> > looking for.  I've made a few simple modifications to the AGC so that
> > I can 1) keep it from exceeding a specified level of amplification and
> > 2) enable and disable adaptation, so I can freeze it at a certain level
> > while speech is not detected.  It's mostly just a matter of doing this
> > at the end of speex_compute_agc():
> > 
> >    if (!st->agc_frozen)
> >    {
> >       agc_gain = st->agc_level/st->loudness2;
> >       /*fprintf (stderr, "%f %f %f %f\n", active_bands, st->loudness, st->loudness2, agc_gain);*/
> >       if (agc_gain>st->agc_max_gain)   /* was 200 */
> >          agc_gain = st->agc_max_gain;  /* was 200 */
> >    }
> >    else
> >       agc_gain = st->agc_gain;
> >    st->agc_gain = agc_gain;
> > 
> > and adding a few items to speex_preprocess_ctl() and the state struct.
> > (I control these things at the application level.. you may wish to 
> > control them from within the preprocessor if you're using the 
> > preprocessor's VAD.)
> > 
> > Anyway, if you can figure out what's going on with the variables you
> > named, I'm sure you can make the necessary modifications to do what
> > you've asked for.  I think the preprocessor in general needs a little
> > tweaking like this to work well in various real-world situations, but 
> > I'm not sure how much of this Jean-Marc wants to incorporate into 
> > Speex vs. leave to application developers.
> > 
> > Tom
> > 
> > Thorvald Natvig <speex@natvig.com> wrote:
> > > 
> > > 
> > > Echo cancellation works like a charm, but it seems to confuse the
> > > preprocessor a bit.
> > > 
> > > If listening to background music (properly fed through the echo
> > > canceller), the music is removed but the result is still detected as
> > > speech even if almost silence remains in the signal.
> > > 
> > > Also, the AGC keeps adjusting to the minute remains in the signal,
> > > meaning that sooner or later it will amplify the remains enough that
> > > it's clearly audible on the other side. If I cough or say a word, the
> > > AGC readjusts and all is fine.
> > > 
> > > Looking at the members of the speex_preprocess structure, I see that
> > > during these long periods of "silence" (only the background music or
> > > only the other end talking while I shut up):
> > > 
> > > - Zlast (which looks like a SNR variable) is at 0.05-0.2, but jumps up
> > >    above 1.0 if I actually say something.
> > > - loudness2 keeps decreasing from the "normal" of ~6000 to 1000 or so,
> > >    at which point the residual echo is amplified enough that it's
> > >    clearly audible at the other end. If I say something, it adjusts.
> > > - speech_prob is at 0.999 or 1.000 as long as the other end talks.
> > > 
> > > This is all with up-to-date SVN version of speex, and in a fairly
> > > noisy environment (it's hot, so I have the window open, so passing
> > > cars on the nearby road are quite audible, as is my air cleaner).
> > > 
> > > Is there something I can do to tune this away, a way to tell the AGC
> > > to never go that low, and a way to tell the speech detector that echo
> > > remains are not speech?
> > > 
> > > _______________________________________________
> > > Speex-dev mailing list
> > > Speex-dev@xiph.org
> > > http://lists.xiph.org/mailman/listinfo/speex-dev

Jean-Marc Valin

2005-Jun-22 16:13 UTC

[Speex-dev] Speech detection in preprocessor with echo

The main advantage I see in freezing st->loudness2 is that if you
unfreeze it, then the transition will be gradual, whereas if you
unfreeze agc_gain, then it will jump to the new value directly. I have
no idea what the freezing will do to the VAD though.

	Jean-Marc
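
The difference between the two freezing strategies can be sketched as follows. This is a simplified model under stated assumptions, not the Speex implementation: it assumes the loudness estimate is a one-pole (exponentially smoothed) average and that the gain is recomputed from it every frame. SmoothAgc, step_freeze_loudness(), step_freeze_gain() and the rate field are hypothetical names, and the effect on the VAD is not modeled at all.

```c
/* Hypothetical model of a smoothed-loudness AGC (not Speex's real state). */
typedef struct {
    float level;    /* target loudness */
    float loudness; /* smoothed loudness estimate (stand-in for loudness2) */
    float gain;     /* gain chosen for the most recent frame */
    float rate;     /* smoothing rate per frame, between 0 and 1 */
} SmoothAgc;

/* Variant A: freeze the loudness estimate.
   While frozen the estimate ignores the input, so the derived gain holds.
   After unfreezing, the estimate resumes from where it stopped and moves
   only by `rate` per frame, so the gain changes gradually. */
float step_freeze_loudness(SmoothAgc *st, float frame_loudness, int frozen)
{
    if (!frozen)
        st->loudness += st->rate * (frame_loudness - st->loudness);
    st->gain = st->level / st->loudness;
    return st->gain;
}

/* Variant B: freeze only the gain.
   The estimate keeps tracking the input while the gain is held, so on
   unfreezing the gain jumps straight to whatever the drifted estimate
   implies. */
float step_freeze_gain(SmoothAgc *st, float frame_loudness, int frozen)
{
    st->loudness += st->rate * (frame_loudness - st->loudness);
    if (!frozen)
        st->gain = st->level / st->loudness;
    return st->gain;
}
```

For example, start both variants at loudness 6000 with level 8000 and rate 0.1, then feed a long frozen stretch of residual signal at loudness 1000. On the first unfrozen speech frame at 6000, variant A still yields a gain of about 1.33, while variant B's estimate has decayed to about 1000 and its gain jumps to roughly 5.3.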

