thr3ads.net - Speex dev - [Speex-dev] VAD Questions [Jun 2007]


Larry Gadallah

2007-Jun-07 13:55 UTC

[Speex-dev] VAD Questions

Hello all:

I am interested in using Speex for an application that streams audio
from a (noisy) source, so I am interested in VAD and DTX operation.
However, after browsing the archives of this list, I note that a
number of people have not been satisfied with the operation of the VAD
algorithm in Speex. This leads me to a few questions:

- Is there a reference somewhere (other than the source itself) that
explains how the latest VAD algorithm works?
- Is it possible to obtain the VAD status of a Speex stream
asynchronously? The current API seems to imply that some kind of
polling is required to determine the voice/non-voice status.
- Does the VAD algorithm implement syllabic/sonorant rate detection,
as has been implemented many times in analog circuitry, and is
described in this (and other) papers?
http://people.csail.mit.edu/jrg/2005/IS05_schutte.pdf
- Over what time period is VAD done? Is it done on a frame by frame
basis or over some longer period?

Thank you,
-- 
Larry Gadallah, VE6VQ/W7                          lgadallah AT gmail DOT com
PGP Sig: 616D 4E52 CF1F 3FEC FFFB  F11B 7DB9 C79A EA7E B25B

Jean-Marc Valin

2007-Jun-07 16:50 UTC

> - Is there a reference somewhere (other than the source itself) that
> explains how the latest VAD algorithm works?
Read the source, Luke :-) (sorry)
> - Is it possible to obtain the VAD status of a Speex stream
> asynchronously? The current API seems to imply that some kind of
> polling is required to determine the voice/non-voice status.
Don't understand your question. Also which VAD are you talking about?
The one in the encoder or the one in the preprocessor?
> - Does the VAD algorithm implement syllabic/sonorant rate detection,
> as has been implemented many times in analog circuitry, and is
> described in this (and other) papers?
> http://people.csail.mit.edu/jrg/2005/IS05_schutte.pdf
As far as I understand, the paper you reference above isn't applicable
to the problem here. Basically, we have to decide whether we have speech
or silence based only on 20 ms of audio (and the past). If we could
"look into the future" of the signals, things would be much easier.
> - Over what time period is VAD done? Is it done on a frame by frame
> basis or over some longer period?
It *has* to be done frame by frame, otherwise you add latency, which
isn't acceptable.

	Jean-Marc
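
Editor's note: the constraint described above (decide speech or silence from the current 20 ms frame plus past state only, with no look-ahead) can be illustrated with a small sketch. This is NOT the Speex VAD algorithm; it is a plain energy detector with a slowly adapting noise floor and a short hangover. The names `EnergyVad`, `energy_vad_init`, `energy_vad_process`, the 8 kHz frame size, and all thresholds are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_SIZE 160 /* 20 ms at 8 kHz (assumed sampling rate) */

/* Hypothetical detector state: everything it knows comes from past frames. */
typedef struct {
    double noise_floor; /* running estimate of background energy */
    int hangover;       /* frames left to keep reporting speech */
    int initialized;    /* 0 until the first frame seeds the floor */
} EnergyVad;

static void energy_vad_init(EnergyVad *v)
{
    assert(v != NULL);
    v->noise_floor = 0.0;
    v->hangover = 0;
    v->initialized = 0;
}

/* Mean squared sample value of one frame. */
static double frame_energy(const int16_t *pcm, size_t n)
{
    double sum = 0.0;
    size_t i;
    assert(pcm != NULL && n > 0);
    for (i = 0; i < n; i++)
        sum += (double)pcm[i] * (double)pcm[i];
    return sum / (double)n;
}

/* Returns 1 for "speech", 0 otherwise. Uses only the current frame and
 * state accumulated from earlier frames, so it adds no latency. */
static int energy_vad_process(EnergyVad *v, const int16_t *pcm, size_t n)
{
    const double threshold_ratio = 4.0; /* about 6 dB above the floor (arbitrary) */
    const int hangover_frames = 5;      /* 100 ms at 20 ms per frame (arbitrary) */
    double e = frame_energy(pcm, n);

    if (!v->initialized) {
        /* Assume the very first frame is background noise. */
        v->noise_floor = e;
        v->initialized = 1;
        return 0;
    }

    if (e > threshold_ratio * v->noise_floor + 1.0) {
        v->hangover = hangover_frames;
        return 1;
    }

    /* Quiet frame: adapt the floor slowly, then honour any hangover. */
    v->noise_floor = 0.95 * v->noise_floor + 0.05 * e;
    if (v->hangover > 0) {
        v->hangover--;
        return 1;
    }
    return 0;
}
```

Call `energy_vad_init` once, then `energy_vad_process` on each 20 ms frame in order. A detector this simple is exactly the "frame energy" approach that tends to struggle at poor SNR, which is why the question about more sophisticated features comes up in this thread.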

Larry Gadallah

2007-Jun-08 08:13 UTC

Hello Jean-Marc et al:

On 07/06/07, Jean-Marc Valin <jean-marc.valin@usherbrooke.ca> wrote:
> > - Is there a reference somewhere (other than the source itself) that
> > explains how the latest VAD algorithm works?
>
> Read the source, Luke :-) (sorry)
Okay. I had to ask :-)
>
> > - Is it possible to obtain the VAD status of a Speex stream
> > asynchronously? The current API seems to imply that some kind of
> > polling is required to determine the voice/non-voice status.
>
> Don't understand your question. Also which VAD are you talking about?
> The one in the encoder or the one in the preprocessor?
Either one. The question is: If we treat the software like a black
box, and we feed in PCM audio, we get Speex encoded data out. Where is
the information that indicates whether the encoded data contains
speech or not? The API has a "get VAD status", but it seems like that
might only indicate whether VAD is currently enabled. Perhaps the VAD
status is contained somewhere in the data frames?
>
> > - Does the VAD algorithm implement syllabic/sonorant rate detection,
> > as has been implemented many times in analog circuitry, and is
> > described in this (and other) papers?
> > http://people.csail.mit.edu/jrg/2005/IS05_schutte.pdf
>
> As far as I understand, the paper you reference above isn't applicable
> to the problem here. Basically, we have to decide whether we have speech
> or silence based only on 20 ms of audio (and the past). If we could
> "look into the future" of the signals, things would be much easier.
>
> > - Over what time period is VAD done? Is it done on a frame by frame
> > basis or over some longer period?
>
> It *has* to be done frame by frame, otherwise you add latency, which
> isn't acceptable.
Okay. What I was trying to determine was whether or not the speech
detection was done with something more sophisticated than frame
energy. As you said above, I'll have to look at the sources. For many
systems, sonorant energy rate detection is used to detect voice, even
under very poor SNR conditions.

Cheers,
-- 
Larry Gadallah, VE6VQ/W7                          lgadallah AT gmail DOT com
PGP Sig: 616D 4E52 CF1F 3FEC FFFB  F11B 7DB9 C79A EA7E B25B
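
Editor's note: on the "polling versus asynchronous" question raised in this thread, one common pattern is to wrap a per-frame speech/non-speech flag so that application code is only notified when the state changes. The sketch below assumes the caller already has such a flag for each frame; where that flag comes from in Speex is not shown here and is left as an assumption. `VadNotifier`, `vad_notifier_init`, `vad_notifier_update`, and `example_count_cb` are hypothetical names, not part of any Speex API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: invoked only when the speech state changes. */
typedef void (*VadCallback)(int is_speech, unsigned long frame_index, void *user);

typedef struct {
    int last_state;            /* -1 = unknown, 0 = non-speech, 1 = speech */
    unsigned long frame_index; /* number of frames seen so far */
    VadCallback cb;            /* may be NULL */
    void *user;                /* passed through to the callback */
} VadNotifier;

static void vad_notifier_init(VadNotifier *n, VadCallback cb, void *user)
{
    assert(n != NULL);
    n->last_state = -1;
    n->frame_index = 0;
    n->cb = cb;
    n->user = user;
}

/* Call once per frame with that frame's speech flag.
 * Returns 1 if the state changed on this frame, 0 otherwise. */
static int vad_notifier_update(VadNotifier *n, int is_speech)
{
    int changed = 0;
    assert(n != NULL);
    is_speech = is_speech ? 1 : 0;
    if (is_speech != n->last_state) {
        n->last_state = is_speech;
        if (n->cb != NULL)
            n->cb(is_speech, n->frame_index, n->user);
        changed = 1;
    }
    n->frame_index++;
    return changed;
}

/* Example callback: counts transitions in an int supplied via `user`. */
static void example_count_cb(int is_speech, unsigned long frame_index, void *user)
{
    (void)is_speech;
    (void)frame_index;
    (*(int *)user)++;
}
```

The per-frame decision is still made synchronously inside the processing loop; the wrapper only hides the polling from the rest of the application. Note that the first frame always counts as a change, because the initial state is unknown.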
