thr3ads.net - Speex dev - [Speex-dev] VAD Questions [Jun 2007]


Larry Gadallah

2007-Jun-08 10:10 UTC

[Speex-dev] VAD Questions

Hello Jean-Marc:

On 08/06/07, Jean-Marc Valin <jean-marc.valin@usherbrooke.ca> wrote:
> > Either one. The question is: If we treat the software like a black
> > box, and we feed in PCM audio, we get Speex encoded data out. Where is
> > the information that indicates whether the encoded data contains
> > speech or not? The API has a "get VAD status", but it seems like that
> > might only indicate whether VAD is currently enabled. Perhaps the VAD
> > status is contained somewhere in the data frames?
>
> Look at the return value of either speex_encode() or
> speex_preprocess_run().

OK. Thanks.
>
> > Okay. What I was trying to determine was whether or not the speech
> > detection was done with something more sophisticated than frame
> > energy. As you said above, I'll have to look at the sources. For many
> > systems, sonorant energy rate detection is used to detect voice, even
> > under very poor SNR conditions.
>
> I *do* use more than the frame energy. I use the pitch and (IIRC) one of
> two other things. However, it's still *very* hard to do any sort of good
> detection based only on 20 ms. Give me 1 second of latency and it would
> be *much* easier -- though completely useless.

While I can agree with this if you are dealing with real-time, full
duplex links, for my application (non-real-time, half-duplex), the
latency has no effect at all. Do you know of anyone else who has
implemented some post-processing software to provide more "exotic"
speech detection, even at the expense of increased latency?

Cheers,
-- 
Larry Gadallah, VE6VQ/W7                          lgadallah AT gmail DOT com
PGP Sig: 616D 4E52 CF1F 3FEC FFFB  F11B 7DB9 C79A EA7E B25B
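
For a non-real-time application like the one described above, one simple form of post-processing is to smooth the per-frame decisions over a longer window. The following is only a minimal sketch under stated assumptions, not what Speex or Sphinx does: it assumes you have already collected one 0/1 flag per 20 ms frame (for example, from the return value of speex_preprocess_run() with VAD enabled), and the function name `smooth_vad` and its parameters are hypothetical. It applies a centered majority vote, which trades about half a window of look-ahead latency for fewer isolated errors.

```python
def smooth_vad(frame_flags, window=51, threshold=0.5):
    """Smooth per-frame 0/1 speech flags with a centered majority vote.

    frame_flags: list of 0/1 decisions, one per frame (assumed 20 ms each).
    window:      number of frames in the vote; 51 frames is roughly 1 s.
    threshold:   fraction of frames in the window that must be speech.
    """
    half = window // 2
    n = len(frame_flags)
    smoothed = []
    for i in range(n):
        # The window is clipped at the start and end of the recording.
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        segment = frame_flags[lo:hi]
        is_speech = sum(segment) > threshold * len(segment)
        smoothed.append(1 if is_speech else 0)
    return smoothed


# Hypothetical per-frame flags: silence, speech with two dropouts, silence.
flags = [0] * 10 + [1, 1, 0, 1, 1, 1, 0, 1, 1, 1] + [0] * 10
result = smooth_vad(flags, window=5)
```

With a short window the two one-frame dropouts inside the speech burst are filled in, while a single stray "speech" frame surrounded by silence is rejected. Note that the vote also trims a frame or so at the onset and offset of a burst, so a real detector would probably add some padding around detected segments.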

Steve Kann

2007-Jun-08 11:07 UTC

[Speex-dev] VAD Questions

Larry Gadallah wrote:
> While I can agree with this if you are dealing with real-time, full
> duplex links, for my application (non-real-time, half-duplex), the
> latency has no effect at all. Do you know of anyone else who has
> implemented some post-processing software to provide more "exotic"
> speech detection, even at the expense of increased latency?

I'd look at the speech-to-text implementations for this -- I think CMU 
Sphinx has done something like this.

-SteveK

Larry Gadallah

2007-Jun-08 11:31 UTC

[Speex-dev] VAD Questions

On 08/06/07, Steve Kann <stevek@stevek.com> wrote:
> I'd look at the speech-to-text implementations for this -- I think CMU
> Sphinx has done something like this.

Thanks. I had a look at their web pages, and the Sphinx software looks
interesting, but I was unable to determine if there is a "hook" in
their system to allow simple speech _detection_ rather than
recognition, which is what I am looking for.

Cheers,
-- 
Larry Gadallah, VE6VQ/W7                          lgadallah AT gmail DOT com
PGP Sig: 616D 4E52 CF1F 3FEC FFFB  F11B 7DB9 C79A EA7E B25B
