Hello Jean-Marc:
On 08/06/07, Jean-Marc Valin <jean-marc.valin@usherbrooke.ca> wrote:
> > Either one. The question is: If we treat the software like a black
> > box, and we feed in PCM audio, we get Speex encoded data out. Where is
> > the information that indicates whether the encoded data contains
> > speech or not? The API has a "get VAD status", but it seems like
> > that might only indicate whether VAD is currently enabled. Perhaps
> > the VAD status is contained somewhere in the data frames?
>
> Look at the return value of either speex_encode() or speex_preprocess_run().
OK. Thanks.
>
> > Okay. What I was trying to determine was whether or not the speech
> > detection was done with something more sophisticated than frame
> > energy. As you said above, I'll have to look at the sources. For
> > many systems, sonorant energy rate detection is used to detect
> > voice, even under very poor SNR conditions.
>
> I *do* use more than the frame energy. I use the pitch and (IIRC) one of
> two other things. However, it's still *very* hard to do any sort of good
> detection based only on 20 ms. Give me 1 second of latency and it would
> be *much* easier -- though completely useless.
While I can agree with this if you are dealing with real-time,
full-duplex links, for my application (non-real-time, half-duplex), the
latency has no effect at all. Do you know of anyone else who has
implemented some post-processing software to provide more "exotic"
speech detection, even at the expense of increased latency?
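
To make concrete what I mean by post-processing, here is a minimal
sketch in Python (not Speex code). It assumes the per-frame 0/1
decisions, for example the return values of speex_preprocess_run() with
VAD enabled, have already been collected into a list, one entry per
20 ms frame. The function name smooth_vad and the window size are my own
illustrative choices, and a simple majority vote is only one of many
possible smoothing rules.

```python
def smooth_vad(flags, half_window=25):
    """Majority-vote smoothing of per-frame VAD flags (offline use).

    flags: list of 0/1 decisions, one per 20 ms frame (assumed input).
    half_window: frames of context on each side of the current frame;
    25 frames is 0.5 s, so the full window spans about 1 s of audio.
    """
    n = len(flags)
    smoothed = []
    for i in range(n):
        # Clip the window at the start and end of the recording.
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = flags[lo:hi]
        # Call the frame speech if more than half of the window is speech.
        smoothed.append(1 if 2 * sum(window) > len(window) else 0)
    return smoothed
```

Because each decision looks at frames on both sides, this adds roughly
half a second of latency with the default window, which does not matter
for a non-real-time, half-duplex application.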
Cheers,
--
Larry Gadallah, VE6VQ/W7 lgadallah AT gmail DOT com
PGP Sig: 616D 4E52 CF1F 3FEC FFFB F11B 7DB9 C79A EA7E B25B