thr3ads.net - Speex dev - [Speex-dev] preprocessor VAD only rocognize between silence andnot silence [Dec 2008]

Tom Grandgent

2008-Dec-15 15:32 UTC

[Speex-dev] preprocessor VAD only rocognize between silence andnot silence

Jesus,

Unfortunately, FFT and magic algorithms don't work (yet?).  You 
might want to try this if you're not satisfied with Speex VAD:

http://lists.xiph.org/pipermail/speex-dev/2008-August/006860.html

It won't perform any miracles, but I think it works pretty well 
and is easy to tweak.

Tom
>---- Original Message ----
>From: jmorion at toomeeting.com
>To: speex-dev at xiph.org
>Subject: Re: [Speex-dev] preprocessor VAD only rocognize between silence andnot silence
>Date: Mon, 15 Dec 2008 12:41:53 +0100
>
>>Hi, I would like to know whether anyone has experienced the same problem
>>and whether I'm using the preprocessor VAD correctly.
>>
>>A Voice Activity Detector is expected to detect human voice (using FFT
>>and magic algorithms), but it only works as an activity detector: it
>>doesn't distinguish between voice and knocking on the table.
>>
>>Is VAD performance a taboo subject?
>>
>>Thank you.
>>
>>
>>
>>jesus wrote:
>>> Hello,
>>>
>>> In my project I'm using Speex 1.2rc1, and the preprocessor VAD seems to
>>> only separate completely silent frames from frames that are not
>>> completely silent.
>>>
>>> In the Speex Manual, you can read: "The voice activity detector (VAD)
>>> provided by the preprocessor is more advanced than the one directly
>>> provided in the codec."
>>>
>>> But if you go to the source code, preprocess.c line 995 says:
>>> "/* FIXME: This VAD is a kludge */"
>>>
>>> I've seen in the roadmap that you are testing a new VAD, but I'm not sure
>>> whether I'm doing something wrong with the current one.
>>>
>>> Here is the code of the compressor:
>>>
>>>
>>>
>>>     if (inicializado == false)
>>>     {
>>>         bits = new SpeexBits;
>>>         speex_bits_init(bits);
>>>         enc_state = speex_encoder_init(&speex_nb_mode);
>>>
>>>         // get the frame size
>>>         speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &frame_size);
>>>
>>>         // configure the encoder parameters
>>>         int complexity = 5;
>>>         speex_encoder_ctl(enc_state, SPEEX_SET_COMPLEXITY, &complexity);
>>>
>>>         int samplingrate = 8000;
>>>         speex_encoder_ctl(enc_state, SPEEX_SET_SAMPLING_RATE, &samplingrate);
>>>
>>>         int quality = 8; // relative quality, from 0 to 10
>>>         speex_encoder_ctl(enc_state, SPEEX_SET_QUALITY, &quality);
>>>
>>>         int dtx = 0;
>>>         speex_encoder_ctl(enc_state, SPEEX_SET_DTX, &dtx);
>>>
>>>         int vbr = 0;
>>>         speex_encoder_ctl(enc_state, SPEEX_SET_VBR, &vbr);
>>>
>>>
>>>         // PREPROCESSOR
>>>
>>>         pre_state = speex_preprocess_state_init(frame_size, samplingrate);
>>>
>>>         int denoise = 1;
>>>         speex_preprocess_ctl(pre_state, SPEEX_PREPROCESS_SET_DENOISE, &denoise);
>>>
>>>         int pvad = 1;
>>>         speex_preprocess_ctl(pre_state, SPEEX_PREPROCESS_SET_VAD, &pvad);
>>>
>>>         int agc = 1;
>>>         speex_preprocess_ctl(pre_state, SPEEX_PREPROCESS_SET_AGC, &agc);
>>>
>>>         inicializado = true;
>>>     }
>>>
>>>     __try {
>>>         tdestino = 0;
>>>         int frame_size;
>>>         speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &frame_size);
>>>
>>>         int nbloques_sample = this->torigen/frame_size/2;
>>>
>>>         speex_bits_reset(bits);
>>>
>>>         int voces = 0;
>>>         for(int bloque=0;bloque<nbloques_sample;bloque++)
>>>         {
>>>             // the preprocessor returns 1 or 0 depending on whether the
>>>             // frame is voice or not (from the VAD); I use that to tell
>>>             // whether the block is silence
>>>             voces += speex_preprocess(pre_state, ((short*)origen+(bloque*frame_size)), NULL);
>>>
>>>             // feed in the data to compress
>>>             int e = speex_encode_int(enc_state, ((short*)origen+(bloque*frame_size)), bits);
>>>         }
>>>
>>>         // extract the compressed data
>>>         tdestino = speex_bits_write(bits, destino, 20000);
>>>         if(voces == 0) // there was no voice frame in the block
>>>             es_silencio = true;
>>>         else
>>>             es_silencio = false;
>>>
>>>
>>> Thank you.
>>>
>>> _______________________________________________
>>> Speex-dev mailing list
>>> Speex-dev at xiph.org
>>> http://lists.xiph.org/mailman/listinfo/speex-dev
>>>
>>>   
>>
>>

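A note on the quoted compressor code: it treats a block as speech as soon as a single frame is flagged (the block is silence only when `voces == 0`), so one short transient such as a knock on the table is enough to mark the whole block as voice. A minimal sketch of one possible tweak, assuming narrowband frames of 20 ms (160 samples at 8000 Hz), is to require a run of consecutive flagged frames before calling the block speech. The helper name `has_sustained_voice` and its `min_run` parameter are hypothetical and not part of Speex, and this is not the approach from the linked post; it only post-processes the per-frame flags.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the Speex API.
// vad_flags holds one entry per frame: the 0/1 value returned by the
// preprocessor for that frame. Returns true when at least `min_run`
// consecutive frames were flagged as voice. `min_run` is expected to be >= 1.
bool has_sustained_voice(const std::vector<int>& vad_flags, std::size_t min_run)
{
    std::size_t run = 0;  // length of the current streak of flagged frames
    for (int flag : vad_flags) {
        run = (flag != 0) ? run + 1 : 0;  // extend the streak or reset it
        if (run >= min_run) {
            return true;  // streak is long enough to count as speech
        }
    }
    return false;  // only isolated or too-short bursts were seen
}
```

In the loop of the quoted code, the return value of `speex_preprocess` would be pushed into the vector instead of being summed into `voces`. With 20 ms frames, a `min_run` of 5 corresponds to roughly 100 ms; the right value would need tuning by ear. This only rejects short bursts and will not make the VAD tell voice apart from sustained non-speech noise.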