Tom Grandgent
2008-Dec-15 15:32 UTC
[Speex-dev] preprocessor VAD only distinguishes between silence and not silence
Jesus,

Unfortunately, FFT and magic algorithms don't work (yet?). You might want to
try this if you're not satisfied with the Speex VAD:

http://lists.xiph.org/pipermail/speex-dev/2008-August/006860.html

It won't perform any miracles, but I think it works pretty well and is easy
to tweak.

Tom

>---- Original Message ----
>From: jmorion at toomeeting.com
>To: speex-dev at xiph.org
>Subject: Re: [Speex-dev] preprocessor VAD only distinguishes between silence and not silence
>Date: Mon, 15 Dec 2008 12:41:53 +0100
>
>>Hi, I would like to know if someone has experienced the same problem and
>>if I'm using the preprocessor VAD correctly.
>>
>>A Voice Activity Detector is expected to detect human voice (using FFT
>>and magic algorithms), but it only works as an activity detector: it
>>doesn't distinguish between voice and knocking on the table.
>>
>>Is VAD performance a taboo subject?
>>
>>Thank you.
>>
>>jesus wrote:
>>> Hello,
>>>
>>> In my project I'm using Speex 1.2rc1, and the preprocessor VAD seems to
>>> only separate completely silent frames from not completely silent ones.
>>>
>>> In the Speex Manual, you can read "The voice activity detector (VAD)
>>> provided by the preprocessor is more advanced than the one directly
>>> provided in the codec."
>>>
>>> But if you go to the source code, preprocess.c line 995 says "/* FIXME:
>>> This VAD is a kludge */"
>>>
>>> I've seen in the roadmap that you are testing a new VAD, but I'm not sure
>>> if I'm doing something wrong with the current one.
>>>
>>> Here is the code of the compressor:
>>>
>>> if (inicializado == false)
>>> {
>>>     bits = new SpeexBits;
>>>     speex_bits_init(bits);
>>>     enc_state = speex_encoder_init(&speex_nb_mode);
>>>
>>>     // get the frame size
>>>     speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &frame_size);
>>>
>>>     // configure the parameters
>>>     int complexity = 5;
>>>     speex_encoder_ctl(enc_state, SPEEX_SET_COMPLEXITY, &complexity);
>>>
>>>     int samplingrate = 8000;
>>>     speex_encoder_ctl(enc_state, SPEEX_SET_SAMPLING_RATE, &samplingrate);
>>>
>>>     int quality = 8; // relative quality, from 0 to 10
>>>     speex_encoder_ctl(enc_state, SPEEX_SET_QUALITY, &quality);
>>>
>>>     int dtx = 0;
>>>     speex_encoder_ctl(enc_state, SPEEX_SET_DTX, &dtx);
>>>
>>>     int vbr = 0;
>>>     speex_encoder_ctl(enc_state, SPEEX_SET_VBR, &vbr);
>>>
>>>     // PREPROCESSOR
>>>     pre_state = speex_preprocess_state_init(frame_size, samplingrate);
>>>
>>>     int denoise = 1;
>>>     speex_preprocess_ctl(pre_state, SPEEX_PREPROCESS_SET_DENOISE, &denoise);
>>>
>>>     int pvad = 1;
>>>     speex_preprocess_ctl(pre_state, SPEEX_PREPROCESS_SET_VAD, &pvad);
>>>
>>>     int agc = 1;
>>>     speex_preprocess_ctl(pre_state, SPEEX_PREPROCESS_SET_AGC, &agc);
>>>
>>>     inicializado = true;
>>> }
>>>
>>> __try {
>>>     tdestino = 0;
>>>     int frame_size;
>>>     speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &frame_size);
>>>
>>>     int nbloques_sample = this->torigen/frame_size/2;
>>>
>>>     speex_bits_reset(bits);
>>>
>>>     int voces = 0;
>>>     for(int bloque=0;bloque<nbloques_sample;bloque++)
>>>     {
>>>         // since the preprocessor returns 1 or 0 depending on whether
>>>         // the frame is voice or not (because of the VAD), I use it
>>>         // to tell whether it is silence
>>>         voces += speex_preprocess(pre_state, ((short*)origen+(bloque*frame_size)), NULL);
>>>
>>>         // feed in the data to compress
>>>         int e = speex_encode_int(enc_state, ((short*)origen+(bloque*frame_size)), bits);
>>>     }
>>>
>>>     // extract the compressed data
>>>     tdestino = speex_bits_write(bits, destino, 20000);
>>>
>>>     if(voces == 0) // there was no voice frame in the block
>>>         es_silencio = true;
>>>     else
>>>         es_silencio = false;
>>>
>>>
>>> Thank you.
>>>
>>> _______________________________________________
>>> Speex-dev mailing list
>>> Speex-dev at xiph.org
>>> http://lists.xiph.org/mailman/listinfo/speex-dev
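
The per-block decision in the quoted code (add up the per-frame voice flags, and call the block silence only when the sum is zero) can be sketched on its own. This is a minimal, self-contained sketch and does not use the Speex API: `FrameClassifier`, `energy_detector`, `count_voice_frames`, and `block_is_silence` are hypothetical names, and the energy threshold detector is only an assumed stand-in for the value the preprocessor returns. It is not the algorithm Speex uses, nor the one in the linked post.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a per-frame VAD result:
// returns 1 if the frame is classified as voice, 0 otherwise.
using FrameClassifier =
    std::function<int(const std::int16_t* frame, std::size_t frame_size)>;

// Illustrative energy-threshold detector (an assumption, not Speex's VAD).
// It flags any frame whose mean squared amplitude exceeds the threshold,
// so speech and a knock on the table look the same to it.
int energy_detector(const std::int16_t* frame, std::size_t frame_size,
                    double threshold) {
    if (frame_size == 0) return 0;
    double sum = 0.0;
    for (std::size_t i = 0; i < frame_size; ++i) {
        const double s = static_cast<double>(frame[i]);
        sum += s * s;
    }
    return (sum / static_cast<double>(frame_size)) > threshold ? 1 : 0;
}

// Counts the frames flagged as voice in a block of 16-bit samples.
// Only whole frames are examined; a trailing partial frame is ignored.
int count_voice_frames(const std::vector<std::int16_t>& samples,
                       std::size_t frame_size,
                       const FrameClassifier& is_voice) {
    if (frame_size == 0) return 0;
    int voiced = 0;
    const std::size_t n_frames = samples.size() / frame_size;
    for (std::size_t f = 0; f < n_frames; ++f) {
        voiced += is_voice(samples.data() + f * frame_size, frame_size);
    }
    return voiced;
}

// A block counts as silence only if no frame in it was flagged as voice.
bool block_is_silence(const std::vector<std::int16_t>& samples,
                      std::size_t frame_size,
                      const FrameClassifier& is_voice) {
    return count_voice_frames(samples, frame_size, is_voice) == 0;
}
```

With a detector like this, a single loud frame anywhere in the block is enough to mark the whole block as not silence, which matches the behaviour described in the thread: the decision tracks activity, not voice. Swapping in a different per-frame classifier only requires passing another `FrameClassifier`.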