jesus
2008-Dec-11 18:18 UTC
[Speex-dev] preprocessor VAD only recognize between silence and not silence
Hello,

In my project I'm using Speex 1.2rc1, and the preprocessor VAD seems to only separate completely silent frames from frames that are not completely silent.

The Speex manual says: "The voice activity detector (VAD) provided by the preprocessor is more advanced than the one directly provided in the codec."

But in the source code, at preprocess.c line 995, there is this comment: "/* FIXME: This VAD is a kludge */"

I've seen in the roadmap that you are testing a new VAD, but I'm not sure whether I'm doing something wrong with the current one.

Here is the code of the compressor:

    if (inicializado == false)
    {
        bits = new SpeexBits;
        speex_bits_init(bits);
        enc_state = speex_encoder_init(&speex_nb_mode);

        // get the frame size
        speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &frame_size);

        // configure the encoder parameters
        int complexity = 5;
        speex_encoder_ctl(enc_state, SPEEX_SET_COMPLEXITY, &complexity);

        int samplingrate = 8000;
        speex_encoder_ctl(enc_state, SPEEX_SET_SAMPLING_RATE, &samplingrate);

        int quality = 8; // relative quality, from 0 to 10
        speex_encoder_ctl(enc_state, SPEEX_SET_QUALITY, &quality);

        int dtx = 0;
        speex_encoder_ctl(enc_state, SPEEX_SET_DTX, &dtx);

        int vbr = 0;
        speex_encoder_ctl(enc_state, SPEEX_SET_VBR, &vbr);

        // PREPROCESSOR
        pre_state = speex_preprocess_state_init(frame_size, samplingrate);

        int denoise = 1;
        speex_preprocess_ctl(pre_state, SPEEX_PREPROCESS_SET_DENOISE, &denoise);

        int pvad = 1;
        speex_preprocess_ctl(pre_state, SPEEX_PREPROCESS_SET_VAD, &pvad);

        int agc = 1;
        speex_preprocess_ctl(pre_state, SPEEX_PREPROCESS_SET_AGC, &agc);

        inicializado = true;
    }

    __try {
        tdestino = 0;
        int frame_size;
        speex_encoder_ctl(enc_state, SPEEX_GET_FRAME_SIZE, &frame_size);

        // torigen is the input size in bytes; each sample is a 16-bit short
        int nbloques_sample = this->torigen/frame_size/2;

        speex_bits_reset(bits);

        int voces = 0;
        for(int bloque=0;bloque<nbloques_sample;bloque++)
        {
            // the preprocessor returns 1 or 0 depending on whether the frame
            // is voice or not (because of the VAD), so I use it to tell
            // whether the block is silence
            voces += speex_preprocess(pre_state, ((short*)origen+(bloque*frame_size)), NULL);

            // feed in the data to compress
            int e = speex_encode_int(enc_state, ((short*)origen+(bloque*frame_size)), bits);
        }

        // extract the compressed data
        tdestino = speex_bits_write(bits, destino, 20000);
        if(voces == 0) // there was no voice frame in the block
            es_silencio = true;
        else
            es_silencio = false;

Thank you.
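The block-level decision in the code above (count the frames in the buffer, ask the detector about each one, and call the block silent only if no frame was flagged) can be isolated from Speex as a minimal sketch. Everything named here is hypothetical and not part of the Speex API: `frame_vad_fn`, `count_frames`, `count_voiced_frames` and `toy_vad` are illustration-only names, and `toy_vad` is a plain amplitude threshold standing in for the per-frame call, not the algorithm Speex uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: returns 1 if the frame is flagged as active,
   0 otherwise. In the code above this role is played by speex_preprocess(). */
typedef int (*frame_vad_fn)(short *frame, int frame_size, void *user);

/* Number of whole frames in a buffer of `nbytes` bytes of 16-bit samples.
   Mirrors torigen/frame_size/2 from the code above. */
static int count_frames(size_t nbytes, int frame_size)
{
    assert(frame_size > 0);
    return (int)(nbytes / sizeof(short) / (size_t)frame_size);
}

/* Runs the detector on every whole frame and returns how many were flagged.
   A result of 0 corresponds to es_silencio = true in the code above. */
static int count_voiced_frames(short *pcm, size_t nbytes, int frame_size,
                               frame_vad_fn vad, void *user)
{
    int nframes = count_frames(nbytes, frame_size);
    int voiced = 0;
    for (int i = 0; i < nframes; i++) {
        if (vad(pcm + (size_t)i * (size_t)frame_size, frame_size, user))
            voiced++;
    }
    return voiced;
}

/* Stand-in detector for illustration only (NOT Speex's algorithm): flags a
   frame if any sample magnitude exceeds the threshold passed through `user`. */
static int toy_vad(short *frame, int frame_size, void *user)
{
    int threshold = *(int *)user;
    for (int i = 0; i < frame_size; i++) {
        int s = frame[i];
        if (s > threshold || s < -threshold)
            return 1;
    }
    return 0;
}
```

With this aggregation rule a single flagged frame is enough to mark the whole block as not silent, so any short loud event in one frame changes the block-level result.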
jesus
2008-Dec-15 11:41 UTC
[Speex-dev] preprocessor VAD only recognize between silence and not silence
Hi,

I would like to know whether anyone has experienced the same problem, and whether I'm using the preprocessor VAD correctly.

A Voice Activity Detector is expected to detect human voice (using FFT and magic algorithms), but it only works as an activity detector: it doesn't distinguish between voice and knocking on the table.

Is VAD performance a taboo subject?

Thank you.