Jean-Marc Valin wrote:

>>Reverberation suppression?
>
>Basically, it means that if you are in a room with lots of echo (long
>decay), I can reduce it a bit.
>
>>I guess this would help reduce local source echoes? I've never
>>_noticed_ that to be a problem in my use, but I would imagine that
>>using a notebook's built-in microphone, you'd get some echo off of the
>>screen and stuff [also from the whole room]..
>>
>>Most of these echoes aren't so bad, but I guess they might make the
>>encoding job harder. I'd sure rather see the echo cancellation
>>finished [not that I have any say on what you work on!!!].
>
>Well, I'm still looking for help :)
>
>>Here's the numbers I got doing vad on 655 seconds of audio (about half
>>is speech, half is absolute silence [0's]).
>>P3-600: 25 seconds
>>Athlon XP 1700+ (1.45GHz): 5 seconds
>>P4 2.8GHz: 8.8 seconds.
>
>These numbers sound like a problem I had a while ago with the decoder.
>The VAD shouldn't take much CPU so I suspect there might be floating
>point underflows in some part, slowing down the Intel CPUs a lot (for
>some reason, the AMD CPUs seem to handle underflows faster).

Hmm, how can I find that out? How much CPU would you expect it to take?
I've been playing with oprofile, but I don't see it getting that finely
grained.

>>Anyway, I think I might need to find a less computationally intensive
>>VAD solution for the conference. VAD is currently only used when
>>people connect via the PSTN, so they presumably have a decent SNR, and
>>I may be able to get away with an energy envelope type of thing,
>>without needing frequency domain analysis. But before I go and start
>>coding this, are there any simple optimizations that can be done to the
>>preprocessor when it is being used only for the VAD decision?
>
>Have you tried using the (less accurate) VAD that's in the codec itself
>(SPEEX_SET_VAD)?

I'll take a look at that.
In this case [in the conferencing application], I'm not actually using
speex encoding [these are PSTN callers, I do VAD in clients when I
control them], so I'd need to see if I could rip it out of speex to use
it.

Also, I do have a couple of patches to the preprocessor to send along
actually; basically this makes the start and continue probabilities
parameters that can be set by callers. We're currently using very low
probabilities, much lower than your defaults: VAD_START=0.05,
VAD_CONTINUE=0.02. We also have a 20 frame (2/5 sec) "tail" that is
outside the preprocessor, which continues treating some frames as
speech after the detector has dropped out.

Here's a patch:

=================================================

Diff for file preprocess.c, 1.2 -> 1.3

Index: preprocess.c
==================================================================
RCS file: /home/UniServ/dls/CVS/hms/app_conference/libspeex/preprocess.c,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -w -r1.2 -r1.3
--- preprocess.c 2003/11/07 23:40:23 1.2
+++ preprocess.c 2004/02/06 17:10:24 1.3
@@ -145,6 +145,9 @@
    st->agc_level = 8000;
    st->vad_enabled = 0;
 
+   st->speech_prob_start = SPEEX_PROB_START ;
+   st->speech_prob_continue = SPEEX_PROB_CONTINUE ;
+
    st->frame = (float*)speex_alloc(2*N*sizeof(float));
    st->ps = (float*)speex_alloc(N*sizeof(float));
    st->gain2 = (float*)speex_alloc(N*sizeof(float));
@@ -435,12 +438,19 @@
    st->speech_prob = p0/(1e-25+p1+p0);
    /*fprintf (stderr, "%f %f %f ", tot_loudness, st->loudness2, st->speech_prob);*/
 
+   /* decide if frame is speech using speech probability settings */
+
    /* if (st->speech_prob> .35 || (st->last_speech < 20 && st->speech_prob>.1)) */
-   if (st->speech_prob> .20 || (st->last_speech < 20 && st->speech_prob>.05))
+   if (
+      st->speech_prob > st->speech_prob_start
+      || ( st->last_speech < 20 && st->speech_prob > st->speech_prob_continue )
+   )
    {
       is_speech = 1;
       st->last_speech = 0;
-   } else {
+   }
+   else
+   {
       st->last_speech++;
       if (st->last_speech<20)
         is_speech = 1;
@@ -985,6 +995,30 @@
    case SPEEX_PREPROCESS_GET_VAD:
       (*(int*)ptr) = st->vad_enabled;
       break;
+
+   case SPEEX_PREPROCESS_SET_PROB_START:
+      st->speech_prob_start = (*(float*)ptr) ;
+      if ( st->speech_prob_start > 1 )
+         st->speech_prob_start = st->speech_prob_start / 100 ;
+      if ( st->speech_prob_start > 1 || st->speech_prob_start < 0 )
+         st->speech_prob_start = SPEEX_PROB_START ;
+      break ;
+   case SPEEX_PREPROCESS_GET_PROB_START:
+      (*(float*)ptr) = st->speech_prob_start ;
+      break ;
+
+   case SPEEX_PREPROCESS_SET_PROB_CONTINUE:
+      st->speech_prob_continue = (*(float*)ptr) ;
+      if ( st->speech_prob_continue > 1 )
+         st->speech_prob_continue = st->speech_prob_continue / 100 ;
+      if ( st->speech_prob_continue > 1 || st->speech_prob_continue < 0 )
+         st->speech_prob_continue = SPEEX_PROB_CONTINUE ;
+      break ;
+      break ;
+   case SPEEX_PREPROCESS_GET_PROB_CONTINUE:
+      (*(float*)ptr) = st->speech_prob_continue ;
+      break ;
+
    default:
       speex_warning_int("Unknown speex_preprocess_ctl request: ", request);
       return -1;

Diff for file speex_preprocess.h, 1.1 -> 1.2

Index: speex_preprocess.h
==================================================================
RCS file: /home/UniServ/dls/CVS/hms/app_conference/libspeex/speex_preprocess.h,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -w -r1.1 -r1.2
--- speex_preprocess.h 2003/11/06 21:57:59 1.1
+++ speex_preprocess.h 2004/02/06 17:10:24 1.2
@@ -49,6 +49,10 @@
    float agc_level;
    int vad_enabled;
 
+   // probabilities to check speech_prob against
+   float speech_prob_start ;
+   float speech_prob_continue ;
+
    float *frame;             /**< Processing frame (2*ps_size) */
    float *ps;                /**< Current power spectrum */
    float *gain2;             /**< Adjusted gains */
@@ -108,8 +112,9 @@
 
 /** Used like the ioctl function to control the preprocessor parameters */
 int speex_preprocess_ctl(SpeexPreprocessState *st, int request, void *ptr);
 
-
+#define SPEEX_PROB_START 0.35
+#define SPEEX_PROB_CONTINUE 0.1
 
 #define SPEEX_PREPROCESS_SET_DENOISE 0
 #define SPEEX_PREPROCESS_GET_DENOISE 1
@@ -122,6 +127,12 @@
 #define SPEEX_PREPROCESS_SET_AGC_LEVEL 6
 #define SPEEX_PREPROCESS_GET_AGC_LEVEL 7
 
+
+#define SPEEX_PREPROCESS_SET_PROB_START 8
+#define SPEEX_PREPROCESS_GET_PROB_START 9
+
+#define SPEEX_PREPROCESS_SET_PROB_CONTINUE 10
+#define SPEEX_PREPROCESS_GET_PROB_CONTINUE 11
 
 #ifdef __cplusplus
 }

=================================================

> Jean-Marc

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'speex-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body. No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.
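The decision rule the patch parameterizes can be looked at in isolation. The following is a minimal sketch, not the Speex preprocessor itself: `VadDecision`, `vad_decision_init` and `vad_decision_update` are hypothetical names introduced here for illustration. It mirrors the if/else from the diff: a frame counts as speech when the probability exceeds the start threshold, or when the detector fired within the last `tail_frames` frames and the probability still exceeds the lower continue threshold; after that, a short hangover keeps reporting speech until the counter runs out.

```c
#include <assert.h>

/* Hypothetical state for illustration; this is not the SpeexPreprocessState struct. */
typedef struct {
    float prob_start;     /* threshold needed to enter the speech state */
    float prob_continue;  /* lower threshold that keeps the speech state alive */
    int   last_speech;    /* frames elapsed since a threshold was last exceeded */
    int   tail_frames;    /* hangover length in frames (20 in the patch) */
} VadDecision;

static void vad_decision_init(VadDecision *d, float start, float cont, int tail)
{
    assert(start >= 0.0f && start <= 1.0f);
    assert(cont >= 0.0f && cont <= 1.0f);
    assert(tail >= 0);
    d->prob_start = start;
    d->prob_continue = cont;
    d->last_speech = tail;   /* begin in the silent state */
    d->tail_frames = tail;
}

/* Returns 1 if the frame should be treated as speech, 0 otherwise. */
static int vad_decision_update(VadDecision *d, float speech_prob)
{
    if (speech_prob > d->prob_start ||
        (d->last_speech < d->tail_frames && speech_prob > d->prob_continue)) {
        d->last_speech = 0;
        return 1;
    }
    /* Below both thresholds: count up, but stop at tail_frames so the
       counter cannot overflow during long silences. */
    if (d->last_speech < d->tail_frames)
        d->last_speech++;
    return d->last_speech < d->tail_frames;
}
```

With the defaults from the diff (0.35 and 0.1), a probability of 0.2 does not start a speech segment but does extend one that is already in progress. Lowering both values, as described in the message above, makes the detector more eager to start and slower to let go.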
Jean-Marc Valin
2004-Aug-06 15:02 UTC
[speex-dev] Memory leak in denoiser + a few questions
> Hmm, How can I find that out? How much CPU would you expect it to
> take?

I don't know. It's been a while since I last played with that code, but
I'd expect it to take less time.

> I've been playing with oprofile, but I don't see it getting that
> finely grained..

Can you make sure the time is spent in the VAD and not in the encoder or
decoder (at the other end) when the VAD is on? (The underflow problem I
had appeared with VBR, but the problem was in the decoder.)

> I'll take a look at that. In this case [in the conferencing
> application], I'm not actually using speex encoding [these are PSTN
> callers, I do VAD in clients when I control them], so I'd need to see
> if I could rip it out of speex to use it.

Don't waste too much time, though. That VAD is really basic.

> Also, I do have a couple of patches to the preprocessor to send along
> actually; basically this makes the start and continue probabilities
> parameters that can be set by callers. We're currently using very low
> probabilities; Much lower than your defaults, VAD_START=0.05
> VAD_CONTINUE=0.02. We also have 20 frame (2/5 sec) "tail" that is
> outside the preprocessor, which continues treating some frames as
> speech after the detector has dropped out.

That's the same patch you sent a while ago, right? Sorry, I haven't had
much time for Speex lately.

	Jean-Marc

-- 
Jean-Marc Valin
http://www.xiph.org/~jm/
LABORIUS
Université de Sherbrooke, Québec, Canada
Jean-Marc Valin wrote:

>>Hmm, How can I find that out? How much CPU would you expect it to
>>take?
>
>I don't know. It's been a while since I last played with that code, but
>I'd expect it to take less time.
>
>>I've been playing with oprofile, but I don't see it getting that
>>finely grained..
>
>Can you make sure the time is spent in the VAD and not in the encoder or
>decoder (at the other end) when the VAD is on (the underflow problem I
>had appeared with VBR, but the problem was in the decoder).

The tests I did to determine VAD CPU usage were pretty basic, and I'm
sure it was just the VAD being used. I made a small test program which
reads a bunch of audio data into a buffer, and then does:

    /* speex implementation */
    {
        SpeexPreprocessState *dsp = speex_preprocess_state_init(
            AST_CONF_BLOCK_SAMPLES, AST_CONF_SAMPLE_RATE ) ;
        int set;

        /* enable only the VAD; turn off denoising and AGC */
        set = 1;
        speex_preprocess_ctl( dsp, SPEEX_PREPROCESS_SET_VAD, &set ) ;
        set = 0;
        speex_preprocess_ctl( dsp, SPEEX_PREPROCESS_SET_DENOISE, &set ) ;
        speex_preprocess_ctl( dsp, SPEEX_PREPROCESS_SET_AGC, &set ) ;

        while(reps-- > 0)
        {
            int i;
            printf("beginning pass\n");
            for(i=0;i<BUFSAMP;i+=AST_CONF_BLOCK_SAMPLES)
            {
                speex_preprocess(dsp, audbuf+i, NULL);
            }
        }
    }

The sample data I used was 1024x1024 samples, and I went through it 5
times (nreps = 5, BUFSAMP = 1024*1024). The data in the buffer is 537760
samples of speech, with the rest being zeros.

>>I'll take a look at that. In this case [in the conferencing
>>application], I'm not actually using speex encoding [these are PSTN
>>callers, I do VAD in clients when I control them], so I'd need to see
>>if I could rip it out of speex to use it.
>
>Don't waste too much time, though. That VAD is really basic.

I might be able to get by, in this application, with something more
basic, though. In my limited testing, I seem to remember getting SNR
from PSTN clients which was _much_ better than that from microphones on
PCs. I'd like to be able to handle some number of hundreds of calls in
the conference, with up to maybe 100 of them being processed by the VAD.
Right now, the VAD is the dominant part of the conference [encoding and
decoding are actually smaller, because the channels which do
encoding/decoding do VAD on the client end, and the channels that do
need VAD are ulaw encoded].

>>Also, I do have a couple of patches to the preprocessor to send along
>>actually; basically this makes the start and continue probabilities
>>parameters that can be set by callers. We're currently using very low
>>probabilities; Much lower than your defaults, VAD_START=0.05
>>VAD_CONTINUE=0.02. We also have 20 frame (2/5 sec) "tail" that is
>>outside the preprocessor, which continues treating some frames as
>>speech after the detector has dropped out.
>
>That's the same patch you sent a while ago, right? Sorry, I haven't had
>much time for Speex lately.

I already sent that? I forgot :) We're all busy!
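The "something more basic" idea for high-SNR PSTN callers could look roughly like the energy-envelope detector below. This is only a sketch under stated assumptions: the audio is assumed to be already decoded from ulaw to 16-bit linear PCM, and `EnergyVad`, `energy_vad_init`, `frame_energy` and `energy_vad_update` are hypothetical names, not part of Speex or of the conference application. It compares each frame's mean energy against a slowly tracked noise floor and needs no frequency-domain analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical energy-envelope detector; none of these names come from Speex. */
typedef struct {
    double noise_floor;  /* running estimate of background energy */
    double min_floor;    /* lower bound so digital silence cannot drive the floor to 0 */
    double ratio;        /* frame is speech if energy > ratio * noise_floor */
    double adapt;        /* smoothing factor for the noise floor, between 0 and 1 */
} EnergyVad;

static void energy_vad_init(EnergyVad *v, double initial_floor,
                            double ratio, double adapt)
{
    assert(initial_floor > 0.0);
    assert(ratio > 1.0);
    assert(adapt > 0.0 && adapt < 1.0);
    v->noise_floor = initial_floor;
    v->min_floor = initial_floor;
    v->ratio = ratio;
    v->adapt = adapt;
}

/* Mean squared sample value of one frame of 16-bit PCM. */
static double frame_energy(const short *pcm, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += (double)pcm[i] * (double)pcm[i];
    return n ? sum / (double)n : 0.0;
}

/* Returns 1 for speech, 0 for silence; adapts the floor only on silence. */
static int energy_vad_update(EnergyVad *v, const short *pcm, size_t n)
{
    double e = frame_energy(pcm, n);
    int is_speech = e > v->ratio * v->noise_floor;
    if (!is_speech) {
        v->noise_floor = (1.0 - v->adapt) * v->noise_floor + v->adapt * e;
        if (v->noise_floor < v->min_floor)
            v->noise_floor = v->min_floor;
    }
    return is_speech;
}
```

The per-frame cost is one multiply-add per sample, which is why this kind of detector scales to many channels. The trade-off is that it has no notion of spectral shape, so it is only plausible when the SNR is as good as described above, and it would still want a hangover "tail" like the one used with the preprocessor.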
Hi,

I've been testing the Speex echo cancellation, but all I get is the echo
amplified :-(

I'm using a reference signal ref(n)=signal1(n), and an echo signal
echo(n)=signal2(n)+0.2*signal1(n-10).

Is anybody using it? How can I test it?

Thank you very much.

G.
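One way to sanity-check a synthetic setup like the one above, independently of any particular library, is to run it through a textbook canceller and measure how much the echo drops. The sketch below is a plain time-domain NLMS filter, which is a stand-in technique and not the algorithm Speex uses. All names (`lcg_noise`, `nlms_echo_reduction`, `NLMS_TAPS`) are hypothetical, the far-end signal is pseudo-random noise, and the near-end signal2(n) is assumed to be zero so that the residual measures echo only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NLMS_TAPS 32

/* Small deterministic generator so the experiment is reproducible.
   Returns a value in [-1, 1). */
static double lcg_noise(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return ((double)(*state >> 8) / 8388608.0) - 1.0;
}

/* Simulates mic(n) = echo_gain * far(n - delay), adapts an NLMS filter,
   and returns (echo power) / (residual power) over the second half of
   the run. A ratio of 100 corresponds to 20 dB of echo reduction. */
static double nlms_echo_reduction(size_t n, double echo_gain, size_t delay)
{
    double w[NLMS_TAPS] = {0};   /* adaptive filter weights */
    double x[NLMS_TAPS] = {0};   /* x[k] is the far-end sample delayed by k */
    double mic_pow = 0.0, res_pow = 0.0;
    const double mu = 0.5, eps = 1e-6;
    uint32_t seed = 12345u;
    size_t i, k;

    assert(delay < NLMS_TAPS);
    for (i = 0; i < n; i++) {
        double y = 0.0, norm = eps, mic, err;

        /* Shift the delay line and take in a new far-end sample. */
        for (k = NLMS_TAPS - 1; k > 0; k--)
            x[k] = x[k - 1];
        x[0] = lcg_noise(&seed);

        mic = echo_gain * x[delay];          /* echo only, no near-end talker */
        for (k = 0; k < NLMS_TAPS; k++) {
            y += w[k] * x[k];                /* echo estimate */
            norm += x[k] * x[k];             /* input power for normalization */
        }
        err = mic - y;                       /* residual after cancellation */
        for (k = 0; k < NLMS_TAPS; k++)
            w[k] += mu * err * x[k] / norm;  /* normalized LMS update */

        if (i >= n / 2) {
            mic_pow += mic * mic;
            res_pow += err * err;
        }
    }
    return mic_pow / (res_pow + 1e-30);
}
```

Calling `nlms_echo_reduction(16000, 0.2, 10)` reproduces the 0.2 gain and 10-sample delay from the question. If a reference canceller reduces the echo on the same signals while the library output is louder than the input, a reasonable first thing to check is whether the reference and microphone buffers were passed in the expected order; that is a guess, not a diagnosis.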
Jean-Marc Valin
2004-Aug-06 15:02 UTC
[speex-dev] Memory leak in denoiser + a few questions
> > These numbers sound like a problem I had a while ago with the decoder.
> > The VAD shouldn't take much CPU so I suspect there might be floating
> > point underflows in some part, slowing down the Intel CPUs a lot (for
> > some reason, the AMD CPUs seem to handle underflows faster).
>
> Hmm, How can I find that out? How much CPU would you expect it to
> take?

I just did a quick check and you shouldn't even notice the amount of CPU
the VAD takes. The problem is most likely caused by floating point
exceptions of some sort, either NaNs, denorms, or underflows. I've never
encountered the problem, so it's likely specific to the kind of data you
process. Can you try pinpointing the problem a bit and then send me a
sample file?

	Jean-Marc

-- 
Jean-Marc Valin
http://www.xiph.org/~jm/
LABORIUS
Université de Sherbrooke, Québec, Canada
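One low-tech way to start pinpointing this is to scan the preprocessor's float arrays after each frame and count suspicious values. The helper below is a sketch only: `FloatAudit` and `audit_floats` are hypothetical names, and which buffers to inspect (for example the state's power spectrum or gain arrays) is left to the reader. It uses only comparisons and `FLT_MIN`/`FLT_MAX` from `<float.h>`, so it does not depend on the math library; note that the NaN test relies on `x != x` and will not work under compiler options such as `-ffast-math`.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Hypothetical helper for illustration; not part of Speex. */
typedef struct {
    size_t nan_count;        /* values that are not a number */
    size_t inf_count;        /* values beyond the finite float range */
    size_t subnormal_count;  /* nonzero values smaller than FLT_MIN (denormals) */
} FloatAudit;

static FloatAudit audit_floats(const float *buf, size_t n)
{
    FloatAudit a = {0, 0, 0};
    size_t i;

    assert(buf != NULL || n == 0);
    for (i = 0; i < n; i++) {
        float x = buf[i];
        float ax = x < 0.0f ? -x : x;
        if (x != x)                          /* only NaN compares unequal to itself */
            a.nan_count++;
        else if (ax > FLT_MAX)               /* larger than any finite float */
            a.inf_count++;
        else if (ax != 0.0f && ax < FLT_MIN) /* below the smallest normal float */
            a.subnormal_count++;
    }
    return a;
}
```

With test data that is half exact zeros, a likely pattern to look for is the subnormal count rising shortly after the input switches from speech to silence, since smoothed state that decays toward zero passes through the denormal range on the way. That is a hypothesis to check with the counts, not a confirmed cause.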