On Mar 28, 2004, at 8:23 PM, Jean-Marc Valin wrote:

>> The st->zeta pointer isn't freed in the
>> speex_preprocess_state_destroy() function of the preprocess.c file
>> (alloced in line 167). It's in Speex 1.1.4 by the way.
>
> Oops... Thanks for letting me know. I'll change that for the next
> release (in the meantime, the fix is obvious). In case you're
> interested, I'm currently working on a reverberation suppression
> algorithm. I'll put it in CVS.

Reverberation suppression?

I guess this would help reduce local source echoes? I've never
_noticed_ that to be a problem in my use, but I would imagine that
using a notebook's built-in microphone, you'd get some echo off of the
screen and stuff [also from the whole room].

Most of these echoes aren't so bad, but I guess they might make the
encoding job harder. I'd sure rather see the echo cancellation
finished [not that I have any say on what you work on!!!].

FWIW, I'm currently using noise reduction, VAD, and AGC in a VoIP
client which is getting more use out there, and it's working well.

I'm currently using VAD in a conferencing application [where the VAD
decision helps me avoid adding noise to the common conference, and
ostensibly avoid redundant mixing steps]. The problem here is that VAD
is very expensive. I did a little set of tests, and VAD is currently
my bottleneck for users who are using it.

Here are the numbers I got doing VAD on 655 seconds of audio (about
half is speech, half is absolute silence [0's]).

    P3-600:                      25 seconds
    Athlon XP 1700+ (1.45 GHz):  5 seconds
    P4 2.8 GHz:                  8.8 seconds

I was surprised to see the Athlon win this by such a wide margin, but
I triple-checked to make sure the machine was idle, and the binaries
and test data were exactly the same (compiled with just -O2).

Anyway, I think I might need to find a less computationally intensive
VAD solution for the conference. VAD is currently only used when
people connect via the PSTN, so they presumably have a decent SNR, and
I may be able to get away with an energy envelope type of thing,
without needing frequency domain analysis. But before I go and start
coding this, are there any simple optimizations that can be done to
the preprocessor when it is being used only for the VAD decision?

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to
'speex-dev-request@xiph.org' containing only the word 'unsubscribe' in
the body. No subject is needed. Unsubscribe messages sent to the list
will be ignored/filtered.
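For readers wondering what the "energy envelope type of thing" might look like, here is a minimal sketch. It is not Speex code: the EnergyVad type, the energy_vad_init() and energy_vad_frame() functions, and every threshold are hypothetical names and values chosen for illustration. It assumes 16-bit PCM frames and a reasonably clean signal, as described for the PSTN callers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical time-domain VAD; not part of Speex. */
typedef struct {
    float noise_floor;  /* slowly tracked background energy (mean square) */
    float ratio;        /* speech if frame energy > ratio * noise_floor */
    float adapt;        /* smoothing factor for the noise floor, 0..1 */
} EnergyVad;

void energy_vad_init(EnergyVad *v, float initial_floor, float ratio, float adapt)
{
    assert(v != NULL);
    v->noise_floor = initial_floor;
    v->ratio = ratio;
    v->adapt = adapt;
}

/* Returns 1 if the frame looks like speech, 0 otherwise. */
int energy_vad_frame(EnergyVad *v, const short *pcm, size_t n)
{
    double sum = 0.0;
    float energy;
    size_t i;

    assert(v != NULL && pcm != NULL && n > 0);

    /* Mean-square energy of the frame. */
    for (i = 0; i < n; i++)
        sum += (double)pcm[i] * (double)pcm[i];
    energy = (float)(sum / (double)n);

    if (energy > v->ratio * v->noise_floor)
        return 1;

    /* Adapt the floor only on non-speech frames so speech does not raise it. */
    v->noise_floor += v->adapt * (energy - v->noise_floor);

    /* Keep a minimum floor so all-zero input cannot drive it toward zero. */
    if (v->noise_floor < 1.0f)
        v->noise_floor = 1.0f;
    return 0;
}
```

This costs one multiply-add per sample and needs no FFT, at the price of being easily fooled by loud non-speech noise, so it would only be reasonable when the SNR really is decent.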
Jean-Marc Valin
2004-Aug-06 15:02 UTC
[speex-dev] Memory leak in denoiser + a few questions
> Reverberation suppression?

Basically, it means that if you are in a room with lots of echo (long
decay), I can reduce it a bit.

> I guess this would help reduce local source echoes? I've never
> _noticed_ that to be a problem in my use, but I would imagine that
> using a notebook's built-in microphone, you'd get some echo off of the
> screen and stuff [also from the whole room].
>
> Most of these echoes aren't so bad, but I guess they might make the
> encoding job harder. I'd sure rather see the echo cancellation
> finished [not that I have any say on what you work on!!!].

Well, I'm still looking for help :)

> Here are the numbers I got doing VAD on 655 seconds of audio (about
> half is speech, half is absolute silence [0's]).
> P3-600: 25 seconds
> Athlon XP 1700+ (1.45 GHz): 5 seconds
> P4 2.8 GHz: 8.8 seconds

These numbers sound like a problem I had a while ago with the decoder.
The VAD shouldn't take much CPU, so I suspect there might be
floating-point underflows in some part, slowing down the Intel CPUs a
lot (for some reason, the AMD CPUs seem to handle underflows faster).

> Anyway, I think I might need to find a less computationally intensive
> VAD solution for the conference. VAD is currently only used when
> people connect via the PSTN, so they presumably have a decent SNR, and
> I may be able to get away with an energy envelope type of thing,
> without needing frequency domain analysis. But before I go and start
> coding this, are there any simple optimizations that can be done to
> the preprocessor when it is being used only for the VAD decision?

Have you tried using the (less accurate) VAD that's in the codec itself
(SPEEX_SET_VAD)?

	Jean-Marc

-- 
Jean-Marc Valin
http://www.xiph.org/~jm/
LABORIUS
Université de Sherbrooke, Québec, Canada
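As an illustration of the suspected underflow problem: when a recursively averaged value is fed exact zeros (the test data above is half absolute silence), it decays toward zero and can end up in the subnormal ("denormal") float range, where some CPUs are much slower. The sketch below is not the Speex fix. count_denormals(), flush_tiny(), smooth_step(), and the 1e-15f cutoff are hypothetical names and values. It shows one way to check a buffer for denormals and one common mitigation, which is snapping negligible values to exact zero.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Counts entries that are nonzero but smaller in magnitude than FLT_MIN,
   i.e. subnormal ("denormal") floats. */
size_t count_denormals(const float *buf, size_t n)
{
    size_t i, count = 0;

    assert(buf != NULL);
    for (i = 0; i < n; i++) {
        float a = buf[i] < 0.0f ? -buf[i] : buf[i];
        if (a != 0.0f && a < FLT_MIN)
            count++;
    }
    return count;
}

/* Hypothetical mitigation: snap values too small to matter to exact zero.
   The 1e-15f cutoff is an arbitrary illustrative choice. */
float flush_tiny(float x)
{
    return (x > -1e-15f && x < 1e-15f) ? 0.0f : x;
}

/* One step of a recursive average y = a*y + (1-a)*x, with flushing,
   so that long runs of zero input settle at exactly 0. */
float smooth_step(float y, float x, float a)
{
    return flush_tiny(a * y + (1.0f - a) * x);
}
```

Running count_denormals() over a state array after processing a long stretch of silence would be one cheap way to confirm or rule out the suspicion before reaching for a profiler.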
Jean-Marc Valin wrote:

>> Reverberation suppression?
>
> Basically, it means that if you are in a room with lots of echo (long
> decay), I can reduce it a bit.
>
>> I guess this would help reduce local source echoes? I've never
>> _noticed_ that to be a problem in my use, but I would imagine that
>> using a notebook's built-in microphone, you'd get some echo off of the
>> screen and stuff [also from the whole room].
>>
>> Most of these echoes aren't so bad, but I guess they might make the
>> encoding job harder. I'd sure rather see the echo cancellation
>> finished [not that I have any say on what you work on!!!].
>
> Well, I'm still looking for help :)
>
>> Here are the numbers I got doing VAD on 655 seconds of audio (about
>> half is speech, half is absolute silence [0's]).
>> P3-600: 25 seconds
>> Athlon XP 1700+ (1.45 GHz): 5 seconds
>> P4 2.8 GHz: 8.8 seconds
>
> These numbers sound like a problem I had a while ago with the decoder.
> The VAD shouldn't take much CPU, so I suspect there might be
> floating-point underflows in some part, slowing down the Intel CPUs a
> lot (for some reason, the AMD CPUs seem to handle underflows faster).

Hmm, how can I find that out? How much CPU would you expect it to take?
I've been playing with oprofile, but I don't see it getting that finely
grained.

>> Anyway, I think I might need to find a less computationally intensive
>> VAD solution for the conference. VAD is currently only used when
>> people connect via the PSTN, so they presumably have a decent SNR, and
>> I may be able to get away with an energy envelope type of thing,
>> without needing frequency domain analysis. But before I go and start
>> coding this, are there any simple optimizations that can be done to
>> the preprocessor when it is being used only for the VAD decision?
>
> Have you tried using the (less accurate) VAD that's in the codec itself
> (SPEEX_SET_VAD)?

I'll take a look at that.
In this case [in the conferencing application], I'm not actually using
Speex encoding [these are PSTN callers, I do VAD in clients when I
control them], so I'd need to see if I could rip it out of Speex to use
it.

Also, I do have a couple of patches to the preprocessor to send along,
actually; basically this makes the start and continue probabilities
parameters that can be set by callers. We're currently using very low
probabilities, much lower than your defaults: VAD_START=0.05,
VAD_CONTINUE=0.02. We also have a 20-frame (2/5 sec) "tail" that is
outside the preprocessor, which continues treating some frames as
speech after the detector has dropped out.

Here's a patch:

=================================================

Diff for file preprocess.c, 1.2 -> 1.3

Index: preprocess.c
===================================================================
RCS file: /home/UniServ/dls/CVS/hms/app_conference/libspeex/preprocess.c,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -w -r1.2 -r1.3
--- preprocess.c  2003/11/07 23:40:23  1.2
+++ preprocess.c  2004/02/06 17:10:24  1.3
@@ -145,6 +145,9 @@
    st->agc_level = 8000;
    st->vad_enabled = 0;
 
+   st->speech_prob_start = SPEEX_PROB_START ;
+   st->speech_prob_continue = SPEEX_PROB_CONTINUE ;
+
    st->frame = (float*)speex_alloc(2*N*sizeof(float));
    st->ps = (float*)speex_alloc(N*sizeof(float));
    st->gain2 = (float*)speex_alloc(N*sizeof(float));
@@ -435,12 +438,19 @@
    st->speech_prob = p0/(1e-25+p1+p0);
    /*fprintf (stderr, "%f %f %f ", tot_loudness, st->loudness2, st->speech_prob);*/
 
+   /* decide if frame is speech using speech probability settings */
+
    /* if (st->speech_prob> .35 || (st->last_speech < 20 && st->speech_prob>.1)) */
-   if (st->speech_prob> .20 || (st->last_speech < 20 && st->speech_prob>.05))
+   if (
+      st->speech_prob > st->speech_prob_start
+      || ( st->last_speech < 20 && st->speech_prob > st->speech_prob_continue )
+   )
    {
       is_speech = 1;
       st->last_speech = 0;
-   } else {
+   }
+   else
+   {
       st->last_speech++;
       if (st->last_speech<20)
          is_speech = 1;
@@ -985,6 +995,30 @@
    case SPEEX_PREPROCESS_GET_VAD:
       (*(int*)ptr) = st->vad_enabled;
       break;
+
+   case SPEEX_PREPROCESS_SET_PROB_START:
+      st->speech_prob_start = (*(float*)ptr) ;
+      if ( st->speech_prob_start > 1 )
+         st->speech_prob_start = st->speech_prob_start / 100 ;
+      if ( st->speech_prob_start > 1 || st->speech_prob_start < 0 )
+         st->speech_prob_start = SPEEX_PROB_START ;
+      break ;
+   case SPEEX_PREPROCESS_GET_PROB_START:
+      (*(float*)ptr) = st->speech_prob_start ;
+      break ;
+
+   case SPEEX_PREPROCESS_SET_PROB_CONTINUE:
+      st->speech_prob_continue = (*(float*)ptr) ;
+      if ( st->speech_prob_continue > 1 )
+         st->speech_prob_continue = st->speech_prob_continue / 100 ;
+      if ( st->speech_prob_continue > 1 || st->speech_prob_continue < 0 )
+         st->speech_prob_continue = SPEEX_PROB_CONTINUE ;
+      break ;
+      break ;
+   case SPEEX_PREPROCESS_GET_PROB_CONTINUE:
+      (*(float*)ptr) = st->speech_prob_continue ;
+      break ;
+
    default:
       speex_warning_int("Unknown speex_preprocess_ctl request: ", request);
       return -1;

Diff for file speex_preprocess.h, 1.1 -> 1.2

Index: speex_preprocess.h
===================================================================
RCS file: /home/UniServ/dls/CVS/hms/app_conference/libspeex/speex_preprocess.h,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -w -r1.1 -r1.2
--- speex_preprocess.h  2003/11/06 21:57:59  1.1
+++ speex_preprocess.h  2004/02/06 17:10:24  1.2
@@ -49,6 +49,10 @@
    float agc_level;
    int vad_enabled;
 
+   // probabilities to check speech_prob against
+   float speech_prob_start ;
+   float speech_prob_continue ;
+
    float *frame;             /**< Processing frame (2*ps_size) */
    float *ps;                /**< Current power spectrum */
    float *gain2;             /**< Adjusted gains */
@@ -108,8 +112,9 @@
 
 /** Used like the ioctl function to control the preprocessor parameters */
 int speex_preprocess_ctl(SpeexPreprocessState *st, int request, void *ptr);
 
-
+#define SPEEX_PROB_START 0.35
+#define SPEEX_PROB_CONTINUE 0.1
 
 #define SPEEX_PREPROCESS_SET_DENOISE 0
 #define SPEEX_PREPROCESS_GET_DENOISE 1
@@ -122,6 +127,12 @@
 
 #define SPEEX_PREPROCESS_SET_AGC_LEVEL 6
 #define SPEEX_PREPROCESS_GET_AGC_LEVEL 7
+
+#define SPEEX_PREPROCESS_SET_PROB_START 8
+#define SPEEX_PREPROCESS_GET_PROB_START 9
+
+#define SPEEX_PREPROCESS_SET_PROB_CONTINUE 10
+#define SPEEX_PREPROCESS_GET_PROB_CONTINUE 11
 
 #ifdef __cplusplus
 }

=================================================
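To make the decision logic in the patch easier to follow, here is a standalone sketch of the same start/continue thresholding with the 20-frame hangover, plus the range normalization done in the SET_PROB_* cases. It does not link against Speex. VadDecision, normalize_prob(), vad_decision_init(), and vad_decision_update() are hypothetical names, and starting in the "no recent speech" state is an assumption rather than something the patch shows.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_PROB_START    0.35f  /* SPEEX_PROB_START in the patch */
#define DEFAULT_PROB_CONTINUE 0.1f   /* SPEEX_PROB_CONTINUE in the patch */
#define HANGOVER_FRAMES       20     /* the hard-coded 20 in the patch */

typedef struct {
    float prob_start;     /* threshold to enter the speech state */
    float prob_continue;  /* lower threshold to stay in it */
    int   last_speech;    /* frames since a frame last passed a threshold */
} VadDecision;

/* Same normalization as the SET_PROB_* cases: values above 1 are read as
   percentages, and anything still outside [0, 1] falls back to the default. */
float normalize_prob(float p, float fallback)
{
    if (p > 1.0f)
        p /= 100.0f;
    if (p > 1.0f || p < 0.0f)
        p = fallback;
    return p;
}

void vad_decision_init(VadDecision *d, float start, float cont)
{
    assert(d != NULL);
    d->prob_start = normalize_prob(start, DEFAULT_PROB_START);
    d->prob_continue = normalize_prob(cont, DEFAULT_PROB_CONTINUE);
    d->last_speech = HANGOVER_FRAMES;  /* assumption: no recent speech */
}

/* Returns 1 if the frame should be treated as speech, 0 otherwise. */
int vad_decision_update(VadDecision *d, float speech_prob)
{
    assert(d != NULL);
    if (speech_prob > d->prob_start ||
        (d->last_speech < HANGOVER_FRAMES &&
         speech_prob > d->prob_continue)) {
        d->last_speech = 0;
        return 1;
    }
    /* Saturate instead of counting forever; the result is the same. */
    if (d->last_speech < HANGOVER_FRAMES)
        d->last_speech++;
    return d->last_speech < HANGOVER_FRAMES;
}
```

With the patch applied, a caller would presumably set the thresholds by passing a pointer to a float to speex_preprocess_ctl() with SPEEX_PREPROCESS_SET_PROB_START or SPEEX_PREPROCESS_SET_PROB_CONTINUE, since the new cases read the argument as (*(float*)ptr).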