thr3ads.net - Speex dev - [speex-dev] Memory leak in denoiser + a few questions [Aug 2004]


Steve Kann

2004-Aug-06 15:02 UTC

[speex-dev] Memory leak in denoiser + a few questions

On Mar 28, 2004, at 8:23 PM, Jean-Marc Valin wrote:
>> The st->zeta pointer isn't freed in the 
>> speex_preprocess_state_destroy()
>> function of the preprocess.c file (alloced in line 167). It's in 
>> Speex 1.1.4
>> by the way.
>
> Oops... Thanks for letting me know. I'll change that for the next
> release (in the mean time, the fix is obvious). In case you're
> interested, I'm currently working on a reverberation suppression
> algorithm. I'll put it in CVS.
Reverberation suppression?

I guess this would help reduce local source echoes?  I've never 
_noticed_ that to be a problem in my use, but I would imagine that 
using a notebook's built-in microphone, you'd get some echo off of the 
screen and stuff [also from the whole room]..

Most of these echoes aren't so bad, but I guess they might make the 
encoding job harder.  I'd sure rather see the echo cancellation 
finished [not that I have any say on what you work on!!!].

FWIW, I'm currently using noise reduction, VAD, and AGC in a VoIP 
client which is getting more use out there, and it's working well.

I'm currently using VAD in a conferencing application [where the VAD 
decision helps me avoid adding noise to the common conference, and 
ostensibly avoid redundant mixing steps].  The problem here is that VAD 
is very expensive.  I did a little set of tests, and VAD is currently 
my bottleneck for users who are using it..

Here's the numbers I got doing vad on 655 seconds of audio (about half 
is speech, half is absolute silence [0's]).
P3-600: 25 seconds
Athlon XP 1700+ (1.45Ghz): 5 seconds
P4 2.8Ghz: 8.8 seconds.

I was surprised to see the Athlon win this by such a wide margin, but I 
triple-checked to make sure the machine was idle, and the binaries and 
test data were exactly the same (compiled with just -O2).

Anyway, I think I might need to find a less computationally intensive 
VAD solution for the conference.  VAD is currently only used when 
people connect via the PSTN, so they presumably have a decent SNR, and 
I may be able to get away with an energy envelope type of thing, 
without needing frequency domain analysis.  But before I go and start 
coding this, are there any simple optimizations that can be done to the 
preprocessor when it is being used only for the VAD decision?
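
A minimal sketch of the "energy envelope type of thing" mentioned above, assuming 16-bit PCM frames. Nothing here is Speex code: the `EnergyVad` struct, the function names, the ratio test, and the floor clamp of 1.0 are all hypothetical choices for illustration, and the thresholds would need tuning against real PSTN audio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state for an energy-envelope detector (not part of Speex). */
typedef struct {
    float noise_floor; /* running estimate of background frame energy */
    float ratio;       /* speech if energy > ratio * noise_floor */
    float adapt;       /* smoothing factor for the floor, in 0..1 */
} EnergyVad;

void energy_vad_init(EnergyVad *v, float initial_floor, float ratio, float adapt)
{
    assert(v != NULL);
    v->noise_floor = initial_floor;
    v->ratio = ratio;
    v->adapt = adapt;
}

/* Mean squared amplitude of one frame of 16-bit PCM. */
float frame_energy(const int16_t *pcm, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)pcm[i] * (double)pcm[i];
    return n ? (float)(sum / (double)n) : 0.0f;
}

/* Returns 1 if the frame looks like speech, 0 otherwise.
 * The noise floor is only adapted on frames classified as non-speech. */
int energy_vad_process(EnergyVad *v, const int16_t *pcm, size_t n)
{
    float e = frame_energy(pcm, n);
    if (e > v->ratio * v->noise_floor)
        return 1;
    v->noise_floor += v->adapt * (e - v->noise_floor);
    /* Assumed clamp: keep the floor from collapsing to 0 on digital silence. */
    if (v->noise_floor < 1.0f)
        v->noise_floor = 1.0f;
    return 0;
}
```

This works purely in the time domain, with one multiply-add per sample, so its cost is small compared with any frequency-domain analysis. The trade-off is that it cannot tell speech from other loud sounds.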

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to
'speex-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is
needed.
Unsubscribe messages sent to the list will be ignored/filtered.

Jean-Marc Valin

2004-Aug-06 15:02 UTC


[speex-dev] Memory leak in denoiser + a few questions

> Reverberation suppression?
Basically, it means that if you are in a room with lots of echo (long
decay), I can reduce it a bit.
> I guess this would help reduce local source echoes?  I've never 
> _noticed_ that to be a problem in my use, but I would imagine that 
> using a notebook's built-in microphone, you'd get some echo off of the
> screen and stuff [also from the whole room]..
> 
> Most of these echoes aren't so bad, but I guess they might make the 
> encoding job harder.  I'd sure rather see the echo cancellation 
> finished [not that I have any say on what you work on!!!].
Well, I'm still looking for help :)
> Here's the numbers I got doing vad on 655 seconds of audio (about half 
> is speech, half is absolute silence [0's]).
> P3-600: 25 seconds
> Athlon XP 1700+ (1.45Ghz): 5 seconds
> P4 2.8Ghz: 8.8 seconds.
These numbers sound like a problem I had a while ago with the decoder.
The VAD shouldn't take much CPU so I suspect there might be floating
point underflows in some part, slowing down the Intel CPUs a lot (for
some reason, the AMD CPUs seem to handle underflows faster).
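
A hedged sketch of one common way to keep recursive estimates out of the underflow (subnormal) range: snap very small magnitudes to zero before they are stored. The one-pole smoother below is only a stand-in for the kind of recursive average that decays geometrically on all-zero input; it is not taken from preprocess.c, and `flush_tiny`, `smooth_step`, and the threshold are hypothetical.

```c
#include <assert.h>
#include <float.h>

/* Assumed threshold: comfortably above FLT_MIN, far below any useful value. */
#define TINY_THRESHOLD (1e18f * FLT_MIN)

/* Snap values of very small magnitude to exactly zero. */
float flush_tiny(float x)
{
    return (x < TINY_THRESHOLD && x > -TINY_THRESHOLD) ? 0.0f : x;
}

/* One-pole smoother y = a*y + (1-a)*x with the result flushed.
 * On silent input (x == 0) y shrinks by a factor of a per call and would
 * otherwise pass through the subnormal range on its way to zero. */
float smooth_step(float y, float x, float a)
{
    assert(a >= 0.0f && a <= 1.0f);
    return flush_tiny(a * y + (1.0f - a) * x);
}
```

Whether this is what actually slows the preprocessor on the Intel machines is unconfirmed; it only illustrates the kind of fix that applies if underflows turn out to be the cause.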
> Anyway, I think I might need to find a less computationally intensive 
> VAD solution for the conference.  VAD is currently only used when 
> people connect via the PSTN, so they presumably have a decent SNR, and 
> I may be able to get away with an energy envelope type of thing, 
> without needing frequency domain analysis.  But before I go and start 
> coding this, are there any simple optimizations that can be done to the 
> preprocessor when it is being used only for the VAD decision?
Have you tried using the (less accurate) VAD that's in the codec itself
(SPEEX_SET_VAD)?

        Jean-Marc


-- 
Jean-Marc Valin
http://www.xiph.org/~jm/
LABORIUS
Université de Sherbrooke, Québec, Canada



Steve Kann

2004-Aug-06 15:02 UTC


[speex-dev] Memory leak in denoiser + a few questions

Jean-Marc Valin wrote:
>>Reverberation suppression?
>
>Basically, it means that if you are in a room with lots of echo (long
>decay), I can reduce it a bit.
>
>>I guess this would help reduce local source echoes?  I've never
>>_noticed_ that to be a problem in my use, but I would imagine that
>>using a notebook's built-in microphone, you'd get some echo off of the
>>screen and stuff [also from the whole room]..
>>
>>Most of these echoes aren't so bad, but I guess they might make the
>>encoding job harder.  I'd sure rather see the echo cancellation
>>finished [not that I have any say on what you work on!!!].
>
>Well, I'm still looking for help :)
>
>>Here's the numbers I got doing vad on 655 seconds of audio (about half
>>is speech, half is absolute silence [0's]).
>>P3-600: 25 seconds
>>Athlon XP 1700+ (1.45Ghz): 5 seconds
>>P4 2.8Ghz: 8.8 seconds.
>
>These numbers sound like a problem I had a while ago with the decoder.
>The VAD shouldn't take much CPU so I suspect there might be floating
>point underflows in some part, slowing down the Intel CPUs a lot (for
>some reason, the AMD CPUs seem to handle underflows faster).
Hmm, how can I find that out?  How much CPU would you expect it to take?

I've been playing with oprofile, but I don't see it getting that 
fine-grained..
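
One possible way to check for underflows without a profiler, sketched under the assumption that the float buffers of interest can be inspected after each frame: count how many values are subnormal. `count_subnormals` is a hypothetical helper; `fpclassify` and `FP_SUBNORMAL` are standard C99 from `<math.h>`.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Count the subnormal (denormalized) values in a float buffer.
 * A steadily non-zero count on silent input would support the
 * underflow theory; a count of zero would argue against it. */
size_t count_subnormals(const float *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (fpclassify(buf[i]) == FP_SUBNORMAL)
            count++;
    }
    return count;
}
```

The check only gives meaningful results if the build does not already flush subnormals to zero (for example through fast-math style compiler options).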
>>Anyway, I think I might need to find a less computationally intensive
>>VAD solution for the conference.  VAD is currently only used when
>>people connect via the PSTN, so they presumably have a decent SNR, and
>>I may be able to get away with an energy envelope type of thing,
>>without needing frequency domain analysis.  But before I go and start
>>coding this, are there any simple optimizations that can be done to the
>>preprocessor when it is being used only for the VAD decision?
>
>Have you tried using the (less accurate) VAD that's in the codec itself
>(SPEEX_SET_VAD)?

I'll take a look at that.  In this case [in the conferencing 
application], I'm not actually using Speex encoding [these are PSTN 
callers, I do VAD in clients when I control them], so I'd need to see if 
I could rip it out of Speex to use it.

Also, I have a couple of patches to the preprocessor to send along; 
basically they make the start and continue probabilities parameters 
that can be set by callers.  We're currently using very low 
probabilities, much lower than your defaults: VAD_START=0.05 and 
VAD_CONTINUE=0.02.  We also have a 20-frame (2/5 sec) "tail" that is 
outside the preprocessor, which continues treating some frames as speech 
after the detector has dropped out.

Here's a patch:

=================================================
Diff for file preprocess.c, 1.2 -> 1.3
Index: preprocess.c
===================================================================
RCS file: /home/UniServ/dls/CVS/hms/app_conference/libspeex/preprocess.c,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -w -r1.2 -r1.3
--- preprocess.c	2003/11/07 23:40:23	1.2
+++ preprocess.c	2004/02/06 17:10:24	1.3
@@ -145,6 +145,9 @@
    st->agc_level = 8000;
    st->vad_enabled = 0;
 
+   st->speech_prob_start = SPEEX_PROB_START ;
+   st->speech_prob_continue = SPEEX_PROB_CONTINUE ;
+   
    st->frame = (float*)speex_alloc(2*N*sizeof(float));
    st->ps = (float*)speex_alloc(N*sizeof(float));
    st->gain2 = (float*)speex_alloc(N*sizeof(float));
@@ -435,12 +438,19 @@
       st->speech_prob = p0/(1e-25+p1+p0);
       /*fprintf (stderr, "%f %f %f ", tot_loudness, st->loudness2, st->speech_prob);*/
 
+	/* decide if frame is speech using speech probability settings */
+
 /*      if (st->speech_prob> .35 || (st->last_speech < 20 && st->speech_prob>.1)) */
-      if (st->speech_prob> .20 || (st->last_speech < 20 && st->speech_prob>.05))
+	if (
+		st->speech_prob > st->speech_prob_start
+		|| ( st->last_speech < 20 && st->speech_prob > st->speech_prob_continue )
+	)
       {
          is_speech = 1;
          st->last_speech = 0;
-      } else {
+	} 
+	else 
+	{
          st->last_speech++;
          if (st->last_speech<20)
            is_speech = 1;
@@ -985,6 +995,30 @@
    case SPEEX_PREPROCESS_GET_VAD:
       (*(int*)ptr) = st->vad_enabled;
       break;
+      
+	case SPEEX_PREPROCESS_SET_PROB_START:
+		st->speech_prob_start = (*(float*)ptr) ;
+		if ( st->speech_prob_start > 1 )
+			st->speech_prob_start = st->speech_prob_start / 100 ;
+		if ( st->speech_prob_start > 1 || st->speech_prob_start < 0 )
+			st->speech_prob_start = SPEEX_PROB_START ;
+		break ;
+	case SPEEX_PREPROCESS_GET_PROB_START:
+		(*(float*)ptr) = st->speech_prob_start ;
+		break ;
+      
+	case SPEEX_PREPROCESS_SET_PROB_CONTINUE:
+		st->speech_prob_continue = (*(float*)ptr) ;
+		if ( st->speech_prob_continue > 1 )
+			st->speech_prob_continue = st->speech_prob_continue / 100 ;
+		if ( st->speech_prob_continue > 1 || st->speech_prob_continue < 0 )
+			st->speech_prob_continue = SPEEX_PROB_CONTINUE ;
+		break ;
+		break ;
+	case SPEEX_PREPROCESS_GET_PROB_CONTINUE:
+		(*(float*)ptr) = st->speech_prob_continue ;
+		break ;
+      
    default:
       speex_warning_int("Unknown speex_preprocess_ctl request: ", request);
       return -1;

Diff for file speex_preprocess.h, 1.1 -> 1.2
Index: speex_preprocess.h
===================================================================
RCS file: /home/UniServ/dls/CVS/hms/app_conference/libspeex/speex_preprocess.h,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -w -r1.1 -r1.2
--- speex_preprocess.h	2003/11/06 21:57:59	1.1
+++ speex_preprocess.h	2004/02/06 17:10:24	1.2
@@ -49,6 +49,10 @@
    float  agc_level;
    int    vad_enabled;
 
+	// probabilities to check speech_prob against
+	float speech_prob_start ;
+	float speech_prob_continue ;
+
    float *frame;             /**< Processing frame (2*ps_size) */
    float *ps;                /**< Current power spectrum */
    float *gain2;             /**< Adjusted gains */
@@ -108,8 +112,9 @@
 
 /** Used like the ioctl function to control the preprocessor parameters */
 int speex_preprocess_ctl(SpeexPreprocessState *st, int request, void *ptr);
-
 
+#define SPEEX_PROB_START 0.35 
+#define SPEEX_PROB_CONTINUE 0.1
 
 #define SPEEX_PREPROCESS_SET_DENOISE 0
 #define SPEEX_PREPROCESS_GET_DENOISE 1
@@ -122,6 +127,12 @@
 
 #define SPEEX_PREPROCESS_SET_AGC_LEVEL 6
 #define SPEEX_PREPROCESS_GET_AGC_LEVEL 7
+
+#define SPEEX_PREPROCESS_SET_PROB_START 8
+#define SPEEX_PREPROCESS_GET_PROB_START 9
+
+#define SPEEX_PREPROCESS_SET_PROB_CONTINUE 10
+#define SPEEX_PREPROCESS_GET_PROB_CONTINUE 11
 
 #ifdef __cplusplus

=================================================
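
For readers who want to try the patch's logic without building libspeex, here is a self-contained sketch that mirrors its two pieces: the setter's range handling and the start/continue decision. The function names and the `_DEFAULT` macro names are hypothetical. The default values 0.35 and 0.1 and the 20-frame limit come from the diff above.

```c
#include <assert.h>

#define PROB_START_DEFAULT    0.35f
#define PROB_CONTINUE_DEFAULT 0.10f

/* Mirrors the patch's setters: a value above 1 is read as a percentage,
 * and anything still outside [0, 1] falls back to the default.
 * Note the side effect that 1.5 is accepted and becomes 0.015. */
float normalize_prob(float requested, float fallback)
{
    assert(fallback >= 0.0f && fallback <= 1.0f);
    float p = requested;
    if (p > 1.0f)
        p = p / 100.0f;
    if (p > 1.0f || p < 0.0f)
        p = fallback;
    return p;
}

/* Mirrors the patched condition: a frame starts speech above prob_start,
 * or continues it above prob_continue if speech was seen within the last
 * 20 frames (last_speech counts frames since the last speech frame). */
int frame_is_speech(float speech_prob, int last_speech,
                    float prob_start, float prob_continue)
{
    return speech_prob > prob_start
        || (last_speech < 20 && speech_prob > prob_continue);
}
```

For example, `normalize_prob(5.0f, PROB_START_DEFAULT)` yields roughly 0.05, the start threshold quoted earlier in this message.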
