Steve Kann
2004-Aug-06 15:02 UTC
preprocessor performance (was Re: [speex-dev] Memory leak in denoiser + a few questions)
Jean-Marc Valin wrote:
>If you set the denoiser to "on" and the VAD to "off", what difference
>does it make in CPU time?

Same program, running on Athlon XP 1700+:

Test 1, using VAD, with AGC and denoise off:

    stevek@canarsie:~/work/hms/app_conference $ time ./vad_test /tmp/demo-instruct.sw 5
    reading from /tmp/demo-instruct.sw, repeating 5 times
    read 537760 samples
    beginning pass
    beginning pass
    beginning pass
    beginning pass
    beginning pass
    done.

    real    0m4.970s
    user    0m4.628s
    sys     0m0.014s

Test 2, using denoise only:

    stevek@canarsie:~/work/hms/app_conference $ time ./vad_test /tmp/demo-instruct.sw 5
    reading from /tmp/demo-instruct.sw, repeating 5 times
    read 537760 samples
    beginning pass
    beginning pass
    beginning pass
    beginning pass
    beginning pass
    done.

    real    0m5.359s
    user    0m4.301s
    sys     0m0.024s

===================

So it doesn't seem to make much difference (4.63 s vs. 4.30 s of user time).

I also ran the code, unoptimized, with oprofile. I'll send the results to you separately.

-SteveK

--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'speex-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is needed.
Unsubscribe messages sent to the list will be ignored/filtered.
Jean-Marc Valin
2004-Aug-06 15:02 UTC
preprocessor performance (was Re: [speex-dev] Memory leak in denoiser + a few questions)
OK, so the problem doesn't seem to be the VAD specifically. Can you tell
me how much audio you had in the test? It may be that nothing's wrong
and the code just isn't fast enough to handle 100 channels. Or maybe it
just needs a bit of optimization...

	Jean-Marc

On Wed 31/03/2004 at 10:03, Steve Kann wrote:
> Jean-Marc Valin wrote:
>
> >If you set the denoiser to "on" and the VAD to "off", what difference
> >does it make in CPU time?
>
> [...]
>
> So, it doesn't seem to make much difference.
>
> I also ran the code, unoptimized, with oprofile. I'll send results from
> that to you separately.
>
> -SteveK

--
Jean-Marc Valin
http://www.xiph.org/~jm/
LABORIUS
Université de Sherbrooke, Québec, Canada
Steve Kann
2004-Aug-06 15:02 UTC
preprocessor performance (was Re: [speex-dev] Memory leak in denoiser + a few questions)
Jean-Marc Valin wrote:
>OK, so the problem doesn't seem to be the VAD specifically. Can you tell
>me how much audio you had in the test? It may be that nothing's wrong
>and the code just isn't fast enough to handle 100 channels. Or maybe
>it just needs a bit of optimization...

In my test, I have a buffer of 1024x1024 samples (about 1 million, or 65 seconds), which I zero and then fill with 537760 samples (about 500K, or 30 seconds) of recorded audio. The rest of the buffer stays zeroed (silence). Then I run the preprocessor over it 5 times; this simulates about 5 minutes of preprocessing, consisting of alternating ~30-second segments of speech and silence.

I sent (off-list) some oprofile output, but I'm not sure what to make of it. Some operations that don't look any more complicated than others seem to take a long time. I also tried sampling on DATA_CACHE_MISSES. Here's an example of the hotspots I found (in preprocessor.c, with the code modified a bit to use local pointers to the arrays in the st struct):

The first four columns are the sample counts (hits) and percentages for two events: columns 1-2 are CPU_CLK_UNHALTED (cycles outside of halt state; unit mask 0x00, i.e. no unit mask; count 10000), and columns 3-4 are DATA_CACHE_MISSES (data cache misses; unit mask 0x00; count 1000). The hits attributed to inc %ebx may really belong to the previous instruction, but either way this loop by itself is taking roughly 7% of the time, which doesn't make sense.
                             :  for (i=1;i<N;i++)
                             :  804a340:  mov    $0x1,%ebx
    18  0.0012   0  0.0e+00  :  804a345:  cmp    %edi,%ebx
                             :  804a347:  jge    804a377 <speex_preprocess+0x3c7>
                             :  804a349:  fldl   0x804d810
    11  7.2e-04  0  0.0e+00  :  804a34f:  fldl   0x804d818
                             :     zeta[i] = .7*zeta[i] + .3*prior[i];
                             :  804a355:  mov    0xffffffb4(%ebp),%ecx
  1494  0.0979   1  0.0695   :  804a358:  mov    0xffffffac(%ebp),%eax
    22  0.0014   0  0.0e+00  :  804a35b:  fld    %st(1)
                             :  804a35d:  fld    %st(1)
  1546  0.1013   1  0.0695   :  804a35f:  fxch   %st(1)
  1532  0.1004   0  0.0e+00  :  804a361:  fmuls  (%ecx,%ebx,4)
     1  6.6e-05  0  0.0e+00  :  804a364:  fxch   %st(1)
     8  5.2e-04  0  0.0e+00  :  804a366:  fmuls  (%eax,%ebx,4)
  1416  0.0928   9  0.6254   :  804a369:  faddp  %st,%st(1)
     1  6.6e-05  0  0.0e+00  :  804a36b:  fstps  (%ecx,%ebx,4)
102158  6.6924  15  1.0424   :  804a36e:  inc    %ebx
  5864  0.3842   0  0.0e+00  :  804a36f:  cmp    %edi,%ebx
  1564  0.1025   0  0.0e+00  :  804a371:  jl     804a355 <speex_preprocess+0x3a5>
                             :  804a373:  fstp   %st(0)
   144  0.0094   0  0.0e+00  :  804a375:  fstp   %st(0)

Here, this area of the code is taking (in this example) about 13% of the execution time:

                             :        zeta1 = zeta[i];
                             :     else
                             :        zeta1 = .25*zeta[i-1] + .5*zeta[i] + .25*zeta[i+1];
                             :  804a490:  mov    0xffffffb4(%ebp),%edx
  4292  0.2812   0  0.0e+00  :  804a493:  fldl   0x804d868
   287  0.0188   0  0.0e+00  :  804a499:  flds   (%edx,%ebx,4)
146543  9.6001  26  1.8068   :  804a49c:  fxch   %st(1)
 28942  1.8960   3  0.2085   :  804a49e:  fmuls  0xfffffffc(%edx,%ebx,4)
  9996  0.6548   1  0.0695   :  804a4a2:  fxch   %st(1)
                             :  804a4a4:  fmuls  0x804d708
  1655  0.1084   0  0.0e+00  :  804a4aa:  faddp  %st,%st(1)
  1030  0.0675   1  0.0695   :  804a4ac:  fldl   0x804d868
   657  0.0430   0  0.0e+00  :  804a4b2:  fmuls  0x4(%edx,%ebx,4)
   553  0.0362   0  0.0e+00  :  804a4b6:  faddp  %st,%st(1)
  1129  0.0740   0  0.0e+00  :  804a4b8:  fstps  0xffffffe4(%ebp)
 53350  3.4950   3  0.2085   :  804a4bb:  flds   0xffffffe4(%ebp)

I see that there are probably some optimizations that could be made when using the preprocessor only for VAD: the inverse FFT, writing back the results, etc. could certainly be skipped, since if only VAD is enabled there's no point in modifying the samples.
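For reference, the two hot spots in the annotated listing boil down to the loops below. This is a minimal C sketch for timing them in isolation, not the actual speex_preprocess() code: the function names `smooth_zeta` and `smoothed_zeta` are hypothetical, and the listing does not show the condition that selects the plain `zeta1 = zeta[i]` branch, so the edge-bin test used here is an assumption.

```c
#include <assert.h>

/* Recursive averaging, as in the first listing:
   zeta[i] = .7*zeta[i] + .3*prior[i] for i = 1..n-1 (bin 0 untouched).
   The literals are doubles, so C promotes each product to (at least)
   double precision before the result is rounded back into the float array. */
static void smooth_zeta(float *zeta, const float *prior, int n)
{
    assert(n >= 1);
    for (int i = 1; i < n; i++)
        zeta[i] = .7 * zeta[i] + .3 * prior[i];
}

/* Three-tap smoothing across neighbouring bins, as in the second listing:
   zeta1 = .25*zeta[i-1] + .5*zeta[i] + .25*zeta[i+1].
   ASSUMPTION: the unshown condition for "zeta1 = zeta[i]" is taken to be
   the two edge bins, where one neighbour is missing. */
static float smoothed_zeta(const float *zeta, int i, int n)
{
    assert(i >= 0 && i < n);
    if (i == 0 || i == n - 1)      /* hypothetical edge condition */
        return zeta[i];
    return .25 * zeta[i - 1] + .5 * zeta[i] + .25 * zeta[i + 1];
}
```

Wrapping each loop in a small driver with synthetic speech-like and all-zero inputs would show whether the per-iteration cost really varies as much as the profile suggests.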
Those VAD-only shortcuts wouldn't address the bulk of the consumption, though, assuming that what oprofile is telling me is even close to correct.