Hello,

I am testing the echo canceller in speex 1.1.6. I am using testecho.c, with a few modifications to get it to run on Windows. My problem is that I am unable to get the echo cancellation to work correctly.

I am working on audio conferencing software, and one issue we have is that the microphone sometimes picks up what is being played through the headset, resulting in an echo of the other person who is talking. To test this scenario with the echo canceller, I am running two samples through it. The first is the signal I want to clean: it is my husband talking, with an echo of myself talking. The second is the reference sample: it is just myself talking. (I used the second sample to create the first, so it is the exact same wav, with my husband's speech placed on top of it.)

The result has my husband speaking clearly, but the echo of myself talking is garbled and sounds like the Borg Hive from Star Trek. It's more distracting than the original signal was. I have tried introducing delays to my echo, but it does not help the signal to become clearer.

I think I am doing something wrong, but I am not sure what. I notice you are using two samples - play.sw and ref.sw - which are not included in the source. What do they sound like? Is there a place one can pick up these files? Is this echo cancellation designed more for musical tones instead of actual echoes of speech? I know this is experimental and I feel I may be using it in the wrong way.

If you want, I can email my samples and the resulting output.

Thank you for your support and for a great product,
Shana
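For readers who want to reproduce a test signal like the one described above, here is a minimal sketch of how a microphone recording with echo could be synthesized from two clean recordings. The function `mix_echo`, its parameters, and the helper `clamp16` are hypothetical names for illustration and are not part of Speex; the sketch assumes 16-bit mono samples and models the echo as a single delayed, attenuated copy of the far-end signal, which is much simpler than a real acoustic path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Speex): saturate to the 16-bit range. */
static short clamp16(long v)
{
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return (short)v;
}

/* Hypothetical helper (not part of Speex): build a synthetic microphone
 * signal by adding a delayed, attenuated copy of the far-end (reference)
 * signal to the near-end speech. All buffers hold n 16-bit samples. */
void mix_echo(const short *near_end, const short *far_end, short *mic,
              size_t n, size_t delay, float gain)
{
    size_t i;
    assert(near_end && far_end && mic);
    for (i = 0; i < n; i++) {
        /* The echo is silent until the delayed far-end signal arrives. */
        float echo = (i >= delay) ? gain * (float)far_end[i - delay] : 0.0f;
        mic[i] = clamp16((long)near_end[i] + (long)echo);
    }
}
```

With a signal built this way, the far-end buffer is the reference and the mixed buffer is the signal to clean, which matches the two-sample setup in the message.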
Hi,

On Tue, Aug 24, 2004 at 10:44:49AM -0700, Shana Cooke (Gitnick) wrote:
> The result has my husband speaking clearly, but the echo of myself
> talking is garbled and sounds like the Borg Hive from Star Trek. It's
> more distracting than the original signal was. I have tried introducing
> delays to my echo, but it does not help the signal to become clearer.

Try this patch. It effectively disables the adaptation rate adjustment, and it uses a lower adaptation rate. Play with that rate. Suddenly you have almost conquered the Borg. (Until the Borg adapts, of course :-) ).

> I think I am doing something wrong, but I am not sure what. I notice you
> are using two samples - play.sw and ref.sw - which are not included in
> the source. What do they sound like? Is there a place one can pick up
> these files? Is this echo cancellation designed more for musical tones
> instead of actual echoes of speech? I know this is experimental and I
> feel I may be using it in the wrong way.

Oh no! Please, never feed pure sine waves to the filter while it is adapting. If I do that (it seems to be my biggest problem), it looks like the tone gets modulated by the echo.

These things are my own experiences. I am not a wizard like Jean-Marc Valin :-(.

My setup is maybe a bit different. I use headsets, which unfortunately have acoustic feedback, but they are not so loud. In my setup the echo is created by an analog modem connected to an analog line. The signal I get from the modem contains the voices of both the caller and the callee at almost the same volume.

The biggest problem I have encountered so far: having clear tones while the filter is adapting (I play with the adapt_rate during the conversation). These tones have a major bad impact on the filter. So I've hacked the telephone program so that it sets adapt_rate to 0 when "silence" is detected at the microphone.
The latest problem I am trying to conquer: when adapt_rate = 0 (the filter has already adapted) and I talk, my voice gets louder and louder (up to a certain point; the echo is certainly cancelled, but not enough), until I generate an impulse (hitting the mic with a nail), and suddenly it all gets quiet, while the other side still seems OK. I have two problems here, since the Plantronics headset decides for itself that it should turn down the volume, and it takes a minute or more to get it back to the original volume (heh, I will try a Logitech USB headset).

To be exact about my system specs:
- headset: Plantronics DSP-100 USB headset with volume buttons
- modem: Sweex something USB modem (a Smartlink reference design modem)
- system: Neoware capio-506 running diskless Debian GNU/Linux (Geode system with 32MB RAM)

Anyway, back to hacking and reading the MDF publication, and trying to understand at least which variables do what, and what significance they have.
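The "set adapt_rate to 0 on silence" hack mentioned in this message could look roughly like the sketch below. It is not the code from the telephone program, which is not shown in the thread. The function name `gated_adapt_rate` and the `silence_threshold` parameter are assumptions for illustration, and the mean-energy test is just one simple way to decide that a frame is "silent".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not Speex API: returns the adaptation rate to use
 * for one frame of microphone samples. If the mean squared amplitude of
 * the frame is below silence_threshold, adaptation is frozen (rate 0);
 * otherwise base_rate is returned unchanged. */
float gated_adapt_rate(const short *mic, size_t n, float base_rate,
                       float silence_threshold)
{
    double energy = 0.0;
    size_t i;
    assert(mic && n > 0);
    for (i = 0; i < n; i++)
        energy += (double)mic[i] * (double)mic[i];
    energy /= (double)n;   /* mean squared amplitude of the frame */
    return (energy < silence_threshold) ? 0.0f : base_rate;
}
```

A caller would compute the rate once per frame and store it in the canceller state before processing that frame. The threshold has to be tuned to the noise floor of the actual line or microphone.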
On Thu, Aug 26, 2004 at 11:54:19AM +0200, Ard van Breemen wrote:
> Try this patch.
> It effectively disables the adaptation rate adjustment, and it uses
> a lower adaptation rate. Play with that rate. Suddenly you have almost
> conquered the Borg. (Until the Borg adapts, of course :-) ).

Sigh... It is early I guess :-)
-------------- next part --------------
--- mdf.c.org 2004-08-18 11:26:51.000000000 +0200
+++ mdf.c 2004-08-18 11:27:26.000000000 +0200
@@ -57,7 +57,7 @@
    N = st->window_size;
    M = st->M = (filter_length+N-1)/frame_size;
    st->cancel_count=0;
-   st->adapt_rate = .01f;
+   st->adapt_rate = .001f;
 
    st->fft_lookup = (struct drft_lookup*)speex_alloc(sizeof(struct drft_lookup));
    drft_init(st->fft_lookup, N);
@@ -310,6 +310,7 @@
    }
 
    /* Adjust adaptation rate */
+#if 0
    if (st->cancel_count>2*M)
    {
       if (st->cancel_count<8*M)
@@ -329,6 +330,7 @@
       }
    } else
       st->adapt_rate = .0f;
+#endif
 
    /* Update weights */
    for (i=0;i<M*N;i++)
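To illustrate what a fixed adaptation rate does, here is a toy echo canceller with a constant step size. This is a plain time-domain NLMS filter, not the MDF algorithm that mdf.c implements, and none of the names (`ToyCanceller`, `toy_init`, `toy_cancel`, `TAPS`) come from Speex. It is only meant to show the role that a field like `adapt_rate` plays in the weight update: in adaptive filters of this kind, a smaller rate generally converges more slowly but is disturbed less by near-end speech.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 8   /* toy filter length; real echo tails need far more taps */

typedef struct {
    float w[TAPS];      /* current estimate of the echo path */
    float x[TAPS];      /* most recent played samples, x[0] is the newest */
    float adapt_rate;   /* fixed step size, never adjusted automatically */
} ToyCanceller;

void toy_init(ToyCanceller *st, float adapt_rate)
{
    size_t i;
    assert(st);
    for (i = 0; i < TAPS; i++) {
        st->w[i] = 0.0f;
        st->x[i] = 0.0f;
    }
    st->adapt_rate = adapt_rate;
}

/* Process one sample. played is what went to the speaker, recorded is
 * what the microphone picked up. Returns recorded minus the estimated
 * echo, and nudges the weights toward a better estimate. */
float toy_cancel(ToyCanceller *st, float played, float recorded)
{
    float est = 0.0f, power = 1e-6f, err;
    size_t i;
    /* Shift the history and insert the newest played sample. */
    for (i = TAPS - 1; i > 0; i--)
        st->x[i] = st->x[i - 1];
    st->x[0] = played;
    for (i = 0; i < TAPS; i++) {
        est += st->w[i] * st->x[i];      /* predicted echo */
        power += st->x[i] * st->x[i];    /* input power for normalization */
    }
    err = recorded - est;
    /* NLMS update: step scaled by adapt_rate and by the input power. */
    for (i = 0; i < TAPS; i++)
        st->w[i] += st->adapt_rate * err * st->x[i] / power;
    return err;
}
```

Setting `adapt_rate` to 0 in this sketch freezes the weights entirely, which is the toy equivalent of the "adapt_rate = 0" experiments discussed in this thread.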
Hi,

On Tue, Aug 24, 2004 at 10:44:49AM -0700, Shana Cooke (Gitnick) wrote:
> I think I am doing something wrong, but I am not sure what. I notice you
> are using two samples - play.sw and ref.sw - which are not included in
> the source. What do they sound like? Is there a place one can pick up
> these files? Is this echo cancellation designed more for musical tones
> instead of actual echoes of speech? I know this is experimental and I
> feel I may be using it in the wrong way.

Well, at least I have some examples of my latest experiments:
http://www.kwaak.net/~ard/example.tar.bz2

It's a shocking 9 megabytes, and contains 3 files: the source (out.wav) going into the modem, the mixed signal (in.wav) coming from the modem, and the results (hds.wav) I get with my latest experiments.

The experiment:
- disable-adaption-rate.patch
- disable-AUMDF-weight-constraints-when-adaption-rate=0.patch (hah, no patch yet).
- call, initialize the filter by using impulses (i.e. ticking the microphone and humming something). adapt_rate=0 until a clear silent line is available, and only then do I start training.
- save the echo-state
---
- load the echo-state
- adapt_rate = 0 (so the AUMDF weight constraints are also disabled)
- call, record, and upload the samples.

What happens during the recording: I talk to my colleague, and somewhere during the conversation we switch places. Eh, yes, there is a big system humming near me.
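The save and load steps of the experiment above are not shown in the thread, and nothing here says Speex offers functions for them. As a rough sketch of the idea, assuming a toy state struct that holds only filter weights and a rate (the names `ToyEchoState`, `toy_state_save` and `toy_state_load` are hypothetical), a trained state could be dumped to a file and reloaded with adaptation frozen:

```c
#include <assert.h>
#include <stdio.h>

#define TOY_TAPS 8

/* Hypothetical stand-in for a trained echo state; not a Speex type. */
typedef struct {
    float w[TOY_TAPS];   /* trained filter weights */
    float adapt_rate;    /* adaptation rate in use when saved */
} ToyEchoState;

/* Write the state to path as a raw binary dump. Returns 0 on success. */
int toy_state_save(const ToyEchoState *st, const char *path)
{
    FILE *f;
    size_t written;
    assert(st && path);
    f = fopen(path, "wb");
    if (!f) return -1;
    written = fwrite(st, sizeof *st, 1, f);
    if (fclose(f) != 0 || written != 1) return -1;
    return 0;
}

/* Read a state written by toy_state_save and freeze adaptation, as in
 * the second half of the experiment. Returns 0 on success. */
int toy_state_load(ToyEchoState *st, const char *path)
{
    FILE *f;
    size_t got;
    assert(st && path);
    f = fopen(path, "rb");
    if (!f) return -1;
    got = fread(st, sizeof *st, 1, f);
    fclose(f);
    if (got != 1) return -1;
    st->adapt_rate = 0.0f;   /* keep the loaded weights fixed */
    return 0;
}
```

A raw struct dump like this is only readable on the same kind of machine and build that wrote it, which is fine for a quick experiment but not for exchanging states between systems.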