thr3ads.net - Speex dev - [Speex-dev] Acoustic echo cancellation [Apr 2011]

Li Maoquan

2011-Apr-21 00:35 UTC

[Speex-dev] Acoustic echo cancellation

Simply put, in a quiet room you can play an impulse signal and then find its
impulse response in the microphone signal. For example, if the delay between
the impulse and its response ranges from 500 to 3000 sample cycles, you can
delay (buffer) the far-end signal by 0 to 300 cycles and set the filter length
to 4000. This is what is meant by aligning the far-end signal and the
near-end signal.
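
The impulse measurement described above can be sketched in C. This is only a minimal illustration, not something posted in the thread: the function name `find_echo_onset` and the fixed amplitude threshold are assumptions, and it presumes the capture buffer starts at the same sample index at which the impulse was handed to playback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Return the index of the first captured sample whose magnitude reaches
 * `threshold`, or -1 if no sample does. If capture starts at the moment the
 * impulse is queued for playback, the index is the delay in sample cycles.
 * (Hypothetical helper; the threshold must sit above the room's noise floor.) */
long find_echo_onset(const int16_t *mic, size_t len, int threshold)
{
    assert(mic != NULL && threshold > 0);
    for (size_t i = 0; i < len; i++) {
        if (abs((int)mic[i]) >= threshold)
            return (long)i;
    }
    return -1;
}
```

Repeating the measurement several times and keeping the smallest and largest onset gives the delay range that the buffering and filter length have to cover.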

BTW: Speex AEC is sensitive to a mismatch between the capture and playback
sample rates, and most low-cost computer sound cards have this problem.
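
To get a feel for why a rate mismatch matters, the accumulated drift can be computed directly. The sketch below is illustrative only: `drift_samples` is a hypothetical helper, and the 100 ppm figure used in the note after it is an assumed value, not one measured in this thread.

```c
#include <assert.h>

/* Samples of relative drift accumulated after `seconds` when the capture
 * clock runs `ppm` parts per million faster than the playback clock.
 * (Hypothetical helper for illustration.) */
double drift_samples(double nominal_rate_hz, double ppm, double seconds)
{
    assert(nominal_rate_hz > 0.0 && seconds >= 0.0);
    return nominal_rate_hz * (ppm / 1e6) * seconds;
}
```

At 8000 Hz, an assumed mismatch of 100 ppm accumulates about 48 samples of drift per minute, so a fixed alignment slowly goes stale unless it is tracked.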



At 2011-04-21 03:00:01, speex-dev-request at xiph.org wrote:
>> >>>
>> >>> I have a scenario in a mobile VoIP app that requires echo cancellation
>> >>> but is somewhat different from what's described in the docs.
>> >>>
>> >>> Audio is received from and sent to the network at 8000 Hz. Each packet
>> >>> contains 160 samples, worth 20 ms of playback.
>> >>>
>> >>> But the hardware requires aggregation for both playback and capture.
>> >>> So for playback, I coalesce 4 packets in a buffer and queue them as a
>> >>> larger buffer for playback.
>> >>> On the send side, I read a large buffer (worth 4 packets) and send
>> >>> them out over time, 20 ms apart.
>> >>>
>> >>> I tried using speex_echo_playback just when a 160-sample packet
>> >>> arrives from the network, before coalescing, and speex_echo_capture
>> >>> just before a packet is sent out to the network, but that doesn't seem
>> >>> to work properly (doesn't cancel any echo).
>> >>
>> >> The most likely reason is that you didn't align the far-end and
>> >> near-end samples, so the filter cannot converge.
>> >
>> >Thanks for your response. Can you please explain what you mean by
>> >align samples from near-end and far-end? And how is that usually
>> >accomplished?
>>
>> You need to know the total delay caused by the DAC buffer before the
>> speaker, the ADC buffer after the microphone, and the acoustic path
>> between speaker and microphone. Simply put, if you play an impulse signal
>> and its first echo appears after N sample cycles, N is the delay between
>> y (the echo in the near-end signal) and x (the far-end signal). You can
>> then buffer the far-end signal for N-M cycles before sending it to the AEC.
>> M is a small number (such as 100) that avoids filter failure when the
>> echo path drifts.
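
Buffering the far-end signal for N-M cycles amounts to a fixed delay line. The sketch below is one possible way to do it with sample granularity, so the delay does not have to be a multiple of the frame size. The `DelayLine` type and its functions are hypothetical names, not part of the Speex API; the delayed frame it returns is what would be handed to the echo canceller as the playback reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Fixed delay line: each sample written comes back out `delay` writes later.
 * (Hypothetical helper type, not part of Speex.) */
typedef struct {
    int16_t *buf;
    size_t   delay;   /* delay in samples, e.g. N - M */
    size_t   pos;     /* next slot to read and then overwrite */
} DelayLine;

/* Returns 0 on success, -1 if memory could not be allocated. */
int delay_line_init(DelayLine *d, size_t delay)
{
    assert(d != NULL && delay > 0);
    d->buf = calloc(delay, sizeof *d->buf);   /* zero-filled: starts silent */
    d->delay = delay;
    d->pos = 0;
    return d->buf != NULL ? 0 : -1;
}

/* Push `n` far-end samples and receive the samples pushed `delay` earlier.
 * `in` and `out` may point to the same buffer. */
void delay_line_process(DelayLine *d, const int16_t *in, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int16_t newest = in[i];
        out[i] = d->buf[d->pos];      /* oldest stored sample */
        d->buf[d->pos] = newest;      /* replace it with the newest one */
        d->pos = (d->pos + 1) % d->delay;
    }
}

void delay_line_free(DelayLine *d)
{
    free(d->buf);
    d->buf = NULL;
}
```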
>>
>>
>Thanks again. I am trying to model the delay between the near and far end
>signals using a circular queue of length N. Every time a frame is received
>and queued for playback, it is also entered into the queue. Each frame
>read from the mic is echo-cancelled (speex_echo_cancellation) using the
>oldest frame in the queue if the queue is filled up, so I am cancelling
>the recorded frame using a playback frame that is N frames old.
>
>I have played with different values of N from 2 to 50 (320 samples to 8000
>samples), attempting to align the input and output, but the cancellation
>doesn't seem to work. The echo is as steady as ever.
>
>Is this model correct and expected to converge with the right value of "N"?
>Or do I need some other adaptation to account for drift here? Right now,
>it's a black box for me. I am not sure how to get some feedback from this
>system to tune the AEC (and the delay parameters) correctly.
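
One common way to get feedback from an echo canceller, not mentioned in the thread itself, is to compare the power of the microphone frame with the power of the canceller's output while only the far end is talking. Expressed in dB (10 times the base-10 logarithm of this ratio), the figure is usually called ERLE. The sketch below is a minimal illustration under that assumption; `mean_power` and `echo_power_ratio` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mean power of a block of 16-bit samples. */
static double mean_power(const int16_t *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)x[i] * (double)x[i];
    return acc / (double)n;
}

/* Ratio of microphone power to residual power for one block.
 * About 1.0 means nothing is being cancelled; larger is better.
 * Only meaningful while the near end is silent (far-end single talk).
 * (Hypothetical helper, not part of Speex.) */
double echo_power_ratio(const int16_t *mic, const int16_t *out, size_t n)
{
    assert(mic != NULL && out != NULL && n > 0);
    const double eps = 1e-9;   /* guards against division by zero */
    return (mean_power(mic, n) + eps) / (mean_power(out, n) + eps);
}
```

Logging this ratio per frame while sweeping the delay gives a number to tune against instead of listening for the echo.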
>
>Also, I did not follow the use of "M" in your description above and how it
>helps with drift. My queue stores frames (160 samples each), so a value of
>100 samples seems too small.
>
>Btw, I am assuming that the Speex AEC API can be used even though I am not
>using the Speex encoder/decoder.
>
>
>
>> >>
>> >>> So, in this scenario above, please recommend a good place to insert
>> >>> speex_echo_playback and speex_echo_capture. Should it be just before
>> >>> the read and write to hardware? In that case, should I use a larger
>> >>> "frame size" of 160 samples x 4?
>> >>
>> >> Of course you can set the frame size to 160*4. Otherwise you can feed
>> >> samples to the AEC 4 times if you don't want to modify the frame size.
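
Feeding the samples four times, as suggested above, can be sketched as a loop over the hardware buffer. This is an illustration only: `for_each_frame`, the `frame_fn` callback type, and `count_samples` are hypothetical names, and the callback stands in for wherever the application would call the echo canceller on one 160-sample frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-frame callback; a real application would run the echo
 * canceller on `frame` here. */
typedef void (*frame_fn)(const int16_t *frame, size_t frame_size, void *user);

/* Split one hardware buffer into consecutive frames and hand each to `fn`.
 * Returns the number of frames processed. */
size_t for_each_frame(const int16_t *buf, size_t total, size_t frame_size,
                      frame_fn fn, void *user)
{
    assert(buf != NULL && fn != NULL && frame_size > 0);
    assert(total % frame_size == 0);   /* e.g. 640 = 4 * 160 */
    size_t count = total / frame_size;
    for (size_t i = 0; i < count; i++)
        fn(buf + i * frame_size, frame_size, user);
    return count;
}

/* Example callback: adds the frame's sample count to a size_t total. */
void count_samples(const int16_t *frame, size_t frame_size, void *user)
{
    (void)frame;
    *(size_t *)user += frame_size;
}
```

With the 640-sample hardware buffer from this thread and a frame size of 160, the callback runs four times per buffer.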
>> >>
>> >>>
>> >> Thanks in advance,
>> >> Daniel.

Yue Meng Chen

2011-Apr-21 01:38 UTC

[Speex-dev] Acoustic echo cancellation

Assuming a quiet room is unrealistic in practice. Also, the impulse response
might change in the middle of a call, so you basically need to find a good
initial alignment point and then track it along the way. Making this reliable
is very much the key.
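
One technique that does not depend on a quiet room or a test impulse is to cross-correlate the far-end signal with the microphone signal and take the lag with the strongest match. Nobody in the thread proposes this specifically; the sketch below is a plain, unnormalized cross-correlation offered as an assumption about how an initial alignment point could be found. `estimate_delay` is a hypothetical name, and re-running it periodically on recent audio is one simple way to track changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the lag (0..max_lag, in samples) at which the microphone signal
 * correlates most strongly with the far-end signal. Both arrays hold `len`
 * samples that start at the same time index.
 * (Hypothetical helper; cost is O(len * max_lag).) */
size_t estimate_delay(const int16_t *far_end, const int16_t *mic,
                      size_t len, size_t max_lag)
{
    assert(far_end != NULL && mic != NULL && max_lag < len);
    size_t best_lag = 0;
    double best = -1.0;
    for (size_t lag = 0; lag <= max_lag; lag++) {
        double acc = 0.0;
        for (size_t i = 0; i + lag < len; i++)
            acc += (double)far_end[i] * (double)mic[i + lag];
        if (acc < 0.0)
            acc = -acc;               /* the echo may be phase-inverted */
        if (acc > best) {
            best = acc;
            best_lag = lag;
        }
    }
    return best_lag;
}
```

The estimate is only trustworthy on stretches where the far end is active and louder than the near-end speech.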

 


Daniel K

2011-Apr-21 08:20 UTC

[Speex-dev] Acoustic echo cancellation

2011/4/20 Li Maoquan <limaoquan2000 at 126.com>
> Simply to say, in a quiet room, you can play an impulse signal and then
> find its impulse response signal from the microphone. For example, if the
> delay between the impulse signal and its response signal range from 500 to
> 3000 cycles, you can buffer the far-end signal to 0-300 cycles and set the
> filter length to 4000. It is also called to align far-end signal and
> near-end signal.
>
> BTW: Speex AEC is sensitive to mismatch between sample rates of capturing
> and rendering. But most low-cost computer soundcards have this problem.

Based on your explanation above, I have attempted to measure the delay
between the near-end and far-end signals. I send in a sharp impulse sound,
have it play on the speaker, and see when it comes out of the mic. I am
looking at peak amplitudes in each frame (160 samples) to identify the
impulse sound.

The time from when the sound enters the stack (before it is queued with the
speaker) to the time when it comes out of the mic, ready to go out, varies
between 0.44 and 0.57 seconds. So I queue the far-end samples for 0.4
seconds and then send them to speex_echo_playback. Also, I am using a
filter length of 0.2 seconds (1600 samples). Each outgoing (near-end) frame
coming from the mic is passed through speex_echo_capture to cancel the echo,
and the output is sent out of the device.
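
The numbers in this paragraph can be checked with a little arithmetic at 8000 Hz: 0.44 s is 3520 samples, 0.57 s is 4560 samples, and 0.4 s is 3200 samples. The sketch below is only a sanity check under those figures; `AlignmentBudget` and `alignment_budget` are hypothetical names.

```c
#include <assert.h>

/* All values are in samples at the stream's sample rate. */
typedef struct {
    long residual_min;   /* smallest delay left after buffering */
    long residual_max;   /* largest delay left after buffering */
    long tail_margin;    /* filter taps left beyond residual_max */
} AlignmentBudget;

/* Work out how much delay the adaptive filter still has to absorb once the
 * far-end signal has been buffered. (Hypothetical helper.) */
AlignmentBudget alignment_budget(long delay_min, long delay_max,
                                 long buffered, long filter_length)
{
    assert(delay_min <= delay_max && filter_length > 0);
    AlignmentBudget b;
    b.residual_min = delay_min - buffered;
    b.residual_max = delay_max - buffered;
    b.tail_margin  = filter_length - b.residual_max;
    return b;
}
```

With 3520 to 4560 samples measured, 3200 buffered, and 1600 taps, the residual delay is 320 to 1360 samples, which leaves 240 taps (30 ms) for the echo tail in the worst case. A negative `residual_min` would mean the far-end signal is delayed past the echo, a situation the filter cannot model.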

Despite this, the cancellation does not seem to work. I have also tried
using speex_echo_cancellation. That doesn't help either.

What could I be doing wrong? Any suggestions?
Also, I am not using the speex encoder/decoder, just the echo cancellation.
I hope that's not an issue. Please correct me if it is.

Thanks for helping me with this!





Andras Kadinger

2011-May-01 12:03 UTC

[Speex-dev] Acoustic echo cancellation

Daniel,

I recommend that you start from a simple case and gradually progress
towards your goal.

Can you make things work with the "Speex in a Disco" (Example 6)
testcase at http://ns.surfnonstop.com/~bandit/speex/echocard1/ ?

These files were captured via a stereo soundcard, so the channel sample
clocks are exactly synchronized, making for a simple baseline testcase
for AEC.

Download the uncompressed, unaltered far-end and near-end files there,
pretend in your code that they are your inputs and outputs, and make sure
your results with them are about as good as in the example. If not, you
probably have a problem in your code that you need to fix first, before
worrying about unsynced sample clocks.
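
An offline test of this kind boils down to reading both recordings frame by frame in lockstep and passing each pair to the echo canceller. The sketch below shows only the reading part and rests on an assumption: that the downloaded files have been converted to headerless 16-bit mono PCM in the machine's native byte order. The actual format of the files on that page is not stated in this thread. `read_frame_pair` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Read one frame from each stream. Returns 1 if both frames were read in
 * full, 0 at the end of either file (a trailing partial frame is dropped).
 * Assumes headerless 16-bit PCM in native byte order. */
int read_frame_pair(FILE *far_fp, FILE *near_fp,
                    int16_t *far_frame, int16_t *near_frame, size_t frame_size)
{
    assert(far_fp != NULL && near_fp != NULL && frame_size > 0);
    size_t got_far  = fread(far_frame,  sizeof(int16_t), frame_size, far_fp);
    size_t got_near = fread(near_frame, sizeof(int16_t), frame_size, near_fp);
    return got_far == frame_size && got_near == frame_size;
}
```

Calling this in a `while` loop with a frame size of 160 yields the same far-end/near-end frame pairs on every run, which makes results comparable between code changes.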

Andras

