thr3ads.net - Speex dev - [Speex-dev] FFT Resampler [May 2008]


Thorvald Natvig

2008-May-29 11:41 UTC

[Speex-dev] FFT Resampler

>>  Yes, I plan to use it in a VoIP environment if I can get latency reduced
>> to an acceptable level :)
>>  The latency depends directly on the overlap parameter, which also controls
>> the quality. Higher quality => higher latency. You could set the overlap to
>> 0, but that would give you some nasty artifacts.
>>  You can also resample with smaller block sizes. In the example I used 20ms
>> blocks and 50% overlap. If you use 10ms blocks and 50% overlap, latency
>> sinks to 5ms.
>
> How do quality and CPU usage depend on block size?
> Looking at the edge case of 1-sample blocks, I should have <1 sample
> latency, but something must be wrong here, e.g. the required CPU power
> may be infinite, no? :)

Higher block size and more overlap => higher CPU. The idea is to let the
block size be whatever block size you use for the rest of your program.
For Speex, 20ms is perfect.
Lower overlap reduces the CPU requirements, but also introduces
artifacts in the resampling.

>>  It could be that 50% overlap is complete overkill. It could be that it's
>> not enough. It could be that how much overlap you need depends on the
>> block size. I need to do some quality testing before I can say for sure :)
>
> *nod*
> Also, a comparison with the standard resampler would be very useful.
> Probably the best way would be to take the standard resampler as a
> reference and see how much latency/CPU is needed to reach each of its
> complexity levels. This would give a clear idea of what's going on, even
> for people who are unaware of the implementation details.

That's the idea, I just haven't gotten that far :)

>>> What is input and output latency? As a user, I think there is only one
>>> latency: the latency between the data I passed to the resampler and the
>>> data I got from it.
>>> I suppose there may be some internal idea behind this division of
>>> latency, but is the end user interested in it?
>>
>>  It's copied directly from speex_resampler.h; it's the same latency, but
>> measured in input and output samples. So if I resample from a block size
>> of 320 to 960 with 50% overlap, the input latency is 160 samples and the
>> output latency is 480 samples.
>
> Aha, got it. The documentation is very unclear at this point. =\ It should
> say something like what you've written, e.g. "Latency, measured in samples
> at the input (output) sample rate".

I must admit I didn't get the difference either until I checked the
source for the original resampler. I'll see if we can't reword both of
them to make it clearer.
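
The latency arithmetic in this exchange can be sketched in a few lines. This is a minimal sketch, assuming (as the numbers quoted in the thread suggest) that the latency equals the overlap length: 50% of a 320-sample block is 160 input samples, which is 10ms at 16kHz and 480 samples at 48kHz. `overlap_latency` is a hypothetical helper for illustration, not a function from speex_resampler.h.

```python
def overlap_latency(block_in, overlap_pct, rate_in, rate_out):
    """Latency of a block/overlap resampler, assuming latency == overlap length.

    block_in    -- block size in input samples (e.g. 320 = 20ms at 16kHz)
    overlap_pct -- overlap as a percentage of the block (e.g. 50)
    Returns (input_latency_samples, output_latency_samples, latency_ms).
    """
    latency_in = block_in * overlap_pct / 100        # overlap, in input samples
    latency_out = latency_in * rate_out / rate_in    # same delay, in output samples
    latency_ms = 1000 * latency_in / rate_in         # same delay, in milliseconds
    return latency_in, latency_out, latency_ms


# 20ms blocks (320 samples at 16kHz), 50% overlap, resampling to 48kHz
print(overlap_latency(320, 50, 16000, 48000))  # (160.0, 480.0, 10.0)
# 10ms blocks (160 samples at 16kHz), 50% overlap
print(overlap_latency(160, 50, 16000, 48000))  # (80.0, 240.0, 5.0)
```

Expressing the result in both input and output samples mirrors the input/output latency split discussed above: it is one delay, just counted at two sample rates.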

Alexander Chemeris

2008-May-29 12:35 UTC


[Speex-dev] FFT Resampler

On 5/29/08, Thorvald Natvig <thorvald at natvig.com> wrote:
> > >  Yes, I plan to use it in a VoIP environment if I can get latency
> > > reduced to an acceptable level :)
> > >  The latency depends directly on the overlap parameter, which also
> > > controls the quality. Higher quality => higher latency. You could set
> > > the overlap to 0, but that would give you some nasty artifacts.
> > >  You can also resample with smaller block sizes. In the example I used
> > > 20ms blocks and 50% overlap. If you use 10ms blocks and 50% overlap,
> > > latency sinks to 5ms.
> >
> > How do quality and CPU usage depend on block size?
> > Looking at the edge case of 1-sample blocks, I should have <1 sample
> > latency, but something must be wrong here, e.g. the required CPU power
> > may be infinite, no? :)
> >
>  Higher block size and more overlap => higher CPU. The idea is to let the
> block size be whatever block size you use for the rest of your program.
> For Speex, 20ms is perfect.
>  Lower overlap reduces the CPU requirements, but also introduces
> artifacts in the resampling.

Do you mean that using 10ms frames with 50% overlap will give you worse
quality than 20ms frames with 50% overlap, like when you're using 25%
overlap? Otherwise, why not work on smaller 10ms frames instead of 20ms
frames, or even 5ms frames? From your words it follows that you'll get
lower latency and CPU usage in this case.
If I'm correct, it seems better to talk about overlap in terms of samples/ms
than in terms of percent, or it will be very confusing.

-- 
Regards,
Alexander Chemeris.

SIPez LLC.
SIP VoIP, IM and Presence Consulting
http://www.SIPez.com
tel: +1 (617) 273-4000

Thorvald Natvig

2008-May-29 23:18 UTC


[Speex-dev] FFT Resampler

Ok. I did some quality tests.

First off: never do quality tests with ints. I had serious problems
interpreting my results until it dawned on me that the signal
differences were just 0 or 1. So, after a lot of head-scratching,
these tests compare the results from the _float versions (which is
how both resamplers work internally anyway).

What I did was this:
Load speex_wb.wav as one large chunk of data.
Pad the data with as many zeroes as there are samples.
Convert to long double.
Use one long double FFT for the entire thing.
Insert or chop off zeroes so the new length is (input_length)*(sample_target)/(sample_source).
Use one long double iFFT for the entire thing.
We'll call the result of this FFT/iFFT our reference.
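
The reference procedure above can be sketched as follows. This is a minimal sketch under several assumptions: Python floats are double precision (not long double), a naive O(N^2) DFT stands in for the FFT (slow, but fine for illustration), the new spectrum length is computed from the zero-padded length and rounded, and the handling of the shared Nyquist bin (split when upsampling, folded when downsampling) is my own choice, not something described in the post. `dft` and `fft_resample_reference` are hypothetical names.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def fft_resample_reference(samples, rate_in, rate_out):
    """Resample a whole signal in one go by resizing its spectrum."""
    n_in = len(samples)
    padded = [float(s) for s in samples] + [0.0] * n_in  # pad with as many zeroes as samples
    n = len(padded)
    m = round(n * rate_out / rate_in)                   # new spectrum length
    spec = dft(padded)

    # Keep the frequencies both lengths share; zeroes are inserted (or
    # chopped off) in the middle, i.e. at the high frequencies.
    new_spec = [0j] * m
    common = min(n, m)
    half = common // 2
    keep = half if common % 2 == 0 else half + 1
    for k in range(keep):
        new_spec[k] = spec[k]              # non-negative frequencies
    for k in range(1, keep):
        new_spec[m - k] = spec[n - k]      # negative frequencies
    if common % 2 == 0:                    # the shared Nyquist bin needs care
        if m > n:                          # upsampling: split it between +/- bins
            new_spec[half] = spec[half] / 2
            new_spec[m - half] = spec[half] / 2
        elif m < n:                        # downsampling: fold +/- into one bin
            new_spec[half] = spec[half] + spec[n - half]
        else:
            new_spec[half] = spec[half]

    out = dft(new_spec, inverse=True)
    scale = m / n                          # keep amplitudes unchanged
    out_len = round(n_in * rate_out / rate_in)
    return [scale * v.real for v in out[:out_len]]
```

For an integer upsampling factor this is exact periodic interpolation, so every second output of an 8000=>16000 run reproduces the corresponding input sample; a 16=>16 run returns the input unchanged.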

Then, for each resampler below, I've reported the maximum numerical
difference in the time domain (comparing ref[i] with sig[i]) as well as
the SNR. Since my knowledge of SNR for this is a bit sketchy, here is how
it's computed:

Pad the resampled signal with as many zeroes as there are samples.
Convert to long double.
Use one long double FFT for the entire thing.

Then, for both the reference and the resampled signal, let power[i] =
sqrt(real[i]^2 + imag[i]^2). We only care about the lower half of this
power spectrum (remember that we padded with zeroes).
Then, let SNR = sum[all i] abs(ref_power[i] / resamp_power[i] - 1.0).
I.e., SNR = 0 is a perfect signal; anything else means the signal deviates.
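
Both measurements can be sketched as below. This is a minimal sketch, assuming the reference and resampled signals have the same length; it uses double precision and a naive DFT instead of a long double FFT. Note that sqrt(real^2 + imag^2), which `abs()` computes for a complex number, is strictly the magnitude rather than the power; the sketch keeps the definition from the post. The treatment of bins where the resampled magnitude is exactly zero (a match if the reference is also zero, otherwise infinite deviation) is my assumption, since the post's formula would divide by zero there. `max_abs_diff` and `spectral_deviation` are hypothetical names.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) forward DFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def max_abs_diff(ref, sig):
    """Largest time-domain difference |ref[i] - sig[i]|."""
    return max(abs(r - s) for r, s in zip(ref, sig))


def spectral_deviation(ref, sig, num_bins=None):
    """Sum over bins of |ref_mag[i] / sig_mag[i] - 1|; 0 means a perfect match.

    Both signals are zero-padded to twice their length and only the lower
    half of the spectrum (or its first num_bins bins) is compared.
    """
    if len(ref) != len(sig):
        raise ValueError("signals must have the same length")
    n = len(ref)
    ref_mag = [abs(c) for c in dft([float(v) for v in ref] + [0.0] * n)]
    sig_mag = [abs(c) for c in dft([float(v) for v in sig] + [0.0] * n)]
    if num_bins is None:
        num_bins = n                     # lower half of the 2n padded bins
    total = 0.0
    for k in range(num_bins):
        if sig_mag[k] == 0.0:
            # Assumption: zero vs zero is a match; otherwise the ratio is infinite.
            total += 0.0 if ref_mag[k] == 0.0 else math.inf
        else:
            total += abs(ref_mag[k] / sig_mag[k] - 1.0)
    return total
```

A signal compared against itself scores exactly 0, while a copy scaled by 2 scores 0.5 per bin, which shows that the measure sums relative magnitude errors rather than computing a conventional signal-to-noise ratio.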

There are 3 SNR values posted below. The first value is the 0->4kHz
range (which for 48kHz output means the lower 1/6th of the power
spectrum). The second is the 0->8kHz range (the full original signal), and
the last is the full range.
The reason I split it is that the filter-based resampler has a cutoff
filter, so it zeroes out frequencies near the Nyquist frequency. That makes
the SNR unfair to it for the 0->8kHz range.
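
The "lower 1/6th" figure follows from where the DFT bins sit. Bin k of an n-point spectrum at sample rate `rate` lies at k * rate / n Hz, so the lower half (n/2 bins) spans 0 to rate/2, and 4kHz out of 24kHz is 1/6 of it. This is a minimal sketch; `band_bins` is a hypothetical helper, and the padded length of 1920 (960 samples zero-padded to twice their length) is an arbitrary example value.

```python
def band_bins(n_padded, rate, f_hi):
    """Number of DFT bins covering 0..f_hi Hz in an n_padded-point spectrum.

    Bin k sits at k * rate / n_padded Hz, so the lower half of the
    spectrum (n_padded // 2 bins) spans 0..rate/2.
    """
    return n_padded * f_hi // rate


n_padded = 1920                          # example: 960 samples, zero-padded to 2x
half = n_padded // 2
for f_hi in (4000, 8000, 24000):         # 0->4kHz, 0->8kHz, full range at 48kHz
    print(f_hi, band_bins(n_padded, 48000, f_hi), half)
# prints: 4000 160 960 / 8000 320 960 / 24000 960 960
```

160 of 960 bins is the lower 1/6th mentioned above; the 0->8kHz range covers 1/3 of the lower half.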

Anyway, on to the results.

First, a 16=>16 resampling.
Filt Q10: Diff 0.883327, SNR 3.12531e-07 / 0.472589
FFT 320: Diff 0.00292969, SNR 2.57974e-07 / 4.77473e-05
Both resamplers will recreate the original samples. The filter-based one
does limit the upper part of the signal.

Both resamplers deal fine with 16=>48, so let's skip directly to
16=>44.1 (SNR for 0->4kHz, 0->8kHz and the full range):
Filt Q0: 2.57e-03 3.15e-01 7.51e-01
Filt Q1: 2.12e-04 4.29e-01 7.93e-01
Filt Q2: 1.33e-04 2.92e-01 7.43e-01
Filt Q3: 2.20e-05 9.20e-01 9.71e-01
Filt Q4: 1.96e-05
Filt Q5: 9.61e-06

FFT+0: 3.83e-02 1.91e-01 7.06e-01 (And you can clearly hear this)
FFT+16: 8.10e-03 6.18e-02 6.60e-01 (violates the resampler requirements and shifts frequencies slightly)
FFT+160: 1.14e-05 3.75e-03 6.39e-01 (shortest allowed overlap)

So FFT+160 is somewhere between Q4 and Q5 in quality, and it's 6 times faster than Q4.
Testing with twice the block and overlap length:
FFT 640/320: 1.13e-05 3.49e-03 6.38e-01
erm. Hm. Need more testing on that one, I think.

Moving to 16=>48, let's examine different block and overlap lengths (block+overlap, in samples):
160+16: 1.20e-05 3.55e-02 6.78e-01
160+80: 1.20e-05 4.02e-03 6.68e-01
320+32: 1.19e-05 7.97e-03 6.69e-01
320+160: 1.19e-05 3.82e-03 6.68e-01
320+320: 1.19e-05 3.60e-03 6.68e-01

While this is not nearly enough data, it seems longer overlap reduces 
artifacts in the higher frequencies.

Finally, a 48=>44.1 test (the FFT block size is now 960):
Filt Q9: 1.58e-05
FFT+160: 5.24e-07
FFT+320: 5.11e-07
FFT+480: 5.20e-07
Not bad, and for 48=>44.1, FFT+160 runs three times faster than Q0 (and 
~50 times faster than Q9 ;))

So if you can live with a bit of latency, this should give you decent
results very quickly. (Do remember that latency, though; it's not
insignificant.)
There's still work to be done on aliasing; I'll need to use a
signal with frequencies closer to the Nyquist frequency to do that.
