>> Yes, I plan to use it in a VoIP environment if I can get latency reduced to
>> an acceptable level :)
>> The latency depends directly on the overlap parameter, which also controls
>> the quality. Higher quality => higher latency. You could set the overlap to
>> 0, but that would give you some nasty artifacts.
>> You can also resample with smaller block sizes. In the example I used 20ms
>> blocks and 50% overlap. If you use 10ms blocks and 50% overlap, latency
>> sinks to 5ms.
>
> How quality and CPU usage depends from block size?
> Looking at edge case - 1 sample blocks - I should have <1 sample latency,
> but something should be wrong here, e.g. required CPU power may be
> infinite, no? :)

Higher block size and more overlap => higher CPU. The idea is to let the
block size be whatever block size you use for the rest of your program. For
Speex, 20ms is perfect.
Lower overlap will give less CPU requirements, but also introduce artifacts
in the resampling.

>> It could be 50% overlap is complete overkill. It could be it's not enough.
>> It could be how much overlap you need depends on the block size. I need to
>> do some quality testing before I can say for sure :)
>
> *nod*
> Also comparison with standard resampler would be very useful. Probably
> the best way would be to take standard resampler as a reference
> and see how much latency/CPU is needed to reach each of its complexity
> levels. This would give clear idea of what's going, even for people
> who unaware of implementation details.

That's the idea, I just haven't gotten that far :)

>>> What is input and output latency? As a user, I think there is only one
>>> latency,
>>> latency between data I passed to resampler and data I've got from it.
>>> I suppose there may be some internal idea behind this division of latency,
>>> but is end user interested in it?
>>
>> It's copied directly from the speex_resampler.h; it's the same latency, but
>> measured in input and output samples. So if I resample from a blocksize of
>> 320 to 960 with 50% overlap, the input latency is 160 samples and the output
>> latency is 480 samples.
>
> Aha, got it. Documentation is very unclear at this point. =\ It should be
> something like what you've written, e.g. "Latency, measured in samples
> at input (output) sample rate".

I must admit I didn't get the difference either until I checked the source
for the original resampler. I'll see if we can't reword both of them to
make it more clear.
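The latency figures quoted in this message are consistent with latency being the block size times the overlap fraction. The sketch below only reproduces that arithmetic. The function names are hypothetical, and the formula is an assumption inferred from the numbers in the thread, not taken from the resampler source.

```python
from fractions import Fraction

def overlap_latency(block_in, block_out, overlap):
    """Return (input_latency, output_latency) in samples.

    Assumption: latency = block size * overlap fraction, which matches the
    320 -> 960 example with 50% overlap quoted in the thread.
    """
    overlap = Fraction(overlap)
    return int(block_in * overlap), int(block_out * overlap)

def latency_ms(block_ms, overlap):
    """Same latency expressed in milliseconds of audio."""
    return float(block_ms * Fraction(overlap))

if __name__ == "__main__":
    # 320-sample input blocks, 960-sample output blocks, 50% overlap.
    print(overlap_latency(320, 960, Fraction(1, 2)))
    # 10 ms blocks with 50% overlap.
    print(latency_ms(10, Fraction(1, 2)))
```

Both results describe the same delay: 160 samples at the input rate and 480 samples at the output rate cover the same stretch of time when the output rate is three times the input rate.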
On 5/29/08, Thorvald Natvig <thorvald at natvig.com> wrote:
> > > Yes, I plan to use it in a VoIP environment if I can get latency reduced to
> > > an acceptable level :)
> > > The latency depends directly on the overlap parameter, which also controls
> > > the quality. Higher quality => higher latency. You could set the overlap to
> > > 0, but that would give you some nasty artifacts.
> > > You can also resample with smaller block sizes. In the example I used 20ms
> > > blocks and 50% overlap. If you use 10ms blocks and 50% overlap, latency
> > > sinks to 5ms.
> >
> > How quality and CPU usage depends from block size?
> > Looking at edge case - 1 sample blocks - I should have <1 sample latency,
> > but something should be wrong here, e.g. required CPU power may be
> > infinite, no? :)
>
> Higher block size and more overlap => higher CPU. The idea is to let the
> block size be whatever block size you use for the rest of your program. For
> Speex, 20ms is perfect.
> Lower overlap will give less CPU requirements, but also introduce artifacts
> in the resampling.

Do you mean that using 10ms frames with 50% overlap will give you worse
quality than 20ms frames with 50% overlap, like when you're using 25%
overlap? Else, why not work on smaller 10ms frames instead of 20ms
frames, or even 5ms frames - from your words it follows that you'll get
smaller latency and CPU usage in this case.

If I'm correct, it seems better to talk about overlap in terms of
samples/ms than in terms of percents, or it will confuse a lot.

--
Regards,
Alexander Chemeris.

SIPez LLC.
SIP VoIP, IM and Presence Consulting
http://www.SIPez.com
tel: +1 (617) 273-4000
Ok. I did some quality tests.

First off; never do quality tests with ints. I had serious problems
interpreting my results until it dawned on me that the signal differences
were just 0 or 1. So, after a lot of scratching my head, these are done
comparing the result from the _float versions (which is how both resamplers
work internally anyway).

What I did was this:

  Load speex_wb.wav as one large chunk of data.
  Pad data with as many zeroes as there are samples.
  Convert to long double.
  Use one long double FFT for the entire thing.
  Insert or chop off zeroes so the new length is
    (input_length)*(sample_target)/(sample_source)
  Use one long double iFFT for the entire thing.

We'll call the FFT and iFFT of this our reference.

Then, for each resampler below, I've reported the maximum numerical
difference in the time domain (comparing ref[i] with sig[i]) as well as SNR.
Since my knowledge of SNR for this is a bit sketchy, it's computed as
follows:

  Pad resampled signal with as many zeroes as there are samples.
  Convert to long double.
  Use one long double FFT for the entire thing.

Then, for both reference and resampled, let
  power[i] = sqrt(real[i]^2 + imag[i]^2)
We only care about the lower half of this power (remember we padded with
zeroes). Then, let
  SNR = sum[all i] abs(ref_power[i] / resamp_power[i] - 1.0)

I.e., SNR = 0 is a perfect signal. Everything else means the signal deviates.

There are 3 SNR values posted below. The first value is the 0->4kHz range
(which for 48kHz output means the lower 1/6th of the power spectrum). The
second is the 0->8kHz range (full original signal), and the last is the full
range. The reason I split it is that the filter-based resampler has a cutoff
filter, so it zeroes out frequencies near the Nyquist. So the SNR is unfair
for the 0->8 range.

Anyway, on to the results.

First, a 16=>16 resampling.

  Filt Q10: Diff 0.883327,   SNR 3.12531e-07 / 0.472589
  FFT 320:  Diff 0.00292969, SNR 2.57974e-07 / 4.77473e-05

Both resamplers will recreate the original samples. The filter based does
limit the upper part of the signal.

Both resamplers deal fine with 16=>48, so let's skip directly to 16=>44.1:

  Filt Q0:  2.57e-03 3.15e-01 7.51e-01
  Filt Q1:  2.12e-04 4.29e-01 7.93e-01
  Filt Q2:  1.33e-04 2.92e-01 7.43e-01
  Filt Q3:  2.20e-05 9.20e-01 9.71e-01
  Filt Q4:  1.96e-05
  Filt Q5:  9.61e-06
  FFT+0:    3.83e-02 1.91e-01 7.06e-01 (And you can clearly hear this)
  FFT+16:   8.10e-03 6.18e-02 6.60e-01 (violates the resampler requirements
                                        and shifts frequencies slightly)
  FFT+160:  1.14e-05 3.75e-03 6.39e-01 (shortest allowed overlap)

So, FFT160 is somewhere between Q4 and Q5. And it's 6 times faster than Q4.

Testing with twice the block and overlap length:

  FFT 640/320: 1.13e-05 3.49e-03 6.38e-01

erm. Hm. Need more testing on that one, I think.

Moving to 16=>48, let's examine different block and overlap lengths:

  160+16:   1.20e-05 3.55e-02 6.78e-01
  160+80:   1.20e-05 4.02e-03 6.68e-01
  320+32:   1.19e-05 7.97e-03 6.69e-01
  320+160:  1.19e-05 3.82e-03 6.68e-01
  320+320:  1.19e-05 3.60e-03 6.68e-01

While this is not nearly enough data, it seems longer overlap reduces
artifacts in the higher frequencies.

Finally, I do a 48=>44.1 test. (FFT base is now 960).

  Filt Q9:  1.58e-05
  FFT+160:  5.24e-07
  FFT+320:  5.11e-07
  FFT+480:  5.20e-07

Not bad, and for 48=>44.1, FFT+160 runs three times faster than Q0 (and ~50
times faster than Q9 ;))

So if you can survive a bit of latency, this should give you decent results
very quickly. (Do remember that latency though, it's not insignificant).

There's still work to be done looking at aliasing; I'll need to use a signal
with frequencies closer to the Nyquist to do that.
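For anyone who wants to try the measurement procedure described above on a short test signal, here is a minimal sketch in plain Python. It is not the code used for the numbers in this message. The function names are hypothetical, and it uses a naive O(n^2) DFT in double precision instead of a long double FFT. Several details are assumptions: the zeroes are inserted (or bins chopped) in the middle of the spectrum, the Nyquist bin is dropped for simplicity, and bins with zero magnitude are skipped to avoid dividing by zero.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) DFT; only meant for short illustrative signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            # Reduce k*t modulo n to keep the phase argument small.
            acc += x[t] * cmath.exp(sign * 2j * math.pi * ((k * t) % n) / n)
        out.append(acc / n if inverse else acc)
    return out

def reference_resample(samples, rate_in, rate_out):
    """One big transform, resize the spectrum, one big inverse transform."""
    padded = list(samples) + [0.0] * len(samples)  # pad with as many zeroes
    n = len(padded)
    m = n * rate_out // rate_in                    # new length
    spec = dft(padded)
    # Assumption: keep the bins strictly below the lower Nyquist frequency;
    # the Nyquist bin itself is dropped to keep the sketch short.
    half = (min(n, m) - 1) // 2
    new_spec = [0j] * m
    new_spec[0] = spec[0]
    for k in range(1, half + 1):
        new_spec[k] = spec[k]          # positive frequencies
        new_spec[m - k] = spec[n - k]  # negative frequencies
    scale = m / n                      # keep the amplitude unchanged
    return [(v * scale).real for v in dft(new_spec, inverse=True)]

def spectral_deviation(reference, resampled, band_fraction=1.0):
    """sum_i abs(ref_power[i] / resamp_power[i] - 1.0) over the lower half
    of the spectrum; band_fraction selects the lowest part of that half."""
    def magnitudes(sig):
        padded = list(sig) + [0.0] * len(sig)
        return [abs(c) for c in dft(padded)]   # sqrt(real^2 + imag^2)
    ref_mag = magnitudes(reference)
    res_mag = magnitudes(resampled)
    half = min(len(ref_mag), len(res_mag)) // 2
    hi = int(half * band_fraction)
    # Guard (not in the original description): skip empty bins.
    return sum(abs(ref_mag[i] / res_mag[i] - 1.0)
               for i in range(hi) if res_mag[i] > 0.0)

if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * t / 8) for t in range(16)]
    ref = reference_resample(tone, 16000, 48000)
    print(len(ref), spectral_deviation(ref, ref, band_fraction=1 / 6))
```

With `band_fraction=1/6` on a 48kHz output the sum covers roughly the 0->4kHz range, and with `1/3` the 0->8kHz range. Note that the returned signal still includes the resampled zero padding.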