Martijn van Beurden
2014-Aug-01 17:51 UTC
[flac-dev] Fix and question apodization functions
Hi,

I was doing some speed and compression comparisons with various apodization/windowing functions, and found that the definitions of the bartlett and bartlett_hann windows in the FLAC codebase have been wrong since their introduction. The attached patch fixes that.

Furthermore, I found some peculiar behaviour of the gauss apodization that seems to expose a bug. Using a different window does not usually change the encoding speed (only the number of windows does), but when I use -A gauss(0,05) (or gauss(0.05); it's locale-specific), the encoding time doubles. This seems to happen for STDDEV parameters between 0.01 and 0.1. It is also the case with FLAC 1.3.0.

I used gprof to check, and it seems that the time spent in the function FLAC__lpc_compute_autocorrelation_intrin_sse_lag_16 increases 20-fold while the number of calls stays the same. Could this be a bug?

[Attachment scrubbed by the list archive: 0001-Fix-apodization-functions.patch, type text/x-patch, size 0 bytes, http://lists.xiph.org/pipermail/flac-dev/attachments/20140801/5efdce17/attachment.bin]
Martijn van Beurden wrote:

> Hi,
>
> I was doing some speed and compression comparisons with various
> apodization/windowing functions, and found out that the
> definitions for the bartlett and bartlett_hann window in the
> FLAC codebase have been wrong since their introduction. The
> attached patch fixes that.

How does it affect the compression ratio?

> Furthermore, I found some peculiar behaviour of the gauss
> apodization that seems to expose a bug. Using different windows
> does usually not change the encoding speed (only the number of
> them), but when I use -A gauss(0,05) (or gauss(0.05), it's
> locale-specific), the encoding time doubles. It seems to be when
> using STDDEV parameters between 0.1 and 0.01. This is also the
> case when using FLAC 1.3.0.
>
> I used gprof to check and it seems that the time spent in the
> function FLAC__lpc_compute_autocorrelation_intrin_sse_lag_16
> increases 20-fold while the number of calls stays the same.
> Could this be a bug?

<http://en.wikipedia.org/wiki/Denormal_number#Performance_issues> ?

Try something like this:

    void FLAC__window_gauss(FLAC__real *window, const FLAC__int32 L, const FLAC__real stddev)
    {
        static const double anti_denormal = 0.88817841970012523233890533447266e-15; /* 2^-50 */
        const FLAC__int32 N = L - 1;
        const double N2 = (double)N / 2.;
        FLAC__int32 n;

        for (n = 0; n <= N; n++) {
            const double k = ((double)n - N2) / (stddev * N2);
            window[n] = (FLAC__real)exp(-0.5f * k * k);
            /* adding and then subtracting a small constant rounds tiny values to zero */
            window[n] += anti_denormal;
            window[n] -= anti_denormal;
        }
    }
lvqcl wrote:

> Try something like this:
>
> void FLAC__window_gauss(FLAC__real *window, const FLAC__int32 L, const FLAC__real stddev)
> {
>     static const double anti_denormal = 0.88817841970012523233890533447266e-15; /* 2^-50 */
>     const FLAC__int32 N = L - 1;

Sorry, it was copy-pasted from another program where I use doubles. But window[] has FLAC__real (== float) type, so anti_denormal should also be FLAC__real for performance reasons.

Also, the value of anti_denormal may not be optimal; I currently don't remember why I decided to use this value. According to <http://musicdsp.org/files/denormal.pdf>, 3.1.2:

"For example if the whole calculation is done in the FPU registers, a 80-bit arithmetic may be used, with 64-bit mantissas. The anti_denormal value should therefore be 2^64 times higher than FLT_MIN."
Martijn van Beurden
2014-Aug-02 06:47 UTC
[flac-dev] Fix and question apodization functions
On 02-08-14 at 05:58, lvqcl wrote:

> How does it affect the compression ratio?

It does not. These apodization functions are only used when requested by the user, and according to my measurements they are not the best choice (i.e., there are window functions that perform better on most material). Still, as they are in the codebase, I thought fixing them would be a good idea.

> <http://en.wikipedia.org/wiki/Denormal_number#Performance_issues> ?

Thanks for the lead! I'll take a look. I could also zero the function below a certain limit; it looks like that solves it too. Not that it is very important: those windows don't seem to perform very well anyway.