The file src/share/replaygain_synthesis/include/private/fast_float_math_hack.h
redefines 'tanh' as 'tanhf'. This file is intended for the Intel Compiler only,
but it includes the outdated mathf.h and doesn't work with current versions of ICC.

The fixes are trivial though, and I compiled 2 versions of flac.exe: with this
'hack' turned off and on. The difference in decoding speed is within
measurement error: for 32-bit flac.exe the decoding time decreases from 94.5s
to 94.0s, for 64-bit it increases from 82.6s to 82.9s.
(the option for this test was: --apply-replaygain-which-is-not-lossless=Ln0)

So this hack is really useless today, and the first patch removes
fast_float_math_hack.h from the sources.

The MSVS profiler shows that the tanh calculation doesn't use much CPU time;
the real problem is an integer division (int_64/int_32) in this line:

    val64 = dither_output_(........) / conv_factor;

Since all possible values of conv_factor are powers of 2, it's possible to
replace the division with a shift. The second patch does this.

Decoding time decreases from 94.5s to 64.1s for the 32-bit ICC compile, and
from 82.6s to 50.0s for the 64-bit ICC compile.

*************************************************
P.S. Actually, shift ( x >> n ) and division ( x / (1<<n) ) can give
different results if x < 0: the shift rounds toward negative infinity, while
the division truncates toward zero. The difference is very small though: WAV
files differ by 1 LSB. And probably shift gives better results than division.

Let's compare shift by 2 and division by (1<<2) == 4:

*** shift ***
      argument               result
        ....
    12, 13, 14, 15             ->  3
     8,  9, 10, 11             ->  2
     4,  5,  6,  7             ->  1
     0,  1,  2,  3             ->  0
    -4, -3, -2, -1             -> -1
    -8, -7, -6, -5             -> -2
        ....

*** division ***
      argument               result
        ....
    12, 13, 14, 15             ->  3
     8,  9, 10, 11             ->  2
     4,  5,  6,  7             ->  1
    -3, -2, -1, 0, 1, 2, 3     ->  0
    -7, -6, -5, -4             -> -1
   -11,-10, -9, -8             -> -2
        ....

So, shift results in a small DC offset (1/2 LSB), division results in a
small 'nonlinearity' near 0.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1__remove_ffmhack.patch
Type: application/octet-stream
Size: 2594 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/flac-dev/attachments/20140517/e2f5dcb0/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2__apply_gain.patch
Type: application/octet-stream
Size: 3073 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/flac-dev/attachments/20140517/e2f5dcb0/attachment-0001.obj
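
For readers who want to reproduce the shift/division comparison from the P.S.,
here is a minimal sketch in C. It is not the code from 2__apply_gain.patch; the
function names (shift_for_factor, scale_div, scale_shift) are hypothetical and
chosen only for illustration. It assumes the compiler implements >> on negative
signed values as an arithmetic shift, which the C standard leaves
implementation-defined but which mainstream compilers do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns n such that conv_factor == 1 << n.
 * conv_factor must be a power of two, as stated in the message above. */
unsigned shift_for_factor(uint32_t conv_factor)
{
    assert(conv_factor != 0 && (conv_factor & (conv_factor - 1)) == 0);
    unsigned n = 0;
    while ((conv_factor >> n) != 1u)
        n++;
    return n;
}

/* Division by a power of two: C integer division truncates toward zero,
 * so -3..3 all map to 0 when dividing by 4. */
int64_t scale_div(int64_t x, unsigned n)
{
    return x / ((int64_t)1 << n);
}

/* Right shift: assumed arithmetic here (implementation-defined for
 * negative x), so the result rounds toward negative infinity and
 * -4..-1 all map to -1 when shifting by 2. */
int64_t scale_shift(int64_t x, unsigned n)
{
    return x >> n;
}
```

For non-negative arguments the two functions agree; they differ only for
negative arguments that are not exact multiples of the divisor, which is
where the 1 LSB difference in the WAV output comes from.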

lvqcl wrote:

> The file src/share/replaygain_synthesis/include/private/fast_float_math_hack.h
> redefines 'tanh' as 'tanhf'. This file is intended for Intel Compiler only,
> but it includes outdated mathf.h and doesn't work with current versions of ICC.
>
> The fixes are trivial though, and I compiled 2 versions of flac.exe: with this
> 'hack' turned off and on. The difference in decoding speed is very close to
> measurement inaccuracy: for 32-bit encoder the decoding time decreases from 94.5s
> to 94.0s, for 64-bit it increases from 82.6s to 82.9s.
> (the option for this test was: --apply-replaygain-which-is-not-lossless=Ln0)
>
> So this hack is really useless today, and the first patch removes
> fast_float_math_hack.h from the sources.

Both patches applied. Thanks.

Cheers,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

I've not benchmarked to know if there is any real benefit, but changing
the include in fast_float_math_hack.h to <mathimf.h> is all that is
required to use the latest ICC.

John

On 17/05/2014 10:26, lvqcl wrote:
> The file
> src/share/replaygain_synthesis/include/private/fast_float_math_hack.h
> redefines 'tanh' as 'tanhf'. This file is intended for Intel Compiler only,
> but it includes outdated mathf.h and doesn't work with current versions
> of ICC.

John Edwards wrote:

> I've not benchmarked to know if there is any real benefit, but changing
> the include in fast_float_math_hack.h to <mathimf.h> is all that is
> required to use the latest ICC.
>
> John

Well, it was also necessary to change replaygain_synthesis.c:
private/fast_float_math_hack.h has to be included before math.h.
But yes, that's all that was necessary to compile this project.