The file src/share/replaygain_synthesis/include/private/fast_float_math_hack.h
redefines 'tanh' as 'tanhf'. This file is intended for Intel Compiler only,
but it includes outdated mathf.h and doesn't work with current versions of
ICC.
The fixes are trivial though, and I compiled 2 versions of flac.exe: with this
'hack' turned off and on. The difference in decoding speed is very close to
measurement inaccuracy: for the 32-bit build the decoding time decreases from
94.5s to 94.0s, for the 64-bit build it increases from 82.6s to 82.9s.
(the option for this test was: --apply-replaygain-which-is-not-lossless=Ln0)
So this hack is really useless today, and the first patch removes
fast_float_math_hack.h from the sources.
The MSVS profiler shows that the tanh calculation doesn't take much CPU time;
the real problem is an integer division (int_64/int_32) in this line:
val64 = dither_output_(........) / conv_factor;
Since all possible values of conv_factor are powers of 2, it's possible to
replace division with shift. The second patch does this.
Decoding time decreases from 94.5s to 64.1s for the 32-bit ICC build, and
from 82.6s to 50.0s for the 64-bit ICC build.
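
A minimal sketch of the idea, not the patch itself. Only conv_factor comes
from the line quoted above; scale_by_division, scale_by_shift and log2_pow2
are names made up for illustration. It assumes conv_factor is a positive
power of 2 and that >> on a negative signed value is an arithmetic shift
(implementation-defined in C, but what mainstream compilers do).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: log2 of a factor that must be a power of 2. */
static unsigned log2_pow2(uint32_t factor)
{
    unsigned shift = 0;

    /* Reject 0 and anything that is not a power of 2. */
    assert(factor != 0 && (factor & (factor - 1)) == 0);
    while ((factor >> shift) > 1u)
        shift++;
    return shift;
}

/* The slow form: a 64-bit by 32-bit integer division per sample. */
static int64_t scale_by_division(int64_t x, int32_t conv_factor)
{
    return x / conv_factor;
}

/* The fast form: the shift count is computed once, outside the sample loop.
 * Assumes >> sign-extends negative values (arithmetic shift). */
static int64_t scale_by_shift(int64_t x, unsigned conv_shift)
{
    return x >> conv_shift;
}
```

For non-negative x the two forms return the same value; for negative x they
can differ by 1.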
*************************************************
P.S. Actually, shift ( x >> n ) and division ( x / (1<<n) ) can give
different results if x < 0. The difference is very small though: the WAV files
differ by 1 LSB. And shift probably gives better results than division.
Let's compare shift by 2 and division by (1<<2) == 4:
*** shift ***
argument result
....
12, 13, 14, 15 -> 3
8, 9, 10, 11 -> 2
4, 5, 6, 7 -> 1
0, 1, 2, 3 -> 0
-4, -3, -2, -1 -> -1
-8, -7, -6, -5 -> -2
....
*** division ***
argument result
....
12, 13, 14, 15 -> 3
8, 9, 10, 11 -> 2
4, 5, 6, 7 -> 1
-3, -2, -1, 0, 1, 2, 3 -> 0
-7, -6, -5, -4 -> -1
-11, -10, -9, -8 -> -2
....
So, shift results in a small DC offset (1/2 LSB), while division results in
a small 'nonlinearity' near 0.
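
The two tables can be reproduced with a small sketch like the one below. The
function names are made up for illustration, and it assumes that >> on a
negative int is an arithmetic shift (implementation-defined in C, but the
usual behaviour) and that division truncates toward zero (C99 and later).

```c
#include <stdio.h>

/* Arithmetic right shift: rounds toward negative infinity on compilers
 * that sign-extend (implementation-defined for negative values). */
static int shift_by_2(int x)
{
    return x >> 2;
}

/* Integer division: truncates toward zero (C99 and later). */
static int divide_by_4(int x)
{
    return x / 4;
}

/* Walks lo..hi, optionally prints both results for each argument,
 * and returns how many arguments give different results. */
static int count_mismatches(int lo, int hi, int verbose)
{
    int mismatches = 0;

    for (int x = lo; x <= hi; x++) {
        int s = shift_by_2(x);
        int d = divide_by_4(x);

        if (verbose)
            printf("%4d  shift: %3d  division: %3d\n", x, s, d);
        if (s != d)
            mismatches++;
    }
    return mismatches;
}
```

Calling count_mismatches(-8, 15, 1) from main prints one row per argument.
Over that range the two functions disagree for six arguments: -7, -6, -5,
-3, -2 and -1.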
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 1__remove_ffmhack.patch
Type: application/octet-stream
Size: 2594 bytes
Desc: not available
Url :
http://lists.xiph.org/pipermail/flac-dev/attachments/20140517/e2f5dcb0/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2__apply_gain.patch
Type: application/octet-stream
Size: 3073 bytes
Desc: not available
Url :
http://lists.xiph.org/pipermail/flac-dev/attachments/20140517/e2f5dcb0/attachment-0001.obj
lvqcl wrote:
> The file src/share/replaygain_synthesis/include/private/fast_float_math_hack.h
> redefines 'tanh' as 'tanhf'. This file is intended for Intel Compiler only,
> but it includes outdated mathf.h and doesn't work with current versions of ICC.
>
> The fixes are trivial though, and I compiled 2 versions of flac.exe: with this
> 'hack' turned off and on. The difference in decoding speed is very close to
> measurement inaccuracy: for 32-bit encoder the decoding time decreases from 94.5s
> to 94.0s, for 64-bit it increases from 82.6s to 82.9s.
> (the option for this test was: --apply-replaygain-which-is-not-lossless=Ln0)
>
> So this hack is really useless today, and the first patch removes
> fast_float_math_hack.h from the sources.

Both patches applied. Thanks.

Cheers,
Erik
-- 
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
I've not benchmarked to know if there is any real benefit, but changing the
include in fast_float_math_hack.h to <mathimf.h> is all that is required to
use the latest ICC.

John

On 17/05/2014 10:26, lvqcl wrote:
> The file
> src/share/replaygain_synthesis/include/private/fast_float_math_hack.h
> redefines 'tanh' as 'tanhf'. This file is intended for Intel Compiler only,
> but it includes outdated mathf.h and doesn't work with current versions
> of ICC.
>
> [...]
John Edwards wrote:
> I've not benchmarked to know if there is any real benefit, but changing
> the include in fast_float_math_hack.h to <mathimf.h> is all that is
> required to use the latest ICC.
>
> John

Well, it was also necessary to change replaygain_synthesis.c: the inclusion
of private/fast_float_math_hack.h should come before the inclusion of math.h.
But yes, that's all that was necessary to compile this project.