I downloaded the current version of the FLAC sources and compiled them with:
* GCC 4.8.1 (MSYS from http://xhmikosr.1f0.de/tools/)
* Intel C++ Composer XE 2013 update 5
* MSVS 2010 SP1
* MSVS 2012 update 3
(SSSE3 and SSE4.1 code was disabled for all compilers)
A stereo 24-bit WAV file was encoded with the -8 preset.
Encoding time, in seconds:
GCC 32-bit: 209
ICC 32-bit: 130
VS10 32-bit: 116
VS12 32-bit: 114
GCC 64-bit: 79.5
ICC 64-bit: 81.2
VS10 64-bit: 81.1
VS12 64-bit: 83.3
According to a profiler, FLAC__lpc_compute_residual_from_qlp_coefficients_wide()
is one of the most CPU-consuming functions, so I added the __restrict keyword to
its pointer parameters.
Before:
void FLAC__lpc_compute_residual_from_qlp_coefficients_wide(const FLAC__int32 *data,
    unsigned data_len, const FLAC__int32 qlp_coeff[], unsigned order,
    int lp_quantization, FLAC__int32 residual[])
After:
void FLAC__lpc_compute_residual_from_qlp_coefficients_wide(const FLAC__int32 * __restrict data,
    unsigned data_len, const FLAC__int32 * __restrict qlp_coeff, unsigned order,
    int lp_quantization, FLAC__int32 * __restrict residual)
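For readers unfamiliar with this function, here is a simplified sketch of what a "wide" LPC residual computation does, written with C99 `restrict` (the standard spelling of `__restrict`). It is not the FLAC source, which unrolls the loop separately for each order, and the name `lpc_residual_wide_sketch` is hypothetical. It assumes, as FLAC does, that the `order` warm-up samples just before `data[0]` are readable, and that `>>` on a negative 64-bit value is an arithmetic shift (implementation-defined in C, but what mainstream compilers do).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified sketch (not the FLAC source).
 * residual[i] = data[i] - ((sum_j qlp_coeff[j] * data[i-j-1]) >> lp_quantization)
 * data[-order .. -1] must hold the warm-up history samples.
 * 'restrict' promises that residual overlaps neither data nor qlp_coeff,
 * so a store to residual[i] cannot change the values being read. */
static void lpc_residual_wide_sketch(const int32_t *restrict data,
                                     unsigned data_len,
                                     const int32_t *restrict qlp_coeff,
                                     unsigned order,
                                     int lp_quantization,
                                     int32_t *restrict residual)
{
    assert(order > 0 && lp_quantization >= 0);
    for (unsigned i = 0; i < data_len; i++) {
        int64_t sum = 0; /* 64-bit accumulator: the "wide" part */
        for (unsigned j = 0; j < order; j++)
            sum += (int64_t)qlp_coeff[j] * data[(int)i - (int)j - 1];
        residual[i] = data[i] - (int32_t)(sum >> lp_quantization);
    }
}
```

Because `residual` is declared restrict, the compiler may keep the coefficients and recent samples in registers across iterations instead of reloading them after every store to `residual[i]`, which is the kind of freedom the speedups below suggest it used.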
Encoding time, in seconds:
GCC 32-bit: 180 (16% speedup)
ICC 32-bit: 121 (7.5%)
VS10 32-bit: 439 (sic!)
VS12 32-bit: 440 (sic!)
GCC 64-bit: 72.8 (9%)
ICC 64-bit: 75.0 (8%)
VS10 64-bit: 75.7 (7%)
VS12 64-bit: 77.7 (7%)
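The percentages above appear to be computed as t_before / t_after - 1 rather than as a fraction of the old time: for GCC 32-bit, 209 / 180 - 1 gives about 16.1%, whereas (209 - 180) / 209 would give about 13.9%. A minimal sketch of that arithmetic (the helper name is hypothetical):

```c
#include <assert.h>

/* Speedup in percent, as the figures in this post seem to be computed:
 * positive means faster after the change, negative means slower. */
static double speedup_percent(double t_before, double t_after)
{
    assert(t_before > 0.0 && t_after > 0.0);
    return (t_before / t_after - 1.0) * 100.0;
}
```

With the same formula, the VS10 32-bit result (116 s before, 439 s after) comes out at about -73.6%, i.e. encoding became roughly 3.8 times slower.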
I also wonder which other functions could benefit from the `restrict' keyword.
Erik de Castro Lopo
2013-Oct-09 10:42 UTC
[flac-dev] Again about encoding speed of different compiles
lvqcl wrote:
> Encoding time, in seconds:
> GCC 32-bit: 180 (16% speedup)
> ICC 32-bit: 121 (7.5%)
> VS10 32-bit: 439 (sic!)
> VS12 32-bit: 440 (sic!)
>
> GCC 64-bit: 72.8 (9%)
> ICC 64-bit: 75.0 (8%)
> VS10 64-bit: 75.7 (7%)
> VS12 64-bit: 77.7 (7%)
>
>
> Also I wonder what other functions can also benefit from `restrict'
> keyword?..

For others' reference, this is the GCC documentation on __restrict:

    http://gcc.gnu.org/onlinedocs/gcc/Restricted-Pointers.html

Googling suggests that the use of restrict is a little controversial, e.g.:

    http://stackoverflow.com/questions/2005473/rules-for-using-the-restrict-keyword-in-c
    http://cellperformance.beyond3d.com/articles/2006/05/demystifying-the-restrict-keyword.html
    http://blog.frama-c.com/index.php?post/2012/07/25/On-the-redundancy-of-C99-s-restrict

Also, do you have any idea why this causes such a slowdown in VS10 and VS12?

Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
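Much of the controversy comes from the fact that `restrict` is an unchecked promise. A small C99 illustration (function names are hypothetical): without `restrict` the compiler must assume `dst` and `src` may point to the same object, so it has to reload `*src` after every store to `*dst`; with `restrict` it may load `*src` once, and calling the function with overlapping pointers becomes undefined behavior instead of a slower but correct result.

```c
#include <assert.h>

/* No restrict: dst and src may alias, so *src must be re-read after
 * the first store. add_twice(&x, &x) is well defined. */
static void add_twice(int *dst, const int *src)
{
    *dst += *src;
    *dst += *src;
}

/* restrict: the compiler may treat this as *dst += 2 * *src with a
 * single load. Passing overlapping pointers is undefined behavior;
 * the assert below only catches the trivial dst == src case, and only
 * in debug builds. */
static void add_twice_restrict(int *restrict dst, const int *restrict src)
{
    assert((const int *)dst != src);
    *dst += *src;
    *dst += *src;
}
```

With distinct objects both functions give the same result; `add_twice(&x, &x)` turns x = 1 into 4, while the same call to `add_twice_restrict` has no defined meaning.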
Erik de Castro Lopo wrote:
> Googling suggests that use of restrict is a little controversial.

Maybe, but the Opus encoder uses this keyword in its encoding and decoding
routines, so I don't think it's dangerous.

> Also, do you have any idea why this causes such a slowdown in VS10 and
> VS12?

Without __restrict, VS12 generates the following pattern:

    ...
    mov  eax, DWORD PTR [edi+32]
    adc  esi, edx
    imul DWORD PTR [ebx+8]
    add  ecx, eax
    ...etc...

With __restrict:

    ...
    mov  eax, DWORD PTR [ecx+16]
    cdq
    mov  DWORD PTR tv7279[esp+116], eax
    mov  DWORD PTR tv7278[esp+116], edx
    ...etc...

followed by

    ...
    push DWORD PTR tv7278[esp+116]
    mov  eax, DWORD PTR [ebx+24]
    push DWORD PTR tv7279[esp+120]
    adc  edi, edx
    cdq
    push edx
    push eax
    call __allmul
    add  esi, eax
    ...etc...

I have no idea why it does this. Maybe it is trying to "optimize" the
32-bit * 32-bit -> 64-bit multiplication?
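Some context for the listings: on 32-bit x86 the one-operand `imul` computes a signed 32x32 -> 64-bit product in EDX:EAX in a single instruction, while `__allmul` is the MSVC runtime helper for a general 64x64-bit multiply. In the __restrict build, the `cdq` instructions sign-extend each operand into a 64-bit value that is then pushed as an argument to `__allmul`, so the compiler seems to have lost track of the fact that both operands are just sign-extended 32-bit values. One possible workaround, not tested against this FLAC code, is to make the widening explicit with MSVC's `__emul` intrinsic (declared in `<intrin.h>`), falling back to the usual cast elsewhere. The helpers `mul_32x32_64` and `lpc_sum_wide` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#if defined(_MSC_VER) && defined(_M_IX86)
#include <intrin.h>
#endif

/* Hypothetical helper: signed 32x32 -> 64-bit multiply.
 * On 32-bit MSVC, __emul() asks for the widening multiply explicitly.
 * Elsewhere, casting one operand to int64_t is the standard idiom,
 * which compilers normally turn into a single widening imul. */
static int64_t mul_32x32_64(int32_t a, int32_t b)
{
#if defined(_MSC_VER) && defined(_M_IX86)
    return __emul(a, b);
#else
    return (int64_t)a * b;
#endif
}

/* Hypothetical usage: the 64-bit dot product at the core of the
 * "wide" LPC loop, written with the explicit widening multiply. */
static int64_t lpc_sum_wide(const int32_t *coeff, const int32_t *hist,
                            unsigned order)
{
    int64_t sum = 0;
    unsigned j;
    assert(coeff && hist);
    for (j = 0; j < order; j++)
        sum += mul_32x32_64(coeff[j], hist[j]);
    return sum;
}
```

Whether VS10/VS12 actually emit a single `imul` for this form under __restrict would need to be checked in the generated assembly.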