I downloaded the current version of the FLAC sources and compiled it with:

* GCC 4.8.1 (MSYS from http://xhmikosr.1f0.de/tools/)
* Intel C++ Composer XE 2013 update 5
* MSVS 2010 SP1
* MSVS 2012 update 3

(SSSE3 and SSE4.1 code was disabled for all compilers.)

A stereo 24-bit WAV file was encoded with the -8 preset. Encoding time, in seconds:

GCC 32-bit: 209
ICC 32-bit: 130
VS10 32-bit: 116
VS12 32-bit: 114

GCC 64-bit: 79.5
ICC 64-bit: 81.2
VS10 64-bit: 81.1
VS12 64-bit: 83.3

According to a profiler, FLAC__lpc_compute_residual_from_qlp_coefficients_wide() is one of the most CPU-intensive functions. I added the __restrict keyword to its pointer parameters.

Before:

    void FLAC__lpc_compute_residual_from_qlp_coefficients_wide(
        const FLAC__int32 *data, unsigned data_len,
        const FLAC__int32 qlp_coeff[], unsigned order,
        int lp_quantization, FLAC__int32 residual[])

After:

    void FLAC__lpc_compute_residual_from_qlp_coefficients_wide(
        const FLAC__int32 * __restrict data, unsigned data_len,
        const FLAC__int32 * __restrict qlp_coeff, unsigned order,
        int lp_quantization, FLAC__int32 * __restrict residual)

Encoding time, in seconds:

GCC 32-bit: 180 (16% speedup)
ICC 32-bit: 121 (7.5%)
VS10 32-bit: 439 (sic!)
VS12 32-bit: 440 (sic!)

GCC 64-bit: 72.8 (9%)
ICC 64-bit: 75.0 (8%)
VS10 64-bit: 75.7 (7%)
VS12 64-bit: 77.7 (7%)

I also wonder which other functions could benefit from the `restrict' keyword.
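For readers who haven't looked at the function, here is a simplified sketch of what a "wide" residual computation with restrict-qualified parameters looks like. It is not the FLAC source: the name residual_wide_sketch is hypothetical, it uses C99 `restrict` and <stdint.h> types instead of `__restrict` and FLAC__int32, it omits overflow checks, and it assumes the `order` samples before data[0] hold warm-up history. The restrict qualifiers promise the compiler that residual overlaps neither data nor qlp_coeff, so a store to residual[i] can't change the inputs and they need not be reloaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified sketch; not the FLAC implementation.
 * For each sample i:
 *   residual[i] = data[i] - ((sum_j qlp_coeff[j] * data[i-j-1]) >> lp_quantization)
 * The sum is accumulated in 64 bits ("wide"); overflow checks are omitted.
 * Assumes data[-order] .. data[-1] hold warm-up samples.
 * restrict: the caller promises that residual overlaps neither data nor
 * qlp_coeff, so stores to residual[i] cannot change the inputs. */
static void residual_wide_sketch(const int32_t *restrict data, size_t data_len,
                                 const int32_t *restrict qlp_coeff, unsigned order,
                                 int lp_quantization, int32_t *restrict residual)
{
    assert(order > 0 && lp_quantization >= 0 && lp_quantization < 64);
    for (ptrdiff_t i = 0; i < (ptrdiff_t)data_len; i++) {
        int64_t sum = 0;
        for (ptrdiff_t j = 0; j < (ptrdiff_t)order; j++)
            sum += (int64_t)qlp_coeff[j] * data[i - j - 1]; /* 32x32 -> 64-bit product */
        /* >> on a negative int64_t is implementation-defined in C;
         * mainstream compilers use an arithmetic shift. */
        residual[i] = data[i] - (int32_t)(sum >> lp_quantization);
    }
}
```

With order 1, qlp_coeff {1} and lp_quantization 0 this reduces to a first difference: for samples 1, 2, 3, 4, 5 (the first used as warm-up) every residual is 1.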
Erik de Castro Lopo
2013-Oct-09 10:42 UTC
[flac-dev] Again about encoding speed of different compiles
lvqcl wrote:
> Encoding time, in seconds:
> GCC 32-bit: 180 (16% speedup)
> ICC 32-bit: 121 (7.5%)
> VS10 32-bit: 439 (sic!)
> VS12 32-bit: 440 (sic!)
>
> GCC 64-bit: 72.8 (9%)
> ICC 64-bit: 75.0 (8%)
> VS10 64-bit: 75.7 (7%)
> VS12 64-bit: 77.7 (7%)
>
> Also I wonder what other functions can also benefit from `restrict'
> keyword?..

For others' reference, this is the GCC documentation on __restrict:

    http://gcc.gnu.org/onlinedocs/gcc/Restricted-Pointers.html

Googling suggests that the use of restrict is a little controversial, e.g.:

    http://stackoverflow.com/questions/2005473/rules-for-using-the-restrict-keyword-in-c
    http://cellperformance.beyond3d.com/articles/2006/05/demystifying-the-restrict-keyword.html
    http://blog.frama-c.com/index.php?post/2012/07/25/On-the-redundancy-of-C99-s-restrict

Also, do you have any idea why this causes such a slowdown in VS10 and VS12?

Erik
--
Erik de Castro Lopo
http://www.mega-nerd.com/
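Much of the controversy comes from restrict being an unchecked promise: if two restrict-qualified pointers do access the same object (and it is modified), the behaviour is undefined, and the compiler is not required to diagnose it. A minimal illustration, with hypothetical function names: the unqualified version must re-read *src after each store to *dst, so it stays correct even when both point to the same int; the restrict version lets the compiler load *src once.

```c
/* Without restrict, the compiler must assume dst and src may point to the
 * same object, so *src has to be reloaded after each store to *dst. */
int add_twice(int *dst, const int *src)
{
    *dst += *src;
    *dst += *src;
    return *dst;
}

/* With restrict, the caller promises dst and src do not alias, so the
 * compiler may load *src once (e.g. compute *dst + 2 * *src).
 * Passing the same pointer for both arguments is undefined behaviour. */
int add_twice_restrict(int *restrict dst, const int *restrict src)
{
    *dst += *src;
    *dst += *src;
    return *dst;
}
```

With int x = 1, add_twice(&x, &x) returns 4 (1 + 1, then 2 + 2). With two distinct ints both equal to 1, add_twice_restrict returns 3. The call add_twice_restrict(&x, &x) breaks the promise, so whatever it returns is not something a program may rely on.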
Erik de Castro Lopo wrote:
> Googling suggests that use of restrict is a little controversial.

Maybe, but Opus uses this keyword in its encoding and decoding routines, so I don't think it's dangerous.

> Also, do you have any idea why this causes such a slow down in VS10 and
> VS12?

Without __restrict, VS12 generates the following pattern:

        ...
        mov   eax, DWORD PTR [edi+32]
        adc   esi, edx
        imul  DWORD PTR [ebx+8]
        add   ecx, eax
        ...etc...

With __restrict:

        ...
        mov   eax, DWORD PTR [ecx+16]
        cdq
        mov   DWORD PTR tv7279[esp+116], eax
        mov   DWORD PTR tv7278[esp+116], edx
        ...etc...

followed by

        ...
        push  DWORD PTR tv7278[esp+116]
        mov   eax, DWORD PTR [ebx+24]
        push  DWORD PTR tv7279[esp+120]
        adc   edi, edx
        cdq
        push  edx
        push  eax
        call  __allmul
        add   esi, eax
        ...etc...

I have no idea why it does this. Maybe it tries to "optimize" the 32-bit * 32-bit -> 64-bit multiplication?
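Some background on the listings: on 32-bit x86 the one-operand imul multiplies two 32-bit values into a 64-bit result in EDX:EAX, which is all a 32 x 32 -> 64 product needs. In the __restrict build, each operand is sign-extended with cdq and the product goes through __allmul, the MSVC runtime helper for general 64 x 64-bit multiplication, so every multiply becomes a function call. Why MSVC switches strategy isn't clear from the listings. One untested idea is to make the 32 x 32 -> 64 intent explicit. The sketch below assumes MSVC's __emul intrinsic from <intrin.h> on 32-bit x86 and uses the portable widening-cast idiom elsewhere. The helper name mul32x32_64 is hypothetical, and whether this avoids __allmul in VS10/VS12 has not been verified.

```c
#include <stdint.h>
#if defined(_MSC_VER) && defined(_M_IX86)
#include <intrin.h> /* __emul: MSVC intrinsic, int x int -> __int64 */
#endif

/* Hypothetical helper: signed 32-bit x 32-bit -> 64-bit multiply.
 * The result always fits: INT32_MIN * INT32_MIN = 2^62 < 2^63. */
static int64_t mul32x32_64(int32_t a, int32_t b)
{
#if defined(_MSC_VER) && defined(_M_IX86)
    /* Assumption: the intrinsic steers MSVC toward a single imul
     * instead of a call to __allmul. Untested on the builds above. */
    return __emul(a, b);
#else
    /* Widening-cast idiom: only one operand is cast; the other is
     * promoted, and compilers typically emit a 32x32->64 multiply. */
    return (int64_t)a * b;
#endif
}
```

In the residual loop, the call would replace the product of the coefficient and the sample inside the 64-bit accumulation. Measuring the encoding time again would show whether it makes any difference.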