As I wrote earlier, GCC generates slow ia32 code for
FLAC__lpc_compute_residual_from_qlp_coefficients_wide() and
FLAC__lpc_restore_signal_wide(). So 24-bit encoding/decoding is slower
in GCC builds than in MSVS or ICC builds.

I took the FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32
and FLAC__lpc_restore_signal_asm_ia32 asm functions and wrote their _wide
versions.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: wide_asm.patch
Type: application/octet-stream
Size: 15653 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/flac-dev/attachments/20140103/45594e52/attachment-0001.obj
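For readers unfamiliar with the _wide code path, here is a minimal C sketch of the kind of computation involved. It is not the libFLAC source and not the attached patch: the function names residual_wide_sketch and restore_wide_sketch and their parameter layout are hypothetical. It assumes the usual LPC form, where each prediction is a sum of coefficient * previous-sample products shifted right by the quantization level, and where the "wide" variant keeps that sum in a 64-bit accumulator so it cannot overflow with 24-bit input. On ia32 every 32x32->64 multiply-accumulate needs a register pair, which is presumably where compiler code quality matters most.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not the libFLAC function.
 * data points at the first sample to encode; the `order` samples
 * before it (data[-1] .. data[-order]) must be valid warm-up samples. */
static void residual_wide_sketch(const int32_t *data, int data_len,
                                 const int32_t *qlp_coeff, int order,
                                 int lp_quantization, int32_t *residual)
{
    assert(lp_quantization >= 0 && order >= 0);
    for (int i = 0; i < data_len; i++) {
        int64_t sum = 0; /* 64-bit accumulator: the "wide" part */
        for (int j = 0; j < order; j++)
            sum += (int64_t)qlp_coeff[j] * (int64_t)data[i - j - 1];
        /* Right shift of a negative value is implementation-defined in C;
         * this sketch assumes an arithmetic shift, as on GCC, MSVC and ICC. */
        residual[i] = data[i] - (int32_t)(sum >> lp_quantization);
    }
}

/* Inverse operation: rebuilds data[] from residual[], using the same
 * prediction. data[-1] .. data[-order] must already hold warm-up samples. */
static void restore_wide_sketch(const int32_t *residual, int data_len,
                                const int32_t *qlp_coeff, int order,
                                int lp_quantization, int32_t *data)
{
    assert(lp_quantization >= 0 && order >= 0);
    for (int i = 0; i < data_len; i++) {
        int64_t sum = 0;
        for (int j = 0; j < order; j++)
            sum += (int64_t)qlp_coeff[j] * (int64_t)data[i - j - 1];
        data[i] = residual[i] + (int32_t)(sum >> lp_quantization);
    }
}
```

Because both functions compute the same prediction from the same earlier samples, restoring from the residual reproduces the input exactly, which is what makes the scheme lossless.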
Erik de Castro Lopo
2014-Jan-07 10:37 UTC
[flac-dev] PATCH: asm versions for two _wide() functions
lvqcl wrote:

> As I wrote earlier, GCC generates slow ia32 code for FLAC__lpc_compute_residual_from_qlp_coefficients_wide()
> and FLAC__lpc_restore_signal_wide(). So 24-bit encoding/decoding is slower
> for GCC compile than for MSVS or ICC compile.
>
> I took FLAC__lpc_compute_residual_from_qlp_coefficients_asm_ia32
> and FLAC__lpc_restore_signal_asm_ia32 asm functions and wrote their _wide
> versions.

Patch applied. Thanks.

I'll do a little more testing on this and the other patches before pushing
to git.

Cheers,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
Erik de Castro Lopo wrote:

> I'll do a little more testing on this and the other patches before pushing
> to git.

According to my tests, the speed increase from the patch that changes
"call .get_eip0 / pop eax" to "call .mov_eip_to_eax / mov eax, [esp] / ret"
is negligible or absent.

OTOH, libFLAC is a very widely used library, and it's better to do things
the recommended way.