lvqcl
2013-Sep-17 18:21 UTC
[flac-dev] Performance and precompute_partition_info_sums_32bit_asm_ia32_()
Previously I wrote that precompute_partition_info_sums_32bit_asm_ia32_() only makes encoding slower.

Now I managed to compile flac with GCC 4.8.1, with this function enabled and disabled. NASM was enabled, SSE intrinsics disabled. Then I added the -msse option (so that all C code was compiled with -msse), then -msse2 and so on.

Input file for test: 44.1kHz/16bit/stereo; best compression mode (flac -8); CPU = Core i7.

Here are the results (1st column: SSE instruction set, 2nd column: the state of precompute_partition_info_sums_32bit_asm_ia32_(), 3rd column: encoding time in seconds, smaller=better):

no SSE  disabled  53.9
no SSE  enabled   55.2
SSE1    disabled  53.9
SSE1    enabled   55.3
SSE2    disabled  51.9
SSE2    enabled   53.1
SSE3    disabled  51.8
SSE3    enabled   53.2
SSSE3   disabled  45.7
SSSE3   enabled   51.4
SSE41   disabled  46.1
SSE41   enabled   51.6
SSE42   disabled  46.1
SSE42   enabled   51.6

Conclusions:

1) flac is always faster when precompute_partition_info_sums_32bit_asm_ia32_() is disabled.

2) Some C code benefits noticeably from SSSE3 instructions; at least when compiled with GCC 4.8.1.
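To put the timings above in relative terms, here is a small C sketch that computes how much slower the asm-enabled build is for each instruction set. The numbers are copied from the results in this message; the names slowdown_percent, timings, timing_count and print_slowdowns are made up for this illustration and are not part of flac.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Relative slowdown, in percent, of the asm-enabled build compared with
 * the asm-disabled build. Hypothetical helper, not part of libFLAC. */
double slowdown_percent(double disabled_s, double enabled_s)
{
    assert(disabled_s > 0.0);
    return (enabled_s - disabled_s) / disabled_s * 100.0;
}

struct timing {
    const char *isa;     /* SSE level the C code was compiled for */
    double disabled_s;   /* encoding time in seconds, asm routine disabled */
    double enabled_s;    /* encoding time in seconds, asm routine enabled */
};

/* Timings as reported in the message (flac -8, Core i7, GCC 4.8.1). */
const struct timing timings[] = {
    { "no SSE", 53.9, 55.2 },
    { "SSE1",   53.9, 55.3 },
    { "SSE2",   51.9, 53.1 },
    { "SSE3",   51.8, 53.2 },
    { "SSSE3",  45.7, 51.4 },
    { "SSE41",  46.1, 51.6 },
    { "SSE42",  46.1, 51.6 },
};

const size_t timing_count = sizeof timings / sizeof timings[0];

/* Print one line per instruction set with the relative slowdown. */
void print_slowdowns(void)
{
    for (size_t i = 0; i < timing_count; i++) {
        printf("%-7s asm-enabled build is %.1f%% slower\n",
               timings[i].isa,
               slowdown_percent(timings[i].disabled_s, timings[i].enabled_s));
    }
}
```

Calling print_slowdowns() from a main() prints the per-row penalty. The gap is a few percent up to SSE3 and grows from SSSE3 onward, where the compiled C code gets faster but the asm routine does not.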
Erik de Castro Lopo
2013-Sep-25 13:01 UTC
[flac-dev] Performance and precompute_partition_info_sums_32bit_asm_ia32_()
lvqcl wrote:

> Conclusions:
>
> 1) flac is always faster when precompute_partition_info_sums_32bit_asm_ia32_()
> is disabled.

Maybe it's time to either improve or remove that code.

Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
Erik de Castro Lopo wrote:

> Maybe its time to either improve or remove that code.

I posted a patch that adds SSE2/SSSE3 versions of precompute_partition_info_sums_(). I didn't remove the old code; I just changed "#if defined ..." to "#if defined ... && 0".

Currently there are two useless asm files:

* bitreader_asm.nasm (defines FLAC__bitreader_read_rice_signed_block_asm_ia32_bswap)
* stream_encoder_asm.nasm (defines precompute_partition_info_sums_32bit_asm_ia32_)

and I doubt that they will become useful again.

BTW, after removing bitreader_asm.nasm from the build it will be possible to remove local_bitreader_read_rice_signed_block from struct FLAC__StreamDecoderPrivate and call FLAC__bitreader_read_rice_signed_block() directly (IOW, revert the patch http://git.xiph.org/?p=flac.git;a=commit;h=ddddff6a5604da5c7223a075e58ca532d7ad375d ).

It will also be possible to make the bitreader_read_from_client_() function static again (it was static before http://git.xiph.org/?p=flac.git;a=commit;h=c63cf41cccba25a268f235e83ed8603adc0ac3ec ).
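For readers unfamiliar with the "&& 0" trick mentioned above, here is a minimal sketch of how it disables a code path without deleting it. Everything in it is an assumption made for illustration: HAVE_ASM_SUMS, sum_abs_c, sum_abs_asm and PARTITION_SUM are hypothetical names, and the summing loop is only a stand-in, not the real libFLAC routine or its actual preprocessor condition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical configuration macro standing in for the real condition. */
#define HAVE_ASM_SUMS 1

/* Plain C stand-in: sum of absolute values of n residual samples. */
uint32_t sum_abs_c(const int32_t *residual, size_t n)
{
    uint32_t total = 0;
    assert(residual != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int64_t v = residual[i];              /* widen so negating INT32_MIN is safe */
        total += (uint32_t)(v < 0 ? -v : v);
    }
    return total;
}

/* Appending "&& 0" makes the condition always false, so the asm branch is
 * never compiled or linked, but the code stays in the tree for later use. */
#if defined(HAVE_ASM_SUMS) && 0
uint32_t sum_abs_asm(const int32_t *residual, size_t n); /* hypothetical asm symbol */
#define PARTITION_SUM sum_abs_asm
#else
#define PARTITION_SUM sum_abs_c
#endif
```

With the "&& 0" in place, PARTITION_SUM always expands to the C version, even though HAVE_ASM_SUMS is defined. Dropping the "&& 0" would re-enable the asm path, which is why this is a convenient way to park code that might be removed later.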