I wrote a patch that enables FLAC__BYTES_PER_WORD==8 in
libFLAC/bitreader.c and libFLAC/bitwriter.c.
The tests were done on an Intel Nehalem CPU, and flac was compiled
with GCC 4.9.x.

Average speed increase for FLAC__BYTES_PER_WORD change from 4 to 8:

Decoding speed:
  ia32 architecture
    16-bit .flac: -15%
    24-bit .flac: -11%
  x86-64 architecture
    16-bit .flac: +3%
    24-bit .flac: -0.6%

Encoding speed (only fastest presets (-0...-5) were tested):
  ia32 architecture
    16-bit .wav: +0.6%
    24-bit .wav: +3%
  x86-64 architecture
    16-bit .wav: +6%
    24-bit .wav: +7%

On 29 December 2015 at 17:10, lvqcl <lvqcl.mail at gmail.com> wrote:
> I wrote a patch that enables FLAC__BYTES_PER_WORD==8 in
> libFLAC/bitreader.c and libFLAC/bitwriter.c.
> The tests were done on an Intel Nehalem CPU, and flac was compiled
> with GCC 4.9.x.

If you want to share the patch, I am happy to repeat some testing on
Sandy Bridge and Core2 with clang.

> Average speed increase for FLAC__BYTES_PER_WORD change from 4 to 8:
> [...]

The slower decoding speed for 24-bit content on x86_64 seems
surprising, but minor.
However, losing 15% decoding speed on i386 would be very bad.

Riggs

Thomas Zander wrote:
> If you want to share the patch, I am happy to repeat some testing on
> Sandy Bridge and Core2 with clang.

The patch changes many files, libFLAC/bitwriter.c and
test_libFLAC/bitwriter.c among them. So for now I am waiting for a
decision on patches #3 and #4 that I posted yesterday.

> The slower decoding speed for 24 bit content on x86_64 seems
> surprising, but minor.
> However losing 15% decoding speed on i386 would be very bad.

So, does it make sense to #define FLAC__BYTES_PER_WORD (in bitreader.c)
as 4 for 32-bit and as 8 for 64-bit targets?
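
To make the question concrete, here is a minimal sketch of what such a
per-target selection might look like. It is not taken from the patch or
from the libFLAC sources: the UINTPTR_MAX test, the word_t typedef, the
FLAC__BITS_PER_WORD definition as written here, and the words_for_bits()
helper are all assumptions made for illustration. Pointer width is only
a proxy for "64-bit target" (an ABI such as x32 has 32-bit pointers on a
64-bit CPU), so a real build would probably want its own configure-time
switch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: pointer width stands in for "64-bit target". */
#if UINTPTR_MAX > 0xFFFFFFFFu
#  define FLAC__BYTES_PER_WORD 8
typedef uint64_t word_t;   /* hypothetical name for the buffer word type */
#else
#  define FLAC__BYTES_PER_WORD 4
typedef uint32_t word_t;   /* hypothetical name for the buffer word type */
#endif

/* Derived here for the sketch; assumed, not copied from libFLAC. */
#define FLAC__BITS_PER_WORD (8 * FLAC__BYTES_PER_WORD)

/* Hypothetical helper: number of whole words needed to hold nbits bits,
 * rounding up. */
static size_t words_for_bits(size_t nbits)
{
    return (nbits + FLAC__BITS_PER_WORD - 1) / FLAC__BITS_PER_WORD;
}
```

With this arrangement the ia32 build would keep the 4-byte word, so the
decoding slowdown measured above would not apply there, while x86-64
would get the 8-byte word. Whether the small 24-bit decoding loss on
x86-64 is acceptable is a separate question.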