Erik de Castro Lopo wrote:

> I think in an ideal world we would:
>
> * Make it work correctly with FLAC__BYTES_PER_WORD == 8 and compare the
>   performance with FLAC__BYTES_PER_WORD == 4.
> * If there is a statistically measurable performance improvement, keep it;
>   otherwise remove the FLAC__BYTES_PER_WORD == 8 code altogether.

I'll try to do it, but I don't have a deep understanding of the bit(read|write)
routines such as FLAC__bitreader_read_rice_signed_block() and others.
Maybe Miroslav Lichvar can say something?

> Since you seem to think there is little to be gained

No, I just don't know what performance improvement can be obtained
(and of course FLAC is very fast already).
On Sun, Dec 20, 2015 at 01:30:57PM +0300, lvqcl wrote:
> Erik de Castro Lopo wrote:
>
> > I think in an ideal world we would:
> >
> > * Make it work correctly with FLAC__BYTES_PER_WORD == 8 and compare the
> >   performance with FLAC__BYTES_PER_WORD == 4.
> > * If there is a statistically measurable performance improvement, keep it;
> >   otherwise remove the FLAC__BYTES_PER_WORD == 8 code altogether.
>
> I'll try to do it, but I don't have a deep understanding of the bit(read|write)
> routines such as FLAC__bitreader_read_rice_signed_block() and others.
> Maybe Miroslav Lichvar can say something?

Using larger registers in the bitreader/bitwriter should reduce the
number of instructions needed when reading/writing the encoded stream,
as fewer words need to be read/written and encoded values are less
likely to be split across two words.

In the measurements comparing 32-bit and 64-bit words that you posted
later, I'm wondering why there is a slowdown in decoding of 24-bit
files. Does profiling show that it comes from the read_rice_signed_block()
function or from something else? Could it be a slower SWAP_BE_WORD_TO_HOST
or COUNT_ZERO_MSBS2?

-- 
Miroslav Lichvar
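To make the word-size argument above concrete, here is a minimal sketch in C. It is not taken from libFLAC; the names count_split_codes() and words_needed() are hypothetical and exist only for this illustration. It assumes codes are packed back to back, and counts how many of them straddle a word boundary and how many words the stream occupies for a given word width.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libFLAC: count how many codes straddle
 * a word boundary when codes of the given bit lengths are packed back to
 * back into words of word_bits bits each. */
static size_t count_split_codes(const unsigned *code_bits, size_t n,
                                unsigned word_bits)
{
    size_t split = 0;
    unsigned long long pos = 0; /* bit offset of the next code */

    assert(word_bits > 0);
    for (size_t i = 0; i < n; i++) {
        if (code_bits[i] == 0)
            continue; /* an empty code cannot straddle anything */
        unsigned long long first = pos / word_bits;
        unsigned long long last = (pos + code_bits[i] - 1) / word_bits;
        if (first != last)
            split++;
        pos += code_bits[i];
    }
    return split;
}

/* Hypothetical helper: number of words touched by the whole packed stream. */
static size_t words_needed(const unsigned *code_bits, size_t n,
                           unsigned word_bits)
{
    unsigned long long total = 0;

    assert(word_bits > 0);
    for (size_t i = 0; i < n; i++)
        total += code_bits[i];
    return (size_t)((total + word_bits - 1) / word_bits);
}
```

For four 20-bit codes, this gives two split codes and three words with 32-bit words, versus one split code and two words with 64-bit words. Whether that translates into a measurable speedup in the real bitreader is exactly the open question in this thread.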
Miroslav Lichvar wrote:

> In the measurements comparing 32-bit and 64-bit words you posted
> later, I'm wondering why there is a slow down in decoding of 24-bit
> files. Does profiling show it comes from the read_rice_signed_block()
> function or something else? Could it be slower SWAP_BE_WORD_TO_HOST or
> COUNT_ZERO_MSBS2?

The difference is so small that it's difficult to find a place where
FLAC__BYTES_PER_WORD==8 is slower than 4. Also, I don't know how to do
profiling in the MSYS/MinGW ecosystem.

Profiling in MSVS2015 hints that for some reason crc16_update_word_()
may take more time if FLAC__BYTES_PER_WORD==8. Maybe the new code
(the patch is attached) is faster? Currently I can't measure the
difference with high enough precision; the measurement error on my
current system is too high :(

-------------- next part --------------
A non-text attachment was scrubbed...
Name: crc16.patch
Type: application/octet-stream
Size: 1734 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/flac-dev/attachments/20160104/ff938e94/attachment.obj
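For readers unfamiliar with what a per-word CRC update involves, the following is a rough sketch in C. It is neither the crc16_update_word_() code from libFLAC nor the contents of crc16.patch (the attachment is not reproduced here); the function names and the bit-by-bit approach are assumptions chosen for clarity, whereas a production implementation would normally be table-driven. It uses the FLAC frame CRC-16 parameters (polynomial 0x8005, initial value 0, most significant bit first) and shows that an 8-byte word simply needs twice as many byte steps per call as a 4-byte word.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bitwise CRC-16 step (polynomial 0x8005, MSB first).
 * Hypothetical name; real implementations typically use lookup tables. */
static uint16_t crc16_update_byte(uint16_t crc, uint8_t byte)
{
    crc ^= (uint16_t)((uint16_t)byte << 8);
    for (int bit = 0; bit < 8; bit++) {
        if (crc & 0x8000u)
            crc = (uint16_t)((crc << 1) ^ 0x8005u);
        else
            crc = (uint16_t)(crc << 1);
    }
    return crc;
}

/* Feed one word into the CRC, most significant byte first.
 * bytes_per_word plays the role of FLAC__BYTES_PER_WORD (4 or 8);
 * for 4-byte words only the low 32 bits of word are used. */
static uint16_t crc16_update_word(uint16_t crc, uint64_t word,
                                  unsigned bytes_per_word)
{
    assert(bytes_per_word == 4 || bytes_per_word == 8);
    for (unsigned shift = 8 * bytes_per_word; shift > 0; shift -= 8)
        crc = crc16_update_byte(crc, (uint8_t)(word >> (shift - 8)));
    return crc;
}
```

Both word sizes must produce the same CRC for the same byte stream, so any timing difference comes only from how the per-word work is organized, which is what profiling would need to isolate.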