The MSVS profiler shows that the following code in stream_encoder.c takes several percent of CPU time:

    for(rice_parameter = 0, k = partition_samples; k < mean; rice_parameter++, k <<= 1)
        ;

This code is equivalent to:

    rice_parameter = 0; k = partition_samples;
    while(k < mean) {
        rice_parameter++; k <<= 1;
    }

The idea was to accelerate it:

    rice_parameter = 0; k = partition_samples;
    while(k*2 < mean) {
        rice_parameter+=2; k <<= 2;
    }
    while(k < mean) {
        rice_parameter++; k <<= 1;
    }

or:

    rice_parameter = 0; k = partition_samples;
    while(k*4 < mean) {
        rice_parameter+=3; k <<= 3;
    }
    while(k < mean) {
        rice_parameter++; k <<= 1;
    }

After tuning the code for 16-/24-bit WAV input and 32-/64-bit compiles, I wrote more complex code (see the attachment). It doesn't look pretty, but it's faster than the current version. For the highest compression preset, 24-bit input, and a 32-bit executable, encoding speed increases by 6-7%.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: rice_parameter.patch
Type: application/octet-stream
Size: 1616 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/flac-dev/attachments/20131009/4934c85d/attachment.obj
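The claim that the two accelerated variants return the same rice_parameter as the original loop can be checked by brute force over small inputs. The sketch below is only an illustration: the function names, the use of unsigned long long, and the tested ranges are assumptions made for this example and are not taken from stream_encoder.c or from the attached patch. It also assumes partition_samples is nonzero, since with k == 0 the loop would never terminate.

```c
/* Baseline: the loop as quoted from stream_encoder.c.
 * Assumes k > 0; the types here are chosen for illustration only. */
static unsigned rice_param_baseline(unsigned long long k, unsigned long long mean)
{
    unsigned rice_parameter = 0;
    while (k < mean) { rice_parameter++; k <<= 1; }
    return rice_parameter;
}

/* Variant 1: advance two steps at a time while that is safe,
 * then finish with the single-step loop. */
static unsigned rice_param_step2(unsigned long long k, unsigned long long mean)
{
    unsigned rice_parameter = 0;
    while (k * 2 < mean) { rice_parameter += 2; k <<= 2; }
    while (k < mean) { rice_parameter++; k <<= 1; }
    return rice_parameter;
}

/* Variant 2: advance three steps at a time while that is safe. */
static unsigned rice_param_step3(unsigned long long k, unsigned long long mean)
{
    unsigned rice_parameter = 0;
    while (k * 4 < mean) { rice_parameter += 3; k <<= 3; }
    while (k < mean) { rice_parameter++; k <<= 1; }
    return rice_parameter;
}

/* Compare all three over a small grid (ranges are arbitrary) and
 * return how many (k, mean) pairs disagree with the baseline. */
static unsigned count_mismatches(void)
{
    unsigned mismatches = 0;
    unsigned long long k, mean;
    for (k = 1; k <= 64; k++) {
        for (mean = 0; mean <= 5000; mean++) {
            unsigned expected = rice_param_baseline(k, mean);
            if (rice_param_step2(k, mean) != expected ||
                rice_param_step3(k, mean) != expected)
                mismatches++;
        }
    }
    return mismatches;
}
```

The variants agree because k*2 < mean implies the baseline would take at least two more steps, and k*4 < mean implies at least three. This check says nothing about speed, and it does not cover values large enough for k*2 or k*4 to overflow.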
lvqcl wrote:

> MSVS profiler shows that the following code in stream_encoder.c takes
> several percent of CPU time:

This has been applied.

Erik
-- 
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
Hmm, maybe I'm missing something, but what about this:

    rice_parameter = 0; k = partition_samples;
    int n = mean - k;
    if (n > 0) {
        rice_parameter += n;
        k <<= n;
    }

I've not looked at this code in its context within stream_encoder.c, so it's easily possible that I left out something.

Brian Willoughby
Sound Consulting

On Oct 9, 2013, at 08:54, lvqcl wrote:
> MSVS profiler shows that the following code in stream_encoder.c takes
> several percent of CPU time:
>
> for(rice_parameter = 0, k = partition_samples; k < mean;
> rice_parameter++, k <<= 1)
> ;
Or, I was originally thinking:

    rice_parameter = 0; k = partition_samples;
    if (k < mean) {
        int n = mean - k;
        rice_parameter += n;
        k <<= n;
    }

(sorry for the hasty post)

On Oct 11, 2013, at 10:34, Brian Willoughby wrote:
> Hmm, maybe I'm missing something, but what about this:
>
> rice_parameter = 0; k = partition_samples;
> int n = mean - k;
> if (n > 0) {
> rice_parameter += n;
> k <<= n;
> }
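One way to test whether the subtraction-based shortcut matches the original loop is to run both on a small input. The sketch below is only an illustration with hypothetical function names and types. It returns rice_parameter alone and leaves out the `k <<= n` step, because n can exceed the bit width of k, and shifting by that much is undefined in C.

```c
/* The original loop: counts how many doublings of k are needed
 * to reach or exceed mean. Assumes k > 0. */
static unsigned long long rice_param_loop(unsigned long long k, unsigned long long mean)
{
    unsigned long long rice_parameter = 0;
    while (k < mean) { rice_parameter++; k <<= 1; }
    return rice_parameter;
}

/* The subtraction-based idea from the thread: uses the difference
 * between mean and k as the increment. */
static unsigned long long rice_param_subtract(unsigned long long k, unsigned long long mean)
{
    return (k < mean) ? mean - k : 0;
}

/* Example inputs (chosen arbitrarily):
 *   k = 1,  mean = 3   -> loop gives 2, subtraction gives 2
 *   k = 16, mean = 100 -> loop gives 3, subtraction gives 84
 */
```

Under these assumptions the two agree only for a few small inputs. The loop grows with the logarithm of mean / k (16, 32, 64, 128 takes three doublings to pass 100), while the subtraction grows linearly with mean - k.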