MSVS profiler shows that the following code in stream_encoder.c takes
several percent of CPU time:
     for(rice_parameter = 0, k = partition_samples; k < mean; rice_parameter++, k <<= 1)
         ;
This code is equivalent to:
     rice_parameter = 0; k = partition_samples;
     while(k < mean) {
         rice_parameter++; k <<= 1;
     }
The idea was to accelerate it by handling two shifts per iteration:
     rice_parameter = 0; k = partition_samples;
     while(k*2 < mean) {
         rice_parameter+=2; k <<= 2;
     }
     while(k < mean) {
         rice_parameter++; k <<= 1;
     }
or three shifts per iteration:
     rice_parameter = 0; k = partition_samples;
     while(k*4 < mean) {
         rice_parameter+=3; k <<= 3;
     }
     while(k < mean) {
         rice_parameter++; k <<= 1;
     }
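
A small harness along the following lines can confirm that the unrolled variants return the same rice_parameter as the original loop. This is only a sketch: the function names and the 64-bit unsigned types are assumptions made for illustration, not the actual declarations in stream_encoder.c, and it assumes partition_samples is at least 1 and that the values are small enough for the shifts not to overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Original loop: smallest shift count such that (k << shift) >= mean. */
static unsigned rice_param_loop(uint64_t partition_samples, uint64_t mean)
{
    unsigned rice_parameter = 0;
    uint64_t k = partition_samples;
    assert(k > 0); /* k == 0 would never reach a non-zero mean */
    while (k < mean) {
        rice_parameter++; k <<= 1;
    }
    return rice_parameter;
}

/* Variant taking two shifts per iteration, then finishing one at a time. */
static unsigned rice_param_step2(uint64_t partition_samples, uint64_t mean)
{
    unsigned rice_parameter = 0;
    uint64_t k = partition_samples;
    assert(k > 0);
    while (k * 2 < mean) {
        rice_parameter += 2; k <<= 2;
    }
    while (k < mean) {
        rice_parameter++; k <<= 1;
    }
    return rice_parameter;
}

/* Variant taking three shifts per iteration, then finishing one at a time. */
static unsigned rice_param_step3(uint64_t partition_samples, uint64_t mean)
{
    unsigned rice_parameter = 0;
    uint64_t k = partition_samples;
    assert(k > 0);
    while (k * 4 < mean) {
        rice_parameter += 3; k <<= 3;
    }
    while (k < mean) {
        rice_parameter++; k <<= 1;
    }
    return rice_parameter;
}

/* Counts the (partition_samples, mean) pairs on which the variants disagree. */
static unsigned long count_mismatches(uint64_t max_samples, uint64_t max_mean)
{
    unsigned long mismatches = 0;
    uint64_t k, mean;
    for (k = 1; k <= max_samples; k++) {
        for (mean = 0; mean <= max_mean; mean++) {
            unsigned ref = rice_param_loop(k, mean);
            if (rice_param_step2(k, mean) != ref || rice_param_step3(k, mean) != ref)
                mismatches++;
        }
    }
    return mismatches;
}
```

Calling count_mismatches() over a modest range exercises all three variants against each other; it says nothing about speed, which still has to be measured with a profiler.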
After tuning the code for 16-/24-bit WAV input and 32-/64-bit builds,
I wrote more complex code (see the attached patch). It doesn't look pretty,
but it's faster than the current version. With the highest compression
preset, 24-bit input and a 32-bit executable, encoding speed increases by 6-7%.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rice_parameter.patch
Type: application/octet-stream
Size: 1616 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/flac-dev/attachments/20131009/4934c85d/attachment.obj
lvqcl wrote:
> MSVS profiler shows that the following code in stream_encoder.c takes
> several percent of CPU time:

This has been applied.

Erik
-- 
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/
Hmm, maybe I'm missing something, but what about this:
	rice_parameter = 0; k = partition_samples;
	int n = mean - k;
	if (n > 0) {
		rice_parameter += n;
		k <<= n;
	}
I've not looked at this code in its context within stream_encoder.c,  
so it's easily possible that I left out something.
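
As a quick sanity check of the idea above, the sketch below compares the shift count produced by the original loop with the value mean - k. It is not taken from the FLAC sources: the function names and the 64-bit unsigned types are hypothetical, and it assumes partition_samples is at least 1. It suggests the two agree only in special cases, since the loop counts doublings while the subtraction measures a linear distance. The third function is one possible closed form (an assumption, not something proposed in this thread) based on the bit length of ceil(mean / k) - 1.

```c
#include <assert.h>
#include <stdint.h>

/* Original loop: smallest shift count such that (k << shift) >= mean. */
static unsigned shifts_by_loop(uint64_t k, uint64_t mean)
{
    unsigned shifts = 0;
    assert(k > 0); /* k == 0 would never reach a non-zero mean */
    while (k < mean) {
        shifts++; k <<= 1;
    }
    return shifts;
}

/* The subtraction idea: use (mean - k) as the shift count when k < mean.
 * Only the count is returned; actually shifting by it could exceed the
 * width of the type. */
static uint64_t shifts_by_subtraction(uint64_t k, uint64_t mean)
{
    return (k < mean) ? mean - k : 0;
}

/* Number of bits needed to represent v (0 for v == 0). */
static unsigned bit_length(uint64_t v)
{
    unsigned bits = 0;
    while (v != 0) {
        bits++; v >>= 1;
    }
    return bits;
}

/* Possible closed form: (k << r) >= mean holds exactly when
 * 2^r >= ceil(mean / k), and the smallest such r is bit_length(q - 1). */
static unsigned shifts_closed_form(uint64_t k, uint64_t mean)
{
    uint64_t q;
    assert(k > 0);
    if (mean <= k)
        return 0;
    q = mean / k + (mean % k != 0); /* ceil(mean / k), at least 2 here */
    return bit_length(q - 1);
}
```

For example, with k = 4 and mean = 100 the loop needs 5 shifts (4 << 5 = 128), whereas mean - k is 96. Whether a division-based closed form would actually be faster than the unrolled loops is a separate question that only profiling can answer.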
Brian Willoughby
Sound Consulting
On Oct 9, 2013, at 08:54, lvqcl wrote:
> MSVS profiler shows that the following code in stream_encoder.c takes
> several percent of CPU time:
Or, I was originally thinking:
	rice_parameter = 0; k = partition_samples;
	if (k < mean) {
		int n = mean - k;
		rice_parameter += n;
		k <<= n;
	}
(sorry for the hasty post)
On Oct 11, 2013, at 10:34, Brian Willoughby wrote:
> Hmm, maybe I'm missing something, but what about this:
>
> 	rice_parameter = 0; k = partition_samples;
> 	int n = mean - k;
> 	if (n > 0) {
> 		rice_parameter += n;
> 		k <<= n;
> 	}
>
> I've not looked at this code in its context within stream_encoder.c,
> so it's easily possible that I left out something.