The MSVS profiler shows that the following code in stream_encoder.c takes
several percent of CPU time:
for(rice_parameter = 0, k = partition_samples; k < mean;
rice_parameter++, k <<= 1)
;
This code is equivalent to:
rice_parameter = 0; k = partition_samples;
while(k < mean) {
rice_parameter++; k <<= 1;
}
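The loop finds the smallest shift that brings partition_samples up to mean. The following restatement ignores overflow of k, treating the shift as exact multiplication by 2. It is a worked formula, not taken from the FLAC sources:

```latex
\text{rice\_parameter}
  = \min\{\, r \ge 0 : \text{partition\_samples}\cdot 2^{r} \ge \text{mean} \,\}
  = \begin{cases}
      0, & \text{mean} \le \text{partition\_samples},\\[2pt]
      \left\lceil \log_2 \dfrac{\text{mean}}{\text{partition\_samples}} \right\rceil, & \text{mean} > \text{partition\_samples} > 0.
    \end{cases}
```

For instance, with partition_samples = 4096 and mean = 30000, log2(30000/4096) is about 2.87, so rice_parameter = 3. Indeed 4096 << 3 = 32768 >= 30000, while 4096 << 2 = 16384 < 30000.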
The idea was to accelerate it by doing two doublings per iteration whenever possible:
rice_parameter = 0; k = partition_samples;
while(k*2 < mean) {
rice_parameter+=2; k <<= 2;
}
while(k < mean) {
rice_parameter++; k <<= 1;
}
or, with three doublings per iteration:
rice_parameter = 0; k = partition_samples;
while(k*4 < mean) {
rice_parameter+=3; k <<= 3;
}
while(k < mean) {
rice_parameter++; k <<= 1;
}
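The following sketch uses brute force to check that both unrolled variants produce the same rice_parameter as the original loop. It assumes that k and mean are 64-bit unsigned values, that partition_samples > 0, and that the shifts do not overflow. The function names are hypothetical.

```c
#include <stdint.h>

/* Reference: the original loop, one doubling per iteration. */
static unsigned rp_step1(uint64_t k, uint64_t mean)
{
    unsigned r = 0;
    while (k < mean) { r++; k <<= 1; }
    return r;
}

/* Two doublings at a time while k*2 < mean. If k*2 < mean, then also
 * k < mean, so the original loop would run at least two more
 * iterations; merging them cannot overshoot. */
static unsigned rp_step2(uint64_t k, uint64_t mean)
{
    unsigned r = 0;
    while (k * 2 < mean) { r += 2; k <<= 2; }
    while (k < mean)     { r++;    k <<= 1; }
    return r;
}

/* Three doublings at a time while k*4 < mean (implies k < mean and
 * k*2 < mean, i.e. at least three more iterations of the original). */
static unsigned rp_step3(uint64_t k, uint64_t mean)
{
    unsigned r = 0;
    while (k * 4 < mean) { r += 3; k <<= 3; }
    while (k < mean)     { r++;    k <<= 1; }
    return r;
}

/* Compares all k in 1..max_k and mean in 0..max_mean against the
 * reference loop; returns 1 if every result matches, 0 otherwise. */
int variants_agree(uint64_t max_k, uint64_t max_mean)
{
    for (uint64_t k = 1; k <= max_k; k++)
        for (uint64_t mean = 0; mean <= max_mean; mean++)
            if (rp_step2(k, mean) != rp_step1(k, mean) ||
                rp_step3(k, mean) != rp_step1(k, mean))
                return 0;
    return 1;
}
```

For example, variants_agree(64, 5000) exercises every k from 1 to 64 against every mean from 0 to 5000.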
After tuning the code for 16- and 24-bit WAV input and for 32- and 64-bit builds,
I wrote more complex code (see attachment). It doesn't look pretty, but
it's faster than the current version: with the highest compression preset,
24-bit input and a 32-bit executable, encoding speed increases by 6-7%.
Attachment: rice_parameter.patch (application/octet-stream, 1616 bytes)
URL: http://lists.xiph.org/pipermail/flac-dev/attachments/20131009/4934c85d/attachment.obj
lvqcl wrote:
> MSVS profiler shows that the following code in stream_encoder.c takes
> several percent of CPU time:

This has been applied.

Erik
-- 
Erik de Castro Lopo
http://www.mega-nerd.com/
Hmm, maybe I'm missing something, but what about this:
rice_parameter = 0; k = partition_samples;
int n = mean - k;
if (n > 0) {
rice_parameter += n;
k <<= n;
}
I've not looked at this code in its context within stream_encoder.c,
so I may well have left something out.
Brian Willoughby
Sound Consulting
On Oct 9, 2013, at 08:54, lvqcl wrote:
> MSVS profiler shows that the following code in stream_encoder.c takes
> several percent of CPU time:
> [...]
Or, I was originally thinking:
rice_parameter = 0; k = partition_samples;
if (k < mean) {
int n = mean - k;
rice_parameter += n;
k <<= n;
}
(sorry for the hasty post)
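A small C comparison suggests that n = mean - k cannot replace the loop directly. The loop counts doublings of k, while mean - k is a linear difference. With k = 1 and mean = 5, for example, the loop yields 3 (1, 2, 4, 8) but n = 4. For larger gaps, k <<= n would also shift by at least the width of the type, which is undefined behaviour in C. One closed form that does match the loop is the bit length of (mean - 1) / k. This is a sketch only: the function names, the uint64_t types and the assumption k > 0 are illustrative and do not come from stream_encoder.c.

```c
#include <stdint.h>

/* Reference: the original loop, wrapped in a hypothetical function. */
unsigned rp_loop(uint64_t k, uint64_t mean)
{
    unsigned r = 0;
    while (k < mean) { r++; k <<= 1; }
    return r;
}

/* The "n = mean - k" proposal: a linear difference, not a count of
 * doublings, so it generally disagrees with rp_loop. */
uint64_t rp_difference(uint64_t k, uint64_t mean)
{
    return k < mean ? mean - k : 0;
}

/* Number of significant bits in x (0 for x == 0); portable loop. */
static unsigned bit_length(uint64_t x)
{
    unsigned n = 0;
    while (x) { n++; x >>= 1; }
    return n;
}

/* Closed form, assuming k > 0: for mean > k the smallest r with
 * k * 2^r >= mean is bit_length((mean - 1) / k); otherwise r = 0.
 * The k < mean guard also avoids computing mean - 1 when mean == 0. */
unsigned rp_closed_form(uint64_t k, uint64_t mean)
{
    return k < mean ? bit_length((mean - 1) / k) : 0;
}
```

The closed form replaces the loop with a 64-bit division plus a bit-length computation. Here the bit length is computed with a portable loop, but a compiler intrinsic could be used instead. Profiling would be needed to show whether this is faster than the loop or the attached patch.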
On Oct 11, 2013, at 10:34, Brian Willoughby wrote:
> Hmm, maybe I'm missing something, but what about this:
> [...]