thr3ads.net - flac dev - [flac-dev] Performance checks [Jun 2013]


Janne Hyvärinen

2013-Jun-01 11:24 UTC

[flac-dev] Performance checks

On 31.5.2013 13:04, Miroslav Lichvar wrote:
> On Wed, May 29, 2013 at 04:08:57PM +0200, Martijn van Beurden wrote:
>> I was surprised to see that the Windows compile on wine actually
>> outperformed the native Linux one. Probably GCC 4.6 optimized a little
>> better or something very weird is going on in wine, I don't know. The
>> assembly optimizations work very well on encoding, but actually slow
>> things down when decoding. The difference is not very large however.
> In a quick test with a pre 4.8 gcc on a Core 2 CPU I see a small
> improvement in decoding speed with assembly optimizations turned on,
> but I think the difference used to be larger. Perhaps the compilers
> got better or MMX is slower relative to normal code on current CPUs.
>
> Disabling the FLAC__bitreader_read_rice_signed_block_asm_ia32_bswap
> function seems to help a bit. (there is an #if disabling the function
> with comment "OPT: not clearly faster, needs more testing" in the
> src/libFLAC/stream_decoder.c file)
>
> Here is the relative decoding speed with -5 and -8:
> 			-5		-8
> no asm			99.0%		97.0%
> asm			100.0%		100.0%
> asm (no ia32_bswap)	102.7%		102.7%
>
> I think we should drop that assembly function as the C
> version seems to be faster now.
>
> Can anyone confirm this?
>
> Thanks,
>
I can confirm. I see a 10% speed improvement with that change on a Core i7.
Decoding a 1h18min38.133s test file encoded with FLAC -8 takes 7.656s
with the normal asm optimizations (616.266x realtime) and 6.937s with
that tiny change (680.140x realtime).
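The realtime figures above are the audio duration divided by the wall-clock decode time. A small C sketch that reproduces them and the resulting gain; the helper names `hms_to_seconds`, `realtime_factor` and `speedup_percent` are hypothetical, written only to check the arithmetic.

```c
#include <assert.h>

/* Hypothetical helper: convert an h/min/s duration to seconds of audio. */
static double hms_to_seconds(int hours, int minutes, double seconds)
{
    return hours * 3600.0 + minutes * 60.0 + seconds;
}

/* Speed as a multiple of realtime: audio duration / wall-clock decode time. */
static double realtime_factor(double audio_seconds, double decode_seconds)
{
    assert(decode_seconds > 0.0);
    return audio_seconds / decode_seconds;
}

/* Percent speed gain of the faster run over the slower one. */
static double speedup_percent(double slow_seconds, double fast_seconds)
{
    assert(fast_seconds > 0.0);
    return (slow_seconds / fast_seconds - 1.0) * 100.0;
}
```

With the figures from the message, `hms_to_seconds(1, 18, 38.133)` is about 4718.133 s, which gives roughly 616.27x realtime at 7.656s and 680.14x at 6.937s, a gain of about 10.4%, consistent with the "10%" reported.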

Janne Hyvärinen

2013-Jun-01 11:33 UTC


[flac-dev] Performance checks

On 1.6.2013 14:24, Janne Hyvärinen wrote:
> On 31.5.2013 13:04, Miroslav Lichvar wrote:
>> On Wed, May 29, 2013 at 04:08:57PM +0200, Martijn van Beurden wrote:
>>> I was surprised to see that the Windows compile on wine actually
>>> outperformed the native Linux one. Probably GCC 4.6 optimized a little
>>> better or something very weird is going on in wine, I don't know. The
>>> assembly optimizations work very well on encoding, but actually slow
>>> things down when decoding. The difference is not very large however.
>> In a quick test with a pre 4.8 gcc on a Core 2 CPU I see a small
>> improvement in decoding speed with assembly optimizations turned on,
>> but I think the difference used to be larger. Perhaps the compilers
>> got better or MMX is slower relative to normal code on current CPUs.
>>
>> Disabling the FLAC__bitreader_read_rice_signed_block_asm_ia32_bswap
>> function seems to help a bit. (there is an #if disabling the function
>> with comment "OPT: not clearly faster, needs more testing" in the
>> src/libFLAC/stream_decoder.c file)
>>
>> Here is the relative decoding speed with -5 and -8:
>> 			-5		-8
>> no asm			99.0%		97.0%
>> asm			100.0%		100.0%
>> asm (no ia32_bswap)	102.7%		102.7%
>>
>> I think we should drop that assembly function as the C
>> version seems to be faster now.
>>
>> Can anyone confirm this?
>>
>> Thanks,
>>
> I can confirm. I see 10% speed improvement with that change on Core i7.
> Decoding a 1h18min38.133s long test FLAC -8 encoded file takes with
> normal asm optimizations 7.656s (speed: 616,266x realtime) and with that
> tiny change 6.937s (speed: 680,140x realtime).
>
>
I noticed a side effect of this change: encoding got a bit slower, at
least when MD5 checksumming is enabled.
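The routine being dropped, FLAC__bitreader_read_rice_signed_block_asm_ia32_bswap, is an assembly version of Rice decoding of residuals. For readers unfamiliar with what such a routine computes, here is a minimal, deliberately slow bit-at-a-time C sketch of Rice decoding as the FLAC format defines it: the quotient in unary (zero bits ended by a one bit), then k remainder bits, then a zigzag fold back to a signed value. The `BitReader` type and every function name here are hypothetical, not libFLAC's API; libFLAC's own C code is organized quite differently for speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader over a byte buffer (not FLAC__BitReader). */
typedef struct {
    const uint8_t *buf;
    size_t len_bits;
    size_t pos; /* next bit to read, counted from the MSB of buf[0] */
} BitReader;

static int read_bit(BitReader *br)
{
    assert(br->pos < br->len_bits);
    int bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Read n raw bits, most significant first. */
static uint32_t read_bits(BitReader *br, unsigned n)
{
    uint32_t v = 0;
    while (n--)
        v = (v << 1) | (uint32_t)read_bit(br);
    return v;
}

/* One Rice-coded signed value with parameter k. */
static int32_t read_rice_signed(BitReader *br, unsigned k)
{
    uint32_t q = 0;
    while (read_bit(br) == 0) /* unary quotient: count zeros up to the 1 */
        q++;
    uint32_t u = (q << k) | read_bits(br, k);
    /* Undo the zigzag fold: 0, 1, 2, 3, ... -> 0, -1, 1, -2, ... */
    return (int32_t)(u >> 1) ^ -(int32_t)(u & 1);
}

/* Block variant: decode n values into out[]. */
static void read_rice_signed_block(BitReader *br, int32_t out[], size_t n,
                                   unsigned k)
{
    for (size_t i = 0; i < n; i++)
        out[i] = read_rice_signed(br, k);
}
```

For example, the two bytes 0xB8 0xCA decoded with k = 2 yield -1, 1, 5, -3, consuming 15 bits.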

Miroslav Lichvar

2013-Jun-03 11:24 UTC


[flac-dev] Performance checks

On Sat, Jun 01, 2013 at 02:33:55PM +0300, Janne Hyvärinen wrote:
> On 1.6.2013 14:24, Janne Hyvärinen wrote:
> > I can confirm. I see 10% speed improvement with that change on Core i7.
> > Decoding a 1h18min38.133s long test FLAC -8 encoded file takes with
> > normal asm optimizations 7.656s (speed: 616,266x realtime) and with that
> > tiny change 6.937s (speed: 680,140x realtime).
Thanks for the testing.
> I noticed a side effect for this change. Encoding got a bit slower at 
> least when md5 checksumming is enabled.
That's odd. How much slower was the encoding? Could it be caused by an
increase in the size of the function (only with -funroll-loops?), so that
it no longer fits in the cache during encoding?

It might be good to use -funroll-loops only for some files; IIRC it
helped stream_encoder.c the most.

-- 
Miroslav Lichvar
