thr3ads.net - flac dev - [flac-dev] Autocorrelation precision insufficient [Jun 2021]

Martijn van Beurden

2021-Jun-24 07:17 UTC

[flac-dev] Autocorrelation precision insufficient

Hi all,

Recently I've been investigating various ways to improve FLAC
compression, and now I've stumbled upon quite a small change with
large implications.

Flake, an alternative compressor using the FLAC format, has always
provided better compression than FLAC. I've found out why: Flake uses
doubles (64-bit floating point) for calculating autocorrelation
values, while FLAC uses regular floats (32-bit floating point). The
largest problem with implementing this is that the intrinsics routines
(for SSE and VSX) have to be rewritten. I've done quite a bit of
testing and comparing; see the next two PDFs.

http://www.audiograaf.nl/misc_stuff/double-autoc-with-sse2-intrinsics.pdf
http://www.audiograaf.nl/misc_stuff/double-autoc-with-sse2-intrinsics-per-track.pdf
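
To illustrate the float versus double difference described above, here
is a minimal sketch in C. It is not the libFLAC or Flake code: the
function names autoc_float and autoc_double are made up for this
example, and windowing is left out. Both functions compute the same lag
sums; the only change is the type of the running sum and of autoc[].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: autocorrelation accumulated in 32-bit floats.
 * autoc[lag] = sum over i of data[i] * data[i - lag], for lag < max_lag. */
static void autoc_float(const float *data, size_t n, size_t max_lag,
                        float *autoc)
{
    assert(max_lag > 0 && max_lag <= n);
    for (size_t lag = 0; lag < max_lag; lag++) {
        float sum = 0.0f;
        for (size_t i = lag; i < n; i++)
            sum += data[i] * data[i - lag];
        autoc[lag] = sum;
    }
}

/* Hypothetical helper: same sums, but products and accumulation are
 * done in 64-bit doubles, so far fewer low-order bits are rounded away. */
static void autoc_double(const float *data, size_t n, size_t max_lag,
                         double *autoc)
{
    assert(max_lag > 0 && max_lag <= n);
    for (size_t lag = 0; lag < max_lag; lag++) {
        double sum = 0.0;
        for (size_t i = lag; i < n; i++)
            sum += (double)data[i] * (double)data[i - lag];
        autoc[lag] = sum;
    }
}
```

As a small check of the idea: with four samples of 4097.0f, the exact
value of autoc[0] is 4 * 4097^2 = 67141636, which needs 25 significant
bits. The double version returns it exactly, while a 32-bit float (24
significant bits) cannot represent it at all.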

There are four lines, all going from setting -4 as the rightmost
(fastest) through -5, -6, -7 to -8 as the leftmost (slowest).
- darkblue line is current git
- green line is current git but with SSE intrinsics for
autocorrelation calculation disabled
- lightblue line is calculating autocorrelation in doubles instead of floats
- red line is calculating autocorrelation in doubles but with new SSE2
intrinsics routines

As you can see in the PDFs, the overall gain for setting -4 is large
(0.3 percentage points, or 0.5%) with minimal slowdown. This gain grows smaller
while the slowdown increases with increasing setting. The -per-track
PDF shows that the gain is highly dependent on the kind of audio that
is being compressed. Tracks with strong tonal components, like piano
music (14 and 15) benefit the most. Orchestral music (2, 6, 10 and 9)
and electronic music (4 and 13) benefit in varying degrees. Music with
much noisier content, like metal (3, 5 and 12), has (almost) no
benefit. However, in the tracks that benefit, gains can be large.
Track 15, which is piano music, sees a gain of 2.2 percentage points, or 5%, for
setting -4 and 1 percentage point, or 2%, for -8.

Code is here: https://github.com/ktmf01/flac/tree/autoc-sse2

Before I send a pull request, I'd like to discuss a choice that has to
be made. I see a few options:
- Don't switch to autoc[] as doubles, keep current speed and ignore
possible compression gain
- Switch to autoc[] as doubles, but keep current intrinsics routines.
This means some platforms (with only SSE but not SSE2 or with VSX)
will get less compression, but won't see a large slowdown.
- Switch to autoc[] as doubles, but remove current SSE and disable VSX
intrinsics for someone to update them later (I don't have any POWER8
or POWER9 hardware to test). This means all platforms will get the
same compression, but some (with only SSE but not SSE2 or with VSX)
will see a large slowdown.
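
For readers wondering what a double-precision intrinsics routine
involves, here is a rough sketch in C. It is not the code from the
autoc-sse2 branch: the function name autoc_double_sse2_sketch is made
up, it handles only two lags per pass, and it falls back to plain C
when the compiler does not define __SSE2__. It uses only standard SSE2
intrinsics from <emmintrin.h>.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: plain C sum for a single lag, in doubles. */
static double autoc_one_lag(const float *data, size_t n, size_t lag)
{
    double sum = 0.0;
    for (size_t i = lag; i < n; i++)
        sum += (double)data[i] * (double)data[i - lag];
    return sum;
}

/* Hypothetical sketch: autoc[lag] for lag < max_lag, accumulated in
 * doubles. With SSE2, one 128-bit register holds two doubles, so two
 * neighbouring lags are accumulated together. */
static void autoc_double_sse2_sketch(const float *data, size_t n,
                                     size_t max_lag, double *autoc)
{
    assert(max_lag > 0 && max_lag <= n);
    size_t lag = 0;
#if defined(__SSE2__)
    for (; lag + 1 < max_lag; lag += 2) {
        __m128d sum = _mm_setzero_pd(); /* low: lag, high: lag + 1 */
        for (size_t i = lag + 1; i < n; i++) {
            __m128d cur = _mm_set1_pd((double)data[i]);
            /* _mm_set_pd takes the high element first. */
            __m128d prev = _mm_set_pd((double)data[i - lag - 1],
                                      (double)data[i - lag]);
            sum = _mm_add_pd(sum, _mm_mul_pd(cur, prev));
        }
        double out[2];
        _mm_storeu_pd(out, sum);
        /* The i == lag term exists only for the lower of the two lags. */
        autoc[lag] = out[0] + (double)data[lag] * (double)data[0];
        autoc[lag + 1] = out[1];
    }
#endif
    /* Remaining lag (odd max_lag), or all lags when SSE2 is unavailable. */
    for (; lag < max_lag; lag++)
        autoc[lag] = autoc_one_lag(data, n, lag);
}
```

A float register holds four values where a double register holds two,
which is one reason double routines can be expected to be somewhat
slower per lag than the existing float ones.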

Thanks in advance for your replies and comments on this.

Kind regards,

Martijn van Beurden

Martijn van Beurden

2021-Jun-25 07:48 UTC

[flac-dev] Autocorrelation precision insufficient

On Thu, 24 Jun 2021 at 09:17, Martijn van Beurden <mvanb1 at
gmail.com> wrote:
> - Switch to autoc[] as doubles, but remove current SSE and disable VSX
> intrinsics for someone to update them later (I don't have any POWER8
> or POWER9 hardware to test). This means all platforms will get the
> same compression, but some (with only SSE but not SSE2 or with VSX)
> will see a large slowdown.
I see now that besides routines with SSE intrinsics (which I rewrote
into SSE2) and with VSX intrinsics (which I don't have hardware for)
there is also an open pull request for routines with ARM intrinsics. I
am willing and able to rewrite those if this change is accepted and
merged. I have access to ARMv8 with 32-bit OS, ARMv8 with 64-bit OS,
ARMv6 and I might be able to get hold of ARMv7 hardware.
