A large test I ran on flac 1.0 recently finished, so I thought I'd
post the results. I took about 60 CDs, totalling around 30 gigs
uncompressed, and compressed them all using all 10 of flac's default
compression modes (-0 through -9). The CDs are of a wide variety of
music; I think the only major genres not represented are country and
rap (Freudian slip). Anyway, the raw numbers:

Opt  Compressed   Uncompressed  Ratio   Encode Time
---  -----------  ------------  ------  -----------
-0   18705533061  30308309960   0.6171      3:49:31
-1   18296233794  30308309960   0.6036      3:56:11
-2   18213733462  30308309960   0.6009      4:15:55
-3   17947006657  30308309960   0.5921      4:29:18
-4   17492915245  30308309960   0.5771      4:58:26
-5   17447297373  30308309960   0.5756      5:37:22
-6   17435250044  30308309960   0.5752      5:48:08
-7   17414666580  30308309960   0.5745     10:58:38
-8   17385832738  30308309960   0.5736     13:43:05
-9   17350388508  30308309960   0.5724    282:30:39

Yes, that last one is almost 12 days. You probably noticed that the
compression gain over -8 is only about 0.1%, which is why I say -9
is more theoretical than practical. The test ran on a PIII-600 and I
used -V for all the tests, so the runtime without -V would be a tiny
bit less for -7 -8 -9 and noticeably faster for the lower modes. And
I'm glad to say that there were no verify errors and the decoded
WAVs compared exactly to the originals every time.

The ratios ranged from 0.20 for some jazz tracks (quiet Ella
Fitzgerald stuff) to 0.78. The hardest stuff to encode was
consistently by the band Dream Theater (the ultimate in progressive
rock), even harder than death metal like Cannibal Corpse. Classical,
jazz, and chant were almost always below 0.5. Rock, techno, and
world music usually fell in the range 0.5-0.7. A ratio of 0.2, like
some of the jazz and classical tracks, means a bitrate of under
300 kbps (CD audio is 1411.2 kbps, and 0.2 * 1411.2 = 282 kbps),
which is not bad for lossless. The other interesting thing is that
the sweet spot seems to be nearer -4 than -5.

Josh

P.S. My next project is to rip and encode all my CDs and store the
CD metadata in a database. I've got a nice schema worked out and a
better hash than CDindex for creating a primary key from the CD TOC.
If there's interest I can publish the code for the little TOC reader
+ key generator (UNIX only, gotta love ioctl).
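[For the curious, a minimal sketch of what such a TOC reader can
look like on Linux, using the cdrom ioctls from <linux/cdrom.h>.
This illustrates the approach, not the actual code; the device path
is an assumption.]

    /* Sketch of a CD TOC reader via Linux cdrom ioctls.
     * Illustrative only; device path is an assumption. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/cdrom.h>

    int main(void)
    {
        int fd = open("/dev/cdrom", O_RDONLY | O_NONBLOCK);
        if (fd < 0) { perror("open"); return 1; }

        struct cdrom_tochdr hdr;
        if (ioctl(fd, CDROMREADTOCHDR, &hdr) < 0) {
            perror("CDROMREADTOCHDR"); return 1;
        }

        /* Read each track entry, plus the leadout, as LBAs. */
        for (int t = hdr.cdth_trk0; t <= hdr.cdth_trk1 + 1; t++) {
            struct cdrom_tocentry ent;
            ent.cdte_track = (t > hdr.cdth_trk1) ? CDROM_LEADOUT : t;
            ent.cdte_format = CDROM_LBA;
            if (ioctl(fd, CDROMREADTOCENTRY, &ent) < 0) {
                perror("CDROMREADTOCENTRY"); return 1;
            }
            printf("track %3d: lba %6d ctrl 0x%x (%s)\n",
                   ent.cdte_track, ent.cdte_addr.lba, ent.cdte_ctrl,
                   (ent.cdte_ctrl & CDROM_DATA_TRACK) ? "data" : "audio");
        }
        close(fd);
        return 0;
    }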
Interesting figures. Some corresponding figures for flac 1.0, for a
set of 404 CDs with a total of 4357 tracks, are:

Opt  Uncompressed  Compressed    Ratio   Encode Time
---  ------------  ------------  ------  -----------
-8   234507744748  127468328349  0.5436  n/a

Admittedly, this set includes at least one country album, and has a
fair share of jazz in it. The worst-case track in this set has a
ratio of 0.9442, due to some heavy use of distortion.

While compiling these statistics, I noticed a problem on two of the
tracks: the MD5 and min/max framesizes were all zeroes, and the seek
table was corrupt. Recompressing made the symptom disappear. I have
seen similar behaviour before when recompressing some tracks with
flac 0.10, but those symptoms disappeared after recompression with
1.0.

Your TOC-hash algorithm would be interesting to examine, as I'm also
intending to move the metadata into a database from a set of plain
files.

/Svante
--- Svante Eriksson <ser@as9-6-1.mt.g.bonet.se> wrote:
> JC> P.S. My next project is to rip and encode all my CDs and
> JC> store the CD metadata in a database. I've got a nice
> JC> schema worked out and a better hash than CDindex for
> JC> creating a primary key from the CD TOC. If there's
> JC> interest I can publish the code for the little TOC
> JC> reader + key generator (UNIX only, gotta love ioctl).
>
> Your TOC-hash algorithm would be interesting to examine, as
> I'm also intending to move the metadata into a database from
> a set of plain files.

All methods basically form a message from the CD table of contents,
then pass it through a hash function to get a digest. CDDB has a
high chance of collision because its hash function doesn't use most
of the data in the TOC and wastes several bits of the digest; the
effective digest length is usually around 26 bits.

So I thought it would be better to use the whole TOC and pass it
through SHA-1, which yields a 160-bit digest. Then I found cdindex
(http://www.cdindex.org/disc.html), which does just that. But the
way the message is formed makes it unnecessarily long (804 bytes),
plus they feed the digest through a pseudo-base64 encoding. So the
one I use forms a message from the binary contents of the TOC, and
also uses some data cdindex doesn't (like the track type), so the
message size is just 3 bytes per track. From my understanding this
should reduce the chance of collisions even further, but maybe by
that point it's a moot point.

One thing you should do is store the whole CD TOC in your database.
That way you should be able to generate any index you need,
including the CDDB and cdindex IDs.

Josh
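[To make the CDDB point concrete, here is roughly how the classic
CDDB disc ID is computed; a sketch reconstructed from the published
algorithm, so treat the details as approximate. The field widths,
not the exact constants, are what matter.]

    /* Sketch of the classic CDDB disc ID, to show where the bits
     * go. offsets[] holds track start times in CD frames (1/75 s),
     * with offsets[ntracks] being the leadout. */
    static int digit_sum(int n)
    {
        int s = 0;
        while (n > 0) { s += n % 10; n /= 10; }
        return s;
    }

    unsigned int cddb_discid(const int *offsets, int ntracks)
    {
        int checksum = 0;
        for (int i = 0; i < ntracks; i++)   /* start in seconds, incl. 2 s lead-in */
            checksum += digit_sum(offsets[i] / 75 + 2);

        int total_secs = offsets[ntracks] / 75 - offsets[0] / 75;

        /* 8 bits of a low-entropy digit-sum checksum (mod 255!),
         * 16 bits of disc length in seconds (but a CD is under
         * 2^13 seconds), 8 bits of track count (<= 99): far fewer
         * than 32 bits of real entropy, and most of the TOC data
         * never enters the hash at all. */
        return ((unsigned)(checksum % 0xFF) << 24)
             | ((unsigned)total_secs << 8)
             | (unsigned)ntracks;
    }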
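[And a sketch of the kind of compact key described above: pack a few
bytes per track from the binary TOC, including the control/type
bits, and run the whole message through SHA-1. The packing here
(4 control bits + 20-bit frame offset) is a guess, not necessarily
Josh's layout; it assumes OpenSSL's SHA1() is available.]

    /* Sketch of a compact SHA-1 TOC key: 3 bytes per track, built
     * from the binary TOC. The bit packing is hypothetical. */
    #include <stddef.h>
    #include <openssl/sha.h>   /* link with -lcrypto */

    void toc_key(const int *lba, const unsigned char *ctrl,
                 int ntracks, unsigned char digest[SHA_DIGEST_LENGTH])
    {
        unsigned char msg[100 * 3];   /* <= 99 tracks + leadout */
        int len = 0;

        for (int i = 0; i <= ntracks; i++) {   /* include the leadout */
            /* hypothetical packing: 4 control bits + 20-bit frame
             * offset (a CD leadout is < 360000 frames, so it fits) */
            unsigned v = ((unsigned)(ctrl[i] & 0xF) << 20)
                       | ((unsigned)lba[i] & 0xFFFFF);
            msg[len++] = (v >> 16) & 0xFF;
            msg[len++] = (v >>  8) & 0xFF;
            msg[len++] =  v        & 0xFF;
        }
        SHA1(msg, (size_t)len, digest);   /* 160-bit digest */
    }

[Printed as 40 hex digits, the digest makes a fixed-width primary
key, and since the whole binary TOC goes into the message, any other
index, CDDB or cdindex, can be regenerated from the stored TOC.]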