A large test I ran on flac 1.0 recently finished, so I thought I'd
post the results. I took about 60 CDs, totalling around 30 gigs
uncompressed, and compressed them all using all 10 of flac's default
compression modes (-0 through -9). The CDs are of a wide variety of
music; I think the only major genres not represented are country and
rap (Freudian slip). Anyway, the raw numbers:

Opt  Compressed   Uncompressed  Ratio   Encode Time
---  -----------  ------------  ------  -----------
-0   18705533061  30308309960   0.6171      3:49:31
-1   18296233794  30308309960   0.6036      3:56:11
-2   18213733462  30308309960   0.6009      4:15:55
-3   17947006657  30308309960   0.5921      4:29:18
-4   17492915245  30308309960   0.5771      4:58:26
-5   17447297373  30308309960   0.5756      5:37:22
-6   17435250044  30308309960   0.5752      5:48:08
-7   17414666580  30308309960   0.5745     10:58:38
-8   17385832738  30308309960   0.5736     13:43:05
-9   17350388508  30308309960   0.5724    282:30:39

Yes, that last one is almost 12 days. You probably noticed that the
compression gain over -8 is only about 0.1%, which is why I say -9
is more theoretical than practical. The test ran on a PIII-600 and I
used -V for all the tests, so the runtime without -V would be a tiny
bit less for -7 -8 -9 and noticeably faster for the lower modes. And
I'm glad to say that there were no verify errors and the decoded
WAVs compared exactly to the originals every time.

The ratios ranged from 0.20 for some jazz tracks (quiet Ella
Fitzgerald stuff) to 0.78. The hardest stuff to encode was
consistently by the band Dream Theater (the ultimate in progressive
rock), even harder than death metal like Cannibal Corpse. Classical,
jazz, and chant were almost always below 0.5. Rock, techno, and
world music usually fell in the range 0.5-0.7. A ratio of 0.2, like
some of the jazz and classical tracks, means a bitrate of under
300 kbps (CD audio is 1411.2 kbps, and 0.2 * 1411.2 = 282 kbps),
which is not bad for lossless. The other interesting thing is that
the sweet spot seems to be nearer -4 than -5.

Josh

P.S. My next project is to rip and encode all my CDs and store the
CD metadata in a database. I've got a nice schema worked out and a
better hash than CDindex for creating a primary key from the CD TOC.
If there's interest I can publish the code for the little TOC reader
+ key generator (UNIX only, gotta love ioctl).
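[For the curious, a minimal sketch of what such a TOC reader can
look like on Linux, using the cdrom ioctls from <linux/cdrom.h>.
This illustrates the approach, not the actual code; the device path
is an assumption.]

    /* Sketch of a CD TOC reader via Linux cdrom ioctls.
     * Illustrative only; device path is an assumption. */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/cdrom.h>

    int main(void)
    {
        int fd = open("/dev/cdrom", O_RDONLY | O_NONBLOCK);
        if (fd < 0) { perror("open"); return 1; }

        struct cdrom_tochdr hdr;
        if (ioctl(fd, CDROMREADTOCHDR, &hdr) < 0) {
            perror("CDROMREADTOCHDR"); return 1;
        }

        /* Read each track entry, plus the leadout, as LBAs. */
        for (int t = hdr.cdth_trk0; t <= hdr.cdth_trk1 + 1; t++) {
            struct cdrom_tocentry ent;
            ent.cdte_track = (t > hdr.cdth_trk1) ? CDROM_LEADOUT : t;
            ent.cdte_format = CDROM_LBA;
            if (ioctl(fd, CDROMREADTOCENTRY, &ent) < 0) {
                perror("CDROMREADTOCENTRY"); return 1;
            }
            printf("track %3d: lba %6d ctrl 0x%x (%s)\n",
                   ent.cdte_track, ent.cdte_addr.lba, ent.cdte_ctrl,
                   (ent.cdte_ctrl & CDROM_DATA_TRACK) ? "data" : "audio");
        }
        close(fd);
        return 0;
    }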
Interesting figures. Some corresponding figures for flac 1.0, for a
set of 404 CDs with a total of 4357 tracks, are:

Opt  Uncompressed  Compressed    Ratio   Encode Time
---  ------------  ------------  ------  -----------
-8   234507744748  127468328349  0.5436  n/a

Admittedly, this set includes at least one country album, and has a
fair share of jazz in it. The worst-case track in this set has a
ratio of 0.9442, due to some heavy use of distortion.

While compiling these statistics, I noticed a problem on two of the
tracks: the MD5 and min/max framesizes were all zeroes, and the seek
table was corrupt. Recompressing made the symptom disappear. I have
seen similar behaviour before when recompressing some tracks with
flac 0.10, but those symptoms disappeared after recompression with
1.0.

Your TOC-hash algorithm would be interesting to examine, as I'm also
intending to move the metadata into a database from a set of plain
files.

/Svante
--- Svante Eriksson <ser@as9-6-1.mt.g.bonet.se> wrote:
> JC> P.S. My next project is to rip and encode all my CDs and
> JC> store the CD metadata in a database. I've got a nice
> JC> schema worked out and a better hash than CDindex for
> JC> creating a primary key from the CD TOC. If there's
> JC> interest I can publish the code for the little TOC
> JC> reader + key generator (UNIX only, gotta love ioctl).
>
> Your TOC-hash algorithm would be interesting to examine, as
> I'm also intending to move the metadata into a database from
> a set of plain files.

All methods basically form a message from the CD table of contents,
then pass it through a hash function to get a digest. CDDB has a
high chance of collision because its hash function doesn't use most
of the data in the TOC and wastes several bits of the digest; the
effective digest length is usually around 26 bits.

So I thought it would be better to use the whole TOC and pass it
through SHA-1, which yields a 160-bit digest. Then I found cdindex
(http://www.cdindex.org/disc.html), which does just that. But the
way the message is formed makes it unnecessarily long (804 bytes),
plus they feed the digest through a pseudo-base64 encoding. So the
one I use forms a message from the binary contents of the TOC, and
also uses some data cdindex doesn't (like the track type), so the
message size is just 3 bytes per track. From my understanding this
should reduce the chance of collisions even further, but maybe by
that point it's a moot point.

One thing you should do is store the whole CD TOC in your database.
That way you should be able to generate any index you need,
including the CDDB and cdindex IDs.

Josh
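[To make the CDDB point concrete, here is roughly how the classic
CDDB disc ID is computed; a sketch reconstructed from the published
algorithm, so treat the details as approximate. The field widths,
not the exact constants, are what matter.]

    /* Sketch of the classic CDDB disc ID, to show where the bits
     * go. offsets[] holds track start times in CD frames (1/75 s),
     * with offsets[ntracks] being the leadout. */
    static int digit_sum(int n)
    {
        int s = 0;
        while (n > 0) { s += n % 10; n /= 10; }
        return s;
    }

    unsigned int cddb_discid(const int *offsets, int ntracks)
    {
        int checksum = 0;
        for (int i = 0; i < ntracks; i++)   /* start in seconds, incl. 2 s lead-in */
            checksum += digit_sum(offsets[i] / 75 + 2);

        int total_secs = offsets[ntracks] / 75 - offsets[0] / 75;

        /* 8 bits of a low-entropy digit-sum checksum (mod 255!),
         * 16 bits of disc length in seconds (but a CD is under
         * 2^13 seconds), 8 bits of track count (<= 99): far fewer
         * than 32 bits of real entropy, and most of the TOC data
         * never enters the hash at all. */
        return ((unsigned)(checksum % 0xFF) << 24)
             | ((unsigned)total_secs << 8)
             | (unsigned)ntracks;
    }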
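[And a sketch of the kind of compact key described above: pack a few
bytes per track from the binary TOC, including the control/type
bits, and run the whole message through SHA-1. The packing here
(4 control bits + 20-bit frame offset) is a guess, not necessarily
Josh's layout; it assumes OpenSSL's SHA1() is available.]

    /* Sketch of a compact SHA-1 TOC key: 3 bytes per track, built
     * from the binary TOC. The bit packing is hypothetical. */
    #include <stddef.h>
    #include <openssl/sha.h>   /* link with -lcrypto */

    void toc_key(const int *lba, const unsigned char *ctrl,
                 int ntracks, unsigned char digest[SHA_DIGEST_LENGTH])
    {
        unsigned char msg[100 * 3];   /* <= 99 tracks + leadout */
        int len = 0;

        for (int i = 0; i <= ntracks; i++) {   /* include the leadout */
            /* hypothetical packing: 4 control bits + 20-bit frame
             * offset (a CD leadout is < 360000 frames, so it fits) */
            unsigned v = ((unsigned)(ctrl[i] & 0xF) << 20)
                       | ((unsigned)lba[i] & 0xFFFFF);
            msg[len++] = (v >> 16) & 0xFF;
            msg[len++] = (v >>  8) & 0xFF;
            msg[len++] =  v        & 0xFF;
        }
        SHA1(msg, (size_t)len, digest);   /* 160-bit digest */
    }

[Printed as 40 hex digits, the digest makes a fixed-width primary
key, and since the whole binary TOC goes into the message, any other
index, CDDB or cdindex, can be regenerated from the stored TOC.]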