I'm copying the flac-dev list to see if anyone has any feedback also...

--- Juhana Sadeharju <kouhia@nic.funet.fi> wrote:
> Hello again. I had time to check the paper out. I have filled in the
> steps given in the paper with formulae, and then written a piece of
> C code. It is not complete code, but it could be a reasonable start.
> Maybe there is one typo in the paper -- I have pointed it out in
> my notes below -- please check. This is only the encoder, and I don't
> know what Q, L and S exactly are -- perhaps Sox's AMI ADPCM code
> could tell; I have not yet looked at it.
>
> [C pseudo-code snipped]

This is referring to the following paper:
ftp://ftp.funet.fi/pub/sci/audio/devel/newpapers/00871117.pdf

> What do you think about the algorithm given in the paper, and
> should we implement it in FLAC?
>
> The paper claims compression ratios around 1:5, but I'm not sure
> that would hold for pop music. A 1:4 ratio would certainly be
> a great thing to have, compared to FLAC's 1:2 ratio.

I don't think this method is relevant to FLAC, and here's why.

First, the results they show are for compression of data that has
already been lossily quantized to fewer bits per sample; e.g. u-Law
and A-Law are logarithmic quantizations of 16-bit data to 8 bits.

Second, the average ratio (assuming the table describes ratios, since
they omitted the units) for 44.1kHz audio is 3:1. They only vaguely
mention the sources for the material. I can choose material that
gives those ratios even for linear PCM.

Aside from that, their prediction idea is to use a very large filter
kernel (thousands of taps) and to adapt the kernel instead of
transmitting it for each frame. A long kernel theoretically means
more accurate prediction because of the long-time correlation,
especially since audio data is highly oversampled most of the time.

It is apparent their method is geared toward speech, and I think it
is not so good for general music compression, for a few reasons:

1. Computing such a large filter is computationally expensive.
   Standard autocorrelation->Levinson-Durbin will be too slow. So
   they use RLS, which has stability problems.

2. They can use Huffman coding on the residual because the alphabet
   is small (since the samples have already been quantized down to
   8 bits or less). If you are working with 16-bit or 24-bit data,
   generic Huffman is not practical because of the dictionary size.
   That's why most (all?) such codecs use Rice coding.

I have done some tests with long kernels and it does not buy very
much extra compression. Most of the slack can be taken up with
better entropy coding.

Josh
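For reference, here is a minimal sketch of the standard
autocorrelation->Levinson-Durbin route mentioned above, just to show
where the cost goes for large predictor orders. It is illustrative C
only, not FLAC's actual code, and the function names are made up for
the example.

/* A minimal sketch of computing LPC coefficients the standard way:
 * the autocorrelation is O(n*p) and the recursion itself is O(p^2),
 * both per frame, which is why a kernel of thousands of taps gets
 * expensive.  Illustrative only, not FLAC's actual implementation. */

#include <stddef.h>

/* autocorrelation of x[0..n-1] for lags 0..p */
void autocorr(const double *x, size_t n, size_t p, double *r)
{
    for (size_t lag = 0; lag <= p; lag++) {
        double sum = 0.0;
        for (size_t i = lag; i < n; i++)
            sum += x[i] * x[i - lag];
        r[lag] = sum;
    }
}

/* Levinson-Durbin recursion: solve for p predictor coefficients
 * a[0..p-1] from autocorrelation r[0..p], for the predictor
 * xhat[n] = sum_j a[j]*x[n-1-j].  Returns the final prediction
 * error energy. */
double levinson_durbin(const double *r, size_t p, double *a)
{
    double err = r[0];
    for (size_t i = 0; i < p; i++) {
        double acc = r[i + 1];
        for (size_t j = 0; j < i; j++)
            acc -= a[j] * r[i - j];
        double k = (err != 0.0) ? acc / err : 0.0; /* reflection coeff */
        a[i] = k;
        /* symmetric in-place update of a[0..i-1] */
        for (size_t j = 0; j < i / 2; j++) {
            double tmp = a[j];
            a[j]         = tmp - k * a[i - 1 - j];
            a[i - 1 - j] -= k * tmp;
        }
        if (i & 1)
            a[i / 2] -= k * a[i / 2];
        err *= 1.0 - k * k;
    }
    return err;
}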
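And here is a minimal, self-contained sketch of Rice coding of
residuals, to illustrate why no code dictionary is needed for wide
residual alphabets. The bit-buffer helpers, the unary convention and
the fixed parameter k=2 are made up for the example and are not taken
from FLAC.

/* Rice coding sketch: fold the signed residual, send the quotient in
 * unary plus a stop bit, then the low k bits in binary.  No Huffman
 * table is needed, so the scheme works unchanged for 16-bit or 24-bit
 * source data.  Illustrative only, not FLAC's actual bitstream. */

#include <stdint.h>
#include <stdio.h>

struct bits { uint8_t buf[64]; unsigned pos; };   /* tiny bit buffer */

static void put_bit(struct bits *b, int v)
{
    if (v) b->buf[b->pos >> 3] |= (uint8_t)(0x80 >> (b->pos & 7));
    b->pos++;
}

/* Zig-zag fold: map a signed residual to unsigned so that small
 * magnitudes of either sign get short codes. */
static uint32_t fold_signed(int32_t r)
{
    return (r < 0) ? (uint32_t)(-(int64_t)r) * 2 - 1 : (uint32_t)r * 2;
}

static void rice_encode(struct bits *b, int32_t residual, unsigned k)
{
    uint32_t u = fold_signed(residual);
    uint32_t q = u >> k;                  /* quotient -> unary part */
    while (q--) put_bit(b, 1);
    put_bit(b, 0);                        /* terminator             */
    for (int i = (int)k - 1; i >= 0; i--) /* remainder, k bits      */
        put_bit(b, (u >> i) & 1);
}

int main(void)
{
    struct bits b = {{0}, 0};
    int32_t residuals[] = { 0, -1, 3, -7, 12 };
    unsigned n = sizeof residuals / sizeof residuals[0];
    for (unsigned i = 0; i < n; i++)
        rice_encode(&b, residuals[i], 2);
    printf("encoded %u residuals into %u bits\n", n, b.pos);
    return 0;
}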
>From: Josh Coalson <j_coalson@yahoo.com>
>
>I'm copying the flac-dev list to see if anyone has any
>feedback also...

I'm supposed to have been on the list myself since yesterday, but I
have not received the first digest yet.

>First, the results they show are for compression of data
>that has already been lossily quantized to fewer bits per
>sample; e.g. u-Law and A-Law are logarithmic quantizations
>of 16-bit data to 8 bits.

I thought the author has two models, A-law and AMI ADPCM, both of
which he extends. AMI ADPCM starts with 16-bit samples, but I'm not
sure whether A-law is involved in that process.

>Second, the average ratio (assuming the table describes
>ratios, since they omitted the units) for 44.1kHz audio
>is 3:1.

I thought they were bits/sample. The text says "from the table one
can see that the software works better for audio": the average for
audio is 2.75, and the average for audio/speech is 3.17. So 2.75 is
better only if it is a number of bits (or something similar).

>They only vaguely mention the sources for the
>material. I can choose material that gives those ratios
>even for linear PCM.

Yeah, and it is supposed to be a scientific paper. Considering that
the author has an e-mail address, I would have expected him to check
against Shorten, which he even references.

>expensive. Standard autocorrelation->Levinson-Durbin
>will be too slow. So they use RLS, which has stability
>problems.

His RLS seems to be stable enough, up to 5000 samples, and fast.
Would RLS make FLAC run faster?

>I have done some tests with long kernels and it does
>not buy very much extra compression. Most of the slack
>can be taken up with better entropy coding.

OK. But if anyone understands the further details of the algorithm,
I would like to code it and test it out. I have written down as many
details as I could --- they are available via e-mail for those who
didn't get the entire first mail on this topic. Perhaps it would be
a waste of time, but the author of that paper should have mentioned
the audio sources in the first place. Bad science.

Regards,
Juhana
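For concreteness, here is a textbook exponentially-weighted RLS
predictor update in C. It is only a sketch of the general technique
and not necessarily the variant the paper uses; the order, forgetting
factor and initialization below are arbitrary example values. The
classic form costs O(p^2) arithmetic per sample because of the
inverse-correlation matrix, which is the figure to weigh when asking
whether it would speed anything up at long kernel lengths.

/* Textbook exponentially-weighted RLS, O(ORDER^2) per sample.
 * Sketch only; not the paper's algorithm and not FLAC code. */

#include <stddef.h>

#define ORDER 8          /* illustrative predictor order */

struct rls {
    double w[ORDER];             /* predictor coefficients        */
    double P[ORDER][ORDER];      /* inverse correlation estimate  */
    double lambda;               /* forgetting factor, e.g. 0.999 */
};

void rls_init(struct rls *s, double lambda, double delta)
{
    for (size_t i = 0; i < ORDER; i++) {
        s->w[i] = 0.0;
        for (size_t j = 0; j < ORDER; j++)
            s->P[i][j] = (i == j) ? 1.0 / delta : 0.0;
    }
    s->lambda = lambda;
}

/* u[] holds the previous ORDER samples (u[0] = most recent); x is
 * the current sample.  Returns the prediction error, i.e. the
 * residual that would go to the entropy coder. */
double rls_update(struct rls *s, const double *u, double x)
{
    double Pu[ORDER], k[ORDER];
    double denom = s->lambda;
    double e;

    /* Pu = P*u,  denom = lambda + u'*P*u */
    for (size_t i = 0; i < ORDER; i++) {
        Pu[i] = 0.0;
        for (size_t j = 0; j < ORDER; j++)
            Pu[i] += s->P[i][j] * u[j];
        denom += u[i] * Pu[i];
    }
    for (size_t i = 0; i < ORDER; i++)
        k[i] = Pu[i] / denom;            /* gain vector */

    /* a-priori error and coefficient update */
    e = x;
    for (size_t i = 0; i < ORDER; i++)
        e -= s->w[i] * u[i];
    for (size_t i = 0; i < ORDER; i++)
        s->w[i] += k[i] * e;

    /* P = (P - k*u'*P) / lambda, using u'*P = (P*u)' by symmetry */
    for (size_t i = 0; i < ORDER; i++)
        for (size_t j = 0; j < ORDER; j++)
            s->P[i][j] = (s->P[i][j] - k[i] * Pu[j]) / s->lambda;
    return e;
}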
>the average for audio is 2.75, and the average for audio/speech is 3.17.

There was another paper on audio compression:
ftp://ftp.funet.fi/pub/sci/audio/devel/newpapers/2283-2294.pdf
It compares against Shorten and seems to give more meaningful
results. But the audio samples there, too, are quite different from
music; they are instrument samples. It now looks like the first
paper raised my hopes too much.

-*-

Some time ago I suggested that FLAC could include a codebook-based
compressor: the codebook could be as large as 650 Mbytes (one CD)
and would be used in compressing many audio files.

For Vorbis I suggested the following simple method: every 16th
sample would be stored in the compressed file without modification.
The 16-sample residue blocks are then vector quantized against the
codebook. The residue blocks would start from the value 0 and end
at 0, and the code vectors would have 15 samples. The reference to
the codebook would be a 32-bit integer, which in total would mean
1:4 compression. I have no idea how well it would perform. The
algorithm is linear, but would some kind of higher-order system
work better?

Back to FLAC: would the final error signal squeeze down enough so
that at least a 1:3 total compression ratio is obtained? Well, at
least I believe that using 650 MBytes of external data would help
to compress the audio further, but I have no idea how much.

Regards,
Juhana
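A rough sketch of the codebook idea, under one possible reading of
it: every 16th sample is kept verbatim, the 15 samples in between are
predicted by linear interpolation between the kept samples (so the
residue block starts and ends near 0), the residue block is matched
against a large shared codebook, and the leftover error signal is
what would then be handed to the entropy coder. The interpolation
baseline and the brute-force nearest-neighbour search are assumptions
made for this sketch, not something specified above.

/* Codebook/VQ sketch under the assumptions stated above.
 * Illustrative only; names and structure are invented here. */

#include <stdint.h>
#include <stddef.h>

#define BLOCK 16                /* one kept sample + 15 coded samples */

/* Find the codebook vector (15 samples each) closest to the residue
 * block in the squared-error sense; return its 32-bit index. */
uint32_t vq_search(const int32_t residue[BLOCK - 1],
                   const int16_t *codebook, uint32_t ncodes)
{
    uint32_t best = 0;
    int64_t best_err = INT64_MAX;
    for (uint32_t c = 0; c < ncodes; c++) {
        const int16_t *cv = codebook + (size_t)c * (BLOCK - 1);
        int64_t err = 0;
        for (int i = 0; i < BLOCK - 1; i++) {
            int64_t d = residue[i] - cv[i];
            err += d * d;
        }
        if (err < best_err) { best_err = err; best = c; }
    }
    return best;
}

/* Encode one block: keep x[0] verbatim, emit a codebook index for the
 * interpolation residue of x[1..15], and output the remaining error
 * signal, which would still need to be compressed for losslessness. */
uint32_t encode_block(const int16_t x[BLOCK], int16_t next_anchor,
                      const int16_t *codebook, uint32_t ncodes,
                      int32_t error_out[BLOCK - 1])
{
    int32_t residue[BLOCK - 1];
    for (int i = 1; i < BLOCK; i++) {
        /* linear interpolation between the two kept samples */
        int32_t pred = x[0] + ((int32_t)(next_anchor - x[0]) * i) / BLOCK;
        residue[i - 1] = x[i] - pred;
    }
    uint32_t idx = vq_search(residue, codebook, ncodes);
    const int16_t *cv = codebook + (size_t)idx * (BLOCK - 1);
    for (int i = 0; i < BLOCK - 1; i++)
        error_out[i] = residue[i] - cv[i];   /* still to be compressed */
    return idx;
}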