I'm copying the flac-dev list to see if anyone has any feedback also...

--- Juhana Sadeharju <kouhia@nic.funet.fi> wrote:
> Hello again. I had time to check the paper out. I have filled in the
> steps given in the paper with formulae, and then written a piece of
> C code. It is not complete code, but it could be a reasonable start.
> Maybe there is one typo in the paper -- I have pointed it out in
> my notes below -- please check. This is only the encoder, and I don't
> know what Q, L and S exactly are -- perhaps Sox's AMI ADPCM code
> could tell; I have not yet looked at it.
>
> [C pseudo-code snipped]

This is referring to the following paper:
ftp://ftp.funet.fi/pub/sci/audio/devel/newpapers/00871117.pdf

> What do you think about the algorithm given in the paper, and
> should we implement it in FLAC?
>
> The paper claims compression ratios around 1:5, but I'm not sure
> that would hold for pop music. A 1:4 ratio would certainly be
> a great thing to have, compared to FLAC's 1:2 ratio.

I don't think this method is relevant to FLAC, and here's why.

First, the results they show are for compression of data that has
already been lossily quantized to fewer bits per sample; e.g. u-Law
and A-Law are logarithmic quantizations of 16-bit data to 8 bits.

Second, the average ratio (assuming the table describes ratios, since
they omitted the units) for 44.1kHz audio is 3:1. They only vaguely
mention the sources for the material. I can choose material that
gives those ratios even for linear PCM.

Aside from that, their prediction idea is to use a very large filter
kernel (thousands of taps) and to adapt the kernel instead of
transmitting it for each frame. A long kernel theoretically means
more accurate prediction because of the long-time correlation,
especially since audio data is highly oversampled most of the time.

It is apparent their method is geared toward speech, and I think it
is not so good for general music compression, for a few reasons:

1. Computing such a large filter is computationally expensive.
   Standard autocorrelation->Levinson-Durbin will be too slow. So
   they use RLS, which has stability problems.

2. They can use Huffman coding on the residual because the alphabet
   is small (since the samples have already been quantized down to
   8 bits or less). If you are working with 16-bit or 24-bit data,
   generic Huffman is not practical because of the dictionary size.
   That's why most (all?) such codecs use Rice coding.

I have done some tests with long kernels and it does not buy very
much extra compression. Most of the slack can be taken up with
better entropy coding.

Josh
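For reference, here is a minimal sketch of the standard
autocorrelation->Levinson-Durbin route mentioned above, just to show
where the cost goes for large predictor orders. It is illustrative C
only, not FLAC's actual code, and the function names are made up for
the example.

/* A minimal sketch of computing LPC coefficients the standard way:
 * the autocorrelation is O(n*p) and the recursion itself is O(p^2),
 * both per frame, which is why a kernel of thousands of taps gets
 * expensive.  Illustrative only, not FLAC's actual implementation. */

#include <stddef.h>

/* autocorrelation of x[0..n-1] for lags 0..p */
void autocorr(const double *x, size_t n, size_t p, double *r)
{
    for (size_t lag = 0; lag <= p; lag++) {
        double sum = 0.0;
        for (size_t i = lag; i < n; i++)
            sum += x[i] * x[i - lag];
        r[lag] = sum;
    }
}

/* Levinson-Durbin recursion: solve for p predictor coefficients
 * a[0..p-1] from autocorrelation r[0..p], for the predictor
 * xhat[n] = sum_j a[j]*x[n-1-j].  Returns the final prediction
 * error energy. */
double levinson_durbin(const double *r, size_t p, double *a)
{
    double err = r[0];
    for (size_t i = 0; i < p; i++) {
        double acc = r[i + 1];
        for (size_t j = 0; j < i; j++)
            acc -= a[j] * r[i - j];
        double k = (err != 0.0) ? acc / err : 0.0; /* reflection coeff */
        a[i] = k;
        /* symmetric in-place update of a[0..i-1] */
        for (size_t j = 0; j < i / 2; j++) {
            double tmp = a[j];
            a[j]         = tmp - k * a[i - 1 - j];
            a[i - 1 - j] -= k * tmp;
        }
        if (i & 1)
            a[i / 2] -= k * a[i / 2];
        err *= 1.0 - k * k;
    }
    return err;
}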
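And here is a minimal, self-contained sketch of Rice coding of
residuals, to illustrate why no code dictionary is needed for wide
residual alphabets. The bit-buffer helpers, the unary convention and
the fixed parameter k=2 are made up for the example and are not taken
from FLAC.

/* Rice coding sketch: fold the signed residual, send the quotient in
 * unary plus a stop bit, then the low k bits in binary.  No Huffman
 * table is needed, so the scheme works unchanged for 16-bit or 24-bit
 * source data.  Illustrative only, not FLAC's actual bitstream. */

#include <stdint.h>
#include <stdio.h>

struct bits { uint8_t buf[64]; unsigned pos; };   /* tiny bit buffer */

static void put_bit(struct bits *b, int v)
{
    if (v) b->buf[b->pos >> 3] |= (uint8_t)(0x80 >> (b->pos & 7));
    b->pos++;
}

/* Zig-zag fold: map a signed residual to unsigned so that small
 * magnitudes of either sign get short codes. */
static uint32_t fold_signed(int32_t r)
{
    return (r < 0) ? (uint32_t)(-(int64_t)r) * 2 - 1 : (uint32_t)r * 2;
}

static void rice_encode(struct bits *b, int32_t residual, unsigned k)
{
    uint32_t u = fold_signed(residual);
    uint32_t q = u >> k;                  /* quotient -> unary part */
    while (q--) put_bit(b, 1);
    put_bit(b, 0);                        /* terminator             */
    for (int i = (int)k - 1; i >= 0; i--) /* remainder, k bits      */
        put_bit(b, (u >> i) & 1);
}

int main(void)
{
    struct bits b = {{0}, 0};
    int32_t residuals[] = { 0, -1, 3, -7, 12 };
    unsigned n = sizeof residuals / sizeof residuals[0];
    for (unsigned i = 0; i < n; i++)
        rice_encode(&b, residuals[i], 2);
    printf("encoded %u residuals into %u bits\n", n, b.pos);
    return 0;
}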
>From: Josh Coalson <j_coalson@yahoo.com>
>
>I'm copying the flac-dev list to see if anyone has any
>feedback also...

I'm supposed to have been on the list myself since yesterday, but I
have not received the first digest yet.

>First, the results they show are for compression of data
>that has already been lossily quantized to fewer bits per
>sample; e.g. u-Law and A-Law are logarithmic quantizations
>of 16-bit data to 8 bits.

I thought the author has two models, A-law and AMI ADPCM, both of
which he extends. AMI ADPCM starts with 16-bit samples, but I'm not
sure whether A-law is involved in that process.

>Second, the average ratio (assuming the table describes
>ratios, since they omitted the units) for 44.1kHz audio
>is 3:1.

I thought they were bits/sample. The text says "from the table one
can see that the software works better for audio": the average for
audio is 2.75, and the average for audio/speech is 3.17. So 2.75 is
better only if it is a number of bits (or something similar).

>They only vaguely mention the sources for the
>material. I can choose material that gives those ratios
>even for linear PCM.

Yeah, and it is supposed to be a scientific paper. Considering that
the author has an e-mail address, I would have expected him to check
against Shorten, which he even references.

>expensive. Standard autocorrelation->Levinson-Durbin
>will be too slow. So they use RLS, which has stability
>problems.

His RLS seems to be stable enough, up to 5000 samples, and fast.
Would RLS make FLAC run faster?

>I have done some tests with long kernels and it does
>not buy very much extra compression. Most of the slack
>can be taken up with better entropy coding.

OK. But if anyone understands the further details of the algorithm,
I would like to code it and test it out. I have written down as many
details as I could --- they are available via e-mail for those who
didn't get the entire first mail on this topic. Perhaps it would be
a waste of time, but the author of that paper should have mentioned
the audio sources in the first place. Bad science.

Regards,
Juhana
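For concreteness, here is a textbook exponentially-weighted RLS
predictor update in C. It is only a sketch of the general technique
and not necessarily the variant the paper uses; the order, forgetting
factor and initialization below are arbitrary example values. The
classic form costs O(p^2) arithmetic per sample because of the
inverse-correlation matrix, which is the figure to weigh when asking
whether it would speed anything up at long kernel lengths.

/* Textbook exponentially-weighted RLS, O(ORDER^2) per sample.
 * Sketch only; not the paper's algorithm and not FLAC code. */

#include <stddef.h>

#define ORDER 8          /* illustrative predictor order */

struct rls {
    double w[ORDER];             /* predictor coefficients        */
    double P[ORDER][ORDER];      /* inverse correlation estimate  */
    double lambda;               /* forgetting factor, e.g. 0.999 */
};

void rls_init(struct rls *s, double lambda, double delta)
{
    for (size_t i = 0; i < ORDER; i++) {
        s->w[i] = 0.0;
        for (size_t j = 0; j < ORDER; j++)
            s->P[i][j] = (i == j) ? 1.0 / delta : 0.0;
    }
    s->lambda = lambda;
}

/* u[] holds the previous ORDER samples (u[0] = most recent); x is
 * the current sample.  Returns the prediction error, i.e. the
 * residual that would go to the entropy coder. */
double rls_update(struct rls *s, const double *u, double x)
{
    double Pu[ORDER], k[ORDER];
    double denom = s->lambda;
    double e;

    /* Pu = P*u,  denom = lambda + u'*P*u */
    for (size_t i = 0; i < ORDER; i++) {
        Pu[i] = 0.0;
        for (size_t j = 0; j < ORDER; j++)
            Pu[i] += s->P[i][j] * u[j];
        denom += u[i] * Pu[i];
    }
    for (size_t i = 0; i < ORDER; i++)
        k[i] = Pu[i] / denom;            /* gain vector */

    /* a-priori error and coefficient update */
    e = x;
    for (size_t i = 0; i < ORDER; i++)
        e -= s->w[i] * u[i];
    for (size_t i = 0; i < ORDER; i++)
        s->w[i] += k[i] * e;

    /* P = (P - k*u'*P) / lambda, using u'*P = (P*u)' by symmetry */
    for (size_t i = 0; i < ORDER; i++)
        for (size_t j = 0; j < ORDER; j++)
            s->P[i][j] = (s->P[i][j] - k[i] * Pu[j]) / s->lambda;
    return e;
}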
>the average for audio is 2.75, and the average for audio/speech is 3.17.

There was another paper on audio compression:
ftp://ftp.funet.fi/pub/sci/audio/devel/newpapers/2283-2294.pdf
It compares against Shorten and seems to give more meaningful
results. But the audio samples there, too, are quite different from
music; they are instrument samples. It now looks like the first
paper raised my hopes too much.

-*-

Some time ago I suggested that FLAC could include a codebook-based
compressor: the codebook could be as large as 650 Mbytes (one CD)
and would be used in compressing many audio files.

For Vorbis I suggested the following simple method: every 16th
sample would be stored in the compressed file without modification.
The 16-sample residue blocks are then vector quantized against the
codebook. The residue blocks would start from the value 0 and end
at 0, and the code vectors would have 15 samples. The reference to
the codebook would be a 32-bit integer, which in total would mean
1:4 compression. I have no idea how well it would perform. The
algorithm is linear, but would some kind of higher-order system
work better?

Back to FLAC: would the final error signal squeeze down enough so
that at least a 1:3 total compression ratio is obtained? Well, at
least I believe that using 650 MBytes of external data would help
to compress the audio further, but I have no idea how much.

Regards,
Juhana
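A rough sketch of the codebook idea, under one possible reading of
it: every 16th sample is kept verbatim, the 15 samples in between are
predicted by linear interpolation between the kept samples (so the
residue block starts and ends near 0), the residue block is matched
against a large shared codebook, and the leftover error signal is
what would then be handed to the entropy coder. The interpolation
baseline and the brute-force nearest-neighbour search are assumptions
made for this sketch, not something specified above.

/* Codebook/VQ sketch under the assumptions stated above.
 * Illustrative only; names and structure are invented here. */

#include <stdint.h>
#include <stddef.h>

#define BLOCK 16                /* one kept sample + 15 coded samples */

/* Find the codebook vector (15 samples each) closest to the residue
 * block in the squared-error sense; return its 32-bit index. */
uint32_t vq_search(const int32_t residue[BLOCK - 1],
                   const int16_t *codebook, uint32_t ncodes)
{
    uint32_t best = 0;
    int64_t best_err = INT64_MAX;
    for (uint32_t c = 0; c < ncodes; c++) {
        const int16_t *cv = codebook + (size_t)c * (BLOCK - 1);
        int64_t err = 0;
        for (int i = 0; i < BLOCK - 1; i++) {
            int64_t d = residue[i] - cv[i];
            err += d * d;
        }
        if (err < best_err) { best_err = err; best = c; }
    }
    return best;
}

/* Encode one block: keep x[0] verbatim, emit a codebook index for the
 * interpolation residue of x[1..15], and output the remaining error
 * signal, which would still need to be compressed for losslessness. */
uint32_t encode_block(const int16_t x[BLOCK], int16_t next_anchor,
                      const int16_t *codebook, uint32_t ncodes,
                      int32_t error_out[BLOCK - 1])
{
    int32_t residue[BLOCK - 1];
    for (int i = 1; i < BLOCK; i++) {
        /* linear interpolation between the two kept samples */
        int32_t pred = x[0] + ((int32_t)(next_anchor - x[0]) * i) / BLOCK;
        residue[i - 1] = x[i] - pred;
    }
    uint32_t idx = vq_search(residue, codebook, ncodes);
    const int16_t *cv = codebook + (size_t)idx * (BLOCK - 1);
    for (int i = 0; i < BLOCK - 1; i++)
        error_out[i] = residue[i] - cv[i];   /* still to be compressed */
    return idx;
}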