porte64 at free.fr
2009-Mar-09 14:53 UTC
[Flac] audio data encoding in FLAC: how complex ?
Hello!

The following algorithm describes Golomb (= Rice?) encoding: http://en.wikipedia.org/wiki/Golomb_coding#Simple_algorithm

What is unclear to me is how many audio samples are encoded with a fixed parameter (denoted 'M' in the wiki, which I presume is equivalent to the 'Rice parameter' invoked here): http://flac.sourceforge.net/documentation_format_overview.html

But things are a bit more complicated, because the wiki says: "audio codecs [...] use a Rice code after the linear prediction step". It does not say how many points/samples get interpolated, or what criteria are used. So I guess that in the end, for every frame, 2 points of the interpolating line and a Rice parameter are stored, and, at each point/sample, the distance to this line is encoded as a series of residues?!

On the whole, it *seems* to me that it boils down to simple operations (after all, it's just basic school arithmetic), but it's really hard to put all the pieces together, and it's a pity (and may be dangerous for maintenance over the years, as project people may change) that it's not documented anywhere besides the source code. I would enjoy writing a note on the whole process, but there are too many unknowns.

Sorry, I am feeling a bit desperate, having spent several nights in a row trying to understand the encoding process globally from the source code. So if any of you has notes about it, please share!

Phil
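For reference, here is a minimal Python sketch of the "simple algorithm" on the wiki page linked above, restricted to the Rice case where M is a power of two (M = 2**k, with k playing the role of the Rice parameter). The function names are made up for illustration, and the unary convention (q ones closed by a zero) follows the wiki description; a real bitstream may use the opposite convention, so treat this as a sketch rather than as FLAC's actual bit layout.

```python
def rice_encode(n: int, k: int) -> str:
    """Encode a non-negative integer n with Rice parameter k (M = 2**k).

    Returns the code as a string of '0'/'1' characters for readability.
    """
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    q = n >> k                 # quotient  n // M
    r = n & ((1 << k) - 1)     # remainder n %  M
    unary = "1" * q + "0"      # quotient in unary, closed by a '0'
    binary = format(r, "b").zfill(k) if k > 0 else ""  # remainder on k bits
    return unary + binary


def rice_decode(bits: str, k: int) -> int:
    """Decode one Rice code word produced by rice_encode."""
    q = bits.index("0")        # number of leading '1's
    r = int(bits[q + 1:q + 1 + k], 2) if k > 0 else 0
    return (q << k) | r


if __name__ == "__main__":
    for n in (0, 1, 5, 9, 20):
        print(n, rice_encode(n, 2))
```

Each input value gets exactly one code word, and small values get short ones; a larger k costs more bits for small values but keeps the unary part short for large values.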
Hi Phil,

There is one sample per code. I'm not sure what you mean by the term 'interpolated', because there is no interpolation done in FLAC or in Rice coding.

Rice coding is simply used to change from a fixed number of bits per sample, such as 16 or 24, to a variable number of bits per sample. Hopefully, the most common 16-bit or 24-bit samples will have the shortest codes, while some will necessarily have more than 24 bits but will be very rare. Overall, fewer bits will be used, even though there is a one-to-one translation from each sample to its Rice code.

Because of the oscillating nature of audio, samples near zero are more common. Also, FLAC uses differential coding rather than absolute values, and since each sample is usually very close to the previous one, the Rice coding uses short bit sequences for small differences and longer bit sequences for larger differences.

But this is all my interpretation over the years, so please don't quote me. Perhaps you should write some programs of your own to explore these topics, including things outside FLAC, and then you might be better prepared to describe and document the specifics. I would welcome a chance to read whatever you write, but I think you've got a bit of work ahead of you.

Brian Willoughby
Sound Consulting
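In the spirit of "write some programs of your own", the following Python sketch illustrates the two ideas from the reply: code the difference between each sample and the previous one, then give small differences short Rice codes. Everything here is an assumption made for illustration: the helper names, the 440 Hz test tone, the mapping of signed values to non-negative ones, and the brute-force search for a single parameter k over the whole block. As far as I read the format documentation, FLAC itself chooses among several predictors (plain differencing is only the simplest) and may split the residual into partitions that each carry their own Rice parameter, so this is not a description of the real encoder.

```python
import math


def zigzag(v: int) -> int:
    """Map signed values to non-negative ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return (v << 1) if v >= 0 else ((-v << 1) - 1)


def rice_length(n: int, k: int) -> int:
    """Bits used by a Rice code: unary quotient, one stop bit, k remainder bits."""
    return (n >> k) + 1 + k


def first_order_residuals(samples):
    """Difference between each sample and the previous one (first sample vs. 0)."""
    prev = 0
    out = []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out


def best_rice_parameter(values, max_k=14):
    """Brute-force search for the k that minimises the total code length."""
    return min(range(max_k + 1),
               key=lambda k: sum(rice_length(v, k) for v in values))


# Hypothetical test signal: a 440 Hz sine sampled at 44.1 kHz, 16-bit range.
samples = [round(8000 * math.sin(2 * math.pi * 440 * t / 44100))
           for t in range(4096)]

residuals = first_order_residuals(samples)
mapped = [zigzag(r) for r in residuals]
k = best_rice_parameter(mapped)
total_bits = sum(rice_length(v, k) for v in mapped)

print("chosen k:", k)
print("average bits per sample:", total_bits / len(samples))
```

Because every residual still maps to exactly one code word, the original samples can be recovered by decoding each word and accumulating the differences; the saving comes only from the code words being shorter on average than the fixed 16 bits.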