Lourens Veen
2002-Sep-19 03:44 UTC
[vorbis-dev] Using large-scale repetition in audio compression
This idea is so simple that I'm sure it must have been thought of before, and discarded, since AFAIK it's not used anywhere. I did a quick web search but that didn't turn up much, so I figured I'd put it up for discussion here anyway.

How about using large-scale repetition in audio compression? I'm thinking of redundancy in repeated pieces of a song, i.e. a chorus. Of course, the different choruses aren't exactly the same (unless it was mixed digitally and they cheated :-)), but wouldn't there be at least some redundancy in the frequency domain? And could that be used to lower the required bitrate for repeated parts of a song?

Of course this is hard to do when streaming live audio, or even when streaming from a fixed source if you don't buffer the entire broadcast on the client side, but for compressing a song from a CD I'd say that it would work.

Obviously, it's not used (at least AFAIK) so there must be something against it. Anyone care to enlighten me?

Lourens

--
GPG public key: http://home.student.utwente.nl/l.e.veen/lourens.key

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed. Unsubscribe messages sent to the list will be ignored/filtered.
ChristianHJW
2002-Sep-19 04:52 UTC
[vorbis-dev] Re: Using large-scale repetition in audio compression
"Lourens Veen" <lourens@rainbowdesert.net> wrote in news post news:200209191244.43764.lourens@rainbowdesert.net...
> How about using large-scale repetition in audio compression? I'm
> thinking of redundancy in repeated pieces of a song, i.e. a chorus.

Two similar ideas that I know of have been discussed:

1. Video: using frame repetition for anime encoding
   http://forum.doom9.org/showthread.php?s=&threadid=28773

2. Encoding the different-language soundtracks of a movie into one single multi-channel Ogg Vorbis file, using channel coupling or frame repetition for size reduction, instead of storing several stereo audio streams with the movie, one for each language, each containing the same or similar sound information wherever there is no talking but only background noise, etc.
   http://forum.doom9.org/showthread.php?s=&postid=111862#post111862

Sorry if this is not exactly what you are talking about, but I thought it was somehow related.

--
Christian

Sites: http://mcf.sourceforge.net http://sf.net/projects/mcf
MCF mailing lists: news://news.gmane.org
gmane.comp.video.mcf.general
gmane.comp.video.mcf.devel
gmane.comp.video.mcf.mplayer
gmane.comp.video.mcf.announce
gmane.comp.video.mcf.mpc
Soon: www.corecodec.com
Kenneth Arnold
2002-Sep-19 10:13 UTC
[vorbis-dev] Using large-scale repetition in audio compression
On Thu, Sep 19, 2002 at 12:44:42PM +0200, Lourens Veen wrote:
> This idea is so simple that I'm sure it must have been thought of
> before, and discarded, since AFAIK it's not used anywhere. I did a
> quick web search but that didn't turn up much, so I figured I'd put
> it up for discussion here anyway.

Thought about it before and postponed for lack of time and resources.

> How about using large-scale repetition in audio compression? I'm
> thinking of redundancy in repeated pieces of a song, i.e. a chorus.
> Of course, the different choruses aren't exactly the same (unless it
> was mixed digitally and they cheated :-)), but wouldn't there be at
> least some redundancy in the frequency domain? And could that be
> used to lower the required bitrate for repeated parts of a song?

Very similar to patterns in tracker files, huh?

In a sense an entire song should be self-similar in the frequency domain -- after all, in most cases it's the same instruments playing a lot of the same structures. So my thoughts came down to identifying the instruments used in the song, extracting samples of them, templating them in the time-frequency domain against the spectrum of the song, and storing the result as pretty much a tracker-style format plus a residue track.

It turns out you can identify and extract individual instruments pretty well (my algorithm didn't have the greatest results, but others' have). The problem comes when you start combining them. Very poor time resolution in most analysis methods (why do we block up the MDCT in Vorbis anyway? ... exactly.), combined with harmonic distortion from transient signals -- the attack is important in identifying the instrument in many cases -- and nonlinear processing of the signal in the many stages of recording ... it quickly turns out to be a big mess.

I think continuing with a classical transform, e.g. the MDCT used in Vorbis, or even a wavelet-based method such as the Matching Pursuit that I used, will result in a system where half of your bits are devoted to fixing up the mistakes made by the other half. I'm exploring new possibilities in time-frequency transforms, with other goals at the moment, but I may get back to the audio compression side later on.

> Of course this is hard to do when streaming live audio, or even when
> streaming from a fixed source if you don't buffer the entire
> broadcast on the client side, but for compressing a song from a CD
> I'd say that it would work.

In the above proposal, the streamer would send the instruments known in the song thus far to the client before starting the main streaming, much like Vorbis does with codebooks.

> Obviously, it's not used (at least AFAIK) so there must be something
> against it. Anyone care to enlighten me?

... keep each other posted.

--
Kenneth Arnold <ken@arnoldnet.net>
- "Know thyself."
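For readers unfamiliar with the Matching Pursuit mentioned above, here is a minimal sketch of the greedy idea, not the code used in the experiments described in this message. The dictionary of four cosine atoms, the test signal, and the names `matching_pursuit`, `atoms` and `picks` are all assumptions made for illustration. At each step the atom with the largest inner product against the residual is picked and its contribution is subtracted.

```python
import math

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def matching_pursuit(signal, atoms, n_iter):
    """Greedy decomposition of `signal` over unit-norm `atoms`.

    Returns the list of (atom index, coefficient) picks and the final residual.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        # Correlate the current residual with every atom in the dictionary.
        scores = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(scores[k]))
        coeff = scores[best]
        # Remove the chosen atom's contribution from the residual.
        residual = [r - coeff * a for r, a in zip(residual, atoms[best])]
        picks.append((best, coeff))
    return picks, residual

# Hypothetical dictionary: four cosines standing in for instrument templates.
n = 64
atoms = [normalize([math.cos(2 * math.pi * f * i / n) for i in range(n)])
         for f in (2, 5, 9, 13)]

# Toy "song": a mix of two of the templates.
signal = [3.0 * a + 1.5 * b for a, b in zip(atoms[1], atoms[3])]

picks, residual = matching_pursuit(signal, atoms, 2)
print(picks)
print(dot(residual, residual))
```

In this toy case the atoms are orthogonal and the signal is built exactly from them, so two picks leave essentially no residual. With real instruments the templates overlap and never match exactly, which is where the residue track starts to eat bits.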
Monty
2002-Sep-19 16:03 UTC
[vorbis-dev] Using large-scale repetition in audio compression
On Thu, Sep 19, 2002 at 12:44:42PM +0200, Lourens Veen wrote:
> This idea is so simple that I'm sure it must have been thought of
> before, and discarded, since AFAIK it's not used anywhere. I did a
> quick web search but that didn't turn up much, so I figured I'd put
> it up for discussion here anyway.
>
> How about using large-scale repetition in audio compression? I'm
> thinking of redundancy in repeated pieces of a song, i.e. a chorus.
> Of course, the different choruses aren't exactly the same (unless it
> was mixed digitally and they cheated :-)), but wouldn't there be at
> least some redundancy in the frequency domain? And could that be
> used to lower the required bitrate for repeated parts of a song?

This is one of the earliest things I tried in 1993. The answer was 'in fact, no'.

The problem is not compressing the predictable parts of a song; that's only a few bytes a second. The difficulty is compressing the audible randomness. The brute-force FFT/autocorrelation compressor I wrote nearly ten years ago now took several hours per song and, when used in conjunction with a standard LPC-style lossless compressor, added as much information to the stream as it eliminated. In retrospect, this result should have been obvious.

> Obviously, it's not used (at least AFAIK) so there must be something
> against it. Anyone care to enlighten me?

"It doesn't work the way you think it does". Spend a few months on it and you'll see why :-)

Monty
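One way to see why the result "should have been obvious" is a toy model, sketched below. It is not the 1993 compressor, and every name and number in it is an assumption: each chorus is the same sine tone plus fresh Gaussian noise, with the noise standing in for the audible randomness. Predicting the second chorus from the first cancels the tone, but the residual then carries the noise of both performances.

```python
import math
import random

def mean_energy(samples):
    """Average power per sample."""
    return sum(s * s for s in samples) / len(samples)

def chorus(n, noise_sigma, rng):
    """One 'performance': the same tone every time plus fresh Gaussian noise."""
    return [math.sin(2 * math.pi * 8 * i / n) + rng.gauss(0.0, noise_sigma)
            for i in range(n)]

rng = random.Random(1993)
n = 8192
sigma = math.sqrt(0.5)  # assumed: noise power equal to the tone's power (0.5)

first = chorus(n, sigma, rng)
second = chorus(n, sigma, rng)

# Long-range prediction: code the second chorus as its difference from the first.
residual = [b - a for a, b in zip(first, second)]

e_second = mean_energy(second)      # tone power + one dose of noise
e_residual = mean_energy(residual)  # tone cancelled, but two doses of noise

print("energy of second chorus coded directly:", round(e_second, 3))
print("energy of residual after prediction:   ", round(e_residual, 3))
```

With the noise power set equal to the tone power, the two energies come out roughly equal: the prediction removes the tone and adds a second helping of noise, so nothing is gained. With less noise the prediction helps a little, with more it hurts, and in either case the tone was the cheap part to code to begin with.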