On Aug 8, 2009, at 23:11, Didier Dambrin wrote:
> Electronic music quite often doesn't leave a computer these days. And it
> mainly consists of drums, synths & vocals/effects. Drums are often samples
> sequenced at sample (not sub-sample) accuracy, thus repeated (of course if
> the song was post-resampled, there will be sub-sample times).

Good point. I have certainly seen songs which were at a fixed tempo, say 128 BPM, and were so precise that you could cut and paste pieces of the song without glitches. Every measure lined up closely enough with the others that you could separate instruments from each other by subtracting out the repeated patterns.

> Synths are a problem, as the riffs will have more variations, and also
> free-running oscillators will give troubles.

Not only that, but some synths are oversampled, thus you have the "analog" problem of subsampled waveforms.

> Anyway, right now I get what I wanted somewhat working, well enough
> considering I've only spent a couple of hours.

Excellent!

> ..But sadly none of FLAC, WavPack or OptimFrog could compress the
> pre-processed song better, or hardly. And considering you'd also have to
> add the pool of frames, it would end up worse.

This surprises me. Have you tried aligning your frames to the standard FLAC frame size? As for the pool, it seems like the first occurrence of a repetition would compress like usual, and the subsequent ones would compress more than usual.

> The problem is the discontinuities I think. Say you work with little,
> non-tempo-synced frames, and you find a matching frame, which you subtract
> from the song at the places it matches. You'll have a discontinuity around
> it. If the frames around this one also match, it doesn't matter as they
> will be subtracted as well. But if they don't (enough), the discontinuity
> will stay.

You may be right about the discontinuities. Have you tried making your transitions only at zero-crossings?

> I also tried windowing the frame before subtracting it, no more
> discontinuity but with small frames it's not very useful anymore.

Windowing may seem like a good idea, but remember that your decoder will have to recreate every step that your encoder uses so that it can be undone. Thus, windowing may make it difficult to be lossless.

> But if I run the pre-processing on something perfectly repeated several
> times, it really finds the frames, and it doesn't require knowing the
> tempo. If you don't know the tempo, the only problem will be misalignment,
> which will leave little bits of audio that were too short to find matching
> frames, but most of the processed waveform will still be silence.

Seems like knowing the tempo would allow the encoding phase to take far less time. It makes sense that you don't absolutely "need" it, but you did say it takes a really long time to find matches.

> Btw, are all lossless compression methods working in the time domain?

I would guess that most lossless audio compression methods are time domain. However, LJPG (lossless JPEG) uses a very efficient lossy compression followed by lossless compression of the difference. I wouldn't be surprised if there is an audio codec which combines lossy frequency domain compression with lossless compression of the difference between the lossy version and the original. If there isn't then I'll just patent that...

Brian Willoughby
Sound Consulting
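
The repeated-frame pre-processing discussed above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not Didier's actual code: frames sit on a fixed grid, repeats must match sample for sample, and the names `encode_repeats` and `decode_repeats` are made up for this sketch. It shows that the first occurrence of a frame is kept as-is, later occurrences become silence, and the decoder can restore everything from the list of matches.

```python
def encode_repeats(samples, frame_len):
    """Zero out grid-aligned frames that exactly repeat an earlier frame.

    Returns the residual signal and a list of (frame_index, first_index)
    pairs that the decoder needs to undo the subtraction.
    """
    pool = {}      # frame contents -> index of its first occurrence
    matches = []   # which frames were removed, and what they copied
    residual = list(samples)
    for i in range(len(samples) // frame_len):
        start = i * frame_len
        frame = tuple(samples[start:start + frame_len])
        if frame in pool:
            # Subtracting an identical frame leaves pure silence.
            matches.append((i, pool[frame]))
            residual[start:start + frame_len] = [0] * frame_len
        else:
            pool[frame] = i
    return residual, matches


def decode_repeats(residual, matches, frame_len):
    """Rebuild the original by copying each first occurrence back in."""
    samples = list(residual)
    for i, first in matches:
        src = first * frame_len
        dst = i * frame_len
        samples[dst:dst + frame_len] = samples[src:src + frame_len]
    return samples


if __name__ == "__main__":
    song = [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 9, 9]
    residual, matches = encode_repeats(song, 4)
    print(residual, matches)
    print(decode_repeats(residual, matches, 4) == song)
```

Real material would need a search over arbitrary offsets and a tolerance for near matches, which is where the long search times and the discontinuities mentioned in the thread come from.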
>> ..But sadly none of FLAC, WavPack or OptimFrog could compress the
>> pre-processed song better, or hardly. And considering you'd also have to
>> add the pool of frames, it would end up worse.
>
> This surprises me. Have you tried aligning your frames to the
> standard FLAC frame size?

Not at all, because I have no idea how it works internally; I've only been using the standalone binary for now. What's the standard frame size?
(I work in Delphi, so playing with C++ APIs requires painfully translating/adapting them, so I wanted to stay away from that for now)

> As for the pool, it seems like the first occurrence of a repetition
> would compress like usual, and the subsequent ones would compress
> more than usual.
>
>> The problem is the discontinuities I think. Say you work with little,
>> non-tempo-synced frames, and you find a matching frame, which you
>> subtract from the song at the places it matches. You'll have a
>> discontinuity around it. If the frames around this one also match, it
>> doesn't matter as they will be subtracted as well. But if they don't
>> (enough), the discontinuity will stay.
>
> You may be right about the discontinuities. Have you tried making
> your transitions only at zero-crossings?

No, but that could be worth a try.

>> I also tried windowing the frame before subtracting it, no more
>> discontinuity but with small frames it's not very useful anymore.
>
> Windowing may seem like a good idea, but remember that your decoder
> will have to recreate every step that your encoder uses so that it
> can be undone. Thus, windowing may make it difficult to be lossless.

It would be undoable: as long as it's the original frame that you store, you can compute the adapted frame from it and do anything you want. For the normalization, you'd store the frame gain along with the time where it repeats.

>> But if I run the pre-processing on something perfectly repeated several
>> times, it really finds the frames, and it doesn't require knowing the
>> tempo. If you don't know the tempo, the only problem will be
>> misalignment, which will leave little bits of audio that were too short
>> to find matching frames, but most of the processed waveform will still
>> be silence.
>
> Seems like knowing the tempo would allow the encoding phase to take
> far less time. It makes sense that you don't absolutely "need" it,
> but you did say it takes a really long time to find matches.
>
>> Btw, are all lossless compression methods working in the time domain?
>
> I would guess that most lossless audio compression methods are time
> domain. However, LJPG (lossless JPEG) uses a very efficient lossy
> compression followed by lossless compression of the difference. I
> wouldn't be surprised if there is an audio codec which combines lossy
> frequency domain compression with lossless compression of the
> difference between the lossy version and the original. If there
> isn't then I'll just patent that...

I've tried that already, and was very surprised by the results. Wikipedia told me about those lossy+correction methods, and that there's supposedly a version of AAC that can do this (& WavPack & others too, but in the time domain I'd assume).

..so I started OGGing a song at various bitrates, subtracted it from the original, and tried encoding the residual using the lossless packers I mentioned.
To my surprise, the size of the OGG + the packed residual always roughly matched the size of the packed original. Tried with 32k, 128k & 450k OGGs.. always the same! Not exactly the same of course, but I was expecting much bigger results (not really smaller, assuming someone had tried the same before me).
(haven't tried with MP3, but it's probably worse)

The residual from the OGG seemed to be very stable in gain, with a bit depth decreasing along with the increasing OGG bitrate. I wasn't expecting that, knowing it works in the freq domain.
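
The point made earlier in this message, that windowing stays undoable as long as the pool stores the original frame, can be sketched like this. Everything here is an assumption for illustration: the linear fade, the gain expressed as an integer ratio, and the names `adapted_frame`, `subtract_at` and `add_back_at` are not from any existing tool. The key idea is that encoder and decoder derive the adapted frame with the same integer-only arithmetic, so the result is bit-identical on both sides.

```python
def adapted_frame(frame, gain_num, gain_den, fade):
    """Derive the frame that is actually subtracted from the song.

    The stored frame is scaled by gain_num/gain_den and faded linearly
    over `fade` samples at both ends. Only integer arithmetic is used,
    so the decoder reproduces exactly the same values.
    """
    n = len(frame)
    out = []
    for k, s in enumerate(frame):
        # Window weight: distance to the nearest edge, capped at `fade`.
        w = min(k + 1, n - k, fade)
        out.append((s * gain_num * w) // (gain_den * fade))
    return out


def subtract_at(samples, frame, pos, gain_num, gain_den, fade):
    """Encoder side: subtract the adapted frame starting at `pos`."""
    out = list(samples)
    for k, a in enumerate(adapted_frame(frame, gain_num, gain_den, fade)):
        out[pos + k] -= a
    return out


def add_back_at(residual, frame, pos, gain_num, gain_den, fade):
    """Decoder side: recompute the same adapted frame and add it back."""
    out = list(residual)
    for k, a in enumerate(adapted_frame(frame, gain_num, gain_den, fade)):
        out[pos + k] += a
    return out


if __name__ == "__main__":
    frame = [100, -200, 300, -400, 500, -600]
    # The frame appears in the song at half gain, starting at index 1.
    song = [7, 50, -100, 150, -200, 250, -300, 3]
    residual = subtract_at(song, frame, 1, 1, 2, 2)
    print(residual)
    print(add_back_at(residual, frame, 1, 1, 2, 2) == song)
```

The faded edges leave part of the signal in the residual, which matches the observation that windowing removes the discontinuity but is less useful with small frames.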
On Aug 9, 2009, at 21:18, Didier Dambrin wrote:
>>> ..But sadly none of FLAC, WavPack or OptimFrog could compress the
>>> pre-processed song better, or hardly. And considering you'd also
>>> have to add the pool of frames, it would end up worse.
>>
>> This surprises me. Have you tried aligning your frames to the
>> standard FLAC frame size?
>
> Not at all, because I have no idea how it works internally; I've only
> been using the standalone binary for now. What's the standard frame size?
> (I work in Delphi, so playing with C++ APIs requires painfully
> translating/adapting them, so I wanted to stay away from that for now)

I was going to suggest that you could use the FLAC library via the C API, instead of the C++ API, but some quick research on Delphi doesn't seem to show support for C. I use Objective-C for object-oriented development, and it is very easy to incorporate a C API. I can't really use Delphi since it is Windows only, so I can't really help you there.

>> I would guess that most lossless audio compression methods are time
>> domain. However, LJPG (lossless JPEG) uses a very efficient lossy
>> compression followed by lossless compression of the difference. I
>> wouldn't be surprised if there is an audio codec which combines lossy
>> frequency domain compression with lossless compression of the
>> difference between the lossy version and the original. If there
>> isn't then I'll just patent that...
>
> I've tried that already, and was very surprised by the results.
> Wikipedia told me about those lossy+correction methods, and that there's
> supposedly a version of AAC that can do this (& WavPack & others too,
> but in the time domain I'd assume).
> ..so I started OGGing a song at various bitrates, subtracted it from the
> original, and tried encoding the residual using the lossless packers I
> mentioned.
> To my surprise, the size of the OGG + the packed residual always roughly
> matched the size of the packed original. Tried with 32k, 128k & 450k
> OGGs.. always the same! Not exactly the same of course, but I was
> expecting much bigger results (not really smaller, assuming someone had
> tried the same before me).
> (haven't tried with MP3, but it's probably worse)

One thing to keep in mind is that FLAC isn't necessarily very efficient at compressing silence. While amplitude does correlate with size to some extent, it does not continue to improve below a certain amplitude. Perhaps this is due to the overhead of the format itself. One solution might be a custom format which embeds FLAC-compressed packets along with the lossy packets, thus sharing the overhead instead of having two completely independent files.

After noticing that quieter tracks are compressed smaller, I tried compressing silence, and I seem to recall that it didn't do quite as well as I expected. Even if I am recalling this benchmark correctly, I suppose it isn't really important since very little music is that quiet.

> The residual from the OGG seemed to be very stable in gain, with a bit
> depth decreasing along with the increasing OGG bitrate. I wasn't
> expecting that, knowing it works in the freq domain.

In some respects, it should not really matter whether the compression is time domain or frequency domain, because the end result of lossy compression is added "noise." Whether this noise comes about from time domain errors or frequency domain errors should be irrelevant. In either case, the amplitude of the error should be quite small, and an algorithm like FLAC can compress low-amplitude signals quite well.

Brian Willoughby
Sound Consulting
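
The lossy-plus-residual idea discussed above can be illustrated with a toy model. Note the big assumption: a plain quantizer (the hypothetical `lossy` function below) stands in for a real codec such as Ogg Vorbis, which behaves very differently in detail. The sketch only shows two things from the thread: the original is recovered exactly from the lossy version plus the residual, and the residual's peak amplitude shrinks as the lossy stage gets finer (the analogue of a higher bitrate). It says nothing about the final compressed sizes.

```python
def lossy(samples, step):
    """Stand-in lossy stage: round each sample to the nearest multiple of step."""
    return [((s + step // 2) // step) * step for s in samples]


def split(samples, step):
    """Return the lossy approximation and the residual (original - lossy)."""
    approx = lossy(samples, step)
    residual = [s - a for s, a in zip(samples, approx)]
    return approx, residual


def merge(approx, residual):
    """Lossless reconstruction: lossy version plus residual."""
    return [a + r for a, r in zip(approx, residual)]


def peak(residual):
    """Largest absolute residual value, a rough proxy for its bit depth."""
    return max(abs(r) for r in residual)


if __name__ == "__main__":
    samples = [1000, -1234, 32767, -32768, 5]
    for step in (256, 16):  # coarse vs. fine lossy stage
        approx, residual = split(samples, step)
        print(step, peak(residual), merge(approx, residual) == samples)
```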