thr3ads.net - flac dev - [Flac-dev] alternate compression [Aug 2009]


Brian Willoughby

2009-Aug-09 05:02 UTC

[Flac-dev] floating point

On Aug 7, 2009, at 21:48, Didier Dambrin wrote:

> FLAC doesn't preserve every chunk? I thought it did. I only gave a
> quick try but it seemed to have preserved even the most obscure chunks.
> Let me check: it even seems to preserve "MIDI note associated to
> marker", which is a very unknown metadata used by SoundForge (& even
> defined in a buggy way), so I assumed it was saving them transparently.

You are correct, Didier. FLAC preserves every chunk, precisely. WAV
and AIFF define a chunk in a very generic fashion, such that any
chunk can be preserved regardless of its contents. FLAC does not
interpret any chunk except the one holding the audio data. The
optional chunk-preserving code treats every chunk the same way, so it
cannot preserve some chunks and not others. I think that Martin is
speaking from out-of-date experience.

> Btw, what do you think of this?
> http://www.hydrogenaudio.org/forums/index.php?s=95a0210a0ba3304eca44ac3bd57990cb&showtopic=73895
> (didn't know where to post this, that forum seemed related)

That article is very naive, or at least the way it is described is
very naive. Real music does not repeat in terms of whole frames.
Frames are a completely artificial creation of the digital world, and
frame timing does not correspond to the timing of repetitions in
music. Because music represents an analog signal, the repetition
could occur at a fraction of a frame, or even a fraction of a
sample. Compressing a drum loop would require a lot of tricks to
detect the repetition unless the frame size were somehow luckily
aligned with the tempo. Maybe a song at 70.3125 BPM or 140.625 BPM
could be compressed this way, but most music will not have such a
precise tempo; in fact, tempo may drift if a live band is recorded.
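
Those two tempos are not arbitrary. Assuming 48 kHz audio and a 4096-sample block (the post names neither; both are just common settings), one beat at 140.625 BPM lasts exactly 5 blocks and one beat at 70.3125 BPM exactly 10. A small sketch of the arithmetic, with the helper name `aligned_bpm` made up for illustration:

```python
from fractions import Fraction


def aligned_bpm(sample_rate: int, block_size: int, blocks_per_beat: int) -> Fraction:
    """Tempo at which one beat lasts exactly `blocks_per_beat` encoder blocks.

    Hypothetical helper: BPM = 60 s/min * samples per second / samples per beat.
    """
    samples_per_beat = block_size * blocks_per_beat
    return Fraction(60 * sample_rate, samples_per_beat)


# Assumed settings: 48 kHz audio, 4096-sample blocks.
for n in (5, 10):
    print(f"{n} blocks per beat -> {float(aligned_bpm(48_000, 4096, n))} BPM")
# 5 blocks per beat -> 140.625 BPM
# 10 blocks per beat -> 70.3125 BPM
```

At 44.1 kHz the same 5-block beat would need 129.19921875 BPM, which illustrates how much of a coincidence such alignment is.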

> So I thought: imagine a pre-processing coupled with FLAC. It would take
> frames out of the whole song, and try to cross-correlate them with the
> song itself. When it finds strong matches (under a certain threshold,
> and starting with a couple of matches), the frame is saved to a pool,
> and it's subtracted from the song.
> Then you FLAC the (small) pool, and the song, full of near-silent spots
> (& silence where pure repetitions occurred).
> At decode time, you unFLAC the pool and the song, and you add back the
> frames from the pool to the song.

This might work, but you would have to be very lucky to find matches
given the block size of FLAC (or the frame size of any format, for
that matter). But you're right: if you can predict the waveform with
reasonable accuracy, then you can reduce the size.

FLAC and many other compression algorithms do, in fact, use this  
technique.  They look at the music, predict future samples, and then  
encode the difference between the predicted value and the actual value.
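
As a concrete illustration of "predict, then encode the difference", here is a sketch of one of FLAC's simplest fixed predictors, the second-order one, which guesses each sample as 2*x[n-1] - x[n-2]. Real FLAC encoders also try other fixed orders and quantized LPC coefficients and choose per subframe; this only shows the principle, and the function names are made up.

```python
def fixed2_residual(samples: list[int]) -> list[int]:
    """Residual of the order-2 fixed predictor: x[n] - (2*x[n-1] - x[n-2]).

    The first two samples have no history, so they are kept verbatim
    (FLAC calls these warm-up samples).
    """
    residual = samples[:2]
    for n in range(2, len(samples)):
        predicted = 2 * samples[n - 1] - samples[n - 2]
        residual.append(samples[n] - predicted)
    return residual


def fixed2_restore(residual: list[int]) -> list[int]:
    """Decoder side: rebuild each sample from the same prediction plus the residual."""
    samples = residual[:2]
    for n in range(2, len(residual)):
        predicted = 2 * samples[n - 1] - samples[n - 2]
        samples.append(residual[n] + predicted)
    return samples


ramp = [0, 10, 20, 30, 40, 50]
print(fixed2_residual(ramp))  # [0, 10, 0, 0, 0, 0]
assert fixed2_restore(fixed2_residual(ramp)) == ramp
```

A perfectly predictable signal leaves only the warm-up samples; anything the predictor misses shows up as nonzero residual.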

It's doubtful that you could find a better algorithm at predicting  
the waveform, but if you do, then FLAC will work well with your added  
processing layer.

> I haven't experimented yet, but let's say I try to correlate frames
> with the song, and I get something like 20 near-repeats, I may end up
> with a very silent "song leftover", still as long as the song, but
> maybe in 4 bits' worth or something? But it would also have bumps of
> original audio (that didn't find any matching frame).
> The thing is, I don't really know how FLAC compresses so I don't know
> if it would compress the "leftover" so much better.

It's doubtful that you could find such repetition, given that the
frame size has nothing to do with the tempo of the song, and
repetitions in music are based on tempo. But if you could find a
match or even a near match, then FLAC would compress the difference
better than the original.
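
Why would even a near match help? FLAC stores the prediction residual with Rice codes, whose length grows with the magnitude of each value, so a quieter residual costs fewer bits. The sketch below only counts bits for a single Rice parameter k over the whole block (real FLAC can split a block into partitions, each with its own parameter); `rice_bits` and `best_rice_bits` are illustrative names.

```python
def rice_bits(residual: list[int], k: int) -> int:
    """Bits needed to Rice-code `residual` with parameter k."""
    total = 0
    for v in residual:
        u = 2 * v if v >= 0 else -2 * v - 1  # fold signed values: 0,-1,1,-2 -> 0,1,2,3
        total += (u >> k) + 1 + k  # unary quotient, stop bit, k-bit remainder
    return total


def best_rice_bits(residual: list[int]) -> int:
    """Smallest size over k = 0..14 (one parameter for the whole block)."""
    return min(rice_bits(residual, k) for k in range(15))


loud = [1000, -1000] * 50  # residual when nothing was cancelled
quiet = [1, -1] * 50       # residual after a near match was subtracted
print(best_rice_bits(loud), best_rice_bits(quiet))  # 1200 250
```
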

> And I don't really know how many matching frames you'd find in music
> out there, it would be very genre-dependent. But I'm surprised that no
> one really investigated this (there were old discussions in that
> forum). Sure, streaming is important, but it's common to fully
> download a song.

The repetition would not be genre-dependent, but would be
tempo-dependent. I suppose you could say that certain genres might
have a prevalent tempo, but there is enough variation within each
genre to make the problem as hard as non-genre-dependent matching.

People have investigated this, but perhaps not at the macro level as  
is being discussed here.  I think you'll discover that finding a  
match within a song is very difficult.  You could perhaps start with  
BPM detection code, and then try to find repetitions based upon  
tempo, but even if you find matches this way, you still would need to  
find some way to squeeze the repetitions into whole frames, which are  
not divisions of tempo.

Feel free to experiment.  The FLAC library makes it possible for you  
to work at the high level without writing everything yourself.

> At the same time this wouldn't be very interesting for my need, which
> is to compress short samples. Now here too there could be a similar
> algo: if it's tonal, cross-correlation would detect matching frames,
> only at a smaller level. Imagine if you convert a violin sound into a
> pitch period somewhere in its middle, and the residual from the
> subtraction of that pitch period in repeated frames. I think the
> residual would be rather quiet.

If you're going to use the pitch period from the middle of the violin
sound as the predictor, then you need a way for your encoder and
decoder to find the exact same waveform. If your decoder can discover
the predicted values on its own, then FLAC would be a successful way
to compress the residual.
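
One minimal way to meet that requirement, assuming the pitch period (in whole samples) is transmitted alongside the data, is to predict each sample from the one a period earlier: the decoder has already reconstructed that sample, so it can make the identical guess. The function names and the integer-period assumption are illustrative only; a real violin tone would not repeat after a whole number of samples, so its residual would shrink but not vanish.

```python
def pitch_residual(x: list[int], period: int) -> list[int]:
    """Encoder: subtract the sample one pitch period earlier (first period kept as-is)."""
    return [x[n] - (x[n - period] if n >= period else 0) for n in range(len(x))]


def pitch_restore(residual: list[int], period: int) -> list[int]:
    """Decoder: add back the already-decoded sample one period earlier."""
    x: list[int] = []
    for n, r in enumerate(residual):
        x.append(r + (x[n - period] if n >= period else 0))
    return x


tone = [0, 5, 9, 5, 0, -5, -9, -5] * 4  # idealized tone with an 8-sample period
res = pitch_residual(tone, 8)
print(res[8:] == [0] * 24)            # True: only the first period survives
print(pitch_restore(res, 8) == tone)  # True: lossless round trip
```

The residual would then go to FLAC as ordinary audio.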

Brian Willoughby
Sound Consulting

Didier Dambrin

2009-Aug-09 06:11 UTC


[Flac-dev] alternate compression

I'm doing testing on this at the moment.

But to start with:

> Because music represents an analog signal,

As I wrote, it would only apply to specific genres, not analog
recordings. Electronic music quite often doesn't leave a computer
these days. And it mainly consists of drums, synths & vocals/effects.
Drums are often samples sequenced at sample (not sub-sample) accuracy,
thus repeated (of course if the song was post-resampled, there will be
sub-sample times). Synths are a problem, as the riffs will have more
variations, and free-running oscillators will also cause trouble.
But remember that it's not about finding perfect matches, which will
very rarely happen; it's about correlating signals to leave a residual,
as long as that compensates (repeats enough) for the extra frame pool
you'd have to store.

> The repetition would not be genre-dependent, but would be
> tempo-dependent.

It would be very genre-dependent for the reason I explained: samples.
Say you have a drummer repeating a drumloop. If it's recorded, there's
no chance the noise of the drums will correlate; it will change all
the time. But if it's a drum sample, they will match. It's easy to
correlate similar kick drums, but that hardly works with cymbals.



Anyway, right now I get what I wanted somewhat working, well enough 
considering I've only spent a couple of hours.

I've done matching-frame detection at small & big levels. Small is
just a couple of samples, but I also tried big tempo-synced blocks
(this wasn't a problem, I have a good tempo detector, and btw tempo
detection can also be done using cross-correlation in a similar
matching-frame way).
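
As a rough illustration of detecting a repetition period by correlation: correlate the signal with a delayed copy of itself and keep the lag with the highest normalized score, which can then be converted to BPM. This is a generic autocorrelation sketch, not Didier's detector (the post doesn't describe it); it assumes a mono list of samples and a known lag range with `max_lag` shorter than the signal, and `best_lag` and `lag_to_bpm` are hypothetical names.

```python
import math


def best_lag(x: list[float], min_lag: int, max_lag: int) -> int:
    """Lag in [min_lag, max_lag] where x best matches a delayed copy of itself.

    Assumes 1 <= min_lag <= max_lag < len(x).
    """
    best, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        a, b = x[:-lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        score = num / den if den else 0.0
        if score > best_score:
            best, best_score = lag, score
    return best


def lag_to_bpm(lag: int, sample_rate: int, beats_per_lag: int = 1) -> float:
    """Convert a repetition period in samples to beats per minute."""
    return 60.0 * sample_rate * beats_per_lag / lag


pattern = [3, -1, 4, 1, -5, 9, -2]
print(best_lag(pattern * 10, 2, 12))  # 7
```

On real music this would typically be run on a downsampled onset or energy envelope rather than raw samples, which is part of why it "takes ages" otherwise.
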
It works (& takes ages btw). For example, I take a typical piece of
music with a drumloop repeated & a varying synth line over it. It
detects the repeated drumloop and subtracts it, so that for some parts
there's just the synth line alone, without the drumloop anymore. And
of course it totally fails for other songs I tested. So I'm sure it
could end up working well with a lot more time.

...But sadly, none of FLAC, WavPack or OptimFrog could compress the
pre-processed song better, or only barely. And considering you'd also
have to add the pool of frames, it would end up worse.

The problem is the discontinuities I think. Say you work with little, 
non-tempo-synced frames, and you find a matching frame, which you subtract 
from the song at the places it matches. You'll have a discontinuity around 
it. If the frames around this one also match, it doesn't matter as they will
be subtracted as well. But if they don't (enough), the discontinuity will 
stay.
I also tried windowing the frame before subtracting it, no more 
discontinuity but with small frames it's not very useful anymore.

But if I run the pre-processing on something perfectly repeated several 
times, it really finds the frames, and it doesn't require knowing the tempo.
If you don't know the tempo, the only problem will be misalignment, which 
will leave little bits of audio that were too short to find matching frames, 
but most of the processed waveform will still be silence.


So, I don't think I will test this further (it's the kind of thing you
can spend months on only to give up eventually), but I think it has
potential, just maybe not coupled with existing lossless compression
methods.
After all, compression is about finding how data repeats, and music
clearly repeats. It would also be useful for lossy compression: say
you have a drummer playing a loop twice; a lossy compressor could
assume the second is the same as the first, even if it doesn't match
perfectly. But really, I think a compressor should compress music at
the level at which it repeats, and current compressors seem to work at
a smaller time scale.
(& in any case it would also require a huge compression time, unless
matching-frame detection is done heavily in parallel, maybe on a GPU)


So the algo I tried is roughly:
- pick frames from the waveform, one by one
- cross-correlate the frame with the rest of the waveform
- check the correlation result & whenever there's a strong match, look
  around it for an even better match that's close enough
- for each match, subtract the frame from the waveform (tried with &
  without windowing). This may also be improved if you normalize the
  frame to the matching one (haven't tried)
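
The four steps above can be sketched roughly as follows. This is an illustrative reconstruction, not Didier's code: it assumes integer samples, uses normalized cross-correlation with an arbitrary 0.99 threshold, treats "close enough" as "within one frame length", requires at least two matches (echoing "starting with a couple of matches"), and skips windowing and normalization. The frame's own position normally counts as one of its matches, so that spot becomes silent. Because everything is integer subtraction, the decoder undoes it exactly by adding each pooled frame back. All names and defaults are hypothetical.

```python
import math


def normalized_corr(a: list[int], b: list[int]) -> float:
    """Correlation of two equal-length frames, scaled to the range [-1, 1]."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den else 0.0


def find_matches(x: list[int], start: int, size: int, threshold: float) -> list[int]:
    """Positions where x[start:start+size] matches strongly, best match per neighborhood."""
    frame = x[start:start + size]
    scored = sorted(
        ((normalized_corr(frame, x[p:p + size]), p) for p in range(len(x) - size + 1)),
        reverse=True,
    )
    matches: list[int] = []
    for score, p in scored:
        if score < threshold:
            break
        # A weaker hit within one frame of a better one is the same repetition.
        if all(abs(p - q) >= size for q in matches):
            matches.append(p)
    return sorted(matches)


def preprocess(x: list[int], size: int, threshold: float = 0.99, min_matches: int = 2):
    """Encoder: move repeated frames into a pool and subtract them from the song."""
    residual = list(x)
    pool: list[tuple[list[int], list[int]]] = []  # (frame samples, positions)
    for start in range(0, len(x) - size + 1, size):
        frame = residual[start:start + size]  # a copy, unaffected by later edits
        if not any(frame):
            continue  # already cancelled by an earlier pool frame
        positions = find_matches(residual, start, size, threshold)
        if len(positions) < min_matches:
            continue  # not repeated often enough to pay for its pool entry
        for p in positions:
            for i, v in enumerate(frame):
                residual[p + i] -= v
        pool.append((frame, positions))
    return residual, pool


def restore(residual: list[int], pool) -> list[int]:
    """Decoder: add every pooled frame back at each recorded position."""
    x = list(residual)
    for frame, positions in pool:
        for p in positions:
            for i, v in enumerate(frame):
                x[p + i] += v
    return x


loop = [3, -1, 4, 1, -5, 9, -2, 6]
song = loop * 4
residual, pool = preprocess(song, size=8)
print(residual == [0] * 32, pool[0][1])  # True [0, 8, 16, 24]
assert restore(residual, pool) == song
```

The brute-force scan over every position is what makes this slow; each frame costs roughly len(x) * size multiplications.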


Btw, are all lossless compression methods working in the time domain?




Brian Willoughby

2009-Aug-09 19:33 UTC


[Flac-dev] alternate compression

On Aug 8, 2009, at 23:11, Didier Dambrin wrote:

> Electronic music quite often doesn't leave a computer these days. And
> it mainly consists of drums, synths & vocals/effects. Drums are often
> samples sequenced at sample (not sub-sample) accuracy, thus repeated
> (of course if the song was post-resampled, there will be sub-sample
> times).

Good point.  I have certainly seen songs which were at a fixed tempo,  
say 128 BPM, and were so precise that you could cut and paste pieces  
of the song without glitches.  Every measure lined up closely enough  
with the others that you could separate instruments from each other  
by subtracting out the repeated patterns.

> Synths are a problem, as the riffs will have more variations, and
> free-running oscillators will also cause trouble.

Not only that, but some synths are oversampled, so you have the
"analog" problem of waveforms offset by fractions of a sample.

> Anyway, right now I get what I wanted somewhat working, well enough
> considering I've only spent a couple of hours.

Excellent!

> ...But sadly, none of FLAC, WavPack or OptimFrog could compress the
> pre-processed song better, or only barely. And considering you'd also
> have to add the pool of frames, it would end up worse.

This surprises me. Have you tried aligning your frames to the
standard FLAC frame size?

As for the pool, it seems like the first occurrence of a repetition
would compress as usual, and the subsequent ones would compress
better than usual.

> The problem is the discontinuities I think. Say you work with little,
> non-tempo-synced frames, and you find a matching frame, which you
> subtract from the song at the places it matches. You'll have a
> discontinuity around it. If the frames around this one also match, it
> doesn't matter as they will be subtracted as well. But if they don't
> (enough), the discontinuity will stay.

You may be right about the discontinuities. Have you tried making
your transitions only at zero-crossings?
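
The zero-crossing suggestion could be tried by snapping each match boundary to the nearest sign change before subtracting, so the cut starts near zero amplitude. A minimal sketch; `nearest_zero_crossing` and its `search` radius are illustrative assumptions. The snapped positions would simply be stored in the pool like any other match position, so the decoder does not need to recompute them.

```python
from typing import Optional


def nearest_zero_crossing(x: list[int], index: int, search: int = 64) -> Optional[int]:
    """Closest i to `index` where the signal is zero or changes sign between i-1 and i."""
    lo, hi = max(1, index - search), min(len(x) - 1, index + search)
    crossings = [
        i for i in range(lo, hi + 1)
        if x[i] == 0 or (x[i - 1] < 0) != (x[i] < 0)
    ]
    return min(crossings, key=lambda i: abs(i - index)) if crossings else None


wave = [5, 3, 1, -2, -4, -1, 2, 6]
print(nearest_zero_crossing(wave, 4, search=4))  # 3
print(nearest_zero_crossing(wave, 6, search=4))  # 6
```
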

> I also tried windowing the frame before subtracting it, no more
> discontinuity but with small frames it's not very useful anymore.

Windowing may seem like a good idea, but remember that your decoder
will have to recreate every step that your encoder uses so that it
can be undone. Thus, windowing may make it difficult to stay lossless.
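
One way to meet that condition, sketched here as an assumption rather than something tried in the thread, is to build and apply the window with integer arithmetic only. The decoder then recomputes bit-identical windowed values from the pooled frame and adds them back, whereas a floating-point window risks tiny rounding differences between encoder and decoder builds. The triangular shape, the 15 fractional bits and all names are illustrative choices.

```python
Q = 15  # fractional bits of the fixed-point window (assumed)


def integer_triangle_window(size: int) -> list[int]:
    """Triangular window with values in [0, 1 << Q], computed with integers only."""
    half = (size + 1) // 2
    return [(min(i + 1, size - i) << Q) // half for i in range(size)]


def windowed(frame: list[int], window: list[int]) -> list[int]:
    """Apply the window with an integer multiply and floor shift (deterministic)."""
    return [(s * w) >> Q for s, w in zip(frame, window)]


def subtract_windowed(x: list[int], frame: list[int], pos: int) -> list[int]:
    """Encoder: remove the windowed frame at `pos`."""
    out = list(x)
    for i, v in enumerate(windowed(frame, integer_triangle_window(len(frame)))):
        out[pos + i] -= v
    return out


def add_windowed(r: list[int], frame: list[int], pos: int) -> list[int]:
    """Decoder: recompute the identical windowed values and add them back."""
    out = list(r)
    for i, v in enumerate(windowed(frame, integer_triangle_window(len(frame)))):
        out[pos + i] += v
    return out


song = [100, -250, 375, 12, -7, 999, -1000, 3, 41, -41]
frame = [90, -240, 380, 10]
residual = subtract_windowed(song, frame, 2)
print(add_windowed(residual, frame, 2) == song)  # True
```
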

> But if I run the pre-processing on something perfectly repeated
> several times, it really finds the frames, and it doesn't require
> knowing the tempo.
> If you don't know the tempo, the only problem will be misalignment,
> which will leave little bits of audio that were too short to find
> matching frames, but most of the processed waveform will still be
> silence.

It seems like knowing the tempo would allow the encoding phase to
take far less time. It makes sense that you don't absolutely "need"
it, but you did say it takes a really long time to find matches.
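
To put a number on that: with a known tempo, the search for repeats of a frame can be limited to small windows around whole-beat offsets instead of every sample position. A sketch under assumed parameters (`tolerance` in samples, only later beats searched); the function name is hypothetical.

```python
def beat_candidates(start: int, limit: int, sample_rate: int, bpm: float,
                    tolerance: int = 32) -> list[int]:
    """Positions within +/- tolerance of start + k beats (k >= 1) that lie in [0, limit)."""
    beat = round(60 * sample_rate / bpm)  # samples per beat
    out = []
    k = 1
    while start + k * beat - tolerance < limit:
        center = start + k * beat
        out.extend(p for p in range(center - tolerance, center + tolerance + 1)
                   if 0 <= p < limit)
        k += 1
    return out


cands = beat_candidates(0, 100_000, 48_000, 140.625, tolerance=2)
print(len(cands), cands[:5])  # 20 [20478, 20479, 20480, 20481, 20482]
```

Here 20 candidate positions replace 100,000; each would then be scored with the same cross-correlation as before.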

> Btw, are all lossless compression methods working in the time domain?

I would guess that most lossless audio compression methods work in the
time domain. However, LJPG (lossless JPEG) uses a very efficient lossy
compression followed by lossless compression of the difference. I
wouldn't be surprised if there is an audio codec which combines lossy
frequency-domain compression with lossless compression of the
difference between the lossy version and the original. If there
isn't, then I'll just patent that...
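
The lossy-plus-correction idea can be sketched in a few lines. Here a coarse uniform quantizer stands in for the lossy codec (an assumption; it is not a frequency-domain coder), and the correction is simply original minus lossy. As with windowing, the scheme only stays lossless if the decoder reproduces the lossy output bit for bit, which is why the quantizer uses integer floor division. Names are illustrative.

```python
def lossy_layer(x: list[int], step: int) -> list[int]:
    """Stand-in lossy codec: round each sample to the nearest multiple of `step`."""
    return [((v + step // 2) // step) * step for v in x]


def encode(x: list[int], step: int) -> tuple[list[int], list[int]]:
    """Split the signal into a lossy approximation and a small correction signal."""
    approx = lossy_layer(x, step)
    correction = [v - a for v, a in zip(x, approx)]  # small values, cheap to code losslessly
    return approx, correction


def decode(approx: list[int], correction: list[int]) -> list[int]:
    """Exact reconstruction: lossy output plus correction."""
    return [a + c for a, c in zip(approx, correction)]


samples = [1003, -517, 64, 0, -1, 32767, -32768]
approx, correction = encode(samples, step=256)
assert all(-128 <= c <= 127 for c in correction)
assert decode(approx, correction) == samples
```

A player could stop at `approx` for a lossy result, or add `correction` to recover the original exactly.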

Brian Willoughby
Sound Consulting
