I'm doing testing on this at the moment.
But to start with:
>>Because music represents an analog signal,
As I wrote, it would only apply to specific genres, not analog recordings.
Electronic music quite often doesn't leave a computer these days. And it
mainly consists of drums, synths & vocals/effects. Drums are often samples
sequenced at sample (not sub-sample) accuracy, thus repeated (of course if
the song was post-resampled, there will be sub-sample times). Synths are a
problem, as the riffs will have more variations, and also free-running
oscillators will give troubles.
But remember that it's not about finding perfect matches which will very
rarely happen, just correlating signals to leave a residual, as long as it
compensates (repeats enough) for the extra frame pool you'd have to store.
>>The repetition would not be genre-dependent, but would be
tempo-dependent.
It would be very genre-dependent for the reason I explained: samples. Say
you have a drummer repeating a drumloop. If it's recorded, there's no chance
the noise of the drums will correlate, it will change all the time. But if
it's a drum sample, they will match. It's easy to correlate similar
kickdrums, but that hardly works with cymbals.
Anyway, right now I get what I wanted somewhat working, well enough
considering I've only spent a couple of hours.
I've done matching-frame-detection at small & big level. Small is just a
couple of samples, but I also tried big tempo-synced blocks (this wasn't a
problem, I have a good tempo detector, and btw tempo detection can also be
done using cross-correlation in a similar matching-frame way).
It works (& takes ages btw), for ex I take a typical piece of music with a
drumloop repeated & a varying synth line over it. It detects the repeated
drumloop, subtracts it, so that for some parts there's just the synth line
alone, without the drumloop anymore. And of course it totally fails for
other songs I tested. So I'm sure it could end up working well with a lot
more time.
But sadly none of FLAC, WavPack or OptimFROG could compress the
pre-processed song better, or only barely. And considering you'd also have to
add the pool of frames, it would end up worse.
The problem is the discontinuities I think. Say you work with little,
non-tempo-synced frames, and you find a matching frame, which you subtract
from the song at the places it matches. You'll have a discontinuity around
it. If the frames around this one also match, it doesn't matter as they will
be subtracted as well. But if they don't (enough), the discontinuity will
stay.
I also tried windowing the frame before subtracting it, no more
discontinuity but with small frames it's not very useful anymore.
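
To illustrate the trade-off, here is a minimal Python sketch, not the code actually used for the tests. It assumes a Hann window and a synthetic cosine as the "song"; the names hann, subtract_frame and max_jump are made up for the example. It subtracts one perfectly matching frame whose neighbours are left untouched, once as-is and once windowed, then compares the largest sample-to-sample jump and the energy left inside the frame.

```python
import math

def hann(length):
    """Hann window of the given length (zero at both ends)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (length - 1))
            for k in range(length)]

def subtract_frame(signal, frame, pos, window=None):
    """Return a copy of signal with frame (optionally windowed) subtracted at pos."""
    out = list(signal)
    for k, value in enumerate(frame):
        gain = 1.0 if window is None else window[k]
        out[pos + k] -= gain * value
    return out

def max_jump(x):
    """Largest absolute difference between consecutive samples."""
    return max(abs(b - a) for a, b in zip(x, x[1:]))

# Synthetic "song": a cosine with a 16-sample period, so any 16-sample
# frame repeats exactly every 16 samples.
period, frame_len = 16, 16
signal = [math.cos(2 * math.pi * n / period) for n in range(64)]
frame = signal[0:frame_len]

# Subtract the frame at one matching spot only (its neighbours are left alone).
hard = subtract_frame(signal, frame, 32)
soft = subtract_frame(signal, frame, 32, hann(frame_len))

hard_jump, soft_jump = max_jump(hard), max_jump(soft)
hard_energy = sum(v * v for v in hard[32:48])
soft_energy = sum(v * v for v in soft[32:48])
print("max jump, plain vs windowed:", hard_jump, soft_jump)
print("energy left in frame, plain vs windowed:", hard_energy, soft_energy)
```

The plain subtraction silences the frame but leaves a step at each edge, while the windowed one has no step but keeps part of the audio near the edges, which is why it helps little with small frames.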
But if I run the pre-processing on something perfectly repeated several
times, it really finds the frames, and it doesn't require knowing the tempo.
If you don't know the tempo, the only problem will be misalignment, which
will leave little bits of audio that were too short to find matching frames,
but most of the processed waveform will still be silence.
So, I don't think I will test this further (it's the kind of thing you can
spend months on to eventually give up), but I think it has potential, just
maybe not coupled with existing lossless compression methods.
After all, compression is about finding how data repeats, and music clearly
repeats. It would also be useful for lossy compression: say you have a
drummer playing a loop twice, a lossy compressor could assume the second is
the same as the first, even if it doesn't match perfectly. But really, I think a
compressor should compress music at the level it repeats, and the current
compressors seem to work at a smaller time scale.
(& in any case it would also require a huge compressing time, unless
matching frames detection is done heavily in parallel using GPU maybe)
So the algo I tried is roughly:
-pick frames from the waveform, 1 by 1
-cross-correlate the frame with the rest of the waveform
-check the correlation result & whenever there's a strong match, look around
it to see if there's an even better match that's close enough
-for each match, subtract the frame from the waveform (tried with & without
windowing). This may also be improved if you normalize the frame to the
matching one (haven't tried)
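
The steps above could be sketched roughly like this in Python. This is only an illustration under assumptions, not the actual test code: it uses fixed-size, non-overlapping frames, a brute-force normalized correlation, a made-up threshold of 0.9, and integer samples so that the round trip is exact. The names similarity, extract_repeats and restore are hypothetical, and windowing and gain normalization are left out.

```python
import math

def similarity(frame, signal, pos):
    """Normalized correlation between frame and the signal segment at pos."""
    seg = signal[pos:pos + len(frame)]
    dot = sum(a * b for a, b in zip(frame, seg))
    energy = sum(a * a for a in frame) * sum(b * b for b in seg)
    return dot / math.sqrt(energy) if energy > 0 else 0.0

def extract_repeats(signal, frame_len, threshold=0.9):
    """Subtract repeated frames from the signal; return (residual, pool)."""
    residual = list(signal)
    pool = []  # entries are (frame, positions where it was subtracted)
    for start in range(0, len(residual) - frame_len + 1, frame_len):
        frame = residual[start:start + frame_len]
        first = start + frame_len  # only search after the frame itself
        scores = [similarity(frame, residual, p)
                  for p in range(first, len(residual) - frame_len + 1)]
        matches = []
        i = 0
        while i < len(scores):
            if scores[i] >= threshold:
                # Strong match: look around for an even better one nearby.
                nearby = scores[i:i + frame_len]
                best = i + nearby.index(max(nearby))
                matches.append(first + best)
                i = best + frame_len  # skip past it so matches never overlap
            else:
                i += 1
        if matches:
            positions = [start] + matches
            for pos in positions:
                for k in range(frame_len):
                    residual[pos + k] -= frame[k]
            pool.append((frame, positions))
    return residual, pool

def restore(residual, pool):
    """Decoder side: add the pooled frames back where they were subtracted."""
    signal = list(residual)
    for frame, positions in pool:
        for pos in positions:
            for k, value in enumerate(frame):
                signal[pos + k] += value
    return signal

# Toy example: an 8-sample "drum loop" repeated 4 times.
loop = [3, -1, 4, 1, -5, 9, 2, -6]
song = loop * 4
residual, pool = extract_repeats(song, frame_len=8)
print(residual, len(pool), restore(residual, pool) == song)
```

In this sketch the pool stores each frame explicitly, so the first occurrence is subtracted too and a perfectly repeated loop leaves pure silence. The brute-force search grows roughly with the square of the song length, which matches the "takes ages" remark.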
Btw, are all lossless compression methods working in the time domain?
>> Btw, what do you think of this?
>> http://www.hydrogenaudio.org/forums/index.php?s=95a0210a0ba3304eca44ac3bd57990cb&showtopic=73895
>> (didn't know where to post this, that forum seemed related)
>
> That article is very naive, or at least the way it is described is
> very naive. Real music does not repeat in terms of whole frames.
> Frames are a completely artificial creation of the digital world, and
> frame timing does not correspond to the timing of repetitions in
> music. Because music represents an analog signal, the repetition
> could occur at a fraction of a frame, or even a fraction of a
> sample. Compressing a drum loop would require a lot of tricks to
> detect the repetition unless the frame size were somehow luckily
> aligned with the tempo. Maybe a song with 70.3125 BPM or 140.625 BPM
> could be compressed this way, but most music will not have such a
> precise tempo - in fact, tempo may drift if a live band is recorded.
>
>
>> So I thought: imagine a pre-processing coupled with FLAC. It would take
>> frames out of the whole song, and try to cross-correlate them with the song
>> itself. When it finds strong matches (under a certain threshold, and
>> starting with a couple of matches), the frame is saved to a pool, and it's
>> subtracted from the song.
>> Then you FLAC the (small) pool, and the song, full of near-silent spots (&
>> silence where pure repetitions occurred).
>> At decode time, you unFLAC the pool and the song, and you add back the
>> frames from the pool to the song.
> This might work, but you would have to be very lucky to find matches
> given the block size of FLAC (or the frame size of any format, for
> that matter). But, you're right, if you can predict the waveform
> with reasonable accuracy, then you can reduce the size.
>
> FLAC and many other compression algorithms do, in fact, use this
> technique. They look at the music, predict future samples, and then
> encode the difference between the predicted value and the actual value.
>
> It's doubtful that you could find a better algorithm at predicting
> the waveform, but if you do, then FLAC will work well with your added
> processing layer.
>
>
>> I haven't experimented yet, but let's say I try to correlate frames with the
>> song, and I get something like 20 near-repeats, I may end up with a very
>> silent "song leftover", still as long as the song, but maybe in 4bits worth
>> or something? But it would also have bumps of original audio (that didn't
>> find any matching frame).
>> The thing is, I don't really know how FLAC compresses so I don't know if it
>> would compress the "leftover" so much better.
> It's doubtful that you could find such repetition, given that the
> frame size has nothing to do with the tempo of the song, and
> repetitions in music are based on tempo. But, if you could find a
> match or even a near match, then FLAC would compress the difference
> better than the original.
>
>> And I don't really know how many matching frames you'd find in music out
>> there, it would be very genre-dependent. But I'm surprised that no one
>> really investigated this (there were old discussions in that forum). Sure,
>> streaming is important, but it's common to fully download a song.
> The repetition would not be genre-dependent, but would be
> tempo-dependent. I suppose you could say that certain genres might have a
> prevalent tempo, but there is enough variation within each genre to
> make the problem as big as non-genre-dependent matching.
>
> People have investigated this, but perhaps not at the macro level as
> is being discussed here. I think you'll discover that finding a
> match within a song is very difficult. You could perhaps start with
> BPM detection code, and then try to find repetitions based upon
> tempo, but even if you find matches this way, you still would need to
> find some way to squeeze the repetitions into whole frames, which are
> not divisions of tempo.
>
> Feel free to experiment. The FLAC library makes it possible for you
> to work at the high level without writing everything yourself.
>
>> At the same time this wouldn't be very interesting for my need, which is to
>> compress short samples. Now here too there could be a similar algo: if it's
>> tonal, cross-correlation would detect matching frames, only at a smaller
>> level. Imagine if you convert a violin sound into a pitch period taken
>> somewhere in its middle, plus the residual from the subtraction of that
>> pitch period in repeated frames. I think the residual would be rather quiet.
>
> If you're going to use the primary violin sound's middle pitch as the
> predictor, then you need a way for your encoder and decoder to find
> the exact same waveform. If you can do this in a way that your
> decoder could discover the predicted values, then FLAC would be a
> successful way to compress the residual.
>
> Brian Willoughby
> Sound Consulting
>