--- Brady Patterson <brady@spaceship.com> wrote:
> This is in response to Miroslav's idea about variable block sizes. I may be a
> bit out of my league here as I'm just starting to look at how the actual
> encoding gets done. But it seems to me that you could make a decent guess
> about when something "new" happens based on the second derivative of the
> signal (where the first derivative is the difference between a given sample
> and the previous, and the second is you-get-the-idea).
>
> Here's my rationale: high-amplitude, high-frequency sections are the hard
> ones to encode, or at least will work best in their own frame. Those
> characteristics imply a high first derivative. You want to put such sections
> in their own block, and boundaries of such blocks will be where the second
> derivative is relatively high.
>
> Okay, that's not quite right, since the first derivative will be negative
> about half the time, and large negative has the same effect as large
> positive. So I think what you really want is the first derivative of the
> absolute value of the first derivative.
>
> Then there's the question of where to put the boundaries. Some
> trial-and-error is probably the best approach here. For files on which the
> above formula is consistently high, it will probably be desirable to set the
> limit high to avoid too much frame overhead.

I had done some experiments a while back with varying the blocksize. My
initial approach was to do an exhaustive search on some clips just to see
where the upper limit of improvement was, and it turned out not to be that
great. If I remember right it was something like <1% compression improvement.
In retrospect I probably would have designed the format with more restrictions
on the blocksize to make the decoder simpler.

So I guess I would say before trying really complicated algos, brute force it
on a couple of samples to see if what you end up with will be worth it. My
conclusion at the time was that varying the blocksize would probably only make
sense for things like sound fonts.

Josh
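For anyone wanting to repeat that kind of experiment, here is a rough sketch
of a brute-force search in the spirit Josh describes. This is not his original
test code: the boundary grid, the block-length cap, the per-frame overhead
constant, and the order-2 residual cost are all invented stand-ins for what
the real encoder actually spends, so the result is only a rough upper bound.

/* Sketch only: pick block boundaries on a fixed grid by exhaustive (dynamic
 * programming) search, scoring each block by the sum of absolute residuals of
 * an order-2 fixed predictor plus a made-up per-frame overhead.  This is a
 * crude stand-in for real FLAC coding cost. */
#include <stdlib.h>

#define GRID      576      /* boundary granularity in samples (arbitrary)   */
#define MAXMULT   8        /* cap blocks at 8*GRID = 4608 samples           */
#define OVERHEAD  200.0    /* pretend per-frame header cost, in cost units  */

/* crude cost of coding samples [start, end) with an order-2 fixed predictor */
static double block_cost(const int *x, long start, long end)
{
    double cost = OVERHEAD;
    long i;
    for (i = start; i < end; i++) {
        long pred = (i >= 2) ? 2L * x[i-1] - x[i-2] : (i >= 1 ? x[i-1] : 0);
        long r = x[i] - pred;
        cost += (r < 0) ? -r : r;
    }
    return cost;
}

/* cheapest way to cover x[0 .. (n/GRID)*GRID) when boundaries may fall on
 * any multiple of GRID and blocks are at most MAXMULT*GRID samples long */
static double best_variable_cost(const int *x, long n)
{
    long nb = n / GRID, i, j;
    double *best = malloc((nb + 1) * sizeof *best);
    double result;

    best[0] = 0.0;
    for (j = 1; j <= nb; j++) {
        best[j] = -1.0;
        for (i = (j > MAXMULT ? j - MAXMULT : 0); i < j; i++) {
            double c = best[i] + block_cost(x, i * GRID, j * GRID);
            if (best[j] < 0.0 || c < best[j])
                best[j] = c;
        }
    }
    result = best[nb];
    free(best);
    return result;
}

Summing block_cost() over consecutive fixed 4608-sample blocks on the same
clip and comparing against best_variable_cost() gives a quick feel for the
ceiling before investing in a cleverer boundary heuristic.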
On Thu, Oct 17, 2002 at 09:51:02AM -0500, Brady Patterson wrote:
>
> Okay, I deleted most of this thread, so I was waiting for another message to
> respond to, so unfortunately this will be out of place in the thread.
>
> This is in response to Miroslav's idea about variable block sizes. I may be a
> bit out of my league here as I'm just starting to look at how the actual
> encoding gets done. But it seems to me that you could make a decent guess
> about when something "new" happens based on the second derivative of the
> signal (where the first derivative is the difference between a given sample
> and the previous, and the second is you-get-the-idea).
>
> Here's my rationale: high-amplitude, high-frequency sections are the hard
> ones to encode, or at least will work best in their own frame. Those
> characteristics imply a high first derivative. You want to put such sections
> in their own block, and boundaries of such blocks will be where the second
> derivative is relatively high.
>
> Okay, that's not quite right, since the first derivative will be negative
> about half the time, and large negative has the same effect as large
> positive. So I think what you really want is the first derivative of the
> absolute value of the first derivative.
>
> Then there's the question of where to put the boundaries. Some
> trial-and-error is probably the best approach here. For files on which the
> above formula is consistently high, it will probably be desirable to set the
> limit high to avoid too much frame overhead.
>
> Hope this was interesting and/or useful :) .

Well, I took 10 CDs and tested my primitive implementations of these algos.
Here are my results:

                  size   size/(0)  size/(1)  time/(1)
(0)   6401778544  1.0000
(1)   4193699407  0.6551   1.0000    1.00
(2)   4180011683  0.6529   0.9967    1.18
(3)   4186509853  0.6540   0.9983    1.15

"best" CD:
(0)    503448568  1.0000
(1)    349525363  0.6942   1.0000
(2)    347167639  0.6896   0.9933
(3)    347864119  0.6910   0.9952

"best" track:
(0)     44111804  1.0000
(1)     28091683  0.6368   1.0000
(2)     27769870  0.6295   0.9885
(3)     27864205  0.6317   0.9919

where:
(0) wav files
(1) flac files, fixed blocksize 4608
(2) flac files, variable blocksize, "lpc idea"
(3) flac files, variable blocksize, watching the average of absolute
    values of the first and second derivative

-- 
Miroslav Lichvar
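Miroslav doesn't post his code, so the following is only a guess at what a
detector like method (3) might look like: keep a long-term running average of
the absolute first and second differences over the current block plus a
short-term average, and cut a new block when the short-term activity pulls
well away from the long-term level. The window length, jump ratio, and block
size limits are all invented for the sketch and are not what he actually used.

#define MIN_BLOCK   1152
#define MAX_BLOCK   4608
#define SHORT_WIN   64       /* samples in the short-term average            */
#define JUMP_RATIO  2.0      /* cut when short-term exceeds long-term by this */

/* Writes chosen block-end positions (sample indices) into bounds[] and
 * returns how many were written; bounds[] must hold n/MIN_BLOCK + 1 entries. */
static long choose_boundaries(const int *x, long n, long *bounds)
{
    long nb = 0, start = 0, count = 0, i;
    double long_avg = 0.0, short_avg = 0.0;

    for (i = 2; i < n; i++) {
        double d1 = (double)x[i] - x[i-1];
        double d2 = ((double)x[i] - x[i-1]) - ((double)x[i-1] - x[i-2]);
        double a  = (d1 < 0 ? -d1 : d1) + (d2 < 0 ? -d2 : d2);
        long len;

        count++;
        long_avg  += (a - long_avg) / count;   /* mean since block start */
        short_avg += (a - short_avg) / (count < SHORT_WIN ? count : SHORT_WIN);

        len = i - start;
        if ((len >= MIN_BLOCK && short_avg > JUMP_RATIO * long_avg)
            || len >= MAX_BLOCK) {
            bounds[nb++] = i;                  /* current block is [start, i) */
            start = i;
            long_avg = short_avg = 0.0;
            count = 0;
        }
    }
    bounds[nb++] = n;                          /* last block runs to the end */
    return nb;
}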
--- Miroslav Lichvar <lichvarm@phoenix.inf.upol.cz> wrote:
> On Thu, Oct 17, 2002 at 09:51:02AM -0500, Brady Patterson wrote:
> > ... But it seems to me that you could make a decent guess
> > about when something "new" happens based on the second derivative of the
> > signal (where the first derivative is the difference between a given sample
> > and the previous, and the second is you-get-the-idea).
> >
> > Here's my rationale: high-amplitude, high-frequency sections are the hard
> > ones to encode, or at least will work best in their own frame. Those
> > characteristics imply a high first derivative. You want to put such
> > sections in their own block, and boundaries of such blocks will be where
> > the second derivative is relatively high.
> >
> > Okay, that's not quite right, since the first derivative will be negative
> > about half the time, and large negative has the same effect as large
> > positive. So I think what you really want is the first derivative of the
> > absolute value of the first derivative.
> >
> > Then there's the question of where to put the boundaries. Some
> > trial-and-error is probably the best approach here. For files on which the
> > above formula is consistently high, it will probably be desirable to set
> > the limit high to avoid too much frame overhead.
>
> Well, I took 10 CDs and tested my primitive implementations of these algos.
> Here are my results:
>
>                   size   size/(0)  size/(1)  time/(1)
> (0)   6401778544  1.0000
> (1)   4193699407  0.6551   1.0000    1.00
> (2)   4180011683  0.6529   0.9967    1.18
> (3)   4186509853  0.6540   0.9983    1.15
>
> "best" CD:
> (0)    503448568  1.0000
> (1)    349525363  0.6942   1.0000
> (2)    347167639  0.6896   0.9933
> (3)    347864119  0.6910   0.9952
>
> "best" track:
> (0)     44111804  1.0000
> (1)     28091683  0.6368   1.0000
> (2)     27769870  0.6295   0.9885
> (3)     27864205  0.6317   0.9919
>
> where:
> (0) wav files
> (1) flac files, fixed blocksize 4608
> (2) flac files, variable blocksize, "lpc idea"
> (3) flac files, variable blocksize, watching the average of absolute
>     values of the first and second derivative

Interesting, it looks like the best case is a ~0.75% increase in compression
for an 18% increase in encode time. The compression increase is similar to my
old brute-force test but much faster. The question is, is it worth it from the
user's point of view?

Josh
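For reference, the ~0.75% figure reads off the "best" track rows above: the
ratio drops from 0.6368 with fixed 4608-sample blocks to 0.6295 with method
(2), about 0.73 percentage points, and the 1.18 in the time column of the
full run is the 18%. A throwaway check of that arithmetic:

#include <stdio.h>

int main(void)
{
    /* "best" track from the table: wav, fixed-4608 flac, "lpc idea" flac */
    const double wav = 44111804.0, fixed4608 = 28091683.0, lpc = 27769870.0;

    printf("ratio, fixed 4608: %.4f\n", fixed4608 / wav);          /* ~0.6368 */
    printf("ratio, lpc idea:   %.4f\n", lpc / wav);                /* ~0.6295 */
    printf("gain: %.2f percentage points\n",
           100.0 * (fixed4608 - lpc) / wav);                       /* ~0.73   */
    printf("extra encode time: %.0f%%\n", 100.0 * (1.18 - 1.00));  /* 18      */
    return 0;
}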
Okay, I deleted most of this thread, so I was waiting for another message to
respond to, so unfortunately this will be out of place in the thread.

This is in response to Miroslav's idea about variable block sizes. I may be a
bit out of my league here as I'm just starting to look at how the actual
encoding gets done. But it seems to me that you could make a decent guess
about when something "new" happens based on the second derivative of the
signal (where the first derivative is the difference between a given sample
and the previous, and the second is you-get-the-idea).

Here's my rationale: high-amplitude, high-frequency sections are the hard ones
to encode, or at least will work best in their own frame. Those
characteristics imply a high first derivative. You want to put such sections
in their own block, and boundaries of such blocks will be where the second
derivative is relatively high.

Okay, that's not quite right, since the first derivative will be negative
about half the time, and large negative has the same effect as large positive.
So I think what you really want is the first derivative of the absolute value
of the first derivative.

Then there's the question of where to put the boundaries. Some trial-and-error
is probably the best approach here. For files on which the above formula is
consistently high, it will probably be desirable to set the limit high to
avoid too much frame overhead.

Hope this was interesting and/or useful :) .

-- 
Brady Patterson (brady@spaceship.com)
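In code, the quantity Brady describes might look something like the sketch
below: the measure is the change in the absolute first difference, and a
boundary is placed where it spikes. The threshold and the minimum block
length are invented numbers; his mail only sketches the rule, so everything
beyond the measure itself is guesswork.

/* Brady's boundary measure: d1[i] = x[i] - x[i-1] is the first difference,
 * and the thing to watch is the change in |d1|, i.e.
 *   m[i] = |x[i] - x[i-1]| - |x[i-1] - x[i-2]|.
 * A boundary is placed where m[i] is large, subject to a minimum block length
 * so frame overhead doesn't eat the gain.  THRESHOLD and MIN_BLOCK are
 * illustration-only values. */

#define THRESHOLD  512     /* "relatively high" is left to trial and error */
#define MIN_BLOCK  1152

static long iabs(long v) { return v < 0 ? -v : v; }

/* Returns the index at which to end the block that starts at 'start',
 * or 'n' if no boundary is triggered before the end of the signal. */
static long next_boundary(const int *x, long n, long start)
{
    long i;
    for (i = start + 2; i < n; i++) {
        long m = iabs(x[i] - x[i-1]) - iabs(x[i-1] - x[i-2]);
        if (i - start >= MIN_BLOCK && m > THRESHOLD)
            return i;
    }
    return n;
}

As Brady says, the right THRESHOLD will vary from file to file; material where
the measure is consistently high wants a higher limit (or a guard like
MIN_BLOCK above) so the frames don't get too small.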
A good first run. I wonder, though, what the distribution of block sizes looks
like, and what the magnitude of the residual is as a function of block index.
I'd honestly expect a more significant improvement from either algorithm.

On Sat, 2002-10-19 at 11:13, Miroslav Lichvar wrote:
> On Thu, Oct 17, 2002 at 09:51:02AM -0500, Brady Patterson wrote:
> >
> > Okay, I deleted most of this thread, so I was waiting for another message
> > to respond to, so unfortunately this will be out of place in the thread.
> >
> > This is in response to Miroslav's idea about variable block sizes. I may
> > be a bit out of my league here as I'm just starting to look at how the
> > actual encoding gets done. But it seems to me that you could make a decent
> > guess about when something "new" happens based on the second derivative of
> > the signal (where the first derivative is the difference between a given
> > sample and the previous, and the second is you-get-the-idea).
> >
> > Here's my rationale: high-amplitude, high-frequency sections are the hard
> > ones to encode, or at least will work best in their own frame. Those
> > characteristics imply a high first derivative. You want to put such
> > sections in their own block, and boundaries of such blocks will be where
> > the second derivative is relatively high.
> >
> > Okay, that's not quite right, since the first derivative will be negative
> > about half the time, and large negative has the same effect as large
> > positive. So I think what you really want is the first derivative of the
> > absolute value of the first derivative.
> >
> > Then there's the question of where to put the boundaries. Some
> > trial-and-error is probably the best approach here. For files on which the
> > above formula is consistently high, it will probably be desirable to set
> > the limit high to avoid too much frame overhead.
> >
> > Hope this was interesting and/or useful :) .
>
> Well, I took 10 CDs and tested my primitive implementations of these algos.
> Here are my results:
>
>                   size   size/(0)  size/(1)  time/(1)
> (0)   6401778544  1.0000
> (1)   4193699407  0.6551   1.0000    1.00
> (2)   4180011683  0.6529   0.9967    1.18
> (3)   4186509853  0.6540   0.9983    1.15
>
> "best" CD:
> (0)    503448568  1.0000
> (1)    349525363  0.6942   1.0000
> (2)    347167639  0.6896   0.9933
> (3)    347864119  0.6910   0.9952
>
> "best" track:
> (0)     44111804  1.0000
> (1)     28091683  0.6368   1.0000
> (2)     27769870  0.6295   0.9885
> (3)     27864205  0.6317   0.9919
>
> where:
> (0) wav files
> (1) flac files, fixed blocksize 4608
> (2) flac files, variable blocksize, "lpc idea"
> (3) flac files, variable blocksize, watching the average of absolute
>     values of the first and second derivative
>
> -- 
> Miroslav Lichvar
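Both diagnostics are easy to bolt onto a prototype: given the block boundaries
the detector chose and the prediction residual, print each block's size and
mean absolute residual in order. A sketch, assuming the boundary and residual
arrays (the names here are hypothetical) already come out of the prototype
encoder:

#include <stdio.h>

/* Print block size and mean |residual| per block, in block order.
 * bounds[0..nb-1] are block-end sample positions (bounds[nb-1] == signal
 * length), res[] is the per-sample prediction residual. */
static void report_blocks(const long *bounds, long nb, const int *res)
{
    long start = 0, b, i;
    for (b = 0; b < nb; b++) {
        long len = bounds[b] - start;
        double sum = 0.0;
        for (i = start; i < bounds[b]; i++)
            sum += (res[i] < 0) ? -res[i] : res[i];
        printf("block %4ld: size %6ld  mean |residual| %8.1f\n",
               b, len, len ? sum / len : 0.0);
        start = bounds[b];
    }
}

Sorting the output on the size column and counting duplicates gives the
block-size distribution; the per-block residual column answers the second
question directly.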