I'm trying to understand the residuevqtrain program, and I have some
questions for Monty, Erik, or anyone that understands how it's supposed
to work.

I captured TRAIN_RES data from an encoding of a single track (about 4:43),
producing two files, residue_0.vqd (3727 lines, = 3727 points?) and
residue_1.vqd (huge).  I then did a run with the parameters from the usage
message

  residuevqtrain test_256_6_8_01_0 -p 256,6,8 -e .01 residue_0.vqd

(with the version of residuevqtrain at the CVS head, last changed around
11/9 or so).

1. I'm thinking that this program is basically supposed to solve the VQ
   design problem, as documented on 'http://data-compression.com/vq.html'
   (with some variations for vorbis).  In that problem, the goal is to
   choose the codevectors to minimize average distortion (average distance
   between each training vector and its associated codevector).  This
   measure, or something similar, is given in the residuevqtrain output as
   'metric error'.  In the run I did, though, the value of metric error
   actually *increases* over the 1000 passes (see output below).  Isn't
   this a bad thing, or am I missing something?

2. The residuevqtrain algorithm actually seems to be trying to minimize a
   slightly different measure, marked as 'dist' in the output.  If I
   understand the idea correctly, the idea is to choose codevectors so
   that a nearly equal number of training vectors will be associated with
   each one.  The 'dist' measure is a measure of how much this is so,
   smaller values being better.  Why would this be better than just
   minimizing distortion?  Although I can imagine the two metrics being
   highly correlated, it looks to me like the former would be better when
   they differ (think of what happens if you have a group of training
   vectors clumped together).

3. The program does seem to reduce 'dist', but I notice that the lowest
   value seen for it and for 'metric error' in this run was actually at
   pass 0.  Does this mean that we should just stop at pass 0, or that the
   metrics are wrong, or is something else going on here?

4. If I'm reading the 'quantized entries' from the output .vqi file
   correctly, it looks as if there are a large number of duplicate entries
   (maybe because of quantization?).  Isn't this bad?  Or am I misreading?
   (In my example, I'm reading the first six lines as the first
   codevector, and so on.)

I have more questions, but I'll stop here in case this is all just
cluelessness on my part.

--Mike

$ residuevqtrain test_256_6_8_01_0 -p 256,6,8 -e .01 residue_0.vqd
128 colums per line in file residue_0.vqd
reseeding with quantization....
Pass #0... : dist 0.361175(305.73) metric error=1.73526
        cells shifted this iteration: 4
        cell diameter: 4.66::10.3::36.8 (0 unused/79 dup)
Pass #1... : dist 1.82539(305.73) metric error=9.36826
        cells shifted this iteration: 32
        cell diameter: 4.57::28.6::43.1 (5 unused/77 dup)
Pass #2... : dist 1.70066(305.73) metric error=9.79896
        cells shifted this iteration: 5
        cell diameter: 4.57::29.9::46.5 (3 unused/76 dup)
[...]
Pass #995... : dist 0.554347(305.73) metric error=17.324
        cells shifted this iteration: 0
        cell diameter: 4.4::26.7::48.8 (0 unused/70 dup)
Pass #996... : dist 0.554347(305.73) metric error=17.324
        cells shifted this iteration: 0
        cell diameter: 4.4::26.7::48.8 (0 unused/70 dup)
Pass #997... : dist 0.554347(305.73) metric error=17.324
        cells shifted this iteration: 0
        cell diameter: 4.4::26.7::48.8 (0 unused/70 dup)
Pass #998... : dist 0.554347(305.73) metric error=17.324
        cells shifted this iteration: 0
        cell diameter: 4.4::26.7::48.8 (0 unused/70 dup)
Pass #999... : dist 0.554347(305.73) metric error=17.324
        cells shifted this iteration: 0
        cell diameter: 4.4::26.7::48.8 (0 unused/70 dup)

--
[O]ne of the features of the Internet [...] is that small groups of
people can greatly disturb large organizations.  --Charles C. Mann
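For concreteness, the average distortion described in question 1 above is
just the mean distance from each training vector to its nearest codevector.
A minimal sketch in C follows, assuming plain Euclidean distance and
made-up names; it is not residuevqtrain's actual implementation, only an
illustration of the measure being discussed.

    /* Mean distance from each training vector to its nearest codevector.
       Illustrative sketch only; names and data layout are hypothetical. */
    #include <math.h>
    #include <float.h>

    double average_distortion(const double *train, long points,
                              const double *code, int entries, int dim){
      double total = 0.;
      long i;
      int j, k;

      for(i = 0; i < points; i++){
        const double *t = train + i*dim;
        double best = DBL_MAX;           /* squared distance to nearest entry */

        for(j = 0; j < entries; j++){
          const double *c = code + j*dim;
          double acc = 0.;
          for(k = 0; k < dim; k++){
            double d = t[k] - c[k];
            acc += d*d;
          }
          if(acc < best) best = acc;
        }
        total += sqrt(best);
      }
      return total / points;             /* smaller is better */
    }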
> I'm trying to understand the residuevqtrain program, and I have some
> questions for Monty, Erik, or anyone that understands how it's supposed
> to work.
>
> I captured TRAIN_RES data from an encoding of a single track (about 4:43),
> producing two files, residue_0.vqd (3727 lines, = 3727 points?) and
> residue_1.vqd (huge).

residue_0.vqd is residue data from short blocks, residue_1.vqd from long
blocks.

> I then did a run with the parameters from the usage message
>
>   residuevqtrain test_256_6_8_01_0 -p 256,6,8 -e .01 residue_0.vqd
>
> (with the version of residuevqtrain at the CVS head, last changed around
> 11/9 or so).
>
> 1. I'm thinking that this program is basically supposed to solve the VQ
>    design problem, as documented on 'http://data-compression.com/vq.html'
>    (with some variations for vorbis).

Yes, it runs either a straight LBG training, or a modified LBG training
that attempts to maintain constant probability of occurrence per training
cell (default).

> In that problem, the goal is to choose the codevectors to minimize average
> distortion (average distance between each training vector and its
> associated codevector).  This measure, or something similar, is given in
> the residuevqtrain output as 'metric error'.  In the run I did, though,
> the value of metric error actually *increases* over the 1000 passes (see
> output below).  Isn't this a bad thing, or am I missing something?

If you use -b, it's a straight LBG and global error will always go down.
However, minimum average global error is a *lousy* training metric for
audio (because frequency peaks are 'rare', you'll end up training to model
the noise component of the signal, and peaks will always be very poorly
approximated).

> 2. The residuevqtrain algorithm actually seems to be trying to minimize a
>    slightly different measure, marked as 'dist' in the output.  If I
>    understand the idea correctly, the idea is to choose codevectors so
>    that a nearly equal number of training vectors will be associated with
>    each one.

Yes.

> The 'dist' measure is a measure of how much this is so, smaller values
> being better.  Why would this be better than just minimizing distortion?

It isn't, really.  What it gives you is a codebook where each entry has the
same codeword length.  The training stuff is not just for production; it's
meant for experimentation as well.

> Although I can imagine the two metrics being highly correlated,

They are.

> it looks to me like the former would be better when they differ (think of
> what happens if you have a group of training vectors clumped together).

Both, actually, are very suboptimal, it turns out; perhaps there's a better
way to do it that I haven't tried (well, there almost certainly is).  The
problem is that in frequency-domain audio data, we fortunately only have to
carefully replicate features that make up a small part of the data.
Unfortunately, residue-trained codebooks are being trained to represent
global characteristics with minimum error.  Globally, the tonal peaks, what
we need to be most careful with, make up very little of the data and thus
are modelled poorly.

> 3. The program does seem to reduce 'dist', but I notice that the lowest
>    value seen for it and for 'metric error' in this run was actually at
>    pass 0.  Does this mean that we should just stop at pass 0, or that
>    the metrics are wrong, or is something else going on here?

Something else happened.  'dist' does not converge stably (it tends to
oscillate about the minimum), but it should not shoot off to infinity.

> 4. If I'm reading the 'quantized entries' from the output .vqi file
>    correctly, it looks as if there are a large number of duplicate
>    entries (maybe because of quantization?).  Isn't this bad?

Yes, is bad.

> Or am I misreading?  (In my example, I'm reading the first six lines as
> the first codevector, and so on.)

Yeah.  BTW, grouping by sixes is likely bad (the vector size isn't a
multiple of six); try fours.

> $ residuevqtrain test_256_6_8_01_0 -p 256,6,8 -e .01 residue_0.vqd
> 128 colums per line in file residue_0.vqd
> reseeding with quantization....
> Pass #0... : dist 0.361175(305.73) metric error=1.73526
>         cells shifted this iteration: 4
>         cell diameter: 4.66::10.3::36.8 (0 unused/79 dup)

Yeah, things were already bad at this point (all the dupes).  In this case,
the data file is probably way too small to train (not enough short blocks
to produce a set).

Monty
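For reference, the "straight LBG" training Monty mentions alternates a
nearest-neighbour assignment step with a centroid update, roughly as
sketched below.  This is hypothetical C, not the residuevqtrain source, and
it omits the occupancy bias that the default (modified) mode uses to push
cells toward equal probability of occurrence.

    /* One pass of plain (unbiased) LBG: assign each training vector to its
       nearest codevector, then move each codevector to the centroid of the
       vectors assigned to it.  Hypothetical sketch, not residuevqtrain. */
    #include <stdlib.h>
    #include <float.h>

    void lbg_pass(const double *train, long points,
                  double *code, int entries, int dim){
      double *acc   = calloc((size_t)entries * dim, sizeof(*acc));
      long   *count = calloc((size_t)entries, sizeof(*count));
      long i;
      int j, k;

      for(i = 0; i < points; i++){
        const double *t = train + i*dim;
        double best = DBL_MAX;
        int bestj = 0;

        for(j = 0; j < entries; j++){
          const double *c = code + j*dim;
          double d2 = 0.;
          for(k = 0; k < dim; k++){
            double d = t[k] - c[k];
            d2 += d*d;
          }
          if(d2 < best){ best = d2; bestj = j; }
        }
        for(k = 0; k < dim; k++) acc[bestj*dim + k] += t[k];
        count[bestj]++;
      }

      /* Centroid update; cells that attracted no vectors are left alone
         (a real trainer would reseed them, hence the 'unused' count). */
      for(j = 0; j < entries; j++)
        if(count[j])
          for(k = 0; k < dim; k++)
            code[j*dim + k] = acc[j*dim + k] / count[j];

      free(acc);
      free(count);
    }

One way to read "constant probability of occurrence per training cell" is
that the biased variant adds a per-cell penalty to the squared distance
that grows as a cell becomes over-full, nudging assignments toward uniform
occupancy; the sketch above does not implement that.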
xiphmont@xiph.org (Monty) writes:

> Yes, it runs either a straight LBG training, or a modified LBG training
> that attempts to maintain constant probability of occurrence per training
> cell (default).

Okay, so in the latter case the point of trying to maintain constant
probability is so that they get Huffman encoded (?) to the same length?
Is that right?  Is this what you mean when you talk below about "a codebook
where each entry has the same codeword length"?  Why is this good?
Intuitively, it seems like if you're going to Huffman encode it anyway, it
doesn't really matter whether the codevector probabilities are equal or not.

> However, minimum average global error is a *lousy* training metric for
> audio (because frequency peaks are 'rare', you'll end up training to model
> the noise component of the signal, and peaks will always be very poorly
> approximated).

So maybe we could modify the metric to give more of what you're looking
for.  If I understand, you're saying that most training vectors look (say)
like this

  (0.1, 0.2, 0.0, -0.1)

but occasionally you get something like this

  (-0.1, 0.1, 98.0, -0.1)

where the 98.0 is what you're calling a peak.  And you're saying that even
though these peaks are rare, distorting them is actually pretty bad
compared to lots of minor distortion of the short vectors (which you're
calling noise, I think).

If this is all about right, how about a metric that gives more emphasis to
peaks?  We could use squared or even cubed distance instead of just
distance, for example.  We could also try ignoring small (noise) distances.

> The problem is that in frequency domain audio data, we fortunately only
> have to carefully replicate features that make up a small part of the
> data.  Unfortunately, residue trained codebooks are being trained to
> represent global characteristics with minimum error.  Globally, the tonal
> peaks, what we need to be most careful with, make up very little of the
> data and thus are modelled poorly.

If this is the key problem, I think a different metric (and possibly some
algorithm tweaks, a la bias) could fix things.

One problem for me at this point is that I don't really understand the
characteristics of these residue vectors, the patterns that would be
present that would be candidates for compression.  (Is the
residue-probability space even very non-random/compressible?  Maybe it
would be better to just compress the residues directly with Huffman, gzip,
or whatever?)

Idiot newbie question: How bad does it sound if you drop (zero out) the
residues completely?  Has anyone ever listened to it?

> Yeah, things were already bad at this point (all the dupes).  In this
> case, the data file is probably way too small to train (not enough short
> blocks to produce a set).

What does "produce a set" mean?  A set of what?  (The residue file had
~3700 training points, I think, to train 256 entries.)

Thanks for the help!

--Mike
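One possible reading of the metric tweak Mike suggests above (emphasize
peaks, ignore noise-level error), written as a per-vector distance in C.
The exponent and the noise floor are made-up knobs chosen purely to make
the idea concrete; nothing like this is claimed to exist in the current
trainer.

    /* Hypothetical peak-weighted distance: per-component errors below a
       noise floor are ignored, and the rest are cubed so that large
       (peak) errors dominate the total.  Illustration only. */
    #include <math.h>

    double peak_weighted_dist(const double *a, const double *b, int dim,
                              double noise_floor){
      double acc = 0.;
      int k;

      for(k = 0; k < dim; k++){
        double d = fabs(a[k] - b[k]);
        if(d < noise_floor) continue;   /* treat small residue error as noise */
        acc += d*d*d;                   /* cube emphasizes peak errors */
      }
      return acc;
    }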