I'm trying to understand the residuevqtrain program, and I have some
questions for Monty, Erik, or anyone that understands how it's supposed
to work.

I captured TRAIN_RES data from an encoding of a single track (about 4:43),
producing two files, residue_0.vqd (3727 lines, = 3727 points?) and
residue_1.vqd (huge).  I then did a run with the parameters from the usage
message

  residuevqtrain test_256_6_8_01_0 -p 256,6,8 -e .01 residue_0.vqd

(with the version of residuevqtrain at the CVS head, last changed around
11/9 or so).

1. I'm thinking that this program is basically supposed to solve the VQ
   design problem, as documented on 'http://data-compression.com/vq.html'
   (with some variations for vorbis).  In that problem, the goal is to
   choose the codevectors to minimize average distortion (average distance
   between each training vector and its associated codevector).  This
   measure, or something similar, is given in the residuevqtrain output as
   'metric error'.  In the run I did, though, the value of metric error
   actually *increases* over the 1000 passes (see output below).  Isn't
   this a bad thing, or am I missing something?

2. The residuevqtrain algorithm actually seems to be trying to minimize a
   slightly different measure, marked as 'dist' in the output.  If I
   understand the idea correctly, the idea is to choose codevectors so
   that a nearly equal number of training vectors will be associated with
   each one.  The 'dist' measure is a measure of how much this is so,
   smaller values being better.  Why would this be better than just
   minimizing distortion?  Although I can imagine the two metrics being
   highly correlated, it looks to me like the former would be better when
   they differ (think of what happens if you have a group of training
   vectors clumped together).

3. The program does seem to reduce 'dist', but I notice that the lowest
   value seen for it and for 'metric error' in this run was actually at
   pass 0.  Does this mean that we should just stop at pass 0, or that the
   metrics are wrong, or is something else going on here?

4. If I'm reading the 'quantized entries' from the output .vqi file
   correctly, it looks as if there are a large number of duplicate entries
   (maybe because of quantization?).  Isn't this bad?  Or am I misreading?
   (In my example, I'm reading the first six lines as the first
   codevector, and so on.)

I have more questions, but I'll stop here in case this is all just
cluelessness on my part.

--Mike

$ residuevqtrain test_256_6_8_01_0 -p 256,6,8 -e .01 residue_0.vqd
128 colums per line in file residue_0.vqd
reseeding with quantization....
Pass #0... : dist 0.361175(305.73) metric error=1.73526
        cells shifted this iteration: 4
        cell diameter: 4.66::10.3::36.8 (0 unused/79 dup)
Pass #1... : dist 1.82539(305.73) metric error=9.36826
        cells shifted this iteration: 32
        cell diameter: 4.57::28.6::43.1 (5 unused/77 dup)
Pass #2... : dist 1.70066(305.73) metric error=9.79896
        cells shifted this iteration: 5
        cell diameter: 4.57::29.9::46.5 (3 unused/76 dup)
[...]
Pass #995... : dist 0.554347(305.73) metric error=17.324
        cells shifted this iteration: 0
        cell diameter: 4.4::26.7::48.8 (0 unused/70 dup)
Pass #996... : dist 0.554347(305.73) metric error=17.324
        cells shifted this iteration: 0
        cell diameter: 4.4::26.7::48.8 (0 unused/70 dup)
Pass #997... : dist 0.554347(305.73) metric error=17.324
        cells shifted this iteration: 0
        cell diameter: 4.4::26.7::48.8 (0 unused/70 dup)
Pass #998... : dist 0.554347(305.73) metric error=17.324
        cells shifted this iteration: 0
        cell diameter: 4.4::26.7::48.8 (0 unused/70 dup)
Pass #999... : dist 0.554347(305.73) metric error=17.324
        cells shifted this iteration: 0
        cell diameter: 4.4::26.7::48.8 (0 unused/70 dup)

--
[O]ne of the features of the Internet [...] is that small groups of
people can greatly disturb large organizations.  --Charles C. Mann
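For concreteness, the average distortion described in question 1 above is
just the mean distance from each training vector to its nearest codevector.
A minimal sketch in C follows, assuming plain Euclidean distance and
made-up names; it is not residuevqtrain's actual implementation, only an
illustration of the measure being discussed.

    /* Mean distance from each training vector to its nearest codevector.
       Illustrative sketch only; names and data layout are hypothetical. */
    #include <math.h>
    #include <float.h>

    double average_distortion(const double *train, long points,
                              const double *code, int entries, int dim){
      double total = 0.;
      long i;
      int j, k;

      for(i = 0; i < points; i++){
        const double *t = train + i*dim;
        double best = DBL_MAX;           /* squared distance to nearest entry */

        for(j = 0; j < entries; j++){
          const double *c = code + j*dim;
          double acc = 0.;
          for(k = 0; k < dim; k++){
            double d = t[k] - c[k];
            acc += d*d;
          }
          if(acc < best) best = acc;
        }
        total += sqrt(best);
      }
      return total / points;             /* smaller is better */
    }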
> I'm trying to understand the residuevqtrain program, and I have some
> questions for Monty, Erik, or anyone that understands how it's supposed
> to work.
>
> I captured TRAIN_RES data from an encoding of a single track (about 4:43),
> producing two files, residue_0.vqd (3727 lines, = 3727 points?) and
> residue_1.vqd (huge).

residue_0.vqd is residue data from short blocks, residue_1.vqd from long
blocks.

> I then did a run with the parameters from the usage message
>
>   residuevqtrain test_256_6_8_01_0 -p 256,6,8 -e .01 residue_0.vqd
>
> (with the version of residuevqtrain at the CVS head, last changed around
> 11/9 or so).
>
> 1. I'm thinking that this program is basically supposed to solve the VQ
>    design problem, as documented on 'http://data-compression.com/vq.html'
>    (with some variations for vorbis).

Yes, it runs either a straight LBG training, or a modified LBG training
that attempts to maintain constant probability of occurrence per training
cell (default).

> In that problem, the goal is to choose the codevectors to minimize average
> distortion (average distance between each training vector and its
> associated codevector).  This measure, or something similar, is given in
> the residuevqtrain output as 'metric error'.  In the run I did, though,
> the value of metric error actually *increases* over the 1000 passes (see
> output below).  Isn't this a bad thing, or am I missing something?

If you use -b, it's a straight LBG and global error will always go down.
However, minimum average global error is a *lousy* training metric for
audio (because frequency peaks are 'rare', you'll end up training to model
the noise component of the signal, and peaks will always be very poorly
approximated).

> 2. The residuevqtrain algorithm actually seems to be trying to minimize a
>    slightly different measure, marked as 'dist' in the output.  If I
>    understand the idea correctly, the idea is to choose codevectors so
>    that a nearly equal number of training vectors will be associated with
>    each one.

Yes.

> The 'dist' measure is a measure of how much this is so, smaller values
> being better.  Why would this be better than just minimizing distortion?

It isn't, really.  What it gives you is a codebook where each entry has the
same codeword length.  The training stuff is not just for production; it's
meant for experimentation as well.

> Although I can imagine the two metrics being highly correlated,

They are.

> it looks to me like the former would be better when they differ (think of
> what happens if you have a group of training vectors clumped together).

Both, actually, are very suboptimal, it turns out; perhaps there's a better
way to do it that I haven't tried (well, there almost certainly is).  The
problem is that in frequency-domain audio data, we fortunately only have to
carefully replicate features that make up a small part of the data.
Unfortunately, residue-trained codebooks are being trained to represent
global characteristics with minimum error.  Globally, the tonal peaks, what
we need to be most careful with, make up very little of the data and thus
are modelled poorly.

> 3. The program does seem to reduce 'dist', but I notice that the lowest
>    value seen for it and for 'metric error' in this run was actually at
>    pass 0.  Does this mean that we should just stop at pass 0, or that
>    the metrics are wrong, or is something else going on here?

Something else happened.  'dist' does not converge stably (it tends to
oscillate about the minimum), but it should not shoot off to infinity.

> 4. If I'm reading the 'quantized entries' from the output .vqi file
>    correctly, it looks as if there are a large number of duplicate
>    entries (maybe because of quantization?).  Isn't this bad?

Yes, is bad.

> Or am I misreading?  (In my example, I'm reading the first six lines as
> the first codevector, and so on.)

Yeah.  BTW, grouping by sixes is likely bad (the vector size isn't a
multiple of six); try fours.

> $ residuevqtrain test_256_6_8_01_0 -p 256,6,8 -e .01 residue_0.vqd
> 128 colums per line in file residue_0.vqd
> reseeding with quantization....
> Pass #0... : dist 0.361175(305.73) metric error=1.73526
>         cells shifted this iteration: 4
>         cell diameter: 4.66::10.3::36.8 (0 unused/79 dup)

Yeah, things were already bad at this point (all the dupes).  In this case,
the data file is probably way too small to train (not enough short blocks
to produce a set).

Monty
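For reference, the "straight LBG" training Monty mentions alternates a
nearest-neighbour assignment step with a centroid update, roughly as
sketched below.  This is hypothetical C, not the residuevqtrain source, and
it omits the occupancy bias that the default (modified) mode uses to push
cells toward equal probability of occurrence.

    /* One pass of plain (unbiased) LBG: assign each training vector to its
       nearest codevector, then move each codevector to the centroid of the
       vectors assigned to it.  Hypothetical sketch, not residuevqtrain. */
    #include <stdlib.h>
    #include <float.h>

    void lbg_pass(const double *train, long points,
                  double *code, int entries, int dim){
      double *acc   = calloc((size_t)entries * dim, sizeof(*acc));
      long   *count = calloc((size_t)entries, sizeof(*count));
      long i;
      int j, k;

      for(i = 0; i < points; i++){
        const double *t = train + i*dim;
        double best = DBL_MAX;
        int bestj = 0;

        for(j = 0; j < entries; j++){
          const double *c = code + j*dim;
          double d2 = 0.;
          for(k = 0; k < dim; k++){
            double d = t[k] - c[k];
            d2 += d*d;
          }
          if(d2 < best){ best = d2; bestj = j; }
        }
        for(k = 0; k < dim; k++) acc[bestj*dim + k] += t[k];
        count[bestj]++;
      }

      /* Centroid update; cells that attracted no vectors are left alone
         (a real trainer would reseed them, hence the 'unused' count). */
      for(j = 0; j < entries; j++)
        if(count[j])
          for(k = 0; k < dim; k++)
            code[j*dim + k] = acc[j*dim + k] / count[j];

      free(acc);
      free(count);
    }

One way to read "constant probability of occurrence per training cell" is
that the biased variant adds a per-cell penalty to the squared distance
that grows as a cell becomes over-full, nudging assignments toward uniform
occupancy; the sketch above does not implement that.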
xiphmont@xiph.org (Monty) writes:

> Yes, it runs either a straight LBG training, or a modified LBG training
> that attempts to maintain constant probability of occurrence per training
> cell (default).

Okay, so in the latter case the point of trying to maintain constant
probability is so that they get Huffman encoded (?) to the same length?
Is that right?  Is this what you mean when you talk below about "a codebook
where each entry has the same codeword length"?  Why is this good?
Intuitively, it seems like if you're going to Huffman encode it anyway, it
doesn't really matter whether the codevector probabilities are equal or not.

> However, minimum average global error is a *lousy* training metric for
> audio (because frequency peaks are 'rare', you'll end up training to model
> the noise component of the signal, and peaks will always be very poorly
> approximated).

So maybe we could modify the metric to give more of what you're looking
for.  If I understand, you're saying that most training vectors look (say)
like this

  (0.1, 0.2, 0.0, -0.1)

but occasionally you get something like this

  (-0.1, 0.1, 98.0, -0.1)

where the 98.0 is what you're calling a peak.  And you're saying that even
though these peaks are rare, distorting them is actually pretty bad
compared to lots of minor distortion of the short vectors (which you're
calling noise, I think).

If this is all about right, how about a metric that gives more emphasis to
peaks?  We could use squared or even cubed distance instead of just
distance, for example.  We could also try ignoring small (noise) distances.

> The problem is that in frequency domain audio data, we fortunately only
> have to carefully replicate features that make up a small part of the
> data.  Unfortunately, residue trained codebooks are being trained to
> represent global characteristics with minimum error.  Globally, the tonal
> peaks, what we need to be most careful with, make up very little of the
> data and thus are modelled poorly.

If this is the key problem, I think a different metric (and possibly some
algorithm tweaks, a la bias) could fix things.

One problem for me at this point is that I don't really understand the
characteristics of these residue vectors, the patterns that would be
present that would be candidates for compression.  (Is the
residue-probability space even very non-random/compressible?  Maybe it
would be better to just compress the residues directly with Huffman, gzip,
or whatever?)

Idiot newbie question: How bad does it sound if you drop (zero out) the
residues completely?  Has anyone ever listened to it?

> Yeah, things were already bad at this point (all the dupes).  In this
> case, the data file is probably way too small to train (not enough short
> blocks to produce a set).

What does "produce a set" mean?  A set of what?  (The residue file had
~3700 training points, I think, to train 256 entries.)

Thanks for the help!

--Mike
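One possible reading of the metric tweak Mike suggests above (emphasize
peaks, ignore noise-level error), written as a per-vector distance in C.
The exponent and the noise floor are made-up knobs chosen purely to make
the idea concrete; nothing like this is claimed to exist in the current
trainer.

    /* Hypothetical peak-weighted distance: per-component errors below a
       noise floor are ignored, and the rest are cubed so that large
       (peak) errors dominate the total.  Illustration only. */
    #include <math.h>

    double peak_weighted_dist(const double *a, const double *b, int dim,
                              double noise_floor){
      double acc = 0.;
      int k;

      for(k = 0; k < dim; k++){
        double d = fabs(a[k] - b[k]);
        if(d < noise_floor) continue;   /* treat small residue error as noise */
        acc += d*d*d;                   /* cube emphasizes peak errors */
      }
      return acc;
    }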