Hello folks,
The codebook infrastructure for Vorbis is now well underway. This is it! The
last piece needed for *real* bitstreams.
To that end, I have a decent VQ codebook generator running. I was originally using some fairly typical merge/split algorithms and then decided that the right way to do this was to model a VQ codebook as an m-dimensional set of bubbles (like a foam).
I know there are folks reading the list who understand the details of vector quantization, so I thought I'd sanity check my assumptions and approach. *I* think what I'm doing is perfectly sensible (and I'll point out that the code works), but playing with VQ is new to me, so others will know better.
Some background and a set of assumptions:
1) VQ and Huffman coding are just special cases of a more generic method of mapping input values (scalar or vector) to a 'codeword', then reversing the process on the other side. The process may be lossless or it may quantize (i.e., the original mapping may be 1-to-1 or many-to-1); the reverse process is always 1-to-1 (every codeword has an output value). A complete collection of codewords is a codebook.
2) The codebook mechanism in Vorbis (for now) is used to encode both the LSP
coefficients (for the spectral envelope) and the MDCT coefficient residue after
removing the spectral envelope. Either/both may be VQ/Huffman/etc; Vorbis does
not distinguish between the special cases of the codebook mechanism. On the
decode side, codebooks all look alike (and strictly speaking, the proper route
is only to spec a decoder and let the encoder do anything it wants to produce a
valid stream).
*** 3) It just so turns out that, to keep complexity down for now, the encoder will use VQ (with fixed or variable length codewords) for all encoding operations. However, generating a variable-length codeword codebook is different from generating a fixed-length one (and the error characteristics are different). Here's one of the places where I really want a sanity check!:
a. When generating a VQ codebook that uses fixed length words, you want each word to have roughly the same probability of use. This naturally results in some entries that take up a lot of 'volume' and some that take up very little. Areas of large volume have much higher mean error than areas that take up very little 'volume'.
b. When generating a VQ codebook that uses variable length words, one tries to keep average error across entries even. Entries that have a low probability of coming up have higher error per hit, but a much lower probability of occurring. Overall, this method appears to result in much lower 'worst case' error, but higher average error in areas of the codebook space with a relatively high probability of a hit. In fact, this behaves much more like lossy Huffman coding, where error is pushed away from high probabilities to low probabilities, but not to the extent seen in a fixed length word codebook. (There's a toy sketch of the rate tradeoff just after this list.)
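To make 3a/3b a little more concrete, here's a toy scalar back-of-envelope (the probabilities are invented; this is not the generator code). The decode side is always a 1-to-1 table lookup, per point 1. With variable-length words the expected rate tracks entropy (each length roughly -log2 of that entry's probability); a fixed-length book spends log2(N) bits per entry regardless, and training instead equalizes the probabilities, which is exactly what piles volume (and error) into the low-density regions:

/* Toy illustration of 3a/3b -- not Vorbis code, numbers invented.
 * Compile with -lm. */
#include <stdio.h>
#include <math.h>

int main(void){
  /* four entries and their (made up) probabilities of use */
  double p[4] = { .5, .3, .15, .05 };
  double fixed_bits = 0., var_bits = 0.;
  int i;

  for(i=0; i<4; i++){
    fixed_bits += p[i] * 2.;          /* fixed: log2(4) = 2 bits each */
    var_bits   += p[i] * -log2(p[i]); /* variable: ideal length ~ -log2(p) */
  }
  /* prints roughly: fixed=2.00 variable=1.65 */
  printf("expected bits/entry: fixed=%.2f variable=%.2f\n",
         fixed_bits, var_bits);
  return 0;
}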
The first sanity check I really want is point 3; what experience do folks have with fixed length vs. variable length VQ codewords? I only know what I've been playing with 18 hours a day for the last week :-) Specifically, I'm looking for the easy answer: is one obviously better than the other? Fixed length codewords are the 'classic' VQ I see everywhere. Is this due to a natural superiority of the method, or only because other applications want the property of fixed length codewords (which is very useful for highly lossy channel coding)?
Next, we get on to my eight-dimensional foams.
I've decided that rather than use a merge/split algorithm, I'd model the foam as a loosely physical system. What I settled on best resembles a mass of small bubbles (a foam) competing for space. The analogy should be obvious; each entry in the codebook can be looked at as a bubble, occupying some amount of volume in however many dimensions the codebook has (I've been playing with eight). It borders its neighboring bubbles. Neighboring bubbles push it; although slightly compressible, it pushes back (and may end up transferring the force to other bubbles), inherently shifting things around until the system reaches steady state (assuming proper damping in the system). We add one additional property: the midpoint of the bubble (the representative entry value) tends to move toward the center of probability density in the volume the bubble occupies (this makes bubbles gravitate toward areas of high probability).
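To give a flavor of the mechanics, here's a stripped-down sketch of what one relaxation pass looks like in my head. The names and damping constants are hypothetical, the nearest-entry search is brute force, and it assumes n <= MAX_ENTRIES; don't mistake it for the running code:

#include <math.h>
#include <string.h>

#define DIM 8           /* dimensions of the codebook space */
#define MAX_ENTRIES 256 /* cap for the scratch arrays below */

typedef struct {
  double center[DIM]; /* the representative entry value */
  double radius;      /* the bubble's current size */
} bubble;

static double dist(const double *a, const double *b){
  double acc = 0.;
  int k;
  for(k=0; k<DIM; k++) acc += (a[k]-b[k]) * (a[k]-b[k]);
  return sqrt(acc);
}

/* one damped relaxation pass; pts holds npts training vectors of DIM */
void relax(bubble *b, int n, const double *pts, int npts){
  double mean[MAX_ENTRIES][DIM];
  int hits[MAX_ENTRIES];
  int i, j, k;

  memset(mean, 0, sizeof(mean));
  memset(hits, 0, sizeof(hits));

  /* classify each training point to its nearest center and accumulate */
  for(j=0; j<npts; j++){
    const double *p = pts + j*DIM;
    int best = 0;
    for(i=1; i<n; i++)
      if(dist(p, b[i].center) < dist(p, b[best].center)) best = i;
    for(k=0; k<DIM; k++) mean[best][k] += p[k];
    hits[best]++;
  }

  /* drift each center toward the probability-density center of the
     volume it currently owns (damped by .5) */
  for(i=0; i<n; i++)
    if(hits[i])
      for(k=0; k<DIM; k++)
        b[i].center[k] += .5 * (mean[i][k]/hits[i] - b[i].center[k]);

  /* overlapping neighbors push each other apart; the push is damped
     (.25) so the system settles instead of oscillating */
  for(i=0; i<n; i++)
    for(j=i+1; j<n; j++){
      double d = dist(b[i].center, b[j].center);
      double overlap = b[i].radius + b[j].radius - d;
      if(d > 0. && overlap > 0.)
        for(k=0; k<DIM; k++){
          double push = .25 * overlap * (b[i].center[k]-b[j].center[k]) / d;
          b[i].center[k] += push;
          b[j].center[k] -= push;
        }
    }
}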
We can adjust the size of each bubble (according to either error or probability) by adding or removing 'pressure' in the bubble. The pressure directly grows/shrinks the bubble's volume. If the bubble shrinks, it pulls other bubbles closer. If it grows, it pushes--- OK, you get the picture.
The fixed-length word tables can be generated using the number of training points contained in the bubble as the pressure metric: too few, just pump up the bubble. Variable-length word codebooks can be generated using total error in the bubble as the pressure metric.
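Again purely hypothetical (a companion to the relax() sketch above, with a made-up damping factor), the pressure step might look like this:

/* Hypothetical pressure step.  'hits' and 'err' are the per-bubble point
 * count and accumulated error from the last pass; assumes npts > 0 and
 * total_err > 0. */
void pressurize(bubble *b, int n, const int *hits, const double *err,
                int npts, double total_err, int fixed_length){
  int i;
  for(i=0; i<n; i++){
    double want = 1./n; /* each bubble's fair share of the load */
    double have = fixed_length
      ? (double)hits[i]/npts /* 3a: equalize probability of use */
      : err[i]/total_err;    /* 3b: equalize total error */
    /* damped grow/shrink; under-loaded bubbles get pumped up */
    b[i].radius *= 1. + .1 * (want - have)/want;
  }
}

Iterating relax() and a pressure step like this until the radii stop changing is the steady state I'm talking about.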
So, comments? Everything is coming along (the VQ generation foam is actually running and building codebooks). There are likely bits of intuition I'm missing, though, and things I've not yet considered. Others' experience is welcome in order to short-circuit the learning curve.
Monty
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/