On Wed 10 September 2003 13:08, Richard Felton wrote:
> I have been using libvorbis for the past few weeks and have been
> asked to summarise what I have discovered about the codec. There
> is an early draft of the document at
> http://www.geocities.com/gatewaystation/vorbis/vorbis.htm -
Firstly, it may be a good idea to make it clear that what you are
documenting is the Xiph.org Vorbis reference codec. I could in
theory write an encoder that outputs valid Vorbis data yet works in
a wholly different way.
Secondly, in the diagram the MDCT and FFT appear before the
psychoacoustic stage, while in the text they are part of it. I
think the diagram is right, because transforming data into the
frequency domain doesn't have much to do with human hearing;
rather, it is done because it yields data that vector quantisation
can compress more effectively. So the psychoacoustics header should
be two paragraphs down and the text should be adapted accordingly.
In the vector quantising explanation, I would change the middle
three paragraphs to something like the following:
---
Each point falls into a section and we could transmit the relevant
section number for each point. Since we are sending only a one-digit
number rather than the entire vector, we achieve compression.
The decoder will have a codebook, which holds a vector for each
section, and use it to look up a vector for each section number it
receives. Of course, since all original vectors within a section are
eventually decoded to the same vector from the codebook, some
information is lost.
The design of a vector quantiser is a difficult task. Obviously we
want to lose as little information as possible, so the decision
boundaries and codebook vectors must be chosen to minimise the
expected difference between each original vector and its decoded
counterpart. This in turn depends on the distribution of the input
vectors, i.e. the input data of the encoder, and it is important
that the codebook works well with a wide variety of input data.
Vorbis extends the theory into more dimensions, but this is
difficult to convey graphically. An algorithm for codebook design
(similar to the one used in Vorbis) can be found on the web at
data-compression.com [5].
The encoder achieves further compression by encoding the indices
using Huffman codes before sending them to the decoder.
---
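To make the quantise/look-up round trip concrete, a minimal sketch could go with the text (the two-dimensional points and the four-entry codebook here are invented for illustration; real Vorbis codebooks are larger and higher-dimensional):

```python
# Minimal vector quantiser sketch: encode points to codebook indices,
# decode indices back to codebook vectors. Illustrative values only.
import math

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def encode(point):
    """Return the index of the nearest codebook vector (the 'section')."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(point, codebook[i]))

def decode(index):
    """Look the index up in the codebook; within-section detail is lost."""
    return codebook[index]

points = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8)]
indices = [encode(p) for p in points]         # one small number per point
reconstructed = [decode(i) for i in indices]  # lossy: detail is gone
```

Note that only the short index stream needs to be transmitted; that is where the compression comes from.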
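The codebook design problem could likewise be illustrated with a toy version of the iterative Lloyd/LBG idea that the data-compression.com article describes (this is not the actual Vorbis training code, and the one-dimensional samples are made up):

```python
# K-means style codebook training: repeatedly assign samples to their
# nearest codebook entry, then move each entry to its cluster centroid.
import random

def train_codebook(samples, k, iterations=20, seed=0):
    rng = random.Random(seed)
    codebook = rng.sample(samples, k)  # arbitrary initial guess
    for _ in range(iterations):
        # Assign each sample to its nearest codebook entry...
        clusters = [[] for _ in range(k)]
        for s in samples:
            i = min(range(k), key=lambda j: abs(s - codebook[j]))
            clusters[i].append(s)
        # ...then move each entry to the centroid of its cluster
        # (keeping the old entry if its cluster came up empty).
        codebook = [sum(c) / len(c) if c else codebook[i]
                    for i, c in enumerate(clusters)]
    return sorted(codebook)

samples = [0.1, 0.15, 0.2, 0.9, 0.95, 1.0]
print(train_codebook(samples, 2))  # converges near [0.15, 0.95]
```

The result lands on the two cluster centres, which is exactly the "minimise the difference" criterion from the text applied to this toy distribution.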
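Finally, the Huffman step: frequent indices get short codes, rare ones get long codes. (The index stream below is invented, and in real Vorbis the codebooks carry their Huffman code lengths in the setup header rather than being built from frequencies at decode time, so this is only a sketch of the general idea.)

```python
# Huffman-coding the quantiser indices: build a prefix code from
# symbol frequencies, then concatenate the codes into a bitstream.
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Return {symbol: bitstring} built by the classic Huffman algorithm."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tiebreak, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

indices = [0, 0, 0, 0, 1, 1, 2, 3]   # skewed index stream
codes = huffman_codes(indices)
bitstream = "".join(codes[i] for i in indices)
```

For this stream the common index 0 gets a 1-bit code, so the whole stream takes 14 bits instead of the 16 a fixed 2-bit-per-index encoding would need.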
As for the German article, the (German) online summary mentions only
that there were 6000 entries, of which 3300 covered the 64
kbit/s-compressed data. Ogg Vorbis is clearly the best at 64 kbit/s,
while at 128 kbit/s the differences are smaller, with most people
being unable to distinguish between RealAudio, WMA, MP3Pro and MP3.
Lastly, perhaps it would be possible to generate a call graph of the
encoder somehow? It would be nice to have a graphical
representation of what uses what. Or maybe a clearer link between
the source files and the blocks in the block diagram, so that it's
easy to see which part of the functionality is implemented where.
Cheers,
Lourens
--
GPG public key: http://home.student.utwente.nl/l.e.veen/lourens.key
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/