thr3ads.net - Vorbis dev - [vorbis-dev] Understanding of Vorbis coder [Sep 2001]

Padmashri Suresh

2001-Sep-05 01:14 UTC

[vorbis-dev] Understanding of Vorbis coder

Hi,
I have gone through the document available on the net regarding the
Vorbis encoder/decoder.
Based on that, I have prepared an understanding document on the
encoder/decoder blocks. I would like to
know whether my understanding of the coder is OK. If there are any
additional blocks or further information, please provide me
with the same.
Thanks and regards,
S.Padmashri


-------------- next part --------------
A non-text attachment was scrubbed...
Name: Understanding_Document_on_Ogg_Vorbis.doc
Type: application/octet-stream
Size: 37424 bytes
Desc: not available
Url :
http://lists.xiph.org/pipermail/vorbis-dev/attachments/20010905/d75f5a04/Understanding_Document_on_Ogg_Vorbis-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Wipro_Disclaimer.txt
Type: application/octet-stream
Size: 877 bytes
Desc: not available
Url :
http://lists.xiph.org/pipermail/vorbis-dev/attachments/20010905/d75f5a04/Wipro_Disclaimer-0001.obj

Erik Kruus

2001-Sep-05 06:18 UTC

[vorbis-dev] Understanding of Vorbis coder

uggh.  Word attachment.  Two quick comments:

1) block diagram:  in my vorbis.on2.com decoder stuff is my version of a
block diagram for beta2(3?). That block diagram is now incomplete in that it is
missing recent additions such as (but not limited to):
 - a "feedback" link to touch up an amplitude parameter,
 - all channel coupling.
 (- and psy.c has been essentially completely rewritten since those days)
( Note: start off at http://www.mathdogs.com/vorbis-illuminated/ for an
  overview of the algorithm.  vorbis.on2.com is too technical for most
  folks )

2) "formant", to me at least, needs phase info to be meaningful.
(Are the zero locations not a vital part of a formant?)
The floor is purely an amplitude thing, so I wonder about the
correctness of calling the floor "formant-like".

Erik.
> If there are any
> other  additional block  /information pl. provide me
> with the same.
> Thanks and regards
> S.Padmashri
--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to
'vorbis-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is
needed.
Unsubscribe messages sent to the list will be ignored/filtered.

Gian-Carlo Pascutto

2001-Sep-05 06:53 UTC

[vorbis-dev] Understanding of Vorbis coder

At 13:44 5/09/2001 +0530, you wrote:
>Hi
>I have gone through the document available in the net regarding the
>Vorbis encoder /Decoder.
>Based on that i have prepared a understanding document on the
>encoder/decoder block. I would like to
>know whether my understanding of the coder is OK. If there are any
>other  additional block  /information pl. provide me
>with the same.
First things first: please use something other than Microsoft
Word format. It's usually possible to extract the text on a Unix
box, but that's about it. HTML would be a lot better. And you could
link to it so you don't have to dump it onto the mailing list each
time you make an update.

Now, on the document:

(note, what I state below may very well be wrong at times. If so, 
please correct!)

>'Input speech signal'

Vorbis handles a lot more than only speech.

>Instead of performing Sub Banding on the time domain data before
>MDCT in ogg Vorbis they perform Windowing of the input speech signal.
>Windows are overlapped to reduce undesirable distortion that would
>occur with non-overlapping, adjacent windows. Vorbis uses windows
>of two sizes, called short and long. The sizes must be powers of
>two.

Mentioning subbanding here is not needed as it's not used anyway
and will probably only add confusion (what is subbanding?). 

Simpler and shorter: 'The input audio data is windowed before
the MDCT is applied. The MDCT uses an overlap of 50%.'
As I understand it, the M in MDCT implies that you use some
kind of overlap, so if you assume the reader knows what an MDCT
is, there's no need to explain the need for overlapping.
But perhaps: 'MDCT stands for Modified Discrete Cosine Transform.
It transforms blocks of audio data from the time to the
frequency domain. It uses an overlap between those blocks to
be able to do this in a lossless manner.'

The windowing+MDCT are really closely related steps. Windowing
is NOT the same as splitting up in short and long blocks!
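
To make the windowing + MDCT + 50% overlap point concrete, here is a
minimal Python sketch. It is not libvorbis code: it assumes the window
shape given in the Vorbis I specification, uses a direct O(N^2)
transform, and all function names are made up for this example. It
only illustrates that overlap-adding the windowed inverse transforms
gives back the input, which is the 'lossless' property mentioned above.

```python
import math

def vorbis_window(n):
    # Window shape from the Vorbis I spec; satisfies w[i]^2 + w[i + n/2]^2 = 1.
    return [math.sin(math.pi / 2 * math.sin(math.pi * (i + 0.5) / n) ** 2)
            for i in range(n)]

def mdct(block):
    # 2*m time samples in, m frequency coefficients out.
    m = len(block) // 2
    return [sum(x * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for n, x in enumerate(block))
            for k in range(m)]

def imdct(coeffs):
    # m coefficients in, 2*m aliased time samples out.
    # The 2/m scale is the one that pairs with a window applied twice.
    m = len(coeffs)
    return [2.0 / m * sum(c * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                          for k, c in enumerate(coeffs))
            for n in range(2 * m)]

def analysis_synthesis(signal, m):
    # Blocks of 2*m samples, advancing by m samples each time (50% overlap).
    w = vorbis_window(2 * m)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * m + 1, m):
        block = [signal[start + i] * w[i] for i in range(2 * m)]
        rec = imdct(mdct(block))
        for i in range(2 * m):
            out[start + i] += rec[i] * w[i]   # window again, then overlap-add
    return out

m = 8
signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(64)]
out = analysis_synthesis(signal, m)
# Only samples covered by two blocks are fully reconstructed.
max_err = max(abs(a - b) for a, b in zip(signal[m:-m], out[m:-m]))
```

Each block alone comes back aliased; it is the sum of two neighbouring
blocks that cancels the aliasing, so the first and last m samples of
the signal are not recovered in this sketch.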

The decision whether to use a long or a short block is done before
this by 4 parallel bandpass filters that detect energy surges.
Short blocks are used to get better precision in the time domain,
if needed.
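
A rough Python sketch of the idea of switching on energy surges
follows. It is a simplification and an assumption on my part, not the
encoder's actual logic: it looks at broadband energy in sub-segments
of a frame instead of running 4 bandpass filters, and the function
name, segment count and threshold are all invented for illustration.

```python
def choose_block(frame, segments=4, ratio_threshold=8.0, eps=1e-9):
    # Energy of each sub-segment of the frame (eps avoids division by zero).
    seg_len = len(frame) // segments
    energies = [sum(s * s for s in frame[i * seg_len:(i + 1) * seg_len]) + eps
                for i in range(segments)]
    # A sudden jump in energy from one segment to the next suggests a
    # transient, where better time resolution (a short block) helps.
    for prev, cur in zip(energies, energies[1:]):
        if cur / prev > ratio_threshold:
            return "short"
    return "long"
```

A steady tone keeps the long block, while silence followed by a burst
trips the detector and selects the short block.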

In the graph, I'm not sure if the psychoacoustic model is
in parallel with the windowing+MDCT. Since the psymodel needs
frequency domain data I'd assume it works on the MDCT output
too, but I'm not sure.

>This block generates the Spectral envelope and it is called as
>floor curve. [..] This spectral envelope
>curve is represented by LPC coefficients

The most important goal of the psychoacoustics is to determine
what is audible and what is not. That's totally missing here.
As I understand it, the psychoacoustics are used to simplify
the data to which the LPC curve is fitted. The LPC curve itself
is a coarse approximation of how the actual spectral envelope
looks after the psymodel has been applied.

>These curve have formant like structure due to roll of property
>of the masking tone.

I have no idea what this is supposed to mean...

>The LPC coefficients are computed using Levinsion Durbin
>algorithm

The actual algorithm is pretty irrelevant in the grand scheme
of things, especially since (IIRC) it's changed at least once
and right now some kind of hybrid structure is used.
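
For readers who have not met it, this is the textbook Levinson-Durbin
recursion in Python. It is only a generic sketch of the algorithm
named in the document, not what the encoder currently does (see
above); the function name and argument layout are my own.

```python
def levinson_durbin(r, order):
    # r[0..order] are autocorrelation values of the signal.
    # Returns (a, err): a[0] = 1 and a[1..order] are the coefficients of
    # the prediction-error filter; err is the final prediction error.
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err
```

With autocorrelation values of the form r[k] = 0.5**k (a first-order
process), the recursion finds a single meaningful coefficient and
leaves the second one at zero.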

>The output of the MDCT block and the LSP block are quantified
>and then encoded using the codebook mechanism.

Erm, no. The 'floor curve' generated by the LPC/LSP coefficients
(the coarse approximation of the spectral envelope) is subtracted
from the MDCT data (the actual spectral envelope, after the psymodel
has been applied) leaving behind the 'residue'.
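
A small Python sketch of the floor/residue split, under the assumption
that the 'subtraction' happens on a log (dB) amplitude scale, which is
the same as dividing by the floor in linear amplitude. The function
names and the example numbers are invented for illustration.

```python
def split_residue(spectrum, floor_db):
    # Remove the coarse floor curve from the spectral coefficients.
    # Subtracting the floor in dB equals dividing by it in linear amplitude.
    return [c / 10.0 ** (f / 20.0) for c, f in zip(spectrum, floor_db)]

def apply_floor(residue, floor_db):
    # Decoder side: put the floor curve back onto the residue.
    return [r * 10.0 ** (f / 20.0) for r, f in zip(residue, floor_db)]

spectrum = [100.0, -50.0, 10.0, -1.0]   # made-up MDCT coefficients
floor_db = [40.0, 34.0, 20.0, 0.0]      # made-up floor curve in dB
residue = split_residue(spectrum, floor_db)
restored = apply_floor(residue, floor_db)
```

The residue keeps the sign and fine detail of each coefficient while
the floor carries the overall level, so the residue values end up in a
much narrower range than the original spectrum.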

Both the floor curve coefficients and the residue are then fed
to the VQ codebooks. They are not 'quantified and then encoded'.
This is a single step inherent in the vector quantization.
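
To show why quantizing and encoding are one step in vector
quantization, here is a minimal nearest-codeword sketch in Python. The
tiny codebook and the function names are made up; real Vorbis
codebooks are far larger and are described in the stream headers.

```python
def vq_encode(vector, codebook):
    # Pick the codeword closest to the input vector (squared distance).
    # The returned index is both the quantized value and the code to send.
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def vq_decode(index, codebook):
    # The decoder only needs the index and the same codebook.
    return codebook[index]

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
index = vq_encode((0.9, 0.2), codebook)
approx = vq_decode(index, codebook)
```

There is no separate scalar rounding stage here: choosing the index is
the quantization, and the index is what gets written out.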

>The decoder receives the frame extracts the LPC coeff.

You've just said earlier they are converted to LSP form
prior to encoding because that representation tolerates
quantization better. So it's not LPC coeffs that are 
decoded of course, but LSP coeffs. 


-- 
GCP

