thr3ads.net - Vorbis dev - [vorbis-dev] Vorbis/Lame [Aug 1999]

Gabriel Bouvigne

1999-Aug-25 07:20 UTC

[vorbis-dev] Vorbis/Lame

Hi,

I think it would be a good thing for everyone to know more about these
two projects (and also about the future patent-free format).
I think that many people, like me, know about Lame but not about Vorbis,
and vice versa.

It would be good if someone from each project (perhaps the maintainer)
could introduce it to both groups of people. Two things would be
interesting (to my mind):
- to know about the "orientation" and goals of each project, what its
current status is, and what is planned
- to have a short introduction to the tools and techniques used in both

Regards,

Gabriel

--- >8 ----
List archives:  xiph.org/archives
Ogg project homepage: xiph.org/ogg

Mark Taylor

1999-Sep-06 20:11 UTC

[vorbis-dev] long message on absolute threshold of hearing (ATH)

Robert recently coded up (in LAME) the formula for the ATH.  I would
like to switch the LAME psycho-acoustics (gpsycho) over to using
formulas for all quantities like this instead of the ISO MP3 tables.
This will make it much easier to use gpsycho with any sampling rate,
any size FFT or any number of critical bands, so it can be used in
other encoders (like Vorbis).  Also, a couple of people have suggested
that at higher bitrates there should be less noise shaping.  At 128 kbps
you need all the help you can get from the psycho-acoustics, but at
256 kbps just using the ATH would probably be quite good.
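
For reference, here is a minimal sketch of what a formula-based ATH can
look like.  The message does not say which formula Robert coded, so this
assumes the widely used Terhardt approximation (threshold in dB SPL as a
function of frequency).  The function names are made up for illustration
and are not taken from LAME.

```python
import math


def ath_db_spl(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL.

    Assumption: Terhardt's closed-form approximation, not necessarily
    the formula used in LAME.  freq_hz must be > 0.
    """
    f = freq_hz / 1000.0  # the formula works in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def ath_per_band(band_centers_hz):
    """Evaluate the ATH at arbitrary band centers (hypothetical helper)."""
    return [ath_db_spl(f) for f in band_centers_hz]


# Because this is a formula and not a table, any band layout works:
example = ath_per_band([100.0, 1000.0, 3300.0, 10000.0])
```

The curve has its minimum slightly above 3.3 kHz, which is where the ear
is most sensitive.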

Anyway, I was comparing Robert's values to what the ISO uses, and
I was not able to get them to match up.  I think that the ISO formula
is basically broken.  Here's my take on the situation - if anyone
knows why I'm wrong or has other useful comments, please post!

The ISO formula goes through a complicated procedure of first
computing a threshold in partition bands, adding the ATH, then summing
the values into scalefactor bands, and finally computing a ratio
(masking/energy).  Then, in loop.c, when computing the allowed
distortion, this ratio is multiplied by the average energy (as computed
by the MDCT) within each scalefactor band.
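
The steps above can be sketched roughly as follows.  This is a
simplified illustration under stated assumptions, not the ISO code: all
names are hypothetical, all quantities are linear power (not dB), and
each partition band is assumed to map to exactly one scalefactor band.

```python
def allowed_distortion(part_thr, part_ath, part_to_sfb,
                       sfb_energy_fft, sfb_energy_mdct):
    """Hypothetical sketch of the flow described in the message.

    part_thr        masking threshold per partition band
    part_ath        ATH per partition band
    part_to_sfb     scalefactor band index for each partition band
    sfb_energy_fft  energy per scalefactor band (psycho-acoustic model)
    sfb_energy_mdct average MDCT energy per scalefactor band
    """
    n_sfb = len(sfb_energy_fft)

    # 1. add the ATH to the threshold in each partition band
    thr = [t + a for t, a in zip(part_thr, part_ath)]

    # 2. sum the partition values into scalefactor bands
    sfb_thr = [0.0] * n_sfb
    for p, sfb in enumerate(part_to_sfb):
        sfb_thr[sfb] += thr[p]

    # 3. ratio = masking / energy
    ratio = [t / e if e > 0 else 0.0
             for t, e in zip(sfb_thr, sfb_energy_fft)]

    # 4. allowed distortion = ratio * average MDCT energy
    return [r * e for r, e in zip(ratio, sfb_energy_mdct)]


xmin = allowed_distortion([1.0, 2.0, 3.0, 4.0], [0.5] * 4,
                          [0, 0, 1, 1], [10.0, 20.0], [5.0, 40.0])
```

Note that the ATH only reaches the final number after being divided by
one energy estimate and multiplied by another, so any mismatch in scale
between the two estimates shifts the effective ATH.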

To test this, I first measured the strength of a 3.3 kHz sine
wave with amplitude 32767 (as large as possible on a 16-bit CD).
This is the frequency for which the ear is most sensitive.
The energy of this wave shows up in scalefactor band (sfb) = 12,
with an energy of -10 dB.  The dynamic range of a CD is 96 dB,
meaning that the energy range in sfb=12 is:  -106 -> -10 (dB).
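
The arithmetic behind those numbers, as a small sketch.  The -10 dB peak
is the measured value quoted above; the function names are made up for
illustration.

```python
import math


def dynamic_range_db(bits):
    """Dynamic range of linear PCM with the given word length, in dB."""
    return 20.0 * math.log10(2 ** bits)


def energy_range_db(peak_db, range_db):
    """(lowest, highest) energy given the peak and the dynamic range."""
    return (peak_db - range_db, peak_db)


cd_range = dynamic_range_db(16)          # about 96.3 dB, rounded to 96 above
lowest, highest = energy_range_db(-10.0, 96.0)
```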

Next, I disabled all the masking from l3psy(), except the ATH,
then computed the actual l3_xmin (allowed distortion in loop.c).
Using the ISO ATH formula, this number hovers around -150 dB,
a full 50 dB below the lowest possible energy!!  Thus the ATH is
*never* used.

Here are the results from a random frame.  'ISO masking' is the ISO
ATH value (since all other masking was turned off), 'ath' is the value
computed from Robert's code (normalized at 3.3 kHz),
and ave_ener is the average amount of energy in the scalefactor band:

 0 ISO masking=    -111.86  ath=    -75.46    ave_ener=    -28.77  (db) 
 1 ISO masking=    -133.42  ath=    -85.59    ave_ener=    -35.10  (db) 
 2 ISO masking=    -152.47  ath=    -88.77    ave_ener=    -35.88  (db) 
 3 ISO masking=    -149.16  ath=    -90.40    ave_ener=    -43.64  (db) 
 4 ISO masking=    -150.66  ath=    -91.43    ave_ener=    -54.46  (db) 
 5 ISO masking=    -144.71  ath=    -92.16    ave_ener=    -61.28  (db) 
 6 ISO masking=    -149.53  ath=    -93.02    ave_ener=    -56.81  (db) 
 7 ISO masking=    -153.00  ath=    -93.76    ave_ener=    -54.95  (db) 
 8 ISO masking=    -143.65  ath=    -94.81    ave_ener=    -47.43  (db) 
 9 ISO masking=    -157.70  ath=    -96.04    ave_ener=    -40.64  (db) 
10 ISO masking=    -150.85  ath=    -97.84    ave_ener=    -50.10  (db) 
11 ISO masking=    -140.94  ath=    -99.92    ave_ener=    -55.77  (db) 
12 ISO masking=    -151.46  ath=   -100.98    ave_ener=    -61.99  (db) 
13 ISO masking=    -165.33  ath=   -100.92    ave_ener=    -62.04  (db) 
14 ISO masking=    -151.18  ath=    -98.48    ave_ener=    -49.57  (db) 
15 ISO masking=    -149.10  ath=    -95.20    ave_ener=    -48.68  (db) 
16 ISO masking=    -151.12  ath=    -93.72    ave_ener=    -58.48  (db) 
17 ISO masking=    -151.89  ath=    -92.10    ave_ener=    -59.46  (db) 
18 ISO masking=    -141.80  ath=    -88.49    ave_ener=    -53.07  (db) 
19 ISO masking=    -133.54  ath=    -80.69    ave_ener=    -60.38  (db) 
20 ISO masking=    -128.13  ath=    -66.16    ave_ener=    -74.77  (db) 
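
To put a number on the mismatch, the table above can be fed through a
few lines of code.  The values are copied from the table; the helper
name is made up for illustration.

```python
# (sfb, ISO masking, ath, ave_ener), all in dB, copied from the table
ROWS = [
    (0, -111.86, -75.46, -28.77), (1, -133.42, -85.59, -35.10),
    (2, -152.47, -88.77, -35.88), (3, -149.16, -90.40, -43.64),
    (4, -150.66, -91.43, -54.46), (5, -144.71, -92.16, -61.28),
    (6, -149.53, -93.02, -56.81), (7, -153.00, -93.76, -54.95),
    (8, -143.65, -94.81, -47.43), (9, -157.70, -96.04, -40.64),
    (10, -150.85, -97.84, -50.10), (11, -140.94, -99.92, -55.77),
    (12, -151.46, -100.98, -61.99), (13, -165.33, -100.92, -62.04),
    (14, -151.18, -98.48, -49.57), (15, -149.10, -95.20, -48.68),
    (16, -151.12, -93.72, -58.48), (17, -151.89, -92.10, -59.46),
    (18, -141.80, -88.49, -53.07), (19, -133.54, -80.69, -60.38),
    (20, -128.13, -66.16, -74.77),
]


def ath_gaps(rows):
    """How far the ISO value sits below Robert's ath, per band (dB)."""
    return [(sfb, ath - iso) for sfb, iso, ath, _ in rows]


gaps = ath_gaps(ROWS)
smallest = min(gaps, key=lambda g: g[1])   # band with the smallest gap
largest = max(gaps, key=lambda g: g[1])    # band with the largest gap
```

In every band of this frame the ISO value is below Robert's ath, by
somewhere between roughly 36 dB (sfb 0) and 64 dB (sfb 13).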

I haven't yet run any listening tests with the new ath, but hopefully
I will tomorrow.  My feeling is that unlike the other psycho-acoustics,
the ATH should be close to perfect.  That is, quantization noise < ATH
really should not be audible.  I am hoping the ATH will make a nice
analog silence detector: any time the energy < ath, we can just zero
out all the coefficients.
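
A minimal sketch of that silence-detection idea, assuming the band
energy and the ath are expressed on the same dB scale.  The function and
its arguments are hypothetical and not taken from LAME.

```python
import math


def zero_inaudible_bands(coeffs, band_edges, ath_db):
    """Zero every band whose average energy is below the ATH.

    coeffs      list of transform coefficients
    band_edges  list of (start, end) index pairs, one per band
    ath_db      ATH per band, in dB on the same scale as the energy
    """
    out = list(coeffs)
    for (start, end), threshold in zip(band_edges, ath_db):
        band = coeffs[start:end]
        if not band:
            continue
        mean_energy = sum(c * c for c in band) / len(band)
        if mean_energy > 0:
            energy_db = 10.0 * math.log10(mean_energy)
        else:
            energy_db = float("-inf")
        if energy_db < threshold:
            out[start:end] = [0.0] * len(band)
    return out


# Band 0 sits at 0 dB and is kept; band 1 sits at -60 dB and is zeroed.
cleaned = zero_inaudible_bands([1.0, 1.0, 0.001, 0.001],
                               [(0, 2), (2, 4)], [-30.0, -30.0])
```

In the frame shown in the table, only sfb 20 has ave_ener (-74.77) below
ath (-66.16), so it is the only band such a test would zero out.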

Mark

  

