Hi,

I think it would be good to know more about these two projects (and also the future patent-free format). I think many people, like me, know about LAME but not about Vorbis, and vice versa. It would be nice if someone from each project (perhaps the maintainer) introduced it to both groups of people.

Two things would be interesting, to my mind:

- to know the "orientation" and goals of each project: its current status and what is planned
- to have a short introduction to the tools and techniques used in both

Regards,
Gabriel

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
Mark Taylor
1999-Sep-06 20:11 UTC
[vorbis-dev] long message on absolute threshold of hearing (ATH)
Robert recently coded up (in LAME) the formula for the ATH. I would like to switch the LAME psycho-acoustics (gpsycho) over to using formulas for all quantities like this instead of the ISO MP3 tables. This will make it much easier to use gpsycho with any sampling rate, any size FFT or any number of critical bands, so it can be used in other encoders (like Vorbis).

Also, a couple of people have suggested that at higher bitrates there should be less noise shaping. At 128 kbps, you need all the help you can get from the psycho-acoustics, but at 256 kbps just using the ATH would probably be quite good.

Anyway, I was comparing Robert's values to what the ISO code uses, and I was not able to get them to match up. I think that the ISO formula is basically broken. Here's my take on the situation - if anyone knows why I'm wrong or has other useful comments, please post!

The ISO formula goes through a complicated procedure of first computing a threshold in partition bands, adding the ATH, then summing the values into scalefactor bands, and finally computing a ratio (masking/energy). Then, in loop.c, when computing the allowed distortion, this ratio is multiplied by the average energy (as computed by the MDCT) within each scalefactor band.

To test this, I first measured the strength of a 3.3 kHz sine wave with amplitude 32767 (as large as possible on a 16-bit CD). This is the frequency for which the ear is most sensitive. The energy of this wave shows up in scalefactor band (sfb) 12, with an energy of -10 dB. The dynamic range of a CD is 96 dB, meaning that the energy range in sfb 12 is -106 -> -10 dB.

Next, I disabled all the masking from l3psy() except the ATH, then computed the actual l3_xmin (the allowed distortion in loop.c). Using the ISO ATH formula, this number hovers around -150 dB, almost 50 dB below the lowest possible energy!! Thus it is *never* used.

Here are the results from a random frame.
'ISO masking' is the ISO ATH value (since all other masking was turned off), 'ath' is the value computed from Robert's code (normalized at 3.3 kHz), and ave_ener is the average amount of energy in the scalefactor band. All values are in dB:

sfb   ISO masking       ath   ave_ener
  0       -111.86    -75.46     -28.77
  1       -133.42    -85.59     -35.10
  2       -152.47    -88.77     -35.88
  3       -149.16    -90.40     -43.64
  4       -150.66    -91.43     -54.46
  5       -144.71    -92.16     -61.28
  6       -149.53    -93.02     -56.81
  7       -153.00    -93.76     -54.95
  8       -143.65    -94.81     -47.43
  9       -157.70    -96.04     -40.64
 10       -150.85    -97.84     -50.10
 11       -140.94    -99.92     -55.77
 12       -151.46   -100.98     -61.99
 13       -165.33   -100.92     -62.04
 14       -151.18    -98.48     -49.57
 15       -149.10    -95.20     -48.68
 16       -151.12    -93.72     -58.48
 17       -151.89    -92.10     -59.46
 18       -141.80    -88.49     -53.07
 19       -133.54    -80.69     -60.38
 20       -128.13    -66.16     -74.77

I haven't yet run any listening tests with the new ath, but hopefully I will tomorrow. My feeling is that, unlike the other psycho-acoustics, the ATH should be close to perfect. That is, quantization noise < ATH really should not be audible.

I am hoping the ATH will make a nice analog silence detection: any time the energy < ath, we can just zero out all the coefficients.

Mark