Steinar H. Gunderson
2007-Apr-23 03:41 UTC
[Vorbis-dev] Getting masked FFT data out of libvorbisenc
[Apologies if this gets through twice. I sent it first without subscribing, but it seems to have got stuck in the moderation queue, so I subscribed and re-sent it.]

I'm doing some work on audio fingerprinting for a school project (more precisely, my master's thesis). I got a hint on #vorbis that I might want to look into the internal floor representations in libvorbisenc to get out audio data after the psychoacoustic masking, but I'm having problems actually getting out the right data.

Basically, I'm looking in mapping0.c, dumping out the debugging information that's already there. One of the most promising places seemed to be just before floor1_fit, but the output looks a bit odd:

http://home.samfundet.no/~sesse/vorbis_floor3.png

In particular, there's a _lot_ of energy in the treble, where I'd expect there to be almost none. I don't know very much about the internals of Vorbis (nor psychoacoustics in general, I'm afraid), but it seems as if the floor is a rough copy of the FFT _plus_ the tone masking stuff, whereas I'd probably want it to be a rough copy of the FFT _minus_ the tone masking stuff.

Is there any way I can actually get out this kind of information, short of encoding the entire signal and decoding it again (which will obviously also leave me with all the quantization noise and other artifacts that I don't want)?

/* Steinar */
--
Homepage: http://www.sesse.net/
xiphmont@xiph.org
2007-Apr-23 14:38 UTC
[Vorbis-dev] Getting masked FFT data out of libvorbisenc
On 4/23/07, Steinar H. Gunderson <sgunderson@bigfoot.com> wrote:
> Basically, I'm looking in mapping0.c, dumping out the debugging information
> that's already there. One of the most promising places seemed to be just
> before floor1_fit, but it seems a bit odd:
>
> http://home.samfundet.no/~sesse/vorbis_floor3.png
>
> In particular, there's a _lot_ of energy in the treble, where I'd expect
> there to be almost none.

That's not energy, that's the approximate discrimination threshold. You end up with so much treble 'masking' because the ear's tonal HF discrimination hardware is not very sensitive; look at a normal Absolute Threshold of Hearing curve and you'll see where the sharp upward slope is coming from.

The only reason it moves around in Vorbis, rather than being a fixed ATH curve, is that 0dB is not a fixed point in absolute terms. The Vorbis code calculates the most pessimistic possible masking curve, such that regardless of the actual final playback level of the audio, the masking curve is always either correct or too low (but never too high).

Monty
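For readers who want to see the shape of the curve mentioned in the reply, here is a minimal sketch in Python. It uses Terhardt's textbook closed-form approximation of the absolute threshold of hearing, which is an assumption made for illustration and not the table libvorbis itself uses. The function names and the candidate playback levels are hypothetical. The last helper only illustrates the "never too high" idea: if the playback level is unknown, taking the lowest threshold over all assumed levels gives a curve that is either correct or too low. It is not a description of the actual Vorbis code.

```python
import math


def ath_db_spl(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL).

    Assumption: this is a common textbook formula, not libvorbis's ATH data.
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def ath_dbfs(freq_hz, full_scale_spl):
    """Threshold relative to digital full scale, given the (assumed) sound
    pressure level in dB SPL that a full-scale signal is played back at."""
    return ath_db_spl(freq_hz) - full_scale_spl


def pessimistic_ath_dbfs(freq_hz, candidate_full_scale_spls):
    """Lowest threshold over all assumed playback levels.

    Whatever the real level is (among the candidates), this value is
    either the true threshold or below it, never above it.
    """
    return min(ath_dbfs(freq_hz, spl) for spl in candidate_full_scale_spls)


if __name__ == "__main__":
    levels = [80.0, 96.0, 110.0]  # hypothetical playback levels, dB SPL
    for f in (100, 1000, 3300, 8000, 12000, 16000):
        print(f"{f:6d} Hz  ATH {ath_db_spl(f):7.2f} dB SPL  "
              f"pessimistic {pessimistic_ath_dbfs(f, levels):8.2f} dBFS")
```

Running it shows the threshold dipping around 3 to 4 kHz and then climbing steeply above roughly 10 kHz, which is the same upward slope that appears in the treble of the floor plot.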