After playing with the vorbis code for a while and doing tons of hacks and
analysis on it, I've found it to perform very poorly with impulse signals.

The MDCT seems to cause lots of spreading, and it seems to result in much
worse impulse performance than mp3.

What is the current plan on handling this? Will a smart quantizer be able
to avoid it?

I've been looking at various ways of taking care of this, and before I
bother implementing something I'd like to make sure that no one has gone
down this path before.

Roughly, vorbis currently does:

input wave -> MDCT -> LPC -> LSP -> quant -> ------------------>output
                  \->delpc->error->quant -^

What do you think of this:

input wav -> DWT -> sum non-impulse factors -> iDWT -> MDCT ... (like above)
          \
           -> sum impulse factors -> iDWT -> LPC -> LSP -> quant

i.e. use a wavelet transform to separate out impulse-like signals and
compress them in the time domain.

The decoder complexity really isn't increased much (just one more
dequant/LPC and a sum). I think there are optimized versions of the Haar
DWT that go really fast too.
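For concreteness, here is a minimal sketch in C of the kind of split stage
being proposed. This is not Vorbis code: the one-level Haar transform, the
fixed buffer sizes, and the simple magnitude threshold used to classify
detail coefficients as impulse energy are all illustrative assumptions.

/* Illustrative sketch only, not Vorbis code: one-level Haar DWT, split the
 * detail coefficients into "impulse" (large magnitude) and "smooth" sets,
 * then invert each set separately.  The threshold and the fixed buffers
 * (frame size n <= 2048, n even) are assumptions for the example. */
#include <math.h>
#include <string.h>

/* forward one-level Haar: x[0..n-1] -> approx a[0..n/2-1], detail d[0..n/2-1] */
static void haar_forward(const float *x, float *a, float *d, int n)
{
    int i;
    for (i = 0; i < n / 2; i++) {
        a[i] = (x[2*i] + x[2*i+1]) * 0.5f;
        d[i] = (x[2*i] - x[2*i+1]) * 0.5f;
    }
}

/* inverse one-level Haar */
static void haar_inverse(const float *a, const float *d, float *x, int n)
{
    int i;
    for (i = 0; i < n / 2; i++) {
        x[2*i]     = a[i] + d[i];
        x[2*i + 1] = a[i] - d[i];
    }
}

/* Split a frame into a smooth branch (fed to the MDCT path) and an impulse
 * branch (fed to the time-domain path); the two reconstructions sum back
 * to the original frame. */
static void impulse_split(const float *x, float *smooth, float *impulse,
                          int n, float thresh)
{
    float a[1024], d[1024], az[1024], dz[1024];
    int i;

    haar_forward(x, a, d, n);
    memset(az, 0, sizeof(float) * (n / 2));
    for (i = 0; i < n / 2; i++) {
        if (fabsf(d[i]) > thresh) { dz[i] = d[i]; d[i] = 0.f; }
        else                        dz[i] = 0.f;
    }
    haar_inverse(a,  d,  smooth,  n);   /* smooth branch  -> MDCT ...        */
    haar_inverse(az, dz, impulse, n);   /* impulse branch -> time-domain LPC */
}

Because the inverse transform is linear, the smooth and impulse
reconstructions sum back to the original frame exactly, so the split itself
costs nothing in fidelity before quantization.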
> After playing with the vorbis code for a while and doing tons of hacks and
> analysis on it, I've found it to perform very poorly with impulse signals.

At the immediate moment, a commit error on my part is making it worse :-)
I currently have the psychoacoustics set to *only* use a 2048 sample
window... I was doing distortion testing on that specific window size and
forgot to put it back to normal (where it will use a 256 or 512 sample
window for impulses). (I just turned it back on in CVS.)

*HOWEVER*, that just puts us back in the mp3 range of dealing with the
problem. I've been interested for a long time in doing something similar
to what you propose, but I'd backed out what I was doing in order to push
ahead with more fundamental parts of the first cut release (too many
details for just me to handle at the moment :-) There are actually
bitflags in the stream right now waiting for exactly this sort of thing to
drop in. (This also means that 'research' on the subject can proceed at a
prudent pace without holding anything up.)

> I've been looking at various ways of taking care of this, and before I
> bother implementing something I'd like to make sure that no one has gone
> down this path before.

I've gone part way down the path, so I have some additional clues to
offer. This basic tack has my approval.

> Roughly, vorbis currently does:
>
> input wave -> MDCT -> LPC -> LSP -> quant -> ------------------>output
>                   \->delpc->error->quant -^
>
> What do you think of this:
>
> input wav -> DWT -> sum non-impulse factors -> iDWT -> MDCT ... (like above)
>           \
>            -> sum impulse factors -> iDWT -> LPC -> LSP -> quant
>
> i.e. use a wavelet transform to separate out impulse-like signals and
> compress them in the time domain.

Yes, this is exactly the way I wanted to proceed (only I wasn't using
wavelets; wavelets are indeed worth pursuing). The encoder/decoder were
structured exactly for the above flow (the convenience of the layout isn't
accidental). However, we need to find a better way to encode the impulses.
(More on this later; I wanted to respond quickly to say that you're on the
path that I started, but hadn't continued.)

> The decoder complexity really isn't increased much (just one more
> dequant/LPC and a sum). I think there are optimized versions of the Haar
> DWT that go really fast too.

Yes. In addition, wavelets are (IIRC) linear-time (O(n)) transforms, so
the time taken by the Haar transform itself would practically be lost in
the noise :-)

I'll have much more to talk about after some sleep.

Monty
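To make the linear-time point concrete: a full Haar pyramid touches
n + n/2 + n/4 + ... < 2n samples in total. A sketch follows; it is
illustrative only (not anything in CVS) and assumes n is a power of two
and a caller-supplied scratch buffer of at least n floats.

/* Illustrative only (not anything in CVS): in-place multi-level Haar
 * decomposition.  Each pass works on half the span of the previous one,
 * so the total work is n + n/2 + n/4 + ... < 2n operations, i.e. O(n). */
#include <string.h>

static void haar_full(float *x, float *scratch, int n)
{
    int span;
    for (span = n; span > 1; span >>= 1) {
        int i, half = span >> 1;
        for (i = 0; i < half; i++) {
            scratch[i]        = (x[2*i] + x[2*i+1]) * 0.5f; /* approximation */
            scratch[half + i] = (x[2*i] - x[2*i+1]) * 0.5f; /* detail        */
        }
        memcpy(x, scratch, sizeof(float) * span);
    }
}

Next to an O(n log n) MDCT, that transform cost really would be lost in
the noise.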
> Date: Fri, 19 Nov 1999 08:50:53 -0500 (EST)
> From: Gregory Maxwell <greg@linuxpower.cx>
>
> After playing with the vorbis code for a while and doing tons of hacks and
> analysis on it, I've found it to perform very poorly with impulse signals.
>
> The MDCT seems to cause lots of spreading, and it seems to result in much
> worse impulse performance than mp3.
>
> What do you think of this:
>
> input wav -> DWT -> sum non-impulse factors -> iDWT -> MDCT ... (like above)
>           \
>            -> sum impulse factors -> iDWT -> LPC -> LSP -> quant

Do you guys really think window switching is so bad? It clearly works very
well and is not just 'mp3' quality, since it is used in AAC, which is
pretty much the best encoder out there.

The only problem I can see is that the encoding is not as efficient: you
always need to allocate extra bits for short MDCT windows. But except for
extreme cases like castanets.wav, the proportion of attacks/pulses is
usually less than 5%. Assuming 50% more bits for the lossless encoding, a
more sophisticated technique would save at most 2.5% (5% of frames times
50% extra bits per frame).

Also, I believe Vorbis is using a 2048 sample MDCT window? (like AAC, but
almost twice that of mp3). Such a large window results in more spreading,
making short windows even more important?

Monty: you've mentioned comparisons between Vorbis and AAC in the past.
Which AAC encoder/decoder were you using? If you get a chance, could you
post the output from a decoded AAC encoding of castanets.wav?

Mark
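For reference, here is a toy sketch of the kind of attack detection that
drives window switching. Nothing in it comes from Vorbis or any AAC
encoder; the sub-block energy comparison and the factor-of-8 threshold are
purely illustrative assumptions.

/* Toy transient detector of the sort that drives window switching.
 * The sub-block energy comparison and the factor-of-8 threshold are
 * illustrative assumptions, not taken from any real encoder. */

/* Return nonzero if the frame looks like an attack: some short sub-block
 * carries much more energy than the average of the sub-blocks before it. */
static int looks_like_attack(const float *frame, int n, int subblocks)
{
    int   sb, i, len = n / subblocks;
    float avg = 0.f;

    for (sb = 0; sb < subblocks; sb++) {
        float e = 0.f;
        for (i = 0; i < len; i++) {
            float v = frame[sb * len + i];
            e += v * v;
        }
        if (sb > 0 && e > 8.f * avg)
            return 1;                  /* sudden energy jump: use short windows */
        avg += (e - avg) / (sb + 1);   /* running mean of sub-block energies */
    }
    return 0;
}

A real encoder would of course combine something like this with its
psychoacoustic criteria rather than rely on a bare energy ratio.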