Thread split from the vorbis mailing list ("Vorbis determined to be as good as MPC at 128 kbps!")

On Sun, 30 May 2004, Segher Boessenkool wrote:

[Steven So]
SS>> If iTunes AAC can encode castanets with much less pre-echo at
SS>> ABR 128 kbps, then hopefully there will be an imaginative
SS>> (and non-patented) way of doing this in Vorbis without the
SS>> bitrate inflation of GTune and QKTune.

[Segher Boessenkool]
SB> Use some different transform? MDCT isn't the best audio transform
SB> ever invented, esp. not for non-steady waveforms.

Steven is talking about Vorbis, Segher. Vorbis makes use of the MDCT.

Let's see... Vorbis I versus AAC in transient coding (simplified ASCII art follows):

audio wave ('-' = low volume, <!> = transient)

--------------------------<!>--------------------

AAC
+---------------+---------------+---------------+
|       1       |       2       |       3       |  frame no.
+---------------+-+-+-+-+-+-+-+-+---------------+
|       L       |S|S|S|S|S|S|S|S|       L       |  transform
+---------------+-+-+-+-+-+-+-+-+---------------+
|       A       |    B    |C| D |       E       |  scalefactor sets
+---------------+---------+-+---+---------------+

Vorbis I
+---------------+-+-+-+-+-+-+---------------+----
|       1       |2|3|4|5|6|7|       8       |  packet no.
+---------------+-+-+-+-+-+-+---------------+----
|       L       |S|S|S|S|S|S|       L       |  transform
+---------------+-+-+-+-+-+-+---------------+----
|       F       |G|H|I|J|K|L|       M       |  floor curves
+---------------+-+-+-+-+-+-+---------------+----

Vorbis II (proposal, see below)
+---------------+---------+-+---------------+----
|       1       |    2    |3|       4       |  packet no.
+---------------+-+-+-+-+-+-+---------------+----
|       L       |S|S|S|S|S|S|       L       |  transform
+---------------+-+-+-+-+-+-+---------------+----
|       N       |    O    |P|       Q       |  floor curves
+---------------+---------+-+---------------+----

L   = long transform
S   = short transform
A-E = sets of scalefactors (AAC)
F-M = floor curves (Vorbis I)
N-Q = floor curves (Vorbis II)

Obviously Vorbis I is wasting space in this example by coding 5 floor curves (G-K) that are very similar.
AAC *shares* the scalefactor set B with these 5 windows, thus saving space. Vorbis II could allow the storage of multiple 'short' MDCT spectra (at most blocksize1/blocksize0 of them) in one packet that share ONE floor curve. It may also be worth the effort to encode the channel's residue vectors as one big vector (per channel) by interleaving. I think this will also improve coding efficiency a bit. As a side effect, more residue configurations will be needed, since the size of the residue vectors can be 1*128, 2*128, 3*128, ..., 7*128 and 8*128 = 1024.

Back to Vorbis I: What can be done to minimize pre-echoes without increasing the bitrate that much? How about temporal noise shaping? "Impossible!", you may say. Well, TNS is not a built-in Vorbis feature as it is in AAC. But it doesn't HAVE to be. TNS can be done by coding the MDCT spectrum either with

 1) an LPC filter + quantized LPC residual, OR
 2) an NSQ (noise shaping quantizer).

The AAC format allows method 1. But method 2 could be done for both (Vorbis and AAC) without breaking compatibility. In fact, method 2 is used by MPC in the time domain to shape the quantization noise within a subband to better match the masking threshold.

An NSQ applied in the time domain can spectrally shape the q-noise. What about an NSQ applied in the frequency domain? What does it do? Well, because of the time/frequency duality it will TEMPORALLY shape the q-noise. Et voilà! That's the theory. I don't know how well this can be applied in practice for Vorbis (has to be investigated).

Ghis!
Sebastian
--
PGP-Key-ID (long): 572B1778A4CA0707
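Method 2 (an NSQ run along the frequency axis instead of along time) can be sketched as follows. This is a minimal sketch, assuming a plain uniform scalar quantizer with step size `step` and a fixed, hypothetical feedback filter `h`; a real encoder would have to derive `h` from the wanted temporal envelope (for example from an LPC analysis of the spectrum). The rounding error of lower bins is fed back into higher bins, so the final reconstruction error is the raw rounding error filtered along frequency by (1 - h[0] z^-1 - h[1] z^-2 - ...), which by the duality argument above redistributes the noise in time within the block. The decoder only receives the integers `q` and multiplies them by `step`, which is why no bitstream change is needed. `nsq_quantize`, `dequantize` and `h` are illustrative names, not part of any Vorbis or AAC API.

```python
def nsq_quantize(coeffs, step, h):
    """Noise-shaping (error-feedback) quantizer run along the frequency axis.

    coeffs: MDCT coefficients of one block, ordered low to high frequency
    step:   uniform quantizer step size (assumed)
    h:      hypothetical feedback taps; h[0] weights the previous bin's error
    Returns the integer indices q and the raw rounding errors e.
    """
    q, e = [], []
    for k, x in enumerate(coeffs):
        # subtract the filtered rounding errors already made in lower bins
        fb = sum(h[i] * e[k - 1 - i] for i in range(len(h)) if k - 1 - i >= 0)
        v = x - fb
        qk = round(v / step)
        q.append(qk)
        e.append(qk * step - v)  # plain rounding error, |e| <= step / 2
    return q, e


def dequantize(q, step):
    """What an unmodified decoder does: it needs no knowledge of h."""
    return [qk * step for qk in q]


coeffs = [0.9, -2.3, 4.1, 0.2, -0.7, 3.3, -1.6, 0.05]
step = 1.0
h = [-0.9]  # hypothetical first-order shaping filter
q, e = nsq_quantize(coeffs, step, h)
rec = dequantize(q, step)
# reconstruction error per bin: e[k] - h[0] * e[k-1], i.e. e filtered by (1 + 0.9 z^-1)
shaped = [rec[k] - coeffs[k] for k in range(len(coeffs))]
```

With `h = []` the function degenerates to ordinary rounding, so the shaping is purely an encoder-side decision.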
> Steven is talking about Vorbis, Segher.
> Vorbis makes use of the MDCT.

Vorbis makes use of any transform you want. Currently there's only one transform defined, and that's the MDCT, sure. That doesn't mean we're stuck with it forever. Same goes for window shapes, btw.

> Obviously Vorbis I is wasting space in this example by
> coding 5 floor curves (G-K) that are very similar.
> AAC *shares* the scalefactor set B with these 5 windows
> thus saving space.

Sharing the floors decreases the space needed for the floors, but increases the space needed for the residues. So this is a tradeoff. Also, deciding per group of packets whether floors should be shared again wastes a few bits; more tradeoffs. I'm not saying what Vorbis currently does is optimal, of course; just that this is not a silver bullet.

> It may also be worth the effort to encode the channel's
> residue vectors as one big vector (per channel) by
> interleaving. I think this will also improve coding
> efficiency a bit.

Same comment.

Segher
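For reference, the interleaving proposal under discussion can be sketched as follows. This assumes (hypothetically) that the k short-block residue vectors of one channel are merged coefficient by coefficient, so that the same frequency bin of consecutive short blocks ends up adjacent; that mirrors how Vorbis I residue type 2 interleaves channels, but applied across time instead. The helper also checks the vector sizes quoted earlier: with blocksize0 = 256 and blocksize1 = 2048 (assumed values that give the 128 and 1024 in the text), one packet would carry 1 to 8 short spectra of 128 coefficients each. All function names are illustrative.

```python
def residue_lengths(blocksize0, blocksize1):
    """Possible lengths of one channel's merged residue vector."""
    max_shorts = blocksize1 // blocksize0
    half = blocksize0 // 2  # an MDCT over blocksize0 samples yields blocksize0/2 coefficients
    return [k * half for k in range(1, max_shorts + 1)]


def interleave(blocks):
    """Merge k equally long short-block vectors: out[i*k + j] = blocks[j][i]."""
    n = len(blocks[0])
    if any(len(b) != n for b in blocks):
        raise ValueError("all short blocks must have the same length")
    return [blocks[j][i] for i in range(n) for j in range(len(blocks))]


def deinterleave(vec, k):
    """Inverse of interleave: block j is every k-th value starting at index j."""
    if len(vec) % k:
        raise ValueError("length is not a multiple of k")
    return [list(vec[j::k]) for j in range(k)]


print(residue_lengths(256, 2048))
# [128, 256, 384, 512, 640, 768, 896, 1024]
```

Whether the merged vector actually codes more cheaply is exactly the tradeoff Segher raises; the sketch only shows the data layout.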
On Mon, 19 Jul 2004, Monty wrote:

> On Fri, Jun 11, 2004 at 07:51:04PM +0200, Sebastian Gesemann wrote:
> >
> > As for the transform issue: Why do you think the MDCT is not
> > the best transform ever invented? Do you want frequency varying
> > time/frequency resolutions? Do you think it's worth it?
>
> yes and yes. rather, I've always wanted a hybrid transform pair, one
> that focuses on frequency/pitch and another that focuses on time. The
> reason being that the ear hears and processes these separately, and
> the MDCT is only well-suited to the former.

So, what options do we have?

(1) the traditional hybrid filterbank approach:
    a) roughly split the signal into subbands (DWT or (P)QMF)
    b) further band-varying band splitting (MDCT)

(2) the "other" hybrid filterbank approach I've tinkered with:
    a) do the usual MDCT (like in Vorbis now)
    b) increase the temporal resolution by MDCT-transforming some
       regions of the MDCT spectrum of the first transform, kind of
       reversing the first stage a bit for those regions

(3) MDCT+TNS:
    a) just a single MDCT, as already done in Vorbis I
    b) linear predictive coding of the MDCT samples, where the
       LPC synthesis filter models the temporal shape

PROs:
(1) - frequency varying time resolutions are possible
(2) - frequency varying time resolutions are possible
    - completely MDCT based, and perfect reconstruction is possible
(3) - the temporal shape can be accurately modeled by the LPC filter
    - easy to implement

CONs:
(1) - you have to design QMF / DWT filters
    - complicated to implement
    - need for (spectral) alias reduction transforms after the 2nd stage
(2) - a bit more time consuming to calculate
    - need for (temporal) alias reduction
    - alias reduction implies a higher encode/decode delay

I'd go for (3). It's nearly as powerful as (1) and (2) and IMHO much more efficient and simpler to implement. And IMHO the reason why heavily quantized HF noise sounds so metallic when the spectral resolution is high is simple scalar quantization without dithering.
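To make option (3) b) concrete, here is a minimal sketch of the LPC part, assuming the autocorrelation method with the Levinson-Durbin recursion, run along the frequency axis of one block. The test signal is an idealized click: the DCT-IV spectrum of a single impulse at sample n0 is a pure cosine across the bin index k (its "frequency" along k encodes the click position), so a 2nd-order predictor removes almost all of it. In AAC-style TNS the encoder would quantize the prediction residual `res` rather than the coefficients, and the decoder's all-pole synthesis filter would then give the quantization noise the temporal envelope of the signal; quantization is left out here, and all function names are illustrative.

```python
import math


def autocorr(x, order):
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin: returns a = [1, a1, ..., ap] for A(z) = 1 + sum a_i z^-i."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # silent block or perfectly predicted: stop early
            break
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a


def tns_analysis(spec, a):
    """FIR prediction-error filter along frequency (encoder side)."""
    p = len(a) - 1
    return [sum(a[i] * spec[k - i] for i in range(p + 1) if k - i >= 0)
            for k in range(len(spec))]


def tns_synthesis(res, a):
    """All-pole synthesis filter along frequency (decoder side)."""
    p = len(a) - 1
    out = []
    for k, r in enumerate(res):
        out.append(r - sum(a[i] * out[k - i] for i in range(1, p + 1) if k - i >= 0))
    return out


N, n0 = 256, 100
# DCT-IV spectrum of an impulse at sample n0 (idealized click)
spec = [math.cos(math.pi / N * (n0 + 0.5) * (k + 0.5)) for k in range(N)]
a = levinson(autocorr(spec, 2), 2)
res = tns_analysis(spec, a)   # nearly zero after the first two bins
rec = tns_synthesis(res, a)   # reproduces spec
```

The autocorrelation method guarantees reflection coefficients with |k| < 1, so the decoder-side synthesis filter stays stable.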
I vote for a "Trellis Coded Quantizer" for Vorbis II. It implicitly does some sort of dithering, does a great job in terms of rate/distortion at low rates and is VERY EASY to decode.

http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=806615

[packets sharing floor curves and/or codebook class codes]

> They probably will in the future (meaning V-II) due to the large
> coding overhead at very low bitrates.

Do you really mean sharing across packets (which would create some kind of intra- and predictive-coded packets)? Or just coupling several small (i.e. blocksize0) chunks into one packet that share the curve (still independently decodable packets)?

I originally didn't want to go that far and create I-/P-packets like it's already done in video coding. But maybe it's worth a try. I guess the floor curve overhead could be reduced by using temporal and inter-channel prediction.

> Monty

Sebastian
--
PGP-Key-ID (long): 572B1778A4CA0707
On Tue, 3 Aug 2004, Sebastian Gesemann wrote:

> I vote for a "Trellis Coded Quantizer" for Vorbis II. It
> implicitly does some sort of dithering, does a great
> job in terms of rate/distortion at low rates and is VERY
> EASY to decode.
>
> http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=806615

Sorry, you need a login for that page. But the paper is also publicly available here:
http://www-spacl.ece.arizona.edu/Publications/Papers/R_11.ps

Sebastian
--
PGP-Key-ID (long): 572B1778A4CA0707
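To illustrate the "VERY EASY to decode" part, here is a minimal, generic TCQ sketch (not taken from the linked paper). Assumptions: a uniform reconstruction grid q*step whose levels are split into four subsets D0..D3 by q mod 4, and one commonly used 4-state trellis; the `TRELLIS` table is an assumed choice, not a Vorbis definition. Even states only allow D0/D2 (even q), odd states only D1/D3 (odd q), so the transmitted symbol `k` indexes a union codebook with half the density of the full grid. The encoder runs a Viterbi search for the cheapest path; the decoder is just a 4-state machine with no search at all.

```python
import math

# Assumed 4-state trellis: TRELLIS[state] = [(subset, next_state), (subset, next_state)]
TRELLIS = {
    0: [(0, 0), (2, 1)],
    1: [(1, 2), (3, 3)],
    2: [(2, 0), (0, 1)],
    3: [(3, 2), (1, 3)],
}
NEXT = {(s, d): n for s, branches in TRELLIS.items() for d, n in branches}


def nearest_in_subset(x, d, step):
    """Level index q with q % 4 == d whose value q * step is closest to x."""
    return d + 4 * round((x / step - d) / 4)


def tcq_encode(samples, step):
    """Viterbi search over the trellis; returns one integer symbol per sample."""
    cost = [0.0, math.inf, math.inf, math.inf]  # always start in state 0
    paths = [[], None, None, None]              # survivor level indices per state
    for x in samples:
        new_cost = [math.inf] * 4
        new_paths = [None] * 4
        for s in range(4):
            if cost[s] == math.inf:
                continue
            for d, n in TRELLIS[s]:
                q = nearest_in_subset(x, d, step)
                c = cost[s] + (x - q * step) ** 2
                if c < new_cost[n]:
                    new_cost[n] = c
                    new_paths[n] = paths[s] + [q]
        cost, paths = new_cost, new_paths
    best = min(range(4), key=lambda s: cost[s])
    # turn level indices into indices within the allowed union codebook
    symbols, state = [], 0
    for q in paths[best]:
        symbols.append((q - (state & 1)) // 2)
        state = NEXT[(state, q % 4)]
    return symbols


def tcq_decode(symbols, step):
    """Decoder: a 4-state machine, one table lookup per sample."""
    out, state = [], 0
    for k in symbols:
        q = 2 * k + (state & 1)   # parity of q follows the state's union codebook
        out.append(q * step)
        state = NEXT[(state, q % 4)]
    return out
```

A real codec would use a larger trellis and entropy-code the symbols; the point here is only the asymmetry between the search-based encoder and the table-driven decoder.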
vorbis-dev-bounces@xiph.org wrote:

> > (3) MDCT+TNS
> >     a) just a single MDCT like already done in Vorbis I
> >     b) linear predictive coding of the MDCT samples whereas the

Just curious, has TNS been patented yet?

Best regards,
Steve.