thr3ads.net - Vorbis dev - [vorbis-dev] Transient coding: AAC vs. Vorbis [Jun 2004]

Sebastian Gesemann

2004-Jun-02 10:45 UTC

[vorbis-dev] Transient coding: AAC vs. Vorbis

Thread-split from the vorbis-mailing list
("Vorbis determined to be as good as MPC at 128 kbps!")

On Sun, 30 May 2004, Segher Boessenkool wrote:

[Steven So]
SS>> If iTunes AAC can encode castanets with much less pre-echo at
SS>> ABR 128 kbps, then hopefully there will be an imaginative
SS>> (and non-patented) way of doing this in Vorbis without the
SS>> bitrate inflation of GTune and QKTune.

[Segher Boessenkool]
SB> Use some different transform?  MDCT isn't the best audio transform
SB> ever invented, esp. not for non-steady waveforms.

Steven is talking about Vorbis, Segher.
Vorbis makes use of the MDCT.

Let's see... Vorbis I versus AAC in transient coding...
(simplified ASCII art following)

audio wave  ('-'=low volume, <!>=transient )
--------------------------<!>--------------------

AAC
+---------------+---------------+---------------+
|       1       |       2       |       3       | frame no.
+---------------+-+-+-+-+-+-+-+-+---------------+
|       L       |S|S|S|S|S|S|S|S|       L       | transform
+---------------+-+-+-+-+-+-+-+-+---------------+
|       A       |    B    |C| D |       E       | scalefactor sets
+---------------+---------+-+---+---------------+

Vorbis I
+---------------+-+-+-+-+-+-+---------------+----
|       1       |2|3|4|5|6|7|       8       |     packet no.
+---------------+-+-+-+-+-+-+---------------+----
|       L       |S|S|S|S|S|S|       L       |     transform
+---------------+-+-+-+-+-+-+---------------+----
|       F       |G|H|I|J|K|L|       M       |     floor curves
+---------------+-+-+-+-+-+-+---------------+----

Vorbis II (proposal, see below)
+---------------+---------+-+---------------+----
|       1       |    2    |3|       4       |     packet no.
+---------------+-+-+-+-+-+-+---------------+----
|       L       |S|S|S|S|S|S|       L       |     transform
+---------------+-+-+-+-+-+-+---------------+----
|       N       |    O    |P|       Q       |     floor curves
+---------------+---------+-+---------------+----

L   = long transform
S   = short transform
A-E = sets of scalefactors (AAC)
F-M = floor curves (Vorbis I)
N-Q = floor curves (Vorbis II)

Obviously Vorbis I is wasting space in this example by
coding 5 floor curves (G-K) that are very similar.
AAC *shares* the scalefactor set B with these 5 windows
thus saving space.

Vorbis II could allow several 'short' MDCT spectra
(at most blocksize1/blocksize0 of them) to be stored
in one packet and share ONE floor curve.

It may also be worth the effort to encode each channel's
residue vectors as one big vector (per channel) by
interleaving. I think this will also improve coding
efficiency a bit. As a side effect, more residue
configurations will be needed, since the size of the residue
vector can be 1*128, 2*128, 3*128, ..., 7*128 or 8*128=1024.
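
The interleaving idea can be sketched roughly as follows. This is only
an illustration, not part of any Vorbis specification: the function
names are made up, and 128 bins per short block assumes
blocksize0 = 256, which is what the sizes listed above imply.

```python
SHORT_BINS = 128  # MDCT bins per short block (assumes blocksize0 = 256)

def interleave(spectra):
    """Merge k equally long short-block spectra into one vector.

    Output order: bin 0 of every block, then bin 1 of every block, ...
    """
    if not spectra or any(len(s) != len(spectra[0]) for s in spectra):
        raise ValueError("need at least one block, all of equal length")
    return [s[b] for b in range(len(spectra[0])) for s in spectra]

def deinterleave(vector, k):
    """Split an interleaved vector back into its k short-block spectra."""
    if len(vector) % k:
        raise ValueError("vector length must be a multiple of k")
    return [vector[i::k] for i in range(k)]

# Example: 5 short blocks sharing one packet, as in packet 2 of the
# Vorbis II diagram. The values are dummy data, not real residues.
blocks = [[100 * i + b for b in range(SHORT_BINS)] for i in range(5)]
merged = interleave(blocks)
print(len(merged))  # 640
```

Neighbouring entries of the merged vector then belong to the same
frequency bin in consecutive short blocks, so a residue codebook sees
values that are likely to be of similar magnitude.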

Back to Vorbis I:

What can be done to minimize pre-echoes without increasing
the bitrate that much? How about temporal noise shaping (TNS)?
"Impossible!", you may say. Well, TNS is not a built-in
Vorbis feature like it is in AAC. But it doesn't HAVE to be.

TNS can be done either by coding the MDCT spectrum by
1) LPC-Filter + quantized LPC residual
OR
2) using an NSQ (noise shaping quantizer)

The AAC format allows method 1. But method 2 could be done
for both (Vorbis and AAC) without breaking compatibility.
In fact, method 2 is used by MPC in the time domain to
shape the quantization noise within a subband to better
match the masking threshold.

An NSQ applied in the time domain can spectrally shape the
quantization noise. What about an NSQ applied in the frequency
domain? What does it do? Well, because of the time/frequency
duality it will TEMPORALLY shape the quantization noise. Et voilà!
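
A toy experiment along these lines might look like the sketch below.
It is only meant to illustrate the duality and rests on several
assumptions: a plain DFT stands in for the MDCT (the MDCT's
time-domain aliasing makes the real picture messier), the "spectrum"
is random test data, and the first-order feedback coefficient C is
picked arbitrarily. All names are hypothetical.

```python
import cmath
import random

def nsq(coeffs, step, c):
    """First-order error-feedback quantizer run along the frequency axis.

    The total error q[k] - coeffs[k] equals e[k] - c * e[k-1], where e is
    the plain rounding error, so the noise is filtered by 1 - c*z^-1.
    """
    out, prev_err = [], 0.0
    for x in coeffs:
        v = x - c * prev_err            # feed back the previous rounding error
        q = step * round(v / step)      # uniform scalar quantizer
        prev_err = q - v
        out.append(q)
    return out

def idft_energy(seq):
    """Squared magnitude of the inverse DFT of seq, one value per time index."""
    n = len(seq)
    return [abs(sum(s * cmath.exp(2j * cmath.pi * k * t / n)
                    for k, s in enumerate(seq)) / n) ** 2
            for t in range(n)]

random.seed(1)
N, STEP, C = 256, 1.0, 0.9              # arbitrary illustration values
spectrum = [random.uniform(-50.0, 50.0) for _ in range(N)]
quantized = nsq(spectrum, STEP, C)
noise = [q - x for q, x in zip(quantized, spectrum)]
energy = idft_energy(noise)

near_start = sum(energy[:16])                      # time indices close to 0
near_middle = sum(energy[N // 2 - 8:N // 2 + 8])   # time indices close to N/2
print(near_start, near_middle)
```

With a positive C the noise energy should come out much lower near
time index 0 than near N/2, i.e. the feedback filter applied across
frequency acts as a temporal envelope for the noise. A real encoder
would have to derive the filter from the signal's temporal envelope
instead of using a fixed C.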

That's the theory. Don't know how well this can be applied
in practice for Vorbis. (has to be investigated)

Ghis!
Sebastian


--
PGP-Key-ID (long): 572B1778A4CA0707


Segher Boessenkool

2004-Jun-05 12:43 UTC

[vorbis-dev] Transient coding: AAC vs. Vorbis

> Steven is talking about Vorbis, Segher.
> Vorbis makes use of the MDCT.

Vorbis makes use of any transform you want.  Currently there's
only one transform defined, and that's the MDCT, sure.  That
doesn't mean we're stuck with it forever.

Same goes for window shapes, btw.

> Obviously Vorbis I is wasting space in this example by
> coding 5 floor curves (G-K) that are very similar.
> AAC *shares* the scalefactor set B with these 5 windows
> thus saving space.

Sharing the floors decreases the space needed for the
floors, but increases the space needed for the
residues.  So this is a tradeoff.  Also, deciding
per group of packets whether floors should be shared
itself costs a few bits; more tradeoffs.

I'm not saying what Vorbis currently does is optimal,
of course; just that this is not a silver bullet.

> It may also be worth the effort to encode the channel's
> residue vectors as one big vector (per channel) by
> interleaving. I think this will also improve coding
> efficiency a bit.

Same comment.


Segher

Sebastian Gesemann

2004-Aug-03 06:24 UTC

[vorbis-dev] Transient coding: AAC vs. Vorbis

On Mon, 19 Jul 2004, Monty wrote:
> On Fri, Jun 11, 2004 at 07:51:04PM +0200, Sebastian Gesemann wrote:
>
> > As for the transform issue: Why do you think the MDCT is not
> > the best transform ever invented? Do you want frequency-varying
> > time/frequency resolutions? Do you think it's worth it?
>
> yes and yes.  rather, I've always wanted a hybrid transform pair, one
> that focuses on frequency/pitch and another that focuses on time.  The
> reason being that the ear hears and processes these separately, and
> the MDCT is only well-suited to the former.

So, what options do we have?

(1) the traditional hybrid filterbank approach:
    a) roughly split the signal into subbands (DWT or (P)QMF)
    b) further band-varying band splitting (MDCT)
(2) the "other" hybrid filterbank approach I've tinkered with:
    a) do the usual MDCT (as in Vorbis now)
    b) increase the temporal resolution by applying a second
       MDCT to some regions of the MDCT spectrum of the first
       transform, which partly reverses the first stage for
       those regions
(3) MDCT+TNS:
    a) just a single MDCT, as already done in Vorbis I
    b) linear predictive coding of the MDCT samples, where the
       LPC synthesis filter models the temporal shape
PROs:
(1) - frequency-varying time resolutions are possible
(2) - frequency-varying time resolutions are possible
    - completely MDCT-based, and perfect reconstruction is possible
(3) - temporal shape can be accurately modeled by the LPC filter
    - easy to implement

CONs:
(1) - you have to design QMF / DWT filters
    - complicated to implement
    - need for (spectral) alias reduction transforms after
      the 2nd stage
(2) - a bit more time-consuming to calculate
    - need for (temporal) alias reduction
    - alias reduction implies a higher encode/decode delay


I'd go for (3).
It's nearly as powerful as (1) and (2) and IMHO much more
efficient/simpler to implement.
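
To make option (3) a bit more concrete, here is a rough sketch of
linear prediction run across the MDCT bins of one block. It is not
AAC's actual TNS tool and not a Vorbis feature: the residual is left
unquantized, the order of 2 is arbitrary, and the cosine "spectrum"
is just a stand-in for a block containing a single sharp click (an
isolated impulse in time gives a sinusoid-like pattern across
frequency). All function names are hypothetical.

```python
import math

def autocorrelation(x, order):
    """Autocorrelation of x for lags 0..order."""
    return [sum(x[k] * x[k + lag] for k in range(len(x) - lag))
            for lag in range(order + 1)]

def levinson_durbin(r, order):
    """Return error-filter coefficients a (a[0] == 1) and the final error."""
    a, err = [1.0] + [0.0] * order, r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                  # reflection coefficient
        a = ([1.0] + [a[j] + k * a[i - j] for j in range(1, i)]
             + [k] + [0.0] * (order - i))
        err *= 1.0 - k * k
    return a, err

def analysis(x, a):
    """Encoder side: filter the spectrum with A(z) to get the residual."""
    return [sum(a[j] * x[k - j] for j in range(len(a)) if k - j >= 0)
            for k in range(len(x))]

def synthesis(res, a):
    """Decoder side: filter the residual with 1/A(z)."""
    x = []
    for k, e in enumerate(res):
        x.append(e - sum(a[j] * x[k - j]
                         for j in range(1, len(a)) if k - j >= 0))
    return x

def energy(v):
    return sum(t * t for t in v)

ORDER = 2                               # arbitrary illustration value
spectrum = [math.cos(0.3 * k) for k in range(64)]   # stand-in for a click
a, _ = levinson_durbin(autocorrelation(spectrum, ORDER), ORDER)
residual = analysis(spectrum, a)
rebuilt = synthesis(residual, a)
print(energy(residual) / energy(spectrum))
```

The residual carries far less energy than the spectrum itself, and the
decoder-side synthesis filter restores the original bins. In an actual
codec the residual would be quantized, and the synthesis filter would
then impose the temporal shape on the quantization noise as well.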


And IMHO the reason why heavily quantized HF noise
sounds so metallic at a high spectral resolution
is simple scalar quantization without dithering.

I vote for a "Trellis Coded Quantizer" for Vorbis II. It
implicitly does some sort of dithering, does a great
job in terms of rate/distortion at low rates and is VERY
EASY to decode.

http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=806615


[packets sharing floor curves and/or codebook class codes]
> They probably will in the future (meaning V-II) due to the large
> coding overhead at very low bitrates.

Do you really mean sharing across packets? (That would create
some kind of intra- and predictive-coded packets.)
Or just coupling several small (i.e. blocksize0) chunks
into one packet that share the curve? (The packets would still
be independently decodable.)

I originally didn't want to go that far and create I-/P-packets
as is already done in video coding. But maybe it's worth a
try. I guess the floor curve overhead could be reduced by using
temporal and inter-channel prediction.

> Monty
Sebastian

--
PGP-Key-ID (long): 572B1778A4CA0707

Sebastian Gesemann

2004-Aug-03 06:55 UTC

[vorbis-dev] Transient coding: AAC vs. Vorbis

On Tue, 3 Aug 2004, Sebastian Gesemann wrote:
> I vote for a "Trellis Coded Quantizer" for Vorbis II. It
> implicitly does some sort of dithering, does a great
> job in terms of rate/distortion at low rates and is VERY
> EASY to decode.
>
> http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=806615

Sorry, you need a login on that page. But the paper is also
publicly available here:

http://www-spacl.ece.arizona.edu/Publications/Papers/R_11.ps


Sebastian

--
PGP-Key-ID (long): 572B1778A4CA0707

Stephen So

2004-Aug-03 16:31 UTC

[vorbis-dev] Transient coding: AAC vs. Vorbis

vorbis-dev-bounces@xiph.org wrote:
>
>(3) MDCT+TNS
>    a) just a single MDCT like already done in Vorbis I
>    b) linear predictive coding of the MDCT samples whereas the
>
>
Just curious, has TNS been patented yet?

Best regards,

Steve.
