thr3ads.net - theora - [theora] Fixed Quantizer


Dan Miller

2003-Mar-25 17:25 UTC

[theora] Fixed Quantizer - Fixed Quality

Here's the problem:
> 2) Encoding with rate control as in single-pass "Bitrate control" will not
> lead to better quality than fixed quant (with the right value of the fixed
> quant). Rate control doesn't know anything about "quality". It will try to
> reach more-or-less CBR.
>
> But somehow this is not a fair comparison, because how do you determine
> the right quantizer value? You have to look at the material, so you have
> extra information.
>
> ---------------------------------------------------------------
>
> 3) Two-pass encoding with varying quantizer can lead to better overall
> quality than fixed-quantizer encoding.
>
> E.g.: Encode Barcelona with Quant 25, but Suzie with quant 8.
> Total size will be similar:
>
> Suzie-Q8:  275442   +   Barcelona-Q25: 347980    =   Total 623422
> Suzie-Q20: 115378   +   Barcelona-Q20: 550760    =   Total 666138
>
> But visual quality makes a real difference, as you can see from the other
> attached pictures: Barcelona-Q25 isn't too much worse than Q20.
> Suzie-Q8 is _much_ better than Q20.
>
> These are just examples, of course...
Everything you say is basically true.  However, what you are not accounting for
is that it is the job of the codec to define what "Q=8" means.  In the
DivX case, I would claim the codec is at fault for not accounting for the fact
that some material will look terrible at Q=20, and for not redefining Q on that basis.
Your theory seems to be that this is the job of a hypothetical "2-pass
encoder", but I don't see how multiple passes per se make any
difference.  It's an issue of where the logic resides.  How does any
encoder, whether one-pass, 2-pass, or whatever, determine that the
'Suzie' scenes need a different setting than the Barcelona clip to
achieve subjectively similar quality?

I can tell you how this is usually dealt with in practice: most encoder apps
provide modes where quality and bitrate can both vary within some range.
In your example, we might say that Q can rise as high as 25, but only if necessary to
pull the bitrate down below some threshold.  Below that threshold, Q can go down
(i.e., quality increases in your example) until the threshold bitrate is
approximated.  2-pass encoders simply have more information on how to do this
effectively (i.e., knowing that a simple scene is coming up, they can increase
quality on the cut so you don't see an ugly transitional period of a few
frames).  True CBR is basically this strategy rigorously enforced against a
given transport speed and playback buffer model.
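The bounded-Q strategy described above can be sketched as a one-step feedback rule applied after every frame. This is a minimal illustration, not any particular encoder's rate control: the function names, the step size of 1, the bounds 2 and 25 (the upper one borrowed from the example in the message) and the toy bits-per-frame model are all assumptions. A 2-pass encoder would replace the purely reactive rule with lookahead taken from its first pass.

```python
def next_quantizer(q, frame_bits, target_bits, q_min=2, q_max=25):
    """Move the quantizer one step toward the per-frame bit target, within [q_min, q_max]."""
    if frame_bits > target_bits:
        q += 1  # over budget: quantize more coarsely to save bits
    elif frame_bits < target_bits:
        q -= 1  # under budget: spend the slack on quality
    return max(q_min, min(q_max, q))


def simulate(complexities, target_bits, q_start=10, q_min=2, q_max=25):
    """Run the controller over a clip; returns the quantizer used for each frame."""
    q = q_start
    used = []
    for c in complexities:
        used.append(q)
        bits = c * 1000 // q  # hypothetical toy model: bits fall as q rises
        q = next_quantizer(q, bits, target_bits, q_min, q_max)
    return used


# An easy, "Suzie-like" clip settles at q_min; a busy, "Barcelona-like" clip hits the q_max cap.
print(simulate([1] * 20, target_bits=1000)[-1])     # 2
print(simulate([1000] * 30, target_bits=1000)[-1])  # 25
```

The clamp is what makes this "variable within some range": Q only climbs toward 25 when the bit budget forces it, and otherwise drifts down toward the finest allowed value.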

This sort of relates to the PSNR discussion in the following way: internally,
when making various encoding choices (block type, quantizers), most video codecs
simply use some variation of MSE (mean squared error, which is what PSNR is
derived from), or more typically SAD (Sum of Absolute Differences), which is a
very similar metric (but easier to calculate).  In either case, as has been
discussed, the results of this approach do not correlate very well with
perceived quality, especially when taken over varying types of source material
(as your examples show).
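For reference, the three metrics mentioned above look like this on flat lists of pixel values. This is a plain sketch over 8-bit samples (the 255 peak is the usual assumption); real encoders compute SAD/MSE per block, typically with integer arithmetic. Note that none of them consider where in the picture an error lands, which is why they correlate poorly with perceived quality.

```python
import math


def sad(a, b):
    """Sum of Absolute Differences: cheap, needs no multiplications."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mse(a, b):
    """Mean squared error over equal-length pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, peak=255):
    """PSNR in dB, derived from MSE as 20 * log10(peak / sqrt(MSE))."""
    e = mse(a, b)
    return math.inf if e == 0 else 20 * math.log10(peak / math.sqrt(e))


ref = [10, 20, 30, 40]
out = [12, 18, 30, 44]
print(sad(ref, out), mse(ref, out))  # 8 6.0
print(psnr(ref, out))
```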

So, for my money, the codecs should be doing a better job of incorporating some
intelligence to correlate their 'Q' values to actual perceived quality,
rather than some arbitrary pixel difference value.  That way, fixed-Q could
actually mean something useful.  I suspect that audio codecs, particularly
Vorbis, do this intrinsically, because their internal psychoacoustic models
tend to be rather complex.  In the video world, for reasons that elude me, this
is not the case.  I know of no codec that incorporates any useful psycho-visual
model into its encoder (though there are encoding apps that sit on top of codecs
that claim to do this).

IMSHO, this should be a major design goal of any improved Theora encoders we
develop.

--- >8 ----
List archives:  xiph.org/archives
Ogg project homepage: xiph.org/ogg
To unsubscribe from this list, send a message to
'theora-request@xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is
needed.
Unsubscribe messages sent to the list will be ignored/filtered.

Stan Seibert

2003-Mar-25 17:34 UTC


[theora] Fixed Quantizer - Fixed Quality

On Tue, 2003-03-25 at 19:25, Dan Miller wrote:
> So, for my money, the codecs should be doing a better job of
> incorporating some intelligence to correlate their 'Q' values to
> actual perceived quality, rather than some arbitrary pixel difference
> value.  That way, fixed-Q could actually mean something useful.  I
> suspect that audio codecs, particularly Vorbis, do this intrinsically,
> because their internal psychoacoustic models tend to be rather
> complex.  In the video world, for reasons that elude me, this is not
> the case.  I know of no codec that incorporates any useful
> psycho-visual model into its encoder (though there are encoding apps
> that sit on top of codecs that claim to do this).
I'll ask the obvious follow-up.  :)

Is there a reasonable "psycho-visual" model to work with?

---
Stan Seibert


Marco Al

2003-Mar-25 17:52 UTC


[theora] Fixed Quantizer - Fixed Quality

From: "Dan Miller" <dan@on2.com>
> Everything you say is basically true.  However, what you are not
> accounting for is that it is the job of the codec to define what
> "Q=8" means.
I think the general assumption was that you meant quantizer by Q, not
quality. Christoph most certainly means quantizer with Q.
> I suspect that audio codecs, particularly Vorbis, do this intrinsically,
> because their internal psychoacoustic models tend to be rather complex.
> In the video world, for reasons that elude me, this is not the case.  I
> know of no codec that incorporates any useful psycho-visual model into its
> encoder (though there are encoding apps that sit on top of codecs that
> claim to do this).
I think audio is also easier because our hearing is mostly frequency-sensitive,
and our sight more structure-sensitive. To oversimplify: our hearing perceives
the max error, which makes quantization for constant quality much easier, but
our sight perceives an error which is a more complex function of the errors at
the separate frequencies.

A coding mode which puts a hard limit on a macroblock's (MB's) MSE shouldn't be
too slow or hard to code, BTW. It would be an easy starting point for
constant-quality coding.
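A rough sketch of the per-macroblock hard MSE limit suggested above: for each macroblock, search a list of quantizer step sizes and keep the coarsest one whose reconstruction error stays within the limit. The uniform quantizer applied directly to samples, the step list and the limit are illustrative assumptions; a real Theora encoder quantizes DCT coefficients with its own tables. Rounding here follows Python's round-half-to-even.

```python
def mse(a, b):
    """Mean squared error between two equal-length sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def quantize(block, step):
    """Uniform scalar quantization followed by reconstruction.

    Stand-in for quantizing DCT coefficients; applied to raw samples to keep it short.
    """
    return [round(v / step) * step for v in block]


def coarsest_step(block, mse_limit, steps):
    """Largest step in `steps` whose reconstruction MSE stays within mse_limit.

    Falls back to the finest step if none qualifies (the limit is then not met).
    """
    best = min(steps)
    for s in sorted(steps):
        if mse(block, quantize(block, s)) <= mse_limit:
            best = s
    return best


steps = [1, 2, 4, 8, 16, 32, 64]
flat = [64] * 256                    # a 16x16 macroblock, flattened
ramp = [i % 16 for i in range(256)]  # a macroblock with some detail
print(coarsest_step(flat, 4.0, steps))  # 64: a flat block survives any step
print(coarsest_step(ramp, 4.0, steps))  # 4
```

Because each macroblock gets its own step, easy blocks end up coarsely quantized and busy ones finely, so the bitrate floats while the per-MB error bound stays fixed.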

Marco


Dan Miller

2003-Mar-25 20:45 UTC


[theora] Fixed Quantizer - Fixed Quality

> From: Stan Seibert [mailto:volsung@mailsnare.net]
...
> Is there a reasonable "psycho-visual" model to work with?

(in booming narrator voice:) "Well Stan, that's an excellent question!!"

I'm just starting to review the present state of research (see my link in a
previous post to the 'ITS' objective measurement stuff, for instance;
I'm pretty impressed with their stuff so far).  In my own research, I've
looked at frequency-banded PSNR, as well as modifications to PSNR to account for
the fact that low-contrast scenes will have a much lower MSE for the same perceived
error (presumably because the eye/brain is doing contrast adjustments on a
regional basis).  This is a big issue; more on that later.  (Quick point: PSNR
is usually calculated with a presumed pixel value range of 0-255,
[20 * log10(255 / sqrt(mse))].  What if the image has a range of 50 to 200?
Shouldn't the formula then be 20 * log10(150 / sqrt(mse))?)
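The range-based variant floated above is easy to try side by side with the conventional formula. This sketch simply takes the reference's actual max minus min as the peak; that choice, the function names and the sample values are illustrative assumptions, not an established metric.

```python
import math


def psnr_with_peak(ref, test, peak):
    """PSNR in dB for a given peak value: 20 * log10(peak / sqrt(MSE))."""
    e = sum((x - y) ** 2 for x, y in zip(ref, test)) / len(ref)
    return math.inf if e == 0 else 20 * math.log10(peak / math.sqrt(e))


def range_adjusted_psnr(ref, test):
    """Variant from the post: use the reference's actual range (max - min) as the peak."""
    peak = max(ref) - min(ref)
    if peak <= 0:
        raise ValueError("flat reference: range-based peak is undefined")
    return psnr_with_peak(ref, test, peak)


ref = [50, 200, 125, 100]   # samples spanning 50..200, as in the example
test = [52, 198, 125, 101]  # MSE = (4 + 4 + 0 + 1) / 4 = 2.25
print(psnr_with_peak(ref, test, 255))  # conventional 0-255 peak
print(range_adjusted_psnr(ref, test))  # 40.0
```

With these samples the conventional formula gives about 44.6 dB and the range-based one 40 dB. The gap, 20 * log10(255 / 150) or roughly 4.6 dB, is the same for any MSE, so the variant only shifts the score per image. It penalizes low-contrast images more, which is the direction the post argues for.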

All of this raises the question: what exactly does the eye/brain do with an image?
One big problem that makes the video side harder than audio is that viewing
conditions can vary so widely: everything from a movie theater (dark room with
a large, hi-res screen) to looking at some multimedia on your iPAQ outside on a
sunny day.

My general impression is that most people agree we perceive images through some
sort of wavelet-like combined spatial/frequency decomposition.  Obviously, we
have circuits to do feature extraction at various levels (edge detectors, etc.).
So my guess would be that we need to break the image down into reasonably sized
areas (the size of the regions is very dependent on viewing conditions; the optimum
is probably a specific angle of vision).  We also have to consider how to
segment an image into regions without problems arising at the region boundaries.
Then, within these regions, we need to do some sort of frequency-domain
analysis, and empirically learn what the JNDs (Just Noticeable Differences)
are for various types of distortion (noise, low-pass, phase distortion,
quantization...), all normalized to the overall energy of the region.
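A heavily simplified sketch of the region-based, energy-normalized threshold idea above. It keeps only two ingredients: splitting the image into regions, and comparing each region's error against a threshold proportional to that region's energy. It skips the frequency-domain split and the per-distortion JND tables entirely. It also uses plain non-overlapping 8x8 blocks, so it does nothing about the boundary problem mentioned. The `jnd_ratio` and `floor` values are made-up placeholders, not measured JNDs.

```python
def regions(img, size):
    """Yield (row, col, samples) for non-overlapping size x size regions of a 2-D list."""
    for r in range(0, len(img) - size + 1, size):
        for c in range(0, len(img[0]) - size + 1, size):
            yield r, c, [img[i][j] for i in range(r, r + size) for j in range(c, c + size)]


def variance(xs):
    """Region energy, taken here as the variance (AC energy) of its samples."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def visible_regions(ref, dist, size=8, jnd_ratio=0.01, floor=1.0):
    """(row, col) of regions whose error energy exceeds jnd_ratio * region energy.

    jnd_ratio is a hypothetical placeholder for an empirically measured JND;
    floor keeps flat regions from getting a zero threshold.
    """
    flagged = []
    for (r, c, a), (_, _, b) in zip(regions(ref, size), regions(dist, size)):
        err = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
        if err > jnd_ratio * max(variance(a), floor):
            flagged.append((r, c))
    return flagged


# 16x16 test image: flat gray on the left, a checkerboard texture on the right.
ref = [[128 if j < 8 else (0 if (i + j) % 2 == 0 else 255) for j in range(16)]
       for i in range(16)]
dist = [[v + 2 for v in row] for row in ref]  # the same +2 error everywhere
print(visible_regions(ref, dist))  # [(0, 0), (8, 0)]: only the flat regions
```

The same error is flagged in the flat regions but masked in the textured ones, which is the behavior an energy-normalized threshold is meant to capture.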

In other words, we need a comprehensive model of allowable threshold distortions
(as a function of total energy) in a combined spatial/frequency domain.  Then we
can tune our codecs to produce errors that fall within those thresholds,
allocating bits accordingly.

Yeah, something like that sounds nice.

-dan

Dan Miller

2003-Mar-25 21:01 UTC


[theora] Fixed Quantizer - Fixed Quality

> From: Marco Al [mailto:marco@simplex.nl]
...
> I think the general assumption was that you meant quantizer by Q, not
> quality. Christoph most certainly means quantizer with Q.
Fair enough.  I guess then my point is that offering some sort of raw
'Quantizer' knob to an end user of a codec is a baad idea.  The user
usually wants to go for maximum quality M (Q could be confusing), limited to
peak datarate P, with average datarate D.  These are the sorts of knobs a good
codec should be presenting to the world.
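One way to picture the three knobs described above is as a small settings type that rejects inconsistent combinations. Everything here is hypothetical: the class and field names, the 0-100 quality scale and the kbps units are assumptions for illustration, not an existing libtheora interface. The only rule it encodes is the one implied by the post: a peak datarate below the average cannot be honored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderTargets:
    """Hypothetical user-facing knobs in the spirit of the post; not a libtheora API."""
    max_quality: int   # M: best quality the encoder may spend bits on (assumed 0-100 scale)
    peak_kbps: int     # P: ceiling on the instantaneous datarate
    average_kbps: int  # D: target average datarate

    def __post_init__(self):
        if not 0 <= self.max_quality <= 100:
            raise ValueError("max_quality must be between 0 and 100")
        if self.average_kbps <= 0:
            raise ValueError("average_kbps must be positive")
        if self.peak_kbps < self.average_kbps:
            raise ValueError("peak_kbps cannot be below average_kbps")


settings = EncoderTargets(max_quality=80, peak_kbps=2000, average_kbps=1000)
print(settings)
```

An encoder built around such settings would map M, P and D onto internal quantizer choices itself, instead of exposing a raw quantizer to the user.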
