Here's a shot at a list of fields:

// High level data
Displayed Width&Height
Stored Width&Height
Aspect Ratio (Fractional)
Frame Rate (Fractional)
FourCC (Optional, set to zero to use values below)
Colorspace (enum: R'G'B', Y'CbCr, JPEG (not sure of the proper name), etc.)

// Subsampling data
U Channel X Sample Rate (Fractional)
U Channel Y Sample Rate (Fractional)
U Channel X Sample Offset (Fractional)
U Channel Y Sample Offset (Fractional)
V Channel X Sample Rate (Fractional)
V Channel Y Sample Rate (Fractional)
V Channel X Sample Offset (Fractional)
V Channel Y Sample Offset (Fractional)

// Storage data
A Channel Bits Per Sample
A Channel Field 0 Offset (in bits)
A Channel Field 1 Offset (in bits)
A Channel X Stride (in bits)
A Channel Y Stride (in bits?)
Y/R Channel Bits Per Sample
Y/R Channel Field 0 Offset (in bits)
Y/R Channel Field 1 Offset (in bits)
Y/R Channel X Stride (in bits)
Y/R Channel Y Stride (in bits?)
U/G Channel Bits Per Sample
U/G Channel Field 0 Offset (in bits)
U/G Channel Field 1 Offset (in bits)
U/G Channel X Stride (in bits)
U/G Channel Y Stride (in bits?)
V/B Channel Bits Per Sample
V/B Channel Field 0 Offset (in bits)
V/B Channel Field 1 Offset (in bits)
V/B Channel X Stride (in bits)
V/B Channel Y Stride (in bits?)

Known limitations: this won't support formats with variable strides. I
haven't found any common formats that this excludes, but I haven't looked
very hard. It also won't support subsampled alpha, which is probably
undesirable anyway.

I'm still not convinced that RGB and YUV can't (shouldn't) be combined,
since RGB is so similar to a 4:4:4 YUV format.

I still include FourCC, because using it is a shorthand way of filling out
almost all of the fields below for the common raw formats. Subsampling
offsets are one example. However, any application that can only identify its
source data by FourCC probably doesn't know where the samples were taken, so
that would have to be invented in any case.

If you want to limit the allowable values for the FourCC field, I don't have
an issue with that, but I think it's useful for decoders to be able to tell
easily whether or not they support the format (since most decoders will
operate on the well defined formats), and useful for encoders, since most
data sources are described by a fourcc (the exception being applications
that actually generate images, rather than extract/transcode, I suppose).

Well formed streams should fully describe the FourCC in the descriptive
fields, but whether it's actually necessary to describe them or not is a
separate argument.
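To make the relationships between these fields a little more concrete, here
is a rough C sketch of how the list above might be grouped into a header
struct. It is only an illustration: the type and field names are invented,
fractional values are shown as numerator/denominator pairs, and no bit
widths or byte order are implied.

    /* Hypothetical grouping of the proposed fields; not a spec. */
    #include <stdint.h>

    typedef struct {
        uint32_t num;
        uint32_t den;
    } oggyuv_frac;               /* hypothetical fractional type */

    typedef struct {
        uint32_t bits_per_sample;
        uint32_t field0_offset;  /* in bits, from start of frame buffer     */
        uint32_t field1_offset;  /* in bits; unused for progressive frames  */
        int32_t  x_stride;       /* in bits, between samples on one row     */
        int32_t  y_stride;       /* in bits, between rows (may be negative) */
    } oggyuv_channel;            /* storage layout for one channel */

    typedef struct {
        /* High level data */
        uint32_t display_width, display_height;
        uint32_t stored_width,  stored_height;
        oggyuv_frac aspect_ratio;
        oggyuv_frac frame_rate;
        uint32_t fourcc;         /* optional; zero means "described below"  */
        uint32_t colorspace;     /* enum: R'G'B', Y'CbCr, ...               */

        /* Subsampling data (U and V channels) */
        oggyuv_frac u_x_rate, u_y_rate, u_x_offset, u_y_offset;
        oggyuv_frac v_x_rate, v_y_rate, v_x_offset, v_y_offset;

        /* Storage data: A, Y/R, U/G, V/B */
        oggyuv_channel a, y_or_r, u_or_g, v_or_b;
    } oggyuv_header;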
> If you want to limit the allowable values for the FourCC field, I don't
> have an issue with that, but I think it's useful for decoders to be able
> to tell easily whether or not they support the format (since most decoders
> will operate on the well defined formats), and useful for encoders, since
> most data sources are described by a fourcc (the exception being
> applications that actually generate images, rather than extract/transcode,
> I suppose).

I disagree with this. Most decoders using OggStream are unlikely to be using
FourCC, or at least the ones I care most about, and this places a complexity
burden on all implementations which use OggYUV such that they *MUST* have a
table of FourCC -> format mappings, whereas software which already supports
FourCC should already have a table of these mappings and be able to quickly
see if an OggYUV stream is directly mappable to a raw YUV FourCC codec.

Also, as you pointed out, many FourCC implementations are ambiguously defined
and are thus inadequate on their own.

No. Backwards compatibility with this obsolete codec-identification system
should be provided by software which actually uses the older system, not
forced on all implementations of the newer system. Shorthanding fields saves
only a few bytes in the stream header (we're not even talking about the data
packet header) and adds mandatory complexity.

I'll address the other elements of your draft line by line:

> Displayed Width&Height
> Stored Width&Height
> Aspect Ratio (Fractional)

Aspect ratio is what makes pixels potentially non-square, and since we're not
encoding in blocks as most compressed codecs do, what purpose would having
different displayed/stored width/height serve?

I implemented 24-bit fields for width/height/aspect_num/aspect_den just as
Theora does. Honestly, I don't foresee anyone doing greater-than-65536-wide
video in the next, oh, 50 years, being as even our current high definition
video is only getting up toward 4096 wide, and the bandwidths for such
ultra-super-high definition video would certainly surpass anything we'll have
in the near future. But heck, might as well use the same as Theora, right?
Who will cry over 2 wasted bytes in a raw video codec header? :-)

> Colorspace (enum: R'G'B', Y'CbCr, JPEG (not sure of the proper name), etc.)

This isn't what colorspace means, from what I've seen at least. Theora
implements ITU 601 and CIE 709 colorspaces, which apparently tell the decoder
or converter how to properly map YUV values to RGB. It's not YUV vs RGB, but
rather one of those fields unique to YUV video. Correct me if I'm wrong, or
if "Colorspace" is ambiguous.

I provided an 8-bit field for this, just as Theora does, though we'll likely
not use more than half of this space in the near future.

> // Subsampling data
> U Channel X Sample Rate (Fractional)
> U Channel Y Sample Rate (Fractional)
> U Channel X Sample Offset (Fractional)
> U Channel Y Sample Offset (Fractional)
> V Channel X Sample Rate (Fractional)
> V Channel Y Sample Rate (Fractional)
> V Channel X Sample Offset (Fractional)
> V Channel Y Sample Offset (Fractional)

I'm unsure what you're trying to do here. Implement 4:4:4 vs 4:2:2 vs 4:2:0?
What is the offset, and why is it fractional?

All the common formats (ignoring some of the older FourCCs which are rarely
used) implement a two-line system with no more than four pixels in each
"block". Thus, we can implement this very simply. Y-U-V is always provided
in that order (whether planar or packed), so what we must encode is on which
luma pixels chroma data is provided.
An 8-pixel block can have 1, 2, 4, or 8 chroma samples, so this should be our
first 2-bit field; then (only applicable if 2 or 4) we can stagger chroma in
both x & y, then we can split chroma in both x & y, resulting in the
following table of valid possibilities (* = doesn't matter):

00**00: UV -- -- -- -- -- -- --
00**10: U- -- V- -- -- -- -- --
00**01: U- -- -- -- V- -- -- --
00**11: U- -- -- -- -- -- V- --
010000: Impossible
011000: UV -- UV -- -- -- -- --
010100: UV -- -- -- UV -- -- --
011100: UV -- -- -- -- -- UV --
011010: U- V- U- V- -- -- -- --
011001: U- -- U- -- V- -- V- --
011011: U- -- U- -- -- V- -- V-
010110: U- -- V- -- U- -- V- --
010101: Impossible
010111: Impossible
011110: U- -- V- -- V- -- U- --
011101: Impossible
011111: Impossible
100000: UV -- UV -- UV -- UV --
101000: Impossible
100100: Impossible
101100: UV -- UV -- -- UV -- UV
100010: U- V- U- V- U- V- U- V-
100001: Impossible
100011: Impossible
10**11: U- V- U- V- V- U- V- U-
11****: UV UV UV UV UV UV UV UV

(All the "Impossible" entries are duplicates, since all these bits do is
shift, and shifting often gives the same result regardless of the
arrangement.)

I'm not proposing this mapping is complete, but there are fewer than 20
arrangements, and (if I didn't make mistakes) it's all done using simple bit
shifts which can be generically programmed. Being as there are fewer than 20,
it may make sense to simply define these sets on their own, giving the 6 bits
over to "data format" and making these part of the spec. As far as mapping to
other codecs, it really doesn't matter, since there'll probably be a table
which says "map to this arrangement". Mapping to a maximum of 32 would only
require 5 bits, if we wanted to condense this list to index numbers, gaining
another format bit for something else. In the current spec draft, the top bit
is used for the interlaced flag and the second for whether the data is packed
or not; this may not be sufficient.

> // Storage data
> A Channel Bits Per Sample
> A Channel Field 0 Offset (in bits)
> A Channel Field 1 Offset (in bits)
> A Channel X Stride (in bits)
> A Channel Y Stride (in bits?)
> Y/R Channel Bits Per Sample
> Y/R Channel Field 0 Offset (in bits)
> Y/R Channel Field 1 Offset (in bits)
> Y/R Channel X Stride (in bits)
> Y/R Channel Y Stride (in bits?)
> U/G Channel Bits Per Sample
> U/G Channel Field 0 Offset (in bits)
> U/G Channel Field 1 Offset (in bits)
> U/G Channel X Stride (in bits)
> U/G Channel Y Stride (in bits?)
> V/B Channel Bits Per Sample
> V/B Channel Field 0 Offset (in bits)
> V/B Channel Field 1 Offset (in bits)
> V/B Channel X Stride (in bits)
> V/B Channel Y Stride (in bits?)

I'm unsure what any of this is or why it's necessary. Please explain?

> I'm still not convinced that RGB and YUV can't (shouldn't) be combined,
> since RGB is so similar to a 4:4:4 YUV format.

Compare http://wiki.xiph.org/OggRGB to http://wiki.xiph.org/OggYUV -- yes,
they are similar, but YUV is much more complex, and I see no reason to join
them. Or, if you prefer, think of them as one codec with two identifiers
which change the fields around in the header/etc.

Load up the current draft at http://wiki.xiph.org/OggYUV

--
The recognition of individual possibility, to allow each to be what she and
he can be, rests inherently upon the availability of knowledge; the
perpetuation of ignorance is the beginning of slavery.

from "Die Gedanken Sind Frei": Free Software and the Struggle for Free
Thought, by Eben Moglen, General Counsel of the Free Software Foundation
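The arrangement table above is only sketched, but the narrower claim that the
common planar subsamplings "can be generically programmed with simple bit
shifts" is easy to illustrate. The following C fragment is a minimal sketch
of that idea: the enum, the shift table, and the helper are hypothetical
names, not anything from the draft.

    /* Chroma plane dimensions fall out of the luma dimensions with simple
     * shifts for the common planar subsamplings.  Illustration only. */
    #include <stdint.h>

    enum subsampling { SS_444, SS_422, SS_420, SS_411 };

    struct chroma_shift { uint8_t x_shift, y_shift; };

    static const struct chroma_shift kShift[] = {
        [SS_444] = {0, 0},   /* chroma at full resolution           */
        [SS_422] = {1, 0},   /* half horizontal, full vertical      */
        [SS_420] = {1, 1},   /* half horizontal, half vertical      */
        [SS_411] = {2, 0},   /* quarter horizontal, full vertical   */
    };

    static void chroma_dims(enum subsampling ss,
                            uint32_t luma_w, uint32_t luma_h,
                            uint32_t *chroma_w, uint32_t *chroma_h)
    {
        /* round up so an odd luma dimension still gets a full chroma sample */
        *chroma_w = (luma_w + (1u << kShift[ss].x_shift) - 1) >> kShift[ss].x_shift;
        *chroma_h = (luma_h + (1u << kShift[ss].y_shift) - 1) >> kShift[ss].y_shift;
    }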
Arc wrote:

> I disagree with this. Most decoders using OggStream are unlikely to be
> using FourCC, or at least the ones I care most about, and this places a
> complexity

I don't think this is true. Most data sources are going to have some fourcc
associated with them. If it's a piece of hardware, it's going to work on
Windows, and there will be a fourcc to describe its data format, assuming
it's a relatively open piece of hardware. And since the more open hardware is
the most likely to have Linux support, you're likely to be getting data
described by a common fourcc. On the other end, if you're working with a
player that supports many codecs (e.g. mplayer), it's going to understand
many of the standard fourccs already.

> burden on all implementations which use OggYUV such that they *MUST* have a
> table of FourCC -> format mappings, whereas software which already supports

Not true. I'm proposing tagging the data with the fourcc if you know what it
is. You can leave it blank and let the extra data fields describe it if you
don't know or don't want to use the fourcc. If you do know the fourcc, you
fill in the fourcc field AND the data layout fields. Then applications that
don't know anything about the fourcc can still work on the data, and
applications that do understand fourcc have a much easier time dealing with
it. Otherwise, an application has to inspect the 30 some-odd parameters I
identified earlier to see if it's a stream it already understands.

To use a car analogy, it's like trying to sell a car by listing 30 parameters
including cylinders, displacement, wheelbase, wheel diameter, number of
wheels, number of doors, height, headroom, legroom, etc., but not the model
name, when the guy buying it (e.g. a player application) only cares whether
it's a pickup truck or not. Yes, you can look at all the parameters and
figure out that it's a truck being described, but it's a lot of work when you
only want to park it in your garage (copy it to video memory, for instance).
By listing the model along with all the data, you support the manufacturer
(data importer) who knows the vehicle's a truck but lists its parameters
anyway, the mechanics (plugins) who don't care what model of vehicle it is
but need all its parameters, and the guy who parks it in his garage (the
player). You also support the hobby builder who doesn't have an official
model name: he can build a car that everyone can work on without actually
naming it.

> ... a table of these mappings and be able to quickly see if an OggYUV
> stream is directly mappable to a raw YUV FourCC codec.

Robust applications that support only the common formats will have to parse
untagged headers to determine whether the format is really supported or not,
but friendly applications that know they are outputting data in a standard
format should tag the data as being standard.

> Also, as you pointed out, many FourCC implementations are ambiguously
> defined and are thus inadequate on their own.

Yes, many of them are ambiguous and not well understood. However, the ones
that are widely used (YV12, I420, YUY2, UYVY, YVYU are the ones I use most)
ARE well understood, and images formatted in that way will be common payloads
somewhere in the OggStream chain between the original data source and the
video card.

>> Displayed Width&Height
>> Stored Width&Height
>> Aspect Ratio (Fractional)
>
> Aspect ratio is what makes pixels potentially non-square, and since we're
> not encoding in blocks as most compressed codecs do, what purpose would
> having different displayed/stored width/height serve?
Many of the YUV formats only work on image sizes that are a multiple of some
common number (2, 4). YUV 4:2:0 formats can only store images with an even
number of pixels in both directions. If you have an odd-sized image, you can
leave the border pixels undefined or extend them, but you need to specify
that only w-1 pixels contain valid data.

> This isn't what colorspace means, from what I've seen at least. Theora
> implements ITU 601 and CIE 709 colorspaces, which apparently tell the
> decoder or converter how to properly map YUV values to RGB. It's not YUV
> vs RGB, but rather one of those fields unique to YUV video.
>
> Correct me if I'm wrong, or if "Colorspace" is ambiguous.

I'm not a color expert. But as far as I can tell, color is described by a
triple. (RGB is linear, R'G'B' is nonlinear, ITU 601 and CIE 709 are others.)
The link Timothy sent yesterday is good; I'm trying to grok it now. In any
case, this field is an enumeration, and we just need to identify the proper
values. I think we're basically in agreement here.

>> // Subsampling data
>> U Channel X Sample Rate (Fractional)
>> U Channel Y Sample Rate (Fractional)
>> U Channel X Sample Offset (Fractional)
>> U Channel Y Sample Offset (Fractional)
>> V Channel X Sample Rate (Fractional)
>> V Channel Y Sample Rate (Fractional)
>> V Channel X Sample Offset (Fractional)
>> V Channel Y Sample Offset (Fractional)
>
> I'm unsure what you're trying to do here. Implement 4:4:4 vs 4:2:2 vs
> 4:2:0? What is the offset, and why is it fractional?

Yes. The sample rate tells you 4:4:4 vs 4:2:2 vs 4:2:0 vs 4:1:1 etc. These
don't have to be a lot of bits. The offset tells you where the sample was
taken, since some chroma samples are taken at the same place as the luma, and
others are taken half way in between, and various combinations thereof. I'd
guess 2 bits for each of these is probably sufficient (if you stick to a
four-pixel macropixel).

> All the common formats (ignoring some of the older FourCCs which are rarely
> used) implement a two-line system with no more than four pixels in each
> "block". Thus, we can implement this very simply. Y-U-V is always provided
> in that order (whether planar or packed), so what we must encode is on
> which luma pixels chroma data is provided.

YUV is absolutely NOT always in that order. YV12 (likely Theora's storage
format, unless it diverged from VP3) actually stores the V plane before the U
plane in memory. YVYU is a fairly common packed format that stores V first.

>> // Storage data
>> A Channel Bits Per Sample
>> A Channel Field 0 Offset (in bits)
>> A Channel Field 1 Offset (in bits)
>> A Channel X Stride (in bits)
>> A Channel Y Stride (in bits?)
>> Y/R Channel Bits Per Sample
>> Y/R Channel Field 0 Offset (in bits)
>> Y/R Channel Field 1 Offset (in bits)
>> Y/R Channel X Stride (in bits)
>> Y/R Channel Y Stride (in bits?)
>> U/G Channel Bits Per Sample
>> U/G Channel Field 0 Offset (in bits)
>> U/G Channel Field 1 Offset (in bits)
>> U/G Channel X Stride (in bits)
>> U/G Channel Y Stride (in bits?)
>> V/B Channel Bits Per Sample
>> V/B Channel Field 0 Offset (in bits)
>> V/B Channel Field 1 Offset (in bits)
>> V/B Channel X Stride (in bits)
>> V/B Channel Y Stride (in bits?)
>
> I'm unsure what any of this is or why it's necessary. Please explain?

I think this is what's necessary to fully describe an arbitrary four-channel
buffer of an optionally interlaced image (as long as it only has two fields).
You actually need data like this in the OggRGB format.
Right now, there isn't enough to tell the field order, so you'd have to
mandate something. There are different field orderings, e.g. BGRA, ABGR,
RGBA, ARGB, RGB, BGR, etc. To be pedantic, I think there are different RGB
colorspaces too, though I don't think they're generally used in computer
video.

You need a signed Y stride since there are RGB formats that aren't tightly
packed (e.g. rows aligned to a four-byte boundary). Also, some images are
stored top-down, others bottom-up. Having an offset and stride handles this
well, because it's how it's done in software (pointer + stride). You need
separate values for each channel, since each channel can be stored in any
order. The X stride is needed because in the packed formats, the stride
between luma samples can be different from the stride between chroma samples.

> Compare http://wiki.xiph.org/OggRGB to http://wiki.xiph.org/OggYUV - yes,
> they are similar, but YUV is much more complex, and I see no reason to join
> them. Or, if you prefer, think of them as one codec with two identifiers
> which change the fields around in the header/etc.

A fully defined RGB image is just as complex as a YUV one, except for the
subsampling.

> Load up the current draft at http://wiki.xiph.org/OggYUV

I'll take a look at it, but I'm not ready to talk bits when the fields are
still up in the air.
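The "pointer + stride" point above is the crux of the storage fields, so here
is a minimal C sketch of how a per-channel (offset, x-stride, y-stride)
triple addresses packed, planar, padded-row, interlaced, and bottom-up
layouts with the same formula. The struct and function names are invented,
and units here are bytes for simplicity even though the draft fields are
expressed in bits.

    /* Every sample is base + offset + y*y_stride + x*x_stride. */
    #include <stdint.h>
    #include <stddef.h>

    struct channel_layout {
        size_t    field0_offset; /* start of this channel within the buffer     */
        ptrdiff_t x_stride;      /* distance between horizontally adjacent samples */
        ptrdiff_t y_stride;      /* distance between rows; negative = bottom-up */
    };

    static const uint8_t *sample_ptr(const uint8_t *frame,
                                     const struct channel_layout *c,
                                     unsigned x, unsigned y)
    {
        return frame + c->field0_offset
                     + (ptrdiff_t)y * c->y_stride
                     + (ptrdiff_t)x * c->x_stride;
    }

    /* Example: a 640-wide packed YUY2 image.
     * Y channel: { .field0_offset = 0, .x_stride = 2, .y_stride = 640 * 2 }
     * U channel: { .field0_offset = 1, .x_stride = 4, .y_stride = 640 * 2 }
     * V channel: { .field0_offset = 3, .x_stride = 4, .y_stride = 640 * 2 } */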
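And for the fourcc-tagging argument earlier in this message, a tiny sketch of
the intended fast path, assuming a header whose (hypothetical) fourcc field
is zero when the layout is only described by the detailed fields:

    /* "Tag it if you know it": check the fourcc first, fall back to the
     * descriptive layout fields only for untagged streams.  Sketch only. */
    #include <stdint.h>
    #include <stdbool.h>

    #define FOURCC(a,b,c,d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
                             ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

    static bool can_blit_as_yv12(uint32_t fourcc_field)
    {
        if (fourcc_field == FOURCC('Y','V','1','2'))
            return true;      /* fast path: the source tagged the data */
        if (fourcc_field != 0)
            return false;     /* tagged as some other format */
        /* untagged stream: a robust player would now compare the ~30
         * descriptive layout fields against its own idea of YV12 */
        return false;
    }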