On Tue, Nov 08, 2005 at 06:08:25PM +0800, illiminable wrote:
> Why not just make it OggRawFOURCC, do we really need one stream format
> for rgb, and one for yuv?
[snip]
> I just meant oggRaw, not fourcc.

Oh, thank god you corrected this. :-) I was contemplating an "OggVid"
format, and here is why I'm steering against it (though, yes, this has
been a topic of discussion, and yes, it hasn't been decided on yet)..

> The fields are going to be the same for RGB, so why make it twice as
> much work to implement.

Because the fields are /not/ going to be the same for RGB.

RGB has resolution, framerate, perhaps even interlace and aspect
ratio... but chroma subsampling? No. And this is where much of the
complexity comes in.

If we were to combine them, we would essentially be doing something
like this:

  Value  Meaning
  0      RGB
  1      YUV444
  2      YUV422
  3      YUV420
  4      YUV411
  .....

And then we would spend an additional field on bits/channel, whereas
both chroma channels in YUV are going to have the same size. Oh,
please, let there not be an exception to this.

Now, keep in mind that the distinction between the two codecs is a
conceptual one. The difference between OggYUV and OggRGB may end up
being no more than changing the three-character identifier,
eliminating the chroma subsampling field, and adding an extra
bits-per-channel field and perhaps some other RGB-centric things.
There are additional YUV-centric things which are not on the wiki yet,
as well, which Theora has fields to implement and other video codecs
already use.

So, if you will, view \x00YUV vs \x00RGB as a flag of roughly the
same format; or at least, that's the strategy which I believe is
better at the moment. Is there anything in the software implementation
of these which would complicate this further?

-- 
The recognition of individual possibility, to allow each to be what
she and he can be, rests inherently upon the availability of
knowledge; the perpetuation of ignorance is the beginning of slavery.
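To make the enumeration above concrete, here is a small sketch (hypothetical values and helper names, not a proposed wire format) showing how each enumerated value would imply the chroma subsampling factors, and how a single bits-per-channel field then suffices because both chroma planes are always the same size:

```python
# A sketch of the combined enumeration discussed above (hypothetical
# values, not a proposed header layout): each entry maps the enumerated
# value to (horizontal, vertical) chroma subsampling factors.
SUBSAMPLING = {
    0: None,        # RGB: no chroma planes at all
    1: (1, 1),      # YUV 4:4:4
    2: (2, 1),      # YUV 4:2:2
    3: (2, 2),      # YUV 4:2:0
    4: (4, 1),      # YUV 4:1:1
}

def frame_bytes(fmt, width, height, bits_per_channel=8):
    """Bytes per planar frame for an enumerated format (sketch only)."""
    luma = width * height
    sub = SUBSAMPLING[fmt]
    if sub is None:                       # RGB: three full-size planes
        samples = 3 * luma
    else:                                 # Y plane + two equal chroma planes
        hs, vs = sub
        chroma = (width // hs) * (height // vs)
        samples = luma + 2 * chroma
    return samples * bits_per_channel // 8
```

For 640x480, value 3 (4:2:0) gives half the bytes of value 1 (4:4:4), since each chroma plane shrinks by four.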
from "Die Gedanken Sind Frei": Free Software and the Struggle for Free Thought by Eben Moglen, General Counsel of the Free Software Foundation
> But chroma subsampling? No. And this is where much of the complexity
> comes in.
>
> If we were to combine them, we would essentially be doing something
> like this:
>
>   Value  Meaning
>   0      RGB
>   1      YUV444
>   2      YUV422
>   3      YUV420
>   4      YUV411
>   .....

Yes.

> And then we would spend an additional field on bits/channel, whereas
> both chroma channels in YUV are going to have the same size. Oh,
> please, let there not be an exception to this.

Well, depending what you want to do... not only are there sampling
differences, there are also ordering differences, e.g. interleaved or
planar, and if interleaved, in which order. For example, among the
Windows fourccs there is YV12 (which is most similar to Theora's
output), and then there is IYUV, which is the same except that the U
and V planes are in the opposite order. Then there's YUY2, which is
interleaved Y0 U0 Y1 V0 Y2 U1 Y3 V1, and YVYU (Y0 V0 Y1 U0 Y2 V1 Y3
U1), and UYVY (U0 Y0 V0 Y1 U1 Y2 V1 Y3)... and then there's AYUV,
which has a fourth alpha channel.

Then there's the issue of where the samples lie on a grid in relation
to the pixel centres: do the samples centre over the pixels in the
horizontal or vertical direction, or do they fall at the midpoint
between two pixel centres?

And then there are the colour spaces (which I don't know all the
details of, but I'm sure derf or rillian can tell you all about it).

The way I see it, if you are suggesting a bits per sample/pixel field,
a planar-or-interleaved field, a field for the subsampling in the
horizontal and vertical directions, some way to denote the ordering of
interleaved channels, a field to specify whether there is an alpha
channel (and if there is, where and how it's represented), and perhaps
something to accurately specify the colour space, then basically what
you are doing is opening up millions of possibilities, most of which
are completely useless.

If you have a bits-per-channel field in RGB, what about RGB24: 3
channels, 8 bits each, but padded into 32 bits? RGB555: 15 bits,
padded to 16?
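The packed orderings listed above can be illustrated for a single macropixel (two horizontally adjacent pixels sharing one U and one V sample). This is just a sketch of how the same four samples get shuffled per fourcc, nothing more:

```python
# Byte orderings for the packed 4:2:2 fourccs mentioned above, for one
# macropixel: two Y samples (y0, y1) sharing one U and one V sample.
def pack_422(fourcc, y0, u, y1, v):
    orders = {
        "YUY2": (y0, u, y1, v),   # Y0 U0 Y1 V0
        "YVYU": (y0, v, y1, u),   # Y0 V0 Y1 U0
        "UYVY": (u, y0, v, y1),   # U0 Y0 V0 Y1
    }
    return bytes(orders[fourcc])
```

Note that all three carry identical information in identical space; only the byte positions differ, which is exactly why a header would need some way to denote the ordering.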
There are thousands of invalid possibilities, and only 15-20 or fewer
valid ones... really only 3-5 commonly used. If someone wants to go
crazy and design a franken-YUV format for some bizarre reason, then
they can easily make another stream format... but you can pretty much
count the ones people actually care about, and that are used in 90% of
cases, on one hand: YV12 (4:2:0), YUY2 (4:2:2), RGB24, ARGB and maybe
RGB555.

And if the format is enumerated, then all the other fields will be the
same for both RGB and YUV. It will be simple, and it doesn't open up
the possibility that people can specify bizarre combinations of things
that you have to check against and fail on, or do some unknown amount
of transformation to get to something you can display.

Also, on another issue, I already find the method of codec
identification pretty ad hoc... I think having ident fields that are
only 3 or 4 bytes is a very bad idea.

Zen.
Arc wrote:
>Because the fields are /not/ going to be the same for RGB.
>
>RGB has resolution, framerate, perhaps even interlace and aspect ratio...
>
>But chroma subsampling? No. And this is where much of the complexity comes in.

Not all YUV formats are subsampled either. And not all YUV formats are
planar. If you're going to distinguish between them, I think it has to
be along the lines of packed vs planar, not colorspace. An AYUV frame
(http://www.fourcc.org/yuv.php#AYUV), for instance, and an RGBA frame
are going to look identical in terms of header info and data packet
size; they just differ in colorspace.

It seems easier to me to have two 32-bit fields, one I'll call
"PixelFormat" and another I'll call "FormatExtraData" for this
discussion. The PixelFormat field uniquely identifies the data storage
method: YUV vs RGB, the chroma sampling, packed vs planar, etc. Then
the FormatExtraData field could be defined (or not) for each format.
This might be a good spot to stick the RGB bit packing, or an
endianness flag, or a vertical flip flag, etc.

Yes, I would propose using the proper fourcc to describe the pixel
format, since they're reasonably well documented at fourcc.org, but it
doesn't have to be that way. There are so many pixel formats that it's
hard to just enumerate all the data contained in a fourcc, let alone
put it in a simple header. I don't know of any applications that
operate on completely arbitrary image data, so putting some of the
smarts into the application, rather than trying to put it all in a
header, doesn't seem like a bad trade-off.

As an exercise, how would you create a header to distinguish between
UYVY data and YVYU? Both are 4:2:2 packed YUV formats, just differing
in component order. It's not an endianness issue either (U Y0 V Y1 vs
Y0 V Y1 U).
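The two-field header proposed above could be sketched like this (field names, sizes, and byte order are illustrative assumptions from this email, not a settled layout):

```python
import struct

# A sketch of the proposed two-field header: a 32-bit PixelFormat
# holding a fourcc string, plus a 32-bit FormatExtraData whose meaning
# is defined (or not) per format. Layout here is illustrative only.
def pack_header(fourcc, extra=0):
    assert len(fourcc) == 4, "fourcc must be exactly 4 characters"
    return fourcc.encode("ascii") + struct.pack("<I", extra)

def unpack_header(data):
    fourcc = data[:4].decode("ascii")
    (extra,) = struct.unpack("<I", data[4:8])
    return fourcc, extra
```

The UYVY-vs-YVYU exercise then answers itself: the fourcc in PixelFormat carries the component order, so the header needs no extra ordering field.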
On Tue, Nov 08, 2005 at 09:36:52PM +0800, illiminable wrote:
> Then there's YUY2, which is interleaved Y0 U0 Y1 V0 Y2 U1 Y3 V1, and
> YVYU (Y0 V0 Y1 U0 Y2 V1 Y3 U1), and UYVY (U0 Y0 V0 Y1 U1 Y2 V1
> Y3)... and then there's AYUV, which has a fourth alpha channel.

We will only be doing [A]YUV-ordered planar encoding: no other order,
and nothing packed using one of several methods. You're right, there
are simply too many different possibilities, and the software
implementation is too complex.

> Then there's the issue of where the samples lie on a grid in
> relation to the pixel centres: do the samples centre over the pixels
> in the horizontal or vertical direction, or do they fall at the
> midpoint between two pixel centres?

Yes, that needs to be noted, too. I've seen common implementations
which do both, and the subsampled chroma -> RGB mapping is very
different between the two methods.

> And then there are the colour spaces (which I don't know all the
> details of, but I'm sure derf or rillian can tell you all about it).

That's one of the fields we're currently lacking on the wiki, and one
which I don't understand either.

> If you have a bits-per-channel field in RGB, what about RGB24: 3
> channels, 8 bits each, but padded into 32 bits? RGB555: 15 bits,
> padded to 16?

Or RGB565, giving green an extra bit because the human eye can see
twice as many shades of green as of red or blue... but yes, RGB
doubles the issue, which is why we need a separate codec for it.

> There are thousands of invalid possibilities, and only 15-20 or
> fewer valid ones... really only 3-5 commonly used.
>
> If someone wants to go crazy and design a franken-YUV format for
> some bizarre reason, then they can easily make another stream
> format...
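For the record, the RGB565 packing mentioned above looks like this: 5 bits of red, 6 bits of green (the extra bit goes to green), and 5 bits of blue in one 16-bit word.

```python
# RGB565 packing: truncate each 8-bit channel and pack into 16 bits,
# with green keeping one more bit of precision than red or blue.
def pack_rgb565(r, g, b):
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)
```

So green survives with 64 levels where red and blue keep only 32, which is the kind of per-format quirk a naive bits-per-channel field can't express.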
> but you can pretty much count the ones people actually care about,
> and that are used in 90% of cases, on one hand: YV12 (4:2:0), YUY2
> (4:2:2), RGB24, ARGB and maybe RGB555.

4:4:4, 4:1:1, RGB32, 16 bits per channel, and many other common ones,
especially those used for professional video.

This is primarily an interchange format: something that the Theora
codec can output for the media player to receive, or that a webcam can
send to Theora to encode, or raw video to be stored in such that it
can be encoded to a new codec in testing while reliably keeping a/v
sync.

Media players don't have to support every format, nor does any video
codec. If a video codec (e.g. DV) can only output 4:1:1 and the media
player only takes 4:4:4, then an intermediary plugin will be needed to
do the conversion. Some media frameworks already have functions for
these, so they'll just take whatever format is being output and do the
conversion themselves before sending it to the media player.

So what I propose for OggYUV is to cover the capabilities of Ogg video
codecs: everything Theora is capable of, and perhaps a bit more that
we've seen from other codecs. 4:4:4, as I recall, is supported by the
Theora spec (even if the current implementation doesn't support it).

> Also, on another issue, I already find the method of codec
> identification pretty ad hoc... I think having ident fields that are
> only 3 or 4 bytes is a very bad idea.

Talk to Monty about this; it's part of the design for Ogg. It's what
we've done to date, and as long as you're working in a strategy where
codecs are asked if they support something, or provide some
information similar to MIME magic, it works fine.

If you're suggesting that OggPCM and OggYUV use "RawPCM" and "RawYUV",
or something similar, as identifiers, to allow future codecs to begin
with PCM* or YUV*, that makes some sense, but I currently feel that
the three letters are sufficient and allow third-party codecs to use a
prefix if PCM/YUV/RGB is in their name.
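The intermediary conversion described above can be as simple as this sketch: upsampling one 4:1:1 chroma row to 4:4:4 by replicating each chroma sample across the four pixels it covers (nearest-neighbour; a real converter might interpolate, and would also have to respect the chroma siting discussed earlier):

```python
# Simplest possible 4:1:1 -> 4:4:4 chroma upsampling for one row:
# each chroma sample covers four horizontally adjacent pixels, so
# replicate it four times (nearest-neighbour, no interpolation).
def upsample_411_row(chroma_row, width):
    return [chroma_row[x // 4] for x in range(width)]
```

A framework-supplied converter would run this (or something smarter) on both chroma planes before handing the frame to a 4:4:4-only consumer.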
It's become a pseudo-standard that the first byte of page 0 be a
header ID byte, and that the variable-length codec identification
magic follows. In the OggStream code that I'm working on, 8 bytes are
used to identify a codec from the plugin to the application, with the
first 7 of those usable, such that the null padding will
null-terminate the string. The web search API, used to find the name
and plugin for an unknown codec, sends the entire contents of packet 0
to the application via HTTP.
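The 8-byte identifier scheme described above could be sketched as follows (helper name is hypothetical; only the 7-usable-bytes-plus-null-padding rule comes from the email):

```python
# Sketch of the 8-byte codec identifier: up to 7 identifier bytes,
# null-padded to 8, so the padding also null-terminates the string
# for C consumers that treat it as a char[8].
def make_ident(name):
    data = name.encode("ascii")
    assert len(data) <= 7, "only the first 7 bytes are usable"
    return data.ljust(8, b"\x00")
```

Three-letter identifiers like "YUV" then always leave at least five trailing nulls, so the string is terminated no matter how it is read.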