We have new drafts of CMML 4.0 as a text codec and ROE as an XML stream abstract, subsuming the authoring support in CMML 3.1 and earlier.

Another thing we talked about at LCA is how to specify relationships between the various streams in Ogg so that a server, muxer or player can make intelligent decisions about the contained tracks. The general idea is to use the (HTTP-style) Message Headers in the Skeleton track to describe each logical bitstream, but no one has ever written anything down. This is a proposal to get the ball rolling.

Requirements:

 * Distinguish alternates based on language
 * Distinguish among subtitles for translation and for the hearing impaired
 * Distinguish commentary tracks
 * Distinguish overlays from alternates from primaries

= Self description

The following message headers describe the corresponding track; they are essentially metadata.

Lang: <locale>

Machine-parsable locale string describing the language the track is in. Used, for example, to choose the default audio track based on user preferences.

Role: <role-type>

Free-form qualifier used to mark the category of the track content. We will define a basic set of role-types with standard meanings for machine interpretation. Example role-types:

  commentary      (e.g. Director's commentary on a film)
  transcription   (e.g. detailed record)
  interpretation  (contains additional information)
  slides          (visual aids accompanying another track)

Of these, commentary is the only one I'd really like to have. Some other ideas: logo, ticker, credits, translation. The last is effectively the default, though. Logo and ticker might usefully carry a different default for whether they are overlaid or not.

Description: <string>

Human-readable description of the track, intended for display in a user interface. This can be localized by appending '.<locale>' to Description.

We could also copy general metadata here, e.g. title, creator, date, location, license.
That's perhaps more interesting in the fishead packet, which describes the stream as a whole rather than the individual tracks.

Program: <string>

Arbitrary tag for distinguishing a group of tracks from an unrelated group they happen to be multiplexed with. For example, three separate programs might be sent over the same link multiplexed together, but only audio and video tracks with the same value for the Program message header should be played together. The default 'empty' program is a valid program; every fishbone without this message header marks the corresponding track as being in the default program.

= Relations

The self-description allows us to prioritize tracks implicitly, based on user preferences for showing audio, video, text, or some combination, preferred languages and roles. But there remain areas of ambiguity, so we define a way to mark relationships with other streams. The value of each of these is an Ogg stream serial number.

Overlays: <serialno>

This track doesn't (necessarily) stand on its own but is meant to be laid on top of another track. This distinguishes, for example, an MNG video (no Overlays) from MNG subtitles (Overlays: the corresponding Theora video). Another example might be a vocal audio track that can be mixed with a music-only karaoke track.

Substitutes: <serialno>

Indicates that a track is an alternate or substitute for another.

Translates: <serialno>

Indicates that a track is an alternate language or media version of another track.

Parallels: <serialno>

Indicates that a track should be played together with another, instead of being treated as an alternate.

Of these, Overlays is the only one I'm really clear on the use case for. I think the others could be handled just as well by specifying heuristics: tracks of the same media type and role with different Lang values Translate each other; tracks with the same media type, program and role that don't overlay another are Parallels.
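The two heuristics in the last paragraph can be made concrete. The Python below is a minimal sketch, not part of the proposal: the `Track` fields are a hypothetical parsed view of one fishbone's Lang, Role, Program and Overlays headers, and it assumes (1) relations are only inferred between tracks in the same Program and (2) where both rules could match, a differing Lang means Translates, so only same-language tracks become Parallels.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Track:
    """Hypothetical parsed view of one fishbone's message headers."""
    serialno: int
    media_type: str                 # e.g. "audio", "video", "text"
    role: str = ""                  # Role header, "" if absent
    lang: str = ""                  # Lang header, "" if absent
    program: str = ""               # Program header, "" = default program
    overlays: Optional[int] = None  # serialno named by Overlays, if any


def infer_relations(tracks):
    """Apply the proposed heuristics; return (translates, parallels) pair sets."""
    translates, parallels = set(), set()
    for a, b in combinations(tracks, 2):
        # Assumption: tracks in different Programs are never related.
        if a.program != b.program:
            continue
        if a.media_type != b.media_type or a.role != b.role:
            continue
        pair = tuple(sorted((a.serialno, b.serialno)))
        if a.lang != b.lang:
            translates.add(pair)      # alternate-language versions
        elif a.overlays is None and b.overlays is None:
            parallels.add(pair)       # play together, not as alternates
    return translates, parallels


tracks = [
    Track(1, "video"),
    Track(2, "audio", lang="en"),
    Track(3, "audio", lang="fr"),
    Track(4, "text", role="subtitle", lang="en", overlays=1),
    Track(5, "text", role="subtitle", lang="de", overlays=1),
    Track(6, "audio", lang="en", program="news"),
]
translates, parallels = infer_relations(tracks)
print(sorted(translates))  # [(2, 3), (4, 5)]
print(sorted(parallels))   # []
```

As written, the rules would also mark two same-language commentary tracks as Parallels even if they are really alternates, which is where explicit relations or roles could still help.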
Question: Is it better to specify multiple relations with a list of serial numbers, or with multiple message headers?

Thoughts?
 -r
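The '.<locale>' suffix proposed for Description implies a lookup step in the player. A minimal Python sketch, using the a2 commentary headers that appear in this thread; the parser and the fallback order (exact locale, then bare language, then the plain Description) are assumptions, not something the proposal specifies.

```python
def parse_fishbone(text):
    """Minimal 'Name: value' parser keeping the first value of each header."""
    headers = {}
    for line in text.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers.setdefault(name.strip().lower(), value.strip())
    return headers


def describe(headers, locale):
    """Best Description for a UI locale such as 'fr-CA' or 'fr_CA'.

    Assumed fallback order: Description.<locale>, Description.<language>,
    then the unsuffixed Description.
    """
    tag = locale.replace("_", "-").lower()
    for candidate in (tag, tag.split("-")[0]):
        key = "description." + candidate
        if key in headers:
            return headers[key]
    return headers.get("description", "")


a2 = parse_fishbone("""
Content-ID: a2
Content-Type: audio/vorbis
Content-Language: en
Provides: Commentary
Description: Audio Commentary by the Director and Writers
Description.fr: Le commentaire du metteur en scène
""")
print(describe(a2, "fr_CA"))  # Le commentaire du metteur en scène
print(describe(a2, "de"))     # Audio Commentary by the Director and Writers
```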
On 16/02/2008, Ralph Giles <giles@xiph.org> wrote:
> The general idea is to use the (http-style) Message Headers
> in the Skeleton track to describe each logical bitstream, but no one
> has ever written anything down. This is a proposal to get the ball
> rolling.

awesome, thanks :-)

> Lang: <locale>

generally I think we should go with existing HTTP and email headers where possible, eg. Content-Language

> Description: <string>
>
> Human readable description of the track, intended for display in a
> user interface. This can be localized by appending '.<locale>' to
> Description.
>
> We could also copy general metadata here, e.g. title, creator, date,
> location, license. That's perhaps more interesting in the fishead
> packet which describes the stream as a whole rather than the
> individual tracks.

I kinda feel that this kind of human-readable metadata better belongs in CMML; the skeleton tells you where to go (for each locale), the CMML has language-specific metadata.

> = Relations
>
> The self-description allows us to prioritize tracks implicitly, based
> on user preferences for showing audio, video, text, or some
> combination, preferred languages and roles. But there remain areas of
> ambiguity, so we define a way to mark relationships with other
> streams.

> The value of each of these is an Ogg stream serial number.

perhaps an ID (labelled by Content-ID or similar) is more robust. By the ogg spec, serialnos need to be changed when chaining etc., which would further complicate the remuxing of content with skeleton.

> Substitutes: <serialno>
> Parallels: <serialno>

my proposal for these stems from Debian package relationships, where Provides, Depends, Recommends, Suggests, Conflicts are defined locally to each package. In aggregate (considering the whole universe of debian packages) these fields describe the graph of package relationships, but in isolation they are robust in that changes to other packages don't necessarily force a change to the relationship metadata.
In terms of skeleton, two tracks which provide the same thing (eg. subtitles, but in different languages) don't need to contain metadata referencing each other. Instead they simply "Provide: subtitles", and that metadata does not need to be changed if similar tracks are added or removed.

I intend to write this up (hopefully this week) as part of the ROE definition, ie. where the ROE-XML and these skeleton headers should be mutually invertible.

> Question: Is it better to specify multiple relations with a list of
> serial numbers, or with multiple message headers?

these are equivalent, as defined for email/HTTP message headers (multiple message headers can be intercalated with commas).

cheers,

Conrad.

-> boost your vocab, learn Haskell today!
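Both of Conrad's points, that repeated headers and a comma-separated list mean the same thing, and that Provides lets similar tracks be found without cross-references, fit in one small Python sketch. Assumptions: header names are compared case-insensitively as in HTTP, and only a hypothetical set of relation headers is split on commas, so a free-text Description containing commas stays intact. The Content-IDs are illustrative.

```python
from collections import defaultdict

# Hypothetical: which headers hold comma-separated lists is not settled here.
LIST_HEADERS = {"provides", "depends", "suggests", "conflicts", "overlays"}


def parse_headers(text):
    """Parse 'Name: value' lines into {lowercased name: [values]}."""
    headers = defaultdict(list)
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        key = name.strip().lower()
        # Repeated headers and comma-separated values end up in one list.
        items = value.split(",") if key in LIST_HEADERS else [value]
        headers[key].extend(v.strip() for v in items if v.strip())
    return dict(headers)


def tracks_providing(fishbones, capability):
    """Content-IDs of every track whose Provides list names `capability`."""
    wanted = capability.lower()
    return sorted(
        cid for cid, hdrs in fishbones.items()
        if wanted in (p.lower() for p in hdrs.get("provides", []))
    )


one = parse_headers("Provides: subtitles\nProvides: captions")
two = parse_headers("Provides: subtitles, captions")
print(one == two)  # True

fishbones = {
    "t1": parse_headers("Content-Language: en\nProvides: subtitles"),
    "t2": parse_headers("Content-Language: de\nProvides: subtitles"),
    "a1": parse_headers("Content-Language: en\nProvides: commentary"),
}
print(tracks_providing(fishbones, "subtitles"))  # ['t1', 't2']
```

Adding a third subtitle track only means adding its own fishbone; t1 and t2 stay untouched, which is the robustness Conrad describes.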
On 15-Feb-08, at 10:11 PM, Conrad Parker wrote:

>> Lang: <locale>
>
> generally I think we should go with existing HTTP and email headers
> where possible, eg. Content-Language

'k

>> Description: <string>
>
> I kinda feel that this kind of human-readable metadata better belongs
> in CMML; the skeleton tells you where to go (for each locale), the
> CMML has language-specific metadata.

This is fine for the stream as a whole, but I don't see how it helps with describing a particular audio track. The idea is that Role (or Provides) gives the client software a way to choose directly, and Description gives it a way to refine the options it offers the user.

  Content-ID: a2
  Content-Type: audio/vorbis
  Content-Language: en
  Provides: Commentary
  Description: Audio Commentary by the Director and Writers
  Description.fr: Le commentaire du metteur en scène

  Content-ID: a3
  Content-Type: audio/vorbis
  Content-Language: en
  Provides: Commentary
  Description: Audio Commentary by the Cast

>> = Relations
>>
>> The value of each of these is an Ogg stream serial number.
>
> perhaps an ID (labelled by Content-ID or similar) is more robust.

Right, you mentioned that at LCA, I just forgot.

>> Substitutes: <serialno>
>> Parallels: <serialno>
>
> my proposal for these stems from Debian package relationships, where
> Provides, Depends, Recommends, Suggests, Conflicts are defined locally
> to each package. In aggregate (considering the whole universe of
> debian packages) these fields describe the graph of package
> relationships, but in isolation they are robust in that changes to
> other packages don't necessarily force a change to the relationship
> metadata.

Yeah, that part's all good. I look forward to your writeup though, because I think the problem space is a little different. Provides works as a synonym for Role, but how do you do Overlays?
  Content-ID: overlay1
  Role: subtitle
  Overlays: video1

vs

  Content-ID: overlay1
  Provides: subtitles
  Depends: video1

I guess it works to make Depends a synonym for Overlays. I do like Suggests as a way to flag a default configuration.

> In terms of skeleton, two tracks which provide the same thing (eg.
> subtitles, but in different languages) don't need to contain metadata
> referencing each other. Instead they simply "Provide: subtitles", and
> that metadata does not need to be changed if similar tracks are added
> or removed.

What role does Conflicts have?

>> Question: Is it better to specify multiple relations with a list of
>> serial numbers, or with multiple message headers?
>
> these are equivalent, as defined for email/HTTP message headers
> (multiple message headers can be intercalated with commas).

Ok, thanks.

 -r
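If Depends is treated as a synonym for Overlays, a player that picks an overlay can pull in whatever it sits on top of, Debian-style. A minimal Python sketch, assuming fishbones are keyed by Content-ID (rather than serial number, per the earlier suggestion in the thread) and their headers are already parsed into lists; all IDs here are hypothetical.

```python
def required_tracks(selected, fishbones):
    """Expand a selection with everything it Depends on or Overlays.

    `fishbones` maps Content-ID -> {lowercased header: [values]}.
    Depends and Overlays are treated as synonyms. An unknown ID is
    still returned so the caller can notice the missing track.
    """
    needed, stack = set(), list(selected)
    while stack:
        cid = stack.pop()
        if cid in needed:
            continue  # already handled; also guards against cycles
        needed.add(cid)
        hdrs = fishbones.get(cid, {})
        stack.extend(hdrs.get("depends", []) + hdrs.get("overlays", []))
    return needed


fishbones = {
    "video1": {"content-type": ["video/theora"]},
    "overlay1": {"provides": ["subtitles"], "depends": ["video1"]},
    "overlay2": {"role": ["subtitle"], "overlays": ["video1"]},
    "karaoke": {"overlays": ["music"]},   # vocal track over a music-only track
    "music": {},
}
print(sorted(required_tracks(["overlay1"], fishbones)))  # ['overlay1', 'video1']
print(sorted(required_tracks(["karaoke"], fishbones)))   # ['karaoke', 'music']
```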
Might want to have a "priority" or "quality" field, in case a stream has to be pared down due to congestion (or just user preference). Something to say "this stream is important" and "this one is not really".
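A priority field could drive that paring-down directly. This Python sketch assumes a hypothetical per-track priority (higher means "this stream is important") and a known bitrate per track; neither is defined anywhere in the proposal, and the numbers in the example are made up.

```python
def pare_down(tracks, budget_kbps):
    """Drop the least important tracks until the total bitrate fits.

    `tracks` maps a track ID to (priority, bitrate_kbps); both fields are
    hypothetical. Returns the sorted IDs of the tracks that are kept.
    """
    kept = dict(tracks)
    total = sum(rate for _, rate in kept.values())
    # Visit tracks from least to most important, dropping until we fit.
    for tid, (_, rate) in sorted(tracks.items(), key=lambda kv: kv[1][0]):
        if total <= budget_kbps:
            break
        del kept[tid]
        total -= rate
    return sorted(kept)


streams = {"video": (10, 500), "audio": (9, 96), "commentary": (2, 64), "logo": (1, 20)}
print(pare_down(streams, 600))   # ['audio', 'video']
print(pare_down(streams, 1000))  # ['audio', 'commentary', 'logo', 'video']
```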
Something which you might also want to consider, unless it is deemed to be outside the scope of skeleton, is a way to embed data that can be referred to by other logical streams, as a means to avoid duplication. A logical stream is self contained as far as the data it handles goes, but several multiplexed streams might need the same, or similar, data. For instance, something I wanted to do was having several multiplexed Kate streams being able to use a shared font. This is currently not possible without duplicating the font data in each separate logical Kate stream, leading to waste of bandwidth. Since such "global" data needs to be in the headers (well, it doesn't *need* to be, but it's cleaner that way), it also means a large burst of data when one starts a stream. Also consider that text translations might mean dozens of multiplexed logical streams, something which isn't likely to occur with, say, Vorbis and Theora, where you usually get a few streams together. Duplication in this case becomes more of an issue. Back when I still thought CMML was about meta description of accompanying streams, I'd actually thought of CMML carrying such shared data to be delivered at the time it was needed, but that was quite "wouldn't it be nice" territory. If such shared data is kept in headers, Skeleton becomes a better fit for this payload.
Hi,

A couple of points:

1) Font data, as in the actual font itself, doesn't really belong in an Ogg stream. Since it truly is "global" data, in the sense that your fonts are shared by more than just a single Ogg file, the font should be kept separate from the stream and referenced using an appropriate font naming scheme. If you meant font references in the first place, well, those are small and won't gobble much bandwidth at all.

2) We have been working on a specification and mechanism for indicating to clients that there are multiple tracks of the same "kind" (e.g. translation), and for allowing clients to request individual tracks out of sets of like tracks. In fact, with HTTP headers like Content-Language we can also allow the server to default to a particular translation in the absence of guidance from the client. At the moment I think a preliminary name for this specification is ROE; Silvia is in the process of nailing the spec down, so you should ask her any questions you have about it :) Obviously this doesn't "solve" the duplication issue (if there is one), but it does prevent duplicated data from eating bandwidth.

3) Text is cheap! Really cheap :) Seriously: compare the amount of space in your file taken up by text to that taken up even by audio, let alone video.

Cheers,

-Shane
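Point 2 leans on ordinary HTTP content negotiation. As a rough Python sketch, not ROE's actual mechanism: the server maps each translation track's Content-ID to its Content-Language, uses the client's Accept-Language preferences when present, and otherwise falls back to a server-chosen default, assumed here to be "en". Language ranges are matched by simple prefix, so "pt" matches "pt-BR"; track IDs are hypothetical.

```python
def parse_accept_language(value):
    """Language ranges from an Accept-Language value, best first."""
    ranked = []
    for index, part in enumerate(value.split(",")):
        lang, _, params = part.strip().partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        if lang.strip() and q > 0:
            # Sort by descending q, keeping the client's order on ties.
            ranked.append((-q, index, lang.strip().lower()))
    return [lang for _, _, lang in sorted(ranked)]


def pick_translation(tracks, accept_language=None, default="en"):
    """Choose a Content-ID from {content_id: content_language}."""
    prefs = parse_accept_language(accept_language) if accept_language else []
    for wanted in prefs + [default.lower()]:
        for cid, lang in tracks.items():
            lang = lang.lower()
            if wanted == "*" or lang == wanted or lang.startswith(wanted + "-"):
                return cid
    return next(iter(tracks), None)  # nothing matched: first track, if any


tracks = {"t-en": "en", "t-de": "de", "t-pt": "pt-BR"}
print(pick_translation(tracks, "pt, de;q=0.8"))  # t-pt
print(pick_translation(tracks, "fr"))            # t-en
print(pick_translation(tracks))                  # t-en
```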
Ralph,

I was thinking about this proposal quite a bit today.

I agree that, with ROE as a description format on the server for the tracks available to a media resource (say http://example.org/video.ogx with potential tracks {A,B,C,D}), we now have a means of communicating between server and client about what data the server should actually put into the stream.

I am not sure I entirely agree with the method of communication between client and server. At one point at FOMS/LCA we discussed using URLs as a communication mechanism:

http://example.org/video.ogx?tracks=A,B,D&t=15

would return the resource video.ogx with only tracks A, B and D, and only from time offset 15 sec onwards. Such a scenario is possible as soon as the client has received the ROE file that describes video.ogx, so it can determine itself which tracks to include in the request.

Your example is rather different: the client asks the server for a particular language and the server itself selects the tracks that go with it. I believe this sort of selection should be restricted to Content-Language: <locale> and, more generically, to the values of the "distinction" attribute of the switch statement in ROE (see http://wiki.xiph.org/index.php/ROE). In contrast, the "provides" attributes should be specified in the URL with the "tracks" query parameter described above, e.g.

http://example.org/video.ogx?tracks=video:v1, audio:a1b2, text_overlay:t1, logo: 1

This is explicit, while the HTTP message header fields do it implicitly. I believe we need both methods, and it might be time to start adding them to the wiki at http://wiki.xiph.org/index.php/ROE.

Cheers,
Silvia.
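The explicit URL form is straightforward to handle on the server with the standard library's `urllib.parse`. In this minimal Python sketch the `tracks` and `t` parameter names come from the message; the kind:id split for the ROE-style form, treating a missing `tracks` as "all tracks", and reading `t` as plain seconds are assumptions.

```python
from urllib.parse import parse_qs, urlsplit


def parse_track_request(url):
    """Split e.g. video.ogx?tracks=A,B,D&t=15 into (path, tracks, start).

    tracks is None when the URL does not restrict the track set (assumed
    to mean "all tracks"); start is in seconds and defaults to 0. Only
    plain seconds are handled for t=.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    tracks = None
    if "tracks" in query:
        tracks = [t.strip() for t in query["tracks"][0].split(",") if t.strip()]
    start = float(query["t"][0]) if "t" in query else 0.0
    return parts.path, tracks, start


def split_kind(track):
    """'video:v1' -> ('video', 'v1'); a bare 'A' -> (None, 'A')."""
    kind, sep, ident = track.partition(":")
    return (kind, ident) if sep else (None, track)


print(parse_track_request("http://example.org/video.ogx?tracks=A,B,D&t=15"))
# ('/video.ogx', ['A', 'B', 'D'], 15.0)
_, tracks, _ = parse_track_request(
    "http://example.org/video.ogx?tracks=video:v1,audio:a1b2,text_overlay:t1")
print([split_kind(t) for t in tracks])
# [('video', 'v1'), ('audio', 'a1b2'), ('text_overlay', 't1')]
```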