(cc-ed to theora-dev because that will reach a larger audience - please reply to only one mailing list.)

Hi all,

In discussions with the video accessibility subgroup of the W3C HTML working group, we are currently looking at how to deal with multitrack video, e.g. video that has a main video and audio track, plus a sign language video track, an audio description audio track, a caption track and several subtitle tracks in different languages (i.e. several Theora, several Vorbis and several Kate tracks).

We are in the process of developing a JavaScript API for extracting information from such files so that they can be controlled from JavaScript, e.g. turn tracks on/off, find out what is there and what is relevant, etc. In this JavaScript API, it is necessary to address the tracks.

Now, I don't think there is an explicit track order in an Ogg file - tracks are just regarded as parallel entities and differentiated by serial number, right?

I found out that the MPEG/QuickTime container format has an explicit "track ID" that numbers its tracks.

The track order also determines the display order. "Lower numbered layers are shown on top of higher-numbered layers. This layering order is important mainly when multiple video tracks are combined though graphics modes." (see http://developer.apple.com/quicktime/qttutorial/movies.html) There is also an explicit rendering mode, e.g. to alphablend, dither, blend or copy a layer on top of another one.

Further, MPEG/QuickTime has an explicit "alternate group ID". This ID defines tracks that belong to the same group and can thus only be displayed as alternatives to each other, e.g. subtitle tracks, audio dubs, or video tracks with differing video quality.

I now wonder whether we need to introduce an explicit "track ID" into Ogg, which will define a fixed order independent of the parsing system and will allow us in JavaScript to directly address tracks.

For example, the current draft looks something like this:

  interface MediaTrack {
    readonly attribute DOMString title;
    readonly attribute DOMString type;
    readonly attribute DOMString role;
    readonly attribute DOMString lang;
    attribute boolean enabled;
    ...
  };
  interface MediaTracks {
    readonly attribute unsigned long length;
    getter MediaTrack item(in unsigned long index);
    ...
  };
  interface HTMLMediaElement : HTMLElement {
    ...
    readonly attribute MediaTracks tracks;
    ...
  };

Which could be used for something like this:

  if (video.tracks[1].role == "caption") video.tracks[1].enabled = true;
  if (video.tracks[1].lang == "fr") video.tracks[1].enabled = true;

As you can see, there is a track index that runs through all the tracks. If we want to keep the tracks in the same order in every system, we have to come up with an ordering scheme.

Right now, I can see two different systems: the order in which their BOS pages are given in the Ogg header part - or the order in which the serial numbers go, when ordered. Alternatively, we can introduce an explicit track ID and order by that number.

I'm curious what others think.

Cheers,
Silvia.
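To make the two candidate schemes concrete, here is a minimal sketch on mock data. The stream records, the `bosIndex` field and the serial numbers are all made up for illustration and are not part of any Ogg library or of the draft API; the point is only that the two schemes can number the same tracks differently.

```javascript
// Hypothetical description of the logical streams found while parsing an Ogg
// file. `bosIndex` is the position of the stream's BOS page in the header
// section; `serial` is the stream's (typically random) serial number.
const streams = [
  { serial: 1804289383, bosIndex: 0, type: "video", role: "main" },
  { serial: 846930886, bosIndex: 1, type: "audio", role: "main" },
  { serial: 1681692777, bosIndex: 2, type: "text", role: "caption" },
];

// Scheme 1: number tracks in the order their BOS pages appear.
function orderByBos(list) {
  return [...list].sort((a, b) => a.bosIndex - b.bosIndex);
}

// Scheme 2: number tracks by ascending serial number.
function orderBySerial(list) {
  return [...list].sort((a, b) => a.serial - b.serial);
}

const byBos = orderByBos(streams).map((s) => s.role + "/" + s.type);
const bySerial = orderBySerial(streams).map((s) => s.role + "/" + s.type);

console.log(byBos);    // main/video, main/audio, caption/text
console.log(bySerial); // main/audio, caption/text, main/video
```

With these made-up serial numbers, `tracks[0]` would be the video under the first scheme and the audio under the second, which is why one scheme has to be prescribed.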
ogg.k.ogg.k at googlemail.com
2010-Feb-02 11:26 UTC
[theora-dev] [ogg-dev] handling multitrack Ogg
> Right now, I can see two different systems: the order in which their
> BOS pages are given in the Ogg header part - or the order in which the
> serial numbers go, when ordered. Alternatively, we can introduce an
> explicit track ID and order by that number.

If one reencodes an Ogg stream (e.g. using a video editor), the serial numbers and BOS ordering might change.

There are no tools that I know of that allow ordering BOS pages in a specific order, so that would have to be created, and all writing programs would have to be made to preserve ordering.

Ordering by serial numbers, even assuming they do not change, will give you an arbitrary ordering (as serial numbers are most often random to avoid collisions when merging with another file), which conflicts with an author-specified ordering.

Such an ordering is a property of the whole stream, rather than of its individual constituent logical streams. Indeed, it may change if you remove/add streams, while the constituent streams will stay unchanged.

Thus, I think such an ordering may be best done as a specific property placed in the fishead for this particular stream, e.g. "Track-ID: 5".
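As a rough sketch of what consuming such a property could look like, the following assumes that each logical stream carries "Name: value" header lines and that one of them is the proposed "Track-ID" field. The field is only a proposal at this point in the thread, and the header text, serial numbers and function names below are hypothetical.

```javascript
// Parse "Name: value" header lines into a plain object with lower-case keys.
function parseHeaders(text) {
  const headers = {};
  for (const line of text.split(/\r?\n/)) {
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // skip blank or malformed lines
    const name = line.slice(0, colon).trim().toLowerCase();
    headers[name] = line.slice(colon + 1).trim();
  }
  return headers;
}

// Hypothetical per-stream header text, keyed by serial number (as a string).
const headerText = {
  "1804289383": "Content-Type: video/theora\r\nTrack-ID: 1\r\n",
  "846930886": "Content-Type: audio/vorbis\r\nTrack-ID: 2\r\n",
  "1681692777": "Content-Type: application/x-kate\r\nTrack-ID: 5\r\n",
};

// Order streams by the author-specified Track-ID rather than by serial number.
// Streams without a usable Track-ID are left out of the ordering.
function orderByTrackId(textBySerial) {
  return Object.entries(textBySerial)
    .map(([serial, text]) => ({
      serial,
      trackId: Number.parseInt(parseHeaders(text)["track-id"], 10),
    }))
    .filter((t) => Number.isInteger(t.trackId))
    .sort((a, b) => a.trackId - b.trackId);
}

const ordered = orderByTrackId(headerText);
console.log(ordered.map((t) => t.trackId + " -> " + t.serial));
```

Because the ordering lives in the headers, remuxing with new serial numbers would not change it, as long as the writing tool copies the field.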
> In discussions with the video accessibility subgroup of the W3C HTML
> working group, we are currently looking at how to deal with multitrack
> video, e.g. such video that has a main video and audio track, plus
> e.g. a sign language video track, an audio description audio track, a
> caption track and several subtitle tracks in different languages.
> (i.e. several theora, several vorbis and several kate tracks)

Hooray! This was always an intended use of the system, even if the metadata representation of the track structure was never set. I didn't want to get too far into it without practical use cases that people could show they actually needed.

> Right now, I can see two different systems: the order in which their
> BOS pages are given in the Ogg header part - or the order in which the
> serial numbers go, when ordered. Alternatively, we can introduce an
> explicit track ID and order by that number.

Right, add an explicit mapping of the indirection.

Well, any of the above requires an addition to best practice. Oggk's right that no tool would automatically respect any of the above right now, so we get to decide, but then we also have to gently coerce, regardless of the decision. Thank god there are no good tools for editing as yet!

I don't object to any of those proposals. They seem pretty interchangeable, so long as the choice is clearly documented and orderings, when they matter, are explicitly defined in the metadata.

Monty
Silvia Pfeiffer wrote:
> I found out that the MPEG/QuickTime container format has an explicit
> "track ID" that numbers its tracks.
>
> The track order also determines the display order.
...
> Further, MPEG/QuickTime has an explicit "alternate group ID". This ID
> defines tracks that belong to the same group and can thus only be
> displayed alternately of each other

I think these are two good examples of what we _don't_ want. The player should be able to decide whether to overlay a sign language track on top of the video, or to display it in a separate rectangle. Similarly, the player should be able to decide to display multiple subtitle languages at once, or play multiple audio tracks at once. This is the nature of HTML: content is separate from presentation.

> I now wonder whether we need to introduce an explicit "track ID" into
> Ogg, which will define a fixed order independent of the parsing system
> and will allow us in JavaScript to directly address tracks.

The Ogg stream serial number is already such an explicit track ID. Your proposed API should work fine; you just have to treat the tracks attribute as an associative array instead of an index array. (JavaScript associative arrays only support string keys, so you'll have to convert the track ID to a string, but that's fine.)

If MPEG streams use their index to specify overlay ordering then they can expose a "readonly attribute unsigned long altitude" (or whatever you want to call it).

As for specifying the relationship between streams, Skeleton fisbone packets are the logical way to do this ... but of course you know that better than anyone.

--Ben
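The associative-array idea can be sketched as follows. This is only an illustration in plain JavaScript: the track records, their `type`, `role` and `lang` values and the `buildTracks` helper are hypothetical, and a real `video.tracks` object would be provided by the browser rather than built by hand.

```javascript
// Mock track records as a parser might produce them (all values hypothetical).
const parsed = [
  { serial: 1804289383, type: "video", role: "main", lang: "en" },
  { serial: 846930886, type: "audio", role: "main", lang: "en" },
  { serial: 1681692777, type: "text", role: "caption", lang: "en" },
  { serial: 1714636915, type: "text", role: "subtitle", lang: "fr" },
];

// Build a "tracks" object keyed by the serial number converted to a string.
function buildTracks(records) {
  const tracks = {};
  for (const r of records) {
    tracks[String(r.serial)] = {
      type: r.type,
      role: r.role,
      lang: r.lang,
      enabled: false,
    };
  }
  return tracks;
}

const tracks = buildTracks(parsed);

// Enable every caption track and every French track, whatever their keys are.
for (const id in tracks) {
  if (tracks[id].role === "caption") tracks[id].enabled = true;
  if (tracks[id].lang === "fr") tracks[id].enabled = true;
}

const enabledIds = Object.keys(tracks).filter((id) => tracks[id].enabled);
console.log(enabledIds);
```

One side effect worth knowing: current JavaScript engines enumerate integer-like keys in ascending numeric order, so a loop like this visits the tracks in serial-number order rather than in file order.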
It sounds like you're on the right track. There are 2 places I would start looking - DVDs and QuickTime.
QuickTime is mature and the container is the basis for mp4 and 3gp.

Can you find a way to transcode to Ogg maintaining the track information?

As someone who makes videos, it's beyond me to translate audio to other languages, but I would be interested in doing commentary tracks. Right now I do a second version of the video for that.
A nice feature in iMovie is that it does 'ducking', so the commentary pushes the audio levels of other tracks down as needed.

Re:
  if (video.tracks[1].lang == "fr") video.tracks[1].enabled = true;
It would help for W3C to standardize language codes. QuickTime uses 3 characters, Ogg uses 2 characters?
There are also variations of the same language. Can a script find a closest match?

In addition to scripting, a system-level setting could be used to select preferred tracks.

Whatever you do should equally apply to subtitles.

On Tue, Feb 2, 2010 at 3:01 AM, Silvia Pfeiffer <silviapfeiffer1 at gmail.com> wrote:
> [...]
On Tue, 2010-02-02 at 22:01 +1100, Silvia Pfeiffer wrote:
> ...
> Right now, I can see two different systems: the order in which their
> BOS pages are given in the Ogg header part - or the order in which the
> serial numbers go, when ordered. Alternatively, we can introduce an
> explicit track ID and order by that number.
>
> I'm curious what others think.

I have had to solve a similar problem with ALingA[1], but not in the specific context of annotated video. Whether or not the serial number is the same as the track ID, the two have to be in 1-1 correspondence. So it would be simpler to use the serial number as the track ID. I have not found this identity to be an obstacle in implementations.

HTH,
Elaine

[1] http://www.ihear.com/FreeCLAS/wiki/ALingA
On Wed, Feb 3, 2010 at 5:07 AM, Frank Barchard <fbarchard at google.com> wrote:
> It sounds like you're on the right track. There are 2 places I would start
> looking - DVD's, and Quicktime.
> Quicktime is mature and the container is the basis for mp4 and 3gp.

That's what the post referred to - how it was done in QuickTime.

> Can you find a way to transcode to Ogg maintaining the track information?

Part of the result of putting such information into Skeleton will be that such transcoding will be enabled, yes.

> As someone who makes videos, its beyond me to translate audio to other
> languages, but I would be interested in doing commentary tracks. Right now
> I do a second version of the video for that.
> A nice feature in iMovie is that it does 'ducking', so the commentary pushes
> the audio levels of other tracks down as needed.

Yup, that's what editors are supposed to do. Hopefully some of the open source editors will eventually evolve to contain such functionality.

> Re
> if (video.tracks[1].lang == "fr") video.tracks[1].enabled = true;
> It would help for W3C to standardize language codes. Quicktime uses 3
> characters, Ogg uses 2 characters?
> There are also variations of the same language. Can a script find a closest
> match?

There are standards for language codes. They are not all 2 characters long. http://www.w3.org/TR/REC-html40/struct/dirlang.html explains what is in use in W3C.

> In addition to scripting, a system level setting could be used to select
> preferred tracks.

Indeed, at least a browser preference setting would be nice. This is a call that the browser vendors have to make, not us.

> What ever you do, should equally apply to subtitles.

Of course. This was just an example. We are also talking about textual audio descriptions, btw.

Cheers,
Silvia.
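On the "closest match" question, a script could do something along these lines. This is a simplified sketch, not a full implementation of any language-matching standard: the three-letter table is deliberately tiny and incomplete, and `closestTrack` and `normalise` are hypothetical helper names.

```javascript
// Small, deliberately incomplete table mapping three-letter ISO 639-2 codes to
// two-letter ISO 639-1 codes, for illustration only.
const THREE_TO_TWO = { eng: "en", fra: "fr", fre: "fr", deu: "de", ger: "de" };

// Lower-case a tag and normalise a three-letter primary subtag if it is known.
function normalise(tag) {
  const parts = tag.toLowerCase().split("-");
  parts[0] = THREE_TO_TWO[parts[0]] ?? parts[0];
  return parts.join("-");
}

// Return the index of the best-matching track for `wanted`, or -1 if none.
// Tries the full tag first, then drops trailing subtags ("fr-ca" -> "fr").
function closestTrack(tracks, wanted) {
  const langs = tracks.map((t) => normalise(t.lang));
  let range = normalise(wanted);
  while (range) {
    // An exact match on the current range wins.
    let i = langs.indexOf(range);
    if (i !== -1) return i;
    // Otherwise accept a more specific track, e.g. "fr-fr" for "fr".
    i = langs.findIndex((l) => l.startsWith(range + "-"));
    if (i !== -1) return i;
    const cut = range.lastIndexOf("-");
    range = cut === -1 ? "" : range.slice(0, cut);
  }
  return -1;
}

const tracks = [{ lang: "en" }, { lang: "fr-FR" }, { lang: "de" }];
console.log(closestTrack(tracks, "fr-CA")); // 1
console.log(closestTrack(tracks, "fre"));   // 1
console.log(closestTrack(tracks, "ja"));    // -1
```

A request for Canadian French falls back to the only French track available, and a three-letter code resolves to the same track as its two-letter equivalent.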
On Wed, Feb 3, 2010 at 9:08 AM, Benjamin M. Schwartz <bmschwar at fas.harvard.edu> wrote:
> Silvia Pfeiffer wrote:
>> QuickTime doesn't force a player to use this information. But it needs
>> to be present, otherwise the player has no idea which information goes
>> on top of which other information, e.g. the caption track goes on top
>> of the video. Of course, the player is free to not do any overlaying
>> and draw separate display areas etc.
>
> The player should know that the caption track goes on top of the video
> because the caption track is labeled "caption" and the video is labeled
> "main", or whatever the agreed-upon labels are for track types.

No, those labels don't exist - at least not in Ogg and not right now. This is another thing that we will need to introduce (see Conrad's email). It will likely be a "role:caption" or "role:subtitle" or "role:sign" label, but this is actually different from the display order. For some things like captions, the display order is obvious. But for others, e.g. slides or sign language, it's not. The sign language video and the slides video could both be sub-videos of a talk recording. OTOH, the sign language video could be the main video and the others could be sub-videos. In such cases a recommended display order needs to be given.

>> The issue is that this kind of information is not currently available
>> for Ogg tracks, so we need to add it to skeleton.
>
> I agree. I'm just suggesting that the skeleton content should be
> semantic, as much as possible, rather than prescriptive.

OK, fair enough.

>> Nobody wants to refer to such long numbers. They are also really hard
>> to "loop through". But I'll see what we come up with.
>
> They're not numbers; they're strings. They're opaque identifiers.
> Programmers use opaque identifiers all the time. As for looping through
> them, you can do
>
> for (var t in video.tracks) {
>     if (video.tracks[t].role == "caption") video.tracks[t].enabled = true;
>     if (video.tracks[t].lang == "fr")      video.tracks[t].enabled = true;
> }
>
> To me, that seems even easier than the equivalent int-array for loop.

Hmm, interesting...

S.
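One way to reconcile semantic labels with a recommended display order is sketched below, purely as an illustration. The role labels, the `ROLE_LAYER` table and the `recommendedOrder` hint are all hypothetical; nothing like them exists in Ogg or Skeleton at this point, and a player would remain free to ignore the result and use separate display areas.

```javascript
// Default stacking derived from semantic roles alone (hypothetical labels).
// Higher numbers are drawn on top.
const ROLE_LAYER = { main: 0, sign: 1, subtitle: 2, caption: 2 };

// `recommendedOrder` is a hypothetical author-supplied hint. It is only used
// to break ties between tracks that end up on the same layer; roles without
// an entry in ROLE_LAYER default to layer 1.
function stackingOrder(tracks) {
  return [...tracks].sort((a, b) => {
    const la = ROLE_LAYER[a.role] ?? 1;
    const lb = ROLE_LAYER[b.role] ?? 1;
    if (la !== lb) return la - lb;
    return (a.recommendedOrder ?? 0) - (b.recommendedOrder ?? 0);
  });
}

// A talk recording with a speaker video, slides, a signer and captions.
const talk = [
  { id: "captions", role: "caption" },
  { id: "slides", role: "slides", recommendedOrder: 1 },
  { id: "signer", role: "sign", recommendedOrder: 2 },
  { id: "speaker", role: "main" },
];

console.log(stackingOrder(talk).map((t) => t.id));
// speaker, slides, signer, captions (bottom to top)
```

The roles settle the obvious cases (captions above the main video), while the hint is needed only where roles alone are ambiguous, such as slides versus a signer.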
On Tue, Feb 2, 2010 at 10:26 PM, ogg.k.ogg.k at googlemail.com <ogg.k.ogg.k at googlemail.com> wrote:
>> Right now, I can see two different systems: the order in which their
>> BOS pages are given in the Ogg header part - or the order in which the
>> serial numbers go, when ordered. Alternatively, we can introduce an
>> explicit track ID and order by that number.
>
> If one reencodes an Ogg stream (eg, using a video editor, etc), the
> serial numbers and BOS ordering might change.
>
> There are no tools that I know of that allow ordering BOS pages in a
> specific order, so that'd have to be created, and all writing programs
> would have to be made to preserve ordering.
>
> Ordering by serial numbers, even assuming they do not change, will
> give you an arbitrary ordering (as serial numbers are most often
> random to avoid collisions when merging with another file), which
> conflicts with an author specified ordering.
>
> Such an ordering is a property of the whole stream, rather than its
> individual constituent logical streams. Indeed, it may change if you
> remove/add streams, while the constituent streams will stay unchanged.
>
> Thus, I think such an ordering may be best done as a specific property
> placed in the fishead for this particular stream, eg "Track-ID: 5".

That was my first reaction, too. Have it there as an explicit field.

But it means that whenever we edit a video - cut out a track / add a track - this field has to be rewritten.

Thus, it might be better if we used a more inherent ordering scheme that doesn't require anything new in the stream.

We would, e.g., just need to prescribe that the order in which the BOS pages for the different tracks appear in the stream is also the numbering order of the tracks. This won't break existing editing approaches and doesn't require making changes to fields at the front of the file when just oggz-merging or oggz-ripping. The BOS order right now is determined by the tools, and the tools that I used kept the given order and added any new tracks after the given ones. But this would indeed be a behaviour that would need to be prescribed.

On the other hand, if there is a URL that points to e.g. the second track in a video, and the video is changed on the server so that the second track is replaced, then the URL points to the wrong track, even though the original track may still be available in the video file. This wouldn't happen with an explicit reference.

We could just claim that the latter is unfortunate, but a problem of the author / editor, who destroyed the links to the resource. Further, if somebody wanted to really link to an explicit track, they could use the serial number (however ugly it is). This will only break when the video is edited and the serial number rewritten, rather than when tracks are just added/removed. I think this may be acceptable.

Cheers,
Silvia.
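The trade-off between positional and explicit references can be shown with a small sketch. The track lists, serial numbers and the two resolver functions are made up for illustration; they do not correspond to any existing URL scheme or API.

```javascript
// Tracks in BOS order before an edit (serial numbers are hypothetical).
const before = [
  { serial: "1804289383", role: "main" },
  { serial: "1681692777", role: "caption" },
  { serial: "1714636915", role: "subtitle" },
];

// After the edit, a new sign language track sits in second position and the
// caption track, with its serial number unchanged, has moved further back.
const after = [
  { serial: "1804289383", role: "main" },
  { serial: "424238335", role: "sign" },
  { serial: "1681692777", role: "caption" },
  { serial: "1714636915", role: "subtitle" },
];

// Positional reference: "the track at this zero-based position in BOS order".
function resolveByIndex(tracks, index) {
  return tracks[index] ?? null;
}

// Explicit reference: "the track with this serial number".
function resolveBySerial(tracks, serial) {
  return tracks.find((t) => t.serial === serial) ?? null;
}

console.log(resolveByIndex(before, 1).role);             // caption
console.log(resolveByIndex(after, 1).role);              // sign
console.log(resolveBySerial(after, "1681692777").role);  // caption
```

The positional link silently changes meaning after the edit, while the serial-number link keeps pointing at the caption track for as long as the editing tool preserves serial numbers.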
ogg.k.ogg.k at googlemail.com
2010-Feb-02 22:41 UTC
[theora-dev] [ogg-dev] handling multitrack Ogg
> But it means that whenever we edit a video - cut out a track / add a
> track - this field has to be rewritten.

The skeleton track has to be recreated anyway every time the file is saved again. Especially now that it may have an index.

> Thus, it might be better if we used a more inherent order scheme that
> doesn't require anything new in the stream.

There is no order now. You can define an order for arbitrary serial numbers, but if what you want is to define a particular order (e.g. this track gets overlaid on top of that track), then you have another piece of information to store anyway. So you will require something new in the stream.

> We would, e.g. just need to prescribe that the order in which the bos
> pages for the different tracks appear in the stream is also the
> numbering order of the tracks. This won't break existing editing
> approaches and doesn't require making changes to fields at the front
> of the file when just oggz-merging or oggz-ripping. The bos order
> right now is determined by the tools and the tools that I used kept
> the given order and added any new tracks after the given ones. But
> this would indeed be a behaviour that would need to be prescribed.

This seems brittle. It relies on an implicit ordering that up to now had no significance (apart from the 'Theora BOS first' rule to avoid confusing audio players).

> On the other hand, if there is a URL that points to e.g. the second
> track in a video and the video is being changed on the server where
> the second track is being replaced, then it points to the wrong track,
> even though that track may still be available in the video file. This
> wouldn't happen with an explicit reference.
>
> We could just claim that the latter is unfortunate, but a problem of
> the author / editor, who destroyed the links to the resource. Further,

This would be akin to having chosen an anchor system of <a href="#104"> instead of <a href="#name">, with 104 being a byte offset in the HTML page. If the author changes the page, it invalidates the anchors pointing to it. Yes, it would have been the author's fault, but the author may not have known that others had linked to this particular page.