Silvia Pfeiffer
2010-Mar-23 05:43 UTC
[theora-dev] Extension to Skeleton for multi-track media
Hi all,

Discussions about a need for an extension to Skeleton to cater for multi-track media files have been going on for a while. In a recent thread here, in discussions on IRC, and at FOMS between Jan, Ralph, Viktor and I, we discussed some fields. Viktor and I continued that discussion to make more specific recommendations on what fields to add.

We now have a wiki page at http://wiki.xiph.org/SkeletonHeaders that has aggregated all the different things that were discussed.

"Language", "Role" and "Name" are fields that we want to introduce to better expose "semantic" information about the tracks. These will help a media player make better choices about which tracks to display and which to offer to the user as a selection. They will also be necessary for providing the (currently proposed) HTML5 JavaScript API (http://www.w3.org/WAI/PF/HTML/wiki/Media_MultitrackAPI) with the right information. Note that right now there is no proposal for explicitly specifying track dependencies, since the need for them is not really clear.

A further part of the wiki page is the proposal to impose an implicit order on the tracks through the order in which their BOS pages are given. This is nothing semantic, but only a convenience so we can ascertain that different Web browsers will address the same track by the same index number through JavaScript.

Finally there are two rendering related fields that we propose introducing: Display-hint and Altitude (their names could of course still be changed). Each of these is specified as a feature of the given track, but relates to the other tracks for rendering. Altitude specifies the display ordering (like z-index in HTML/CSS). Display-hint currently has proposals for:

- picture-in-picture display relative to the video's display area,
- transparency to be applied to all pixels of the track,
- one color to be chosen as completely transparent (as in green-screening), and
- an image mask to be applied.

The image in the image mask would need to be either referenced or encoded into another skeleton package - this isn't quite solved yet.

We'd like to introduce these fields into a new version of Skeleton, such that future formats will be able to include these fields. All of these fields are optional, so there are no requirements really, just additional functionality that a media player can make use of.

This email is to ask for input on the different proposals and for suggestions of further improvements.

Cheers,
Silvia.
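To illustrate how a player might use the proposed fields, here is a minimal Python sketch. It assumes the fields have already been parsed into one record per track, listed in the order of the tracks' BOS pages. The `Track` class, the helper names, and all sample values (including the role `audio/main`) are hypothetical and not taken from Skeleton or the wiki page. Altitude is treated as an integer where higher values are drawn on top.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Track:
    """Hypothetical in-memory view of one track's proposed Skeleton fields."""
    name: str
    role: str
    language: Optional[str] = None
    altitude: int = 0


# Tracks listed in the order their BOS pages appear; the list index is
# the implicit track index that every player would agree on.
tracks = [
    Track(name="main_video", role="video/main", altitude=0),
    Track(name="sign_en", role="video/sign", language="en", altitude=1),
    Track(name="audio_en", role="audio/main", language="en"),
    Track(name="audio_de", role="audio/main", language="de"),
]


def pick_track(tracks, role, language):
    """Return (index, track) for the first track matching role and language, or None."""
    for index, track in enumerate(tracks):
        if track.role == role and track.language == language:
            return index, track
    return None


def paint_order(tracks):
    """Return the video tracks sorted bottom-to-top by Altitude (like CSS z-index)."""
    video = [t for t in tracks if t.role.startswith("video/")]
    return sorted(video, key=lambda t: t.altitude)


print(pick_track(tracks, "audio/main", "de"))
print([t.name for t in paint_order(tracks)])
```

Because the index comes from list position alone, it only stays stable if every implementation enumerates the BOS pages in the same way, which is the point of fixing the order in the specification.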
Benjamin M. Schwartz
2010-Mar-23 14:19 UTC
[theora-dev] Extension to Skeleton for multi-track media
Silvia Pfeiffer wrote:
> "Language", "Role" and "Name" are fields that we want to introduce to
> better expose "semantic" information about the tracks.

These three are great. Comments:

1. It is common for movies to list a series of languages, and it's not always the case that one is dominant. To accommodate this, we should permit specifying the Language field multiple times, as allowed in RFC 2822. The JavaScript API should return an array of language codes. Conventionally, the first language code should be the dominant one if present. A track with no language code should return an empty array.

2. Some of the roles are unclear. It would be good to add clarifying descriptions of their meaning and intended use. For example, I don't know the motivation or use for: text/activeregion, text/annotation, text/transcript, text/linguistic, text/chapters, audio/music, audio/speech, audio/sfx. Also, video/alpha needs to specify how a multichannel track (like Theora) can be rendered down to a single alpha channel, for example by using the unmodified bytes of Y as alpha.

3. It seems that the name is meant to be only a semi-human-readable tag, not a fully user-facing title. Perhaps a localized Title field would be a good addition at some point.

> A further part of the wiki page is the proposal to impose an implicit
> order on the tracks through the order in which their BOS pages are
> given. This is nothing semantic, but only a convenience so we can
> ascertain that different Web browsers will address the same track by
> the same index number through JavaScript.

I reiterate my preference for associative arrays, indexed by the Ogg track ID and name. The BOS ordering is unstable, and provides no benefit that I can see over unique stream identifiers.

> Finally there are two rendering related fields that we propose
> introducing: Display-hint and Altitude (their names could of course
> still be changed).

Altitude seems fine.
I have more problems with Display-hint:

1. pip: Specifying that a track can be shown as PIP might be a good thing. This mechanism seems very rigid, though. Television sets that provide PIP usually let the user control the positioning, because they may want to see different parts of the underlying frame. I'm not convinced that specifying a position or size along with the PIP hint is necessary at all. If it is, the text should say "may be displayed" instead of "should be displayed" to indicate that the player should give the user control. Content producers who want exact control of overlay positioning should use Altitude and video/alpha. Where are the zero coordinates of the display area? If w and h are percentages, what are they percentages of?

2. mask: Ogg files are self-contained. This proposal breaks that in a huge way, and I think it's terrible. The right way to do this is in CSS in the webpage, a la
   http://labs.silverorange.com/files/video-demo/ambient.xhtml
   http://webkit.org/blog/181/css-masks/
   Please remove mask from the draft.

3. transparentcolor: This will not work. Lossy video codecs do not reproduce exact colors. I am not aware of any continuous-tone image or video coding system that employs this approach, because it doesn't work. Please remove it from the draft. People who want transparency will have to use the video/alpha system.

Further improvements:

As currently stated, the video/alpha label cannot actually be used to blend multiple tracks together. For example, if I want an exactly controlled optional overlay, I would create 3 Theora tracks labeled as video/main, video/alpha, and video/alternate (or maybe video/additional), all the same size. The altitude of the additional track would be higher than the main, to indicate that it goes on top. There are now at least three possibilities:

1. The alpha track applies to the additional track.
2. The alpha track applies to the main track (before compositing).
3. The alpha track applies to the whole video (after compositing).

At present, there is no way to distinguish these cases, and the situation is even more underspecified in the case of multiple additional tracks.

To remedy this, I recommend an additional header field "Applies-to: [name]". This indicates the name of the track to which a track applies. For example, a text track may apply to the audio track of which it is a transcription, and the video onto which it should be overlaid. A video/sign track Applies-to the audio track of which it is a translation. A video/alpha track Applies-to each track it is supposed to mask (before compositing).

For video/alpha, this is still insufficient, because masking a video and an overlay before compositing them is not the same as masking after compositing. To permit masking after compositing, video/alpha tracks should optionally have one or more Altitudes. For each Altitude held by a video/alpha track, it applies to the composited result of all visible higher tracks.

--Ben
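The three possibilities listed above really do give different pictures, which a one-pixel worked example can show. The Python sketch below is only an illustration under stated assumptions: pixels are single gray values between 0.0 and 1.0, blending uses a plain "over" formula, any track without a mask is fully opaque, and the page behind the video is white. None of these names or values come from the draft.

```python
def over(top, top_alpha, bottom):
    """Blend one gray pixel of `top` over `bottom` with opacity `top_alpha`."""
    return top * top_alpha + bottom * (1.0 - top_alpha)


# Hypothetical single-pixel values in the range 0.0 to 1.0.
PAGE = 1.0         # whatever lies behind the video (assumed white)
MAIN = 0.25        # pixel from the video/main track
ADDITIONAL = 0.75  # pixel from the higher-altitude additional track
ALPHA = 0.5        # pixel from the video/alpha track


def alpha_on_additional():
    """Case 1: the alpha track masks only the additional track."""
    video = over(ADDITIONAL, ALPHA, MAIN)
    return over(video, 1.0, PAGE)


def alpha_on_main():
    """Case 2: the alpha track masks the main track before compositing."""
    masked_main = over(MAIN, ALPHA, PAGE)
    return over(ADDITIONAL, 1.0, masked_main)


def alpha_on_composite():
    """Case 3: the alpha track masks the whole video after compositing."""
    video = over(ADDITIONAL, 1.0, MAIN)
    return over(video, ALPHA, PAGE)


# Prints: 0.5 0.75 0.875
print(alpha_on_additional(), alpha_on_main(), alpha_on_composite())
```

The same four input values produce three different output pixels, so a player cannot render a video/alpha track correctly unless the file says which case is intended.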
Chris Pearce
2010-Apr-26 22:35 UTC
[theora-dev] Extension to Skeleton for multi-track media
On 23/03/2010 6:43 p.m., Silvia Pfeiffer wrote:
> Discussions about a need for an extension to Skeleton to cater for
> multi-track media files have been going on for a while. In a recent
> thread here, in discussions on IRC, and at FOMS between Jan, Ralph,
> Viktor and I, we discussed some fields. Viktor and I continued that
> discussion to make more specific recommendations on what fields to
> add.
>
> We now have a wiki page at http://wiki.xiph.org/SkeletonHeaders that
> has aggregated all the different things that were discussed.
>

Other than the message header fields at the wiki page above, is there a list somewhere of the fields you'd like to see added to the skeleton track? Do you want any of the above fields to be compulsory in the next version of Skeleton?

It looks like all of those headers would be specified in the fisbone packet, which I don't need to change for indexing. They also look like they're variable length anyway, so including them as message fields makes perfect sense.

I'm trying to nail down a spec for the indexing-related stuff for a skeleton 4.0 track, and it would be great if we could produce a unified spec with all the changes we both want for a skeleton 4.0 track. :)

All the best,
Chris P.
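Since the proposed headers are variable-length message fields, reading them amounts to splitting a text block into name/value pairs. The Python sketch below assumes the fields arrive as CRLF-terminated "Name: value" lines in UTF-8; that layout, the function name, and the sample values are assumptions for illustration, not a statement of the Skeleton 4.0 format. Values are collected into lists so that a field given more than once, as suggested earlier in this thread for Language, keeps every value in order.

```python
from collections import defaultdict


def parse_message_fields(raw: bytes) -> dict:
    """Parse 'Name: value' lines into a dict mapping each name to a list of values."""
    fields = defaultdict(list)
    for line in raw.decode("utf-8").split("\r\n"):
        if not line:
            continue  # skip the empty string left after the final CRLF
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        fields[name.strip()].append(value.strip())
    return dict(fields)


# Hypothetical field block; the values are made up for illustration.
sample = (
    b"Content-Type: video/theora\r\n"
    b"Role: video/main\r\n"
    b"Name: main_video\r\n"
    b"Language: en\r\n"
    b"Language: fr\r\n"
)

fields = parse_message_fields(sample)
print(fields["Language"])           # every Language value, in file order
print(fields.get("Altitude", []))   # optional field that is absent here
```

Because every field is optional, callers look names up with a default instead of assuming a field is present. This sketch also treats field names as case-sensitive, which a real parser would probably want to relax.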