thr3ads.net - ogg dev - [ogg-dev] Skeletal relations [Feb 2008]

Ralph Giles

2008-Feb-15 15:56 UTC

[ogg-dev] Skeletal relations

We have new drafts of CMML 4.0 as a text codec and ROE as an xml  
stream abstract, subsuming the authoring support in CMML 3.1 and  
earlier.

Another thing we talked about at LCA is how to specify
relationships between the various streams in Ogg so that a server,  
muxer or player can make intelligent decisions about the contained  
tracks. The general idea is to use the (http-style) Message Headers  
in the Skeleton track to describe each logical bitstream, but no one  
has ever written anything down. This is a proposal to get the ball  
rolling.

Requirements:

* Distinguish alternates based on language
* Distinguish among subtitles for translation,
   for hearing impaired
* Distinguish commentary tracks
* Distinguish overlays from alternates from primaries

= Self description =

The following message headers describe the corresponding track; they
are essentially metadata.

Lang: <locale>

Machine parsable locale string describing the language the track is  
in. Used for example to choose the default audio track based on user  
preferences.

Role: <role-type>

Free form qualifier used to mark the category of the track content.  
We will define a basic set of role-types with standard meaning for  
machine interpretation. Example role-types:

   commentary (e.g. Director's commentary on a film)
   transcription (e.g. detailed record)
   interpretation (contains additional information)
   slides (visual aids accompanying another track)

Of these, commentary is the only one I'd really like to have. Some  
other ideas: logo, ticker, credits, translation. The last is  
effectively the default though. Logo or ticker might be useful to  
have a different default for whether they are overlaid or not.

Description: <string>

Human readable description of the track, intended for display in a  
user interface. This can be localized by appending '.<locale>' to
Description.

We could also copy general metadata here, e.g. title, creator, date,  
location, license. That's perhaps more interesting in the fishead  
packet which describes the stream as a whole rather than the  
individual tracks.

Program: <string>

Arbitrary tag for distinguishing a group of tracks from an unrelated  
group they happen to be multiplexed with. For example, three separate  
programs might be sent over the same link multiplexed together, but  
only audio and video tracks with the same value for the Program  
message header should be played together. The default 'empty' program
is a valid program: every fishbone without this message header marks
the corresponding track as being in the default program.
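
As a minimal sketch of how a player might apply the Program header, the
Python below groups tracks by program and treats a missing header as the
default 'empty' program. The track dictionaries, the "id" key and the
function name are assumptions made for illustration; none of them come
from the Skeleton specification.

```python
from collections import defaultdict

def group_by_program(tracks):
    """Group track header dicts by their Program value.

    A track without a Program header falls into the default program,
    represented here by the empty string (an assumption of this sketch).
    """
    programs = defaultdict(list)
    for track in tracks:
        programs[track.get("Program", "")].append(track)
    return dict(programs)

tracks = [
    {"id": "v1", "Program": "news"},
    {"id": "a1", "Program": "news"},
    {"id": "v2", "Program": "sport"},
    {"id": "a2"},  # no Program header: default program
]
groups = group_by_program(tracks)
```

Only tracks found in the same group would then be candidates for being
played together.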


= Relations =

The self-description allows us to prioritize tracks implicitly, based  
on user preferences for showing audio, video, text, or some  
combination, preferred languages and roles. But there remain areas of  
ambiguity, so we define a way to mark relationships with other  
streams. The value of each of these is an Ogg stream serial number.

Overlays: <serialno>

This track doesn't (necessarily) stand on its own but is meant to be
laid on top of another track. This distinguishes, for example, an MNG
video (no Overlays) from MNG subtitles (Overlays: the corresponding
Theora video). Another example might be a vocal audio track that can
be mixed with a music-only karaoke track.

Substitutes: <serialno>

Indicates that a track is an alternate or substitute for another.

Translates: <serialno>

Indicates that a track is an alternate language or media version of  
another track.

Parallels: <serialno>

Indicates that a track should be played together with another,  
instead of being treated as alternates.

Of these, Overlays is the only one I'm really clear on the use case  
for. I think the others could be handled just as well by specifying  
heuristics: tracks of the same media type and role with different  
lang Translate each other. Tracks with the same media type, program,
and role that don't overlay another track are Parallels.
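
The two heuristics above could be sketched roughly as follows. The
dictionary keys ("type", "role", "lang", "program", "overlays") and the
function name are hypothetical, and checking the language rule before the
Parallels rule is an assumption of this sketch rather than anything the
proposal specifies.

```python
def infer_relation(a, b):
    """Guess the relation between two tracks from their self-description.

    Returns "translates", "parallels", or None when neither heuristic
    applies.
    """
    same_kind = a["type"] == b["type"] and a.get("role") == b.get("role")
    if not same_kind:
        return None
    # Same media type and role, different language: translations.
    if a.get("lang") != b.get("lang"):
        return "translates"
    # Same media type, program and role, neither overlays another track.
    same_program = a.get("program", "") == b.get("program", "")
    if same_program and not a.get("overlays") and not b.get("overlays"):
        return "parallels"
    return None

en_audio = {"type": "audio", "role": "commentary", "lang": "en"}
fr_audio = {"type": "audio", "role": "commentary", "lang": "fr"}
cam1 = {"type": "video", "lang": "en", "program": "p1"}
cam2 = {"type": "video", "lang": "en", "program": "p1"}
```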

Question: Is it better to specify multiple relations with a list of  
serial numbers, or with multiple message headers?

Thoughts?
  -r

Conrad Parker

2008-Feb-15 22:11 UTC


On 16/02/2008, Ralph Giles <giles@xiph.org> wrote:
>  The general idea is to use the (http-style) Message Headers
>  in the Skeleton track to describe each logical bitstream, but no one
>  has ever written anything down. This is a proposal to get the ball
>  rolling.
awesome, thanks :-)
>  Lang: <locale>
generally I think we should go with existing HTTP and email headers
where possible, eg. Content-Language
>  Description: <string>
>
>  Human readable description of the track, intended for display in a
>  user interface. This can be localized by appending '.<locale>' to
>  Description.
>
>  We could also copy general metadata here, e.g. title, creator, date,
>  location, license. That's perhaps more interesting in the fishead
>  packet which describes the stream as a whole rather than the
>  individual tracks.
I kinda feel that this kind of human-readable metadata better belongs
in CMML; the skeleton tells you where to go (for each locale), the
CMML has language-specific metadata.
>  = Relations =
>
>  The self-description allows us to prioritize tracks implicitly, based
>  on user preferences for showing audio, video, text, or some
>  combination, preferred languages and roles. But there remain areas of
>  ambiguity, so we define a way to mark relationships with other
>  streams.
> The value of each of these is an Ogg stream serial number.
perhaps an ID (labelled by Content-ID or similar) is more robust. By
the ogg spec, serialnos need to be changed when chaining etc., which
would further complicate the remuxing of content with skeleton.
>  Substitutes: <serialno>
>  Parallels: <serialno>
my proposal for these stems from Debian package relationships, where
Provides, Depends, Recommends, Suggests, Conflicts are defined locally
to each package. In aggregate (considering the whole universe of
debian packages) these fields describe the graph of package
relationships, but in isolation they are robust in that changes to
other packages don't necessarily force a change to the relationship
metadata.

In terms of skeleton, two tracks which provide the same thing (eg.
subtitles, but in different languages) don't need to contain metadata
referencing each other. Instead they simply "Provide: subtitles", and
that metadata does not need to be changed if similar tracks are added
or removed.

I intend to write this up (hopefully this week) as part of the ROE
definition, ie. where the ROE-XML and these skeleton headers should be
mutually invertible.
>  Question: Is it better to specify multiple relations with a list of
>  serial numbers, or with multiple message headers?
these are equivalent, as defined for email/HTTP message headers
(multiple message headers with the same name can be combined into one
comma-separated value).
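
A minimal sketch of that equivalence, assuming a simple "Name: value"
line format: the hypothetical parser below merges repeated headers and
comma-separated values into one list, so both spellings give the same
result. Splitting on commas is only safe for headers defined as lists; a
free-text header such as Description would need to be exempted.

```python
def parse_headers(text):
    """Parse 'Name: value' lines into a dict mapping names to value lists.

    Repeated headers and comma-separated values are merged into one list,
    so both spellings produce the same result.
    """
    headers = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        values = [v.strip() for v in value.split(",") if v.strip()]
        headers.setdefault(name.strip(), []).extend(values)
    return headers

as_list = parse_headers("Overlays: video1, video2")
as_repeats = parse_headers("Overlays: video1\nOverlays: video2")
```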

cheers,

Conrad.
 -> boost your vocab, learn Haskell today!

Ralph Giles

2008-Feb-15 23:35 UTC


On 15-Feb-08, at 10:11 PM, Conrad Parker wrote:
>>  Lang: <locale>
>
> generally I think we should go with existing HTTP and email headers
> where possible, eg. Content-Language
'k
>>  Description: <string>
>
> I kinda feel that this kind of human-readable metadata better belongs
> in CMML; the skeleton tells you where to go (for each locale), the
> CMML has language-specific metadata.
This is fine for the stream as a whole, but I don't see how it helps  
with describing a particular audio track. The idea is that Role (or  
Provides) gives the client software a way to choose directly, and  
Description gives it a way to refine the options it offers the user.

Content-ID: a2
Content-Type: audio/vorbis
Content-Language: en
Provides: Commentary
Description: Audio Commentary by the Director and Writers
Description.fr: Le commentaire du metteur en scène

Content-ID: a3
Content-Type: audio/vorbis
Content-Language: en
Provides: Commentary
Description: Audio Commentary by the Cast
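
To illustrate how a client might use these headers, here is a rough
sketch that parses one such block and picks a localized Description,
falling back to the unqualified one. The function names and the fallback
rule are assumptions of this sketch, not part of the proposal.

```python
def parse_block(text):
    """Parse one block of 'Name: value' lines into a dict."""
    headers = {}
    for line in text.strip().splitlines():
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers

def describe(headers, locale):
    """Return the Description for a locale, falling back to the plain one."""
    return headers.get("Description." + locale, headers.get("Description"))

a2 = parse_block("""
Content-ID: a2
Content-Type: audio/vorbis
Content-Language: en
Provides: Commentary
Description: Audio Commentary by the Director and Writers
Description.fr: Le commentaire du metteur en scène
""")
```

A user interface could then call describe(a2, "fr") for a French-speaking
user and still get the default text for any locale that has no
translation.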
>>  = Relations =
>>
>> The value of each of these is an Ogg stream serial number.
>
> perhaps an ID (labelled by Content-ID or similar) is more robust.
Right, you mentioned that at LCA, I just forgot.
>>  Substitutes: <serialno>
>>  Parallels: <serialno>
>
> my proposal for these stems from Debian package relationships, where
> Provides, Depends, Recommends, Suggests, Conflicts are defined locally
> to each package. In aggregate (considering the whole universe of
> debian packages) these fields describe the graph of package
> relationships, but in isolation they are robust in that changes to
> other packages don't necessarily force a change to the relationship
> metadata.
Yeah, that part's all good. I look forward to your writeup though,  
because I think the problem space is a little different. Provides  
works as a synonym for Role, but how do you do Overlays?

Content-ID: overlay1
Role: subtitle
Overlays: video1

vs

Content-ID: overlay1
Provides: subtitles
Depends: video1

I guess it works to make Depends a synonym for Overlays. I do like
Suggests as a way to flag a default configuration.
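
If Depends is read that way, a player could refuse to present an overlay
whose target is not being played. A minimal sketch, assuming tracks are
keyed by Content-ID and each carries at most one Depends value (the
function name and data layout are hypothetical); it checks one level only
and does not follow chains of dependencies.

```python
def playable(selected, tracks):
    """Keep only the selected tracks whose Depends target is also selected.

    `tracks` maps Content-ID to a header dict; a track without a Depends
    header stands on its own.
    """
    chosen = set(selected)
    return [
        cid for cid in selected
        if tracks[cid].get("Depends") is None
        or tracks[cid]["Depends"] in chosen
    ]

tracks = {
    "video1": {"Provides": "video"},
    "overlay1": {"Provides": "subtitles", "Depends": "video1"},
}
```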
> In terms of skeleton, two tracks which provide the same thing (eg.
> subtitles, but in different languages) don't need to contain metadata
> referencing each other. Instead they simply "Provide: subtitles", and
> that metadata does not need to be changed if similar tracks are added
> or removed.
What role does Conflicts have?
>>  Question: Is it better to specify multiple relations with a list of
>>  serial numbers, or with multiple message headers?
>
> these are equivalent, as defined for email/HTTP message headers
> (multiple message headers can be intercalated with commas).
Ok, thanks.

  -r

ogg.k.ogg.k@googlemail.com

2008-Feb-18 02:25 UTC


Might want to have a "priority" or "quality" field, in case a stream
has to be pared down due to congestion (or just user preference).
Something to say "this stream is important" and "this one is not really".

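One way such a field could be used, sketched under the assumption that
each track carries a numeric priority (larger means more important) and a
known bitrate. Both fields and the greedy strategy are hypothetical;
nothing like this is defined in Skeleton.

```python
def pare_down(tracks, budget_kbps):
    """Keep the highest-priority tracks that fit within a bitrate budget.

    Each track is a (name, priority, kbps) tuple. Tracks are considered in
    descending priority order; one that does not fit is skipped and lower
    priority tracks are still considered.
    """
    kept, used = [], 0
    for name, priority, kbps in sorted(tracks, key=lambda t: -t[1]):
        if used + kbps <= budget_kbps:
            kept.append(name)
            used += kbps
    return kept

tracks = [
    ("video", 10, 500),
    ("audio", 9, 96),
    ("commentary", 2, 64),
    ("subtitles", 5, 1),
]
```
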
ogg.k.ogg.k@googlemail.com

2008-Feb-19 06:18 UTC


Something which you might also want to consider, unless it is deemed
to be outside the scope of skeleton, is a way to embed data that can be
referred to by other logical streams, as a means to avoid duplication.

A logical stream is self contained as far as the data it handles goes, but
several multiplexed streams might need the same, or similar, data.

For instance, something I wanted to do was to have several multiplexed
Kate streams share a font. This is currently not possible without
duplicating the font data in each separate logical Kate stream, which
wastes bandwidth. Since such "global" data needs to be in the headers
(well, it doesn't *need* to be, but it's cleaner that way), it also
means a large burst of data when one starts a stream.

Also consider that text translations might mean dozens of multiplexed
logical streams, something which isn't likely to occur with, say, Vorbis
and Theora, where you usually get a few streams together. Duplication
in this case becomes more of an issue.

Back when I still thought CMML was about meta description of accompanying
streams, I'd actually thought of CMML carrying such shared data to be
delivered at the time it was needed, but that was quite "wouldn't it be
nice" territory. If such shared data is kept in headers, Skeleton becomes
a better fit for this payload.

Shane Stephens

2008-Feb-19 13:03 UTC


Hi,

A couple of points:

1) Font data, as in the actual font itself, doesn't really belong in an Ogg
stream.  Given that it truly is "global" data in the sense that your fonts
are shared by more than just a single Ogg file, the font should be
separate from the stream and just referenced using an appropriate font
naming scheme.  If you meant font references in the first place, well, those
are small and won't gobble much bandwidth at all.

2) We have been working on a specification and mechanism for indicating to
clients that there are multiple tracks of the same "kind" (e.g.
translation), and allowing clients to request individual tracks out of sets
of like tracks.  In fact with HTTP headers like Content-Language we can also
allow the server to default to a particular translation selection in the
absence of guidance from the client.  At the moment I think a preliminary
name for this specification is ROE - Silvia is in the process of nailing the
spec down so you should ask her any questions you have about it :)
Obviously this doesn't "solve" the duplication issue (if there is one), but
it does prevent duplicated data eating bandwidth.

3) Text is cheap!  Really cheap :)  Seriously - compare the amount of space
in your file taken up by text to that taken up even by audio, let alone
video.

Cheers,
    -Shane


Silvia Pfeiffer

2008-Mar-22 07:28 UTC


Ralph,

I was thinking about this proposal quite a bit today.

I agree: with ROE as a description format on the server for the tracks
that are available to a media resource, say
http://example.org/video.ogx with the potential tracks {A,B,C,D}, we
now have a means of communicating between server and client about what
data the server should actually put into the stream.

I am not sure I entirely agree with the method of communication
between client and server.

At one point at FOMS/LCA we discussed using URLs as a communication mechanism:

  http://example.org/video.ogx?tracks=A,B,D&t=15

  would return the resource video.ogx with only tracks A, B, and D, and
  only from a time offset of 15 seconds onwards.

Such a scenario is possible as soon as the client has received the ROE
file that describes video.ogx, so can determine itself which tracks to
include in the request.
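
As a rough sketch of how a server might read such a request, the snippet
below pulls the track list and time offset out of the URL with Python's
standard urllib.parse. The function name, and treating a missing "t" as
offset 0, are assumptions of this sketch; the query syntax itself was only
a discussion point at the time.

```python
from urllib.parse import urlsplit, parse_qs

def parse_track_request(url):
    """Extract the requested track names and time offset from a URL."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list of values per parameter; take the first.
    tracks = query.get("tracks", [""])[0].split(",")
    offset = float(query.get("t", ["0"])[0])
    return [t for t in tracks if t], offset

tracks, offset = parse_track_request(
    "http://example.org/video.ogx?tracks=A,B,D&t=15"
)
```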


Your example is rather different: the client asks the server for a
particular language and the server itself selects the correct tracks
that go with it. I believe this sort of selection should be restricted
to Content-Language: <locale>, and more generically to the values of
the "distinction" attribute of the switch statement in ROE (see
http://wiki.xiph.org/index.php/ROE).

In contrast, the "provides" attributes should be specified in the URL
with the "tracks" query parameter as described above, e.g.
http://example.org/video.ogx?tracks=video:v1, audio:a1b2,
text_overlay:t1, logo: 1. This is rather explicit, while the HTTP
header message fields do it implicitly.

I believe we need both methods and it might be time to start adding it
to the wiki at http://wiki.xiph.org/index.php/ROE.

Cheers,
Silvia.



