thr3ads.net - ogg dev - [ogg-dev] Ogg/Kate preliminary documentation [Feb 2008]

ogg.k.ogg.k@googlemail.com

2008-Feb-08 03:12 UTC

[ogg-dev] Ogg/Kate preliminary documentation

> Some of the things you talk about were not solved at the CMML level, but
> rather through using different Ogg logical bitstreams.
While it is possible to do it this way (and probably a good idea for
examples like a clock in a corner), it implies that all the placements and
logically different "items" are known at the start of the stream (since the
Ogg spec says a stream can't start midway through another stream, an
interesting restriction, but one which is there nonetheless). While this is
fine for a file based stream, it is not if the stream is generated in
realtime.

While it is not used at the moment, I do have a "category" field in the ID
header, meant to be a tag used by a player to know what is supplied by
a particular stream (eg, the user may want to select a number of categories,
such as "transcript" and "commentary", and a language, and two streams
would be displayed by the player).

However, forcing the use of several separate streams, while having the
advantage of keeping things simple (and being the solution I selected for
multiple languages), may be overly restrictive.
> * overlapping timed text pieces would be coming in through different
> logical bitstreams or the CSS (there may be a timing extension necessary
> to CSS to do so - if you have found a better way of doing this, I'll be
> keen to see)
Not a better solution, I'm afraid, merely a different one. You define
regions and (very simple) styles, and there is a system of "motions" (mostly
splines) that can alter attributes like color, position, etc. It's another
custom scheme, I'm afraid, but one which is kept simple and powerful, I
believe (hope ?).
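
As a rough illustration of the "motion" idea above (a spline that drives an
attribute such as position over time), here is a minimal sketch. It is not
the Kate bitstream format or API: the dictionary layout, the names
cubic_bezier and position_at, and the choice of a cubic Bezier curve are all
assumptions made for the example.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    return tuple(
        u**3 * a + 3 * u * u * t * b + 3 * u * t * t * c + t**3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def position_at(motion, time):
    """Return the (x, y) position of a region at the given time.

    `motion` is a hypothetical structure: start/end times in seconds and
    four control points. Times outside the range clamp to the endpoints.
    """
    start, end = motion["start"], motion["end"]
    t = min(1.0, max(0.0, (time - start) / (end - start)))
    return cubic_bezier(*motion["points"], t)


# Hypothetical motion: move a region from (0, 0) to (100, 50) over 4 seconds.
motion = {
    "start": 10.0,
    "end": 14.0,
    "points": [(0.0, 0.0), (30.0, 60.0), (70.0, 60.0), (100.0, 50.0)],
}

for time in (10.0, 12.0, 14.0):
    print(time, position_at(motion, time))
```

The same evaluation could drive any numeric attribute (a color channel, say)
by changing what the control points represent.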
> The advantage of having things in different logical bitstreams is that you
> can create addressing schemes that can refer to just a subset of logical
> bitstreams if you e.g. only want some part of the composition delivered to
> you from a server. For example,
> http://example.org/video.ogx?track=video,audio,transcript will avoid giving
> you the digital time, logo, and channel number tracks for the above example.
> The CMML design has always focused on trying to keep things in components
> that can easily be added or taken away.
This is a very good point, and the real point of Annodex, if I'm not
mistaken (addressability of audio/video content) ? Kate does not attempt to
deal with this, it's totally outside its scope. I understand that CMML does
this for non CMML streams anyway (eg, Theora) ?
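
To make the track-addressing example above concrete, here is a minimal
sketch of how a server might pick logical bitstreams from the track query
parameter. Only the URL comes from the quoted message; the stream
dictionaries, the "track" labels and the function names are hypothetical,
and this is not how any Annodex server is known to implement it.

```python
from urllib.parse import urlparse, parse_qs


def requested_tracks(url):
    """Return the set of track names listed in the 'track' query parameter."""
    query = parse_qs(urlparse(url).query)
    names = set()
    for value in query.get("track", []):
        names.update(name.strip() for name in value.split(",") if name.strip())
    return names


def select_streams(streams, url):
    """Keep only the logical bitstreams whose track name was requested."""
    wanted = requested_tracks(url)
    if not wanted:
        return list(streams)  # no filter given: deliver everything
    return [s for s in streams if s["track"] in wanted]


# Hypothetical description of the multiplexed file from the example.
streams = [
    {"serialno": 1, "track": "video"},
    {"serialno": 2, "track": "audio"},
    {"serialno": 3, "track": "transcript"},
    {"serialno": 4, "track": "clock"},
    {"serialno": 5, "track": "logo"},
]

url = "http://example.org/video.ogx?track=video,audio,transcript"
selected = select_streams(streams, url)
print([s["track"] for s in selected])
```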
> I'm very keen on seeing your specifications and seeing kate at work - it
> may well be that you have found some better solutions to some of the
> problems that we attack differently with CMML and thus we should think
> about picking the best designs. Really wanting to see it working - post
> your specs and the patched vlc version here if you can!
I'll send you a recent snapshot, feel free to take inspiration from it, but
I've only worked on it for about a month now, so don't expect to see much
you haven't solved yet :)
I do not have a patch for vlc, only for MPlayer and xine (MPlayer does only
text subtitles, but xine does all). As for specs, since the bitstream format
is still in flux (and the API to a lesser extent), there are no docs yet.
The wiki page is all there is for the moment.
> BTW: on the kate wiki page, Annodex is mentioned - what annodex is is
> simply an Ogg file with skeleton and a CMML track in addition to other
> digital media. It's a term that we used to specify the particular
> multiplexed file with which we wanted to work, but it hasn't really much
> meaning in itself nowadays.
Yes, I've noticed that much of the code (in xine, say) was shared to decode
Ogg and Annodex streams.

Conrad Parker

2008-Feb-08 03:24 UTC


On 08/02/2008, ogg.k.ogg.k@googlemail.com <ogg.k.ogg.k@googlemail.com> wrote:
> This is a very good point, and the real point of Annodex, if I'm not
> mistaken (addressability of audio/video content) ? Kate does not attempt
> to deal with this, it's totally outside its scope. I understand that CMML
> does this for non CMML streams anyway (eg, Theora) ?
not CMML, but the algorithm for remuxing from time offsets, which is
specified in the annodex Internet-Draft. This basically describes how
to reconstruct a bitstream to represent a given time offset, including
any necessary headers, keyframes and preroll. The parameters for these
are given in the skeleton headers.

Conrad.

ogg.k.ogg.k@googlemail.com

2008-Feb-08 03:36 UTC


> not CMML, but the algorithm for remuxing from time offsets, which is
> specified in the annodex Internet-Draft. This basically describes how
Ah, yes, and CMML supplies a way to specify addresses which are
then interpreted by this algorithm. Have I got that right ?

Ralph Giles

2008-Feb-08 16:03 UTC


On 2/8/08, ogg.k.ogg.k@googlemail.com <ogg.k.ogg.k@googlemail.com> wrote:
> While this is possible to do it this way (and probably a good idea for the
> examples like a clock in a corner), it implies that all the placements and
> logically different "items" are known at the start of the stream (since
> the Ogg spec says a stream can't start midway through another stream, an
> interesting restriction, but which is there nonetheless). While this is
> fine for a file based stream, it is not if the stream is generated in
> realtime.
Right. This was, in fact, one of the roles of "chaining" where you'd
mark such changed components with a chain boundary, at which such
things are explicitly allowed to change. The drawbacks are the
overhead of resending all the setup data for configurable codecs like
vorbis and theora, and the semantic conflict between 'chain boundary
flags an edit point' and 'chain boundary flags a program change' which
have confused people implementing playlist-style representations of
chained streams for some time. CMML has a similar confusion as it can
be used for chapter markers as easily as dialog markup within a single
scene.

There are certainly arguments for doing it both ways, but from the
Annodex point of view it is nice to push as much of that onto the
mux/skeleton level as possible, for all the reasons Silvia described.
Do you have a counter illustration of where adding a new category
suddenly, on the fly is contra-compelling?
> While it is not used at the moment, I do have a "category" field in the
> ID header, meant to be a tag used by a player to know what is supplied by
> a particular stream (eg, the user may want to select a number of
> categories, such as "transcript" and "commentary", and a language, and
> two streams would be displayed by the player).
CMML 3.1 had a 'track' attribute that could be supplied to clips to
make a similar distinction. We discussed this quite a bit at LCA last
week and the general feeling was that we should remove that from CMML
itself, focussing on its role as a text track codec for the 4.0
revision, and push the multiple-stream effects up to the authoring
level, with either a new xml format for describing stream contents, or
in the stream itself.

We need something like a category (we were using "role", "lang", and a
couple of other labels) for the Ogg Skeleton message headers for use
by other media types in the stream in any case. For example, to
distinguish the main audio tracks from commentary, music or effects
only, and so on. Or to say that a particular visual overlay should be
applied to a particular video stream, and whether to do so by default,
or optionally. That sort of thing. So this mechanism must exist at the
per-stream level regardless of what a particular codec supports.

 -r

ogg.k.ogg.k@googlemail.com

2008-Feb-11 02:27 UTC


> Right. This was, in fact, one of the roles of "chaining" where you'd
> mark such changed components with a chain boundary, at which such
> things are explicitly allowed to change. The drawbacks are the
> overhead of resending all the setup data for configurable codecs like
> vorbis and theora, and the semantic conflict between 'chain boundary
> flags an edit point' and 'chain boundary flags a program change' which
This also means that having to chain a particular logical stream implies
having to break and rechain all other multiplexed streams. For, say,
Theora (just imagining here, I don't know if that'd actually be the case),
it could mean having to reencode a keyframe on the fly for the first frame
of the new chain, or go without video for whatever time is left before the
next keyframe (I've got no real idea how much time typically elapses
between keyframes, but I believe it is variable ?)
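
The constraint under discussion can be sketched as a small check over a
simplified page sequence: a new logical stream may only begin at the start
of the physical stream or at a chain boundary, once every multiplexed stream
has ended. Pages are modelled here as (serialno, is_bos, is_eos) tuples,
which is a simplification for illustration and not the libogg API; the
function name is made up as well.

```python
def count_chain_links(pages):
    """Count chain links in a sequence of (serialno, is_bos, is_eos) pages.

    Raises ValueError if a stream begins while other streams are still
    running, i.e. midway through a link rather than at a chain boundary.
    """
    open_streams = set()    # serial numbers that have begun but not ended
    in_header_run = False   # True while reading the BOS pages opening a link
    links = 0
    for serialno, is_bos, is_eos in pages:
        if is_bos:
            if not in_header_run:
                if open_streams:
                    raise ValueError(
                        "stream %d starts midway through another stream" % serialno
                    )
                links += 1              # all streams ended: new chain link
                in_header_run = True
            open_streams.add(serialno)
        else:
            if serialno not in open_streams:
                raise ValueError("page for unknown stream %d" % serialno)
            in_header_run = False       # first data page ends the BOS run
        if is_eos:
            open_streams.discard(serialno)
    return links


# Two multiplexed streams, then a chain boundary, then two new streams.
chained = [
    (1, True, False), (2, True, False),
    (1, False, False), (2, False, True), (1, False, True),
    (3, True, False), (4, True, False),
    (3, False, True), (4, False, True),
]
print(count_chain_links(chained))

# Adding stream 2 on the fly, without ending stream 1, is rejected.
midway = [(1, True, False), (1, False, False), (2, True, False)]
try:
    count_chain_links(midway)
except ValueError as err:
    print("rejected:", err)
```

In the chained example, stream 1 has to end and restart (as stream 3) just
so that a new stream can appear, which is the cost described above.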
> There are certainly arguments for doing it both ways, but from the
> Annodex point of view it is nice to push as much of that onto the
> mux/skeleton level as possible, for all the reasons Silvia described.
> Do you have a counter illustration of where adding a new category
> suddenly, on the fly is contra-compelling?
No particular reason, just the fact that it constrains possible uses of the
codec, especially for on the fly generation.
I could certainly make up an example where one streams a video of people
in an office, and labels are placed near each person following them around,
but this is just a possible use I made up, not something I actually need
to be done.
Not that the kate format currently supports moving regions around in
realtime well anyway, but that's something I'm thinking about currently.
> CMML 3.1 had a 'track' attribute that could be supplied to clips to
> make a similar distinction. We discussed this quite a bit at LCA last
> week and the general feeling was that we should remove that from CMML
> itself, focussing on its role as a text track codec for the 4.0
> revision, and push the multiple-stream effects up to the authoring
> level, with either a new xml format for describing stream contents, or
> in the stream itself.
I'm not totally sure I'm following you here.
The idea behind the category field was for a player program to automatically
classify the kate streams as being the "same". eg, if you have 16 streams,
in 8 languages, 8 of them having the category "subtitles" and the rest
having the category "director comments", then the program could present a
list of two available stream types, "subtitles" and "director comments", in
addition to the language selection. This set of categories may or may not
match any categories that would apply to an accompanying multiplexed video,
say, if there was any (eg, not a text only stream).
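
The 16-stream example above can be sketched as follows. The stream records,
the language codes and the function names are made up for illustration; only
the idea of a per-stream category and language comes from the message, and
this is not the Kate API.

```python
from collections import defaultdict


def build_menu(streams):
    """Map each category to the sorted list of languages it is available in."""
    menu = defaultdict(set)
    for s in streams:
        menu[s["category"]].add(s["language"])
    return {category: sorted(langs) for category, langs in menu.items()}


def pick(streams, categories, language):
    """Return the streams matching the user's category and language choices."""
    return [
        s for s in streams
        if s["category"] in categories and s["language"] == language
    ]


# 8 hypothetical languages x 2 categories = 16 streams.
languages = ["de", "en", "es", "fr", "it", "ja", "pt", "ru"]
categories = ["subtitles", "director comments"]
streams = [
    {"serialno": i, "category": category, "language": language}
    for i, (category, language) in enumerate(
        (c, l) for c in categories for l in languages
    )
]

menu = build_menu(streams)
print(sorted(menu))   # the stream types a player would list
chosen = pick(streams, {"subtitles", "director comments"}, "fr")
print([(s["category"], s["language"]) for s in chosen])
```

With both categories selected and one language, the player ends up
displaying two streams, as in the description.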
> only, and so on. To say that a particular visual overlay should be
> applied to a particular video stream, and whether to do so by default,
That's an interesting use I hadn't thought of.
