Listening to the meeting on granule pos tonight/today, it became clear that the issues everyone is concerned with mostly don't affect my implementations, and the issues I have pretty much don't affect anyone else... and in the cases where they overlap, the reasoning seems to be different. Since everyone else has had a lot more time to consider all these issues and I'm pretty new to this, it's much harder for me to make a cogent argument on the fly. So I figured I'd spell out all the things I've come across in my implementation, just to put them out there. I'll preface this by saying that my experience in audio/video is probably considerably less than most of the others working on this stuff, so if I make false assumptions, am missing the point, etc., just tell me! :)

DShow Background
================

DirectShow is a very structured media framework; there are specific interfaces for communication and guidelines for how and when data can be passed. It is, however, highly modular and flexible, enabling hundreds of codecs to be implemented with it. Some background... there are a few major components: graphs, filters, pins, samples and allocator pools. See www.illiminable.com/ogg/graphedit.html for a look at what the graphs are like.

In order for two filters to connect, their pins need to offer certain media types specifying the type of data and various parameters of the data (frame rates, frame size, sample rate, etc.) depending on the media type.

Allocator pools exist between the connection of any two pins. An allocator pool is a fixed number of fixed-size samples. All data is passed through the allocator pools. Before the user starts the graph (presses play), no data is passed in the graph. When the user presses play, the graph goes into pause mode and data is pushed through the graph, filling up all the allocator pools until all the threads are blocked; then the graph goes into play mode. As the downstream end (renderers) pulls data out of the downstream allocators, it frees up a spot for an upstream filter, whose thread unblocks and fills the space, and so on.

DirectShow requires start and end times for all samples.

Demuxing
========

OK, so given that the graph has to be built before data is passed downstream, there is a problem. How can the demuxer know what filters to connect to (i.e. what the streams are)? The demux needs to read ahead enough to find the BOS pages. Now we know how many streams there are. How does it know what kind of streams they are? It has to be able to recognise the capture patterns of every possible codec. So a "codec oblivious" demux is already out of the question.

Let's look further downstream for the moment... we'll assume we have a Vorbis-only stream. The DirectSound audio renderer won't connect to any decoder unless it is told the audio parameters: number of channels, sample rate, etc. If no data can flow in the graph yet, how can the decoder have seen the header pages to know this? It can't. This information is considered part of the setup data. Hence the media parameters have to come from the demux when it connects to the decoder, i.e. the media type the demux offers is (Audio/Vorbis 2 channel 44100), for example.

So the demux has to be able to parse the BOS page headers to offer a useful media type. The demux therefore has to not only identify the streams but also know how to get at least the key information out of them. In other words, the demux has to know how to parse the header of every possible codec format it will offer.
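To make that concrete, here's roughly the kind of peek the demux ends up doing at a Vorbis BOS packet: a minimal sketch in plain C++ rather than the actual filter code. The names are just illustrative; the field offsets are from the Vorbis I spec.

    #include <cstdint>
    #include <cstring>

    // Parameters the demux must pull out of a Vorbis BOS packet before
    // the graph is built, so it can offer a complete media type.
    struct VorbisParams {
        uint32_t version;
        uint8_t  channels;
        uint32_t sampleRate;
    };

    // Read a little-endian 32-bit value from a byte buffer.
    static uint32_t U32LE(const uint8_t* p)
    {
        return uint32_t(p[0]) | uint32_t(p[1]) << 8 |
               uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24;
    }

    // Parse the Vorbis identification header (packet type 0x01 +
    // "vorbis"). Returns false if the packet is too short or the
    // capture pattern does not match.
    bool ParseVorbisBOS(const uint8_t* p, size_t len, VorbisParams* out)
    {
        if (len < 16 || p[0] != 0x01 ||
            std::memcmp(p + 1, "vorbis", 6) != 0)
            return false;
        out->version    = U32LE(p + 7);   // vorbis_version (must be 0)
        out->channels   = p[11];          // audio_channels
        out->sampleRate = U32LE(p + 12);  // audio_sample_rate
        return true;
    }

And the demux needs an equivalent of this for every codec it wants to offer.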
Now, why isn't this an issue with every other codec, I assume you are thinking? The main reason is that the header format of Ogg codecs (i.e. Vorbis headers, Speex headers, etc.) is completely arbitrary and defined entirely by the codec. That's good in the sense that codecs can define whatever information they want. But it's bad in the sense that your demux can't be as dumb as you'd want. Other formats have at least portions of fixed header, where no matter what the exact details of the codec, some core information is guaranteed to be found at fixed locations. Codec identifiers are also of fixed (or at least bounded) size and in a fixed location. So you can, for example, map a FOURCC identifier to a DirectShow media-type GUID and get the key parameters from a fixed place. All this information is available up front, the demux doesn't need to know any specifics of codec headers, and it can handle new codecs without modification. Incidentally, this is all that OGM is: an extra header before the codec-specific ones that contains this information. Similarly, Annodex uses AnxData headers which preface each codec stream and contain information like granule rate and codec identifiers at fixed locations of bounded size.

The related issue is that of identifying streams... the codec identifier has no bounds; there is no way to say "this is the end of the identifier, and this is the rest of the header". In other words, \001vorbis is pretty much indistinguishable from \001vorbis2. How can you tell if the 2 is part of the identifier or part of the rest of the header?

Time Stamps
===========

DirectShow works in UNITS of 1/10,000,000 of a second; it knows nothing of granule pos. When something like Media Player requests a seek or a position report, it wants these units. So the seek request comes into the graph. It needs to be passed back to the demux, being the only portion of the graph with direct access to the data source. In order to seek in Ogg, you need granule pos, so again the demux needs to know how to make the conversion. The decoding filters can't make this conversion, because each one only knows about its own granule pos, so even if a decoder did convert and tried to get the demux to seek on that granule pos, it would restrict the available seeking landmarks to only that codec. So again the demux needs that "granule rate" information in order to make the conversion for each codec it may come across in its seek.

Now, after we seek, we hit a page we want to start from (and maybe go back a bit to ensure we get a keyframe, etc.)... so when we scan back, we find a new starting point. DirectShow now considers the time point it asked to seek to as time 0. It doesn't want to know about absolute times. So we are at a point a few pages before we want to start, and we have to make sure we hit one page of every logical stream in order to get a landmark granule pos. That's kind of OK for dense codecs... but what about sparse ones? With the end-timestamp scheme we have to find at least one page of every stream before we get our desired one. With the start-timestamp scheme we can resync as we hit a page: as we get a page, we know what time it starts at, and we then have a reference point to determine start and end times of every subsequent sample in that stream. This means less seeking back.
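For concreteness, a sketch of the conversions involved, assuming a codec like Vorbis whose granule pos is a plain sample count (not true for every codec); the names are mine, and UNITS is DirectShow's 100 ns tick:

    #include <cstdint>

    typedef int64_t REFERENCE_TIME;         // DirectShow 100 ns units
    static const int64_t UNITS = 10000000;  // one second

    // For a codec whose granule pos counts PCM samples, the mappings
    // in both directions are straight scale factors by the rate.
    REFERENCE_TIME GranuleToTime(int64_t gp, uint32_t granulesPerSec)
    {
        return gp * UNITS / granulesPerSec;
    }

    int64_t TimeToGranule(REFERENCE_TIME rt, uint32_t granulesPerSec)
    {
        // Rounding down lands us at or before the requested time,
        // which is what we want as a starting point for the backward
        // scan to a keyframe.
        return rt * granulesPerSec / UNITS;
    }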
My personal preference for the timestamp scheme would be start timestamps for all codecs: assuming you want both start and end times, finding the end given the start is much easier and more efficient than finding the start given the end. As for stream duration, I see no problem with having an empty EOS page which has the end time in it. But from the sounds of it, this isn't the general consensus.

===============================================

Anyway... I've said my bit for now! :) This is long enough and I'm tired!

Zen.
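P.S. A tiny sketch of why the start-stamp scheme is cheap on top of DirectShow's required start/end times, assuming a dense codec where each sample's duration is known once decoded (names are illustrative):

    #include <cstdint>

    typedef int64_t REFERENCE_TIME;         // DirectShow 100 ns units
    static const int64_t UNITS = 10000000;

    // Once one page gives us a start reference, every later sample's
    // stamps fall out forward as we decode: end = start + duration,
    // and the next sample starts where this one ended. No look-back
    // past the resync point is needed.
    struct StampState { REFERENCE_TIME next; };

    void StampSample(StampState& st, int64_t pcmSamples, uint32_t rate,
                     REFERENCE_TIME* start, REFERENCE_TIME* end)
    {
        *start  = st.next;
        *end    = *start + pcmSamples * UNITS / rate;
        st.next = *end;  // the following sample begins here
    }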
Timothy B. Terriberry
2004-May-08 13:58 UTC
[theora-dev] My issues with ogg and directshow...
> OK, so given that the graph has to be built before data is passed
> downstream, there is a problem. How can the demuxer know what filters to
> connect to (i.e. what the streams are)? The demux needs to read ahead
> enough to find the BOS pages. Now we know how many streams there are.
> How does it know what kind of streams they are? It has to be able to
> recognise the capture patterns of every possible codec. So a "codec
> oblivious" demux is already out of the question.

This is an issue of where the separation line is drawn, not whether or not separation can exist. The Ogg abstraction has a richer interaction between codec and muxer than the DS framework mandates. But this doesn't prevent you from defining an "Ogg codec" interface as a richer instance of the general DS codec interface, adding such things as generic functions to answer questions like, "Given this initial packet, can you decode this stream?" or, "What are the DS media parameters corresponding to this complete set of header packets?" or, "What is the time associated with this granule position?" The muxer can still rely wholly on the codecs to answer these questions; it just needs a richer codec API than the DS framework in general has. New codecs can still be added without modifications to the demux so long as they implement this extended API.

And as an aside, please don't use the phrase "granule rate"... it implies, incorrectly, that the granule position->time mapping can be accomplished by multiplying by a simple scale factor, and this is NOT true in general. In particular, it is not true for Theora.

> DirectShow works in UNITS of 1/10,000,000 of a second; it knows nothing
> of granule pos. When something like Media Player requests a seek or a
> position report, it wants these units. So the seek request comes into
> the graph.

Generally one seeks to a time, not a granule position. The granule position->time mapping is unique, but the reverse does not have to be. So when dealing with multiple codecs, you convert everything to a time in order to be able to compare values among them. It's unfortunate that DS does not let one work in the native time base of the streams, but units of 100 nanoseconds should be accurate enough for most purposes.
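Concretely, such an extended API might look something like this (a rough illustration only; the names and signatures are made up, not an actual implementation):

    #include <cstdint>
    #include <cstddef>

    // A sketch of an extended "Ogg codec" interface: the demux stays
    // codec-oblivious and delegates these questions to whichever
    // registered handler claims the stream.
    struct IOggCodecHandler {
        // "Given this initial packet, can you decode this stream?"
        virtual bool CanHandle(const uint8_t* bosPacket, size_t len) = 0;

        // "What are the DS media parameters for this complete set of
        // header packets?" Filled into a caller-supplied structure.
        virtual bool FillMediaType(const uint8_t* const* headers,
                                   const size_t* headerLens, size_t count,
                                   void* mediaTypeOut) = 0;

        // "What is the time associated with this granule position?"
        // Returned in DirectShow's 100 ns units. Note this is a
        // function call, not a scale factor: a Theora handler has to
        // split the keyframe field out of the high bits of the granule
        // position before it can produce a time.
        virtual int64_t GranuleToTime(int64_t granulePos) = 0;

        virtual ~IOggCodecHandler() {}
    };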
On Sun, May 09, 2004 at 03:14:37AM +0800, illiminable wrote:

> Listening to the meeting on granule pos tonight/today, it became clear
> that the issues everyone is concerned with mostly don't affect my
> implementations, and the issues I have pretty much don't affect anyone
> else... and in the cases where they overlap, the reasoning seems to be
> different. Since everyone else has had a lot more time to consider all
> these issues and I'm pretty new to this, it's much harder for me to make
> a cogent argument on the fly. So I figured I'd spell out all the things
> I've come across in my implementation, just to put them out there.

Thanks for putting this together, Zen. It's really nice to have a solid introduction to the issues from someone experienced with the framework.

> Allocator pools exist between the connection of any two pins. An
> allocator pool is a fixed number of fixed-size samples.

I can see how this works for fixed-bitrate codecs (and most uncompressed media, of course). Does one just use 'really big buffers' for VBR data?

> DirectShow requires start and end times for all samples.

And you've succeeded in calculating this for all our codecs?

> OK, so given that the graph has to be built before data is passed
> downstream, there is a problem. How can the demuxer know what filters to
> connect to (i.e. what the streams are)? The demux needs to read ahead
> enough to find the BOS pages. Now we know how many streams there are.
> How does it know what kind of streams they are? It has to be able to
> recognise the capture patterns of every possible codec. So a "codec
> oblivious" demux is already out of the question.
>
> Let's look further downstream for the moment... we'll assume we have a
> Vorbis-only stream. The DirectSound audio renderer won't connect to any
> decoder unless it is told the audio parameters: number of channels,
> sample rate, etc. If no data can flow in the graph yet, how can the
> decoder have seen the header pages to know this? It can't. This
> information is considered part of the setup data. Hence the media
> parameters have to come from the demux when it connects to the decoder,
> i.e. the media type the demux offers is (Audio/Vorbis 2 channel 44100),
> for example.
>
> So the demux has to be able to parse the BOS page headers to offer a
> useful media type. The demux therefore has to not only identify the
> streams but also know how to get at least the key information out of
> them. In other words, the demux has to know how to parse the header of
> every possible codec format it will offer.
>
> Now, why isn't this an issue with every other codec, I assume you are
> thinking?

To clarify here, it's my understanding that format parameter lookup is a feature of the AVI and OGM container formats (and ASF, presumably), not of any of the specific codecs. Is this correct? That's why lookup of this information is always possible there, and not for Ogg, even if we provide a convenience library that can do the header parse for all the codec embeddings it knows about, as I think derf was suggesting.

Practically speaking, I think this can be dealt with. After all, being able to identify a codec by FOURCC doesn't help if you can't find an implementing DLL. From the point of view of DirectShow, it's just a limitation of this particular container format. Not knowing anything about them, I'd guess that QuickTime can optionally provide a table with this information, and that MPEG program streams, like Ogg, don't provide much beyond the packet types.
How does DirectShow handle those containers?

> The related issue is that of identifying streams... the codec identifier
> has no bounds; there is no way to say "this is the end of the
> identifier, and this is the rest of the header". In other words,
> \001vorbis is pretty much indistinguishable from \001vorbis2. How can
> you tell if the 2 is part of the identifier or part of the rest of the
> header?

Yes. It's well defined in specific codec specs, but more flexible in general. Just looking, file-magic style, at some of the initial bytes should always work.

> With the start-timestamp scheme we can resync as we hit a page: as we
> get a page, we know what time it starts at, and we then have a reference
> point to determine start and end times of every subsequent sample in
> that stream. This means less seeking back.

This is another good example of problems with the end-time granule. Thanks.

> As for stream duration, I see no problem with having an empty EOS page
> which has the end time in it.

The only problem here is that you can't rely on the page being there (the stream might be truncated, and in fact may explicitly be so in Ogg Vorbis). So it's sugar, not something that's 'built in' to the format design.

> But from the sounds of it, this isn't the general consensus.

Dunno. Sounded like Aaron was on your side. :)

Cheers,

 -r
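P.S. For concreteness, roughly what that file-magic check could look like; the pattern table is illustrative, not exhaustive, and the names are made up:

    #include <cstdint>
    #include <cstring>

    // BOS capture patterns for codecs this build knows about. Checking
    // the longest matching pattern is how the "\001vorbis" versus
    // "\001vorbis2" ambiguity gets resolved: a (hypothetical) 8-byte
    // "\x01vorbis2" entry would win over the 7-byte "\x01vorbis" one.
    struct Magic { const char* pattern; size_t len; const char* codec; };

    static const Magic kKnown[] = {
        { "\x80theora", 7, "Theora" },
        { "\x01vorbis", 7, "Vorbis" },
        { "Speex   ",   8, "Speex"  },
    };

    const char* IdentifyBOS(const uint8_t* packet, size_t len)
    {
        const Magic* best = 0;
        for (size_t i = 0; i < sizeof(kKnown) / sizeof(kKnown[0]); ++i) {
            const Magic& m = kKnown[i];
            if (len >= m.len &&
                std::memcmp(packet, m.pattern, m.len) == 0 &&
                (!best || m.len > best->len))
                best = &m;
        }
        return best ? best->codec : 0;  // 0 = unknown stream type
    }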