thr3ads.net - ogg dev - [ogg-dev] New Ogg Dirac mapping draft [Aug 2008]

Ralph Giles

2008-Aug-12 03:03 UTC

[ogg-dev] New Ogg Dirac mapping draft

David Flynn has proposed a new Ogg Dirac mapping. The draft is here:

   http://davidf.woaf.net/dirac-mapping-ogg.pdf

This is a much bigger break from other codecs than my draft (at  
http://wiki.xiph.org/index.php/OggDirac). We talked a bit about it on  
IRC today. Below is my summary; hopefully David can correct anything  
I got wrong or misleading. Comments?

There are two main differences from the earlier proposal:

* The granulepos is split into three fields instead of two, with the  
extra field encoding the reordering offset.

* The mapping requires a page flush after every frame data packet.

The first allows the actual presentation time of the corresponding  
packet to be determined, while in my scheme a group of reordered  
frames all get the same granulepos.

The second assigns a granulepos to every *packet* instead of every  
*page* as is usual, so the granulepos can be used in practice to  
calculate a presentation timestamp for every frame.

An offset to a restart point for restarting after seek is included as  
in my draft.

Pros:

The muxer doesn't have to crack data packets or maintain state to  
figure out the presentation timestamps. Demux code is simpler.

Both presentation and decode timestamps are readily available from a  
simple look at the granulepos on each packet out of libogg.

The encoding is clever, so the frame number calculation by adding the  
two halves according to the skeleton granule shift still works.
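
The "adding the two halves" calculation can be sketched as below. This is only an illustration of the Theora-style split at the granule shift, not code from either draft; the function names and the shift value of 6 are assumptions chosen for the example.

```python
def split_granulepos(granulepos, granule_shift):
    """Split a granulepos into (high, low) parts at granule_shift bits."""
    high = granulepos >> granule_shift
    low = granulepos & ((1 << granule_shift) - 1)
    return high, low


def frame_number(granulepos, granule_shift):
    """Frame number obtained by adding the two halves together."""
    high, low = split_granulepos(granulepos, granule_shift)
    return high + low


# Hypothetical values: high part 100, low part 5, granule shift of 6 bits.
gp = (100 << 6) | 5
print(split_granulepos(gp, 6), frame_number(gp, 6))
```

A generic tool that only knows the granule shift (for example from Skeleton) can apply the same sum without knowing which codec produced the stream.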

Cons:

Restart after seek still requires new code; that part of skeleton  
doesn't work.

Muxing overhead for one-page-per-packet is excessive for small  
packets. 3% (vs 0.7%) for 230 kbps video, up to 10% for 50 kbps. This  
isn't going to work for cell phone video.

Many Ogg tools assume they can repaginate, and probably won't get the  
one-page-per-packet stuff right. This leads to the usual argument  
that the demuxer has to be able to reconstruct the timestamps anyway,  
if it's going to be liberal in what it accepts, so the demuxer isn't  
actually simpler.

The granulepos will no longer be numerically non-decreasing, so  
implementations that make this assumption will break.
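
The one-page-per-packet overhead figures above can be roughly reproduced with a back-of-envelope model. The sketch assumes constant-size frames at 25 fps (the frame rate is my assumption, not stated in the thread), a 27-byte fixed page header, and one lacing value per 255 bytes of packet plus a terminating value; the function names are made up for the example.

```python
PAGE_HEADER_BYTES = 27  # fixed part of an Ogg page header
MAX_SEGMENTS = 255      # a page holds at most 255 lacing values


def lacing_values(packet_bytes):
    """Number of lacing values needed to describe one complete packet."""
    return packet_bytes // 255 + 1


def one_page_per_packet_overhead(bitrate_bps, fps):
    """Container overhead as a fraction of payload, one packet per page."""
    packet_bytes = int(bitrate_bps / 8 / fps)
    segments = lacing_values(packet_bytes)
    if segments > MAX_SEGMENTS:
        raise ValueError("packet does not fit in a single page")
    return (PAGE_HEADER_BYTES + segments) / packet_bytes


for kbps in (230, 50):
    pct = 100 * one_page_per_packet_overhead(kbps * 1000, 25)
    print(f"{kbps} kbps: {pct:.1f}% overhead")
```

Under these assumptions the model gives about 2.8% at 230 kbps and about 11% at 50 kbps, which is in the same range as the numbers quoted above.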

Summary:

Adding a new codec has always required code changes to the muxer in  
Ogg. The question is whether this is a better precedent for future  
codecs with future-predicted data. My draft tried to be minimally  
different from previous practice: same granulepos logic as theora,  
custom timestamp generation like all the other codecs. David's draft  
requires new seek as well as timestamp generation code, but moves  
some of the complexity for the latter from simplistic to  
sophisticated implementations.

The idea of one-packet-per-page isn't unprecedented. CSIRO did that  
for their mobile video version of theora (and reported the overhead  
was a real problem). We also talked about long pages without spanning  
packets at FOMS in January. There's a buffering issue with packets  
that are both packed and spanning, and the overhead can actually be  
lower for large (>8K) packets. I'd almost rather see us take this  
route, with a new Ogg page type, if the Dirac developers want a  
timestamp per frame, but that certainly doesn't minimize disruption.

  -r

ogg.k.ogg.k at googlemail.com

2008-Aug-12 12:46 UTC

[ogg-dev] New Ogg Dirac mapping draft

> Many Ogg tools assume they can repaginate, and probably won't get the
> one-page-per-packet stuff right. This leads to the usual argument

This could be something to add to Skeleton. Kate (and probably CMML)
needs the one-packet-per-page thing also, and any discontinuous codec
probably needs it as well (well, not *need*, but no good buffering without
it). It's trivial for a muxer to do, and it's transparent to a demuxer.

> The granulepos will no longer be numerically non-decreasing, so
> implementations that make this assumption will break.

Wouldn't pretty much anything that deal with Ogg be broken by this ?
How would, say, oggz-validate deal with this, apart from treating Dirac
differently ? Seeking would not work without knowing a particular stream
is Dirac too (would have to bsearch on time, not granpos directly). I'd
have thought this to be a hard requirement rather than an assumption.
If using Skeleton, seeking can now be done on time values even if you
don't know which codec it is - I think this breaks this too (though I have
not thought too hard about this).

> lower for large (>8K) packets. I'd almost rather see us take this
> route, with a new Ogg page type, if the Dirac developers want a

Intriguing, can you expand on what you mean by "new Ogg page type" ?

Ralph Giles

2008-Aug-13 09:24 UTC

[ogg-dev] Fwd: New Ogg Dirac mapping draft

On Tue, Aug 12, 2008 at 5:46 AM, ogg.k.ogg.k wrote:
> This could be something to add to Skeleton. Kate (and probably CMML)
> needs the one-packet-per-page thing also, and any discontinuous codec
> probably needs it as well (well, not *need*, but no good buffering without
> it). It's trivial for a muxer to do, and it's transparent to a demuxer.

That's a good idea. Any suggestions for where?

> Wouldn't pretty much anything that deal with Ogg be broken by this ?

It depends how they're written. If they calculate a numerical
granulepos for the desired point on the timeline and seek by comparing
stream granulepos to that numeric value, they will fail. If they
convert the stream values to time and compare that way, it will work
fine. We've tried to encourage that direction with calls like
th_granule_time(), and in general it's difficult to calculate a
numeric value for theora because of the skips at the keyframes.
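
To make the "convert to time and compare" approach concrete, here is a minimal sketch. It bisects an in-memory list of page granulepos values rather than byte offsets in a file, and `granule_time` is a hypothetical stand-in for a codec-provided conversion such as th_granule_time(); the shift and frame rate are assumptions for the example.

```python
def granule_time(granulepos, granule_shift, fps_num, fps_den):
    """Map a two-part granulepos to seconds (Theora-style high + low)."""
    high = granulepos >> granule_shift
    low = granulepos & ((1 << granule_shift) - 1)
    return (high + low) * fps_den / fps_num


def seek_page(page_granules, target_seconds, granule_shift, fps_num, fps_den):
    """Index of the last page whose time is <= target, or None if none is.

    Only the time-equivalents need to be non-decreasing for this to work.
    """
    lo, hi = 0, len(page_granules)
    while lo < hi:
        mid = (lo + hi) // 2
        t = granule_time(page_granules[mid], granule_shift, fps_num, fps_den)
        if t <= target_seconds:
            lo = mid + 1
        else:
            hi = mid
    return lo - 1 if lo > 0 else None


# Hypothetical pages at frames 10, 20, 30, 40, 50 of a 25 fps stream.
pages = [(0 << 6) | 10, (0 << 6) | 20, (25 << 6) | 5, (25 << 6) | 15, (50 << 6) | 0]
print(seek_page(pages, 1.3, 6, 25, 1))
```

The comparison never needs the inverse mapping from time to granulepos, which is the part that is hard to compute for theora.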

David, can you think of a fancier encoding that would make your
granulepos values non-decreasing?

OTOH, the RFC can be read to require the numeric values be increasing.

  granule position: An increasing position number for a specific
     logical bitstream stored in the page header.  Its meaning is
     dependent on the codec for that logical bitstream and specified in
     a specific media mapping.

The question is whether 'position number' is literally the same as the
value of the granule position field. :)

Also, I say 'non-decreasing' since a codec doesn't necessarily advance
decoding with every packet. The seeking algorithm only requires
non-decreasing time-equivalents, so 'increasing' in the RFC is an
artificial constraint.

>> I'd almost rather see us take this
>> route, with a new Ogg page type, if the Dirac developers want a
>
> Intriguing, can you expand on what you mean by "new Ogg page type" ?

The Ogg page header has a version field and 5 unused flag bits, so we
can add new page types if we want. ogg_stream_pagein() will reject
pages with newer version numbers, and extra flags are ignored.

There were a couple of things we talked about wanting to do if we
revised the Ogg page structure:

* The CRC rejection of corrupt data isn't always what you want (i.e.
good for audio where digital noise is unacceptable, bad for video
where a goopy picture is better than no picture at all) and is
expensive to calculate when muxing high-bitrate streams. So it would
be nice if we could flag the CRC just covering the page header fields
and not the packet data.

* Having packets both span and pack in the same page increases the
expense of seeking, so it would be nice to do something about this. A
preserve-flag in skeleton like you suggested for one-page-per-packet
might help here, but we could also solve it with a new page type that
just doesn't allow it.

* Various codecs have wanted one-page-per-packet regardless to reduce
buffering requirements for low-frequency packet streams.

* The lacing method of encoding packet length and page spanning is
less efficient for large packets, so for HD video a 'large page' type
would be nice.

There hasn't been a concrete suggestion for this, but the general idea
is that we introduce a new Ogg page version 1, with a flag for
whether CRC includes the payload data or not, and either another flag
or another page version that selects a different packet length
encoding which doesn't support packing. Whether we want to add
multiple explicit timestamp fields like mux authors have requested, I
don't know.

 -r

ogg.k.ogg.k at googlemail.com

2008-Aug-13 10:05 UTC

[ogg-dev] Fwd: New Ogg Dirac mapping draft

>> This could be something to add to Skeleton. Kate (and probably CMML)
>> needs the one-packet-per-page thing also, and any discontinuous codec
>> probably needs it as well (well, not *need*, but no good buffering without
>> it). It's trivial for a muxer to do, and it's transparent to a demuxer.
>
> That's a good idea. Any suggestions for where?

Well, there are reserved bits in fisbone just after the granule shift.
I doubt adding this to the message headers would be a good way to do it,
but that'd be another way.

>> Wouldn't pretty much anything that deal with Ogg be broken by this ?
>
> It depends how they're written. If they calculate a numerical
> granulepos for the desired point on the timeline and seek by comparing
> stream granulepos to that numeric value, they will fail. If they

And that's the canonical way AFAIK. Comparing times computed from
the granpos you get from pages you get from a bsearch requires good
knowledge of the codec, whereas comparing granpos can seek within
any codec.

> OTOH, the RFC can be read to require the numeric values be increasing.

I recall pointing out a discrepancy between Ogg docs and the RFC, and
I think someone (either Silvia or Conrad, probably Silvia) fixed the RFC.
(That was the reason I'd originally included the low counter bits for Kate).

>> Intriguing, can you expand on what you mean by "new Ogg page type" ?
>
> The Ogg page header has a version field and 5 unused flag bits, so we
> can add new page types if we want. ogg_stream_pagein() will reject
> pages with newer version numbers, and extra flags are ignored.

Which is essentially creating a new format, as an old demuxer will be able
to do nothing at all with such a stream, or do you have a cunning plan to
make those backward compatible ?

> * The lacing method of encoding packet length and page spanning is
> less efficient for large packets, so for HD video a 'large page' type
> would be nice.

I'll take this opportunity to mention my pet idea (don't think I ever
mentioned it on ogg-dev), to start with a byte, then, if larger than 255,
add two bytes, then for packets larger than 255+65536, add 4 bytes. This
lacing is worse than the current one only for packets between 256 and 511
bytes (admittedly a probably common case, but adding only one byte). Now
that you mention flags, lacing type could be put in flags, so this
worsening could be avoided too.

> encoding which doesn't support packing. Whether we want to add
> multiple explicit timestamp fields like mux authors have requested, I
> don't know.

Certainly a criticism of Ogg I heard more than once :)

David Flynn

2008-Aug-13 20:08 UTC

[ogg-dev] New Ogg Dirac mapping draft

On 2008-08-12, Ralph Giles <giles at xiph.org> wrote:
> David Flynn has proposed a new Ogg Dirac mapping.

I thought it'd be a good idea to explain some of the rationale for why we
want to change the definition of granulepos in the ogg-dirac mapping.

Terms used in this document:
 - GP64 = The 64bit granule_position as found in the page header.
 - GPH+L = Granule pos high + low as split by granule_shift.
 - ST = System Time; this is the monotonically increasing decoder clock
 - PT = Presentation Time; Picture is displayed when PT = ST, which implies
   AV sync.

NOTE, we will not use the terms I,P,B -- they are mpeg2 terms which do not
map to constructs in dirac or h264 properly.

Properties of an Out-of-order video codec (dirac,h264,vc-1,mpeg2)
 - Each picture has a unique PT.
 - Pictures in the stream are not in PT order.
 - The decoder reorders pictures at output into PT order.
 - ST != PT in stream order (ie, input to decoder).

De facto rules of ogg (I've not found these actually written down anywhere):
 [A1] One of GP64 or GPH+L must increase for each packet
      For in-order codecs using keyframe-granuleshift, both are true.
 [A2] GPH+L == time.
   All codecs so far are inorder, so ST=PT=time.
 [A3] Page flushes are NOT invariant across remuxes.

 The ogg RFC does state that GP64 is codec specific without any
 restriction.

What is needed to decode & display Out-of-order coded video?
 Each picture must have a unique & accurate(correct) PT.
 ST needs to be derived from the stream correctly:
   - Can interpolate ST for a particular picture
   - Can not determine the starting value of ST from the first picture.
   - This happens in streaming, example:
      PT: 14 10 11 12 13
      ST: 10 11 12 13 14

What is problematic with the xiph mapping?
  - Here is an example using the xiph mapping:
      Sync point: V       V                 V
      PT(actual): 0 3 1 2 6 4 5 9 7 8 c a b d
      GP_high:    0 0 0 0 6 6 6 6 6 6 6 6 6 d
      GP_low:     1 1 2 3 1 1 1 1 2 3 3 4 5 1
      GPH+L-1:    0 0 1 2 6 6 6 6 7 8 8 9 a d

  - Each picture does not have a unique value for granulepos
     => Cannot determine unique&correct PT
     => Cannot determine correct ST
     => If (due to paging) no GP64 is available for a frame,
        it is impossible to correctly interpolate the value of PT.
     => Don't know when to display pictures

  - Seeking is difficult:
     -  Want to seek to frame N
     -  GPH+L is non-unique (don't know if the right one has been found)
     => Some values of GPH+L do not exist (searches may fail)
     -  and GPH+L != N (ie, may find the wrong frame)

  - Locating the sync point (eg, after seek) is irritating
     - GP_low != number of packets(pictures) since sync point.
     => Have to search backwards until GP_high changes

  - Copes badly with open gop:
    To correctly decode picture(PT=4) in above example, the sync point it
    depends upon is picture(PT=0).
    However, this would violate the property of GP64(n) > GP64(n-1).

  - It requires that a page is flushed before transmitting a sync point
    so that a syncpoint is guaranteed to have a valid GP64.
    This violates axiom A3
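
The non-uniqueness and the gaps in the example table above can be checked mechanically. This sketch only re-derives the GPH+L-1 row from the GP_high and GP_low rows as given in the table; the variable names are mine.

```python
from collections import Counter

# Rows copied from the example table (hex digits converted to integers).
pt = [int(c, 16) for c in "0 3 1 2 6 4 5 9 7 8 c a b d".split()]
gp_high = [int(c, 16) for c in "0 0 0 0 6 6 6 6 6 6 6 6 6 d".split()]
gp_low = [int(c, 16) for c in "1 1 2 3 1 1 1 1 2 3 3 4 5 1".split()]

# GPH+L-1 for every picture, in stream order.
derived = [h + l - 1 for h, l in zip(gp_high, gp_low)]

# Values shared by more than one picture (cannot identify a unique PT).
repeated = sorted(v for v, n in Counter(derived).items() if n > 1)

# Presentation times that no granulepos ever maps to (searches may fail).
missing = sorted(set(pt) - set(derived))

print("derived :", derived)
print("repeated:", repeated)
print("missing :", missing)
```

With the table's values, three derived values are shared between pictures and five presentation times never appear, which is the seeking problem described in the list.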


Some comments on choice of GP64 in bbc mapping:
  Consider axiom A2 (GPH+L == time), assume this is PT.
  - PT (not ST) makes sense for AV sync
  - PT (not ST) makes sense for locating pictures (seek)
    although a naive sync will find the wrong picture.
  - The stream is in the order required to satisfy decoding
    dependencies, ie PT jumps around.
    This violates axiom A1 (GPH+L(n) > GPH+L(n-1)).
    This violates axiom A1 (GP64(n) > GP64(n-1)).

  Consider axiom A2 (GPH+L == time), assume this is ST.
  - Complies with axiom A1 (GPH+L(n) > GPH+L(n-1)).
  - Complies with axiom A1 (GP64(n) > GP64(n-1)).
  - Is not useful for AV sync.
  - Is not useful for seeking (you will end up with the wrong picture).

  => No good reason for GPH+L == ST
  .'. choose GPH+L = PT.

Some interactions with skeleton:
  >  ... allowing to map a granule position [GPH+L] to time by calculating
  >  "granulepos [GPH+L] / granulerate"
    -- http://wiki.xiph.org/OggSkeleton

 '.' the only useful time to decoding is the PT
  => GPH+L = PT.

  Ie, you can seek based upon presentation time, however a binary
  search can hit a reordered picture and therefore choose the wrong
  picture at the end.  The error is +/- one GOP.

 ---
  > Restart after seek still requires new code; that part of skeleton
  > doesn't work.
    -- http://article.gmane.org/gmane.comp.multimedia.ogg.devel/1118

  Actually, ogg skeleton does not provide such information to any GOP
  based video codec.
    - It only has Preroll, which in a GOP based video
      codec is constantly varying.
    - Preroll only makes sense for video when using something such as
      Ponly-with-intra-slice-refresh, where there are no keyframes.

Some final remarks:
  - one-packet-per-page:
    It has been said that one-packet-per-page (ie, a page flush per
    packet) upsets remuxing due to axiom A3.  however, it is a requirement
    of the xiph mapping that a page flush occurs before a sync point.
  - To resolve the above contradiction, i assume that axiom A3 is invalid
  - The bbc mapping allows reconstruction of PT, ST and distance to
    syncpoint without any a priori information.
  - The bbc mapping does not require peeking into the packet payload to
    fill in the blanks
  - If GPH+L is to be useful, it is not possible to comply with
    axiom A1 (GPH+L(n) > GPH+L(n-1)).

Stop press:
  - I've realised that it is possible to rearrange GP64 in such a way that:
    + Complies with axiom A1 (GP64(n) > GP64(n-1)).
    + violates axiom A1 (GPH+L(n) > GPH+L(n-1)).
    However, i doubt that is any use, since i hope any sane demuxer searches
    based upon GPH+L.

Regards,
..david

Ralph Giles

2008-Aug-15 21:52 UTC

[ogg-dev] Fwd: Fwd: New Ogg Dirac mapping draft

On Wed, Aug 13, 2008 at 3:05 AM, ogg.k.ogg.k at googlemail.com wrote:
> And that's the canonical way AFAIK. Comparing times computed from
> the granpos you get from pages you get from a bsearch requires good
> knowledge of the codec, whereas comparing granpos can seek within
> any codec.

No. It's in general impossible to calculate the granulepos that
corresponds to a particular time in a theora stream; only the reverse
is possible. That's why David was talking about comparing High Word +
Low Word, which is the frame count in theora, and can be treated as
the seek time in different units.

>> OTOH, the RFC can be read to require the numeric values be increasing.
>
> I recall pointing out a discrepancy between Ogg docs and the RFC, and
> I think someone (either Silvia or Conrad, probably Silvia) fixed the RFC.
> (That was the reason I'd originally included the low counter bits for Kate).

Right, thanks for pointing that out. The correction is recorded at
http://wiki.xiph.org/RFC_3533_Errata

David's proposal still violates that without the "stop the press"
fancier encoding.

> I'll take this opportunity to mention my pet idea (don't think I ever
> mentioned it on ogg-dev), to start with a byte, then, if larger than 255,
> add two bytes, then for packets larger than 255+65536, add 4 bytes. This
> lacing is worse than the current one only for packets between 256 and 511
> bytes (admittedly a probably common case, but adding only one byte). Now
> that you mention flags, lacing type could be put in flags, so this
> worsening could be avoided too.

So this would be:

len = read_uint8()
if (len == 255):
 len += read_uint16()
 if (len == 255+65535):
   len += read_uint32()

And a len of 2^32 - 1 would indicate a continued packet?
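
A runnable version of that pseudocode, with a matching encoder, might look like the sketch below. It follows the escape values in the pseudocode above (255, then 65535); the little-endian byte order is an assumption, the function names are made up, and the open question about signalling a continued packet is deliberately left out.

```python
import struct


def encode_length(n):
    """Encode a packet length in 1, 3 or 7 bytes using escape values."""
    if n < 255:
        return bytes([n])
    n -= 255
    if n < 65535:
        return b"\xff" + struct.pack("<H", n)
    n -= 65535
    if n > 0xFFFFFFFF:
        raise ValueError("length too large for this encoding")
    return b"\xff\xff\xff" + struct.pack("<I", n)


def decode_length(buf):
    """Return (length, bytes consumed) for a length written by encode_length."""
    length, pos = buf[0], 1
    if length == 255:
        (ext16,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        length += ext16
        if ext16 == 65535:
            (ext32,) = struct.unpack_from("<I", buf, pos)
            pos += 4
            length += ext32
    return length, pos


def traditional_lacing_bytes(n):
    """Bytes the current lacing scheme spends on a packet of n bytes."""
    return n // 255 + 1


for n in (100, 300, 5000, 100000):
    print(n, len(encode_length(n)), traditional_lacing_bytes(n))
```

The loop prints the size of each encoding next to the size of the traditional lacing for the same packet, so the two can be compared for small and large packets.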

We can't change the lacing scheme without changing the stream
structure version. So I'd propose something like:

stream_structure_version = 1
header_type_flags:
 bit 0 : fresh/continued packet
 bit 1 : bos
 bit 2 : eos
new flags:
 bit 3 : CRC is only the header data
 bit 4-5: 0 is traditional lacing, 1 is 16 bit packet length, 2 is 32
          bit packet length, 3 is 64 bit packet length.

The packet length field would start on byte 26 where the segment
table length is in the traditional lacing. Using this kind of length
encoding implies one packet per page with no continuation.

If the lacing type and crc flags are zero, stream structure can be
zero. Muxers SHOULD NOT mix stream structure values within a logical
bitstream to avoid confusing legacy implementations.
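
As a sketch of how a parser might read the proposed flag layout: the bit assignments below are taken from the proposal above, which is only a suggestion in this thread and not part of any published Ogg spec; the function and key names are hypothetical.

```python
LACING_TYPES = {
    0: "traditional",
    1: "16-bit length",
    2: "32-bit length",
    3: "64-bit length",
}


def decode_header_type(flags):
    """Decode header_type_flags according to the proposed version 1 layout."""
    return {
        "continued": bool(flags & 0x01),        # bit 0
        "bos": bool(flags & 0x02),              # bit 1
        "eos": bool(flags & 0x04),              # bit 2
        "crc_header_only": bool(flags & 0x08),  # bit 3 (proposed)
        "lacing": LACING_TYPES[(flags >> 4) & 0x03],  # bits 4-5 (proposed)
    }


def minimum_version(flags):
    """Version 0 is enough when none of the proposed new flags are set."""
    return 1 if flags & 0x38 else 0


print(decode_header_type(0x14), minimum_version(0x14))
print(decode_header_type(0x04), minimum_version(0x04))
```

With the new bits all zero the result is the same as for a current page, which is what lets such a page keep stream structure version 0.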

 -r

Ralph Giles

2008-Aug-15 23:48 UTC

[ogg-dev] Fwd: New Ogg Dirac mapping draft

We've been discussing this on irc. Short summary, followed by some
responses.

I think we've verified now that my old proposal works fine for MPEG-2
style reordered streams. I believe it can be made to work with 'open
gop' streams by making the granulepos assignment more sophisticated
than I described. However, Dirac allows essentially random reference
structures, so it's possible to construct streams with overlapping
keyframe dependencies my proposal can't handle without breaking the
numerically non-decreasing granulepos rule.

That's an argument for David's granulepos mapping, especially since
the open gop stuff in my mapping is hacky. My thinking now is that the
non-decreasing numeric encoding (the stop-the-presses version) is
better. GPH+GPL=frame works for theora, but doesn't do any better with
naive seeking than 'find this numerical granulepos' and doesn't
simplify frame-accurate seeking if you relax the one-page-per-packet
rule, which I think we must.

On Wed, Aug 13, 2008 at 1:08 PM, David Flynn <davidf+nntp at woaf.net> wrote:
> De facto rules of ogg (I've not found these actually written down anywhere):

No, we've not really worked out these parts of the spec. Thanks for helping!

>  - Seeking is difficult:
>     -  Want to seek to frame N
>     -  GPH+L is non-unique (don't know if the right one has been found)
>     => Some values of GPH+L do not exist (searches may fail)
>     -  and GPH+L != N (ie, may find the wrong frame)

You're really wanting the granulepos field to be a frame timestamp.
Ogg just isn't designed to provide this information. The granulepos
isn't present in the stream for every packet. They're just supposed to
provide "seeking signposts" during the bisection search and, mostly as
a side effect, let an encoder give some hints to the muxer about
interleave order to reduce buffering.

Your proposal stuffs sequence headers and other aux data units in with
the following frame in a single Ogg packet, and then insists on
special one-packet-per-page encapsulation, so you can get this frame
timestamp behaviour. I think that's why it feels like such a hack to
me. New constraints, breaking abstraction layers, to do something that
the format doesn't intend.

I agree seeking is hard. To recap, Monty's original vision was that
granulepos would be monotonically increasing, and you could map your
seek time onto a granulepos and bisection search for that number. That
worked great for vorbis-only streams, but as soon as you have
multiplexed data, you have multiple granulepos schemes (or just
timebases) so it's easier to map any granulepos you find to time and
then compare in that space. With theora, we took advantage of this to
squeeze in a reference to the closest restart point (keyframe) without
revising the container code. So you can't calculate f:time->granulepos
at all in general now, only its inverse. And it turns out, because of
packed and continued packets, that you can't even find a single
restart point, you have to find "the last page with a timestamp that
maps to a time prior to the seek point, for each substream you care
about, and start decoding each substream there." And then you have to
search again for keyframe streams, back up by the preroll in lapped
streams, etc.

This is all about being able to do frame accurate seeking. Maybe
applications don't actually care about that, just getting in the
neighborhood is good enough. There are things a muxer can do (and a
mapping spec can recommend) as "best practices" to improve the performance
of such naive implementations. Like strategic page flushes. I think
we're all for that, but assuming those practices will always happen in
an application (like an editor) that needs frame-accurate access
without fail violates 'liberal in what you accept'.

 -r

Ralph Giles

2008-Aug-18 16:48 UTC

[ogg-dev] Fwd: New Ogg Dirac mapping draft

On Mon, Aug 18, 2008 at 2:25 AM, ogg.k.ogg.k at googlemail.com
<ogg.k.ogg.k at googlemail.com> wrote:
>> If the lacing type and crc flags are zero, stream structure can be
>> zero.
>
> I do not understand this.
My point here was that stream structure version 0 pages (the current
ogg spec) are a strict subset of stream structure version 1 pages in
my proposal. (It could be so in yours too.) The new flag values are
chosen so that if they are set to zero as they are in version 0 pages,
the header can be parsed just like a version 0 page, which hopefully
simplifies implementation.

 -r
