ogg.k.ogg.k@googlemail.com
2008-Feb-07 03:20 UTC
[ogg-dev] Seeking to granules in discontinuous streams
> No particular answers, but I can at least point out that the way > things were designed for CMML was to work with the existing Ogg > seeking algorithm. The idea is that a generic seeking routine can work > on any Ogg file, as long as it knows the granulepos->time mapping for > the logical bitstreams in the file. That's why all the > timestamp-related info is crammed into granulepos rather than being > placed in packet data, where it is no longer visible in the > container..I think I'm missing something here, but I'm not quite sure what and I'd really like to understand it. At a basic level, CMML granule-to-time function is (granule>>32)*scale. This loses the low bits, which encode the high bits of the previous packet's granulepos. Please stop me there if I'm wrong. If one wants to seek into a CMML stream, one starts with a time. One does not have a granulepos, but has to construct one from the time, so one just generates a "good guess" granulepos: (time/scale<<32). The low bits may as well be left to zero, since we have no knowledge what they might be. And this is the crux of the problem, as far as I understand how it works: you'll have to do a first bisection to find the page that maps to that granulepos, then, and only then, you get to know the low bits of the actual granulepos, and then you have to do another bisection to find it. This is *not* handled by Ogg, since the granulepos encoding is wholly out of its concern, so there's no way it can optimize out hte second biseciton. If you want to be clever and you know clips are short and can't overlap, you can get away with manual backward streaming to avoid the bisection, but that's still a second seek. Compare this with storing the granulepos of the earliest still active packet, which is what I'm doing currently. This means a granulepos doesn't contain info on the previous packet's granulepos, but since you have to do two seeks anyway, you don't mind. You'll just have to start decoding the first packet to find out the granulepos of the earlier packet, rather than just shift the found granulepos by 32. This means you have to know the packet format, so it does depend on the lib/codec. *BUT* this is also the case for CMML, where the seeking client must know about the fact that CMML encodes a granule like it does, with (x<<32)|y. Ogg can't do anything from this knowledge (*), it still has to be handled via the client code. The only difference I see between the two methods is that mine requires a lookup inside the packet, which you'll have anyway since Ogg had to seek to it beforehand. (*) or can it ? Is this the bit I'm missing ? I'm 99% sure it can't, but I'd be happy to be proved wrong as it would resolve the conendrum.> Assumptions about seeking in Ogg: > http://www.annodex.net/software/liboggz/html/group__seek__semantics.html > > which quotes from a mail from Monty in 2002 explaining the seek algorithm.Ah, quite interesting. That last solution Monty mentions would "fix" the double seek problems.> Ralph's recent talk at LCA, "Seeking is hard: Ogg design internals": > http://linux.conf.au/programme/fridayI'll have a look, seems very topical.> Hope that clarifies things a little bit. The answer may simply be that > what you want to do isn't possible within the existing Ogg container + > seek algorithm.I don't think so, though it'd require a bit of help from the client code, who'd know the semantics of the codec. Unless this is also a consequence of the bit I'm missing and not managing to get my head around.
On 2/7/08, ogg.k.ogg.k@googlemail.com <ogg.k.ogg.k@googlemail.com> wrote:> And this is the crux of the problem, as far as I understand how it works: > you'll have to do a first bisection to find the page that maps to that > granulepos, > then, and only then, you get to know the low bits of the actual granulepos, > and then you have to do another bisection to find it.The original seeking algorithm was that: granulepos maps to a non-decreasing timestamp (in a way defined by the codec) and pages are strictly ordered by that timestamp. So you seek by bisection searching for a time. It doesn't matter that you can't compute a granulepos from a time; all you need is the other direction. What you read from the Ogg stream is a granulepos from a page header out of some particular substream, which you *can* convert to a time, and that tells you which way to jump for your next bisection. Then since the pages are in order, you just start decoding from a bit before your seek target *in time* and off you go. What makes designing Ogg embeddings complicated in working out all the implications of this for a new style of codec.> This is *not* handled by Ogg, since the granulepos encoding is wholly out > of its concern, so there's no way it can optimize out hte second biseciton. > If you want to be clever and you know clips are short and can't overlap, > you can get away with manual backward streaming to avoid the bisection, > but that's still a second seek.What I was trying to address in my talk, and don't know if it came out clearly, was that, yes there's no way to avoid the second bisection. That's bad, but not as bad as you might think because *you already have to do other bisections anyway*. Video codecs have keyframes, so you usually have to start way before your seek target. And because of the problem of continued packets, it's not enough to find a single granulepos which maps to a time nearest your seek target, you actually have to treat all the streams independently. Monty wasn't aware of this when he described the seek algorithm in 2002 (which works fine if there are no continued packets or only one logical stream). Now, this is actually an argument for not having 30 different subtitle streams all muxed together. However, I think the best way to solve this is to have a profile require some kind of lower bound on the frequency of subtitle pages. And perhaps also their coincidence with video frames, &c. so demuxers have some idea how hard to try to find such streams.> Compare this with storing the granulepos of the earliest still active packet, > which is what I'm doing currently. This means a granulepos doesn't contain > info on the previous packet's granulepos, but since you have to do two seeks > anyway, you don't mind.The advantage of storing this in the granulepos field itself, like theora and CMML do, is that the seek code may already understand how to handle the back pointer. Right now everything assumes the mapping is from 'initialized decoder' + 'granulepos from page header' => timestamp, or in the case of theora and CMML => 'timestamp' + 'last keyframe timestamp.' Having to look in the accompanying page data is an additional complication. BTW, I don't think the 'discontinuous codec thing in ogg-multiplex.htm is a very useful concept. Ogg is conceptually time-continuous media, and one may trivially make any discontinuous codec continuous by declaring that no packets mean nothing is to be played during the interval in question. The real issue is the relative frequency of pages, and the burden a great asymmetry there places on muxers and decoders to find all the relevant data. Put another way, start-time labelling for substreams with low page frequency helps buffering after they're muxed, but it only goes so far if there's not also a lower bound on that frequency. Make any sense? -r
ogg.k.ogg.k@googlemail.com
2008-Feb-11 02:46 UTC
[ogg-dev] Seeking to granules in discontinuous streams
> The advantage of storing this in the granulepos field itself, like > theora and CMML do, is that the seek code may already understand how > to handle the back pointer. Right now everything assumes the mapping > is from 'initialized decoder' + 'granulepos from page header' => > timestamp, or in the case of theora and CMML => 'timestamp' + 'last > keyframe timestamp.' Having to look in the accompanying page data is > an additional complication.Since the format of that back pointer is specific to each codec, the seek code must already know how that codec does its backlink, for each codec it supports. This is not generic for all codecs. Therefore, picking an extra 64 bits granulepos at the start of the data is a fairly trivial complication, as the Ogg layer will have the data handy since it just seeked to it. That extra complication comes with the plus of not chopping off half the granule bit space for the backlink, leaving only a 4 billionth of the granule space available (if one assumes no reserved bits). Or is there something that makes it that much easier to deal with a backlink in the granulepos itself ? For reference, my data packets start with start granule, end granule, backlink granule, all 64 bits, so picking the granule is just a simple constant offset lookup into the data packet, eg, you replace: backlink = granulepos<<32; by backlink = read64le(op.packet+constant_offset); Hardly complex.> decoders to find all the relevant data. Put another way, start-time > labelling for substreams with low page frequency helps buffering after > they're muxed, but it only goes so far if there's not also a lower > bound on that frequency.If I understand what you want to get at, the frequency would be tied to the maximum delay between two packets ? FWIW, we only need to find a packet from the disc stream on the first bisection (to find the backlink). For the second bisection, it is enough to seek to a time before that backlink, as streaming in will eventually encounter the backlinked packet, and that's enough for our purposes. Therefore, the second bisection can be done timewise on the main continuous codec (eg, the accompanying theora video). If there is no accompanying theora video (or similar), then I don't think the problem happens, since you'll be swamped by kate packets anyway, since there'll be nothing else there, so you don't waste time looking for them. Still means you get the hit for the first bisection. Following on your point about a lower bound on the frequency, maybe one could add "keepalive" packets when no data packets have been issued for a long enough time, and such packets would carry no data but the backlink itself. Hmm, I like the idea. The packets would be very small (well, datawise, most of the data will be Ogg header stuff), and we wouldn't get that bad data duplication that was inherent in the "echo" method described in the Writ doc. Mind you, that kind of frequency info is something that might be usefully put in a new skeleton field, if it could help the seek code seek more efficiently. Not sure if and how it could though.