I'm interested to know of any player that can properly resync after
receiving a Theora+Vorbis stream that has been cut somewhere in the
middle, i.e. one that doesn't start at granule position 0.
The problem is that when streams are cut in this way, the start time and
the times of the first few packets in each stream are unknown. When working
at a packet level, the first few packets arrive with timestamps of -1.
It's only once the last packet in the first page arrives that you have any
idea of the relative offsets of the streams (if you are lucky), of when
either of them is supposed to commence playback, or of the times you
should have assigned to the packets that have already passed.
E.g. suppose a stream starts like this (after headers):
Page Vorbis (time equiv gran pos = 50 secs)
Packet 1
Packet 2
....
Packet 15
Page Theora (time equiv gran pos = 51 secs)
Packet 1
Packet 2
..
Packet 6
Let's assume we have 5 fps video and each Vorbis packet is 0.04 secs long.
This means that the Vorbis page starts at a time equiv of 49.4 secs
(50 - 15 x 0.04) and the Theora page at a time equiv of 49.8 secs
(51 - 6 x 0.2), but this is unknown to a player.
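
To make the arithmetic concrete, here is a minimal Python sketch of the back-dating a player would have to do once a page's granule position is known. It assumes the simplified model of this example (constant packet duration, and the page's time equivalent marking the end of its last packet); the function name backdate_page is made up for illustration and is not part of any Ogg library.

```python
from fractions import Fraction

def backdate_page(end_time, packet_count, packet_duration):
    """Work backwards from the time at the end of a page.

    Assumes every packet has the same duration and that end_time is the
    end of the last packet on the page (the simplification used above).
    Returns the page start time and the start time of each packet.
    """
    start = end_time - packet_count * packet_duration
    return start, [start + i * packet_duration for i in range(packet_count)]

# Vorbis page: time equiv 50 secs, 15 packets of 0.04 secs each.
vorbis_start, vorbis_times = backdate_page(Fraction(50), 15, Fraction(1, 25))
# Theora page: time equiv 51 secs, 6 frames at 5 fps (0.2 secs each).
theora_start, theora_times = backdate_page(Fraction(51), 6, Fraction(1, 5))

print(float(vorbis_start))                 # 49.4
print(float(theora_start))                 # 49.8
print(float(theora_start - vorbis_start))  # 0.4
```

Fractions are used so the 0.04 and 0.2 sec steps do not pick up floating-point error.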
OK, so my question is: how do you know when to start playing back each
stream, and at what times the first 14 Vorbis packets and the first 5 Theora
frames are supposed to be presented?
The options I see are:
a) Just start playing everything now and hope for the best:
i.e. we are 0.4 seconds out of sync.
b) Just drop those initial packets:
i.e. we lose data, and we still can't assign a presentation time to
anything, since we have no timebase to work from. Or you could resync by
making all the codecs talk to each other, which is pretty dodgy.
c) The VLC solution: just start playing now and hopefully we'll fix it up
later.
i.e. just start playing now, and hope we can resample everything by trial
and error later so it resyncs.
d) The big buffer solution.
i.e. buffer everything up, letting all the renderers run dry, and hope we
eventually get something that has a timestamp (which is not guaranteed), and
then jam all the collected data (i.e. 1+ seconds of raw audio and video) out
in one big chunk.
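
As a rough illustration of option (d), here is a Python sketch of a per-stream buffer that holds packets with a timestamp of -1 and back-fills their times once a timed packet arrives. It is only a sketch under assumptions: BackfillBuffer and push are hypothetical names, not any player's actual API, and the constant packet duration comes from the example above (real Vorbis packets are not all the same length).

```python
from fractions import Fraction

UNKNOWN = -1  # timestamp reported for packets whose time is not yet known

class BackfillBuffer:
    """Hold untimed packets until one arrives with a known end time."""

    def __init__(self, packet_duration):
        self.packet_duration = packet_duration  # assumed constant
        self.pending = []

    def push(self, payload, end_time=UNKNOWN):
        """Queue a packet. Returns [] while times are unknown, otherwise
        a list of (start_time, payload) for every queued packet."""
        self.pending.append(payload)
        if end_time == UNKNOWN:
            return []
        # end_time is the end of the newest packet, so count backwards.
        start = end_time - len(self.pending) * self.packet_duration
        timed = [(start + i * self.packet_duration, p)
                 for i, p in enumerate(self.pending)]
        self.pending = []
        return timed

# The Vorbis page from the example: 14 untimed packets, then packet 15
# arrives carrying the page's time equiv of 50 secs.
audio = BackfillBuffer(Fraction(1, 25))
for n in range(1, 15):
    assert audio.push(f"vorbis packet {n}") == []
released = audio.push("vorbis packet 15", Fraction(50))
print(float(released[0][0]))  # 49.4
```

Note that nothing comes out of the buffer until the timed packet arrives, which is exactly the "renderers run dry" cost described in (d).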
All of these options are poor solutions. The other issue is, how are we
supposed to tell the difference between a file of this kind that really does
intend for there to be a wait time at the start (i.e. for resyncing, where
the video may really not be intended to start for some amount of time to
compensate for the different granularities of the audio and video), and one
which looks identical but expects us to treat the start as really being the
start regardless of the granule pos?
I'm starting to think that there is nothing that actually can properly
resync, and that all of them are just "close enough" to be acceptable.
Cheers,
Zen.