Hi,

I've been thinking about how a Theora encoder could be integrated into mencoder or transcode, and I'm not sure whether I understand the A/V sync strategy of Theora/Vorbis correctly.

When transcoding from some video format (e.g. MPEG2 or DivX), at least some images of the video stream will have time stamps, as will the fragments of the audio stream. Or at least time stamps can be generated somehow (AVI is said to be somewhat weird...). The player would compare those time stamps to get the A-V delay and adjust playback speed accordingly so that audio and video do not drift too far apart.

What I do not understand is how I could preserve timestamps when encoding to Theora/Vorbis without dropping frames or resampling audio.

From what I know from Mplayer's Ogg demuxer, Theora's granulepos can only represent multiples of the frame time. Vorbis's granulepos is specified in units of audio samples.

Is it valid for Vorbis packets to drift the granulepos to compensate for A/V sync? That would mean that granulepos does not represent the number of actually played samples any more. Seems like an ugly hack.

Any other ideas?

David
--
GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205 D016 7DEF 5323 C174 7D40

--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to 'theora-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed. Unsubscribe messages sent to the list will be ignored/filtered.
On Thursday, June 12, 2003, at 05:30 pm, David Kuehling wrote:

> Is it valid for Vorbis packets to drift the granulepos to compensate
> for A/V sync? That would mean that granulepos does not represent the
> number of actually played samples any more. Seems like an ugly hack.

Yes, it would mean that. It's counter to spec too, so I don't recommend it.

I guess I'm perplexed from the other end. The vorbis granulepos is a number of samples, so divide by the sample rate to get a timestamp. The theora granulepos can be converted to the frame index, which you divide by the framerate to get a timestamp. Those timestamps should align at playback. It's up to the player to work out how to make that happen.

-r
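The two conversions described above can be sketched roughly as follows. This is only an illustration, not libtheora or libvorbis code: the function names are made up, and it assumes the Theora granulepos stores a keyframe number in the high bits and a frame offset in the low `granule_shift` bits (the exact off-by-one convention depends on the bitstream version).

```python
from fractions import Fraction


def vorbis_time(granulepos: int, sample_rate: int) -> Fraction:
    """Seconds represented by a Vorbis granulepos (a count of samples)."""
    return Fraction(granulepos, sample_rate)


def theora_frame_index(granulepos: int, granule_shift: int) -> int:
    """Assumed layout: keyframe number in the high bits, offset in the low bits."""
    keyframe = granulepos >> granule_shift
    offset = granulepos & ((1 << granule_shift) - 1)
    return keyframe + offset


def theora_time(granulepos: int, granule_shift: int,
                fps_num: int, fps_den: int) -> Fraction:
    """Seconds = frame index / frame rate = index * fps_den / fps_num."""
    index = theora_frame_index(granulepos, granule_shift)
    return Fraction(index * fps_den, fps_num)


if __name__ == "__main__":
    # Hypothetical stream: 30000/1001 fps video, 44100 Hz audio.
    gp_video = (60 << 6) | 30  # keyframe 60 plus 30 frames -> frame 90
    print(float(theora_time(gp_video, 6, 30000, 1001)))  # roughly 3.003 s
    print(float(vorbis_time(132432, 44100)))             # roughly 3.003 s
```

A player would compare the two results to estimate the current A/V delay. Using exact fractions here avoids rounding error building up over a long stream.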
This is the same problem you sometimes get when trying to convert Quicktime to AVI -- QT has timestamps in millisec, AVI just has a global frames-per-sec field and a frame number for each frame. Ogg is more like AVI in that respect. For the record, MPEG2 only has timestamps that are in effect frame numbers.

Ralph is correct that if the original file is properly constructed, the audio and video should relate in a mathematically perfect fashion. The problem comes up in this context: many cheaper multimedia systems do not have perfect, frame-accurate A/V sync on input (for instance, the video is usually locked to the incoming signal, but the audio is resampled by the computer's internal audio board using a different clock). Material captured from such systems will not have a 'standard' framerate, and the rate can even drift around within the file.

AVI files (and, by extension, Ogg streams) can only deal with this in a global way, by slightly changing the frames-per-sec number (that's why we have so much precision in the numerator/denominator). This usually works, at least for short files -- try to watch a 2-hour AVI file captured on a home system; you'll often have up to 10 frames out of sync in the middle of the movie. To see how this works, look at your favorite DivX AVI pRon clip using riffwalk, and check out the frames per sec -- it's probably something like 23.92364.

Quicktime is more sophisticated; it can actually stamp each frame with the exact time it was captured. Therefore, there may be no way to map a Quicktime movie onto AVI (or Ogg/Theora), maintaining perfect sync, without duplicating or dropping frames. These issues frequently plague anyone who has tried to convert Quicktime files to another format.

We debated going with a timestamp approach, but the framecount won out (for now).
___
Dan Miller (++,)
Founder, On2 Technologies

> -----Original Message-----
> From: Ralph Giles [mailto:giles@xiph.org]
> Sent: Thursday, June 12, 2003 11:53 AM
> To: theora-dev@xiph.org
> Subject: Re: [theora-dev] A/V sync in Theora
>
> On Thursday, June 12, 2003, at 05:30 pm, David Kuehling wrote:
>
> > Is it valid for Vorbis packets to drift the granulepos to compensate
> > for A/V sync? That would mean that granulepos does not represent the
> > number of actually played samples any more. Seems like an ugly hack.
>
> Yes, it would mean that. It's counter to spec too, so I don't recommend it.
>
> I guess I'm perplexed from the other end. The vorbis granulepos is a
> number of samples, so divide by the sample rate to get a timestamp.
> The theora granulepos can be converted to the frame index, which you
> divide by the framerate to get a timestamp. Those timestamps should
> align at playback. It's up to the player to work out how to make that
> happen.
>
> -r
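The global correction Dan describes (nudging the frames-per-sec numerator/denominator so that the video lasts exactly as long as the audio) might look something like the sketch below. The function names and the sample capture figures are invented for illustration, and it assumes the audio sample count at the nominal sample rate is taken as the reference clock. It only handles a constant rate mismatch, not a rate that wanders within the file.

```python
from fractions import Fraction


def fitted_frame_rate(frame_count: int, audio_samples: int, sample_rate: int,
                      max_den: int = 1_000_000) -> Fraction:
    """Frame rate that makes frame_count frames span the audio duration."""
    duration = Fraction(audio_samples, sample_rate)  # seconds of audio
    return (Fraction(frame_count) / duration).limit_denominator(max_den)


def residual_drift(frame_count: int, fps: Fraction,
                   audio_samples: int, sample_rate: int) -> Fraction:
    """Video duration minus audio duration, in seconds, at the end of the file."""
    return Fraction(frame_count) / fps - Fraction(audio_samples, sample_rate)


if __name__ == "__main__":
    # Hypothetical 2-hour capture: nominally 24 fps and 48000 Hz, but the
    # sound card delivered 20000 more samples than 2 hours' worth.
    frames, samples, rate = 172_800, 345_620_000, 48_000
    fps = fitted_frame_rate(frames, samples, rate)
    print(fps.numerator, fps.denominator, float(fps))
    print(float(residual_drift(frames, Fraction(24), samples, rate)))  # nominal rate
    print(float(residual_drift(frames, fps, samples, rate)))           # fitted rate
```

With the nominal 24 fps the video ends 5/12 s before the audio, which is 10 frames. With the fitted rate the end points coincide.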
As you say, Mplayer and other AVI apps slightly adjust framerate *after* the capture is complete. If this is done correctly, it will work well in most cases. The exception is if you are capturing off an old analog videotape, where tape stretch can easily make the sync drift up and down. DVD or broadcast TV capture, or mini-DV type stuff will work fine (because the playback machine has a stable digital clock). Theora will work just as well as AVI.

There are no arbitrary timestamps in AVI or MPEG; there are frame numbers and a global framerate (MPEG doesn't even have the option of a fractional framerate other than drop frame = 23.997).

-----Original Message-----
From: David Kuehling [mailto:dvdkhlng@gmx.de]
Sent: Thu 6/12/2003 3:29 PM
To: theora-dev@xiph.org
Cc: theora-dev@xiph.org
Subject: Re: [theora-dev] A/V sync in Theora

>>>>> "Ralph" == Ralph Giles <giles@xiph.org> writes:

> I'm surprised you describe this as a problem with DVD though. I didn't
> think players were smart enough to adjust playback to match an
> external sync.

What do you mean by "external sync"? The way I understand mplayer, it continuously calculates the A/V delay using either heuristics or any kind of timing information from the stream. It then slightly adjusts playback speed to keep the delay down.

I'm not sure about what Dan said about MPEG2. Exact time stamps might not be required to get A/V sync right. I'm not concerned about absolute correctness in the time domain, just relative A/V delay. Maybe audio fragments in MPEG2 can somehow be related to the frames they belong to, either by position in the bitstream or by some kind of additional information?

Imagine how many people would be annoyed if the first beta of Theora with the first simple transcoders produced desynced output. How long will it take until someone implements a transcoder which is smart enough to minimize A/V desync by scanning the source material and adjusting numerator/denominator correctly?
That technology is also IMO too restricted. How am I supposed to record television flawlessly with Theora? My sound card usually reports a sample rate of 44101 when set to 44100. But mencoder will still assume that the sample rate is 44100, introducing an error that will amount to 163ms after 2 hours. I start to notice that something is wrong after about 50ms.

I have a Sound Blaster PCI 128, which I would consider quite high quality. What about even cheaper sound cards? Remember that this is the error reported by the sound card, due to limits in the timing chip, not taking into account the actual inaccuracy of the timer itself.

Note also that Linux won't report fractional sample rates. So the average error after 2 hours of recording with a 22050 Hz sample rate would also be 163ms, even if the soundcard's timer is perfect.

You cannot tell people that they can use Theora for one thing but not another.

Sorry for not having more constructive comments at the moment. You won't be willing to throw away the flawed Ogg concept and stick to Matroska instead? ;-)

David
--
GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205 D016 7DEF 5323 C174 7D40
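The drift figures quoted above can be checked with a few lines of arithmetic. This is just a sketch with made-up function names. It assumes the only error is a constant mismatch between the rate the card actually samples at and the rate the encoder assumes, and the 22050.5 Hz case is an assumed half-hertz rounding error rather than a measured value.

```python
def drift_ms(assumed_hz: float, actual_hz: float, seconds: float) -> float:
    """A/V error in ms when audio captured at actual_hz is treated as assumed_hz."""
    samples = actual_hz * seconds             # samples really captured
    return (samples / assumed_hz - seconds) * 1000.0


def seconds_until(threshold_ms: float, assumed_hz: float, actual_hz: float) -> float:
    """Recording time after which the error reaches threshold_ms."""
    return (threshold_ms / 1000.0) * assumed_hz / abs(actual_hz - assumed_hz)


if __name__ == "__main__":
    two_hours = 2 * 3600
    print(round(drift_ms(44100, 44101, two_hours)))      # 163
    print(round(drift_ms(22050, 22050.5, two_hours)))    # 163
    print(seconds_until(50, 44100, 44101) / 60)          # about 36.75 minutes
```

So with a 1 Hz mismatch at 44100 Hz, the 50ms point mentioned above is reached a little over half an hour into the recording.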