Silvia Pfeiffer
2007-Apr-29 03:19 UTC
[ogg-dev] Fwd: [whatwg] Cue points in media elements
Hi,

This is an email from the HTML5 standardisation group which I thought would be very interesting to Annodex and Xiph people. I do not know enough about the cue points he is talking about to say how much should be done in HTML/JavaScript and how much directly in the media itself. But I found the need for cue points interesting. Maybe someone here has an answer for Brian on how it could work, or on what should be improved in the HTML5 spec to make it work...

Cheers,
Silvia.

---------- Forwarded message ----------
From: Brian Campbell <Brian.P.Campbell@dartmouth.edu>
Date: Apr 29, 2007 5:14 PM
Subject: [whatwg] Cue points in media elements
To: whatwg@whatwg.org

I'm a developer of a custom engine for interactive multimedia, and I've recently noticed the work the WHATWG has been doing on adding <video> and <audio> elements to HTML. I'm very glad to see these being proposed for addition to HTML, because if they (and several other features) are done right, we may have a chance to stop using a custom engine and use an off-the-shelf HTML engine instead, putting our development focus on our authoring tools. My hope is that eventually, if these features get enough penetration, we can put our content up on the web directly, rather than having to distribute the runtime software with it.

I've taken a look at the current specification for media elements, and on the whole it looks like it would meet our needs. We currently use VP3 for video and a combination of MP3 and Vorbis for audio, so having Ogg Theora (which is based on VP3) and Ogg Vorbis as a baseline would be completely fine with us, and much preferable to the patent issues and licensing fees we would have to deal with if we used MPEG-4.

For the sort of content we produce, cue points are incredibly important. Most of our content consists of a video or voiceover playing while bullet points appear, animations play, and graphics are revealed, all in sync with the video. We have a very simple system for cue points that is extremely easy for content authors to write and is robust against paused media, media that is skipped to the end, and so on. We simply have a blocking call, WAIT, that waits until a specific point in, or the end of, a specified media element. For instance, in our language, you might see something like this:

  (movie "Foo.mov" :name 'movie)
  (wait @movie (tc 2 3))
  (show @bullet-1)
  (wait @movie)
  (show @bullet-2)

If the user skips to the end of the media clip, all WAITs on that clip simply return instantly. If they skip forward in the clip without ending it, all WAITs before that point return instantly. If the user pauses the clip, all WAITs on it block until it is playing again.

This is a nice system, but I can't see how even a system as simple as this could be implemented on top of the current specification of cue points. The problem is that the callbacks execute "when the current playback position of a media element reaches" the cue point, and it is unclear to me what "reaching" a particular time means. If video playback freezes for a second and so misses a cue point, is that point considered to have been "reached"? Is there any way to guarantee that a cue point will be executed as long as playback has passed it?
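For concreteness, here is a rough sketch of the kind of script-side bookkeeping this seems to call for, written against the draft's currentTime attribute and its "timeupdate" and "ended" events; the CueList helper, its names, and the element id are invented for illustration, not part of any spec:

  // Sketch: fire every cue whose time playback has passed, even if no
  // "timeupdate" event lands exactly on it. Only currentTime,
  // "timeupdate", and "ended" are taken from the draft; the rest is
  // invented here.
  function CueList(video) {
    var cues = [];  // each entry: { time: seconds, fired: flag, callback: function }

    function check() {
      for (var i = 0; i < cues.length; i++) {
        var cue = cues[i];
        // Fire once playback is at or past the cue time; treat a clip
        // that has ended as having passed every cue.
        if (!cue.fired && (video.ended || video.currentTime >= cue.time)) {
          cue.fired = true;
          cue.callback();
        }
      }
    }

    video.addEventListener("timeupdate", check, false);
    video.addEventListener("ended", check, false);

    return {
      add: function (time, callback) {
        cues.push({ time: time, fired: false, callback: callback });
        check();  // in case playback is already past this point
      }
    };
  }

  // Usage, loosely mirroring the WAIT example above:
  var cues = CueList(document.getElementById("movie"));
  cues.add(123, function () { /* reveal the first bullet */ });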
With a lot of bookkeeping along these lines, using the "timeupdate" event together with the cue points, you may be able to keep track of the current time in the movie well enough to handle the user skipping forward, pausing, and the video stalling and restarting after running out of buffer. As far as I can tell, though, this still doesn't cover cases like the thread displaying the video pausing for whatever reason and skipping forward when it resumes: cue points may be missed, and that situation isn't specified to send a "timeupdate" event.

Basically, what is necessary is a way to specify that a cue point should always fire as long as playback has passed a certain time, not just when playback "reaches" that time. This would save us a lot of bookkeeping to make sure that cue points haven't been missed, and would make everything simpler and less fragile.

We're also greatly interested in making our content accessible, to meet Section 508 requirements. For now, we are focusing on captioning for the deaf. We have voiceovers on some screens with no associated video, video that appears in various places on the screen, and occasional sound effects. Because there is no consistent video location, and no frame at all for voiceovers to appear in, we don't display captions directly over the video. Instead we send events to the current screen, which is responsible for catching them and displaying the captions in a location appropriate for that screen, usually a standard one.

The current spec provides only controls to turn closed captions on or off. What would be much better is a way for the video element to send caption events, which include the text of the current caption and can be used to display captions in a way that fits the design of the content.

I hope these comments make sense; let me know if you have any questions or suggestions.

Thanks,
Brian Campbell
Interactive Media Lab, Dartmouth College
http://iml.dartmouth.edu
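To make the caption-event idea above concrete, a hypothetical sketch of what such script might look like; the "captionchange" event, its text field, and the element ids are invented for illustration, and nothing like them exists in the current draft:

  // Hypothetical only: the draft offers just a control to toggle
  // captions. "captionchange" and event.text are invented to show the
  // kind of API the email asks for.
  var video = document.getElementById("movie");
  video.addEventListener("captionchange", function (event) {
    // The page, not the video element, decides where the caption goes.
    document.getElementById("caption-area").textContent = event.text;
  }, false);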