Ok, here's what I've been thinking of in terms of the scrolling lyrics format for Ogg. It's an xml stream, and it matches the head-body-[body-]-tail structure I suggested for packetization.

I'm happy with the lyrics aspect, and it maps cleanly onto the existing formats. I also think it will handle the talk transcript, subtitle, and karaoke requirements well. I call it a 'transcript' stream, as the most general of these terms.

We treat synchronized and static transcripts on an equal footing. The timestamps themselves are marked by optional attributes on just about any xml tag, so they can easily be added to a static transcript, or ignored for static display of a synchronized one. The difference will probably be flagged in the stream description metadata, though. I think it wise to partition the timestamp attributes into a separate xml namespace, to facilitate their use in other xml streams.

What I'm not happy with is the extension of this to scripts and screenplays. I think it would be really cool if we could move transparently from a screenplay or script to subtitles, just by changing a stylesheet, but their structure is more complicated and it will mean a number of additional tags. Not least is the problem of the various (yet allegedly rigidly specified) formatting and structure conventions for screenplays, stage plays, and radio scripts.

At this point, you might as well look at the attached examples. Below is a summary of the tags, but if you're familiar with xml, the examples should be enough.

After the xml declaration comes a <transcript> tag containing the entire document. Stream type identification would happen here if it isn't obvious from the declaration line.

Next is an <info> element with basic metadata, similar in spirit to the vorbis comment header. Probably with the same tags, too, plus things like "transcriber" and "translator". The close of the <info> element ends the "head" part of the document.
An ogg packet boundary will probably occur here, defining the tree depth for any further splits.

The coarse structure of the transcript is given by a nestable <section> element. We use the general 'class' attribute to distinguish the various levels and types of grouping: verse from chorus, scene from act. Typically, class is used only as a formatting hook for stylesheets, but this imposes some semantic content on the value. A more traditional SGML approach would be to have different elements for each level of grouping: <act><scene>...</scene>...</act>, <chorus> and <bridge>. I'm essentially trading that for simplicity here, on the assumption that most files will be very flat anyway, and dumb parsers will mostly ignore the hierarchical structure.

The <section> tags are for pretty-printing and machinability. The former is entirely covered by the class attribute, but I'm not sure it's appropriate for the latter.

The first thing inside the <section> element is an optional header, with things like who's speaking the lines, or the location of the shot if it's a scene. This is a bit ugly, in that we can't say what goes in the scene heading of a screenplay versus the chorus of a song. Hence the traditional approach.

Inside the innermost section comes a series of <line> elements, each marking a line of the song, or a line of dialog. This is the most specific of the structural elements and cannot be nested. At the same level we'll probably want things like <action> to describe the blocking and maybe something like <sfx>, though that could be handled as another actor.

Inside the line-level tags, we have inline markup: <emph> for emphasis, something like <character> (can you think of a better name?) for marking names and props, which are often specially formatted for clarity, a <peren> for parenthetical direction. That sort of thing. All of these could have a class attribute, of course.
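To make the structure so far concrete, here is a rough Python sketch of the "dumb parser" case: it pulls out every <line> while ignoring the <section> nesting, and separately lists the sections by depth and class. The sample document is hypothetical and is not one of the attached examples; the element names inside <info> and the helper names flat_lines and outline are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample built from the tag names described above.
# The children of <info> are assumed, not taken from the attached examples.
SAMPLE = """<?xml version="1.0"?>
<transcript>
  <info>
    <title>Example Song</title>
    <transcriber>nobody in particular</transcriber>
  </info>
  <section class="verse">
    <line>First line of the <emph>verse</emph></line>
    <line>Second line of the verse</line>
  </section>
  <section class="chorus">
    <line>Only line of the chorus</line>
  </section>
</transcript>
"""


def flat_lines(xml_text):
    """Return the text of every <line>, ignoring sections and inline markup."""
    root = ET.fromstring(xml_text)
    return ["".join(line.itertext()).strip() for line in root.iter("line")]


def outline(xml_text):
    """Return a (depth, class) pair for each <section>, in document order."""
    root = ET.fromstring(xml_text)
    found = []

    def walk(elem, depth):
        for child in elem:
            if child.tag == "section":
                found.append((depth, child.get("class")))
                walk(child, depth + 1)

    walk(root, 0)
    return found


if __name__ == "__main__":
    for text in flat_lines(SAMPLE):
        print(text)
    print(outline(SAMPLE))
```

A player that only wants scrolling text would need nothing beyond flat_lines; the outline is the part a stylesheet-driven formatter would care about.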
We also have a <span> tag that exists pretty much exclusively for adding attributes to bits of text. This can be used for additional formatting keyed to a stylesheet, or to put syllable-by-syllable timestamps on karaoke tracks.

We also generally allow an 'id' attribute on any element for cross-referencing and unique identification. Allowing XLink/XPointer would also be a reasonable idea, though I wouldn't require that the parser support following the links.

Timestamps:

I suggest three timestamp attributes: a start time, and optionally either a stop time or a duration. If there's only a start time, the player can just display the element until the next stamped element comes up.

The timestamps can be nested wherever their associated element tags can be. In these cases the higher-level stamps should encompass the lower ones, and the lower-level ones take precedence in display as more specific. Exactly how this is handled is up to the player, but for example, a karaoke application might use <line> level timestamps to display a line at a time, but highlight each syllable as it goes by according to the span tags.

For the value format I'd like to allow just a few options.

Normal time relative to the start of the track, with precision given by extension to decimal seconds or smpte subframes.
    "2:32" or "0:34:63:14"

A raw integer, defaulting to the elapsed time in milliseconds, possibly in units of an arbitrary 'timebase' specified as an attribute of the opening transcript tag.

Finally, I want to include absolute ISO timestamps for marking live events.
    "1999-10-15T17:54:19.78Z" or "2000-08-23T13:19-0700"

That's about it. Yes, I'm invoking stylesheets and lots of other complicated machinery. I want to have that power for future flexibility and high-quality output; that's the whole point of xml. But I maintain we can still write a small dedicated parser that just throws up the <line> tags as they come. That's a pretty broad spectrum.

Comments welcome!
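For reference, the three value formats described under Timestamps could be told apart roughly as in the Python sketch below. It is only an illustration under stated assumptions: the function name parse_stamp is made up, 'timebase' is read as ticks per second (1000 giving the millisecond default), a four-field clock value is treated as hours:minutes:seconds:frames at an assumed 30 frames per second, and subframes are not handled. It does not range-check the clock fields.

```python
import re
from datetime import datetime, timedelta, timezone

# Absolute ISO-style stamp: date, "T", time, then "Z" or a numeric offset.
_ISO = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})"
    r"(?::(\d{2})(\.\d+)?)?"
    r"(Z|[+-]\d{1,2}:?\d{2})$"
)


def parse_stamp(value, timebase=1000, fps=30):
    """Return ("relative", seconds) or ("absolute", aware datetime).

    timebase (ticks per second) and fps are assumptions of this sketch.
    """
    value = value.strip()

    # Raw integer: elapsed ticks, milliseconds unless a timebase is given.
    if value.isdigit():
        return ("relative", int(value) / timebase)

    # Absolute timestamp for live events.
    match = _ISO.match(value)
    if match:
        year, month, day, hour, minute = (int(g) for g in match.groups()[:5])
        second = int(match.group(6) or 0)
        micro = min(round(float(match.group(7) or 0) * 1_000_000), 999_999)
        zone = match.group(8)
        if zone == "Z":
            tz = timezone.utc
        else:
            digits = zone[1:].replace(":", "")
            offset = timedelta(hours=int(digits[:-2]), minutes=int(digits[-2:]))
            tz = timezone(-offset if zone[0] == "-" else offset)
        when = datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)
        return ("absolute", when)

    # Clock time relative to the start of the track.
    fields = value.split(":")
    if len(fields) == 2:        # M:SS[.fraction]
        return ("relative", int(fields[0]) * 60 + float(fields[1]))
    if len(fields) == 3:        # H:MM:SS[.fraction]
        return ("relative",
                int(fields[0]) * 3600 + int(fields[1]) * 60 + float(fields[2]))
    if len(fields) == 4:        # H:MM:SS:FF, frames at the assumed rate
        hours, minutes, seconds, frames = (int(f) for f in fields)
        return ("relative", hours * 3600 + minutes * 60 + seconds + frames / fps)

    raise ValueError("unrecognized timestamp: %r" % value)


if __name__ == "__main__":
    for sample in ("2:32", "90000", "1999-10-15T17:54:19.78Z"):
        print(sample, "->", parse_stamp(sample))
```

Returning the kind alongside the value leaves it to the player to decide how to line up absolute stamps against the track, which is not settled above.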
-ralph

--
giles@ashlu.bc.ca

-------------- next part --------------
A non-text attachment was scrubbed...
Name: example-2.xml
Type: application/octet-stream
Size: 2925 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/vorbis/attachments/20000828/75a1ffa7/example-2-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: example-3.xml
Type: application/octet-stream
Size: 3910 bytes
Desc: not available
Url : http://lists.xiph.org/pipermail/vorbis/attachments/20000828/75a1ffa7/example-3-0001.obj