On Tue, Jan 07, 2003 at 01:41:11AM +1100, Silvia.Pfeiffer@csiro.au wrote:
> 2) an I-D describing the "Ogg file format" written by me:
> http://www.ietf.org/internet-drafts/draft-pfeiffer-ogg-fileformat-00.txt

There's at least one error in this document:

  Ogg Theora encapsulates a Vorbis-encoded audio bitstream and a
  Tarkin-encoded video bitstream in a single physical Ogg bitstream.

That's incorrect. Tarkin is a completely different codec; Theora is based on VP3. Also, I do not believe that Theora encapsulates Vorbis; rather, the two exist as concurrent, synced chains within Ogg. Monty would know more about this, as I have done very little studying of the Theora format.

Also, if Monty would care to clarify:

  Ogg Vorbis puts a further constraint onto Ogg by specifying that
  concurrent multiplexing is not allowed in Ogg Vorbis files.

Is this true? I've been wondering about this. I'm setting up an Ogg "editor" for the Freeform codebase (detailed at http://savannah.gnu.org/projects/freeform ), which will allow encoded Ogg files to be published and then allow the publisher, or another publisher, to crop, chain together, etc. different pieces to form a new work.

A good example of this is the Indymedia Newsreal project. People from around the world submit segments of up to 5 minutes, which are combined into a monthly news program that is distributed both online and on physical media (VHS, VCD, DVD, etc.) for viewing. Currently, these segments are submitted on DV tapes, and someone in a centralised location goes through the tapes and combines them. The tapes are then archived digitally and, at a lower bitrate, made available for download.

An alternative would be to have the publishers encode high-bitrate Theora (Q6+) and upload it as Ogg. Then someone, somewhere, would go through these segments and, using the codebase mentioned above, merge the show's intro, the segments, any media that goes between the segments, and the closing into one larger Ogg file.
This file could then be used to generate the physical media, be archived itself, and generate the lower-bitrate (Q1) stream version (peeling???).

One issue that comes up is the NTSC/PAL deal. Some media will be submitted as NTSC, some as PAL, and there are actually two physical versions: NTSC and PAL. If two Theora streams of different bitrates are chained together, what will happen when you run them through a decoder?

Another limitation here, for Vorbis streams as well, is the lack of more elegant handling of stream switching. Similar to QuickTime's "Effects", which I've been reading up on, I think it'd be worthwhile to have an "effects" codec which handles things like audio mixing (multiple concurrent Vorbis, Speex, and FLAC streams), video fading (overlapping Theora streams), scaling/cropping/spacing Theora streams like layers over each other (i.e., one frame can shrink or be pushed off the screen), and other basic video-editing effects. Applying these to pre-encoded media would allow cooperative projects to happen without the generation loss that would result from decoding/editing/re-encoding. It would also allow different elements to be kept separate, such as a movie with the soundtrack and vocals separated so that the user could choose between English or Spanish while only needing a simple Speex stream for each language.

Even when we're just talking Vorbis, it'd be nice to be able to have overlapping streams which an effects codec causes to crossfade, or, with Speex, to have voice over music at the tail end (both at full volume), then fade out the Vorbis stream, continue with the Speex stream, then chain into a new Speex stream at the same time as starting a new Vorbis stream which is faded in. I'm imagining a cool mixing-board/effects Icecast2 stream source for handling all this on the fly with pre-encoded media.

I'm saying all this to ask: is this possible to do within the current Ogg Vorbis specifications, if we added a codec to handle it?
If a player (such as XMMS) runs into a codec it doesn't understand, will it play the rest? If it finds two Vorbis layers that overlap, will it die? It would look something like this:

  Track 1                     Crossfade    Track 2
  [AAAAAAAAAAAAAAAAAAAAAAAAAA][BBBBBBBBBBB]
                              [CCCCCCCCCCC][DDDDDDDDDDDDDDDDDDDDDDDDDDDD]
                              [ZZZZZZZZZZZ]

  A = Track 1 with n sec cut from the tail
  B = n sec tail of Track 1
  C = n sec lead of Track 2
  D = the rest of Track 2
  Z = effects codec

I can understand if XMMS would silently ignore Z, but what is going to happen when it hits the end of A, chained together with B and C, both being Vorbis streams? Will it silently ignore C or crash? I guess it wouldn't be so bad if old players simply chopped off the "crossfade" beginning of songs, since it would typically be less than two seconds, but crashing or dropping the live stream would be quite bad.

I assume that only Monty knows enough about this to answer definitively... perhaps some of these questions can be incorporated into the Theora specs?
Silvia.Pfeiffer@csiro.au
2004-Aug-06 15:01 UTC
[speex-dev] Update on Ogg-based IETF standard documents (MIME-types, file formats)
Hi everybody,

this is an update on the Ogg IETF standard documents under development and their status. All of these documents are still being discussed and have not yet been accepted as standards.

The following Internet-Drafts (I-Ds) have been prepared for standardisation and submitted to the IETF:

1) an I-D requesting to register "application/ogg" as a MIME type, written by Linus Walleij:
   http://www.ietf.org/internet-drafts/draft-walleij-ogg-mediatype-07.txt

2) an I-D describing the "Ogg file format", written by me:
   http://www.ietf.org/internet-drafts/draft-pfeiffer-ogg-fileformat-00.txt

The following three I-Ds exist but are not active I-Ds with the IETF:

3) an I-D describing how to encode Vorbis over RTP, from Phil Kerr: see the attached file draft-kerr-vorbis-rtp-00.txt

4) an expired I-D requesting to register "audio/vorbis" as a MIME type for Vorbis audio, regardless of whether it is inside an Ogg or RTP stream, written by Linus Walleij (there is no request for "audio/vorbis" in the Vorbis over RTP I-D):
   http://www.watersprings.org/pub/id/draft-walleij-vorbis-mediatype-01.txt

5) an I-D requesting to register "audio/speex" as a MIME type for Speex over RTP, written by Greg Herlein, Jean-Marc Valin and Simon Morlat:
   http://www.speex.org/drafts/draft-herlein-speex-rtp-profile-05.txt

I'd like to encourage everyone in the Ogg community to support the I-D authors in progressing these documents through the IETF by reading and commenting on them. The time to contribute is now, as these documents can only progress through the IETF with the consensus of the developer community.

Regards,
Silvia.


Network Working Group                                         Phil Kerr
Internet-Draft                                 The Ogg Vorbis Community
December 20, 2002                                           / OpenDrama
Expires: June 20, 2003

                 RTP Payload Format for Vorbis Encoded Audio
                      <draft-kerr-avt-vorbis-rtp-00.txt>

Status of this Memo

   This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.
   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [1].

Copyright Notice

   Copyright (C) The Internet Society (2002). All Rights Reserved.

Abstract

   This document describes an RTP payload format for transporting Vorbis encoded audio.

Table of Contents

   1. Introduction
   2. Background
   3. Payload Format
   3.1 RTP Header
   3.2 Payload Header
   3.3 Payload Data
   3.4 Example RTP Packet
   4. Frame Packetizing
   4.1 Example Fragmented Vorbis Packet
   5. Codebooks
   6. Security Considerations
   7. Acknowledgments
   8. References
   9. Full Copyright Statement
   10. Authors Address

1 Introduction

   This document describes how Vorbis encoded audio may be formatted for use as an RTP payload type.

2 Background

   The Xiph.org Foundation creates and defines codecs for use in multimedia that are not encumbered by patents and thus may be freely implemented by any individual or organization.

   Vorbis is the general purpose multi-channel audio codec created by the Xiph.org Foundation.

   Vorbis encoded audio is generally found within an Ogg format bitstream, which provides framing and synchronization. For the purposes of RTP transport, this layer is unnecessary, and so raw Vorbis packets are used in the payload.

   Vorbis packets are currently unbounded in length. At some future point there will likely be a practical limit placed on packet length.

   Typical Vorbis packet sizes range from very small (2-3 bytes) to quite large (8-12 kilobytes). The reference implementation [2] seems to make every packet less than ~800 bytes, except for the codebooks packet, which is ~8-12 kilobytes. Within an RTP context, the maximum Vorbis packet SHOULD be kept below the MTU size of 1500 octets, including the RTP and payload headers, to avoid fragmentation.

3 Payload Format

   The standard RTP header is followed by an 8 bit payload header, and then the payload data.

3.1 RTP Header

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The RTP header begins with an octet of fields (V, P, X, and CC) to support specialized RTP uses (see [4] and [5] for details).
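   As a rough illustration of this fixed header, the Python sketch below packs its 12 octets with V=2 and P, X, CC and M all zero, as this draft specifies for Vorbis, so no CSRC list follows. The function name, the dynamic payload type 96, the SSRC and the 1024-sample gap between the two timestamps are hypothetical example values; real Vorbis packets do not all carry the same number of samples.

```python
import struct

RTP_VERSION = 2


def rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int) -> bytes:
    """Pack the 12-octet fixed RTP header used for a Vorbis payload.

    V=2 and P, X, CC and M are all zero, so no CSRC list follows.
    """
    if not 0 <= payload_type <= 0x7F:
        raise ValueError("PT is a 7-bit field")
    first_octet = RTP_VERSION << 6        # V=2, P=0, X=0, CC=0
    second_octet = payload_type           # M=0 (high bit clear), then PT
    return struct.pack(
        "!BBHII",
        first_octet,
        second_octet,
        seq & 0xFFFF,                     # 16-bit sequence number wraps
        timestamp & 0xFFFFFFFF,           # 32-bit timestamp, in samples
        ssrc & 0xFFFFFFFF,
    )


# Hypothetical session: dynamic PT 96, a fixed SSRC, and an assumed
# 1024 samples in the first RTP packet, so the second packet's
# timestamp is 1024 sample-clock ticks later.
first = rtp_header(96, seq=0, timestamp=0, ssrc=0x12345678)
second = rtp_header(96, seq=1, timestamp=1024, ssrc=0x12345678)
print(first.hex())  # -> 806000000000000012345678
```

   Because the timestamp clock runs at the audio sample rate, a 44.1 kHz stream advances it by 44100 per second of audio, however the Vorbis packets are grouped into RTP packets.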
   For Vorbis RTP applications, V is set to 2, and the P, X, and CC fields are set to 0.

   Marker (M): 1 bit
      Set to zero. Audio silence suppression is not used. This conforms to section 4.1 of [6].

   Payload Type (PT): 7 bits
      An RTP profile for a class of applications is expected to assign a payload type for this format, or a dynamically allocated payload type should be chosen which designates the payload as Vorbis.

   Sequence number: 16 bits
      The sequence number increments by one for each RTP data packet sent, and may be used by the receiver to detect packet loss and to restore packet sequence. This field is detailed further in [3].

   Timestamp: 32 bits
      A timestamp representing the sampling time of the first sample of the first Vorbis packet in the RTP packet. The clock frequency MUST be set to the sample rate of the encoded audio data and is conveyed out-of-band.

   SSRC/CSRC identifiers:
      These two fields, 32 bits each, with one SSRC field and a maximum of 15 CSRC fields (the limit of the 4 bit CC count), are as defined in [3].

3.2 Payload Header

   The first octet of the payload data is the payload header:

      1   2   3   4   5   6   7   8
    +---+---+---+---+---+---+---+---+
    | C | F | R |   # of packets    |
    +---+---+---+---+---+---+---+---+

   C: 1 bit
      Set to one if this is a continuation of a fragmented packet.

   F: 1 bit
      Set to one if the payload contains complete packets or if it contains the last fragment of a fragmented packet.

   R: 1 bit
      Reserved; must be set to zero by senders and ignored by receivers.

   The last 5 bits are the number of complete packets in this payload. This provides for a maximum of 31 Vorbis packets in the payload. If C is set to one, this number should be 0.

3.3 Payload Data

   If the payload contains a single Vorbis packet or a Vorbis packet fragment, the Vorbis packet data follows the payload header.

   For payloads which consist of multiple Vorbis packets, the payload data consists of one octet representing the packet length, followed by the packet data, for each of the Vorbis packets in the payload.

   The Vorbis packet length octet is the length minus one; a value of 0 means a length of 1.

   The payload packing of the Vorbis data packets SHOULD follow the guidelines set out in section 4.4 of [6], where the oldest packet occurs immediately after the RTP packet header.

3.4 Example RTP Packet

   Here is an example RTP packet containing two Vorbis packets.

   RTP Packet Header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 2 |0|0|   0   |0|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               timestamp (in sample rate units)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Payload Data:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|1|0| # pks: 2|      len      |        vorbis data ...        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       ...vorbis data...                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ...      |      len      |  next vorbis packet data...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

4 Frame Packetizing

   Each RTP packet contains either one complete Vorbis packet, one Vorbis packet fragment, or an integer number of complete Vorbis packets (up to a maximum of 31 packets, since the number of packets is defined by a 5 bit value).

   Any Vorbis packet that is larger than 256 octets and less than the path-MTU should be placed in an RTP packet by itself.

   Any Vorbis packet that is 256 octets or less (the largest length a single length octet can express) should be bundled in the RTP packet with as many Vorbis packets as will fit, up to a maximum of 31.

   If a Vorbis packet will not fit into the RTP packet, it must be fragmented.
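   The packetizing rules of sections 3.2 to 4 can be sketched in Python as follows. This is a minimal, hypothetical implementation rather than reference code: the function names and the 1400-octet payload budget (assumed to remain after IP, UDP and RTP headers) are illustrative, and it follows the text of section 3.3, where a single packet or a fragment carries no length octet. The fragment diagrams in section 4.1 draw a len field, so this is only one reading of the draft. Loss detection via sequence numbers is left out.

```python
MAX_COUNT = 31      # largest value of the 5-bit "# of packets" field
MAX_BUNDLED = 256   # largest length a "length minus one" octet can express


def payload_header(cont: bool, final: bool, count: int) -> int:
    """Build the payload header octet: C (0x80), F (0x40), R=0 (0x20), count."""
    if not 0 <= count <= MAX_COUNT:
        raise ValueError("count must fit in 5 bits")
    return (0x80 if cont else 0) | (0x40 if final else 0) | count


def packetize(packets, max_payload=1400):
    """Split Vorbis packets into RTP payloads (payload header + data).

    max_payload is an assumed per-packet budget after IP/UDP/RTP headers.
    """
    if max_payload < MAX_BUNDLED + 2:
        raise ValueError("budget too small for a bundled 256-octet packet")
    payloads, bundle = [], []

    def flush():
        if len(bundle) == 1:                 # single packet: no length octet
            payloads.append(bytes([payload_header(False, True, 1)]) + bundle[0])
        elif bundle:                         # several packets: length-1 octets
            body = b"".join(bytes([len(p) - 1]) + p for p in bundle)
            payloads.append(bytes([payload_header(False, True, len(bundle))]) + body)
        bundle.clear()

    for pkt in packets:
        if not pkt:
            raise ValueError("empty Vorbis packet")
        if len(pkt) <= MAX_BUNDLED:          # small: bundle with other small ones
            size = 1 + sum(1 + len(p) for p in bundle) + 1 + len(pkt)
            if len(bundle) == MAX_COUNT or size > max_payload:
                flush()
            bundle.append(pkt)
        elif 1 + len(pkt) <= max_payload:    # medium: one packet per payload
            flush()
            payloads.append(bytes([payload_header(False, True, 1)]) + pkt)
        else:                                # large: fragment using C/F bits
            flush()
            step = max_payload - 1
            chunks = [pkt[i:i + step] for i in range(0, len(pkt), step)]
            for i, chunk in enumerate(chunks):
                last = i == len(chunks) - 1
                payloads.append(bytes([payload_header(i > 0, last, 0)]) + chunk)
    flush()
    return payloads


def depacketize(payloads):
    """Reassemble Vorbis packets from in-order payloads (no loss handling)."""
    out, frag = [], None
    for pl in payloads:
        hdr, body = pl[0], pl[1:]
        cont, final, count = bool(hdr & 0x80), bool(hdr & 0x40), hdr & 0x1F
        if count == 0:                       # a fragment
            if not cont:
                frag = bytearray()
            if frag is None:                 # first fragment missing: drop
                continue
            frag += body
            if final:
                out.append(bytes(frag))
                frag = None
        elif count == 1:
            out.append(bytes(body))
        else:
            pos = 0
            for _ in range(count):
                n = body[pos] + 1
                out.append(bytes(body[pos + 1:pos + 1 + n]))
                pos += 1 + n
    return out
```

   Round-tripping a mix of small, medium and large packets through packetize and depacketize should return the original list, with the first fragment of a large packet carrying C=0, F=0, later fragments C=1, and the last one C=1, F=1, as in section 4.1.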
   A fragmented packet has a zero in the last five bits of the payload header. Each fragment after the first will also set the Continued (C) bit to one in the payload header. The RTP packet containing the last fragment of the Vorbis packet will have the F bit set to one.

4.1 Example Fragmented Vorbis Packet

   Here is an example fragmented Vorbis packet split over three RTP packets. RTP packet header details have been excluded from this example.

   Packet 1:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|0|0|    0    |      len      |        vorbis data ...        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       ...vorbis data...                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The number of packets field is set to 0.

   Packet 2:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|0|0|    0    |      len      |        vorbis data ...        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       ...vorbis data...                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The C bit is set to 1 and the number of packets field is set to 0. For large Vorbis packets there can be several payload packets of this type. The maximum packet size should be no greater than the MTU of 1500 octets, including all RTP and payload headers.

   Packet 3:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|1|0|    0    |      len      |        vorbis data ...        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       ...vorbis data...                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   This is the last Vorbis fragment packet. The C and F bits are set, and the packet count remains set to 0.

5 Codebooks

   To decode a Vorbis stream, a set of codebooks is required.
   These codebooks are allowed to change for each logical bitstream (for example, for each song encoded in a radio stream). The codebooks must be completely intact; a client cannot decode a stream with an incomplete or corrupted set.

   A client connecting to a multicast RTP Vorbis session needs to get the first set of codebooks in some manner. These codebooks are typically between 4 kilobytes and 8 kilobytes in size. On joining a session, the first packet sent MUST be a Vorbis codebook message.

   When the codebooks change, a new set is sent as an SR just prior to the Vorbis bitstream change, as an APP-defined RTCP message with the 4 octet name field set to VORC. This is the same format as the initial codebook packet. Codebook RTCP packets MUST set the padding (P) flag and add the appropriate padding octets needed to conform with section 6.6 of [3].

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P| subtype |   PT=APP=204  |             length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           SSRC/CSRC                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              VORC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       codebook checksum       |          codebook ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   A 16 bit one's complement checksum of the codebook precedes the codebook data block. This checksum is used to detect a corrupted codebook. If a checksum failure is detected, an empty RR RTCP message of APP type, with the 4 octet name field set to VORR, is sent from the client.

   Transmission of the codebook back to the client SHOULD be handled as a unicast delivery to prevent a rogue client from generating an excessive number of codebook requests within a multicast stream; however, multicast transmission of codebook request replies SHOULD be catered for at the application level.
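   The draft does not say exactly which 16 bit one's complement checksum it means, so the Python sketch below assumes the RFC 1071 Internet checksum. It shows how a sender might wrap a codebook in a VORC APP packet, always setting P and ending the padding with a count octet as the RTCP padding rule describes, and how a client might verify the checksum before deciding to send a VORR request. The function names and the subtype value 0 are hypothetical.

```python
import struct

RTCP_APP = 204


def ones_complement_checksum(data: bytes) -> int:
    """16-bit one's complement checksum in the style of RFC 1071 (assumed)."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with zero
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF


def vorc_app_packet(ssrc: int, codebook: bytes, subtype: int = 0) -> bytes:
    """Wrap a codebook in an RTCP APP packet named VORC, with P always set."""
    body = b"VORC" + struct.pack("!H", ones_complement_checksum(codebook)) + codebook
    unpadded = 8 + len(body)                 # 4-octet header + SSRC + body
    pad = 4 - unpadded % 4                   # 1..4 octets, so P can always be set
    padding = bytes(pad - 1) + bytes([pad])  # last octet counts the padding
    length_words = (unpadded + pad) // 4 - 1     # RTCP length: 32-bit words - 1
    first_octet = (2 << 6) | (1 << 5) | (subtype & 0x1F)  # V=2, P=1, subtype
    header = struct.pack("!BBHI", first_octet, RTCP_APP, length_words, ssrc)
    return header + body + padding


def codebook_from_vorc(packet: bytes):
    """Return the codebook, or None on a checksum failure (send VORR then)."""
    if packet[1] != RTCP_APP or packet[8:12] != b"VORC":
        raise ValueError("not a VORC APP packet")
    pad = packet[-1] if packet[0] & 0x20 else 0
    (checksum,) = struct.unpack("!H", packet[12:14])
    codebook = packet[14:len(packet) - pad]
    return codebook if ones_complement_checksum(codebook) == checksum else None


# The worked example in RFC 1071 sums to 0xddf2; its complement is 0x220d.
print(hex(ones_complement_checksum(bytes.fromhex("0001f203f4f5f6f7"))))  # -> 0x220d
```

   Padding with 1 to 4 octets, rather than 0 to 3, is one way to honour the draft's requirement that P is always set while keeping the packet a whole number of 32-bit words.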
6 Security Considerations

   RTP packets using this payload format are subject to the security considerations discussed in the RTP specification [3]. This implies that the confidentiality of the media stream is achieved by using encryption. Because the data compression used with this payload format is applied end-to-end, encryption may be performed on the compressed data.

7 Acknowledgments

   This I-D is a continuation of draft-moffitt-vorbis-rtp-00.txt. Thanks to the Ogg Vorbis Community and to the Xiph.org team, especially Jack Moffitt <jack@xiph.org>.

8 References

   1. Key words for use in RFCs to Indicate Requirement Levels (RFC 2119).
   2. libvorbis: Available from the Xiph website, http://www.xiph.org
   3. RTP: A Transport Protocol for Real-Time Applications (RFC 1889).
   4. RTP: A transport protocol for real-time applications. Work in progress, draft-ietf-avt-rtp-new-11.txt.
   5. RTP Profile for Audio and Video Conferences with Minimal Control. Work in progress, draft-ietf-avt-profile-new-12.txt.

9 Full Copyright Statement

   Copyright (C) The Internet Society (2002). All Rights Reserved.

   This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.
   The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

10 Authors Address

   Phil Kerr
   Centre for Music Technology
   University of Glasgow
   email: philkerr@elec.gla.ac.uk
   WWW: http://www.xiph.org/