I made a few updates to OggPCM2 http://wiki.xiph.org/index.php/OggPCM2
reflecting the latest discussions. Could everyone have a look at it and
see whether they agree? Otherwise, what do you feel should be changed?

Does anyone want to speak in support of chunked PCM?

For all those who, like me, are just tired of this mess, please express
yourselves in the new spec I created: OggPCM3
http://wiki.xiph.org/index.php/OggPCM3

Jean-Marc

P.S. So far, I think we have OggPCM2 5, OggPCM 0. Please vote for
OggPCM3! :-)

On Tuesday, 15 November 2005 at 11:21 +0100, Michael Smith wrote:
> On 11/15/05, Erik de Castro Lopo <mle+xiph@mega-nerd.com> wrote:
> > Hi all,
> >
> > The remaining issue to be decided for the OggPCM2 spec is the support
> > of chunked vs interleaved data.
>
> I think interleaved is the obvious choice - it's what most audio
> applications are used to dealing with, and it's usually what we need to
> feed to audio hardware in the end.
>
> While I accept that there are many good uses for chunked data, I think
> the transformation is trivial, particularly given certain
> characteristics of the Ogg container. Remember: if you read an Ogg
> stream into memory, the data is _already_ likely to be non-contiguous,
> due to Ogg's structure. It is trivial, and adds insignificant overhead,
> to de-interleave as you read it into a packet buffer.
>
> So chunking would force additional implementation complexity onto all
> implementations, while the benefits aren't obviously significant.
>
> Oh, and if it's not already obvious, I support this spec rather than
> Arc's.
>
> Mike
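(To make the "trivial to de-interleave" claim concrete, here is a minimal
C sketch of splitting an interleaved packet into per-channel buffers. The
function name and buffer layout are illustrative assumptions, not anything
taken from the OggPCM2 spec.)

#include <stddef.h>
#include <stdint.h>

/* Copy an interleaved 16-bit PCM packet into per-channel buffers in a
 * single pass - roughly the same work a reader already does when copying
 * the packet out of the Ogg stream, so the extra overhead is small. */
static void deinterleave16(const int16_t *packet, size_t frames,
                           unsigned channels, int16_t **out)
{
    for (size_t f = 0; f < frames; f++)
        for (unsigned c = 0; c < channels; c++)
            out[c][f] = packet[f * channels + c];
}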
Jean-Marc Valin wrote:
> I made a few updates to OggPCM2 http://wiki.xiph.org/index.php/OggPCM2
> reflecting the latest discussions. Could everyone have a look at it and
> see whether they agree? Otherwise, what do you feel should be changed?

You guys are probably off on some IRC channel somewhere discussing these
things, but... why 64 bits for the codec identifier? Shouldn't 32
("PCM ") be fine?

Why store N-bit samples in the most significant bits and not the least?
Doesn't that mean an application would likely need to shift everything
down again?

Pedantic: the sentence "Format IDs below 0x80000000 are reserved for use
by Xiph and all the ones above are allowed for application-specific
formats" leaves the use of 0x80000000 itself unspecified.

Rene.
Rene Herman wrote:
> Why store N-bit samples in the most significant bits and not the least?
> Doesn't that mean an application would likely need to shift everything
> down again?

One advantage of storing in the MSBs is that the relative value remains
correct when the sample is processed at the larger word size. For
instance, a signed 12-bit integer would use 0x400 to represent +50%
amplitude. By packing this value into the MSBs of a 16-bit word, you get
0x4000, which still represents +50% amplitude. This way any software
that can work on 16-bit samples will "do the right thing" on samples
with lower resolution.

One thing that should probably be added to the wiki is that the extra
bits should be set in a round-towards-zero fashion - i.e. 0 for positive
numbers, 1 for negative numbers. This is probably worth discussing:
should we do it as I propose here, or is plain truncation (always-zero
padding bits) a better way to go?

> Pedantic: the sentence "Format IDs below 0x80000000 are reserved for
> use by Xiph and all the ones above are allowed for
> application-specific formats" leaves the use of 0x80000000 itself
> unspecified.

Agreed. Perhaps: "Format IDs with the most significant bit cleared are
reserved for use by Xiph. Other formats are considered to be
application-specific, and MUST have this bit set." Objections?

John
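(A small C sketch of the packing rule John describes; pack12_msb is a
made-up name, and the round-towards-zero padding is his proposal, not yet
settled spec text.)

#include <stdint.h>

/* Pack a signed 12-bit sample into the MSBs of a 16-bit word, filling
 * the four padding bits round-towards-zero: zeros for non-negative
 * samples, ones for negative ones. */
static int16_t pack12_msb(int16_t s12)
{
    int16_t packed = (int16_t)((uint16_t)s12 << 4); /* sample in bits 15..4 */
    if (s12 < 0)
        packed |= 0x000F;  /* negative: pad with ones, pulling toward zero */
    return packed;         /* non-negative: padding stays zero */
}

/* Example: 0x400 (+50% amplitude at 12 bits) packs to 0x4000, which is
 * still +50% amplitude when read as a 16-bit sample. */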
> You guys are probably off on some IRC channel somewhere discussing
> these things, but... why 64 bits for the codec identifier? Shouldn't 32
> ("PCM ") be fine?

The only reason for having 64 bits is that most other Xiph codecs tend to
have identifiers of about that length. I don't think it causes a real
problem anyway.

Jean-Marc
On 2005-11-16, Jean-Marc Valin wrote:
> Otherwise, what do you feel should be changed?

One obvious thing that seems to be lacking is the granulepos mapping. As
suggested in the Ogg documentation, for audio a simple sampling frame
number ought to suffice, but I think the convention should still be
spelled out.

Secondly, I'd like to see the channel map fleshed out in more detail.
(Beware of the pet peeve...) IMO the mapping should cover at least the
channel assignments possible in WAVE files, the most common Ambisonic
ones, and perhaps some added channel interpretations like "surround"
which are commonly used but lacking in most file formats. (For example,
THX does not treat surround as a directional source, so the correct
semantics cannot be captured e.g. by WAVE files. Surprisingly, neither
can the fact that some pair of channels is Dolby Surround encoded, as
opposed to some form of vanilla stereo.)

(As a further idea prompted by ambisonic compatibility encodings, I'd
also like to explore the possibility of multiple tagging. For example,
Dolby Surround, Circle Surround, Logic 7 and ambisonic BHJ are all
designed to be stereo compatible so that a legacy decoder can play them
as-is. But if they are tagged as something besides normal stereo, such a
decoder will probably just ignore them. So, there's a case to be made
for overlapping, preferential tags: one telling the decoder that the
data *can* be played as stereo, another telling it that the data
*should* be interpreted as, say, BHJ, and so on. Object-minded folks can
think of this as type inheritance of a kind. But of course this is more
food-for-thought than must-have feature, since nobody else is doing
anything of the sort at the moment.)

> Does anyone want to speak in support of chunked PCM?

Actually I'd like to add a general point against it.

The chunked vs. interleaved question is an instance of the more general
problem of efficiently linearizing a multidimensional structure. We want
to do this so that typical access patterns (and in particular locality
of access) translate gracefully and efficiently. Thus we group primarily
by time (interleaving) when locality is by time (accessing a sample with
a given sampling time increases the odds that a sample with a nearby
sampling time is accessed soon), and primarily by channel (chunking)
when locality is by channel (accessing a channel makes it probable that
the same channel is accessed again); we also try to preserve the rough
order of access.

Ogg is primarily a streaming delivery application, so we usually access
Ogg data by ascending time. Ogg does not support nonlinear space
allocation or in-place modification, so editors, probably the most
important application in need of independently accessible channels, will
not be using it as an intermediate format in any case. We're also
talking about multichannel audio delivery, where the different channels
are best thought of as parts of a single multidimensional signal, not a
library-in-a-file collection of independent signals, so it can be argued
that the individual channels do not really make sense in isolation. In
this case access won't merely be localised in time; in fact the natural
access pattern for recorders, transmitters, players and even some
filters is a dense, temporally ascending scan over some interleaved
channel ordering.

If we think of Ogg as a line format, all this translates into lower
packetization latency and memory requirements (one buffer per
multichannel stream vs. one buffer per channel) for interleaved data; if
we think of Ogg as a file format, it translates into fewer seeks and
less framing overhead while streaming from disk. In most cases a chunked
layout has no countervailing benefits. Even interfaces which go with
separate channels aren't such a good reason to offer a chunking option,
because they were probably designed with some other application (like
interactive gaming, or offloading processing load onto a peripheral) in
mind, or might simply be badly engineered (just about anything from MS).

Furthermore, if we really encounter an application which would benefit
from grouping by channel (say, language variants of the same
soundtrack), that can already be accomplished via multiple logical
streams. In fact the multiplexing machinery is there for this precise
purpose: the packet structure is a deliberate tradeoff between the
temporal order always present in streaming files and the conflicting
interest in limiting latency, error propagation and buffer consumption,
brought on by parallelism, correlations and indivisibilities over
dimensions other than time. If the channels are so independent of each
other, or so internally cohesive, that chunking is justified, then they
ought to be independent enough for standalone use and for placement in
separate logical streams, or even separate files. Whatever
interdependencies they might have ought to be exposed to the consumer
via OggSkeleton or external metadata in any case. Thus whatever we want
to accomplish by chunking is probably better accomplished by the broader
Ogg framework, or by some mechanism besides Ogg altogether.

The only valid reason I can think of to chunk the data is bitrate
peeling: chunking means that entire chunks/packets can be skipped to
drop channels. But this clearly isn't the best way to go about peeling
because, as I said, audio channels tend to be tightly coupled. We don't
go from stereo to mono by cleaving off the right or left channel, but by
summing, and if we simply drop a surround channel, we'll also break any
multichannel panning law. Thus if we want to enable peeling, we have to
use things akin to mid/side coding (like the UHJ hierarchy) or joint
progressive coding over the entire set of channels (e.g. Vorbis's
progressive vector quantization), and only then reorder and chunk the
data. As a result this sort of thing will always be encoding dependent,
and it shouldn't be specified at a higher level of generalization, where
the machinery could end up being used for the wrong sort of encoding
(e.g. vanilla 5.1) and would impose its overheads (e.g. latency)
indiscriminately. Not surprisingly, this is how it's already done in
Ogg: at least Vorbis specifies that peeling is to be carried out by a
codec-specific peeler operating within packets. The considerations which
yielded this decision apply directly to an intermediate-level
abstraction like OggPCM (below Ogg multiplexing, but above a specific
PCM coding like 16-bit big-endian B-format), so I think incorporating a
chunking option here would really be a case of reinventing the wheel,
square.

(Newbie intro: I'm a 27-year-old Finnish math/CS student and coder, with
a long-term personal interest in both audio processing and external
memory algorithms, yet without an open source implementation background.
I joined the list after OggPCM was mentioned on sursound, so it's also
safe to assume I'm an ambisonic bigot.)
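(To illustrate the kind of coupled coding Sampo means, here is a toy
mid/side sketch in C: keep only "mid" and you still have a proper mono
downmix, unlike simply cleaving off the left or right channel. This is a
generic illustration, not anything specified by OggPCM.)

#include <stddef.h>

static void ms_encode(const float *left, const float *right,
                      float *mid, float *side, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        mid[i]  = 0.5f * (left[i] + right[i]);  /* mono-compatible sum */
        side[i] = 0.5f * (left[i] - right[i]);  /* peelable detail */
    }
}

static void ms_decode(const float *mid, const float *side,
                      float *left, float *right, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        left[i]  = mid[i] + side[i];
        right[i] = mid[i] - side[i];
    }
}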
--
Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
> One obvious thing that seems to be lacking is the granulepos mapping.
> As suggested in the Ogg documentation, for audio a simple sampling
> frame number ought to suffice, but I think the convention should still
> be spelled out.

I was under the (maybe wrong) impression that the Ogg spec already
covered everything that's needed for granulepos. If that's not the case,
please suggest some text.

> Secondly, I'd like to see the channel map fleshed out in more detail.
> (Beware of the pet peeve...) IMO the mapping should cover at least the
> channel assignments possible in WAVE files, the most common Ambisonic
> ones, and perhaps some added channel interpretations like "surround"
> which are commonly used but lacking in most file formats. (For example,
> THX does not treat surround as a directional source, so the correct
> semantics cannot be captured e.g. by WAVE files. Surprisingly, neither
> can the fact that some pair of channels is Dolby Surround encoded, as
> opposed to some form of vanilla stereo.)

You mean describing the enums for the "Channel Mapping Header", just like
we have for the format? Yes, this definitely needs to be done. My comment
about OggPCM2 being nearly done obviously didn't apply to the extra
headers (which can still be defined afterwards anyway). Some default
mappings may be useful too (e.g. by default, 2 channels is stereo and the
left channel is encoded first).

> (As a further idea prompted by ambisonic compatibility encodings, I'd
> also like to explore the possibility of multiple tagging. For example,
> Dolby Surround, Circle Surround, Logic 7 and ambisonic BHJ are all
> designed to be stereo compatible so that a legacy decoder can play them
> as-is. But if they are tagged as something besides normal stereo, such
> a decoder will probably just ignore them. So, there's a case to be made
> for overlapping, preferential tags: one telling the decoder that the
> data *can* be played as stereo, another telling it that the data
> *should* be interpreted as, say, BHJ, and so on. Object-minded folks
> can think of this as type inheritance of a kind. But of course this is
> more food-for-thought than must-have feature, since nobody else is
> doing anything of the sort at the moment.)

I would say that this can probably be handled by the "Channel Conversion
Header", don't you think? I was also wondering whether it would be a good
idea to actually suggest (as in "implementers SHOULD") certain default
mappings, for example when downmixing from stereo to mono and so on.

> > Does anyone want to speak in support of chunked PCM?
>
> Actually I'd like to add a general point against it.

Good :-)

Jean-Marc
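(For reference, the convention Sampo asks to have spelled out would
presumably be the usual audio mapping: the granulepos of a page is the
index of the last PCM sample frame it completes. A minimal sketch under
that assumption; this mirrors the Vorbis convention but has not been
written into the OggPCM2 wiki yet.)

#include <stdint.h>

/* Assumed mapping: granulepos counts PCM sample frames, so converting
 * to a timestamp is a single division by the sampling rate. */
static double granulepos_to_seconds(int64_t granulepos, uint32_t rate)
{
    return (double)granulepos / (double)rate;
}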
Sampo Syreeni wrote:
> Secondly, I'd like to see the channel map fleshed out in more detail.

Sampo, I'm the one who came up with this channel mapping. Let me flesh it
out a bit more this evening.

> (As a further idea prompted by ambisonic compatibility encodings, I'd
> also like to explore the possibility of multiple tagging. For example,
> Dolby Surround, Circle Surround, Logic 7 and ambisonic BHJ are all
> designed to be stereo compatible so that a legacy decoder can play them
> as-is. But if they are tagged as something besides normal stereo, such
> a decoder will probably just ignore them. So, there's a case to be made
> for overlapping, preferential tags: one telling the decoder that the
> data *can* be played as stereo, another telling it that the data
> *should* be interpreted as, say, BHJ, and so on. Object-minded folks
> can think of this as type inheritance of a kind. But of course this is
> more food-for-thought than must-have feature, since nobody else is
> doing anything of the sort at the moment.)

Doesn't the Channel Conversion Header fulfil this need? Maybe it needs a
bit more explanation and an example.

> > Does anyone want to speak in support of chunked PCM?
>
> Actually I'd like to add a general point against it.

Thanks for speaking up. Opinion noted.

Erik
--
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
A Microsoft Certified System Engineer is to computing what a
MacDonalds Certified Food Specialist is to fine cuisine.
Sampo Syreeni wrote:
> Secondly, I'd like to see the channel map fleshed out in more detail.

Sampo, I did flesh out the wiki a *little* more. Is the intent clearer
now?

> (Beware of the pet peeve...)

What is that pet peeve?

> IMO the mapping should cover at least the channel assignments possible
> in WAVE files, the most common Ambisonic ones, and perhaps some added
> channel interpretations like "surround" which are commonly used but
> lacking in most file formats.

I haven't enumerated them all, but we should be able to without too much
trouble.

> (For example, THX does not treat surround as a directional source, so
> the correct semantics cannot be captured e.g. by WAVE files.

Do you have any more info about THX? I've searched the web and found
little of any worth.

> (As a further idea prompted by ambisonic compatibility encodings, I'd
> also like to explore the possibility of multiple tagging. For example,
> Dolby Surround, Circle Surround, Logic 7 and ambisonic BHJ are all
> designed to be stereo compatible so that a legacy decoder can play them
> as-is.

Does the Channel Conversion Header cover this?

Cheers,
Erik
--
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"The lusers I know are so clueless, that if they were dipped in clue
musk and dropped in the middle of a pack of horny clues, on clue prom
night during clue happy hour, they still couldn't get a clue."
  -- Michael Girdwood, in the monastery
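(A hypothetical sketch of the kind of channel-type enums a fleshed-out
Channel Mapping Header might define; all names and values here are
invented for illustration and would be superseded by the actual wiki
table.)

/* Hypothetical channel interpretations covering WAVE-style speaker
 * positions, a non-directional "surround" per the THX point, and
 * first-order Ambisonic B-format components. */
typedef enum {
    OGGPCM_CH_MONO     = 0,
    OGGPCM_CH_LEFT     = 1,
    OGGPCM_CH_RIGHT    = 2,
    OGGPCM_CH_CENTER   = 3,
    OGGPCM_CH_LFE      = 4,
    OGGPCM_CH_SURROUND = 5,   /* diffuse, non-directional surround */
    OGGPCM_CH_AMBI_W   = 6,   /* Ambisonic B-format W (omni) */
    OGGPCM_CH_AMBI_X   = 7,   /* B-format X */
    OGGPCM_CH_AMBI_Y   = 8,   /* B-format Y */
    OGGPCM_CH_AMBI_Z   = 9    /* B-format Z */
} oggpcm_channel_type;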