Hi all,

Siliva contacted me about this OggPCM proposal and asked me
to join in.

For those who don't know me, I am the main author and maintainer
of libsndfile and therefore know quite a bit about how uncompressed
audio is stored in sound files. However, even I would not consider
myself an expert; there are areas to do with channel assignments
that I know I am ignorant of. I am also quite ignorant of the Ogg
container format.

I have now read:

    http://wiki.xiph.org/OggPCM

and find that it has a number of shortcomings.

  a) There is no marker to distinguish little endian data
     from big endian data.

  b) There is no mention of audio data being held in double
     precision (64 bit) floating point. Currently this is
     supported in libsndfile by WAV, AIFF, AU, IRCAM and the
     two different Matlab/Octave file formats (I may also
     have overlooked some).

  c) I think having separate fields for things like signed/
     unsigned/float and bit width is a mistake. I would suggest
     instead a single field that encodes all this information
     in an enumeration, e.g.:

         OGG_PCM_U8       /* Unsigned 8 bit */
         OGG_PCM_S8       /* Signed 8 bit. */
         OGG_PCM_S16
         OGG_PCM_S24
         OGG_PCM_S32
         OGG_PCM_FLOAT32
         OGG_PCM_FLOAT64

     and so on. This scheme makes it very difficult to get
     signed/unsigned and bit width messed up.

  d) Don't bother implementing unsigned PCM for bit widths
     greater than 8 bits. No other common file format uses
     it and those unsigned formats are a pain to work with.

  e) Consider whether the endianness should also be encoded
     in the enumeration above. I would recommend that it is,
     resulting in:

         OGG_PCM_U8       /* Unsigned 8 bit */
         OGG_PCM_S8       /* Signed 8 bit. */
         OGG_PCM_LE_S16
         OGG_PCM_BE_S16
         OGG_PCM_LE_S24
         OGG_PCM_BE_S24
         ...
         OGG_PCM_LE_FLOAT32
         OGG_PCM_BE_FLOAT32
         ...

  f) Encoding of channel information. In a two channel file,
     is the audio data a stereo image or two distinct mono
     channels? For a file with N (> 2) channels, are there
     pairs of channels which should be considered as stereo
     pairs, or do you want to place these stereo pairs as
     separate streams within a single Ogg container? What
     about multi channel surround sound (there are a number
     of different formats like 5.1 and 7.1) or quadraphonic?
     How are you going to specify which channel is which?
     Being able to encode this stuff easily is **vital**.

  g) With things like surround sound, are you going to allow
     24 bit audio for the main stereo pair and 16 bits for
     the side channels? This might best be achieved using
     separate streams, but that would make channel information
     all the more important. Is it useful to have PCM for the
     main stereo pair and, say, Vorbis encoding for the side
     channels?

Please realize that this is all just off the top of my head.
There may be a bunch of other stuff I have overlooked.

Is it OK if I can get some other people that know more about
this stuff involved?

Erik
-- 
Erik de Castro Lopo
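Points (c) and (e) above can be sketched as a single C enumeration. This is only an illustration of the idea, not part of any agreed spec: the numeric values, the LE/BE variants of S32 and FLOAT64 (the original list elides them with "..."), and the helper `ogg_pcm_bytes_per_sample` are all assumptions made for the example.

```c
#include <stddef.h>

/* One field carries signedness, width, float-ness and endianness.
   Constant names follow the proposal; the numeric values and the
   entries marked "assumed" are hypothetical. */
typedef enum {
    OGG_PCM_U8 = 0,       /* Unsigned 8 bit */
    OGG_PCM_S8,           /* Signed 8 bit */
    OGG_PCM_LE_S16,
    OGG_PCM_BE_S16,
    OGG_PCM_LE_S24,
    OGG_PCM_BE_S24,
    OGG_PCM_LE_S32,       /* assumed */
    OGG_PCM_BE_S32,       /* assumed */
    OGG_PCM_LE_FLOAT32,
    OGG_PCM_BE_FLOAT32,
    OGG_PCM_LE_FLOAT64,   /* assumed */
    OGG_PCM_BE_FLOAT64    /* assumed */
} ogg_pcm_format;

/* Hypothetical helper: bytes occupied by one sample of the given
   format, or 0 if the format value is not recognised. */
static size_t ogg_pcm_bytes_per_sample(int format)
{
    switch (format) {
    case OGG_PCM_U8:
    case OGG_PCM_S8:
        return 1;
    case OGG_PCM_LE_S16:
    case OGG_PCM_BE_S16:
        return 2;
    case OGG_PCM_LE_S24:
    case OGG_PCM_BE_S24:
        return 3;
    case OGG_PCM_LE_S32:
    case OGG_PCM_BE_S32:
    case OGG_PCM_LE_FLOAT32:
    case OGG_PCM_BE_FLOAT32:
        return 4;
    case OGG_PCM_LE_FLOAT64:
    case OGG_PCM_BE_FLOAT64:
        return 8;
    default:
        return 0;   /* unknown format: the caller should reject the stream */
    }
}
```

Because every legal combination is listed explicitly, a combination such as "unsigned 32 bit" simply has no value and cannot be written by accident.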
On Thu, Nov 10, 2005 at 10:13:19AM +1100, Erik de Castro Lopo wrote:

> Is it OK if I can get some other people that know more about
> this stuff involved?

By all means. And thanks for responding so quickly, this is quite
helpful.

 -r
On Thu, Nov 10, 2005 at 10:13:19AM +1100, Erik de Castro Lopo wrote:

> Siliva contacted me about this OggPCM proposal and asked me
> to join in.

Yes, she mentioned that she would. Thank you for your suggestions,
they are well thought out and quite helpful. It's a lot to process at
once, and I (as the original author of the current OggPCM draft spec)
will reply more fully soon with feedback.

> I have now read:
>
>     http://wiki.xiph.org/OggPCM

Also, http://wiki.xiph.org/Talk:OggPCM - or the discussion tab at the
top of the page. That's where the debates on this are mostly ongoing.

> Please realize that this is all just off the top of my head.
> There may be a bunch of other stuff I have overlooked.

How I feel, too.

> Is it OK if I can get some other people that know more about
> this stuff involved?

As Ralph already said, absolutely.

-- 
The recognition of individual possibility, to allow each to be what
she and he can be, rests inherently upon the availability of
knowledge; The perpetuation of ignorance is the beginning of slavery.

from "Die Gedanken Sind Frei": Free Software and the Struggle for
Free Thought by Eben Moglen, General Counsel of the Free Software
Foundation
On Thu, Nov 10, 2005 at 10:13:19AM +1100, Erik de Castro Lopo wrote:

> a) There is no marker to distinguish little endian data
>    from big endian data.

The original reason for this is because Ogg makes such a matter moot,
since the bitpacker in libogg2 handles endian. However, if a "chunk"
packer is made available (similar to memcpy), this becomes important
since we'll want to copy the data in whichever endian it already is.

Does endian vary widely for raw audio codecs, or would it be reasonable
to settle on one standard and expect all codecs to convert to the
correct endian which don't comply with the "norm"? If most hardware
supports one endian or another, I say we should stick to that, since
that's what the codec plugins would export anyway.

> b) There is no mention of audio data being held in double
>    precision (64 bit) floating point. Currently this is
>    supported in libsndfile by WAV, AIFF, AU, IRCAM and the
>    two different Matlab/Octave file formats (I may also
>    have overlooked some).

The bits per sample field covers this. Set this to "64" and set the
data type to "float" and it "should just work"...

> c) I think having separate fields for things like signed/
>    unsigned/float and bit width is a mistake. I would suggest
>    instead a single field that encodes all this information
>    in an enumeration, e.g.:
>
>        OGG_PCM_U8       /* Unsigned 8 bit */
>        OGG_PCM_S8       /* Signed 8 bit. */
>        OGG_PCM_S16
>        OGG_PCM_S24
>        OGG_PCM_S32
>        OGG_PCM_FLOAT32
>        OGG_PCM_FLOAT64
>
>    and so on. This scheme makes it very difficult to get
>    signed/unsigned and bit width messed up.
>
> d) Don't bother implementing unsigned PCM for bit widths
>    greater than 8 bits. No other common file format uses
>    it and those unsigned formats are a pain to work with.

Problem with this is inflexibility. See, not every application must
support every possible combination of formatting - in fact, many will
require a very small set of parameters going in, ie, "it must be float
of 16, 24, 32, or 64 bit" or "it must be 16 or 24 bit signed".

Implementors will never, very likely, implement 32-bit unsigned int, and
that is not an issue. If some fool does, his data will simply not be
accessible to any other codec or application unless he writes a
conversion plugin, which in essence, treats the two sides (from
OggStream's perspective) as two entirely different codecs, even if both
are in OggPCM format.

The flexibility of this does, though, encourage stuff like 96bit audio.
Anyone implementing a codec which uses this, and import/exports it, will
also write the appropriate conversion OggStream plugin which will allow
applications which only support, say, 16bit audio, to work with it.

I guess you could chalk this up to an inherent difference in philosophy
and purpose between OggPCM and RIFF/WAVE (.wav).. theirs is as much an
interchange format as a storage codec, where OggPCM isn't really
intended for storage. FLAC (Free Lossless Audio Codec) limits to a
certain number of formats, and all decoders can decode these formats,
and it's well suited for storage as a /compressed/ lossless codec..

As primarily an interchange codec, if you have some rare or new format
being imported/exported from your new codec, you had better also make
sure it can itself support more common formats (ie, 44100/16/2) or that
you include a conversion plugin which does that for your users.

> f) Encoding of channel information. In a two channel file,
>    is the audio data a stereo image or two distinct mono
>    channels? For a file with N (> 2) channels, are there
>    pairs of channels which should be considered as stereo
>    pairs, or do you want to place these stereo pairs as
>    separate streams within a single Ogg container? What
>    about multi channel surround sound (there are a number
>    of different formats like 5.1 and 7.1) or quadraphonic?
>    How are you going to specify which channel is which?
>    Being able to encode this stuff easily is **vital**.

I agree - this is something that wasn't on my radar until this morning
when MikeS was asking about the channel layout in Vorbis/FLAC. How
would you suggest this data be included in the binary header? I
honestly have no experience with anything other than mono and stereo.

It should all be in the same stream.

> g) With things like surround sound, are you going to allow
>    24 bit audio for the main stereo pair and 16 bits for
>    the side channels? This might best be achieved using
>    separate streams, but that would make channel information
>    all the more important. Is it useful to have PCM for the
>    main stereo pair and, say, Vorbis encoding for the side
>    channels?

Do people really do such things as encode different channels with
different sample sizes (and, I assume, samplerates)?

I'd really like to prefer keeping a fixed samplesize/samplerate for all
channels. I really doubt any Ogg audio codec is going to get that
complicated anytime soon, and if it's really needed, a codec plugin
/could/ be fed/provide packets from multiple OggPCM bitstreams, just
like how a+v codecs (ie, DV) would import/export OggPCM+OggYUV.

Is there anything else you've thought of that we've missed?
----- Original Message -----
From: "Arc" <arc@Xiph.org>
To: <ogg-dev@Xiph.org>
Sent: Thursday, November 10, 2005 2:57 PM
Subject: Re: [ogg-dev] OggPCM proposal feedback

> On Thu, Nov 10, 2005 at 10:13:19AM +1100, Erik de Castro Lopo wrote:
>>
>> a) There is no marker to distinguish little endian data
>>    from big endian data.
>
> The original reason for this is because Ogg makes such a matter moot,
> since the bitpacker in libogg2 handles endian. However, if a "chunk"

The problem I see with this (correct me if I'm wrong) is that you are
suggesting you are going to get a really large chunk of PCM and feed it
into libogg a word at a time, in order to have its endianness possibly
reversed? And then, on the other end, read it out a word at a time?

If, as you suggest, this is primarily for interchange, then this seems a
really inefficient way to pass data around. The reason it should allow
specification of endianness is that, if it is to be used as an
interchange format, it needs to be easily copied around in a format
friendly to the host processor. Requiring the byte order flipping of
every sample seems contrary to the purpose of such a format.

> Problem with this is inflexibility. See, not every application must
> support every possible combination of formatting - in fact, many will
> require a very small set of parameters going in, ie, "it must be float
> of 16, 24, 32, or 64 bit" or "it must be 16 or 24 bit signed".

What you are saying is that outputs will support some set of data
formats, and inputs will support some other set of data formats. In
other words, each component only supports the types it knows how to do
something with.

And so, in the hypothetical situation where some new format that is not
in the enumeration comes along, the components of course won't support
it, since they only support the subset of data they know how to handle.
So a new value needs to be added to the enumeration. The result is that
existing components (which didn't support the new format anyway) will
still be in the same situation if they come across a new file with an
enumeration value they don't understand.

So, if a new format comes along which does require a new enumeration
value, then the components are going to have to be modified to support
that anyway, and adding another value to the list they support is a
non-issue.

The only possible way it's inflexible is if the number of values in the
enumeration exceeds the size of the field it has. Probably unlikely
even with 8 bits, almost certainly unlikely for 16.

> Implementors will never, very likely, implement 32-bit unsigned int, and
> that is not an issue. If some fool does, his data will simply not be
> accessible to any other codec or application unless he writes a
> conversion plugin, which in essence, treats the two sides (from
> OggStream's perspective) as two entirely different codecs, even if both
> are in OggPCM format.

I know you are working on OggStream, and this is the perspective you are
taking, but other implementations, e.g. DirectShow, QuickTime, MPlayer
and GStreamer, don't use OggStream and maybe never will. So I think we
need to take a bigger-picture approach to what assumptions are being
made here.

> I guess you could chalk this up to an inherent difference in philosophy
> and purpose between OggPCM and RIFF/WAVE (.wav).. theirs is as much an
> interchange format as a storage codec, where OggPCM isn't really
> intended for storage. FLAC (Free Lossless Audio Codec) limits to a
> certain number of formats, and all decoders can decode these formats,
> and it's well suited for storage as a /compressed/ lossless codec..

This is another issue I see as a problem: why is it only an interchange
format? What is the rationale for that? There are many people who want
a storage format. Are you suggesting that another raw storage format
also be made?

I think this is the wrong approach. FLAC and other codecs operate on a
tighter subset because they have to perform complex transformations on
the data, and supporting too many types increases complexity. A raw
format essentially needs no processing; it just needs copying into a
buffer that supports that type of data.

> I'd really like to prefer keeping a fixed samplesize/samplerate for all
> channels. I really doubt any Ogg audio codec is going to get that
> complicated anytime soon, and if it's really needed, a codec plugin
> /could/ be fed/provide packets from multiple OggPCM bitstreams, just
> like how a+v codecs (ie, DV) would import/export OggPCM+OggYUV.

I think this is another problem with your approach: you are assuming
that Ogg is a closed format, and that only "Ogg" formats are the issue
here. Ogg is a generic container format; other codecs can and will be
used inside it. Making assumptions based only on the current set of
Xiph codecs is, in my opinion, a little narrowly focused.

Zen.
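The efficiency argument above can be illustrated with a small C sketch. It assumes the stream header carries an endianness marker for 16 bit samples; the function names `host_is_little_endian` and `copy_s16_samples` are hypothetical and belong to no Ogg library. When the stored byte order matches the host, the payload is copied in one block; only a mismatch costs a per-sample swap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Detect the host byte order at run time: returns 1 on a little
   endian host, 0 on a big endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;

    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Hypothetical reader: copy `count` 16 bit samples from a raw payload
   into host-order int16_t values. `src_is_little_endian` stands for
   the endianness marker that the header would have to carry. */
static void copy_s16_samples(int16_t *dst, const unsigned char *src,
                             size_t count, int src_is_little_endian)
{
    size_t i;

    if (!!src_is_little_endian == host_is_little_endian()) {
        /* Same byte order: one bulk copy, no per-sample work. */
        memcpy(dst, src, count * sizeof(int16_t));
        return;
    }

    /* Different byte order: swap the two bytes of every sample. */
    for (i = 0; i < count; i++) {
        unsigned char swapped[2];

        swapped[0] = src[2 * i + 1];
        swapped[1] = src[2 * i];
        memcpy(&dst[i], swapped, sizeof(int16_t));
    }
}
```

A format that fixes one byte order forces the slow path on every host of the other kind, for both writing and reading; a format with a marker lets the writer keep its native order and leaves the swap to readers that actually need it.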
Arc wrote:

> Does endian vary widely for raw audio codecs,

Well there are really only two endian-nesses, big and little. WAV is
usually little endian but there is also a (very rare) big endian
version. AIFF is usually big endian but also supports little endian
encoding. CAF, AU, IRCAM and a number of others support both
endian-nesses equally.

> or would it be reasonable
> to settle on one standard and expect all codecs to convert to the
> correct endian which don't comply with the "norm"?

Not reasonable.

> The bits per sample field covers this. Set this to "64" and set the
> data type to "float" and it "should just work"...

See my comment on the wiki:

    http://wiki.xiph.org/Talk:OggPCM

Most importantly:

    "Please don't make determination of the data format depend on
    multiple fields. Instead use an enumeration so that something
    like little endian 16 bit PCM can be specified as
    OGG_PCM_LE_PCM_16 and big endian 64 bit doubles can be specified
    as OGG_PCM_BE_FLOAT_64. This scheme is far more transparent and
    self documenting. If the format field is 8 bits, this scheme
    supports 256 formats; if it's 16 bits it will support 65536
    formats."

> > c) I think having separate fields for things like signed/
> >    unsigned/float and bit width is a mistake. I would suggest
> >    instead a single field that encodes all this information
> >    in an enumeration, e.g.:
> >
> >        OGG_PCM_U8       /* Unsigned 8 bit */
> >        OGG_PCM_S8       /* Signed 8 bit. */
> >        OGG_PCM_S16
> >        OGG_PCM_S24
> >        OGG_PCM_S32
> >        OGG_PCM_FLOAT32
> >        OGG_PCM_FLOAT64
> >
> >    and so on. This scheme makes it very difficult to get
> >    signed/unsigned and bit width messed up.

You didn't address this issue. Do you think it is unimportant?

> > d) Don't bother implementing unsigned PCM for bit widths
> >    greater than 8 bits. No other common file format uses
> >    it and those unsigned formats are a pain to work with.
>
> Problem with this is inflexibility. See, not every application must
> support every possible combination of formatting -

Exactly, a codec could support OGG_PCM_S16, OGG_PCM_FLOAT32 and that's
it. If the decoder in the codec wants to figure out if it supports
the current file it can do:

    if (format != OGG_PCM_S16 && format != OGG_PCM_FLOAT32)
        ooops_we_dont_handle_this ("some error message");

This is far less error prone than:

    if (! (bitwidth == 16 && is_signed && data_format == OGG_PCM_PCM) &&
        ! (bitwidth == 32 && data_format == OGG_PCM_FLOAT))
        ooops_we_dont_handle_this ("some error message");

> in fact, many will require a very small set of parameters going in,

My proposal has a small number of parameters; one. I don't think it's
practical to have zero parameters.

How about this:

    switch (format)
    {   case OGG_PCM_S8 :
        case OGG_PCM_FLOAT32 :
        case OGG_PCM_FLOAT64 :
            /* All OK. */
            break ;

        default:
            ooops_we_dont_handle_this ("some error message");
            break ;
    }

It's hard to get this wrong and it's obvious when it is wrong.

> ie, "it must be float
> of 16, 24, 32, or 64 bit"

There is no such thing as 16 and 24 bit float.

> Implementors will never, very likely, implement 32-bit unsigned int,

My point exactly. So why even make it possible? If that changes at
some point in the future, add the enumeration.

> > f) Encoding of channel information. In a two channel file,
> >    is the audio data a stereo image or two distinct mono
> >    channels? For a file with N (> 2) channels, are there
> >    pairs of channels which should be considered as stereo
> >    pairs, or do you want to place these stereo pairs as
> >    separate streams within a single Ogg container? What
> >    about multi channel surround sound (there are a number
> >    of different formats like 5.1 and 7.1) or quadraphonic?
> >    How are you going to specify which channel is which?
> >    Being able to encode this stuff easily is **vital**.
>
> I agree - this is something that wasn't on my radar until this morning
> when MikeS was asking about the channel layout in Vorbis/FLAC. How
> would you suggest this data be included in the binary header? I
> honestly have no experience with anything other than mono and stereo.

I have little more experience than you. I sent invitations for people
to join this discussion to the music-dsp mailing list. I hope somebody
knowledgeable will show up.

> > g) With things like surround sound, are you going to allow
> >    24 bit audio for the main stereo pair and 16 bits for
> >    the side channels? This might best be achieved using
> >    separate streams, but that would make channel information
> >    all the more important. Is it useful to have PCM for the
> >    main stereo pair and, say, Vorbis encoding for the side
> >    channels?
>
> Do people really do such things as encode different channels with
> different sample sizes (and, I assume, samplerates)?

Different bit widths make sense. You need high dynamic range on
your main stereo signal, but probably not on the side channels.

Different sample rates also make sense. If the main stereo pair is
sampled at 96kHz it makes sense to have the sub bass signal (ie all
the low frequencies) sampled at a much lower rate. For a sub-bass
signal 8kHz might be appropriate.

> I'd really like to prefer keeping a fixed samplesize/samplerate for all
> channels. I really doubt any Ogg audio codec is going to get that
> complicated anytime soon,

Really? What about a high quality Ogg video stream multiplexed with
a 5.1 audio stream?

> Is there anything else you've thought of that we've missed?

Not yet, but we haven't heard from anyone else yet. I would like to
see input (or at least an OK) from a large number of people in the
audio field.

Erik
-- 
Erik de Castro Lopo
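The comparison between the two checking styles can be made concrete with a compilable sketch. The constant names come from the thread; their numeric values, the struct `pcm_fields`, and both function names are assumptions for illustration only. The decoder in the example accepts signed 16 bit PCM and 32 bit float and nothing else.

```c
#include <stdbool.h>

/* Single-field scheme: one enumeration value per legal format.
   The values are hypothetical. */
enum {
    OGG_PCM_U8, OGG_PCM_S8, OGG_PCM_S16, OGG_PCM_S24, OGG_PCM_S32,
    OGG_PCM_FLOAT32, OGG_PCM_FLOAT64
};

/* Multi-field scheme: the format is spread over three header fields. */
enum { OGG_PCM_PCM, OGG_PCM_FLOAT };

struct pcm_fields {          /* hypothetical header layout */
    int  bit_width;
    bool is_signed;
    int  data_format;        /* OGG_PCM_PCM or OGG_PCM_FLOAT */
};

/* Single field: the accepted set is a plain list of cases. */
static bool supported_single_field(int format)
{
    switch (format) {
    case OGG_PCM_S16:
    case OGG_PCM_FLOAT32:
        return true;
    default:
        return false;
    }
}

/* Multiple fields: every accepted format is a conjunction, and the
   conjunctions must be combined with the right operator. */
static bool supported_multi_field(const struct pcm_fields *f)
{
    bool is_s16 = f->bit_width == 16 && f->is_signed
                  && f->data_format == OGG_PCM_PCM;
    bool is_f32 = f->bit_width == 32
                  && f->data_format == OGG_PCM_FLOAT;

    return is_s16 || is_f32;
}
```

Both functions accept the same two formats, but the multi-field version also has to cope with combinations such as unsigned 16 bit PCM or 16 bit float that the header can express, whereas the single-field version never sees them.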
Hi,

> The flexibility of this does, though, encourage stuff like 96bit audio.
> Anyone implementing a codec which uses this, and import/exports it, will
> also write the appropriate conversion OggStream plugin which will allow
> applications which only support, say, 16bit audio, to work with it.

Do you think the noise in your 16bit application will sound different
depending on whether it was converted from a 96bit or an 80bit audio
file of the same analog source?

If the argument for keeping these fields freeform is to support 96bit
audio, I'd say Erik is right that you shouldn't pick freeform fields.
As a practical matter, I don't see a direct use case for a
file/interchange format with a 540 dB dynamic range.

Thomas
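For a rough sense of scale, the usual rule of thumb is that each bit of an ideal linear integer PCM sample adds 20*log10(2), about 6.02 dB, of dynamic range. The sketch below applies that rule; the function name is hypothetical, and the rule ignores dither, noise shaping and floating point formats.

```c
/* 20 * log10(2): dynamic range contributed by each bit of an ideal
   linear integer PCM sample, in dB (rounded to four decimals). */
#define DB_PER_BIT 6.0206

/* Hypothetical helper: approximate dynamic range in dB of linear
   integer PCM with the given number of bits per sample. */
static double pcm_dynamic_range_db(int bits)
{
    return DB_PER_BIT * bits;
}
```

Under this rule 16 bits gives roughly 96 dB and 24 bits roughly 144 dB, while 96 bits comes out somewhere around 578 dB, which is the same order as the figure quoted above and far beyond anything an analog source or a converter can deliver.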
Erik de Castro Lopo wrote:

> f) Encoding of channel information. In a two channel file,
>    is the audio data a stereo image or two distinct mono
>    channels? For a file with N (> 2) channels, are there
>    pairs of channels which should be considered as stereo
>    pairs, or do you want to place these stereo pairs as
>    separate streams within a single Ogg container? What
>    about multi channel surround sound (there are a number
>    of different formats like 5.1 and 7.1) or quadraphonic?
>    How are you going to specify which channel is which?
>    Being able to encode this stuff easily is **vital**.

please don't forget ambisonics :-)

    2 channel UHJ
    4 channel 1st order
    9 channel 2nd order
    16 channel 3rd order

etc.
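The channel counts listed for 1st, 2nd and 3rd order follow the full-sphere ambisonic rule of (order + 1) squared channels. A minimal sketch, with a hypothetical function name; the 2 channel UHJ case is a separate matrixed encoding and is not covered by this formula.

```c
/* Number of channels in a full-sphere ambisonic signal of the given
   order: (order + 1)^2. Returns -1 for a negative order. */
static int ambisonic_channel_count(int order)
{
    if (order < 0)
        return -1;
    return (order + 1) * (order + 1);
}
```

A channel-layout field would therefore need to carry the ambisonic order (or an explicit layout identifier), since the channel count alone cannot distinguish, say, 4 channel 1st order ambisonics from quadraphonic.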
I threw a rough draft of an alternative format incorporating the
comments received so far in this discussion on the wiki:

    http://wiki.xiph.org/index.php/OggPCM#Format

Oliver,

This seems to me like it would support the ambisonic requirements you
mention, though it doesn't (and I imagine won't) describe the mic
locations. Somebody who actually uses that info could probably define
extra header pages for a later version of this spec. I hadn't even
heard of ambisonics until your post, to be honest.

John

oliver oli wrote:

> Erik de Castro Lopo wrote:
>
>> f) Encoding of channel information. In a two channel file,
>>    is the audio data a stereo image or two distinct mono
>>    channels? For a file with N (> 2) channels, are there
>>    pairs of channels which should be considered as stereo
>>    pairs, or do you want to place these stereo pairs as
>>    separate streams within a single Ogg container? What
>>    about multi channel surround sound (there are a number
>>    of different formats like 5.1 and 7.1) or quadraphonic?
>>    How are you going to specify which channel is which?
>>    Being able to encode this stuff easily is **vital**.
>
> please don't forget ambisonics :-)
>
>     2 channel UHJ
>     4 channel 1st order
>     9 channel 2nd order
>     16 channel 3rd order
>
> etc.