Arc wrote:

> Does endian vary widely for raw audio codecs,

Well, there are really only two endiannesses, big and little. WAV is
usually little endian but there is also a (very rare) big endian
version. AIFF is usually big endian but also supports little endian
encoding. CAF, AU, IRCAM and a number of others support both
endiannesses equally.

> or would it be reasonable to settle on one standard and expect all
> codecs which don't comply with the "norm" to convert to the correct
> endian?

Not reasonable.

> The bits per sample field covers this. Set this to "64" and set the
> data type to "float" and it "should just work"...

See my comment on the wiki: http://wiki.xiph.org/Talk:OggPCM

Most importantly:

    "Please don't make determination of the data format depend on
    multiple fields. Instead use an enumeration so that something like
    little endian 16 bit PCM can be specified as OGG_PCM_LE_PCM_16 and
    big endian 64 bit doubles can be specified as OGG_PCM_BE_FLOAT_64.
    This scheme is far more transparent and self-documenting. If the
    format field is 8 bits, this scheme supports 256 formats; if it's
    16 bits it will support 65536 formats."

> > c) I think having separate fields for things like signed/
> >    unsigned/float and bit width is a mistake. I would suggest
> >    instead a single field that encodes all this information
> >    in an enumeration. Ie:
> >
> >        OGG_PCM_U8      /* Unsigned 8 bit */
> >        OGG_PCM_S8      /* Signed 8 bit. */
> >        OGG_PCM_S16
> >        OGG_PCM_S24
> >        OGG_PCM_S32
> >        OGG_PCM_FLOAT32
> >        OGG_PCM_FLOAT64
> >
> >    and so on. This scheme makes it very difficult to get
> >    signed/unsigned and bit width mixed up.

You didn't address this issue. Do you think it is unimportant?

> > d) Don't bother implementing unsigned PCM for bit widths
> >    greater than 8 bits. No other common file format uses
> >    it and those unsigned formats are a pain to work with.
>
> Problem with this is inflexibility.
> See, not every application must support every possible combination of
> formatting -

Exactly. A codec could support OGG_PCM_S16 and OGG_PCM_FLOAT32 and
that's it. If the decoder wants to figure out whether it supports the
current file it can do:

    if (format != OGG_PCM_S16 && format != OGG_PCM_FLOAT32)
        ooops_we_dont_handle_this ("some error message") ;

This is far less error prone than:

    if (! (bitwidth == 16 && is_signed && data_format == OGG_PCM_PCM)
            && ! (bitwidth == 32 && data_format == OGG_PCM_FLOAT))
        ooops_we_dont_handle_this ("some error message") ;

> in fact, many will require a very small set of parameters going in,

My proposal has a small number of parameters: one. I don't think it's
practical to have zero parameters. How about this:

    switch (format)
    {   case OGG_PCM_S8 :
        case OGG_PCM_FLOAT32 :
        case OGG_PCM_FLOAT64 :
            /* All OK. */
            break ;

        default :
            ooops_we_dont_handle_this ("some error message") ;
            break ;
        } ;

It's hard to get this wrong and it's obvious when it is wrong.

> ie, "it must be float of 16, 24, 32, or 64 bit"

There is no such thing as a 16 or 24 bit float.

> Implementors will never, very likely, implement 32-bit unsigned int,

My point exactly. So why even make it possible? If that changes at
some point in the future, add the enumeration then.

> > f) Encoding of channel information. In a two channel file,
> >    is the audio data a stereo image or two distinct mono
> >    channels? For a file with N (> 2) channels, are there
> >    pairs of channels which should be considered as stereo
> >    pairs, or do you want to place these stereo pairs as
> >    separate streams within a single ogg container? What
> >    about multi channel surround sound (there are a number
> >    of different formats like 5.1 and 7.1) or quadraphonic?
> >    How are you going to specify which channel is which?
> >    Being able to encode this stuff easily is **vital**.
>
> I agree - this is something that wasn't on my radar until this morning
> when MikeS was asking about the channel layout in Vorbis/FLAC.
> How would you suggest this data be included in the binary header? I
> honestly have no experience with anything other than mono and stereo.

I have little more experience than you. I have sent invitations to
join this discussion to the music-dsp mailing list. I hope somebody
knowledgeable will show up.

> > g) With things like surround sound, are you going to allow
> >    24 bit audio for the main stereo pair and 16 bits for
> >    the side channels? This might best be achieved using
> >    separate streams, but that would make channel information
> >    all the more important. Is it useful to have PCM for the
> >    main stereo pair and say Vorbis encoding for the side
> >    channels?
>
> Do people really do such things as encode different channels with
> different sample sizes (and, I assume, samplerates)?

Different bit widths make sense. You need high dynamic range on your
main stereo signal, but probably not on the side channels.

Different sample rates also make sense. If the main stereo pair is
sampled at 96kHz it makes sense to have the sub-bass signal (ie all
the low frequencies) sampled at a much lower rate. For a sub-bass
signal 8kHz might be appropriate.

> I'd really like to prefer keeping a fixed samplesize/samplerate for
> all channels. I really doubt any Ogg audio codec is going to get that
> complicated anytime soon,

Really? What about a high quality Ogg video stream multiplexed with a
5.1 audio stream?

> Is there anything else you've thought of that we've missed?

Not yet, but we haven't heard from anyone else yet. I would like to
see input (or at least an OK) from a large number of people in the
audio field.

Erik
--
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
'Unix beats Windows' - says Microsoft!
http://blogs.zdnet.com/Murphy/index.php?p=459
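Erik's single-field scheme could be sketched as a plain C enumeration.
The names below follow his examples from the thread; the numeric
values and the exact set of formats are illustrative only, not from
any spec:

```c
/* Hypothetical format codes: one value describes a complete sample
 * format (width, signedness, int/float, endianness). Names follow the
 * examples in the thread; values are illustrative. */
typedef enum
{   OGG_PCM_U8          = 0,    /* Unsigned 8 bit */
    OGG_PCM_S8          = 1,    /* Signed 8 bit */
    OGG_PCM_LE_PCM_16   = 2,    /* Little endian, signed 16 bit */
    OGG_PCM_BE_PCM_16   = 3,    /* Big endian, signed 16 bit */
    OGG_PCM_LE_PCM_24   = 4,
    OGG_PCM_BE_PCM_24   = 5,
    OGG_PCM_LE_FLOAT_32 = 6,    /* Little endian, IEEE 754 single */
    OGG_PCM_BE_FLOAT_64 = 7     /* Big endian, IEEE 754 double */
} ogg_pcm_format ;

/* A decoder that handles only two of these needs one comparison per
 * supported format - no separate width/sign/endian bookkeeping. */
static int format_supported (ogg_pcm_format fmt)
{
    return fmt == OGG_PCM_LE_PCM_16 || fmt == OGG_PCM_LE_FLOAT_32 ;
}
```

With an 8 bit format field this still leaves well over 200 codes free
for formats nobody has thought of yet.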
On Thu, Nov 10, 2005 at 07:03:43PM +1100, Erik de Castro Lopo wrote:

> WAV is usually little endian but there is also a (very rare) big
> endian version. AIFF is usually big endian but also supports little
> endian encoding. CAF, AU, IRCAM and a number of others support both
> endiannesses equally.

This doesn't seem to be a large issue - a single bit in the header
could specify it, 0=MSB, 1=LSB, or vice versa. VorbisFile will export
either endianness, so this seems to be the end of this part of the
debate.

> "Please don't make determination of the data format depend on
> multiple fields. Instead use an enumeration so that something like
> little endian 16 bit PCM can be specified as OGG_PCM_LE_PCM_16 and
> big endian 64 bit doubles can be specified as OGG_PCM_BE_FLOAT_64.
> This scheme is far more transparent and self-documenting. If the
> format field is 8 bits, this scheme supports 256 formats; if it's
> 16 bits it will support 65536 formats."

You're still working with the philosophy of the FourCC world, where
based on whether a plugin or application supports a 32-bit identifier
you know whether it has full support or no support.

We aren't working by that philosophy. We do not need to maintain a
table of predefined formats, extended each time someone wants to use a
new format, since no application needs to support every possible
combination of encoding parameters.

Honestly, as far as I'm concerned unsigned samples can go away...
almost nothing uses 8-bit samples anymore, and unsigned 8-bit even
less so. However, support for (ie) 48-bit float should not have to be
specially created; the values for how many bits to use and whether
it's int or float should be separate fields, as should the number of
channels, etc.

On Thu, Nov 10, 2005 at 03:44:53PM +0800, illiminable wrote:

> I think this is the wrong approach. FLAC and other codecs operate on
> a tighter subset because they have to perform complex transformations
> on the data, and supporting too many types increases complexity.
> A raw format essentially needs no processing; it just needs copying
> into a buffer that supports that type of data.

The complexity isn't increased by the added flexibility, and that
flexibility completely eliminates the very issue you raise with FLAC -
FLAC was designed to losslessly support every common audio format, and
yet you find its subset of formats too tight. Don't you see the
inherent problem here? It comes back to someone deciding which formats
will be valid and which ones won't, and enforcing that with an index
into a table of supported formats, versus leaving it freeform for
future implementors to use.

Changing the spec a bit, so that the sample size must be a multiple of
8 and may not exceed 128 bits (a 4-bit field), seems like something
worthwhile to eliminate the padding issue. But between float and int,
why /not/ allow someone to do something insane like 96-bit audio? 20
years ago, we thought that 16 bit, or perhaps 24 bit, was the maximum
we could do. Why would anyone want more than 24 bit? And yet, the
issue was raised that 64-bit audio samples are necessary. In another
20 years, will people be arguing that 128-bit samples are necessary?
Or that 48-bit is a good tradeoff between 32-bit and 64-bit?

No - it does not increase complexity, nor does it impose any
requirements on implementations, since instead of a 32-bit identifier
we use the entire first packet of the stream to check for
compatibility. No, your media player does -NOT- have to support 256
channel audio, nor must it support ambisonics, or 64-bit audio, etc.
There's no reason, however, to force everything into artificial,
arbitrary limitations based on what we believe is reasonable for
today. If a media player only supports a subset of what the codec
supports, that's completely fine and expected.

> I have little more experience than you. I have sent invitations to
> join this discussion to the music-dsp mailing list.
> I hope somebody knowledgeable will show up.

There's a difference between experience and differences of design
philosophy. This isn't an issue of right or wrong, but of two
different styles of designing codecs.

Raw FourCC codecs are each set up for a different format, or a small
set of formats. RIFF/WAVE uses a subset of formats, expecting all
applications which support its FourCC to understand all those formats.
Again, this is done under the assumption that a codec should be either
fully supported or unsupported.

Whereas not all audio codecs are going to support even the subset that
you provided (64-bit float, for example). Nor are all applications
which use Ogg going to support anything but 16-bit signed int, nor
should they be expected to. I think it's reasonable to do away with
unsigned because modern codecs just aren't going to use it, but I'm
not going to try to predict whether someone will want to use 48-bit
audio, or 128-bit audio, and whether they'll use int or float.

> Different bit widths make sense. You need high dynamic range on your
> main stereo signal, but probably not on the side channels.
>
> Different sample rates also make sense. If the main stereo pair is
> sampled at 96kHz it makes sense to have the sub-bass signal (ie all
> the low frequencies) sampled at a much lower rate. For a sub-bass
> signal 8kHz might be appropriate.

I think, for these, given Ogg's use of granulepos and the syncing
complexity that allowing different channels to have different rates
and sizes would introduce, this is something best left to muxed raw
channels, with any codec which supports this drawing from the
different raw channels.

> Not yet, but we haven't heard from anyone else yet.
> I would like to see input (or at least an OK) from a large number of
> people in the audio field.

I think this is good to emphasize - it's OK to support some
combinations of formats which are never used, since they'll simply be
ignored if they're unfavorable to implement, but missing something
necessary is a mistake we need to make sure not to make.

I've put a reduced config set on the wiki.

--
The recognition of individual possibility,
 to allow each to be what she and he can be,
  rests inherently upon the availability of knowledge;
The perpetuation of ignorance is the beginning of slavery.

from "Die Gedanken Sind Frei": Free Software and the Struggle for Free
Thought, by Eben Moglen, General counsel of the Free Software Foundation
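Arc's two concrete header suggestions - a single endianness bit, and a
sample size constrained to a multiple of 8, at most 128 bits, fitting
a 4-bit field - could be sketched like this. The flag layout, bit
positions, and the (bits / 8) - 1 encoding are hypothetical
illustrations, not anything agreed in the thread:

```c
#include <stdint.h>

/* Hypothetical flags byte: bit 0 carries endianness,
 * 1 = little endian (LSB first), 0 = big endian (MSB first). */
#define OGG_PCM_FLAG_LITTLE_ENDIAN  0x01u

static int is_little_endian (uint8_t flags)
{
    return (flags & OGG_PCM_FLAG_LITTLE_ENDIAN) != 0 ;
}

/* Hypothetical 4-bit sample size field f storing (bits / 8) - 1, so
 * f = 0..15 covers sample sizes 8..128 in steps of 8 - exactly the
 * "multiple of 8, max 128 bits" constraint. */
static unsigned field_to_bits (uint8_t f)
{
    return ((unsigned) (f & 0x0fu) + 1u) * 8u ;
}

static uint8_t bits_to_field (unsigned bits)    /* assumes bits valid */
{
    return (uint8_t) (bits / 8u - 1u) ;
}
```

The nice property of this encoding is that every value of the 4-bit
field is legal, so there is no padding or reserved-value handling.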
On Thu, Nov 10, 2005 at 01:35:47PM -0800, Arc wrote:

> > "Please don't make determination of the data format depend on
> > multiple fields. Instead use an enumeration so that something like
> > little endian 16 bit PCM can be specified as OGG_PCM_LE_PCM_16 and
> > big endian 64 bit doubles can be specified as OGG_PCM_BE_FLOAT_64.
> > This scheme is far more transparent and self-documenting. If the
> > format field is 8 bits, this scheme supports 256 formats; if it's
> > 16 bits it will support 65536 formats."
>
> You're still working with the philosophy of the FourCC world, where
> based on whether a plugin or application supports a 32-bit identifier
> you know whether it has full support or no support.

It's not just the FourCC world; lots of unixland audio code works this
way too. I agree with de Castro Lopo. Making a general format and then
saying people are free to implement only the subset they care about is
contrary to Xiph's design philosophy and will lead directly to
interoperability issues. It's better to specify and require a useful
subset here.

-r
Arc wrote:

> However, support for (ie) 48-bit float should not have to be
> specially created,

Where are you going to find a 48 bit float? Is there an IEEE standard
for that? I know some floating point DSP chips use a 48 bit float
internally, but if there is more than one such chip their formats are
unlikely to be compatible, and they certainly cannot be read by
standard CPUs without pulling each value apart into separate sign,
mantissa and exponent fields and then recreating a host CPU compatible
floating point value from those.

> But between float and int, why /not/ allow someone to do something
> insane like 96-bit audio?

I think putting constraints on insane people is a good thing. It saves
the rest of us a lot of grief.

> 20 years ago, we thought that 16 bit, or perhaps 24 bit, was the
> maximum we could do.

The only place I've ever heard of 96 bit anything is in image
processing, where R, G and B were each encoded as a 32 bit float. If
the RGB explanation is not what you are thinking about, do you realise
that 96 bits gives a dynamic range of over 500 decibels? That means
that if the largest sample corresponds to 1000 volts, the smallest
sample step corresponds to roughly 1e-26 volts - many orders of
magnitude below the thermal noise floor of any real circuit, and far
smaller than anything physically measurable.

> And yet, the issue was raised that 64-bit audio samples are
> necessary.

A couple of points:

  - 64 bit float is supported natively by most CPUs; 96 bit and 48
    bit floats are not.

  - 64 bit float is used to prevent rounding errors in calculations.
    If you have a program that needs to pass data to another program
    (via a file) for processing and then get the data back (via a
    file again) for further processing, 64 bit float is a sensible
    option.

> This isn't the issue of right or wrong, but two different styles of
> designing codecs.

See my comments above re insanity.
Erik
--
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"Unix and C are the ultimate computer viruses." -- Richard P Gabriel
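Erik's dynamic-range figure is easy to verify: n bits of integer PCM
give a range of 20 * log10(2^n), roughly 6.02 dB per bit, so 96 bits
is indeed well over 500 dB. A minimal check:

```c
#include <math.h>

/* Dynamic range in dB of n-bit integer samples: 20 * log10(2^n),
 * rewritten as n * 20 * log10(2) to keep the arithmetic in range
 * even for large bit counts. */
static double dynamic_range_db (int bits)
{
    return bits * 20.0 * log10 (2.0) ;
}
```

This gives about 96.3 dB for 16 bits, 144.5 dB for 24 bits, and
roughly 578 dB for 96 bits.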
Arc wrote:

> You're still working with the philosophy of the FourCC world, where
> based on whether a plugin or application supports a 32-bit identifier
> you know whether it has full support or no support.
>
> We aren't working by that philosophy. We do not need to maintain a
> table of predefined formats, extended each time someone wants to use
> a new format, since no application needs to support every possible
> combination of encoding parameters.

You could pack the data into the low order part of a 32 bit word and
treat the upper part as extended data; then people could use it like
an enumeration if they want to.

I think that this is different from the FourCC issue we've argued
elsewhere, though. The point of fully specifying everything is to
allow applications to operate on data types that weren't known when
the applications were created. It's for forward compatibility of the
applications, not of the format. Arguing about whether to specify
fields for everything now or to create enumerations for everything
later has little bearing on what the format will be able to hold in
the future. Actually, requiring fields constrains you a bit, because
you have to identify all the relevant fields up front and hope that
nobody invents a new one. Also, it's really hard to write applications
that will operate sensibly on data types that haven't been invented
yet, so striving for forward compatibility of the applications
probably isn't worth much effort.

Every audio library and driver I've seen uses an enumeration to
describe the data, and I'm sure plenty of smart people have had this
discussion before. In the end, either way is fine with me, since both
fit my limited purposes, but I think this is a small issue compared to
how to support multiple sampling rates.
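The low-order/extended split suggested here could look like the
following. The 16/16 split, the mask, and the sample values are
hypothetical illustrations, not a proposal from the thread:

```c
#include <stdint.h>

/* Hypothetical packing: low 16 bits hold a basic enumeration value,
 * high 16 bits are reserved for extended data. An application that
 * only cares about common formats can compare the whole 32 bit word,
 * enumeration-style, because the extended part is zero for the basic
 * formats. */
#define OGG_PCM_BASIC_MASK  0x0000ffffu

static uint16_t basic_format (uint32_t word)
{
    return (uint16_t) (word & OGG_PCM_BASIC_MASK) ;
}

static uint16_t extended_data (uint32_t word)
{
    return (uint16_t) (word >> 16) ;
}
```

A simple decoder checks only `basic_format()`; a more capable one also
inspects `extended_data()` for refinements it understands.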
On 11/11/05, Arc <arc@xiph.org> wrote:

> Honestly, as far as I'm concerned unsigned samples can go away...
> almost nothing uses 8-bit samples anymore, and unsigned 8-bit even
> less so.

Speech samples used for speech analysis are commonly 8-bit mono. If I
wanted to put them in Ogg for easy editing/muxing/whatever, I would
definitely want them to be supported. Just my two cents. :)