Hi all,

I updated the wiki with another rev of this format. Updates include support for 43 formats in 14 coding schemes, as derived from the ALSA API. This seemed like a good way to get a list of the formats in common use out there, so it should be fairly comprehensive.

Modifications to the "rev 2" format:

1. Expanded the 'id' field to support more than 7 formats. The format id is now 16 bits, but the lower 6 bits are reserved to describe the storage packing.

2. Removed the ID word on the data packets and added a "number of comments" field to the header packet. This was done to preserve the alignment of the data payload.

3. Changed the byte ordering in the header packet to big endian, to be consistent with network byte order.

New since "rev 2":

1. Added the notion of chunked vs interleaved storage.

For discussion:

1. How useful are the 6 reserved bits in the format id? I'm a little uncomfortable with them, since the field is only 70% (45/64) efficient and I can't think of anything really useful to do with the data, but maybe someone else can. On the other hand, support for 1024 different coding types in the upper 10 bits seems sufficient to me, and you could in theory create a flag in a later minor rev to use some of the reserved space in the flags field as extra bits for the coding type, so it may not be too bad to keep it.

2. Does supporting both chunked and interleaved storage place too much of a burden on applications? There are definite performance-related advantages to storing data in a chunked format for some operations.

3. Does Ogg support zero-length data packets? This was something I added as an afterthought, to support the case where an application might not know that the packet it just stuffed into the stream was actually the last one. I thought it might be useful to be able to store a zero-length data packet with the EOS flag set, which an application could use to finalize the stream.

Anyway, have at it, and thanks in advance for the feedback.

Cheers,
John
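P.S. For concreteness, here is one way the 16-bit id could be carved up, with the coding type in the upper 10 bits and the packing description in the reserved lower 6. The names and helpers below are only illustrative; nothing on the wiki uses them.

    #include <stdint.h>

    /* Illustrative split of the proposed 16-bit format id:
     *   bits 15..6  coding type (up to 1024 values)
     *   bits  5..0  reserved / storage packing description
     * None of these names are part of the draft spec. */
    #define OGGPCM_CODING_SHIFT  6
    #define OGGPCM_PACKING_MASK  0x003Fu

    static inline uint16_t oggpcm_id_make(uint16_t coding, uint16_t packing)
    {
        return (uint16_t)((coding << OGGPCM_CODING_SHIFT) |
                          (packing & OGGPCM_PACKING_MASK));
    }

    static inline uint16_t oggpcm_id_coding(uint16_t id)
    {
        return (uint16_t)(id >> OGGPCM_CODING_SHIFT);
    }

    static inline uint16_t oggpcm_id_packing(uint16_t id)
    {
        return (uint16_t)(id & OGGPCM_PACKING_MASK);
    }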
> 3. Does Ogg support zero length data packets? This was something I added
> as an afterthought, to support the case where an application might not
> know that the packet it just stuffed into the stream was actually the last
> packet, so I thought it might be useful to be able to store a zero length
> data packet with the EOS flag set that an application could use to
> finalize the stream.

Ogg explicitly supports empty pages with the EOS flag set for precisely this purpose. At least in theory (i.e. according to spec); I haven't actually tested this with real applications. So codecs don't need zero-length packets to tag this case. I'm not sure whether zero-length packets would work, but I see no obvious reason why they couldn't; I think they probably do.

Mike
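For what it's worth, at the libogg API level finalizing with an empty packet would look something like the sketch below. This is untested, as noted above, and the function name is just for the example.

    #include <ogg/ogg.h>
    #include <string.h>

    /* Sketch: submit a zero-length packet carrying only the EOS flag.
     * Untested against real demuxers; libogg itself appears happy to
     * accept a packet with bytes == 0. */
    static void finish_stream(ogg_stream_state *os,
                              ogg_int64_t last_granulepos,
                              ogg_int64_t next_packetno)
    {
        static unsigned char empty[1];
        ogg_packet op;

        memset(&op, 0, sizeof(op));
        op.packet     = empty;            /* no payload bytes           */
        op.bytes      = 0;                /* zero-length packet         */
        op.e_o_s      = 1;                /* mark end of stream         */
        op.granulepos = last_granulepos;  /* repeat the final position  */
        op.packetno   = next_packetno;

        ogg_stream_packetin(os, &op);
        /* caller then drains pages with ogg_stream_flush()/_pageout() */
    }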
Hi John,

jkoleszar@on2.com wrote:
> Hi all,
>
> I updated the wiki with another rev of this format. Updates include
> support for 43 formats in 14 coding schemes, as derived from the ALSA API.
> This seemed like a good way to get a list of what the formats in common
> use out there are, so it should be fairly comprehensive.

Unfortunately the ALSA API defines a number of formats which are in practice extremely rare, in particular any unsigned int format larger than 8 bits. For instance, the only unsigned int type that libsndfile supports is unsigned 8 bit.

I would also strongly advise against supporting **any** ADPCM format. These things are a PITA to support and some cannot be supported without extending the header. For instance, Microsoft's ADPCM requires that a set of 8 coefficients needed for decoding be sent in the header. Most of the other ADPCM formats have block sizes that need to be sent. All in all this is a huge PITA. In comparison to FLAC, Speex and Vorbis, ADPCM formats have little to offer.

> Modifications to the "rev 2" format:
> 1. Expanded the 'id' field to support more than 7 formats. Format id is
> now 16 bits, but the lower 6 bits are reserved to describe the storage
> packing.

I still think that assigning meaning to bits within the format field is a mistake. Specifying bits like this could only be useful if you expect the decoder to generate code on the fly when it gets asked to decode, say, 16 bit, unsigned, little endian. Auto-generated code that automagically supports all of these formats is significantly harder to write and debug than the equivalent set of single-purpose decoders, so I would suggest that this auto-magic stuff is a bad idea. Ergo, assigning meaning to the bits is a bad idea as well.

> New since "rev 2":
> 1. Added the notion of chunked vs interleaved storage.

Again, I strongly recommend against allowing non-interleaved data. It simply complicates everything far more than necessary.

> For discussion:
> 1. How useful are the 6 reserved bits in the format id? I'm a little

Reserved for what? I can't possibly think what they could be used for.

> uncomfortable with them, since the field is only 70% (45/64) efficient,

This is a file header. Even under the most bloated scheme we could think of, it's unlikely to be more than 100 bytes and it will be followed by hundreds of kilobytes at least of audio data. So why are we trying to conserve a couple of bits in the header?

> 2. Does supporting both chunked and interleaved storage place too much of
> a burden on applications?

I believe it does.

Cheers,
Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"That being done, all you have to do next is call free() slightly less often than malloc(). You may want to examine the Solaris system libraries for a particularly ambitious implementation of this technique." -- Eric O'Dell (comp.lang.dylan)
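As a concrete sketch of the "straight enumeration plus single-purpose decoders" approach, something along these lines is all that's needed; the enum values and function names are invented for the example.

    #include <stddef.h>
    #include <stdint.h>

    /* Each enumerated format gets its own small decoder; no meaning is
     * packed into the bits of the id itself. */
    typedef enum {
        OGGPCM_FMT_S16_LE = 0,
        OGGPCM_FMT_S16_BE = 1,
        OGGPCM_FMT_U8     = 2
        /* ... one value per supported format ... */
    } oggpcm_fmt;

    static void decode_s16_le(const uint8_t *in, float *out, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            int16_t s = (int16_t)(in[2 * i] | (in[2 * i + 1] << 8));
            out[i] = s / 32768.0f;
        }
    }

    static void decode_s16_be(const uint8_t *in, float *out, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            int16_t s = (int16_t)((in[2 * i] << 8) | in[2 * i + 1]);
            out[i] = s / 32768.0f;
        }
    }

    static void decode_u8(const uint8_t *in, float *out, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = (in[i] - 128) / 128.0f;
    }

    static int decode_samples(oggpcm_fmt fmt, const uint8_t *in,
                              float *out, size_t n)
    {
        switch (fmt) {
        case OGGPCM_FMT_S16_LE: decode_s16_le(in, out, n); return 0;
        case OGGPCM_FMT_S16_BE: decode_s16_be(in, out, n); return 0;
        case OGGPCM_FMT_U8:     decode_u8(in, out, n);     return 0;
        default:                return -1;  /* unknown format id */
        }
    }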
> Unfortunately the ALSA API defines a number of formats which are
> in practice extremely rare, in particular any unsigned int format
> larger than 8 bits. For instance, the only unsigned int type that
> libsndfile supports is unsigned 8 bit.

I expected this; it just seemed like a good starting point to get more than 7 formats on the table. Specifically, I wanted to get the logarithmic coding formats in there to make it clear this wasn't just for the integer ones.

> I would also strongly advise against supporting **any** ADPCM
> format. These things are a PITA to support and some cannot be
> supported without extending the header. For instance, Microsoft's
> ADPCM requires that a set of 8 coefficients needed for decoding
> be sent in the header. Most of the other ADPCM formats have block
> sizes that need to be sent. All in all this is a huge PITA. In
> comparison to FLAC, Speex and Vorbis, ADPCM formats have little
> to offer.

No objection here. I'd like to see someone other than myself go through and cull the list of formats into whatever a practical subset is. As long as it does 16 bit signed little endian interleaved, I'll be happy.

> I still think that assigning meaning to bits within the format field
> is a mistake. Specifying bits like this could only be useful if
> you expect the decoder to generate code on the fly when it gets
> asked to decode, say, 16 bit, unsigned, little endian. Auto-generated
> code that automagically supports all of these formats is significantly
> harder to write and debug than the equivalent set of single-purpose
> decoders, so I would suggest that this auto-magic stuff is a bad idea.
> Ergo, assigning meaning to the bits is a bad idea as well.

I'm fine with a straight enumeration. I put the extra fields in there more as a discussion point, saying "if you want to have some meaning here, this is how I'd break it out." As I tried to make clear, I can't really think of a good use for it. The only thing I can think of would be code that extracts the 4 bytes for the sample and then calls some other function based on the coding type to convert it, but calling a function on every sample is always a bad idea.

> Again, I strongly recommend against allowing non-interleaved data.
> It simply complicates everything far more than necessary.

This is probably the only point we may disagree on. Having the data chunked opens the door for a whole host of SIMD-optimized filters, and it definitely could be a useful internal representation along a filter chain. As long as you're only dealing with byte-aligned data, I don't think the storage and retrieval is that difficult. I agree it's probably not very useful in the general case, but there are some cases where it is, so it may be worth defining. I'm imagining the case of writing a command line filter chain, for instance:

$ snd_capture | deinterlace | denoise | normalize | interlace | compress

(yes, we'll ignore the fact that you can't normalize in one pass...)

> Reserved for what? I can't possibly think what they could be used for.

These were the 6 bits of the format id I reserved for the storage size and endianness, so that's what they could be used for. As for what an application would use them for, I agree, I'm at a loss.

> This is a file header. Even under the most bloated scheme we could
> think of, it's unlikely to be more than 100 bytes and it will be
> followed by hundreds of kilobytes at least of audio data. So why
> are we trying to conserve a couple of bits in the header?

Agreed. I'm an embedded/dsp guy by trade, so these are the things I think of. The comment I was trying to make is that reserving 30% of a word for "this can't be described by these fields" is ugly, ugly, ugly.

John
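Coming back to the chunked vs interleaved point above, here is a minimal sketch of the two layouts for 16-bit samples. "Chunked" here means each channel stored contiguously; the function names are just for the example.

    #include <stddef.h>
    #include <stdint.h>

    /* interleaved:  in[frame * channels + ch]    (L R L R ...)
     * chunked:      out[ch][frame]               (all of L, then all of R) */
    static void deinterleave(const int16_t *in, int16_t **out,
                             size_t frames, size_t channels)
    {
        for (size_t ch = 0; ch < channels; ch++)
            for (size_t f = 0; f < frames; f++)
                out[ch][f] = in[f * channels + ch];
    }

    static void interleave(int16_t *const *in, int16_t *out,
                           size_t frames, size_t channels)
    {
        for (size_t ch = 0; ch < channels; ch++)
            for (size_t f = 0; f < frames; f++)
                out[f * channels + ch] = in[ch][f];
    }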
jkoleszar@on2.com wrote:
> I updated the wiki with another rev of this format. Updates include
> support for 43 formats in 14 coding schemes, as derived from the ALSA API.

As an interested bystander, I see that this proposal still has only one format and one rate field for possibly many channels. Earlier someone made the point that you might want to store main and side/back channels with different width and/or rate. Using different logical streams was suggested as an alternative, but has that been discussed enough?

Still only as that interested bystander, I expect different rates to probably be a pain (what's a frame?) but per-channel format sounded fairly straightforward.

Another point made was that of encoding channel information. I.e., are N channels N mono channels, are some pairs to be considered stereo pairs, some M-tuples to be considered <M-phonic> groups?

Don't see anything on the wiki about those issues, so thought I'd bring them up again just in case they slipped through. Not lobbying for anything, just interested...

Rene.
Rene Herman wrote:
> As an interested bystander, I see that this proposal still has only one
> format and one rate field for possibly many channels. Earlier someone
> made the point that you might want to store main and side/back channels
> with different width and/or rate. Using different logical streams was
> suggested as an alternative, but has that been discussed enough?
>
> Still only as that interested bystander, I expect different rates to
> probably be a pain (what's a frame?) but per-channel format sounded
> fairly straightforward.

The only sensible way to do this is to have more than one logical stream, with all channels in the same stream having the same sample rate.

> Another point made was that of encoding channel information. I.e., are N
> channels N mono channels, are some pairs to be considered stereo pairs,
> some M-tuples to be considered <M-phonic> groups?

See http://wiki.xiph.org/index.php/OggPCM2

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"Men never do evil so completely and cheerfully as when they do it from religious conviction." -- Blaise Pascal, mathematician, 1670
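Purely as an illustration of the kind of channel map being asked about, and not the actual layout on the OggPCM2 wiki page, such a field amounts to something like the following; all names here are hypothetical.

    #include <stdint.h>

    /* Hypothetical sketch only -- not the OggPCM2 wiki layout.  Each
     * channel in a logical stream is tagged with a speaker role, which
     * answers "N mono channels, stereo pairs, or an M-phonic group?" */
    typedef enum {
        CHAN_MONO = 0,
        CHAN_FRONT_LEFT,
        CHAN_FRONT_RIGHT,
        CHAN_REAR_LEFT,
        CHAN_REAR_RIGHT,
        CHAN_LFE
        /* ... */
    } channel_role;

    typedef struct {
        uint8_t      channels;   /* channels in this logical stream */
        channel_role role[8];    /* per-channel speaker assignment  */
    } channel_map;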