Hi all,

I updated the wiki with another rev of this format. Updates include support for 43 formats in 14 coding schemes, as derived from the ALSA API. This seemed like a good way to get a list of the formats in common use out there, so it should be fairly comprehensive.

Modifications to the "rev 2" format:

1. Expanded the 'id' field to support more than 7 formats. The format id is now 16 bits, but the lower 6 bits are reserved to describe the storage packing.

2. Removed the ID word on the data packets and added a "number of comments" field to the header packet. This was done to preserve the alignment of the data payload.

3. Changed the byte ordering in the header packet to big endian, to be consistent with network byte order.

New since "rev 2":

1. Added the notion of chunked vs interleaved storage.

For discussion:

1. How useful are the 6 reserved bits in the format id? I'm a little uncomfortable with them, since the field is only 70% (45/64) efficient and I can't think of anything really useful to do with the data, but maybe someone else can. On the other hand, support for 1024 different coding types in the upper 10 bits seems sufficient to me, and you could in theory create a flag in a later minor rev to use some of the reserved space in the flags field as extra bits for the coding type, so it may not be too bad to keep it.

2. Does supporting both chunked and interleaved storage place too much of a burden on applications? There are definite performance-related advantages to storing data in a chunked format for some operations.

3. Does Ogg support zero-length data packets? This was something I added as an afterthought, to support the case where an application might not know that the packet it just stuffed into the stream was actually the last one. I thought it might be useful to be able to store a zero-length data packet with the EOS flag set, which an application could use to finalize the stream.

Anyway, have at it, and thanks in advance for the feedback.

Cheers,
John
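P.S. For concreteness, here is one way the 16-bit id could be carved up, with the coding type in the upper 10 bits and the packing description in the reserved lower 6. The names and helpers below are only illustrative; nothing on the wiki uses them.

    #include <stdint.h>

    /* Illustrative split of the proposed 16-bit format id:
     *   bits 15..6  coding type (up to 1024 values)
     *   bits  5..0  reserved / storage packing description
     * None of these names are part of the draft spec. */
    #define OGGPCM_CODING_SHIFT  6
    #define OGGPCM_PACKING_MASK  0x003Fu

    static inline uint16_t oggpcm_id_make(uint16_t coding, uint16_t packing)
    {
        return (uint16_t)((coding << OGGPCM_CODING_SHIFT) |
                          (packing & OGGPCM_PACKING_MASK));
    }

    static inline uint16_t oggpcm_id_coding(uint16_t id)
    {
        return (uint16_t)(id >> OGGPCM_CODING_SHIFT);
    }

    static inline uint16_t oggpcm_id_packing(uint16_t id)
    {
        return (uint16_t)(id & OGGPCM_PACKING_MASK);
    }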
> 3. Does Ogg support zero length data packets? This was something I added
> as an afterthought, to support the case where an application might not
> know that the packet it just stuffed into the stream was actually the last
> packet, so I thought it might be useful to be able to store a zero length
> data packet with the EOS flag set that an application could use to
> finalize the stream.

Ogg explicitly supports empty pages with the EOS flag set for precisely this purpose. At least in theory (i.e. according to spec); I haven't actually tested this with real applications. So codecs don't need zero-length packets to tag this case. I'm not sure whether zero-length packets would work, but I see no obvious reason why they couldn't; I think they probably do.

Mike
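For what it's worth, at the libogg API level finalizing with an empty packet would look something like the sketch below. This is untested, as noted above, and the function name is just for the example.

    #include <ogg/ogg.h>
    #include <string.h>

    /* Sketch: submit a zero-length packet carrying only the EOS flag.
     * Untested against real demuxers; libogg itself appears happy to
     * accept a packet with bytes == 0. */
    static void finish_stream(ogg_stream_state *os,
                              ogg_int64_t last_granulepos,
                              ogg_int64_t next_packetno)
    {
        static unsigned char empty[1];
        ogg_packet op;

        memset(&op, 0, sizeof(op));
        op.packet     = empty;            /* no payload bytes           */
        op.bytes      = 0;                /* zero-length packet         */
        op.e_o_s      = 1;                /* mark end of stream         */
        op.granulepos = last_granulepos;  /* repeat the final position  */
        op.packetno   = next_packetno;

        ogg_stream_packetin(os, &op);
        /* caller then drains pages with ogg_stream_flush()/_pageout() */
    }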
Hi John,

jkoleszar@on2.com wrote:
> Hi all,
>
> I updated the wiki with another rev of this format. Updates include
> support for 43 formats in 14 coding schemes, as derived from the ALSA API.
> This seemed like a good way to get a list of what the formats in common
> use out there are, so it should be fairly comprehensive.

Unfortunately the ALSA API defines a number of formats which are in practice extremely rare, in particular any unsigned int format larger than 8 bits. For instance, the only unsigned int type that libsndfile supports is unsigned 8 bit.

I would also strongly advise against supporting **any** ADPCM format. These things are a PITA to support and some cannot be supported without extending the header. For instance, Microsoft's ADPCM requires that a set of 8 coefficients needed for decoding be sent in the header. Most of the other ADPCM formats have block sizes that need to be sent. All in all this is a huge PITA. In comparison to FLAC, Speex and Vorbis, ADPCM formats have little to offer.

> Modifications to the "rev 2" format:
> 1. Expanded the 'id' field to support more than 7 formats. Format id is
> now 16 bits, but the lower 6 bits are reserved to describe the storage
> packing.

I still think that assigning meaning to bits within the format field is a mistake. Specifying bits like this could only be useful if you expect the decoder to generate code on the fly when it gets asked to decode, say, 16 bit, unsigned, little endian. Auto-generated code that automagically supports all of these formats is significantly harder to write and debug than the equivalent set of single-purpose decoders, so I would suggest that this auto-magic stuff is a bad idea. Ergo, assigning meaning to the bits is a bad idea as well.

> New since "rev 2":
> 1. Added the notion of chunked vs interleaved storage.

Again, I strongly recommend against allowing non-interleaved data. It simply complicates everything far more than necessary.

> For discussion:
> 1. How useful are the 6 reserved bits in the format id? I'm a little

Reserved for what? I can't possibly think what they could be used for.

> uncomfortable with them, since the field is only 70% (45/64) efficient,

This is a file header. Even under the most bloated scheme we could think of, it's unlikely to be more than 100 bytes and it will be followed by hundreds of kilobytes at least of audio data. So why are we trying to conserve a couple of bits in the header?

> 2. Does supporting both chunked and interleaved storage place too much of
> a burden on applications?

I believe it does.

Cheers,
Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"That being done, all you have to do next is call free() slightly less often than malloc(). You may want to examine the Solaris system libraries for a particularly ambitious implementation of this technique." -- Eric O'Dell (comp.lang.dylan)
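As a concrete sketch of the "straight enumeration plus single-purpose decoders" approach, something along these lines is all that's needed; the enum values and function names are invented for the example.

    #include <stddef.h>
    #include <stdint.h>

    /* Each enumerated format gets its own small decoder; no meaning is
     * packed into the bits of the id itself. */
    typedef enum {
        OGGPCM_FMT_S16_LE = 0,
        OGGPCM_FMT_S16_BE = 1,
        OGGPCM_FMT_U8     = 2
        /* ... one value per supported format ... */
    } oggpcm_fmt;

    static void decode_s16_le(const uint8_t *in, float *out, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            int16_t s = (int16_t)(in[2 * i] | (in[2 * i + 1] << 8));
            out[i] = s / 32768.0f;
        }
    }

    static void decode_s16_be(const uint8_t *in, float *out, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            int16_t s = (int16_t)((in[2 * i] << 8) | in[2 * i + 1]);
            out[i] = s / 32768.0f;
        }
    }

    static void decode_u8(const uint8_t *in, float *out, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = (in[i] - 128) / 128.0f;
    }

    static int decode_samples(oggpcm_fmt fmt, const uint8_t *in,
                              float *out, size_t n)
    {
        switch (fmt) {
        case OGGPCM_FMT_S16_LE: decode_s16_le(in, out, n); return 0;
        case OGGPCM_FMT_S16_BE: decode_s16_be(in, out, n); return 0;
        case OGGPCM_FMT_U8:     decode_u8(in, out, n);     return 0;
        default:                return -1;  /* unknown format id */
        }
    }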
> Unfortunately the ALSA API defines a number of formats which are
> in practice extremely rare, in particular any unsigned int format
> larger than 8 bits. For instance, the only unsigned int type that
> libsndfile supports is unsigned 8 bit.

I expected this; it just seemed like a good starting point to get more than 7 formats on the table. Specifically, I wanted to get the logarithmic coding formats in there to make it clear this wasn't just for the integer ones.

> I would also strongly advise against supporting **any** ADPCM
> format. These things are a PITA to support and some cannot be
> supported without extending the header. For instance, Microsoft's
> ADPCM requires that a set of 8 coefficients needed for decoding
> be sent in the header. Most of the other ADPCM formats have block
> sizes that need to be sent. All in all this is a huge PITA. In
> comparison to FLAC, Speex and Vorbis, ADPCM formats have little
> to offer.

No objection here. I'd like to see someone other than myself go through and cull the list of formats into whatever a practical subset is. As long as it does 16 bit signed little endian interleaved, I'll be happy.

> I still think that assigning meaning to bits within the format field
> is a mistake. Specifying bits like this could only be useful if
> you expect the decoder to generate code on the fly when it gets
> asked to decode, say, 16 bit, unsigned, little endian. Auto-generated
> code that automagically supports all of these formats is significantly
> harder to write and debug than the equivalent set of single-purpose
> decoders, so I would suggest that this auto-magic stuff is a bad idea.
> Ergo, assigning meaning to the bits is a bad idea as well.

I'm fine with a straight enumeration. I put the extra fields in there more as a discussion point, saying "if you want to have some meaning here, this is how I'd break it out." As I tried to make clear, I can't really think of a good use for it. The only thing I can think of would be code that extracts the 4 bytes for the sample and then calls some other function based on the coding type to convert it, but calling a function on every sample is always a bad idea.

> Again, I strongly recommend against allowing non-interleaved data.
> It simply complicates everything far more than necessary.

This is probably the only point we may disagree on. Having the data chunked opens the door for a whole host of SIMD-optimized filters, and it definitely could be a useful internal representation along a filter chain. As long as you're only dealing with byte-aligned data, I don't think the storage and retrieval is that difficult. I agree it's probably not very useful in the general case, but there are some cases where it is, so it may be worth defining. I'm imagining the case of writing a command line filter chain, for instance:

$ snd_capture | deinterlace | denoise | normalize | interlace | compress

(yes, we'll ignore the fact that you can't normalize in one pass...)

> Reserved for what? I can't possibly think what they could be used for.

These were the 6 bits of the format id I reserved for the storage size and endianness, so that's what they could be used for. As for what an application would use them for, I agree, I'm at a loss.

> This is a file header. Even under the most bloated scheme we could
> think of, it's unlikely to be more than 100 bytes and it will be
> followed by hundreds of kilobytes at least of audio data. So why
> are we trying to conserve a couple of bits in the header?

Agreed. I'm an embedded/dsp guy by trade, so these are the things I think of. The comment I was trying to make is that reserving 30% of a word for "this can't be described by these fields" is ugly, ugly, ugly.

John
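Coming back to the chunked vs interleaved point above, here is a minimal sketch of the two layouts for 16-bit samples. "Chunked" here means each channel stored contiguously; the function names are just for the example.

    #include <stddef.h>
    #include <stdint.h>

    /* interleaved:  in[frame * channels + ch]    (L R L R ...)
     * chunked:      out[ch][frame]               (all of L, then all of R) */
    static void deinterleave(const int16_t *in, int16_t **out,
                             size_t frames, size_t channels)
    {
        for (size_t ch = 0; ch < channels; ch++)
            for (size_t f = 0; f < frames; f++)
                out[ch][f] = in[f * channels + ch];
    }

    static void interleave(int16_t *const *in, int16_t *out,
                           size_t frames, size_t channels)
    {
        for (size_t ch = 0; ch < channels; ch++)
            for (size_t f = 0; f < frames; f++)
                out[f * channels + ch] = in[ch][f];
    }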
jkoleszar@on2.com wrote:
> I updated the wiki with another rev of this format. Updates include
> support for 43 formats in 14 coding schemes, as derived from the ALSA API.

As an interested bystander, I see that this proposal still has only one format and one rate field for possibly many channels. Earlier someone made the point that you might want to store main and side/back channels with different width and/or rate. Using different logical streams was suggested as an alternative, but has that been discussed enough?

Still only as that interested bystander, I expect different rates to probably be a pain (what's a frame?) but per-channel format sounded fairly straightforward.

Another point made was that of encoding channel information. I.e., are N channels N mono channels, are some pairs to be considered stereo pairs, some M-tuples to be considered <M-phonic> groups?

Don't see anything on the wiki about those issues, so thought I'd bring them up again just in case they slipped through. Not lobbying for anything, just interested...

Rene.
Rene Herman wrote:
> As an interested bystander, I see that this proposal still has only one
> format and one rate field for possibly many channels. Earlier someone
> made the point that you might want to store main and side/back channels
> with different width and/or rate. Using different logical streams was
> suggested as an alternative, but has that been discussed enough?
>
> Still only as that interested bystander, I expect different rates to
> probably be a pain (what's a frame?) but per-channel format sounded
> fairly straightforward.

The only sensible way to do this is to have more than one logical stream, with all channels in the same stream having the same sample rate.

> Another point made was that of encoding channel information. I.e., are N
> channels N mono channels, are some pairs to be considered stereo pairs,
> some M-tuples to be considered <M-phonic> groups?

See http://wiki.xiph.org/index.php/OggPCM2

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"Men never do evil so completely and cheerfully as when they do it from religious conviction." -- Blaise Pascal, mathematician, 1670
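Purely as an illustration of the kind of channel map being asked about, and not the actual layout on the OggPCM2 wiki page, such a field amounts to something like the following; all names here are hypothetical.

    #include <stdint.h>

    /* Hypothetical sketch only -- not the OggPCM2 wiki layout.  Each
     * channel in a logical stream is tagged with a speaker role, which
     * answers "N mono channels, stereo pairs, or an M-phonic group?" */
    typedef enum {
        CHAN_MONO = 0,
        CHAN_FRONT_LEFT,
        CHAN_FRONT_RIGHT,
        CHAN_REAR_LEFT,
        CHAN_REAR_RIGHT,
        CHAN_LFE
        /* ... */
    } channel_role;

    typedef struct {
        uint8_t      channels;   /* channels in this logical stream */
        channel_role role[8];    /* per-channel speaker assignment  */
    } channel_map;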