thr3ads.net - ogg dev - [ogg-dev] OggPCM proposal feedback [Nov 2005]

Erik de Castro Lopo

2005-Nov-09 15:13 UTC

[ogg-dev] OggPCM proposal feedback

Hi all,

Siliva contacted me about this OggPCM proposal and asked me
to join in. For those who don't know me, I am the main author
and maintainer of libsndfile and therefore know quite a bit
about how uncompressed audio is stored in sound files. However
even I would not consider myself an expert; there are areas
to do with channel assignments that I know I am ignorant of.
I am also quite ignorant of the Ogg container format.

I have now read:

    http://wiki.xiph.org/OggPCM

and find that it has a number of shortcomings.


  a) There is no marker to distinguish little endian data
     from big endian data.
  b) There is no mention of audio data being held in double
     precision (64 bit) floating point. Currently this is
     supported in libsndfile by WAV, AIFF, AU, IRCAM and the
     two different Matlab/Octave file formats (I may also
     have overlooked some).
  c) I think having separate fields for things like signed/
     unsigned/float and bit width is a mistake. I would suggest
     instead a single field that encodes all this information
     in an enumeration, i.e.:

         OGG_PCM_U8          /* Unsigned 8 bit */
         OGG_PCM_S8         /* Signed 8 bit. */
         OGG_PCM_S16
         OGG_PCM_S24
         OGG_PCM_S32
         OGG_PCM_FLOAT32
         OGG_PCM_FLOAT64

     and so on. This scheme makes it very difficult to get 
     signed/unsigned and bit width messed up.
  d) Don't bother implementing unsigned PCM for bit widths
     greater than 8 bits. No other common file format uses 
     it and those unsigned formats are a pain to work with.
  e) Consider whether the endianness should also be encoded
     in the enumeration above. I would recommend that it is,
     resulting in:

         OGG_PCM_U8          /* Unsigned 8 bit */
         OGG_PCM_S8         /* Signed 8 bit. */
         OGG_PCM_LE_S16
         OGG_PCM_BE_S16
         OGG_PCM_LE_S24
         OGG_PCM_BE_S24
         ...
         OGG_PCM_LE_FLOAT32
         OGG_PCM_BE_FLOAT32
         ...
  
  f) Encoding of channel information. In a two channel file,
     is the audio data a stereo image or two distinct mono
     channels? For a file with N (> 2) channels, are there 
     pairs of channels which should be considered as stereo
     pairs or do you want to place these stereo pairs as 
     separate streams within a single ogg container? What
     about multi channel surround sound (there are a number
     of different formats like 5.1 and 7.1) or quadraphonic? 
     How are you going to specify which channel is which? 
     Being able to encode this stuff easily is **vital**.
  g) With things like surround sound, are you going to allow
     24 bit audio for the main stereo pair and 16 bits for
     the side channels? This might best be achieved using
     separate streams, but that would make channel information 
     all the more important. Is it useful to have PCM for the
     main stereo pair and say vorbis encoding for the side
     channels?
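
As an illustration of points (c) and (e), the single-field scheme might
look roughly like the C sketch below. Only the enumerator names come from
the lists above; the numeric values and the two helper functions
(ogg_pcm_bytes_per_sample, ogg_pcm_is_big_endian) are hypothetical and are
not part of any draft of the spec.

```c
#include <stddef.h>

/* Enumerator names are taken from the proposal; the numeric values
 * are an assumption made for this sketch only. */
typedef enum {
    OGG_PCM_U8 = 0,        /* Unsigned 8 bit */
    OGG_PCM_S8,            /* Signed 8 bit */
    OGG_PCM_LE_S16,
    OGG_PCM_BE_S16,
    OGG_PCM_LE_S24,
    OGG_PCM_BE_S24,
    OGG_PCM_LE_FLOAT32,
    OGG_PCM_BE_FLOAT32
} ogg_pcm_format;

/* Hypothetical helper: bytes occupied by one sample,
 * or 0 if the format value is not one listed above. */
size_t ogg_pcm_bytes_per_sample(ogg_pcm_format fmt)
{
    switch (fmt) {
    case OGG_PCM_U8:
    case OGG_PCM_S8:
        return 1;
    case OGG_PCM_LE_S16:
    case OGG_PCM_BE_S16:
        return 2;
    case OGG_PCM_LE_S24:
    case OGG_PCM_BE_S24:
        return 3;
    case OGG_PCM_LE_FLOAT32:
    case OGG_PCM_BE_FLOAT32:
        return 4;
    }
    return 0;
}

/* Hypothetical helper: 1 for the big endian formats, 0 otherwise
 * (byte order is moot for the 8 bit formats). */
int ogg_pcm_is_big_endian(ogg_pcm_format fmt)
{
    return fmt == OGG_PCM_BE_S16
        || fmt == OGG_PCM_BE_S24
        || fmt == OGG_PCM_BE_FLOAT32;
}
```

Because signedness, width and byte order all hang off one value, a reader
of the header cannot end up with an inconsistent combination of fields.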
     
Please realize that this is all just off the top of my head.
There may be a bunch of other stuff I have overlooked. 


Is it OK if I can get some other people that know more about 
this stuff involved?

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"I'm not proud   .... We really haven't done everything we could
to protect our customers ... Our products just aren't engineered
for security." -- Brian Valentine, Senior Vice President of
Microsoft's Windows development team

Ralph Giles

2005-Nov-09 15:50 UTC

[ogg-dev] OggPCM proposal feedback

On Thu, Nov 10, 2005 at 10:13:19AM +1100, Erik de Castro Lopo wrote:
> Is it OK if I can get some other people that know more about 
> this stuff involved?
By all means. And thanks for responding so quickly, this is quite 
helpful.

 -r

Arc

2005-Nov-09 16:24 UTC

[ogg-dev] OggPCM proposal feedback

On Thu, Nov 10, 2005 at 10:13:19AM +1100, Erik de Castro Lopo wrote:
> Siliva contacted me about this OggPCM proposal and asked me
> to join in. 
Yes, she mentioned that she would.  Thank you for your suggestions, they 
are well thought out and quite helpful.  It's a lot to process at once, 
and I (as the original author of the current OggPCM draft spec) will 
reply more fully soon with feedback.

> I have now read:
> 
>     http://wiki.xiph.org/OggPCM
Also, http://wiki.xiph.org/Talk:OggPCM - or the discussion tab at the 
top of the page.  That's where the debates on this are mostly ongoing..
 
> Please realize that this is all just off the top of my head.
> There may be a bunch of other stuff I have overlooked. 
How I feel, too.
 
> Is it OK if I can get some other people that know more about 
> this stuff involved?
As Ralph already said, absolutely.

-- 

The recognition of individual possibility,
 to allow each to be what she and he can be,
  rests inherently upon the availability of knowledge;
 The perpetuation of ignorance is the beginning of slavery.

from "Die Gedanken Sind Frei": Free Software and the Struggle for Free
Thought
 by Eben Moglen, General council of the Free Software Foundation

Arc

2005-Nov-09 22:57 UTC

[ogg-dev] OggPCM proposal feedback

On Thu, Nov 10, 2005 at 10:13:19AM +1100, Erik de Castro Lopo wrote:
>   a) There is no marker to distinguish little endian data
>      from big endian data.
The original reason for this is that Ogg makes such a matter moot, 
since the bitpacker in libogg2 handles endianness. However, if a "chunk" 
packer is made available (similar to memcpy), this becomes important 
since we'll want to copy the data in whichever endianness it already has.

Does endianness vary widely for raw audio codecs, or would it be reasonable 
to settle on one standard and expect all codecs which don't comply with the 
"norm" to convert to the correct endianness?  If most hardware supports one 
endianness or the other, I say we should stick to that, since that's what 
the codec plugins would export anyway.
>   b) There is no mention of audio data being help in double
>      precision (64 bit) floating point. Current this is
>      supported in libsndfile by WAV, AIFF, AU, IRCAM and the
>      two different Matlab/Octave file formats (I may also
>      have overlooked some).
The bits per sample field covers this.  Set this to "64" and set the 
data type to "float" and it "should just work"...

>   c) I think having separate fields for things like signed/
>      unsigned/float and bit width is a mistake. I would suggest
>      instead a single field that encodes all this information
>      in a enumeration. Ie:
> 
>          OGG_PCM_U8          /* Unsigned 8 bit */
>          OGG_PCM_S8         /* Signed 8 bit. */
>          OGG_PCM_S16
>          OGG_PCM_S24
>          OGG_PCM_S32
>          OGG_PCM_FLOAT32
>          OGG_PCM_FLOAT64
> 
>      and so on. This scheme makes it very difficult to get 
>      signed/unsigned and bitwith messed up.
>   d) Don't bother implementing unsigned PCM for bit widths
>      greater than 8 bits. No other common file format uses 
>      it and those unsigned formats are a pain to work with.
Problem with this is inflexibility.  See, not every application must 
support every possible combination of formatting - in fact, many will 
require a very small set of parameters going in, ie, "it must be float 
of 16, 24, 32, or 64 bit" or "it must be 16 or 24 bit signed".  

Implementors will never, very likely, implement 32-bit unsigned int, and 
that is not an issue.  If some fool does, his data will simply not be 
accessible to any other codec or application unless he writes a 
conversion plugin, which in essence, treats the two sides (from 
OggStream's perspective) as two entirely different codecs, even if both 
are in OggPCM format.

The flexibility of this does, though, encourage stuff like 96bit audio.  
Anyone implementing a codec which uses this, and import/exports it, will 
also write the appropriate conversion OggStream plugin which will allow 
applications which only support, say, 16bit audio, to work with it.

I guess you could chalk this up to an inherent difference in philosophy 
and purpose between OggPCM and RIFF/WAVE (.wav).. theirs is as much an 
interchange format as a storage codec, where OggPCM isn't really 
intended for storage.  FLAC (Free Lossless Audio Codec) limits to a 
certain number of formats, and all decoders can decode these formats, 
and it's well suited for storage as a /compressed/ lossless codec..

As primarily an interchange codec, if you have some rare or new format 
being imported/exported from your new codec, you had better also make 
sure it can itself support more common formats (ie, 44100/16/2) or that 
you include a conversion plugin which does that for your users.

>   f) Encoding of channel information. In a two channel file,
>      is the audio data a stereo image or two distinct mono
>      channels? For a file with N (> 2) channels, are there 
>      pairs of channels which should be considered as a stereo
>      pairs or do you want to place these stereo pairs as 
>      separate streams within a single ogg container? What
>      about multi channel surround sound (there are a number
>      of different formats like 5.1 and 7.1) or quadraphonic? 
>      How are you going to specify which channel is which. 
>      Being able to encode this stuff easily is **vital**.
I agree - this is something that wasn't on my radar until this morning 
when MikeS was asking about the channel layout in Vorbis/FLAC.  How 
would you suggest this data be included in the binary header?  I 
honestly have no experience with anything other than mono and stereo.

It should all be in the same stream.  

>   g) With things like surround sound, are you going to allow
>      24 bit audio for the main stereo pair and 16 bits for
>      the side channels? This might best be achieved using
>      separate stream, but that would make channel information 
>      all that more important. Is it useful to have PCM for the
>      main stereo pair and say vorbis encoding for the side
>      channels?
Do people really do such things as encode different channels with 
different sample sizes (and, I assume, samplerates)?

I'd really like to prefer keeping a fixed samplesize/samplerate for all 
channels.  I really doubt any Ogg audio codec is going to get that 
complicated anytime soon, and if it's really needed, a codec plugin 
/could/ be fed/provide packets from multiple OggPCM bitstreams, just 
like how a+v codecs (ie, DV) would import/export OggPCM+OggYUV.

Is there anything else you've thought of that we've missed?

-- 

The recognition of individual possibility,
 to allow each to be what she and he can be,
  rests inherently upon the availability of knowledge;
 The perpetuation of ignorance is the beginning of slavery.

from "Die Gedanken Sind Frei": Free Software and the Struggle for Free
Thought
 by Eben Moglen, General council of the Free Software Foundation

illiminable

2005-Nov-09 23:45 UTC

[ogg-dev] OggPCM proposal feedback

----- Original Message ----- 
From: "Arc" <arc@Xiph.org>
To: <ogg-dev@Xiph.org>
Sent: Thursday, November 10, 2005 2:57 PM
Subject: Re: [ogg-dev] OggPCM proposal feedback

> On Thu, Nov 10, 2005 at 10:13:19AM +1100, Erik de Castro Lopo wrote:
>>
>>   a) There is no marker to distinguish little endian data
>>      from big endian data.
>
> The original reason for this is because Ogg makes such a matter moot,
> since the bitpacker in libogg2 handles endian.. however, if a "chunk"
The problem I see with this, correct me if I'm wrong, is that you are 
suggesting you take a really large chunk of PCM and feed it into libogg a 
word at a time, in order to have its endianness possibly reversed? And 
then on the other end, read it out a word at a time? If, as you suggest, this 
is primarily for interchange, then this seems a really inefficient way to 
pass data around.

The reason it should allow specification for endianness is that, if it is to 
be used as an interchange format, it needs to be easily copied around in a 
format friendly to the host processor. Requiring the byte order flipping of 
every sample seems contrary to the purpose of such a format.
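
To make the cost concrete, here is a minimal C sketch of what a reader
would do with 16 bit samples if the stream carried an endianness marker.
The function names (host_is_big_endian, copy_s16_samples) are made up for
this example and are not part of libogg or any OggPCM draft.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big endian host, 0 on a little endian one. */
int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;

    memcpy(&first, &probe, 1);   /* first byte in memory */
    return first == 0x01;
}

/* Copy `count` 16 bit samples from src to dst, delivering them in host
 * byte order. `stream_is_big_endian` is the (hypothetical) marker read
 * from the stream header. */
void copy_s16_samples(unsigned char *dst, const unsigned char *src,
                      size_t count, int stream_is_big_endian)
{
    size_t i;

    if (!!stream_is_big_endian == host_is_big_endian()) {
        /* Byte orders match: one bulk copy, no per-sample work. */
        memcpy(dst, src, count * 2);
        return;
    }

    /* Byte orders differ: swap the two bytes of every sample. */
    for (i = 0; i < count; i++) {
        dst[2 * i]     = src[2 * i + 1];
        dst[2 * i + 1] = src[2 * i];
    }
}
```

When the stream is already in host order the whole buffer moves with a
single memcpy; the per-sample loop is only paid for when the orders
really differ.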

> Problem with this is inflexibility.  See, not ever application must
> support every possible combination of formatting - in fact, many will
> require a very small set of parameters going in, ie, "it must be float
> of 16, 24, 32, or 64 bit" or "it must be 16 or 24 bit signed".
What you are saying is that outputs will support some set of data formats, 
and inputs will support some other set of data inputs. In other words, each 
component only supports the types it knows how to do something with. And so, 
the hypothetical situation of some new format, not in the enumeration comes 
along, and of course, the components won't support it since they only 
support their subset of data they know how to handle. So, a new value needs 
to be added to the enumeration. The result is, existing components (which 
didn't support the new format anyway), will still be in the same situation 
if they come across a new file, with an enumeration value they don't 
understand.

So, if a new format comes along which does require a new enumeration value, 
then the components are going to have to be modified to support that anyway, 
and the adding another value to the list they support is a non-issue. The 
only possible way it's inflexible is if the number of values in the 
enumeration exceeds the size of the field it has. Probably unlikely even 
with 8 bits, almost certainly unlikely for 16.

> Implementors will never, very likely, implement 32-bit unsigned int, and
> that is not an issue.  If some fool does, his data will simply not be
> accessable to any other codec or application unless he writes a
> conversion plugin, which in essence, treats the two sides (from
> OggStream's perspective) as two entirely different codecs, even if both
> are in OggPCM format.
I know you are working on OggStream, and this is the perspective you are 
taking, but other implementations, e.g. DirectShow, QuickTime, MPlayer and 
GStreamer, don't and maybe won't be using OggStream. So I think we need to 
take a bigger-picture view of what assumptions are being made here.
> I guess you could chalk this up to an inherit difference in philosophy
> and purpose between OggPCM and RIFF/WAVE (.wav).. theirs is as much an
> interchange format as a storage codec, where OggPCM isn't really
> intended for storage.  FLAC (Free Lossless Audio Codec) limits to a
> certain number of formats, and all decoders can decode these formats,
> and it's well suited for storage as a /compressed/ lossless codec..
This is another issue i see as a problem, why is it only an interchange 
format ? What is the rationale for that ? There are many people who want a 
storage format. Are you suggesting that another raw storage format also be 
made ?

I think this is the wrong approach, flac and other codecs operate on a 
tighter subset, because they have to perform complex transformations on the 
data, and supporting too many types increase complexity. A raw format 
essentially needs no processing, it just needs copying into a buffer that 
supports that type of data.

> I'd really like to prefer keeping a fixed samplesize/samplerate for all
> channels.  I really doubt any Ogg audio codec is going to get that
> complicated anytime soon, and if it's really needed, a codec plugin
> /could/ be fed/provide packets from multiple OggPCM bitstreams, just
> like how a+v codecs (ie, DV) would import/export OggPCM+OggYUV.
I think this is another problem with your approach: you are assuming that 
Ogg is a closed format, and that only "ogg" formats are the issue here. Ogg 
is a generic container format; other codecs can and will be used inside it. 
Making assumptions based only on the current set of Xiph codecs is, in my 
opinion, a little narrowly focused.

Zen.

Erik de Castro Lopo

2005-Nov-10 00:03 UTC

[ogg-dev] OggPCM proposal feedback

Arc wrote:
> Does endian vary widely for raw audio codecs,
Well there are really only two endian-nesses, big and little.

WAV is usually little endian but there is also a (very rare) big endian 
version. AIFF is usually big endian but also supports little endian 
encoding. CAF, AU, IRCAM and a number of others support both endian-nesses
equally.
> or would it be reasonable 
> to settle on one standard and expect all codecs to convert to the 
> correct endian which don't comply with the "norm"?
Not reasonable.
> The bits per sample field covers this.  Set this to "64" and set the
> data type to "float" and it "should just work"...
See my comment on the wiki:

    http://wiki.xiph.org/Talk:OggPCM

Most importantly:

    "Please don't make determination of the data format depend on 
     multiple fields. Instead use an enumeration so that something 
     like little endian 16 bit PCM can be specified as OGG_PCM_LE_PCM_16 
     and big endian 64 bit doubles can be specified as OGG_PCM_BE_FLOAT_64. 
     This scheme is far more transparent and self documenting. If the 
     format field is 8 bits, this scheme supports 256 formats; if it's 16 
     bits it will support 65536 formats."
> >   c) I think having separate fields for things like signed/
> >      unsigned/float and bit width is a mistake. I would suggest
> >      instead a single field that encodes all this information
> >      in a enumeration. Ie:
> > 
> >          OGG_PCM_U8          /* Unsigned 8 bit */
> >          OGG_PCM_S8         /* Signed 8 bit. */
> >          OGG_PCM_S16
> >          OGG_PCM_S24
> >          OGG_PCM_S32
> >          OGG_PCM_FLOAT32
> >          OGG_PCM_FLOAT64
> > 
> >      and so on. This scheme makes it very difficult to get 
> >      signed/unsigned and bitwith messed up.
You didn't address this issue. Do you think it is unimportant?
> >   d) Don't bother implementing unsigned PCM for bit widths
> >      greater than 8 bits. No other common file format uses 
> >      it and those unsigned formats are a pain to work with.
> 
> Problem with this is inflexibility.  See, not ever application must 
> support every possible combination of formatting -
Exactly, a codec could support OGG_PCM_S16, OGG_PCM_FLOAT32 and that's
it. If the decoder in the codec wants to figure out if it supports the 
current file it can do:

    if (format != OGG_PCM_S16 && format != OGG_PCM_FLOAT32)
       ooops_we_dont_handle_this ("some error message");

This is far less error prone than:

    if (! ((bitwidth == 16 && is_signed && data_format == OGG_PCM_PCM)
           || (bitwidth == 32 && data_format == OGG_PCM_FLOAT)))
       ooops_we_dont_handle_this ("some error message");
> in fact, many will require a very small set of parameters going in,
My proposal has a small number of parameters: one. I don't think it's
practical to have zero parameters. How about this:

    switch (format)
    {   case OGG_PCM_S8 :
        case OGG_PCM_FLOAT32 :
        case OGG_PCM_FLOAT64 :
                /* All OK. */
                break ;

        default:
                ooops_we_dont_handle_this ("some error message");
                break ;
    }

It's hard to get this wrong and it's obvious when it is wrong.
> ie, "it must be float 
> of 16, 24, 32, or 64 bit"
There is no such thing as 16 and 24 bit float. 
> Implementors will never, very likely, implement 32-bit unsigned int, 
My point exactly. So why even make it possible? If that changes at some
point in the future add the enumeration.
> >   f) Encoding of channel information. In a two channel file,
> >      is the audio data a stereo image or two distinct mono
> >      channels? For a file with N (> 2) channels, are there 
> >      pairs of channels which should be considered as a stereo
> >      pairs or do you want to place these stereo pairs as 
> >      separate streams within a single ogg container? What
> >      about multi channel surround sound (there are a number
> >      of different formats like 5.1 and 7.1) or quadraphonic? 
> >      How are you going to specify which channel is which. 
> >      Being able to encode this stuff easily is **vital**.
> 
> I agree - this is something that wasn't on my radar until this morning 
> when MikeS was asking about the channel layout in Vorbis/FLAC.  How 
> would you suggest this data be included in the binary header?  I 
> honestly have no experience with anything other than mono and stereo.
I have little more experience than you. I sent invitations for people
to join this discussion to the music-dsp mailing list. I hope somebody
knowledgeable will show up.
> >   g) With things like surround sound, are you going to allow
> >      24 bit audio for the main stereo pair and 16 bits for
> >      the side channels? This might best be achieved using
> >      separate stream, but that would make channel information 
> >      all that more important. Is it useful to have PCM for the
> >      main stereo pair and say vorbis encoding for the side
> >      channels?
> 
> Do people really do such things as encode different channels with 
> different sample sizes (and, I assume, samplerates)?
Different bit widths make sense. You need high dynamic range
on your main stereo signal, but probably not on the side channels.

Different sample rates also make sense. If the main stereo pair
is sampled at 96kHz it makes sense to have the sub bass signal
(ie all the low frequencies) sampled at a much lower rate. For a
sub-bass signal 8kHz might be appropriate.
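
As a rough illustration of the saving, the raw data rate of a channel
group is just sample rate times sample size times channel count. The
helper name pcm_bytes_per_second is made up for this sketch, and the
24 bit sample size used in the example below is an assumption, not
something stated above.

```c
/* Raw PCM data rate in bytes per second for a group of channels that
 * share one sample rate and one sample size. */
unsigned long pcm_bytes_per_second(unsigned long sample_rate_hz,
                                   unsigned bytes_per_sample,
                                   unsigned channels)
{
    return sample_rate_hz * bytes_per_sample * channels;
}
```

With 24 bit (3 byte) samples, one channel at 96kHz costs 288000 bytes per
second, while the same channel at 8kHz costs 24000, a factor of twelve.
An 8kHz rate can still represent content up to 4kHz, which is well above
the sub-bass range.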
> I'd really like to prefer keeping a fixed samplesize/samplerate for all
> channels.  I really doubt any Ogg audio codec is going to get that 
> complicated anytime soon,
Really? What about a high quality Ogg video stream multiplexed with 
a 5.1 audio stream?
> Is there anything else you've thought of that we've missed?
Not yet, but we haven't heard from anyone else yet. I would like
to see input (or at least an OK) from a large number of people in
the audio field.

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
'Unix beats Windows' - says Microsoft! 
http://blogs.zdnet.com/Murphy/index.php?p=459

Thomas Vander Stichele

2005-Nov-10 02:25 UTC

[ogg-dev] OggPCM proposal feedback

Hi,

> The flexibility of this does, though, encourage stuff like 96bit audio.  
> Anyone implementing a codec which uses this, and import/exports it, will 
> also write the appropriate conversion OggStream plugin which will allow 
> applications which only support, say, 16bit audio, to work with it.
Do you think the noise in your 16bit application will sound different
between a conversion from a 96bit or 80bit audio file from the same
analog source ? If the argument for keeping these fields freeform is to
support 96bit audio, I'd say Erik is right that you shouldn't pick
freeform fields.

As a practical matter, I don't see a direct use case for a
file/interchange format with a dynamic range of roughly 578 dB
(96 bits at about 6.02 dB per bit).
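
For reference, the usual rule of thumb for integer PCM is 20*log10(2),
or about 6.02 dB, of dynamic range per bit. A small C sketch of that
arithmetic follows; the function name pcm_dynamic_range_db is made up
for this example.

```c
/* Approximate dynamic range, in dB, of integer PCM with the given
 * number of bits per sample: 20 * log10(2) dB per bit. */
double pcm_dynamic_range_db(unsigned bits)
{
    const double db_per_bit = 6.0206;   /* 20 * log10(2), rounded */

    return bits * db_per_bit;
}
```

By this estimate 16 bits gives about 96 dB, 24 bits about 144 dB, and
96 bits about 578 dB, which is far beyond anything an analog source or
a converter can deliver.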

Thomas


Dave/Dina : future TV today ! - http://www.davedina.org/
<-*- thomas (dot) apestaart (dot) org -*->
I'm emotionally raped by Jesus
<-*- thomas (at) apestaart (dot) org -*->
URGent, best radio on the net - 24/7 ! - http://urgent.fm/

oliver oli

2005-Nov-10 05:44 UTC

[ogg-dev] OggPCM proposal feedback

Erik de Castro Lopo wrote:
>   f) Encoding of channel information. In a two channel file,
>      is the audio data a stereo image or two distinct mono
>      channels? For a file with N (> 2) channels, are there 
>      pairs of channels which should be considered as a stereo
>      pairs or do you want to place these stereo pairs as 
>      separate streams within a single ogg container? What
>      about multi channel surround sound (there are a number
>      of different formats like 5.1 and 7.1) or quadraphonic? 
>      How are you going to specify which channel is which. 
>      Being able to encode this stuff easily is **vital**.
please don't forget ambisonics :-)

2 channel UHJ
4 channel 1st order
9 channel 2nd order
16 channel 3rd order

etc.
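
The channel counts for the full-sphere formats above follow (order + 1)
squared, so a header would only need the order to derive the count. A
minimal C sketch, with a made-up function name; the 2 channel UHJ
encoding is a separate case and is not covered by this formula.

```c
/* Number of channels in a full-sphere ambisonic signal of the given
 * order: (order + 1) squared. */
unsigned ambisonic_channel_count(unsigned order)
{
    return (order + 1) * (order + 1);
}
```

This gives 4, 9 and 16 channels for 1st, 2nd and 3rd order, matching
the list above.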

John Koleszar

2005-Nov-10 06:15 UTC

[ogg-dev] OggPCM proposal feedback

I threw a rough draft of an alternative format incorporating the 
comments received so far in this discussion on the wiki:
http://wiki.xiph.org/index.php/OggPCM#Format

Oliver,
This seems to me like it would support the ambisonic requirements you 
mention, though it doesn't (and I imagine won't) describe the mic 
locations. Somebody who actually uses that info could probably define 
extra header pages for a later version of this spec. I hadn't even heard 
of ambisonics until your post, to be honest.

John

oliver oli wrote:
> Erik de Castro Lopo wrote:
>
>>   f) Encoding of channel information. In a two channel file,
>>      is the audio data a stereo image or two distinct mono
>>      channels? For a file with N (> 2) channels, are there
>>      pairs of channels which should be considered as a stereo
>>      pairs or do you want to place these stereo pairs as
>>      separate streams within a single ogg container? What
>>      about multi channel surround sound (there are a number
>>      of different formats like 5.1 and 7.1) or quadraphonic?
>>      How are you going to specify which channel is which.
>>      Being able to encode this stuff easily is **vital**.
>
>
> please don't forget ambisonics :-)
>
> 2 channel UHJ
> 4 channel 1st order
> 9 channel 2nd order
> 16 channel 3rd order
>
> etc.
