Arc wrote:

> Does endian vary widely for raw audio codecs,

Well, there are really only two endiannesses, big and little. WAV is
usually little endian but there is also a (very rare) big endian
version. AIFF is usually big endian but also supports little endian
encoding. CAF, AU, IRCAM and a number of others support both
endiannesses equally.

> or would it be reasonable to settle on one standard and expect all
> codecs which don't comply with the "norm" to convert to the correct
> endian?

Not reasonable.

> The bits per sample field covers this. Set this to "64" and set the
> data type to "float" and it "should just work"...

See my comment on the wiki: http://wiki.xiph.org/Talk:OggPCM

Most importantly:

    "Please don't make determination of the data format depend on
    multiple fields. Instead use an enumeration so that something like
    little endian 16 bit PCM can be specified as OGG_PCM_LE_PCM_16 and
    big endian 64 bit doubles can be specified as OGG_PCM_BE_FLOAT_64.
    This scheme is far more transparent and self-documenting. If the
    format field is 8 bits, this scheme supports 256 formats; if it's
    16 bits it will support 65536 formats."

> > c) I think having separate fields for things like signed/
> >    unsigned/float and bit width is a mistake. I would suggest
> >    instead a single field that encodes all this information
> >    in an enumeration. Ie:
> >
> >        OGG_PCM_U8      /* Unsigned 8 bit */
> >        OGG_PCM_S8      /* Signed 8 bit. */
> >        OGG_PCM_S16
> >        OGG_PCM_S24
> >        OGG_PCM_S32
> >        OGG_PCM_FLOAT32
> >        OGG_PCM_FLOAT64
> >
> >    and so on. This scheme makes it very difficult to get
> >    signed/unsigned and bit width mixed up.

You didn't address this issue. Do you think it is unimportant?

> > d) Don't bother implementing unsigned PCM for bit widths
> >    greater than 8 bits. No other common file format uses
> >    it and those unsigned formats are a pain to work with.
>
> Problem with this is inflexibility.
> See, not every application must support every possible combination of
> formatting -

Exactly. A codec could support OGG_PCM_S16 and OGG_PCM_FLOAT32 and
that's it. If the decoder wants to figure out whether it supports the
current file it can do:

    if (format != OGG_PCM_S16 && format != OGG_PCM_FLOAT32)
        ooops_we_dont_handle_this ("some error message") ;

This is far less error prone than:

    if (! (bitwidth == 16 && is_signed && data_format == OGG_PCM_PCM)
            && ! (bitwidth == 32 && data_format == OGG_PCM_FLOAT))
        ooops_we_dont_handle_this ("some error message") ;

> in fact, many will require a very small set of parameters going in,

My proposal has a small number of parameters: one. I don't think it's
practical to have zero parameters. How about this:

    switch (format)
    {   case OGG_PCM_S8 :
        case OGG_PCM_FLOAT32 :
        case OGG_PCM_FLOAT64 :
            /* All OK. */
            break ;

        default :
            ooops_we_dont_handle_this ("some error message") ;
            break ;
        } ;

It's hard to get this wrong and it's obvious when it is wrong.

> ie, "it must be float of 16, 24, 32, or 64 bit"

There is no such thing as a 16 or 24 bit float.

> Implementors will never, very likely, implement 32-bit unsigned int,

My point exactly. So why even make it possible? If that changes at
some point in the future, add the enumeration then.

> > f) Encoding of channel information. In a two channel file,
> >    is the audio data a stereo image or two distinct mono
> >    channels? For a file with N (> 2) channels, are there
> >    pairs of channels which should be considered as stereo
> >    pairs, or do you want to place these stereo pairs as
> >    separate streams within a single ogg container? What
> >    about multi channel surround sound (there are a number
> >    of different formats like 5.1 and 7.1) or quadraphonic?
> >    How are you going to specify which channel is which?
> >    Being able to encode this stuff easily is **vital**.
>
> I agree - this is something that wasn't on my radar until this morning
> when MikeS was asking about the channel layout in Vorbis/FLAC.
> How would you suggest this data be included in the binary header? I
> honestly have no experience with anything other than mono and stereo.

I have little more experience than you. I have sent invitations to
join this discussion to the music-dsp mailing list. I hope somebody
knowledgeable will show up.

> > g) With things like surround sound, are you going to allow
> >    24 bit audio for the main stereo pair and 16 bits for
> >    the side channels? This might best be achieved using
> >    separate streams, but that would make channel information
> >    all the more important. Is it useful to have PCM for the
> >    main stereo pair and say Vorbis encoding for the side
> >    channels?
>
> Do people really do such things as encode different channels with
> different sample sizes (and, I assume, samplerates)?

Different bit widths make sense. You need high dynamic range on your
main stereo signal, but probably not on the side channels.

Different sample rates also make sense. If the main stereo pair is
sampled at 96kHz it makes sense to have the sub-bass signal (ie all
the low frequencies) sampled at a much lower rate. For a sub-bass
signal 8kHz might be appropriate.

> I'd really like to prefer keeping a fixed samplesize/samplerate for
> all channels. I really doubt any Ogg audio codec is going to get that
> complicated anytime soon,

Really? What about a high quality Ogg video stream multiplexed with a
5.1 audio stream?

> Is there anything else you've thought of that we've missed?

Not yet, but we haven't heard from anyone else yet. I would like to
see input (or at least an OK) from a large number of people in the
audio field.

Erik
--
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
'Unix beats Windows' - says Microsoft!
http://blogs.zdnet.com/Murphy/index.php?p=459
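Erik's single-field scheme could be sketched as a plain C enumeration.
The names below follow his examples from the thread; the numeric
values and the exact set of formats are illustrative only, not from
any spec:

```c
/* Hypothetical format codes: one value describes a complete sample
 * format (width, signedness, int/float, endianness). Names follow the
 * examples in the thread; values are illustrative. */
typedef enum
{   OGG_PCM_U8          = 0,    /* Unsigned 8 bit */
    OGG_PCM_S8          = 1,    /* Signed 8 bit */
    OGG_PCM_LE_PCM_16   = 2,    /* Little endian, signed 16 bit */
    OGG_PCM_BE_PCM_16   = 3,    /* Big endian, signed 16 bit */
    OGG_PCM_LE_PCM_24   = 4,
    OGG_PCM_BE_PCM_24   = 5,
    OGG_PCM_LE_FLOAT_32 = 6,    /* Little endian, IEEE 754 single */
    OGG_PCM_BE_FLOAT_64 = 7     /* Big endian, IEEE 754 double */
} ogg_pcm_format ;

/* A decoder that handles only two of these needs one comparison per
 * supported format - no separate width/sign/endian bookkeeping. */
static int format_supported (ogg_pcm_format fmt)
{
    return fmt == OGG_PCM_LE_PCM_16 || fmt == OGG_PCM_LE_FLOAT_32 ;
}
```

With an 8 bit format field this still leaves well over 200 codes free
for formats nobody has thought of yet.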
On Thu, Nov 10, 2005 at 07:03:43PM +1100, Erik de Castro Lopo wrote:

> WAV is usually little endian but there is also a (very rare) big
> endian version. AIFF is usually big endian but also supports little
> endian encoding. CAF, AU, IRCAM and a number of others support both
> endiannesses equally.

This doesn't seem to be a large issue - a single bit in the header
could specify it, 0=MSB, 1=LSB, or vice versa. VorbisFile will export
either endianness, so this seems to be the end of this part of the
debate.

> "Please don't make determination of the data format depend on
> multiple fields. Instead use an enumeration so that something like
> little endian 16 bit PCM can be specified as OGG_PCM_LE_PCM_16 and
> big endian 64 bit doubles can be specified as OGG_PCM_BE_FLOAT_64.
> This scheme is far more transparent and self-documenting. If the
> format field is 8 bits, this scheme supports 256 formats; if it's
> 16 bits it will support 65536 formats."

You're still working with the philosophy of the FourCC world, where
based on whether a plugin or application supports a 32-bit identifier
you know whether it has full support or no support.

We aren't working by that philosophy. We do not need to maintain a
table of predefined formats, extended each time someone wants to use a
new format, since no application needs to support every possible
combination of encoding parameters.

Honestly, as far as I'm concerned unsigned samples can go away...
almost nothing uses 8-bit samples anymore, and unsigned 8-bit even
less so. However, support for (ie) 48-bit float should not have to be
specially created; the values for how many bits to use and whether
it's int or float should be separate fields, as should the number of
channels, etc.

On Thu, Nov 10, 2005 at 03:44:53PM +0800, illiminable wrote:

> I think this is the wrong approach. FLAC and other codecs operate on
> a tighter subset because they have to perform complex transformations
> on the data, and supporting too many types increases complexity.
> A raw format essentially needs no processing; it just needs copying
> into a buffer that supports that type of data.

The complexity isn't increased by the added flexibility, and that
flexibility completely eliminates the very issue you raise with FLAC -
FLAC was designed to losslessly support every common audio format, and
yet you find its subset of formats too tight. Don't you see the
inherent problem here? It comes back to someone deciding which formats
will be valid and which ones won't, and enforcing that with an index
into a table of supported formats, versus leaving it freeform for
future implementors to use.

Changing the spec a bit, so that the sample size must be a multiple of
8 and may not exceed 128 bits (a 4-bit field), seems like something
worthwhile to eliminate the padding issue. But between float and int,
why /not/ allow someone to do something insane like 96-bit audio? 20
years ago, we thought that 16 bit, or perhaps 24 bit, was the maximum
we could do. Why would anyone want more than 24 bit? And yet, the
issue was raised that 64-bit audio samples are necessary. In another
20 years, will people be arguing that 128-bit samples are necessary?
Or that 48-bit is a good tradeoff between 32-bit and 64-bit?

No - it does not increase complexity, nor does it impose any
requirements on implementations, since instead of a 32-bit identifier
we use the entire first packet of the stream to check for
compatibility. No, your media player does -NOT- have to support 256
channel audio, nor must it support ambisonics, or 64-bit audio, etc.
There's no reason, however, to force everything into artificial,
arbitrary limitations based on what we believe is reasonable for
today. If a media player only supports a subset of what the codec
supports, that's completely fine and expected.

> I have little more experience than you. I have sent invitations to
> join this discussion to the music-dsp mailing list.
> I hope somebody knowledgeable will show up.

There's a difference between experience and differences of design
philosophy. This isn't an issue of right or wrong, but of two
different styles of designing codecs.

Raw FourCC codecs are each set up for a different format, or a small
set of formats. RIFF/WAVE uses a subset of formats, expecting all
applications which support its FourCC to understand all those formats.
Again, this is done under the assumption that a codec should be either
fully supported or unsupported.

Whereas not all audio codecs are going to support even the subset that
you provided (64-bit float, for example). Nor are all applications
which use Ogg going to support anything but 16-bit signed int, nor
should they be expected to. I think it's reasonable to do away with
unsigned because modern codecs just aren't going to use it, but I'm
not going to try to predict whether someone will want to use 48-bit
audio, or 128-bit audio, and whether they'll use int or float.

> Different bit widths make sense. You need high dynamic range on your
> main stereo signal, but probably not on the side channels.
>
> Different sample rates also make sense. If the main stereo pair is
> sampled at 96kHz it makes sense to have the sub-bass signal (ie all
> the low frequencies) sampled at a much lower rate. For a sub-bass
> signal 8kHz might be appropriate.

I think, for these, given Ogg's use of granulepos and the syncing
complexity that allowing different channels to have different rates
and sizes would introduce, this is something best left to muxed raw
channels, with any codec which supports this drawing from the
different raw channels.

> Not yet, but we haven't heard from anyone else yet.
> I would like to see input (or at least an OK) from a large number of
> people in the audio field.

I think this is good to emphasize - it's OK to support some
combinations of formats which are never used, since they'll simply be
ignored if they're unfavorable to implement, but missing something
necessary is a mistake we need to make sure not to make.

I've put a reduced config set on the wiki.

--
The recognition of individual possibility,
 to allow each to be what she and he can be,
  rests inherently upon the availability of knowledge;
The perpetuation of ignorance is the beginning of slavery.

from "Die Gedanken Sind Frei": Free Software and the Struggle for Free
Thought, by Eben Moglen, General counsel of the Free Software Foundation
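Arc's two concrete header suggestions - a single endianness bit, and a
sample size constrained to a multiple of 8, at most 128 bits, fitting
a 4-bit field - could be sketched like this. The flag layout, bit
positions, and the (bits / 8) - 1 encoding are hypothetical
illustrations, not anything agreed in the thread:

```c
#include <stdint.h>

/* Hypothetical flags byte: bit 0 carries endianness,
 * 1 = little endian (LSB first), 0 = big endian (MSB first). */
#define OGG_PCM_FLAG_LITTLE_ENDIAN  0x01u

static int is_little_endian (uint8_t flags)
{
    return (flags & OGG_PCM_FLAG_LITTLE_ENDIAN) != 0 ;
}

/* Hypothetical 4-bit sample size field f storing (bits / 8) - 1, so
 * f = 0..15 covers sample sizes 8..128 in steps of 8 - exactly the
 * "multiple of 8, max 128 bits" constraint. */
static unsigned field_to_bits (uint8_t f)
{
    return ((unsigned) (f & 0x0fu) + 1u) * 8u ;
}

static uint8_t bits_to_field (unsigned bits)    /* assumes bits valid */
{
    return (uint8_t) (bits / 8u - 1u) ;
}
```

The nice property of this encoding is that every value of the 4-bit
field is legal, so there is no padding or reserved-value handling.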
On Thu, Nov 10, 2005 at 01:35:47PM -0800, Arc wrote:

> > "Please don't make determination of the data format depend on
> > multiple fields. Instead use an enumeration so that something like
> > little endian 16 bit PCM can be specified as OGG_PCM_LE_PCM_16 and
> > big endian 64 bit doubles can be specified as OGG_PCM_BE_FLOAT_64.
> > This scheme is far more transparent and self-documenting. If the
> > format field is 8 bits, this scheme supports 256 formats; if it's
> > 16 bits it will support 65536 formats."
>
> You're still working with the philosophy of the FourCC world, where
> based on whether a plugin or application supports a 32-bit identifier
> you know whether it has full support or no support.

It's not just the FourCC world; lots of unixland audio code works this
way too. I agree with de Castro Lopo. Making a general format and then
saying people are free to implement only the subset they care about is
contrary to Xiph's design philosophy and will lead directly to
interoperability issues. It's better to specify and require a useful
subset here.

-r
Arc wrote:

> However, support for (ie) 48-bit float should not have to be
> specially created,

Where are you going to find a 48 bit float? Is there an IEEE standard
for that? I know some floating point DSP chips use a 48 bit float
internally, but if there is more than one such chip their formats are
unlikely to be compatible, and they certainly cannot be read by
standard CPUs without pulling each value apart into separate sign,
mantissa and exponent fields and then recreating a host CPU compatible
floating point value from those.

> But between float and int, why /not/ allow someone to do something
> insane like 96-bit audio?

I think putting constraints on insane people is a good thing. It saves
the rest of us a lot of grief.

> 20 years ago, we thought that 16 bit, or perhaps 24 bit, was the
> maximum we could do.

The only place I've ever heard of 96 bit anything is in image
processing, where R, G and B were each encoded as a 32 bit float. If
the RGB explanation is not what you are thinking about, do you realise
that 96 bits gives a dynamic range of over 500 decibels? That means
that if the largest sample corresponds to 1000 volts, the smallest
sample step corresponds to roughly 1e-26 volts - many orders of
magnitude below the thermal noise floor of any real circuit, and far
smaller than anything physically measurable.

> And yet, the issue was raised that 64-bit audio samples are
> necessary.

A couple of points:

  - 64 bit float is supported natively by most CPUs; 96 bit and 48
    bit floats are not.

  - 64 bit float is used to prevent rounding errors in calculations.
    If you have a program that needs to pass data to another program
    (via a file) for processing and then get the data back (via a
    file again) for further processing, 64 bit float is a sensible
    option.

> This isn't the issue of right or wrong, but two different styles of
> designing codecs.

See my comments above re insanity.
Erik
--
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"Unix and C are the ultimate computer viruses." -- Richard P Gabriel
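Erik's dynamic-range figure is easy to verify: n bits of integer PCM
give a range of 20 * log10(2^n), roughly 6.02 dB per bit, so 96 bits
is indeed well over 500 dB. A minimal check:

```c
#include <math.h>

/* Dynamic range in dB of n-bit integer samples: 20 * log10(2^n),
 * rewritten as n * 20 * log10(2) to keep the arithmetic in range
 * even for large bit counts. */
static double dynamic_range_db (int bits)
{
    return bits * 20.0 * log10 (2.0) ;
}
```

This gives about 96.3 dB for 16 bits, 144.5 dB for 24 bits, and
roughly 578 dB for 96 bits.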
Arc wrote:

> You're still working with the philosophy of the FourCC world, where
> based on whether a plugin or application supports a 32-bit identifier
> you know whether it has full support or no support.
>
> We aren't working by that philosophy. We do not need to maintain a
> table of predefined formats, extended each time someone wants to use
> a new format, since no application needs to support every possible
> combination of encoding parameters.

You could pack the data into the low order part of a 32 bit word and
treat the upper part as extended data; then people could use it like
an enumeration if they want to.

I think that this is different from the FourCC issue we've argued
elsewhere, though. The point of fully specifying everything is to
allow applications to operate on data types that weren't known when
the applications were created. It's for forward compatibility of the
applications, not of the format. Arguing about whether to specify
fields for everything now or to create enumerations for everything
later has little bearing on what the format will be able to hold in
the future. Actually, requiring fields constrains you a bit, because
you have to identify all the relevant fields up front and hope that
nobody invents a new one. Also, it's really hard to write applications
that will operate sensibly on data types that haven't been invented
yet, so striving for forward compatibility of the applications
probably isn't worth much effort.

Every audio library and driver I've seen uses an enumeration to
describe the data, and I'm sure plenty of smart people have had this
discussion before. In the end, either way is fine with me, since both
fit my limited purposes, but I think this is a small issue compared to
how to support multiple sampling rates.
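The low-order/extended split suggested here could look like the
following. The 16/16 split, the mask, and the sample values are
hypothetical illustrations, not a proposal from the thread:

```c
#include <stdint.h>

/* Hypothetical packing: low 16 bits hold a basic enumeration value,
 * high 16 bits are reserved for extended data. An application that
 * only cares about common formats can compare the whole 32 bit word,
 * enumeration-style, because the extended part is zero for the basic
 * formats. */
#define OGG_PCM_BASIC_MASK  0x0000ffffu

static uint16_t basic_format (uint32_t word)
{
    return (uint16_t) (word & OGG_PCM_BASIC_MASK) ;
}

static uint16_t extended_data (uint32_t word)
{
    return (uint16_t) (word >> 16) ;
}
```

A simple decoder checks only `basic_format()`; a more capable one also
inspects `extended_data()` for refinements it understands.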
On 11/11/05, Arc <arc@xiph.org> wrote:

> Honestly, as far as I'm concerned unsigned samples can go away...
> almost nothing uses 8-bit samples anymore, and unsigned 8-bit even
> less so.

Speech samples used for speech analysis are commonly 8-bit mono. If I
wanted to put them in Ogg for easy editing/muxing/whatever, I would
definitely want them to be supported. Just my two cents. :)