This came out of the OggPCM discussion, but I think it needs to be addressed on a wider scale.

Let's start here, 5 years ago:
http://lists.xiph.org/pipermail/vorbis-dev/2000-July/009513.html
(I have included that email below.)

I emailed David (the author of that email) and asked him to join this list.

I'm thinking, as I look at the problem, that surround sound needs to be defined _outside_ the audio codec stream. Comments are inappropriate, since we need to know whether our apps can support a stream based on packet0.

As Silvia mentioned recently, we've been lacking a multi-audio codec handling system for a long time, whether that be for surround sound (perhaps with different samplerate/samplesize), for alternative languages, or for recorded tracks which need to be mixed.

Let's begin with the above email and continue from there. If this is to be outside of OggPCM, then it should be separate from the OggPCM thread, besides not encoding the data within its header.

==================================================================================

Hi everyone,

Over the last two weeks or so, I've been thinking about how to add surround sound to Ogg -- and more than that, to do it in the best way possible. With this in mind, I started considering using Ambisonic surround sound. The advantages of this format are considerable:

a) It was developed in the early to mid '70s, so the patents should be expired by now.

b) It's scalable -- it can handle everything from mono, to stereo, full horizontal surround, or full 360-degree spherical surround (periphonic).

c) It's based on a sound mathematical foundation (unlike Dolby's 5.1 system), allowing you to calculate how it should be able to perform.

d) Depending on how many channels you are willing to use (and the source of your material) you can make the positioning of sound in the sound field as accurate as you want.
(First-order Ambisonics requires four channels for full periphonic surround, while second-order Ambisonics provides a larger 'sweet spot' with nine channels.) Since an Ogg stream can have up to 255 channels, the format itself will not be a limitation.

e) It's compatible with existing decoder technology -- if your player does not support surround decoding, two of the channels can be decoded as M/S stereo. (Once patent issues regarding M/S stereo in digital audio encoding are worked out, that is... Monty thinks that M/S stereo may have been patented by Fraunhofer for audio compression uses.)

Many articles further describing the format itself are available at www.ambisonic.net, but I'll try to give a brief overview here for those of you who are unfamiliar with it.

Ambisonics is based on the principle of encoding spherical harmonic components of a sound field into one or more audio channels, which allow reproduction of that sound field later on with an array of speakers in a certain configuration. The spherical harmonics it reproduces are similar in shape to the various electron orbitals in an atom, if any of you are familiar with that.

The first component, the 'W' signal, reproduces the pressure of the signal. (This would be 0th-order Ambisonics, and gives you mono. This takes the shape of the 'S' orbitals in an atom.)

The next three signals are the directional components of the signal, which give you the 2- or 3-D spatial information. These are the X, Y, and Z signals. (These make up 1st-order Ambisonics, and have the shape of the 'P' orbitals in an atom. If you decode only the W and Y signals through an M/S decoder, you have stereo. If you decode WXY, you have horizontal surround. If you decode all four, you have periphonic surround. Anyway, you get the picture.)

The next five signals describe the curvature of the signal (I think -- I don't understand this realm terribly well yet), and have the shape of the 'D' orbitals of an atom.
These make up the signals of 2nd-order Ambisonics, which will, if used, increase the accuracy of the reproduced sound field considerably, and widen the 'sweet spot' by a fair amount.

Higher-order Ambisonics are possible, but little research has been done to this date. All existing Ambisonic recordings of actual events are in first-order Ambisonic formats.

Now that I've tried to describe what the Ambisonic system is (hopefully I didn't confuse you too much -- read some of the articles on the site I gave earlier for more in-depth information), I'll describe more specifically what I have in mind for the Ogg project.

First of all, there are several formats currently in use for distributing Ambisonic material (including one -- UHJ -- which is stereo compatible). I propose that we use Ambisonic B-format, which uses the harmonics directly: we would have W, X, Y, etc. channels in the Ogg stream. It is the easiest to decode (if we used another format like UHJ, the player would have to decode it to B-format before it could do more with it), it is the oldest (it's been used since the early '70s, so any patents should have expired YEARS ago), and it's the most flexible.

What I would propose is a one-byte field for each track that is marked Ambisonic. Of this byte, the first three bits would indicate which order of Ambisonics is being used, and the last five bits would indicate which signal in the hierarchy this stream represents. For example:

(These are big-endian)

Component   Order   Signal
W           000     00000
X           001     00000
Y           001     00001
Z           001     00010
R           010     00000
etc.

One additional format which may become important as time goes by is G-format, which is basically an Ambisonic signal pre-decoded for a standard DVD-type 5.1 speaker array. Once this begins to be used, we may want to incorporate a G-format-to-B-format converter into the encoder so we're again working with B-format signals.
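As a rough illustration of the one-byte field proposed above, here is a minimal Python sketch. It assumes that "the first three bits" means the three most significant bits of the byte, as the table suggests. The names pack_component, unpack_component and COMPONENTS are hypothetical and not part of any Ogg or Vorbis API.

```python
from typing import Tuple

ORDER_BITS = 3   # high bits: Ambisonic order (0-7)
SIGNAL_BITS = 5  # low bits: signal index within that order (0-31)

# Components listed in the table above, as (order, signal) pairs.
COMPONENTS = {"W": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (1, 2), "R": (2, 0)}


def pack_component(order: int, signal: int) -> int:
    """Pack an order and a signal index into one byte value (hypothetical helper)."""
    if not 0 <= order < (1 << ORDER_BITS):
        raise ValueError("order must fit in 3 bits")
    if not 0 <= signal < (1 << SIGNAL_BITS):
        raise ValueError("signal must fit in 5 bits")
    return (order << SIGNAL_BITS) | signal


def unpack_component(value: int) -> Tuple[int, int]:
    """Split one byte value back into (order, signal)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    return value >> SIGNAL_BITS, value & ((1 << SIGNAL_BITS) - 1)


if __name__ == "__main__":
    # Print each table entry as an 8-bit pattern and as hex.
    for name, (order, signal) in COMPONENTS.items():
        byte = pack_component(order, signal)
        print(f"{name}: {byte:08b} (0x{byte:02x})")
```

With this layout the Y component, for instance, packs to 0x21, and a player could check the order bits of each track to decide whether it can handle the stream before decoding any audio.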
Even though no one anywhere is talking about using 7th-order Ambisonic signals, reserving three bits for the order is a good way of future-proofing the format, should any such system be developed. (If it were easier to implement, each of these fields could be a whole byte.)

Once we have an Ambisonically-encoded Ogg stream, how do we play it? There will be two ways.

The first way is the way Ambisonic material has traditionally been played -- the B-format signals are output directly, and fed to an outboard decoder module. This offers maximum flexibility, as this decoder could be a 128-speaker auditorium model. The drawbacks, however, would be showstoppers if this were the only playback method. (The biggest drawback is the need for everyone to have a decoder -- they aren't very common, so they're fairly expensive.)

The second way is the way I propose that most of us will play this -- software decoding. The B-format components will be rematrixed in software into a certain number of speaker feeds, which will then be output to your amplifier and speaker array. You would create a configuration file (either manually or with some sort of GUI) which would tell the decoder about your speaker array (location, possibly frequency range as well), and then the decoder would make the appropriate adjustments to the signals to reproduce the sound field as accurately as possible.

At first, I was thinking that this could be done using the four-channel output of a Sound Blaster Live (or similar card), feeding the 5.1 channel analog input of a home theater receiver. If sound cards are developed which integrate AC-3 encoding (giving you virtual 5.1 outputs which are encoded by a chip into an AC-3 or DTS stream), this could also be used. (We probably wouldn't be able to integrate AC-3 encoding directly into the player until the patent on it expires, which won't be for quite a while.
I doubt Dolby would give us the free license we would need to do so -- especially since we would be competing with Dolby for surround mindshare!) Hopefully, we would also be able to drive two SBLive cards in parallel eventually, giving us 8 channels of output, but that remains to be seen. (The driver doesn't even support 4-channel output yet...)

For now, however, independent of how fast the software decoding side of this develops, the format can at least be defined to create a framework for future work.

One thing I've been keeping in mind during this process is that we don't necessarily have to be using the Vorbis codec for this -- once the Squish lossless codec is available, that could be used for any or all components of an Ambisonic signal.

At this point, several things need to happen.

a) The format needs to be defined, preferably in such a way that it is as flexible as possible, and allows non-surround-capable decoders to decode the W and Y components as M/S stereo.

b) I'm going to ask Richard Furse, who has already written a software Ambisonic encoder and decoder, if he will relicense these tools to us under the LGPL. If he agrees, this will make many things much easier and faster.

c) The patent situation of Ambisonics in general, and particularly newer Ambisonic developments (G-format pre-decoded 5.1, 2nd- and higher-order Ambisonics in general, etc.), needs to be examined to ensure that we know of any patent pitfalls so they can be worked around.

d) Work towards 4-channel output in the SBLive drivers (and maybe the Aureal Vortex drivers, if they ever open their specs) needs to proceed.

e) Someone should look into how easy or hard it will be to output four channels under Windows -- we'll want to be able to for a future Winamp plugin.

f) Once b) happens, or we write our own equivalents, functions will need to be added either to libvorbis or possibly to a standalone library, to work with Ambisonic material.
(On the encoding side, UHJ-encoded material, such as all the CDs put out by Nimbus in England, will be decoded into B-format before it is encoded. A G-format-to-B-format decoder may also be useful, if DVD-Audio discs begin to be released in G-format. On the decoding side, we will probably want to put the software Ambisonic decoding in its own library to reduce bloat in the main plugin. The main plugin could output raw B-format without this extra plugin, but would output stereo by default.)

I realize that this is a LOT to digest at once, but hopefully this will stimulate discussion about the future of surround support in the Ogg format, and get things off and running. I'm really excited about the whole Ogg project, and Vorbis in particular, and I think we can all look forward to an exciting future ahead... Watch out Dolby and DTS, here we come!

David

--
David Carter ** dcarter at sigfs.org ** dcarter at visi.com
PGP Key 581CBE61: E07EE199C767C752 8A8B1A9F015BF2EA
Key available by finger or www.keyserver.net

==================================================================================

--
The recognition of individual possibility, to allow each to be what she and he can be, rests inherently upon the availability of knowledge; The perpetuation of ignorance is the beginning of slavery.

from "Die Gedanken Sind Frei": Free Software and the Struggle for Free Thought
by Eben Moglen, General Counsel of the Free Software Foundation