On 2/28/07, Ivo Emanuel Gonçalves <justivo@gmail.com> wrote:
> On 2/28/07, Ralph Giles <giles@xiph.org> wrote:
> > Well, there are todo pages at wiki.xiph.org, but I meant more in the
> > community folklore sense. My point is a roadmap doesn't help much unless
> > there are people committed to making things happen. That's been the
> > problem with a lot of this stuff, and why it's been so nice to see the
> > ambisonics work happening.
>
> The situation on Ambisonics is tricky, because it depends on someone
> coding a whole API for the different Xiph projects AND Monty being
> available to apply whatever changes are needed in Vorbis.

I have been giving some thought to how to include Ambisonics in Ogg Vorbis. There is a question at the end, so please plough on.

As I understand it, all that is needed is some machine-parseable metadata to identify the audio data as being Ambisonics. The channel coupling won't be optimal and the phase may get a bit munged (Ambisonics is big on low-frequency phase), but it will work. The missing bits can then be worked on in Ghost at people's leisure.

Now, Vorbis comments aren't intended for machine-parseable metadata, so the metadata will need to go in the Ogg container as a separate (chained) stream. This scheme will work not only for Ogg Vorbis, but for Ogg <anything>. There currently isn't a standard for a metadata stream to go into Ogg, but there is a draft standard at:
http://wiki.xiph.org/index.php/Metadata

According to this draft standard, all I need to do is to invent some XML which includes the required information, and we are away.

Now for the question: how much did I get wrong?

Many thanks,
Martin
--
Martin J Leese
E-mail: martin.leese@stanfordalumni.org
Web: http://members.tripod.com/martin_leese/
Martin Leese wrote:
> On 2/28/07, Ivo Emanuel Gonçalves <justivo@gmail.com> wrote:
> [...]
> According to this draft standard, all I need to do is to
> invent some XML which includes the required information,
> and we are away.
>
> Now for the question; how much did I get wrong?

It depends what your aim is. The mapping type in the Vorbis setup header is meant for this [1], [2]. Of course, a nonzero mapping type will cause a lot of players to give up, but so will including the XML stream.
I believe this is how it was intended that multichannel audio would be handled. As you say, using a separate metadata stream would allow all codecs to use the same scheme, but the codec would need to communicate this to the muxer if it wanted to use knowledge of the mapping. Vorbis and OggPCM have their own mapping information, which also means that they can be put in containers other than Ogg without losing the mapping. (I think FLAC does too.)

If you want to go the separate metadata route, there's the choice of metadata stream. Skeleton [3], [4] is already implemented in some places and typically contains metadata relevant to stream decoding. This is mainly temporal information, but Skeleton also "allows for attachment of message header fields given as name-value pairs that contain some sort of protocol messages about the logical bitstream, e.g. the screen size for a video bitstream or the number of channels for an audio bitstream." [5]

The metadata split that seems to be emerging is that decode-related stuff goes in Skeleton, and other metadata (e.g. indexing) goes into CMML or currently-non-existent XML streams [6], [7].

Without knowing what you need the metadata to record (I assume it can be fairly strictly defined?), I'd say that, of the two metadata approaches, going the Skeleton route is the easier task here. It avoids needing to parse XML, and Skeleton is more strictly defined as being in the right place for decode setup.

[1] <http://www.xiph.org/vorbis/doc/Vorbis_I_spec.html#id2510452>
[2] <http://lists.xiph.org/pipermail/vorbis-dev/2007-February/018697.html>
[3] <http://wiki.xiph.org/index.php/Ogg_Skeleton>
[4] <http://annodex.net/TR/draft-pfeiffer-annodex-02.html#anchor8>
[5] <http://annodex.net/TR/draft-pfeiffer-annodex-02.html#anchor6>
[6] And vorbiscomments for the basic TITLE, ARTIST, etc. stuff.
[7] This is probably because: a) work has been done on Skeleton; b) it's more obvious what decode-related information is needed and how it should be used.

--
imalone
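To make the Skeleton route a little more concrete, here is a minimal Python sketch of name-value message header fields. It assumes the HTTP-style, CRLF-terminated "Name: value" text format that Skeleton's message header fields follow, and it covers only that text portion, not the binary packet layout. `Content-Type` is a field Skeleton uses; the `X-Ambisonic-*` names are hypothetical, invented here purely for illustration, and are not defined by any specification.

```python
# Sketch: Skeleton-style "Name: value" message header fields.
# Assumption: fields are CRLF-terminated text lines; the X-Ambisonic-*
# names used below are hypothetical and not defined by any spec.

def format_message_headers(fields):
    """Serialise (name, value) pairs as 'Name: value' lines ended by CRLF."""
    return "".join(f"{name}: {value}\r\n" for name, value in fields)


def parse_message_headers(text):
    """Parse CRLF-terminated 'Name: value' lines back into (name, value) pairs."""
    fields = []
    for line in text.split("\r\n"):
        if not line:
            continue  # skip the empty string left after the final CRLF
        name, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        fields.append((name.strip(), value.strip()))
    return fields


headers = [
    ("Content-Type", "audio/vorbis"),
    ("X-Ambisonic-Order", "1"),           # hypothetical field
    ("X-Ambisonic-Channels", "W,X,Y,Z"),  # hypothetical field
]
blob = format_message_headers(headers)
assert parse_message_headers(blob) == headers
```

Keeping the fields as an ordered list of pairs, rather than a dict, preserves field order, so a parse/format round trip reproduces the same bytes.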
Paul Martin <pm@nowster.zetnet.co.uk> wrote:
> Getting stereo from Ambisonics is a very simple job -- just a matter of
> which matrix you multiply with. The crudest way would be to use L=(W-X)/2
> and R=(W+X)/2 and ignore all other channels.

Looks like a typo crept in. The X channel points forward, so I suspect you meant Y, which points left. As I explain at:
http://wiki.xiph.org/index.php/Ambisonics#Default_channel_conversions_from_B-Format

"Starting from B-Format, it is possible to synthesize any mic response pointing in any direction. Hence, it is possible to synthesize all coincident stereo mic techniques."

What is suggested (with Y instead of X) will work, but it is just one possible mix from an infinite set.

Regards,
Martin
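The "any mic response pointing in any direction" point can be sketched in a few lines of Python. This is a minimal, horizontal-only illustration under stated assumptions: W is stored at -3 dB as in .amb files, X points forward, Y points left, and azimuth is measured anticlockwise from the front. The function names are hypothetical. `crude_stereo` is the simple mix from the thread with Y in place of X, signed so that the left channel takes +Y.

```python
import math


def virtual_mic(w, x, y, azimuth_deg, pattern):
    """Horizontal first-order virtual microphone from B-format samples.

    Assumed conventions: W is stored at -3 dB (1/sqrt(2)) as in .amb files,
    X points forward, Y points left, and azimuth is measured anticlockwise
    from the front. pattern: 1.0 = omni, 0.5 = cardioid, 0.0 = figure-of-eight.
    """
    az = math.radians(azimuth_deg)
    omni = math.sqrt(2.0) * w                      # undo the -3 dB on W
    figure8 = math.cos(az) * x + math.sin(az) * y  # figure-of-eight aimed at az
    return pattern * omni + (1.0 - pattern) * figure8


def crude_stereo(w, y):
    """Crude L/R mix using Y; L takes +Y because Y points left."""
    return (w + y) / 2.0, (w - y) / 2.0


# A unit source hard left (azimuth 90 degrees) in .amb-style B-format.
w, x, y = 1.0 / math.sqrt(2.0), 0.0, 1.0
left_cardioid = virtual_mic(w, x, y, 90.0, 0.5)    # about 1.0 (on axis)
right_cardioid = virtual_mic(w, x, y, -90.0, 0.5)  # about 0.0 (source is behind it)
```

Varying `azimuth_deg` and `pattern` gives the rest of that infinite set, e.g. a pair of cardioids at plus and minus 45 degrees for a coincident stereo pair.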
Gregory Maxwell <gmaxwell@gmail.com> wrote:
> It would be nice if someone with a working ambisonic playback rig
> would give me some feedback on the decodes of coupled encodes. :)

Has anyone done this, i.e. tried coupled encoding/decoding of ambisonic signals? I get the impression from this forum, including Monty's post, that no one has tried ANY coupling on a multichannel file, and that only point and lossless coupling are in present use for stereo.

Monty said 8-phase and 4-phase coupling were once used in the dim and distant past (for stereo), but not any more. Does anyone know how much improvement in packing efficiency they give?

After Monty's post, lossy coupling on Ambi files is VERBOTEN!
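For reference, the lossless coupling in Vorbis I is the square polar mapping. Below is a minimal Python sketch of the decoder-side (inverse) step for one pair of values, transcribed from the rules in the Vorbis I specification; the function names are illustrative. It covers decoupling only: the encoder is responsible for producing magnitude/angle pairs this step can invert, and with more than two channels the specification applies the coupling steps in reverse order. Which of W, X, Y and Z should be paired is left open here, as it is in the thread.

```python
def inverse_square_polar(magnitude, angle):
    """Decouple one (magnitude, angle) pair using Vorbis I square polar rules.

    Returns the reconstructed values for the magnitude channel and the
    angle channel, in that order.
    """
    if magnitude > 0:
        if angle > 0:
            return magnitude, magnitude - angle
        return magnitude + angle, magnitude
    if angle > 0:
        return magnitude, magnitude + angle
    return magnitude - angle, magnitude


def decouple(magnitudes, angles):
    """Apply the inverse mapping element by element to two residue vectors."""
    pairs = [inverse_square_polar(m, a) for m, a in zip(magnitudes, angles)]
    return [p[0] for p in pairs], [p[1] for p in pairs]


print(decouple([5, 5, -5, -5], [2, -2, 2, -2]))
```

Each branch only adds or subtracts the two values, so no information is lost in this step; that is the sense in which this coupling mode is called lossless, as opposed to the older lossy (phase) coupling modes.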
Paul Martin wrote:
> If you haven't got a true Ambisonic setup, you can get a flavour by using
> the following script to convert a 4-channel .amb to a 4-channel "square"
> speaker arrangement. (The latest SoX has a bug in the mixer function,
> which stops it taking 16 numbers.)
>
> sox -V -S $1 -c 4 -3 -r 48000 $2 \
> mixer \
> 0.3536,0.3536,0.3536,0.3536,\
> 0.1768,0.1768,-0.1768,-0.1768,\
> 0.1768,-0.1768,-0.1768,0.1768,\
> 0,0,0,0 \
> rabbit stat
>
> This then gives you a file with the channels in the format Left Front,
> Right Front, Left Rear, Right Rear.

This is a "cardioid" decode: fairly nondescript, and used mainly for large-area performance. If your material is meant to be used at home, an "Energy" decoder is better and can be obtained by replacing ±0.1768 with ±0.25 in the above script. See "SHELF FILTERS for Ambisonic Decoders" from www.ambisonia.net\Members\ricardo.

Gregory Maxwell wrote:
> Gains for lossless packing were pretty high. Higher than for typical
> stereo files.. at least on the ambisonic files I tested.

From the nature of Ambisonic signals, I expect at least a 2:1 gain, if not more, for lossless multichannel coupling of Ambisonic B-format.
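The two square-decoder matrices discussed above can be generated rather than typed by hand. This is a minimal Python sketch under stated assumptions: speakers at ±45° and ±135° azimuth (anticlockwise from the front, Y pointing left), W taken from an .amb file at -3 dB, and coefficients listed one input channel (W, X, Y, Z) at a time, which is how the matrix in the quoted command reads. The names are hypothetical. In this scaling, a directional gain of 0.25 reproduces the ±0.1768 "cardioid" rows and 0.25·√2 gives the ±0.25 "Energy" rows. The Y-row signs depend on the speaker order and azimuth convention, so check them against your own layout and your SoX version's mixer documentation before use.

```python
import math

# Assumed square layout: azimuths in degrees, anticlockwise from the front
# (so, with Y pointing left, left-hand speakers get +Y).
SQUARE_AZIMUTHS = {"LF": 45.0, "RF": -45.0, "LR": 135.0, "RR": -135.0}


def square_decoder(directional_gain, w_gain=0.25 * math.sqrt(2.0),
                   order=("LF", "RF", "LR", "RR")):
    """Return rows for inputs W, X, Y, Z, with one gain per output speaker.

    w_gain defaults to 0.25 * sqrt(2) (about 0.3536), i.e. a quarter of the
    full omni level once the -3 dB on an .amb W channel is undone.
    Z is left unused, as in a horizontal-only decode.
    """
    w_row = [w_gain for _ in order]
    x_row = [directional_gain * math.cos(math.radians(SQUARE_AZIMUTHS[s])) for s in order]
    y_row = [directional_gain * math.sin(math.radians(SQUARE_AZIMUTHS[s])) for s in order]
    z_row = [0.0 for _ in order]
    return [w_row, x_row, y_row, z_row]


def sox_mixer_args(rows):
    """Flatten the rows, input channel by input channel, to a comma-separated list."""
    return ",".join(f"{g:.4f}" for row in rows for g in row)


cardioid = square_decoder(0.25)                 # X/Y rows come out at +/-0.1768
energy = square_decoder(0.25 * math.sqrt(2.0))  # X/Y rows come out at +/-0.25
print(sox_mixer_args(cardioid))
```

Changing only `directional_gain` switches between the two decodes while leaving the W row alone, which is the single-number substitution described above.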
> > [sox command snipped]
> >
> > This is a "cardioid" decode; fairly non-descript. Used mainly for
> > large area performance. If your material is meant to be used at
> > home, an "Energy" decoder is better and can be obtained by
> > replacing +-0.1768 with +- 0.25 in the above script.
>
> Are you sure? Amb files store the W channel at 3dB down.

Please read "SHELF FILTERS for Ambisonic Decoders" from www.ambisonia.net\Members\ricardo for a "simple" explanation. The full theoretical treatment is in "General Metatheory ...", which is referred to in the above paper. Bear in mind that "General Metatheory ..." doesn't use WXYZ in the same way as conventional Ambisonics.
Richard Lee wrote:
> Please read
> "SHELF FILTERS for Ambisonic Decoders" from
> www.ambisonia.net\Members\ricardo for a "simple" explanation.

http://www.ambisonia.com/Members/ricardo
^^^ seems to work better...

--
Tuomo
... Why is the alphabet in that order? Is it because of that song?
> I have done. How does that relate to my pointing out the fact that
> the W channel is stored 3dB down in an .amb file?

Well, it changes the decode, doesn't it?

In "SHELF FILTERS ...", I use WXYZ in its strict Ambisonic sense, as defined in
http://www.york.ac.uk/inst/mustech/3d_audio/secondor.html

Please use these strict definitions when talking about Ambi decodes. I don't use anything else; if I do, I point it out clearly.

> By the way, that document boils down to putting a low-pass filter on
> the W channel.

No. What my document says is that if you make these very simple changes to the decoding (and proper Shelf filters are simple only if you are a DSP guru), you will get better localisation. "Localization in Horizontal-Only Ambisonic Systems" (Benjamin, Lee & Heller, AES, October 2006, San Francisco), available from www.ai.sri.com/ajh/ambisonics, has the experimental evidence.

Your decoder is OK, but an Energy decoder is better, and an Energy decoder with Shelf filters is better still.
Paul Martin wrote:
>> In "SHELF FILTERS ...", I use WXYZ in its strict Ambisonic sense as defined in
>> http://www.york.ac.uk/inst/mustech/3d_audio/secondor.html
>
> You're dancing round the question. The specification of an AMB file
> says that the W channel will be stored at a level of -3dB.
> http://www.ambisonia.com/Members/etienne/Members/mleese/file-format-for-b-format
> "The W channel is attenuated by -3 dB (1/sqrt(2)) for all orders. That
> is to say, a source at 45 degrees azimuth (zero elevation) would
> produce equal gains in W, X, and Y."
> This means that my decode *is* the "Energy" one.

You've just described an ENCODING issue. Your DECODER

> 0.3536,0.3536,0.3536,0.3536,\
> 0.1768,0.1768,-0.1768,-0.1768,\
> 0.1768,-0.1768,-0.1768,0.1768,\

is, as Sebastian Olter points out, a "Cardioid" decoder. It's OK, but an "Energy" decoder is better, and an "Energy" decoder with Shelf filters is better still.
> It's interesting that Richard Lee's document has the note:
> IMPORTANT CORRECTION
> The rectangular decoders in the early 5mar06 edition is (sic) wrong.

Please destroy any copies of that document. This referred to a slightly inaccurate version of the RECTANGULAR decoder, which was superseded by Aaron's exact solution. It has no bearing on what we are discussing about Square decoders.

__________________

> If the W channel is stored in the AMB file at a reduced level, and I'm
> decoding from an AMB file, do you expect me to ignore the effect of
> that reduced level of the W channel when decoding?

YES. You have to assume the AMB file is correct. W is NOT "stored in the AMB file at a reduced level". The signal in the AMB file IS W.

The correct Rationalised Energy Square decoder is

LB = W' - 0.7X' + 0.7Y'

etc. (Eqn 4.2, Rationalised Square Decoder, in "SHELF FILTERS ..."), or a scaled version of this. This gives best results if you do not use Shelf filters.

I have asked Martin Leese to correct his
http://www.ambisonia.com/Members/etienne/Members/mleese/file-format-for-b-format
page to clarify all this.
Not meaning to distract from the more constructive conversation.... I assume, somewhere, someone has a compendium of recommended hardware for ambisonics? Something to sanity check/inform equipment selections of someone building My First Ambisonics Rig? I can roll my own of course, but I like to avoid mistakes others have already made. Monty
Dear Martin,

> http://www.ambisonia.com/Members/etienne/Members/mleese/file-format-for-b-format
> "The W channel is attenuated by -3 dB (1/sqrt(2)) for all orders. That
> is to say, a source at 45 degrees azimuth (zero elevation) would produce
> equal gains in W, X, and Y."
> As others have said, the available documentation is either opaque or
> uses conflicting notation.

Could you change the above statement on your page to say:

"The W channel is a perfect omnidirectional microphone whose response is -3dB with respect to the on-axis response of the X, Y and Z signals. A source at 45 degrees azimuth with zero elevation would produce equal signals in W, X and Y."

The original statement implies that there is an "original" W which is then attenuated to something which is not proper W in B-format. This is causing lots of confusion in the Vorbis decoder discussions, both on the forum and in private.

Thanks,
Richard
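The proposed wording can be checked with a few lines of arithmetic. Below is a minimal Python sketch of a first-order encoder under the conventions stated there: W is an omnidirectional pickup with gain 1/sqrt(2) (-3 dB) relative to the on-axis response of X, Y and Z. It further assumes X points forward, Y left and Z up, with azimuth measured anticlockwise from the front; the function name is illustrative.

```python
import math


def encode_first_order(s, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample as first-order B-format (W, X, Y, Z).

    Assumed conventions: W is an omnidirectional pickup whose gain is
    1/sqrt(2) (-3 dB) relative to the on-axis gain of X, Y and Z;
    X points forward, Y left, Z up; azimuth is anticlockwise from the front.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = s / math.sqrt(2.0)
    x = s * math.cos(az) * math.cos(el)
    y = s * math.sin(az) * math.cos(el)
    z = s * math.sin(el)
    return w, x, y, z


# The check from the page: a source at 45 degrees azimuth, zero elevation.
w, x, y, z = encode_first_order(1.0, 45.0)
assert math.isclose(w, x) and math.isclose(x, y)  # all about 0.7071
assert z == 0.0
```

Here W is defined with the 1/sqrt(2) gain from the outset rather than attenuated afterwards, which is the reading the proposed wording is aiming for.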
"Gregory Maxwell" <gmaxwell@gmail.com> wrote:
> On 4/27/07, xiphmont@xiph.org <xiphmont@xiph.org> wrote:
> > (eg, the FAQ doesn't answer a few simple questions like 'should I be
> > using monopole speakers, or are bipoles better like in other
> > systems?')
>
> Yech. Other than a few (possibly crazy?) people, no one recommends
> dipole speakers for ambisonic playback... they don't work, at least
> mathematically. There has been some argument based on a few people's
> experience that in some rooms they might work okay, but it's not
> entirely clear why.
>
> Small, full range, flat monitors are probably best. Small, because
> big is the enemy of being able to place them correctly. Consistent
> phase response is important (i.e. reversing phase on a driver breaks
> it completely). Because the goal is to reconstruct the soundfield,
> consistent and predictable speaker behavior is useful.

Gregory's advice is sound. The work that suggested using dipole speakers used all dipoles, front as well as back. One pole of each speaker faced the centre of the room and the other faced the wall, on which was stuff to absorb or disperse the reflections. This is entirely different from the use of dipoles in conventional surround sound. So, in brief, use monopoles as Gregory suggests.

The key is that all speakers cooperate to localise a sound, so all speakers are equally important. They must be phase-matched; the easiest way of doing this is to use identical units. As Gregory suggests, take care with wiring; if one speaker's phase is reversed then all is lost. For the same reason, if you use different power amps, make sure all are non-inverting (or all inverting).

Placement in the room can also matter. Left-right symmetry of the room is more important than front-back symmetry.

When it works you will know. The soundfield will "gel" and your ears will relax.

Regards,
Martin