Hello,

We (Google) have been experimenting with configurations and adjustments to
CELT-only Opus that give good results for compressing ambisonic audio
signals [1]. Based on our results so far, we would like to use Opus to
encode spatial audio. We hope to make it easy/possible to use libopus with
other common tools and software modules (ffmpeg/libav in particular).

Based on my reading of the libopus code and the IETF spec, it seems one
reasonable option would be to add a new "Channel Mapping Family" for
ambisonic audio [2]. The mapping family would indicate to the decoder that
the audio is ambisonics, and the channel mapping array would indicate which
ambisonic channel (W, X, Y, etc.) corresponds to which coded stream. This
representation is analogous to the Opus headers for surround sound.

There are a few caveats, though. Although we believe we can achieve good
compression at first without changing the bitstream or the decoder, we
would like the flexibility to modify both if the potential improvements
prove compelling enough (specifically, we have a pre/post transform that
would require sending compressible side information). Would changing either
the bitstream or the encoder require adding yet another channel mapping?
Would it require a new Opus version number?

To summarize, should we add a new channel mapping for ambisonics? If not,
what should we do?

[1] https://en.wikipedia.org/wiki/Ambisonics
[2] https://tools.ietf.org/html/draft-ietf-codec-oggopus-14#section-5.1.1

Thanks for your input,
Michael Graczyk
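For context, the channel mapping family referenced in [2] is a one-byte field in the "OpusHead" ID header of the Ogg encapsulation. A minimal sketch of how an ambisonics family might be signalled follows; the family number 240 and the identity channel mapping are purely illustrative, since no family number had been assigned for ambisonics at this point.

```python
import struct

def build_opus_head(channels, family, mapping, streams, coupled):
    """Build a minimal 'OpusHead' ID header (draft-ietf-codec-oggopus,
    section 5.1): magic, version, channel count, pre-skip, input sample
    rate, output gain, mapping family, then the channel mapping table."""
    head = struct.pack('<8sBBHIhB', b'OpusHead', 1, channels,
                       312, 48000, 0, family)
    if family != 0:  # family 0 (mono/stereo) carries no mapping table
        head += struct.pack('<BB', streams, coupled) + bytes(mapping)
    return head

# First-order ambisonics: 4 channels (W, X, Y, Z), one mono stream each.
# Family 240 is a placeholder -- not an assigned mapping family.
hdr = build_opus_head(channels=4, family=240,
                      mapping=[0, 1, 2, 3], streams=4, coupled=0)
assert hdr[:8] == b'OpusHead'
assert len(hdr) == 25   # 19-byte fixed part + streams/coupled + 4 mapping bytes
```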
Michael Graczyk wrote:
> Based on my reading of the libopus code and the IETF spec, it seems one
> reasonable option would be to add a new "Channel Mapping Family" for
> ambisonic audio [2]. The mapping family would indicate to the decoder
> that the audio is ambisonics and the channel mapping array would
> indicate which ambisonic channel (W, X, Y, etc) corresponds to which
> coded stream. This representation is analogous to Opus headers for
> surround sound.

Yes, this all sounds good.

> There are a few caveats though. Although we believe we can achieve good
> compression at first without changing the bitstream or the decoder, we
> would like the flexibility to potentially modify both if potential
> improvements are compelling enough (specifically, we have a pre/post
> transform that would require sending compressible side information).
> Would changing either the bitstream or encoder require adding yet
> another channel mapping? Would it require a new Opus version number?

I'm assuming that the pre/post transform would be something like a
decorrelating transform on the channels, and the actual core parts of Opus
would not change. For that, I think an extra channel mapping is a
reasonable approach, without requiring a new version number.

We've talked about using the Opus padding as a place to store extra side
information in the past. I haven't thought deeply about the trade-offs
between that and other approaches, like using an invalid TOC sequence to
add additional packet data, as we did with the MPEG TS embedding
(<https://wiki.xiph.org/OpusTS>).

The difficulty with padding is that it prevents transparent
repacketization, because the padding occurs once per Opus packet, while
side information is typically required once per Opus frame (of which there
can be several in an Opus packet). The invalid TOC sequence approach would
put the data in its own Opus packet entirely, which might have other
drawbacks.

> To summarize, should we add a new channel mapping for ambisonics? If
> not, what should we do?

That seems like the best approach. See also
<https://www.iana.org/assignments/opus-channel-mapping-families/opus-channel-mapping-families.xhtml>.
As long as there is a specification somewhere (this does not have to be an
IETF specification, though it could be), we can add it to this list.
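To make the padding limitation concrete: in RFC 6716, padding is signalled once per packet, in the frame-count byte of a code-3 packet, not per frame. A small sketch of parsing that byte (my own illustration, not from the thread):

```python
def parse_code3_header(packet):
    """Parse the start of an Opus code-3 packet (RFC 6716, sec. 3.2.5).
    The frame-count byte holds a VBR flag (0x80), a padding flag (0x40),
    and a 6-bit frame count.  Padding is per *packet*, which is why
    padding-borne side info cannot follow each individual frame."""
    toc = packet[0]
    assert toc & 0x03 == 3, "only code-3 packets carry a frame-count byte"
    fc = packet[1]
    vbr = bool(fc & 0x80)
    has_padding = bool(fc & 0x40)
    frame_count = fc & 0x3F
    return vbr, has_padding, frame_count

# TOC 0x03 = config 0, mono, code 3; frame-count byte 0x42 = CBR,
# padding present, 2 frames in the packet.
vbr, pad, n = parse_code3_header(bytes([0x03, 0x42]))
assert (vbr, pad, n) == (False, True, 2)
```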
On 04/17/2016 10:29 PM, Michael Graczyk wrote:
> Based on my reading of the libopus code and the IETF spec, it seems one
> reasonable option would be to add a new "Channel Mapping Family" for
> ambisonic audio [2]. The mapping family would indicate to the decoder
> that the audio is ambisonics and the channel mapping array would
> indicate which ambisonic channel (W, X, Y, etc) corresponds to which
> coded stream. This representation is analogous to Opus headers for
> surround sound.

Right. This shouldn't be very hard, and mostly just involves deciding on
which order you want to name the channels.

> There are a few caveats though. Although we believe we can achieve good
> compression at first without changing the bitstream or the decoder, we
> would like the flexibility to potentially modify both (specifically, we
> have a pre/post transform that would require sending compressible side
> information). Would changing either the bitstream or encoder require
> adding yet another channel mapping? Would it require a new Opus version
> number?

So the important question is "can an existing decoder (assuming it knows
about the ambisonics channel mapping) decode your file?" If the answer is
yes, then it's just an encoder-only change and it's easy to add to the
code base. If the answer is no, then it's actually a bitstream change. A
bitstream change would either have to go through the IETF process or (if
it's just a pre/post transform) at the very least have a formal definition
somewhere. In either case, that's a lot more work than just a new mapping
or even an encoder-only change.

> To summarize, should we add a new channel mapping for ambisonics? If
> not, what should we do?

The first step is definitely to add a new channel mapping for ambisonics.

Cheers,

Jean-Marc
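On the question of channel order: the convention most ambisonics tools use is ACN (Ambisonic Channel Number), where channel index = l*(l+1) + m for spherical-harmonic degree l and order m. Whether an Opus mapping would adopt ACN was still open at this point; the sketch below just illustrates the convention:

```python
# ACN ordering: index = l*(l+1) + m.  For first order this yields the
# channel sequence W, Y, Z, X (note: not the Furse-Malham W, X, Y, Z order).
def acn(l, m):
    return l * (l + 1) + m

names = {0: 'W', 1: 'Y', 2: 'Z', 3: 'X'}
order = [names[acn(l, m)] for l in range(2) for m in range(-l, l + 1)]
assert order == ['W', 'Y', 'Z', 'X']
```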
Hi,

Ambisonics is good for recording, but what about playback? In addition to
an ambisonic channel mapping, isn't it worthwhile to think about some
object-based audio coding, too? But then again, that would be much more
work than just adding a new channel mapping.

Christian

> Michael Graczyk <mgraczyk at google.com> wrote on 18 April 2016 at 04:29:
> [...]

--
Symonics GmbH
Geierweg 25
72144 Dußlingen
Tel +49 7072 8006100
Fax +49 7072 8006109
Email: christian.hoene at symonics.com
Geschäftsführer/President: Dr. Christian Hoene
Sitz der Gesellschaft/Place of Business: Tübingen
Registereintrag/Commercial Register: Amtsgericht Stuttgart, HRB 739918
On Tue, Apr 19, 2016 at 10:20 AM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> So the important question is "can an existing decoder (assuming it knows
> about the ambisonics channel mapping) decode your file?". If the answer
> is yes, then it's just an encoder-only change and it's easy to add to
> the code base. If the answer is no, then it's actually a bitstream
> change. A bitstream change would either have to go through the IETF
> process or (if it's just a pre/post transform) at the very least have a
> formal definition somewhere. In either case, that's a lot more work than
> just a new mapping or even an encoder-only change.

That makes sense. For now I will focus on encoder-only changes. If an
adaptive pre/post transform had to send side information, would it also
need to go through the IETF process?

> The first step is definitely to add a new channel mapping for ambisonics.

Great, I'll write up something precise and send it out.

On Tue, Apr 19, 2016 at 10:19 AM, Timothy B. Terriberry <tterribe at xiph.org> wrote:
> I'm assuming that the pre/post transform would be something like a
> decorrelating transform on the channels, and the actual core parts of
> Opus would not change. For that, I think an extra channel mapping is a
> reasonable approach, without requiring a new version number.
>
> We've talked about using the Opus padding as a place to store extra side
> information in the past. I haven't thought deeply about the trade-offs
> between that and other approaches, like using an invalid TOC sequence to
> add additional packet data, as we did with the MPEG TS embedding
> (<https://wiki.xiph.org/OpusTS>).
>
> The difficulty with padding is that it prevents transparent
> repacketization, because the padding occurs once per Opus packet, while
> side information is typically required once per Opus frame (of which
> there can be several in an Opus packet). The invalid TOC sequence
> approach would put the data in its own Opus packet entirely, which might
> have other drawbacks.

Thanks for the link. It looks like that embedding carries metadata. Would
it be possible to include something like residual-coded transform
coefficients that way?

> That seems like the best approach. See also
> <https://www.iana.org/assignments/opus-channel-mapping-families/opus-channel-mapping-families.xhtml>.
> As long as there is a specification somewhere (this does not have to be
> an IETF specification, though it could be), we can add it to this list.

Great, I will use these as a guide to writing a more precise description
of the ambisonic mapping.

--
Thanks,
Michael Graczyk
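The decorrelating pre/post transform Terriberry assumes can be sketched as an orthonormal mixing matrix applied before encoding, with its inverse applied after decoding. The 2x2 butterfly below is purely illustrative (not Google's actual transform); it shows the perfect-reconstruction property such a pair needs:

```python
import math

# Orthonormal 2x2 butterfly on one channel pair: the encoder mixes the
# channels into sum/difference form (often more compressible when the
# channels are correlated), and the decoder applies the inverse.
S = 1 / math.sqrt(2)

def pre_transform(a, b):
    """Encoder side: rotate (a, b) into decorrelated (mid, side)."""
    return S * (a + b), S * (a - b)

def post_transform(m, s):
    """Decoder side: exact inverse of pre_transform."""
    return S * (m + s), S * (m - s)

# Round trip reconstructs the input exactly (up to float rounding).
w, x = 0.8, -0.3
m, s = pre_transform(w, x)
w2, x2 = post_transform(m, s)
assert abs(w2 - w) < 1e-12 and abs(x2 - x) < 1e-12
```

An *adaptive* version of this (the case Michael asks about) would need to transmit the per-frame mixing coefficients as side information, which is exactly where the padding-vs-invalid-TOC question comes in.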
On Tue, Apr 19, 2016 at 10:34 AM, Christian Hoene <christian.hoene at symonics.com> wrote:
> Hi,
>
> Ambisonics is good for recording, but what about playback?

I think playback is a separate concern. You will have to interpret the
ambisonics at some point, analogous to downmixing a 5.1 Opus stream to
stereo for playback.

> In addition to an ambisonic channel mapping, isn't it worthwhile to
> think about some object-based audio coding, too?

Yes, it would be nice to have a way to include multiple mono channels, for
example. I have not thought too much about that, but maybe that would be
something to keep in mind.
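The playback step Michael describes — rendering the ambisonic channels to a target speaker layout — can be sketched for the simplest case: a first-order stereo render using two virtual cardioid microphones. The coefficients here are illustrative only, not from any spec:

```python
# Minimal first-order ambisonic stereo render: two virtual cardioids
# pointing left (+Y) and right (-Y).  W is the omni channel, Y the
# left-right figure-of-eight.  Real renderers use normalized spherical-
# harmonic gains; 0.5 is an illustrative cardioid weight.
def render_stereo(w, y):
    left = 0.5 * (w + y)
    right = 0.5 * (w - y)
    return left, right

# A source hard left (y == w) lands entirely in the left output.
l, r = render_stereo(w=1.0, y=1.0)
assert l == 1.0 and r == 0.0
```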