Drew Allen
2018-Apr-10 17:52 UTC
[opus] [PATCH] opus-tools/opusfile: Support for Ambisonics
Friendly ping for supporting ambisonics in opus-tools & opusfile Please LMK any ?s you might have. Cheers, Drew -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20180410/fd4a3709/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: opus-tools-Support-for-Ambisonics.patch Type: text/x-patch Size: 13182 bytes Desc: not available URL: <http://lists.xiph.org/pipermail/opus/attachments/20180410/fd4a3709/attachment-0002.bin> -------------- next part -------------- A non-text attachment was scrubbed... Name: opusfile-Support-Ambisonics.patch Type: text/x-patch Size: 20828 bytes Desc: not available URL: <http://lists.xiph.org/pipermail/opus/attachments/20180410/fd4a3709/attachment-0003.bin>
Timothy B. Terriberry
2018-May-25 23:43 UTC
[opus] [PATCH] opus-tools/opusfile: Support for Ambisonics
Drew Allen wrote:> Friendly ping for supporting ambisonics in opus-tools & opusfile > > Please LMK any ?s you might have.Sorry for the long delay on the review. The high-level design looks a lot better. I have a number of comments on the implementation details. I didn't really like the way that opus_head_projection_parse() turned out. I was originally thinking something more generic/future proof that could also be used for any other channel mapping families that want to define a custom channel mapping table. The way this is implemented in the patch you sent, it duplicates a lot of code from opus_head_parse(), the user still needs to call opus_head_parse() (so the work gets duplicated at runtime, too), and they have to pass in a buffer of exactly the right size (which means they need to know some details of the specific mapping family and how the demixing matrix works). I think a slightly better approach is along the lines of the opus_head_parse_ext() I originally suggested, but just return a pointer into the user-supplied packet instead of copying the data. For the simple case where the user is just going to pass the data immediately to opus_projection_decoder_create(), they don't need a separate buffer at all, and the parameters to return the pointer and the matrix size can be NULLable to make them optional, so we can implement opus_head_parse() simply by calling opus_head_parse_ext() and passing NULL for them (eliminating all of the duplicate code and work). Open question: Should opus_head_parse_ext() with NULL for the matrix and/or matrix size parameters succeed, or return an error (OP_EIMPL?) to let the application know that there is missing data they're not getting by using the old API (i.e., so they know they won't be able to decode that stream correctly)? If we do continue to return success, we should probably fill in the existing mapping[] array in OpusHead with something meaningful (e.g., 255 for all silence). Right now it is just copying a number of bytes out of the demixing matrix equal to the channel count, which doesn't make much sense. There are a few errors in the opus_head_projection_parse() implementation. For mapping family 253 you don't validate that the packet has at least 21 bytes before reading bytes 19 and 20. Although you have bumped up OP_NCHANNELS_MAX to 18, op_validate_ambisonic_channels() will actually succeed for channel counts much larger than that. I didn't see anything that would then prevent us from trying to decode such streams (and overflowing the statically-sized buffers based on OP_NCHANNELS_MAX). I think this should return OP_EIMPL like we do for mapping family 255. Finally, I don't want to use sqrt() and introduce a libm dependency for the fixed-point builds, so we should probably extract the ambisonics order from the channel count a different way. The next issue is that the demixing matrix cannot be stored directly in the OggOpusFile struct. In the case of a chained, seekable stream with multiple ambisonics projection links, we will parse all of the headers up front, and the matrix that gets stored will simply be the last one parsed, which will overwrite all of the others (and if they have different channel/stream counts, we could potentially use data from a mix of matrices of different sizes). The proper place to put this data is in the OggOpusLink structure, since there is one of these per link in a chained file. Since files can have a nearly unlimited number of links, and this matrix can be somewhat large, it probably makes sense to dynamically allocate it instead of using a fixed-size buffer, especially since most files probably won't be using ambisonics to begin with. Once we have dynamic allocation for the demixing matrix, I wonder if we shouldn't just bump OP_NCHANNELS_MAX up to 227 now. Otherwise whenever someone does want to produce some of these higher-order ambisonics streams, everyone will have to upgrade libopusfile to be able to play them. I also don't think you converted the check to see if we can re-use an existing decoder correctly. At a minimum it should ensure we want to create the same type of decoder and in the projection case, either verify the demixing matrices match or not attempt to re-use projection decoders (since this would require saving a copy of the demixing matrix to correctly handle the non-seekable case). I also think op_generic_decoder_init() should use op_use_projection() to decide which kind of decoder to create, to make sure it does not get out of sync with the rest of the code. At that point we're passing in enough pieces of an OggOpusLink that we should probably just pass the whole thing. I think we can also get rid of some #ifdefs if we always define op_use_projection(), but make it return 0 without OPUS_HAVE_OPUS_PROJECTION_H. Currently, it also looks like you are re-using the context variable decode_cb_ctx to save the user context for both the decode and projection_decode callbacks. So if you register both, the one you get passed depends on the order you register the callbacks, and gets passed to both callbacks. That behavior isn't documented in the API docs, and is a little confusing. I think the context used for each callback should be saved independently. There are a number of other minor nits about making the formatting consistent with the rest of the codebase and the use of consistent names for things (i.e., opus_projection_decode vs. op_decode_projection_cb_func, mapping_family vs. channel_mapping, od vs. st, etc.). Since some of the above points can be tricky (especially lifetime management for the dynamically allocated demixing matrices), enumerating all of the formatting nits is laborious, and because I wanted to keep things moving, I went ahead and addressed all of the above comments in a separate patch (attached). Please review, and if it looks okay, I'll commit a version squashed with your patch. For the purposes of the open question above, for users of the old opus_parse_head() API I'm currently returning success when we have a projection matrix, but I'm strongly leaning towards returning OP_EIMPL instead. Let me know what you think about raising the channel limit, also. -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-Review-comments-for-ambisonics-support.patch Type: text/x-patch Size: 51777 bytes Desc: not available URL: <http://lists.xiph.org/pipermail/opus/attachments/20180525/b8ef08f4/attachment-0001.bin>
Drew Allen
2018-Jun-04 16:08 UTC
[opus] [PATCH] opus-tools/opusfile: Support for Ambisonics
Hey Tim, Thank you for taking the time to revise and improve my patch. I agree to all the changes you've made here and if it suits you, I'm happy to sign off for submitting the patch to opusfile. Cheers, Drew On Fri, May 25, 2018 at 4:43 PM Timothy B. Terriberry <tterribe at xiph.org> wrote:> Drew Allen wrote: > > Friendly ping for supporting ambisonics in opus-tools & opusfile > > > > Please LMK any ?s you might have. > > Sorry for the long delay on the review. The high-level design looks a > lot better. I have a number of comments on the implementation details. > > I didn't really like the way that opus_head_projection_parse() turned > out. I was originally thinking something more generic/future proof that > could also be used for any other channel mapping families that want to > define a custom channel mapping table. The way this is implemented in > the patch you sent, it duplicates a lot of code from opus_head_parse(), > the user still needs to call opus_head_parse() (so the work gets > duplicated at runtime, too), and they have to pass in a buffer of > exactly the right size (which means they need to know some details of > the specific mapping family and how the demixing matrix works). > > I think a slightly better approach is along the lines of the > opus_head_parse_ext() I originally suggested, but just return a pointer > into the user-supplied packet instead of copying the data. For the > simple case where the user is just going to pass the data immediately to > opus_projection_decoder_create(), they don't need a separate buffer at > all, and the parameters to return the pointer and the matrix size can be > NULLable to make them optional, so we can implement opus_head_parse() > simply by calling opus_head_parse_ext() and passing NULL for them > (eliminating all of the duplicate code and work). > > Open question: Should opus_head_parse_ext() with NULL for the matrix > and/or matrix size parameters succeed, or return an error (OP_EIMPL?) to > let the application know that there is missing data they're not getting > by using the old API (i.e., so they know they won't be able to decode > that stream correctly)? If we do continue to return success, we should > probably fill in the existing mapping[] array in OpusHead with something > meaningful (e.g., 255 for all silence). Right now it is just copying a > number of bytes out of the demixing matrix equal to the channel count, > which doesn't make much sense. > > There are a few errors in the opus_head_projection_parse() > implementation. For mapping family 253 you don't validate that the > packet has at least 21 bytes before reading bytes 19 and 20. Although > you have bumped up OP_NCHANNELS_MAX to 18, > op_validate_ambisonic_channels() will actually succeed for channel > counts much larger than that. I didn't see anything that would then > prevent us from trying to decode such streams (and overflowing the > statically-sized buffers based on OP_NCHANNELS_MAX). I think this should > return OP_EIMPL like we do for mapping family 255. Finally, I don't want > to use sqrt() and introduce a libm dependency for the fixed-point > builds, so we should probably extract the ambisonics order from the > channel count a different way. > > The next issue is that the demixing matrix cannot be stored directly in > the OggOpusFile struct. In the case of a chained, seekable stream with > multiple ambisonics projection links, we will parse all of the headers > up front, and the matrix that gets stored will simply be the last one > parsed, which will overwrite all of the others (and if they have > different channel/stream counts, we could potentially use data from a > mix of matrices of different sizes). The proper place to put this data > is in the OggOpusLink structure, since there is one of these per link in > a chained file. Since files can have a nearly unlimited number of links, > and this matrix can be somewhat large, it probably makes sense to > dynamically allocate it instead of using a fixed-size buffer, especially > since most files probably won't be using ambisonics to begin with. > > Once we have dynamic allocation for the demixing matrix, I wonder if we > shouldn't just bump OP_NCHANNELS_MAX up to 227 now. Otherwise whenever > someone does want to produce some of these higher-order ambisonics > streams, everyone will have to upgrade libopusfile to be able to play them. > > I also don't think you converted the check to see if we can re-use an > existing decoder correctly. At a minimum it should ensure we want to > create the same type of decoder and in the projection case, either > verify the demixing matrices match or not attempt to re-use projection > decoders (since this would require saving a copy of the demixing matrix > to correctly handle the non-seekable case). I also think > op_generic_decoder_init() should use op_use_projection() to decide which > kind of decoder to create, to make sure it does not get out of sync with > the rest of the code. At that point we're passing in enough pieces of an > OggOpusLink that we should probably just pass the whole thing. > > I think we can also get rid of some #ifdefs if we always define > op_use_projection(), but make it return 0 without > OPUS_HAVE_OPUS_PROJECTION_H. > > Currently, it also looks like you are re-using the context variable > decode_cb_ctx to save the user context for both the decode and > projection_decode callbacks. So if you register both, the one you get > passed depends on the order you register the callbacks, and gets passed > to both callbacks. That behavior isn't documented in the API docs, and > is a little confusing. I think the context used for each callback should > be saved independently. > > There are a number of other minor nits about making the formatting > consistent with the rest of the codebase and the use of consistent names > for things (i.e., opus_projection_decode vs. > op_decode_projection_cb_func, mapping_family vs. channel_mapping, od vs. > st, etc.). > > Since some of the above points can be tricky (especially lifetime > management for the dynamically allocated demixing matrices), enumerating > all of the formatting nits is laborious, and because I wanted to keep > things moving, I went ahead and addressed all of the above comments in a > separate patch (attached). Please review, and if it looks okay, I'll > commit a version squashed with your patch. > > For the purposes of the open question above, for users of the old > opus_parse_head() API I'm currently returning success when we have a > projection matrix, but I'm strongly leaning towards returning OP_EIMPL > instead. Let me know what you think about raising the channel limit, also. >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20180604/4962cd20/attachment.html>