I am interested in encoding a single Opus stream using several CPU cores.

I have a raw audio input, and "opusenc" can transcode it at 1200% speed (12x real time) on a Raspberry Pi 3B+. It saturates a single CPU core, but the other three are idle.

Is there any project out there to add multithreading options to "opusenc", or something along those lines?

Looking around, I have found this:

https://github.com/enzo1982/superfast#superfast-codecs
https://hydrogenaud.io/index.php?topic=114598.0
<https://github.com/enzo1982/superfast/blob/master/doc/SuperFast%20Codecs.pdf>

Is there any other multithreaded drop-in replacement for "opusenc"? Are there any plans for future "opusenc" improvements in this area?

Thanks.

--
Jesús Cea Avión
jcea at jcea.es - https://www.jcea.es/
I'm not aware of any other attempts, and there have never been official plans.

It's difficult to partition input for Opus at anything other than the track level, because of the way the decoder derives its adaptive state from recently seen audio. I guess cutting together streams with at least an 80 ms overlap wouldn't glitch too much?

You could probably try different encoding options for each block in parallel to use more cores, but it might be hard to get good scaling.

Most people just encode multiple streams simultaneously if they want better throughput.

FWIW,
 -r

On 2020-03-29 5:47 p.m., Jesus Cea wrote:
> I am interested in being able to encode a single Opus stream using
> several CPU cores.
> [...]
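The overlap idea above can be sketched as a simple chunk plan. This is only an illustration under stated assumptions: 48 kHz input, 20 ms frames, and a hypothetical helper name `plan_chunks`; nothing like it exists in opusenc, and it does not address how the encoded pieces would actually be stitched.

```python
def plan_chunks(total_samples, n_chunks, overlap_ms=80, rate=48000, frame_ms=20):
    """Return one (encode_start, keep_start, end) tuple of sample offsets per chunk.

    Each chunk is encoded from encode_start, but only the audio from
    keep_start to end would be kept when stitching (hypothetical scheme).
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    frame = rate * frame_ms // 1000        # samples per frame (960 at 48 kHz)
    overlap = rate * overlap_ms // 1000    # pre-roll in samples (3840 for 80 ms)
    # Round the chunk size down to a whole number of frames so that
    # chunk boundaries coincide with packet boundaries.
    base = (total_samples // n_chunks) // frame * frame
    if base == 0:
        raise ValueError("input too short for this many chunks")
    plan = []
    for i in range(n_chunks):
        keep_start = i * base
        # The last chunk absorbs whatever is left over after rounding.
        end = total_samples if i == n_chunks - 1 else (i + 1) * base
        encode_start = max(0, keep_start - overlap)
        plan.append((encode_start, keep_start, end))
    return plan


# Ten seconds of 48 kHz audio split four ways.
for encode_start, keep_start, end in plan_chunks(480000, 4):
    print(encode_start, keep_start, end)
```

The first chunk has no pre-roll because there is no earlier audio to warm up on; every later chunk starts 3840 samples (80 ms) before the audio it is meant to contribute.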
On 30/3/20 23:17, Ralph Giles wrote:
> I'm not aware of any other attempts, and there have never been official
> plans. It's difficult to partition input for opus at anything other than
> the track level, because of the way the decoder derives its adaptive
> state from recently-seen audio. I guess cutting together streams with at
> least an 80ms overlap wouldn't glitch too much?

According to the Opus standard, the decoder converges after 80 ms. That is, only the previous 80 ms of audio would be needed to get a perfectly merged stream. You could play it safe and use, let's say, a 200 ms overlap. See, for example, https://wiki.xiph.org/OggOpus .

Efficiency is quite nice (taking the 80 ms figure):

  1-second fragments: 92%
  10-second fragments: 99.2%

> Most people just encode multiple streams simultaneously if they want
> better throughput.

I agree, and that would be my usual approach. But in this particular case I only have one stream, and it needs to be processed as fast as possible. Seeing three idle CPU cores hurts my engineer brain :).

--
Jesús Cea Avión
jcea at jcea.es - https://www.jcea.es/
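The efficiency figures above are consistent with a simple ratio: the share of each fragment that is not spent on overlap. A small sketch of that arithmetic, assuming the fragment length includes the overlap (the function name `overlap_efficiency` is made up for illustration):

```python
def overlap_efficiency(fragment_s, overlap_s=0.080):
    """Fraction of an encoded fragment that is useful (non-overlap) audio."""
    if not 0 <= overlap_s < fragment_s:
        raise ValueError("overlap must be non-negative and shorter than the fragment")
    return 1.0 - overlap_s / fragment_s


for fragment_s in (1.0, 10.0):
    for overlap_s in (0.080, 0.200):
        eff = overlap_efficiency(fragment_s, overlap_s)
        print(f"{fragment_s:>4.0f} s fragment, {overlap_s * 1000:.0f} ms overlap: {eff:.1%}")
```

With the more conservative 200 ms overlap the same formula gives 80% for 1-second fragments and 98% for 10-second fragments, so longer fragments matter more than the exact overlap.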
I can't think of any good way to multithread the encoder without ruining efficiency, because so much efficiency is gained from carrying state over from one frame to the next, from the good old MP2 days all the way up to now. Running at 12x even on an RPi also makes it very questionable whether to spend *any* engineering resources in that direction. On desktop systems it's closer to 100x for single-threaded stereo CELT encoding.

You could easily move decoding/SRC and encapsulation onto their own threads, but I doubt that would gain you more than 10% or so, since those are normally very fast operations.

Splitting the encoder stages into individual threads seems like it would be promising, even if the referenced PDF abandoned the idea due to implementation effort, but you rapidly run into the problem of memory contention: so much time is wasted moving data between cores, caches, and main memory. While real throughput goes up slightly, electrical efficiency goes way down.

If fidelity is less important than raw speed, then just chopping the stream into X chunks and encoding each on its own thread will work; the shorter the chunks, the more wasted data, but at least it generally works. As jm says, there may be compatibility issues. You might be able to get around that with a combination of FEC packets and marking other packets as lost, instead of stitching, but I'm not sure whether relying on FEC would be more or less compatible than relying on stitching.

As for the direct question: no, there's no drop-in replacement available. Opusenc could use a general cleanup, so if someone threw some basic pthreads into that effort it might well be accepted.

Em

On Sun, Mar 29, 2020 at 5:57 PM Jesus Cea <jcea at jcea.es> wrote:
> I am interested in being able to encode a single Opus stream using
> several CPU cores.
>
> I get a raw audio input and "opusenc" can transcode it at 1200% speed
> (Raspberry PI 3B+). It saturates a single CPU core, but the other three
> are idle.
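The "chop the stream into X chunks and encode each on its own thread" approach from the reply above can be sketched roughly as follows. Everything here is an assumption for illustration: `encode_chunk` is a hypothetical stand-in that does no real Opus encoding, and the 3840-byte frame corresponds to 20 ms of 48 kHz stereo 16-bit PCM. In Python, a real version would only scale across cores if the encoder ran outside the interpreter lock, for example as one subprocess per chunk.

```python
from concurrent.futures import ThreadPoolExecutor


def split_even(pcm, n_chunks, frame_bytes):
    """Split PCM bytes into at most n_chunks pieces cut on whole-frame boundaries."""
    frames = len(pcm) // frame_bytes
    if frames == 0:
        return [pcm]
    per_chunk = -(-frames // n_chunks)     # ceiling division
    step = per_chunk * frame_bytes
    # The last slice also picks up any trailing partial frame.
    return [pcm[i:i + step] for i in range(0, len(pcm), step)]


def encode_chunk(pcm):
    """Hypothetical stand-in for a real per-chunk Opus encode."""
    return bytes(len(pcm) // 10)           # placeholder output, not Opus data


def encode_parallel(pcm, n_chunks=4, frame_bytes=3840):
    """Encode each chunk on its own worker thread, preserving chunk order."""
    chunks = split_even(pcm, n_chunks, frame_bytes)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # Executor.map returns results in input order, so the encoded
        # pieces come back ready to be joined in sequence.
        return list(pool.map(encode_chunk, chunks))


if __name__ == "__main__":
    silence = bytes(3840 * 10)             # ten 20 ms frames of silence
    print([len(piece) for piece in encode_parallel(silence)])
```

This sketch encodes the chunks independently with no overlap, so it shows only the work distribution; joining the results into one valid stream is the hard part discussed in the thread.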