Thanks for the info!

Then I have a question: what kind of mixing algorithm can be used to mix
three different channels together on an embedded system?

I've used this one, but it's not THAT good: (chan1 + chan2 + chan3) / 3

The output signal may peak or be buggy sometimes.

For your information, I'm using an ARM M4F with Opus configured like this
(40 ms, 16 kHz, 16 bitrate, 0 compres); a sketch of this setup follows
below the quoted text.

Kind regards,

Andrew

On 06/04/2022 at 08:36, Sampo Syreeni wrote:
> On 2022-04-06, Ulrich Windl wrote:
>
>> Incidentally I came across a Dolby Atmos demo yesterday that had about
>> 118 channels of 24-bit audio at 48 kHz, all in one huge WAV file.
>
> Is that even a legitimate encoding?! What the fuck.
>
>> When I tried to play that (in plain stereo) with Audacity, even my
>> fast computer (i7 at 4 GHz) had dropouts. So I can imagine that
>> decoding a large number of channels and mixing them seems to be a
>> bad idea.
>
> It is. Which is why my favourite, ambisonics, exists (sales pitch): it's
> a principled and, entropically speaking, nigh optimal way to fold a
> static central soundfield down to a number of channels. Third order, so
> sixteen channels, seems to be up to the task for *any* central isotropic
> soundfield at all, and the system yields to static optimization.
>
> I cannot for the life of me understand why Atmos exists, except for
> patent law or something like that. If it were used to express a live
> gaming or augmented-reality setup, with arbitrary auditory parallax, I
> could see the point. But that's not what Atmos or even Dolby AC-4 are
> about. They just encode a static scene -- in a way *much* more
> complicated and heavier on the processor than a "simple" third-degree
> periphonic ambisonic HOA signal set would be, and in a manner not
> amenable to low-resource optimizations in surround sound. The
> object-based encoding simply seems stupid and superfluous.
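A minimal sketch of an Opus encoder set up along those lines, assuming
"16 bitrate" means 16 kbit/s and "0 compres" means complexity 0 (both
readings are my assumptions, as are the mono channel count and the VOIP
application type); a 40 ms frame at 16 kHz is 640 samples:

#include <opus.h>
#include <stdio.h>

#define SAMPLE_RATE 16000
#define FRAME_SIZE  (SAMPLE_RATE * 40 / 1000)   /* 40 ms -> 640 samples */

int main(void)
{
    int err;
    /* Mono voice encoder; the application choice is an assumption. */
    OpusEncoder *enc = opus_encoder_create(SAMPLE_RATE, 1,
                                           OPUS_APPLICATION_VOIP, &err);
    if (err != OPUS_OK) return 1;

    opus_encoder_ctl(enc, OPUS_SET_BITRATE(16000));  /* assuming 16 kbit/s */
    opus_encoder_ctl(enc, OPUS_SET_COMPLEXITY(0));   /* assuming complexity 0 */

    opus_int16 pcm[FRAME_SIZE] = {0};                /* one 40 ms frame of silence */
    unsigned char packet[256];
    opus_int32 len = opus_encode(enc, pcm, FRAME_SIZE, packet, sizeof(packet));
    printf("encoded %d bytes\n", (int)len);

    opus_encoder_destroy(enc);
    return 0;
}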
>>> Andrew Sonzogni <andrew at safehear.fr> wrote on 06.04.2022 at 09:47 in
>>> message <f7a34d74-0026-1d71-6823-62454988aa33 at safehear.fr>:
> Thanks for the info!
>
> Then I have a question: what kind of mixing algorithm can be used to mix
> three different channels together on an embedded system?
>
> I've used this one, but it's not THAT good: (chan1 + chan2 + chan3) / 3

Obviously if you have an overflow first, then dividing by three won't
help, I guess.

chan1/3 + chan2/3 + chan3/3 ?

> The output signal may peak or be buggy sometimes.
>
> For your information, I'm using an ARM M4F with Opus configured like
> this (40 ms, 16 kHz, 16 bitrate, 0 compres).

Regards,
Ulrich

> [...]
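A minimal sketch of that per-channel pre-scaling in C, assuming three
mono frames of 16-bit PCM of equal length (the function and variable
names are purely illustrative, not from any library): dividing each
channel before summing keeps every intermediate in range, and widening
the sum to 32 bits gives the same protection with less rounding loss.

#include <stdint.h>
#include <stddef.h>

/* Per-channel pre-scaling, as suggested above: each term stays within
 * the int16_t range, so the sum of three thirds cannot overflow. */
static void mix3_prescale(const int16_t *c1, const int16_t *c2,
                          const int16_t *c3, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)(c1[i] / 3 + c2[i] / 3 + c3[i] / 3);
}

/* Equivalent protection with less rounding error: widen to 32 bits,
 * sum, then divide once. The result always fits back into int16_t. */
static void mix3_wide(const int16_t *c1, const int16_t *c2,
                      const int16_t *c3, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t acc = (int32_t)c1[i] + c2[i] + c3[i];
        out[i] = (int16_t)(acc / 3);
    }
}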
On 2022-04-06, Andrew Sonzogni wrote:

> I've used this one, but it's not THAT good: (chan1 + chan2 + chan3) / 3
>
> The output signal may peak or be buggy sometimes.

That is an age-old question. In general, if you assume the channels are
statistically independent, and you want each channel to contribute equally
and maximally, you scale each down by the square root of the number of
channels. That solution mostly goes through mathematically under the
central limit theorem, and exactly so if each of the sources is Gaussian
distributed. In general you can assume that with sampled real-life
signals, because of reverberation and another application of the central
limit theorem, but in the case of "digital" signals, and in particular if
you sum many channels which are coherent (contain more or less the same
stuff), you'll fall off the ladder.

Personally I'd advocate a pure sum with no scaling. But that sort of thing
is kind of opinionated: it only really works if you have dynamics to spare
and also an absolute scaling of the incoming waveforms over your whole
signal chain. That is, each of the source waveforms at digital full scale
is supposed to represent an absolute amplitude of, say, 96 dB(Z). When you
do this sort of gain architecture, anything coming in goes out just as it
was, unless modified. If it comes in at 24-bit full scale, it'll literally
break not just your ears but also your windows. So you'll never actually
put in anything near full scale. You *might* put in combinations of sounds
which then -- if incoherent with each other -- will usually scale by the
square-root law. Which is fine, as long as you keep your output full scale
at something which will break stuff. Say, do 24 bits and set the digital
full scale at something like 130 dB SPL in your monitors.

This, by the way, is how it was and is done in cinematic audio editing.
The full dynamics shatter your bones and ass, in absolute SPL range.
Anything lesser is just added linearly into the mix. The added bonus of
this setup is that you actually get to use amplitude as an extra variable
which isn't normalized away by someone turning a knob or putting an extra
compressor in the signal pathway. You can actually compose with volume,
instead of assuming everybody takes your dynamics away at will. (Obviously
in many cases they will do just that, but trust you me, composing in
absolute amplitude leaves *so* much more dynamics and nuance for the rest
of them idjits to churn up.)

> For your information, I'm using an ARM M4F with Opus configured like
> this (40 ms, 16 kHz, 16 bitrate, 0 compres).

Just unpack and sum. You really, *really* don't want to see what an
optimized algorithm for something like this would look like. It'd take a
cross-compile into a fully functional language, some heady CS-minded
meta-coding concepts, and a week of computation to find the precise
weaving-in of the functional forms. At best. It's nowhere *near* the best
use of your time, especially since you can already optimize the simple
decode-sum-recode(?) cycle.

As an example of the latter kind of optimization, if you have a number of
unsynchronized Opus streams decoding, you can rather easily land the
output in circular buffers of double the block size. The code for updating
such a buffer is classic, as is reading from it with arbitrary delay up to
half the buffer size. The code to SIMD-vectorize arbitrary byte-spanning
reads from such a buffer is also classic, and can be found even in the
better implementations of memcpy(). (A sketch of such a buffer follows
below.)
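A minimal sketch of such a per-stream ring buffer in C, assuming 16 kHz
mono streams with 40 ms frames (640 samples) and a power-of-two buffer of
more than twice that, so writes and delayed reads wrap with a simple mask;
the structure, names, and sizes are illustrative only:

#include <opus.h>
#include <stdint.h>

#define FRAME_SIZE 640          /* 40 ms at 16 kHz */
#define RING_SIZE  2048         /* power of two, > 2 * FRAME_SIZE */
#define RING_MASK  (RING_SIZE - 1)

typedef struct {
    OpusDecoder *dec;
    opus_int16   ring[RING_SIZE];
    uint32_t     wr;            /* absolute write index, wrapped on access */
} stream_t;

/* Decode one packet into the stream's ring buffer. A scratch frame keeps
 * the example simple; a real version could decode into the ring in two
 * spans and avoid the copy. */
static int stream_push(stream_t *s, const unsigned char *pkt, int len)
{
    opus_int16 frame[FRAME_SIZE];
    int n = opus_decode(s->dec, pkt, len, frame, FRAME_SIZE, 0);
    for (int i = 0; i < n; i++)
        s->ring[(s->wr + i) & RING_MASK] = frame[i];
    if (n > 0)
        s->wr += (uint32_t)n;
    return n;
}

/* Read sample i of a block that starts `delay` samples behind the write
 * head; valid for delays up to half the buffer, as described above. */
static inline opus_int16 stream_read(const stream_t *s, uint32_t delay,
                                     uint32_t i)
{
    return s->ring[(s->wr - delay + i) & RING_MASK];
}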
The only thing you need to do then is to weave your code together so that
it more or less advances a word/cacheline/whatever-unit at a time in each
buffer, linearly, and sums over the inputs synchronously, without unduly
polluting your CPU's data cache. It's nasty even then, but highly doable,
and that's about as fast as you can go with any library imaginable (any
deeper algorithm would require recoding the Opus library in a functional
language, and weaving its internals together with your buffering algo).
--
Sampo Syreeni, aka decoy - decoy at iki.fi, http://decoy.iki.fi/front
+358-40-3751464, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
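A minimal sketch of that sum-over-inputs step, assuming a handful of
already-decoded 16-bit streams walked linearly one block at a time (names
are illustrative); it accumulates in 32 bits and saturates once at the
output, which is the "pure sum" with clipping only at the final stage:

#include <stdint.h>

/* Saturate a 32-bit accumulator to the int16_t range. */
static inline int16_t sat16(int32_t v)
{
    if (v >  32767) return  32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Sum one block of samples from `nstreams` decoded buffers. `in[s]`
 * points at stream s's samples for this block; each buffer is walked
 * linearly, so the access pattern stays cache-friendly. */
static void mix_block(const int16_t *const *in, int nstreams,
                      int16_t *out, int nsamples)
{
    for (int i = 0; i < nsamples; i++) {
        int32_t acc = 0;
        for (int s = 0; s < nstreams; s++)
            acc += in[s][i];        /* pure sum, no per-channel scaling */
        out[i] = sat16(acc);        /* clip only at the output stage */
    }
}

On a Cortex-M4F the saturation could also use the core's SSAT instruction
(for instance via the CMSIS __SSAT intrinsic), if that ever shows up as a
hotspot; the plain C above is only meant to show the shape of the loop.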