> True, but there is one critical place where it's necessary to mix at least
> two streams--when someone's trying to break into a stream. If speaker A
> goes on and on and speaker B (or C, D, E, F...) wants to interject or
> interrupt, how do they do it in-band without mixing?

It doesn't have to be done that way. You can simply have the server echo
the voice streams back to the various clients and leave the job of mixing
the multiple streams to the sound device (e.g. DirectSound or the sound
hardware). In most conversations, when one person starts speaking the
others tend to stop; otherwise you cannot understand either person.

> The 'obvious' solution seems to be to run N processes to detect 'speech'
> or important audio content on the incoming N streams. Pick one or two
> that need output, then mix and recode them.

Again, not recommended: decoding, mixing and re-encoding at the server,
only to decode again at the client, has a major impact on the total latency
of the voice stream.

> If the detection is done in the client, then the server's job is much
> simpler--arbitrate, mix, and encode. Since the overlap periods of the
> mixing are going to be infrequent and discontinuous, you don't have to be
> sample exact--no stream synchronization required.

Additionally, you should never upstream voice from clients that aren't
transmitting. You write code to detect transmission.
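Such a transmission check can be as simple as an energy gate applied to each
captured frame before it is encoded. A minimal sketch in C (the threshold
value and the function name are only illustrative):

#include <stddef.h>
#include <stdint.h>

/* Return nonzero if this frame is loud enough to be worth sending.
 * frame: 16-bit PCM samples, e.g. one 20 ms frame (160 samples at 8 kHz).
 * threshold: average energy per sample below which the frame is treated
 * as silence. */
static int frame_is_speech(const int16_t *frame, size_t n, double threshold)
{
    double energy = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        energy += (double)frame[i] * (double)frame[i];
    return (energy / (double)n) > threshold;
}

The client would then only encode and upstream a frame when frame_is_speech()
returns true (plus a short hang-over so word endings aren't clipped). Speex's
own VAD/DTX support is another option for the same job.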
> I tend to disagree. In normal human conversation it wouldn't make much
> sense to have 2 people talking over each other at the same time. Thus,
> in most scenarios you would have only one talker anyway. Additionally,
> encode->decode/mix/encode->decode isn't a very efficient CPU process for
> a server, it's complicated to keep timing correct and it has a negative
> impact on total latency.

True, but there is one critical place where it's necessary to mix at least
two streams--when someone's trying to break into a stream. If speaker A
goes on and on and speaker B (or C, D, E, F...) wants to interject or
interrupt, how do they do it in-band without mixing?

> The overhead required to mix, merge and re-encode is usually not worth
> the benefit as in most situations you are not really saving any
> bandwidth.

But the options are *don't transcode* and *always transcode*. Switching
between them is difficult to do on the fly.

The 'obvious' solution seems to be to run N processes to detect 'speech'
or important audio content on the incoming N streams. Pick one or two that
need output, then mix and recode them. If the detection is done in the
client, then the server's job is much simpler--arbitrate, mix, and encode.
Since the overlap periods of the mixing are going to be infrequent and
discontinuous, you don't have to be sample exact--no stream synchronization
required.

So, I'd say any machine that can decode two streams, encode one stream,
and do a little overhead should be able to act as a server. Hey, the client
has to be able to encode Speex in real time anyway, why waste that effort?

Cheers,
David
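The mix step itself is cheap once the overlapping streams have been decoded
to 16-bit PCM. A rough sketch in C of a saturating per-sample add (so a loud
overlap clips instead of wrapping around; names are only illustrative):

#include <stddef.h>
#include <stdint.h>

/* Mix two decoded PCM frames sample by sample into 'out'.
 * All buffers hold n 16-bit samples; 'out' may alias 'a' or 'b'. */
static void mix_frames(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        int32_t s = (int32_t)a[i] + (int32_t)b[i];
        if (s > 32767)  s = 32767;     /* saturate instead of wrapping */
        if (s < -32768) s = -32768;
        out[i] = (int16_t)s;
    }
}

The mixed frame is then re-encoded once and sent to everyone, which is
exactly the decode + mix + re-encode cost being weighed in this thread.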
Hi Allen,

> > True, but there is one critical place where it's necessary to mix at
> > least two streams--when someone's trying to break into a stream. If
> > speaker A goes on and on and speaker B (or C, D, E, F...) wants to
> > interject or interrupt, how do they do it in-band without mixing?
>
> It doesn't have to be done that way. You can simply have the server
> echo the voice streams back to the various clients and leave the job of
> mixing the multiple streams to the sound device (e.g. DirectSound or
> the sound hardware).

Yes, but the normal client doesn't have the bandwidth for that. Audio
should cost nearly no bandwidth, because the video stream must also go
over the same connection. So audio should perhaps not be more than
8 kbit/s. I'm not sure whether this is possible, but it is a very
important issue.

> > The 'obvious' solution seems to be to run N processes to detect 'speech'
> > or important audio content on the incoming N streams. Pick one or two
> > that need output, then mix and recode them.
>
> Again, not recommended: decoding, mixing and re-encoding at the server,
> only to decode again at the client, has a major impact on the total
> latency of the voice stream.

You can zip it ;-)).

> Additionally, you should never upstream voice from clients that aren't
> transmitting. You write code to detect transmission.

Sure. So the server can detect if mixing is necessary.

Best Regards,

Carsten
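Speex's narrowband mode gets close to that figure. As a point of reference,
a bare-bones encoder setup might look like the sketch below; quality 4
should land in the neighbourhood of 8 kbit/s for narrowband, though the
exact rate depends on the settings and library version, so check it.

#include <speex/speex.h>
#include <stdio.h>
#include <string.h>

#define MAX_PACKET 200

int main(void)
{
    SpeexBits bits;
    void *enc;
    int quality = 4;           /* aim for roughly 8 kbit/s narrowband */
    int frame_size;            /* 160 samples (20 ms at 8 kHz) for narrowband */
    spx_int16_t pcm[160];      /* one frame of 16-bit mono PCM */
    char packet[MAX_PACKET];
    int nbytes;

    speex_bits_init(&bits);
    enc = speex_encoder_init(&speex_nb_mode);
    speex_encoder_ctl(enc, SPEEX_SET_QUALITY, &quality);
    speex_encoder_ctl(enc, SPEEX_GET_FRAME_SIZE, &frame_size);

    memset(pcm, 0, sizeof(pcm));        /* stand-in for one captured frame */

    /* Per frame: reset the bit buffer, encode, pull out the packet bytes. */
    speex_bits_reset(&bits);
    speex_encode_int(enc, pcm, &bits);
    nbytes = speex_bits_write(&bits, packet, MAX_PACKET);
    printf("encoded %d samples into %d bytes\n", frame_size, nbytes);

    speex_encoder_destroy(enc);
    speex_bits_destroy(&bits);
    return 0;
}

At 50 frames per second, 20 bytes of payload per frame is 8 kbit/s of codec
data; IP/UDP/RTP headers come on top of that, which is what really hurts at
these bitrates.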
There's no perfect solution to the multiple-client problem. Each approach
has advantages and drawbacks:

1) Mixing at the server
   - Allows a constant bandwidth for every client
   - Allows compatibility with regular VoIP phones
   - Requires transcoding, even when only one person is talking
   - Higher bit-rate required for the general case (one speaker is talking)

2) Sending multiple streams
   - Possible to do without a server at all
   - Best quality (no transcoding)
   - Non-constant bandwidth

	Jean-Marc

--
Jean-Marc Valin, M.Sc.A., ing. jr.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada
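To put rough numbers on the bandwidth trade-off (assuming 8 kbit/s per Speex
stream and N participants): with mixing at the server each client receives a
single stream, about 8 kbit/s downstream no matter how many people talk at
once; with stream relaying a client can receive up to N-1 streams, so five
participants all talking at the same time would mean roughly 32 kbit/s
downstream, but close to zero whenever nobody else is speaking.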
Hi David, hi all,

> True, but there is one critical place where it's necessary to mix at least
> two streams--when someone's trying to break into a stream. If speaker A
> goes on and on and speaker B (or C, D, E, F...) wants to interject or
> interrupt, how do they do it in-band without mixing?

Exactly. I should have read this before my last reply ;-)).

> But the options are *don't transcode* and *always transcode*. Switching
> between them is difficult to do on the fly.

The server should be smart enough to check whether a packet can be
distributed without merging or not.

> So, I'd say any machine that can decode two streams, encode one stream,
> and do a little overhead should be able to act as a server.

Exactly.

Best Regards,

Carsten
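A skeleton of that per-tick decision on the server might look like the
sketch below, assuming clients already suppress silent frames so "active"
simply means "a packet arrived this tick". The packet type and the helper
functions are made-up names for illustration (mix_frames is the saturating
add sketched earlier in the thread), not an existing API.

#include <stddef.h>
#include <stdint.h>

#define FRAME_SIZE 160            /* one 20 ms narrowband frame at 8 kHz */

/* Placeholders for the real server's packet type and codec plumbing. */
typedef struct Packet Packet;
void    broadcast(const Packet *p);
void    decode_to_pcm(const Packet *p, int16_t *pcm);
Packet *encode_pcm(const int16_t *pcm);
void    mix_frames(const int16_t *a, const int16_t *b, int16_t *out, size_t n);

/* Called once per 20 ms tick with the packets that arrived from
 * currently transmitting clients. */
void distribute_tick(Packet *incoming[], int n_active)
{
    if (n_active == 0)
        return;                       /* nobody talking: send nothing */

    if (n_active == 1) {
        broadcast(incoming[0]);       /* common case: forward as-is, no transcoding */
        return;
    }

    /* Rare overlap: decode the concurrent streams, mix, re-encode once. */
    int16_t mixed[FRAME_SIZE] = {0};
    int16_t pcm[FRAME_SIZE];
    for (int i = 0; i < n_active; i++) {
        decode_to_pcm(incoming[i], pcm);
        mix_frames(mixed, pcm, mixed, FRAME_SIZE);
    }
    broadcast(encode_pcm(mixed));
}

This keeps the expensive decode/mix/re-encode path on the overlap case only,
which the thread agrees should be infrequent.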