> True, but there is one critical place where it's necessary to mix at least
> two streams--when someone's trying to break into a stream. If speaker A
> goes on and on and speaker B (or C, D, E, F...) wants to interject or
> interrupt, how do they do it in-band without mixing?

It doesn't have to be done that way. You can simply have the server echo
the voice streams back to the various clients and leave the job of mixing
the multiple streams to the sound device (e.g. DirectSound or the sound
hardware). In most conversations, when one person starts speaking the
others tend to stop; otherwise you cannot understand either person.

> The 'obvious' solution seems to be to run N processes to detect 'speech'
> or important audio content on the incoming N streams. Pick one or two
> that need output, then mix and recode them.

Again, not recommended: decoding, mixing and re-encoding at the server,
only to decode again at the client, has a major impact on the total latency
of the voice stream.

> If the detection is done in the client, then the server's job is much
> simpler--arbitrate, mix, and encode. Since the overlap periods of the
> mixing are going to be infrequent and discontinuous, you don't have to be
> sample exact--no stream synchronization required.

Additionally, you should never upstream voice from clients that aren't
transmitting. You write code to detect transmission.
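Such a transmission check can be as simple as an energy gate applied to each
captured frame before it is encoded. A minimal sketch in C (the threshold
value and the function name are only illustrative):

#include <stddef.h>
#include <stdint.h>

/* Return nonzero if this frame is loud enough to be worth sending.
 * frame: 16-bit PCM samples, e.g. one 20 ms frame (160 samples at 8 kHz).
 * threshold: average energy per sample below which the frame is treated
 * as silence. */
static int frame_is_speech(const int16_t *frame, size_t n, double threshold)
{
    double energy = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        energy += (double)frame[i] * (double)frame[i];
    return (energy / (double)n) > threshold;
}

The client would then only encode and upstream a frame when frame_is_speech()
returns true (plus a short hang-over so word endings aren't clipped). Speex's
own VAD/DTX support is another option for the same job.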
> I tend to disagree. In normal human conversation it wouldn't make much
> sense to have 2 people talking over each other at the same time. Thus,
> in most scenarios you would have only one talker anyway. Additionally,
> encode->decode/mix/encode->decode isn't a very efficient CPU process for
> a server, it's complicated to keep timing correct and it has a negative
> impact on total latency.

True, but there is one critical place where it's necessary to mix at least
two streams--when someone's trying to break into a stream. If speaker A
goes on and on and speaker B (or C, D, E, F...) wants to interject or
interrupt, how do they do it in-band without mixing?

> The overhead required to mix, merge and re-encode is usually not worth
> the benefit as in most situations you are not really saving any
> bandwidth.

But the options are *don't transcode* and *always transcode*. Switching
between them is difficult to do on the fly.

The 'obvious' solution seems to be to run N processes to detect 'speech'
or important audio content on the incoming N streams. Pick one or two that
need output, then mix and recode them. If the detection is done in the
client, then the server's job is much simpler--arbitrate, mix, and encode.
Since the overlap periods of the mixing are going to be infrequent and
discontinuous, you don't have to be sample exact--no stream synchronization
required.

So, I'd say any machine that can decode two streams, encode one stream,
and do a little overhead should be able to act as a server. Hey, the client
has to be able to encode Speex in real time anyway, why waste that effort?

Cheers,
David
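The mix step itself is cheap once the overlapping streams have been decoded
to 16-bit PCM. A rough sketch in C of a saturating per-sample add (so a loud
overlap clips instead of wrapping around; names are only illustrative):

#include <stddef.h>
#include <stdint.h>

/* Mix two decoded PCM frames sample by sample into 'out'.
 * All buffers hold n 16-bit samples; 'out' may alias 'a' or 'b'. */
static void mix_frames(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        int32_t s = (int32_t)a[i] + (int32_t)b[i];
        if (s > 32767)  s = 32767;     /* saturate instead of wrapping */
        if (s < -32768) s = -32768;
        out[i] = (int16_t)s;
    }
}

The mixed frame is then re-encoded once and sent to everyone, which is
exactly the decode + mix + re-encode cost being weighed in this thread.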
Hi Allen,

> > True, but there is one critical place where it's necessary to mix at
> > least two streams--when someone's trying to break into a stream. If
> > speaker A goes on and on and speaker B (or C, D, E, F...) wants to
> > interject or interrupt, how do they do it in-band without mixing?
>
> It doesn't have to be done that way. You can simply have the server
> echo the voice streams back to the various clients and leave the job of
> mixing the multiple streams to the sound device (e.g. DirectSound or
> the sound hardware).

Yes, but the normal client doesn't have the bandwidth for that. Audio
should cost nearly no bandwidth, because the video stream must also go
over the same connection. So audio should perhaps not be more than
8 kbit/s. I'm not sure whether this is possible, but it is a very
important issue.

> > The 'obvious' solution seems to be to run N processes to detect 'speech'
> > or important audio content on the incoming N streams. Pick one or two
> > that need output, then mix and recode them.
>
> Again, not recommended: decoding, mixing and re-encoding at the server,
> only to decode again at the client, has a major impact on the total
> latency of the voice stream.

You can zip it ;-)).

> Additionally, you should never upstream voice from clients that aren't
> transmitting. You write code to detect transmission.

Sure. So the server can detect if mixing is necessary.

Best Regards,

Carsten
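Speex's narrowband mode gets close to that figure. As a point of reference,
a bare-bones encoder setup might look like the sketch below; quality 4
should land in the neighbourhood of 8 kbit/s for narrowband, though the
exact rate depends on the settings and library version, so check it.

#include <speex/speex.h>
#include <stdio.h>
#include <string.h>

#define MAX_PACKET 200

int main(void)
{
    SpeexBits bits;
    void *enc;
    int quality = 4;           /* aim for roughly 8 kbit/s narrowband */
    int frame_size;            /* 160 samples (20 ms at 8 kHz) for narrowband */
    spx_int16_t pcm[160];      /* one frame of 16-bit mono PCM */
    char packet[MAX_PACKET];
    int nbytes;

    speex_bits_init(&bits);
    enc = speex_encoder_init(&speex_nb_mode);
    speex_encoder_ctl(enc, SPEEX_SET_QUALITY, &quality);
    speex_encoder_ctl(enc, SPEEX_GET_FRAME_SIZE, &frame_size);

    memset(pcm, 0, sizeof(pcm));        /* stand-in for one captured frame */

    /* Per frame: reset the bit buffer, encode, pull out the packet bytes. */
    speex_bits_reset(&bits);
    speex_encode_int(enc, pcm, &bits);
    nbytes = speex_bits_write(&bits, packet, MAX_PACKET);
    printf("encoded %d samples into %d bytes\n", frame_size, nbytes);

    speex_encoder_destroy(enc);
    speex_bits_destroy(&bits);
    return 0;
}

At 50 frames per second, 20 bytes of payload per frame is 8 kbit/s of codec
data; IP/UDP/RTP headers come on top of that, which is what really hurts at
these bitrates.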
There's no perfect solution to the multiple-client problem. Each approach
has advantages and drawbacks:

1) Mixing at the server
   - Allows a constant bandwidth for every client
   - Allows compatibility with regular VoIP phones
   - Requires transcoding, even when only one person is talking
   - Higher bit-rate required for the general case (one speaker is talking)

2) Sending multiple streams
   - Possible to do without a server at all
   - Best quality (no transcoding)
   - Non-constant bandwidth

	Jean-Marc

--
Jean-Marc Valin, M.Sc.A., ing. jr.
LABORIUS (http://www.gel.usherb.ca/laborius)
Université de Sherbrooke, Québec, Canada
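To put rough numbers on the bandwidth trade-off (assuming 8 kbit/s per Speex
stream and N participants): with mixing at the server each client receives a
single stream, about 8 kbit/s downstream no matter how many people talk at
once; with stream relaying a client can receive up to N-1 streams, so five
participants all talking at the same time would mean roughly 32 kbit/s
downstream, but close to zero whenever nobody else is speaking.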
Hi David, hi all,

> True, but there is one critical place where it's necessary to mix at least
> two streams--when someone's trying to break into a stream. If speaker A
> goes on and on and speaker B (or C, D, E, F...) wants to interject or
> interrupt, how do they do it in-band without mixing?

Exactly. I should have read this before my last reply ;-)).

> But the options are *don't transcode* and *always transcode*. Switching
> between them is difficult to do on the fly.

The server should be smart enough to check whether a packet can be
distributed without merging or not.

> So, I'd say any machine that can decode two streams, encode one stream,
> and do a little overhead should be able to act as a server.

Exactly.

Best Regards,

Carsten
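A skeleton of that per-tick decision on the server might look like the
sketch below, assuming clients already suppress silent frames so "active"
simply means "a packet arrived this tick". The packet type and the helper
functions are made-up names for illustration (mix_frames is the saturating
add sketched earlier in the thread), not an existing API.

#include <stddef.h>
#include <stdint.h>

#define FRAME_SIZE 160            /* one 20 ms narrowband frame at 8 kHz */

/* Placeholders for the real server's packet type and codec plumbing. */
typedef struct Packet Packet;
void    broadcast(const Packet *p);
void    decode_to_pcm(const Packet *p, int16_t *pcm);
Packet *encode_pcm(const int16_t *pcm);
void    mix_frames(const int16_t *a, const int16_t *b, int16_t *out, size_t n);

/* Called once per 20 ms tick with the packets that arrived from
 * currently transmitting clients. */
void distribute_tick(Packet *incoming[], int n_active)
{
    if (n_active == 0)
        return;                       /* nobody talking: send nothing */

    if (n_active == 1) {
        broadcast(incoming[0]);       /* common case: forward as-is, no transcoding */
        return;
    }

    /* Rare overlap: decode the concurrent streams, mix, re-encode once. */
    int16_t mixed[FRAME_SIZE] = {0};
    int16_t pcm[FRAME_SIZE];
    for (int i = 0; i < n_active; i++) {
        decode_to_pcm(incoming[i], pcm);
        mix_frames(mixed, pcm, mixed, FRAME_SIZE);
    }
    broadcast(encode_pcm(mixed));
}

This keeps the expensive decode/mix/re-encode path on the overlap case only,
which the thread agrees should be infrequent.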