J.K. Lin (jk@pageshare.com) wrote:
> Hi:
>
> I am new to speex and I am evaluating the possibility of using
> Speex for web conferencing (pretty big scale). It looks very promising.
It's worked really well for me. I think you'll find it to be a
great codec.
> I have some questions, maybe very naive, but please help me:
>
>
> 1) Is there any sample implementation using Speex in web conferencing
> in voice? To be more specific, in Windows platforms? (ActiveX?
> Java applet implementation?)
I created a program like this for Windows, but the source is not
[yet?] available to the public - it's still very much a work in
progress. I'm willing to share information and code snippets
though, given specific questions.
> 2) What is the recommended bandwidth? 4kbs (in one way)? 2 kbs?
> (The shown samples sound pretty good at 4kbs.)
I recommend the wideband mode (16kHz) for really nice quality speech.
Telling the codec to use 10-15kbps seems to work well for CBR. VBR
with the quality set around 6.0 is also nice, consuming roughly
4-23kbps, although the average would be pretty low most of the time...
> 3) What is the recommended buffer? 1 second or 2 seconds?
That's way too large for an interactive conversation. I've been
experimenting with different buffer sizes and my current favorite is
40ms. Sending 40ms of audio over the wire results in a delay of
roughly 40ms+transmission delay+playback latency+codec latency.
Some typical numbers that I'm experiencing so far would be around:

     40ms packetization delay (packet rate of 25 packets per second)
     30ms transmission delay (typical broadband-to-broadband 1-way time)
     60ms playback latency (not too sure about this one, might be lower)
     34ms codec latency (does this overlap with packetization delay...?)
    -----
    164ms total latency

I'm not an expert at this stuff so take these numbers with a grain of
salt, and if anyone has comments on them please let me know.
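
Here is a quick sketch of that arithmetic in Python, in case it helps to play with the numbers. It treats the four components as fully additive, which may overstate the total if codec latency overlaps with packetization delay. The dictionary keys and function names are made up for this example.

```python
# Latency budget using the figures quoted above (all values in milliseconds).
LATENCY_MS = {
    "packetization": 40,  # one 40 ms packet, i.e. 25 packets per second
    "transmission": 30,   # one-way broadband-to-broadband estimate
    "playback": 60,       # playback buffering; might be lower
    "codec": 34,          # codec latency as quoted
}


def total_latency_ms(budget):
    """Sum the components, assuming none of them overlap."""
    return sum(budget.values())


def packets_per_second(packet_ms):
    """Packet rate implied by a given packet duration."""
    return 1000 / packet_ms


print(total_latency_ms(LATENCY_MS))                           # 164
print(packets_per_second(40))                                 # 25.0
# Halving the packet size trades 20 ms of latency for double the packet rate.
print(total_latency_ms(dict(LATENCY_MS, packetization=20)))   # 144
```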
> 4) What would happen if sound packets are dropped (time shift
> in different computer clock speeds)? What if some
> packet holes have to be filled? (repeating the previous packet?)
I'm not sure about this "time shift in different computer clock
speeds" thing you're talking about. Your program should use a
timing mechanism such that it operates independently of computer
clock speed.
But, in the event of packet loss or delay, you can use the packet
loss concealment feature of Speex as Jean-Marc suggested. However,
if you have Speex make up for a packet you don't have, you should
probably be careful to avoid subsequently decoding that packet if
it arrives late (as it probably will)...
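
A minimal sketch of that bookkeeping, assuming each packet carries a sequence number that starts at 0 and never wraps. The `decode` and `conceal` callbacks are hypothetical stand-ins for the real decoder calls, not part of any Speex API. For simplicity it conceals a gap as soon as a later packet shows up; a real receiver would more likely wait until the playback deadline for the missing packet.

```python
class Receiver:
    """Plays packets in sequence order, conceals gaps, drops late arrivals.

    decode(payload) returns audio for a received packet; conceal() returns
    audio synthesized for a missing one. Both are hypothetical callbacks.
    """

    def __init__(self, decode, conceal):
        self.decode = decode
        self.conceal = conceal
        self.next_seq = 0  # sequence number of the next packet to play

    def on_packet(self, seq, payload):
        """Return the list of audio frames to play for this arrival."""
        if seq < self.next_seq:
            # This slot was already played or concealed. Decoding it now
            # would play that stretch of audio twice, so drop it.
            return []
        frames = []
        # Conceal every packet that was skipped over.
        while self.next_seq < seq:
            frames.append(self.conceal())
            self.next_seq += 1
        frames.append(self.decode(payload))
        self.next_seq += 1
        return frames


rx = Receiver(decode=lambda p: "audio:" + p, conceal=lambda: "concealed")
print(rx.on_packet(0, "a"))  # ['audio:a']
print(rx.on_packet(2, "c"))  # ['concealed', 'audio:c']
print(rx.on_packet(1, "b"))  # []  (late, already concealed)
```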
> 5) Any other issues that I should pay attention to?
Use UDP. Don't use TCP. You get less packet overhead (which can
be really important at high packet rates) and you get better
performance. Actually, there's also RTP, but I don't know much
about that yet. I'm pretty sure it's layered on top of UDP and
you'd have to get an RTP library from somewhere to use it (or
maybe it's simple enough to be implemented without too much
work...?)
Packet overhead. As you increase your packet rate, you decrease
one of the latency factors (packetization delay). However, you
also increase bandwidth wasted by packet overhead. The IP header
takes 20 bytes, and then UDP uses an additional 8 bytes. If you
have a user on a dialup modem, the PPP headers will use an
additional 5-7 bytes. That's a total of 28-35 bytes PER PACKET.
At a rate of 25 packets per second, that's 700-875 bytes per second,
or 5.6-7kbps, which gets significant for a dialup modem user.
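
The same overhead arithmetic as a small Python sketch, so you can try other packet rates. The header sizes are the ones quoted above (a 20-byte IP header with no options, 8 bytes of UDP); the function names and the 15kbps example bitrate are just illustrative choices.

```python
IP_HEADER_BYTES = 20   # IP header, no options
UDP_HEADER_BYTES = 8


def overhead_bps(packets_per_second, link_header_bytes=0):
    """Header overhead in bits per second for IP + UDP plus an optional
    link-layer header (e.g. 5-7 bytes for PPP on a dialup link)."""
    per_packet = IP_HEADER_BYTES + UDP_HEADER_BYTES + link_header_bytes
    return per_packet * packets_per_second * 8


def payload_bytes(codec_bps, packet_ms):
    """Codec payload carried by one packet at a constant bitrate."""
    return codec_bps * packet_ms / 1000 / 8


print(overhead_bps(25))          # 5600  (IP + UDP only)
print(overhead_bps(25, 7))       # 7000  (with 7 bytes of PPP)
print(overhead_bps(50))          # 11200 (20 ms packets double the cost)
print(payload_bytes(15000, 40))  # 75.0  bytes of audio per 40 ms packet
```

At 15kbps CBR with 40ms packets, each packet carries 75 bytes of audio and 28-35 bytes of headers, so roughly a quarter to a third of the bandwidth goes to overhead.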
Communications protocol. This handles call setup, teardown,
audio transmission, format negotiation, and whatever else you may
like. The question is, which protocol should you use? It seems
that the two popular ones are H.323 and SIP. They are large and
complicated standards, but if you want your program to be
interoperable with other programs, you should use one (or both?).
Personally, I just made my own [simple] proprietary protocol.
Maybe some day I'll go for interoperability but I'm not there yet.
Preprocessing. Sometimes there's a strong bass signal present
in a recording from a mic. It can be caused by vibrations or air
flow or just ambient noise. It's really helpful to remove this
bass before encoding the audio. It makes the codec's job easier
(I think?) and, more importantly, is much easier on the ears of
someone listening with headphones... To remove bass from a signal,
you can run it through a high-pass filter using convolution. It's
not as hard as it might sound. There's an excellent book on DSP
techniques available online for free at:
http://www.analog.com/Analog_Root/static/technology/dsp/training/materials/dsp_book_index.html
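
To give an idea of what that looks like, here is a rough pure-Python sketch of one common approach: build a windowed-sinc low-pass kernel, flip it into a high-pass by spectral inversion, and convolve it with the signal. The 100 Hz cutoff, the 101 taps, and the function names are assumptions picked for illustration, not tuned values. With only 101 taps at 16kHz the roll-off is quite gradual, so treat it as a starting point.

```python
import math


def highpass_kernel(cutoff_hz, sample_rate_hz, taps=101):
    """Windowed-sinc high-pass kernel built by spectral inversion."""
    if taps % 2 == 0:
        raise ValueError("taps must be odd so the kernel has a center sample")
    fc = cutoff_hz / sample_rate_hz  # cutoff as a fraction of the sample rate
    mid = taps // 2
    kernel = []
    for i in range(taps):
        n = i - mid
        # Ideal low-pass response (a sinc), with the n == 0 limit handled apart.
        ideal = 2 * fc if n == 0 else math.sin(2 * math.pi * fc * n) / (math.pi * n)
        # Hamming window to reduce the ripple caused by truncating the sinc.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (taps - 1))
        kernel.append(ideal * window)
    total = sum(kernel)
    kernel = [-k / total for k in kernel]  # normalize to unity DC gain, then negate
    kernel[mid] += 1.0                     # adding an impulse gives the high-pass
    return kernel


def convolve(signal, kernel):
    """Direct convolution; the output has the same length as the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            if 0 <= i - j < len(signal):
                acc += k * signal[i - j]
        out.append(acc)
    return out


kernel = highpass_kernel(cutoff_hz=100, sample_rate_hz=16000)
offset = [1.0] * 300  # a constant offset, the extreme case of unwanted bass
# Once the filter has filled (after 100 samples) the offset is essentially gone.
print(abs(convolve(offset, kernel)[200]))
```

The direct convolution loop is slow in pure Python; it is only meant to show the idea.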
Echo cancellation. If someone in the conversation is using
speakers, the other person (or people) will hear an echo of their
own voice(s) as the sound travels from the speakers back into a
microphone. This is really annoying. I usually try to get people
to wear headphones. It's possible to do echo cancellation in
software, but it's really hard. This would really be a killer
feature for the Speex codec to provide, if that's possible... :)
> 6) Anybody did it and I can learn from?
Sure. I'm not sure if this sort of thing is on-topic for this
list though...?
Tom