I have a video-conference-like application that I've been working on for
a while now, and a recent change is causing some odd problems. I was
wondering if anyone else had seen problems like this. The issue I'm
seeing is that when using the sound card for capture, the audio will
eventually get about 1-2 seconds out of sync with (delayed behind) the
video. However, if I use USB devices for capture and playback, the
delay disappears. To make life more complex, using the sound card
causes the delay only on some systems, not all of them. On my two main
development boxes I see no problems with either USB or the mini-jack on
the sound card, but on one of our other test machines only USB works.
It feels to me like it might be an audio clocking issue, but it could
also be a processing-speed issue. Has anyone seen this at all, and
if so, did anything help? Is there anything I can change in the Speex
codec to speed things up, in case it's a processing-speed issue? I
am seeing a fair number of problem records coming out of the Speex
jitter buffer, but they might just mean that data isn't being fed
into the buffer fast enough. I'm checking:
    speex_jitter_get(&speexJitter, (short *)newData, NULL);
    if (speexJitter.valid_bits == 0) //bad record
    {
        fprintf(stderr, "Interpolating since nothing happened!!!!\n");
        fflush(stderr);
    }
Does this say I'm processing too slowly into the buffer, or that the
data put in is somehow corrupt?
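One way to separate a clocking problem from a processing-speed problem is to count how many samples the capture device actually delivers over a known stretch of wall-clock time and compare that with the requested rate. The helpers below are a minimal sketch of that arithmetic only. Their names are hypothetical (they are not part of Speex or DirectSound), and the sketch assumes the caller accumulates the sample count from the capture callbacks and measures elapsed time with a separate monotonic timer.

```c
#include <assert.h>

/* Hypothetical helper, not part of Speex or DirectSound.
 * Estimates the rate at which the capture device really delivers samples,
 * given a running sample count and the elapsed wall-clock time. */
double effective_sample_rate(unsigned long long samples_captured,
                             double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);
    return (double)samples_captured / elapsed_seconds;
}

/* Deviation of the measured rate from the nominal (requested) rate,
 * in parts per million. Positive means the device runs fast. */
double rate_error_ppm(double measured_hz, double nominal_hz)
{
    assert(nominal_hz > 0.0);
    return (measured_hz - nominal_hz) / nominal_hz * 1e6;
}
```

If the measured rate stays close to the requested 16000 Hz on the machines that work and drifts away from it on the machine that fails, that would point to the device clock rather than to the encoder being too slow. A measurement over a minute or more is likely to be more meaningful than one over a few buffers.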
The application is Windows-based, and I'm using DirectSound for capture
and playback, specifically DirectSoundFullDuplexCreate8() with
notification positions. The RTP library I'm using for transfer is
JRTPLib 3.7.1, and I'm using the associated JThread 1.2.1 for threading.
I'm currently using Speex 1.1.12 in wideband mode with the Speex jitter
buffer, with quality set to 8 and perceptual enhancement on. The
preprocessor state is being set as follows:
    preprocessorState = speex_preprocess_state_init(320, 16000); //640, 32000
    int denoise = 0;
    int agc = 0;
    int vad = 0;
    int dreverb = 0;
    float agcLevel = 8000;
    float dereverb_decay = .5f;
    float dereverb_level = .2f;
    speex_preprocess_ctl(preprocessorState,
                         SPEEX_PREPROCESS_SET_DENOISE, &denoise);
    speex_preprocess_ctl(preprocessorState,
                         SPEEX_PREPROCESS_SET_AGC, &agc);
    speex_preprocess_ctl(preprocessorState,
                         SPEEX_PREPROCESS_SET_VAD, &vad);
    speex_preprocess_ctl(preprocessorState,
                         SPEEX_PREPROCESS_SET_DEREVERB, &dreverb);
    speex_preprocess_ctl(preprocessorState,
                         SPEEX_PREPROCESS_SET_DEREVERB_DECAY, &dereverb_decay);
    speex_preprocess_ctl(preprocessorState,
                         SPEEX_PREPROCESS_SET_DEREVERB_LEVEL, &dereverb_level);
    speex_preprocess_ctl(preprocessorState,
                         SPEEX_PREPROCESS_SET_AGC_LEVEL, &agcLevel);
Thank you for any input you might have!
Jamie Stanton
> -----Original Message-----
> From: speex-dev-bounces@xiph.org [mailto:speex-dev-bounces@xiph.org]On
> Behalf Of James Stanton
> Sent: Thursday, October 04, 2007 12:53 PM
> To: speex-dev@xiph.org
> Subject: [Speex-dev] Audio Speed Variability
>
> I have a video conference like application that I've been working on for
> a while now, and a recent change is causing some odd problems, and I was
> wondering if anyone else had seen problems like this....

Short answer: don't use output sample rates other than 44100 or 48000.

Longer answer: Sound chips usually run at one of those rates, often either.
Those rates are more or less guaranteed to work properly. Most chips don't
support other rates directly; a software resampler in the driver is used
instead. Unfortunately, Microsoft released a horribly broken reference
resampler implementation to sound hardware OEMs a few years ago, and many of
them still use it. On their sound cards, if you ask for 11025 Hz, for
example, you're likely to get 11100 Hz or something similarly imprecise.
That obviously causes cumulative latency/slippage problems.

Bottom line: voice codec applications that need to work at lower rates
really need to resample to 44.1K or 48K themselves in order to work robustly
across all hardware platforms. Neither MS nor sound-hardware OEMs have
shown the slightest interest in fixing this bug, so that's just the way it
goes.

-- john
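To put a number on the cumulative slippage described above, here is a small arithmetic sketch. The function name is hypothetical, and it assumes the far end consumes audio at exactly the nominal rate and that nothing in between drops or stretches the surplus samples. The 11025 Hz and 11100 Hz figures are the ones from the message above.

```c
#include <assert.h>

/* Hypothetical helper: wall-clock seconds until the accumulated offset
 * reaches target_offset_s, when the device really runs at actual_hz but
 * the application treats its samples as nominal_hz audio. */
double seconds_until_offset(double nominal_hz, double actual_hz,
                            double target_offset_s)
{
    double surplus_per_second;

    assert(nominal_hz > 0.0 && actual_hz != nominal_hz);

    /* Samples gained (or lost) per second of wall-clock time. */
    surplus_per_second = actual_hz - nominal_hz;
    if (surplus_per_second < 0.0)
        surplus_per_second = -surplus_per_second;

    /* Target offset expressed in samples, divided by the surplus rate. */
    return target_offset_s * nominal_hz / surplus_per_second;
}
```

With a nominal rate of 11025 Hz and an actual rate of 11100 Hz, the surplus is 75 samples per second, so seconds_until_offset(11025.0, 11100.0, 1.0) gives 147 seconds: a full second of offset builds up in about two and a half minutes under these assumptions.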
John,

Thanks for the reply! You mentioned that output sample rates should be
44100 or 48000. Should I worry about input (mic) sample rates as well?
(Currently I request 16000 samples per second on both ends, for ease of
passing the data into the codec.)

Also, do you recommend any particular resampler, or are any of the ones
out there probably okay, or should I just write my own?

Thanks again for your help!

Jamie
Hi,

On 10/5/07, John Miles <jmiles@pop.net> wrote:
> Longer answer: Sound chips usually run at one of those rates, often either.
> Those rates are more or less guaranteed to work properly. Most chips don't
> support other rates directly; a software resampler in the driver is used
> instead. Unfortunately, Microsoft released a horribly-broken reference
> resampler implementation to sound hardware OEMs a few years ago, and many of
> them still use it. On their sound cards, if you ask for 11025 Hz, for
> example you're likely to get 11100 Hz or something similarly-imprecise.
> That obviously causes cumulative latency/slippage problems.

Are there any statistics on which chips operate at which frequency?
I'm wondering whether it would be safe to use 48 kHz rather than 44.1 kHz.
Since 48 kHz is a multiple of 16 kHz, it should be easier, faster and
less lossy to convert to/from it.

--
Regards,
Alexander Chemeris.

SIPez LLC.
SIP VoIP, IM and Presence Consulting
http://www.SIPez.com
tel: +1 (617) 273-4000
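To illustrate why the integer ratio is convenient: going from 16 kHz to 48 kHz is an exact factor of 3, whereas 16 kHz to 44.1 kHz is a ratio of 441:160. The sketch below upsamples by 3 with plain linear interpolation. It is a deliberately naive illustration with a hypothetical function name, not a recommended resampler: a real one would use a proper low-pass (anti-imaging) filter, and linear interpolation will audibly color wideband speech.

```c
#include <assert.h>
#include <stddef.h>

#define UPSAMPLE_FACTOR 3 /* 48000 / 16000 */

/* Hypothetical helper: upsamples n_in 16-bit samples by a factor of 3
 * using linear interpolation between consecutive input samples.
 * prev is the last input sample of the previous block (0 for the first
 * block), so that blocks join without a discontinuity.
 * out must have room for n_in * 3 samples.
 * Returns the number of output samples written. */
size_t upsample_x3_linear(const short *in, size_t n_in, short prev,
                          short *out)
{
    size_t i;
    int k;

    assert(in != NULL && out != NULL);

    for (i = 0; i < n_in; i++) {
        int a = prev;  /* left endpoint of the interpolation segment */
        int b = in[i]; /* right endpoint */

        /* Emit three points per input sample; the last one equals in[i]. */
        for (k = 1; k <= UPSAMPLE_FACTOR; k++) {
            out[i * UPSAMPLE_FACTOR + (size_t)(k - 1)] =
                (short)(a + (b - a) * k / UPSAMPLE_FACTOR);
        }
        prev = in[i];
    }
    return n_in * UPSAMPLE_FACTOR;
}
```

For a 320-sample wideband frame this produces 960 output samples. The caller would pass the final sample of each frame as prev for the next one.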