I have a video-conference-like application that I've been working on for a while now, and a recent change is causing some odd problems. I was wondering if anyone else had seen problems like this.

The issue I'm seeing is that when using the sound card for capture, the audio eventually gets about 1-2 seconds out of sync (delayed) relative to the video. However, if I use USB devices for capture and playback, the delay disappears. To make life more complex, using the sound card causes the delay only on some systems, not all. On my two main development boxes I see no problems with either USB or the mini-jack on the sound card, but on one of our other test machines only USB works.

It feels to me like it might be an audio clocking issue, but it could also be a speed-of-processing issue. Has anyone seen this at all, and if so, did anything help? Is there anything I can change in the Speex codec to speed things up, in case it is a speed-of-processing issue?

I am seeing a fair number of problem records coming out of the Speex jitter buffer, but they might just mean that data isn't being fed into the buffer fast enough. I'm checking:

    speex_jitter_get(&speexJitter, (short *)newData, NULL);
    if (speexJitter.valid_bits == 0) //bad record
    {
        fprintf(stderr, "Interpolating since nothing happened!!!!\n");
        fflush(stderr);
    }

Does this say I'm feeding the buffer too slowly, or that the data put in is somehow corrupt?

The application is Windows based, and I'm using DirectSound for capture and playback, specifically DirectSoundFullDuplexCreate8() with notification positions. The RTP library I'm using for transfer is JRTPLib 3.7.1, and I'm using the associated JThread 1.2.1 for . I'm currently using Speex 1.1.12 in wideband mode with the Speex jitter buffer, quality set to 8, and perceptual enhancement on.
The preprocessor state is being set as follows:

    preprocessorState = speex_preprocess_state_init(320, 16000); //640, 32000
    int denoise = 0;
    int agc = 0;
    int vad = 0;
    int dreverb = 0;
    float agcLevel = 8000;
    float dereverb_decay = .5f;
    float dereverb_level = .2f;
    speex_preprocess_ctl(preprocessorState, SPEEX_PREPROCESS_SET_DENOISE, &denoise);
    speex_preprocess_ctl(preprocessorState, SPEEX_PREPROCESS_SET_AGC, &agc);
    speex_preprocess_ctl(preprocessorState, SPEEX_PREPROCESS_SET_VAD, &vad);
    speex_preprocess_ctl(preprocessorState, SPEEX_PREPROCESS_SET_DEREVERB, &dreverb);
    speex_preprocess_ctl(preprocessorState, SPEEX_PREPROCESS_SET_DEREVERB_DECAY, &dereverb_decay);
    speex_preprocess_ctl(preprocessorState, SPEEX_PREPROCESS_SET_DEREVERB_LEVEL, &dereverb_level);
    speex_preprocess_ctl(preprocessorState, SPEEX_PREPROCESS_SET_AGC_LEVEL, &agcLevel);

Thank you for any input you might have!

Jamie Stanton
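One way to separate the clocking hypothesis from the processing-speed hypothesis is to count how many frames the capture device actually delivers and compare that with an independent wall clock. The following is a minimal sketch of that idea, not code from the application: `RateProbe`, `rate_probe_add`, `rate_probe_hz`, and `rate_error_percent` are hypothetical names, and the elapsed time is assumed to come from a clock that does not depend on the audio hardware.

```c
#include <assert.h>

/* Hypothetical probe: accumulates captured frames so that the effective
   device rate can be compared with the rate that was requested. */
typedef struct {
    unsigned long long frames;  /* total frames delivered by the device */
} RateProbe;

/* Call once per capture notification with the number of frames read. */
static void rate_probe_add(RateProbe *p, unsigned frames_in_block)
{
    p->frames += frames_in_block;
}

/* Effective sample rate over the measurement window, in Hz.
   elapsed_seconds is assumed to come from an independent wall clock. */
static double rate_probe_hz(const RateProbe *p, double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);
    return (double)p->frames / elapsed_seconds;
}

/* Relative error of the measured rate against the requested rate, in percent. */
static double rate_error_percent(double measured_hz, double requested_hz)
{
    assert(requested_hz > 0.0);
    return (measured_hz - requested_hz) / requested_hz * 100.0;
}
```

With 320-frame blocks (20 ms at 16000 Hz, matching the `speex_preprocess_state_init(320, 16000)` call above), a measured rate that stays noticeably away from 16000 Hz over several minutes would point at the device clock or driver resampling rather than at processing speed.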
> -----Original Message-----
> From: speex-dev-bounces@xiph.org [mailto:speex-dev-bounces@xiph.org]On
> Behalf Of James Stanton
> Sent: Thursday, October 04, 2007 12:53 PM
> To: speex-dev@xiph.org
> Subject: [Speex-dev] Audio Speed Variability
>
> I have a video conference like application that I've been working on for
> a while now, and a recent change is causing some odd problems, and I was
> wondering if anyone else had seen problems like this....

Short answer: don't use output sample rates other than 44100 or 48000.

Longer answer: Sound chips usually run at one of those rates, often either. Those rates are more or less guaranteed to work properly. Most chips don't support other rates directly; a software resampler in the driver is used instead. Unfortunately, Microsoft released a horribly-broken reference resampler implementation to sound hardware OEMs a few years ago, and many of them still use it. On their sound cards, if you ask for 11025 Hz, for example you're likely to get 11100 Hz or something similarly-imprecise. That obviously causes cumulative latency/slippage problems.

Bottom line: voice codec applications that need to work at lower rates really need to resample to 44.1K or 48K themselves in order to work robustly across all hardware platforms. Neither MS nor sound-hardware OEMs have shown the slightest interest in fixing this bug, so that's just the way it goes.

-- john
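To put a number on the cumulative slippage, take the illustrative figures from the reply: 11025 Hz requested, 11100 Hz delivered. The difference is 75 Hz, or about 0.68%, so roughly 6.8 ms of extra audio piles up per second of wall time and a full second of slippage accumulates after about 147 seconds. A small sketch of that arithmetic follows; the function names are hypothetical and the rates are only the example values quoted above, not measurements.

```c
#include <assert.h>

/* Audio surplus (positive) or deficit (negative), in seconds, that builds up
   after wall_seconds when the device runs at actual_hz while the application
   budgets for requested_hz. */
static double slip_seconds(double requested_hz, double actual_hz,
                           double wall_seconds)
{
    assert(requested_hz > 0.0);
    return (actual_hz - requested_hz) / requested_hz * wall_seconds;
}

/* Wall-clock seconds until the slippage reaches slip_target_seconds.
   Returns -1.0 when the two rates match and no slippage accumulates. */
static double seconds_until_slip(double requested_hz, double actual_hz,
                                 double slip_target_seconds)
{
    double diff = actual_hz - requested_hz;
    if (diff < 0.0)
        diff = -diff;
    if (diff == 0.0)
        return -1.0;
    return slip_target_seconds * requested_hz / diff;
}
```

Calling `seconds_until_slip(11025.0, 11100.0, 1.0)` gives about 147, which is in the same range as the 1-2 second delay described in the original report appearing after a few minutes of use.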
John,

Thanks for the reply! You mentioned that output sample rates should be 44100 or 48000. Should I worry about input (mic) sample rates as well? (I am currently requesting 16000 samples per second on both ends, for ease of passing data into the codec.)

Also, do you recommend any particular resampler, are any of the ones out there probably okay, or should I just write my own?

Thanks again for your help!

Jamie

John Miles wrote:
> Short answer: don't use output sample rates other than 44100 or 48000.
> [...]
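As an illustration of what "resampling in the application" means for the 16000 Hz codec rate, here is a deliberately naive sketch that upsamples by an integer factor using linear interpolation, for example factor 3 to go from 16000 Hz to 48000 Hz. It is not a recommendation: linear interpolation is not band-limited and leaves audible imaging artifacts, so a real application would want a properly filtered resampler. `upsample_linear` is a hypothetical name and is not part of Speex or DirectSound.

```c
#include <assert.h>
#include <stddef.h>

/* Upsample 16-bit mono samples by an integer factor using linear
   interpolation between neighbouring input samples.
   out must have room for in_len * factor samples.
   Returns the number of samples written. */
static size_t upsample_linear(const short *in, size_t in_len, int factor,
                              short *out)
{
    size_t n = 0;
    assert(factor > 0);
    for (size_t i = 0; i < in_len; i++) {
        int a = in[i];
        /* At the end of the block there is no next sample, so hold the last one. */
        int b = (i + 1 < in_len) ? in[i + 1] : in[i];
        for (int k = 0; k < factor; k++)
            out[n++] = (short)(a + (b - a) * k / factor);
    }
    return n;
}
```

For the input `{0, 300, 600}` and factor 3 this writes `0, 100, 200, 300, 400, 500, 600, 600, 600`. The capture direction (48000 Hz down to 16000 Hz) additionally needs a low-pass filter before dropping samples, which this sketch does not cover.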
Hi,

On 10/5/07, John Miles <jmiles@pop.net> wrote:
> Longer answer: Sound chips usually run at one of those rates, often either.
> Those rates are more or less guaranteed to work properly. Most chips don't
> support other rates directly; a software resampler in the driver is used
> instead. Unfortunately, Microsoft released a horribly-broken reference
> resampler implementation to sound hardware OEMs a few years ago, and many of
> them still use it. On their sound cards, if you ask for 11025 Hz, for
> example you're likely to get 11100 Hz or something similarly-imprecise.
> That obviously causes cumulative latency/slippage problems.

Are there any statistics on which chips operate at which frequency? I'm wondering whether it would be safe to use 48 kHz rather than 44.1 kHz. Since 48 kHz is an integer multiple of 16 kHz, converting to and from it should be easier, faster, and less lossy.

--
Regards,
Alexander Chemeris.

SIPez LLC.
SIP VoIP, IM and Presence Consulting
http://www.SIPez.com
tel: +1 (617) 273-4000
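The point about 48 kHz being a multiple of 16 kHz can be made concrete by reducing each conversion to its smallest integer ratio: 48000/16000 reduces to 3/1, while 44100/16000 only reduces to 441/160, which calls for a considerably more involved rational resampler. A short sketch of that reduction follows; `Ratio` and `resample_ratio` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Smallest integer up/down factors for a sample-rate conversion. */
typedef struct {
    unsigned up;    /* interpolation factor */
    unsigned down;  /* decimation factor */
} Ratio;

/* Greatest common divisor by Euclid's algorithm. */
static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduce out_hz/in_hz to lowest terms. */
static Ratio resample_ratio(unsigned out_hz, unsigned in_hz)
{
    Ratio r;
    unsigned g;
    assert(out_hz > 0 && in_hz > 0);
    g = gcd_u(out_hz, in_hz);
    r.up = out_hz / g;
    r.down = in_hz / g;
    return r;
}
```

`resample_ratio(48000, 16000)` yields up 3, down 1, and `resample_ratio(44100, 16000)` yields up 441, down 160.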