Robert McGilvray
2016-Jun-28 16:04 UTC
[asterisk-users] Audio cutting in and out - asterisk 13.1 cert6 / confbridge
Hello,

We use Asterisk extensively for conferencing. For the last 8 years or so this has been the 1.4/1.6/1.8 releases running chan_sip and meetme for up to around 350 concurrent users. Right around that number DAHDI hits a hard-coded memory limit and logs allocation errors:

[Jun 22 10:04:13] WARNING[9095] app_meetme.c: Unable to open DAHDI pseudo channel: Cannot allocate memory

In order to support our growing user count we recently upgraded to 13.1-cert6 with pjsip and replaced meetme with confbridge. During all of our UAT and load testing everything seemed fine: there were no perceived audio quality issues and no log entries that would indicate a problem. Unfortunately, now that we're in production I'm getting consistent complaints that the audio from participants is cutting in and out. It only seems to occur under load with more than 350 users, but that is anecdotal at best.

This is not a simple networking issue; we've pretty much ruled that out with various performance testing. That was not the case initially: we had incrementing UDP packet receive errors, which we've eliminated with a bit of tuning.

There are numerous architectural differences between the two installations and so far I have not been successful in determining the root cause. I'm reaching out to the community and the developers for insight and feedback, hoping there is prior experience with this issue and how to resolve it.

As you can see below, the most significant difference is probably the use of VMware on the new install. I've tuned the ESXi host and guest per VMware's recommendations for latency and jitter (full CPU/memory reservations) with no improvement. From all the reading I've done, I suspect my issue may come down to the timing source and VMware not providing a reliable clock. It seems they allow a backlog of interrupts to build up, and if the guest hasn't caught up within 60s the backlog is simply dropped.
Before I rip apart the environment and rebuild on physical hardware I'd like to try to confirm that hypothesis. In the past this was a simple matter of running dahdi_test, which would report the accuracy. I'm not sure how to interpret the results of "timing test" in the Asterisk CLI. If I increase the number of ticks per second the results are erratic under load. I'm using the timerfd module in Asterisk with a 1000 Hz tick kernel and high-resolution timers enabled. I've tried both hpet and tsc as system clock sources; both exhibit the same breaks in audio. It sounds like someone presses the mute button in the middle of a sentence.

Any insight is appreciated! Here are the specs on the new install:

Physical HW: Cisco UCS Blade (UCSB-B200-M3)
VMware: ESXi 5.5
VM Guest: 4 vCPU w/ 32G of RAM, tuned for latency/jitter (sensitivity=high) with full CPU/memory reservations
VM OS: Red Hat EL7, kernel 3.10.0-327.13.1.el7.x86_64 with tickless disabled (nohz=off) and 1000 Hz
Asterisk: 13.1-cert6 using the timerfd module

Regards
Robert McGilvray
SS&C GlobeOp
Associate Director, IT Network Security
GlobeOp Financial Services | 1565 Front Street | Yorktown Hts NY 10598
t: +1 (914)-293-3584 | f: +1 (914)-293-3510
rmcgilvr at globeop.com | www.ssctech.com | www.sscglobeop.com

This email with all information contained herein or attached hereto may contain confidential and/or privileged information intended for the addressee(s) only. If you have received this email in error, please contact the sender and immediately delete this email in its entirety and any attachments thereto.
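For readers who want to check the "incrementing UDP packet receive errors" mentioned in the message above, here is a minimal sketch that reads the Udp counters Linux exposes in /proc/net/snmp and reports how much the error counters grew over a short window. The helper names (parse_udp_counters, counter_deltas) and the 10-second window are illustrative assumptions, not something taken from the original post.

```python
import time


def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict."""
    # The file holds a header line and a value line, both prefixed "Udp:".
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value pair found")
    names, values = udp_lines[0][1:], udp_lines[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def counter_deltas(before, after, keys=("InErrors", "RcvbufErrors")):
    """Growth of the selected counters between two snapshots."""
    return {k: after[k] - before[k] for k in keys}


if __name__ == "__main__":
    # Hypothetical usage: sample twice, 10 seconds apart, on the Asterisk host.
    with open("/proc/net/snmp") as f:
        first = parse_udp_counters(f.read())
    time.sleep(10)
    with open("/proc/net/snmp") as f:
        second = parse_udp_counters(f.read())
    print(counter_deltas(first, second))
```

If the deltas stay at zero while the audio gaps are occurring, that would be consistent with the poster's conclusion that receive-side packet loss has been tuned away.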
Joshua Colp
2016-Jun-29 09:45 UTC
[asterisk-users] Audio cutting in and out - asterisk 13.1 cert6 / confbridge
Robert McGilvray wrote:

<snip>

> Before I rip apart the environment and rebuild on physical I'd like to
> try and confirm that hypothesis. In the past this was a simple matter of
> running dahdi_test which would report the accuracy. I'm not sure how to
> interpret the results of "timing test" in the Asterisk CLI. If I
> increase the number of ticks per second the results are erratic while
> under load. I'm using the timerfd module in Asterisk with a 1000HZ tick
> kernel and high res timers enabled. I've tried both hpet and tsc as
> system clock sources, both exhibit the same breaks in audio. It sounds
> like someone presses the mute button in the middle of a sentence.

"timing test" does something similar; it just doesn't do the automatic calculation. ConfBridge normally operates at a mixing interval of 20ms, which is 50 ticks per second. That is what you would want to test. If you don't get 50 per second, ConfBridge will not provide a steady source of media to each participant, and it will be up to each remote jitterbuffer to handle the delayed traffic. Enough of that and stuff goes wonky.

You could also see this in a packet capture. That would determine whether it's timing related or not.

--
Joshua Colp
Digium, Inc. | Senior Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - US
Check us out at: www.digium.com & www.asterisk.org
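The 20ms / 50 ticks per second check described above can be approximated outside Asterisk. The following is a rough user-space sketch, not the implementation behind the CLI "timing test" command and not Asterisk's timerfd code: it sleeps to absolute deadlines on Python's monotonic clock and reports how late each tick fired. The function names and the 5-second default duration are assumptions made for illustration.

```python
import time


def run_ticker(interval_s=0.020, duration_s=5.0):
    """Sleep to absolute deadlines; return when each tick actually fired."""
    total = round(duration_s / interval_s)
    start = time.monotonic()
    fired = []
    for n in range(1, total + 1):
        deadline = start + n * interval_s
        delay = deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        # Seconds since start at which tick n was really delivered.
        fired.append(time.monotonic() - start)
    return fired


def summarize(fired, interval_s=0.020):
    """Compare actual tick times against the ideal schedule."""
    lateness = [t - (i + 1) * interval_s for i, t in enumerate(fired)]
    elapsed = fired[-1] if fired else 0.0
    rate = len(fired) / elapsed if elapsed > 0 else 0.0
    return {
        "ticks": len(fired),
        "ticks_per_second": rate,
        "worst_lateness_ms": max(lateness, default=0.0) * 1000.0,
    }


if __name__ == "__main__":
    print(summarize(run_ticker()))
```

On a healthy host the rate should sit very close to 50 and the worst lateness should be small. A stall in the guest would show up as a large worst_lateness_ms, since ticks that miss their deadline are delivered late in a burst. Running it on the VM under conference load and again on physical hardware gives a crude comparison of the two timing environments.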
Robert McGilvray
2016-Jun-29 15:24 UTC
[asterisk-users] Audio cutting in and out - asterisk 13.1 cert6 / confbridge
"timing test" does similar, it just doesn't do the automatic calculation. Confbridge normally operates at a mixing interval of 20ms, which is 50 ticks per second. That would be what you would want to test. If you don't get 50 per second then that means ConfBridge will not provide a steady source of media to each participant and it will be up to each remote jitterbuffer to handle the delayed traffic. Enough of it and stuff goes wonky. You could also see this on a packet capture. That would determine if it's timing related or not. -- Thanks Joshua. We're talking about pretty long gaps in the audio, probably around 10-15 seconds which is quite a bit of missed ticks at 20ms sampling. I was poking around the timing code trying to get a better understanding of things and found that Asterisk uses timerfd_create with CLOCK_MONOTONIC as the clock. The man page states CLOCK_MONOTONIC is affected by incremental adjustments to the time made by things like NTP. I may be completely off track here but would something like vmtools that tries to correct the clock skew (caused by VMware) be causing some issues here? Meaning that if asterisk calls timerfd_create but then the time is adjusted could that throw off the timing of the descriptor? Regards Bob This email with all information contained herein or attached hereto may contain confidential and/or privileged information intended for the addressee(s) only. If you have received this email in error, please contact the sender and immediately delete this email in its entirety and any attachments thereto.