Colin Anderson
2006-Jul-06 10:33 UTC
[asterisk-users] Phones cutting out.....again - PLEASE HELP!!!
So you need a "divide and conquer" strategy here:

1. Is it Asterisk or the WAN? This should be easy enough to test for. Do call dropouts happen in your datacentre? If not, your Asterisk install is good. My money's on the 10mbit WAN pipe, and that's what I would be focussing on.

2. If it's the WAN, is it a connectivity issue or a bandwidth issue? Do a continuous ping from the remote location to your Asterisk server for a day. You should get NO packets dropped. If you are getting drops, it's a connectivity issue and you have to look at your SLA to see what your provider considers good. Otherwise, it's a bandwidth issue.

3. If it's a bandwidth issue, is it your users doing things or is it a service that is eating bandwidth? If it's a service that is aggregated to a remote server, like email, then you can use bandwidth management tools like AstShape or good old tc to severely restrict the bandwidth available to the troublesome service. If it's your users, you have to determine what they are doing. Look at patterns: Does it happen every Tuesday afternoon when you know Bob from Accounting is running his reports?

4. Sounds like you are running Asterisk --> SIP --> 10mbit WAN --> SIP --> Phones, which is probably half the issue right there because there is no jitterbuffer. Dig up an old P-3, stick in Trixbox, run it out to your remote location, and have your Eyebeam clients use *it* instead of your big Asterisk server for local connectivity. Then tie your P-3 to your big Asterisk server with IAX. Jitterbuffer + trunking = goodness, and your P-3 won't choke under load if you avoid transcoding by using the same codec end-to-end. Yes, it will blow having to maintain two dialplans. But IAX works frigging great. I use it to aggregate 30 remote locations over the *public* Internet to my big Asterisk server, and I never get complaints of dropouts, and in fact I use it extensively myself and IMO it sounds better* than the local CableCo's VoIP offering, which is a big POS.

5. Regardless of what it actually is, I would have some sort of traffic shaper at both ends of the WAN pipe. Again, dig up a couple of old P-2s or P-3s and stick in a bootable Monowall CD, change the default rules to allow all traffic through, but create a traffic shaping ruleset to give priority and bandwidth to 5060, 4569, 10000-20000 and dump everything else to a low priority queue.

6. I'd run GSM anyway (even though you tried it) because it would eliminate half your bandwidth consumption. Another variable eliminated.

hth

*By 'sounds better' I mean it sounds like a perfectly normal PSTN call, ALL THE TIME [...]

-----Original Message-----
From: whois wes [mailto:whoiswes@gmail.com]
Sent: Thursday, July 06, 2006 10:51 AM
To: Asterisk Users Mailing List - Non-Commercial Discussion
Subject: [asterisk-users] Phones cutting out.....again - PLEASE HELP!!!

Hate to drag this one back up, but....it's happening again.

Overview of architecture:

Dell PowerEdge 2850, running Fedora Core 4, Asterisk 1.2.7.1, Zaptel 1.2.5, and Sangoma wanpipe 2.3.4 drivers. The T1 interface card is the Sangoma A104D with onboard echo can.

Server is located in our data center and connected directly to our Cisco 6513 core switch, so we have almost zero latency. The office having the issues is located several miles away and is connected via a 10Mbit fiber pipe, also low latency. Ping times between remote office and here are well under 10ms.

T1's are robbed-bit, E&M wink signalling <--- (this may be the cause, but want your input).

Server load is averaging around 20%, with plenty of memory, disk space, and bandwidth available. No QOS running on network. ulaw is the primary codec. Server is stable, and there are no extraneous services running, save mysql and httpd. Even running a processor-intensive query doesn't trigger the dropouts; they happen randomly.

Phones are a mix of Eyebeam 1.5.5 and Eyebeam 1.10 3010n.
Both types of phones are experiencing cutting out of the signal, mainly in the Rx stream, but occasionally in the Tx stream as well. The cutting out was NOT occurring last night, and the phone server is being rebooted nightly. Nothing has changed AT ALL, and the problem has started occurring again. If I don't do ANYTHING at all today, there is a 50% chance that this will NOT occur tomorrow. In other words, SOMETHING is causing our phones to drop out, but whatever changes I make seem to have no effect. The problem will start and stop seemingly at its own whim.

---
Things I have tried:

1. changed from ulaw to gsm as primary codec - no change
2. disabled hardware echo can on A104D - no change
3. moved from asterisk 1.2.4 to 1.2.7.1, recompiled both several times - no change
4. have played with gain settings a bit, doesn't seem to make much difference
---

At this point, I am nearing the end of my rope - I have rebuilt this machine three times now, and have recompiled the system at least a dozen times. We have gone from Digium hardware to Sangoma hardware and back again. I have changed every conceivable setting on the phones to no avail. The problem will randomly disappear, only to come back a few days later. I can make a change, it seems to have an effect, then we're back to the same old thing again.

I am in dire need of ANY help anyone can offer; this has been going on in some form for almost three months.

Thanks for reading,

Wes
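
Item 5 of the reply above describes a two-queue shaping rule: traffic on ports 5060, 4569 and 10000-20000 gets priority, everything else goes to a low-priority queue. The rule can be sketched in Python as below. This is only an illustration of the classification logic, not m0n0wall configuration; the function name and queue labels are hypothetical. The port meanings in the comments (SIP signalling, IAX2, Asterisk's default RTP range) are the conventional assignments and are assumed to match this setup.

```python
# Hypothetical illustration of the prioritisation rule from item 5.
# Not m0n0wall syntax: names and queue labels are made up for the sketch.

VOICE_PORTS = {5060, 4569}        # SIP signalling and IAX2 (conventional ports)
RTP_RANGE = range(10000, 20001)   # Asterisk's default RTP range, inclusive


def classify(port: int) -> str:
    """Return the queue a packet on this port would be placed in."""
    if port in VOICE_PORTS or port in RTP_RANGE:
        return "high"
    return "low"


if __name__ == "__main__":
    for port in (5060, 4569, 15000, 80, 25):
        print(port, classify(port))
```

A real shaper would normally match on either the source or the destination port, and on the protocol as well, so that both directions of a call land in the high-priority queue.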
whois wes
2006-Jul-06 10:58 UTC
[asterisk-users] Phones cutting out.....again - PLEASE HELP!!!
Thanks for the quick responses everyone. To answer some of the questions posed:

The main traffic going over this pipe is voice, with a small amount of web traffic as well. There are 60 total users, 5 of whom access anything other than what is on their LAN up there. In any case, we are not saturating the pipe, and our telco put some sort of filters on the Optiman switches on each side to eliminate any jitter (or so they say).

Prior to the filter being installed, we had our main application server for that location located down here. When the issue started (out of the blue, nothing really triggered it, and our bandwidth didn't change or spike) we moved that server to the remote location. So, before we even had the issue, we were using WAY more bandwidth, almost 8Mbit at times...we're averaging around 2-3 now, and it rarely spikes above that.

Also, when I connect to the server locally (the server is in the room next to me, in other words, and I have 1 Gbit of bandwidth all the way to the back of the server), I still get call dropouts. In other words, completely bypassing the fiber pipe results in the same problem. For that reason alone, I don't think it's the WAN (although I agree with what all of you said with regard to QOS, etc.; it's just not up to me to implement that, even though it's been suggested numerous times).

However, this IS the only server (of 8 total, all in the same rack and connected to the telco via the same DS3) that is having the issue, which DOES point to it being the WAN, as that is our ONLY remote location.

See why I'm frustrated?

I do like the idea of putting a local box up there and using an IAX trunk over the pipe, and will see about getting that implemented. GSM was already shot down as 'too low-quality' - we'd rather up the pipe to 20Mbit than go with a lower quality codec.

Sorry that I forgot to mention some of this in my initial post, and hopefully the above info will shed a bit more light on my confusion.
Thank you all again for replying so quickly, and if you have any other suggestions, please let me know.

Wes
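
The bandwidth figures in this message (60 users, a 10Mbit pipe, ulaw versus GSM) can be sanity-checked with a rough per-call estimate. The sketch below assumes 20 ms packetisation, 40 bytes of IPv4 + UDP + RTP headers per packet, no layer-2 overhead, and the worst case of all 60 users on a call at once. None of these assumptions is confirmed in the thread, and the function names are hypothetical.

```python
# Rough RTP bandwidth estimate per call, in one direction.
# Assumptions (not from the thread): 20 ms packets, IPv4/UDP/RTP headers only.

HEADER_BYTES = 20 + 8 + 12    # IPv4 + UDP + RTP
PACKETS_PER_SECOND = 50       # one packet every 20 ms

# Codec payload bytes per 20 ms packet.
PAYLOAD_BYTES = {
    "ulaw": 160,   # G.711 at 64 kbit/s
    "gsm": 33,     # GSM full rate, one 33-byte frame per 20 ms
}


def call_kbps(codec: str) -> float:
    """Bandwidth of one call in one direction, in kbit/s."""
    packet_bytes = PAYLOAD_BYTES[codec] + HEADER_BYTES
    return packet_bytes * 8 * PACKETS_PER_SECOND / 1000


def total_mbps(codec: str, calls: int) -> float:
    """Bandwidth of `calls` simultaneous calls in one direction, in Mbit/s."""
    return call_kbps(codec) * calls / 1000


if __name__ == "__main__":
    for codec in ("ulaw", "gsm"):
        print(codec, call_kbps(codec), "kbit/s per call,",
              total_mbps(codec, 60), "Mbit/s for 60 calls")
```

Under these assumptions a ulaw call is about 80 kbit/s and a GSM call about 29 kbit/s per direction, so even 60 simultaneous ulaw calls stay under 5 Mbit/s each way. A pipe with that much headroom can still drop or delay voice packets during short bursts of other traffic, which is the case the QOS suggestions address.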
Colin Anderson
2006-Jul-06 11:43 UTC
[asterisk-users] Phones cutting out.....again - PLEASE HELP!!!
> Also, when I connect to the server locally (the server is in the room
> next to me, in other words, and I have 1 Gbit of bandwidth all the way
> to the back of the server), I still get call dropouts.

> However, this IS the only server (of 8 total, all in the same rack and
> connected to the telco via the same DS3) that is having the issue,
> which DOES point to it being the WAN, as that is our ONLY remote
> location.

So perhaps what you are seeing is two or more subtle issues with the same symptom, so subjectively it looks like the *same* issue.

1. Definitely try the remote IAX box to rule out bandwidth starvation.

2. Definitely try the ping test to rule out connectivity.

3. You have to figure out what the problem is with your big Asterisk box. There should be no reason why you are getting dropouts on the local LAN. What is the output of zttest? Is it good? Does zttool indicate IRQ misses? If it's OK, then your hardware and T1 setup are good, so you have ruled out your Asterisk box. It is also a worthwhile exercise to rule out the onboard ethernet card in the Dell. In fact, whenever I do a new box, I automatically disable the onboard LAN and replace it with an add-in 3com or Intel. It is also a worthwhile exercise to use setpci to change the latency of the cards in the Dell so that your Zap boards can grab the bus as much as possible.

4. The thing that is common in all scenarios is the EyeBeam client itself. Any soft phone is subject to the strengths and weaknesses of the audio chipset in the PC, with issues to consider like latency, audio threshold before it starts the TX, and duplex settings. Because troubleshooting these variables is often as hard as troubleshooting an entire Asterisk install, I would never run a soft-phone and expect people to use it productively. What happens when you put in a "real" phone? If you don't have a hardphone, maybe try something else like the Snom soft-phone.
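
The ping test from item 2 could be automated along the following lines. This is a minimal sketch: it assumes a Unix-style `ping` that accepts `-c` and prints a summary line of the form "N packets transmitted, M received" (some systems print "M packets received"). The host name is a placeholder and the function names are hypothetical.

```python
import re
import subprocess
from typing import Tuple

# Matches both "10 packets transmitted, 10 received" and
# "10 packets transmitted, 10 packets received".
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def parse_ping_summary(output: str) -> Tuple[int, int]:
    """Extract (sent, received) counts from ping's summary line."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    return int(match.group(1)), int(match.group(2))


def lost_packets(output: str) -> int:
    """Number of echo requests that got no reply."""
    sent, received = parse_ping_summary(output)
    return sent - received


def run_ping(host: str, count: int) -> str:
    """Run ping for `count` packets and return its standard output."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder host; 86400 packets at one per second is roughly a day.
    output = run_ping("asterisk.example.com", 86400)
    print("lost packets:", lost_packets(output))
```

Running it from the remote office against the Asterisk server and again from the machine in the room next to the server would give two loss figures to compare.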
In the end, this is all about eliminating variables as much as possible, and this will determine your "decision matrix" of things to try. The first matrix will be the most difficult to implement because you have a whole whack of stuff to eliminate, but they will get smaller and smaller as you eliminate variables, and eventually you will only have 2 or 3 variables to test for, and then you are golden.

OT: I find it useful to make painstaking notes or keep a spreadsheet of test results when going through a troubleshooting process like this. Often, referring back to the spreadsheet gives me valuable insight into a problem. I read this book, and I got shivers down my spine because it's like these guys got into my brain and stole (what I thought was) an original problem-solving idea of mine:

http://www.transcendstrategy.com/html/index.php?module=htmlpages&func=display&pid=7

Every person that troubleshoots a complex system should read this book (disclaimer: I just read it, I have nothing to do with these guys).

good luck
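
The "decision matrix" and spreadsheet of test results described above can be approximated with a small log of observations. The sketch below is one hypothetical way to keep such a log: each entry records which components were in the call path and whether dropouts occurred, and the components present in every failing test remain the main suspects. The class, function and component names are invented for illustration; the two sample observations are taken from the thread.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Observation:
    description: str
    components: FrozenSet[str]   # everything in the call path for this test
    dropouts: bool               # True if the problem was seen


def common_suspects(observations: List[Observation]) -> Set[str]:
    """Components that were present in every test where dropouts occurred."""
    failing = [o.components for o in observations if o.dropouts]
    if not failing:
        return set()
    return set(frozenset.intersection(*failing))


# The two situations reported in the thread.
OBSERVATIONS = [
    Observation("remote office over the 10Mbit WAN",
                frozenset({"asterisk server", "wan", "eyebeam"}), True),
    Observation("same building, gigabit LAN to the server",
                frozenset({"asterisk server", "eyebeam"}), True),
]

if __name__ == "__main__":
    print(sorted(common_suspects(OBSERVATIONS)))
```

Intersecting the failing tests is only valid if there is a single fault. If two independent problems produce the same symptom, as suggested earlier in this message, a component can be dropped from the intersection and still be faulty, so each suspect still needs its own direct test.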