Hi everyone,

I have been running into a few issues with Asterisk/Polycom and I am running out of ideas. This problem has been ongoing for the last couple of weeks. I will try to be as detailed as I can, but I might leave out a few details. Any suggestions would be greatly appreciated.

Our setup: Asterisk 1.2.22 on Fedora 4, co-located at our PRI provider's facility. Traffic comes to us via a DS3 to our other co-lo, then is sent via wireless to our office. We also have another office in another state; its connection is pretty much identical (they may have different wireless equipment) except for the path it takes.

About 3 weeks ago, our phones started to reboot randomly. Sometimes it happened within 5 minutes, sometimes after up to an hour, but the reboots were pretty consistent, and it didn't matter whether the agent was on the phone or not. Asterisk would only tell me that the phones were not reachable, which clearly was the case since they were rebooting. At the time, they were running, I think, SIP version 1.6.6. The odd part was that the office in the other state (also running 1.6.6) was not having this problem, and neither were my phone and my supervisor's (on which we were testing SIP 2.2.2).

After we had our phone company test the DS3, some errors were found and corrected, and that solved the problem for a few days. Then, of course, it started to happen again. This time there were no errors on the DS3, yet the same problems were occurring, while my phone, my supervisor's, and the other state's phones were fine. So I decided to upgrade all of the phones to version 2.2.2. That at least solved the reboot problem, but I ran into another one.

Now the phones lose their registration with Asterisk. My boss's phone does it every time he receives a voicemail. Most people can go a few hours without having to restart, but sooner or later the phone loses its registration. From what I can tell, there are no errors in Asterisk and no errors in the logs from the phones booting. I can even ping the phones from the Asterisk box when this happens, but they just won't re-register with the system. My phone has been up almost 3 days now, but other phones are still having this problem. Also, the phones in the other state are not having this problem AND are still on 1.6.6.

Anyone have any suggestions on something to look at, test, try, etc.? I'm completely stumped.

Thanks,
Kevin
> Anyone have any suggestions on something to look at, test, try, etc? I'm
> completely stumped.

Here are some things to check from my experience:

1. The Polycom phones are a bit sensitive to low power, more so the older 301/501/601 models. Say you have a refrigerator plugged into the same power circuits that supply several phones: when the refrigerator compressor kicks on (randomly), it draws down the voltage on the line and the phones reboot. So if several phones reset at the same time, look at power. Check for newly installed equipment sucking down power, and check the voltage coming in from the street; maybe you are getting brown-outs from the power company and don't realize it. Summertime is a huge draw on the power grid.

2. Network congestion combined with poor phone performance, mostly on the older 301/501/601 models, will cause the phones to randomly drop off from the PBX. Sniff the network when the phones drop off and make sure you don't have a broadcast storm or high network congestion, for example from PCs with viruses. If the phones become unreachable and then reachable again within 10 seconds (look in the Asterisk message log), this points to network congestion. Ensure the SIP users are set to qualify=yes (2000ms).

3. Poor switch performance: a switch going bad and not forwarding packets in a timely manner. You can run an extended ping through the switch. Pings are low priority, so if the SIP packets are delayed, you will definitely see high ping times as well.

Good luck,
JR
---------------------
JR Richardson
Engineering for the Masses
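A note on check 2 above: the "unreachable, then reachable again within 10 seconds" pattern can be pulled out of the Asterisk message log with a short script. The following is only a minimal sketch in Python. It assumes that each qualify state change is logged on one line that starts with a syslog-style timestamp and contains `Peer '<name>' is now UNREACHABLE` or `Peer '<name>' is now REACHABLE`; the exact wording and timestamp format depend on the Asterisk version and logger settings, so adjust the pattern to what your log actually shows. The names `find_flaps` and `SAMPLE_LINES` and the `year` placeholder are hypothetical, and the sample lines are made up for illustration, not copied from a real log.

```python
import re
from datetime import datetime, timedelta

# ASSUMPTION: timestamp like "Jul 23 10:15:02" at the start of the line, and a
# message containing "Peer '<name>' is now UNREACHABLE" / "... REACHABLE".
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}).*"
    r"Peer '(?P<peer>[^']+)' is now (?P<state>UNREACHABLE|REACHABLE)",
    re.IGNORECASE,
)


def find_flaps(lines, max_gap=timedelta(seconds=10), year=2000):
    """Return (peer, seconds_down) for outages no longer than max_gap.

    The log timestamp carries no year, so `year` is only a placeholder that
    lets the timestamps be parsed; it does not affect the computed gaps.
    """
    down_since = {}  # peer name -> time it was reported unreachable
    flaps = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        peer, state = m["peer"], m["state"].upper()
        if state == "UNREACHABLE":
            down_since[peer] = ts
        elif peer in down_since:
            gap = ts - down_since.pop(peer)
            if gap <= max_gap:
                flaps.append((peer, gap.total_seconds()))
    return flaps


# Made-up sample lines in the assumed format (not real Asterisk output).
SAMPLE_LINES = [
    "Jul 23 10:15:02 NOTICE: Peer '201' is now UNREACHABLE!",
    "Jul 23 10:15:08 NOTICE: Peer '201' is now REACHABLE!",
    "Jul 23 10:20:00 NOTICE: Peer '202' is now UNREACHABLE!",
    "Jul 23 10:27:30 NOTICE: Peer '202' is now REACHABLE!",
]

for peer, seconds in find_flaps(SAMPLE_LINES):
    print(f"{peer}: unreachable for {seconds:.0f}s")
```

In the sample, peer 201 comes back after 6 seconds and is reported, while peer 202 stays down for several minutes and is not. Short outages like the first would fit the congestion explanation; long ones point more toward a phone that rebooted or stopped registering. To run it against a real log, pass an open file object instead of `SAMPLE_LINES`.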
> Connections at our office.
> Computer---[PC jack on the back of the phones]----Phone----Netgear
> Switch---Mikrotik Router--- Wireless Connection to router--ds3---asterisk

I'm curious about the Mikrotik router and the wireless connection: is the router set up to NAT or to route? VoIP across a wireless connection is not a good thing unless it is set up with layer 2/3 QoS. Optimally the PBX should be on the LAN side of the router.

> The Netgear switch is a fsm7326p PoE switch. Below is the current status
> of the PoE system (I removed the ones that are not using PoE). Now, I'm
> pretty sure this equipment is on it's own breaker, it used to house most
> of our co-lo equipment before we moved it to another office. Mainly I'm
> providing this information in case there is something that is a miss
> that maybe I am over looking that you may clearly see being a problem.

The switch doesn't look overloaded as far as power goes. Is it on a UPS or plugged straight into building power? You could still be having brown-outs that would cause some ports to drop below acceptable voltage. Make sure the switch is on a good UPS to eliminate this as a possibility.

> From a sniff point of view..where would the best place to sniff in your
> opinion? From the Asterisk server or from somewhere within our office,
> or both?

Mirror the switch port going to the router on the LAN side and sniff there for broadcast storms, viruses, and congestion. Sniff the WAN side of the router to look for congestion on the DS3. What is the router utilization? How many open NAT translations are there? Is the router overloaded? Has your traffic increased over the last several months?

> Also, I have combed through the configuration files enough to where I
> may be overlooking something, is there anything in the configuration
> that you could think of that may cause the problems?

The only settings I can think of are qualify=yes and nat=yes. If the network is routed between the phones and the PBX and not NATed, set nat=no and qualify=no and see how that works.

You mentioned this started happening 3 months ago; what happened then? Network changes, equipment changes, increased traffic, new users (downloading a lot during the day, surfing porn), wireless interference?

Good luck.

--
Thanks.

JR
---------------------
JR Richardson
Engineering for the Masses
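For the sniffing step, one rough way to spot a broadcast storm in a capture taken on the mirrored port is to measure what share of frames in each second went to the broadcast address. The sketch below is in Python and works on text output rather than on the capture file itself. It assumes the text was produced with something like `tcpdump -e -n`, so that every line starts with a time of day followed by `source MAC > destination MAC,`; lines that do not match are skipped. The names `broadcast_share_per_second` and `SAMPLE_LINES` are hypothetical, and the MAC addresses in the sample are made up.

```python
import re
from collections import Counter

# ASSUMPTION: lines look like
#   "10:15:02.000100 aa:bb:cc:dd:ee:ff > ff:ff:ff:ff:ff:ff, ethertype ..."
# Some builds may print the word "Broadcast" instead of the all-ones address.
FRAME_RE = re.compile(
    r"^(?P<sec>\d{2}:\d{2}:\d{2})\.\d+ "
    r"(?P<src>[0-9a-f:]{17}) > (?P<dst>[0-9a-f:]{17}|Broadcast),",
    re.IGNORECASE,
)
BROADCAST_NAMES = {"ff:ff:ff:ff:ff:ff", "broadcast"}


def broadcast_share_per_second(lines):
    """Map each one-second bucket to the fraction of frames sent to broadcast."""
    total = Counter()
    broadcast = Counter()
    for line in lines:
        m = FRAME_RE.match(line)
        if not m:
            continue  # not a frame line in the assumed format
        sec = m["sec"]
        total[sec] += 1
        if m["dst"].lower() in BROADCAST_NAMES:
            broadcast[sec] += 1
    return {sec: broadcast[sec] / total[sec] for sec in total}


# Made-up sample lines in the assumed format.
SAMPLE_LINES = [
    "10:15:02.000100 00:04:f2:00:00:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: ...",
    "10:15:02.000200 00:04:f2:00:00:01 > 00:0c:42:00:00:02, ethertype IPv4 (0x0800), length 214: ...",
    "10:15:03.000100 00:04:f2:00:00:01 > 00:0c:42:00:00:02, ethertype IPv4 (0x0800), length 214: ...",
]

for second, share in sorted(broadcast_share_per_second(SAMPLE_LINES).items()):
    print(f"{second}  broadcast share: {share:.0%}")
```

The script does not decide what counts as a storm. It only makes it easier to line up seconds with an unusually high broadcast share against the times the phones dropped off.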
JR Richardson wrote:
> You mentioned this started happening 3 months ago, what happened then?
> Network changes, equipment changes, traffic increased, new users
> (downloading allot during the day, surfing porn), wireless
> interference?

The initial problem started when our DS3 was throwing errors. Once that was resolved, it was fine until about a week later, when the problems started again, but this time with no errors showing on the DS3.

Otherwise, I will try some other suggestions the next time I am back in that office.

Thanks again,
Kevin