Hello List,

we have 3 Asterisk boxes running under Fedora Core 2. Every box hangs/crashes from time to time.

These installations are image-based, meaning we made an image of our test server with an imaging tool that can handle ext3 partitions and deployed it to different server hardware. The servers themselves run very stably and I could not find any failures in the logs.

When these crashes first appeared, I thought rebooting the machines by cron job every night at 04:00 would solve the problem. It seemed to work quite well for a couple of weeks. Today I saw our own Asterisk production server crash :-( .

The crashes are always the same: Asterisk stops responding, the CLI does not react to command input, and you have to manually kill -9 all asterisk and moh processes. The Asterisk logs are empty.

We don't use any isdn/fxs/fxo/e1/t1 cards in these servers. Our only connection to the PSTN is through Patton/Inalp SmartNode gateways, connected to Asterisk via SIP.

Since these crashes appear on three servers with different hardware, and the main installation is always the same, I would think there are "only" two possible sources of the failure:

Operating System Fedora Core 2 Kernel 2.6.8-1.521
Asterisk CVS-HEAD-01/08/05

Has anybody out there had similar problems, and if so, how did you fix them? Is there any working solution where Asterisk monitors itself, perhaps using a script that drops a test call in /var/spool/asterisk/outgoing and, if this call isn't processed successfully, stops all running asterisk and moh processes and restarts Asterisk?

Any help would be appreciated, since "I can't get no sleep with these timebombs out there" ;-)

Guido Hecken
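The watchdog idea in the last question could be sketched roughly like this in Python. This is only a sketch under several assumptions, not a tested solution: it assumes Asterisk removes a call file from the outgoing directory once it has handled it, so a file that is still there after the timeout is taken as a sign of a hang. The staging directory, the `Local/s@watchdog-test` channel (a dialplan context you would have to create yourself), `mpg123` as the moh process name, and the `safe_asterisk` path are all placeholders for illustration.

```python
import os
import subprocess
import time

SPOOL_DIR = "/var/spool/asterisk/outgoing"  # directory named in the post
STAGING_DIR = "/var/spool/asterisk/tmp"     # assumption: same filesystem as SPOOL_DIR


def build_call_file(channel, application, data):
    """Return the text of a minimal call file for a single test call."""
    lines = [
        f"Channel: {channel}",
        "MaxRetries: 0",
        "WaitTime: 10",
        f"Application: {application}",
        f"Data: {data}",
    ]
    return "\n".join(lines) + "\n"


def drop_call_file(content, name, spool_dir=SPOOL_DIR, staging_dir=STAGING_DIR):
    """Write the file in a staging directory, then move it into the spool.

    Moving a finished file avoids Asterisk picking up a half-written one.
    """
    staged = os.path.join(staging_dir, name)
    with open(staged, "w") as fh:
        fh.write(content)
    target = os.path.join(spool_dir, name)
    os.replace(staged, target)
    return target


def wait_until_consumed(path, timeout=30.0, poll=1.0):
    """Return True if the file disappears within `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not os.path.exists(path):
            return True
        time.sleep(poll)
    return not os.path.exists(path)


def restart_asterisk():
    """Kill asterisk and moh processes, then start Asterisk again.

    Assumptions: moh runs as mpg123 and safe_asterisk lives in /usr/sbin.
    """
    for proc in ("asterisk", "mpg123"):
        subprocess.call(["killall", "-9", proc])
    subprocess.call(["/usr/sbin/safe_asterisk"])


if __name__ == "__main__":
    # Hypothetical channel/context; adjust to your own dialplan.
    content = build_call_file("Local/s@watchdog-test", "Wait", "1")
    path = drop_call_file(content, "watchdog.call")
    if not wait_until_consumed(path, timeout=60.0):
        os.remove(path)
        restart_asterisk()
```

Run from cron every few minutes, this only restarts Asterisk when the test call is left unprocessed. It will not explain why the hang happened, so it is a stopgap next to the debug logging, not a replacement for it.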
> we have 3 Asterisk boxes running under Fedora Core 2. Every box
> hangs/crashes from time to time.
> These installations are image based, means we made an image from our
> testserver with an image tool, which is able to manage ext3 partitions and
> deployed it to different server hardware.
> These servers run very stable and I could not find any failures in the logs.
> As these crashes appeared the first time, I thought rebooting these machines
> by cronjob every night at 04:00 would solve the problems. It seemed to work
> quite well for a couple of weeks. Today I saw our own asterisk production
> server crash :-( .
> These crashes are always the same, asterisk stops responding, the cli does
> not give any reaction on command input, you have to manually kill -9 all
> asterisk and moh processes.
> Asterisk logs are empty.
>
> We don't use any isdn/fxs/fxo/e1/t1 cards in these servers.
> Our connections to PSTN is only made by Patton/Inalp SmartNode Gateways,
> connected to asterisk via sip protocol.
> Since these crashes appear on three servers with different hardware, and
> the main installation is always the same, I would think there are "only" two
> possible sources to find the failure:
>
> Operating System Fedora Core 2 Kernel 2.6.8-1.521
> Asterisk CVS-HEAD-01/08/05
>
> Has anybody out there similar problems, and if yes, how did he fix them?
> Is there any working solution, having asterisk control itself perhaps by
> using a script that drops a test call in /var/spool/asterisk/outgoing and if
> this call wasn't processed successfully the script stops all running asterisk
> and moh processes and restarts asterisk?

Far too many variables for anyone to even guess at the root cause. The problem could be related to slight differences in OS libraries between systems, coding problems within Asterisk, etc. There were some issues reported with CVS head in January relating to hangs, etc.

You might consider changing /etc/asterisk/logger.conf and adding debug to the list. Then, after a failure, at least look at the /var/log/asterisk/debug messages.

For additional info, I'd suggest compiling the code on one of these machines to see if it complains about missing/inappropriate items.
Rich,

> Far too many variables for anyone to even guess at the root cause. Problem
> could be related to slight differences in o/s libraries between systems,
> coding problems within asterisk, etc.

You're right, it could be anything....

> There were some issues reported with cvs head in January relative to hangs,
> etc.

Are they reported in the bug tracker, or on the mailing list?

> Might consider changing /etc/asterisk/logger.conf and add debug to the list.
> Then after a failure, at least look at /var/log/asterisk/debug messages.

Yes, this was the first thing I did after the crash showed up. I had simply forgotten to enable it, since this production server ran for a long time without problems. But now, following Murphy's law, the next crash will never happen ;-)

> For additional info, I'd suggest compiling the code on one of these machines
> to see if it complains about missing/inappropriate items.

After these machines were set up, we compiled new code on every machine, since we started with an older version of Asterisk in November 2004. Compiling Asterisk did not show me any relevant (?) errors. But I remember there were some statements (warnings) in the console output of the make process that I didn't understand. Is this output also logged to a logfile somewhere, in addition to the console? If so, one could examine it and hopefully get some hints...

Thanks for your help

Guido Hecken
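On the question of the build output: as far as I know, make itself only writes to the console, so the warnings are gone unless the output was redirected at build time. One way to keep them on the next build is a small wrapper like the sketch below, which echoes the output and saves a copy to a file. The function name `run_and_log` and the log file name are made up for illustration, and matching on the word "warning" is only a rough filter.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run `cmd`, echo its output to the console, and save a copy to `log_path`.

    Returns the exit code and the list of output lines containing "warning".
    """
    warnings = []
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # compiler warnings normally go to stderr
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
            if "warning" in line.lower():
                warnings.append(line.rstrip("\n"))
        returncode = proc.wait()
    return returncode, warnings


if __name__ == "__main__":
    # Run from the Asterisk source directory; the log name is arbitrary.
    code, found = run_and_log(["make"], "asterisk-build.log")
    print(f"make exited with {code}, {len(found)} warning line(s) logged")
```

The saved file can then be posted to the list or compared between the three machines.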
> > (There are other debug modes, but not sure I'd use those to catch a
> > production problem. The ones I know about are primarily intended for
> > development debugging. Other folks might contribute hints here.)
>
> This reeks of a deadlock,
> http://voip-info.org/wiki-Asterisk+deadlock
> see this
> HowTo Debug a DeadLock in Asterisk
> i wrote up eons ago on the wiki
> http://voip-info.org/wiki-Asterisk+debugging

Thanks for the information, I will "try" to follow the instructions on debugging Asterisk. Since I'm not a programmer, I think I will have some fun with it.... ;-)

Guido