I always keep a terminal window open with "top" running for my asterisk servers. Since we've had Asterisk in production, for about 9 months, I've noticed with every platform and every card we've tried that the load average will be going along at about 0.1 to 0.5 with about 30 channels (15 SIP -> Zap conversations) going, and then at seemingly random times the load average will jump to over 2.0.

All the while the processor idle never goes below 50%.

Does anyone know what the asterisk process is doing that causes these load jumps? (I have determined that initiating new calls or hanging up calls is not a factor in the timing of these jumps.)

Does anyone not experience these load jumps?

This occurs on all hardware platforms that I've tested: P3 non-SMP, P4 non-SMP, P4 SMP, AMD non-SMP and AMD SMP, using all available Digium T1 cards: wct410p, 400p and 100p.

The only common element is RedHat 9.0 as the OS and the fact that there is no other large service running on the machines (no web, no DB, no X).

I have tested other resource-intensive applications (like MySQL in a constant loop of ordered selects of 1 million records) and not seen any other instances of load spikes on these systems.

I have loaded up the channels on a test server to see what will happen if the load spikes while it is already at 2.0. With 100 channels (50 SIP -> Zap conversations) it ran for 4 hours with the load averaging around 2.0 (on a non-SMP P4), and then I got a spike, the load went up to 8.0 and the server crashed.

I would like to find out why asterisk is doing this just to satisfy my own curiosity, but if anything can be done about it I could get a lot more out of the servers I have before having to buy more whenever I need to increase capacity.

MATT---
On Mon, 2004-02-23 at 09:19, mattf wrote:
> I always keep a terminal window open with "top" running for my asterisk
> servers. Since we've had Asterisk in production, for about 9 months, I've
> noticed with every platform and every card we've tried that the load average
> will be going along at about 0.1 to 0.5 with about 30 channels(15 SIP ->
> Zap conversations) going and then at seemingly random times the load average
> will jump to over 2.0.
>
> All the while the processor idle never goes below 50%.
>
> Does anyone know what the asterisk process is doing that causes these load
> jumps?
> (I have determined that initiating new calls or hanging up calls is not a
> factor in the timing of these jumps)

First a word on load averages as opposed to percent idle of CPU. Load average is the average number of processes awaiting CPU service. A process could be idle if it has no real work to complete and has allowed the CPU to skip on to another process. Percent idle is easier to understand, as it is how much of the CPU's time is spent waiting for a process to need servicing.

The problem of using top to monitor load is much like quantum physics: you change the value when you observe it. So part of your spike may be in the timing of the observation.

There are many operations that could affect the load average. Any new threads loading would be in a high busy state until the loading period is over and the process starts idle looping. Load mozilla up sometime while watching the load on your system shoot up. Your CPU usage may still stay smallish, since it is mostly exercising the disk subsystem and the CPU is waiting most of that time.

If you are seeing a load average climb, you should identify the processes starting or running at that time. If it is falling, the processes have either completed the busy cycle or have gone away.

It is still likely though that you are seeing some errant behavior in RH9 caused by the new thread library.
There may be a broken select function or something similar that is causing your trouble. Maybe you should try an older RH, or a different distribution, and see if this happens as well.

> I have loaded up the channels on a test server to see what will happen is
> the load spikes while it is already at 2.0 and with 100 channels(50 SIP->
> Zap conversations) it ran for 4 hours with the load averaging around 2.0(on
> non-SMP P4) and then I got a spike and the load went upto 8.0 and the server
> crashed.

Did the whole system crash or did just asterisk crash? If it was just asterisk, did you get a core dump and did you do a backtrace on it?

--
Steven Critchfield <critch@basesys.com>
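One way to follow the advice above about identifying what is running when the load climbs is to log the busy tasks at the moment of a spike. The sketch below is a minimal, hypothetical Python helper, not anything that ships with Asterisk or top. It assumes a Linux /proc that exposes /proc/loadavg and per-thread /proc/<pid>/task/<tid>/stat files (older 2.4 kernels may not have the task directory), and it relies on the Linux convention that the load average counts tasks that are runnable (state R) or in uninterruptible sleep (state D), which is one reason load can rise while the CPU stays mostly idle. The function names, the 2.0 threshold and the 10-second interval are illustrative choices only.

```python
import glob
import time


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_stat(text):
    """Return (task id, command, state) from the text of a /proc stat file."""
    # The command sits in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    task_id = int(text[:open_paren])
    command = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return task_id, command, state


def busy_tasks():
    """List tasks that are runnable (R) or in uninterruptible sleep (D)."""
    found = []
    # Assumption: per-thread stat files exist under /proc/<pid>/task/<tid>/.
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as handle:
                task_id, command, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # the task exited while we were reading it
        if state in ("R", "D"):
            found.append((task_id, command, state))
    return found


def watch(threshold=2.0, interval=10.0):
    """Print the busy tasks whenever the 1-minute load exceeds threshold."""
    while True:
        with open("/proc/loadavg") as handle:
            one, _, _ = parse_loadavg(handle.read())
        if one > threshold:
            print(time.strftime("%H:%M:%S"), one, busy_tasks())
        time.sleep(interval)
```

Calling watch() from a spare terminal would leave a timestamped record of which tasks were in state R or D during each spike. The script itself will usually appear in the list as a runnable task, which is the observer effect mentioned above.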
Thanks for the response. I plan on trying Slackware on my backup/test asterisk server when I have a new backup server ready in a few weeks. I've noticed in some database machine testing that Slackware starts up in about half the time of RedHat and doesn't have all of that RedHat junk either. I'll post my results running Slackware after I've had time to test it.

When I said crashed I meant that the whole operating system crashed, so no backtrace possible.

Thanks,

MATT---

-----Original Message-----
From: Steven Critchfield [mailto:critch@basesys.com]
Sent: Monday, February 23, 2004 12:27 PM
To: asterisk-users@lists.digium.com
Subject: Re: [Asterisk-Users] Processor load spikes

_______________________________________________
Asterisk-Users mailing list
Asterisk-Users@lists.digium.com
http://lists.digium.com/mailman/listinfo/asterisk-users
To UNSUBSCRIBE or update options visit:
http://lists.digium.com/mailman/listinfo/asterisk-users
I captured a load spike graphically with ttyload in case anyone wants to see what it looks like. After hanging around at 0.50 load, it spiked up to 2.52 in less than 10 seconds. The only active processes before and during the spike were asterisk-related.

Below is a 10 second interval, 12 minute snapshot of the load on my asterisk machine:

[ttyload v0.4.4 graph for host VICIast2 at 10:40:01, load averages 0.09, 0.24, 0.29. Y-axis: load from 0.00 to 3.00; x-axis: time, labeled 10:32 through 10:39. The plotted load peaks at 2.52.]

MATT---

-----Original Message-----
From: Patrick [mailto:idefix@puzzled.xs4all.nl]
Sent: Monday, February 23, 2004 1:19 PM
To: asterisk-users@lists.digium.com
Subject: RE: [Asterisk-Users] Processor load spikes

On Mon, 2004-02-23 at 18:42, mattf wrote:
> Thanks for the response. I plan on trying Slackware on my backup/test
> asterisk server when I have a new backup server ready in a few weeks. I've
> noticed in some database machine testing that Slackware starts up in about
> half the time of RedHat and doesn't have all of that Redhat junk either.
> I'll post my results running Slackware after I've had time to test it.
>
> When I said crashed I meant that the whole operating system crashed, so no
> backtrace possible.
>
> Thanks,
>
> MATT---

Hi Matt,

My RH9 box has never crashed, although on some others running RH9 I've seen load spikes also. The only similar situation I vaguely remember from long ago was either related to using a T400P/E400P card on a motherboard with the incorrect pci slot voltage or to a power supply that couldn't cope with the extra load. Don't recall exactly anymore so could be wrong, but maybe worth keeping in mind.
I always do the following on a RH9 box:

* export LD_ASSUME_KERNEL=2.4.1 before you start asterisk. Alternatively you can build a plain vanilla 2.4.2x kernel from kernel.org and use that one
* turn off all unnecessary cron jobs. updatedb can have quite a field day with eating up I/O and keeping disks pretty busy, and iirc you may want to turn off the fam service also
* turn off all unnecessary services and remove all unnecessary modules from /etc/modules.conf

If you find the cause, please let us know.

Good luck,
Patrick
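For anyone who starts asterisk from a wrapper script rather than an interactive shell, the first item in the list above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names are made up, and the /usr/sbin/asterisk path is a guess that should be adjusted to wherever the binary actually lives. The sketch copies the current environment, adds LD_ASSUME_KERNEL=2.4.1 (the value given above), and passes that environment to the child process, so the setting applies to asterisk without changing it for anything else on the box.

```python
import os
import subprocess


def legacy_thread_env(base=None, kernel="2.4.1"):
    """Return a copy of the environment with LD_ASSUME_KERNEL set."""
    # Copy so that neither os.environ nor the caller's dict is modified.
    env = dict(os.environ if base is None else base)
    env["LD_ASSUME_KERNEL"] = kernel
    return env


def start_asterisk(binary="/usr/sbin/asterisk"):
    """Start asterisk with the adjusted environment (path is an assumption)."""
    return subprocess.Popen([binary], env=legacy_thread_env())
```

Keeping the variable out of the login environment makes it easier to compare runs with and without it when testing whether the thread library is behind the spikes.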