Hi,

Is there a way to pin down the cause of high load on a server? I am trying to understand why the load was high during a specific time period. I use the top command, but it does not keep any history.

Nagios reports:

"[04-25-2012 10:11:00] SERVICE ALERT: dev;LOAD;WARNING;HARD;3;WARNING - load average: 6.88, 6.36, 5.71"

Please help me understand.

Regards,
Kaushal
On 25 April 2012 07:09, Kaushal Shriyan <kaushalshriyan at gmail.com> wrote:
> Is there a way to nail down the issue of high load on a server basically
> trying to understand the reason behind high load at a specific time period.
> I use top command but it does not have history.

Among many other solutions, my favourite is nmon with a reasonably aggressive data collection, which can be turned into a pretty spreadsheet.
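To make "reasonably aggressive" concrete, here is a rough sketch of how the nmon capture options fit together. The 30-second interval, the 24-hour window and the /var/log/nmon directory are assumptions chosen for illustration, not values from this thread. The snippet only works out the snapshot count and prints the command, so it is safe to try before putting the real thing in cron.

```shell
#!/bin/sh
# Assumed values: tune these for your own box.
INTERVAL=30              # seconds between snapshots
HOURS=24                 # how long one capture file should cover
OUTDIR=/var/log/nmon     # hypothetical directory for the capture files

# nmon takes the number of snapshots, not a duration, so derive it.
COUNT=$(( HOURS * 3600 / INTERVAL ))

# -f  write spreadsheet-style output to a file instead of the screen
# -t  include the top processes in each snapshot
# -s  seconds between snapshots, -c number of snapshots
# -m  directory to change into before writing the file
NMON_CMD="nmon -f -t -s $INTERVAL -c $COUNT -m $OUTDIR"
echo "$NMON_CMD"
```

Started once a day from cron, the printed command leaves one file per day that can be opened at the timestamp of the Nagios alert.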
From: Kaushal Shriyan <kaushalshriyan at gmail.com>
> Is there a way to nail down the issue of high load on a server basically
> trying to understand the reason behind high load at a specific time period.
> I use top command but it does not have history.

Maybe adapt something like this to your needs:

  while :; do
    # integer part of the 1-minute load average
    LOAD=$(cut -d. -f1 /proc/loadavg)
    if [ "$LOAD" -gt 3 ]; then
      # save a process snapshot named after the current epoch time
      ps auxfw > /tmp/ps.$(date +%s)
      sleep 60
    fi
    sleep 10
  done

JD
On 25/04/2012 07:09, Kaushal Shriyan wrote:
> Hi
>
> Is there a way to nail down the issue of high load on a server basically
> trying to understand the reason behind high load at a specific time period.
> I use top command but it does not have history.
>
> Nagios reports saying "*[04-25-2012 10:11:00] SERVICE ALERT:
> dev;LOAD;WARNING;HARD;3;WARNING - load average: 6.88, 6.36, 5.71"*

When this happens and you get an alert, log in to the box and take a look. vmstat, top, iostat and iotop are all good commands for seeing what's going on.

Of course, if you can't or don't want to log in, you can always get Nagios to email you what's currently going on by writing your own custom plugin and running it via NRPE.
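As a rough idea of what such a custom plugin could look like, here is a minimal sketch. The function name check_load, its arguments and the choice of ps columns are all made up for illustration; only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the usual Nagios plugin convention. It assumes a Linux box with /proc/loadavg and the procps version of ps.

```shell
#!/bin/sh
# Hypothetical NRPE plugin sketch: report the load and, when it is high,
# list the processes using the most CPU at that moment.

# check_load WARN CRIT [LOADAVG_FILE]
# WARN and CRIT are whole numbers compared against the 1-minute load.
check_load() {
    warn=$1
    crit=$2
    file=${3:-/proc/loadavg}

    # The first three fields are the 1, 5 and 15 minute load averages.
    read -r load1 load5 load15 rest < "$file" || return 3
    whole=${load1%%.*}    # integer part only, sh cannot compare decimals

    if [ "$whole" -ge "$crit" ]; then
        state=CRITICAL; code=2
    elif [ "$whole" -ge "$warn" ]; then
        state=WARNING; code=1
    else
        state=OK; code=0
    fi

    # First line is the status line that Nagios shows in the alert.
    echo "LOAD $state - load average: $load1, $load5, $load15"
    if [ "$code" -ne 0 ]; then
        # Extra output: header plus the ten busiest processes by CPU.
        ps -eo pcpu,pmem,stat,pid,user,args --sort=-pcpu | head -n 11
    fi
    return "$code"
}
```

A real plugin would end with something like `check_load 4 8; exit $?` and be registered as a command in the NRPE configuration on the monitored host. Whether the extra process lines make it into the notification email depends on how the notification command is set up on the Nagios side.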