I have a dual Xeon 5130 server (four CPUs total) running CentOS 5.7. Approximately every 17 hours, the load on this server slowly creeps up until it hits 20, then slowly goes back down. The most recent example started around 2:00am this morning. Outside of these episodes, the load never exceeds 2.0 (and in fact spends the overwhelming majority of its time at 1.0).

So this morning, a few data points:

- 2:06 to 2:07 load increased from 1.0 to 2.0
- At 2:09 it hit 4.0
- At 2:10 it hit 5.34
- At 2:16 it hit 10.02
- At 2:17 it hit 11.0
- At 2:24 it hit 17.0
- At 2:27 it hit 19.0 and stayed there +/- 1.0 until
- At 2:48 it was 18.96 and looks like it started to go down (very slowly)
- At 2:57 it was 17.84
- At 3:05 it was 16.76
- At 3:16 it was 15.03
- At 3:27 it was 9.3
- At 3:39 it was 4.08
- At 3:44 it was 1.92, and it stayed under 2.0 from there on

This is the 1-minute load average, by the way (i.e. the first number in /proc/loadavg, as reported by top, uptime, etc).

Running top while this occurs shows very little CPU usage. The standard cause of high load with low CPU usage seems to be processes stuck in the "D" state, i.e. waiting on I/O. But we're not seeing that. In fact, the system runs sar, and I've collected copious amounts of data, but I don't see anything that jumps out as correlating with these events: no surges in disk I/O, disk read/write bytes, network traffic, etc. The system *never* uses any swap.

I also used dstat to collect everything it can for 24 hours (so it captured one of these events). I used 1-second samples and loaded the data into a huge spreadsheet, but again, didn't see any obvious "trigger" or anything interesting going on while the load spiked.

All the programs running on the system seem to work fine while this is happening... but it triggers all kinds of monitoring alerts, which is annoying. We've been tracking the events too, and as I said above, it seems to happen every 17 hours. I checked all our cron jobs, and nothing jumped out as an obvious culprit.

Anyone seen anything like this? Any thoughts or ideas?

Thanks,
Matt
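[A quick way to double-check the "waiting on I/O" theory during a spike is to list any tasks in uninterruptible sleep and the kernel wait channel they are blocked on. A minimal sketch, not from the original post; run it while the load is climbing:]

    # Show the header plus any tasks in state D, with the kernel function
    # (wchan) each one is sleeping in.
    ps -eo state,pid,user,wchan:32,args | awk 'NR == 1 || $1 ~ /D/'
    # The raw figures the monitoring alerts are keyed on.
    cat /proc/loadavg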
Miranda Hawarden-Ogata
2014-Mar-27 22:44 UTC
[CentOS] High load average, low CPU utilization
On 2014/03/27 12:20, Matt Garman wrote:
> Anyone seen anything like this? Any thoughts or ideas?
>
> Thanks,
> Matt

Something of a shot in the dark, but when we had a server with a high load average where nothing obvious was causing it, it turned out to be multiple df commands hanging on a stale NFS mount. This command helped us identify it:

top -b -n 1 | awk '{if (NR <= 7) print; else if ($8 == "D") {print; count++} } END {print "Total status D: " count}'

Hope that helps,
Miranda
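[If a stale NFS mount is a suspect, one rough sketch is to probe each NFS mount point with a bounded stat, since a stale mount will simply hang the probe. The 5-second limit is arbitrary, and the timeout utility is not in CentOS 5's stock coreutils, so it may need to be installed or replaced with a backgrounded probe:]

    # Probe every NFS mount; one that never answers is likely stale and is
    # what leaves df/stat processes stuck in state D.
    awk '$3 ~ /^nfs/ {print $2}' /proc/mounts | while read -r mnt; do
        timeout 5 stat -t "$mnt" > /dev/null 2>&1 || echo "possible stale NFS mount: $mnt"
    done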
On Thu, 27 Mar 2014 17:20:22 -0500
Matt Garman <matthew.garman at gmail.com> wrote:

> Anyone seen anything like this? Any thoughts or ideas?

Post some data. Is this box public facing? Are you getting sprayed down by packets? Is there an array? Soft or hard? Does someone have screens lying around? Write a trap to catch a process list when the load spikes? Look at the crontab(s)? User accounts? Malicious shells? Any guest containers around? The possibilities are sort of endless here.

--
People often find it easier to be a result of the past than a cause of the future.
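[A minimal sketch of the "trap to catch a process list" idea above, assuming a bash watcher left running under screen or nohup; the threshold, interval, and log path are placeholders:]

    #!/bin/bash
    # Dump a process snapshot whenever the 1-minute load average exceeds a
    # threshold, so the next spike records what was actually running.
    THRESHOLD=5
    LOG=/var/tmp/load-spike.log
    while true; do
        load1=$(awk '{print $1}' /proc/loadavg)
        # awk exits 0 (true) when the load is above the threshold
        if awk -v l="$load1" -v t="$THRESHOLD" 'BEGIN {exit !(l > t)}'; then
            {
                echo "=== $(date) load=$load1 ==="
                # Look for clusters of D-state entries in the snapshot.
                ps -eo state,pid,ppid,user,wchan:30,args
            } >> "$LOG"
        fi
        sleep 10
    done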
On Thu, 27 Mar 2014 17:20:22 -0500
Matt Garman <matthew.garman at gmail.com> wrote:

> Any thoughts or ideas?

Start digging into your array. Perhaps you're starting to lose a drive and it's running daily integrity checks or something, i.e. a drive dropping in and out of the array or the like. /var/log/messages might have some clues (read it newest-first with tac rather than cat):

tac /var/log/messages | less

Don't forget about the cron jobs in /etc/cron*.

--
You know you're a little fat if you have stretch marks on your car.
                -- Cyrus, Chicago Reader 1/22/82
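[A rough sketch of the checks suggested above, assuming Linux software RAID (md) and the stock CentOS cron layout; a hardware controller would need its vendor's CLI instead:]

    # Any resync/check/recovery in progress shows up here (md software RAID only).
    cat /proc/mdstat
    # Scan the log newest-first for disk, controller, or RAID noise.
    tac /var/log/messages | grep -i -E 'md[0-9]|raid|ata|i/o error|sense' | head -50
    # Review every cron source, not just the user crontabs.
    cat /etc/crontab
    ls -l /etc/cron.d /etc/cron.hourly /etc/cron.daily /etc/cron.weekly /etc/cron.monthly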