Hi,

We have a storage server (HP DL360G5 + MSA20 (12 disks in RAID 6) on a
SmartArray6400).
10 directories are exported through nfs to 10 clients
(rsize=32768,wsize=32768,soft,intr,nosuid,proto=udp,vers=3).
The server is apparently not doing much, but... we have very high waiting
IOs.

dstat shows very little activity, but high 'wai'...

# dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   0  88  12   0   0| 413k   98k|   0     0 |   0     0 | 188   132
  0   1  46  53   0   0| 716k   48k|  19k  420k|   0     0 |1345   476
  0   1  49  50   0   1| 492k   32k|  12k  181k|   0     0 |1269   482
  0   1  63  37   0   0| 316k  159k|  58k  278k|   0     0 |1789  1562
  0   0  74  26   0   0|  84k  512k|1937B 6680B|   0     0 |1200   106
  0   1  44  55   0   1| 612k   80k|  14k  221k|   0     0 |1378   538
  1   1  52  47   0   0| 628k    0 |  17k  318k|   0     0 |1327   520
  0   1  50  49   0   0| 484k   60k|  14k  178k|   0     0 |1303   494
  0   0  87  13   0   0| 124k    0 |7745B  116k|   0     0 |1083   139
  0   1  59  41   0   0| 316k   60k|4828B   67k|   0     0 |1179   346

top shows that one nfsd is usually in state 'D' (waiting).

# top -i (sorted by cpu usage)
top - 18:11:28 up 207 days, 7:13, 2 users, load average: 0.99, 1.07, 1.00
Tasks: 124 total, 1 running, 123 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.2%sy, 0.0%ni, 54.3%id, 45.3%wa, 0.2%hi, 0.0%si, 0.0%st
Mem:   3089252k total,  3068112k used,    21140k free,   928468k buffers
Swap:  2008116k total,      164k used,  2007952k free,   293716k cached

  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
16571 root  15  0 12708 1076  788 R    1  0.0  0:00.02 top
 2580 root  15  0     0    0    0 D    0  0.0  2:36.70 nfsd

# cat /proc/net/rpc/nfsd
rc 8872 34768207 38630969
fh 142 0 0 0 0
io 2432226534 884662242
th 32 394 4851.311 2437.416 370.949 238.432 542.241 4.942 2.239 1.000 0.427 0.541
ra 64 3876274 5025 3724 2551 2030 2036 1506 1607 1219 1154 1136249
net 73410453 73261524 0 0
rpc 73408119 0 0 0 0
proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc3 22 33 9503937 1315066 11670859 7139862 0 5033349 28129122 3729031 0 0 0 487614 0 1116215 0 0 2054329 21225 66 0 2351744
proc4 2 0 0
proc4ops 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Do you think nfs is the problem here?
If so, is there something wrong with our config?
Is it too much to have 10 dirs x 10 clients, even if there is almost no
traffic?

Thx,
JD
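PS: in case it matters, each client mounts its directory roughly like
this (server name and paths made up, the options are as above):

# mount -t nfs -o rsize=32768,wsize=32768,soft,intr,nosuid,proto=udp,vers=3 server:/export/dirN /mnt/dirN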
John Doe wrote:
> Hi,
>
> We have a storage server (HP DL360G5 + MSA20 (12 disks in RAID 6) on a
> SmartArray6400).
> 10 directories are exported through nfs to 10 clients
> (rsize=32768,wsize=32768,soft,intr,nosuid,proto=udp,vers=3).
> The server is apparently not doing much, but... we have very high waiting
> IOs.

How about running iostat -x? Sounds like the system is doing a lot more
than you think it is.
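Something along these lines, and watch it over several intervals
(iostat -x comes with the sysstat package if it's not already installed;
the 5 is just the sample interval in seconds):

# iostat -x 5

nate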
> Do you think nfs is the problem here?
> If so, is there something wrong with our config?
> Is it too much to have 10 dirs x 10 clients, even if there is almost no
> traffic?

This has very little to do with the number of clients or directories,
except that in general terms more users = more work. If the workload is
highly random and not easily cached, you could be seeing low throughput
but also IOPS starvation.

Does 12 disks in RAID 6 mean 10 data disks? That's not a lot at all,
especially if it is an array that doesn't do much abstraction; any given
block will only exist on a single spindle, creating potential for hot
spots if we are talking large disks.

Can you do any analysis of the array, specifically looking for hot disks,
busy disks, high latencies, etc.? Do you know what your baseline should
be for this array versus your 10-client workload?
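If you have HP's hpacucli utility on that box (I'm assuming it supports
the Smart Array 6400; check HP's docs), it will at least show you the
state of the controller, the logical drive, and each physical disk:

# hpacucli ctrl all show config detail

It won't give you per-spindle latencies though; the controller hides the
individual disks from the OS, so for latency you're stuck watching the
logical device with iostat or similar.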
John Doe wrote:
> Hi,
>
> We have a storage server (HP DL360G5 + MSA20 (12 disks in RAID 6) on a SmartArray6400).
> 10 directories are exported through nfs to 10 clients (rsize=32768,wsize=32768,soft,intr,nosuid,proto=udp,vers=3).
> The server is apparently not doing much, but... we have very high waiting IOs.
>
> dstat shows very little activity, but high 'wai'...

As others have said, run iostat -x N for an N like 5 (5 seconds). Ignore
the first sample, as it's the average since boot; instead, look at the
ongoing 5-second interval samples. Look for high await, svctm, and %util,
as well as the rrqm/s and wrqm/s numbers, rather than sec/s, since
sequential access is likely the least of your problems. RAID 6 does
pretty poorly on heavy random write workloads.

Also, IBM has a neat freeware system analysis tool called NMON
(originally for AIX, ported to Linux):

http://www.ibm.com/developerworks/aix/library/au-analyze_aix/

It works sorta like a souped-up 'top' but has per-filesystem IO stats and
such. It can also accumulate stats over a long period into a CSV file,
and they have an Excel spreadsheet that loads said CSV file and cranks
out a lot of fairly useful graphs. Dunno if the Excel spreadsheet works
in OOcalc or not.
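From memory, capture mode is something like this (double-check with
nmon -h: -f writes the spreadsheet-format file, -s is seconds between
snapshots, -c is the number of snapshots):

# nmon -f -s 30 -c 120

That should leave a hostname_date.nmon file behind for the spreadsheet to
load. Interactively, just run nmon and press 'd' for the disk stats view.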
John Doe wrote:
> Hi,
>
> We have a storage server (HP DL360G5 + MSA20 (12 disks in RAID 6) on a SmartArray6400).
> 10 directories are exported through nfs to 10 clients (rsize=32768,wsize=32768,soft,intr,nosuid,proto=udp,vers=3).
> The server is apparently not doing much, but... we have very high waiting IOs.

Probably not connected, but personally I would use 'hard' and 'proto=tcp'
instead of 'soft' and 'proto=udp' on the clients.
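i.e. something like this in each client's /etc/fstab (server and mount
point names made up):

server:/export/dir1  /mnt/dir1  nfs  rsize=32768,wsize=32768,hard,intr,nosuid,proto=tcp,vers=3  0 0

With 'soft', a server that stops responding eventually causes I/O errors
to be returned to applications, which can mean silent data corruption;
'hard' just blocks until the server comes back.

James Pearson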