Hi,
We have a cluster of PowerEdge 1750s and a couple of storage nodes
(PowerVault 220) running RH9 (2.4.20-31.9smp). We use rsync for daily
backups. The daily rsync writes only around 10MB and reads ~300MB.
The rsync finishes quickly enough, but for around 4-5 hours after it
finishes, the load on the storage node becomes very high (~10, versus
its usual ~0.5). During this period the compute nodes spit out the
following messages, and jobs generally run slower:
nfs server not responding
nfs server OK
The gap between "nfs server not responding" and "nfs server OK"
varies and is up to 10-20 secs.
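To put numbers on those gaps over the 4-5 hour window, one rough approach is to pair each "not responding" message with the next "OK" message from a compute node's kernel log. A minimal Python sketch, assuming the (timestamp, message) pairs have already been pulled out of the log and sorted by time; the function name and the sample timestamps are hypothetical, not real measurements:

```python
from typing import List, Optional, Tuple


def outage_durations(events: List[Tuple[float, str]]) -> List[float]:
    """Return the seconds between each 'not responding' and the next 'OK'.

    `events` is a hypothetical, time-sorted list of (timestamp_seconds, message)
    tuples already extracted from a compute node's kernel log.
    """
    durations: List[float] = []
    down_since: Optional[float] = None
    for ts, msg in events:
        if "not responding" in msg:
            # Keep the first timestamp if several warnings arrive in one outage.
            if down_since is None:
                down_since = ts
        elif msg.rstrip().endswith("OK") and down_since is not None:
            # Recovery message: close the current outage.
            durations.append(ts - down_since)
            down_since = None
    return durations


if __name__ == "__main__":
    # Made-up timestamps, for illustration only.
    sample = [
        (100.0, "nfs server not responding"),
        (112.5, "nfs server OK"),
        (300.0, "nfs server not responding"),
        (318.0, "nfs server OK"),
    ]
    print(outage_durations(sample))  # [12.5, 18.0]
```

Plotting these durations against the time since rsync finished could show whether the outages cluster at the start of the window or are spread across it.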
I have verified that all the disks are OK, the network card is OK (full
duplex etc.) and the switch is properly configured. Retransmissions are
around 4% (I am not sure whether this is above normal; I read somewhere
that <5% is OK), ping latency is normal, and tcpdump shows nothing
unusual.
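Because the client RPC counters are cumulative, a 4% figure taken from the totals mixes quiet periods with the post-rsync window. A minimal sketch of computing the rate between two snapshots instead, assuming the "calls" and "retrans" counters are recorded (for example from `nfsstat -rc`) before and after the rsync; the class, function names and counter values are illustrative, and the 5% threshold is only the rule of thumb mentioned above:

```python
from typing import NamedTuple


class RpcCounters(NamedTuple):
    """Snapshot of the client 'calls' and 'retrans' counters (e.g. from `nfsstat -rc`)."""
    calls: int
    retrans: int


def retrans_percent(before: RpcCounters, after: RpcCounters) -> float:
    """Retransmissions as a percentage of the RPC calls made between two snapshots."""
    calls = after.calls - before.calls
    retrans = after.retrans - before.retrans
    if calls <= 0:
        raise ValueError("no RPC calls between the two snapshots")
    return 100.0 * retrans / calls


# Rule of thumb quoted in the post; not a hard limit.
THRESHOLD_PERCENT = 5.0

if __name__ == "__main__":
    # Made-up counter values, for illustration only.
    start = RpcCounters(calls=1_000_000, retrans=20_000)
    end = RpcCounters(calls=1_200_000, retrans=36_000)
    pct = retrans_percent(start, end)
    # prints "8.0% retransmitted, above 5.0%: True"
    print(f"{pct:.1f}% retransmitted, above {THRESHOLD_PERCENT}%: {pct > THRESHOLD_PERCENT}")
```

With these invented numbers the cumulative total is only 3%, while the rate inside the window is 8%, which is the kind of difference an all-time average can hide.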
The NFS rsize/wsize is 8K, and I have increased [rw]mem_default and
[rw]mem_max in /proc/sys/net/core to tune performance.
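Values written to /proc/sys/net/core are lost on reboot unless they are also set in /etc/sysctl.conf, so it may be worth checking what is actually in effect on each node. A small sketch that reads the four settings; the function name is hypothetical, and the base path is a parameter only so the function can be exercised outside a Linux host:

```python
from pathlib import Path
from typing import Dict

# The four settings mentioned in the post.
KEYS = ("rmem_default", "rmem_max", "wmem_default", "wmem_max")


def read_core_buffers(base: str = "/proc/sys/net/core") -> Dict[str, int]:
    """Return the current socket buffer sizes (bytes) found under `base`."""
    root = Path(base)
    # Each file holds a single integer followed by a newline.
    return {key: int((root / key).read_text().split()[0]) for key in KEYS}


if __name__ == "__main__":
    core = Path("/proc/sys/net/core")
    # Only meaningful where all four files are visible (e.g. not inside some containers).
    if all((core / key).is_file() for key in KEYS):
        for key, value in read_core_buffers().items():
            print(f"{key:>12} = {value} bytes")
```

Running it on both the storage node and a compute node would show whether the tuning was applied on both ends.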
I am sure the problem is caused by rsync: I have run it at different
times of day and the problem recurs every time. I have also tried
different --bwlimit values.
Any ideas what could be causing this, and what else I might try?
TIA
rsk
--
Rajesh Korde
MSU-MCBI
517-355-9715 x 303