Reading the "waiting IOs" thread made me remember I have a similar problem that has been here for months, and I have no solution yet.

A single CentOS 5.2 x86_64 machine here is overloading our NetApp filer with excessive NFS getattr, lookup and access operations. The weird thing is that the number of these operations increases over time. I have an mrtg graph (which I didn't want to attach here) showing, e.g., 200 NFS ops on Monday, measured with filer-mrtg, climbing in a straight line to, e.g., 1200 within days. nfsstat -l on the filer proves beyond doubt that the load is caused by this particular machine. dstat shows me which NFS operations are causing it:

date/time     | null gatr satr look aces ...
10-09 12:22:52|    0    0    0    0    0
10-09 12:22:53|    0  525    0  602  602
10-09 12:22:54|    0 1275    0 1464 1438
10-09 12:22:55|    0    0    0    0    0
10-09 12:22:56|    0    0    0    0    0
10-09 12:22:57|    0    0    0    0    0
10-09 12:22:58|    0  238    0  270  270
10-09 12:22:59|    0 1461    0 1663 1660
10-09 12:23:00|    0  205    0  133  114
10-09 12:23:01|    0    0    0    0    0
10-09 12:23:02|    0    1    0    0    0
10-09 12:23:03|    0    0    0    0    0
10-09 12:23:04|    0 1411    0 1574 1574
10-09 12:23:05|    0  498    0  465  466
10-09 12:23:06|    0    0    0    0    0
10-09 12:23:07|    0    0    0    0    0
10-09 12:23:08|    0    0    0    0    0
10-09 12:23:09|    0 1082    0 1178 1192
10-09 12:23:10|    0  790    0  885  865

This behaviour is somehow tied to the Gnome desktop. I have other machines running CentOS 5.2 x86_64 (at init level 3) which don't show this behaviour, and CentOS 5.2 i386 machines which don't show it either. None of the other machines on the LAN show it - RHEL3 32- and 64-bit, Solaris.

What I'd need is a monitoring tool that can tie the NFS ops to process ids or applications. lsof isn't nearly as helpful here as I thought. I even copied this workstation user's files to another account, logged in and ran the same apps - and couldn't reproduce it.

Ideas? Essentially, this makes CentOS 64-bit undeployable in our environment.
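For client-side correlation, one option is to diff raw RPC counter snapshots yourself. This is only a sketch: it assumes the Linux NFS client's /proc/net/rpc/nfs interface (the raw source that nfsstat -c reads), and the exact column positions on the "proc3" line are kernel-dependent, so verify the layout on your machine before trusting any specific counter.

```shell
#!/bin/sh
# Sketch: compare two snapshots of the NFSv3 client counters, taken one
# second apart. The "proc3" line in /proc/net/rpc/nfs holds cumulative
# per-operation counters (null, getattr, setattr, lookup, access, ... --
# positions vary by kernel, so check your own layout).
snapshot() { awk '/^proc3/' /proc/net/rpc/nfs; }

# diff_ops "OLD_LINE" "NEW_LINE" -> per-counter deltas, skipping the
# leading "proc3" tag and the field-count field.
diff_ops() {
    echo "$1" "$2" | awk '{
        n = NF / 2
        for (i = 3; i <= n; i++)
            printf "%s%d", (i > 3 ? " " : ""), $(i + n) - $i
        print ""
    }'
}

if [ -r /proc/net/rpc/nfs ]; then
    a=$(snapshot); sleep 1; b=$(snapshot)
    echo "ops in last second: $(diff_ops "$a" "$b")"
fi
```

That tells you *when* the bursts happen with per-second resolution on the client itself, which makes it easier to line them up with user activity, but it still doesn't name the process.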
lhecking at users.sourceforge.net wrote:
> A single CentOS 5.2 x86_64 machine here is overloading our NetApp filer with
> excessive NFS getattr, lookup and access operations. The weird thing is that
> the number of these operations increases over time.
[...]
> Ideas?
> Essentially, this makes CentOS 64bit undeployable in our environment.

Do you have anything running that would try to read all the files and build a search index - like beagle? There's also the nightly run of updatedb, but that just reads the filenames, and normally nfs mounts are excluded.

-- 
Les Mikesell
lesmikesell at gmail.com
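To rule out updatedb specifically, one quick check is to confirm its PRUNEFS list covers NFS. This is a sketch: /etc/updatedb.conf is the usual mlocate config path on CentOS 5, but the path (and variable spelling) can differ between locate implementations.

```shell
#!/bin/sh
# Sketch: verify that updatedb's PRUNEFS line excludes NFS filesystems,
# so the nightly run never walks the filer's exports. conf_has_nfs takes
# the config path as an argument so it is easy to point at any file.
conf_has_nfs() {
    grep -i '^PRUNEFS' "$1" 2>/dev/null | grep -qi nfs
}

if conf_has_nfs /etc/updatedb.conf; then
    echo "updatedb skips NFS mounts"
else
    echo "check PRUNEFS in /etc/updatedb.conf -- NFS may not be excluded"
fi
```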
lhecking at users.sourceforge.net wrote:
> A single CentOS 5.2 x86_64 machine here is overloading our NetApp filer with
> excessive NFS getattr, lookup and access operations. The weird thing is that
> the number of these operations increases over time.
[...]

There was a kernel update in the 5.2/5.3 time frame that fixed an NFS client bug regarding lookups - what kernel are you running?

Have you tried running lsof on the client side to see which processes are using the files served over NFS?

nate
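For the lsof route, a sketch (assuming a reasonably recent lsof, whose -N option restricts output to NFS-backed files) that tallies open NFS files per process:

```shell
#!/bin/sh
# Sketch: count open NFS-backed files by command and PID, so the busiest
# processes float to the top. "lsof -N" selects only files on NFS mounts;
# the pipeline just skips the header and counts occurrences.
summarize() {
    awk 'NR > 1 {print $1, $2}' | sort | uniq -c | sort -rn
}

if command -v lsof >/dev/null 2>&1; then
    lsof -N 2>/dev/null | summarize
fi
```

One caveat that may explain why lsof looked unhelpful in the original report: this only shows processes that *hold* files open, while getattr/lookup/access storms can come from processes that repeatedly stat() paths without keeping anything open.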
lhecking at users.sourceforge.net
2009-Oct-02 12:11 UTC
[CentOS] [Solved] Excessive NFS operations
lhecking at users.sourceforge.net writes:
[...]
> A single CentOS 5.2 x86_64 machine here is overloading our NetApp filer with
> excessive NFS getattr, lookup and access operations. The weird thing is that
> the number of these operations increases over time.

Thanks for all the replies. I believe we have found the culprit. First, updating the CentOS kernel did not help.

I am now >99% certain that the problem was caused by the XScreenSaver "Phosphor" screensaver running in one or more vnc sessions to RHEL3 machines on the CentOS5 desktop. The screensaver was customised to run a perl script in the user's account that generates random quotes. In any case, disabling this screensaver under RHEL3 appears to have solved our problem, with about 5 days' worth of monitoring data to support this.

This is definitely a weird interaction, as neither the screensaver nor its components actually run on the CentOS machine. I have not checked whether any other activities in a vnc session cause similar behaviour.