thr3ads.net - freebsd stable - Trouble with NFSd under 6.1-Stable, any ideas? [May 2006]

Howard Leadmon

2006-May-14 18:29 UTC

Trouble with NFSd under 6.1-Stable, any ideas?

Hello All, 

 I have been running FBSD a long while, and actually running since the 5.x
releases on the server I am having troubles with.   I basically have a small
network and just use NIS/NFS to link my various FBSD and Solaris machines
together.

 This has all been running fine up till a few days ago, when all of a sudden
NFS slowed to a crawl, with CPU usage so high the box almost appears to freeze.
When I had 6.1-RC running all seemed well, then came the announcement for the
official 6.1 release, so I did the cvs updates, made world, kernel, and ran
mergemaster to get everything up to the 6.1 stable version.

 Now after doing this, something is wrong with NFS.   It works, it will return
information and open files, just it's very very slow, and while performing a
request the CPU spike is astounding.  A simple du of my home directory can
take minutes, and the machine all but locks up if the request is done over NFS.
Here is a snippet from top:

  PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
  497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd


 This is a nice IBM eServer with dual P4 Xeons and a couple GB of RAM on a
disk array, and locally it screams; heck, NFS used to scream till I updated.  I
am not really sure what info would be useful in debugging, so I won't post tons
of misc junk in this email, but if anyone has any ideas as to how best to
figure out and resolve this issue, it would sure be appreciated...



---
Howard Leadmon
http://www.leadmon.net

Stephen Hurd

2006-May-14 21:54 UTC

Howard Leadmon wrote:
>    Hello All,
>
>  I have been running FBSD a long while, and actually running since the 5.x
> releases on the server I am having troubles with.   I basically have a small
> network and just use NIS/NFS to link my various FBSD and Solaris machines
> together.
>
>  This has all been running fine up till a few days ago, when all of a sudden
> NFS slowed to a crawl, with CPU usage so high the box almost appears to freeze.
> When I had 6.1-RC running all seemed well, then came the announcement for the
> official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> mergemaster to get everything up to the 6.1 stable version.
>
>  Now after doing this, something is wrong with NFS.   It works, it will return
> information and open files, just it's very very slow, and while performing a
> request the CPU spike is astounding.  A simple du of my home directory can
> take minutes, and the machine all but locks up if the request is done over NFS.
> Here is a snippet from top:
>
>   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>   497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd
>
>
>  This is a nice IBM eServer with dual P4 Xeons and a couple GB of RAM on a
> disk array, and locally it screams; heck, NFS used to scream till I updated.  I
> am not really sure what info would be useful in debugging, so I won't post tons
> of misc junk in this email, but if anyone has any ideas as to how best to
> figure out and resolve this issue, it would sure be appreciated...

Are you running rpc.lockd?  I've had very bad luck with it since
sometime in the 5.x series... especially with it interoperating with
Solaris.  I submitted a PR on it, but it's apparently broken in about X
ways.  If possible, I would suggest living without rpc.lockd for now (if
you're currently living with it, that is).

Other than that issue, NFS itself has been working nicely for me.

Michel Talon

2006-May-14 22:55 UTC

> Are you running rpc.lockd?  I've had very bad luck with it since 
> sometime in the 5.x series... especially with it interoperating with 
> Solaris.  I submitted a PR on it, but it's apparently broken in about X
> ways.  If possible, I would suggest living without rpc.lockd for now (if 
> you're currently living with it that is)

On the contrary, the NFS problems I had interoperating with Linux have cleared
up since upgrading Linux to Fedora Core 5 and FreeBSD to 6.1. In particular,
rpc.lockd works, everything is OK, and performance is fine. I had very bad
problems in the past, when we were running Fedora Core 3.


-- 

Michel TALON

Kris Kennaway

2006-May-15 02:50 UTC

On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
>
>    Hello All,
>
>  I have been running FBSD a long while, and actually running since the 5.x
> releases on the server I am having troubles with.   I basically have a small
> network and just use NIS/NFS to link my various FBSD and Solaris machines
> together.
>
>  This has all been running fine up till a few days ago, when all of a sudden
> NFS slowed to a crawl, with CPU usage so high the box almost appears to freeze.
> When I had 6.1-RC running all seemed well, then came the announcement for the
> official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> mergemaster to get everything up to the 6.1 stable version.
>
>  Now after doing this, something is wrong with NFS.   It works, it will return
> information and open files, just it's very very slow, and while performing a
> request the CPU spike is astounding.  A simple du of my home directory can
> take minutes, and the machine all but locks up if the request is done over NFS.
> Here is a snippet from top:
>
>   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>   497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd
>
>
>  This is a nice IBM eServer with dual P4 Xeons and a couple GB of RAM on a
> disk array, and locally it screams; heck, NFS used to scream till I updated.  I
> am not really sure what info would be useful in debugging, so I won't post tons
> of misc junk in this email, but if anyone has any ideas as to how best to
> figure out and resolve this issue, it would sure be appreciated...

Use tcpdump and related tools to find out what traffic is being sent.

Also verify that you did not change your system configuration in any
way: there have been no changes to NFS since the release, so it is
unclear why an update would cause the problem to suddenly occur.

Kris
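
As a rough illustration of the tcpdump suggestion above, the following
Python sketch only assembles a capture command for NFS traffic; it does not
run anything. The interface name em0, the client host name, and the output
file name are assumptions made up for this example, and the port filter
assumes NFS is served on its standard port 2049.

```python
import shlex

NFS_PORT = 2049  # standard NFS port; assumed to be in use on this server


def nfs_capture_cmd(interface, client, outfile="nfs.pcap"):
    """Build a tcpdump argv that saves one client's NFS traffic to a file."""
    return [
        "tcpdump",
        "-n",             # do not resolve addresses to names
        "-i", interface,  # interface to listen on
        "-s", "0",        # capture whole packets, not just the headers
        "-w", outfile,    # write raw packets to a file for later analysis
        "host", client, "and", "port", str(NFS_PORT),
    ]


# Hypothetical interface and client names, for illustration only.
cmd = nfs_capture_cmd("em0", "client.example.org")
print(shlex.join(cmd))
```

The saved file can then be read back with tcpdump -r or opened in another
protocol analyzer to see which NFS requests dominate.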

Joerg Lehners

2006-May-24 16:58 UTC

"Rong-en Fan" <grafan@gmail.com> wrote:
> On 5/14/06, Kris Kennaway <kris@obsecurity.org> wrote:
>> On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
>>>
[...]
>> Use tcpdump and related tools to find out what traffic is being sent.
>>
>> Also verify that you did not change your system configuration in any
>> way: there have been no changes to NFS since the release, so it is
>> unclear why an update would cause the problem to suddenly occur.
>>
>> Kris
>
> Hi Kris and Howard,
>
> As I posted a few days ago, I have problems similar to Howard's
> (some details in the thread "6.1-RELEASE, em0 high interrupt rate
> and nfsd eats lots of cpu" on stable@). After binary searching
> the source tree, I found that
>
> RELENG_6_1, 2006.04.30.03.57 ok
> RELENG_6_1, 2006.04.30.04.00 bad
>
> The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
> With 04.30 03.57's source + manually patched vfs_lookup.c rev 1.90,
> the same problem occurs.
[...]

Confirmed!

I can create the problem here at will.

Setup 1: NFS server 'testido' FreeBSD 6.1-STABLE as of 15. May 2006
with sys/kern/vfs_lookup.c 1.80.2.7, NFS client 'schurks' FreeBSD 6.1-STABLE
as of 15. May 2006.

/usr/src from testido mounted on /mnt on schurks.
Running 'cd /mnt ; du >/dev/null' two times (first after a fresh boot of
testido, second when all served data is in memory of testido):

joerg @ schurks> cd /mnt
joerg @ schurks> time du >/dev/null
    86.09s real     0.14s user     1.91s system
joerg @ schurks> time du >/dev/null
   205.10s real     0.20s user     1.92s system
joerg @ schurks>

Screenful of top output on testido AFTER both tests (testido sometimes
stopped responding to screen output, especially during the second test):

last pid:   329;  load averages:  4.14,  2.77,  1.25    up 0+00:07:30  18:44:47
29 processes:  1 running, 28 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 8420K Active, 28M Inact, 72M Wired, 110M Buf, 880M Free
Swap: 4000M Total, 4000M Free

   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   201 root        1   4    0  1232K   792K -        4:42 116.31% nfsd
   329 joerg       1  96    0  2404K  1676K RUN      0:00  0.00% top
   168 root        1 115    0  2456K  1760K select   0:00  0.00% sshd
   313 root        1  96    0  1428K  1168K select   0:00  0.00% rlogind
   194 root        1 115    0  1556K  1256K select   0:00  0.00% mountd
   299 root        1   8    0  1720K  1436K wait     0:00  0.00% login
   314 root        1   8    0  1748K  1460K wait     0:00  0.00% login
   298 root        1  96    0  1304K  1048K select   0:00  0.00% rlogind
   199 root        1   4    0  1356K  1040K accept   0:00  0.00% nfsd
   256 root        1  96    0  2892K  1760K select   0:00  0.00% ntpd
   315 joerg       1  20    0  1448K  1020K pause    0:00  0.00% ksh
   300 root        1   5    0  1448K   996K ttyin    0:00  0.00% ksh
   158 root        1  96    0  1332K   940K select   0:00  0.00% syslogd
   163 root        1  96    0  1448K  1128K select   0:00  0.00% inetd
   176 root        1  96    0  1408K  1044K select   0:00  0.00% rpcbind
   185 root        1  96    0  1476K  1148K select   0:00  0.00% ypbind
   261 root        1 115    0  1304K   952K select   0:00  0.00% lpd

Setup 2: NFS server 'testido' FreeBSD 6.1-STABLE as of 15. May 2006
with sys/kern/vfs_lookup.c 1.80.2.6, NFS client 'schurks' FreeBSD 6.1-STABLE
as of 15. May 2006.

Same tests as before:

joerg @ schurks> time du >/dev/null
    22.63s real     0.15s user     1.82s system
joerg @ schurks> time du >/dev/null
    16.52s real     0.17s user     1.68s system
joerg @ schurks>

Screenful of top output on testido AFTER both tests (testido responded
fine during both tests):

last pid:   329;  load averages:  0.49,  0.26,  0.10    up 0+00:01:50  18:35:30
29 processes:  1 running, 28 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 8424K Active, 28M Inact, 72M Wired, 110M Buf, 880M Free
Swap: 4000M Total, 4000M Free

   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   201 root        1   4    0  1232K   792K -        0:03  3.76% nfsd
   168 root        1 115    0  2456K  1760K select   0:00  0.00% sshd
   329 joerg       1  96    0  2404K  1676K RUN      0:00  0.00% top
   313 root        1  96    0  1428K  1168K select   0:00  0.00% rlogind
   194 root        1 115    0  1556K  1256K select   0:00  0.00% mountd
   299 root        1   8    0  1720K  1440K wait     0:00  0.00% login
   314 root        1   8    0  1748K  1464K wait     0:00  0.00% login
   298 root        1  96    0  1304K  1048K select   0:00  0.00% rlogind
   199 root        1   4    0  1356K  1040K accept   0:00  0.00% nfsd
   315 joerg       1  20    0  1448K  1020K pause    0:00  0.00% ksh
   256 root        1  96    0  2892K  1760K select   0:00  0.00% ntpd
   300 root        1   5    0  1448K   996K ttyin    0:00  0.00% ksh
   158 root        1  96    0  1332K   940K select   0:00  0.00% syslogd
   163 root        1  96    0  1448K  1128K select   0:00  0.00% inetd
   261 root        1 109    0  1304K   952K select   0:00  0.00% lpd
   176 root        1  96    0  1408K  1044K select   0:00  0.00% rpcbind
   185 root        1  96    0  1476K  1148K select   0:00  0.00% ypbind


See the HUGE difference in consumed TIME.
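
To put numbers on that difference, here is a small Python sketch that takes
the du wall-clock times and the nfsd TIME values reported above and computes
the ratios between the two setups. The variable and function names are made
up for this illustration; only the measurements come from the test runs.

```python
def to_seconds(mm_ss):
    """Convert a top-style 'M:SS' TIME value to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


# 'real' seconds of the two du runs, keyed by vfs_lookup.c revision.
du_real = {
    "1.80.2.7": (86.09, 205.10),  # Setup 1
    "1.80.2.6": (22.63, 16.52),   # Setup 2
}

# Accumulated nfsd CPU time shown by top after both runs.
nfsd_time = {"1.80.2.7": "4:42", "1.80.2.6": "0:03"}

# Slowdown of each du run with 1.80.2.7 relative to 1.80.2.6.
slowdown = [
    bad / good
    for bad, good in zip(du_real["1.80.2.7"], du_real["1.80.2.6"])
]

# Ratio of nfsd CPU time between the two revisions.
cpu_ratio = to_seconds(nfsd_time["1.80.2.7"]) / to_seconds(nfsd_time["1.80.2.6"])

print([round(s, 1) for s in slowdown], cpu_ratio)
```

With these figures the first du run is about 3.8 times slower and the second
about 12.4 times slower with 1.80.2.7, while nfsd accumulates about 94 times
as much CPU time (282 s versus 3 s).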

The only difference was sys/kern/vfs_lookup.c version 1.80.2.6
vs. 1.80.2.7.
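
The binary search over source dates that Rong-en describes in the quoted
text can be sketched as follows. This is only an illustration of the search
itself: the is_bad callback is a hypothetical stand-in for "check out the
tree as of that date, rebuild, and rerun the du test", and the one-minute
snapshot list is an assumption, not the procedure actually used.

```python
from datetime import datetime, timedelta


def first_bad(snapshots, is_bad):
    """Return the earliest snapshot for which is_bad() is true.

    Assumes snapshots are sorted, the first one is known good and the
    last one is known bad.
    """
    lo, hi = 0, len(snapshots) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(snapshots[mid]):
            hi = mid
        else:
            lo = mid
    return snapshots[hi]


# Hypothetical candidate checkout times, one per minute.
start = datetime(2006, 4, 30, 3, 50)
snapshots = [start + timedelta(minutes=i) for i in range(21)]


def is_bad(when):
    # Stand-in for a real rebuild-and-test cycle; it mirrors the reported
    # result that 03:57 is ok and 04:00 is bad.
    return when >= datetime(2006, 4, 30, 4, 0)


culprit = first_bad(snapshots, is_bad)
print(culprit)
```

Each step halves the remaining range, so even a wide window of dates needs
only a handful of rebuilds to isolate a single commit.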


   Joerg
-- 
Mail: Joerg.Lehners@Informatik.Uni-Oldenburg.DE    Tel: 2198
Real: Joerg Lehners, Informatik ARBI, Uni Oldenburg, D-26111 Oldenburg
Unwoerter: Kostensenkung - Gewinnmaximierung - billig, billig, billig
