thr3ads.net - freebsd stable - 4.11-STABLE leaks vnodes worse then 4.x from Feb 13th ... ? [Jul 2005]


Marc G. Fournier

2005-Jul-15 15:34 UTC

4.11-STABLE leaks vnodes worse then 4.x from Feb 13th ... ?

Recently, I started having problems with one of my newest servers ... 
figuring that it might have something to do with the fact that I went SATA
for this one (all others are SCSI), I figured it might be a driver issue 
causing the problems, since everything else is the same as the other 5 
servers on our network ...

Today, I'm starting to wonder if I've been just looking at the "most
obvious" cause, instead of looking deeper ...

The problem that manifests itself is similar to the old 'ran out of vnode'
issue I used to experience under 4.x ... the server would still run, be
totally pingable, and you could even get the motd when you tried to ssh 
in, but you couldn't get a prompt, and all processes were hung ...

I just upgraded the kernel on this machine (mercury) on the 13th of July, 
and it's been running 1 day, 12 hrs now ... there is hardly anything
running on this machine (10 jails), and vnode usage is:

debug.numvnodes: 336460 - debug.freevnodes: 5275 - debug.vnlru_nowhere: 0 - vlruwt

One of my older servers (neptune), running kernels from Feb 13th of this 
year, and with 81 jails running on it, is using up *significantly fewer*
vnodes (uptime: 1 day, 10 hours):

debug.numvnodes: 279710 - debug.freevnodes: 91442 - debug.vnlru_nowhere: 0 - vlruwt
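
For anyone wanting to collect the same numbers, a line in this format could be produced by a small script run from cron. This is only a sketch, not necessarily the script used on these servers: the function names are made up, and it assumes the trailing field (vlruwt) is the wait channel of the vnlru process as reported by ps.

```shell
#!/bin/sh
# Sketch only: the function names here are hypothetical.

# Format one sample in the same shape as the lines quoted above.
# Arguments: numvnodes freevnodes vnlru_nowhere wait-channel
format_vnode_line() {
    printf 'debug.numvnodes: %s - debug.freevnodes: %s - debug.vnlru_nowhere: %s - %s\n' \
        "$1" "$2" "$3" "$4"
}

# Collect the values on the live system and hand them to syslog.
# Assumes the debug.* sysctls shown in this thread exist on the running kernel.
log_vnode_sample() {
    num=$(sysctl -n debug.numvnodes) || return 1
    free=$(sysctl -n debug.freevnodes) || return 1
    nowhere=$(sysctl -n debug.vnlru_nowhere) || return 1
    # Assumption: the last field is the wait channel of the vnlru kernel process.
    wchan=$(ps -axo wchan,command | awk '/vnlru/ && !/awk/ { print $1; exit }')
    format_vnode_line "$num" "$free" "$nowhere" "$wchan" | logger
}
```

Calling log_vnode_sample from a periodic cron job would leave one sample per run in syslog, which makes it easy to compare two machines at the same uptime.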

Now, compared to neptune, mercury isn't running anything special ... 
several apache 1 processes, postfix, cyrus-imapd and that's it ... 
neptune on the other hand, is running the full gamut ... aolserver, java,
apache 1 and 2, postfix, etc ...

So, I'm starting to think that the problem isn't "hardware related", but
the kernel itself ... the latest 4.11-STABLE kernel seems to have brought
in new vnode leakage, or ... vnlru isn't working as it should be to free 
up vnodes ...

Looking at that process on mercury:

# ps aux | grep vnlru
root        7  0.0  0.0     0    0  ??  DL   Wed11PM   0:00.65  (vnlru)

whereas on neptune:

# ps aux | grep vnlru
root          9  0.0  0.0     0    0  ??  DL   Thu01AM   0:00.79  (vnlru)

so about the same amount of CPU time being expended ... a bit more on the
more loaded server, but not a major amount ...

I'd like to try and debug this, but don't know where to start ... I 
realize that 4.x isn't being pushed anymore, but there are a lot of us that
haven't moved to 5.x yet (am working on that for our next server, but it's
going to take me several months before I can convert all our existing 
servers up) ...

I do have a serial console on this server, if that helps to debug things 
...

I've heard that there was some work done on 5.x to clean up some of the 
vnode leaks ... not sure if that is fact or just rumor ... but, if so, 
would any of them be MFCable to 4.x?

Thanks ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Marc G. Fournier

2005-Jul-16 12:01 UTC


4.11-STABLE leaks vnodes worse then 4.x from Feb 13th ... ?

'k, first time I've ever seen this happen ... this morning, around 4am, 
vnode usage jumped by almost 130k ... no idea why, since the previous 
morning, it was only by ~20k ... but:

Jul 16 04:12:00 mercury root: debug.numvnodes: 336460 - debug.freevnodes: 6404 - debug.vnlru_nowhere: 0 - vlruwt
Jul 16 04:24:00 mercury root: debug.numvnodes: 458359 - debug.freevnodes: 27 - debug.vnlru_nowhere: 0 - vlruwt
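
To spot jumps like this one without reading the log by eye, the samples can be turned into deltas. A rough sketch follows; vnode_deltas is a hypothetical helper name, and it assumes each sample sits on a single syslog line with the time in the third field.

```shell
#!/bin/sh
# Sketch only: vnode_deltas is a hypothetical helper name.
# Reads syslog lines on stdin and prints, for each sample after the first,
# the sample time, the change in numvnodes, and the current freevnodes.
vnode_deltas() {
    awk '
    {
        num = ""; free = ""
        # Pick out the values that follow the two sysctl names.
        for (i = 1; i < NF; i++) {
            if ($i == "debug.numvnodes:")  num  = $(i + 1)
            if ($i == "debug.freevnodes:") free = $(i + 1)
        }
        if (num == "") next    # not a vnode sample line
        if (seen) printf "%s numvnodes %+d freevnodes %d\n", $3, num - prev, free
        prev = num; seen = 1
    }'
}
```

Fed the two lines above, it reports a change of +121899 at 04:24:00. Something like `grep debug.numvnodes /var/log/messages | vnode_deltas` would do the same for a whole log, assuming that is where the samples end up.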

At about 8:30am, when I checked the system, freevnodes was still <2k, 
although minvnodes is set to ~50k ...

Jul 16 08:36:00 mercury root: debug.numvnodes: 460354 - debug.freevnodes: 1221 - debug.vnlru_nowhere: 0 - vlruwt

Figuring that before the server once more hung solid, I'd do a 'clean 
reboot', I shut down all of the VMs, and umount'd the large drive ... it
freed up >400k vnodes really quickly:

Jul 16 08:45:16 mercury root: debug.numvnodes: 460354 - debug.freevnodes: 426587 - debug.vnlru_nowhere: 0 - vlruwt

Is there some way of getting a report on vnode usage?  And/or forcing a 
flush (ala sync?) without having to umount?

Thanks ...



----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Marc G. Fournier

2005-Jul-17 15:10 UTC


vnode leak in NFS (Was: Re: 4.11-STABLE leaks vnodes worse then 4.x from Feb 13th ... ?)

Wow, now this was unexpected ... figured I'd try a quick theory this morning
... debug.freevnodes was down to:

debug.freevnodes: 103987

umount /nfs and ...

# sysctl debug.freevnodes
debug.freevnodes: 332106
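
The same before/after check can be repeated for any mount point with a small wrapper. This is just a sketch under my own naming: freed_by is hypothetical, and the reader command is whatever prints the counter as a single integer.

```shell
#!/bin/sh
# Sketch only: freed_by is a hypothetical helper name.
# Usage: freed_by READER ACTION [ARGS...]
# Runs READER (a command that prints one integer) before and after ACTION,
# then prints how much the value rose.
freed_by() {
    reader=$1; shift
    before=$($reader) || return 1   # READER is split on whitespace on purpose
    "$@" || return 1                # run the action, e.g. an umount
    after=$($reader) || return 1
    echo $((after - before))
}
```

For the case above that would be `freed_by 'sysctl -n debug.freevnodes' umount /nfs`, and with the two readings shown the difference works out to 228119 vnodes.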


The vnode leak isn't in the unionfs code ... it's in the nfs code :(


----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
