I have two servers, both running 4.10 built within a few days of each
other (Aug 5 for venus, Aug 7 for neptune) ... both running jail
environments, one with ~60 jails running, the other with ~80 ... the one
with 60 has been up for ~25 days now, and is on the verge of running out
of vnodes:

Aug 31 20:58:00 venus root: debug.numvnodes: 519920 - debug.freevnodes: 11058 - debug.vnlru_nowhere: 256463 - vlrup
Aug 31 20:59:01 venus root: debug.numvnodes: 519920 - debug.freevnodes: 13155 - debug.vnlru_nowhere: 256482 - vlrup
Aug 31 21:00:03 venus root: debug.numvnodes: 519920 - debug.freevnodes: 13092 - debug.vnlru_nowhere: 256482 - vlruwt

while the other one has been up for only ~1 day, but is using a lot
fewer vnodes, for more processes:

Aug 31 20:58:00 neptune root: debug.numvnodes: 344062 - debug.freevnodes: 208655 - debug.vnlru_nowhere: 0 - vlruwt
Aug 31 20:59:00 neptune root: debug.numvnodes: 344062 - debug.freevnodes: 208602 - debug.vnlru_nowhere: 0 - vlruwt
Aug 31 21:00:03 neptune root: debug.numvnodes: 344062 - debug.freevnodes: 208319 - debug.vnlru_nowhere: 0 - vlruwt

I've tried shutting down all of the VMs on venus, and umount'd all of
the unionfs mounts, as well as the one nfs mount we have ... the above
numbers are from after the VMs (and mounts) were recreated ...

Now, my understanding of vnodes is that for every file opened, a vnode
is created ... in my case, since I'm using unionfs, there are two vnodes
per file ... is it possible that there are 'stale' vnodes that aren't
being freed up?  Is there some way of 'viewing' the vnode structure?

For instance, fstat shows:

venus# fstat | wc -l
   19531

So, obviously it isn't just open files that I'm dealing with here, for
even if I double that, it is nowhere near 519920 ...

So, where else are the vnodes going?  Is there a 'leak'?  What can I
look at to try and narrow this down / provide more information?

Even some way of determining a specific process that is sucking back a
lot of them, so I can move that to a different machine ... ?

Help?

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
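[Log lines in the format above are easy to produce from cron.  A minimal
sketch of such a logger follows: the three sysctl names are the real
4.x ones quoted above, but the script itself, and the use of ps to read
the vnlru kernel thread's wait channel for the trailing vlruwt/vlrup
token, are assumptions rather than the script actually running on venus.]

    #!/bin/sh
    # Hypothetical reconstruction of the vnode logger (a sketch, not
    # the original).  debug.numvnodes, debug.freevnodes and
    # debug.vnlru_nowhere are real FreeBSD 4.x sysctls; the last log
    # field is assumed to be the vnlru kernel thread's wait channel
    # (vlruwt = waiting for work, vlrup = actively reclaiming).
    num=$(sysctl -n debug.numvnodes)
    free=$(sysctl -n debug.freevnodes)
    nowhere=$(sysctl -n debug.vnlru_nowhere)
    wchan=$(ps -axo wchan,comm | awk '$2 ~ /vnlru/ { print $1; exit }')
    logger "debug.numvnodes: $num - debug.freevnodes: $free - debug.vnlru_nowhere: $nowhere - $wchan"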
As a follow up, looking at vmstat -m ... specifically the work that
David did on separating the union vs regular vnodes:

         Type  InUse  MemUse HighUse   Limit  Requests Limit Limit Size(s)
  UNION mount     60      2K      3K 204800K       162     0     0 32
       undcac      0      0K      1K 204800K 343638713     0     0 16
       unpath  13146    227K   1025K 204800K  43541149     0     0 16,32,64,128
  Export Host      1      1K      1K 204800K       164     0     0 256
       vnodes    141      7K      8K 204800K       613     0     0 16,32,64,128,256

Why does 'vnodes' show only 141 InUse?  Or, in this case, should I be
looking at:

     FFS node 496600 124150K 127870K 204800K 401059293     0     0 256

496k FFS nodes, if I'm reading right?  vs neptune, which is showing only:

     FFS node 300433  75109K  80257K 204800K   3875307     0     0 256

On Tue, 31 Aug 2004, Marc G. Fournier wrote:

> [original message quoted in full; see above]

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664
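[To see whether the FFS node pool tracks debug.numvnodes over time,
rather than comparing single snapshots from two machines, a loop like
the following can be left running.  This is a sketch only; the awk
field position assumes the 4.x vmstat -m column order shown above
(Type, InUse, MemUse, ...).]

    #!/bin/sh
    # Sample the "FFS node" InUse count from vmstat -m alongside
    # debug.numvnodes once a minute.  If both grow together while the
    # fstat count stays flat, the growth is in cached or unreclaimed
    # vnodes rather than open files.
    while :; do
        ffs=$(vmstat -m | awk '/FFS node/ { print $3 }')
        num=$(sysctl -n debug.numvnodes)
        echo "$(date) FFS nodes: $ffs numvnodes: $num"
        sleep 60
    done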
On Tue, Aug 31, 2004 at 09:21:09PM -0300, Marc G. Fournier wrote:
> I have two servers, both running 4.10 built within a few days of each
> other (Aug 5 for venus, Aug 7 for neptune) ... both running jail
> environments, one with ~60 jails running, the other with ~80 ... the
> one with 60 has been up for ~25 days now, and is on the verge of
> running out of vnodes:
>
> Aug 31 20:58:00 venus root: debug.numvnodes: 519920 - debug.freevnodes: 11058 - debug.vnlru_nowhere: 256463 - vlrup
> Aug 31 20:59:01 venus root: debug.numvnodes: 519920 - debug.freevnodes: 13155 - debug.vnlru_nowhere: 256482 - vlrup
> Aug 31 21:00:03 venus root: debug.numvnodes: 519920 - debug.freevnodes: 13092 - debug.vnlru_nowhere: 256482 - vlruwt
>
> [..]
>
> Now, my understanding of vnodes is that for every file opened, a vnode
> is created ... in my case, since I'm using unionfs, there are two
> vnodes per file ... is it possible that there are 'stale' vnodes that
> aren't being freed up?  Is there some way of 'viewing' the vnode
> structure?
>
> For instance, fstat shows:
>
> venus# fstat | wc -l
>    19531

You can also try 'pstat -f | more' from the user side.

> So, obviously it isn't just open files that I'm dealing with here, for
> even if I double that, it is nowhere near 519920 ...

You might want to set up for remote kernel debugging and peek around
the system / further examine vnode structures.  (If you have physical
access to two machines you can set up a null modem cable.)

> So, where else are the vnodes going?  Is there a 'leak'?  What can I
> look at to try and narrow this down / provide more information?

If the use count isn't decremented (to zero), vnodes won't be placed on
the freelist.  Perhaps something isn't calling vrele() where it should
in unionfs?  You should check the reference counts (v_usecount and
v_holdcnt) on some of the suspect vnodes.

Any specific things you might suspect as a possible cause?  Any
messages preceding the ones you listed above?

If you can escape to the debugger, some things to try are:

  show page
  show lockedvn

You could do a dump for later examination if you are forced to reboot
the machine (after trying to unmount).

> Even some way of determining a specific process that is sucking back a
> lot of them, so I can move that to a different machine ... ?

While this only works for open file entries, you can get a top 10 by
using:

  fstat | perl -ane '
      $sum{$F[1]}++;                       # tally entries by command name
      END { print "$_: $sum{$_}\n"
            for sort { $sum{$b} <=> $sum{$a} } keys %sum }
  ' | head -10

--
Allan Fields, AFRSL - http://afields.ca
 2D4F 6806 D307 0889 6125 C31D F745 0D72 39B4 5541
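[The per-command tally above can also be grouped by fstat's MOUNT
column, assumed here to be field 5 of the default output, to see
whether a single unionfs mount accounts for most of the open-file
entries.  A sketch:]

    # Top 10 mount points by number of open-file entries; the awk
    # field position is an assumption about fstat's column layout.
    fstat -m | awk 'NR > 1 { count[$5]++ }
        END { for (m in count) printf "%7d %s\n", count[m], m }' |
        sort -rn | head -10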
On Tue, Aug 31, 2004, Marc G. Fournier wrote:
> I have two servers, both running 4.10 built within a few days of each
> other (Aug 5 for venus, Aug 7 for neptune) ... both running jail
> environments, one with ~60 jails running, the other with ~80 ... the
> one with 60 has been up for ~25 days now, and is on the verge of
> running out of vnodes:
>
> Aug 31 20:58:00 venus root: debug.numvnodes: 519920 - debug.freevnodes: 11058 - debug.vnlru_nowhere: 256463 - vlrup
> [..]
> Aug 31 21:00:03 neptune root: debug.numvnodes: 344062 - debug.freevnodes: 208319 - debug.vnlru_nowhere: 0 - vlruwt
>
> [..]
>
> So, where else are the vnodes going?  Is there a 'leak'?  What can I
> look at to try and narrow this down / provide more information?

First of all, use 'fstat -m' to ensure that you're counting all open
files.  Second, I believe it is normal for the number of files that
fstat reports to be lower than the number of vnodes actually allocated,
since unreferenced vnodes (which don't show up in fstat) are cached.

It's a bit worrisome that debug.vnlru_nowhere is large on one machine
and 0 on the other.  That number says that the system tried to reclaim
some unreferenced vnodes, but didn't find any.  I don't know whether
this indicates a leak, or merely a vnode-intensive process.

> Even some way of determining a specific process that is sucking back a
> lot of them, so I can move that to a different machine ... ?

'fstat -mu' or 'fstat -mp' might be helpful in tracking down which user
or process is eating up your vnodes.
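[Building on the fstat -mp suggestion, a ranked per-process tally is a
small variation on the earlier perl one-liner; this is a sketch, with
CMD and PID assumed to be fields 2 and 3 of fstat's output.]

    # Top 10 processes by open-file entries, labelled CMD[PID].
    fstat -m | awk 'NR > 1 { count[$2 "[" $3 "]"]++ }
        END { for (p in count) printf "%7d %s\n", count[p], p }' |
        sort -rn | head -10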