I woke up to a frozen box this morning - it froze up a few more times
before I got a handle on it.

Basically, the box runs idle but refuses to do disk IO, or does it
-very- slowly.

Top shows processes stuck in 'ffsvget', 'inode', and 'vlruwk' state.

I can get the box responsive again by setting sysctl
kern.maxvnodes=100000.  It starts up with kern.maxvnodes=36079.  I don't
know yet if this is a 'fix' - I wanted to send this mail out before the
box froze again.

I can reliably get the box into this state by doing 'find /'.  I do have
a lot of files on the disk, and things like squid and postgres that do a
lot of file i/o, but I don't recall this happening before this week.  I
don't find anything in tuning(7) about bumping up vnodes, but I do see
sporadic reports in a Google Groups search for 'ffsvget'.

Anybody run into this before?

- Mike H.
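For reference, the workaround described above boils down to the
following commands; the value 100000 is simply the one used in this
report, not a tuned recommendation:

    # show the boot-time default (about 36000 on this box)
    sysctl kern.maxvnodes

    # raise the limit at run time; this does not survive a reboot
    sysctl kern.maxvnodes=100000

    # to apply the same value at every boot, it can go in /etc/sysctl.conf:
    #   kern.maxvnodes=100000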
On Wed, 21 May 2003, Mike Harding wrote:

> I woke up to a frozen box this morning - it froze up a few more times
> before I got a handle on it.
>
> Basically, the box runs idle but refuses to do disk IO, or does it
> -very- slowly.
>
> Top shows processes stuck in 'ffsvget', 'inode', and 'vlruwk' state.
[...]
> I can reliably get the box into this state by doing 'find /'.  I do
> have a lot of files on the disk, and things like squid and postgres
> that do a lot of file i/o, but I don't recall this happening before
> this week.  I don't find anything in tuning(7) about bumping up
> vnodes, but I do see sporadic reports in a Google Groups search for
> 'ffsvget'.
>
> Anybody run into this before?

I can reproduce this at will with rsync.  Here's how:

1) Check out ports from the ports cvs repo using 'cvs co ports' (you can
   use an anoncvs server or your own local ports cvs repo from cvsup).

2) Since all the CVS/Root files are the same, change them all into hard
   links to the same file with a script like this:

#!/usr/bin/perl
#
# First arg: file to link to
#
# Remaining args: files to check if their contents are the same as the
# first file.  If they are the same, they will be removed and replaced
# with a hard link to the first file.

my $master = shift @ARGV;
print "Master file: $master\n";

local $/ = undef;

open(M, $master) || die "$master: $!\n";
my $master_content = <M>;
close M;

while (my $file = shift @ARGV) {
    if ($file eq $master) {
        print "Skipping original $master\n";
        next;
    }
    open(F, $file) || die "$file: $!\n";
    my $content = <F>;
    close F;
    if ($content eq $master_content) {
        print "replacing $file\n";
        unlink $file or die "$file: $!\n";
        link $master, $file or die "link $master, $file: $!\n";
    }
}

   then:

    cd /usr/ports; find . -name Root | xargs /tmp/link-if-same.pl ./CVS/Root

   This will produce thousands of hard links to the same file, like:

    /usr/ports% ls -l CVS/Root
    -rw-r--r--  13613 root  wheel  18 May 19 15:03 CVS/Root

3) rsync your ports tree to another box with

    rsync -avHS source::ports/ /usr/ports/

   (set up /usr/local/etc/rsyncd.conf and /etc/inetd.conf as appropriate).

4) Watch the destination box hang with rsync in vlruwk wchan.

Yes, I know this is convoluted, but it reliably reproduces the same
problem for me.  In my case the fix was to use --exclude=CVS/ in the
rsync flags -- I'm sure that doesn't help you.

No, I don't have a fix, but maybe this recipe will help someone else
debug it.
-- 
Tod McQuillin
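For anyone trying to reproduce this, the rsyncd setup mentioned in
step 3 can be as minimal as the sketch below.  The module name 'ports'
and the path to the rsync binary are assumptions; adjust them for your
install, and check that /etc/services has an rsync entry (port 873):

    # /usr/local/etc/rsyncd.conf on the source box (minimal sketch)
    [ports]
        path = /usr/ports
        read only = yes

    # /etc/inetd.conf on the source box; the rsync binary is assumed to
    # live in the usual ports install location
    rsync  stream  tcp  nowait  root  /usr/local/bin/rsync  rsync --daemon

    # then have inetd reread its configuration
    killall -HUP inetd

With that in place, the client command from step 3
(rsync -avHS source::ports/ /usr/ports/) pulls from the 'ports' module.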
Hi, this is ishizuka@ish.org.

I posted the problem report at
http://www.freebsd.org/cgi/query-pr.cgi?pr=52425.  I think this is the
same problem.

Although the 4.5R release notes
(http://www.freebsd.org/releases/4.5R/relnotes-i386.html#KERNEL)
describe a fix for this problem, I tested 4.5R and the system still hung
up.  (Since sysctl does not respond on 4.5R when the system hangs, I
cannot determine whether 4.5R has the same problem or not.)
Hi Mike ...

What version of FreeBSD are you running?  There were several fixes put
in just after 4.8 was released, one of which dealt with freeing up
vnodes, since I was hitting similar problems on our server where we're
using unionfs ...

On Wed, 21 May 2003, Mike Harding wrote:

> I woke up to a frozen box this morning - it froze up a few more times
> before I got a handle on it.
>
> Basically, the box runs idle but refuses to do disk IO, or does it
> -very- slowly.
>
> Top shows processes stuck in 'ffsvget', 'inode', and 'vlruwk' state.
>
> I can get the box responsive again by setting sysctl
> kern.maxvnodes=100000.  It starts up with kern.maxvnodes=36079.  I
> don't know yet if this is a 'fix' - I wanted to send this mail out
> before the box froze again.
>
> I can reliably get the box into this state by doing 'find /'.  I do
> have a lot of files on the disk, and things like squid and postgres
> that do a lot of file i/o, but I don't recall this happening before
> this week.  I don't find anything in tuning(7) about bumping up
> vnodes, but I do see sporadic reports in a Google Groups search for
> 'ffsvget'.
>
> Anybody run into this before?
>
> - Mike H.

Marc G. Fournier                  ICQ#7615664            IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org          secondary: scrappy@{freebsd|postgresql}.org
In article <20030524190051.R598@hub.org>, scrappy@hub.org wrote:

> What version of FreeBSD are you running?  There were several fixes put
> in just after 4.8 was released, one of which dealt with freeing up
> vnodes, since I was hitting similar problems on our server where we're
> using unionfs ...

If you're on stable, I suggest reading the man page for mount_union very
carefully, especially the BUGS section.  Be verrrry careful, you're in
dangerous territory.  (Yes, I've hurt myself on that.)

-- 
Steve Watt KD6GGD  PP-ASEL-IA         ICBM: 121W 56' 57.8" / 37N 20' 14.9"
 Internet: steve @ Watt.COM                        Whois: SW32
   Free time?  There's no such thing.  It just comes in varying prices...
From: Mike Harding [mailto:mvh@ix.netcom.com]

> I'm running a very recent RELENG-4 - but I had a suspicion that this
> was unionfs related, so I unmounted the /usr/ports union mounts under
> a jail in case this was causing the problem, and haven't seen the
> problem since.

Probably you don't want to use unionfs but nullfs: mount /usr/ports
read-only into the jail, and use a jail-writeable directory for port
building.  I personally use

    WRKDIRPREFIX=/usr/obj
    WRKDIR=${WRKDIRPREFIX}${.CURDIR}

in /etc/make.conf.  This setup has been working for years for me without
problems.

Helge
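A rough sketch of that setup on 4.x might look like the following; the
jail root path /jail is only a placeholder, the make.conf lines are the
ones quoted above, and whether a read-only nullfs mount behaves well on
your particular -STABLE is something to verify before relying on it:

    # Mount the host's ports tree into the jail with nullfs
    # (mount_null(8) on 4.x); /jail stands in for your jail root,
    # and -o ro is the usual way to ask for a read-only mount.
    mount_null -o ro /usr/ports /jail/usr/ports

    # /etc/make.conf inside the jail: build in a jail-writeable
    # directory instead of the (read-only) ports tree.
    WRKDIRPREFIX=/usr/obj
    WRKDIR=${WRKDIRPREFIX}${.CURDIR}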
> grep vnlru | grep -v grep

Handy efficiency hint: you can replace that pair of greps with

    grep '[v]nlru'

which doesn't match the grep command's own argument but still matches
all the vnlru lines.
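As a made-up illustration (not from the thread), looking for the vnlru
kernel thread in ps output, the two forms are equivalent:

    # matches the grep process itself, so a second grep filters it out
    ps axl | grep vnlru | grep -v grep

    # the bracket trick: '[v]nlru' still matches "vnlru" in the ps
    # output, but the literal string "[v]nlru" on grep's own command
    # line does not match the pattern
    ps axl | grep '[v]nlru'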
Hi, David-san.

I still have the vnodes problem in 4.8-stable with /sys/kern/vfs_subr.c
1.249.2.30.

The 310.locate job of the weekly cron makes the system slow down or
panic.  The sysctl values below are what the machines showed when they
reached the slowdown.

(1) #1 machine (Celeron 466 with 256 megabytes of RAM)

    % sysctl kern.maxvnodes
    kern.maxvnodes: 17979
    % sysctl vm.zone | grep VNODE
    VNODE: 192, 0, 18004, 122, 18004

(2) #2 machine (Celeron 1300 with 512 megabytes of RAM)

    % sysctl kern.maxvnodes
    kern.maxvnodes: 36072
    % sysctl vm.zone | grep VNODE
    VNODE: 192, 0, 36097, 49, 36097

(3) #3 machine (Pentium III 600 with 512 megabytes of RAM)

    % sysctl kern.maxvnodes
    kern.maxvnodes: 36142
    % sysctl vm.zone | grep VNODE
    VNODE: 192, 0, 36167, 85, 36167

(4) #4 machine (Pentium III 750 with 512 megabytes of RAM)

    % sysctl kern.maxvnodes
    kern.maxvnodes: 36075
    % sysctl vm.zone | grep VNODE
    VNODE: 192, 0, 36100, 46, 36100

-- 
ishizuka@ish.org
On Mon, Jun 09, 2003, Masachika ISHIZUKA wrote:

> Hi, David-san.
> I still have the vnodes problem in 4.8-stable with /sys/kern/vfs_subr.c
> 1.249.2.30.
>
> The 310.locate job of the weekly cron makes the system slow down or
> panic.  The sysctl values below are what the machines showed when they
> reached the slowdown.
>
> (1) #1 machine (Celeron 466 with 256 megabytes of RAM)
>
>     % sysctl kern.maxvnodes
>     kern.maxvnodes: 17979
>     % sysctl vm.zone | grep VNODE
>     VNODE: 192, 0, 18004, 122, 18004

This looks pretty normal to me for a quiescent system.  Ordinarily I
would actually suggest raising maxvnodes if you have lots of little
files.  Does the number of vnodes shoot up when 310.locate runs?  Did
you get a backtrace from the panics?  Perhaps the VM page cache is still
interfering...
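One way to answer the "does the vnode count shoot up" question is to
sample the same sysctls shown earlier in the thread while 310.locate
runs.  A rough sketch; the 5-second interval and the log file name are
arbitrary choices:

    #!/bin/sh
    # Sample vnode-related sysctls every 5 seconds so their growth can
    # be lined up against the point where the box starts to hang.
    while true; do
        date
        sysctl kern.maxvnodes
        sysctl vm.zone | grep VNODE
        sleep 5
    done >> /var/tmp/vnode-watch.log 2>&1

Running /etc/periodic/weekly/310.locate by hand in another terminal
while this loop logs should show whether the VNODE zone climbs toward
kern.maxvnodes before the slowdown.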