Arno J. Klaassen
2012-Jul-06 13:19 UTC
nfs-bug when server for 9-Stable becomes client as well ?
Hello,

it looks like I discovered a probable bug in the NFS code, very easy to reproduce in my setup:

Machine-1: today's 9-stable, exporting /files (ufs) and /z2 (zfs)
Machine-2: 8-stable as of April the 10th, exporting /raid1

On Machine-1 I mount /raid1 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768) and start a script on this mount looping something like:

    dd if=/dev/random of=BIG bs=1048576 count=${SIZE}
    cp -fp BIG BIG2
    cmp -x BIG BIG2

I let this run for 24 hours (from time to time stressing Machine-1 with other scripts, including provoking heavy swapping), with no problem at all.

However, when I then mount /z2 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768) on Machine-2, the above loop on Machine-1 fails *immediately*:

    Copying file ...cp: BIG: Permission denied

No console messages this time; last time I got

    kernel: nfs_getpages: error 13
    kernel: vm_fault: pager read error, pid 87803 (cmp)

on Machine-1.

I repeated this scenario, replacing Machine-2 with a good old 6-4-stable one, with the same outcome.

Please tell me what I could do to nail this down a bit more.

Thanks in advance,

Best, Arno
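For anyone wanting to reproduce this, the loop described above could be sketched roughly as follows. This is not the original script: the function names, the RANDOM_SRC and CMP_OPTS variables, and the failure message are hypothetical, and only the dd, cp and cmp invocations come from the message.

```shell
#!/bin/sh
# Sketch of the dd/cp/cmp stress loop; names below are illustrative only.

# Source of random data; /dev/random as in the original message.
RANDOM_SRC="${RANDOM_SRC:-/dev/random}"
# Options for cmp; -x is the flag used in the original message.
# It can be overridden (even with an empty value) on systems whose cmp lacks it.
CMP_OPTS="${CMP_OPTS--x}"

# one_pass DIR SIZE_MB
# Write SIZE_MB megabytes of random data to DIR/BIG, copy it to DIR/BIG2,
# and compare the two. Returns non-zero as soon as any step fails.
one_pass() {
    (
        cd "$1" || exit 1
        dd if="$RANDOM_SRC" of=BIG bs=1048576 count="$2" 2>/dev/null || exit 1
        cp -fp BIG BIG2 || exit 1
        # CMP_OPTS is intentionally unquoted so an empty value expands to nothing.
        cmp $CMP_OPTS BIG BIG2
    )
}

# stress_loop DIR SIZE_MB
# Repeat one_pass until it fails, then report when the failure happened.
stress_loop() {
    while one_pass "$1" "$2"; do
        :
    done
    echo "pass failed at $(date)" >&2
    return 1
}
```

Usage would be something like `stress_loop /raid1 1024` on Machine-1, with /raid1 being the NFS mount from Machine-2; the size is an arbitrary example.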
Vincent Hoffman
2012-Jul-06 15:34 UTC
nfs-bug when server for 9-Stable becomes client as well ?
On 06/07/2012 14:19, Arno J. Klaassen wrote:
> Hello,
>
> looks like I discovered a probable bug in the nfs-code, very
> easy to reproduce in my setup :
>
> Machine-1 : Today's 9-stable, exporting /files (ufs) and /z2 (zfs)
>
> Machine-2 : 8-stable as of April the 10th exporting /raid1
>
> On Machine-1 I mount /raid1 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768)
> and start a script on this mount looping something like :
>
> dd if=/dev/random of=BIG bs=1048576 count=${SIZE}
> cp -fp BIG BIG2
> cmp -x BIG BIG2
>
> I let this run for 24 hours (from time to time stressing Machine-1 with
> other scripts, including provoking heavy swapping), no problem at all.
>
> However, then I mount /z2 (rw,nfsv3,intr,tcp,rsize=32768,wsize=32768)
> on Machine-2, and *immediately* the above loop on Machine-1 fails :
>
> Copying file ...cp: BIG: Permission denied
>
> No console messages this time, last time I got
>
> kernel: nfs_getpages: error 13
> kernel: vm_fault: pager read error, pid 87803 (cmp)
>
> on Machine-1.
>
> I repeated this scenario by replacing Machine-2 with a good old
> 6-4-stable one, same outcome.
>
> Please tell me what I could do to nail this down a bit more.

It's possible (although not definite) that you have hit a mountd bug, as documented in PRs kern/131342 and kern/136865.

I recently asked on -CURRENT about this and got a patch to try from Rick. I'm testing it now, but it doesn't seem to fix the problem for me, just improve it, although I'm still trying to get enough runs to be a valid sample.
(see http://docs.freebsd.org/cgi/getmsg.cgi?fetch=377627+0+archive/2012/freebsd-current/20120701.freebsd-current )

What I did for my production NAS was edit mount.c so that it doesn't send a SIGHUP to mountd, as suggested by Rick, since that was easy to do and non-intrusive.
Vince
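One way to check whether the SIGHUP to mountd is really the trigger might be to send that signal by hand on the machine exporting /raid1 (Machine-2) while the dd/cp/cmp loop is running on Machine-1, without mounting anything. If the loop then fails with "Permission denied" (error 13 is EACCES), that would point at the mountd reload rather than at the mount itself. The sketch below is not from the thread: the hup_mountd name is hypothetical, and the pidfile path /var/run/mountd.pid is an assumption about the default location that should be checked on the system in question.

```shell
#!/bin/sh
# Diagnostic sketch: signal mountd by hand, as mount(8) is reported to do.

# Assumption: mountd records its PID in this file; adjust if it differs.
MOUNTD_PIDFILE="${MOUNTD_PIDFILE:-/var/run/mountd.pid}"

# hup_mountd [PIDFILE]
# Read a PID from PIDFILE (default: $MOUNTD_PIDFILE) and send it SIGHUP.
# Returns non-zero if the pidfile is unreadable or the signal cannot be sent.
hup_mountd() {
    pidfile=${1:-$MOUNTD_PIDFILE}
    if [ ! -r "$pidfile" ]; then
        echo "cannot read pidfile: $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    kill -HUP "$pid"
}
```

Run as root on Machine-2, e.g. `hup_mountd`, a few times in a row while watching the loop on Machine-1.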