Hi,

I've got a very weird problem with NFS mounts on a RELENG_6
machine (a.k.a. 6.2-PRERELEASE, sources synced yesterday,
November 8th).  It's an HP ProLiant DL360 G4 (G4p to be exact),
but that shouldn't matter.  I've been banging my head on the
table for several hours, but I can't find the source of the
problem.  :-(

What I'm trying to do should be very simple: mounting an NFS
directory via TCP (instead of UDP, which is the default), like
this:

   # mount_nfs -T -3 -R 3 -i -s -o ro 127.0.0.1:/localdisk /nfs/test

Symptom: As soon as I use the -T option (TCP) with the mount
command, it simply hangs forever.  If I use the intr/soft flags,
I can Ctrl-C it after a while, and the mount indeed appears in
the output from "mount", but any command that tries to access it
(e.g. ls(1)) also hangs.  Even umount(8) hangs.

More observations:

 - UDP works perfectly fine.  No problems at all.
 - Other TCP connections besides NFS (e.g. ssh) work fine.
 - IPF is present, but disabled (ipf -D).
 - IPFW only contains the default "allow any to any" rule.
 - The interface doesn't matter.  Mounting from localhost
   (via lo0) has the same problem as mounting via a real NIC.
 - I first observed the problem on RELENG_6 of 2006-10-19
   (but it could be much older, because I hadn't tried
   NFS-via-TCP on this machine before).  Then I updated to
   2006-11-08; no change.
 - SMP or UP kernel doesn't make a difference.
 - No special compiler flags; make.conf is empty.
 - Kernel config is GENERIC with a few additions for more
   shared memory and semaphores (so Squid and PostgreSQL are
   happy) and some other unrelated details.
 - Nothing suspicious in dmesg.  The kernel prints nothing
   during the mount attempts.
 - Output from rpcinfo -p looks good.
 - tcpdump shows that the TCP connection is immediately shut
   down: after connecting successfully, it sends a FIN, then
   reconnects, and so on ad infinitum.  Meanwhile
   vfs.nfs.reconnects increases slowly.
 - On a different machine (different hardware, but the same
   RELENG_6 and a very similar kernel config), the problem does
   *NOT* occur.  I compared the sysctl variables relevant to
   nfs, rpc and tcp, and they're all the same.  rpcinfo -p is
   also identical.

Now I'm running out of ideas ...  Obviously there must be
something special about this machine, because everything works
fine on a different one, but I'm not able to find out what it is.
I even considered putting a few printf() calls into some places
in sys/nfsclient/nfs_socket.c to find out what's going on, but
I'm not sure whether that makes sense or whether it would give
any useful results.

Any hints and ideas will be greatly appreciated.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD:  http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"And believe me, as a C++ programmer, I don't hesitate to question
the decisions of language designers.  After a decent amount of C++
exposure, Python's flaws seem ridiculously small." -- Ville Vainio
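[For reference, the connect/FIN loop and the slowly growing reconnect
counter described above can be watched from a second terminal with
something along these lines; this is a minimal sketch, assuming the
loopback mount from the example and NFS on its standard port 2049:]

   # tcpdump -n -i lo0 tcp port 2049
   # while true; do sysctl vfs.nfs.reconnects; sleep 5; done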
On Thu, Nov 09, 2006 at 06:17:06PM +0100, Oliver Fromme wrote:
> I've got a very weird problem with NFS mounts on a RELENG_6
> machine (a.k.a. 6.2-PRERELEASE, sources synced yesterday,
> November 8th).  It's an HP ProLiant DL360 G4 (G4p to be
> exact), but that shouldn't matter.  I've been banging my
> head on the table for several hours, but I can't find the
> source of the problem.  :-(

Is this machine using pf/pfil?  If so, are you using "scrub"
at all?  If so, don't.  :-)

-- 
| Jeremy Chadwick                              jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                Mountain View, CA, USA |
| Making life hard for others since 1977.          PGP: 4BD6C0CB   |
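[A quick way to check this is sketched below; it assumes pf would have
been loaded as the pf.ko module rather than compiled into the kernel,
and simply greps the full pfctl dump for any scrub rules:]

   # kldstat | grep pf
   # pfctl -s all 2>/dev/null | grep -i scrub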
Hi Oliver,

> Now I'm running out of ideas

Well done on that big list of things already tried :-)

Using a normal UDP mount, I had erratic come-and-go problems with
amd until I added this to rc.conf:

   nfs_server_flags="-u -t -n 10"

It turned out I had too few nfsd daemons; 10 fixed it.

man nfsd:
   A server should run enough daemons to handle the maximum level
   of concurrency from its clients.

/etc/defaults/rc.conf:
   nfs_server_flags="-u -t -n 4"

In my case my remote amd was trying to mount all 5 of
/ /tmp /usr /var /usr1.

Might help.  Good luck.

-- 
Julian Stacey.  BSD Unix C Net Consultancy, Munich/Muenchen  http://berklix.com
Mail Ascii, not HTML.  Ihr Rauch = mein allergischer Kopfschmerz.
http://berklix.org/free-software
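[To compare the number of nfsd server threads actually running against
what rc.conf asks for, something like the following should do; the
"nfsd: server" process title is what nfsd normally sets, so adjust the
pattern if ps shows something different on your system:]

   # ps ax | grep -c '[n]fsd: server'
   # grep nfs_server_flags /etc/defaults/rc.conf /etc/rc.conf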
From: "S.C.Sprong" <scsprong@gmail.com> To: freebsd-stable@FreeBSD.ORG Subject: Re: Trouble: NFS via TCP In-Reply-To: <200611091717.kA9HH631005085@lurza.secnetix.de> X-Newsgroups: mpc.lists.freebsd.stable,muc.lists.freebsd.stable In article <200611091717.kA9HH631005085@lurza.secnetix.de> you wrote:>Symptom: As soon as I use the -T option (TCP) with the mount command, >it simply hangs forever. If I use the intr/soft flags, I can Ctrl-C >it after a while, and the mount indeed appears in the output from >"mount", but any command that tries to access it (e.g. ls(1)) also >hangs. Even umount(8) hangs.I've had the same problems and made similar observations. A few more: - Running 'tcpdump tcp port 2048' on the NFS server _after_ the client is stuck in this state causes a spontaneous reboot of the server. - Running 'netstat -a' shows that the client is stuck in an endless connect-disconnect loop and chews through port numbers. - It happens with fxp, rl, and sis cards. - TCP initial window size advertisement doesn't seem to matter. - While using UDP I may have encountered a similar bus as described in NetBSD bugs bin/20663: deadlock in cron(8) Many reboots later, I solved my problem by disabling the following tweaks I had in /etc/sysctl.conf for ages: #vfs.nfs.bufpackets=8 # 20050510: read 16 blocks instead of 8 #vfs.read_max=16 # 20060908: obsolete? #vfs.nfsrv.gatherdelay_v3=10000 And reverted to the system defaults: vfs.nfs4.nfsv3_commit_on_close: 0 vfs.nfs.bufpackets: 4 vfs.nfs.nfs_ip_paranoia: 1 vfs.nfs.nfs_directio_allow_mmap: 1 vfs.nfs.nfs_directio_enable: 0 vfs.nfs.clean_pages_on_close: 1 vfs.nfs.nfsv3_commit_on_close: 0 vfs.nfs.access_cache_timeout: 2 vfs.nfsrv.nfs_privport: 1 vfs.nfsrv.commit_miss: 0 vfs.nfsrv.commit_blks: 0 vfs.nfsrv.async: 0 vfs.nfsrv.gatherdelay_v3: 0 vfs.nfsrv.gatherdelay: 10000 Hope this helps, scs