Hi,

I've got a very weird problem with NFS mounts on a RELENG_6
machine (a.k.a. 6.2-PRERELEASE, sources synced yesterday,
November 8th).  It's an HP ProLiant DL360 G4 (G4p to be exact),
but that shouldn't matter.  I've been banging my head on the
table for several hours, but I can't find the source of the
problem.  :-(

What I'm trying to do should be very simple: mounting an NFS
directory via TCP (instead of UDP, which is the default), like
this:

   # mount_nfs -T -3 -R 3 -i -s -o ro 127.0.0.1:/localdisk /nfs/test

Symptom: As soon as I use the -T option (TCP) with the mount
command, it simply hangs forever.  If I use the intr/soft flags,
I can Ctrl-C it after a while, and the mount indeed appears in
the output from "mount", but any command that tries to access it
(e.g. ls(1)) also hangs.  Even umount(8) hangs.

More observations:

 - UDP works perfectly fine.  No problems at all.
 - Other TCP connections besides NFS (e.g. ssh) work fine.
 - IPF is present, but disabled (ipf -D).
 - IPFW only contains the default "allow any to any" rule.
 - The interface doesn't matter.  Mounting from localhost
   (via lo0) has the same problem as mounting via a real NIC.
 - I first observed the problem on RELENG_6 of 2006-10-19
   (but it could be much older, because I hadn't tried
   NFS-via-TCP on this machine before).  Then I updated to
   2006-11-08; no change.
 - SMP or UP kernel doesn't make a difference.
 - No special compiler flags; make.conf is empty.
 - Kernel config is GENERIC with a few additions for more
   shared memory and semaphores (so Squid and PostgreSQL are
   happy) and some other unrelated details.
 - Nothing suspicious in dmesg.  The kernel prints nothing
   during the mount attempts.
 - Output from rpcinfo -p looks good.
 - tcpdump shows that the TCP connection is immediately shut
   down: after connecting successfully, it sends a FIN, then
   reconnects, and so on ad infinitum.  Meanwhile
   vfs.nfs.reconnects increases slowly.
 - On a different machine (different hardware, but the same
   RELENG_6 and a very similar kernel config), the problem does
   *NOT* occur.  I compared the sysctl variables relevant to
   nfs, rpc and tcp, and they're all the same.  rpcinfo -p is
   also identical.

Now I'm running out of ideas ...  Obviously there must be
something special about this machine, because everything works
fine on a different one, but I'm not able to find out what it is.
I even considered putting a few printf() calls into some places
in sys/nfsclient/nfs_socket.c to find out what's going on, but
I'm not sure whether that makes sense or whether it would give
any useful results.

Any hints and ideas will be greatly appreciated.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD:  http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"And believe me, as a C++ programmer, I don't hesitate to question
the decisions of language designers.  After a decent amount of C++
exposure, Python's flaws seem ridiculously small." -- Ville Vainio
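[For reference, the connect/FIN loop and the slowly growing reconnect
counter described above can be watched from a second terminal with
something along these lines; this is a minimal sketch, assuming the
loopback mount from the example and NFS on its standard port 2049:]

   # tcpdump -n -i lo0 tcp port 2049
   # while true; do sysctl vfs.nfs.reconnects; sleep 5; done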
On Thu, Nov 09, 2006 at 06:17:06PM +0100, Oliver Fromme wrote:
> I've got a very weird problem with NFS mounts on a RELENG_6
> machine (a.k.a. 6.2-PRERELEASE, sources synced yesterday,
> November 8th).  It's an HP ProLiant DL360 G4 (G4p to be
> exact), but that shouldn't matter.  I've been banging my
> head on the table for several hours, but I can't find the
> source of the problem.  :-(

Is this machine using pf/pfil?  If so, are you using "scrub"
at all?  If so, don't.  :-)

-- 
| Jeremy Chadwick                              jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                Mountain View, CA, USA |
| Making life hard for others since 1977.          PGP: 4BD6C0CB   |
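[A quick way to check this is sketched below; it assumes pf would have
been loaded as the pf.ko module rather than compiled into the kernel,
and simply greps the full pfctl dump for any scrub rules:]

   # kldstat | grep pf
   # pfctl -s all 2>/dev/null | grep -i scrub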
Hi Oliver,

> Now I'm running out of ideas

Well done on that big list of things already tried :-)

Using a normal UDP mount, I had erratic come-and-go problems with
amd until I added this to rc.conf:

   nfs_server_flags="-u -t -n 10"

It turned out I had too few nfsd daemons; 10 fixed it.

man nfsd:
   A server should run enough daemons to handle the maximum level
   of concurrency from its clients.

/etc/defaults/rc.conf:
   nfs_server_flags="-u -t -n 4"

In my case my remote amd was trying to mount all 5 of
/ /tmp /usr /var /usr1.

Might help.  Good luck.

-- 
Julian Stacey.  BSD Unix C Net Consultancy, Munich/Muenchen  http://berklix.com
Mail Ascii, not HTML.  Ihr Rauch = mein allergischer Kopfschmerz.
http://berklix.org/free-software
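[To compare the number of nfsd server threads actually running against
what rc.conf asks for, something like the following should do; the
"nfsd: server" process title is what nfsd normally sets, so adjust the
pattern if ps shows something different on your system:]

   # ps ax | grep -c '[n]fsd: server'
   # grep nfs_server_flags /etc/defaults/rc.conf /etc/rc.conf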
From: "S.C.Sprong" <scsprong@gmail.com> To: freebsd-stable@FreeBSD.ORG Subject: Re: Trouble: NFS via TCP In-Reply-To: <200611091717.kA9HH631005085@lurza.secnetix.de> X-Newsgroups: mpc.lists.freebsd.stable,muc.lists.freebsd.stable In article <200611091717.kA9HH631005085@lurza.secnetix.de> you wrote:>Symptom: As soon as I use the -T option (TCP) with the mount command, >it simply hangs forever. If I use the intr/soft flags, I can Ctrl-C >it after a while, and the mount indeed appears in the output from >"mount", but any command that tries to access it (e.g. ls(1)) also >hangs. Even umount(8) hangs.I've had the same problems and made similar observations. A few more: - Running 'tcpdump tcp port 2048' on the NFS server _after_ the client is stuck in this state causes a spontaneous reboot of the server. - Running 'netstat -a' shows that the client is stuck in an endless connect-disconnect loop and chews through port numbers. - It happens with fxp, rl, and sis cards. - TCP initial window size advertisement doesn't seem to matter. - While using UDP I may have encountered a similar bus as described in NetBSD bugs bin/20663: deadlock in cron(8) Many reboots later, I solved my problem by disabling the following tweaks I had in /etc/sysctl.conf for ages: #vfs.nfs.bufpackets=8 # 20050510: read 16 blocks instead of 8 #vfs.read_max=16 # 20060908: obsolete? #vfs.nfsrv.gatherdelay_v3=10000 And reverted to the system defaults: vfs.nfs4.nfsv3_commit_on_close: 0 vfs.nfs.bufpackets: 4 vfs.nfs.nfs_ip_paranoia: 1 vfs.nfs.nfs_directio_allow_mmap: 1 vfs.nfs.nfs_directio_enable: 0 vfs.nfs.clean_pages_on_close: 1 vfs.nfs.nfsv3_commit_on_close: 0 vfs.nfs.access_cache_timeout: 2 vfs.nfsrv.nfs_privport: 1 vfs.nfsrv.commit_miss: 0 vfs.nfsrv.commit_blks: 0 vfs.nfsrv.async: 0 vfs.nfsrv.gatherdelay_v3: 0 vfs.nfsrv.gatherdelay: 10000 Hope this helps, scs