> When I mount with large read and write sizes:
>
>     mount_nfs -r 65536 -w 65536 -U -ointr pandora:/backup /backup
>
> it changes -- for the worse. A short time into it, the file stops
> growing -- according to `ls -sl' run on the NFS server (pandora) -- at
> exactly 3200 FS blocks (the FS was created with `-b 65536 -f 8129').
>
> At the same time, according to `systat -if' on both client and server,
> the client continues to send (and the server continues to receive)
> about 30Mb of some (?) data per second.

When the client is in this state it remains quite usable, except for the
following:

 1) Trying to start `systat 1 -vm' stalls ALL access to local disks,
    apparently -- no new programs can start, and the running ones cannot
    access any data either; attempts to Ctrl-C the starting systat
    succeed only after several minutes.

 2) The writing process is stuck, unkillable, in the following state:

        CPU PRI NI     VSZ    RSS MWCHAN STAT TT     TIME
         27  -4  0 1351368 137764 nfs    DL   p4  1:05,52

    Sending it any signal has no effect. (The large sizes are explained
    by the program mmap-ing its large input and output.)

 3) A forceful umount of the share the program is writing to paralyzes
    the system for several minutes -- unlike in 1), not even the mouse
    is moving. It would seem the process is dumping core, but it is
    not -- when the system unfreezes, the only message from the kernel
    is:

        vm_fault: pager read error, pid XXXX (mzip)

Again, this is on 6.1/i386 from today, which we are about to release
into the cruel world.

Yours,

    -mi
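For anyone trying to reproduce the report, the observations above come
from stock tools; a rough sketch (the output file name and the process
id are placeholders, not values from the report):

    # On the server (pandora): is the file still growing?  The -s
    # column shows the allocated size in blocks.
    ls -sl /backup/<output-file>

    # On both client and server: per-interface traffic, 1-second
    # samples (the report abbreviates this as `systat -if').
    systat -ifstat 1

    # On the client: state of the stuck writer; MWCHAN "nfs" and STAT
    # "DL" mark an unkillable wait inside the NFS client.
    ps -axl | grep <pid>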
:When the client is in this state it remains quite usable, except for
:the following:
:
: 1) Trying to start `systat 1 -vm' stalls ALL access to local disks,
:    apparently -- no new programs can start, and the running ones
:    cannot access any data either; attempts to Ctrl-C the starting
:    systat succeed only after several minutes.
:
: 2) The writing process is stuck, unkillable, in the following state:
:
:        CPU PRI NI     VSZ    RSS MWCHAN STAT TT     TIME
:         27  -4  0 1351368 137764 nfs    DL   p4  1:05,52
:
:    Sending it any signal has no effect. (The large sizes are explained
:    by the program mmap-ing its large input and output.)
:
: 3) A forceful umount of the share the program is writing to paralyzes
:    the system for several minutes -- unlike in 1), not even the mouse
:    is moving. It would seem the process is dumping core, but it is
:    not -- when the system unfreezes, the only message from the kernel
:    is:
:
:        vm_fault: pager read error, pid XXXX (mzip)
:
:Again, this is on 6.1/i386 from today, which we are about to release
:into the cruel world.
:
:Yours,
:
:    -mi

    There are a number of problems with using a block size of 65536.

    First of all, I think you can only safely do it with a TCP mount,
    and then only if the TCP buffer size is large enough to hold an
    entire packet.  For UDP mounts, 65536 is too large: the UDP length
    field cannot even represent 65536 bytes of data (for that matter,
    the *IP* packet itself cannot exceed 65535 bytes), so 65536 will
    not work with a UDP mount.

    The second problem is related to the network driver.  The packet
    MTU is 1500, which typically means a limit of around 1460-1480
    payload bytes per packet.  A large UDP packet of, say, 48KB will be
    broken down into over 33 IP fragments.  The network stack could
    very well drop some of these fragments, making delivery of the
    overall UDP packet unreliable.

    The NFS protocol itself does allow read and write packets to be
    truncated, provided that the read or write operation is either
    bounded by the file EOF or (for a read) the remaining data is all
    zeros.  Typically the all-zeros case is only optimized by the NFS
    server when the underlying filesystem block itself is unallocated
    (i.e. a 'hole' in the file).  In all other cases the full NFS block
    size is passed between client and server.

    I would stick to an NFS block size of 8K or 16K.  Frankly, there is
    no real reason to use a larger block size.

                    -Matt
                    Matthew Dillon
                    <dillon@backplane.com>
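Following that advice, a remount of the share from the report over TCP
with 16K block sizes would look roughly like the sketch below (the
server name, paths and the intr option are taken from the original
mount command; check the exact flag spelling against mount_nfs(8) on
the release in question):

    # Sketch: TCP transport (-T) and 16K read/write sizes instead of
    # the 64K-over-UDP mount from the report.
    umount /backup
    mount_nfs -T -r 16384 -w 16384 -ointr pandora:/backup /backup

    # Rough arithmetic behind the fragmentation point: even a 48KB UDP
    # NFS block at a 1500-byte MTU (~1480 payload bytes per fragment)
    # becomes 49152 / 1480 ~ 34 IP fragments, and losing any one of
    # them costs the entire RPC.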
On 22 ... 2006 at 12:23, Matthew Dillon wrote:

> My guess is that you are exporting the filesystem as a particular
> user id that is not root (i.e. you do not have -maproot=root: in the
> exports line on the server).

Yes, indeed, re-exporting with -maproot=0 leads to normal behavior.
Thanks for the workaround!

Here are the stats: as the program is working hard, the incoming
traffic on the client is about 200Kb/s (I guess, all those
not-yet-existent pages being faulted in) and the outgoing -- about
7Kb/s, with occasional spikes to 8Mb/s (I guess, this is when the
flushing takes place).

> What is likely happening is that the NFS client is trying to push out
> the pages using the root uid rather than the user uid.  This is a
> highly probable circumstance for VM pages because once they get
> disassociated from the related buffer cache buffer, the cred
> information for the last process to modify the related VM pages is
> lost.  When the kernel tries to flush the pages out it winds up using
> root creds.

So mmap is just a more "reliable" way to trigger this problem, right?

Isn't this, like, a major bug? A file can be opened, written to for a
while, and then -- at a semi-random moment -- everything grinds to a
halt? Ouch...

Thanks a lot to all concerned for helping solve this problem.

Yours,

    -mi
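For completeness, the server-side workaround amounts to an /etc/exports
entry along these lines (a sketch only -- the client host name is a
placeholder, and the path is the one from the report):

    # /etc/exports on the server (pandora); "client.example.net" stands
    # in for the actual client host.
    /backup -maproot=root client.example.net

    # Make mountd reread the exports file:
    kill -HUP `cat /var/run/mountd.pid`

Note that -maproot=root (equivalently -maproot=0) turns off root
squashing for that client, so this is a workaround for a trusted host
rather than something to put on a world-facing export.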
:So mmap is just a more "reliable" way to trigger this problem, right?
:
:Isn't this, like, a major bug? A file can be opened, written to for a
:while, and then -- at a semi-random moment -- everything grinds to a
:halt? Ouch...
:
:Thanks a lot to all concerned for helping solve this problem.
:
:Yours,
:
:    -mi

    I consider it a bug.  I think the only way to reliably fix the
    problem is to give the client the ability to specify the uid to
    issue RPCs with in the NFS mount command, to match what the export
    does.

                    -Matt
                    Matthew Dillon
                    <dillon@backplane.com>