> When I mount with large read and write sizes:
>
>     mount_nfs -r 65536 -w 65536 -U -ointr pandora:/backup /backup
>
> it changes -- for the worse. A short time into it, the file stops
> growing -- according to `ls -sl' run on the NFS server (pandora) -- at
> exactly 3200 FS blocks (the FS was created with `-b 65536 -f 8129').
>
> At the same time, according to `systat -if' on both client and server,
> the client continues to send (and the server continues to receive)
> about 30Mb of some (?) data per second.

When the client is in this state it remains quite usable, except for the
following:

 1) Trying to start `systat 1 -vm' stalls ALL access to local disks,
    apparently -- no new programs can start, and the running ones cannot
    access any data either; attempts to Ctrl-C the starting systat
    succeed only after several minutes.

 2) The writing process is stuck, unkillable, in the following state:

        CPU PRI NI     VSZ    RSS MWCHAN STAT TT     TIME
         27  -4  0 1351368 137764 nfs    DL   p4  1:05,52

    Sending it any signal has no effect. (The large sizes are explained
    by the program mmap-ing its large input and output.)

 3) A forceful umount of the share the program is writing to paralyzes
    the system for several minutes -- unlike in 1), not even the mouse
    is moving. It would seem the process is dumping core, but it is
    not -- when the system unfreezes, the only message from the kernel
    is:

        vm_fault: pager read error, pid XXXX (mzip)

Again, this is on 6.1/i386 from today, which we are about to release
into the cruel world.

Yours,

    -mi
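For anyone trying to reproduce the report, the observations above come
from stock tools; a rough sketch (the output file name and the process
id are placeholders, not values from the report):

    # On the server (pandora): is the file still growing?  The -s
    # column shows the allocated size in blocks.
    ls -sl /backup/<output-file>

    # On both client and server: per-interface traffic, 1-second
    # samples (the report abbreviates this as `systat -if').
    systat -ifstat 1

    # On the client: state of the stuck writer; MWCHAN "nfs" and STAT
    # "DL" mark an unkillable wait inside the NFS client.
    ps -axl | grep <pid>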
:When the client is in this state it remains quite usable, except for
:the following:
:
: 1) Trying to start `systat 1 -vm' stalls ALL access to local disks,
:    apparently -- no new programs can start, and the running ones
:    cannot access any data either; attempts to Ctrl-C the starting
:    systat succeed only after several minutes.
:
: 2) The writing process is stuck, unkillable, in the following state:
:
:        CPU PRI NI     VSZ    RSS MWCHAN STAT TT     TIME
:         27  -4  0 1351368 137764 nfs    DL   p4  1:05,52
:
:    Sending it any signal has no effect. (The large sizes are explained
:    by the program mmap-ing its large input and output.)
:
: 3) A forceful umount of the share the program is writing to paralyzes
:    the system for several minutes -- unlike in 1), not even the mouse
:    is moving. It would seem the process is dumping core, but it is
:    not -- when the system unfreezes, the only message from the kernel
:    is:
:
:        vm_fault: pager read error, pid XXXX (mzip)
:
:Again, this is on 6.1/i386 from today, which we are about to release
:into the cruel world.
:
:Yours,
:
:    -mi

    There are a number of problems with using a block size of 65536.

    First of all, I think you can only safely do it with a TCP mount,
    and then only if the TCP buffer size is large enough to hold an
    entire packet.  For UDP mounts, 65536 is too large: the UDP length
    field cannot even represent 65536 bytes of data (for that matter,
    the *IP* packet itself cannot exceed 65535 bytes), so 65536 will
    not work with a UDP mount.

    The second problem is related to the network driver.  The packet
    MTU is 1500, which typically means a limit of around 1460-1480
    payload bytes per packet.  A large UDP packet of, say, 48KB will be
    broken down into over 33 IP fragments.  The network stack could
    very well drop some of these fragments, making delivery of the
    overall UDP packet unreliable.

    The NFS protocol itself does allow read and write packets to be
    truncated, provided that the read or write operation is either
    bounded by the file EOF or (for a read) the remaining data is all
    zeros.  Typically the all-zeros case is only optimized by the NFS
    server when the underlying filesystem block itself is unallocated
    (i.e. a 'hole' in the file).  In all other cases the full NFS block
    size is passed between client and server.

    I would stick to an NFS block size of 8K or 16K.  Frankly, there is
    no real reason to use a larger block size.

                    -Matt
                    Matthew Dillon
                    <dillon@backplane.com>
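Following that advice, a remount of the share from the report over TCP
with 16K block sizes would look roughly like the sketch below (the
server name, paths and the intr option are taken from the original
mount command; check the exact flag spelling against mount_nfs(8) on
the release in question):

    # Sketch: TCP transport (-T) and 16K read/write sizes instead of
    # the 64K-over-UDP mount from the report.
    umount /backup
    mount_nfs -T -r 16384 -w 16384 -ointr pandora:/backup /backup

    # Rough arithmetic behind the fragmentation point: even a 48KB UDP
    # NFS block at a 1500-byte MTU (~1480 payload bytes per fragment)
    # becomes 49152 / 1480 ~ 34 IP fragments, and losing any one of
    # them costs the entire RPC.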
On 22 ... 2006 at 12:23, Matthew Dillon wrote:

> My guess is that you are exporting the filesystem as a particular
> user id that is not root (i.e. you do not have -maproot=root: in the
> exports line on the server).

Yes, indeed, re-exporting with -maproot=0 leads to normal behavior.
Thanks for the workaround!

Here are the stats: as the program is working hard, the incoming
traffic on the client is about 200Kb/s (I guess, all those
not-yet-existent pages being faulted in) and the outgoing -- about
7Kb/s, with occasional spikes to 8Mb/s (I guess, this is when the
flushing takes place).

> What is likely happening is that the NFS client is trying to push out
> the pages using the root uid rather than the user uid.  This is a
> highly probable circumstance for VM pages because once they get
> disassociated from the related buffer cache buffer, the cred
> information for the last process to modify the related VM pages is
> lost.  When the kernel tries to flush the pages out it winds up using
> root creds.

So mmap is just a more "reliable" way to trigger this problem, right?

Isn't this, like, a major bug? A file can be opened, written to for a
while, and then -- at a semi-random moment -- everything grinds to a
halt? Ouch...

Thanks a lot to all concerned for helping solve this problem.

Yours,

    -mi
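For completeness, the server-side workaround amounts to an /etc/exports
entry along these lines (a sketch only -- the client host name is a
placeholder, and the path is the one from the report):

    # /etc/exports on the server (pandora); "client.example.net" stands
    # in for the actual client host.
    /backup -maproot=root client.example.net

    # Make mountd reread the exports file:
    kill -HUP `cat /var/run/mountd.pid`

Note that -maproot=root (equivalently -maproot=0) turns off root
squashing for that client, so this is a workaround for a trusted host
rather than something to put on a world-facing export.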
:So mmap is just a more "reliable" way to trigger this problem, right?
:
:Isn't this, like, a major bug? A file can be opened, written to for a
:while, and then -- at a semi-random moment -- everything grinds to a
:halt? Ouch...
:
:Thanks a lot to all concerned for helping solve this problem.
:
:Yours,
:
:    -mi

    I consider it a bug.  I think the only way to reliably fix the
    problem is to give the client the ability to specify the uid to
    issue RPCs with in the NFS mount command, to match what the export
    does.

                    -Matt
                    Matthew Dillon
                    <dillon@backplane.com>