Alkis Georgopoulos
2019-Sep-20 04:58 UTC
[klibc] nfsmount default timeo=7 causes timeouts on 100 Mbps
In case anyone's interested, I followed up in the linux-nfs mailing list:
https://marc.info/?l=linux-nfs&m=156887818618861&w=2

Thanks,
Alkis

On 9/15/19 10:51 AM, Alkis Georgopoulos wrote:
> I think I got it.
>
> Both nfsmount and `mount -t nfs` now default to rsize/wsize = 1 MB.
> By lowering this to 32K, all issues are gone, even with the default
> timeo=7. And nfsroot=xxx client responsiveness is a whole lot better.
>
> I think when nfsmount was initially written, the default rsize/wsize
> were much lower, which matched the timeo=7.
>
> Now they cause the lags/timeouts that I reported.
>
> So please, instead of increasing timeo, decrease the default rsize/wsize
> to 32K.
> (or, where does that 1 MB default come from, so that I report it there...)
>
> Thanks,
> Alkis
>
> On 9/15/19 9:48 AM, Alkis Georgopoulos wrote:
>> I can't explain why 700 msecs aren't enough to avoid timeouts in 100
>> Mbps networks, but my tests verify it, so I'm writing to the list to
>> request that you increase the default timeo to at least 30, or to 600
>> which is the default for `mount -t nfs`.
>>
>> How to reproduce:
>>
>> 1) Cabling:
>> server <=> 100 Mbps switch <=> client
>>
>> Alternatively, one can use a 1000 Mbps switch and this command:
>> ethtool -s enp3s0 speed 100 duplex full autoneg on
>>
>> 2) Server:
>> apt install nfs-kernel-server
>> echo '/srv *(ro,async,no_subtree_check)' >> /etc/exports
>> exportfs -ra
>> truncate -s 10G /srv/10G.file
>> The sparse file ensures that disk IO bandwidth isn't an issue.
>>
>> 3) Client:
>> /usr/lib/klibc/nfsmount -o timeo=7 192.168.1.112:/srv /mnt
>> dd if=/mnt/10G.file of=/dev/null status=progress
>>
>> 4) Result:
>> dd there starts with 11.2 MB/sec, which is fine/expected,
>> and it slowly drops to 2 MB/sec after a while,
>> it lags, omitting some seconds in its output line,
>> e.g. 507510784 bytes (508 MB, 484 MiB) copied, 186 s, 2,7 MB/s^C,
>> at which point "Ctrl+C" needs 30+ seconds to stop dd,
>> because of IO waiting etc.
>>
>> In another terminal tab, `dmesg -w` is full of these:
>> [  316.404250] nfs: server 192.168.1.112 not responding, still trying
>> [  316.759512] nfs: server 192.168.1.112 OK
>>
>> By using the NFS mount command defaults, timeo=600 and retrans=2, dd
>> is constantly at 11.2 MB/sec, Ctrl+C is instant, and there's nothing
>> in dmesg.
>>
>> It is entirely possible that timeo=7 should be enough and I bumped
>> into an NFS bug, but I'm not experienced enough to troubleshoot it
>> more without help.
>>
>> If anyone can make timeo=7 work properly in 100 Mbps networks in any
>> distribution/version, please tell me to test with that.
>> I was testing with Ubuntu 18.04.3, kernel 4.15.
>>
>> Kind regards,
>> Alkis Georgopoulos
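
The mismatch between timeo=7 and a 1 MB rsize described in this message can be checked with some rough arithmetic. The following is a minimal Python sketch, not a description of how the NFS client actually schedules requests: it takes the 11.2 MB/sec dd rate reported above as the usable link throughput, reads timeo=7 as 0.7 seconds (the "700 msecs" mentioned above), and assumes that READ replies are delivered strictly one after another. The function names are made up for illustration.

```python
# Back-of-the-envelope model; the one-reply-at-a-time assumption is NOT from the thread.
TIMEO_DECISECONDS = 7      # klibc nfsmount default discussed in the thread (0.7 s)
THROUGHPUT_BPS = 11.2e6    # bytes/s, the dd rate reported on the 100 Mbps link


def wire_time(rsize: int, throughput: float = THROUGHPUT_BPS) -> float:
    """Seconds needed to move one READ reply of `rsize` bytes over the link."""
    return rsize / throughput


def reads_within_timeout(rsize: int, timeo: int = TIMEO_DECISECONDS) -> int:
    """Number of back-to-back replies that complete before the timeout expires."""
    return int((timeo / 10) / wire_time(rsize))


for rsize in (1048576, 32768, 4096):
    print(f"rsize={rsize:>7}: {wire_time(rsize) * 1000:7.2f} ms per reply, "
          f"{reads_within_timeout(rsize)} replies fit in {TIMEO_DECISECONDS / 10:.1f} s")
```

Under this model a 1 MB reply occupies the link for roughly 94 ms, so only seven of them fit into the 0.7 s window, while 239 replies of 32K do. If the client keeps more than a handful of 1 MB reads in flight (an assumption, e.g. because of readahead), the later ones could exceed the timeout even though the link is healthy.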
Thorsten Glaser
2019-Sep-20 19:19 UTC
[klibc] nfsmount default timeo=7 causes timeouts on 100 Mbps
Alkis Georgopoulos dixit:

> In case anyone's interested, I followed up in the linux-nfs mailing list:
> https://marc.info/?l=linux-nfs&m=156887818618861&w=2

Thanks, I am interested in how this plays out and read through the thread.

We used to use NFS with our terminal server infrastructure as well and
noticed similar problems, but we did not have the knowledge to track
this down and are on LTSP sshfs now.

bye,
//mirabilos
--
Thorsten Glaser (Founding Member)
Teckids e.V. - Digital freedom with youth and education
https://www.teckids.org/
Alkis Georgopoulos
2019-Sep-21 11:14 UTC
[klibc] nfsmount default timeo=7 causes timeouts on 100 Mbps
I managed to get to the bottom of this, and filed a bug report for NFS:
https://bugzilla.kernel.org/show_bug.cgi?id=204939

Klibc nfsmount still has a bug: it needs to NOT hardcode timeo=7.

Either the NFS defaults should be used,
which result in: timeo=600,rsize=1048576,wsize=1048576,
or at least the kernel documented defaults,
https://www.kernel.org/doc/Documentation/filesystems/nfs/nfsroot.txt
which are: timeo=7,rsize=4096,wsize=4096

But clearly hardcoding (the old) timeo=7 and leaving the (new) default
rsize=1048576 is wrong and it's causing dmesg errors.

With all the workarounds applied, the traffic for netbooting a client
with ext4-over-nfs dropped from e.g. 1160 MB to 221 MB, i.e. it's a
major improvement.
Ben Hutchings
2019-Oct-07 16:14 UTC
[klibc] nfsmount default timeo=7 causes timeouts on 100 Mbps
On Sat, 2019-09-21 at 14:14 +0300, Alkis Georgopoulos wrote:
> I managed to get to the bottom of this, and filed a bug report for NFS:
> https://bugzilla.kernel.org/show_bug.cgi?id=204939
>
> Klibc nfsmount still has a bug: it needs to NOT hardcode timeo=7.

Right. It looks like we should set it to 0 and the kernel will then
substitute its default value. This is what nfs-utils appears to do.

Ben.

> Either the NFS defaults should be used,
> which result in: timeo=600,rsize=1048576,wsize=1048576,
> or at least the kernel documented defaults,
> https://www.kernel.org/doc/Documentation/filesystems/nfs/nfsroot.txt
> which are: timeo=7,rsize=4096,wsize=4096
>
> But clearly hardcoding (the old) timeo=7 and leaving the (new) default
> rsize=1048576 is wrong and it's causing dmesg errors.
>
> With all the workarounds applied, the traffic for netbooting a client
> with ext4-over-nfs dropped from e.g. 1160 MB to 221 MB, i.e. it's a
> major improvement.

--
Ben Hutchings
Nothing is ever a complete failure;
it can always serve as a bad example.
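
The idea in this reply, passing 0 so that the kernel substitutes its own default, is a sentinel-value pattern. The following minimal Python sketch only models that behaviour; it is not klibc or kernel code. `effective_timeo` is a hypothetical helper, and the default of 600 is the `mount -t nfs` figure quoted earlier in the thread, used here as an assumption.

```python
# Illustrative model only; the real substitution would happen inside the kernel NFS client.
KERNEL_DEFAULT_TIMEO = 600  # deciseconds; `mount -t nfs` default quoted in the thread


def effective_timeo(requested: int, kernel_default: int = KERNEL_DEFAULT_TIMEO) -> int:
    """Return the timeout in effect when 0 means 'use the default' (hypothetical helper)."""
    if requested < 0:
        raise ValueError("timeo must be non-negative")
    # 0 is the sentinel: the caller expressed no preference, so the default applies.
    return kernel_default if requested == 0 else requested


print(effective_timeo(0))   # no explicit timeo given
print(effective_timeo(7))   # explicit timeo=7 is still honoured
```

With this pattern an explicit `-o timeo=7` keeps working for anyone who asks for it, while a mount without the option no longer inherits a hardcoded value.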