Alkis Georgopoulos
2019-Sep-15 06:48 UTC
[klibc] nfsmount default timeo=7 causes timeouts on 100 Mbps
I can't explain why 700 ms isn't enough to avoid timeouts on 100 Mbps networks, but my tests confirm it, so I'm writing to the list to request that you increase the default timeo to at least 30, or to 600, which is the default for `mount -t nfs`.

How to reproduce:

1) Cabling:
server <=> 100 Mbps switch <=> client

Alternatively, one can use a 1000 Mbps switch and force the link down to 100 Mbps with this command:
ethtool -s enp3s0 speed 100 duplex full autoneg on

2) Server:
apt install nfs-kernel-server
echo '/srv *(ro,async,no_subtree_check)' >> /etc/exports
exportfs -ra
truncate -s 10G /srv/10G.file
The sparse file ensures that disk I/O bandwidth isn't an issue.

3) Client:
/usr/lib/klibc/nfsmount -o timeo=7 192.168.1.112:/srv /mnt
dd if=/mnt/10G.file of=/dev/null status=progress

4) Result:
dd starts at 11.2 MB/s, which is fine/expected, and then slowly drops to 2 MB/s. It lags, skipping some seconds in its progress line, e.g.
507510784 bytes (508 MB, 484 MiB) copied, 186 s, 2,7 MB/s^C
At that point Ctrl+C needs 30+ seconds to stop dd, because of I/O waiting etc.

In another terminal tab, `dmesg -w` is full of these:
[ 316.404250] nfs: server 192.168.1.112 not responding, still trying
[ 316.759512] nfs: server 192.168.1.112 OK

With the NFS mount command defaults, timeo=600 and retrans=2, dd stays constantly at 11.2 MB/s, Ctrl+C is instant, and there's nothing in dmesg.

It is entirely possible that timeo=7 should be enough and I've run into an NFS bug, but I'm not experienced enough to troubleshoot it further without help.

If anyone can make timeo=7 work properly on 100 Mbps networks in any distribution/version, please tell me so that I can test with that. I was testing with Ubuntu 18.04.3, kernel 4.15.

Kind regards,
Alkis Georgopoulos
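The figures in the report can be cross-checked with a little arithmetic. This is a minimal sketch, assuming (per nfs(5)) that timeo is given in tenths of a second, and treating 1 MB as 10^6 bytes, which is how dd reports MB and MB/s. The helper names below are hypothetical and exist only for illustration.

```python
"""Cross-check the figures quoted in the report (illustrative sketch)."""

LINK_BPS = 100_000_000  # nominal 100 Mbps Ethernet


def timeo_to_ms(timeo: int) -> int:
    # nfs(5): timeo is expressed in deciseconds (tenths of a second)
    return timeo * 100


def link_capacity_mb_per_s(bits_per_second: int) -> float:
    # Raw line rate in decimal megabytes per second, before protocol overhead
    return bits_per_second / 8 / 1_000_000


def average_mb_per_s(bytes_copied: int, seconds: float) -> float:
    # Average throughput in decimal MB/s, the unit dd prints
    return bytes_copied / seconds / 1_000_000


if __name__ == "__main__":
    print(f"timeo=7   -> {timeo_to_ms(7)} ms")
    print(f"timeo=600 -> {timeo_to_ms(600)} ms")
    cap = link_capacity_mb_per_s(LINK_BPS)
    print(f"100 Mbps raw capacity: {cap:.1f} MB/s")
    print(f"11.2 MB/s is {11.2 / cap:.0%} of that")
    print(f"degraded run: {average_mb_per_s(507_510_784, 186):.2f} MB/s on average")
```

Under these assumptions, timeo=7 is 700 ms and timeo=600 is 60 s; the healthy 11.2 MB/s is roughly 90% of the 12.5 MB/s raw line rate, and 507510784 bytes in 186 s averages about 2.7 MB/s, matching the figure dd printed.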
Alkis Georgopoulos
2019-Sep-15 07:51 UTC
[klibc] nfsmount default timeo=7 causes timeouts on 100 Mbps
I think I got it.

Both nfsmount and `mount -t nfs` now default to rsize/wsize = 1 MB. By lowering this to 32K, all issues are gone, even with the default timeo=7, and nfsroot=xxx client responsiveness is a whole lot better.

I think that when nfsmount was initially written, the default rsize/wsize were much lower, which matched timeo=7. Now they cause the lags/timeouts that I reported.

So please, instead of increasing timeo, decrease the default rsize/wsize to 32K (or tell me where that 1 MB default comes from, so that I can report it there).

Thanks,
Alkis
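One way to see why a 1 MB rsize might interact badly with timeo=7 on a 100 Mbps link is to estimate how long a single READ reply occupies the wire. This is a back-of-the-envelope model under stated assumptions, not an analysis of the kernel's RPC code: it assumes replies are sent back-to-back at the full 100 Mbps line rate, ignores TCP/IP and RPC header overhead, and takes "1 MB" and "32K" to mean 1 MiB (1048576 bytes) and 32 KiB (32768 bytes). If several READs are outstanding at once, a reply queued behind others waits for them to drain, so the count of replies that fit in the timeo window is a rough bound on how many can be queued before the last one risks a retransmission. The function names are hypothetical.

```python
"""Back-of-the-envelope model: READ reply wire time vs. the timeo budget.

Assumptions (not taken from the kernel source): replies are serialized on a
100 Mbps link at full line rate, with no header or framing overhead.
"""

LINK_BPS = 100_000_000   # nominal 100 Mbps Ethernet
TIMEO_DECISECONDS = 7    # klibc nfsmount default discussed in this thread


def reply_wire_time_ms(rsize_bytes: int, link_bps: int = LINK_BPS) -> float:
    # Time to transmit one rsize-byte payload, in milliseconds
    return rsize_bytes * 8 / link_bps * 1000


def replies_within_timeo(rsize_bytes: int, timeo_ds: int = TIMEO_DECISECONDS) -> int:
    # Number of back-to-back replies that complete within the timeout window
    budget_ms = timeo_ds * 100  # timeo is in tenths of a second
    return int(budget_ms // reply_wire_time_ms(rsize_bytes))


if __name__ == "__main__":
    for label, rsize in (("1 MiB", 1 << 20), ("32 KiB", 32 << 10)):
        print(f"rsize={label:>6}: {reply_wire_time_ms(rsize):6.2f} ms per reply, "
              f"{replies_within_timeo(rsize)} replies fit in {TIMEO_DECISECONDS * 100} ms")
```

In this model a 1 MiB reply takes about 84 ms, so only about eight of them fit in the 700 ms window, while a 32 KiB reply takes under 3 ms. That is consistent with, though it does not prove, the observation that rsize=32K makes timeo=7 workable.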
Alkis Georgopoulos
2019-Sep-20 04:58 UTC
[klibc] nfsmount default timeo=7 causes timeouts on 100 Mbps
In case anyone's interested, I followed up on the linux-nfs mailing list:
https://marc.info/?l=linux-nfs&m=156887818618861&w=2

Thanks,
Alkis