hw
2020-May-18 11:13 UTC
[CentOS] how does autofs deal with stuck NFS mounts and suspending to RAM?
Hi,

after trying sshfs to mount a remote file system on a server with the result that sshfs will sooner or later get stuck and require a reboot of the client, I'm fed up with it and am looking for alternatives.

So next I would like to use NFS over a VPN connection instead. To minimize the instances of the NFS mount getting stuck, it might be helpful to use autofs.

What happens when the mount is stuck because the connection is down and autofs figures the idle timeout has expired and tries to unmount the remote file system?

What happens when I put the client to sleep by suspending to RAM? Will autofs automatically unmount first, or will the server have to deal with a client that has apparently gone away and might re-appear later in unexpected ways?

Is there a way to tell NFS to retry an operation _now_ after the connection went down and came back, rather than having to wait for a possibly rather long time?

Is there a better alternative for mounting remote file systems over unreliable connections?
Warren Young
2020-May-18 23:36 UTC
[CentOS] how does autofs deal with stuck NFS mounts and suspending to RAM?
On May 18, 2020, at 5:13 AM, hw <hw at gc-24.de> wrote:
>
> Is there a better alternative for mounting remote file systems over unreliable
> connections?

I don't have a good answer for you, because if you'd asked me without all this backstory whether NFS or SSHFS is more tolerant of bad connections, I'd have told you SSHFS.

NFS comes out of the "Unix lab" world, where all of the computers are hard-wired to nearby servers. It gets really annoyed when packet loss starts happening, and since it's down in the kernel, that can mean the whole box locks up until NFS gets happy again.

NFS is that way on purpose: it's often used to provide critical file service (e.g. root-on-NFS) so if file I/O stops happening it *must* block and wait out the failure, else all I/O dependent on NFS starts failing.

Some of this affects SSHFS as well. To some extent, the solution to the broader problem is "Dropbox" et al. That is, a solution that was designed around the idea that connectivity might not be constant.

This is also why DVCSes like Git have become popular.
hw
2020-May-19 10:19 UTC
[CentOS] how does autofs deal with stuck NFS mounts and suspending to RAM?
On Tuesday, May 19, 2020 1:36:03 AM CEST Warren Young wrote:
> On May 18, 2020, at 5:13 AM, hw <hw at gc-24.de> wrote:
> > Is there a better alternative for mounting remote file systems over
> > unreliable connections?
>
> I don't have a good answer for you, because if you'd asked me without all
> this backstory whether NFS or SSHFS is more tolerant of bad connections,
> I'd have told you SSHFS.

That's what I thought. Should I make a bug report? Sshfs is clearly intended to reconnect automatically when mounted like that, and it doesn't do that.

> NFS comes out of the "Unix lab" world, where all of the computers are
> hard-wired to nearby servers. It gets really annoyed when packet loss
> starts happening, and since it's down in the kernel, that can mean the
> whole box locks up until NFS gets happy again.

It's intended to do that, which is fine. Sshfs is intended to do that as well. Both are supposed to reconnect when the connection is back. So far, sshfs has failed to do that to the extent that it is unusable. So far, NFS with autofs hasn't caused issues, yet the testing continues. It's also a lot faster, even though I used compression with sshfs.

> NFS is that way on purpose: it's often used to provide critical file service
> (e.g. root-on-NFS) so if file I/O stops happening it *must* block and wait
> out the failure, else all I/O dependent on NFS starts failing.
>
> Some of this affects SSHFS as well. To some extent, the solution to the
> broader problem is "Dropbox" et al. That is, a solution that was designed
> around the idea that connectivity might not be constant.

Well, I need the file system accessible like a file system, not involving storing files somewhere else and downloading them somewhere else or somehow syncing some files manually between servers and clients once in a while. How am I supposed to work remotely when I don't have access to the files involved?

> This is also why DVCSes like Git have become popular.

Are you sure that's the reason?
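
For readers wondering what "mounted like that" usually involves: the thread does not show the actual sshfs invocation, so the following is only a sketch under assumptions. It prints (rather than runs) an sshfs command line that combines the sshfs `reconnect` option with ssh keepalive settings. The helper name `sshfs_cmd`, the host, the paths, and the interval values are all hypothetical.

```shell
#!/bin/bash

# sshfs_cmd REMOTE MOUNTPOINT
# Print an sshfs command line; nothing is mounted by this helper.
sshfs_cmd() {
    local remote=$1 mountpoint=$2
    # reconnect: sshfs option asking it to re-establish a dropped session.
    # ServerAliveInterval / ServerAliveCountMax: ssh keepalive options that
    # sshfs passes through; the values here are illustrative assumptions.
    local opts="reconnect,ServerAliveInterval=15,ServerAliveCountMax=3"
    printf 'sshfs -o %s %s %s\n' "$opts" "$remote" "$mountpoint"
}

# Hypothetical host and paths, for illustration only.
sshfs_cmd user@server.example.com:/srv/data /mnt/remote
```

Whether these options are enough to avoid the hangs described in this thread is not something the thread establishes.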
Jonathan Billings
2020-May-19 12:22 UTC
[CentOS] how does autofs deal with stuck NFS mounts and suspending to RAM?
On Mon, May 18, 2020 at 05:36:03PM -0600, Warren Young wrote:
> On May 18, 2020, at 5:13 AM, hw <hw at gc-24.de> wrote:
> >
> > Is there a better alternative for mounting remote file systems
> > over unreliable connections?
>
> I don't have a good answer for you, because if you'd asked me
> without all this backstory whether NFS or SSHFS is more tolerant of
> bad connections, I'd have told you SSHFS.

On the other hand, NFS is a fully-featured filesystem that supports fancy features like locking and a full ACL system. SSHFS is a FUSE filesystem that will break a lot of software if you try to use it for anything more complex than 'ls' and 'cp'.

For what it's worth, Samba with SMBv3 and the POSIX extension[1] is a lot more tolerant of bad connections, and presents itself as a real filesystem under linux.

1. https://wiki.samba.org/index.php/SMB3-Linux

--
Jonathan Billings <billings at negate.org>
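
As a rough illustration of the SMBv3 suggestion above, here is a sketch that prints (rather than runs) a `mount -t cifs` command requesting SMB 3.1.1 with the POSIX extensions. It assumes a kernel and server recent enough to accept the `posix` mount option. The helper name `cifs_mount_cmd`, the share, the mount point, and the credentials file path are hypothetical and do not come from the thread.

```shell
#!/bin/bash

# cifs_mount_cmd SHARE MOUNTPOINT [CREDENTIALS_FILE]
# Print a mount command for review; run the result as root if it looks right.
cifs_mount_cmd() {
    local share=$1 mountpoint=$2
    # Hypothetical default location for the credentials file.
    local creds=${3:-/etc/cifs-credentials}
    # vers=3.1.1 selects the SMB dialect; posix requests the POSIX extensions
    # (assumption: both client kernel and server support them).
    local opts="vers=3.1.1,posix,credentials=${creds}"
    printf 'mount -t cifs -o %s %s %s\n' "$opts" "$share" "$mountpoint"
}

# Hypothetical share and mount point, for illustration only.
cifs_mount_cmd //server.example.com/data /mnt/data
```

If the mount is rejected, dropping `posix` and keeping only the `vers=` option is a reasonable first thing to try.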
Orion Poplawski
2020-May-21 02:55 UTC
[CentOS] how does autofs deal with stuck NFS mounts and suspending to RAM?
On 5/18/20 5:13 AM, hw wrote:
> Hi,
>
> after trying sshfs to mount a remote file system on a server with the result
> that sshfs will sooner or later get stuck and require a reboot of the client,
> I'm fed up with it and am looking for alternatives.
>
> So next I would like to use NFS over a VPN connection instead. To minimize
> the instances of the NFS mount getting stuck, it might be helpful to use
> autofs.
>
> What happens when the mount is stuck because the connection is down and autofs
> figures the idle timeout has expired and tries to unmount the remote file
> system?

Nothing good, and bad things happen before this.

> What happens when I put the client to sleep by suspending to RAM? Will autofs
> automatically unmount first, or will the server have to deal with a client
> that has apparently gone away and might re-appear later in unexpected ways?

This is the mechanism that I use to try to mitigate this on our systems:

This triggers on suspend type events:

# cat /etc/systemd/system/suspend.target.wants/offnet.service
[Unit]
Description=Unmount all NFS mounts before disconnecting from network
Before=systemd-hibernate.service
Before=systemd-shutdown.service
Before=systemd-suspend.service

[Service]
ExecStart=/usr/local/sbin/offnet
Type=oneshot

[Install]
WantedBy=hibernate.target
WantedBy=shutdown.target
WantedBy=suspend.target

----

This triggers when you bring down a vpn connection with NetworkManager:

# cat /etc/NetworkManager/dispatcher.d/pre-down.d/autofs
#!/bin/bash

if [ -x /usr/bin/logger ]; then
  LOGGER="/usr/bin/logger -s -p user.notice -t $0"
else
  LOGGER=echo
fi

[ -z "${DEVICE_IP_IFACE}" ] && exit

# Unmount NFS and shutdown autofs if we are shutting down the last ethernet device or exiting vpn
if [ "$(/usr/bin/nmcli --terse --fields 'device,type' c show --active | grep -v "^${DEVICE_IP_IFACE}:" | grep -c :802-)" -eq 0 -o \
     "${DEVICE_IP_IFACE}" = tun0 ]; then
  $LOGGER "Unmounting NFS/CIFS directories"
  /usr/local/sbin/offnet
  $LOGGER "Performing autofs pre-down stop"
  systemctl stop autofs.service
fi

----

# cat /usr/local/sbin/offnet
#!/bin/bash

. /etc/init.d/functions

# __umount_loop awk_program fstab_file first_msg retry_msg retry_umount_args
# awk_program should process fstab_file and return a list of fstab-encoded
# paths; it doesn't have to handle comments in fstab_file.
__umount_loop() {
  local remaining sig
  local retry=3 count

  remaining=$(LC_ALL=C awk "/^#/ {next} $1" "$2" | sort -r)
  while [ -n "$remaining" -a "$retry" -gt 0 ]; do
    if [ "$retry" -eq 3 ]; then
      action "$3" umount $remaining
    else
      action "$4" umount $5 $remaining
    fi
    count=4
    remaining=$(LC_ALL=C awk "/^#/ {next} $1" "$2" | sort -r)
    while [ "$count" -gt 0 ]; do
      [ -z "$remaining" ] && break
      count=$(($count-1))
      usleep 500000
      remaining=$(LC_ALL=C awk "/^#/ {next} $1" "$2" | sort -r)
    done
    [ -z "$remaining" ] && break
    kill $sig $(/sbin/fuser -m $remaining 2>/dev/null | sed -e "s/\b$$\b//g") > /dev/null
    sleep 3
    retry=$(($retry -1))
    sig=-9
  done
}

__umount_loop '$3 ~ /^nfs/ && $3 != "nfsd" && $2 != "/" {print $2}' \
  /proc/mounts \
  $"Unmounting NFS filesystems: " \
  $"Unmounting NFS filesystems (retry): " \
  "-f -l"

__umount_loop '$3 ~ /^cifs/ && $2 != "/" {print $2}' \
  /proc/mounts \
  $"Unmounting CIFS filesystems: " \
  $"Unmounting CIFS filesystems (retry): " \
  "-f -l"

> Is there a way to tell NFS to retry an operation _now_ after the connection
> went down and came back, rather than having to wait for a possibly rather long
> time?

Not that I'm aware of.

> Is there a better alternative for mounting remote file systems over unreliable
> connections?

I would second the recommendation for SMBv3/CIFS for a fault tolerant remote file system.

--
Orion Poplawski
Manager of NWRA Technical Systems          720-772-5637
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion at nwra.com
Boulder, CO 80301                 https://www.nwra.com/
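
To see which mount points the awk filters in the offnet script above would select, without unmounting anything, the selection step can be tried on its own. The following is a minimal sketch rather than part of the posted scripts: the function name `list_net_mounts` and the sample data (hosts, exports, paths) are hypothetical, and it merges the NFS and CIFS filters into one pass for brevity.

```shell
#!/bin/bash

# list_net_mounts FILE
# Print mount points of NFS and CIFS filesystems listed in FILE, which is
# expected to use the /proc/mounts format. Sorting in reverse puts nested
# mount points before their parents, as the offnet script does.
list_net_mounts() {
    LC_ALL=C awk '($3 ~ /^nfs/ && $3 != "nfsd" || $3 ~ /^cifs/) && $2 != "/" {print $2}' "$1" \
        | LC_ALL=C sort -r
}

# Sample data standing in for /proc/mounts (hypothetical hosts and paths).
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/sda1 / xfs rw 0 0
nfsd /proc/fs/nfsd nfsd rw 0 0
server:/export/home /home nfs4 rw,vers=4.1 0 0
server:/export/home/proj /home/proj nfs4 rw,vers=4.1 0 0
//server/share /mnt/share cifs rw,vers=3.0 0 0
EOF

list_net_mounts "$sample"
rm -f "$sample"
```

On a live system, `list_net_mounts /proc/mounts` would show the candidates for unmounting; the root filesystem and the `nfsd` pseudo-filesystem are skipped, matching the conditions in the original filters.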