Hello,
We have a VM (under KVM - a VPS service by our ISP) running CentOS 7.
On it we have 2 NFS mounts, one for backup and one as a live file system
(where there are two user homes as well):
-----------------------------------------------------------------------------------------------------------------------
# cat /etc/fstab
/dev/mapper/centos-root / xfs defaults 0 0
UUID=7a3ae70a-8ef3-463b-8f5b-be4e2e7be894 /boot xfs defaults 0 0
/dev/mapper/centos-swap swap swap defaults 0 0
10.201.40.34:/data/col1/noc-bkups-1 /mnt/dd2500-1 nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0
10.201.40.34:/data/col1/hesperia-mount /hesperiamount nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0
-----------------------------------------------------------------------------------------------------------------------
This setup has been working fine for over a year, even under significant
load, without issues.
However, yesterday the "live" NFS mount (/hesperiamount) started
crashing. At boot everything is fine, but very soon after boot we
noticed that we lose communication with the mount, although the remote
storage system is accessible (without reporting any errors) and no
network issues have occurred. We found that dmesg reports failures with
call traces (2 examples):
https://pastebin.com/GVSDbxFr
https://pastebin.com/WujKQuHG
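To pull similar entries out of the kernel log on the affected box, something like the following might be used. This is only a sketch: filter_nfs_lines is a hypothetical helper name, and the keyword list is an assumption about what NFS-related failures typically look like, not taken from the pastes above.

```shell
#!/bin/sh
# Sketch: keep only kernel log lines that look NFS/RPC related.
# filter_nfs_lines is a hypothetical helper; the keywords are assumptions.
filter_nfs_lines() {
    # -i: case-insensitive, -E: extended regex; reads from stdin
    grep -iE 'nfs|rpc|call trace|blocked for more than'
}

# Example usage (run as root on the affected box):
#   dmesg -T | filter_nfs_lines
#   nfsstat -m        # shows the options each NFS mount is actually using
```

The `nfsstat -m` line is there because the effective mount options can differ from what is written in fstab.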
This happens repeatedly/consistently (after several reboots) so we have
been forced to replace the NFS mount with a local mount (on a new local
virtual hard disk), to restore normal system operation. So the fstab has
now become:
-------------------------------------------------------------------------------------------------------------------
# cat /etc/fstab
/dev/mapper/centos-root / xfs defaults 0 0
UUID=7a3ae70a-8ef3-463b-8f5b-be4e2e7be894 /boot xfs defaults 0 0
/dev/mapper/centos-swap swap swap defaults 0 0
/dev/mapper/vg2-lv1 /hesperiamount xfs defaults 0 0
10.201.40.34:/data/col1/noc-bkups-1 /mnt/dd2500-1 nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0
# 10.201.40.34:/data/col1/hesperia-mount /hesperiamount nfs auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 0 0
-------------------------------------------------------------------------------------------------------------------
Note that when I later manually mounted the same NFS share on the same
box (in order to copy data from it using rsync), it did not crash (but
it only saw reads and no writes in this scenario). The share was
manually mounted with the following command:
# mount -vv -o auto,noatime,nolock,bg,nfsvers=3,intr,tcp,actimeo=1800 -t nfs 10.201.40.34:/data/col1/hesperia-mount /hesperiamount2
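Since the manual mount only saw reads, a small write test against it might show whether writes are what trigger the failure. The following is a minimal sketch, not something we have run yet: nfs_write_test is a hypothetical helper name, the 8 MiB size is arbitrary, and it assumes GNU dd (as shipped with CentOS 7).

```shell
#!/bin/sh
# Sketch: write a small file into a directory, force it to storage,
# then remove it. nfs_write_test is a hypothetical helper name.
nfs_write_test() {
    target_dir="$1"
    test_file="$target_dir/.nfs-write-test.$$"
    # conv=fsync asks dd to flush the written data before it exits
    if dd if=/dev/zero of="$test_file" bs=1M count=8 conv=fsync 2>/dev/null; then
        rm -f "$test_file"
        echo "write OK: $target_dir"
    else
        rm -f "$test_file"
        echo "write FAILED: $target_dir"
        return 1
    fi
}

# Example usage (run as root, with the share mounted manually as above):
#   nfs_write_test /hesperiamount2
```

If the mount hangs the way /hesperiamount did, the dd process may block rather than return an error, so it is probably best run from a separate session while watching dmesg.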
Questions:
* Is this a known issue/bug?
* Could we have misconfigured NFS in some way (even though it has not
caused any errors for about a year now)?
* What could we do to prevent the error from occurring again?
Please advise.
Thanks,
Nick