I've had the following happen a couple of times now, and my Internet searches are failing to locate an answer to the problem.

We've got a few servers that primarily house VMs using KVM. They've got E3 CPUs and 32 GB RAM, and they run stock CentOS 6.4, fully patched (not yet migrated to 6.5). The VM disk images are housed on an NFS server. None of the VMs is particularly resource-hungry. They run a variety of Linux distros: CentOS 5/6, Debian 6/7.

I'll start to see the VMs fail to write files to their local filesystems. No machine in the chain has rebooted or been updated in any significant way, but the root filesystem is no longer writable. (This will happen on just one of our servers; the other VM platforms run without issue.)

In /var/log/messages, I'll see the following entry for each impacted VM:

  <date> <host> kernel: kvm: <pid>: cpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd

In /var/log/libvirt/qemu/<vm-name>.log, I'll see

  block I/O error in device 'drive-virtio-disk0': Stale file handle (116)

Oddly, the underlying host might be running, say, five VMs, but only four of them will get the log messages and show the read-only symptoms, while the fifth just keeps chugging along.

Googling suggests that the "disabled perfctr wrmsr" message is harmless, but my experience suggests otherwise.

Any hints, workarounds, or relevant information are very welcome. Thanks!

--
Paul Heinlein
heinlein at madboa.com
45°38' N, 122°6' W
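To check which guests are affected on a given host, one could correlate the two log signatures quoted above. Below is a minimal sketch. It assumes the messages look exactly as quoted and that each guest has one <vm-name>.log file under /var/log/libvirt/qemu. The function names (stale_devices, perfctr_pids, find_stale_handle_vms) are hypothetical helpers for illustration, not part of libvirt or qemu.

```python
import re
from pathlib import Path

# Signature from the per-guest qemu log quoted in the post.
STALE_RE = re.compile(r"block I/O error in device '([^']+)': Stale file handle \(116\)")
# Signature from the host's /var/log/messages quoted in the post.
PERFCTR_RE = re.compile(r"kvm: (\d+): cpu\d+ disabled perfctr wrmsr")


def stale_devices(log_text):
    """Return the set of guest block devices that hit ESTALE in one qemu log."""
    return {m.group(1) for m in STALE_RE.finditer(log_text)}


def perfctr_pids(messages_text):
    """Return the qemu-kvm PIDs named in 'disabled perfctr wrmsr' kernel lines."""
    return {int(m.group(1)) for m in PERFCTR_RE.finditer(messages_text)}


def find_stale_handle_vms(qemu_log_dir="/var/log/libvirt/qemu"):
    """Map each guest (named after its <vm-name>.log file) to its ESTALE devices."""
    result = {}
    # A missing directory simply yields no matches.
    for log in sorted(Path(qemu_log_dir).glob("*.log")):
        devices = stale_devices(log.read_text(errors="replace"))
        if devices:
            result[log.stem] = devices
    return result


if __name__ == "__main__":
    for vm, devices in find_stale_handle_vms().items():
        print(vm, sorted(devices))
```

Comparing the guest list from find_stale_handle_vms() with the PIDs from perfctr_pids() (matched against the running qemu-kvm processes) would show whether the two messages really line up per VM, and why the fifth guest is spared.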
> <date> <host> kernel: kvm: <pid>: cpu0 disabled perfctr wrmsr: 0xc1 data
> 0xabcd
>
> In /var/log/libvirt/qemu/<vm-name>.log, I'll see
>
> block I/O error in device 'drive-virtio-disk0': Stale file handle (116)

Are you sure your network is sound? If you turn on NFS debug logging, do you see anything interesting at the time you get the Stale file handle errors?
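The "(116)" in the qemu message is the Linux errno value ESTALE. On the host, NFS client debug output can be enabled with rpcdebug(8) (e.g. `rpcdebug -m nfs -s all`, and `-c all` to clear it again). Another option is to probe the image files directly from the host while a guest is misbehaving. The sketch below assumes you pass the image paths on the command line, and probe_image is a hypothetical helper. Opening and reading one byte is used rather than a bare stat(), because the NFS client may answer stat() from cached attributes. Even so, whether a probe surfaces ESTALE depends on the client's caching.

```python
import errno
import os
import sys


def probe_image(path):
    """Open a disk image, read one byte, and return 'ok' or the errno name."""
    try:
        with open(path, "rb") as f:
            f.read(1)
    except OSError as exc:
        # On Linux, errno.ESTALE == 116, the code shown in the qemu log.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


if __name__ == "__main__":
    for image in sys.argv[1:]:
        print(image, probe_image(image))
```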
On 02.12.2013 23:29, Paul Heinlein wrote:
> [...]
> Oddly, the underlying host might be running, say, five VMs, but only
> four of them will get the log messages, and show the read-only
> symptoms, while the fifth just keeps chugging along.

I think CentOS ext4 filesystems do remount read-only when the underlying device has problems; if your network has any timeouts or is maxed out, that could explain it. Ignoring this would probably be unwise, but it can be done by specifying errors=continue in /etc/fstab. I would run some network/throughput tests between your hosts, though, and check that all drives are fine, that they have available space, etc. Also check the logs, dmesg and so on.

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro
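Inside a guest, one way to see which ext4 filesystems have already been remounted read-only, and what error policy they run with, is to read /proc/mounts. The sketch below is an illustration under a few assumptions. readonly_ext4_mounts is a hypothetical helper. The kernel may only list errors= when it differs from the superblock default; the superblock's own setting is shown by `tune2fs -l` on the device. The sketch reports "default" when the option is absent.

```python
def readonly_ext4_mounts(mounts_text):
    """Return (mountpoint, errors_policy) for each ext4 mount listed as read-only."""
    found = []
    for line in mounts_text.splitlines():
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        fields = line.split()
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        options = fields[3].split(",")
        if "ro" not in options:
            continue
        # errors=... may be omitted when it matches the superblock default.
        policy = next(
            (o.split("=", 1)[1] for o in options if o.startswith("errors=")),
            "default",
        )
        found.append((fields[1], policy))
    return found


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mountpoint, policy in readonly_ext4_mounts(f.read()):
            print(f"{mountpoint} is read-only (errors={policy})")
```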
On 03 Dec 2013, at 00:29, Paul Heinlein wrote:
> [...]
> Oddly, the underlying host might be running, say, five VMs, but only
> four of them will get the log messages, and show the read-only
> symptoms, while the fifth just keeps chugging along.

I have seen a non-root ext4 filesystem go read-only after I mistakenly provided it to two virtual machines at the same time. It went read-only on only one of the virtual machines.

--
Markus
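To rule out the accidental sharing described above, one could look for a disk source that appears in more than one libvirt domain definition. The sketch below assumes the persistent domain XML lives in /etc/libvirt/qemu/*.xml; `virsh dumpxml <domain>` output would work just as well. It skips non-"disk" devices and disks marked `<readonly/>` or `<shareable/>`. disk_sources and shared_disks are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def disk_sources(domain_xml):
    """Return (domain name, file/block sources of writable, non-shareable disks)."""
    root = ET.fromstring(domain_xml)
    name = root.findtext("name")
    sources = []
    for disk in root.findall("./devices/disk"):
        if disk.get("device", "disk") != "disk":
            continue  # skip cdrom/floppy devices
        if disk.find("readonly") is not None or disk.find("shareable") is not None:
            continue  # explicitly read-only or intentionally shared
        src = disk.find("source")
        if src is not None:
            path = src.get("file") or src.get("dev")
            if path:
                sources.append(path)
    return name, sources


def shared_disks(domain_xmls):
    """Map each disk source attached to more than one domain to those domains."""
    users = defaultdict(set)
    for xml_text in domain_xmls:
        name, sources = disk_sources(xml_text)
        for path in sources:
            users[path].add(name)
    return {path: names for path, names in users.items() if len(names) > 1}


if __name__ == "__main__":
    configs = [p.read_text() for p in Path("/etc/libvirt/qemu").glob("*.xml")]
    for path, names in shared_disks(configs).items():
        print(f"{path} is attached to: {', '.join(sorted(names))}")
```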