André Bauer
2015-Oct-22 18:45 UTC
[Gluster-users] VM fs becomes read only when one gluster node goes down
Hi,

I have a 4 node GlusterFS 3.5.6 cluster.

My VM images are in a replicated distributed volume which is accessed from kvm/qemu via libgfapi.

The mount is against storage.domain.local, which has the IPs of all 4 Gluster nodes set in DNS.

When one of the Gluster nodes goes down (accidental reboot), a lot of the VMs get a read-only filesystem, even after the node comes back up.

How can I prevent this? I expect the VM to simply keep using the replicated copy of the file on the other node, without the filesystem going read-only.

Any hints?

Thanks in advance.

--
Regards
André Bauer
Krutika Dhananjay
2015-Oct-23 02:24 UTC
[Gluster-users] VM fs becomes read only when one gluster node goes down
Could you share the output of 'gluster volume info', and also tell us which node went down on reboot?

-Krutika

----- Original Message -----
> From: "André Bauer" <abauer at magix.net>
> To: "gluster-users" <gluster-users at gluster.org>
> Cc: gluster-devel at gluster.org
> Sent: Friday, October 23, 2015 12:15:04 AM
> Subject: [Gluster-users] VM fs becomes read only when one gluster node goes down
>
> [...]
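For reference, the requested 'gluster volume info' output for a 4 node, 2 x 2 distributed-replicate volume looks roughly like the sketch below; the volume name, hostnames, brick paths and volume ID are made-up placeholders, not values from this thread:

Volume Name: vmimages
Type: Distributed-Replicate
Volume ID: 1a2b3c4d-0000-0000-0000-000000000000
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gluster1.domain.local:/export/brick1
Brick2: gluster2.domain.local:/export/brick1
Brick3: gluster3.domain.local:/export/brick1
Brick4: gluster4.domain.local:/export/brick1

The "Type" and "Number of Bricks" lines show whether each VM image really has a replica on a second node, which is what matters for the failure described above.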
Niels de Vos
2015-Oct-26 20:56 UTC
[Gluster-users] [Gluster-devel] VM fs becomes read only when one gluster node goes down
On Thu, Oct 22, 2015 at 08:45:04PM +0200, André Bauer wrote:
> When one of the Gluster nodes goes down (accidental reboot), a lot of
> the VMs get a read-only filesystem, even after the node comes back up.
>
> How can I prevent this? I expect the VM to simply keep using the
> replicated copy of the file on the other node, without the filesystem
> going read-only.

There are at least two timeouts involved in this problem:

1. The filesystem in a VM can go read-only when the virtual disk that holds the filesystem does not respond for a while.

2. When a storage server that holds a replica of the virtual disk becomes unreachable, the Gluster client (qemu + libgfapi) waits for at most network.ping-timeout seconds before it resumes I/O.

Once a filesystem in a VM has gone read-only, you might be able to fsck it and re-mount it read-write again; that is not something the VM will do by itself.

The timeout for (1) is set in sysfs:

$ cat /sys/block/sda/device/timeout
30

30 seconds is the default for SCSI disks (sd* devices), and for testing you can change it with an echo:

# echo 300 > /sys/block/sda/device/timeout

This is not a persistent change; you can create a udev rule to apply it at boot.

Some filesystems offer a mount option that changes the behaviour after a disk error is detected. "man mount" shows the "errors" option for ext*. Changing this to "continue" is not recommended; "remount-ro" or "panic" is the safest for your data.

The timeout mentioned in (2) applies to the Gluster volume and is checked by the client. When a client writes to a replicated volume, the write needs to be acknowledged by both/all replicas. The client (libgfapi) delays the reply to the application (qemu) until the replies from both/all replicas have been received. This delay is configured as the volume option network.ping-timeout (42 seconds by default).

Now, if the VM returns block errors after 30 seconds and the client waits up to 42 seconds for recovery, there is an issue...

So, your solution could be to increase the timeout for error detection of the disks inside the VMs, and/or to decrease network.ping-timeout.

It would be interesting to know whether adapting these values prevents the read-only occurrences in your environment. If you do any testing with this, please keep me informed about the results.

Niels
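The adjustments Niels describes can be sketched as follows; the rules-file name, mount point, filesystem UUID and volume name are placeholders, not values from this thread, so adapt them to your environment.

A udev rule inside the VM to make the larger disk timeout persistent across reboots (one possible form; verify it with "udevadm test" on your distribution):

# cat /etc/udev/rules.d/99-disk-timeout.rules
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{device/timeout}="300"

An /etc/fstab entry inside the VM that sets the ext4 "errors" option to "panic", so the kernel panics instead of remounting the filesystem read-only after a disk error:

UUID=<fs-uuid>  /data  ext4  defaults,errors=panic  0  2

Lowering the Gluster-side timeout, run once against the volume from any node in the trusted pool:

# gluster volume set <VOLNAME> network.ping-timeout 20

After the next reboot of the VM, "cat /sys/block/sda/device/timeout" should confirm whether the udev rule was applied.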