Hi all,

I am running replica 3 on SSDs with 10G networking. Everything works OK, but VMs stored in the Gluster volume occasionally freeze with "Task XY blocked for more than 120 seconds".
The only solution is to hard power off the VM and then boot it up again. I am unable to SSH in or log in on the console; it is probably stuck on some disk operation. No error or warning messages are stored in the VM's logs.

KVM/Libvirt (qemu) uses libgfapi and a fuse mount to access VM disks on the replica volume. Can someone advise how to debug this problem or what can cause these issues?
It's really annoying. I've tried to google everything but nothing came up. I've tried changing the disk driver from virtio-scsi-pci to virtio-blk-pci, but it's not related.

BR,
Martin

These are the volume settings:

Type: Replicate
Volume ID: b021bbb6-fa99-4cc7-88f6-49152a22cb9e
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/imagestore/brick1
Brick2: node2:/imagestore/brick1
Brick3: node3:/imagestore/brick1
Options Reconfigured:
performance.client-io-threads: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: on
cluster.min-free-disk: 10%
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
cluster.data-self-heal-algorithm: full
network.remote-dio: enable
network.ping-timeout: 30
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
client.event-threads: 4
server.event-threads: 4
storage.owner-gid: 9869
storage.owner-uid: 9869
server.allow-insecure: on
nfs.disable: on
performance.readdir-ahead: on

Attachment: Screen Shot 2019-05-13 at 08.32.24.png (image/png, 144426 bytes)
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190513/7b425685/attachment.png>
lemonnierk at ulrar.net
2019-May-13 06:55 UTC
[Gluster-users] VMs blocked for more than 120 seconds
On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote:
> Hi all,

Hi

> I am running replica 3 on SSDs with 10G networking, everything works OK but VMs stored in Gluster volume occasionally freeze with "Task XY blocked for more than 120 seconds".
> Only solution is to poweroff (hard) VM and then boot it up again. I am unable to SSH and also login with console, it's stuck probably on some disk operation. No error/warning logs or messages are stored in VMs logs.

As far as I know this should be unrelated. I get this during heals without any freezes; I think it just means the storage is slow.

> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on replica volume. Can someone advise how to debug this problem or what can cause these issues?
> It's really annoying, I've tried to google everything but nothing came up. I've tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but it's not related.

Any chance your gluster goes read-only? Have you checked your gluster logs in /var/log/glusterfs to see if maybe the nodes lose each other sometimes? For libgfapi accesses you'd have its log on qemu's standard output, which might contain the actual error at the time of the freeze.
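
If it helps, here is a rough sketch of how one might skim the logs under /var/log/glusterfs for signs of lost connections around the time of a freeze. It assumes each log line has the shape "[timestamp] LEVEL [source] message" with a single-letter level (E, W, I, ...). The keyword list and the function names (suspicious_lines, scan_log_dir) are made up for illustration and are not a list of real Gluster message strings, so adjust them to what your logs actually contain.

```python
import re
from pathlib import Path

# Assumed layout of a Gluster log line: "[timestamp] LEVEL rest-of-line".
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+(?P<rest>.*)$")

# Illustrative keywords only (an assumption, not an official message list).
KEYWORDS = ("disconnect", "read-only", "quorum", "ping timer")


def suspicious_lines(lines, levels=("E", "W", "C"), keywords=KEYWORDS):
    """Yield (timestamp, level, text) for lines that look like errors or disconnects."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        text = match.group("rest")
        flagged_level = match.group("level") in levels
        flagged_text = any(word in text.lower() for word in keywords)
        if flagged_level or flagged_text:
            yield match.group("ts"), match.group("level"), text


def scan_log_dir(log_dir="/var/log/glusterfs"):
    """Print every flagged line from the *.log files below log_dir."""
    for path in sorted(Path(log_dir).rglob("*.log")):
        with path.open(errors="replace") as handle:
            for ts, level, text in suspicious_lines(handle):
                print(f"{path.name}: [{ts}] {level} {text}")
```

Calling scan_log_dir() on each node and comparing the timestamps with the time a VM froze should show whether the bricks and clients lost each other at that moment. For libgfapi the same filter could be applied to whatever file qemu's standard output is captured in, by passing the opened file to suspicious_lines().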