thr3ads.net - Gluster users - [Gluster-users] VMs blocked for more than 120 seconds [May 2019]


Martin Toth

2019-May-13 06:47 UTC

[Gluster-users] VMs blocked for more than 120 seconds

Hi all,

I am running replica 3 on SSDs with 10G networking. Everything works OK, but VMs
stored in the Gluster volume occasionally freeze with "Task XY blocked for more than
120 seconds".
The only solution is to power off the VM (hard) and then boot it up again. I am unable to
SSH in or log in on the console; it is probably stuck on some disk operation. No
error/warning logs or messages are stored in the VM's logs.

KVM/Libvirt (qemu) is using libgfapi and a fuse mount to access VM disks on the replica
volume. Can someone advise how to debug this problem or what can cause these
issues?
It's really annoying. I've tried to google everything but nothing came up. I've
tried changing the disk drivers from virtio-scsi-pci to virtio-blk-pci, but it's not
related.

BR,
Martin



These are volume settings :

Type: Replicate
Volume ID: b021bbb6-fa99-4cc7-88f6-49152a22cb9e
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/imagestore/brick1
Brick2: node2:/imagestore/brick1
Brick3: node3:/imagestore/brick1
Options Reconfigured:
performance.client-io-threads: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: on
cluster.min-free-disk: 10%
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
cluster.data-self-heal-algorithm: full
network.remote-dio: enable
network.ping-timeout: 30
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
client.event-threads: 4
server.event-threads: 4
storage.owner-gid: 9869
storage.owner-uid: 9869
server.allow-insecure: on
nfs.disable: on
performance.readdir-ahead: on
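
When comparing settings like these between volumes or over time, it can help to turn the "Options Reconfigured" block into a dictionary and diff it against a known baseline. The following is a minimal sketch, assuming the text has the same "key: value" layout as the listing above and describes a single volume. The names `parse_volume_options`, `diff_options` and the `baseline` values are hypothetical and only for illustration; they are not recommended settings.

```python
def parse_volume_options(info_text):
    """Collect the 'Options Reconfigured' entries of one volume into a dict."""
    options = {}
    in_options = False
    for raw in info_text.splitlines():
        line = raw.strip()
        if line.startswith("Options Reconfigured"):
            in_options = True
            continue
        if not line:
            # A blank line ends the options section.
            in_options = False
            continue
        if not in_options:
            continue
        key, sep, value = line.partition(":")
        if sep:
            options[key.strip()] = value.strip()
    return options


def diff_options(actual, baseline):
    """Return {option: (actual_value, baseline_value)} for every mismatch."""
    return {
        key: (actual.get(key), wanted)
        for key, wanted in baseline.items()
        if actual.get(key) != wanted
    }


if __name__ == "__main__":
    sample = (
        "Type: Replicate\n"
        "Options Reconfigured:\n"
        "performance.quick-read: off\n"
        "network.ping-timeout: 30\n"
    )
    # Hypothetical baseline, e.g. a snapshot taken before a config change.
    baseline = {"performance.quick-read": "off", "network.ping-timeout": "42"}
    print(diff_options(parse_volume_options(sample), baseline))
```

Options missing from the parsed output show up with `None` as their actual value, which makes it easy to spot settings that were never reconfigured.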



-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2019-05-13 at 08.32.24.png
Type: image/png
Size: 144426 bytes
Desc: not available
URL:
<http://lists.gluster.org/pipermail/gluster-users/attachments/20190513/7b425685/attachment.png>

lemonnierk at ulrar.net

2019-May-13 06:55 UTC

[Gluster-users] VMs blocked for more than 120 seconds

On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote:
> Hi all,

Hi

> 
> I am running replica 3 on SSDs with 10G networking, everything works OK but
> VMs stored in Gluster volume occasionally freeze with "Task XY blocked for more
> than 120 seconds".
> Only solution is to poweroff (hard) VM and then boot it up again. I am
> unable to SSH and also login with console, it's stuck probably on some disk
> operation. No error/warning logs or messages are stored in VMs logs.
> 
As far as I know that message should be unrelated: I get it during heals
without any freezes. I think it just means the storage is slow.
> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on
> replica volume. Can someone advise how to debug this problem or what can cause
> these issues?
> It's really annoying, I've tried to google everything but nothing came up.
> I've tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but it's not
> related.
> 
Any chance your gluster goes read-only? Have you checked your gluster
logs in /var/log/glusterfs to see if the nodes maybe lose each other
sometimes?

For libgfapi accesses you'd have its log on qemu's standard output;
that might contain the actual error at the time of the freeze.
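
A minimal sketch of that log check: it keeps only error, warning and critical lines that fall within a few minutes of a freeze. It assumes each log line starts with a bracketed timestamp followed by a one-letter severity, and that the freeze time is given in the same time zone as the log timestamps. The function name `suspicious_lines`, the window size and the sample lines are made up for illustration; the sample lines are placeholders showing only the assumed layout, not real Gluster output.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: "[YYYY-MM-DD HH:MM:SS.usec] L rest-of-line", where L is the
# severity letter. An optional "+0000"-style offset after the time is tolerated.
LOG_LINE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.\d+)?(?: [+-]\d{4})?\] ([A-Z]) "
)


def suspicious_lines(lines, freeze_time, window_minutes=5, levels=("E", "W", "C")):
    """Return lines with a matching severity logged close to freeze_time."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # continuation lines or a different format
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if match.group(2) in levels and abs(stamp - freeze_time) <= window:
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Placeholder lines, NOT real log output.
    sample = [
        "[2019-05-13 06:31:00.000001] E [placeholder] 0-demo: placeholder message",
        "[2019-05-13 06:31:05.000001] I [placeholder] 0-demo: placeholder message",
        "[2019-05-13 09:00:00.000001] E [placeholder] 0-demo: placeholder message",
    ]
    for hit in suspicious_lines(sample, datetime(2019, 5, 13, 6, 30, 0)):
        print(hit)
```

The `lines` argument can be any iterable of strings, so an open log file from /var/log/glusterfs on each node, or the captured qemu output, could be passed in directly.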
