I think I have seen this also on our CentOS 7.5 systems using GlusterFS
4.1.1 (*) - has an upgrade to 4.1.2 helped out? I'm trying this now.

Thanx,

Claus.

(*) libvirt/qemu log:

[2018-08-19 16:45:54.275830] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-0: remote operation failed [Invalid argument]
[2018-08-19 16:45:54.276156] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-1: remote operation failed [Invalid argument]
[2018-08-19 16:45:54.276159] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 0-glu-vol01-lab-replicate-0: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume glu-vol01-lab-client-0 with lock owner 28ae497049560000 [Invalid argument]
[2018-08-19 16:45:54.276183] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 0-glu-vol01-lab-replicate-0: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume glu-vol01-lab-client-1 with lock owner 28ae497049560000 [Invalid argument]
[2018-08-19 17:16:03.690808] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-0: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x3071a5 sent = 2018-08-19 16:45:54.276560. timeout = 1800 for 192.168.13.131:49152
[2018-08-19 17:16:03.691113] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-0: remote operation failed [Transport endpoint is not connected]
[2018-08-19 17:46:03.855909] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-1: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x301d0f sent = 2018-08-19 17:16:03.691174. timeout = 1800 for 192.168.13.132:49152
[2018-08-19 17:46:03.856170] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-1: remote operation failed [Transport endpoint is not connected]
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
... many repeats ...
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
[2018-08-19 18:16:04.022526] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-0: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x307221 sent = 2018-08-19 17:46:03.861005. timeout = 1800 for 192.168.13.131:49152
[2018-08-19 18:16:04.022788] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-0: remote operation failed [Transport endpoint is not connected]
[2018-08-19 18:46:04.195590] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-1: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x301d8a sent = 2018-08-19 18:16:04.022838. timeout = 1800 for 192.168.13.132:49152
[2018-08-19 18:46:04.195881] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-1: remote operation failed [Transport endpoint is not connected]
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
qemu: terminating on signal 15 from pid 507
2018-08-19 19:36:59.065+0000: shutting down, reason=destroyed
2018-08-19 19:37:08.059+0000: starting up libvirt version: 3.9.0, package: 14.el7_5.6 (CentOS BuildSystem <http://bugs.centos.org>, 2018-06-27-14:13:57, x86-01.bsys.centos.org), qemu version: 1.5.3 (qemu-kvm-1.5.3-156.el7_5.3)

At 19:37 the VM was restarted.
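For anyone debugging the same symptom: the call_bail/FINODELK(30) entries above mean an inode-lock request sat unanswered for the full 1800-second frame timeout. A brick-side statedump lists granted and blocked inode locks together with their lock owners, which can confirm whether a stale lock is wedged on the server. A minimal sketch, assuming the volume name glu-vol01-lab taken from the log prefixes (the dump directory can vary by build):

  # Ask every brick process of the volume to dump its state, inode locks included;
  # dumps usually land under /var/run/gluster/ on each brick host.
  gluster volume statedump glu-vol01-lab

  # On a brick host, inspect granted/blocked inodelk entries and their owners:
  grep -iA4 inodelk /var/run/gluster/*.dump.*

  # If a stale granted lock is confirmed, clear-locks can release it without a
  # remount (the image path below is hypothetical, relative to the volume root):
  # gluster volume clear-locks glu-vol01-lab /images/vm01.qcow2 kind granted inode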
On Wed, Aug 15, 2018 at 8:25 PM Walter Deignan <WDeignan at uline.com> wrote:

> I am using gluster to host KVM/QEMU images. I am seeing an intermittent
> issue where access to an image will hang. I have to do a lazy dismount of
> the gluster volume in order to break the lock and then reset the impacted
> virtual machine.
>
> It happened again today and I caught the events below in the client side
> logs. Any thoughts on what might cause this? It seemed to begin after I
> upgraded from 3.12.10 to 4.1.1 a few weeks ago.
>
> [2018-08-14 14:22:15.549501] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-4: remote operation failed [Invalid argument]
> [2018-08-14 14:22:15.549576] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-5: remote operation failed [Invalid argument]
> [2018-08-14 14:22:15.549583] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 2-gv1-replicate-2: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume gv1-client-4 with lock owner d89caca92b7f0000 [Invalid argument]
> [2018-08-14 14:22:15.549615] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 2-gv1-replicate-2: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume gv1-client-5 with lock owner d89caca92b7f0000 [Invalid argument]
> [2018-08-14 14:52:18.726219] E [rpc-clnt.c:184:call_bail] 2-gv1-client-4: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0xc5e00 sent = 2018-08-14 14:22:15.699082. timeout = 1800 for 10.35.20.106:49159
> [2018-08-14 14:52:18.726254] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-4: remote operation failed [Transport endpoint is not connected]
> [2018-08-14 15:22:25.962546] E [rpc-clnt.c:184:call_bail] 2-gv1-client-5: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0xc4a6d sent = 2018-08-14 14:52:18.726329. timeout = 1800 for 10.35.20.107:49164
> [2018-08-14 15:22:25.962587] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-5: remote operation failed [Transport endpoint is not connected]
> [2018-08-14 15:22:25.962618] W [MSGID: 108019] [afr-lk-common.c:601:is_blocking_locks_count_sufficient] 2-gv1-replicate-2: Unable to obtain blocking inode lock on even one child for gfid:24a48cae-53fe-4634-8fb7-0254c85ad672.
> [2018-08-14 15:22:25.962668] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 3715808: FSYNC() ERR => -1 (Transport endpoint is not connected)
>
> Volume configuration -
>
> Volume Name: gv1
> Type: Distributed-Replicate
> Volume ID: 66ad703e-3bae-4e79-a0b7-29ea38e8fcfc
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 5 x 2 = 10
> Transport-type: tcp
> Bricks:
> Brick1: dc-vihi44:/gluster/bricks/megabrick/data
> Brick2: dc-vihi45:/gluster/bricks/megabrick/data
> Brick3: dc-vihi44:/gluster/bricks/brick1/data
> Brick4: dc-vihi45:/gluster/bricks/brick1/data
> Brick5: dc-vihi44:/gluster/bricks/brick2_1/data
> Brick6: dc-vihi45:/gluster/bricks/brick2/data
> Brick7: dc-vihi44:/gluster/bricks/brick3/data
> Brick8: dc-vihi45:/gluster/bricks/brick3/data
> Brick9: dc-vihi44:/gluster/bricks/brick4/data
> Brick10: dc-vihi45:/gluster/bricks/brick4/data
> Options Reconfigured:
> cluster.min-free-inodes: 6%
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.low-prio-threads: 32
> network.remote-dio: enable
> cluster.eager-lock: enable
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> user.cifs: off
> cluster.choose-local: off
> features.shard: on
> cluster.server-quorum-ratio: 51%
>
> -Walter Deignan
> -Uline IT, Systems Architect

--
Claus Jeppesen
Manager, Network Services
Datto, Inc.
p +45 6170 5901 | Copenhagen Office
www.datto.com
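The option set quoted above closely tracks Gluster's stock "virt" profile for VM image workloads (sharding on, eager-lock and remote-dio enabled, quorum on, the client-side performance translators off). As a sketch, on packages that ship the profile the same settings can be applied in one step rather than option by option:

  # Apply the packaged VM-workload option group; the profile file typically
  # lives under /var/lib/glusterd/groups/virt on the servers.
  gluster volume set gv1 group virt

The group contents change between releases, so after an upgrade it is worth diffing the shipped profile file against the running volume options.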
Thanks for this report! We will look into this. This is something new we are seeing, and we are not aware of an RCA yet!

-Amar

On Mon, Aug 20, 2018 at 1:08 PM, Claus Jeppesen <cjeppesen at datto.com> wrote:

> I think I have seen this also on our CentOS 7.5 systems using GlusterFS
> 4.1.1 (*) - has an upgrade to 4.1.2 helped out? I'm trying this now.
> [...]
--
Amar Tumballi (amarts)
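Since both reports trace back to an upgrade (3.12.10 -> 4.1.1 -> 4.1.2), it is worth confirming that every server and client is actually running the new bits and that the cluster op-version was raised afterwards. A short sketch (the 40100 value below assumes a 4.1.x cluster):

  # On each node: confirm the installed version
  gluster --version | head -1

  # Check the active cluster op-version, and raise it once all nodes are upgraded
  gluster volume get all cluster.op-version
  gluster volume set all cluster.op-version 40100

Note that long-mounted FUSE clients keep running the old client code until they are remounted, which is easy to miss after a rolling upgrade.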
I upgraded late last week to 4.1.2. Since then I've seen several posix health checks fail and bricks drop offline, but I'm not sure whether that's related or a different root issue.

I haven't seen the issue described below re-occur on 4.1.2 yet, but it was intermittent to begin with, so I'll probably need to run for a week or more to be confident.

-Walter Deignan
-Uline IT, Systems Architect

From: "Claus Jeppesen" <cjeppesen at datto.com>
To: WDeignan at uline.com
Cc: gluster-users at gluster.org
Date: 08/20/2018 07:20 AM
Subject: Re: [Gluster-users] KVM lockups on Gluster 4.1.1

> I think I have seen this also on our CentOS 7.5 systems using GlusterFS
> 4.1.1 (*) - has an upgrade to 4.1.2 helped out? I'm trying this now.
> [...]
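On the new symptom above: when the posix health check fails, the brick process logs the failure and takes the brick offline, so the brick logs and the kernel log on the brick host are the first places to look (health-check failures usually mean the filesystem or disk underneath answered slowly or not at all). A quick sketch, assuming default log locations:

  # Look for posix health-check failures in the brick logs on each server
  grep -iE 'health[-_]check' /var/log/glusterfs/bricks/*.log

  # Confirm which bricks are currently online
  gluster volume status gv1

  # Check the underlying disks for I/O errors around the same timestamps
  dmesg -T | grep -iE 'i/o error|blk_update|xfs'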