I think I have seen this also on our CentOS 7.5 systems using GlusterFS
4.1.1 (*) - has an upgrade to 4.1.2 helped out? I'm trying this now.

Thanx,

Claus.

(*) libvirt/qemu log:

[2018-08-19 16:45:54.275830] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-0: remote operation failed [Invalid argument]
[2018-08-19 16:45:54.276156] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-1: remote operation failed [Invalid argument]
[2018-08-19 16:45:54.276159] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 0-glu-vol01-lab-replicate-0: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume glu-vol01-lab-client-0 with lock owner 28ae497049560000 [Invalid argument]
[2018-08-19 16:45:54.276183] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 0-glu-vol01-lab-replicate-0: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume glu-vol01-lab-client-1 with lock owner 28ae497049560000 [Invalid argument]
[2018-08-19 17:16:03.690808] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-0: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x3071a5 sent = 2018-08-19 16:45:54.276560. timeout = 1800 for 192.168.13.131:49152
[2018-08-19 17:16:03.691113] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-0: remote operation failed [Transport endpoint is not connected]
[2018-08-19 17:46:03.855909] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-1: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x301d0f sent = 2018-08-19 17:16:03.691174. timeout = 1800 for 192.168.13.132:49152
[2018-08-19 17:46:03.856170] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-1: remote operation failed [Transport endpoint is not connected]
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
... many repeats ...
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
[2018-08-19 18:16:04.022526] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-0: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x307221 sent = 2018-08-19 17:46:03.861005. timeout = 1800 for 192.168.13.131:49152
[2018-08-19 18:16:04.022788] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-0: remote operation failed [Transport endpoint is not connected]
[2018-08-19 18:46:04.195590] E [rpc-clnt.c:184:call_bail] 0-glu-vol01-lab-client-1: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0x301d8a sent = 2018-08-19 18:16:04.022838. timeout = 1800 for 192.168.13.132:49152
[2018-08-19 18:46:04.195881] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 0-glu-vol01-lab-client-1: remote operation failed [Transport endpoint is not connected]
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
block I/O error in device 'drive-virtio-disk0': Operation not permitted (1)
qemu: terminating on signal 15 from pid 507
2018-08-19 19:36:59.065+0000: shutting down, reason=destroyed
2018-08-19 19:37:08.059+0000: starting up libvirt version: 3.9.0, package: 14.el7_5.6 (CentOS BuildSystem <http://bugs.centos.org>, 2018-06-27-14:13:57, x86-01.bsys.centos.org), qemu version: 1.5.3 (qemu-kvm-1.5.3-156.el7_5.3)

At 19:37 the VM was restarted.
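For anyone debugging the same symptom: the call_bail/FINODELK(30) entries above mean an inode-lock request sat unanswered for the full 1800-second frame timeout. A brick-side statedump lists granted and blocked inode locks together with their lock owners, which can confirm whether a stale lock is wedged on the server. A minimal sketch, assuming the volume name glu-vol01-lab taken from the log prefixes (the dump directory can vary by build):

  # Ask every brick process of the volume to dump its state, inode locks included;
  # dumps usually land under /var/run/gluster/ on each brick host.
  gluster volume statedump glu-vol01-lab

  # On a brick host, inspect granted/blocked inodelk entries and their owners:
  grep -iA4 inodelk /var/run/gluster/*.dump.*

  # If a stale granted lock is confirmed, clear-locks can release it without a
  # remount (the image path below is hypothetical, relative to the volume root):
  # gluster volume clear-locks glu-vol01-lab /images/vm01.qcow2 kind granted inode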
On Wed, Aug 15, 2018 at 8:25 PM Walter Deignan <WDeignan at uline.com> wrote:

> I am using gluster to host KVM/QEMU images. I am seeing an intermittent
> issue where access to an image will hang. I have to do a lazy dismount of
> the gluster volume in order to break the lock and then reset the impacted
> virtual machine.
>
> It happened again today and I caught the events below in the client side
> logs. Any thoughts on what might cause this? It seemed to begin after I
> upgraded from 3.12.10 to 4.1.1 a few weeks ago.
>
> [2018-08-14 14:22:15.549501] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-4: remote operation failed [Invalid argument]
> [2018-08-14 14:22:15.549576] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-5: remote operation failed [Invalid argument]
> [2018-08-14 14:22:15.549583] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 2-gv1-replicate-2: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume gv1-client-4 with lock owner d89caca92b7f0000 [Invalid argument]
> [2018-08-14 14:22:15.549615] E [MSGID: 108010] [afr-lk-common.c:284:afr_unlock_inodelk_cbk] 2-gv1-replicate-2: path=(null) gfid=00000000-0000-0000-0000-000000000000: unlock failed on subvolume gv1-client-5 with lock owner d89caca92b7f0000 [Invalid argument]
> [2018-08-14 14:52:18.726219] E [rpc-clnt.c:184:call_bail] 2-gv1-client-4: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0xc5e00 sent = 2018-08-14 14:22:15.699082. timeout = 1800 for 10.35.20.106:49159
> [2018-08-14 14:52:18.726254] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-4: remote operation failed [Transport endpoint is not connected]
> [2018-08-14 15:22:25.962546] E [rpc-clnt.c:184:call_bail] 2-gv1-client-5: bailing out frame type(GlusterFS 4.x v1) op(FINODELK(30)) xid = 0xc4a6d sent = 2018-08-14 14:52:18.726329. timeout = 1800 for 10.35.20.107:49164
> [2018-08-14 15:22:25.962587] E [MSGID: 114031] [client-rpc-fops_v2.c:1352:client4_0_finodelk_cbk] 2-gv1-client-5: remote operation failed [Transport endpoint is not connected]
> [2018-08-14 15:22:25.962618] W [MSGID: 108019] [afr-lk-common.c:601:is_blocking_locks_count_sufficient] 2-gv1-replicate-2: Unable to obtain blocking inode lock on even one child for gfid:24a48cae-53fe-4634-8fb7-0254c85ad672.
> [2018-08-14 15:22:25.962668] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 3715808: FSYNC() ERR => -1 (Transport endpoint is not connected)
>
> Volume configuration -
>
> Volume Name: gv1
> Type: Distributed-Replicate
> Volume ID: 66ad703e-3bae-4e79-a0b7-29ea38e8fcfc
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 5 x 2 = 10
> Transport-type: tcp
> Bricks:
> Brick1: dc-vihi44:/gluster/bricks/megabrick/data
> Brick2: dc-vihi45:/gluster/bricks/megabrick/data
> Brick3: dc-vihi44:/gluster/bricks/brick1/data
> Brick4: dc-vihi45:/gluster/bricks/brick1/data
> Brick5: dc-vihi44:/gluster/bricks/brick2_1/data
> Brick6: dc-vihi45:/gluster/bricks/brick2/data
> Brick7: dc-vihi44:/gluster/bricks/brick3/data
> Brick8: dc-vihi45:/gluster/bricks/brick3/data
> Brick9: dc-vihi44:/gluster/bricks/brick4/data
> Brick10: dc-vihi45:/gluster/bricks/brick4/data
> Options Reconfigured:
> cluster.min-free-inodes: 6%
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.low-prio-threads: 32
> network.remote-dio: enable
> cluster.eager-lock: enable
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 10000
> user.cifs: off
> cluster.choose-local: off
> features.shard: on
> cluster.server-quorum-ratio: 51%
>
> -Walter Deignan
> -Uline IT, Systems Architect

--
Claus Jeppesen
Manager, Network Services
Datto, Inc.
p +45 6170 5901 | Copenhagen Office
www.datto.com
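The option set quoted above closely tracks Gluster's stock "virt" profile for VM image workloads (sharding on, eager-lock and remote-dio enabled, quorum on, the client-side performance translators off). As a sketch, on packages that ship the profile the same settings can be applied in one step rather than option by option:

  # Apply the packaged VM-workload option group; the profile file typically
  # lives under /var/lib/glusterd/groups/virt on the servers.
  gluster volume set gv1 group virt

The group contents change between releases, so after an upgrade it is worth diffing the shipped profile file against the running volume options.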
Thanks for this report! We will look into this. This is something new we are seeing, and we are not aware of an RCA yet!

-Amar

On Mon, Aug 20, 2018 at 1:08 PM, Claus Jeppesen <cjeppesen at datto.com> wrote:

> I think I have seen this also on our CentOS 7.5 systems using GlusterFS
> 4.1.1 (*) - has an upgrade to 4.1.2 helped out? I'm trying this now.
> [...]
--
Amar Tumballi (amarts)
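Since both reports trace back to an upgrade (3.12.10 -> 4.1.1 -> 4.1.2), it is worth confirming that every server and client is actually running the new bits and that the cluster op-version was raised afterwards. A short sketch (the 40100 value below assumes a 4.1.x cluster):

  # On each node: confirm the installed version
  gluster --version | head -1

  # Check the active cluster op-version, and raise it once all nodes are upgraded
  gluster volume get all cluster.op-version
  gluster volume set all cluster.op-version 40100

Note that long-mounted FUSE clients keep running the old client code until they are remounted, which is easy to miss after a rolling upgrade.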
I upgraded late last week to 4.1.2. Since then I've seen several posix health checks fail and bricks drop offline, but I'm not sure whether that's related or a different root issue.

I haven't seen the issue described below re-occur on 4.1.2 yet, but it was intermittent to begin with, so I'll probably need to run for a week or more to be confident.

-Walter Deignan
-Uline IT, Systems Architect

From: "Claus Jeppesen" <cjeppesen at datto.com>
To: WDeignan at uline.com
Cc: gluster-users at gluster.org
Date: 08/20/2018 07:20 AM
Subject: Re: [Gluster-users] KVM lockups on Gluster 4.1.1

> I think I have seen this also on our CentOS 7.5 systems using GlusterFS
> 4.1.1 (*) - has an upgrade to 4.1.2 helped out? I'm trying this now.
> [...]
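On the new symptom above: when the posix health check fails, the brick process logs the failure and takes the brick offline, so the brick logs and the kernel log on the brick host are the first places to look (health-check failures usually mean the filesystem or disk underneath answered slowly or not at all). A quick sketch, assuming default log locations:

  # Look for posix health-check failures in the brick logs on each server
  grep -iE 'health[-_]check' /var/log/glusterfs/bricks/*.log

  # Confirm which bricks are currently online
  gluster volume status gv1

  # Check the underlying disks for I/O errors around the same timestamps
  dmesg -T | grep -iE 'i/o error|blk_update|xfs'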