Hi all!

One hypervisor in our virtualization environment crashed and now some of the VM images cannot be accessed. After investigating we found out that there were lots of images that still had an active lock held by the crashed hypervisor. We were able to remove locks from "regular files", but it doesn't seem possible to remove locks from shards.

We are running GlusterFS 3.8.15 on all nodes.

Here is the part of the statedump that shows a shard with an active lock held by the crashed node:

[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0, granted at 2018-01-20 08:57:24

If we try to run clear-locks we get the following error message:

# gluster volume clear-locks zone2-ssd1-vmstor1 /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
Volume clear-locks unsuccessful
clear-locks getxattr command failed. Reason: Operation not permitted

Gluster vol info, if needed:

Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
Options Reconfigured:
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO

Any recommendations on how to proceed from here?

Best regards,
Samuli Heinonen
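For reference, a minimal sketch of how a statedump like the one above can be produced and searched for a particular shard. The volume and shard names are taken from this message; the statedump directory is assumed to be the default /var/run/gluster, and the dump file names will vary per brick and per run:

# Trigger a statedump of the volume's brick processes (run on a storage node):
gluster volume statedump zone2-ssd1-vmstor1

# Then, on each storage node, look the shard up in the freshly written dump files:
grep -A 10 '75353c17-d6b8-485d-9baf-fd6c700e39a1.21' /var/run/gluster/*.dump.*

The lines following the matching "path=" entry list the lock domains and any ACTIVE or BLOCKED inodelks, as in the excerpt above.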
Hi again,

here is more information regarding the issue described earlier.

It looks like self-healing is stuck. According to "heal statistics" the crawl began at Sat Jan 20 12:56:19 2018 and it is still going on (it's around Sun Jan 21 20:30 as I write this). However, glustershd.log says that the last heal was completed at "2018-01-20 11:00:13.090697" (which is 13:00 UTC+2). Also, "heal info" has now been running for over 16 hours without printing any information.

In the statedump I can see that the storage nodes hold locks on files and some of those locks are blocked. For example, here it again says that ovirt8z2 holds an active lock even though ovirt8z2 crashed after the lock was granted:

[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0, connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52

I'd also like to add that the volume had an arbiter brick before the crash happened. We decided to remove it because we thought it was causing issues; however, now I think that this was unnecessary. After the crash the arbiter logs had lots of messages like this:

[2018-01-20 10:19:36.515717] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted) [Operation not permitted]

Is there any way to force self-heal to stop? Any help would be very much appreciated :)

Best regards,
Samuli Heinonen
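For context, a short sketch of the heal-monitoring commands referred to above, plus one hedged way self-heal activity could be paused while investigating. The volume name comes from the vol info in the first message; whether pausing the daemon is appropriate for this setup is an assumption, and the option should be switched back on afterwards:

gluster volume heal zone2-ssd1-vmstor1 statistics              # crawl start times and per-crawl counters
gluster volume heal zone2-ssd1-vmstor1 info                    # entries still pending heal (can take a long time)
gluster volume heal zone2-ssd1-vmstor1 statistics heal-count   # quicker per-brick pending count

# A possible way to stop self-heal daemon activity temporarily (assumption: acceptable here);
# remember to re-enable it once the stale locks are sorted out:
gluster volume set zone2-ssd1-vmstor1 cluster.self-heal-daemon off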
On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen <samppah at neutraali.net> wrote:

> In the statedump I can see that the storage nodes hold locks on files and some of those locks are blocked.
>
> Is there any way to force self-heal to stop? Any help would be very much appreciated :)

The locks are contending in the AFR self-heal and data path domains. It's possible that the deadlock is not caused by the hypervisor, because if that were the case the locks should have been released when it crashed/disconnected. Adding the AFR devs to check what's causing the deadlock in the first place.

-Krutika
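A side note on reading these dumps: the very large pid 18446744073709551610 in the blocked entry is a negative internal client pid (-6) printed as an unsigned 64-bit value, i.e. it belongs to an internal Gluster client such as the self-heal daemon rather than to a user process. A rough way to list the contending domains and lock states for the stuck shard from the statedump files (the default statedump location is assumed and may differ):

# Show the lock domains plus the ACTIVE/BLOCKED inodelk entries recorded for this shard:
grep -A 20 '3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27' /var/run/gluster/*.dump.* | grep -E 'lock-dump.domain|inodelk\['

In the dump quoted above this shows an ACTIVE lock from the storage-node (self-heal) client in the self-heal domain, and in the data-path domain an ACTIVE lock from the crashed ovirt8z2 client with the storage-node request BLOCKED behind it, which matches the contention described here.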
On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen <samppah at neutraali.net> wrote:

> here is more information regarding the issue described earlier.
>
> Is there any way to force self-heal to stop? Any help would be very much appreciated :)

Exposing .shard to a normal mount is opening a can of worms. You should probably look at mounting the volume with the gfid aux-mount, where you can access a file as <path-to-mount>/.gfid/<gfid-string> and clear locks on it.

Mount command:
mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol

A gfid string will have some hyphens, like 11118443-1894-4273-9340-4b212fa1c0e4.

That said, the next disconnect on the brick where you successfully ran clear-locks will crash the brick. There was a bug in the 3.8.x series with clear-locks which was fixed in 3.9.0 with a feature. The self-heal deadlocks that you witnessed are also fixed in the 3.10 release. 3.8.x is EOL, so I recommend you upgrade to a supported version soon.
--
Pranith
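To make the gfid-based approach above concrete, here is a minimal, untested sketch. The brick path, host name, and volume name come from the vol info earlier in the thread; the mount point /mnt/aux is a placeholder, transport=rdma is assumed because the volume is RDMA-only, and the final clear-locks invocation against the virtual .gfid path follows the suggestion above but should be verified on a test volume first:

# 1. On a storage node, read the gfid of the locked shard directly from the brick
#    and reformat the hex value into the dashed gfid string:
getfattr -n trusted.gfid -e hex /ssd1/zone2-vmstor1/export/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 \
  | awk -F= '/trusted.gfid/ {print $2}' \
  | sed 's/^0x//; s/\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)/\1-\2-\3-\4-\5/'

# 2. Mount the volume with the gfid aux-mount and check that the shard is reachable by gfid:
mount -t glusterfs -o aux-gfid-mount,transport=rdma sto1z2.xxx:/zone2-ssd1-vmstor1 /mnt/aux
stat /mnt/aux/.gfid/<gfid-string-from-step-1>

# 3. Retry clearing the locks, addressing the shard by gfid instead of its /.shard path
#    (assumption: clear-locks accepts the virtual .gfid path; verify before using in production):
gluster volume clear-locks zone2-ssd1-vmstor1 /.gfid/<gfid-string-from-step-1> kind all inode

Note the warning above: with the clear-locks bug present in 3.8.x, the next disconnect on a brick where clear-locks succeeded can crash that brick, so upgrading past 3.8.x first is the safer path.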