thr3ads.net - Gluster users - [Gluster-users] Stale locks on shards [Jan 2018]

Pranith Kumar Karampuri

2018-Jan-23 08:30 UTC

[Gluster-users] Stale locks on shards

On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen <samppah at neutraali.net> wrote:
> Pranith Kumar Karampuri kirjoitti 23.01.2018 09:34:
>
>> On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen
>> <samppah at neutraali.net> wrote:
>>
>>> Hi again,
>>>
>>> here is more information regarding the issue described earlier.
>>>
>>> It looks like self healing is stuck. According to "heal statistics",
>>> the crawl began at Sat Jan 20 12:56:19 2018 and it's still going on
>>> (it's around Sun Jan 21 20:30 when writing this). However,
>>> glustershd.log says that the last heal was completed at "2018-01-20
>>> 11:00:13.090697" (which is 13:00 UTC+2). Also, "heal info" has been
>>> running now for over 16 hours without any information. In the statedump
>>> I can see that storage nodes have locks on files and some of those
>>> are blocked. I.e., here again it says that ovirt8z2 is holding an active
>>> lock even though ovirt8z2 crashed after the lock was granted:
>>>
>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>>> mandatory=0
>>> inodelk-count=3
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0, connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
>>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52
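
[Editorial sketch] Finding such entries by eye gets tedious when many shards are involved. The following is a minimal Python sketch, not a Gluster tool, that scans statedump text for inodelk entries and lists the ACTIVE ones held by a given host. It assumes each inodelk entry sits on a single line (as in the dump file itself, not the mail-wrapped quote) and that the hostname is the part of connection-id before the first hyphen. The function names `parse_inodelks` and `active_locks_from` are made up for this illustration.

```python
import re

# One inodelk entry, in the layout shown in the statedump excerpt.
INODELK_RE = re.compile(
    r"inodelk\.inodelk\[\d+\]\((?P<state>ACTIVE|BLOCKED)\)="
    r".*?pid = (?P<pid>\d+),"
    r".*?connection-id=(?P<conn>[^,]+),"
)


def parse_inodelks(statedump_text):
    """Return one dict per inodelk entry, tagged with its path and lock domain."""
    locks = []
    path = None
    domain = None
    for raw in statedump_text.splitlines():
        line = raw.strip()
        if line.startswith("path="):
            path = line[len("path="):]
        elif line.startswith("lock-dump.domain.domain="):
            domain = line[len("lock-dump.domain.domain="):]
        else:
            match = INODELK_RE.match(line)
            if match:
                conn = match.group("conn")
                locks.append({
                    "path": path,
                    "domain": domain,
                    "state": match.group("state"),
                    "pid": int(match.group("pid")),
                    # Assumption: the hostname contains no hyphen.
                    "host": conn.split("-", 1)[0],
                })
    return locks


def active_locks_from(locks, host_prefix):
    """Keep only ACTIVE locks whose holder's hostname starts with host_prefix."""
    return [l for l in locks
            if l["state"] == "ACTIVE" and l["host"].startswith(host_prefix)]


SAMPLE = """\
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0, connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52
"""

if __name__ == "__main__":
    for lock in active_locks_from(parse_inodelks(SAMPLE), "ovirt8z2"):
        print(lock["path"], lock["domain"], lock["pid"])
```

Run against the excerpt, this singles out the lock with pid 3420 in the zone2-ssd1-vmstor1-replicate-0 domain, which is the one the self-heal daemon's BLOCKED request is waiting behind.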
>>>
>>> I'd also like to add that the volume had an arbiter brick before the crash
>>> happened. We decided to remove it because we thought that it was
>>> causing issues. However, now I think that this was unnecessary. After
>>> the crash, the arbiter logs had lots of messages like this:
>>> [2018-01-20 10:19:36.515717] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted) [Operation not permitted]
>>>
>>> Is there any way to force self heal to stop? Any help would be very
>>> much appreciated :)
>>>
>>
>> Exposing .shard to a normal mount is opening a can of worms. You
>> should probably look at mounting the volume with the gfid aux-mount, where
>> you can access a file as <path-to-mount>/.gfid/<gfid-string> to clear
>> locks on it.
>>
>> Mount command:  mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol
>>
>> A gfid string will have some hyphens, like:
>> 11118443-1894-4273-9340-4b212fa1c0e4
>>
>> That said, the next disconnect on the brick where you successfully did the
>> clear-locks will crash the brick. There was a bug in the 3.8.x series with
>> clear-locks, which was fixed in 3.9.0 as part of a feature. The self-heal
>> deadlocks that you witnessed are also fixed in the 3.10 release.
>>
>
>
> Thank you for the answer. Could you please tell more about the crash? What will
> actually happen, or is there a bug report about it? We just want to make sure
> that we can do everything to secure the data on the bricks. We will look into
> upgrading, but we have to make sure that the new version works for us and, of
> course, get self healing working before doing anything :)
>
The locks xlator/module maintains a list of locks that are granted to a client.
Clear-locks had an issue where it forgot to remove the lock from this
list. So after a clear-locks, the connection's list ends up pointing to data
that has already been freed. When a disconnect happens, all the locks that are
granted to that client need to be unlocked. So the process starts traversing
this list, and when it tries to access the freed data, it
crashes. I found it while reviewing a feature patch sent by
Facebook folks to the locks xlator (http://review.gluster.org/14816) for 3.9.0,
and they also fixed this bug as part of that feature patch.
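
[Editorial sketch] The sequence described above can be shown with a toy model. This is Python, not the actual C code of the locks xlator, and every name in it (`Lock`, `Brick`, `clear_locks`, `disconnect`, the `freed` flag) is invented for illustration. Python cannot really touch freed memory, so a boolean flag stands in for "this memory was released", and a RuntimeError stands in for the brick crash.

```python
class Lock:
    def __init__(self, name):
        self.name = name
        self.freed = False  # stands in for the memory having been released


class Brick:
    def __init__(self):
        self.inode_locks = []   # locks granted on the inode
        self.client_locks = []  # per-connection list of the same lock objects

    def grant(self, name):
        lock = Lock(name)
        self.inode_locks.append(lock)
        self.client_locks.append(lock)
        return lock

    def clear_locks(self, fixed):
        """Drop all inode locks. With fixed=False the per-connection
        list keeps its now-dangling references (the behaviour described above)."""
        for lock in list(self.inode_locks):
            self.inode_locks.remove(lock)
            if fixed:
                self.client_locks.remove(lock)
            lock.freed = True

    def disconnect(self):
        """On disconnect, walk the connection's list and release everything."""
        for lock in self.client_locks:
            if lock.freed:
                # In C this would be a read of freed memory, hence the crash.
                raise RuntimeError("access to freed lock " + lock.name)
        released = len(self.client_locks)
        self.client_locks.clear()
        return released


def crashes_on_disconnect(fixed):
    brick = Brick()
    brick.grant("shard-lock")
    brick.clear_locks(fixed=fixed)
    try:
        brick.disconnect()
    except RuntimeError:
        return True
    return False


if __name__ == "__main__":
    print("buggy clear-locks crashes:", crashes_on_disconnect(fixed=False))
    print("fixed clear-locks crashes:", crashes_on_disconnect(fixed=True))
```

The point of the model is only the ordering: the crash does not happen at clear-locks time but later, at the next disconnect of the client whose lock was cleared.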

>
> Br,
> Samuli
>
>
>> 3.8.x is EOLed, so I recommend you to upgrade to a supported version
>> soon.
>>
>> Best regards,
>>> Samuli Heinonen
>>>
>>> Samuli Heinonen
>>>> 20 January 2018 at 21.57
>>>>
>>>> Hi all!
>>>>
>>>> One hypervisor in our virtualization environment crashed and now
>>>> some of the VM images cannot be accessed. After investigation we
>>>> found out that there were lots of images that still had an active lock
>>>> held by the crashed hypervisor. We were able to remove locks from "regular
>>>> files", but it doesn't seem possible to remove locks from shards.
>>>>
>>>> We are running GlusterFS 3.8.15 on all nodes.
>>>>
>>>> Here is part of a statedump that shows a shard having an active lock
>>>> held by the crashed node:
>>>> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>>> path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
>>>> mandatory=0
>>>> inodelk-count=1
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>>> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0, granted at 2018-01-20 08:57:24
>>>>
>>>> If we try to run clear-locks we get the following error message:
>>>> # gluster volume clear-locks zone2-ssd1-vmstor1 /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
>>>> Volume clear-locks unsuccessful
>>>> clear-locks getxattr command failed. Reason: Operation not permitted
>>>>
>>>> Gluster vol info if needed:
>>>> Volume Name: zone2-ssd1-vmstor1
>>>> Type: Replicate
>>>> Volume ID: b6319968-690b-4060-8fff-b212d2295208
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: rdma
>>>> Bricks:
>>>> Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
>>>> Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
>>>> Options Reconfigured:
>>>> cluster.shd-wait-qlength: 10000
>>>> cluster.shd-max-threads: 8
>>>> cluster.locking-scheme: granular
>>>> performance.low-prio-threads: 32
>>>> cluster.data-self-heal-algorithm: full
>>>> performance.client-io-threads: off
>>>> storage.linux-aio: off
>>>> performance.readdir-ahead: on
>>>> client.event-threads: 16
>>>> server.event-threads: 16
>>>> performance.strict-write-ordering: off
>>>> performance.quick-read: off
>>>> performance.read-ahead: on
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> cluster.eager-lock: enable
>>>> network.remote-dio: on
>>>> cluster.quorum-type: none
>>>> network.ping-timeout: 22
>>>> performance.write-behind: off
>>>> nfs.disable: on
>>>> features.shard: on
>>>> features.shard-block-size: 512MB
>>>> storage.owner-uid: 36
>>>> storage.owner-gid: 36
>>>> performance.io-thread-count: 64
>>>> performance.cache-size: 2048MB
>>>> performance.write-behind-window-size: 256MB
>>>> server.allow-insecure: on
>>>> cluster.ensure-durability: off
>>>> config.transport: rdma
>>>> server.outstanding-rpc-limit: 512
>>>> diagnostics.brick-log-level: INFO
>>>>
>>>> Any recommendations how to advance from here?
>>>>
>>>> Best regards,
>>>> Samuli Heinonen
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users [1]
>>>>
>>>
>>
>> --
>>
>> Pranith
>>
>>
>> Links:
>> ------
>> [1] http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>

-- 
Pranith

Samuli Heinonen

2018-Jan-24 20:57 UTC


[Gluster-users] Stale locks on shards

Hi!

Thank you very much for your help so far. Could you please give an
example command showing how to use aux-gfid-mount to remove locks?
"gluster vol clear-locks" seems to mount the volume by itself.

Best regards,
Samuli Heinonen

Pranith Kumar Karampuri

2018-Jan-25 05:09 UTC


[Gluster-users] Stale locks on shards

On Thu, Jan 25, 2018 at 2:27 AM, Samuli Heinonen <samppah at neutraali.net> wrote:
> Hi!
>
> Thank you very much for your help so far. Could you please give an example
> command showing how to use aux-gfid-mount to remove locks? "gluster vol
> clear-locks" seems to mount the volume by itself.
>
You are correct, sorry, this was implemented around 7 years back and I
forgot that bit about it :-(. Essentially it becomes a getxattr syscall on
the file.
Could you give me the clear-locks command you were trying to execute and I
can probably convert it to the getfattr command?
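
[Editorial sketch] To make the "getxattr syscall on the file" idea concrete, here is a minimal Python sketch under stated assumptions. It builds the <path-to-mount>/.gfid/<gfid-string> path used with an aux-gfid-mount, as described earlier in the thread, and wraps the getxattr call. The name of the virtual xattr that clear-locks requests is not given anywhere in this thread, so it is left as a caller-supplied parameter rather than guessed. The helper names `gfid_path` and `read_virtual_xattr` are hypothetical.

```python
import os
import uuid


def gfid_path(mount_point, gfid):
    """Return <mount_point>/.gfid/<gfid> for a volume mounted with
    -o aux-gfid-mount. Raises ValueError if gfid is not a valid UUID."""
    canonical = str(uuid.UUID(gfid))  # normalises to the hyphenated form
    return os.path.join(mount_point, ".gfid", canonical)


def read_virtual_xattr(path, xattr_name):
    """Issue a getxattr syscall on path (Linux only).

    xattr_name must be supplied by the caller: the key used by
    clear-locks is NOT stated in this thread and is not assumed here.
    """
    return os.getxattr(path, xattr_name)


if __name__ == "__main__":
    # The mount point and gfid are the example values quoted earlier in the thread.
    print(gfid_path("/mnt/testvol", "11118443-1894-4273-9340-4b212fa1c0e4"))
```

The same getxattr call is what the getfattr command-line tool performs, so once the correct key is known either route should be equivalent.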

>
> Best regards,
> Samuli Heinonen
>
> Pranith Kumar Karampuri <mailto:pkarampu at redhat.com>
>> 23 January 2018 at 10.30
>>
>>
>> On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen <samppah at
neutraali.net
>> <mailto:samppah at neutraali.net>> wrote:
>>
>>     Pranith Kumar Karampuri kirjoitti 23.01.2018 09:34:
>>
>>         On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen
>>         <samppah at neutraali.net <mailto:samppah at
neutraali.net>> wrote:
>>
>>             Hi again,
>>
>>             here is more information regarding issue described earlier
>>
>>             It looks like self healing is stuck. According to
"heal
>>             statistics"
>>             crawl began at Sat Jan 20 12:56:19 2018 and it's still
>>             going on
>>             (It's around Sun Jan 21 20:30 when writing this).
However
>>             glustershd.log says that last heal was completed at
>>             "2018-01-20
>>             11:00:13.090697" (which is 13:00 UTC+2). Also
"heal info"
>>             has been
>>             running now for over 16 hours without any information. In
>>             statedump
>>             I can see that storage nodes have locks on files and some
>>             of those
>>             are blocked. Ie. Here again it says that ovirt8z2 is
>>             having active
>>             lock even ovirt8z2 crashed after the lock was granted.:
>>
>>             [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>             path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
>>             mandatory=0
>>             inodelk-count=3
>>            
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-
>> heal
>>             inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0,
>>             len=0, pid
>>             = 18446744073709551610, owner=d0c6d857a87f0000,
>>             client=0x7f885845efa0,
>>
>>         connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-
>> zone2-ssd1-vmstor1-client-0-0-0,
>>
>>             granted at 2018-01-20 10:59:52
>>            
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metad
>> ata
>>             lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>             inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0,
>>             len=0, pid
>>             = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0,
>>
>>         connection-id=ovirt8z2.xxx.com
>>        
<http://ovirt8z2.xxx.com>-5652-2017/12/27-09:49:02:946825-
>> zone2-ssd1-vmstor1-client-0-7-0,
>>
>>
>>             granted at 2018-01-20 08:57:23
>>             inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0,
>>             len=0,
>>             pid = 18446744073709551610, owner=d0c6d857a87f0000,
>>             client=0x7f885845efa0,
>>
>>         connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-
>> zone2-ssd1-vmstor1-client-0-0-0,
>>
>>             blocked at 2018-01-20 10:59:52
>>
>>             I'd also like to add that volume had arbiter brick
before
>>             crash
>>             happened. We decided to remove it because we thought that
>>             it was
>>             causing issues. However now I think that this was
>>             unnecessary. After
>>             the crash arbiter logs had lots of messages like this:
>>             [2018-01-20 10:19:36.515717] I [MSGID: 115072]
>>             [server-rpc-fops.c:1640:server_setattr_cbk]
>>             0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
>>             <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
>>             (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation
not
>>             permitted)
>>             [Operation not permitted]
>>
>>             Is there anyways to force self heal to stop? Any help
>>             would be very
>>             much appreciated :)
>>
>>
>>         Exposing .shard to a normal mount is opening a can of worms.
You
>>         should probably look at mounting the volume with gfid
>>         aux-mount where
>>         you can access a file with
>>         <path-to-mount>/.gfid/<gfid-string>to clear
>>         locks on it.
>>
>>         Mount command:  mount -t glusterfs -o aux-gfid-mount vm1:test
>>         /mnt/testvol
>>
>>         A gfid string will have some hyphens like:
>>         11118443-1894-4273-9340-4b212fa1c0e4
>>
>>         That said. Next disconnect on the brick where you successfully
>>         did the
>>         clear-locks will crash the brick. There was a bug in 3.8.x
>>         series with
>>         clear-locks which was fixed in 3.9.0 with a feature. The
self-heal
>>         deadlocks that you witnessed also is fixed in 3.10 version of
the
>>         release.
>>
>>
>>
>>     Thank you the answer. Could you please tell more about crash? What
>>     will actually happen or is there a bug report about it? Just want
>>     to make sure that we can do everything to secure data on bricks.
>>     We will look into upgrade but we have to make sure that new
>>     version works for us and of course get self healing working before
>>     doing anything :)
>>
>>
>> The locks xlator/module maintains a list of locks that are granted to a
>> client. Clear-locks had an issue where it forgot to remove the lock from
>> this list, so after a clear-locks the connection's list ends up pointing
>> to data that has been freed. When a disconnect happens, all the locks
>> that are granted to a client need to be unlocked. So the process starts
>> traversing this list, and when it tries to access the freed data it
>> crashes. I found it while reviewing a feature patch sent by Facebook
>> folks to the locks xlator (http://review.gluster.org/14816) for 3.9.0,
>> and they also fixed this bug as part of that feature patch.
>>
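The sequence described above can be sketched as a toy model in Python. This is not GlusterFS code: the class and method names (Lock, Brick, clear_locks, disconnect) are made up for illustration, and since Python has no use-after-free, a "freed" flag and an exception stand in for the freed memory and the crash.

```python
class Lock:
    def __init__(self, owner):
        self.owner = owner
        self.freed = False  # stands in for memory released with free() in C


class UseAfterFree(Exception):
    """Raised where the real process would touch freed memory and crash."""


class Brick:
    """Toy model: one inode's lock list plus a per-client list of granted locks."""

    def __init__(self):
        self.inode_locks = []   # locks granted on the inode
        self.client_locks = {}  # client id -> locks granted to that client

    def grant(self, client, owner):
        lock = Lock(owner)
        self.inode_locks.append(lock)
        self.client_locks.setdefault(client, []).append(lock)
        return lock

    def clear_locks(self, fixed):
        for lock in list(self.inode_locks):
            self.inode_locks.remove(lock)
            if fixed:
                # The fix: also unlink the lock from the per-client lists.
                for locks in self.client_locks.values():
                    if lock in locks:
                        locks.remove(lock)
            # Without the fix, the client list still references this lock.
            lock.freed = True

    def disconnect(self, client):
        """Release every lock granted to the client; return how many were released."""
        released = 0
        for lock in self.client_locks.pop(client, []):
            if lock.freed:
                raise UseAfterFree(f"lock of owner {lock.owner} was already freed")
            self.inode_locks.remove(lock)
            released += 1
        return released


for fixed in (False, True):
    brick = Brick()
    brick.grant("ovirt8z2", "14ce372c397f0000")
    brick.clear_locks(fixed=fixed)
    try:
        print("fixed" if fixed else "buggy", "released", brick.disconnect("ovirt8z2"))
    except UseAfterFree as exc:
        print("buggy disconnect would crash:", exc)
```

In the buggy variant the disconnect walks into the stale entry; in the fixed variant the client list is already empty, so the disconnect has nothing left to release.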
>>
>>     Br,
>>     Samuli
>>
>>
>>         3.8.x is EOLed, so I recommend that you upgrade to a supported
>>         version soon.
>>
>>             Best regards,
>>             Samuli Heinonen
>>
>>                 Samuli Heinonen
>>                 20 January 2018 at 21.57
>>
>>                 Hi all!
>>
>>                 One hypervisor in our virtualization environment crashed
>>                 and now some of the VM images cannot be accessed. After
>>                 investigation we found out that there were lots of images
>>                 that still had an active lock held by the crashed
>>                 hypervisor. We were able to remove locks from "regular
>>                 files", but it doesn't seem possible to remove locks from
>>                 shards.
>>
>>                 We are running GlusterFS 3.8.15 on all nodes.
>>
>>                 Here is the part of the statedump that shows a shard
>>                 having an active lock from the crashed node:
>>                 [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
>>                 path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
>>                 mandatory=0
>>                 inodelk-count=1
>>                 lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
>>                 lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
>>                 lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
>>                 inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0, granted at 2018-01-20 08:57:24
>>
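For picking such entries out of a longer statedump, a small Python sketch along these lines may help. It assumes the inodelk line format shown above and that the connection-id begins with the client's hostname; parse_inodelk and held_by are hypothetical helper names, not part of any Gluster tool.

```python
import re

# One inodelk entry, copied from the statedump above.
SAMPLE = (
    "inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, "
    "pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770, "
    "connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-"
    "zone2-ssd1-vmstor1-client-1-7-0, granted at 2018-01-20 08:57:24"
)

# Captures the lock index, its state (ACTIVE/BLOCKED), pid, owner and connection-id.
ENTRY = re.compile(
    r"inodelk\.inodelk\[(?P<index>\d+)\]\((?P<state>\w+)\)=.*?"
    r"pid\s*=\s*(?P<pid>\d+),\s*owner=(?P<owner>\w+),.*?"
    r"connection-id=(?P<conn>[^,]+)"
)


def parse_inodelk(line):
    """Return a dict describing one inodelk entry, or None if the line is not one."""
    match = ENTRY.search(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["index"] = int(entry["index"])
    entry["pid"] = int(entry["pid"])
    return entry


def held_by(lines, host):
    """Return ACTIVE inodelk entries whose connection-id starts with host."""
    entries = (parse_inodelk(line) for line in lines)
    return [
        e for e in entries
        if e is not None and e["state"] == "ACTIVE" and e["conn"].startswith(host)
    ]


for entry in held_by([SAMPLE], "ovirt8z2"):
    print(entry["pid"], entry["owner"], entry["conn"])
```

Matching on the hostname prefix is only a heuristic for spotting locks that belong to the crashed node; the full connection-id is the more precise key.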
>>                 If we try to run clear-locks we get the following error
>>                 message:
>>                 # gluster volume clear-locks zone2-ssd1-vmstor1 /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
>>                 Volume clear-locks unsuccessful
>>                 clear-locks getxattr command failed. Reason: Operation not permitted
>>
>>                 Gluster vol info if needed:
>>                 Volume Name: zone2-ssd1-vmstor1
>>                 Type: Replicate
>>                 Volume ID: b6319968-690b-4060-8fff-b212d2295208
>>                 Status: Started
>>                 Snapshot Count: 0
>>                 Number of Bricks: 1 x 2 = 2
>>                 Transport-type: rdma
>>                 Bricks:
>>                 Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
>>                 Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
>>                 Options Reconfigured:
>>                 cluster.shd-wait-qlength: 10000
>>                 cluster.shd-max-threads: 8
>>                 cluster.locking-scheme: granular
>>                 performance.low-prio-threads: 32
>>                 cluster.data-self-heal-algorithm: full
>>                 performance.client-io-threads: off
>>                 storage.linux-aio: off
>>                 performance.readdir-ahead: on
>>                 client.event-threads: 16
>>                 server.event-threads: 16
>>                 performance.strict-write-ordering: off
>>                 performance.quick-read: off
>>                 performance.read-ahead: on
>>                 performance.io-cache: off
>>                 performance.stat-prefetch: off
>>                 cluster.eager-lock: enable
>>                 network.remote-dio: on
>>                 cluster.quorum-type: none
>>                 network.ping-timeout: 22
>>                 performance.write-behind: off
>>                 nfs.disable: on
>>                 features.shard: on
>>                 features.shard-block-size: 512MB
>>                 storage.owner-uid: 36
>>                 storage.owner-gid: 36
>>                 performance.io-thread-count: 64
>>                 performance.cache-size: 2048MB
>>                 performance.write-behind-window-size: 256MB
>>                 server.allow-insecure: on
>>                 cluster.ensure-durability: off
>>                 config.transport: rdma
>>                 server.outstanding-rpc-limit: 512
>>                 diagnostics.brick-log-level: INFO
>>
>>                 Any recommendations on how to proceed from here?
>>
>>                 Best regards,
>>                 Samuli Heinonen
>>
>>                 _______________________________________________
>>                 Gluster-users mailing list
>>                 Gluster-users at gluster.org
>>                 http://lists.gluster.org/mailman/listinfo/gluster-users [1]
>>
>>
>>
>>         --
>>
>>         Pranith
>>
>>
>>         Links:
>>         ------
>>         [1] http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>> --
>> Pranith
>>
>
>

-- 
Pranith
