Victor Nomura
2017-Jun-21 18:10 UTC
[Gluster-users] Gluster failure due to "0-management: Lock not released for <volumename>"
Hi All,

I'm fairly new to Gluster (3.10.3) and have had it going for a couple of months now, but suddenly, after a power failure in our building, it all came crashing down. No client is able to connect after powering the 3 nodes I have set up back on.

Looking at the logs, it looks like there's some sort of "Lock" placed on the volume which prevents all the clients from connecting to the Gluster endpoint.

I can't even run a "gluster volume status all" command IF more than 1 node is powered up. I have to shut down nodes 2-3, and then I am able to issue the command on node1 to see volume status. When all nodes are powered up and I check the peer status, it says that all peers are connected. Trying to connect to the Gluster volume from any client says the Gluster endpoint is not available and times out. There are no network issues: the nodes can all ping each other, and there are no firewalls or any other devices between the nodes and clients.

Please help if you think you know how to fix this. I have a feeling it's this "lock" that's not "released" because the whole setup lost power all of a sudden. I've tried restarting all the nodes, restarting glusterfs-server, etc. I'm out of ideas.

Thanks in advance!
Victor

Volume Name: teravolume
Type: Distributed-Replicate
Volume ID: 85af74d0-f1bc-4b0d-8901-4dea6e4efae5
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: gfsnode1:/media/brick1
Brick2: gfsnode2:/media/brick1
Brick3: gfsnode3:/media/brick1
Brick4: gfsnode1:/media/brick2
Brick5: gfsnode2:/media/brick2
Brick6: gfsnode3:/media/brick2
Options Reconfigured:
nfs.disable: on

[2017-06-21 16:02:52.376709] W [MSGID: 106118] [glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock not released for teravolume
[2017-06-21 16:03:03.429032] I [MSGID: 106163] [glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31000
[2017-06-21 16:13:13.326478] E [rpc-clnt.c:200:call_bail] 0-management: bailing out frame type(Peer mgmt) op(--(2)) xid = 0x105 sent = 2017-06-21 16:03:03.202284. timeout = 600 for 192.168.150.52:$
[2017-06-21 16:13:13.326519] E [rpc-clnt.c:200:call_bail] 0-management: bailing out frame type(Peer mgmt) op(--(2)) xid = 0x105 sent = 2017-06-21 16:03:03.204555. timeout = 600 for 192.168.150.53:$
[2017-06-21 16:18:34.456522] I [MSGID: 106004] [glusterd-handler.c:5888:__glusterd_peer_rpc_notify] 0-management: Peer <gfsnode2> (<e1e1caa5-9842-40d8-8492-a82b079879a3>), in state <Peer in Cluste$
[2017-06-21 16:18:34.456619] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x1f879) [0x7fee6bc22879] -->/usr/lib/x86_64-l$
[2017-06-21 16:18:34.456638] W [MSGID: 106118] [glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock not released for teravolume
[2017-06-21 16:18:34.456661] I [MSGID: 106004] [glusterd-handler.c:5888:__glusterd_peer_rpc_notify] 0-management: Peer <gfsnode3> (<59b9effa-2b88-4764-9130-4f31c14c362e>), in state <Peer in Cluste$
[2017-06-21 16:18:34.456692] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0x1f879) [0x7fee6bc22879] -->/usr/lib/x86_64-l$
[2017-06-21 16:18:43.323944] I [MSGID: 106163] [glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31000
[2017-06-21 16:18:34.456699] W [MSGID: 106118] [glusterd-handler.c:5913:__glusterd_peer_rpc_notify] 0-management: Lock not released for teravolume
[2017-06-21 16:18:45.628552] I [MSGID: 106163] [glusterd-handshake.c:1309:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31000
[2017-06-21 16:23:40.607173] I [MSGID: 106499] [glusterd-handler.c:4363:__glusterd_handle_status_volume] 0-management: Received status volume req for volume teravolume
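For readers hitting the same "Lock not released" symptom, the usual first steps can be sketched as a short script. This is a hedged sketch, not advice from the thread: the volume name and node names are the ones above, `gluster` and `systemctl` are the stock CLIs, and the exact service unit name varies by distribution (on Debian/Ubuntu it may be `glusterfs-server`, as the original poster's wording suggests).

```shell
#!/bin/bash
# Sketch: inspect cluster state, then restart only the management daemon
# (glusterd) node by node, so a stale mgmt_v3 lock held on behalf of a dead
# peer connection can be dropped. Run the status checks on one node first.

gluster peer status               # every peer should show "Peer in Cluster (Connected)"
gluster volume status teravolume  # hangs or fails while the stale lock is held

# Restarting glusterd does not stop the brick processes serving data,
# but restart one node at a time and re-check status in between.
systemctl restart glusterd        # unit may be named glusterfs-server on Debian/Ubuntu

gluster peer status
gluster volume status teravolume
```

If the lock reappears after every restart, that points at an underlying connectivity problem between the peers rather than a one-off stale lock.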
Atin Mukherjee
2017-Jun-22 16:00 UTC
[Gluster-users] Gluster failure due to "0-management: Lock not released for <volumename>"
Could you attach glusterd.log and cmd_history.log files from all the nodes?

On Wed, Jun 21, 2017 at 11:40 PM, Victor Nomura <victor at mezine.com> wrote:

> Hi All,
>
> I'm fairly new to Gluster (3.10.3) and have had it going for a couple of
> months now, but suddenly, after a power failure in our building, it all
> came crashing down. No client is able to connect after powering back the
> 3 nodes I have set up. [...]
Atin Mukherjee
2017-Jun-27 07:28 UTC
[Gluster-users] Gluster failure due to "0-management: Lock not released for <volumename>"
I had looked at the logs shared by Victor privately, and it seems there is a network glitch in the cluster which is causing glusterd to lose its connection with the other peers. As a side effect of this, a lot of RPC requests are getting bailed out, leaving glusterd with a stale lock; hence you see that some of the commands fail with "another transaction is in progress" or "locking failed."

Some examples of the symptom highlighted:

[2017-06-21 23:02:03.826858] E [rpc-clnt.c:200:call_bail] 0-management: bailing out frame type(Peer mgmt) op(--(2)) xid = 0x4 sent = 2017-06-21 22:52:02.719068. timeout = 600 for 192.168.150.53:24007
[2017-06-21 23:02:03.826888] E [rpc-clnt.c:200:call_bail] 0-management: bailing out frame type(Peer mgmt) op(--(2)) xid = 0x4 sent = 2017-06-21 22:52:02.716782. timeout = 600 for 192.168.150.52:24007
[2017-06-21 23:02:53.836936] E [rpc-clnt.c:200:call_bail] 0-management: bailing out frame type(glusterd mgmt v3) op(--(1)) xid = 0x5 sent = 2017-06-21 22:52:47.909169. timeout = 600 for 192.168.150.53:24007
[2017-06-21 23:02:53.836991] E [MSGID: 106116] [glusterd-mgmt.c:124:gd_mgmt_v3_collate_errors] 0-management: Locking failed on gfsnode3. Please check log file for details.
[2017-06-21 23:02:53.837016] E [rpc-clnt.c:200:call_bail] 0-management: bailing out frame type(glusterd mgmt v3) op(--(1)) xid = 0x5 sent = 2017-06-21 22:52:47.909175. timeout = 600 for 192.168.150.52:24007

I'd request you to first look at the network layer and rectify the problems there.

On Thu, Jun 22, 2017 at 9:30 PM, Atin Mukherjee <amukherj at redhat.com> wrote:

> Could you attach glusterd.log and cmd_history.log files from all the
> nodes? [...]
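The network-layer check suggested above can be sketched roughly as follows. This is a hypothetical script, not from the thread: the peer IPs and TCP port 24007 (glusterd's management port) are taken from the log excerpts, and `ping` and `nc` are stock utilities. A sustained ping is used because a flapping link can pass a one-off reachability test.

```shell
#!/bin/bash
# Sketch: look for intermittent packet loss and a blocked management port
# between peers -- either would explain the call_bail timeouts in the logs.
for peer in 192.168.150.52 192.168.150.53; do
    echo "--- $peer ---"
    # 20 pings at 0.2s intervals; the summary lines report % packet loss
    ping -c 20 -i 0.2 "$peer" | tail -n 2
    # glusterd listens on TCP 24007; a connect timeout here matches the
    # "bailing out frame ... timeout = 600" errors above
    nc -zv -w 3 "$peer" 24007 || echo "cannot reach $peer:24007"
done
```

Run it from each node in turn, since a one-way firewall rule can make connectivity look fine from one side only.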