CJ Baar
2015-Apr-27 19:59 UTC
[Gluster-users] Unable to make HA work; mounts hang on remote node reboot
FYI, I've tried with both glusterfs and NFS mounts, and the reaction is the same. The value of ping-timeout seems to have no effect at all.

I did discover one thing that makes a difference on reboot. There is a second service descriptor for "glusterfsd", which is not enabled by default, but is started by something else (glusterd, I assume?). However, whatever it is that starts the process does not shut it down cleanly during a reboot, and it appears to be the loss of that process, without de-registration in the peer group, that causes the other nodes to hang. If I enable the service (chkconfig glusterfsd on), it does nothing by default because the config is commented out (/etc/sysconfig/glusterfsd). But, with those K scripts in place in rc.d, I can manually touch /var/lock/subsys/glusterfsd, and then I can successfully reboot one node without the others hanging. This at least helps when I need to take a node down for maintenance; it obviously still does nothing for a true node failure.

I guess my next step is to figure out how to modify the init scripts for glusterd to touch the other lock file on startup as well. It does not seem a very elegant solution, but having the lock file in place and the init scripts enabled seems to solve at least half of the issue.

--CJ

> On Apr 25, 2015, at 11:34 AM, Corey Kovacs <corey.kovacs at gmail.com> wrote:
>
> That's not cool... you certainly have a quorum. Are you using the fuse client or regular old NFS?
>
> C
>
> On Apr 24, 2015 4:50 PM, "CJ Baar" <gsml at ffisys.com> wrote:
>
> Corey,
> I was able to get a third node set up. I recreated the volume as "replica 3". The hang still happens (on two nodes, now) when I reboot a single node, even though two are still surviving, which should constitute a quorum.
> --CJ
>
>> On Apr 17, 2015, at 6:18 AM, Corey Kovacs <corey.kovacs at gmail.com> wrote:
>>
>> Typically you need to meet a quorum requirement to run just about any cluster. By definition, two nodes don't make a good cluster. A third node would let you start with just two, since that would allow you to meet quorum. Can you add a third node to at least test?
>>
>> Corey
>>
>> On Apr 16, 2015 6:52 PM, "CJ Baar" <gsml at ffisys.com> wrote:
>>
>> I appreciate the info. I have tried adjusting the ping-timeout setting, and it seems to have no effect. The whole system hangs for 45+ seconds, which is about what it takes the second node to reboot, no matter what the value of ping-timeout is. The output of the mnt-log is below. It shows the adjusted value I am currently testing (30s), but the system still hangs for longer than that.
>>
>> Also, I have realized that the problem is deeper than I originally thought. It's not just the mount that is hanging when a node reboots; it appears to be the entire system. I cannot use my SSH connection, no matter where I am in the system, and services such as httpd become unresponsive. I can ping the "surviving" system, but other than that it appears pretty unusable. This is a major drawback to using gluster. I can't afford to lose two entire systems if one dies.
>>
>> [2015-04-16 22:59:21.281365] C [rpc-clnt-ping.c:109:rpc_clnt_ping_timer_expired] 0-common-client-0: server 172.31.64.200:49152 has not responded in the last 30 seconds, disconnecting.
>> [2015-04-16 22:59:21.281560] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fce96450550] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fce96225787] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fce9622589e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7fce96225951] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fce96225f1f] ))))) 0-common-client-0: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-04-16 22:58:45.830962 (xid=0x6d)
>> [2015-04-16 22:59:21.281588] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-common-client-0: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
>> [2015-04-16 22:59:21.281788] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7fce96450550] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fce96225787] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fce9622589e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7fce96225951] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fce96225f1f] ))))) 0-common-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-04-16 22:58:51.277528 (xid=0x6e)
>> [2015-04-16 22:59:21.281806] W [rpc-clnt-ping.c:154:rpc_clnt_ping_cbk] 0-common-client-0: socket disconnected
>> [2015-04-16 22:59:21.281816] I [client.c:2215:client_rpc_notify] 0-common-client-0: disconnected from common-client-0. Client process will keep trying to connect to glusterd until brick's port is available
>> [2015-04-16 22:59:21.283637] I [socket.c:3292:socket_submit_request] 0-common-client-0: not connected (priv->connected = 0)
>> [2015-04-16 22:59:21.283663] W [rpc-clnt.c:1562:rpc_clnt_submit] 0-common-client-0: failed to submit rpc-request (XID: 0x6f Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (common-client-0)
>> [2015-04-16 22:59:21.283674] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-common-client-0: remote operation failed: Transport endpoint is not connected. Path: /src (63fc077b-869d-4928-8819-a79cc5c5ffa6)
>> [2015-04-16 22:59:21.284219] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-common-client-0: remote operation failed: Transport endpoint is not connected. Path: (null) (00000000-0000-0000-0000-000000000000)
>> [2015-04-16 22:59:52.322952] E [client-handshake.c:1496:client_query_portmap_cbk] 0-common-client-0: failed to get the port number for [root at cfm-c glusterfs]#
>>
>> --CJ
>>
>>> On Apr 7, 2015, at 10:26 PM, Ravishankar N <ravishankar at redhat.com> wrote:
>>>
>>> On 04/07/2015 10:11 PM, CJ Baar wrote:
>>>> Then, I issue "init 0" on node2, and the mount on node1 becomes unresponsive. This is the log from node1:
>>>> [2015-04-07 16:36:04.250693] W [glusterd-op-sm.c:4021:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
>>>> [2015-04-07 16:36:04.251102] I [glusterd-handler.c:3803:__glusterd_handle_status_volume] 0-management: Received status volume req for volume test1
>>>> The message "I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer 1069f037-13eb-458e-a9c4-0e7e79e595d0, in Peer in Cluster state, has disconnected from glusterd." repeated 39 times between [2015-04-07 16:34:40.609878] and [2015-04-07 16:36:37.752489]
>>>> [2015-04-07 16:36:40.755989] I [MSGID: 106004] [glusterd-handler.c:4365:__glusterd_peer_rpc_notify] 0-management: Peer 1069f037-13eb-458e-a9c4-0e7e79e595d0, in Peer in Cluster state, has disconnected from glusterd.
>>> This is the glusterd log. Could you also share the mount log of the healthy node in the non-responsive --> responsive time interval?
>>> If this is indeed the ping-timer issue, you should see something like: "server xxx has not responded in the last 42 seconds, disconnecting."
>>> Have you, for testing's sake, tried reducing the network.ping-timeout value to something lower and checked that the hang happens only for that time?
>>>>
>>>> This does not seem like desired behaviour. I was trying to create this cluster because I was under the impression it would be more resilient than a single-point-of-failure NFS server. However, if the mount halts when one node in the cluster dies, then I'm no better off.
>>>>
>>>> I also can't seem to figure out how to bring a volume online if only one node in the cluster is running; again, not really functioning as HA. The gluster service runs and the volume "starts", but it is not "online" or mountable until both nodes are running. In a situation where a node fails and we need storage online before we can troubleshoot the cause of the node failure, how do I get a volume to go online?
>>> This is expected behavior. In a two-node cluster, if only one is powered on, glusterd will not start other gluster processes (brick, nfs, shd) until the glusterd of the other node is also up (i.e. quorum is met). If you want to override this behavior, do a `gluster vol start <volname> force` on the node that is up.
>>>
>>> -Ravi
>>>>
>>>> Thanks.
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
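The lock-file workaround CJ describes can be sketched as a short shell sequence. This is only a sketch of what the message reports, not a verified fix: the paths and service names come from the message (RHEL-style SysV init), the `chkconfig` step is left as a comment because it needs a real init system, and a throwaway directory stands in for /var/lock/subsys so the sketch can run unprivileged.

```shell
# Sketch of the reboot workaround described above (SysV init, RHEL-style).
# On a real node, run as root with LOCKDIR=/var/lock/subsys.
LOCKDIR="${LOCKDIR:-$(mktemp -d)}"

# 1) Enable the glusterfsd init script so its K (kill) symlinks exist in
#    rc.d and run at shutdown:
#      chkconfig glusterfsd on
# 2) Create the subsys lock file; without it the K script is skipped, so
#    glusterfsd dies uncleanly and the surviving peers hang until the
#    ping timeout expires:
touch "$LOCKDIR/glusterfsd"
```

CJ's proposed follow-up, having the glusterd init script's start function touch this file automatically, would be a local modification to that script, not an upstream feature.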
Corey Kovacs
2015-Apr-28 07:28 UTC
[Gluster-users] Unable to make HA work; mounts hang on remote node reboot
Someone correct me if I am wrong, but glusterfsd is for self-healing, as I recall. It's launched when it's needed.

On Mon, Apr 27, 2015 at 1:59 PM, CJ Baar <gsml at ffisys.com> wrote:

> FYI, I've tried with both glusterfs and NFS mounts, and the reaction is the same. The value of ping-timeout seems to have no effect at all.
>
> I did discover one thing that makes a difference on reboot. There is a second service descriptor for "glusterfsd", which is not enabled by default, but is started by something else (glusterd, I assume?). However, whatever it is that starts the process does not shut it down cleanly during a reboot, and it appears to be the loss of that process, without de-registration in the peer group, that causes the other nodes to hang. If I enable the service (chkconfig glusterfsd on), it does nothing by default because the config is commented out (/etc/sysconfig/glusterfsd). But, with those K scripts in place in rc.d, I can manually touch /var/lock/subsys/glusterfsd, and then I can successfully reboot one node without the others hanging. This at least helps when I need to take a node down for maintenance; it obviously still does nothing for a true node failure.
>
> I guess my next step is to figure out how to modify the init scripts for glusterd to touch the other lock file on startup as well. It does not seem a very elegant solution, but having the lock file in place and the init scripts enabled seems to solve at least half of the issue.
>
> --CJ
>
> [rest of quoted thread trimmed; see CJ's message above]
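For quick reference, the two knobs that surfaced in this thread, lowering the client-side ping timeout and force-starting a volume when server quorum is not met, map to the gluster CLI invocations below. They are shown as an `echo` dry run since they require a live cluster; the volume name is a placeholder taken from the log lines earlier in the thread.

```shell
# Dry-run sketch of the two commands discussed in this thread.
# Replace VOLNAME with your volume and drop the echo to actually run them.
VOLNAME="common"   # placeholder volume name

# Lower how long clients wait for an unresponsive server before
# disconnecting (network.ping-timeout defaults to 42 seconds):
echo gluster volume set "$VOLNAME" network.ping-timeout 30

# Ravi's override for the one-node-up case: start the bricks on a lone
# surviving node even though glusterd quorum is not met:
echo gluster volume start "$VOLNAME" force
```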