thr3ads.net - Gluster users - [Gluster-users] Volume ping-timeout parameter and client side mount timeouts [Nov 2016]


Martin Schlegel

2016-Nov-14 15:43 UTC

[Gluster-users] Volume ping-timeout parameter and client side mount timeouts

Hello Gluster Community

We have two brick nodes running with replication for a volume gv0, for which we
set "gluster volume set gv0 ping-timeout 20".

In our tests there seemed to be an unknown delay with this ping-timeout: we see it
time out much later, after about 35 seconds rather than at around 20 seconds (see
the test below).

Our distributed database cluster uses Gluster as a secondary file system for
backups etc. Its Pacemaker cluster manager needs to know how long to wait for
the glusterfs-mounted file system to become available again before giving up
and failing over to another node.

1. How do we know when to give up waiting for the glusterfs mount point to
become accessible again following an outage on the brick server this client was
connected to?
2. Is there a timeout / interval setting on the client side that we could
reduce, so that it switches the mount point to a different, available brick
server more quickly?
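For question 1, one way for a cluster manager to decide when to give up is to
probe the mount point with a hard time limit instead of blocking on it. This is
a minimal sketch, not a Pacemaker or GlusterFS feature: probe_mount is a
hypothetical helper, it assumes GNU coreutils timeout and stat, and it assumes
a statfs() call on the glusterfs FUSE mount actually reaches the glusterfs
client rather than being answered from a kernel cache. A process stuck in
uninterruptible sleep cannot be killed even by SIGKILL, in which case timeout
itself would keep waiting.

```bash
#!/usr/bin/env bash
# Hypothetical health probe for a glusterfs mount (an assumption, not part of
# GlusterFS or Pacemaker). Succeeds only if a statfs() on the mount point
# returns within the given number of seconds.
#
# Usage: probe_mount MOUNT_POINT TIMEOUT_SECONDS
probe_mount() {
  local mount_point=$1 limit=$2
  # `stat -f` queries file system status (statfs) instead of a single inode.
  # GNU `timeout` sends SIGTERM after $limit seconds and SIGKILL 2 seconds
  # later if stat is still running. It exits with 124 on timeout (137 if
  # SIGKILL was needed), otherwise with stat's own exit status.
  timeout -k 2 "$limit" stat -f -- "$mount_point" >/dev/null 2>&1
}
```

For example, `probe_mount /mnt/gv0 5 || echo "gv0 not responding"`, where
/mnt/gv0 is a hypothetical mount point. A monitor script built on this could
treat failures that persist for longer than the volume's ping-timeout as the
signal to fail over.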


Regards,
Martin Schlegel

__________

Here is how we tested this:

As a test we blocked the entire network on one of these brick nodes:
root@glusterfs-brick-node1 $ date; iptables -A INPUT -i bond0 -j DROP; iptables -A OUTPUT -o bond0 -j DROP
Mon Nov 14 08:26:55 UTC 2016

From the syslog on the glusterfs-client-node:
Nov 14 08:27:30 glusterfs-client-node1 pgshared1[26783]: [2016-11-14 08:27:30.275694] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-gv0-client-0: server glusterfs-brick-node1:49152 has not responded in the last 20 seconds, disconnecting.

<--- This last message, "has not responded in the last 20 seconds", is confusing
to me, because the brick node had clearly been blocked for 35 seconds already! Is
there some client-side check interval that can be reduced?
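The 35-second figure comes from subtracting the brick's `date` output from the
client's log timestamp. A minimal sketch of that arithmetic, assuming GNU date
(-d is a GNU extension) and, importantly, that the clocks on
glusterfs-brick-node1 and the client node are synchronised, since the two
timestamps come from different hosts. The function name elapsed_seconds is
hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of whole seconds from $1 to $2.
# -u makes timestamps without an explicit zone be read as UTC; timestamps
# that carry a zone (e.g. "UTC") are honoured as written.
elapsed_seconds() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( end - start ))
}

# When the network was blocked (brick) vs. the ping-timer message (client).
elapsed_seconds '2016-11-14 08:26:55 UTC' '2016-11-14 08:27:30 UTC'
```

This prints 35. Subtracting the configured 20-second ping-timeout leaves
roughly 15 seconds that the timer alone does not account for, which is the gap
being asked about.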

Mohammed Rafi K C

2016-Nov-15 09:52 UTC


[Gluster-users] Volume ping-timeout parameter and client side mount timeouts

If I understand the query correctly, the problem is that gluster takes longer
than the configured 20 seconds to time out: the brick had been offline for more
than 35s before the client disconnected. With that assumption, I have some
questions.

How did you determine that the timer expired only after 35s? From the log
file? If so, note that glusterfs waits some time before flushing log messages,
pushing them out as a batch (I am not sure how long), so the actual timing in
the logs may not be accurate.

If you have already confirmed with Wireshark or a similar tool that it takes
more than 20 seconds to disconnect the socket, then there could be something
else which we need to look into.

Can you confirm that using Wireshark or a similar tool, if you have not already done so?


Rafi KC


