thr3ads.net - Gluster users - [Gluster-users] Exact purpose of network.ping-timeout [Dec 2017]

Omar Kohl

2017-Dec-26 13:11 UTC

[Gluster-users] Exact purpose of network.ping-timeout

Hi,

I have a question regarding the "ping-timeout" option. I have been
researching its purpose for a few days and it is still not completely clear to
me, especially since the Gluster community apparently strongly discourages
changing, or at least decreasing, this value!

Assuming that I set ping-timeout to 10 seconds (instead of the default 42) this
would mean that if I have a network outage of 11 seconds then Gluster internally
would have to re-allocate some resources that it freed after the 10 seconds,
correct? But apart from that there are no negative implications, are there? For
instance if I'm copying files during the network outage then those files
will continue copying after those 11 seconds.

This means that the only purpose of ping-timeout is to save those extra
resources that are used by "short" network outages. Is that correct?

If I am confident that my network will not have many 11 second outages and if
they do occur I am willing to incur those extra costs due to resource allocation
is there any reason not to set ping-timeout to 10 seconds?

The problem I have with a long ping-timeout is that the Windows Samba Client
disconnects after 25 seconds. So if one of the nodes of a Gluster cluster shuts
down ungracefully then the Samba Client disconnects and the file that was being
copied is incomplete on the server. These "costs" seem to be much
higher than the potential costs of those Gluster resource re-allocations. But it
is hard to estimate because there is no clear documentation of what exactly
those Gluster costs are.

In general I would be very interested in a comprehensive explanation of
ping-timeout and the up- and downsides of setting high or low values for it.

Kind regards,
Omar

lemonnierk at ulrar.net

2017-Dec-26 21:05 UTC

Hi,

It's just the delay for which a node can stop responding before being
marked as down.
Basically that's how long a node can go down before a heal becomes
necessary to bring it back.

If you set it to 10 seconds, and a node goes down, you'll see a 10-second
freeze in all I/O for the volume. That's why you don't want it
too high (having a 2-minute freeze on I/O, for example, would be
pretty bad, depending on what you host), but you don't want it too
low either (to avoid triggering heals all the time).

You can configure it because it depends on what you host. You might be
okay with a freeze of a few minutes to avoid a heal, or you might not care
about heals at all and prefer a very low value to avoid freezes.
The default value should work pretty well for most things, though.
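
The trade-off described above can be put into a toy model. This is only a
sketch under the simplified reading given in this message: I/O stalls until
either the node comes back or ping-timeout expires, and a heal is needed only
if the timeout expires first. The names node_outage and Outcome are made up
for illustration and are not part of Gluster.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    freeze_seconds: float   # how long I/O on the volume stalls
    heal_needed: bool       # True if the node was marked down


def node_outage(outage_seconds: float, ping_timeout: float = 42.0) -> Outcome:
    """Toy model of one unresponsive node (hypothetical helper, not Gluster code)."""
    if outage_seconds < ping_timeout:
        # The node answers again before the timer expires: I/O stalls for
        # the whole outage, but the node is never marked down.
        return Outcome(outage_seconds, False)
    # The timer expires first: I/O resumes after ping_timeout, the node is
    # marked down and has to be healed when it returns.
    return Outcome(ping_timeout, True)


if __name__ == "__main__":
    # Compare an 11-second glitch and a 5-minute outage for three settings.
    for timeout in (10, 42, 120):
        print(timeout, node_outage(11, timeout), node_outage(300, timeout))
```

In this model a low value shortens the worst-case freeze but turns short
glitches into heals, while a high value does the opposite.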

Omar Kohl

2017-Dec-27 11:17 UTC

Hi,

> If you set it to 10 seconds, and a node goes down, you'll see a 10-second
> freeze in all I/O for the volume.

Exactly! ONLY 10 seconds instead of the default 42 seconds :-)

As I said before the problem with the 42 seconds is that a Windows Samba Client
will disconnect (and therefore interrupt any read/write operation) after waiting
for about 25 seconds. So 42 seconds is too high. In this case it would therefore
make more sense to reduce the ping-timeout, right?

Has anyone done any performance measurements on what the implications of a low
ping-timeout are? What are the costs of "triggering heals all the
time"?

On a related note I found the extras/hook-scripts/start/post/S29CTDBsetup.sh
script that mounts a CTDB (Samba) share and explicitly sets the ping-timeout to
10 seconds. There is a comment saying: "Make sure ping-timeout is not
default for CTDB volume". Unfortunately there is no explanation in the
script, in the commit or in the Gerrit review history
(https://review.gluster.org/#/c/7569/, https://review.gluster.org/#/c/8007/) for
WHY you make sure ping-timeout is not default. Can anyone tell me the reason?
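
For reference, the option is changed per volume with "gluster volume set". A
minimal sketch in Python, assuming the gluster CLI is installed on the node;
the volume name "myvol" is a placeholder and the helper names are made up for
illustration. The value 10 is simply the one the hook script uses, not a
recommendation.

```python
import subprocess
from typing import List


def ping_timeout_cmd(volume: str, seconds: int) -> List[str]:
    """Build the `gluster volume set` command for network.ping-timeout."""
    if seconds <= 0:
        raise ValueError("ping-timeout must be a positive number of seconds")
    return ["gluster", "volume", "set", volume,
            "network.ping-timeout", str(seconds)]


def set_ping_timeout(volume: str, seconds: int) -> None:
    # Runs the CLI; raises CalledProcessError if gluster reports a failure.
    subprocess.run(ping_timeout_cmd(volume, seconds), check=True)


if __name__ == "__main__":
    # "myvol" is a placeholder volume name; this only prints the command.
    print(" ".join(ping_timeout_cmd("myvol", 10)))
```

The value currently in effect can be checked with
"gluster volume get myvol network.ping-timeout".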

Kind regards,
Omar

Raghavendra Gowdappa

2018-Jan-10 05:26 UTC

Sorry about the delayed response. Had to dig into the history to answer various
"why"s.

----- Original Message -----
> From: "Omar Kohl" <omar.kohl at iternity.com>
> To: gluster-users at gluster.org
> Sent: Tuesday, December 26, 2017 6:41:48 PM
> Subject: [Gluster-users] Exact purpose of network.ping-timeout
>
> Assuming that I set ping-timeout to 10 seconds (instead of the default 42)
> this would mean that if I have a network outage of 11 seconds then Gluster
> internally would have to re-allocate some resources that it freed after the
> 10 seconds, correct? But apart from that there are no negative implications,
> are there? For instance if I'm copying files during the network outage then
> those files will continue copying after those 11 seconds.
>
> This means that the only purpose of ping-timeout is to save those extra
> resources that are used by "short" network outages. Is that correct?

Basic purpose of ping-timer/heartbeat is to identify an unresponsive brick.
Unresponsiveness can be caused by various things, such as:
* A deadlocked server. We no longer see too many instances of deadlocked
  bricks/servers.
* Slow execution of fops in the brick stack. For example:
    - due to lock contention. There have been some efforts to fix the lock
      contention on the brick stack.
    - bad backend OS/filesystem. Posix health checker was an effort to fix this.
    - not enough threads for execution, etc.
  Note that ideally it's not the job of the ping framework to identify this
  scenario and, following the same thought process, we've shielded the
  processing of ping requests on bricks from the costs of execution of
  requests to Glusterfs Program.

* Ungraceful shutdown of network connections. For example:
    - hard shutdown of the machine/container/VM running the brick
    - physically pulling out the network cable
  Basically all those different scenarios where TCP/IP doesn't get a chance
  to inform the other end that it is going down. Note that some of the
  scenarios of ungraceful network shutdown can be identified using
  TCP_KEEPALIVE and TCP_USER_TIMEOUT [1]. However, at the time when the
  heartbeat mechanism was introduced in Glusterfs, TCP_KEEPALIVE couldn't
  identify all the ungraceful network shutdown scenarios and TCP_USER_TIMEOUT
  was yet to be implemented in the Linux kernel. One scenario which
  TCP_KEEPALIVE could identify was the exact scenario TCP_USER_TIMEOUT aims to
  solve - identifying a hard network shutdown when data is in transit. However
  there might be other limitations in TCP_KEEPALIVE which we need to test out
  before retiring the heartbeat mechanism in favor of TCP_KEEPALIVE and
  TCP_USER_TIMEOUT.
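
As an illustration of the two kernel mechanisms mentioned above, here is a
minimal sketch of how an application can enable them on a Linux TCP socket
from Python. It is not how Glusterfs configures its own transport; the helper
name and all the numeric values are arbitrary choices for the example (the
42000 ms merely mirrors the 42-second ping-timeout default).

```python
import socket

# TCP_USER_TIMEOUT is Linux-specific; 18 is its value in <linux/tcp.h>,
# used as a fallback if this Python build does not export the constant.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def enable_dead_peer_detection(sock, idle_s=20, interval_s=2, probes=9,
                               user_timeout_ms=42_000):
    """Ask the kernel to detect a dead peer (hypothetical helper, Linux only)."""
    # Keepalive covers an idle connection: after idle_s seconds of silence,
    # send a probe every interval_s seconds, give up after `probes` misses.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    # TCP_USER_TIMEOUT covers data in transit: the kernel closes the
    # connection if sent data stays unacknowledged for user_timeout_ms.
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, user_timeout_ms)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_dead_peer_detection(s)
        print(s.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT))
```

Both settings are per socket and act below the application, which is why they
cannot notice a brick process that is alive at the TCP level but stuck.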

The next interesting question would be why we need to identify an unresponsive
brick. Various reasons why we need to do that would be:
* To replace/fix any problems the brick might have.
* Almost all of the cluster translators - DHT, AFR, EC - wait for a response
  from all of their children - either success or failure - before sending the
  response back to the application. This means one or more slow/unresponsive
  bricks can increase the latencies of fops/syscalls even though other bricks
  are responsive and healthy. However, there are ongoing efforts to minimize
  the effect of a few slow/unresponsive bricks [2]. I think the principles of
  [2] can be applied to DHT and AFR too.
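
The latency point can be shown with a very small model. This is a sketch of
the "wait for every child" behaviour only, with made-up brick names and
latencies; it does not reflect how any particular translator is implemented.

```python
from typing import Dict


def fop_latency(child_latencies_ms: Dict[str, float]) -> float:
    """Latency seen by the application when the translator waits for all children."""
    # The reply goes back only after the last child has answered, so the
    # slowest brick sets the latency of the whole operation.
    return max(child_latencies_ms.values())


if __name__ == "__main__":
    healthy = {"brick-0": 2.0, "brick-1": 3.0, "brick-2": 2.5}
    one_slow = {**healthy, "brick-2": 900.0}
    print(fop_latency(healthy))
    print(fop_latency(one_slow))
```

With one slow brick the whole operation takes as long as that brick, however
fast the others are.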

Some recent discussions on the necessity of ping framework in glusterfs can be
found at [3].

Having given all the above reasons for the existence of the ping framework,
it's also important that the ping framework shouldn't bring down an otherwise
healthy connection (false positives). Reasons are:
* As pointed out by Joe Julian in another mail on this thread, each connection
  carries some state on bricks, like locks/open-fds, which is cleaned up on a
  disconnect. So, disconnects (even those followed by quick reconnects) are not
  completely transparent to the application. Though the presence of HA layers
  like EC/AFR mitigates this problem to some extent, we still don't have a lock
  healing implementation in place. So, once a quorum number of AFR/EC children
  go down (though maybe not all at once), locks are no longer held on bricks.
* All the fops that are in transit in the time window starting from the time of
  disconnect till a successful reconnect are failed by the rpc/transport layer.
  So, depending on the configuration of volumes (whether AFR/EC/DHT prevent
  these errors from being seen by the application), this *may* result in the
  application seeing the error.
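
The two effects listed above can be sketched as a toy model of one client
connection to one brick. All class, field and fop names here are invented for
illustration; this is not Gluster code and it ignores what AFR/EC may hide
from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


class ConnectionLost(Exception):
    """Raised for fops that were in flight when the connection dropped."""


@dataclass
class BrickConnection:
    """Toy model of per-connection state held on a brick (not Gluster code)."""
    connected: bool = True
    locks: Set[str] = field(default_factory=set)
    open_fds: Set[int] = field(default_factory=set)
    in_flight: List[str] = field(default_factory=list)

    def disconnect(self) -> Dict[str, Exception]:
        # The brick cleans up everything tied to this connection...
        self.locks.clear()
        self.open_fds.clear()
        # ...and every fop still in transit is failed back to the caller.
        failed = {fop: ConnectionLost(fop) for fop in self.in_flight}
        self.in_flight.clear()
        self.connected = False
        return failed

    def reconnect(self) -> None:
        # A quick reconnect restores the link, but not the lost state.
        self.connected = True


if __name__ == "__main__":
    conn = BrickConnection(locks={"/file:0-100"}, open_fds={3},
                           in_flight=["WRITE"])
    failed = conn.disconnect()
    conn.reconnect()
    print(conn.connected, conn.locks, conn.open_fds, list(failed))
```

Even after the reconnect the lock and the open fd are gone and the WRITE has
already failed, which is the cost a spurious ping-timer expiry would incur.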

IOW, disconnects are not lightweight and we need to avoid them whenever
possible. Since the action on ping-timer expiry is to disconnect the connection,
we suggest not setting very low values, to avoid spurious disconnections.

[1] http://man7.org/linux/man-pages/man7/tcp.7.html
[2] https://github.com/gluster/glusterfs/issues/366
[3] http://lists.gluster.org/pipermail/gluster-devel/2017-January/051938.html

Raghavendra Gowdappa

2018-Jan-10 06:17 UTC

----- Original Message -----
> From: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
> To: "Omar Kohl" <omar.kohl at iternity.com>
> Cc: gluster-users at gluster.org
> Sent: Wednesday, January 10, 2018 10:56:21 AM
> Subject: Re: [Gluster-users] Exact purpose of network.ping-timeout
>
> [...]
>   However, at the time when the heartbeat mechanism was introduced in
>   Glusterfs, TCP_KEEPALIVE couldn't identify all the ungraceful network
>   shutdown scenarios and TCP_USER_TIMEOUT was yet to be implemented in
>   the Linux kernel. One scenario which TCP_KEEPALIVE could

s/could/couldn't/

>   identify was the exact scenario TCP_USER_TIMEOUT aims to solve -
>   identifying a hard network shutdown when data is in transit. However
>   there might be other limitations in TCP_KEEPALIVE which we need to test
>   out before retiring the heartbeat mechanism in favor of TCP_KEEPALIVE and
>   TCP_USER_TIMEOUT.

Amar Tumballi

2018-Jan-10 06:28 UTC

Can this get into an 'FAQ' document somewhere? This is one of the major
questions asked all the time.

Regards,
Amar

-- 
Amar Tumballi (amarts)