Hi Gluster community,

Could someone with insight into how rpc_client_ping_timer_expired operates explain it? I would love to learn more about it. Last week two of my FUSE clients logged the same disconnect message and then reconnected immediately afterwards. I'd like to know what may have caused this behaviour and where else I can look to build an understanding of the root cause. The gluster node shows the same disconnect/reconnect.

Jan 28 14:25:27 omhq1cab GlusterFS[1640]: [2016-01-28 20:25:27.685703] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-prodstatic-client-3: server 72.36.4.204:49155 has not responded in the last 10 seconds, disconnecting.

Jan 28 14:24:52 omhq1ca9 GlusterFS[1612]: [2016-01-28 20:24:52.589450] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-prodstatic-client-3: server 72.36.4.204:49155 has not responded in the last 10 seconds, disconnecting.

My setup for the volume is shown below. Brick4 was the one that appeared not to be responding to these two clients. Multiple clients (30+) mount this volume, and none of the other clients logged any issues with Brick4.

Volume Name: prodstatic
Type: Distributed-Replicate
Volume ID: 187c241d-0eeb-4405-80f2-c704ea44bc36
Status: Started
Number of Bricks: 2 x 4 = 8
Transport-type: tcp
Bricks:
Brick1: server1140:/export/content/static
Brick2: server1c5d:/export/content/static
Brick3: server11ad:/export/content/static
Brick4: server1781:/export/content/static
Brick5: server1c56:/export/content/static
Brick6: server1c58:/export/content/static
Brick7: server1c57:/export/content/static
Brick8: server1c59:/export/content/static
Options Reconfigured:
network.ping-timeout: 10
server.allow-insecure: on
features.quota: on

Thanks
Khoi
Pranith Kumar Karampuri
2016-Feb-05 05:26 UTC
[Gluster-users] rpc_client_ping_timer_expired logic
On 02/04/2016 08:26 PM, Khoi Mai wrote:
> Hi Gluster community,
>
> Can someone who has insight on how rpc_client_ping_timer_expired
> operates, I would love to learn more about. The reason behind it is
> that last week I had 2 fuse clients produce the same disconnect
> message, but reconnected immediately afterwards. What I'd like to
> know is what may have caused it to behave this way and where else I
> can look to build an understanding of root cause. The gluster node
> does show the same disconnect/reconnect.

The way it works is this: when the first message is sent to the server, a ping RPC is also sent to the server and a timer is started. The timer is 42 seconds by default (it can be changed with network.ping-timeout). If the ping response arrives, the client stops that timer and starts a new 42-second timer for the next ping message.

If the ping response does not arrive within 42 seconds, the timer expires. At that point, if there has been no transport activity (no other messages sent or received), the transport is disconnected and a reconnect is attempted. Otherwise, the client assumes the ping response may still arrive later, so it extends the timer by another 42 seconds to see whether the response comes.

Pranith
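To make the timer behaviour Pranith describes easier to follow, here is a small Python model of it. This is only a sketch of the logic as described in the reply, not the actual GlusterFS C code: the class name PingTimer and all of its methods and fields are made up for illustration. The timeout is set to 10 seconds to match the network.ping-timeout: 10 setting on the prodstatic volume, which is also why the log lines say "has not responded in the last 10 seconds" rather than 42.

```python
from dataclasses import dataclass


@dataclass
class PingTimer:
    """Toy model of the ping timer described in the reply (hypothetical names)."""

    timeout: float = 42.0                  # default per the reply; tunable via network.ping-timeout
    deadline: float = 0.0                  # time at which the current timer would expire
    last_activity: float = float("-inf")   # last time any other message was sent/received
    connected: bool = True

    def send_ping(self, now: float) -> None:
        # A ping goes out and the timer is (re)started.
        self.deadline = now + self.timeout

    def on_activity(self, now: float) -> None:
        # Any non-ping message sent or received on the transport.
        self.last_activity = now

    def on_ping_reply(self, now: float) -> None:
        # The reply stops the old timer; a fresh one is started for the next ping.
        self.send_ping(now)

    def on_timer_expired(self, now: float) -> str:
        # No ping reply arrived within `timeout` seconds.
        if self.last_activity > now - self.timeout:
            # Other traffic was seen during the window, so the reply may
            # just be late: extend the timer by one more timeout period.
            self.deadline = now + self.timeout
            return "extended"
        # Nothing at all on the transport: disconnect (a reconnect follows).
        self.connected = False
        return "disconnected"


if __name__ == "__main__":
    timer = PingTimer(timeout=10.0)        # matches network.ping-timeout: 10
    timer.send_ping(now=0.0)
    timer.on_activity(now=5.0)             # some other message at t=5
    print(timer.on_timer_expired(now=10.0))  # activity in the window: extended
    print(timer.on_timer_expired(now=20.0))  # silent since t=5: disconnected
```

In this model a client only disconnects when a whole timeout window passes with neither a ping reply nor any other traffic from the brick. With a 10-second timeout that window is short, so a brief stall on Brick4 or on the network path from just those two clients could be enough to trigger it, while the other clients see nothing.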