Jo Goossens
2017-Sep-21 21:07 UTC
[Gluster-users] After stop and start wrong port is advertised
Hi,

We use glusterfs 3.10.5 on Debian 9.

When we stop or restart the service, e.g.: service glusterfs-server restart

We see that the wrong port gets advertised afterwards. For example:

Before restart:

Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
Self-heal Daemon on localhost               N/A       N/A        Y       5932
Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499

Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks

After restart of the service on one of the nodes (192.168.140.43), the port seems to have changed (but it didn't):

root@app3:/var/log/glusterfs# gluster volume status
Status of volume: public
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
Self-heal Daemon on localhost               N/A       N/A        Y       4628
Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777

Task Status of Volume public
------------------------------------------------------------------------------
There are no active volume tasks

However, the active process is STILL the same pid AND still listening on the old port:

root@192.168.140.43:/var/log/glusterfs# netstat -tapn | grep gluster
tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd

The other nodes' logs fill up with errors because they can't reach the daemon anymore.
They try to reach it on the "new" port instead of the old one:

[2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
[2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket

So they now try 49154 instead of the old 49152.

Is this also by design? We had a lot of issues because of this recently. We don't understand why it starts advertising a completely wrong port after stop/start.

Regards
Jo Goossens
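A minimal shell sketch for confirming the mismatch described above, run on the affected node. The volume name public, brick path /gluster/public, and node 192.168.140.43 are taken from the report; the workaround at the end is a hedged suggestion, not something confirmed in this thread:

#!/bin/sh
# Run on the affected node (192.168.140.43 in the report above).

# Port glusterd advertises for the local brick (what clients are told):
gluster volume status public | grep '192.168.140.43:/gluster/public'

# Port the brick process is actually bound to:
netstat -tlnp | grep glusterfsd

# If the two ports differ, clients are sent to a port nothing listens
# on, producing the "Connection refused" loop shown in the logs above.

# Hedged workaround (commonly suggested for stale portmap entries, not
# confirmed here): restart the brick so it re-registers its port.
#   kill <pid of the stale glusterfsd>
#   gluster volume start public force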
Atin Mukherjee
2017-Sep-22 08:36 UTC
[Gluster-users] After stop and start wrong port is advertised
On Fri, Sep 22, 2017 at 2:37 AM, Jo Goossens <jo.goossens@hosted-power.com> wrote:

> [...]
>
> Is this also by design? We had a lot of issues because of this recently.
> We don't understand why it starts advertising a completely wrong port
> after stop/start.

This looks like a bug to me. For some reason glusterd's portmap is referring to a stale port (IMO), whereas the brick is still listening on the correct port. But ideally, when the glusterd service is restarted, the whole in-memory portmap is rebuilt.

I'd request the following details from you so we can start analysing it:

1. glusterd statedump output from 192.168.140.43. You can use kill -SIGUSR2 <pid of glusterd> to request a statedump; the file will be available in /var/run/gluster.
2. glusterd and brick logfiles for 192.168.140.43:/gluster/public from 192.168.140.43.
3. cmd_history logfile from all the nodes.
4. Content of /var/lib/glusterd/vols/public/
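A minimal shell sketch of collecting the four items requested above, run on 192.168.140.43. The log file names are assumptions based on default Debian packaging (on 3.10 the glusterd log may be named etc-glusterfs-glusterd.vol.log rather than glusterd.log, and the brick log name is the brick path with '/' replaced by '-'), so adjust paths to your install:

#!/bin/sh
# Gather the requested diagnostics into one tarball.
OUT=/tmp/gluster-diag
mkdir -p "$OUT"

# 1. glusterd statedump: SIGUSR2 makes glusterd write a dump under /var/run/gluster
kill -SIGUSR2 "$(pidof glusterd)"
sleep 2
cp /var/run/gluster/glusterdump.* "$OUT"/

# 2. glusterd and brick logs (either glusterd log name may exist, depending on version)
cp /var/log/glusterfs/glusterd.log "$OUT"/ 2>/dev/null
cp /var/log/glusterfs/etc-glusterfs-glusterd.vol.log "$OUT"/ 2>/dev/null
cp /var/log/glusterfs/bricks/gluster-public.log "$OUT"/

# 3. command history (repeat this step on every node)
cp /var/log/glusterfs/cmd_history.log "$OUT"/

# 4. the volume's on-disk configuration as glusterd sees it
tar -C /var/lib/glusterd/vols -czf "$OUT"/vols-public.tar.gz public

tar -C /tmp -czf /tmp/gluster-diag.tar.gz gluster-diag
echo "diagnostics collected in /tmp/gluster-diag.tar.gz"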