thr3ads.net - Gluster users - [Gluster-users] BUG: After stop and start wrong port is advertised [Dec 2017]

Atin Mukherjee

2017-Oct-30 16:40 UTC

[Gluster-users] BUG: After stop and start wrong port is advertised

On Sat, 28 Oct 2017 at 02:36, Jo Goossens <jo.goossens at hosted-power.com> wrote:
> Hello Atin,
>
> I just read it and am very happy you found the issue. We really hope this
> will be fixed in the next 3.10.7 version!
>
No, not in 3.10.7 I guess, as the patch is still in review and 3.10.7 is
getting tagged today. You'll get this fix in 3.10.8.

> PS: Wow, nice, all that C code and those "goto out" statements (not always
> considered clean, but often the best way, I think). I can remember the days
> I wrote kernel drivers myself in C :)
>
> Regards
>
> Jo Goossens
>
> -----Original message-----
> *From:* Atin Mukherjee <amukherj at redhat.com>
> *Sent:* Fri 27-10-2017 21:01
> *Subject:* Re: [Gluster-users] BUG: After stop and start wrong port is
> advertised
> *To:* Jo Goossens <jo.goossens at hosted-power.com>;
> *CC:* gluster-users at gluster.org;
>
> We (finally) figured out the root cause, Jo!
>
> Patch https://review.gluster.org/#/c/18579 posted upstream for review.
>
> On Thu, Sep 21, 2017 at 2:08 PM, Jo Goossens <jo.goossens at hosted-power.com> wrote:
>
> Hi,
>
> We use glusterfs 3.10.5 on Debian 9.
>
> When we stop or restart the service, e.g.: service glusterfs-server restart
>
> We see that the wrong port gets advertised afterwards. For example:
>
> Before restart:
>
> Status of volume: public
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
> Brick 192.168.140.43:/gluster/public        49152     0          Y       5913
> Self-heal Daemon on localhost               N/A       N/A        Y       5932
> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       13084
> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       15499
>
> Task Status of Volume public
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> After restart of the service on one of the nodes (192.168.140.43) the port
> seems to have changed (but it didn't):
>
> root@app3:/var/log/glusterfs# gluster volume status
> Status of volume: public
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
> Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
> Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
> Self-heal Daemon on localhost               N/A       N/A        Y       4628
> Self-heal Daemon on 192.168.140.42          N/A       N/A        Y       3077
> Self-heal Daemon on 192.168.140.41          N/A       N/A        Y       28777
>
> Task Status of Volume public
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> However the active process is STILL the same pid AND still listening on
> the old port
>
> root@192.168.140.43:/var/log/glusterfs# netstat -tapn | grep gluster
> tcp        0      0 0.0.0.0:49152           0.0.0.0:*               LISTEN      5913/glusterfsd
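
The check described above (the port shown by `gluster volume status` versus the port the brick process really listens on) can be scripted. The following is only a minimal sketch, assuming the two commands print rows shaped like the ones quoted in this thread; the function names and the sample constants are made up for illustration and are not part of any Gluster tooling.

```python
import re

def advertised_brick_ports(status_text):
    """Map 'host:/path' -> (tcp_port, pid) from `gluster volume status` text."""
    bricks = {}
    for line in status_text.splitlines():
        # Assumed row shape: Brick <host>:<path> <tcp> <rdma> <online> <pid>
        m = re.match(r"\s*Brick\s+(\S+)\s+(\d+)\s+\S+\s+[YN]\s+(\d+)\s*$", line)
        if m:
            bricks[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return bricks

def listening_ports(netstat_text):
    """Map pid -> set of TCP ports glusterfsd listens on, from `netstat -tapn` text."""
    ports = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed fields: proto recv-q send-q local foreign state pid/program
        if len(fields) >= 7 and fields[5] == "LISTEN" and fields[6].endswith("/glusterfsd"):
            port = int(fields[3].rsplit(":", 1)[1])
            pid = int(fields[6].split("/", 1)[0])
            ports.setdefault(pid, set()).add(port)
    return ports

def find_mismatches(status_text, netstat_text, local_host):
    """List (brick, advertised_port, actual_ports) for local bricks whose
    advertised port is not among the ports their pid is listening on."""
    listening = listening_ports(netstat_text)
    mismatches = []
    for brick, (port, pid) in advertised_brick_ports(status_text).items():
        if not brick.startswith(local_host + ":"):
            continue  # netstat only shows processes on the local node
        actual = listening.get(pid, set())
        if port not in actual:
            mismatches.append((brick, port, sorted(actual)))
    return mismatches

# Sample rows copied from the outputs quoted in this thread.
SAMPLE_STATUS = """\
Brick 192.168.140.41:/gluster/public        49153     0          Y       6364
Brick 192.168.140.42:/gluster/public        49152     0          Y       1483
Brick 192.168.140.43:/gluster/public        49154     0          Y       5913
"""
SAMPLE_NETSTAT = (
    "tcp        0      0 0.0.0.0:49152           0.0.0.0:*"
    "               LISTEN      5913/glusterfsd\n"
)

if __name__ == "__main__":
    for brick, port, actual in find_mismatches(
        SAMPLE_STATUS, SAMPLE_NETSTAT, "192.168.140.43"
    ):
        print(f"{brick}: advertised {port}, listening on {actual}")
```

Run on the affected node, with the real command output fed in as text, a non-empty result would correspond to the situation reported here: pid 5913 advertised on 49154 while still bound to 49152.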
>
> The other nodes' logs fill up with errors because they can't reach the
> daemon anymore. They try to reach it on the "new" port instead of the old
> one:
>
> [2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
> [2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
> [2017-09-21 08:33:33.225849] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:33.236395] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
> [2017-09-21 08:33:37.225095] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:37.225628] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
> [2017-09-21 08:33:41.225805] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-public-client-2: changing port to 49154 (from 0)
> [2017-09-21 08:33:41.226440] E [socket.c:2327:socket_connect_finish] 0-public-client-2: connection to 192.168.140.43:49154 failed (Connection refused); disconnecting socket
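
To see at a glance which host and port the clients keep retrying, the "Connection refused" entries can be tallied. This is a rough sketch that assumes the log entries have exactly the shape quoted above; the regular expression and the `refused_targets` helper are illustrative, not part of Gluster.

```python
import re
from collections import Counter

# Pattern for the socket_connect_finish error entries quoted in this thread.
REFUSED = re.compile(
    r"\[(?P<ts>[^\]]+)\] E \[socket\.c:\d+:socket_connect_finish\]\s+"
    r"(?P<client>\S+): connection to (?P<host>[\d.]+):(?P<port>\d+) failed "
    r"\(Connection refused\)"
)

def refused_targets(log_text):
    """Count refused connections per (client, host, port)."""
    # Collapse all whitespace first so entries wrapped over several lines match too.
    flat = " ".join(log_text.split())
    return Counter(
        (m["client"], m["host"], int(m["port"])) for m in REFUSED.finditer(flat)
    )

# Three entries copied from the log excerpt quoted in this thread.
SAMPLE_LOG = """\
[2017-09-21 08:33:25.225006] E [socket.c:2327:socket_connect_finish]
0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
refused); disconnecting socket
[2017-09-21 08:33:29.226633] I [rpc-clnt.c:2000:rpc_clnt_reconfig]
0-public-client-2: changing port to 49154 (from 0)
[2017-09-21 08:33:29.227490] E [socket.c:2327:socket_connect_finish]
0-public-client-2: connection to 192.168.140.43:49154 failed (Connection
refused); disconnecting socket
"""

if __name__ == "__main__":
    for (client, host, port), count in refused_targets(SAMPLE_LOG).items():
        print(f"{client} -> {host}:{port} refused {count} time(s)")
```

A single target dominating the tally, as 192.168.140.43:49154 does here, points at a stale advertised port rather than a general network problem.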
>
> So they now try 49154 instead of the old 49152.
>
> Is this also by design? We had a lot of issues because of this recently.
> We don't understand why it starts advertising a completely wrong port after
> stop/start.
>
> Regards
>
> Jo Goossens
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
--
- Atin (atinm)

Jo Goossens

2017-Dec-02 13:53 UTC

[Gluster-users] BUG: After stop and start wrong port is advertised

Hello Atin,

Could you confirm this should have been fixed in 3.10.8? If so we'll test it
for sure!

Regards

Jo

Atin Mukherjee

2017-Dec-02 15:23 UTC

[Gluster-users] BUG: After stop and start wrong port is advertised

On Sat, 2 Dec 2017 at 19:29, Jo Goossens <jo.goossens at hosted-power.com>
wrote:
> Hello Atin,
>
> Could you confirm this should have been fixed in 3.10.8? If so we'll test
> it for sure!
>
The fix should be part of 3.10.8, which is awaiting its release announcement.
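
For anyone scripting the upgrade check, here is a small sketch that compares an installed version string against 3.10.8. It assumes the version is already available as a plain "X.Y.Z" string, and it only answers for the 3.10 branch, since this thread does not say which releases of other branches carry the patch. The names `FIXED_IN`, `parse_version` and `has_fix` are hypothetical.

```python
# Per this thread, the fix is expected in 3.10.8 on the 3.10 branch.
FIXED_IN = (3, 10, 8)

def parse_version(text):
    """Turn a string such as '3.10.5' into a tuple of ints, e.g. (3, 10, 5)."""
    return tuple(int(part) for part in text.strip().split("."))

def has_fix(version):
    """Return True/False for 3.10.x releases, or None for other branches,
    which this thread says nothing about."""
    v = parse_version(version)
    if v[:2] != FIXED_IN[:2]:
        return None
    return v >= FIXED_IN

if __name__ == "__main__":
    for candidate in ("3.10.5", "3.10.7", "3.10.8", "3.12.1"):
        print(candidate, has_fix(candidate))
```

Tuples compare element by element, which avoids the string-comparison trap where "3.10.10" would sort before "3.10.8".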

--
- Atin (atinm)
