Hu Bert
2019-Mar-05 07:23 UTC
[Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP
Interestingly: gluster volume status misses gluster1, while heal
statistics show gluster1:

gluster volume status workdata
Status of volume: workdata
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster2:/gluster/md4/workdata        49153     0          Y       1723
Brick gluster3:/gluster/md4/workdata        49153     0          Y       2068
Self-heal Daemon on localhost               N/A       N/A        Y       1732
Self-heal Daemon on gluster3                N/A       N/A        Y       2077

vs.

gluster volume heal workdata statistics heal-count
Gathering count of entries to be healed on volume workdata has been successful

Brick gluster1:/gluster/md4/workdata
Number of entries: 0

Brick gluster2:/gluster/md4/workdata
Number of entries: 10745

Brick gluster3:/gluster/md4/workdata
Number of entries: 10744

On Tue, Mar 5, 2019 at 08:18, Hu Bert <revirii at googlemail.com> wrote:
>
> Hi Milind,
>
> Well, there are such entries, but those haven't been a problem during
> the install and the last kernel update+reboot. The entries look like:
>
> PUBLIC_IP gluster2.alpserver.de gluster2
>
> 192.168.0.50 gluster1
> 192.168.0.51 gluster2
> 192.168.0.52 gluster3
>
> 'ping gluster2' resolves to the LAN IP. I removed the last entry in the
> first line and rebooted ... no, that didn't help. From
> /var/log/glusterfs/glusterd.log on gluster2:
>
> [2019-03-05 07:04:36.188128] E [MSGID: 106010]
> [glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management:
> Version of Cksums persistent differ. local cksum = 3950307018, remote
> cksum = 455409345 on peer gluster1
> [2019-03-05 07:04:36.188314] I [MSGID: 106493]
> [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd:
> Responded to gluster1 (0), ret: 0, op_ret: -1
>
> Interestingly, there are no entries in the brick logs of the rejected
> server. That is not surprising, as no brick process is running there.
> The server gluster1 is still in the rejected state.
>
> 'gluster volume start workdata force' starts the brick process on
> gluster1, and some heals are happening on gluster2+3, but according to
> 'gluster volume status workdata' the volumes still aren't complete.
>
> gluster1:
> ------------------------------------------------------------------------------
> Brick gluster1:/gluster/md4/workdata        49152     0          Y       2523
> Self-heal Daemon on localhost               N/A       N/A        Y       2549
>
> gluster2:
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick gluster2:/gluster/md4/workdata        49153     0          Y       1723
> Brick gluster3:/gluster/md4/workdata        49153     0          Y       2068
> Self-heal Daemon on localhost               N/A       N/A        Y       1732
> Self-heal Daemon on gluster3                N/A       N/A        Y       2077
>
>
> Hubert
>
> On Tue, Mar 5, 2019 at 07:58, Milind Changire <mchangir at redhat.com> wrote:
> >
> > There are probably DNS entries or /etc/hosts entries with the public
> > IP addresses that the host names (gluster1, gluster2, gluster3) are
> > getting resolved to.
> > /etc/resolv.conf would tell which default domain is searched for the
> > node names, and which DNS servers respond to the queries.
> >
> >
> > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert <revirii at googlemail.com> wrote:
> >>
> >> Good morning,
> >>
> >> I have a replica 3 setup with 2 volumes, running on version 5.3 on
> >> Debian stretch. This morning I upgraded one server to version 5.4 and
> >> rebooted the machine; after the restart I noticed that:
> >>
> >> - no brick process is running
> >> - gluster volume status only shows the server itself:
> >>
> >> gluster volume status workdata
> >> Status of volume: workdata
> >> Gluster process                             TCP Port  RDMA Port  Online  Pid
> >> ------------------------------------------------------------------------------
> >> Brick gluster1:/gluster/md4/workdata        N/A       N/A        N       N/A
> >> NFS Server on localhost                     N/A       N/A        N       N/A
> >>
> >> - gluster peer status on the upgraded server:
> >>
> >> gluster peer status
> >> Number of Peers: 2
> >>
> >> Hostname: gluster3
> >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
> >> State: Peer Rejected (Connected)
> >>
> >> Hostname: gluster2
> >> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27
> >> State: Peer Rejected (Connected)
> >>
> >> - gluster peer status on the other 2 servers:
> >>
> >> gluster peer status
> >> Number of Peers: 2
> >>
> >> Hostname: gluster1
> >> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef
> >> State: Peer Rejected (Connected)
> >>
> >> Hostname: gluster3
> >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
> >> State: Peer in Cluster (Connected)
> >>
> >> In the brick logs I see that the public IP is used instead of the
> >> LAN IP. Brick logs from one of the volumes:
> >>
> >> rejected node: https://pastebin.com/qkpj10Sd
> >> connected nodes: https://pastebin.com/8SxVVYFV
> >>
> >> Why is the public IP suddenly used instead of the LAN IP? Killing all
> >> gluster processes and rebooting (again) didn't help.
> >>
> >> Thx,
> >> Hubert
> >
> > --
> > Milind
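To check Milind's name-resolution theory, the mapping each node actually
uses can be verified directly. A minimal sketch, using the host names and
LAN addresses from the thread above (the .alpserver.de names for gluster1
and gluster3 are assumed by analogy with the gluster2 entry shown):

  # getent resolves through the system's nsswitch.conf order
  # (/etc/hosts before DNS on a stock Debian), which is what
  # glusterd effectively sees as well.
  getent hosts gluster1 gluster2 gluster3

  # Expected output for this setup would be the LAN addresses:
  #   192.168.0.50    gluster1
  #   192.168.0.51    gluster2
  #   192.168.0.52    gluster3
  # A public IP appearing here would confirm the theory.

  # Query DNS directly, bypassing /etc/hosts, to see what the zone serves:
  dig +short gluster1.alpserver.de gluster2.alpserver.de gluster3.alpserver.de

Running this on every node would show whether one of them resolves a peer
name to the public address while the others use the LAN.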
Hari Gowtham
2019-Mar-05 07:32 UTC
[Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP
Hi,

This is a known issue we are working on. Because the checksum differs
between the updated and the non-updated nodes, the peers are getting
rejected. The bricks aren't coming up because of the same issue.

More about the issue: https://bugzilla.redhat.com/show_bug.cgi?id=1685120

On Tue, Mar 5, 2019 at 12:56 PM Hu Bert <revirii at googlemail.com> wrote:
>
> Interestingly: gluster volume status misses gluster1, while heal
> statistics show gluster1:
> [...]

--
Regards,
Hari Gowtham.
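For readers who land on this thread with a peer stuck in the Rejected
state: once all peers run the same (fixed) version again, the standard
recovery procedure from the Gluster documentation usually clears it. The
sketch below is not a fix confirmed in this thread; it assumes a
systemd-managed glusterd and the default /var/lib/glusterd state
directory, and with the cksum bug above it only helps after the versions
match across the cluster.

  # Run on the REJECTED node only. This wipes the local volume metadata,
  # which glusterd then re-syncs from the healthy peers.
  systemctl stop glusterd
  cp -a /var/lib/glusterd /var/lib/glusterd.bak   # keep a backup

  # Remove everything except glusterd.info (the node's own UUID).
  cd /var/lib/glusterd
  find . -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +

  # Restart, re-probe a healthy peer to fetch the configuration,
  # then restart once more and verify.
  systemctl start glusterd
  gluster peer probe gluster2
  systemctl restart glusterd
  gluster peer status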