Hu Bert
2019-Mar-05 07:36 UTC
[Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP
Hi Hari,

thx for the hint. Do you know when this will be fixed? Is a downgrade
5.4 -> 5.3 a possibility to fix this?

Hubert

On Tue, Mar 5, 2019 at 8:32 AM Hari Gowtham <hgowtham at redhat.com> wrote:
>
> Hi,
>
> This is a known issue we are working on.
> As the checksum differs between the updated and non-updated node, the
> peers are getting rejected.
> The bricks aren't coming up because of the same issue.
>
> More about the issue: https://bugzilla.redhat.com/show_bug.cgi?id=1685120
>
> On Tue, Mar 5, 2019 at 12:56 PM Hu Bert <revirii at googlemail.com> wrote:
> >
> > Interestingly: gluster volume status misses gluster1, while heal
> > statistics show gluster1:
> >
> > gluster volume status workdata
> > Status of volume: workdata
> > Gluster process                           TCP Port  RDMA Port  Online  Pid
> > ------------------------------------------------------------------------------
> > Brick gluster2:/gluster/md4/workdata      49153     0          Y       1723
> > Brick gluster3:/gluster/md4/workdata      49153     0          Y       2068
> > Self-heal Daemon on localhost             N/A       N/A        Y       1732
> > Self-heal Daemon on gluster3              N/A       N/A        Y       2077
> >
> > vs.
> >
> > gluster volume heal workdata statistics heal-count
> > Gathering count of entries to be healed on volume workdata has been successful
> >
> > Brick gluster1:/gluster/md4/workdata
> > Number of entries: 0
> >
> > Brick gluster2:/gluster/md4/workdata
> > Number of entries: 10745
> >
> > Brick gluster3:/gluster/md4/workdata
> > Number of entries: 10744
> >
> > On Tue, Mar 5, 2019 at 8:18 AM Hu Bert <revirii at googlemail.com> wrote:
> > >
> > > Hi Milind,
> > >
> > > well, there are such entries, but those haven't been a problem during
> > > install and the last kernel update+reboot. The entries look like:
> > >
> > > PUBLIC_IP     gluster2.alpserver.de gluster2
> > >
> > > 192.168.0.50  gluster1
> > > 192.168.0.51  gluster2
> > > 192.168.0.52  gluster3
> > >
> > > 'ping gluster2' resolves to the LAN IP; I removed the last entry in the
> > > 1st line, did a reboot ... no, that didn't help. From
> > > /var/log/glusterfs/glusterd.log on gluster2:
> > >
> > > [2019-03-05 07:04:36.188128] E [MSGID: 106010]
> > > [glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management:
> > > Version of Cksums persistent differ. local cksum = 3950307018, remote
> > > cksum = 455409345 on peer gluster1
> > > [2019-03-05 07:04:36.188314] I [MSGID: 106493]
> > > [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd:
> > > Responded to gluster1 (0), ret: 0, op_ret: -1
> > >
> > > Interestingly there are no entries in the brick logs of the rejected
> > > server. Well, not surprising, as no brick process is running. The
> > > server gluster1 is still in rejected state.
> > >
> > > 'gluster volume start workdata force' starts the brick process on
> > > gluster1, and some heals are happening on gluster2+3, but according to
> > > 'gluster volume status workdata' the volumes still aren't complete.
> > >
> > > gluster1:
> > > ------------------------------------------------------------------------------
> > > Brick gluster1:/gluster/md4/workdata      49152     0          Y       2523
> > > Self-heal Daemon on localhost             N/A       N/A        Y       2549
> > >
> > > gluster2:
> > > Gluster process                           TCP Port  RDMA Port  Online  Pid
> > > ------------------------------------------------------------------------------
> > > Brick gluster2:/gluster/md4/workdata      49153     0          Y       1723
> > > Brick gluster3:/gluster/md4/workdata      49153     0          Y       2068
> > > Self-heal Daemon on localhost             N/A       N/A        Y       1732
> > > Self-heal Daemon on gluster3              N/A       N/A        Y       2077
> > >
> > >
> > > Hubert
> > >
> > > On Tue, Mar 5, 2019 at 7:58 AM Milind Changire <mchangir at redhat.com> wrote:
> > > >
> > > > There are probably DNS entries or /etc/hosts entries with the public IP addresses that the host names (gluster1, gluster2, gluster3) are getting resolved to.
> > > > /etc/resolv.conf would tell you the default search domain for the node names and the DNS servers that answer the queries.
> > > >
> > > >
> > > > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert <revirii at googlemail.com> wrote:
> > > >>
> > > >> Good morning,
> > > >>
> > > >> I have a replica 3 setup with 2 volumes, running on version 5.3 on
> > > >> Debian stretch. This morning I upgraded one server to version 5.4 and
> > > >> rebooted the machine; after the restart I noticed that:
> > > >>
> > > >> - no brick process is running
> > > >> - gluster volume status only shows the server itself:
> > > >> gluster volume status workdata
> > > >> Status of volume: workdata
> > > >> Gluster process                           TCP Port  RDMA Port  Online  Pid
> > > >> ------------------------------------------------------------------------------
> > > >> Brick gluster1:/gluster/md4/workdata      N/A       N/A        N       N/A
> > > >> NFS Server on localhost                   N/A       N/A        N       N/A
> > > >>
> > > >> - gluster peer status on the upgraded server:
> > > >> gluster peer status
> > > >> Number of Peers: 2
> > > >>
> > > >> Hostname: gluster3
> > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
> > > >> State: Peer Rejected (Connected)
> > > >>
> > > >> Hostname: gluster2
> > > >> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27
> > > >> State: Peer Rejected (Connected)
> > > >>
> > > >> - gluster peer status on the other 2 servers:
> > > >> gluster peer status
> > > >> Number of Peers: 2
> > > >>
> > > >> Hostname: gluster1
> > > >> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef
> > > >> State: Peer Rejected (Connected)
> > > >>
> > > >> Hostname: gluster3
> > > >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
> > > >> State: Peer in Cluster (Connected)
> > > >>
> > > >> I noticed that, in the brick logs, the public IP is used
> > > >> instead of the LAN IP. Brick logs from one of the volumes:
> > > >>
> > > >> rejected node: https://pastebin.com/qkpj10Sd
> > > >> connected nodes: https://pastebin.com/8SxVVYFV
> > > >>
> > > >> Why is the public IP suddenly used instead of the LAN IP? Killing all
> > > >> gluster processes and rebooting (again) didn't help.
> > > >>
> > > >>
> > > >> Thx,
> > > >> Hubert
> > > >> _______________________________________________
> > > >> Gluster-users mailing list
> > > >> Gluster-users at gluster.org
> > > >> https://lists.gluster.org/mailman/listinfo/gluster-users
> > > >
> > > >
> > > >
> > > > --
> > > > Milind
> > >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
> --
> Regards,
> Hari Gowtham.
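
To make the two suspicions above concrete (hostname resolution, and the per-volume
checksum mismatch behind the "Peer Rejected" state), a minimal check on each node
could look like the sketch below. It is only a sketch: it assumes the default
state directory /var/lib/glusterd and reuses the hostnames and the volume name
workdata from the thread; the second volume would have its own cksum file.

    # Which address does each peer hostname resolve to on this node?
    for h in gluster1 gluster2 gluster3; do
        getent hosts "$h"
    done

    # glusterd compares a per-volume checksum between peers; if it differs,
    # the peer gets rejected (see MSGID 106010 in the log above).
    # Compare this file across all three nodes.
    cat /var/lib/glusterd/vols/workdata/cksum
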
Hari Gowtham
2019-Mar-05 08:26 UTC
[Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP
There are plans to revert the patch causing this error and to rebuild 5.4;
that should happen soon, and the rebuilt 5.4 should be free of this upgrade
issue. In the meantime, you can keep using 5.3 for this cluster.
Downgrading to 5.3 will work if only one node was upgraded to 5.4 and the
other nodes are still on 5.3.

On Tue, Mar 5, 2019 at 1:07 PM Hu Bert <revirii at googlemail.com> wrote:
>
> Hi Hari,
>
> thx for the hint. Do you know when this will be fixed? Is a downgrade
> 5.4 -> 5.3 a possibility to fix this?
>
> Hubert
>
> On Tue, Mar 5, 2019 at 8:32 AM Hari Gowtham <hgowtham at redhat.com> wrote:
> >
> > Hi,
> >
> > This is a known issue we are working on.
> > As the checksum differs between the updated and non-updated node, the
> > peers are getting rejected.
> > The bricks aren't coming up because of the same issue.
> >
> > More about the issue: https://bugzilla.redhat.com/show_bug.cgi?id=1685120
> >
> > [...]
> >
> > --
> > Regards,
> > Hari Gowtham.


--
Regards,
Hari Gowtham.
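
For the downgrade Hari suggests, the rough sequence on the single upgraded node
might look like the sketch below. This is a hedged sketch, not a procedure from
the thread: the package names are the usual Debian/gluster.org ones, the version
string 5.3-1 is a placeholder (check apt-cache policy for the real one), and the
service name can differ between packagings (glusterd vs. glusterfs-server).

    # See which 5.3 build the configured repository still offers.
    apt-cache policy glusterfs-server glusterfs-client glusterfs-common

    # Stop gluster, downgrade, and hold the packages until the rebuilt 5.4 lands.
    systemctl stop glusterd
    apt-get install --allow-downgrades glusterfs-server=5.3-1 glusterfs-client=5.3-1 glusterfs-common=5.3-1
    apt-mark hold glusterfs-server glusterfs-client glusterfs-common
    systemctl start glusterd

    # The peers should then leave the "Peer Rejected" state.
    gluster peer status
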