Hu Bert
2019-Mar-05 07:23 UTC
[Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP
Interestingly: gluster volume status misses gluster1, while heal
statistics show gluster1:

gluster volume status workdata
Status of volume: workdata
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster2:/gluster/md4/workdata        49153     0          Y       1723
Brick gluster3:/gluster/md4/workdata        49153     0          Y       2068
Self-heal Daemon on localhost               N/A       N/A        Y       1732
Self-heal Daemon on gluster3                N/A       N/A        Y       2077

vs.

gluster volume heal workdata statistics heal-count
Gathering count of entries to be healed on volume workdata has been successful

Brick gluster1:/gluster/md4/workdata
Number of entries: 0

Brick gluster2:/gluster/md4/workdata
Number of entries: 10745

Brick gluster3:/gluster/md4/workdata
Number of entries: 10744

On Tue, Mar 5, 2019 at 08:18, Hu Bert <revirii at googlemail.com> wrote:
>
> Hi Milind,
>
> Well, there are such entries, but those haven't been a problem during
> the install and the last kernel update+reboot. The entries look like:
>
> PUBLIC_IP gluster2.alpserver.de gluster2
>
> 192.168.0.50 gluster1
> 192.168.0.51 gluster2
> 192.168.0.52 gluster3
>
> 'ping gluster2' resolves to the LAN IP. I removed the last entry in the
> first line and rebooted ... no, that didn't help. From
> /var/log/glusterfs/glusterd.log on gluster2:
>
> [2019-03-05 07:04:36.188128] E [MSGID: 106010]
> [glusterd-utils.c:3483:glusterd_compare_friend_volume] 0-management:
> Version of Cksums persistent differ. local cksum = 3950307018, remote
> cksum = 455409345 on peer gluster1
> [2019-03-05 07:04:36.188314] I [MSGID: 106493]
> [glusterd-handler.c:3843:glusterd_xfer_friend_add_resp] 0-glusterd:
> Responded to gluster1 (0), ret: 0, op_ret: -1
>
> Interestingly, there are no entries in the brick logs of the rejected
> server. That is not surprising, as no brick process is running there.
> The server gluster1 is still in the rejected state.
>
> 'gluster volume start workdata force' starts the brick process on
> gluster1, and some heals are happening on gluster2+3, but according to
> 'gluster volume status workdata' the volumes still aren't complete.
>
> gluster1:
> ------------------------------------------------------------------------------
> Brick gluster1:/gluster/md4/workdata        49152     0          Y       2523
> Self-heal Daemon on localhost               N/A       N/A        Y       2549
>
> gluster2:
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick gluster2:/gluster/md4/workdata        49153     0          Y       1723
> Brick gluster3:/gluster/md4/workdata        49153     0          Y       2068
> Self-heal Daemon on localhost               N/A       N/A        Y       1732
> Self-heal Daemon on gluster3                N/A       N/A        Y       2077
>
>
> Hubert
>
> On Tue, Mar 5, 2019 at 07:58, Milind Changire <mchangir at redhat.com> wrote:
> >
> > There are probably DNS entries or /etc/hosts entries with the public
> > IP addresses that the host names (gluster1, gluster2, gluster3) are
> > getting resolved to.
> > /etc/resolv.conf would tell which default domain is searched for the
> > node names, and which DNS servers respond to the queries.
> >
> >
> > On Tue, Mar 5, 2019 at 12:14 PM Hu Bert <revirii at googlemail.com> wrote:
> >>
> >> Good morning,
> >>
> >> I have a replica 3 setup with 2 volumes, running on version 5.3 on
> >> Debian stretch. This morning I upgraded one server to version 5.4 and
> >> rebooted the machine; after the restart I noticed that:
> >>
> >> - no brick process is running
> >> - gluster volume status only shows the server itself:
> >>
> >> gluster volume status workdata
> >> Status of volume: workdata
> >> Gluster process                             TCP Port  RDMA Port  Online  Pid
> >> ------------------------------------------------------------------------------
> >> Brick gluster1:/gluster/md4/workdata        N/A       N/A        N       N/A
> >> NFS Server on localhost                     N/A       N/A        N       N/A
> >>
> >> - gluster peer status on the upgraded server:
> >>
> >> gluster peer status
> >> Number of Peers: 2
> >>
> >> Hostname: gluster3
> >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
> >> State: Peer Rejected (Connected)
> >>
> >> Hostname: gluster2
> >> Uuid: 162fea82-406a-4f51-81a3-e90235d8da27
> >> State: Peer Rejected (Connected)
> >>
> >> - gluster peer status on the other 2 servers:
> >>
> >> gluster peer status
> >> Number of Peers: 2
> >>
> >> Hostname: gluster1
> >> Uuid: 9a360776-7b58-49ae-831e-a0ce4e4afbef
> >> State: Peer Rejected (Connected)
> >>
> >> Hostname: gluster3
> >> Uuid: c7b4a448-ca6a-4051-877f-788f9ee9bc4a
> >> State: Peer in Cluster (Connected)
> >>
> >> In the brick logs I see that the public IP is used instead of the
> >> LAN IP. Brick logs from one of the volumes:
> >>
> >> rejected node: https://pastebin.com/qkpj10Sd
> >> connected nodes: https://pastebin.com/8SxVVYFV
> >>
> >> Why is the public IP suddenly used instead of the LAN IP? Killing all
> >> gluster processes and rebooting (again) didn't help.
> >>
> >> Thx,
> >> Hubert
> >
> > --
> > Milind
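To check Milind's name-resolution theory, the mapping each node actually
uses can be verified directly. A minimal sketch, using the host names and
LAN addresses from the thread above (the .alpserver.de names for gluster1
and gluster3 are assumed by analogy with the gluster2 entry shown):

  # getent resolves through the system's nsswitch.conf order
  # (/etc/hosts before DNS on a stock Debian), which is what
  # glusterd effectively sees as well.
  getent hosts gluster1 gluster2 gluster3

  # Expected output for this setup would be the LAN addresses:
  #   192.168.0.50    gluster1
  #   192.168.0.51    gluster2
  #   192.168.0.52    gluster3
  # A public IP appearing here would confirm the theory.

  # Query DNS directly, bypassing /etc/hosts, to see what the zone serves:
  dig +short gluster1.alpserver.de gluster2.alpserver.de gluster3.alpserver.de

Running this on every node would show whether one of them resolves a peer
name to the public address while the others use the LAN.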
Hari Gowtham
2019-Mar-05 07:32 UTC
[Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP
Hi,

This is a known issue we are working on. Because the checksum differs
between the updated and the non-updated nodes, the peers are getting
rejected. The bricks aren't coming up because of the same issue.

More about the issue: https://bugzilla.redhat.com/show_bug.cgi?id=1685120

On Tue, Mar 5, 2019 at 12:56 PM Hu Bert <revirii at googlemail.com> wrote:
>
> Interestingly: gluster volume status misses gluster1, while heal
> statistics show gluster1:
> [...]

--
Regards,
Hari Gowtham.
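For readers who land on this thread with a peer stuck in the Rejected
state: once all peers run the same (fixed) version again, the standard
recovery procedure from the Gluster documentation usually clears it. The
sketch below is not a fix confirmed in this thread; it assumes a
systemd-managed glusterd and the default /var/lib/glusterd state
directory, and with the cksum bug above it only helps after the versions
match across the cluster.

  # Run on the REJECTED node only. This wipes the local volume metadata,
  # which glusterd then re-syncs from the healthy peers.
  systemctl stop glusterd
  cp -a /var/lib/glusterd /var/lib/glusterd.bak   # keep a backup

  # Remove everything except glusterd.info (the node's own UUID).
  cd /var/lib/glusterd
  find . -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +

  # Restart, re-probe a healthy peer to fetch the configuration,
  # then restart once more and verify.
  systemctl start glusterd
  gluster peer probe gluster2
  systemctl restart glusterd
  gluster peer status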