Strahil Nikolov
2021-Sep-21 14:44 UTC
[Gluster-users] gluster update question regarding new DNS resolution requirement
As gf_resolve_ip6 fails, I guess you can disable IPv6 on the host (if you are not using the protocol) and check whether that works around the problem until it's solved. For RH you can check https://access.redhat.com/solutions/8709 (use an RH dev subscription to read it, or ping me directly and I will try to summarize it for your OS version).

Best Regards,
Strahil Nikolov

On Mon, Sep 20, 2021 at 19:35, Erik Jacobson <erik.jacobson at hpe.com> wrote:
> I missed the other important log snip:
>
> The message "E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Address family for hostname not supported}]" repeated 620 times between [2021-09-20 15:49:23.720633 +0000] and [2021-09-20 15:50:41.731542 +0000]
>
> So I will dig into the code here.
>
> On Mon, Sep 20, 2021 at 10:59:30AM -0500, Erik Jacobson wrote:
> > Hello all! I hope you are well.
> >
> > We are starting a new software release cycle and I am trying to find a way to upgrade customers from our build of gluster 7.9 to our build of gluster 9.3.
> >
> > When we deploy gluster, we forcibly remove all references to any host names and use only IP addresses. This is because if a DNS server is unreachable for any reason, even when the peer files contain both IPs and DNS names, glusterd is unable to reach its peers properly. We can't really rely on /etc/hosts either, because customers take artistic license with their /etc/hosts files and don't realize the problems that can cause.
> >
> > So our deployed peer files look something like this:
> >
> > uuid=46a4b506-029d-4750-acfb-894501a88977
> > state=3
> > hostname1=172.23.0.16
> >
> > That is, with full intention, we avoid host names.
> >
> > When we upgrade to gluster 9.3, we fall over with these errors; gluster is now partitioned and the updated gluster servers can't reach anybody:
> >
> > [2021-09-20 15:50:41.731543 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host 172.23.0.16
> >
> > As you can see, we have deliberately defined everything using IPs, but in 9.3 this method appears to fail. Are there any suggestions short of putting real host names in peer files?
> >
> > FYI
> >
> > This supercomputer will be using gluster for part of its system management. It is how we deploy the Image Objects (squashfs images), hosted on NFS today and served by gluster leader nodes, and how we also store system logs, console logs, and other data.
> >
> > https://www.olcf.ornl.gov/frontier/
> >
> > Erik
> > ________
> >
> > Community Meeting Calendar:
> >
> > Schedule -
> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > Bridge: https://meet.google.com/cpu-eiue-hvk
> > Gluster-users mailing list
> > Gluster-users at gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
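The two log lines quoted above appear to fit together: the resolver was called with `family=10`, which is `AF_INET6` on Linux, and the peer entry holds an IPv4 literal. The sketch below reproduces only the C library side of that in Python (whose `socket.getaddrinfo` wraps getaddrinfo(3)); it is not gluster code, and the helper names `parse_peer_file` and `try_resolve` are hypothetical. It parses a peer file in the format Erik showed, then resolves `hostname1` under each address family. `AI_NUMERICHOST` is an assumption added to keep the demo offline, so no DNS query is ever sent.

```python
import socket

# Peer file contents in the format quoted above (values taken from the thread).
PEER_FILE = """\
uuid=46a4b506-029d-4750-acfb-894501a88977
state=3
hostname1=172.23.0.16
"""


def parse_peer_file(text):
    """Hypothetical helper: parse key=value lines into a dict, one pair per line."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        entries[key] = value
    return entries


def try_resolve(host, family):
    """Hypothetical helper: return (addresses, None) or (None, error text).

    AI_NUMERICHOST keeps the demo offline: getaddrinfo only parses literals.
    """
    try:
        infos = socket.getaddrinfo(host, None, family, 0, 0, socket.AI_NUMERICHOST)
    except socket.gaierror as exc:
        return None, str(exc)
    return sorted({info[4][0] for info in infos}), None


peer = parse_peer_file(PEER_FILE)
host = peer["hostname1"]
for name, family in [("AF_INET6", socket.AF_INET6),
                     ("AF_INET", socket.AF_INET),
                     ("AF_UNSPEC", socket.AF_UNSPEC)]:
    addrs, err = try_resolve(host, family)
    print(f"{name:9} -> {addrs if addrs else 'error: ' + err}")
```

On a glibc system the `AF_INET6` line reports "Address family for hostname not supported", the same text as the MSGID 101075 message, while `AF_INET` and `AF_UNSPEC` both return 172.23.0.16. glibc only maps an IPv4 literal into an IPv6 result when the caller also sets `AI_V4MAPPED`.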
Erik Jacobson
2021-Sep-21 14:51 UTC
[Gluster-users] gluster update question regarding new DNS resolution requirement
There is a discussion in -devel as well. I came at this just thinking "an update should work" and did take a quick look at the release notes for 9.0 and 9.3. Come to think of it, I didn't read the Gluster 8 release notes, so maybe that's why I missed this. We were at 7.9 and I read 9.0 and 9.3.

We can't really disable IPv6 100% here. Well, we could today, but we'd have to open it again in a couple of months. Our main head node already needs to talk to some IPv6-only stuff while also talking to IPv4 stuff. These leaders (gluster servers) will need to speak IPv6 very soon, at least minimally: some controllers that these 'leader' nodes need to talk to are starting to appear, and they are IPv6-only.

It sounds like what you wrote is true, though: if there is any IPv6 around, that function thinks IPv6 is what you want. A couple of private replies (thank you!!) also mentioned this. Later on we may have to make a more formal version of the patch rather than just force-setting IPv4 (for our internal use).

Basically, I am in the "once a year" window where I can update gluster and get complete testing to be sure we don't have regressions, so we'll keep moving forward with 9.3 with the IPv4 hack in place for now.

This helps me get the context, thank you for this note!!

Erik
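Erik mentions force-setting IPv4 as an internal hack and possibly a more formal patch later. One family-neutral policy such a patch could follow is to derive the family from the host string itself instead of assuming IPv6 whenever IPv6 is present. The sketch below illustrates that idea in Python under stated assumptions: it is not gluster's code and not the patch discussed on -devel, `family_for_host` and `resolve_peer` are hypothetical names, `leader1.example` is a made-up host name, and 24007 is glusterd's default management port.

```python
import ipaddress
import socket

GLUSTERD_PORT = 24007  # glusterd's default management port


def family_for_host(host):
    """Hypothetical policy: choose the address family from the host string.

    IPv4 literal -> AF_INET, IPv6 literal -> AF_INET6, anything else (a name)
    -> AF_UNSPEC so the resolver may return both A and AAAA results.
    """
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return socket.AF_UNSPEC
    return socket.AF_INET if addr.version == 4 else socket.AF_INET6


def resolve_peer(host, port=GLUSTERD_PORT):
    """Hypothetical helper: resolve a peer entry without forcing one family."""
    family = family_for_host(host)
    # Literals never need DNS, so tell getaddrinfo not to try it for them.
    flags = 0 if family == socket.AF_UNSPEC else socket.AI_NUMERICHOST
    infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM, 0, flags)
    return [(info[0], info[4][0]) for info in infos]


for peer_host in ["172.23.0.16", "fd00::16"]:
    print(peer_host, resolve_peer(peer_host))
print("leader1.example ->", family_for_host("leader1.example"))
```

With this policy, IP-only peer files like the ones above never touch DNS, while host names still get both IPv4 and IPv6 candidates, which suits a site that has to speak to IPv6-only controllers as well.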
Strahil Nikolov
2021-Sep-21 15:00 UTC
[Gluster-users] gluster update question regarding new DNS resolution requirement
Another option that comes to my mind is to use dnsmasq locally (with /etc/resolv.conf pointing to it) as a caching layer, so that you will be able to survive a DNS issue. This is how we run our whole infra, as we rely solely on FQDNs. Of course it has its own drawbacks, so it should be considered carefully.

P.S. If you decide to go that way, don't forget to put 127.0.0.1 as the first resolver and the "upstream" DNS servers in the second and third positions. This prevents DNS issues when restarting dnsmasq.

Best Regards,
Strahil Nikolov
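Strahil's P.S. describes an ordering rule for /etc/resolv.conf: the local dnsmasq first, upstream servers after it. A small checker for that layout could look like the Python sketch below. The function names are hypothetical, the upstream addresses in the sample are placeholders from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24, and the three-server limit reflects glibc's resolver (`MAXNS`), which ignores further `nameserver` lines.

```python
MAXNS = 3  # glibc's resolver uses at most three "nameserver" lines


def nameservers(resolv_conf_text):
    """Hypothetical helper: return nameserver addresses in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop '#' and ';' comments before looking at the fields.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def check_order(resolv_conf_text, local="127.0.0.1"):
    """Hypothetical check of the layout suggested in the thread.

    Returns a list of problems; an empty list means the layout matches.
    """
    servers = nameservers(resolv_conf_text)
    problems = []
    if not servers or servers[0] != local:
        problems.append(f"first nameserver should be {local} (local dnsmasq)")
    if len(servers) < 2:
        problems.append("no upstream nameserver to fall back on while dnsmasq restarts")
    if len(servers) > MAXNS:
        problems.append(f"only the first {MAXNS} nameserver lines are used")
    return problems


SAMPLE = """\
# 192.0.2.53 and 198.51.100.53 are placeholder upstream resolvers
nameserver 127.0.0.1
nameserver 192.0.2.53
nameserver 198.51.100.53
"""

print(check_order(SAMPLE) or "ok")
```

The checker only inspects the file; it does not verify that dnsmasq is actually listening on 127.0.0.1 or that the upstream servers answer.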