Strahil Nikolov
2021-Sep-21 14:50 UTC
[Gluster-users] gluster forcing IPV6 on our IPV4 servers, glusterd fails (was gluster update question regarding new DNS resolution requirement)
I would disable IPv6 if it is not needed. Don't forget to run the
geo-replication fix script if you missed running it before the upgrade.

Best Regards,
Strahil Nikolov

On Tue, Sep 21, 2021 at 0:46, Erik Jacobson <erik.jacobson at hpe.com> wrote:

I pretended I was a low-level C programmer with network and filesystem
experience for a few hours.

I'm not sure what the right solution is, but what was happening was that the
code was trying to treat our IPv4 hosts as AF_INET6, and that family is
incompatible with our IPv4 addresses. Yes, we need to move to IPv6, but we're
hoping to do that on our own time (~50 years, like everybody else :)

I found a chunk of the code that seemed to be force-setting us to AF_INET6.

While I'm sure it is not 100% the correct patch, the patch attached and pasted
below is working for me, so I'll integrate it with our internal build to
continue testing.

Please let me know if there is a configuration item I missed or a different
way to do this. I added -devel to this email.

In the previous thread, you would have seen that we're testing a hopeful
change that will upgrade our deployed customers from gluster 7.9 to
gluster 9.3.

Thank you!! Advice on next steps would be appreciated!!


diff -Narup glusterfs-9.3-ORIG/rpc/rpc-transport/socket/src/name.c glusterfs-9.3-NEW/rpc/rpc-transport/socket/src/name.c
--- glusterfs-9.3-ORIG/rpc/rpc-transport/socket/src/name.c     2021-06-29 00:27:44.381408294 -0500
+++ glusterfs-9.3-NEW/rpc/rpc-transport/socket/src/name.c      2021-09-20 16:34:28.969425361 -0500
@@ -252,9 +252,16 @@ af_inet_client_get_remote_sockaddr(rpc_t
     /* Need to update transport-address family if address-family is not provided
        to command-line arguments
     */
+    /* HPE: This is forcing our IPv4 servers into an IPv6 address
+     * family that is not compatible with IPv4. For now we will just set it
+     * to AF_INET.
+     */
+    /*
     if (inet_pton(AF_INET6, remote_host, &serveraddr)) {
         sockaddr->sa_family = AF_INET6;
     }
+    */
+    sockaddr->sa_family = AF_INET;

     /* TODO: gf_resolve is a blocking call. kick in some
        non blocking dns techniques */


On Mon, Sep 20, 2021 at 11:35:35AM -0500, Erik Jacobson wrote:
> I missed the other important log snip:
>
> The message "E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6]
> 0-resolver: error in getaddrinfo [{family=10}, {ret=Address family for
> hostname not supported}]" repeated 620 times between
> [2021-09-20 15:49:23.720633 +0000] and [2021-09-20 15:50:41.731542 +0000]
>
> So I will dig into the code some here.
>
>
> On Mon, Sep 20, 2021 at 10:59:30AM -0500, Erik Jacobson wrote:
> > Hello all! I hope you are well.
> >
> > We are starting a new software release cycle, and I am trying to find a
> > way to upgrade customers from our build of gluster 7.9 to our build of
> > gluster 9.3.
> >
> > When we deploy gluster, we forcibly remove all references to any host
> > names and use only IP addresses. This is because, if for any reason a
> > DNS server is unreachable, even if the peer files have IPs and DNS, it
> > causes glusterd to be unable to reach peers properly. We can't really
> > rely on /etc/hosts either, because customers take artistic license with
> > their /etc/hosts files and don't realize the problems that can cause.
> >
> > So our deployed peer files look something like this:
> >
> > uuid=46a4b506-029d-4750-acfb-894501a88977
> > state=3
> > hostname1=172.23.0.16
> >
> > That is, with full intention, we avoid host names.
> >
> > When we upgrade to gluster 9.3, we fall over with these errors, and
> > gluster is now partitioned: the updated gluster servers can't reach
> > anybody:
> >
> > [2021-09-20 15:50:41.731543 +0000] E
> > [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS
> > resolution failed on host 172.23.0.16
> >
> > As you can see, we have defined everything using IPs on purpose, but in
> > 9.3 it appears this method fails. Are there any suggestions short of
> > putting real host names in peer files?
> >
> >
> > FYI
> >
> > This supercomputer will be using gluster for part of its system
> > management. It is how we deploy the Image Objects (squashfs images)
> > hosted on NFS today and served by gluster leader nodes, and it is also
> > where we store system logs, console logs, and other data.
> >
> > https://www.olcf.ornl.gov/frontier/
> >
> >
> > Erik
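For anyone who wants to experiment further with the hunk quoted above, here is
a minimal standalone sketch (not gluster code; the helper name, headers, and
the IPv4 fallback are assumptions) of deriving the family from the host string
itself with inet_pton(), instead of hard-coding AF_INET:

/* Standalone sketch, not part of the gluster patch: decide the address
 * family from the host string instead of forcing AF_INET6 or AF_INET.
 * Helper name and the IPv4 fallback for non-numeric names are assumptions,
 * not gluster API. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

static sa_family_t guess_family(const char *remote_host)
{
    struct in_addr v4;
    struct in6_addr v6;

    if (inet_pton(AF_INET, remote_host, &v4))
        return AF_INET;   /* e.g. "172.23.0.16" from the peer file above */
    if (inet_pton(AF_INET6, remote_host, &v6))
        return AF_INET6;  /* a numeric IPv6 literal */

    /* Not a numeric literal: fall back to IPv4, matching the intent of the
     * patch; a real host name would still need DNS (gf_resolve) to decide. */
    return AF_INET;
}

Setting sockaddr->sa_family from something like guess_family(remote_host)
would keep IPv4-only peer files working while still accepting IPv6 literals;
whether the transport.address-family option in glusterd.vol achieves the same
result without a patch has not been tested here.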
Erik Jacobson
2021-Sep-21 14:59 UTC
[Gluster-users] gluster forcing IPV6 on our IPV4 servers, glusterd fails (was gluster update question regarding new DNS resolution requirement)
> Don't forget to run the geo-replication fix script if you missed running
> it before the upgrade.

We don't use geo-replication YET, but thank you for this thoughtful reminder.

Just a note on things like this: we really try to do everything in a package
update, because that's how we'd have to deploy to customers in an automated
way. So having to run a script as part of the upgrade would be very hard in a
package-based workflow for a packaged solution. I'm not complaining, I love
gluster; this is just food for thought. I can hardly even say it with a
straight face, because we suffer from similar issues on the cluster management
side: updating from one CM release to the next is harder than it should be,
so I'm certainly not judging. Updating is always painful.

I LOVE that slowly updating our gluster servers is "just working". This will
allow a supercomputer to slowly update its infrastructure while taking no
compute nodes (using NFS-hosted squashfs images for root) down. It's really
remarkable, since it's a big jump from 7.9 to 9.3. I am impressed by this
part. It's a huge relief that I didn't have to do an intermediate jump to
gluster 8 in the middle, as that would have been nearly impossible for us to
get right.

Thank you all!!

PS: Frontier will have 21 leader nodes running gluster servers,
distributed/replicate in groups of 3, hosting NFS-exported squashfs image
objects for compute node root filesystems. Many thousands of nodes.
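As a side note, the getaddrinfo error quoted earlier in the thread (family=10,
"Address family for hostname not supported") can be reproduced outside of
gluster. This small sketch, with the host string taken from the log and
everything else assumed, typically prints the same message on glibc:

/* Standalone reproducer sketch for the log message quoted earlier: ask
 * getaddrinfo() for an AF_INET6 (family=10) result on a plain IPv4 literal.
 * Not gluster code; the host string comes from the log, the rest is assumed. */
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

int main(void)
{
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET6;      /* family=10 on Linux */
    hints.ai_socktype = SOCK_STREAM;

    int ret = getaddrinfo("172.23.0.16", NULL, &hints, &res);
    if (ret != 0) {
        /* On glibc this typically reports
         * "Address family for hostname not supported". */
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(ret));
        return 1;
    }

    freeaddrinfo(res);
    return 0;
}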