Artem Russakovskii
2021-Jul-23 04:54 UTC
[Gluster-users] Broken status, peer probe, "DNS resolution failed on host" and "Error disabling sockopt IPV6_V6ONLY: "Protocol not available" after updating from gluster 7.9 to 9.1
Hi Strahil,

I am using repo builds from
https://download.opensuse.org/repositories/filesystems/openSUSE_Leap_15.2/x86_64/
(currently glusterfs-9.1-lp152.88.2.x86_64.rpm) and don't build them.
Perhaps the builds at
https://download.opensuse.org/repositories/home:/glusterfs:/Leap15.2-9/openSUSE_Leap_15.2/x86_64/
(currently glusterfs-9.1-lp152.112.1.x86_64.rpm) are better, does anyone
know? None of the repos currently have 9.3.

And regardless, I don't care for gluster using IPv6 if IPv4 works fine. Is
there a way to make it stop trying to use IPv6 and only use IPv4?

Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | @ArtemR <http://twitter.com/ArtemR>

On Thu, Jul 22, 2021 at 9:09 PM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> Did you try with latest 9.X? Based on the release notes that should be
> 9.3.
>
> Best Regards,
> Strahil Nikolov
>
> On Fri, Jul 23, 2021 at 3:06, Artem Russakovskii
> <archon810 at gmail.com> wrote:
>
> Hi all,
>
> I just filed this ticket https://github.com/gluster/glusterfs/issues/2648,
> and wanted to bring it to your attention. Any feedback would be appreciated.
>
> Description of problem:
> We have a 4-node replicate cluster running gluster 7.9. I'm currently
> setting up a new cluster on a new set of machines and went straight for
> gluster 9.1.
>
> However, I was unable to probe any servers due to this error:
>
> [2021-07-17 00:31:05.228609 +0000] I [MSGID: 106487] [glusterd-handler.c:1160:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req nexus2 24007
> [2021-07-17 00:31:05.229727 +0000] E [MSGID: 101075] [common-utils.c:3657:gf_is_local_addr] 0-management: error in getaddrinfo [{ret=Name or service not known}]
> [2021-07-17 00:31:05.230785 +0000] E [MSGID: 106408] [glusterd-peer-utils.c:217:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: Name or service not known [Unknown error -2]
> [2021-07-17 00:31:05.353971 +0000] I [MSGID: 106128] [glusterd-handler.c:3719:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: nexus2 (24007)
> [2021-07-17 00:31:05.375871 +0000] W [MSGID: 106061] [glusterd-handler.c:3488:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
> [2021-07-17 00:31:05.375903 +0000] I [rpc-clnt.c:1010:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> [2021-07-17 00:31:05.377021 +0000] E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
> [2021-07-17 00:31:05.377043 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> [2021-07-17 00:31:05.377147 +0000] I [MSGID: 106498] [glusterd-handler.c:3648:glusterd_friend_add] 0-management: connect returned 0
> [2021-07-17 00:31:05.377201 +0000] I [MSGID: 106004] [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer <nexus2> (<00000000-0000-0000-0000-000000000000>), in state <Establishing Connection>, has disconnected from glusterd.
> [2021-07-17 00:31:05.377453 +0000] E [MSGID: 101032] [store.c:464:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
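> For context, family=10 in the getaddrinfo errors above is AF_INET6, i.e.
> glusterd is asking the resolver for IPv6 records. A quick way to compare
> what each address family resolves for a peer (just a sketch -- nexus2 is
> one of my hostnames, substitute your own):
>
>   # IPv4 records, as glibc's resolver sees them
>   getent ahostsv4 nexus2
>   # IPv6 records; an empty result here would line up with the
>   # gf_resolve_ip6 failures above
>   getent ahostsv6 nexus2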
> I then wiped the /var/lib/glusterd dir to start clean and downgraded to
> 7.9, then attempted to peer probe again. This time, it worked fine, proving
> 7.9 is working, same as it is on prod.
>
> At this point, I made a volume, started it, and played around with testing
> to my satisfaction. Then I decided to see what would happen if I tried to
> upgrade this working volume from 7.9 to 9.1.
>
> The end result is:
>
> - gluster volume status is only showing the local gluster node and not
>   any of the remote nodes
> - data does seem to replicate, so the connection between the servers
>   is actually established
> - logs are now filled with constantly repeating messages like so:
>
> [2021-07-22 23:29:31.039004 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> [2021-07-22 23:29:31.039212 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
> [2021-07-22 23:29:31.039304 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
> The message "E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]" repeated 119 times between [2021-07-22 23:27:34.025983 +0000] and [2021-07-22 23:29:31.039302 +0000]
> [2021-07-22 23:29:34.039369 +0000] E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
> [2021-07-22 23:29:34.039441 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> [2021-07-22 23:29:34.039558 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
> [2021-07-22 23:29:34.039659 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
> [2021-07-22 23:29:37.039741 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> [2021-07-22 23:29:37.039921 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
> [2021-07-22 23:29:37.040015 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
>
> When I issue a command in cli:
>
> ==> cli.log <==
> [2021-07-22 23:38:11.802596 +0000] I [cli.c:840:main] 0-cli: Started running gluster with version 9.1
> [2021-07-22 23:38:11.804007 +0000] W [socket.c:3434:socket_connect] 0-glusterfs: Error disabling sockopt IPV6_V6ONLY: "Operation not supported"
> [2021-07-22 23:38:11.906865 +0000] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}]
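> On the IPV6_V6ONLY warning: that socket option only exists on AF_INET6
> sockets, so an error disabling it hints that the IPv6 side of the stack
> isn't usable on the host. Two quick sanity checks (standard Linux tooling,
> nothing gluster-specific; just a sketch):
>
>   # returns 1 if IPv6 is disabled kernel-wide
>   sysctl net.ipv6.conf.all.disable_ipv6
>   # should show at least ::1 on lo if IPv6 is actually usable
>   ip -6 addr show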
> Mandatory info:
> - The output of the `gluster volume info` command:
>
> gluster volume info
>
> Volume Name: ap
> Type: Replicate
> Volume ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: nexus2:/mnt/nexus2_block1/ap
> Brick2: forge:/mnt/forge_block1/ap
> Brick3: hive:/mnt/hive_block1/ap
> Brick4: citadel:/mnt/citadel_block1/ap
> Options Reconfigured:
> performance.client-io-threads: on
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> cluster.self-heal-daemon: enable
> client.event-threads: 4
> cluster.data-self-heal-algorithm: full
> cluster.lookup-optimize: on
> cluster.quorum-count: 1
> cluster.quorum-type: fixed
> cluster.readdir-optimize: on
> cluster.heal-timeout: 1800
> disperse.eager-lock: on
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> network.inode-lru-limit: 500000
> network.ping-timeout: 7
> network.remote-dio: enable
> performance.cache-invalidation: on
> performance.cache-size: 1GB
> performance.io-thread-count: 4
> performance.md-cache-timeout: 600
> performance.rda-cache-limit: 256MB
> performance.read-ahead: off
> performance.readdir-ahead: on
> performance.stat-prefetch: on
> performance.write-behind-window-size: 32MB
> server.event-threads: 4
> cluster.background-self-heal-count: 1
> performance.cache-refresh-timeout: 10
> features.ctime: off
> cluster.granular-entry-heal: enable
>
> - The output of the gluster volume status command:
>
> gluster volume status
> Status of volume: ap
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick forge:/mnt/forge_block1/ap            49152     0          Y       2622
> Self-heal Daemon on localhost               N/A       N/A        N       N/A
>
> Task Status of Volume ap
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> - The output of the gluster volume heal command:
>
> gluster volume heal ap enable
> Enable heal on volume ap has been successful
>
> gluster volume heal ap
> Launching heal operation to perform index self heal on volume ap has been unsuccessful:
> Self-heal daemon is not running. Check self-heal daemon log file.
>
> - The operating system / glusterfs version:
> OpenSUSE 15.2, glusterfs 9.1.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net | @ArtemR <http://twitter.com/ArtemR>
>
> ________
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
Strahil Nikolov
2021-Jul-24 02:08 UTC
[Gluster-users] Broken status, peer probe, "DNS resolution failed on host" and "Error disabling sockopt IPV6_V6ONLY: "Protocol not available" after updating from gluster 7.9 to 9.1
Can you try setting "transport.address-family: inet" in /etc/glusterfs/glusterd.vol on all nodes?
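For reference, the option lives inside the "volume management" block of
that file and uses volfile syntax rather than the colon form. A trimmed
sketch of what it could look like -- this is based on the default volfile
shipped with gluster, so compare against your own copy rather than pasting
it verbatim:

  volume management
      type mgmt/glusterd
      option working-directory /var/lib/glusterd
      option transport-type socket
      # force IPv4-only transport; without it glusterd may also try
      # AF_INET6 lookups
      option transport.address-family inet
  end-volume

After editing, glusterd has to be restarted on each node (systemctl
restart glusterd, or however the service is named on your distro) for the
change to take effect.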
About the rpms, if they are not yet built, the only other option is to build them from source.

I assume that the second try is on a fresh set of systems, without any remnants of the old Gluster install.

Best Regards,
Strahil Nikolov

On Friday, 23 July 2021, 07:55:01 GMT+3, Artem Russakovskii <archon810 at gmail.com> wrote:

> Hi Strahil,
>
> I am using repo builds from
> https://download.opensuse.org/repositories/filesystems/openSUSE_Leap_15.2/x86_64/
> (currently glusterfs-9.1-lp152.88.2.x86_64.rpm) and don't build them.
> Perhaps the builds at
> https://download.opensuse.org/repositories/home:/glusterfs:/Leap15.2-9/openSUSE_Leap_15.2/x86_64/
> (currently glusterfs-9.1-lp152.112.1.x86_64.rpm) are better, does anyone
> know? None of the repos currently have 9.3.
>
> And regardless, I don't care for gluster using IPv6 if IPv4 works fine. Is
> there a way to make it stop trying to use IPv6 and only use IPv4?
>
> Sincerely,
> Artem