Artem Russakovskii
2021-Jul-23 04:54 UTC
[Gluster-users] Broken status, peer probe, "DNS resolution failed on host" and "Error disabling sockopt IPV6_V6ONLY: "Protocol not available" after updating from gluster 7.9 to 9.1
Hi Strahil,

I am using repo builds from
https://download.opensuse.org/repositories/filesystems/openSUSE_Leap_15.2/x86_64/
(currently glusterfs-9.1-lp152.88.2.x86_64.rpm) and don't build them.
Perhaps the builds at
https://download.opensuse.org/repositories/home:/glusterfs:/Leap15.2-9/openSUSE_Leap_15.2/x86_64/
(currently glusterfs-9.1-lp152.112.1.x86_64.rpm) are better, does anyone
know? None of the repos currently have 9.3.

And regardless, I don't care for gluster using IPv6 if IPv4 works fine. Is
there a way to make it stop trying to use IPv6 and only use IPv4?

Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror
<http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | @ArtemR <http://twitter.com/ArtemR>

On Thu, Jul 22, 2021 at 9:09 PM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:

> Did you try with latest 9.X? Based on the release notes that should be
> 9.3.
>
> Best Regards,
> Strahil Nikolov
>
> On Fri, Jul 23, 2021 at 3:06, Artem Russakovskii
> <archon810 at gmail.com> wrote:
>
> Hi all,
>
> I just filed this ticket https://github.com/gluster/glusterfs/issues/2648,
> and wanted to bring it to your attention. Any feedback would be appreciated.
>
> Description of problem:
> We have a 4-node replicate cluster running gluster 7.9. I'm currently
> setting up a new cluster on a new set of machines and went straight for
> gluster 9.1.
>
> However, I was unable to probe any servers due to this error:
>
> [2021-07-17 00:31:05.228609 +0000] I [MSGID: 106487] [glusterd-handler.c:1160:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req nexus2 24007
> [2021-07-17 00:31:05.229727 +0000] E [MSGID: 101075] [common-utils.c:3657:gf_is_local_addr] 0-management: error in getaddrinfo [{ret=Name or service not known}]
> [2021-07-17 00:31:05.230785 +0000] E [MSGID: 106408] [glusterd-peer-utils.c:217:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: Name or service not known [Unknown error -2]
> [2021-07-17 00:31:05.353971 +0000] I [MSGID: 106128] [glusterd-handler.c:3719:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: nexus2 (24007)
> [2021-07-17 00:31:05.375871 +0000] W [MSGID: 106061] [glusterd-handler.c:3488:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
> [2021-07-17 00:31:05.375903 +0000] I [rpc-clnt.c:1010:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> [2021-07-17 00:31:05.377021 +0000] E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
> [2021-07-17 00:31:05.377043 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> [2021-07-17 00:31:05.377147 +0000] I [MSGID: 106498] [glusterd-handler.c:3648:glusterd_friend_add] 0-management: connect returned 0
> [2021-07-17 00:31:05.377201 +0000] I [MSGID: 106004] [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer <nexus2> (<00000000-0000-0000-0000-000000000000>), in state <Establishing Connection>, has disconnected from glusterd.
> [2021-07-17 00:31:05.377453 +0000] E [MSGID: 101032] [store.c:464:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]
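> For context, family=10 in the getaddrinfo errors above is AF_INET6, i.e.
> glusterd is asking the resolver for IPv6 records. A quick way to compare
> what each address family resolves for a peer (just a sketch -- nexus2 is
> one of my hostnames, substitute your own):
>
>   # IPv4 records, as glibc's resolver sees them
>   getent ahostsv4 nexus2
>   # IPv6 records; an empty result here would line up with the
>   # gf_resolve_ip6 failures above
>   getent ahostsv6 nexus2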
> I then wiped the /var/lib/glusterd dir to start clean and downgraded to
> 7.9, then attempted to peer probe again. This time, it worked fine, proving
> 7.9 is working, same as it is on prod.
>
> At this point, I made a volume, started it, and played around with testing
> to my satisfaction. Then I decided to see what would happen if I tried to
> upgrade this working volume from 7.9 to 9.1.
>
> The end result is:
>
> - gluster volume status is only showing the local gluster node and not
>   any of the remote nodes
> - data does seem to replicate, so the connection between the servers
>   is actually established
> - logs are now filled with constantly repeating messages like so:
>
> [2021-07-22 23:29:31.039004 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> [2021-07-22 23:29:31.039212 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
> [2021-07-22 23:29:31.039304 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
> The message "E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]" repeated 119 times between [2021-07-22 23:27:34.025983 +0000] and [2021-07-22 23:29:31.039302 +0000]
> [2021-07-22 23:29:34.039369 +0000] E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
> [2021-07-22 23:29:34.039441 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> [2021-07-22 23:29:34.039558 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
> [2021-07-22 23:29:34.039659 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
> [2021-07-22 23:29:37.039741 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
> [2021-07-22 23:29:37.039921 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
> [2021-07-22 23:29:37.040015 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
>
> When I issue a command in cli:
>
> ==> cli.log <==
> [2021-07-22 23:38:11.802596 +0000] I [cli.c:840:main] 0-cli: Started running gluster with version 9.1
> [2021-07-22 23:38:11.804007 +0000] W [socket.c:3434:socket_connect] 0-glusterfs: Error disabling sockopt IPV6_V6ONLY: "Operation not supported"
> [2021-07-22 23:38:11.906865 +0000] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}]
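> On the IPV6_V6ONLY warning: that socket option only exists on AF_INET6
> sockets, so an error disabling it hints that the IPv6 side of the stack
> isn't usable on the host. Two quick sanity checks (standard Linux tooling,
> nothing gluster-specific; just a sketch):
>
>   # returns 1 if IPv6 is disabled kernel-wide
>   sysctl net.ipv6.conf.all.disable_ipv6
>   # should show at least ::1 on lo if IPv6 is actually usable
>   ip -6 addr show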
> Mandatory info:
> - The output of the `gluster volume info` command:
>
> gluster volume info
>
> Volume Name: ap
> Type: Replicate
> Volume ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: nexus2:/mnt/nexus2_block1/ap
> Brick2: forge:/mnt/forge_block1/ap
> Brick3: hive:/mnt/hive_block1/ap
> Brick4: citadel:/mnt/citadel_block1/ap
> Options Reconfigured:
> performance.client-io-threads: on
> nfs.disable: on
> storage.fips-mode-rchecksum: on
> transport.address-family: inet
> cluster.self-heal-daemon: enable
> client.event-threads: 4
> cluster.data-self-heal-algorithm: full
> cluster.lookup-optimize: on
> cluster.quorum-count: 1
> cluster.quorum-type: fixed
> cluster.readdir-optimize: on
> cluster.heal-timeout: 1800
> disperse.eager-lock: on
> features.cache-invalidation: on
> features.cache-invalidation-timeout: 600
> network.inode-lru-limit: 500000
> network.ping-timeout: 7
> network.remote-dio: enable
> performance.cache-invalidation: on
> performance.cache-size: 1GB
> performance.io-thread-count: 4
> performance.md-cache-timeout: 600
> performance.rda-cache-limit: 256MB
> performance.read-ahead: off
> performance.readdir-ahead: on
> performance.stat-prefetch: on
> performance.write-behind-window-size: 32MB
> server.event-threads: 4
> cluster.background-self-heal-count: 1
> performance.cache-refresh-timeout: 10
> features.ctime: off
> cluster.granular-entry-heal: enable
>
> - The output of the gluster volume status command:
>
> gluster volume status
> Status of volume: ap
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick forge:/mnt/forge_block1/ap            49152     0          Y       2622
> Self-heal Daemon on localhost               N/A       N/A        N       N/A
>
> Task Status of Volume ap
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> - The output of the gluster volume heal command:
>
> gluster volume heal ap enable
> Enable heal on volume ap has been successful
>
> gluster volume heal ap
> Launching heal operation to perform index self heal on volume ap has been unsuccessful:
> Self-heal daemon is not running. Check self-heal daemon log file.
>
> - The operating system / glusterfs version:
> OpenSUSE 15.2, glusterfs 9.1.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police <http://www.androidpolice.com>, APK Mirror
> <http://www.apkmirror.com/>, Illogical Robot LLC
> beerpla.net | @ArtemR <http://twitter.com/ArtemR>
>
> ________
>
> Community Meeting Calendar:
>
> Schedule -
> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> Bridge: https://meet.google.com/cpu-eiue-hvk
> Gluster-users mailing list
> Gluster-users at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
Strahil Nikolov
2021-Jul-24 02:08 UTC
[Gluster-users] Broken status, peer probe, "DNS resolution failed on host" and "Error disabling sockopt IPV6_V6ONLY: "Protocol not available" after updating from gluster 7.9 to 9.1
Can you try setting "transport.address-family: inet" in /etc/glusterfs/glusterd.vol on all nodes?
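For reference, the option lives inside the "volume management" block of
that file and uses volfile syntax rather than the colon form. A trimmed
sketch of what it could look like -- this is based on the default volfile
shipped with gluster, so compare against your own copy rather than pasting
it verbatim:

  volume management
      type mgmt/glusterd
      option working-directory /var/lib/glusterd
      option transport-type socket
      # force IPv4-only transport; without it glusterd may also try
      # AF_INET6 lookups
      option transport.address-family inet
  end-volume

After editing, glusterd has to be restarted on each node (systemctl
restart glusterd, or however the service is named on your distro) for the
change to take effect.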
About the rpms, if they are not yet built, the only other option is to build them from source.

I assume that the second try is on a fresh set of systems, without any remnants of the old Gluster install.

Best Regards,
Strahil Nikolov

On Friday, 23 July 2021, 07:55:01 GMT+3, Artem Russakovskii <archon810 at gmail.com> wrote:

> Hi Strahil,
>
> I am using repo builds from
> https://download.opensuse.org/repositories/filesystems/openSUSE_Leap_15.2/x86_64/
> (currently glusterfs-9.1-lp152.88.2.x86_64.rpm) and don't build them.
> Perhaps the builds at
> https://download.opensuse.org/repositories/home:/glusterfs:/Leap15.2-9/openSUSE_Leap_15.2/x86_64/
> (currently glusterfs-9.1-lp152.112.1.x86_64.rpm) are better, does anyone
> know? None of the repos currently have 9.3.
>
> And regardless, I don't care for gluster using IPv6 if IPv4 works fine. Is
> there a way to make it stop trying to use IPv6 and only use IPv4?
>
> Sincerely,
> Artem