Strahil Nikolov
2021-Jul-23 04:09 UTC
[Gluster-users] Broken status, peer probe, "DNS resolution failed on host" and "Error disabling sockopt IPV6_V6ONLY: "Protocol not available" after updating from gluster 7.9 to 9.1
Did you try with the latest 9.X? Based on the release notes that should be 9.3.

Best Regards,
Strahil Nikolov

On Fri, Jul 23, 2021 at 3:06, Artem Russakovskii <archon810 at gmail.com> wrote:

Hi all,

I just filed this ticket, https://github.com/gluster/glusterfs/issues/2648, and wanted to bring it to your attention. Any feedback would be appreciated.

Description of problem:
We have a 4-node replicate cluster running gluster 7.9. I'm currently setting up a new cluster on a new set of machines and went straight for gluster 9.1. However, I was unable to probe any servers due to this error:

[2021-07-17 00:31:05.228609 +0000] I [MSGID: 106487] [glusterd-handler.c:1160:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req nexus2 24007
[2021-07-17 00:31:05.229727 +0000] E [MSGID: 101075] [common-utils.c:3657:gf_is_local_addr] 0-management: error in getaddrinfo [{ret=Name or service not known}]
[2021-07-17 00:31:05.230785 +0000] E [MSGID: 106408] [glusterd-peer-utils.c:217:glusterd_peerinfo_find_by_hostname] 0-management: error in getaddrinfo: Name or service not known [Unknown error -2]
[2021-07-17 00:31:05.353971 +0000] I [MSGID: 106128] [glusterd-handler.c:3719:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: nexus2 (24007)
[2021-07-17 00:31:05.375871 +0000] W [MSGID: 106061] [glusterd-handler.c:3488:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout
[2021-07-17 00:31:05.375903 +0000] I [rpc-clnt.c:1010:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2021-07-17 00:31:05.377021 +0000] E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
[2021-07-17 00:31:05.377043 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
[2021-07-17 00:31:05.377147 +0000] I [MSGID: 106498] [glusterd-handler.c:3648:glusterd_friend_add] 0-management: connect returned 0
[2021-07-17 00:31:05.377201 +0000] I [MSGID: 106004] [glusterd-handler.c:6427:__glusterd_peer_rpc_notify] 0-management: Peer <nexus2> (<00000000-0000-0000-0000-000000000000>), in state <Establishing Connection>, has disconnected from glusterd.
[2021-07-17 00:31:05.377453 +0000] E [MSGID: 101032] [store.c:464:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info. [No such file or directory]

I then wiped the /var/lib/glusterd dir to start clean and downgraded to 7.9, then attempted to peer probe again. This time, it worked fine, proving 7.9 is working, same as it is on prod. At this point, I made a volume, started it, and played around with testing to my satisfaction. Then I decided to see what would happen if I tried to upgrade this working volume from 7.9 to 9.1.
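As a note on the log above: family=10 in the gf_resolve_ip6 error is AF_INET6 on Linux, so the failing getaddrinfo calls are IPv6 lookups. A quick way to compare what each address family returns for a peer name is a getent check like the following (a diagnostic sketch, not output from the affected machines; nexus2 is the peer hostname taken from the logs):

    # addresses getaddrinfo returns for the peer name, any family
    getent ahosts nexus2

    # IPv4-only and IPv6-only lookups (glibc getent databases)
    getent ahostsv4 nexus2
    getent ahostsv6 nexus2

    # check whether the name is coming from /etc/hosts or from DNS
    grep -i nexus2 /etc/hosts

If the IPv4 lookup succeeds while the IPv6 one fails, that would match the pattern in the glusterd log.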
The end result is:

- gluster volume status is only showing the local gluster node and not any of the remote nodes
- data does seem to replicate, so the connection between the servers is actually established
- logs are now filled with constantly repeating messages like so:

[2021-07-22 23:29:31.039004 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
[2021-07-22 23:29:31.039212 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
[2021-07-22 23:29:31.039304 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
The message "E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]" repeated 119 times between [2021-07-22 23:27:34.025983 +0000] and [2021-07-22 23:29:31.039302 +0000]
[2021-07-22 23:29:34.039369 +0000] E [MSGID: 101075] [common-utils.c:520:gf_resolve_ip6] 0-resolver: error in getaddrinfo [{family=10}, {ret=Name or service not known}]
[2021-07-22 23:29:34.039441 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
[2021-07-22 23:29:34.039558 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
[2021-07-22 23:29:34.039659 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive
[2021-07-22 23:29:37.039741 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host nexus2
[2021-07-22 23:29:37.039921 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host citadel
[2021-07-22 23:29:37.040015 +0000] E [name.c:265:af_inet_client_get_remote_sockaddr] 0-management: DNS resolution failed on host hive

When I issue a command in cli:

==> cli.log <==
[2021-07-22 23:38:11.802596 +0000] I [cli.c:840:main] 0-cli: Started running gluster with version 9.1
**[2021-07-22 23:38:11.804007 +0000] W [socket.c:3434:socket_connect] 0-glusterfs: Error disabling sockopt IPV6_V6ONLY: "Operation not supported"**
[2021-07-22 23:38:11.906865 +0000] I [MSGID: 101190] [event-epoll.c:670:event_dispatch_epoll_worker] 0-epoll: Started thread with index [{index=0}]

**Mandatory info:**

**- The output of the `gluster volume info` command**:

gluster volume info

Volume Name: ap
Type: Replicate
Volume ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: nexus2:/mnt/nexus2_block1/ap
Brick2: forge:/mnt/forge_block1/ap
Brick3: hive:/mnt/hive_block1/ap
Brick4: citadel:/mnt/citadel_block1/ap
Options Reconfigured:
performance.client-io-threads: on
nfs.disable: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
cluster.self-heal-daemon: enable
client.event-threads: 4
cluster.data-self-heal-algorithm: full
cluster.lookup-optimize: on
cluster.quorum-count: 1
cluster.quorum-type: fixed
cluster.readdir-optimize: on
cluster.heal-timeout: 1800
disperse.eager-lock: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
network.inode-lru-limit: 500000
network.ping-timeout: 7
network.remote-dio: enable
performance.cache-invalidation: on
performance.cache-size: 1GB
performance.io-thread-count: 4
performance.md-cache-timeout: 600
performance.rda-cache-limit: 256MB
performance.read-ahead: off
performance.readdir-ahead: on
performance.stat-prefetch: on
performance.write-behind-window-size: 32MB
server.event-threads: 4
cluster.background-self-heal-count: 1
performance.cache-refresh-timeout: 10
features.ctime: off
cluster.granular-entry-heal: enable

- The output of the gluster volume status command:

gluster volume status
Status of volume: ap
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick forge:/mnt/forge_block1/ap            49152     0          Y       2622
Self-heal Daemon on localhost               N/A       N/A        N       N/A

Task Status of Volume ap
------------------------------------------------------------------------------
There are no active volume tasks

- The output of the gluster volume heal command:

gluster volume heal ap enable
Enable heal on volume ap has been successful

gluster volume heal ap
Launching heal operation to perform index self heal on volume ap has been unsuccessful:
Self-heal daemon is not running. Check self-heal daemon log file.

- The operating system / glusterfs version:
OpenSUSE 15.2, glusterfs 9.1.

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC
beerpla.net | @ArtemR
Artem Russakovskii
2021-Jul-23 04:54 UTC
[Gluster-users] Broken status, peer probe, "DNS resolution failed on host" and "Error disabling sockopt IPV6_V6ONLY: "Protocol not available" after updating from gluster 7.9 to 9.1
Hi Strahil,

I am using repo builds from https://download.opensuse.org/repositories/filesystems/openSUSE_Leap_15.2/x86_64/ (currently glusterfs-9.1-lp152.88.2.x86_64.rpm) and don't build them myself. Perhaps the builds at https://download.opensuse.org/repositories/home:/glusterfs:/Leap15.2-9/openSUSE_Leap_15.2/x86_64/ are better (currently glusterfs-9.1-lp152.112.1.x86_64.rpm); does anyone know? None of the repos currently has 9.3.

Regardless, I don't care for gluster using IPv6 if IPv4 works fine. Is there a way to make it stop trying IPv6 and use IPv4 only?

Sincerely,
Artem

--
Founder, Android Police <http://www.androidpolice.com>, APK Mirror <http://www.apkmirror.com/>, Illogical Robot LLC
beerpla.net | @ArtemR <http://twitter.com/ArtemR>

On Thu, Jul 22, 2021 at 9:09 PM Strahil Nikolov <hunter86_bg at yahoo.com> wrote:
> Did you try with latest 9.X ? Based on the release notes that should be
> 9.3 .
>
> Best Regards,
> Strahil Nikolov
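Regarding the IPv4-only question above: glusterd's own address family can be pinned via the documented transport.address-family option in /etc/glusterfs/glusterd.vol. A minimal sketch follows (assuming the Leap 15.2 package ships the stock management volume block; the exact contents of the packaged file may differ, so only the added option line is the point here):

    # /etc/glusterfs/glusterd.vol -- sketch, keep the rest of the packaged file as shipped
    volume management
        type mgmt/glusterd
        option working-directory /var/lib/glusterd
        # ... existing options left unchanged ...
        option transport.address-family inet   # restrict glusterd to IPv4
    end-volume

    # restart the management daemon on every node after editing
    systemctl restart glusterd

    # optionally confirm glusterd is no longer listening on an IPv6 socket
    ss -tlnp | grep gluster

Note that the volume itself already has transport.address-family: inet set, per the gluster volume info output earlier in the thread, so a change like this would only affect the management daemon's own name resolution and listeners; whether it clears the gf_resolve_ip6 errors on 9.x would need to be verified.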