Torbjørn Thorsen
2014-Apr-23 14:22 UTC
[Gluster-users] Self-heal working ? Seeing "background meta-data data self-heal failed" in logs
Greetings.

We have a distributed and replicated setup where one of the servers, which means two of the bricks, has been offline for some time.

Now that I have brought the previously down server back, I see some worrying lines in the client log.

[2014-04-23 13:02:17.384463] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-gluster0-replicate-0: background meta-data data self-heal failed on /some-path-here/disk0

Not much traffic is moving between the client and the previously down server, although I can see from running stat and getfattr on the file directly on the bricks that writes are happening in both places. I'm not at all sure the self-heal process is actually running, though.

We're on Gluster 3.4, although both servers have been rolling upgraded, so it seems from the log that we're still in a 3.3 state of mind, so to speak.

Here's the full log from the client when the server comes back:

[2014-04-23 13:02:03.747626] I [client-handshake.c:1658:select_server_supported_programs] 0-gluster0-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-04-23 13:02:03.763042] I [client-handshake.c:1658:select_server_supported_programs] 0-gluster0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-04-23 13:02:03.763399] I [client-handshake.c:1456:client_setvolume_cbk] 0-gluster0-client-2: Connected to 192.168.51.201:49153, attached to remote volume '/srv/gluster/brick1'.
[2014-04-23 13:02:03.763418] I [client-handshake.c:1468:client_setvolume_cbk] 0-gluster0-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2014-04-23 13:02:03.763468] I [client-handshake.c:1308:client_post_handshake] 0-gluster0-client-2: 1 fds open - Delaying child_up until they are re-opened
[2014-04-23 13:02:03.764814] I [client-handshake.c:1456:client_setvolume_cbk] 0-gluster0-client-0: Connected to 192.168.51.201:49152, attached to remote volume '/srv/gluster/brick0'.
[2014-04-23 13:02:03.764832] I [client-handshake.c:1468:client_setvolume_cbk] 0-gluster0-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-04-23 13:02:03.764846] I [client-handshake.c:1308:client_post_handshake] 0-gluster0-client-0: 2 fds open - Delaying child_up until they are re-opened
[2014-04-23 13:02:03.764952] I [client-handshake.c:930:client_child_up_reopen_done] 0-gluster0-client-2: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2014-04-23 13:02:03.766292] I [client-handshake.c:930:client_child_up_reopen_done] 0-gluster0-client-0: last fd open'd/lock-self-heal'd - notifying CHILD-UP
[2014-04-23 13:02:03.766379] I [client-handshake.c:450:client_set_lk_version_cbk] 0-gluster0-client-2: Server lk version = 1
[2014-04-23 13:02:03.853489] I [client-handshake.c:450:client_set_lk_version_cbk] 0-gluster0-client-0: Server lk version = 1
[2014-04-23 13:02:17.384463] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-gluster0-replicate-0: background meta-data data self-heal failed on /some-path-here/disk0
[2014-04-23 13:02:20.253380] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-gluster0-replicate-1: background meta-data data self-heal failed on /some-other-path/disk0

--
Kind regards
Torbjørn Thorsen
Developer / operations technician

Trollweb Solutions AS
- Professional Magento Partner
www.trollweb.no

Daytime phone: +47 51215300
Evening/weekend phone: for customers with a service agreement

Visiting address: Luramyrveien 40, 4313 Sandnes
Postal address: Maurholen 57, 4316 Sandnes

Remember that all our standard terms always apply
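For the getfattr check mentioned in the message above, here is a minimal sketch in Python of how one might decode a replication changelog value read from a brick. It assumes the value is the hex string that getfattr prints with -e hex for a trusted.afr.* attribute, and that it holds three big-endian 32-bit counters (pending data, metadata and entry operations). The function name and the sample value are hypothetical and not taken from this thread.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* value into its three pending-operation counters.

    Assumption: the value is 12 bytes, laid out as three big-endian
    unsigned 32-bit integers in the order data, metadata, entry.
    """
    text = hex_value.strip()
    # getfattr -e hex prefixes the value with "0x"; drop it if present.
    if text.lower().startswith("0x"):
        text = text[2:]
    raw = bytes.fromhex(text)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Hypothetical sample value, for illustration only.
counters = decode_afr_changelog("0x000000020000000100000000")
```

Under those assumptions, non-zero counters on one brick would suggest it still records operations pending for its replica partner, while all zeros would suggest nothing is waiting to be healed for that file.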
Pranith Kumar Karampuri
2014-Apr-25 06:28 UTC
[Gluster-users] Self-heal working ? Seeing "background meta-data data self-heal failed" in logs
----- Original Message -----
> From: "Torbjørn Thorsen" <torbjorn at trollweb.no>
> To: gluster-users at gluster.org
> Sent: Wednesday, April 23, 2014 7:52:24 PM
> Subject: [Gluster-users] Self-heal working ? Seeing "background meta-data data self-heal failed" in logs
>
> [2014-04-23 13:02:17.384463] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-gluster0-replicate-0: background meta-data data self-heal failed on /some-path-here/disk0
> [2014-04-23 13:02:20.253380] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-gluster0-replicate-1: background meta-data data self-heal failed on /some-other-path/disk0

Are there any other failures in the brick logs? Could you please attach the logs of both the bricks and the clients?

Pranith
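To look for other failures before attaching logs, something like the following Python sketch could pull the error-level entries out of a client or brick log. It assumes every entry sits on one line in the layout seen in this thread: timestamp in brackets, a one-letter level, a bracketed source location, a translator name followed by a colon, then the message. The names LOG_LINE and error_lines are made up for this example.

```python
import re

# Layout assumed from the log excerpts quoted in this thread.
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<source>[^\]]+)\] (?P<xlator>\S+): (?P<message>.*)$"
)


def error_lines(lines, levels=("E",)):
    """Return the parsed entries whose level letter is in `levels`."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            found.append(match.groupdict())
    return found


# Two entries copied from the client log in this thread.
sample = [
    "[2014-04-23 13:02:03.766379] I "
    "[client-handshake.c:450:client_set_lk_version_cbk] "
    "0-gluster0-client-2: Server lk version = 1",
    "[2014-04-23 13:02:17.384463] E "
    "[afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] "
    "0-gluster0-replicate-0: background meta-data data self-heal "
    "failed on /some-path-here/disk0",
]
hits = error_lines(sample)
```

To run it on a real file, pass the open file object instead of the sample list. Lines that do not match the assumed layout are skipped rather than reported, so an empty result does not prove the log is clean.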