Hi,

I'm running GlusterFS 3.3.1 on CentOS 6.4.

gluster volume status:

Status of volume: glustervol
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick KWTOCUATGS001:/mnt/cloudbrick                     24009   Y       20031
Brick KWTOCUATGS002:/mnt/cloudbrick                     24009   Y       1260
NFS Server on localhost                                 38467   Y       43320
Self-heal Daemon on localhost                           N/A     Y       43326
NFS Server on KWTOCUATGS002                             38467   Y       5842
Self-heal Daemon on KWTOCUATGS002                       N/A     Y       5848

Self-heal stops working: the application writes to only one brick and the data does not replicate. When I check /var/log/glusterfs/glustershd.log I see the following:

[2013-12-03 05:42:32.033563] W [socket.c:410:__socket_keepalive] 0-socket: failed to set keep idle on socket 8
[2013-12-03 05:42:32.033646] W [socket.c:1876:socket_server_event_handler] 0-socket.glusterfsd: Failed to set keep-alive: Operation not supported
[2013-12-03 05:42:32.790473] I [client-handshake.c:1614:select_server_supported_programs] 0-glustervol-client-1: Using Program GlusterFS 3.3.2, Num (1298437), Version (330)
[2013-12-03 05:42:32.790840] I [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-1: Connected to 172.16.95.153:24009, attached to remote volume '/mnt/cloudbrick'.
[2013-12-03 05:42:32.790884] I [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2013-12-03 05:42:32.791003] I [afr-common.c:3685:afr_notify] 0-glustervol-replicate-0: Subvolume 'glustervol-client-1' came back up; going online.
[2013-12-03 05:42:32.791161] I [client-handshake.c:453:client_set_lk_version_cbk] 0-glustervol-client-1: Server lk version = 1
[2013-12-03 05:42:32.795103] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6> failed on child glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.798064] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:081c6657-301a-42a4-9f95-6eeba6c67413> failed on child glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.799278] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:565f1358-449c-45e2-8535-93b5632c0d1e> failed on child glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.800636] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e> failed on child glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.802223] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:25fd406f-63e0-4037-bb01-da282cbe4d76> failed on child glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.803339] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:a109c429-5885-499e-8711-09fdccd396f2> failed on child glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.804308] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6> failed on child glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.804877] I [client-handshake.c:1614:select_server_supported_programs] 0-glustervol-client-0: Using Program GlusterFS 3.3.2, Num (1298437), Version (330)
[2013-12-03 05:42:32.807517] I [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-0: Connected to 172.16.107.154:24009, attached to remote volume '/mnt/cloudbrick'.
[2013-12-03 05:42:32.807562] I [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2013-12-03 05:42:32.810357] I [client-handshake.c:453:client_set_lk_version_cbk] 0-glustervol-client-0: Server lk version = 1
[2013-12-03 05:42:32.827437] E [afr-self-heal-data.c:764:afr_sh_data_fxattrop_fstat_done] 0-glustervol-replicate-0: Unable to self-heal contents of '<gfid:1262d40d-46a3-4e57-b07b-0fcc972c8403>' (possible split-brain). Please delete the file from all but the preferred subvolume.
[2013-12-03 05:42:39.205157] E [afr-self-heal-metadata.c:472:afr_sh_metadata_fix] 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of '<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>' (possible split-brain). Please fix the file on all backend volumes
[2013-12-03 05:42:39.215793] E [afr-self-heal-metadata.c:472:afr_sh_metadata_fix] 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of '<gfid:c0660768-289f-48ac-b8e5-e5b5a3a4b965>' (possible split-brain). Please fix the file on all backend volumes

Please advise.

Thanks & Regards,
Bobby Jacob
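For anyone hitting the same symptoms, the gluster CLI can report what the self-heal daemon itself sees before digging through glustershd.log. A minimal sketch, assuming the volume name "glustervol" from the status output above and a 3.3.x build that supports the heal-info subcommands; run on either server:

    # confirm both peers and both bricks are up
    gluster peer status
    gluster volume status glustervol

    # entries the self-heal daemon still has pending
    gluster volume heal glustervol info

    # entries it refuses to heal because the copies disagree
    gluster volume heal glustervol info split-brain

    # entries that failed to heal on the last crawl
    gluster volume heal glustervol info heal-failed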
Just an addition: on the node where self-heal is not working, /var/log/glusterfs/glustershd.log shows the following:

[2013-12-03 05:49:18.348637] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.350273] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.354813] E [afr-self-heald.c:685:_link_inode_update_loc] 0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[the same error repeats 16 times in total, with timestamps from 05:49:18.348637 through 05:49:18.368075]

Thanks & Regards,
Bobby Jacob

From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Bobby Jacob
Sent: Tuesday, December 03, 2013 8:48 AM
To: gluster-users at gluster.org
Subject: [Gluster-users] Self Heal Issue GlusterFS 3.3.1

[original message quoted in full; trimmed here, see the first post above]
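The "Transport endpoint is not connected" errors against glustervol-client-0 in the first log suggest the self-heal daemon on this node has lost its connection to the first brick. One option worth trying, sketched below purely as a suggestion and assuming the bricks themselves are healthy, is to re-spawn the auxiliary daemons with a forced volume start and then watch the log while it reconnects:

    # re-spawn any missing brick/NFS/self-heal processes for the volume
    gluster volume start glustervol force

    # verify the self-heal daemon shows Online on both nodes
    gluster volume status glustervol

    # follow the self-heal daemon log during reconnection
    tail -f /var/log/glusterfs/glustershd.log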
Hi,

Can someone please advise on this issue? It is urgent: self-heal is only running every 10 minutes.

Thanks & Regards,
Bobby Jacob

From: Bobby Jacob
Sent: Tuesday, December 03, 2013 8:51 AM
To: gluster-users at gluster.org
Subject: FW: Self Heal Issue GlusterFS 3.3.1

[earlier messages quoted in full; trimmed here, see the posts above]
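The roughly ten-minute cadence described above matches the self-heal daemon's periodic crawl of its heal index; rather than waiting for the next crawl, a heal can also be triggered on demand. A minimal sketch, again assuming the volume name from this thread:

    # heal only the entries recorded in the index (fast)
    gluster volume heal glustervol

    # or walk the entire volume and heal everything that differs (slower)
    gluster volume heal glustervol full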
On Tue, 2013-12-03 at 05:47 +0000, Bobby Jacob wrote:

> Hi,
>
> I'm running glusterFS 3.3.1 on Centos 6.4.
>
> [volume status and earlier log lines trimmed]
>
> [2013-12-03 05:42:32.827437] E [afr-self-heal-data.c:764:afr_sh_data_fxattrop_fstat_done] 0-glustervol-replicate-0: Unable to self-heal contents of '<gfid:1262d40d-46a3-4e57-b07b-0fcc972c8403>' (possible split-brain). Please delete the file from all but the preferred subvolume.

That file is at $brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403. Try picking one copy to remove, as the message says.

> [2013-12-03 05:42:39.205157] E [afr-self-heal-metadata.c:472:afr_sh_metadata_fix] 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of '<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>' (possible split-brain). Please fix the file on all backend volumes
>
> [2013-12-03 05:42:39.215793] E [afr-self-heal-metadata.c:472:afr_sh_metadata_fix] 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of '<gfid:c0660768-289f-48ac-b8e5-e5b5a3a4b965>' (possible split-brain). Please fix the file on all backend volumes

If that doesn't allow it to heal, you may need to find which filename the gfid file is hard-linked to. Run "ls -li" on the gfid file at the path shown above; with the inode number in hand, run "find $brick -inum $inode_number". Once you know which filenames it is linked with, remove all linked copies from all but one replica. The self-heal can then continue successfully.
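Putting that advice into concrete commands: the sketch below is one way to do it, assuming BRICK is the brick path on the node whose copy you have decided to discard (/mnt/cloudbrick in this thread) and GFID is one of the gfids from the log. The shell variables are illustrative helpers, not part of any gluster tool, and the removal step should be run on the bad brick only.

    # on the brick holding the non-preferred copy
    BRICK=/mnt/cloudbrick
    GFID=1262d40d-46a3-4e57-b07b-0fcc972c8403
    GFID_FILE=$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID

    # note the inode number of the gfid hard link
    ls -li "$GFID_FILE"

    # list every path hard-linked to that inode (the real filename plus the gfid link)
    INODE=$(stat -c %i "$GFID_FILE")
    find "$BRICK" -inum "$INODE"

    # remove each path printed by find on this brick only, e.g.
    #   rm <path>
    # then trigger a heal from any server so the surviving replica is copied back
    gluster volume heal glustervol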