Pranith Kumar Karampuri
2015-Jan-23 07:41 UTC
[Gluster-users] [gluster] possible split-brain issue
hi Arnold,
It seems you gave the output from only one brick. Could you also provide it from the other brick as well? Sorry I didn't make that clear in my earlier mail.

Pranith

On 01/23/2015 10:44 AM, Arnold Yang wrote:
> Hi Pranith,
>
> Here is the output for the commands you provided. If you need anything more, please tell us!
>
> Thanks!
>
> [root@dmf-wpst-1 ~]# getfattr -d -m. -e hex /export/vdb1/brick/
> getfattr: Removing leading '/' from absolute path names
> # file: export/vdb1/brick/
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.gv0-client-0=0x000000000000000000000000
> trusted.afr.gv0-client-1=0x000000000000001400000000
> trusted.gfid=0x00000000000000000000000000000001
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
> trusted.glusterfs.volume-id=0x51de44c3f01e486da6b710c7b7a270d7
>
> [root@dmf-wpst-1 ~]# getfattr -d -m. -e hex /export/vdb1/brick/mpdis/
> getfattr: Removing leading '/' from absolute path names
> # file: export/vdb1/brick/mpdis/
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.gv0-client-0=0x000000000000000000000000
> trusted.afr.gv0-client-1=0x000000000000000400000000
> trusted.gfid=0x8ff7afeb996244cd9d1bf213568398d7
> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>
> [root@dmf-wpst-1 ~]# getfattr -d -m. -e hex /export/vdb1/brick/mpdis/test.rep.00.00.00.00.dmf1
> getfattr: Removing leading '/' from absolute path names
> # file: export/vdb1/brick/mpdis/test.rep.00.00.00.00.dmf1
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.gv0-client-0=0x000000000000000000000000
> trusted.afr.gv0-client-1=0x000000000000000000000000
> trusted.gfid=0x85ed306b179b46819d7c02eb336543b8
>
> [root@dmf-wpst-1 ~]# getfattr -d -m. -e hex /export/vdb1/brick/mpdis/test.rep.00.00.00.00.dmf2
> getfattr: Removing leading '/' from absolute path names
> # file: export/vdb1/brick/mpdis/test.rep.00.00.00.00.dmf2
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
> trusted.afr.gv0-client-0=0x000000000000000000000000
> trusted.afr.gv0-client-1=0x000000000000000000000000
> trusted.gfid=0xa826a389e7a042c2b5175a1acbecae9b
>
> From: Pranith Kumar Karampuri [mailto:pkarampu@redhat.com]
> Sent: Thursday, January 22, 2015 12:14 AM
> To: Jifeng Li; Gluster-users@gluster.org; Arnold Yang
> Subject: Re: [Gluster-users] [gluster] possible split-brain issue
>
> On 01/14/2015 04:48 PM, Jifeng Li wrote:
>> Hi,
>>
>> [issue]: To verify that the GlusterFS mount point is working, a script
>> periodically uses an HTTP PUT to place a file in a subdirectory under the
>> mount point, which is used as the Apache DocumentRoot. After running for
>> some time, errors like the ones below appear:
>>
>> [2015-01-14 09:18:40.915639] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-gv0-replicate-0: metadata self heal failed, on /mpdis
>> [2015-01-14 09:18:41.924584] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-gv0-replicate-0: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 20 ] [ 21 0 ] ]
>> [2015-01-14 09:18:41.925182] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-gv0-replicate-0: metadata self heal failed, on /
>> [2015-01-14 09:18:41.934827] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-gv0-replicate-0: Unable to self-heal contents of '/mpdis' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 4 ] [ 2 0 ] ]
>> [2015-01-14 09:18:41.935375] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-gv0-replicate-0: metadata self heal failed, on /mpdis
>> [2015-01-14 09:18:42.943742] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-gv0-replicate-0: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 20 ] [ 21 0 ] ]
>> [2015-01-14 09:18:42.944432] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-gv0-replicate-0: metadata self heal failed, on /
>> [2015-01-14 09:18:42.946664] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-gv0-replicate-0: Unable to self-heal contents of '/mpdis' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 4 ] [ 2 0 ] ]
>> [2015-01-14 09:18:42.947323] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-gv0-replicate-0: metadata self heal failed, on /mpdis
>> [2015-01-14 09:18:43.955929] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-gv0-replicate-0: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 20 ] [ 21 0 ] ]
>> [2015-01-14 09:18:43.956701] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-gv0-replicate-0: metadata self heal failed, on /
>> [2015-01-14 09:18:43.958874] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-gv0-replicate-0: Unable to self-heal contents of '/mpdis' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 4 ] [ 2 0 ] ]
>>
>> In addition, listing the files under the mount point intermittently fails with Input/output errors:
>>
>> [root@dmf-wpst-2 mpdis]# ll
>> total 0
>> -rwxr-xr-x. 1 apache apache 0 Jan 14 04:21 test.rep.00.00.00.00.dmf1
>> -rw-r--r--. 1 apache apache 0 Jan 14 04:21 test.rep.00.00.00.00.dmf2
>> [root@dmf-wpst-2 mpdis]# ll
>> total 0
>> -rwxr-xr-x. 1 apache apache 0 Jan 14 04:21 test.rep.00.00.00.00.dmf1
>> -rw-r--r--. 1 apache apache 0 Jan 14 04:21 test.rep.00.00.00.00.dmf2
>> [root@dmf-wpst-2 mpdis]# ll
>> ls: cannot open directory .: Input/output error
>> [root@dmf-wpst-2 mpdis]# ll
>> ls: cannot access test.rep.00.00.00.00.dmf1: Input/output error
>> ls: cannot access test.rep.00.00.00.00.dmf2: Input/output error
>> total 0
>> ?????????? ? ? ? ? ? test.rep.00.00.00.00.dmf1
>> ?????????? ? ? ? ? ? test.rep.00.00.00.00.dmf2
>>
>> Any tips about debugging further or getting this fixed would be appreciated.
>>
>> [version]: 3.5.3
>>
>> [environment]: two virtual servers, each with one brick:
>>
>> [root@dmf-wpst-2 mpdis]# gluster volume status
>> Status of volume: gv0
>> Gluster process                              Port   Online  Pid
>> ------------------------------------------------------------------------------
>> Brick dmf-ha-1-glusterfs:/export/vdb1/brick  49152  Y       332
>> Brick dmf-ha-2-glusterfs:/export/vdb1/brick  49154  Y       19396
>> Self-heal Daemon on localhost                N/A    Y       19410
>> Self-heal Daemon on 10.175.123.246           N/A    Y       999
>>
>> [root@dmf-wpst-1 mpdis]# gluster volume info
>> Volume Name: gv0
>> Type: Replicate
>> Volume ID: 51de44c3-f01e-486d-a6b7-10c7b7a270d7
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: dmf-ha-1-glusterfs:/export/vdb1/brick
>> Brick2: dmf-ha-2-glusterfs:/export/vdb1/brick
>> Options Reconfigured:
>> nfs.disable: ON
>> network.ping-timeout: 2
>> storage.bd-aio: on
>> storage.linux-aio: on
>> cluster.eager-lock: on
>> performance.client-io-threads: on
>> performance.cache-refresh-timeout: 60
>> performance.io-thread-count: 64
>> performance.cache-size: 8GB
>> cluster.server-quorum-type: none
>>
>> [mount-point info]:
>>
>> 1. mount command
>>
>> glusterfs -p /var/run/glusterfs.pid --volfile-server=dmf-ha-1-glusterfs --volfile-server=dmf-ha-2-glusterfs --volfile-id=gv0 /dmfcontents
>>
>> 2. mount point directory hierarchy
>>
>> [root@dmf-wpst-2 /]# ls -ld /dmfcontents/
>> drwxr-xr-x. 5 root root 71 Jan 14 04:39 /dmfcontents/
>> [root@dmf-wpst-2 /]# ls -ld /dmfcontents/mpdis/
>> drwxr-xr-x. 2 apache apache 89 Jan 14 04:39 /dmfcontents/mpdis/
>
> hi Jifeng Li,
> Sorry for the delay in response. Could you post the output of:
> 'getfattr -d -m. -e hex <brick-path>'
> 'getfattr -d -m. -e hex <brick-path>/mpdis'
> 'getfattr -d -m. -e hex <brick-path>/mpdis/test.rep.00.00.00.00.dmf1'
> 'getfattr -d -m. -e hex <brick-path>/mpdis/test.rep.00.00.00.00.dmf2'
>
> Pranith

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
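For readers puzzling over the hex values above: each trusted.afr.<volume>-client-<N> attribute is an AFR changelog made of three network-byte-order 32-bit counters, recording pending data, metadata, and entry operations (in that order) against the named replica. As an illustrative sketch (plain Python, not part of GlusterFS, assuming the AFR v1 layout used in the 3.5.x series):

```python
import struct

def decode_afr(hex_value):
    """Decode a trusted.afr.* changelog xattr into its three
    big-endian 32-bit pending-operation counters."""
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

# Value reported on dmf-wpst-1 for '/' against the other replica:
print(decode_afr("0x000000000000001400000000"))
# {'data': 0, 'metadata': 20, 'entry': 0}
```

Decoding dmf-wpst-1's root-directory value this way gives 20 pending metadata operations against the second replica, the same 20 that appears in the log line "Pending matrix: [ [ 0 20 ] [ 21 0 ] ]".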
Hi Pranith,

No worries! Here is the output from the other brick:

[root@dmf-wpst-2 ~]# getfattr -d -m. -e hex /export/vdb1/brick/
getfattr: Removing leading '/' from absolute path names
# file: export/vdb1/brick/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv0-client-0=0x000000000000001500000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.volume-id=0x51de44c3f01e486da6b710c7b7a270d7

[root@dmf-wpst-2 ~]# getfattr -d -m. -e hex /export/vdb1/brick/mpdis/
getfattr: Removing leading '/' from absolute path names
# file: export/vdb1/brick/mpdis/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv0-client-0=0x000000000000000200000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.gfid=0x8ff7afeb996244cd9d1bf213568398d7
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

[root@dmf-wpst-2 ~]# getfattr -d -m. -e hex /export/vdb1/brick/mpdis/test.rep.00.00.00.00.dmf1
getfattr: Removing leading '/' from absolute path names
# file: export/vdb1/brick/mpdis/test.rep.00.00.00.00.dmf1
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.gfid=0x85ed306b179b46819d7c02eb336543b8

[root@dmf-wpst-2 ~]# getfattr -d -m. -e hex /export/vdb1/brick/mpdis/test.rep.00.00.00.00.dmf2
getfattr: Removing leading '/' from absolute path names
# file: export/vdb1/brick/mpdis/test.rep.00.00.00.00.dmf2
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.gv0-client-0=0x000000000000000000000000
trusted.afr.gv0-client-1=0x000000000000000000000000
trusted.gfid=0xa826a389e7a042c2b5175a1acbecae9b

From: Pranith Kumar Karampuri [mailto:pkarampu@redhat.com]
Sent: Friday, January 23, 2015 3:42 PM
To: Arnold Yang; Jifeng Li; Gluster-users@gluster.org
Subject: Re: [Gluster-users] [gluster] possible split-brain issue

> hi Arnold,
> It seems you gave the output from only one brick. Could you also provide it from the other brick as well? Sorry I didn't make that clear in my earlier mail.
>
> Pranith
>
> [earlier quoted output snipped]
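With the xattrs from both bricks now available, the mutual-accusation pattern behind the split-brain report can be checked directly: for '/', dmf-wpst-1's trusted.afr.gv0-client-1 records pending metadata operations against brick 2, while dmf-wpst-2's trusted.afr.gv0-client-0 records pending metadata operations against brick 1, so self-heal cannot pick a source. A small illustrative check (plain Python, using the hex values reported in this thread; the counter layout assumed is the AFR v1 data/metadata/entry triple):

```python
import struct

def pending_metadata(hex_value):
    # The middle big-endian 32-bit field of an AFR changelog xattr
    # counts pending *metadata* operations against the named replica.
    return struct.unpack(">III", bytes.fromhex(hex_value[2:]))[1]

# trusted.afr.gv0-client-1 on dmf-wpst-1 (brick 1's record against brick 2):
b1_vs_b2 = pending_metadata("0x000000000000001400000000")
# trusted.afr.gv0-client-0 on dmf-wpst-2 (brick 2's record against brick 1):
b2_vs_b1 = pending_metadata("0x000000000000001500000000")

matrix = [[0, b1_vs_b2], [b2_vs_b1, 0]]
print(matrix)  # [[0, 20], [21, 0]] -- the "Pending matrix" logged for '/'
print(b1_vs_b2 > 0 and b2_vs_b1 > 0)  # True: each replica blames the other
```

The same arithmetic on the /mpdis values (0x...0400000000 and 0x...0200000000) reproduces the logged matrix [ [ 0 4 ] [ 2 0 ] ] for that directory.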