Milos Cuculovic
2019-Mar-21 12:33 UTC
[Gluster-users] Heal flapping between Possibly undergoing heal and In split brain
Sure:

brick1:
----------------------------------------
sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b
trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000
----------------------------------------
sudo stat /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
  File: '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59'
  Size: 33  Blocks: 0  IO Block: 4096  directory
Device: 807h/2055d  Inode: 40809094709  Links: 3
Access: (0755/drwxr-xr-x)  Uid: ( 33/www-data)  Gid: ( 33/www-data)
Access: 2019-03-20 11:06:26.994047597 +0100
Modify: 2019-03-20 11:28:28.294689870 +0100
Change: 2019-03-21 13:01:03.077654239 +0100
 Birth: -

sudo stat /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
  File: '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617'
  Size: 33  Blocks: 0  IO Block: 4096  directory
Device: 807h/2055d  Inode: 49399908865  Links: 3
Access: (0755/drwxr-xr-x)  Uid: ( 33/www-data)  Gid: ( 33/www-data)
Access: 2019-03-20 11:07:20.342140927 +0100
Modify: 2019-03-20 11:28:28.318690015 +0100
Change: 2019-03-21 13:01:03.133654344 +0100
 Birth: -

sudo stat /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
  File: '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf'
  Size: 33  Blocks: 0  IO Block: 4096  directory
Device: 807h/2055d  Inode: 53706303549  Links: 3
Access: (0755/drwxr-xr-x)  Uid: ( 33/www-data)  Gid: ( 33/www-data)
Access: 2019-03-20 11:06:55.414097315 +0100
Modify: 2019-03-20 11:28:28.362690281 +0100
Change: 2019-03-21 13:01:03.141654359 +0100
 Birth: -

sudo stat /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
  File: '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b'
  Size: 33  Blocks: 0  IO Block: 4096  directory
Device: 807h/2055d  Inode: 57990935591  Links: 3
Access: (0755/drwxr-xr-x)  Uid: ( 33/www-data)  Gid: ( 33/www-data)
Access: 2019-03-20 11:07:08.558120309 +0100
Modify: 2019-03-20 11:28:14.226604869 +0100
Change: 2019-03-21 13:01:03.189654448 +0100
 Birth: -

sudo stat /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b
  File: '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b'
  Size: 33  Blocks: 0  IO Block: 4096  directory
Device: 807h/2055d  Inode: 62291339781  Links: 3
Access: (0755/drwxr-xr-x)  Uid: ( 33/www-data)  Gid: ( 33/www-data)
Access: 2019-03-20 11:06:02.070003998 +0100
Modify: 2019-03-20 11:28:28.458690861 +0100
Change: 2019-03-21 13:01:03.281654621 +0100
 Birth: -

sudo stat /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
  File: '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e'
  Size: 33  Blocks: 0  IO Block: 4096  directory
Device: 807h/2055d  Inode: 66574223479  Links: 3
Access: (0755/drwxr-xr-x)  Uid: ( 33/www-data)  Gid: ( 33/www-data)
Access: 2019-03-20 11:28:10.826584325 +0100
Modify: 2019-03-20 11:28:10.834584374 +0100
Change: 2019-03-20 14:06:07.937449353 +0100
 Birth: -
root at storage3:/var/log/glusterfs#
----------------------------------------

brick2:
----------------------------------------
sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.storage2-client-0=0x000000000000000000000000
trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
trusted.afr.storage2-client-0=0x000000000000000000000000
trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
trusted.afr.storage2-client-0=0x000000000000000000000000
trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
trusted.afr.storage2-client-0=0x000000000000000000000000
trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b
trusted.afr.storage2-client-0=0x000000000000000000000000
trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000

sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
getfattr: Removing leading '/' from absolute path names
# file: data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.storage2-client-0=0x000000000000000000000000
trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
trusted.glusterfs.dht.mds=0x00000000
----------------------------------------
sudo stat /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
  File: '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59'
  Size: 33  Blocks: 0  IO Block: 4096  directory
Device: 807h/2055d  Inode: 42232631305  Links: 3
Access: (0755/drwxr-xr-x)  Uid: ( 33/www-data)  Gid: ( 33/www-data)
Access: 2019-03-20 11:06:26.994047597 +0100
Modify: 2019-03-20 11:28:28.294689870 +0100
Change: 2019-03-21 13:01:03.078748131 +0100
 Birth: -

sudo stat /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
  File: '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617'
  Size: 33  Blocks: 0  IO Block: 4096  directory
Device: 807h/2055d  Inode: 78589109305  Links: 3
Access: (0755/drwxr-xr-x)  Uid: ( 33/www-data)  Gid: ( 33/www-data)
Access: 2019-03-20 11:07:20.342140927 +0100
Modify: 2019-03-20 11:28:28.318690015 +0100
Change: 2019-03-21 13:01:03.134748477 +0100
 Birth: -

sudo stat /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
  File: '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf'
  Size: 33  Blocks: 0  IO Block: 4096  directory
Device: 807h/2055d  Inode: 54972096517  Links: 3
Access: (0755/drwxr-xr-x)  Uid: ( 33/www-data)  Gid: ( 33/www-data)
Access: 2019-03-20 11:06:55.414097315 +0100
Modify: 2019-03-20 11:28:28.362690281 +0100
Change: 2019-03-21 13:01:03.162748650 +0100
 Birth: -

sudo stat /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
  File: '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b'
  Size: 33  Blocks: 0  IO Block: 4096  directory
Device: 807h/2055d  Inode: 40821259275  Links: 3
Access: (0755/drwxr-xr-x)  Uid: ( 33/www-data)  Gid: ( 33/www-data)
Access: 2019-03-20 11:07:08.558120309 +0100
Modify: 2019-03-20 11:28:14.226604869 +0100
Change: 2019-03-21 13:01:03.194748848 +0100
 Birth: -

sudo stat /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b
  File: '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b'
  Size: 33  Blocks: 0  IO Block: 4096  directory
Device: 807h/2055d  Inode: 15876654  Links: 3
Access: (0755/drwxr-xr-x)  Uid: ( 33/www-data)  Gid: ( 33/www-data)
Access: 2019-03-20 11:06:02.070003998 +0100
Modify: 2019-03-20 11:28:28.458690861 +0100
Change: 2019-03-21 13:01:03.282749392 +0100
 Birth: -

sudo stat /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
  File: '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e'
  Size: 33  Blocks: 0  IO Block: 4096  directory
Device: 807h/2055d  Inode: 49408944650  Links: 3
Access: (0755/drwxr-xr-x)  Uid: ( 33/www-data)  Gid: ( 33/www-data)
Access: 2019-03-20 11:28:10.826584325 +0100
Modify: 2019-03-20 11:28:10.834584374 +0100
Change: 2019-03-20 14:06:07.940849268 +0100
 Birth: -
----------------------------------------

The attached log file is from brick 2, the node I upgraded and started the heal on.
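For anyone comparing dumps like these by hand: the question Karthik raised about these six entries is whether each one carries the same trusted.gfid on both bricks. Below is a minimal Python sketch, assuming the plain-text output of "getfattr -d -m . -e hex" as pasted above. parse_getfattr, gfid_differences, gfid_to_uuid and gfid_handle_path are hypothetical helpers, not Gluster tools. gfid_handle_path assumes the usual <brick>/.glusterfs/<aa>/<bb>/<gfid> handle layout, which is one way to trace a <gfid:...> line from heal info back to something on the brick.

```python
"""Compare trusted.gfid per entry across two bricks' getfattr dumps.

Hypothetical helpers; input is the text printed by
`getfattr -d -m . -e hex <path>` (one or more "# file:" sections).
"""
import uuid


def parse_getfattr(text):
    """Return {path: {xattr_name: hex_value}}; a repeated path overwrites the earlier one."""
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# file:"):
            current = line[len("# file:"):].strip()
            result[current] = {}
        elif current is not None and "=" in line:
            name, value = line.split("=", 1)
            if " " not in name:  # skip stray prose lines that happen to contain "="
                result[current][name] = value
    return result


def gfid_to_uuid(hex_value):
    """0x-prefixed gfid -> dashed form, as printed in heal info <gfid:...> lines."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    return str(uuid.UUID(hex=digits))


def gfid_handle_path(brick_root, hex_value):
    """Assumed handle location: <brick>/.glusterfs/<first 2 hex>/<next 2 hex>/<uuid>."""
    u = gfid_to_uuid(hex_value)
    return f"{brick_root}/.glusterfs/{u[0:2]}/{u[2:4]}/{u}"


def gfid_differences(brick_a_text, brick_b_text):
    """Return [(path, gfid_a, gfid_b)] where gfids differ or one side has none."""
    a = parse_getfattr(brick_a_text)
    b = parse_getfattr(brick_b_text)
    diffs = []
    for path in sorted(set(a) | set(b)):
        gfid_a = a.get(path, {}).get("trusted.gfid")
        gfid_b = b.get(path, {}).get("trusted.gfid")
        if gfid_a != gfid_b:
            diffs.append((path, gfid_a, gfid_b))
    return diffs


# Two short excerpts of the dumps above.
BRICK1_DUMP = """\
# file: data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1
# file: data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0
"""
BRICK2_DUMP = """\
# file: data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
trusted.afr.storage2-client-0=0x000000000000000000000000
trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1
"""

if __name__ == "__main__":
    for path, g1, g2 in gfid_differences(BRICK1_DUMP, BRICK2_DUMP):
        print(path, g1, g2)
    print(gfid_handle_path("/data/data-cluster", "0x16c6a1e2b3fe4851972b998980097a87"))
```

Fed the two full getfattr dumps above, it would flag only 41be9ff5ec05c4b1c989c6053e709e59, and only as missing on brick2, because that dump lists e7cdc94f60d390812a5f9754885e119e twice. The other five gfids match on both bricks.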
- Kindest regards,

Milos Cuculovic
IT Manager

---
MDPI AG
Postfach, CH-4020 Basel, Switzerland
Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 35
Fax +41 61 302 89 18
Email: cuculovic at mdpi.com
Skype: milos.cuculovic.mdpi

Disclaimer: The information and files contained in this message are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this message in error, please notify me and delete this message from your system. You may not copy this message in its entirety or in part, or disclose its contents to anyone.

> On 21 Mar 2019, at 13:05, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
>
> Can you give me the stat & getfattr output of all those 6 entries from both the bricks and the glfsheal-<volname>.log file from the node where you run this command?
> Meanwhile, can you also try running this with the source-brick option?
>
> On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic <cuculovic at mdpi.com> wrote:
> Thank you Karthik,
>
> I have run this for all files (see example below) and it says the file is not in split-brain:
>
> sudo gluster volume heal storage2 split-brain latest-mtime /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
> Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: File not in split-brain.
> Volume heal failed.
>
>> On 21 Mar 2019, at 12:36, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
>>
>> Hi Milos,
>>
>> Thanks for the logs and the getfattr output.
>> From the logs I can see that there are 6 entries under the directory "/data/data-cluster/dms/final_archive" named
>> 41be9ff5ec05c4b1c989c6053e709e59
>> 5543982fab4b56060aa09f667a8ae617
>> a8b7f31775eebc8d1867e7f9de7b6eaf
>> c1d3f3c2d7ae90e891e671e2f20d5d4b
>> e5934699809a3b6dcfc5945f408b978b
>> e7cdc94f60d390812a5f9754885e119e
>> which have a gfid mismatch, so the heal is failing on this directory.
>>
>> You can use the CLI to resolve the gfid mismatch on these entries, with any of the 3 available methods:
>> 1. bigger-file
>>    gluster volume heal <VOLNAME> split-brain bigger-file <FILE>
>>
>> 2. latest-mtime
>>    gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>
>>
>> 3. source-brick
>>    gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE>
>>
>> where <FILE> must be an absolute path w.r.t. the volume, starting with '/'.
>> If all those entries are directories, use either the latest-mtime or the source-brick option.
>> After you resolve all these gfid mismatches, run the "gluster volume heal <volname>" command. Then check the heal info and let me know the result.
>>
>> Regards,
>> Karthik
>>
>> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic <cuculovic at mdpi.com> wrote:
>> Sure, thank you for following up.
>>
>> About the commands, here is what I see:
>>
>> brick1:
>> ----------------------------------------
>> sudo gluster volume heal storage2 info
>> Brick storage3:/data/data-cluster
>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>> /dms/final_archive - Possibly undergoing heal
>>
>> Status: Connected
>> Number of entries: 3
>>
>> Brick storage4:/data/data-cluster
>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>> /dms/final_archive - Possibly undergoing heal
>>
>> Status: Connected
>> Number of entries: 2
>> ----------------------------------------
>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
>> getfattr: Removing leading '/' from absolute path names
>> # file: data/data-cluster/dms/final_archive
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.storage2-client-1=0x000000000000000000000010
>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.dht.mds=0x00000000
>> ----------------------------------------
>> stat /data/data-cluster/dms/final_archive
>>   File: '/data/data-cluster/dms/final_archive'
>>   Size: 3497984  Blocks: 8768  IO Block: 4096  directory
>> Device: 807h/2055d  Inode: 26427748396  Links: 72123
>> Access: (0755/drwxr-xr-x)  Uid: ( 33/www-data)  Gid: ( 33/www-data)
>> Access: 2018-10-09 04:22:40.514629044 +0200
>> Modify: 2019-03-21 11:55:37.382278863 +0100
>> Change: 2019-03-21 11:55:37.382278863 +0100
>>  Birth: -
>> ----------------------------------------
>>
>> brick2:
>> ----------------------------------------
>> sudo gluster volume heal storage2 info
>> Brick storage3:/data/data-cluster
>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>> /dms/final_archive - Possibly undergoing heal
>>
>> Status: Connected
>> Number of entries: 3
>>
>> Brick storage4:/data/data-cluster
>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>> /dms/final_archive - Possibly undergoing heal
>>
>> Status: Connected
>> Number of entries: 2
>> ----------------------------------------
>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
>> getfattr: Removing leading '/' from absolute path names
>> # file: data/data-cluster/dms/final_archive
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.storage2-client-0=0x000000000000000000000001
>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.dht.mds=0x00000000
>> ----------------------------------------
>> stat /data/data-cluster/dms/final_archive
>>   File: '/data/data-cluster/dms/final_archive'
>>   Size: 3497984  Blocks: 8760  IO Block: 4096  directory
>> Device: 807h/2055d  Inode: 13563551265  Links: 72124
>> Access: (0755/drwxr-xr-x)  Uid: ( 33/www-data)  Gid: ( 33/www-data)
>> Access: 2018-10-09 04:22:40.514629044 +0200
>> Modify: 2019-03-21 11:55:46.382565124 +0100
>> Change: 2019-03-21 11:55:46.382565124 +0100
>>  Birth: -
>> ----------------------------------------
>>
>> Hope this helps.
>>
>>> On 21 Mar 2019, at 11:43, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
>>>
>>> Can you attach the "glustershd.log" file, which is under "/var/log/glusterfs/", from both nodes, and the "stat" & "getfattr -d -m . -e hex <file-path-on-brick>" output of all the entries listed in the heal info output from both bricks?
>>>
>>> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic <cuculovic at mdpi.com> wrote:
>>> Thanks Karthik!
>>>
>>> I was trying to find some resolution methods from [2] but unfortunately none worked (I can explain what I tried if needed).
>>>
>>>> I guess the volume you are talking about is of type replica-2 (1x2).
>>> That's correct. I'm aware of the arbiter solution but haven't taken the time to implement it yet.
>>>
>>> From the info results I posted, how can I tell which situation I am in? No files are listed as in split-brain, only directories. One brick has 3 entries and the other has 2.
>>>
>>> sudo gluster volume heal storage2 info
>>> [sudo] password for sshadmin:
>>> Brick storage3:/data/data-cluster
>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>> /dms/final_archive - Possibly undergoing heal
>>>
>>> Status: Connected
>>> Number of entries: 3
>>>
>>> Brick storage4:/data/data-cluster
>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>> /dms/final_archive - Possibly undergoing heal
>>>
>>> Status: Connected
>>> Number of entries: 2
>>>
>>>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Note: I guess the volume you are talking about is of type replica-2 (1x2). Replica 2 volumes are generally prone to split-brain. If you can, consider converting them to arbiter or replica 3; those handle most of the cases that can lead to split-brains. For more information see [1].
>>>>
>>>> Resolving the split-brain: [2] explains how to interpret the heal info output and the different ways to resolve split-brains: using the CLI, manually, or using the favorite-child-policy.
>>>> If you have an entry split-brain that is a gfid split-brain (a file/dir having different gfids on the replica bricks), you can use the CLI option to resolve it. If a directory is in gfid split-brain in a distributed-replicate volume and you are using the source-brick option, make sure the source is the brick of this subvolume that has the same gfid as the other distribute subvolume(s) where the gfid is correct.
>>>> If you have a type mismatch, follow the steps in [3] to resolve the split-brain.
>>>>
>>>> [1] https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/
>>>> [2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
>>>> [3] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain
>>>>
>>>> HTH,
>>>> Karthik
>>>>
>>>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic <cuculovic at mdpi.com> wrote:
>>>> I was now able to catch the split-brain state:
>>>>
>>>> sudo gluster volume heal storage2 info
>>>> Brick storage3:/data/data-cluster
>>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>>> /dms/final_archive - Is in split-brain
>>>>
>>>> Status: Connected
>>>> Number of entries: 3
>>>>
>>>> Brick storage4:/data/data-cluster
>>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>>> /dms/final_archive - Is in split-brain
>>>>
>>>> Status: Connected
>>>> Number of entries: 2
>>>>
>>>> Milos
>>>>
>>>>> On 21 Mar 2019, at 09:07, Milos Cuculovic <cuculovic at mdpi.com> wrote:
>>>>>
>>>>> For the past 24 hours, since I upgraded one of the servers from 4.0 to 4.1.7, heal info shows this:
>>>>>
>>>>> sudo gluster volume heal storage2 info
>>>>> Brick storage3:/data/data-cluster
>>>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>>>> /dms/final_archive - Possibly undergoing heal
>>>>>
>>>>> Status: Connected
>>>>> Number of entries: 3
>>>>>
>>>>> Brick storage4:/data/data-cluster
>>>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>>>> /dms/final_archive - Possibly undergoing heal
>>>>>
>>>>> Status: Connected
>>>>> Number of entries: 2
>>>>>
>>>>> The same entries stay there. From time to time /dms/final_archive is reported as in split-brain, as the following command shows:
>>>>>
>>>>> sudo gluster volume heal storage2 info split-brain
>>>>> Brick storage3:/data/data-cluster
>>>>> /dms/final_archive
>>>>> Status: Connected
>>>>> Number of entries in split-brain: 1
>>>>>
>>>>> Brick storage4:/data/data-cluster
>>>>> /dms/final_archive
>>>>> Status: Connected
>>>>> Number of entries in split-brain: 1
>>>>>
>>>>> How can I find out which file is in split-brain? The files in /dms/final_archive are not very important; it is fine to remove the ones that differ (though ideally I would resolve the split-brain).
>>>>>
>>>>> I can only see the directory and the GFIDs. Any idea how to resolve this? I would like to continue with the upgrade on the 2nd server, and for that the heal needs to finish with 0 entries in "sudo gluster volume heal storage2 info".
>>>>>
>>>>> Thank you in advance, Milos.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: glfsheal-storage2.log
Type: application/octet-stream
Size: 513885 bytes
Desc: not available
URL: <http://lists.gluster.org/pipermail/gluster-users/attachments/20190321/3f9556f3/attachment-0001.obj>
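The trusted.afr.* values quoted above for /dms/final_archive can be decoded to see what each brick is waiting to heal on the other. Here is a minimal Python sketch. It assumes the usual AFR changelog layout: 12 bytes holding three big-endian 32-bit counters (data, metadata, entry) of operations still pending on the named client/brick. It also assumes storage2-client-0 is storage3 and storage2-client-1 is storage4, following the brick order in the heal info output. decode_afr_pending and pending_blames are hypothetical helper names.

```python
"""Decode AFR pending-operation xattrs (trusted.afr.<volume>-client-N).

Sketch, assuming the usual AFR changelog layout: 12 bytes = three big-endian
32-bit counters (data, metadata, entry) that the brick holding the xattr
believes are still pending on the named client/brick.
"""
import struct

FIELDS = ("data", "metadata", "entry")


def decode_afr_pending(hex_value):
    """'0x' + 24 hex digits -> {'data': n, 'metadata': n, 'entry': n}."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    return dict(zip(FIELDS, struct.unpack(">III", raw)))


def pending_blames(xattrs, volume):
    """Return {client: non-zero counters} from one brick's xattrs."""
    prefix = f"trusted.afr.{volume}-client-"
    blames = {}
    for name, value in xattrs.items():
        if not name.startswith(prefix):
            continue  # skips trusted.afr.dirty, trusted.gfid, dht xattrs
        nonzero = {k: v for k, v in decode_afr_pending(value).items() if v}
        if nonzero:
            blames[name[len("trusted.afr."):]] = nonzero
    return blames


# /dms/final_archive xattrs as pasted from each brick in the thread.
BRICK1 = {
    "trusted.afr.dirty": "0x000000000000000000000000",
    "trusted.afr.storage2-client-1": "0x000000000000000000000010",
    "trusted.gfid": "0x16c6a1e2b3fe4851972b998980097a87",
}
BRICK2 = {
    "trusted.afr.dirty": "0x000000000000000000000000",
    "trusted.afr.storage2-client-0": "0x000000000000000000000001",
    "trusted.gfid": "0x16c6a1e2b3fe4851972b998980097a87",
}

if __name__ == "__main__":
    print("brick1 blames", pending_blames(BRICK1, "storage2"))
    print("brick2 blames", pending_blames(BRICK2, "storage2"))
```

Run on the quoted values, the sketch reports 16 pending entry operations on brick1 against storage2-client-1 and 1 on brick2 against storage2-client-0, with the data and metadata counters at zero. Each brick blames the other only for changes to the directory's entries. That fits Karthik's finding that the heal was stuck on the six gfid-mismatched entries inside /dms/final_archive rather than on file contents.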
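Since the state of /dms/final_archive kept flipping between "Possibly undergoing heal" and "Is in split-brain", it can help to capture heal info repeatedly and compare the results programmatically. Below is a minimal Python sketch that turns the text of "gluster volume heal <VOLNAME> info" (or "info split-brain") into per-brick records. It assumes the exact plain-text layout shown in this thread: a "Brick host:/path" line, one entry per line with an optional " - <state>" suffix, then "Status:" and "Number of entries..." lines. parse_heal_info and split_brain_entries are hypothetical helpers.

```python
"""Parse `gluster volume heal <VOLNAME> info` text into per-brick records.

Hypothetical helper; assumes the plain-text layout shown in this thread.
An entry name that itself contains " - " would be split incorrectly.
"""
import re

BRICK_RE = re.compile(r"^Brick (\S+)$")
COUNT_RE = re.compile(r"^Number of entries(?: in split-brain)?: (\d+)$")


def parse_heal_info(text):
    """Return {brick: {'entries': [(name, state_or_None)], 'status': str, 'count': int}}."""
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        brick = BRICK_RE.match(line)
        if brick:
            current = brick.group(1)
            bricks[current] = {"entries": [], "status": None, "count": None}
            continue
        if current is None:
            continue  # e.g. a "[sudo] password for ..." prompt before the first brick
        info = bricks[current]
        count = COUNT_RE.match(line)
        if line.startswith("Status:"):
            info["status"] = line.split(":", 1)[1].strip()
        elif count:
            info["count"] = int(count.group(1))
        else:
            name, _, state = line.partition(" - ")
            info["entries"].append((name, state or None))
    return bricks


def split_brain_entries(bricks):
    """Sorted entry names flagged 'Is in split-brain' on any brick."""
    return sorted({name for info in bricks.values()
                   for name, state in info["entries"]
                   if state == "Is in split-brain"})


SAMPLE = """\
Brick storage3:/data/data-cluster
<gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
<gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
/dms/final_archive - Is in split-brain

Status: Connected
Number of entries: 3

Brick storage4:/data/data-cluster
<gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
/dms/final_archive - Is in split-brain

Status: Connected
Number of entries: 2
"""

if __name__ == "__main__":
    parsed = parse_heal_info(SAMPLE)
    for brick, info in parsed.items():
        print(brick, info["status"], info["count"], info["entries"])
    print("split-brain:", split_brain_entries(parsed))
```

Sampling the command in a loop and diffing successive results would show whether the same three gfids and the directory persist while only the state label changes.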
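Karthik's three CLI forms have to be run once per entry, so for six entries it is easy to mistype one. Below is a minimal Python sketch that only builds the command lines as a dry run and never executes anything. It uses the volume name storage2 and the entry names from this thread and enforces the two rules Karthik gives: FILE starts with '/', and source-brick takes a HOSTNAME:BRICKNAME argument. heal_command is a hypothetical helper. The storage3 brick in the test is only an example of the argument shape, not a recommendation of which brick holds the correct gfid.

```python
"""Dry-run builder for the per-entry split-brain CLI commands in this thread.

Sketch only: returns argument lists for
`gluster volume heal <VOLNAME> split-brain <policy> [<HOSTNAME:BRICKNAME>] <FILE>`
and prints them; nothing is executed.
"""
import shlex

POLICIES = ("bigger-file", "latest-mtime", "source-brick")


def heal_command(volume, policy, path, source_brick=None):
    """Build one split-brain resolution command as an argument list."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if not path.startswith("/"):
        raise ValueError("FILE must be an absolute path w.r.t. the volume, starting with '/'")
    if (policy == "source-brick") != (source_brick is not None):
        raise ValueError("HOSTNAME:BRICKNAME is required for source-brick, and only for it")
    cmd = ["gluster", "volume", "heal", volume, "split-brain", policy]
    if source_brick is not None:
        cmd.append(source_brick)
    cmd.append(path)
    return cmd


ENTRIES = [
    "41be9ff5ec05c4b1c989c6053e709e59",
    "5543982fab4b56060aa09f667a8ae617",
    "a8b7f31775eebc8d1867e7f9de7b6eaf",
    "c1d3f3c2d7ae90e891e671e2f20d5d4b",
    "e5934699809a3b6dcfc5945f408b978b",
    "e7cdc94f60d390812a5f9754885e119e",
]

if __name__ == "__main__":
    for name in ENTRIES:
        cmd = heal_command("storage2", "latest-mtime", f"/dms/final_archive/{name}")
        print(shlex.join(cmd))  # shlex.join needs Python 3.8+
    # Afterwards, trigger the heal and re-check heal info, as suggested in the thread.
    print(shlex.join(["gluster", "volume", "heal", "storage2"]))
```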
Karthik Subrahmanya
2019-Mar-21 13:07 UTC
[Gluster-users] Heal flapping between Possibly undergoing heal and In split brain
Hey Milos,

From the getfattr output I see that the gfids got healed for those directories. The glfsheal log also has messages about deleting the entries on one brick as part of healing; they were then recreated on that brick with the correct gfid.
Can you run the "gluster volume heal <volname>" and "gluster volume heal <volname> info" commands and paste the output here?
If you still see entries pending heal, send the latest glustershd.log files from both nodes, along with the getfattr output of the files listed in the heal info output.

Regards,
Karthik

On Thu, Mar 21, 2019 at 6:03 PM Milos Cuculovic <cuculovic at mdpi.com> wrote:
/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b > File: > '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 40821259275 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:07:08.558120309 +0100 > Modify: 2019-03-20 11:28:14.226604869 +0100 > Change: 2019-03-21 13:01:03.194748848 +0100 > Birth: - > > sudo stat > /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b > File: > '/data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 15876654 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:06:02.070003998 +0100 > Modify: 2019-03-20 11:28:28.458690861 +0100 > Change: 2019-03-21 13:01:03.282749392 +0100 > Birth: - > > sudo stat > /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e > File: > '/data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e' > Size: 33 Blocks: 0 IO Block: 4096 directory > Device: 807h/2055d Inode: 49408944650 Links: 3 > Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data) > Access: 2019-03-20 11:28:10.826584325 +0100 > Modify: 2019-03-20 11:28:10.834584374 +0100 > Change: 2019-03-20 14:06:07.940849268 +0100 > Birth: - > ???????????????????????????????????????????????????????????? > ???????????????????????????????????????????????????????????? > > The file is from brick 2 that I upgraded and started the heal on. > > > - Kindest regards, > > Milos Cuculovic > IT Manager > > --- > MDPI AG > Postfach, CH-4020 Basel, Switzerland > Office: St. Alban-Anlage 66, 4052 Basel, Switzerland > Tel. 
+41 61 683 77 35 > Fax +41 61 302 89 18 > Email: cuculovic at mdpi.com <cuculovic at mdpi.com> > Skype: milos.cuculovic.mdpi > > Disclaimer: The information and files contained in this message > are confidential and intended solely for the use of the individual or > entity to whom they are addressed. If you have received this message in > error, please notify me and delete this message from your system. You may > not copy this message in its entirety or in part, or disclose its contents > to anyone. > > On 21 Mar 2019, at 13:05, Karthik Subrahmanya <ksubrahm at redhat.com> wrote: > > Can you give me the stat & getfattr output of all those 6 entries from > both the bricks and the glfsheal-<volname>.log file from the node where you > run this command? > Meanwhile can you also try running this with the source-brick option? > > On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic <cuculovic at mdpi.com> > wrote: > >> Thank you Karthik, >> >> I have run this for all files (see example below) and it says the file is >> not in split-brain: >> >> sudo gluster volume heal storage2 split-brain latest-mtime >> /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 >> Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: File >> not in split-brain. >> Volume heal failed. >> >> >> - Kindest regards, >> >> Milos Cuculovic >> IT Manager >> >> --- >> MDPI AG >> Postfach, CH-4020 Basel, Switzerland >> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland >> Tel. +41 61 683 77 35 >> Fax +41 61 302 89 18 >> Email: cuculovic at mdpi.com <cuculovic at mdpi.com> >> Skype: milos.cuculovic.mdpi >> >> Disclaimer: The information and files contained in this message >> are confidential and intended solely for the use of the individual or >> entity to whom they are addressed. If you have received this message in >> error, please notify me and delete this message from your system. You may >> not copy this message in its entirety or in part, or disclose its contents >> to anyone. 
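Each entry has to be resolved on its own, with a path from the volume root, and Karthik suggests trying source-brick as well as latest-mtime. A dry-run helper makes it easy to review the commands for all six entries before running any of them. The POSIX sh sketch below only prints commands built from the syntax listed in Karthik's 12:36 message; the function name print_heal_cmds and its "source-brick:HOST:BRICK" argument convention are hypothetical, and the volume, directory, and entry names are the ones from this thread. Choosing the right source brick is still up to the administrator.

```sh
# Dry run: print, but do not execute, one split-brain resolution command per
# entry. print_heal_cmds is a hypothetical helper, not part of GlusterFS.
# Usage: print_heal_cmds VOLNAME POLICY DIR ENTRY...
#   POLICY: latest-mtime | bigger-file | source-brick:HOST:BRICK
#   (for directories, only latest-mtime or source-brick applies)
print_heal_cmds() {
  local vol policy dir name
  vol=$1
  policy=$2
  dir=$3
  shift 3
  for name in "$@"; do
    case $policy in
      source-brick:*)
        # Drop the "source-brick:" prefix to leave HOST:BRICK.
        printf 'gluster volume heal %s split-brain source-brick %s %s/%s\n' \
          "$vol" "${policy#source-brick:}" "$dir" "$name"
        ;;
      *)
        printf 'gluster volume heal %s split-brain %s %s/%s\n' \
          "$vol" "$policy" "$dir" "$name"
        ;;
    esac
  done
  # Once every entry is resolved, trigger a heal of the whole volume.
  printf 'gluster volume heal %s\n' "$vol"
}

# The six entries from this thread, using latest-mtime as Milos did.
print_heal_cmds storage2 latest-mtime /dms/final_archive \
  41be9ff5ec05c4b1c989c6053e709e59 5543982fab4b56060aa09f667a8ae617 \
  a8b7f31775eebc8d1867e7f9de7b6eaf c1d3f3c2d7ae90e891e671e2f20d5d4b \
  e5934699809a3b6dcfc5945f408b978b e7cdc94f60d390812a5f9754885e119e
```

Printing instead of executing keeps the choice of policy and source brick a deliberate, reviewable step.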
>>
>> On 21 Mar 2019, at 12:36, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
>>
>> Hi Milos,
>>
>> Thanks for the logs and the getfattr output.
>> From the logs I can see that there are 6 entries under the directory "/data/data-cluster/dms/final_archive" named
>> 41be9ff5ec05c4b1c989c6053e709e59
>> 5543982fab4b56060aa09f667a8ae617
>> a8b7f31775eebc8d1867e7f9de7b6eaf
>> c1d3f3c2d7ae90e891e671e2f20d5d4b
>> e5934699809a3b6dcfc5945f408b978b
>> e7cdc94f60d390812a5f9754885e119e
>> which have a gfid mismatch, so the heal is failing on this directory.
>>
>> You can use the CLI option to resolve the gfid mismatch on these entries, with any of the 3 available methods:
>> 1. bigger-file
>> gluster volume heal <VOLNAME> split-brain bigger-file <FILE>
>>
>> 2. latest-mtime
>> gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>
>>
>> 3. source-brick
>> gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE>
>>
>> where <FILE> must be the absolute path within the volume, starting with '/'.
>> If all those entries are directories, use either the latest-mtime or the source-brick option.
>> After you resolve all these gfid mismatches, run the "gluster volume heal <volname>" command. Then check the heal info and let me know the result.
>>
>> Regards,
>> Karthik
>>
>> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic <cuculovic at mdpi.com> wrote:
>>
>>> Sure, thank you for following up.
>>>
>>> About the commands, here is what I see:
>>>
>>> brick1:
>>> ----------------------------------------
>>> sudo gluster volume heal storage2 info
>>> Brick storage3:/data/data-cluster
>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>> /dms/final_archive - Possibly undergoing heal
>>>
>>> Status: Connected
>>> Number of entries: 3
>>>
>>> Brick storage4:/data/data-cluster
>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>> /dms/final_archive - Possibly undergoing heal
>>>
>>> Status: Connected
>>> Number of entries: 2
>>> ----------------------------------------
>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: data/data-cluster/dms/final_archive
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.storage2-client-1=0x000000000000000000000010
>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.dht.mds=0x00000000
>>> ----------------------------------------
>>> stat /data/data-cluster/dms/final_archive
>>> File: '/data/data-cluster/dms/final_archive'
>>> Size: 3497984 Blocks: 8768 IO Block: 4096 directory
>>> Device: 807h/2055d Inode: 26427748396 Links: 72123
>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>> Access: 2018-10-09 04:22:40.514629044 +0200
>>> Modify: 2019-03-21 11:55:37.382278863 +0100
>>> Change: 2019-03-21 11:55:37.382278863 +0100
>>> Birth: -
>>> ----------------------------------------
>>>
>>> brick2:
>>> ----------------------------------------
>>> sudo gluster volume heal storage2 info
>>> Brick storage3:/data/data-cluster
>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>> /dms/final_archive - Possibly undergoing heal
>>>
>>> Status: Connected
>>> Number of entries: 3
>>>
>>> Brick storage4:/data/data-cluster
>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>> /dms/final_archive - Possibly undergoing heal
>>>
>>> Status: Connected
>>> Number of entries: 2
>>> ----------------------------------------
>>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
>>> getfattr: Removing leading '/' from absolute path names
>>> # file: data/data-cluster/dms/final_archive
>>> trusted.afr.dirty=0x000000000000000000000000
>>> trusted.afr.storage2-client-0=0x000000000000000000000001
>>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.dht.mds=0x00000000
>>> ----------------------------------------
>>> stat /data/data-cluster/dms/final_archive
>>> File: '/data/data-cluster/dms/final_archive'
>>> Size: 3497984 Blocks: 8760 IO Block: 4096 directory
>>> Device: 807h/2055d Inode: 13563551265 Links: 72124
>>> Access: (0755/drwxr-xr-x) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>> Access: 2018-10-09 04:22:40.514629044 +0200
>>> Modify: 2019-03-21 11:55:46.382565124 +0100
>>> Change: 2019-03-21 11:55:46.382565124 +0100
>>> Birth: -
>>> ----------------------------------------
>>>
>>> Hope this helps.
>>>
>>> On 21 Mar 2019, at 11:43, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
>>>
>>> Can you attach the "glustershd.log" file, which is under "/var/log/glusterfs/", from both nodes, and the "stat" & "getfattr -d -m . -e hex <file-path-on-brick>" output of all the entries listed in the heal info output from both bricks?
>>>
>>> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic <cuculovic at mdpi.com> wrote:
>>>
>>>> Thanks Karthik!
>>>>
>>>> I was trying to find some resolution methods from [2], but unfortunately none worked (I can explain what I tried if needed).
>>>>
>>>> I guess the volume you are talking about is of type replica-2 (1x2).
>>>>
>>>> That's correct. I'm aware of the arbiter solution but haven't yet taken the time to implement it.
>>>>
>>>> From the info results I posted, how can I tell which situation I am in? No files are listed as in split-brain, only directories. One brick has 3 entries and the other has two.
>>>>
>>>> sudo gluster volume heal storage2 info
>>>> [sudo] password for sshadmin:
>>>> Brick storage3:/data/data-cluster
>>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>>> /dms/final_archive - Possibly undergoing heal
>>>>
>>>> Status: Connected
>>>> Number of entries: 3
>>>>
>>>> Brick storage4:/data/data-cluster
>>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>>> /dms/final_archive - Possibly undergoing heal
>>>>
>>>> Status: Connected
>>>> Number of entries: 2
>>>>
>>>> On 21 Mar 2019, at 10:27, Karthik Subrahmanya <ksubrahm at redhat.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Note: I guess the volume you are talking about is of type replica-2 (1x2). Replica 2 volumes are generally prone to split-brain. If you can, consider converting them to arbiter or replica-3; those configurations handle most of the cases that can lead to split-brain. For more information see [1].
>>>>
>>>> Resolving the split-brain: [2] explains how to interpret the heal info output and the different ways to resolve split-brains: using the CLI, manually, or using the favorite-child-policy.
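Interpreting the getfattr output mostly comes down to two attributes. As the Gluster split-brain documentation describes, a trusted.afr.<volname>-client-<N> value holds three changelog counters of 8 hex digits each: pending data, metadata, and entry operations that this brick records against brick N. trusted.gfid is the 16-byte gfid, which heal info prints in dashed UUID form. A minimal POSIX sh sketch with two hypothetical helpers, afr_counters and gfid_of, for decoding them:

```sh
# Hypothetical helpers for reading `getfattr -d -m . -e hex` output.

# afr_counters HEX: decode a trusted.afr.<vol>-client-N value into the
# pending data, metadata and entry counters (8 hex digits each).
afr_counters() {
  local hex d m e
  hex=${1#0x}
  if [ "${#hex}" -ne 24 ]; then
    echo "afr_counters: expected 24 hex digits, got ${#hex}" >&2
    return 1
  fi
  d=$(printf '%s' "$hex" | cut -c1-8)
  m=$(printf '%s' "$hex" | cut -c9-16)
  e=$(printf '%s' "$hex" | cut -c17-24)
  # Shell arithmetic accepts 0x-prefixed hexadecimal constants.
  echo "data=$((0x$d)) metadata=$((0x$m)) entry=$((0x$e))"
}

# gfid_of DUMP: print the trusted.gfid found in a getfattr dump, in the
# dashed 8-4-4-4-12 form that heal info uses for <gfid:...> entries.
gfid_of() {
  printf '%s\n' "$1" | sed -n \
    's/^trusted\.gfid=0x\([0-9a-f]\{8\}\)\([0-9a-f]\{4\}\)\([0-9a-f]\{4\}\)\([0-9a-f]\{4\}\)\([0-9a-f]\{12\}\)$/\1-\2-\3-\4-\5/p'
}

afr_counters 0x000000000000000000000010   # data=0 metadata=0 entry=16
afr_counters 0x000000000000000000000001   # data=0 metadata=0 entry=1
gfid_of 'trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87'
# 16c6a1e2-b3fe-4851-972b-998980097a87
```

Assuming brick1 in the directory output above is storage3 (client-0) and brick2 is storage4 (client-1), the two values decode to 16 pending entry operations on storage3 against storage4 and 1 on storage4 against storage3, with data and metadata at zero. Each brick blaming the other for directory entries is consistent with /dms/final_archive flapping into entry split-brain. Comparing gfid_of across bricks for the per-entry dumps near the top of this thread shows matching gfids (for example 33585a57... for 5543982f...), which may explain why the CLI answered "File not in split-brain".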
>>>> If you have an entry split-brain that is a gfid split-brain (a file or directory having different gfids on the replica bricks), you can use the CLI option to resolve it. If a directory is in gfid split-brain in a distributed-replicate volume and you are using the source-brick option, make sure the source is the brick of this subvolume whose gfid matches the correct gfid on the other distribute subvolume(s).
>>>> If you have a type mismatch, follow the steps in [3] to resolve the split-brain.
>>>>
>>>> [1] https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/
>>>> [2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
>>>> [3] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain
>>>>
>>>> HTH,
>>>> Karthik
>>>>
>>>> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic <cuculovic at mdpi.com> wrote:
>>>>
>>>>> I was now able to catch the split-brain state:
>>>>>
>>>>> sudo gluster volume heal storage2 info
>>>>> Brick storage3:/data/data-cluster
>>>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>>>> /dms/final_archive - Is in split-brain
>>>>>
>>>>> Status: Connected
>>>>> Number of entries: 3
>>>>>
>>>>> Brick storage4:/data/data-cluster
>>>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>>>> /dms/final_archive - Is in split-brain
>>>>>
>>>>> Status: Connected
>>>>> Number of entries: 2
>>>>>
>>>>> Milos
>>>>>
>>>>> On 21 Mar 2019, at 09:07, Milos Cuculovic <cuculovic at mdpi.com> wrote:
>>>>>
>>>>> For the past 24 hours, since I upgraded one of the servers from 4.0 to 4.1.7, heal info has shown this:
>>>>>
>>>>> sudo gluster volume heal storage2 info
>>>>> Brick storage3:/data/data-cluster
>>>>> <gfid:256ca960-1601-4f0d-9b08-905c6fd52326>
>>>>> <gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>
>>>>> /dms/final_archive - Possibly undergoing heal
>>>>>
>>>>> Status: Connected
>>>>> Number of entries: 3
>>>>>
>>>>> Brick storage4:/data/data-cluster
>>>>> <gfid:276fec9a-1c9b-4efe-9715-dcf4207e99b0>
>>>>> /dms/final_archive - Possibly undergoing heal
>>>>>
>>>>> Status: Connected
>>>>> Number of entries: 2
>>>>>
>>>>> The same files stay there. From time to time, /dms/final_archive is reported as in split-brain, as the following command shows:
>>>>>
>>>>> sudo gluster volume heal storage2 info split-brain
>>>>> Brick storage3:/data/data-cluster
>>>>> /dms/final_archive
>>>>> Status: Connected
>>>>> Number of entries in split-brain: 1
>>>>>
>>>>> Brick storage4:/data/data-cluster
>>>>> /dms/final_archive
>>>>> Status: Connected
>>>>> Number of entries in split-brain: 1
>>>>>
>>>>> How can I find out which file is in split-brain? The files in /dms/final_archive are not very important, so it is fine to remove the ones that differ (though ideally I would resolve the split-brain).
>>>>>
>>>>> I can only see the directory and the GFIDs. Any idea how to resolve this situation? I would like to continue with the upgrade on the 2nd server, and for this the heal needs to finish with 0 entries in "sudo gluster volume heal storage2 info".
>>>>>
>>>>> Thank you in advance, Milos.
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users
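The <gfid:...> lines in heal info are entries known only by their gfid. On each brick, GlusterFS keeps a handle for every gfid at .glusterfs/<first 2 hex digits>/<next 2 hex digits>/<gfid> under the brick root, so those lines can be mapped to paths that stat and getfattr can inspect on the brick itself. A minimal POSIX sh sketch; gfid_paths is a hypothetical helper and /data/data-cluster is the brick root from this thread. Feed it the entries listed under one brick and inspect the results on that brick's server.

```sh
# gfid_paths BRICK_ROOT: read `gluster volume heal <VOL> info` output on stdin
# and print the .glusterfs handle path of every <gfid:...> entry.
# gfid_paths is a hypothetical helper, not part of GlusterFS.
gfid_paths() {
  local brick
  brick=$1
  sed -n 's/^.*<gfid:\([0-9a-f-]*\)>.*$/\1/p' |
    while read -r gfid; do
      # Handles live at .glusterfs/<chars 1-2>/<chars 3-4>/<full gfid>.
      printf '%s/.glusterfs/%s/%s/%s\n' "$brick" \
        "$(printf '%s' "$gfid" | cut -c1-2)" \
        "$(printf '%s' "$gfid" | cut -c3-4)" "$gfid"
    done
}

# The entries listed under storage3's brick in this thread.
printf '%s\n' \
  'Brick storage3:/data/data-cluster' \
  '<gfid:256ca960-1601-4f0d-9b08-905c6fd52326>' \
  '<gfid:7a63a729-c48f-4a00-9040-c3e2a0710ae6>' \
  '/dms/final_archive - Possibly undergoing heal' |
  gfid_paths /data/data-cluster
# /data/data-cluster/.glusterfs/25/6c/256ca960-1601-4f0d-9b08-905c6fd52326
# /data/data-cluster/.glusterfs/7a/63/7a63a729-c48f-4a00-9040-c3e2a0710ae6
```

For a regular file the handle is a hard link, so GNU find's -samefile test (find /data/data-cluster -samefile <handle>) can reveal the file's real path; for a directory the handle is a symlink whose target contains the parent directory's gfid and the directory's name.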