I am new to Gluster but already like it. Last week I did some maintenance during which I shut down both nodes (one after the other). Many files needed to be healed afterwards. Everything worked well, except for one file. It is in split-brain, with two different GFIDs. I read the documentation, but it only covers the cases where the GFID is the same on both bricks. BTW, I am running Gluster 3.10.

Here are some details...

[root@NAS-01 .glusterfs]# gluster volume heal data01 info
Brick 192.168.186.11:/mnt/DATA/data
/abc/.zsh_history
/abc - Is in split-brain

Status: Connected
Number of entries: 2

Brick 192.168.186.12:/mnt/DATA/data
/abc - Is in split-brain

/abc/.zsh_history
Status: Connected
Number of entries: 2

On brick 1:

[root@NAS-01 abc]# ls -lart
total 75
drwxr-xr-x.  2 root  root  2 Jun  8 13:26 .zsh_history
drwxr-xr-x.  3 12078 root  3 Jun 12 11:36 .
drwxrwxrwt. 17 root  root 17 Jun 12 12:20 ..

On brick 2:

[root@DC-MTL-NAS-02 abc]# ls -lart
total 66
-rw-rw-r--.  2 12078 12078 1085 Jun 12 04:42 .zsh_history
drwxr-xr-x.  2 12078 root     3 Jun 12 10:36 .
drwxrwxrwt. 17 root  root    17 Jun 12 11:20 ..

Notice that .zsh_history is a directory on brick 1 and a regular file on brick 2.

On brick 1:

[root@NAS-01 abc]# getfattr -d -m . -e hex /mnt/DATA/data/abc/.zsh_history
getfattr: Removing leading '/' from absolute path names
# file: mnt/DATA/data/abc/.zsh_history
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.data01-client-0=0x000000000000000000000000
trusted.afr.data01-client-1=0x000000000000000200000000
trusted.gfid=0xdee43407139d41f091d13e106a51f262
trusted.glusterfs.dht=0x000000010000000000000000ffffffff

On brick 2:

[root@NAS-02 abc]# getfattr -d -m . -e hex /mnt/DATA/data/abc/.zsh_history
getfattr: Removing leading '/' from absolute path names
# file: mnt/DATA/data/abc/.zsh_history
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.data01-client-0=0x000000170000000200000000
trusted.afr.data01-client-1=0x000000000000000000000000
trusted.bit-rot.version=0x060000000000000059397acd0005dadd
trusted.gfid=0xa70ae9af887a4a37875f5c7c81ebc803

Any recommendation on how to recover from that? BTW, the file is not important and I could easily get rid of it without impact, so if deleting it is the easy solution, that works for me.

Regards,

--
Ludwig Gamache
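For readers following along, the trusted.afr.* values in the getfattr output above can be unpacked into counters. The sketch below assumes the commonly documented AFR changelog layout: 12 bytes holding three big-endian 32-bit counters for pending data, metadata and entry operations, in that order. The function name decode_afr_changelog is made up for this illustration and is not part of any Gluster tooling.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split a trusted.afr.* value into its three pending-operation counters.

    Assumes the layout is data, metadata, entry (4 bytes each, big-endian).
    """
    # Accept the value with or without the leading "0x" that getfattr prints.
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Values copied from the getfattr output in this thread.
# Brick 1 records pending operations against client-1 (brick 2) ...
print(decode_afr_changelog("0x000000000000000200000000"))
# ... and brick 2 records pending operations against client-0 (brick 1).
print(decode_afr_changelog("0x000000170000000200000000"))
```

Under that assumption, brick 1 holds 2 pending metadata operations against brick 2, while brick 2 holds 23 pending data and 2 pending metadata operations against brick 1. Each brick blames the other, which is consistent with the split-brain reported by heal info.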
Hi Ludwig,

There is no automatic way to resolve GFID split-brains that also have a type mismatch (a file on one brick, a directory on the other). In that case it is recommended to resolve it manually, by following the steps in [1]. For a GFID mismatch alone, 3.11 adds a way to resolve it using the *favorite-child-policy*.
Since the file is not important, you can simply delete it.

[1] https://gluster.readthedocs.io/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain

HTH,
Karthik

On Thu, Jun 15, 2017 at 8:23 AM, Ludwig Gamache <ludwig at elementai.com> wrote:
> Any recommendation on how to recover from that? BTW, the file is not
> important and I could easily get rid of it without impact. So, if this is
> an easy solution...
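When removing an entry by hand on a brick, the matching GFID link under the brick's .glusterfs directory usually has to go as well. The sketch below only computes where that link would be; it deletes nothing. It assumes the usual backend layout of .glusterfs/<first two hex digits>/<next two hex digits>/<GFID in UUID form>, where the link is a hard link for a regular file and a symlink for a directory. The function name gfid_backend_path is made up for this illustration; check the result against the steps in the documentation before removing anything.

```python
import uuid
from pathlib import PurePosixPath


def gfid_backend_path(brick_root, gfid_hex):
    """Return the assumed .glusterfs link path for a trusted.gfid value."""
    # Accept the value with or without the leading "0x" that getfattr prints.
    digits = gfid_hex[2:] if gfid_hex.lower().startswith("0x") else gfid_hex
    # uuid.UUID validates the 32 hex digits and inserts the hyphens.
    gfid = str(uuid.UUID(hex=digits))
    return str(PurePosixPath(brick_root) / ".glusterfs" / gfid[0:2] / gfid[2:4] / gfid)


# GFIDs and brick path copied from the getfattr output in this thread.
# Brick 2 (regular file):
print(gfid_backend_path("/mnt/DATA/data", "0xa70ae9af887a4a37875f5c7c81ebc803"))
# Brick 1 (directory):
print(gfid_backend_path("/mnt/DATA/data", "0xdee43407139d41f091d13e106a51f262"))
```

On the brick whose copy is being discarded, both the entry under /mnt/DATA/data/abc/ and the printed .glusterfs path would be the candidates for removal, after which a heal can recreate the entry from the other brick.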
Can you please explain how we ended up in this scenario? I think that would help us understand this kind of scenario better, and why Gluster recommends replica 3 or arbiter volumes.

Regards
Rafi KC

On 06/15/2017 10:46 AM, Karthik Subrahmanya wrote:
> Hi Ludwig,
>
> There is no way to resolve gfid split-brains with type mismatch. You
> have to do it manually by following the steps in [1].
> In case of type mismatch it is recommended to resolve it manually. But
> for only gfid mismatch in 3.11 we have a way to
> resolve it by using the *favorite-child-policy*.
> Since the file is not important, you can go with deleting that.
>
> [1]
> https://gluster.readthedocs.io/en/latest/Troubleshooting/split-brain/#fixing-directory-entry-split-brain
>
> HTH,
> Karthik