Jeff Darcy
2015-Jan-30 12:58 UTC
[Gluster-users] ... i was able to produce a split brain...
> Pranith and I had a discussion regarding this issue and here is what we have
> in our mind right now.
>
> We plan to provide the user commands to execute from mount so that he can
> access the files in split-brain. This way he can choose which copy is to be
> used as source. The user will have to perform a set of getfattrs and
> setfattrs (on virtual xattrs) to decide which child to choose as source and
> inform AFR with his decision.
>
> A) To know the split-brain status:
> getfattr -n trusted.afr.split-brain-status <path-to-file>
>
> This will provide the user with the following details -
> 1) Whether the file is in metadata split-brain
> 2) Whether the file is in data split-brain
>
> It will also list the names of the afr-children to choose from. Something like:
> Option0: client-0
> Option1: client-1
>
> We also tell the user what they could do to view metadata/data info, e.g.
> stat to get metadata.
>
> B) Now the user has to choose one of the options (client-x/client-y..) to
> inspect the files.
> e.g., setfattr -n trusted.afr.split-brain-choice -v client-0 <path-to-file>
> We save the read-child info in inode-ctx in order to provide the user access
> to the file in split-brain from that child. Once the user inspects the file,
> he proceeds to do the same from the other child of the replica pair and makes
> an informed decision.
>
> C) Once the above steps are done, AFR is to be informed of the final choice
> for source. This is achieved by -
> (say the fresh copy is in client-0)
> e.g., setfattr -n trusted.afr.split-brain-heal-finalize -v client-0 <path-to-file>
> This child will be chosen as source and split-brain resolution will be done.

+1

That looks quite nice, and AFAICT shouldn't be prohibitively hard to implement.
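The three proposed steps (A: read status, B: pick a read child and inspect, C: finalize) can be sketched from the client side in Python. This is a hypothetical sketch, not GlusterFS code: it assumes the virtual xattr names exactly as proposed above (a released version may use different ones), it assumes the status value lists children as `OptionN: client-N` lines (the proposal only says "something like"), and the helper names are made up for illustration. Setting `trusted.*` xattrs on Linux requires root (CAP_SYS_ADMIN).

```python
import os

# Virtual xattr names as *proposed* in this thread (assumption: any
# released GlusterFS may use different names).
STATUS_XATTR = "trusted.afr.split-brain-status"
CHOICE_XATTR = "trusted.afr.split-brain-choice"
FINALIZE_XATTR = "trusted.afr.split-brain-heal-finalize"


def parse_children(status_text):
    """Return AFR child names from lines shaped like 'Option0: client-0'.

    The exact status format is an assumption based on the proposal.
    """
    children = []
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().startswith("Option"):
            children.append(value.strip())
    return children


def split_brain_status(path):
    # Step A: read the virtual status xattr through the FUSE mount.
    # The value may be NUL-terminated, so strip any trailing NULs.
    return os.getxattr(path, STATUS_XATTR).decode().rstrip("\x00")


def choose_read_child(path, child):
    # Step B: ask AFR to serve this file from one replica only.
    os.setxattr(path, CHOICE_XATTR, child.encode())


def finalize_heal(path, child):
    # Step C: declare `child` the heal source.
    os.setxattr(path, FINALIZE_XATTR, child.encode())


def inspect_each_child(path):
    """Repeat step B for every listed child so the copies can be compared."""
    status = split_brain_status(path)
    print(status)
    for child in parse_children(status):
        choose_read_child(path, child)
        st = os.stat(path)  # metadata as seen through this child
        # Caveat raised in this thread: kernel caching might still return
        # data cached from an earlier read of the other child.
        print(f"{child}: size={st.st_size} mtime={st.st_mtime}")
```

Hypothetical usage on a client mount: call `inspect_each_child("/mnt/gluster/hair-pulling.txt")`, compare the output, then call `finalize_heal` with the same path and the chosen child, e.g. `"client-0"`.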
Joe Julian
2015-Jan-30 15:42 UTC
[Gluster-users] ... i was able to produce a split brain...
Looks good!

On January 30, 2015 4:58:35 AM PST, Jeff Darcy <jdarcy at redhat.com> wrote:
>> [...]
>
> +1
>
> That looks quite nice, and AFAICT shouldn't be prohibitively hard to
> implement.
Pranith Kumar Karampuri
2015-Jan-31 05:47 UTC
[Gluster-users] ... i was able to produce a split brain...
On 01/30/2015 06:28 PM, Jeff Darcy wrote:
>> [...]
> +1
>
> That looks quite nice, and AFAICT shouldn't be prohibitively hard to
> implement.

The only problem I see is kernel read caching, which may get in the way. If it does, we may have to invoke fuse_invalidate.
We will find out once we implement this.

Pranith
Ted Miller
2015-Feb-03 17:12 UTC
[Gluster-users] ... i was able to produce a split brain...
On 1/31/2015 12:47 AM, Pranith Kumar Karampuri wrote:
> On 01/30/2015 06:28 PM, Jeff Darcy wrote:
>>> [...]
>>> This child will be chosen as source and split-brain resolution will be done.

May I suggest another possible way to get around the difficulty in determining which of the files is the one to keep?
What if each copy of the file were renamed by appending the name of the brick host it lives on? For example, in a replica 2 system:

  brick-1: data1   host-1: host1
  brick-2: data1   host-2: host2
  file name: hair-pulling.txt

After running a script/command to resolve the split-brain, the file system would have two files:

  hair-pulling.txt__host-1_data1
  hair-pulling.txt__host-2_data1

The user would then delete the unwanted file and rename the wanted file back to hair-pulling.txt.

The only problem would come with a very large file and a large number of replicas (like the replica 5 system I am working with): you might run out of space for all the copies. Otherwise, this seems (to me) to be a user-friendly way to do this. If the user has doubts (and disk space), they can keep the rejected file around for a while, "just in case" it has something useful in it that is missing from the "accepted" file.

****************************************************************

That brought another thought to mind (I have not had reason to try it): how does Gluster cope if you go behind its back and rename a "rejected" file? For instance, in my example above, what if I go directly to the brick and rename the host-2 copy of the file to hair-pulling.txt-dud? The ideal outcome would seem to be that when the user runs a heal, Gluster treats the copy as a new file, sees no replica of hair-pulling.txt on host-2, and creates one there. Since hair-pulling.txt-dud is also a new file, a replica of it would be created on host-1. The user could then access both files, verify which is correct, and delete hair-pulling.txt-dud.

*****************************************************************

A not-officially-sanctioned way that I dealt with a split-brain a few versions back:

1. Decided I wanted to keep the file on host-2.
2. Logged onto host-2.
3. cp /brick/data1/hair-pulling.txt /gluster/data1/hair-pulling.txt-dud
4. rm /brick/data1/hair-pulling.txt
5. Followed some of Joe Julian's blog instructions to delete the "invisible fork" of the file.
6. gluster volume heal data1 all

I believe this worked for me at the time. I have not had to do it on a recent Gluster version.

Ted Miller
Elkhart, IN
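Ted's rename-every-copy idea can be illustrated with two small helpers: one builds a per-brick name using the `<file>__<host>_<brick>` pattern from his example, and one performs the user's cleanup step (rename the chosen copy back, and delete the others unless asked to keep them). This is a hypothetical sketch of the proposal, not an existing Gluster feature; the function names and the `keep_rejected` flag are made up, and it simply operates on an ordinary directory such as the volume mount.

```python
import os


def split_copy_name(filename, host, brick):
    """Per-brick copy name, e.g. 'hair-pulling.txt__host-1_data1'."""
    return f"{filename}__{host}_{brick}"


def resolve_renamed_copies(directory, filename, keep_host, keep_brick,
                           keep_rejected=False):
    """Rename the chosen copy back to `filename`.

    Other copies are deleted unless `keep_rejected` is True ("just in case"
    they hold something the accepted copy lacks).
    """
    keep = split_copy_name(filename, keep_host, keep_brick)
    keep_path = os.path.join(directory, keep)
    # Check the chosen copy exists before deleting anything else.
    if not os.path.exists(keep_path):
        raise FileNotFoundError(keep_path)
    if not keep_rejected:
        prefix = filename + "__"
        for entry in os.listdir(directory):
            if entry.startswith(prefix) and entry != keep:
                os.remove(os.path.join(directory, entry))
    os.rename(keep_path, os.path.join(directory, filename))
```

With `keep_rejected=True`, the rejected copies stay in place under their suffixed names so they can be reviewed later and removed by hand.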
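For step 5 of the manual procedure, the "invisible fork" is, assuming the usual brick layout, the hard link a brick keeps under its `.glusterfs` directory. It is named after the file's GFID (stored in the `trusted.gfid` xattr of the brick copy) and lives at `.glusterfs/<first 2 hex chars>/<next 2 hex chars>/<gfid>`. The minimal sketch below only locates that link so it can be checked before anything is removed. It deletes nothing, must run as root on the brick host against the brick path (not the client mount), and the function names are hypothetical.

```python
import os
import uuid


def gfid_link_path(brick_root, gfid_bytes):
    """Path of the .glusterfs hard link for a 16-byte GFID value.

    Assumes the layout <brick>/.glusterfs/<hex[0:2]>/<hex[2:4]>/<uuid>.
    """
    gfid = str(uuid.UUID(bytes=gfid_bytes))
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def find_invisible_fork(brick_root, brick_file):
    """Locate (but do not delete) the GFID hard link for a brick file.

    Returns the link path and whether it is the same inode as brick_file.
    Reading trusted.* xattrs requires root.
    """
    gfid_bytes = os.getxattr(brick_file, "trusted.gfid")
    link = gfid_link_path(brick_root, gfid_bytes)
    same_inode = os.path.exists(link) and os.path.samefile(link, brick_file)
    return link, same_inode
```

Because it reads the xattr from the brick copy, `find_invisible_fork` would have to be called before step 4 removes that copy, e.g. `find_invisible_fork("/brick/data1", "/brick/data1/hair-pulling.txt")`.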