Daniel Berteaud
2017-Nov-16  07:24 UTC
[Gluster-users] Help with reconnecting a faulty brick
On 15/11/2017 at 09:45, Ravishankar N wrote:
> If it is only the brick that is faulty on the bad node, but everything
> else is fine, like glusterd running, the node being a part of the
> trusted storage pool etc, you could just kill the brick first and do
> step-13 in "10.6.2. Replacing a Host Machine with the Same Hostname",
> (the mkdir of non-existent dir, followed by setfattr of non-existent
> key) of
> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/pdf/Administration_Guide/Red_Hat_Storage-3.1-Administration_Guide-en-US.pdf,
> then restart the brick by restarting glusterd on that node. Read 10.5
> and 10.6 sections in the doc to get a better understanding of
> replacing bricks.

Thanks, I'll try that.
Is there any way, in this situation, to check which file will be healed from which brick before reconnecting? Using some getfattr tricks?

Regards, Daniel

--
Daniel Berteaud
FIREWALL-SERVICES SAS.
Société de Services en Logiciels Libres
Tel : 05 56 64 15 32
Matrix: @dani:fws.fr
www.firewall-services.com
On 11/16/2017 12:54 PM, Daniel Berteaud wrote:
> On 15/11/2017 at 09:45, Ravishankar N wrote:
>> If it is only the brick that is faulty on the bad node, but
>> everything else is fine, like glusterd running, the node being a part
>> of the trusted storage pool etc, you could just kill the brick first
>> and do step-13 in "10.6.2. Replacing a Host Machine with the Same
>> Hostname", (the mkdir of non-existent dir, followed by setfattr of
>> non-existent key) of
>> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/pdf/Administration_Guide/Red_Hat_Storage-3.1-Administration_Guide-en-US.pdf,
>> then restart the brick by restarting glusterd on that node. Read 10.5
>> and 10.6 sections in the doc to get a better understanding of
>> replacing bricks.
>
> Thanks, I'll try that.
> Any way in this situation to check which file will be healed from
> which brick before reconnecting ? Using some getfattr tricks ?

Yes, there are afr xattrs that determine the heal direction for each file. The good copy will have non-zero trusted.afr* xattrs that blame the bad one and heal will happen from good to bad. If both bricks have attrs blaming the other, then the file is in split-brain.

-Ravi
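As a rough illustration of the rule Ravi describes, the Python sketch below compares what each brick of a two-way replica says about the other. It rests on two assumptions that are not stated in this thread: that a trusted.afr value holds three big-endian 32-bit pending counters (data, metadata, entry), and that `<volume>-client-N` refers to the N-th brick of the replica pair. The function names are made up for this example and are not part of Gluster.

```python
import struct

ZERO = "0x" + "00" * 12  # an all-zero trusted.afr value (nothing pending)


def decode_afr(hex_value):
    """Split a trusted.afr hex string (as printed by getfattr -e hex) into
    (data, metadata, entry) pending counters.
    Assumption: 12 bytes, three big-endian unsigned 32-bit integers."""
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected a 12-byte trusted.afr value")
    return struct.unpack(">III", raw)


def blames(xattrs, volume, index):
    """True if this brick's xattrs hold a non-zero counter against brick `index`."""
    key = "trusted.afr.%s-client-%d" % (volume, index)
    return any(decode_afr(xattrs.get(key, ZERO)))


def heal_direction(volume, xattrs0, xattrs1):
    """xattrs0 and xattrs1 are the xattr dicts read on brick 0 and brick 1.
    Only the cross-blame is looked at; a brick's entry about itself is ignored."""
    zero_blames_one = blames(xattrs0, volume, 1)
    one_blames_zero = blames(xattrs1, volume, 0)
    if zero_blames_one and one_blames_zero:
        return "split-brain"
    if zero_blames_one:
        return "heal 0 -> 1"
    if one_blames_zero:
        return "heal 1 -> 0"
    return "nothing pending"
```

Feeding it the dictionaries built from `getfattr -d -m . -e hex` output on each brick gives a per-file answer; it is only a reading aid and does not replace `gluster volume heal <volname> info`.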
Daniel Berteaud
2017-Nov-17  10:11 UTC
[Gluster-users] Help with reconnecting a faulty brick
On Thursday, November 16, 2017 13:07 CET, Ravishankar N <ravishankar at redhat.com> wrote:
> On 11/16/2017 12:54 PM, Daniel Berteaud wrote:
> > Any way in this situation to check which file will be healed from
> > which brick before reconnecting ? Using some getfattr tricks ?
> Yes, there are afr xattrs that determine the heal direction for each
> file. The good copy will have non-zero trusted.afr* xattrs that blame
> the bad one and heal will happen from good to bad. If both bricks have
> attrs blaming the other, then the file is in split-brain.

Thanks. So, say I have a file with this on the correct node:

# file: mnt/bricks/vmstore/prod/bilbao_sys.qcow2
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.vmstore-client-0=0x00050f7e0000000200000000
trusted.afr.vmstore-client-1=0x000000000000000100000000
trusted.gfid=0xe86c24e5fc6b4fc6bf2b896f3cc8537d

And this on the bad one:

# file: mnt/bricks/vmstore/prod/bilbao_sys.qcow2
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.vmstore-client-0=0x000000000000000000000000
trusted.afr.vmstore-client-1=0x000000000000000000000000
trusted.gfid=0xe86c24e5fc6b4fc6bf2b896f3cc8537d

Then I can be sure Gluster will heal from the correct one to the bad one. And if both have non-null afr attributes, I can manually (using setfattr) set the afr attributes on the bad brick to a null value before reconnecting it, and it will heal from the correct one.

And how are files handled that were deleted, renamed or created on the correct node while the bad one was offline? For example, I have /mnt/bricks/vmstore/prod/contis_sys.qcow2 on both bricks. But, on the correct one, the file was deleted and recreated while the bad one was offline, so the two copies no longer have the same gfid. How does Gluster handle this?

Sorry for all those questions, I'm just a bit nervous :-)

--
Daniel Berteaud
FIREWALL-SERVICES SAS.
Société de Services en Logiciels Libres
Tel : 05 56 64 15 32
Visio: https://vroom.fws.fr/dani
Web : http://www.firewall-services.com
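To make the hex values in the dump above easier to read, here is a small Python sketch that pulls the trusted.afr lines out of getfattr text output and prints their counters. It assumes, and this is not confirmed anywhere in the thread, that each value is three big-endian 32-bit pending counters in the order data, metadata, entry. The helper name `parse_getfattr_dump` is invented for this example.

```python
def parse_getfattr_dump(text):
    """Return {xattr_name: (data, metadata, entry)} for the trusted.afr.* lines
    of a `getfattr -d -m . -e hex` text dump.
    Assumption: each value is 12 bytes, three big-endian 32-bit counters."""
    pending = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("trusted.afr.") or "=0x" not in line:
            continue  # skip the "# file:" header and unrelated xattrs
        name, _, value = line.partition("=")
        raw = bytes.fromhex(value[2:])
        if len(raw) != 12:
            continue  # not the layout this sketch assumes
        pending[name] = tuple(
            int.from_bytes(raw[i:i + 4], "big") for i in (0, 4, 8)
        )
    return pending


# The dump from the "correct" node, copied from the message above.
correct_node = """\
# file: mnt/bricks/vmstore/prod/bilbao_sys.qcow2
trusted.afr.vmstore-client-0=0x00050f7e0000000200000000
trusted.afr.vmstore-client-1=0x000000000000000100000000
trusted.gfid=0xe86c24e5fc6b4fc6bf2b896f3cc8537d
"""

for name, counters in parse_getfattr_dump(correct_node).items():
    print(name, counters)
```

Under that assumed layout, the client-0 value on the correct node decodes to 331646 pending data operations and 2 pending metadata operations, while every counter on the bad node is zero.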