Daniel Berteaud
2017-Nov-16 07:24 UTC
[Gluster-users] Help with reconnecting a faulty brick
On 15/11/2017 at 09:45, Ravishankar N wrote:
> If it is only the brick that is faulty on the bad node, but everything
> else is fine, like glusterd running, the node being a part of the
> trusted storage pool etc., you could just kill the brick first and do
> step 13 in "10.6.2. Replacing a Host Machine with the Same Hostname"
> (the mkdir of a non-existent dir, followed by setfattr of a
> non-existent key) of
> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/pdf/Administration_Guide/Red_Hat_Storage-3.1-Administration_Guide-en-US.pdf,
> then restart the brick by restarting glusterd on that node. Read
> sections 10.5 and 10.6 in the doc to get a better understanding of
> replacing bricks.

Thanks, I'll try that.

Any way in this situation to check which file will be healed from which
brick before reconnecting? Using some getfattr tricks?

Regards, Daniel

--
Daniel Berteaud
FIREWALL-SERVICES SAS.
Société de Services en Logiciels Libres
Tel : 05 56 64 15 32
Matrix: @dani:fws.fr
www.firewall-services.com
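
For context, the step-13 reset referenced above boils down to touching a
non-existent directory and a non-existent xattr on the volume's FUSE mount on
the good node, then restarting glusterd on the bad node. A minimal sketch,
assuming the volume is called vmstore and is mounted at /mnt/vmstore (both are
assumptions, not details from this thread):

# On the good node, mount the volume if it isn't already mounted:
mount -t glusterfs localhost:/vmstore /mnt/vmstore

# Create and remove a directory and an xattr that do not exist, through the mount:
mkdir /mnt/vmstore/nonexistent-dir
rmdir /mnt/vmstore/nonexistent-dir
setfattr -n trusted.non-existent-key -v abc /mnt/vmstore
setfattr -x trusted.non-existent-key /mnt/vmstore

# Bring the killed brick back by restarting glusterd on the bad node:
systemctl restart glusterd

The idea is that the mkdir/rmdir pair and the temporary xattr dirty the root
directory's changelog on the good brick, so the subsequent heal treats the
reconnected brick as the sink.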
Ravishankar N
2017-Nov-16 12:07 UTC
[Gluster-users] Help with reconnecting a faulty brick
On 11/16/2017 12:54 PM, Daniel Berteaud wrote:
> On 15/11/2017 at 09:45, Ravishankar N wrote:
>> If it is only the brick that is faulty on the bad node, but
>> everything else is fine, like glusterd running, the node being a part
>> of the trusted storage pool etc., you could just kill the brick first
>> and do step 13 in "10.6.2. Replacing a Host Machine with the Same
>> Hostname" (the mkdir of a non-existent dir, followed by setfattr of a
>> non-existent key) of
>> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/pdf/Administration_Guide/Red_Hat_Storage-3.1-Administration_Guide-en-US.pdf,
>> then restart the brick by restarting glusterd on that node. Read
>> sections 10.5 and 10.6 in the doc to get a better understanding of
>> replacing bricks.
>
> Thanks, I'll try that.
> Any way in this situation to check which file will be healed from which
> brick before reconnecting? Using some getfattr tricks?

Yes, there are afr xattrs that determine the heal direction for each file.
The good copy will have non-zero trusted.afr* xattrs that blame the bad one,
and heal will happen from good to bad. If both bricks have xattrs blaming
the other, then the file is in split-brain.

-Ravi
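
A minimal sketch of that pre-reconnect inspection, assuming the volume is
named vmstore and using a brick path of the same form as the ones discussed
later in the thread:

# Dump the afr xattrs of a suspect file directly from the brick on each node
# (not through the FUSE mount), so the two copies can be compared:
getfattr -d -m . -e hex /mnt/bricks/vmstore/prod/bilbao_sys.qcow2

# The self-heal daemon's view of files pending heal can also be listed:
gluster volume heal vmstore info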
Daniel Berteaud
2017-Nov-17 10:11 UTC
[Gluster-users] Help with reconnecting a faulty brick
On Thursday, November 16, 2017 at 13:07 CET, Ravishankar N <ravishankar at redhat.com> wrote:
> On 11/16/2017 12:54 PM, Daniel Berteaud wrote:
>> Any way in this situation to check which file will be healed from which
>> brick before reconnecting? Using some getfattr tricks?
> Yes, there are afr xattrs that determine the heal direction for each file.
> The good copy will have non-zero trusted.afr* xattrs that blame the bad one,
> and heal will happen from good to bad. If both bricks have xattrs blaming
> the other, then the file is in split-brain.

Thanks. So, say I have a file with this on the correct node:

# file: mnt/bricks/vmstore/prod/bilbao_sys.qcow2
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.vmstore-client-0=0x00050f7e0000000200000000
trusted.afr.vmstore-client-1=0x000000000000000100000000
trusted.gfid=0xe86c24e5fc6b4fc6bf2b896f3cc8537d

And this on the bad one:

# file: mnt/bricks/vmstore/prod/bilbao_sys.qcow2
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.vmstore-client-0=0x000000000000000000000000
trusted.afr.vmstore-client-1=0x000000000000000000000000
trusted.gfid=0xe86c24e5fc6b4fc6bf2b896f3cc8537d

In that case, I can be sure Gluster will heal from the correct one to the bad
one. And in case both have a non-null afr xattr, I can manually (using
setfattr) set the afr attribute to a null value before reconnecting the
faulty brick, and it'll heal from the correct one.

And for files which have been deleted/renamed/created on the correct node
while the bad one was offline, how are those handled? For example, I have
/mnt/bricks/vmstore/prod/contis_sys.qcow2 on both bricks, but on the correct
one the file was deleted and recreated while the bad one was offline, so they
don't have the same gfid now. How does Gluster handle this?

Sorry for all those questions, I'm just a bit nervous :-)

--
Daniel Berteaud
FIREWALL-SERVICES SAS.
Société de Services en Logiciels Libres
Tel : 05 56 64 15 32
Visio: https://vroom.fws.fr/dani
Web : http://www.firewall-services.com
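
For reference when reading the dumps above: each trusted.afr.<vol>-client-N
value packs three 32-bit big-endian counters (data, metadata and entry pending
operations), and the manual reset discussed here is usually a plain setfattr
run against the brick copy. A hedged sketch, reusing file names from this
thread purely as examples:

# Layout of the afr changelog xattr (three 32-bit counters):
#   0x 00050f7e 00000002 00000000
#       data    metadata  entry    -> data and metadata operations pending
#   0x 00000000 00000000 00000000  -> nothing pending

# Zeroing the changelog on one brick's copy before reconnecting (run against
# the brick path, not the mounted volume):
setfattr -n trusted.afr.vmstore-client-0 -v 0x000000000000000000000000 \
    /mnt/bricks/vmstore/prod/bilbao_sys.qcow2

# A gfid mismatch between copies can be spotted by comparing this xattr on both bricks:
getfattr -n trusted.gfid -e hex /mnt/bricks/vmstore/prod/contis_sys.qcow2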