thr3ads.net - Gluster users - [Gluster-users] How different self healing scenarios works ? [Dec 2016]

Cedric Lemarchand

2016-Dec-16 11:50 UTC

[Gluster-users] How different self healing scenarios works ?

Hello,

I am testing different use cases because I am not sure I fully understand how
Gluster (3.9 here) self-healing works. The context is a dispersed 4+2 volume
"vol1" on 6 nodes gl[1..6], one brick per node.
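
For reference, the fragment sizes implied by a 4+2 layout can be worked out
roughly as below. This is only a sketch: it assumes each brick stores
file size / number of data bricks and ignores xattrs and padding, and the
variable names are mine, not Gluster's.

```shell
#!/bin/sh
# Rough sizing for a dispersed 4+2 volume.
# Assumption: each brick holds file_size / data_bricks of every file.
data=4          # data fragments
redundancy=2    # redundancy fragments
file_mb=5000    # a 5 GB test file, expressed in MB

bricks=$((data + redundancy))
fragment_mb=$((file_mb / data))       # what one brick (e.g. gl6) stores
raw_mb=$((fragment_mb * bricks))      # total space used across all bricks

echo "bricks=$bricks fragment_mb=$fragment_mb raw_mb=$raw_mb"
```

So under these assumptions the portion of F on gl6 is about a quarter of the
file, and the volume should survive the loss of any 2 of the 6 bricks.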

1) While a client is reading a 5 GB file F on vol1, the file on gl6 (actually a
1/4 portion) is emptied with echo > F. At this point I can see the network flow
reverse, from gl6 => client to client => gl6, and the portion of file F starts
healing. I assume that the file is recovered from the client to gl6.

Q1.1: I would have thought that gl6 detects the file corruption and recreates
the portion from the existing portions on gl[1..5], but the heal is driven by
the client that is reading the file. That makes sense in that the file is
already being read from the other nodes, but in terms of performance, couldn't
the client become a bottleneck if the files were multiple TB?
Q1.2: Why and under which conditions does this happen? Does the same happen
when the file is being written?

2) vol1 is mounted by the client but no files are accessed, then all portions
of files (20 files of 5 GB each) on gl6 are removed with rm *. If "gluster v
heal vol1 full" is issued on gl6 or the glusterfs-server process is restarted,
all files reappear instantly, which is nice! The file system on the nodes is
XFS.

Q2.1: I assume that the file metadata is recovered from the other nodes and the
data is re-linked from blocks still existing on the file system? Is this
specific to XFS, or will it be the same with other file systems like ext or
ZFS? (Future plans are to use ZFS as the underlying file system.)

3) vol1 is mounted by the client but no files are accessed, and the files (the
same 20 files of 5 GB) are deliberately corrupted with echo 0 >> F. Then
"gluster v heal vol1 full" is issued on gl6, and the files are recovered one by
one, from gl3 in this case.

Q3.1: Why didn't Gluster try to recover multiple files from different nodes at
the same time?
Q3.2: I have already seen Gluster recovering files from multiple nodes at the
same time during heavy workloads. Under which circumstances does this happen?
Q3.3: After how long would Gluster have detected the corruption and started the
self-healing process on its own?

So far I haven't found any explanation on the web of how the self-healing
process works and what the different use cases / scenarios are. Any pointers?

Thanks

Cedric Lemarchand

2016-Dec-16 13:00 UTC

[Gluster-users] How different self healing scenarios works ?

OK, I found some documentation, I should have searched better:
https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Developer-guide/afr-self-heal-daemon/

Replying to some of my own questions:

Q2.1: files are hard linked in .gluster/index, so the recovery is done by the
index heal, which just re-creates the hard links.
Q3.1: either only one self-heal process is allowed at a time, or this is
because disperse.shd-max-threads is set to 1 by default.
Q3.2: disperse.shd-max-threads is set to 1 by default.
Q3.3: the index heal is scheduled every 600 seconds (cluster.heal-timeout). It
seems the full heal has to be triggered manually, so in case of file
corruption, without
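
The two options named in Q3.1 to Q3.3 can be inspected and changed from any
node with the gluster CLI. A small sketch follows; the volume name vol1 comes
from my setup, the value 4 is an arbitrary example, and the run wrapper with
DRY_RUN is just my own helper so that the script only prints the commands
unless told otherwise.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 on a node that has the gluster CLI to really run them.
VOL=${VOL:-vol1}
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: echo the command, execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Current values of the two options mentioned above.
run gluster volume get "$VOL" disperse.shd-max-threads
run gluster volume get "$VOL" cluster.heal-timeout

# Example only: allow more parallel heals per self-heal daemon.
run gluster volume set "$VOL" disperse.shd-max-threads 4

# List the entries that are still pending heal.
run gluster volume heal "$VOL" info
```

I have not measured what a higher disperse.shd-max-threads does to the load on
the bricks, so treat the value as something to test rather than a
recommendation.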

Questions Q1.1 and Q1.2 remain open for the moment.
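
To illustrate the hard link idea behind my answer to Q2.1, here is a small
sketch using plain files in a temporary directory, not a real brick. The
directory and link names (.glusterfs-demo, gfid-of-F) are made up for the
example. As far as I understand, it relies only on ordinary hard links, so it
should not depend on XFS.

```shell
#!/bin/sh
set -eu
# Hypothetical stand-in for a brick directory; not a real Gluster brick.
brick=$(mktemp -d)
mkdir -p "$brick/.glusterfs-demo"

echo "fragment data" > "$brick/F"
# Second name for the same inode, standing in for the link Gluster keeps.
ln "$brick/F" "$brick/.glusterfs-demo/gfid-of-F"

# Same effect as the "rm *" on gl6: only the visible name goes away.
rm "$brick/F"

# The data blocks are still reachable through the remaining hard link.
cat "$brick/.glusterfs-demo/gfid-of-F"

# Restoring the name is then just a new hard link, with no data copy.
ln "$brick/.glusterfs-demo/gfid-of-F" "$brick/F"
restored=$(cat "$brick/F")
echo "restored: $restored"

rm -rf "$brick"
```

That would explain why the 20 files came back instantly in scenario 2, while
in scenario 3 the data itself had changed and had to be rebuilt.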

Thanks
