A Ghoshal
2015-Feb-02 18:30 UTC
[Gluster-users] A few queries on self-healing and AFR (glusterfs 3.4.2)
Hello,

I have a replica-2 volume in which I store a large number of files that are updated frequently (critical log files, etc.). My files are generally stable, but one thing that does worry me from time to time is that files show up on one of the bricks in the output of gluster volume heal <volname> info. These entries disappear on their own after a while (I am guessing when cluster.heal-timeout expires and another heal by the self-heal daemon is triggered). For certain files, this could be a bit of a bother in terms of fault tolerance.

I was wondering if there is a way I could force AFR to return write-completion to the application only _after_ the data is written to both replicas successfully (atomic writes, of a kind), even if it were at the cost of performance. That way I could ensure that my bricks are always in sync.

The other thing I could possibly do is reduce my cluster.heal-timeout (it is 600 currently). Is it a bad idea to set it to something as small as, say, 60 seconds for volumes where redundancy is a prime concern?

One question, though: is heal through the self-heal daemon accomplished using a separate thread for each replicated volume, or a single thread for all volumes? The reason I ask is that I have a large number of replicated file-systems on each server (17, to be precise), but I do have a reasonably powerful multicore processor array and large RAM, and top indicates that the load on system resources is quite moderate.

Thanks,
Anirban
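P.S. For reference, the commands in question (replicated_vol stands in for the actual volume name):

    # list the entries the self-heal daemon currently has pending
    gluster volume heal replicated_vol info

    # lower the interval between self-heal crawls to 60 seconds
    gluster volume set replicated_vol cluster.heal-timeout 60

    # verify the change under "Options Reconfigured"
    gluster volume info replicated_vol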
A Ghoshal
2015-Feb-02 22:41 UTC
[Gluster-users] A few queries on self-healing and AFR (glusterfs 3.4.2)
It seems I have found out what goes wrong here, and this was useful learning for me: on one of the replica servers, the client mount did not have an open port to communicate with the remote glusterfsd (brick) process. To illustrate:

root at serv1:/root> ps -ef | grep replicated_vol
root 30627     1  0 Jan29 ?     00:17:30 /usr/sbin/glusterfs --volfile-id=replicated_vol --volfile-server=serv1 /mnt/replicated_vol
root 31132 18322  0 23:04 pts/1 00:00:00 grep replicated_vol
root 31280     1  0 06:32 ?     00:09:10 /usr/sbin/glusterfsd -s serv1 --volfile-id replicated_vol.serv1.mnt-bricks-replicated_vol-brick -p /var/lib/glusterd/vols/replicated_vol/run/serv1-mnt-bricks-replicated_vol-brick.pid -S /var/run/4d70e99b47c1f95cc2eab1715d3a9b67.socket --brick-name /mnt/bricks/replicated_vol/brick -l /var/log/glusterfs/bricks/mnt-bricks-replicated_vol-bricks.log --xlator-option *-posix.glusterd-uuid=c7930be6-969f-4f62-b119-c5bbe4df22a3 --brick-port 49172 --xlator-option replicated_vol.listen-port=49172

root at serv1:/root> netstat -p | grep 30627
tcp 0 0 serv1:715 serv1:24007 ESTABLISHED 30627/glusterfs   <= client <-> local glusterd
tcp 0 0 serv1:863 serv1:49172 ESTABLISHED 30627/glusterfs   <= client <-> local brick
root at serv1:/root>

However, the client on the other server did have a port open to the remote brick, and so whatever one wrote on that server synced over immediately:

root at serv0:/root> ps -ef | grep replicated_vol
root 12761  7556  0 23:05 pts/1 00:00:00 grep replicated_vol
root 15067     1  0 06:32 ?     00:04:50 /usr/sbin/glusterfsd -s serv0 --volfile-id replicated_vol.serv0.mnt-bricks-replicated_vol-brick -p /var/lib/glusterd/vols/replicated_vol/run/serv0-mnt-bricks-replicated_vol-brick.pid -S /var/run/f642d7dbff0ab7a475a23236f6f50b33.socket --brick-name /mnt/bricks/replicated_vol/brick -l /var/log/glusterfs/bricks/mnt-bricks-replicated_vol-bricks.log --xlator-option *-posix.glusterd-uuid=13df1bd2-6dc8-49fa-ade0-5cd95f6b1f19 --brick-port 49209 --xlator-option replicated_vol.listen-port=49209
root 30587     1  0 Jan30 ?     00:12:17 /usr/sbin/glusterfs --volfile-id=replicated_vol --volfile-server=serv0 /mnt/replicated_vol

root at serv0:/root> netstat -p | grep 30587
tcp 0 0 serv0:859 serv1:49172 ESTABLISHED 30587/glusterfs   <= client <-> remote brick
tcp 0 0 serv0:746 serv0:24007 ESTABLISHED 30587/glusterfs   <= client <-> local glusterd
tcp 0 0 serv0:857 serv0:49209 ESTABLISHED 30587/glusterfs   <= client <-> local brick
root at serv0:/root>

So the client on serv1 has no open TCP connection to its mate brick, which is why it cannot write to the mate brick directly and has to rely on the self-heal daemon to do the job. Of course, I now need to debug why that connection fails, but at least AFR is in the clear.

Thanks, everyone.
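P.S. For anyone chasing a similar problem, here is a minimal sketch of the connectivity check, using the PID and ports from the transcripts above (substitute your own):

    # A healthy replica-2 client should hold three ESTABLISHED TCP
    # connections: one to glusterd (port 24007) and one to each
    # brick's listen-port.
    CLIENT_PID=30627
    netstat -tnp 2>/dev/null | grep "${CLIENT_PID}/glusterfs"

    # A missing brick connection usually leaves disconnect messages in
    # the client log (named after the mount point):
    grep -i "disconnect" /var/log/glusterfs/mnt-replicated_vol.log | tail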
Krutika Dhananjay
2015-Feb-05 12:14 UTC
[Gluster-users] A few queries on self-healing and AFR (glusterfs 3.4.2)
----- Original Message -----
> From: "A Ghoshal" <a.ghoshal at tcs.com>
> To: gluster-users at gluster.org
> Sent: Tuesday, February 3, 2015 12:00:15 AM
> Subject: [Gluster-users] A few queries on self-healing and AFR (glusterfs 3.4.2)
>
> Hello,
>
> I have a replica-2 volume in which I store a large number of files that are
> updated frequently (critical log files, etc.). My files are generally stable,
> but one thing that does worry me from time to time is that files show up on
> one of the bricks in the output of gluster volume heal <volname> info. These
> entries disappear on their own after a while (I am guessing when
> cluster.heal-timeout expires and another heal by the self-heal daemon is
> triggered). For certain files, this could be a bit of a bother in terms of
> fault tolerance.

In 3.4.x, even files that are currently undergoing modification are listed in the heal-info output. That could be why the file(s) disappear from the output after a while, in which case reducing cluster.heal-timeout would not solve the problem. Since 3.5.1, heal-info reports _only_ those files which are truly undergoing heal.

> I was wondering if there is a way I could force AFR to return
> write-completion to the application only _after_ the data is written to both
> replicas successfully (atomic writes, of a kind), even if it were at the cost
> of performance. That way I could ensure that my bricks are always in sync.

AFR has always returned write-completion status to the application only _after_ the data is written to all replicas. The appearance of files under modification in the heal-info output might have led you to think the changes had not (yet) been synced to the other replica(s).

> The other thing I could possibly do is reduce my cluster.heal-timeout (it is
> 600 currently). Is it a bad idea to set it to something as small as, say, 60
> seconds for volumes where redundancy is a prime concern?
>
> One question, though: is heal through the self-heal daemon accomplished using
> a separate thread for each replicated volume, or a single thread for all
> volumes? The reason I ask is that I have a large number of replicated
> file-systems on each server (17, to be precise), but I do have a reasonably
> powerful multicore processor array and large RAM, and top indicates that the
> load on system resources is quite moderate.

There is an infra piece in gluster called syncop, using which multiple heal jobs are handled by a handful of threads. It can scale up to a maximum of 16 threads, depending on the load. It is safe to assume that there will be one healer thread per replica set, but if the load is not too high, just one thread may do all the healing.

-Krutika
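P.S. A rough way to check whether a file listed by heal-info genuinely needs healing is to inspect its AFR changelog xattrs on the bricks themselves (brick path borrowed from the transcripts earlier in this thread):

    # Run on each server, against the brick backend, not the mount.
    getfattr -d -m . -e hex /mnt/bricks/replicated_vol/brick/path/to/file

    # The trusted.afr.replicated_vol-client-* values hold pending
    # operation counters; all zeroes on both bricks means the replicas
    # agree, while values that stay non-zero indicate a pending heal.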