On 02/25/2016 08:20 PM, Ravishankar N wrote:
> On 02/25/2016 11:36 PM, Kyle Maas wrote:
>> How can I tell what AFR version a cluster is using for self-heal?
> If all your servers and clients are 3.7.8, then they are by default
> running afr-v2. Afr-v2 was a rewrite of afr that went in for 3.6,
> so any gluster package from then on has this code; you don't need to
> explicitly enable anything.

That was what I thought until I ran across this IRC log where JoeJulian
asked if it was explicitly enabled:

https://irclog.perlgeek.de/gluster/2015-10-29

>> The reason I ask is that I have a two-node replicated 3.7.8 cluster (no
>> arbiters) which has locking behavior during self-heal which looks very
>> similar to that of AFRv1 (only heals one file at a time per self-heal
>> daemon, appears to lock the full inode while it's healing it instead of
>> just ranges, etc.),
> Both v1 and v2 use range locks while healing a given file, so clients
> shouldn't block when heals happen. What is the problem you're facing?
> Are your clients also at 3.7.8?

Primary symptoms are:

1. While a self-heal is running, only one file at a time is healed per
brick. As I understand it, AFRv2 and up should allow for multiple files
to be healed concurrently, or at least multiple ranges within a file,
particularly with io-thread-count set to >1. During a self-heal,
neither I/O nor network is saturated, which leads me to believe that I'm
looking at a single synchronous self-healing process.

3. More troubling is that during a self-heal, clients cannot so much as
list the files on the volume until the self-heal is done. No errors.
No timeouts. They just freeze. As soon as the self-heal is complete,
they unfreeze and list the contents.

4. Any file access during a self-heal also freezes, just like a
directory listing, until the self-heal is done.
This wreaks havoc on users who have files open when one of the bricks is
rebooted and has to be healed, since, with as much data as is stored on
this cluster, a self-heal can take almost 24 hours.

I experience the same problems when I run without any clients other than
the bricks themselves mounting the volume, so yes, it happens with the
clients on 3.7.8 as well.

Warm Regards,
Kyle Maas
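[For anyone else trying to confirm what their cluster is running and
whether heals are in progress, a quick sanity check might look like the
following; the volume name `gv0` is a placeholder, and `gluster volume
get` requires 3.7 or later:]

```shell
# Confirm the installed GlusterFS version on every server and client;
# any 3.6+ package carries the afr-v2 code unconditionally.
gluster --version | head -1

# List files still pending heal on each brick of the volume.
gluster volume heal gv0 info

# Inspect the effective self-heal related options on the volume.
gluster volume get gv0 cluster.self-heal-daemon
gluster volume get gv0 cluster.data-self-heal
```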
On 02/26/2016 10:02 AM, Kyle Maas wrote:
> On 02/25/2016 08:20 PM, Ravishankar N wrote:
>> On 02/25/2016 11:36 PM, Kyle Maas wrote:
>>> How can I tell what AFR version a cluster is using for self-heal?
>> If all your servers and clients are 3.7.8, then they are by default
>> running afr-v2. Afr-v2 was a rewrite of afr that went in for 3.6,
>> so any gluster package from then on has this code; you don't need to
>> explicitly enable anything.
> That was what I thought until I ran across this IRC log where JoeJulian
> asked if it was explicitly enabled:
>
> https://irclog.perlgeek.de/gluster/2015-10-29
>
>>> The reason I ask is that I have a two-node replicated 3.7.8 cluster (no
>>> arbiters) which has locking behavior during self-heal which looks very
>>> similar to that of AFRv1 (only heals one file at a time per self-heal
>>> daemon, appears to lock the full inode while it's healing it instead of
>>> just ranges, etc.),
>> Both v1 and v2 use range locks while healing a given file, so clients
>> shouldn't block when heals happen. What is the problem you're facing?
>> Are your clients also at 3.7.8?
> Primary symptoms are:
>
> 1. While a self-heal is running, only one file at a time is healed per
> brick. As I understand it, AFRv2 and up should allow for multiple files
> to be healed concurrently, or at least multiple ranges within a file,
> particularly with io-thread-count set to >1. During a self-heal,
> neither I/O nor network is saturated, which leads me to believe that I'm
> looking at a single synchronous self-healing process.

The self-heal daemon on each node processes one file at a time per
replica, so in that sense it is serial. We are working on the
multi-threaded self-heal patch (http://review.gluster.org/#/c/13329/)
for parallel heals.

> 3. More troubling is that during a self-heal, clients cannot so much as
> list the files on the volume until the self-heal is done. No errors.
> No timeouts. They just freeze.
> As soon as the self-heal is complete,
> they unfreeze and list the contents.

I'm guessing http://review.gluster.org/#/c/13207/ would fix that. But as
a workaround, can you see if `gluster vol set volname data-self-heal off`
makes them more responsive?

> 4. Any file access during a self-heal also freezes, just like a
> directory listing, until the self-heal is done.

Ditto as above; please see if disabling client-side heal helps.

Regards,
Ravi

> This wreaks havoc on users who have files open when one of the bricks
> is rebooted and has to be healed, since, with as much data as is stored
> on this cluster, a self-heal can take almost 24 hours.
>
> I experience the same problems when I run without any clients other than
> the bricks themselves mounting the volume, so yes, it happens with the
> clients on 3.7.8 as well.
>
> Warm Regards,
> Kyle Maas
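[Spelled out, the suggested workaround would look something like the
commands below. `volname` is the placeholder from Ravi's example; the
`cluster.`-prefixed names are the canonical forms of the same options.]

```shell
# Turn off client-side data self-heal, which can block reads while a
# file is being repaired; the self-heal daemon keeps healing regardless.
gluster volume set volname cluster.data-self-heal off

# If that alone doesn't help, metadata and entry self-heal can be
# disabled on the client side as well.
gluster volume set volname cluster.metadata-self-heal off
gluster volume set volname cluster.entry-self-heal off

# Re-enable once responsiveness has been confirmed:
gluster volume set volname cluster.data-self-heal on
```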
On February 25, 2016 8:32:44 PM PST, Kyle Maas <kyle at virtualinterconnect.com> wrote:
> On 02/25/2016 08:20 PM, Ravishankar N wrote:
>> On 02/25/2016 11:36 PM, Kyle Maas wrote:
>>> How can I tell what AFR version a cluster is using for self-heal?
>> If all your servers and clients are 3.7.8, then they are by default
>> running afr-v2. Afr-v2 was a rewrite of afr that went in for 3.6,
>> so any gluster package from then on has this code; you don't need to
>> explicitly enable anything.
>
> That was what I thought until I ran across this IRC log where JoeJulian
> asked if it was explicitly enabled:
>
> https://irclog.perlgeek.de/gluster/2015-10-29

A couple lines down, though, I continued "Ah, I was confusing that with nsr."

>>> The reason I ask is that I have a two-node replicated 3.7.8 cluster (no
>>> arbiters) which has locking behavior during self-heal which looks very
>>> similar to that of AFRv1 (only heals one file at a time per self-heal
>>> daemon, appears to lock the full inode while it's healing it instead of
>>> just ranges, etc.),
>> Both v1 and v2 use range locks while healing a given file, so clients
>> shouldn't block when heals happen. What is the problem you're facing?
>> Are your clients also at 3.7.8?
>
> Primary symptoms are:
>
> 1. While a self-heal is running, only one file at a time is healed per
> brick. As I understand it, AFRv2 and up should allow for multiple files
> to be healed concurrently, or at least multiple ranges within a file,
> particularly with io-thread-count set to >1. During a self-heal,
> neither I/O nor network is saturated, which leads me to believe that I'm
> looking at a single synchronous self-healing process.
>
> 3. More troubling is that during a self-heal, clients cannot so much as
> list the files on the volume until the self-heal is done. No errors.
> No timeouts. They just freeze. As soon as the self-heal is complete,
> they unfreeze and list the contents.
>
> 4.
> Any file access during a self-heal also freezes, just like a
> directory listing, until the self-heal is done. This wreaks havoc on
> users who have files open when one of the bricks is rebooted and has to
> be healed, since, with as much data as is stored on this cluster, a
> self-heal can take almost 24 hours.
>
> I experience the same problems when I run without any clients other than
> the bricks themselves mounting the volume, so yes, it happens with the
> clients on 3.7.8 as well.
>
> Warm Regards,
> Kyle Maas
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.