On 26/03/2016 12:14 AM, Ravishankar N wrote:
> I think you need the exact no. of files and size of files that need
> healing to make any meaningful comparison of self-heal performance
> across versions.
> VM workloads with sharding might not be the ideal 'reproducer' since
> you really don't know how many shards get modified when a replica is
> down and I/O on the VMs happen. I suppose you could try testing the
> heal performance of a specific no. of files on a sharded volume and
> compare results.

Maybe my subject description was poor - while heal progress is not the
best, it's the I/O stalls that *really* concern me. If I reboot a node
(or it crashes, etc.), any VM running on the cluster at that moment
freezes on I/O access once the heal kicks in, and stays frozen until the
heal finishes, which can take over an hour.

I see similar behaviour noted in the thread "GlusterFS cluster stalls if
one server from the cluster goes down and then comes back up".

I tried setting "cluster.data-self-heal" off as suggested on that thread
and it seems to have improved things. I'm in the middle of maintenance
right now and will test it more later.

thanks,
--
Lindsay Mathieson
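For reference, that workaround is applied with the standard volume-set
command. A minimal sketch, assuming a hypothetical volume named
"datavol" (substitute your own volume name):

    # turn off data self-heal triggered from the client mounts
    gluster volume set datavol cluster.data-self-heal off

    # revert once the underlying bug is confirmed fixed
    gluster volume set datavol cluster.data-self-heal on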
Pranith Kumar Karampuri
2016-Mar-26 13:32 UTC
[Gluster-users] Very poor heal behaviour in 3.7.9
On 03/26/2016 06:55 AM, Lindsay Mathieson wrote:
> On 26/03/2016 12:14 AM, Ravishankar N wrote:
>> I think you need the exact no. of files and size of files that need
>> healing to make any meaningful comparison of self-heal performance
>> across versions.
>> VM workloads with sharding might not be the ideal 'reproducer' since
>> you really don't know how many shards get modified when a replica is
>> down and I/O on the VMs happen. I suppose you could try testing the
>> heal performance of a specific no. of files on a sharded volume and
>> compare results.
>
> Maybe my subject description was poor - while heal progress is not the
> best, it's the I/O stalls that *really* concern me. If I reboot a node
> (or it crashes, etc.), any VM running on the cluster at that moment
> freezes on I/O access once the heal kicks in, and stays frozen until
> the heal finishes, which can take over an hour.
>
> I see similar behaviour noted in the thread "GlusterFS cluster stalls
> if one server from the cluster goes down and then comes back up".
>
> I tried setting "cluster.data-self-heal" off as suggested on that
> thread and it seems to have improved things. I'm in the middle of
> maintenance right now and will test it more later.

Yes, this is a bug we are addressing for 3.7.10. The patch is already
merged: http://review.gluster.org/13564

Pranith

> thanks,
>