On 02/24/2015 04:46 PM, Kingsley wrote:
> When testing gluster, I found similar issues when I simulated a brick
> failure on a replicated volume - while it was rebuilding the newly
> replaced brick, the volume was very unresponsive.
>
> Our bricks are on SATA drives and the server LAN runs at 1Gbps. The
> disks couldn't cope with the IOPS that the network was throwing at
> them.
>
> I solved that particular issue by using traffic shaping to limit the
> network bandwidth that the servers could use between each other (but not
> limiting it to anywhere else). The volume took longer to rebuild the
> replaced brick, but the volume was still responsive to clients during
> the rebuild.
>
> Please let me know if what we tried is a bad idea ...
The self-heal daemon (shd), which does the heals, also runs on the
servers. It is essentially a process that loads some of the client-side
xlators so that it has a cluster view. It then connects to the bricks
like a normal client and performs the heals from the source brick to
the sink. So throttling the bandwidth between the shd and the bricks,
while leaving the 'real' clients free to talk to the bricks, is
consistent with your findings.
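As an aside, you can watch what the shd is doing while a brick is being
rebuilt with something along these lines (replace <VOLNAME> with your
volume's name):

  gluster volume heal <VOLNAME> info   # files still pending heal, per brick
  gluster volume status <VOLNAME>      # shows whether the shd is online on each node
  ps aux | grep glustershd             # the shd runs as its own glusterfs process
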
But what exactly did you do to limit the bandwidth? The gluster NFS
server process also resides on the brick nodes, so limiting the
bandwidth between it and the brick processes might slow down NFS
clients as well.
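For reference, one common way to do that kind of per-peer shaping is
with tc on each server. A minimal sketch (the interface name, rates and
peer address are placeholders, and you would add one filter per peer):

  # cap traffic from this server to one gluster peer at ~300mbit,
  # leave all other traffic at line rate
  tc qdisc add dev eth0 root handle 1: htb default 30
  tc class add dev eth0 parent 1: classid 1:1 htb rate 1000mbit
  tc class add dev eth0 parent 1:1 classid 1:10 htb rate 300mbit ceil 300mbit
  tc class add dev eth0 parent 1:1 classid 1:30 htb rate 700mbit ceil 1000mbit
  tc filter add dev eth0 protocol ip parent 1: prio 1 u32 \
      match ip dst 10.0.0.2/32 flowid 1:10
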
Also, what version of gluster did you try this on? Beginning with 3.6,
AFR has granular entry self-heals. Before this (i.e. 3.5 and earlier),
AFR used to take a full lock on the directory, and clients could not
modify the directory contents until the heal was complete.
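('gluster --version' on any of the nodes will show which release you
are running.)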
Thanks,
Ravi

> Cheers,
> Kingsley.
>
> On Tue, 2015-02-24 at 07:11 +0530, Ravishankar N wrote:
>> On 02/24/2015 05:00 AM, Craig Yoshioka wrote:
>>> I'm using Gluster 3.6 to host a volume with some KVM images. I'd
>>> seen before that other people were having terrible performance while
>>> Gluster was auto-healing but that a rewrite in 3.6 had potentially
>>> solved this problem.
>>>
>>> Well, it hasn't (for me). If my gluster volume starts to auto-heal,
>>> performance can get so bad that some of the VMs essentially lock up.
>>> In top I can see the glusterfsd process sometimes hitting 700% of the
>>> CPU. Is there anything I can do to prevent this by throttling the
>>> healing process?
>> For VM workloads, you could set the 'cluster.data-self-heal-algorithm'
>> option to 'full'. The checksum computation in the 'diff' algorithm can
>> be CPU intensive, especially since VM images are big files.
>>
>> [root@tuxpad glusterfs]# gluster v set help|grep algorithm
>> Option: cluster.data-self-heal-algorithm
>> Description: Select between "full", "diff". The "full" algorithm copies
>> the entire file from source to sink. The "diff" algorithm copies to sink
>> only those blocks whose checksums don't match with those of source. If
>> no option is configured the option is chosen dynamically as follows: If
>> the file does not exist on one of the sinks or empty file exists or if
>> the source file size is about the same as page size the entire file will
>> be read and written i.e "full" algo, otherwise "diff" algo is chosen.
>>
>> Hope this helps.
>> Ravi
>>
>>> Here are my volume options:
>>>
>>> Volume Name: vm-images
>>> Type: Replicate
>>> Volume ID: 5b38ddbe-a1ae-4e10-b0ad-dcd785a44493
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: vmhost-1:/gfs/brick-0
>>> Brick2: vmhost-2:/gfs/brick-0
>>> Options Reconfigured:
>>> nfs.disable: on
>>> cluster.quorum-count: 1
>>> network.frame-timeout: 1800
>>> network.ping-timeout: 15
>>> server.allow-insecure: on
>>> storage.owner-gid: 36
>>> storage.owner-uid: 107
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> cluster.eager-lock: enable
>>> network.remote-dio: enable
>>> cluster.quorum-type: fixed
>>> cluster.server-quorum-type: server
>>> cluster.server-quorum-ratio: 51%
>>>
>>> Thanks!
>>> -Craig