Hi,

We ran a test on GlusterFS 3.12.1 with erasure-coded volumes, 8+2 with 10
bricks (default config; tested with 100 GB, 200 GB and 400 GB brick sizes,
10 Gbit NICs).

1.
Tests show that healing takes about double the time for 200 GB vs 100 GB
bricks, and a bit under double for 400 GB vs 200 GB. Is this expected
behaviour? In light of this, 6.4 TB bricks would take ~377 hours to heal.

100 GB brick heal: 18 hours (8+2)
200 GB brick heal: 37 hours (8+2), ~2.05x the 100 GB time
400 GB brick heal: 59 hours (8+2), ~1.59x the 200 GB time

Each 100 GB brick is filled with 80,000 x 10 MB files (200 GB is 2x and
400 GB is 4x that).

2.
Is there any possibility to show the progress of a heal? As of now we run
'gluster volume heal <volname> info', but this exits when a brick is done
healing, and when we run heal info again the command continues showing
gfids until the brick is done again. This gives quite a bad picture of the
status of a heal.

3.
What kind of config tweaks are recommended for these kinds of EC volumes?

$ gluster volume info

Volume Name: test-ec-100g
Type: Disperse
Volume ID: 0254281d-2f6e-4ac4-a773-2b8e0eb8ab27
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (8 + 2) = 10
Transport-type: tcp
Bricks:
Brick1: dn-304:/mnt/test-ec-100/brick
Brick2: dn-305:/mnt/test-ec-100/brick
Brick3: dn-306:/mnt/test-ec-100/brick
Brick4: dn-307:/mnt/test-ec-100/brick
Brick5: dn-308:/mnt/test-ec-100/brick
Brick6: dn-309:/mnt/test-ec-100/brick
Brick7: dn-310:/mnt/test-ec-100/brick
Brick8: dn-311:/mnt/test-ec-2/brick
Brick9: dn-312:/mnt/test-ec-100/brick
Brick10: dn-313:/mnt/test-ec-100/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet

Volume Name: test-ec-200
Type: Disperse
Volume ID: 2ce23e32-7086-49c5-bf0c-7612fd7b3d5d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (8 + 2) = 10
Transport-type: tcp
Bricks:
Brick1: dn-304:/mnt/test-ec-200/brick
Brick2: dn-305:/mnt/test-ec-200/brick
Brick3: dn-306:/mnt/test-ec-200/brick
Brick4: dn-307:/mnt/test-ec-200/brick
Brick5: dn-308:/mnt/test-ec-200/brick
Brick6: dn-309:/mnt/test-ec-200/brick
Brick7: dn-310:/mnt/test-ec-200/brick
Brick8: dn-311:/mnt/test-ec-200_2/brick
Brick9: dn-312:/mnt/test-ec-200/brick
Brick10: dn-313:/mnt/test-ec-200/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet

Volume Name: test-ec-400
Type: Disperse
Volume ID: fe00713a-7099-404d-ba52-46c6b4b6ecc0
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (8 + 2) = 10
Transport-type: tcp
Bricks:
Brick1: dn-304:/mnt/test-ec-400/brick
Brick2: dn-305:/mnt/test-ec-400/brick
Brick3: dn-306:/mnt/test-ec-400/brick
Brick4: dn-307:/mnt/test-ec-400/brick
Brick5: dn-308:/mnt/test-ec-400/brick
Brick6: dn-309:/mnt/test-ec-400/brick
Brick7: dn-310:/mnt/test-ec-400/brick
Brick8: dn-311:/mnt/test-ec-400_2/brick
Brick9: dn-312:/mnt/test-ec-400/brick
Brick10: dn-313:/mnt/test-ec-400/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet

--

Regards
Rolf Arne Larsen
Ops Engineer
rolf at jottacloud.com
<http://www.jottacloud.com>
Hi Rolf,

answers follow inline...

On Thu, Nov 9, 2017 at 3:20 PM, Rolf Larsen <rolf at jotta.no> wrote:
> Hi,
>
> We ran a test on GlusterFS 3.12.1 with erasure-coded volumes, 8+2 with
> 10 bricks (default config; tested with 100 GB, 200 GB and 400 GB brick
> sizes, 10 Gbit NICs).
>
> 1.
> Tests show that healing takes about double the time for 200 GB vs 100 GB
> bricks, and a bit under double for 400 GB vs 200 GB. Is this expected
> behaviour? In light of this, 6.4 TB bricks would take ~377 hours to heal.
>
> 100 GB brick heal: 18 hours (8+2)
> 200 GB brick heal: 37 hours (8+2), ~2.05x the 100 GB time
> 400 GB brick heal: 59 hours (8+2), ~1.59x the 200 GB time
>
> Each 100 GB brick is filled with 80,000 x 10 MB files (200 GB is 2x and
> 400 GB is 4x that).

If I understand it correctly, you are storing 80,000 files of 10 MB each
when you are using 100 GB bricks, but you double this value for 200 GB
bricks (160,000 files of 10 MB each), and for 400 GB bricks you create
320,000 files. Have I understood it correctly?

If this is true, it's normal that twice the data requires approximately
twice the heal time. The healing time depends on the contents of the
brick, not the brick size. The same amount of files should take the same
healing time, whatever the brick size is.

> 2.
> Is there any possibility to show the progress of a heal? As of now we
> run 'gluster volume heal <volname> info', but this exits when a brick
> is done healing, and when we run heal info again the command continues
> showing gfids until the brick is done again. This gives quite a bad
> picture of the status of a heal.

The output of 'gluster volume heal <volname> info' shows the list of
files pending to be healed on each brick. The heal is complete when the
list is empty. A faster alternative, if you don't want to see the whole
list of files, is 'gluster volume heal <volname> statistics heal-count'.
This will only show the number of pending files on each brick.

I don't know any other way to track the progress of self-heal.
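For scripting purposes, the heal-count output can simply be polled in a
loop; a minimal sketch, assuming the test-ec-100g volume from the
original message:

    # poll the pending-heal counts every 5 minutes;
    # the heal is finished when every brick reports "Number of entries: 0"
    while true; do
        date
        gluster volume heal test-ec-100g statistics heal-count \
            | grep -E '^(Brick|Number of entries)'
        sleep 300
    done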
> 3.
> What kind of config tweaks are recommended for these kinds of EC
> volumes?

I usually use the following values (specific only for ec):

client.event-threads 4
server.event-threads 4
performance.client-io-threads on

Regards,

Xavi

> [quoted volume info and signature snipped; see the original message
> above]
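For reference, these are regular volume options, so they would be applied
per volume with 'gluster volume set'; a minimal sketch, again using the
test-ec-100g volume as an example:

    gluster volume set test-ec-100g client.event-threads 4
    gluster volume set test-ec-100g server.event-threads 4
    gluster volume set test-ec-100g performance.client-io-threads on

The two event-threads options control how many threads service network
events on the client and brick sides, so raising them mainly helps when a
single event thread is saturating a CPU core.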
Hi,

You can set disperse.shd-max-threads to 2 or 4 in order to make heal
faster. This makes my heal times 2-3x faster. Also you can play with
disperse.self-heal-window-size to read more bytes at one time, but I did
not test it.

On Thu, Nov 9, 2017 at 4:47 PM, Xavi Hernandez <jahernan at redhat.com>
wrote:
> Hi Rolf,
>
> answers follow inline...
>
> [remainder of the previous message quoted in full; snipped]
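A minimal sketch of applying the two options mentioned above, again
assuming the test-ec-100g volume (the window-size value is only an
illustration, since it was untested above):

    # let the self-heal daemon heal more files in parallel
    gluster volume set test-ec-100g disperse.shd-max-threads 4

    # read larger chunks per heal iteration (illustrative value)
    gluster volume set test-ec-100g disperse.self-heal-window-size 2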