Hi All,
I have a distributed glusterfs 5.10 setup with 8 nodes, each of them having 1 x 10 TB disk and 3 x 4 TB disks (so 22 TB per node in total). Recently I added a new node with 3 additional disks (1 x 10 TB + 2 x 8 TB). After this I ran rebalance and it does not seem to complete successfully (adding the result of gluster volume rebalance data status below). On a few nodes it shows failed, and on the node that shows as completed the data is not evenly balanced.

root@gluster6-new:~# gluster v rebalance data status
Node                                 Rebalanced-files   size     scanned   failures   skipped   status        run time in h:m:s
---------                            ----------------   ------   -------   --------   -------   -----------   -----------------
localhost                            22836              2.4TB    136149    1          27664     in progress   14:48:56
10.132.1.15                          80                 5.0MB    1134      3          121       failed        1:08:33
10.132.1.14                          18573              2.5TB    137827    20         31278     in progress   14:48:56
10.132.1.12                          607                61.3MB   1667      5          60        failed        1:08:33
gluster4.c.storage-186813.internal   26479              2.8TB    148402    14         38271     in progress   14:48:56
10.132.1.18                          86                 6.4MB    1094      5          70        failed        1:08:33
10.132.1.17                          21953              2.6TB    131573    4          26818     in progress   14:48:56
10.132.1.16                          56                 45.0MB   1203      5          111       failed        1:08:33
10.132.0.19                          3108               1.9TB    224707    2          160148    completed     13:56:31
Estimated time left for rebalance to complete : 22:04:28

Adding 'df -h' output for the node that has been marked as completed in the above status command; the data does not seem to be evenly balanced.

root@gluster-9:~$ df -h /data*
Filesystem      Size  Used  Avail Use% Mounted on
/dev/bcache0     10T  8.9T  1.1T  90% /data
/dev/bcache1    8.0T  5.0T  3.0T  63% /data1
/dev/bcache2    8.0T  5.0T  3.0T  63% /data2

I would appreciate any help to identify the issues here:
1. Failures during rebalance.
2. Imbalance in data size after the gluster rebalance command.
3. Another thing I would like to mention is that we had to rebalance twice, as in the initial run one of the new disks on the new node (10 TB) got 100% full. Any thoughts as to why this could happen during rebalance? The disks on the new node were completely blank before rebalance.
4. Does glusterfs rebalance data based on percentage used or absolute free disk space available?

I can share more details/logs if required. Thanks.

--
Regards,
Shreyansh Shah
AlphaGrep Securities Pvt. Ltd.
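For question 1, the per-node rebalance logs are usually the quickest way to see why a crawl failed. A minimal sketch, assuming the stock log location and the volume name data used above (the path may differ on other installs):

  # on each node whose status line says "failed": show recent error-level entries
  grep -F ' E [' /var/log/glusterfs/data-rebalance.log | tail -n 20
  # and confirm every brick was online while the crawl ran
  gluster volume status data

Brick disconnects and per-file migration errors logged there typically account for a "failed" status line.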
Hello Shreyansh Shah,

How is your gluster set up? I think it would be very helpful for our understanding of your setup to see the output of "gluster v info all" annotated with brick sizes. Otherwise, how could anybody answer your questions?

Best regards,
i.A. Thomas Bätzler
--
BRINGE Informationstechnik GmbH
Zur Seeplatte 12
D-76228 Karlsruhe
Germany

Fon: +49 721 94246-0
Fon: +49 171 5438457
Fax: +49 721 94246-66
Web: http://www.bringe.de/

Geschäftsführer: Dipl.-Ing. (FH) Martin Bringe
Ust.Id: DE812936645, HRB 108943 Mannheim
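One way to produce the brick-size annotation Thomas asks for, assuming the data volume from the original post, is the detailed status output, which reports total and free disk space per brick:

  gluster volume status data detail | grep -E 'Brick|Total Disk Space|Disk Space Free'

This is only a sketch; gluster volume status <volname> detail is the stock CLI command, but the field labels can differ slightly between releases, so adjust the grep pattern to match the actual output.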
Hi Thomas,
Thank you for your response. Adding the required info below:

Volume Name: data
Type: Distribute
Volume ID: 75410231-bb25-4f14-bcde-caf18fce1d31
Status: Started
Snapshot Count: 0
Number of Bricks: 35
Transport-type: tcp
Bricks:
Brick1: 10.132.1.12:/data/data
Brick2: 10.132.1.12:/data1/data
Brick3: 10.132.1.12:/data2/data
Brick4: 10.132.1.12:/data3/data
Brick5: 10.132.1.13:/data/data
Brick6: 10.132.1.13:/data1/data
Brick7: 10.132.1.13:/data2/data
Brick8: 10.132.1.13:/data3/data
Brick9: 10.132.1.14:/data3/data
Brick10: 10.132.1.14:/data2/data
Brick11: 10.132.1.14:/data1/data
Brick12: 10.132.1.14:/data/data
Brick13: 10.132.1.15:/data/data
Brick14: 10.132.1.15:/data1/data
Brick15: 10.132.1.15:/data2/data
Brick16: 10.132.1.15:/data3/data
Brick17: 10.132.1.16:/data/data
Brick18: 10.132.1.16:/data1/data
Brick19: 10.132.1.16:/data2/data
Brick20: 10.132.1.16:/data3/data
Brick21: 10.132.1.17:/data3/data
Brick22: 10.132.1.17:/data2/data
Brick23: 10.132.1.17:/data1/data
Brick24: 10.132.1.17:/data/data
Brick25: 10.132.1.18:/data/data
Brick26: 10.132.1.18:/data1/data
Brick27: 10.132.1.18:/data2/data
Brick28: 10.132.1.18:/data3/data
Brick29: 10.132.1.19:/data3/data
Brick30: 10.132.1.19:/data2/data
Brick31: 10.132.1.19:/data1/data
Brick32: 10.132.1.19:/data/data
Brick33: 10.132.0.19:/data1/data
Brick34: 10.132.0.19:/data2/data
Brick35: 10.132.0.19:/data/data
Options Reconfigured:
performance.cache-refresh-timeout: 60
performance.cache-size: 8GB
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
storage.health-check-interval: 60
server.keepalive-time: 60
client.keepalive-time: 60
network.ping-timeout: 90
server.event-threads: 2

On Fri, Nov 12, 2021 at 1:08 PM Thomas Bätzler <t.baetzler@bringe.com> wrote:
> Hello Shreyansh Shah,
>
> How is your gluster set up? I think it would be very helpful for our
> understanding of your setup to see the output of "gluster v info all"
> annotated with brick sizes.
>
> Otherwise, how could anybody answer your questions?
>
> Best regards,
> i.A. Thomas Bätzler
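On question 4: with a plain Distribute volume like this one, DHT can weight the hash layout by brick size when cluster.weighted-rebalance is enabled (it is normally on by default in recent releases), so larger bricks are expected to take proportionally more data rather than an equal share. A quick, hedged check of the setting on this volume:

  gluster volume get data cluster.weighted-rebalance

If the option is off, layout ranges are split evenly regardless of brick size, which would also make smaller bricks fill up faster than larger ones.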
--
Regards,
Shreyansh Shah
AlphaGrep Securities Pvt. Ltd.
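On question 3 (the 10 TB brick hitting 100% during the first run), one setting worth checking is cluster.min-free-disk, which tells DHT how much space to keep free on a brick before it starts placing new files elsewhere. This is a sketch, not a definitive diagnosis: the option itself is standard, but how strictly a rebalance crawl honours it can depend on the version, so verify against the 5.10 documentation:

  # show the current reserve threshold (a percentage by default)
  gluster volume get data cluster.min-free-disk
  # example: keep at least 10% free on every brick before new files land on it
  gluster volume set data cluster.min-free-disk 10%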