Hi All,
I have a distributed glusterfs 5.10 setup with 8 nodes, each of them having 1 x 10 TB disk and 3 x 4 TB disks (so 22 TB per node in total). Recently I added a new node with 3 additional disks (1 x 10 TB + 2 x 8 TB). After this I ran rebalance and it does not seem to complete successfully (adding the result of gluster volume rebalance data status below). On a few nodes it shows failed, and on the node that shows as completed the data is not evenly balanced.

root@gluster6-new:~# gluster v rebalance data status
Node                                 Rebalanced-files   size     scanned   failures   skipped   status        run time in h:m:s
---------                            ----------------   ------   -------   --------   -------   -----------   -----------------
localhost                            22836              2.4TB    136149    1          27664     in progress   14:48:56
10.132.1.15                          80                 5.0MB    1134      3          121       failed        1:08:33
10.132.1.14                          18573              2.5TB    137827    20         31278     in progress   14:48:56
10.132.1.12                          607                61.3MB   1667      5          60        failed        1:08:33
gluster4.c.storage-186813.internal   26479              2.8TB    148402    14         38271     in progress   14:48:56
10.132.1.18                          86                 6.4MB    1094      5          70        failed        1:08:33
10.132.1.17                          21953              2.6TB    131573    4          26818     in progress   14:48:56
10.132.1.16                          56                 45.0MB   1203      5          111       failed        1:08:33
10.132.0.19                          3108               1.9TB    224707    2          160148    completed     13:56:31
Estimated time left for rebalance to complete : 22:04:28

Adding 'df -h' output for the node that has been marked as completed in the above status command; the data does not seem to be evenly balanced.

root@gluster-9:~$ df -h /data*
Filesystem      Size  Used  Avail Use% Mounted on
/dev/bcache0     10T  8.9T  1.1T  90% /data
/dev/bcache1    8.0T  5.0T  3.0T  63% /data1
/dev/bcache2    8.0T  5.0T  3.0T  63% /data2

I would appreciate any help to identify the issues here:
1. Failures during rebalance.
2. Imbalance in data size after the gluster rebalance command.
3. Another thing I would like to mention is that we had to rebalance twice, as in the initial run one of the new disks on the new node (10 TB) got 100% full. Any thoughts as to why this could happen during rebalance? The disks on the new node were completely blank before rebalance.
4. Does glusterfs rebalance data based on percentage used or absolute free disk space available?

I can share more details/logs if required. Thanks.

--
Regards,
Shreyansh Shah
AlphaGrep Securities Pvt. Ltd.
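For question 1, the per-node rebalance logs are usually the quickest way to see why a crawl failed. A minimal sketch, assuming the stock log location and the volume name data used above (the path may differ on other installs):

  # on each node whose status line says "failed": show recent error-level entries
  grep -F ' E [' /var/log/glusterfs/data-rebalance.log | tail -n 20
  # and confirm every brick was online while the crawl ran
  gluster volume status data

Brick disconnects and per-file migration errors logged there typically account for a "failed" status line.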
Hello Shreyansh Shah,

How is your gluster set up? I think it would be very helpful for our understanding of your setup to see the output of "gluster v info all" annotated with brick sizes. Otherwise, how could anybody answer your questions?

Best regards,
i.A. Thomas Bätzler
--
BRINGE Informationstechnik GmbH
Zur Seeplatte 12
D-76228 Karlsruhe
Germany

Fon: +49 721 94246-0
Fon: +49 171 5438457
Fax: +49 721 94246-66
Web: http://www.bringe.de/

Geschäftsführer: Dipl.-Ing. (FH) Martin Bringe
Ust.Id: DE812936645, HRB 108943 Mannheim
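One way to produce the brick-size annotation Thomas asks for, assuming the data volume from the original post, is the detailed status output, which reports total and free disk space per brick:

  gluster volume status data detail | grep -E 'Brick|Total Disk Space|Disk Space Free'

This is only a sketch; gluster volume status <volname> detail is the stock CLI command, but the field labels can differ slightly between releases, so adjust the grep pattern to match the actual output.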
Hi Thomas,
Thank you for your response. Adding the required info below:

Volume Name: data
Type: Distribute
Volume ID: 75410231-bb25-4f14-bcde-caf18fce1d31
Status: Started
Snapshot Count: 0
Number of Bricks: 35
Transport-type: tcp
Bricks:
Brick1: 10.132.1.12:/data/data
Brick2: 10.132.1.12:/data1/data
Brick3: 10.132.1.12:/data2/data
Brick4: 10.132.1.12:/data3/data
Brick5: 10.132.1.13:/data/data
Brick6: 10.132.1.13:/data1/data
Brick7: 10.132.1.13:/data2/data
Brick8: 10.132.1.13:/data3/data
Brick9: 10.132.1.14:/data3/data
Brick10: 10.132.1.14:/data2/data
Brick11: 10.132.1.14:/data1/data
Brick12: 10.132.1.14:/data/data
Brick13: 10.132.1.15:/data/data
Brick14: 10.132.1.15:/data1/data
Brick15: 10.132.1.15:/data2/data
Brick16: 10.132.1.15:/data3/data
Brick17: 10.132.1.16:/data/data
Brick18: 10.132.1.16:/data1/data
Brick19: 10.132.1.16:/data2/data
Brick20: 10.132.1.16:/data3/data
Brick21: 10.132.1.17:/data3/data
Brick22: 10.132.1.17:/data2/data
Brick23: 10.132.1.17:/data1/data
Brick24: 10.132.1.17:/data/data
Brick25: 10.132.1.18:/data/data
Brick26: 10.132.1.18:/data1/data
Brick27: 10.132.1.18:/data2/data
Brick28: 10.132.1.18:/data3/data
Brick29: 10.132.1.19:/data3/data
Brick30: 10.132.1.19:/data2/data
Brick31: 10.132.1.19:/data1/data
Brick32: 10.132.1.19:/data/data
Brick33: 10.132.0.19:/data1/data
Brick34: 10.132.0.19:/data2/data
Brick35: 10.132.0.19:/data/data
Options Reconfigured:
performance.cache-refresh-timeout: 60
performance.cache-size: 8GB
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
storage.health-check-interval: 60
server.keepalive-time: 60
client.keepalive-time: 60
network.ping-timeout: 90
server.event-threads: 2

On Fri, Nov 12, 2021 at 1:08 PM Thomas Bätzler <t.baetzler@bringe.com> wrote:
> Hello Shreyansh Shah,
>
> How is your gluster set up? I think it would be very helpful for our
> understanding of your setup to see the output of "gluster v info all"
> annotated with brick sizes.
>
> Otherwise, how could anybody answer your questions?
>
> Best regards,
> i.A. Thomas Bätzler
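On question 4: with a plain Distribute volume like this one, DHT can weight the hash layout by brick size when cluster.weighted-rebalance is enabled (it is normally on by default in recent releases), so larger bricks are expected to take proportionally more data rather than an equal share. A quick, hedged check of the setting on this volume:

  gluster volume get data cluster.weighted-rebalance

If the option is off, layout ranges are split evenly regardless of brick size, which would also make smaller bricks fill up faster than larger ones.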
--
Regards,
Shreyansh Shah
AlphaGrep Securities Pvt. Ltd.
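On question 3 (the 10 TB brick hitting 100% during the first run), one setting worth checking is cluster.min-free-disk, which tells DHT how much space to keep free on a brick before it starts placing new files elsewhere. This is a sketch, not a definitive diagnosis: the option itself is standard, but how strictly a rebalance crawl honours it can depend on the version, so verify against the 5.10 documentation:

  # show the current reserve threshold (a percentage by default)
  gluster volume get data cluster.min-free-disk
  # example: keep at least 10% free on every brick before new files land on it
  gluster volume set data cluster.min-free-disk 10%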