Hi Thomas,
Thank you for your input and suggestions.
This is a production setup. We have small as well as large files (the
largest are up to around 30-40 GB). The 10TB disk is the largest disk on
that node, so the fact that it fills up before the smaller disks do looks
like an issue in glusterfs rebalance.
We tried sharding earlier, but it caused far more problems than it solved,
so we decided to stay away from it.
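As a rough sketch of what we mean by uneven: the snippet below takes the sizes and usage from the 'df -h' output quoted further down and computes what each brick on the new node would hold if all three were filled to the same percentage. It assumes that "even" means equal percentage used and it looks only at this one node, which is a simplification and not a description of how Gluster's rebalance actually decides placement. The names `bricks` and `even_fill_targets` are made up for illustration.

```python
# Brick sizes and usage in TB, as reported by 'df -h' on the new node.
# Tuple layout: (size, used).
bricks = {
    "/data": (10.0, 8.9),
    "/data1": (8.0, 5.0),
    "/data2": (8.0, 5.0),
}


def even_fill_targets(bricks):
    """Return the overall fill ratio and the per-brick usage (in TB) that
    would give every brick that same ratio.

    Assumption: 'even' means equal percentage used on every brick.
    """
    total_size = sum(size for size, _ in bricks.values())
    total_used = sum(used for _, used in bricks.values())
    fill = total_used / total_size
    targets = {mount: size * fill for mount, (size, _) in bricks.items()}
    return fill, targets


fill, targets = even_fill_targets(bricks)
for mount, (size, used) in bricks.items():
    print(
        f"{mount}: used {used:.1f}T ({used / size:.0%}), "
        f"even-fill target {targets[mount]:.1f}T ({fill:.0%})"
    )
```

With these numbers the three bricks together are about 73% full, so under that assumption /data would hold roughly 7.3T rather than the 8.9T it holds now.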
On Fri, Nov 12, 2021 at 2:22 PM Thomas Bätzler <t.baetzler at bringe.com>
wrote:
> Hello Shreyansh Shah,
>
> I'm assuming you configured this as a test system, since there's no
> redundancy in this setup? From my own experience I'd say that gluster tries
> to fill bricks evenly, i.e. the 10TB disk should get 2.5 times more data
> than the 4TB disk in a perfect world with lots of smallish files. I say
> "should", because this really depends on the hashing algorithm that Gluster
> uses to decide on where to store a file. If you have lots of little files,
> you'll get a good distribution across all disks. If you have large files,
> however, you might end up with several of them put together on the smallest
> drive. That might happen when you rebalance, too.
>
> There is an option, features.shard, to split large files into smaller
> parts ("shards") that are then in turn distributed across the bricks in
> your gluster. It might help with overfilling on your smaller drives.
> However, at least until Gluster 7.9 it was severely broken in that delete
> operations didn't actually delete all of the shards that were allocated for
> large files.
>
> As for rebalance breaking down: yeah, been there, done that. We were in
> the unenviable position of having to add two more nodes to a 4x2
> distribute-replicate gluster of about 60TB with ~150M small files.
> Rebalancing took 5 weeks, mainly because we had to restart it twice.
>
> Best regards,
>
> i.A. Thomas Bätzler
>
> --
> BRINGE Informationstechnik GmbH
> Zur Seeplatte 12
> D-76228 Karlsruhe
> Germany
>
> Phone: +49 721 94246-0
> Phone: +49 171 5438457
> Fax: +49 721 94246-66
> Web: http://www.bringe.de/
>
> Managing Director: Dipl.-Ing. (FH) Martin Bringe
> VAT ID: DE812936645, HRB 108943 Mannheim
>
>
>
> *From:* Shreyansh Shah <shreyansh.shah at alpha-grep.com>
> *Sent:* Friday, 12 November 2021 08:42
> *To:* Thomas Bätzler <t.baetzler at bringe.com>
> *Cc:* gluster-users <gluster-users at gluster.org>
> *Subject:* Re: [Gluster-users] Rebalance Issues
>
>
>
> Hi Thomas,
> Thank you for your response. Adding the required info below:
>
> Volume Name: data
> Type: Distribute
> Volume ID: 75410231-bb25-4f14-bcde-caf18fce1d31
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 35
> Transport-type: tcp
> Bricks:
> Brick1: 10.132.1.12:/data/data
> Brick2: 10.132.1.12:/data1/data
> Brick3: 10.132.1.12:/data2/data
> Brick4: 10.132.1.12:/data3/data
> Brick5: 10.132.1.13:/data/data
> Brick6: 10.132.1.13:/data1/data
> Brick7: 10.132.1.13:/data2/data
> Brick8: 10.132.1.13:/data3/data
> Brick9: 10.132.1.14:/data3/data
> Brick10: 10.132.1.14:/data2/data
> Brick11: 10.132.1.14:/data1/data
> Brick12: 10.132.1.14:/data/data
> Brick13: 10.132.1.15:/data/data
> Brick14: 10.132.1.15:/data1/data
> Brick15: 10.132.1.15:/data2/data
> Brick16: 10.132.1.15:/data3/data
> Brick17: 10.132.1.16:/data/data
> Brick18: 10.132.1.16:/data1/data
> Brick19: 10.132.1.16:/data2/data
> Brick20: 10.132.1.16:/data3/data
> Brick21: 10.132.1.17:/data3/data
> Brick22: 10.132.1.17:/data2/data
> Brick23: 10.132.1.17:/data1/data
> Brick24: 10.132.1.17:/data/data
> Brick25: 10.132.1.18:/data/data
> Brick26: 10.132.1.18:/data1/data
> Brick27: 10.132.1.18:/data2/data
> Brick28: 10.132.1.18:/data3/data
> Brick29: 10.132.1.19:/data3/data
> Brick30: 10.132.1.19:/data2/data
> Brick31: 10.132.1.19:/data1/data
> Brick32: 10.132.1.19:/data/data
> Brick33: 10.132.0.19:/data1/data
> Brick34: 10.132.0.19:/data2/data
> Brick35: 10.132.0.19:/data/data
> Options Reconfigured:
> performance.cache-refresh-timeout: 60
> performance.cache-size: 8GB
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: on
> storage.health-check-interval: 60
> server.keepalive-time: 60
> client.keepalive-time: 60
> network.ping-timeout: 90
> server.event-threads: 2
>
> On Fri, Nov 12, 2021 at 1:08 PM Thomas Bätzler <t.baetzler at bringe.com>
> wrote:
>
> Hello Shreyansh Shah,
>
>
>
> How is your gluster set up? I think it would be very helpful for our
> understanding of your setup to see the output of "gluster v info all"
> annotated with brick sizes.
>
> Otherwise, how could anybody answer your questions?
>
> Best regards,
>
> i.A. Thomas Bätzler
>
>
> *From:* Gluster-users <gluster-users-bounces at gluster.org> *On Behalf Of* Shreyansh Shah
> *Sent:* Friday, 12 November 2021 07:31
> *To:* gluster-users <gluster-users at gluster.org>
> *Subject:* [Gluster-users] Rebalance Issues
>
>
>
> Hi All,
>
> I have a distributed glusterfs 5.10 setup with 8 nodes, each of them
> having 1 TB disk and 3 disks of 4TB each (so total 22 TB per node).
> Recently I added a new node with 3 additional disks (1 x 10TB + 2 x 8TB).
> After this I ran rebalance, and it does not seem to complete successfully
> (adding the output of gluster volume rebalance data status below). On a few
> nodes it shows as failed, and on the node where it shows as completed the
> data is not evenly balanced.
>
> root@gluster6-new:~# gluster v rebalance data status
> Node                                Rebalanced-files     size  scanned  failures  skipped       status  run time in h:m:s
> ----------------------------------  ----------------  -------  -------  --------  -------  -----------  -----------------
> localhost                                      22836    2.4TB   136149         1    27664  in progress           14:48:56
> 10.132.1.15                                       80    5.0MB     1134         3      121       failed            1:08:33
> 10.132.1.14                                    18573    2.5TB   137827        20    31278  in progress           14:48:56
> 10.132.1.12                                      607   61.3MB     1667         5       60       failed            1:08:33
> gluster4.c.storage-186813.internal             26479    2.8TB   148402        14    38271  in progress           14:48:56
> 10.132.1.18                                       86    6.4MB     1094         5       70       failed            1:08:33
> 10.132.1.17                                    21953    2.6TB   131573         4    26818  in progress           14:48:56
> 10.132.1.16                                       56   45.0MB     1203         5      111       failed            1:08:33
> 10.132.0.19                                     3108    1.9TB   224707         2   160148    completed           13:56:31
> Estimated time left for rebalance to complete : 22:04:28
>
>
> Adding 'df -h' output for the node that has been marked as completed in
> the above status command; the data does not seem to be evenly balanced.
>
> root@gluster-9:~$ df -h /data*
> Filesystem    Size  Used  Avail  Use%  Mounted on
> /dev/bcache0   10T  8.9T   1.1T   90%  /data
> /dev/bcache1  8.0T  5.0T   3.0T   63%  /data1
> /dev/bcache2  8.0T  5.0T   3.0T   63%  /data2
>
>
>
> I would appreciate any help to identify the issues here:
>
> 1. Failures during rebalance.
> 2. Imbalance in data size after the gluster rebalance command.
> 3. We had to rebalance twice, because in the initial run one of the new
> disks on the new node (10 TB) got 100% full. Any thoughts as to why this
> could happen during rebalance? The disks on the new node were completely
> blank before the rebalance.
> 4. Does glusterfs rebalance data based on percentage used or absolute free
> disk space available?
>
> I can share more details/logs if required. Thanks.
>
> --
>
> Regards,
> Shreyansh Shah
>
> *AlphaGrep Securities Pvt. Ltd.*
--
Regards,
Shreyansh Shah
*AlphaGrep Securities Pvt. Ltd.*