thr3ads.net - Gluster users - [Gluster-users] gluster rebalance taking multiple days [Dec 2010]

Michael Robbert

2010-Dec-07 00:50 UTC

[Gluster-users] gluster rebalance taking multiple days

How long should a rebalance take? I know that it depends, so let's take this
example: 4 servers, 1 brick per server. Here is the df -i output from the
servers:

[root@ra5 ~]# pdsh -g rack7 "df -i|grep brick"
iosrv-7-1:                      366288896 2720139 363568757    1% /mnt/brick1
iosrv-7-4:                      366288896 3240868 363048028    1% /mnt/brick4
iosrv-7-2:                      366288896 2594165 363694731    1% /mnt/brick2
iosrv-7-3:                      366288896 3267152 363021744    1% /mnt/brick3
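
The used-inode column above can be totaled to get a rough file count. A minimal Python sketch, assuming each line has the layout shown (host, total inodes, used inodes, free inodes, use%, mount point); the function name is only illustrative. Used inodes also include directories, so the total is an upper bound on the number of files.

```python
# df -i lines copied from the pdsh output above.
DF_OUTPUT = """\
iosrv-7-1:                      366288896 2720139 363568757    1% /mnt/brick1
iosrv-7-4:                      366288896 3240868 363048028    1% /mnt/brick4
iosrv-7-2:                      366288896 2594165 363694731    1% /mnt/brick2
iosrv-7-3:                      366288896 3267152 363021744    1% /mnt/brick3
"""


def total_used_inodes(df_text):
    """Sum the used-inode column (third field) over all brick lines."""
    total = 0
    for line in df_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        total += int(fields[2])
    return total


print(total_used_inodes(DF_OUTPUT))  # 11822324
```

That is about 11.8 million used inodes across the four bricks.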

So, it looks like there are roughly 10 million files. I have had a rebalance
running on one of the servers since last Friday, and this is what the status
looks like right now:

[root@iosrv-7-2 ~]# gluster volume rebalance gluster-test status
rebalance step 1: layout fix in progress: fixed layout 149531740

As a side note, I started this rebalance when I noticed that about half of my
clients are missing a certain set of files. Upon further investigation I found
that a different set of clients is missing different data. This problem
appeared after many problems getting an upgrade to 3.1.1 working. Unfortunately
I don't remember which version was running when I was last able to write to
this volume.

Any thoughts?

Mike

Jacob Shucart

2010-Dec-07 15:43 UTC

Mike,

I know Allen sent you something within the support system regarding
this.  Based on his email, it sounds to me like something underneath
Gluster may have caused the rebalance to hang.  This isn't a fatal
problem: once that has been resolved you can simply restart the
rebalance and it will resume what it was doing.  The amount of time a
rebalance takes depends on a few things: the number of files, the size
of the data, the network speed, and the extent to which things were
unbalanced (adding a lot more servers, as opposed to a couple, might
take longer).  Generally speaking, if you double the number of servers
then the rebalance will probably move about half of your data, so it
will take roughly as long as copying that much data would normally take.
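
As a rough illustration of that rule of thumb, here is a minimal Python sketch. It assumes data is spread evenly, so the fraction moved is new / (old + new), and that the copy runs at a steady throughput. The function name and all numbers are hypothetical, and per-file overhead (which can dominate with many small files) is ignored.

```python
def estimated_rebalance_hours(total_bytes, old_servers, new_servers,
                              throughput_bytes_per_sec):
    """Estimate hours to move data onto newly added servers.

    Assumes an even spread of data, so the share that has to move is
    new_servers / (old_servers + new_servers).
    """
    moved_fraction = new_servers / (old_servers + new_servers)
    moved_bytes = total_bytes * moved_fraction
    return moved_bytes / throughput_bytes_per_sec / 3600


# Hypothetical case: 10 TB of data, 4 servers doubled to 8, 100 MB/s sustained.
hours = estimated_rebalance_hours(10e12, 4, 4, 100e6)
print(f"{hours:.1f} hours")  # 13.9 hours
```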

Jacob Shucart | Gluster
Systems Engineer
E-Mail	: Jacob at gluster.com
Direct	: (408)770-1504

Craig Carl

2010-Dec-08 06:30 UTC

All -
   It is possible to calculate in advance the number of files that will
be moved by a rebalance. By testing performance beforehand with some
small rsyncs and applying the formula below, you should be able to get
an accurate estimate of the time it will take. Starting in Gluster 3.1 it
is possible to stop a rebalance and then restart it where it left off:

volume rebalance <VOLNAME> start  - start rebalance of volume <VOLNAME>
volume rebalance <VOLNAME> stop   - stop rebalance of volume <VOLNAME>
volume rebalance <VOLNAME> status - rebalance status of volume <VOLNAME>

Basic assumption: distribute spreads all the files equally across all
the nodes.

N = number of existing nodes in the cluster
M = number of new nodes being added to the cluster
N + M = total number of nodes in the new volume configuration
X = total files in the cluster before rebalance
J = X / N       (files on each existing node before rebalance)
K = X / (N + M) (files on each node after rebalance/scaling)
Z = K * M       (total files on the M new nodes after rebalance/scaling)
J * N = X       (total files in the cluster before rebalance/scaling)
Y = Z / N       (files moved from each existing node by rebalance/scaling)
(Y / J) * 100 = percentage of files moved from each of the N existing nodes
((J - Y) / J) * 100 = percentage of files remaining on each of the N
existing nodes

NOTE: "N" is not just the number of nodes but the total number of
sub-volumes of the "distribute" translator, and "M" is the number of
additional sub-volumes added before starting the rebalance and scaling.
So for multiple exports from a single server, multiply the per-sub-volume
value by that server's number of exports to get the total moved from the
server.
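
The formula above can be sketched in Python as follows. This is only an illustration under the same even-distribution assumption; the function name, the dictionary keys, and the example numbers (12 million files, 4 existing sub-volumes, 2 added, 500 files per second from a trial rsync) are hypothetical.

```python
def rebalance_estimate(total_files, existing_subvols, added_subvols):
    """Apply the N/M/X formula, assuming files are spread evenly."""
    n, m, x = existing_subvols, added_subvols, total_files
    j = x / n          # files on each existing sub-volume before rebalance
    k = x / (n + m)    # files on each sub-volume after rebalance
    z = k * m          # files that end up on the new sub-volumes
    y = z / n          # files moved off each existing sub-volume
    return {
        "files_per_existing_before": j,
        "files_per_subvol_after": k,
        "files_on_new_subvols": z,
        "files_moved_per_existing": y,
        "percent_moved": y / j * 100,
        "percent_remaining": (j - y) / j * 100,
    }


est = rebalance_estimate(total_files=12_000_000, existing_subvols=4,
                         added_subvols=2)
print(f"{est['percent_moved']:.1f}% of files move")  # 33.3% of files move

# Time estimate from a measured rate (hypothetical 500 files/s from a small rsync).
seconds = est["files_on_new_subvols"] / 500
```

With these example numbers, 4 million files would move, which at 500 files per second is 8000 seconds, a little over two hours.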

Thanks,

Craig

--
Craig Carl
Senior Systems Engineer
Gluster
