John -
Great questions, thanks. If you don't rebalance you will get lots of stub
files, and creating stub files and redirecting through them will slow your
environment down. When you add one node you will always move 50% of your files
during a rebalance.
Counterintuitively (I think), the more nodes you add at a single time, the fewer
files get moved. Details are below. You can use cron to schedule when the
rebalance runs: when load is getting high, run "volume rebalance
<VOLNAME> stop"; when load is low, run "volume
rebalance <VOLNAME> start". The rebalance will start again where it
stopped.
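As a rough sketch of the cron idea above, a small script run every few minutes
could look at the load average and decide whether to stop or start the
rebalance. The thresholds, the function names and the "gluster" prefix in front
of the volume command (for running it from a shell rather than the gluster
console) are assumptions for illustration, so adjust them to your setup.

```python
import subprocess

# Assumed thresholds on the 1-minute load average; tune for your servers.
HIGH_LOAD = 4.0
LOW_LOAD = 1.0


def decide_action(load, high=HIGH_LOAD, low=LOW_LOAD):
    """Return "stop" under high load, "start" under low load, else None."""
    if load >= high:
        return "stop"
    if load <= low:
        return "start"
    return None  # in between: leave the rebalance as it is


def rebalance_command(volname, action):
    """Build the command line (assumes the gluster CLI is on PATH)."""
    return ["gluster", "volume", "rebalance", volname, action]


def apply_action(volname, load):
    """Run the stop/start command if the load calls for one."""
    action = decide_action(load)
    if action is not None:
        # The exit status is not checked here; a "start" issued while a
        # rebalance is already running may simply be rejected.
        subprocess.run(rebalance_command(volname, action))
    return action
```

From the cron-run script you would call something like
apply_action("myvol", os.getloadavg()[0]) after importing os, where "myvol" is
a placeholder volume name. The gap between the two thresholds keeps the script
from flapping between stop and start when load hovers around one value.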
======================================================
Basic assumption: Distribute spreads all the files equally across all the
nodes :O
The existing nodes in the cluster are a set of "N" nodes.
The new nodes being added to the cluster are a set of "M" nodes.
N+M is the total number of nodes in the new volume configuration.
Total files in the cluster before rebalance: "X"
Number of files on each existing node: "J" = (X / N)
Number of files on each node after rebalance/scaling: "K" = (X / (N+M))
K * M = Z (total number of files on the set of M nodes after rebalance/scaling)
J * N = X (total files in the cluster before rebalance/scaling)
Z / N = Y (number of files moved from each existing node after
rebalance/scaling)
( Y / J ) * 100 = percentage of files moved from each of the 'N' nodes after
rebalance/scaling
( J - Y ) / J * 100 = percentage of files remaining on each of the 'N' nodes
after rebalance/scaling
NOTE: "N" is not just the number of nodes but the total number of
sub-volumes of the "distribute" translator. "M" is the number of
additional sub-volumes added before starting the rebalance and scaling.
So for multiple exports from a single server, the total moved from that server
is the per-export value multiplied by the number of exports on it.
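To make the arithmetic above concrete, here is a minimal sketch that plugs
numbers into those formulas. It only models the idealized assumption of a
perfectly even distribution, so a real rebalance may behave differently; the
function name and the file count of 1200 are made up for the example.

```python
from fractions import Fraction


def rebalance_stats(n, m, x):
    """Apply the formulas above for N existing and M new sub-volumes."""
    if n <= 0 or m < 0 or x <= 0:
        raise ValueError("need n > 0, m >= 0 and x > 0")
    x = Fraction(x)
    j = x / n          # files on each existing sub-volume before
    k = x / (n + m)    # files on every sub-volume after
    z = k * m          # files that end up on the M new sub-volumes
    y = z / n          # files moved off each existing sub-volume
    return {
        "J": j,
        "K": k,
        "Z": z,
        "Y": y,
        "pct_moved": float(y / j * 100),
        "pct_kept": float((j - y) / j * 100),
    }


# 3 sub-volumes growing to 4, with 1200 files in total
print(rebalance_stats(3, 1, 1200)["pct_moved"])  # 25.0
# 3 sub-volumes growing to 6 in one step
print(rebalance_stats(3, 3, 1200)["pct_moved"])  # 50.0
```

Under these formulas the percentage moved simplifies to M / (N+M) * 100, so it
does not depend on X. Growing from 3 to 6 one sub-volume at a time would move
25%, then 20%, then about 16.7% (roughly 61.7% in total), compared with 50%
when all three are added before a single rebalance.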
Thanks,
Craig
Craig Carl
Senior Systems Engineer
Gluster
From: "John Lao" <jlao at cloud9analytics.com>
To: gluster-users at gluster.org
Sent: Wednesday, November 10, 2010 1:36:02 PM
Subject: [Gluster-users] Questions about expanding a volume
Hi,
I am currently running glusterfs 3.1 with 3 bricks in distribute mode and I am
thinking of adding a 4th brick. How does gluster treat a new brick when it is
added to an existing volume? If I do not rebalance the volume will it send
all/most new data to the new brick or will it still distribute it evenly?
Also, what's the performance impact on the volume when running a rebalance?
We have about 5.5TB of data, most files are less than 1 meg.
Thanks,
John Lao
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users