thr3ads.net - Gluster users - [Gluster-users] Is Gluster the wrong solution for us? [Dec 2013]


Scott Smith

2013-Dec-12 01:15 UTC

[Gluster-users] Is Gluster the wrong solution for us?

We are about to abandon GlusterFS as a solution for our object storage needs.
I'm hoping to get some feedback to tell me whether we have missed something
and are making the wrong decision. We're already a year into this project
after evaluating a number of solutions. I'd like not to abandon GlusterFS
if we just misunderstand how it works.

Our use case is fairly straightforward. We need to save a bunch of somewhat
large files (1MB-100MB). For the most part, these files are write once, read
several times. Our initial store is 80TB, but we expect to go to roughly 320TB
fairly quickly. After that, we expect to be adding another 80TB every few
months. We are using some COTS servers which we add in pairs; each server has
40TB of usable storage. We intend to keep two copies of each file. We
currently run 4TB bricks.

In our somewhat limited test environment, GlusterFS seemed to work well. And,
our initial introduction of GlusterFS into our production environment went well.
We had our initial 2 server (80TB) cluster about 50% full and things seemed to
be going well.

Then we added another pair of servers (for a total of 160TB). This went fine
until we did the rebalance. We were running 3.3.1. We ran into the handle leak
problem (which unfortunately we didn't know about beforehand). We also
found that if any of the bricks went offline while the rebalance was going on,
then files were lost or they lost their permissions. We still don't know
why some of the bricks went offline, but they did and we have verified in our
test environment that this is sufficient to cause the corruption problem.

The good news is that we think both of these problems got fixed in 3.4.1. So
why are we leaving?

In trying to figure out what was going on with our GlusterFS system after the
disastrous rebalance, we ran across two posts. The first one was
http://hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/. If we
understand it correctly, anytime you add new storage servers to your cluster,
you have to do a rebalance and that rebalance will require a minimum of 50% of
the data in the cluster to be moved to make the hashing algorithms work. This
means that when we have a 320TB cluster and add another 80TB, we have to move at
least 160TB just to get things back into balance. Our estimate is that that
will take months. It probably won't finish before we need to add another
80TB.

The other post we ran across was
http://www.gluster.org/community/documentation/index.php/Planning34/ElasticBrick.
This post seems to confirm our understanding of the rebalance. It appears to be
a discussion of the rebalance problem and a possible solution. It was
apparently discussed for 3.4, but didn't make the cut.

I'd be happy to find out that we just got it wrong. Tell me that
rebalancing doesn't work the way we think. Or maybe we should configure
things differently or something.

My problem is that if GlusterFS isn't good for starting with a small cluster
(80TB) and growing over time to half a petabyte, what is the use case it is
intended for? Do you really have to start out with the amount of storage you
think you'll need in the long-run and just fill it up as you go? That's
why I'm nervous about our understanding of the rebalance. It's hard to
believe it works this way (at least from our perspective).

We have put a lot of man-hours into writing code and building infrastructure for
GlusterFS. We can likely reuse much of it for another system. I would just
like to know that we really do understand the rebalance and that it really works
the way I described it before we start evaluating other object store solutions.

Comments?

Scott


Anand Avati

2013-Dec-12 02:27 UTC

Scott,
It is really unfortunate that you were bitten by that bug. I am hoping to
convince you to at least not abandon the deployment this early with some
responses:

- Note that you typically don't have to proactively rebalance your volume.
If your new data comes in the form of new directories, they naturally
spread out. Even old directories will consume the new servers once
min-free-disk is reached.

- The rebalance algorithm has a layout-overlap-maximizer function to minimize
the amount of data moved. The diagram in the blog post you linked
describes the old behavior. The overlap maximizer can be found here:
https://github.com/gluster/glusterfs/blob/master/xlators/cluster/dht/src/dht-selfheal.c#L633

- Other than the overlap maximizer, there are enhancements to cancel
negative moves (moving to a server with less free space) which also
contribute significantly towards minimizing "churn".

- There have been a lot of bug fixes in rebalance in the master branch, and
we are actively backporting them into 3.4.2. I am fairly confident you will
have a much smoother experience with 3.4.2.
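
As a rough illustration of why maximizing layout overlap matters, here is a toy Python sketch. It is not the GlusterFS implementation: the 200-unit hash space, the four-to-five brick expansion, the function names, and the greedy matching are all assumptions chosen for illustration, and it assumes files are spread uniformly over the hash space. It compares the share of the hash space that changes owner when brick i simply takes the i-th new slice against a simple heuristic that hands each old brick the new slice it already overlaps most.

```python
from itertools import product

HASH_SPACE = 200  # toy hash space (assumption); the real range is much larger


def equal_ranges(n_bricks, space=HASH_SPACE):
    """Split [0, space) into n_bricks contiguous, equal half-open ranges."""
    width = space // n_bricks
    return [(i * width, (i + 1) * width) for i in range(n_bricks)]


def overlap(a, b):
    """Length of the intersection of two half-open ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def moved_naive(old, new):
    """Brick i takes new slice i; hash space outside its old range must move."""
    kept = sum(overlap(o, n) for o, n in zip(old, new))
    return HASH_SPACE - kept


def moved_greedy(old, new):
    """Greedy stand-in for an overlap maximizer: largest overlaps are matched first."""
    pairs = sorted(
        ((overlap(o, n), bi, si)
         for (bi, o), (si, n) in product(enumerate(old), enumerate(new))),
        key=lambda p: -p[0],
    )
    used_bricks, used_slices, kept = set(), set(), 0
    for size, bi, si in pairs:
        if size and bi not in used_bricks and si not in used_slices:
            used_bricks.add(bi)
            used_slices.add(si)
            kept += size
    # Slices left unmatched go to the new bricks, which held no data before.
    return HASH_SPACE - kept


old = equal_ranges(4)  # e.g. four equally sized server pairs
new = equal_ranges(5)  # after adding a fifth pair
print(moved_naive(old, new) / HASH_SPACE)   # 0.5
print(moved_greedy(old, new) / HASH_SPACE)  # 0.3
```

In this toy case the naive assignment moves half of the hash space, while the greedy matching moves 30%. The floor for a perfectly even layout is the new bricks' share, 20% here.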

Hope that helps!
Avati




Franco Broi

2013-Dec-12 02:31 UTC

How long-lived are your files? We have 400TB and are just about to
double that, but have decided not to rebalance the data. Instead, we are
hoping that the disks will rebalance naturally through attrition, so we do not
waste any valuable time or bandwidth moving data around.
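
To make the question about file lifetime concrete, here is a small Python sketch of rebalancing through attrition. It is a simplified model, not a measurement. The function name, the per-period expiry and write rates, and the assumption that new writes land evenly on old and new bricks are all hypothetical.

```python
def fill_gap_over_time(old_fill, new_fill, expiry_rate, write_rate, periods):
    """Track fill fractions of old and new bricks when no data is migrated.

    Each period, a fraction `expiry_rate` of the stored data is deleted on
    every brick, and every brick receives the same `write_rate` of new data.
    Returns a list of (old_fill, new_fill) tuples, one per period.
    """
    history = []
    for _ in range(periods):
        old_fill = old_fill * (1 - expiry_rate) + write_rate
        new_fill = new_fill * (1 - expiry_rate) + write_rate
        history.append((old_fill, new_fill))
    return history


# Old bricks start half full, new bricks start empty (hypothetical rates).
with_expiry = fill_gap_over_time(0.5, 0.0, expiry_rate=0.5, write_rate=0.25, periods=2)
no_expiry = fill_gap_over_time(0.5, 0.0, expiry_rate=0.0, write_rate=0.125, periods=2)

for old_fill, new_fill in with_expiry:
    print("with expiry, gap:", old_fill - new_fill)
for old_fill, new_fill in no_expiry:
    print("no expiry, gap:", old_fill - new_fill)
```

Under these assumptions the gap between old and new bricks shrinks by the expiry rate each period. If files are never deleted, the gap stays constant, so attrition only helps when files are reasonably short-lived.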


James

2013-Dec-12 04:48 UTC

On Wed, Dec 11, 2013 at 8:15 PM, Scott Smith <ssmith at
mainstreamdata.com> wrote:
> In trying to figure out what was going on with our GlusterFS system after
> the disastrous rebalance, we ran across two posts.  The first one was
> http://hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/.  If
> we understand it correctly, anytime you add new storage servers to your
> cluster, you have to do a rebalance and that rebalance will require a
> minimum of 50% of the data in the cluster to be moved to make the hashing
> algorithms work.  This means that when we have a 320TB cluster and add
> another 80TB, we have to move at least 160TB just to get things back into
> balance.  Our estimate is that that will take months.  It probably won't
> finish before we need to add another 80TB.

As others have alluded to / mentioned, you might want to add new
bricks, but _not_ run a straight rebalance. You might want to try
running _just_ fix-layout, and letting files settle to approximate
equilibrium over time...
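
For reference, a layout-only expansion can be sketched as the following dry run in Python. It only builds and prints the gluster CLI invocations rather than executing them. The helper name, the volume name "objstore", and the brick paths are hypothetical placeholders; check the exact syntax against the documentation for your GlusterFS version before running anything.

```python
import shlex


def expand_without_migration(volume, new_bricks):
    """Return gluster CLI invocations (as argv lists) for a layout-only expansion."""
    return [
        # Add the new bricks to the volume.
        ["gluster", "volume", "add-brick", volume, *new_bricks],
        # Recompute directory layouts so new files can land on the new bricks,
        # without migrating existing data.
        ["gluster", "volume", "rebalance", volume, "fix-layout", "start"],
        # Check progress of the fix-layout run.
        ["gluster", "volume", "rebalance", volume, "status"],
    ]


commands = expand_without_migration(
    "objstore",  # hypothetical volume name
    ["server5:/export/brick1", "server6:/export/brick1"],  # hypothetical bricks
)
for argv in commands:
    print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before running them by hand on a node in the trusted pool.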
